- A. R. Hsu, G. Zhou, Y. Cherapanamjeri, Y. Huang, A. Odisho, P. R. Carroll, B. Yu (2024). Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition. ICLR 
- Y. S. Tan, O. Ronen, T. Saarinen, B. Yu (2024). The Computational Curse of Big Data for Bayesian Additive Regression Trees: a Hitting Time Analysis. arXiv 
- O. Ronen, A. I. Humayun, R. Balestriero, R. Baraniuk, B. Yu (2024). ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks. arXiv 
- S. Hayou, N. Ghosh, B. Yu (2024). The Impact of Initialization on LoRA Finetuning Dynamics. NeurIPS 
- B. Yu (2024). After Computational Reproducibility: Scientific Reproducibility and Trustworthy AI (discussion of Donoho's paper "Data Science at the Singularity"). Harvard Data Science Review (HDSR) 
- S. Hayou, N. Ghosh, B. Yu (2024). LoRA+: Efficient Low Rank Adaptation of Large Models. ICML 
- C. F. Elliott, J. Duncan, T. M. Tang, M. Behr, K. Kumbier, B. Yu (2024). Designing a data science simulation with MERITS: a primer. arXiv 
- Y. Chen, C. Singh, X. Liu, S. Zuo, B. Yu, H. He, J. G (2024). Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning. COLING 
- N. R. Mallinar, A. Zane, S. Frei, B. Yu (2024). Minimum-Norm Interpolation Under Covariate Shift. ICML 
- L. Sun, A. Agarwal, A. Kornblith, B. Yu, C. Xiong (2024). ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance. ICML 
- Q. Zhang, C. Singh, L. Liu, X. Liu, B. Yu, J. Gao, T. Zhao (2023). Tell your model where to attend: post-hoc attention steering for LLMs. ICLR 
- Q. Wang*, T. M. Tang*, N. Youlton, C. S. Weldy, A. M. Kenney, O. Ronen, J. W. Hughes, E. T. Chin, S. C. Sutton, A. Agarwal, X. Li, M. Behr, K. Kumbier, C. S. Moravec, W. H. W. Tang, K. B. Margulies, T. P. Cappola, A. J. Butte, R. Arnaout, J. B. Brown, J. R. Priest, V. N. Parikh, B. Yu*, E. Ashley* (2023). Epistasis regulates genetic control of cardiac hypertrophy. medRxiv 
- E. Irajizad, A. Kenney, T. Tang, J. Vykoukal, R. Wu, E. Murage, J. B. Dennison, M. Sans, J. P. Long, M. Loftus, J. A. Chabot, M. D. Kluger, F. Kastrinos, L. Brais, A. Babic, K. Jajoo, L. S. Lee, T. E. Clancy, K. Ng, A. Bullock, J. M. Genkinger, A. Maitra, K. A. Do, B. Yu, B. W. Wolpin, S. Hanash, J. F. Fahrmann. (2023) A blood-based metabolomic signature predictive of risk for pancreatic cancer. Cell Reports Medicine 
- R. Dwivedi, C. Singh, B. Yu, M. Wainwright (2023). Revisiting minimum description length complexity in overparametrized models. JMLR 
- N. Ghosh, S. Frei, W. Ha, B. Yu (2023). The effect of SGD batch size on autoencoder learning: sparsity, sharpness and feature learning. arXiv 
- J. Wu, P. L. Bartlett, M. Telgarsky, B. Yu (2024). Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency. arXiv 
- R. Netzorg, J. Li, B. Yu (2024). Improving prototypical part networks with reward reweighting, reselection, and retraining. ICML 
- A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, B. Yu (2023). MDI+: a flexible random forest-based feature importance framework. arXiv 
- A. R. Hsu, Y. Cherapanamjeri, B. Park, T. Naumann, A. Odisho, and B. Yu (2023). Diagnosing transformers: illuminating feature space for clinical decision-making. ICLR 
- C. Singh*, A. R. Hsu*, R. Antonello, S. Jain, A. G. Huth, B. Yu and J. Gao (2023). Explaining black box text modules in natural language with language models. NeurIPS XAI In Action Workshop 
- B. Park, X. Wu, B. Yu, A. Zhou (2022). Offline evaluation in RL: soft stability weighting to combine fitted Q-learning and model-based methods. NeurIPS 3rd Offline Reinforcement Learning Workshop 
- Y. Tan, C. Singh, K. Nasseri, A. Agarwal, B. Yu (2022). Fast interpretable greedy-tree sums (FIGS). arXiv 
- A. Agarwal, Y. Tan, O. Ronen, C. Singh, B. Yu (2022). Hierarchical shrinkage: improving accuracy and interpretability of tree-based methods. arXiv 
- N. Ghosh, S. Mei, and B. Yu (2021). The three stages of dynamics in high-dimensional kernel methods. arXiv 
- Y. Tan, A. Agarwal, and B. Yu (2021). A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds. arXiv 
- N. Altieri, B. Park, J. DeNero, A. Odisho, B. Yu. (2021). Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity. JAMIA Open. 2021 Sept. 30 4(3). 
- C. Singh, W. Ha and B. Yu (2021). Interpreting and Improving Deep-Learning Models with Reality Checks. arXiv 
- B. Yu and C. Singh (2021). Seven Principles for Rapid-Response Data Science: Lessons from Covid-19 Forecasting. arXiv 
- Ha, W., Singh, C., Lanusse, F., Song, E., Dang, S., He, K., Upadhyayula, S. and Yu, B., 2021. Adaptive wavelet distillation from neural networks through interpretations. NeurIPS 
- M. Behr, Y. Wang, X. Li, B. Yu (2021). Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests. arXiv 
- N. Altieri, B. Park, M. Olson, J. DeNero, A. Odisho, B. Yu. (2020). Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data. https://arxiv.org/abs/2012.08113 
- M. Behr*, K. Kumbier*, A. Cordova-Palomera, M. Aguirre, E. Ashley, A. Butte, R. Arnaout, J. B. Brown, J. Priest*, B. Yu* (2020) Learning epistatic polygenic phenotypes with Boolean interactions https://www.biorxiv.org/content/10.1101/2020.11.24.396846v1 
- B. Norgeot*, G. Quer, B. K. Beaulieu-Jones, A. Torkamani, R. Dias, M. Gianfrancesco, R. Arnaout, I. S. Kohane, S. Saria, E. Topol, Z. Obermeyer, B. Yu & A. Butte* (2020). Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist, Nature Medicine, 26, 1320–1324. 
- B. Yu (2020). Stability expanded, in reality. Harvard Data Science Review (HDSR). 
- B. Yu and R. Barter (2020). Data science process: one culture. JASA. 
- R. Dwivedi*, Y. Tan*, B. Park, M. Wei, K. Horgan, D. Madigan*, B. Yu* (2020). Stable discovery of interpretable subgroups via calibration in causal studies (staDISC). International Statistical Review and also at arxiv.org/abs/2008.10109 
- X. Li, T. M. Tang, X. Wang, J. A. Kocher, B. Yu (2020). A stability-driven protocol for drug response interpretable prediction (staDRIP). NeurIPS workshop on ML4H (Machine Learning for Health) Extended Abstract. https://arxiv.org/abs/2011.06593 
- A. Y. Odisho, B. Park, N. Altieri, J. DeNero, M. R. Cooperberg, P. R. Carroll, B. Yu (2020). Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation. Journal of American Medical Informatics Association (JAMIA) Open. 
- 
	L. Rieger, J. W. Murdoch, S. Singh, B. Yu (2020). Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge. ICML Proceedings. 
- 
	C. Singh, W. Ha, F. Lanusse, V. Boehm, J. Liu, B. Yu (2020). Transformation Importance with Applications to Cosmology. ICLR Workshop paper. 
- 	     
      Raaz Dwivedi, Chandan Singh, Bin Yu, Martin J. Wainwright (2020) Revisiting complexity and the bias-variance tradeoff. arxiv.org/abs/2006.10189 
- 
      Nick Altieri, Rebecca Barter, James Duncan (UCB-biostats), Raaz Dwivedi, Karl Kumbier (UCSF), Xiao Li, Robbie Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Bin Yu. (2020) Curating a COVID-19 data repository and forecasting county-level death counts in the United States. arxiv.org/abs/2005.07882. 7-day prediction results through visualizations and maps. Short talk video at Responsible Data Science Summit, July 28, 2020. 
- 
      B. Yu and K. Kumbier (2020) Veridical data science PNAS. 117 (8), 3920-3929. QnAs with Bin Yu. 
- 
      R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan and B. Yu (2020) Sharp Analysis of Expectation-Maximization for Weakly Identifiable Mixture Models AISTATS. https://arxiv.org/abs/1902.00194 
- 
      R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan and B. Yu (2020) Singularity, Misspecification and the Convergence Rate of EM Annals of Statistics. https://arxiv.org/abs/1810.00828 
- 
      Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2020) Fast Mixing of Metropolized Hamiltonian Monte Carlo: Benefits of Multi-Step Gradients arXiv preprint https://arxiv.org/abs/1905.12247 
- 
      R. Dwivedi, Y. Chen, M. J. Wainwright and B. Yu (2020) Log-concave Sampling: Metropolis Hastings Algorithms are Fast (2019) JMLR (accepted). http://jmlr.org/papers/v20/19-306.html 
- 
      Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2020) Fast MCMC Algorithms on Polytopes JMLR (accepted). http://jmlr.org/papers/v19/18-158.html 
- 
      Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2020) Vaidya Walk: A Sampling Algorithm Based on Volumetric-Logarithmic Barrier Allerton Conference 2017 https://ieeexplore.ieee.org/abstract/document/8262876/ 
- 
      W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu (2019) Interpretable machine learning: definitions, methods, and applications. PNAS, 116 (44) 22071-22080. 
- 
      Y. Wang, S. Wu and B. Yu (2019) Unique Sharp Local Minimum in ℓ1-minimization Complete Dictionary Learning JMLR (accepted with minor revision). https://arxiv.org/abs/1902.08380 
- 
      Y. Chen, R. Abbasi-Asl, A. Bloniarz, M. Oliver, B. Willmore, J. Gallant, and B. Yu (2018) The DeepTune framework for modeling and characterizing neurons in visual cortex area V4 https://www.biorxiv.org/content/10.1101/465534v1 
- 
      K. Kumbier, S. Sumanta, J. B. Brown, S. Celniker, and B. Yu (2018) Refining interaction search through signed iterative Random Forests. https://arxiv.org/abs/1810.07287 
- 
      C. Singh, W. J. Murdoch, and B. Yu (2018) Hierarchical interpretations for neural network predictions. https://arxiv.org/abs/1806.05337 
- 
      Y. Chen, C. Jin, and B. Yu (2018) Stability and Convergence Trade-off of Iterative Optimization Algorithms. https://arxiv.org/abs/1804.01619 
- 
      J. Murdoch, P. Liu, and B. Yu (2018) Beyond word importance: contextual decomposition to extract interactions from LSTMs. Proc. ICLR 2018. https://arxiv.org/abs/1705.07356 
- 
      R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu (2018) Log-concave sampling: Metropolis-Hastings algorithms are fast! https://arxiv.org/abs/1801.02309. 
- 
      Y. Chen, R. Dwivedi, M. J. Wainwright, and B. Yu (2017) Fast MCMC sampling algorithms on polytopes https://arxiv.org/abs/1710.08165. 
- 
      B. Yu and K. Kumbier (2018) Artificial Intelligence and Statistics Frontiers of Information Technology and Electronic Engineering. 19(1), 6-9. 
- 
      R. Abbasi-Asl and B. Yu (2017) Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning https://arxiv.org/abs/1705.07356 
- 
      R. Abbasi-Asl and B. Yu (2017) Interpreting Convolutional Neural Networks Through Compression. NIPS 2017. Symposium on Interpretable Machine Learning. (also https://arxiv.org/abs/1711.02329) 
- 
      S. Kunzel, J. Sekhon, P. Bickel, and B. Yu (2017) Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning https://arxiv.org/abs/1706.03461. 
- 
      S. Basu, K. Kumbier, J. B. Brown, and B. Yu (2018) iterative Random Forests to discover predictive and stable high-order interactions PNAS (early edition). 
- 
      S. Balakrishnan, M. Wainwright, B. Yu (2017) Statistical Guarantees for the EM algorithm: from population to sample-based analysis. Annals of Statistics, 45(1), 77 - 120. 
- 
      R. Barter and B. Yu (2017) Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data JCGS (revised). https://github.com/rlbarter/superheat 
- 
      H. Liu and B. Yu (2017) Comments on: High-dimensional simultaneous inference with the bootstrap by Dezeure et al Test. 26: 740-750. 
- 
      C. Carson et al (2016).
      
       UC Berkeley Data Science Planning Initiative Faculty Advisory Board (FAB)
Report.
      
      
       FAB Report Executive Summary
      
      
- 
      S. Wu and B. Yu (2018).
      
       Local identifiability of l1-minimization dictionary learning: a sufficient and almost necessary condition.
      
      
       JMLR. 18, 1 - 56.
      
      
- 
      K. Rohe, T. Qin and B. Yu (2016).
      
       Co-clustering directed graphs to discover asymmetries
and directional communities.
      
      
       Proc. National Academy of Sciences (PNAS), 113(45), 12679 - 12684.
      
      
- 
      R. E. Kass, B. S. Caffo, M. Davidian, X. Meng, B. Yu, Nancy Reid* (2016).
      
       Ten simple rules for effective statistical practice.
      
      
       PLoS Comput. Biol., 12(6): e1004961. doi:10.1371/journal.pcbi.1004961
      
      
- 
      Siqi Wu, Antony Joseph, Ann S. Hammonds, Susan E. Celniker, Bin Yu*, and Erwin Frise* (2016).
      
       Stability-driven nonnegative matrix factorization to
interpret spatial gene expression and build local
gene networks (with support information).
      
      
       PNAS, pp. 4290 - 4295.
      
      
- 
      A. Bloniarz, C. Wu, B. Yu, A. Talwalkar (2016).
      
       Supervised neighborhoods for distributed nonparametric regression.
      
      
       Proc. of AISTATS, Barcelona, Spain.
      
      
- 
      B. Yu (2015).
      
       Data wisdom for data science.
      
      
       Operational Database Management Systems (ODBMS.ORG).
      
      
- 
      A. Bloniarz, H. Liu, C. Zhang, J. Sekhon, and B. Yu* (2015).
      
       Lasso adjustments of treatment effect estimates in randomized experiments.
      
      
       PNAS. 113, 7383 - 7390.
      
      
- 
      P. Ma, M. W. Mahoney and B. Yu (2015).
      
       A Statistical Perspective on Algorithmic Leveraging.
      
      
       Journal of Machine Learning Research,
      
      16, (2015), 861-911.
      
- 
      T. Moon, Y. Wang, Y. Liu, and B. Yu (2015).
      
       Evaluation of a MISR-based high-resolution aerosol retrieval method using AERONET DRAGON campaign data.
      
      
       IEEE Transactions on Geoscience and Remote Sensing, 53, 4328-4339.
      
      
- 
      B. Yu (2014).
      
       Let us own data science.
      
      
       video
      
      
       IMS Bulletin
      
      
       Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
      
      
- 
      G. Schiebinger, M. J. Wainwright and B. Yu (2014).
      
       The geometry of kernelized spectral clustering.
      
      
       Annals of Statistics, 43, 819-846.
      
      
- 
      L. Miratrix, J. Jia, B. Yu, B. Gawalt, L. El Ghaoui, L. Barnesmoore, S. Clavier (2014).
      
       Concise comparative summaries (CCS) of large text corpora with a human experiment.
      
      
       Ann. Applied Statist., 8, 499-529.
      
      
- 
      Y. Benjamini and B. Yu (2014).
      
       The shuttle estimator for explainable variance in fMRI experiments.
      
      
       Annals of Applied Statistics, 7, 2007-2033.
      
      
- 
      D. Bean, P. Bickel, N. El Karoui and B. Yu (2014).
      
       Optimal M-estimation in high-dimensional regression.
      
      
       Proceedings of National Academy of Sciences, 110, 14563-14568.
      
      
- 
      N. El Karoui, D. Bean, P. Bickel, C. Lim, and B. Yu (2014).
      
       On robust regression with high-dimensional predictors.
      
      
       Proceedings of National Academy of Sciences, 110, 14557-14562.
      
      
- 
      P. Ma, M. W. Mahoney, B. Yu (2014).
      
       A Statistical Perspective on Algorithmic Leveraging.
      
      
       Proc. of International Conference on Machine Learning (ICML)
      
      (This conference paper contains some of the preliminary results of the journal submission
      
       Ma et al. (2015)
      
      )
      
- 
      A. Bloniarz, A. Talwalkar, J. Terhorst, M. Jordan, D. Patterson, B. Yu and Y. Song (2014).
      
       Changepoint Analysis for Efficient Variant Calling.
      
      
       Proc. of RECOMB 2014 (to appear).
      
      
- 
      Tao Shi (2013),
      
       A conversation with Professor Bin Yu
      
      
       International Chinese Statistical Association (ICSA) Bulletin, Vol 25, Issue 2, pp 85-98.
      
      (
      
       Selected Parts in Statblogs
      
      )
      
- 
      A. Joseph and B. Yu (2016).
      
       The impact of regularization on spectral clustering.
      
      
       Annals of Statistics. 4, 1765 - 1791.
      
      
- 
      C. Lim and B. Yu (2016).
      
       Estimation Stability with Cross Validation (ESCV)
      
      
       Journal of Computational and Graphical Statistics. 25, 464 - 492.
      
      
- 
      A. S. Hammonds, C. A. Bristow, W. W. Fisher, R. Weiszmann, S. Wu, V. Hartenstein, M. Kellis,
    B. Yu, E. Frise, and S. E. Celniker (2013).
      
      
      
       Spatial expression of transcription factors in Drosophila embryonic organ development.
      
      
       Genome Biology, 14(12), R140.
      
      
- 
      H. Liu and B. Yu (2013).
      
       Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression.
      
      
       Electron. J. Statist., 7, 3124-3169.
      
      
- 
      J. Mairal and B. Yu (2013).
      
       Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows.
      
      
       Journal of Machine Learning Research, 14, 2449-2485.
      
      
- 
      Y. Wang, X. Jiang, B. Yu, M. Jiang (2013).
      
       A Hierarchical Bayesian Approach for Aerosol Retrieval Using MISR Data.
      
      
       J. American Statistical Association, 108, 483-493.
      
      
- 
      Y. He, J. Jia and B. Yu (2013).
      
       Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs.
      
      
      
      
       Annals of Statistics, 41(4), 1742-1779.
      
      
- 
      B. Yu (2013).
      
       Stability.
      
      
       Bernoulli, 19 (4), 1484-1500. (Invited paper for the Special Issue commemorating the 300th anniversary of the publication of Jakob Bernoulli's Ars Conjectandi in 1713)
      
      
- 
      J. Mairal and B. Yu (2013).
      
       Discussion on Grouping Strategies and Thresholding for High Dimensional Linear Models
      
      
       Journal of Statistical Planning and Inference, 143, 1451-1453.
      
      
- 
      C. Uhler, G. Raskutti, P. Buhlmann, and B. Yu (2013).
      
       Geometry of faithfulness assumption in causal inference.
      
      
       Annals of Statistics, 41, 436-463.
      
      
- 
      L. Miratrix, J. Sekhon, and B. Yu (2013).
      
       Adjusting Treatment Effect Estimates by Post-Stratification in
        Randomized Experiments.
      
      
       Journal of Royal Statistical Society, Series B, 75 (part 2), 369-396.
      
      
- 
      J. Jia, K. Rohe and B. Yu (2013)
      
       The Lasso under Poisson-like Heteroscedasticity.
      
      
       Statistica Sinica, 23, 99-118.
      
      
- 
      S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu (2012)
      
       A unified framework for high-dimensional analysis of
        
        M-estimators with decomposable regularizers.
      
      
       Statistical Science, 27, 538-557.
      
      
- 
      G. Raskutti, M. Wainwright, and B. Yu (2012)
      
       Minimax-optimal rates for sparse additive models over
        
        kernel classes via convex programming.
      
      
       J. Machine Learning Research, 13, 389-427.
      
      
- 
      J. Mairal and B. Yu (2012).
      
       Complexity analysis of the Lasso regularization path.
      
      
       Proc. of International Conference on Machine Learning (ICML) .
      
      
- 
      Yanfeng Gu, Shizhe Wang, Tao Shi, Yinghui Lu, Eugene E. Clothiaux, and Bin Yu (2012).
      
       Multiple-kernel learning-based unmixing algorithm for estimation of cloud fractions with MODIS and CLOUDSAT data.
      
      
       Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
      
      
- 
      S. Nishimoto, A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, J. L. Gallant (2011).
      
       Reconstructing visual experiences from brain activity evoked by natural movies.
      
      
       Current Biology, 21(19), 1641-1646.
      
      
       related videos
      
      
- 
      P. Ravikumar, M. Wainwright, G. Raskutti, B. Yu (2011).
      
       High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence.
      
      
       Electronic Journal of Statistics, 5, 935-980.
      
      
- 
      G. Raskutti, M. Wainwright, B. Yu (2011).
      
       Minimax rates of estimation for high-dimensional linear regression over lq-balls.
      
      
       IEEE Trans. Inform. Th., 57(10), 6976-6994.
      
      
- 
      K. Rohe, S. Chatterjee, and B. Yu (2011).
      
       Spectral clustering and the high-dimensional Stochastic Block Model.
      
      
       Annals of Statistics, 39 (4), 1878-1915
      
      
- 
      V. Q. Vu, P. Ravikumar, T. Naselaris, K. N. Kay, J. L. Gallant, B. Yu* (2011).
      
       Encoding and decoding V1 fMRI responses to natural images with sparse nonparametric models.
      
      
       Annals of Applied Statistics, 5, 1150-1182.
        (*First senior author as last author in biology tradition)
      
      
- 
      S. N. Pakzad, G. Rocha, and B. Yu (2011).
      
       Distributed modal identification by regularized auto regressive models.
      
      
       International Journal of Systems Science, 42, 1473-1489.
      
      
- 
      J. Yousafzai, P. Sollich, Z. Cvetkovic, and B. Yu (2011).
      
       Combined Features and Kernel Design for Robust Phoneme Classification Using Support Vector Machines.
      
      
       IEEE Trans. Audio, Speech and Language Processing (to appear).
        64.
      
      
- 
      X. Dai, J. Jia, B. Yu, L. El Ghaoui (2011)
      
       SBA-term: Sparse Bilingual Association for terms.
      
      
       Proc. International Conference on Semantic Computing.
      
      
- 
      B. Yu (2011).
      
       Asymptotics and Coding Theory: One of the n - 1 Dimensions
        of Terry.
      
      
       In Selected Works of Terry Speed (ed. S. Dudoit), pp. 33-36, Springer.
      
      
- 
      B. Yu (2010).
      
       Remembering Leo.
      
      
       Annals of Applied Statistics, 4(4), 1657-1659.
      
      
- 
      J. Jia, Y. Benjamini, C. Lim, G. Raskutti, B. Yu (2010).
      
       Comment on "Envelope models for parsimonious and efficient multivariate linear regression" by R. D. Cook, B. Li, and F. Chiaromonte.
      
      
       Statistica Sinica, 20, 961-967.
      
      
- 
      G. Raskutti, M. Wainwright, and B. Yu (2010)
      
       Restricted Eigenvalue Properties for Correlated Gaussian Designs.
      
      
       Journal of Machine Learning Research, 11, 2241-2259.
      
      
- 
      J. Jia and B. Yu (2010).
      
       On model selection consistency of elastic net when p >>n.
      
      
       Statistica Sinica, 10, 595-611.
      
      
- 
      P. Buhlmann and B. Yu (2010).
      
       Boosting.
      
      
       Wiley Interdisciplinary Reviews: Computational Statistics, 2, 69-74.
      
      
- 
      L. Huang, J. Jia, B. Yu, B. Chun, P. Maniatis, M. Naik (2010).
      
       Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression.
      
      
       Proc. NIPS 2010.
      
      
- 
      Y. Han, F. Wu, J. Jia, Y. Zhuang and B.  Yu (2010).
      
       Multi-task Sparse Discriminant Analysis (MtSDA) with Overlapping Categories.
      
      
       Proc. of The 24th AAAI Conference on Artificial Intelligence, July 11-15, Atlanta, GA.
      
      
- 
      B. Gawalt, J. Jia, L. Miratrix, L. El Ghaoui, B. Yu, and S. Clavier (2010).
      
       Discovering Word Associations in News Media via Feature Selection and
        
        Sparse Classification.
      
      
       Proc. 11th ACM SIGMM International Conference on
        
        Multimedia Information Retrieval (MIR).
      
      
- 
      E. Anderes, B. Yu, V. Jovanovic, C. Moroney, M. Garay, A. Braverman, E. Clothiaux (2009)
      
       Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery.
      
      
       Annals of Applied Statistics, 3, 902-921
      
      
- 
      T. Shi, M. Belkin, and B. Yu, (2009)
      
       Data Spectroscopy: Eigenspace of Convolution Operator and Clustering
      
      
       Annals of Statistics, 37 (6B), 3960-3984.
      
      
- 
      Vincent Q. Vu, Bin Yu, Robert E. Kass (2009)
      
       Information In The Non-Stationary Case
      
      
       Neural computation 21, 688-703.
      
      
- 
      N. Meinshausen and B. Yu (2009).
      
       Lasso-type recovery of sparse representations for
        
        high-dimensional data.
      
      
       Annals of Statistics 37, 246-270.
      
      
- 
      P. Zhao, G. Rocha, and B. Yu (2009).
      
       The composite absolute penalties family for grouped and hierarchical variable selection
      
      
       Annals of Statistics 37, 3468-3497. (An earlier version appeared as 'Grouped and hierarchical model selection through composite absolute penalties' by P. Zhao, G. Rocha and B. Yu, Department of Statistics, UC Berkeley, Tech. Rep 703.)
      
      
- 
      S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu (2009).
      
       A unified framework for high-dimensional analysis of $M$-estimators with
        
        decomposable regularizers
      
      
       Proc. NIPS, 2009. (This conference paper contains preliminary results of the journal submission
       
        Negahban et al. 2012
       
       ).
      
      
- 
      G. Raskutti, M. Wainwright, B. Yu (2009)
      
       High-dimensional regression under lq-ball sparsity: Optimal rates of convergence.
      
      
       Proc. of Allerton Conference on Communication, Control, and Computing. (This conference paper contains some of the preliminary results of the journal submission
       
        Ravikumar et al. 2011
       
       ).
      
      
- 
      G. Raskutti, M. Wainwright, and B. Yu (2009)
      
       Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness.
      
      
       Proc. NIPS, 2009. (This conference paper contains some of the preliminary results of the journal submission
       
        Ravikumar et al. 2011
       
       ).
      
      
- 
      T. Shi, B. Yu, E. Clothiaux, and A. Braverman (2008).
      
       Daytime Arctic Cloud
        
        Detection based on Multi-angle Satellite Data with Case Studies.
      
      
       Journal of American Statistical Association.
      
      103(482), 584-593.
      
- 
      Peter Buhlmann and Bin Yu (2008)
        
        Invited discussion on "Evidence contrary to the statistical view of boosting (D. Mease and A. Wyner)".
      
       (paper with discussion)
      
      
       Journal of Machine Learning Research 9, 187-194.
      
      
- 
      P. Ravikumar, V. Vu, B. Yu, T. Naselaris, K. Kay, J. Gallant (2008).
      
       Nonparametric sparse hierarchical models describe V1 fMRI
        
        responses to natural images
      
      
       In Advances in Neural Information Processing Systems (NIPS) 21, (2008).
      
      (This conference paper contains some preliminary results of journal paper Vu et al. (2011) on encoding models, but also contains an encoding model that is not in Vu et al. (2011). It does not contain decoding results.)
      
- 
      P. Ravikumar, G. Raskutti, M. Wainwright, B. Yu (2008)
      
       Model selection in Gaussian graphical models: high-dimensional
        
        consistency of l1-regularized MLE.
      
      
       In Advances in Neural Information Processing Systems (NIPS) 21, (2008).
      
      
- 
      T. Shi, M. Belkin, and B. Yu (2008).
      
       Data spectroscopy: learning mixture models using eigenspaces of convolution operators.
      
      
       Proc. of ICML 2008.
      
      
- 
      M. Ager, Z. Cvetkovic, P. Sollich, and B. Yu (2008).
      
       Towards Robust Phoneme Classification Augmentation of PLP Models with Acoustic Waveforms.
      
      
       Proceedings of EUSIPCO.
      
      
- 
      J. Yousafzai, Z. Cvetković, P. Sollich, and B. Yu (2008).
      
       Combined PLP-Acoustic Waveform Classification for Robust Phoneme Recognition using Support Vector Machines.
      
      
       Proceedings of EUSIPCO.
      
      
- 
      N. Meinshausen, G. Rocha, and B. Yu (2007).
      
       A tale of three cousins: Lasso, L2Boosting, and Dantzig
      
      
       Annals of Statistics (invited discussion on
        
        Candes and Tao's Dantzig Selector paper)
      
      
- 
      V. Vu, B. Yu, and R. Kass (2007).
      
       Coverage Adjusted Entropy Estimation.
      
      
       Statistics in Medicine, 26(21), 4039-4060.
      
      
- 
      B. Yu (2007).
      
       Embracing Statistical Challenges in the Information Technology Age
      
      
       Technometrics (special issue
        
        on statistics and information technologies). vol. 49 (3), 237-248.
      
      
- 
      X. Jiang, Y. Liu, B. Yu and M. Jiang (2007).
      
       Comparison of MISR aerosol optical thickness with AERONET
            
            measurements in Beijing metropolitan area.
      
      
       Remote Sensing of Environment (Special Issue on Multi-angle Imaging SpectroRadiometer), vol. 107, pp. 45-53.
      
      
- 
      T. Shi, E. E. Clothiaux, B. Yu, A. J. Braverman, and G. N. Groff (2007).
      
       Detection of Daytime Arctic Clouds using MISR and MODIS Data.
      
      
       Remote Sensing of Environment (Special Issue on Multi-angle Imaging SpectroRadiometer), vol. 107, pp. 172-184.
      
      
- 
      Peng Zhao and Bin Yu (2006).
      
       On Model Selection Consistency of Lasso.
      
      
       J. Machine Learning Research, 7 (nov), 2541-2567.
      
      
- 
      B. Yu (2006).
      
       Comments on: Monitoring networked applications with incremental quantile estimation by Chambers et al.
      
      
       Statist. Sci., 21, 483-485.
      
      
- 
      B. Yu (2006).
      
       Comments on: Regularization in Statistics, by P. J. Bickel and B. Li.
      
      
       Test, vol. 15 (2), pages 314-316.
      
      
- 
      P. Buhlmann and B. Yu (2006).
      
       Sparse Boosting
      
      
       Journal of Machine Learning Research, 7 (June), 1001-1024.
      
      This is a shortened and more focused version of
    
    Buhlmann and Yu "Boosting, Model Selection, Lasso and Nonnegative Garotte"
    
    given below.
      
- 
      J. Gao, H. Suzuki, and B. Yu (2006).
      
       Approximation Lasso Methods for Language Modeling.
      
      
       Proceedings of the 21st International Conference on Computational
        
        Linguistics and 44th Annual Meeting of the ACL, pp. 225-232, Sydney.
      
      
- 
      T. Shi and B. Yu (2005).
      
       Binning in Gaussian Kernel Regularization.
      
      
       Statistica Sinica (special issue on machine learning), 16, 541-567.
      
      
- 
      G. Liang, N. Taft, and B. Yu (2005).
      
       A fast lightweight approach to origin-destination IP traffic
            
            estimation using partial measurements.
      
      
       Tech Report 687, Statistics Department, UCB (accepted for Special Issue of IEEE-IT and ACM Networks on data networks, Jan. 2006)
      
      
- 
      Tong Zhang and B. Yu (2005).
      
       Boosting with early stopping: convergence and consistency.
      
      
       The Annals of Statistics. Vol. 33, 1538-1579.
      
      
- 
      R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu (2005)
      
       Network tomography: recent developments.
      
      
       Statistical Science, 19, 499-517.
      
      
- 
      C. D.  Giurcaneanu and B. Yu (2005).
      
       Efficient algorithms for discrete universal denoising for channels
        
        with memory.
      
      
       Proceedings of International Symposium on
        Information Theory, Australia. (Also as Tech. Report 686, Statistics Department, UCB (Proc. ISIT, Sept. 2005))
      
      
- 
      P. Zhao and B. Yu (2004).
      
       Stagewise Lasso (old title: Boosted Lasso)
      
      
       Journal of Machine Learning Research, 8, 2701-2726. (An earlier version appeared as Tech. Report #678, Statistics Department, UC Berkeley, December 2004; revised April 2005.)
      
      
- D. J. Diner et al. (2004). PARAGON: A Systematic, Integrated Approach to Aerosol Observation and Modeling. American Meteorological Society, Oct., 1491-1501.
- 
      P. Buhlmann and B. Yu (2004).
      
       Discussion on three boosting papers by Jiang, Lugosi and Vayatis, and
        
        Zhang
      
      
       Annals of Statistics. 32 (1): 96-101.
      
      
- 
      R. Jornsten and B. Yu (2004).
      
       Compressing genomic and proteomic array images for statistical analyses.
      
      
       Invited chapter in a book on Genomic signal processing and statistics, edited by E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, pp. 341-366.
      
      
- 
      G. Liang, B. Yu, and N. Taft (2004).
      
       Maximum entropy models: convergence rates and application in dynamic system monitoring.
      
      
       International Symposium on Information Theory, Chicago.
      
      
- 
      R. Castro, M. Coates, G. Liang, R. Nowak,  and B. Yu (2003).
      
       Internet Tomography: Recent Developments
      
      
       Statistical Science.
      
      Vol. 19(3), 499-517.
      
- 
      G. Liang and B. Yu (2003).
      
       Maximum Pseudo Likelihood Estimation in Network Tomography.
      
      
       IEEE Trans. on Signal Processing (Special Issue on Data Networks).
        
        51(8), 2043-2053
      
      
- 
      Rebecka Jornsten and Bin Yu (2003).
      
       Simultaneous Gene Clustering and Subset Selection for Classification via MDL.
      
      
       Bioinformatics. 19(9): 1100-1109.
      
      
- 
      Peter Buhlmann and Bin Yu (2003).
      
       Boosting with the L2 Loss: Regression and Classification.
      
      
       J. Amer. Statist. Assoc. 98, 324-340.
      
      
- 
      R. Jornsten, W. Wang, B. Yu, and K. Ramchandran (2003).
      
       Microarray image compression: SLOCO and the effects of information loss.
      
      
       Signal Processing Journal (Special Issue on Genomic Signal Processing).
        
        83, 859-869.
      
      
- 
      G. Liang and B. Yu (2003).
      
       Pseudo Likelihood Estimation in Network Tomography.
      
      
       Proceedings of Infocom, San Francisco.
      
      
- 
      Peter Buhlmann and Bin Yu (2002).
      
       Analyzing Bagging.
      
      
       Annals of Statistics
      
      vol. 30, 927-961.
      
- 
      R. Jornsten, M. Hansen, and B. Yu (2002).
      
       Adaptive Minimum Description Length (MDL) criteria with applications to microarray data.
      
      
       In Advances in Minimum Description Length: Theory and Applications, edited by P. Grunwald, I.J. Myung and M.A. Pitt. The MIT Press, pp. 295-321.
      
      
- 
      Mark Hansen and Bin Yu (2002).
      
       Minimum Description Length Model Selection Criteria for Generalized Linear
        
        Models.
      
      
      
      Science and Statistics: Festschrift for Terry Speed,
    
    IMS Lecture Notes -- Monograph Series, Vol. 40.
      
- 
      Rebecka Jornsten and Bin Yu (2002).
      
       Multiterminal Estimation: Extensions and a Geometric interpretation.
      
      
       Proceedings of International Symposium on Information Theory (ISIT),
      
      June, 2002.
      
- 
      Gerald Schuller, Bin Yu, Dawei Huang, and Bernd Edler (2002).
      
       Perceptual Audio Coding using Pre- and Post-Filters
        
        and Lossless Compression.
      
      
       IEEE Trans. Speech and Audio Processing.
      
      Vol. 10 (6), 379-390
      
- 
      Mark Coates, Alfred Hero, Robert Nowak, and Bin Yu (2002).
      
       Internet Tomography.
      
      
       Signal Processing Magazine.
      
      vol. 19, No. 3 (May issue),
    
    47-65.
      
- 
      M. Hansen and B. Yu (2001).
      
       Model selection and the principle of Minimum Description Length.
      
      
       Journal of American Statistical Association. 96, 746-774.
      
      
- 
      Jin Cao, Drew Davis, Scott Vander Wiel and Bin Yu (2000).
    
      
       Time-varying network tomography: router link data.
      
      
       J. Amer. Statist. Assoc.
      
      vol. 95, 1063-1075.
      
- 
      Peter Buhlmann and Bin Yu (2000a).
      
       Discussion. Additive logistic regression: a statistical view of
        
        boosting, by Friedman, J., Hastie, T. and Tibshirani, R.
      
      
       Annals of Statistics.
      
      Vol. 28, 377-386
      
- 
      Mark Hansen and Bin Yu (2000).
      
       Wavelet thresholding via MDL for natural images.
      
      
       IEEE Trans. Inform. Theory (Special Issue on Information Theoretic
            
            Imaging).
      
      vol. 46, 1778-1788.
      
- 
      Jorma Rissanen and Bin Yu (2000).
      
       Coding and compression: a happy union of theory and practice.
      
      
       J. Amer. Statist. Assoc. (Year 2000 Commemorative Vignette on Engineering
        
        and Physical Sciences).
      
      vol. 95, 986-988.
      
- 
      Lei Li and Bin Yu (2000).
      
       Iterated logarithm expansions of the pathwise code
            
            lengths for exponential families.
      
      
       IEEE Trans. Inform. Theory.
      
      vol. 46, 2683-2689.
      
- 
      G. Chang, B. Yu and M. Vetterli (2000).
      
       Adaptive wavelet thresholding
        
        for image denoising and compression.
      
      
       IEEE Trans. Image Processing,
      
      vol. 9, 1532-1546.
      
- 
      G. Chang, B. Yu and M. Vetterli (2000).
      
       Spatially adaptive wavelet thresholding
            
            based on context modeling for image denoising.
      
      
       IEEE Trans. Image Processing,
      
      vol. 9, 1522-1531.
      
- 
      G. Chang, B. Yu and M. Vetterli (2000).
      
       Wavelet thresholding for multiple noisy image copies.
      
      
       IEEE Trans. Image Processing,
      
      vol. 9, 1631-1635.
      
- 
      Y. Yoo, A. Ortega, and B. Yu (1999).
      
       Image subband coding
         
         using context-based classification and adaptive quantization.
      
      
       IEEE Trans. Image Processing,
      
      vol. 8, 1702-1715.
      
- 
      B. Yu, M. Ostland, P. Gong and R. Pu (1999).
    
    Penalized discriminant
    
    analysis of
      
       in situ
      
      hyperspectral data for conifer species recognition.
      
       IEEE Trans. Geoscience and Remote Sensing,
      
      in press.
      
- 
      A. Barron, J. Rissanen, and B. Yu (1998).  The Minimum Description
    
    Length principle
    
    in coding and modeling.
    
    
    
    (Special Commemorative Issue: Information Theory: 1948-1998)
      
       IEEE Trans. Inform. Th.,
      
      44, 2743-2760. Reprinted in Information Theory: 50 Years of Discovery, S. Verdú and S. McLaughlin (eds),
      
       IEEE Press
      
      , 1999.
      
- 
      B. Yu and P. Mykland (1998).
    
    Looking at Markov samplers through
    
    cusum path plots:
    
    a simple diagnostic idea.
      
       Statistics and Computing
      
      , 8, 275-286.
      
- 
      P. Gong, R. Pu and B. Yu (1998).
      
       Conifer species recognition: effects of data
        transformation and band width (in Chinese)
      
      
       Journal of Remote Sensing, 2(3), 211-217.
      
      
- 
      G. Chang, B. Yu and M. Vetterli (1998).
      
       Spatially adaptive wavelet thresholding for image denoising.
      
      
       Proceedings of IEEE International Conference on Image Processing, October, Chicago.
      
      
- 
      S. G. Chang, B. Yu, and M. Vetterli (1998).
      
       Image denoising via lossy compression and wavelet thresholding.
      
      
       Proceedings of International Conference on Image Processing. Santa Barbara, California, vol. 1, pp. 604-607.
      
      
- 
      M. Ostland and B. Yu (1997).
      
       Exploring quasi Monte Carlo for marginal
        
        density approximation.
      
      
       Statistics and Computing,
      
      7, 217-228.
      
- 
      P. Gong, R. Pu, and B. Yu (1997). Conifer species recognition with in situ
        
        hyperspectral data.
      
       Remote Sensing of Environment,
      
      62, 189-200.
      
- 
      B. Yu and T. P. Speed (1997).
      
       Information and the clone mapping of chromosomes.
      
      
       Ann. Statist.
      
      25, 169-185.
      
- 
      D. Nelson, T. Speed, and B. Yu (1997). The limits of random
    
    fingerprinting.
      
       Genomics
      
      ,  40, 1-12.
      
- 
      B. Yu (1997).
      
       Assouad, Fano, and Le Cam.
      
      
       Festschrift for Lucien Le Cam
      
      .
    
    D. Pollard, E. Torgersen, and G. Yang (eds), pp. 423-435, Springer-Verlag.
      
- 
      B. Yu (1996). Lower bounds on expected redundancy for nonparametric
    
    classes.
      
       IEEE Trans. on Information Theory
      
      ,   42, 272-275.
      
- 
      Y. Yoo, A. Ortega, and B. Yu (1996).
    
    Adaptive quantization of image subbands with efficient
    
    overhead rate selection.
    
    In
      
       Proceedings of IEEE International
      
      
       Conference
        
        on Image Processing,
      
      Lausanne, Switzerland.
      
- 
      B. Yu (1996).
      
       A statistical analysis of adaptive scalar quantization based on quantized past data.
      
      In
      
       Proceedings of International
        Symposium on Information Theory and its Applications (ISITA96),
      
      Victoria, Canada.
      
- 
      B. Yu (1995).
    
    Comment: Extracting more diagnostic information
    
    from a single run
    
    using cusum path plot.
      
       Statist. Sci.
      
      , 10, 54-58.
      
- 
      J. Rissanen and B. Yu (1995). MDL learning.
    
    In
      
       Learning and Geometry:
        
        Computational Approaches,
        
        Progress in Computer Science and Applied Logic
      
      , 14,
    
    David Kueker and Carl Smith (eds), Birkhäuser, Boston,
    
    pp. 3-19.
      
- 
      P. Mykland, L. Tierney, and B. Yu (1995). Regeneration in Markov Chain samplers.
      
       J. Amer. Statist. Assoc.
      
      , 90, 233-241.
      
- 
      B. Yu (1994a). Rates of convergence for empirical
    
    processes of stationary mixing
    
    sequences.
      
       Ann. Probab.
      
      22, 94-116.
      
- 
      M. Arcones and B. Yu (1994). Central limit theorems for empirical and
    
    U-processes
    
    of stationary mixing sequences.
      
       J. Theor. Probab.
      
      7, 47-71.
      
- 
      B. Yu (1994). Lower bound on the expected redundancy for classes of continuous
    
    Markov sources.  In
      
       Statistical Decision Theory and
        
        Related Topics V,
      
      S. S. Gupta and J. O. Berger (eds), 453-466.
      
- 
      M. Arcones and B. Yu (1994). Limit theorems for empirical processes under dependence.  In
      
       Proceedings in Chaos expansions, multiple Wiener integrals and their applications.
      
      205-221.
      
- 
      A. R. Barron, Y. Yang and B. Yu (1994). Asymptotically
    
    optimal function estimation
    
    by minimum
    
    complexity criteria. In
      
       Proceedings of 1994 International Symposium
      
      
       on Information Theory,
      
      p. 38, Trondheim, Norway.
      
- 
      B. Yu and T. Speed (1993).
    
    A rate of convergence result for a universal D-semifaithful code.
      
       IEEE Trans. on Information Theory
      
      39, 813-820.
      
- 
      B. Yu (1993).
    
    Density estimation in the L
      
       ∞
      
      norm for dependent data with
        
        applications
        
        to the Gibbs sampler.
      
       Ann. Statist.
      
      21, 711-735.
      
- 
      T. Speed and B. Yu (1993). Model selection and prediction: normal
    
    regression.
      
       J. Inst. Statist. Math.
      
      45, 35-54.
      
- 
      J. Rissanen, T. Speed and B. Yu (1992). Density estimation by stochastic complexity.
      
       IEEE Trans. on Information Theory
      
      , 38, 315-323.
      
- 
      B. Yu and T. Speed (1992). Data compression and histograms.
      
       Probability Theory and Related Fields
      
      , 92, 195-229.