Bin Yu
I'm Bin Yu, the head of the Yu Group at Berkeley, which consists of 15-20 students and postdocs from statistics and EECS. I was formally trained as a statistician, but my research interests and achievements extend beyond the realm of statistics. Together with my group, I have leveraged new computational developments to solve important scientific problems by combining novel machine learning approaches with the domain expertise of my many collaborators.

In 2014, I was elected to the National Academy of Sciences for my statistical and scientific contributions and my broad vision of data science, which is best described in my article Veridical Data Science, written together with my former student Karl Kumbier. In this work, I introduced a framework based on three principles: predictability, computability, and stability (abbreviated to PCS). This framework helps guide practitioners who solve domain data problems with data science tools to be creative in their analysis and properly validate their findings. I am currently writing a book on the Veridical Data Science framework together with my former student and current postdoc Rebecca Barter.

In my research group, I cultivate a strongly interdisciplinary and collaborative culture, solving data problems across fields such as neuroscience, genomics, remote sensing, and precision medicine. Through these projects, we have successfully mapped a cell's destiny using spatial gene expression images of Drosophila embryos, characterized V4 neurons through DeepTune images, and are currently seeking genomic markers of heart disease using UK Biobank data. Recently, my group and I have developed approaches for predicting county-level COVID-19 death counts to support the non-profit Response4Life, which is working to distribute PPE across the country to those who need it most.

The Yu Group and I have also developed an array of statistical and machine learning methods inspired by our interdisciplinary projects, including staNMF for unsupervised learning, iterative Random Forests (iRF) and signed iRF (s-iRF) for discovering predictive and stable high-order interactions in supervised learning, and contextual decomposition (CD) and aggregated contextual decomposition (ACD) for phrase or patch importance extraction from Deep Neural Networks (DNNs).

Chancellor's Distinguished Professor, Class of 1936 Second Chair
Department of Statistics
Department of Electrical Engineering & Computer Science
University of California, Berkeley
Chan-Zuckerberg Biohub Investigator
mail: 367 Evans Hall #3860 • Berkeley, CA 94720 • phone: 510-642-2021 • fax: 510-642-7892