Most of my work focuses on
the applied mathematics of data, in particular on
algorithmic and statistical aspects
of what is now called big data, although I was doing it back when it was just massive, and prior to that when it was just large.
On the theory side, we develop algorithmic and statistical methods for matrix, graph, regression, optimization, and related problems.
On the implementation side, we provide implementations of a range of matrix, graph, and optimization algorithms in a variety of environments (e.g., single machines, distributed data systems, and supercomputers).
On the applied side, we apply these methods to a range of problems in internet and social media analysis, social network analysis, genetics, mass spectrometry imaging, astronomy, climate science, and a range of other scientific applications.
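For readers unfamiliar with the flavor of these randomized matrix methods, here is a minimal sketch-and-solve least-squares example in Python. It is purely illustrative: the problem sizes, the Gaussian sketch, and the variable names are mine, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall least-squares problem: min_x ||A x - b||_2 (illustrative sizes).
m, n = 20_000, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Sketch-and-solve: compress the m rows down to s rows with a random
# Gaussian sketch S, then solve the much smaller sketched problem
# min_x ||S A x - S b||_2.
s = 4 * n  # sketch size: a small multiple of n
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sk = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

# Compare against the exact solution: the sketched solution's residual
# is typically within a small constant factor of the optimal residual.
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
rel_err = np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b)
print(rel_err)
```

The point of the example is the cost profile: the dominant work shifts from an m-by-n factorization to forming the small s-by-n sketched problem, which is the basic trade that RandNLA algorithms exploit.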
For more information, you can also see
a headshot and bio.
Recent News
DJ Rich of
Mutual_Information posted a very nice
summary video about RandNLA, motivated by but much more general than our recent RandBLAS/RandLAPACK monograph.
Check it out.
Abdul Fatir Ansari and Lorenzo Stella wrote a blog post about adapting language model architectures for time series forecasting, based on our recent Chronos paper.
Our recent KVQuant paper was described on Apple Podcasts Preview;
here is the audio.
We (Michal Derezinski and I) gave a tutorial at NeurIPS 2023 on "Recent and Upcoming Developments in Randomized Numerical Linear Algebra for ML":
pdf and video.
We (Mert Gurbuzbalaban, Stefanie Jegelka, Umut Simsekli, and I) are running a workshop at NeurIPS 2023 on
Heavy Tails in Machine Learning: Structure, Stability, Dynamics.
We put up
CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT).
This randomized algorithm does not try to obtain asymptotic speedups over existing algorithms.
Instead, its goal is to provide a drop-in replacement for existing algorithms, while obtaining 10x to 20x speedups on high-performance hardware.
It plays well with RandBLAS/RandLAPACK.
More to come!
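The core randomized-preconditioning idea behind algorithms of this kind can be sketched in a few lines of Python. This is a toy illustration only: the dimensions, sketch size, and function name are mine, and it omits the pivoting and the careful numerics that the actual CQRRPT algorithm provides.

```python
import numpy as np

def sketched_cholesky_qr(A, rng, sketch_factor=4):
    """Toy randomized-preconditioned CholeskyQR for a tall matrix A."""
    m, n = A.shape
    s = sketch_factor * n
    # 1. Sketch the tall matrix down to s rows.
    S = rng.standard_normal((s, m)) / np.sqrt(s)
    # 2. QR of the small sketch yields a preconditioner R_sk.
    R_sk = np.linalg.qr(S @ A, mode="r")
    # 3. Precondition: A_pre = A @ inv(R_sk) is nearly orthonormal.
    A_pre = np.linalg.solve(R_sk.T, A.T).T
    # 4. CholeskyQR on the well-conditioned A_pre.
    G = A_pre.T @ A_pre
    R_pre = np.linalg.cholesky(G).T  # upper-triangular factor
    Q = np.linalg.solve(R_pre.T, A_pre.T).T
    return Q, R_pre @ R_sk  # A == Q @ (R_pre @ R_sk)

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 40))
Q, R = sketched_cholesky_qr(A, rng)
print(np.linalg.norm(Q.T @ Q - np.eye(40)))  # small: Q is near-orthonormal
print(np.linalg.norm(Q @ R - A))             # small: valid factorization
```

The reason this is a plausible drop-in replacement is that steps 3 and 4 are dominated by BLAS-3 operations on the tall matrix, which run at high fractions of peak on modern hardware, while the randomization is confined to the cheap sketching step.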
The M.O. of ML: Can AI Foundation Models Drive Accelerated Scientific Discovery? is a nice popular piece by Carol Pott summarizing our recent work, Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior, appearing at NeurIPS 2023, in which we ask whether the methodology used to develop so-called foundation models for NLP is applicable to scientific ML problems.
The RandLAPACK book, which describes our plan for putting RandNLA into the next generation of BLAS/LAPACK, has been posted in v1 form, and will be published by SIAM.
Recent years have even seen the incorporation of RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and scikit-learn.
We are setting up the "RandBLAS" and "RandLAPACK" libraries, to serve as standards (for RandNLA and other methods), conceptually analogous to BLAS and LAPACK.
This is part of the larger BALLISTIC project (with J. Demmel, J. Dongarra, J. Langou, J. Langou, and P. Luszczek).
SuperBench (see also the SuperBench paper) is a benchmark dataset and evaluation framework for super-resolution tasks in scientific machine learning.
Revolutionizing AI Efficiency with SqueezeLLM, as described by Marktechpost.
Danielle Maddix Robinson wrote a very nice blog post on physics-constrained machine learning for scientific computing, about recent work we have been doing.
We have made it to prime time!
Randomized Numerical Linear Algebra (RandNLA)
made a cameo appearance in
Netflix's
hit series
"The Lincoln Lawyer."
Watch this
short clip
from Season 1, Episode 3.
(Disclaimer: this brief excerpt is reproduced here for teaching/research purposes only.)
Petros Drineas and I coined this term in late 2011 and the first workshop on
Randomized Numerical Linear Algebra (RandNLA)
was held during FOCS 2012.
The
Nuit Blanche
and
My Biased Coin
blog posts describe the details from that time.
Our recent work on trying to identify and solve possible failure modes in scientific machine learning models has been highlighted in a recent CACM News article.
The article describes how "physics-informed machine learning is gaining attention, but suffers from training issues" that we identified, and it has several nice quotes from Amir Gholami (who did the work, along with Aditi Krishnapriyan).
A related article published at LBL can be found here.
Amir Gholami wrote a very nice blog post, AI and Memory Wall, on recent work we have been doing.
Update (3/22/24): It is now on the arXiv and has been accepted for publication at IEEE Micro.
I was interviewed by Ben Lorica on The Data Exchange.
Topics included our recent NeurIPS 2020 Best Paper Award on the column subset selection method, how adversarially trained deep networks transfer better, how to predict trends in the quality of state-of-the-art neural networks without access to training or testing data, and updates on the WeightWatcher tool.
Congratulations to Michal Derezinski and Rajiv Khanna for winning the Best Paper Award at NeurIPS 2020.
I was interviewed by Ben Lorica on O'Reilly Data Show Podcast.
Topics included understanding deep neural networks and developing a practical theory for deep learning, our new Hessian AWare Quantization (HAWQ) framework for addressing problems pertaining to model size and inference speed/power, and how these relate to challenges at the foundations of data analysis.
I am the Director of the new UC Berkeley FODA (Foundations of Data Analysis) Institute, part of the NSF TRIPODS program, which aims to deepen the theoretical foundations of data science in a new transdisciplinary institute.
Also involved are co-PIs Bin Yu, Fernando Perez, Michael Jordan, and Peter Bartlett.
Here is more information about it:
The main FODA web page.
The UC Berkeley press release about FODA.
The summary article from Datanami about it.
The original NSF announcement.
The report we wrote from the planning workshop.
Our edited volume on The Mathematics of Data, based on lectures from the 2016 PCMI Summer School, is available.
I talked about our recent results on "Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks" at the Simons Institute 2018 Big Data RandNLA meeting, Sept 2018.
Related talks were given by collaborator Charles Martin:
at LBNL, June 2018;
at ICSI, December 2018; and
at Stanford, January 2020.
More recently, I gave a longer and updated version
at the SF Bay Area DMSIG Meeting, February 2019.
I talked about "Alchemist: An Apache Spark <=> MPI Interface" at the XLDB meeting, May 2018.
A related talk was given by postdoc Kai Rothauge at the Spark+AI meeting, June 2018.
Linear Algebra for Data Science:
A four-unit introductory class on linear algebra, from the perspective of probability and related topics in the mathematics of data.
It completely redesigns the usual presentation of linear algebra from the ground up to be more appropriate for data applications.
I'll teach this for the first time in Spring 2018.
(This is expanded from a two-unit "connector" class on mathematics for matrices/graphs/data that I designed and taught during
Spring 2017 and Spring 2016.)
(I have a detailed but work-in-progress set of course notes for this class.
I will post them at an appropriate point; but, in the meantime, email me if you are interested in them.)
Our technical report describes
RISELab challenges.
Video synopsis of our Communications of the ACM article on "RandNLA: Randomized Numerical Linear Algebra."
My postdoc Julian Shun has been awarded the ACM Doctoral Dissertation Award for his 2015 CMU doctoral thesis "Shared-Memory Parallelism Can Be Simple, Fast, and Scalable".
Our AMPLab-Cray-NERSC/LBL collaboration, to implement randomized linear algebra (RLA) algorithms in Spark on HPC platforms for LBL's scientific data applications, is described on HPC Wire.
The role of RLA in HPC, as described in HPC Wire.
I gave a talk on "BIG Biomedicine and the Foundations of BIG Data Analysis"
at Stanford Medical School, May 23, 2014,
at their "Big Data in BioMedicine" Conference.
Click
"here"
for a video of the talk.
They also got two nice CEO-style pictures:
here; and
here.
Students and Postdocs
I have been fortunate to work with a number of great students and postdocs over the years.
Current Students, Postdocs, and Researchers:
Oleg Balabanov (postdoc, UC Berkeley / ICSI, 2023-present)
Kareem Hegazy (postdoc, UC Berkeley / ICSI, 2023-present)
Krti Tallam (postdoc, ICSI, 2023-present)
Wuyang Chen (postdoc, UC Berkeley / ICSI, 2023-present)
Caleb Geniesse (postdoc, LBNL, 2023-present)
Siavash Ameli (postdoc, UC Berkeley / ICSI, 2023-present)
Rasmus Malik Høegh Lindrup (postdoc, UC Berkeley / ICSI, joint with A. Krishnapriyan, 2023-present)
Pu Ren (postdoc, LBNL, 2022-present)
Sen Na (postdoc, UC Berkeley / ICSI, 2021-present)
Junyi Guo (MEng student, UC Berkeley, researcher, ICSI, 2022-present)
Jialin Song (MEng student, UC Berkeley, researcher, ICSI, 2021-present)
Geoffrey Negiar (PhD student, UC Berkeley, joint w. L. El Ghaoui)
Ryan Theisen (PhD student, UC Berkeley)
Former Students, Postdocs, and Visitors
Ilan Naiman (visiting researcher, from Ben-Gurion, 2023)
Omri Azencot (visiting researcher, from Ben-Gurion, 2023)
Yefan Zhou (MEng student, UC Berkeley, researcher, ICSI, 2021-2023; onto Dartmouth)
Parth Nobel (visiting researcher, from Stanford, 2022)
Max Melnichenko (visiting researcher, from Tennessee, 2022 and 2023)
Amir Gholaminejad (postdoc, UC Berkeley, 2017-2023, joint w. K. Keutzer, now res. sci. at UCB/ICSI)
Yaoqing Yang (postdoc, UC Berkeley, 2019-2022, joint w. J. Gonzalez; onto Dartmouth)
Feynman Liang (student, UC Berkeley, PhD, 2022; onto Meta)
Konstantin Rusch (visiting researcher, from ETH, 2022)
Riley Murray (postdoc, UC Berkeley / ICSI, 2021-2023; now staff at LBNL/ICSI)
Shashank Subramanian (postdoc, UC Berkeley / LBNL, 2021-2022; now staff at LBNL)
Aditi Krishnapriyan (postdoc, UC Berkeley / LBNL, 2020-2022; now faculty at UC Berkeley)
Liam Hodgkinson (postdoc, UC Berkeley, 2020-2022; now faculty at Melbourne)
Zhenyu Liao (postdoc, UC Berkeley, 2020-2021; now faculty at Huazhong)
Jianfei Chen (postdoc, UC Berkeley, 2019-2021, joint w. J. Gonzalez; now faculty at Tsinghua)
Michal Derezinski (postdoc, UC Berkeley, 2018-2021; now faculty at Michigan)
Rajiv Khanna (postdoc, UC Berkeley, 2018-2021; now faculty at Purdue)
N. Benjamin Erichson (postdoc, UC Berkeley, 2018-2021; onto Pittsburgh; now res. sci. at LBNL/ICSI)
Vipul Gupta (student, UC Berkeley, PhD, 2021, joint w. K. Ramchandran; onto Bytedance)
Zhewei Yao (student, UC Berkeley, PhD, 2021; onto Microsoft)
Francisco Utrera (MEng student, UC Berkeley, researcher, ICSI, 2019-2021; onto Pittsburgh, onto startup)
Dominic Carrano (student, UC Berkeley, MEng capstone, 2020-2021)
Charles Yang (student, UC Berkeley, MEng capstone, 2020-2021)
Evan Kravitz (student, UC Berkeley, MEng capstone, 2019-2020; onto Amazon)
Janelle Lines (student, UC Berkeley, MEng capstone, 2019-2020)
Qixuan Wu (student, UC Berkeley, undergraduate, 2019-2020)
Vanessa Lin (student, UC Berkeley, undergraduate, 2019-2020)
Naijing Zhang (student, UC Berkeley, MEng capstone, 2019-2020; onto Google Youtube)
Yifan Bai (student, UC Berkeley, MEng capstone, 2019-2020; onto Amazon)
Xinran Rui (student, UC Berkeley, MEng capstone, 2019-2020; onto TensTorrent)
Sarvagya Singh (student, UC Berkeley, MEng capstone, 2019-2020; onto Forward Health)
Xing Jin (student, UC Berkeley, MEng capstone, 2019-2020; onto Volume Hedge Fund)
Vyom Kavishwar (student, UC Berkeley, MEng capstone, 2019-2020; onto Hive)
Yaohui Cai (visiting researcher, UC Berkeley, 2019-2020, joint w. K. Keutzer; onto Cornell for PhD)
Sheng Shen (visiting researcher, 2019-2020, joint w. K. Keutzer; onto UCB for PhD)
Daiyaan Arfeen (student, UC Berkeley, undergraduate, 2019-2020, joint w. K. Keutzer; onto CMU for PhD)
Linjian Ma (student, UC Berkeley, MEng capstone, 2018-2019; onto UIUC for PhD)
Gabe Montague (student, UC Berkeley, MEng capstone, 2018-2019; onto Co-Founder of Park and Pedal)
Jiayu Ye (student, UC Berkeley, MEng capstone, 2018-2019; onto Google)
Kai Rothauge (postdoc, UC Berkeley, 2017-2019; onto a startup)
Shusen Wang (postdoc, UC Berkeley, 2016-2018; now faculty at Stevens Institute)
Danqing Zhang (student, UC Berkeley, PhD, 2018, advised by Alexei Pozdnoukhov; now at Amazon)
Ruoxi Wang (student, Stanford, PhD, 2018, advised by Eric Darve; now at Google)
Peng Xu (student, Stanford, PhD, 2018; now at Amazon)
Aditya Devarakonda (student, UC Berkeley, PhD, 2018, joint w. J. Demmel; now at Hopkins APL)
Norman Mu (visiting researcher, 2018, joint w. K. Keutzer; onto UCB for PhD)
Stefan Ivo Palombo (student, UC Berkeley, undergraduate thesis, 2017-2018)
Noah Golmant (student, UC Berkeley, undergraduate, 2017-2018, joint w. J. Gonzalez; now at OpenAI)
Kimon Fountoulakis (postdoc, UC Berkeley, 2015-2018; now faculty at Waterloo)
Julian Shun (postdoc, UC Berkeley, 2015-2017; now faculty at MIT)
Fred Roosta (postdoc, UC Berkeley, 2015-2017; now faculty at Queensland)
Alex Gittens (postdoc, UC Berkeley, 2015-2017; now faculty at RPI)
Evgeniy Faerman (visiting researcher, UC Berkeley, 2016-2017)
Felix Borutta (visiting researcher, UC Berkeley, 2016-2017)
Liping Jing (visiting researcher, ICSI, 2016-2017, joint w. G. Friedland)
Di Wang (student, UC Berkeley, PhD, 2017, joint w. S. Rao; now at Google)
Simon Du (student, UC Berkeley, undergraduate thesis, 2014-2015, joint w. M. Gu; now at Washington)
Jiyan Yang (student, Stanford, PhD, 2016; now at Facebook)
Aaron Adcock (student, Stanford, PhD, 2014; now at Facebook)
Xiangrui Meng (student, Stanford, PhD, 2014; at LinkedIn, then Databricks)
Lorenzo Orecchia (intern, 2008, Yahoo; faculty at Boston, then Chicago)
Jure Leskovec (intern, Yahoo, 2007; now faculty at Stanford)
Hari Narayanan (intern, Yahoo, 2007; faculty at Washington, then TIFR, Mumbai)
Jeff Phillips (intern, Yahoo, 2007; now faculty at Utah)
Lek-Heng Lim (intern, Yahoo, 2006; now faculty at Chicago)
Boulos Harb (intern, Yahoo, 2006; now at Google)
James Campbell (student, Yale, undergraduate thesis, 2005)
