
I am at ICSI and the Department of Statistics at UC Berkeley, and
I am also in the RISELab
(formerly the AMPLab)
in the Department of EECS.
Email: mmahoney is the username, and stat dot berkeley dot edu is the domain name

Most of my work focuses on the theory and practice of what is now called big data, although I was doing it back when it was just massive, and prior to that when it was just large.
On the theory side, we develop algorithmic and statistical methods for matrix, graph, regression, optimization, and related problems.
On the implementation side, we provide implementations (e.g., on single machine, distributed data system, and supercomputer environments) of a range of matrix, graph, and optimization algorithms.
On the applied side, we apply these methods to problems in internet and social media analysis, social network analysis, genetics, mass spectrometry imaging, astronomy, climate science, and a range of other scientific applications.
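As a minimal, hypothetical illustration of the matrix/regression side of this work, here is a sketch-and-solve least-squares example in the spirit of randomized linear algebra (RandNLA). The Gaussian sketching matrix, problem sizes, and sketch dimension are illustrative choices for this sketch, not the parameters of any specific published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall least-squares problem: minimize ||Ax - b||_2 over x.
n, d = 10_000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Exact solution using all n rows.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch: compress the n rows down to s rows with a random Gaussian
# projection, then solve the much smaller s-by-d problem.
s = 500
S = rng.standard_normal((s, n)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution's residual on the ORIGINAL problem is close
# to the optimal residual, at a fraction of the cost for large n.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(res_sketch / res_exact)  # close to 1
```

The design point this toy example makes is that, for tall problems (n much larger than d), a random projection preserves the least-squares geometry well enough that solving the compressed problem gives a near-optimal answer.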
For more information, you can also see my CV;
and here is a headshot and bio.
Recent News
I was interviewed by Ben Lorica on the O'Reilly Data Show podcast.
Topics included understanding deep neural networks and developing a practical theory for deep learning, our new Hessian AWare Quantization (HAWQ) framework for addressing problems pertaining to model size and inference speed/power, and how these relate to challenges at the foundations of data analysis.
I am the Director of the new UC Berkeley FODA (Foundations of Data Analysis) Institute grant, which is part of the NSF TRIPODS program, to deepen the theoretical foundations of data science in a new transdisciplinary institute.
Also involved are co-PIs Bin Yu, Fernando Perez, Michael Jordan, and Peter Bartlett.
Here is more information about it:

The main FODA web page.

The UC Berkeley press release about FODA.

The summary article from Datanami about it.

The original NSF announcement.

The report we wrote from the planning workshop.
Our edited volume on The Mathematics of Data, based on lectures from the 2016 PCMI Summer School, is available.
I talked about our recent results on "Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks" at the Simons Institute 2018 Big Data RandNLA meeting, Sept 2018.
Related talks were given by collaborator Charles Martin:
at LBNL, June 2018;
at ICSI, December 2018; and
at Stanford, January 2020.
More recently, I gave a longer and updated version
at SF Bay Area DMSIG Meeting, February 2019.
I talked about "Alchemist: An Apache Spark <=> MPI Interface" at the XLDB meeting, May 2018.
A related talk was given by postdoc Kai Rothauge at the Spark+AI meeting, June 2018.
Linear Algebra for Data Science:
A four-unit introductory class on linear algebra, from the perspective of probability and related topics in the mathematics of data.
It completely redesigns the usual presentation of linear algebra from the ground up to be more appropriate for data applications.
I'll teach this for the first time in Spring 2018.
(This is expanded from a two-unit "connector" class on mathematics for matrices/graphs/data that I designed and taught during
Spring 2017 and Spring 2016.)
(I have a detailed but work-in-progress set of course notes for this class.
I will post them at an appropriate point; but, in the meantime, email me if you are interested in them.)
Our technical report describes the RISELab's research challenges.
Video synopsis of our Communications of the ACM article on "RandNLA: Randomized Numerical Linear Algebra."
My postdoc Julian Shun has been awarded the ACM's Doctoral Dissertation Award for his 2015 CMU doctoral thesis "Shared-Memory Parallelism Can Be Simple, Fast, and Scalable".
Our AMPLab-Cray-NERSC/LBL collaboration, to implement randomized linear algebra (RLA) algorithms in Spark on HPC platforms for LBL's scientific data applications, is described on HPCwire.
The role of RLA in HPC, as described in HPCwire.
I gave a talk on "BIG Biomedicine and the Foundations of BIG Data Analysis"
at Stanford Medical School, May 23, 2014,
at their "Big Data in BioMedicine" conference.
Click here for a video of the talk.
They also got two nice CEO-style pictures:
here; and
here.
Students and Postdocs
I have been fortunate to work with a number of great students and postdocs over the years.
FODA/TRIPODS Postdocs:
Michal Derezinski
Jelena Diakonikolas
Amir Gholaminejad
Wooseok Ha
Rajiv Khanna
Francois Lanusse
Yian Ma
Yan Shuo Tan
Current Students and Postdocs:
Former Students and Postdocs
Kai Rothauge (postdoc, UC Berkeley, now at a startup)
Shusen Wang (postdoc, UC Berkeley, now faculty at Stevens Institute)
Peng Xu (student, Stanford, now at Amazon)
Aditya Devarakonda (student, UC Berkeley PhD, now at Hopkins APL)
Julian Shun (postdoc, UC Berkeley, now faculty at MIT)
Kimon Fountoulakis (postdoc, UC Berkeley, now faculty at Waterloo)
Fred Roosta (postdoc, UC Berkeley, now faculty at Queensland)
Alex Gittens (postdoc, UC Berkeley; now faculty at RPI)
Di Wang (student, UC Berkeley PhD, 2017; now at Georgia Tech)
Jiyan Yang (student, Stanford PhD, 2016; now at Facebook)
Aaron Adcock (student, Stanford PhD, 2014; now at Facebook)
Xiangrui Meng (student, Stanford PhD, 2014; at LinkedIn, then Databricks)
Lorenzo Orecchia (intern, Yahoo; now faculty at Boston University)
Jure Leskovec (intern, Yahoo; now faculty at Stanford)
Hari Narayanan (intern, Yahoo; faculty at the University of Washington, then TIFR, Mumbai)
Jeff Phillips (intern, Yahoo; now faculty at Utah)
Lek-Heng Lim (intern, Yahoo; now faculty at Chicago)
Boulos Harb (intern, Yahoo; now at Google)
