Michael Mahoney - Presentations

Talks and Presentations

Recent tutorial presentations:

Random Matrix Theory and Modern Machine Learning (June 2025, RMT+RNLA at UW) (pdf)
Recent and Upcoming Developments in Randomized Numerical Linear Algebra for ML (December 2023, Tutorial at 2023 NeurIPS; v2 at KDD24) (pdf)
Practical neural network theory: from statistical mechanics basics to heavy-tailed self regularization to working with state of the art models (Apr 2023) (pdf)

Recent seminar presentations:

Does scientific computing have anything to offer scientific foundation models? (May 2025) (pdf)
Foundational Methods for Foundation Models for Scientific Machine Learning (v3: April 2025) (pdf)
Foundational Methods for Foundation Models for Scientific Machine Learning (v2: December 2024) (pdf)
Visualizing Loss Functions as Topological Landscape Profiles (December 2024) (pdf)
Foundational Methods for Foundation Models for Scientific Machine Learning (v1: April 2024) (pdf)
Model Selection And Ensembling When There Are More Parameters Than Data (November 2023) (pdf)
Foundations for scientific machine learning (May 2023) (pdf)
Multiplicative noise and heavy tails in stochastic optimization and machine learning (May 2023) (pdf)
Algorithmic Methods, Backdoors, and Model Robustness (Apr 2023) (pdf)
Putting Randomness into LAPACK and Next Generation RandNLA Theory (Mar 2023) (pdf)
Scientific machine learning: methods to bridge scientific spatial and temporal modeling with machine learning (Jun 2022) (pdf)
Practice, Theory, and Theorems for Random Matrix Theory in Modern Machine Learning (Jun 2022) (pdf)
Building foundations for scientific machine learning at scale (Mar 2022) (pdf)
Continuous Network Models for Sequential Predictions (Jan 2022) (pdf)
Toward combining principled scientific models and principled machine learning models (Nov 2021) (pdf)
Column Subset Selection: TCS and NLA ("at" the CMC Seminar Series, Nov 2021) (pdf)
Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization ("at" the NeurIPS 2021 Optimization Workshop) (pdf)
Least Squares in RandNLA ("at" the CMC Seminar Series, Sept 2021) (pdf)
Incorporating second order ideas into first class machine learning methods ("at" the MOPTA Meeting, Aug 2021) (pdf)
Overcoming Inversion Bias in Distributed Newton's Method (May 2021) (pdf)
Practical Theory and Neural Network Models ("at" TOPML Workshop and elsewhere, April 2021) (pdf)
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning ("at" MIT and elsewhere, September 2020) (pdf)
Dynamical systems and machine learning: combining in a principled way data-driven models and domain-driven models (September 2020) (pdf)
Continuous-in-Depth Neural Networks (August 2020) (pdf)
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (older, August 2020) (pdf)
Determinantal Point Processes and Randomized Numerical Linear Algebra (April 2020) (pdf)

Older tutorial presentations:

Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks (at ACM-SIGKDD 2019) (pdf)
Sampling for Linear Algebra, Statistics, and Optimization (Aug 2018, at Simons' Institute 2018 Big Data Bootcamp) (pdf)
RandNLA: Randomization in Numerical Linear Algebra (5.5 hr version at UW Madison Summer School, July 2018) (pdf)
Randomization in Numerical Linear Algebra: Theory and Practice (2.0 hr version at SIAM ALA Meeting, October 2015) (pdf)
Past, Present and Future of Randomized Numerical Linear Algebra: (3.0 hr version at Simons' Institute 2013 Big Data Bootcamp) (Part I: pdf, ppt and Part II: pdf, ppt)
Theory (and some practice) of Randomized Algorithms for Matrices and Data (tutorial from FOCS 2012 Workshop) (pdf, ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks (1.5 hr version at SAMSI Opening Workshop 2010, etc.) (pdf, ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks (2 hr version at ICASSP 2011, etc.) (pdf, ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks (3 hr version at ICML 2010 and KDD 2010, etc.) (pdf, ppt) (The pdf file in four pieces: here, here, here, and here.)
Randomized Algorithms for Matrices and Massive Data Sets (at SIAM-SDM06 2006 and VLDB 2006) (ppt)
Randomized Algorithms for Matrices and Massive Data Sets (at ACM-SIGKDD 2005) (ppt)

Older seminar presentations:

Using dynamical systems ideas to combine in a principled way data-driven models and domain-driven models (older, April 2020) (pdf)
Exact expressions for double descent and implicit regularization via surrogate random design (December 2019) (pdf)
Making the Deep Learning Revolution Practical Through Second Order Methods (at MIT, October 2019) (pdf (big=40MB))
Newton-MR: Newton's Method Without Smoothness or Convexity (October 2019) (pdf)
Minimax and Bayesian experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression (at TTIC, Sept 2019) (pdf)
Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks (at the ICML 2019 Workshop on Theoretical Physics for Deep Learning, June 2019) (pdf)
Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks (at the SF Bay Area DM-SIG Meeting, February 2019) (pdf)
Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks (Sept 2018, at Simons' Institute 2018 Big Data RandNLA meeting) (pdf)
Large Scale Training of Neural Networks (Sept/Nov 2018) (pdf)
Alchemist: An Apache Spark <=> MPI Interface (June 2018) (pdf)
Biomedicine & the Foundations of Data? (May 2018) (pdf)
Scientific Machine Learning with Alchemist (An Apache Spark <=> MPI Interface) and Beyond (Apr 2018) (pdf)
Numerically-intensive Machine Learning at Scale (Fall 2017) (pdf)
Second-order Machine Learning (Fall 2017) (pdf)
UC Berkeley's FODA Institute: Foundations of Data Analysis (NSF TRIPODS Kickoff, Oct 2017) (pdf)
Second-order Machine Learning (Pre-fall 2017) (pdf)
Local graph analytics: beyond characterizing community structure (Spring 2017) (pdf)
Terabyte-scale Computational Statistics (talk Fall 2016) (pdf)
Scientific Matrix Factorizations in Spark at Scale: Cross-platform performance, scaling, and comparisons with C+MPI (talk at 2016 Dato Data Science Summit and elsewhere) (pdf)
Optimization Algorithms for Analyzing Large Datasets (talk at 2016 PCMI Summer School) (pdf)
Foundations of Data Science (talk at NSF pre-TRIPODS workshop/meeting, Apr 2016) (pdf)
Sub-Sampled Newton Methods (talk at ITA 2016 and elsewhere) (pdf)
Challenges in Multiresolution Methods for Graph-based Learning (talk, 3of3, from NIPS15 Workshops) (pdf)
Using Local Spectral Methods in Theory and in Practice (talk, 2of3, from NIPS15 Workshops) (pdf)
Column Subset Selection on Terabyte-sized Scientific Data (talk, 1of3, from NIPS15 Workshops) (pdf)
Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra (DIMACS, Aug 2015) (pdf)
Overview of RandNLA: Randomized Numerical Linear Algebra (pdf)
Tree-like structure in social graphs (pdf (big=28MB), ppt (big=48MB))
Eigenvector localization, implicit regularization, and algorithmic anti-differentiation for large-scale graphs and network data (pdf, ppt)
Locally-biased and semi-supervised eigenvectors (talk from MMDS 2014) (pdf, ppt)
Implicit regularization in sublinear approximation algorithms (pdf, ppt)
BIG Biomedicine and the Foundations of BIG Data Analysis (at Big Data in Biomedicine at Stanford's Medical School, 5/23/14) (pdf, ppt)
Revisiting the Nystrom Method for Improved Large-Scale Machine Learning (pdf, ppt)
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments (version from Simons Big Data Workshop II) (pdf)
Input-sparsity Time Algorithms for Embeddings and Regression Problems (talk from Simons Big Data Workshop I) (pdf)
Randomized Regression in Parallel and Distributed Environments (talk from GraphLab 2013) (pdf)
Extracting insight from large networks: implications of small-scale and large-scale structure (pdf, ppt)
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments (version from MMDS 2012) (pdf)
Sensors, networks, and massive data (pdf, ppt)
Randomized Algorithms for Matrices and Data (pdf, ppt)
Approximate computation and implicit regularization in large-scale data analysis (PODS vsn) (pdf, ppt)
Approximate computation and implicit regularization in large-scale data analysis (Stats vrsn1) (pdf, ppt)
Approximate computation and implicit regularization in large-scale data analysis (Short vrsn) (pdf, ppt)
Looking for clusters in your data ... in theory and in practice (pdf, ppt)
Fast Approximation of Matrix Coherence and Statistical Leverage (pdf, ppt)
Implementing regularization implicitly via approximate eigenvector computation (pdf, ppt)
Linear Algebra and Machine Learning of Large Informatics Graphs (pdf, ppt)
Geometric Network Analysis Tools (talk from MMDS 2010) (pdf, ppt)
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis (pdf, ppt)
Community structure in large social and information networks (newer) (pdf, ppt)
Statistical leverage and improved matrix algorithms (newer and long) (pdf, ppt)
Approximation Algorithms as Experimental Probes of Informatics Graphs (pdf, ppt)
Community structure in large social and information networks (talk from MMDS 2008) (pdf, ppt)
Community structure in large social and information networks (older) (pdf, ppt)
Statistical leverage and improved matrix algorithms (older and short) (pdf, ppt)
Sampling algorithms and core-sets for Lp regression and applications (pdf, ppt)
CUR Matrix Decompositions for Improved Data Analysis (talk from MMDS 2006) (pdf, ppt)
A Relative-Error CUR Decomposition for Matrices and Its Data Applications (pdf, ppt)
Sampling Algorithms for L2 Regression and Applications (talk from SODA 2006) (pdf, ppt)
Approximating a Gram Matrix for Improved Kernel-Based Learning (talk from COLT 2005) (ps, pdf)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis (newer) (pdf, ppt)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis (older) (pdf)
CUR Matrix Decomposition with Applications to Algorithm Design and Massive Data Set Analysis (pdf)
Fast Monte Carlo Algorithms for Massive Data Sets and Approximating Max-Cut (ps, pdf)

The TIP5P Water talk:

The Computational Statistical Mechanics of Simple Models of Liquid Water (pdf)

Videos of Talks and Presentations

"Foundational Methods for Foundation Models for Scientific Machine Learning"
(v3: at UCSD, April 2025).
"Practice, Theory, and Theorems for Random Matrix Theory in Modern Machine Learning"
(at DIMACS, June 2024).
"Foundational Methods for Foundation Models for Scientific Machine Learning"
(v1: at Michigan, April 2024).
"Recent and Upcoming Developments in Randomized Numerical Linear Algebra for ML"
(Tutorial at 2023 NeurIPS, December 2023)
"Practice, Theory, and Theorems for Random Matrix Theory in Modern Machine Learning"
("at" RMT+NLA-UW, June 2022).
"Continuous Network Models for Sequential Predictions"
("at" JHU MIDS, January 2021).
"Toward combining principled scientific models and principled machine learning models"
("at" LLNL, November 2021).
"Practical Theory and Neural Network Models"
("at" IARAI, October 2021).
"Practical Theory and Neural Network Models"
("at" TOPML, April 2021).
"ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning"
("at" MIT (and elsewhere), September 2020).
"Dynamical systems and machine learning: combining in a principled way data-driven models and domain-driven models"
("at" NJIT (and elsewhere), September 2020).
"Continuous-in-Depth Neural Networks"
("at" Fields Institute, August 2020).
"Determinantal Point Processes and Numerical Randomized Linear Algebra"
("at" Tel Aviv, April 2020).
"Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks"
(at ACM-SIGKDD, August 2019).
"Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks"
(at the ICML 2019 Workshop on Theoretical Physics for Deep Learning, June 2019).
"Why Deep Learning Works: Heavy-Tailed Random Matrix Theory as an Example of Physics Informed Machine Learning"
(June 2019, at UW's Physics Informed ML Meeting; and also at the Oct 2019 IPAM Interpretable Learning in Physical Sciences Workshop).
"Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks"
(at the SF Bay Area DM-SIG Meeting, February 2019).
"Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks"
(Sept 2018, at Simons' Institute 2018 Big Data RandNLA meeting; and also at the Sept 2019 DIMACS RandNLA Workshop).
"Sampling for Linear Algebra, Statistics, and Optimization I" and "Sampling for Linear Algebra, Statistics, and Optimization II"
(Aug 2018, at Simons' Institute 2018 Big Data Bootcamp).
"Biomedicine & the Foundations of Data?"
(at the BD2K Guide to the Fundamentals of Data Science seminar series).
"Matrix factorizations at scale: a comparison of scientific data analytics on Spark and MPI"
(at the 2016 Dato Data Science Summit).
"Challenges in Multiresolution Methods for Graph-based Learning"
(at the NIPS Workshops, December 2015).
"Using Local Spectral Methods in Theory and in Practice"
(at the NIPS Workshops, December 2015).
"Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra"
(at DIMACS, August 2015).
"Eigenvector localization and implicit regularization for large-scale graphs and networked data"
(at CIMAT, March 2015).
"Locally-biased and semi-supervised eigenvectors"
(at MMDS 2014, June 2014).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at the SF Bay Area DM-SIG Meeting, June 2, 2014).
"BIG Biomedicine and the Foundations of BIG Data Analysis"
(at the Big Data in BioMedicine Conference at Stanford University, May 2014).
"Randomized Matrix Algorithms and Large-scale Scientific Data Analysis"
(at the 2014 ASE Conference, May 31, 2014).
"Eigenvector localization, implicit regularization, and algorithmic anti-differentiation for large-scale graphs and networked data" (or try also here for another version)
(at ICERM and IMA, May and April 2014, respectively).
"Randomized matrix algorithms and large-scale scientific data analysis"
(at the University of Michigan, April 2014).
"Recent Results in Randomized Numerical Linear Algebra" (or click here)
(at the NIPS 2013 Workshops, December 2013).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at the Simons Big Data Workshop II, October 2013).
"Input-sparsity Time Algorithms for Embeddings and Regression Problems"
(at the Simons Big Data Workshop I, September 2013).
"Past, Present and Future of Randomized Numerical Linear Algebra: Part I (PD) and Part II" (MM)
(at the Simons Big Data Bootcamp, September 2013).
"Randomized Regression in Parallel and Distributed Environments"
(at the GraphLab Workshop, July 2013).
"Sensors, networks, and massive data" or click here
(at Kavli FoS, November 2012).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at MMDS 2012, July 2012).
"Extracting insight from large networks: implications of small-scale and large-scale structure"
(at the University of Marlyand, April 2012).
"Fast Approximation of Matrix Coherence and Statistical Leverage"
(at the NIPS Workshops, December 2011).
"Approximate computation and implicit regularization in large-scale data analysis" or click here
(at the Workshop on Beyond Worst-Case Analysis, September 2011).
"Linear Algebra and Machine Learning of Large Informatics Graphs"
(at the NIPS Workshops, December 2010).
"Geometric Tools for Identifying Structure in Large Social and Information Networks"
(90 minute version, tutorial at SAMSI Opening workshop, August 2010).
"Geometric Tools for Graph Mining of Large Social and Information Networks"
(3 hour version, tutorial at KDD 2010, July 2010).
"Community Structure in Large Social and Information Networks"
(at the Newton Institute in Cambridge, June 2010).
"Algorithmic and Statistical Perspectives on Large-Scale Data Analysis"
(at the SF Bay Area DM-SIG Meeting, February 2010).
"Community Structure in Large Social and Information Networks," in avi or mpg (note: slow to download),
(at the 2009 IIT Kanpur Processing Massive Data Sets Workshop, December 2009).
"Statistical Leverage and Improved Matrix Algorithms"
(at the 2009 ICML Workshops, June 2009).
"Sampling Algorithms and Coresets for Lp Regression and Applications," in mpg (note: slow to download),
(at the 2006 IIT Kanpur Data Streams Workshop, December 2006).
"CUR Matrix Decompositions for Improved Data Analysis"
(at Johns Hopkins University, March 2006).
"Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis"
(at the 2005 IPAM Summer School, July 2005).