Michael Mahoney - Presentations

Talks and Presentations

Recent tutorial presentations:

  • Recent and Upcoming Developments in Randomized Numerical Linear Algebra for ML (December 2023, Tutorial at 2023 NeurIPS; v2 at KDD24) (pdf)

  • Practical neural network theory: from statistical mechanics basics to heavy-tailed self regularization to working with state of the art models (Apr 2023) (pdf)

Recent seminar presentations:

  • Foundational Methods for Foundation Models for Scientific Machine Learning (April 2024) (pdf)

  • Model Selection And Ensembling When There Are More Parameters Than Data (November 2023) (pdf)

  • Foundations for scientific machine learning (May 2023) (pdf)

  • Multiplicative noise and heavy tails in stochastic optimization and machine learning (May 2023) (pdf)

  • Algorithmic Methods, Backdoors, and Model Robustness (Apr 2023) (pdf)

  • Putting Randomness into LAPACK and Next Generation RandNLA Theory (Mar 2023) (pdf)

  • Scientific machine learning: methods to bridge scientific spatial and temporal modeling with machine learning (Jun 2022) (pdf)

  • Practice, Theory, and Theorems for Random Matrix Theory in Modern Machine Learning (Jun 2022) (pdf)

  • Building foundations for scientific machine learning at scale (Mar 2022) (pdf)

  • Continuous Network Models for Sequential Predictions (Jan 2022) (pdf)

  • Toward combining principled scientific models and principled machine learning models (Nov 2021) (pdf)

  • Column Subset Selection: TCS and NLA ("at" the CMC Seminar Series, Nov 2021) (pdf)

  • Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization ("at" the NeurIPS 2021 Optimization Workshop) (pdf)

  • Least Squares in RandNLA ("at" the CMC Seminar Series, Sept 2021) (pdf)

  • Incorporating second order ideas into first class machine learning methods ("at" the MOPTA Meeting, Aug 2021) (pdf)

  • Overcoming Inversion Bias in Distributed Newton's Method (May 2021) (pdf)

  • Practical Theory and Neural Network Models ("at" TOPML Workshop and elsewhere, April 2021) (pdf)

  • ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning ("at" MIT and elsewhere, September 2020) (pdf)

  • Dynamical systems and machine learning: combining in a principled way data-driven models and domain-driven models (September 2020) (pdf)

  • Continuous-in-Depth Neural Networks (August 2020) (pdf)

  • ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (older, August 2020) (pdf)

  • Determinantal Point Processes and Randomized Numerical Linear Algebra (April 2020) (pdf)

Older tutorial presentations:

  • Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks (at ACM-SIGKDD 2019) (pdf)

  • Sampling for Linear Algebra, Statistics, and Optimization (Aug 2018, at Simons Institute 2018 Big Data Bootcamp) (pdf)

  • RandNLA: Randomization in Numerical Linear Algebra (5.5 hr version at UW Madison Summer School, July 2018) (pdf)

  • Randomization in Numerical Linear Algebra: Theory and Practice (2.0 hr version at SIAM ALA Meeting, October 2015) (pdf)

  • Past, Present and Future of Randomized Numerical Linear Algebra (3.0 hr version at Simons Institute 2013 Big Data Bootcamp) (Part I: pdf, ppt and Part II: pdf, ppt)

  • Theory (and some practice) of Randomized Algorithms for Matrices and Data (tutorial from FOCS 2012 Workshop) (pdf, ppt)

  • Geometric Tools for Identifying Structure in Large Social and Information Networks (1.5 hr version at SAMSI Opening Workshop 2010, etc.) (pdf, ppt)

  • Geometric Tools for Identifying Structure in Large Social and Information Networks (2 hr version at ICASSP 2011, etc.) (pdf, ppt)

  • Geometric Tools for Identifying Structure in Large Social and Information Networks (3 hr version at ICML 2010 and KDD 2010, etc.) (pdf, ppt) (The pdf file in four pieces: here, here, here, and here.)

  • Randomized Algorithms for Matrices and Massive Data Sets (at SIAM-SDM06 2006 and VLDB 2006) (ppt)

  • Randomized Algorithms for Matrices and Massive Data Sets (at ACM-SIGKDD 2005) (ppt)

Older seminar presentations:

  • Using dynamical systems ideas to combine in a principled way data-driven models and domain-driven models (older, April 2020) (pdf)

  • Exact expressions for double descent and implicit regularization via surrogate random design (December 2019) (pdf)

  • Making the Deep Learning Revolution Practical Through Second Order Methods (at MIT, October 2019) (pdf (big=40MB))

  • Newton-MR: Newton's Method Without Smoothness or Convexity (October 2019) (pdf)

  • Minimax and Bayesian experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression (at TTIC, Sept 2019) (pdf)

  • Why Deep Learning Works: Traditional and Heavy-Tailed Implicit Self-Regularization in Deep Neural Networks (at the ICML 2019 Workshop on Theoretical Physics for Deep Learning, June 2019) (pdf)

  • Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks (at the SF Bay Area DM-SIG Meeting, February 2019) (pdf)

  • Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks (Sept 2018, at Simons Institute 2018 Big Data RandNLA meeting) (pdf)

  • Large Scale Training of Neural Networks (Sept/Nov 2018) (pdf)

  • Alchemist: An Apache Spark <=> MPI Interface (June 2018) (pdf)

  • Biomedicine & the Foundations of Data? (May 2018) (pdf)

  • Scientific Machine Learning with Alchemist (An Apache Spark <=> MPI Interface) and Beyond (Apr 2018) (pdf)

  • Numerically-intensive Machine Learning at Scale (Fall 2017) (pdf)

  • Second-order Machine Learning (Fall 2017) (pdf)

  • UC Berkeley's FODA Institute: Foundations of Data Analysis (NSF TRIPODS Kickoff, Oct 2017) (pdf)

  • Second-order Machine Learning (Pre-fall 2017) (pdf)

  • Local graph analytics: beyond characterizing community structure (Spring 2017) (pdf)

  • Terabyte-scale Computational Statistics (talk Fall 2016) (pdf)

  • Scientific Matrix Factorizations in Spark at Scale: Cross-platform performance, scaling, and comparisons with C+MPI (talk at 2016 Dato Data Science Summit and elsewhere) (pdf)

  • Optimization Algorithms for Analyzing Large Datasets (talk at 2016 PCMI Summer School) (pdf)

  • Foundations of Data Science (talk at NSF pre-TRIPODS workshop/meeting, Apr 2016) (pdf)

  • Sub-Sampled Newton Methods (talk at ITA 2016 and elsewhere) (pdf)

  • Challenges in Multiresolution Methods for Graph-based Learning (talk, 3 of 3, from NIPS15 Workshops) (pdf)

  • Using Local Spectral Methods in Theory and in Practice (talk, 2 of 3, from NIPS15 Workshops) (pdf)

  • Column Subset Selection on Terabyte-sized Scientific Data (talk, 1 of 3, from NIPS15 Workshops) (pdf)

  • Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra (DIMACS, Aug 2015) (pdf)

  • Overview of RandNLA: Randomized Numerical Linear Algebra (pdf)

  • Tree-like structure in social graphs (pdf (big=28MB), ppt (big=48MB))

  • Eigenvector localization, implicit regularization, and algorithmic anti-differentiation for large-scale graphs and network data (pdf, ppt)

  • Locally-biased and semi-supervised eigenvectors (talk from MMDS 2014) (pdf, ppt)

  • Implicit regularization in sublinear approximation algorithms (pdf, ppt)

  • BIG Biomedicine and the Foundations of BIG Data Analysis (at Big Data in Biomedicine at Stanford's Medical School, 5/23/14) (pdf, ppt)

  • Revisiting the Nystrom Method for Improved Large-Scale Machine Learning (pdf, ppt)

  • Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments (version from Simons Big Data Workshop II) (pdf)

  • Input-sparsity Time Algorithms for Embeddings and Regression Problems (talk from Simons Big Data Workshop I) (pdf)

  • Randomized Regression in Parallel and Distributed Environments (talk from GraphLab 2013) (pdf)

  • Extracting insight from large networks: implications of small-scale and large-scale structure (pdf, ppt)

  • Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments (version from MMDS 2012) (pdf)

  • Sensors, networks, and massive data (pdf, ppt)

  • Randomized Algorithms for Matrices and Data (pdf, ppt)

  • Approximate computation and implicit regularization in large-scale data analysis (PODS version) (pdf, ppt)

  • Approximate computation and implicit regularization in large-scale data analysis (Stats version 1) (pdf, ppt)

  • Approximate computation and implicit regularization in large-scale data analysis (Short version) (pdf, ppt)

  • Looking for clusters in your data ... in theory and in practice (pdf, ppt)

  • Fast Approximation of Matrix Coherence and Statistical Leverage (pdf, ppt)

  • Implementing regularization implicitly via approximate eigenvector computation (pdf, ppt)

  • Linear Algebra and Machine Learning of Large Informatics Graphs (pdf, ppt)

  • Geometric Network Analysis Tools (talk from MMDS 2010) (pdf, ppt)

  • Algorithmic and Statistical Perspectives on Large-Scale Data Analysis (pdf, ppt)

  • Community structure in large social and information networks (newer) (pdf, ppt)

  • Statistical leverage and improved matrix algorithms (newer and long) (pdf, ppt)

  • Approximation Algorithms as Experimental Probes of Informatics Graphs (pdf, ppt)

  • Community structure in large social and information networks (talk from MMDS 2008) (pdf, ppt)

  • Community structure in large social and information networks (older) (pdf, ppt)

  • Statistical leverage and improved matrix algorithms (older and short) (pdf, ppt)

  • Sampling algorithms and core-sets for Lp regression and applications (pdf, ppt)

  • CUR Matrix Decompositions for Improved Data Analysis (talk from MMDS 2006) (pdf, ppt)

  • A Relative-Error CUR Decomposition for Matrices and Its Data Applications (pdf, ppt)

  • Sampling Algorithms for L2 Regression and Applications (talk from SODA 2006) (pdf, ppt)

  • Approximating a Gram Matrix for Improved Kernel-Based Learning (talk from COLT 2005) (ps, pdf)

  • Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis (newer) (pdf, ppt)

  • Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis (older) (pdf)

  • CUR Matrix Decomposition with Applications to Algorithm Design and Massive Data Set Analysis (pdf)

  • Fast Monte Carlo Algorithms for Massive Data Sets and Approximating Max-Cut (ps, pdf)

The TIP5P Water talk:

  • The Computational Statistical Mechanics of Simple Models of Liquid Water (pdf)

Videos of Talks and Presentations