Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks

(A Tutorial at KDD 2019)

August 4, 2019.

1:00PM - 5:00PM, Summit 8, Ground Level, Egan

To Prepare for the Tutorial: Part of the tutorial will cover the weightwatcher tool, which we have developed and which can be used to reproduce and extend our results. In preparation for the tutorial, you should "pip install weightwatcher" and/or go to our WeightWatcher repo for more information, demo notebooks, etc.
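As a quick check that the installation worked, here is a minimal sketch of running weightwatcher on a pretrained model, following the usage pattern documented in the WeightWatcher repo. The choice of model (torchvision's vgg11) is ours and purely illustrative, and the exact call signatures may differ slightly across weightwatcher versions.

    # Minimal sketch: analyze a pretrained model with weightwatcher.
    # The model choice (torchvision vgg11) is illustrative; any supported
    # Keras or PyTorch model can be passed in. Call signatures follow the
    # repo's documented usage and may vary slightly between versions.
    import weightwatcher as ww
    import torchvision.models as models

    model = models.vgg11(pretrained=True)

    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()              # per-layer spectral metrics
    summary = watcher.get_summary(details)   # aggregate metrics across layers

    print(summary)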

Link to the slides: click here.



Charles Martin

Charles Martin holds a PhD in Theoretical Chemistry from the University of Chicago. He was then an NSF Postdoctoral Fellow and worked in a Theoretical Physics group at UIUC that studied the statistical mechanics of Neural Networks. He currently owns and operates Calculation Consulting, a boutique consultancy specializing in ML and AI that supports clients doing applied research in AI. He maintains a well-recognized blog on practical ML theory, and he has led and performed the work to date on Implicit and Heavy-Tailed Self-Regularization in Deep Learning.

Michael Mahoney

Michael Mahoney is at ICSI and the Department of Statistics at UC Berkeley. He works on algorithmic and statistical aspects of modern large-scale data analysis. He is a leader in Randomized Numerical Linear Algebra; he led the largest empirical evaluation to date of community structure in social and information networks; he has developed implicit regularization methods and scalable optimization methods for convex and non-convex problems; and he has applied these methods, together with complementary RMT methods, to DNN problems.


The tutorial will review recent developments in using techniques from statistical mechanics to understand the properties of modern deep neural networks. Although there have long been connections between statistical mechanics and neural networks, those connections have withered in recent decades. In light of the failure of traditional statistical learning theory and stochastic optimization theory to describe, even qualitatively, many properties of production-quality deep neural network models, researchers have revisited ideas from the statistical mechanics of neural networks. The tutorial will provide an overview of the area; it will go into detail on how connections with heavy-tailed random matrix theory can lead to a practical phenomenological theory for large-scale deep neural networks; and it will describe future directions.
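To make the heavy-tailed random matrix theory connection concrete, the sketch below is an illustrative example of ours (not part of the tutorial materials): it computes the empirical spectral density (ESD) of a layer weight matrix W, i.e. the eigenvalues of the correlation matrix X = W^T W / N, and estimates a power-law tail exponent alpha with a simple maximum-likelihood (Hill-type) fit. The papers use more careful fitting (e.g. the powerlaw package), but the basic pipeline is the same.

    import numpy as np

    def esd_tail_exponent(W, k_frac=0.5):
        """Estimate a power-law tail exponent alpha for the ESD of W.

        W      : 2-D weight matrix (e.g. a fully connected layer).
        k_frac : fraction of the largest eigenvalues treated as the tail
                 (a modeling choice, not a prescribed value).
        Returns: (eigenvalues, alpha_hat) using the standard continuous
                 power-law MLE on the tail eigenvalues.
        """
        N, M = W.shape
        X = W.T @ W / N                      # correlation matrix; its eigenvalues form the ESD
        eigs = np.sort(np.linalg.eigvalsh(X))[::-1]
        k = max(2, int(k_frac * len(eigs)))
        tail = eigs[:k]
        lam_min = tail[-1]
        # MLE for a Pareto tail: alpha_hat = 1 + k / sum(log(lambda_i / lambda_min))
        alpha_hat = 1.0 + k / np.sum(np.log(tail / lam_min))
        return eigs, alpha_hat

    # Example on a random heavy-tailed matrix (a stand-in for a trained layer):
    rng = np.random.default_rng(0)
    W = rng.standard_t(df=2.5, size=(1000, 500))
    eigs, alpha = esd_tail_exponent(W)
    print(f"estimated tail exponent alpha ~ {alpha:.2f}")

Which eigenvalues count as the "tail" is itself a modeling choice; more careful analyses select the cutoff automatically rather than fixing a fraction.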

More details can be found here.


Several relevant papers, including code to reproduce the results:

  • Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks,
    C. H. Martin and M. W. Mahoney,
    Technical Report, Preprint: arXiv:1901.08278 (2019) (arXiv), (code).
  • Traditional and Heavy-Tailed Self Regularization in Neural Network Models,
    C. H. Martin and M. W. Mahoney,
    Technical Report, Preprint: arXiv:1901.08276 (2019) (arXiv), (code).
    Accepted for publication, Proc. ICML 2019.
  • Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning,
    C. H. Martin and M. W. Mahoney,
    Technical Report, Preprint: arXiv:1810.01075 (2018) (arXiv), (code).
  • Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior,
    C. H. Martin and M. W. Mahoney,
    Technical Report, Preprint: arXiv:1710.09553 (2017) (arXiv).

Several related presentations:

  • Charles talking at LBNL, June 2018.
  • Michael talking at the Simons Institute Big Data RandNLA meeting, Sept 2018.
  • Charles talking at ICSI, December 2018.
  • Michael talking at SF Bay Area DM-SIG Meeting, February 2019.