Spencer Frei

Postdoctoral Fellow
Simons Institute for the Theory of Computing
University of California, Berkeley

Email: frei@berkeley.edu

I am a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley, hosted by Peter Bartlett and Bin Yu as part of the NSF/Simons Collaboration on the Theoretical Foundations of Deep Learning. My research interests are in machine learning, statistics, and optimization, with a particular focus on understanding statistical and computational phenomena observed in deep learning.

Before coming to Berkeley, I completed my PhD in Statistics at UCLA under the supervision of Quanquan Gu and Ying Nian Wu. Prior to that, I completed a master's degree in mathematics at the University of British Columbia, Vancouver, where I was supervised by Ed Perkins, and before that an undergraduate degree in mathematics at McGill University.

I am on the 2022-2023 job market.

My latest CV is here (last updated September 2022).


Selected works

Benign overfitting without linearity: neural network classifiers trained by gradient descent for noisy linear data.
Spencer Frei, Niladri S. Chatterji, and Peter L. Bartlett.
COLT 2022.
Paper summary:
Benign overfitting, where a statistical model perfectly fits noisy training data yet still generalizes well, was first observed in neural networks trained by gradient descent. Reconciling this behavior with the long-standing intuition from statistics that overfitting is a hazard to be avoided has become a central task for statisticians and learning theorists.

In this work, we provide a characterization of benign overfitting in two-layer neural networks trained by gradient descent following random initialization. We prove that even when a constant fraction of training data have random labels, neural networks trained by gradient descent can achieve 100% training accuracy and simultaneously generalize near-optimally. In contrast to previous works that require either linear or kernel-based predictors, we characterize a benign overfitting phenomenon in a setting where both the model and learning dynamics are fundamentally nonlinear.

Proxy convexity: a unified framework for the analysis of neural networks trained by gradient descent.
Spencer Frei and Quanquan Gu.
NeurIPS 2021.
Paper summary:
The success of neural networks trained by gradient descent has been surprising from the standpoint of optimization, as the landscapes of neural network objectives are known to be non-convex. Moreover, recent work has shown that there exist many neural networks with simple architectures for which no efficient algorithm can find a global minimum. These hardness results rule out a straightforward application of typical non-convex optimization frameworks, such as those based on the PL inequality, which rely on all local minima being global minima. In this work, we propose a new non-convex optimization framework that accommodates objectives for which the best possible outcome for gradient-based optimization is finding a good local (but not global) minimum. We show that most previous research on provable guarantees for neural networks trained by gradient descent can be unified within our proposed framework.

Random feature amplification: feature learning and generalization in neural networks.
Spencer Frei, Niladri S. Chatterji, and Peter L. Bartlett.
Preprint, arXiv:2202.07626.
Paper summary:
Key to the success of neural networks in practice is that gradient descent training allows the networks to learn useful features of the data they are trained on. However, standard approaches for understanding neural network training are restricted to settings in which such feature learning is impossible, or hold only in the limit of infinite network width. In this work we characterize the feature-learning process in two-layer ReLU networks of finite width in a classification task where the labels are generated by an XOR-like function of the input features. We develop a novel proof technique showing that, at random initialization, most neurons function as random features that are only weakly correlated with useful features, and that the gradient descent dynamics 'amplify' these weak, random features into strong, useful ones.

For a complete list of publications, click the Publications tab above.
2022
* I am giving a talk at the University of Alberta Statistics Department Seminar on October 26th.
* I am giving a talk at the EPFL Fundamentals of Learning and Artificial Intelligence Seminar on September 30th.
* I am a visiting scientist at EPFL in September and October, hosted by Emmanuel Abbe.
* I am giving a talk at the Joint Statistical Meetings about benign overfitting without linearity.
* Benign overfitting without linearity was accepted at COLT 2022.
* I am an organizer for the Deep Learning Theory Summer School and Workshop, to be held this summer at the Simons Institute.
* I will be speaking at the ETH Zurich Data, Algorithms, Combinatorics, and Optimization Seminar on June 7th.
* I will be a keynote speaker at the University of Toronto Statistics Research Day on May 25th.
* I am giving a talk at Harvard University's Probabilitas Seminar on May 6th.
* Two recent works accepted at the Theory of Overparameterized Machine Learning 2022 workshop, including one as a contributed talk.
* I am giving a talk at the Microsoft Research ML Foundations Seminar on April 28th.
* I am giving a talk at the University of British Columbia (Christos Thrampoulidis's group) on April 8th.
* I am giving a talk at Columbia University (Daniel Hsu's group) on April 4th.
* I am giving a talk at Oxford University (Yee Whye Teh's group) on March 23rd.
* I am giving a talk at the NSF/Simons Mathematics of Deep Learning seminar on March 10th.
* I am giving a talk at the Google Algorithms Seminar on March 8th.
* I'm reviewing for the Theory of Overparameterized Machine Learning 2022 workshop.
* Two new preprints with Niladri Chatterji and Peter Bartlett: Benign Overfitting without Linearity and Random Feature Amplification.
* Recent work on sample complexity of a self-training algorithm accepted at AISTATS 2022.

Older news
2021
* I am speaking at the Deep Learning Theory Symposium at the Simons Institute on December 6th.
* My paper on proxy convexity as a framework for neural network optimization was accepted at NeurIPS 2021.
* Two new preprints on arXiv: (1) Proxy convexity: a unified framework for the analysis of neural networks trained by gradient descent, and (2) Self-training converts weak learners to strong learners in mixture models.
* I am reviewing for the ICML 2021 workshop Overparameterization: Pitfalls and Opportunities (ICMLOPPO2021).
* Three recent papers accepted at ICML, including one as a long talk.
* New preprint on provable robustness of adversarial training for learning halfspaces with noise.
* I will be presenting recent work at TOPML2021 as a lightning talk, and at the SoCal ML Symposium as a spotlight talk.
* I'm giving a talk at the ETH Zurich Young Data Science Researcher Seminar on April 16th.
* I'm giving a talk at the Johns Hopkins University Machine Learning Seminar on April 2nd.
* I'm reviewing for the Theory of Overparameterized Machine Learning Workshop.
* I'm giving a talk at the Max Planck Institute for Mathematics in the Sciences (MPI MiS) Machine Learning Seminar on March 11th.
* New preprint showing SGD-trained neural networks of any width generalize in the presence of adversarial label noise.

2020
* New preprint on agnostic learning of halfspaces using gradient descent is now on arXiv.
* My single neuron paper was accepted at NeurIPS 2020.
* I will be attending the IDEAL Special Quarter on the Theory of Deep Learning hosted by TTIC/Northwestern for the fall quarter.
* I've been awarded a Dissertation Year Fellowship by UCLA's Graduate Division.
* New preprint on agnostic PAC learning of a single neuron using gradient descent is now on arXiv.
* New paper from joint work with researchers at the UCLA School of Medicine accepted at Brain Structure and Function.
* I'll be a (remote) research intern with Amazon's Alexa AI group this summer, working on natural language understanding.

2019
* My paper with Yuan Cao and Quanquan Gu, "Algorithm-dependent Generalization Bounds for Overparameterized Deep Residual Networks", was accepted at NeurIPS 2019 (arXiv version, NeurIPS version).

I have a monthly radio show where I play music like house, techno, new wave, disco, and reggae.

My partner is a historian.

Representative works are highlighted below.

Random feature amplification: feature learning and generalization in neural networks.
Spencer Frei, Niladri S. Chatterji, and Peter L. Bartlett.
Preprint, arXiv:2202.07626.
Accepted at the Theory of Overparameterized Machine Learning (TOPML2022) workshop.

Benign overfitting without linearity: neural network classifiers trained by gradient descent for noisy linear data.
Spencer Frei, Niladri S. Chatterji, and Peter L. Bartlett.
COLT 2022.
Also appeared at the Theory of Overparameterized Machine Learning (TOPML2022) workshop.

Self-training converts weak learners to strong learners in mixture models.
Spencer Frei*, Difan Zou*, Zixiang Chen*, and Quanquan Gu.
AISTATS 2022.

Provable robustness of adversarial training for learning halfspaces with noise.
Difan Zou*, Spencer Frei*, and Quanquan Gu.
ICML 2021.

Provable generalization of SGD-trained neural networks of any width in the presence of adversarial label noise.
Spencer Frei, Yuan Cao, and Quanquan Gu.
ICML 2021.
Also appeared at the Theory of Overparameterized Machine Learning (TOPML2021) workshop.

Agnostic learning of halfspaces with gradient descent via soft margins.
Spencer Frei, Yuan Cao, and Quanquan Gu.
ICML 2021, Oral (long talk).

Agnostic learning of a single neuron with gradient descent.
Spencer Frei, Yuan Cao, and Quanquan Gu.
NeurIPS 2020.

A lower bound for $p_c$ in range-$R$ bond percolation in two and three dimensions.
Spencer Frei and Edwin Perkins.
Electronic Journal of Probability, 2016.

On thermal resistance in concentric residential geothermal heat exchangers.
Spencer Frei, Kathryn Lockwood, Greg Stewart, Justin Boyer, and Burt S. Tilley.
Journal of Engineering Mathematics, 2014.

* denotes equal contribution.