__2023__

* I am on the program committee for COLT 2023.

* Paper on implicit bias in neural nets trained on high-dimensional data was accepted for publication at ICLR 2023 as a spotlight presentation.

* Random Feature Amplification was accepted for publication at JMLR pending minor revisions.

__2022__

* I will be speaking at the Symposium on Frontiers of Machine Learning and Artificial Intelligence at the University of Southern California on November 10th.

* New preprint with Gal Vardi, Peter Bartlett, Nati Srebro, and Wei Hu on implicit bias in neural networks trained on high-dimensional data.

* I have been selected as a 2022 Rising Star in Machine Learning by the University of Maryland.

* I am giving a talk at the University of Alberta Statistics Seminar on October 26th.

* I am giving a talk at the EPFL Fundamentals of Learning and Artificial Intelligence Seminar on September 30th.

* I am a visiting scientist at EPFL in September and October, hosted by Emmanuel Abbe.

* I am giving a talk at the Joint Statistical Meetings about benign overfitting without linearity.

* Benign overfitting without linearity was accepted at COLT 2022.

* I am an organizer for the Deep Learning Theory Summer School and Workshop, to be held this summer at the Simons Institute.

* I will be speaking at the ETH Zurich Data, Algorithms, Combinatorics, and Optimization Seminar on June 7th.

* I will be a keynote speaker at the University of Toronto Statistics Research Day on May 25th.

* I am giving a talk at Harvard University's Probabilitas Seminar on May 6th.

* Two recent works accepted at the Theory of Overparameterized Machine Learning 2022 workshop, including one as a contributed talk.

* I am giving a talk at the Microsoft Research ML Foundations Seminar on April 28th.

* I am giving a talk at the University of British Columbia (Christos Thrampoulidis's group) on April 8th.

* I am giving a talk at Columbia University (Daniel Hsu's group) on April 4th.

* I am giving a talk at Oxford University (Yee Whye Teh's group) on March 23rd.

* I am giving a talk at the NSF/Simons Mathematics of Deep Learning seminar on March 10th.

* I am giving a talk at the Google Algorithms Seminar on March 8th.

* I'm reviewing for the Theory of Overparameterized Machine Learning 2022 workshop.

* Two new preprints with Niladri Chatterji and Peter Bartlett: Benign Overfitting without Linearity and Random Feature Amplification.

* Recent work on sample complexity of a self-training algorithm accepted at AISTATS 2022.

## Older news (click to expand)

__2021__

* I am speaking at the Deep Learning Theory Symposium at the Simons Institute on December 6th.

* My paper on proxy convexity as a framework for neural network optimization was accepted at NeurIPS 2021.

* Two new preprints on arXiv: (1) Proxy convexity: a unified framework for the analysis of neural networks trained by gradient descent, and (2) Self-training converts weak learners to strong learners in mixture models.

* I am reviewing for the ICML 2021 workshop Overparameterization: Pitfalls and Opportunities (ICMLOPPO2021).

* Three recent papers accepted at ICML, including one as a long talk.

* New preprint on provable robustness of adversarial training for learning halfspaces with noise.

* I will be presenting recent work at TOPML2021 as a lightning talk, and at the SoCal ML Symposium as a spotlight talk.

* I'm giving a talk at the ETH Zurich Young Data Science Researcher Seminar on April 16th.

* I'm giving a talk at the Johns Hopkins University Machine Learning Seminar on April 2nd.

* I'm reviewing for the Theory of Overparameterized Machine Learning Workshop.

* I'm giving a talk at the Max Planck Institute (MPI) MiS Machine Learning Seminar on March 11th.

* New preprint showing SGD-trained neural networks of any width generalize in the presence of adversarial label noise.

__2020__

* New preprint on agnostic learning of halfspaces using gradient descent is now on arXiv.

* My single neuron paper was accepted at NeurIPS 2020.

* I will be attending the IDEAL Special Quarter on the Theory of Deep Learning hosted by TTIC/Northwestern for the fall quarter.

* I've been awarded a Dissertation Year Fellowship by UCLA's Graduate Division.

* New preprint on agnostic PAC learning of a single neuron using gradient descent is now on arXiv.

* New paper accepted at *Brain Structure and Function* from work with researchers at the UCLA School of Medicine.

* I'll be (remotely) working at Amazon's Alexa AI group for the summer as a research intern, working on natural language understanding.

__2019__

* My paper with Yuan Cao and Quanquan Gu, "Algorithm-dependent Generalization Bounds for Overparameterized Deep Residual Networks", was accepted at NeurIPS 2019 (arXiv version, NeurIPS version).