Publications

Jacob Steinhardt (jsteinhardt@berkeley)

My goal is to make the conceptual advances necessary for machine learning systems to be reliable and aligned with human values. This includes the following directions:

Robustness: How can we build models robust to distributional shift, to adversaries, to model misspecification, and to approximations imposed by computational constraints? What is the right way to evaluate such models?
Reward specification and reward hacking: Human values are too complex to be specified by hand. How can we infer complex value functions from data? How should an agent make decisions when its value function is approximate due to noise in the data or inadequacies in the model? How can we prevent reward hacking, i.e., degenerate policies that exploit differences between the inferred and true reward?
Scalable alignment: Modern ML systems are often too large, and deployed too broadly, for any single person to reason about in detail, posing challenges to both design and monitoring. How can we design ML systems that conform to interpretable abstractions? How do we enable meaningful human oversight at training and deployment time despite the large scale? How will these large-scale systems affect societal equilibria?

These challenges require rethinking both the theoretical and empirical paradigms of ML. Theories of statistical generalization do not account for the extreme types of generalization considered above, and decision theory does not account for cases where the reward function is only approximate. Meanwhile, measuring empirical test accuracy on a fixed distribution is insufficient to analyze phenomena such as robustness to distributional shift.

I seek students who are technically strong, broad-minded, and want to improve the world through their research. I particularly value creative, curious thinkers who are excited to revisit the conceptual foundations of the field.

Outside of research, I am a coach for the USA Computing Olympiad and an instructor at the Summer Program in Applied Rationality and Cognition. I also consult part-time for the Open Philanthropy Project. I like indoor bouldering and ultimate frisbee.

Teaching

STAT240 (Robust and Nonparametric Statistics) (Spring 2021)
STAT260 (Robust Statistics) (Fall 2019)

Past/Present

I joined the Statistics faculty at UC Berkeley in Fall 2019, where I am also a member of the Berkeley Artificial Intelligence Lab and of the EECS department (by courtesy). I recently finished a PhD in machine learning at Stanford University working with Percy Liang. In between, I spent some time working at the Open Philanthropy Project and at OpenAI.

Blog

I maintain a (somewhat slow-updating) expository blog. I also used to keep an online daily research log early in graduate school.

Essays

AI Alignment Research Overview (October 2019) [link]
Research as a Stochastic Decision Process (December 2018) [link]
Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems (June 2015) [link]
The Power of Noise (June 2014) [link]
A Fervent Defense of Frequentist Statistics (February 2014) [link]
Beyond Bayesians and Frequentists (October 2012) [link]

Students/Post-docs

Publications

2021

Agnostic learning with unknown utilities. Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt. Innovations in Theoretical Computer Science (ITCS), 2021. [bib] [paper] [talk]

Measuring massive multitask language understanding. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. International Conference on Learning Representations (ICLR), 2021. [bib] [paper] [data]

Aligning AI with shared human values. Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt. International Conference on Learning Representations (ICLR), 2021. [bib] [paper] [data]

Natural adversarial examples. Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song. Computer Vision and Pattern Recognition (CVPR), 2021. [bib] [paper] [data]

Approximating how single head attention learns. Charlie Snell, Ruiqi Zhong, Dan Klein, Jacob Steinhardt. arXiv, 2021. [bib] [paper]

Limitations of post-hoc feature alignment for robustness. Collin Burns, Jacob Steinhardt. Computer Vision and Pattern Recognition (CVPR), 2021. [bib] [paper]

Measuring mathematical problem solving with the MATH dataset. Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. arXiv, 2021. [bib] [paper] [code]

Understanding generalization in adversarial training via the bias-variance decomposition. Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma. arXiv, 2021. [bib] [paper]

2020

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli. Advances in Neural Information Processing Systems (NeurIPS), 2020. [bib] [paper] [blog post] [CodaLab]

How data science can ease the COVID-19 pandemic. Nigam Shah, Jacob Steinhardt. TechStream, 2020. [bib] [paper]

Why robustness is key to deploying AI. Jacob Steinhardt, Helen Toner. TechStream, 2020. [bib] [paper]

Estimating household transmission of SARS-CoV-2. Mihaela Curmei, Andrew Ilyas, Owain Evans, Jacob Steinhardt. medRxiv, 2020. [bib] [paper]

Prevalence tracking mechanisms for SARS-CoV-2. Jacob Steinhardt, Andrew Ilyas. preprint, 2020. [bib] [paper]

Estimation of SARS-CoV-2 infection prevalence in Santa Clara County. Steve Yadlowsky, Nigam Shah, Jacob Steinhardt. preprint, 2020. [bib] [paper] [code] [blog post]

Identifying statistical bias in dataset replication. Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry. International Conference on Machine Learning (ICML), 2020. [bib] [paper] [code] [blog post]

The many faces of robustness: a critical analysis of out-of-distribution generalization. Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer. arXiv, 2020. [bib] [paper] [code]

Robust estimation via generalized quasi-gradients. Banghua Zhu, Jiantao Jiao, Jacob Steinhardt. arXiv, 2020. [bib] [paper]

Rethinking bias-variance trade-off for generalization of neural networks. Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma. International Conference on Machine Learning (ICML), 2020. [bib] [paper]

When does the Tukey median work? Banghua Zhu, Jiantao Jiao, Jacob Steinhardt. IEEE International Symposium on Information Theory (ISIT), 2020. [bib] [paper]

Scaling out-of-distribution detection for real-world settings. Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song. arXiv, 2020. [bib] [paper] [data]

2019

Generalized resilience and robust statistics. Banghua Zhu, Jiantao Jiao, Jacob Steinhardt. arXiv, 2019. [bib] [paper] [talk]

Testing robustness against unforeseen adversaries. Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, Jacob Steinhardt. arXiv, 2019. [bib] [paper] [reviews] [code]

Sever: a robust meta-algorithm for stochastic optimization. Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart. International Conference on Machine Learning (ICML), 2019. [bib] [paper] [code]

Troubling trends in machine learning scholarship. Zachary C. Lipton, Jacob Steinhardt. ACM Queue, 2019. [bib] [paper]

FrAngel: component-based synthesis with control structures. Kensen Shi, Jacob Steinhardt, Percy Liang. Principles of Programming Languages (POPL), 2019. [bib] [paper] [CodaLab]

2018

Better agnostic clustering via relaxed tensor norms. Pravesh Kothari, Jacob Steinhardt. Symposium on Theory of Computing (STOC), 2018. [bib] [paper]

Resilience: a criterion for learning in the presence of arbitrary outliers. Jacob Steinhardt, Moses Charikar, Gregory Valiant. Innovations in Theoretical Computer Science (ITCS), 2018. [bib] [paper] [talk] [slides]

Robust learning: information theory and algorithms. Jacob Steinhardt. Stanford University, 2018. [bib] [thesis]

The malicious use of artificial intelligence: forecasting, prevention, and mitigation. Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, et al. arXiv, 2018. [bib] [paper]

Stronger data poisoning attacks break data sanitization defenses. Pang Wei Koh, Jacob Steinhardt, Percy Liang. arXiv, 2018. [bib] [paper] [code]

Semidefinite relaxations for certifying robustness to adversarial examples. Aditi Raghunathan, Jacob Steinhardt, Percy Liang. Advances in Neural Information Processing Systems (NeurIPS), 2018. [bib] [paper]

Certified defenses against adversarial examples. Aditi Raghunathan, Jacob Steinhardt, Percy Liang. International Conference on Learning Representations (ICLR), 2018. [bib] [paper] [reviews] [CodaLab]

2017

Learning from untrusted data. Moses Charikar, Jacob Steinhardt, Gregory Valiant. Symposium on Theory of Computing (STOC), 2017. [bib] [paper] [talk] [slides] [poster]

Does robustness imply tractability? A lower bound for planted clique in the semi-random model. Jacob Steinhardt. arXiv, 2017. [bib] [paper]

Certified defenses for data poisoning attacks. Jacob Steinhardt, Pang Wei Koh, Percy Liang. Advances in Neural Information Processing Systems (NeurIPS), 2017. [bib] [paper] [code] [CodaLab]

2016

Memory, communication, and statistical queries. Jacob Steinhardt, Gregory Valiant, Stefan Wager. Conference on Learning Theory (COLT), 2016. [bib] [paper]

Avoiding imposters and delinquents: adversarial crowdsourcing and peer prediction. Jacob Steinhardt, Gregory Valiant, Moses Charikar. Advances in Neural Information Processing Systems (NeurIPS), 2016. [bib] [paper]

Concrete problems in AI safety. Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané. arXiv preprint arXiv:1606.06565, 2016. [bib]

Unsupervised risk estimation using only conditional independence structure. Jacob Steinhardt, Percy Liang. Advances in Neural Information Processing Systems (NeurIPS), 2016. [bib] [paper]

2015

Minimax rates for memory-bounded sparse linear regression. Jacob Steinhardt, John Duchi. Conference on Learning Theory (COLT), 2015. [bib] [paper] [talk] [slides] [poster]

Learning with relaxed supervision. Jacob Steinhardt, Percy Liang. Advances in Neural Information Processing Systems (NeurIPS), 2015. [bib] [paper] [poster] [CodaLab]

Reified context models. Jacob Steinhardt, Percy Liang. International Conference on Machine Learning (ICML), 2015. [bib] [paper] [talk] [slides] [poster] [CodaLab]

Learning fast-mixing models for structured prediction. Jacob Steinhardt, Percy Liang. International Conference on Machine Learning (ICML), 2015. [bib] [paper] [talk] [slides] [poster] [CodaLab]

Learning where to sample in structured prediction. Tianlin Shi, Jacob Steinhardt, Percy Liang. Artificial Intelligence and Statistics (AISTATS), 2015. [bib] [paper] [slides] [CodaLab]

2014

The statistics of streaming sparse regression. Jacob Steinhardt, Stefan Wager, Percy Liang. arXiv preprint arXiv:1412.4182, 2014. [bib] [paper]

Adaptivity and optimism: an improved exponentiated gradient algorithm. Jacob Steinhardt, Percy Liang. International Conference on Machine Learning (ICML), 2014. [bib] [paper] [slides] [poster]

Filtering with abstract particles. Jacob Steinhardt, Percy Liang. International Conference on Machine Learning (ICML), 2014. [bib] [paper] [supplemental material] [slides] [poster]

2012

Flexible martingale priors for deep hierarchies. Jacob Steinhardt, Zoubin Ghahramani. Artificial Intelligence and Statistics (AISTATS), 2012. [bib] [paper] [slides] [poster]

2011

Finite-time regional verification of stochastic nonlinear systems. Jacob Steinhardt, Russ Tedrake. Robotics: Science and Systems (RSS), 2011. [bib] [paper] [journal] [slides] [poster]

Pathological properties of deep Bayesian hierarchies. Jacob Steinhardt, Zoubin Ghahramani. NIPS Workshop on Bayesian Nonparametrics, 2011. [bib] [paper] [poster]

2010

Permutations with ascending and descending blocks. Jacob Steinhardt. Electronic Journal of Combinatorics, 2010. [bib] [paper] [slides]

2009

On coloring the odd-distance graph. Jacob Steinhardt. Electronic Journal of Combinatorics, 2009. [bib] [paper]

2007

Cayley graphs formed by conjugate generating sets of S_n. Jacob Steinhardt. preprint, 2007. [bib] [paper]