Michael Mahoney's new home page

Basic info
I am at ICSI and the Department of Statistics at UC Berkeley, and
I am also in the
RISELab
(in the past AMPLab)
in the Department of Computer Science.
You can reach me via electronic mail:
mmahoney
is the username, and
stat dot berkeley dot edu
follows the at symbol.
(Of course, my gmail address is still valid.)
Also, some other (somewhat outdated) information can be found at
my old web page
at Stanford and
my older web page
at Yale.

Students, postdocs, etc.
Most of my work focuses on the theory and practice of what is now called big data, although I was doing it back when it was just massive, and prior to that when it was just large.
On the theory side, we develop algorithms and statistical methods for matrix, graph, regression, optimization, and related problems.
On the practice side, we provide implementations (e.g., on single machine, distributed data system, and supercomputer environments), and we also apply the methods to a range of problems in internet and social media analysis, social networks analysis, as well as genetics, mass spec imaging, astronomy, climate, and a range of other scientific applications.
I have been fortunate enough to work with a number of great students and postdocs over the years.
Shusen Wang (postdoc, UC Berkeley)
Peng Xu (student, Stanford)
Aditya Devarakonda (UC Berkeley)
Julian Shun (postdoc, UC Berkeley, now faculty at MIT)
Kimon Fountoulakis (postdoc, UC Berkeley)
Fred Roosta (postdoc, UC Berkeley, now faculty at Queensland)
Alex Gittens (postdoc, UC Berkeley; now faculty at RPI)
Di Wang (UC Berkeley PhD, 2017; now at Georgia Tech)
Jiyan Yang (Stanford PhD, 2016; now at Facebook)
Aaron Adcock (Stanford PhD, 2014; now at Facebook)
Xiangrui Meng (Stanford PhD, 2014; at LinkedIn, then Databricks)
Lorenzo Orecchia (intern, Yahoo; now faculty at Boston)
Jure Leskovec (intern, Yahoo; now faculty at Stanford)
Hari Narayanan (intern, Yahoo; faculty at Washington, then TIFR, Mumbai)
Jeff Phillips (intern, Yahoo; now faculty at Utah)
LekHeng Lim (intern, Yahoo; now faculty at Chicago)
Boulos Harb (intern, Yahoo; now at Google)
Recent items of interest:
Feb 2018: We have an initial version of our code for subsampled algorithms for second order nonconvex optimization. The empirical paper upon which it is based can be found here. Click here for the repo, and email me or Peng Xu with questions/comments.
Feb 2018: We have an initial version of our code for GPUaccelerated subsampled Newton's method for second order optimization. The implementation paper upon which it is based can be found here. Click here for the repo, and email me or Sudhir Kylasa with questions/comments.
Dec 2017: We have an initial version of our code for local graph clustering (still for intermediatetoexpert users). Click here for it, and email me or Kimon Fountoulakis with questions/comments.
Dec 2017: The introductory "connector" class on mathematics for matrices/graphs/data that I taught during S16 and S17 is being expanded into a fourunit introductory class on linear algebra for data.
Here is the link.
Nov 2017: The Fall 2018 program on Foundations of Data Science at the Simons Institute is looking for research fellows. Please consider applying for it too.
Nov 2017: As part of our new FODA/TRIPODS Institute, we have several postdoctoral positions. See here for more details. Please email me if you are interested.
Oct 2017: I am the PI on the new UC Berkeley FODA (Foundations of Data Analysis) Institute grant, which is part of the NSF TRIPODS program, to deepen the theoretical foundations of data science in a new transdisciplinary institute.
Also involved are coPIs Bin Yu, Fernando Perez, Michael Jordan, and Dick Karp.
Click here for a UC Berkeley press release about FODA/TRIPODS,
(or click here for a summary article from Datanami about it),
or click here for the original NSF announcement.
We will be advertising soon for postdocs to be involved, so please email me if you are interested.
Oct 2017: At RISELab, we have an initial release of our report on how to build more secure, faster AI systems.
Click here for the technical report, and click here for a press release about it.
Sept 2017: We have an initial release of our
local graph clustering package.
May 2017: Congratulations to postdoc Julian Shun, who will start as an assistant professor at MIT.
May 2017: Congratulations to postdoc Fred Roosta, who will start as an assistant professor at the University of Queensland.
Spring 2017, Teaching:
I'll be teaching "Stat89a: Introduction to Matrices and Graphs in Data Science" again.
The link is here.
The link from last year is here.
This is a twounit freshmanlevel "connector" class, taught in conjunction with the main freshman "Foundations of Data Science" class, and it will cover linear algebra, probablity, and related topics in the mathematics of data.
I have a detailed but workinprogress set of course notes for this class.
I will post them at an appropriate point; but, in the meantime, email me if you are interested in them.
Jan 2017: Congratulations to postdoc Alex Gittens, who will start as an assistant professor at RPI.
Aug 2016: I posted on the arXiv (click here and here, respectively) the Lecture Notes on Spectral Graph Methods and the Lecture Notes on Randomized Linear Algebra, that are based on the classes I taught last year and the year before, respectively. They will both be updated at some point.
Jul 2016: Our analysis of the performance of linear algebra in Spark has been posted on the arXiv.
Click here for the code;
click here for a talk based on it that was given at the 2016 Dato Data Science Summit; and
click here for a blog post by Alex Gittens about it.
Jun 2016: The article on "RandNLA: Randomized Numerical Linear Algebra" by Petros Drineas and me, as well as an associated video synopsis has appeared in the Communications of the ACM.
May 2016: Congratulations to postdoc Julian Shun who has been awarded ACM's doctoral dissertation award for his 2015 CMU doctoral thesis "SharedMemory Parallelism Can Be Simple, Fast, and Scalable."
More details can be found in the official ACM announcement.
Jan 2016:
We are running MMDS again!
MMDS 2016
will take place
on the campus of UC Berkeley on Tuesday, June 21 through Friday, June 24, 2016.
See the
main MMDS web page
for more information.
Jan 2016:
A summer school on the "Mathematics of Data" is being organized by Anna Gilbert, John Duchi, and me, with the Park City Mathematics Institute.
Dates: June 30 to July 20, 2016.
Location: near Park City, Utah.
Deadline for applications: January 15, 2016.
Click
here
for the flyer and
here for the web page.
Spring 2016, Teaching:
"Stat89a: Introduction to Matrices and Graphs in Data Science".
This is a twounit freshmanlevel "connector" class, taught in conjunction with the main freshman "Foundations of Data Science" class, and it will cover linear algebra, probablity, and related topics in the mathematics of data.
Nov 2015:
Two articles on HPC Wire:
click here for a discussion of our collaboration between AMPLab, Cray, and NERSC/LBL, where we will develop and implement randomized linear algebra (RLA) algorithms in Spark on HPC platforms for LBL's scientific data applications; and
click here for the role of RLA in HPC more generally.
Jan 2015:
We will be running the Gene Golub SIAM Summer School on
"RandNLA: Randomization in Numerical Linear Algebra".
The dates are June 1526, 2015, and the location is Delphi, Greece.
Click
here
for the flyerand note that the deadline for student applications is February 1, 2015.
Update (1/16): click
here or
here
for the SIAM News article about it.
Teaching,
Spring 2015:
"Stat260/CS294: Topics in Spectral Graph Methods".
First meeting is Thursday, January 22, 2015 in 320 Soda Hall.
We ran MMDS again.
MMDS 2014
took place
on the campus of UC Berkeley on Tuesday, June 17 through Friday, June 20, 2014.
See the
main MMDS web page
for more information.
The videos for MMDS 2014 are available
here.
I attended and gave a talk on "BIG Biomedicine and the Foundations of BIG Data Analysis"
at Stanford University at Stanford Medical School, May 23, 2014,
at their "Big Data in BioMedicine" Conference.
Click
"here"
for a video of the talk.
They also got two nice CEOstyle pictures:
here
I am, and
here
I am again.
I'm moving to ICSI and the Department of Statistics at UC Berkeley.
Teaching,
Fall 2013, at UC Berkeley:
Stat260/CS294: Randomized Algorithms for Matrices and Data, which will be an overview of Randomized Linear Algebra.
Our NRC report on the "Frontiers in Massive Data Analysis" is out and
available here.
I'm involved with running the program on "Theoretical Foundations of Big
Data Analysis," to be held at the Simons Institute at UC Berkeley during the
fall of 2013.
Click
here
for details; in particular, note the link at the bottom of the page to
information about fellowships for young researchers to participate.
We ran MMDS again.
MMDS 2012
took place on the campus of Stanford University on July 1013, 2012.
Speaker videos and presentations are now available.
For pdfs and videos of the speaker presentations, go to the main MMDS web
page; or click
here for the entire
video collection.
The overview article on
"Approximate Computation and Implicit Regularization for Very Largescale Data Analysis"
associated with the invited talk at PODS 2012 meeting is on the arXiv
here.
LSRN:
An initial version of our code for LSRN, the randomized leastsquares solver
for parallel environments, is available
here.
The monograph on "Randomized Algorithms for Matrices and Data" is available
in NOW's "Foundations and Trends in Machine Learning" series
here,
and it is also
available on the arXiv here.


Click
here
for information (including the slides and video!) on the Tutorial on
"Geometric Tools for Identifying Structure in Large Social and Information
Networks," given originally at ICML10 and KDD10 and subsequently at many
other places. (The slides are also linked to below.)
The overview chapter on
"Algorithmic and Statistical Perspectives on LargeScale Data Analysis" is
finally on the arXiv
here; the book in which it will
appear is in press; and a video of the associated talk is
here.
Teaching:
Fall 2009:
CS369M: Algorithms for Massive Data Set Analysis
We have made public many of the networks we have used in our "Community
Structure" papers.
Click
here
for the networks and for related information;
older networks are still
here.
Our PNAS paper on "CUR Matrix Decompositions for Improved Data Analysis"
has appeared as
Proc. Natl. Acad. Sci. USA, 106, 697702 (2009).
Please email me if you can't access a copy from the PNAS web site.
Research interests

Algorithmic and statistical aspects of modern largescale data analysis.

Randomized linear algebra and randomized numerical linear algebra.

Scalable convex and nonconvex optimization in parallel/distributed environments.

Implicit regularization and implicit optimization
methods in scalable approximation algorithms.

Graph approximation algorithms and applications to large social and information networks.

Applications to DNA microarray, SNP, astronomical, medical imaging, and other scientific data.
A lot of my work has focused on
Randomized Linear Algebra, i.e., using random sampling and
random projection methods to solve very large matrixbased problems;
developing
geometric network analysis tools, i.e., using scalable approximation
algorithms with a geometric
or statistical flavor to analyze the structure and dynamics of large
informatics graphs;
developing approximate computation and regularization methods for large
informatics graphs;
applications to community detection, clustering, and information dynamics in
large social and information networks; and
applications to DNA single nucleotide polymorphism (SNP) data,
astronomical and medical imaging data,
and largescale statistical data analysis more generally.
In the past, I developed and analyzed algorithms for large matrix, graph,
and regression problems, and I applied these and related tools to the
statistical data analysis of extremely large scientific and Internet data
sets.
For example, I worked on
largescale web analytics, machine learning, and query log analysis;
applications of graph partitioning algorithms to clustering and
community identification; and
applications of randomized matrix algorithms to hyperspectral medical image
data, DNA microarray data, and DNA SNP data.
In the more distant past, I have also worked on developing and analyzing
Monte Carlo algorithms for performing useful computations on extremely
large matrices,
e.g., the additiveerror and relativeerror CUR matrix decompositions.
Past research has also included work in computational statistical mechanics
on the
development and analysis of the
TIP5P
model of liquid water, as well as work in both computational and experimental
biophysics on proteins and proteinnucleic acid interactions.
Publications
2018

Group Collaborative Representation for Image Set Classification,

B. Liu, L. Jing, J. Li, J. Yu, A. Gittens, and M. W. Mahoney,

Accepted for publication, International Journal of Computer Vision.

Error Estimation for Randomized LeastSquares Algorithms via the Bootstrap,

M. E. Lopes, S. Wang, M. W. Mahoney,

Technical Report, Preprint: arXiv:1803.08021 (2018)
(arXiv),

Submitted for publication.

GPU Accelerated SubSampled Newton's Method,

S. B. Kylasa, F. RoostaKhorasani, M. W. Mahoney, and A. Grama,

Technical Report, Preprint: arXiv:1802.09113 (2018)
(arXiv),
(code),

Submitted for publication.

Hessianbased Analysis of Large Batch Training and Robustness to Adversaries,

Z. Yao, A. Gholami, Q. Lei, K. Keutzer, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1802.08241 (2018)
(arXiv),

Submitted for publication.

Inexact NonConvex NewtonType Methods,

Z. Yao, P. Xu, F. RoostaKhorasani, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1802.06925 (2018)
(arXiv),

Submitted for publication.

Outofsample extension of graph adjacency spectral embedding,

K. Levin, F. RoostaKhorasani, M. W. Mahoney, and C. E. Priebe,

Technical Report, Preprint: arXiv:1802.06307 (2018)
(arXiv),

Submitted for publication.
2017

Lectures on Randomized Numerical Linear Algebra,

P. Drineas and M. W. Mahoney,

Technical Report, Preprint: arXiv:1712.08880 (2017)
(arXiv),

To appear in: Lectures of the 2016 PCMI Summer School on Mathematics of Data.

Avoiding Synchronization in FirstOrder Methods for Sparse Convex Optimization,

A. Devarakonda, K. Fountoulakis, J. Demmel, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1712.06047 (2017)
(arXiv),

Accepted for publication, Proc. IPDPS 2018.

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior,
(click here for a blog about this paper)

C. H. Martin and M. W. Mahoney,

Technical Report, Preprint: arXiv:1710.09553 (2017)
(arXiv),

LASAGNE: Locality And Structure Aware Graph Node Embedding,

E. Faerman, F. Borutta, K. Fountoulakis, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1710.06520 (2017)
(arXiv),

A Berkeley View of Systems Challenges for AI,

I. Stoica, D. Song, R. A. Popa, D. A. Patterson, M. W. Mahoney, R. H. Katz, A. D. Joseph, M. Jordan, J. M. Hellerstein, J. Gonzalez, K. Goldberg, A. Ghodsi, D. E. Culler, and P. Abbeel,

Technical Report No. UCB/EECS2017159, October 2017
(www),

GIANT: Globally Improved Approximate Newton Method for Distributed Optimization,

S. Wang, F. RoostaKhorasani, P. Xu, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1709.03528 (2017)
(arXiv),

Submitted for publication.

SecondOrder Optimization for NonConvex Machine Learning: An Empirical Study,

P. Xu, F. RoostaKhorasani, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1708.07827 (2017)
(arXiv),
(code),

Submitted for publication.

NewtonType Methods for NonConvex Optimization Under Inexact Hessian Information,

P. Xu, F. RoostaKhorasani, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1708.07164 (2017)
(arXiv),

Submitted for publication.

A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication,

M. E. Lopes, S. Wang, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1708.01945 (2017)
(arXiv),

Submitted for publication.

Capacity releasing diffusions for speed and locality,

D. Wang, K. Fountoulakis, M. Henzinger, M. W. Mahoney, and S. Rao,

Technical Report, Preprint: arXiv:1706.05826 (2017)
(arXiv),

Proc. of the 34th ICML Conference 35983607 (2017)
(pdf),
(supp).

Scalable Kernel KMeans Clustering with Nystrom Approximation: RelativeError Bounds,

S. Wang, A. Gittens, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1706.02803 (2017)
(arXiv),

Accepted for publication, J. Machine Learning Research.

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction,

K. E. Bouchard, A. F. Bujan, F. RoostaKhorasani, S. Ubaru, Prabhat, A. M. Snijders, J.H. Mao, E. F. Chang, M. W. Mahoney, S. Bhattacharyya,

Technical Report, Preprint: arXiv:1705.07585 (2017)
(arXiv),

Proc. of the 2017 NIPS Conference
(pdf).

SkipGram  Zipf + Uniform = Vector Additivity,

A. Gittens, D. Achlioptas, and M. W. Mahoney,

Proc. of the 55th ACL Meeting 6976 (2017)
(pdf).

Principles and Applications of Science of Information [Scanning the Issue],

T. Courtade, A. Grama, M. W. Mahoney, and T. Weissman,

Proceedings of the IEEE, 105(2): 183188 (2017)
(pdf).

Social Discrete Choice Models,

D. Zhang, K. Fountoulakis, J. Cao, M. Yin, M. W. Mahoney, and A. Pozdnoukhov,

Technical Report, Preprint: arXiv:1703.07520 (2017)
(arXiv),

Submitted for publication.

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging,

S. Wang, A. Gittens, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1702.04837 (2017)
(arXiv),

Proc. of the 34th ICML Conference 36083616 (2017)
(pdf),

Accepted for publication, J. Machine Learning Research.
2016

Avoiding communication in primal and dual block coordinate descent methods,

A. Devarakonda, K. Fountoulakis, J. Demmel, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1612.04003 (2016)
(arXiv),

Submitted for publication.

Featuredistributed sparse regression: a screenandclean approach,

J. Yang, M. W. Mahoney, M. A. Saunders, and Y. Sun,

Proc. of the 2016 NIPS Conference
(pdf).

Multilabel learning with semantic embeddings,

L. Jing, M. Cheng, L. Yang, A. Gittens, M. W. Mahoney,

ICLR 2017 OpenReview.net
(www),

Mapping the Similarities of Spectra: Global and Locallybiased Approaches to SDSS Galaxy Data,

D. Lawlor, T. Budavari, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1609.03932 (2016)
(arXiv),

The Astrophysical Journal, 833:1, 26 (2016)
(pdf).

Lecture Notes on Spectral Graph Methods,

M. W. Mahoney,

Technical Report, Preprint: arXiv:1608.04845 (2016)
(arXiv),

Lecture Notes on Randomized Linear Algebra,

M. W. Mahoney,

Technical Report, Preprint: arXiv:1608.04481 (2016)
(arXiv),

An optimization approach to locallybiased graph algorithms,

K. Fountoulakis, D. F. Gleich, M. W. Mahoney,

Technical Report, Preprint: arXiv:1607.04940 (2016)
(arXiv),

Proceedings of the IEEE, 105(2): 256272, 2017
(pdf).

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection,

L. Jing, B. Liu, J. Choi, A. Janin, J. Bernd, M. W. Mahoney, and G. Friedland,

Technical Report, Preprint: arXiv:1607.04378 (2016)
(arXiv),

Proc. of the 2016 ACM Multimedia Conference 5761 (2016)
(pdf),

Accepted for publication, IEEE Transactions on Multimedia.

Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies,

A. Gittens, A. Devarakonda, E. Racah, M. Ringenburg, L. Gerhardt, J. Kottalam, J. Liu, K. Maschhoff, S. Canon, J. Chhugani, P. Sharma, J. Yang, J. Demmel, J. Harrell, V. Krishnamurthy, M. W. Mahoney, and Prabhat,

Technical Report, Preprint: arXiv:1607.01335 (2016)
(arXiv),
(code),

Proc. 2016 IEEE BigData, 204213 (2016)
(pdf).

Subsampled Newton Methods with Nonuniform Sampling,

P. Xu, J. Yang, F. RoostaKhorasani, C. Re, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1607.00559 (2016)
(arXiv),

Proc. of the 2016 NIPS Conference
(pdf).

Approximating the Solution to Mixed Packing and Covering LPs in parallel
time,

M. W. Mahoney, S. Rao, D. Wang, and P. Zhang,

Proc. of the 43rd ICALP Conference, (2016)
(pdf).

A Simple and StronglyLocal FlowBased Method for Cut Improvement,

N. Veldt, D. F. Gleich, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1605.08490 (2016)
(arXiv),

Proc. of the 33rd ICML Conference 19381947 (2016)
(pdf),
(supp).

RandNLA: Randomized Numerical Linear Algebra,

P. Drineas and M. W. Mahoney,

Communications of the ACM, 59, 8090 (2016)
(pdf).

FLAG n' FLARE: Fast LinearlyCoupled Adaptive Gradient Methods,

X. Cheng, F. RoostaKhorasani, S. Palombo, P. L. Bartlett, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1605.08108 (2016)
(arXiv),

Proc. of the 21st AISTATS Conference (2018).

Parallel Local Graph Clustering,

J. Shun, F. RoostaKhorasani, K. Fountoulakis, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1604.07515 (2016)
(arXiv),

Proceedings of the VLDB Endowment, 9(12) 10411052 (2016)
(pdf).

A multiplatform evaluation of the randomized CX lowrank matrix factorization in Spark,

A. Gittens, J. Kottalam, J. Yang, M. F. Ringenburg, J. Chhugani, E. Racah, M. Singh, Y. Yao, C. Fischer, O. Ruebel, B. Bowen, N. G. Lewis, M. W. Mahoney, V. Krishnamurthy, and Prabhat,

Proc. 5th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, at IPDPS,
2016
(pdf).

Mining Large Graphs,

D. F. Gleich and M. W. Mahoney,

In
Handbook of Big Data.
pp. 191220,
edited by
P. Buhlmann, P. Drineas, M. Kane, and M. van de Laan,
Chapman and Hall/CRC Press,
2016
(pdf).

Structural properties underlying highquality Randomized Numerical Linear Algebra algorithms,

M. W. Mahoney and P. Drineas,

In
Handbook of Big Data.
pp. 137154,
edited by
P. Buhlmann, P. Drineas, M. Kane, and M. van de Laan,
Chapman and Hall/CRC Press,
2016
(pdf).

Variational Perspective on Local Graph Clustering,

K. Fountoulakis, X. Cheng, J. Shun, F. RoostaKhorasani and M. W. Mahoney,

Technical Report, Preprint: arXiv:1602.01886 (2016)
(arXiv),

Journal version submitted for publication.

SubSampled Newton Methods II: Local Convergence Rates,

F. RoostaKhorasani and M. W. Mahoney,

Technical Report, Preprint: arXiv:1601.04738 (2016)
(arXiv),

Journal version submitted for publication.

SubSampled Newton Methods I: Globally Convergent Algorithms,

F. RoostaKhorasani and M. W. Mahoney,

Technical Report, Preprint: arXiv:1601.04737 (2016)
(arXiv),

Journal version submitted for publication.

RandNLA, Pythons, and the CUR for Your Data Problems: Reporting from G2S3 2015 in Delphi,

E. Gallopoulos, P. Drineas, I. Ipsen, and M. W. Mahoney,

SIAM News 49:1 January/February 2016
(web),
(pdf).
2015

Faster Parallel Solver for Positive Linear Programs via DynamicallyBucketed Selective Coordinate Descent,

D. Wang, M. W. Mahoney, N. Mohan, and S. Rao,

Technical Report, Preprint: arXiv:1511.06468 (2015)
(arXiv).

A Local Perspective on Community Structure in Multilayer Networks,

L. G. S. Jeub, M. W. Mahoney, P. J. Mucha, and M. A. Porter,

Technical Report, Preprint: arXiv:1510.05185 (2015)
(arXiv),

Network Science, 5(2): 144163, 2017
(pdf).

Optimal Subsampling Approaches for Large Sample Linear Regression,

R. Zhu, P. Ma, M. W. Mahoney, and B. Yu,

Technical Report, Preprint: arXiv:1509.05111 (2015)
(arXiv).

Unified Acceleration Method for Packing and Covering Problems via Diameter Reduction,

D. Wang, S. Rao, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1508.02439 (2015)
(arXiv),

Proc. of the 43rd ICALP Conference, (2016)
(pdf).

Using local spectral methods to robustify graphbased learning algorithms,

D. F. Gleich and M. W. Mahoney,

Proc. of the 21st Annual SIGKDD, (2015)
(pdf)
(code).

Structured Block Basis Factorization for Scalable Kernel Matrix Evaluation,

R. Wang, Y. Li, M. W. Mahoney, and E. Darve,

Technical Report, Preprint: arXiv:1502.03571 (2015)
(arXiv).

Identifying important ions and positions in mass spectrometry imaging data using CUR matrix decompositions,

J. Yang, O. Rubel, Prabhat, M. W. Mahoney, and B. P. Bowen,

Analytical Chemistry, 87 (9), 46584666 (2015)
(pdf)
(code).

Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nystrom Method,

D. G. Anderson, S. S. Du, M. W. Mahoney, C. Melgaard, K. Wu, and M. Gu,

Proc. of the 18th International Conference on AISTATS, 1927 (2015)
(pdf,
supp)
(code).

Weighted SGD for Lp Regression with Randomized Preconditioning,

J. Yang, Y.L. Chow, C. Re, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1502.03571 (2015)
(arXiv),

Proc. of the 27th Annual SODA, 558569 (2016)
(pdf),

Accepted for publication, J. Machine Learning Research.

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments,

J. Yang, X. Meng, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1502.03032 (2015)
(arXiv)
(code),

Proceedings of the IEEE 104(1): 5892 (2016)
(pdf).
2014

Tree decompositions and social graphs,

A. B. Adcock, B. D. Sullivan, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1411.1546 (2014)
(arXiv),
(code).

Internet Mathematics, 12(5), 315361 (2016)
(pdf).

Fast Randomized Kernel Methods With Statistical Guarantees,

A. El Alaoui and M. W. Mahoney,

Technical Report, Preprint: arXiv:1411.0306 (2014)
(arXiv),

Proc. of the 2015 NIPS Conference
(pdf).

Signal Processing for Big Data (Editorial for Special Issue)

G. B. Giannakis, F. Bach, R. Cendrillon, M. Mahoney, and J. Neville,

IEEE Signal Processing Magazine, 31: 1516 (September 2014)
(pdf).

A Statistical Perspective on Randomized Sketching for Ordinary LeastSquares,

G. Raskutti and M. W. Mahoney,

Technical Report, Preprint: arXiv:1406.5986 (2014)
(arXiv),

Proc. of the 32nd ICML Conference (2015)
(pdf),

J. Machine Learning Research, 17(214): 131, 2016
(pdf).

Random Laplace Feature Maps for Semigroup Kernels on Histograms,

J. Yang, V. Sindhwani, Q. Fan, H. Avron, and M. W. Mahoney,

Proc. of the 27th CVPR Conference, 971978 (2014)
(pdf).

Antidifferentiating Approximation Algorithms: A case study with Mincuts, Spectral, and Flow,

D. F. Gleich and M. W. Mahoney,

Proc. of the 31st ICML Conference, JMLR W&CP 32 (1): 10181025 (2014)
(pdf).
(code, code).
(talk).

QuasiMonte Carlo Feature Maps for ShiftInvariant Kernels,

J. Yang, V. Sindhwani, H. Avron, and M. W. Mahoney,

Proc. of the 31st ICML Conference, JMLR W&CP 32 (1): 485493 (2014)
(pdf),
(code),

Technical Report, Preprint: arXiv:1412.8293 (2014)
(arXiv),

J. Machine Learning Research, 17(120): 138 (2016)
(pdf).

Think Locally, Act Locally: The Detection of Small, MediumSized, and Large Communities in Large Networks,

L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1403.3795 (2014)
(arXiv),
(code, code),

Physical Review E, 91, 012821 (2015)
(pdf).

A new spin on an old algorithm: technical perspective on "Communication costs of Strassen's matrix multiplication,"

M. W. Mahoney,

Communications of the ACM, 57(2): 106 (2014)
(pdf).
2013

Treelike Structure in Large Social and Information Networks,

A. B. Adcock, B. D. Sullivan, and M. W. Mahoney,

Proc. of the 2013 IEEE ICDM, 110 (2013)
(pdf).

Objective Identification of Informative Wavelength Regions in Galaxy Spectra,

C.W. Yip, M. W. Mahoney, A. S. Szalay, I. Csabai, T. Budavari, R. F. G. Wyse,
and L. Dobos,

Technical Report, Preprint: arXiv:1312.0637 (2013)
(arXiv),

Astronomical Journal, 147, 5, 110 (2014)
(pdf).

Evaluating OpenMP Tasking at Scale for the Computation of Graph Hyperbolicity,

A. B. Adcock, B. D. Sullivan, O. R. Hernandez, and M. W. Mahoney,

Proc. of the 9th IWOMP, 7183 (2013)
(pdf).

Frontiers in Massive Data Analysis,

Committee on the Analysis of Massive Data, et al. (M. I. Jordan, et al.),

The National Academies Press (2013)
(pdf),
(web).

A Statistical Perspective on Algorithmic Leveraging,

P. Ma, M. W. Mahoney, and B. Yu,

Technical Report, Preprint: arXiv:1306.5362 (2013)
(arXiv),

Proc. of the 31st ICML Conference, JMLR W&CP 32 (1): 9199 (2014)
(pdf),

J. Machine Learning Research, 16, 861911 (2015)
(pdf).

Robust Regression on MapReduce,

X. Meng, and M. W. Mahoney,

Proc. of the 30th ICML Conference, JMLR W&CP 28(3): 888896 (2013)
(pdf).

Quantile Regression for Largescale Applications,

J. Yang, X. Meng, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1305.0087 (2013)
(arXiv),
(code),

Proc. of the 30th ICML Conference, JMLR W&CP 28(3): 881887 (2013)
(pdf),

SIAM J. Scientific Computing, 36(5), S78S110 (2014)
(pdf).

Revisiting the Nystrom Method for Improved LargeScale Machine Learning,

A. Gittens and M. W. Mahoney,

Technical Report, Preprint: arXiv:1303.1849 (2013)
(arXiv),
(code),

Proc. of the 30th ICML Conference, JMLR W&CP 28(3): 567575 (2013)
(pdf),

J. Machine Learning Research, 17(117): 165 (2016)
(pdf).
2012

Semisupervised Eigenvectors for Largescale Locallybiased Learning,

T. J. Hansen and M. W. Mahoney,

Proc. of the 2012 NIPS Conference
(pdf),
(code),

Technical Report, Preprint: arXiv:1304.7528 (2013)
(arXiv),

J. Machine Learning Research, 15, 36913734 (2014)
(pdf).

Lowdistortion Subspace Embeddings in Inputsparsity Time and Applications to Robust Linear Regression,

X. Meng and M. W. Mahoney,

Technical Report, Preprint: arXiv:1210.3135 (2012)
(arXiv),

Proc. of the 45th STOC, 91100 (2013)
(pdf).

The Fast Cauchy Transform and Faster Robust Linear Regression,

K. L. Clarkson, P. Drineas, M. MagdonIsmail, M. W. Mahoney, X. Meng, and D. P. Woodruff,

Technical Report, Preprint: arXiv:1207.4684 (2012)
(arXiv),

Proc. of the 24th Annual SODA, 466477 (2013)
(pdf),

SIAM J. Computing, 45, 763810 (2016)
(pdf).

rCUR: an R package for CUR matrix decomposition,

A. Bodor, I. Csabai, M. W. Mahoney, and N. Solymosi,

BMC Bioinformatics, 13:103 (2012)
(pdf),
(code).

Approximate Computation and Implicit Regularization for Very Largescale Data Analysis,

M. W. Mahoney,

Technical Report, Preprint: arXiv:1203.0786 (2012)
(arXiv),

Proc. of the 2012 ACM Symposium on Principles of Database Systems, 143154, 2012
(pdf).

On the Hyperbolicity of SmallWorld and TreeLike Random Graphs,

W. Chen, W. Fang, G. Hu, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1201.1717 (2012)
(arXiv),

Proc. of the 23rd ISAAC 278288 (2012)
(pdf),

Internet Mathematics, 9(4), 434491 (2013)
(pdf).
2011

Randomized Dimensionality Reduction for Kmeans Clustering,

C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas,

Technical Report, Preprint: arXiv:1110.2897 (2011)
(arXiv),

IEEE Transactions on Information Theory, 61(2), 10451062 (2015)
(pdf).

Regularized Laplacian Estimation and Fast Eigenvector Approximation,

P. O. Perry and M. W. Mahoney,

Technical Report, Preprint: arXiv:1110.1757 (2011)
(arXiv),

Proc. of the 2011 NIPS Conference
(pdf).

LSRN: A Parallel Iterative Solver for Strongly Over or UnderDetermined Systems,

X. Meng, M. A. Saunders, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1109.5981 (2011)
(arXiv),
(code),

SIAM J. Scientific Computing, 36(2), C95C118 (2014)
(pdf).

Fast approximation of matrix coherence and statistical leverage,

P. Drineas, M. MagdonIsmail, M. W. Mahoney, and D. P. Woodruff,

Technical Report, Preprint: arXiv:1109.3843 (2011)
(arXiv),

Proc. of the 29th ICML Conference (2012)
(pdf),

J. Machine Learning Research, 13, 34753506 (2012)
(pdf).

Localization on loworder eigenvectors of data matrices,

M. Cucuringu and M. W. Mahoney,

Technical Report, Preprint: arXiv:1109.1355 (2011)
(arXiv).

Efficient Genomewide Selection of PCACorrelated tSNPs for Genotype Imputation,

A. Javed, P. Drineas, M. W. Mahoney, and P. Paschou,

Annals of Human Genetics, 75, 707722 (2011)
(
pdf).

Randomized Algorithms for Matrices and Data,

M. W. Mahoney,

Foundations and Trends in Machine Learning,
NOW Publishers,
Volume 3, Issue 2, 2011
(now),

TR version:
Technical Report, Preprint: arXiv:1104.5557 (2011)
(arXiv).

(Abridged version in:
Advances in Machine Learning and Data Mining for Astronomy,
edited by
M. J. Way, et al.,
pp. 647672,
2012.)
2010

Computation in LargeScale Scientific and Internet Data Applications is a Focus of MMDS 2010,

M. W. Mahoney,

Technical Report, Preprint: arXiv:1012.4231 (2010)
(arXiv),

Appeared in
SIGKDD Explorations,
SIGACT News,
ASASCGN Newsletter,
and IMS Bulletin.

CUR from a Sparse Optimization Viewpoint,

J. Bien, Y. Xu, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1011.0413 (2010)
(arXiv),

Proc. of the 2010 NIPS Conference
(ps,
pdf).

Algorithmic and Statistical Perspectives on LargeScale Data Analysis,

M. W. Mahoney,

Technical Report, Preprint: arXiv:1010.1609 (2010)
(arXiv),

In:
Combinatorial Scientific Computing,
pp. 427469,
edited by
U. Naumann and O. Schenk,
2012.

Implementing regularization implicitly via approximate eigenvector computation,

M. W. Mahoney and L. Orecchia,

Technical Report, Preprint: arXiv:1010.0703 (2010)
(arXiv),

Proc. of the 28th ICML Conference, 121128 (2011)
(pdf).
(talk).

Approximating HigherOrder Distances Using Random Projections,

P. Li, M. W. Mahoney, and Y. She,

Proc. of the 26th UAI Conference, 312321 (2010)
(ps,
pdf),

Technical Report, Preprint: arXiv:1203.3492 (2012)
(arXiv).

Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving,

P. Drineas and M. W. Mahoney,

Technical Report, Preprint: arXiv:1005.3097 (2010)
(arXiv).

Empirical Comparison of Algorithms for Network Community Detection,

J. Leskovec, K. J. Lang, and M. W. Mahoney,

Technical Report, Preprint: arXiv:1004.3539 (2010)
(arXiv),

Proc. of the 19th International WWW, 631640 (2010)
(ps,
pdf).
2009

A Local Spectral Method for Graphs: with Applications to Improving Graph
Partitions and Exploring Data Graphs Locally,

M. W. Mahoney, L. Orecchia, and N. K. Vishnoi,

Technical Report, Preprint: arXiv:0912.0681 (2009)
(arXiv),

J. Machine Learning Research, 13, 23392365 (2012)
(ps,
pdf).

Unsupervised Feature Selection for the kmeans Clustering Problem,

C. Boutsidis, M. W. Mahoney, and P. Drineas,

Proc. of the 2009 NIPS Conference
(ps,
pdf).

Learning with Spectral Kernels and HeavyTailed Data,

M. W. Mahoney and H. Narayanan,

Technical Report, Preprint: arXiv:0906.4539 (2009)
(arXiv).

Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow,

K. J. Lang, M. W. Mahoney, and L. Orecchia,

Proc. of the 8th International SEA, 197208 (2009)
(ps,
pdf).

CUR Matrix Decompositions for Improved Data Analysis,

M. W. Mahoney and P. Drineas,

Proc. Natl. Acad. Sci. USA, 106, 697702 (2009)
(ps,
pdf).
2008

An Improved Approximation Algorithm for the Column Subset Selection Problem,

C. Boutsidis, M. W. Mahoney, and P. Drineas,

Technical Report, Preprint: arXiv:0812.4293 (2008)
(arXiv),

Proc. of the 20th Annual SODA, 968977 (2009)
(ps,
pdf).

Algorithmic and Statistical Challenges in Modern LargeScale Data Analysis are the Focus of MMDS 2008

M. W. Mahoney, L.H. Lim, and G. E. Carlsson

Technical Report, Preprint: arXiv:0812.3702 (2008)
(arXiv),

Appeared in
SIGKDD Explorations
(ps,
pdf),
SIAM News
(ps,
pdf),
and
ASASCGN Newsletter
(ps,
pdf),
and abridged versions appeared in IMS Bulletin
(ps,
pdf)
and AmStat News.

Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large WellDefined Clusters,

J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,

Technical Report, Preprint: arXiv:0810.1355 (2008)
(arXiv),

Internet Mathematics, 6(1), 29123 (2009)
(pdf).

Unsupervised Feature Selection for Principal Components Analysis,

C. Boutsidis, M. W. Mahoney, and P. Drineas,

Proc. of the 14th Annual SIGKDD, 6169 (2008)
(ps,
pdf).

Statistical Properties of Community Structure in Large Social and Information Networks,

J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,

Proc. of the 17th International WWW, 695704 (2008)
(ps,
pdf).
2007

Faster Least Squares Approximation,

P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlos,

Technical Report, Preprint: arXiv:0710.1435 (2007)
(arXiv),

Numerische Mathematik, 117, 219249 (2011).

PCACorrelated SNPs for Structure Identification in Worldwide Human Populations,

P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. RodriguezCintron, M. W. Mahoney, and P. Drineas,

PLoS Genetics, 3, 16721686 (2007)
(ps,
pdf).

RelativeError CUR Matrix Decompositions,

P. Drineas, M. W. Mahoney, and S. Muthukrishnan,

Technical Report, Preprint: arXiv:0708.3696 (2007)
(arXiv),

SIAM J. Matrix Analysis and Applications, 30, 844881 (2008)
(ps,
pdf).

Feature Selection Methods for Text Classification,

A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney,

Proc. of the 13th Annual SIGKDD, 230239 (2007)
(ps,
pdf).

Sampling Algorithms and Coresets for Lp Regression,

A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney,

Technical Report, Preprint: arXiv:0707.1714 (2007)
(arXiv),

Proc. of the 19th Annual SODA, 932941 (2008)
(ps,
pdf),

SIAM J. Computing, 38, 20602078 (2009)
(ps,
pdf).

Web Information Retrieval and Linear Algebra Algorithms,

A. Frommer, M. W. Mahoney, and D. B. Szyld (Eds.),

Proc. of Dagstuhl Seminar 07071, (2007)
(web).

Intra and interpopulation genotype reconstruction from tagging SNPs,

P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas,

Genome Research, 17(1), 96107 (2007)
(ps,
pdf).
2006

Bridging the Gap Between Numerical Linear Algebra, Theoretical Computer Science, and Data Applications,

G. H. Golub, M. W. Mahoney, P. Drineas, and L.H. Lim,

SIAM News 39:8 October 2006
(ps,
pdf).

Randomized Algorithms for Matrices and Massive Data Sets,

P. Drineas and M. W. Mahoney,

Proc. of the 32nd Annual VLDB, 1269 (2006)
(ps,
pdf).

Subspace Sampling and RelativeError Matrix Approximation: ColumnRowBased Methods,

P. Drineas, M. W. Mahoney, and S. Muthukrishnan,

Proc. of the 14th Annual ESA, 304314 (2006)
(ps,
pdf).

Subspace Sampling and RelativeError Matrix Approximation: ColumnBased Methods,

P. Drineas, M. W. Mahoney, and S. Muthukrishnan,

Proc. of the 10th Annual RANDOM, 316326 (2006)
(ps,
pdf).

TensorCUR Decompositions For TensorBased Data,

M. W. Mahoney, M. Maggioni, and P. Drineas,

Proc. of the 12th Annual SIGKDD, 327336 (2006)
(ps,
pdf),

SIAM J. Matrix Analysis and Applications, 30, 957987 (2008)
(ps,
pdf).

Polynomial Time Algorithm for ColumnRowBased RelativeError LowRank Matrix Approximation,

P. Drineas, M. W. Mahoney, and S. Muthukrishnan,

Technical Report, DIMACS TR 200604 March 2006
(ps,
pdf).

Sampling Algorithms for L2 Regression and Applications,

P. Drineas, M. W. Mahoney, and S. Muthukrishnan,

Proc. of the 17th Annual SODA, 11271136 (2006)
(ps,
pdf).
2005

A Randomized Algorithm for a TensorBased Generalization of the Singular Value Decomposition,

P. Drineas and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1327, June 2005
(ps,
pdf),

Linear Algebra and its Applications, 420, 553571 (2007)
(ps,
pdf).

On the Nystrom Method for Approximating a Gram Matrix for Improved KernelBased Learning,

P. Drineas and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1319, April 2005
(ps,
pdf),

Proc. of the 18th Annual COLT, 323337 (2005)
(ps,
pdf),

J. Machine Learning Research, 6, 21532175 (2005)
(ps,
pdf).
2004

Sampling Subproblems of Heterogeneous MaxCut Problems and Approximation Algorithms,

P. Drineas, R. Kannan, and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1283, April 2004
(ps,
pdf),

Proc. of the 22nd Annual STACS, 5768 (2005)
(ps,
pdf),

Random Structures and Algorithms, 32:3, 307333 (2008)
(ps,
pdf).

Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix,

P. Drineas, R. Kannan, and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1271, February 2004
(ps,
pdf),

SIAM J. Computing, 36, 184206 (2006)
(ps,
pdf).

Fast Monte Carlo Algorithms for Matrices II: Computing LowRank Approximations to a Matrix,

P. Drineas, R. Kannan, and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1270, February 2004
(ps,
pdf),

SIAM J. Computing, 36, 158183 (2006)
(ps,
pdf).

Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication,

P. Drineas, R. Kannan, and M. W. Mahoney,

Technical Report, YALEU/DCS/TR1269, February 2004
(ps,
pdf),

SIAM J. Computing, 36, 132157 (2006)
(ps,
pdf).
2003

Rapid Mixing of Several Markov Chains for a HardCore Model,

R. Kannan, M. W. Mahoney, and R. Montenegro,

Proc. of the 14th Annual ISAAC, 663675 (2003)
(ps,
pdf).
2001

Quantum, Intramolecular Flexibility, and Polarizability Effects on the Reproduction of the Density Anomaly of Liquid Water by Simple Potential Functions,

M. W. Mahoney and W. L. Jorgensen,

J. Chem. Phys., 115, 1075810768 (2001)
(ps,
pdf).

Rapid Estimation of Electronic Degrees of Freedom in Monte Carlo Calculations for Polarizable Models of Liquid Water,

M. W. Mahoney and W. L. Jorgensen,

J. Chem. Phys., 114, 93379349 (2001)
(ps,
pdf).

Diffusion Constant of the TIP5P Model of Liquid Water,

M. W. Mahoney and W. L. Jorgensen,

J. Chem. Phys., 114, 363366 (2001)
(ps,
pdf).
2000

A FiveSite Model for Liquid Water and the Reproduction of the Density Anomaly by Rigid, Nonpolarizable Potential Functions,

M. W. Mahoney and W. L. Jorgensen,

J. Chem. Phys., 112, 89108922 (2000)
(ps,
pdf).
1997

Repression and Activation of PromoterBound RNA Polymerase Activity by Gal Repressor,

H. E. Choy, R. R. Hanger, T. Aki, M. Mahoney, K. Murakami, A. Ishihama, and S. Adhya,

J. Mol. Biol. 272: 293300, 1997
(ps,
pdf).

Discrete Representations of the Protein Calpha Chain,

X. F. de la Cruz, M. W. Mahoney, and B. K. Lee,

Fold. & Des. 2: 223234, 1997
(ps,
pdf).
MMDS Workshops
I run the MMDS meetings.
We started the
MMDS Workshops on
"Algorithms for Modern Massive Data Sets"
to address
algorithmic and statistical
challenges in modern largescale statistical data analysis.

MMDS 2016
took place on the campus of UC Berkeley on June 2124, 2016.
See the
main MMDS web page
for more information.

MMDS 2014
took place on the campus of UC Berkeley on June 1720, 2014.
See the
main MMDS web page
for more information.
Click
here for the entire
video collection.

MMDS 2012
took place on the campus of Stanford University on July 1013, 2012.
For pdfs and videos of the presentations, go to the main MMDS web
page; or click
here for the entire
video collection.

MMDS 2010
took place on the campus of Stanford University on June 1518, 2010.
MMDS 2010 addressed computation in largescale scientific and internet data
applications more generally.
See
the MMDS web page
for details, including articles and all the speaker overheads!

MMDS 2008
took place on June 2528, 2008.
MMDS 2008 grew out of our expectation for what
the algorithmic and statistical
foundations of
largescale
data analysis
should look like a generation from now.
Click
here
for an article that appeared in SIGKDD Explorations and
SIAM News about the meeting.

MMDS 2006
took place on June 2124, 2006.
MMDS 2006 was originally motivated by the complementary perspectives
brought by numerical linear algebra and theoretical computer science to
matrix algorithms in largescale data applications.
Click
here
for an article in SIAM News about the meeting.
These MMDS meetings generated intense interdisciplinary interest and were a
big success  so keep an eye out for future MMDSs!
Talks and presentations
Several recent talks:
Numericallyintensive Machine Learning at Scale
(Fall 2017)
(pdf)
UC Berkeley's FODA Institute: Foundations of Data Analysis
(NSF TRIPODS Kickoff, Oct 2017)
(pdf)
Secondorder Machine Learning
(Fall 2017)
(pdf)
Secondorder Machine Learning
(Prefall 2017)
(pdf)
Local graph analytics: beyond characterizing community structure
(Spring 2017)
(pdf)
Several tutorial presentations:
Randomization in Numerical Linear Algebra: Theory and Practice
(2.0 hr version at SIAM ALA Meeting, October 2015)
(pdf)
Past, Present and Future of Randomized Numerical Linear Algebra:
(3.0 hr version at Simons' Institute 2013 Big Data Bootcamp)
(Part I:
pdf,
ppt
and Part II:
pdf,
ppt)
Theory (and some practice) of Randomized Algorithms for Matrices and Data
(tutorial from FOCS 2012 Workshop)
(pdf,
ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks
(1.5 hr version at SAMSI Opening Workshop 2010, etc.)
(pdf,
ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks
(2 hr version at ICASSP 2011, etc.)
(pdf,
ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks
(3 hr version at ICML 2010 and KDD 2010, etc.)
(pdf,
ppt)
(The pdf file in four pieces:
here,
here,
here,
and
here.)
Randomized Algorithms for Matrices and Massive Data Sets
(at SIAMSDM06 2006 and VLDB 2006)
(ppt)
Randomized Algorithms for Matrices and Massive Data Sets
(at ACMSIGKDD 2005)
(ppt)
Several other older talks:
Terabytescale Computational Statistics
(talk Fall 2016)
(pdf)
Scientific Matrix Factorizations in Spark at Scale: Crossplatform performance, scaling, and comparisons with C+MPI
(talk at 2016 Dato Data Science Summit and elsewhere)
(pdf)
Optimization Algorithms for Analyzing Large Datasets
(talk at 2016 PCMI Summer School)
(pdf)
Foundations of Data Science
(talk at NSF preTRIPODS workshop/meeting, Apr 2016)
(pdf)
SubSampled Newton Methods
(talk at ITA 2016 and elsewhere)
(pdf)
Challenges in Multiresolution Methods for Graphbased Learning
(talk, 3of3, from NIPS15 Workshops)
(pdf)
Using Local Spectral Methods in Theory and in Practice
(talk, 2of3, from NIPS15 Workshops)
(pdf)
Column Subset Selection on Terabytesized Scientific Data
(talk, 1of3, from NIPS15 Workshops)
(pdf)
Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra
(DIMACS, Aug 2015)
(pdf)
Overview of RandNLA: Randomized Numerical Linear Algebra
(pdf)
Treelike structure in social graphs
(pdf (big=28MB),
ppt (big=48MB))
Eigenvector localization, implicit regularization, and algorithmic antidifferentiation for largescale graphs and network data
(pdf,
ppt)
Locallybiased and semisupervised eigenvectors
(talk from MMDS 2014)
(pdf,
ppt)
Implicit regularization in sublinear approximation algorithms
(pdf,
ppt)
BIG Biomedicine and the Foundations of BIG Data Analysis (at Big Data in Biomedicine at Stanford's Medical School, 5/23/14)
(pdf,
ppt)
Revisiting the Nystrom Method for Improved LargeScale Machine Learning
(pdf,
ppt)
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
(version from Simons Big Data Workshop II)
(pdf)
Inputsparsity Time Algorithms for Embeddings and Regression Problems
(talk from Simons Big Data Workshop I)
(pdf)
Randomized Regression in Parallel and Distributed Environments
(talk from GraphLab 2013)
(pdf)
Extracting insight from large networks: implications of smallscale and largescale structure
(pdf,
ppt)
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
(version from MMDS 2012)
(pdf)
Sensors, networks, and massive data
(pdf,
ppt)
Randomized Algorithms for Matrices and Data
(pdf,
ppt)
Approximate computation and implicit regularization in largescale data analysis
(PODS vsn)
(pdf,
ppt)
Approximate computation and implicit regularization in largescale data analysis
(Stats vrsn1)
(pdf,
ppt)
Approximate computation and implicit regularization in largescale data analysis
(Short vrsn)
(pdf,
ppt)
Looking for clusters in your data ... in theory and in practice
(pdf,
ppt)
Fast Approximation of Matrix Coherence and Statistical Leverage
(pdf,
ppt)
Implementing regularization implicitly via approximate eigenvector computation
(pdf,
ppt)
Linear Algebra and Machine Learning of Large Informatics Graphs
(pdf,
ppt)
Geometric Network Analysis Tools
(talk from MMDS 2010)
(pdf,
ppt)
Algorithmic and Statistical Perspectives on LargeScale Data Analysis
(pdf,
ppt)
Community structure in large social and information networks
(newer)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(newer and long)
(pdf,
ppt)
Approximation Algorithms as Experimental Probes of Informatics Graphs
(pdf,
ppt)
Community structure in large social and information networks
(talk from MMDS 2008)
(pdf,
ppt)
Community structure in large social and information networks
(older)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(older and short)
(pdf,
ppt)
Sampling algorithms and coresets for Lp regression and applications
(pdf,
ppt)
CUR Matrix Decompositions for Improved Data Analysis (talk from MMDS 2006)
(pdf,
ppt)
A RelativeError CUR Decomposition for Matrices and Its Data Applications
(pdf,
ppt)
Sampling Algorithms for L2 Regression and Applications
(talk from SODA 2006)
(pdf,
ppt)
Approximating a Gram Matrix for Improved KernelBased Learning
(talk from COLT 2005)
(ps,
pdf)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(newer)
(
pdf,
ppt)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(older)
(
pdf)
CUR Matrix Decomposition with Applications to Algorithm Design and Massive Data Set Analysis
(pdf)
Fast Monte Carlo Algorithms for Massive Data Sets and Approximating MaxCut
(ps,
pdf)
The TIP5P Water talk:
The Computational Statistical Mechanics of Simple Models of Liquid Water
(
pdf)
Videos of several talks:
"Matrix factorizations at scale: a comparison of scientific data analytics on Spark and MPI"
(at the 2016 Dato Data Science Summit).
"Challenges in Multiresolution Methods for Graphbased Learning"
(at the NIPS Workshops, December 2015).
"Using Local Spectral Methods in Theory and in Practice"
(at the NIPS Workshops, December 2015).
"Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra"
(at DIMACS, August 2015).
"Eigenvector localization and implicit regularization for largescale graphs and networked data"
(at CIMAT, March 2015).
"Locallybiased and semisupervised eigenvectors"
(at MMDS 2014, June 2014).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at the SF Bay Area DMSIG Meeting, June 2, 2014).
"BIG Biomedicine and the Foundations of BIG Data Analysis"
(at the Big Data in BioMedicine Conference at Stanford University, May 2014).
"Randomized Matrix Algorithms and Largescale Scientific Data Analysis"
(at the 2014 ASE Conference, May 31, 2014).
"Eigenvector localization, implicit regularization, and algorithmic antidifferentiation for largescale graphs and networked data"
(or try also here for another version)
(at ICERM and IMA, May and April 2014, respectively).
"Randomized matrix algorithms and largescale scientific data analysis"
(at the University of Michigan, April 2014).
"Recent Results in Randomized Numerical Linear Algebra"
(or click here)
(at the NIPS 2013 Workshops, December 2013).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at the Simons Big Data Workshop II, October 2013).
"Inputsparsity Time Algorithms for Embeddings and Regression Problems"
(at the Simons Big Data Workshop I, September 2013).
"Past, Present and Future of Randomized Numerical Linear Algebra:
Part I (PD) and
Part II" (MM)
(at the Simons Big Data Bootcamp, September 2013).
"Randomized Regression in Parallel and Distributed Environments"
(at the GraphLab Workshop, July 2013).
"Sensors, networks, and massive data"
or click
here
(at Kavli FoS, November 2012).
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments"
(at MMDS 2012, July 2012).
"Extracting insight from large networks: implications of smallscale and largescale structure"
(at the University of Marlyand, April 2012).
"Fast Approximation of Matrix Coherence and Statistical Leverage"
(at the NIPS Workshops, December 2011).
"Approximate computation and implicit regularization in largescale data analysis"
or click
here
(at the Workshop on Beyond WorstCase Analysis, September 2011).
"Linear Algebra and Machine Learning of Large Informatics Graphs"
(at the NIPS Workshops, December 2010).
"Geometric Tools for Identifying Structure in Large Social and Information Networks"
(90 minute version, tutorial at SAMSI Opening workshop, August 2010).
"Geometric Tools for Graph Mining of Large Social and Information Networks"
(3 hour version, tutorial at KDD 2010, July 2010).
"Community Structure in Large Social and Information Networks"
(at the Newton Institute in Cambridge, June 2010).
"Algorithmic and Statistical Perspectives on LargeScale Data Analysis"
(at the SF Bay Area DMSIG Meeting, February 2010).
"Community Structure in Large Social and Information Networks,"
in avi
or mpg (note: slow to download),
(at the 2009 IIT Kanpur Processing Massive Data Sets Workshop, December 2009).
"Statistical Leverage and Improved Matrix Algorithms"
(at the 2009 ICML Workshops, June 2009).
"Sampling Algorithms and Coresets for Lp Regression and Applications,"
in mpg (note: slow to download),
(at the 2006 IIT Kanpur Data Streams Workshop, December 2006).
"CUR Matrix Decompositions for Improved Data Analysis"
(at Johns Hopkins University, March 2006).
"Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis"
(at the 2005 IPAM Summer School, July 2005).
My old web page
at Yale has additional (somewhat outdated) information;
and my less old web page
at Stanford has additional (somewhat less outdated) information;
but you should click on them (or Google search for them) if you are interested in
extremely large homes.
