Statistical leverage, spectral regularization, and large data set applications

Michael W. Mahoney, Yahoo Research

Several recent results in the theory of algorithms provide a novel perspective on classical statistical concepts, and this perspective has led to new applications in large-scale statistical data analysis. Two examples will be described. First, the statistical concept of leverage, widely used in diagnostic regression analysis, can be used to prove improved worst-case performance guarantees for fundamental matrix problems such as linear least squares approximation and the problem of choosing the best set of exactly k columns from a large matrix. Applications to feature selection and large-scale computing will be described. Second, the concept of regularization, often introduced explicitly as a smoothness assumption or a norm bound, is performed implicitly by certain spectral approximation algorithms for intractable combinatorial problems. For example, for the minimum conductance cut problem, widely used in clustering and community detection in large sparse graphs, local spectral methods can be used not only to obtain an efficient approximation algorithm but also to probe the graph for regularized clusters that are more meaningful in applications. Applications of this idea to characterizing the presence or absence of small-scale and large-scale community structure in modern social and information networks, a ubiquitous problem in many Internet applications, will be described.
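
To make the first topic concrete, the following is a minimal sketch of statistical leverage scores (the diagonal of the hat matrix) and of leverage-based row sampling for least squares, in the spirit of the randomized algorithms the abstract alludes to. The synthetic matrix, noise level, and sample size r are illustrative assumptions, not details from the talk.

import numpy as np

def leverage_scores(X):
    """Leverage scores of the rows of X: the diagonal of the hat matrix
    H = X (X^T X)^{-1} X^T, computed stably as the squared row norms of
    U from a thin SVD of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def sampled_least_squares(X, y, r, rng):
    """Approximate min_b ||X b - y||_2 by solving the problem restricted
    to r rows sampled with probability proportional to leverage."""
    p = leverage_scores(X)
    p /= p.sum()
    idx = rng.choice(len(y), size=r, replace=True, p=p)
    w = 1.0 / np.sqrt(r * p[idx])              # importance-sampling rescaling
    b, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 20))            # illustrative synthetic data
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(5000)

b_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
b_approx = sampled_least_squares(X, y, r=500, rng=rng)
print(np.linalg.norm(X @ b_exact - y))         # optimal residual
print(np.linalg.norm(X @ b_approx - y))        # near-optimal residual

Sampling proportional to leverage, rather than uniformly, is the point of the design: high-leverage rows are exactly the ones a uniform sample is likely to miss, and weighting by them is what yields the worst-case relative-error guarantees mentioned above.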
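For the second topic, the sketch below computes the conductance objective and runs a global spectral sweep cut over the second eigenvector of the normalized Laplacian. The talk concerns local spectral variants of this idea; the global sweep is only meant to illustrate the objective, and the two-clique test graph is an illustrative assumption.

import numpy as np

def conductance(A, S):
    """phi(S) = cut(S, S_bar) / min(vol(S), vol(S_bar)) for adjacency A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[S] = True
    cut = A[mask][:, ~mask].sum()
    return cut / min(A[mask].sum(), A[~mask].sum())

def sweep_cut(A):
    """Global spectral sweep: order vertices by the (degree-scaled) second
    eigenvector of the normalized Laplacian; return the best prefix set."""
    d = A.sum(axis=1)
    Dis = np.diag(d ** -0.5)
    L = np.eye(len(d)) - Dis @ A @ Dis          # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                 # eigenvalues ascending
    order = np.argsort(Dis @ vecs[:, 1])        # Fiedler ordering
    phis = [conductance(A, order[:k]) for k in range(1, len(d))]
    k = int(np.argmin(phis)) + 1
    return set(order[:k].tolist()), phis[k - 1]

# Two 5-cliques joined by a single edge: the sweep should cut the bridge.
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
S, phi = sweep_cut(A)
print(sorted(S), phi)

The sweep's best cut carries Cheeger-style guarantees relative to the graph's minimum conductance. Local variants instead explore only a small neighborhood of a seed vertex, which is what makes them usable on large sparse graphs and, as the abstract notes, gives them their implicit regularization properties.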