Stat260/CompSci294:
Randomized Algorithms for Matrices and Data


Instructor: Michael Mahoney
  • Email: mmahoney ATSYMBOL cs.stanford.edu
  • Office hours: By appointment.
  • Office is on the third floor of Calvin Hall.

    Teaching Assistant: Yuchen Zhang
  • Email: yuczhang ATSYMBOL eecs.berkeley.edu
  • Office hours: Fri 5:00-6:30pm, SODA alcove 411

    Class time and Location:
  • Mon-Wed 5:00-6:30pm, in 3109 Etcheverry Hall, on the UC Berkeley campus. (First meeting is Wed Sept 4, 2013.)


    Notes:
  • (12/31) My scribed versions of the class lectures are available below. Feedback welcome!
  • (11/15) Final project presentations will be in class on Dec 11th.
  • (11/15) There will be no class Dec 9th due to the NIPS workshops many people will be attending.
  • (11/1) Most of you have already coordinated with me regarding the final project. If you haven't done so already, then you should do so ASAP.
  • (10/16) The second homework may be found here; it is due 11/13/13.
  • (10/13) The description for the final project is posted here. I don't have regular office hours, but I do have them by appointment, so let me know if you want to meet as you formulate the project.
  • (9/23) Planning ahead, we will not be having class on the Wednesday immediately prior to Thanksgiving.
  • (9/23) We have started posting the scribed notes below. In general, we will get them up about a week after the class. They have not been thoroughly checked for completeness/correctness, so use them as a starting point to complement the readings.
  • (9/20) The list to sign up to scribe is here. For those who do not sign up by next week, we will assign the remaining students to the remaining classes randomly.
  • (9/17) The first homework may be found here; it is due 10/09/13.
  • (9/11) Please sign up here for a class to scribe, and coordinate with Yuchen as necessary. Depending on enrollment numbers, whether you are taking the class for a grade or P/F, etc., everyone will have to scribe one or two classes.
  • (9/9) Starting on Wed 9/11, the class will meet in 3109 Etcheverry Hall.
  • (9/9) A template in tex for scribing the lectures can be found here, and what it should look like when it is compiled can be seen here.
  • (8/30) The class is oversubscribed. If you decide to drop the class, please un-register immediately so that another student can be admitted. Some additional spaces will be made available, but if the class remains full, it may be necessary to limit enrollment.
  • (8/30) All students, including auditors, are requested to register for the class. Auditors should register S/U; an S grade will be awarded for class participation and satisfactory scribe notes.
  • (8/15) This class will be taught at UC Berkeley, not Stanford, during Fall 2013. I will be at Berkeley this fall as part of the program on "Theoretical Foundations of Big Data Analysis," to be held at the Simons Institute.


    Course description: Matrices are a popular way to model data (e.g., term-document data, people-SNP data, social network data, machine learning kernels, and so on), but the size-scale, noise properties, and diversity of modern data present serious challenges for many traditional deterministic matrix algorithms. The course will cover the theory and practice of randomized algorithms for large-scale matrix problems arising in modern massive data set analysis (i.e., Randomized Numerical Linear Algebra). Topics to be covered include: underlying theory, including the Johnson-Lindenstrauss lemma, random sampling and projection algorithms, and connections between representative problems such as matrix multiplication, least-squares regression, least-absolute deviations regression, low-rank matrix approximation, etc.; numerical and computational issues that arise in practice in implementing algorithms in different computational environments; machine learning and statistical issues, as they arise in modern large-scale data applications; and extensions/connections to related problems as well as recent work that builds on the basic methods. Appropriate for advanced graduate students in computer science, statistics, and mathematics, as well as computationally-inclined students from application domains.

    Prerequisites: General mathematical sophistication; and a solid understanding of Algorithms, Linear Algebra, and Probability Theory, at the advanced undergraduate or beginning graduate level, or equivalent.

    Course requirements: Most likely, three homeworks (ca. 15-20% each), scribing a lecture (ca. 10%), and a major project (ca. 40%).


    Primary references: Much of the material has not worked its way into textbooks, and thus we will be reading reviews and primary sources. Here are a few articles that should give you an idea of some of the topics. Additional articles for particular topics and particular classes are listed below.

    Lectures:

    (My scribed versions of the lectures are included below.)