Statistical Computing Materials

Introduction to Parallel Programming for Statisticians

Over the last few years, I've presented a variety of workshops on parallel programming in both shared and distributed memory contexts. These have focused on R, C/C++, Python, and Matlab. The most recent workshops have been:

March 2015: A workshop on parallel programming in R, including some general guidelines on parallel programming and information on using Amazon's EC2 to run virtual Linux machines and clusters. Materials are available here:

  • [SCF webpage]
  • [GitHub]
  • [via a git clone: git clone https://github.com/berkeley-scf/parallelR-biostat-2015]

October 2014: A workshop on parallel programming in R, C/C++, Matlab, and Python, including information on using the SCF Linux cluster. Materials are available here:

  • [SCF webpage]
  • [GitHub]
  • [via a git clone: git clone https://github.com/berkeley-scf/parallel-workshop-2014]

Introduction to MapReduce using Spark for Statisticians

November 2014: I presented a series of two workshops on using Spark and MapReduce for distributed computation. The materials are available here:

  • [SCF webpage]
  • [GitHub]
  • [via a git clone: git clone https://github.com/berkeley-scf/spark-workshop-2014]

The demo is in the file spark.html in the repository.

Introduction to GPU Computation for Statisticians

April 2014: I presented a workshop on the basics of using GPUs for statistical computation via C, R, and Python. The materials are available here:

  • [SCF webpage]
  • [GitHub]
  • [via a git clone: git clone https://github.com/berkeley-scf/gpu-workshop-2014]

The demo is in the file gpu.html in the repository.

R Bootcamp: An Intensive Introduction to R

August 2014: With co-sponsorship from D-Lab and help from a number of others, I presented an intensive two-day introduction to R for members of the Berkeley campus, reprising the first bootcamp, held in August 2013. Materials are available here:

Introduction to C++, Calling C++ from R, and Creating R Packages

April/May 2013: I presented a series of three workshops. Day 1 focused on the basics of C++ as an introduction for statisticians. Day 2 focused on calling C/C++ from R. Day 3 focused on the structure and creation of R packages.

Introduction to Cloud Computing for Statisticians

February 2013: I've prepared some material on using Amazon's EC2 and Google Compute Engine, with a focus on using virtual clusters for statistical computations. The documents are oriented around accessing cloud resources for Berkeley Statistics users, but most of the information is generalizable. WARNING: As of January 2014, the Google information is out of date because Google has made a number of changes to Google Compute Engine; the information here may still be useful as a general overview.

Solving systems of equations, generating multivariate normal draws, and inverting matrices efficiently in R

Please see my technical vignette on this topic.

Introduction to Statistical Computing (Stat 243)

I have been teaching the graduate-level introduction to statistical computing here in the Statistics Department at Berkeley. Please see my teaching page for details and a full set of course materials.

Last updated: March 2015.