Possible Topics

The following is one possible arrangement of topics that we might cover in the 4 days. The precise format in which we discuss these topics (lectures, discussions about how to teach them, etc.) is yet to be decided.

In addition to adding new topics and ranking the topics by importance and relevance, let's also add links to descriptions of case studies, exercises, homeworks, examples and practicals that we might use in the lab sessions for each topic. In many cases these will cover more than one topic, so please add the links under each relevant topic.

Goals of a Statistical Computing Class

Administrative Details

Basic Computing Concepts

R programming

  • Basic data structures
  • Vectorization
  • Subsetting
  • Recycling rule
  • Invoking functions
  • Data input
  • Control Flow
  • Writing Functions
  • Designing functions and software development
  • Debugging
  • Profiling
  • OOP - S3 classes
  • Rainfall data (Doug Nychka)
  • Traffic data (John Rice)
  • Supernovae (Juan Meza)
  • Random number generation - Acceptance/Rejection sampling
  • Fibonacci sequence

Graphics

  • Principles of statistical graphics
  • Basic graphics model in R (grz)
  • Lattice
  • Animation
  • Interactive Graphics
    Mashups and using other technologies
  • Election maps
  • Napoleon's march
  • NASA Environmental data from ASA Data Expo
  • Baseball data
  • manyeyes.com and swivel.com

Simulation

  • Random number generation algorithms
  • Markov Chain Monte Carlo (MCMC)
  • Computer Experiments
  • Acceptance/Rejection sampling Beta(a, b)
  • 2D Acceptance/Rejection sampling for an ad hoc network
  • Birth/Death process
  • 2D reinforcing random walk
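
As a starting point for the Beta(a, b) acceptance/rejection exercise listed above, here is a minimal sketch. It is written in Python for illustration only; the same steps carry over directly to R. The function names, the Uniform(0, 1) proposal and the restriction to a > 1 and b > 1 are assumptions made for this sketch, not part of the planned exercise.

```python
import random


def beta_kernel(x, a, b):
    """Unnormalized Beta(a, b) density on [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1)


def sample_beta_rejection(a, b, n, rng=None):
    """Draw n Beta(a, b) variates by rejection from a Uniform(0, 1) proposal.

    Assumption for this sketch: a > 1 and b > 1, so the density is bounded
    and attains its maximum at the interior mode (a - 1) / (a + b - 2).
    """
    if a <= 1 or b <= 1:
        raise ValueError("this sketch assumes a > 1 and b > 1")
    rng = rng or random.Random()
    mode = (a - 1) / (a + b - 2)
    bound = beta_kernel(mode, a, b)  # height of the flat envelope
    draws = []
    while len(draws) < n:
        x = rng.random()  # candidate from the proposal
        u = rng.random()  # uniform height under the envelope
        if u * bound <= beta_kernel(x, a, b):
            draws.append(x)  # accept: the point lies under the target kernel
    return draws
```

Because only the ratio of the kernel to its maximum is used, the normalizing constant of the Beta density is never needed. A natural follow-up question for a lab is how the acceptance rate changes with a and b.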

External Data Formats - Text

  • String manipulation
  • Regular expressions
  • Shell tools
    commands, pipes, variables, regular expressions and globbing.
  • Mixing R and the shell
    where to do the computations
  • State of the Union address
  • Web logs

Databases and SQL

  • Client Server model
  • The relational model and algebra
  • The Structured Query Language
  • Baseball data
  • TCP/IP data

XML

  • What is XML
  • Strategies for parsing XML
  • Exporting/Generating XML (e.g. Google Earth)
  • Statistics journal bibliographies
  • Elephant seal migration animation on Google Earth
  • Google Page Rank

Computational Statistics Methods

  • Bootstrapping
  • Cross Validation
  • Naive Bayes
  • Nearest neighbor methods
  • CART
  • SVM
  • Clustering
  • Spline smoothing
  • State of the Union (Project Gutenberg)
  • Spam filtering (Spam Assassin)
  • Geo-location in wireless networks (CRAWDAD)
  • Bootstrapping 1/median(X)
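
The bootstrap exercise for 1/median(X) listed above might be sketched as follows. This is Python for illustration only and translates directly to R. The function names, the default number of resamples and the assumption that the data are positive (so the median is never zero) are choices made for this sketch.

```python
import random
import statistics


def bootstrap_inverse_median(data, n_boot=2000, rng=None):
    """Return n_boot bootstrap replicates of 1 / median(X).

    Assumption for this sketch: the data are positive, so no resample
    has a zero median.
    """
    rng = rng or random.Random()
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(1.0 / statistics.median(resample))
    return replicates


def bootstrap_summary(data, n_boot=2000, rng=None):
    """Point estimate of 1 / median(X) with its bootstrap standard error."""
    replicates = bootstrap_inverse_median(data, n_boot, rng)
    return {
        "estimate": 1.0 / statistics.median(data),
        "std_error": statistics.stdev(replicates),
    }
```

Calling `bootstrap_summary(data, rng=random.Random(0))` with a fixed seed makes the result reproducible, which is convenient when students compare answers.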

Other Languages and Systems

Using other languages to do different processing outside of R
  • Python and Perl
  • C/C++
  • Java
  • Excel

