# Possible Topics

The following is one possible arrangement of topics that we might cover in the 4 days. The precise format in which we discuss these topics (lectures, discussions about how to teach them, etc.) is yet to be decided.

In addition to adding new topics and ranking the topics in terms of importance and relevance, let's also add links to descriptions of case studies/exercises/homeworks/examples/practicals that we might use for the lab sessions for each of these topics. In many cases, these will cover more than one topic, so please add the links under each of the relevant topics.

### Goals of a Statistical Computing Class

• What are the main and subsidiary goals - understanding of computing, foundation for work in other classes, professional skills.
• What are core elements of the class and what are optional and for which audiences/category of students.

• Writing a Syllabus
• Identifying homeworks, exercises, examples and projects and sources of interesting problems and data.
• Grading (exams, projects, etc.) and group work, sharing of code.
• Students installing software
• Bulletin-Boards, chat rooms, mailing lists.
• Installing R and packages.
• Printing a Word document as PDF.
• X server for displaying graphics.
• Creating class accounts.

### Basic Computing Concepts

• Folder
• E-mail and attachments
• Editors - TINN, Emacs, NotePad.
• Shell tools
• Digital information - bits, bytes, characters, numbers
• Submitting homework
• HTML, Word, LaTeX

### R programming

• Basic data structures
• Vectorization
• Subsetting
• Recycling Rule.
• Invoking functions
• Data input
• Control Flow
• Writing Functions
• Designing functions and software development
• Debugging
• Profiling
• OOP - S3 classes
• Rainfall data (Doug Nychka)
• Traffic data (John Rice)
• Supernovae (Juan Meza)
• Random number generation - Acceptance/Rejection sampling
• Fibonacci sequence
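
As one possible starting point for a lab exercise on the recycling rule and the Fibonacci sequence, here is a minimal sketch in base R. The function name `fib` and the small vectors are made up for illustration and are not taken from any of the case studies listed above.

```r
# Recycling rule: the shorter vector is repeated to match the longer one.
x <- 1:6
y <- c(10, 20)
x + y    # 11 22 13 24 15 26

# Fibonacci sequence: first n terms, using a preallocated vector and a loop.
# 'fib' is a hypothetical helper name chosen for this sketch.
fib <- function(n) {
  stopifnot(n >= 1)
  f <- numeric(n)
  f[seq_len(min(n, 2))] <- 1      # the first one or two terms are 1
  if (n > 2) {
    for (i in 3:n) f[i] <- f[i - 1] + f[i - 2]
  }
  f
}

fib(10)  # 1 1 2 3 5 8 13 21 34 55
```

A natural follow-up discussion is why the loop is hard to vectorize here (each term depends on the previous two), in contrast to the element-wise addition above.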

### Graphics

• Principles of statistical graphics
• Basic graphics model in R (grz)
• Lattice
• Animation
• Interactive Graphics
• Mashups and using other technologies
• Election maps
• Napoleon's march
• NASA Environmental data from ASA Data Expo
• Baseball data
• manyeyes.com and swivel.com

### Simulation

• Random number generation algorithms
• Markov Chain Monte Carlo (MCMC)
• Computer Experiments
• Acceptance/Rejection sampling Beta(a, b)
• 2D Acceptance/Rejection sampling for an ad hoc network.
• Birth/Death process
• 2D reinforcing random walk
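
For the Beta(a, b) acceptance/rejection exercise, one way it might be set up is sketched below. This version assumes a > 1 and b > 1 (so the density is bounded with an interior mode) and uses a Uniform(0, 1) proposal; both are assumptions of this sketch, and the name `rbeta_ar` is hypothetical.

```r
# Acceptance/rejection sampler for Beta(a, b) with a Uniform(0, 1) proposal.
# Assumes a > 1 and b > 1 so the target density has a finite maximum at its mode.
rbeta_ar <- function(n, a, b) {
  stopifnot(a > 1, b > 1)
  mode <- (a - 1) / (a + b - 2)
  M <- dbeta(mode, a, b)          # envelope height: maximum of the target density
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n)                 # candidate values from the proposal
    u <- runif(n)                 # uniforms used for the accept/reject decision
    out <- c(out, x[u * M <= dbeta(x, a, b)])
  }
  out[seq_len(n)]
}

set.seed(1)
s <- rbeta_ar(5000, 2, 5)
mean(s)   # should be close to a / (a + b) = 2/7
```

Students could compare a histogram of `s` against `dbeta` or against draws from `rbeta`, and estimate the acceptance rate, which is 1/M for this proposal.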

### External Data Formats - Text

• String manipulation
• Regular expressions
• Shell tools: commands, pipes, variables, regular expressions and globbing.
• Mixing R and the shell: where to do the computations
• State of the Union address
• Web logs
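
To give a flavor of the regular expression material for the web log case study, here is a small sketch using only base R. The log line is invented for illustration (in the style of the Common Log Format), and the field names are our own choice rather than anything fixed by the data.

```r
# A made-up log line; real web logs may differ in layout.
line <- '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

# One capture group per field of interest.
pattern <- '^([^ ]+) [^ ]+ [^ ]+ \\[([^]]+)\\] "([A-Z]+) ([^ ]+) [^"]*" ([0-9]{3}) ([^ ]+)$'

# regexec() returns the match positions; regmatches() extracts the substrings.
m <- regmatches(line, regexec(pattern, line))[[1]]

# Drop the full match (first element) and name the captured fields.
fields <- setNames(m[-1], c("host", "time", "method", "path", "status", "bytes"))
fields[["status"]]   # "200"
```

The same extraction could be done with shell tools before the data reach R, which ties in with the question of where to do the computations.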

### Databases and SQL

• Client Server model
• The relational model and algebra
• The Structured Query Language
• Baseball data
• TCP/IP data

### XML

• What is XML
• Strategies for parsing XML
• Exporting/Generating XML (e.g. Google Earth)
• Statistics journal bibliographies
• Elephant seal migration animation on Google Earth

### Computational Statistics Methods

• Bootstrapping
• Cross Validation
• Naive Bayes
• Nearest neighbor methods
• CART
• SVM
• Clustering
• Spline smoothing
• State of the Union (Project Gutenberg)
• Spam filtering (Spam Assassin)
• Geo-location in wireless networks (CRAWDAD)
• Bootstrapping 1/median(X)
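
The 1/median(X) bootstrap exercise might look roughly like the following. The data here are simulated from an exponential distribution purely as a placeholder, and the number of resamples `B` is an arbitrary choice for this sketch.

```r
set.seed(42)
x <- rexp(100, rate = 2)            # placeholder data, not from a case study

# Statistic of interest: the reciprocal of the sample median.
stat <- function(v) 1 / median(v)

B <- 2000                           # number of bootstrap resamples (arbitrary)
boot <- replicate(B, stat(sample(x, replace = TRUE)))

se <- sd(boot)                              # bootstrap standard error
ci <- quantile(boot, c(0.025, 0.975))       # percentile interval
```

This statistic is a useful teaching example because there is no simple closed-form standard error, so students can see what the bootstrap buys them.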

### Other Languages and Systems

Using other languages to do different processing outside of R
• Python and Perl
• C/C++
• Java
• Excel

Duncan Temple Lang <duncan@wald.ucdavis.edu>