This course is an introduction to the theory and application of statistical methods. The topics to be covered are fundamental concepts of mathematical statistics, including survey sampling, estimation and hypothesis testing, topics in descriptive statistics and data analysis, with particular emphasis on graphical displays, aspects of experimental design, and a variety of applications. We will cover most of the material in chapters 7-14 of the text. The computer will play a key role in the course, although no prior experience is assumed; the open source statistical package R will be used in labs to analyze real data sets and conduct simulations. These concrete activities will be a valuable complement to the lectures.

Prerequisites: Calculus and linear algebra; Statistics 134 or an equivalent course in probability theory.

Text: J. A. Rice. *Mathematical Statistics and Data Analysis*, 3rd Edition. We will cover most of chapters 7-14.

Library reserves: I put two optional books on reserve. I recommend that you check them out sometime during the semester.

*Statistics: A Guide to the Unknown* is a collection of essays describing how statistics is used in a wide variety of fields. *The Lady Tasting Tea* is a history of how statistical methods developed during the 20th century, written for a general audience.

John Rice

Office: 425 Evans Hall

Phone: 642-6930

Email: rice AT stat.berkeley.edu

url: www.stat.berkeley.edu/~rice

Office Hours: Wed 10-12

Mike Higgins

Office: 335 Evans

Email: mjh4646 AT stat.berkeley.edu

Office Hours: Monday 3-4, Tuesday 1-2, Thursday 1-2. 307 Evans

Tu-Th 2:00-3:30. 2 LeConte

I don't allow cell phones or laptops in lecture. However, if you wish to use a laptop to take notes, please speak to me.

Friday 12:00-1:00 and 2:00-3:00, 332 Evans. The section meeting will be used to teach and help with computer assignments and to review course material.

The lab website is on bSpace.

Grades will be based on a midterm, a final exam, homework, and labs.

The *midterm* will count for 25% of your grade; the score on the midterm will be replaced by the score on the final if the latter is higher. There will be no makeup midterms: if you miss the midterm, the score on it will be your score on the final. The midterm is scheduled for Oct 7.

The *final exam* will count for 40%. It will be on Monday, Dec 15, 12:30-3:30 pm. There will be no alternative times, so if you can't take the exam at this time, don't take the course.

*Homework* will be assigned every week and will count for 20%. Your two lowest homework grades will be dropped. Assignments will be posted below. Homework will be collected in class on Thursdays.

There will be several *labs*; this component of the course will count for 15%. They require data analysis using the statistical software R (see links below).
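To make the weighting concrete, here is a minimal sketch in R of how the four components might be combined. It assumes every score is expressed on a 0-100 scale and that homework and lab scores are averaged with equal weight; neither detail is stated above, and the function name `course_score` and the example scores are made up for illustration.

```r
# Hypothetical helper: combine component scores into a weighted average.
# Assumptions (not stated in the syllabus): all scores are on a 0-100 scale,
# and homework and labs are each averaged with equal weight per assignment.
course_score <- function(midterm, final, homework, labs) {
  # The midterm is replaced by the final if the final is higher;
  # a missed midterm (NA) is likewise replaced by the final.
  midterm_used <- max(c(midterm, final), na.rm = TRUE)

  # Drop the two lowest homework scores before averaging.
  stopifnot(length(homework) > 2)
  hw_avg <- mean(sort(homework)[-(1:2)])

  lab_avg <- mean(labs)

  0.25 * midterm_used + 0.40 * final + 0.20 * hw_avg + 0.15 * lab_avg
}

# Made-up scores: the final (80) beats the midterm (60), so it replaces it.
example <- course_score(midterm = 60, final = 80,
                        homework = c(90, 80, 100, 40, 0, 95),
                        labs = c(85, 95))
print(example)
```

Passing `midterm = NA` models a missed midterm, in which case the final effectively counts for 65%.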

You are encouraged to work together with others on the homework, but you must write up your own solutions. The same applies to labs -- you must ultimately do your own computing and writing. So, for example, if a lab assignment involved taking a random sample, your random sample had better not be identical to any other in the class. No collaboration is allowed on exams. Cheating will be taken seriously and the penalties will be severe.

- The Bureau of the Census
- Gallup Polls.
- Interested in data? Check out swivel.com
- The Data and Story Library contains examples of the use of elementary statistics in many fields
- StatLib. A collection of data, software, news on Statistics, and other links
- UC Irvine Machine Learning Repository
- Completely meaningless randomly generated essays produced by the Postmodern Generator.
- Chance News. The aim of Chance is to make students more informed, and critical, readers of current news that uses probability and statistics as reported in daily newspapers such as "The New York Times" and current journals and magazines such as "Chance", "Science", "Nature", and the "New England Journal of Medicine".
- The R Project for Statistical Computing. You can download the software we will use for this class from this site. The site also has other information, including documentation. Here is a short introduction and a reference card for R. Here is an introduction written by two Berkeley undergraduates.
- An interesting article on why the polls for the presidential election differ.
- A discussion of the methodology of the Gallup Polls.
- Data from the textbook.
- Errata for the textbook
- The placebo effect
- Controlled experiments in the social sciences
- The American Statistical Association has information about careers in statistics and about internships
- Statistics Undergraduate Research Seminar

Demos on sampling and the data

Demo with globe

Bayes demo. I wasn't able to do this in class. I had intended to show the effects of changing the prior parameters, the number of trials and the number of successes
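For anyone who wants to try this on their own, the following is a rough sketch rather than the demo itself. It assumes the standard conjugate setup, which this page does not spell out: a Beta(a, b) prior on the success probability and x successes in n binomial trials, giving a Beta(a + x, b + n - x) posterior. The function name `beta_posterior` and the numbers are illustrative only.

```r
# Assumed model (not stated on this page): Beta(a, b) prior on the success
# probability, x successes observed in n binomial trials.
beta_posterior <- function(a, b, n, x) {
  a_post <- a + x        # prior "successes" plus observed successes
  b_post <- b + n - x    # prior "failures" plus observed failures
  list(a = a_post,
       b = b_post,
       mean = a_post / (a_post + b_post),
       interval = qbeta(c(0.025, 0.975), a_post, b_post))
}

# Same data, two priors: a flat prior and one concentrated near 1/2.
flat  <- beta_posterior(a = 1,  b = 1,  n = 20, x = 14)
tight <- beta_posterior(a = 10, b = 10, n = 20, x = 14)

# Same flat prior, ten times as much data with the same success rate.
more_data <- beta_posterior(a = 1, b = 1, n = 200, x = 140)

print(c(flat = flat$mean, tight = tight$mean, more_data = more_data$mean))
```

The concentrated prior pulls the posterior mean toward 1/2, while more trials move it toward the observed proportion and narrow the interval.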

Midterm from 2007 with solutions

Old final exam and solutions

Midterm Solutions. Check the solutions against your answers and if you are still confused go over them with Mike or me.

Score distribution on the midterm (there were 26 points possible):

    > stem(mid,scale=2); summary(mid)

      The decimal point is at the |

       7 | 0
       8 | 00
       9 |
      10 | 00
      11 | 00
      12 | 000
      13 | 0
      14 | 0000
      15 | 0
      16 | 00
      17 | 00000
      18 | 0
      19 | 00000
      20 | 000
      21 | 0000
      22 | 00000000000
      23 | 00000
      24 | 000000
      25 | 0000000000
      26 | 000000000

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       7.00   17.00   22.00   20.05   24.00   26.00

I see three possible explanations for this skewed distribution: (1) many students studied very hard, (2) I did a great job of teaching, (3) the midterm was too easy. I don't assign letter grades to individual components of the course. At the end of the semester the scores on components are combined in a weighted average and then letter grades are assigned. The midterm counts for 25%.
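Every leaf in the stem-and-leaf display is 0, so the scores appear to be whole numbers and can be read back off the display by counting leaves. The sketch below rebuilds the score vector that way and checks it against the printed summary. The variable names are illustrative, and the counts were transcribed by hand from the display.

```r
# Rebuild the midterm scores from the stem-and-leaf display:
# each stem is a score, and the number of leaves is how many students got it.
scores <- 7:26
counts <- c(1, 2, 0, 2, 2, 3, 1, 4, 1, 2, 5, 1, 5, 3, 4, 11, 5, 6, 10, 9)
mid <- rep(scores, times = counts)

length(mid)    # number of students in the display
summary(mid)   # compare with the summary printed above

# The mean falls below the median, as expected when the long tail
# is on the low side (a left-skewed distribution).
mean(mid) < median(mid)

# Redraw the display
stem(mid, scale = 2)
```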

Class demos of chi-square tests: geissler.R geissler.txt cont-table.R delinq.txt smokepreg.txt

Class demos of two sample tests: ozone.R ozonecontrol.csv ozonetreat.csv calcium.R calcium.csv

Baseball data shown in class: baseball.R obp_nl.txt

Solutions to final exam

Week of Aug 27: Review 4.3; 7.1-7.2

Week of Sept 3: 7.2-7.3

Week of Sept 10: 7.3.3; Begin Chapter 8. (I will not cover 7.4-7.5, but read 7.6)

Week of Sept 17: 8.1 - 8.5

Week of Sept 24: Finish 8.5; start 8.6

Week of Oct 1:

Week of Oct 6: midterm; 9.1-9.2

Week of Oct 13: 9.3 - 9.5

Week of Oct 20: 9.5; 13.1-13.4

Week of Oct 27: 11.1-11.3

Week of Nov 3: 11.3-11.4

Week of Nov 10: begin chapter 14

Week of Nov 17:

Week of Nov 24:

Week of Dec 1:

Homework will be due in class on Thursday unless otherwise specified. Late assignments will not be accepted. The list of homework assignments and due dates follows. Show your work.

Sept 4: no homework

Sept 11: Chapter 7: 2, 4, 8, 11, 28, 32

Sept 18: Chapter 7: 14, 16, 18, 24, 34. Chapter 8: 4ac, 7ab

Sept 25: Chapter 8: 4d, 11, 14, 16ab, 18abc, 24, 28, 30

Oct 2: Chapter 8: 4e, 25, 29, 58abc, 63, 64

Oct 9: no assignment

Oct 16: Chapter 9: 4, 6, 18, 20

Oct 23: Chapter 9: 24, 26, 30, 33, 38

Oct 30: Chapter 9: 40; Chapter 13: 2, 6, 24abc, 28

Nov 6: Chapter 11: 8, 16, 18, 24, 39

Nov 13: Chapter 11: 23ab, 34, 45b, 47c (in these two problems you are not required to analyze the data), 52bcgh

Nov 20: Chapter 14: 1, 10, 11, 12, 15, 22, 23, 25

Dec 2 (*Tuesday*): Chapter 14: 4, 6, 8, 16a, 19

Dec 9 (*Tuesday*): Chapter 14: 26, 27, 30, 32