This course is an introduction to the theory and application of statistical methods. The topics to be covered are fundamental concepts of mathematical statistics, including survey sampling, estimation and hypothesis testing, topics in descriptive statistics and data analysis, with particular emphasis on graphical displays, aspects of experimental design, and a variety of applications. We will cover most of the material in chapters 7-14 of the text. The computer will play a key role in the course, although no prior experience is assumed; the open source statistical package R will be used in labs to analyze real data sets and conduct simulations. These concrete activities will be a valuable complement to the lectures.
Prerequisites: Calculus and linear algebra; Statistics 134 or an equivalent course in probability theory.
Text: J. A. Rice. Mathematical Statistics and Data Analysis. 3rd Edition. We will cover most of chapters 7-14.
Library reserves: I put two optional books on reserve. I recommend that you check them out sometime during the semester.
Office: 425 Evans Hall
Email: rice AT stat.berkeley.edu
Office Hours: Wed 10-12
Office: 335 Evans
Email: mjh4646 AT stat.berkeley.edu
Office Hours: Monday 3-4, Tuesday 1-2, Thursday 1-2. 307 Evans
Tu-Th 2:00-3:30. 2 LeConte
I don't allow cell phones or laptops in lecture. However, if you wish to use a laptop to take notes, please speak to me.
Friday 12:00-1:00 and 2:00-3:00, 332 Evans. The section meeting will be used to instruct and help with computer assignments and to review course material.
Lab website is on Bspace
Grades will be based on a midterm, a final exam, homework, and labs.
The midterm will count for 25% of your grade; the score on the midterm will be replaced by the score on the final if the latter is higher. There will be no makeup midterms: if you miss the midterm, the score on it will be your score on the final. The midterm is scheduled for Oct 7.
The final exam will count for 40%. It will be on Monday, Dec 15, 12:30-3:30 pm. There will be no alternative times, so if you can't take the exam at this time, don't take the course.
Homework will be assigned every week and will count for 20%. Your two lowest homework grades will be dropped. Assignments will be posted below. Homework will be collected in class on Thursdays.
There will be several labs; this component of the course will count for 15%. The labs require data analysis using the statistical software R (see links below).
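As a rough illustration of how the weights above combine, here is a minimal R sketch. It is not the official grading code: the function name `course_score` is hypothetical, all scores are assumed to be expressed as percentages (0-100), and homework and lab assignments are assumed to be equally weighted within their components.

```r
# Hypothetical helper, not the official grading code.
# Assumption: every score is a percentage between 0 and 100.
course_score <- function(midterm, final, homework, labs) {
  # The midterm score is replaced by the final score if the final is higher;
  # a missed midterm (NA) is simply replaced by the final.
  midterm_used <- if (is.na(midterm)) final else max(midterm, final)

  # Drop the two lowest homework scores, then average the rest
  # (equal weighting of assignments is an assumption).
  hw_kept <- sort(homework)[-(1:2)]
  hw_avg <- mean(hw_kept)

  # Weights from the syllabus: midterm 25%, final 40%, homework 20%, labs 15%.
  0.25 * midterm_used + 0.40 * final + 0.20 * hw_avg + 0.15 * mean(labs)
}

# Made-up scores for illustration only.
example <- course_score(
  midterm  = 70,
  final    = 80,
  homework = c(90, 100, 60, 0, 80, 100),
  labs     = c(85, 95)
)
print(example)  # 84
```

In this made-up example the final (80) replaces the midterm (70), and the homework scores of 0 and 60 are dropped before averaging.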
You are encouraged to work together with others on the homework, but you must write up your own solutions. The same applies to labs -- you must ultimately do your own computing and writing. So, for example, if a lab assignment involved taking a random sample, your random sample had better not be identical to any other in the class. No collaboration is allowed on exams. Cheating will be taken seriously and the penalties will be severe.
Demos on sampling and the data
Demo with globe
Bayes demo. I wasn't able to do this in class. I had intended to show the effects of changing the prior parameters, the number of trials, and the number of successes.
Midterm from 2007 with solutions
Old final exam and solutions
Midterm Solutions. Check the solutions against your answers and if you are still confused go over them with Mike or me.
Score distribution on the midterm (there were 26 points possible):
> stem(mid,scale=2); summary(mid)
The decimal point is at the |
7 | 0
8 | 00
10 | 00
11 | 00
12 | 000
13 | 0
14 | 0000
15 | 0
16 | 00
17 | 00000
18 | 0
19 | 00000
20 | 000
21 | 0000
22 | 00000000000
23 | 00000
24 | 000000
25 | 0000000000
26 | 000000000
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7.00   17.00   22.00   20.05   24.00   26.00
I see three possible explanations for this skewed distribution: (1) many students studied very hard, (2) I did a great job of teaching, (3) the midterm was too easy. I don't assign letter grades to individual components of the course. At the end of the semester the scores on components are combined in a weighted average and then letter grades are assigned. The midterm counts for 25%.
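For anyone who wants to experiment with the midterm scores in R, the data can be rebuilt from the stem-and-leaf display above. This is a sketch under the assumption that each leaf is an exact integer score, so the counts below are simply the number of leaves on each stem; the vector names `score` and `count` are my own.

```r
# Midterm scores reconstructed from the stem-and-leaf display.
# Assumption: every leaf "0" is one student with exactly that integer score.
score <- c(7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
count <- c(1, 2, 2, 2, 3, 1, 4, 1, 2, 5, 1, 5, 3, 4, 11, 5, 6, 10, 9)
mid <- rep(score, times = count)

length(mid)           # number of students in the display
stem(mid, scale = 2)  # stem-and-leaf display of the reconstructed scores
summary(mid)          # five-number summary plus the mean
```

The summary of the reconstructed vector should agree with the one printed above (minimum 7, median 22, mean about 20.05, maximum 26).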
Class demos of chi-square tests: geissler.R geissler.txt cont-table.R delinq.txt smokepreg.txt
Class demos of two sample tests: ozone.R ozonecontrol.csv ozonetreat.csv calcium.R calcium.csv
Baseball data shown in class: baseball.R obp_nl.txt
Solutions to final exam
Week of Aug 27: Review 4.3. 7.1-7.2
Week of Sept 3: 7.2-7.3
Week of Sept 10: 7.3.3; Begin Chapter 8. (I will not cover 7.4-7.5, but read 7.6)
Week of Sept 17: 8.1-8.5
Week of Sept 24: Finish 8.5; start 8.6
Week of Oct 1:
Week of Oct 6: midterm; 9.1-9.2
Week of Oct 13: 9.3-9.5
Week of Oct 20: 9.5; 13.1-13.4
Week of Oct 27: 11.1-11.3
Week of Nov 3: 11.3-11.4
Week of Nov 10: begin chapter 14
Week of Nov 17:
Week of Nov 24:
Week of Dec 1:
Homework will be due in class on Thursday unless otherwise specified. Late assignments will not be accepted. The list of homework assignments and due dates follows. Show your work.
Sept 4: no homework
Sept 11: Chapter 7: 2, 4, 8, 11, 28, 32
Sept 18: Chapter 7: 14, 16, 18, 24, 34. Chapter 8: 4ac, 7ab
Sept 25: Chapter 8: 4d, 11, 14, 16ab, 18abc, 24, 28, 30
Oct 2: Chapter 8: 4e, 25, 29, 58abc, 63, 64
Oct 9: no assignment
Oct 16: Chapter 9: 4, 6, 18, 20
Oct 23: Chapter 9: 24, 26, 30, 33, 38
Oct 30: Chapter 9: 40; Chapter 13: 2, 6, 24abc, 28
Nov 6: Chapter 11: 8, 16, 18, 24, 39
Nov 13: Chapter 11: 23ab, 34, 45b, 47c (in these two problems you are not required to analyze the data), 52bcgh
Nov 20: Chapter 14: 1, 10, 11, 12, 15, 22, 23, 25
Dec 2 (Tuesday): Chapter 14: 4, 6, 8, 16a, 19
Dec 9 (Tuesday): Chapter 14: 26, 27, 30, 32