STATISTICS 135, FALL 06

Ani Adhikari

SUMMARY OF CLASS PERFORMANCE
(This is all you are going to get, and it's plenty; please don't send me or Donghui email asking for more details because you won't get them. To see your final exam, please come to my office in the Spring semester.)

The class has done very well. Letter grades are based on overall scores. The median overall score is 71, and 20% of the students have overall scores of 87 or more. I didn't go by a set formula but rather looked at gaps in the histogram, which put somewhat over 30% of the class in the A range and about 30% in the B range.

Here are some relevant figures.

Overall HW scores: Despite all the whining about how harsh the grading was, the HW scores are spectacular. The median is 24/32, and more than 1/3 of the students have perfect HW scores of 32/32. Stop whining. Now.

The midterm: We went over this at length so I won't go over it all again, except to remind you that the median was 60%.

The final: I set a final that was a thorough workout in what you have learned. The class rose to the challenge extremely well. The median final score is 65.5, with 20% of the class getting 79 and above. The highest score is 93. By and large, I am delighted with your performance. I knew that 80 would be hard to get on this test: after all, just a couple of minor errors per page and you're averaging 8 points out of each 10. So I'd especially like to congratulate all those with scores in the 80s and 90s. It was a pleasure to read your answers.

I am even more pleased to see a great improvement over the midterm in the work of numerous students. Indeed, 39 students killed their midterm scores with better finals. I'd like to congratulate all those students as well. Your determination and perseverance have been impressive and delightful.
No, the improvement is not just the regression effect. The regression effect doesn't lift the median of the whole class.

As on the midterm, you got plenty of partial credit for having the right idea, even if your execution was faulty.

What worked out well on the final:
Between you and me, we have done something right this term, because almost the entire class, even those with low scores, made the correct decisions about which technique to use and precisely what inference was being done, virtually every time. That is extremely satisfying to me, and you should be proud of yourselves. Much abuse of statistics comes from people who use the techniques without first thinking carefully. You're doing a lot better than that.

And now for what turned out to be hard:

3. The "meta-analysis". This was horrible. I had the same question on my Stat 20 final and they did better. If the null hypothesis is true, the result of a statistical test is NOT like tossing a coin! I don't know how many people informed me that if the treatment did nothing, then 50% of the tests would have concluded that it worked. Is that why we spent a whole semester learning inference, to develop something that's only as good as a coin toss? If the tests are all at the 5% level and the treatment does nothing, then only about 5% of the tests (5% of 300 = 15) should conclude that it works. Under the null, the number of wrong conclusions is binomial(300, 0.05), with expected value 15 and SD about 3.8, so the observed number of 141 is off-scale large (more than 30 SDs above 15). The analysis indicates that the treatment works.
I got all manner of ramblings on this one. I was interested to notice that it was answered better by people with below-median final exam scores than by those with above-median scores.
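For anyone who wants to see just how off-scale 141 is, here is a quick sketch of the arithmetic in Python. It only assumes what the question says: 300 independent tests, each at the 5% level, with the treatment doing nothing. It computes the binomial expected value and SD, how many SDs 141 sits above 15, and the exact chance of getting 141 or more by luck alone.

```python
import math

# Under the null, the number of tests (out of n) that wrongly conclude
# "the treatment works" is Binomial(n, p) with p = the 5% level.
n, p = 300, 0.05
observed = 141

expected = n * p                  # about 15 tests expected to "succeed" by chance
sd = math.sqrt(n * p * (1 - p))   # binomial SD, about 3.8
z = (observed - expected) / sd    # how many SDs 141 lies above 15

# Exact upper-tail probability P(X >= observed) under Binomial(300, 0.05).
tail = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(observed, n + 1)
)

print(f"expected = {expected:.1f}, SD = {sd:.2f}, z = {z:.1f}")
print(f"P(X >= {observed}) = {tail:.3g}")
```

The z-score comes out above 30 and the tail probability is astronomically small, which is the whole point: a coin-toss model (50%) is nowhere near what a 5%-level test does under the null.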

7c. The correlations between pairs of variables in a regression. Next time, draw a diagram or THINK about the variables before launching into a long calculation. No calculation was needed.
First, the estimated score and age; that is, y_hat and x. If you plot the estimates against the given variable, then the estimates are on a straight line, guys. That's kind of the point of linear regression, remember? So the correlation has to be +1 or -1. In this case it's -1 since the line slopes down.
Second, the estimated score and observed score; that is, y_hat and y. If the regression method is worth even talking about, surely this one should at least be positive? The bigger the observed value, the bigger its estimate? The most common answer was 0, which essentially implies that your estimate has nothing to do with what you're estimating! The estimate is a linear function of x, so the correlation between y and the estimate will be the same (up to a change of sign) as the correlation between y and x. Because the correlation between y and x is negative, the slope is negative too, so flip the sign. The answer is 0.7. Those of you who got close by a calculation typically ended up with -0.7, because you forgot that SD(bx) = |b|SD(x), not bSD(x). It matters when b is negative.
Third, the correlation between the estimated score and the residuals; that is, y_hat and e. How many residual plots have you looked at? None carefully, I guess, or you'd have noticed that this is precisely what is plotted in residual plots, and that you proved yourselves (last problem, HW 9) that the correlation is 0.
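If you'd like to convince yourself of all three answers without any algebra, the sketch below fits a least-squares line to simulated data and prints the three correlations. The data (ages between 20 and 70, a score that falls with age, normal noise) are entirely made up for illustration and have nothing to do with the exam's numbers; only the pattern of answers matters: -1, |r|, and 0.

```python
import numpy as np

# Hypothetical data (not from the exam): a score that tends to fall with age,
# so the correlation r between x (age) and y (score) is negative.
rng = np.random.default_rng(0)
x = rng.uniform(20, 70, size=500)
y = 100 - 0.8 * x + rng.normal(0, 10, size=500)

# Least-squares line y_hat = a + b * x (np.polyfit returns the slope first).
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x
e = y - y_hat  # residuals


def corr(u, v):
    """Pearson correlation between two arrays."""
    return np.corrcoef(u, v)[0, 1]


r = corr(x, y)
r_yhat_x = corr(y_hat, x)  # -1: y_hat is a linear function of x with b < 0
r_yhat_y = corr(y_hat, y)  # |r|: same size as r, sign flipped because b < 0
r_yhat_e = corr(y_hat, e)  # 0 up to rounding error: the residual-plot fact

# SD(b x) = |b| SD(x), not b SD(x); an SD can never be negative.
sd_check = np.isclose(np.std(b * x), abs(b) * np.std(x))

print(f"r = {r:.3f}, b = {b:.3f}")
print(f"corr(y_hat, x) = {r_yhat_x:.3f}")
print(f"corr(y_hat, y) = {r_yhat_y:.3f}")
print(f"corr(y_hat, e) = {r_yhat_e:.3f}")
```

Change the slope in the simulation to a positive number and the first two answers become +1 and r, while the third stays 0.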

8d. The variance of the MLE. You should not make your life difficult by jumping to the Fisher information, for the following reasons. First, the result you learned applies only when you have an i.i.d. sequence from a single distribution, whereas in the problem you had two i.i.d. sequences from two different distributions with a common parameter. Second, 1/(nI) is only the asymptotic variance of the MLE, that is, an approximation to its variance that is accurate when the sample size is large. The question asked for the variance for fixed n and m. As most of you came up with an estimator that was a linear combination of the X's and Y's, you just had to use the basic rules of variance (no covariances, even) to find it.
By the way, the majority of you can't correctly differentiate (Y_i - 2 mu)^2 with respect to mu. There should be two factors of 2 in the derivative, not one. I'll have to take this up with the math department.
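Here is the derivative written out with the chain rule; the two factors of 2 are the 2 from the square and the -2 from the inside. The second display is the basic variance rule referred to in 8d, stated for a generic linear estimator. The coefficients a_i and b_j are placeholders (this is not the exam's answer), and the rule assumes, as in the problem, that all the X's and Y's are independent, which is why no covariance terms appear.

```latex
\frac{\partial}{\partial \mu}\,(Y_i - 2\mu)^2 = 2\,(Y_i - 2\mu)\cdot(-2) = -4\,(Y_i - 2\mu)

\operatorname{Var}\Big(\sum_{i=1}^{n} a_i X_i + \sum_{j=1}^{m} b_j Y_j\Big)
  = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(X_i) + \sum_{j=1}^{m} b_j^2 \operatorname{Var}(Y_j)
```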

10b. The variance of the difference between X_bar and Y_bar, where X_bar and Y_bar are the means of the first 10 draws and the next 10 draws respectively, all made at random without replacement from a population of 30 numbers. This one involves a covariance, which I expected would be hard, and it was, although it was fairly well done by those who'd obviously spent some time on the covariance worksheet. It's very much like Problems 6 and 7 on that sheet.
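A sketch of the covariance calculation, with a simulation to check it. The population 1, 2, ..., 30 is a hypothetical stand-in for the 30 numbers on the exam. The key fact is that any two distinct draws made at random without replacement have covariance -σ²/(N-1), where σ² is the population variance. That negative covariance makes Var(X_bar - Y_bar) larger than Var(X_bar) + Var(Y_bar).

```python
import numpy as np

# Hypothetical population of N = 30 numbers; the exam's actual list isn't
# reproduced here, so 1, 2, ..., 30 is used purely for illustration.
population = np.arange(1, 31, dtype=float)
N, n, m = len(population), 10, 10

sigma2 = population.var()  # population variance (divisor N)

# Each draw has variance sigma2; any two distinct draws made at random
# without replacement have covariance -sigma2 / (N - 1).
var_xbar = sigma2 / n * (N - n) / (N - 1)
var_ybar = sigma2 / m * (N - m) / (N - 1)
# Cov(X_bar, Y_bar) averages the n*m cross covariances, all equal to -sigma2/(N-1).
cov_xbar_ybar = -sigma2 / (N - 1)

var_diff = var_xbar + var_ybar - 2 * cov_xbar_ybar

# Monte Carlo check: each row is a random ordering of the population; the
# first 10 entries are the X draws and the next 10 are the Y draws.
rng = np.random.default_rng(1)
reps = 100_000
order = np.argsort(rng.random((reps, N)), axis=1)[:, : n + m]
draws = population[order]
diffs = draws[:, :n].mean(axis=1) - draws[:, n:].mean(axis=1)

print(f"formula:    {var_diff:.4f}")
print(f"simulation: {diffs.var():.4f}")
```

With n = m = 10 and N = 30 the formula reduces to 6σ²/29 whatever the 30 numbers are; for the illustrative population 1, ..., 30 that is exactly 15.5.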

To see your final exam score, go to Statgrades. If you are taking the class P/NP, you will see the letter grade that you would have got had you taken the class for a regular letter grade.