STATISTICS 135, FALL 06
Ani Adhikari
SUMMARY OF CLASS PERFORMANCE
(This is all you are going to get, and it's plenty; please don't send
me or Donghui email asking for more details because you won't get them.
To see your final exam, please come to my office in the Spring
semester.)
The class has done very well. Letter grades are based on
overall scores. The median overall score is 71 and 20% of the
students have overall scores of 87 or more. I didn't go by a set
formula but rather by looking at gaps in the histogram, which led to
somewhat over 30% in the A-range and about 30% in the B's.
Here are some relevant figures.
Overall HW scores: Despite all the whining about how harsh
the grading was, the HW scores are spectacular. The median is 24/32 and
more than 1/3 of the students have perfect HW scores of 32/32.
Stop whining. Now.
The midterm: We went over this at length so I won't go over it all again, except to remind you that the median was 60%.
The final: I set a final
which was a thorough workout in what you have learned. The class
rose to the challenge extremely well. The median final score is
65.5, with 20% of the class getting 79 and above. The highest
score is 93. By and large, I am delighted with your performance.
I knew that 80 would be hard to get on this test; after all,
just a couple of minor errors per page and you're averaging 8 points
out of each 10. So I'd especially like to congratulate all those
with scores in the 80s and 90s. It was a pleasure to read your
answers.
I am even more pleased to see a great improvement over the midterm in
the work of numerous students. Indeed, 39 students killed their
midterm scores with better finals. I'd like to congratulate all those students as
well. Your determination and perseverance have been impressive and
delightful.
No, the improvement is not just the regression effect.
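The regression effect pulls extreme scores toward the middle, but it doesn't lift the median of the whole class, and the class median went up from 60 on the midterm to 65.5 on the final.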
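A small simulation makes the point concrete. This is only a sketch under a hypothetical model (each student has a true ability, and each exam score is that ability plus independent noise, so nobody actually improves); the model, sample size, and seed are assumptions, not class data. The bottom quarter on the midterm still "gains" on the final, but the median of the whole class barely moves.

```python
import random
import statistics

def simulate_regression_effect(n=20_000, seed=0):
    """Hypothetical model: true ability T ~ N(0, 1); each exam score is T plus
    independent N(0, 1) noise, so midterm and final have the same distribution."""
    rng = random.Random(seed)
    midterm, final = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        midterm.append(ability + rng.gauss(0, 1))
        final.append(ability + rng.gauss(0, 1))

    # Students in the bottom quarter on the midterm.
    cutoff = sorted(midterm)[n // 4]
    low = [i for i in range(n) if midterm[i] < cutoff]
    low_gain = statistics.mean(final[i] - midterm[i] for i in low)

    # Change in the median of the whole class.
    median_shift = statistics.median(final) - statistics.median(midterm)
    return low_gain, median_shift

low_gain, median_shift = simulate_regression_effect()
print(f"bottom-quarter average gain: {low_gain:.2f}")
print(f"shift in class median:       {median_shift:.2f}")
```

Under this model the low scorers move up purely because of regression toward the mean, yet the class median stays put, so a rise in the median needs some other explanation.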
As in the midterm, you got plenty of partial credit for having the right idea, even if your execution was faulty.
What worked out well on the final:
Between you and me we have done something right this
term, because almost the entire class, even those with low
scores, made the correct decisions about what technique to use and
precisely what inference was being done, virtually every time.
That is extremely satisfying to me and you should be proud of
yourselves. Much abuse of statistics comes from people who use
the techniques without first thinking carefully. You're doing a
lot better than that.
And now for what turned out to be hard:
3. The "meta-analysis". This was horrible.
I had the same question on my Stat 20 final and they did better.
If the null hypothesis is true, the result of a statistical test is NOT like tossing a coin!
I don't know how many people informed me that if the treatment
did nothing then 50% of the tests would have concluded that it worked.
Is that why we spent a whole semester learning inference, to
develop something that's only as good as a coin toss? If the
tests are all at the 5% level and the treatment does nothing, then only about 5% of the tests (= 5% of 300 = 15) should conclude that it works.
Under the null, the number of wrong conclusions is binomial (300,
0.05), and the observed number of 141 is off-scale large. The analysis indicates that the treatment works.
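To see just how off-scale 141 is, here is a short calculation using the binomial (300, 0.05) null distribution, treating the 300 tests as independent (which is what the binomial model assumes). It computes the expected count, the SD, how many SDs 141 lies above the expected count, and the exact upper-tail probability.

```python
from math import comb, sqrt

n_tests, alpha, observed = 300, 0.05, 141

# Under the null, each test "concludes it works" with probability alpha.
expected = n_tests * alpha                     # 5% of 300 = 15
sd = sqrt(n_tests * alpha * (1 - alpha))       # SD of binomial(300, 0.05)
z = (observed - expected) / sd                 # number of SDs above expected

# Exact upper-tail probability P(X >= 141) for X ~ binomial(300, 0.05).
p_value = sum(
    comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
    for k in range(observed, n_tests + 1)
)

print(f"expected {expected:.0f}, SD {sd:.2f}, z = {z:.1f}, P(X >= 141) = {p_value:.1e}")
```

The observed count is more than 30 SDs above what the null predicts, so the tail probability is astronomically small.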
I got all manner of ramblings on this one. I was
interested to notice that it was answered better by people with
below-median final exam scores than by those with above-median scores.
7c. The correlations between pairs of variables in a regression. Next time, draw a diagram or THINK about the variables before launching into a long calculation. No calculation was needed.
First, the estimated score and age; that is, y_hat and x. If you plot the estimates against the given variable then the estimates are on a straight line,
guys. That's kind of the point of linear regression, remember?
Correlation has to be +1 or -1. In this case it's -1 since
the line is sloping down.
Second, the estimated score and observed score; that is, y_hat and y.
If the regression method is worth even talking about, surely this one should
be at least positive? The bigger the observed value, the bigger
its estimate? The most common answer was 0, which essentially
implies that your estimate has nothing to do with what you're estimating!
The estimate is a linear function of x, so the correlation between y and the estimate will be the same (up to a change of sign) as the correlation between y and x. Because the correlation between y and x
is negative (and hence so is the slope b), flip the sign. The answer is 0.7. Those of you
who got close by a calculation typically ended up with -0.7, because
you forgot that SD(bx) = |b|SD(x), not bSD(x). It matters when b is negative.
Third, the correlation between the estimated score and the residuals; that is, y_hat and e.
How many residual plots have you looked at? None carefully,
I guess, or you'd have noticed that this is precisely what is plotted
in residual plots and that you proved yourselves (last problem, HW 9)
that the correlation is 0.
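If you want to convince yourself of all three answers in 7c (and of SD(bx) = |b|SD(x)), here is a quick numerical check. It is a sketch on made-up age/score data, since the exam's data aren't reproduced here; SDs divide by n, and the fit is the least-squares line with slope r·SD(y)/SD(x).

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Population SD (divide by n).
    m = mean(v)
    return sqrt(sum((t - m) ** 2 for t in v) / len(v))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))

# Hypothetical data: age x and score y, negatively associated.
x = [20, 25, 30, 35, 40, 45, 50, 55]
y = [88, 90, 80, 82, 70, 75, 64, 66]

r = corr(x, y)
b = r * sd(y) / sd(x)          # least-squares slope (negative here)
a = mean(y) - b * mean(x)      # least-squares intercept
y_hat = [a + b * xi for xi in x]
e = [yi - fi for yi, fi in zip(y, y_hat)]

print(f"r = corr(x, y) = {r:.3f}")
print(f"corr(y_hat, x) = {corr(y_hat, x):.3f}")   # sign of the slope: -1 here
print(f"corr(y_hat, y) = {corr(y_hat, y):.3f}")   # equals |r|
print(f"corr(y_hat, e) = {corr(y_hat, e):.3f}")   # zero, up to rounding
print(f"SD(bx) = {sd([b * xi for xi in x]):.3f}, |b|SD(x) = {abs(b) * sd(x):.3f}")
```

The last line is the step that trips people up: with a negative b, SD(bx) comes out equal to |b|SD(x), never to the negative number bSD(x).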
8d. The variance of the
MLE. You should not make your life difficult by jumping to the
Fisher information, for the following reasons. First, the result you
learned only applies when you have an i.i.d. sequence from a
distribution, whereas in the problem you had two i.i.d. sequences from
two different distributions with a common parameter. Second, 1/(nI) is only the asymptotic
variance of the MLE, that is, an approximation to the variance that is good when the sample
size is large. The question asked for the variance for fixed n and m. As most of you came up with an estimate that was a linear combination of the X's and Y's, you just had to use the basic rules of variance (no covariances, even) to find it.
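The "basic rules of variance" route looks like this for an estimate of the form a·ΣX + b·ΣY built from independent samples. This is a hedged sketch: the weights a and b, the sample sizes, the variances, and the normal distributions used in the Monte Carlo check are all hypothetical, not the exam's values. The point is that the exact variance for fixed n and m comes straight from Var(ΣX) = n·Var(X) with no covariance terms, and a simulation agrees with it.

```python
import random
import statistics

def var_linear_estimator(a, b, n, m, var_x, var_y):
    """Var(a*sum(X) + b*sum(Y)) for independent samples X_1..X_n and Y_1..Y_m.
    Independence means no covariance terms: Var(sum(X)) = n * var_x, etc."""
    return a**2 * n * var_x + b**2 * m * var_y

def simulate(a, b, n, m, sd_x, sd_y, reps=20_000, seed=1):
    # Monte Carlo check with hypothetical normal samples (the means don't matter).
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sx = sum(rng.gauss(1.0, sd_x) for _ in range(n))
        sy = sum(rng.gauss(2.0, sd_y) for _ in range(m))
        values.append(a * sx + b * sy)
    return statistics.pvariance(values)

# Hypothetical weights, sample sizes, and variances.
a, b, n, m = 0.03, 0.05, 10, 15
exact = var_linear_estimator(a, b, n, m, var_x=4.0, var_y=9.0)
approx = simulate(a, b, n, m, sd_x=2.0, sd_y=3.0)
print(f"exact {exact:.4f}, simulated {approx:.4f}")
```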
By the way, the majority of you can't correctly differentiate (Y_i - 2 mu)^2 with respect to mu. There should be two factors of 2 in the derivative, not one. I'll have to take this up with the math department.
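A quick numerical check of that derivative compares the chain-rule answer, -4(Y_i - 2 mu), with a central finite difference. The test values of mu and Y_i are arbitrary; the last line shows that the "one factor of 2" answer, -2(Y_i - 2 mu), disagrees.

```python
def loss(mu, y):
    # One term of the sum of squares: (y - 2*mu)^2
    return (y - 2 * mu) ** 2

def derivative_by_hand(mu, y):
    # Chain rule: 2*(y - 2*mu) * d/dmu(y - 2*mu) = 2*(y - 2*mu) * (-2)
    return -4 * (y - 2 * mu)

def derivative_numeric(mu, y, h=1e-6):
    # Central difference approximation to the derivative in mu.
    return (loss(mu + h, y) - loss(mu - h, y)) / (2 * h)

for mu, y in [(0.5, 3.0), (1.0, 1.0), (-2.0, 0.7)]:
    print(mu, y, derivative_by_hand(mu, y), round(derivative_numeric(mu, y), 6))

# The common wrong answer, with only one factor of 2:
print(-2 * (3.0 - 2 * 0.5), "vs", round(derivative_numeric(0.5, 3.0), 6))
```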
10b. The variance of the difference between X_bar and Y_bar
where those are the means of the first 10 draws and next 10 draws
respectively, made at random without replacement from a population of
30 numbers. This one involves a covariance which I expected would
be hard, and it was, although it was fairly well done by those who'd
obviously spent some time on the covariance worksheet. It's very
much like Problems 6 and 7 on that sheet.
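Here is one way to organize the calculation, with a simulation check. It is a sketch that uses the standard facts for draws without replacement from N numbers with population variance σ²: Var(X_bar) = (σ²/n)(N - n)/(N - 1), and any two distinct draws have covariance -σ²/(N - 1), so Cov(X_bar, Y_bar) = -σ²/(N - 1). The population 1, 2, ..., 30 is hypothetical, chosen only so the code has something to draw from.

```python
import random
import statistics

def var_diff_exact(population, n):
    """Var(X_bar - Y_bar) when X_bar and Y_bar are means of the first n and
    next n draws made at random without replacement from the population."""
    N = len(population)
    sigma2 = statistics.pvariance(population)        # population variance
    var_mean = sigma2 / n * (N - n) / (N - 1)        # Var(X_bar) = Var(Y_bar)
    cov_means = -sigma2 / (N - 1)                    # Cov(X_bar, Y_bar)
    return 2 * var_mean - 2 * cov_means

def var_diff_simulated(population, n, reps=50_000, seed=2):
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        draws = rng.sample(population, 2 * n)        # 2n draws without replacement
        diffs.append(sum(draws[:n]) / n - sum(draws[n:]) / n)
    return statistics.pvariance(diffs)

population = list(range(1, 31))   # hypothetical population of 30 numbers
exact_var = var_diff_exact(population, 10)
sim_var = var_diff_simulated(population, 10)
print(f"exact {exact_var:.3f}, simulated {sim_var:.3f}")
```

The covariance term is negative, so subtracting it makes the variance of the difference larger than 2·Var(X_bar): a high first sample leaves lower numbers for the second.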
To see your final exam score, go to Statgrades.
If you are taking the class P/NP, you will see the letter grade
that you would have got had you taken the class for a regular letter
grade.