Statistics 215B: Applied Statistics. Spring 2012

Course format:

3 hours of lecture per week, divided between discussing particular applications and papers, and presenting theory and methodology. There will be written assignments roughly every two weeks, and a term project that includes a written report and an oral class presentation. I hope that term projects will lead to publishable research: Bring your favorite data or favorite scientific problem.

The written assignments will largely be drawn from Freedman's book Statistical Models: Theory and Practice (2009 revised edition). I will not be lecturing on all the chapters from which I assign problems: I expect students to read and digest the material on their own, but I am happy to answer questions in class or in office hours, and if something turns out to be a stumbling block for more than a few students, I will lecture on it. I plan to reserve most of the lecture time to talk about particular applications and case studies.

List of pervasive themes:

List of applications (preliminary, time permitting):

Techniques and tools likely to be discussed:

Reading list (preliminary):

Assignments

  1. Read Freedman, Statistical Models: Theory and Practice (SMTP), Chapters 1–4;
    Freedman, Statistical Models and Causal Inference: A Dialogue with the Social Sciences (SMCI), Chapters 1, 8;
    (Chapter 8 is also available at http://statistics.berkeley.edu/~stark/Preprints/611.pdf.)
    Shearer & Stark, 2011.
    [Due 1/26 in class] Freedman, SMTP, problems 4.B.7, 4.B.8, 4.B.11, 4.5.3, 4.5.5, 4.5.6, 4.5.10, 4.5.11.
  2. [Due 2/2 in class. Relates to the climate change paper we discussed in class on 1/17.]
    1. Consider a random walk with n=137 steps, constructed as follows:
      X(0) = 0.
      [X(i) - X(i-1)], i = 1, …, 136, are IID, and take the value +1 or -1 with probability 1/2 each.
      You will test the hypothesis that a = 0 on the assumption that the data (or subsets of the data) come from the normal linear model X(i) = ai + b + ε_i, where the errors {ε_i} are IID N(0, σ²), with σ² unknown (to be estimated from the data), based on fitting the model by OLS.
      • (a) By simulation, estimate the actual significance level of a nominal 5% test of the hypothesis a = 0. That is, estimate how often the OLS estimate of the slope a is "statistically significant at level 5%" when the significance calculation assumes that the normal linear model is true. Justify your choice of the number of replications in the simulation.
      • (b) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to the last 58 points in the series differs from the slope of the line fitted (by OLS) to the entire series of 137 points. Justify your choice of the number of replications.
      • (c) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to the last 58 points in the series differs from the slope of the line fitted (by OLS) to the entire series of 137 points, and that both estimated slopes are statistically significant at level 5%. Justify your choice of the number of replications.
      • (d) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to some contiguous block of 58 points in the series differs from the slope of the line fitted (by OLS) to the entire series of 137 points, and that both estimated slopes are statistically significant at level 5%. Justify your choice of the number of replications.
      • (e) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to some contiguous block of at least 30 points in the series differs from the slope of the line fitted (by OLS) to the entire series of 137 points, and that both estimated slopes are statistically significant at level 5%. Justify your choice of the number of replications.
    2. Now consider a different generating process:
      X(0) = 0. X(1) = 1.
      P([X(i) - X(i-1)] = [X(i-1) - X(i-2)]) = p, and
      P([X(i) - X(i-1)] = -[X(i-1) - X(i-2)]) = 1-p, i = 2, …, 136.
      By simulation, estimate the probabilities in parts 1(a)–1(e) (above) when this process (rather than the random walk) generates the data, for p = 0.7, 0.8, and 0.9.
    3. What do you conclude about the significance of estimated regression coefficients when the regression model did not generate the data? What do you conclude about the climate change study? Discuss.
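Part 1(a) above can be sketched in a few lines of pure Python. This is an illustrative outline, not a full solution: the seed, the replication count, and the cutoff 1.98 (roughly the two-sided 5% critical value of Student's t with 135 degrees of freedom) are my own choices, whereas the assignment asks you to justify the number of replications yourself.

```python
import math
import random

def ols_slope_t(y):
    """OLS slope and its t-statistic for y regressed on x = 0, 1, ..., n-1."""
    n = len(y)
    x = range(n)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # nominal SE, assuming IID normal errors
    return slope, slope / se

def random_walk(n_steps=136, rng=random):
    """X(0) = 0; increments are +1 or -1 with probability 1/2 each."""
    x = [0]
    for _ in range(n_steps):
        x.append(x[-1] + rng.choice((-1, 1)))
    return x

random.seed(12345)   # arbitrary seed, for reproducibility
reps = 2000          # illustrative; the assignment asks you to justify this choice
hits = sum(abs(ols_slope_t(random_walk())[1]) > 1.98 for _ in range(reps))
print("estimated true level of the nominal 5% test:", hits / reps)
```

The same harness extends to parts (b) through (e) (fit OLS to sub-blocks of the series and compare signs and significance) and, with a different increment rule, to the persistent process in part 2.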
  3. Read Freedman, SMTP, Chapter 7; Freedman, SMCI, Chapters 12, 13; White et al. (2011).
    [Due 2/27].
    1. Freedman, SMTP, problems 7.B.2, 7.B.3, 7.C.5, 7.D.7, 7.E.2, 7.E.3, 7.E.10, 7.5.2, 7.5.3, 7.5.4, 7.5.5
    2. As we discussed in class, the experimental design used by White et al. does not match the way they analyzed the data. Their design was stratified on various things (study center, severity of disease, etc.), but Fisher's exact test and the Kruskal-Wallis test assume simple randomization without stratification. Moreover, the study does not seem to account for multiplicity in the use of Fisher's exact test to compare three pairs of treatments. This assignment looks at the effect on apparent p-values of the mismatch between the design and the analysis, and of the failure to account for multiplicity.
      We have a population of 632 subjects (White et al. had 641, and then some were lost or excluded, and some responses were imputed; we're simplifying slightly). Of these, 158 subjects are assigned at random to each of four treatments. Consider a binary outcome variable, for instance, a variable that is 1 if, at 52 weeks, the subject has improved by either 2 or more points on the Chalder fatigue questionnaire or by 8 or more points on the Short Form-36, and has improved on both; and that is zero otherwise. Let N denote the total number of 1s among the 632 subjects.
      1. Suppose N = 80 for the moment. Allocate those 80 1s at random to the four treatment groups (control and three others). Find the three p-values for pairwise comparisons of control to each of the other three treatments using Fisher's exact test. Repeat the random allocation 1,000 times. What's the estimated chance that at least one p-value is below 0.05? What's the estimated chance that at least two p-values are below 0.05? Plot the empirical CDF of the smallest p-value in each simulation. Repeat this simulation for N = 160 and N = 320 and report the results.
      2. The previous simulation ignored the stratification by centers. Invent a generalization of Fisher's exact test that takes stratification into account: the randomization across treatments does not mix across centers. Think of at least three ways to combine results across strata to get an overall test statistic. Explain what alternatives they should have the most power against.
      3. Code the test in the previous question that you like best. Base the p-value on simulation, since the test statistic no longer has a hypergeometric distribution.
      4. Suppose centers 1 and 2 have 106 subjects and centers 3–6 each have 105 subjects. Suppose that the reported results are as follows, where the numbers in parentheses are the numbers of subjects allocated to the treatment and the numbers not in parentheses are the numbers of 1s in the group.
        Center   Control   Treatment 1   Treatment 2   Treatment 3
        1        10 (27)   15 (27)       20 (26)       20 (26)
        2        10 (27)   15 (27)       20 (26)       20 (26)
        3        10 (27)   15 (26)       20 (26)       20 (26)
        4        20 (27)   15 (26)       15 (26)       10 (26)
        5        20 (27)   15 (26)       15 (26)       10 (26)
        6        20 (27)   15 (26)       15 (26)       10 (26)
        For the three paired comparisons with control, compare simulated p-values that take stratification into account with the p-values for Fisher's exact test (which ignores stratification). Try to find different sets of reported results that would make the two p-values differ as much as possible for some paired comparison. What happens if the centers have different sizes? Can you use Simpson's paradox to construct examples where the sign of the effect is reversed?
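The random-allocation simulation in question 1 above can be sketched as follows. This is an illustrative outline under my own assumptions, not a full solution: Fisher's exact test is implemented from scratch (two-sided, summing the hypergeometric probabilities of all tables no more probable than the observed one), the seed is arbitrary, and the replication count is reduced from the 1,000 the assignment specifies.

```python
import math
import random

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)
    def prob(x):   # chance of x 1s in group 1, given the margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-12))   # tolerance for floating-point ties

def simulate(N, reps, rng=random):
    """Allocate N 1s among 632 subjects, 158 per group, uniformly at random;
    estimate the chance that at least one of the three control-vs-treatment
    Fisher p-values falls below 0.05."""
    pop = [1] * N + [0] * (632 - N)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pop)
        ones = [sum(pop[158 * g:158 * (g + 1)]) for g in range(4)]
        pvals = [fisher_exact_p(ones[0], 158 - ones[0], ones[g], 158 - ones[g])
                 for g in (1, 2, 3)]
        hits += min(pvals) < 0.05
    return hits / reps

random.seed(215)   # arbitrary seed, for reproducibility
frac = simulate(80, reps=200)   # the assignment asks for 1,000 replications
print("estimated P(at least one p-value < 0.05), N = 80:", frac)
```

The stratified test of question 3 can reuse the same pattern: permute treatment labels within each center separately, compute per-center statistics, and combine them (for instance, by summing) before comparing to the simulated null distribution.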
  4. Read Golomb et al. 2010; Jönrup and Rennermalm 1976; Kaptchuk et al. 2006; Berk et al. 2009 and 2011.
    [Due 3/19].
    1. Simulate 1,000 IID N(0,1) random variables. Take the subset that are larger than 2. Find a 1-sided (upper) p-value of the z-test of the hypothesis that the subset you selected is a random sample from a N(0,1) population. Repeat this overall simulation 1,000 times. Plot the empirical CDF of the p-values. What fraction are below 0.1? Why is that fraction so much larger than 0.1? Isn't the null hypothesis true? Discuss.
    2. Simulate 1,000 IID N(0,1) random variables, as before, but instead of selecting those that are larger than 2, select the 50 that are largest. Find a 1-sided (upper) p-value of the z-test of the hypothesis that the subset you selected is a random sample from a N(0,1) population. Repeat this overall simulation 1,000 times. Plot the empirical CDF of the p-values. What fraction are below 0.1? Why is that fraction so much larger than 0.1? Isn't the null hypothesis true? Discuss. What's the difference between this situation and the first situation?
    3. Reproduce the simulations described in the "Simulation Results" section of the Berk et al. 2009 paper, that produced figures 3–7. Reproduce the figures. Repeat the simulations, this time constructing 95% confidence intervals for any variables that are selected. What fraction of the confidence intervals constructed cover their corresponding parameter? Is there a notable difference between the coefficient in the model that is actually zero and those that are not? Discuss.
    4. Simulate 600 IID N(0,1) random variables. Divide them into 6 groups of 100. Perform multiple linear regression of the first group onto the following 20 variables: the 5 other groups, the squares of the 5 other groups, the cubes of the other 5 groups, and the reciprocals of the other 5 groups. Select any estimated coefficients that are statistically significant at level 0.05. Construct 95% confidence intervals for just those "significant" coefficients. Note the number of confidence intervals you constructed, and the fraction of them that include zero (the true population value of all the coefficients in this set-up). Repeat the simulation of 600 variables, the regression, the selection, and the construction of confidence intervals, a total of 1,000 times. What fraction of simulations gave one or more confidence intervals? What fraction of simulations gave one or more confidence intervals that did not contain zero? What fraction of the confidence intervals you constructed overall contained zero? Discuss.
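Question 1 above can be sketched in Python as follows. This is an illustrative outline, not a full solution: the z-test uses the known standard deviation of 1, the normal CDF is computed from math.erf, and the seed and the reduced replication count (200 rather than the 1,000 the assignment specifies) are my own choices.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def selected_pvalue(rng=random, n=1000, cutoff=2.0):
    """Draw n IID N(0,1) variables, keep those above the cutoff, and return
    the upper 1-sided z-test p-value for the null hypothesis that the kept
    values are a random sample from N(0,1) (known sd = 1)."""
    kept = [x for x in (rng.gauss(0, 1) for _ in range(n)) if x > cutoff]
    if not kept:   # vanishingly rare: no draw exceeded the cutoff
        return 1.0
    z = sum(kept) / len(kept) * math.sqrt(len(kept))
    return 1 - phi(z)

random.seed(4)   # arbitrary seed, for reproducibility
pvals = [selected_pvalue() for _ in range(200)]
frac = sum(p < 0.1 for p in pvals) / len(pvals)
print("fraction of p-values below 0.1:", frac)
```

For question 2, keep the 50 largest draws instead of those above a fixed cutoff; the rest of the harness is unchanged.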
  5. Read McCormick et al. 2012; Pan et al. 2012; Freedman, SMCI, Chapter 11.
    [Due 4/9].
    Comment critically (not necessarily negatively) on McCormick et al.

P.B. Stark, statistics.berkeley.edu/~stark. http://statistics.berkeley.edu/~stark/Teach/S215B/S12/index.htm. Last modified 9 April 2012.