Statistics 215B: Applied Statistics. Spring 2012
 Instructor: P.B. Stark, stark [AT] stat [DOT] berkeley [DOT] edu
Office Hours: Tuesdays, 11am–12pm, 403 Evans Hall
 GSI: Yuval Benjamini, yuvalb [AT] stat [DOT] berkeley [DOT] edu
Office Hours: Fridays 10am–12pm, 332 Evans Hall
 Meets: Tuesday, Thursday 9:30–11am, 332 Evans Hall
 Texts: See reading list below
Course format:
3 hours of lecture per week, divided between discussing particular applications and papers, and
presenting theory and methodology.
There will be written assignments roughly every two weeks, and a term project that includes
a written report and an oral class presentation.
I hope that term projects will lead to publishable research: Bring your favorite data or favorite scientific
problem.
The written assignments will largely be drawn from Freedman's book
Statistical Models: Theory and Practice (2009 revised edition).
I will not be lecturing on all the chapters from which I assign problems: I expect students
to read and digest the material on their own, but I am happy to answer questions
in class or in office hours, and if something turns out to be a stumbling block for more
than a few students, I will lecture on it.
I plan to reserve most of the lecture time to talk about particular applications and case studies.
List of pervasive themes:

Making sense of probability in applications
 when the experiment creates the probability (randomization, instrumental error, etc.)
 when the scientific theory includes a random component (e.g., cosmology)
 when the analyst pretends (statistical models, in general; earthquake prediction)
 when the probability model is postulated just to evaluate plausibility

Cultures of different applied disciplines
 geophysics
 cosmology
 helioseismology
 litigation
 elections
 Bayesian versus frequentist leanings in different disciplines
 coherent and incoherent analyses
 attention to implicit and explicit assumptions
 statistics: tool, incantation, or fauxphisication?

Solving real problems versus applying methods to data
 What's the big picture? (requires learning some science)
 Data quality, data quality, data quality
 The (lazy) tendency to classify problems by data type
 Choosing a good question (requires learning some science)
 Helping design experiments (requires learning some science)
 Designing methods to fit the problem: standard ≠ appropriate (requires learning some science)
 You can't always get what you (or your collaborators) want

Model selection, model choice
 What's the goal? Prediction? Estimation? Adjustment? Inference?
 Occam's Razor versus The Ostrich Principle
 Post-selection inference about model parameters. Meaning, methods, and madness

Test selection and its perils

Uncertainty quantification

Testing nonparametric hypotheses

Causal inference: randomization, the Neyman model, regression adjustments to experimental data, response schedules

Path models

[possible] Hierarchical linear models
List of applications (preliminary, time permitting):

Geophysics:
 Earthquake prediction, hazard maps, clustering
 Seismic structure of Earth, bumps on the core-mantle boundary
 Correlation of the geoid and magnetic field

Astrophysics
 Microwave cosmology
 using supernovae to measure the expansion of the universe

Voting
 Signature verification
 Election auditing

Medical research
 Placebos and active placebos
 Voodoo correlation

Litigation
 Sampling (in wage and hour and consumer class actions, and other)
 Damage models

Education, Sociology, Economics
 The effect of Catholic Schools
 Modeling credit risk
Techniques and tools likely to be discussed
 AIC, BIC, Mallows C_{p}, Minimum Description Length
 Confidence sets, tests, and the duality between them
 Constraints versus priors in scientific problems
 Credible regions and their connection to confidence sets
 Inverse problems
 Linear models and least squares
 Logit and Probit models
 Maximum likelihood
 Nonparametric inference about the mean of a restricted population
 Optimization in infinite-dimensional spaces
 Path Models
 Permutation tests, the 2-sample problem, Fisher's Exact test and generalizations
 Prediction intervals and tolerance intervals, nonparametric and Gaussian
 Randomization
 Sampling (simple, with replacement, stratified, cluster, proportional-to-size, multistage;
ratio estimates, confidence intervals, tests)
Reading list (preliminary)

Angell, M., 2011.
The Epidemic of Mental Illness: Why?,
The New York Review of Books
http://www.nybooks.com/articles/archives/2011/jun/23/epidemicmentalillnesswhy/?pagination=false

Angell, M., 2011.
The Illusions of Psychiatry,
The New York Review of Books
http://www.nybooks.com/articles/archives/2011/jul/14/illusionsofpsychiatry/?pagination=false

Barron, A., J. Rissanen, and B. Yu, 1998.
The Minimum Description Length Principle in Coding and Modeling,
IEEE Trans. Info. Th., 44, 2743–2760.
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=720554

Benjamini, Y. and Y. Hochberg, 1995.
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,
Journal of the Royal Statistical Society B, 57, 289–300.
http://www.jstor.org/stable/2346101

Berk, R., L. Brown, E. George, E. Pitkin, M. Traskin, K. Zhang, and L. Zhao, 2012.
What You Can Learn From Wrong Causal Models.
 Berk, R., L. Brown, A. Buja, K. Zhang, and L. Zhao, 2011.
Valid Post-Selection Inference,
stat.wharton.upenn.edu/~buja/PoSI.pdf
 Berk, R., L. Brown, and L. Zhao, 2009.
Statistical Inference After Model Selection,
J. Quant. Criminol., DOI 10.1007/s10940-009-9077-7.
http://statistics.wharton.upenn.edu/documents/research/BerkBrownZhao2.pdf

Chakraborti, S. and J. Li, 2007.
Confidence Interval Estimation of a Normal Percentile,
The American Statistician, 61, 331–336.
http://dx.doi.org/10.1198/000313007X244457

Chamberlain, G., 1982.
Multivariate regression models for panel data,
J. Econometrics, 18, 5–46.
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271689&_user=4420&_pii=030440768290094X&_check=y&_origin=search&_zone=rslt_list_item&_coverDate=19820131&wchp=dGLbVlVzSkzk&md5=2f1c5dc4376cd27f7cfd359c1eeec0c0/1s2.0030440768290094Xmain.pdf

Cousins, R.D., 2011.
Negatively Biased Relevant Subsets Induced by the Most-Powerful
One-Sided Upper Confidence Limits for a Bounded Physical Parameter,
http://arxiv.org/abs/1109.2023

Eckhardt, D.H., 1984. Correlations Between Global Features of Terrestrial Fields,
Math. Geol., 16, 155–171.
http://www.springerlink.com/content/jw023j7157806hn4/fulltext.pdf

Federal Judicial Center, 2000.
Reference Manual on Scientific Evidence.
www.fjc.gov/public/pdf.nsf/lookup/sciman00.pdf/$file/sciman00.pdf
Reference Guide on Statistics, David H. Kaye & David A. Freedman;
Reference Guide on Survey Research, Shari Seidman Diamond

Field, E.H., K.R. Milner, and the 2007 Working Group on California Earthquake Probabilities,
2008. Forecasting California's Earthquakes—What Can We Expect in the Next 30 Years?
http://pubs.usgs.gov/fs/2008/3027/fs20083027.pdf

Freedman, D.A., 2009. Statistical Models, Theory and Practice, Cambridge.
http://www.amazon.com/StatisticalModelsPracticeDavidFreedman/dp/0521743850/
 Observational Studies and Experiments
 Path Models
 Maximum Likelihood
 Freedman, D.A., 2009. Statistical Models and Causal Inference: A Dialogue with the Social Sciences,
Cambridge.
http://www.amazon.com/StatisticalModelsCausalInferenceDialogue/dp/0521123909/
 Issues in the Foundations of Statistics: Probability and Statistical Models
 Statistical Assumptions as Empirical Commitments
 What is the Chance of an Earthquake?
 Survival Analysis: An Epidemiological Hazard?
 On Regression Adjustments in Experiments with Several Treatments
 Randomization Does not Justify Logistic Regression
 Diagnostics Cannot Have Much Power Against General Alternatives
 On Types of Scientific Inquiry: The Role of Qualitative Reasoning

Geller, R.J., 2011. Shake-up time for Japanese seismology,
Nature, 472, 407–409. doi:10.1038/nature10105
http://www.nature.com/nature/journal/v472/n7344/full/nature10105.html

Golomb, B.A., L.C. Erickson, S. Koperski, D. Sack, M. Enkin, and J. Howick, 2010.
What's in Placebos: Who Knows? Analysis of Randomized, Controlled Trials
Ann. Intern. Med., 153, 532–535.
http://www.annals.org/content/153/8/532.abstract

Hansen, M.H., and B. Yu, 2001.
Model Selection and the Principle of Minimum Description Length.
J. Am. Stat. Assoc., 96(454), 746–774.
doi:10.1198/016214501753168398.
http://pubs.amstat.org/doi/pdf/10.1198/016214501753168398
 Hide, R. and S.R.C. Malin, 1970.
Novel correlations between global features of the Earth's gravitational and magnetic fields,
Nature, 225, 605–609.
http://www.nature.com/nature/journal/v225/n5233/pdf/225605a0.pdf

Jönrup, H. and B. Rennermalm, 1976.
Regression analysis in samples from finite populations,
Scandinavian J. Statistics, 3, 33–36.
http://www.jstor.org/stable/4615605

Kaptchuk, T.J., W.B. Stason, R.B. Davis, A.T.R. Legedza, R.N. Schnyer, C.E. Kerr,
D.A. Stone, B.H. Nam, I. Kirsch, and R.H. Goldman, 2006.
Sham device v inert pill: randomised controlled trial of two placebo treatments,
British Medical Journal, doi:10.1136/bmj.38726.603310.55
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1370970/pdf/bmj33200391.pdf

Lindeman, M. and P.B. Stark, 2011.
A Gentle Introduction to Risk-Limiting Audits.
http://statistics.berkeley.edu/~stark/Preprints/gentle11.pdf

Loredo, T., 1994.
The return of the prodigal: Bayesian inference in astrophysics.
http://www.astro.cornell.edu/staff/loredo/bayes/return.pdf

McCormick, D., D.H. Bor, S. Woolhandler and D.U. Himmelstein, 2012.
Giving Office-Based Physicians Electronic Access To Patients' Prior Imaging
And Lab Results Did Not Deter Ordering Of Tests,
Health Affairs, 31, 488–496.
http://content.healthaffairs.org/content/31/3/488.full.pdf+html
NY Times article about the study:
http://www.nytimes.com/2012/03/06/business/digitalrecordsmaynotcuthealthcostsstudycautions.html?_r=1

Miratrix, L.W., J.S. Sekhon, and B. Yu, 2012.
Adjusting Treatment Effect Estimates by Post-Stratification in Randomized Experiments,
http://sekhon.berkeley.edu/papers/postadjustment.pdf

Morelli, A., and A.M. Dziewonski, 1987. Topography of the core-mantle boundary and lateral
homogeneity of the liquid core,
Nature, 325, 678–683.
http://www.nature.com/nature/journal/v325/n6106/pdf/325678a0.pdf

Moseley, J.B., K. O'Malley, N.J. Petersen, T.J. Menke, B.A. Brody, D.H. Kuykendall,
J.C. Hollingsworth, C.M. Ashton, and N.P. Wray,
2002.
A Controlled Trial of Arthroscopic Surgery for Osteoarthritis of the Knee
New Engl. J. Med., 347(2), 81–88.
http://www.nejm.org/doi/pdf/10.1056/NEJMoa013259

Noymer, A., A. Penner, and A. Saperstein, 2011.
Cause of death affects racial classification on death certificates.
PLoS One 6(1):e15812
https://webfiles.uci.edu/noymer/web/journal.pone.0015812.pdf

Pan, An, Qi Sun, A.M. Bernstein, M.B. Schulze,
J.E. Manson, M.J. Stampfer, W.C. Willett, and F.B. Hu, 2012.
Red Meat Consumption and Mortality: Results From 2 Prospective Cohort Studies
Archives of Intern Med. Published online March 12, 2012. doi:10.1001/archinternmed.2011.2287
http://archinte.amaassn.org/cgi/content/full/archinternmed.2011.2287
Also news reports of the findings:
http://www.latimes.com/health/boostershots/lahebredmeatwhybad20120314,0,181706.story
http://www.reuters.com/article/2012/03/14/ushealthredmeatidUSBRE82C1AT20120314

Peck, A.J., 2012.
Decision and Order in MONIQUE DA SILVA MOORE, et al., v. PUBLICIS GROUPE & MSL GROUP,
11 Civ. 1279 (ALC) (AJP).
http://www.mofo.com/files/Uploads/Images/120301FirstEverCourtDecisiononPredictiveCodingAttachment.pdf

Penner, A.M. and A. Saperstein. 2008.
How social status shapes race.
Proceedings of the National Academy of Sciences, 105, 19,628–19,630.
http://www.socsci.uci.edu/~penner/media/pnas.pdf

Pulliam, R.J. and P.B. Stark, 1993.
Bumps on the Core-Mantle Boundary: Are they facts or artifacts?
J. Geophysical Res., 98, 1943–1956.
http://www.agu.org/journals/jb/v098/iB02/92JB02692/92JB02692.pdf

Schafer, J.P., 2011.
An exact multiple comparisons test for a multinomial distribution.
British J. Math. Stat. Psych., 24(2), 267–272.
DOI: 10.1111/j.2044-8317.1971.tb00471.x
http://onlinelibrary.wiley.com/doi/10.1111/j.20448317.1971.tb00471.x/pdf.

Shearer, P.M., and P.B. Stark, 2011. The global risk of big earthquakes has not recently increased,
Proc. Nat. Acad. Sci., DOI 10.1073/pnas.1118525109,
http://www.pnas.org/content/early/2011/12/12/1118525109.full.pdf+html

Smoot, G.F., C.L. Bennett, A. Kogut, E.L. Wright,
J. Aymon, N.W. Boggess, E.S. Cheng, G. De Amici,
S. Gulkis, M.G. Hauser, G. Hinshaw, C. Lineweaver,
K. Lowenstein, P.D. Jackson, M. Janssen, E. Kaita,
T. Kelsall, P. Keegstra, P. Lubin, J. Mather,
S.S. Meyer, S.H. Moseley, T. Murdock, L. Rokke,
R.F. Silverberg,
L. Tenorio, R. Weiss, and D.T. Wilkinson, 1992.
Structure in the COBE DMR First Year Maps,
Astroph. J., 396, L1.
http://adsabs.harvard.edu/cgibin/nphdata_query?bibcode=1992ApJ...396L...1S&link_type=ARTICLE&db_key=AST&high=

Stark, P.B., and N.W. Hengartner, 1993.
Reproducing Earth's Kernel: Uncertainty of the shape of the
Core-Mantle Boundary from PKP and PcP Travel Times,
J. Geophys. Res., 98, 1957–1972.
http://www.agu.org/journals/jb/v098/iB02/92JB02071/92JB02071.pdf

Stark, P.B., 1993.
Uncertainty of the COBE Quadrupole Detection,
Astroph. J. Lett., 408, L73.
http://adsabs.harvard.edu/cgibin/nphdata_query?bibcode=1993ApJ...408L..73S&link_type=ARTICLE&db_key=AST&high=

Stark, P.B., 2008. The effectiveness of Internet content filters,
I/S: A Journal of Law and Policy for the Information Society, 4, 411–429. Preprint:
http://statistics.berkeley.edu/~stark/Preprints/filter07.pdf

Stark, P.B., 2009. Risk-limiting post-election audits:
P-values from common probability inequalities. IEEE Transactions on Information Forensics and Security,
4, 1005–1014.
http://statistics.berkeley.edu/~stark/Preprints/pvalues09.pdf

Stark, P.B., 2012. Constraints versus Priors.
http://statistics.berkeley.edu/~stark/Preprints/constraintsPriors12.pdf

Stein, S., R.J. Geller, and M. Liu, 2011.
Why Earthquake Hazard Maps Often Fail and What To Do About It,
http://www.earth.northwestern.edu/people/seth/Texts/mapfailure.pdf

Tenorio, L. and P.B. Stark and C.H. Lineweaver, 1999.
Bigger uncertainties and the Big Bang,
Inverse Problems, 15, 329–341.
http://iopscience.iop.org/02665611/15/1/029/pdf/02665611_15_1_029.pdf

U.S. Geological Survey, 2008. 2008 Bay Area Earthquake Probabilities.
http://earthquake.usgs.gov/regional/nca/ucerf/

U.S. Court of Appeals, Seventh Circuit, 2011.
Opinion in Nos. 11-1382, 11-1492 ATA AIRLINES, INC., Plaintiff-Appellee, Cross-Appellant, v.
FEDERAL EXPRESS CORPORATION, Defendant-Appellant, Cross-Appellee.
http://docs.justia.com/cases/federal/appellatecourts/ca7/111382/11138220111227opinion20111227.pdf

Vul, Edward, Christine Harris, Piotr Winkielman, and Harold Pashler, 2009.
Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social
Cognition,
Perspectives on Psychological Science, 4(3), 274–290.
http://www.edvul.com/pdf/VulHarrisWinkielmanPashlerPPS2009.pdf
(Also Scientific American article:
http://www.scientificamerican.com/article.cfm?id=brainscanresultsoverstated)

White, P.D., K.A. Goldsmith, A.L. Johnson, L. Potts, R. Walwyn, J.C. DeCesare, H.L. Baber, M. Burgess,
L.V. Clark, D.L. Cox, J. Bavinton, B.J. Angus, G. Murphy, M. Murphy, H. O'Dowd, D. Wilks, P. McCrone, T. Chalder,
and M. Sharpe, 2011.
Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care
for chronic fatigue syndrome (PACE): a randomised trial
The Lancet, DOI:10.1016/S0140-6736(11)60096-2.
http://esmeeu.com/getfile.php/Files/PACETrialMRCDWP%5B1%5D.pdf
Assignments

Read Freedman, Statistical Models: Theory and Practice (SMTP), Chapters 1–4;
Freedman, Statistical Models and Causal Inference: A Dialogue with the Social Sciences (SMCI), Chapters 1, 8;
(chapter 8 is also here:
http://statistics.berkeley.edu/~stark/Preprints/611.pdf)
Shearer & Stark, 2011.
[Due 1/26 in class]
Freedman, SMTP, problems 4.B.7, 4.B.8, 4.B.11, 4.5.3, 4.5.5, 4.5.6, 4.5.10, 4.5.11.

[Due 2/2 in class. Relates to the climate change paper we discussed in class on 1/17.]

Consider a random walk with n=137 steps, constructed as follows:
X(0) = 0.
[X(i) − X(i−1)], i = 1, …, 136, are IID,
and take the value +1 or −1 with probability 1/2 each.
You will test the hypothesis that a=0 on the assumption that the data (or subsets of the data)
come from the normal linear model
X(i) = ai + b + ε_{i}, where the errors {ε_{i}}
are IID N(0, σ^{2}), with σ^{2} unknown (to be estimated from the data),
based on fitting the model by OLS.

(a) By simulation, estimate the actual significance level of a nominal 5% test of the hypothesis a = 0.
That is, estimate how often the OLS estimate of the slope a is "statistically significant at level 5%" when the
significance calculation assumes that the normal linear model is true.
Justify your choice of the number of replications in the simulation.

(b) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to the last 58 points
in the series differs from the sign of the slope of the line fitted (by OLS) to the entire series of 137 points.
Justify your choice of the number of replications.

(c) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to the last 58 points
in the series differs from the sign of the slope of the line fitted (by OLS) to the entire series of 137 points,
and that both estimated slopes are statistically significant at level 5%.
Justify your choice of the number of replications.

(d) By simulation, estimate the chance that the sign of the slope of the line fitted
(by OLS) to some contiguous block of 58 points
in the series differs from the sign of the slope of the line fitted (by OLS) to the entire series of 137 points,
and that both estimated slopes are statistically significant at level 5%.
Justify your choice of the number of replications.

(e) By simulation, estimate the chance that the sign of the slope of the line fitted (by OLS) to some contiguous
block of at least 30 points
in the series differs from the sign of the slope of the line fitted (by OLS) to the entire series of 137 points,
and that both estimated slopes are statistically significant at level 5%.
Justify your choice of the number of replications.
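
A minimal sketch of the simulation in part (a), in Python with only the standard library. The function names (t_crit, ols_slope_t, random_walk, rejection_rate) are illustrative and not part of the assignment. The t critical value comes from a Cornish-Fisher approximation rather than an exact quantile, an assumption made to avoid external dependencies; parts (b) through (e) would reuse ols_slope_t on slices of the series.

```python
import math
import random
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-sided Student-t critical value (Cornish-Fisher).

    Assumption: an approximation that is adequate for df of about 30 or
    more; swap in an exact quantile function if one is available.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def ols_slope_t(y):
    """OLS fit of y on i = 0, ..., len(y)-1; returns (slope, t statistic)."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    sxy = sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y))
    slope = sxy / sxx
    rss = sum((yi - ybar - slope * (i - xbar)) ** 2 for i, yi in enumerate(y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: infinitely significant unless the slope is 0
        return slope, (math.copysign(math.inf, slope) if slope else 0.0)
    return slope, slope / se


def random_walk(n_points=137, rng=random):
    """X(0) = 0 followed by IID +1/-1 steps; returns n_points values."""
    x = [0]
    for _ in range(n_points - 1):
        x.append(x[-1] + rng.choice((-1, 1)))
    return x


def rejection_rate(reps=2000, n_points=137, alpha=0.05, seed=0):
    """Fraction of simulated walks whose OLS slope is 'significant'."""
    rng = random.Random(seed)
    crit = t_crit(n_points - 2, alpha)
    hits = sum(abs(ols_slope_t(random_walk(n_points, rng))[1]) > crit
               for _ in range(reps))
    return hits / reps
```

One way to reason about the number of replications: the estimate is a binomial proportion, so its standard error is at most 0.5/sqrt(reps), which is about 0.011 for reps = 2000.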

Now consider a different generating process:
X(0) = 0. X(1) = 1.
P([X(i) − X(i−1)] = [X(i−1) − X(i−2)]) = p, and
P([X(i) − X(i−1)] = −[X(i−1) − X(i−2)]) = 1−p, i = 2, …, 136.
By simulation, estimate the probabilities in parts 1(a)–1(e) (above) when this process (rather than the random walk)
generates the data, for p = 0.7, 0.8, and 0.9.
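
The second generating process can be sketched as follows. This is one reading of the definition above (each step repeats the previous step with probability p and reverses it otherwise); the name persistent_walk is hypothetical.

```python
import random


def persistent_walk(p, n_points=137, rng=random):
    """X(0) = 0, X(1) = 1; each later step repeats the previous step with
    probability p and reverses it with probability 1 - p."""
    x = [0, 1]
    step = 1
    for _ in range(n_points - 2):
        if rng.random() >= p:  # reverse direction with probability 1 - p
            step = -step
        x.append(x[-1] + step)
    return x
```

Passing a seeded random.Random instance as rng makes runs reproducible, for example persistent_walk(0.8, rng=random.Random(1)).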

What do you conclude about the significance of estimated regression coefficients when the regression
model did not generate the data?
What do you conclude about the climate change study? Discuss.

Read Freedman, SMTP, Chapter 7; Freedman, SMCI, Chapters 12, 13; White et al. (2011).
[Due 2/27].

Freedman, SMTP, problems 7.B.2, 7.B.3, 7.C.5, 7.D.7, 7.E.2, 7.E.3, 7.E.10, 7.5.2, 7.5.3, 7.5.4, 7.5.5

As we discussed in class, the experimental design used by White et al. does not match the way they
analyzed the data.
Their design was stratified on various things (study center, severity of disease, etc.), but Fisher's exact test
and the Kruskal-Wallis test assume simple randomization without stratification.
Moreover, the study does not seem to account for multiplicity in the use of Fisher's exact test to compare
three pairs of treatments.
This assignment will look at the effect of the mismatch between the design and the analysis and
the failure to take into account multiplicity on apparent p-values.
We have a population of 632 subjects (White et al. had 641 and then some were lost or excluded, and
some responses were imputed;
we're simplifying slightly).
158 subjects are assigned at random to each of four treatments.
Consider a binary outcome variable, for instance, a variable that is 1 if
at 52 weeks, the subject has improved by
either 2 or more points on the Chalder fatigue questionnaire or by 8 or more points on the short form-36,
and has improved on both; and that is zero otherwise.
Let N denote the total number of 1s among the 632 subjects.

Suppose N=80 for the moment. Allocate those 80 1s at random to the four treatment groups (control and
three others).
Find the three p-values for pairwise comparisons of control to each of the other
three treatments using
Fisher's exact test.
Repeat the random allocation 1,000 times.
What's the estimated chance that at least one p-value is below 0.05?
What's the estimated chance that at least two p-values are below 0.05?
Plot the empirical CDF of the smallest p-value in each simulation.
Repeat this simulation for N=160 and N=320 and report the results.
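
A sketch of this simulation using only the Python standard library. The two-sided p-value follows one common convention (sum the probabilities of all tables no more likely than the observed one); that convention and the names fisher_two_sided and simulate are assumptions, not something the assignment specifies.

```python
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def fisher_two_sided(a, b, n1, n2):
    """Two-sided Fisher exact p-value: a ones among n1 versus b among n2."""
    k = a + b
    lo, hi = max(0, k - n2), min(k, n1)
    denom = math.comb(n1 + n2, k)
    # Hypergeometric probabilities of every table with the same margins.
    pmf = [math.comb(n1, j) * math.comb(n2, k - j) / denom
           for j in range(lo, hi + 1)]
    p_obs = pmf[a - lo]
    return min(1.0, sum(q for q in pmf if q <= p_obs * (1 + 1e-7)))


def simulate(n_ones, reps=1000, group=158, n_groups=4, seed=0):
    """Randomly allocate n_ones 1s to the groups; group 0 is control.

    Returns (chance at least one p < 0.05, chance at least two p < 0.05,
    sorted smallest p-values, ready for an empirical CDF plot).
    """
    rng = random.Random(seed)
    outcomes = [1] * n_ones + [0] * (group * n_groups - n_ones)
    min_p, at_least_one, at_least_two = [], 0, 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        counts = [sum(outcomes[g * group:(g + 1) * group])
                  for g in range(n_groups)]
        pvals = [fisher_two_sided(counts[0], c, group, group)
                 for c in counts[1:]]
        small = sum(p < 0.05 for p in pvals)
        at_least_one += small >= 1
        at_least_two += small >= 2
        min_p.append(min(pvals))
    return at_least_one / reps, at_least_two / reps, sorted(min_p)
```

Calling simulate(80), simulate(160), and simulate(320) covers the three values of N; the empirical CDF at the i-th sorted p-value is (i + 1)/reps.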

The previous simulation ignored the stratification by centers.
Invent a generalization of Fisher's exact test that takes stratification into account:
the randomization across treatments does not mix across centers.
Think of at least three ways to combine results across strata to get an overall test statistic.
Explain what alternatives they should have the most power against.

Code the test in the previous question that you like best.
Base the p-value on simulation, since the test statistic no longer has a hypergeometric
distribution.
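
One possible stratified test, sketched under stated assumptions: the statistic is the total number of 1s in the treated group summed over centers, minus its expectation under within-center randomization (similar in spirit to a Mantel-Haenszel numerator). This is only one of many reasonable choices, and it should have the most power when the effect has the same sign in every center. The function name and the input layout are hypothetical.

```python
import random


def stratified_perm_pvalue(strata, reps=2000, seed=0):
    """Two-sided randomization p-value for one treatment versus control.

    strata: list of (ones_control, n_control, ones_treated, n_treated),
    one tuple per center. Labels are re-randomized within each center only.
    """
    rng = random.Random(seed)
    pooled, n_treated = [], []
    expected, observed = 0.0, 0
    for a_c, n_c, a_t, n_t in strata:
        k, n = a_c + a_t, n_c + n_t
        pooled.append([1] * k + [0] * (n - k))
        n_treated.append(n_t)
        expected += n_t * k / n  # null expectation of treated 1s
        observed += a_t
    obs_stat = abs(observed - expected)
    hits = 0
    for _ in range(reps):
        total = 0
        for outcomes, n_t in zip(pooled, n_treated):
            rng.shuffle(outcomes)  # never mixes subjects across centers
            total += sum(outcomes[:n_t])
        hits += abs(total - expected) >= obs_stat - 1e-9
    # Count the observed allocation as one of the reps + 1 equally likely
    # draws, so the estimate can never be exactly zero.
    return (hits + 1) / (reps + 1)
```

Because the re-randomization happens separately inside each center, the reference distribution respects the stratified design, unlike Fisher's exact test applied to the pooled counts.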

Suppose centers 1 and 2 each have 106 subjects and centers 3–6 each have 105 subjects.
Suppose that the reported results are as follows, where
the numbers in parentheses are the number allocated to the treatment and the numbers not in parentheses
are the number of 1s in the group.
Center   control   treatment 1   treatment 2   treatment 3
1        10 (27)   15 (27)       20 (26)       20 (26)
2        10 (27)   15 (27)       20 (26)       20 (26)
3        10 (27)   15 (26)       20 (26)       20 (26)
4        20 (27)   15 (26)       15 (26)       10 (26)
5        20 (27)   15 (26)       15 (26)       10 (26)
6        20 (27)   15 (26)       15 (26)       10 (26)
For the three paired comparisons with control, compare simulated p-values that take stratification into
account with the p-values for Fisher's exact test (which ignores stratification).
Try to find different sets of reported results that would make the two p-values differ as much
as possible for some paired comparison.
What happens if the centers have different sizes? Can you use Simpson's paradox to construct
examples where the sign of the effect is reversed?

Read Golomb et al. 2010; Jönrup, H. and B. Rennermalm 1976; Kaptchuk et al. 2006; Berk et al. 2009 and 2011.
[Due 3/19].

Simulate 1,000 iid N(0,1) random variables.
Take the subset that are larger than 2.
Find a 1-sided (upper) p-value of the z-test of the hypothesis that the subset you selected is a
random sample from a N(0,1) population.
Repeat this overall simulation 1,000 times.
Plot the empirical cdf of the p-values.
What fraction are below 0.1?
Why is that fraction so much larger than 0.1? Isn't the null hypothesis true? Discuss.

Simulate 1,000 iid N(0,1) random variables, as before, but instead of selecting those that are larger than 2,
select the 50 that are largest.
Find a 1-sided (upper) p-value of the z-test of the hypothesis that the subset you selected is a random sample from a N(0,1)
population.
Repeat this overall simulation 1,000 times.
Plot the empirical cdf of the p-values.
What fraction are below 0.1?
Why is that fraction so much larger than 0.1? Isn't the null hypothesis true? Discuss.
What's the difference between this situation and the first situation?
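
Both selection experiments can be sketched with one helper that takes the selection rule as an argument. The names are illustrative. The upper tail is computed with math.erfc so that very small p-values are not rounded to zero.

```python
import math
import random


def upper_p(sample):
    """One-sided (upper) z-test p-value for H0: mean 0, known sd 1."""
    m = len(sample)
    z = sum(sample) / m * m ** 0.5  # sample mean times sqrt(m)
    return 0.5 * math.erfc(z / math.sqrt(2))


def pvalue_draws(select, reps=1000, n=1000, seed=0):
    """Sorted p-values from reps rounds of: draw n N(0,1), select, test."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        chosen = select([rng.gauss(0, 1) for _ in range(n)])
        if chosen:  # the threshold rule can, very rarely, select nothing
            out.append(upper_p(chosen))
    return sorted(out)


def above_two(x):
    """First experiment: keep the values larger than 2."""
    return [v for v in x if v > 2]


def top_fifty(x):
    """Second experiment: keep the 50 largest values."""
    return sorted(x)[-50:]
```

For example, p = pvalue_draws(above_two) followed by sum(v < 0.1 for v in p) / len(p) gives the fraction of p-values below 0.1; replacing above_two with top_fifty gives the second experiment.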

Reproduce the simulations described in the "Simulation Results" section of the Berk et al. 2009 paper,
that produced figures 3–7. Reproduce the figures.
Repeat the simulations, this time constructing 95% confidence intervals for any variables that are selected.
What fraction of the confidence intervals constructed cover their corresponding parameter?
Is there a notable difference between the coefficient in the model that is actually zero and those
that are not? Discuss.

Simulate 600 iid N(0,1) random variables. Divide them into 6 groups of 100.
Perform multiple linear regression of the first group onto the following 20 variables:
the 5 other groups, the squares of the 5 other groups, the cubes of the other 5 groups,
and the reciprocals of the other 5 groups.
Select any estimated coefficients that are statistically significant at level 0.05.
Construct 95% confidence intervals for just those "significant" coefficients.
Note the number of confidence intervals you constructed, and the fraction of them that include
zero—the true population value of all the coefficients in this setup.
Repeat the simulation of 600 variables, the regression, the selection, and the construction of confidence
intervals, a total of 1,000 times.
What fraction of simulations gave one or more confidence intervals?
What fraction of simulations gave one or more confidence intervals that did not contain zero?
What fraction of the confidence intervals you constructed overall contained zero?
Discuss.
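
A standard-library sketch of this experiment. Several choices here are assumptions rather than part of the assignment: the regression includes an intercept, the t critical value uses a Cornish-Fisher approximation, columns are rescaled to unit length before solving (this leaves the t statistics unchanged and improves conditioning when a reciprocal is huge), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def t_crit(df, alpha=0.05):
    """Approximate two-sided t critical value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def one_replication(rng, n=100, alpha=0.05):
    """Return 95% intervals (lo, hi) for the 'significant' coefficients."""
    groups = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
    y, others = groups[0], groups[1:]
    cols = [[1.0] * n]  # intercept: an assumption, the text does not say
    for f in (lambda v: v, lambda v: v**2, lambda v: v**3, lambda v: 1 / v):
        cols += [[f(v) for v in g] for g in others]
    scale = [math.sqrt(sum(v * v for v in c)) for c in cols]
    cols = [[v / s for v in c] for c, s in zip(cols, scale)]
    p = len(cols)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(c, y)) for c in cols]
    inv = invert(xtx)
    beta = [sum(inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    resid = [yi - sum(beta[j] * cols[j][i] for j in range(p))
             for i, yi in enumerate(y)]
    s2 = sum(r * r for r in resid) / (n - p)
    crit = t_crit(n - p, alpha)
    intervals = []
    for j in range(1, p):  # skip the intercept
        se = math.sqrt(s2 * inv[j][j])
        if abs(beta[j]) > crit * se:  # "significant" at level alpha
            intervals.append(((beta[j] - crit * se) / scale[j],
                              (beta[j] + crit * se) / scale[j]))
    return intervals


def summarize(reps=1000, seed=0):
    """Return (fraction of sims with any CI, number of CIs, CIs covering 0)."""
    rng = random.Random(seed)
    sims_with_ci = n_ci = n_cover_zero = 0
    for _ in range(reps):
        cis = one_replication(rng)
        sims_with_ci += bool(cis)
        n_ci += len(cis)
        n_cover_zero += sum(lo <= 0 <= hi for lo, hi in cis)
    return sims_with_ci / reps, n_ci, n_cover_zero
```

Note that the interval and the test use the same critical value, so an interval built for a "significant" coefficient excludes zero by construction; the sketch just makes that bookkeeping explicit.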

Read McCormick et al. 2012; Pan et al. 2012; Freedman, SMCI, Chapter 11.
[Due 4/9].
Comment critically (not necessarily negatively) on McCormick et al.:

How were the data obtained? What kind of sample was it?

How did they adjust for possible confounders? Stratification? Regression?
A combination of the two?
Do they give a simple crosstab of the data?

What techniques were used in the analysis?
Comment on the assumptions required for those techniques to be reliable, and
discuss whether the assumptions are plausible in this application.

Is multiplicity an issue in their analysis?
If so, how (if at all) did they account for it?

The paper presents confidence intervals for various things.
What assumptions are those confidence intervals based on?
What, precisely, is random in this study?

Do they use model selection? Do they make confidence intervals for coefficients in
the selected model?
Comment.

What are the three best things about the study?
What part of their argument is most convincing?

If you were designing the study and the data analysis, what are the three things
you think are most important to do differently?
Explain why they are important and how the approach taken in the paper
might be misleading.

Do you believe the findings? Why or why not?
P.B. Stark, statistics.berkeley.edu/~stark.
http://statistics.berkeley.edu/~stark/Teach/S215B/S12/index.htm
Last modified 9 April 2012.