Current projects

Simple Random Sampling: Not So Simple

with Ron Rivest and Philip Stark

A simple random sample (SRS) of size k from a population of size n is a sample drawn at random in such a way that every subset of k of the n items is equally likely to be selected. The theory of inference from SRSs is fundamental in statistics; many statistical techniques and formulae assume that the data are an SRS. True SRSs are rare; in practice, people tend to draw samples by using pseudo-random number generators (PRNGs) and algorithms that map a set of pseudo-random numbers into a subset of the population. Most statisticians take for granted that the software they use “does the right thing,” producing samples that can be treated as if they are SRSs. In fact, the PRNG algorithm and the algorithm for drawing samples using the PRNG matter enormously. Using basic counting principles, we show that some widely used methods cannot generate all SRSs of size k. In simulations, we demonstrate that the subsets they do generate do not have equal frequencies, which introduces bias and makes uncertainty calculations meaningless. We compare the “randomness” and computational efficiency of commonly used PRNGs to a PRNG based on the SHA-256 hash function, which avoids these pitfalls because its state space is countably infinite. We propose several best practices for researchers using PRNGs, including the wide adoption of hash-function-based PRNGs.
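The counting principle at the heart of the argument can be illustrated in a few lines. As an assumption for the sketch, suppose the PRNG is seeded with a 32-bit integer, so the sampling algorithm can follow at most 2^32 distinct output streams; any sampling problem with more than 2^32 possible subsets then has subsets that can never be drawn:

```python
import math

# Assumption for this sketch: the PRNG is seeded with a 32-bit integer,
# so at most 2**32 distinct output streams (and hence samples) are reachable.
MAX_SEEDS = 2 ** 32

def can_reach_all_srs(n, k, n_states=MAX_SEEDS):
    """Return True if the number of size-k subsets of n items
    fits within the number of reachable PRNG states."""
    return math.comb(n, k) <= n_states

# Even a modest sampling problem exceeds the reachable states:
print(math.comb(50, 25))          # 126410606437752 distinct SRSs of 25 from 50
print(can_reach_all_srs(50, 25))  # False: most subsets can never be drawn
```

The same pigeonhole argument applies to any PRNG with finite state, which is why a hash-based construction with an effectively unbounded counter avoids the problem.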

permute: a Python Package for Permutation Methods

with Jarrod Millman, Philip Stark, and Stéfan van der Walt

See my talk from useR! 2016 on the corresponding R package

permute provides permutation tests for a variety of common hypothesis testing problems. Permutation tests require only minimal distributional assumptions about the data and guarantee correct type I error rates. Our goal is to enable researchers to reason about which nonparametric tests are suitable for their experimental design and to implement those tests effectively in Python. permute is available on the Python Package Index and on GitHub.
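To give a flavor of the approach, here is a minimal sketch of a two-sample permutation test written with plain NumPy (this is illustrative and does not use permute’s own API; the data and function name are invented for the example):

```python
import numpy as np

def two_sample_permutation_test(x, y, reps=10000, seed=42):
    """One-sided permutation test of the null hypothesis of no
    treatment effect, using the difference in group means as the
    test statistic."""
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        if diff >= observed:
            hits += 1
    # Include the observed arrangement so the p-value is never zero
    return observed, (hits + 1) / (reps + 1)

# Hypothetical treatment and control measurements
x = np.array([5.1, 4.8, 6.2, 5.9, 6.0])
y = np.array([4.2, 4.5, 4.9, 4.1, 4.4])
stat, p = two_sample_permutation_test(x, y)
```

The only assumption used is that, under the null, group labels are exchangeable, which is exactly what randomized assignment delivers.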

A Comparison of Parametric and Permutation Tests for Regression Analysis of Randomized Experiments

with Fraser Lewis and Luigi Salmaso

Visit the project repo

Hypothesis tests based on linear models are widely accepted by organizations that regulate clinical trials. These tests are derived using strong assumptions about the data-generating process so that the resulting inference can be based on parametric distributions. Because these methods are well understood and robust, they are sometimes applied to data that depart from the assumptions, such as ordinal integer scores. Permutation tests are a nonparametric alternative that require only minimal assumptions, which are often guaranteed by the randomization itself. We compare analysis of covariance (ANCOVA), a special case of linear regression that incorporates stratification, to several permutation tests based on linear models that control for pretreatment covariates. In simulations using a variety of data-generating processes, some of which violate the parametric assumptions, the permutation tests maintain power comparable to ANCOVA. We illustrate the use of these permutation tests alongside ANCOVA with data from a clinical trial comparing the effectiveness of two treatments for gastroesophageal reflux disease. Given the considerable costs and scientific importance of clinical trials, one may want to include an additional nonparametric method, such as a linear model permutation test, as a robustness check on the statistical inference for the main study endpoints.
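One simple member of this family of tests can be sketched as follows: fit a linear model of the outcome on treatment and a pretreatment covariate, take the treatment coefficient as the test statistic, and generate its null distribution by permuting the treatment labels, as randomization justifies under the strong null. This is an illustrative sketch with simulated data, not the paper’s exact procedure:

```python
import numpy as np

def permute_treatment_coef(y, treat, covar, reps=5000, seed=0):
    """Permutation test for the treatment coefficient in the linear
    model y ~ 1 + treat + covar, permuting treatment labels."""
    rng = np.random.default_rng(seed)

    def treat_coef(t):
        X = np.column_stack([np.ones_like(y), t, covar])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1]  # coefficient on treatment

    observed = treat_coef(treat)
    t = treat.copy()
    hits = 0
    for _ in range(reps):
        rng.shuffle(t)  # re-randomize treatment assignment
        if abs(treat_coef(t)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (reps + 1)

# Simulated randomized experiment with a true treatment effect of 2
rng = np.random.default_rng(1)
covar = rng.normal(size=60)
treat = np.zeros(60); treat[:30] = 1; rng.shuffle(treat)
y = 2 * treat + covar + rng.normal(scale=0.5, size=60)
coef, p = permute_treatment_coef(y, treat, covar)
```

Unlike ANCOVA’s F-test, the reference distribution here comes from the randomization itself rather than from normal-theory assumptions.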

Past Projects

Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness

with Anne Boring and Philip Stark

View the article on ScienceOpen Research

In the press: NPR, Inside Higher Ed

Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:

  • SET are biased against female instructors by an amount that is large and statistically significant

  • the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded

  • the bias varies by discipline and by student gender, among other things

  • it is not possible to adjust for the bias, because it depends on so many factors

  • SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness

  • gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.

These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.

Model-based Matching

with Philip Stark and Jasjeet Sekhon

The goal of observational studies is to make inferences about the causal effect of a treatment on some outcome of interest. The “fundamental problem of causal inference” makes this difficult: we can only see each individual’s outcome after treatment or after no treatment, but never both. To estimate the effect of treatment, one must use a control group as the counterfactual. However, the treatment and control groups may be unalike in ways other than the treatment. Ideally, to adjust for these confounding variables, one would estimate the difference in outcomes between cases and controls who are identical with respect to all confounders, then average over the pairs. In practice, the large number of covariates to account for makes this impossible. We propose to circumvent the problem of balancing treatment and control groups with a novel method for matching and estimation. This two-step procedure combines predictive modeling with nonparametric hypothesis testing to assess whether or not the treatment assignment provides any additional information about the outcome beyond what we would expect given all other covariates.
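The two-step idea above can be sketched in code. As an assumption for the illustration, take the predictive model to be ordinary least squares on the covariates alone, and test whether treatment explains the residual variation the model leaves behind; the actual method may use a different predictive model and test statistic:

```python
import numpy as np

def model_based_test(y, treat, X, reps=5000, seed=0):
    """Sketch of the two-step procedure: (1) predict the outcome from
    covariates only, (2) permutation-test whether treatment explains
    the residuals the predictive model leaves behind."""
    rng = np.random.default_rng(seed)
    # Step 1: predictive model of the outcome, ignoring treatment
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    # Step 2: permutation test on the mean residual difference by group
    observed = resid[treat == 1].mean() - resid[treat == 0].mean()
    t = treat.copy()
    hits = 0
    for _ in range(reps):
        rng.shuffle(t)
        diff = resid[t == 1].mean() - resid[t == 0].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (reps + 1)

# Simulated data: two covariates, a true treatment effect of 1.5
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
treat = np.repeat([0.0, 1.0], 50); rng.shuffle(treat)
y = X @ np.array([1.0, -1.0]) + 1.5 * treat + rng.normal(scale=0.5, size=100)
diff, p = model_based_test(y, treat, X)
```

If treatment carries no information beyond the covariates, the residuals are exchangeable across groups and the test has its nominal level.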

Is Salt Bad for Nations?

with Andrew Noymer, Philip Stark, and Crystal Yu

The World Health Organization is running a major campaign to reduce salt consumption worldwide. However, the main lines of evidence that salt is bad come from observational studies on hypertension. We investigated WHO’s real outcome of interest: mortality. We collected data on mortality, alcohol, tobacco, sodium, and economic factors in 36 countries in 1990 and 2010. Using a nonparametric permutation test, we studied whether sodium intake helps predict change in life expectancy after accounting for other known health predictors.
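A test of this general shape can be sketched as follows (this is an illustrative reconstruction with simulated data, not the study’s exact procedure or dataset): measure how much the residual sum of squares drops when sodium is added to a regression on the other predictors, and generate the null distribution by permuting the sodium values across countries.

```python
import numpy as np

def sodium_prediction_test(dy, sodium, other, reps=5000, seed=0):
    """Does sodium help predict change in life expectancy (dy) beyond
    the other predictors? Statistic: reduction in residual sum of
    squares when sodium is added; null generated by permuting sodium."""
    rng = np.random.default_rng(seed)

    def rss(X):
        Z = np.column_stack([np.ones(len(dy)), X])
        beta, *_ = np.linalg.lstsq(Z, dy, rcond=None)
        r = dy - Z @ beta
        return r @ r

    base = rss(other)  # fit without sodium

    def gain(s):
        return base - rss(np.column_stack([other, s]))

    observed = gain(sodium)
    s = sodium.copy()
    hits = 0
    for _ in range(reps):
        rng.shuffle(s)  # break any real link between sodium and dy
        if gain(s) >= observed:
            hits += 1
    return observed, (hits + 1) / (reps + 1)

# Simulated stand-ins: 36 countries, three other predictors, a real
# sodium effect built in so the test should reject
rng = np.random.default_rng(3)
other = rng.normal(size=(36, 3))
sodium = rng.normal(size=36)
dy = other @ np.array([0.5, -0.3, 0.8]) + 1.0 * sodium + rng.normal(scale=0.5, size=36)
stat, p = sodium_prediction_test(dy, sodium, other)
```

Permuting sodium rather than the outcome keeps the relationship between the other predictors and mortality intact, so the test isolates sodium’s incremental predictive value.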