Distribution of the Sample Chi-Square Statistic

When the tool starts, the "box" on the right contains four probabilities: the probabilities of four distinct outcome categories. You might think of the categories as four colors. Each time you press the "Take Sample" button, the computer draws a pseudo-random sample of categories according to those probabilities, computes the observed value of the chi-squared statistic for that sample, and appends the value to the data displayed in the histogram. The curve displayed is the chi-squared curve with 3 degrees of freedom, one fewer than the number of categories.

Press the "Take Sample" button a few times to see what happens. Then type 1000 into the "Take ____ samples" box and hit the return key. Now each time you press the "Take Sample" button, the computer will take 1000 samples of size 5 and add the 1000 resulting values of the chi-squared statistic to the data displayed in the histogram. Press the "Take Sample" button until you have taken a total of 10,000 samples of size 5. Note that the chi-squared curve generally follows the histogram of values of the chi-squared statistic, but does not match it exactly. In particular, the histogram has gaps that the curve does not share, as well as "spikes" at frequent values.

Change the sample size to 200 and repeat the experiment, taking 10,000 samples. Now the chi-squared curve is an excellent approximation to the histogram. In general, if the expected number of outcomes in every category is 10 or larger, the chi-squared curve with k - 1 degrees of freedom is an accurate approximation to the probability histogram of the chi-squared statistic for k categories.
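
The experiment described above can also be tried outside the tool. The following is a minimal sketch in Python, not the tool's own code. The probabilities in probs, the function names, and the seed are assumptions chosen for illustration, since the tool's actual box contents are not given here. The statistic is computed as the sum over categories of (observed - expected)^2 / expected, where the expected count in a category is the sample size times that category's probability.

```python
import random
from collections import Counter

def chi_squared_statistic(counts, probs, n):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

def simulate(probs, sample_size, num_samples, seed=0):
    """Draw num_samples samples of sample_size categories each and
    return the chi-squared statistic of every sample."""
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    stats = []
    for _ in range(num_samples):
        draws = rng.choices(categories, weights=probs, k=sample_size)
        tally = Counter(draws)
        counts = [tally[c] for c in categories]
        stats.append(chi_squared_statistic(counts, probs, sample_size))
    return stats

# Hypothetical box contents: four category probabilities (an assumption).
probs = [0.1, 0.2, 0.3, 0.4]

small = simulate(probs, sample_size=5, num_samples=10_000)
large = simulate(probs, sample_size=200, num_samples=10_000)

# Count the distinct values the statistic takes for each sample size.
distinct_small = len({round(x, 9) for x in small})
distinct_large = len({round(x, 9) for x in large})

print("distinct values, samples of size 5:  ", distinct_small)
print("distinct values, samples of size 200:", distinct_large)
print("mean statistic, samples of size 5:  ", sum(small) / len(small))
print("mean statistic, samples of size 200:", sum(large) / len(large))
```

With samples of size 5 there are only 56 possible ways to split the 5 draws among four categories, so the statistic can take at most 56 distinct values; that is the source of the gaps and spikes in the histogram. With samples of size 200 the possible values are far more numerous and closely spaced, so the histogram looks smooth. In both cases the average of the simulated statistics should come out close to 3, which is k - 1 for k = 4 categories.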