> x = rnorm(10)
> y = rnorm(10)
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = 1.4896, df = 15.481, p-value = 0.1564
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3221869  1.8310421
sample estimates:
 mean of x  mean of y 
 0.1944866 -0.5599410 

Before we can use this function in a simulation, we need to find out how to extract the t-statistic (or some other quantity of interest) from the output of the t.test function. Like most R functions, t.test returns a list whose components we can examine with the names function:

> ttest = t.test(x,y)
> names(ttest)
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"

The value we want is named "statistic". To extract it, we can use the dollar sign notation, or double square brackets:

> ttest$statistic
       t 
1.489560 
> ttest[['statistic']]
       t 
1.489560 

Of course, just one value doesn't let us do very much - we need to generate many such statistics before we can look at their properties. In R, the replicate function makes this easy:

> ts = replicate(1000,t.test(rnorm(10),rnorm(10))$statistic)

Under the assumptions of normality and equal variance, the statistic should have a t distribution with 10 + 10 - 2 = 18 degrees of freedom. (Each observation contributes a degree of freedom, but we lose two because we have to estimate the mean of each group.) How can we test whether that is true? One way is to plot the theoretical density of the t-statistic we should be seeing, and to superimpose the density of our sample on top of it. To get an idea of what range of x values we should use for the theoretical density, we can view the range of our simulated data:

> range(ts)
[1] -4.564359  4.111245

Since the distribution is supposed to be symmetric, we'll use a range from -4.5 to 4.5. We can generate equally spaced x-values in this range with the seq function:

> pts = seq(-4.5,4.5,length=100)
> plot(pts,dt(pts,df=18),col='red',type='l')

Now we can add a line to the plot showing the density for our simulated sample:

> lines(density(ts))

The plot appears below. Another way to compare two densities is with a quantile-quantile plot. In this type of plot, the quantiles of two samples are calculated at a variety of points in the range of 0 to 1, and then plotted against each other. If the two samples came from the same distribution with the same parameters, we'd see a straight line through the origin with a slope of 1; in other words, we're testing to see whether various quantiles of the data are identical in the two samples. If the two samples came from similar distributions, but their parameters were different, we'd still see a straight line, but not through the origin. For this reason, it's very common to draw a straight line through the origin with a slope of 1 on plots like this. We can produce a quantile-quantile plot (or QQ plot, as they are commonly known) with the qqplot function:

> qqplot(ts,rt(1000,df=18))
> abline(0,1)

We can see that the central points of the graph seem to agree fairly well, but there are some discrepancies near the tails (the extreme values at either end of the distribution). The tails of a distribution are the most difficult part to measure accurately, which is unfortunate, since those are often the values that interest us most - the ones which will provide us with enough evidence to reject a null hypothesis. Because the tails of a distribution are so important, another way to test whether the distribution of a sample follows some hypothesized distribution is to calculate the quantiles of some tail probabilities (using the quantile function) and compare them to the theoretical quantiles from the qt function:

> probs = c(.9,.95,.99)
> quantile(ts,probs)
     90%      95%      99% 
1.427233 1.704769 2.513755 
> qt(probs,df=18)
[1] 1.330391 1.734064 2.552380

The quantiles agree fairly well, especially at the .95 and .99 quantiles. Performing more simulations, or using a larger sample size for the two groups, would probably result in values even closer to what we have theoretically predicted. One final method for comparing distributions is worth mentioning. We noted previously that one of the assumptions for the t-test is that the variances of the two samples are equal. However, a modification of the t-test, known as Welch's test, corrects for this problem by estimating the variances separately, and adjusting the degrees of freedom to use in the test. This correction is performed by default, but can be shut off by using the var.equal=TRUE argument:

> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -0.8103, df = 17.277, p-value = 0.4288
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.0012220  0.4450895
sample estimates:
mean of x mean of y 
0.2216045 0.4996707 

> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.8103, df = 18, p-value = 0.4284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9990520  0.4429196
sample estimates:
mean of x mean of y 
0.2216045 0.4996707 

Since the statistic is the same in both cases, it doesn't matter whether we use the correction or not; either way we'll see identical results when we compare the two methods using the techniques we've already described. Since the degrees-of-freedom correction changes depending on the data, we can't simply perform the simulation and compare it to a different number of degrees of freedom. The other thing that changes when we apply the correction is the p-value that we would use to decide if there's enough evidence to reject the null hypothesis. What is the behaviour of the p-values? While not necessarily immediately obvious, under the null hypothesis the p-values for any statistical test (with a continuous test statistic) should form a uniform distribution between 0 and 1; that is, any value in the interval 0 to 1 is just as likely to occur as any other value. For a uniform distribution, the quantile function is just the identity function: a value of .5 is greater than 50% of the data; a value of .95 is greater than 95% of the data. As a quick check of this notion, let's look at the density of probability values when the null hypothesis is true:

> tps = replicate(1000,t.test(rnorm(10),rnorm(10))$p.value)
> plot(density(tps))

The graph appears below. Another way to check whether the probabilities follow a uniform distribution is with a QQ plot:

> qqplot(tps,runif(1000))
> abline(0,1)

The graph appears below. The idea that the probabilities follow a uniform distribution seems reasonable. Now, let's look at some of the quantiles of the p-values when we force the t.test function to assume equal variances with var.equal=TRUE:

> tps = replicate(1000,t.test(rnorm(10),rnorm(10),var.equal=TRUE)$p.value)
> probs = c(.5,.7,.9,.95,.99)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4873799 0.7094591 0.9043601 0.9501658 0.9927435 

The agreement actually looks very good. What about when we let t.test apply the Welch correction (the default)?

> tps = replicate(1000,t.test(rnorm(10),rnorm(10))$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4932319 0.7084562 0.9036533 0.9518775 0.9889234 

There's not that much of a difference, but, of course, the variances in this example were equal. How does the correction work when the variances are not equal?

> tps = replicate(1000,t.test(rnorm(10),rnorm(10,sd=5),var.equal=TRUE)$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.5221698 0.6926466 0.8859266 0.9490947 0.9935562 
> tps = replicate(1000,t.test(rnorm(10),rnorm(10,sd=5))$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4880855 0.7049834 0.8973062 0.9494358 0.9907219 

There is an improvement, but not so dramatic.
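To make "not so dramatic" concrete, we can line the quantiles up against the probabilities they should match; under an exact uniform distribution each quantile would equal its probability. (The values below are simply re-entered from the output above.)

```r
# Deviation of the simulated p-value quantiles from the uniform ideal,
# using the values printed above (pooled = var.equal=TRUE, welch = default)
probs = c(.5,.7,.9,.95,.99)
pooled = c(0.5221698,0.6926466,0.8859266,0.9490947,0.9935562)
welch = c(0.4880855,0.7049834,0.8973062,0.9494358,0.9907219)
round(rbind(pooled = pooled - probs, welch = welch - probs),4)
```

The worst-case deviation drops from about .022 to about .012 - an improvement, but a small one.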

So far we've only looked at the behaviour of the t-statistic under the null hypothesis. To study the power of the test (the probability of rejecting the null hypothesis when it is actually false), we can simulate samples with different means and count how often the statistic falls outside the critical values:

t.power = function(nsamp=c(10,10),nsim=1000,means=c(0,0),sds=c(1,1)){
    # critical values for a two-sided test at the .05 level
    lower = qt(.025,df=sum(nsamp) - 2)
    upper = qt(.975,df=sum(nsamp) - 2)
    ts = replicate(nsim,
        t.test(rnorm(nsamp[1],mean=means[1],sd=sds[1]),
               rnorm(nsamp[2],mean=means[2],sd=sds[2]))$statistic)
    # fraction of simulated statistics falling in the rejection region
    sum(ts < lower | ts > upper) / nsim
}

Let's try it with our simple example:

> t.power(means=c(0,1))
[1] 0.555

Not bad for a sample size of 10! Of course, if the differences in means are smaller, it's going to be harder to reject the null hypothesis:

> t.power(means=c(0,.3))
[1] 0.104

How large a sample size would we need to detect that difference of .3 with 95% power?

> samps = c(100,200,300,400,500)
> res = sapply(samps,function(n)t.power(means=c(0,.3),nsamp=c(n,n)))
> names(res) = samps
> res
  100   200   300   400   500 
0.567 0.841 0.947 0.992 0.999 

It would take over 300 samples in each group to be able to detect such a difference. Now we can return to the issue of unequal variances. We saw that Welch's adjustment to the degrees of freedom helped a little bit under the null hypothesis. Now let's see if the power of the test is improved by Welch's test when the variances are unequal. To do this, we'll need to modify our t.power function to work with p-values instead of the statistics themselves, since the Welch correction changes the degrees of freedom from sample to sample:

t.power1 = function(nsamp=c(10,10),nsim=1000,means=c(0,0),sds=c(1,1),var.equal=TRUE){
    tps = replicate(nsim,
        t.test(rnorm(nsamp[1],mean=means[1],sd=sds[1]),
               rnorm(nsamp[2],mean=means[2],sd=sds[2]),
               var.equal=var.equal)$p.value)
    # count p-values in either 2.5% tail (note that since the p-value is
    # already two-sided, tps < .05 would be the more usual criterion)
    sum(tps < .025 | tps > .975) / nsim
}

Since I set var.equal=TRUE as the default, the Welch correction will only be applied when we call the function with var.equal=FALSE:

> t.power1(nsim=10000,sds=c(1,2),means=c(1,2))
[1] 0.1767
> t.power1(nsim=10000,sds=c(1,2),means=c(1,2),var.equal=FALSE)
[1] 0.1833

There does seem to be an improvement, but not so dramatic. We can look at the same thing for a variety of sample sizes:

> sizes = c(10,20,50,100)
> res1 = sapply(sizes,function(n)t.power1(nsim=10000,sds=c(1,2),
+               means=c(1,2),nsamp=c(n,n)))
> names(res1) = sizes
> res1
    10     20     50    100 
0.1792 0.3723 0.8044 0.9830 
> res2 = sapply(sizes,function(n)t.power1(nsim=10000,sds=c(1,2),
+               means=c(1,2),nsamp=c(n,n),var.equal=FALSE))
> names(res2) = sizes
> res2
    10     20     50    100 
0.1853 0.3741 0.8188 0.9868 
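To quantify the improvement, we can difference the two sets of power estimates (re-entering the values printed above; res1 is the pooled test, res2 the Welch test):

```r
# Estimated gain in power from the Welch correction at each sample size
res1 = c("10"=0.1792,"20"=0.3723,"50"=0.8044,"100"=0.9830)
res2 = c("10"=0.1853,"20"=0.3741,"50"=0.8188,"100"=0.9868)
round(res2 - res1,4)
#     10     20     50    100 
# 0.0061 0.0018 0.0144 0.0038 
```

The Welch test never costs us power here, but the gain is less than two percentage points at every sample size.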
