Next: About this document ...
R Lab Session : Part 3
Testing proportions
Problem : 19 Douglas-fir trees in a stand of 53 have succumbed to a fungal pathogen. Lab tests have suggested that 70% of Douglas fir are immune. Assuming that all trees in the stand are infected and all those that are susceptible have died, test if these data are consistent with the laboratory tests.
> prop.test(19, 53, p=.3, alternative="two.sided")
1-sample proportions test with continuity correction
data: 19 out of 53, null probability 0.3
X-squared = 0.6074, df = 1, p-value = 0.4358
alternative hypothesis: true p is not equal to 0.3
95 percent confidence interval:
0.2349203 0.5025337
sample estimates:
p
0.3584906
Problem : Western red cedar is being tested for vulnerability to the same pathogen. In an infected stand of 39 cedar trees, 5 have died. Test if susceptibility is the same between the two tree species.
> prop.test(c(19,5),c(53,39),p=NULL,alternative="two.sided")
2-sample test for equality of proportions with continuity correction
data: c(19, 5) out of c(53, 39)
X-squared = 5.0427, df = 1, p-value = 0.02473
alternative hypothesis: two.sided
95 percent confidence interval:
0.04166461 0.41890626
sample estimates:
prop 1 prop 2
0.3584906 0.1282051
Exercises : Repeat the first problem using the exact binomial test, binom.test, instead. Does the answer differ? How and why? Also test the hypothesis that 80% of Western red cedar trees are immune.
Paired
test
Problem : Weights of 9 students (in lbs) before and after a month-long training schedule are given below. Test if the mean weight after training is smaller than the mean weight before training.
pre-training:
124 152 129 144 150 153 138 161 140
post-training:
133 145 131 150 164 132 130 156 141
Copy the two vectors above and save as text files, say pre.txt and post.txt.
> pre <- as.numeric(scan("C:/pre.txt", what="numeric"))
Read 9 items
> post <- as.numeric(scan("C:/post.txt", what="numeric"))
Read 9 items
> weight <- cbind(pre,post)
> t.test(weight[,1],weight[,2],paired=T,conf.level=.95,alternative="greater")
Paired t-test
data: weight[, 1] and weight[, 2]
t = 0.2847, df = 8, p-value = 0.3915
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-5.530518 Inf
sample estimates:
mean of the differences 1
Note : the difference in means is mean(pre) - mean(post)
Perform the same test when the only interest is to see if there is any change.
> t.test(weight[,1],weight[,2],paired=T,conf.level=.95,alternative="two.sided")
Paired t-test
data: weight[, 1] and weight[, 2]
t = 0.2847, df = 8, p-value = 0.783
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.09842 9.09842
sample estimates:
mean of the differences
1
Paired
test : sample size calculation
Suppose, for the same problem, it is known from past experience that the standard deviation of the difference in weights is 10 lbs. What should be the minimum sample size necessary to detect a 15 lbs weight change with proability 0.9 ?
> power.t.test(delta=15,sd=10,type="paired",alternative="two.sided",power=.9)
Paired t test power calculation
n = 6.869959
delta = 15
sd = 10
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs
Suppose you have only 5 observations and are interested in testing the hypothesis that there is a difference in mean weights. What would be the power for detecting an effect of size 0.8? (Recall : effect size
)
> power.t.test(n=5,delta=.8,type="paired",alternative ="two.sided")
Paired t test power calculation
n = 5
delta = 0.8
sd = 1
sig.level = 0.05
power = 0.2806895
alternative = two.sided
Exercise : What if the level of significance is 1% instead of 5% ?
Two sample
test
Problem : Using the kfm data from R Lab 1, we want to know if there is a difference in birth weight by gender, assuming that weights of boys and girls have the same standard deviation. First you will need to go to R Lab 1, click on the kfm link, and save the data file as text.
> kfm <- read.table("C:/kfmdata.txt",header=T)
> kfm
> t.test(weight~sex,data=kfm,type="two.sample",alternative="two.sided",var.equal=T)
Two Sample t-test
data: weight by sex
t = 1.5626, df = 48, p-value = 0.1247
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.06848806 0.54616806
sample estimates:
mean in group boy mean in group girl
5.43816 5.19932
Do the same dropping the assumption of equal variance for the groups.
> t.test(weight~sex,data=kfm,type="two.sample",alternative="two.sided",var.equal=F)
Welch Two Sample t-test
data: weight by sex
t = 1.5626, df = 47.383, p-value = 0.1248
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.06859144 0.54627144
sample estimates:
mean in group boy mean in group girl
5.43816 5.19932
Exercise : If you know that the s.d. = 0.5 lb for both the groups and the two samples have the same size (say 25), what would be the power for detecting a difference in mean of 0.5 lb?
Wilcoxon-Mann-Whitney Test
Problem : Ms. Whitmore has been handing out lollypops for Halloween. At the beginning of the evening, she is very generous, but as the evening wears on she begins to fear for her lollypop supply and becomes stingier. The numbers given below are counts of lollypops given to each child. Test the hypothesis that the average number of lollypops received by a child is 6.
26 18 20 25 19 23 14 12 15 10 8 7 5 2 3 2 2
Copy the data and save it as a text file.
> lolly <- as.numeric(scan("C:/lolly.txt", what="numeric"))
> qqnorm(lolly)
> wilcox.test(lolly, exact=F, mu=6, alternative="two.sided")
Wilcoxon signed rank test with continuity correction
data: lolly
V = 128, p-value = 0.01561
alternative hypothesis: true mu is not equal to 6
Problem : Percent ground area covered by moss has been measured in two different arctic tundra types, in twelve plots in each, data given below. Do the two tundra types differ in moss cover?
> ws <- c(0, 2, 3, 8, 5, 12, 24, 53, 6, 79, 19, 61)
> tt <- c(13, 9, 58, 17, 49, 71, 39, 10, 89, 54, 26, 75)
> qqnorm(ws)
> qqnorm(tt)
> wilcox.test(ws,tt,paired=F,alternative="two.sided")
Wilcoxon rank sum test
data: ws and tt
W = 37, p-value = 0.0449
alternative hypothesis: true mu is not equal to 0
Exercises : Using the kfm data, test the hypothesis that maternal weight differs between mothers who give birth to boys vs. girls.
The data below represent counts of dental cavities detected in the children who visited Ms. Whitmore on Halloween. The first row is from dental examinations six months prior to Halloween, the second from six months after. Is there evidence of an increase in cavities? If so, does this necessarily implicate Ms. Whitmore, or, for that matter, Halloween?
1 0 3 1 2 0 1 3 0 0 4 2 0 1 1 0 2
2 0 1 1 0 1 2 0 1 0 2 3 3 2 1 1 0
Test
The Chi-squared test has not been covered yet in class, but it is another one you will need to know how to use. Enter
>help(chisq.test)
to see a description of this test and the parameters it takes. You can use this command for an explanation of any of the tests here. To see a full list of tests available in R, enter
data: obs >help(package=ctest)
.
Snapdragon Example (from Book)
Test whether data fit the model of probabilities
for (white, pink, red), respectively.
> obs<-c(54,122,58)
> nullprob<-c(.25,.5,.25)
> chisq.test(obs,p=nullprob)
Chi-squared test for given probabilities
X-squared = 0.5641, df = 2, p-value = 0.7542
For a 2-way test, you just enter in the contingency table:
> data(HairEyeColor) #Brings in data set that's in R
> malecolor<-HairEyeColor[,,1] #Just brings the males'
(3-diminsional array)-- don't worry about understanding
> malecolor
Eye
Hair Brown Blue Hazel Green
Black 32 11 10 3
Brown 38 50 25 15
Red 10 10 7 7
Blond 3 30 5 8
> chisq.test(malecolor)
Pearson's Chi-squared test
data: malecolor
X-squared = 42.1633, df = 9, p-value = 3.068e-06
Warning message:
Chi-squared approximation may be incorrect in: chisq.test(malecolor)
Exercise (Good for Homework!) Bring in the graft dataset on web and do a
test for independence - hint use table( ) to get in the contingency table format. (type is the type of leukaemia: Acute Myeloid, Acute Lymphatic, and Chronic Myeloid; gvhd is whether patient developed graft-versus-host disease)
Next: About this document ...
Elizabeth Purdom
2003-11-05