The Fertilizer Experiment


Classical experimental design has its roots in agricultural applications. In this lab we will consider one such experiment, conducted in 1932 to determine the effect of various levels of nitrogen and phosphorus additives on the productivity of potato plants. In this experiment, 6 treatments consisting of different combinations of nitrogenous and phosphatic fertilizers were applied to 36 plots of land that formed a 6 by 6 square grid, on which potatoes were grown. The weight (in pounds) of potatoes harvested from each plot was recorded.

Let R be a factor with levels 1, 2, 3, 4, 5, and 6 indicating the row in which a given plot lies; let C be a factor, also with levels 1, 2, 3, 4, 5, and 6, indicating the column in which the plot lies; and let T be a factor with levels A, B, C, D, E, and F indicating which of the 6 fertilizer treatments was applied to the plot. With this notation, the results of our experiment can be presented as follows (the numerical entry in each cell represents the weight of the potatoes harvested from the given plot with the particular treatment):

This data set is taken from pages 90--91 and 199--207 of The Design of Experiments by Sir Ronald A. Fisher, 7th edition (New York: Hafner, 1960), and it is also discussed on pages 411--412 and 588--589 of A Course in Probability and Statistics by Charles J. Stone (Belmont, Calif.: Wadsworth, 1995).

Our experiment is in the form of a Latin square; that is, each treatment is applied exactly once in each row and once in each column.
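The defining property of a Latin square can be checked mechanically. The sketch below is only an illustration: the function name is_latin_square and the cyclic layout are assumptions made for this example, and the layout is NOT the one used in the 1932 experiment.

```python
def is_latin_square(grid):
    """Return True if every symbol occurs exactly once in each row and column."""
    n = len(grid)
    symbols = set(grid[0])
    if len(symbols) != n:
        return False
    # Each row must have n entries and contain every symbol.
    if not all(len(row) == n and set(row) == symbols for row in grid):
        return False
    # Each column must contain every symbol as well.
    return all({grid[i][j] for i in range(n)} == symbols for j in range(n))


# Hypothetical cyclic layout for six treatments (not the 1932 field layout).
treatments = "ABCDEF"
cyclic = [[treatments[(i + j) % 6] for j in range(6)] for i in range(6)]

for row in cyclic:
    print(" ".join(row))
print(is_latin_square(cyclic))
```

Applying the same function to the treatment letters in the data table is a quick way to confirm that the design really is a Latin square.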

Read sections 11.1 and 11.2 of your text before you continue with the lab.

Let , and denote the artificial random variables with the design distribution, corresponding to R, C, and T, respectively. Let , , and denote spaces of all functions on , , and , respectively, where each .

1. What sort of distribution do each of , , and have? Are , , independent? Are they pairwise independent?

2. What do your answers in part 1 say about the spaces , , and ? Derive analytic expressions for , , and . Evaluate these expressions using the data from the table on the previous page.

3. The data stored in s200b/data/fert.dat consist of the values of the four factors R, C, N, P and the response (pounds of potatoes harvested) for each of the 36 runs of our experiment. You will have to code the values of T yourself. The coding of N and P in the file is different from the coding given below. This is done to simplify the calculations of parts 7 and 8.

Using the S commands lm and anova (see appendix), complete the following table.

What do you observe? Interpret your results. How do the entries in the "SS" column of this table compare with the values of , , and you obtained in part 2? How many parameters did you estimate in the regression?
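As a cross-check on the "SS" column, the sums of squares for a Latin square can also be computed directly from group means, because the design is balanced and each factor's sum of squares can be computed separately. The sketch below is not the lm/anova route described in the lab; it uses made-up responses on a hypothetical cyclic layout (not the 1932 data), and the function name latin_square_ss is an assumption for this example.

```python
def latin_square_ss(records):
    """records: list of (row, col, treatment, y) from a Latin square.

    Returns the sums of squares for R, C, T, Residual and Total.
    The per-factor formula relies on the design being balanced.
    """
    n_obs = len(records)
    grand = sum(rec[3] for rec in records) / n_obs

    def factor_ss(index):
        groups = {}
        for rec in records:
            groups.setdefault(rec[index], []).append(rec[3])
        return sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2
                   for ys in groups.values())

    ss = {"R": factor_ss(0), "C": factor_ss(1), "T": factor_ss(2)}
    ss["Total"] = sum((rec[3] - grand) ** 2 for rec in records)
    ss["Residual"] = ss["Total"] - ss["R"] - ss["C"] - ss["T"]
    return ss


# Made-up data on a hypothetical cyclic layout (NOT the 1932 experiment).
treatments = "ABCDEF"
effect = {t: 4.0 * k for k, t in enumerate(treatments)}  # invented effects
records = []
for i in range(6):
    for j in range(6):
        t = treatments[(i + j) % 6]
        noise = ((7 * i + 11 * j) % 5) - 2.0  # deterministic stand-in for error
        y = 100.0 + 2.0 * i + 3.0 * j + effect[t] + noise
        records.append((i + 1, j + 1, t, y))

ss = latin_square_ss(records)
df = {"R": 5, "C": 5, "T": 5, "Residual": 20, "Total": 35}
for name in ("R", "C", "T", "Residual", "Total"):
    print(f"{name:9s} df={df[name]:2d} SS={ss[name]:10.2f}")
```

With the real data read from the file in place of the invented records, the printed values should agree with the output of anova for the additive model in R, C, and T.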

So far, we have treated this experiment as though nothing were known about the quantities of nitrogenous and phosphatic fertilizers that were combined to form our six treatments. Let N denote a factor with two levels, 0 and 1, indicating the amount of nitrogenous fertilizer in a given treatment; and let P denote a factor with three levels, 0, 1, and 2, indicating the amount of phosphatic fertilizer in the treatment. We can now express the levels of T as a function of N and P as follows:

Let and denote the artificial random variables with the design distribution, corresponding to N and P, respectively. Let and denote spaces of all functions on , and , respectively, and set .

4. What sort of distribution does have? Are , , pairwise independent? Are and independent?

5. What do your answers in part 4 say about the spaces , , , , ?

6. Using the values of N and P given in the dataset, complete an ANOVA table similar to the one on page 2 of this lab, but with the row labeled "T" replaced by three rows with labels "N", "P", and "NP". What do you observe? Interpret your results. In particular, what can you say about the strength of the interaction NP?

In the plot below the three lines represent the change in mean weight of potatoes as you switch from N=0 to N=1 holding P fixed (observations are pooled across R and C). Is this picture what you would have expected based on your ANOVA results above? Explain.
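The quantities behind such a plot are just the mean responses in each (N, P) cell, pooled over rows and columns. A minimal sketch of that computation follows; the function name cell_means and the toy responses are assumptions for illustration and are not taken from the experiment.

```python
from collections import defaultdict


def cell_means(records):
    """records: iterable of (n_level, p_level, y). Returns {(n, p): mean of y}."""
    totals = defaultdict(lambda: [0.0, 0])
    for n, p, y in records:
        totals[(n, p)][0] += y
        totals[(n, p)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up responses, two per (N, P) cell (NOT the 1932 data).
toy = [
    (0, 0, 10.0), (0, 0, 12.0), (1, 0, 15.0), (1, 0, 17.0),
    (0, 1, 11.0), (0, 1, 13.0), (1, 1, 20.0), (1, 1, 22.0),
    (0, 2, 12.0), (0, 2, 14.0), (1, 2, 25.0), (1, 2, 27.0),
]
means = cell_means(toy)
for p in (0, 1, 2):
    change = means[(1, p)] - means[(0, p)]
    print(f"P={p}: N=0 mean {means[(0, p)]:.1f}, "
          f"N=1 mean {means[(1, p)]:.1f}, change {change:+.1f}")
```

In the toy numbers the change from N=0 to N=1 differs across the levels of P, so the three lines would not be parallel; lines that are close to parallel would instead suggest a weak NP interaction.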

Parts 7 and 8 below are bonus. I suggest you do part 8.

In choosing a Latin square design, the experimenters implicitly made a decision to ignore interactions between R, C and T in order to reduce the number of runs required to estimate the various main effects. Classically, interactions between treatments (N and P) and blocking variables (R and C) are completely ignored independently of the choice of design (see the discussion on blocking and randomization in Section 11.6 of the text). In our experiment, for example, the interaction between any two of T, R, or C has 25 degrees of freedom. It is clearly not possible to estimate the main effects of these three factors as well as even a single interaction from only 36 runs.
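The degrees-of-freedom bookkeeping behind this statement can be written out explicitly. The following is a small sketch of the arithmetic only; the variable names are assumptions introduced for the example.

```python
levels = {"R": 6, "C": 6, "T": 6}   # number of levels of each factor
n_runs = 36                          # one observation per plot

total_df = n_runs - 1                               # df left after the mean
main_df = sum(k - 1 for k in levels.values())       # 5 df per main effect


def interaction_df(a, b):
    """Degrees of freedom of the two-factor interaction between a and b."""
    return (levels[a] - 1) * (levels[b] - 1)


leftover_df = total_df - main_df    # what remains after the three main effects

print("total df:", total_df)
print("main-effect df:", main_df)
print("df of one interaction:", interaction_df("R", "C"))
print("df left after main effects:", leftover_df)
print("room for one interaction:", interaction_df("R", "C") <= leftover_df)
```

A single two-factor interaction needs more degrees of freedom than remain once the three main effects have been fitted, which is why none of them can be estimated from this design.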

In this case, however, we are able to entertain an alternative analysis of our data in which R, C, N, and P are viewed as quantitative, and we restrict our attention to quadratic polynomials in these four variables. If we treat our variables in this way, the interaction space between R and C, say, has only 1 degree of freedom.

7. Consider the quadratic model in R, C, N, and P, that leaves out only the squared term for N (why?). Perform the backwards deletion technique you used in lab 2 to come up with an alternative model for the mean yield of pounds of potatoes in terms of R, C, N and P (you do not have to bother with adding variables back in at the end, but do observe the rules about hierarchical structure and variable deletion). What do you find?

8. Finally, repeat the steps of part 7, this time starting with the quadratic model that leaves out the squared term for N as well as all the interaction terms involving R and C with N and P (that is, leaving out the block-treatment interactions). Referring to the model fit in part 6 (leaving out the NP interaction) as "Categorical" and the model you arrive at by the backwards deletion in this part as "Quantitative," fill in the following ANOVA table:

What do you observe? Comment on your results, explaining what exactly this ANOVA table is telling you. What is the null hypothesis here?

Note: In parts 7 and 8, we treated each of the variables R, C, N and P as quantitative. You can instead take only N and P to be quantitative, leaving R and C as qualitative factors, and then repeat parts 7 and 8 to see the effect of this choice.