Understanding Logistic Regression


Because this lab involves a good deal of exploratory work with the data, you do not need to print every command you used along the way. You should, however, provide enough information to make clear what steps you took. In particular, trivial output such as a printout of the design matrix should not appear in your final report.

  1. The variable RACE is qualitative with three levels. For simplicity, collapse it into two levels: black and non-black. Also collapse levels 1 and 2 of into one level.
  2. Perform a logistic regression using AGE, SER, CAN and an intercept as predictors, all entering linearly. What are the estimates of the coefficients and their asymptotic variances? How many iterations were needed until convergence?
  3. Obtain the above results using the techniques outlined in the text, specifically the iteratively reweighted least squares method outlined in Chapter 13. You will need to write a function implementing the algorithm on page 407 of the text:

    In the algorithm above, X is the design matrix (including a constant column). Make an initial guess at the value of the coefficient vector. Repeat the steps outlined in the algorithm above until you reach the desired degree of accuracy or until the coefficient vector agrees with your estimate in I.2 above. List the coefficients obtained in each iteration. How many iterations were needed until convergence?

  4. Using the output from I.3 above, compute the associated covariance matrix. Do the variance estimates agree with the output from I.2?
  5. Suppose you want to include some or all of the three predictor variables AGE, SER, CAN, and no others. Which ones would you use? Why? Base your answer on the above results.
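
As a starting point for items 3 and 4, here is a minimal sketch of iteratively reweighted least squares for a logistic model, written in Python with NumPy. It follows the standard textbook form of the update (a weighted least squares fit to a working response), which may differ in notation or detail from the algorithm on page 407 of the text. The function name `irls_logistic`, the zero starting value, the tolerance, and the iteration cap are all assumptions made for illustration, and the data at the bottom are simulated, not the lab data.

```python
import numpy as np


def irls_logistic(X, y, beta0=None, tol=1e-8, max_iter=25):
    """Fit a logistic regression by iteratively reweighted least squares.

    X : (n, p) design matrix that already includes a constant column.
    y : length-n vector of 0/1 responses.
    Returns (beta, cov, history), where history holds the coefficient
    vector at the start and after every iteration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Hypothetical default: start from the zero vector.
    beta = np.zeros(X.shape[1]) if beta0 is None else np.asarray(beta0, dtype=float)
    history = [beta.copy()]

    for _ in range(max_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
        w = mu * (1.0 - mu)                 # diagonal of the weight matrix W
        z = eta + (y - mu) / w              # working response
        XtW = X.T * w                       # X'W without forming an n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        history.append(beta_new.copy())
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    # Asymptotic covariance estimate: inverse of X'WX at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv((X.T * w) @ X)
    return beta, cov, history


# Simulated data for demonstration only (not the lab data set).
rng = np.random.default_rng(0)
n = 500
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
p_demo = 1.0 / (1.0 + np.exp(-(X_demo @ np.array([-0.5, 1.0, -1.0]))))
y_demo = (rng.uniform(size=n) < p_demo).astype(float)

beta_hat, cov_hat, history = irls_logistic(X_demo, y_demo)
for i, b in enumerate(history):
    print(i, b)                             # coefficients at each iteration
print("variances:", np.diag(cov_hat))
```

The diagonal of the returned covariance matrix gives the variance estimates to compare against the output from I.2, and `len(history) - 1` is the number of iterations performed. The sketch does not guard against fitted probabilities reaching exactly 0 or 1 (which makes the weights zero), so a production version would need some safeguard for separated or nearly separated data.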