Hormone Replacement Therapy Does Not Save Lives: Comments on the Women's Health Initiative

David A Freedman
Department of Statistics, UC Berkeley, CA 94720-3860, U.S.A.
email: freedman@stat.berkeley.edu

and

Diana B Petitti
Kaiser Permanente Southern California, U.S.A.
393 E. Walnut Street, Pasadena, CA 91188
email: diana.b.petitti@kp.org

We thank Ross Prentice and his colleagues for a rich and provocative paper that has generated many insights in a variety of methodological areas. We also thank our editor, Xihong Lin, for organizing this discussion. Ours is an age of specialization, and we propose to consider only the effect of HRT (hormone replacement therapy) on three cardiovascular endpoints: coronary heart disease, stroke, and venous thromboembolism.

First, some background. Ideas of biological mechanism and evidence from observational epidemiology led many observers to conclude that HRT was protective, reducing cardiovascular death rates by a factor of 2 or more. According to Grodstein and Stampfer (1998, pp. 211, 217), "Consistent evidence from over 40 epidemiologic studies demonstrates that postmenopausal women who use estrogen therapy after the menopause have significantly lower rates of heart disease than women who do not take estrogen...the evidence clearly supports a clinically important protection against heart disease for postmenopausal women who use estrogen." Also see Stampfer and Colditz (1991), Grodstein et al. (1996).

Such findings profoundly influenced the practice of medicine. In the late 1990s, postmenopausal hormones were best-selling drugs worldwide. About 90 million prescriptions for HRT were issued annually in the United States, corresponding to 15 million HRT users (Hersh, Stefanick, and Stafford, 2004). Some observers remained skeptical. See, for instance, Petitti (1994), Posthuma, Westendorp and Vandenbroucke (1994), or Vandenbroucke (1995).

Two large clinical trials were organized to resolve the issue-- HERS (Heart and Estrogen/progestin Replacement Study) and WHI (Women's Health Initiative). Prentice and his colleagues were actively involved in the design and analysis of WHI. The experiments demonstrated no benefit from HRT, and some harm: WHI was stopped early, largely due to an increased risk of breast cancer among the HRT group. Debate continues on these issues-- for instance, a different mix of hormones administered along a different time path might be beneficial. See, for example, International Journal of Epidemiology (2004; 33: 441-67). However, the experiments led to another major change in medical practice. Today, HRT would rarely be prescribed to prevent cardiovascular disease.

WHI had two branches, an observational study and a randomized controlled experiment. By contrast with the experiment, the observational study-- like many of the other observational studies-- found a protective effect from HRT. What accounts for the discrepancy? Prentice et al. have two answers that we find persuasive. (1) Observational studies can be misleading. Therefore, it is important to adjust for confounding variables, including socioeconomic status. This may seem obvious. It is not. The Nurses' Health Study on HRT did not adjust for socioeconomic status (Grodstein et al., 1996; Humphrey, Chan, and Sox, 2002). (2) In many contexts, including the present one, time is a crucial variable. Treatment and disease are dynamic, not static.

When arguing these points, Prentice et al. could be read as suggesting that-- if properly analyzed-- the observational study agrees with the randomized controlled experiment. We would have several questions about such an interpretation. (1) Observational data can be adjusted in a variety of ways. Without experimental data, it will be unclear which adjustments to make, or how far to go. (2) Table 3 in Prentice et al. (2005b) only shows results on coronary heart disease and thromboembolism.
However, even after all the modeling is done, there remains a large disparity with respect to an important cardiovascular endpoint-- stroke (Prentice et al., 2005a). Prentice et al. (2005b) mention stroke, but do not discuss the difficulties created by this endpoint. (3) Prentice et al. chose for their null hypothesis equality between the two branches of WHI. However, statistical power is limited, and the choice of null greatly influences conclusions. Power is limited because the women in the treatment arm of the clinical trial are mainly short-term users of HRT. By contrast, in the observational study, users have been taking hormones for a long time. (According to the conventions used by Prentice et al., in the observational study, exposure prior to baseline is counted.)

To illustrate how substantive conclusions may be determined by apparently innocuous technical choices, we suggest the following null hypothesis: compared to the randomized controlled experiment, the observational study under-estimates the risks of HRT by a factor in the range 1.5 to 3, depending on risk group and endpoint (heart disease, stroke, thromboembolism). The data seem to be at least as compatible with our null hypothesis as with the null hypothesis of equivalence. These null hypotheses have rather different implications for bias in observational epidemiology. [In their rejoinder, Prentice et al. apparently confuse overall risk with risk conditional on risk group: the comment above applies to risk groups, as in Table 3 of Prentice et al. (2005b), and the corresponding results in Prentice et al. (2005a) for thromboembolism.]

Bias stems from incomplete adjustment. Adjustment must be incomplete, because relevant lifestyle factors are extraordinarily difficult to identify or measure. Here is one example. In observational studies, women on HRT are "compliers": they follow a treatment regime prescribed by their doctors.
But compliance-- even by subjects assigned to placebo in a clinical trial-- is associated with favorable outcomes. A factor of 2 for compliance bias is compatible with previous literature. Compliance is thoroughly confounded with treatment in observational studies of HRT. See Petitti (1994) and Barrett-Connor (1991) for additional discussion.

HRT comes in two forms: (1) unopposed (estrogen only) and (2) combined (estrogen plus progestin). WHI considered both forms (Tables 1 and 2 in Prentice et al.). Modeling results are presented only for the combined form (Table 3 in Prentice et al.). Hence our focus on combined therapy.

We turn now to a policy issue. Although WHI is tax-supported, its data are not available to us. Data from clinical trials are available only rarely, and conditions may be imposed that almost preclude independent analysis. Policies governing data dissemination need to be reconsidered, although due regard must be paid to patient confidentiality. Only by thorough scrutiny can error be avoided. Transparency is the best assurance of scientific quality. For additional discussion, see Geller et al. (2004).

We would sum up the methodological lessons as follows. Rigorous causal inferences have been made using observational data, from the time of John Snow on cholera and Ignaz Semmelweis on puerperal fever. Recent examples include the health effects of smoking, and the demonstration that cervical cancer is in part a sexually transmitted disease. Indeed, most of what we know about causation in the medical sciences comes from observational studies-- because experiments are often unethical or impractical. We might even suggest that observation necessarily precedes experiment. What else could provide motivation, or help define protocols? On the other hand, observational data need to be approached with caution.
When there is a conflict between observational epidemiology and experiments-- HRT not being an isolated case-- we think that the experiments are the ones to watch. The gap between association and causation will not generally be bridged by proportional-hazard models, even with stratification and time-dependent exposure variables. For more discussion on the relative merits of experiment and observation, see Mill (1868, Book III, Chapters VII and X).

Prentice and his colleagues deserve our thanks for the paper, and their work on WHI.

REFERENCES

Barrett-Connor, E. (1991). Postmenopausal estrogen and prevention bias. Annals of Internal Medicine 115, 455-6.

Geller, N. L., Sorlie, P., Coady, S., Fleg, J., and Friedman, L. (2004). Limited access data sets from studies funded by the National Heart, Lung, and Blood Institute. Clinical Trials 1, 517-524.

Grodstein, F. and Stampfer, M. J. (1998). The cardioprotective effects of estrogen. In The Management of the Menopause (J. Studd, ed.), Chapter 22, 211-9. London: Parthenon.

Grodstein, F., Stampfer, M. J., Manson, J. E., Colditz, G. A., Willett, W. C., Rosner, B., Speizer, F. E., and Hennekens, C. H. (1996). Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. New England Journal of Medicine 335, 453-461.

Hersh, I. L., Stefanick, M. L., and Stafford, R. S. (2004). National use of postmenopausal hormone therapy: annual trends and response to recent evidence. JAMA 291, 47-53.

Humphrey, L. L., Chan, B. K. S., and Sox, H. C. (2002). Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Annals of Internal Medicine 137, 273-84.

Mill, J. S. (1868). A System of Logic, Ratiocinative and Inductive, 7th ed. (1st ed., 1843). London: Longmans, Green, Reader, and Dyer.

Petitti, D. B. (1994). Coronary heart disease and estrogen replacement therapy: Can compliance bias explain the results of observational studies? Annals of Epidemiology 4, 115-118.

Posthuma, W. F., Westendorp, R. G., and Vandenbroucke, J. P. (1994). Cardioprotective effect of hormone replacement therapy in postmenopausal women: is the evidence biased? BMJ 308, 1268-1269.

Prentice, R. L., Langer, R., Stefanick, M. et al. (2005a). Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women's Health Initiative clinical trial. American Journal of Epidemiology 162, 404-14.

Prentice, R. L., Pettinger, M., and Anderson, G. (2005b). Statistical issues arising in the Women's Health Initiative. Biometrics, this issue.

Stampfer, M. J. and Colditz, G. A. (1991). Estrogen replacement therapy and coronary heart disease: A quantitative assessment of the epidemiologic evidence. Preventive Medicine 20, 47-63. (Reprinted in the International Journal of Epidemiology 2004; 33, 445-453.)

Vandenbroucke, J. P. (1995). How much of the cardioprotective effect of postmenopausal estrogens is real? Epidemiology 6, 207-208.

This paper is scheduled to appear in the December 2005 issue of Biometrics. Copyright has been transferred to the International Biometric Society, who should be contacted for permission to circulate.