Ecological Inference

D.A. Freedman, Professor of Statistics
P.B. Stark, Professor of Statistics
University of California
Berkeley, CA 94720-3860

In: 1 Encyclopedia of Law and Society: American and Global Perspectives, 447--448, David S. Clark, ed., Sage Publications. Invited.

Data are often reported using one kind of grouping ("aggregation"), while information is needed about a different sort of grouping. Can the data be disaggregated and reaggregated (taken apart and put back together) by statistical methods? That is what ecological inference tries to do.

Our lead example is a voting rights case where Hispanic plaintiffs seek redistricting. We must decide whether the non-Hispanic majority generally votes as a bloc to defeat the candidate preferred by the Hispanics. For any precinct, the number of votes for each candidate is a matter of public record. So is the number of Hispanic voters. However, the secret ballot prevents us from knowing how the non-Hispanics voted, or which candidate is preferred by the Hispanics. We have vote totals aggregated by precinct. We want the votes reaggregated by ethnic group.

Across precincts, there is a statistical relationship between the percentage of votes for each candidate and the ethnic makeup of the precinct. Under certain assumptions, the relationship can be used to infer the number of Hispanics and non-Hispanics who voted for the various candidates. As it turns out, the assumptions generally cannot be tested using available data on precinct vote totals and numbers of Hispanic voters. Assumptions could be tested with exit polls, but then ecological inference would be unnecessary: estimates could be made directly from the polling data.

A common method for ecological inference is "ecological regression." This technique relies on the "constancy assumption" that Hispanics vote alike, no matter where they live, and that non-Hispanics do too. Demography is destiny, geography is accident.
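As a rough illustration, ecological regression is commonly set up as a straight-line fit across precincts: the candidate's share of the vote is regressed on the Hispanic share of voters. If the constancy assumption holds, the fitted value at a Hispanic share of 0 estimates support among non-Hispanics, and the fitted value at a share of 1 estimates support among Hispanics. The Python sketch below assumes that setup. The function name, the precinct figures, and the support rates are all hypothetical and are not taken from any actual case. The line is fitted to two made-up sets of precincts: one where both groups vote the same way everywhere, and one where non-Hispanic support depends on the makeup of the precinct.

```python
import numpy as np

def ecological_regression(hispanic_share, candidate_share):
    """Least-squares fit of candidate_share = a + b * hispanic_share.

    Under the constancy assumption, a estimates the candidate's support
    among non-Hispanics and a + b the support among Hispanics.
    (Hypothetical helper written for this illustration.)
    """
    x = np.asarray(hispanic_share, dtype=float)
    y = np.asarray(candidate_share, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"non_hispanic": a, "hispanic": a + b}

# Made-up precincts: fraction of voters in each precinct who are Hispanic.
hispanic_share = np.array([0.10, 0.25, 0.40, 0.60, 0.80])

# Assumed support rates for one candidate (illustrative values only).
p_hispanic = 0.70
p_non_hispanic = 0.30

# Case 1: constancy holds. Each group votes the same way in every precinct.
share_const = p_hispanic * hispanic_share + p_non_hispanic * (1 - hispanic_share)
est_const = ecological_regression(hispanic_share, share_const)

# Case 2: constancy fails. Non-Hispanic support rises with the Hispanic
# share of the precinct (an invented dependence, for illustration).
p_non_hispanic_local = 0.30 + 0.40 * hispanic_share
share_ctx = p_hispanic * hispanic_share + p_non_hispanic_local * (1 - hispanic_share)
est_ctx = ecological_regression(hispanic_share, share_ctx)

print("constancy holds:", est_const)
print("constancy fails:", est_ctx)
```

In the first data set the fit returns the assumed rates (0.70 for Hispanics and 0.30 for non-Hispanics, up to rounding). In the second, the estimate for Hispanics moves away from the assumed 0.70, even though Hispanic voting behavior is identical in the two data sets.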
If the constancy assumption fails, ecological regression can make serious errors, as demonstrated by W.S. Robinson. Using census data on states, he correlated the percentage of persons who were literate with the percentage who were foreign-born. Literacy rates were much higher in states with higher percentages of foreign-born persons. According to ecological regression, the foreign-born must have been substantially more literate than the native-born. In fact, however, foreign-born persons were less literate.

Nativity (born abroad or born in the US) is analogous to ethnicity. Literacy is analogous to voting. The constancy assumption is that literacy depends on nativity, not state of residence.

What accounts for the statistical relationship in the state-level data? Literacy rates among the native-born varied substantially from one state to another. Furthermore, when foreign-born persons immigrated to the US, they tended to settle in states where the native-born were relatively literate. That created a strong relationship between the percentage of foreign-born in a state and literacy rates among the native-born. The constancy assumption was wrong. That is why ecological regression gave the wrong answers.

Ecological inference runs into trouble in many contexts, when the behavior of individuals is related to the demographics of their neighborhoods. On the other hand, ecological inference often succeeds. Judgment must be made case by case, focusing on the assumptions behind the methods.

Further reading

Freedman, D.A. (2001). "Ecological Inference and the Ecological Fallacy." International Encyclopedia of the Social and Behavioral Sciences 6: 4027-30. Edited by N.J. Smelser and P.B. Baltes. Published by Elsevier.

Grofman, B. and C. Davidson (1992). Controversies in Minority Voting: The Voting Rights Act in Perspective. Washington, D.C.: Brookings Institution.

Kaye, D.H. and D.A. Freedman (2000). "Reference Guide on Statistics." In Reference Manual on Scientific Evidence. 2nd ed. Washington, D.C.: Federal Judicial Center.

Robinson, W.S. (1950). "Ecological Correlations and the Behavior of Individuals." American Sociological Review 15: 351-357.

Rubinfeld, D.L., ed. (1991). "Statistical and Demographic Issues Underlying Voting Rights Cases." Evaluation Review 15: 659-816.

Schuessler, A.A. (1999). "Ecological Inference." Proceedings of the National Academy of Sciences USA 96: 10578-10581.