A Statistician's Perspective on Census Adjustment

Presented at the Berkeley Breakfast Club

4 December 1998

P.B. Stark
Department of Statistics
University of California, Berkeley

 


Outline

  1. Why adjust the census?

  2. What's happening right now?

  3. What happened in 1990?

  4. What's the proposal to adjust the census?

  5. Why didn't it work in 1990?

  6. Why won't it work in 2000?

 

 


Why adjust the census?

The census misses some people.

The undercount is different in different places, and for different demographic groups.

Uneven undercount yields errors in state population shares, which determine congressional representation and allocation of Federal funds.

If the undercount were even, it wouldn't affect state shares.

Unevenness is unfair (and politically incorrect!)

It would be wonderful to know how many people the census missed, and where.

Then could add them where they belong to improve state shares.
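The point about even versus uneven undercount can be checked with a small sketch. The three "states" and their miss rates below are hypothetical numbers chosen for illustration, not census data, and the helper name `shares` is an assumption of this example.

```python
def shares(counts):
    """Return each state's share of the total count."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Hypothetical true populations (illustrative only, not real data).
true_pop = {"A": 10_000_000, "B": 5_000_000, "C": 1_000_000}

# Even undercount: every state misses the same 2% of its people.
even = {s: n * 0.98 for s, n in true_pop.items()}

# Uneven undercount: hypothetical miss rates that differ by state.
miss_rate = {"A": 0.04, "B": 0.01, "C": 0.00}
uneven = {s: n * (1 - miss_rate[s]) for s, n in true_pop.items()}

for label, counts in [("true", true_pop), ("even", even), ("uneven", uneven)]:
    print(label, {s: round(v, 4) for s, v in shares(counts).items()})
```

With the even undercount, the shares match the true shares; with the uneven undercount, state A loses share to B and C even though every state was undercounted or counted exactly.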

 


What's happening right now?

Monday, 30 November 1998, the Supreme Court heard arguments regarding the use of sampling in the 2000 Decennial Census.

Suit brought by Speaker of the House (then Gingrich) against Department of Commerce to bar using sampling for apportionment.

Similar suit brought by Southeastern Legal Foundation.

The Constitution requires an "actual enumeration" of the population.

The 1976 amendment of the Census Act of 1957 states that "except for the determination of the population for purposes of apportionment of Representatives in Congress among the several States, the Secretary shall, if he considers it feasible, authorize the use of the statistical method known as 'sampling' in carrying out the provisions of this title."

This would at least seem to prohibit sampling for some purposes, as was held by lower courts.


What happened in 1990?

1990 Census counted nearly 249 million people in the US.

The Census Bureau used sampling to estimate the 1990 undercount, with a view to adjusting for it.

(For the 2000 Census, there is a proposal to use sampling to adjust for undercount, and to follow up some people who do not mail back their census forms. This talk is just about adjusting for undercount.)

The sampling-based estimates come from an idea in wildlife management, called "capture-recapture."

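To estimate the number of fish in a pond, could catch some fish, tag them, and release them; then catch a second batch and see what fraction of it is tagged.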

Problems using this for people are outlined below.
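A minimal sketch of the capture-recapture idea, assuming the textbook Lincoln-Petersen form of the estimator (the talk does not say which form was used, and the census procedure is far more elaborate). Tag a first catch and release it, take a second catch, and count how many of the second catch carry tags. The function name and the fish counts are hypothetical.

```python
def lincoln_petersen(n_tagged, n_second, n_recaptured):
    """Textbook capture-recapture estimate of population size.

    n_tagged:     fish caught, tagged, and released the first time
    n_second:     fish caught the second time
    n_recaptured: fish in the second catch that carry a tag
    """
    if n_recaptured == 0:
        raise ValueError("no tagged fish recaptured; estimate is undefined")
    # If tagged fish are as easy to catch as untagged fish, then
    # n_recaptured / n_second estimates n_tagged / population.
    return n_tagged * n_second / n_recaptured

# Hypothetical pond: tag 100 fish, catch 50 later, 10 of them tagged.
print(lincoln_petersen(100, 50, 10))

# Miss just one tag when examining the second catch:
print(lincoln_petersen(100, 50, 9))
```

The second call shows how sensitive the estimate is to the tagged count: overlooking a single tag moves the estimate from 500 to roughly 556.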

Secretary of Commerce Mosbacher decided not to adjust the official 1990 census numbers.

Led to litigation by the City of New York, et al., against the Federal government.

In July 1991 (when Mosbacher had to decide), the undercount estimate was about 5.3 million people.

Then found a programming error that had inflated the estimate by about 1 million.

More careful matching of records decreased the estimate by about another 300,000.

Now acknowledged that about 2-3 million of the remaining estimate is error in the estimate, not error in the census:

60% to 80% of the proposed 1990 adjustment was erroneous!
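One reading of the figures above that reproduces the 60% to 80% range: add the programming error, the rematching correction, and the 2-3 million of acknowledged error, then divide by the original 5.3 million. Taking 5.3 million as the denominator is an assumption of this sketch; the variable names are illustrative.

```python
# All figures in millions of people, taken from the slides above.
initial_estimate = 5.3    # undercount estimate in July 1991
programming_error = 1.0   # inflation found later
rematching = 0.3          # reduction from more careful matching

remaining = initial_estimate - programming_error - rematching
print(f"remaining estimate: {remaining:.1f} million")

fractions = []
for other_error in (2.0, 3.0):  # acknowledged error in what remains
    erroneous = programming_error + rematching + other_error
    fraction = erroneous / initial_estimate
    fractions.append(fraction)
    print(f"{erroneous:.1f} of {initial_estimate} million = {fraction:.0%}")
```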

 


Adjustment has two kinds of error: sampling error and bias.

Bias is a technical term: it does not mean someone is intentionally skewing the results.

Sampling errors tend to average out. Bias does not.


Estimating from a sample is like shooting a rifle

Each shot hits the target in a different place.

Sampling error is the scatter in the shots.

Bias is a tendency for all the shots to be off in the same direction.

Can fix bias in a rifle by sighting it in.

Possible, because you can see where the shots land.

Fixing statistical bias in a census adjustment is hard.

Only get one shot (because you only take one sample).

Cannot see where the shot lands (because you do not know the true undercount).
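The claim that sampling error averages out while bias does not can be illustrated by simulating the rifle. This is a sketch under simple assumptions: each shot lands at the bias plus Gaussian scatter, and the bias and scatter values are made up. Averaging many shots is exactly what the census adjustment cannot do, since there is only one sample.

```python
import random
import statistics

def simulate_shots(n_shots, bias, scatter, seed=0):
    """Horizontal miss distance of each shot: bias plus random scatter."""
    rng = random.Random(seed)
    return [bias + rng.gauss(0.0, scatter) for _ in range(n_shots)]

# Hypothetical rifles: same scatter, one sighted in and one pulling right.
sighted_in = simulate_shots(10_000, bias=0.0, scatter=1.0)
pulls_right = simulate_shots(10_000, bias=0.5, scatter=1.0)

# The scatter largely cancels in the average; the bias remains.
print("mean miss, sighted in: ", round(statistics.mean(sighted_in), 3))
print("mean miss, pulls right:", round(statistics.mean(pulls_right), 3))
```

The average of the sighted-in rifle is close to zero, while the average of the other rifle stays close to its bias of 0.5 no matter how many shots are taken.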

 


Proposed adjustment procedure

Example. Black male renters age 30-44 living in the central city of a major metropolitan area in New England were one 1990 post-stratum.

In the 1990 procedure, there were 1,392 post-strata in all.

Sample was about 380,000 people in 169,000 households in 5,290 block groups.

For 2000, propose about 12,600 post-strata (50 states × 6 race/origin × 7 age/sex × 2 tenure × 3 geography)

Sample to be about 1.7 million people in 750,000 households in 60,000 block groups.

Sample is about 5 times larger for 2000, but taken in much less time.

Increased number of post-strata outweighs any improvement the larger sample might afford.

Decrease in the time allowed means using more poorly trained staff; more errors are likely.
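The sample sizes and post-stratum counts above can be compared directly. The sketch below computes the average number of sampled people per post-stratum; it is only an average, since the real samples are not spread evenly across post-strata.

```python
from math import prod

# Factors defining the proposed 2000 post-strata, from the slide above.
factors_2000 = {
    "states": 50,
    "race/origin": 6,
    "age/sex": 7,
    "tenure": 2,
    "geography": 3,
}
strata_2000 = prod(factors_2000.values())
strata_1990 = 1_392

sample_1990 = 380_000
sample_2000 = 1_700_000

per_stratum_1990 = sample_1990 / strata_1990
per_stratum_2000 = sample_2000 / strata_2000

print("post-strata in 2000:", strata_2000)
print("sample size ratio 2000/1990:", round(sample_2000 / sample_1990, 1))
print("people per post-stratum, 1990:", round(per_stratum_1990))
print("people per post-stratum, 2000:", round(per_stratum_2000))
```

On average, each 2000 post-stratum would hold about half as many sampled people as a 1990 post-stratum, despite the larger overall sample.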

 


Assumptions for adjusting

Basic idea:

Just a sketch--the details are extremely complex.

Assume:

 


None of these assumptions is true.

The failures add bias, enough to make the adjustments worse than doing nothing.

Match/rematch mismatch rate 1.8%

Fabricated interview rate estimated to be from about 0.03% to nearly 9%.

Just 13 (detected) fabrications identified in 1990 added about 50,000 to the undercount estimate.

A 1% fabrication rate would inflate the undercount by about 1.7 million.

A single unmatched family of 5 in the sample added 45,000 to the undercount estimate.

Garbage in, Garbage out:
Small errors in the sample give huge errors in the adjustment.
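A rough way to see the leverage of individual sample records is to divide the reported effects by the number of records involved. This is simple division on the figures quoted in the talk, not the Census Bureau's computation, and the "average" figure assumes the 1990 sample of about 380,000 people stands in evenly for about 249 million.

```python
# Figures quoted above.
family_effect = 45_000       # added by one unmatched family
family_size = 5
fabrication_effect = 50_000  # added by detected fabrications
fabrications = 13

# Rough average: people represented per sampled person (an assumption).
population_1990 = 249_000_000
sample_1990 = 380_000

per_person_family = family_effect / family_size
per_fabrication = fabrication_effect / fabrications
average_per_person = population_1990 / sample_1990

print("per person in the unmatched family:", round(per_person_family))
print("per detected fabrication:", round(per_fabrication))
print("rough average per sampled person:", round(average_per_person))
```

By this crude measure, each of those records moved the undercount estimate by thousands of people, several times the rough average of a few hundred people per sampled person.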

Address errors and geocoding errors: searched over a ring of one or two blocks.
Without search, estimated undercount would have been twice as large.
If search area were larger, Census Bureau estimates that 75% of the unmatched cases would become matches.

Arbitrary choices in the adjustment have large effects.

The estimate does not fix error in the census---just adds new errors.

 


The Adjustments Don't Make Sense

New York, Pennsylvania, and Illinois lose shares in the 1990 adjustment. Texas and Arizona gain shares.

Probably easier to count in Dallas and Phoenix than in the Bronx, Philadelphia, and Chicago.

Taking shares from New York, Pennsylvania and Illinois might be right --- or it might be bias from bad assumptions.

Illustration of the effect of the proposed 1990 adjustment on State shares.

Figure courtesy of Brown et al. 1998.

Effect on sex ratios of children under 10: instead of 51%-49%, get 58%, ...


Statistical Analysis Supporting Adjustment is Poor

For adjustment to make the census better, systematic errors (biases) need to cancel.

Random errors tend to cancel, but systematic errors don't.

Arguments that systematic errors in the adjustment cancel depend on statistical models.

The models are false, and have bizarre consequences.

E.g., the model for “correlation bias” says that the 1990 census missed nearly 900,000 white males, but only 13 between 20 and 30 years old.

Model also says census missed >750,000 black males, but counted almost 30,000 too many black males under age 10.

Relying on that model, the Bureau claimed that some of the adjustment biases cancel, to give a net bias of 38%.

Without the model, the Bureau estimated the bias at 57%, almost 20 percentage points higher.

Best study (in my opinion) finds the bias over 80%.

 

Unresolved match status: 4,000,000 (weighted) people in census, 4,000,000 in sample survey.

Undercount estimate ranges from 9,000,000 to -1,000,000 (an overcount), depending on how unresolved cases are treated.

Adjustment depends on dubious statistical model for unresolved cases.

 


Demographic Analysis

There is another way to estimate the total population, called Demographic Analysis:

population = births - deaths + immigration - emigration

Because inter- and intra-state moves are not tracked, Demographic Analysis estimates only national totals (not state shares).

The 1990 adjustment adds more people than Demographic Analysis says were missed, including about a million extra women.

Because of bias, the adjustment probably puts the people in the wrong place, making state shares worse.
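The Demographic Analysis identity above can be written as a small function. The inputs below are toy numbers chosen only to show the bookkeeping; they are not demographic data, and the function names are hypothetical.

```python
def demographic_estimate(births, deaths, immigration, emigration):
    """National population from the accounting identity in the talk."""
    return births - deaths + immigration - emigration

def implied_net_undercount(estimate, census_count):
    """Positive if the census counted fewer people than the estimate."""
    return estimate - census_count

# Toy numbers (illustrative only).
estimate = demographic_estimate(
    births=1000, deaths=300, immigration=120, emigration=20
)
print("demographic estimate:", estimate)
print("implied net undercount:", implied_net_undercount(estimate, 790))
```

Nothing in the identity says where people live, which is why the method gives only a national total and cannot check state shares.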

 


References

Bell, W.R., 1993. Using Information from Demographic Analysis in Post-Enumeration Survey Estimation, J. Amer. Statist. Assoc., 88, 1106-1118.

Breiman, L., 1994. The 1991 Census Adjustment: Undercount or Bad Data? Statistical Science, 9, 458-537.

Brown, L.D., Eaton, M.L., Freedman, D.A., Klein, S.P., Olshen, R.A., Wachter, K.W., Wells, M.T., and Ylvisaker, D., 1998. Statistical Controversies in Census 2000, Technical Report 537, Department of Statistics, U.C. Berkeley, November 1998.

Bureau of the Census, 1993. Decision of the Director of the Bureau of the Census on Whether to Use Information From the 1990 Post-Enumeration Survey (PES) To Adjust the Base for the Intercensal Population Estimates Produced by the Bureau of the Census. Action: Notice of final decision. Federal Register 58 FR 69.

Committee on Adjustment of Postcensal Estimates, 1992. Assessment of Accuracy of Adjusted Versus Unadjusted 1990 Census Base for Use in Intercensal Estimates, Bureau of the Census (C.A.P.E. Report).

Census 2000 Operational Plan, U.S. Department of Commerce, Economics and Statistics Administration, Bureau of the Census, April 1988 (revised).

Darga, K., 1998. "Straining Out Gnats and Swallowing Camels: The Perils of Adjusting for Census Undercount," and "Quantifying Measurement Error and Bias in the 1990 Undercount Estimates." (submitted as testimony to the US House of Representatives Subcommittee on the Census on 5/5/98.)

Freedman, D. and Wachter, K., 1994. Heterogeneity and Census Adjustment for the Intercensal Base, Statistical Science, 9, 476-485.

Freedman, D., and Wachter, K., 1994. Rejoinder, Statistical Science, 9, 527-537.

Hogan, H., 1993. The 1990 Post-Enumeration Survey: Operations and Results, J. Amer. Statist. Assoc., 88, 1047-1060.

Waite, P.J., and Hogan, H., 1998. Statistical Methodologies for Census 2000: Decisions, Issues, and Preliminary Results, submitted to Amer. Statist. Assoc. July 1998.

Miscellaneous testimony on the Census.