-
Is each record in the dataset one family?
No, each record represents one person. If there are three
people in a family, there will be three records in
the file, one for each.
There is no family identifier in the file, so families
cannot be exactly matched up.
-
Do I take an SRS of 1,000 from the whole population?
No, you are to take a stratified cluster sample.
To do this, take a sample from each of the
regions in the city separately.
You need to decide how many persons you will sample
from each region. The total sample size should be 1,000.
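
One common way to split the 1,000 is proportional allocation. The sketch below is only an illustration: the region names and population totals are made up, and the real totals should come from the project description.

```r
# HYPOTHETICAL population totals per region (not the Fort Evans figures).
pop <- c(region1 = 60000, region2 = 40000)
n_total <- 1000

# Proportional allocation: n_h = n * N_h / N, rounded to whole persons.
n_h <- round(n_total * pop / sum(pop))

# Rounding can leave the total slightly off; absorb the difference
# in the largest region so the sizes still add up to n_total.
largest <- which.max(pop)
n_h[largest] <- n_h[largest] + (n_total - sum(n_h))
n_h
```

With proportional allocation, every person has roughly the same chance of selection, which simplifies the weighting later on.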
-
How do I take a cluster sample when the data set is not in
clusters?
Look at the random digit dialing example in your text.
It shows how an SRS of phone numbers can be used to sample
banks of 100 phone numbers where the probability is
proportional to the number of residential phones.
-
What if I wind up with two people from the same family?
This should be a rare event -- ignore it.
- How do I rake when the race codes in the data set do not
match the categories supplied (white non-Hispanic, etc.)?
You will need to create these categories from two
variables: race and ethnicity.
For example, white non-Hispanics are those with
race = 1 and ethnicity = 8, and Hispanics are those
with an ethnicity code between 1 and 7.
The Other category should be the remainder (you can
include the NAs and DKs in this group).
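
The recoding can be sketched as follows. The function name make_race_cat is made up for illustration, and the code for Black (black_code = 2) is an assumption, not something stated here; check the codebook for the real value. Only race = 1, ethnicity = 8, and ethnicity 1 to 7 come from the answer above.

```r
# Recode race and ethnicity into the four raking categories.
# From the FAQ: race == 1 is White, ethnicity == 8 is non-Hispanic,
# ethnicity 1-7 is Hispanic. black_code is a HYPOTHETICAL placeholder.
make_race_cat <- function(race, ethnicity, black_code = 2) {
  out <- rep("Other", length(race))  # remainder, including NA and DK
  hisp    <- !is.na(ethnicity) & ethnicity >= 1 & ethnicity <= 7
  nonhisp <- !is.na(ethnicity) & ethnicity == 8
  out[nonhisp & !is.na(race) & race == 1] <- "White non-Hispanic"
  out[nonhisp & !is.na(race) & race == black_code] <- "Black non-Hispanic"
  out[hisp] <- "Hispanic"
  factor(out, levels = c("White non-Hispanic", "Black non-Hispanic",
                         "Hispanic", "Other"))
}

# Toy records: the last two have a missing race or ethnicity code.
race      <- c(1, 1, 2, 3, NA, 1)
ethnicity <- c(8, 3, 8, 8, 8, NA)
race_cat  <- make_race_cat(race, ethnicity)
table(race_cat)
```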
- When I rake, do I need to weight the data according
to the probability of picking a family?
No, the raking should be done at the individual person level.
Make a table of the counts in your sample of the 8 categories:
Male White non-Hispanic, Female White non-Hispanic, ...,
Male Other, and Female Other.
Use this table to find the weight-class weights using
the raking methods. (The population totals are in
the Fort Evans project description.)
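
Here is a rough sketch of raking a 2 x 4 table of sample counts, assuming the population totals are supplied as separate sex and race margins. All counts and totals below are made up, and the function name rake is just for illustration.

```r
# Sample counts by sex (rows) and race category (columns): made-up numbers.
samp <- matrix(c(180, 60, 40, 20,
                 200, 70, 50, 30), nrow = 2, byrow = TRUE,
               dimnames = list(sex = c("Male", "Female"),
                               race = c("WhiteNH", "BlackNH", "Hispanic", "Other")))

# HYPOTHETICAL population margins; the real ones are in the project description.
pop_sex  <- c(Male = 48000, Female = 52000)
pop_race <- c(WhiteNH = 60000, BlackNH = 20000, Hispanic = 14000, Other = 6000)

# Alternately rescale rows and columns until both margins match.
rake <- function(tab, row_tot, col_tot, tol = 1e-6, max_iter = 1000) {
  fit <- tab
  for (k in seq_len(max_iter)) {
    fit <- sweep(fit, 1, row_tot / rowSums(fit), "*")  # match sex totals
    fit <- sweep(fit, 2, col_tot / colSums(fit), "*")  # match race totals
    if (max(abs(rowSums(fit) - row_tot)) < tol) break
  }
  fit
}

fit <- rake(samp, pop_sex, pop_race)

# Weight-class weight: each sampled person in a cell gets fitted / sampled.
wt_class <- fit / samp
round(wt_class, 1)
```

Every person in the same sex-by-race cell receives the same weight, and the weights add up to the population total over the whole sample.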
- What do I do with the records that have NA in them
for everything but the region?
You just drop these records from your sample.
They are your non-respondents.
Your total "working sample" size will be less than 1,000.
- When I rake, do I need to worry about the way that
I allocated the sample sizes to the strata?
Yes. If you have allocated your sample in proportion to
the population totals for the strata, then you need not
worry. Otherwise, the counts in your table should take
the allocation into account.
-
When I impute the education level for those with missing
values, do I need to use the weighting scheme in selecting
the records at random in the hot deck procedure?
No, you may simply choose at random with replacement from
the group of persons with sex and race that match the
record with the missing education. (Note that race is
defined according to the variable you created in 2ii,
i.e. White non-Hispanic, Black non-Hispanic, Hispanic,
and Other.)
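
A minimal sketch of this unweighted hot deck, assuming a data frame with columns sex, race, and educ (these column names and the toy values are made up for illustration):

```r
set.seed(1)  # only so the illustration is reproducible

# Toy sample; educ is NA where the education level is missing.
d <- data.frame(
  sex  = c("M", "M", "M", "F", "F", "F", "F"),
  race = c("Hispanic", "Hispanic", "Hispanic", "Other", "Other", "Other", "Other"),
  educ = c(12, NA, 16, 14, NA, NA, 18)
)

hot_deck <- function(d) {
  cell <- interaction(d$sex, d$race, drop = TRUE)  # sex-by-race groups
  for (g in levels(cell)) {
    miss   <- which(cell == g & is.na(d$educ))
    donors <- which(cell == g & !is.na(d$educ))
    if (length(miss) > 0 && length(donors) > 0) {
      # Draw donor positions with replacement; sample.int avoids the
      # surprise that sample(x, ...) treats a single number x as 1:x.
      pick <- donors[sample.int(length(donors), length(miss), replace = TRUE)]
      d$educ[miss] <- d$educ[pick]
    }
  }
  d
}

imputed <- hot_deck(d)
imputed
```

Records in a group with no donors are left as NA in this sketch; you would need to decide separately what to do with them.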
-
What should my estimator for the proportion of college educated
adults look like?
-
Start by creating a new variable that indicates whether
a person is college educated or not, i.e. y_i is 1 if
the person has at least a college education, and 0
otherwise. Also create an indicator for whether the
person is an adult or not, i.e. x_i is 1 if the
person is 18 or over and 0 otherwise.
-
Use the weights from 2ii in creating your estimator.
Call these weights r_i, for raking weights.
These are the only weights that you will need, because
your estimator is at the person level, not family
level.
-
Find the proportion of adults in each stratum that
are college educated. Be sure to use your weights.
This means that your estimator should be
(sum y_i r_i)/(sum x_i r_i),
where each sum is over the individuals sampled from
the first (or second) stratum.
-
Use the population totals to combine your two
proportions into a single estimate.
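
The steps above can be sketched as follows. The data, the column names, and the stratum population totals N_h are all made up for illustration; y, x, and r play the roles of y_i, x_i, and r_i.

```r
# Toy data: y = 1 if college educated, x = 1 if adult, r = raking weight.
d <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  y = c(1, 0, 0, 0, 1, 1, 0, 0),
  x = c(1, 1, 1, 0, 1, 1, 1, 1),
  r = c(100, 120, 80, 90, 150, 150, 100, 100)
)

# Per-stratum weighted proportion: sum(y_i r_i) / sum(x_i r_i).
p_h <- sapply(split(d, d$stratum),
              function(s) sum(s$y * s$r) / sum(s$x * s$r))

# HYPOTHETICAL population totals per stratum (use the project's figures).
N_h <- c("1" = 60000, "2" = 40000)

# Combine the stratum proportions, weighting by population share.
p_hat <- sum(N_h * p_h[names(N_h)]) / sum(N_h)
p_hat
```

In this toy example the stratum proportions are 1/3 and 0.6, and the combined estimate is 0.44.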
-
What weights do I need to use in finding the family income?
Here, you will need to take the product of the family
weight (call it f_i) and the raking weight:
w_i = f_i * r_i.
Also, if you have not allocated proportionally, you
will need to adjust for the allocation method (it's
probably a good idea to allocate your sample
proportionally). You should check whether
the sum of r_i over stratum 1 is reasonably close to the
population total for stratum 1. If not, you may want
to include a third weight to adjust for this difference.
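
As a rough illustration of building w_i and running this check, here is a sketch with made-up family weights, raking weights, and stratum totals. The ratio adjustment at the end is one possible form of the third weight, not the only one.

```r
# Toy data: f = family weight, r = raking weight (all values made up).
d <- data.frame(
  stratum = c(1, 1, 1, 2, 2),
  f = c(1, 2, 4, 1, 3),
  r = c(200, 250, 150, 300, 100)
)
d$w <- d$f * d$r  # combined weight w_i = f_i * r_i

# Compare the raking-weight total in each stratum with the population total.
N_h   <- c("1" = 630, "2" = 400)  # HYPOTHETICAL stratum totals
r_sum <- sapply(split(d$r, d$stratum), sum)
ratio <- r_sum / N_h[names(r_sum)]
ratio

# Possible third weight: scale each stratum so its r_i add up to N_h.
adj  <- 1 / ratio
d$w3 <- d$w * adj[as.character(d$stratum)]
d
```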
- Do I compute the median separately for each stratum
and then combine them as I did with the estimate for
proportion of college educated adults?
Unfortunately that won't work, because the median of the
combined sample is not a weighted average of the stratum
medians. Instead, you need to find the income I* such that
Sum w_i for those with incomes less than or
equal to I* is
1/2 Sum w_i for all those in the sample.
-
How do I compute the median using weights?
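Try the following -- sort your weights according to
salary. Then do the following:
# half of the total weight
htot <- sum(wt) * 0.5
# first position where the cumulative weight exceeds htot
indexmedian <- min((1:length(wt))[cumsum(wt) > htot])
The median is the salary at position indexmedian in the sorted data.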
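
Putting the sort and the lookup together, a small worked sketch might look like this. The function name weighted_median and the toy incomes and weights are made up; the rule is the same cumulative-weight rule as above.

```r
weighted_median <- function(income, wt) {
  ord    <- order(income)        # sort incomes, carrying the weights along
  income <- income[ord]
  wt     <- wt[ord]
  htot   <- sum(wt) * 0.5        # half of the total weight
  indexmedian <- min(which(cumsum(wt) > htot))
  income[indexmedian]
}

# Toy family incomes with combined weights w_i.
income <- c(52000, 18000, 75000, 31000, 40000)
w      <- c(1, 3, 1, 2, 1)
med    <- weighted_median(income, w)
med
```

Here the sorted cumulative weights are 3, 5, 6, 7, 8 and half the total is 4, so the weighted median is the second sorted income, 31000.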