Thomas Lumley: SNP data

My example is finding regions of homozygosity and identity-by-descent in whole-genome SNP data.

The data are sets of roughly a million single-nucleotide polymorphisms per person. I can definitely get the data on the 270 HapMap individuals, and possibly a larger data set.

Homozygosity means that an unusually large chunk of the two (independent) copies of the genome in the same individual matches. Identity-by-descent is the same matching of an unusually large chunk in two different individuals. The statistical questions are whether (and how) the frequency of this sharing varies along the genome and whether it is more likely or less likely to occur within a single person.
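
As a rough illustration of what "an unusually large chunk" could mean computationally, here is a minimal Python sketch that scans for long runs of matching SNPs. Everything in it is an assumption made for illustration: genotypes are coded 0/1/2 as the count of one allele (so 1 is heterozygous), the function names and the `min_len` threshold are hypothetical, and exact genotype equality between two people is only identity-by-state, a crude stand-in for identity-by-descent. A real analysis would also need to allow for genotyping error, SNP spacing, and allele frequencies.

```python
from typing import List, Sequence, Tuple


def find_runs(flags: Sequence[bool], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of
    consecutive True values that is at least min_len SNPs long."""
    runs = []
    start = None  # index where the current run began, or None if not in a run
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(flags) - start >= min_len:
        runs.append((start, len(flags)))
    return runs


def homozygous_runs(genotypes: Sequence[int], min_len: int) -> List[Tuple[int, int]]:
    """Runs of homozygous SNPs within one person (assumed 0/1/2 coding,
    where 1 means heterozygous)."""
    return find_runs([g != 1 for g in genotypes], min_len)


def shared_runs(a: Sequence[int], b: Sequence[int], min_len: int) -> List[Tuple[int, int]]:
    """Runs where two people have identical genotypes (identity-by-state,
    used here only as a simple proxy for identity-by-descent)."""
    return find_runs([x == y for x, y in zip(a, b)], min_len)
```

Calling `homozygous_runs` on each person and `shared_runs` on each pair gives intervals whose positions could then be tabulated along the genome, which is one simple way to start on the question of whether sharing varies by region.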

