Research Interests
Natural Selection
Characterizing Adaptive Events
A first step in characterizing an adaptive event consists of
estimating the age of a mutation and the strength of selection acting on it.
To estimate these parameters, however, we must first assess which
model of selection best fits the patterns of genetic
variation around the selected site, because the estimates will be
model dependent. I am interested in estimating these parameters for
various models of natural selection: (1) selection on a de novo
mutation (the mutation is immediately beneficial when it
enters the population), (2) selection on standing
variation (a mutation is initially neutral or weakly deleterious in
the population but becomes beneficial after an environmental change),
or (3) balancing selection.
The age of the mutation will differ depending on the model considered.
For example, under a model of selection on standing variation or balancing
selection, the mutation can be very old even though selection on it
is recent (something that does not happen under a model of selection on
a de novo mutation).
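The difference between the age of a mutation and the onset of selection can be illustrated with the standard deterministic (logistic) approximation for an allele under genic selection. This is only a minimal sketch under textbook assumptions (no drift, constant selection coefficient, continuous time), not the inference procedure itself; the function names and all numerical values are hypothetical.

```python
from math import exp, log


def logistic_frequency(p0, s, t):
    """Deterministic allele frequency after t generations of genic selection."""
    growth = exp(s * t)
    return p0 * growth / (1.0 - p0 + p0 * growth)


def generations_to_reach(p0, p, s):
    """Generations needed to rise from frequency p0 to p under the same model."""
    return log(p * (1.0 - p0) / (p0 * (1.0 - p))) / s


# Illustrative values only (assumptions, not estimates from any data set).
N = 10_000      # diploid population size
s = 0.01        # selection coefficient
p_now = 0.8     # present-day frequency of the selected allele

# De novo mutation: selection starts when the allele arises as a single copy,
# so the time under selection equals the age of the mutation.
t_de_novo = generations_to_reach(1.0 / (2 * N), p_now, s)

# Standing variation: the allele already segregates at 5% when selection starts,
# so less time under selection is needed and the mutation predates that onset.
t_standing = generations_to_reach(0.05, p_now, s)
```

Under these assumed values the time under selection is shorter for standing variation than for a de novo mutation, while the mutation itself may be arbitrarily older than the onset of selection.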
Natural selection
How much of human genetic variation is neutral, deleterious or beneficial
remains an open question. I am interested in understanding how natural
selection (both positive and negative), demographic histories and cultural practices
have shaped genetic variation in the genome. I am particularly interested in how
these forces have impacted the genetic variation on the X chromosome. Since it is
present in one copy in males and two in females, the X chromosome may shed
light on how these processes (demographic histories, negative and positive selection,
dominance, etc.) affect genetic variation, because theoretical
predictions for this chromosome differ from those for the autosomes.
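One such prediction concerns effective population size. The sketch below uses the classical formulas for the effective size of autosomes and of the X chromosome as a function of the numbers of breeding males and females; it assumes random mating and no selection, and the function names are hypothetical.

```python
def ne_autosome(n_males, n_females):
    """Classical effective size for autosomal loci with unequal sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Classical effective size for X-linked loci with unequal sex ratio."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Expected ratio of X-linked to autosomal effective size."""
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)


# With equal numbers of breeding males and females the ratio is 3/4.
equal_sex_ratio = x_to_autosome_ratio(5_000, 5_000)
```

With few breeding males relative to females the ratio approaches 9/8, and with few breeding females it approaches 9/16, so departures of observed X-to-autosome diversity from 3/4 can reflect sex-biased demography as well as selection.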
Next Generation Sequencing (NGS) Data
With the advent of NGS, it is now possible to sequence multiple
whole genomes.
However, this brings new challenges, as most of the data generated are
low coverage (less than 10X per individual) and sequencing error rates
are higher than those of more traditional technologies. These characteristics make
calling genotypes challenging. Furthermore, allele frequencies estimated
from called genotypes are very often inaccurate, and this leads to biased
estimates of population genetic parameters. I have been involved in the
development of statistical techniques that allow us to bypass genotype
calling and estimate frequencies directly from NGS reads.
More work needs to be done to assess how these
estimates affect the commonly used statistics of genetic variation.
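The general idea of estimating an allele frequency without calling genotypes can be sketched as follows. This is a minimal illustration, not necessarily the techniques referred to above: it assumes a biallelic site, a known symmetric per-read error rate, Hardy-Weinberg proportions, and an EM algorithm over genotype likelihoods. All names and default values are hypothetical.

```python
def genotype_likelihoods(n_alt, n_ref, eps=0.01):
    """P(reads | genotype) for 0, 1 or 2 copies of the alternative allele."""
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternative allele.
        p_alt = (g / 2) * (1 - eps) + (1 - g / 2) * eps
        liks.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return liks


def em_allele_frequency(read_counts, eps=0.01, n_iter=200, tol=1e-10):
    """Estimate the alternative-allele frequency from (n_alt, n_ref) read counts."""
    gls = [genotype_likelihoods(a, r, eps) for a, r in read_counts]
    f = 0.2  # arbitrary starting value
    for _ in range(n_iter):
        expected_copies = 0.0
        for gl in gls:
            # Hardy-Weinberg prior on genotypes given the current frequency.
            prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
            post = [l * p for l, p in zip(gl, prior)]
            total = sum(post)
            # Posterior expected number of alternative alleles (0 to 2).
            expected_copies += (post[1] + 2 * post[2]) / total
        new_f = expected_copies / (2 * len(gls))
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f
```

For example, `em_allele_frequency([(0, 4), (2, 1), (1, 3)])` takes one `(n_alt, n_ref)` pair per individual. Every read contributes in proportion to its information, so individuals covered by only a few reads are neither discarded nor forced into a hard genotype call.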
Demographic Inference
The Beta Coalescent
An underlying assumption of Kingman's coalescent is that
the number of offspring is binomially distributed, with finite variance as the
population size tends to infinity. A more realistic model for
some marine species with very large family sizes is the beta coalescent.
We modified Hudson's backward simulator ms to adjust for
large family sizes and compared results to theoretical predictions
of the allele frequency spectrum.
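To make the contrast with Kingman's coalescent concrete, the sketch below computes merger rates for the Beta(2 - alpha, alpha) coalescent with 1 < alpha < 2, using the standard expression in terms of beta functions. It is an illustration only and is not the modified ms code; the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def merger_rate(b, k, alpha):
    """Rate at which one particular set of k out of b lineages merges."""
    if not (1.0 < alpha < 2.0):
        raise ValueError("alpha must lie strictly between 1 and 2")
    if not (2 <= k <= b):
        raise ValueError("need 2 <= k <= b")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def total_rates(b, alpha):
    """Total rate of k-mergers, for k = 2..b, when b lineages are present."""
    return {k: comb(b, k) * merger_rate(b, k, alpha) for k in range(2, b + 1)}


def binary_fraction(b, alpha):
    """Share of the total merger rate contributed by binary mergers."""
    rates = total_rates(b, alpha)
    return rates[2] / sum(rates.values())
```

As alpha approaches 2, almost all of the rate comes from binary mergers, as in Kingman's coalescent; smaller values of alpha put more weight on mergers of many lineages at once, which is the signature of very large family sizes.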
Closely related populations
It appears that many of the commonly used methods to estimate demographic
parameters break down when two populations have recently diverged. I am currently
working on characterizing which methods or measures of genetic variation
perform better when populations are closely related. This is pertinent because
it will help us better estimate the divergence time between the Tibetan and Han
populations, as the timing of the colonization of the Tibetan plateau
is still an open question. Estimating the divergence time between these two
populations may help us reconcile estimates based on genetic evidence and
those based on archaeological evidence.
Publications
B. Peter, E. Huerta-Sanchez and R. Nielsen. Distinguishing between selective
sweeps from standing variation and from a de novo mutation. Submitted to PLoS Genetics.
K. E. Lohmueller, A. Albrechtsen, Y. Li, S. Y. Kim, T. Korneliussen, N. Vinckenbosch, G. Tian,
E. Huerta-Sanchez et al. Natural Selection Affects Multiple Aspects of Genetic Variation at
Putatively Neutral Sites across the Human Genome. PLoS Genetics (2011).
X. Yi*, Y. Liang*, E. Huerta-Sanchez*, X. Jin*, Z. X. P. Cuo*, J.E. Pool* et al.
Archaeology Augments Tibet's Genetic History--Response. Science 329(5998):1467-1468, 2010.
X. Yi*, Y. Liang*, E. Huerta-Sanchez*, X. Jin*, Z. X. P. Cuo*, J.E. Pool* et al.
Sequencing of 50 human exomes reveals adaptation to high altitude.
Science 329(5987):75-78, 2010. (Subject of a Perspective article in the same issue and of media coverage.)
Y. Li*, N. Vinckenbosch*, G. Tian*, E. Huerta-Sanchez*, T. Jiang* et al.
Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants.
Nature Genetics 42, 969-972, 2010.
C. Murray*, E. Huerta-Sanchez*, F. Casey and D. Bradley.
Cattle demographic history modelled from autosomal sequence variation.
Phil. Trans. R. Soc. B 365:2531-2539, 2010.
E. Huerta-Sanchez, R. Durrett, and C. D. Bustamante.
Population genetics of polymorphism and divergence under fluctuating selection.
Genetics 178:325-337, 2008.
E. Huerta-Sanchez and R. Durrett.
Wagner's canalization model.
Theoretical Population Biology 71(2):121-130, 2007.
B. Gonzales, E. Huerta-Sanchez, C. Kribs, A. Ortiz-Nieves and T. Vazquez-Alvarez.
Am I Too Fat? Bulimia as an Epidemic.
Journal of Mathematical Psychology 47(5-6): 515-526, 2003.
E. Huerta-Sanchez, A. Lopez, D. Uminsky. Iterations
of Even-Odd Splitting Map Can Make Integration Easier.
The Pi Mu Epsilon Journal. Vol. 11, No. 5, 241-250, 2001.
E. Huerta-Sanchez, K. Rios-Soto, G. Jordan-Salivia. The Effects
of Mass Transportation During a Deliberate Release of Smallpox. Technical report
for the Mathematical and Theoretical Biology Institute (MTBI).
Cornell University, Ithaca, NY, Summer 2002.
* Joint first author
Publications in preparation
Estimating the distribution of family sizes in the lambda coalescent (with Rick Durrett and Carlos Bustamante).
One of the underlying assumptions of Kingman's coalescent is that
the offspring distribution is binomial, which leads to Kingman's
coalescent as the limiting process when the population size tends to infinity.
Here we study an alternative coalescent model, the Beta coalescent,
which is a better representation of some marine species that are
characterized by very large family sizes.
Characterizing the genetic basis behind high altitude adaptation in Tibetans.
To further characterize the genetic adaptation we identified, we are analyzing sequences of the
candidate regions to estimate the timing and strength of selection under the appropriate model
of selection using an Approximate Bayesian Computation (ABC) framework.
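The logic of ABC can be sketched with a generic rejection sampler. This is a toy illustration under stated assumptions, not the analysis pipeline of this project: the simulator is a deterministic logistic trajectory plus binomial sampling noise, the prior is uniform, and every name and numerical value is hypothetical.

```python
import random
from math import exp


def abc_rejection(observed, simulate, sample_prior, distance,
                  tolerance, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


def toy_simulate(s, rng, p0=0.01, t=500, n_chrom=100):
    """Toy simulator: sample frequency after t generations of selection s."""
    growth = exp(s * t)
    p = p0 * growth / (1.0 - p0 + p0 * growth)
    # Binomial sampling of n_chrom chromosomes from the population frequency.
    count = sum(rng.random() < p for _ in range(n_chrom))
    return count / n_chrom


# Made-up observed frequency, used only to exercise the sampler.
posterior = abc_rejection(
    observed=0.8,
    simulate=toy_simulate,
    sample_prior=lambda rng: rng.uniform(0.0, 0.05),
    distance=lambda a, b: abs(a - b),
    tolerance=0.05,
    n_draws=5000,
)
```

The accepted values approximate the posterior distribution of the selection coefficient under the toy model. In a real analysis the simulator would generate sequence data under the chosen model of selection, and the summary would be a vector of statistics rather than a single frequency.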
Tibetan Demographic History. When Tibetans came to inhabit the
Tibetan plateau is still an open question, with varying estimates from the
archaeological record and genetic studies. In this project, we aim to reconcile
both types of evidence by employing divergence estimates from the closely
related Han Chinese population.