Emilia Huerta-Sánchez

Research Interests

Natural Selection

Characterizing Adaptive Events
A first step in characterizing an adaptive event is to estimate the age of the selected mutation and the strength of selection acting on it. To estimate these parameters, however, we must first assess which model of selection best fits the patterns of genetic variation around the selected site, because the estimates are model dependent. I am interested in estimating these parameters under various models of natural selection: (1) selection on a de novo mutation (the mutation is beneficial as soon as it enters the population), (2) selection on standing variation (a mutation may be neutral or weakly deleterious in the population but becomes beneficial after an environmental change) or (3) balancing selection. The age of the mutation differs depending on the model considered. For example, under selection on standing variation or balancing selection, the mutation can be very old even though selection on it is recent (something that cannot happen under selection on a de novo mutation).

Natural selection
How much of human genetic variation is neutral, deleterious or beneficial remains an open question. I am interested in understanding how natural selection (both positive and negative), demographic histories and cultural practices have shaped genetic variation in the genome. I am particularly interested in how these forces have affected genetic variation on the X chromosome. Because it is present in one copy in males and two in females, the X chromosome may shed light on how these processes (demographic histories, negative and positive selection, dominance, etc.) affect genetic variation: theoretical predictions for this chromosome differ from those for the autosomes.
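One simple case of such a differing prediction is effective population size. The sketch below uses the standard textbook formulas for the effective size of autosomal and X-linked loci given the numbers of breeding males and females; the function names are illustrative and are not taken from any of the work described here.

```python
def autosomal_effective_size(n_males, n_females):
    """Standard effective size for autosomal loci with unequal sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def x_linked_effective_size(n_males, n_females):
    """Standard effective size for X-linked loci (one copy in males, two in females)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Expected ratio of X-linked to autosomal effective size."""
    return (x_linked_effective_size(n_males, n_females)
            / autosomal_effective_size(n_males, n_females))


if __name__ == "__main__":
    # Equal numbers of breeding males and females give the familiar 3/4 ratio.
    print(x_to_autosome_ratio(500, 500))
    # A strongly female-biased breeding population pushes the ratio above 1.
    print(x_to_autosome_ratio(10, 10000))
```

Under neutrality and equal sex ratios the ratio is 3/4; departures from that value in data can reflect sex-biased demography or selection, which is one reason the comparison is informative.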

Next Generation Sequencing (NGS) Data

With the advent of NGS, it is now possible to sequence multiple whole genomes. However, this brings new challenges, as most of the data generated are low coverage (less than 10X per individual) and sequencing error rates are higher than those of more traditional technologies. These characteristics make calling genotypes challenging. Furthermore, allele frequencies estimated from called genotypes are very often inaccurate, and this leads to biased estimates of population genetic parameters. I have been involved in the development of statistical techniques that allow us to bypass genotype calling and estimate allele frequencies directly from NGS reads. More work needs to be done to assess how these estimates affect the commonly used statistics of genetic variation.
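To illustrate the general idea of bypassing genotype calling, here is a minimal sketch of a maximum-likelihood allele frequency estimate computed from per-individual genotype likelihoods with an EM algorithm. It is not the specific method developed in the work mentioned above. It assumes a biallelic site, Hardy-Weinberg proportions, and a simple symmetric per-read error model; the function names and the default error rate are hypothetical.

```python
import numpy as np


def genotype_likelihoods_from_counts(ref_count, alt_count, error_rate=0.01):
    """P(reads | genotype) for genotypes carrying 0, 1, 2 alternate alleles.

    Assumes independent reads and a symmetric error rate (an assumption of
    this sketch). The binomial coefficient is omitted because it is the same
    for all three genotypes.
    """
    p_alt_read = np.array([error_rate, 0.5, 1.0 - error_rate])
    return p_alt_read ** alt_count * (1.0 - p_alt_read) ** ref_count


def estimate_allele_frequency(genotype_likelihoods, tol=1e-8, max_iter=500):
    """EM estimate of the alternate allele frequency without calling genotypes.

    genotype_likelihoods: array of shape (n_individuals, 3).
    """
    gl = np.asarray(genotype_likelihoods, dtype=float)
    copies = np.array([0.0, 1.0, 2.0])
    freq = 0.5  # starting value
    for _ in range(max_iter):
        # E-step: posterior genotype probabilities under Hardy-Weinberg priors.
        prior = np.array([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2])
        posterior = gl * prior
        posterior /= posterior.sum(axis=1, keepdims=True)
        # M-step: expected number of alternate alleles divided by 2n.
        new_freq = (posterior @ copies).mean() / 2.0
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq


if __name__ == "__main__":
    # (ref reads, alt reads) for five low-coverage individuals (made-up counts).
    counts = [(3, 0), (1, 1), (0, 2), (2, 0), (1, 0)]
    gl = np.array([genotype_likelihoods_from_counts(r, a) for r, a in counts])
    print(estimate_allele_frequency(gl))
```

Because each individual contributes a weighted average over all three genotypes rather than a single hard call, the uncertainty from low coverage is carried into the frequency estimate instead of being discarded.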

Demographic Inference

The Beta Coalescent
An underlying assumption of Kingman's coalescent is that the number of offspring is binomially distributed with finite variance as the population size tends to infinity. A more realistic model for some marine species with very large family sizes is the beta coalescent. We modified Hudson's backward simulator ms to allow for large family sizes and compared the results with theoretical predictions of the allele frequency spectrum.
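The key difference from Kingman's coalescent is that more than two lineages can merge in a single event. The sketch below computes merger rates under the commonly used Beta(2 - alpha, alpha) parameterization with 1 < alpha < 2; this parameterization is an assumption here, since the text does not specify one, and the code is unrelated to the modified ms simulator. Function names are illustrative.

```python
import math


def beta_function(x, y):
    """Euler beta function B(x, y), computed via log-gamma for stability."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def merger_rate(b, k, alpha):
    """Rate at which one particular set of k out of b lineages merges.

    Beta(2 - alpha, alpha)-coalescent: B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 1 and 2")
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return beta_function(k - alpha, b - k + alpha) / beta_function(2.0 - alpha, alpha)


def merger_size_distribution(b, alpha):
    """Probability that the next merger among b lineages involves k of them, k = 2..b."""
    totals = {k: math.comb(b, k) * merger_rate(b, k, alpha) for k in range(2, b + 1)}
    norm = sum(totals.values())
    return {k: rate / norm for k, rate in totals.items()}


if __name__ == "__main__":
    for alpha in (1.2, 1.5, 1.999):
        dist = merger_size_distribution(10, alpha)
        # Mass on mergers of three or more lineages shrinks as alpha approaches 2.
        print(alpha, round(1.0 - dist[2], 4))
```

As alpha approaches 2, nearly all of the probability falls on pairwise mergers and the process behaves like Kingman's coalescent; smaller values of alpha give frequent multiple mergers, which is the signature of occasional very large families.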

Closely related populations
Many of the commonly used methods for estimating demographic parameters appear to break down when two populations have recently diverged. I am currently working on characterizing which methods or measures of genetic variation perform better when populations are closely related. This is pertinent because it will help us better estimate the divergence time between the Tibetan and Han populations, as the timing of the colonization of the Tibetan plateau is still an open question. Estimating the divergence time between these two populations may help us reconcile estimates based on genetic evidence with those based on archaeological evidence.

Publications

B. Peter, E. Huerta-Sanchez and R. Nielsen. Distinguishing between selective sweeps from standing variation and from a de novo mutation. Submitted to PLoS Genetics.

K. E. Lohmueller, A. Albrechtsen, Y. Li, S. Y. Kim, T. Korneliussen, N. Vinckenbosch, G. Tian, E. Huerta-Sanchez et al. Natural Selection Affects Multiple Aspects of Genetic Variation at Putatively Neutral Sites across the Human Genome. PLoS Genetics (2011).

X. Yi*, Y. Liang*, E. Huerta-Sanchez*, X. Jin*, Z. X. P. Cuo*, J.E. Pool* et al. Archaeology Augments Tibet's Genetic History--Response. Science 329(5998):1467-1468, 2010.

X. Yi*, Y. Liang*, E. Huerta-Sanchez*, X. Jin*, Z. X. P. Cuo*, J.E. Pool* et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329(5987):75-78, 2010. (Subject of a Perspective article in the same issue and of media coverage.)

Y. Li*, N. Vinckenbosch*, G. Tian*, E. Huerta-Sanchez*, T. Jiang* et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nature Genetics 42, 969-972, 2010.

C. Murray*, E. Huerta-Sanchez*, F. Casey and D. Bradley. Cattle demographic history modelled from autosomal sequence variation. Phil. Trans. R. Soc. B 365:2531-2539, 2010.

E. Huerta-Sanchez, R. Durrett, and C. D. Bustamante. Population genetics of polymorphism and divergence under fluctuating selection. Genetics 178:325-337, 2008.

E. Huerta-Sanchez and R. Durrett. Wagner's canalization model. Theoretical Population Biology 71(2):121-130, 2007.

B. Gonzales, E. Huerta-Sanchez, C. Kribs, A. Ortiz-Nieves and T. Vazquez-Alvarez. Am I Too Fat? Bulimia as an Epidemic. Journal of Mathematical Psychology 47(5-6): 515-526, 2003.

E. Huerta-Sanchez, A. Lopez, D. Uminsky. Iterations of Even-Odd Splitting Map Can Make Integration Easier. The Pi Mu Epsilon Journal, Vol. 11, No. 5, 241-250, 2001.

E. Huerta-Sanchez, K. Rios-Soto, G. Jordan-Salivia. The Effects of Mass Transportation During a Deliberate Release of Smallpox. Technical report for the Mathematical and Theoretical Biology Institute (MTBI). Cornell University, Ithaca, NY, Summer 2002.

* Joint first author

Publications in preparation

Estimating the distribution of family sizes in the lambda coalescent (with Rick Durrett and Carlos Bustamante). One of the underlying assumptions of Kingman's coalescent is that the offspring distribution is binomial, which leads to Kingman's coalescent as the limiting process when the population size tends to infinity. Here we study an alternative coalescent model, the Beta coalescent, which is a better representation for some marine species that are characterized by very large family sizes.

Characterizing the genetic basis behind high altitude adaptation in Tibetans. To further characterize the genetic adaptation we identified, we are analyzing sequences of the candidate regions to estimate the timing and strength of selection under the appropriate model of selection using an Approximate Bayesian Computation (ABC) framework.

Tibetan Demographic History. When Tibetans first inhabited the Tibetan plateau is still an open question, with varying estimates from the archaeological record and from genetic studies. In this project, we aim to reconcile both types of evidence by estimating the divergence time from the closely related Han Chinese population.