Annual Retreat 2019

8:30AM - Breakfast

9:00AM - Welcome/Introduction [Organizers]


9:10AM - Keynote Speaker: Daniel L. Hartl

:: Higgins Professor of Biology in the Department of Organismic and Evolutionary Biology at Harvard University::

  • Title: "Eliza Doolittle as metaphor: Cognate mutants in orthologous proteins"
  • Summary: Most proteins undergo rapid transitions between a nonfunctional unfolded state and a functional folded state, a phenomenon called "breathing." The ratio of folded to unfolded molecules is captured in a quantity ΔG referred to as the thermodynamic stability. Low stability results in a large pool of unfolded molecules, whereas high stability results in more folded molecules and a more stable structure. Most real proteins are only marginally stable, with ΔG values between –3 and –10 kcal/mol. Because the energy of a single hydrogen bond is 2–10 kcal/mol (roughly the same magnitude as ΔG), many amino acid replacements may slightly hinder or enhance protein folding, and those with opposing effects may be compensatory. These biophysical considerations are relevant to increasing evidence that supports the hypothesis that most amino acid polymorphisms within populations are slightly deleterious. The polymorphisms tend to be surface residues accessible to solvent. Slightly adverse effects on protein folding, stability, aggregation, or degradation may mediate these deleterious effects. Compensatory mutations could therefore account for the finding that most amino acid replacements between related species are subject to weak positive selection even though most amino acid polymorphisms within populations are deleterious. One test of this model is to examine cognate amino acid replacements in orthologous proteins. If fixed differences affect protein folding, stability, aggregation, or degradation, then cognate replacements should in some cases have different effects on protein stability either quantitatively or even qualitatively. We have carried out experimental tests of bacterial DHFR orthologs carrying cognate mutations conferring resistance to trimethoprim in transgenic E. coli. These results support a model in which cognate mutations in orthologous proteins can result in major differences in protein stability. 
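The relation between ΔG and the folded/unfolded ratio described above can be made concrete with a minimal sketch. It assumes a simple two-state folding model at 25 °C with ΔG defined as G(folded) minus G(unfolded), so that the ratio folded:unfolded is exp(-ΔG/RT); the function name and the temperature are illustrative choices, not taken from the talk.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(delta_g_kcal, temp_k=298.15):
    """Fraction of folded molecules under an assumed two-state model.

    delta_g_kcal is the folding free energy (negative = stable fold).
    K = [folded]/[unfolded] = exp(-dG / RT).
    """
    k_fold = math.exp(-delta_g_kcal / (R_KCAL * temp_k))
    return k_fold / (1.0 + k_fold)


# Range of stabilities quoted in the summary, plus a 1 kcal/mol destabilization.
for dg in (-10.0, -3.0, -2.0):
    unfolded = 1.0 - fraction_folded(dg)
    print(f"dG = {dg:+.0f} kcal/mol -> unfolded fraction = {unfolded:.2e}")
```

Under these assumptions a protein at ΔG = –3 kcal/mol has roughly 0.6% of its molecules unfolded at any moment, and a replacement that costs a single kcal/mol raises that several-fold, which is why small stability changes can plausibly be slightly deleterious.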

10:20AM - Coffee Break (25min)


10:45AM - 1) L. Thibério Rangel: Department of Earth, Atmospheric & Planetary Sciences, MIT

  • Title: Slow-evolving sites negatively impact reconstruction of deep Tree of Life phylogenies
  • Summary: Slow-fast analysis has been broadly used in phylogenetic reconstruction. The underlying assumption of this method is that fast-evolving sites do not retain accurate phylogenetic signal due to site saturation via multiple substitution events. Therefore, removing these sites improves the signal-to-noise ratio in phylogenetic analyses, with the remaining slower-evolving sites preserving a more reliable record of deep evolutionary relationships. However, slow-evolving sites are less likely to have experienced substitutions along shorter branches, and therefore could be less likely to retain evolutionary information about many bipartitions. Here we show that slow-fast analysis can negatively impact the accuracy of phylogenetic reconstruction in both real and simulated aligned sequence datasets. Simulated alignments generated under a predefined phylogeny, modeled after Tree of Life ribosomal protein datasets, consistently show that slow-evolving sites are less likely to recover true bipartitions than even the fastest-evolving sites. Furthermore, site rate is positively correlated with accurately recovering shorter-branched bipartitions. We further tested our hypothesis using the concatenated ribosomal protein dataset published by Hug et al. (2016). We show that the phylogenetic signal present among both the slowest- and fastest-evolving sites is significantly less compatible with the overall signal than the signal from other sites. Furthermore, for this dataset, trimming fast sites, slow sites, or both has distinct levels of impact on phylogenetic reconstruction under different evolutionary models. This is perhaps most evident in the resulting placements of Eukarya and Asgard groups, which are especially sensitive to the implementation of different trimming schemes.

11:10AM - 2) James Xue: Sabeti Lab

  • Title:  Uncovering Lost Sequences: Genome-wide Functional Screen of Human-Specific Deletions
  • Summary: The identification and characterization of human-specific regulatory elements is crucial to understanding our unique evolutionary history. Particularly compelling are highly conserved regions of the primate genome that are deleted specifically in humans. Prior work has focused on identifying large and intermediate-sized (>20bp) human CONserved DELetions (hCONDELs), which have been shown to reside almost entirely in non-coding regions and are enriched near genes involved in steroid hormone receptor signaling and neural functions. However, smaller deletions, which are far more abundant, have not been rigorously identified or characterized for their function due to various technical challenges. Using multiple sequence alignments generated by our group from the latest available primate reference genomes, we find that the vast majority of putative hCONDELs (>95%) are very small, suggesting that most hCONDELs are yet to be explicitly described. We computationally validated our hCONDELs by ensuring that these putative deletion sites are completely absent from a diverse set of human populations from the Simons Genome Diversity Project. To characterize the functional importance of the deletions, we employed the Massively Parallel Reporter Assay (MPRA) as a non-coding screen to assess the cis-regulatory impact of 17,197 deletions. We utilized MPRA across a diverse range of cell types to test for function across different tissue backgrounds. hCONDELs found to be active were enriched in active regulatory regions of specific tissues. Furthermore, several hCONDELs show significant differences in reporter-assay activity between their native human and chimpanzee contexts, and are prime candidates for contributing to possible human-specific phenotypes. We further characterize these hCONDELs to show changes in epigenetic activity at these loci and also transcriptional changes in nearby genes between human and chimpanzee.
On a broader scale, we explored genome-wide trends in transcription factor binding, motif usage, and chromatin activity associated with those sites displaying species-specific activity. The characterization of these sites allows us to construct a clearer picture of the functional significance of sequences lost along the human lineage.

11:35AM - 3) Vagheesh Narasimhan: Reich Lab

  • Title:  Reconstructing whole genome epigenetic profiles from ancient DNA data and its application to understanding the age of death of human samples over the past 10,000 years.
  • Summary: DNA methylation is a key hallmark of gene activity in mammals, where it occurs almost exclusively at cytosines in the context of CpG dinucleotides. Naturally degraded methylated cytosines in ancient DNA are converted to thymines and can be used to reconstruct ancient methylomes that are highly correlated with genome-wide data from modern bone. DNA methylation levels from modern blood samples have been shown to be highly predictive of human age. Here we combined epigenetic data derived from over 10,000 individuals from published human cancer studies across multiple tissues to develop a new predictor for chronological age that is suitable for application to epigenetic information obtained from ancient DNA sequencing of human bone samples. Using additional modern data, we show that this clock is robust to tissue type and sex, performs well across a broad age range, and applies to samples whose epigenetic profiles were obtained with alternative technologies. To validate our method, we applied our predictor to a set of ancient DNA samples for which anthropological age estimates based on skeletal morphology were available, and then applied it to a set of high-coverage ancient DNA samples from Mesolithic Europe (7000-10000 BCE). Our method opens up the possibility of examining changes in DNA methylation profiles and human life expectancy across the past 10,000 years of human history as additional ancient DNA samples become available.

12:00PM - Lunch 


1:10PM - 4) Katherine Lawrence and Alex Nguyen Ba: Desai Lab

  • Title:  Comprehensive measurements of genetic architecture in a diverse yeast cross through Barcoded Bulk QTL mapping
  • Summary: Across organisms, studies of the genetic basis underlying complex traits have consistently observed two trends: high polygenicity, where quantitative trait loci (QTLs) are numerous, dispersed, and contributing at small effect sizes; and missing heritability, where the detected QTLs do not explain all of the genetic variance in the phenotypes. Potential sources of missing heritability include numerous small-effect QTLs below the studies’ sensitivity as well as epistatic interactions between QTLs. Recent work has significantly advanced the spatial resolution of QTL mapping for identifying causal nucleotides, but at the expense of resolution for effect size and epistatic interactions. Here we demonstrate a QTL mapping analysis of a pool of 100,000 yeast segregants. Our approach combines the advantages of individual phenotyping/genotyping and bulk segregant analysis, allowing detection of QTLs with sub-1% effect sizes and precise identification of their locations. Collecting high-resolution genotype and phenotype data on this scale is achievable and cost-effective due to a suite of novel techniques: robotic liquid handling, lineage barcoding, combinatorial indexing, custom enzyme purification, and bulk assays of fitness and other phenotypes. We identify hundreds of small-effect QTLs across dozens of complex traits and quantify their contributions to missing heritability, as well as their epistatic interactions and pleiotropic effects. This flexible and powerful advance in QTL mapping enables comprehensive measurements of genetic architecture in diverse yeast crosses, which has profound implications for the evolutionary trajectories of recombining populations.

1:35PM - 5) Chris Bakerlee: Desai Lab

  • Title:  Systematic analysis of higher-order genetic interactions using combinatorial CRISPR-Cas9 gene drives
  • Summary: An evolving population can be conceptualized as traversing a high-dimensional “fitness landscape,” where each coordinate in genotype space maps to some fitness value. To understand the process of evolution, it is essential to understand the topography of these fitness landscapes. However, given their high dimensionality, attempts to analyze empirical landscapes have encountered steep technical barriers, with researchers limited to either combinatorially complete sets of very few mutations or a much sparser sampling of many more mutations, often at the level of a single protein. Here, we have overcome these limitations through the use of gRNA-coupled CRISPR gene drives, which enable the facile creation of combinatorially complete sets of an arbitrary number of mutations across the entire genome. We used this technology to construct a complete landscape spanning ten loci in budding yeast, generating 2^10 (1024) genotypes as haploids, heterozygotes, and homozygotes. We measured genotypes' fitnesses in bulk in multiple environments via DNA barcode-based assays. Consistent with previous results, we find that higher-order interactions contribute significantly to the fitness of genotypes carrying multiple mutations. Additionally, while it appears that mutations' fitness effects are mediated by genotypes' background fitness, these examples of global epistasis frequently collapse to specific idiosyncratic interactions upon closer inspection. We discuss our findings' implications for adaptive evolution on various timescales.
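As a small illustration of what "combinatorially complete" means for ten loci, the sketch below enumerates every combination of two alleles per locus, giving 2^10 = 1024 genotypes. The 0/1 encoding and the helper name are assumptions made for illustration only and say nothing about how the strains were actually built or labeled.

```python
from itertools import product

N_LOCI = 10  # ten loci, as stated in the summary

# Hypothetical encoding: 0 = reference allele, 1 = mutant allele at each locus.
genotypes = list(product((0, 1), repeat=N_LOCI))
print(len(genotypes))  # 2**10 = 1024


def single_step_neighbors(genotype):
    """Return the genotypes that differ from `genotype` at exactly one locus."""
    return [
        genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        for i in range(len(genotype))
    ]


# Every genotype in a complete landscape has exactly N_LOCI one-step neighbors.
print(len(single_step_neighbors(genotypes[0])))
```

Because every genotype and all of its one-step neighbors are present, the effect of each mutation can be compared across all 512 backgrounds at the other nine loci, which is what allows higher-order interactions to be estimated.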

2:00PM - Coffee Break (25min)


2:25PM - 6) Eadaoin Harney: Wakeley/Reich Labs

  • Title:  Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture
  • Summary: A major drawback of many population genetic tools for modeling the ancestry of admixed populations is the requirement for the user to have a complete understanding of the population histories of all other groups included in the model. qpAdm—a statistical tool for modeling the ancestry of admixed (or un-admixed) populations—circumvents this problem by eliminating the need for users to specify the underlying relationships of all other populations in the model. Although qpAdm is growing in popularity (particularly in the field of ancient DNA), relatively little has been done to assess its performance under both simple and complex scenarios. We performed a simulation-based study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use.

2:50PM - 7) Vladimir Seplyarskiy: Sunyaev Lab            

  • Title:  Population sequencing data reveal a compendium of mutational processes in human germline
  • Summary: The mechanistic processes underlying human germline mutation remain largely unknown. Variation in mutation rate and spectra along the genome is informative about the biological mechanisms. We statistically decompose this variation into separate processes using matrix factorization. Analysis of a large-scale whole-genome sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. Six of these processes lend themselves to a biological interpretation. Most of the genomic variation in mutation rate is due to a process associated with bulky DNA damage. Two processes independently track the direction of the replication fork and replication timing. We identify a mutagenic effect of active demethylation primarily acting in regulatory regions. We also demonstrate that a recently discovered mutagenic process specific to oocytes can be localized solely from population sequencing data. This process is spread across all chromosomes and is highly asymmetric with respect to the direction of transcription, suggesting a major role of DNA damage.

3:15PM - Closing Remarks [Organizers]