Computational personal genomics: selection, regulation, epigenomics, disease
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA TATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC TAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC TGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT CTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG AATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA GCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA CTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TGATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAA TCGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTG CACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCA TGGCAGTGGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATAT CCAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTT GGTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAA TAGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCAC CAGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAG TTCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAG GCTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGA AATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG 
CGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA GAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA ATTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACT AGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATA GTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGG ACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAG CTTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTAC GAACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACA AAAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGA AATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCA TTCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCAT CCCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATT AGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAA GTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATA GCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACA CAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATC CACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGT GTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCT
Genes: encode proteins
Regulatory motifs: control gene expression
Recombination breakpoints
Family inheritance: me vs. my brother
My dad, Dad’s mom, Mom’s dad
Human ancestry
Disease risk
Genomics: regions → mechanisms → drugs
Systems: genes → combinations → pathways
Personal genomics today: 23 and We
Goal: A systems-level understanding of genomes and gene regulation:
- The regulators: Transcription factors, microRNAs, sequence specificities
- The regions: enhancers, promoters, and their tissue-specificity
- The targets: TFs → targets, regulators → enhancers, enhancers → genes
- The grammars: Interplay of multiple TFs → prediction of gene expression
The parts list = Building blocks of gene regulatory networks
CATGACTG / CATGCCTG: disease-associated variant (SNP/CNV/…)
- Gene annotation (coding, 5’/3’UTR, RNAs)
- Evolutionary signatures
- Non-coding annotation
- Chromatin signatures
- Roles in gene/chromatin regulation
- Activator/repressor signatures
- Other evidence of function
- Signatures of selection (sp/pop)
Understanding human variation and human disease
- Challenge: from loci to mechanism, pathways, drug targets
Tools for interpreting the human genome
- Evolutionary signatures → Genome annotation
– Distinct signatures for proteins, ncRNAs, miRNAs, motifs
– Read-through, excess-constraint, networks, human selection
- Chromatin signatures → Dynamic regulatory regions
– Define chromatin states from combinations of histone marks
– Distinct classes of promoter/enhancer/transcribed/repressed
- Activity signatures → Link enhancer networks
– Activity-based linking of regulators → enhancers → targets
– Testing of 1000s of enhancer activator / repressor motifs
- Personal genomics → Interpret disease mechanism
– Disease-associations: Mechanistic predictions for variants
– Beyond top hits: 2000+ GWAS T1D-associated enhancers
– Global methylation changes in Alzheimer’s: NRSF, ELK1, CTCF
Evolutionary signatures reveal genes, RNAs, motifs
Compare 29 mammals
Increased conservation pinpoints functional regions
Distinct patterns of change distinguish different functions
Protein-coding genes
- Codon Substitution Frequencies
- Reading Frame Conservation
RNA structures
- Compensatory changes
- Silent G-U substitutions
microRNAs
- Shape of conservation profile
- Structural features: loops, pairs
- Relationship with 3’UTR motifs
Regulatory motifs
- Mutations preserve consensus
- Increased Branch Length Score
- Genome-wide conservation
Lindblad-Toh, Nature 2011; Stark, Nature 2007
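One of the protein-coding signatures listed above, reading frame conservation, can be illustrated with a small sketch. It assumes a pairwise alignment given as two equal-length strings with `-` for gaps, and it scores the fraction of aligned positions at which both sequences stay in the same codon frame. The function name and the toy sequences are hypothetical, and this is a simplified illustration rather than the procedure used in the cited papers.

```python
def reading_frame_conservation(ref: str, other: str) -> float:
    """Fraction of ungapped alignment columns where `ref` and `other`
    share a codon frame, maximised over the three possible frame offsets."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = other_pos = 0      # nucleotides consumed so far in each sequence
    in_frame = [0, 0, 0]         # column counts per relative frame offset
    aligned = 0
    for a, b in zip(ref, other):
        if a != "-" and b != "-":
            aligned += 1
            in_frame[(other_pos - ref_pos) % 3] += 1
        if a != "-":
            ref_pos += 1
        if b != "-":
            other_pos += 1
    return max(in_frame) / aligned if aligned else 0.0


# Toy alignments (made up for illustration).
ref = "ATGAAACCCGGG"
print(reading_frame_conservation(ref, "ATG---CCCGGG"))  # gap of 3 keeps the frame
print(reading_frame_conservation(ref, "ATG-AACCCGGG"))  # gap of 1 shifts the frame
```

Gaps whose length is a multiple of three leave the score untouched, while frame-shifting gaps lower it, which is the pattern expected to separate coding from non-coding sequence.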
Compare 29 mammals: Reveal constrained positions
- Reveal individual transcription factor binding sites
- Within motif instances reveal position-specific bias
- More species: motif consensus directly revealed
NRSF motif
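A minimal sketch of how constrained positions and a consensus could be read off a multi-species alignment of one site. It assumes each row is the same site in a different species and scores each column by information content (2 bits minus the column entropy), with no small-sample correction. The alignment is a made-up toy, not real NRSF data, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str) -> float:
    """Information content in bits (0 to 2) of one alignment column."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    counts = Counter(bases)
    entropy = -sum((c / len(bases)) * math.log2(c / len(bases))
                   for c in counts.values())
    return 2.0 - entropy


def constraint_profile(alignment):
    """Per-position information content across all aligned sequences."""
    return [column_information("".join(col)) for col in zip(*alignment)]


def consensus(alignment):
    """Most frequent base at each position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


# Toy alignment: four hypothetical species, one 8-bp site.
toy = ["ACGTTGCA", "ACGTTGCA", "ACGTAGCA", "ACGTTGCC"]
print(consensus(toy))
print([round(x, 2) for x in constraint_profile(toy)])
```

With only four rows the profile is coarse; adding rows sharpens the contrast between constrained and free positions, which is the point the slide makes about comparing more species.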
Human constraint outside conserved regions
- Non-conserved regions:
– ENCODE-active regions show reduced diversity → lineage-specific constraint in biochemically-active regions
- Conserved regions:
– Non-ENCODE regions show increased diversity → loss of constraint in human when biochemically-inactive
Figure: average diversity (heterozygosity), aggregated over the genome; active regions
Ward and Kellis, Science 2012
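The diversity comparison above can be sketched as an average of per-site heterozygosity, 2p(1 − p), within each class of regions. The allele frequencies below are invented for illustration and the function names are hypothetical; a real analysis would aggregate over far more sites and control for confounders.

```python
def heterozygosity(alt_freq: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def mean_heterozygosity(alt_freqs) -> float:
    """Average heterozygosity over a collection of sites."""
    freqs = list(alt_freqs)
    if not freqs:
        return 0.0
    return sum(heterozygosity(p) for p in freqs) / len(freqs)


# Hypothetical alternate-allele frequencies for sites in two region classes.
active_sites = [0.01, 0.05, 0.02]
inactive_sites = [0.20, 0.40, 0.10]

print(mean_heterozygosity(active_sites))
print(mean_heterozygosity(inactive_sites))
```

Lower average heterozygosity in one class is read as reduced diversity, which is the signal the slide interprets as constraint in the human lineage.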
Human-specific enhancer functions play regulatory roles
Regulatory genes: transcription, chromatin, signaling
Developmental enhancers: embryo, nerve growth
- Transcription initiation from Pol2 promoter
- Transcription coactivator activity
- Transcription factor binding
- Chromatin binding
- Negative regulation of transcription, DNA-dependent
- Transcription factor complex
- Protein complex
- Protein kinase activity
- Nerve growth factor receptor signaling pathway
- Signal transducer activity
- Protein serine/threonine kinase activity
- Negative regulation of transcription from Pol2 prom
- Protein tyrosine kinase activity
- In utero embryonic development
Integrate epigenomics datasets in multiple cell types
- Epigenetic modifications
- DNA/histone/nucleosome
- Encode epigenetic state
- Histone code hypothesis
- Distinct function for distinct combinations of marks?
- Hundreds of histone marks
- Astronomical number of histone mark combinations
- How do we find biologically relevant ones?
- Unsupervised approach
- Probabilistic model
- Explicit combinatorics
Epigenomic maps
- 1. Histone modifications
- 2. DNA methylation
- 3. DNA accessibility
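To make the combinatorics concrete: with n marks each scored present or absent, there are 2^n possible combinations, so a probabilistic model summarizes them with a small number of states. The sketch below assigns one genomic bin to the most likely state under independent Bernoulli emissions per mark. It is a mixture-style assignment without the transitions between neighboring bins that a full hidden Markov model would add. The state names and emission probabilities are illustrative assumptions, not learned values from the talk.

```python
import math


def n_combinations(n_marks: int) -> int:
    """Number of presence/absence combinations of n binary marks."""
    return 2 ** n_marks


def log_emission(observed, probs) -> float:
    """Log-probability of a binary mark vector under independent Bernoulli emissions."""
    return sum(math.log(p if o else 1.0 - p) for o, p in zip(observed, probs))


def most_likely_state(observed, states) -> str:
    """State whose emission model best explains the observed marks."""
    return max(states, key=lambda name: log_emission(observed, states[name]))


MARKS = ("H3K4me3", "H3K4me1", "H3K27ac", "H3K27me3")

# Hypothetical emission probabilities, one value per mark in MARKS.
STATES = {
    "promoter-like": (0.90, 0.20, 0.70, 0.05),
    "enhancer-like": (0.10, 0.90, 0.70, 0.05),
    "repressed-like": (0.05, 0.05, 0.02, 0.90),
    "quiescent-like": (0.02, 0.02, 0.02, 0.02),
}

print(n_combinations(len(MARKS)))
print(most_likely_state((0, 1, 1, 0), STATES))
```

In an unsupervised setting the emission table would be learned from the data rather than written by hand; the fixed table here only shows how a state captures a combination of marks.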
Chromatin state dynamics across nine cell types
- Single annotation track for each cell type
- Summarize cell-type activity at a glance
- Can study 9-cell activity pattern across
Correlated activity → Predicted linking
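A minimal sketch of linking by correlated activity, assuming each enhancer and each gene has one activity value per cell type (nine here) and that a link is predicted when the Pearson correlation exceeds a threshold. The names, values, and the 0.8 threshold are hypothetical; a real analysis would also restrict candidate pairs by genomic distance.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def link_enhancers(enhancer_activity, gene_expression, threshold=0.8):
    """Return (enhancer, gene, r) for every pair with correlation >= threshold."""
    links = []
    for enh, activity in enhancer_activity.items():
        for gene, expression in gene_expression.items():
            r = pearson(activity, expression)
            if r >= threshold:
                links.append((enh, gene, r))
    return links


# Toy data across nine cell types (values invented for illustration).
enhancers = {"enh1": [1, 0, 0, 1, 1, 0, 0, 0, 1]}
genes = {
    "geneA": [5, 1, 1, 6, 5, 1, 0, 1, 6],   # tracks enh1
    "geneB": [1, 5, 5, 1, 1, 5, 5, 5, 1],   # anti-correlated with enh1
}
for enh, gene, r in link_enhancers(enhancers, genes):
    print(enh, gene, round(r, 2))
```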
Epigenomics Roadmap: 90 complete epigenomes
Interpret GWAS, global effects, reveal relevant cell types
Enhancer modules associated with tissue identity
Clustering of 500,000 distal enhancers
- Tissue-specific disease-relevant tissues and processes
Dissect motifs in 10,000s of human enhancers
54,000+ measurements (×2 cell types, ×2 replicates)
- Predict activators/repressors based on activity correlations
- Validate by engineering enhancers disrupting causal motifs
Example activator: conserved HNF4 motif match
- WT expression specific to HepG2
- Non-disruptive changes maintain expression
- Motif match disruptions reduce expression to background
- Random changes: outcome depends on their effect on the motif match
Revisiting disease-associated variants
- Disease-associated SNPs enriched for enhancers in relevant cell types
- E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
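One common way to quantify this kind of enrichment is a hypergeometric tail test: given a background set of SNPs, some of which fall in enhancers of a given cell type, how surprising is the number of disease-associated SNPs that fall in those enhancers? The sketch below is an assumption about how such a test could be set up, not the method used in the talk, and all counts are invented.

```python
from math import comb


def enrichment_p_value(total: int, annotated: int, drawn: int, hits: int) -> float:
    """P(X >= hits) when `drawn` SNPs are sampled without replacement from
    `total` SNPs, of which `annotated` overlap enhancers."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, drawn)


# Hypothetical counts: 1000 background SNPs, 100 in enhancers of one cell type,
# 20 disease-associated SNPs, 8 of which overlap those enhancers.
print(enrichment_p_value(total=1000, annotated=100, drawn=20, hits=8))
```

Repeating the test per cell type and comparing p-values is one way to ask which cell types are most relevant to a disease, subject to multiple-testing correction.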
GED: Global repression of brain enhancers in AD
- Variation in methylation patterns largely genotype driven
- Global signature of repression in 1000s of regulatory regions: hypermethylation, enhancer states, brain regulator targets
Genotype (1M SNPs × 700 ind.)
Methylation (450k probes × 700 ind.)
Reference chromatin states
Dorsolateral PFC: MAP (Memory and Aging Project) + ROS (Religious Orders Study)
Global hyper-methylation in 1000s of AD-associated loci
Alzheimer’s-associated probes are hypermethylated
Figure: 480,000 probes, ranked by Alzheimer’s association P-value; methylation; top 7000 probes
- Global effect across 1000s of probes
– Rank all probes by Alzheimer’s association
– 7000 probes increase methylation (repressed)
– Enriched in brain-specific enhancers
– Near motifs of brain-specific regulators
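The ranking step can be sketched as follows, assuming each probe carries an association p-value and a signed methylation difference between cases and controls. The record layout, function name, and numbers are hypothetical; the sketch only shows how a directional bias among top-ranked probes would be measured.

```python
def fraction_hypermethylated(probes, top_k: int) -> float:
    """Among the top_k probes by association p-value, the fraction whose
    methylation is higher in cases than in controls (delta > 0)."""
    ranked = sorted(probes, key=lambda probe: probe[1])  # smallest p-value first
    top = ranked[:top_k]
    if not top:
        return 0.0
    return sum(1 for _, _, delta in top if delta > 0) / len(top)


# Toy probes: (probe id, association p-value, methylation change in cases).
toy_probes = [
    ("p1", 1e-8, 0.04),
    ("p2", 3e-7, 0.02),
    ("p3", 5e-6, -0.01),
    ("p4", 0.2, -0.03),
    ("p5", 0.7, 0.01),
    ("p6", 0.9, -0.02),
]

print(fraction_hypermethylated(toy_probes, top_k=3))
print(fraction_hypermethylated(toy_probes, top_k=len(toy_probes)))
```

A fraction well above the genome-wide baseline among the top-ranked probes is the kind of global, directional effect the slide describes.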
Complex disease: genome-wide effects
Covers computational challenges associated with personal genomics:
- genotype phasing and haplotype reconstruction → resolve mom/dad chromosomes
- exploiting linkage for variant imputation → co-inheritance patterns in human population
- ancestry painting for admixed genomes → result of human migration patterns
- predicting likely causal variants using functional genomics → from regions to mechanism
- comparative genomics annotation of coding/non-coding elements → gene regulation
- relating regulatory variation to gene expression or chromatin → quantitative trait loci
- measuring recent evolution and human selection → selective pressure shaped our genome
- using systems/network information to decipher weak contributions → combinatorics
- challenge of complex multi-genic traits: height, diabetes, Alzheimer's → 1000s of genes
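As a small illustration of the first item, phasing can be resolved at many sites by Mendelian rules alone when both parents are genotyped. The sketch below assumes biallelic sites with genotypes coded as the number of alternate alleles (0, 1, or 2) and returns the child's (maternal, paternal) alleles, or `None` when the trio is uninformative or inconsistent. The function names are hypothetical, and population-based statistical phasing, which is needed when parents are unavailable, is not shown.

```python
def can_transmit(parent_genotype: int, allele: int) -> bool:
    """Whether a parent with this alternate-allele count can pass on `allele`."""
    return parent_genotype >= 1 if allele == 1 else parent_genotype <= 1


def phase_site(child: int, mom: int, dad: int):
    """Return (maternal_allele, paternal_allele) for the child, or None if the
    site is ambiguous (all three heterozygous) or violates Mendelian inheritance."""
    if child in (0, 2):
        allele = child // 2
        consistent = can_transmit(mom, allele) and can_transmit(dad, allele)
        return (allele, allele) if consistent else None
    # Heterozygous child: try both assignments of the alternate allele.
    options = [
        (m, 1 - m)
        for m in (0, 1)
        if can_transmit(mom, m) and can_transmit(dad, 1 - m)
    ]
    return options[0] if len(options) == 1 else None


print(phase_site(child=1, mom=0, dad=1))  # alternate allele must come from dad
print(phase_site(child=1, mom=1, dad=1))  # all heterozygous: cannot be resolved
```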
MIT Computational Biology group
Compbio.mit.edu
Daniel Marbach, Mike Lin, Jason Ernst, Jessica Wu, Rachel Sealfon, Pouya Kheradpour, Manolis Kellis, Chris Bristow, Loyal Goff, Irwin Jungreis, Sushmita Roy, Luke Ward, Louisa DiStefano, Dave Hendrix, Angela Yen, Ben Holmes, Soheil Feizi, Mukul Bansal, Bob Altshuler, Stefan Washietl, Matt Eaton