
Lecture 6: Regulatory genomics. Gene regulation, chromatin accessibility, DNA regulatory code


  1. 6.874, 6.802, 20.390, 20.490, HST.506 Computational Systems Biology: Deep Learning in the Life Sciences. Lecture 6: Regulatory genomics. Gene regulation, chromatin accessibility, DNA regulatory code. Prof. Manolis Kellis. Slides credit: 6.047, Anshul Kundaje, David Gifford. http://mit6874.github.io

  2. Deep Learning for Regulatory Genomics
     1. Biological foundations: Building blocks of Gene Regulation
        – Gene regulation: Cell diversity, Epigenomics, Regulators (TFs), Motifs, Disease role
        – Probing gene regulation: TFs/histones: ChIP-seq, Accessibility: DNase/ATAC-seq
     2. Classical methods for Regulatory Genomics and Motif Discovery
        – Enrichment-based motif discovery: Expectation Maximization, Gibbs Sampling
        – Experimental: PBMs, SELEX. Comparative genomics: Evolutionary conservation.
     3. Regulatory Genomics CNNs (Convolutional Neural Networks): Foundations
        – Key idea: pixels ↔ DNA letters. Patches/filters ↔ Motifs. Higher ↔ combinations
        – Learning convolutional filters → Motif discovery. Applying them → Motif matches
     4. Regulatory Genomics CNNs/RNNs in Practice: Diverse Architectures
        – DeepBind: Learn motifs, use in (shallow) fully-connected layer, mutation impact
        – DeepSEA: Train model directly on mutational impact prediction
        – Basset: Multi-task DNase prediction in 164 cell types, reuse/learn motifs
        – ChromPuter: Multi-task prediction of different TFs, reuse partner motifs
        – DeepLIFT: Model interpretation based on neuron activation properties
        – DanQ: Recurrent Neural Network for sequential data analysis

  3. 1a. Basics of gene regulation

  4. One Genome – Many Cell Types. [Figure: a single genome sequence (ACCAGTTACGACGGTCA...) shared by many different cell types. Image source: Wikipedia]

  5. DNA packaging
     • Why packaging
       - DNA is very long
       - Cell is very small
     • Compression
       - Chromosome is 50,000 times shorter than extended DNA
     • Using the DNA
       - Before a piece of DNA is used for anything, this compact structure must open locally
     • Now emerging:
       - Role of accessibility
       - State in chromatin itself
       - Role of 3D interactions

  6. Combinations of marks encode epigenomic state
     • Enhancers: H3K4me1, H3K27ac, DNase
     • Promoters: H3K4me3, H3K9ac, DNase
     • Transcribed: H3K36me3, H3K79me2, H4K20me1
     • Repressed: H3K9me3, H3K27me3, DNAmethyl
     [Figure tracks: H3K4me3, H3K4me1, H3K27ac, H3K36me3, H4K20me1, H3K79me3, H3K27me3, H3K9me3, H3K9ac, H3K18ac]
     • 100s of known modifications, many new still emerging
     • Systematic mapping using ChIP-, Bisulfite-, DNase-Seq

  7. Summarize multiple marks into chromatin states. ChromHMM: multivariate hidden Markov model. [Figure: WashU Epigenome Browser view in which a chromatin state track summarizes 30+ epigenomic marks]
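The idea of a multivariate HMM over chromatin marks can be sketched in a few lines. This is a toy illustration, not ChromHMM itself: the two states, three marks, and all probabilities below are invented for the example, and the parameters are fixed by hand rather than learned from data. Each genomic bin is a 0/1 vector of mark presence, each state emits every mark independently with its own probability, and Viterbi decoding recovers the most likely state per bin.

```python
import numpy as np

def viterbi_bernoulli(obs, start, trans, emit):
    """Most likely state path for binary mark vectors under an HMM whose
    emissions are independent Bernoullis per mark.

    obs:   (T, M) 0/1 array, presence of M marks in T genomic bins
    start: (S,)   initial state probabilities
    trans: (S, S) transition probabilities, rows sum to 1
    emit:  (S, M) probability that each mark is present in each state
    """
    obs = np.asarray(obs)
    # log P(obs[t] | state s): sum of per-mark Bernoulli log-likelihoods
    log_emit = obs @ np.log(emit).T + (1 - obs) @ np.log(1 - emit).T  # (T, S)
    n_bins, _ = log_emit.shape
    score = np.log(start) + log_emit[0]
    back = np.zeros(log_emit.shape, dtype=int)
    for t in range(1, n_bins):
        cand = score[:, None] + np.log(trans)   # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)           # best predecessor of each state
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(n_bins - 1, 0, -1):          # trace back from the last bin
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical toy model. Marks (columns): H3K4me3, H3K4me1, H3K27ac.
# State 0 = "active-like" (marks mostly present), state 1 = "quiescent-like".
start = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
emit = np.array([[0.90, 0.20, 0.70],
                 [0.05, 0.05, 0.05]])
obs = [[0, 0, 0], [1, 0, 1], [1, 0, 1], [0, 0, 0]]

path = viterbi_bernoulli(obs, start, trans, emit)
print(path)
```

The decoded path is the kind of per-bin state track that a genome browser displays in place of the many individual mark tracks.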

  8. Transcription factors control activation of cell-type-specific promoters and enhancers. [Figure labels: Enhancer region, Promoter region, Protein-coding sequence]

  9. TFs use DNA-binding domains to recognize specific DNA sequences in the genome. [Figure: “Logo” or “motif” examples TAATTA, CACGTG, AGATAAGA; DNA-binding domain of Engrailed bound to TCATTA]

  10. Regulator structure ↔ recognized motifs
     • Proteins ‘feel’ DNA
       - Read chemical properties of bases
       - Do NOT open DNA (no base complementarity)
     • 3D topology dictates specificity
       - Fully constrained positions → every atom matters
       - “Ambiguous / degenerate” positions → loosely contacted
     • Other types of recognition
       - MicroRNAs: complementarity
       - Nucleosomes: GC content
       - RNAs: structure/sequence combination

  11. Motifs summarize TF sequence specificity
     • Summarize information
     • Integrate many positions
     • Measure of information
     • Distinguish motif vs. motif instance
     • Assumptions:
       - Independence
       - Fixed spacing
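A minimal sketch of how a motif summarizes instances, under the two assumptions listed on the slide (independent positions, fixed spacing). The four instances below are hypothetical, loosely modeled on the TAATTA/TCATTA examples from slide 9; the pseudocount and the uniform background of 0.25 per base are also assumptions made for illustration.

```python
import math
from collections import Counter

def position_frequencies(instances, pseudocount=1.0):
    """Per-position base frequencies from equal-length motif instances."""
    length = len(instances[0])
    total = len(instances) + 4 * pseudocount
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in instances)
        freqs.append({b: (counts[b] + pseudocount) / total for b in "ACGT"})
    return freqs

def information_content(freqs):
    """Bits per position: 2 minus the entropy (uniform background assumed)."""
    return [2.0 + sum(p * math.log2(p) for p in col.values()) for col in freqs]

def log_odds_score(freqs, site, background=0.25):
    """Score one candidate instance; each position contributes independently."""
    return sum(math.log2(col[b] / background) for col, b in zip(freqs, site))

# Hypothetical aligned instances of one motif (the motif is the summary,
# each string is a motif instance).
instances = ["TAATTA", "TAATTA", "TCATTA", "TAATTG"]
freqs = position_frequencies(instances)
bits = information_content(freqs)

print([round(b, 2) for b in bits])
print(log_odds_score(freqs, "TAATTA"), log_odds_score(freqs, "GGGCCC"))
```

Positions where every instance agrees carry more bits than degenerate positions, and a sequence matching the consensus scores higher than an unrelated one. Sliding `log_odds_score` along a longer sequence is the usual way to turn the motif into candidate motif instances.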

  12. Regulatory motifs at all levels of pre- and post-transcriptional regulation. [Figure labels: Enhancer regions, Promoter motifs, Splicing signals, Motifs at RNA level. Where in the body? When in time? Which variants? Which subsets?]
     • The parts list: ~20-30k genes
       - Protein-coding genes, RNA genes (tRNA, microRNA, snRNA)
     • The circuitry: constructs controlling gene usage
       - Enhancers, promoters, splicing, post-transcriptional motifs
     • The regulatory code, complications:
       - Combinatorial coding of ‘unique tags’
       - Data-centric encoding of addresses
       - Overlaid with ‘memory’ marks
       - Large-scale on/off states
       - Modulation of the large-scale coding
       - Post-transcriptional and post-translational information
     • Today: discovering motifs in co-regulated promoters and de novo motif discovery & target identification

  13. Disrupted motif at the heart of FTO obesity locus. Strongest association with obesity. C-to-T disruption of AT-rich regulatory motif. [Figure panels: Lean, Obese] Restoring motif restores thermogenesis.

  14. 1b. Technologies for probing gene regulation

  15. Mapping regulator binding: ChIP-seq (chromatin immunoprecipitation followed by sequencing). [Figure labels: TF = transcription factor; antibody; bar-coded multiplexed sequencing]

  16. ChIP-chip and ChIP-Seq technology overview. Modification-specific antibodies → Chromatin Immuno-Precipitation, followed by either ChIP-chip (array hybridization) or ChIP-Seq (massively parallel next-gen sequencing). [Image adapted from Wikipedia]

  17. ChIP-Seq Histone Modifications: What the raw data looks like
     • Each sequence tag is 30 base pairs long
     • Tags are mapped to unique positions in the ~3 billion base reference genome
     • Number of reads depends on sequencing depth. Typically on the order of 10 million mapped reads.
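As a back-of-the-envelope check on these numbers, the average per-base coverage follows directly from tag length, read count, and genome size. The values below are the round figures from the slide; a real experiment would substitute its own.

```python
# Round numbers taken from the slide.
read_length = 30               # bp per sequence tag
mapped_reads = 10_000_000      # order of 10 million mapped reads
genome_size = 3_000_000_000    # ~3 billion base reference genome

# Average number of tags covering any given base if tags were spread uniformly.
mean_coverage = read_length * mapped_reads / genome_size
print(f"mean coverage: {mean_coverage:.2f}x")
```

At roughly 0.1x average coverage, uniformly spread tags would leave most bases uncovered, so the informative signal is the local pile-up of tags at enriched regions rather than the genome-wide average.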

  18. Chromatin accessibility can reveal TF binding. Sherwood, RI, et al. “Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape” Nat. Biotech 2014.

  19. DNase-seq reveals genome protection profiles

  20. ATAC-seq

  21. ATAC-seq and DNase-seq are not identical. [Figure: GM12878, Chr. 14; each point is accessibility in a 2 kb window] Hashimoto TB, et al. “A Synergistic DNA Logic Predicts Genome-wide Chromatin Accessibility” Genome Research 2016
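A comparison of this kind can be sketched by counting events per 2 kb window for each assay and correlating the two count vectors. The cut-site coordinates below are invented for a 10 kb toy region, and Pearson correlation is just one reasonable summary; the cited paper's own procedure is not reproduced here.

```python
import numpy as np

def window_counts(positions, chrom_length, window=2000):
    """Count events (e.g. cut or insertion sites) in each fixed-size window."""
    n_windows = -(-chrom_length // window)        # ceiling division
    idx = np.asarray(positions) // window         # window index of each event
    return np.bincount(idx, minlength=n_windows)

# Hypothetical coordinates for two assays on the same 10 kb region.
dnase_cuts = [100, 150, 900, 2500, 6100, 6200, 6300, 9000]
atac_cuts = [120, 800, 2600, 2700, 6050, 6400, 8100]

dnase = window_counts(dnase_cuts, 10_000)
atac = window_counts(atac_cuts, 10_000)
r = np.corrcoef(dnase, atac)[0, 1]                # Pearson correlation

print(dnase, atac, round(r, 3))
```

Each pair `(dnase[i], atac[i])` corresponds to one point in a scatter plot like the one on the slide; a correlation below 1 reflects windows where the two assays disagree.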

  22. DNase-seq is less defined evidence than ChIP-seq
     • ChIP-seq reports TF-binding locations (specifically)
     • DNase-seq reports proximal TF non-binding locations (noisily)

  23. Bound factors leave distinct DNase-seq profiles. [Figure panels: Esrrb, Zfx, CTCF, Oct4, Brg motif] Individual binding site prediction is difficult. [Figure panels: individual CTCF site vs. aggregate over CTCF sites]
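The contrast between individual and aggregate profiles can be illustrated with simulated data. Everything below is synthetic: cut counts are drawn from a Poisson distribution with a lower rate over a 3 bp "protected" core at each hypothetical bound site, and the rates, spacing, and number of sites are arbitrary choices made for the sketch.

```python
import numpy as np

def aggregate_profile(cut_counts, sites, flank=5):
    """Average per-base cut counts in windows centered on each site."""
    cut_counts = np.asarray(cut_counts, dtype=float)
    windows = [cut_counts[s - flank:s + flank + 1]
               for s in sites
               if s - flank >= 0 and s + flank < len(cut_counts)]
    return np.mean(windows, axis=0)

# Synthetic track: 500 sites, one every 50 bp, each protecting 3 bases.
rng = np.random.default_rng(0)
spacing, n_sites = 50, 500
sites = np.arange(n_sites) * spacing + spacing // 2
rate = np.full(n_sites * spacing, 2.0)            # background cut rate
for s in sites:
    rate[s - 1:s + 2] = 0.5                       # reduced cutting under the factor
counts = rng.poisson(rate)

profile = aggregate_profile(counts, sites)        # smooth dip at the center
single = counts[sites[0] - 5:sites[0] + 6]        # one noisy individual site
print(np.round(profile, 2))
print(single)
```

A single window of Poisson counts rarely shows a clean footprint, while the average over hundreds of sites does, which is the point the slide makes with the individual and aggregate CTCF panels.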

  24. Motifs can predict TF binding. Binding sites change across time. ~50,000 binding sites for a typical TF vs. ~650,000 TF motifs.

  25. Chromatin accessibility influences transcription factor binding
     • Modeling accessibility profiles yields binding predictions and pioneer factor discovery
     • Asymmetric accessibility is induced by directional pioneers
     • The binding of settler factors can be enabled by proximal pioneer factor binding
     Sherwood, RI, et al. “Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape” Nat. Biotech 2014.

  26. Deep Learning for Regulatory Genomics
      1. Biological foundations: Building blocks of Gene Regulation
         – Gene regulation: Cell diversity, Epigenomics, Regulators (TFs), Motifs, Disease role
         – Probing gene regulation: TFs/histones: ChIP-seq, Accessibility: DNase/ATAC-seq
      2. Classical methods for Regulatory Genomics and Motif Discovery
         – Enrichment-based motif discovery: Expectation Maximization, Gibbs Sampling
         – Experimental: PBMs, SELEX. Comparative genomics: Evolutionary conservation.
      3. Regulatory Genomics CNNs (Convolutional Neural Networks): Foundations
         – Key idea: pixels ↔ DNA letters. Patches/filters ↔ Motifs. Higher ↔ combinations
         – Learning convolutional filters → Motif discovery. Applying them → Motif matches
      4. Regulatory Genomics CNNs/RNNs in Practice: Diverse Architectures
         – DeepBind: Learn motifs, use in (shallow) fully-connected layer, mutation impact
         – DeepSEA: Train model directly on mutational impact prediction
         – Basset: Multi-task DNase prediction in 164 cell types, reuse/learn motifs
         – ChromPuter: Multi-task prediction of different TFs, reuse partner motifs
         – DeepLIFT: Model interpretation based on neuron activation properties
         – DanQ: Recurrent Neural Network for sequential data analysis

  27. 2. Classical regulatory genomics (before Deep Learning)

  28. Enrichment-based discovery methods. Given a set of co-regulated/functionally related genes, find common motifs in their promoter regions:
     • Align the promoters to each other using local alignment
     • Use expert knowledge for what motifs should look like
     • Find ‘median’ string by enumeration (motif/sample driven)
     • Start with conserved blocks in the upstream regions
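The ‘median’ string approach can be sketched as a brute-force search: enumerate every k-mer and keep the one whose best match in each promoter is closest in Hamming distance, summed over promoters. This is a minimal motif-driven version for illustration; the three promoter sequences are invented and each simply contains the CACGTG example from slide 9.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_match_distance(kmer, seq):
    """Smallest Hamming distance between kmer and any window of seq."""
    k = len(kmer)
    return min(hamming(kmer, seq[i:i + k]) for i in range(len(seq) - k + 1))

def median_string(seqs, k):
    """Enumerate all 4**k k-mers and return the one with the smallest
    summed best-match distance across the sequences, with that distance."""
    best, best_total = None, float("inf")
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        total = sum(best_match_distance(kmer, s) for s in seqs)
        if total < best_total:
            best, best_total = kmer, total
    return best, best_total

# Hypothetical promoters of co-regulated genes.
promoters = ["TTGCACGTGAAT", "ACACGTGTTTCC", "GGTTACACGTGA"]
motif, distance = median_string(promoters, k=6)
print(motif, distance)
```

The search cost grows as 4^k times the total sequence length, which is why enumeration is practical only for short motifs and why sampling-based methods such as EM and Gibbs sampling are listed alongside it in the outline.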
