The Epigenome Tools 2: ChIP-Seq and Data Analysis
Chongzhi Zang
zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017
Outline
– Epigenome: basics review
– ChIP-seq overview
– ChIP-seq data analysis
[Figure: nucleosome and histone. Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI)]
The epigenome is a multitude of chemical compounds that can tell the genome what to do. The epigenome is made up of chemical compounds and proteins that can attach to DNA and direct such actions as turning genes on or off, controlling the production of proteins in particular cells. -- from genome.gov
Allis C. et al. Epigenetics. 2006
Notation: H3K4me3
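The notation reads as histone, residue, position, modification: H3K4me3 is histone H3, lysine 4, trimethylation. A minimal sketch of that reading follows; the function name `parse_mark` and the regular expression are illustrative assumptions, and histone variants such as H2A.Z are deliberately not covered.

```python
import re

# Assumed pattern: histone name, one-letter residue code, position,
# then "ac" (acetylation) or "me1"/"me2"/"me3" (methylation level).
MARK_RE = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(ac|me[123])$")

RESIDUES = {"K": "lysine", "R": "arginine"}
MODIFICATIONS = {
    "ac": "acetylation",
    "me1": "monomethylation",
    "me2": "dimethylation",
    "me3": "trimethylation",
}


def parse_mark(mark):
    """Split a histone mark name such as 'H3K4me3' into its components."""
    match = MARK_RE.match(mark)
    if match is None:
        raise ValueError(f"not a recognized histone modification: {mark!r}")
    histone, residue, position, modification = match.groups()
    return {
        "histone": histone,
        # Unknown residue codes are passed through unchanged.
        "residue": RESIDUES.get(residue, residue),
        "position": int(position),
        "modification": MODIFICATIONS[modification],
    }


print(parse_mark("H3K4me3"))
```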
Wang, Zang et al. Nat Genet 2008
[Figure: differential expression, log2(fold-change)]
[Figure: fractions of enhancers (y-axis, 0.05 to 0.35) by histone mark (x-axis): H2AK5ac, H2AK9ac, H2BK5ac, H2BK5me1, H2BK12ac, H2BK20ac, H2BK120ac, H3K4ac, H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K9me2, H3K9me3, H3K14ac, H3K18ac, H3K23ac, H3K27ac, H3K27me1, H3K27me2, H3K36me1, H3K36me3, H3K79me1, H3K79me2, H3K79me3, H3R2me1, H3R2me2, H4K5ac, H4K8ac, H4K12ac, H4K16ac, H4K91ac, H4K20me1, H4K20me3, H3K27me3, H3K36ac, H3K9ac, H2A.Z]
Table 3. Distinctive Chromatin Features of Genomic Elements
Functional Annotation                 Histone Marks
Promoters                             H3K4me3
Bivalent/Poised Promoter              H3K4me3/H3K27me3
Transcribed Gene Body                 H3K36me3
Enhancer (both active and poised)     H3K4me1
Poised Developmental Enhancer         H3K4me1/H3K27me3
Active Enhancer                       H3K4me1/H3K27ac
Polycomb Repressed Regions            H3K27me3
Heterochromatin                       H3K9me3
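The table can be read as a lookup from combinations of marks to a functional annotation. The sketch below encodes it that way; the function name `annotate` and the rule "prefer the most specific matching combination" are assumptions for illustration. Real annotation works on quantitative signal rather than presence or absence of a mark.

```python
# Rows of Table 3 as (required marks, annotation) pairs.
CHROMATIN_FEATURES = [
    ({"H3K4me3"}, "Promoter"),
    ({"H3K4me3", "H3K27me3"}, "Bivalent/poised promoter"),
    ({"H3K36me3"}, "Transcribed gene body"),
    ({"H3K4me1"}, "Enhancer (both active and poised)"),
    ({"H3K4me1", "H3K27me3"}, "Poised developmental enhancer"),
    ({"H3K4me1", "H3K27ac"}, "Active enhancer"),
    ({"H3K27me3"}, "Polycomb repressed region"),
    ({"H3K9me3"}, "Heterochromatin"),
]


def annotate(marks):
    """Return annotations whose required marks are all observed.

    Assumption: a row is dropped when another matching row requires a
    strict superset of its marks, so H3K4me1 + H3K27ac reports only
    'Active enhancer' and not the generic enhancer row as well.
    """
    observed = set(marks)
    matches = [(req, name) for req, name in CHROMATIN_FEATURES if req <= observed]
    return [
        name
        for req, name in matches
        if not any(req < other for other, _ in matches)
    ]


print(annotate(["H3K4me1", "H3K27ac"]))
```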
Rivera & Ren Cell 2013
[Figure: H3K4me3 and H3K27me3; labels: Repressed, Remained, Induced, Poised]
From: https://pubs.niaaa.nih.gov/publications/arcr351/77-85.htm
[Figure: nucleosome, histone, ATAC-seq. Original figure from ENCODE, Darryl Leja (NHGRI), Ian Dunham (EBI)]
Mei et al. Nucleic Acids Research 2016
[Figure: genome browser view, hg19, chr19:15,308,000-15,309,200 (scale bar: 500 bases), user supplied track]
@ILLUMINA-8879DC:231:KK:3:1:1070:945 1:Y:0:
NNNAATACAGTCAGAAACATATCATATTGGAGAATA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1153:945 1:Y:0:
NNNAAGCACACAGAAGATAACTAAACAATCAAGTAG
####################################
@ILLUMINA-8879DC:231:KK:3:1:1222:945 1:Y:0:
NNNAAGGGTCTTGAGAAGAAATCATTCTGGATGGCA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1304:939 1:Y:0:
NNNCCAGGCTCCCGCGATTCTCCTGCCTCAGCTTCT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1354:945 1:Y:0:
NNNCTCTTCCTTAGCTAAACTTTCAACTAAGCCAAA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1411:932 1:Y:0:
NNNGTAGGACCATTGGCGTTGCGACACAAAAAATTT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1496:937 1:Y:0:
NNNTTCATCGGGTTGAGAGTCCCCTTGTTGCATGCA
####################################
@ILLUMINA-8879DC:231:KK:3:1:1533:939 1:Y:0:
NNNATTTTCCCGTTCCAGGTCGCAATTTCCGCCGTT
####################################
@ILLUMINA-8879DC:231:KK:3:1:1573:940 1:Y:0:
NNNGGGGTGCGCCTTTAGTCCCAGCTACTCAGGAAC
####################################
A read:
– cannot map to the reference genome
– can map to multiple loci in the genome
– can map to a unique location in the genome
✗ ✗ ✔ ✔
Langmead et al. 2009, Zang et al. 2009
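The three mapping outcomes can be expressed as a filter on the number of alignments per read. This is a sketch under stated assumptions: the input structure (read ID mapped to a list of candidate loci) and the function names are hypothetical, and it assumes the common practice of keeping only uniquely mapped reads for downstream analysis.

```python
def classify_read(hits):
    """Classify a read by how many genomic loci it aligns to."""
    if len(hits) == 0:
        return "unmapped"   # cannot map to the reference genome
    if len(hits) == 1:
        return "unique"     # maps to a unique location
    return "multi"          # maps to multiple loci


def unique_reads(alignments):
    """Keep only reads with exactly one alignment.

    `alignments` is a hypothetical mapping: read ID -> list of
    (chrom, start, strand) tuples reported by an aligner.
    """
    return {
        read_id: hits[0]
        for read_id, hits in alignments.items()
        if classify_read(hits) == "unique"
    }


example = {
    "read1": [],
    "read2": [("chr1", 100, "+"), ("chr7", 5000, "-")],
    "read3": [("chr19", 15308000, "+")],
}
print(unique_reads(example))
```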
[Figure: percentage of forward tags and reverse tags versus distance to the middle (−600 to 600); labels d and s]
peak model cross-correlation
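Forward-strand and reverse-strand tags pile up on opposite sides of a binding site, so the offset that best aligns the two strands estimates the fragment size. The sketch below is a simplified, unnormalized cross-correlation over tag 5' positions; the function name, the `max_shift` default, and the input lists are assumptions, and real tools work per chromosome on much larger data.

```python
from collections import Counter


def estimate_fragment_size(forward_starts, reverse_starts, max_shift=400):
    """Return the shift (in bp) that maximizes strand cross-correlation.

    For each candidate shift, count how many forward-strand tag positions
    moved downstream by that shift coincide with reverse-strand tag
    positions (weighted by tag multiplicity). Ties keep the smallest shift.
    """
    forward_counts = Counter(forward_starts)
    reverse_counts = Counter(reverse_starts)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        score = sum(
            n * reverse_counts.get(pos + shift, 0)
            for pos, n in forward_counts.items()
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: reverse tags sit 150 bp downstream of forward tags.
forward = [100, 200, 300]
reverse = [250, 350, 450]
print(estimate_fragment_size(forward, reverse))
```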
Transcription factor binding, DNase, ATAC-seq:
MACS (Zhang, 2008): dynamic background Poisson model
Diffuse signal (histone modifications, “super-enhancers”):
SICER (Zang, 2009): spatial clustering of localized weak signal and integrative Poisson model
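The spatial clustering idea can be sketched as merging eligible windows into islands when the gaps between them are small. This is a simplified illustration, not the SICER implementation: it takes a fixed count threshold as given and omits island scoring and significance testing; the function and parameter names are assumptions.

```python
def find_islands(window_counts, min_count, max_gap_windows):
    """Cluster eligible windows into islands.

    A window is eligible when its tag count is at least `min_count`.
    Consecutive eligible windows are joined into one island as long as
    no more than `max_gap_windows` ineligible windows separate them.
    Returns (first_window_index, last_window_index) pairs.
    """
    islands = []
    start = None
    last_eligible = None
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue
        if start is None:
            start = i
        elif i - last_eligible - 1 > max_gap_windows:
            # Gap too wide: close the current island and open a new one.
            islands.append((start, last_eligible))
            start = i
        last_eligible = i
    if start is not None:
        islands.append((start, last_eligible))
    return islands


counts = [0, 5, 6, 0, 7, 0, 0, 0, 8, 0]
print(find_islands(counts, min_count=5, max_gap_windows=1))
```

With a larger allowed gap, weak but spatially clustered signal is merged into broader islands, which is the behavior wanted for diffuse histone marks.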
Wang, Zang et al. 2014
[Figure: NOTCH1, H3K27ac]
– λBG = total tag / genome size
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further
Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k)
[Figure: ChIP and control tracks with 300bp, 1kb, 5kb, and 10kb windows]
http://liulab.dfci.harvard.edu/MACS/ Zhang et al, Genome Bio, 2008
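The dynamic background can be sketched as taking the largest of several rate estimates and scoring the ChIP tag count against it with a Poisson tail probability. This is an illustration under assumptions, not the MACS code: the function names are hypothetical, the per-bp background rate is multiplied by the window width to get an expected count, control counts from wider windows are rescaled to the test window, and ChIP/control library-size scaling is omitted.

```python
import math


def lambda_background(total_tags, genome_size, window_bp):
    """Expected tags per window under a uniform genome-wide background."""
    return total_tags * window_bp / genome_size


def lambda_local(lambda_bg, control_counts, window_bp):
    """Dynamic local rate: maximum over background and control estimates.

    `control_counts` maps a control window size in bp (e.g. 1000, 5000,
    10000) to the control tag count in that window around the candidate.
    """
    rates = [lambda_bg]
    for size_bp, count in control_counts.items():
        rates.append(count * window_bp / size_bp)
    return max(rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation of the pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


bg = lambda_background(total_tags=1_000_000, genome_size=1_000_000_000, window_bp=200)
lam = lambda_local(bg, {1000: 5, 5000: 10, 10000: 50}, window_bp=200)
print(lam, poisson_sf(8, lam))
```

Taking the maximum makes the test conservative in regions where the control shows locally elevated signal.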
Zang et al. Bioinformatics 2009
Parameter                              Remarks
Genome                                 Species and reference genome version, e.g. hg38, hg18, mm10, mm9
Effective genome rate                  Fraction of the genome that is mappable; varies with species, read length, etc.
DNA fragment size                      Estimated by default; can be specified
Window size                            Data resolution, usually the nucleosome periodicity length, i.e. 200bp
Gap size (for SICER only)              Allowable gaps between eligible windows, usually 2 or 3 windows
P-value cut-off                        Threshold for peak calling, from the model
False discovery rate (FDR) cut-off     Threshold for peak calling, BH correction from p-value
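The FDR cut-off relies on the Benjamini-Hochberg (BH) correction of the per-peak p-values. A minimal sketch of the standard BH adjustment follows; the function name and the example p-values are illustrative and not taken from the slides.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input.

    Each p-value is multiplied by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvals)
# Keep peaks whose adjusted p-value passes a chosen FDR cut-off.
kept = [i for i, q in enumerate(fdr) if q <= 0.03]
print(fdr, kept)
```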
chr11 10344210 10344260 255
76649430 76649480 255 +
chr3 77858754 77858804 255 +
chr16 62688333 62688383 255 +
chr22 33031123 33031173 255
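Aligned reads in this interval format can be turned into a profile by counting tags per fixed-size window. The sketch below assumes whitespace-separated fields with chromosome, start, and end first and the strand last, 0-based half-open coordinates as in BED, and a 200 bp window; the function names are hypothetical. Records with a missing strand or inconsistent coordinates are skipped rather than guessed.

```python
from collections import Counter


def five_prime_position(start, end, strand):
    """5' end of a read: start on the + strand, last base on the - strand."""
    return start if strand == "+" else end - 1


def window_counts(lines, window_bp=200):
    """Count read 5' ends per (chrom, window_start) bin."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[-1] not in ("+", "-"):
            continue  # incomplete record: no strand field
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # coordinates are not integers
        if start >= end:
            continue  # fields are shifted or the interval is empty
        pos = five_prime_position(start, end, fields[-1])
        counts[(fields[0], pos // window_bp * window_bp)] += 1
    return counts


reads = [
    "chr3 77858754 77858804 255 +",
    "chr16 62688333 62688383 255 +",
    "chr1 10 60 255 +",
    "chr1 100 150 255 -",
]
print(window_counts(reads))
```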
visualization
Raw sequence reads → [Bowtie/BWA + reference genome] → Aligned reads → [MACS/SICER] → Profile; Peaks
– UCSC genome browser: http://genome.ucsc.edu/
– WashU epigenome browser: http://epigenomegateway.wustl.edu/
– IGV: http://software.broadinstitute.org/software/igv/
– CEAS: http://liulab.dfci.harvard.edu/CEAS/
– BETA: http://cistrome.org/BETA/
– MARGE: http://cistrome.org/MARGE/
– GREAT: http://great.stanford.edu
– ENCODE SCREEN: http://screen.umassmed.edu/
– MANCIE: https://cran.r-project.org/package=MANCIE
– Cistrome DB: http://cistrome.org/db/
P(g_i) = exp(…) [equation truncated; surviving symbols: λ, i]
Wang, Zang et al. Genome Res 2016
[Figure: samples; sample selection; enhancer prediction]
https://www.encodeproject.org/
The cancer epigenome: Concepts, challenges, and therapeutic opportunities Science 17 Mar 2017: Vol. 355, Issue 6330, pp.1147-1152 http://science.sciencemag.org/content/355/6330/1147
[Figure: nucleus with heterochromatin and euchromatin; nucleosome, DNA, histones; reader, eraser, and writer proteins]
Therapies targeting: DNA modifications; DNA and histone modifications; histone modifications
Proteins labeled: BRD2/3/4, SMARCA2, SMARCA4, BAZ2A, PB1, BRPF1, ATAD2, L3MBTL3, WDR5, BRD9, BRD7, CBX7, EZH2, MMSET, DOT1L, SETD7, MLL1, PRMT1, PRMT3, PRMT5, SMYD2, G9a/GLP, DNMT, HDAC, JMJD3/UTX, LSD1, CREBBP, EP300, MOZ, IDH1*, IDH2*