Computational Systems Biology Deep Learning in the Life Sciences
6.802 6.874 20.390 20.490 HST.506
David Gifford Lecture 10 March 12, 2019
Histone Marks Chromatin 3D Structure
http://mit6874.github.io
Nucleosome
DNA: 146 base pairs, wrapped 1.7 times in a left-handed superhelix
Proteins: two copies each of histones H2A, H2B, H3 and H4; higher organisms also have the linker histone H1
Green: H3, yellow: H4, red: H2A, pink: H2B; dark and light blue: DNA
Histone variants
H3 variants: H3.3 (transcribed regions), CENP-A (centromeres)
H2A variants: H2A.X (DNA damage), macroH2A (X chromosome), H2A.Z (transcribed regions)
Khorasanizadeh (2004)
Chromatin
multiple structural layers and organizes chromatin into “domains”. Both DNA methylation and chromatin marks contain important functional information.
Histone Tail Modifications
Sims III et al., 2003
H3K4me3, RNA Pol II
We can observe chromatin marks and other genome-associated proteins
Detection of Class I (active) and Class II (poised) enhancers
hESC ChIP-seq read density profiles were generated for the indicated histone modifications, centered on p300-bound regions in the top 1000 Class I and Class II enhancers, respectively. (c) hESC Nanog ChIP-seq shows that Nanog binds at the three predicted Class II enhancer positions near the CDX2 gene.
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
Can we find latent states to explain observed marks?
Hidden Markov Models
Hidden state $x_t \in \{1, \dots, m\}$
Emitted symbol $y_t$ can be multidimensional
Locus $t$: one node every 200 bp down the genome
Parameters are $P(x_{t+1} \mid x_t)$ and $P(y_t \mid x_t)$
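These two parameter sets fully define the probability of an observed sequence of marks along the genome. A minimal sketch of the forward algorithm that computes it, assuming (for illustration only) that the multidimensional emission $y_t$ is a vector of 0/1 mark calls that are independent given the hidden state; the function name `log_likelihood` and its argument layout are hypothetical:

```python
import numpy as np

def log_likelihood(Y, pi, A, E):
    """Forward algorithm: log P(y_1, ..., y_T) for a hidden Markov model.

    Y  : (T, M) 0/1 matrix; row t holds the mark calls at locus t
    pi : (K,) initial state probabilities P(x_1)
    A  : (K, K) transitions, A[i, j] = P(x_{t+1} = j | x_t = i)
    E  : (K, M) emissions, E[k, m] = P(mark m present | state k)
    """
    Y = np.asarray(Y, dtype=float)
    # log P(y_t | x_t = k), assuming marks are independent Bernoullis given the state
    log_emit = Y @ np.log(E).T + (1.0 - Y) @ np.log(1.0 - E).T   # (T, K)
    log_alpha = np.log(pi) + log_emit[0]
    for t in range(1, len(Y)):
        # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] P(y_t | j), computed in log space
        shift = log_alpha.max()
        log_alpha = shift + np.log(np.exp(log_alpha - shift) @ A) + log_emit[t]
    shift = log_alpha.max()
    return shift + np.log(np.exp(log_alpha - shift).sum())
```

Replacing the sum over previous states with a max (and keeping back-pointers) turns the same recursion into Viterbi decoding, which assigns one most likely state to every 200 bp node.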
Hidden Markov Models can be used to create latent states that generate chromatin marks
Hidden Markov Model (ChromHMM)
Divide the genome into 200 bp windows
The hidden state for a 200 bp window models which histone marks are present in the window
Unsupervised: the resulting states must be interpreted with independent data
The number of states is fixed and is a modeling decision
Hoffman M M et al. Nucl. Acids Res. 2013;41:827-841
$P(x_{t+1} \mid x_t)$, $P(y_t \mid x_t)$
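Before such a model sees the data, each 200 bp window needs a present/absent call for every mark. A minimal sketch of that binarization, assuming a Poisson background whose mean is the average count per window and a p-value cutoff of 1e-4 (both chosen for illustration; the actual ChromHMM tool offers more options). `bin_counts`, `poisson_threshold` and `binarize` are hypothetical helpers:

```python
import math
import numpy as np

def bin_counts(read_starts, genome_length, bin_size=200):
    """Number of reads starting in each consecutive bin_size-bp window."""
    n_bins = math.ceil(genome_length / bin_size)
    return np.bincount(np.asarray(read_starts) // bin_size, minlength=n_bins)

def poisson_threshold(mean, p_value=1e-4):
    """Smallest count c with P(X >= c) < p_value for X ~ Poisson(mean)."""
    c, cdf, pmf = 0, 0.0, math.exp(-mean)
    while 1.0 - cdf >= p_value:      # 1 - cdf equals P(X >= c)
        cdf += pmf                   # cdf becomes P(X <= c)
        c += 1
        pmf *= mean / c              # Poisson pmf at the new c
    return c

def binarize(counts, p_value=1e-4):
    """1 where a window's count reaches the background threshold, else 0."""
    threshold = poisson_threshold(float(np.mean(counts)), p_value)
    return (np.asarray(counts) >= threshold).astype(np.int8)
```

Running `binarize` once per mark yields the 0/1 vectors that play the role of $y_t$ for each window.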
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
Tissues and cell types profiled in the Roadmap Epigenomics Consortium.
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
919 chromatin features: 690 TF binding profiles for 160 different TFs, 125 DNase (DHS) profiles and 104 histone-mark profiles
1000 bp sequence window
Three convolution layers with 320, 480 and 960 kernels
17% of the genome
Chr 8 and 9 excluded from training
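These numbers describe a multi-task classifier in the style of DeepSEA: one 1000 bp sequence in, 125 + 690 + 104 = 919 sigmoid outputs. A minimal NumPy sketch of the forward pass with random, untrained weights; the kernel width of 8, pooling size of 4 and the global max pooling before the output layer are assumptions for illustration rather than the published architecture, and all function names are hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def one_hot(seq):
    """Encode DNA as a (length, 4) A/C/G/T matrix; other letters (e.g. N) stay all-zero."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = "ACGT".find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def conv_relu_pool(x, w, pool):
    """Valid 1-D convolution + ReLU + non-overlapping max pooling.
    x: (length, c_in); w: (c_in, width, c_out)."""
    win = sliding_window_view(x, w.shape[1], axis=0)   # (length - width + 1, c_in, width)
    h = np.maximum(np.tensordot(win, w, axes=([1, 2], [0, 1])), 0.0)
    n = h.shape[0] // pool
    return h[: n * pool].reshape(n, pool, -1).max(axis=1)

def init_params(rng, width=8, channels=(4, 320, 480, 960), n_out=919):
    """Random (untrained) weights: three conv layers plus a linear output head."""
    convs = [rng.normal(0.0, 0.05, size=(c_in, width, c_out)).astype(np.float32)
             for c_in, c_out in zip(channels[:-1], channels[1:])]
    head = rng.normal(0.0, 0.05, size=(channels[-1], n_out)).astype(np.float32)
    return convs, head

def predict(seq, convs, head, pool=4):
    h = one_hot(seq)                                  # (1000, 4)
    for k, w in enumerate(convs):
        # pool after the first two layers; pool=1 leaves the last layer unpooled
        h = conv_relu_pool(h, w, pool if k < len(convs) - 1 else 1)
    h = h.max(axis=0)                                 # global max pool -> (960,)
    return 1.0 / (1.0 + np.exp(-(h @ head)))          # one probability per feature
```

Each output is an independent probability, so one window can be predicted to carry several marks and TF binding events at once; training would minimize a per-feature binary cross-entropy, with the held-out chromosomes used only for evaluation.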
Gene, Pol II, Master Regulators, Mediator, Enhancer, Cohesin
Cell. 2014 Dec 18; 159(7): 1665–1680.
a,b interactions observed
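Observed a,b interactions are commonly summarized by binning both ends of every read pair and counting pairs per bin pair. A minimal sketch of a raw, unnormalized intra-chromosomal contact matrix; the function name and the `resolution` parameter are illustrative, and real Hi-C pipelines also filter and normalize these counts:

```python
import numpy as np

def contact_matrix(pos_a, pos_b, chrom_length, resolution):
    """Raw contact counts between genomic bins for read pairs on one chromosome.

    pos_a, pos_b : positions of the two ends of each read pair
    resolution   : bin size in bp
    """
    n = -(-chrom_length // resolution)                  # number of bins (ceiling division)
    i = np.asarray(pos_a) // resolution
    j = np.asarray(pos_b) // resolution
    m = np.zeros((n, n), dtype=np.int64)
    # count each pair once in the upper triangle, regardless of read order
    np.add.at(m, (np.minimum(i, j), np.maximum(i, j)), 1)
    return m + np.triu(m, 1).T                          # mirror to make it symmetric
```

A loop between an enhancer in bin a and a gene in bin b then shows up as an elevated entry at (a, b) relative to other pairs at a similar distance.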
Nucleic Acids Research, 14 February 2019, gkz051, https://doi.org/10.1093/nar/gkz051
connecting two genomic regions. Each arc is a PET. (B) The PETs plotted on a two-dimensional map using the genomic coordinates of the two reads. Each point is a PET. The colors represent the density values, defined as the number of PETs in the neighborhood. The red dashed square represents the size of the neighborhood. (C) The clustering decision graph. Each point is a PET. The points with high density and high delta values are selected as cluster
(E) The clusters are visualized as arcs. The clusters are labeled as in (C) and (D).
Method 2: CID uses density-based clustering to discover chromatin interactions
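The decision-graph procedure in the caption can be sketched directly: density is the number of PETs inside a square neighborhood, delta is the distance to the nearest PET of higher density, and PETs with both high density and high delta become cluster centers. This is a generic density-peak sketch, not CID's implementation; `min_density` and `min_delta` are hypothetical thresholds, and the all-pairs distance matrix limits it to small inputs:

```python
import numpy as np

def density_peaks(points, radius, min_density, min_delta):
    """Density-peak clustering of PETs in 2-D (left read, right read) space."""
    p = np.asarray(points, dtype=float)
    # Chebyshev distance, so the neighborhood is a square of half-width `radius`
    d = np.abs(p[:, None, :] - p[None, :, :]).max(axis=2)
    rho = (d <= radius).sum(axis=1) - 1               # neighbors, excluding the PET itself
    order = np.argsort(-rho, kind="stable")           # densest first
    delta = np.full(len(p), np.inf)
    nearest_higher = np.full(len(p), -1)
    for rank, idx in enumerate(order[1:], start=1):
        denser = order[:rank]                         # PETs ranked above idx
        k = denser[np.argmin(d[idx, denser])]
        delta[idx], nearest_higher[idx] = d[idx, k], k
    delta[order[0]] = d[order[0]].max()               # convention for the global peak
    centers = np.where((rho >= min_density) & (delta >= min_delta))[0]
    labels = np.full(len(p), -1)
    labels[centers] = np.arange(len(centers))
    for idx in order:                                 # assign in decreasing density
        if labels[idx] == -1 and nearest_higher[idx] >= 0:
            labels[idx] = labels[nearest_higher[idx]]
    return rho, delta, labels
```

Non-center PETs inherit the label of their nearest denser neighbor, so an isolated PET can still be absorbed into a cluster; a practical pipeline would filter such singletons.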
We use a three-component mixture model to describe the conditional distribution of PET clusters. One component represents true interaction PET clusters (TiPC); the other two represent random collision PET clusters (RcPC) and random ligation PET clusters (RlPC), respectively. The TiPC and RcPC models include the a,b distance between clusters.
https://academic.oup.com/bioinformatics/article/31/23/3
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/g
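To show how a three-component mixture splits PET clusters into groups, here is a minimal EM sketch over the number of PETs per cluster. The Poisson form of each component is an assumption for illustration; it ignores the a,b distance term described above, so the fitted components are unlabeled and are not the TiPC, RcPC and RlPC models of the cited work. `fit_poisson_mixture` is a hypothetical name:

```python
import numpy as np

def fit_poisson_mixture(counts, n_iter=200):
    """EM for a 3-component Poisson mixture over PET counts per cluster."""
    x = np.asarray(counts, dtype=float)
    lam = np.maximum(np.quantile(x, [0.1, 0.5, 0.9]), 0.5)   # initial rates
    w = np.full(3, 1.0 / 3.0)                               # mixing weights
    for _ in range(n_iter):
        # E-step: log w_k + log Pois(x | lam_k); log(x!) is dropped (same for every k)
        log_r = np.log(w) + x[:, None] * np.log(lam) - lam
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)                   # responsibilities
        # M-step: responsibility-weighted rates and component weights
        nk = r.sum(axis=0)
        lam = (r * x[:, None]).sum(axis=0) / nk
        w = nk / len(x)
    return w, lam, r
```

Each row of `r` gives the posterior probability that a cluster belongs to each component, which is the quantity one would threshold to call true interactions once the components have been given a meaning.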
https://www.nature.com/articles/ng.3539
Dark: interacting; light: non-interacting
Features for enhancers and promoters only (E/P), extended enhancers and promoters (EE/P), and enhancers and promoters plus the windows between them
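One way to read the three feature sets is as different regions over which each chromatin signal is averaged. A minimal sketch, assuming per-base signal tracks stored in a NumPy array, that EE/P extends only the enhancer by a hypothetical `flank`, and that the window is the gap strictly between the two elements; the `"E/P/W"` label for the third set is likewise a hypothetical name:

```python
import numpy as np

def region_mean(signal, start, end):
    """Mean of each track over [start, end); signal has shape (n_tracks, length)."""
    return signal[:, start:end].mean(axis=1)

def pair_features(signal, enh, prom, feature_set="E/P", flank=3000):
    """Feature vector for one enhancer-promoter pair; enh and prom are (start, end)."""
    length = signal.shape[1]
    if feature_set == "EE/P":
        # extended enhancer: hypothetical flank added on both sides
        enh = (max(0, enh[0] - flank), min(length, enh[1] + flank))
    feats = [region_mean(signal, *enh), region_mean(signal, *prom)]
    if feature_set == "E/P/W":
        first, second = sorted([enh, prom])
        if first[1] < second[0]:                 # gap strictly between the elements
            feats.append(region_mean(signal, first[1], second[0]))
        else:                                    # adjacent or overlapping: empty window
            feats.append(np.zeros(signal.shape[0]))
    return np.concatenate(feats)
```

Stacking these vectors over labeled pairs gives a table that a standard classifier can use to separate interacting (dark) from non-interacting (light) pairs.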
Sequence
Sequence windows
Chromatin: 10 kb / 200 bp bins of DNase-seq, H3K4me1, H3K4me2, H3K27ac, H3K27me3, H3K36me3 and H3K9me3
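With 200 bp bins, a 10 kb window gives 10,000 / 200 = 50 bins per track, so the seven chromatin tracks listed above form a 7 × 50 input matrix. A minimal sketch of that binning, assuming the window is centered on the element of interest, each bin holds the mean signal, and positions off the chromosome are zero-padded (all assumptions for illustration); `binned_window` is a hypothetical name:

```python
import numpy as np

def binned_window(signal, center, window=10_000, bin_size=200):
    """Summarize a (n_tracks, length) signal into bins around `center`.

    Returns an (n_tracks, window // bin_size) matrix of mean signal per bin.
    """
    n_tracks, length = signal.shape
    start = center - window // 2
    out = np.zeros((n_tracks, window), dtype=float)      # zero padding off the ends
    lo, hi = max(start, 0), min(start + window, length)
    if lo < hi:
        out[:, lo - start: hi - start] = signal[:, lo:hi]
    return out.reshape(n_tracks, window // bin_size, bin_size).mean(axis=2)
```

The resulting matrix can sit alongside a one-hot sequence window as a second input to the same model.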