Reverse engineering mammalian transcriptional regulatory circuits
Andrew D Smith Pavel Sumazin
Zhang Lab, CSHL & Califano Lab, Columbia
ISMB 2007
Smith & Sumazin (CSHL & Columbia) Transcriptional regulatory circuits ISMB’07 1 / 109
Outlines Part I: Lecture Format
Outlines Part II: Worked Examples
Overview
Introduction
Introduction Background on regulatory networks
(Levine & Tjian, 2003)
Figure: a transcription factor (TF) connected by an edge to its target gene; the edge indicates physical interaction. In the network view, several TFs are each connected to many target genes.
Introduction Data available for analysis
Wang, Snyder & Gerstein (2007)
Analysis methods Identifying gene modules
Figure: Condition 1 vs. Condition 2 for the genes pm1a, Six6os1, Six6, Six1, Six4, Mnat1, Trmt5, Tmem30b, Prkch, Hif1a, Snapc1, Syt16, Dbpht2, Kcnh5, Rhoj, Gphb5, Ppp2r5e, Sgpp1, Esr2, Ttc9, Tex21, Mthfd1, Zbtb25, Zbtb1.
Analysis methods Modeling regulatory elements
Figure: a transcription factor bound to its binding site within the sequence ACGTGACACAATTGGCATACGATCTACGTACAA.
Figure: example sequences
ACAACGTACATGATGTGCCCAGTC
CACGTTTTTTAACACCGTGCCAAT
CCACGTGACGTAACCTGCATCACA
ACACGTGACCCAATATATGGACTT
AGTCTCGACAGCCTTCCCTTCGCG
CAACCATGCACGAATTGAATTAAT
GATCATCATCATTGTGCAGCAGTC
TGAAGAGAGAGAACATGACAACGA
TGCGTATAACCCCATGATGCCCGA
GATGACCAACACACACCACACCAG
Highlighted 9-mers on the slide: TTACGGTCC, AGTTCCCAT, TTTCCTGGA, CGCCGCTCG, ACGCTTGCA
Alignment of binding sites:
GCCATCTGT
GCCATCCGC
GCCATCTTG
GCCATGTAC
GCCATATTT
GCCATCTTT
GACATTTTG
TCCATTTTG
TCTAGGTTT
GCTCCATTT
TCCATGGTT
GCCATCTTG
GCCATTTTG
GCCATCTTG
ACCATGTCA
TCCATGTGT
GCCATCACA
Consensus sequence: GCCATCTTG
Degenerate consensus (for the same alignment of binding sites): DMYMBNNNN
Degenerate nucleotides:
M ⇒ A or C
R ⇒ A or G
W ⇒ A or T
S ⇒ C or G
Y ⇒ C or T
K ⇒ G or T
V ⇒ A, C or G
H ⇒ A, C or T
D ⇒ A, G or T
B ⇒ C, G or T
N ⇒ A, C, G or T
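The degenerate consensus can be derived mechanically from the aligned sites: each column is mapped to the degenerate nucleotide that covers exactly the bases seen in that column. The following is a minimal sketch under that assumption; the names `IUPAC`, `SITES` and `degenerate_consensus` are illustrative and do not come from the slides.

```python
# Degenerate nucleotide codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}

# The 17 aligned binding sites from the alignment slide.
SITES = [
    "GCCATCTGT", "GCCATCCGC", "GCCATCTTG", "GCCATGTAC", "GCCATATTT",
    "GCCATCTTT", "GACATTTTG", "TCCATTTTG", "TCTAGGTTT", "GCTCCATTT",
    "TCCATGGTT", "GCCATCTTG", "GCCATTTTG", "GCCATCTTG", "ACCATGTCA",
    "TCCATGTGT", "GCCATCACA",
]


def degenerate_consensus(sites):
    """Return one degenerate symbol per column, covering every base observed there."""
    # zip(*sites) walks the alignment column by column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


print(degenerate_consensus(SITES))
```

Because a single rare base widens the symbol for a column, this representation loses the count information that the matrix representation keeps.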
Count matrix for the alignment of binding sites:
     1   2   3   4   5   6   7   8   9
A    1   1   0  16   0   2   1   1   2
C    0  16  15   1   1   7   1   2   2
G   12   0   0   0   1   5   1   3   6
T    4   0   2   0  15   3  14  11   7
Frequency matrix (normalized counts):
      1     2     3     4     5     6     7     8     9
A  0.06  0.06  0.00  0.94  0.00  0.12  0.06  0.06  0.12
C  0.00  0.94  0.88  0.06  0.06  0.41  0.06  0.12  0.12
G  0.71  0.00  0.00  0.00  0.06  0.29  0.06  0.18  0.35
T  0.24  0.00  0.12  0.00  0.88  0.18  0.82  0.65  0.41
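As a sketch of how the counts and the normalized counts relate, the code below tallies each base per column of the 17 aligned sites and divides by the number of sites. The function names are illustrative only, and no pseudocounts are added, which matches the 0.00 entries on the slide.

```python
BASES = "ACGT"

# The 17 aligned binding sites from the alignment slide.
SITES = [
    "GCCATCTGT", "GCCATCCGC", "GCCATCTTG", "GCCATGTAC", "GCCATATTT",
    "GCCATCTTT", "GACATTTTG", "TCCATTTTG", "TCTAGGTTT", "GCTCCATTT",
    "TCCATGGTT", "GCCATCTTG", "GCCATTTTG", "GCCATCTTG", "ACCATGTCA",
    "TCCATGTGT", "GCCATCACA",
]


def count_matrix(sites):
    """Count how often each base occurs at each position of the alignment."""
    width = len(sites[0])
    counts = {base: [0] * width for base in BASES}
    for site in sites:
        for position, base in enumerate(site):
            counts[base][position] += 1
    return counts


def frequency_matrix(counts):
    """Normalize each column so that its four entries sum to 1."""
    n_sites = sum(counts[base][0] for base in BASES)
    return {base: [c / n_sites for c in counts[base]] for base in BASES}


counts = count_matrix(SITES)
freqs = frequency_matrix(counts)
for base in BASES:
    print(base, " ".join(f"{f:.2f}" for f in freqs[base]))
```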
Figure: sequence logo of the motif (vertical axis in bits, positions 1 to 9 from 5′ to 3′); weblogo.berkeley.edu
Analysis methods Predicting binding sites
Probability of the site TCTATGTTT under the motif frequency matrix:
      1     2     3     4     5     6     7     8     9
A  0.06  0.06  0.00  0.94  0.00  0.12  0.06  0.06  0.12
C  0.00  0.94  0.88  0.06  0.06  0.41  0.06  0.12  0.12
G  0.71  0.00  0.00  0.00  0.06  0.29  0.06  0.18  0.35
T  0.24  0.00  0.12  0.00  0.88  0.18  0.82  0.65  0.41
      T     C     T     A     T     G     T     T     T
      ⇓     ⇓     ⇓     ⇓     ⇓     ⇓     ⇓     ⇓     ⇓
0.24 × 0.94 × 0.12 × 0.94 × 0.88 × 0.29 × 0.82 × 0.65 × 0.41 = 0.001419188
Probability of the site TCTATGTTT under the base composition (A 0.2, C 0.3, G 0.3, T 0.2 at every position):
     1    2    3    4    5    6    7    8    9
A  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2
C  0.3  0.3  0.3  0.3  0.3  0.3  0.3  0.3  0.3
G  0.3  0.3  0.3  0.3  0.3  0.3  0.3  0.3  0.3
T  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2  0.2
     T    C    T    A    T    G    T    T    T
     ⇓    ⇓    ⇓    ⇓    ⇓    ⇓    ⇓    ⇓    ⇓
0.2 × 0.3 × 0.2 × 0.2 × 0.2 × 0.3 × 0.2 × 0.2 × 0.2 = 0.000001152
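The two products can be checked with a few lines of code. This is a minimal sketch assuming independent positions, as the slides do; the names `motif_probability` and `background_probability` are illustrative.

```python
import math

# Motif frequency matrix (normalized counts), one row per base.
FREQS = {
    "A": [0.06, 0.06, 0.00, 0.94, 0.00, 0.12, 0.06, 0.06, 0.12],
    "C": [0.00, 0.94, 0.88, 0.06, 0.06, 0.41, 0.06, 0.12, 0.12],
    "G": [0.71, 0.00, 0.00, 0.00, 0.06, 0.29, 0.06, 0.18, 0.35],
    "T": [0.24, 0.00, 0.12, 0.00, 0.88, 0.18, 0.82, 0.65, 0.41],
}

# Base composition used as the background model.
BACKGROUND = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}


def motif_probability(site, freqs):
    """Product of the matrix entries selected by the base at each position."""
    return math.prod(freqs[base][i] for i, base in enumerate(site))


def background_probability(site, background):
    """Product of the background frequencies of the bases in the site."""
    return math.prod(background[base] for base in site)


site = "TCTATGTTT"
p_motif = motif_probability(site, FREQS)
p_background = background_probability(site, BACKGROUND)
print(f"motif: {p_motif:.9f}")
print(f"background: {p_background:.9f}")
print(f"ratio: {p_motif / p_background:.1f}")
```

The ratio of the two probabilities is what the log-odds score on the following slides measures on a logarithmic scale.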
Log-odds score for each base at each position:
$$\log \frac{\text{probability from motif}}{\text{probability from base composition}}$$
Example, C at position 2: $\log \frac{0.94}{0.30}$
Base composition: A 0.20, C 0.30, G 0.30, T 0.20
Scores shown on the slide (the values correspond to base-2 logarithms; "." marks entries not shown):
     1    2    3    4    5    6    7    8    9
A    .    .    .  2.2    .    .    .    .    .
C    .  1.6  1.5    .    .  0.4    .    .    .
G  1.2    .    .    .    .  0.0    .    .  0.2
T  0.2    .    .    .  2.1    .  2.0  1.6  1.0
Score of a site: add the entries selected by the base at each position, 0.2 + 1.6 + …
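A log-odds matrix turns the product of ratios into a sum, which makes it convenient to score every window of a longer sequence. The sketch below assumes base-2 logarithms (consistent with the 2.2 shown for A at position 4, since log2(0.94 / 0.20) ≈ 2.23) and maps zero frequencies to minus infinity instead of adding pseudocounts. The function names are illustrative and not taken from the slides.

```python
import math

FREQS = {
    "A": [0.06, 0.06, 0.00, 0.94, 0.00, 0.12, 0.06, 0.06, 0.12],
    "C": [0.00, 0.94, 0.88, 0.06, 0.06, 0.41, 0.06, 0.12, 0.12],
    "G": [0.71, 0.00, 0.00, 0.00, 0.06, 0.29, 0.06, 0.18, 0.35],
    "T": [0.24, 0.00, 0.12, 0.00, 0.88, 0.18, 0.82, 0.65, 0.41],
}
BACKGROUND = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}


def log_odds_matrix(freqs, background):
    """log2(motif frequency / background frequency) for every base and position."""
    return {
        base: [
            math.log2(f / background[base]) if f > 0 else float("-inf")
            for f in row
        ]
        for base, row in freqs.items()
    }


LOG_ODDS = log_odds_matrix(FREQS, BACKGROUND)


def score_site(site, log_odds=LOG_ODDS):
    """Sum the log-odds entries selected by the base at each position."""
    return sum(log_odds[base][i] for i, base in enumerate(site))


def best_hit(sequence, width=9):
    """Return (start, score) of the highest-scoring window in the sequence."""
    starts = range(len(sequence) - width + 1)
    hits = ((s, score_site(sequence[s:s + width])) for s in starts)
    return max(hits, key=lambda hit: hit[1])


print(round(score_site("TCTATGTTT"), 2))
print(best_hit("AAAGCCATCTTGAAA"))
```

In practice a cutoff on this score decides which windows are reported as candidate sites; how to choose the cutoff is a separate question that this sketch does not address.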
OOPS: One Occurrence Per Sequence
ZOOPS: Zero Or One Occurrence Per Sequence
TCM: Two Component Mixture (any number per sequence)
Figure: foreground sequences and background sequences.
Analysis methods Conservation of regulatory elements
Figure: UCSC Genome Browser view on chr19 (tick marks at 50518000, 50518500 and 50519000) around CKM, with the tracks UCSC Known Genes (based on UniProt, RefSeq, and GenBank mRNA), RefSeq Genes, and Vertebrate Multiz Alignment & Conservation (17 Species). Species rows shown: mouse, rat, rabbit, dog, armadillo, elephant, chicken, x_tropicalis, tetraodon.
Figure: conserved sites and non-conserved sites, shown against conserved regions and against a conservation profile.
Figure: Vertebrate Multiz Alignment & PhastCons Conservation (28 Species) for one site, with rows for Human, Chimp, Rhesus, Bushbaby, TreeShrew, Mouse, Rat, GuineaPig, Rabbit, Shrew, Hedgehog, Dog, Cat, Horse, Cow, Armadillo, Elephant, Tenrec, Opossum and Platypus, plus a Gaps row. The human row reads ACCACGAACATGCCGGTACATGTTTGTTT.
Figure: binding sites in present-day species (Human, Chimp, Mouse, Rat, Dog, Cow, Frog) and in ancestral sequences.
Analysis methods Motif discovery
Table of words and their occurrences (word highlighted on the slide: TTTGT), built by scanning sequences such as:
AAGTCTACATGAAAAGGATGGTTTCTTGGAGCTTCCACAAACTTAAAAATGGATTCAACATCTATTATTGCTACTATTGT TCTCCCGGGCTGGCAGCAGGGCCCCAGCGGCACCATGTCTATGGATTCCGGAGTCACCGTGGCCCTGCTGGTGTGGGCGG TCTCTGGCAGTAGGCACCAGGGCTGGAATGGGATGGATTCCCGGCTCCCCATGGCAGTGGGTGACGCTGCTGCTGGGGCT AGCCGTTCCCAGCTTGACTTTCCCCTTTAGCCTAGTGATTTGGGGGCCCCAAGGTTTATTTTCCTTTCGCGTAGCTTCGC GGGTGGGGTTGGGAGGAAACCCTTATCTGTGGCCGATGGCCCTCCGTTGTGAGTCTATTAAAACTCTGGGAAACTGCTAT AAGACCCTGAGAAGCAAATCTTTAATTTTTTTGTTTTTGTGAGACGGAGCACTCTGTCGCCCAGGCTAGAGTGCAATTAG GGTGCAATCTCGGCTCACTGGAACCTCCGCCTCCTGAGTCCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGTTA AAGTCTACATGGAGTCGATGGTTTCTTGGAGCTTCCACAAACTTAAAACCATGAAACATCTATTATTGCTACTATTGTTA TAAATAAATTCATCTGATCAAAAGAAATTTAAAAACCAACCAACCCTAATGAGCTCTAAAGACAGCAGAGTCACACGCGA AGGAGCGGCGCCTTCACCCTCCGGCCTCAGCCCGCGAGGCTGCAACCCTTTCCGCACCTGGCTCCATCTCCCTGGCCCTC GGAGCGAGAAGGCGGCGGGGGATCTGGCGCCCGGCTTAGGGGCGAGACGGCCGCACCGGGAGCCTAGCGATCAGGGCACC AGAAGGGTGCCCTGTCCTGGGAGTCCCTTTTGCAGCCACTCAGATGTGCTGCTGCGGTGTCCTTTGTGCTGGTGGCAGCC GCCACGCCGCCGTGAGCCCCGCCCAACATAGCCCCAGGAGTCGCTTCGCGTGTAGAAGCGTCCGGGTGGCGGAGGCCGCA TGTGTCCTGGTGTCTTCTCTCCTCAGCCTGTTTCTCATCCTGGAAACATGAGGTGTGCTGGCGCAGGGCGATAGCGCAGTG
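Words in the table: AAAAA, AAAAC, AAAAG, AAAAT, AAACA, AAACC, AAACG, ..., GAGGT, GAGTA, GAGTC, GAGTG, GAGTT, ..., TTTGG, TTTTA, TTTTC, TTTTG, TTTTT
Counts visible on the slide: 521, 534, 366, 501, 718, ???, 521, 622, 847, 243
Current word: GAGTC, highlighted in the sequence: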
AGTAGAGACTGGAGTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCCCAAGTGATCCACCTGCCTCAGCCTCTT
For each word of width k: count the number of occurrences, then apply statistics to the counts.
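The word-counting step can be sketched as follows. The slide does not say which statistic is applied to the counts, so the `enrichment` function below is only one hypothetical choice: a ratio of relative word frequencies in a foreground set versus a background set, with a pseudocount (an assumption) to avoid division by zero.

```python
from collections import Counter


def count_words(sequences, k):
    """Count every overlapping word of width k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def enrichment(fg_counts, bg_counts, pseudo=1.0):
    """Hypothetical statistic: foreground / background relative frequency per word."""
    words = set(fg_counts) | set(bg_counts)
    fg_total = sum(fg_counts.values()) + pseudo * len(words)
    bg_total = sum(bg_counts.values()) + pseudo * len(words)
    return {
        word: ((fg_counts[word] + pseudo) / fg_total)
        / ((bg_counts[word] + pseudo) / bg_total)
        for word in words
    }


# Toy sequences, chosen only to make the counts easy to follow.
foreground = count_words(["GAGTCGAGTC"], 5)
background = count_words(["AAAAAAAAAA"], 5)
ratios = enrichment(foreground, background)
for word, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(word, round(ratio, 2))
```

For width k there are 4^k possible words, so exhaustive tables like this are practical only for short words.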
Start with a given motif and a set of occurrences:
TCCATTTTG
TCTAGGTTT
GCTCCATTT
TCCATGGTT
GCCATCTTG
GCCATTTTG
GCCATCTTG
ACCATGTCA
GCCATGACA
TCCATGTGT
GACATTTTG
GCCATCTTT
Count matrix for these occurrences:
     1   2   3   4   5   6   7   8   9
A    1   1   0  11   0   1   1   0   2
C    0  11  10   1   1   3   0   2   0
G    7   0   0   0   1   5   1   1   5
T    4   0   2   0  10   3  10   9   5
Sequences:
GATCATTCCTGGAAACCGCCTACTCAGGGCAGAGGTACAGAAAGAAAAGATTGCTCTTGAAAGTTGCCTGTCTTTCCTC AAGTCTACATGAAAAGGATGGTTTCTTGGAGCTTCCACAAACTTAAAACCATGAAACATCTATTATTGCTACTATTGT AATGCAGGTGTGGCGGGCCCTGGCCTCTGCACCCTCATAGAGGGGCTCAACAGCATCAACAGAAGGTGGGGGAGCAGAAGGT TCTCCCGGGCTGGCAGCAGGGCCCCAGCGGCACCATGTCTGCCCTCGGAGTCACCGTGGCCCTGCTGGTGTGGGCGG AGTGCACGAAGACGCTGTCGGGAGAGCCCAGGATTCAACACGGGCCTTGAGAAATGTGAGTAAGGGTGATGGGCAACCA TCTCTGGCAGTAGGCACCAGGGCTGGAATGGGGCCGCCCGGCTCCCCATGGCAGTGGGTGACGCTGCTGCTGGGGCT TCCCACATGGGATTCTTATCAAGTAGGATTATGCAGTGCTTTTCTTTCTGTGTCTGATTTATTTCACTTAACATGATGTG TTTAGTAAAACAAAGTTAGCTTAGTTGTGGGAATTATTTAAAAGGAGCTCTTACCAGGTCAGCTTCCTTCGGTGTTGCGG CTAGGCCGCCTGTCTCCTACCCATACTTAGAGGCCCCGCTCAGACGGTCCTTAAAACGTCTGAAAGGCCGTTCCTGCCA GTGCCCTGAGTTCTGAGGCAGAGAGGAGGACAGAAGAAACAAGAGGCTGGAGATTGTCAAATTCAGTATCCCAGTTG ATGGATTCTCTTGTGGTCCTTGTGCTCTGTCTCTCATGTTTGCTTCTCCTTTCACTCTGGAGACAGAGCTCTGGGAG ACATGCTAACCGGAATCCCTAGGCCGCCTGTCTCCTACCCATACTTAGAGGCCCCGCTCAGACGGTCCTTAAAACGTCT
Iterate these steps:
1) Sample a new occurrence from one sequence. The probability of selecting a particular site is related to the strength of its match to the matrix.
2) Update the matrix based on the new occurrence. Usually the changes will move the matrix toward a stronger motif.
Updated counts highlighted on the slide: 11, 1, 11, 12, 4
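The two iterated steps can be sketched as a small sampler. Several details here are assumptions that the slides do not specify: one occurrence per sequence, a uniform background, pseudocounts of 1, and rebuilding the matrix from the other sequences' occurrences before sampling (a common choice for this kind of sampler). All function names are illustrative.

```python
import random

BASES = "ACGT"


def build_matrix(sites, width, pseudo=1.0):
    """Column-wise base frequencies from the current occurrences, with pseudocounts."""
    matrix = []
    for i in range(width):
        column = {base: pseudo for base in BASES}
        for site in sites:
            column[site[i]] += 1
        total = sum(column.values())
        matrix.append({base: count / total for base, count in column.items()})
    return matrix


def match_strength(word, matrix, background):
    """Ratio of motif probability to background probability for one window."""
    strength = 1.0
    for i, base in enumerate(word):
        strength *= matrix[i][base] / background[base]
    return strength


def gibbs_sample(sequences, width, iterations=500, seed=0):
    """Return one sampled start position per sequence."""
    rng = random.Random(seed)
    background = {base: 0.25 for base in BASES}  # assumed uniform background
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        k = rng.randrange(len(sequences))
        # Step 2 (update): rebuild the matrix from the other occurrences.
        others = [
            s[p:p + width]
            for j, (s, p) in enumerate(zip(sequences, positions))
            if j != k
        ]
        matrix = build_matrix(others, width)
        # Step 1 (sample): pick a window with probability proportional
        # to the strength of its match to the matrix.
        seq = sequences[k]
        weights = [
            match_strength(seq[i:i + width], matrix, background)
            for i in range(len(seq) - width + 1)
        ]
        positions[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


toy = ["ACGTGCCATCTTGAAC", "TTGCCATCTTGACGTA", "GGAGCCATCTTGTTCA"]
print(gibbs_sample(toy, width=9, iterations=200, seed=1))
```

Because the procedure is stochastic, different seeds can end at different positions; running it several times and comparing the resulting matrices is a common safeguard.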
Analysis methods Cis-regulatory modules
(Yuh et al., 2001)
Figure: UCSC Genome Browser view on chr2 (tick marks from 236860000 to 236885000) with a PReMod Predicted Regulatory Modules track ("module") above the tracks STS Markers, UCSC Known Genes, RefSeq Genes, Mammalian Gene Collection Full ORF mRNAs, Exoniphy, ExonWalk, Human mRNAs from GenBank, Spliced ESTs, Vertebrate Multiz Alignment & Conservation (mouse, rat, rabbit, dog, armadillo, elephant, chicken, x_tropicalis, tetraodon), SNPs (dbSNP build 125) and RepeatMasker. Genes in view: GBX2, ASB18, AF118452, AK123854.
Annotations on the module:
Occurrences tightly clustered
Far from gene
Strong occurrences
Highly conserved region
PReMod (Blanchette et al, 2006)
Overview
Analyzing sets of co-regulated genes
Analyzing sets of co-regulated genes An example gene module
Analyzing sets of co-regulated genes Identifying enriched known motifs
Figure: foreground sequences and background sequences.
Name             Sn     Sp     Error  p-value
1. NFKB1         0.743  0.603  0.327
2. RELA          0.686  0.655  0.33
3. NF-kappaB     0.4    0.9    0.35   0.002
4. Dorsal 1      0.886  0.413  0.351
5. REL           0.314  0.956  0.365  0.008
6. En1           0.686  0.584  0.365  0.009
7. IRF2          0.371  0.872  0.378  0.015
8. TBP           0.371  0.867  0.381  0.018
9. Dorsal 2      0.429  0.798  0.387  0.032
10. ZNF42 5-13   0.629  0.59   0.391  0.03
Logo: sequence logo image for each motif (not reproduced).
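In every row above, the Error value agrees with 1 − (Sn + Sp) / 2, a balanced error rate. This is an observation from the numbers, not a definition given on the slides. The sketch below also assumes, as a labeled guess, that Sn is the fraction of foreground sequences containing an occurrence and Sp the fraction of background sequences without one.

```python
def sensitivity(fg_with_site, fg_total):
    """Assumed definition: fraction of foreground sequences with an occurrence."""
    return fg_with_site / fg_total


def specificity(bg_with_site, bg_total):
    """Assumed definition: fraction of background sequences without an occurrence."""
    return 1.0 - bg_with_site / bg_total


def balanced_error(sn, sp):
    """Error rate that weights foreground and background equally."""
    return 1.0 - (sn + sp) / 2.0


# (Sn, Sp) pairs taken from the table rows.
rows = {
    "NFKB1": (0.743, 0.603),
    "NF-kappaB": (0.4, 0.9),
    "REL": (0.314, 0.956),
}
for name, (sn, sp) in rows.items():
    print(name, round(balanced_error(sn, sp), 3))
```

A measure of this form does not depend on the relative sizes of the foreground and background sets, which is useful when the background is much larger.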
Analyzing sets of co-regulated genes Predicting functional binding sites
Inputs:
Sequences: where we will search (e.g. promoters)
Genomic regions: where functional sites are not likely (e.g. inside CDS)
Alignments: for conservation in sequences searched
Motif library: known or novel motifs whose sites we want to identify
Steps:
1) Identify candidate sites: scan sequences for sites scoring above the cutoff for each motif.
2) Filter by location: eliminate candidate sites [...] regions.
3) Filter by conservation: eliminate candidate sites without desired conservation properties.
Output:
Predicted sites: final set of predicted sites, to be evaluated experimentally
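The three steps above can be sketched as a small pipeline. Everything named here (`Site`, `scan`, `filter_by_location`, `filter_by_conservation`, the 0.8 conservation threshold, the toy scoring function) is hypothetical. Because the text of step 2 is incomplete in the source, the sketch assumes that it drops candidate sites overlapping the regions where functional sites are not likely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    seq_id: str
    start: int
    end: int  # exclusive
    motif: str
    score: float


def scan(sequences, motifs, cutoffs):
    """Step 1: keep every window scoring at or above its motif's cutoff.

    `motifs` maps a motif name to (width, scoring function).
    """
    sites = []
    for seq_id, seq in sequences.items():
        for name, (width, score) in motifs.items():
            for start in range(len(seq) - width + 1):
                value = score(seq[start:start + width])
                if value >= cutoffs[name]:
                    sites.append(Site(seq_id, start, start + width, name, value))
    return sites


def filter_by_location(sites, excluded):
    """Step 2 (assumed): drop sites overlapping excluded (start, end) regions."""
    def overlaps(site):
        regions = excluded.get(site.seq_id, [])
        return any(site.start < end and start < site.end for start, end in regions)
    return [site for site in sites if not overlaps(site)]


def filter_by_conservation(sites, conservation, min_mean=0.8):
    """Step 3: keep sites whose mean per-base conservation reaches min_mean."""
    kept = []
    for site in sites:
        values = conservation[site.seq_id][site.start:site.end]
        if sum(values) / len(values) >= min_mean:
            kept.append(site)
    return kept


# Toy inputs: the score is the number of positions matching a consensus.
consensus = "GCCATCTTG"
motifs = {"m": (9, lambda w: sum(a == b for a, b in zip(w, consensus)))}
sequences = {"p1": "AAGCCATCTTGAAGCCATCTTGAA"}
candidates = scan(sequences, motifs, cutoffs={"m": 9})
located = filter_by_location(candidates, excluded={"p1": [(12, 24)]})
predicted = filter_by_conservation(located, conservation={"p1": [1.0] * 24})
print([(site.start, site.end) for site in predicted])
```

Keeping each filter as a separate function makes it easy to report how many candidates each stage removes, which helps when tuning cutoffs.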
Analysis of transcription factor localization data
Analysis of transcription factor localization data ChIP-chip data examples
Analysis of transcription factor localization data Identifying enriched known motifs
#  Acc     TF              Err    Sens  Spec  p-val  FD
1  MA0045  HMG-IY          0.348  0.60  0.70  0.000  0.88
   MA0120  ID1             0.416  0.43  0.74  0.000  0.92
2  MA0041  Foxd3           0.355  0.72  0.57  0.000  0.89
   MA0042  FOXI1           0.386  0.76  0.47  0.000  0.86
3  MA0013  Broad-complex4  0.381  0.61  0.63  0.000  0.88
4  MA0010  Broad-complex1  0.383  0.67  0.56  0.000  0.85
5  MA0082  SQUA            0.385  0.61  0.62  0.000  0.88

#  Acc     TF              Err    Sens  Spec  p-val  FD
1  MA0123  ABI4            0.435  0.32  0.81  0.000  0.92
2  MA0003  TFAP2A          0.443  0.50  0.61  0.000  0.95
3  MA0048  NHLH1           0.445  0.62  0.49  0.000  0.84
4  MA0117  MafB            0.448  0.45  0.66  0.000  0.93
5  MA0028  ELK1            0.449  0.27  0.83  0.000  0.91

Logo column: sequence logo images (not reproduced here).
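The slide does not define the Err column. The reported values are numerically consistent with a balanced error rate, 1 - (Sens + Spec)/2, so the sketch below checks that reading against the rows of the table above. This is an assumption inferred from the numbers, not a definition given in the talk, and the names `balanced_error` and `max_discrepancy` are illustrative only.

```python
# Rows copied from the slide's table: (accession, TF, Err, Sens, Spec).
ROWS = [
    ("MA0045", "HMG-IY", 0.348, 0.60, 0.70),
    ("MA0120", "ID1", 0.416, 0.43, 0.74),
    ("MA0041", "Foxd3", 0.355, 0.72, 0.57),
    ("MA0042", "FOXI1", 0.386, 0.76, 0.47),
    ("MA0013", "Broad-complex4", 0.381, 0.61, 0.63),
    ("MA0010", "Broad-complex1", 0.383, 0.67, 0.56),
    ("MA0082", "SQUA", 0.385, 0.61, 0.62),
    ("MA0123", "ABI4", 0.435, 0.32, 0.81),
    ("MA0003", "TFAP2A", 0.443, 0.50, 0.61),
    ("MA0048", "NHLH1", 0.445, 0.62, 0.49),
    ("MA0117", "MafB", 0.448, 0.45, 0.66),
    ("MA0028", "ELK1", 0.449, 0.27, 0.83),
]


def balanced_error(sens, spec):
    """Assumed definition: mean of the false-negative and false-positive rates."""
    return 1.0 - (sens + spec) / 2.0


def max_discrepancy(rows):
    """Largest gap between the reported Err and the assumed balanced error."""
    return max(abs(err - balanced_error(sens, spec))
               for _, _, err, sens, spec in rows)


if __name__ == "__main__":
    for acc, tf, err, sens, spec in ROWS:
        print(f"{acc} {tf:<15} reported={err:.3f} "
              f"balanced={balanced_error(sens, spec):.3f}")
    print(f"max discrepancy: {max_discrepancy(ROWS):.3f}")
```

Because Sens and Spec are printed to two decimals, a gap of up to about 0.005 between the reported and recomputed values would be expected from rounding alone.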
Analysis of transcription factor localization data Identifying enriched known motifs
#  Acc     TF       Err    Sens  Spec  p-val  FD
1  MA0060  NF-Y     0.327  0.64  0.71  0.000  0.86
2  MA0024  E2F1     0.350  0.72  0.57  0.000  0.86
3  MA0080  SPI1     0.359  0.61  0.68  0.000  1.00
   MA0028  ELK1     0.364  0.58  0.69  0.000  0.93
   MA0076  ELK4     0.373  0.62  0.64  0.000  0.85
   MA0026  E74A     0.399  0.42  0.78  0.000  0.98
   MA0062  GABPA    0.412  0.64  0.54  0.000  0.85
4  MA0021  Dof3     0.375  0.45  0.80  0.000  1.00
   MA0020  Dof2     0.449  0.91  0.19  0.006  0.95
   MA0053  MNB1A    0.449  0.91  0.19  0.000  1.00
   MA0064  PBF      0.449  0.91  0.19  0.003  1.00
5  MA0123  ABI4     0.398  0.81  0.40  0.000  0.91
6  MA0018  CREB1    0.406  0.43  0.76  0.000  0.87
   MA0096  bZIP910  0.433  0.45  0.68  0.000  0.86
7  MA0034  GAMYB    0.420  0.59  0.57  0.000  0.90

Logo column: sequence logo images (not reproduced here).
Analysis of transcription factor localization data Identifying enriched known motifs
#  Acc     TF       Err    Sens  Spec  p-val  FD
1  MA0060  NF-Y     0.321  0.64  0.71  0.000  0.86
2  MA0024  E2F1     0.344  0.73  0.58  0.000  0.86
3  MA0080  SPI1     0.347  0.63  0.68  0.000  1.00
   MA0028  ELK1     0.367  0.58  0.69  0.000  0.93
   MA0076  ELK4     0.373  0.54  0.72  0.000  0.86
   MA0062  GABPA    0.406  0.66  0.53  0.000  0.85
4  MA0021  Dof3     0.374  0.46  0.80  0.000  1.00
   MA0053  MNB1A    0.448  0.91  0.19  0.000  1.00
   MA0064  PBF      0.448  0.91  0.19  0.000  1.00
5  MA0123  ABI4     0.386  0.83  0.39  0.000  0.91
6  MA0018  CREB1    0.396  0.45  0.76  0.000  0.88
   MA0096  bZIP910  0.434  0.30  0.83  0.002  0.88
7  MA0034  GAMYB    0.407  0.62  0.57  0.000  0.90
   MA0100  Myb      0.424  0.57  0.58  0.000  0.91

Logo column: sequence logo images (not reproduced here).
Analysis of transcription factor localization data Identifying co-factors
#  Acc     TF              Err    Sens  Spec  p-val  FD
1  MA0060  NF-Y            0.390  0.37  0.85  0.000  0.82
2  MA0037  GATA3           0.417  0.65  0.52  0.000  0.90
   MA0036  GATA2           0.433  0.61  0.53  0.002  0.95
   MA0070  Pbx             0.437  0.70  0.42  0.016  0.70
   MA0094  Ubx             0.438  0.81  0.31  0.002  0.83
3  MA0011  Broad-complex2  0.419  0.89  0.27  0.000  0.73
   MA0082  SQUA            0.435  0.81  0.32  0.011  0.74
4  MA0110  ATHB5           0.421  0.70  0.45  0.000  0.70
   MA0075  Prrx2           0.428  0.77  0.38  0.000  0.79
   MA0008  Athb-1          0.430  0.91  0.23  0.001  0.67
5  MA0096  bZIP910         0.423  0.41  0.75  0.000  0.85
   MA0018  CREB1           0.428  0.72  0.43  0.001  0.78

Logo column: sequence logo images (not reproduced here).
Analysis of transcription factor localization data Discovering motifs de novo
per column specified using -i
degeneracy optionally specified using -g
degeneracy optionally specified using -r
Analysis of transcription factor localization data Discovering motifs de novo
#   Acc             Err    Sens  Spec  FD
1   DME-10-1.60-6   0.372  0.42  0.83  0.90
2   DME-10-1.60-10  0.373  0.42  0.83  0.98
3   DME-10-1.60-11  0.375  0.49  0.76  0.97
4   DME-10-1.60-28  0.422  0.60  0.56  0.90
5   DME-10-1.60-26  0.423  0.54  0.61  0.90
6   DME-10-1.60-39  0.423  0.42  0.73  0.90
7   DME-10-1.60-27  0.426  0.38  0.77  0.98
8   DME-10-1.60-23  0.427  0.54  0.61  0.90
9   DME-10-1.60-35  0.427  0.32  0.83  0.96
10  DME-10-1.60-13  0.429  0.36  0.78  0.97

Logo column: sequence logo images (not reproduced here).