MOTIFS DISTRIBUTION IN DNA SEQUENCES
Stéphane ROBIN
robin@inapg.inra.fr
UMR INA-PG / INRA, Paris
Mathématique et Informatique Appliquées
Bio-Info-Math Workshop, Tehran, April 2005
- S. Robin (Motif statistics in DNA)
$\Pr_{M1}\{N(w) \geq n_{\mathrm{obs}}(w)\} = \Pr_{M1}\{N(\texttt{gctt}) \geq 56\}$
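A minimal Python sketch of how such a tail probability can be evaluated, assuming that M1 denotes a stationary order-1 Markov model, that the expected count of w is estimated by the usual plug-in formula E[N(w)] ≈ Π N(w_j w_{j+1}) / Π N(w_j) (internal letters in the denominator), and that N(w) is approximated by a Poisson distribution with that mean. The function names are hypothetical, and the talk may rely on a different (for example compound Poisson or exact) approximation.

```python
import math
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping counts of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count_m1(seq: str, w: str) -> float:
    """Plug-in estimate of E[N(w)] under an order-1 Markov model (assumed model):
    prod_j N(w_j w_{j+1}) / prod over internal letters N(w_j)."""
    if len(w) < 2:
        raise ValueError("w must have length >= 2")
    c1, c2 = kmer_counts(seq, 1), kmer_counts(seq, 2)
    num = math.prod(c2[w[j:j + 2]] for j in range(len(w) - 1))
    den = math.prod(c1[w[j]] for j in range(1, len(w) - 1))
    return num / den if den > 0 else 0.0

def poisson_pmf(k: int, mean: float) -> float:
    """Poisson(mean) probability of k, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_upper_tail(n_obs: int, mean: float) -> float:
    """Pr{N >= n_obs} for N ~ Poisson(mean)."""
    if n_obs <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    if n_obs <= mean:
        # Large tail: the complement of the lower tail is numerically safe.
        return max(0.0, 1.0 - sum(poisson_pmf(k, mean) for k in range(n_obs)))
    # Small tail: sum it directly; terms shrink because k > mean.
    total, k, term = 0.0, n_obs, poisson_pmf(n_obs, mean)
    while term > 0.0 and term > 1e-17 * total:
        total += term
        k += 1
        term *= mean / k
    return min(1.0, total)
```

For the slide's example, one would call poisson_upper_tail(56, expected_count_m1(seq, "gctt")) on the studied sequence seq, which is not reproduced here. The small-tail branch sums the tail directly so that very small p-values are not lost to cancellation in 1 − cdf.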
[Figure 1: d_TV (%) versus expected count (log10), two panels]
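The vertical axis d_TV is a total variation distance between two count distributions, d_TV(P, Q) = 1/2 Σ_k |P(k) − Q(k)|, reported in percent. A minimal sketch of that definition, illustrated on a Binomial(n, p) count and its Poisson(np) approximation; which two distributions Figure 1 actually compares is not stated here, so this pairing is only an assumption for illustration, and the function names are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of k, computed in log space (assumes 0 < p < 1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log1p(-p))

def poisson_pmf(k: int, mean: float) -> float:
    """Poisson(mean) probability of k, computed in log space (assumes mean > 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def total_variation(p: list[float], q: list[float]) -> float:
    """d_TV(P, Q) = 1/2 * sum_k |P(k) - Q(k)| for pmfs listed on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def dtv_binomial_poisson(n: int, p: float) -> float:
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
    pois = [poisson_pmf(k, lam) for k in range(n + 1)]
    # The Poisson law also puts mass above n, where the binomial has none.
    pois_beyond_n = max(0.0, 1.0 - sum(pois))
    return total_variation(binom, pois) + 0.5 * pois_beyond_n
```

Multiplying the result by 100 gives the percentage scale used on the figure's vertical axis.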
[Fig. 4.8: the 64 palindromic words of length 6 (two panels, scale −10 to 5): aaattt, aagctt, aacgtt, aatatt, agatct, aggcct, agcgct, agtact, acatgt, acgcgt, accggt, actagt, atatat, atgcat, atcgat, attaat, gaattc, gagctc, gacgtc, gatatc, ggatcc, gggccc, ggcgcc, ggtacc, gcatgc, gcgcgc, gccggc, gctagc, gtatac, gtgcac, gtcgac, gttaac, caattg, cagctg, cacgtg, catatg, cgatcg, cggccg, cgcgcg, cgtacg, ccatgg, ccgcgg, cccggg, cctagg, ctatag, ctgcag, ctcgag, cttaag, taatta, tagcta, tacgta, tatata, tgatca, tggcca, tgcgca, tgtaca, tcatga, tcgcga, tccgga, tctaga, ttataa, ttgcaa, ttcgaa, tttaaa]
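The 64 words in this figure are exactly the words of length 6 that equal their own reverse complement: the first three letters are free (4³ = 64 choices) and fix the last three. A short sketch that regenerates them; enumerating the first half over the letter order a, g, c, t is an assumption, chosen because it reproduces the order in which the words are listed.

```python
from itertools import product

COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def reverse_complement(w: str) -> str:
    """Reverse complement of a lowercase DNA word."""
    return "".join(COMPLEMENT[x] for x in reversed(w))

def palindromic_words(half_length: int = 3, alphabet: str = "agct") -> list[str]:
    """All words of length 2 * half_length equal to their own reverse complement."""
    words = []
    for half in product(alphabet, repeat=half_length):
        left = "".join(half)
        # The second half is forced: it must be the reverse complement of the first.
        words.append(left + reverse_complement(left))
    return words
```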
y−1
|w|
kp(y − z)
I×I
m1        (d1 : d2)   m2        number of regions containing m   expected number
gttgaca   (16 : 18)   atataat   7                                2.43 × 10⁻²
gttgaca   (16 : 18)   tataata   8                                2.23 × 10⁻²
tgttgac   (16 : 18)   tataata   10                               2.12 × 10⁻²
ttgacaa   (16 : 18)   tacaat    9                                9.82 × 10⁻²
ttgacaa   (16 : 18)   tataata   10                               5.07 × 10⁻²
ttgacag   (16 : 18)   tataat    9                                7.12 × 10⁻²
ttgacaa   (17 : 19)   ataataa   9                                6.97 × 10⁻²
ttgttga   (17 : 19)   tataata   8                                5.17 × 10⁻²
gttgaca   (17 : 19)   ataataa   8                                3.09 × 10⁻²
gttgaca   (17 : 19)   tataata   8                                2.19 × 10⁻²
cttgaca   (17 : 19)   tataat    8                                6.04 × 10⁻²
tgttgac   (17 : 19)   tataata   12                               2.09 × 10⁻²
tgttgac   (17 : 19)   atataat   7                                2.29 × 10⁻²
ttgttga   (18 : 20)   tataata   8                                5.09 × 10⁻²
gttgaca   (18 : 20)   ataatga   7                                1.79 × 10⁻²
gttgttg   (18 : 20)   tataata   7                                2.53 × 10⁻²
tgttgac   (18 : 20)   ataataa   10                               2.90 × 10⁻²
tgttgac   (18 : 20)   atacta    7                                2.77 × 10⁻²
tgttgac   (19 : 21)   ataataa   10                               2.86 × 10⁻²
tgttgac   (19 : 21)   atacta    7                                2.73 × 10⁻²
tgttgac   (19 : 21)   tataat    10                               6.53 × 10⁻²
gttgact   (19 : 21)   ataata    8                                6.25 × 10⁻²
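Each row concerns a structured motif m made of a first word m1 and a second word m2 separated by a distance between d1 and d2. A minimal sketch of how the "number of regions containing m" column could be counted, assuming the distance is the number of letters strictly between the end of m1 and the start of m2 (the exact convention is not given here) and that each region is a separate string; the function names are hypothetical.

```python
def contains_structured_motif(region: str, m1: str, m2: str, d1: int, d2: int) -> bool:
    """True if region contains m1 followed by m2 with a gap of d1..d2 letters.
    Assumed convention: gap = letters strictly between the end of m1 and the start of m2."""
    start = region.find(m1)
    while start != -1:
        end1 = start + len(m1)
        for d in range(d1, d2 + 1):
            if region.startswith(m2, end1 + d):
                return True
        # Try the next (possibly overlapping) occurrence of m1.
        start = region.find(m1, start + 1)
    return False

def count_regions(regions: list[str], m1: str, m2: str, d1: int, d2: int) -> int:
    """Number of regions containing at least one occurrence of the structured motif."""
    return sum(contains_structured_motif(r, m1, m2, d1, d2) for r in regions)
```

Counting regions rather than occurrences means a region with several occurrences of m still contributes 1, which is what the column header describes.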
C(x)
N(ℓ)
i = i+r−1
$Y^r_{\min} = \min_i \{Y^r_i\}$
x {∆N(x)}
$\Pr\{Y^r_{\min} \leq y\} \simeq 1 - e^{-(n-r)\Pr\{Y^r \leq y\}}$
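A minimal sketch of the r-scan quantities above, under stated assumptions: occurrence positions T_1 < ... < T_n of the word are given, Y^r_i is taken to be T_{i+r} − T_i (the sum of r consecutive inter-occurrence distances, giving n − r windows), and these distances are modelled as i.i.d. exponential with rate λ, so that Pr{Y^r ≤ y} is an Erlang (Gamma) distribution function. Both the definition of Y^r_i and the exponential model are assumptions for illustration, not necessarily those of the talk; the function names are hypothetical.

```python
import math

def r_scan_min(positions, r):
    """Y^r_min = min_i (T_{i+r} - T_i) over sorted positions (assumed definition of Y^r_i)."""
    t = sorted(positions)
    if len(t) <= r:
        raise ValueError("need more than r occurrences")
    return min(t[i + r] - t[i] for i in range(len(t) - r))

def _poisson_pmf(k, x):
    return math.exp(-x + k * math.log(x) - math.lgamma(k + 1))

def erlang_cdf(y, r, rate):
    """Pr{Y^r <= y} when Y^r is a sum of r i.i.d. Exponential(rate) distances (rate > 0).
    Uses the identity Pr{Y^r <= y} = Pr{Poisson(rate * y) >= r}."""
    if y <= 0:
        return 0.0
    x = rate * y
    if r <= x:
        # Large probability: complement of the Poisson lower tail.
        return max(0.0, 1.0 - sum(_poisson_pmf(k, x) for k in range(r)))
    # Small probability: sum the Poisson upper tail directly (terms decrease since k > x).
    total, k, term = 0.0, r, _poisson_pmf(r, x)
    while term > 0.0 and term > 1e-17 * total:
        total += term
        k += 1
        term *= x / k
    return min(1.0, total)

def r_scan_min_cdf(y, n, r, rate):
    """Approximation Pr{Y^r_min <= y} ~ 1 - exp(-(n - r) * Pr{Y^r <= y})."""
    return -math.expm1(-(n - r) * erlang_cdf(y, r, rate))
```

With observed positions pos, r_scan_min(pos, r) gives the observed Y^r_min, and r_scan_min_cdf(y_obs, len(pos), r, rate) its approximate p-value; estimating rate by the empirical density len(pos) / sequence length is again an assumption.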
$\min_{i=1,\dots,222} Y^r_i$:
i=1..220
i