Learning algorithms and statistical software, with applications to bioinformatics
PhD defense of Toby Dylan Hocking toby.hocking@inria.fr http://cbio.ensmp.fr/~thocking/ 20 November 2012
Summary of contributions
◮ Ch. 2: clusterpath for
◮ Ch. 7: direct labels for readable statistical graphics, Best
◮ Ch. 8: documentation generation to convert comments
◮ Ch. 9: named capture regular expressions for extracting
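To illustrate the idea behind named capture regular expressions, here is a minimal sketch using Python's built-in `re` module. The slide does not say which implementation the chapter uses, so Python is only a stand-in. The pattern, the `parse_regions` helper, and the example string are hypothetical and chosen for illustration.

```python
import re

# Hypothetical pattern: each named group (?P<name>...) labels one field,
# so matches can be read by name instead of by position.
REGION = re.compile(r"(?P<chrom>chr(?:\d+|[XY])):(?P<start>\d+)-(?P<end>\d+)")

def parse_regions(text):
    """Return one dict per genomic region found in text."""
    rows = []
    for match in REGION.finditer(text):
        fields = match.groupdict()  # maps group name -> matched string
        rows.append({
            "chrom": fields["chrom"],
            "start": int(fields["start"]),
            "end": int(fields["end"]),
        })
    return rows

# Hypothetical input text, not taken from the slides.
example = "breakpoints in chr1:100-200 and chrX:5000-7500"
regions = parse_regions(example)
```

Naming the groups keeps the extraction code readable when the pattern later gains or loses fields, since no positional indices need to be renumbered.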
◮ ℓ1 homotopy method O(pn log n).
◮ ℓ2 active-set method O(pn²).
◮ ℓ∞ Frank-Wolfe algorithm.
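The three norms above differ only in how the distance between two cluster centers is penalized. As a minimal sketch, assume the usual clusterpath objective, which is not written on this slide: 0.5 * ||alpha - X||² + lambda * Σ_{i<j} w_ij * ||alpha_i - alpha_j||_q. The code below only evaluates that objective for q in {1, 2, ∞}. It does not implement the homotopy, active-set, or Frank-Wolfe solvers, and the function names are hypothetical.

```python
def pair_norm(u, v, q):
    """l_q norm of u - v for q in {1, 2, inf}."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if q == 1:
        return sum(diffs)
    if q == 2:
        return sum(d * d for d in diffs) ** 0.5
    if q == float("inf"):
        return max(diffs)
    raise ValueError("q must be 1, 2 or float('inf')")

def clusterpath_objective(alpha, X, lam, q=2, weights=None):
    """Assumed objective: squared-error fit plus weighted fusion penalty.

    alpha, X: lists of n points, each a list of p coordinates.
    weights: optional n x n nested list; defaults to all ones.
    """
    n = len(X)
    # Data-fit term: half the squared distance between alpha and X.
    fit = 0.5 * sum(
        (a - x) ** 2
        for a_row, x_row in zip(alpha, X)
        for a, x in zip(a_row, x_row)
    )
    # Fusion penalty: sum over all pairs i < j.
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 if weights is None else weights[i][j]
            penalty += w * pair_norm(alpha[i], alpha[j], q)
    return fit + lam * penalty

# Two points in the plane; at alpha = X only the penalty contributes.
X = [[0.0, 0.0], [3.0, 4.0]]
cost_l1 = clusterpath_objective(X, X, lam=1.0, q=1)
cost_l2 = clusterpath_objective(X, X, lam=1.0, q=2)
cost_linf = clusterpath_objective(X, X, lam=1.0, q=float("inf"))
```

When every row of alpha is the same point, the penalty vanishes and only the fit term remains. This corresponds to the fully fused end of the path.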
[Figure: percent incorrectly predicted annotations in training set versus log10(smoothing parameter lambda). Panels: cghseg.k, pelt.n; flsa.norm; dnacopy.sd. Curves: false.positive, false.negative, errors. More breakpoints to the left, fewer breakpoints to the right.]
[Figure: ROC curves, true positive rate versus false positive rate. Panels: optimization, approximate optimization, glad. Models: flsa.norm, flsa, pelt.n, pelt.default, cghseg.mBIC, dnacopy.alpha, dnacopy.prune, dnacopy.sd, dnacopy.default, glad.lambdabreak, glad.MinBkpWeight, glad.default.]
False positive rate = probability(predict breakpoint | normal)
True positive rate = probability(predict breakpoint | breakpoint)
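The two rates defined above can be computed from per-region predictions as in the following minimal sketch. It assumes each annotated region carries a boolean truth label (breakpoint or normal) and a boolean prediction. The `roc_point` function and the toy data are hypothetical and are not the evaluation code behind the figure.

```python
def roc_point(predicted, truth):
    """Return (false positive rate, true positive rate).

    predicted[i]: True if the model predicts a breakpoint in region i.
    truth[i]:     True if region i is annotated as containing a breakpoint,
                  False if it is annotated as normal.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    n_breakpoint = sum(1 for t in truth if t)
    n_normal = len(truth) - n_breakpoint
    if n_breakpoint == 0 or n_normal == 0:
        raise ValueError("need at least one breakpoint and one normal region")
    # Count predicted breakpoints separately within each annotation class.
    true_pos = sum(1 for p, t in zip(predicted, truth) if p and t)
    false_pos = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return false_pos / n_normal, true_pos / n_breakpoint

# Toy example: four annotated regions.
truth = [True, False, True, False]
predicted = [True, True, False, False]
fpr, tpr = roc_point(predicted, truth)
```

Sweeping a model's smoothing parameter and recomputing this pair at each value traces out one ROC curve.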
[Figure: error relative to latent breakpoints versus number of segments of estimated cghseg model. Panels: bases/probe = 374, bases/probe = 7.]
[Figure: error versus log10(lambda), bases/probe = 374. Panels: alpha = 0, alpha = 0.5, alpha = 1.]
[Figure: total error relative to latent breaks versus penalty exponent alpha, for train and test sets.]
[Figure: percent test annotation error, mean + 1 standard deviation, by model (log.s.log.d, cghseg.k). Panel: detailed.low.density.]