Fast sparse methods for genomics data
Jean-Philippe Vert Optimization and Statistical Learning workshop, Les Houches, January 6-11, 2013
JP Vert (ParisTech ) Sparse methods in genomics Les Houches 2013 1 / 47
Normal vs cancer cells
DOE Joint Genome Institute
[Figure: log-ratio by chromosome]
1: I0 represents the interval [1, n]
2: P = {I0}
3: for i = 1 to k do
4:     ... (over I ∈ P)
5:     ...
6:     ...
7: end for
8: return P
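A loop of this shape can be sketched in Python as a greedy interval-splitting procedure. The body of steps 4 to 6 is not legible here, so the rule used below is an assumption: at each iteration, pick the interval in P whose best single split most reduces the squared error, and replace it by its two sub-intervals. The names `best_split` and `greedy_segmentation` are hypothetical, and intervals are 0-based and half-open.

```python
import numpy as np

def best_split(y, start, end):
    """Best single change-point inside y[start:end].

    Returns (gain, position), where gain is the reduction in squared error
    obtained by fitting two means instead of one, or (0.0, None) if the
    interval is too short to split.
    """
    seg = y[start:end]
    m = len(seg)
    if m < 2:
        return 0.0, None
    csum = np.cumsum(seg)
    total = csum[-1]
    i = np.arange(1, m)            # candidate sizes of the left part
    left_sum = csum[:-1]
    mean_diff = left_sum / i - (total - left_sum) / (m - i)
    gain = mean_diff ** 2 * i * (m - i) / m
    j = int(np.argmax(gain))
    return float(gain[j]), start + j + 1

def greedy_segmentation(y, k):
    """Split [0, n) into at most k + 1 intervals by k greedy splits.

    Assumed selection rule (not taken from the slides): split the interval
    with the largest squared-error reduction.
    """
    y = np.asarray(y, dtype=float)
    intervals = [(0, len(y))]      # P = {I0}
    for _ in range(k):
        candidates = []
        for idx, (s, e) in enumerate(intervals):
            gain, pos = best_split(y, s, e)
            if pos is not None:
                candidates.append((gain, pos, idx))
        if not candidates:         # nothing left to split
            break
        gain, pos, idx = max(candidates, key=lambda c: c[0])
        s, e = intervals.pop(idx)
        intervals += [(s, pos), (pos, e)]
    return sorted(intervals)

# Piecewise-constant toy signal with jumps at positions 50 and 80.
toy = np.array([0.0] * 50 + [1.0] * 30 + [3.0] * 20)
segments = greedy_segmentation(toy, k=2)
```

This naive version recomputes every interval's best split at each iteration; caching the best split per interval would avoid that repeated work.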
[Figure: Speed for K = 1, 10, 1e2, 1e3, 1e4, 1e5. x-axis: signal length (×10^5); y-axis: seconds]
[Figure: weight per BAC]
[Figure: running time (s) of GFLars and GFLasso as a function of n, p, and k; three panels, logarithmic axes]
[Figure: accuracy as a function of p in three panels (unweighted, weighted, weighted+vary), with curves for u = 50, 60, 70, 80, 90 (u = 50±2, 60±2, 70±2, 80±2, 90±2 in the third panel)]
[Plot: accuracy as a function of p in three panels, with curves for U-LARS, W-LARS, U-Lasso, and W-Lasso]
Figure 4: Multiple change-point accuracy. Accuracy as a function of the number of profiles p when change-points are placed at the nine positions {10, 20, ..., 90} and the variance σ² of the centered Gaussian noise is 0.05 (left), 0.2 (center), or 1 (right). The profile length is 100.
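The simulation setting in this caption can be sketched as follows: p profiles of length 100, shared change-points at positions 10, 20, ..., 90, and centered Gaussian noise of a given variance. The caption does not say how the segment levels are chosen, so drawing them independently from a standard normal distribution is an assumption made here for illustration only; `simulate_profiles` is a hypothetical helper name.

```python
import numpy as np

def simulate_profiles(p, sigma2, n=100,
                      change_points=tuple(range(10, 100, 10)), seed=0):
    """Simulate p noisy piecewise-constant profiles sharing change-points.

    Returns (noisy, clean), both of shape (n, p).
    """
    rng = np.random.default_rng(seed)
    bounds = [0, *change_points, n]
    lengths = np.diff(bounds)                      # length of each segment
    # Assumption (not from the caption): one standard-normal level
    # per segment and per profile.
    levels = rng.standard_normal((len(lengths), p))
    clean = np.repeat(levels, lengths, axis=0)     # shape (n, p)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n, p))
    return clean + noise, clean

# One data set per noise level quoted in the caption.
datasets = {s2: simulate_profiles(p=10, sigma2=s2) for s2 in (0.05, 0.2, 1.0)}
```

Each entry of `datasets` holds the noisy matrix alongside its noise-free counterpart, so a detected set of change-points can be compared with the true positions.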
[Figure: log-ratio by chromosome (1 to 22), three panels]
(Costa et al., 2011)
(Xia et al., 2011)
[Figure: elapsed time (s) as a function of the number of candidates, logarithmic axes; label: NSMAP]