Ultra-high dimensional statistics and statistical learning on some - PowerPoint PPT Presentation

Examples Linear model Mathematical ingredients LOL algo
Ultra-high dimensional statistics and statistical learning on some applications
Dominique Picard, Université Paris-Diderot, Laboratoire Probabilités et Modèles Aléatoires
M2MO: 20 years!

Plan
• Examples
• Linear model
• Mathematical ingredients
• LOL algo

Example 1: prediction of electrical consumption
Figure: Signal-Prediction. Day 20071220 (354) with 12 coefficients (Group 9); curves: original, model; annotation: 0.0010%.
M. Mougeot, K. Tribouley, Laurence Maillard, V. Lefieux, D.P.

Examples of days (worst) (2009 07 14)
Figure: day 20090714 (2387); curves: original, model; MAPE 0.06769.

Example 3: genomics

Example 4: estimate a probability density on the sphere

Example 5: CMB

CMB: mask

High frequency signal FBUND
Figure: FBund, 20091207, plotted against trading time (E. Bacry).

High frequency signal FBUND
Figure: FBund, 20091208, plotted against trading time (E. Bacry).

High frequency signal FBOBL
Figure: FBobl, 20091207, plotted against trading time (E. Bacry).

High frequency signal FBOBL
Figure: FBobl, 20091208, plotted against trading time (E. Bacry).

Model

Linear Model
Observation: $Y = (Y_1, \dots, Y_n)^t$
$$Y = \Phi\alpha + \epsilon$$
• $\alpha \in \mathbb{R}^p$ is the unknown parameter (to be estimated).
• $\epsilon = (\epsilon_1, \dots, \epsilon_n)^t$ is a (non-observed) vector of random errors, assumed to be i.i.d. $N(0, \sigma^2)$.
• $\Phi$ is a known $n \times p$ matrix.
High dimension: $p \gg n$

Example: genomics
$Y = (1, \dots, 1, 0)^t$, $\Phi =$ [matrix shown on the original slide, not recovered]

• Large random matrices: the $n \times p$ entries of $\Phi$ are i.i.d. $N(0, 1)$ random variables.
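As a rough illustration of this setting, the sketch below simulates $Y = \Phi\alpha + \epsilon$ with a Gaussian $\Phi$ and $p \gg n$. The function name `simulate_linear_model`, the sizes, the seed and the few nonzero coordinates of $\alpha$ are all assumptions chosen for the example; none of them come from the slides.

```python
import random

def simulate_linear_model(n, p, support, sigma=1.0, seed=0):
    """Draw Y = Phi alpha + eps with Phi made of i.i.d. N(0, 1) entries.

    `support` maps a few coordinate indices to their nonzero values
    (an illustrative choice, not taken from the slides).
    """
    rng = random.Random(seed)
    # Phi: n x p matrix with i.i.d. standard Gaussian entries
    phi = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # alpha: zero everywhere except on the chosen support
    alpha = [0.0] * p
    for index, value in support.items():
        alpha[index] = value
    # eps_i i.i.d. N(0, sigma^2), added to the i-th row of Phi times alpha
    y = [sum(row[j] * alpha[j] for j in range(p)) + rng.gauss(0.0, sigma)
         for row in phi]
    return y, phi, alpha

# High-dimensional regime: p = 500 unknowns, only n = 50 observations.
y, phi, alpha = simulate_linear_model(
    n=50, p=500, support={3: 2.0, 120: -1.5, 400: 1.0})
print(len(y), len(phi), len(phi[0]))  # 50 50 500
```

With fewer observations than unknowns, least squares alone cannot identify $\alpha$, which is why additional structure on $\alpha$ and $\Phi$ is needed.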

Signal denoising
Figure: Y = the FBund signal of 20091207, plotted against trading time.
What is Φ in this case?

• Statistical learning, regression estimation
$$Y_i = f(X_i) + \epsilon_i + u_i, \quad i = 1, \dots, n$$
• The $\epsilon_i$'s are i.i.d. $N(0, 1)$.
• The $u_i$'s may or may not be random and need not be i.i.d., but they are 'small'.
• The $X_i$'s are random, i.i.d., taking values in a compact set of $\mathbb{R}^d$.
• $f$ is the parameter to be estimated.

To embed this problem in a linear model, we consider a dictionary $\mathcal{D}$ of size $p$, made of real functions defined on $\mathbb{R}^d$. We assume that $f$ can be 'reasonably' well approximated by the dictionary functions $\{g \in \mathcal{D}\}$, i.e. there exist coefficients $\alpha_g$ such that
$$f = \sum_{g \in \mathcal{D}} \alpha_g\, g + h$$
where $h$ is 'small'. The model then reads
$$Y_i = \sum_{g \in \mathcal{D}} \alpha_g\, g(X_i) + h(X_i) + \epsilon_i, \quad i = 1, \dots, n$$
$$Y = \Phi\alpha + u + \epsilon$$
if we set $u_i = h(X_i)$ for $i = 1, \dots, n$, $\Phi$ being the matrix with general term $\Phi_{i\ell} = g_\ell(X_i)$.
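The construction $\Phi_{i\ell} = g_\ell(X_i)$ can be sketched in a few lines. The cosine dictionary on $[0, 1]$ used here (with $d = 1$), the uniform design points and the helper names `cosine_dictionary` and `design_matrix` are assumptions made only for illustration; the slides do not fix a particular dictionary at this point.

```python
import math
import random

def cosine_dictionary(p):
    """Hypothetical dictionary on [0, 1]: g_0 = 1, g_l(x) = sqrt(2) cos(pi l x)."""
    def make(l):
        if l == 0:
            return lambda x: 1.0
        return lambda x: math.sqrt(2.0) * math.cos(math.pi * l * x)
    return [make(l) for l in range(p)]

def design_matrix(dictionary, xs):
    """Phi[i][l] = g_l(X_i): one row per observation, one column per function."""
    return [[g(x) for g in dictionary] for x in xs]

rng = random.Random(0)
xs = [rng.random() for _ in range(20)]   # X_i i.i.d. uniform on the compact set [0, 1]
phi = design_matrix(cosine_dictionary(100), xs)
print(len(phi), len(phi[0]))  # 20 100
```

Here $n = 20$ and $p = 100$, so a rich dictionary puts the regression problem in the $p > n$ regime of the linear model.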

Associated problems
$$Y = \Phi\alpha + u + \epsilon$$
n observations: $Y$ ($n \times 1$), $\Phi$ ($n \times p$)
• Estimation: determine $\hat\alpha$
• Selection: find the significant coefficients, $\hat\alpha^* = \hat\alpha\, \mathbf{1}_{|\hat\alpha| > T}$
• Prediction: $\hat Y = \Phi\hat\alpha$
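The selection and prediction steps are simple once an estimate is available. The sketch below applies the hard-thresholding rule to an estimate and then computes the prediction as Φ times the thresholded coefficients; the numerical values of `alpha_hat`, `phi` and the threshold are made up for the example, and how the estimate itself is obtained is left open here.

```python
def select(alpha_hat, threshold):
    """Hard thresholding: keep coefficients with |alpha_hat| > T, zero the others."""
    return [a if abs(a) > threshold else 0.0 for a in alpha_hat]

def predict(phi, alpha):
    """Y_hat = Phi alpha, computed row by row."""
    return [sum(entry * a for entry, a in zip(row, alpha)) for row in phi]

alpha_hat = [2.1, -0.05, 0.0, -1.4, 0.2]   # illustrative estimate, not real data
phi = [[1.0, 0.0, 2.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, -1.0, 3.0]]

alpha_star = select(alpha_hat, threshold=0.5)
print(alpha_star)                 # [2.1, 0.0, 0.0, -1.4, 0.0]
print(predict(phi, alpha_star))   # prediction built from the selected coefficients
```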

Conditions generally required to solve the problem
• 'Sparsity' of the vector α
• Good approximation of the 'true function' by the dictionary
• 'Coherence' conditions on the matrix Φ

Approximation
$$u_i = h(X_i) = f(X_i) - \sum_{g \in \mathcal{D}} \alpha_g\, g(X_i)$$
Asking the $u_i$'s to be small means that $f$ is well approximated by a linear combination of the dictionary functions.

Sparsity conditions: what does it mean to be sparse?

Sparsity conditions
• $\{\alpha_\ell\}_{\ell \le p}$ is $S$-sparse
• Strict sparsity: $\#\{\ell \in \{1, \dots, p\} : |\alpha_\ell| \neq 0\} \le S$
• More generally: $\sum_\ell |\alpha_\ell|^q \le M$, $0 < q < 1$
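Both notions of sparsity are easy to evaluate on a given vector. In the sketch below, the vector `alpha` and the helper names `support_size` and `lq_sum` are illustrative assumptions.

```python
def support_size(alpha):
    """Number of nonzero coordinates: the quantity bounded by S in strict sparsity."""
    return sum(1 for a in alpha if a != 0.0)

def lq_sum(alpha, q):
    """sum_l |alpha_l|^q, the quantity bounded by M in the general condition."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must satisfy 0 < q < 1")
    return sum(abs(a) ** q for a in alpha)

alpha = [4.0, 0.0, -1.0, 0.0, 0.25]
print(support_size(alpha))   # 3
print(lq_sum(alpha, 0.5))    # 3.5  (2 + 1 + 0.5)
```

A vector can satisfy the second condition with a moderate $M$ without being strictly sparse, as long as most of its coordinates are small.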

The dictionary problem
Of course, sparsity is linked to the dictionary.
• Fourier basis
• Wavelet basis
• Needlets
• Combination of 'bases'

Fourier basis
Figure: dictionary functions 1, 4 and 30.

Haar wavelets
Figure: dictionary functions 48 to 62.
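For readers who want to generate such a dictionary, here is a minimal sketch of the discrete Haar system on $n = 2^J$ equally spaced points. The normalization (orthonormal in $\mathbb{R}^n$), the ordering of the functions and the name `haar_dictionary` are assumptions; the dictionary plotted on the slide may use different conventions.

```python
import math

def haar_dictionary(n):
    """Discrete Haar functions on n = 2**J points, returned as a list of vectors.

    The first vector is the constant function; the others are wavelets,
    listed from the coarsest scale to the finest.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    functions = [[1.0 / math.sqrt(n)] * n]
    width = n
    while width >= 2:
        half = width // 2
        height = 1.0 / math.sqrt(width)   # gives each wavelet unit Euclidean norm
        for start in range(0, n, width):
            g = [0.0] * n
            for t in range(start, start + half):
                g[t] = height              # positive half of the wavelet
            for t in range(start + half, start + width):
                g[t] = -height             # negative half of the wavelet
            functions.append(g)
        width = half
    return functions

haar = haar_dictionary(64)
print(len(haar))  # 64
```

Each vector is one dictionary function $g_\ell$ evaluated on the grid, so the design matrix with entries $g_\ell(i)$ is the transpose of this list.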

Functions defined on the sphere

Spherical harmonics on the sphere
Figure: surface plot with axes THETA and PHI.

Needlets on the sphere (Petrushev and co-authors)
Figure: surface plot with axes PHI and THETA.

Needlets associated to Jacobi polynomials on [0,1] (Petrushev and co-authors)
Figure: needlet plots.

Sparsity conditions and functional approximation spaces
In the wavelet and needlet cases, Besov spaces are especially well suited to express sparsity conditions.
More complex: how can sparsity conditions for combinations of bases be translated in terms of function spaces?
Petrushev, Narcowich, Ward, Xu, Kyriazis; Coulhon, Kerkyacharian, Petrushev

Conditions generally required to solve the problem
• 'Sparsity' of the vector α
• Good approximation of the 'true function' by the dictionary
• 'Coherence' conditions on the matrix Φ

RIP - Coherence
The columns of $\Phi$ are assumed to be normalized.
For $C \subset \{1, \dots, p\}$, denote by $\Phi_C$ the matrix $\Phi$ restricted to the columns indexed by $C$, and by
$$M(C) := \frac{1}{n} \Phi_C^t \Phi_C$$
the associated Gram matrix.
RIP($m_0$, $\nu$) assumes that $M(C)$ is almost diagonal for any $C$ as soon as $\#(C) \le m_0$, in the following sense: there exist $0 \le \nu < 1$ and $m_0 \ge 1$ such that, for every $C$ with $m = \#(C) \le m_0$,
$$\forall x \in \mathbb{R}^m, \quad \|x\|^2_{\ell_2(m)} (1 - \nu) \le x^t M(C)\, x \le \|x\|^2_{\ell_2(m)} (1 + \nu).$$
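The RIP inequality bounds the eigenvalues of $M(C)$ between $1 - \nu$ and $1 + \nu$. The sketch below reads $\Phi_C$ as the columns of $\Phi$ indexed by $C$ (the reading under which $M(C)$ is a $\#(C) \times \#(C)$ matrix), normalizes them so that the diagonal of $M(C)$ equals 1, and uses Gershgorin's circle theorem to certify the inequality on one given subset. This is only a sufficient check, chosen here because it needs no linear-algebra library; it is not a method taken from the slides, and the matrix sizes, the subset and the function names are assumptions.

```python
import math
import random

def normalize_columns(phi):
    """Rescale each column so that (1/n) * sum_i phi[i][l]**2 equals 1."""
    n, p = len(phi), len(phi[0])
    scales = [math.sqrt(sum(phi[i][l] ** 2 for i in range(n)) / n) for l in range(p)]
    return [[phi[i][l] / scales[l] for l in range(p)] for i in range(n)]

def gram_submatrix(phi, subset):
    """M(C) = (1/n) Phi_C^t Phi_C for the columns listed in `subset`."""
    n = len(phi)
    return [[sum(phi[i][a] * phi[i][b] for i in range(n)) / n for b in subset]
            for a in subset]

def gershgorin_interval(m):
    """Interval that contains every eigenvalue of the symmetric matrix m."""
    lows, highs = [], []
    for i, row in enumerate(m):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        lows.append(row[i] - radius)
        highs.append(row[i] + radius)
    return min(lows), max(highs)

def certifies_rip_on(phi, subset, nu):
    """True if Gershgorin's bound places all eigenvalues of M(C) in [1 - nu, 1 + nu]."""
    low, high = gershgorin_interval(gram_submatrix(phi, subset))
    return 1.0 - nu <= low and high <= 1.0 + nu

rng = random.Random(0)
phi = normalize_columns([[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(200)])
print(certifies_rip_on(phi, subset=[0, 7, 21], nu=0.5))
```

A `False` answer is inconclusive, since the Gershgorin interval can be wider than the true spectrum. Checking RIP($m_0$, $\nu$) itself would require going through every subset of size at most $m_0$, which is why the simpler coherence criterion is attractive.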

Coherence
• Introduce the $p \times p$ Gram matrix
$$M := \frac{1}{n} \Phi^t \Phi$$
and the coherence
$$\tau_n = \sup_{\ell \neq m} |M_{\ell m}| = \sup_{\ell \neq m} \left| \frac{1}{n} \sum_{i=1}^n \Phi_{i\ell} \Phi_{im} \right|$$
Coherence $\Longrightarrow$ RIP($\lfloor \nu / \tau_n \rfloor$, $\nu$)
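One way to see the implication, assuming normalized columns: for $\#(C) = m$, each off-diagonal entry of $M(C)$ is at most $\tau_n$ in absolute value, so by Gershgorin's theorem its eigenvalues lie within $(m - 1)\tau_n$ of 1, which is at most $\nu$ whenever $m \le \nu / \tau_n$. The sketch below computes $\tau_n$ and the resulting order $\lfloor \nu / \tau_n \rfloor$ on a small matrix with ±1 entries; the matrix and the function names `coherence` and `rip_order` are illustrative assumptions.

```python
import math

def coherence(phi):
    """tau_n = max over l != m of |(1/n) * sum_i phi[i][l] * phi[i][m]|."""
    n, p = len(phi), len(phi[0])
    return max(abs(sum(phi[i][l] * phi[i][m] for i in range(n)) / n)
               for l in range(p) for m in range(l + 1, p))

def rip_order(tau, nu):
    """floor(nu / tau): the order m0 such that coherence tau gives RIP(m0, nu).

    Assumes tau > 0; tau == 0 means orthogonal columns and no restriction on m0.
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    return math.floor(nu / tau)

# Three columns with entries +1/-1, so each column is already normalized.
c0 = [1, 1, 1, 1, 1, 1, 1, 1]
c1 = [1, 1, 1, 1, 1, -1, -1, -1]
c2 = [1, -1, 1, -1, 1, -1, 1, -1]
phi = [list(row) for row in zip(c0, c1, c2)]   # 8 x 3 matrix

tau = coherence(phi)
print(tau)                   # 0.25
print(rip_order(tau, 0.5))   # 2
```

Computing $\tau_n$ takes on the order of $n p^2$ operations, whereas a direct RIP check would have to examine every subset $C$.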
