 
Research in AppStat
B. Kégl / AppStat
AppStat: Applied Statistics and Machine Learning
(AppStat: Apprentissage Automatique et Statistique Appliquée)
Balázs Kégl
Linear Accelerator Laboratory, CNRS/University of Paris Sud
Conseil Scientifique (Scientific Council), Dec 13, 2010

Research statement
[Diagram: exchange between Computer Science and Experimental Physics; labels: real data, data analysis methodology, motivation, experimental rigor]
• Works extremely well in bioinformatics

Two recent examples
From: Robert Sulej <Robert.Sulej@cern.ch>
Subject: PLA application
Date: 2010 December 04 22:51:40 GMT+01:00
To: Balázs Kégl <kegl@iro.umontreal.ca>

Dear Balazs,

We are working on the particle track reconstruction for ICARUS experiment at LNGS, placed in the underground lab in Gran Sasso, Italy. Goal of the experiment is to study properties of neutrinos coming from natural sources and from artificial beam created at CERN (CNGS beam) [...]

We have found the Polygonal Line Algorithm very efficient in fitting the particle trajectories. The work was started with your Java applets and simulation of physics data. Now we have implemented the algorithm in the collaboration software to use it on the real data collected this year from neutrino interactions. First results are very promising [...]

regards,
Dorota Stefan, Robert Sulej, and the Icarus software group

Scientific path
Hungary
  1989–94  M.Eng. Computer Science, BUTE
  1994–95  research assistant, BUTE
Canada
  1995–99  Ph.D. Computer Science, Concordia U
  2000     postdoc, Queen's U
  2001–06  assistant professor, U of Montreal
France
  2006–    research scientist (CR1), CNRS / U Paris Sud
• Research interests: machine learning, pattern recognition, signal processing, applied statistics
• Applications: image and music processing, bioinformatics, software engineering, grid control, experimental physics

The team
• B. Kégl (team leader), 2006–: boosting, MCMC, Auger
• D. Benbouzid (Ph.D. student), 2010–
• R. Busa-Fekete (postdoc), 2008–
• R. Bardenet (Ph.D. student), 2009–: MCMC, optimization, Auger
• F-D. Collin (software engineer; 01/12/2010): multiboost.org, MCMC in root, system integration
• D. Garcia (postdoc; 01/01/2011): generative models, Auger / JEM EUSO
• Topics listed for D. Benbouzid and R. Busa-Fekete: boosting, JEM EUSO, optimization, SysBio

Collaborations
[Diagram: LAL AppStat linked to its partners]
• Nodes: LRI, TAO, Telecom ParisTech, LTCI, Hungarian Academy, ESBG, JEM EUSO, Auger, ILC, LSST, etc.
• Link labels: MCMC, boosting, optimization, reconstruction, trigger
• Legend: Computer Science, Experimental Science; existing link, future link

Funding
• ANR "jeune chercheur" MetaModel
  • 2007–2010, 150 k€
• ANR "COSINUS" Siminole
  • 2010–2014, 1043 k€ (658 k€ at LAL)
• MRM Grille Paris Sud
  • 2010–2012, 60 k€ (31 k€ at LAL)

Siminole within ANR COSINUS
• Simulation: third pillar of scientific discovery
• Improving simulation
  • algorithmic development inside the simulator
  • implementation on high-end computing devices
  • our approach: control the number of calls to the simulator

Siminole within ANR COSINUS
• Optimization
  • simulate from $f(x)$, find $\max_x f(x)$
• Inference
  • simulate from $p(x \mid \theta)$, find $p(\theta \mid x)$
• Discriminative learning (aka MVA)
  • simulate from $p(x, \theta)$, find $\theta = f(x)$
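
As a rough illustration of the third setting (simulate from $p(x, \theta)$, then learn $\theta = f(x)$), here is a minimal sketch. The toy simulator (uniform prior on $\theta$, unit Gaussian noise on $x$) and the linear least-squares regressor are assumptions chosen for brevity; they are not taken from the Siminole project.

```python
import random

def simulate_joint(n, rng):
    """Draw n pairs (x, theta) from a toy joint p(x, theta) (hypothetical)."""
    pairs = []
    for _ in range(n):
        theta = rng.uniform(0.0, 10.0)   # assumed prior p(theta)
        x = rng.gauss(theta, 1.0)        # assumed simulator p(x | theta)
        pairs.append((x, theta))
    return pairs

def fit_linear(pairs):
    """Least-squares fit of theta ~ w * x + b on simulated pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    return w, mean_t - w * mean_x

rng = random.Random(0)
w, b = fit_linear(simulate_joint(5000, rng))

def predict(x):
    """Learned discriminative map theta = f(x)."""
    return w * x + b

print(predict(5.0))
```

The learned map is trained only on simulated pairs, so the number of simulator calls (here 5000) is the quantity one would try to keep small.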

Inference: $p(x \mid \theta) \to p(\theta \mid x)$
[Figure: piecewise power law fit with cut at $E_{FD} = 3$ EeV. Axes: $S_{38}$ [VEM] (2 to 200) vs. $E_{FD}$ [EeV] (0.5 to 50). Curve labels: $E_{FD} = 0.137 \times S_{38}^{1.085}$ and $E_{FD} = 0.139 \times S_{38}^{1.08}$]
• The data: $D = \{(x_i, y_i, \sigma_{x_i}, \sigma_{y_i})\}_{i=1}^{n}$
• The parameters to estimate: $\theta = \{a, a_2, c\}$
• Nuisance parameters: "projections" $\tilde{x}$

The likelihood
$$p(x, y, \sigma_x, \sigma_y \mid \theta, \tilde{x}) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left(-\frac{1}{2}\left[\frac{(x - \tilde{x})^2}{\sigma_x^2} + \frac{(y - f_\theta(\tilde{x}))^2}{\sigma_y^2}\right]\right).$$
[Figure: a data point $(x, y)$ and a point $(\tilde{x}, f_\theta(\tilde{x}))$ on the curve; axes $x$ (40 to 140) and $y$ (0 to 40)]
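
The likelihood above translates almost term by term into code. The sketch below evaluates it for one data point; the straight-line `f_theta` is a hypothetical stand-in, since the slides do not spell out the functional form of $f_\theta$.

```python
import math

def likelihood(x, y, sigma_x, sigma_y, x_tilde, f_theta):
    """p(x, y, sigma_x, sigma_y | theta, x_tilde) for a single data point."""
    dx = (x - x_tilde) / sigma_x            # standardized residual in x
    dy = (y - f_theta(x_tilde)) / sigma_y   # standardized residual in y
    norm = 2.0 * math.pi * sigma_x * sigma_y
    return math.exp(-0.5 * (dx * dx + dy * dy)) / norm

def line(t):
    """Hypothetical model curve f_theta (assumption, not from the slides)."""
    return 0.25 * t

# The density peaks when the point sits exactly on the curve at x_tilde = x.
peak = likelihood(80.0, 20.0, 2.0, 1.0, 80.0, line)
off = likelihood(80.0, 22.0, 2.0, 1.0, 80.0, line)
print(peak, off)
```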

The maximum likelihood projection
$$\tilde{x}^* = \arg\max_{\tilde{x}} \; p(x, y, \sigma_x, \sigma_y \mid \theta, \tilde{x})$$
[Figure: data points $(x, y)$ joined to their projections $(\tilde{x}, f_\theta(\tilde{x}))$ on the curve; axes $x$ (40 to 140) and $y$ (0 to 40)]
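
One way to compute the maximum likelihood projection numerically is a one-dimensional search over $\tilde{x}$. The sketch below uses golden-section search on the negative log-likelihood; the search method, the bracket `[lo, hi]`, the unimodality assumption, and the straight-line `line` model are all illustrative choices rather than anything stated on the slide.

```python
import math

def neg_log_lik(x_tilde, x, y, sigma_x, sigma_y, f_theta):
    """Negative log-likelihood in x_tilde, up to an additive constant."""
    return 0.5 * (((x - x_tilde) / sigma_x) ** 2
                  + ((y - f_theta(x_tilde)) / sigma_y) ** 2)

def ml_projection(x, y, sigma_x, sigma_y, f_theta, lo, hi, tol=1e-6):
    """Golden-section search on [lo, hi]; assumes a single minimum there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        fc = neg_log_lik(c, x, y, sigma_x, sigma_y, f_theta)
        fd = neg_log_lik(d, x, y, sigma_x, sigma_y, f_theta)
        if fc < fd:
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

def line(t):
    """Hypothetical model curve f_theta (assumption)."""
    return 0.25 * t

x_star = ml_projection(80.0, 22.0, 2.0, 1.0, line, 0.0, 200.0)
print(x_star)
```

For a straight line $f_\theta(\tilde{x}) = a\tilde{x}$ the result can be checked against the closed form $\tilde{x}^* = (x/\sigma_x^2 + a y/\sigma_y^2) / (1/\sigma_x^2 + a^2/\sigma_y^2)$.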
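
Marginalizing over the projection
$$p(x, y, \sigma_x, \sigma_y \mid \theta) = \int_{-\infty}^{\infty} p(x, y, \sigma_x, \sigma_y, \tilde{x} \mid \theta)\, d\tilde{x}$$
$$= \int_{-\infty}^{\infty} p(x, y, \sigma_x, \sigma_y \mid \tilde{x}, \theta)\, p(\tilde{x} \mid \theta)\, d\tilde{x}$$
$$= \int_{-\infty}^{\infty} p(x, y, \sigma_x, \sigma_y \mid \tilde{x}, \theta)\, p(\tilde{x})\, d\tilde{x}.$$
[Figure: data points $(x, y)$ with the fitted curve; axes $x$ (40 to 140) and $y$ (0 to 40)]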
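
The marginalization integral can be approximated by simple quadrature. The sketch below applies the trapezoid rule, assuming a uniform prior $p(\tilde{x})$ on a finite interval `[lo, hi]` and a hypothetical straight-line model; neither the prior nor the quadrature rule is specified on the slide.

```python
import math

def likelihood(x, y, sigma_x, sigma_y, x_tilde, f_theta):
    """p(x, y, sigma_x, sigma_y | x_tilde, theta) for a single data point."""
    dx = (x - x_tilde) / sigma_x
    dy = (y - f_theta(x_tilde)) / sigma_y
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi * sigma_x * sigma_y)

def marginal_likelihood(x, y, sigma_x, sigma_y, f_theta, lo, hi, n_steps=4000):
    """Trapezoid-rule integral of likelihood * p(x_tilde) over [lo, hi]."""
    prior = 1.0 / (hi - lo)              # assumed uniform prior density
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        weight = 0.5 if i in (0, n_steps) else 1.0   # half weight at the ends
        total += weight * likelihood(x, y, sigma_x, sigma_y, lo + i * h, f_theta)
    return total * h * prior

def line(t):
    """Hypothetical model curve f_theta (assumption)."""
    return 0.25 * t

m = marginal_likelihood(80.0, 22.0, 2.0, 1.0, line, 0.0, 200.0)
print(m)
```

For a straight line $f_\theta(\tilde{x}) = a\tilde{x}$ and an interval wide enough to contain the bulk of the integrand, the integral reduces to a Gaussian in the residual $y - ax$ with variance $\sigma_y^2 + a^2\sigma_x^2$, scaled by the prior density, which gives a convenient sanity check.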

Inference: $p(x \mid \theta) \to p(\theta \mid x)$
• Maximum likelihood estimate
  • $\theta^* = \arg\max_{\theta} \; p(x, y, \sigma_x, \sigma_y \mid \theta)$
• Bayesian estimate
  • Bayes theorem: $p(\theta \mid x, y, \sigma_x, \sigma_y) = \dfrac{p(x, y, \sigma_x, \sigma_y \mid \theta)\, p(\theta)}{\int p(x, y, \sigma_x, \sigma_y \mid \theta')\, p(\theta')\, d\theta'}$
  • $\theta^* = E\{p(\theta \mid x, y, \sigma_x, \sigma_y)\}$, i.e. the mean of the posterior
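
To make the two estimates concrete, the sketch below computes both on a grid for a single slope parameter $a$. Everything specific here is an assumption for illustration: the model $f_\theta(x) = ax$, the six made-up data points, the flat prior on $a$, and the use of the closed-form marginal likelihood that holds for a straight line with a flat prior on the projection.

```python
import math

def log_marginal(a, data):
    """log p(D | a) for f(x) = a * x, with the projection integrated out."""
    total = 0.0
    for x, y, sx, sy in data:
        var = sy * sy + a * a * sx * sx   # variance of the residual y - a * x
        r = y - a * x
        total += -0.5 * r * r / var - 0.5 * math.log(2.0 * math.pi * var)
    return total

# Made-up data (x, y, sigma_x, sigma_y) scattered around y = 0.25 * x.
offsets = [0.5, -0.4, 0.3, -0.2, 0.1, -0.3]
data = [(x, 0.25 * x + e, 2.0, 1.0)
        for x, e in zip([40.0, 60.0, 80.0, 100.0, 120.0, 140.0], offsets)]

grid = [0.1 + 0.0001 * i for i in range(3001)]   # candidate slopes in [0.1, 0.4]
log_post = [log_marginal(a, data) for a in grid]  # flat prior: posterior ∝ likelihood

# Maximum likelihood estimate: grid point with the highest likelihood.
best = max(range(len(grid)), key=lambda i: log_post[i])
a_ml = grid[best]

# Bayesian estimate: posterior mean, normalizing on the grid.
peak = log_post[best]
weights = [math.exp(lp - peak) for lp in log_post]  # subtract peak to avoid underflow
a_mean = sum(a * w for a, w in zip(grid, weights)) / sum(weights)

print(a_ml, a_mean)
```

A grid is only practical for one or two parameters; with more parameters the normalizing integral becomes expensive, which is where sampling methods such as Metropolis-Hastings come in.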

The Metropolis-Hastings algorithm
• parameters to estimate: $\theta = \{a, a_2, c\}$, $(\tilde{x})$
• data: $D = \{(x_i, y_i, \sigma_{x_i}, \sigma_{y_i})\}_{i=1}^{n}$

METROPOLISHASTINGS(D)
  sample ← {}
  θ ← θ_init
  do
    θ_candidate ← θ + perturbation
    posterior-ratio ← p(D | θ_candidate) p(θ_candidate) / (p(D | θ) p(θ))
    if posterior-ratio > r ∼ U[0, 1]
      θ ← θ_candidate
    sample ← sample ∪ {θ}
  until convergence
  return sample
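
A minimal runnable version of the pseudocode might look like the sketch below. It departs from the slide in a few labeled ways: it works with log-posteriors for numerical stability, uses a symmetric Gaussian perturbation, stores the chain as a list so repeated values are kept, and runs a fixed number of iterations in place of a convergence test. The one-dimensional Gaussian target is a hypothetical stand-in for $p(D \mid \theta)\, p(\theta)$.

```python
import math
import random

def metropolis_hastings(log_post, theta_init, step, n_iter, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    sample = []
    theta = theta_init
    lp = log_post(theta)
    for _ in range(n_iter):                      # fixed budget, not a convergence test
        candidate = theta + rng.gauss(0.0, step)  # symmetric perturbation (assumption)
        lp_candidate = log_post(candidate)
        # Accept if posterior ratio > r with r ~ U(0, 1], compared in log space.
        if math.log(1.0 - rng.random()) < lp_candidate - lp:
            theta, lp = candidate, lp_candidate
        sample.append(theta)                     # record the state every iteration
    return sample

def toy_log_post(theta):
    """Hypothetical unnormalized log-posterior: Gaussian, mean 3, sd 0.5."""
    return -0.5 * ((theta - 3.0) / 0.5) ** 2

rng = random.Random(0)
chain = metropolis_hastings(toy_log_post, 0.0, 0.5, 20000, rng)
kept = chain[1000:]                              # discard burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((t - post_mean) ** 2 for t in kept) / len(kept))
print(post_mean, post_sd)
```

Because only the ratio of posteriors is used, the normalizing integral in Bayes' theorem never has to be computed.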

Slope posteriors
[Figure: two posterior histograms of $a$; horizontal axis $a$ from 1.05 to 1.14, vertical axis $p$]
• Linear in log–log: $a = 1.1011 \pm 0.0134$
• Power law: $a = 1.0849 \pm 0.0064$

Auger surface detector signal
• Generative parameters: $\theta = t_\mu$ [ns], the muon arrival times
  [Figure: muon arrival times marked at 700, 725, 750, 900 ns]
• Signal: $x$ = observed signal
  [Figure: observed signal $x$ (red: muons; blue: photons); axes $t$ [ns] from 0 to 3500 and signal in VEM from 0 to 2]

Auger surface detector signal
• Generation, simulation: from $\theta$ down to $x$ via $p(x \mid \theta)$
  [Figures: muon arrival times $\theta = t_\mu$ [ns] (700, 725, 750, 900) above the observed signal $x$ in VEM vs. $t$ [ns] (red: muons; blue: photons)]

Auger surface detector signal
• Estimation, inference: from $x$ up to $\theta$ via $p(\theta \mid x)$
  [Figures: muon arrival times $\theta = t_\mu$ [ns] (700, 725, 750, 900) above the observed signal $x$ in VEM vs. $t$ [ns] (red: muons; blue: photons)]