SLIDE 1

Research in AppStat

AppStat: Applied Statistics and Machine Learning
AppStat: Apprentissage Automatique et Statistique Appliquée

Balázs Kégl

Linear Accelerator Laboratory, CNRS/University of Paris Sud
Conseil Scientifique, Dec 13, 2010


SLIDE 2

Research statement

[Diagram: Computer Science ↔ Experimental Physics. Computer science contributes data analysis methodology; experimental physics contributes real data, motivation, and experimental rigor.]

  • This collaboration model works extremely well in bioinformatics

SLIDE 3

Two recent examples

From: Robert Sulej <Robert.Sulej@cern.ch>
Subject: PLA application
Date: 2010 December 04 22:51:40 GMT+01:00
To: Balázs Kégl <kegl@iro.umontreal.ca>

Dear Balazs, We are working on the particle track reconstruction for ICARUS experiment at LNGS, placed in the underground lab in Gran Sasso, Italy. Goal of the experiment is to study properties of neutrinos coming from natural sources and from artificial beam created at CERN (CNGS beam) [...] We have found the Polygonal Line Algorithm very efficient in fitting the particle trajectories. The work was started with your Java applets and simulation of physics data. Now we have implemented the algorithm in the collaboration software to use it on the real data collected this year from neutrino interactions. First results are very promising [...] regards, Dorota Stefan, Robert Sulej, and the Icarus software group

SLIDE 4

Two recent examples

SLIDE 5

Scientific path

  • Hungary
    – 1989–94: M.Eng. Computer Science, BUTE
    – 1994–95: research assistant, BUTE
  • Canada
    – 1995–99: Ph.D. Computer Science, Concordia U
    – 2000: postdoc, Queen’s U
    – 2001–06: assistant professor, U of Montreal
  • France
    – 2006– : research scientist (CR1), CNRS / U Paris Sud

  • Research interests: machine learning, pattern recognition, signal processing, applied statistics

  • Applications: image and music processing, bioinformatics, software engineering, grid control, experimental physics

SLIDE 6

The team

  • B. Kégl (team leader, 2006–): boosting, MCMC, Auger
  • R. Busa-Fekete (postdoc, 2008–): boosting, optimization, SysBio
  • R. Bardenet (Ph.D. student, 2009–): MCMC, optimization, Auger
  • D. Benbouzid (Ph.D. student, 2010–): boosting, JEM-EUSO
  • F-D. Collin (software engineer, from 01/12/2010): multiboost.org, MCMC in ROOT, system integration
  • D. Garcia (postdoc, from 01/01/2011): generative models, Auger / JEM-EUSO

SLIDE 7

Collaborations

[Diagram: collaboration map. Computer science partners: LTCI (Telecom ParisTech), TAO, LRI, ESBG, Hungarian Academy of Sciences. Experimental science partners at LAL, around AppStat: Auger, JEM-EUSO, ILC, LSST, etc. Existing and future links are labeled boosting, optimization, MCMC, drug cocktail optimization, trigger boosting, and MCMC reconstruction.]

SLIDE 8

Funding

  • ANR “jeune chercheur” MetaModel: 2007–2010, 150 k€
  • ANR “COSINUS” Siminole: 2010–2014, 1043 k€ (658 k€ at LAL)
  • MRM Grille Paris Sud: 2010–2012, 60 k€ (31 k€ at LAL)

SLIDE 9

Siminole within ANR COSINUS

  • Simulation: the third pillar of scientific discovery
  • Improving simulation:
    – algorithmic development inside the simulator
    – implementation on high-end computing devices
    – our approach: control the number of calls to the simulator

SLIDE 10

Siminole within ANR COSINUS

  • Optimization: simulate from f(x), find argmax_x f(x)
  • Inference: simulate from p(x | θ), find p(θ | x)
  • Discriminative learning (aka MVA): simulate from p(x, θ), find θ = f(x)

SLIDE 11

Inference: p(x | θ) → p(θ | x)

[Figure: E_FD [EeV] vs S38 [VEM] on log-log axes, with fitted relations E_FD ≈ 0.137 · S38^1.085 and E_FD ≈ 0.139 · S38^1.08. Caption: piecewise power law fit with cut at E_FD = 3 EeV.]

SLIDE 12

Inference: p(x | θ) → p(θ | x)

[Figure repeated from the previous slide: piecewise power law fit with cut at E_FD = 3 EeV.]

  • The data: D = {(x_i, y_i, σ_x_i, σ_y_i)}, i = 1, …, n
  • The parameters to estimate: θ = {a, a2, c} (see the sketch below)
  • Nuisance parameters: the “projections” x̃_i
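
The slides do not spell out the parametrization of f_θ; below is a minimal sketch of one plausible reading, a broken power law that is piecewise linear in log-log space with slopes a and a2, log-normalization c, and the break fixed at E_FD = 3 EeV (the cut quoted above). All names are illustrative, not the collaboration's code.

```python
import numpy as np

LOG_E_BREAK = np.log(3.0)  # assumed break at E_FD = 3 EeV (the quoted cut)

def f_theta(log_s38, a, a2, c):
    """Broken power law in log-log space: log E_FD as a function of log S38.

    Slope a below the break, a2 above it; c is the log-normalization.
    The two branches are joined continuously at the break.
    """
    log_s38 = np.asarray(log_s38, dtype=float)
    log_s_break = (LOG_E_BREAK - c) / a            # where branch 1 reaches the break
    low = c + a * log_s38                          # slope a below the break
    high = LOG_E_BREAK + a2 * (log_s38 - log_s_break)  # slope a2 above it
    return np.where(log_s38 <= log_s_break, low, high)
```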

SLIDE 13

The likelihood

p(x, y, σ_x, σ_y | θ, x̃) = 1/(2π σ_x σ_y) · exp[ −(1/2) ( (x − x̃)²/σ_x² + (y − f_θ(x̃))²/σ_y² ) ].

[Figure: a data point (x, y) with error bars and a candidate projection (x̃, f_θ(x̃)) on the fitted curve.]
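
Transcribed directly into code, this density becomes the building block for the projection and marginalization steps on the next slides. A sketch; here `f_theta` stands for the fitted curve y = f_θ(x) with the parameters θ already bound in (e.g. a lambda closing over a, a2, c):

```python
import numpy as np

def log_likelihood_point(x, y, sx, sy, x_tilde, f_theta):
    """log p(x, y, sx, sy | theta, x_tilde): a product of two Gaussians
    centered on the projection point (x_tilde, f_theta(x_tilde))."""
    return (-np.log(2.0 * np.pi * sx * sy)
            - 0.5 * ((x - x_tilde) ** 2 / sx ** 2
                     + (y - f_theta(x_tilde)) ** 2 / sy ** 2))
```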

SLIDE 14

The maximum likelihood projection

x̃* = argmax_x̃ p(x, y, σ_x, σ_y | θ, x̃)

[Figure: the same data point with its maximum likelihood projection (x̃*, f_θ(x̃*)) on the fitted curve.]
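
Numerically this is a one-dimensional search. The slides do not say how it is carried out, so the bounded optimization below, reusing `log_likelihood_point` from the previous sketch with an assumed ±5σ_x search window, is just one workable choice:

```python
from scipy.optimize import minimize_scalar

def ml_projection(x, y, sx, sy, f_theta):
    """x_tilde* = argmax over x_tilde of p(x, y, sx, sy | theta, x_tilde)."""
    result = minimize_scalar(
        lambda xt: -log_likelihood_point(x, y, sx, sy, xt, f_theta),
        bounds=(x - 5.0 * sx, x + 5.0 * sx),  # assumed search window
        method="bounded")
    return result.x
```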

SLIDE 15

Marginalizing over the projection

p(x, y, σ_x, σ_y | θ) = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y, x̃ | θ) dx̃
                      = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y | x̃, θ) p(x̃ | θ) dx̃
                      = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y | x̃, θ) p(x̃) dx̃.

[Figure: the same data point (x, y); the projection x̃ has been integrated out.]
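
The same integral evaluated by one-dimensional quadrature; the prior p(x̃) is not specified on the slide, so it is passed in here as a density function. A sketch, not the actual implementation:

```python
import numpy as np
from scipy.integrate import quad

def marginal_likelihood(x, y, sx, sy, f_theta, prior):
    """p(x, y, sx, sy | theta) with the projection x_tilde integrated out."""
    def integrand(xt):
        return np.exp(log_likelihood_point(x, y, sx, sy, xt, f_theta)) * prior(xt)
    value, _abs_err = quad(integrand, -np.inf, np.inf)
    return value
```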

SLIDE 16

Inference: p(x | θ) → p(θ | x)

  • Maximum likelihood estimate:
    θ* = argmax_θ p(x, y, σ_x, σ_y | θ)
  • Bayesian estimate:
    – Bayes theorem: p(θ | x, y, σ_x, σ_y) = p(x, y, σ_x, σ_y | θ) p(θ) / ∫ p(x, y, σ_x, σ_y | θ′) p(θ′) dθ′
    – θ* = E[θ | x, y, σ_x, σ_y] (the posterior mean)

SLIDE 17

The Metropolis-Hastings algorithm

  • Parameters to estimate: θ = {a, a2, c} (plus the projections x̃)
  • Data: D = {(x_i, y_i, σ_x_i, σ_y_i)}, i = 1, …, n

METROPOLIS-HASTINGS(D)
    sample ← {}
    θ ← θ_init
    do
        θ_candidate ← θ + perturbation
        posterior-ratio ← [p(D | θ_candidate) p(θ_candidate)] / [p(D | θ) p(θ)]
        if posterior-ratio > r ∼ U[0, 1]
            θ ← θ_candidate
        sample ← sample ∪ {θ}
    until convergence
    return sample
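
A compact Python rendering of the pseudocode, as a sketch only: it assumes a Gaussian random-walk proposal, works with log-densities for numerical stability, and replaces the convergence test with a fixed step count. `step_size` and `seed` are illustrative parameters, not part of the slide:

```python
import numpy as np

def metropolis_hastings(log_posterior, theta_init, n_steps=10000,
                        step_size=0.1, seed=0):
    """Random-walk Metropolis-Hastings. `log_posterior(theta)` must return
    log p(D | theta) + log p(theta), up to an additive constant."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta_init, dtype=float)
    log_p = log_posterior(theta)
    sample = []
    for _ in range(n_steps):
        candidate = theta + step_size * rng.standard_normal(theta.shape)
        log_p_candidate = log_posterior(candidate)
        # accept if posterior-ratio > r ~ U[0, 1], compared in log space
        if log_p_candidate - log_p > np.log(rng.uniform()):
            theta, log_p = candidate, log_p_candidate
        sample.append(theta.copy())  # current state is stored either way
    return np.array(sample)
```

The Bayesian estimate of the previous slide is then just the mean of the stored sample after discarding a burn-in prefix, e.g. `sample[burn_in:].mean(axis=0)`.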

SLIDE 18

Slope posteriors

[Figure: two posterior histograms of the slope a. Left, linear fit in log-log: a = 1.1011 ± 0.0134. Right, power law fit: a = 1.0849 ± 0.0064.]

SLIDE 19

Auger surface detector signal

  • Generative parameters: θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
  • Signal: x = [figure: observed signal x, VEM vs t [ns]; red: muons, blue: photons]

SLIDE 20

Auger surface detector signal

  • Generation, simulation:
    θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
      ↓ p(x | θ)
    x = [figure: observed signal x; red: muons, blue: photons]

SLIDE 21

Auger surface detector signal

  • Estimation, inference:
    θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
      ↑ p(θ | x)
    x = [figure: observed signal x; red: muons, blue: photons]

SLIDE 22

The prior, the signal, and the posterior

[Figure, left: arrival time prior distributions p(t_µ) and p(t_γ) at r = 1270 m. Right: muon arrival time posterior histogram p(t_µ | x).]

SLIDE 23

Inference: p(x | θ) → p(θ | x)

  • Research questions:
    – other sampling algorithms (sequential Monte Carlo, particle filters, Hamiltonian MCMC)
    – unknown number of parameters
    – adaptive MCMC
    – connections to adaptive stochastic optimization
    – large-scale issues (grid, multicore, GPU)

SLIDE 24

Discriminative learning: p(x | θ) → θ = f(x)

[Figure: ‘Two Moons’ data for a two-class classification problem, plotted in the (x1, x2) plane.]

SLIDE 25

Discriminative learning: p(x | θ) → θ = f(x)

[Figure: discriminant function with Parzen fits, h = 0.12, on the Two Moons data.]
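
The slide does not define the Parzen construction explicitly; below is a minimal sketch of one standard reading, a Gaussian-kernel density estimate per class with the quoted bandwidth h = 0.12, combined so that g maps into [−1, 1] as required on the next slide:

```python
import numpy as np

def parzen_discriminant(X_pos, X_neg, h=0.12):
    """Return g(x) = (p_pos(x) - p_neg(x)) / (p_pos(x) + p_neg(x)),
    with each class density a Gaussian Parzen-window estimate."""
    def density(X_train, x):
        sq_dist = np.sum((X_train - x) ** 2, axis=1)
        return np.mean(np.exp(-sq_dist / (2.0 * h ** 2)))
    def g(x):
        p_pos = density(X_pos, x)
        p_neg = density(X_neg, x)
        return (p_pos - p_neg) / (p_pos + p_neg + 1e-300)  # guard far from data
    return g
```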

SLIDE 26

Discriminative learning: p(x | θ) → θ = f(x)

  • observation vector: x ∈ R^d
  • class label: θ ∈ {−1, 1} – binary classification
  • class label: θ ∈ {1, …, K} – multi-class classification
  • classifier: f : R^d → {−1, 1}
  • discriminant function: g : R^d → [−1, 1], with

    f(x) = 1 if g(x) ≥ 0, and −1 if g(x) < 0

SLIDE 27

Discriminative learning: p(x | θ) → θ = f(x)

  • Inductive learning:
    – training sample: D_n = ((x_1, θ_1), …, (x_n, θ_n))
    – function set: F
    – learning algorithm: ALGO : (R^d × {−1, 1})^n → F, ALGO(D_n) → f, g
    – goal: small generalization error P(f(X) ≠ Θ)

SLIDE 28

History of discriminative learning

  • Algorithms:
    – 1958: Perceptron [Rosenblatt ’58] – [Minsky–Papert ’69]
    – 1986: multilayer perceptrons (neural networks) and the back-propagation algorithm [Rumelhart–Hinton–Williams ’86]
    – 1995: support vector machines [Boser–Guyon–Vapnik ’92], [Cortes–Vapnik ’95]
    – 1997: boosting, AdaBoost [Freund ’95], [Freund–Schapire ’97]

SLIDE 29

Discriminative learning: HEP applications

Boosted decision trees in HEP studies:

  • MiniBooNE (e.g. physics/0408124, NIM A543:577-584; physics/0508045, NIM A555:370-385; hep-ex/0704.1500)
  • D0 single top evidence (PRL 98:181802, 2007; PRD 78:012005, 2008)
  • D0 and CDF single top quark observation (PRL 103:092001, 2009; PRL 103:092002, 2009)
  • D0 tau ID and single top search (in press in PLB)
  • GLAST (same code as D0)
  • BaBar (hep-ex/0607112)
  • ATLAS: diboson analyses, SUSY analysis (hep-ph/0605106, JHEP 0607:040), single top CSC note, tau ID
  • b-tagging for LHC (physics/0702041)
  • electron ID in CMS
  • more and more underway

[From: Yann Coadou (CPPM), “Boosted decision trees”, SOS2010, Autrans, 20 May 2010, slide 46/50]

SLIDE 30

Discriminative learning: HEP applications

SLIDE 31

Research questions

  • Classical setup:
    – small generalization error P(f(X) ≠ Θ)
  • Neyman-Pearson setup (triggers, tests):
    – high true positive rate P(g(X) ≥ b | Θ = signal)
    – at a given false positive rate P(g(X) ≥ b | Θ = background) ≤ α
    (see the sketch below)
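
In practice the threshold b can be calibrated empirically; below is a minimal sketch, assuming held-out background and signal score samples (`g_background` and `g_signal` are illustrative names, not from the slides):

```python
import numpy as np

def neyman_pearson_threshold(g_background, alpha):
    """Choose b so that roughly a fraction alpha of background events
    satisfy g(X) >= b: the empirical (1 - alpha) quantile of the
    background scores."""
    return np.quantile(np.asarray(g_background), 1.0 - alpha)

# The achieved true positive rate is then measured on a signal sample:
#   b = neyman_pearson_threshold(g_background, alpha=1e-3)
#   tpr = np.mean(np.asarray(g_signal) >= b)
```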

SLIDE 32

Research questions

  • Research goals:
    – algorithmic and theoretical questions of Neyman-Pearson boosting
    – automatic cascade and DAG design
    – extending boosting to regression, ranking, reinforcement learning
    – large-scale issues (grid, multi-core, GPU)
    – multiboost.org

SLIDE 33

Collaborations

[Diagram repeated from Slide 7: collaboration map. Computer science partners: LTCI (Telecom ParisTech), TAO, LRI, ESBG, Hungarian Academy of Sciences. Experimental science partners at LAL, around AppStat: Auger, JEM-EUSO, ILC, LSST, etc. Existing and future links are labeled boosting, optimization, MCMC, drug cocktail optimization, trigger boosting, and MCMC reconstruction.]

SLIDE 34

The team

  • B. Kégl (team leader, 2006–): boosting, MCMC, Auger
  • R. Busa-Fekete (postdoc, 2008–): boosting, optimization, SysBio
  • R. Bardenet (Ph.D. student, 2009–): MCMC, optimization, Auger
  • D. Benbouzid (Ph.D. student, 2010–): boosting, JEM-EUSO
  • F-D. Collin (software engineer, from 01/12/2010): multiboost.org, MCMC in ROOT, system integration
  • D. Garcia (postdoc, from 01/01/2011): generative models, Auger / JEM-EUSO