SLIDE 1

Research in AppStat

AppStat: Applied Statistics and Machine Learning
AppStat: Apprentissage Automatique et Statistique Appliquée

Balázs Kégl

Linear Accelerator Laboratory, CNRS/University of Paris Sud
Conseil Scientifique, Dec 13, 2010


SLIDE 2

Research statement

[Diagram: Computer Science ↔ Experimental Physics. Computer science contributes data analysis methodology; experimental physics contributes real data, motivation, and experimental rigor.]

  • This collaboration model works extremely well in bioinformatics

SLIDE 3

Two recent examples

From: Robert Sulej <Robert.Sulej@cern.ch>
Subject: PLA application
Date: 2010 December 04 22:51:40 GMT+01:00
To: Balázs Kégl <kegl@iro.umontreal.ca>

Dear Balazs, We are working on the particle track reconstruction for ICARUS experiment at LNGS, placed in the underground lab in Gran Sasso, Italy. Goal of the experiment is to study properties of neutrinos coming from natural sources and from artificial beam created at CERN (CNGS beam) [...] We have found the Polygonal Line Algorithm very efficient in fitting the particle trajectories. The work was started with your Java applets and simulation of physics data. Now we have implemented the algorithm in the collaboration software to use it on the real data collected this year from neutrino interactions. First results are very promising [...] regards, Dorota Stefan, Robert Sulej, and the Icarus software group

SLIDE 4

Two recent examples

SLIDE 5

Scientific path

  • Hungary
    – 1989–94: M.Eng. Computer Science, BUTE
    – 1994–95: research assistant, BUTE
  • Canada
    – 1995–99: Ph.D. Computer Science, Concordia U
    – 2000: postdoc, Queen’s U
    – 2001–06: assistant professor, U of Montreal
  • France
    – 2006– : research scientist (CR1), CNRS / U Paris Sud

  • Research interests: machine learning, pattern recognition, signal processing, applied statistics

  • Applications: image and music processing, bioinformatics, software engineering, grid control, experimental physics

SLIDE 6

The team

  • B. Kégl (team leader, 2006–): boosting, MCMC, Auger
  • R. Busa-Fekete (postdoc, 2008–): boosting, optimization, SysBio
  • R. Bardenet (Ph.D. student, 2009–): MCMC, optimization, Auger
  • D. Benbouzid (Ph.D. student, 2010–): boosting, JEM-EUSO
  • F-D. Collin (software engineer, from 01/12/2010): multiboost.org, MCMC in ROOT, system integration
  • D. Garcia (postdoc, from 01/01/2011): generative models, Auger / JEM-EUSO

SLIDE 7

Collaborations

[Diagram: collaboration map. Computer science partners: LTCI (Telecom ParisTech), TAO, LRI, ESBG, Hungarian Academy of Sciences. Experimental science partners at LAL, around AppStat: Auger, JEM-EUSO, ILC, LSST, etc. Existing and future links are labeled boosting, optimization, MCMC, drug cocktail optimization, trigger boosting, and MCMC reconstruction.]

SLIDE 8

Funding

  • ANR “jeune chercheur” MetaModel: 2007–2010, 150 k€
  • ANR “COSINUS” Siminole: 2010–2014, 1043 k€ (658 k€ at LAL)
  • MRM Grille Paris Sud: 2010–2012, 60 k€ (31 k€ at LAL)

SLIDE 9

Siminole within ANR COSINUS

  • Simulation: the third pillar of scientific discovery
  • Improving simulation:
    – algorithmic development inside the simulator
    – implementation on high-end computing devices
    – our approach: control the number of calls to the simulator

SLIDE 10

Siminole within ANR COSINUS

  • Optimization: simulate from f(x), find argmax_x f(x)
  • Inference: simulate from p(x | θ), find p(θ | x)
  • Discriminative learning (aka MVA): simulate from p(x, θ), find θ = f(x)

SLIDE 11

Inference: p(x | θ) → p(θ | x)

[Figure: E_FD [EeV] vs S38 [VEM] on log-log axes, with fitted relations E_FD ≈ 0.137 · S38^1.085 and E_FD ≈ 0.139 · S38^1.08. Caption: piecewise power law fit with cut at E_FD = 3 EeV.]

SLIDE 12

Inference: p(x | θ) → p(θ | x)

[Figure repeated from the previous slide: piecewise power law fit with cut at E_FD = 3 EeV.]

  • The data: D = {(x_i, y_i, σ_x_i, σ_y_i)}, i = 1, …, n
  • The parameters to estimate: θ = {a, a2, c} (see the sketch below)
  • Nuisance parameters: the “projections” x̃_i
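
The slides do not spell out the parametrization of f_θ; below is a minimal sketch of one plausible reading, a broken power law that is piecewise linear in log-log space with slopes a and a2, log-normalization c, and the break fixed at E_FD = 3 EeV (the cut quoted above). All names are illustrative, not the collaboration's code.

```python
import numpy as np

LOG_E_BREAK = np.log(3.0)  # assumed break at E_FD = 3 EeV (the quoted cut)

def f_theta(log_s38, a, a2, c):
    """Broken power law in log-log space: log E_FD as a function of log S38.

    Slope a below the break, a2 above it; c is the log-normalization.
    The two branches are joined continuously at the break.
    """
    log_s38 = np.asarray(log_s38, dtype=float)
    log_s_break = (LOG_E_BREAK - c) / a            # where branch 1 reaches the break
    low = c + a * log_s38                          # slope a below the break
    high = LOG_E_BREAK + a2 * (log_s38 - log_s_break)  # slope a2 above it
    return np.where(log_s38 <= log_s_break, low, high)
```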

SLIDE 13

The likelihood

p(x, y, σ_x, σ_y | θ, x̃) = 1/(2π σ_x σ_y) · exp[ −(1/2) ( (x − x̃)²/σ_x² + (y − f_θ(x̃))²/σ_y² ) ].

[Figure: a data point (x, y) with error bars and a candidate projection (x̃, f_θ(x̃)) on the fitted curve.]
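
Transcribed directly into code, this density becomes the building block for the projection and marginalization steps on the next slides. A sketch; here `f_theta` stands for the fitted curve y = f_θ(x) with the parameters θ already bound in (e.g. a lambda closing over a, a2, c):

```python
import numpy as np

def log_likelihood_point(x, y, sx, sy, x_tilde, f_theta):
    """log p(x, y, sx, sy | theta, x_tilde): a product of two Gaussians
    centered on the projection point (x_tilde, f_theta(x_tilde))."""
    return (-np.log(2.0 * np.pi * sx * sy)
            - 0.5 * ((x - x_tilde) ** 2 / sx ** 2
                     + (y - f_theta(x_tilde)) ** 2 / sy ** 2))
```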

SLIDE 14

The maximum likelihood projection

x̃* = argmax_x̃ p(x, y, σ_x, σ_y | θ, x̃)

[Figure: the same data point with its maximum likelihood projection (x̃*, f_θ(x̃*)) on the fitted curve.]
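
Numerically this is a one-dimensional search. The slides do not say how it is carried out, so the bounded optimization below, reusing `log_likelihood_point` from the previous sketch with an assumed ±5σ_x search window, is just one workable choice:

```python
from scipy.optimize import minimize_scalar

def ml_projection(x, y, sx, sy, f_theta):
    """x_tilde* = argmax over x_tilde of p(x, y, sx, sy | theta, x_tilde)."""
    result = minimize_scalar(
        lambda xt: -log_likelihood_point(x, y, sx, sy, xt, f_theta),
        bounds=(x - 5.0 * sx, x + 5.0 * sx),  # assumed search window
        method="bounded")
    return result.x
```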

SLIDE 15

Marginalizing over the projection

p(x, y, σ_x, σ_y | θ) = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y, x̃ | θ) dx̃
                      = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y | x̃, θ) p(x̃ | θ) dx̃
                      = ∫_{−∞}^{∞} p(x, y, σ_x, σ_y | x̃, θ) p(x̃) dx̃.

[Figure: the same data point (x, y); the projection x̃ has been integrated out.]
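
The same integral evaluated by one-dimensional quadrature; the prior p(x̃) is not specified on the slide, so it is passed in here as a density function. A sketch, not the actual implementation:

```python
import numpy as np
from scipy.integrate import quad

def marginal_likelihood(x, y, sx, sy, f_theta, prior):
    """p(x, y, sx, sy | theta) with the projection x_tilde integrated out."""
    def integrand(xt):
        return np.exp(log_likelihood_point(x, y, sx, sy, xt, f_theta)) * prior(xt)
    value, _abs_err = quad(integrand, -np.inf, np.inf)
    return value
```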

SLIDE 16

Inference: p(x | θ) → p(θ | x)

  • Maximum likelihood estimate:
    θ* = argmax_θ p(x, y, σ_x, σ_y | θ)
  • Bayesian estimate:
    – Bayes theorem: p(θ | x, y, σ_x, σ_y) = p(x, y, σ_x, σ_y | θ) p(θ) / ∫ p(x, y, σ_x, σ_y | θ′) p(θ′) dθ′
    – θ* = E[θ | x, y, σ_x, σ_y] (the posterior mean)

SLIDE 17

The Metropolis-Hastings algorithm

  • Parameters to estimate: θ = {a, a2, c} (plus the projections x̃)
  • Data: D = {(x_i, y_i, σ_x_i, σ_y_i)}, i = 1, …, n

METROPOLIS-HASTINGS(D)
    sample ← {}
    θ ← θ_init
    do
        θ_candidate ← θ + perturbation
        posterior-ratio ← [p(D | θ_candidate) p(θ_candidate)] / [p(D | θ) p(θ)]
        if posterior-ratio > r ∼ U[0, 1]
            θ ← θ_candidate
        sample ← sample ∪ {θ}
    until convergence
    return sample
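
A compact Python rendering of the pseudocode, as a sketch only: it assumes a Gaussian random-walk proposal, works with log-densities for numerical stability, and replaces the convergence test with a fixed step count. `step_size` and `seed` are illustrative parameters, not part of the slide:

```python
import numpy as np

def metropolis_hastings(log_posterior, theta_init, n_steps=10000,
                        step_size=0.1, seed=0):
    """Random-walk Metropolis-Hastings. `log_posterior(theta)` must return
    log p(D | theta) + log p(theta), up to an additive constant."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta_init, dtype=float)
    log_p = log_posterior(theta)
    sample = []
    for _ in range(n_steps):
        candidate = theta + step_size * rng.standard_normal(theta.shape)
        log_p_candidate = log_posterior(candidate)
        # accept if posterior-ratio > r ~ U[0, 1], compared in log space
        if log_p_candidate - log_p > np.log(rng.uniform()):
            theta, log_p = candidate, log_p_candidate
        sample.append(theta.copy())  # current state is stored either way
    return np.array(sample)
```

The Bayesian estimate of the previous slide is then just the mean of the stored sample after discarding a burn-in prefix, e.g. `sample[burn_in:].mean(axis=0)`.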

SLIDE 18

Slope posteriors

[Figure: two posterior histograms of the slope a. Left, linear fit in log-log: a = 1.1011 ± 0.0134. Right, power law fit: a = 1.0849 ± 0.0064.]

SLIDE 19

Auger surface detector signal

  • Generative parameters: θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
  • Signal: x = [figure: observed signal x, VEM vs t [ns]; red: muons, blue: photons]

SLIDE 20

Auger surface detector signal

  • Generation, simulation:
    θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
      ↓ p(x | θ)
    x = [figure: observed signal x; red: muons, blue: photons]

SLIDE 21

Auger surface detector signal

  • Estimation, inference:
    θ = muon arrival times t_µ [ns]: 700, 725, 750, 900
      ↑ p(θ | x)
    x = [figure: observed signal x; red: muons, blue: photons]

SLIDE 22

The prior, the signal, and the posterior

[Figure, left: arrival time prior distributions p(t_µ) and p(t_γ) at r = 1270 m. Right: muon arrival time posterior histogram p(t_µ | x).]

SLIDE 23

Inference: p(x | θ) → p(θ | x)

  • Research questions:
    – other sampling algorithms (sequential Monte Carlo, particle filters, Hamiltonian MCMC)
    – unknown number of parameters
    – adaptive MCMC
    – connections to adaptive stochastic optimization
    – large-scale issues (grid, multicore, GPU)

SLIDE 24

Discriminative learning: p(x | θ) → θ = f(x)

[Figure: ‘Two Moons’ data for a two-class classification problem, plotted in the (x1, x2) plane.]

SLIDE 25

Discriminative learning: p(x | θ) → θ = f(x)

[Figure: discriminant function with Parzen fits, h = 0.12, on the Two Moons data.]
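
The slide does not define the Parzen construction explicitly; below is a minimal sketch of one standard reading, a Gaussian-kernel density estimate per class with the quoted bandwidth h = 0.12, combined so that g maps into [−1, 1] as required on the next slide:

```python
import numpy as np

def parzen_discriminant(X_pos, X_neg, h=0.12):
    """Return g(x) = (p_pos(x) - p_neg(x)) / (p_pos(x) + p_neg(x)),
    with each class density a Gaussian Parzen-window estimate."""
    def density(X_train, x):
        sq_dist = np.sum((X_train - x) ** 2, axis=1)
        return np.mean(np.exp(-sq_dist / (2.0 * h ** 2)))
    def g(x):
        p_pos = density(X_pos, x)
        p_neg = density(X_neg, x)
        return (p_pos - p_neg) / (p_pos + p_neg + 1e-300)  # guard far from data
    return g
```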

SLIDE 26

Discriminative learning: p(x | θ) → θ = f(x)

  • observation vector: x ∈ R^d
  • class label: θ ∈ {−1, 1} – binary classification
  • class label: θ ∈ {1, …, K} – multi-class classification
  • classifier: f : R^d → {−1, 1}
  • discriminant function: g : R^d → [−1, 1], with

    f(x) = 1 if g(x) ≥ 0, and −1 if g(x) < 0

SLIDE 27

Discriminative learning: p(x | θ) → θ = f(x)

  • Inductive learning:
    – training sample: D_n = ((x_1, θ_1), …, (x_n, θ_n))
    – function set: F
    – learning algorithm: ALGO : (R^d × {−1, 1})^n → F, ALGO(D_n) → f, g
    – goal: small generalization error P(f(X) ≠ Θ)

SLIDE 28

History of discriminative learning

  • Algorithms:
    – 1958: Perceptron [Rosenblatt ’58] – [Minsky–Papert ’69]
    – 1986: multilayer perceptrons (neural networks) and the back-propagation algorithm [Rumelhart–Hinton–Williams ’86]
    – 1995: support vector machines [Boser–Guyon–Vapnik ’92], [Cortes–Vapnik ’95]
    – 1997: boosting, AdaBoost [Freund ’95], [Freund–Schapire ’97]

SLIDE 29

Discriminative learning: HEP applications

Boosted decision trees in HEP studies:

  • MiniBooNE (e.g. physics/0408124, NIM A543:577-584; physics/0508045, NIM A555:370-385; hep-ex/0704.1500)
  • D0 single top evidence (PRL 98:181802, 2007; PRD 78:012005, 2008)
  • D0 and CDF single top quark observation (PRL 103:092001, 2009; PRL 103:092002, 2009)
  • D0 tau ID and single top search (in press in PLB)
  • GLAST (same code as D0)
  • BaBar (hep-ex/0607112)
  • ATLAS: diboson analyses, SUSY analysis (hep-ph/0605106, JHEP 0607:040), single top CSC note, tau ID
  • b-tagging for LHC (physics/0702041)
  • electron ID in CMS
  • more and more underway

[From: Yann Coadou (CPPM), “Boosted decision trees”, SOS2010, Autrans, 20 May 2010, slide 46/50]

SLIDE 30

Discriminative learning: HEP applications

SLIDE 31

Research questions

  • Classical setup:
    – small generalization error P(f(X) ≠ Θ)
  • Neyman-Pearson setup (triggers, tests):
    – high true positive rate P(g(X) ≥ b | Θ = signal)
    – at a given false positive rate P(g(X) ≥ b | Θ = background) ≤ α
    (see the sketch below)
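
In practice the threshold b can be calibrated empirically; below is a minimal sketch, assuming held-out background and signal score samples (`g_background` and `g_signal` are illustrative names, not from the slides):

```python
import numpy as np

def neyman_pearson_threshold(g_background, alpha):
    """Choose b so that roughly a fraction alpha of background events
    satisfy g(X) >= b: the empirical (1 - alpha) quantile of the
    background scores."""
    return np.quantile(np.asarray(g_background), 1.0 - alpha)

# The achieved true positive rate is then measured on a signal sample:
#   b = neyman_pearson_threshold(g_background, alpha=1e-3)
#   tpr = np.mean(np.asarray(g_signal) >= b)
```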

SLIDE 32

Research questions

  • Research goals:
    – algorithmic and theoretical questions of Neyman-Pearson boosting
    – automatic cascade and DAG design
    – extending boosting to regression, ranking, reinforcement learning
    – large-scale issues (grid, multi-core, GPU)
    – multiboost.org

SLIDE 33

Collaborations

[Diagram repeated from Slide 7: collaboration map. Computer science partners: LTCI (Telecom ParisTech), TAO, LRI, ESBG, Hungarian Academy of Sciences. Experimental science partners at LAL, around AppStat: Auger, JEM-EUSO, ILC, LSST, etc. Existing and future links are labeled boosting, optimization, MCMC, drug cocktail optimization, trigger boosting, and MCMC reconstruction.]

SLIDE 34

The team

  • B. Kégl (team leader, 2006–): boosting, MCMC, Auger
  • R. Busa-Fekete (postdoc, 2008–): boosting, optimization, SysBio
  • R. Bardenet (Ph.D. student, 2009–): MCMC, optimization, Auger
  • D. Benbouzid (Ph.D. student, 2010–): boosting, JEM-EUSO
  • F-D. Collin (software engineer, from 01/12/2010): multiboost.org, MCMC in ROOT, system integration
  • D. Garcia (postdoc, from 01/01/2011): generative models, Auger / JEM-EUSO