SLIDE 1

Higgs Machine Learning Challenge experience. A HEP pattern recognition challenge ?

David Rousseau LAL-Orsay 10th February 2015

CTD 2015, Berkeley

SLIDE 2

Outline

q Machine Learning, Challenges … q The Higgs Machine Learning challenge q A HEP pattern recognition challenge ?

David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley

SLIDE 3

Machine Learning and HEP

q Neural Nets used somewhat in the 90’ies (e.g. LEP) q BDT (Adaboost) invented in 97 q MVA techniques (= Machine Learning) have been used extensively at D0/CDF (mostly BDT, but not only) in the 00’ies q Atlas/CMS less eager to adopt MVA at LHC starts for some good reasons:

  • Need to understand well the input variables first
  • Still a lot to gain by improving input variables
  • Systematics more difficult to evaluate
  • Collected luminosity was increasing fast

q But lot of work recently with MVA techniques

  • Competition
  • Best use of available data

q Meanwhile Neural Net reappear in their “deep” incantation (See Peter Sadowski’s talk this afternoon)

SLIDE 4

Machine Learning in HEP (2)

q However:

  • TMVA, within Root, has been instrumental in popularising MVA

technique within HEP

  • Most people using TMVA, most people using BDT in TMVA
  • Although getting a reasonable answer from TMVA is quick and

easy, it takes time to really become an expert with e.g. BDT

  • People are focussing on the choices of input variables and the

evaluation of systematics (which of course are excellent things to do)

q Not much work on studying possible better MVA techniques, for which you need the software and the know-how

SLIDE 5

Challenge ?

q Challenges have become in the last 10 years a common way of working for the machine learning community q Machine learning scientists are eager to test their algorithms on real life problemsèmore valuable(=publisheable) than artificial problems q Company or academics want to outsource a problem to machine learning scientist, but also geeks etc. The company sets up a challenge like:

  • Netflix : predict movie preference from past movie selection
  • Gesture recognition
  • Separating pictures of cats from pictures of dogs
  • NASA/JPL mapping dark matter through (simulated) galaxy distortion

q Some companies makes a business from organising challenges: datascience.net, kaggle q A few recent examples now…

SLIDE 6

Looking at People (2012-14)

Wave Point Clap Shake Hands Hug Fight Actions Interactions


http://chalearn.org/

SLIDE 7

Neural connectomics (2015)

http://chalearn.org/

SLIDE 8

NEW: AutoML challenge (2015)

Fully automatic machine learning without ANY human intervention

http://codalab.org/AutoML
December 2014 – May 2015
$30,000 in prizes

SLIDE 9

Why do challenges work?

Olga Kokshagina 2015

MOTIVATION OF ORGANIZING CONTESTS: EXTREME VALUE

Courtesy: Lakhani 2014

OI is suitable for a variety of unconventional, surprising ideas that are “far” from traditional expertise → high volatility
Experts are highly skilled and trained → more focused, well-performing solutions, low variety

Not just ML, but a general trend: Open Innovation

SLIDE 10

From domain to challenge and back

[Diagram: a domain problem (e.g. in HEP) is simplified into a challenge problem; the crowd solves the challenge problem; the solution is re-imported into the domain, where domain experts solve the domain problem]

SLIDE 11

Higgs Machine Learning Challenge

SLIDE 12

… in a nutshell

q Why not put some Atlas simulated data on the web and ask data scientists to find the best machine learning algorithm to find the Higgs ?

  • Instead of HEP people browsing machine learning

papers, coding or downloading possibly interesting algorithm, trying and seeing whether it can work for

  • ur problems

q Challenge for us : make a full ATLAS Higgs analysis simple for non physicists, but not too simple so that it remains useful q Also try to foster long term collaborations between HEP and ML

SLIDE 13

Committees

q Organization committee:

  • David Rousseau : Atlas-LAL
  • Claire Adam-Bourdarios : Atlas-LAL (outreach, legal matter)
  • Glen Cowan : Atlas-RHUL (statistics)
  • Balazs Kegl : Appstat-LAL
  • Cécile Germain : TAO-LRI
  • Isabelle Guyon : Chalearn (challenges organisation)

q Advisory committee:

  • Andreas Hoecker : Atlas-CERN (PC,TMVA)
  • Joerg Stelzer : Atlas-CERN (TMVA)
  • Thorsten Wengler : Atlas-CERN (ATLAS management)
  • Marc Schoenauer : INRIA (french computer science institute)

[Braces on the slide group the committee members under two labels: ATLAS and Machine Learning]

SLIDE 14

H → tautau

ATLAS-CONF-2013-108

4.1 σ evidence (now superseded by paper arXiv:1501.049)

SLIDE 15

How did it work ?

q First idea in Sep 2012 q Challenge ran from May to September 2014 q People register to Kaggle web site hosted https://www.kaggle.com/c/higgs-boson . (additional info on https://higgsml.lal.in2p3.fr) q Open to almost any one

  • Data scientist
  • HEP physicists
  • Students, geeks,
  • Except LAL-Orsay employees (for legal reasons)

q …download training dataset (with label) with 250k events q …train their own algorithm to optimise the significance (à la s/sqrt(b)) q …download test dataset (without labels) with 550k events q …upload their own classification q The site automatically calculates significance. Public (100k events) and private (450k events) leader boards update instantly. q Competition closes mid september 2014. People are asked to provide their code and

  • methods. Best 1 2 3 from private leaderboard win 7k€ 4k€ 2k€

q The most interesting one gets the “HEP meets ML award”


Funded by: Paris Saclay Center for Data Science, Google, INRIA

SLIDE 16

Dataset

ASCII csv file, with a mixture of Higgs to tautau signal and corresponding background, from the official GEANT4 ATLAS simulation

Weight and signal/background label (for the training dataset only):
  • weight (fully normalised)
  • label: “s” or “b”

Conf note variables used for categorization or BDT:
  • DER_mass_MMC, DER_mass_transverse_met_lep, DER_mass_vis, DER_pt_h
  • DER_deltaeta_jet_jet, DER_mass_jet_jet, DER_prodeta_jet_jet
  • DER_deltar_tau_lep, DER_pt_tot, DER_sum_pt, DER_pt_ratio_lep_tau
  • DER_met_phi_centrality, DER_lep_eta_centrality

Primitive 3-vectors allowing the conf note variables to be computed (mass neglected), 16 independent variables:
  • PRI_tau_pt, PRI_tau_eta, PRI_tau_phi
  • PRI_lep_pt, PRI_lep_eta, PRI_lep_phi
  • PRI_met, PRI_met_phi, PRI_met_sumet
  • PRI_jet_num (0, 1, 2, 3; capped at 3)
  • PRI_jet_leading_pt, PRI_jet_leading_eta, PRI_jet_leading_phi
  • PRI_jet_subleading_pt, PRI_jet_subleading_eta, PRI_jet_subleading_phi
  • PRI_jet_all_pt
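
To illustrate how a derived variable can follow from the primitive 3-vectors with masses neglected, here is a minimal sketch. It assumes that DER_mass_vis is the invariant mass of the hadronic tau and the lepton; the function name `visible_mass` is hypothetical. It uses the standard massless two-body relation m² = 2·pT1·pT2·(cosh Δη − cos Δφ).

```python
import math

def visible_mass(tau_pt, tau_eta, tau_phi, lep_pt, lep_eta, lep_phi):
    """Invariant mass of two massless particles given as (pt, eta, phi)."""
    m2 = 2.0 * tau_pt * lep_pt * (
        math.cosh(tau_eta - lep_eta) - math.cos(tau_phi - lep_phi)
    )
    # Rounding can push m2 slightly below zero for collinear particles.
    return math.sqrt(max(m2, 0.0))
```

The arguments map onto PRI_tau_pt, PRI_tau_eta, PRI_tau_phi, PRI_lep_pt, PRI_lep_eta and PRI_lep_phi; the result is in the same units as the transverse momenta.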

SLIDE 17

From domain to challenge and back

[Diagram: a domain problem (e.g. in HEP) is simplified into a challenge problem; the crowd solves the challenge problem; the solution is re-imported into the domain, where domain experts solve the domain problem. Durations annotated on the diagram: 18 months; >2 years?; 4 months]

SLIDE 18

Real analysis vs challenge

Real analysis:
1. Systematics
2. 2 categories x n BDT score bins
3. Background estimated from data (embedded, anti-tau, control region) and some MC
4. Weights include all corrections. Some negative weights (tt)
5. Potentially uses any information from all 2012 data and MC events
6. Few variables fed into two BDTs
7. Significance from complete fit with NP etc…
8. MVA with TMVA BDT

Challenge:
1. No systematics
2. No categories, one signal region
3. Straight use of ATLAS G4 MC
4. Weights only include normalisation and the Pythia weight. Negative-weight events rejected.
5. Only uses variables and events preselected by the real analysis
6. All BDT variables + categorisation variables + primitive 3-vectors
7. Significance from “regularised Asimov”
8. MVA “no-limit”

Simpler, but not too simple!

SLIDE 19

Participation

q Big success ! q 1785 teams (1942 people) have participated (participation=submission of at least one solution)

  • (6517 people have downloaded the data)
  • èmost popular challenge on the Kaggle

platform, ever (Amazon.com employee access challenge 1687 teams, Allstate Purchase Prediction Challenge 1567 teams)

q 35772 solutions uploaded q 136 forum topics with 1100 posts

SLIDE 20

Final leaderboard

[Leaderboard screenshot. Annotations: 7000$, 4000$, 2000$; HEP meets ML award (XGBoost authors, free trip to CERN); TMVA expert, with TMVA improvements; best physicist]

SLIDE 21

=significance

SLIDE 22

Clear overtraining ! =significance

SLIDE 23

Clearly at the top!

SLIDE 24

What did we learn?

• Very successful satellite workshop at NIPS in December 2014 in Montreal: https://indico.lal.in2p3.fr/event/2632/
• In short:
  • 20% gain w.r.t. untuned TMVA
  • Deep neural nets (but only marginally better than BDT)
  • Ensemble methods (random forest, boosting) rule
  • Meta-ensembles of diverse models
  • Careful cross-validation (250k training sample really small)
  • Complex software suites routinely using multithreading, GPU, etc…
  • Some techniques (e.g. meta-ensembles) are too complex to be practical, for a marginal gain
  • Others appear practical and useful
SLIDE 25

Next steps

q Re-importing into HEP all the ML developments q dataset being released imminently on CERN Open Data Portal http://

  • pendata.cern.ch/education/ATLAS, to remain available until the end of time

(citeable with a d.o.i)

  • Release with the full truth

q Better understand what was done by the best participants q NIPS proceedings write-up (with detailed description of “how they did it ?”) q Organisation of visit of winners of HEP meets ML award at CERN (Tianqi Chen and Tong He, authors of XGBoost, and Gabor Melis

  • verall winner)
  • Mini workshop 19th May 2015 2PM in CERN Auditorium

q Discussion on-going with TMVA experts

SLIDE 26

A HEP tracking pattern recognition challenge ?

SLIDE 27

Tracking with pileup

q Tracking dominates reconstruction CPU time at LHC q HL-LHC (phase 2) perspective : increased pileup :

  • Run 1 (2012): <>~20
  • Run 2 (2015): <>~30
  • Phase 2 (2025): <>~150

q CPU time quadratic/exponential extrapolation (difficult to quote any number)


Graeme Stewart ECFA HL-LHC workshop 2014

SLIDE 28

A tracking Challenge

q LHC experiments future computing budget flat (at best) q Installed CPU power per $==€==CHF expected increase factor ~10 in 10 years q Experiments plan on increase of data taking rate ~10 as well (~1KHz to 10kHz) q èCPU reconstruction demand to be similar at HL-LHC pileup requires very significant software improvement q Large effort within HEP to optimise software and tackle micro and macro parallelism. Sufficient gains for Run 2 but still a long way for HL-LHC q Why not organise a HEP tracking Challenge, to get fresh ideas and software from the outside world ?

SLIDE 29

What in tracking?

q >30 years of hard work of hundreds of experts q èwhat could be learned from anyone outside

  • ur field ? Spending a few weeks of his time at

best ? q Reinvent kalman filtering, track fitting in non homogeneous magnetic field, multiple scattering, soft/hard Bremsstrahlung ? q Probably not! (not very common outside our field) q Reinvent track pattern recognition ? q Maybe yes!

SLIDE 30

Pattern recognition

q Pattern recognition, connecting the dots, is a very old, very hot topic in Artificial Intelligence q Just one example among many from NIPS 2014 :

http://papers.nips.cc/paper/5572-a-complete-variational-tracker.pdf

David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley

q Note that these are real- time applications, with CPU constraints q Worry about efficiency, “track swap”,…

SLIDE 31

HEP tracking…

SLIDE 32

…also Higgs boson, CERN, are very attractive …fascinates ML experts

SLIDE 33

From domain to challenge and back

[Diagram: a domain problem (e.g. in HEP) is simplified into a challenge problem; the crowd solves the challenge problem; the solution is re-imported into the domain, where domain experts solve the domain problem]

SLIDE 34

So what would it look like?

(the following is the result of brainstorming with a few HEP and ML colleagues. Food for thought rather than definitive statements!)

• Focus on pattern recognition for an ATLAS/CMS-like experiment at HL-LHC
• Give full G4 simulation of a possible HL-LHC Si tracker (which one does not matter)
• Mixture of meaningful events, e.g. W, Z, top
• Give list of 3D points
  • Maybe also consider giving the pixel/strip pattern. Would be more realistic but would increase complexity.
• Give millions of events
• Participants develop their algorithm on these millions of events; the algorithm is then evaluated (figure of merit, f.o.m., see next slide) on a withheld, smallish test sample
• F.o.m. appears online on the leaderboard
• Best f.o.m. wins at the end of the challenge

SLIDE 35

Figure of merit

q We’re more interested by CPU gain, than efficiency or fake rate reduction (the latter two probably also require more specific expertise), provided they are “good enough” q Efficiency, fake rate measured per track wrt fraction of 3D point belonging to the same true track (we’re not really interested in track parameter estimation) q So something like: f.o.m=1/CPU*sigmoïd(efficiency, 95%)*sigmoïd(1/fake,1000) q Why sigmoïd rather than a hard threshold ? To avoid luck factor with participant being close to the limit and losing all on the test sample) (top participants do not like luck factor) q Might want to measure efficiency per PT/eta bin (would not want to score well submission with e.g. 0 efficiency above 10 GeV). Also maybe need to deal with electron, tracks from detached vertices separately. q Many many details to sort out in practice

SLIDE 36

CPU challenges have been done


Olga Kokshagina 2015 Courtesy : Lakhani 2013

Harvard Medical School contest for a biology big data problem in genomics
Two-week-long competition, $2000 prize pot x 3, on TopCoder.com

[Plot comparing best in-house solutions with challenge submissions]

SLIDE 37

CPU measurement

q Contrary to HiggsML challenge need to evaluate CPU time q Some platforms (see AutoML, Codalab, Topcoder) now allow to automatically upload, compile and run software

  • èwell defined hardware (CPU and memory available)
  • èuniform comparison

q Positive side-effect : limit diversity of software languages and libraries q We’re more interested in the detailed algorithm rather than the software itself q We’re more interested in new approaches than in super-

  • ptimised version of old approaches

SLIDE 38

Miscellaneous

q Some participants will want to compete without releasing their software (at least not at the beginning)ècan make separate leader board and prices q Incentive : price for the winners but also a function of the score reached q Need a starter kit with real HEP software

  • Nicely packaged and documented

q Strive to promote “coopetition” so that participant collaborates (tricky) q Foresee of releasing the sample publicly (e.g. CERN Open Data Portal) just after the challenge q Foresee a publication outlet (e.g. a satellite NIPS workshop proceedings, like for HiggsML) q Anticipate from the very beginning the final re-import stage

SLIDE 39

Conclusion

q The Higgs Machine Learning Challenge successful in having ML experts tackle one specific HEP problem

  • re-import to HEP of ML techniques exposed on-going (and will

take long)

q We (HEP) expect that breakthrough in pattern recognition would be invaluable to efficiently reconstruct future HL-LHC data q èA Challenge on HEP pattern recognition could allow to make such breakthrough happen q A personal note : I’m still quite busy with HiggsML, so I try to promote this idea, but I don’t own it and cannot have a leading role in making it happen.

SLIDE 40

Spares

SLIDE 41

Machine Learning?

q Machine Learning is the part of computer science which, in particular, deals among other with automatic classification:

  • E.g. neural nets to read handwritten digits (~30 years ago)
  • In HEP we call it MVA (Multi Variate Analysis)

q Developing rapidly

  • Lots of data to deal with
  • Lots of CPU power too
  • Big money involved:

§ google advertisements based on your searches or your gmail messages § amazon : “we recommend for you” based on what you already bought

  • “Big data” buzz word
  • New field “Data Science”

SLIDE 42

q Good problem q Good data + enough data q Clear objective + good metric q Simple rules + no Information Protection contingencies q Prizes (~ $5000) q Starter kit q On-line feed-back q Publication outlets q Avoid many pitfalls, in particular leaking the truth (see http://ciml.chalearn.org/schedule a NIPS satellite workshop on organising challenges, in particular Ben Hammer’s, Kaggle chief scientist, on do’s and don’ts)

Isabelle Guyon http://chalearn.org/

What make a good challenge?

David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley

SLIDE 43

What data did we release ?

q From ATLAS full sim Geant4 MC12 production q 30 variables (see later) q Signal is Hètautau, Background a mixture of : Z, top, W q Based on November 2013 ATLAS Htautau conf note ATLAS- CONF-2013-108 q Preselection for lep-had topology : single lepton trigger, one lepton identified, one hadronic tau identified q è800.000 events:

  • 250.000 training data set
  • 550.000 test data set without label and weight

q Reproduces reasonably well (~20%) content of 3 highest sensitivity bins (x 2 categories) in conf note q (some background and many correction factors deliberately omitted so that the sample cannot be used for physics, only for machine learning studies)

SLIDE 44

The ATLAS hacker

q In principle, one ATLAS hacker could:

  • use the variables to match back (using lepton/

hadron/jet Pt eta phi) to the original MC event

  • check whether it is signal or background
  • cheat to get best significance

q (this would be discovered at the software release stage)

q This was transparently explained to ATLAS :it would be bad for our image and counterproductive if the public leaderboard is cluttered by ATLAS hackers

SLIDE 45

ATLAS hacker (2)

• We thought of smearing the parameters to prevent the matching
• Back-of-the-envelope calculation: by how much should D variables be smeared so that the original can be matched with only 50% probability among N entries?
• → smearing should be at least 15%
• → events become meaningless
• → bad idea!

[Plot of relative smearing and N entries, with curves for D=16, D=8 and D=3]

SLIDE 46

ATLAS hacker (3)

q Hiding the origin of an entry in a DB is called “sanitizing”, this is a notoriously difficult, and very hot topic, e.g:

  • Finding owner of a medical record given anonymized parameters

(gender, zip code, and birthdate uniquely identifies 83% of US people)

  • Finding owner of a mobile phone given hour and area of a few calls:

SLIDE 47

Licensing issues

q Anyone participating to the challenge had to agree to the rules ( https://www.kaggle.com/c/higgs-boson/rules ) in particular: q Software

  • Participants can use whatever software they like, but to win a

price, they have to release it under an OS license (so that we can 1) verify it 2) use it)

q Simulated data:

  • All ATLAS real and simulated data belongs to CERN
  • We’ve had the signature of Bertolucci, CERN director of research

to release the data (Thorsten W handled this negociation)

  • Data was made available only for the duration of the challenge

and only for the challenge (“Can I use the data for a master thesis ?” Sorry no.)èfinally late agreement to release in on CERN ODP

SLIDE 48

Significance

q Need to have one robust estimator of the quality of the classification algorithm q Decided to use the well known (in ATLAS) “Asimov” formula (G. Cowan, K.

Cranmer, E. Gross, and O. Vitells, “Asymptotic formulae for likelihood-based tests of new physics”, EPJCC, vol. 71, pp. 1–19, 2011. ) with regularization on top

  • √(2*((s+b’)*log(1+s/b’)-s))
  • with s and b’=b+10 normalised to 2012 luminosity:
  • s=Σ(selected signal) weights_i
  • b=Σ(selected background) weights_i

q Why b’=b+10 (“regularisation”) : practical way to avoid large significance fluctuation when small phase space region with very few background events is chosen. Do not want to pick winners on their luck. q Note that normalisation already done in the weights : no need to explain integrated luminosity and cross-section q Glen Cowan has derived a new version of Asimov formula including a sigma_b (to be shown at coming Statistics Forum) from systematics or statistics

  • However not robust enough in our case
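
The formula above translates directly into a few lines of code. This is a minimal sketch, not the official evaluation code: the function names `ams` and `ams_from_selection` and the argument layout are assumptions for illustration, while the formula and the +10 regularisation are the ones quoted on this slide.

```python
import math

def ams(s, b, b_reg=10.0):
    """Regularised Asimov significance: sqrt(2*((s+b')*log(1+s/b') - s)), b' = b + b_reg."""
    b_prime = b + b_reg
    radicand = 2.0 * ((s + b_prime) * math.log1p(s / b_prime) - s)
    return math.sqrt(max(radicand, 0.0))  # guard against tiny negative rounding

def ams_from_selection(weights, labels, selected):
    """Sum the weights of selected signal ("s") and background ("b") events, then apply ams."""
    s = sum(w for w, l, keep in zip(weights, labels, selected) if keep and l == "s")
    b = sum(w for w, l, keep in zip(weights, labels, selected) if keep and l == "b")
    return ams(s, b)
```

For s much smaller than b’, the expression tends to s/sqrt(b’), which is why the challenge description could speak of a significance “à la s/sqrt(b)”.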

SLIDE 49

Rank distribution after bootstrap

Gabor really on top!
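
The slide does not describe how the bootstrap was done, so the following is only one plausible reading: resample the test events with replacement, recompute every participant's regularised Asimov significance on each resample, and record the resulting ranks. All names here (`bootstrap_ranks`, `selections`) are hypothetical.

```python
import math
import random

def ams(s, b, b_reg=10.0):
    """Regularised Asimov significance with b' = b + b_reg."""
    b_prime = b + b_reg
    return math.sqrt(max(2.0 * ((s + b_prime) * math.log1p(s / b_prime) - s), 0.0))

def bootstrap_ranks(selections, weights, labels, n_boot=200, seed=0):
    """Return {participant: [rank in each bootstrap replica]}, rank 1 = best score.

    selections maps a participant name to a list of booleans, one per test
    event, saying whether that participant classified the event as signal.
    """
    rng = random.Random(seed)
    n = len(weights)
    ranks = {name: [] for name in selections}
    for _ in range(n_boot):
        # Draw n event indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        scores = {}
        for name, selected in selections.items():
            s = sum(weights[i] for i in sample if selected[i] and labels[i] == "s")
            b = sum(weights[i] for i in sample if selected[i] and labels[i] == "b")
            scores[name] = ams(s, b)
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranks[name].append(rank)
    return ranks
```

A participant whose rank stays at 1 in nearly all replicas is "really on top"; participants whose ranks swap frequently have scores that are not distinguishable given the size of the test sample.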

SLIDE 50

Are the winning scores different?

SLIDE 51

SLIDE 52

Transfer Learning (2011)

  • Learn a data representation from one task:
  • Use it for another task:

http://chalearn.org/

SLIDE 53

Million $ prize challenges

2009 $1,000,000 2012

http://chalearn.org/

SLIDE 54

BDT and Machine Learning

q BDT (Boosted Decision Tree) which is by far the most used technique in Atlas/CMS is actually an old technique (Adaboost 1997) q More recent techniques, just an example:

  • Unsupervised neural network (Example: The Google cat: deep learning

technique running on 16K cores for three days, watching 10M random YouTube video stills [Le et al., ICML’12]

SLIDE 55

Real-time face recognition : efficiency, fake, CPU time…

SLIDE 56

Who are the winners?

q See

http://atlas.ch/news/2014/machine-learning-wins-the- higgs-challenge.html

q 1 : Gabor Melis (Hungary) lisp developer and consultant : wins 7000$ q 2 : Tim Salimans (Neitherland) data science consultant: wins 4000$ q 3 : Pierre Courtiol (France) ? : wins 2000$ q HEP meets ML award: team crowwork, Tianqi Chen and Tong He PhD students in data science at Seattle and Vancouver. Provided XGBoost used by many

  • participants. Win a free trip and visit to

CERN in 2015

?

SLIDE 57

Best private LD distribution

[Histogram of best private leaderboard scores: x-axis “score” from 3 to 4, y-axis from 10 to 80; markers for “Simple TMVA BDT” and “Tuned and improved TMVA BDT”]