Higgs Machine Learning Challenge experience. A HEP pattern recognition challenge?
David Rousseau, LAL-Orsay
10 February 2015, CTD 2015, Berkeley

Outline q Machine Learning, Challenges … q The Higgs Machine Learning challenge q A HEP pattern recognition challenge ? David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 2

Machine Learning and HEP q Neural Nets used somewhat in the 90’ies (e.g. LEP) q BDT (Adaboost) invented in 97 q MVA techniques (= Machine Learning) have been used extensively at D0/CDF (mostly BDT, but not only) in the 00’ies q Atlas/CMS less eager to adopt MVA at LHC starts for some good reasons: o Need to understand well the input variables first o Still a lot to gain by improving input variables o Systematics more difficult to evaluate o Collected luminosity was increasing fast q But lot of work recently with MVA techniques o Competition o Best use of available data q Meanwhile Neural Net reappear in their “deep” incantation (See Peter Sadowski’s talk this afternoon) David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 3

Machine Learning in HEP (2) q However: o TMVA, within Root, has been instrumental in popularising MVA technique within HEP o Most people using TMVA, most people using BDT in TMVA o Although getting a reasonable answer from TMVA is quick and easy, it takes time to really become an expert with e.g. BDT o People are focussing on the choices of input variables and the evaluation of systematics (which of course are excellent things to do) q Not much work on studying possible better MVA techniques, for which you need the software and the know-how David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 4

Challenge ? q Challenges have become in the last 10 years a common way of working for the machine learning community q Machine learning scientists are eager to test their algorithms on real life problems è more valuable(=publisheable) than artificial problems q Company or academics want to outsource a problem to machine learning scientist, but also geeks etc. The company sets up a challenge like: o Netflix : predict movie preference from past movie selection o Gesture recognition o Separating pictures of cats from pictures of dogs o NASA/JPL mapping dark matter through (simulated) galaxy distortion o … q Some companies makes a business from organising challenges: datascience.net, kaggle q A few recent examples now… David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 5

Looking at People (2012-14)
(Figure: example gestures. Actions: wave, point, clap. Interactions: shake hands, hug, fight.)
http://chalearn.org/

Neural connectomics (2015)
http://chalearn.org/

NEW: AutoML challenge (2015)
Fully automatic machine learning without ANY human intervention
http://codalab.org/AutoML
December 2014 – May 2015, $30,000 in prizes

Why do challenges work?
(Figure: "Motivation of organizing contests: extreme value", courtesy Lakhani 2014.)
- Experts are highly skilled and trained → more focused, high-performing solutions, but low variety
- Open Innovation (OI) is suited to a variety of unconventional, surprising ideas that are « far » from traditional ones
- Not just ML, but a general trend: expertise → high volatility
(Olga Kokshagina 2015)

From domain to challenge and back
- Traditional path: domain experts solve the domain problem (e.g. HEP) → domain solution
- Challenge path: domain problem → simplify → challenge problem → the crowd solves the challenge problem → challenge solution → reimport → domain solution

Higgs Machine Learning Challenge

… in a nutshell
- Why not put some ATLAS simulated data on the web and ask data scientists to find the best machine learning algorithm to find the Higgs?
  - instead of HEP people browsing machine learning papers, coding or downloading possibly interesting algorithms, trying them and seeing whether they work for our problems
- Challenge for us: make a full ATLAS Higgs analysis simple for non-physicists, but not so simple that it stops being useful
- Also try to foster long-term collaborations between HEP and ML

Committees q Organization committee: { ATLAS o David Rousseau : Atlas-LAL o Claire Adam-Bourdarios : Atlas-LAL (outreach, legal matter) o Glen Cowan : Atlas-RHUL (statistics) { Learning Machine o Balazs Kegl : Appstat-LAL o Cécile Germain : TAO-LRI o Isabelle Guyon : Chalearn (challenges organisation) q Advisory committee: o Andreas Hoecker : Atlas-CERN (PC,TMVA) o Joerg Stelzer : Atlas-CERN (TMVA) o Thorsten Wengler : Atlas-CERN (ATLAS management) o Marc Schoenauer : INRIA (french computer science institute) David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 13

H → ττ: ATLAS-CONF-2013-108, 4.1σ evidence (now superseded by the paper arXiv:1501.049)

How did it work?
- First idea in September 2012
- Challenge ran from May to September 2014
- People register on the Kaggle website hosting the challenge, https://www.kaggle.com/c/higgs-boson (additional info at https://higgsml.lal.in2p3.fr)
- Open to almost anyone:
  - data scientists
  - HEP physicists
  - students, geeks
  - except LAL-Orsay employees (for legal reasons)
- …download the training dataset (with labels), 250k events
- …train their own algorithm to optimise the significance (à la s/sqrt(b))
- …download the test dataset (without labels), 550k events
- …upload their own classification
- The site automatically calculates the significance. The public (100k events) and private (450k events) leaderboards update instantly.
- The competition closes mid-September 2014, and participants are asked to provide their code and methods. The best 1, 2, 3 on the private leaderboard win 7k €, 4k €, 2k €.
- The most interesting one gets the "HEP meets ML" award
- Funded by: Paris Saclay Center for Data Science, Google, INRIA
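The slides only describe the leaderboard score as a significance "à la s/sqrt(b)". The challenge's public documentation defines it as the approximate median significance (AMS) with a regularisation term b_reg = 10, where s and b are the summed weights of the selected true-signal and true-background events. The sketch below assumes that definition; the function names are hypothetical, and the official scorer may differ in details such as how weights are normalised within the public and private subsets.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance (AMS) with regularisation term b_reg.

    s, b: sums of event weights of selected true-signal / true-background events.
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    b_tot = b + b_reg
    if b_tot <= 0:
        raise ValueError("b + b_reg must be positive")
    radicand = 2.0 * ((s + b_tot) * math.log1p(s / b_tot) - s)
    return math.sqrt(max(radicand, 0.0))  # clamp tiny negative rounding errors


def selected_s_b(weights, is_signal, selected):
    """Sum the weights of selected events, split by their true label."""
    s = b = 0.0
    for w, sig, sel in zip(weights, is_signal, selected):
        if not sel:
            continue
        if sig:
            s += w
        else:
            b += w
    return s, b


def score_submission(weights, is_signal, selected, b_reg=10.0):
    """Score one leaderboard subset (public or private) from a submitted selection."""
    s, b = selected_s_b(weights, is_signal, selected)
    return ams(s, b, b_reg)


if __name__ == "__main__":
    # Toy numbers for illustration only: for s << b, AMS is close to s / sqrt(b + b_reg).
    print(ams(10.0, 1000.0), 10.0 / math.sqrt(1000.0 + 10.0))
```

Scoring the public and private subsets separately with the same function mirrors the two leaderboards: a participant only sees the public score during the challenge, which is why over-tuning to it is risky.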

Dataset
- ASCII CSV file with a mixture of H → ττ signal and the corresponding background, from the official GEANT4 ATLAS simulation
- Weight and signal/background label (training dataset only):
  - weight (fully normalised)
  - label: « s » or « b »
- Conf note variables used for categorisation or in the BDT: DER_mass_MMC, DER_mass_transverse_met_lep, DER_mass_vis, DER_pt_h, DER_deltaeta_jet_jet, DER_mass_jet_jet, DER_prodeta_jet_jet, DER_deltar_tau_lep, DER_pt_tot, DER_sum_pt, DER_pt_ratio_lep_tau, DER_met_phi_centrality, DER_lep_eta_centrality
- Primitive 3-vectors, from which the conf note variables can be computed (masses neglected); 16 independent variables: PRI_tau_pt, PRI_tau_eta, PRI_tau_phi, PRI_lep_pt, PRI_lep_eta, PRI_lep_phi, PRI_met, PRI_met_phi, PRI_met_sumet, PRI_jet_num (0, 1, 2, 3, capped at 3), PRI_jet_leading_pt, PRI_jet_leading_eta, PRI_jet_leading_phi, PRI_jet_subleading_pt, PRI_jet_subleading_eta, PRI_jet_subleading_phi, PRI_jet_all_pt
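Since the conf note (DER_) variables can be recomputed from the primitive 3-vectors with masses neglected, here is a minimal sketch of how two of them could be approximated, assuming massless objects and the standard collider formulas: a visible mass from the tau and lepton, and a lepton/MET transverse mass. The helper names and the toy CSV row are hypothetical; the exact definitions used for the challenge are in its official documentation and may differ in detail.

```python
import csv
import io
import math


def visible_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two objects treated as massless, from (pt, eta, phi)."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors


def transverse_mass(pt_lep, phi_lep, met, met_phi):
    """Transverse mass of the lepton + missing-transverse-energy system."""
    mt2 = 2.0 * pt_lep * met * (1.0 - math.cos(phi_lep - met_phi))
    return math.sqrt(max(mt2, 0.0))


def read_rows(fileobj):
    """Yield CSV rows, converting PRI_/DER_ columns to float and leaving others as str."""
    for raw in csv.DictReader(fileobj):
        yield {k: float(v) if k.startswith(("PRI_", "DER_")) else v
               for k, v in raw.items()}


def derived_from_primitives(row):
    """Recompute two conf-note-style variables from the primitive 3-vectors."""
    return {
        "DER_mass_vis": visible_mass(
            row["PRI_tau_pt"], row["PRI_tau_eta"], row["PRI_tau_phi"],
            row["PRI_lep_pt"], row["PRI_lep_eta"], row["PRI_lep_phi"]),
        "DER_mass_transverse_met_lep": transverse_mass(
            row["PRI_lep_pt"], row["PRI_lep_phi"], row["PRI_met"], row["PRI_met_phi"]),
    }


if __name__ == "__main__":
    # Hypothetical toy row (not taken from the challenge dataset).
    toy_csv = io.StringIO(
        "PRI_tau_pt,PRI_tau_eta,PRI_tau_phi,PRI_lep_pt,PRI_lep_eta,PRI_lep_phi,"
        "PRI_met,PRI_met_phi,label\n"
        "10,0,0,10,0,3.141592653589793,40,0,s\n"
    )
    for row in read_rows(toy_csv):
        print(row["label"], derived_from_primitives(row))
```

Exposing both the primitives and the derived variables lets participants either reuse the physics-motivated features or try to learn equivalent ones directly from the 3-vectors.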

From domain to challenge and back
- Simplifying the domain problem (e.g. HEP) into the challenge problem: 18 months
- The crowd solving the challenge problem: 4 months
- Reimporting the challenge solution into the domain: > 2 years?

Real analysis vs challenge (real analysis → challenge)
1. Systematics → no systematics
2. 2 categories × n BDT score bins → no categories, one signal region
3. Background estimated from data (embedded, anti-tau, control region) and some MC → straight use of ATLAS G4 MC
4. Weights include all corrections; some negative weights (tt̄) → weights only include normalisation and Pythia weight; negative-weight events rejected
5. Potentially use any information from all 2012 data and MC events → only use variables and events preselected by the real analysis
6. Few variables fed into two BDTs → all BDT variables + categorisation variables + primitive 3-vectors
7. Significance from a complete fit with nuisance parameters (NP), etc. → significance from "regularised Asimov"
8. MVA with TMVA BDT → MVA "no-limit"
Simpler, but not too simple!
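For reference, the "regularised Asimov" significance can be written as the approximate median significance (AMS) below, assuming the definition from the challenge's public documentation: s and b are the weighted signal and background in the selected region, and b_reg is a regularisation term (10 in that documentation). A short expansion for s ≪ b + b_reg shows why it behaves like the familiar s/sqrt(b):

```latex
\mathrm{AMS} = \sqrt{2\left[(s + b + b_{\mathrm{reg}})\,\ln\!\left(1 + \frac{s}{b + b_{\mathrm{reg}}}\right) - s\right]}

% with b' = b + b_reg and \ln(1+x) = x - x^2/2 + O(x^3):
(s + b')\ln\!\left(1 + \frac{s}{b'}\right) - s
  = (s + b')\left(\frac{s}{b'} - \frac{s^2}{2b'^2} + O\!\left(\frac{s^3}{b'^3}\right)\right) - s
  = \frac{s^2}{2b'} + O\!\left(\frac{s^3}{b'^2}\right)

\Rightarrow\quad \mathrm{AMS} \approx \frac{s}{\sqrt{b + b_{\mathrm{reg}}}} \qquad (s \ll b + b_{\mathrm{reg}})
```

The regularisation term keeps the score from being dominated by tiny selections with almost no background, where statistical fluctuations of b would otherwise blow up the significance.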

Participation q Big success ! q 1785 teams (1942 people) have participated (participation=submission of at least one solution) o (6517 people have downloaded the data) o è most popular challenge on the Kaggle platform, ever (Amazon.com employee access challenge 1687 teams, Allstate Purchase Prediction Challenge 1567 teams) q 35772 solutions uploaded q 136 forum topics with 1100 posts David Rousseau HiggsML and tracking challenges CTD 2015 Berkeley 19

Final leaderboard
(Figure: final private leaderboard. Annotations: $7000, $4000, $2000 for the top three; best physicist; "HEP meets ML" award: XGBoost authors, free trip to CERN; TMVA expert, with TMVA improvements.)

(Figure: leaderboard scores; score = significance.)
