

slide-1
SLIDE 1

Boosting New Physics Searches with Deep Learning

David Shih NHETC, Rutgers University

Accelerating the Search for Dark Matter with Machine Learning ICTP , Trieste April 9, 2019

slide-2
SLIDE 2

You are invited to submit an abstract for the ML parallel session at SUSY 2019. The deadline is TOMORROW!!

Announcement

slide-3
SLIDE 3

The AI Revolution is Here

slide-4
SLIDE 4

So many stunning real world successes in recent years. Driven by:

  • Growth in computational power
  • Improvements in algorithms
  • Increased quantity and quality of data

Prerequisite for deep learning: large, complex, and well-understood datasets.

The AI Revolution is Here

Many real world applications are limited by the quality and quantity of the data.

slide-5
SLIDE 5

Big Data and Deep Learning

https://www.wired.com/2013/04/bigdata/ Pasquale Musella, ETH-Zurich seminar

The LHC is the perfect setting for deep learning! The data is

  • large (billions of events on tape)
  • complex (hundreds of particles per event)
  • well-understood (Standard Model of particle physics).

Also, it is relatively easy to generate realistic simulated data.


(Madgraph, Pythia, Herwig, Delphes, GEANT,…)

[Figure: comparison of data volumes — Google search index, Facebook yearly uploads, business emails sent per year, LHC (stored), LHC raw]

slide-6
SLIDE 6

A brief introduction to the LHC

slide-7
SLIDE 7

An introduction to the LHC

The Large Hadron Collider is the largest and highest-energy particle accelerator in the world. It is part of CERN, located at the border of France and Switzerland, near the city of Geneva.

  • 27 km long tunnel
  • 100 m underground
  • ~ $10 billion
  • ~5,000 scientists from ~200 countries

slide-8
SLIDE 8

At the LHC, protons are accelerated to 99.9999991% of the speed of light, and collided together at four interaction points (ATLAS, CMS, LHCb, ALICE)

Beam energy: 6.5 TeV per proton; ~300 trillion protons (in ~3000 bunches) in each beam; 25 ns bunch spacing.

video from the ATLAS experiment

slide-9
SLIDE 9

An LHC Detector

Detector is cylindrical (symmetric around beam axis)

slide-10
SLIDE 10

Collision events at the LHC

raw event rate: ~GHz => ~100 Hz after “triggering”
data rate: ~1 GB/s => several PB/year
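The quoted rates can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming (this number is not on the slide) roughly 10^7 seconds of data-taking per year:

```python
# Back-of-envelope check of the quoted LHC data rates.
# Assumption (not from the slide): ~1e7 seconds of live data-taking per year.
GB = 1e9   # bytes
PB = 1e15  # bytes

rate_bytes_per_s = 1 * GB      # ~1 GB/s after triggering
seconds_per_year = 1e7         # assumed LHC "live" time per year

bytes_per_year = rate_bytes_per_s * seconds_per_year
print(f"{bytes_per_year / PB:.0f} PB/year")  # -> 10 PB/year, i.e. "several PB/year"
```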

slide-11
SLIDE 11

What is all this for?

slide-12
SLIDE 12

The Standard Model of Particle Physics

slide-13
SLIDE 13

It was established in the 1970s… and people have been trying (and failing) to break it ever since.

slide-14
SLIDE 14

What else is there beyond the Standard Model? What is the next layer of fundamental matter and interactions?

slide-15
SLIDE 15

The main tool in the search for new physics beyond the SM is the particle collider. By smashing together elementary particles at higher and higher energies, we hope to create new particles. We attempt to “see” these new particles by studying the collision debris with very powerful detectors.

slide-16
SLIDE 16

We know there’s new physics out there…

The strong CP problem:

$\mathcal{L} \supset \theta\, \frac{\alpha_s}{8\pi}\, G_{\mu\nu}\tilde{G}^{\mu\nu}, \qquad \theta \lesssim 10^{-10}$

dark matter, neutrino masses, the hierarchy problem, grand unification, the flavor puzzle, the strong CP problem

slide-17
SLIDE 17

Precision measurements of SM processes. Agreement between theory and experiment across ~9 orders of magnitude.

But no sign of it yet at the LHC…

slide-18
SLIDE 18

Countless searches for new physics beyond the SM. So far no concrete evidence, only lower limits on the NP scale.

But no sign of it yet at the LHC…

slide-19
SLIDE 19

What does a typical search for new physics look like at the LHC?

Typical new physics production rates are many, many orders of magnitude smaller than SM processes. Need a way to improve signal to noise to have any hope of seeing new physics.

slide-20
SLIDE 20

What does a typical search for new physics look like at the LHC?

  • Identify a “signal region” in the phase space, motivated by some model, where one expects S/N to be greatly enhanced.
  • Estimate the SM background using a combination of simulations and data-driven methods (control regions).
  • Compare data to the SM prediction: announce a discovery significance or set a limit on the model.
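The "discovery significance" in the last step is often quoted, to first approximation, with the textbook Gaussian formula Z ≈ S/√B. A minimal sketch with purely illustrative event counts (the function name and numbers are not from the talk):

```python
import math

def naive_significance(s, b):
    """Gaussian approximation Z = S / sqrt(B) for expected signal s on background b."""
    return s / math.sqrt(b)

# Toy numbers (illustrative only): 50 signal events on a background of 400.
z = naive_significance(50, 400)
print(f"Z = {z:.1f} sigma")  # Z = 2.5 sigma; "discovery" conventionally requires Z >= 5
```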

slide-21
SLIDE 21

This generally assumes we know what we’re looking for.
➡ ML can still help in this case, by improving S/N — supervised learning, classification, regression.

What if we don’t know what we’re looking for? Can we find the unexpected signal buried underneath all this raw data?
➡ ML can help in this case too — unsupervised learning, clustering, anomaly detection.

A promising path forward: Adapt sophisticated ML tools developed for real-world applications in order to improve data analysis at the LHC

slide-22
SLIDE 22

The Landscape of ML

slide-23
SLIDE 23

The Landscape of ML @ LHC

Machine Learning

  • Supervised Learning
    - Classification: top tagging, b tagging, W/Z tagging, q/g tagging, strange tagging, full event tagging, pile-up reduction
    - Regression
  • Unsupervised Learning
    - Generation: CaloGAN, LaGAN, JUNIPR
    - Anomaly Detection: autoencoders, CWoLa, triggering
    - Dimensionality Reduction: autoencoders, PCA
    - Clustering: jet finding algorithms, jet grooming
  • Reinforcement Learning

slide-24
SLIDE 24

Recent progress in ML @ LHC

  • Huge performance gains, especially for object classification
  • Exploring the possibilities of learning physics directly from the data
  • Developing new and unconventional ways of searching for new physics

In the rest of this talk, I will focus on some recent works that touch upon these points.

slide-25
SLIDE 25

A benchmark problem: boosted top tagging

[Diagram: boosted top jet vs. QCD boosted jet (g → q q̄)]

How can we differentiate between these two types of jets? This is a straightforward supervised classification problem in ML.

slide-26
SLIDE 26

[Figure: CMS Simulation Preliminary (13 TeV) — distributions of the HTT V2 mass (CA15 jets) and the ungroomed τ3/τ2 (AK8 jets, 110 < mSD < 210 GeV) for top and QCD jets in several pT bins]

Some obvious ideas:

  • jet mass (m_top vs. ~0)
  • jet substructure (3-prong vs. 1-prong)

[Diagram: boosted top jet vs. QCD boosted jet (g → q q̄)]

slide-27
SLIDE 27

[Figure: “ROC curve” — QCD jet mistag rate (εB) vs. top tagging efficiency (εS), CMS Simulation Preliminary (13 TeV), 800 < pT < 1000 GeV, |η| < 1.5, ΔR(top, parton) < 0.6, flat pT and η. Curves: CMSTT min. m, CMSTT top m, filtered (r=0.2, n=3) m, HTT V2 fRec, HTT V2 m, pruned (z=0.1, rcut=0.5) m, Q-jet volatility, softdrop (z=0.1, β=0) m, softdrop (z=0.2, β=1) m, trimmed (r=0.2, f=0.03) m, ungroomed τ3/τ2, log(χ) (R=0.2)]

State of the art with cuts on kinematic quantities. Can deep learning do better??
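For readers unfamiliar with ROC curves: each point comes from one cut on a tagger output. A minimal numpy sketch with toy Gaussian scores (illustrative only, not the CMS samples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tagger scores (illustrative, not the CMS samples): signal jets score
# higher on average than background jets.
sig_scores = rng.normal(1.0, 1.0, 10_000)
bkg_scores = rng.normal(-1.0, 1.0, 10_000)

# Sweep a threshold: each cut gives one (top efficiency, QCD mistag) point.
thresholds = np.linspace(-4.0, 4.0, 101)
eps_s = np.array([(sig_scores > t).mean() for t in thresholds])
eps_b = np.array([(bkg_scores > t).mean() for t in thresholds])

# Area under the ROC curve via the trapezoid rule (eps_b decreases along the sweep).
auc = np.sum((eps_b[:-1] - eps_b[1:]) * (eps_s[:-1] + eps_s[1:]) / 2)
print(f"AUC ~ {auc:.2f}")
```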

slide-28
SLIDE 28

From towardsdatascience.com

[Diagram: jet constituents → engineered features (m_inv, τ21, τ32, …) → cuts / BDT → “Top or QCD”; vs. jet constituents → deep learning algorithm → “Top or QCD”]

By training on raw, low-level inputs, deep learning can achieve much better performance. Deep neural networks automate and optimize the process of “feature engineering”.

Automated Feature Engineering

slide-29
SLIDE 29

Data Representations

Although deep learning is capable of building features from raw data, how we represent the data can still matter a lot. In the case of jets, some popular options are

  • Four vectors (DNNs)
  • Sequences (RNNs, LSTMs)
  • Binary trees (RecNNs)
  • Graphs (point clouds)
  • Images (CNNs)
slide-30
SLIDE 30

Jet Images

Can think of a jet as an image in eta and phi, with

  • Pixelation provided by calorimeter towers
  • Pixel intensity = pT recorded by each tower
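The pixelation step above can be sketched with a weighted 2D histogram. A minimal version, with toy constituents and a 37×37, Δη = Δφ = 3.2 window chosen to match the images used later in the talk (the constituent values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy jet: 50 constituents with (eta, phi, pT) inside a (-1.6, 1.6) window.
eta = rng.uniform(-1.6, 1.6, 50)
phi = rng.uniform(-1.6, 1.6, 50)
pt = rng.exponential(10.0, 50)

# "Pixelation": 2D histogram in (eta, phi), weighted by pT,
# so pixel intensity = summed pT in each calorimeter-tower-sized cell.
image, _, _ = np.histogram2d(
    eta, phi, bins=37, range=[[-1.6, 1.6], [-1.6, 1.6]], weights=pt
)
print(image.shape)  # (37, 37)
```

The total image intensity equals the total constituent pT, since all constituents fall inside the histogram range.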

[Figure: jet in the calorimeter — from B. Nachman]

Should be able to apply “off-the-shelf” NNs developed for image recognition to classify jets at the LHC! de Oliveira et al 1511.05190

slide-31
SLIDE 31

Top Tagging with CNNs

Macaluso & DS 1803.00107

Building on previous “DeepTop” tagger of Kasieczka et al 1701.08784 Other approaches also promising (DNNs, RecNNs, RNNs, LSTMs, GNNs, …)

CMS jet sample: 13 TeV; pT ∈ (800, 900) GeV; |η| < 1; Pythia 8 and Delphes particle-flow; match: ΔR(t, j) < 0.6; merge: ΔR(t, q) < 0.6; 1.2M + 1.2M jets. Images: 37 × 37 pixels, Δη = Δφ = 3.2, with “colors” (pT^neutral, pT^track, N_track, N_muon).

[Figure: individual QCD and top jet images — very sparse]

slide-32
SLIDE 32

Top Tagging with CNNs

Macaluso & DS 1803.00107

Building on previous “DeepTop” tagger of Kasieczka et al 1701.08784 Other approaches also promising (DNNs, RecNNs, RNNs, LSTMs, GNNs, …)

CMS jet sample: 13 TeV; pT ∈ (800, 900) GeV; |η| < 1; Pythia 8 and Delphes particle-flow; match: ΔR(t, j) < 0.6; merge: ΔR(t, q) < 0.6; 1.2M + 1.2M jets. Images: 37 × 37 pixels, Δη = Δφ = 3.2, with “colors” (pT^neutral, pT^track, N_track, N_muon).

[Figure: average QCD and top jet images — clearly different]

slide-33
SLIDE 33

Top Tagging with CNNs

Macaluso & DS 1803.00107

Training: AdaDelta with η = 0.3 and an annealing schedule; minibatch size = 128; cross-entropy loss.
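The cross-entropy loss used here is the standard one for binary classification. A minimal numpy sketch (the batch values are invented for illustration; 1 = top, 0 = QCD):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy loss for binary labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Toy minibatch (illustrative): labels 1 = top, 0 = QCD.
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.1, 0.8, 0.2])
print(f"loss = {binary_cross_entropy(y_true, y_pred):.3f}")  # loss = 0.164
```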

slide-34
SLIDE 34

Top Tagging with CNNs

Macaluso & DS 1803.00107

[Figure: NN output distributions for QCD and top jets]

slide-35
SLIDE 35

[Figure: ROC curves (1/εB vs. εS) for CMS jets — DeepTop minimal, our final tagger, HTTV2+τ32 BDT, HTTV2+τ32 cut-based]

Can achieve factor of ~3 improvement over cut-based approaches and BDTs!

95% accuracy AUC=0.989

Top Tagging with CNNs

Macaluso & DS 1803.00107

[Figure axes: top tagging efficiency (εS) vs. QCD rejection rate (1/εB)]
slide-36
SLIDE 36

Community top tagging comparison

Kasieczka, Plehn et al 1902.09914 — an apples-to-apples comparison of various deep learning top taggers on a common dataset.
slide-37
SLIDE 37

Community top tagging comparison

Kasieczka, Plehn et al 1902.09914

Tagger                              AUC    Accuracy  1/εB (εS = 0.3)  #Parameters
CNN [16]                            0.981  0.930     780              610k
ResNeXt [32]                        0.984  0.936     1140             1.46M
TopoDNN [18]                        0.972  0.916     290              59k
Multi-body N-subjettiness 6 [24]    0.979  0.922     856              57k
Multi-body N-subjettiness 8 [24]    0.981  0.929     860              58k
RecNN                               0.981  0.929     810              13k
P-CNN                               0.980  0.930     760              348k
ParticleNet [45]                    0.985  0.938     1280             498k
LBN [19]                            0.981  0.931     860              705k
LoLa [22]                           0.980  0.929     730              127k
Energy Flow Polynomials [21]        0.980  0.932     380              1k
Energy Flow Network [23]            0.979  0.927     600              82k
Particle Flow Network [23]          0.982  0.932     880              82k
GoaT (see text)                     0.985  0.939     1440             25k

Further improvements to our CNN are possible.
 Have we found the optimal tagger??

slide-38
SLIDE 38

Supervised vs Unsupervised ML

Top tagging is a prime example of supervised machine learning. It is a straightforward classification task with fully-labeled (QCD or top) datasets. What if the data is not labeled — e.g. it is the actual LHC data and not simulation? Can we apply ideas from unsupervised ML to discover patterns and features in the data? Can we discover unexpected new physics this way?

slide-39
SLIDE 39

Statement of the problem

Consider a collection of jets at the LHC. [See Jesse’s and Anders’ talks for more on jets.]

Most will be from SM processes (quark/gluon showering and hadronization). But a small fraction could be from an unknown (heavier) new physics particle with exotic properties. How can we use ML algorithms to discover the exotic new particle without knowing what it looks like?

slide-40
SLIDE 40

This is a standard anomaly detection problem in data science!

Statement of the problem

  • Unsupervised anomaly detection (clustering): train directly on the data.
  • Weakly-supervised anomaly detection (“one-class classification”): train on background only (would probably need simulations for this).

slide-41
SLIDE 41

Sample definitions

Same jet specifications as for the top tagging study. We used:

  • q/g jets as background, and
  • boosted tops and 400 GeV gluinos (decaying via RPV) as signal.

We simulated:

  • 1.2M jets of each type,
  • using standard, open-source particle physics tools [Pythia 8 and Delphes],
  • and turned them into 37×37 grayscale images.

[Figure: q/g jets vs. boosted tops and RPV gluinos]

slide-42
SLIDE 42

A promising idea: deep autoencoders

Heimel et al 1808.08979; Farina, Nakai & DS 1808.08992

An autoencoder maps an input into a “latent representation” and then attempts to reconstruct the original input from it. The encoding is lossy, so the decoding cannot be perfect.

[Diagram: autoencoder — input → encoder → latent layer → decoder → output]

Some previous approaches: 
 Aguilar-Saavedra et al, "A generic anti-QCD jet tagger” 1709.01087
 Collins et al, “CWoLa Hunting” 1805.02664
 Hajer et al “Novelty Detection Meets Collider Physics” 1807.10261

slide-43
SLIDE 43

Deep autoencoders for anomaly detection

Heimel et al 1808.08979; Farina, Nakai & DS 1808.08992

[Figure: reconstruction error distributions]

By training the autoencoder on a set of “normal” events, it learns to reconstruct them well. Then when the autoencoder encounters “anomalous” events that it was not trained on, its performance should be worse.

Quantify AE performance using the reconstruction error:

$L = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{\mathrm{in}} - x_i^{\mathrm{out}}\right)^2$

Can use reconstruction error as an anomaly threshold!
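The thresholding logic can be sketched in a few lines of numpy. Here toy residuals stand in for a trained autoencoder’s outputs, and the 99th-percentile cut is an arbitrary illustrative choice:

```python
import numpy as np

def reconstruction_error(x_in, x_out):
    """L = (1/N) sum_i (x_in_i - x_out_i)^2, evaluated per event."""
    return np.mean((x_in - x_out) ** 2, axis=-1)

rng = np.random.default_rng(2)
# Toy stand-in for a trained AE (illustrative): "normal" events reconstruct
# well (small residuals), "anomalous" ones poorly (large residuals).
normal_in = rng.normal(0, 1, (1000, 16))
normal_out = normal_in + rng.normal(0, 0.1, (1000, 16))
anom_in = rng.normal(0, 1, (50, 16))
anom_out = anom_in + rng.normal(0, 1.0, (50, 16))

# Anomaly threshold: e.g. the 99th percentile of the "normal" errors.
threshold = np.percentile(reconstruction_error(normal_in, normal_out), 99)
flagged = reconstruction_error(anom_in, anom_out) > threshold
print(f"flagged {flagged.mean():.0%} of anomalous events")
```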

slide-44
SLIDE 44

Autoencoder architecture

Encoder Latent space Decoder

Architecture based on M. Ke, C. Lin, Q. Huang (2017):

128C3-MP2-128C3-MP2-128C3-32N-6N-32N-12800N-128C3-US2-128C3-US2-1C3

slide-45
SLIDE 45

[Figure: reconstruction error distributions for QCD, tops, and gluinos]

Performance should be worse on “anomalous” events that autoencoder was not trained on.

slide-46
SLIDE 46

The algorithm works when trained on QCD backgrounds!

Can use reconstruction error as an anomaly threshold.

slide-47
SLIDE 47

Fully unsupervised learning

[Figure: E10 and E100 vs. contamination ratio, for the PCA, dense, and CNN autoencoders]

Performance of AE surprisingly robust even up to 10% contamination! Train on sample of QCD background “contaminated” with a small fraction of signal. Representative of actual data.

(E_x = signal efficiency at background rejection factor x)
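The E_x metric can be computed directly from anomaly scores. A minimal sketch with toy Gaussian scores (illustrative only; the function name is ours, not from the paper):

```python
import numpy as np

def eff_at_rejection(sig_scores, bkg_scores, rejection):
    """E_x: signal efficiency at the cut giving background rejection 1/eps_b = rejection."""
    # Choose the cut so that only a fraction 1/rejection of background passes.
    cut = np.quantile(bkg_scores, 1.0 - 1.0 / rejection)
    return (sig_scores > cut).mean()

rng = np.random.default_rng(3)
sig = rng.normal(2.0, 1.0, 100_000)  # toy anomaly scores (illustrative)
bkg = rng.normal(0.0, 1.0, 100_000)

print(f"E10  = {eff_at_rejection(sig, bkg, 10):.2f}")
print(f"E100 = {eff_at_rejection(sig, bkg, 100):.2f}")
```

Tighter background rejection (larger x) costs signal efficiency, so E100 < E10.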

slide-48
SLIDE 48

Bump hunt with deep autoencoder

Can train directly on data that contains 400 GeV gluinos, use the AE to clean away “boring” SM events, and improve S/N by a lot

[Figure: mass spectrum before and after the AE cut]

Could really discover new physics this way!

slide-49
SLIDE 49

Summary

Deep learning has revolutionized the field of artificial intelligence and has given birth to a number of stunning real-world applications. The revolution is coming to high-energy physics! In this talk, we gave an overview of deep learning applications to the LHC. Then we focused on two promising applications:

  • Top tagging with jet images and CNNs (supervised learning)

Enormous gains in performance over cut-based and shallow ML methods.

  • Deep autoencoders for open-ended anomaly detection (unsupervised learning)

Novel proposal for searching for new physics in the data without prejudice.

slide-50
SLIDE 50

Summary

The Standard Model has withstood the test of time for over 40 years. Despite knowing that new physics beyond the SM is out there, we have yet to see any evidence for it at the LHC. We need more ideas for how to search for the unexpected at the LHC.

  • Autoencoders for anomaly detection are a promising direction but there are surely many more!

Input from the ML experts in the audience would be most appreciated!

slide-51
SLIDE 51

https://indico.cern.ch/event/809820/page/16782-lhcolympics2020

slide-52
SLIDE 52
slide-53
SLIDE 53

Thanks for your attention!

Sebastian Macaluso Yuichiro Nakai Dipsikha Debnath Matt Buckley Marco Farina Scott Thomas

slide-54
SLIDE 54

Backup material

slide-55
SLIDE 55

Autoencoder architectures

We considered three autoencoder architectures (many more are possible):

  • Principal Component Analysis (PCA)
  • Dense NN
  • Convolutional NN
slide-56
SLIDE 56

Autoencoder architectures

We considered three autoencoder architectures (many more are possible):

  • Principal Component Analysis (PCA)
  • Dense NN
  • Convolutional NN

Project onto the first d PCA eigenvectors

z = Pdxin

Inverse transform to reconstruct original input

xout = PT

d z = PT d Pdxin
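These two equations can be implemented directly with an SVD of the centered data. A minimal sketch on a toy dataset (the data here is invented: 20-dimensional vectors generated from 3 underlying directions, so a d = 3 PCA autoencoder reconstructs it almost perfectly):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy dataset (illustrative): 500 "images" flattened to 20-dim vectors,
# generated from only 3 underlying directions plus a little noise.
X = rng.normal(0, 1, (500, 3)) @ rng.normal(0, 1, (3, 20)) \
    + rng.normal(0, 0.05, (500, 20))

# P_d: the first d principal directions (rows), from the SVD of the centered data.
d = 3
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P_d = Vt[:d]

z = Xc @ P_d.T   # encode:  z     = P_d x_in
X_out = z @ P_d  # decode:  x_out = P_d^T z = P_d^T P_d x_in

# Reconstruction loss on the centered data; small, since the data is ~3-dimensional.
loss = np.mean((Xc - X_out) ** 2)
print(f"reconstruction loss = {loss:.4f}")
```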

slide-57
SLIDE 57

Autoencoder architectures

We considered three autoencoder architectures (many more are possible):

  • Principal Component Analysis (PCA)
  • Dense NN
  • Convolutional NN

Flatten the input into a single column vector; use a single hidden layer with dimension d = 32.

slide-58
SLIDE 58

Autoencoder architectures

We considered three autoencoder architectures (many more are possible):

  • Principal Component Analysis (PCA)
  • Dense NN
  • Convolutional NN

Encoder Latent space Decoder

Architecture based on M. Ke, C. Lin, Q. Huang (2017):

128C3-MP2-128C3-MP2-128C3-32N-6N-32N-12800N-128C3-US2-128C3-US2-1C3

slide-59
SLIDE 59

Choosing the latent dimension

[Figure: PCA eigenvalues vs. component number; reconstruction loss vs. encoding dimension, for the PCA, dense, and CNN autoencoders]

d too large → the autoencoder becomes the identity transform. d too small → the autoencoder cannot learn all the features. We should choose the latent dimension in an unsupervised manner (i.e. without optimizing on a specific signal).

Can examine PCA eigenvalues or reconstruction loss vs latent dimension and look at where they are saturated.

We chose d=6
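One concrete unsupervised recipe for this choice is to look for the saturation point of the PCA eigenvalue spectrum, e.g. via cumulative explained variance. A minimal sketch on toy data with 6 strong directions, mimicking the d = 6 choice (the 99% variance target is an illustrative choice, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data with 6 strong directions in 30 dimensions (illustrative).
X = rng.normal(0, 1, (2000, 6)) @ rng.normal(0, 1, (6, 30)) \
    + rng.normal(0, 0.1, (2000, 30))

# PCA eigenvalues = squared singular values of the centered data / (N - 1).
Xc = X - X.mean(axis=0)
eig = np.linalg.svd(Xc, compute_uv=False) ** 2 / (len(X) - 1)

# Unsupervised choice: smallest d capturing (say) 99% of the variance.
frac = np.cumsum(eig) / eig.sum()
d = int(np.searchsorted(frac, 0.99) + 1)
print(f"chosen latent dimension d = {d}")
```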

slide-60
SLIDE 60

Robustness with other Monte Carlo

slide-61
SLIDE 61

Correlation with jet mass

[Figure: mean jet mass vs. reconstruction error, for the PCA, dense, and CNN autoencoders]

Indeed, this is confirmed by looking at the mean jet mass in bins of reconstruction error for the QCD background.

CNN is no longer correlated with jet mass for m≳250 GeV

slide-62
SLIDE 62

Correlation with jet mass

The QCD jet mass distribution is stable against harder cuts on the reconstruction error, for the CNN autoencoder.