
SLIDE 1

Identifying top quarks in the pursuit of new physics

Siddharth Narayanan, MIT PPC
MIT LNS Seminar, 20/02/2018

S. Narayanan

(MIT) LNS Seminar 20/02/2018 1 / 42

SLIDE 2

Searching for new physics at the LHC

◮ The LHC is a particle factory
◮ Many ways to search for beyond-the-Standard-Model (BSM) physics at the LHC:
    ◮ Produce SM particles and precisely measure their properties
    ◮ Produce SM bound states and study interesting decay channels
    ◮ Produce BSM particles and identify them
    ◮ . . .
◮ Identify BSM particles by looking for:
    ◮ Resonant final states
    ◮ Exotic decays (e.g. semi-stable particles)
    ◮ Particles with small couplings to SM particles
    ◮ . . .

SLIDE 3

Seeing the invisible

◮ Particles that do not interact with our detector are “invisible” for practical purposes
◮ Their presence must be inferred through momentum conservation
◮ This momentum imbalance is referred to as p_T^miss
◮ Invisible particles include dark matter candidates

“L’essentiel est invisible pour les yeux.” (“What is essential is invisible to the eye.”) - A. de Saint-Exupéry

SLIDE 4

Hadronic top quarks and missing momentum

◮ Choice of SM particle probes different models and parameter spaces
    ◮ Light quarks or gluons
    ◮ W/Z/H/γ bosons
    ◮ Top quarks
◮ Single top quark + p_T^miss implies flavor-violating new physics
    ◮ Can have implications for baryogenesis and DM
◮ Require the top quark to decay hadronically:
    ◮ Larger branching ratio
    ◮ p_T^miss is purely due to BSM particles

[Figure: Feynman diagrams of two mono-top production modes. Particle labels: u, g, t, V, χ, χ̄ in the first diagram and s̄, d̄, φ, t, ψ in the second.]

SLIDE 5

Outline

◮ The mono-top search at CMS
    ◮ Constructing principal observables
    ◮ Identifying hadronic top quarks
    ◮ Constraining backgrounds
    ◮ Interpreting results
◮ New approaches to top-tagging

SLIDE 6

Search for top + p_T^miss

SLIDE 7

Compact Muon Solenoid

◮ General-purpose detector
◮ p_T^miss is constructed as:

\[
\vec{p}_T^{\,\mathrm{miss}} = -\left( \sum_{i \in \mathrm{particles}} \vec{p}_i \right)_T
\]

where the sum is over all identifiable particles
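As a minimal sketch of the definition above, the missing transverse momentum can be computed from the transverse components of each reconstructed particle. The `(pt, phi)` input format and the function name are assumptions made for illustration, not the CMS reconstruction code.

```python
import math

def missing_pt(particles):
    """Return (magnitude, phi) of the missing transverse momentum.

    `particles` is a list of (pt, phi) pairs, one per reconstructed particle
    (hypothetical input format for this sketch).
    """
    sum_px = sum(pt * math.cos(phi) for pt, phi in particles)
    sum_py = sum(pt * math.sin(phi) for pt, phi in particles)
    # Missing momentum is the negative of the visible transverse vector sum.
    mpx, mpy = -sum_px, -sum_py
    return math.hypot(mpx, mpy), math.atan2(mpy, mpx)

# Two visible particles at right angles leave an imbalance opposite to both.
event = [(100.0, 0.0), (60.0, math.pi / 2)]
met, met_phi = missing_pt(event)
```

A perfectly balanced event, such as two back-to-back particles of equal pT, gives a magnitude of zero up to rounding.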

SLIDE 8

Compact Muon Solenoid

Solenoid
◮ 3.8 T field parallel to beam

Silicon tracker
◮ Inner layers of pixels and outer layers of strip detectors
◮ Momentum measurement of charged hadrons, electrons, muons
◮ Track vertexing to ID
    ◮ . . . pile-up noise
    ◮ . . . B-meson decays

SLIDE 9

Compact Muon Solenoid

Calorimeters
◮ EM: homogeneous, PbWO4 crystals
◮ Hadronic: sampling, brass and plastic scintillator
◮ Energy resolution and large coverage of calorimeters critical to p_T^miss resolution

Muon chambers
◮ Various ionization detectors
◮ Used to ID muons and improve momentum measurement

SLIDE 10

Observables of mono-top final state

◮ Large p_T^miss and hadronic top decay define mono-top
◮ Top quark decays to 3 quarks ⇒ 3 jets
◮ “Jet” is algorithm-dependent
    ◮ e.g. anti-kT algorithm with radius of R = 0.4 in η-φ plane
◮ Signal models produce more energetic jets than background processes
◮ Separation between daughter quarks scales as ∆R ∼ 2mt/pT
◮ If we want to look for highly-boosted top quarks, we cannot resolve decay products as separate jets
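The scaling ∆R ∼ 2mt/pT can be turned into a quick numerical estimate. The sketch below assumes a top quark mass of 173 GeV (an approximate value, not quoted on the slide) and treats the relation as a rule of thumb rather than an exact kinematic bound.

```python
M_TOP = 173.0  # GeV; approximate top quark mass (assumed value)

def decay_opening_angle(pt, mass=M_TOP):
    """Rule-of-thumb separation of decay products: Delta R ~ 2 m / pT."""
    return 2.0 * mass / pt

def min_pt_for_radius(radius, mass=M_TOP):
    """Smallest pT at which the decay products fit within `radius`."""
    return 2.0 * mass / radius

for pt in (250.0, 500.0, 1000.0):
    print(f"pT = {pt:6.0f} GeV -> Delta R ~ {decay_opening_angle(pt):.2f}")
```

Under this estimate the separation falls below the R = 0.4 jet radius only for pT above roughly 865 GeV, which is where the three quarks can no longer be resolved as separate R = 0.4 jets.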

SLIDE 11

Reconstruction of top quark

◮ Replace 3 AK 0.4 jets (AK4) with a single CA 1.5 jet (CA15)
    ◮ 0.4 → 1.5: much wider radius
    ◮ Anti-kT → Cambridge-Aachen: more geometrical clustering
◮ Large radius allows single algorithm to reconstruct range of top quark momenta
    ◮ As low as pT ∼ 250 GeV
◮ These are big jets
    ◮ R = 1.5 can contain up to half the detector
    ◮ Unwanted particles can sneak into top jet
    ◮ Fake top jets from combinatorial q/g -

SLIDE 12

Identifying top jets

◮ Remove pile-up contamination from event
    ◮ Pile-Up Per Particle Identification (PUPPI) algorithm [arXiv:1407.6013]
    ◮ Likelihood given particle is from primary vertex
◮ Remove soft and wide-angle radiation from jet
    ◮ Soft drop grooming [arXiv:1402.2657]
    ◮ Used to improve mass resolution and define “subjets”
◮ Identify b subjets
    ◮ Probability based on signatures of B meson decays, including displaced decay vertex
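The soft drop step can be sketched as a walk down the jet's clustering tree: at each splitting the softer branch is kept only if z = min(pT1, pT2)/(pT1 + pT2) exceeds z_cut (∆R/R0)^β, following arXiv:1402.2657. The nested-dict tree layout and the default values z_cut = 0.1, β = 0, R0 = 1.5 are illustrative assumptions, not necessarily the settings used in this analysis.

```python
import math

def delta_r(a, b):
    """Angular distance in the eta-phi plane, with phi wrapped to [-pi, pi)."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def soft_drop(node, z_cut=0.1, beta=0.0, r0=1.5):
    """Walk down a clustering tree, dropping soft wide-angle branches.

    Each node is a dict with "pt", "eta", "phi" and, unless it is a single
    particle, a "children" pair (hypothetical layout for this sketch).
    """
    while node.get("children"):
        first, second = node["children"]
        hard, soft = (first, second) if first["pt"] >= second["pt"] else (second, first)
        z = soft["pt"] / (hard["pt"] + soft["pt"])
        if z > z_cut * (delta_r(hard, soft) / r0) ** beta:
            return node  # condition met: grooming stops, children are the subjets
        node = hard      # otherwise discard the softer branch and continue
    return node
```

The two children of the returned node play the role of the “subjets” mentioned above.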

[Figure: soft-drop mass of the fat jet, mSD [GeV], data compared with simulated tt̄, W+jets, Z+jets, single t, diboson, and QCD. CMS Preliminary, 35.8 fb^-1 (13 TeV).]

[Figure: CSVv2 discriminator for soft drop subjets of muon-tagged AK8 jets (pT > 350 GeV) in a muon-enriched multijet sample, data compared with simulated b, c, and udsg jets. CMS Preliminary, 36 fb^-1, √s = 13 TeV, 2016.]

SLIDE 13

Jet substructure

◮ A top jet is expected to have a structure consistent with 3 partons
◮ Jets from light quarks typically do not have distinct prongs
◮ Observables that are sensitive to such structure are referred to as substructure

SLIDE 14

Substructure variables

◮ N-subjettiness [arXiv:1011.2268]
    ◮ τN is a measure of the compatibility of a jet with an N-axis hypothesis
◮ HEPTopTagger [arXiv:1312.1504]
    ◮ Decluster the jet into subjets and re-combine them to reconstruct W and t
    ◮ Variable of interest is frec ∼ mW/mt
◮ Energy correlation functions [arXiv:1609.07473]
    ◮ Defines variables e(α, N, a) sensitive to N-point correlations in the jet
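To make the first of these concrete, N-subjettiness can be sketched as the pT-weighted distance of each constituent to its nearest candidate axis, normalised by the total pT times a reference radius. In this sketch the axes are supplied by the caller and R0 = 1.5 is an assumed default; real implementations also have to choose or optimise the axes.

```python
import math

def _delta_r(eta_a, phi_a, eta_b, phi_b):
    dphi = (phi_a - phi_b + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta_a - eta_b, dphi)

def n_subjettiness(particles, axes, r0=1.5):
    """tau_N for constituents (pt, eta, phi) and N candidate axes (eta, phi)."""
    d0 = sum(pt for pt, _, _ in particles) * r0
    weighted = sum(
        pt * min(_delta_r(eta, phi, ax_eta, ax_phi) for ax_eta, ax_phi in axes)
        for pt, eta, phi in particles
    )
    return weighted / d0

# Three equally hard constituents: a perfect 3-prong configuration.
jet = [(100.0, 0.0, 0.0), (100.0, 0.5, 0.0), (100.0, 0.0, 0.5)]
tau3 = n_subjettiness(jet, [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)])
tau2 = n_subjettiness(jet, [(0.0, 0.0), (0.5, 0.0)])
```

A small ratio τ3/τ2 indicates that three axes describe the jet much better than two, which is the behaviour expected of a top jet.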

[Figures: Top versus QCD distributions for 110 < mSD < 210 GeV, CMS Preliminary: groomed τ32, HTT frec, and e(2,4,2)/e(1,3,2)^2.]

SLIDE 15

ECF ratios

◮ Expect a top jet to have strong 3-point correlations, but not 4-point correlations
    ◮ e(N = 4)/e(N = 3) ∼ 0
◮ Both N = 3 and N = 4 should be weak for q/g jets
    ◮ e(N = 4)/e(N = 3) > 0
◮ Ratio proposed in arXiv:1609.07473 is

\[
N_3^{(\beta)} = \frac{e(2, 4, \beta)}{\left(e(1, 3, \beta)\right)^2}
\]

◮ Can naturally extend this to a wider class of dimensionless variables:

\[
\psi(a, \alpha, N; b, \beta, M) \equiv \frac{e(a, N, \alpha)}{e(b, M, \beta)^{x}}, \quad \text{where } M \le N \text{ and } x = \frac{a\alpha}{b\beta}
\]
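As a quick check of the exponent rule x = aα/(bβ), the following worked cases apply it to N_3 and to two of the ratios plotted elsewhere in this talk. This is only arithmetic on the definition of ψ, not an additional result.

```latex
\begin{align*}
N_3^{(\beta)} = \psi(2, \beta, 4;\, 1, \beta, 3) &: \quad x = \frac{2 \cdot \beta}{1 \cdot \beta} = 2 \\
\frac{e(1,2,2)}{e(1,2,1)^{x}} = \psi(1, 2, 2;\, 1, 1, 2) &: \quad x = \frac{1 \cdot 2}{1 \cdot 1} = 2 \\
\frac{e(3,3,2)}{e(3,3,4)^{x}} = \psi(3, 2, 3;\, 3, 4, 3) &: \quad x = \frac{3 \cdot 2}{3 \cdot 4} = \tfrac{1}{2}
\end{align*}
```

In each case the exponent matches the power that appears on the denominator of the plotted ratio.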

SLIDE 16

Non-trivial ECF ratios

◮ Turns out many correlation function ratios can separate signal and background
◮ Of course, most combinations are highly correlated or not useful

[Figures: Top versus QCD distributions for 110 < mSD < 210 GeV, CMS Preliminary:
  e(N = 2)/e(N = 2): e(1,2,2)/e(1,2,1)^2
  e(N = 3)/e(N = 3): e(3,3,2)/e(3,3,4)^(1/2)
  e(N = 4)/e(N = 3): e(1,4,4)/e(1,3,2)^2]

SLIDE 17

Building a multivariate tagger

◮ ECFs and τN are computed after soft drop grooming is applied
◮ Space of ECF ratios is pruned based on 2 criteria:
    ◮ Separation power
    ◮ Agreement between MC simulation and data
◮ Use a boosted decision tree on 11 ECF ratios, frec, τ3/τ2

Tagger              εbkg (εsig = 0.5)
ECF + τ32 + frec    4.7%
τ32 + frec          6.2%
τ32                 6.9%

[Figure: background acceptance versus signal efficiency for 110 < mSD < 210 GeV. Curves: 11 ECF + τ32^SD + frec BDT, 11 ECF BDT, 50 ECF BDT, τ32^SD + frec BDT, groomed τ32. CMS Preliminary.]
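The figure of merit used to compare taggers here, the background efficiency at a fixed signal efficiency of 50%, can be computed from classifier scores as in the sketch below. The function name and the plain lists of scores are assumptions for illustration; the score convention (higher means more signal-like) is also assumed.

```python
def background_efficiency(sig_scores, bkg_scores, sig_eff=0.5):
    """Background acceptance at the score cut that keeps `sig_eff` of signal."""
    ordered = sorted(sig_scores, reverse=True)
    n_keep = max(1, round(sig_eff * len(ordered)))
    cut = ordered[n_keep - 1]  # lowest signal score still accepted
    n_bkg_pass = sum(1 for score in bkg_scores if score >= cut)
    return n_bkg_pass / len(bkg_scores)

# Toy scores: half of the signal lies at or above 0.8.
signal = [0.9, 0.8, 0.7, 0.6]
background = [0.85, 0.5, 0.4, 0.3]
eps_bkg = background_efficiency(signal, background)
```

With real samples, per-event weights would normally enter both the signal quantile and the background sum; they are omitted here for brevity.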

SLIDE 18

Validation in data

[Figures: Top BDT response in two selections, data compared with simulated tt̄, W+jets, Z+jets, single t, diboson, and QCD. CMS Preliminary, 35.8 fb^-1 (13 TeV).]

[Figure: fit to the fat jet mSD [GeV] distribution in the pass category (data, background, matched top; pre-fit and post-fit). CMS Preliminary, 36.6 fb^-1 (13 TeV). Efficiencies shown on the plot:]

                Data               MC
ε(tag)          0.671 ± 0.0114     0.713
ε(tag+mSD)      0.557 ± 0.00946    0.566

◮ Generally observe good agreement between data and simulation in top and q/g jets
◮ Make an unbiased estimate of εsig in data and find it is consistent with MC

SLIDE 19

Selecting mono-top events

Mono:
◮ Select events with p_T^miss > 250 GeV
    ◮ Threshold is set by trigger efficiency
◮ Nothing else (e/µ/τ/γ/b) in the event

Top:
◮ One CA15 jet with pT > 250 GeV
◮ . . . containing one b-tagged subjet
◮ . . . having a mass 110 < mSD < 210 GeV
◮ . . . passing a BDT selection
◮ Signal region is split into “tight” and “loose” categories, based on BDT response

Remaining backgrounds:

Process             Contamination   Mechanism
tt̄ → (bℓ)ν+jets     50%             real top jet, lost charged lepton and b jet
Z → νν+jets         30%             q/g jet faking a top jet
W → (ℓ)ν+jets       15%             q/g jet faking a top jet, lost charged lepton
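The cuts listed above can be sketched as a single event filter. The event record layout (dict keys such as `met`, `ca15_jets`, `n_btagged_subjets`) is hypothetical, the BDT threshold is left as a free parameter because the tight and loose boundaries are not given here, and “one b-tagged subjet” is read literally as exactly one.

```python
def passes_monotop_selection(event, bdt_cut):
    """Apply the mono-top cuts to one event record (hypothetical layout)."""
    # "Mono": large missing momentum and no other identified objects.
    if event["met"] <= 250.0:
        return False
    if event["n_extra_objects"] != 0:  # e, mu, tau, photon, or extra b jet
        return False
    # "Top": exactly one CA15 jet above the pT threshold.
    jets = [jet for jet in event["ca15_jets"] if jet["pt"] > 250.0]
    if len(jets) != 1:
        return False
    jet = jets[0]
    return (
        jet["n_btagged_subjets"] == 1      # literal reading of "one b-tagged subjet"
        and 110.0 < jet["m_sd"] < 210.0    # soft-drop mass window in GeV
        and jet["bdt"] > bdt_cut           # placeholder threshold
    )
```

Calling the function twice with two different `bdt_cut` values would give the kind of tight and loose split described above.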

SLIDE 20

Background estimation

◮ Variable of interest in SR is p_T^miss
◮ Reconstructed p_T^miss in SR ∼ pT of the vector boson
    ◮ A lost charged lepton is typically out of acceptance
    ◮ In the case of tt̄, the leptonic W is the vector boson
◮ Therefore, define recoil U:

\[
p_T^{W/Z} \approx U = \vec{p}_T^{\,\mathrm{miss}} + \sum_{i \in e,\mu,\gamma} \vec{p}_T^{\,i}
\]

◮ U in a Z → µµ event is analogous to p_T^miss in a Z → νν event
◮ Allows us to use visible processes to constrain invisible ones
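A minimal sketch of the recoil definition: the transverse momenta of the identified leptons and photons are added back to the missing momentum vector, and the magnitude of the result is taken as U. Taking the magnitude and the `(pt, phi)` input format are assumptions of this sketch.

```python
import math

def recoil(met, met_phi, leptons_and_photons):
    """Magnitude of U = pTmiss + sum of lepton/photon pT vectors.

    `leptons_and_photons` is a list of (pt, phi) pairs (hypothetical format).
    """
    ux = met * math.cos(met_phi)
    uy = met * math.sin(met_phi)
    for pt, phi in leptons_and_photons:
        ux += pt * math.cos(phi)
        uy += pt * math.sin(phi)
    return math.hypot(ux, uy)

# Z -> mu mu candidate with no genuine missing momentum:
# the recoil recovers the pT carried by the two muons.
u_zmm = recoil(0.0, 0.0, [(150.0, 0.0), (150.0, 0.0)])
```

In an event without leptons or photons the recoil reduces to p_T^miss itself, which is why the same variable can be used in the signal and control regions.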

SLIDE 21

Background estimation

SLIDE 22

Additional background constraints

◮ Uncertainties on the extrapolations discussed so far are small
    ◮ Lepton identification
    ◮ b jet tagging
    ◮ Heavy-flavor fraction
◮ However, Z → ℓℓ is statistically limited in tails of U
◮ Augment Z estimation with two additional constraints
◮ Correlate the yield of Z and W bosons in the SR
    ◮ Theoretical uncertainty on W/Z ratio ∼ 10%
◮ Introduce a γ+jets CR and correlate with Z yield in SR
    ◮ γ events have very high yield
    ◮ Comes with a large theoretical uncertainty (up to 15%)

SLIDE 23

Background estimation summary

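[Diagram: map of constraints between signal-region (SR) backgrounds and control regions (CR). Nodes: Z → νν (SR), Z → ℓℓ (CR), γ (CR), W → ℓν (SR), W → ℓν (W CR), tt̄ (SR), tt̄ (W CR), tt̄ (top CR).]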

SLIDE 24

Putting it all together

[Figures: distributions for events with BDT > 0.45, CMS, 36 fb^-1 (13 TeV). Each panel shows events/GeV for data and the pre-fit and post-fit SM prediction, with the data/prediction ratio below:
  SR: p_T^miss [GeV] in the signal region
  Z → ee: recoil pT [GeV] in the dielectron CR
  γ: recoil pT [GeV] in the photon CR
  tt̄ → µν+jets: recoil pT [GeV] in the single-µ b-tagged CR]

◮ Too many regions to show all here
◮ SM processes are able to fit the data quite well in all regions, including the SR
◮ Indicates no sensitivity to a potential signal

SLIDE 25

Constraining resonant scalars

◮ pT of top quark increases with mφ
◮ Therefore, efficiency of signal selection improves at high mφ
◮ Scalars up to 3.5 TeV are excluded

[Figure: Feynman diagram with particle labels φ, s̄, d̄, t, ψ.]

[Figure: observed and expected (median, 68%, 95%) 95% CL upper limits on σ(pp → φ → tψ) versus mφ [TeV], compared with σ_theory. Resonant scalar production with a_q = b_q = 0.1, a_ψ = b_ψ = 0.2, m_ψ = 100 GeV. CMS, 36 fb^-1 (13 TeV).]

SLIDE 26

Constraining vector FCNCs

◮ Scan both mV and mχ
◮ Vectors up to 1.8 TeV are excluded
◮ Couplings (gq, gχ) chosen to conform to LHC Dark Matter Working Group benchmarks

[Figure: Feynman diagram with particle labels u, g, t, V, χ, χ̄.]

[Figure: observed σ_95% CL/σ_theory in the (mV, mχ) plane [TeV], with median expected (± 1σ experiment) and observed (± 1σ theory) 95% CL contours. FCNC model with g_q^V = 0.25, g_χ^V = 1. CMS, 36 fb^-1 (13 TeV).]

SLIDE 27

Constraining axial-vector FCNCs

◮ Behavior at low mχ very similar to the vector case
◮ Behavior at off-shell boundary heavily modified

[Figure: Feynman diagram with particle labels u, g, t, V, χ, χ̄.]

[Figure: observed σ_95% CL/σ_theory in the (mV, mχ) plane [TeV], with median expected (± 1σ experiment) and observed (± 1σ theory) 95% CL contours. FCNC model with g_q^A = 0.25, g_χ^A = 1. CMS, 36 fb^-1 (13 TeV).]

SLIDE 28

Exploring coupling parameter space

[Figures: FCNC vector-mediator limits, CMS, 36 fb^-1 (13 TeV). The first two panels show observed σ_95% CL/σ_theory with median expected (± 1σ experiment) and observed (± 1σ theory) 95% CL contours:
  V-χ coupling vs mV: g_χ^V versus mV [TeV], with mχ = 1 GeV and g_q^V = 0.25
  V-q coupling vs mV: g_q^V versus mV [TeV], with mχ = 1 GeV and g_χ^V = 1
  g_q^V vs g_χ^V vs mV: 95% CL excluded mV [TeV] in the (g_q^V, g_χ^V) plane]

◮ Can exclude couplings below 0.1 at sufficiently low mV
◮ Given sufficiently strong (but still physical) couplings, exclude mV < 2.5 TeV

SLIDE 29

(Re-)learning how to top-tag

SLIDE 30

Where next with top-tagging?

◮ Top-tagging using QCD-motivated observables works very well
◮ Is there a “maximum” performance threshold that we are saturating?
◮ One approach is to brute-force the problem using deep learning
◮ Factorize the question: physics effects vs. detector effects
◮ Following studies are done using hadron-level MC
    ◮ Madgraph5 at LO for hard scattering
    ◮ Pythia8 for hadronization
    ◮ No detectors were simulated (or harmed) in performing this study
◮ Training is done on a desktop in building 24
    ◮ NVIDIA GTX 1080 GPU
    ◮ Keras [1] with tensorflow [2] backend

[1] https://github.com/keras-team/keras
[2] https://github.com/tensorflow/tensorflow

SLIDE 31

Observables

◮ Jet definition more commonly used in LHC searches:
    ◮ pT > 400 GeV
    ◮ Anti-kT, R = 0.8
◮ For each particle in the jet, 7 features:
    ◮ pµ (4 floats)
    ◮ Distance between particle and jet axis (1 float)
    ◮ Soft drop survival (1 boolean)
    ◮ Particle type (e±, µ±, γ, charged hadron±, neutral hadron) (1 integer)
◮ Constituents are momentum-ordered
◮ Rotate the jet so:
    ◮ Jet axis coincides with z-axis
    ◮ Hardest particle away from jet axis lies in x-z plane
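The rotation step can be sketched with two elementary rotations that bring the jet axis onto the z-axis, followed by a rotation about z that puts the hardest off-axis constituent in the x-z plane. Taking the jet axis as the vector sum of the constituents and using plain 3-momentum tuples are assumptions of this sketch, not details given in the talk.

```python
import math

def _rotate_z(v, angle):
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def _rotate_y(v, angle):
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def align_jet(constituents, eps=1e-9):
    """Rotate momentum-ordered 3-momenta (px, py, pz) into the jet frame."""
    # Jet axis taken as the vector sum of the constituents (assumption).
    jx = sum(p[0] for p in constituents)
    jy = sum(p[1] for p in constituents)
    jz = sum(p[2] for p in constituents)
    phi = math.atan2(jy, jx)
    theta = math.atan2(math.hypot(jx, jy), jz)
    # Step 1: bring the jet axis onto the z-axis.
    rotated = [_rotate_y(_rotate_z(p, -phi), -theta) for p in constituents]
    # Step 2: take the hardest constituent that is off-axis and rotate
    # about z so that it lies in the x-z plane (y = 0, x > 0).
    for x, y, _ in rotated:
        if math.hypot(x, y) > eps:
            psi = math.atan2(y, x)
            rotated = [_rotate_z(p, -psi) for p in rotated]
            break
    return rotated
```

Because the input is already momentum-ordered, the first constituent with a non-negligible transverse component is the hardest one away from the axis.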

SLIDE 32

Brutalist architecture

[Diagram: Jet (N particles × M features) → linear combinations (Q combinations × M features) → fully connected layers → prediction]

◮ Brute-force approach
◮ First layer only takes linear combinations of input particles, e.g. Σ_i w_i p_i^µ
◮ Second set of layers is a classical neural network with many layers
◮ O(10^6) parameters
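The first layer described above can be sketched without any deep learning library: each of the Q outputs is a weighted sum over the N input particles, applied feature by feature. The function name and the list-of-lists layout are assumptions for illustration; in the actual network the weights would be learned.

```python
def linear_combinations(particles, weights):
    """Map N particles x M features to Q combinations x M features.

    `weights[q][i]` is the weight of particle i in combination q.
    """
    n_features = len(particles[0])
    return [
        [sum(w_i * p[m] for w_i, p in zip(row, particles)) for m in range(n_features)]
        for row in weights
    ]

# Two particle four-vectors (E, px, py, pz); combinations: sum and difference.
four_vectors = [[50.0, 10.0, 0.0, 48.0], [30.0, -5.0, 2.0, 29.0]]
combos = linear_combinations(four_vectors, [[1.0, 1.0], [1.0, -1.0]])
```

With all weights in a row equal to 1, a combination is just the summed four-vector of the jet constituents.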

SLIDE 33

Deep network performance

◮ Trained with all 7 features, 50 particles
◮ “Shallow” is network on QCD-motivated observables
◮ Deep network comparable to shallow
◮ On the one hand: built a performant classifier with little a priori knowledge
◮ On the other hand: disappointing

[Figure: background fake rate versus signal efficiency for τ32, τ32^SD, Shallow, and Deep (7,50).]

SLIDE 34

LSTM network

Similar work by S. Egan et al. in arXiv:1711.09059.
Jet goes into a recurrent neural network. Read as a sentence, with constituents as individual “words”.
LSTM is a specific RNN implementation.

[Diagram: Jet (N particles × M features) → chain of recurrent units → Q features → fully connected layers → prediction]

SLIDE 35

C-LSTM network

Constituents are fed pair-by-pair into 1D convolutions (width 2).
Convolutions are “translational-invariant” and relate adjacent constituents.
Second convolution (width 4) before going into LSTM network.

[Diagram: N particles × M features → Conv #1 applied along the constituent sequence → Q features → Conv #2 → LSTM + FC → prediction]
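What a width-2 convolution along the constituent sequence does can be sketched in a few lines: the same kernel is applied to every pair of adjacent constituents, which is the sense in which it is translation-invariant. Stride 1, no padding, and a single output channel are assumptions of this sketch; the talk does not specify them.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide one shared kernel over a sequence of feature vectors.

    `sequence` is N constituents x M features; `kernel` is width x M weights.
    Returns N - width + 1 outputs (stride 1, no padding).
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for weights, features in zip(kernel, window):
            total += sum(w * f for w, f in zip(weights, features))
        outputs.append(total)
    return outputs

# One feature per constituent; the kernel takes the difference of neighbours.
steps = conv1d([[1.0], [2.0], [3.0], [4.0]], kernel=[[1.0], [-1.0]])
```

The kernel responds identically to the same local pattern wherever it appears in the sequence, here giving the same value for every adjacent pair.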

SLIDE 36

C-LSTM network performance

◮ Dramatic improvement from giving structure to the network
    ◮ Even using only 4-vectors of 50 particles
◮ More improvement can be had by adding more information (4 → 7) or more particles (50 → 100)
◮ C-LSTMs have O(10^5) parameters

[Figure: background fake rate versus signal efficiency for τ32, τ32^SD, Shallow, Deep (7,50), C-LSTM (4,50), C-LSTM (7,50), C-LSTM (4,100), and C-LSTM (7,100).]

SLIDE 37

How realistic is this improvement?

◮ Or put another way: what are we learning that the shallow network cannot?
◮ Difficult to answer, but one hypothesis is the C-LSTM is taking advantage of infinite resolution
◮ Test: impose finite directional resolution δR on neutral particles
    ◮ If multiple particles overlap within δR, combine into a single particle
    ◮ Approximates calorimeter behavior
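One way to realise this test is a greedy merge of neutral particles that fall within δR of each other. The sketch below assumes a `(px, py, pz, e, is_neutral)` tuple per particle, visits neutrals from hardest to softest, and adds four-momenta on merging; the actual merging procedure used for the study is not specified in the talk.

```python
import math

def _eta_phi(px, py, pz):
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)

def _delta_r(a, b):
    eta_a, phi_a = _eta_phi(*a[:3])
    eta_b, phi_b = _eta_phi(*b[:3])
    dphi = (phi_a - phi_b + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta_a - eta_b, dphi)

def merge_neutrals(particles, delta_r):
    """Merge neutral particles closer than `delta_r`; keep charged ones as-is.

    Merging is greedy: each neutral is added to the first harder (possibly
    already merged) neutral within `delta_r`, if there is one.
    """
    charged = [p for p in particles if not p[4]]
    neutrals = sorted((p for p in particles if p[4]),
                      key=lambda p: math.hypot(p[0], p[1]), reverse=True)
    merged = []
    for p in neutrals:
        for i, seed in enumerate(merged):
            if _delta_r(p, seed) < delta_r:
                # Combine by adding four-momenta component-wise.
                merged[i] = tuple(a + b for a, b in zip(seed[:4], p[:4])) + (True,)
                break
        else:
            merged.append(p)
    return charged + merged
```

Setting `delta_r` to zero leaves the particle list unchanged, which corresponds to the infinite-resolution case.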

SLIDE 38

Finite resolution

◮ Using δR = 0.02 (realistic) significantly hurts performance
◮ Particle kinematics reflect smearing from finite resolution

[Figures: particle px [GeV] and particle pz [GeV] distributions for δR = 0 and δR = 0.2.]

[Figure: background fake rate versus signal efficiency for τ32, τ32^SD, Shallow, and the (4,50) and (7,100) networks with and without δR = 0.02.]

SLIDE 39

Correlation with mass

◮ Using the jet mass to extract a signal is a common technique
◮ Difficult to do when the background is strongly sculpted by classifier
◮ Need an approach to decorrelate the classifier output from mass (or any other nuisance)

[Figure: fit to the fat jet mSD [GeV] distribution in the pass category (data, background, matched top; pre-fit and post-fit), the same plot as under “Validation in data”. CMS Preliminary, 36.6 fb^-1 (13 TeV).]

[Figure: background mSD [GeV] distribution at working points εbkg = 1.000, 0.500, 0.250, 0.100, 0.010, 0.001. Background is sculpted to peak at mSD ∼ mt.]

SLIDE 40

Adversarial decorrelation

[Diagram: Jet (N particles × M features) → discrimination NN → P(top jet), with loss L1. P(top jet) → gradient reversal → adversarial NN → P(m | P(top jet)), with loss L2.]

Technique developed by C. Shimmin et al. in arXiv:1703.03507.

Add a second network that tries to learn P(m|ŷ). Adversarial network learns L2. Trick discriminatory network into learning L1 − λL2 as loss function.
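The combined objective L1 − λL2 can be illustrated numerically. In the sketch below L1 is taken to be a binary cross-entropy on the top-jet label and L2 a categorical cross-entropy on a binned jet mass; both choices, and all function names, are assumptions for illustration rather than the configuration used in this study.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean classification loss L1 of the discriminating network."""
    terms = [-(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps)))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_cross_entropy(bin_true, bin_probs, eps=1e-12):
    """Mean loss L2 of the adversary predicting the mass bin of each jet."""
    terms = [-math.log(max(probs[b], eps)) for b, probs in zip(bin_true, bin_probs)]
    return sum(terms) / len(terms)

def discriminator_objective(l1, l2, lam):
    """Quantity the discriminating network minimises: L1 - lambda * L2."""
    return l1 - lam * l2
```

The adversary minimises L2 on its own, while the discriminator is rewarded when L2 is large, that is, when its output carries little information about the mass; λ sets the trade-off between the two goals.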

SLIDE 41

Effect of adversarial network

◮ Behavior near mt can be controlled up to fairly strong background rejection
◮ Breaks down at εbkg = 0.1%

[Figures: background mSD [GeV] distributions at working points εbkg = 1.000, 0.500, 0.250, 0.100, 0.010, 0.001, for three taggers: τ32^SD, Deep NN, and Decorr. deep NN.]

SLIDE 42

Conclusions

Mono-top
◮ First boosted mono-top search
    ◮ Previous mono-top searches exist, but could not extend to high top pT
◮ Significant improvement over previous constraints on flavor-violating DM
◮ Boosted objects and large p_T^miss can be used to probe other final states
    ◮ Stay tuned for more from CMS!

Top-tagging
◮ Energy correlation functions improve sensitivity to mono-top production
◮ Deep learning techniques show promise, but still much to understand before use in experimental context

SLIDE 43

BACKUP

SLIDE 44

Generalized ECFs

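◮ Extension of original ECFs to allow for different angular orders:

\[
e(o, N, \beta) \equiv {}_{o}e_{N}^{\beta} = \sum_{i_1 < i_2 < \cdots < i_N \in J} \left[ \prod_{1 \le k \le N} z_{i_k} \right] \times \min \left\{ \prod_{k,l \in \mathrm{pairs}\{i_1, \ldots, i_N\}} \Delta R_{kl}^{\beta} \right\}
\]

where each product inside the minimum runs over o distinct pairs.

◮ e.g.

\[
{}_{2}e_{3}^{1} = \sum_{a < b < c \in J} z_a z_b z_c \times \min\{\Delta R_{ab}\Delta R_{ac},\ \Delta R_{ab}\Delta R_{bc},\ \Delta R_{bc}\Delta R_{ac}\}
\]

◮ Summary of parameters:
    ◮ N = order of the correlation function. An N-pronged jet should have eN ≫ eM, for N < M
    ◮ o = order of the angular factor
    ◮ β = angular power
        ◮ Tunes the relative importance of the angular factor and the energy factor
        ◮ Weights the impact of small angles (assuming ∆R < 1)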
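The definition can be sketched as a brute-force sum over all N-particle subsets of the jet. In this sketch z_i is taken as the constituent pT divided by the scalar sum of constituent pT, and the minimum over products of o pairwise angles is evaluated as the product of the o smallest angles in the subset; the input format and function names are assumptions for illustration.

```python
import math
from itertools import combinations

def _delta_r(a, b):
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)

def ecf(particles, o, n, beta):
    """Generalized energy correlation function e(o, N, beta).

    `particles` is a list of (pt, eta, phi). z_i is pt_i divided by the
    scalar sum of constituent pt (an assumption of this sketch).
    """
    pt_sum = sum(p[0] for p in particles)
    total = 0.0
    for subset in combinations(particles, n):
        z_product = math.prod(p[0] / pt_sum for p in subset)
        angles = sorted(_delta_r(a, b) ** beta for a, b in combinations(subset, 2))
        # The minimum over products of o distinct pairwise angles is the
        # product of the o smallest ones.
        total += z_product * math.prod(angles[:o])
    return total

def n3(particles, beta):
    """Ratio N_3 = e(2, 4, beta) / e(1, 3, beta)^2."""
    return ecf(particles, 2, 4, beta) / ecf(particles, 1, 3, beta) ** 2
```

The cost grows combinatorially with N and the number of constituents, so this form is only practical for small jets or after grooming.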

SLIDE 45

Do we really need 11 variables?

(Answer: almost...)

◮ Well-modeled variables are added to training one-by-one according to discriminating power
◮ Can get down to 8 without losing performance

[Figure: fake rate at 50% efficiency versus number of variables N_var (2 to 10). CMS Preliminary.]

SLIDE 46

Overtraining checks

[Figures: BDT output distributions (probability) for test and train samples of signal and background, shown for the Top BDT and the Higgs BDT. CMS Preliminary.]

SLIDE 47

Do we really need 50 variables?

◮ Use all variables that show any discriminating power
◮ Add variables one by one
◮ Saturate at 20

[Figure: fake rate at 50% efficiency versus number of variables N_var (10 to 50). CMS Preliminary.]

SLIDE 48

Measuring a scale factor

Strategy

◮ Fit the mSD shape using MC templates
◮ MC split into three categories:
    ◮ “1-prong”
        ◮ QCD and W+jets
    ◮ “2-prong”
        ◮ Diboson
        ◮ Unmatched single-t and tt̄
    ◮ “3-prong”
        ◮ Matched single-t and tt̄
        ◮ Efficiency is measured with respect to this category
◮ “3-prong” template is allowed to shift by smearing with a δ-function, correlated between pass and fail categories

SLIDE 49

Kinematic dependence in QCD

◮ Mass sculpting ⇒ harder to determine efficiency in data
◮ pT sculpting ⇒ harder to do a shape analysis

For example:

[Figures: QCD mSD [GeV] distributions, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for τ32 and for τ21. CMS Preliminary.]

SLIDE 50

Kinematic dependence in QCD

N-subjettiness (3/2)

[Figures: QCD mSD [GeV] distributions, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for ungroomed τ32 and groomed τ32^SD. CMS Preliminary.]

Grooming helps a bit

SLIDE 51

Kinematic dependence in QCD

N-subjettiness (2/1)

[Figures: QCD mSD [GeV] distributions, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for ungroomed τ21 and groomed τ21^SD. CMS Preliminary.]

Grooming helps a bit

SLIDE 52

Kinematic dependence in QCD

N-subjettiness (3/2)

[Figures: QCD jet pT [GeV] distributions for 110 < mSD < 210 GeV, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for ungroomed τ32 and groomed τ32^SD. CMS Preliminary.]

Grooming helps a bit

SLIDE 53

Kinematic dependence in QCD

Top BDT vs τ32^SD

[Figures: QCD jet pT [GeV] distributions for 110 < mSD < 210 GeV, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for the ECF BDT and for groomed τ32^SD. CMS Preliminary.]

BDT sculpting is slightly worse than τ32^SD

SLIDE 54

Kinematic dependence in QCD

N-subjettiness (2/1)

[Figures: QCD jet pT [GeV] distributions for 100 < mSD < 150 GeV, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for ungroomed τ21 and groomed τ21^SD. CMS Preliminary.]

Grooming helps a bit

SLIDE 55

Kinematic dependence in QCD

Higgs BDT vs τ21^SD

[Figures: QCD jet pT [GeV] distributions for 100 < mSD < 150 GeV, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for the Higgs ECF BDT and for groomed τ21^SD. CMS Preliminary.]

BDT sculpting is about the same as τ21^SD

SLIDE 56

Kinematic dependence in q/g jets

Top BDT vs τ32^SD

[Figures: QCD mSD [GeV] distributions, inclusive and after cuts giving 50%, 75%, 90%, 95%, and 98% background rejection, for the ECF BDT and for groomed τ32^SD. CMS Preliminary.]

BDT sculpts mass less severely than τ32^SD

SLIDE 57

Dependence on NPV and pT

[Figures: mean BDT response ⟨BDT⟩ versus number of primary vertices (NPV) and versus fat jet pT [GeV], data compared with MC, for 110 < mSD < 210 GeV. CMS Preliminary, 12.9 fb^-1 (13 TeV).]

NPV is flat; data/MC agreement is reasonable

SLIDE 58

Signal definitions

◮ Not using explicit resonances - instead count number of truth partons inside the jet
◮ More robust and easier to understand fakes (e.g. a 3-prong background jet is well-defined)
◮ Mass of the identified partons is consistent with expected resonances

[Figure: parton mass [GeV] distributions for 3-prong top, 2-prong Higgs, and 1-prong QCD jets.]
