

SLIDE 1

Group Members: Gabriele Benelli, Javier Duarte, Raghav Elayavalli, Frank Golf, Burt Holzman, Michael Krohn, Joe Pastika, Kevin Pedro, Uzziel Perez, Alexx Perloff, Sezen Sekmen, Devin Taylor, Caterina Vernieri, Andrew Whitbeck

CMS OPEN DATA FOR MACHINE LEARNING: JET DATASET

Friday, May 12, 17 CMS OPEN DATA ML - JETS


DATA SCIENCE @ HIGH ENERGY PHYSICS 2017

SLIDE 2

GROUP PHOTO – DAY 1

SLIDE 3

GROUP PHOTO – DAY 3

SLIDE 4

CAN’T COMPLAIN ABOUT THE VIEW!

SLIDE 5

CMS OPEN DATA


  • Public version of 2011 CMS data [link]
  • Data format:
    • AOD -> NumPy array -> pandas DataFrame
  • CMS Jet Tuple production 2011 [link]
  • Event Features:
    • ('run', 'lumi', 'event', 'met', 'sumet', 'rho', 'pthat', 'mcweight')
  • Jet-Level Features:
    • ('njet_ak7', 'jet_pt_ak7', 'jet_eta_ak7', 'jet_phi_ak7', 'jet_E_ak7', 'jet_msd_ak7', 'jet_area_ak7', 'jet_jes_ak7', 'jet_tau21_ak7', 'jet_isW_ak7')
  • PF Candidate-Level Features:
    • ('jet_ncand_ak7', 'ak7pfcand_pt', 'ak7pfcand_eta', 'ak7pfcand_phi', 'ak7pfcand_id', 'ak7pfcand_charge', 'ak7pfcand_ijet')
  • Asked to add in gen jet information for use in later projects
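As a rough illustration of the NumPy array -> pandas DataFrame step, the sketch below builds a tiny synthetic table that reuses a few of the feature names listed above and flattens it to one row per jet. The values, the storage of per-jet quantities as Python lists, and the helper `flatten_jets` are assumptions made for illustration; the real tuple layout may differ.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the event-level part of the tuple (values are made up).
event_dtype = [("run", "i4"), ("lumi", "i4"), ("event", "i8"),
               ("met", "f4"), ("sumet", "f4"), ("rho", "f4")]
events_np = np.array([(1, 1, 10, 35.2, 410.0, 9.1),
                      (1, 1, 11, 12.7, 380.5, 7.4)], dtype=event_dtype)

# A structured NumPy array converts directly into a DataFrame, one column per field.
events = pd.DataFrame(events_np)

# Assumption: per-jet quantities are stored as one variable-length list per event.
events["njet_ak7"] = [2, 1]
events["jet_pt_ak7"] = [[410.0, 250.0], [320.0]]
events["jet_isW_ak7"] = [[1, 0], [0]]


def flatten_jets(df, jet_cols):
    """Hypothetical helper: return a DataFrame with one row per jet."""
    rows = []
    for _, ev in df.iterrows():
        for i in range(int(ev["njet_ak7"])):
            row = {"run": ev["run"], "event": ev["event"]}
            row.update({col: ev[col][i] for col in jet_cols})
            rows.append(row)
    return pd.DataFrame(rows)


jets = flatten_jets(events, ["jet_pt_ak7", "jet_isW_ak7"])
w_jets = jets[jets["jet_isW_ak7"] == 1]  # jets flagged as W jets
```

A per-jet table like `jets` is convenient for jet-level tasks such as W tagging or JES regression, while the event-level `events` table keeps quantities such as `met` and `rho` in one place.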

SLIDE 6

WORKSPACE


  • Worked on Amazon Web Services (AWS) instances
  • Deep Learning Amazon Machine Image (AMI), Amazon Linux Version 2.0
    • For use on Amazon Elastic Compute Cloud (Amazon EC2)
    • 64-bit
  • p2.xlarge instances (designed for general-purpose GPU compute applications using CUDA and OpenCL)
    • 1 Tesla K80 GPU
    • 4 vCPUs
    • 61 GiB RAM
  • 8 Deep Learning Frameworks
    • MXNet, Caffe, Caffe2, TensorFlow, Theano, Torch, CNTK, and Keras (1.2.2)
  • Other packages and platforms:
    • Jupyter notebooks with Python 2.7 and Python 3.4 kernels, Matplotlib, Scikit-image, CppLint, Pylint, pandas, Graphviz, Bokeh Python packages, Boto and Boto 3, the AWS CLI, Anaconda 2, and Anaconda

SLIDE 7

HANDS-ON SESSION: DAY 1


  • Group exercise to learn how to work with a fully connected NN and a convolutional NN [link]
  • Problem: Create a classifier to identify boosted W jets
  • Basic Skills:
    • Tuning hyperparameters, testing different pre-processing steps, separating the image representation into layers based on PF candidate classes, training a recursive NN, etc.

[J. Thaler, et al. arXiv:1011.2268]
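The image representation used as CNN input can be sketched as a pT-weighted 2D histogram of PF candidates around the jet axis. This is a minimal sketch, not the exercise's actual pre-processing: the function name `jet_image`, the 25x25 grid, the half-width of 1.0, and the normalisation to unit sum are all assumptions chosen for illustration.

```python
import numpy as np


def jet_image(cand_pt, cand_eta, cand_phi, jet_eta, jet_phi,
              n_pix=25, half_width=1.0):
    """Hypothetical helper: pT-weighted (delta-eta, delta-phi) image of one jet."""
    deta = np.asarray(cand_eta, dtype=float) - jet_eta
    # Wrap delta-phi into [-pi, pi) so jets near phi = +-pi are handled correctly.
    dphi = (np.asarray(cand_phi, dtype=float) - jet_phi + np.pi) % (2 * np.pi) - np.pi
    edges = np.linspace(-half_width, half_width, n_pix + 1)
    image, _, _ = np.histogram2d(deta, dphi, bins=(edges, edges),
                                 weights=np.asarray(cand_pt, dtype=float))
    total = image.sum()
    # Normalise to unit sum (one possible pre-processing choice among several).
    return image / total if total > 0 else image


# Made-up candidates for a jet sitting close to the phi = pi boundary.
image = jet_image(cand_pt=[100.0, 50.0, 50.0],
                  cand_eta=[0.0, 0.1, -0.1],
                  cand_phi=[3.1, -3.1, 3.0],
                  jet_eta=0.0, jet_phi=3.1)
```

Splitting the image into layers by PF candidate class would amount to calling the same function once per class (selecting candidates by `ak7pfcand_id`) and stacking the resulting arrays as channels.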

SLIDE 8

HANDS-ON SESSION: DAY 1

[Figure panels: Densely-Connected Neural Network; Convolutional Neural Network]


[L. de Oliveira, et al. arXiv:1511.05190]

SLIDE 9

LSTM TO CLASSIFY MERGED W VS QCD JETS

  • Inspired by “Recursive jets” talk on Wednesday
  • Recursive neural nets were too difficult to set up in this short time period, so we tried a recurrent neural network: an LSTM (long short-term memory) network
  • For each jet, gave the network a list of PF candidates (pt, eta, phi, ID, charge)
  • Tried sorting candidates by pt or by DeltaR within the jet
  • Varied the number of layers in the network and the number of training epochs
  • Did not use an embedding layer; more work is needed to understand this feature
  • Network seemed to get stuck at low accuracy while training (regardless of variations)
    • Performed worse than chance
  • Based on example using IMDB movie review data [link]
  • Needs more work before it’s ready for primetime
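
A recurrent network expects fixed-shape sequence input, so the per-jet candidate lists have to be ordered and padded first. The sketch below shows one way this could be done for the pt-sorted case; the helper names, the maximum sequence length, and the choice of zero-padding at the end are assumptions, not the group's actual code.

```python
import numpy as np

N_FEATURES = 5  # (pt, eta, phi, ID, charge), as listed on the slide


def candidates_to_sequence(cands, max_len=20):
    """Hypothetical helper: sort one jet's candidates by descending pt and pad."""
    cands = np.asarray(cands, dtype=float).reshape(-1, N_FEATURES)
    order = np.argsort(-cands[:, 0])      # column 0 is pt
    cands = cands[order][:max_len]        # truncate jets with too many candidates
    seq = np.zeros((max_len, N_FEATURES))
    seq[:len(cands)] = cands              # zero-pad the remaining time steps
    return seq


def build_batch(jets, max_len=20):
    """Stack per-jet sequences into an array of shape (n_jets, max_len, 5)."""
    return np.stack([candidates_to_sequence(j, max_len) for j in jets])


# Two made-up jets with 2 and 1 candidates respectively.
jets = [
    [[10.0, 0.0, 0.0, 211, 1], [50.0, 0.1, 0.2, 22, 0]],
    [[5.0, 0.0, 0.0, 130, 0]],
]
batch = build_batch(jets, max_len=3)
```

An array of this shape is the usual input format for recurrent layers; whether the padded zeros are masked, and whether raw particle IDs are passed as numbers or through an embedding, are design choices that could plausibly affect training.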

SLIDE 10

SIMPLE JES REGRESSION

  • Trying to predict the jet energy scale (JES)
  • Trained a fully connected NN
    • 1 hidden layer with 100 nodes
    • Inputs were jet pT and eta
  • The plots show the true (blue) and predicted (red) JES versus pT (top) and eta (bottom)
  • We are able to predict and improve upon the pT dependence, but we can’t predict the eta dependence
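
The architecture described here (two inputs, one hidden layer of 100 nodes, one output) is small enough to sketch in plain NumPy. Everything below other than that architecture is an assumption for illustration: the synthetic data, the made-up pT dependence of the JES, the ReLU activation, the learning rate, and the use of full-batch gradient descent. The original exercise used one of the deep learning frameworks on the AMI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a JES that depends on pT only (made up for illustration).
n = 512
pt = rng.uniform(200.0, 1000.0, size=n)
eta = rng.uniform(-2.4, 2.4, size=n)
jes = 1.0 + 10.0 / pt + rng.normal(0.0, 0.005, size=n)

# Standardise the two inputs (jet pT and eta); the target is the JES.
X = np.column_stack([pt, eta])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = jes.reshape(-1, 1)

# One hidden layer with 100 nodes and a single linear output.
W1 = rng.normal(0.0, 0.1, size=(2, 100))
b1 = np.zeros(100)
W2 = rng.normal(0.0, 0.1, size=(100, 1))
b2 = np.full(1, y.mean())


def forward(inputs):
    hidden = np.maximum(0.0, inputs @ W1 + b1)  # ReLU activation
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


loss_before = mse(forward(X)[1], y)

learning_rate = 0.05
for _ in range(300):
    hidden, pred = forward(X)
    # Backpropagate the mean-squared-error loss through both layers.
    grad_pred = 2.0 * (pred - y) / len(X)
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_hidden = grad_pred @ W2.T
    grad_hidden[hidden <= 0.0] = 0.0
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

loss_after = mse(forward(X)[1], y)
```

Comparing `loss_before` with `loss_after`, and plotting the predicted JES against pT and eta separately, mirrors the check shown in the slide's plots.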

SLIDE 11

CNN JES REGRESSION

  • Once again we were trying to predict the JES, but this time we used the jet images (PF candidates) as inputs.
  • Unfortunately our NN was unable to predict the JES and seems to simply return a random value.
  • Also tried adding the jet pT and eta as additional inputs, but this just confused the NN.

SLIDE 12

JES REGRESSION – IT CAN BE DONE


  • Another group has successfully done this same regression with a fully connected NN
    • Based on the DeepJet framework
    • 9 layers, 1 with 350 nodes and 8 with 100 nodes
    • 600+ input variables
    • See Markus Stoye’s talk from Monday
  • So this is technically possible

Havukainen, Joona. Sneak peek at regression using DNN. Helsinki JetMET Workshop, Helsinki Institute of Physics. [link]
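
To give a sense of scale for the network quoted above, the sketch below counts the trainable parameters of a fully connected network with those layer sizes. The ordering of the layers (350-node layer first), the use of exactly 600 inputs, and the single output node are assumptions; the slide only gives the layer counts and "600+" inputs.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Weights plus biases for a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(sizes[:-1], sizes[1:]))


# Assumed ordering: one 350-node layer followed by eight 100-node layers.
hidden = [350] + [100] * 8
total = dense_param_count(600, hidden)

# The 2-input, 100-node network from the simple JES regression, for comparison.
simple = dense_param_count(2, [100])
```

Under these assumptions the larger network has 316,251 parameters against 401 for the two-input network, which suggests how much more input information and capacity the successful regression had available.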

SLIDE 13

THANK YOU!

Special thanks to Javier Duarte, Burt Holzman, and Caterina Vernieri
