CMS OPEN DATA FOR MACHINE LEARNING: JET DATASET, DATA SCIENCE @ HIGH ENERGY PHYSICS 2017


  1. CMS OPEN DATA FOR MACHINE LEARNING: JET DATASET
     DATA SCIENCE @ HIGH ENERGY PHYSICS 2017
     Group Members: Gabriele Benelli, Javier Duarte, Raghav Elayavalli, Frank Golf, Burt Holzman, Michael Krohn, Joe Pastika, Kevin Pedro, Uzziel Perez, Alexx Perloff, Sezen Sekmen, Devin Taylor, Caterina Vernieri, Andrew Whitbeck
     CMS OPEN DATA ML - JETS, Friday, May 12, 17

  2. GROUP PHOTO – DAY 1

  3. GROUP PHOTO – DAY 3

  4. CAN’T COMPLAIN ABOUT THE VIEW!

  5. CMS OPEN DATA
     • Public version of 2011 CMS data [link]
     • Data format: AOD -> Numpy Array -> Pandas DataFrame
     • CMS Jet Tuple production 2011 [link]
     • Event Features: ('run', 'lumi', 'event', 'met', 'sumet', 'rho', 'pthat', 'mcweight')
     • Jet-Level Features: ('njet_ak7', 'jet_pt_ak7', 'jet_eta_ak7', 'jet_phi_ak7', 'jet_E_ak7', 'jet_msd_ak7', 'jet_area_ak7', 'jet_jes_ak7', 'jet_tau21_ak7', 'jet_isW_ak7')
     • PF Candidate-Level Features: ('jet_ncand_ak7', 'ak7pfcand_pt', 'ak7pfcand_eta', 'ak7pfcand_phi', 'ak7pfcand_id', 'ak7pfcand_charge', 'ak7pfcand_ijet')
     • Asked to add in gen jet information for use in later projects
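
The last step of the data format chain (Numpy Array -> Pandas DataFrame) can be sketched as follows. This is only an illustration, not the group's conversion code: the column names are the event features listed on the slide, while the dtypes and the three events are made up for the example.

```python
import numpy as np
import pandas as pd

# Event-level columns named on the slide; the dtypes are assumptions.
event_dtype = np.dtype([
    ("run", "i4"), ("lumi", "i4"), ("event", "i8"),
    ("met", "f4"), ("sumet", "f4"), ("rho", "f4"),
    ("pthat", "f4"), ("mcweight", "f4"),
])

# Three invented events standing in for records read from the jet tuple.
events = np.zeros(3, dtype=event_dtype)
events["run"] = 1
events["lumi"] = [10, 10, 11]
events["event"] = [1001, 1002, 1003]
events["met"] = [12.5, 48.0, 7.3]
events["pthat"] = [310.0, 520.0, 295.0]
events["mcweight"] = 1.0

# A NumPy structured array maps directly onto DataFrame columns.
df = pd.DataFrame(events)

# Typical first step afterwards: select events with a boolean mask.
high_met = df[df["met"] > 10.0]
```

Once the events are in a DataFrame, selections and per-column summaries can be written with ordinary pandas operations before anything is passed to a network.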

  6. WORKSPACE
     • Worked on Amazon Web Services (AWS) instances
     • Deep Learning Amazon Machine Image (AMI), Amazon Linux Version 2.0
     • For use on Amazon Elastic Compute Cloud (Amazon EC2)
     • 64-bit
     • p2.xlarge instances (designed for general-purpose GPU compute applications using CUDA and OpenCL)
     • 1 Tesla K80 GPU
     • 4 vCPUs
     • 61 GiB RAM
     • 8 Deep Learning Frameworks: MXNet, Caffe, Caffe2, TensorFlow, Theano, Torch, CNTK, and Keras (1.2.2)
     • Other packages and platforms: Jupyter notebooks with Python 2.7 and Python 3.4 kernels, Matplotlib, Scikit-image, CppLint, Pylint, pandas, Graphviz, Bokeh Python packages, Boto and Boto 3, the AWS CLI, Anaconda 2, and Anaconda

  7. HANDS-ON SESSION: DAY 1
     • Group exercise to learn how to work with a fully connected NN and a convolutional NN [link]
     • Problem: Create a classifier to identify boosted W jets
     • Basic Skills: Tuning metaparameters, testing different pre-processing steps, separating the image representation into layers based on PF candidate classes, training a recursive NN, etc.
     [J. Thaler, et al. arXiv:1011.2268]
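
The image representation mentioned above can be sketched as a pT-weighted 2D histogram of a jet's PF candidates in (eta, phi) relative to the jet axis. This is a minimal sketch, not the exercise's actual pre-processing: the function name `jet_image`, the 25x25 grid, the half-width of 1.0 and the unit-sum normalisation are all assumptions, and the candidate values are invented.

```python
import numpy as np

def jet_image(cand_pt, cand_eta, cand_phi, jet_eta, jet_phi,
              n_pixels=25, half_width=1.0):
    """Pixelate one jet's PF candidates into an (n_pixels, n_pixels) pT image."""
    cand_pt = np.asarray(cand_pt, dtype=float)
    # Translate so that the jet axis sits at the centre of the image.
    deta = np.asarray(cand_eta, dtype=float) - jet_eta
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (np.asarray(cand_phi, dtype=float) - jet_phi + np.pi) % (2 * np.pi) - np.pi
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    # Each pixel holds the summed pT of the candidates that fall inside it.
    image, _, _ = np.histogram2d(deta, dphi, bins=(edges, edges), weights=cand_pt)
    # Normalise to unit total intensity (one possible pre-processing choice).
    total = image.sum()
    return image / total if total > 0 else image

# Invented candidates for a jet near phi = pi, to exercise the phi wrapping.
image = jet_image(cand_pt=[50.0, 30.0, 20.0],
                  cand_eta=[0.10, -0.20, 0.05],
                  cand_phi=[3.10, -3.10, 3.00],
                  jet_eta=0.0, jet_phi=3.14)
```

Calling the same function once per PF candidate class and stacking the results with `np.stack` would be one way to obtain the layered representation listed under basic skills.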

  8. HANDS-ON SESSION: DAY 1
     • Densely-Connected Neural Network
     • Convolutional Neural Network
     [L. de Oliveira, et al. arXiv:1511.05190]

  9. LSTM TO CLASSIFY MERGED W VS QCD JETS
     • Inspired by “Recursive jets” talk on Wednesday
     • Recursive neural nets too difficult for this short time period, so tried a recurrent neural network = LSTM (long short-term memory unit)
     • For each jet, gave network a list of PF candidates (pt, eta, phi, ID, charge)
     • Tried sorting candidates by pt or DeltaR within jet
     • Varied # of layers in network and # of training epochs
     • Did not use embedding layer - need more work to understand this feature
     • Network seemed to get stuck at low accuracy while training (regardless of variations)
     • Performed worse than chance
     • Based on example using IMDB movie review data [link]
     • Needs more work before it’s ready for primetime
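
The input preparation described above (one list of PF candidates per jet, sorted by pt or by DeltaR) might look like the sketch below. It is not the group's code: the helper name `jet_to_sequence`, the fixed length `max_len`, the zero-padding and the example candidate values are assumptions made for illustration.

```python
import numpy as np

def jet_to_sequence(cands, max_len=20, sort_by="pt", jet_eta=0.0, jet_phi=0.0):
    """Turn an (n_cand, 5) array of (pt, eta, phi, id, charge) into a
    fixed-length (max_len, 5) sequence, zero-padded at the end."""
    cands = np.asarray(cands, dtype=float).reshape(-1, 5)
    if sort_by == "pt":
        # Hardest candidate first.
        order = np.argsort(-cands[:, 0])
    elif sort_by == "deltaR":
        deta = cands[:, 1] - jet_eta
        dphi = (cands[:, 2] - jet_phi + np.pi) % (2 * np.pi) - np.pi
        # Candidate closest to the jet axis first.
        order = np.argsort(np.hypot(deta, dphi))
    else:
        raise ValueError("sort_by must be 'pt' or 'deltaR'")
    ordered = cands[order][:max_len]   # truncate jets with too many candidates
    seq = np.zeros((max_len, 5))
    seq[:len(ordered)] = ordered       # pad short jets with zeros
    return seq

# Three invented candidates: (pt, eta, phi, id, charge).
cands = [[5.0, 0.3, 0.1, 211, 1],
         [40.0, 0.0, 0.05, 22, 0],
         [12.0, -0.1, -0.2, 130, 0]]
seq = jet_to_sequence(cands, max_len=4, sort_by="pt")
```

Stacking the per-jet sequences with `np.stack` gives an array of shape (n_jets, max_len, 5), which is the (samples, timesteps, features) layout that recurrent layers in Keras take as input.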

  10. SIMPLE JES REGRESSION
      • Trying to predict the jet energy scale (JES)
      • Trained a fully connected NN
      • 1 hidden layer with 100 nodes
      • Inputs were jet pT and eta
      • The plots show the true (blue) and predicted (red) JES versus pT (top) and eta (bottom)
      • We are able to predict and improve upon the pT dependence, but we can’t predict the eta dependence
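
To make the architecture concrete, here is a rough NumPy stand-in for a network with two inputs (jet pT and eta), one hidden layer of 100 nodes and a single JES output. It is not the model the group trained: the tanh activation, the learning rate, the full-batch gradient descent and, above all, the synthetic JES formula are assumptions invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (jet pT, eta) -> JES; the functional form is invented.
n = 2000
pt = rng.uniform(30.0, 500.0, n)
eta = rng.uniform(-2.5, 2.5, n)
jes = 1.0 + 5.0 / pt + 0.01 * eta**2 + rng.normal(0.0, 0.005, n)

# Standardise inputs and target so that gradient descent is well conditioned.
X = np.column_stack([pt, eta])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y_mean, y_std = jes.mean(), jes.std()
y = ((jes - y_mean) / y_std).reshape(-1, 1)

# One hidden layer with 100 nodes (tanh is an assumption) and a linear output.
n_hidden = 100
W1 = rng.normal(0.0, 0.5, (2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
b2 = np.zeros(1)

def forward(inputs):
    """Return the hidden activations and the network output."""
    h = np.tanh(inputs @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.01
losses = []
for step in range(500):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err**2)))
    # Backpropagate the mean-squared-error gradient over the full batch.
    g_pred = 2.0 * err / n
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h**2)
    g_W1 = X.T @ g_h
    g_b1 = g_h.sum(axis=0)
    W2 -= lr * g_W2
    b2 -= lr * g_b2
    W1 -= lr * g_W1
    b1 -= lr * g_b1

# Predicted JES back in physical units, for comparison against the true values.
jes_pred = forward(X)[1].ravel() * y_std + y_mean
```

Plotting `jes` and `jes_pred` against `pt` and `eta` would give the kind of true-versus-predicted comparison described on the slide.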

  11. CNN JES REGRESSION
      • Once again we were trying to predict the JES, but this time we used the jet images (PF candidates) as inputs.
      • Unfortunately our NN was unable to predict the JES and seems to simply return a random value.
      • Also tried adding the jet pT and eta as additional inputs, but this just confused the NN.

  12. JES REGRESSION – IT CAN BE DONE
      • Another group has successfully done this same regression with a fully connected NN
      • Based on the DeepJet framework
      • 9 layers, 1 with 350 nodes and 8 with 100 nodes
      • 600+ input variables
      • See Markus Stoye’s talk from Monday
      • So this is technically possible
      Havukainen, Joona. Sneak peek at regression using DNN. Helsinki JetMET Workshop, Helsinki Institute of Physics. [link]

  13. THANK YOU!
      Special thanks to Javier Duarte, Burt Holzman, and Caterina Vernieri
