  1. Machine Learning. Brushing up on tools used for summer student projects

  2. Overview
     ● Workbooks available online
       ➢ https://cms-caltech-ml.cern.ch/tree
       ➢ CERN OpenStack VM
       ➢ iPython server under Apache 2.4
       ➢ Access limited to e-group cms-caltech-ml@cern.ch
       ➢ Authentication by CERN single sign-on
     ● GPU available
       ➢ GeForce GT 610 on pccitevo.cern.ch (Jean-Roch's desktop)
       ➢ Tesla K40c on felk40.cern.ch (courtesy Felice Pantaleo, CERN)
       ➢ Useful only for theano-based code so far
       ➢ Download a notebook as a python script and run it locally (see the sketch below):
         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python code.py
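
A minimal sketch of the local-run recipe above, assuming Theano 0.x with the old device=gpu backend; setting THEANO_FLAGS from inside the script has the same effect as prepending it on the command line:

    # Assumption: Theano 0.x GPU backend, as in the THEANO_FLAGS line above.
    # The print shows which device and float precision Theano actually picked up.
    import os
    os.environ.setdefault("THEANO_FLAGS", "mode=FAST_RUN,device=gpu,floatX=float32")

    import theano
    print(theano.config.device, theano.config.floatX)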

  3. matplotlib http://matplotlib.org/
     ● Python library for graphical representation
     ● Documentation is a bit scattered around
     ● Lots of practical examples exist
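
A minimal matplotlib sketch for reference (toy data; the labels and output file name are illustrative assumptions):

    # Scatter plot of 2D points coloured by a label, the kind of figure
    # used to inspect clustering output (toy data only).
    import numpy as np
    import matplotlib.pyplot as plt

    points = np.random.randn(200, 2)
    labels = (points[:, 0] > 0).astype(int)   # fake "cluster" assignment

    plt.scatter(points[:, 0], points[:, 1], c=labels, s=10)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.savefig("clusters.png")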

  4. numpy http://www.numpy.org/
     ● Python library for scientific computation
     ● Documentation is a bit opaque
     ● Lots of practical examples exist
     ● Syntax can be very cryptic, but very powerful
     ● ROOT I/O is not well supported in python libraries
     ● Used for dataset manipulation
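
A minimal numpy sketch of the kind of dataset manipulation meant above (toy array; shapes are illustrative assumptions):

    # Standardise each feature column of a toy (events x features) array.
    import numpy as np

    data = np.random.rand(1000, 5)
    standardised = (data - data.mean(axis=0)) / data.std(axis=0)
    print(standardised.shape)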

  5. h5py http://www.h5py.org/
     ● Python library for managing (big) datasets
     ● Supported on multiple platforms
     ● Strong documentation
     ● I/O for large datasets
     ● Dataset engine for NADE (see later)
     ● Not yet used to its full potential
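
A minimal h5py sketch (the file name, dataset name and sizes are illustrative assumptions):

    # Write a toy dataset to HDF5, then read back only a slice of it;
    # slicing keeps large datasets out of memory until they are needed.
    import numpy as np
    import h5py

    with h5py.File("dataset.h5", "w") as f:
        f.create_dataset("features", data=np.random.rand(10000, 20), compression="gzip")

    with h5py.File("dataset.h5", "r") as f:
        first_batch = f["features"][:256]
    print(first_batch.shape)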

  6. scikit-learn http://scikit-learn.org/dev/index.html
     ● Python library for machine learning
     ● Well-documented suite of methods
     ● Lots of examples to get started
     ● Implements most of the classical methods (PCA, SVM, random forest, …)
     ● Implements several unsupervised clustering algorithms (k-means, …)
     ● Used for clustering with k-means, and for PCA
     ● Self-Organizing Map being implemented in https://github.com/scikit-learn/scikit-learn/pull/2996 ; much discussion about it is on-going
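
A minimal scikit-learn sketch combining the two methods quoted above, PCA followed by k-means (toy data; the number of components and clusters are illustrative assumptions):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    data = np.random.rand(500, 10)
    reduced = PCA(n_components=2).fit_transform(data)   # project to 2 dimensions
    labels = KMeans(n_clusters=3).fit_predict(reduced)  # assign each event to a cluster
    print(labels[:10])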

  7. pybrain http://pybrain.org/
     ● Python library for neural-net training
     ● Well documented
     ● Easy to get started on neural-net training
     ● Does not support GPU acceleration
     ● Used initially to get started with NN
     ● Faced performance issues early on
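
A minimal pybrain sketch of what getting started on neural-net training looks like (XOR-style toy data; layer sizes and epoch count are illustrative assumptions):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer

    net = buildNetwork(2, 3, 1)            # 2 inputs, 3 hidden units, 1 output
    ds = SupervisedDataSet(2, 1)
    for x, y, t in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]:
        ds.addSample((x, y), (t,))

    trainer = BackpropTrainer(net, ds)
    for _ in range(100):
        trainer.train()                    # one backprop epoch per call, CPU only
    print(net.activate((1, 0)))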

  8. theano http://deeplearning.net/software/theano/
     ● Python library for manipulating mathematical expressions
     ● Very complete library
     ● Intensive tutorial available
     ● A bit opaque to use by itself
     ● Full GPU acceleration support
     ● A bit like cernlib: requires a higher-level software wrapper
     ● Easy software manipulation comes with a performance hit
     ● Used for convolutional neural network implementations
     ● Used as the mathematical engine for higher-level libraries
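
A minimal Theano sketch of the build-a-graph-then-compile pattern that the higher-level libraries wrap (shapes and names are illustrative assumptions):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix("x")
    w = theano.shared(np.random.randn(3), name="w")
    score = T.dot(x, w)                 # symbolic expression, nothing computed yet
    f = theano.function([x], score)     # compiled, optionally to GPU code
    print(f(np.random.rand(2, 3)))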

  9. theanets http://theanets.readthedocs.org
     ● Python library for neural-net training
     ● Feels like “pybrain done right”
     ● Easy to get started on neural-net training
     ● Uses theano as its computation engine
     ● Now used for neural nets in three projects

  10. rNADE http://www.benignouria.com/en/research/RNADE/ http://arxiv.org/abs/1306.0186
     ● Python software for a neural autoregressive density estimator
     ● In touch with the authors: Ian Murray, Hugo Larochelle, Benigno Uria
     ● Turned into a python library for use inside notebooks
     ● Being benchmarked for usability in background estimation
     ● Possible solutions for outlier detection suggested by the authors

  11. Deep Learning
     ● Kolmogorov's theorem: http://www.sciencedirect.com/science/article/pii/0893608092900128
     ● “Make it deep enough and it will learn anything”
     ● Provided …
       ➢ Enough data to fit all the parameters
       ➢ Enough computing acceleration

  12. spearmint https://github.com/HIPS/Spearmint
     ● Software to perform Bayesian optimization
     ● Used for neural-net optimization in the UCI Higgs2tautau/razor papers
     ● Will be used to optimize NN topologies

  13. NMS Projects Overview
     ● Assuming Kolmogorov's theorem: we can learn anything
     ● NN Tracking
       ✔ Present the full list of hits
       ✔ Train on hit/track association
       ➔ Get track-candidate categorization
       ➔ Software acceleration
       ➔ Possibility of optimization with respect to existing tracking
     ● NN Trigger
       ✔ Present low-level reconstruction objects
       ✔ Train on trigger bits
       ➔ Emulate the trigger selection
       ➔ Software acceleration; no optimization with respect to the existing trigger table; could free a good fraction of the timing in the HLT
     ● NN Calo Id
       ✔ Present the energy deposition in a jet
       ✔ Train on generator particle identity
       ➔ Get the jet “label” (quark, gluon, photon, electron, …)
       ➔ Has a huge potential impact; room for optimization

  14. Cost Function and Regularisation
     ● Regularization adds the sum of squared weights to the cost function.
     ● It is supposed to stabilize training and prevent over-fitting.
     ● Observation: the error (NN mismatch with the target) was going down as expected, but the cost grew back during training.
     ● Explanation: the discrete gradient descent steps over a barrier in the cost in order to reach a better fit (sketched below).
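
A minimal sketch of the regularized cost described above, assuming a mean-squared-error term plus a plain L2 penalty (numpy only; the weight layout and lambda value are illustrative assumptions, not the project's actual settings):

    import numpy as np

    def cost(pred, target, weights, lam=1e-3):
        error = np.mean((pred - target) ** 2)                   # NN mismatch with the target
        penalty = lam * sum(np.sum(w ** 2) for w in weights)    # sum of squared weights
        return error + penalty                                  # what gradient descent minimises

The error term can keep decreasing while the penalty term grows, so the total cost may rise during training even as the fit improves, which is the behaviour observed above.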

  15. NN HLT 1/2

  16. NN HLT 2/2
