
scikit-learn to TMVA: XML converter tool

Yuriy Ilchenko (U. of Texas), Nazim Huseynov (JINR), IML LHC Machine Learning WG Meeting, Feb 03, 2015


  1. scikit-learn to TMVA: XML converter tool. Yuriy Ilchenko (U. of Texas), Nazim Huseynov (JINR). IML LHC Machine Learning WG Meeting, Feb 03, 2015

  2. History
     • ttbar production with non-prompt leptons - major background for a few ttH channels
     • Idea is to use an MVA - boosted decision tree (BDT) - to separate prompt from non-prompt leptons
     • Employ TMVA from ROOT
     • List of input variables - object level only
       • pt, eta, sigd0PV, z0SinTheta, etcone20/pt, ptcone20/pt
     • Compare BDT performance against the standard analysis cuts
       • ROC curve (BDT) vs a point (cuts)
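
The "ROC curve (BDT) vs a point (cuts)" comparison can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scores and pass/fail flags are invented toy numbers, and the helper names (`roc_points`, `cut_working_point`, `rejection_at`) are hypothetical, not part of TMVA or scikit-learn.

```python
def roc_points(signal_scores, background_scores):
    """Return (signal efficiency, background rejection) pairs, one per threshold."""
    thresholds = sorted(set(signal_scores) | set(background_scores))
    points = []
    for t in thresholds:
        # An event is selected when its BDT score is at or above the threshold.
        sig_eff = sum(s >= t for s in signal_scores) / len(signal_scores)
        bkg_rej = sum(b < t for b in background_scores) / len(background_scores)
        points.append((sig_eff, bkg_rej))
    return points


def cut_working_point(signal_pass, background_pass):
    """Fixed analysis cuts give a single (efficiency, rejection) point."""
    sig_eff = sum(signal_pass) / len(signal_pass)
    bkg_rej = 1 - sum(background_pass) / len(background_pass)
    return sig_eff, bkg_rej


def rejection_at(points, target_eff):
    """Best background rejection among ROC points with efficiency >= target."""
    return max(rej for eff, rej in points if eff >= target_eff)


# Toy inputs (assumed, not from the analysis): BDT scores and cut decisions.
signal_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
background_scores = [0.5, 0.3, 0.2, 0.1, 0.65]
signal_pass = [True, True, True, False, False]
background_pass = [False, False, False, False, True]

roc = roc_points(signal_scores, background_scores)
cut_eff, cut_rej = cut_working_point(signal_pass, background_pass)
bdt_rej = rejection_at(roc, cut_eff)
print(f"cuts: eff={cut_eff:.2f} rej={cut_rej:.2f}; BDT rej at same eff={bdt_rej:.2f}")
```

The BDT is doing better than the cuts when its rejection at the cuts' signal efficiency lies above the cuts point.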

  3. TMVA - electrons
     [Figure: background rejection versus signal efficiency for the BDT, 10% sample and 33% sample; each is also shown zoomed in, with the cuts point marked in the zoomed view.]

  4. TMVA - muons
     [Figure: background rejection versus signal efficiency for the BDT, full range and zoomed in, with the cuts point marked; panels labelled 10% sample and 33% sample.]
     • TMVA error: --- <ERROR> BDT : YOUR tree has only 1 Node... kind of a funny *tree*. I cannot boost such a thing... if after 1 step the error rate is == 0.5
     • no results :(
     • Decided to try an alternative MVA library

  5. scikit-learn
     • "sklearn" - popular open-source library for data analysis written in Python
     • Implements all major models - decision trees, neural networks, etc.
     • Supported by an international community of developers
     • Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. www.scikit-learn.org

  6. sklearn - electrons
     [Figure: plots for the 10% sample and the 33% sample, each with a zoomed-in view and the cuts point marked.]

  7. sklearn - muons
     [Figure: plots for the 10% sample and the 33% sample, each with a zoomed-in view and the cuts point marked.]

  8. sklearn to TMVA
     • Problem: no sklearn available in ATLAS software
     • Solution: convert a classifier trained with scikit-learn to the xml format readable by the TMVA Reader
     • Perk: apply the BDT in ATLAS independently of scikit-learn
     [Diagram: skTMVA converter, with the labels "For Training" and "For Testing".]

  9. skTMVA converter
     • skTMVA - sklearn to TMVA converter
     • part of the koza4ok package, which also contains a ROC-curve calculation and some other tools
     • written in Python
     • on GitHub - https://github.com/yuraic/koza4ok
     • What's supported?
       • BDT binary classification
       • AdaBoost, Gradient Boosting
       • xml format only

  10. skTMVA in action
      • Getting the converter: git clone https://github.com/yuraic/koza4ok.git
      • Set up the repository: source setup_koza4ok.sh
      • And in your Python code [code image not preserved; its annotations:]
        • scikit-learn model
        • output TMVA xml file
        • TMVA input variables and their type (variable order matters!)

  11. skTMVA in practice
      • In the koza4ok/example folder
      • Training - no input data is required, data is generated on the fly
        • bdt_sklearn_to_tmva_AdaBoost.py
        • bdt_sklearn_to_tmva_Grad.py
      • Testing and validation - draw ROC curves by TMVA and scikit-learn and overlay them
        • validate_sklearn_to_tmva.py
      • Two files are created when running the examples
        • bdt_sklearn_to_tmva_example.pkl - stores the scikit-learn model
        • bdt_sklearn_to_tmva_example.xml - converted TMVA xml file

  12. Summary
      • skTMVA - scikit-learn to TMVA converter
        • supports BDT binary classification - AdaBoost, Gradient Boosting
        • saves to an xml file
        • comes with examples and validation code
        • web: https://github.com/yuraic/koza4ok
      • Plans
        • Convert a scikit-learn model to a standalone C++ file
      • Contact us
        • Yuriy Ilchenko (core development) - ilchenko@physics.utexas.edu
        • Nazim Huseynov (validation, testing) - nguseynov@jinr.ru

  13. Backup

  14. Decision Tree in scikit-learn and TMVA
      • TMVA variable description is in the back-up slides (or google)
      • sklearn tree structure is described at http://scikit-learn.org/dev/auto_examples/tree/unveil_tree_structure.html
      [Diagram: scikit-learn Decision Tree, apply skTMVA converter.]
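
A rough sketch of the tree walk such a conversion needs, written against plain Python lists so that it runs without scikit-learn. The lists mimic the parallel arrays of a fitted tree's `tree_` attribute (`children_left`, `children_right`, `feature`, `threshold`), with `value` flattened to per-node [background, signal] counts. Everything else is an assumption made for illustration: the function name `to_tmva_node`, the 0.5 purity threshold for labelling leaves, and the toy numbers. It is not the skTMVA code.

```python
TREE_LEAF = -1  # scikit-learn marks "no child" with -1

# Hypothetical one-split tree laid out like scikit-learn's tree_ arrays.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]            # -2 means "undefined" on leaves
threshold = [34.1, -2.0, -2.0]
value = [[50.0, 50.0], [45.0, 5.0], [5.0, 45.0]]  # [background, signal] per node


def to_tmva_node(node_id, pos="s", depth=0):
    """Turn node `node_id` into a nested dict using TMVA-style attribute names."""
    bkg, sig = value[node_id]
    node = {"pos": pos, "depth": depth, "purity": sig / (sig + bkg)}
    if children_left[node_id] == TREE_LEAF:
        # Terminal node: label it signal (+1) or background (-1) by purity.
        # The 0.5 threshold is an assumption of this sketch.
        node.update(nType=1 if node["purity"] > 0.5 else -1,
                    IVar=-1, Cut=0.0, cType=1, children=[])
    else:
        # scikit-learn sends x <= threshold to the left child, so values above
        # the cut go right, which is cType=1 in the slide's convention.
        node.update(nType=0, IVar=feature[node_id], Cut=threshold[node_id], cType=1,
                    children=[to_tmva_node(children_left[node_id], "l", depth + 1),
                              to_tmva_node(children_right[node_id], "r", depth + 1)])
    return node


root = to_tmva_node(0)
print(root["IVar"], root["Cut"], [child["nType"] for child in root["children"]])
```

With a real classifier the same walk would read the arrays from `estimator.tree_`; note that there `value` carries an extra output dimension.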

  15. TMVA minimal weights xml
      • Example: a single tree encoded in a TMVA xml file [XML listing not preserved; its annotations:]
        • Describe Variables
        • Maps var to VarIndex
        • Tree weight (AdaBoost)
        • Tree number
        • Tree structure as a bunch of included nodes
      • <GeneralInfo> and <Options> - removed, don't affect BDT score
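
Since the XML listing itself did not survive, here is a hedged reconstruction of what such a minimal file could look like, built with Python's standard `xml.etree.ElementTree`. The element and attribute names follow the annotations above and the parameter list on the next slide; the exact set of attributes a real TMVA Reader requires is not confirmed here, and the helper names (`add_node`, `build_weights`) and the numbers are invented for illustration.

```python
import xml.etree.ElementTree as ET


def add_node(parent, node):
    """Append one <Node> element and, recursively, its nested children."""
    attrs = {
        "pos": node["pos"],
        "IVar": str(node["IVar"]),
        "Cut": format(node["Cut"], ".16e"),
        "cType": str(node["cType"]),
        "purity": format(node["purity"], ".16e"),
        "nType": str(node["nType"]),
    }
    element = ET.SubElement(parent, "Node", attrs)
    for child in node.get("children", []):
        add_node(element, child)


def build_weights(variables, trees):
    """variables: [(name, type)], in training order; trees: [(boost weight, root node)]."""
    root = ET.Element("MethodSetup", {"Method": "BDT::BDT"})
    variables_el = ET.SubElement(root, "Variables", {"NVar": str(len(variables))})
    for index, (name, var_type) in enumerate(variables):
        # The position in the list becomes VarIndex, hence "variable order matters".
        ET.SubElement(variables_el, "Variable",
                      {"VarIndex": str(index), "Expression": name,
                       "Label": name, "Type": var_type})
    weights_el = ET.SubElement(root, "Weights", {"NTrees": str(len(trees))})
    for itree, (boost_weight, root_node) in enumerate(trees):
        tree_el = ET.SubElement(weights_el, "BinaryTree",
                                {"type": "DecisionTree",
                                 "boostWeight": format(boost_weight, ".16e"),
                                 "itree": str(itree)})
        add_node(tree_el, root_node)
    return root


# Toy single-split tree (numbers are made up for the sketch).
leaf_bkg = {"pos": "l", "IVar": -1, "Cut": 0.0, "cType": 1, "purity": 0.1, "nType": -1}
leaf_sig = {"pos": "r", "IVar": -1, "Cut": 0.0, "cType": 1, "purity": 0.9, "nType": 1}
stump = {"pos": "s", "IVar": 0, "Cut": 34.09588623046875, "cType": 1,
         "purity": 0.5, "nType": 0, "children": [leaf_bkg, leaf_sig]}

xml_root = build_weights([("pt", "F"), ("eta", "F")], [(0.8, stump)])
xml_text = ET.tostring(xml_root, encoding="unicode")
print(xml_text)
```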

  16. TMVA BDT xml parameters
      • Variables section
        • variable Min, Max values show no effect on output BDT score
      • BinaryTree section - node parameters
        • IVar="0" - refers to a variable defined by VarIndex in the Variables section
        • pos="s" - root node, pos="l" - left, pos="r" - right
        • Cut="3.4095886230468750e+01" - node cut value
        • nType - node type; compared against NodePurityLimit, which is set in the TMVA BDT config parameters
          • nType="-1" - terminal background node
          • nType="1" - terminal signal node
          • nType="0" - intermediate node
        • cType - cut type
          • cType="0" - if node variable > cut value, then go left; otherwise - right
          • cType="1" - if node variable > cut value, then go right; otherwise - left
        • purity - S/(S+B); S - number of signal events, B - number of background events
        • res="…" and rms="…" - regression predictions (used in Gradient Boosting)
        • NCoef="0" - always zero, some Fisher coefficients, not sure what they are for
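
To make the node parameters concrete, the sketch below walks trees written in this node format and applies the cType / nType rules exactly as listed on the slide. The XML snippet is a made-up two-tree example, and combining the trees as a boostWeight-weighted vote of the leaf nType values is an assumption of this sketch rather than a statement of TMVA's exact scoring formula. `leaf_type` and `weighted_vote` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up weights fragment using the attributes described on the slide.
WEIGHTS_XML = """
<Weights NTrees="2">
  <BinaryTree type="DecisionTree" boostWeight="0.8" itree="0">
    <Node pos="s" IVar="0" Cut="3.4095886230468750e+01" cType="1" nType="0">
      <Node pos="l" IVar="-1" Cut="0" cType="1" nType="-1"/>
      <Node pos="r" IVar="-1" Cut="0" cType="1" nType="1"/>
    </Node>
  </BinaryTree>
  <BinaryTree type="DecisionTree" boostWeight="0.2" itree="1">
    <Node pos="s" IVar="1" Cut="0.5" cType="0" nType="0">
      <Node pos="l" IVar="-1" Cut="0" cType="0" nType="1"/>
      <Node pos="r" IVar="-1" Cut="0" cType="0" nType="-1"/>
    </Node>
  </BinaryTree>
</Weights>
"""


def leaf_type(node, event):
    """Descend from `node` to a terminal node; return its nType (+1 or -1)."""
    while node.get("nType") == "0":  # intermediate node
        above = event[int(node.get("IVar"))] > float(node.get("Cut"))
        # cType=1: above the cut goes right; cType=0: above the cut goes left.
        go_right = above if node.get("cType") == "1" else not above
        wanted = "r" if go_right else "l"
        node = next(child for child in node.findall("Node")
                    if child.get("pos") == wanted)
    return int(node.get("nType"))


def weighted_vote(weights_el, event):
    """Assumed combination: boostWeight-weighted average of the leaf types."""
    total = norm = 0.0
    for tree in weights_el.findall("BinaryTree"):
        weight = float(tree.get("boostWeight"))
        total += weight * leaf_type(tree.find("Node"), event)
        norm += weight
    return total / norm


weights = ET.fromstring(WEIGHTS_XML.strip())
signal_like = weighted_vote(weights, [40.0, 0.9])  # values in VarIndex order
mixed = weighted_vote(weights, [10.0, 0.9])
print(signal_like, mixed)
```

The event is a plain list indexed by VarIndex, which is why the variable order given to the converter has to match the order used in training.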
