
scikit-learn to TMVA: XML converter tool
Yuriy Ilchenko (U. of Texas), Nazim Huseynov (JINR)
IML LHC Machine Learning WG Meeting, Feb 03, 2015

History
• ttbar production with non-prompt leptons - a major background for a few ttH channels
• Idea is to use an MVA - a boosted decision tree (BDT) - to separate prompt from non-prompt leptons
• Employ TMVA from ROOT
• List of input variables - object level only
  • pt, eta, sigd0PV, z0SinTheta, etcone20/pt, ptcone20/pt
• Compare BDT performance against the standard analysis cuts
  • ROC curve (BDT) vs. a point (cuts)
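The comparison in the last bullet, a ROC curve against a single cut-based working point, can be sketched as follows. This is a minimal illustration and not the analysis code: the classifier scores are toy Gaussian numbers, the cut-based point `cut_point` is a hypothetical value, and the helper names `roc_points` and `rejection_at` are introduced here only for the example.

```python
import random


def roc_points(sig_scores, bkg_scores, n_steps=50):
    """Scan a score threshold and return (signal efficiency, background rejection) pairs."""
    lo = min(sig_scores + bkg_scores)
    hi = max(sig_scores + bkg_scores)
    points = []
    for i in range(n_steps + 1):
        thr = lo + (hi - lo) * i / n_steps
        # Events with score >= thr are selected as signal-like.
        sig_eff = sum(s >= thr for s in sig_scores) / len(sig_scores)
        bkg_rej = sum(b < thr for b in bkg_scores) / len(bkg_scores)
        points.append((sig_eff, bkg_rej))
    return points


def rejection_at(curve, sig_eff_target):
    """Best background rejection among curve points with at least the target signal efficiency."""
    return max(rej for eff, rej in curve if eff >= sig_eff_target)


rng = random.Random(0)
# Toy scores (assumption): signal tends to score higher than background.
sig = [rng.gauss(1.0, 1.0) for _ in range(2000)]
bkg = [rng.gauss(-1.0, 1.0) for _ in range(2000)]
curve = roc_points(sig, bkg)

# A cut-based selection gives one working point; these numbers are hypothetical.
cut_point = (0.90, 0.80)  # (signal efficiency, background rejection)

bdt_rej = rejection_at(curve, cut_point[0])
print("classifier rejection at the same signal efficiency:", bdt_rej)
print("cut-based rejection:", cut_point[1])
```

The classifier improves on the cuts when its curve passes above the cut point, that is, when it rejects more background at the same signal efficiency.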

TMVA - electrons
[Figure: BDT ROC curves (background rejection versus signal efficiency) for the 10% sample and the 33% sample, each with a zoomed-in panel; the "cuts" point is labeled in the zoomed-in view.]

TMVA - muons
[Figure: BDT ROC curve (background rejection versus signal efficiency) for one sample, with a zoomed-in panel and the "cuts" point labeled.]
• No results :( for the other sample; TMVA reported:
  --- <ERROR> BDT : YOUR tree has only 1 Node... kind of a funny *tree*. I cannot boost such a thing... if after 1 step the error rate is == 0.5
• Decided to try an alternative MVA library

scikit-learn
• "sklearn" - popular open-source library for data analysis, written in Python
• Implements all major models - decision trees, neural networks, etc.
• Supported by an international community of developers
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
www.scikit-learn.org

sklearn - electrons
[Figure: plots for the 10% sample and the 33% sample, each with a zoomed-in panel; the "cuts" point is labeled.]

sklearn - muons
[Figure: plots for the 10% sample and the 33% sample, each with a zoomed-in panel; the "cuts" point is labeled.]

sklearn to TMVA
• Problem: no sklearn available in ATLAS software
• Solution: convert a classifier trained with scikit-learn to the XML format readable by the TMVA Reader
• Perk: apply the BDT in ATLAS independently of scikit-learn
[Diagram: skTMVA converter, with labels "For Training" and "For Testing"]

skTMVA converter
• skTMVA - sklearn to TMVA converter
• part of the koza4ok package, which also contains ROC-curve calculation and some other tools
• written in Python
• on GitHub - https://github.com/yuraic/koza4ok
• What's supported?
  • BDT binary classification
  • AdaBoost, Gradient Boosting
  • XML format only

skTMVA in action
• Getting the converter
  git clone https://github.com/yuraic/koza4ok.git
• Set up the repository
  source setup_koza4ok.sh
• And in your Python code
  [Code screenshot with callouts: scikit-learn model; output TMVA xml file; TMVA input variables and their type (variable order matters!)]

skTMVA in practice
In the koza4ok/example folder
• Training - no input data is required, data is generated on the fly
  • bdt_sklearn_to_tmva_AdaBoost.py
  • bdt_sklearn_to_tmva_Grad.py
• Testing and validation - draw ROC curves by TMVA and scikit-learn and overlay them
  • validate_sklearn_to_tmva.py
• Two files are created when running the examples
  • bdt_sklearn_to_tmva_example.pkl - stores the scikit-learn model
  • bdt_sklearn_to_tmva_example.xml - the converted TMVA xml file

Summary
• skTMVA - scikit-learn to TMVA converter
  • supports BDT binary classification - AdaBoost, Gradient Boosting
  • saves to an xml file
  • comes with examples and validation code
  • web: https://github.com/yuraic/koza4ok
Plans
• Convert a scikit-learn model to a standalone C++ file
Contact us
• Yuriy Ilchenko (core development) - ilchenko@physics.utexas.edu
• Nazim Huseynov (validation, testing) - nguseynov@jinr.ru

Backup

Decision Tree in scikit-learn and TMVA
• The TMVA variable description is in the back-up slides (or google)
• The sklearn tree structure is described at http://scikit-learn.org/dev/auto_examples/tree/unveil_tree_structure.html
[Diagram: scikit-learn Decision Tree; apply skTMVA converter]

TMVA minimal weights xml
Example: a single tree encoded in a TMVA xml file
[XML screenshot with callouts:]
• Describe variables
• Maps var to VarIndex
• Tree weight (AdaBoost)
• Tree number
• Tree structure as a bunch of nested nodes
<GeneralInfo> and <Options> are removed; they don't affect the BDT score
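A rough sketch of how such a minimal file could be assembled from a flat, array-based tree description. It is not the skTMVA implementation and is not claimed to be loadable by the TMVA Reader as is. `Variables`, `BinaryTree`, `VarIndex` and the node attributes come from these slides; the names `MethodSetup`, `Weights`, `Variable`, `Node`, `NVar`, `NTrees`, `Expression`, `Type`, `boostWeight` and `itree` are assumptions recalled from TMVA weight files. The tree values are toy numbers and the purity limit of 0.5 is assumed.

```python
import xml.etree.ElementTree as ET

# Flat description of one small tree, in the style of the parallel arrays
# kept for a fitted scikit-learn tree (toy values, not taken from the slides).
LEAF = -1
children_left = [1, LEAF, LEAF]
children_right = [2, LEAF, LEAF]
feature = [0, LEAF, LEAF]       # index of the variable tested at each node
threshold = [34.1, 0.0, 0.0]    # cut value at each node
purity = [0.5, 0.1, 0.9]        # S/(S+B) at each node
PURITY_LIMIT = 0.5              # assumed stand-in for NodePurityLimit


def add_node(parent, idx, pos):
    """Append node idx under parent and recurse into its children."""
    is_leaf = children_left[idx] == LEAF
    if is_leaf:
        n_type = 1 if purity[idx] > PURITY_LIMIT else -1
    else:
        n_type = 0
    node = ET.SubElement(parent, "Node", {
        "pos": pos,
        "NCoef": "0",
        "IVar": str(feature[idx]),
        "Cut": f"{threshold[idx]:.16e}",
        "cType": "1",
        "purity": f"{purity[idx]:.16e}",
        "nType": str(n_type),
    })
    if not is_leaf:
        add_node(node, children_left[idx], "l")
        add_node(node, children_right[idx], "r")
    return node


def build_weights_xml(var_names, boost_weight=1.0):
    """Build the variables block and a single-tree weights block."""
    root = ET.Element("MethodSetup", {"Method": "BDT::BDT"})
    variables = ET.SubElement(root, "Variables", {"NVar": str(len(var_names))})
    for i, name in enumerate(var_names):
        # The position in var_names becomes VarIndex, so the order matters.
        ET.SubElement(variables, "Variable",
                      {"VarIndex": str(i), "Expression": name, "Type": "F"})
    weights = ET.SubElement(root, "Weights", {"NTrees": "1"})
    tree = ET.SubElement(weights, "BinaryTree",
                         {"boostWeight": f"{boost_weight:.16e}", "itree": "0"})
    add_node(tree, 0, "s")
    return root


xml_text = ET.tostring(build_weights_xml(["pt", "eta"]), encoding="unicode")
print(xml_text)
```

Nesting the child nodes inside their parent mirrors the "tree structure as nested nodes" callout, while the flat arrays on the input side are why a recursive walk is needed.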

TMVA BDT xml parameters
• Variables section
  • variable Min, Max values show no effect on the output BDT score
• BinaryTree section - node parameters
  • IVar="0" - refers to a variable defined by VarIndex in the Variables section
  • pos="s" - root node, pos="l" - left, pos="r" - right
  • Cut="3.4095886230468750e+01" - node cut value
  • nType - node type; set by comparing the node purity against NodePurityLimit, which is one of the TMVA BDT configuration parameters
    • nType="-1" - terminal background node
    • nType="1" - terminal signal node
    • nType="0" - intermediate node
  • cType - cut type
    • cType="0" - if node variable > cut value, then go left; otherwise go right
    • cType="1" - if node variable > cut value, then go right; otherwise go left
  • purity - S/(S+B); S - number of signal events, B - number of background events
  • res="…" and rms="…" - regression predictions (used in Gradient Boosting)
  • NCoef="0" - always zero, some Fisher coefficients, not sure what they are for
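The node parameters above are enough to walk a single tree by hand. The sketch below follows the cType and nType rules exactly as worded on this slide, including the strict ">" comparison. The XML fragment is a toy example; the element name `Node` and the attributes `boostWeight` and `itree` are assumptions, and the helpers `child` and `classify` are hypothetical names, not part of TMVA or skTMVA.

```python
import xml.etree.ElementTree as ET

# Toy single-tree fragment: one cut on variable 0, two terminal nodes.
TREE_XML = """
<BinaryTree boostWeight="1.0" itree="0">
  <Node pos="s" IVar="0" Cut="3.4095886230468750e+01" cType="1" nType="0" purity="0.5">
    <Node pos="l" IVar="-1" Cut="0.0" cType="1" nType="-1" purity="0.1"/>
    <Node pos="r" IVar="-1" Cut="0.0" cType="1" nType="1" purity="0.9"/>
  </Node>
</BinaryTree>
"""


def child(node, pos):
    """Return the direct child node with the given pos ("l" or "r")."""
    for sub in node:
        if sub.tag == "Node" and sub.get("pos") == pos:
            return sub
    raise ValueError(f"node has no child with pos={pos!r}")


def classify(tree, values):
    """Walk from the root to a terminal node; return its nType (1 signal, -1 background)."""
    node = tree.find("Node")  # root node, pos="s"
    while node.get("nType") == "0":  # intermediate node: apply the cut
        value = values[int(node.get("IVar"))]
        above = value > float(node.get("Cut"))
        # cType="1": above the cut goes right; cType="0": above the cut goes left.
        go_right = above if node.get("cType") == "1" else not above
        node = child(node, "r" if go_right else "l")
    return int(node.get("nType"))


tree = ET.fromstring(TREE_XML.strip())
print(classify(tree, [50.0]))
print(classify(tree, [10.0]))
```

This returns only the terminal node type of one tree; combining many trees into a BDT score with their boost weights is a separate step that is not shown here.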
