scikit-learn to TMVA: XML converter tool
Yuriy Ilchenko (U. of Texas), Nazim Huseynov (JINR) IML LHC Machine Learning WG Meeting Feb 03, 2015
History
ttbar production with non-prompt leptons - major background for a few ttH channels
prompt from non-prompt leptons
[Figure: MVA Method: BDT. Background rejection versus signal efficiency. Panels: 10% sample, zoom in, 33% sample, zoom in.]
[Figure: MVA Method: BDT, compared with cuts. Background rejection versus signal efficiency. Panels: 10% sample, zoom in, 33% sample.]
*tree*. I cannot boost such a thing... if after 1 step the error rate is == 0.5
no results :( Decided to try an alternative MVA library
analysis written in python
neural networks, etc
developers
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
www.scikit-learn.org
[Figures (two slides): panels for 10% sample and 33% sample, each with a zoom in, compared with cuts.]
the xml format readable by TMVA Reader
For Training For Testing
skTMVA converter
calculation, some other tools
git clone https://github.com/yuraic/koza4ok.git
source setup_koza4ok.sh
scikit-learn model TMVA input variables and their type (variable order matters!)
is generated on the fly
by TMVA and scikit-learn and overlay
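To illustrate why variable order matters, here is a minimal sketch using only the Python standard library. It is not the skTMVA code: `build_variables_block` is a hypothetical helper, `var1`/`var2` are placeholder names, and the attribute names follow the TMVA weight-file layout as recalled here, so treat them as assumptions and check them against a real weight file. The point is that each variable's `VarIndex` is simply its position in the list, which therefore has to match the column order of the training matrix.

```python
import xml.etree.ElementTree as ET


def build_variables_block(variables):
    """Hypothetical helper: ordered (name, type) pairs -> <Variables> element."""
    root = ET.Element("Variables", NVar=str(len(variables)))
    for index, (name, var_type) in enumerate(variables):
        # VarIndex is the position in the list, so the list order must match
        # the column order of the feature matrix used for training.
        ET.SubElement(
            root,
            "Variable",
            VarIndex=str(index),
            Expression=name,
            Label=name,
            Title=name,
            Internal=name,
            Type=var_type,
        )
    return root


# Placeholder variables: two float ("F") inputs, in training-column order.
variables = [("var1", "F"), ("var2", "F")]
block = build_variables_block(variables)
xml_text = ET.tostring(block, encoding="unicode")
print(xml_text)
```

Swapping the two entries in `variables` would swap their `VarIndex` values, and every tree node cutting on index 0 would then read the wrong input.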
Two files created when running examples
Summary
Plans
scikit-learn Decision Tree apply skTMVA converter
back-up slides (or google)
http://scikit-learn.org/dev/auto_examples/tree/unveil_tree_structure.html
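As a rough sketch of what the linked example covers: a fitted scikit-learn decision tree exposes its structure as parallel arrays on `tree_` (`children_left`, `children_right`, `feature`, `threshold`), with `-1` marking a missing child at a leaf. The arrays below are hand-written stand-ins with made-up values so the snippet runs without scikit-learn, and `describe` is a hypothetical helper.

```python
# Stand-ins for tree_.children_left, tree_.children_right, tree_.feature and
# tree_.threshold of a fitted tree; the values are made up for illustration.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]           # -2 marks "no feature" at a leaf
threshold = [0.5, -2.0, -2.0]   # -2.0 marks "no threshold" at a leaf


def describe(node=0, depth=0):
    """Hypothetical helper: walk the arrays recursively, one line per node."""
    indent = "  " * depth
    if children_left[node] == -1:
        # A leaf has no children.
        return [f"{indent}node {node}: leaf"]
    lines = [f"{indent}node {node}: X[{feature[node]}] <= {threshold[node]}"]
    lines += describe(children_left[node], depth + 1)
    lines += describe(children_right[node], depth + 1)
    return lines


structure = describe()
print("\n".join(structure))
```

A converter can walk the tree the same way, writing one XML node per visited index instead of a text line.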
Example: a single tree encoded in TMVA xml file
<GeneralInfo> and <Options> - removed, don't affect BDT score
Describe Variables
Maps var to VarIndex
Tree structure as a bunch of nested nodes
Tree number
Tree weight (AdaBoost)
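A minimal sketch of the nested-node idea, using only the standard library. This is not the skTMVA code: `toy_tree` and `add_node` are hypothetical, and the element and attribute names (`BinaryTree`, `itree`, `boostWeight`, `Node`, `pos`, `depth`, `IVar`, `Cut`, `nType`, `purity`) follow the TMVA BDT weight-file layout as recalled here, so check them against a real weight file. A real file carries more attributes per node than shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy tree: one split on variable 0, then two leaves.
toy_tree = {
    "ivar": 0,
    "cut": 0.5,
    "left": {"leaf": True, "purity": 0.9},
    "right": {"leaf": True, "purity": 0.1},
}


def add_node(parent, node, pos, depth):
    """Append `node` under `parent`; child nodes are nested inside their parent."""
    elem = ET.SubElement(parent, "Node", pos=pos, depth=str(depth))
    if node.get("leaf"):
        # Assumed convention: nType 1 = signal leaf, -1 = background leaf.
        elem.set("nType", "1" if node["purity"] > 0.5 else "-1")
        elem.set("purity", repr(node["purity"]))
    else:
        elem.set("nType", "0")  # intermediate node
        elem.set("IVar", str(node["ivar"]))
        elem.set("Cut", repr(node["cut"]))
        add_node(elem, node["left"], "l", depth + 1)
        add_node(elem, node["right"], "r", depth + 1)
    return elem


# Tree number and tree weight sit on the enclosing element.
binary_tree = ET.Element(
    "BinaryTree", type="DecisionTree", itree="0", boostWeight="1.0"
)
add_node(binary_tree, toy_tree, "s", 0)
xml_text = ET.tostring(binary_tree, encoding="unicode")
print(xml_text)
```

A boosted model would repeat the `BinaryTree` element once per tree, each with its own `itree` and `boostWeight`.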