  1. tttt BDT Nick Amin September 29, 2018

  2. Overview
  ⚫ Last time, showed the cut-based analysis with the latest data and a luminosity of (35.87+41.53+35.53 =) 112.9 fb⁻¹, getting around 2.84σ expected significance
  ⚫ Repeat with an updated BDT (previously, had a 19-variable TMVA BDT trained with 2016 samples)
  ⚫ Explore xgboost instead of TMVA, and in any case, retrain the TMVA BDT with 2016+2017 samples for more statistics
  • One intermediate goal is to come up with a sane binning scheme/formula (rather than trying random partitions and picking the best one)

  3. Input details, TMVA
  ⚫ 19 variables (listed below) extracted from 2016+2017 MC
  • Looser baseline for more stats: Njets ≥ 2, Nb ≥ 1, HT ≥ 250, MET ≥ 30, lepton pT ≥ 15
  • No CRZ included, because we will separate that into its own bin for the full analysis
  ⚫ All numbers in these slides should be consistent, and associated with a luminosity of 35.9+41.5 = 77.4 fb⁻¹ — multiply significances by 1.2 to project to 112.9 fb⁻¹, or 1.3 to project to 132 fb⁻¹
  ⚫ I checked that the discriminator shape for signal is essentially the same for OS and SS events, so include signal OS events to double the statistics
  • ~400k unweighted signal and background events in total
  ⚫ Retrain the TMVA BDT with the configuration below (found from the hyperparameter scan last time)
  • Key points — 500 trees with a depth of 5 using the AdaBoost algorithm

    feature_names = [
        "nbtags", "njets", "met", "ptl2", "nlb40", "ntb40", "nleps", "htb", "q1",
        "ptj1", "ptj6", "ptj7", "ml1j1", "dphil1l2", "maxmjoverpt", "ptl1",
        "detal1l2", "ptj8", "ptl3",
    ]

    method = factory.BookMethod(loader, ROOT.TMVA.Types.kBDT, "BDT", ":".join([
        "!H",
        "!V",
        "NTrees=500",
        "nEventsMin=150",
        "MaxDepth=5",
        "BoostType=AdaBoost",
        "AdaBoostBeta=0.25",
        "SeparationType=GiniIndex",
        "nCuts=20",
        "PruneMethod=NoPruning",
    ]))
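For orientation, a minimal sketch of how this booking could sit inside a full PyROOT/TMVA workflow; the file and tree names ("signal.root", "background.root", "t") and the factory options are placeholders for illustration, not the actual analysis inputs:

    import ROOT

    # Placeholder inputs: these are not the actual 2016+2017 MC files used in the slides
    f_sig = ROOT.TFile.Open("signal.root")
    f_bkg = ROOT.TFile.Open("background.root")
    t_sig, t_bkg = f_sig.Get("t"), f_bkg.Get("t")

    out = ROOT.TFile("tmva_output.root", "RECREATE")
    factory = ROOT.TMVA.Factory("tttt", out, "!V:!Silent:AnalysisType=Classification")
    loader = ROOT.TMVA.DataLoader("dataset")

    for name in feature_names:            # the 19 variables listed above
        loader.AddVariable(name, "F")

    loader.AddSignalTree(t_sig, 1.0)
    loader.AddBackgroundTree(t_bkg, 1.0)
    loader.PrepareTrainingAndTestTree(ROOT.TCut(""), "SplitMode=Random:NormMode=NumEvents:!V")

    # booking from the slide above, then train/test/evaluate
    method = factory.BookMethod(loader, ROOT.TMVA.Types.kBDT, "BDT", ":".join([
        "!H", "!V", "NTrees=500", "nEventsMin=150", "MaxDepth=5",
        "BoostType=AdaBoost", "AdaBoostBeta=0.25", "SeparationType=GiniIndex",
        "nCuts=20", "PruneMethod=NoPruning",
    ]))
    factory.TrainAllMethods()
    factory.TestAllMethods()
    factory.EvaluateAllMethods()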

  4. xgboost
  ⚫ Preprocessing
  • Use the absolute value of the weights in training, for reasons of stability
  • When re-weighting signal and background to have average weights of 1, throw away a small (sub-percent) fraction of events that have large relative weights, mainly from x+gamma
  ⚫ Tried to use the BayesianOptimization package to get optimal hyperparameters
  • This attempts to iteratively find the best point by exploring regions for which the "information gained" is maximized
  • It turns out that once you get the learning rate (eta), the number of trees, and the subsampling fraction right, the rest matter very little, if at all
  ⚫ Also naively tried Condor (pick random points and submit ~4-5k trainings)
  • Same story here
  ⚫ To avoid picking an overtrained hyperparameter set, rather than picking exactly the best point, I used representative values for the parameters below (definitions documented here) and made the numbers more round
  ⚫ Key points here
  • 500 trees, depth of 5 — same as TMVA
  • Gradient boosting algorithm instead of AdaBoost — this can actually affect the shape of the discriminator output

    num_trees = 500
    param['objective'] = 'binary:logistic'
    param['eta'] = 0.07
    param['max_depth'] = 5
    param['silent'] = 1
    param['nthread'] = 15
    param['eval_metric'] = "auc"
    param['subsample'] = 0.6
    param['alpha'] = 8.0
    param['gamma'] = 2.0
    param['lambda'] = 1.0
    param['min_child_weight'] = 1.0
    param['colsample_bytree'] = 1.0
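A minimal sketch of how this preprocessing and training could be wired together with the xgboost python API, assuming the feature matrix, labels, and per-event MC weights (X_train, y_train, w_train, and the test-set equivalents) are already in hand; the large-weight cutoff is purely illustrative:

    import numpy as np
    import xgboost as xgb

    w = np.abs(np.asarray(w_train, dtype=float))      # absolute value of weights, for stability
    sig, bkg = (y_train == 1), (y_train == 0)
    w[sig] *= sig.sum() / w[sig].sum()                # average signal weight -> 1
    w[bkg] *= bkg.sum() / w[bkg].sum()                # average background weight -> 1
    keep = w < 20.0                                   # drop the tail of large relative weights (threshold is illustrative)

    dtrain = xgb.DMatrix(X_train[keep], label=y_train[keep], weight=w[keep], feature_names=feature_names)
    dtest = xgb.DMatrix(X_test, label=y_test, feature_names=feature_names)

    param = {
        'objective': 'binary:logistic', 'eta': 0.07, 'max_depth': 5, 'silent': 1,
        'nthread': 15, 'eval_metric': 'auc', 'subsample': 0.6, 'alpha': 8.0,
        'gamma': 2.0, 'lambda': 1.0, 'min_child_weight': 1.0, 'colsample_bytree': 1.0,
    }
    num_trees = 500
    bst = xgb.train(param, dtrain, num_trees, evals=[(dtrain, "train"), (dtest, "test")])
    disc_test = bst.predict(dtest)                    # discriminator values in [0, 1]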

  5. Training results
  ⚫ Bottom left plot shows the discriminator shapes for signal/background in the train/test sets
  • A Kolmogorov-Smirnov test shows good consistency — no overtraining observed
  ⚫ Top right shows that the AUC of xgboost is ~1.2% higher than that of TMVA
  ⚫ Bottom right shows that the maximal s/sqrt(s+b) for a single cut is 1.83 for xgboost but 1.75 for TMVA (5% higher for xgboost)
  • The shape is qualitatively different, however
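A small sketch of the two checks quoted above, assuming per-event discriminator scores and weights are available as arrays; the unweighted two-sample KS test here is only a simple stand-in for the (weighted) TMVA overtraining check:

    import numpy as np
    from scipy.stats import ks_2samp

    def max_single_cut_significance(disc_sig, w_sig, disc_bkg, w_bkg):
        # scan a single cut on the discriminator and keep the best s/sqrt(s+b)
        best = 0.0
        for cut in np.linspace(0.0, 1.0, 200):
            s = w_sig[disc_sig > cut].sum()
            b = w_bkg[disc_bkg > cut].sum()
            if s + b > 0.0:
                best = max(best, s / np.sqrt(s + b))
        return best

    # overtraining check: compare train vs test discriminator shapes per class
    ks_sig = ks_2samp(disc_sig_train, disc_sig_test).pvalue
    ks_bkg = ks_2samp(disc_bkg_train, disc_bkg_test).pvalue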

  6. Significance metrics
  ⚫ Ran HiggsCombine 10-50k times, using a simplified card/nuisance structure
  • Group fakes/flips into "Others", and rares/ttxx/tttx/xg into "Rares", as shown in the plot on the right
  • Then compute two versions of the expected significance
  • Significance without MC stats: 5 background processes + 1 signal process + 0 nuisances
  • Significance with MC stats: 5 background processes + 1 signal process + (Nbins * (5+1)) uncorrelated nuisances representing the MC statistical uncertainty in each bin
  • Use the latter for optimization/ranking to hopefully avoid low-MC-statistics bins/fluctuations, though the difference between the two values is only a few percent because this analysis is statistically limited
  ⚫ I’m showing s/sqrt(s+b) as the metric for each bin in the ratio panels, but I found that for a low number of bins (e.g., 2-3) it is not indicative of the expected significance from combine. However, the higher-order likelihood approximation below usually agrees with combine within ~2% (again, for 2-3 bins, so not useful in the right plot):

    σ = sqrt( 2 (s + b) ln(1 + s/b) − 2 s )

  [Plot captions: the TMVA output is mapped from [-1,1] to [0,1]; note that these discriminator plots require the actual baseline selection (HT > 300, MET > 50, Nb/Njets ≥ 2, lepton pT > 25/20)]
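As a worked example, here is that approximation evaluated per bin and combined in quadrature across bins (one common convention; the slide does not spell out how multiple bins are combined), with the binned yields s_bins and b_bins as inputs:

    import numpy as np

    def approx_significance(s_bins, b_bins):
        # per-bin Z^2 = 2[(s+b) ln(1+s/b) - s], summed in quadrature across bins
        s = np.asarray(s_bins, dtype=float)
        b = np.asarray(b_bins, dtype=float)
        z2 = 2.0 * ((s + b) * np.log(1.0 + s / b) - s)
        return np.sqrt(z2.sum())

    # a single bin with s=5, b=10 gives ~1.47, compared to s/sqrt(s+b) = 1.29
    print(approx_significance([5.0], [10.0]))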

  7. Exp. σ (out-of-the-box)
  ⚫ Here we can see the shape difference between TMVA and xgboost, though both get very similar AUC and s/sqrt(s+b)
  ⚫ Note that I scaled the TMVA plot from the previous slide from [0.15,1] to [0,1] to avoid empty bins, because the TMVA output doesn’t cover the full [-1,1] range initially
  • This is one source of slight ambiguity for binning, since you can’t just equally partition [-1,1] — you have to decide where to start binning on the left
  ⚫ Afterwards, create 20 equal-width bins for TMVA and xgboost and calculate the expected significance without and with MC stats (a small sketch of this step follows this slide)
  • TMVA is ~6% higher than xgboost even though the s, b, and AUC metrics indicate xgboost should be winning…
  • Presumably, combine likes several moderately high s/sqrt(s+b) bins (TMVA) rather than one really high one (xgboost)
  • AUC doesn’t care about the squished signal on the right, but a fit probably does
  ⚫ As a quick comparison (in backup), I ran this procedure on the cut-based SR binning (18 bins) and get ~2.25σ

  [Plots: stretched TMVA from [0.15,1] to [0,1]: 2.63477, 2.59117; xgboost: 2.60103, 2.44803 (expected significance without/with MC stat nuisances)]
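A minimal sketch of that rescaling and equal-width binning step, assuming per-event discriminator scores and weights (disc_sig_tmva, disc_sig_xgb, w_sig, and the background counterparts) are available; the [0.15, 1] window is the one quoted above:

    import numpy as np

    def rescale(disc, lo=0.15, hi=1.0):
        # map the populated part of the TMVA output onto [0, 1]
        return (np.asarray(disc, dtype=float) - lo) / (hi - lo)

    edges = np.linspace(0.0, 1.0, 21)    # 20 equal-width bins
    s_tmva, _ = np.histogram(rescale(disc_sig_tmva), bins=edges, weights=w_sig)
    b_tmva, _ = np.histogram(rescale(disc_bkg_tmva), bins=edges, weights=w_bkg)
    s_xgb, _ = np.histogram(disc_sig_xgb, bins=edges, weights=w_sig)
    b_xgb, _ = np.histogram(disc_bkg_xgb, bins=edges, weights=w_bkg)
    # the binned yields then go into the simplified combine cards from the previous slide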

  8. Run combine a lot
  ⚫ Run combine a few thousand times for the TMVA and xgboost discriminators with a random number of bins (between 10 and 20) and a random binning (see the sketch after this slide)
  • Get a set of flat- or gaussian-distributed random numbers (50-50 chance), take the cumulative sum, and squeeze it to [0,1] to obtain a "random binning"
  • Reject a binning scheme if there is an empty bin (or one with <0.05 s+b events)
  • Additionally, compute s/sqrt(s+b) and make sure that >~80% of the bins are increasing in this metric, to avoid weird-looking distributions (e.g., right)
  ⚫ Left plot shows the significance without MC stats vs the significance with MC stats — on average, "sig no stat" is ~1.8% higher than "sig stat"
  ⚫ Middle plot has the 1D distributions of "sig stat" for xgboost and TMVA
  • The difference here is quite striking: TMVA is better than xgboost and fairly stable
  ⚫ Right plot shows the maximum s/sqrt(s+b) across all bins against the significance
  • The narrow orange line at the top left contains cases where the last xgboost bin has a higher s/sqrt(s+b) than any other bin, so it dominates the result and is clearly correlated with the output of combine
  • TMVA has a lower maximum than xgboost on average, but obtains a better significance
  • This is along the lines of the suspicion on the previous slide about squishing the signal
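A sketch of the random-binning generator and acceptance checks described above (thresholds as quoted; the discriminator score and weight arrays are placeholders):

    import numpy as np

    rng = np.random.default_rng()

    def random_edges(nbins):
        # flat- or gaussian-distributed steps (50-50 chance), cumulatively summed and squeezed to [0, 1]
        steps = rng.random(nbins) if rng.random() < 0.5 else np.abs(rng.normal(size=nbins))
        csum = np.cumsum(steps)
        inner = csum[:-1] / csum[-1]
        return np.concatenate([[0.0], inner, [1.0]])

    def acceptable(edges, disc_sig, w_sig, disc_bkg, w_bkg):
        s, _ = np.histogram(disc_sig, bins=edges, weights=w_sig)
        b, _ = np.histogram(disc_bkg, bins=edges, weights=w_bkg)
        if np.any(s + b < 0.05):                       # reject (nearly) empty bins
            return False
        metric = s / np.sqrt(s + b)
        return np.mean(np.diff(metric) > 0) > 0.8      # require ~80% of bins increasing in s/sqrt(s+b)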

  9. Dependence on bin count
  ⚫ Plot the expected significance for TMVA (left) and xgboost (right) — the legend is more useful than the histograms, though
  ⚫ For each bin count, display the mean significance and also the mean of the highest 10% of significances (a tiny sketch of this summary follows this slide)
  • TMVA only has a ~1% gain going from the lowest bin count to the highest
  • xgboost has a 5-8% gain
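For completeness, a tiny sketch of that per-bin-count summary (mean of all trials and mean of the top 10%), with the list of trial significances at a fixed bin count as input:

    import numpy as np

    def summarize(significances):
        # mean of all trials and mean of the highest 10% of trials at a fixed bin count
        s = np.sort(np.asarray(significances, dtype=float))
        top = s[int(np.ceil(0.9 * len(s))):]
        return s.mean(), top.mean()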

  10. Effect of MC stats
  ⚫ Now plot the difference between the expected significance without MC statistics nuisances and with them, as a function of the number of bins
  ⚫ For TMVA, the difference decreases a little going from 10 to 19 bins
  ⚫ For xgboost, the difference increases going from 10 to 19
  ⚫ I would expect fewer bins to mean a smaller effect of MC statistics, along the lines of what xgboost shows

  11. Reshaping the xgboost output
  ⚫ From an earlier slide, signal is compressed at disc = 1 for xgboost. Naively try to reshape it to look like TMVA by matching the relative signal counts in each bin (see the sketch after this slide)
  ⚫ Take the equally-spaced bins in the xgboost discriminator (x-axis) and make them match TMVA (y-axis) — this bins more finely where the signal is bunched up
  ⚫ Green dots are calculated by matching integrals; blue is a linear interpolation that we can apply
  ⚫ Two approaches
  • Convert the xgboost discriminator value on an event-by-event basis (blue)
  • Re-space the bins (orange, which is the inverse of blue)
  ⚫ Note that orange is very sigmoid-like…
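One way to implement this, sketched under the assumption that per-event scores and signal weights are available as arrays: match the cumulative signal fraction of the xgboost discriminator to that of the (rescaled) TMVA one, and apply it event by event via linear interpolation, along the lines of the blue curve described above:

    import numpy as np

    edges = np.linspace(0.0, 1.0, 21)

    def signal_cdf(disc, weights):
        # cumulative signal fraction evaluated at each bin edge
        counts, _ = np.histogram(disc, bins=edges, weights=weights)
        return np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])

    cdf_xgb = signal_cdf(disc_sig_xgb, w_sig)
    cdf_tmva = signal_cdf(disc_sig_tmva_rescaled, w_sig)

    def reshape(x):
        # event-by-event (blue): send x to the TMVA value with the same signal quantile
        q = np.interp(x, edges, cdf_xgb)
        return np.interp(q, cdf_tmva, edges)

    disc_sig_xgb_reshaped = reshape(disc_sig_xgb)
    disc_bkg_xgb_reshaped = reshape(disc_bkg_xgb)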
