EML Update
Nick Amin July 10, 2018
Overview
⚫ Last update (SNT), also presented at ML workshop
⚫ Feedback from ML talk: there were some concerns I might be taking advantage of b,c→e (or just being unfair in general) if my network is learning isolation
⚫ Plan: compare against the (BDT-based) MVA, feeding in lower-level track information with the latest architecture, changes, etc. to confirm
⚫ Note: in the rest of these slides, consider barrel only
2
⚫ Is the network using isolation? (29x15 window used vs the 5x5 used for σiηiη)
⚫ In DY, virtually all events are "truth-matched" to not be in the b,c→e category, so we can separate out the two to see how important isolation would be
⚫ Comparing AUCs in the legends, the b/c-class backgrounds are actually worse than unmatched in all pT bins except pT>45
[ROC curves: signal vs all, signal vs unmatched, signal vs b,c→e]
3
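For concreteness, this per-class comparison amounts to computing a separate ROC AUC for signal against each background category; a minimal sketch, assuming per-event scores, a signal flag, and a background truth-match category are available as NumPy arrays (all names here are hypothetical):

import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_background_class(scores, is_signal, bkg_category):
    # AUC of signal vs all bkg, vs unmatched bkg, and vs b,c->e ("bc") bkg
    sig = scores[is_signal]
    aucs = {}
    for name in ["all", "unmatched", "bc"]:
        mask = ~is_signal if name == "all" else (~is_signal & (bkg_category == name))
        bkg = scores[mask]
        y_true = np.concatenate([np.ones_like(sig), np.zeros_like(bkg)])
        y_score = np.concatenate([sig, bkg])
        aucs[name] = roc_auc_score(y_true, y_score)
    return aucs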
⚫ Switch from DY to tt̅
⚫ From the pictures below, the network should have no trouble distinguishing isolated/nonisolated electron candidates, since we're using a large 29x15 window around the seed
[Images: signal, b/c, unmatched; bkg 20<pT<25, sig 20<pT<25]
4
⚫ σiηiη (calculated from 5x5 crystals) does slightly better than in DY
⚫ The CNN shows a large improvement
[ROC curves: signal vs all, signal vs unmatched, signal vs b,c→e; shape BDT vs σiηiη, shape BDT vs CNN, CNN vs σiηiη]
5
⚫ Now, as an exercise for just this slide and the next, we try training/testing with electron images made only with supercluster cells/energies (implementation in backup)
⚫ This way, isolation cannot be learned by the CNN, as isolation quantities do not consider deposits belonging to the supercluster
⚫ Both signal and background images become sparser
[Images: avg. bkg., avg. sig., and bkg/sig examples for 20<pT<25, all cells vs SC only]
6
⚫ Using SC-only cells degrades the performance wrt the nominal (all cells in the 29x15 window around the seed) → ~half of the gain is lost
⚫ Below, show the CNN, σiηiη, and a 6-variable BDT trained on shower-shape variables
[ROC curves, all cells and SC only: CNN vs 6-var BDT, 6-var BDT vs σiηiη, CNN vs σiηiη]
7
For a background efficiency of 20%, signal efficiency increase of the CNN wrt the BDT:

              all cells   SC only
  pT<15          25%        11%
  15<pT<25       10%         5%
  25<pT<45       10%         6%
  pT>45           9%         5%
8
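The working point behind this table is a fixed background efficiency; a minimal sketch of how such numbers can be extracted from a ROC curve (function and variable names here are hypothetical):

import numpy as np
from sklearn.metrics import roc_curve

def sig_eff_at_bkg_eff(y_true, y_score, target_bkg_eff=0.20):
    # On a ROC curve, fpr is the background efficiency and tpr the signal
    # efficiency; interpolate the signal efficiency at the target fpr
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return np.interp(target_bkg_eff, fpr, tpr)

# The table entries would then be differences like
# sig_eff_at_bkg_eff(y, cnn_scores) - sig_eff_at_bkg_eff(y, bdt_scores)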
⚫ First, see if feeding the 21 MVA inputs into a DNN achieves performance comparable to the (BDT-trained) MVA
⚫ Tried a network smaller than 100k parameters → comparable AUC to the BDT
⚫ ROC curves are ~identical (compare AUC values in the legend)
[ROC curves, linear and log scale]
9
DNN architecture (~96k parameters in total; parameter counts in brackets):
InputLayer (21)
Dense (128), Dropout(0.1), LeakyReLU [2,816]
Dense (128), Dropout(0.1), LeakyReLU [16,512]
Dense (256), Dropout(0.1), LeakyReLU [33,024]
Dense (128), LeakyReLU [32,896]
Dense (64), LeakyReLU [8,256]
Dense (32), LeakyReLU [2,080]
Dense (2) [66]
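Since the layer names above are Keras layers, the stack can be written down directly; a minimal sketch (the output activation, optimizer, and loss are assumptions, not taken from the slides):

from tensorflow.keras import layers, models

def build_dnn(n_inputs=21):
    # Layer stack as read off the diagram above (~96k parameters total)
    model = models.Sequential()
    model.add(layers.Dense(128, input_shape=(n_inputs,)))
    model.add(layers.Dropout(0.1))
    model.add(layers.LeakyReLU())
    for width, drop in [(128, 0.1), (256, 0.1), (128, None), (64, None), (32, None)]:
        model.add(layers.Dense(width))
        if drop:
            model.add(layers.Dropout(drop))
        model.add(layers.LeakyReLU())
    model.add(layers.Dense(2, activation="softmax"))  # softmax is an assumption
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model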
⚫ Now try to feed "lower-level" track information into the network
⚫ 19 variables
⚫ Subtract the supercluster η, φ from the η, φ components of the triplets
https://github.com/cms-sw/cmssw/blob/0b70aea1b7723a6dfd453d9d015b670d0f735256/DataFormats/EgammaCandidates/interface/GsfElectron.h#L279-L283
10
math::XYZPointF  positionAtVtx ;   // the track PCA to the beam spot
math::XYZPointF  positionAtCalo ;  // the track PCA to the supercluster position
math::XYZVectorF momentumAtVtx ;   // the track momentum at the PCA to the beam spot
math::XYZVectorF momentumAtCalo ;  // the track momentum extrapolated at the supercluster position from the innermost track state
math::XYZVectorF momentumOut ;     // the track momentum extrapolated at the seed cluster position from the outermost track state
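To illustrate the η, φ subtraction described above, a small sketch, assuming the position/momentum triplets have been read out as Cartesian (x, y, z) arrays (all function and variable names are hypothetical; exactly how the 19 variables are assembled is not spelled out on the slide):

import numpy as np

def eta_phi(x, y, z):
    # Convert a Cartesian triplet to (eta, phi)
    theta = np.arctan2(np.hypot(x, y), z)
    eta = -np.log(np.tan(theta / 2.0))
    phi = np.arctan2(y, x)
    return eta, phi

def delta_phi(phi, phi_sc):
    # Wrap the difference into (-pi, pi]
    return np.mod(phi - phi_sc + np.pi, 2 * np.pi) - np.pi

def relative_triplet(x, y, z, sc_eta, sc_phi):
    # One triplet (e.g. momentumAtVtx) -> (magnitude, eta - SC eta, dphi to SC)
    eta, phi = eta_phi(x, y, z)
    r = np.sqrt(x * x + y * y + z * z)
    return r, eta - sc_eta, delta_phi(phi, sc_phi)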
⚫ With the previous positions/momenta, we can compute the red track variables
⚫ Together with the shape information from the CNN, this should account for nearly all of the performance of the MVA
11
[Variable importance ranking, from most important to least important]
⚫ After reweighting background to signal, split by charge
[Distributions of R, Δη(·, SC), Δφ(·, SC) for vtx pos., calo pos., vtx mom., calo mom.]
12
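The slide does not spell out how the background is reweighted to signal; a common recipe is a 1D histogram ratio, sketched here with hypothetical names (which variable is used for the reweighting is an assumption):

import numpy as np

def reweight_bkg_to_sig(sig_vals, bkg_vals, n_bins=50):
    # Per-event background weights such that the background histogram of
    # this variable matches the (normalized) signal histogram
    lo = min(sig_vals.min(), bkg_vals.min())
    hi = max(sig_vals.max(), bkg_vals.max())
    h_sig, edges = np.histogram(sig_vals, bins=n_bins, range=(lo, hi), density=True)
    h_bkg, _ = np.histogram(bkg_vals, bins=edges, density=True)
    ratio = np.divide(h_sig, h_bkg, out=np.zeros_like(h_sig), where=h_bkg > 0)
    idx = np.clip(np.digitize(bkg_vals, edges) - 1, 0, n_bins - 1)
    return ratio[idx]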
13
⚫ Using two-prong network (parameter counts in brackets):

CNN branch:
InputLayer (15,29,1)
Conv2D 3x3 (15,29,32), LeakyReLU [320]
MaxPooling2D (7,14,32)
Conv2D 3x3 (7,14,64), LeakyReLU [18,496]
MaxPooling2D (3,7,64)
Conv2D 3x3 (3,7,16), Dropout(0.2), LeakyReLU [9,232]
Flatten (336)

Track branch (19 track variables):
InputLayer (19)
Dense (128), Dropout(0.1), LeakyReLU [2,560]
Dense (256), Dropout(0.1), LeakyReLU [33,024]
Dense (512), Dropout(0.1), LeakyReLU [131,584]
Dense (256), Dropout(0.1), LeakyReLU [131,328]
Dense (128), Dropout(0.1), LeakyReLU [32,896]
Dense (64), LeakyReLU [8,256]

Merged:
Concatenate (400)
Dense (150), Dropout(0.3), LeakyReLU [60,150]
Dense (50), Dropout(0.1), LeakyReLU [7,550]
Dense (15), Dropout(0.1), LeakyReLU [765]
Dense (2) [32]
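The same two-prong layout in the Keras functional API; a minimal sketch (padding="same" is inferred from the layer shapes above, and the output activation is an assumption):

from tensorflow.keras import layers, models

def build_two_prong(img_shape=(15, 29, 1), n_track_vars=19):
    # CNN branch over the 29x15 crystal image
    img_in = layers.Input(shape=img_shape)
    x = img_in
    for filters in (32, 64):
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.LeakyReLU()(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(16, (3, 3), padding="same")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.Flatten()(x)  # 3*7*16 = 336

    # Dense branch over the track variables
    trk_in = layers.Input(shape=(n_track_vars,))
    t = trk_in
    for width, drop in [(128, 0.1), (256, 0.1), (512, 0.1),
                        (256, 0.1), (128, 0.1), (64, None)]:
        t = layers.Dense(width)(t)
        if drop:
            t = layers.Dropout(drop)(t)
        t = layers.LeakyReLU()(t)

    # Concatenate the two prongs and classify
    h = layers.Concatenate()([x, t])  # 336 + 64 = 400
    for width, drop in [(150, 0.3), (50, 0.1), (15, 0.1)]:
        h = layers.Dense(width)(h)
        h = layers.Dropout(drop)(h)
        h = layers.LeakyReLU()(h)
    out = layers.Dense(2, activation="softmax")(h)  # softmax is an assumption
    return models.Model([img_in, trk_in], out)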
⚫ At 10% bkg efficiency, the signal efficiency for the "NN" is ~2-8% worse than the full MVA
⚫ Can the remaining/lower-ranked variables make up this difference?
14
[Same two-prong architecture as above, with the dense branch now fed the 19 track variables + 9 MVA variables]
⚫ Now take the network from the previous slide and append 9 variables from the MVA (not covered by the CNN shape information or the 19 track variables), and retrain to see the effect of the "lower-ranked" variables
⚫ Performance improves → the lower-ranked variables are not negligible
⚫ Still short of the performance of the MVA, so the network is not fully utilizing the 19 raw track variables that we are feeding in (?)
⚫ Note, another way of viewing this network/training configuration: it is the full MVA where the shower-shape variables are replaced by a CNN on the raw 29x15 crystals and 6 high-level track variables are replaced by 19 raw track variables… and the performance is slightly worse?
⚫ Try photons
⚫ Eventually get back to endcap training
15
16
⚫ Implementation of "SC-only" cells for reference, as there could be a subtlety with the "hits and fractions" for the supercluster
⚫ Take the hits & fractions for the supercluster and match this to the full list of energies per crystal to get the energies associated to the supercluster
17
// Get all EB rechits and store them in an (ieta,iphi) -> energy map
std::map<std::pair<int,int>, float> ietaiphi_to_energy;
auto rechits = lazyToolnoZS->getEcalEBRecHitCollection();
for (EBRecHitCollection::const_iterator it = rechits->begin(); it != rechits->end(); ++it) {
    int hit_ieta = EBDetId(it->detid()).ieta();
    int hit_iphi = EBDetId(it->detid()).iphi();
    float energy = it->energy();
    ietaiphi_to_energy[{hit_ieta,hit_iphi}] = energy;
}

// Get all hits & fractions for the supercluster.
// For each hit, look up the cell energy from the previous map and multiply
// by the fraction.
// This is then the energy that goes into an ieta/iphi cell.
auto supercluster = pat_ele->superCluster();
std::vector<std::pair<DetId,float> > hfSC = supercluster->hitsAndFractions();
for (std::vector<std::pair<DetId,float> >::const_iterator it = hfSC.begin(); it != hfSC.end(); ++it) {
    DetId id = (*it).first;
    if (!(id.subdetId() == EcalBarrel)) continue;
    int ieta = EBDetId(id).ieta();
    int iphi = EBDetId(id).iphi();
    float rawenergy = ietaiphi_to_energy[{ieta,iphi}];
    float frac = (*it).second;
    float energy = rawenergy*frac;
    rhs_e.push_back(energy);
    rhs_iphi.push_back(iphi);
    rhs_ieta.push_back(ieta);
}
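Downstream of this, the stored (ieta, iphi, energy) triplets presumably get binned into the fixed-size image around the seed crystal; a hypothetical sketch (it ignores the ieta sign convention at zero and the iphi wrap-around at 360, which a real implementation must handle):

import numpy as np

def make_image(rhs_ieta, rhs_iphi, rhs_e, seed_ieta, seed_iphi,
               half_eta=7, half_phi=14):
    # Fill a (2*7+1) x (2*14+1) = 15x29 window centered on the seed crystal
    img = np.zeros((2 * half_eta + 1, 2 * half_phi + 1), dtype=np.float32)
    for ieta, iphi, e in zip(rhs_ieta, rhs_iphi, rhs_e):
        deta = ieta - seed_ieta
        dphi = iphi - seed_iphi
        if abs(deta) <= half_eta and abs(dphi) <= half_phi:
            img[deta + half_eta, dphi + half_phi] += e
    return img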
⚫ How does an SC-only 5x5 CNN compare to the BDT?
⚫ The CNN is worse than the 6-variable shape BDT
18
CNN architecture (parameter counts in brackets):
InputLayer (5,5,1)
Conv2D 3x3 (5,5,32), LeakyReLU [320]
MaxPooling2D (2,2,32)
Conv2D 3x3 (2,2,64), LeakyReLU [18,496]
MaxPooling2D (1,1,64)
Conv2D 3x3 (1,1,16), Dropout(0.2), LeakyReLU [9,232]
Flatten (16)
Dense (150), Dropout(0.3), LeakyReLU [2,550]
Dense (50), Dropout(0.1), LeakyReLU [7,550]
Dense (15), Dropout(0.1), LeakyReLU [765]
Dense (2) [32]