 
Pandora track/shower discrimination via semantic segmentation • Andy Chappell • 11/12/2019 • DUNE UK Meeting
2 Roadmap • Overview and project goal • Model architecture • Approach to tuning • (Very) preliminary performance figures
3 Overview and project goal • Assign track and shower probabilities to every hit in the U, V and W planes • Train a neural network to calculate the probabilities • Pass these probabilities to downstream Pandora algorithms for cluster creation, merging, etc. • Current approach: cluster property-based cuts • [Figure, axes: x, w]
4 Architecture • U-Net architecture developed for biomedical image segmentation in 2015 (https://arxiv.org/abs/1505.04597) • Convolutions form the down-sampling part of the U • Transpose convolutions form the up-sampling part of the U • Skip connections add images from the down-sampling path to the up-sampling path • Track and shower probabilities assigned to each pixel
5 Architecture • Building on work started by Steven Green • PyTorch implementation • Two key blocks in the network • Down-sampling convolution block: [Conv2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm] • Up-sampling transpose convolution block: [ConvTranspose2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm] • Activation: Rectified Linear Unit • Batch normalisation: $\hat{x}_i = \frac{x_i - \mu_B}{\sigma_B}$, $y_i = \gamma \hat{x}_i + \beta$ • Loss: categorical cross-entropy, $\mathrm{loss} = -\ln y_{\mathrm{true\_class}}$ • Accuracy: fraction classified == truth
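The two blocks listed on this slide can be sketched in PyTorch as follows. This is a minimal illustration, not the network used in the talk: the kernel sizes, padding, channel widths, the placement of the stride-2 step in the first convolution, the two-class output and the name `TinyUNet` are all assumptions. The skip connection is implemented as an element-wise addition, following the wording of the slides; the original U-Net paper concatenates channels instead.

```python
import torch
from torch import nn


def conv_unit(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """One [Conv2d, ReLU, BatchNorm] unit, in the order given on the slide."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
    )


class DownBlock(nn.Module):
    """Down-sampling block: a stride-2 unit (assumed) then a stride-1 unit."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            conv_unit(in_ch, out_ch, stride=2),
            conv_unit(out_ch, out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class UpBlock(nn.Module):
    """Up-sampling block: [ConvTranspose2d, ReLU, BatchNorm] then a conv unit."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
        )
        self.conv = conv_unit(out_ch, out_ch)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the matching down-sampling-path feature map.
        return self.conv(self.up(x) + skip)


class TinyUNet(nn.Module):
    """Hypothetical one-level U, just deep enough to exercise both blocks."""

    def __init__(self, n_classes: int = 2, width: int = 8):
        super().__init__()
        self.stem = conv_unit(1, width)
        self.down = DownBlock(width, 2 * width)
        self.up = UpBlock(2 * width, width)
        self.head = nn.Conv2d(width, n_classes, kernel_size=1)  # per-pixel logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.stem(x)
        return self.head(self.up(self.down(skip), skip))
```

With per-pixel integer class labels as the target, `nn.CrossEntropyLoss` applies a log-softmax to the logits and takes the negative log-probability of the true class, which corresponds to the loss quoted on the slide.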
6 Architecture • Down-sample (Credit: V. Dumoulin & F. Visin): • Multiple input pixels map to one output pixel • Each layer increases number of kernels to build more complex features • Stride 2 down-samples to reduce computational overhead • Up-sample (Credit: T. Lane): • Each input pixel maps to multiple output pixels • Effective stride 1/2 up-samples to return to original image size
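The effect of the two strides on image size can be checked with the standard output-size formulas for convolutions and transpose convolutions (dilation 1). The kernel sizes and padding below are assumptions chosen so that stride 2 halves the image and the transpose convolution doubles it again; the slides do not state the values actually used.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int, stride: int,
                       padding: int = 0, output_padding: int = 0) -> int:
    """Output length of a transpose convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Assumed settings: 3x3 kernel, stride 2, padding 1 on the way down;
# 2x2 kernel, stride 2 on the way up. Image size 512 x 208 as on the Inputs slide.
down = (conv_out(512, 3, 2, 1), conv_out(208, 3, 2, 1))
up = (conv_transpose_out(down[0], 2, 2), conv_transpose_out(down[1], 2, 2))
print(down, up)
```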
7 Inputs • Trained on a 980-event subset of MCC11 (no space charge) • 80% training, 20% validation • Split into batches of 48 images • Images 512 x 208 (likely to change) • Images generated to cover the full extent of the plane in drift and wire positions (only W plane in provisional tests) • [Figure, axes: x, w]
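As a rough check of the numbers on this slide, the split and batch counts can be worked through as below. The sketch assumes one image per event (W plane only, as in the provisional tests) and a random shuffle before splitting; the function name `split_events` and the seed are illustrative only.

```python
import math
import random


def split_events(n_events: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle event indices and split them into training and validation sets."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_events * train_fraction))
    return indices[:n_train], indices[n_train:]


train, val = split_events(980)
batch_size = 48
n_train_batches = math.ceil(len(train) / batch_size)  # last batch is partial
n_val_batches = math.ceil(len(val) / batch_size)
print(len(train), len(val), n_train_batches, n_val_batches)
```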
8 Activations
9 Learning rate optimisation • What’s the fastest we can train? • Learning rate controls step size in weights relative to gradient • Start with a very low learning rate, increase each batch • Try different weight decays • Getting this right allows faster training
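A learning rate scan of this kind needs a schedule that starts very low and increases on every batch. The sketch below assumes exponential growth between two bounds, which is a common choice but is not stated on the slide; the function name and the bounds are hypothetical.

```python
def lr_range_schedule(lr_min: float, lr_max: float, n_batches: int) -> list:
    """Exponentially spaced learning rates, one per batch, from lr_min to lr_max."""
    if n_batches < 2:
        raise ValueError("need at least two batches to span a range")
    ratio = (lr_max / lr_min) ** (1.0 / (n_batches - 1))
    return [lr_min * ratio ** i for i in range(n_batches)]


# One rate per training batch; the loss recorded at each rate is then plotted
# against the rate to see where training is fastest and where it diverges.
rates = lr_range_schedule(1e-6, 1.0, 100)
```

Repeating the scan for several weight decay values gives one loss curve per setting to compare.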
10 • Clearly very high learning rates are attainable, but… • Not as a single value over the whole training cycle • The highest rates either fail to progress or start poorly • Notable differences in accuracy evolution
11 • Linear decay better, but… • Higher rates still fail to progress or start poorly • Still notable differences in accuracy evolution
12 • One-cycle learning: https://arxiv.org/abs/1803.09820 • Start slow -> Accelerate -> Decelerate • Smoother loss evolution • Train at higher rates • Consistent accuracy evolution across a range of maximum learning rates
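The start slow, accelerate, decelerate pattern can be sketched as a function of the training step. This version assumes a linear ramp up followed by a linear ramp down; the warm-up fraction and the start and end fractions of the maximum rate are illustrative values, not those used in the talk.

```python
def one_cycle_lr(step: int, total_steps: int, lr_max: float,
                 start_frac: float = 0.04, end_frac: float = 0.0004,
                 warmup_frac: float = 0.3) -> float:
    """Learning rate at a given step of a one-cycle schedule (linear phases)."""
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    warmup_steps = max(1, int(warmup_frac * total_steps))
    lr_start = lr_max * start_frac
    lr_end = lr_max * end_frac
    if step < warmup_steps:
        # Accelerate: ramp from lr_start up to lr_max.
        t = step / warmup_steps
        return lr_start + t * (lr_max - lr_start)
    # Decelerate: ramp from lr_max down to lr_end.
    t = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return lr_max + t * (lr_end - lr_max)


schedule = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
```

PyTorch also provides `torch.optim.lr_scheduler.OneCycleLR`, which implements a schedule of this family and steps once per batch.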
13 Performance • Overall accuracy ~87% • Track accuracy ~82% • Shower accuracy ~95% • [Figure: MC classification versus network classification, axes: x, w]
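The three figures can be read as an overall fraction of correct classifications plus the same fraction restricted to each true class. The sketch below assumes that reading, i.e. that track accuracy means the fraction of true track hits classified as track; the slides do not spell out the definition, and the data shown are toy values.

```python
def class_accuracies(truth, predicted, labels=("track", "shower")):
    """Overall accuracy and per-true-class accuracy (fraction classified == truth)."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal length")
    overall = sum(t == p for t, p in zip(truth, predicted)) / len(truth)
    per_class = {}
    for label in labels:
        idx = [i for i, t in enumerate(truth) if t == label]
        # NaN if the class is absent, so a missing class is not reported as 0%.
        per_class[label] = (
            sum(predicted[i] == label for i in idx) / len(idx) if idx else float("nan")
        )
    return overall, per_class


# Toy example: four true track hits, two true shower hits.
truth = ["track"] * 4 + ["shower"] * 2
predicted = ["track", "track", "track", "shower", "shower", "shower"]
overall, per_class = class_accuracies(truth, predicted)
```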
14 Future plans • Further network refinements: • Further refine one-cycle learning rate • Architectural tweaks (e.g. ResBlocks, depth) • Vary learning rate by layer group • Regularisation • Data set refinement • Crop image plane to hit region and use higher resolution in x • Image augmentation (e.g. randomly rotate images in each batch) • Train on a much larger data set • Transfer learning with larger images
15 Backup
16 • One-cycle policies best performing • Able to go to higher learning rates • Constant and linear decay performance ok • Expect constant rate to plateau • Exponential decay fails to make progress
17 Conv2d initialisation in PyTorch • Conv2d weights use Kaiming uniform initialisation with $a = \sqrt{5}$ • Historical artefact. Jeremy Howard discovered this causes gradients to vanish in deep networks • Will be fixed in a future release of PyTorch • For now, useful to reinitialise weights with Kaiming normal initialisation with $a = 0$ (for standard ReLU)
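The reinitialisation described on this slide can be sketched with `torch.nn.init.kaiming_normal_`. The helper name `reinit_conv_weights` is hypothetical, and zeroing the biases is an added assumption that the slide does not mention.

```python
import math

import torch
from torch import nn


def reinit_conv_weights(model: nn.Module) -> None:
    """Reinitialise every Conv2d weight with Kaiming normal, a = 0."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # a = 0 is the negative slope of a standard ReLU.
            nn.init.kaiming_normal_(module.weight, a=0, mode="fan_in",
                                    nonlinearity="leaky_relu")
            if module.bias is not None:
                nn.init.zeros_(module.bias)  # assumption, not from the slide


torch.manual_seed(0)
conv = nn.Conv2d(16, 32, kernel_size=3)
reinit_conv_weights(conv)

# With a = 0 the weights are drawn with standard deviation sqrt(2 / fan_in),
# where fan_in = in_channels * kernel_height * kernel_width.
expected_std = math.sqrt(2.0 / (16 * 3 * 3))
```

Calling the helper once on the full model, after construction and before training, covers every `Conv2d` layer it contains.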