SLIDE 1

Pandora track/shower discrimination via semantic segmentation

Andy Chappell 11/12/2019 DUNE UK Meeting

SLIDE 2

Roadmap

  • Overview and project goal
  • Model architecture
  • Approach to tuning
  • (Very) preliminary performance figures

SLIDE 3

Overview and project goal

  • Assign track and shower probabilities to every hit in the U, V and W planes
  • Train a neural network to calculate the probabilities
  • Pass these probabilities to downstream Pandora algorithms for cluster creation, merging, etc.
  • Current approach uses cluster property-based cuts

[Event display figure; axes: w, x]

SLIDE 4

Architecture

  • U-Net architecture developed for biomedical image segmentation in 2015
  • Convolutions form the down-sampling part of the U
  • Transpose convolutions form the up-sampling part of the U
  • Skip connections add images from the down-sampling path to the up-sampling path
  • Track and shower probabilities assigned to each pixel

https://arxiv.org/abs/1505.04597
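
As a rough illustration of the pattern above, a minimal two-level U in PyTorch; the layer count, channel sizes and the use of concatenation for the skip connection are illustrative assumptions, not the configuration of the network described in these slides:

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        # Illustrative two-level U: conv down, transpose conv up, skip connection
        def __init__(self, in_ch=1, n_classes=2):
            super().__init__()
            self.down = nn.Conv2d(in_ch, 16, kernel_size=3, stride=2, padding=1)
            self.bottom = nn.Conv2d(16, 16, kernel_size=3, padding=1)
            self.up = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)
            self.head = nn.Conv2d(16 + in_ch, n_classes, kernel_size=1)

        def forward(self, x):
            d = torch.relu(self.down(x))   # down-sampling path
            b = torch.relu(self.bottom(d))
            u = torch.relu(self.up(b))     # up-sampling path
            u = torch.cat([u, x], dim=1)   # skip connection joins the two paths
            return self.head(u)            # per-pixel track/shower scores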

SLIDE 5

Architecture

  • Building on work started by Steven Green
  • PyTorch implementation
  • Two key blocks in the network:
    • Down-sampling convolution block: [Conv2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm]
    • Up-sampling transpose convolution block: [ConvTranspose2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm]
  • Loss: categorical cross-entropy, $\text{loss} = -\log(\mathrm{softmax}_{\mathrm{truth}})$
  • Accuracy: fraction of pixels classified == truth

Batch normalisation: $\hat{y}_j = (y_j - \mu_B) / \sigma_B$, $z_j = \gamma \hat{y}_j + \beta$

[Figure: Rectified Linear Unit activation]
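
A sketch of how these two blocks and the loss/accuracy calculations might look in PyTorch; the kernel sizes, padding and example tensor shapes are assumptions for illustration:

    import torch
    import torch.nn as nn

    def down_block(in_ch, out_ch):
        # Down-sampling convolution block: [Conv2d, ReLU, BatchNorm] twice
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
        )

    def up_block(in_ch, out_ch):
        # Up-sampling block: [ConvTranspose2d, ReLU, BatchNorm] [Conv2d, ReLU, BatchNorm]
        return nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(out_ch),
        )

    # Categorical cross-entropy is the negative log-softmax of the true class;
    # accuracy is the fraction of pixels whose argmax class matches truth
    loss_fn = nn.CrossEntropyLoss()
    logits = torch.randn(4, 2, 64, 64)         # (batch, classes, height, width)
    truth = torch.randint(0, 2, (4, 64, 64))   # per-pixel true class labels
    loss = loss_fn(logits, truth)
    accuracy = (logits.argmax(dim=1) == truth).float().mean()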

SLIDE 6

Architecture

Down-sample
  • Multiple input pixels map to one output pixel
  • Each layer increases the number of kernels to build more complex features
  • Stride 2 down-samples to reduce computational overhead

Up-sample
  • Each input pixel maps to multiple output pixels
  • Effective stride 1/2 up-samples to return to the original image size

Credit: V. Dumoulin & F. Visin; Credit: T. Lane
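
A small shape check demonstrating both effects, using an illustrative 8 x 8 input and arbitrary channel counts:

    import torch
    import torch.nn as nn

    x = torch.zeros(1, 1, 8, 8)   # one single-channel 8 x 8 image

    # Stride-2 convolution: several input pixels feed each output pixel,
    # and the spatial size halves (8 -> 4)
    down = nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1)
    print(down(x).shape)          # torch.Size([1, 4, 4, 4])

    # Stride-2 transpose convolution (effective stride 1/2): each input pixel
    # spreads over several output pixels, and the size doubles (4 -> 8)
    up = nn.ConvTranspose2d(4, 1, kernel_size=2, stride=2)
    print(up(down(x)).shape)      # torch.Size([1, 1, 8, 8])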

SLIDE 7

Inputs

  • Trained on a 980-event subset of MCC11 (no space charge)
  • 80% training, 20% validation
  • Split into batches of 48 images
  • Images 512 x 208 (likely to change)
  • Images generated to cover the full extent of the plane in drift and wire positions (only W plane in provisional tests)

[Event display figure; axes: w, x]
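
A sketch of an input pipeline under these settings; the random tensors are stand-ins for the real hit images and labels, which are not reproduced here:

    import torch
    from torch.utils.data import TensorDataset, DataLoader, random_split

    # Stand-in for the real data: 980 events, one single-channel
    # 512 x 208 image each (dimensions from the slide)
    images = torch.randn(980, 1, 512, 208)
    labels = torch.randint(0, 2, (980, 512, 208))
    dataset = TensorDataset(images, labels)

    # 80% training / 20% validation split, batches of 48 images
    n_train = int(0.8 * len(dataset))
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=48, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=48)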

SLIDE 8

Activations

SLIDE 9

Learning rate optimisation

  • What's the fastest we can train?
  • Learning rate controls step size in weights relative to gradient
  • Start with a very low learning rate, increase each batch
  • Try different weight decays
  • Getting this right allows faster training
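
A minimal sketch of such a learning-rate range test; the optimiser choice, growth factor and weight decay value are assumptions, and TinyUNet and train_loader are taken from the earlier sketches:

    import torch
    import torch.nn as nn

    # Start very low, multiply the rate up each batch, and record the loss
    model = TinyUNet()
    loss_fn = nn.CrossEntropyLoss()
    lr, lr_max, growth = 1e-7, 10.0, 1.1
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=1e-4)
    history = []
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        history.append((lr, loss.item()))   # loss vs rate, plotted afterwards
        lr = min(lr * growth, lr_max)
        for group in optimizer.param_groups:
            group["lr"] = lr                # larger step size for the next batch
        if lr >= lr_max:
            break                           # stop once the rate is clearly too high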

SLIDE 10

  • Clearly very high learning rates attainable, but…
  • Not for a single value over the whole training cycle
  • Highest rates either fail to progress, or start poorly
  • Notable differences in accuracy evolution

SLIDE 11

  • Linear decay better, but…
  • Higher rates still fail to progress or start poorly
  • Still notable differences in accuracy evolution

SLIDE 12

  • One-cycle learning: start slow -> accelerate -> decelerate
  • Smoother loss evolution
  • Train at higher rates
  • Consistent accuracy evolution across a range of maximum learning rates

https://arxiv.org/abs/1803.09820
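
PyTorch ships a scheduler implementing this policy; a minimal usage sketch, with max_lr and the epoch count as illustrative values and model, loss_fn and train_loader assumed from the earlier sketches:

    import torch

    # One-cycle policy: the learning rate ramps up to max_lr and then
    # anneals back down over the full training run
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.1, epochs=10, steps_per_epoch=len(train_loader))

    for epoch in range(10):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
            scheduler.step()   # advance the schedule once per batch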

SLIDE 13

Performance

  • Overall accuracy ~87%
  • Track accuracy ~82%
  • Shower accuracy ~95%

[Event displays: MC classification vs network classification; axes: w, x]
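
Per-class figures like these could be computed as below; the class indexing (0 = track, 1 = shower) is an assumption, with logits and truth as in the earlier loss sketch:

    # Per-class pixel accuracy: the fraction of true track (shower) pixels
    # that the network labels as track (shower)
    pred = logits.argmax(dim=1)
    overall_acc = (pred == truth).float().mean()
    track_acc = (pred[truth == 0] == 0).float().mean()
    shower_acc = (pred[truth == 1] == 1).float().mean()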

SLIDE 14

Future plans

  • Further network refinements:
    • Further refine one-cycle learning rate
    • Architectural tweaks (e.g. ResBlocks, depth)
    • Vary learning rate by layer group
    • Regularisation
  • Data set refinement:
    • Crop image plane to hit region and higher resolution in x
    • Image augmentation (e.g. randomly rotate images in each batch)
    • Train on a much larger data set
    • Transfer learning with larger images

SLIDE 15

Backup

SLIDE 16

  • One-cycle policies best performing
  • Able to go to higher learning rates
  • Constant and linear decay performance ok
  • Expect constant rate to plateau
  • Exponential decay fails to make progress

SLIDE 17

Conv2d initialization in PyTorch

  • Conv2d weights use Kaiming uniform initialisation with $a = \sqrt{5}$
  • Historical artefact: Jeremy Howard discovered this causes gradients to vanish in deep networks
  • Will be fixed in a future release of PyTorch
  • For now, useful to reinitialise weights with Kaiming normal initialisation with $a = 0$ (for standard ReLU)
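
A sketch of such a reinitialisation pass, assuming a model such as the earlier TinyUNet sketch:

    import torch.nn as nn

    # Reinitialise every (transpose) convolution with Kaiming normal, a = 0,
    # in place of PyTorch's default Kaiming uniform with a = sqrt(5)
    def reinitialise(module):
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.kaiming_normal_(module.weight, a=0)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    model.apply(reinitialise)   # apply() walks every submodule in the network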