GNNs for HL-LHC Tracking, ExaTrkX @ Berkeley Lab, Daniel Murnane (PowerPoint PPT Presentation)



SLIDE 1

BERKELEY LAB
Office of Science

GNNs for HL-LHC Tracking

ExaTrkX @ Berkeley Lab

Daniel Murnane

SLIDE 2

Goal

Sub-second processing of HL-LHC hit data into:

  • Seeds (i.e. triplets) for further processing with traditional techniques, AND/OR
  • Tracks, where each hit is assigned to exactly one track

SLIDE 3

The Current Pipeline

Raw hit data embedded → Filter likely, adjacent doublets → Train/classify doublets in GNN → Filter, convert to triplets → Train/classify triplets in GNN → Apply cut for seeds → DBSCAN for track labels

SLIDE 4

Dataset

  • “TrackML Kaggle Competition” dataset
  • Generated by simulation
  • 8,000 collisions to train on
  • Each collision has up to 100,000 hits from around 10,000 particles

SLIDE 5

Dataset

  • Ideal final result is a “TrackML score” 𝑇 ∈ [0, 1]
  • All hits belonging to the same track labelled with the same unique label ⇒ 𝑇 = 1
  • We use the barrel as a test case, and ignore noise

SLIDE 6

Embedding + MLP Construction

  • Won’t give any detail (Nick’s talk next on embeddings)
  • Generally:

1. For each hit in the event, embed features (co-ordinates, cell direction data, etc.) into N-dimensional space
2. Associate hits from the same track as close in N-dimensional distance
3. Score each hit within the embedding neighbourhood against the “seed” hit at the centre
4. Filter by score, to create a set of doublets for the neighbourhood
5. All doublets in the event generate a graph, converted to a directed graph (by ordering layers)
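Steps 1–5 can be sketched with toy stand-ins for the learned models. The random projection and the quantile-based score cut below are illustrative assumptions, not the trained embedding network or filter:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
hits = rng.normal(size=(1000, 3))           # raw hit features (e.g. x, y, z)
proj = rng.normal(size=(3, 8))              # stand-in for the embedding MLP

def embed(features):
    # map hit features into the N-dimensional embedded space (N = 8 here)
    return features @ proj

embedded = embed(hits)

# Step 3: for each "seed" hit, query its embedding neighbourhood
nn = NearestNeighbors(n_neighbors=10).fit(embedded)
dist, idx = nn.kneighbors(embedded)

# Steps 4-5: filter candidate pairs by a score cut (here an adaptive
# distance quantile on the toy data) to form the doublet set
threshold = np.quantile(dist[:, 1:], 0.5)
doublets = [
    (seed, int(j))
    for seed, (row_d, row_j) in enumerate(zip(dist, idx))
    for d, j in zip(row_d[1:], row_j[1:])   # position 0 is the seed itself
    if d < threshold
]
```

In the real pipeline the filter is itself a trained MLP scoring each neighbour against the seed hit, rather than a plain distance cut.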

SLIDE 7

Segmentation

(Figure: segmentation strategies: hard cut, one-directional soft cut, bi-directional soft cut)

A full graph from the embedding does not fit on a single GPU. Therefore the event graphs are segmented, according to how large the GNN model is expected to be.

SLIDE 8

Previous ML Approaches

  • Tracks as images (CNN)
  • Tracks as sequences of points (LSTM)

SLIDE 9

Graph Neural Network for Edge Classification

Classify edges with a score in [0, 1]:
  • score > cut: true
  • score < cut: fake

SLIDE 10

Passing information around the graph gives it learning power

  • Can make a node “aware” of its neighbours by concatenating the neighbouring hidden features
  • Iterating this neighbourhood learning passes information around the graph
  • Can be considered a generalisation of a flat CNN convolution
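The neighbour-concatenation update above can be sketched as follows. The toy graph, the shared weight matrix `W`, and the tanh activation are illustrative assumptions, not the ExaTrkX network:

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_hidden = 5, 4
h = rng.normal(size=(n_nodes, n_hidden))      # hidden node features
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]      # toy track-like chain

adj = np.zeros((n_nodes, n_nodes))
for a, b in edges:                            # undirected adjacency
    adj[a, b] = adj[b, a] = 1.0

W = rng.normal(size=(2 * n_hidden, n_hidden)) # shared update weights

def message_pass(h, n_iters=3):
    # each iteration: aggregate neighbours, concatenate with own
    # features, apply the shared update; repeating spreads information
    # one hop further around the graph per iteration
    for _ in range(n_iters):
        neighbour_sum = adj @ h
        h = np.tanh(np.concatenate([h, neighbour_sum], axis=1) @ W)
    return h

h_out = message_pass(h)
```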

SLIDE 11

GNN Edge prediction architectures:

  • Message Passing
  • Attention Message Passing
  • Attention Message Passing with Recursion
  • Attention Message Passing with Residuals

SLIDE 12

GNN Edge prediction architectures:

  • Message Passing
  • Attention Message Passing
  • Attention Message Passing with Recursion
  • Attention Message Passing with Residuals

Attention Message Passing with Residuals has been found to give the best efficiency & purity performance.

SLIDE 13

Edge attention architecture

  • Input node features
  • Hidden node features
  • Hidden edge features
  • Edge score
  • Attention aggregation
  • New hidden node features
  • New hidden edge features
  • New edge score

(Figure: architecture diagram, iterated × n, where n is a hyperparameter)
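A minimal sketch of the attention aggregation step, with an assumed sigmoid edge network standing in for the learned one: edge scores act as attention weights, so each node aggregates neighbour features weighted by how track-like the connecting edge looks.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_hidden = 4, 3
h = rng.normal(size=(n_nodes, n_hidden))   # hidden node features
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]   # toy graph

def edge_score(h_a, h_b):
    # stand-in for the learned edge network: squash a pair score to (0, 1)
    return 1.0 / (1.0 + np.exp(-h_a @ h_b))

def attention_aggregate(h):
    # weight each neighbour's features by the score of the shared edge
    agg = np.zeros_like(h)
    for a, b in edges:
        s = edge_score(h[a], h[b])
        agg[a] += s * h[b]
        agg[b] += s * h[a]
    return agg

# residual-style node update: old features plus attention-weighted messages
new_h = np.tanh(h + attention_aggregate(h))
```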

SLIDE 14

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 15

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 16

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 17

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 18

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 19

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 20

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 21

(Build of the edge attention architecture diagram; same component list as slide 13.)

SLIDE 22

Doublet GNN Performance

Threshold     0.5      0.8
Accuracy      0.9761   0.9784
Purity        0.9133   0.9694
Efficiency    0.9542   0.9052

Two points to keep in mind:

  • In the past, graphs have been constructed with a heuristic procedure that had much lower efficiency than the learned embedding. This GNN is classifying a ∼96% efficient doublet dataset
  • These metrics are not the end product: we use the scores of the doublets to create triplets without losing efficiency

SLIDE 23

Why not simply join together our doublet predictions?

(Figure: hits x1, x2, x3, x4 ordered by distance from detector centre; doublet scores 0.99, 0.01, 0.99.)

Pretty easy decision.

SLIDE 24

Doublet choice can be ambiguous

(Figure: hits x1, x2, x3, x4 ordered by distance from detector centre; doublet scores 0.99, 0.87, 0.84.)

Not so easy… so teach the network how to combine them.

SLIDE 25

But a GNN doesn’t know about “triplets”

(Figure: hits x1, x2, x3, x4 with unknown combinations marked “?”.)

A GNN only knows about nodes and edges.

SLIDE 26

Moving to a “doublet graph” gives us back GNN power

(Figure: doublet graph built from hits x1, x2, x3, x4; doublet scores 0.99, 0.87, 0.84; x2 appears in two doublets.)

Now nodes represent doublets, and edges represent triplets.
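The doublet-to-triplet construction is a “line graph” transformation, which can be sketched directly (the hit ids below are illustrative):

```python
from itertools import combinations

# each surviving doublet (a, b) becomes a node of the new graph
doublets = [("x1", "x2"), ("x2", "x3"), ("x2", "x4")]

# two doublets sharing a hit become an edge, i.e. a triplet candidate
triplet_edges = []
for d1, d2 in combinations(range(len(doublets)), 2):
    if set(doublets[d1]) & set(doublets[d2]):
        triplet_edges.append((d1, d2))
```

In the real pipeline there is an extra constraint that the shared hit sits between the two outer hits (the graph is directed by layer ordering); this sketch keeps only the shared-hit condition.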

SLIDE 27

(Build of slide 26, with the triplet combinations drawn in.)

SLIDE 28

Triplet Propaganda

Doublet GNN
Threshold               0.5      0.8
Accuracy                0.9761   0.9784
Purity                  0.9133   0.9694
Efficiency (relative)   0.9542   0.9052

Triplet GNN
Threshold               0.5      0.8
Accuracy                0.9960   0.9957
Purity                  0.9854   0.9923
Efficiency (relative)   0.9939   0.9850

SLIDE 29

Triplet propaganda

Key:
  • Gold: unambiguously correct triplet or quadruplet
  • Silver: ambiguously correct triplet or quadruplet (i.e. edge shared by a correct triplet and a false positive triplet)
  • Bronze dashed: correct triplet, but missed quadruplet (i.e. edge shared by a correct triplet and a false negative triplet)
  • Red: completely false positive triplet
  • Blue dashed: completely false negative triplet
  • Other colours: false positive/negative

SLIDE 30

(Same key as slide 29, shown on another event display.)

SLIDE 31

Triplet GNN improves doublet GNN results

  • Black: triplet classifier correctly labelled, doublet classifier mislabelled
  • Red: doublet classifier correctly labelled, triplet classifier mislabelled

In this graph, the triplet classifier fixes 389 edges and worsens 10 edges.

SLIDE 32

Seeding: Final Performance

  • Purity: 99.1% ± 0.07%
  • Efficiency: 88.6% ± 0.19% (this is objective)
  • Inference time: ∼5 seconds per event per GPU, split between:
      • ∼3 seconds for embedding construction
      • ∼2 seconds for the two GNN steps and processing

SLIDE 33

Seeding: Next Steps

  • Direct comparison with ACTS seed generator
  • N-plet GNN
      • The problem is the combinatorially increasing graph size, e.g. for TrackML data:
          • 𝑃(1,000) tracks
          • 𝑃(6,000) hits
          • 𝑃(20,000) doublets
          • 𝑃(60,000) triplets
      • Cut doublet input before triplet construction
          • A doublet threshold of 0.01 retains 99% efficiency
          • Reduces doublets 𝑃(20,000) → 𝑃(6,000)
      • We thus have a sustainable process to N-plet GNN
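The effect of a loose pre-construction cut can be sketched with synthetic scores. The beta distributions below are assumed stand-ins for the real GNN score distributions, chosen only so that true doublets score high and fakes score low:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
is_true = rng.random(n) < 0.3                # synthetic truth labels
scores = np.where(is_true,
                  rng.beta(8, 1, size=n),    # true doublets score high
                  rng.beta(1, 8, size=n))    # fakes score low

# loose threshold: drop only confidently-fake doublets before
# triplet construction, then measure what survives
threshold = 0.01
kept = scores > threshold
efficiency = (kept & is_true).sum() / is_true.sum()  # true doublets retained
survivors = int(kept.sum())                          # doublets passed on
```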

SLIDE 34

Track Labelling

GOAL: Given a classified doublet and/or triplet graph, use edge scores to group likely nodes into tracks and label them with a unique identifier.

SLIDE 35

DBSCAN on a Graph

  • DBSCAN typically calculates a distance metric and clusters based on neighbourhood density
  • Feed the edge scores 𝑓𝑗𝑘 as a precomputed, sparse metric matrix, with each distance element given by 𝑒𝑗𝑘 = 1 − 𝑓𝑗𝑘
  • Fill out the sparse matrix to ensure it is symmetric, i.e. undirected. A directed graph does not perform well with DBSCAN.
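This can be sketched with scikit-learn's DBSCAN and a sparse precomputed metric. The toy graph, scores, and eps value below are assumed for illustration:

```python
from scipy.sparse import csr_matrix
from sklearn.cluster import DBSCAN

n_hits = 6
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]   # two track-like chains
scores = [0.99, 0.97, 0.98, 0.96]          # GNN edge scores f_jk

# build the sparse distance matrix e_jk = 1 - f_jk, symmetrised so the
# graph is undirected (both directions of each edge are stored)
rows, cols, dists = [], [], []
for (j, k), f in zip(edges, scores):
    d = 1.0 - f
    rows += [j, k]
    cols += [k, j]
    dists += [d, d]

dist_matrix = csr_matrix((dists, (rows, cols)), shape=(n_hits, n_hits))

# absent entries of a sparse precomputed matrix are treated as
# non-neighbours, so only explicit graph edges can join a cluster
labels = DBSCAN(eps=0.1, min_samples=1,
                metric="precomputed").fit_predict(dist_matrix)
```

Each cluster label then serves as the unique track identifier for its hits.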

SLIDE 36

DBSCAN Performance

  • We can construct a “truth graph” from TrackML data, where every hit is connected to hits of a shared track in adjacent layers with a high score (e.g. 0.99), and randomly connected to other hits with a low score (e.g. 0.01)
  • We can randomly mislabel true edges to reduce efficiency, or mislabel fake edges to reduce purity
  • We see a linear reduction in TrackML score against efficiency
  • An exponential reduction in TrackML score against purity

SLIDE 37

GNN TrackML Score Performances

  • DBSCAN on truth graph: 0.989

SLIDE 38

GNN TrackML Score Performances

  • DBSCAN on truth graph: 0.989
  • DBSCAN on adjacent-layer truth graph: 0.957

SLIDE 39

GNN TrackML Score Performances

  • DBSCAN on truth graph: 0.989
  • DBSCAN on adjacent-layer truth graph: 0.957
  • Embedding-constructed doublet hits: 0.935

Loss from embedding construction: 96% efficiency

SLIDE 40

GNN TrackML Score Performances

  • DBSCAN on truth graph: 0.989
  • DBSCAN on adjacent-layer truth graph: 0.957
  • Embedding-constructed doublet graph using truth: 0.935
  • DBSCAN on doublet GNN classification: 0.815

Loss from embedding construction: 96% efficiency

SLIDE 41

GNN TrackML Score Performances

  • Triplet graph constructed from doublet graph (truth): 0.846

Losses: embedding construction (96% efficiency); lost doublets

SLIDE 42

GNN TrackML Score Performances

  • Triplet graph constructed from doublet graph (truth): 0.846
  • DBSCAN on triplet graph from triplet GNN classification: 0.815

Losses: embedding construction (96% efficiency); lost doublets

SLIDE 43

Missing Doublets

(Figure: 𝜃 vs 𝜚 scatter plots of all hits and of missing-doublet hits)

SLIDE 44

Missing Doublets

(Figure: 𝜃 vs 𝜚 scatter plots of all hits and of missing-doublet hits; doublets missing at the ends of the barrel)

SLIDE 45

Missing Doublets

(Figure: 𝜃 vs 𝜚 scatter plots of all hits and of missing-doublet hits; doublets missing at the edges of segments)

SLIDE 46

Stitching

  • Significant speed-up from eliminating duplicates on the edges of segments

(Figure: 𝜃 vs 𝜚 event display, pre-clean-up and post-clean-up)

SLIDE 47

Ignoring Fragmented Tracks

  • We throw away all tracks that:
      • Only hit one or two different layers in the barrel
      • Have more than three hits elsewhere in the detector

E.g. although most of this track is outside the barrel, we keep the track to challenge the GNN.

(Figure: example track in the z-y view)

SLIDE 48

Track Labelling: Final-ish Performance

  • Triplet graph truth in eta range (-2.1, 2.1): 0.912

SLIDE 49

Track Labelling: Final-ish Performance

  • Triplet graph truth in eta range (-2.1, 2.1): 0.912
  • DBSCAN on triplet GNN classification in eta (-2.1, 2.1): 0.876

SLIDE 50

Track Labelling: Final-ish Performance

  • Triplet graph truth in eta range (-2.1, 2.1): 0.912
  • DBSCAN on triplet GNN classification in eta (-2.1, 2.1): 0.876
  • Triplet graph truth in eta (-2.1, 2.1) & no fragments: 0.925

SLIDE 51

Track Labelling: Final-ish Performance

  • Triplet graph truth in eta range (-2.1, 2.1): 0.912
  • DBSCAN on triplet GNN classification in eta (-2.1, 2.1): 0.876
  • Triplet graph truth in eta (-2.1, 2.1) & no fragments: 0.925
  • DBSCAN on triplet GNN in eta (-2.1, 2.1) & no fragments: 0.888 (this is the take-away)

SLIDE 52

Track Labelling: Final-ish Performance

  • 0.888 TrackML score in the barrel, emulating the whole detector (no punishment for tracks crossing detector volumes); recovers almost all missing doublets
  • This is an early result; two big improvement areas are now seen:
      1. Doublet-to-triplet efficiency, and
      2. Embedding construction efficiency
  • Every 1% of efficiency gained ≈ +0.015 TrackML score
  • Winning score is 0.922…

SLIDE 53

Summary

  • Seeding pipeline complete, with good performance
      • Need concrete comparison with ACTS for CTD
  • Track labelling just beginning, with promising performance
  • Many low-hanging-fruit optimisations to try, to boost efficiency and speed:
      • HPO on embedding and GNN
      • Mixed-precision in GNN
      • Include cell features in GNN
      • Some GPU processing with CuPy, but much more could be transferred to the GPU
      • A multitude of different GNN architectures; one may be especially suited to the physics