SLIDE 1

A short overview on “Reducing model bias in a deep learning classifier using domain adversarial neural networks in the MINERvA experiment”


Fermilab

Anushree Ghosh, UTFSM, Chile

Date: 2018-11-07

SLIDE 2


Outline

  • MINERvA detector and the problem with vertex reconstruction in DIS events
  • Deep convolutional neural network
  • Results from ML-based vertex reconstruction
  • Application of a domain adversarial neural network to remove/limit the model bias

What is model bias?

  • We train the ML model on simulated events and test the model on real data.
  • Our models are not perfect -> domain discrepancies arise.
  • We need to find ways to reduce any biases in the algorithm that may come from training our models in one domain and applying them in another.

SLIDE 3


MINERvA Detector

  • Consists of a core of scintillator strips surrounded by ECAL and HCAL
  • MINOS Near Detector is used for muon charge and momentum

SLIDE 4

Problem with vertex finding: motivation behind the ML technique

  • With the increase of our beam energy, there is an increase in hadronic showers near the interaction point.
  • This causes more difficulty in vertexing, with increased rates of failure in finding the correct vertex position.

[Figure: event display (strip number vs. plane number) comparing the true vertex and the reconstructed vertex]

SLIDE 5
ML Approach To Determine Event Vertex

  • Goal: Find the location of the event vertex
  • Treat the localization as a classification problem
  • Make images for the three different views -> DNN (convolutional neural network) -> prediction of the segment in which the interaction occurs

Fiducial: within an 85 cm apothem of the beam spot

[Diagram: nuclear target region. Nuclear targets 1-5, interleaved with the active tracker and the water target, define 11 z-segments (0-10); 4 tracker modules sit between each target; materials are CH, carbon, iron, and lead.]

Fiducial masses:
  Nuc. Target 1: Fe 323 kg, Pb 264 kg
  Nuc. Target 2: Fe 323 kg, Pb 266 kg
  Nuc. Target 3: C 166 kg, Fe 169 kg, Pb 121 kg
  Nuc. Target 4: Pb 228 kg
  Nuc. Target 5: Fe 161 kg, Pb 135 kg
  Water target: 625 kg H2O
  Helium target: 0.25 tons

SLIDE 6


SLIDE 7

Convolutional neural network (CNN)


Convolutional neural network (CNN)

Stacking layers of convolutions leads from a geometric/spatial representation to a semantic representation.

We have three separate convolutional towers that look at each of the X, U, and V images.

[Diagram: x view, u view, and v view each feed their own convolutional unit; the combined output feeds the label predictor]

SLIDE 8


Layer specification (data -> four convolution units -> fully connected -> loss), identical for the three towers:

  Inputs: hits-x (height 127, width 50), hits-u (height 127, width 25), hits-v (height 127, width 25)
  Convolution unit 1: convolution, 12 outputs, kernel (8,3) -> ReLU -> max pooling, kernel (2,1), stride (2,1)
  Convolution unit 2: convolution, 20 outputs, kernel (7,3) -> ReLU -> max pooling, kernel (2,1), stride (2,1)
  Convolution unit 3: convolution, 28 outputs, kernel (7,3) -> ReLU -> max pooling, kernel (2,1), stride (2,1)
  Convolution unit 4: convolution, 36 outputs, kernel (7,3) -> ReLU -> max pooling, kernel (2,1), stride (2,1)
  Per-tower fully connected: InnerProduct, 196 outputs -> ReLU -> Dropout
  Merged (three towers concatenated): InnerProduct, 128 outputs -> ReLU -> Dropout -> InnerProduct, 11 outputs -> Softmax w/ Loss
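A minimal PyTorch re-expression of this specification may help; it is an illustrative sketch, not the experiment's code (the layer names above are Caffe conventions, and details such as the pooling rounding mode may differ). `VertexCNN` and `conv_tower` are hypothetical names, and `nn.LazyLinear` is used to avoid hardcoding the flattened sizes.

```python
import torch
import torch.nn as nn

def conv_tower():
    """One view's tower: four convolution units, each conv -> ReLU -> (2,1)
    max pool, so only the transverse axis is downsampled."""
    layers, in_ch = [], 1
    for out_ch, kernel in [(12, (8, 3)), (20, (7, 3)), (28, (7, 3)), (36, (7, 3))]:
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1), stride=(2, 1)),
        ]
        in_ch = out_ch
    # Per-tower fully connected head (InnerProduct 196 -> ReLU -> Dropout).
    layers += [nn.Flatten(), nn.LazyLinear(196), nn.ReLU(), nn.Dropout()]
    return nn.Sequential(*layers)

class VertexCNN(nn.Module):
    """Three-tower classifier over 11 z-segments."""
    def __init__(self, n_segments=11):
        super().__init__()
        self.tower_x = conv_tower()  # hits-x: (N, 1, 127, 50)
        self.tower_u = conv_tower()  # hits-u: (N, 1, 127, 25)
        self.tower_v = conv_tower()  # hits-v: (N, 1, 127, 25)
        self.head = nn.Sequential(
            nn.Linear(3 * 196, 128), nn.ReLU(), nn.Dropout(),
            nn.Linear(128, n_segments),  # logits; pair with nn.CrossEntropyLoss
        )

    def forward(self, x, u, v):
        feats = torch.cat([self.tower_x(x), self.tower_u(u), self.tower_v(v)], dim=1)
        return self.head(feats)

# Smoke test with the documented input shapes:
model = VertexCNN()
out = model(torch.randn(4, 1, 127, 50),
            torch.randn(4, 1, 127, 25),
            torch.randn(4, 1, 127, 25))
assert out.shape == (4, 11)
```

The (2,1) pooling is what collapses the transverse axis while preserving z-resolution, matching the design rationale on the next slide.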

SLIDE 9


Network structure

  • We have three separate convolutional towers, one looking at each of the X, U, and V images.
  • Each tower consists of four iterations of convolution and max-pooling layers, with ReLUs acting as the non-linear activations, followed by a fully connected layer.
  • The outputs of the three views are concatenated and fed into another fully connected layer; this is the input to the final fully connected layer, whose output feeds the softmax layer.
  • We use non-square kernels that are much larger along the transverse direction than along the z direction, because the localization information is contained directly in the energy distribution along z. We therefore allow the images to shrink along the transverse dimension but largely preserve the image size along the z axis, and we pool tensor elements together only along the transverse axis, not along the z axis (see the shape trace below).
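A quick shape trace makes this choice concrete. A small sketch, assuming valid (unpadded) convolutions and floor-mode pooling, which tracks the x-view image size through the four stages:

```python
def trace(h, w, convs=((8, 3), (7, 3), (7, 3), (7, 3))):
    """Trace (transverse, z) image size through the four conv/pool stages."""
    for kh, kw in convs:
        h, w = h - kh + 1, w - kw + 1  # valid convolution
        h //= 2                        # (2,1) max pool: transverse axis only
        print(f"{h} x {w}")

trace(127, 50)
# 60 x 48
# 27 x 46
# 10 x 44
# 2 x 42   <- transverse shrinks 127 -> 2, while z only shrinks 50 -> 42
```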

SLIDE 10

Confusion matrix


[Figure: four confusion matrices of reconstructed z-segment vs. true z-segment: row-normalized tracking, log10 row-normalized tracking, row-normalized DNN, log10 row-normalized DNN]
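For reference, a row-normalized confusion matrix like those in the plots can be computed as below; this is an illustrative NumPy sketch (names are ours, not the analysis code's), given integer arrays of true and predicted segments:

```python
import numpy as np

def row_normalized_confusion(true_seg, pred_seg, n_classes=11):
    """Confusion matrix with each true-segment row normalized to unit sum,
    so the diagonal gives the per-segment classification efficiency."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (true_seg, pred_seg), 1)          # count (true, pred) pairs
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0, 1, row_sums)

# For the log-scale panels: np.log10(matrix + small_floor) to keep zeros finite.
```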

SLIDE 11


Track-based approach vs ML approach

Signal purity is improved by a factor of 2-3 using the ML technique compared to the track-based approach.

SLIDE 12

Domain Adversarial Neural Network (DANN)

Labeled simulated data for training -> unlabeled real data for testing:

  • Train with labeled data: in our case, Monte Carlo
  • Test with unlabeled data: in our case, real data
  • Our models are not perfect -> domain discrepancies arise

Limitation:

We need a strategy to reduce any biases in the algorithm that may come from training our models in one domain and applying them in another.

Here DANN comes into the picture: http://adsabs.harvard.edu/cgi-bin/bib_query?arXiv:1505.07818

SLIDE 13


DANN

We train from the labeled source domain (MC) and the unlabeled target domain (real data). The goal is to learn features that are: 1) discriminative for the main learning task on the source domain, and 2) indiscriminate with respect to the shift between domains.

This adaptation behavior can be achieved by adding a gradient reversal layer together with a few standard layers.

[Diagram: X, U, and V views feed three convolutional units; the inner-product features feed both the label predictor and the domain classifier]
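Ganin et al. (the arXiv:1505.07818 reference above) formalize these two goals as a saddle-point objective; a sketch in that paper's notation, with MC as the source domain and real data as the target:

```latex
% G_f: feature extractor, G_y: label predictor, G_d: domain classifier;
% y_i: segment label (source events only), d_i: domain label (source vs. target).
E(\theta_f,\theta_y,\theta_d)
  = \sum_{i \in \mathrm{source}} \mathcal{L}_y\bigl(G_y(G_f(x_i;\theta_f);\theta_y),\, y_i\bigr)
  \;-\; \lambda \sum_{i \in \mathrm{source} \cup \mathrm{target}}
        \mathcal{L}_d\bigl(G_d(G_f(x_i;\theta_f);\theta_d),\, d_i\bigr)
% Training seeks (\hat\theta_f,\hat\theta_y) = \arg\min E while
% \hat\theta_d = \arg\max E; the gradient reversal layer supplies the
% sign flip that lets ordinary backprop find this saddle point.
```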

SLIDE 14


DANN

  • Two classifiers in the network:
    Label predictor: produces the output. Domain classifier: works internally.
  • Minimize the loss of the label classifier so that the network can predict the correct label for the input.
  • Maximize the loss of the domain classifier so that the network cannot distinguish between the source and target domains.
  • The network develops an insensitivity to features that are present in one domain but not the other, and trains only on features that are common to both domains (a sketch of this two-loss update follows below).
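Below is a minimal PyTorch sketch of the two-loss update, assuming a shared feature extractor such as the CNN trunk above; `GradReverse`, `dann_step`, the two heads, and the fixed `lam` weight are illustrative assumptions, not the experiment's implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass; multiplies the
    gradient by -lambda on the backward pass, so the feature extractor
    *ascends* the domain loss while the domain head still descends it."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def dann_step(features_src, features_tgt, labels_src,
              label_head, domain_head, lam=1.0):
    # Label predictor: trained on labeled source (MC) features only.
    label_loss = nn.functional.cross_entropy(label_head(features_src), labels_src)
    # Domain classifier: sees both domains through the gradient reversal layer.
    feats = torch.cat([features_src, features_tgt])
    domains = torch.cat([torch.zeros(len(features_src)),   # 0 = source (MC)
                         torch.ones(len(features_tgt))])   # 1 = target (data)
    domain_logit = domain_head(GradReverse.apply(feats, lam)).squeeze(1)
    domain_loss = nn.functional.binary_cross_entropy_with_logits(domain_logit, domains)
    # One backward pass on the sum minimizes the label loss and the domain
    # head's loss, while maximizing domain confusion w.r.t. the shared features.
    return label_loss + domain_loss
```

Calling `dann_step(...).backward()` then performs exactly the minimize/maximize trade-off described in the bullets above.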

SLIDE 15


Layer specification (data -> four convolution units -> fully connected -> label predictor and domain classifier). The inputs and convolutional trunk are identical to the CNN specification on SLIDE 8 (per-tower InnerProduct 196 -> ReLU -> Dropout, concatenated into InnerProduct 128 -> ReLU -> Dropout). On top of the shared features:

  Label predictor (source features only; Split/Silence layers route the target features away from the label loss): InnerProduct, 11 outputs -> Softmax w/ Loss
  Domain classifier (source and target features): Gradient Reversal -> InnerProduct, 1024 outputs -> ReLU -> Dropout -> InnerProduct, 1024 outputs -> ReLU -> Dropout -> InnerProduct, 1 output -> Sigmoid Cross Entropy Loss

SLIDE 16

How to test DANN?

  • Find source and target samples with distinct features; otherwise our source and target domains may be too similar for the domain classifier to be able to distinguish between them.
  • We tried a few ways to get a target sample with features different from the source: changing the flux, the physics model, kinematic divisions, etc.
  • We train with Monte Carlo (MC) events and use a different MC as the target.

SLIDE 17


Final state interaction (FSI) On/Off

  • We assume that FSI is on in the real world, so we turned FSI on in our testing sample.

  Training sample (source domain) | DANN partner (target domain) | Testing sample | Model
  FSI off (1.2M)                  | N/A                          | FSI on         | Out of domain
  FSI on (1.2M)                   | N/A                          | FSI on         | In domain

The expectation: the CNN in domain will perform better than the CNN out of domain.

SLIDE 18


Blue vs. black: the model trained in the same domain (FSI active, blue curve) performs better than the model trained with an out-of-domain physics model (FSI inactive, black curve).

SLIDE 19


Final state interaction (FSI) On/Off

  • We assume that FSI is on in the real world, so we turned FSI on in our testing sample.

  Training sample (source domain) | DANN partner (target domain) | Testing sample | Model
  FSI off (1.2M)                  | N/A                          | FSI on         | Out of domain
  FSI on (1.2M)                   | N/A                          | FSI on         | In domain
  FSI off (1.2M)                  | FSI on (1.2M)                | FSI on         | Out of domain with in-domain DANN partner

The expectation: the CNN in domain will perform better than the CNN out of domain.
The expectation: although the third model is trained out of domain, it should show performance similar to the in-domain CNN, since we use an in-domain DANN partner.

SLIDE 20


Red curve: by adding a DANN partner to the model trained out of domain, we are able to recover the performance of the model natively trained in the correct domain.

SLIDE 21


Final state interaction (FSI) On/Off

  • We assume that FSI is on in the real world, so we turned FSI on in our testing sample.

  Training sample (source domain) | DANN partner (target domain) | Testing sample | Model
  FSI off (1.2M)                  | N/A                          | FSI on         | Out of domain
  FSI on (1.2M)                   | N/A                          | FSI on         | In domain
  FSI off (1.2M)                  | FSI on (1.2M)                | FSI on         | Out of domain with in-domain DANN partner
  FSI off (0.6M)                  | FSI on (0.6M)                | FSI on         | Out of domain with in-domain DANN partner (half sample)

The expectation: the CNN in domain will perform better than the CNN out of domain.
The expectation: although the model is trained out of domain, it should show performance similar to the in-domain CNN, since we use an in-domain DANN partner.

SLIDE 22


Green curve:

  • Performs worse than the red curve because the sample size is reduced by half.
  • Performs better than the black curve because it has information from the correct domain.

DANN helps to recover the domain information.

SLIDE 23


Summary

  • We see an improvement factor of ~2-3 with DNN-based reconstruction over track-based reconstruction.
  • MINERvA is expanding its ML infrastructure to other studies, such as hadron multiplicity and particle identification, and we will use DANN to reduce the bias coming from the physics model.
  • We simulated with different FSI behavior and saw the cross-domain performance degradation. However, by using DANN to restrict feature extraction to features present in both domains, we can train a domain-invariant classifier.

SLIDE 24


Backup slides

SLIDE 25


Row-normalized event counts for classifying events in 11 segments:

  Target region           | Track-based ± stat (%) | DNN ± stat (%) | Improvement ± stat (%)
  Upstream of target 1    | 41.11 ± 0.95           | 68.1 ± 0.6     | 27 ± 1.14
  Target 1                | 82.6 ± 0.26            | 94.4 ± 0.13    | 11.7 ± 0.3
  Between targets 1 and 2 | 80.8 ± 0.46            | 82.1 ± 0.37    | 1.3 ± 0.6
  Target 2                | 77.9 ± 0.27            | 94.0 ± 0.13    | 16.1 ± 0.3
  Between targets 2 and 3 | 80.1 ± 0.46            | 84.8 ± 0.34    | 4.7 ± 0.6
  Target 3                | 78 ± 0.3               | 92.4 ± 0.16    | 14.4 ± 0.34
  Between targets 3 and 4 | 90.5 ± 0.2             | 93.0 ± 0.14    | 2.5 ± 0.25
  Target 4                | 78.3 ± 0.35            | 89.6 ± 0.22    | 11.3 ± 0.42
  Between targets 4 and 5 | 54.3 ± 1.12            | 51.6 ± 0.95    | −2.7 ± 0.15
  Target 5                | 81.6 ± 0.3             | 91.2 ± 0.18    | 9.5 ± 0.34
  Downstream of target 5  | 99.6 ± 0.01            | 99.3 ± 0.13    | −0.3 ± 0.02