SLIDE 1

Understanding Genome Regulation with Interpretable Deep Learning

Presented by: Avanti Shrikumar, Kundaje Lab, Stanford University

SLIDE 2

Example biological problem: understanding stem cell differentiation

[Figure: a fertilized egg differentiating into liver cells, lung cells, and kidney cells.]

Cell types are different because different genes are turned on.

How is cell-type-specific gene expression controlled?

Answer: "regulatory elements" act like switches to turn genes on.

SLIDE 3

"Regulatory elements" are switches that turn genes on

[Figure: a regulatory element (sequence ACGTGTAACTGATAATGCCGATATT) near the DNA sequence of a gene. The sequence contains "DNA patterns" that proteins called transcription factors bind to; the regulatory element plus its bound transcription factors loops over and activates nearby genes.]

SLIDE 4

90%+* of disease-associated mutations are outside genes!

[Figure: the regulatory element from Slide 3, with "DNA patterns" that transcription factors bind to.]

Many positions in a regulatory element are not essential for its function!

→ Which positions in regulatory elements matter?

*Stranger et al., Genetics, 2011

SLIDE 5

Q: Which positions in regulatory elements matter?

  • Experimentally measure regulatory elements in different tissues
  • Predict tissue-specific activity of regulatory elements from sequence using deep learning
  • Interpret the model to learn important patterns in the input!

SLIDE 6

Questions for the model

  • Which parts of the input are the most important for making a given prediction?
  • What are the recurring patterns in the input?

SLIDE 7

[Identical to Slide 6.]

SLIDE 8

Overview of deep learning model

[Figure: a network over the one-hot encoded DNA sequence CGATAACCGATAT. Input: DNA sequence represented as ones and zeros (one row per base A/C/G/T). Learned pattern detectors scan the sequence, and later layers build on the patterns of previous layers. Output: active (+1) vs. not active (0) per cell type, e.g. "Active in Liver" / "Active in Lung" or "Accessible in Erythroid" / "Accessible in HSCs".]
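To make the "ones and zeros" input concrete, here is a minimal one-hot encoding sketch (Python/NumPy; the (length x 4) row/column layout is an illustrative assumption, not the deck's stated convention):

```python
import numpy as np

# Each DNA base becomes a length-4 indicator vector, one row per position.
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot_encode(sequence: str) -> np.ndarray:
    """Encode a DNA string as a (length x 4) one-hot matrix."""
    encoding = np.zeros((len(sequence), 4), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        encoding[position, BASE_TO_INDEX[base]] = 1.0
    return encoding

print(one_hot_encode("CGATAACCGATAT").shape)  # (13, 4)
```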

SLIDE 9

How can we identify important nucleotides? In-silico mutagenesis

[Figure: the model from Slide 8. Each position of the input sequence is mutated in turn to each alternative base, and the change in the model's output ("Active in Liver" / "Active in Lung") is recorded.]

Alipanahi et al., 2015; Zhou & Troyanskaya, 2015
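A minimal sketch of per-position in-silico mutagenesis over a one-hot input; `predict` is a hypothetical stand-in for any trained model's scoring function:

```python
import numpy as np

BASES = "ACGT"

def in_silico_mutagenesis(one_hot, predict):
    """Return a (length x 4) matrix of output changes for every point mutation.

    `predict` is a hypothetical stand-in: any function mapping a
    (length x 4) one-hot matrix to a scalar model output.
    """
    reference_score = predict(one_hot)
    deltas = np.zeros_like(one_hot)
    for position in range(one_hot.shape[0]):
        for base_index in range(len(BASES)):
            if one_hot[position, base_index] == 1.0:
                continue  # this base is already present: no mutation to score
            mutant = one_hot.copy()
            mutant[position, :] = 0.0
            mutant[position, base_index] = 1.0
            deltas[position, base_index] = predict(mutant) - reference_score
    return deltas
```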

SLIDE 10

Saturation problem illustrated

[Figure: a two-input network with y_in = i1 + i2 and a saturating output y_o that equals 1 whenever y_in >= 1. At i1 = 1, i2 = 1 we have y_in = 2 and y_o = 1, so perturbing either input alone leaves the output unchanged.]

Avoiding saturation means perturbing combinations of inputs → increased computational cost
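A toy illustration of the saturation problem, assuming the diagrammed network computes y_o = min(1, i1 + i2) (my reading of the figure):

```python
# At (i1, i2) = (1, 1), single-input perturbations leave the output at 1,
# so per-input in-silico mutagenesis sees no effect at all.
def saturating_network(i1: float, i2: float) -> float:
    return min(1.0, i1 + i2)

baseline = saturating_network(1.0, 1.0)         # 1.0
print(saturating_network(0.0, 1.0) - baseline)  # 0.0: zeroing i1 changes nothing
print(saturating_network(1.0, 0.0) - baseline)  # 0.0: zeroing i2 changes nothing
print(saturating_network(0.0, 0.0) - baseline)  # -1.0: only the joint perturbation registers
```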

SLIDE 11

"Backpropagation"-based approaches

[Figure: the model from Slide 8; importance is propagated back from the "Active in Liver" output to every position of the one-hot encoded input in a single pass.]

Examples:
  • Gradients (Simonyan et al.)
  • Integrated Gradients (ICML 2017)
  • DeepLIFT (ICML 2017); https://github.com/kundajelab/deeplift

SLIDE 12

Saturation revisited

[Figure: the network from Slide 10 at i1 = 1, i2 = 1, giving y_in = 2 and y_o = 1.]

When (i1 + i2) >= 1, the gradient is 0.

Affects:
  • Gradients
  • Deconvolutional Networks
  • Guided Backpropagation
  • Layerwise Relevance Propagation

SLIDE 13

The DeepLIFT solution: difference from reference

[Figure: the network from Slide 10.]

Reference: i1⁰ = 0 and i2⁰ = 0, giving y_o⁰ = 0 since (i1⁰ + i2⁰) = 0.

With (i1 + i2) = 2, the "difference from reference" (Δy) is +1, NOT 0. The inputs' differences from reference are Δi1 = 1 and Δi2 = 1, and the contributions are C_Δi1Δy = 0.5 = C_Δi2Δy.

Detailed backpropagation rules are in the paper.
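A minimal sketch of the difference-from-reference computation for this toy network, distributing Δy over the inputs in proportion to their Δ contributions to the sum; this is a simplification for one node, not the full DeepLIFT backpropagation rules:

```python
# y = min(1, i1 + i2), as in the figure. Each input's contribution is its
# share of Δy_in, scaled by the multiplier Δy_o / Δy_in.
def deeplift_linear_contributions(i1, i2, ref1=0.0, ref2=0.0):
    network = lambda a, b: min(1.0, a + b)
    delta_in = (i1 - ref1) + (i2 - ref2)               # Δy_in
    delta_out = network(i1, i2) - network(ref1, ref2)  # Δy_o
    multiplier = delta_out / delta_in
    return (i1 - ref1) * multiplier, (i2 - ref2) * multiplier

print(deeplift_linear_contributions(1.0, 1.0))  # (0.5, 0.5): C_Δi1Δy = C_Δi2Δy = 0.5
```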

SLIDE 14

DeepLIFT scores at an active regulatory element near the HNF4A gene

[Figure: per-nucleotide DeepLIFT scores for the Liver, Lung, and Kidney outputs. Credit: Anna Shcherbina.]

SLIDE 15

Choice of reference matters!

[Figure: original image, reference, and DeepLIFT scores for a CIFAR10 model, class = "ship".]

Suggestions on how to pick a reference:
  • MNIST: all zeros (background)
  • Consider using a distribution of references, e.g. multiple references generated by dinucleotide-shuffling a genomic sequence

SLIDES 16-21

Integrated Gradients: Another reference-based approach

[These slides animate one example: starting from the reference (i1, i2) = (0, 0) and linearly interpolating to the input (1, 1) through the saturating network of Slide 12, recording the gradient at each step.]

  i1     i2     dy/di_x
  0.0    0.0    1
  0.2    0.2    1
  0.4    0.4    1
  0.6    0.6    0
  0.8    0.8    0
  1.0    1.0    0

Average dy/di_x = 0.5, so (average dy/di1)·Δi1 = 0.5 and (average dy/di2)·Δi2 = 0.5.
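A sketch of the Riemann-sum computation the table above illustrates, using the analytic gradient of the toy network y = min(1, i1 + i2):

```python
import numpy as np

def toy_gradient(i1: float, i2: float) -> np.ndarray:
    """Gradient of y = min(1, i1 + i2): 1 per input below saturation, else 0."""
    return np.array([1.0, 1.0]) if (i1 + i2) < 1.0 else np.array([0.0, 0.0])

def integrated_gradients(inputs, reference, n_steps=50):
    inputs, reference = np.asarray(inputs), np.asarray(reference)
    alphas = np.linspace(0.0, 1.0, n_steps)
    # Average the gradient along the straight path from reference to input.
    grads = np.array([toy_gradient(*(reference + a * (inputs - reference)))
                      for a in alphas])
    return (inputs - reference) * grads.mean(axis=0)  # attribution per input

print(integrated_gradients([1.0, 1.0], [0.0, 0.0]))  # ~[0.5, 0.5], as on the slide
```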

SLIDE 22

Integrated Gradients: Another reference-based approach

  • Sundararajan et al.
  • Pros:
    – Completely black-box except for gradient computation
    – Functionally equivalent networks are guaranteed to give the same result
  • Cons:
    – Repeated gradient calculations add computational overhead
    – The linear interpolation path between the baseline and the actual input can elicit chaotic behavior from the network, especially for things like one-hot encoded DNA sequence

SLIDE 23

  • "Original": the original one-hot encoded DNA sequences
  • "Shuffled": shuffled sequences used as the "baseline"
  • Interpolation parameterized by "alpha" from 0 to 1

SLIDES 24-29

[Figure-only slides: the interpolated inputs and the model's behavior at increasing values of alpha.]

SLIDE 30

Neural nets can behave unexpectedly when supplied inputs outside the training-set distribution

SLIDE 31

This might be why Integrated Gradients sometimes performs worse than grad*input on DNA…

[Figure: importance-score tracks for a region active in cell type "A549", comparing per-position perturbation ("In-Silico Mutagenesis"), DeepLIFT, Grad*Input, and Integrated Gradients.]

SLIDE 32

Integrated Gradients: Another reference-based approach

  • Sundararajan et al.
  • Pros:
    – Completely black-box except for gradient computation
    – Functionally equivalent networks are guaranteed to give the same result
  • Cons:
    – Repeated gradient calculations add computational overhead
    – The linear interpolation path between the baseline and the actual input can elicit chaotic behavior from the network, especially for things like one-hot encoded DNA sequence
    – Still relies on gradients, which are local by nature and can give misleading interpretations

SLIDE 33

Failure case: the "min" (AND) relation

h = ReLU(i1 − i2) = max(0, i1 − i2)
y = i1 − h = i1 − max(0, i1 − i2), which equals min(i1, i2):

  Case       y
  i2 < i1    i1 − (i1 − i2) = i2
  i2 > i1    i1 − 0 = i1

The gradient is 0 for whichever of i1 or i2 is larger, and this is true even when interpolating from (0, 0) to (i1, i2)!
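A short check of this failure case in code, with the gradient written analytically:

```python
# y = i1 - max(0, i1 - i2) equals min(i1, i2). The gradient w.r.t. the
# larger input is 0 everywhere along the straight path from (0, 0) to the
# input, so Integrated Gradients assigns that input zero importance.
def grad_min_network(i1: float, i2: float):
    # d/di1 = 1 - [i1 > i2], d/di2 = [i1 > i2]  (ignoring the i1 == i2 kink)
    indicator = 1.0 if i1 > i2 else 0.0
    return (1.0 - indicator, indicator)

i1, i2 = 10.0, 6.0
for alpha in (0.2, 0.5, 0.8, 1.0):
    print(alpha, grad_min_network(alpha * i1, alpha * i2))  # d/di1 is always 0
```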

SLIDE 34

The DeepLIFT solution: consider different orders for adding positive and negative terms

y = i1 − ReLU(i1 − i2); with i1 = 10, i2 = 6: y = 10 − ReLU(4) = 6 = min(i1 = 10, i2 = 6)

SLIDES 35-40

The DeepLIFT solution: consider different orders for adding positive and negative terms

[These slides build up the following example step by step.]

y = i1 − ReLU(i1 − i2); with i1 = 10, i2 = 6: y = 10 − ReLU(4) = 6 = min(i1 = 10, i2 = 6)

Breakdowns of the ReLU's input (i1 − i2 = 4):
  • Standard breakdown: 4 = (10 from i1) + (−6 from i2)
  • Other possible breakdown: 4 = (4 from i1) + (0 from i2)
  • Average of the two: 4 = (7 from i1) + (−3 from i2)

Standard breakdown of y:
y = 6 = (10 from i1) − [(10 from i1) − (6 from i2)] = 6 from i2

Average over both orders:
y = 6 = (10 from i1) − [(7 from i1) + (−3 from i2)] = (3 from i1) + (3 from i2)

For > 2 inputs: club the positive and negative inputs into 2 "meta" terms, assign importance, and distribute proportionally.

"A unified approach to interpreting model predictions" – Lundberg & Lee
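A sketch of the order-averaging idea for a single ReLU node, reproducing the numbers above; this is the intuition behind DeepLIFT's RevealCancel rule, not the full backpropagation rule:

```python
# h = ReLU(x_pos + x_neg), where x_pos and x_neg are the positive and
# negative parts of Δ(i1 - i2). Evaluate each part's marginal effect under
# both orders of inclusion and average (reference: i1 = i2 = 0).
def relu(x: float) -> float:
    return max(0.0, x)

def averaged_relu_contributions(x_pos: float, x_neg: float):
    # Order 1: add the positive part first, then the negative part.
    pos_first = relu(x_pos) - relu(0.0)
    neg_after = relu(x_pos + x_neg) - relu(x_pos)
    # Order 2: add the negative part first, then the positive part.
    neg_first = relu(x_neg) - relu(0.0)
    pos_after = relu(x_pos + x_neg) - relu(x_neg)
    return (pos_first + pos_after) / 2.0, (neg_first + neg_after) / 2.0

# i1 = 10 contributes +10 to (i1 - i2); i2 = 6 contributes -6.
print(averaged_relu_contributions(10.0, -6.0))  # (7.0, -3.0), as on the slide
```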

SLIDE 41

E.g.: morphing an 8 into a 3 or a 6

[Figure: the original digit and 8→3 / 8→6 importance maps for Guided Backprop, Integrated Gradients, and DeepLIFT.]

SLIDE 42

Change in log-odds after morphing

[Figure.]

SLIDE 43

What do we gain (in terms of biological knowledge) from using Deep Learning?

SLIDE 44

Conventional models of protein binding explain only a small fraction of regulatory genetic variants

For all five DNA-binding proteins studied, less than 0.9% of genetic variants affecting binding were located in known patterns ("motifs")

SLIDE 45

Example genetic variant affecting binding that is "outside a known motif"

[Figure: a genetic variant at chr5:107857257:107857288 affecting SPI1 binding (p-value: 1.6E-6), shown against the longest CIS-BP SPI1 motif, a de-novo HOMER SPI1 motif, and the HOMER database SPI1 motif. The "T" is incompatible.]

SLIDE 46

Conventional motifs are oversimplified!

SLIDE 47

Deep Learning far outperforms PWMs…

[Figure: AuPRC for predicting JUND binding in HepG2, deep learning models vs. PWMs. Analysis by Abhimanyu Banerjee.]

Can we use interpretable deep learning to get better models of TF binding?

SLIDE 48

Revisiting our genetic variant…

[Figure: DeepLIFT scores at the SPI1 variant from Slide 45.]

SLIDE 49

Deep learning is better at identifying weak-affinity binding sites!

At high affinities, conventional motifs catch up.

[Figure: fold enrichment for genetic variants affecting binding (p < 0.0001), comparing variants ranked by deep-learning importance in ±20 bp vs. variants ranked by the maximum score of a conventional motif in ±20 bp. Analysis by Katherine Tian.]

SLIDE 50

Questions for the model

  • Which parts of the input are the most important for making a given prediction?
  • What are the recurring patterns in the input?

Question in biology: What are the DNA motifs driving transcription factor binding?

SLIDE 51

Naïve idea: look at individual pattern detectors

[Figure: individual GATA pattern-detector motifs found by DeepBind (Alipanahi et al.), alongside an analogous computer-vision example.]

Problem: high levels of redundancy, because multiple neurons cooperate with each other.

SLIDE 52

How do we combine the contributions of multiple pattern detectors to find consolidated patterns?

Insight: input-level importance scores reveal combined contributions.

[Figure: importance-score tracks for Sequence 1, Sequence 2, and Sequence 3.]

TF-MoDISco: TF Motif Discovery from Importance Scores
https://github.com/kundajelab/tfmodisco

SLIDE 53

TF-MoDISco: More details

(1) Compute affinities between pairs of seqlets using a cross-correlation-like metric
(2) Cluster the affinity matrix
(3) Aggregate the seqlets in a cluster to get motifs
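A minimal sketch of step (1), assuming each seqlet is a fixed-length 1-D track of importance scores and using the best normalized cross-correlation over offsets as the "cross-correlation-like metric" (the actual TF-MoDISco metric differs in its details):

```python
import numpy as np

def seqlet_affinity(a: np.ndarray, b: np.ndarray, max_shift: int = 5) -> float:
    """Best dot product between unit-normalized seqlets over small offsets."""
    a = a / (np.linalg.norm(a) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    best = -np.inf
    for shift in range(-max_shift, max_shift + 1):
        # Overlapping portions of a and b at this relative offset.
        if shift >= 0:
            overlap_a, overlap_b = a[shift:], b[:len(b) - shift]
        else:
            overlap_a, overlap_b = a[:len(a) + shift], b[-shift:]
        best = max(best, float(np.dot(overlap_a, overlap_b)))
    return best

seqlets = [np.random.randn(20) for _ in range(4)]
affinities = np.array([[seqlet_affinity(x, y) for y in seqlets] for x in seqlets])
```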

SLIDE 54

Key idea: Density-Adaptive Distance (1)

Problem: the notion of "far away" varies with the cluster.

  • Weak motif clusters: seqlets may be farther away on average
  • The notion of "far" needs to take this into account

SLIDE 55

Key idea: Density-Adaptive Distance (2)

  • Solution: adapt the notion of distance to the local density of the data!
  • First step of t-SNE: compute conditional probabilities p(j|i) = exp(−βi·d(i,j)²) / Σ_k≠i exp(−βi·d(i,k)²)
  • βi is tuned to attain a desired perplexity!
  • Larger βi will be used in denser regions of the space
  • Supply the density-adapted probabilities to multiple rounds of Louvain community detection (see the sketch below)
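A sketch of the per-point calibration described above: binary-search βi until the conditional distribution over the other points reaches a target perplexity (perplexity = 2^entropy):

```python
import numpy as np

def calibrated_probs(sq_dists_to_others, target_perplexity=30.0, n_iters=50):
    """Binary-search beta_i so that p(j|i) attains the target perplexity."""
    d = sq_dists_to_others - sq_dists_to_others.min()  # stabilize exponentials
    lo, hi = 0.0, 1e3  # assumed wide enough to bracket beta_i for this sketch
    for _ in range(n_iters):
        beta = (lo + hi) / 2.0
        p = np.exp(-beta * d)
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if 2.0 ** entropy > target_perplexity:
            lo = beta  # distribution too flat: sharpen it
        else:
            hi = beta  # distribution too peaked: flatten it
    return p

# Denser neighborhoods (smaller distances) end up with larger beta_i.
p_row = calibrated_probs(np.random.rand(99) * 10.0)
```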

SLIDE 56

TF-MoDISco motifs are broader and more consolidated than traditional motifs

[Figure: the TF-MoDISco motif alongside known motifs for SIX5/ZNF143: Hocomoco-ZNF143, CISBP-SIX5_M4692, CISBP-SIX5_M4693, CISBP-ZNF143_M3964, CISBP-ZNF143_M3965, CISBP-ZNF143_M4484, CISBP-ZNF143_M5966, CISBP-ZNF143_M6551, ENCODE_SIX5_disc1/ZNF143_disc2, ENCODE_SIX5_disc2/ZNF143_disc1, and HOMER-ZNF143.]

SLIDE 57

10 bp periodic Nanog motif

[Figure: base frequencies (PWM) of the 10 bp TF-MoDISco motif; motifs for Klf4, Nanog, Oct4, and Sox2; the Nanog homeodomain (Hayakshi et al., PNAS 2015). Analysis by Žiga Avsec.]

Experimental evidence: 10 bp periodic binding of homeobox TFs to nucleosomal DNA in recent in vitro NCAP-SELEX data (Zhu et al., Nature 2018)

SLIDE 58

Summary

  • DeepLIFT: can efficiently reveal important parts of the input for a given prediction
    – https://github.com/kundajelab/deeplift
  • TF-MoDISco: Motif Discovery from Importance Scores
    – Reveals recurring patterns in the input
    – https://github.com/kundajelab/tfmodisco
  • Both can be used to gain novel insights into the regulatory code of the genome

SLIDE 59

Recent work on "Activation Atlases" (OpenAI)

  • https://distill.pub/2019/activation-atlas/
  • Sample vectors of filter activations on real data
  • Dimensionality-reduce with t-SNE; this implicitly identifies filters that fire together
  • At each region of the dimensionality-reduced map, derive a visualization corresponding to the vector of filter activations present there
  • Key drawbacks:
    – Dimensionality reduction to 2D might be missing a lot of information
    – Does not provide clusters
SLIDE 60

  • I too found that t-SNE was able to separate clusters better than k-means, DBSCAN, spectral clustering, etc.
  • Plugging t-SNE's trick of density adaptation into Louvain successfully recapitulated the structure of t-SNE.

SLIDE 61

Recent work on discovering "concept activation vectors" (Google Brain)

  • Approach:
    – Segment the image
    – Resize segments to fill the entire input and feed them through the network
    – Cluster segments based on the activation of a bottleneck layer
  • Drawbacks:
    – The classifier must give reasonable results when a patch is resized to fill the image
    – Crude clustering: "The best results…were acquired using k-means clustering followed by removing all points but the n points that have the smallest L2 distance from the cluster center"

SLIDE 62

Shapley values

  • Come from game theory; Shapley values assign contributions to players in cooperative games.
    – Look at all possible orderings of including players in the game
    – For each ordering, find the marginal change in reward when a player is included
    – Average each player's marginal contribution to the reward over all orderings
  • Analogy for model importance (see the sketch below):
    – "Reward" is the model output
    – "Players" are individual inputs
    – "Including" an input means setting it to its actual value vs. sampling it from some background distribution
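A sketch of exact Shapley values by enumerating orderings (tractable only for a handful of inputs); for simplicity, "excluded" inputs are set to a fixed reference of 0 rather than sampled from a background distribution:

```python
from itertools import permutations

def shapley_values(model, inputs, reference=None):
    """Average each input's marginal contribution over all orderings."""
    n = len(inputs)
    reference = reference or [0.0] * n
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(reference)
        previous_output = model(current)
        for player in order:
            current[player] = inputs[player]  # "include" this input
            output = model(current)
            totals[player] += output - previous_output  # marginal contribution
            previous_output = output
    return [t / len(orderings) for t in totals]

# The min/AND example from earlier: y = i1 - max(0, i1 - i2).
model = lambda x: x[0] - max(0.0, x[0] - x[1])
print(shapley_values(model, [10.0, 6.0]))  # [3.0, 3.0], matching Slides 35-40
```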

SLIDE 63

SHAP values: a more efficient Shapley approximation

  – SHAP values (Lundberg & Lee, NIPS 2017) propose a more efficient way to estimate Shapley contributions by performing weighted linear regression.
  – Still requires a large number of samples to produce decent results! In the paper, interpreting a single MNIST digit used 50,000 model evaluations.
  – For efficiency, the authors proposed a hybrid of SHAP and DeepLIFT called DeepSHAP.
    • Handles some operations that DeepLIFT doesn't handle (e.g. elementwise multiplications). The current implementation doesn't have the RevealCancel rule, so it reduces to DeepLIFT without RevealCancel for many standard architectures. (The "new" DeepLIFT = the RevealCancel rule.)

SLIDES 64-69

Tip: Beware GuidedBackprop and DeconvNet!

[These slides build up the following points, interleaved with example figures.]

  • These backprop-based methods do not produce class-specific visualizations (theoretically proven)
  • It is possible to introduce class-specificity to GuidedBackprop by multiplying with "class activation maps" (CAM)
    – Idea of CAM: for some higher-level convolutional layer, assign class-specific importance to each channel ("feature map") using gradients
    – Do an elementwise multiplication with GuidedBackprop to introduce class-specificity
    – The method is called "Guided Grad-CAM"
SLIDE 70

Key idea 1: Correlation alternative

[Figure: an input importance track. Which pattern is the input a better match to, Option 1 or Option 2?]

SLIDE 71

Key idea 1: Correlation alternative

[Figure: correlation picks Option 2; our metric ("Continuous Jaccard") picks Option 1.]

SLIDE 72

Key idea 1: Correlation alternative

  • What is the issue with correlation?
  • Correlation involves element-wise products:
    – Polynomial of degree 2: agreement at a few largest-magnitude positions is preferred to agreement at several smaller-magnitude positions
  • Input = (-1, -1, -2, 4, -1, -1, -1)
    – Correlation with (0, 0, 0, 4, 0, 0, 0) = 0.98
    – Correlation with (-1, -1, -2, 0, -1, -1, -1) = 0.87
SLIDE 73

Key idea 1: Cross-correlation alternative

  • Continuous Jaccard: like the Jaccard similarity, extended to the reals
  • "Continuous Jaccard" = Σ_i min(|x_i|, |y_i|)·sign(x_i)·sign(y_i) / Σ_i max(|x_i|, |y_i|)
  • Input = (-1, -1, -2, 4, -1, -1, -1)
    – Continuous Jaccard with (0, 0, 0, 4, 0, 0, 0) = 4/11
    – Continuous Jaccard with (-1, -1, -2, 0, -1, -1, -1) = 7/11
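A sketch of the metric as reconstructed above (signed min over max of absolute values, summed across positions); it reproduces both worked examples:

```python
import numpy as np

def continuous_jaccard(x: np.ndarray, y: np.ndarray) -> float:
    numerator = np.sum(np.minimum(np.abs(x), np.abs(y)) * np.sign(x) * np.sign(y))
    denominator = np.sum(np.maximum(np.abs(x), np.abs(y)))
    return float(numerator / denominator)

track = np.array([-1, -1, -2, 4, -1, -1, -1], dtype=float)
print(continuous_jaccard(track, np.array([0, 0, 0, 4, 0, 0, 0.0])))       # 4/11
print(continuous_jaccard(track, np.array([-1, -1, -2, 0, -1, -1, -1.0]))) # 7/11
```
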
SLIDE 74

Goal: Understand the DNA patterns ("motifs") determining in vivo transcription factor binding

[Figure: a target TF and co-binding TFs on accessible chromatin between nucleosomes; the task is to learn predictive sequence motifs. Transcription factor: a regulatory protein that binds to DNA. Adapted from Shlyueva et al. (2014), Nature Reviews Genetics.]

Backup