 
Understanding Genome Regulation with Interpretable Deep Learning
Presented by: Avanti Shrikumar, Kundaje Lab, Stanford University
Example biological problem: understanding stem cell differentiation
A fertilized egg gives rise to liver cells, lung cells, and kidney cells.
Cell types are different because different genes are turned on.
How is cell-type-specific gene expression controlled?
Answer: “regulatory elements” act like switches to turn genes on.
“Regulatory elements” are switches that turn genes on
Regulatory element sequences contain “DNA patterns” that proteins called transcription factors bind to … and activate nearby genes.
[Figure: DNA sequence of a gene (ACGTGTAACTGATAATGCCGATATT) with a regulatory element; transcription factors bind to DNA words; regulatory element + transcription factors loop over …]
90%+* of disease-associated mutations are outside genes!
A regulatory element has “DNA patterns” that transcription factors bind to.
Many positions in a regulatory element are not essential for its function!
→ Which positions in regulatory elements matter?
[Figure: DNA sequence of a gene (ACGTGTAACTGATAATGCCGATATT) with transcription factors]
*Stranger et al., Genet., 2011
Q: Which positions in regulatory elements matter?
- Experimentally measure regulatory elements in different tissues
- Predict tissue-specific activity of regulatory elements from sequence using deep learning
- Interpret the model to learn important patterns in the input!
Questions for the model
- Which parts of the input are the most important for making a given prediction?
- What are the recurring patterns in the input?
Overview of deep learning model
- Input: DNA sequence represented as ones and zeros (one row per base A, C, G, T; example sequence GATAACCGATATC)
- Learned pattern detectors
- Later layers build on patterns of previous layer
- Output: Active (+1) vs not active (0), e.g. Active in Erythroid, Active in Liver, Accessible in Lung, Accessible in HSCs
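The "ones and zeros" input can be sketched in a few lines of plain Python. This is only an illustration: the row order A, C, G, T and the helper name `one_hot` are assumptions, not taken from the talk.

```python
BASES = "ACGT"  # assumed row order of the encoding

def one_hot(sequence):
    """Return a 4 x len(sequence) matrix (list of rows) of ones and zeros.

    Row k has a 1 wherever the sequence contains BASES[k].
    """
    return [[1 if base == row_base else 0 for base in sequence]
            for row_base in BASES]

encoded = one_hot("GATAACCGATATC")
for row_base, row in zip(BASES, encoded):
    print(row_base, row)
```

Each column contains exactly one 1, marking which base occupies that position.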
How can we identify important nucleotides?
In-silico mutagenesis: mutate each position of the input sequence (e.g. GATAACCGATATC) in turn and observe the change in the outputs (Active in Liver, Active in Lung).
Alipanahi et al., 2015; Zhou & Troyanskaya, 2015
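A minimal sketch of in-silico mutagenesis, under loud assumptions: `toy_model` is a hypothetical stand-in for a trained network (it simply counts "GATA" motifs), and scoring each position by its largest absolute output change is one of several possible summaries.

```python
BASES = "ACGT"

def toy_model(sequence):
    # Hypothetical stand-in for a trained network: counts "GATA" motifs.
    return float(sequence.count("GATA"))

def in_silico_mutagenesis(sequence, model):
    """For each position, return the largest absolute change in the model
    output over the three possible single-base substitutions."""
    baseline = model(sequence)
    importance = []
    for pos, original_base in enumerate(sequence):
        changes = []
        for base in BASES:
            if base == original_base:
                continue  # skip the unmutated base
            mutated = sequence[:pos] + base + sequence[pos + 1:]
            changes.append(abs(model(mutated) - baseline))
        importance.append(max(changes))
    return importance

importance = in_silico_mutagenesis("GATAACCGATATC", toy_model)
print(importance)
```

Note the cost: a sequence of length L needs 3 × L extra forward passes of the model.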
Saturation problem illustrated
[Figure: y_in = i_1 + i_2; y_o saturates at 1 once y_in reaches 1. Shown at i_1 = 1, i_2 = 1, so y_in = 2 and y_o = 1.]
Avoiding saturation means perturbing combinations of inputs → increased computational cost
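The saturation effect can be reproduced with a toy function. The sketch below assumes the plotted unit behaves as y_o = min(1, i_1 + i_2) for non-negative inputs; the function name `y_out` is made up for illustration.

```python
def y_out(i1, i2):
    # Assumed form of the saturating unit: y_o = min(1, y_in), y_in = i1 + i2.
    return min(1.0, i1 + i2)

original = y_out(1, 1)                     # y_in = 2, y_o = 1

# Perturbing one input at a time leaves the output unchanged ...
effect_i1_alone = y_out(0, 1) - original
effect_i2_alone = y_out(1, 0) - original

# ... only perturbing both inputs together reveals their importance.
effect_both = y_out(0, 0) - original

print(effect_i1_alone, effect_i2_alone, effect_both)
```

With n inputs, testing all combinations in this way grows exponentially, which is the computational cost the slide refers to.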
“Backpropagation” based approaches
Examples:
- Gradients (Simonyan et al.)
- Integrated Gradients (ICML 2017)
- DeepLIFT (ICML 2017); https://github.com/kundajelab/deeplift
[Figure: outputs (Active in Liver, Active in Lung) traced back to the input: DNA sequence represented as ones and zeros]
Saturation revisited
When (i_1 + i_2) >= 1, gradient is 0.
[Figure: y_in = i_1 + i_2; at i_1 = 1, i_2 = 1, y_in = 2 and y_o = 1]
Affects:
- Gradients
- Deconvolutional Networks
- Guided Backpropagation
- Layerwise Relevance Propagation
The DeepLIFT solution: difference from reference
Reference: i_1^0 = 0 & i_2^0 = 0
At the reference, y_o^0 = 0, as (i_1^0 + i_2^0) = 0.
With (i_1 + i_2) = 2, the “difference from reference” (Δy) is +1, NOT 0.
Δi_1 = 1, Δi_2 = 1
C_{Δi_1 Δy} = 0.5 = C_{Δi_2 Δy}
[Figure: y_in = i_1 + i_2; at i_1 = 1, i_2 = 1, y_in = 2 and y_o = 1]
Detailed backpropagation rules in the paper
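The numbers on this slide can be checked with a small sketch. It assumes the unit is y_o = min(1, i_1 + i_2) and uses a single multiplier Δy_o / Δy_in for the nonlinearity, in the spirit of the Rescale rule described in the DeepLIFT paper. It is a hand-rolled toy, not the API of the deeplift library.

```python
def y_out(i1, i2):
    # Assumed saturating unit: y_o = min(1, y_in), y_in = i1 + i2.
    return min(1.0, i1 + i2)

def difference_from_reference(inputs, reference):
    """Split delta_y among the inputs in proportion to their delta_i."""
    delta_inputs = [x - r for x, r in zip(inputs, reference)]
    delta_y_in = sum(delta_inputs)
    delta_y_o = y_out(*inputs) - y_out(*reference)
    # One multiplier for the nonlinearity: delta_y_o / delta_y_in.
    multiplier = delta_y_o / delta_y_in if delta_y_in != 0 else 0.0
    return [d * multiplier for d in delta_inputs]

contributions = difference_from_reference((1.0, 1.0), (0.0, 0.0))
print(contributions)        # contribution of i_1 and of i_2
print(sum(contributions))   # equals delta_y
```

The contributions add up to Δy, so neither input is assigned zero importance even though the local gradient is zero.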
DeepLIFT scores at active regulatory element near HNF4A gene (Anna Shcherbina)
[Figure panels: Liver, Lung, Kidney]
Choice of reference matters!
[Figure: CIFAR10 model, class = “ship”; panels: Original, Reference, DeepLIFT scores]
Suggestions on how to pick a reference:
- MNIST: all zeros (background)
- Consider using a distribution of references
- E.g. multiple references generated by dinucleotide-shuffling a genomic sequence
Integrated Gradients: Another reference-based approach
[Figure: y as a function of i_1 + i_2; gradients are evaluated at points interpolated from the reference (i_1 = i_2 = 0.0) to the input (i_1 = i_2 = 1.0)]
i_1    i_2    dy/di_x
0.0    0.0    1
0.2    0.2    1
0.4    0.4    1
0.6    0.6    0
0.8    0.8    0
1.0    1.0    0
Average dy/di_x = 0.5
(Average dy/di_1) * Δi_1 = 0.5
(Average dy/di_2) * Δi_2 = 0.5
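The table can be reproduced with a short sketch. It assumes the toy unit is y = min(1, i_1 + i_2), writes its gradient by hand, and averages over the same six evenly spaced points as the slide. All function names are illustrative, and a real implementation would typically use more steps and automatic differentiation.

```python
def grad_y(i1, i2):
    """Gradient of the assumed unit y = min(1, i1 + i2) w.r.t. (i1, i2)."""
    g = 1.0 if (i1 + i2) < 1.0 else 0.0
    return (g, g)

def integrated_gradients(inputs, reference, steps=6):
    """Average the gradient along the straight line from reference to
    inputs, then scale by (input - reference)."""
    deltas = [x - r for x, r in zip(inputs, reference)]
    totals = [0.0] * len(inputs)
    for k in range(steps):
        alpha = k / (steps - 1)              # 0.0, 0.2, ..., 1.0 for steps=6
        point = [r + alpha * d for r, d in zip(reference, deltas)]
        grads = grad_y(*point)
        totals = [t + g for t, g in zip(totals, grads)]
    average_grads = [t / steps for t in totals]
    return [a * d for a, d in zip(average_grads, deltas)]

attributions = integrated_gradients((1.0, 1.0), (0.0, 0.0))
print(attributions)
```

Each input receives 0.5, matching the last two rows of the slide; every extra interpolation step costs one more gradient computation.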
Integrated Gradients: Another reference-based approach
• Sundararajan et al.
• Pros:
  – completely black-box except for gradient computation
  – functionally equivalent networks guaranteed to give the same result
• Cons:
  – Repeated gradient calc. adds computational overhead
  – Linear interpolation path between the baseline and actual input can result in chaotic behavior from the network, esp. for things like one-hot encoded DNA sequence
- Original: Original one-hot encoded DNA sequences
- “Shuffled”: shuffled sequences as “baseline”
- Interpolation parameterized by “alpha” from 0 to 1
Neural nets can behave unexpectedly when supplied inputs outside the training set distribution
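To see why interpolated inputs fall outside the training distribution, the sketch below mixes a one-hot sequence with a shuffled baseline. The baseline here is a fixed, hand-picked permutation of the same letters (an assumption for reproducibility, not a dinucleotide shuffle), and all helper names are illustrative.

```python
BASES = "ACGT"

def one_hot(sequence):
    return [[1.0 if base == row_base else 0.0 for base in sequence]
            for row_base in BASES]

def interpolate(baseline, original, alpha):
    """alpha = 0 gives the baseline, alpha = 1 gives the original."""
    return [[b + alpha * (x - b) for b, x in zip(b_row, x_row)]
            for b_row, x_row in zip(baseline, original)]

def is_one_hot(matrix):
    # Every column must hold exactly one 1.0 and three 0.0 entries.
    return all(sorted(col) == [0.0, 0.0, 0.0, 1.0] for col in zip(*matrix))

original = one_hot("GATAACCGATATC")
shuffled = one_hot("ACGATTCAGATCA")   # hypothetical shuffled baseline

for alpha in (0.0, 0.5, 1.0):
    mixed = interpolate(shuffled, original, alpha)
    print(alpha, is_one_hot(mixed))
```

Only the two endpoints are valid one-hot sequences; every intermediate alpha yields fractional "bases" that a model trained on one-hot input has never seen.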
Might be why Integrated Gradients sometimes performs worse than grad*input on DNA…
[Figure: region active in cell type “A549”; tracks: Per-position perturbation (“In-Silico Mutagenesis”), DeepLIFT, Grad*Input, Integrated Gradients]
Integrated Gradients: Another reference-based approach
• Sundararajan et al.
• Pros:
  – completely black-box except for gradient computation
  – functionally equivalent networks guaranteed to give the same result
• Cons:
  – Repeated gradient calc. adds computational overhead
  – Linear interpolation path between the baseline and actual input can result in chaotic behavior from the network, esp. for things like one-hot encoded DNA sequence
  – Still relies on gradients, which are local by nature and can give misleading interpretations
Failure-case: “min” (AND) relation
y = min(i_1, i_2), built as:
h = ReLU(i_1 - i_2) = max(0, i_1 - i_2)
y = i_1 - h = i_1 - max(0, i_1 - i_2)
i_1, i_2     y
i_2 < i_1    i_1 - (i_1 - i_2) = i_2
i_2 > i_1    i_1 - 0 = i_1
Gradient = 0 for either i_1 or i_2, whichever is larger.
This is true even when interpolating from (0,0) to (i_1, i_2)!
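The failure case can be checked numerically. This sketch builds y = i_1 - ReLU(i_1 - i_2), estimates gradients by central finite differences (a stand-in for backpropagation), and averages them along the straight path from (0,0). The inputs (10, 6), the step count, and the midpoint sampling are choices made for illustration.

```python
def relu(x):
    return max(0.0, x)

def y_min(i1, i2):
    # y = min(i1, i2), written as on the slide.
    return i1 - relu(i1 - i2)

def numerical_grad(f, point, eps=1e-6):
    """Central finite-difference gradient of f at point."""
    grads = []
    for k in range(len(point)):
        up, down = list(point), list(point)
        up[k] += eps
        down[k] -= eps
        grads.append((f(*up) - f(*down)) / (2 * eps))
    return grads

def integrated_gradients(f, inputs, reference, steps=100):
    deltas = [x - r for x, r in zip(inputs, reference)]
    totals = [0.0] * len(inputs)
    for k in range(steps):
        alpha = (k + 0.5) / steps   # midpoints avoid the kink at (0, 0)
        point = [r + alpha * d for r, d in zip(reference, deltas)]
        grads = numerical_grad(f, point)
        totals = [t + g for t, g in zip(totals, grads)]
    return [t / steps * d for t, d in zip(totals, deltas)]

attributions = integrated_gradients(y_min, (10.0, 6.0), (0.0, 0.0))
print(attributions)
```

Because i_1 stays larger than i_2 along the whole path, its gradient is zero at every step and all of the output is attributed to i_2.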
The DeepLIFT solution: consider different orders for adding positive and negative terms
y = i_1 - ReLU(i_1 - i_2)
i_1 = 10, i_2 = 6: y = 10 - ReLU(4) = 6 = min(i_1 = 10, i_2 = 6)
Breakdown of ReLU(i_1 - i_2) = 4:
- Standard breakdown: 4 = (10 from i_1) + (-6 from i_2)
- Other possible breakdown: 4 = (4 from i_1) + (0 from i_2)
Standard breakdown of the output: y = 6 = (10 from i_1) - [(10 from i_1) - (6 from i_2)] = 6 from i_2
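Both breakdowns can be computed by adding the positive and negative terms to the ReLU in different orders. The sketch below does this with a reference of (0, 0). Averaging the two orders afterwards is an assumption based on the RevealCancel rule described in the DeepLIFT paper, since the slide text only shows the two individual breakdowns. Function names are illustrative.

```python
def relu(x):
    return max(0.0, x)

def relu_breakdown(pos, neg, positive_first):
    """Split ReLU(pos + neg) - ReLU(0) into a part from the positive term
    and a part from the negative term, for a given order of addition."""
    if positive_first:
        from_pos = relu(pos) - relu(0.0)
        from_neg = relu(pos + neg) - relu(pos)
    else:
        from_neg = relu(neg) - relu(0.0)
        from_pos = relu(pos + neg) - relu(neg)
    return from_pos, from_neg

i1, i2 = 10.0, 6.0
standard = relu_breakdown(i1, -i2, positive_first=True)    # positive term first
other = relu_breakdown(i1, -i2, positive_first=False)      # negative term first
# Assumption: average the two orders (RevealCancel-style).
averaged = tuple((a + b) / 2 for a, b in zip(standard, other))

def output_contributions(breakdown):
    """Contributions of (i1, i2) to y = i1 - ReLU(i1 - i2)."""
    from_pos, from_neg = breakdown
    return i1 - from_pos, -from_neg

for name, breakdown in (("standard", standard), ("other", other), ("averaged", averaged)):
    print(name, breakdown, output_contributions(breakdown))
```

Under the standard order all of y is credited to i_2, under the other order all of it goes to i_1, and averaging the two gives both inputs equal, non-zero credit.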