Not Just a Black Box: Interpretable Deep Learning for Genomics


slide-1
SLIDE 1

Not Just a Black Box: Interpretable Deep Learning for Genomics

Presented by: Avanti Shrikumar

1

slide-2
SLIDE 2

With great power comes really poor interpretability…

[Figure: interpretability versus power, comparing classical statistics, traditional machine learning, and deep learning]

slide-3
SLIDE 3

With great power comes really poor interpretability…

[Figure: interpretability versus power, comparing classical statistics, traditional machine learning, deep learning, and interpretable deep learning]

slide-4
SLIDE 4

Questions for the model

  • Which parts of the input are the most important for making a given prediction?

slide-5
SLIDE 5

Questions for the model

  • Which parts of the input are the most important for making a given prediction?
  • What are the recurring patterns in the input?

slide-7
SLIDE 7

How can we find the important parts of the input for a given prediction?

Idea 1: perturbation

[Figure: network diagram from inputs (yellow) to output]

slide-15
SLIDE 15

How can we find the important parts of the input for a given prediction?

Idea 1: perturbation

[Figure: network diagram from inputs (yellow) to output]

Drawbacks

1) Computational efficiency: requires one forward prop for each perturbation
2) Saturation
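
The perturbation idea and its cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a trained network (it returns 1.0 when the sequence contains GATA), and `perturbation_scores` is an illustrative helper name, not part of any library.

```python
def toy_model(seq):
    """Hypothetical stand-in for a trained network: 1.0 if 'GATA' is present."""
    return 1.0 if "GATA" in seq else 0.0


def perturbation_scores(model, seq, alphabet="ACGT"):
    """Score each position by the largest output drop over single-base changes.

    Returns the per-position scores and the number of forward passes used.
    """
    baseline = model(seq)
    n_forward = 1  # the unperturbed sequence
    scores = []
    for pos, base in enumerate(seq):
        drops = []
        for alt in alphabet:
            if alt == base:
                continue
            mutated = seq[:pos] + alt + seq[pos + 1:]
            drops.append(baseline - model(mutated))  # one forward pass each
            n_forward += 1
        scores.append(max(drops))
    return scores, n_forward


scores, n_forward = perturbation_scores(toy_model, "CCGATACC")
print(scores)
print(n_forward)
```

For an 8-base sequence this takes 1 + 3 × 8 = 25 forward passes, which is the efficiency drawback listed above; the cost grows with sequence length and with the number of perturbations tried per position.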

slide-16
SLIDE 16

Saturation problem illustrated

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2)]

slide-17
SLIDE 17

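Saturation problem illustrated

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2); annotated values: i1 = 1, i2 = 1, y = 1]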
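
A small numerical sketch of why perturbation fails here. It assumes the pictured unit computes y = min(1, i1 + i2); the slide does not spell out the function, so this form is an assumption chosen so that y stops growing once i1 + i2 reaches 1.

```python
def saturating_unit(i1, i2):
    # Assumed form of the toy unit: the output saturates at 1.
    return min(1.0, i1 + i2)


y_full = saturating_unit(1.0, 1.0)

# Perturb one input at a time by setting it to 0 and measure the change in y.
effect_i1 = y_full - saturating_unit(0.0, 1.0)
effect_i2 = y_full - saturating_unit(1.0, 0.0)

print(y_full, effect_i1, effect_i2)
```

Under this assumption, removing either input alone leaves y unchanged, so single-input perturbation assigns zero importance to both inputs even though together they drive the output.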

slide-19
SLIDE 19

How can we find the important parts of the input for a given prediction?

Idea 2: backpropagate importance

[Figure: network diagram from inputs (yellow) to output]

slide-22
SLIDE 22

How can we find the important parts of the input for a given prediction?

Idea 2: backpropagate importance

[Figure: network diagram from inputs (yellow) to output]

Examples:

  • Gradients (Simonyan et al.)
  • Deconvolutional Networks (Zeiler & Fergus)
  • Guided Backpropagation (Springenberg et al.)
  • Layerwise Relevance Propagation (Bach et al.)
  • Integrated Gradients (Sundararajan et al.)

slide-23
SLIDE 23

How can we find the important parts of the input for a given prediction?

Idea 2: backpropagate importance

[Figure: network diagram from inputs (yellow) to output]

Examples:

  • Gradients (Simonyan et al.)
  • Deconvolutional Networks (Zeiler & Fergus)
  • Guided Backpropagation (Springenberg et al.)
  • Layerwise Relevance Propagation (Bach et al.)
  • Integrated Gradients (Sundararajan et al.)
  • DeepLIFT (Learning Important FeaTures)
  • https://github.com/kundajelab/deeplift, ICML 2017
  • With Peyton Greenside and Anshul Kundaje

slide-24
SLIDE 24

Saturation revisited

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2)]

slide-25
SLIDE 25

Saturation revisited

When (i1 + i2) >= 1, gradient is 0

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2)]

slide-26
SLIDE 26

The DeepLIFT solution: difference from reference

Reference: i1=0 & i2=0

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2)]

slide-29
SLIDE 29

The DeepLIFT solution: difference from reference

Reference: i1=0 & i2=0

y=0 when (i1 + i2) = 0 (reference)

At (i1 + i2) = 2, the “difference from reference” is +1, NOT 0

[Figure: a network with inputs i1 and i2 and output y, next to a plot of y against i1 + i2 (axis ticks at 1 and 2)]

DeepLIFT addresses other failure modes besides saturation (see paper)
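
The contrast between a gradient and a difference from reference can be checked numerically. The sketch below assumes the toy unit is y = min(1, i1 + i2) (an assumption consistent with the gradient being 0 once i1 + i2 >= 1). The function `difference_from_reference` is a hypothetical helper that splits the output difference across inputs in proportion to each input's difference from its reference; it is written in the spirit of the idea on the slide and is not the DeepLIFT library's API. The paper defines the actual rules.

```python
def saturating_unit(i1, i2):
    # Assumed form of the toy unit: the output saturates at 1.
    return min(1.0, i1 + i2)


def numerical_gradient(f, i1, i2, eps=1e-3):
    """Central finite-difference gradient of f with respect to each input."""
    d1 = (f(i1 + eps, i2) - f(i1 - eps, i2)) / (2 * eps)
    d2 = (f(i1, i2 + eps) - f(i1, i2 - eps)) / (2 * eps)
    return d1, d2


def difference_from_reference(f, inputs, reference):
    """Split f(inputs) - f(reference) across inputs (hypothetical helper).

    Valid for this toy unit because it depends only on the sum of its inputs.
    """
    delta_y = f(*inputs) - f(*reference)
    delta_x = sum(inputs) - sum(reference)
    multiplier = delta_y / delta_x if delta_x != 0 else 0.0
    return tuple(multiplier * (x - r) for x, r in zip(inputs, reference))


grad = numerical_gradient(saturating_unit, 1.0, 1.0)
contribs = difference_from_reference(saturating_unit, (1.0, 1.0), (0.0, 0.0))
print(grad, contribs)
```

At i1 = i2 = 1 the gradient is zero for both inputs, while the contributions relative to the all-zeros reference are nonzero and add up to the output's difference from reference (+1).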

slide-30
SLIDE 30

Reference matters!

CIFAR10 model, class = “ship”

[Figure panel: Original]

slide-33
SLIDE 33

Reference matters!

CIFAR10 model, class = “ship”

[Figure panels: Original, Reference, DeepLIFT scores]

Suggestions on how to pick a reference:

  • MNIST: all zeros (background)
  • Consider using a distribution of references
  • E.g. multiple references generated by shuffling a genomic sequence
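
A minimal sketch of the "distribution of references" suggestion for genomic input. The names `shuffled_references`, `mean_difference_from_references`, and `toy_model` are hypothetical, and the plain base-level shuffle is an assumption for illustration; it keeps base composition but nothing else about the sequence.

```python
import random


def toy_model(seq):
    """Hypothetical stand-in for a trained network: counts 'GATA' occurrences."""
    return float(seq.count("GATA"))


def shuffled_references(seq, n_refs, seed=0):
    """Generate n_refs references by shuffling the bases of seq."""
    rng = random.Random(seed)  # seeded so the references are reproducible
    refs = []
    for _ in range(n_refs):
        bases = list(seq)
        rng.shuffle(bases)
        refs.append("".join(bases))
    return refs


def mean_difference_from_references(model, seq, refs):
    """Average the output's difference from reference over all references."""
    return sum(model(seq) - model(ref) for ref in refs) / len(refs)


seq = "ACGTGTAACTGATAATGCCGATATT"
refs = shuffled_references(seq, n_refs=10)
mean_diff = mean_difference_from_references(toy_model, seq, refs)
print(mean_diff)
```

Averaging over several references reduces the dependence of the result on any single arbitrary choice of reference.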

slide-34
SLIDE 34

E.g.: morphing 8 to a 3 or a 6

[Figure: original digit, with importance scores for 8->3 and 8->6 from Guided Backprop, Integrated Gradients, and DeepLIFT]

slide-35
SLIDE 35

Example biological problem: understanding stem cell differentiation

[Figure: fertilized egg; liver cells, cardiac cells, blood cells]

Cell types are different because different genes are turned on

slide-37
SLIDE 37

Example biological problem: understanding stem cell differentiation

[Figure: fertilized egg; liver cells, cardiac cells, blood cells]

Cell types are different because different genes are turned on

How is cell-type-specific gene expression controlled?

Ans: “control elements” act like switches to turn genes on

slide-38
SLIDE 38

“Control Elements” are switches that turn genes on

[Figure labels: DNA sequence of a gene; control element]

slide-40
SLIDE 40

“Control Elements” are switches that turn genes on

[Figure labels: DNA sequence of a gene; control element: ACGTGTAACTGATAATGCCGATATT]

Sequences contain “DNA words” that controller proteins bind to

Controller proteins bind to DNA words

slide-42
SLIDE 42

“Control Elements” are switches that turn genes on

[Figure labels: DNA sequence of a gene; control element]

Control element + controller proteins loop over and activate nearby genes

slide-43
SLIDE 43

89%* of disease-associated mutations are outside genes!

[Figure labels: DNA sequence of a gene; controller proteins]

*Stranger et al., Genet., 2011

slide-44
SLIDE 44

89%* of disease-associated mutations are outside genes!

[Figure labels: DNA sequence of a gene; controller proteins; control element sequence ACGTGTAACTGATAATGCCGATATT]

Control element has “DNA words” that controller proteins bind to

*Stranger et al., Genet., 2011

SLIDE 47

89%* of disease-associated mutaAons are outside genes!

14

DNA sequence of a gene ACGTGTAACTGATAATGCCGATATT Controller proteins Control element has “DNA words” that controller proteins bind to

Many posi:ons in a control element are not essen:al for its func:on!

*Stranger et al., Genet., 2011

slide-48
SLIDE 48

89%* of disease-associated mutations are outside genes!

[Figure labels: DNA sequence of a gene; controller proteins; control element sequence ACGTGTAACTGATAATGCCGATATT]

Control element has “DNA words” that controller proteins bind to

Many positions in a control element are not essential for its function!

→ Which positions in control elements matter?

*Stranger et al., Genet., 2011

slide-49
SLIDE 49

Q: Which positions in control elements matter?

slide-52
SLIDE 52

Q: Which positions in control elements matter?

Experimentally measure control elements in different tissues

Predict tissue-specific activity of control elements from sequence using deep learning

Interpret the model to learn important positions!

slide-53
SLIDE 53

Overview of deep learning model

Input: DNA sequence (C G A T A A C C G A T A T) represented as ones and zeros, one row per base (A, C, G, T)

Learned pattern detectors

Later layers build on patterns of previous layer

Output: ON (+1) vs OFF (0); output labels: accessible in HSCs, ON in cell-type X, ON in cell-type Y

Architecture:

  • 3 convolutional layers + batch norm
  • Max pooling
  • 2 fully connected layers
  • Output
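
The "ones and zeros" input representation can be sketched as a one-hot encoding. The helper name `one_hot` and the row order A, C, G, T are assumptions for illustration; the example sequence is the one shown on the slide.

```python
def one_hot(seq, alphabet="ACGT"):
    """Return one row per base in `alphabet` and one column per position.

    Each column has a single 1, in the row of the base at that position.
    """
    return [[1 if base == letter else 0 for base in seq] for letter in alphabet]


encoded = one_hot("CGATAACCGATAT")
for letter, row in zip("ACGT", encoded):
    print(letter, row)
```

A convolutional layer then slides its pattern detectors along the position axis of this matrix.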
slide-54
SLIDE 54

Case study: understanding “control elements” of blood cell types

Publicly available dataset profiling control element activity (Corces & Buenrostro et al., 2016)

Peyton Greenside

slide-55
SLIDE 55

Case study: understanding “control elements” of blood cell types

Publicly available dataset profiling control element activity (Corces & Buenrostro et al., 2016)

[Figure: hematopoietic stem cell, white blood cell, red blood cell]

Peyton Greenside

slide-56
SLIDE 56

Cell-type-specific use of “controller” sequence in HSC, B-cells and Erythroid

Peyton Greenside

slide-57
SLIDE 57

Cell-type-specific use of “controller” sequence in HSC, B-cells and Erythroid

[Figure: sequence annotated with three Gata motifs and one SPI1 motif]

Peyton Greenside

slide-58
SLIDE 58

Cell-type-specific use of “controller” sequence in HSC, B-cells and Erythroid

[Figure: importance in HSCs over a sequence annotated with three Gata motifs and one SPI1 motif. Tracks: SPI1 controller protein binding signal (data unavailable); GATA1 controller protein binding signal (data unavailable); “is an active control element” signal for HSCs]

Peyton Greenside

slide-59
SLIDE 59

Cell-type-specific use of “controller” sequence in HSC, B-cells and Erythroid

[Figure: importance in B-cells over a sequence annotated with three Gata motifs and one SPI1 motif. Tracks: SPI1 controller protein binding signal; GATA1 controller protein binding signal; “is an active control element” signal for HSCs and B-cells. Three tracks are annotated “No peak”]

Peyton Greenside

slide-60
SLIDE 60

Cell-type-specific use of “controller” sequence in HSC, B-cells and Erythroid

[Figure: importance in Erythroid over a sequence annotated with three Gata motifs and one SPI1 motif. Tracks: SPI1 controller protein binding signal; GATA1 controller protein binding signal; “is an active control element” signal for HSCs, Erythroid, and B-cells]

Peyton Greenside

slide-61
SLIDE 61

Can study the regulatory code in millions of control elements!

Peyton Greenside

19

slide-62
SLIDE 62

Questions for the model

  • Which parts of the input are the most important for making a given prediction?
  • What are the recurring patterns in the input?

Question in biology: What are the DNA “words” (“motifs”) driving controller protein binding?

slide-63
SLIDE 63

Naïve idea: look at individual pattern detectors

[Figure: individual GATA pattern detectors; motifs found by DeepBind (Alipanahi et al.)]

Problem: High levels of redundancy, because multiple neurons cooperate with each other

slide-64
SLIDE 64

Naïve idea: look at individual pattern detectors

[Figure: individual GATA pattern detectors; motifs found by DeepBind (Alipanahi et al.); a panel labeled “Computer vision”]

Problem: High levels of redundancy, because multiple neurons cooperate with each other

slide-65
SLIDE 65

How do we combine the contributions of multiple pattern detectors to find consolidated patterns?

slide-66
SLIDE 66

How do we combine the contributions of multiple pattern detectors to find consolidated patterns?

Insight: input-level importance scores reveal combined contributions

[Figure: importance scores for Sequence 1, Sequence 2, and Sequence 3]

slide-70
SLIDE 70

How do we combine the contributions of multiple pattern detectors to find consolidated patterns?

Insight: input-level importance scores reveal combined contributions

TF-MoDISco: TF Motif Discovery from Importance Scores

slide-71
SLIDE 71

TF-MoDISco: More details on the clustering

23

slide-74
SLIDE 74

TF-MoDISco: More details on the clustering

(1) Compute affinities between pairs of important segments using a cross-correlation-like metric

(2) Cluster affinity matrix with community detection
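
Step (1) can be illustrated with a simple cross-correlation-like affinity. This is a sketch under simplifying assumptions, not the TF-MoDISco implementation: each segment is reduced to a single track of per-position importance scores, and the affinity is the best dot product over all relative offsets, divided by the product of the two segments' norms. The names `affinity` and `affinity_matrix` are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def affinity(a, b):
    """Best normalized dot product of b slid across a (all relative offsets)."""
    denom = norm(a) * norm(b)
    if denom == 0:
        return 0.0
    best = float("-inf")
    for offset in range(-(len(b) - 1), len(a)):
        dot = 0.0
        for j, bj in enumerate(b):
            i = offset + j
            if 0 <= i < len(a):  # only overlapping positions contribute
                dot += a[i] * bj
        best = max(best, dot)
    return best / denom


def affinity_matrix(segments):
    """Pairwise affinities; the input to a community detection step."""
    return [[affinity(s, t) for t in segments] for s in segments]


seg_a = [0, 0, 1, 2, 1, 0]   # a pattern
seg_b = [1, 2, 1, 0, 0, 0]   # the same pattern at a different offset
seg_c = [1, 0, 0, 1, 0, 0]   # a different pattern
matrix = affinity_matrix([seg_a, seg_b, seg_c])
print(matrix)
```

Because the metric searches over offsets, two segments containing the same pattern at different positions still receive a high affinity. Step (2) would run a community detection algorithm on this matrix; that step is not shown here.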

slide-75
SLIDE 75

Key idea: Density-Adaptive Distances (1)

  • Problem: notion of “far away” varies with the cluster

– Weak pattern clusters: instances of pattern may be farther away on average
– Notion of “far” needs to take this into account

slide-79
SLIDE 79

Key idea: Density-Adaptive Distance (2)

  • Soln: Adapt notion of distance to the local density of the data!

– First step of t-SNE: compute conditional probs
– βi is tuned to attain a desired perplexity!

  • Larger βi will be used in denser regions of the space

– Use density-adapted probabilities with clustering based on Louvain community detection
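
The tuning of βi can be sketched as follows. The sketch assumes the t-SNE form of the conditional probabilities, p(j|i) proportional to exp(-βi · d_ij), with d_ij taken as a squared Euclidean distance between one-dimensional points; in the clustering described here the distances would instead come from the segment affinities, which is not shown. The function names are hypothetical.

```python
import math


def conditional_probs(dists, beta):
    """p(j|i) proportional to exp(-beta * d_ij) for the distances from one point."""
    d_min = min(dists)  # shift for numerical stability; cancels on normalizing
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def tune_beta(dists, target, lo=1e-12, hi=1e12, iters=200):
    """Bisect on a log scale; perplexity does not increase as beta grows."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(dists, mid)) > target:
            lo = mid  # distribution too flat: increase beta
        else:
            hi = mid
    return math.sqrt(lo * hi)


def squared_dists_from(points, i):
    return [(points[i] - p) ** 2 for j, p in enumerate(points) if j != i]


# A tight cluster near 0 and a loose cluster near 100.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 100.0, 110.0, 120.0, 130.0, 140.0]
beta_dense = tune_beta(squared_dists_from(points, 2), target=3.0)
beta_sparse = tune_beta(squared_dists_from(points, 7), target=3.0)
print(beta_dense, beta_sparse)
```

With the same target perplexity, the point in the tight cluster ends up with a much larger β than the point in the loose cluster, which is the density adaptation described above.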

slide-80
SLIDE 80

We can learn 1000s of known and novel DNA words defining tissue-specific control elements!

Peyton Greenside

slide-81
SLIDE 81

Summary

  • DeepLIFT: can efficiently reveal important parts of the input for a given prediction

– With advantages over other methods
– https://github.com/kundajelab/deeplift

  • TF-MoDISco: Motif Discovery from Importance Scores

– More details in talk at NIPS comp bio: https://www.youtube.com/watch?v=fXPGVJg956E

  • Can be used to understand the regulatory sequence controlling tissue-specific control elements

slide-82
SLIDE 82

Oana Ursu, Amr Alexandari, Daniel Kim, Michael Wainberg, Maryna Taranova, Chris Probert, Jin-Wook Lee, Chuan Sheng Foo, Johnny Israeli, Irene Kaplow

Funding: HHMI International Student Research Fellowship; Bio-X Fellowship; Microsoft Women’s Fellowship; NIH R01ES02500902

Peyton Greenside, Anna Shcherbina, Anshul Kundaje