

slide-1
SLIDE 1

Transfer Learning and Applications in Computational Biology

Christian Widmer¹, Marius Kloft¹,², Gunnar Rätsch¹,³,⁴, Nico Görnitz, Gabriele Schweikert

¹ Memorial Sloan-Kettering Cancer Center, NY, USA   ² Microsoft Research, Los Angeles, USA   ³ Courant Institute, NYU, New York, USA   ⁴ Humboldt University, Berlin, Germany

slide-2
SLIDE 2

Roadmap

  • Motivation from computational biology

[Figure: DNA sequence annotated with signal sites: TSS, Donor, Acceptor, TIS, Stop, polyA/cleavage]

  • Empirical comparison of domain adaptation algorithms
  • Algorithms for hierarchical multi-task learning
  • Algorithms for learning task relations
  • Fast(er) algorithms
  • Discussion & conclusion

© Gunnar Rätsch (cBio@MSKCC), Transfer Learning in Computational Biology, NIPS MTL Workshop, December 13, 2014, Memorial Sloan-Kettering Cancer Center

slide-3
SLIDE 3

A Core CompBio Problem: Gene Finding

[Figure: from DNA to pre-mRNA, mRNA, and protein; genic regions with 5' UTR, exons, introns, 3' UTR, intergenic regions, polyA tail, and cap]

Given a piece of DNA sequence:
  • Predict gene products, including intermediate processing steps
  • Predict the signals used during processing
  • Predict the correct corresponding label sequence



slide-7
SLIDE 7

Example: Splice Site Recognition

CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA

150-nucleotide window around the dimer

True splice sites (example windows, each labeled +1):

GCCAATATTTTTCTATTCAGGTGCAATCAATCACCCATCAT
ATTGAATGAACATATTCCAGGGTCTCCTTCCACCTCAACAA
AGCAACGAACTCCATTACAGCAAGGACATCGAAGTCGATCA
GCCAATTTTTGACCTTGCAGAATCAATCGTGCACGTTCGGA
CATCTGAAATTTCCCCCAAGTATAGCGGAAATAGACCGACG
GAAATTTCCCCCAAGTATAGCGGAAATAGACCGACGAAATC
CCCAAGTATAGCGGAAATAGACCGACGAAATCGCTCTCTCC
AATCGCTCTCTCCCTGGGAGCGATGCGAATGTCAAATTCGA
ACCAAAAAATCAATTTTTAGATTTTTCGAATTAATTTTTCG
TGCTTTGCATGTTTCTAAAGTTACAGCCGTTCAAAATTTAA
GCATGTTTCTAAAGTTACAGCCGTTCAAAATTTAAAAACTC
ACCAATACGCAATGACTGAGTCTGTAATTTCACATAGTAAT
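The setup above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, dimer choice, and flank size are assumptions (the slide uses 150-nucleotide windows):

```python
# Sketch (assumption, not the authors' code): enumerate fixed-length windows
# around candidate AG acceptor dimers; each window becomes one example for a
# binary splice-site classifier.

def candidate_windows(dna, dimer="AG", flank=5):
    """Yield (position, window) for every dimer occurrence with full flanks."""
    k = len(dimer)
    for i in range(flank, len(dna) - flank - k + 1):
        if dna[i:i + k] == dimer:
            yield i, dna[i - flank:i + k + flank]

seq = "GCCAATATTTTTCTATTCAGGTGCAATCAATCACCCATCAT"
windows = list(candidate_windows(seq))
```

On real data the flank would be large enough to give the roughly 150-nucleotide window shown on the slide.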


slide-8
SLIDE 8

Example: Splice Site Recognition

CT...GTCGTA...GAAGCTAGGAGCGC...ACGCGT...GA

150-nucleotide window around the dimer

Potential splice sites: the same example windows now appear among many other candidate dimers (…); the task is to separate the few true sites from all potential ones.


slide-9
SLIDE 9

Domain Adaptation for Genome Annotation

Motivation:
  • Increasing number of sequenced genomes
  • Newly sequenced genomes are often poorly annotated
  • However, relatives with good annotation often exist

Idea: transfer knowledge between organisms.

Example: splice site annotation in worm genomes (≈2010)
  • Newly sequenced organism: C. briggsae, ≈ 100 confirmed genes (590 splice site pairs)
  • Well-annotated relative: C. elegans, ≈ 10,000 confirmed genes (36,782 splice site pairs)



slide-11
SLIDE 11

The “Bioinformatics Way” of Transfer Learning

1. Homology-based annotation (a.k.a. “comparative genomics”)

[Figure: alignment-based transfer of annotations from source to target genome]

Works for closely related species; does not require any labeled data from the target organism.



slide-13
SLIDE 13

Domain Adaptation by Learning vs. Homology

[Schweikert et al., 2008; Widmer et al., 2010b]



slide-17
SLIDE 17

Domain Adaptation Algorithms Overview

[Schweikert et al., 2008]


slide-18
SLIDE 18

Large-Scale Empirical Comparison

  • Varying distances
  • Different data set sizes

[MPI Developmental Biology and UCSC Genome Browser]


slide-19
SLIDE 19

Experimental Setup

  • Source dataset size: always 100k examples
  • Target dataset sizes: {2500, 6500, 16000, 64000, 100000}
  • Simple kernel (WDK of degree 1 ⇒ under-fitting)
  • Extensive model selection for each method
  • Area under the precision/recall curve for evaluation


slide-20
SLIDE 20

Domain Adaptation Results Summary

  • Considerable improvements are possible
  • Sophisticated domain adaptation methods are needed for distantly related organisms
  • Best overall performance: DualTask
  • Most cost-effective: Convex/AdvancedConvex

[Schweikert et al., 2008]


slide-21
SLIDE 21

Domain Adaptation Methods

Idea [e.g., Caruana, 1997]:
  • Simultaneous optimization of both models
  • Similarity between the solutions enforced

Approach:

  min_{w_S, w_T, ξ}  1/2 ||w_S||² + 1/2 ||w_T||² − B w_Tᵀ w_S + C Σ_{i=1}^{m+n} ξ_i

  s.t.  y_i (⟨w_S, Φ(x_i)⟩ + b) ≥ 1 − ξ_i ,  i = 1, …, m
        y_i (⟨w_T, Φ(x_i)⟩ + b) ≥ 1 − ξ_i ,  i = m+1, …, m+n

Equivalent to learning with a multi-task kernel [Daume III, 2007]:

  K_MTK((x, t), (x′, t′)) = γ_{t,t′} K(x, x′)   for a suitably chosen p.s.d. Γ.
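The multi-task kernel can be assembled directly from a base Gram matrix; a minimal numpy sketch (variable names and the toy Γ are illustrative assumptions):

```python
import numpy as np

# Sketch: K_MTK((x,t),(x',t')) = Gamma[t,t'] * K(x,x') from a base Gram matrix
# K and a p.s.d. task-similarity matrix Gamma; tasks[i] is the task index of
# example i. The elementwise (Schur) product of two p.s.d. matrices is again
# p.s.d., so K_MTK is a valid kernel.
def multitask_gram(K, tasks, Gamma):
    t = np.asarray(tasks)
    return Gamma[np.ix_(t, t)] * K

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = X @ X.T                                  # linear base kernel
Gamma = np.array([[1.0, 0.5], [0.5, 1.0]])   # toy source/target similarity
K_mtk = multitask_gram(K, [0, 0, 1], Gamma)  # examples 0,1 in task S; 2 in T
```

Within-task entries keep the base kernel value; cross-task entries are scaled by γ_{S,T}.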



slide-23
SLIDE 23

Multiple Source Domains

  • Combine information from several sources (treated equally)
  • Methods: multi-task learning, convex combination, shifting


slide-24
SLIDE 24

Results - Multiple Source Domains

  • A single-source model is best for very closely related tasks
  • Multiple-source models are better for distantly related tasks
  • The multi-task algorithm is strongest

[Schweikert et al., 2008]

Multiple sources can be worse than a single source. How can we use information on relatedness during learning?



slide-27
SLIDE 27

Multitask learning

  • Hierarchical structure arises naturally from the Tree of Life
  • The taxonomy defines relationships between tasks
  • Closer tasks benefit more from each other

[Widmer et al., 2010a]


slide-28
SLIDE 28

Two ways of leveraging a given taxonomy T

K_MTL((x, t), (x′, t′)) = γ_{t,t′} K(x, x′)

[Widmer et al., 2010a]


slide-29
SLIDE 29

From Taxonomy to Γ

[Figure: taxonomy of the five tasks worm 1, worm 2, worm 3, fly, and plant, with divergence times of roughly 100, 400, 990, and 1600 million years before present]

Idea: γ_{i,j} should be inversely related to the time to the last common ancestor.

Strategies: 1/years, hop distance, …
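The 1/years strategy can be sketched directly. The divergence times below are illustrative readings from the slide's tree, not exact values, and the normalization is an assumption:

```python
import numpy as np

# Sketch: gamma_ij = 1 / (million years to last common ancestor), scaled so
# the closest task pair gets similarity 1; diagonal set to 1 by convention.
tasks = ["worm1", "worm2", "worm3", "fly", "plant"]
mya = {                       # illustrative divergence times (symmetric)
    ("worm1", "worm2"): 100,
    ("worm1", "worm3"): 400, ("worm2", "worm3"): 400,
    ("worm1", "fly"): 990, ("worm2", "fly"): 990, ("worm3", "fly"): 990,
    ("worm1", "plant"): 1600, ("worm2", "plant"): 1600,
    ("worm3", "plant"): 1600, ("fly", "plant"): 1600,
}
t_min = min(mya.values())
Gamma = np.eye(len(tasks))
for (a, b), t in mya.items():
    i, j = tasks.index(a), tasks.index(b)
    Gamma[i, j] = Gamma[j, i] = t_min / t   # inverse to divergence time
```

Closer relatives (the worms) end up with larger γ than the fly or the plant, which is exactly the ordering the slide asks for.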



slide-31
SLIDE 31

Hierarchical Top-Down Approach

Idea: Exploit the taxonomy T in a top-down fashion

  • Initialization: w_0 trained on the union of all task datasets
  • Top-down, for each node i: train on D_i = ∪_{j ⪯ i} D_j, regularizing w_i against the parent predictor w_parent:

      min_{w_i, b}  1/2 ||w_i − w_parent||² + C Σ_{(x,y)∈D_i} ℓ(⟨Φ(x), w_i⟩ + b, y)

  • Use the leaf predictors for classification
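The procedure above can be sketched compactly. As an assumption, a squared loss stands in for the slide's generic loss ℓ so each node has a closed-form solution; the tree, data, and C are toy values:

```python
import numpy as np

def leaves_below(node, children):
    """All leaf tasks in the subtree rooted at `node`."""
    kids = children.get(node, [])
    return [node] if not kids else [l for c in kids for l in leaves_below(c, children)]

def fit_toward_parent(X, y, w_parent, C=1.0):
    """argmin_w 1/2 ||w - w_parent||^2 + C/2 ||Xw - y||^2 (squared-loss stand-in)."""
    A = np.eye(X.shape[1]) + C * X.T @ X
    return np.linalg.solve(A, w_parent + C * X.T @ y)

def train_top_down(node, children, datasets, w_parent, out):
    """Train on the union of descendant datasets, shrinking toward the parent."""
    leaf_ids = leaves_below(node, children)
    X = np.vstack([datasets[t][0] for t in leaf_ids])
    y = np.concatenate([datasets[t][1] for t in leaf_ids])
    out[node] = fit_toward_parent(X, y, w_parent)
    for c in children.get(node, []):
        train_top_down(c, children, datasets, out[node], out)

# Toy taxonomy: root -> {taskA, taskB}
rng = np.random.default_rng(0)
children = {"root": ["taskA", "taskB"]}
datasets = {t: (rng.standard_normal((20, 3)), rng.choice([-1.0, 1.0], 20))
            for t in ["taskA", "taskB"]}
predictors = {}
train_top_down("root", children, datasets, np.zeros(3), predictors)
```

Each recursive call passes its own solution down as the next parent, so information flows from the root toward the leaves, matching panels (b) to (d) on the following slide's illustration.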


slide-32
SLIDE 32

Hierarchical Top-Down Approach: Illustration

[Figure panels: (a) given taxonomy, (b) top-level training, (c) intermediate training, (d) taxon training]


slide-33
SLIDE 33

Application to Splicing Data

  • Formulated as a binary classification problem
  • Uses 15 organisms related by a taxonomy
  • Restricted to at most 10,000 examples per organism



slide-35
SLIDE 35

Results: Splicing Data

Observations:
  • Union > Plain → conservation
  • Often: Union > Nearest
  • MTL methods outperform the baselines
  • Best performer: Top-Down (& MT-Kernel)


slide-36
SLIDE 36

Digestion

  • The hierarchy helps transfer information to the right places
  • The top-down approach transfers information most accurately
  • Performance depends strongly on the task similarity matrix; its choice is very difficult and not easily done with cross-validation
  • Can we learn, e.g., γ_{i,j} = f(“years of evolution between i and j”)?
  • Adaptive multi-task approach? ⇒ Multiple-Kernel Multi-Task Learning!



slide-40
SLIDE 40

MTL by Graph Regularization

The adjacency matrix A defines the MTL regularizer  [Evgeniou et al., 2005]

The regularizer can be expressed using the graph Laplacian L:

  1/2 Σ_{s=1}^T Σ_{t=1}^T A_{st} ||w_s − w_t||² = Σ_{s=1}^T Σ_{t=1}^T L_{st} ⟨w_s, w_t⟩ = Tr(W L Wᵀ)

where L = D − A and D_ii = Σ_j A_ij. (The factor 1/2 makes the two sides equal, since each pair (s, t) is counted twice in the double sum.)
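The identity is easy to verify numerically; this small self-check is an addition, not part of the original slides:

```python
import numpy as np

# Check: 1/2 * sum_{s,t} A_st ||w_s - w_t||^2 == Tr(W L W^T) with L = D - A.
rng = np.random.default_rng(0)
T, d = 4, 3
A = rng.random((T, T))
A = (A + A.T) / 2                      # symmetric task adjacency
np.fill_diagonal(A, 0.0)
W = rng.standard_normal((d, T))        # columns are the task weights w_t
L = np.diag(A.sum(axis=1)) - A         # graph Laplacian
lhs = 0.5 * sum(A[s, t] * np.sum((W[:, s] - W[:, t]) ** 2)
                for s in range(T) for t in range(T))
rhs = np.trace(W @ L @ W.T)
```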



slide-43
SLIDE 43

Beyond given task similarities

Question: How can we learn these similarities?


slide-44
SLIDE 44

Multitask Multiple Kernel Learning

General formulation:

  inf_{‖θ‖_p ≤ 1, W ∈ H}  1/2 Σ_{m=1}^M θ_m⁻¹ Tr(W_m Q_m W_mᵀ) + C Σ_{i=1}^N ℓ( y_i, Σ_{m=1}^M ⟨w_{m,τ(i)}, φ_m(x_i)⟩ )

Key properties:
  • We learn a weighted sum of M regularizers
  • Allows parameterizing the task similarity measure
  • Learning the weights θ is convex

Key contributions:
  • Subsume existing methods and derive new formulations
  • Establish the primal-dual relationship using Fenchel duality
  • Derive a dual coordinate descent solver

[Widmer et al. 2010, 2013, 2014 i.P.]



slide-46
SLIDE 46

Recover existing formulations

Existing methods are recovered by choosing Q, M, T, and φ:

  • Multiple kernel learning: T = 1, different φ_m  [Kloft et al., 2011]
  • Standard single-task SVM: M = 1, T = 1  [Boser et al., 1992]
  • Frustratingly easy domain adaptation: M = 1, T = 2, special Q  [Daume III, 2007]
  • Graph-regularized MTL: M = 1, Q = L  [Evgeniou et al., 2005]
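For the “frustratingly easy” special case, the corresponding Q amounts to Daume III's feature augmentation, which is simple enough to sketch (a toy illustration; the helper name is an assumption):

```python
import numpy as np

# Daume III (2007) feature augmentation: every example gets a shared copy of
# its features plus one domain-specific copy; all other blocks stay zero.
def augment(x, domain, n_domains=2):
    x = np.asarray(x, dtype=float)
    z = np.zeros((n_domains + 1) * x.size)
    z[:x.size] = x                            # shared block
    off = (1 + domain) * x.size
    z[off:off + x.size] = x                   # block for this domain only
    return z

src = augment([1.0, 2.0], domain=0)   # source example
tgt = augment([1.0, 2.0], domain=1)   # target example
```

The shared block gives cross-domain examples a nonzero inner product, while the per-domain blocks let each domain deviate from the shared solution.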



slide-51
SLIDE 51

Novel formulations

  • Graphs from different sources
  • Hierarchical decomposition
  • Transformations at different length scales, e.g. Q(m)_st = exp(A_st/σ_m)
  • Powerset of tasks
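The length-scale family of similarity matrices is one line each; a sketch with arbitrary σ values (the function name is an assumption):

```python
import numpy as np

# Sketch: a family of task-similarity matrices Q(m)_st = exp(A_st / sigma_m)
# at different length scales, to be combined via the learned MT-MKL weights.
def multiscale_Q(A, sigmas):
    return [np.exp(A / s) for s in sigmas]

A = np.array([[0.0, 0.8], [0.8, 0.0]])
Qs = multiscale_Q(A, sigmas=[0.5, 1.0, 2.0])
```

Small σ sharpens the contrast between similar and dissimilar task pairs; large σ flattens all entries toward 1.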



slide-55
SLIDE 55

Illustration on Detecting Gene Starts

[Figure: task-similarity matrices Q(m) at different length scales]

  • Initial task similarity A derived from conserved sequences
  • Different length scales to adapt the trade-off: Q(m)_st = exp(A_st/σ_m)

[Widmer et al. 2014 i.P.]


slide-56
SLIDE 56

Illustration on Detecting Gene Starts

Transcription start sites (TSS)

[Figure: TSS prediction performance for 11 of 32 organisms (D. melanogaster, R. norvegicus, R. latipes, D. simulans, G. max, C. sativus, A. thaliana, O. sativa, H. sapiens, D. pseudoobscura, C. briggsae) and the mean]

slide-57
SLIDE 57

An efficient solver

How can we solve this efficiently?


slide-58
SLIDE 58

Dual coordinate descent

We bring the state of the art from SVM solvers (LibLinear) to MTL:

  • Update one example weight α_i at a time
  • Keep the primal variable w = Σ_{i=1}^N α_i y_i x_i in memory
  • Use it to speed up the gradient computation:

    ∇_i f(α) = (Q̄α)_i − 1 = Σ_{j=1}^N y_i y_j k(x_i, x_j) α_j − 1    [O(N·D)]
             = y_i wᵀ x_i − 1                                        [O(D)]
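For a linear kernel the two gradient expressions coincide, which is the point of caching w. A quick numerical check (an addition, not the authors' code):

```python
import numpy as np

# Check that the cached-primal gradient y_i * w^T x_i - 1 (O(D)) equals the
# naive kernel-expansion gradient sum_j y_i y_j <x_i, x_j> alpha_j - 1 (O(N*D)).
rng = np.random.default_rng(1)
N, D = 50, 8
X = rng.standard_normal((N, D))
y = rng.choice([-1.0, 1.0], size=N)
alpha = rng.random(N)
w = (alpha * y) @ X                    # w = sum_i alpha_i y_i x_i
i = 7
grad_naive = sum(y[i] * y[j] * (X[i] @ X[j]) * alpha[j] for j in range(N)) - 1.0
grad_fast = y[i] * (w @ X[i]) - 1.0
```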


slide-59
SLIDE 59

Dual coordinate descent

Establish the primal-dual relationship using Fenchel duality:

  inf_{θ∈Θ_p} sup_{0 ≤ α ≤ C}  −1/2 Σ_{i,j=1}^N α_i α_j y_i y_j Σ_{m=1}^M θ_m Q⁻¹_{m,τ(i)τ(j)} k_m(x_i, x_j) + Σ_{i=1}^N α_i

  (each term Q⁻¹_{m,τ(i)τ(j)} k_m(x_i, x_j) defines the effective kernel k̃_m)

Representer theorem:

  w_{mt} = θ_m Σ_{i=1}^N Q⁻¹_{m,τ(i)t} α_i y_i φ_m(x_i)

Update rule (set the gradient to zero, solve for d, project):

  d⋆ = max( −α_i , min( C − α_i , (1 − y_i Σ_{m=1}^M θ_m ⟨w_{m,τ(i)}, φ_m(x_i)⟩) / (Σ_{m=1}^M θ_m k̃_m(x_i, x_i)) ) )

Caveat: after each step we need to update all the w_{mt}:

  w_{mt}^new := w_{mt}^old + d θ_m Q⁻¹_{m,τ(i)t} y_i φ_m(x_i)

[Widmer et al., 2012]


slide-60
SLIDE 60

Computational Experiments

                 dim      examples   tasks   time (s), MTK   time (s), DCD
  Gauss2D        2        1·10⁵      2       2·10³           4·10⁰
  Breast Cancer  44       474        3       1·10¹           2·10⁰
  MNIST-MTL      784      9.0·10³    3       3·10²           1·10²
  Land Mine      9        1.5·10⁴    29      8·10¹           2·10²
  Splicing       6·10⁶    6.4·10⁶    4       > 1·10⁶         9·10⁴

  • Different data set properties
  • Compared against an SVMLight-based solver on k̃
  • Fast convergence for a moderate number of tasks

[Widmer et al., 2012]



slide-62
SLIDE 62

Experiments (a): Splice-site recognition

  • The taxonomy is used to define a collection of meta-tasks I
  • Baselines: Plain, Union, Vanilla MTL
  • Best performance for norm q = 2, 3


slide-63
SLIDE 63

Experiments (b): MHC-I binding prediction

  Method   Plain   Union   Vanilla MTL   Powerset MT-MKL
  auPRC    67.1%   57.6%   67.9%         69.9%

  • No task structure used by Powerset MT-MKL
  • Question: can we identify meaningful structure?

[Widmer et al., 2010b]



slide-65
SLIDE 65

Experiments (b): MHC-I binding prediction

Learned weights can also be used for interpretation purposes:
  • Similarity computed from the meta-task weights
  • Comparison to the similarity between peptide sequences
  • Successfully identifies biologically meaningful structure


slide-66
SLIDE 66

Application to Imaging

Nucleus fitting, 2D to 3D:

  • Nucleus fitting: couple layers  [Widmer et al., 2014]
  • 2D helps 3D: couple 2D and 3D  [Lou et al., 2012]


slide-67
SLIDE 67

Extension to Structured Output Learning

  • The strategies act only on the regularizer, so they also work for structured output learning
  • Hidden Markov SVMs; gene prediction (prokaryotes)

[Figure: from DNA to pre-mRNA, mRNA, and protein, with UTRs, exons, introns, and intergenic regions]

[Widmer et al., 2011]


slide-68
SLIDE 68

Summary

Domain adaptation:
  • Considerable improvements possible
  • Sophisticated methods have a slight edge for distantly related tasks

Multitask learning:
  • Novel methods provide a scalable way of integrating information (implementations available for SVMLight and LibSVM)
  • Design of the task similarity matrix is critical and difficult

Recent extensions:
  • Estimation of an “optimal” task similarity matrix
  • Extension to structured output learning
  • Cleaner formulations; large-scale MTK-MT-SVMs

Material available at: www.raetschlab.org/suppl/transfer-learning



SLIDE 72

Acknowledgements

Christian Widmer

(MSKCC & TU Berlin, Microsoft)

Marius Kloft

(MSKCC, NYU, Humboldt U)

Vipin Sreedharan

(MSKCC)

Involved earlier: Gabriele Schweikert, Nico Görnitz, Nora Toussaint, Jose Leiva, Yasemin Altun, Bernhard Schölkopf

Funding by German Research Foundation, Max Planck Society & MSKCC.
Thank you for your attention!



SLIDE 74

Many algorithms implemented in Shogun toolbox (GPL, ≥ 1000 users)


SLIDE 75

Soon...


SLIDE 76

References I

Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Haussler, D., editor, Proc. Annual Conf. Computational Learning Theory, pages 144–152, Pittsburgh, PA. ACM Press.

Caruana, R. (1997). Multitask learning. Machine Learning, 28(1):41–75.

Daume III, H. (2007). Frustratingly easy domain adaptation. In Conference of the Association for Computational Linguistics (ACL), Prague, Czech Republic.

Evgeniou, T., Micchelli, C., and Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637.

Kloft, M., Brefeld, U., Sonnenburg, S., and Zien, A. (2011). Lp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997.

Lou, X., Widmer, C., Hadjantonakis, K., and Rätsch, G. (2012). Structured domain adaptation across imaging modalities: how 2D data helps 3D inference. In Proc. Workshop on Machine Learning in Computational Biology.

Schweikert, G., Widmer, C., Schölkopf, B., and Rätsch, G. (2008). An empirical analysis of domain adaptation algorithms. In Advances in Neural Information Processing Systems, NIPS, volume 22, Vancouver, B.C.

Widmer, C. (2014). Regularization-based Multitask Learning. PhD thesis, Berlin Institute of Technology.

Widmer, C., Görnitz, N., Zeller, G., and Rätsch, G. (2011). Hierarchical multitask structured output learning for large-scale sequence segmentation. Submitted.

Widmer, C., Heinrich, S., Drewe, P., Lou, X., Umrania, S., and Rätsch, G. (2014). Graph-regularized 3D shape reconstruction from highly anisotropic and noisy images. SIViP.

Widmer, C., Kloft, M., Görnitz, N., and Rätsch, G. (2012). Efficient training of graph-regularized multitask SVMs. In Proc. ECML.

Widmer, C., Leiva, J., Altun, Y., and Rätsch, G. (2010a). Leveraging sequence classification by taxonomy-based multitask learning. In Proc. RECOMB'10.

Widmer, C. and Rätsch, G. (2011). Transfer learning in computational biology. In Proc. ICML.

Widmer, C., Toussaint, N., Altun, Y., and Rätsch, G. (2010b). Inferring latent task structure for multi-task learning by multiple kernel learning. BMC Bioinformatics, 11(Suppl. 8):S5.
