slide-1
SLIDE 1

Nonparametric combinatorial sequence models

Fabian L. Wauthier, UC Berkeley with Nebojsa Jojic (MSR) and Michael I. Jordan (UCB) 30th March, 2011

Fabian L. Wauthier: Nonparametric combinatorial sequence models, 1

slide-8
SLIDE 8

Biological motivation: Sequence variability

Y N Q S E D G S H T I Q I M Y G C D
Y N Q S E A G S H T L Q R M Y G C D
Y N Q S E A G S H I I Q R M Y G C D

◮ Suppose we are given aligned sequences.
◮ Interest in understanding sequence variability:
  • Functional properties, domains, ancestral inference, etc.
◮ Many simplifying assumptions in previous work:
  • Site independence: Kingman coalescents, phylogenetic trees.
  • Full site dependence: Mixture models.
  • Sequential stochastic process: HMMs, changepoint models.

Our interest: sequences where these assumptions do not hold
◮ Partial, long-range site dependencies
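
To make "sequence variability" concrete for the three aligned sequences on this slide, the following minimal sketch lists the columns where the sequences disagree. The function name `variable_sites` and the 0-based indexing are illustrative choices, not part of the talk.

```python
# The three aligned sequences from the slide, one string per sequence.
alignment = [
    "YNQSEDGSHTIQIMYGCD",
    "YNQSEAGSHTLQRMYGCD",
    "YNQSEAGSHIIQRMYGCD",
]


def variable_sites(seqs):
    """Return the 0-based column indices where aligned sequences disagree."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    # A column is variable if it contains more than one distinct symbol.
    return [j for j in range(length) if len({s[j] for s in seqs}) > 1]


sites = variable_sites(alignment)
print(sites)  # -> [5, 9, 10, 12]
```

The variable columns are not adjacent to one another, which is the kind of discontiguous pattern the talk is concerned with.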

slide-15
SLIDE 15

Example: MHC I proteins

Freeman and Company, 2007

◮ MHC I proteins present peptide chains to T-cell receptors.
◮ Peptides originating from virus protein ⇒ destruction of cell.
◮ Variability: duplication + mutation + fitness pressure.

Our interest: model sequence variability, not its origins.

slide-20
SLIDE 20

Example: MHC I proteins

Freeman and Company, 2007

◮ Binding site decomposes into pockets (Sidney et al., 2008)
  Expect partial site linkage.
  ⇒ Full site (in)dependence inappropriate
◮ Variability due to evolutionary pressure on 3D binding site.
  Variable sites are discontiguous ⇒ long-range dependencies.
  ⇒ Markovian analysis inappropriate

slide-26
SLIDE 26

Our model: high level

Main idea: Each sequence is composed of smaller components.

  1. Sites grouped into discontiguous, aligned components (gray).
  2. Components of a sequence assigned a PSSM (position-specific scoring matrix; colors).
  3. Symbols sampled from assigned PSSMs.

Cf. Probabilistic index map (Jojic and Caspi, CVPR 2004; Jojic et al., UAI 2004)
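
The three steps above can be sketched as a toy generative procedure. This is only an illustration under stated assumptions: the site grouping, the number of PSSMs, the four-letter alphabet, and all names (`sample_sequence`, `site_groups`, `pssm_choice`) are hypothetical, and the grouping and PSSM assignment are fixed by hand here rather than drawn from a prior.

```python
import random

ALPHABET = "ACGT"  # toy alphabet (assumption); protein data would use 20 amino acids


def sample_sequence(site_groups, pssm_choice, pssms, rng):
    """Draw one sequence given a site grouping and one PSSM per group.

    site_groups: list of lists of 0-based site indices (groups may be discontiguous)
    pssm_choice: pssm_choice[g] is the index of the PSSM used by group g
    pssms:       pssms[k][site] is a list of symbol probabilities over ALPHABET
    """
    n_sites = sum(len(group) for group in site_groups)
    seq = [None] * n_sites
    for g, sites in enumerate(site_groups):
        pssm = pssms[pssm_choice[g]]          # step 2: the group's assigned PSSM
        for site in sites:                    # step 3: sample a symbol per site
            seq[site] = rng.choices(ALPHABET, weights=pssm[site])[0]
    return "".join(seq)


# Step 1 (hypothetical grouping): seven sites in three discontiguous groups.
site_groups = [[0, 3], [2, 4, 6], [1, 5]]
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in range(7)]
mostly_a = [[0.85, 0.05, 0.05, 0.05] for _ in range(7)]
seq = sample_sequence(site_groups, [1, 0, 1], [uniform, mostly_a], random.Random(0))
print(seq)
```

Different sequences can reuse the same PSSM for a group, which is how components end up shared across sequences.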

slide-31
SLIDE 31

Missing information

Do not know how many site groups/PSSMs there are!

◮ Our approach: put a prior distribution on these unknowns
◮ Our model: A Chinese Restaurant Franchise (CRF) conditioned on a Chinese Restaurant Process (CRP)
  1. CRP: induces prior on number of site groups.
  2. CRF: induces prior on number of PSSMs and shares them among sequences.

slide-36
SLIDE 36

Background: The Chinese Restaurant Process (CRP)

  • Culinary metaphor: Datapoints are customers; tables are clusters

◮ First customer sits at the first table
◮ Subsequent customers
  • choose a table with probability proportional to the number of customers sitting at it,
  • or with small probability open a new table.
◮ Key point: The number of tables is random and inferred.
◮ For us: Infer the number of site groups.
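
The seating rule above can be simulated directly. In this minimal sketch the "small probability" of opening a new table is made precise in the usual way, with a concentration parameter `alpha`: a new table gets weight `alpha` and an occupied table gets weight equal to its customer count. The function name and return format are illustrative assumptions.

```python
import random


def sample_crp(n_customers, alpha, rng):
    """Seat customers one at a time; return (table index per customer, table sizes)."""
    counts = []        # counts[t] = number of customers currently at table t
    assignment = []
    for _ in range(n_customers):
        # Occupied table t has weight counts[t]; a new table has weight alpha.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)       # open a new table
        else:
            counts[table] += 1     # join an existing table
        assignment.append(table)
    return assignment, counts


assignment, counts = sample_crp(20, alpha=1.0, rng=random.Random(0))
print(assignment)
print(len(counts), "tables with sizes", counts)
```

The number of tables is not fixed in advance; it is an outcome of the random seating, and larger `alpha` tends to produce more tables.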

slide-43
SLIDE 43

Background: The Chinese Restaurant Franchise (CRF)

◮ CRF = “multiple coupled CRPs”
◮ One restaurant per dataset. Customers seated by CRP rules.
◮ Global menu of “dishes”
◮ Each newly opened table assigned a dish
  • First table assigned first dish in menu
  • Subsequent tables
    ◮ assigned a dish with probability proportional to the number of past tables that were assigned that dish,
    ◮ or with small probability assigned a new dish at random.
◮ Key point: The number of distinct dishes and sharing pattern is random and inferred.
◮ For us: Infer the number of PSSMs and sharing pattern.
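
The franchise rules above can be sketched in the same style as a plain CRP simulation. The parameters `alpha` (weight of a new table within a restaurant) and `gamma` (weight of a new dish on the global menu) and the function name `sample_crf` are assumptions made for this illustration. Restaurants are filled one after another for simplicity.

```python
import random


def sample_crf(customers_per_restaurant, alpha, gamma, rng):
    """Simulate a Chinese Restaurant Franchise with a shared, global dish menu."""
    dish_counts = []   # dish_counts[d] = number of tables, in any restaurant, serving dish d
    franchise = []
    for n_customers in customers_per_restaurant:
        table_counts, table_dish, seating = [], [], []
        for _ in range(n_customers):
            # Within a restaurant, customers are seated by the CRP rule.
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):
                # Newly opened table: pick a dish in proportion to how many
                # past tables serve it, or a new dish with weight gamma.
                dish_weights = dish_counts + [gamma]
                d = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if d == len(dish_counts):
                    dish_counts.append(0)
                dish_counts[d] += 1
                table_counts.append(0)
                table_dish.append(d)
            table_counts[t] += 1
            seating.append(t)
        franchise.append({"seating": seating, "table_dish": table_dish})
    return franchise, dish_counts


franchise, dish_counts = sample_crf([10, 10, 10], alpha=1.0, gamma=1.0, rng=random.Random(0))
for r, restaurant in enumerate(franchise):
    print("restaurant", r, "dishes per table:", restaurant["table_dish"])
```

Because all restaurants draw from one menu, the same dish can appear in several restaurants, which is the sharing pattern referred to on the slide.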

slide-60
SLIDE 60

Our model: Cartoon

[Figure, built up over several slides: a cartoon of the model for three sequences s1, s2, s3 over sites 1 2 3 4 5 6 7. The sites are also shown regrouped in the order 1 4 3 5 7 2 6. Labels in the figure: “CRP: site groups = linkage”, “CRF: secondary site grouping”, “Parameters = PSSMs”. In the final steps the symbols A C A T A C C appear one at a time.]

slide-64
SLIDE 64

Inference

◮ Inference algorithm: collapsed Gibbs sampler.
◮ Varying hyperparameters varies posterior model “complexity.”
◮ Given posterior complexity, compare average model likelihood with that of a mixture model with similar complexity.
◮ Look at three datasets: blue = our model, red = mixture.

[Figure: three panels (MHC I, Flu, KIR), each plotting log-likelihood (Loglik) against Complexity.]
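
The slides do not spell out the sampler, so the following is not the authors' algorithm. It is a minimal sketch of collapsed Gibbs sampling for a much simpler model: a single CRP mixture that clusters whole sequences, with per-site categorical parameters integrated out under a symmetric Dirichlet prior. The hyperparameters `alpha` and `beta` and all function names are assumptions for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def gibbs_sweep(seqs, z, alpha, beta, alphabet, rng):
    """One collapsed Gibbs sweep over the cluster labels z (modified in place)."""
    n_sites, a = len(seqs[0]), len(alphabet)
    for i, seq in enumerate(seqs):
        z[i] = None  # remove sequence i from its current cluster
        labels = sorted({c for c in z if c is not None})
        weights = []
        for c in labels:
            members = [s for s, zc in zip(seqs, z) if zc == c]
            w = len(members)  # CRP prior weight: size of the cluster
            for j in range(n_sites):
                # Dirichlet-categorical predictive probability of seq[j] at site j.
                count = sum(1 for s in members if s[j] == seq[j])
                w *= (count + beta) / (len(members) + a * beta)
            weights.append(w)
        # A new cluster has prior weight alpha and a uniform predictive per site.
        weights.append(alpha * (1.0 / a) ** n_sites)
        new_label = max(labels, default=-1) + 1
        z[i] = rng.choices(labels + [new_label], weights=weights)[0]
    return z


seqs = ["YNQSEDGSHTIQIMYGCD", "YNQSEAGSHTLQRMYGCD", "YNQSEAGSHIIQRMYGCD"]
z = [0, 0, 0]
rng = random.Random(0)
for _ in range(10):
    gibbs_sweep(seqs, z, alpha=1.0, beta=0.5, alphabet=AMINO_ACIDS, rng=rng)
print(z)
```

In this simplified setting, raising `alpha` tends to yield more clusters, which is one way hyperparameters can control posterior complexity.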

slide-69
SLIDE 69

Predicting phenotypic quantities

◮ Let binary vector $m_{ik}$ encode latent variables of sequence $s_i$ for posterior sample $k$.
  • $m_{ik}$: which sequence position was assigned which PSSM
◮ Similar sequences have similar encodings; can use $m_{i\cdot}$ to share phenotype information.
◮ Example: binding affinities of MHC I proteins
  • Peptide encoding $p_j$, affinity $y_{ij}$

    $$y_{ij} = p_j^\top \Theta_k m_{ik} = \operatorname{trace}(\Theta_k m_{ik} p_j^\top).$$

  • Learn $\Theta_k$ for each sample $k$. Then average predictions.
  • Predict binding/non-binding peptides ⇒ AUC score
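
The bilinear form and its trace rewriting can be checked numerically. In this sketch the vectors and matrices are small made-up integers, not data from the talk, and learning the matrices is not shown. The helper names `bilinear_score`, `trace_form`, and `averaged_prediction` are hypothetical.

```python
def bilinear_score(p, theta, m):
    """Compute p^T Theta m for lists p (length P), m (length M), theta (P x M)."""
    return sum(p[a] * theta[a][b] * m[b]
               for a in range(len(p)) for b in range(len(m)))


def trace_form(p, theta, m):
    """Compute trace(Theta m p^T); the a-th diagonal entry is (Theta m)[a] * p[a]."""
    theta_m = [sum(theta[a][b] * m[b] for b in range(len(m)))
               for a in range(len(p))]
    return sum(theta_m[a] * p[a] for a in range(len(p)))


def averaged_prediction(p, thetas, ms):
    """Average the per-sample scores over posterior samples k."""
    scores = [bilinear_score(p, theta, m) for theta, m in zip(thetas, ms)]
    return sum(scores) / len(scores)


p = [1, 0, 2]                          # toy peptide encoding (made up)
theta_1 = [[1, 2], [3, 4], [5, 6]]     # toy weights for sample k = 1 (made up)
theta_2 = [[0, 0], [0, 0], [0, 0]]     # toy weights for sample k = 2 (made up)
m_1, m_2 = [1, 1], [1, 0]              # toy binary encodings of one sequence

print(bilinear_score(p, theta_1, m_1))                     # -> 25
print(trace_form(p, theta_1, m_1))                         # -> 25
print(averaged_prediction(p, [theta_1, theta_2], [m_1, m_2]))  # -> 12.5
```

The two forms agree because the diagonal of the outer product picks out exactly the terms of the inner product.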

slide-76
SLIDE 76

Results

◮ Information transfer

  Method          AUC
  Independent     0.8290
  Transfer only   0.7285
  Random          0.5

◮ Compare with state of the art (SOA) (Peters et al., 2006):

  Method          AUC
  MAP             0.8378
  Averaging       0.8911
  SOA             0.85–0.91

◮ Similar performance, but use only limited information: no spatial proximity, chemical properties, interaction features, nonlinearities.
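
For readers unfamiliar with the metric in these tables, AUC can be computed as the fraction of (binder, non-binder) pairs that the scores rank correctly, with ties counted as half. The sketch below uses made-up scores and labels, not the talk's data, and the function name `auc` is an illustrative choice.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels are 1 or 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count correctly ordered positive/negative pairs; ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]                      # made-up binder / non-binder labels
print(auc([0.9, 0.8, 0.4, 0.3], labels))   # -> 0.75
print(auc([1.0, 1.0, 1.0, 1.0], labels))   # -> 0.5
```

A constant, uninformative score gives 0.5, which matches the "Random" row in the table.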

slide-77
SLIDE 77

Questions?
