

SLIDE 1

Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding (NNSE)

Brian Murphy, Partha Talukdar, Tom Mitchell
Machine Learning Department, Carnegie Mellon University

SLIDE 2

Distributional Semantic Modeling

  • Words are represented in a high-dimensional vector space
  • Long history: (Deerwester et al., 1990), (Lin, 1998), (Turney, 2006), ...
  • Although effective, these models are often not interpretable

[Figure: examples of the top 5 words from 5 randomly chosen dimensions of SVD300]

SLIDE 3

Why interpretable dimensions?

Semantic Decoding (Mitchell et al., Science 2008):

Input stimulus word “motorbike” → semantic representation: (0.87, ride), (0.29, see), (0.00, rub), (0.00, taste), ... → mapping learned from fMRI data → predicted activity for “motorbike”
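The decoding pipeline above is linear: a mapping, learned from fMRI data, takes a word's semantic feature vector to a predicted brain image. A minimal sketch of that idea in the spirit of Mitchell et al. (2008); the random data and the use of ridge regression are illustrative assumptions, not the authors' code:

```python
# Illustrative sketch (not the authors' code): learn a linear map from
# semantic feature vectors to fMRI voxel activations, in the spirit of
# Mitchell et al. (Science 2008). Data here is random stand-in data.
import numpy as np
from sklearn.linear_model import Ridge

n_words, n_feats, n_voxels = 60, 25, 20000   # 60 nouns, 25 verb-based features
F = np.random.rand(n_words, n_feats)         # semantic representation per stimulus word
Y = np.random.randn(n_words, n_voxels)       # observed fMRI image per stimulus word

model = Ridge(alpha=1.0).fit(F, Y)           # one linear map to all voxels jointly

def predict_activity(feature_vec):
    """Predicted whole-brain activity for a new word's semantic vector."""
    return model.predict(feature_vec.reshape(1, -1))[0]
```

In the original study the features were co-occurrence statistics with 25 hand-picked verbs, which is exactly the limitation the next slides raise.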

SLIDE 4

Why interpretable dimensions?

(Mitchell et al., Science 2008)

  • Interpretable dimensions reveal insightful brain activation patterns!
  • But the features in the semantic representation were based on 25 hand-selected verbs
  • can’t represent arbitrary concepts
  • need data-driven, broad-coverage semantic representations

SLIDE 5

What is an Interpretable, Cognitively-plausible Representation?

A words × features matrix, where the row for word w looks like: ... 0 0 0 ... 1.2 ...

  • 1. Compact representation: sparse, many zeros
  • 2. Uneconomical to store negative (or inferable) characteristics: non-negative
  • 3. Meaningful dimensions: coherent (a toy illustration follows)
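Taken together, these properties suggest a natural data structure: each word stored as a small map from dimensions to positive weights. A toy illustration; the dimension labels and weights are invented:

```python
# Hypothetical illustration of a sparse, non-negative word representation:
# only the few active (positive) dimensions are stored; everything else is 0.
sparse_repr = {
    "motorbike": {"vehicles": 1.2, "riding": 0.8},   # invented labels/weights
    "apple":     {"fruit": 0.9, "companies": 0.3},
}

def similarity(w1, w2, reprs=sparse_repr):
    """Dot product over shared active dimensions only."""
    a, b = reprs[w1], reprs[w2]
    return sum(v * b[k] for k, v in a.items() if k in b)
```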

SLIDE 6

Properties of Different Semantic Representations

[Comparison table. Columns: Broad Coverage, Effective, Sparse, Non-Negative, Coherent. Rows: Hand Tuned; Corpus-derived (existing); NNSE (our proposal). The per-cell marks were graphical and do not survive in this transcript.]

Prediction accuracy (on Neurosemantic Decoding), (Murphy, Talukdar, Mitchell, StarSem 2012):

  Hand Coded      83.1
  Corpus-derived  83.5

SLIDE 7

Non-Negative Sparse Embedding (NNSE)

X = A × D

  • X: input matrix (corpus co-occurrence + SVD); row wi holds the input representation for word wi
  • A: row wi holds the NNSE representation for word wi; matrix A is non-negative
  • D: the basis
  • sparsity penalty on the rows of A
  • alternating minimization between A and D, using the SPAMS package (a stand-in sketch follows)
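The slide names the SPAMS package for this factorization. As a self-contained stand-in (my substitution, not the authors' setup), scikit-learn's dictionary learning can impose the same constraints: non-negative codes with an L1 penalty and a norm-bounded basis:

```python
# Sketch of the X ~ A x D factorization with non-negative, sparse codes A.
# scikit-learn is used as a stand-in for SPAMS; shapes and hyperparameters
# below are illustrative, not the paper's.
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(1000, 300)          # stand-in for the 35k-word x 300-dim SVD input

dl = DictionaryLearning(
    n_components=100,                   # 1000 in the talk's NNSE1000
    alpha=0.1,                          # L1 sparsity penalty on rows of A
    positive_code=True,                 # A >= 0
    fit_algorithm="cd",
    transform_algorithm="lasso_cd",
    max_iter=20,
)
A = dl.fit_transform(X)                 # A: 1000 x 100 sparse, non-negative codes
D = dl.components_                      # D: 100 x 300 basis with unit-norm rows
```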

SLIDE 8

NNSE Optimization

X = A × D is found by solving:

  argmin_{A,D}  Σ_{i=1..m}  || X_{i,:} − A_{i,:} × D ||²  +  λ || A_{i,:} ||₁

  subject to:  D_{j,:} D_{j,:}ᵀ ≤ 1 for all j,  and  A_{i,j} ≥ 0 for all i, j
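A from-scratch sketch of the alternating minimization, assuming simple proximal/projected gradient steps for each block; the real system used the SPAMS solver, and all constants here are placeholders:

```python
# Minimal sketch of alternating minimization for X ~ A.D under A >= 0,
# an L1 penalty on rows of A, and row norms of D bounded by 1.
import numpy as np

def nnse(X, k=10, lam=0.1, n_outer=50, n_inner=25):
    m, d = X.shape
    rng = np.random.default_rng(0)
    A = np.abs(rng.standard_normal((m, k))) * 0.01
    D = rng.standard_normal((k, d))
    D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1.0)

    for _ in range(n_outer):
        # A-step: non-negative ISTA (soft-threshold by lam/L, then clip at 0)
        L = np.linalg.norm(D @ D.T, 2) + 1e-8           # Lipschitz constant
        for _ in range(n_inner):
            grad = (A @ D - X) @ D.T
            A = np.maximum(A - (grad + lam) / L, 0.0)
        # D-step: gradient step, then project each row onto the unit ball
        LD = np.linalg.norm(A.T @ A, 2) + 1e-8
        D -= (A.T @ (A @ D - X)) / LD
        D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1.0)
    return A, D

A, D = nnse(np.random.randn(200, 50), k=10)
```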

SLIDE 9

Experiments

  • Three main questions:
  • 1. Are NNSE representations effective in practice?
  • 2. What is the degree of sparsity of NNSE?
  • 3. Are NNSE dimensions coherent?
  • Setup:
  • partial ClueWeb09: 16bn tokens, 540m sentences, 50m documents
  • dependency parsed using the Malt parser

SLIDE 10

Baseline Representation: SVD

  • For about 35k words (~adult vocabulary), extract:
  • document co-occurrence
  • dependency features from the parsed corpus
  • Reduce dimensionality using SVD; subsets of this reduced-dimensional space are the baseline (a toy sketch follows)
  • This is also the input (X) to NNSE
  • Other representations were also compared (e.g., LDA, Collobert and Weston); details in the paper
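A compact sketch of that pipeline for the document co-occurrence features alone (toy corpus; the talk's version is ClueWeb09-scale, adds dependency features, and its weighting scheme is not specified here):

```python
# Sketch: word-by-document co-occurrence counts reduced with truncated SVD.
# The toy corpus and tiny k are illustrative stand-ins for the ~35k-word
# pipeline the slides describe (SVD300 in the talk).
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import svds

docs = [
    ["the", "apple", "fell"],
    ["apple", "makes", "phones"],
    ["she", "ate", "the", "apple"],
]
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}

counts = lil_matrix((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for w in doc:
        counts[vocab[w], j] += 1

U, s, Vt = svds(counts.tocsr(), k=2)    # k=300 in the talk
X = U * s                               # one reduced row per word: the input to NNSE
```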

SLIDE 11

Is NNSE effective in Neurosemantic Decoding?

NNSE has similar peak performance to SVD

SLIDE 12

Does NNSE result in a sparse semantic representation?

  • NNSE is significantly sparser than SVD
  • Words per dimension is significantly lower in NNSE
  • Growth in active dimensions per word is sub-linear in NNSE (statistics sketched below)
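The statistics behind those claims are straightforward to compute from the code matrix A; a sketch with an artificial matrix:

```python
# Sketch: sparsity statistics over a code matrix A (rows = words,
# columns = dimensions). The matrix and threshold are illustrative.
import numpy as np

A = np.random.rand(1000, 100) * (np.random.rand(1000, 100) < 0.05)
active = A > 1e-8

dims_per_word = active.sum(axis=1).mean()   # avg active dimensions per word
words_per_dim = active.sum(axis=0).mean()   # avg words active in a dimension
sparsity = 1.0 - active.mean()              # fraction of zero entries
print(dims_per_word, words_per_dim, sparsity)
```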

SLIDE 13

Are NNSE Dimensions Coherent?

  • Word Intrusion Task (Boyd-Graber et al., NIPS 2009):
  • for each dimension, select the top-ranked N (N=5) words
  • intrude it with a low-ranking word from this dimension
  • the intruding word should be high-ranking in another dimension
  • ask a human to identify the intruding word
  • repeat multiple times for each dimension, and calculate precision
  • An intruded set from an NNSE1000 dimension was shown on the slide, with the intruder highlighted (trial construction sketched below)
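A minimal sketch of assembling one intrusion trial from the code matrix; how "low ranking" is operationalized here is my assumption:

```python
# Sketch of one word-intrusion trial for a dimension, following the recipe
# on the slide; ranking and selection details are illustrative choices.
import numpy as np

def intrusion_trial(A, vocab, dim, n_top=5, rng=np.random.default_rng(0)):
    """Top n_top words of `dim` plus one intruder that ranks low in `dim`
    but tops some other dimension. A real implementation would handle the
    (rare) case where no such candidate exists."""
    order = np.argsort(-A[:, dim])                 # words ranked for this dimension
    top = [vocab[i] for i in order[:n_top]]
    low_half = set(order[len(order) // 2:])        # low-ranking in this dimension
    other_tops = {np.argmax(A[:, j]) for j in range(A.shape[1]) if j != dim}
    intruder = next(i for i in other_tops if i in low_half)
    items = top + [vocab[intruder]]
    rng.shuffle(items)                             # hide the intruder's position
    return items, vocab[intruder]

# precision = (# trials where the human picks the true intruder) / (# trials)
```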

SLIDE 14

Are NNSE Dimensions Coherent?

NNSE dimensions are significantly more coherent than SVD-based dimensions

SLIDE 15

SVD and NNSE Dimensions: Examples

[Table: examples of the top 5 words from 5 randomly chosen dimensions from SVD300 and NNSE1000]

NNSE dimensions are significantly more coherent than SVD-based dimensions

SLIDE 16

Top 5 NNSE1000 Dimensions for “apple”

  0.40  raspberry, peach, pear, mango, melon             (Fruit)
  0.26  ripper, aac, converter, vcd, rm                  (Music)
  0.14  cpu, intel, mips, pentium, risc                  (Processor)
  0.13  motorola, lg, samsung, vodafone, alcatel         (Company)
  0.11  peaches, apricots, pears, cherries, blueberries  (Fruit)

Different senses of the word are not mixed: each dimension corresponds to only one sense of “apple”!
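Producing a summary like this from the factorization takes only a pair of argsorts over A; a sketch, assuming A is words × dimensions and vocab is a list of words:

```python
# Sketch: for a query word, list its top-weighted dimensions and, for each,
# that dimension's top words. A and vocab are assumed inputs.
import numpy as np

def describe(word, A, vocab, n_dims=5, n_words=5):
    w = vocab.index(word)
    for dim in np.argsort(-A[w])[:n_dims]:          # word's strongest dimensions
        top = np.argsort(-A[:, dim])[:n_words]      # that dimension's top words
        print(f"{A[w, dim]:.2f}", ", ".join(vocab[i] for i in top))
```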

SLIDE 17

Conclusion

  • Non-Negative Sparse Embedding (NNSE)
  • broad coverage, sparse, non-negative
  • interpretable, effective in practice
  • probably the first semantic model with all these desirable traits
  • Exploited a large text corpus, including deep linguistic features (e.g., dependency parses)
  • Future work
  • multi-word extension; using NNSE representations in non-neurosemantic domains (e.g., NELL)

SLIDE 18

Acknowledgments

  • Khaled El-Arini (CMU) and Min Xu (CMU)
  • Sheshadri Sridharan (CMU), Marco Baroni (Trento)
  • Justin Betteridge (CMU), Jamie Callan (CMU)
  • DARPA, Google, NIH

SLIDE 19

Thank You!

NNSE embeddings available at: http://www.cs.cmu.edu/~bmurphy/NNSE/