SLIDE 1

Kai-Wei Chang

Joint work with Scott Wen-tau Yih, Chris Meek

Microsoft Research

SLIDE 2

Build an intelligent system that can interact with humans using natural language.

Research challenge: a meaning representation of text that supports useful inferential tasks.

Semantic word representation is the foundation: language is compositional, and the word is the basic semantic unit.

SLIDE 3

A lot of popular methods for creating word vectors!

Vector Space Model [Salton & McGill 83] Latent Semantic Analysis [Deerwester+ 90] Latent Dirichlet Allocation [Blei+ 01] Deep Neural Networks [Collobert & Weston 08]

They encode term co-occurrence information and measure semantic similarity well.

SLIDE 4

[Illustration: word clusters such as weather (sunny, rainy, windy, cloudy), vehicles (car, wheel, cab), and emotions (sad, joy, emotion, feeling)]

SLIDE 5

Tomorrow will be rainy. Tomorrow will be sunny.

π‘‘π‘—π‘›π‘—π‘šπ‘π‘ (rainy, sunny)? π‘π‘œπ‘’π‘π‘œπ‘§π‘›(rainy, sunny)?

SLIDE 6

Can't we just use the existing linguistic resources?

Knowledge in these resources is never complete, and they often lack the degree of a relation.

Create a continuous semantic representation that

Leverages existing rich linguistic resources, discovers new relations, and enables us to measure the degree of multiple relations (not just similarity).

SLIDE 7

Introduction

Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)

Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation

Experiments

SLIDE 8

Introduction

Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)

Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation

Experiments

SLIDE 9

Data representation

Encode single-relational data in a matrix

Co-occurrence (e.g., from a general corpus) or synonyms (e.g., from a thesaurus)

Factorization

Apply SVD to the matrix to find latent components

Measuring degree of relation

Cosine of latent vectors

SLIDE 10

Input: synonyms from a thesaurus
Joyfulness: joy, gladden
Sad: sorrow, sadden

                         joy   gladden   sorrow   sadden   goodwill
Group 1: "joyfulness"     1       1
Group 2: "sad"                               1        1
Group 3: "affection"                                             1

Target word: row vector.  Term: column vector.  Similarity: cosine score.

SLIDE 11

SVD generalizes the original data:

Uncovers relationships not explicit in the thesaurus; term vectors are projected into a k-dimensional latent space.

X  ≈  U Σ Vᵀ        (d×n  ≈  (d×k)(k×k)(k×n); the columns of X correspond to terms)

Word similarity: cosine of two column vectors in Σ Vᵀ.
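The LSA pipeline above (build the group-by-term matrix, factor it with a truncated SVD, compare terms by the cosine of their latent vectors) can be sketched in a few lines of numpy. This is a minimal illustrative sketch using the slide's toy thesaurus groups; the rank k and the tiny vocabulary are arbitrary choices, not the actual experimental setup.

```python
# Toy LSA sketch (numpy only): build the group-by-term matrix from the slide,
# take a rank-k SVD, and compare terms by the cosine of their columns in
# Sigma @ Vt (their k-dimensional latent vectors).
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
groups = {
    "joyfulness": ["joy", "gladden"],
    "sad":        ["sorrow", "sadden"],
    "affection":  ["goodwill"],
}

X = np.zeros((len(groups), len(terms)))          # rows: target-word groups, columns: terms
for i, members in enumerate(groups.values()):
    for w in members:
        X[i, terms.index(w)] = 1.0

k = 2                                            # dimensionality of the latent space
U, s, Vt = np.linalg.svd(X, full_matrices=False)
latent = np.diag(s[:k]) @ Vt[:k, :]              # Sigma Vt: one latent column per term

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sim(w1, w2):
    return cosine(latent[:, terms.index(w1)], latent[:, terms.index(w2)])

print(sim("joy", "gladden"))   # high: same synonym group
print(sim("joy", "sadden"))    # low, but never clearly negative: plain LSA cannot mark antonyms
```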

SLIDE 12

LSA cannot distinguish antonyms [Landauer 2002].

"Distinguishing synonyms and antonyms is still perceived as a difficult open problem." [Poon & Domingos 09]

SLIDE 13

Data representation

Encode two opposite relations in a matrix using "polarity"

Synonyms & antonyms (e.g., from a thesaurus)

Factorization

Apply SVD to the matrix to find latent components

Measuring degree of relation

Cosine of latent vectors

SLIDE 14

Joyfulness: joy, gladden; sorrow, sadden
Sad: sorrow, sadden; joy, gladden

Inducing polarity: the antonyms (listed after the semicolon) receive negative entries.

                         joy   gladden   sorrow   sadden   goodwill
Group 1: "joyfulness"     1       1        -1       -1
Group 2: "sad"           -1      -1         1        1
Group 3: "affection"                                             1

Target word: row vector.  Cosine score: positive for synonyms.

SLIDE 15

Joyfulness: joy, gladden; sorrow, sadden
Sad: sorrow, sadden; joy, gladden

Inducing polarity: the antonyms (listed after the semicolon) receive negative entries.

                         joy   gladden   sorrow   sadden   goodwill
Group 1: "joyfulness"     1       1        -1       -1
Group 2: "sad"           -1      -1         1        1
Group 3: "affection"                                             1

Target word: row vector.  Cosine score: negative for antonyms.
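A minimal sketch of the polarity trick, assuming the same toy vocabulary: antonyms are entered as -1, and after the SVD the sign of the cosine separates synonyms from antonyms. The group contents and the rank are illustrative only.

```python
# Toy PILSA sketch: same matrix, but the antonyms listed for each group are
# entered as -1 ("inducing polarity").  After the SVD, the sign of the cosine
# separates synonyms (+) from antonyms (-); the magnitude gives the degree.
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
rows = {  # group -> {term: +1 for synonyms, -1 for antonyms}
    "joyfulness": {"joy": 1, "gladden": 1, "sorrow": -1, "sadden": -1},
    "sad":        {"joy": -1, "gladden": -1, "sorrow": 1, "sadden": 1},
    "affection":  {"goodwill": 1},
}

X = np.zeros((len(rows), len(terms)))
for i, entries in enumerate(rows.values()):
    for w, polarity in entries.items():
        X[i, terms.index(w)] = polarity

k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
latent = np.diag(s[:k]) @ Vt[:k, :]

def score(w1, w2):
    a, b = latent[:, terms.index(w1)], latent[:, terms.index(w2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(score("joy", "gladden"))  # positive: synonym-like
print(score("joy", "sadden"))   # negative: antonym-like
```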

SLIDE 16

Limitation of the matrix representation

Each entry captures a particular type of relation between two entities, or two opposite relations with the polarity trick.

Encoding other binary relations:

Is-A (hyponym): an ostrich is a bird.  Part-whole: an engine is a part of a car.

Encode multiple relations in a 3-way tensor (a 3-dimensional array)!

SLIDE 17

Data representation

Encode multiple relations in a tensor

Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)

Factorization

Apply tensor decomposition to the tensor to find latent components

Measuring degree of relation

Cosine of latent vectors after projection


SLIDE 22

Represent word relations using a tensor

Each slice encodes a relation between terms and target words.

[Tensor illustration: a synonym slice and an antonym slice over the words joyfulness, gladden, sad, anger]

Construct a tensor with two slices

SLIDE 23

Can encode multiple relations in the tensor

[Tensor illustration: a hyponym slice added alongside the synonym and antonym slices, over the same words]
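A sketch of this data representation as a 3-way numpy array, one slice per relation. The word list and the example relation pairs are made up for illustration and are not the paper's data.

```python
# Sketch of the MRLSA data representation: a 3-way numpy array with one slice
# per relation.  The word list and example pairs are illustrative only.
import numpy as np

words = ["joyfulness", "gladden", "sad", "anger", "feeling"]
relations = ["syn", "ant", "hyponym"]
idx = {w: i for i, w in enumerate(words)}
rel = {r: k for k, r in enumerate(relations)}

X = np.zeros((len(words), len(words), len(relations)))   # terms x target words x relations

def add(relation, term, target):
    # mark that `term` stands in `relation` to the target word `target`
    X[idx[term], idx[target], rel[relation]] = 1.0

add("syn", "joyfulness", "gladden"); add("syn", "gladden", "joyfulness")
add("ant", "joyfulness", "sad");     add("ant", "sad", "joyfulness")
add("hyponym", "anger", "feeling")   # anger is a kind of feeling

print(X[:, :, rel["syn"]])           # the synonym slice of the tensor
```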

SLIDE 24

Data representation

Encode multiple relations in a tensor

Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)

Factorization

Apply tensor decomposition to the tensor to find latent components

Measuring degree of relation

Cosine of latent vectors after projection

SLIDE 25

Derive a low-rank approximation to generalize the data and to discover unseen relations: apply Tucker decomposition and reformulate the results.

[Figure: the relation tensor X is factored into a core tensor S and per-mode factor matrices; the rows of the word-mode factor matrix W are the latent representations of words]

SLIDE 26

π‘₯=, π‘₯?, … , π‘₯A 𝑒=, 𝑒?, … , 𝑒B

~ ~ Γ— Γ—

𝑒=, 𝑒?, … , 𝑒B

𝑠 𝑠 𝑠 𝑠

Derive a low-rank approximation to generalize the data and to discover unseen relations Apply Tucker decomposition and reformulate the results

~ ~ Γ— Γ—

𝑠 𝑠 𝑠

latent representation of words latent representation of a relation
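A minimal sketch of the factorization step using a plain HOSVD, one simple way to compute a Tucker-style decomposition; the paper's exact algorithm and rank choices may differ. The helper names (unfold, hosvd) are ours, and the random tensor is only a stand-in for the relation tensor built earlier.

```python
# Minimal Tucker/HOSVD sketch (numpy): factor a (words x words x relations)
# tensor into a core S and one factor matrix per mode.  Helper names and the
# rank choices are ours; the random tensor stands in for the relation tensor.
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of tensor T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Higher-order SVD: per-mode factors from truncated SVDs, then the core."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    # core S = T x1 U1^T x2 U2^T x3 U3^T (3-way case)
    S = np.einsum('ijk,ia,jb,kc->abc', T, *factors)
    return S, factors

rng = np.random.default_rng(0)
X = rng.random((5, 5, 3))                 # stand-in for the relation tensor built above
S, (A, W, R) = hosvd(X, ranks=(4, 4, 3))  # W: latent word vectors; R: latent relation vectors
print(S.shape, W.shape, R.shape)          # (4, 4, 3) (5, 4) (3, 3)
```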

SLIDE 27

Data representation

Encode multiple relations in a tensor

Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)

Factorization

Apply tensor decomposition to the tensor to find latent components

Measuring degree of relation

Cosine of latent vectors after projection

SLIDE 28

Similarity

Cosine of the latent vectors

Other relations (both symmetric and asymmetric)

Take the latent matrix of the pivot relation (synonym) and the latent matrix of the relation in question; the score is the cosine of the latent vectors after projection.

SLIDE 29

π‘π‘œπ‘’ joy, sadden = cos 𝓧:,joy,IJB, 𝓧:,sadden,KBL

1 1 joyfulness gladden sad anger joyfulness gladden sad anger Synonym layer Antonym layer 1 1 1 1 1


SLIDE 31

πΌπ‘§π‘žπ‘“π‘  joy, feeling = cos 𝑿:,joy,IJB, 𝑿:,feeling,QJRST

joyfulness gladden sad anger Synonym layer 1 1 1 1 1 1 1 joyfulness gladden sad anger Hypernym layer
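A sketch of this kind of lookup on the smoothed (low-rank) tensor, assuming S, A, W, R from the HOSVD sketch above: rebuild the reconstruction and take the cosine of two of its columns. Indices and the relation order follow the toy tensor.

```python
# Sketch: score relations by comparing columns of the smoothed (low-rank)
# tensor, as in the formulas above.  Assumes S, A, W, R from the HOSVD sketch
# and the relation order of the toy tensor (syn=0, ant=1, hyponym=2).
import numpy as np

def rebuild(S, A, W, R):
    """Low-rank reconstruction X_hat = S x1 A x2 W x3 R."""
    return np.einsum('abc,ia,jb,kc->ijk', S, A, W, R)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def ant_score(X_hat, i, j, syn=0, ant=1):
    """ant(w_i, w_j) = cos( X_hat[:, i, syn], X_hat[:, j, ant] )."""
    return cosine(X_hat[:, i, syn], X_hat[:, j, ant])

# Example usage (with the factors and word index from the earlier sketches):
# X_hat = rebuild(S, A, W, R)
# ant_score(X_hat, idx["joyfulness"], idx["sad"])   # degree of antonymy
```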

SLIDE 32

π‘ π‘“π‘š wU, wV = cos 𝑋

:,wX,IJB , 𝑋 :,wY,TSZ

wV wU

Synonym layer The slice of the specific relation

SLIDE 33

rel(w1, w2) = cos( S[:, :, syn] W[w1, :]ᵀ ,  S[:, :, rel] W[w2, :]ᵀ )

[Illustration: equivalently, after the decomposition, each word's latent vector (a row of the factor matrix W) is projected through the latent slice of the synonym relation and the latent slice of the specific relation before taking the cosine]
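The same score can be computed directly in the latent space, as this slide shows: project each word's latent row through the core slice of the pivot (synonym) relation or of the queried relation, then take the cosine. A sketch assuming S, W, and R from the HOSVD example; the function and variable names are ours.

```python
# Sketch of the latent-space version of the score: project a word's latent
# row through the latent slice of the pivot (synonym) relation or of the
# queried relation, then take the cosine.  Assumes S, W, R from the HOSVD sketch.
import numpy as np

def relation_slice(S, R, k):
    """Fold the relation factor back into the core: the latent slice of relation k."""
    return np.tensordot(S, R[k, :], axes=([2], [0]))     # shape: (r1, r2)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rel_score(S, W, R, i, j, pivot, queried):
    """rel(w_i, w_j) = cos( S_pivot W[i,:]^T , S_queried W[j,:]^T )."""
    vi = relation_slice(S, R, pivot)   @ W[i, :]
    vj = relation_slice(S, R, queried) @ W[j, :]
    return cosine(vi, vj)

# Example usage (with the factors and word index from the earlier sketches):
# rel_score(S, W, R, idx["joyfulness"], idx["sad"], pivot=0, queried=1)
```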

SLIDE 34

Introduction

Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)

Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation

Experiments

SLIDE 35

Encarta Thesaurus

Record synonyms and antonyms of target words

Vocabulary of 50k terms and 47k target words

WordNet

Has synonym, antonym, hyponym, hypernym relations

Vocabulary of 149k terms and 117k target words

Goal: show that MRLSA generalizes LSA to model multiple relations.

SLIDE 36

Target       High-Score Words
inanimate    alive, living, bodily, in-the-flesh, incarnate
alleviate    exacerbate, make-worse, inflame, amplify, stir-up
relish       detest, abhor, abominate, despise, loathe

* Words in blue are antonyms listed in the Encarta thesaurus.

SLIDE 37

Task: GRE closest-opposite questions

Which is the closest opposite of adulterate?

(a) renounce (b) forbid (c) purify (d) criticize (e) correct

[Bar chart comparing four systems on this task; accuracies 0.64, 0.56, 0.74, 0.77]

SLIDE 38

Target       High-Score Words
bird         ostrich, gamecock, nighthawk, amazon, parrot
automobile   minivan, wagon, taxi, minicab, gypsy cab
vegetable    buttercrunch, yellow turnip, romaine, chipotle, chilli

SLIDE 39

Task: Class-Inclusion Relation (Y is a kind of X)

Most/least illustrative word pairs

(a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

[Bar chart comparing three systems on this task; accuracies 0.34, 0.37, 0.56]

SLIDE 40

Continuous semantic representation that

Leverages existing rich linguistic resources, discovers new relations, and enables us to measure the degree of multiple relations.

Approaches

Better data representation; matrix/tensor decomposition.

Challenges & Future Work

Capture more types of knowledge in the model; support more sophisticated inferential tasks.