What do you mean? Inferring Word Meaning Using Computer Vision


slide-1
SLIDE 1

What do you mean? Inferring Word Meaning Using Computer Vision

Shibamouli Lahiri

slide-2
SLIDE 2

Original paper

➔ Multimodal Distributional Semantics

◆ Bruni, et al. (JAIR 2014)

2

slide-3
SLIDE 3

Authors

Elia Bruni, Nam Khanh Tran, Marco Baroni

3

slide-4
SLIDE 4

What does this word mean?

অর্থ

4

slide-5
SLIDE 5

What does this word mean?

অর্থ means “meaning” in Bengali.

5

slide-6
SLIDE 6

What does this word mean?

★ অর্থ means “meaning” in Bengali.
★ It also means “money” or “wealth”.

6

slide-7
SLIDE 7

The importance of “grounding”

ওবামা বুশ ক্লিনটন

7

slide-8
SLIDE 8

The importance of “grounding”

ওবামা বুশ ক্লিনটন

8

slide-9
SLIDE 9

The importance of “grounding”

ওবামা = আমেরিকার ৪৪তম রাষ্ট্রপতি (Obama = the 44th President of America)

9

slide-10
SLIDE 10

The importance of “grounding”

Topic 1: state agent control set system systems states event learning model action problem agents task actions time algorithm knowledge events figure
Topic 2: optimal problem function time probability set information game strategy model distribution case algorithm section number random cost theorem vol matrix
Topic 3: data information learning features set work text language word number analysis words results table based research social semantic web system
Topic 4: design circuit gate logic test delay input circuits fault gates error simulation number timing placement faults figure analysis techniques model
Topic 5: system user data systems security users file time server application software information network applications key design mobile process access interface

10

slide-11
SLIDE 11

Grounding in the brain

  • “kick” vs “lick”

○ Pulvermueller, 2005

11

slide-12
SLIDE 12

Distributed vs Perceptual

tropical fruit edible yellow peel smooth

12

slide-13
SLIDE 13

Origins of Meaning Representation

“You shall know a word by the company it keeps.” (Firth, 1957)

13

slide-14
SLIDE 14

Origins of Meaning Representation

“The individual words in language name objects... It is the object for which the word stands.” (Wittgenstein, Philosophical Investigations)

14

slide-15
SLIDE 15

Can we combine them? Yes!

car automobile vehicle + =

15

slide-16
SLIDE 16

Background

Distributional | Perceptual
Words | Visual words
BoW | BoVW
Documents | Images

16

slide-17
SLIDE 17

Background

Distributional | Perceptual
Words | Visual words
BoW | BoVW
Documents | Images

car automobile vehicle

17

slide-18
SLIDE 18

Visual words

18

slide-19
SLIDE 19

Overview

19

slide-20
SLIDE 20

Overview

➔ SVD on the joint matrix
➔ Feature-level fusion
➔ Scoring-level fusion

20

slide-21
SLIDE 21

Overview

21

slide-22
SLIDE 22

Text matrix

❏ Corpora: ukWaC and Wackypedia
  ❏ 1.9B and 820M tokens, respectively.
  ❏ Both lemmatized and POS-tagged.
❏ Terms: most frequent 20K nouns, 5K adjectives, 5K verbs
  ❏ Adjustment leads to 20,515 terms.
❏ Context: Window2 and Window20 (see the co-occurrence sketch below)
❏ Association score: non-negative local mutual information (LMI)
❏ Matrix: 20,515 rows, 30K columns
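A rough sketch of the windowed co-occurrence counting behind this matrix; the corpus, window size, and vocabulary below are illustrative stand-ins, not the actual ukWaC/Wackypedia pipeline.

```python
from collections import Counter

# Toy corpus standing in for lemmatized, POS-tagged ukWaC/Wackypedia sentences.
corpus = [["banana", "be", "yellow", "tropical", "fruit"],
          ["car", "drive", "fast", "road"]]
window = 2  # "Window2"; the paper also uses a 20-word window

cooc = Counter()
for sent in corpus:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[(target, sent[j])] += 1  # raw counts, later reweighted with LMI

print(cooc[("banana", "yellow")])  # -> 1
```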

22

slide-23
SLIDE 23

Non-negative LMI
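The equation on this slide did not survive the transcript. As local mutual information is usually defined, it is the observed co-occurrence count times the PMI of the pair, with negative values clipped to zero:

```latex
\mathrm{LMI}(t, c) \;=\; \max\!\left(0,\; f(t, c)\,\log\frac{p(t, c)}{p(t)\,p(c)}\right)
```

Multiplying PMI by the raw count damps the inflated scores that plain PMI gives to rare pairs.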

23

slide-24
SLIDE 24

Image matrix

❏ Corpus: ESP-Game dataset
  ❏ 100K tagged images, with 14 tags per image on average.
  ❏ 20,515 distinct word types.
❏ Terms: 20,515 words
❏ Context: visual words ✕ spatial bins
  ❏ 5K visual words
  ❏ 16 spatial bins
❏ Association score: non-negative LMI
  ❏ Each word is associated with the images labeled with it.
  ❏ Co-occurrence counts are summed over those images (see the sketch below).
❏ Matrix: 20,515 rows, 80K columns
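A minimal sketch of the aggregation step described above, assuming each image already has a bag-of-visual-words vector; the names `bovw`, `tags`, and `word_index` are hypothetical.

```python
import numpy as np

N_COLS = 5000 * 16  # 5K visual words x 16 spatial bins = 80K columns

def build_image_matrix(bovw, tags, word_index):
    """Sum each word's visual-word counts over the images labeled with it."""
    M = np.zeros((len(word_index), N_COLS))
    for img, vec in bovw.items():          # vec: that image's BoVW count vector
        for word in tags[img]:             # ESP-Game labels of the image
            if word in word_index:         # keep only the 20,515 target terms
                M[word_index[word]] += vec
    return M                               # afterwards reweighted with non-negative LMI
```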

24

slide-25
SLIDE 25

ESP-Game images

25

slide-26
SLIDE 26

Image matrix construction

★ Identify “keypoints” in each image.
★ Compute a 128-dimensional SIFT descriptor for each keypoint.
  ○ 4 ✕ 4 sampling regions ✕ 8 orientations
  ○ Averaged across the three HSV channels
★ Cluster all keypoints from all images.
  ○ 5,000 clusters with k-means
  ○ Cluster centers are the “visual words”
★ Image representation: vector of “term frequencies” over visual words
★ 4 ✕ 4 spatial binning (see the sketch below)
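A simplified sketch of this bag-of-visual-words pipeline: grayscale SIFT rather than the paper's HSV-averaged descriptors, no spatial binning, and OpenCV/scikit-learn standing in for the original toolchain.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_descriptors(image_paths):
    """128-dimensional SIFT descriptors for the keypoints of each image."""
    sift = cv2.SIFT_create()
    all_descs = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, descs = sift.detectAndCompute(img, None)
        if descs is not None:
            all_descs.append(descs)
    return all_descs

def build_visual_vocabulary(all_descs, k=5000):
    """Cluster every keypoint from every image; the centers are the visual words."""
    km = MiniBatchKMeans(n_clusters=k, random_state=0)
    km.fit(np.vstack(all_descs))
    return km

def bovw_vector(descs, km, k=5000):
    """Visual-word 'term frequency' histogram for one image."""
    words = km.predict(descs)
    return np.bincount(words, minlength=k)
```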

26

slide-27
SLIDE 27

Visual words

27

slide-28
SLIDE 28

Image matrix construction

28

slide-29
SLIDE 29

Overview

29

slide-30
SLIDE 30

Latent Multimodal Mixing

➔ Singular value decomposition (SVD):
➔ Low-rank approximation:
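The two equations referenced on this slide are not in the transcript; in the standard form, the concatenated text-plus-image matrix M is factored and then truncated to its top k singular components:

```latex
M = U\,\Sigma\,V^{\top}, \qquad
M_k = U_k\,\Sigma_k\,V_k^{\top}, \quad k < \operatorname{rank}(M)
```

M_k is the best rank-k approximation of M in the least-squares sense; presumably the reconstructed matrix is then split back into its text and image blocks, giving the Textmixed and Imagemixed representations that appear later.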

30

slide-31
SLIDE 31

Overview

31

slide-32
SLIDE 32

Multimodal Similarity

➢ Goal: measure similarity between word pairs
➢ Similarity function: cosine on latent vectors
➢ Feature-level fusion: F = α·Ft ⊕ (1 − α)·Fv
➢ Scoring-level fusion: S = α·St + (1 − α)·Sv
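A minimal sketch of the two fusion schemes, assuming `ft`/`fv` are a word's text and visual latent vectors; α = 0.5 corresponds to the naive settings, while tuning α on development data gives TunedFL / TunedSL.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def feature_level_sim(ft1, fv1, ft2, fv2, alpha=0.5):
    """Weight and concatenate the modalities, then one cosine on the fused vectors."""
    f1 = np.concatenate([alpha * ft1, (1 - alpha) * fv1])
    f2 = np.concatenate([alpha * ft2, (1 - alpha) * fv2])
    return cosine(f1, f2)

def scoring_level_sim(ft1, fv1, ft2, fv2, alpha=0.5):
    """One cosine per modality, then mix the two scores."""
    return alpha * cosine(ft1, ft2) + (1 - alpha) * cosine(fv1, fv2)
```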

32

slide-33
SLIDE 33

Fusion options

k = r (no SVD mixing)
  FL: α = 1 → Text only; α = 0.5 → NaiveFL (Bruni et al., 2011); α = 0 → Image only
  SL: α = 1 → Text only; α = 0.5 → NaiveSL (Leong & Mihalcea, 2011); α = 0 → Image only

k < r (latent multimodal mixing)
  FL: α = 1 → Textmixed; α ∈ (0, 1) → TunedFL; α = 0 → Imagemixed
  SL: α = 1 → Textmixed; α ∈ (0, 1) → TunedSL; α = 0 → Imagemixed

33

slide-34
SLIDE 34

Experiments: Overview

➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization

34

slide-35
SLIDE 35

Experiments: Overview

➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization

35

slide-36
SLIDE 36

Semantic Relations

★ Goal is to explore which word relations are best captured by which model.

36

slide-37
SLIDE 37

Semantic Relations

★ Goal is to explore which word relations are best captured by which model.
★ BLESS benchmark (Baroni and Lenci, 2011).
★ 184 pivot words denoting concrete concepts.
★ For each pivot, there are related words (relata):
  ○ COORD (co-hyponym: alligator-lizard)
  ○ HYPER (hypernym: alligator-reptile)
  ○ MERO (meronym: alligator-teeth)
  ○ ATTRI (attribute: alligator-ferocious)
  ○ EVENT (alligator-swim)
  ○ RAN.N (random noun: alligator-trombone)
  ○ RAN.J (random adjective: alligator-electronic)
  ○ RAN.V (random verb: alligator-conclude)

37

slide-38
SLIDE 38

Semantic Relations

★ Goal is to explore which word relations are best captured by which model.
★ BLESS benchmark (Baroni and Lenci, 2011).
★ 184 pivot words denoting concrete concepts.
★ For each pivot, there are related words (relata):
  ○ COORD (co-hyponym: alligator-lizard)
  ○ HYPER (hypernym: alligator-reptile)
  ○ MERO (meronym: alligator-teeth)
  ○ ATTRI (attribute: alligator-ferocious)
  ○ EVENT (alligator-swim)
  ○ RAN.N (random noun: alligator-trombone)
  ○ RAN.J (random adjective: alligator-electronic)
  ○ RAN.V (random verb: alligator-conclude)
★ Represent pivots and relata with text and image vectors.
★ For each relation, pick the relatum with the highest cosine to the pivot.
★ Convert the cosines to z-scores (see the sketch below).
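A sketch of this evaluation for a single pivot; the vector lookups and the relatum lists are assumed to exist, and the names are illustrative.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def bless_profile(pivot_vec, relata):
    """relata: list of (relation, relatum_vector) pairs for one BLESS pivot."""
    best = {}
    for relation, vec in relata:
        sim = cosine(pivot_vec, vec)
        best[relation] = max(best.get(relation, -1.0), sim)  # top relatum per relation
    relations = sorted(best)
    scores = np.array([best[r] for r in relations])
    z = (scores - scores.mean()) / scores.std()               # normalize per pivot
    return dict(zip(relations, z))
```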

38

slide-39
SLIDE 39

Semantic Relations

39

slide-40
SLIDE 40

Semantic Relations

40

Pivot | Text | Image
cabbage | leafy | white
carrot | fresh | orange
cherry | ripe | red
deer | wild | brown
dishwasher | electric | white
elephant | wild | white
glider | heavy | white
gorilla | wild | black
hat | white | old
hatchet | sharp | short
helicopter | heavy | old
onion | fresh | white
oven | electric | new
plum | juicy | red
sofa | comfortable | old
sparrow | wild | little
stove | electric | hot
tanker | heavy | grey
toaster | electric | new
trout | fresh | old

slide-41
SLIDE 41

Experiments: Overview

➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization

41

slide-42
SLIDE 42

Word Relatedness

★ Goal is to predict relatedness between two words.

42

slide-43
SLIDE 43

Word Relatedness

★ Goal is to predict relatedness between two words.
★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
★ WS has 353 similarity-rated word pairs.
  ○ 252 were used in this study.
★ MEN has 3,000 similarity-rated word pairs.
  ○ Similarity scores obtained from Mechanical Turk.
  ○ 2,000 development pairs.
  ○ 1,000 test pairs.

43

slide-44
SLIDE 44

Word Relatedness

★ Goal is to predict relatedness between two words.
★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
★ WS has 353 similarity-rated word pairs.
  ○ 252 were used in this study.
★ MEN has 3,000 similarity-rated word pairs.
  ○ Similarity scores obtained from Mechanical Turk.
  ○ 2,000 development pairs.
  ○ 1,000 test pairs.

★ Models evaluated by correlation between human similarity and cosine similarity (of word pairs).
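Evaluation is then a rank correlation between the two lists of numbers; the ratings below are made-up toy values, not figures from the paper.

```python
from scipy.stats import spearmanr

human_ratings = [9.0, 7.5, 3.1, 1.2]      # gold relatedness, one value per word pair
model_cosines = [0.81, 0.70, 0.35, 0.10]  # model similarity for the same pairs

rho, _ = spearmanr(human_ratings, model_cosines)
print(round(rho, 2))  # 1.0 here: the model ranks the pairs exactly like the humans
```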

44

slide-45
SLIDE 45

Word Relatedness (Spearman)

45

Model | MEN (Window2) | WS (Window2) | MEN (Window20) | WS (Window20)
Text | 0.73 | 0.70 | 0.68 | 0.70
Image | 0.43 | 0.36 | 0.43 | 0.36
NaiveFL | 0.75 | 0.67 | 0.73 | 0.67
NaiveSL | 0.76 | 0.69 | 0.74 | 0.64
MixLDA (Feng and Lapata, 2010) | 0.30 | 0.23 | 0.30 | 0.23
Textmixed | 0.77 | 0.73 | 0.74 | 0.75
Imagemixed | 0.55 | 0.52 | 0.57 | 0.51
TunedFL | 0.78 | 0.72 | 0.76 | 0.75
TunedSL | 0.78 | 0.71 | 0.77 | 0.72

slide-46
SLIDE 46

Word Relatedness (Spearman)

46

Model | MEN (Window2) | WS (Window2) | MEN (Window20) | WS (Window20)
Text | 0.73 | 0.70 | 0.68 | 0.70
Image | 0.43 | 0.36 | 0.43 | 0.36
NaiveFL | 0.75 | 0.67 | 0.73 | 0.67
NaiveSL | 0.76 | 0.69 | 0.74 | 0.64
MixLDA (Feng and Lapata, 2010) | 0.30 | 0.23 | 0.30 | 0.23
Textmixed | 0.77 | 0.73 | 0.74 | 0.75
Imagemixed | 0.55 | 0.52 | 0.57 | 0.51
TunedFL | 0.78 | 0.72 | 0.76 | 0.75
TunedSL | 0.78 | 0.71 | 0.77 | 0.72

slide-47
SLIDE 47

Word Relatedness (Pearson)

47

MixLDA (Feng and Lapata, 2010): 0.32

Model | Window2 | Window20
Textmixed | 0.47 | 0.49
TunedFL | 0.46 | 0.49
TunedSL | 0.46 | 0.47

slide-48
SLIDE 48

Qualitative Analysis

48

Text (Window20) | TunedFL
dawn - dusk | pet - puppy
sunrise - sunset | candy - chocolate
canine - dog | paw - pet
grape - wine | bicycle - bike
foliage - plant | apple - cherry
foliage - petal | copper - metal
skyscraper - tall | military - soldier
cat - feline | paws - whiskers
pregnancy - pregnant | stream - waterfall
misty - rain | cheetah - lion

slide-49
SLIDE 49

Experiments: Overview

➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization

49

slide-50
SLIDE 50

Concrete vs Abstract Words

★ Goal is to see which model performs better on concrete/abstract words.

50

slide-51
SLIDE 51

Concrete vs Abstract Words

★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc: 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst: 163 abstract and mixed word pairs (fun-relax, design-orange)

51

slide-52
SLIDE 52

Concrete vs Abstract Words

★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc: 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst: 163 abstract and mixed word pairs (fun-relax, design-orange)
★ Abstractness determined by an algorithm (Turney et al., 2011):
  ○ Start from sets of concrete and abstract paradigm words.
  ○ Count co-occurrences of each word with the paradigm words.
  ○ Convert counts to pointwise mutual information (PMI).
  ○ Smooth the (word, paradigm word) matrix with SVD.
  ○ Abstractness score of a word = sum(similarities with concrete words) − sum(similarities with abstract words).
  ○ If the score ≤ 0.5, the pair goes to MEN-abst; otherwise, MEN-conc (see the sketch below).
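A sketch that follows the recipe above literally; the paradigm-word vectors and index sets are hypothetical placeholders, and the underlying method is Turney et al. (2011).

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def abstractness_score(word_vec, paradigm_vecs, concrete_idx, abstract_idx):
    """Score as defined on the slide: sum of similarities with concrete
    paradigm words minus sum of similarities with abstract paradigm words."""
    sims = [cosine(word_vec, p) for p in paradigm_vecs]
    return (sum(sims[i] for i in concrete_idx)
            - sum(sims[i] for i in abstract_idx))

# Per the slide: a word pair goes to MEN-abst if the score is <= 0.5, else MEN-conc.
```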

52

slide-53
SLIDE 53

Concrete vs Abstract Words

★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc: 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst: 163 abstract and mixed word pairs (fun-relax, design-orange)
★ Abstractness determined by an algorithm (Turney et al., 2011):
  ○ Start from sets of concrete and abstract paradigm words.
  ○ Count co-occurrences of each word with the paradigm words.
  ○ Convert counts to pointwise mutual information (PMI).
  ○ Smooth the (word, paradigm word) matrix with SVD.
  ○ Abstractness score of a word = sum(similarities with concrete words) − sum(similarities with abstract words).
  ○ If the score ≤ 0.5, the pair goes to MEN-abst; otherwise, MEN-conc.

★ Models evaluated by correlation between human similarity and cosine similarity (of word pairs).

53

slide-54
SLIDE 54

Evaluation (Spearman)

54

Model | MEN-conc | MEN-abst | MEN-full
Window20 | 0.70 | 0.51 | 0.68
Image | 0.47 | 0.37 | 0.43
TunedFL | 0.78 | 0.52 | 0.76

slide-55
SLIDE 55

Experiments: Overview

➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization

55

slide-56
SLIDE 56

Concept Categorization

★ Goal is to cluster words that are conceptually similar.

56

slide-57
SLIDE 57

Concept Categorization

★ Goal is to cluster words that are conceptually similar.
★ Battig benchmark for tuning (Baroni et al., 2010).
  ○ 83 concepts from 10 concrete categories.
  ○ 77 were used in this study.
★ AP benchmark for testing (Almuhareb and Poesio, 2005).
  ○ 402 nouns from 21 WordNet classes.
  ○ 231 were used in this study.
  ○ Harder dataset than Battig (casuarina - samba).

57

slide-58
SLIDE 58

Concept Categorization

★ Goal is to cluster words that are conceptually similar.
★ Battig benchmark for tuning (Baroni et al., 2010).
  ○ 83 concepts from 10 concrete categories.
  ○ 77 were used in this study.
★ AP benchmark for testing (Almuhareb and Poesio, 2005).
  ○ 402 nouns from 21 WordNet classes.
  ○ 231 were used in this study.
  ○ Harder dataset than Battig (casuarina - samba).
★ Clustering performed with the CLUTO toolkit.
  ○ Repeated bisection with global optimization.
  ○ Default parameters.
★ Models evaluated by cluster purity (formula below):
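The purity formula itself is missing from the transcript; the standard definition (the fraction of words that fall in the majority gold class of their cluster) is:

```latex
\mathrm{purity}(\Omega, C) \;=\; \frac{1}{N}\sum_{k}\max_{j}\,\lvert \omega_k \cap c_j \rvert
```

where Ω = {ω_k} are the induced clusters, C = {c_j} are the gold categories, and N is the number of clustered words.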

58

slide-59
SLIDE 59

Concept Categorization

59

Model | AP (Window2) | AP (Window20)
Text | 0.73 | 0.65
Image | 0.26 | 0.26
NaiveFL | 0.74 | 0.64
NaiveSL | 0.65 | 0.66
MixLDA (Feng and Lapata, 2010) | 0.14 | 0.14
Textmixed | 0.74 | 0.67
Imagemixed | 0.35 | 0.29
TunedFL | 0.74 | 0.69
TunedSL | 0.75 | 0.69

slide-60
SLIDE 60

Thank you!

60

slide-61
SLIDE 61

For further study

➔ Multimodal deep learning

◆ Ngiam et al., ICML 2011.
◆ Srivastava & Salakhutdinov, ICML 2012, NIPS 2012, JMLR 2014.
◆ Sohn et al., NIPS 2014.

61