What do you mean? Inferring Word Meaning Using Computer Vision
Shibamouli Lahiri
Original paper
➔ Multimodal Distributional Semantics
◆ Bruni, et al. (JAIR 2014)
2
Authors
Elia Bruni, Nam Khanh Tran, Marco Baroni
3
What does this word mean?
অর্থ
4
What does this word mean?
অর্থ means “meaning” in Bengali.
5
What does this word mean?
★ অর্থ means “meaning” in Bengali.
★ It also means “money” or “wealth”.
6
The importance of “grounding”
ওবামা বুশ … (Bengali: “Obama”, “Bush”, …)
7
The importance of “grounding”
ওবামা বুশ … (Bengali: “Obama”, “Bush”, …)
8
The importance of “grounding”
ওবামা = আমেরিকার ৪৪তম রাষ্ট্রপতি (Bengali: Obama = the 44th President of America)
9
The importance of “grounding”
Topic 1: state agent control set system systems states event learning model action problem agents task actions time algorithm knowledge events figure
Topic 2: optimal problem function time probability set information game strategy model distribution case algorithm section number random cost theorem vol matrix
Topic 3: data information learning features set work text language word number analysis words results table based research social semantic web system
Topic 4: design circuit gate logic test delay input circuits fault gates error simulation number timing placement faults figure analysis techniques model
Topic 5: system user data systems security users file time server application software information network applications key design mobile process access interface
10
Grounding in the brain
- “kick” vs “lick”
○ Pulvermueller, 2005
11
Distributional vs Perceptual
[Figure: example perceptual features of a concept: tropical, fruit, edible, yellow, peel, smooth]
12
Origins of Meaning Representation
“You shall know a word by the company it keeps.” (J. R. Firth, 1957)
13
Origins of Meaning Representation
“The individual words in language name objects... It is the object for which the word stands.” (Wittgenstein, Philosophical Investigations)
14
Can we combine them? Yes!
[Figure: distributional representation (car, automobile, vehicle) + image-based representation = combined multimodal representation]
15
Background
Distributional    Perceptual
Words             Visual words
BoW               BoVW
Documents         Images
16
Background
Distributional    Perceptual
Words             Visual words
BoW               BoVW
Documents         Images
Example (distributional): car, automobile, vehicle
17
Visual words
18
Overview
19
Overview
➔ SVD on joint matrix
➔ Feature-level fusion
➔ Scoring-level fusion
20
Overview
21
Text matrix
❏ Corpora: ukWaC and Wackypedia
  ❏ 1.9B and 820M tokens.
  ❏ Both lemmatized and POS-tagged.
❏ Terms: most frequent 20K nouns, 5K adjectives, 5K verbs
  ❏ Adjustment leads to 20,515 terms.
❏ Context: Window2 and Window20
❏ Association Score: non-negative local mutual information (LMI)
❏ Matrix: 20,515 rows, 30K columns
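For concreteness, a minimal sketch of window-based co-occurrence counting; the corpus handling, term lists, and toy sentences below are illustrative placeholders, not the exact pipeline from the paper:

```python
# Illustrative window-based co-occurrence counting; corpus loading, lemmatization,
# and POS tagging are assumed to happen elsewhere. Toy data only.
from collections import defaultdict

def cooccurrence_counts(sentences, target_terms, context_terms, window=2):
    """Count target/context co-occurrences within +/- `window` token positions."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:                         # each sentence: a list of lemmas
        for i, word in enumerate(sent):
            if word not in target_terms:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in context_terms:
                    counts[word][sent[j]] += 1
    return counts

sentences = [["car", "drive", "fast", "road"], ["car", "red", "road"]]
counts = cooccurrence_counts(sentences, {"car", "road"}, {"car", "red", "drive", "fast", "road"})
print(counts["car"]["road"])   # 1: the Window2 context misses the longer-range pair
```

The raw counts are then reweighted with the non-negative LMI score described on the next slide.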
22
Non-negative LMI
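The association score itself: local mutual information scales pointwise mutual information by the raw co-occurrence count, with negative values clipped to zero (standard notation, not copied from the slide):

\[ \mathrm{LMI}(t, c) \;=\; \max\!\left(0,\ \mathrm{count}(t, c)\,\log\frac{P(t, c)}{P(t)\,P(c)}\right) \]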
23
Image matrix
❏ Corpus: ESP-Game dataset
  ❏ 100K tagged images, 14 tags on average.
  ❏ 20,515 distinct word types.
❏ Terms: 20,515 words
❏ Context: Visual words ✕ spatial bins
  ❏ 5K visual words
  ❏ 16 spatial bins
❏ Association Score: non-negative LMI
  ❏ A word is associated with the images labeled with it.
  ❏ Co-occurrence counts are summed across those images.
❏ Matrix: 20,515 rows, 80K columns
24
ESP-Game images
25
Image matrix construction
★ Identify “keypoints”.
★ 128 SIFT features for each keypoint.
  ○ 4 ✕ 4 sampling regions ✕ 8 orientations
  ○ Average across three channels (HSV)
★ Cluster all keypoints from all images.
  ○ 5,000 clusters with k-means
  ○ Cluster centers are “visual words”
★ Image representation
  ○ Vector of “term frequency” on visual words
★ 4 ✕ 4 spatial binning
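A condensed sketch of this bag-of-visual-words pipeline, assuming OpenCV and scikit-learn; for simplicity it runs SIFT on grayscale images rather than averaging over the three HSV channels, and the paths, cluster count, and grid size are placeholders:

```python
# Sketch of the visual-word pipeline: SIFT keypoints -> k-means "visual words"
# -> per-image histograms over a 4 x 4 spatial grid. Assumes OpenCV >= 4.4 and
# scikit-learn; grayscale SIFT is a simplification of the HSV-channel averaging.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans   # stands in for plain k-means, for speed

N_VISUAL_WORDS = 5000    # cluster centers = "visual words"
GRID = 4                 # 4 x 4 spatial bins

sift = cv2.SIFT_create()

def keypoint_descriptors(image_path):
    """Return 128-dim SIFT descriptors, their (x, y) positions, and the image size."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    keypoints, descriptors = sift.detectAndCompute(img, None)
    if descriptors is None:
        return np.empty((0, 128)), np.empty((0, 2)), img.shape
    return descriptors, np.array([kp.pt for kp in keypoints]), img.shape

# 1. Cluster descriptors from all images into the visual-word vocabulary.
image_paths = ["esp/img_0001.jpg", "esp/img_0002.jpg"]   # placeholder paths
all_descriptors = np.vstack([keypoint_descriptors(p)[0] for p in image_paths])
kmeans = MiniBatchKMeans(n_clusters=N_VISUAL_WORDS, random_state=0).fit(all_descriptors)

# 2. Represent each image as visual-word counts per spatial bin.
def bovw_histogram(image_path):
    desc, pos, (h, w) = keypoint_descriptors(image_path)
    hist = np.zeros((GRID * GRID, N_VISUAL_WORDS))
    if len(desc):
        words = kmeans.predict(desc)
        bx = np.minimum((pos[:, 0] / w * GRID).astype(int), GRID - 1)
        by = np.minimum((pos[:, 1] / h * GRID).astype(int), GRID - 1)
        for word, x, y in zip(words, bx, by):
            hist[y * GRID + x, word] += 1
    return hist.ravel()   # length 16 * 5000 = 80,000, matching the 80K columns above
```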
26
Visual words
27
Image matrix construction
28
Overview
29
Latent Multimodal Mixing
➔ Singular value decomposition (SVD)
➔ Low-rank approximation
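Written out in standard notation (symbols are generic, not taken from the slide): the joint matrix X, i.e., the 20,515-row concatenation of the text (30K) and image (80K) columns, is factorized and then truncated to its top k singular components:

\[ X = U \Sigma V^{\top}, \qquad X_k = U_k \Sigma_k V_k^{\top} \approx X \quad (k < \mathrm{rank}(X)) \]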
30
Overview
31
Multimodal Similarity
➢ Goal: Measure similarity between word pairs
➢ Similarity function: Cosine on latent vectors
➢ Feature-level fusion: F = α·Ft ⊕ (1 - α)·Fv
➢ Scoring-level fusion: S = α·St + (1 - α)·Sv
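A small sketch of the two fusion schemes, assuming each word already has a text vector and an image vector; the normalization step and the α value are illustrative choices, not the paper's tuned settings:

```python
# Hypothetical sketch of the two fusion schemes above; vector contents and the
# alpha value are illustrative, not the paper's tuned configuration.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def feature_level_similarity(t1, v1, t2, v2, alpha=0.5):
    """Weight and concatenate (normalized) text and image vectors, then take one cosine."""
    def fuse(t, v):
        return np.concatenate([alpha * t / np.linalg.norm(t),
                               (1 - alpha) * v / np.linalg.norm(v)])
    return cosine(fuse(t1, v1), fuse(t2, v2))

def scoring_level_similarity(t1, v1, t2, v2, alpha=0.5):
    """Compute separate text and image cosines, then mix the two scores."""
    return alpha * cosine(t1, t2) + (1 - alpha) * cosine(v1, v2)
```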
32
Fusion options
Mixing                  Fusion   α            Model
k = r (no mixing)       FL       α = 1        Text only
                                 α = 0.5      NaiveFL (Bruni et al., 2011)
                                 α = 0        Image only
                        SL       α = 1        Text only
                                 α = 0.5      NaiveSL (Leong & Mihalcea, 2011)
                                 α = 0        Image only
k < r (latent mixing)   FL       α = 1        Textmixed
                                 α ∈ (0, 1)   TunedFL
                                 α = 0        Imagemixed
                        SL       α = 1        Textmixed
                                 α ∈ (0, 1)   TunedSL
                                 α = 0        Imagemixed
33
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
34
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
35
Semantic Relations
★ Goal is to explore which word relations are best captured by which model.
36
Semantic Relations
★ Goal is to explore which word relations are best captured by which model.
★ BLESS benchmark (Baroni and Lenci, 2011)
★ 184 pivot words denoting concrete concepts.
★ For each pivot, there are related words (relata):
  ○ COORD (co-hyponym: alligator-lizard)
  ○ HYPER (hypernym: alligator-reptile)
  ○ MERO (meronym: alligator-teeth)
  ○ ATTRI (attribute: alligator-ferocious)
  ○ EVENT (alligator-swim)
  ○ RAN.N (random noun: alligator-trombone)
  ○ RAN.J (random adjective: alligator-electronic)
  ○ RAN.V (random verb: alligator-conclude)
37
Semantic Relations
★ Goal is to explore which word relations are best captured by which model.
★ BLESS benchmark (Baroni and Lenci, 2011)
★ 184 pivot words denoting concrete concepts.
★ For each pivot, there are related words (relata):
  ○ COORD (co-hyponym: alligator-lizard)
  ○ HYPER (hypernym: alligator-reptile)
  ○ MERO (meronym: alligator-teeth)
  ○ ATTRI (attribute: alligator-ferocious)
  ○ EVENT (alligator-swim)
  ○ RAN.N (random noun: alligator-trombone)
  ○ RAN.J (random adjective: alligator-electronic)
  ○ RAN.V (random verb: alligator-conclude)
★ Represent pivots and relata with text and image vectors.
★ Pick the relatum with the highest cosine for each relation.
★ Convert cosines to z-scores.
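A sketch of that per-pivot scoring, assuming cosines have already been computed; relation names follow the list above and the example cosines are made up:

```python
# For one pivot: keep the most similar relatum per relation, then z-score those
# cosines so that models with different cosine ranges can be compared.
import numpy as np
from scipy.stats import zscore

def best_relatum_zscores(cosines_by_relation):
    """Map each relation (COORD, HYPER, ...) to the z-scored best cosine for the pivot."""
    relations = sorted(cosines_by_relation)
    best = np.array([max(cosines_by_relation[r]) for r in relations])
    return dict(zip(relations, zscore(best)))

# Toy usage with made-up cosines for the pivot "alligator":
print(best_relatum_zscores({
    "COORD": [0.61, 0.55],   # e.g. lizard, crocodile
    "HYPER": [0.48],          # e.g. reptile
    "MERO":  [0.33],          # e.g. teeth
    "RAN.N": [0.05, 0.02],    # e.g. trombone
}))
```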
38
Semantic Relations
39
Semantic Relations
40
Pivot         Text          Image
cabbage       leafy         white
carrot        fresh         orange
cherry        ripe          red
deer          wild          brown
dishwasher    electric      white
elephant      wild          white
glider        heavy         white
gorilla       wild          black
hat           white         old
hatchet       sharp         short
helicopter    heavy         old
onion         fresh         white
oven          electric      new
plum          juicy         red
sofa          comfortable   old
sparrow       wild          little
stove         electric      hot
tanker        heavy         grey
toaster       electric      new
trout         fresh         old
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
41
Word Relatedness
★ Goal is to predict relatedness between two words.
42
Word Relatedness
★ Goal is to predict relatedness between two words.
★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
★ WS has 353 similarity-rated word pairs.
  ○ 252 were used in this study.
★ MEN has 3,000 similarity-rated word pairs.
  ○ Similarity scores obtained from Mechanical Turk.
  ○ 2,000 development pairs.
  ○ 1,000 test pairs.
43
Word Relatedness
★ Goal is to predict relatedness between two words.
★ WS (WordSim353) and MEN (Marco-Elia-Nam) benchmarks.
★ WS has 353 similarity-rated word pairs.
  ○ 252 were used in this study.
★ MEN has 3,000 similarity-rated word pairs.
  ○ Similarity scores obtained from Mechanical Turk.
  ○ 2,000 development pairs.
  ○ 1,000 test pairs.
★ Models evaluated by correlation between human similarity and cosine similarity (of word pairs).
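The evaluation itself is a rank correlation between the two score lists; a minimal sketch with scipy (the numbers are made up):

```python
# Spearman rank correlation between human ratings and model cosines (toy numbers).
from scipy.stats import spearmanr

human_scores  = [9.0, 7.5, 6.0, 2.0, 1.0]       # made-up ratings for five word pairs
model_cosines = [0.82, 0.70, 0.65, 0.20, 0.25]  # made-up model similarities

rho, p_value = spearmanr(human_scores, model_cosines)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```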
44
Word Relatedness (Spearman)
45
Model                             Window2          Window20
                                  MEN     WS       MEN     WS
Text                              0.73    0.70     0.68    0.70
Image                             0.43    0.36     0.43    0.36
NaiveFL                           0.75    0.67     0.73    0.67
NaiveSL                           0.76    0.69     0.74    0.64
MixLDA (Feng and Lapata, 2010)    0.30    0.23     0.30    0.23
Textmixed                         0.77    0.73     0.74    0.75
Imagemixed                        0.55    0.52     0.57    0.51
TunedFL                           0.78    0.72     0.76    0.75
TunedSL                           0.78    0.71     0.77    0.72
Word Relatedness (Pearson)
47
MixLDA (Feng and Lapata, 2010): 0.32

Model        Window2   Window20
Textmixed    0.47      0.49
TunedFL      0.46      0.49
TunedSL      0.46      0.47
Qualitative Analysis
48
Text (Window20)          TunedFL
dawn - dusk              pet - puppy
sunrise - sunset         candy - chocolate
canine - dog             paw - pet
grape - wine             bicycle - bike
foliage - plant          apple - cherry
foliage - petal          copper - metal
skyscraper - tall        military - soldier
cat - feline             paws - whiskers
pregnancy - pregnant     stream - waterfall
misty - rain             cheetah - lion
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
49
Concrete vs Abstract Words
★ Goal is to see which model performs better on concrete/abstract words.
50
Concrete vs Abstract Words
★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc, 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst, 163 abstract and mixed word pairs (fun-relax, design-orange)
51
Concrete vs Abstract Words
★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc, 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst, 163 abstract and mixed word pairs (fun-relax, design-orange)
★ Abstractness determined by an algorithm (Turney et al., 2011).
  ○ Concrete and abstract paradigm words
  ○ Co-occurrence counts of words with paradigm words
  ○ Converted to Pointwise Mutual Information
  ○ (Word, paradigm word) matrix smoothed by SVD
  ○ Abstractness score of a word = sum(similarities with concrete words) - sum(similarities with abstract words)
  ○ If abstractness score ≤ 0.5, MEN-abst; else, MEN-conc.
52
Concrete vs Abstract Words
★ Goal is to see which model performs better on concrete/abstract words.
★ MEN test set (1,000 word pairs) divided into:
  ○ MEN-conc, 837 concrete word pairs (arm-bicycle)
  ○ MEN-abst, 163 abstract and mixed word pairs (fun-relax, design-orange)
★ Abstractness determined by an algorithm (Turney et al., 2011).
  ○ Concrete and abstract paradigm words
  ○ Co-occurrence counts of words with paradigm words
  ○ Converted to Pointwise Mutual Information
  ○ (Word, paradigm word) matrix smoothed by SVD
  ○ Abstractness score of a word = sum(similarities with concrete words) - sum(similarities with abstract words) (see the sketch below)
  ○ If abstractness score ≤ 0.5, MEN-abst; else, MEN-conc.
★ Models evaluated by correlation between human similarity and cosine similarity (of word pairs).
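A rough sketch of the abstractness score from the list above, assuming each word already has an SVD-smoothed vector and that cosine is the similarity function; the paradigm vectors are placeholders, not Turney et al.'s actual word sets:

```python
# Follows the bullets above: abstractness score = sum of similarities with concrete
# paradigm words minus sum of similarities with abstract paradigm words;
# words scoring <= 0.5 go to MEN-abst, the rest to MEN-conc.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def abstractness_score(word_vec, concrete_paradigm_vecs, abstract_paradigm_vecs):
    return (sum(cosine(word_vec, c) for c in concrete_paradigm_vecs)
            - sum(cosine(word_vec, a) for a in abstract_paradigm_vecs))

def bucket(word_vec, concrete_paradigm_vecs, abstract_paradigm_vecs, threshold=0.5):
    score = abstractness_score(word_vec, concrete_paradigm_vecs, abstract_paradigm_vecs)
    return "MEN-abst" if score <= threshold else "MEN-conc"
```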
53
Evaluation (Spearman)
54
Model      MEN-conc   MEN-abst   MEN-full
Window20   0.70       0.51       0.68
Image      0.47       0.37       0.43
TunedFL    0.78       0.52       0.76
Experiments: Overview
➔ Differentiation between semantic relations
➔ Word relatedness
➔ Concrete vs Abstract words
➔ Concept categorization
55
Concept Categorization
★ Goal is to cluster words that are conceptually similar.
56
Concept Categorization
★ Goal is to cluster words that are conceptually similar.
★ Battig benchmark for tuning (Baroni et al., 2010).
  ○ 83 concepts from 10 concrete categories.
  ○ 77 were used in this study.
★ AP benchmark for testing (Almuhareb and Poesio, 2005).
  ○ 402 nouns from 21 WordNet classes.
  ○ 231 were used in this study.
  ○ Harder dataset than Battig (casuarina - samba).
57
Concept Categorization
★ Goal is to cluster words that are conceptually similar.
★ Battig benchmark for tuning (Baroni et al., 2010).
  ○ 83 concepts from 10 concrete categories.
  ○ 77 were used in this study.
★ AP benchmark for testing (Almuhareb and Poesio, 2005).
  ○ 402 nouns from 21 WordNet classes.
  ○ 231 were used in this study.
  ○ Harder dataset than Battig (casuarina - samba).
★ Clustering performed with the CLUTO toolkit.
  ○ Repeated bisection with global optimization.
  ○ Default parameters.
★ Models evaluated by cluster purity:
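The purity measure from the last bullet, in its standard form (notation not copied from the slide): with clusters ω_k, gold classes c_j, and N words,

\[ \mathrm{purity} = \frac{1}{N} \sum_{k} \max_{j}\, \lvert \omega_k \cap c_j \rvert \]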
58
Concept Categorization
59
Model                             AP (Window2)   AP (Window20)
Text                              0.73           0.65
Image                             0.26           0.26
NaiveFL                           0.74           0.64
NaiveSL                           0.65           0.66
MixLDA (Feng and Lapata, 2010)    0.14           0.14
Textmixed                         0.74           0.67
Imagemixed                        0.35           0.29
TunedFL                           0.74           0.69
TunedSL                           0.75           0.69
Thank you!
60
For further study
➔ Multimodal deep learning
◆ Ngiam et al., ICML 2011.
◆ Srivastava & Salakhutdinov, ICML 2012, NIPS 2012, JMLR 2014.
◆ Sohn et al., NIPS 2014.
61