Learning Concept Taxonomies from Multi-modal Data
Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan and Eric P. Xing
Carnegie Mellon University
Outline
– Problem
– Taxonomy Induction Model
– Features
– Evaluation
A set of lexical terms = {consumer goods, fashion, uniform, neckpiece, handwear, finery, disguise, ...}
Widdows [2003], Snow et al. [2006], Yang and Callan [2009], Kozareva and Hovy [2010], Poon and Domingos [2010], Navigli et al. [2011], Fu et al. [2014], Bansal et al. [2014]
shark → white shark; bird → bird of prey
seafish, shark, ray
“seafish, such as shark…”  “rays are a group…”
“Either shark or ray…”  “Both shark and ray…”
– Presence and distance
– Patterns, extracted as features
d(w_king, w_queen) ≈ d(w_man, w_woman)
w_seafish − w_shark ≈ w_human − w_woman ?
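The question above can be probed mechanically by comparing the directions of the two offset vectors. A toy sketch with made-up 3-d embeddings (the numbers carry no real semantics; real word2vec vectors would be used in practice):

```python
import numpy as np

# Made-up toy embeddings, for illustration only.
emb = {
    "king":    np.array([0.9, 0.8, 0.1]),
    "queen":   np.array([0.9, 0.1, 0.8]),
    "man":     np.array([0.2, 0.9, 0.1]),
    "woman":   np.array([0.2, 0.2, 0.8]),
    "seafish": np.array([0.1, 0.5, 0.5]),
    "shark":   np.array([0.3, 0.6, 0.4]),
    "human":   np.array([0.2, 0.5, 0.5]),
}

def offset(a, b):
    return emb[a] - emb[b]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# A cosine near 1 between offsets would support the analogy hypothesis.
analogy = cosine(offset("king", "queen"), offset("man", "woman"))
hyper = cosine(offset("seafish", "shark"), offset("human", "woman"))
```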
[Figure: example images of seafish, shark, and ray]
– Images may include perceptual semantics
– Jointly leverage text and visual information (from the web)
– How to design visual features to capture the perceptual semantics?
– How to design models to integrate visual and text information?
Chen et al. [2013], Sivic et al. [2008], Griffin and Perona [2008]
– Each category has a name and a set of images
– Using both text & visual features
x = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
– The set z = {z_1, z_2, …, z_N} of parent indexes encodes the whole tree structure
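A parent-index array is a compact encoding of the tree: one integer per term. A sketch over the terms in x, with a hypothetical parent assignment (the particular parents below are illustrative):

```python
# Terms from the running example; z[i] is the index of term i's
# parent, and -1 marks the root. The assignment is hypothetical.
terms = ["Animal", "Fish", "Shark", "Cat", "Tiger",
         "Terrestrial animal", "Seafish", "Feline"]
z = [-1, 0, 6, 7, 7, 0, 1, 5]

def children(z, i):
    """All direct children of node i under the parent-index encoding."""
    return [j for j, p in enumerate(z) if p == i]
```

For example, `children(z, 0)` returns `[1, 5]`, i.e. Fish and Terrestrial animal hang under Animal.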
– (text) hypernym-hyponym relation: shark → cat shark
– visual similarity: images of shark ⇔ images of ray
– image: deep convnet
– text: word embedding
– c_n: child nodes of y_n
– y_n^c ∈ c_n: one child of y_n
– ψ: consistency term depending on features
– w: model weights to be learned
– z: parent indexes
– prior on popularity (#children)
– ψ evaluates the consistency of y_n^c with its parent y_n and siblings c_n\y_n^c
– ψ(y_n, y_n^c, c_n\y_n^c) evaluates how consistent a parent-child group is: the consistency of y_n^c with its parent y_n and siblings c_n\y_n^c
– The whole model is a factorization of the consistency terms of all local parent-child groups
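The factorization can be sketched as a sum of linear consistency scores in log space, one per local parent-child group. Here `toy_feature` is a made-up stand-in for the real feature vector, which combines the text and visual features described on the following slides:

```python
import numpy as np

# Hedged sketch: the (log, unnormalized) model score sums one linear
# consistency term w^T f(y_c, y_n, siblings) per parent-child group.
def toy_feature(child, parent, siblings):
    # Made-up features: shares-initial-letter with parent, #siblings.
    return np.array([float(child[0] == parent[0]), float(len(siblings))])

def tree_score(children_of, w, feature=toy_feature):
    """Sum the consistency terms over all local parent-child groups."""
    total = 0.0
    for parent, kids in children_of.items():
        for c in kids:
            siblings = [k for k in kids if k != c]
            total += float(w @ feature(c, parent, siblings))
    return total
```

Usage: `tree_score({"seafish": ["shark", "ray"]}, np.array([1.0, 0.5]))` scores the single group with parent "seafish" and children "shark" and "ray".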
– c_n: child nodes of y_n; y_n^c ∈ c_n
– ψ: consistency term depending on features
– w: weight vector (to be learned)
– f: feature vector of y_n^c with parent y_n and siblings c_n\y_n^c
– Step 1: Fit a Gaussian to the images of each category
– Step 2: Derive the pairwise similarity vissim(y_i, y_j)
– Step 3: Derive the groupwise similarity by averaging
* S: Siblings, V: Visual
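The three steps can be sketched as follows. The concrete similarity between two fitted Gaussians (negative symmetric KL divergence with diagonal covariance) is an assumption for illustration, not necessarily the paper's exact measure:

```python
import numpy as np

def fit_gaussian(feats):
    """Step 1: fit a diagonal Gaussian to (num_images, dim) features."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def vissim(g1, g2):
    """Step 2: pairwise similarity; here, negative symmetric KL
    between diagonal Gaussians (an assumed choice)."""
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1 + np.log(v1 / v2))
    return -(kl12 + kl21)

def group_similarity(cat_feats, group):
    """Step 3: average the pairwise similarities over a sibling group."""
    gs = {c: fit_gaussian(cat_feats[c]) for c in group}
    pairs = [(a, b) for i, a in enumerate(group) for b in group[i + 1:]]
    if not pairs:
        return 0.0
    return float(np.mean([vissim(gs[a], gs[b]) for a, b in pairs]))
```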
– Step 1: Fit a Gaussian for child categories
– Step 2: Fit a Gaussian for only the top-K images of parent categories
– Steps 3–4: same as in S-V1
* PC: Parent-child, V: Visual
– Step 1: Learn a projection matrix that maps the mean image of a child category to the word embedding of its parent category
– Step 2: Calculate the distance
– Step 3: Bin the distance into a feature vector
* PC: Parent-child, V: Visual
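A sketch of these steps, assuming the projection matrix is fit by ridge regression and that distances are binned into a one-hot vector; the regularizer and the bin edges are illustrative choices, not the paper's:

```python
import numpy as np

def fit_projection(X, Y, lam=1.0):
    """Step 1: ridge regression (an assumed fitting choice) from mean
    image features X (n, d_img) to parent embeddings Y (n, d_txt)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def binned_distance(x, y, M, edges):
    """Steps 2-3: project, measure the distance, one-hot bin it."""
    dist = np.linalg.norm(x @ M - y)
    feat = np.zeros(len(edges) + 1)
    feat[np.searchsorted(edges, dist)] = 1.0
    return feat
```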
Capitalization, etc.
– Chu-Liu/Edmonds algorithm
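For reference, a compact recursive sketch of Chu-Liu/Edmonds for the maximum spanning arborescence: each node greedily picks its best parent, and any resulting cycle is contracted into a supernode before recursing. The edge scores here stand in for the learned consistency terms:

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in the parent map, or None."""
    for start in parent:
        path, v = [], start
        while v in parent and v not in path:
            path.append(v)
            v = parent[v]
        if v in path:
            return set(path[path.index(v):])
    return None

def max_arborescence(nodes, root, score):
    """Chu-Liu/Edmonds. `score` maps candidate (parent, child) edges to
    weights; returns a child -> parent map rooted at `root`."""
    parent = {}
    for v in nodes:
        if v == root:
            continue
        cands = [u for u in nodes if u != v and (u, v) in score]
        parent[v] = max(cands, key=lambda u: score[(u, v)])
    cycle = find_cycle(parent)
    if cycle is None:
        return parent
    c = frozenset(cycle)                      # supernode for the cycle
    new_nodes = [n for n in nodes if n not in cycle] + [c]
    new_score, enter, leave = {}, {}, {}
    for (u, v), s in score.items():
        if u in cycle and v not in cycle:     # edge leaving the cycle
            if (c, v) not in new_score or s > new_score[(c, v)]:
                new_score[(c, v)] = s
                leave[v] = u
        elif u not in cycle and v in cycle:   # edge entering the cycle
            adj = s - score[(parent[v], v)]   # gain over the broken edge
            if (u, c) not in new_score or adj > new_score[(u, c)]:
                new_score[(u, c)] = adj
                enter[u] = v
        elif u not in cycle and v not in cycle:
            new_score[(u, v)] = s
    sub = max_arborescence(new_nodes, root, new_score)
    result, broken = {}, None
    for v, p in sub.items():
        if v == c:                            # edge into the supernode
            broken = enter[p]
            result[broken] = p
        elif p == c:                          # edge out of the supernode
            result[v] = leave[v]
        else:
            result[v] = p
    for v in cycle:                           # keep all but the broken edge
        if v != broken:
            result[v] = parent[v]
    return result
```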
– Word vectors: Google word2vec
– Convnet: VGG-16
– Training set: ImageNet taxonomies
– surface features – embedding features
– web n-grams, etc.
– Shallow layers ↔ abstract categories ↔ text features more effective
– Deep layers ↔ specific categories ↔ visual features more effective
Weights vs. depth