  1. Learning Concept Taxonomies from Multi-modal Data Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan and Eric P. Xing Carnegie Mellon University

  2. Outline • Problem • Taxonomy Induction Model • Features • Evaluation and Analysis

  3. Problem • Taxonomy induction: organize a set of lexical terms, e.g. {consumer goods, fashion, uniform, neckpiece, handwear, finery, disguise, ...}, into a taxonomy • Applications: human knowledge, question answering, interpretability, information extraction, computer vision

  4. Problem • Existing taxonomies – Knowledge- and time-intensive to build – Limited coverage – Unavailable for many domains

  5. Related Works (NLP) • Automatic induction of taxonomies: Widdows [2003], Snow et al. [2006], Yang and Callan [2009], Kozareva and Hovy [2010], Poon and Domingos [2010], Navigli et al. [2011], Fu et al. [2014], Bansal et al. [2014]

  6. Problem • What evidence helps taxonomy induction? – Surface features • Ends with (e.g. white shark ends with shark) • Contains • Suffix match (e.g. bird, bird of prey) • …

  7. Problem • What evidence helps taxonomy induction? – Semantics from text descriptions [Bansal 2014] • Parent-child relation: "seafish, such as shark…", "rays are a group of seafishes…" • Sibling relation: "Either shark or ray…", "Both shark and ray…" (example tree: seafish → shark, ray)

  8. Problem • What evidence helps taxonomy induction? – Semantics from text descriptions [Bansal 2014] • Parent-child relation ("seafish, such as shark…", "rays are a group of seafishes…") • Sibling relation ("Either shark or ray…", "Both shark and ray…") – Sources: • Wikipedia abstracts – presence and distance of extracted patterns • Web-ngrams • …

  9. Problem • What evidence helps taxonomy induction? – Word embeddings (word2vec): v(king) − v(queen) ≈ v(man) − v(woman); analogously, v(seafish) − v(shark) ≈ v(human) − v(woman)? – Projections between parent and child [Fu 2014]
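The parent-child projection idea [Fu 2014] can be sketched in a few lines: learn a matrix that maps a child term's embedding near its parent term's embedding. A minimal NumPy sketch on noise-free toy data (all vectors and dimensions here are hypothetical, not real word2vec embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 20 child vectors (rows) in 4 dimensions.
C = rng.normal(size=(20, 4))
M_true = rng.normal(size=(4, 4))   # unknown "ground-truth" projection
P = C @ M_true                     # parent embeddings (noise-free toy data)

# Learn M minimising ||C M - P||^2, as in the projection idea of Fu et al.
M, *_ = np.linalg.lstsq(C, P, rcond=None)

# On noise-free data the least-squares fit recovers the mapping exactly.
print(np.allclose(C @ M, P))  # True
```

With real embeddings the fit is only approximate, and the residual distance itself can serve as a hypernymy signal.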

  10. Motivation • How about images? (example: images of seafish vs. images of its children shark and ray)

  11. Motivation • Our motivation – Images may include perceptual semantics – Jointly leverage text and visual information (from the web) • Problems to be addressed: – How to design visual features to capture the perceptual semantics? – How to design models to integrate visual and text information?

  12. Related Works (CV) • Building visual hierarchies: Griffin and Perona [2008], Sivic et al. [2008], Chen et al. [2013]

  13. Task Definition • Assume a set of N categories 𝒚 = {y_1, y_2, …, y_N} – Each category has a name and a set of images • Goal: induce a taxonomy tree over 𝒚 using both text and visual features x – Example: {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline} • Setting: supervised learning of category hierarchies from data

  14. Model • Let z_n (1 ≤ z_n ≤ N) be the index of the parent of category y_n – The set z = {z_1, z_2, …, z_N} encodes the whole tree structure • Our goal → infer the conditional distribution p(z | 𝒚) – Example: {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
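The parent-index encoding of a tree can be illustrated directly. A toy sketch using 0-based indices with the convention that the root points to itself; the parent assignment below is one consistent tree over the example category set, chosen for illustration:

```python
# A taxonomy over N categories encoded as a parent-index array z, where
# z[n] is the index of the parent of category n.
categories = ["Animal", "Fish", "Shark", "Cat", "Tiger",
              "Terrestrial animal", "Seafish", "Feline"]
z = [0, 0, 6, 7, 7, 0, 1, 5]  # e.g. z[2] = 6: Shark's parent is Seafish

def children(z, p):
    """Categories whose parent index is p (excluding the root itself)."""
    return [i for i, zi in enumerate(z) if zi == p and i != p]

print([categories[i] for i in children(z, 0)])  # direct children of Animal
```

Any array z with no cycles encodes exactly one tree, which is what makes sampling over z (later slides) convenient.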

  15. Model Overview • Intuition: categories tend to be closely related to their parents and siblings – (text) hypernym-hyponym relation: shark → catshark – (visual) similarity: images of shark ⇔ images of ray • Method: induce features from distributed representations of images and text – images: deep convnet – text: word embeddings

  16. Taxonomy Induction Model • Notation: – c_n: child nodes of y_n; y'_n ∈ c_n – h(·): consistency term depending on features – w: model weights to be learned • The model combines: the parent indexes of categories, the popularity (number of children) of categories, the consistency of y'_n with its parent y_n and siblings c_n \ y'_n, and a prior on popularity

  17. Taxonomy Induction Model • Looking into h(·): – h(y_n, y'_n, c_n \ y'_n) evaluates how consistent a parent-child group is – The whole model is a factorization of the consistency terms of all local parent-child groups

  18. Model: Developing h(·) • Notation: – c_n: child nodes of y_n; y'_n ∈ c_n – h(·): consistency term depending on features – w: model weights to be learned • h(y_n, y'_n, c_n \ y'_n) = w · g(y_n, y'_n, c_n \ y'_n): the inner product of a weight vector w (to be learned) and a feature vector g measuring the consistency of y'_n with its parent y_n and siblings c_n \ y'_n
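The factorization can be sketched numerically: each local parent-child group contributes a consistency term of the form exp(w · g), and the unnormalized score of a whole tree is the product of these terms. A toy sketch (weights and feature values are made up):

```python
import numpy as np

w = np.array([0.5, 1.2, -0.3])   # model weights, learned from gold taxonomies

def consistency(g, w):
    """Consistency term h for one parent-child group with features g."""
    return np.exp(w @ g)

def tree_score(feature_vectors, w):
    """Unnormalised tree score: product over all local groups."""
    return np.prod([consistency(g, w) for g in feature_vectors])

gs = [np.array([1.0, 0.2, 0.0]), np.array([0.3, 0.9, 0.5])]
# The product of exponentials equals the exponential of the summed features,
# so the log-score is linear in w -- convenient for gradient-based learning.
print(np.isclose(tree_score(gs, w), np.exp(w @ (gs[0] + gs[1]))))  # True
```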

  19. Feature: Developing g • Visual features: – Sibling similarity – Parent-child similarity – Parent prediction • Text features: – Parent prediction [Fu et al.] – Sibling similarity – Surface features [Bansal et al.]

  20. Feature: Developing g • Visual features: sibling similarity (S-V1*) – Step 1: fit a Gaussian to the images of each category – Step 2: derive the pairwise similarity vissim(y_n, y_m) – Step 3: derive the groupwise similarity by averaging • S-V1 evaluates the visual similarity between siblings (* S: sibling, V: visual)
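The three steps of S-V1 can be sketched as below. The specific pairwise similarity (exponential of the negative distance between Gaussian means) is an illustrative assumption, not necessarily the function used in the paper, and the "CNN features" are random toy data:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(feats):
    """Step 1: fit a diagonal Gaussian to a category's image features."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def vissim(g1, g2):
    """Step 2: pairwise similarity between two category Gaussians
    (exp of negative distance between the means -- an assumed choice)."""
    return np.exp(-np.linalg.norm(g1[0] - g2[0]))

def sibling_similarity(gaussians):
    """Step 3: groupwise similarity = average over all sibling pairs."""
    n = len(gaussians)
    sims = [vissim(gaussians[i], gaussians[j])
            for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(sims))

# Hypothetical CNN features for three sibling categories (50 images, 8 dims);
# the first two categories look alike, the third is visually distant.
feats = [rng.normal(loc=c, size=(50, 8)) for c in (0.0, 0.1, 2.0)]
gs = [fit_gaussian(f) for f in feats]
print(vissim(gs[0], gs[1]) > vissim(gs[0], gs[2]))  # True
```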

  21. Feature: Developing g • Visual features: parent-child similarity (PC-V1*) – Step 1: fit a Gaussian for each child category – Step 2: fit a Gaussian for only the top-K images of the parent category – Steps 3-4: same as S-V1 (example: seafish vs. shark) (* PC: parent-child, V: visual)

  22. Feature: Developing g • Visual features: parent prediction (PC-V2*) – Step 1: learn a projection matrix mapping the mean image of a child category to the word embedding of its parent category – Step 2: calculate the distance – Step 3: bin the distance into a feature vector (* PC: parent-child, V: visual)
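The three steps of PC-V2 might look like this in code. The least-squares projection, Euclidean distance, and one-hot binning are illustrative choices, and the dimensions, data, and bin edges are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: learn a projection W mapping mean child-image features to parent
# word embeddings (30 toy child categories; noise-free targets).
mean_imgs = rng.normal(size=(30, 16))
parent_embs = mean_imgs @ rng.normal(size=(16, 8))
W, *_ = np.linalg.lstsq(mean_imgs, parent_embs, rcond=None)

# Step 2: distance between the projected child and a candidate parent.
def projection_distance(child_feat, parent_vec, W):
    return np.linalg.norm(child_feat @ W - parent_vec)

# Step 3: bin the distance into a one-hot indicator feature vector.
def bin_distance(d, edges=(0.5, 1.0, 2.0, 4.0)):
    vec = np.zeros(len(edges) + 1)
    vec[np.searchsorted(edges, d)] = 1.0
    return vec

d = projection_distance(mean_imgs[0], parent_embs[0], W)
print(bin_distance(d))
```

Binning turns a raw distance into indicator features, letting the model weight each distance range separately.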

  23. Feature: Developing g • Text features – Parent prediction [Fu et al.]: projection from child to parent – Sibling similarity: distance between word vectors – Surface features [Bansal et al.]: ends-with (e.g. catshark is a sub-category of shark), LCS, capitalization, etc.
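The surface features are simple string functions. Illustrative implementations follow; the exact definitions are due to Bansal et al., so treat these helpers as assumptions:

```python
def ends_with(child, parent):
    """'catshark' ends with 'shark' -> evidence it is a sub-category."""
    return child.lower().endswith(parent.lower())

def lcs_len(a, b):
    """Length of the longest common substring of two terms."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

def is_capitalized(term):
    """Capitalisation can hint at proper nouns vs. common categories."""
    return term[:1].isupper()

print(ends_with("catshark", "shark"), lcs_len("seafish", "fish"))  # True 4
```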

  24. Parameter Estimation • Inference – Gibbs sampling • Learning – Supervised learning from gold taxonomies in the training data – Gradient-descent-based maximum likelihood estimation • Output taxonomies – Chu-Liu/Edmonds algorithm
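A single Gibbs-sampling sweep over the parent assignments could be sketched as below: each node's parent index is resampled from the conditional distribution implied by a score function. The score here is a toy stand-in for the model's log consistency terms, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs_sweep(z, score, rng):
    """Resample each node's parent index in turn (node 0 is the fixed root)."""
    n_cats = len(z)
    for n in range(1, n_cats):
        logits = np.array([score(n, p) for p in range(n_cats)])
        logits[n] = -np.inf                 # a node cannot be its own parent
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        z[n] = int(rng.choice(n_cats, p=probs))
    return z

toy_score = lambda n, p: 2.0 if p == 0 else 0.0   # strongly favour the root
z = gibbs_sweep([0, 1, 2, 3, 4], toy_score, rng)
print(z[0] == 0 and all(z[n] != n for n in range(1, 5)))  # True
```

After sampling, a maximum spanning arborescence (Chu-Liu/Edmonds) over the aggregated edge scores yields the final output tree.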

  25. Experiment Setup • Implementation – Word vectors: Google word2vec – Convnet: VGG-16 • Evaluation metric: Ancestor-F1, the harmonic mean of precision and recall over ancestor-descendant pairs • Data – Training set: ImageNet taxonomies
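Ancestor-F1 compares the sets of ancestor-descendant pairs implied by the predicted and gold trees. A small sketch using a parent-index encoding (0-based, root = node 0; the trees below are toy examples):

```python
def ancestor_pairs(z, root=0):
    """All (ancestor, descendant) pairs implied by parent-index array z."""
    pairs = set()
    for n in range(len(z)):
        a = n
        while a != root:
            a = z[a]
            pairs.add((a, n))
    return pairs

def ancestor_f1(z_pred, z_gold):
    """Harmonic mean of precision and recall over ancestor pairs."""
    P, G = ancestor_pairs(z_pred), ancestor_pairs(z_gold)
    prec = len(P & G) / len(P)
    rec = len(P & G) / len(G)
    return 2 * prec * rec / (prec + rec)

z_gold = [0, 0, 1, 1]   # gold: 0 -> 1 -> {2, 3}
z_pred = [0, 0, 1, 0]   # predicted: node 3 attached to the root instead
print(round(ancestor_f1(z_pred, z_gold), 3))  # 0.889
```

Because every ancestor pair counts, a mistake high in the tree is penalized more than one near the leaves.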

  26. Evaluation Results: Comparison to Baseline Methods • The embedding-based feature set (LV) is comparable to the state of the art • The full feature set (LVB) achieves the best results • L: language features – surface features – embedding features • V: visual features • B: Bansal2014 features – web ngrams etc. • E: embedding features

  27. Evaluation Results: How Much Do Visual Features Help? • Messages: – Visual similarity (S-V1, PC-V1) helps a lot – The complexity of the visual representation does not matter much

  28. Evaluation Results: Investigating PC-V1 • Images of a parent category are not necessarily all visually similar to images of a child category (example: seafish vs. shark)

  29. Evaluation Results: When/Where Do Visual Features Help? • Messages: – Shallow layers ↔ abstract categories ↔ text features more effective – Deep layers ↔ specific categories ↔ visual features more effective • (Figure: feature weights vs. tree depth)

  30. Take-home Message • Visual similarity helps taxonomy induction a lot – Sibling similarity – Parent-child similarity • Which features are more important? – Visual features are more indicative in near-leaf layers – Text features are more evident in near-root layers • Embedding features augment word-count features

  31. Thank You! Q & A

  32. Evaluation Results: Visualization
