SLIDE 1

Learning Concept Taxonomies from Multi-modal Data

Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan and Eric P. Xing

Carnegie Mellon University

SLIDE 2

Outline

  • Problem
  • Taxonomy Induction Model
  • Features
  • Evaluation and Analysis

SLIDE 3

Problem

  • Taxonomy induction
    – Input: a set of lexical terms, e.g. {consumer goods, fashion, uniform, neckpiece, handwear, finery, disguise, ...}
    – Output: a taxonomy organizing the terms
  • Why it matters:
    – Human knowledge
    – Interpretability
    – Question answering
    – Information extraction
    – Computer vision
SLIDE 4

Problem

  • Existing taxonomies
    – Knowledge- and time-intensive to build
    – Limited coverage
    – Unavailable

SLIDE 5

Related Works (NLP)

  • Automatic induction of taxonomies
    – Widdows [2003]; Snow et al. [2006]; Yang and Callan [2009]; Kozareva and Hovy [2010]; Poon and Domingos [2010]; Navigli et al. [2011]; Fu et al. [2014]; Bansal et al. [2014]

SLIDE 6

Problem

  • What evidence helps taxonomy induction?
    – Surface features
      • Ends with
      • Contains
      • Suffix match
    – Examples: shark → white shark; bird → bird of prey
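A minimal Python sketch of such surface features; the function and feature names are illustrative, not the paper's exact definitions:

```python
def surface_features(parent: str, child: str) -> dict:
    """Illustrative string-surface cues for a candidate (parent, child) pair."""
    def longest_common_suffix(a: str, b: str) -> int:
        n = 0
        while n < min(len(a), len(b)) and a[-(n + 1)] == b[-(n + 1)]:
            n += 1
        return n

    return {
        # "white shark" ends with "shark" -> evidence it is a kind of shark
        "ends_with": child.endswith(parent),
        # the child term contains the parent term anywhere
        "contains": parent in child,
        # length of the longest common suffix of the two terms
        "suffix_match": longest_common_suffix(parent, child),
    }

print(surface_features("shark", "white shark"))
# {'ends_with': True, 'contains': True, 'suffix_match': 5}
```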
SLIDE 7

Problem

  • What evidence helps taxonomy induction?
    – Semantics from text descriptions
      • Parent-child relation: “seafish, such as shark…”, “rays are a group of seafishes…”
      • Sibling relation [Bansal 2014]: “Either shark or ray…”, “Both shark and ray…”

(Example terms: seafish, shark, ray)
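A hedged sketch of how such lexico-syntactic patterns could be counted over text; the regexes and the helper are illustrative stand-ins for the richer pattern inventories used in practice:

```python
import re

# Illustrative Hearst-style patterns of the kind the slide quotes.
PARENT_CHILD_PATTERNS = [
    r"{parent}\w*, such as {child}",       # "seafish, such as shark..."
    r"{child}s? are a group of {parent}",  # "rays are a group of seafishes..."
]
SIBLING_PATTERNS = [
    r"either {a} or {b}",                  # "Either shark or ray..."
    r"both {a} and {b}",                   # "Both shark and ray..."
]

def count_parent_child(text: str, parent: str, child: str) -> int:
    """Count parent-child pattern hits for a term pair in a text snippet."""
    return sum(len(re.findall(p.format(parent=parent, child=child), text, re.I))
               for p in PARENT_CHILD_PATTERNS)

text = "Seafish, such as shark and ray, live in the sea."
print(count_parent_child(text, "seafish", "shark"))  # 1
print(len(re.findall(SIBLING_PATTERNS[1].format(a="shark", b="ray"),
                     "Both shark and ray are seafish.", re.I)))  # 1
```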

SLIDE 8

Problem

  • What evidence helps taxonomy induction?
    – Semantics from text descriptions
      • Parent-child relation: “seafish, such as shark…”, “rays are a group of seafishes…”
      • Sibling relation [Bansal 2014]: “Either shark or ray…”, “Both shark and ray…”
    – Where the patterns are extracted from:
      • Wikipedia abstracts: term presence and distance, pattern matches
      • Web n-grams

SLIDE 9

Problem

  • What evidence helps taxonomy induction?
    – Word embeddings (word2vec)
    – Projections between parent and child [Fu 2014]

      w(king) − w(queen) ≈ w(man) − w(woman)
      w(seafish) − w(shark) ≈ w(human) − w(woman) ?
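A minimal numpy sketch of the projection idea, fitting a single linear map by least squares so that the projected child embedding lands near its parent's embedding; Fu et al. [2014] actually cluster relation offsets and fit one projection per cluster, and the synthetic vectors here are stand-ins for real word2vec embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# Stand-in embeddings for (child, parent) training pairs.
child_vecs = rng.normal(size=(200, dim))
true_map = 0.1 * rng.normal(size=(dim, dim))
parent_vecs = child_vecs @ true_map.T + 0.01 * rng.normal(size=(200, dim))

# Solve min_M ||child_vecs @ M.T - parent_vecs||^2 by least squares.
M_t, *_ = np.linalg.lstsq(child_vecs, parent_vecs, rcond=None)
M = M_t.T

def parent_prediction_distance(child_vec, parent_vec):
    """Small distance between M @ w(child) and w(parent) is evidence
    for a parent-child edge."""
    return np.linalg.norm(M @ child_vec - parent_vec)
```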

SLIDE 10

Motivation

  • How about images?

(Figure: example images of seafish, shark, and ray)

SLIDE 11

Motivation

  • Our motivation
    – Images may include perceptual semantics
    – Jointly leverage text and visual information (from the web)
  • Problems to be addressed:
    – How to design visual features to capture the perceptual semantics?
    – How to design models to integrate visual and text information?

SLIDE 12

Related Works (CV)

  • Building visual hierarchies
    – Chen et al. [2013]; Sivic et al. [2008]; Griffin and Perona [2008]

SLIDE 13

Task Definition

  • Assume a set of N categories x = {x_1, x_2, …, x_N}
    – Each category has a name and a set of images
  • Goal: induce a taxonomy tree over x
    – using both text and visual features
  • Setting: supervised learning of category hierarchies from data

Example: x = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}
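A minimal sketch of the input this setting assumes; the Category class and its field names are illustrative, with each image represented by a precomputed convnet feature vector:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Category:
    name: str
    # (num_images, feature_dim) matrix of convnet features; 4096 is illustrative
    image_features: np.ndarray = field(default_factory=lambda: np.empty((0, 4096)))

categories = [Category(n) for n in
              ["Animal", "Fish", "Shark", "Cat", "Tiger",
               "Terrestrial animal", "Seafish", "Feline"]]
# Goal: predict a parent index for every category, which jointly
# encodes a taxonomy tree over the whole set.
```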

SLIDE 14

Model

Let z_n (1 ≤ z_n ≤ N) be the index of the parent of category x_n.

  – The set z = {z_1, z_2, …, z_N} encodes the whole tree structure.

  • Our goal → infer the conditional distribution p(z | x)

Example: x = {Animal, Fish, Shark, Cat, Tiger, Terrestrial animal, Seafish, Feline}

SLIDE 15

Model Overview

  • Intuition: categories tend to be closely related to their parents and siblings
    – (text) hypernym-hyponym relation: shark → cat shark
    – (visual) similarity: images of shark ⇔ images of ray
  • Method: induce features from distributed representations of images and text
    – images: deep convnet
    – text: word embeddings

SLIDE 16

Taxonomy Induction Model

  • Notation:
    – c_n: child nodes of x_n
    – x_n^m ∈ c_n: the m-th child of x_n
    – g: consistency term, depending on the features
    – w: model weights to be learned
  • Annotations on the model equation:
    – z: the parent indexes of the categories
    – a prior on the popularity of each parent (its number of children)
    – the consistency of x_n^m with its parent x_n and siblings c_n \ x_n^m

SLIDE 17

Taxonomy Induction Model

  • Looking into g:
    – g(x_n, x_n^m, c_n \ x_n^m) evaluates how consistent a parent-child group is: the child x_n^m with its parent x_n and siblings c_n \ x_n^m.
    – The whole model is a factorization of the consistency terms of all local parent-child groups.

SLIDE 18

Model: Develop g

  • Notation (as before):
    – c_n: child nodes of x_n, with x_n^m ∈ c_n
    – g: consistency term; w: model weights to be learned
  • g is log-linear in the features:

      g(x_n, x_n^m, c_n \ x_n^m) = exp( w^T f(x_n, x_n^m, c_n \ x_n^m) )

    – w: weight vector (to be learned)
    – f: feature vector of x_n^m with its parent x_n and siblings c_n \ x_n^m
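A minimal sketch of scoring a candidate tree under this factorization, assuming the log-linear form above; it omits the popularity prior and the normalization constant, and feature_fn is an illustrative stand-in for f:

```python
import numpy as np

def log_score(parents, w, feature_fn):
    """Unnormalized log-probability of a tree.

    parents[n] = index of the parent of category n (the root has parent -1);
    feature_fn(i, m, siblings) returns the feature vector f for child m
    under parent i with the given siblings.
    """
    n_cat = len(parents)
    children = {i: [m for m in range(n_cat) if parents[m] == i]
                for i in range(n_cat)}
    total = 0.0
    for i, kids in children.items():
        for m in kids:
            siblings = [s for s in kids if s != m]
            total += w @ feature_fn(i, m, siblings)  # log g(x_i, x_i^m, ...)
    return total
```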

SLIDE 19

Feature: Develop f

  • Visual features:
    – Sibling similarity
    – Parent-child similarity
    – Parent prediction
  • Text features:
    – Parent prediction [Fu et al.]
    – Sibling similarity
    – Surface features [Bansal et al.]

SLIDE 20

Feature: Develop f

  • Visual features: Sibling similarity (S-V1*)
    – Step 1: fit a Gaussian to the image features of each category
    – Step 2: derive the pairwise similarity vissim(x_i, x_j)
    – Step 3: derive the groupwise similarity by averaging

  S-V1 evaluates the visual similarity between siblings.

  * S: sibling, V: visual
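A hedged numpy sketch of S-V1, assuming diagonal Gaussians and an exp(−symmetric KL) pairwise similarity; the paper's exact similarity function may differ:

```python
import numpy as np

def fit_gaussian(feats):
    """feats: (num_images, dim) convnet features of one category."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6  # mean, diag variance

def vissim(g1, g2):
    """exp(-symmetric KL) between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1)
    return np.exp(-0.5 * (kl12 + kl21))

def sibling_similarity(sibling_feats):
    """Groupwise S-V1: average pairwise similarity over sibling pairs."""
    gs = [fit_gaussian(f) for f in sibling_feats]
    pairs = [(i, j) for i in range(len(gs)) for j in range(i + 1, len(gs))]
    return float(np.mean([vissim(gs[i], gs[j]) for i, j in pairs])) if pairs else 0.0
```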

SLIDE 21

Feature: Develop f

  • Visual features: Parent-child similarity (PC-V1*)
    – Step 1: fit a Gaussian for each child category
    – Step 2: fit a Gaussian for only the top-K images of the parent category
    – Steps 3-4: same as in S-V1

  * PC: parent-child, V: visual

(Figure: example images of seafish and shark)

SLIDE 22

Feature: Develop f

  • Visual features: Parent prediction (PC-V2*)
    – Step 1: learn a projection matrix that maps the mean image feature of a child category to the word embedding of its parent category
    – Step 2: calculate the distance between the projection and the parent embedding
    – Step 3: bin the distance into a feature vector

  * PC: parent-child, V: visual
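A hedged sketch of PC-V2, using a least-squares projection and illustrative bin edges:

```python
import numpy as np

def learn_projection(mean_imgs, parent_embs):
    """Least-squares M such that M @ mean_image_feature ~ parent embedding.

    mean_imgs: (num_pairs, img_dim); parent_embs: (num_pairs, emb_dim).
    """
    M_t, *_ = np.linalg.lstsq(mean_imgs, parent_embs, rcond=None)
    return M_t.T

def pc_v2_feature(M, child_mean_img, parent_emb, edges=(0.5, 1.0, 2.0, 4.0)):
    """Bin the projection distance into a one-hot indicator feature."""
    d = np.linalg.norm(M @ child_mean_img - parent_emb)
    feat = np.zeros(len(edges) + 1)
    feat[np.searchsorted(edges, d)] = 1.0
    return feat
```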

SLIDE 23

Feature: Develop f

  • Text features
    – Parent prediction [Fu et al.]
      • projection from the child embedding to the parent embedding
    – Sibling similarity
      • distance between word vectors
    – Surface features [Bansal et al.]
      • ends with (e.g. catshark is a sub-category of shark), LCS, capitalization, etc.

SLIDE 24

Parameter Estimation

  • Inference
    – Gibbs sampling
  • Learning
    – Supervised learning from the gold taxonomies of the training data
    – Gradient-descent-based maximum likelihood estimation
  • Output taxonomies
    – Chu-Liu/Edmonds algorithm (see the sketch below)
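A minimal sketch of the decoding step for a simplified model with per-edge scores only, using the Chu-Liu/Edmonds implementation that ships with networkx (the paper's groupwise consistency terms make its actual decoding richer than this); edge_scores is an illustrative stand-in for model outputs:

```python
import networkx as nx

edge_scores = {("animal", "fish"): 2.0, ("animal", "cat"): 1.5,
               ("fish", "shark"): 2.5, ("cat", "shark"): 0.3}

G = nx.DiGraph()
for (parent, child), s in edge_scores.items():
    G.add_edge(parent, child, weight=s)

# Chu-Liu/Edmonds: the maximum spanning arborescence is the best tree.
tree = nx.maximum_spanning_arborescence(G)
print(sorted(tree.edges()))
# [('animal', 'cat'), ('animal', 'fish'), ('fish', 'shark')]
```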

SLIDE 25

Experiment Setup

  • Implementation
    – Word embeddings: Google word2vec
    – Convnet: VGG-16
  • Evaluation metric: Ancestor-F1 = 2·P·R / (P + R), where P and R are the precision and recall of the ancestor-descendant pairs implied by the predicted tree (see the sketch at the end of this slide)
  • Data
    – Training set: ImageNet taxonomies
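A minimal sketch of Ancestor-F1, computing precision and recall over the ancestor-descendant pairs implied by predicted vs. gold parent maps; helper names are illustrative:

```python
def ancestor_pairs(parents):
    """parents: dict child -> parent. Return all (ancestor, descendant) pairs."""
    pairs = set()
    for node in parents:
        # walk up to the root, collecting every ancestor of `node`
        p = parents[node]
        while p is not None:
            pairs.add((p, node))
            p = parents.get(p)
    return pairs

def ancestor_f1(pred_parents, gold_parents):
    pred, gold = ancestor_pairs(pred_parents), ancestor_pairs(gold_parents)
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)
    r = len(pred & gold) / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {"fish": "animal", "shark": "fish", "cat": "animal"}
pred = {"fish": "animal", "shark": "animal", "cat": "animal"}
print(round(ancestor_f1(pred, gold), 3))  # 0.857
```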

SLIDE 26

Evaluation

Results: Comparison to baseline methods

  • The embedding-based feature set (LV) is comparable to the state of the art
  • The full feature set (LVB) achieves the best results

Feature-set legend:
  • L: language features
    – surface features
    – embedding features
  • V: visual features
  • B: Bansal2014 features
    – web n-grams etc.
  • E: embedding features
SLIDE 27

Evaluation

Results: How much do visual features help?

Messages:
  • Visual similarity features (S-V1, PC-V1) help a lot
  • The complexity of the visual representation does not matter much

SLIDE 28

Evaluation

Results: Investigating PC-V1

  • Images of a parent category are not necessarily all visually similar to images of a child category

(Figure: example images of seafish and shark)

SLIDE 29

Evaluation

Results: When/where do visual features help?

  • Messages:
    – Shallow (near-root) layers ↔ abstract categories ↔ text features more effective
    – Deep (near-leaf) layers ↔ specific categories ↔ visual features more effective

(Figure: feature weights vs. taxonomy depth)

SLIDE 30

Take-home Message

  • Visual similarity helps taxonomy induction a lot
    – Sibling similarity
    – Parent-child similarity
  • Which features are more important?
    – Visual features are more indicative in near-leaf layers
    – Text features are more evident in near-root layers
  • Embedding features augment word-count features

SLIDE 31

Thank You! Q & A

SLIDE 32

Evaluation

Results: Visualization