Large Margin Taxonomy Embedding with an Application to Document Categorization
SLIDE 1

Large Margin Taxonomy Embedding with an Application to Document Categorization

  • K. Weinberger and O. Chapelle

NIPS 2008

presented by J. Silva, Duke University

Large Margin Taxonomy Embedding with an Application to Document Categorization May 13, 2011 1 / 16

SLIDE 2

Problem

Multi-class classification

◮ Application: document categorization

The classes (topics) follow a taxonomy (e.g., a hierarchy), so misclassification errors are not all equally severe. Examples:

◮ It is worse to misclassify a male pedestrian as a traffic light than as a female pedestrian

◮ ...or to misclassify a medical journal article on heart attacks as a publication on athlete’s foot than as one on coronary disease

The proposed approach is cost-sensitive and aims to move beyond discrete hierarchical representations to discover a continuous latent semantic space

SLIDE 3

Multi-class classification of documents based on a taxonomy of topics

  • xi is the i-th document
  • pα is the prototype for the α-th class

P, W are mappings to latent semantic space

Note: All figures adapted from the original paper

SLIDE 4

Contributions

A supervised regression algorithm called taxem (taxonomy embedding) is presented. The algorithm learns the regression for the documents and the placement of the topic prototypes in a single optimization. The regression is found by solving a convex semidefinite programming (SDP) problem; in this case, the SDP admits a particular form that can be solved efficiently for large datasets.

SLIDE 5

Outline of the presentation

Notation

Two-step method

◮ Topic embedding
◮ Document regression

One-step combined large margin optimization

Results on the OHSUMED medical journal database

Discussion

SLIDE 6

Notation

Documents x1, . . . , xn ∈ X of dimensionality d

◮ Can be, e.g., bag-of-words indicators or tf-idf scores

y1, . . . , yn ∈ {1, . . . , c} are topic labels in some taxonomy T

Indices

◮ i, j ∈ {1, . . . , n} for documents
◮ α, β ∈ {1, . . . , c} for classes

The taxonomy gives rise to a cost matrix C ∈ R^{c×c}, where Cαβ ≥ 0 is the cost of misclassifying class α as β, and Cαα = 0

We wish to represent

◮ each topic α as a prototype pα ∈ F
◮ each document xi as a low-dimensional vector zi ∈ F

We assume C is given
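As an illustration of how such a cost matrix might arise (an assumption for this sketch only; the paper simply takes C as given), the slide's example topics can be assigned costs by counting edges in a small hypothetical taxonomy:

```python
# Hypothetical taxonomy; costs = number of tree edges between topics.
# Names and structure are illustrative, not from the paper.
parent = {
    "cardiology": "medicine", "dermatology": "medicine",
    "heart attack": "cardiology", "coronary disease": "cardiology",
    "athlete's foot": "dermatology",
}

def path_to_root(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors = set(pa)
    for depth_b, node in enumerate(pb):   # first common ancestor
        if node in ancestors:
            return pa.index(node) + depth_b
    return len(pa) + len(pb)

topics = ["heart attack", "coronary disease", "athlete's foot"]
C = [[tree_distance(a, b) for b in topics] for a in topics]
```

Misclassifying "heart attack" as "coronary disease" (2 hops) is then cheaper than as "athlete's foot" (4 hops), matching the earlier example, and Cαα = 0 holds by construction.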

SLIDE 7

Two-step approach: embedding topic prototypes

Find prototypes p1, . . . , pc ∈ F based on C

Define P = [p1 · · · pc] ∈ R^{c×c} (note: it is assumed throughout the paper that F = R^c)

How to derive P from C?

◮ Simplest: ignore C and set P = I_{c×c}; this places all topics at the corners of a (c − 1)-dimensional simplex; denote this choice P_I

◮ Better: solve

P_mds = arg min_P Σ_{α,β=1}^{c} (‖pα − pβ‖₂² − Cαβ)²

where mds stands for metric multidimensional scaling, as used in ISOMAP

◮ The solution is P_mds = √Λ V⊤, obtained from the eigendecomposition C̄ = VΛV⊤, where C̄ = −(1/2) HCH and H = I − (1/c) 11⊤
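The MDS construction above can be sketched in a few lines (a minimal sketch; `embed_prototypes` is an illustrative name, and the toy C is a Euclidean squared-distance matrix, so the recovery is exact):

```python
import numpy as np

def embed_prototypes(C):
    """Classical metric MDS: columns of the returned P are the prototypes."""
    c = C.shape[0]
    H = np.eye(c) - np.ones((c, c)) / c      # centering matrix H = I - (1/c) 11^T
    C_bar = -0.5 * H @ C @ H                 # C_bar = -1/2 H C H
    lam, V = np.linalg.eigh(C_bar)           # C_bar = V diag(lam) V^T
    lam = np.clip(lam, 0.0, None)            # drop tiny negative eigenvalues
    return np.diag(np.sqrt(lam)) @ V.T       # P_mds = sqrt(Lambda) V^T

# Squared distances of three collinear points at 0, 1, 2:
C = np.array([[0., 1., 4.],
              [1., 0., 1.],
              [4., 1., 0.]])
P = embed_prototypes(C)
# Pairwise squared distances between prototype columns reproduce C:
D2 = ((P[:, :, None] - P[:, None, :]) ** 2).sum(axis=0)
```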

SLIDE 8

Two-step approach: document regression

Assume we have P

Find a mapping W : X → F so that xi with label yi is placed near p_{yi}

Solve the linear ridge regression

W = arg min_W Σ_{i=1}^{n} ‖p_{yi} − W xi‖₂² + λ‖W‖²_F

The solution has the closed form W = PJX⊤(XX⊤ + λI)⁻¹, where X = [x1 · · · xn] and J ∈ {0, 1}^{c×n}, with Jαi = 1 iff yi = α
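A minimal sketch of the closed-form ridge solution on synthetic data (dimensions and data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, c = 20, 50, 4                      # feature dim, documents, classes
X = rng.standard_normal((d, n))          # columns are documents x_i
y = rng.integers(0, c, size=n)           # topic labels
P = np.eye(c)                            # simplex prototypes P_I, for the demo

J = np.zeros((c, n))                     # indicator: J[a, i] = 1 iff y_i = a
J[y, np.arange(n)] = 1.0

lam = 1e-2
# Closed form W = P J X^T (X X^T + lam I)^{-1}; W maps a document into F,
# and training pushes W x_i toward p_{y_i}.
W = P @ J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
```

The solution can be sanity-checked against the normal equations W (XX⊤ + λI) = PJX⊤, which is exactly the stationarity condition of the ridge objective.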

SLIDE 9

Inference and performance measure

Inference (for new documents)

◮ Given a new document xt, first map it into F, then estimate its label via the nearest-prototype rule

ŷt = arg min_α ‖pα − W xt‖₂²

Performance measure

◮ For a given set of labeled documents (x1, y1), . . . , (xn, yn), the quality of the regression is assessed via the averaged cost-sensitive misclassification loss

E = (1/n) Σ_{i=1}^{n} C_{yi ŷi}
SLIDE 10

One-step method

Learning the prototypes independently of the data is not optimal

Untangle the mutual dependence between W and P (a “chicken-and-egg” problem)

◮ Define A = JX⊤(XX⊤ + λI)⁻¹

◮ We have W = PA; A is independent of P and can be pre-computed

Now find P

◮ Let x′i = A xi and eα = [0 · · · 1 · · · 0]⊤, with the 1 in the α-th position

◮ Rewrite pα = P eα and zi = P x′i

◮ We cannot optimize E w.r.t. P directly (the objective is non-continuous and non-differentiable)
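The reparametrization can be verified numerically (a sketch on random data; the identity (PA) xi = P (A xi) is what lets A be pre-computed once, independently of P):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, c = 10, 30, 3
X = rng.standard_normal((d, n))
y = rng.integers(0, c, size=n)
J = np.zeros((c, n)); J[y, np.arange(n)] = 1.0
lam = 0.1

A = J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))  # c x d, pre-computed
X_prime = A @ X                                          # columns are x'_i

P = rng.standard_normal((c, c))                          # any prototype matrix
# Both factorizations of the embedding agree: z_i = (P A) x_i = P x'_i
assert np.allclose(P @ A @ X, P @ X_prime)
```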

SLIDE 11

One-step method (cont.)

Surrogate loss function

Minimize Σ_{i,α} ξiα subject to

‖P(e_{yi} − x′i)‖₂² + C_{yi α} ≤ ‖P(eα − x′i)‖₂² + ξiα,   ξiα ≥ 0

This enforces a large-margin condition, so that prototypes which would incur a larger misclassification cost are farther away

The surrogate loss is an upper bound on E
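The upper-bound claim can be checked numerically (a sketch on random data, using the optimal slack values ξiα = max(0, ‖P(e_{yi} − x′i)‖² + C_{yiα} − ‖P(eα − x′i)‖²)):

```python
import numpy as np

rng = np.random.default_rng(2)
c, n = 4, 25
P = rng.standard_normal((c, c))
Xp = rng.standard_normal((c, n))             # columns are x'_i
y = rng.integers(0, c, size=n)
C = rng.uniform(0, 1, (c, c)); np.fill_diagonal(C, 0.0)

I = np.eye(c)
E_total, surrogate = 0.0, 0.0
for i in range(n):
    # ||P(e_a - x'_i)||^2 for every class a at once
    d2 = ((P @ (I - Xp[:, [i]])) ** 2).sum(axis=0)
    y_hat = d2.argmin()                      # nearest-prototype prediction
    E_total += C[y[i], y_hat]
    xi = np.maximum(0.0, d2[y[i]] + C[y[i]] - d2)   # optimal slacks
    surrogate += xi.sum()
```

For α = ŷi the constraint forces ξ_{i ŷi} ≥ C_{yi ŷi} (since ŷi is the nearest prototype), which is why Σ ξiα upper-bounds the total cost.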

SLIDE 12

One-step method (cont.)

SLIDE 13

One-step method: upper bound on E

SLIDE 14

One-step method: convex formulation and regularization

The above optimization is not convex, due to the quadratic constraints

Make the problem invariant to rotations by defining Q = P⊤P and writing distances w.r.t. Q:

‖P(eα − x′i)‖₂² = (eα − x′i)⊤ Q (eα − x′i) = ‖eα − x′i‖²_Q

Since distances are linear in Q, the constraints become linear, and optimizing over Q ⪰ 0 yields a convex SDP

µ is a regularization parameter
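The identity above, which makes the constraints linear in Q, can be checked numerically (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
c = 5
P = rng.standard_normal((c, c))
Q = P.T @ P                                  # positive semidefinite by construction

e = np.eye(c)[:, 0]                          # e_alpha
xp = rng.standard_normal(c)                  # x'_i
v = e - xp

lhs = ((P @ v) ** 2).sum()                   # ||P(e_alpha - x'_i)||^2
rhs = v @ Q @ v                              # ||e_alpha - x'_i||^2_Q
```

Because lhs = rhs for every v, the quadratic-in-P constraints become linear in Q, at the price of the semidefinite constraint Q ⪰ 0.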

SLIDE 15

Results

SLIDE 16

Results
