Large Margin Taxonomy Embedding with an Application to Document Categorization
SLIDE 1

Large Margin Taxonomy Embedding with an Application to Document Categorization

  • K. Weinberger and O. Chapelle

NIPS 2008

presented by J. Silva, Duke University

Large Margin Taxonomy Embedding with an Application to Document Categorization May 13, 2011 1 / 16

SLIDE 2

Problem

Multi-class classification

◮ Application: document categorization

The classes (topics) follow a taxonomy (e.g., a hierarchy), so misclassification errors are not all equally severe. Examples:

◮ It is worse to misclassify a male pedestrian as a traffic light than as a female pedestrian

◮ ...or to misclassify a medical journal article on heart attacks as a publication on athlete’s foot than as one on coronary disease

The proposed approach is cost-sensitive and aims to move beyond discrete hierarchical representations to discover a continuous latent semantic space

SLIDE 3

Multi-class classification of documents based on a taxonomy of topics

  • xi is the i-th document
  • pα is the prototype for the α-th class

P, W are mappings to latent semantic space

Note: All figures adapted from the original paper

SLIDE 4

Contributions

A supervised regression algorithm called taxem (taxonomy embedding) is presented. The algorithm learns the regression for the documents and the placement of the topic prototypes in a single optimization. The regression is found by solving a convex semidefinite programming (SDP) problem; in this case, the SDP admits a particular form that can be solved efficiently for large datasets.

SLIDE 5

Outline of the presentation

Notation

Two-step method

◮ Topic embedding
◮ Document regression

One-step combined large margin optimization

Results on the OHSUMED medical journal database

Discussion

SLIDE 6

Notation

Documents x1, . . . , xn ∈ X of dimensionality d

◮ Can be, e.g., bag-of-words indicators or tf-idf scores

y1, . . . , yn ∈ {1, . . . , c} are topic labels in some taxonomy T

Indices

◮ i, j ∈ {1, . . . , n} for documents
◮ α, β ∈ {1, . . . , c} for classes

The taxonomy gives rise to a cost matrix C ∈ R^{c×c}, where Cαβ ≥ 0 is the cost of misclassifying class α as β, and Cαα = 0

We wish to represent

◮ each topic α as a prototype pα ∈ F
◮ each document xi as a low-dimensional vector zi ∈ F

We assume C is given
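As an illustration of how such a cost matrix might arise (an assumption for this sketch only; the paper simply takes C as given), the slide's example topics can be assigned costs by counting edges in a small hypothetical taxonomy:

```python
# Hypothetical taxonomy; costs = number of tree edges between topics.
# Names and structure are illustrative, not from the paper.
parent = {
    "cardiology": "medicine", "dermatology": "medicine",
    "heart attack": "cardiology", "coronary disease": "cardiology",
    "athlete's foot": "dermatology",
}

def path_to_root(node):
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors = set(pa)
    for depth_b, node in enumerate(pb):   # first common ancestor
        if node in ancestors:
            return pa.index(node) + depth_b
    return len(pa) + len(pb)

topics = ["heart attack", "coronary disease", "athlete's foot"]
C = [[tree_distance(a, b) for b in topics] for a in topics]
```

Misclassifying "heart attack" as "coronary disease" (2 hops) is then cheaper than as "athlete's foot" (4 hops), matching the earlier example, and Cαα = 0 holds by construction.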

SLIDE 7

Two-step approach: embedding topic prototypes

Find prototypes p1, . . . , pc ∈ F based on C

Define P = [p1 · · · pc] ∈ R^{c×c} (note: it is assumed throughout the paper that F = R^c)

How to derive P from C?

◮ Simplest: ignore C and set P = I_{c×c}; this places all topics at the corners of a (c − 1)-dimensional simplex; denote this choice P_I

◮ Better: solve

P_mds = arg min_P Σ_{α,β=1}^{c} (‖pα − pβ‖₂² − Cαβ)²

where mds stands for metric multidimensional scaling, as used in ISOMAP

◮ The solution is P_mds = √Λ V⊤, obtained from the eigendecomposition C̄ = VΛV⊤, where C̄ = −(1/2) HCH and H = I − (1/c) 11⊤
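The MDS construction above can be sketched in a few lines (a minimal sketch; `embed_prototypes` is an illustrative name, and the toy C is a Euclidean squared-distance matrix, so the recovery is exact):

```python
import numpy as np

def embed_prototypes(C):
    """Classical metric MDS: columns of the returned P are the prototypes."""
    c = C.shape[0]
    H = np.eye(c) - np.ones((c, c)) / c      # centering matrix H = I - (1/c) 11^T
    C_bar = -0.5 * H @ C @ H                 # C_bar = -1/2 H C H
    lam, V = np.linalg.eigh(C_bar)           # C_bar = V diag(lam) V^T
    lam = np.clip(lam, 0.0, None)            # drop tiny negative eigenvalues
    return np.diag(np.sqrt(lam)) @ V.T       # P_mds = sqrt(Lambda) V^T

# Squared distances of three collinear points at 0, 1, 2:
C = np.array([[0., 1., 4.],
              [1., 0., 1.],
              [4., 1., 0.]])
P = embed_prototypes(C)
# Pairwise squared distances between prototype columns reproduce C:
D2 = ((P[:, :, None] - P[:, None, :]) ** 2).sum(axis=0)
```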

SLIDE 8

Two-step approach: document regression

Assume we have P

Find a mapping W : X → F so that xi with label yi is placed near p_{yi}

Solve the linear ridge regression

W = arg min_W Σ_{i=1}^{n} ‖p_{yi} − W xi‖₂² + λ‖W‖²_F

The solution has the closed form W = PJX⊤(XX⊤ + λI)⁻¹, where X = [x1 · · · xn] and J ∈ {0, 1}^{c×n}, with Jαi = 1 iff yi = α
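A minimal sketch of the closed-form ridge solution on synthetic data (dimensions and data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, c = 20, 50, 4                      # feature dim, documents, classes
X = rng.standard_normal((d, n))          # columns are documents x_i
y = rng.integers(0, c, size=n)           # topic labels
P = np.eye(c)                            # simplex prototypes P_I, for the demo

J = np.zeros((c, n))                     # indicator: J[a, i] = 1 iff y_i = a
J[y, np.arange(n)] = 1.0

lam = 1e-2
# Closed form W = P J X^T (X X^T + lam I)^{-1}; W maps a document into F,
# and training pushes W x_i toward p_{y_i}.
W = P @ J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
```

The solution can be sanity-checked against the normal equations W (XX⊤ + λI) = PJX⊤, which is exactly the stationarity condition of the ridge objective.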

SLIDE 9

Inference and performance measure

Inference (for new documents)

◮ Given a new document xt, first map it into F, then estimate its label via the nearest-prototype rule

ŷt = arg min_α ‖pα − W xt‖₂²

Performance measure

◮ For a given set of labeled documents (x1, y1), . . . , (xn, yn), the quality of the regression is assessed via the averaged cost-sensitive misclassification loss

E = (1/n) Σ_{i=1}^{n} C_{yi ŷi}
SLIDE 10

One-step method

Learning the prototypes independently of the data is not optimal

Untangle the mutual dependence between W and P (a “chicken-and-egg” problem)

◮ Define A = JX⊤(XX⊤ + λI)⁻¹

◮ We have W = PA; A is independent of P and can be pre-computed

Now find P

◮ Let x′i = A xi and eα = [0 · · · 1 · · · 0]⊤, with the 1 in the α-th position

◮ Rewrite pα = P eα and zi = P x′i

◮ We cannot optimize E w.r.t. P directly (the objective is non-continuous and non-differentiable)
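The reparametrization can be verified numerically (a sketch on random data; the identity (PA) xi = P (A xi) is what lets A be pre-computed once, independently of P):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, c = 10, 30, 3
X = rng.standard_normal((d, n))
y = rng.integers(0, c, size=n)
J = np.zeros((c, n)); J[y, np.arange(n)] = 1.0
lam = 0.1

A = J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))  # c x d, pre-computed
X_prime = A @ X                                          # columns are x'_i

P = rng.standard_normal((c, c))                          # any prototype matrix
# Both factorizations of the embedding agree: z_i = (P A) x_i = P x'_i
assert np.allclose(P @ A @ X, P @ X_prime)
```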

SLIDE 11

One-step method (cont.)

Surrogate loss function

Minimize Σ_{i,α} ξiα subject to

‖P(e_{yi} − x′i)‖₂² + C_{yi α} ≤ ‖P(eα − x′i)‖₂² + ξiα,   ξiα ≥ 0

This enforces a large-margin condition, so that prototypes which would incur a larger misclassification cost are farther away

The surrogate loss is an upper bound on E
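The upper-bound claim can be checked numerically (a sketch on random data, using the optimal slack values ξiα = max(0, ‖P(e_{yi} − x′i)‖² + C_{yiα} − ‖P(eα − x′i)‖²)):

```python
import numpy as np

rng = np.random.default_rng(2)
c, n = 4, 25
P = rng.standard_normal((c, c))
Xp = rng.standard_normal((c, n))             # columns are x'_i
y = rng.integers(0, c, size=n)
C = rng.uniform(0, 1, (c, c)); np.fill_diagonal(C, 0.0)

I = np.eye(c)
E_total, surrogate = 0.0, 0.0
for i in range(n):
    # ||P(e_a - x'_i)||^2 for every class a at once
    d2 = ((P @ (I - Xp[:, [i]])) ** 2).sum(axis=0)
    y_hat = d2.argmin()                      # nearest-prototype prediction
    E_total += C[y[i], y_hat]
    xi = np.maximum(0.0, d2[y[i]] + C[y[i]] - d2)   # optimal slacks
    surrogate += xi.sum()
```

For α = ŷi the constraint forces ξ_{i ŷi} ≥ C_{yi ŷi} (since ŷi is the nearest prototype), which is why Σ ξiα upper-bounds the total cost.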

SLIDE 12

One-step method (cont.)

SLIDE 13

One-step method: upper bound on E

SLIDE 14

One-step method: convex formulation and regularization

The above optimization is not convex, due to the quadratic constraints

Make the problem invariant to rotations by defining Q = P⊤P and writing distances w.r.t. Q:

‖P(eα − x′i)‖₂² = (eα − x′i)⊤ Q (eα − x′i) = ‖eα − x′i‖²_Q

Since distances are linear in Q, the constraints become linear, and optimizing over Q ⪰ 0 yields a convex SDP

µ is a regularization parameter
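The identity above, which makes the constraints linear in Q, can be checked numerically (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
c = 5
P = rng.standard_normal((c, c))
Q = P.T @ P                                  # positive semidefinite by construction

e = np.eye(c)[:, 0]                          # e_alpha
xp = rng.standard_normal(c)                  # x'_i
v = e - xp

lhs = ((P @ v) ** 2).sum()                   # ||P(e_alpha - x'_i)||^2
rhs = v @ Q @ v                              # ||e_alpha - x'_i||^2_Q
```

Because lhs = rhs for every v, the quadratic-in-P constraints become linear in Q, at the price of the semidefinite constraint Q ⪰ 0.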

SLIDE 15

Results

SLIDE 16

Results
