On Flat versus Hierarchical Classification in Large-Scale Taxonomies
- R. Babbar, I. Partalas, É. Gaussier, M.-R. Amini
Gargantua (CNRS Mastodons) - November 26th, 2013
Large-scale Hierarchical Classification in Practice
❑ Directory Mozilla (DMOZ)
  ❑ 5 × 10⁶ sites
  ❑ 10⁶ categories
  ❑ 10⁵ editors
[Figure: example taxonomy rooted at Root, with categories Arts, Sports, Movies, Video, Tennis, Soccer, Players, Fun]
Approaches for Large-Scale Hierarchical Classification (LSHC)
❑ Hierarchical
  ❑ Top-down: solve an individual classification problem at every node
  ❑ Big-bang: solve a single classification problem over the whole hierarchy at once
❑ Flat: ignore the taxonomy structure altogether (contrasted with top-down in the sketch below)
❑ Flattening approaches in LSHTC
  ❑ Somewhat arbitrary, as they flatten entire layers
  ❑ Not clear which layers to flatten when taxonomies are much deeper, with 10-15 levels
[Figure: example taxonomy rooted at Root, with categories Books, Music, Comics, Poetry, Rock, Jazz, Funky, Fusion]
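A minimal sketch contrasting flat prediction with top-down (greedy) prediction; the taxonomy, scorer and document below are illustrative stand-ins rather than the authors' code:

```python
# Toy taxonomy: internal node -> daughters; the leaves are the target classes.
TREE = {
    "Root": ["Books", "Music"],
    "Books": ["Comics", "Poetry"],
    "Music": ["Rock", "Jazz"],
}
LEAVES = ["Comics", "Poetry", "Rock", "Jazz"]

def score(x, v):
    """Hypothetical per-node scorer f(x, v); any per-node linear model would do."""
    return (hash((x, v)) % 100) / 100.0

def predict_flat(x):
    """Flat: ignore the taxonomy and pick the best-scoring leaf directly."""
    return max(LEAVES, key=lambda y: score(x, y))

def predict_top_down(x):
    """Top-down: at each internal node keep only the best-scoring daughter."""
    v = "Root"
    while v in TREE:                 # descend until a leaf is reached
        v = max(TREE[v], key=lambda c: score(x, c))
    return v

print(predict_flat("doc 1"), predict_top_down("doc 1"))
```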
Challenges
❑ How reliable is the given hierarchical structure?
  ❑ Arbitrariness in taxonomy creation, based on personal biases and choices
  ❑ Other sources of noise include the imbalanced nature of hierarchies
❑ Which approach: flat or hierarchical?
  ❑ Lack of clarity on how to exploit the hierarchical structure of categories
  ❑ Speed versus accuracy trade-off
Hierarchical Rademacher-based Generalization Bound
[Figure: rooted tree with root ⊥, illustrating the leaves Y, the sisters S(v) and daughters D(v) of a node v, and the path P(y) from a leaf y towards the root]
❑ The hierarchy of classes H = (V, E) is defined in the form of a rooted tree, with a root ⊥ and a parent relationship π
❑ Nodes at the leaf level, Y = {y ∈ V : ∄v ∈ V, (y, v) ∈ E} ⊂ V, constitute the set of target classes
❑ ∀v ∈ V \ {⊥}, we define the set of its sisters S(v) = {v′ ∈ V \ {⊥} ; v′ ≠ v ∧ π(v) = π(v′)} and of its daughters D(v) = {v′ ∈ V \ {⊥} ; π(v′) = v}
❑ ∀y ∈ Y, the path from y towards the root is P(y) = {v^y_1, …, v^y_{k_y} ; v^y_1 = y ∧ ∀l ∈ {1, …, k_y − 1}, v^y_{l+1} = π(v^y_l) ∧ π(v^y_{k_y}) = ⊥}
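A minimal sketch (a toy hierarchy, not the DMOZ taxonomy) showing how D(v), S(v), the leaves Y and the path P(y) can be read off the parent relation π; names and structure are illustrative only:

```python
# Toy hierarchy H = (V, E) given by the parent relation π; "ROOT" plays the role of ⊥.
PARENT = {"Books": "ROOT", "Music": "ROOT",
          "Comics": "Books", "Poetry": "Books",
          "Rock": "Music", "Jazz": "Music"}

def daughters(v):
    """D(v): nodes whose parent is v."""
    return {u for u, p in PARENT.items() if p == v}

def sisters(v):
    """S(v): nodes sharing v's parent, v itself excluded."""
    return {u for u, p in PARENT.items() if p == PARENT[v] and u != v}

def leaves():
    """Y: nodes without daughters, i.e. the target classes."""
    return {u for u in PARENT if not daughters(u)}

def path(y):
    """P(y): y and its ancestors, from y up to the daughter of the root."""
    nodes, v = [], y
    while v != "ROOT":
        nodes.append(v)
        v = PARENT[v]
    return nodes

print(daughters("Books"), sisters("Rock"), leaves(), path("Jazz"))
```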
❑ We consider a top-down hierarchical classification strategy
❑ Let K : X × X → ℝ be a PDS kernel and Φ : X → H the associated feature mapping function; we suppose that there exists R > 0 such that K(x, x) ≤ R² for all x ∈ X
❑ We consider the class of functions F_B = {f : (x, v) ∈ X × V ↦ ⟨Φ(x), w_v⟩ | W = (w_1, …, w_{|V|}), ‖W‖_H ≤ B}
❑ An example (x, y) is misclassified by f ∈ F_B iff min_{v∈P(y)} ( f(x, v) − max_{v′∈S(v)} f(x, v′) ) ≤ 0 (illustrated in the sketch after this list)
❑ Top-down classification suffers from error propagation along the cascade of decisions, but class imbalance harms it less than it does flat approaches ⇒ a generalization bound to study these effects
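A minimal sketch of the misclassification test above, with a toy hierarchy and a hypothetical scorer f standing in for ⟨Φ(x), w_v⟩ (not the authors' implementation):

```python
# g_f(x, y) = min over v in P(y) of  f(x, v) - max_{v' in S(v)} f(x, v');
# the example (x, y) is misclassified by the top-down strategy iff g_f(x, y) <= 0.
PARENT = {"Books": "ROOT", "Music": "ROOT",
          "Comics": "Books", "Poetry": "Books",
          "Rock": "Music", "Jazz": "Music"}

def f(x, v):
    """Hypothetical node scorer; any per-node linear model would do."""
    return (hash((x, v)) % 100) / 100.0

def sisters(v):
    return [u for u, p in PARENT.items() if p == PARENT[v] and u != v]

def hierarchical_margin(x, y):
    margin, v = float("inf"), y
    while v != "ROOT":                      # walk the path P(y), from y upwards
        gap = f(x, v) - max((f(x, s) for s in sisters(v)), default=float("-inf"))
        margin = min(margin, gap)
        v = PARENT[v]
    return margin

x, y = "doc 42", "Jazz"
print("misclassified" if hierarchical_margin(x, y) <= 0 else "correctly classified")
```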
Theorem
Let S = ((x^(i), y^(i)))_{i=1}^m be an i.i.d. training set drawn according to a probability distribution D over X × Y, and let A be a Lipschitz function with constant L dominating the 0/1 loss; further let K : X × X → ℝ be a PDS kernel and let Φ : X → H be the associated feature mapping function. Assume that there exists R > 0 such that K(x, x) ≤ R² for all x ∈ X. Then, with probability at least 1 − δ, the following bound holds for all f ∈ F_B = {f : (x, v) ∈ X × V ↦ ⟨Φ(x), w_v⟩ | W = (w_1, …, w_{|V|}), ‖W‖_H ≤ B}:

E(g_f) ≤ (1/m) Σ_{i=1}^m A(g_f(x^(i), y^(i))) + (8BRL/√m) Σ_{v∈V\Y} |D(v)|(|D(v)| − 1) + 3√(ln(2/δ)/(2m))   (1)

where G_{F_B} = {g_f : (x, y) ∈ X × Y ↦ min_{v∈P(y)} ( f(x, v) − max_{v′∈S(v)} f(x, v′) ) | f ∈ F_B} and |D(v)| denotes the number of daughters of node v.
Extension of an existing result for flat multi-class classification
Theorem (Guermeur, 2007)
Let S = ((x^(i), y^(i)))_{i=1}^m be an i.i.d. training set drawn according to a probability distribution D over X × Y, and let A be a Lipschitz function with constant L dominating the 0/1 loss; further let K : X × X → ℝ be a PDS kernel and let Φ : X → H be the associated feature mapping function. Assume that there exists R > 0 such that K(x, x) ≤ R² for all x ∈ X. Then, with probability at least 1 − δ, the following bound holds for all f ∈ F_B = {f : (x, y) ∈ X × Y ↦ ⟨Φ(x), w_y⟩ | W = (w_1, …, w_{|Y|}), ‖W‖_H ≤ B}:

E(g_f) ≤ (1/m) Σ_{i=1}^m A(g_f(x^(i), y^(i))) + (8BRL/√m) |Y|(|Y| − 1) + 3√(ln(2/δ)/(2m))   (2)

where G_{F_B} = {g_f : (x, y) ∈ X × Y ↦ f(x, y) − max_{y′∈Y\{y}} f(x, y′) | f ∈ F_B}.
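The two bounds differ mainly in their complexity terms: Σ_{v∈V\Y} |D(v)|(|D(v)| − 1) for the hierarchical case versus |Y|(|Y| − 1) for the flat case. A minimal sketch (toy tree, not the LSHTC data) of their ratio, reported as CR in the dataset table below:

```python
# Complexity ratio CR = sum_{v in V\Y} |D(v)|(|D(v)|-1)  /  (|Y|(|Y|-1)),
# i.e. the complexity term of the hierarchical bound (1) over that of the flat bound (2).
from collections import Counter

# Toy parent map; "ROOT" is the root and every non-parent node is a leaf.
PARENT = {"Books": "ROOT", "Music": "ROOT",
          "Comics": "Books", "Poetry": "Books",
          "Rock": "Music", "Jazz": "Music", "Funk": "Music"}

n_daughters = Counter(PARENT.values())                  # |D(v)| for each internal node v
leaves = [v for v in PARENT if v not in n_daughters]    # Y: nodes with no daughters

hier_term = sum(d * (d - 1) for d in n_daughters.values())
flat_term = len(leaves) * (len(leaves) - 1)
print(hier_term, flat_term, hier_term / flat_term)      # 10 20 0.5 for this toy tree
```

A ratio below 1, as in this toy tree, means the hierarchical bound has the smaller complexity term.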
❑ Empirical Error vs Error due to Complexity
  ❑ The empirical error is higher for the top-down method, due to the series of decisions made in cascade
  ❑ The complexity term, dominated by Σ_{v∈V\Y} |D(v)|(|D(v)| − 1), is lower for top-down methods
❑ Degree of imbalance in training data
  ❑ Imbalanced data (DMOZ): the flat method suffers, while the top-down method counters the imbalance better and also has a lower complexity term, and is hence preferable
  ❑ Balanced data (IPC, with sample-complexity bounds satisfied for most classes): the flat method should be preferred
❑ This motivates hierarchy pruning, to achieve a trade-off between the two error terms
Dataset     # Tr.    # Test   # Classes   # Feat.     CR      Error ratio
LSHTC2-1    25,310    6,441     1,789       145,859   0.008    1.24
LSHTC2-2    50,558   13,057     4,787       271,557   0.003    1.32
LSHTC2-3    38,725   10,102     3,956       145,354   0.004    2.65
LSHTC2-4    27,924    7,026     2,544       123,953   0.005    1.8
LSHTC2-5    68,367   17,561     7,212       192,259   0.002    2.12
IPC         46,324   28,926       451     1,123,497   0.02    12.27

Table : Datasets used; CR is the complexity ratio of the hierarchical over the flat case, Σ_{v∈V\Y} |D(v)|(|D(v)| − 1) / (|Y|(|Y| − 1)), and the last column is the ratio of the empirical error of hierarchical over flat models
❑ The complexity ratio (CR) is in favour of top-down methods
❑ The empirical error ratio favours flat approaches
Relationship between the generalization error of a trained multiclass logistic regression classifier and its asymptotic version
Theorem
For a multi-class classification problem in a d-dimensional feature space with a training set of size m, {(x^(i), y^(i))}_{i=1}^m, x^(i) ∈ X, y^(i) ∈ Y, sampled i.i.d. from a probability distribution D, let h_m and h_∞ denote the multiclass logistic regression classifiers learned from a training set of finite size m and its asymptotic version respectively, and let E(h_m) and E(h_∞) be their generalization errors. Then, with probability at least 1 − δ, we have:

E(h_m) ≤ E(h_∞) + G_Y( √( R / (σ_0 δ m) ) )

where R is a bound on the function exp(β^y_0 + Σ_{j=1}^d β^y_j x_j), ∀x ∈ X and ∀y ∈ Y, σ_0 is a constant, and G_Y(τ) is a measure of confusion and an increasing function of τ.
[Figure: pruning of a node v; before pruning, the parent of v has daughters S(v) ∪ {v} and v has daughters D(v); after pruning, the parent's daughters form the new set F(v)]
❑ The bounds (1) and (2) are not directly exploitable, but they indicate crucial (meta-)features which control the generalization error
❑ We train a meta-classifier whose meta-instances correspond to sub-hierarchies (candidate nodes for pruning)
❑ Meta-features include the values of the KL-divergence, category sizes, feature-set sizes, etc., before and after pruning
❑ For the meta-classifier, we applied AdaBoost with random forests as base classifiers, with different numbers of trees and depths (see the sketch below)
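A hedged sketch of this meta-learning step using scikit-learn; the meta-features, labels and hyper-parameter values below are illustrative placeholders rather than the exact setup used in the paper:

```python
# Meta-classifier deciding, for each candidate node, whether pruning it helps.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative meta-instances: one row per candidate node, with meta-features
# such as KL-divergence, category size, feature-set size, number of daughters,
# measured before and after pruning (6 placeholder dimensions here).
X_meta = rng.random((200, 6))
y_meta = rng.integers(0, 2, 200)          # 1 = pruning improved accuracy, 0 = it did not

meta_clf = AdaBoostClassifier(
    # On scikit-learn < 1.2, pass base_estimator= instead of estimator=.
    estimator=RandomForestClassifier(n_estimators=50, max_depth=5),
    n_estimators=10,
)
meta_clf.fit(X_meta, y_meta)
print(meta_clf.predict(X_meta[:3]))       # prune / do-not-prune decision per node
```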
Datasets used: LSHTC2-1 and LSHTC2-2 are used for training the meta-classifier (dataset statistics are given in the table above)
      LSHTC2-3                 LSHTC2-4                 IPC
      MNB     MLR     SVM      MNB     MLR     SVM      MNB     MLR     SVM
FL    .729↓↓  .528↓↓  .535↓↓   .848↓↓  .497↓↓  .501↓↓   .671↓↓  .546    .446
RN    .612↓↓  .493↓↓  .517↓↓   .704↓↓  .478↓↓  .484↓↓   .642↓↓  .547↓   .458↓↓
FH    .619↓↓  .484↓↓  .498↓↓   .682↓   .473↓↓  .476↓    .643↓↓  .552↓   .465↓↓
PR    .613    .480    .493     .677    .469    .472     .639    .544    .450

❑ The top-down method is better than the flat approach on the LSHTC datasets, which have a large fraction of rare categories, but not on the IPC dataset
❑ Pruning via meta-learning improves classification accuracy
❑ Conclusion
  ❑ Generalization error bounds for multi-class hierarchical classifiers that theoretically explain the performance of flat and hierarchical methods
  ❑ Proposed a hierarchy pruning strategy that improves classification accuracy
❑ Future Work
  ❑ Use the theoretical framework for building taxonomies
  ❑ Explore other frameworks for hierarchy pruning