on flat versus hierarchical classification in large scale
play

On Flat versus Hierarchical Classification in Large-Scale Taxonomies - PowerPoint PPT Presentation

On Flat versus Hierarchical Classification in Large-Scale Taxonomies R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini Gargantua (CNRS Mastodons) - November the 26 th , 2013 2/21 Challenges Proposed approach Hierarchy Pruning Experiments


  1. On Flat versus Hierarchical Classification in Large-Scale Taxonomies R. Babbar, I. Partalas, ´ E. Gaussier, M.-R. Amini Gargantua (CNRS Mastodons) - November the 26 th , 2013

  2. 2/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Large-scale Hierarchical Classification in Practice ❑ Directory Mozilla ❑ 5 × 10 6 sites ❑ 10 6 categories ❑ 10 5 editors Root Arts Arts Sports Sports Movies Video Tennis Soccer Players Fun Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  3. 3/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Approaches for Large Scale Hierarchical Classification (LSHC) ❑ Hierarchical ❑ Top-down - solve individual classification problems at every node ❑ Big-bang - solve the problem at Root once for entire tree Books Books Music Music ❑ Flat - ignore the taxonomy structure altogether Comics Poetry Rock Jazz ❑ Flattening Approaches in LSHTC Funky Fusion ❑ Somewhat arbitrary as they flatten entire layers ❑ Not quite clear which layers to flatten when taxonomy are much deeper with 10-15 levels Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  4. 4/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Key Challenges in LSHC ❑ How reliable is the given hierarchical structure ? ❑ Arbitrariness in taxonomy creation based on personal biases and choices ❑ Other sources of noise include imbalanced nature of hierarchies ❑ Which Approach - Flat or Hierarchical ? ❑ Lack of clarity on exploiting the hierarchical structure of categories ❑ Speed versus Accuracy trade-off Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  5. 5/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound ⊥ ❑ hierarchy of classes H = ( V , E ) is defined in the form of a rooted tree, with a root ⊥ and a parent relationship π ❑ Nodes at the leaf level, Y = { y ∈ V : ∄ v ∈ V , ( y , v ) ∈ E } ⊂ V , constitute the set of target classes ❑ ∀ v ∈ V \ {⊥} , we define the set of its sisters S ( v ) = { v ′ ∈ V \ {⊥} ; v � = v ′ ∧ π ( v ) = π ( v ′ ) } and its daughters D ( v ) = { v ′ ∈ V \ {⊥} ; π ( v ′ ) = v } ❑ ∀ y ∈ Y , P ( y ) = { v y 1 , . . . , v y k y ; v y 1 = π ( y ) ∧ ∀ l ∈ { 1 , . . . , k y − 1 } , v y l +1 = π ( v y l ) ∧ π ( v y k y ) = ⊥} Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  6. 5/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound ⊥ Y ❑ hierarchy of classes H = ( V , E ) is defined in the form of a rooted tree, with a root ⊥ and a parent relationship π ❑ Nodes at the leaf level, Y = { y ∈ V : ∄ v ∈ V , ( y , v ) ∈ E } ⊂ V , constitute the set of target classes ❑ ∀ v ∈ V \ {⊥} , we define the set of its sisters S ( v ) = { v ′ ∈ V \ {⊥} ; v � = v ′ ∧ π ( v ) = π ( v ′ ) } and its daughters D ( v ) = { v ′ ∈ V \ {⊥} ; π ( v ′ ) = v } ❑ ∀ y ∈ Y , P ( y ) = { v y 1 , . . . , v y k y ; v y 1 = π ( y ) ∧ ∀ l ∈ { 1 , . . . , k y − 1 } , v y l +1 = π ( v y l ) ∧ π ( v y k y ) = ⊥} Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  7. 5/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound ⊥ S ( v ) v D ( v ) ❑ hierarchy of classes H = ( V , E ) is defined in the form of a rooted tree, with a root ⊥ and a parent relationship π ❑ Nodes at the leaf level, Y = { y ∈ V : ∄ v ∈ V , ( y , v ) ∈ E } ⊂ V , constitute the set of target classes ❑ ∀ v ∈ V \ {⊥} , we define the set of its sisters S ( v ) = { v ′ ∈ V \ {⊥} ; v � = v ′ ∧ π ( v ) = π ( v ′ ) } and its daughters D ( v ) = { v ′ ∈ V \ {⊥} ; π ( v ′ ) = v } ❑ ∀ y ∈ Y , P ( y ) = { v y 1 , . . . , v y k y ; v y 1 = π ( y ) ∧ ∀ l ∈ { 1 , . . . , k y − 1 } , v y l +1 = π ( v y l ) ∧ π ( v y k y ) = ⊥} Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  8. 5/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound ⊥ P ( y ) y ❑ hierarchy of classes H = ( V , E ) is defined in the form of a rooted tree, with a root ⊥ and a parent relationship π ❑ Nodes at the leaf level, Y = { y ∈ V : ∄ v ∈ V , ( y , v ) ∈ E } ⊂ V , constitute the set of target classes ❑ ∀ v ∈ V \ {⊥} , we define the set of its sisters S ( v ) = { v ′ ∈ V \ {⊥} ; v � = v ′ ∧ π ( v ) = π ( v ′ ) } and its daughters D ( v ) = { v ′ ∈ V \ {⊥} ; π ( v ′ ) = v } ❑ ∀ y ∈ Y , P ( y ) = { v y 1 , . . . , v y k y ; v y 1 = π ( y ) ∧ ∀ l ∈ { 1 , . . . , k y − 1 } , v y l +1 = π ( v y l ) ∧ π ( v y k y ) = ⊥} Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  9. 6/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound ❑ We consider a top-down hierarchical classification strategy ; ❑ Let K : X × X → R be a PDS kernel and let Φ : X → H be the associated feature mapping function, we suppose that there exists R > 0 such that K ( x , x ) ≤ R 2 for all x ∈ X ; ❑ We consider the class of functions f ∈ F B = { f : ( x , v ) ∈ X × V �→ � Φ( x ) , w v � | W = ( w 1 . . . , w | V | ) , || W || H ≤ B } ; ❑ An exemple ( x , y ) is misclassified iff by f ∈ F B v ′ ∈ S ( v ) f ( x , v ′ )) ≤ 0 v ∈ P ( y ) ( f ( x , v ) − max min Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  10. 6/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound root ⊥ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ y ◦ ❑ An exemple ( x , y ) is misclassified iff by f ∈ F B v ′ ∈ S ( v ) f ( x , v ′ )) ≤ 0 v ∈ P ( y ) ( f ( x , v ) − max min Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  11. 6/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound root ⊥ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ × ◦ ◦ ◦ y ◦ ❑ An exemple ( x , y ) is misclassified iff by f ∈ F B v ′ ∈ S ( v ) f ( x , v ′ )) v ∈ P ( y ) ( f ( x , v ) − max min ≤ 0 � �� � multi-class margin Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  12. 6/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound root ⊥ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ × ◦ ◦ ◦ y ◦ ❑ Top-Down hierarchical techniques suffer from error propagation, but imbalancement harms less as it does for flat approaches ⇒ a generalization bound to study these effects. Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  13. 7/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Hierarchical Rademacher-based Generalization Bound Theorem Let S = (( x ( i ) , y ( i ) )) m i =1 an i.i.d. training set drawn according to a probability distribution D over X × Y , and let A be a Lipschitz function with constant L dominating the 0 / 1 loss; further let K : X × X → R be a PDS kernel and let Φ : X → H be the associated feature mapping function. Assume R > 0 such that K ( x , x ) ≤ R 2 for all x ∈ X . Then, with probability at least (1 − δ ) the following bound holds for all f ∈ F B = { f : ( x , v ) ∈ X × V �→ � Φ( x ) , w v � | W = ( w 1 . . . , w | V | ) , || W || H ≤ B } : m � E ( g f ) ≤ 1 A ( g f ( x ( i ) , y ( i ) )) + 8 BRL ln(2 /δ ) � � √ m | D ( v ) | ( | D ( v ) | − 1) + 3 m 2 m i =1 v ∈ V \Y (1) where G F B = { g f : ( x , y ) ∈ X × Y �→ min v ∈ P ( y ) ( f ( x , v ) − max v ′ ∈ S ( v ) f ( x , v ′ )) | f ∈ F B } and | D ( v ) | denotes the number of daughters of node v. Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  14. 8/21 Challenges Proposed approach Hierarchy Pruning Experiments Conclusion and Future Work Experiments - II Extension of an existing result for flat multi-class classification Theorem (Guermeur, 2007) Let S = (( x ( i ) , y ( i ) )) m i =1 an i.i.d. training set drawn according to a probability distribution D over X × Y , and let A be a Lipschitz function with constant L dominating the 0 / 1 loss; further let K : X × X → R be a PDS kernel and let Φ : X → H be the associated feature mapping function. Assume R > 0 such that K ( x , x ) ≤ R 2 for all x ∈ X . Then, with probability at least (1 − δ ) the following bound holds for all f ∈ F B = { f : ( x , y ) ∈ X × Y �→ � Φ( x ) , w y � | W = ( w 1 . . . , w |Y| ) , || W || H ≤ B } : m � E ( g f ) ≤ 1 A ( g f ( x ( i ) , y ( i ) )) + 8 BRL ln(2 /δ ) � √ m |Y| ( |Y| − 1) + 3 (2) m 2 m i =1 where G F B = { g f : ( x , y ) ∈ X × Y �→ ( f ( x , y ) − max y ′ ∈Y\{ y } f ( x , y ′ )) | f ∈ F B } . Gargantua - Mastodons Massih-Reza.Amini@imag.fr

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend