
On the benefits of output sparsity for multi-label classification



  1. On the benefits of output sparsity for multi-label classification Evgenii Chzhen http://echzhen.com Université Paris-Est, Télécom Paristech Joint work with: Christoph Denis, Mohamed Hebiri, Joseph Salmon 1 / 13

  2. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 2 / 13

  3. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 3 / 13

  4. Framework and notation
     We have N observations and each observation belongs to a set of labels.
     - Observations: $X_i \in \mathbb{R}^D$,
     - Label vectors are binary vectors: $Y_i = (Y_i^1, \ldots, Y_i^L)^\top \in \{0, 1\}^L$,
     - $N$, $L$, $D$ are huge,
     - $Y_i$ consists of at most $K$ ones (active labels), with $K \ll L$.
     4 / 13
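The setup above can be made concrete with a tiny sketch (my own illustration, not from the slides), assuming NumPy and illustrative values for L and K:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 100, 10  # illustrative values: L labels, at most K of them active

# One label vector Y in {0,1}^L with exactly K active labels (K << L).
active = rng.choice(L, size=K, replace=False)
Y = np.zeros(L, dtype=int)
Y[active] = 1

print(Y.sum())  # -> 10 active labels out of L = 100
```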

  5. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 5 / 13

  6. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     6 / 13

  7. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     Example:
     $Y = (\underbrace{1, \ldots, 1}_{10}, \underbrace{0, \ldots, 0}_{90})^\top$,
     $\hat{Y}_0 = (\underbrace{1, \ldots, 1}_{10}, \underbrace{1, \ldots, 1}_{5}, \underbrace{0, \ldots, 0}_{85})^\top$,
     $\hat{Y}_1 = (\underbrace{1, \ldots, 1}_{5}, \underbrace{0, \ldots, 0}_{5}, \underbrace{0, \ldots, 0}_{90})^\top$.
     - Same amount of mistakes, but of different type,
     - Which one is better for a user?
     6 / 13

  8. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     Hamming loss:
     $L_H(Y, \hat{Y}) = \sum_{l=1}^{L} \mathbb{1}\{Y^l \neq \hat{Y}^l\} = \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$
     - For the Hamming loss, $\hat{Y}_0$ and $\hat{Y}_1$ are the same,
     - The Hamming loss does not know anything about the sparsity $K$,
     - But Hamming is separable, hence easy to optimize.
     6 / 13
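As a concrete check of the two error types and of the first bullet above, here is a small sketch (my own illustration, not code from the talk) that counts 0-type and 1-type errors on the example from the previous slide and verifies that the plain Hamming loss gives $\hat{Y}_0$ and $\hat{Y}_1$ the same score:

```python
import numpy as np

# Example from the slide: 10 active labels out of L = 100.
Y      = np.array([1] * 10 + [0] * 90)
Y_hat0 = np.array([1] * 15 + [0] * 85)            # 5 extra ones   -> 5 errors of 0-type
Y_hat1 = np.array([1] * 5  + [0] * 95)            # 5 missed ones  -> 5 errors of 1-type

def error_counts(Y, Y_hat):
    """Return (#0-type errors, #1-type errors)."""
    type0 = int(np.sum((Y == 0) & (Y_hat == 1)))  # predicted 1 on a true 0
    type1 = int(np.sum((Y == 1) & (Y_hat == 0)))  # predicted 0 on a true 1
    return type0, type1

def hamming_loss(Y, Y_hat):
    return int(np.sum(Y != Y_hat))

print(error_counts(Y, Y_hat0))                             # (5, 0)
print(error_counts(Y, Y_hat1))                             # (0, 5)
print(hamming_loss(Y, Y_hat0), hamming_loss(Y, Y_hat1))    # 5 5 -> indistinguishable
```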

  9. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 7 / 13

  10. Our approach: add weights
      Weighted Hamming loss:
      $L(Y, \hat{Y}) = p_0 \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + p_1 \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$,
      such that $p_0 + p_1 = 1$.
      8 / 13

  11. Our approach: add weights
      Weighted Hamming loss:
      $L(Y, \hat{Y}) = p_0 \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + p_1 \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$,
      such that $p_0 + p_1 = 1$.
      Examples:
      - Hamming loss: $p_0 = p_1 = 0.5$,
      - [Jain et al., 2016]: $p_0 = 0$ and $p_1 = 1$,
      - Our choice: $p_0 = \frac{2K}{L}$ and $p_1 = 1 - p_0$.
      8 / 13
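A hedged sketch of the weighted Hamming loss and of the three weight choices above (the function name and the reuse of the earlier example vectors are mine, not from the paper):

```python
import numpy as np

def weighted_hamming(Y, Y_hat, p0, p1):
    """p0 weights 0-type errors (Y=0, Y_hat=1), p1 weights 1-type errors (Y=1, Y_hat=0)."""
    type0 = np.sum((Y == 0) & (Y_hat == 1))
    type1 = np.sum((Y == 1) & (Y_hat == 0))
    return p0 * type0 + p1 * type1

L, K = 100, 10
weights = {
    "Hamming":            (0.5, 0.5),
    "Jain et al. (2016)": (0.0, 1.0),
    "Ours":               (2 * K / L, 1 - 2 * K / L),
}

# Example vectors from the motivation slide.
Y      = np.array([1] * 10 + [0] * 90)
Y_hat0 = np.array([1] * 15 + [0] * 85)   # 5 errors of 0-type
Y_hat1 = np.array([1] * 5  + [0] * 95)   # 5 errors of 1-type

for name, (p0, p1) in weights.items():
    print(name, weighted_hamming(Y, Y_hat0, p0, p1), weighted_hamming(Y, Y_hat1, p0, p1))
# Hamming scores them identically; the other two weightings tell them apart.
```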

  12. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      9 / 13

  13. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      Classical Hamming loss:
      - $\hat{Y}_1$ is almost the worst,
      - $\hat{Y}_0$ is the same as $\hat{Y}_{2K}$.
      9 / 13

  14. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      [Jain et al., 2016]:
      - $\hat{Y}_0$ is the worst,
      - $\hat{Y}_1$ is the same as $\hat{Y}_{2K}$.
      9 / 13

  15. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      Our choice:
      - $\hat{Y}_0$, $\hat{Y}_1$ are almost the worst,
      - $\hat{Y}_{2K}$ is almost the best.
      9 / 13
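The claims on the last three slides can be checked numerically; below is a short sketch under the same setup ($K = 10$, $L = 100$ are illustrative values, and the helper names are mine):

```python
import numpy as np

L, K = 100, 10
Y       = np.array([1] * K + [0] * (L - K))
Y_hat0  = np.zeros(L, dtype=int)                       # all labels inactive
Y_hat1  = np.ones(L, dtype=int)                        # all labels active
Y_hat2K = np.array([1] * (2 * K) + [0] * (L - 2 * K))  # K mistakes of 0-type

def weighted_hamming(Y, Y_hat, p0, p1):
    return p0 * np.sum((Y == 0) & (Y_hat == 1)) + p1 * np.sum((Y == 1) & (Y_hat == 0))

for name, (p0, p1) in {"Hamming": (0.5, 0.5),
                       "Jain et al. (2016)": (0.0, 1.0),
                       "Ours": (2 * K / L, 1 - 2 * K / L)}.items():
    losses = [weighted_hamming(Y, Yh, p0, p1) for Yh in (Y_hat0, Y_hat1, Y_hat2K)]
    print(name, losses)

# Hamming: [5.0, 45.0, 5.0]  -> Y_hat0 ties with Y_hat2K, Y_hat1 is by far the worst
# Jain:    [10.0, 0.0, 0.0]  -> Y_hat0 is the worst, Y_hat1 ties with Y_hat2K
# Ours:    [8.0, 18.0, 2.0]  -> Y_hat0 and Y_hat1 are both bad, Y_hat2K is clearly best
```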

  16. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 10 / 13

  17. Numerical results
      Synthetic dataset with controlled sparsity: $N = 2D = 2L = 200$.

                    Median output sparsity   Recall (micro)   Precision (micro)
      Settings      Our      Std             Our     Std      Our     Std
      K = 2         2.47     0.04            1.0     0.02     0.80    1.0
      K = 6         6.83     0.43            1.0     0.07     0.88    1.0
      K = 10        9.85     1.81            0.90    0.18     0.91    1.0
      K = 14        10.90    4.11            0.72    0.29     0.93    0.99
      K = 18        10.98    6.61            0.58    0.36     0.95    0.99

      - When $K \ll L$ we output MORE active labels,
      - Hence, better Recall and worse Precision,
      - When $K > 10$ our setting is violated.
      11 / 13

  18. Conclusion
      - For sparse datasets, errors of 0-type and 1-type are not the same for a user;
      - Use our framework if you agree with the previous idea;
      - We do not introduce a new algorithm per se, but we construct a new loss;
      - We provide a theoretical justification for our framework (generalization bounds and analysis of convex surrogates).
      12 / 13

  19. Thank you for your attention! 13 / 13
