
On the benefits of output sparsity for multi-label classification



  1. On the benefits of output sparsity for multi-label classification Evgenii Chzhen http://echzhen.com Université Paris-Est, Télécom Paristech Joint work with: Christoph Denis, Mohamed Hebiri, Joseph Salmon 1 / 13

  2. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 2 / 13

  3. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 3 / 13

  4. Framework and notation
     We have N observations and each observation belongs to a set of labels.
     - Observations: $X_i \in \mathbb{R}^D$,
     - Label vectors are binary vectors: $Y_i = (Y_i^1, \ldots, Y_i^L)^\top \in \{0, 1\}^L$,
     - $N$, $L$, $D$ are huge,
     - $Y_i$ consists of at most $K$ ones (active labels), with $K \ll L$.
     4 / 13
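The setup above can be made concrete with a tiny sketch (my own illustration, not from the slides), assuming NumPy and illustrative values for L and K:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 100, 10  # illustrative values: L labels, at most K of them active

# One label vector Y in {0,1}^L with exactly K active labels (K << L).
active = rng.choice(L, size=K, replace=False)
Y = np.zeros(L, dtype=int)
Y[active] = 1

print(Y.sum())  # -> 10 active labels out of L = 100
```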

  5. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 5 / 13

  6. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     6 / 13

  7. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     Example:
     $Y = (\underbrace{1, \ldots, 1}_{10}, \underbrace{0, \ldots, 0}_{90})^\top$,
     $\hat{Y}_0 = (\underbrace{1, \ldots, 1}_{10}, \underbrace{1, \ldots, 1}_{5}, \underbrace{0, \ldots, 0}_{85})^\top$,
     $\hat{Y}_1 = (\underbrace{1, \ldots, 1}_{5}, \underbrace{0, \ldots, 0}_{5}, \underbrace{0, \ldots, 0}_{90})^\top$.
     - Same amount of mistakes, but of different type,
     - Which one is better for a user?
     6 / 13

  8. Motivation: 0-type error vs 1-type error
     - 0-type error: $\hat{Y}^l = 1$ when $Y^l = 0$,
     - 1-type error: $\hat{Y}^l = 0$ when $Y^l = 1$.
     Hamming loss:
     $L_H(Y, \hat{Y}) = \sum_{l=1}^{L} \mathbb{1}\{Y^l \neq \hat{Y}^l\} = \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$
     - For the Hamming loss, $\hat{Y}_0$ and $\hat{Y}_1$ are the same,
     - The Hamming loss does not know anything about the sparsity $K$,
     - But Hamming is separable, hence easy to optimize.
     6 / 13
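As a concrete check of the two error types and of the first bullet above, here is a small sketch (my own illustration, not code from the talk) that counts 0-type and 1-type errors on the example from the previous slide and verifies that the plain Hamming loss gives $\hat{Y}_0$ and $\hat{Y}_1$ the same score:

```python
import numpy as np

# Example from the slide: 10 active labels out of L = 100.
Y      = np.array([1] * 10 + [0] * 90)
Y_hat0 = np.array([1] * 15 + [0] * 85)            # 5 extra ones   -> 5 errors of 0-type
Y_hat1 = np.array([1] * 5  + [0] * 95)            # 5 missed ones  -> 5 errors of 1-type

def error_counts(Y, Y_hat):
    """Return (#0-type errors, #1-type errors)."""
    type0 = int(np.sum((Y == 0) & (Y_hat == 1)))  # predicted 1 on a true 0
    type1 = int(np.sum((Y == 1) & (Y_hat == 0)))  # predicted 0 on a true 1
    return type0, type1

def hamming_loss(Y, Y_hat):
    return int(np.sum(Y != Y_hat))

print(error_counts(Y, Y_hat0))                             # (5, 0)
print(error_counts(Y, Y_hat1))                             # (0, 5)
print(hamming_loss(Y, Y_hat0), hamming_loss(Y, Y_hat1))    # 5 5 -> indistinguishable
```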

  9. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 7 / 13

  10. Our approach: add weights
      Weighted Hamming loss:
      $L(Y, \hat{Y}) = p_0 \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + p_1 \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$,
      such that $p_0 + p_1 = 1$.
      8 / 13

  11. Our approach: add weights
      Weighted Hamming loss:
      $L(Y, \hat{Y}) = p_0 \sum_{l: Y^l = 0} \mathbb{1}\{\hat{Y}^l = 1\} + p_1 \sum_{l: Y^l = 1} \mathbb{1}\{\hat{Y}^l = 0\}$,
      such that $p_0 + p_1 = 1$.
      Examples:
      - Hamming loss: $p_0 = p_1 = 0.5$,
      - [Jain et al., 2016]: $p_0 = 0$ and $p_1 = 1$,
      - Our choice: $p_0 = \frac{2K}{L}$ and $p_1 = 1 - p_0$.
      8 / 13
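A hedged sketch of the weighted Hamming loss and of the three weight choices above (the function name and the reuse of the earlier example vectors are mine, not from the paper):

```python
import numpy as np

def weighted_hamming(Y, Y_hat, p0, p1):
    """p0 weights 0-type errors (Y=0, Y_hat=1), p1 weights 1-type errors (Y=1, Y_hat=0)."""
    type0 = np.sum((Y == 0) & (Y_hat == 1))
    type1 = np.sum((Y == 1) & (Y_hat == 0))
    return p0 * type0 + p1 * type1

L, K = 100, 10
weights = {
    "Hamming":            (0.5, 0.5),
    "Jain et al. (2016)": (0.0, 1.0),
    "Ours":               (2 * K / L, 1 - 2 * K / L),
}

# Example vectors from the motivation slide.
Y      = np.array([1] * 10 + [0] * 90)
Y_hat0 = np.array([1] * 15 + [0] * 85)   # 5 errors of 0-type
Y_hat1 = np.array([1] * 5  + [0] * 95)   # 5 errors of 1-type

for name, (p0, p1) in weights.items():
    print(name, weighted_hamming(Y, Y_hat0, p0, p1), weighted_hamming(Y, Y_hat1, p0, p1))
# Hamming scores them identically; the other two weightings tell them apart.
```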

  12. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      9 / 13

  13. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      Classical Hamming loss:
      - $\hat{Y}_1$ is almost the worst,
      - $\hat{Y}_0$ is the same as $\hat{Y}_{2K}$.
      9 / 13

  14. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      [Jain et al., 2016]:
      - $\hat{Y}_0$ is the worst,
      - $\hat{Y}_1$ is the same as $\hat{Y}_{2K}$.
      9 / 13

  15. Why our choice of weights?
      Consider the following situation:
      - $Y = (\underbrace{1, \ldots, 1}_{K}, \underbrace{0, \ldots, 0}_{L-K})^\top$,
      - $\hat{Y}_0 = (0, \ldots, 0)^\top$: predicts all labels inactive,
      - $\hat{Y}_1 = (1, \ldots, 1)^\top$: predicts all labels active,
      - $\hat{Y}_{2K} = (\underbrace{1, \ldots, 1}_{2K}, \underbrace{0, \ldots, 0}_{L-2K})^\top$: makes $K$ mistakes of 0-type,
      - Do not forget that $K \ll L$.
      Our choice:
      - $\hat{Y}_0$, $\hat{Y}_1$ are almost the worst,
      - $\hat{Y}_{2K}$ is almost the best.
      9 / 13
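The claims on the last three slides can be checked numerically; below is a short sketch under the same setup ($K = 10$, $L = 100$ are illustrative values, and the helper names are mine):

```python
import numpy as np

L, K = 100, 10
Y       = np.array([1] * K + [0] * (L - K))
Y_hat0  = np.zeros(L, dtype=int)                       # all labels inactive
Y_hat1  = np.ones(L, dtype=int)                        # all labels active
Y_hat2K = np.array([1] * (2 * K) + [0] * (L - 2 * K))  # K mistakes of 0-type

def weighted_hamming(Y, Y_hat, p0, p1):
    return p0 * np.sum((Y == 0) & (Y_hat == 1)) + p1 * np.sum((Y == 1) & (Y_hat == 0))

for name, (p0, p1) in {"Hamming": (0.5, 0.5),
                       "Jain et al. (2016)": (0.0, 1.0),
                       "Ours": (2 * K / L, 1 - 2 * K / L)}.items():
    losses = [weighted_hamming(Y, Yh, p0, p1) for Yh in (Y_hat0, Y_hat1, Y_hat2K)]
    print(name, losses)

# Hamming: [5.0, 45.0, 5.0]  -> Y_hat0 ties with Y_hat2K, Y_hat1 is by far the worst
# Jain:    [10.0, 0.0, 0.0]  -> Y_hat0 is the worst, Y_hat1 ties with Y_hat2K
# Ours:    [8.0, 18.0, 2.0]  -> Y_hat0 and Y_hat1 are both bad, Y_hat2K is clearly best
```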

  16. Outline Introduction Framework and notation Motivation Our approach Add weights Numerical results Conclusion 10 / 13

  17. Numerical results
      Synthetic dataset with controlled sparsity: $N = 2D = 2L = 200$.

                    Median output sparsity   Recall (micro)   Precision (micro)
      Settings      Our      Std             Our     Std      Our     Std
      K = 2         2.47     0.04            1.0     0.02     0.80    1.0
      K = 6         6.83     0.43            1.0     0.07     0.88    1.0
      K = 10        9.85     1.81            0.90    0.18     0.91    1.0
      K = 14        10.90    4.11            0.72    0.29     0.93    0.99
      K = 18        10.98    6.61            0.58    0.36     0.95    0.99

      - When $K \ll L$ we output MORE active labels,
      - Hence, better Recall and worse Precision,
      - When $K > 10$ our setting is violated.
      11 / 13

  18. Conclusion
      - For sparse datasets, errors of 0-type and 1-type are not the same for a user;
      - Use our framework if you agree with the previous idea;
      - We do not introduce a new algorithm per se, but we construct a new loss;
      - We provide a theoretical justification for our framework (generalization bounds and analysis of convex surrogates).
      12 / 13

  19. Thank you for your attention! 13 / 13
