

  1. Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
     Alina Beygelzimer (Yahoo!), David Pal (Yahoo!), Balazs Szorenyi (Yahoo!), Devanathan Thiruvenkatachari (NYU), Chen-Yu Wei (USC), Chicheng Zhang (Microsoft)

  2–8. Bandit multiclass classification
     For t = 1, 2, ..., T:
     1. Example (x_t, y_t) is chosen, where x_t ∈ R^d is the feature vector (shown to the learner) and y_t ∈ [K] is the label (hidden from the learner).
     2. The learner predicts a class label ŷ_t ∈ [K].
     3. The learner observes the bandit feedback z_t = 1[ŷ_t ≠ y_t] ∈ {0, 1}.
     Goal: minimize the total number of mistakes Σ_{t=1}^T z_t.
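The interaction protocol above can be sketched as a short loop. The `predict`/`update` learner interface below is a hypothetical one chosen for illustration, not an API from the paper:

```python
def run_bandit_protocol(learner, examples):
    """Run the bandit multiclass protocol: the learner sees only x_t and
    the binary feedback z_t = 1[y_hat != y_t]; y_t itself stays hidden."""
    mistakes = 0
    for x_t, y_t in examples:
        y_hat = learner.predict(x_t)      # 2. predict a label in [K]
        z_t = int(y_hat != y_t)           # 3. bandit feedback only
        learner.update(x_t, y_hat, z_t)   # the true label is never revealed
        mistakes += z_t
    return mistakes                       # total number of mistakes
```

Note that the learner never observes y_t directly; this is exactly what distinguishes the bandit setting from full-information online multiclass classification.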

  9–10. Challenge: efficient algorithms in the separable setting
     Definition. A dataset is called γ-linearly separable if there exist w_1, ..., w_K with Σ_{i=1}^K ‖w_i‖² ≤ 1 such that, for every (x, y) in the dataset and every y′ ≠ y, ⟨w_y, x⟩ ≥ ⟨w_{y′}, x⟩ + γ.
     [Figure: Classes 1, 2, 3 separated by the decision boundaries ⟨w_1 − w_2, x⟩ = 0, ⟨w_1 − w_3, x⟩ = 0, ⟨w_2 − w_3, x⟩ = 0.]
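As a concrete reading of the definition, here is a small checker (an illustrative helper written for this summary, not code from the paper); labels are taken in {0, ..., K−1} and row i of W plays the role of w_i:

```python
import numpy as np

def is_gamma_separable(X, y, W, gamma, tol=1e-9):
    """Return True iff (X, y) is gamma-linearly separable by the rows of W.

    X: (n, d) feature matrix; y: length-n labels in {0, ..., K-1};
    W: (K, d) matrix subject to sum_i ||w_i||^2 <= 1.
    """
    if np.sum(W ** 2) > 1 + tol:
        return False                      # norm constraint violated
    scores = X @ W.T                      # scores[t, i] = <w_i, x_t>
    for t, label in enumerate(y):
        others = np.delete(scores[t], label)
        if np.any(scores[t, label] < others + gamma - tol):
            return False                  # some margin falls below gamma
    return True
```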

  11–15. Related work
     Algorithm                  Mistake bound                             Efficient?
     Minimax algorithm [DH13]   O(K/γ²)                                   No
     Banditron [KSST08]¹        Õ(√(TK/γ²))                               Yes
     This work                  2^{Õ(min(K log²(1/γ), √(1/γ) log K))}     Yes
     Contribution: first efficient algorithm that breaks the √T barrier.
     ¹ See also [HK11, BOZ17, FKL+18, ...], which have similar guarantees.

  16–19. Algorithm (one-versus-rest approach)
     Maintain K binary classifiers, where classifier i answers YES/NO to "does x_t belong to class i?".
     If ≥ 1 of them responds YES: ŷ_t ← any one of those YES labels.
     If all of them respond NO: ŷ_t ← uniform at random from {1, ..., K}.
     This gives E[#mistakes(alg)] ≤ K · Σ_i #mistakes(i).
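The prediction rule on this slide can be written down directly. Here `binary_predicts` (a list of K YES/NO classifier functions) is an assumed interface for illustration:

```python
import random

def one_versus_rest_predict(binary_predicts, x, rng=random):
    """One round of the one-versus-rest rule: return any YES label if one
    exists, otherwise guess uniformly at random over all K labels."""
    yes = [i for i, predict in enumerate(binary_predicts) if predict(x)]
    if yes:
        return yes[0]                            # any YES label suffices
    return rng.randrange(len(binary_predicts))   # all NO: uniform guess
```

When every classifier wrongly answers NO, the uniform guess is still correct with probability 1/K, which is where the factor K in the expected-mistake bound comes from.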

  20–22. Algorithm
     ◮ Each non-linear binary classifier learns the support of class i, which lies in an intersection of K − 1 halfspaces with a margin [KS04].
     [Figure: Classes 1, 2, 3 separated by the decision boundaries ⟨w_1 − w_2, x⟩ = 0, ⟨w_1 − w_3, x⟩ = 0, ⟨w_2 − w_3, x⟩ = 0.]
     ◮ Choice: kernel Perceptron with the rational kernel [SSSS11]: K(x, x′) = 1 / (1 − ⟨x, x′⟩/2).
     ◮ Thu. Poster #158
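A minimal sketch of a binary kernel Perceptron with the rational kernel shown above. The classical Perceptron update is used here as a stand-in; the paper's actual update and parameter choices may differ. Inputs are assumed to satisfy ‖x‖ ≤ 1 so the kernel is well defined:

```python
import numpy as np

def rational_kernel(x, xp):
    """K(x, x') = 1 / (1 - <x, x'>/2); positive whenever ||x||, ||x'|| <= 1."""
    return 1.0 / (1.0 - 0.5 * np.dot(x, xp))

class KernelPerceptron:
    """Binary kernel Perceptron: answers YES iff the kernelized score >= 0."""

    def __init__(self, kernel=rational_kernel):
        self.kernel = kernel
        self.support = []                 # list of (coefficient, example)

    def score(self, x):
        return sum(a * self.kernel(s, x) for a, s in self.support)

    def predict(self, x):
        return self.score(x) >= 0         # YES iff nonnegative score

    def update(self, x, label):           # label in {+1, -1}
        if label * self.score(x) <= 0:    # mistake (or on the boundary)
            self.support.append((float(label), x))
```

In the one-versus-rest reduction, one such classifier would be trained per class, with label +1 meaning "x belongs to class i".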
