
The Ladder: A Reliable Leaderboard for Machine Learning Competitions



  1. The Ladder: A Reliable Leaderboard for Machine Learning Competitions. COMS 6998-4 2017, Topics in Learning Theory. Qinyao He (qh2183@columbia.edu), Columbia University. November 30, 2017.

  2. Outline
  Introduction
  Problem Formulation
  Ladder Mechanism
  Parameter-Free Modification
  Boosting Attack
  Experiments on Real Data


  4. Kaggle Competition. [Figure: Public and Private Leaderboard]


  6. Overfitting
  ▶ Repeated submissions to the Kaggle leaderboard tend to overfit the public leaderboard dataset.
  ▶ The public leaderboard score may therefore not represent actual performance, and participants can be misled.
  ▶ In fact, the gap between the public leaderboard score and the actual performance can be as large as O(√(k/n)), where k is the number of submissions and n is the size of the public set.
  ▶ How should we deal with that? How can we maintain a leaderboard that gives a reliable, accurate estimate of the true performance?


  8. Ways to Reduce that Effect
  ▶ Limit the rate of submission (e.g., a maximum of 10 submissions per day).
  ▶ Limit the numerical accuracy returned by the leaderboard (rounding to a fixed number of decimal digits).
  We want a theoretical guarantee that holds even for a very large number of submissions.


  10. Preliminaries and Notation
  ▶ Data domain X and label domain Y; unknown distribution D over X × Y.
  ▶ Classifier f : X → Y; loss function ℓ : Y × Y → [0, 1].
  ▶ Sample S = {(x_1, y_1), ..., (x_n, y_n)} drawn i.i.d. from D.
  ▶ Empirical loss: R_S(f) = (1/n) Σ_{i=1}^{n} ℓ(f(x_i), y_i)
  ▶ True loss: R_D(f) = E_{(x,y)∼D}[ℓ(f(x), y)]
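The two losses above can be sketched in a few lines of Python. This is a toy illustration only: the uniform distribution, the threshold classifier, and the zero-one loss are assumptions for the example, not taken from the slides.

```python
import random

# Toy setup (illustrative): X = [0, 1], Y = {0, 1}, zero-one loss.
def zero_one_loss(y_pred, y_true):
    return 0.0 if y_pred == y_true else 1.0

def empirical_loss(f, sample):
    # R_S(f) = (1/n) * sum_i l(f(x_i), y_i)
    return sum(zero_one_loss(f(x), y) for x, y in sample) / len(sample)

# Draw S i.i.d. from a synthetic D: x ~ Uniform[0, 1], y = 1 iff x > 0.5.
random.seed(0)
sample = [(x, int(x > 0.5)) for x in (random.random() for _ in range(1000))]

f = lambda x: int(x > 0.4)  # a slightly mis-thresholded classifier
# R_S(f) concentrates around the true loss R_D(f) = 0.1, the probability
# mass of the interval (0.4, 0.5] where f disagrees with the true label.
print(empirical_loss(f, sample))
```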

  11. Leaderboard Model
  1. At each time t, a competitor submits a classifier f_t (in practice, a prediction over the holdout dataset).
  2. The leaderboard returns a score estimate R_t to the competitor, computed on the public leaderboard dataset S.
  3. Finally, the true score over D is estimated on a separate private dataset.

  12. Error Evaluation
  Given a sequence of classifiers f_1, f_2, ..., f_k and the scores R_t returned by the leaderboard, we want to bound
  max_{t} |R_D(f_t) − R_t|
  i.e., we want
  Pr[∃ t ∈ [k] : |R_D(f_t) − R_t| > ε] ≤ δ
  The error on the private leaderboard stays close to the true loss, since the private data are never revealed to the competitor.

  13. Kaggle Algorithm
  Algorithm 1 (Kaggle Algorithm)
  Input: data set S, rounding parameter α > 0 (typically 0.00001)
  for each round t ← 1, 2, ... do
      Receive function f_t : X → Y
      return [R_S(f_t)]_α
  end for
  [x]_α denotes rounding x to the nearest integer multiple of α, e.g., [3.14159]_{0.01} = 3.14.
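Algorithm 1 can be sketched directly in Python. This is a minimal sketch assuming zero-one loss on the public set; the factory-function shape and the name `submit` are mine, not from the slides.

```python
def make_kaggle_leaderboard(sample, alpha=1e-5):
    """Sketch of Algorithm 1: for every submission, report the
    alpha-rounded empirical zero-one loss on the public set S."""
    def submit(f):
        r = sum(1.0 if f(x) != y else 0.0 for x, y in sample) / len(sample)
        return round(r / alpha) * alpha  # [R_S(f)]_alpha
    return submit

# Rounding example from the slide: [3.14159]_{0.01} = 3.14
assert abs(round(3.14159 / 0.01) * 0.01 - 3.14) < 1e-9
```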


  15. Simple Non-adaptive Case
  ▶ Assume f_1, ..., f_k are all fixed independently of S.
  ▶ Just report the empirical loss R_S(f_t) as R_t.
  ▶ Directly applying Hoeffding's inequality and a union bound gives
  Pr[∃ t ∈ [k] : |R_D(f_t) − R_S(f_t)| > ε] ≤ 2k exp(−2ε²n)
  ▶ Equivalently, ε = O(√(log k / n)), so we can afford k = O(exp(ε²n)) submissions.
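The bound above translates into a concrete submission budget. The sketch below solves 2k·exp(−2ε²n) ≤ δ for k; the failure probability δ = 0.05 is an assumed value for illustration, not from the slides.

```python
import math

def max_submissions(eps, n, delta=0.05):
    # From 2k * exp(-2 * eps^2 * n) <= delta:
    #   k <= (delta / 2) * exp(2 * eps^2 * n)
    return math.floor(delta / 2 * math.exp(2 * eps * eps * n))

# With a public set of n = 10,000 points and target accuracy eps = 0.02:
print(max_submissions(0.02, 10_000))  # 74 submissions
```

Note how sharply the budget depends on ε: loosening the target to ε = 0.03 already allows millions of non-adaptive submissions, while ε = 0.01 allows almost none.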


  18. Adaptive Setting
  ▶ The classifier f_t may be chosen as a function of the previous estimates:
  f_t = A(f_1, R_1, ..., f_{t−1}, R_{t−1})
  Independence of f_1, ..., f_k never holds, so the union bound over k no longer applies!
  ▶ We will later show a simple attack that forces the Kaggle algorithm to have error ε = Ω(√(k/n)).
  ▶ In fact, there is no computationally efficient way to achieve o(1) error once k ≥ n^{2+o(1)}.

  19. Leaderboard Error
  Bounding the error at every step, as in the previous setting, is not possible. We therefore introduce a weaker notion: we only care about the best classifier submitted so far, rather than accurately estimating every f_i. Let R_t, returned by the leaderboard at time t, represent the estimated loss of the currently best classifier.
  Definition. Given adaptively chosen f_1, ..., f_k, the leaderboard error of the estimates R_1, ..., R_k is
  lberr(R_1, ..., R_k) = max_{1 ≤ t ≤ k} | min_{1 ≤ i ≤ t} R_D(f_i) − R_t |
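The definition translates directly into code. In this small sketch, 0-indexed lists stand in for the sequences R_D(f_t) and R_t; the function name is mine.

```python
def lberr(true_losses, estimates):
    """Leaderboard error: max_t | min_{i <= t} R_D(f_i) - R_t |.
    true_losses[t-1] holds R_D(f_t); estimates[t-1] holds R_t."""
    err, best = 0.0, float("inf")
    for r_d, r_t in zip(true_losses, estimates):
        best = min(best, r_d)          # best true loss among f_1, ..., f_t
        err = max(err, abs(best - r_t))
    return err

# The first two estimates track the running best true loss closely;
# the third understates it by 0.15, which dominates the error.
print(lberr([0.30, 0.25, 0.28], [0.31, 0.24, 0.10]))  # 0.15
```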


  21. Ladder Algorithm
  Algorithm 2 (Ladder Algorithm)
  Input: data set S, step size η > 0
  Assign initial state R_0 ← ∞
  for each round t ← 1, 2, ... do
      Receive function f_t : X → Y
      if R_S(f_t) < R_{t−1} − η then
          Assign R_t ← [R_S(f_t)]_η
      else
          Assign R_t ← R_{t−1}
      end if
      return R_t
  end for
  A submission must improve on the previous best score by a margin of at least η to be accepted as the new best.
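Algorithm 2 in runnable form, as a minimal sketch: zero-one loss on the public set S is an assumption here, and the class and method names are mine.

```python
class Ladder:
    """Sketch of the Ladder mechanism with step size eta."""
    def __init__(self, sample, eta):
        self.sample, self.eta = sample, eta
        self.best = float("inf")  # R_0 <- infinity

    def submit(self, f):
        # Empirical zero-one loss R_S(f_t) on the public set S
        r = sum(1.0 if f(x) != y else 0.0 for x, y in self.sample) / len(self.sample)
        if r < self.best - self.eta:                    # improved by margin eta
            self.best = round(r / self.eta) * self.eta  # R_t <- [R_S(f_t)]_eta
        return self.best                                # else R_t <- R_{t-1}

# Usage on a tiny public set: labels follow y = x mod 2.
board = Ladder([(0, 0), (1, 1), (2, 0), (3, 1)], eta=0.1)
board.submit(lambda x: 1)      # loss 0.5 -> accepted, score 0.5
board.submit(lambda x: x % 2)  # loss 0.0 -> accepted, score 0.0
board.submit(lambda x: 0)      # loss 0.5 -> rejected, score stays 0.0
```

Because rejected submissions return the unchanged previous score, a competitor learns nothing new from them, which is the source of the mechanism's robustness.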


  23. Error Bound
  Theorem. For any adaptively chosen f_1, ..., f_k, the Ladder Mechanism (with a suitable choice of step size η) satisfies
  lberr(R_1, ..., R_k) = O(log^{1/3}(kn) / n^{1/3})
  Put another way: to keep the leaderboard error below ε, we can allow up to k = O((1/n)·exp(ε³n)) submissions. Previously, accurate estimates were impossible beyond k = O(n²) submissions.

  24. Proof
  ▶ Recall the union bound technique from the non-adaptive setting:
  Pr[∃ t ∈ [k] : |R_D(f_t) − R_S(f_t)| > ε] ≤ 2k exp(−2ε²n)
  ▶ In the adaptive setting there are no longer only k possible classifiers; the union bound must cover every classifier that could possibly appear.
  ▶ The problem now becomes counting the total number of distinct classifiers the mechanism can produce.
