Learning with Pairwise Losses

Learning with Pairwise Losses: Problems, Algorithms and Analysis - PowerPoint PPT Presentation

Learning with Pairwise Losses: Problems, Algorithms and Analysis. Purushottam Kar, Microsoft Research India. Outline: Part I: Introduction to pairwise loss functions; example applications. Part II: Batch learning with pairwise loss functions.


  1. Learning with Pairwise Losses: Problems, Algorithms and Analysis
  Purushottam Kar, Microsoft Research India

  2. Outline
  • Part I: Introduction to pairwise loss functions
    • Example applications
  • Part II: Batch learning with pairwise loss functions
    • Learning formulation: no algorithmic details
    • Generalization bounds
    • The coupling phenomenon
    • Decoupling techniques
  • Part III: Online learning with pairwise loss functions
    • A generic online algorithm
    • Regret analysis
    • Online-to-batch conversion bounds
    • A decoupling technique for online-to-batch conversions

  3. Part I: Introduction

  4. What is a loss function? $\ell : \mathcal{H} \to \mathbb{R}_+$
  • We observe empirical losses on data $S = (x_1, \dots, x_n)$, with $\ell_{x_i}(h) = \ell(h, x_i)$
  • ... and try to minimize them (e.g. classification, regression): $\hat{L}_S(h) = \frac{1}{n} \sum_i \ell_{x_i}(h)$, and $\hat{h}$ such that $\hat{L}_S(\hat{h}) = \inf_{h \in \mathcal{H}} \hat{L}_S(h)$
  • ... in the hope that $\left\| \frac{1}{n} \sum_i \ell_{x_i}(\cdot) - \mathbb{E}\,\ell_x(\cdot) \right\|_\infty \le \epsilon$
  • ... so that $L(\hat{h}) \le L(h^*) + \epsilon$, where $L(h) = \mathbb{E}\,\ell_x(h)$
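To make the notation concrete, here is a minimal Python sketch (not from the slides: the squared-error loss, the Gaussian data, and all names are illustrative assumptions) that computes the empirical risk $\hat{L}_S(h)$ of a fixed hypothesis and compares it to a Monte-Carlo stand-in for the population risk $L(h)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(h, x):
    # Illustrative unary loss: squared error of a scalar "hypothesis" h on point x.
    return (h - x) ** 2

def empirical_risk(h, S):
    # L_S(h) = (1/n) * sum_i loss(h, x_i)
    return float(np.mean(loss(h, np.asarray(S))))

S = rng.normal(loc=1.0, size=50)        # training sample S = (x_1, ..., x_n)
big = rng.normal(loc=1.0, size=10**6)   # large sample standing in for the distribution

h = 0.5
print("empirical risk:", empirical_risk(h, S))          # L_S(h)
print("population risk (approx):", empirical_risk(h, big))
```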

  5. Metric Learning
  • Penalize the metric for bringing blue and red points close together
  • The loss function needs to consider two points at a time!
  • ... in other words, a pairwise loss function
  • E.g. $\ell_{x_1, x_2}(M) = \begin{cases} 1, & y_1 \neq y_2 \text{ and } M(x_1, x_2) < \gamma_1 \\ 1, & y_1 = y_2 \text{ and } M(x_1, x_2) > \gamma_2 \\ 0, & \text{otherwise} \end{cases}$
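A minimal sketch of this 0-1 pairwise loss, assuming the metric is the Mahalanobis distance induced by a PSD matrix and with hypothetical threshold values gamma1, gamma2:

```python
import numpy as np

def pairwise_metric_loss(M, x1, y1, x2, y2, gamma1=1.0, gamma2=2.0):
    # 0-1 pairwise loss from the slide: penalize if differently labeled points
    # are closer than gamma1, or identically labeled points farther than gamma2.
    # M is a PSD matrix inducing the Mahalanobis distance (an assumption here).
    d = x1 - x2
    dist = float(np.sqrt(d @ M @ d))
    if y1 != y2 and dist < gamma1:   # differently labeled points too close
        return 1
    if y1 == y2 and dist > gamma2:   # identically labeled points too far apart
        return 1
    return 0
```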

  6. Pairwise Loss Functions
  • Typically, loss functions are based on ground truth: $\ell_x(h) = \ell(h(x), y(x))$
  • Thus, for metric learning, loss functions look like $\ell_{x_1, x_2}(h) = \ell(h(x_1, x_2), y(x_1, x_2))$
  • In the previous example, we had $h(x_1, x_2) = M(x_1, x_2)$ and $y(x_1, x_2) = y_1 y_2$
  • Useful to learn patterns that capture data interactions

  7. Pairwise Loss Functions
  Examples ($\varphi$ is any margin loss function, e.g. hinge loss):
  • Metric learning [Jin et al NIPS '09]: $\ell_{x_1, x_2}(M) = \varphi\!\left( y_1 y_2 \left(1 - M(x_1, x_2)\right) \right)$
  • Preference learning [Xing et al NIPS '02]
  • S-goodness [Balcan-Blum ICML '06]: $\ell_{x_1, x_2}(K) = \varphi\!\left( y_1 y_2\, K(x_1, x_2) \right)$
  • Kernel-target alignment [Cortes et al ICML '10]
  • Bipartite ranking, (p)AUC [Narasimhan-Agarwal ICML '13]: $\ell_{x_1, x_2}(f) = \varphi\!\left( \left(f(x_1) - f(x_2)\right)\left(y_1 - y_2\right) \right)$
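Sketches of two of these surrogates with $\varphi$ taken to be the hinge loss. Treating $M(x_1, x_2)$ as the squared Mahalanobis distance is an assumption on my part, as is every function name:

```python
import numpy as np

def hinge(z):
    # A margin loss phi; the slide allows any margin loss here.
    return max(0.0, 1.0 - z)

def metric_learning_loss(M, x1, y1, x2, y2):
    # phi( y1 * y2 * (1 - M(x1, x2)) ), labels in {-1, +1}; M(x1, x2) is taken
    # to be the squared Mahalanobis distance induced by a PSD matrix M.
    d = x1 - x2
    return hinge(y1 * y2 * (1.0 - float(d @ M @ d)))

def ranking_loss(f, x1, y1, x2, y2):
    # phi( (f(x1) - f(x2)) * (y1 - y2) ): penalizes pairs that f mis-orders.
    return hinge((f(x1) - f(x2)) * (y1 - y2))
```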

  8. Learning Objectives in Pairwise Learning
  • Given training data $x_1, x_2, \dots, x_n$
  • Learn $h : \mathcal{X} \times \mathcal{X} \to \mathcal{Y}$ such that $L(\hat{h}) \le L(h^*) + \epsilon$ (will define $L(\cdot)$ and $\hat{L}(\cdot)$ shortly)
  Challenges:
  • Training data given as singletons, not pairs
  • Algorithmic efficiency
  • Generalization error bounds

  9. Part II: Batch Learning

  10. Part II: Batch Learning
  Batch Learning for Unary Losses

  11. Training with Unary Loss Functions
  • Notion of empirical loss $\hat{L} : \mathcal{H} \to \mathbb{R}_+$
  • Given training data $S = (x_1, \dots, x_n)$, the natural notion is $\hat{L}_S(\cdot) = \frac{1}{n} \sum_i \ell(\cdot, x_i)$
  • Empirical risk minimization dictates that we find $\hat{h}$ s.t. $\hat{L}_S(\hat{h}) \le \inf_{h \in \mathcal{H}} \hat{L}_S(h)$
  • Note that $\hat{L}_S(\cdot)$ is a U-statistic
  • U-statistic: a notion of "training loss" $\hat{L}_S : \mathcal{H} \to \mathbb{R}_+$ s.t. $\forall h \in \mathcal{H},\ \mathbb{E}\,\hat{L}_S(h) = L(h)$

  12. Generalization Bounds for Unary Loss Functions
  • Step 1: Bound the excess risk by the supremum excess risk
    $L(\hat{h}) - \hat{L}_S(\hat{h}) \le \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right]$
  • Step 2: Apply McDiarmid's inequality ($\hat{L}_S(h)$ is not perturbed much by changing any single $x_i$), so with high probability
    $L(\hat{h}) - \hat{L}_S(\hat{h}) \le \mathbb{E} \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right] + O\!\left(\tfrac{1}{\sqrt{n}}\right)$
  • Step 3: Analyze the expected supremum excess risk, introducing a ghost sample $S'$
    $\mathbb{E} \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right] = \mathbb{E} \sup_{h \in \mathcal{H}} \left[ \mathbb{E}_{S'} \hat{L}_{S'}(h) - \hat{L}_S(h) \right] \le \mathbb{E} \sup_{h \in \mathcal{H}} \left[ \hat{L}_{S'}(h) - \hat{L}_S(h) \right]$ (Jensen's inequality)

  13. Analyzing the Expected Supremum Excess Risk
  $\mathbb{E} \sup_{h \in \mathcal{H}} \left[ \hat{L}_{S'}(h) - \hat{L}_S(h) \right]$
  • For unary losses, $\hat{L}_S(\cdot) = \frac{1}{n} \sum_i \ell_{x_i}(\cdot)$
  • Analyzing this term through symmetrization is easy:
    $\mathbb{E} \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_i \left[ \ell_{x'_i}(h) - \ell_{x_i}(h) \right] \le 2\, \mathbb{E} \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_i \sigma_i\, \ell_{x_i}(h) \le 2 L_\ell\, \mathbb{E} \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_i \sigma_i\, h(x_i) \approx O\!\left(\tfrac{1}{\sqrt{n}}\right)$
    (the $\sigma_i$ are i.i.d. Rademacher signs; the second step is contraction with the Lipschitz constant $L_\ell$ of the loss)
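The final Rademacher average can be estimated numerically. A minimal sketch for the linear class $\{x \mapsto \langle w, x \rangle : \|w\|_2 \le 1\}$, where by Cauchy-Schwarz the supremum has the closed form $\|\frac{1}{n}\sum_i \sigma_i x_i\|_2$; the class, the Gaussian data, and the function name are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rademacher_complexity_linear(X, trials=1000):
    # Monte-Carlo estimate of E sup_{||w||<=1} (1/n) sum_i sigma_i <w, x_i>.
    # By Cauchy-Schwarz the supremum equals || (1/n) sum_i sigma_i x_i ||_2.
    n = X.shape[0]
    vals = [np.linalg.norm(rng.choice([-1.0, 1.0], size=n) @ X) / n
            for _ in range(trials)]
    return float(np.mean(vals))

for n in (100, 400, 1600):
    X = rng.normal(size=(n, 5))
    print(n, rademacher_complexity_linear(X))   # decays roughly like 1/sqrt(n)
```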

  14. Part II: Batch Learning
  Batch Learning for Pairwise Loss Functions

  15. Training with Pairwise Loss Functions
  • Given training data $x_1, x_2, \dots, x_n$, choose a U-statistic
  • The U-statistic should use terms like $\ell_{x_i, x_j}(h)$ (the kernel)
  • Population risk defined as $L(\cdot) = \mathbb{E}\, \ell_{x, x'}(\cdot)$
  Examples:
  • For any index set $\Omega \subset [n] \times [n]$, define $\hat{L}_S(\cdot\,; \Omega) = \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \ell_{x_i, x_j}(\cdot)$
  • The choice $\Omega = \{(i, j) : i \neq j\}$ maximizes data utilization
  • Various ways of optimizing $\inf_{h \in \mathcal{H}} \hat{L}_S(h)$ (e.g. SSG)
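A minimal sketch of the empirical risk with an explicit index set, defaulting to the all-pairs choice from the slide (the function and argument names are mine, not the slides'):

```python
def pairwise_empirical_risk(pair_loss, h, S, Omega=None):
    # L_S(h; Omega) = (1/|Omega|) * sum over (i, j) in Omega of pair_loss(h, x_i, x_j).
    # Default Omega = {(i, j) : i != j}, the choice that maximizes data utilization.
    n = len(S)
    if Omega is None:
        Omega = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(pair_loss(h, S[i], S[j]) for i, j in Omega) / len(Omega)
```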

  16. Generalization Bounds for Pairwise Loss Functions
  • Step 1: Bound the excess risk by the supremum excess risk
    $L(\hat{h}) - \hat{L}_S(\hat{h}) \le \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right]$
  • Step 2: Apply McDiarmid's inequality (check that $\hat{L}_S(h)$ is not perturbed much by changing any single $x_i$), so with high probability
    $L(\hat{h}) - \hat{L}_S(\hat{h}) \le \mathbb{E} \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right] + O\!\left(\tfrac{1}{\sqrt{n}}\right)$
  • Step 3: Analyze the expected supremum excess risk
    $\mathbb{E} \sup_{h \in \mathcal{H}} \left[ L(h) - \hat{L}_S(h) \right] = \mathbb{E} \sup_{h \in \mathcal{H}} \left[ \mathbb{E}_{S'} \hat{L}_{S'}(h) - \hat{L}_S(h) \right] \le \mathbb{E} \sup_{h \in \mathcal{H}} \left[ \hat{L}_{S'}(h) - \hat{L}_S(h) \right]$ (Jensen's inequality)

  17. Analyzing the Expected Supremum Excess Risk
  $\mathbb{E} \sup_{h \in \mathcal{H}} \left[ \hat{L}_{S'}(h) - \hat{L}_S(h) \right]$
  • For pairwise losses, $\hat{L}_S(\cdot) = \frac{1}{n(n-1)} \sum_{i \neq j} \ell_{x_i, x_j}(\cdot)$
  • Clean symmetrization is not possible due to coupling in
    $2\, \mathbb{E} \sup_{h \in \mathcal{H}} \frac{1}{n(n-1)} \sum_{i \neq j} \left[ \ell_{x'_i, x'_j}(h) - \ell_{x_i, x_j}(h) \right]$
    (each sample point appears in $n - 1$ different terms, so the summands are not independent)
  • Solutions [see Clémençon et al Ann. Stat. '08]:
    • Alternate representation of U-statistics
    • Hoeffding decomposition
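To illustrate the first solution: a pairwise U-statistic can be rewritten as an average, over permutations of the sample, of means of independent pairs, so standard i.i.d. tools apply to each inner mean. A minimal sketch of this alternate representation, Monte-Carlo-ing over random permutations rather than averaging all $n!$ of them (the function names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def u_stat_independent_pairs(pair_loss, h, S, n_perms=200):
    # For each permutation pi, the pairs (x_pi(0), x_pi(1)), (x_pi(2), x_pi(3)), ...
    # are built from disjoint sample points, hence *independent*; the all-pairs
    # U-statistic is the average of these inner means over permutations.
    n = len(S)
    estimates = []
    for _ in range(n_perms):
        pi = rng.permutation(n)
        inner = [pair_loss(h, S[pi[2 * i]], S[pi[2 * i + 1]])
                 for i in range(n // 2)]
        estimates.append(np.mean(inner))
    return float(np.mean(estimates))
```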

  18. Part III: Online Learning

  19. Part III: Online Learning
  A Whirlwind Tour of Online Learning for Unary Losses

  20. Model for Online Learning with Unary Losses
  • Propose hypothesis $h_{t-1} \in \mathcal{H}$
  • Receive loss $\ell_t(\cdot) = \ell(x_t, \cdot)$
  • Update $h_{t-1} \to h_t$
  • Regret: $R_T = \sum_t \ell_t(h_{t-1}) - \inf_{h \in \mathcal{H}} \sum_t \ell_t(h)$
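A minimal sketch of this propose/receive/update protocol. The learner and stream interfaces (initial_hypothesis, next_loss, update) are hypothetical placeholders, not an API from the slides:

```python
def online_learning(learner, stream, T):
    # Generic protocol: propose h_{t-1}, receive loss ell_t, suffer ell_t(h_{t-1}),
    # then update to h_t.
    h = learner.initial_hypothesis()
    cumulative_loss = 0.0
    losses = []                       # keep ell_1, ..., ell_T to evaluate a comparator
    for t in range(1, T + 1):
        loss_t = stream.next_loss(t)  # ell_t(.) = ell(x_t, .)
        cumulative_loss += loss_t(h)
        losses.append(loss_t)
        h = learner.update(h, loss_t, t)
    # Regret R_T = cumulative_loss - inf_h sum_t ell_t(h); the infimum over the
    # class H must be computed (or bounded) separately.
    return h, cumulative_loss, losses
```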

  21. Online Learning Algorithms
  • Generalized Infinitesimal Gradient Ascent (GIGA) [Zinkevich '03]: $h_t = h_{t-1} - \eta_t \nabla_h \ell_t(h_{t-1})$
  • Follow the Regularized Leader (FTRL) [Hazan et al '06]: $h_t = \arg\min_{h \in \mathcal{H}} \sum_{s=1}^{t-1} \ell_s(h) + r_t \|h\|^2$
  • Under some conditions, $R_T \le O(\sqrt{T})$
  • Under stronger conditions, $R_T \le O(\log T)$
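A minimal sketch of one GIGA-style step, assuming the step size $\eta_t = 1/\sqrt{t}$ (a common choice; the slide leaves $\eta_t$ unspecified) and a Euclidean-ball hypothesis class requiring a projection:

```python
import numpy as np

def giga_step(h, grad, t, radius=1.0):
    # Gradient step with rate eta_t = 1/sqrt(t), then projection back onto
    # the feasible ball H = {h : ||h||_2 <= radius}.
    h = h - (1.0 / np.sqrt(t)) * grad
    norm = np.linalg.norm(h)
    if norm > radius:
        h = h * (radius / norm)
    return h
```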

  22. Online-to-Batch Conversion for Unary Losses
  • Key insight: $h_{t-1}$ is evaluated on an unseen point [Cesa-Bianchi et al '01]
    $\mathbb{E}\left[ \ell_t(h_{t-1}) \mid \sigma(x_1, \dots, x_{t-1}) \right] = \mathbb{E}\, \ell(h_{t-1}, x_t) = L(h_{t-1})$
  • Set up a martingale difference sequence: $V_t = L(h_{t-1}) - \ell_t(h_{t-1})$, with $\mathbb{E}\left[ V_t \mid \sigma(x_1, \dots, x_{t-1}) \right] = 0$
  • Azuma-Hoeffding gives us $\sum_t L(h_{t-1}) \le \sum_t \ell_t(h_{t-1}) + O(\sqrt{T})$ and $\sum_t \ell_t(h^*) \ge T\, L(h^*) - O(\sqrt{T})$
  • Together we get $\sum_t L(h_{t-1}) - T\, L(h^*) \le R_T + O(\sqrt{T})$
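Spelling out how the last bullet follows from the previous two plus the definition of regret (a routine chain, reconstructed here; $h^* \in \mathcal{H}$ is the fixed comparator):

```latex
\begin{align*}
\sum_{t=1}^{T} L(h_{t-1}) - T\,L(h^{*})
  &\le \sum_{t=1}^{T} \ell_t(h_{t-1}) + O(\sqrt{T}) - T\,L(h^{*})
     && \text{(Azuma-Hoeffding on } V_t\text{)} \\
  &\le \sum_{t=1}^{T} \ell_t(h_{t-1}) - \sum_{t=1}^{T} \ell_t(h^{*}) + O(\sqrt{T})
     && \text{(since } \textstyle\sum_t \ell_t(h^{*}) \ge T\,L(h^{*}) - O(\sqrt{T})\text{)} \\
  &\le R_T + O(\sqrt{T})
     && \text{(definition of regret, } h^{*} \in \mathcal{H}\text{)}
\end{align*}
```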

  23. Online-to-Batch Conversion for Unary Losses
  • Hypothesis selection: $\bar{h} = \frac{1}{T} \sum_t h_t$
  • For a convex loss function, $L(\bar{h}) \le \frac{1}{T} \sum_t L(h_t) \le L(h^*) + \frac{1}{T} R_T + O\!\left(\tfrac{1}{\sqrt{T}}\right)$
  • More involved for non-convex losses
  • Better results possible [Tewari-Kakade '08]
  • Assume strongly convex loss functions: $\sum_t L(h_{t-1}) \le T\, L(h^*) + R_T + O\!\left(\sqrt{R_T}\right)$
  • For $R_T = O(\log T)$, this reduces to $L(\bar{h}) \le \frac{1}{T} \sum_t L(h_t) \le L(h^*) + O\!\left(\tfrac{\log T}{T}\right)$
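The averaging step itself is one line; a minimal sketch (the function name is mine):

```python
import numpy as np

def online_to_batch_average(iterates):
    # Return h_bar = (1/T) sum_t h_t. For convex L, Jensen's inequality gives
    # L(h_bar) <= (1/T) sum_t L(h_t), which the regret bound then controls.
    return np.mean(np.asarray(iterates), axis=0)
```

For non-convex losses, averaging no longer helps directly, which is why the slide notes that hypothesis selection becomes more involved there.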

  24. Part III: Online Learning
  Online Learning for Pairwise Loss Functions

  25. Model for Online Learning with Pairwise Losses
  • Propose hypothesis $h_{t-1} \in \mathcal{H}$
  • Receive loss $\ell_t(\cdot) = \,?$
  • Update $h_{t-1} \to h_t$
  • Regret: $R_T = \,?$
