Learning Faster from Easy Data II
Wouter Koolen and Tim van Erven
Aim of the Workshop
- Minimax analysis gives robust algorithms
- But in common easy cases these are overly conservative
  – Large gap between performance predicted by theory and observed in practice
- This workshop:
  – Bring together easy cases in different learning settings
  – New algorithms: robust to the worst case, but automatically adapt to easy cases to learn faster
Learning Settings and Easy Cases (non-exhaustive list)
- Standard statistical learning / Active learning:
  – Margin condition (classification), Bernstein condition
  – Data fit a low-complexity model
  – Sparsity
- Online learning:
  – Curvature of the loss: strong convexity, exp-concavity, mixability
  – Small variance: second-order bounds, i.i.d. losses + gap, small losses, ...
  – Many “good” experts
- Bandits:
  – Stochastic = i.i.d. losses + gap
- Clustering:
  – K-means “works”
Easy Land
[Figure: map of “Easy Land”, relating easy cases such as the margin condition across Statistical Learning, Online Learning and Bandits; covered partly in this talk and partly in the posters]
Outline
- Easy data
  – statistical learning
  – online learning
  – bandits
- How to exploit easy data
  – statistical learning
  – online learning
- The price of adaptivity
Statistical Learning
Goal: small risk compared to the minimizer of the risk in the model (small excess risk)
Easy Data in Classification
- For worst-case distributions learning is slow: excess risk of order 1/√n
- Margin condition:
  – common case: P(Y = 1 | X) not too close to 1/2
  – then learning is much faster, with excess risk up to order 1/n (see the sketch below)
[Tsybakov, 2004]
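For reference, a standard way to write the margin condition and the resulting rates; this is the usual formulation from the literature, not copied from the slide, and the exact constants may differ:

% Tsybakov margin condition with exponent alpha >= 0 (standard form, notation mine)
\[
  \Pr\bigl( 0 < \bigl| \Pr(Y = 1 \mid X) - \tfrac{1}{2} \bigr| \le t \bigr) \;\le\; C\, t^{\alpha}
  \qquad \text{for all } t > 0 .
\]
Under suitable complexity assumptions on the model, the excess risk of ERM improves from order \(n^{-1/2}\) in the worst case (\(\alpha = 0\)) to order \(n^{-(1+\alpha)/(2+\alpha)}\), approaching \(n^{-1}\) as \(\alpha \to \infty\).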
The Margin Condition
[Figure: plots of P(Y = 1 | X) illustrating easy, moderate and hard cases, depending on how much probability mass lies close to 1/2]
Large Margin Reduces Variance
- An important source of excess risk is the variance of the excess loss
- Margin condition ⇒ Bernstein condition (stated below)
- Smaller excess risk ⇒ smaller variance
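The Bernstein condition referred to above, in its standard form (notation mine; f* denotes the risk minimizer in the model, and the formulation on the original slide may differ):

% Bernstein condition with exponent beta in [0,1]; beta = 1 is easiest, beta = 0 always holds
\[
  \mathbb{E}\bigl[ (\ell_f(Z) - \ell_{f^*}(Z))^2 \bigr]
  \;\le\;
  B \, \bigl( \mathbb{E}[\ell_f(Z) - \ell_{f^*}(Z)] \bigr)^{\beta}
  \qquad \text{for all } f \text{ in the model.}
\]
For 0/1-loss with the Bayes classifier in the model, the margin condition with exponent \(\alpha\) implies this with \(\beta = \alpha/(1+\alpha)\).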
Online Learning
Goal: small cumulative loss compared to the minimizer of the cumulative loss in the model (small regret)
Easy Data in Online Learning
- Curved losses: strongly convex, exp-concave or mixable losses are easier than linear loss (example below)
- Small empirical variance in the excess losses. Implied by:
  – small losses (L*-bounds)
  – i.i.d. losses + gap
  – Bernstein condition!
[Grünwald]
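As one well-known illustration of how curvature helps (a standard result, not a formula taken from the slide): if every loss is \(\eta\)-exp-concave, i.e.

% eta-exp-concavity of the loss in the prediction (standard definition, notation mine)
\[
  w \;\mapsto\; e^{-\eta\, \ell_t(w)} \quad \text{is concave for every } t,
\]
then exponential weights over \(K\) experts with learning rate \(\eta\), predicting with the weighted mean, guarantees regret at most \((\ln K)/\eta\), independent of \(T\), versus the \(\Theta(\sqrt{T \ln K})\) worst-case rate for linear losses.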
Bandit Online Learning
Goal: small cumulative loss compared to the best fixed arm
- K arms/treatments, each incurring a loss in every round
- Only observe the loss of one's own (randomized) choice
Easy Data for Bandits
- Stochastic bandits (easier):
  – Losses for arms are independent, identically distributed (i.i.d.)
  – Positive gap between expected performance of the best arm and all others
- Adversarial bandits (harder):
  – Losses can be anything, even chosen to make learning as difficult as possible
- Can a single algorithm adapt to:
  – i.i.d. + gap and adversarial?
  – small losses and adversarial?
  – small variance in general and adversarial?
[Auer] [Neu]
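To quantify why the stochastic case is easier, these are the standard regret rates (well-known results, not taken from the slide): with \(K\) arms, \(T\) rounds and gaps \(\Delta_k\) between the best arm and arm \(k\),

% adversarial vs. stochastic bandit regret (standard rates, up to constants)
\[
  \text{adversarial:} \;\; \Theta\bigl(\sqrt{KT}\bigr),
  \qquad
  \text{stochastic with gaps:} \;\; O\Bigl( \sum_{k : \Delta_k > 0} \frac{\ln T}{\Delta_k} \Bigr).
\]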
Outline
- Easy data
  – statistical learning
  – online learning
  – bandits
- How to exploit easy data
  – statistical learning
  – online learning
- The price of adaptivity
Adaptive Statistical Learning
- We consider exploiting β-Bernstein cases
- Method: penalized ERM with learning rate η (for simplicity: prior π on a countable model); sketch below
- How to tune η?
- Knowing β, penalized ERM with a suitably chosen η achieves the fast rate
- Adaptive method through a holdout estimate
- More sophisticated adaptive methods:
  – Slope heuristic [Birgé, Massart]
  – Lepski's method
  – Safe Bayes [Grünwald]
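A sketch of the penalized ERM objective and the resulting rate (standard formulation; notation mine, and the exact tuning used on the slide is not reproduced here):

% penalized ERM with prior pi on a countable model F and learning rate eta
\[
  \hat f_\eta \;=\; \arg\min_{f \in \mathcal{F}} \;
  \sum_{i=1}^{n} \ell_f(Z_i) \;+\; \frac{1}{\eta} \log \frac{1}{\pi(f)} .
\]
Under a \(\beta\)-Bernstein condition with bounded losses, an appropriately tuned \(\eta\) gives excess risk of order
\(\bigl( \log(1/\pi(f^*)) / n \bigr)^{1/(2-\beta)}\),
interpolating between \(n^{-1/2}\) (\(\beta = 0\)) and \(n^{-1}\) (\(\beta = 1\)).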
Adaptive Online Learning: Probabilistic Estimators
- Penalized ERM picks a single element of the model
- Instead, allow probability distributions p on the model
- Solution: exponential weights (see the sketch below)
- Remark: obtain other methods like gradient descent by:
  – changing the KL to other regularizers
  – more general sets for p
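In formulas (a standard derivation, with my own notation rather than the slide's): replacing the single estimator by a distribution p, penalized with KL divergence to the prior π, gives

% entropic-regularized objective; its minimizer is the exponential-weights distribution
\[
  p_t \;=\; \arg\min_{p} \;
  \mathbb{E}_{f \sim p}\Bigl[ \sum_{s < t} \ell_f(z_s) \Bigr]
  + \frac{1}{\eta}\, \mathrm{KL}(p \,\|\, \pi),
  \qquad
  p_t(f) \;\propto\; \pi(f)\, e^{-\eta \sum_{s < t} \ell_f(z_s)} .
\]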
Adaptive Online Learning
- For convex losses, play the mean of p_t
- Standard tuning of η for the worst case
- Gives a worst-case regret bound of order √T (see below)
- Can we do better if we get β-Bernstein data?
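For reference, the standard worst-case tuning in the simplest case of K experts with losses in [0, 1] (a well-known bound, not necessarily the exact constants on the slide):

% worst-case tuning of the learning rate for exponential weights (Hedge)
\[
  \eta = \sqrt{\frac{8 \ln K}{T}}
  \quad \Longrightarrow \quad
  \mathrm{Regret}_T \;\le\; \sqrt{\tfrac{T}{2} \ln K}.
\]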
Adaptive Online Learning
- It turns out we can indeed exploit β-Bernstein data with a correctly tuned η; in fact we want an η adapted to the data
- But we cannot do holdout online
- Then how to tune η?
  – One approach: tune η in terms of an upper bound on the regret that includes some measure of variance
  – Next slide: learn the empirically best learning rate for the data at hand
Squint
- Exponential weights: needs external tuning of η; weights are exponential in η times the regret
- Squint: learns the best η for the data; weights include a variance penalty (sketch below)
[Koolen and Van Erven, 2015]
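A sketch of the Squint weights and guarantee, following Koolen and Van Erven (2015); the notation is mine and details (prior γ over η, lower-order terms) are suppressed. With instantaneous regrets \(r_t^f\), cumulative regret \(R_T^f = \sum_t r_t^f\) and variance \(V_T^f = \sum_t (r_t^f)^2\):

% Squint mixes over learning rates eta instead of fixing one
\[
  w_{T+1}(f) \;\propto\; \pi(f)\,
  \mathbb{E}_{\eta \sim \gamma}\Bigl[ \eta\, e^{\eta R_T^f - \eta^2 V_T^f} \Bigr],
  \qquad
  R_T^f \;\le\; O\Bigl( \sqrt{ V_T^f \bigl( \ln \tfrac{1}{\pi(f)} + \ln\ln T \bigr) } \Bigr).
\]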
Squint
- Philosophy: learn the best learning rate η for the data
- Important for the current overview:
  – Optimal rate in Bernstein cases
- Further advantages beyond the stochastic case:
  – Fast rates on sub-adversarial data
  – Second-order and quantile adaptivity
Outline
- Easy data
  – statistical learning
  – online learning
  – bandits
- How to exploit easy data
  – statistical learning
  – online learning
- The price of adaptivity
Price of adaptivity
- Settings where adaptivity is cheap
  – Statistical learning: holdout, etc.
  – Online learning (full information): Squint
- Settings where adaptivity is subtle or unknown
  – Bandits (i.i.d. stochastic / adversarial)
    - Adaptivity to both settings is affordable [Auer]
    - Can adapt to small losses, but the general intermediate case is very tricky [Neu]
  – Active learning [Singh]
  – Online boosting [Kale]
    - Newly introduced setting (ICML best paper)
    - Seems there is some cost for adaptivity
  – Clustering [Ben-David]
  – ...
[Grünwald, Foster]
Schedule
- Invited speakers
- Spotlights + posters:
  – Online learning, online convex optimization
  – Clustering
  – Statistical learning
  – Non-i.i.d. data
  – Bandits
- Panel discussion
Easy Land: great unknowns
[Figure: the “Easy Land” map revisited with its great unknowns: Statistical Learning (margin condition), Online Learning, Bandits, Clustering (?), Active Learning, Non-Stationarity]