

SLIDE 1

Easy Data

Peter Grünwald

Centrum Wiskunde & Informatica – Amsterdam Mathematical Institute – Leiden University

Joint work with

  • W. Koolen, T. Van Erven, N. Mehta, T. Sterkenburg
SLIDE 2

Today: Three Things To Tell You

  • 1. Nifty Reformulation of Conditions for Fast Rates in Statistical Learning – Tsybakov, Bernstein, Exp-Concavity, ...

  • 2. Do this via new concept: ESI
  • 3. Precise Analogue of the Bernstein Condition for Fast Rates in the Individual Sequence Setting – ... and an algorithm that achieves these rates!

SLIDE 3

Today: Three Things To Tell You

  • 1. Nifty Reformulation of Conditions for Fast Rates in Statistical Learning

  • 2. Do this via new concept: ESI
  • 3. Precise Analogue of the Bernstein Condition for Fast Rates in the Individual Sequence Setting – ... and an algorithm that achieves these rates!

SLIDE 4
  • Figure from the ‘stochmix’ paper:

Van Erven, Grünwald, Mehta, Reid, Williamson. Fast Rates in Statistical and Online Learning. JMLR Special Issue in Memory of A. Chervonenkis, Oct. 2015

VC: Vapnik–Chervonenkis (1974!) optimistic (realizability) condition
TM: Tsybakov (2004) margin condition (special case: Massart condition)
𝒗-BC: Audibert, Bousquet (2005), Bartlett, Mendelson (2006) “Bernstein Condition”

  • Does not require 0/1 or absolute loss
  • Does not require Bayes act to be in model

SLIDE 5

Decision Problem

  • A decision problem (DP) is defined as a tuple (𝑄, 𝒵, 𝒢, ℓ), where
  • 𝑄 is the distribution of a random quantity 𝑍 taking values in 𝒵,
  • the model 𝒢 is a set of predictors 𝑔, and for each 𝑔 ∈ 𝒢, ℓ_𝑔(𝑧) indicates the loss 𝑔 makes on 𝑧
  • Example: squared error loss ℓ_𝑔(𝑧) = (𝑧 − 𝑔)²
SLIDE 6

Decision Problem

  • A decision problem (DP) is defined as a tuple (𝑄, 𝒵, 𝒢, ℓ), where
  • 𝑄 is the distribution of a random quantity 𝑍 taking values in 𝒵,
  • the model 𝒢 is a set of predictors 𝑔, and for each 𝑔 ∈ 𝒢, ℓ_𝑔(𝑧) indicates the loss 𝑔 makes on 𝑧
  • We assume throughout that the model contains a risk minimizer 𝑔∗, achieving min_{𝑔∈𝒢} E[ℓ_𝑔(𝑍)]
  • E[⋅] abbreviates E_{𝑍∼𝑄}[⋅]
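
A minimal Python sketch of these definitions (the concrete 𝑄, model, and loss are illustrative choices of ours, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    # Decision problem: Q = N(0.3, 1); model G = constant predictors g in [-1, 1];
    # squared error loss: the loss of g on outcome z is (z - g)^2.
    predictors = np.linspace(-1.0, 1.0, 201)

    def loss(g, z):
        return (z - g) ** 2

    # Risk minimizer g*: argmin over g in G of E_{Z~Q}[loss(g, Z)],
    # with the expectation estimated by Monte Carlo.
    z = rng.normal(0.3, 1.0, size=100_000)
    risks = np.array([loss(g, z).mean() for g in predictors])
    g_star = predictors[risks.argmin()]
    print(g_star)  # close to 0.3: squared-error risk is minimized at E[Z]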
SLIDE 7

Bernstein Condition

  • Fix a DP with (for now) bounded loss
  • DP satisfies the (𝐷, 𝛽)-Bernstein condition if there exist 𝐷 > 0, 𝛽 ∈ [0,1], such that for all 𝑔 ∈ 𝒢: E[𝑋_𝑔²] ≤ 𝐷 ⋅ (E[𝑋_𝑔])^𝛽, where we set 𝑋_𝑔 ≔ ℓ_𝑔(𝑍) − ℓ_{𝑔∗}(𝑍)
  • E[𝑋_𝑔] is the ‘regret of 𝑔 relative to 𝑔∗’.
SLIDE 8

Bernstein Condition

  • Fix a DP with (for now) bounded loss
  • DP satisfies the (𝐷, 𝛽)-Bernstein condition if there exist 𝐷 > 0, 𝛽 ∈ [0,1], such that for all 𝑔 ∈ 𝒢: E[𝑋_𝑔²] ≤ 𝐷 ⋅ (E[𝑋_𝑔])^𝛽, where we set 𝑋_𝑔 ≔ ℓ_𝑔(𝑍) − ℓ_{𝑔∗}(𝑍)
  • Generalizes the Tsybakov condition: 𝑔∗ does not need to be the Bayes act, loss does not need to be 0/1
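
An illustrative numerical check (our example, not from the slides): for squared error with a well-specified location model, the Bernstein condition holds with 𝛽 = 1, so the ratio E[𝑋_𝑔²]/E[𝑋_𝑔] should stay bounded in 𝑔:

    import numpy as np

    rng = np.random.default_rng(1)
    z = rng.normal(0.3, 1.0, size=200_000)        # Z ~ Q with g* = E[Z] = 0.3
    g_star = 0.3

    for g in [0.4, 0.5, 1.0, -1.0]:
        x_g = (z - g) ** 2 - (z - g_star) ** 2    # excess loss X_g
        # (D, 1)-Bernstein: E[X_g^2] <= D * E[X_g] for a single constant D
        print(g, x_g.mean(), (x_g ** 2).mean() / x_g.mean())

Here the ratio is (up to Monte Carlo error) 4𝜎² + (𝑔 − 𝑔∗)², so any 𝐷 ≥ 4𝜎² plus the squared model diameter works.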

SLIDE 9

Bernstein Condition

  • Fix a DP with (for now) bounded loss
  • DP satisfies the (𝐷, 𝛽)-Bernstein condition if there exist 𝐷 > 0, 𝛽 ∈ [0,1], such that for all 𝑔 ∈ 𝒢: E[𝑋_𝑔²] ≤ 𝐷 ⋅ (E[𝑋_𝑔])^𝛽, where we set 𝑋_𝑔 ≔ ℓ_𝑔(𝑍) − ℓ_{𝑔∗}(𝑍)
  • Suppose data 𝑍₁, …, 𝑍_𝑇 are i.i.d. and the (𝐷, 𝛽)-Bernstein condition holds. Then...

SLIDE 10

Under Bernstein(𝑫, 𝜷)

  • Empirical risk minimization satisfies, with high prob*, E[𝑋_{𝑔̂}] = O((1/𝑇)^{1/(2−𝛽)}) (up to log factors)
  • 𝛽 = 0: condition trivially satisfied, get minimax rate O(1/√𝑇)
  • 𝛽 = 1: nice case (Massart condition), get ‘log-loss’ rate O(1/𝑇)

SLIDE 11

Under Bernstein(𝑫, 𝜷)

  • 𝜽-“Bayes” MAP satisfies, with high prob*, E[𝑋_{𝑔̂}] = O((1/𝑇)^{1/(2−𝛽)})
  • This requires setting the “learning rate” 𝜃 in terms of 𝛽 and 𝑇!
  • 𝛽 = 0: slow rate O(1/√𝑇); 𝛽 = 1: fast rate O(1/𝑇)

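A reconstruction of the tuning (consistent with the 𝑣-central connection later in the deck, where 𝜃 = 𝑣(𝜀) with 𝑣(𝜀) ∝ 𝜀^{1−𝛽}): plugging the target rate 𝜀_𝑇 into 𝑣 gives

    \[
      \varepsilon_T \;=\; T^{-\frac{1}{2-\beta}},
      \qquad
      \theta \;\propto\; \varepsilon_T^{\,1-\beta} \;=\; T^{-\frac{1-\beta}{2-\beta}} .
    \]
    % beta = 0: theta ~ T^{-1/2}, the conservative slow-rate tuning;
    % beta = 1: theta constant, the aggressive fast-rate tuning.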
SLIDE 12

GOAL: Sequential Bernstein

  • 𝜃-“Bayes” MAP satisfies, with high prob*, the fast rate above
  • GOAL: design a ‘sequential Bernstein condition’ and accompanying sequential prediction algorithm s.t.
  • 1. cumulative regret always satisfies a worst-case (slow-rate) bound, for all 𝑔∗, all sequences
  • 2. if the condition holds, it also satisfies, with high prob*, the corresponding fast-rate bound
SLIDE 13

GOAL: Sequential Bernstein

  • GOAL: design a ‘sequential Bernstein condition’ and accompanying sequential prediction algorithm s.t.
  • 1. cumulative regret always satisfies a worst-case (slow-rate) bound, for all 𝑔∗, all sequences
  • 2. if the condition holds, it also satisfies, with high prob*, the corresponding fast-rate bound
SLIDE 14

DREAM

  • DREAM: design a ‘sequential Bernstein condition’ and accompanying sequential prediction algorithm s.t.
  • 1. cumulative regret always satisfies a worst-case (slow-rate) bound, for all 𝑔∗, all sequences
  • 2. if the condition holds for a given sequence, then cumulative regret satisfies, for that sequence, the corresponding fast-rate bound

SLIDE 15

GOAL: Sequential Bernstein

  • GOAL: design a ‘sequential Bernstein condition’ s.t.
  • 1. for all 𝑔∗, all sequences, a worst-case (slow-rate) regret bound holds
  • 2. if the condition holds, it also satisfies, with high prob*, the fast-rate bound

Approach 1: define seq. Bernstein as standard Bernstein + i.i.d. Even then none of the standard algorithms achieve this... with one (?) exception!

SLIDE 16

Today: Three Things To Tell You

  • 1. Nifty Reformulation of Fast Rate Conditions in Statistical Learning
  • 2. Do this via new concept: ESI
  • 3. Precise Analogue of the Bernstein Condition for Fast Rates in the Individual Sequence Setting – ... and an algorithm that achieves these rates!

SLIDE 17

Exponential Stochastic Inequality (ESI)

  • For any given 𝜃 > 0 we write 𝒀 ≤*_𝜽 𝝑 as shorthand for E[exp(𝜃(𝑌 − 𝜗))] ≤ 1
  • 𝑌 ≤*_𝜃 𝜗 implies, via Jensen, E[𝑌] ≤ 𝜗
  • 𝑌 ≤*_𝜃 𝜗 implies, via Markov, for all 𝐵 > 0: P(𝑌 ≥ 𝜗 + 𝐵) ≤ exp(−𝜃𝐵)

SLIDE 18

ESI-Example

  • Hoeffding’s Inequality: suppose that 𝑌 has support [−1, 1] and mean 0. Then, for all 𝜃 > 0: 𝑌 ≤*_𝜃 𝜃/2
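
A quick Monte Carlo sanity check of this ESI form of Hoeffding in Python (the Rademacher choice of 𝑌 is ours, purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.choice([-1.0, 1.0], size=1_000_000)   # mean 0, support [-1, 1]

    for theta in [0.1, 0.5, 1.0, 2.0]:
        # ESI form of Hoeffding: E[exp(theta * (Y - theta/2))] <= 1
        print(theta, np.exp(theta * (y - theta / 2)).mean())

Every printed value should be at most 1 (for Rademacher 𝑌 the exact value is cosh(𝜃)·e^{−𝜃²/2}).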

SLIDE 19

ESI – More Properties

  • For i.i.d. rvs 𝑌, 𝑌₁, …, 𝑌_𝑇 with 𝑌 ≤*_𝜃 𝜗 we have Σ_{𝑡=1}^{𝑇} 𝑌_𝑡 ≤*_𝜃 𝑇𝜗
  • For arbitrary rvs 𝑌, 𝑍 with 𝑌 ≤*_𝜃 𝜗 and 𝑍 ≤*_𝜃 𝜗′ we have 𝑌 + 𝑍 ≤*_{𝜃/2} 𝜗 + 𝜗′
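
Reconstructed one-line proofs of both properties (independence for the first, Cauchy–Schwarz for the second):

    \[
      \mathbf{E}\Bigl[e^{\theta\bigl(\sum_{t=1}^{T} Y_t - T\vartheta\bigr)}\Bigr]
      \;=\; \prod_{t=1}^{T}\mathbf{E}\bigl[e^{\theta(Y_t-\vartheta)}\bigr] \;\le\; 1
      \qquad \text{(independence)}
    \]
    \[
      \mathbf{E}\Bigl[e^{\frac{\theta}{2}(Y+Z-\vartheta-\vartheta')}\Bigr]
      \;\le\; \sqrt{\mathbf{E}\bigl[e^{\theta(Y-\vartheta)}\bigr]}\,
              \sqrt{\mathbf{E}\bigl[e^{\theta(Z-\vartheta')}\bigr]} \;\le\; 1
      \qquad \text{(Cauchy--Schwarz)}
    \]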
SLIDE 20

Bernstein in ESI Terms

  • Most general form of the Bernstein condition: for some nondecreasing function 𝑢: for all 𝑔 ∈ 𝒢, E[𝑋_𝑔²] ≤ 𝑢(E[𝑋_𝑔]) (the (𝐷, 𝛽)-case corresponds to 𝑢(𝑦) = 𝐷𝑦^𝛽)

SLIDE 21

Bernstein in ESI Terms

  • Most general form of the Bernstein condition: for some nondecreasing function 𝑢: for all 𝑔 ∈ 𝒢, E[𝑋_𝑔²] ≤ 𝑢(E[𝑋_𝑔])
  • Van Erven et al. (2015) show this is equivalent to the central condition (next slide) holding for some nondecreasing function 𝑣 with 𝑣(𝑦) of order 𝑦/𝑢(𝑦)
SLIDE 22

v-Central Condition

  • Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function 𝑣 such that for some 𝑔∗ ∈ 𝒢: for all 𝜀 > 0 and all 𝑔 ∈ 𝒢, E[exp(−𝑣(𝜀) ⋅ 𝑋_𝑔)] ≤ exp(𝑣(𝜀) ⋅ 𝜀)

They term this the 𝒗-central condition

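A numerical illustration (our example, not from the slides): for the well-specified squared-error problem above, even the strongest version with constant 𝑣 (the 𝜂-central condition with 𝜀 = 0) holds for any 𝜂 ≤ 1/(2𝜎²):

    import numpy as np

    rng = np.random.default_rng(2)
    z = rng.normal(0.3, 1.0, size=1_000_000)   # Z ~ N(0.3, 1); g* = 0.3
    eta = 0.25                                  # any eta <= 1/(2 sigma^2) works here

    for g in [0.0, 0.5, 1.0, -1.0]:
        x_g = (z - g) ** 2 - (z - 0.3) ** 2     # excess loss X_g
        # eta-central condition at epsilon = 0: E[exp(-eta * X_g)] <= 1
        print(g, np.exp(-eta * x_g).mean())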
SLIDE 23

v-Central Condition

  • Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function 𝑣 such that for some 𝑔∗ ∈ 𝒢: for all 𝜀 > 0 and all 𝑔 ∈ 𝒢, E[exp(−𝑣(𝜀) ⋅ 𝑋_𝑔)] ≤ exp(𝑣(𝜀) ⋅ 𝜀)

They term this the 𝒗-central condition
– can also be related to mixability, exp-concavity, the JRT-condition, and a condition for well-behavedness of Bayesian inference under misspecification

SLIDE 24

v-Central Condition

  • Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function 𝑣 such that for some 𝑔∗ ∈ 𝒢: for all 𝜀 > 0 and all 𝑔 ∈ 𝒢, E[exp(−𝑣(𝜀) ⋅ 𝑋_𝑔)] ≤ exp(𝑣(𝜀) ⋅ 𝜀)

They term this the 𝒗-central condition
– can also be related to mixability, exp-concavity, the JRT-condition, and a condition for well-behavedness of Bayesian inference under misspecification
– for unbounded losses, it becomes different (and better!) than the Bernstein condition
– it is one-sided

SLIDE 25

Three Equivalent Notions for Bounded Losses

  • 𝑣-central condition in terms of regret: for all 𝜀 > 0, all 𝑔 ∈ 𝒢: −𝑋_𝑔 ≤*_{𝑣(𝜀)} 𝜀

.....or equivalently (extending notation): 𝑋_𝑔 ≥*_{𝑣(𝜀)} −𝜀

SLIDE 26

Three Equivalent Notions for Bounded Losses

  • 𝑣-central condition in terms of regret: 𝑋_𝑔 ≥*_{𝑣(𝜀)} −𝜀, with 𝑣 nondecreasing
  • For bounded losses, this turns out to be equivalent to: for some appropriately chosen 𝑣̄ with 𝑣̄ ≍ 𝑣: 𝑋_𝑔 ≥*_{𝑣̄(𝜀_𝑔)} 𝜀_𝑔/2, where 𝜀_𝑔 = E[𝑋_𝑔]
SLIDE 27

Three Equivalent Notions for Bounded Losses

  • 𝑣-central condition in terms of regret: 𝑋_𝑔 ≥*_{𝑣(𝜀)} −𝜀, with 𝑣 nondecreasing
  • For bounded losses, this turns out to be equivalent to: for some appropriately chosen 𝑣̄ with 𝑣̄ ≍ 𝑣: 𝑋_𝑔 ≥*_{𝑣̄(𝜀_𝑔)} 𝜀_𝑔/2, where 𝜀_𝑔 = E[𝑋_𝑔]
  • More similar to the original Bernstein condition. However, the condition is now in ‘exponential’ rather than ‘expectation’ form

SLIDE 28

Today: Three Things To Tell You

  • 1. Nifty Reformulation of Fast Rate Conditions in Statistical Learning
  • 2. Do this via new concept: ESI
  • 3. Precise Analogue of the Bernstein Condition for Fast Rates in the Individual Sequence Setting – ... and an algorithm that achieves these rates!

SLIDE 29
T-fold v-Central Condition

  • Suppose that the 𝑣-central condition holds (i.e., 𝑦/𝑣(𝑦)-Bernstein holds) and data are i.i.d. Then, by the generic i.i.d. property of ESI, with 𝜃_𝜗 = 𝐷₁ ⋅ 𝑣(𝜗): Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔,𝑡} ≥*_{𝜃_𝜗} −𝜗𝑇, where 𝑋_{𝑔,𝑡} = ℓ_𝑔(𝑍_𝑡) − ℓ_{𝑔∗}(𝑍_𝑡)

SLIDE 30
T-fold v-Central Condition

  • Under the 𝑣-central cond. and i.i.d. data, with 𝜃_𝜗 = 𝐷₁ ⋅ 𝑣(𝜗): Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔,𝑡} ≥*_{𝜃_𝜗} −𝜗𝑇 for every fixed 𝑔 ∈ 𝒢, but also Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔̂_𝑡,𝑡} ≥*_{𝜃_𝜗} −𝜗𝑇 for every learning algorithm 𝑔̂ with 𝑔̂_𝑡 ∈ 𝒢

SLIDE 31
Cumulative v-Central Condition

  • Under the 𝑣-central cond. and i.i.d. data, with 𝜃_𝜗 = 𝐷₁ ⋅ 𝑣(𝜗): Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔,𝑡} ≥*_{𝜃_𝜗} −𝜗𝑇, but also Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔̂_𝑡,𝑡} ≥*_{𝜃_𝜗} −𝜗𝑇 for every learning algorithm
  • This condition may of course also hold for non-i.i.d. data. It is the condition we need, so we term it the cumulative 𝑣-central condition

SLIDE 32

Hedge with Oracle Learning Rate

  • Hedge with learning rate 𝜃 achieves, for all 𝑔∗ ∈ 𝒢 and all sequences, the standard regret bound of order log 𝐾/𝜃 + 𝜃𝑇
  • We assume the cumulative 𝑣-central condition for some 𝑣. For simplicity assume 𝑣(𝜗) = 𝜗^𝛾; then the condition holds with 𝜃_𝜗 = 𝐷₁ ⋅ 𝜗^𝛾, and even with 𝜃_𝜗 = 𝜗^𝛾 at the cost of some other constant in the bound
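
For concreteness, a minimal exponential-weights sketch of Hedge with a fixed learning rate (our illustration for finitely many experts; the slides' setting is a countable model 𝒢):

    import numpy as np

    def hedge(loss_matrix, theta):
        """Hedge (exponential weights) with fixed learning rate theta.

        loss_matrix[t, k] = loss of expert k in round t, assumed in [0, 1].
        Returns (algorithm's cumulative loss, best expert's cumulative loss).
        """
        T, K = loss_matrix.shape
        log_w = np.zeros(K)                    # log-weights; uniform prior
        alg_loss = 0.0
        for t in range(T):
            p = np.exp(log_w - log_w.max())    # normalize in log-space for stability
            p /= p.sum()
            alg_loss += p @ loss_matrix[t]     # expected loss of the algorithm's play
            log_w -= theta * loss_matrix[t]    # exponential-weights update
        return alg_loss, loss_matrix.sum(axis=0).min()

With 𝜃 ≈ √(8 log 𝐾 / 𝑇) this recovers the familiar worst-case √(𝑇 log 𝐾) regret; the oracle tuning on the next slide instead sets 𝜃 via 𝑣.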

SLIDE 33

Hedge with Oracle Learning Rate

  • Combining, we get: regret of order log 𝐾/𝜃_𝜗 + 𝜗𝑇
  • We can set 𝜗 (or eqv. 𝜃) as we like. Best possible bound achieved if we make sure all terms are of the same order, i.e. we set, at time 𝑇, 𝜗 = (log 𝐾/𝑇)^{1/(1+𝛾)}
  • and then 𝜃_𝜗 = 𝑣(𝜗) and regret = O(𝑇^{𝛾/(1+𝛾)} (log 𝐾)^{1/(1+𝛾)})
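
The balancing arithmetic, reconstructed for the special case 𝑣(𝜗) = 𝜗^𝛾 (constants suppressed):

    \[
      \frac{\log K}{\theta_\vartheta} + \vartheta T
      \;=\; \frac{\log K}{\vartheta^{\gamma}} + \vartheta T ,
      \qquad
      \vartheta \;=\; \Bigl(\frac{\log K}{T}\Bigr)^{\frac{1}{1+\gamma}}
      \;\Longrightarrow\;
      \text{regret} \;=\; O\Bigl(T^{\frac{\gamma}{1+\gamma}}\,(\log K)^{\frac{1}{1+\gamma}}\Bigr) .
    \]
    % gamma = 1 recovers the slow sqrt(T log K) rate;
    % gamma = 0 gives the fast log K rate.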
SLIDE 34

Squint without Oracle Learning Rate!

  • Hedge achieves the ESI-(!)-bound above ... but needs to know 𝑔∗, 𝛾 and 𝑇 to set the learning rate!
  • Squint (Koolen and Van Erven ’15)
  • achieves the same bound without knowing these!
  • Gets the bound with 𝛾 = 0 automatically for individual sequences
  • What about AdaNormalHedge? (Luo & Schapire ’15)
SLIDE 35

Dessert: Easy Data Rather than Distributions

  • We are working with algorithms such as Hedge and Squint, designed for individual, nonstochastic sequences
  • Yet the condition is stochastic
  • Does there exist a nonstochastic analogue?
  • Answer is yes:
SLIDE 36

Non-Stochastic Inequality

Suppose the cumulative 𝑣-central condition holds for some 𝑣. Using martingale theory one shows that this also implies the following:

  • fix a countable, otherwise arbitrary set of learning algorithms.
  • Fix a decreasing sequence 𝜗₁, 𝜗₂, … and set corresponding 𝜃₁ = 𝑣(𝜗₁), 𝜃₂ = 𝑣(𝜗₂), …
  • Then we have with probability 1: for every learning algorithm in the set and every 𝑗, there exists 𝐷 such that for all 𝑇: Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔̂_𝑡,𝑡} ≥ −𝜗_𝑗 𝑇 − 𝐷
SLIDE 37

Individual Sequence Condition

Hence we define (we only give the special case with 𝑣(𝑦) = 𝑦^𝛾 here): an individual sequence satisfies the 𝑣-fast rate condition relative to a countable set of learning algorithms and constants 𝐷₁, 𝐷₂, … if there exists 𝑔∗ such that for all 𝑇 > 0 and all 𝑗, with 𝜃_𝑗 = 𝑣(𝜗_𝑗), we have Σ_{𝑡=1}^{𝑇} 𝑋_{𝑔̂_𝑡,𝑡} ≥ −𝜗_𝑗 𝑇 − 𝐷_𝑗

SLIDE 38

Conclusion

  • If a sequence satisfies the 𝑣-fast rate condition, then Hedge (with oracle) and Squint (without oracle) both achieve the desired regret bound
  • We’ve removed all stochastics!
  • Similar idea used by György and Szepesvári in this workshop!
  • Notion implies a (very close!) analogy to Martin-Löf randomness

Van Erven, Grünwald, Mehta, Reid, Williamson. Fast Rates in Statistical and Online Learning. JMLR Special Issue in Memory of A. Chervonenkis, Oct. 2015

SLIDE 39

Say something about: L* bound, unbounded losses, mixability, JRT, exp-concavity, .... Tell Csaba, Peter B, Philippe: \eta \leq u(\epsilon), but also with \eta = u(\epsilon). Star means...