Optimistic Rates. Nati Srebro. Based on work with Karthik Sridharan. (PowerPoint PPT presentation)

  1. “Optimistic” Rates. Nati Srebro. Based on work with Karthik Sridharan and Ambuj Tewari. Examples based on work with Andy Cotter, Elad Hazan, Tomer Koren, Percy Liang, Shai Shalev-Shwartz, Ohad Shamir, Karthik Sridharan.

  2. Outline • What? • When? (How?) • Why?

  3. Estimating the Bias of a Coin
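A minimal sketch of the phenomenon this slide illustrates, assuming the standard Bernstein bound for the empirical mean p̂ of n flips of a Bernoulli(p) coin (constants illustrative):

\[
\Pr\big(|\hat p - p| \ge \epsilon\big) \;\le\; 2\exp\!\Big(-\frac{n\epsilon^2}{2p(1-p) + 2\epsilon/3}\Big)
\quad\Longrightarrow\quad
|\hat p - p| \;\lesssim\; \sqrt{\frac{p(1-p)\log(1/\delta)}{n}} \;+\; \frac{\log(1/\delta)}{n}.
\]

When the true bias p is small, the deviation behaves like √(p/n) + 1/n rather than the worst-case 1/√n, so far fewer flips suffice; the later slides establish the same L*-dependent improvement for learning.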

  4. Optimistic VC bound (a.k.a. L*-bound, multiplicative bound) • For a hypothesis class with VC-dim D, w.p. 1−δ over n samples:

  5. Optimistic VC bound (a.k.a. L*-bound, multiplicative bound) • For a hypothesis class with VC-dim D, w.p. 1−δ over n samples: • Sample complexity to get L(h) ≤ L* + ε: • Extends to bounded real-valued loss, with D = VC-subgraph dimension
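A hedged reconstruction of the bound and sample complexity the slide refers to; this is the standard relative (multiplicative) VC bound, stated up to constants, and the exact form on the slide may differ. With probability 1 − δ, for all h ∈ ℋ:

\[
L(h) \;\le\; \hat L(h) \;+\; O\!\left(\sqrt{\hat L(h)\,\frac{D\log(n/D) + \log(1/\delta)}{n}} \;+\; \frac{D\log(n/D) + \log(1/\delta)}{n}\right),
\]

so the sample complexity to guarantee L(ĥ) ≤ L* + ε scales as

\[
n \;=\; \tilde O\!\left(D\,\frac{L^* + \epsilon}{\epsilon^2}\right) \;=\; \tilde O\!\left(\frac{D\,L^*}{\epsilon^2} + \frac{D}{\epsilon}\right),
\]

interpolating between the 1/ε realizable rate (L* ≈ 0) and the 1/ε² agnostic rate (L* large).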

  6. From Parametric to Scale-Sensitive Classes • Instead of VC-dim or VC-subgraph-dim (≈ #params), rely on a metric scale (e.g. a norm bound ‖h‖² ≤ B) to control complexity • Learning depends on: • Metric complexity measures: fat-shattering dimension, covering numbers, Rademacher complexity • Scale sensitivity of the loss ℓ (a bound on its derivatives, or a “margin”) • For ℋ with Rademacher complexity ℛ_n(ℋ) and a Lipschitz loss |ℓ′| ≤ G:
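For contrast, a sketch of the standard scale-sensitive bound this bullet leads up to, via the Lipschitz composition (contraction) lemma; constants are illustrative. For a G-Lipschitz loss taking values in [0, b], with probability 1 − δ, for all h ∈ ℋ:

\[
L(h) \;\le\; \hat L(h) \;+\; 2G\,\mathcal{R}_n(\mathcal{H}) \;+\; b\sqrt{\frac{\log(1/\delta)}{2n}}.
\]

Note this is purely additive: with a Lipschitz (but non-smooth) loss there is no L*-dependent improvement, which is exactly the gap the next slide's smooth-loss result closes.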

  7. Non-Parametric Optimistic Rate for Smooth Loss • Theorem: for any ℋ with (worst-case) Rademacher complexity ℛ_n(ℋ), and any smooth loss with |ℓ′′| ≤ H, ℓ ≤ b, w.p. 1−δ over n samples: [Srebro Sridharan Tewari 2010] • Sample complexity:
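A hedged reconstruction of the theorem statement from [Srebro Sridharan Tewari 2010] (“Smoothness, Low Noise and Fast Rates”); the constants and the exact powers of the logarithmic factors below are indicative and should be checked against the paper. For an H-smooth non-negative loss bounded by b, with probability 1 − δ, for all h ∈ ℋ:

\[
L(h) \;\le\; \hat L(h) \;+\; O\!\left(\sqrt{\hat L(h)}\left(\sqrt{H}\,\log^{1.5}\!n\;\mathcal{R}_n(\mathcal{H}) + \sqrt{\frac{b\log(1/\delta)}{n}}\right) + H\log^{3}\!n\;\mathcal{R}_n^{2}(\mathcal{H}) + \frac{b\log(1/\delta)}{n}\right).
\]

For ℛ_n(ℋ) ≤ √(B/n) this gives a sample complexity of roughly n = Õ(H·B·(L* + ε)/ε²) = Õ(H·B·L*/ε² + H·B/ε) to reach excess error ε.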

  8. Proof Ideas • Smooth functions are self-bounding: for any H-smooth non-negative f: |f′(t)|² ≤ 4 H f(t) • A 2nd-order version of the Lipschitz composition lemma, restricted to predictors with low loss: Rademacher → fat-shattering → L∞ covering → (compose with loss and use smoothness) → L2 covering → Rademacher • Local Rademacher analysis
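A one-line derivation of the self-bounding property, as a sketch; it gives the constant 2H, which in particular implies the 4H form quoted on the slide. For an H-smooth non-negative f and any t, plugging s = t − f′(t)/H into the smoothness inequality f(s) ≤ f(t) + f′(t)(s − t) + (H/2)(s − t)² gives:

\[
0 \;\le\; f\!\Big(t - \tfrac{f'(t)}{H}\Big) \;\le\; f(t) - \frac{f'(t)^2}{2H}
\qquad\Longrightarrow\qquad
f'(t)^2 \;\le\; 2H\,f(t) \;\le\; 4H\,f(t).
\]

This lets the effective Lipschitz constant of the loss be bounded by the loss value itself, so predictors with small loss behave as if the loss were “more Lipschitz”, which is what drives the L*-dependent rate.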

  9. Non-Parametric Optimistic Rate for Smooth Loss • Theorem: for any ℋ with (worst-case) Rademacher complexity ℛ_n(ℋ), and any smooth loss with |ℓ′′| ≤ H, ℓ ≤ b, w.p. 1−δ over n samples: [Srebro Sridharan Tewari 2010] • Sample complexity:

  10. Parametric vs Non-Parametric • Parametric: dim(ℋ) ≤ D, |h| ≤ b; Scale-sensitive: ℛ_n(ℋ) ≤ √(B/n) • Lipschitz loss, |ℓ′| ≤ G (e.g. hinge, ℓ1): parametric √(G D L*/n) + G D/n, scale-sensitive √(G² B/n) • Smooth loss, |ℓ′′| ≤ H (e.g. logistic, Huber, smoothed hinge): parametric √(H D L*/n) + H D/n, scale-sensitive √(H B L*/n) + H B/n • Smooth & strongly convex, λ ≤ ℓ′′ ≤ H (e.g. square loss): parametric (H/λ)·(H D/n), scale-sensitive √(H B L*/n) + H B/n • Min-max tight up to poly-log factors

  11. Optimistic SVM-Type Bounds • ℓ01 ≤ ℓhinge • Optimize • Generalize

  12. Optimistic SVM-Type Bounds • ℓ01 ≤ ℓsmooth ≤ ℓhinge • Optimize: the hinge loss • Generalize: via the smooth surrogate (see the sketch below)
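A hedged sketch of the resulting SVM-type bound, up to constants and polylogarithmic factors, assuming ‖x‖ ≤ 1 and writing B = ‖w‖²: generalizing through a smooth surrogate sandwiched as ℓ01 ≤ ℓsmooth ≤ ℓhinge yields, with probability 1 − δ,

\[
L_{01}(w) \;\le\; \hat L_{\mathrm{hinge}}(w) \;+\; \tilde O\!\left(\sqrt{\hat L_{\mathrm{hinge}}(w)\,\frac{B}{n}} \;+\; \frac{B}{n} \;+\; \frac{\log(1/\delta)}{n}\right).
\]

When the empirical hinge loss is small, the excess 0/1 error thus decays like B/n rather than the pessimistic √(B/n).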

  13. Optimistic Learning Guarantees ✓ Parametric classes ✓ Scale-sensitive classes with smooth loss ✓ SVM-type bounds ✓ Margin bounds ✓ Online learning/optimization with smooth loss ✓ Stability-based guarantees with smooth loss × Non-parametric (scale-sensitive) classes with non-smooth loss × Online learning/optimization with non-smooth loss

  14. Why Optimistic Guarantees? • The optimistic regime is typically the relevant regime: • Approximation error L* ≈ Estimation error ε • If ε ≪ L*, better to spend energy on lowering the approximation error (use a more complex class) • Important in understanding statistical learning

  15. Training Kernel SVMs • # kernel evaluations to get excess error ε (with B = ‖w*‖²): • Using SGD: • Using the Stochastic Batch Perceptron [Cotter et al 2012]: (is this the best possible?)
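For concreteness, a minimal kernelized SGD (Pegasos-style) sketch for the SVM objective, showing where kernel evaluations are spent; this is only an illustration and not the Stochastic Batch Perceptron of [Cotter et al 2012], and the function names and parameters are assumptions of this sketch:

```python
import numpy as np

def kernel_sgd_svm(X, y, kernel, lam=1e-3, T=1000, seed=0):
    """Pegasos-style SGD for min_w lam/2 ||w||^2 + (1/n) sum_i hinge(y_i <w, phi(x_i)>).

    Returns counts alpha such that the learned predictor is
    f(x) = (1 / (lam * T)) * sum_j alpha[j] * y[j] * kernel(X[j], x).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    alpha = np.zeros(n)                      # number of updates triggered by each example
    for t in range(1, T + 1):
        i = rng.integers(n)                  # pick one training example
        sv = np.flatnonzero(alpha)           # current "support vectors"
        # One kernel evaluation per support vector: this is the cost the slide counts.
        score = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in sv) / (lam * t)
        if y[i] * score < 1:                 # hinge subgradient active -> update
            alpha[i] += 1
    return alpha

def kernel_svm_predict(alpha, X, y, kernel, lam, T, x):
    sv = np.flatnonzero(alpha)
    return np.sign(sum(alpha[j] * y[j] * kernel(X[j], x) for j in sv) / (lam * T))
```

Each iteration costs one kernel evaluation per current support vector, so the total number of kernel evaluations, rather than the number of iterations, is the natural budget the slide compares against the Stochastic Batch Perceptron.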

  16. Training Linear SVMs • Runtime (# feature evaluations), with B = ‖w*‖²: • Using SGD: • Using SIMBA [Hazan et al 2011]: (is this the best possible?)

  17. Mini-Batch SGD • Stochastic optimization of a smooth loss L(w) using n training points, doing T = n/b iterations of SGD with mini-batches of size b (a minimal sketch follows below) • Pessimistic analysis (ignoring L*): can use mini-batches of size b ∝ √n, with T ∝ √n iterations, and get the same error (up to constant factors) as sequential SGD [Dekel et al 2010][Agarwal Duchi 2011] • But taking L* into account: in the optimistic regime, can't use b > 1, so no parallelization speedups! • Use acceleration to get a speedup in the optimistic regime [Cotter et al 2011]
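A minimal sketch of the procedure in the first bullet for a smooth loss (logistic), doing T = n/b iterations with mini-batches of size b; the function names and the step size are illustrative assumptions:

```python
import numpy as np

def logistic_grad(w, X, y, idx):
    """Average gradient of the smooth logistic loss log(1 + exp(-y <w, x>)) over a mini-batch."""
    Xb, yb = X[idx], y[idx]
    z = yb * (Xb @ w)
    return -(Xb * (yb / (1.0 + np.exp(z)))[:, None]).mean(axis=0)

def minibatch_sgd(X, y, b, lr=0.1, seed=0):
    """One pass of mini-batch SGD: T = n // b iterations, each on a fresh batch of size b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    perm = rng.permutation(n)                # each training point is used exactly once
    for t in range(n // b):
        idx = perm[t * b:(t + 1) * b]
        w -= lr * logistic_grad(w, X, y, idx)
    return w
```

The pessimistic analysis says b can grow up to about √n (with a matching T ≈ √n) without hurting the final error; the optimistic analysis says that once L* is taken into account, plain mini-batching of this form loses the fast L*-dependent rate unless acceleration is used.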

  18. Multiple Complexity Controls [Liang Srebro 2010] • Setting: L(w) = 𝔼[(⟨w, x⟩ − y)²], y = ⟨w₀, x⟩ + 𝒩(0, σ²), with ‖w‖² ≤ B, w ∈ ℝᴰ • [Figure: learning curve with multiple regimes, where the excess error is governed at different sample sizes by norm-controlled rates (≈ √(B·𝔼[y²]/n), then √(B·L*/n)) and by the dimension-controlled rate (≈ D·L*/n), with transitions at roughly n ≈ B/𝔼[y²], B/L*, and L*·D²/B]

  19. Be Optimistic • For scale-sensitive non-parametric classes with smooth loss: [Srebro Sridharan Tewari 2010] • Unlike the parametric case, this is not possible with a non-smooth loss! • The optimistic regime is typically the relevant regime: • Approximation error L* ≈ Estimation error ε • If ε ≪ L*, better to spend energy on lowering the approximation error (use a more complex class) • Important in understanding statistical learning
