
Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance

Blair Bilodeau^{1,2,3}, Dylan J. Foster^{4}, and Daniel M. Roy^{1,2,3}
Presented at the 2020 International Conference on Machine Learning

^{1}Department of Statistical Sciences, University of Toronto  ^{2}Vector Institute  ^{3}Institute for Advanced Study  ^{4}Institute for Foundations of Data Science, Massachusetts Institute of Technology

Contextual Online Learning with Log Loss

Example: Image Identification. For rounds t = 1, . . . , n:

  • Receive an image. Context x_t ∈ X
  • Assign a probability to whether the image is adversarially generated. Prediction p̂_t ∈ [0, 1]
  • Observe the true label. Observation y_t ∈ {0, 1}
  • Incur a penalty. Loss ℓ_log(p̂_t, y_t) = −y_t log(p̂_t) − (1 − y_t) log(1 − p̂_t)

Notice that ℓ_log equals the negative log-likelihood of y_t under the model p̂_t.

Challenges

  • We do not rely on data-generating assumptions.
  • ℓ_log is neither bounded nor Lipschitz.
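
The log loss is simple to compute directly; a minimal sketch in plain Python (the function name is ours):

```python
import math

def log_loss(p_hat, y):
    """Negative log-likelihood of the binary label y under the model p_hat."""
    # p_hat in (0, 1) is the predicted probability that y = 1.
    return -y * math.log(p_hat) - (1 - y) * math.log(1 - p_hat)

# A confident correct prediction costs little; a confident mistake costs a lot.
# The loss diverges as p_hat -> 0 with y = 1, which is why it is neither
# bounded nor Lipschitz on [0, 1].
print(log_loss(0.99, 1))  # small
print(log_loss(0.01, 1))  # large (about 4.6)
```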

Measuring Performance with Regret

Without model assumptions, guaranteeing small loss on predictions is impossible. If I can't make promises about the future, can I say something about the past? Consider a relative notion of performance in hindsight.

  • Relative to a class F ⊆ {f : X → [0, 1]}, consisting of experts f ∈ F.
  • Compete against the optimal f ∈ F on the actual sequence of observations.

Regret:  R_n(p̂; F, x, y) = ∑_{t=1}^{n} ℓ_log(p̂_t, y_t) − inf_{f∈F} ∑_{t=1}^{n} ℓ_log(f(x_t), y_t).

This quantity depends on

  • p̂: the player's predictions,
  • F: the expert class,
  • x: the observed contexts,
  • y: the observed data points.
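
With a finite expert class, the regret is a direct computation. A toy sketch (plain Python; the experts and sequences are arbitrary choices of ours):

```python
import math

def log_loss(p, y):
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def regret(p_hat, experts, xs, ys):
    """Player's cumulative log loss minus the best expert's loss in hindsight."""
    player = sum(log_loss(p, y) for p, y in zip(p_hat, ys))
    best = min(sum(log_loss(f(x), y) for x, y in zip(xs, ys)) for f in experts)
    return player - best

# Two constant experts, a uniform player, and a short label sequence.
experts = [lambda x: 0.25, lambda x: 0.75]
xs = [None] * 4                 # contexts are unused by constant experts
ys = [1, 1, 0, 1]
print(regret([0.5] * 4, experts, xs, ys))  # positive: the hindsight expert wins
```

Note that regret can also be negative when the player happens to beat every expert on the realized sequence.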

Summary of Results

We control the minimax regret using the sequential entropy of the experts F.

  • Minimax regret: the smallest possible regret under worst-case observations.
  • Sequential entropy: a data-dependent complexity measure for F.

Contributions

  • Improved upper bound for expert classes with polynomial sequential entropy.
  • Novel proof technique that exploits the curvature of the log loss to avoid a key "truncation step" used by previous works.
  • Resolution of the minimax regret with log loss for Lipschitz experts on [0, 1]^p, with matching lower bounds.
  • Conclusion that the minimax regret with log loss cannot be completely characterized using sequential entropy.

Minimax Regret

Regret:  R_n(p̂; F, x, y) = ∑_{t=1}^{n} ℓ_log(p̂_t, y_t) − inf_{f∈F} ∑_{t=1}^{n} ℓ_log(f(x_t), y_t).

Minimax regret: an algorithm-free quantity on worst-case observations.

R_n(F) = sup_{x_1} inf_{p̂_1} sup_{y_1} sup_{x_2} inf_{p̂_2} sup_{y_2} ··· sup_{x_n} inf_{p̂_n} sup_{y_n} R_n(p̂; F, x, y).

In each round, the context is observed, the player makes their prediction, and the adversary plays an observation; this repeats for all n rounds.

Interpretation: the experts F are minimax online learnable if R_n(F) = o(n).

  • slow rate: R_n(F) = Θ(√n)
  • fast rate: R_n(F) ≤ O(log(n))
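
The nested sup-inf can be evaluated exactly on tiny instances by backward induction. A sketch (all choices ours: two constant experts, no contexts, and the player's inf restricted to a finite grid, so this approximates the true minimax value):

```python
import math
from functools import lru_cache

def log_loss(p, y):
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Toy instance: two constant experts and a discretized set of player predictions.
EXPERTS = [0.25, 0.75]
GRID = [i / 100 for i in range(1, 100)]

def minimax_regret(n):
    """Backward induction over label histories: inf over p-hat, sup over y."""
    def best_expert_loss(ys):
        return min(sum(log_loss(f, y) for y in ys) for f in EXPERTS)

    @lru_cache(maxsize=None)
    def value(ys):
        if len(ys) == n:
            # End of game: subtract the best expert's hindsight loss.
            return -best_expert_loss(ys)
        # Player moves (min over the grid), then the adversary moves (max over y).
        return min(max(log_loss(p, y) + value(ys + (y,)) for y in (0, 1))
                   for p in GRID)

    return value(())

print(minimax_regret(1))  # log(3/2): play p = 1/2, either label costs log(0.75/0.5)
```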

Covering Numbers

Goal: Obtain regret bounds using a notion of complexity of the expert class F.

Covering Numbers

  • Define a notion of distance between experts, d(f, g).
  • Find the smallest G ⊆ F so that for each f ∈ F, there is a g ∈ G with d(f, g) ≤ γ.
  • The covering number for F is |G|, and the entropy is log(|G|).

Uniform Covering

d(f, g) = sup_{x∈X} sup_{y∈{0,1}} |ℓ_log(f(x), y) − ℓ_log(g(x), y)|

A uniform covering may be infinite for large expert classes. Instead, we use the sequential covering of Rakhlin and Sridharan (2014).
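
For intuition, here is the covering number of a toy class of constant experts f_c(x) = c, c ∈ [0.01, 0.99], under the simpler sup-norm distance |c − c'| rather than the log-loss distance (all names and choices are ours):

```python
import math

def covering_number(lo, hi, gamma):
    """Fewest grid points so that every c in [lo, hi] is within gamma of one."""
    return max(1, math.ceil((hi - lo) / (2 * gamma)))

def cover(lo, hi, gamma):
    """Midpoints of equal subintervals: an explicit gamma-cover G."""
    n = covering_number(lo, hi, gamma)
    step = (hi - lo) / n
    return [lo + step * (i + 0.5) for i in range(n)]

gamma = 0.05
G = cover(0.01, 0.99, gamma)
entropy = math.log(len(G))  # the entropy is log |G|
print(len(G), round(entropy, 3))  # 10 grid points suffice at margin 0.05
```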

Sequential Covering

Key characteristics of sequential covering:

  • Only need to cover the expert predictions on the actual observed contexts.
  • The cover respects the sequential dependency of the online game.

We encode the sequential nature of x_t and y_t using binary trees: a context tree x assigns to round t a context x_t(y) that depends on the path of observations y so far.

A class of trees V sequentially covers F at margin γ on context tree x if:

sup_{f∈F} sup_{y∈{0,1}^n} inf_{v∈V} sup_{t∈[n]} |f(x_t(y)) − v_t(y)| ≤ γ.

Observations

  • V is chosen after observing x, so it doesn't have to apply to all of X.
  • v ∈ V is chosen with knowledge of y, the actual path of observations.

Definitions

  • The size of the smallest such V for x is N_∞(F ∘ x, γ).
  • The sequential entropy for n rounds is H_∞(F, γ, n) = sup_x log(N_∞(F ∘ x, γ)).
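
The margin condition can be checked by brute force on a tiny instance. A sketch (every choice here is ours: n = 2, experts f_w(x) = w·x with w ∈ [0, 1], one fixed context tree, and one candidate tree per grid slope):

```python
import itertools
import math

GAMMA = 0.1
X1 = 0.5
X2 = {0: 0.2, 1: 0.9}  # the round-2 context depends on the first label

def expert_path_values(w, y):
    """Predictions of f_w(x) = w * x along the path y = (y1, y2)."""
    return (w * X1, w * X2[y[0]])

def tree_path_values(v, y):
    """Values of the tree v = (v1, {0: v2_left, 1: v2_right}) along y."""
    return (v[0], v[1][y[0]])

# One candidate tree per slope on a grid of spacing 2 * GAMMA.
slopes = [GAMMA * (2 * i + 1) for i in range(math.ceil(1 / (2 * GAMMA)))]
V = [(w * X1, {0: w * X2[0], 1: w * X2[1]}) for w in slopes]

def is_sequential_cover(V, gamma, resolution=200):
    """Check sup_f sup_y inf_v sup_t |f(x_t(y)) - v_t(y)| <= gamma on a grid of f."""
    for k in range(resolution + 1):
        w = k / resolution
        for y in itertools.product((0, 1), repeat=2):
            fv = expert_path_values(w, y)
            gap = min(max(abs(a - b) for a, b in zip(fv, tree_path_values(v, y)))
                      for v in V)
            if gap > gamma:
                return False
    return True

print(is_sequential_cover(V, GAMMA))  # True with only len(V) = 5 trees
```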

Improved Minimax Bounds

Theorem (BFR '20). There exists c > 0 such that for all F,

R_n(F) ≤ inf_{γ>0} { 4nγ + c · H_∞(F, γ, n) }.

Upper Bound (Computation). If H_∞(F, γ, n) = Θ(γ^{−p}) for p > 0, then choosing γ = Θ(n^{−1/(p+1)}) to balance the two terms gives R_n(F) ≤ O(n^{p/(p+1)}).

Theorem (BFR '20). If p ∈ ℕ, there exists an F with H_∞(F, γ, n) = Θ(γ^{−p}) and R_n(F) ≥ Ω(n^{p/(p+1)}).
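
The computation behind the upper bound is a tradeoff: shrinking γ shrinks the 4nγ term but inflates the entropy term. A numeric sketch (the constant c = 1 and the grid are arbitrary choices of ours):

```python
def bound(n, p, c=1.0, grid=10**5):
    """Minimize 4*n*gamma + c * gamma**(-p) over a gamma grid in (0, 1)."""
    return min(4 * n * g + c * g ** (-p)
               for g in (k / grid for k in range(1, grid)))

p = 2
r_small, r_big = bound(10**3, p), bound(10**6, p)
# Scaling n by 1000 should scale the bound by about 1000**(p/(p+1)) = 100.
print(r_big / r_small)
```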

Applications

  • 1-Lipschitz:

    F = {f : [0, 1]^p → [0, 1] | |f(x) − f(y)| ≤ ‖x − y‖ for all x, y ∈ [0, 1]^p},  with H_∞(F, γ, n) = Θ(γ^{−p}).

    We have matching upper and lower bounds for this class, so: R_n(F) = Θ(n^{p/(p+1)}).

  • Linear Predictors:

    F = {f | ∃ w with ‖w‖₂ ≤ 1 such that f(x) = (1/2)(1 + ⟨w, x⟩) for all ‖x‖₂ ≤ 1},  with H_∞(F, γ, n) = Θ̃(γ^{−2}).

    Our upper bound prescribes R_n(F) ≤ Õ(n^{2/3}). However, Rakhlin and Sridharan (2015) showed (with an explicit algorithm) that R_n(F) ≤ Õ(√n). Our upper bound cannot be improved, so the minimax regret under log loss cannot be characterized solely by sequential entropy.
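
The linear predictor f(x) = (1 + ⟨w, x⟩)/2 is a valid probability whenever ‖w‖₂ ≤ 1 and ‖x‖₂ ≤ 1, by Cauchy-Schwarz. A quick randomized check (dimension and sample count are arbitrary choices of ours):

```python
import math
import random

def predict(w, x):
    """Linear predictor f_w(x) = (1 + <w, x>) / 2."""
    return (1 + sum(wi * xi for wi, xi in zip(w, x))) / 2

def random_ball_point(dim, rng):
    """A random point with Euclidean norm at most 1."""
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    r = rng.random()  # random radius in (0, 1)
    return [c * r / norm for c in v]

rng = random.Random(0)
probs = [predict(random_ball_point(3, rng), random_ball_point(3, rng))
         for _ in range(1000)]
print(min(probs) >= 0 and max(probs) <= 1)  # True: predictions stay in [0, 1]
```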