SLIDE 1 Improved Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
Blair Bilodeau1,2 with Dylan J. Foster3 and Daniel M. Roy1,2 March 11, 2020
1Department of Statistical Sciences, University of Toronto 2Vector Institute 3Institute for Foundations of Data Science, Massachusetts Institute of Technology
SLIDE 2
Motivation
SLIDE 3
Weather Forecasting
Goal: forecast the probability of rain from historical data and current conditions.
SLIDE 4 Weather Forecasting
Goal: forecast the probability of rain from historical data and current conditions. Considerations
- Which assumptions to make about historical trends continuing?
- How many physical relationships should be incorporated in the model?
- Are some missed predictions more expensive than others?
SLIDE 5 Traditional Statistical Learning
- Receive a batch of data
- Estimate a prediction function ĥ
- Evaluate performance on new data assumed to be from the same distribution
SLIDE 6
Traditional Statistical Learning
But what if there’s a changepoint...
SLIDE 7
Traditional Statistical Learning
...or your training data isn’t even i.i.d.?
SLIDE 8
Statistical Solutions
We want to remove assumptions about the data generating process. In particular, future data may not be i.i.d. with past data.
SLIDE 9 Statistical Solutions
We want to remove assumptions about the data generating process. In particular, future data may not be i.i.d. with past data. Statistics does this with, for example,
- Markov assumption
- stationarity assumption (time series)
- covariance structure assumption (e.g., Gaussian process)
SLIDE 10 Statistical Solutions
We want to remove assumptions about the data generating process. In particular, future data may not be i.i.d. with past data. Statistics does this with, for example,
- Markov assumption
- stationarity assumption (time series)
- covariance structure assumption (e.g., Gaussian process)
But these assumptions are often uncheckable or false.
SLIDE 11
Online Learning
SLIDE 12
Online Learning
A framework where the past may not be indicative of the future.
SLIDE 13 Online Learning
A framework where the past may not be indicative of the future. Online Learning For rounds t = 1, . . . , n:
- Predict ŷ_t ∈ Ŷ
- Observe y_t ∈ Y
- Incur loss ℓ(ŷ_t, y_t)
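In code, this protocol is a simple loop. The following is a minimal sketch (not from the talk); the player, adversary, and squared loss below are toy choices for illustration.

```python
def online_learning(player, adversary, loss, n):
    """Run the online learning protocol for n rounds and return cumulative loss."""
    total = 0.0
    history = []
    for t in range(n):
        y_hat = player(history)        # predict based on past observations only
        y = adversary(history, y_hat)  # observe an arbitrary outcome (no model assumed)
        total += loss(y_hat, y)        # incur loss
        history.append(y)
    return total

# Toy example: squared loss, a player predicting the running mean,
# and an adversary that always plays 1.
loss = lambda y_hat, y: (y_hat - y) ** 2
player = lambda h: sum(h) / len(h) if h else 0.5
adversary = lambda h, y_hat: 1.0
print(online_learning(player, adversary, loss, 10))
```

The key point the loop makes explicit: the player sees only the past, and nothing constrains how the adversary generates `y`.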
SLIDE 14 Online Learning
A framework where the past may not be indicative of the future. Online Learning For rounds t = 1, . . . , n:
- Predict ŷ_t ∈ Ŷ
- Observe y_t ∈ Y (we do not assume this is generated by a model)
- Incur loss ℓ(ŷ_t, y_t)
SLIDE 15 Online Learning
A framework where the past may not be indicative of the future. Contextual Online Learning For rounds t = 1, . . . , n:
- Observe context x_t ∈ X
- Predict ŷ_t ∈ Ŷ
- Observe y_t ∈ Y (we do not assume this is generated by a model)
- Incur loss ℓ(ŷ_t, y_t)
SLIDE 16 Online Learning
A framework where the past may not be indicative of the future. Contextual Online Learning For rounds t = 1, . . . , n:
- Observe context x_t ∈ X (also has no model assumptions)
- Predict ŷ_t ∈ Ŷ
- Observe y_t ∈ Y (we do not assume this is generated by a model)
- Incur loss ℓ(ŷ_t, y_t)
SLIDE 17 Measuring Performance
In statistical learning, performance is often measured against:
- a ground truth, e.g., parameter estimation
- the best predictor from some class for the underlying probability model
SLIDE 18 Measuring Performance
In statistical learning, performance is often measured against:
- a ground truth, e.g., parameter estimation
- the best predictor from some class for the underlying probability model
These measures quantify guarantees about the future given the past. Without a probabilistic model:
- no notion of ground truth to compare with
- the “best hypothesis” in a class is not clearly defined
- cannot naively hope to do well on future observations
SLIDE 19 Measuring Performance
In statistical learning, performance is often measured against:
- a ground truth, e.g., parameter estimation
- the best predictor from some class for the underlying probability model
These measures quantify guarantees about the future given the past. Without a probabilistic model:
- no notion of ground truth to compare with
- the “best hypothesis” in a class is not clearly defined
- cannot naively hope to do well on future observations
If I can’t promise about the future, can I say something about the past?
SLIDE 20 Measuring Performance
In statistical learning, performance is often measured against:
- a ground truth, e.g., parameter estimation
- the best predictor from some class for the underlying probability model
These measures quantify guarantees about the future given the past. Without a probabilistic model:
- no notion of ground truth to compare with
- the “best hypothesis” in a class is not clearly defined
- cannot naively hope to do well on future observations
Consider a relative notion of performance in hindsight.
- Relative to a class F ⊆ {f : X → Ŷ}, consisting of experts f ∈ F.
- Compete against the optimal f ∈ F on the actual sequence of observations from past rounds.
SLIDE 21 Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t).
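For a finite expert class, this definition can be computed directly. A minimal sketch (not from the talk; the constant experts and squared loss are toy choices):

```python
def regret(loss, y_hat_seq, experts, x_seq, y_seq):
    """Player's cumulative loss minus that of the best expert in hindsight."""
    player_loss = sum(loss(p, y) for p, y in zip(y_hat_seq, y_seq))
    best_expert_loss = min(
        sum(loss(f(x), y) for x, y in zip(x_seq, y_seq)) for f in experts
    )
    return player_loss - best_expert_loss

# Two constant experts under squared loss; contexts are unused here.
loss = lambda p, y: (p - y) ** 2
experts = [lambda x: 0.0, lambda x: 1.0]
x_seq = [None] * 4
y_seq = [1, 1, 0, 1]
y_hat_seq = [0.5, 0.5, 0.5, 0.5]
print(regret(loss, y_hat_seq, experts, x_seq, y_seq))
```

Note that the comparator is chosen after seeing the whole sequence, which is exactly the "in hindsight" aspect of the definition.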
SLIDE 22 Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). This quantity depends on
- ŷ: Player predictions,
- F: Expert class,
- x: Observed contexts,
- y: Observed data points.
SLIDE 23 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = sup_{x_1} inf_{ŷ_1} sup_{y_1} sup_{x_2} inf_{ŷ_2} sup_{y_2} · · · sup_{x_n} inf_{ŷ_n} sup_{y_n} R^ℓ_n(ŷ; F, x, y).
SLIDE 24 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = sup_{x_1} inf_{ŷ_1} sup_{y_1} sup_{x_2} inf_{ŷ_2} sup_{y_2} · · · sup_{x_n} inf_{ŷ_n} sup_{y_n} R^ℓ_n(ŷ; F, x, y). The first context is observed.
SLIDE 25 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = sup_{x_1} inf_{ŷ_1} sup_{y_1} sup_{x_2} inf_{ŷ_2} sup_{y_2} · · · sup_{x_n} inf_{ŷ_n} sup_{y_n} R^ℓ_n(ŷ; F, x, y). The player makes their prediction.
SLIDE 26 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = sup_{x_1} inf_{ŷ_1} sup_{y_1} sup_{x_2} inf_{ŷ_2} sup_{y_2} · · · sup_{x_n} inf_{ŷ_n} sup_{y_n} R^ℓ_n(ŷ; F, x, y). The adversary plays an observation.
SLIDE 27 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = sup_{x_1} inf_{ŷ_1} sup_{y_1} sup_{x_2} inf_{ŷ_2} sup_{y_2} · · · sup_{x_n} inf_{ŷ_n} sup_{y_n} R^ℓ_n(ŷ; F, x, y). This repeats for all n rounds.
SLIDE 29 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = ⟪sup_{x_t} inf_{ŷ_t} sup_{y_t}⟫_{t=1}^n R^ℓ_n(ŷ; F, x, y). The notation ⟪·⟫_{t=1}^n denotes repeated application of operators.
SLIDE 30 Minimax Regret
Regret: R^ℓ_n(ŷ; F, x, y) = Σ_{t=1}^n ℓ(ŷ_t, y_t) − inf_{f∈F} Σ_{t=1}^n ℓ(f(x_t), y_t). Minimax regret: an algorithm-free quantity on worst-case observations. R^ℓ_n(F) = ⟪sup_{x_t} inf_{ŷ_t} sup_{y_t}⟫_{t=1}^n R^ℓ_n(ŷ; F, x, y). Interpretation: The tuple (ℓ, F) is online learnable if R^ℓ_n(F) = o(n).
- e.g., R^ℓ_n(F) = Θ(√n)
- e.g., R^ℓ_n(F) ≤ O(log(n))
SLIDE 31
Logarithmic Loss
SLIDE 32
Problem Formulation
Sequential Probability Assignment In each round, the prediction is a distribution on possible observations.
SLIDE 33
Problem Formulation
Sequential Probability Assignment In each round, the prediction is a distribution on possible observations. Predicting Binary Outcomes y ∈ Y = {0, 1} and p̂ ∈ Ŷ ≡ [0, 1]
SLIDE 34
Measuring Loss
What is the correct notion of loss?
SLIDE 35
Measuring Loss
Intuition: being confidently wrong is much worse than being indecisive. Statistical motivation: maximum likelihood estimation for a Bernoulli.
SLIDE 36
Measuring Loss
Intuition: being confidently wrong is much worse than being indecisive. Statistical motivation: maximum likelihood estimation for a Bernoulli. Logarithmic Loss ℓlog(p̂_t, y_t) = −y_t log(p̂_t) − (1 − y_t) log(1 − p̂_t).
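The intuition is easy to check numerically; a minimal sketch (not from the talk):

```python
import math

def log_loss(p, y):
    """Logarithmic loss for a probabilistic prediction p of a binary outcome y."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Being indecisive costs log(2), roughly 0.69, regardless of the outcome...
print(log_loss(0.5, 1))
# ...while being confidently wrong is far more expensive.
print(log_loss(0.01, 1))
```

As the prediction approaches the wrong endpoint, the loss grows without bound, which is the source of the difficulty discussed next.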
SLIDE 37
Measuring Loss
Why is this difficult? Standard online learning techniques rely on loss being bounded or Lipschitz.
SLIDE 38 Measuring Loss
Why is this difficult? Standard online learning techniques rely on loss being bounded or Lipschitz. ℓlog(p̂_t, y_t) = −y_t log(p̂_t) − (1 − y_t) log(1 − p̂_t).
[Plot: ℓlog(p, 1) and ℓlog(p, 0) as functions of p ∈ [0, 1]; the loss is unbounded as p approaches the wrong endpoint.]
SLIDE 39 Measuring Loss
Why is this difficult? Standard online learning techniques rely on loss being bounded or Lipschitz. ℓlog(p̂_t, y_t) = −y_t log(p̂_t) − (1 − y_t) log(1 − p̂_t).
[Plot: the derivative d/dp ℓlog(p, y) for y = 1 and y = 0; the gradient is unbounded near the endpoints.]
SLIDE 40
Bounding Regret
SLIDE 41 Dual Game
Recall that the minimax regret is R^log_n(F) = ⟪sup_{x_t} inf_{p̂_t} sup_{y_t}⟫_{t=1}^n R^log_n(p̂; F, x, y).
SLIDE 42 Dual Game
Recall that the minimax regret is R^log_n(F) = ⟪sup_{x_t} inf_{p̂_t} sup_{y_t}⟫_{t=1}^n R^log_n(p̂; F, x, y). The worst-case observations can equivalently be viewed as R^log_n(F) = ⟪sup_{x_t} inf_{p̂_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n R^log_n(p̂; F, x, y).
SLIDE 43 Dual Game
Recall that the minimax regret is R^log_n(F) = ⟪sup_{x_t} inf_{p̂_t} sup_{y_t}⟫_{t=1}^n R^log_n(p̂; F, x, y). The worst-case observations can equivalently be viewed as R^log_n(F) = ⟪sup_{x_t} inf_{p̂_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n R^log_n(p̂; F, x, y). (Abernethy et al., 2009; Rakhlin and Sridharan, 2015) An extension of the minimax theorem gives R^log_n(F) = ⟪sup_{x_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n R^log_n(p; F, x, y).
SLIDE 44 Empirical Process Theory
Expanding the regret term, we get R^log_n(F) = ⟪sup_{x_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t, y_t) − ℓlog(f(x_t), y_t)]
SLIDE 45 Empirical Process Theory
Expanding the regret term, we get R^log_n(F) = ⟪sup_{x_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t, y_t) − ℓlog(f(x_t), y_t)]
The presence of an expected supremum suggests empirical process theory.
SLIDE 46 Empirical Process Theory
Expanding the regret term, we get R^log_n(F) = ⟪sup_{x_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t, y_t) − ℓlog(f(x_t), y_t)]
The presence of an expected supremum suggests empirical process theory.
- Discretize the infinite supremum into a finite cover.
- Bound the expected maximum of the finite cover.
- Bound the error from only considering the finite cover.
SLIDE 47
Uniform Covering Fails
Early work (Cesa-Bianchi and Lugosi, 1999; Opper and Haussler, 1999) used a uniform covering approach, but this is too coarse for many expert classes.
SLIDE 48 Uniform Covering Fails
Early work (Cesa-Bianchi and Lugosi, 1999; Opper and Haussler, 1999) used a uniform covering approach, but this is too coarse for many expert classes. Distance between f, g ∈ F: d(f, g) = sup_{x∈X} sup_{y∈{0,1}} |ℓlog(f(x), y) − ℓlog(g(x), y)|
SLIDE 49 Uniform Covering Fails
Early work (Cesa-Bianchi and Lugosi, 1999; Opper and Haussler, 1999) used a uniform covering approach, but this is too coarse for many expert classes. Distance between f, g ∈ F: d(f, g) = sup_{x∈X} sup_{y∈{0,1}} |ℓlog(f(x), y) − ℓlog(g(x), y)| Class G covers class F at margin γ if: sup_{f∈F} inf_{g∈G} d(f, g) ≤ γ.
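One way to see why this metric is problematic is to evaluate it for constant experts. A minimal sketch (not from the talk): experts that are close as probabilities can be far apart in this distance when they sit near the boundary, so a uniform cover at a fixed margin needs many elements there.

```python
import math

def log_loss(p, y):
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def d(q1, q2):
    """Uniform log-loss distance between two constant experts q1 and q2."""
    return max(abs(log_loss(q1, y) - log_loss(q2, y)) for y in (0, 1))

print(d(0.5, 0.51))    # small: interior experts are easy to cover
print(d(1e-4, 2e-4))   # large, despite |q1 - q2| = 1e-4
```

Near p = 0 the y = 1 branch contributes |log(q1) − log(q2)|, which depends on the ratio of the probabilities rather than their difference.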
SLIDE 50 Uniform Covering Fails
Early work (Cesa-Bianchi and Lugosi, 1999; Opper and Haussler, 1999) used a uniform covering approach, but this is too coarse for many expert classes. Distance between f, g ∈ F: d(f, g) = sup_{x∈X} sup_{y∈{0,1}} |ℓlog(f(x), y) − ℓlog(g(x), y)| Class G covers class F at margin γ if: sup_{f∈F} inf_{g∈G} d(f, g) ≤ γ.
Instead, we use sequential covering from Rakhlin and Sridharan (2014).
SLIDE 51 Binary Tree Notation
R^log_n(F) = ⟪sup_{x_t} sup_{p_t} E_{y_t∼p_t}⟫_{t=1}^n sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t, y_t) − ℓlog(f(x_t), y_t)]
We can encode the sequential nature of x_t and p_t using binary trees:
SLIDE 56 Sequential Covering
R^log_n(F) = sup_x sup_p E_{y∼p} sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t(y), y_t) − ℓlog(f(x_t(y)), y_t)]
SLIDE 57 Sequential Covering
R^log_n(F) = sup_x sup_p E_{y∼p} sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t(y), y_t) − ℓlog(f(x_t(y)), y_t)]
Cover the class of trees F ◦ x defined by composing F with a context tree x:
SLIDE 58 Sequential Covering
R^log_n(F) = sup_x sup_p E_{y∼p} sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t(y), y_t) − ℓlog(f(x_t(y)), y_t)]
Cover the class of trees F ◦ x defined by composing F with a context tree x: A class of trees V sequentially covers F ◦ x at margin γ if: sup_{u∈F◦x} sup_{y∈{0,1}^n} inf_{v∈V} ‖u(y) − v(y)‖_p ≤ γ.
SLIDE 59 Sequential Covering
R^log_n(F) = sup_x sup_p E_{y∼p} sup_{f∈F} Σ_{t=1}^n [ℓlog(p_t(y), y_t) − ℓlog(f(x_t(y)), y_t)]
Cover the class of trees F ◦ x defined by composing F with a context tree x: A class of trees V sequentially covers F ◦ x at margin γ if: sup_{u∈F◦x} sup_{y∈{0,1}^n} inf_{v∈V} ‖u(y) − v(y)‖_p ≤ γ.
The order of observations and covering elements is reversed from a uniform cover.
SLIDE 60
Sequential Covering Example
To illustrate the utility of sequential covering, consider binary experts for n = 2:
SLIDE 61
Sequential Covering Example
To illustrate the utility of sequential covering, consider binary experts for n = 2: The only uniform cover of F ◦ x is itself, which has 8 elements.
SLIDE 62
Sequential Covering Example
To illustrate the utility of sequential covering, consider binary experts for n = 2: The only uniform cover of F ◦ x is itself, which has 8 elements. For a sequential cover, we can choose a different element for each path, so only 4 trees are required.
SLIDE 63
Sequential Covering Examples
Examples of sequential covering numbers:
SLIDE 64 Sequential Covering Examples
Examples of sequential covering numbers:
- Time-Invariant: F = {f | ∃q ∈ [0, 1] s.t. f(x) = q ∀x ∈ X}.
  sup_x log(N∞(F ◦ x, γ)) ≤ log(1/γ).
SLIDE 65 Sequential Covering Examples
Examples of sequential covering numbers:
- Time-Invariant: F = {f | ∃q ∈ [0, 1] s.t. f(x) = q ∀x ∈ X}.
  sup_x log(N∞(F ◦ x, γ)) ≤ log(1/γ).
- Linear: F = {f | ∃w s.t. ‖w‖_2 ≤ 1, f(x) = (1/2)[1 + ⟨w, x⟩] ∀ ‖x‖_2 ≤ 1}.
  sup_x log(N∞(F ◦ x, γ)) ≍ 1/γ².
SLIDE 66 Sequential Covering Examples
Examples of sequential covering numbers:
- Time-Invariant: F = {f | ∃q ∈ [0, 1] s.t. f(x) = q ∀x ∈ X}.
  sup_x log(N∞(F ◦ x, γ)) ≤ log(1/γ).
- Linear: F = {f | ∃w s.t. ‖w‖_2 ≤ 1, f(x) = (1/2)[1 + ⟨w, x⟩] ∀ ‖x‖_2 ≤ 1}.
  sup_x log(N∞(F ◦ x, γ)) ≍ 1/γ².
- 1-Lipschitz: F = {f | f : R^d → [0, 1], ‖∇f(x)‖_∞ ≤ 1}.
  sup_x log(N∞(F ◦ x, γ)) ≍ 1/γ^d.
SLIDE 67
Improved Minimax Bounds
SLIDE 68 Improved Minimax Bounds
Theorem (B., Foster, Roy, 2020) There exists c > 0 such that for all F, R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
SLIDE 69 Improved Minimax Bounds
Theorem (B., Foster, Roy, 2020) There exists c > 0 such that for all F, R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
Upper Bound (Computation) If sup_x log(N∞(F ◦ x, γ)) ≍ γ^{−p}, then R^log_n(F) ≤ O(n^{p/(p+1)}).
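The stated rate follows from balancing the two terms in the theorem; a short derivation:

```latex
% Balancing the two terms of the upper bound when
% \sup_x \log N_\infty(\mathcal{F} \circ x, \gamma) \asymp \gamma^{-p}:
R^{\log}_n(\mathcal{F})
  \le \inf_{\gamma > 0} \left\{ 4n\gamma + c\,\gamma^{-p} \right\}.
% The first-order condition 4n - c\,p\,\gamma^{-p-1} = 0 gives
\gamma^* \asymp n^{-1/(p+1)},
\qquad
4n\gamma^* \asymp c\,(\gamma^*)^{-p} \asymp n^{p/(p+1)}.
```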
SLIDE 70 Improved Minimax Bounds
Theorem (B., Foster, Roy, 2020) There exists c > 0 such that for all F, R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
Upper Bound (Computation) If sup_x log(N∞(F ◦ x, γ)) ≍ γ^{−p}, then R^log_n(F) ≤ O(n^{p/(p+1)}).
Theorem (B., Foster, Roy, 2020) If p > 0, there exists an F with sup_x log(N∞(F ◦ x, γ)) ≍ γ^{−p} and R^log_n(F) ≥ Ω(n^{p/(p+2)}).
SLIDE 71 Improved Minimax Bounds Visualized
Our results compared to the previous best upper bound from Foster et al. (2018).
[Plot: optimized power of n in the regret bound versus the order of the sequential covering number, comparing the Foster et al. (2018) upper bound with the new upper and lower bounds.]
SLIDE 72
Advances Underlying Results
SLIDE 73
Truncation Free
The standard procedure to control log loss uses truncation. Define the truncated expert class Fδ = {f^δ : f ∈ F} for δ ∈ (0, 1/2), where f^δ(x) = δ if f(x) < δ; f(x) if δ ≤ f(x) ≤ 1 − δ; and 1 − δ if f(x) > 1 − δ.
SLIDE 74 Truncation Free
The standard procedure to control log loss uses truncation. Define the truncated expert class Fδ = {f^δ : f ∈ F} for δ ∈ (0, 1/2), where f^δ(x) = δ if f(x) < δ; f(x) if δ ≤ f(x) ≤ 1 − δ; and 1 − δ if f(x) > 1 − δ.
- Observe that for p ∈ [δ, 1 − δ], ℓlog(p, y) is 1/δ-Lipschitz.
SLIDE 75 Truncation Free
The standard procedure to control log loss uses truncation. Define the truncated expert class Fδ = {f^δ : f ∈ F} for δ ∈ (0, 1/2), where f^δ(x) = δ if f(x) < δ; f(x) if δ ≤ f(x) ≤ 1 − δ; and 1 − δ if f(x) > 1 − δ.
- Observe that for p ∈ [δ, 1 − δ], ℓlog(p, y) is 1/δ-Lipschitz.
- It can be shown that R^log_n(F) ≤ R^log_n(Fδ) + 2nδ.
SLIDE 76 Truncation Free
The standard procedure to control log loss uses truncation. Define the truncated expert class Fδ = {f^δ : f ∈ F} for δ ∈ (0, 1/2), where f^δ(x) = δ if f(x) < δ; f(x) if δ ≤ f(x) ≤ 1 − δ; and 1 − δ if f(x) > 1 − δ.
- Observe that for p ∈ [δ, 1 − δ], ℓlog(p, y) is 1/δ-Lipschitz.
- It can be shown that R^log_n(F) ≤ R^log_n(Fδ) + 2nδ.
Rakhlin and Sridharan (2015) hypothesize this truncation argument is suboptimal, and pose the open problem of finding a tighter bound without it.
SLIDE 77 Truncation Free
The standard procedure to control log loss uses truncation. Define the truncated expert class Fδ = {f^δ : f ∈ F} for δ ∈ (0, 1/2), where f^δ(x) = δ if f(x) < δ; f(x) if δ ≤ f(x) ≤ 1 − δ; and 1 − δ if f(x) > 1 − δ.
- Observe that for p ∈ [δ, 1 − δ], ℓlog(p, y) is 1/δ-Lipschitz.
- It can be shown that R^log_n(F) ≤ R^log_n(Fδ) + 2nδ.
Rakhlin and Sridharan (2015) hypothesize this truncation argument is suboptimal, and pose the open problem of finding a tighter bound without it. Our argument does not require truncation.
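For reference, the truncation map being avoided is simple to state in code; a minimal sketch (not from the talk):

```python
def truncate(f, delta):
    """Clamp an expert's predictions into [delta, 1 - delta]."""
    return lambda x: min(max(f(x), delta), 1 - delta)

f = lambda x: x           # a toy expert that passes its input through
f_delta = truncate(f, 0.1)
print(f_delta(0.03), f_delta(0.5), f_delta(0.99))
```

Clamping keeps the loss 1/δ-Lipschitz, but costs the additive 2nδ term above, which is exactly the slack our argument removes.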
SLIDE 78
Self-Concordance
Self-Concordant (Nesterov and Nemirovski, 1994) A function F : R → R is self-concordant if |F′′′(x)| ≤ 2 F′′(x)^{3/2}.
SLIDE 79
Self-Concordance
Self-Concordant (Nesterov and Nemirovski, 1994) A function F : R → R is self-concordant if |F′′′(x)| ≤ 2 F′′(x)^{3/2}. Logarithmic loss is self-concordant as a function of p.
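This claim can be checked directly for one branch of the loss; a sketch not from the slides: for F(p) = −log(p) the self-concordance inequality holds with equality.

```python
# For F(p) = -log(p): F''(p) = 1/p**2 and F'''(p) = -2/p**3,
# so |F'''(p)| = 2 * F''(p)**1.5 exactly.
def second_derivative(p):
    return 1 / p**2

def third_derivative(p):
    return -2 / p**3

for p in [0.1, 0.5, 0.9]:
    lhs = abs(third_derivative(p))
    rhs = 2 * second_derivative(p) ** 1.5
    assert abs(lhs - rhs) < 1e-9 * rhs
print("self-concordance holds with equality for -log(p)")
```

The same computation applies to the −log(1 − p) branch by symmetry.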
SLIDE 80
Self-Concordance
Self-Concordant (Nesterov and Nemirovski, 1994) A function F : R → R is self-concordant if |F′′′(x)| ≤ 2 F′′(x)^{3/2}. Logarithmic loss is self-concordant as a function of p. Utility: In convex optimization, encoding the constraint boundary with a self-concordant barrier function yields high accuracy in polynomially many iterations.
SLIDE 81 Self-Concordance
Self-Concordant (Nesterov and Nemirovski, 1994) A function F : R → R is self-concordant if |F′′′(x)| ≤ 2 F′′(x)^{3/2}. Logarithmic loss is self-concordant as a function of p. Utility: In convex optimization, encoding the constraint boundary with a self-concordant barrier function yields high accuracy in polynomially many iterations. If F is self-concordant, then for all x, y ∈ R, F(x) − F(y) ≤ (x − y)F′(x) − |x − y|√(F′′(x)) + log(1 + |x − y|√(F′′(x))).
SLIDE 82 Self-Concordance
Self-Concordant (Nesterov and Nemirovski, 1994) A function F : R → R is self-concordant if |F′′′(x)| ≤ 2 F′′(x)^{3/2}. Logarithmic loss is self-concordant as a function of p. Utility: In convex optimization, encoding the constraint boundary with a self-concordant barrier function yields high accuracy in polynomially many iterations. If F is self-concordant, then for all x, y ∈ R, F(x) − F(y) ≤ (x − y)F′(x) − |x − y|√(F′′(x)) + log(1 + |x − y|√(F′′(x))).
We use the second term to control the gradient of logarithmic loss.
SLIDE 83 Chaining Free
Recall our upper bound: R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
SLIDE 84 Chaining Free
Recall our upper bound: R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
- Rather than a single discretization step, it is common to use multiple, nested discretizations of finer sizes, called chaining.
SLIDE 85 Chaining Free
Recall our upper bound: R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
- Rather than a single discretization step, it is common to use multiple, nested discretizations of finer sizes, called chaining.
- Our current approach does not permit such a technique, yet improves on previous results which do.
SLIDE 86 Chaining Free
Recall our upper bound: R^log_n(F) ≤ sup_x inf_{γ>0} {4nγ + c log(N∞(F ◦ x, γ))}.
- Rather than a single discretization step, it is common to use multiple, nested discretizations of finer sizes, called chaining.
- Our current approach does not permit such a technique, yet improves on previous results which do.
- Naive attempts to modify our result to allow chaining fail, and this is an area of active work for us.
SLIDE 87
Summary
SLIDE 88 Summary
Motivation
- Make probabilistic forecasts without assumptions about the data generating process, whether i.i.d. or a more sophisticated dependence structure.
SLIDE 89 Summary
Motivation
- Make probabilistic forecasts without assumptions about the data generating process, whether i.i.d. or a more sophisticated dependence structure. Problem Setup
- Bounding minimax regret for arbitrary expert classes under logarithmic loss.
SLIDE 90 Summary
Motivation
- Make probabilistic forecasts without assumptions about the data generating process, whether i.i.d. or a more sophisticated dependence structure. Problem Setup
- Bounding minimax regret for arbitrary expert classes under logarithmic loss.
Contributions
- Improved upper bound for complex classes and provided a lower bound.
- Proof technique is truncation free and requires only a single discretization step.
SLIDE 91 Summary
Motivation
- Make probabilistic forecasts without assumptions about the data generating process, whether i.i.d. or a more sophisticated dependence structure. Problem Setup
- Bounding minimax regret for arbitrary expert classes under logarithmic loss.
Contributions
- Improved upper bound for complex classes and provided a lower bound.
- Proof technique is truncation free and requires only a single discretization step.
Next Steps
- Match upper and lower bounds.
- Obtain bounds that interpolate between stochastic and fully adversarial.
SLIDE 92 Open Problem
Infinite Dimensional Linear Prediction
- X = B_2, the unit ball in a Hilbert space,
- F = {f(x) = (⟨w, x⟩ + 1)/2 : w ∈ B_2},
- Log-loss can be written as g_t(w) = −y_t log(1 + ⟨w, x_t⟩) − (1 − y_t) log(1 − ⟨w, x_t⟩).
SLIDE 93 Open Problem
Infinite Dimensional Linear Prediction
- X = B_2, the unit ball in a Hilbert space,
- F = {f(x) = (⟨w, x⟩ + 1)/2 : w ∈ B_2},
- Log-loss can be written as g_t(w) = −y_t log(1 + ⟨w, x_t⟩) − (1 − y_t) log(1 − ⟨w, x_t⟩). Constructive Algorithm (Rakhlin and Sridharan, 2015)
- Follow-the-Regularized-Leader with a self-concordant barrier function gives R^log_n(F) ≤ Õ(√n).
SLIDE 94 Open Problem
Infinite Dimensional Linear Prediction
- X = B_2, the unit ball in a Hilbert space,
- F = {f(x) = (⟨w, x⟩ + 1)/2 : w ∈ B_2},
- Log-loss can be written as g_t(w) = −y_t log(1 + ⟨w, x_t⟩) − (1 − y_t) log(1 − ⟨w, x_t⟩). Constructive Algorithm (Rakhlin and Sridharan, 2015)
- Follow-the-Regularized-Leader with a self-concordant barrier function gives R^log_n(F) ≤ Õ(√n).
- This is tighter than any known upper bound, including ours, and matches the lower bound.
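For concreteness, the Follow-the-Regularized-Leader update can be sketched as follows; the specific barrier choice is an assumption on my part, −log(1 − ‖w‖²) being a standard self-concordant barrier for the unit ball:

```latex
% Follow-the-Regularized-Leader with a self-concordant barrier R:
w_{t+1} = \operatorname*{arg\,min}_{w \in \mathbb{B}_2}
  \left\{ \sum_{s=1}^{t} g_s(w) + \lambda\, R(w) \right\},
\qquad R(w) = -\log\!\left(1 - \|w\|^2\right).
```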
SLIDE 95 Open Problem
Infinite Dimensional Linear Prediction
- X = B_2, the unit ball in a Hilbert space,
- F = {f(x) = (⟨w, x⟩ + 1)/2 : w ∈ B_2},
- Log-loss can be written as g_t(w) = −y_t log(1 + ⟨w, x_t⟩) − (1 − y_t) log(1 − ⟨w, x_t⟩). Constructive Algorithm (Rakhlin and Sridharan, 2015)
- Follow-the-Regularized-Leader with a self-concordant barrier function gives R^log_n(F) ≤ Õ(√n).
- This is tighter than any known upper bound, including ours, and matches the lower bound.
- It is not clear how to extend such a concrete algorithmic technique to arbitrary expert classes.