Online Learning with Kernel Losses
Aldo Pacchiano, UC Berkeley
Joint work with Niladri Chatterji and Peter Bartlett

Talk Overview
• Intro to Online Learning
• Linear Bandits
• Kernel Bandits

Online Learning
For rounds $t = 1, \dots, n$, a learner plays against an adversary:
• The learner chooses an action $a_t \in \mathcal{A}$.
• The adversary reveals a loss (or reward) $\ell_t \in \mathcal{W}$; the losses can be i.i.d. or adversarial.
The learner's cumulative loss is $\sum_{t=1}^n \ell_t(a_t)$, and the regret compares it with the best fixed action in hindsight:
$R(n) = \sum_{t=1}^n \ell_t(a_t) - \min_{a^* \in \mathcal{A}} \sum_{t=1}^n \ell_t(a^*)$
The learner's objective is to minimize regret.
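As a concrete reading of the regret definition, here is a minimal sketch for a finite action set. The table `losses`, the list `actions`, and the function name `regret` are illustrative assumptions and do not come from the talk.

```python
def regret(losses, actions):
    """Regret against the best fixed action in hindsight.

    losses[t][a] is the loss of action a in round t (hypothetical data layout);
    actions[t] is the index of the action the learner played in round t.
    """
    n = len(losses)
    num_actions = len(losses[0])
    # Cumulative loss actually suffered by the learner.
    learner_total = sum(losses[t][actions[t]] for t in range(n))
    # Cumulative loss of the best single action chosen with hindsight.
    best_fixed_total = min(
        sum(losses[t][a] for t in range(n)) for a in range(num_actions)
    )
    return learner_total - best_fixed_total


# Three rounds, two actions.
losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(regret(losses, [1, 0, 1]))  # learner always picks the worse action
print(regret(losses, [0, 0, 0]))  # learner plays the best fixed action
```

Note that the comparator is a single fixed action, so a learner that switches actions cleverly can even have negative regret.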

Full information vs Bandit feedback
• Full information: the learner gets to see all of $\ell_t(\cdot)$.
• Bandit feedback: the learner only sees the value $\ell_t(a_t)$.

Multi Armed Bandits
[Figure: three arms with reward distributions $P_1, P_2, P_3$ and means $\mu_1, \mu_2, \mu_3$]
• The learner chooses $a_t \in \{1, \dots, K\}$.
• It gets reward $X_{a_t} \sim P_{a_t}$.
Regret: $R(n) = \max_{a^* \in \{1, \dots, K\}} n \mu_{a^*} - \mathbb{E}\left[\sum_{t=1}^n X_{a_t}\right]$
MAB regret: $R(n) = O(\sqrt{K n \log(n)})$ [Auer et al. 2002]
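The slide does not name an algorithm. One standard index policy for this stochastic setting is UCB1, sketched below under the assumption of Bernoulli rewards. The function name `ucb1`, the arm means, and the seed are illustrative choices, not taken from the talk.

```python
import math
import random


def ucb1(means, n, seed=0):
    """Run UCB1 for n rounds on Bernoulli arms with the given (hidden) means.

    Returns the pull counts per arm and the pseudo-regret
    n * max(means) - sum_i pulls[i] * means[i].
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    pseudo_regret = n * max(means) - sum(pulls[i] * means[i] for i in range(k))
    return pulls, pseudo_regret


pulls, pseudo_regret = ucb1([0.2, 0.5, 0.8], n=5000)
print(pulls, pseudo_regret)
```

The exploration bonus makes rarely pulled arms look optimistic, so every arm keeps being sampled at a logarithmic rate while most pulls go to the arm with the largest mean.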

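Structured losses
Packet routing on a network $(V, E)$:
• Arms = paths, $a_t \in \mathcal{A} \subset \{0, 1\}^E$
• Loss = delay, $w_t \in \mathcal{W} = [0, 1]^E$
• Delay is linear: $\langle a_t, w_t \rangle$
Treating each path as a separate arm gives the MAB regret $R(n) = O(\sqrt{|\text{num paths}| \cdot n \log(n)})$, which is exponential because the number of paths can be exponential in the size of the network.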

Linear Bandits
• The learner chooses an action $a_t \in \mathcal{A} \subset \mathbb{R}^d$.
• The adversary's loss is $\ell_t(a) = \langle w_t, a \rangle$ for $w_t \in \mathcal{W} \subset \mathbb{R}^d$; the $w_t$ can be i.i.d. or adversarial.
• The learner only experiences $\langle w_t, a_t \rangle$.
Expected regret: $R(n) = \mathbb{E}\left[\sum_{t=1}^n \langle w_t, a_t \rangle\right] - \inf_{a \in \mathcal{A}} \sum_{t=1}^n \langle w_t, a \rangle$
MAB reduces to linear bandits with $\mathcal{A} = \{e_1, \dots, e_d\}$ and $\mathcal{W} = [0, 1]^d$.

Exponential weights for adversarial linear bandits
For $t = 1, \dots, n$:
• Sample $a_t \sim p_t = (1 - \gamma) q_t + \gamma \nu$, a mixture of the exponential-weights distribution $q_t$ (exploitation) and an exploration distribution $\nu$.
• See $\langle w_t, a_t \rangle$.
• Build the loss estimator $\hat{w}_t$.
• Update with exponential weights: $q_t(a) \propto \exp(-\eta \langle \hat{w}_t, a \rangle) \, q_{t-1}(a)$
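The loop above can be sketched for the simplest action set, the standard basis $\{e_1, \dots, e_d\}$ (the MAB reduction), where $\Sigma_t$ is diagonal and the loss estimator is just the observed loss divided by the sampling probability. This is a minimal sketch under that assumption. The names `exp_weights_basis` and `softmax_neg`, the uniform exploration distribution, and the values of `eta` and `gamma` are illustrative and not from the talk. The sampling distribution in each round is built from the estimates of the earlier rounds.

```python
import math
import random


def softmax_neg(cum_est, eta):
    """Distribution proportional to exp(-eta * cum_est[i]), computed stably."""
    shift = min(cum_est)
    weights = [math.exp(-eta * (c - shift)) for c in cum_est]
    total = sum(weights)
    return [x / total for x in weights]


def exp_weights_basis(loss_vectors, eta=0.05, gamma=0.1, seed=0):
    """Exponential weights with uniform exploration on A = {e_1, ..., e_d}."""
    rng = random.Random(seed)
    d = len(loss_vectors[0])
    cum_est = [0.0] * d          # running sum of the loss estimates
    nu = [1.0 / d] * d           # exploration distribution (assumed uniform)
    total_loss = 0.0
    for w in loss_vectors:
        q = softmax_neg(cum_est, eta)                        # exploitation
        p = [(1 - gamma) * q[i] + gamma * nu[i] for i in range(d)]
        arm = rng.choices(range(d), weights=p)[0]            # a_t ~ p_t
        observed = w[arm]                                    # <w_t, a_t>
        total_loss += observed
        # For basis actions Sigma_t = diag(p), so the estimator is
        # observed / p[arm] in coordinate `arm` and zero elsewhere.
        cum_est[arm] += observed / p[arm]
    return softmax_neg(cum_est, eta), total_loss


# An oblivious adversary that plays the same loss vector every round.
q, total_loss = exp_weights_basis([[0.9, 0.5, 0.1]] * 2000)
print(q, total_loss)
```

Mixing in `nu` keeps every sampling probability at least `gamma / d`, which bounds the size of the importance-weighted estimates.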

Exponential weights
$q_t(a) \propto \exp\left(-\eta \sum_{i=1}^t \langle \hat{w}_i, a \rangle\right)$
[Figure: the action set $\mathcal{A}$, the cumulative loss estimate $\sum_{i=1}^t \hat{w}_i$, and the resulting distribution $q_t$ over $\mathcal{A}$]

Unbiased estimator of the loss
Let $\Sigma_t = \mathbb{E}_{a \sim p_t}\left[a a^\top\right]$ and set $\hat{w}_t = (\Sigma_t)^{-1} a_t \langle w_t, a_t \rangle$.
$\hat{w}_t$ is an unbiased estimator of $w_t$:
$\mathbb{E}_{a_t \sim p_t}[\hat{w}_t \mid \mathcal{F}_{t-1}] = \left(\mathbb{E}_{a \sim p_t}\left[a a^\top\right]\right)^{-1} \mathbb{E}_{a_t \sim p_t}[a_t \langle w_t, a_t \rangle \mid \mathcal{F}_{t-1}]$
$= \left(\mathbb{E}_{a \sim p_t}\left[a a^\top\right]\right)^{-1} \mathbb{E}_{a_t \sim p_t}\left[a_t a_t^\top \mid \mathcal{F}_{t-1}\right] w_t = w_t$
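The unbiasedness claim can be checked exactly on a small finite action set by enumerating the expectation instead of sampling. The sketch below assumes $d = 2$ so that the inverse has a closed form. The three actions, the probabilities, the loss vector, and all helper names are made up for illustration.

```python
def second_moment(actions, probs):
    """Sigma = E_{a ~ p}[a a^T] for 2-dimensional actions."""
    sigma = [[0.0, 0.0], [0.0, 0.0]]
    for a, p in zip(actions, probs):
        for i in range(2):
            for j in range(2):
                sigma[i][j] += p * a[i] * a[j]
    return sigma


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def loss_estimator(sigma_inv, a, w):
    """w_hat = Sigma^{-1} a <w, a>; only the scalar <w, a> is observed."""
    observed = w[0] * a[0] + w[1] * a[1]
    return [observed * (sigma_inv[i][0] * a[0] + sigma_inv[i][1] * a[1])
            for i in range(2)]


def expected_estimator(actions, probs, w):
    """E_{a ~ p}[w_hat], computed by summing over the finite action set."""
    sigma_inv = inverse_2x2(second_moment(actions, probs))
    mean = [0.0, 0.0]
    for a, p in zip(actions, probs):
        w_hat = loss_estimator(sigma_inv, a, w)
        mean = [mean[i] + p * w_hat[i] for i in range(2)]
    return mean


actions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
probs = [0.5, 0.3, 0.2]
w = (0.3, 0.7)
print(expected_estimator(actions, probs, w))  # matches w up to rounding
```

The check relies on $\Sigma_t$ being invertible, which requires the sampled actions to span $\mathbb{R}^d$. This is one reason the exploration distribution is mixed in.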

Linear bandits regret
Theorem (Linear Bandits Regret). [See for example Bubeck '11]
$R(n) \le \gamma n + \frac{\log(|\mathcal{A}|)}{\eta} + \eta \sum_{t=1}^n \mathbb{E}\, \mathbb{E}_{a \sim p_t} (\langle \hat{w}_t, a \rangle)^2$
Regret by choice of exploration distribution:
• Exploration over a barycentric spanner [Dani, Hayes, Kakade '08]: $O(d \sqrt{n \log(|\mathcal{A}|)}) = O(d^{3/2} \sqrt{n})$
• Uniform over $\mathcal{A}$ [Cesa-Bianchi, Lugosi '12]: $O(\sqrt{d n \log(|\mathcal{A}|)}) = O(d \sqrt{n})$
• John's distribution [Bubeck, Cesa-Bianchi, Kakade '12]: $O(d \sqrt{n})$

Linear bandits regret: dimension dependence
Variance bound: $\mathbb{E}\left[\mathbb{E}_{a \sim p_t}\left[(\langle \hat{w}_t, a \rangle)^2\right]\right] \le d$
Plugging this into the regret bound, the last term is at most $\eta d n$, which is where the dimension dependence comes from:
$R(n) \le \gamma n + \frac{\log(|\mathcal{A}|)}{\eta} + \underbrace{\eta \sum_{t=1}^n \mathbb{E}\, \mathbb{E}_{a \sim p_t} (\langle \hat{w}_t, a \rangle)^2}_{\le \eta d n}$
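One way to see the variance bound is the short calculation below. It is a sketch under two assumptions that the slide does not state explicitly: the losses are bounded, $|\langle w_t, a \rangle| \le 1$ for all $a \in \mathcal{A}$, and $\Sigma_t$ is invertible.

```latex
\begin{align*}
\mathbb{E}_{a \sim p_t}\left[\langle \hat{w}_t, a \rangle^2\right]
  &= \hat{w}_t^\top \, \mathbb{E}_{a \sim p_t}\left[a a^\top\right] \hat{w}_t
   = \hat{w}_t^\top \Sigma_t \hat{w}_t \\
  &= \langle w_t, a_t \rangle^2 \, a_t^\top \Sigma_t^{-1} \Sigma_t \Sigma_t^{-1} a_t
   = \langle w_t, a_t \rangle^2 \, a_t^\top \Sigma_t^{-1} a_t
   \le a_t^\top \Sigma_t^{-1} a_t, \\
\mathbb{E}_{a_t \sim p_t}\left[a_t^\top \Sigma_t^{-1} a_t\right]
  &= \operatorname{tr}\left(\Sigma_t^{-1} \, \mathbb{E}_{a_t \sim p_t}\left[a_t a_t^\top\right]\right)
   = \operatorname{tr}(I_d) = d.
\end{align*}
```

The first line uses the definition $\hat{w}_t = \Sigma_t^{-1} a_t \langle w_t, a_t \rangle$, and the second uses linearity of the trace, so the bound holds for any sampling distribution $p_t$ with invertible second-moment matrix.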

Recap
• Intro to Online Learning
• Linear Bandits
• Kernel Bandits

Online Quadratic losses
• Actions: $a_t \in \mathcal{A} = \{a \text{ s.t. } \|a\|_2 \le 1\}$
• Losses: $\ell_t(a) = \langle b_t, a \rangle + a^\top B_t a$, with $B_t$ symmetric and the loss possibly non-convex
• The offline problem $\min_{a \in \mathcal{A}} \ell_t(a)$ has a polytime solution (strong duality).
[Figure: surface plot of $z = x^2 - 0.5 y^2 + xy - 0.5 x + 0.5 y + 1$]
[Photos: Peter Bartlett, Niladri Chatterji]

Linearization of Quadratic losses
Quadratic losses are linear in the space of (matrix, vector) pairs:
$\ell(a) = \langle b_t, a \rangle + a^\top B_t a = \left\langle \begin{pmatrix} B_t \\ b_t \end{pmatrix}, \begin{pmatrix} a a^\top \\ a \end{pmatrix} \right\rangle$
We can use the linear bandits machinery: exponential weights for quadratic bandits.
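The identity above can be checked numerically by flattening the pair $(B_t, b_t)$ and the lifted action $(a a^\top, a)$ into ordinary vectors. This is a small sketch. The function names and the numbers are illustrative, and the matrix inner product is taken to be the entrywise (Frobenius) one.

```python
def quadratic_loss(b, B, a):
    """l(a) = <b, a> + a^T B a, computed directly."""
    d = len(a)
    linear = sum(b[i] * a[i] for i in range(d))
    quad = sum(a[i] * B[i][j] * a[j] for i in range(d) for j in range(d))
    return linear + quad


def lift_loss(b, B):
    """Flatten (B, b) into one parameter vector of length d*d + d."""
    return [x for row in B for x in row] + list(b)


def lift_action(a):
    """Flatten (a a^T, a) into one feature vector of length d*d + d."""
    d = len(a)
    return [a[i] * a[j] for i in range(d) for j in range(d)] + list(a)


def inner(u, v):
    return sum(x * y for x, y in zip(u, v))


b = [1.0, -2.0]
B = [[2.0, 1.0], [1.0, -3.0]]   # symmetric and indefinite, so l is non-convex
a = [0.5, -0.25]
print(quadratic_loss(b, B, a))                  # 1.0625
print(inner(lift_loss(b, B), lift_action(a)))   # 1.0625
```

The lifted problem lives in dimension $d^2 + d$, so bounds imported from linear bandits inherit that dimension rather than $d$.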

Exponential weights for adversarial quadratic bandits
For $t = 1, \dots, n$:
• Sample $a_t \sim p_t = (1 - \gamma) q_t + \gamma \nu$, a mixture of the exponential-weights distribution $q_t$ (exploitation) and an exploration distribution $\nu$.
• See $\langle b_t, a_t \rangle + a_t^\top B_t a_t$.
• Build the loss estimator $(\hat{B}_t, \hat{b}_t)$.
• Update with exponential weights: $q_t(a) \propto \exp\left(-\eta \left(\langle \hat{b}_t, a \rangle + a^\top \hat{B}_t a\right)\right) q_{t-1}(a)$
Sampling is poly time.

Beyond “Finite Dimensional” Losses
Evasion games: obstacle avoidance
$\ell_t(a) = \exp(-\|a - w_t\|^2)$
Gaussian kernel: infinite dimensional
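For intuition, the evasion loss can be evaluated directly even though its feature map is infinite dimensional. The sketch below only computes the kernel value. The function name `gaussian_kernel_loss` and the sample points are illustrative assumptions.

```python
import math


def gaussian_kernel_loss(a, w):
    """l(a) = exp(-||a - w||^2): largest when the action a sits on the obstacle w."""
    sq_dist = sum((ai - wi) ** 2 for ai, wi in zip(a, w))
    return math.exp(-sq_dist)


obstacle = (0.5, -0.5)
print(gaussian_kernel_loss(obstacle, obstacle))      # 1.0, action on the obstacle
print(gaussian_kernel_loss((2.0, 2.0), obstacle))    # close to 0, far from it
```

Unlike the quadratic case, there is no finite lifted vector whose inner product reproduces this loss, which is why the finite-dimensional linear bandit bounds do not apply directly.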
