  1. CS6501: Topics in Learning and Game Theory (Fall 2019)
     Intro to Online Learning
     Instructor: Haifeng Xu

  2. Outline
     ➢ Online Learning/Optimization
     ➢ Measure Algorithm Performance via Regret
     ➢ Warm-up: A Simple Example

  3. Overview of Machine Learning
     ➢ Supervised learning: labeled training data → ML algorithm → classifier/regression function
     ➢ Unsupervised learning: unlabeled training data → ML algorithm → clusters/knowledge
     ➢ Semi-supervised learning (a combination of the two)
     What else is there?

  4. Overview of Machine Learning
     ➢ Supervised learning
     ➢ Unsupervised learning
     ➢ Semi-supervised learning
     ➢ Online learning
     ➢ Reinforcement learning
     ➢ Active learning
     ➢ ...

  5. Online Learning: When Data Come Online
     The online learning pipeline: initial ML algorithm → observe one more training instance → make predictions/decisions → receive loss/reward

  6. Online Learning: When Data Come Online
     The online learning pipeline: initial ML algorithm → observe one more training instance → make predictions/decisions → receive loss/reward → update the ML algorithm → (repeat)

  7. Typical Assumptions on Data
     ➢ Statistical feedback: instances are drawn from a fixed distribution
        • Image classification, predicting stock prices, choosing restaurants, gambling machines (a.k.a. bandits)
     ➢ Adversarial feedback: instances are chosen adversarially
        • Spam detection, anomaly detection, game playing
     ➢ Markovian feedback: instances are drawn from a distribution that changes dynamically
        • Interventions, treatments

  8. Online Learning for Decision Making
     ➢ Learn to commute to school
        • Bus, walking, or driving? Which route? Uncertainty on the way?
     ➢ Learn to gamble or buy stocks

  9. Online Learning for Decision Making
     ➢ Learn to commute to school
        • Bus, walking, or driving? Which route? Uncertainty on the way?
     ➢ Learn to gamble or buy stocks
     ➢ Advertisers learn to bid for keywords

  10. Online Learning for Decision Making
     ➢ Learn to commute to school
        • Bus, walking, or driving? Which route? Uncertainty on the way?
     ➢ Learn to gamble or buy stocks
     ➢ Advertisers learn to bid for keywords
     ➢ Recommendation systems learn to make recommendations

  11. Online Learning for Decision Making
     ➢ Learn to commute to school
        • Bus, walking, or driving? Which route? Uncertainty on the way?
     ➢ Learn to gamble or buy stocks
     ➢ Advertisers learn to bid for keywords
     ➢ Recommendation systems learn to make recommendations
     ➢ Clinical trials
     ➢ Robots learn to react
     ➢ Learn to play games (video games and strategic games)
     ➢ Even how you learn to make decisions in your life
     ➢ ...

  12. Model Sketch
     ➢ A learner acts in an uncertain world for T time steps
     ➢ At each step t = 1, ..., T, the learner takes an action i_t ∈ [n] = {1, ..., n}
     ➢ The learner observes a cost vector c_t, where c_t(i) ∈ [0,1] is the cost of action i ∈ [n]
        • The learner suffers cost c_t(i_t) at step t
        • Rewards can be defined analogously instead of costs; there is not much difference
        • There are also "partial feedback" models (not covered here)
     ➢ Adversarial feedback: c_t is chosen by an adversary
        • The powerful adversary has access to all the history up to step t − 1 (learner actions, past costs, etc.) and also knows the learner's algorithm
        • There are also models of stochastic feedback (not covered here)
     ➢ Learner's goal: minimize the total cost ∑_{t∈[T]} c_t(i_t)

  13. Formal Procedure of the Model
     At each time step t = 1, ..., T, the following occurs in order:
     1. The learner picks a distribution p_t over the actions [n]
     2. The adversary picks a cost vector c_t ∈ [0,1]^n (he knows p_t)
     3. An action i_t ∼ p_t is chosen and the learner incurs cost c_t(i_t)
     4. The learner observes c_t (for use in future time steps)
     ➢ The learner tries to pick the distribution sequence p_1, ..., p_T to minimize the expected cost 𝔼[∑_{t∈[T]} c_t(i_t)]
        • Expectation is over the randomness of the actions
     ➢ The adversary does not have to really exist; it is assumed mainly for the purpose of worst-case analysis
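To make the protocol concrete, here is a minimal Python sketch of the interaction loop above. The interface names (pick_distribution, pick_cost_vector, observe) are illustrative assumptions, not from the slides.

```python
import numpy as np

def run_online_learning(learner, adversary, n_actions, T, rng=None):
    """Simulate the full-information online learning protocol for T rounds."""
    rng = rng or np.random.default_rng(0)
    total_cost = 0.0
    for t in range(T):
        p_t = learner.pick_distribution(t)          # 1. distribution over actions [n]
        c_t = adversary.pick_cost_vector(t, p_t)    # 2. adversary may look at p_t
        i_t = rng.choice(n_actions, p=p_t)          # 3. sample action, incur its cost
        total_cost += c_t[i_t]
        learner.observe(c_t)                        # 4. full cost vector is revealed
    return total_cost
```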

  14. Well, the Adversary Seems Too Powerful?
     ➢ The adversary can choose c_t ≡ 1 for all t; the learner then suffers cost T regardless
        • So can we not do anything non-trivial? Are we done?
     ➢ But if c_t ≡ 1 for all t, then looking back at the end you do not regret anything: had you known these costs in hindsight, you could not have done better
        • From this perspective, cost T in this case is not bad
     So what is a good measure for the performance of an online learning algorithm?

  15. Outline
     ➢ Online Learning/Optimization
     ➢ Measure Algorithm Performance via Regret
     ➢ Warm-up: A Simple Example

  16. Regret
     ➢ Measures how much the learner regrets, had he known the cost vectors c_1, ..., c_T in hindsight
     ➢ Formally, R_T = 𝔼_{i_t ∼ p_t}[ ∑_{t∈[T]} c_t(i_t) ] − min_{i∈[n]} ∑_{t∈[T]} c_t(i)
     ➢ The benchmark min_{i∈[n]} ∑_{t∈[T]} c_t(i) is the learner's cost had he known c_1, ..., c_T in advance and been allowed to take the best single action across all rounds

  17. Regret
     ➢ Measures how much the learner regrets, had he known the cost vectors c_1, ..., c_T in hindsight
     ➢ Formally, R_T = 𝔼_{i_t ∼ p_t}[ ∑_{t∈[T]} c_t(i_t) ] − min_{i∈[n]} ∑_{t∈[T]} c_t(i)
     ➢ The benchmark min_{i∈[n]} ∑_{t∈[T]} c_t(i) is the learner's cost had he known c_1, ..., c_T in advance and been allowed to take the best single action across all rounds
        • There are other notions of regret, e.g., swap regret (coming later)
        • But min_{i∈[n]} ∑_{t∈[T]} c_t(i) is the mostly used benchmark
     Regret is an appropriate performance measure for online algorithms: it measures exactly the loss due to not knowing the data in advance
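As an illustration of the definition (the function name and array layout are my own, not the deck's), the realized regret of one run can be computed directly from the cost vectors and the chosen actions; averaging over the action randomness estimates R_T.

```python
import numpy as np

def realized_regret(costs, actions):
    """costs: (T, n) array with costs[t, i] = c_t(i); actions: length-T list of chosen i_t."""
    costs = np.asarray(costs, dtype=float)
    T = len(actions)
    learner_cost = costs[np.arange(T), actions].sum()   # sum_t c_t(i_t)
    best_fixed_cost = costs.sum(axis=0).min()           # min_i sum_t c_t(i)
    return learner_cost - best_fixed_cost
```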

  18. Average Regret
     R̄_T = R_T / T = 𝔼_{i_t ∼ p_t}[ (1/T) ∑_{t∈[T]} c_t(i_t) ] − min_{i∈[n]} (1/T) ∑_{t∈[T]} c_t(i)
     ➢ When R̄_T → 0 as T → ∞, we say the algorithm has vanishing regret or no regret; such an algorithm is called a no-regret online learning algorithm
        • Equivalently, R_T is sublinear in T
        • Both phrasings are used, depending on your habits
     Our goal: design no-regret algorithms by minimizing regret
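As a quick illustrative calculation (the specific bound below is not from this slide, but bounds of this shape are typical for no-regret algorithms such as multiplicative weights): if an algorithm guarantees R_T ≤ 2√(T ln n), then R̄_T = R_T / T ≤ 2√(ln n / T) → 0 as T → ∞, so R_T is sublinear in T and the algorithm is no-regret.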

  19. A Naive Strategy: Follow the Leader (FTL)
     ➢ That is, pick the action with the smallest accumulated cost so far
     What is the worst-case regret of FTL? Answer: the worst (largest) regret is T/2
     ➢ Consider the following instance with 2 actions:
        t        1   2   3   4   5   ...   T
        c_t(1)   1   0   1   0   1   ...
        c_t(2)   0   1   0   1   0   ...
     ➢ FTL always picks the action that incurs cost 1 in that round → total cost T
     ➢ The best action in hindsight has cost at most T/2
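A minimal simulation of FTL on the alternating instance above; the tie-breaking rule (toward the lower-indexed action) is an assumption the slide does not specify.

```python
import numpy as np

def ftl_on_alternating_costs(T):
    """FTL on 2 actions with c_t = (1,0), (0,1), (1,0), ... for t = 1, ..., T."""
    cum = np.zeros(2)              # accumulated cost of each action so far
    ftl_cost = 0.0
    for t in range(T):
        i_t = int(np.argmin(cum))  # smallest accumulated cost so far, ties -> action 0
        c_t = np.array([1.0, 0.0]) if t % 2 == 0 else np.array([0.0, 1.0])
        ftl_cost += c_t[i_t]
        cum += c_t
    return ftl_cost, cum.min()     # e.g. T = 1000 -> (1000.0, 500.0), regret T/2
```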

  20. Randomization Is Necessary
     In fact, any deterministic algorithm suffers (linear) regret (n − 1)T/n
     ➢ Recall that the adversary knows the history and the learner's algorithm
        • So he can infer our distribution p_t at time t (but does not know our sampled action i_t ∼ p_t)
     ➢ But if the algorithm is deterministic, the action i_t can also be inferred
     ➢ The adversary simply sets c_t(i_t) = 1 and c_t(i) = 0 for all i ≠ i_t
     ➢ The learner suffers total cost T
     ➢ The best action in hindsight has cost at most T/n
     Can a randomized algorithm achieve sublinear regret?
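A hedged code sketch of this argument: the adversary, knowing a deterministic learner's rule, predicts its next action and charges cost 1 on exactly that action. FTL is used below only as a stand-in for "any deterministic algorithm"; the function names are mine.

```python
import numpy as np

def adversary_vs_deterministic(next_action, n, T):
    """next_action(history) -> action in {0, ..., n-1}; history is the list of past cost vectors."""
    history, learner_cost, cum = [], 0.0, np.zeros(n)
    for t in range(T):
        i_t = next_action(history)     # predictable because the learner is deterministic
        c_t = np.zeros(n)
        c_t[i_t] = 1.0                 # cost 1 only on the learner's chosen action
        learner_cost += 1.0
        cum += c_t
        history.append(c_t)
    return learner_cost, cum.min()     # (T, at most T/n), so regret >= (n - 1)T/n

# Example deterministic learner: FTL with ties broken toward the lower index
ftl = lambda history: int(np.argmin(np.sum(history, axis=0))) if history else 0
# adversary_vs_deterministic(ftl, n=4, T=1000) -> (1000.0, 250.0)
```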

  21. Outline
     ➢ Online Learning/Optimization
     ➢ Measure Algorithm Performance via Regret
     ➢ Warm-up: A Simple Example

  22. Consider a Simpler (Special) Setting
     ➢ Only two types of costs: c_t(i) ∈ {0, 1}
     ➢ One of the actions is perfect: it always has cost 0
        • The minimum cost in hindsight is thus 0
        • The learner does not know which action is perfect
     Is it possible to achieve sublinear regret in this simpler setting?

  23. A Natural Algorithm
     Observations:
     1. If an action ever had a non-zero cost, it is not perfect
     2. Among the actions with all-zero costs so far, we currently do not really know how to distinguish them
     These motivate the following natural algorithm (sketched in code below):
     For t = 1, ..., T:
        ➢ Identify the set of actions with zero total cost so far, and pick one action from this set uniformly at random
     Note: there is always at least one action to pick, since the perfect action is always a candidate
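A minimal sketch of this algorithm in Python, assuming the costs are given as a {0,1} matrix with one all-zero column (the perfect action); the setup is an illustrative toy, not from the slides.

```python
import numpy as np

def natural_algorithm(cost_matrix, rng=None):
    """cost_matrix: (T, n) array of {0,1} costs, with some column identically zero."""
    rng = rng or np.random.default_rng(0)
    T, n = cost_matrix.shape
    cum = np.zeros(n)                      # total cost of each action so far
    total_cost = 0.0
    for t in range(T):
        good = np.flatnonzero(cum == 0)    # nonempty: the perfect action never leaves
        i_t = rng.choice(good)             # uniform over zero-cost-so-far actions
        total_cost += cost_matrix[t, i_t]
        cum += cost_matrix[t]              # full feedback: the whole cost vector is observed
    return total_cost                      # benchmark (perfect action in hindsight) has cost 0
```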

  24. Analysis of the Algorithm
     ➢ Fix a round t; we examine the expected cost from this round
     ➢ Let S_good = {actions with zero total cost before round t} and k = |S_good|
        • So each action in S_good is picked with probability 1/k

  25. Analysis of the Algorithm
     ➢ Fix a round t; we examine the expected cost from this round
     ➢ Let S_good = {actions with zero total cost before round t} and k = |S_good|
        • So each action in S_good is picked with probability 1/k
     ➢ For any parameter ε ∈ [0,1], one of the following two cases happens:
        • Case 1:
        • Case 2:

  26. Analysis of the Algorithm
     ➢ Fix a round t; we examine the expected cost from this round
     ➢ Let S_good = {actions with zero total cost before round t} and k = |S_good|
        • So each action in S_good is picked with probability 1/k
     ➢ For any parameter ε ∈ [0,1], one of the following two cases happens:
        • Case 1: at most εk actions from S_good have cost 1 in this round, in which case we suffer expected cost at most (εk)/k = ε
        • Case 2: more than εk actions from S_good have cost 1 in this round, in which case more than an ε fraction of S_good is eliminated from future consideration
