

  1. Online Learning
     Machine Learning
     Some slides based on lectures from Dan Roth, Avrim Blum and others

  2. Big picture
     Last lecture: Linear models
       – Perceptron, Winnow, Support Vector Machines, …
     How good is a learning algorithm?
       – PAC, Online learning, Empirical Risk Minimization, …

  8. Mistake bound learning
     • The mistake bound model
     • A proof of concept mistake bound algorithm: the Halving algorithm
     • Examples
     • Representations and ease of learning

  9. Coming up…
     • Mistake-driven learning
     • Learning algorithms for learning a linear function over the feature space
       – Perceptron (with many variants)
       – General Gradient Descent view
     • Issues to watch out for
       – Importance of representation
       – Complexity of learning
       – More about features


  11. Motivation
      Consider a learning problem in a very high dimensional space:
        {x_1, x_2, ⋯, x_1000000}
      Assume that the function space is very sparse (the function of interest depends on a small number of attributes):
        f = x_2 ∧ x_3 ∧ x_4 ∧ x_5 ∧ x_100
      “Middle Eastern deserts are known for their sweetness”
      • Can we develop an algorithm that depends only weakly on the dimensionality and mostly on the number of relevant attributes?
      • How should we represent the hypothesis?

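The motivating setup can be sketched in Python. This is an illustrative reconstruction, not code from the lecture: the target depends on only five of a million possible attributes (the indices below mirror the conjunction on the slide), and an instance is stored sparsely as the set of its active feature indices rather than as a dense million-entry vector.

```python
RELEVANT = {2, 3, 4, 5, 100}   # the only indices the target depends on

def target(active_features):
    """f = x_2 AND x_3 AND x_4 AND x_5 AND x_100, with an instance in
    {0,1}^1000000 represented sparsely as the set of indices that are 1."""
    return int(RELEVANT <= active_features)

print(target({2, 3, 4, 5, 100, 9999}))  # 1: every relevant attribute is on
print(target({2, 3, 4, 100}))           # 0: x_5 is off
```

The sparse set representation is the answer the slide is hinting at: evaluation costs depend on the number of active and relevant attributes, not on the nominal dimensionality.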

  15. An illustration of mistake driven learning
      [Diagram: the learner receives one example x and emits a prediction h_t(x) from its current hypothesis h_t]

      Loop forever:
        1. Receive example x
        2. Make a prediction using the current hypothesis: h_t(x)
        3. Receive the true label for x
        4. If h_t(x) is not correct, then:
           • Update h_t to h_{t+1}

      Only need to define how prediction and update behave.
      Can such a simple scheme work? How do we quantify what “work” means?
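The loop above is easy to state generically in code. The sketch below is illustrative, not from the slides: it separates the two pieces the slide says we must define, prediction and update, and instantiates them with the classic elimination learner for monotone conjunctions (a hypothetical choice of learner for the example).

```python
def mistake_driven_learning(examples, predict, update, h):
    """Generic loop from the slide: predict with the current hypothesis,
    receive the true label, and update only when the prediction was wrong."""
    mistakes = 0
    for x, y in examples:
        if predict(h, x) != y:      # step 4: h_t(x) was incorrect
            h = update(h, x, y)     # update h_t to h_{t+1}
            mistakes += 1
    return h, mistakes

# Illustrative instantiation: the elimination learner for monotone
# conjunctions over 6 boolean attributes. The hypothesis is the set of
# indices still believed to be in the conjunction.
h0 = set(range(6))
predict = lambda h, x: int(all(x[i] == 1 for i in h))

def update(h, x, y):
    # Since h only shrinks toward the target, a mistake here can only be a
    # false negative, so keep just the attributes that are 1 in x.
    return {i for i in h if x[i] == 1}

labeled = [((1, 1, 1, 1, 1, 1), 1),   # labels from a hypothetical target: x_0 AND x_1
           ((1, 1, 0, 0, 0, 0), 1),
           ((0, 1, 1, 1, 1, 1), 0)]
h, m = mistake_driven_learning(labeled, predict, update, h0)
print(h, m)   # learned hypothesis and the number of mistakes made
```

Note that the generic loop never sees how the hypothesis is represented; only `predict` and `update` do, which is exactly the point of the slide.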

  17. Mistake bound algorithms
      • Setting:
        – Instance space: 𝒳 (dimensionality n)
        – Target f: 𝒳 → {0,1}, f ∈ C, the concept class (parameterized by n)
      • Learning protocol:
        – Learner is given x ∈ 𝒳, randomly chosen
        – Learner predicts h(x) and is then given f(x) ⟵ the feedback
      • Performance: the learner makes a mistake when h(x) ≠ f(x)
        – M_A(f, S): the number of mistakes algorithm A makes on a sequence S of examples for the target function f
        – M_A(C) = max_{f ∈ C, S} M_A(f, S): the maximum possible number of mistakes made by A for any target function in C and any sequence S of examples
      • Algorithm A is a mistake bound algorithm for the concept class C if M_A(C) is polynomial in the dimensionality n


  23. Learnability in the mistake bound model
      • Algorithm A is a mistake bound algorithm for the concept class C if M_A(C) is polynomial in the dimensionality n
        – That is, the maximum number of mistakes it makes for any sequence of inputs (perhaps even an adversarially chosen one) is polynomial in the dimensionality
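To make the mistake count concrete, here is a hedged sketch of the Halving algorithm (the proof-of-concept algorithm named in the outline) run over the class of monotone conjunctions on 4 boolean variables: predict by majority vote of the surviving hypotheses, then discard every hypothesis inconsistent with the feedback. The particular target and example sequence are invented for illustration.

```python
from math import log2

n = 4
evaluate = lambda c, x: int(all(x[i] for i in c))

# Version space: all 2^n monotone conjunctions over n variables, each a
# frozenset of variable indices (the empty conjunction is constant 1).
version_space = [frozenset(i for i in range(n) if (mask >> i) & 1)
                 for mask in range(2 ** n)]

f = frozenset({0, 2})                     # hidden target: x_0 AND x_2
S = [(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 1, 1), (1, 1, 0, 1), (1, 0, 0, 0)]

mistakes = 0
for x in S:
    votes = sum(evaluate(c, x) for c in version_space)
    prediction = int(2 * votes > len(version_space))   # strict majority vote
    y = evaluate(f, x)                                 # the feedback
    if prediction != y:
        mistakes += 1
    # Keep only the hypotheses consistent with the feedback.
    version_space = [c for c in version_space if evaluate(c, x) == y]

print(f"mistakes = {mistakes}, bound = {int(log2(2 ** n))}")
```

Each mistake means the majority of the version space was wrong, so at least half of it is eliminated; the number of mistakes is therefore at most log2 |C|, here log2 16 = 4, regardless of the sequence of examples.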
