Computational Learning Theory: An Analysis of a Conjunction Learner


  1. Computational Learning Theory: An Analysis of a Conjunction Learner (Machine Learning). Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell, and others.

  2. This lecture: Computational Learning Theory
     • The Theory of Generalization
     • Probably Approximately Correct (PAC) learning
     • Positive and negative learnability results
     • Agnostic Learning
     • Shattering and the VC dimension

  3. Where are we?
     • The Theory of Generalization
       – When can we trust the learning algorithm?
       – What functions can be learned?
       – Batch Learning
     • Probably Approximately Correct (PAC) learning
     • Positive and negative learnability results
     • Agnostic Learning
     • Shattering and the VC dimension

  4. This section
     1. Analyze a simple algorithm for learning conjunctions
     2. Define the PAC model of learning
     3. Make formal connections to the principle of Occam's razor

  6. Learning Conjunctions
     The true function: f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
     Training data:
     – <(1,1,1,1,1,1,…,1,1), 1>
     – <(1,1,1,0,0,0,…,0,0), 0>
     – <(1,1,1,1,1,0,...,0,1,1), 1>
     – <(1,0,1,1,1,0,...,0,1,1), 0>
     – <(1,1,1,1,1,0,...,0,0,1), 1>
     – <(1,0,1,0,0,0,...,0,1,1), 0>
     – <(1,1,1,1,1,1,…,0,1), 1>
     – <(0,1,0,1,0,0,...,0,1,1), 0>

  7. Learning Conjunctions
     f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
     A simple learning algorithm (Elimination):
     • Discard all negative examples
     Remaining (positive) training examples:
     – <(1,1,1,1,1,1,…,1,1), 1>
     – <(1,1,1,1,1,0,...,0,1,1), 1>
     – <(1,1,1,1,1,0,...,0,0,1), 1>
     – <(1,1,1,1,1,1,…,0,1), 1>

  8. Learning Conjunctions
     f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
     A simple learning algorithm (Elimination):
     • Discard all negative examples
     • Build a conjunction using the features that are common to all positive examples
     (Training data as on slide 6.)
     h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

  9. Learning Conjunctions
     f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
     A simple learning algorithm (Elimination):
     • Discard all negative examples
     • Build a conjunction using the features that are common to all positive examples
     (Training data as on slide 6.)
     h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
     Positive examples eliminate irrelevant features.

  10. Learning Conjunctions
      f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      A simple learning algorithm:
      • Discard all negative examples
      • Build a conjunction using the features that are common to all positive examples
      (Training data as on slide 6.)
      h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      Clearly this algorithm produces a conjunction that is consistent with the data, that is, err_S(h) = 0, if the target function is a monotone conjunction.
      Exercise: Why?
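The Elimination algorithm above is simple enough to state directly in code. Here is a minimal sketch in Python, assuming examples arrive as (feature-tuple, label) pairs with 0/1 entries; the names eliminate and predict are illustrative, not part of the original slides.

    def eliminate(examples):
        """Learn a monotone conjunction: keep exactly the features that are 1
        in every positive example (negative examples are discarded)."""
        positives = [x for x, y in examples if y == 1]
        n = len(examples[0][0])
        return [i for i in range(n) if all(x[i] == 1 for x in positives)]

    def predict(conjunction, x):
        """h(x) = 1 iff every feature kept in the conjunction is on in x."""
        return int(all(x[i] == 1 for i in conjunction))

    # Toy usage over 6 features, with the true concept f = x2 AND x3
    # (0-based indices 1 and 2):
    data = [
        ((1, 1, 1, 0, 1, 0), 1),
        ((0, 1, 1, 1, 0, 0), 1),
        ((1, 0, 1, 1, 1, 1), 0),
    ]
    h = eliminate(data)                                  # -> [1, 2]
    assert all(predict(h, x) == y for x, y in data)      # consistent with the data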

  11. Learning Conjunctions: Analysis
      h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      Claim 1: Any hypothesis consistent with the training data will only make mistakes on future positive examples.
      Why?

  12. Learning Conjunctions: Analysis
      h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      Claim 1: Any hypothesis consistent with the training data will only make mistakes on future positive examples.
      Why?
      A mistake can occur only if some literal z (in our example, x1) is present in h but not in f.
      Such a literal can cause a positive example to be predicted as negative by h.
      Specifically: x1 = 0, x2 = 1, x3 = 1, x4 = 1, x5 = 1, x100 = 1.
      The reverse situation (h predicting positive when f says negative) can never happen:
      every literal of f is present in every positive training example, so it survives elimination and appears in h.

  14. Learning Conjunctions: Analysis
      h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      Claim 1: Any hypothesis consistent with the training data will only make mistakes on future positive examples.
      [Figure: h drawn as a region inside f, with + examples inside f and - examples outside; the only possible mistakes are positives in f that fall outside h.]
      (Explanation as on slide 12.)
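Claim 1 can be checked mechanically for the h and f of the running example. The sketch below (my own illustration, not from the slides) enumerates all settings of the six relevant variables and verifies that h never predicts positive where f is negative, while the positive example with x1 = 0 and the other relevant variables set to 1 is misclassified by h.

    from itertools import product

    def f(x):   # true concept: x2 AND x3 AND x4 AND x5 AND x100
        return all(x[i] for i in (2, 3, 4, 5, 100))

    def h(x):   # learned hypothesis: additionally requires the extra literal x1
        return all(x[i] for i in (1, 2, 3, 4, 5, 100))

    # Enumerate all settings of the six relevant variables (others fixed to 1).
    for bits in product((0, 1), repeat=6):
        x = {i: 1 for i in range(1, 101)}
        x.update(dict(zip((1, 2, 3, 4, 5, 100), bits)))
        assert not (h(x) and not f(x))   # h is never positive where f is negative

    # ...but the positive example with x1 = 0 and the rest 1 is missed by h:
    x = {i: 1 for i in range(1, 101)}
    x[1] = 0
    assert f(x) and not h(x)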

  15. Learning Conjunctions: Analysis
      Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If
          m > (n/ε) (ln n + ln(1/δ)),
      then, with probability > 1 - δ, the error of the learned hypothesis err_D(h) will be less than ε.

  16. Learning Conjunctions: Analysis
      Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If
          m > (n/ε) (ln n + ln(1/δ))        [polynomial in n, 1/δ, and 1/ε]
      then, with probability > 1 - δ, the error of the learned hypothesis err_D(h) will be less than ε.
      If we see this many training examples, then the algorithm will produce a conjunction that, with high probability, will make few errors.
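To get a feel for the bound, here is a small numeric sketch. It assumes the m > (n/ε)(ln n + ln(1/δ)) form given above and plugs in the running example's n = 100 features, with ε = 0.1 and δ = 0.05 chosen purely for illustration.

    import math

    def sample_bound(n, epsilon, delta):
        """Smallest integer m with m >= (n / epsilon) * (ln n + ln(1 / delta))."""
        return math.ceil((n / epsilon) * (math.log(n) + math.log(1.0 / delta)))

    # n = 100 Boolean features, at most 10% error, with 95% confidence:
    print(sample_bound(n=100, epsilon=0.1, delta=0.05))   # 7601 examples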

  17. Learning Conjunctions: Analysis
      Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If
          m > (n/ε) (ln n + ln(1/δ)),
      then, with probability > 1 - δ, the error of the learned hypothesis err_D(h) will be less than ε.
      Let's prove this assertion.

  18. Proof Intuition
      h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
      What kinds of examples would drive the hypothesis to make a mistake?
      Positive examples where x1 is absent (x1 = 0): f would say true and h would say false.
      None of these examples appeared during training; otherwise x1 would have been eliminated.
      If they never appeared during training, maybe their appearance in the future will also be rare!
      Let's quantify our surprise at seeing such examples.
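That surprise can be quantified with the standard argument for the Elimination algorithm. The LaTeX sketch below is my reconstruction of the usual analysis (the ε/n threshold and the final bound are assumptions consistent with the theorem form above), since the slide's equations did not survive extraction.

    % Call a literal z "bad" if p(z) := Pr_{x ~ D}[x is positive and z = 0 in x] > \epsilon / n.
    \Pr[\text{a fixed bad literal survives } m \text{ i.i.d. examples}]
        \le (1 - \epsilon/n)^m \le e^{-\epsilon m / n}
    % Union bound over at most n literals:
    \Pr[\text{some bad literal survives}] \le n\, e^{-\epsilon m / n} \le \delta
        \quad\text{whenever}\quad m \ge \frac{n}{\epsilon}\Big(\ln n + \ln\frac{1}{\delta}\Big)
    % If only "good" literals survive, the error is at most the sum of their p(z) values:
    % err_D(h) \le \sum_{z \in h \setminus f} p(z) \le n \cdot \frac{\epsilon}{n} = \epsilon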

