Computational Learning Theory: An Analysis of a Conjunction Learner


SLIDE 1

Machine Learning

Computational Learning Theory: An Analysis of a Conjunction Learner

1

Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

SLIDE 2

This lecture: Computational Learning Theory

  • The Theory of Generalization
  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension

2

SLIDE 3

Where are we?

  • The Theory of Generalization

    – When can we trust the learning algorithm?
    – What functions can be learned?
    – Batch Learning

  • Probably Approximately Correct (PAC) learning
  • Positive and negative learnability results
  • Agnostic Learning
  • Shattering and the VC dimension

3

SLIDE 4

This section

  • 1. Analyze a simple algorithm for learning conjunctions
  • 2. Define the PAC model of learning
  • 3. Make formal connections to the principle of Occam’s razor

4


SLIDE 6

Learning Conjunctions

Training data

  – <(1,1,1,1,1,1,…,1,1), 1>
  – <(1,1,1,0,0,0,…,0,0), 0>
  – <(1,1,1,1,1,0,...0,1,1), 1>
  – <(1,0,1,1,1,0,...0,1,1), 0>
  – <(1,1,1,1,1,0,...0,0,1), 1>
  – <(1,0,1,0,0,0,...0,1,1), 0>
  – <(1,1,1,1,1,1,…,0,1), 1>
  – <(0,1,0,1,0,0,...0,1,1), 0>

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

6

The true function


SLIDE 9

Learning Conjunctions

Training data

  – <(1,1,1,1,1,1,…,1,1), 1>
  – <(1,1,1,0,0,0,…,0,0), 0>
  – <(1,1,1,1,1,0,...0,1,1), 1>
  – <(1,0,1,1,1,0,...0,1,1), 0>
  – <(1,1,1,1,1,0,...0,0,1), 1>
  – <(1,0,1,0,0,0,...0,1,1), 0>
  – <(1,1,1,1,1,1,…,0,1), 1>
  – <(0,1,0,1,0,0,...0,1,1), 0>

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

A simple learning algorithm (Elimination)

  • Discard all negative examples
  • Build a conjunction using the features that are common to all positive examples

9

Positive examples eliminate irrelevant features

SLIDE 10

Learning Conjunctions

Training data

  – <(1,1,1,1,1,1,…,1,1), 1>
  – <(1,1,1,0,0,0,…,0,0), 0>
  – <(1,1,1,1,1,0,...0,1,1), 1>
  – <(1,0,1,1,1,0,...0,1,1), 0>
  – <(1,1,1,1,1,0,...0,0,1), 1>
  – <(1,0,1,0,0,0,...0,1,1), 0>
  – <(1,1,1,1,1,1,…,0,1), 1>
  – <(0,1,0,1,0,0,...0,1,1), 0>

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

A simple learning algorithm:

  • Discard all negative examples
  • Build a conjunction using the features that are common to all positive examples

10

Clearly this algorithm produces a conjunction that is consistent with the data, that is, errS(h) = 0, if the target function is a monotone conjunction. (Exercise: Why?)
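
To make the procedure concrete, here is a minimal sketch of the Elimination algorithm in Python. It is illustrative only; the function names and data layout are ours, not from the slides.

```python
# Minimal sketch of the Elimination algorithm for monotone conjunctions.
def learn_monotone_conjunction(examples):
    """examples: list of (x, y) pairs, where x is a tuple of 0/1 features and y is 0 or 1.
    Returns the set of feature indices kept in the learned conjunction h."""
    positives = [x for x, y in examples if y == 1]      # discard all negative examples
    n = len(examples[0][0])
    # keep exactly the features that are 1 in every positive example
    return {i for i in range(n) if all(x[i] == 1 for x in positives)}

def predict(h, x):
    """h(x) = 1 iff every feature in the conjunction h is 1 in x."""
    return int(all(x[i] == 1 for i in h))

# tiny usage example (made-up data)
data = [((1, 1, 1), 1), ((1, 0, 1), 1), ((0, 1, 1), 0)]
h = learn_monotone_conjunction(data)                    # {0, 2}
print(predict(h, (1, 1, 1)), predict(h, (0, 0, 1)))     # 1 0
```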


SLIDE 12

Learning Conjunctions: Analysis

Claim 1: Any hypothesis consistent with the training data will only make mistakes on future positive examples

Why?

12

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

A mistake will occur only if some literal z (in our example, x1) is present in h but not in f

Such a literal can cause a positive example to be predicted as negative by h

The reverse situation can never happen

For an example to be labeled positive in the training set, every relevant literal (every literal of f) must have been present, so those literals are never eliminated: h contains every literal of f, and whenever h predicts positive, f is satisfied as well

Specifically, the mistakes are on examples with x1 = 0, x2 = 1, x3 = 1, x4 = 1, x5 = 1, x100 = 1
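
A quick illustration of the claim (our own toy check, not from the slides): h contains the extra literal x1, so it can only err by predicting a true positive as negative.

```python
# Toy check of Claim 1 (illustrative; the variable names are ours, not from the slides).
f_literals = {2, 3, 4, 5, 100}          # the true conjunction f
h_literals = {1, 2, 3, 4, 5, 100}       # the learned conjunction h (extra literal x1)

def predict(literals, x):
    return int(all(x.get(i, 0) == 1 for i in literals))

# A positive example with x1 = 0: f says 1, h says 0 -> a false negative.
x = {i: 1 for i in f_literals}
print(predict(f_literals, x), predict(h_literals, x))   # 1 0

# For h to output 1, every literal in h (hence every literal in f) must be 1,
# so f also outputs 1: h can never produce a false positive.
```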



SLIDE 16

Learning Conjunctions: Analysis

Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If m > (n/ε)(ln n + ln(1/δ)), then with probability > 1 - δ, the error of the learned hypothesis errD(h) will be less than ε.

16

If we see this many training examples, then the algorithm will produce a conjunction that, with high probability, will make few errors

Poly in n, 1/δ, 1/ε

SLIDE 17

Learning Conjunctions: Analysis

Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If m > (n/ε)(ln n + ln(1/δ)), then with probability > 1 - δ, the error of the learned hypothesis errD(h) will be less than ε.

17

Let’s prove this assertion

SLIDE 18

Proof Intuition

What kinds of examples would drive a hypothesis to make a mistake? Positive examples, where x1 is absent

f would say true and h would say false

None of these examples appeared during training

Otherwise x1 would have been eliminated

If they never appeared during training, maybe their appearance in the future would also be rare!

Let’s quantify our surprise at seeing such examples

18

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100



SLIDE 27

Learning Conjunctions: Analysis

Let p(z) be the probability that, in an example drawn from D, the feature z is absent but the example has a positive label

  • That is, after training is done, p(z) is the probability that, in a randomly drawn example, the literal z causes a mistake

  • For any z in the target function, p(z) = 0

27

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

<(0,1,1,1,1,0,...0,1,1), 1>

p(x1): Probability that this situation occurs

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

Remember that there will only be mistakes on positive examples for this toy problem
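
As a concrete (made-up) illustration of p(z): under a hypothetical distribution D in which each feature is independently 1 with probability 0.9, p(x1) can be estimated by sampling. The distribution and all names below are our own, not from the slides.

```python
import random

def estimate_p(z, sample, f_literals):
    """Monte Carlo estimate of p(z): the fraction of examples in which feature z is 0
    while the true conjunction over f_literals is satisfied (so the label is positive)."""
    hits = sum(1 for x in sample if x[z] == 0 and all(x[i] == 1 for i in f_literals))
    return hits / len(sample)

n = 100
random.seed(0)
# hypothetical D: each feature independently 1 with probability 0.9
sample = [tuple(int(random.random() < 0.9) for _ in range(n)) for _ in range(100_000)]
f_literals = [1, 2, 3, 4, 99]              # 0-based stand-ins for x2, x3, x4, x5, x100
print(estimate_p(0, sample, f_literals))   # roughly 0.1 * 0.9**5 = 0.059
```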

SLIDE 30

Learning Conjunctions: Analysis

Let p(z) be the probability that, in an example drawn from D, the feature z is absent but the example has a positive label

  • That is, after training is done, p(z) is the probability that, in a randomly drawn example, the literal z causes a mistake
  • For any z in the target function, p(z) = 0

We know that errD(h) ≤ Σ_{z ∈ h} p(z), via direct application of the union bound

30

Union bound: For a set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events
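
Spelling out that step (our notation; by Claim 1, every mistake of h is caused by some literal z in h that is absent from a positive example):

```latex
\mathrm{err}_D(h)
  = \Pr_{x \sim D}\bigl[h(x) \neq f(x)\bigr]
  = \Pr_{x \sim D}\bigl[\exists\, z \in h :\ z \text{ absent in } x,\ f(x) = 1\bigr]
  \le \sum_{z \in h} \Pr_{x \sim D}\bigl[z \text{ absent in } x,\ f(x) = 1\bigr]
  = \sum_{z \in h} p(z)
```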

SLIDE 31

Learning Conjunctions: Analysis

  • Call a literal z bad if p(z) ≥ ε/n
  • Intuitively, a bad literal is one that has a significant probability of not appearing with a positive example
    – (And if it survives training because it appears in all positive training examples, it can cause errors)

If there are no bad literals, then errD(h) < ε

  – Why? Because errD(h) ≤ Σ_{z ∈ h} p(z) < n · (ε/n) = ε

31

n = dimensionality

Let us try to see when this will not happen



SLIDE 37

Learning Conjunctions: Analysis

  • Call a literal z bad if p(z) ≥ ε/n
  • Intuitively, a bad literal is one that has a significant probability of not appearing with a positive example
    – (And if it survives training because it appears in all positive training examples, it can cause errors)

What if there are bad literals?

Let z be a bad literal. What is the probability that it will not be eliminated by one training example?

37

n = dimensionality

<(1,1,1,1,1,0,...0,1,1), 1>

There was one example of this kind

SLIDE 38

Learning Conjunctions: Analysis

What we know so far: a bad literal z is eliminated by a single random training example with probability p(z) ≥ ε/n, so the probability that one training example does not eliminate it is at most 1 - ε/n

But say we have m training examples. Then Pr[z survives all m examples] ≤ (1 - ε/n)^m

There are at most n bad literals. So Pr[some bad literal survives all m examples] ≤ n(1 - ε/n)^m

38

n = dimensionality


SLIDE 44

Learning Conjunctions: Analysis

We want this probability to be small. Why? So that we can choose enough training examples to make the probability that any bad literal z survives all of them less than some δ

That is, we want n(1 - ε/n)^m < δ

We know that 1 - x < e^(-x). So it is sufficient to require n · e^(-εm/n) < δ

Or equivalently, m > (n/ε)(ln n + ln(1/δ))

44
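
The algebra behind that last step, written out (our notation):

```latex
n\, e^{-\epsilon m / n} < \delta
\iff \ln n - \frac{\epsilon m}{n} < \ln \delta
\iff \frac{\epsilon m}{n} > \ln n + \ln \frac{1}{\delta}
\iff m > \frac{n}{\epsilon}\left(\ln n + \ln \frac{1}{\delta}\right)
```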

SLIDE 49

Learning Conjunctions: Analysis

To guarantee a probability of failure (i.e., error > ε) that is less than δ, the number of examples we need is m > (n/ε)(ln n + ln(1/δ)). That is, if m has this property, then

  • With probability 1 - δ, no bad literal will survive training (i.e., none will remain in h),
  • Or equivalently, with probability 1 - δ, we will have errD(h) < ε

How to use this:

  • If ε = 0.1 and δ = 0.1, then for n = 100, we need 6908 training examples
  • If ε = 0.1 and δ = 0.1, then for n = 10, we need only 461 examples
  • If ε = 0.1 and δ = 0.01, then for n = 10, we need 691 examples

49

What we have here is a PAC guarantee: our algorithm is Probably Approximately Correct.
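
A short sanity check of the bound and of the numbers above (our own snippet, not from the slides):

```python
import math

def sample_size(n, eps, delta):
    """Smallest integer m that is at least (n/eps) * (ln n + ln(1/delta))."""
    return math.ceil((n / eps) * (math.log(n) + math.log(1 / delta)))

print(sample_size(100, 0.1, 0.1))   # 6908
print(sample_size(10,  0.1, 0.1))   # 461
print(sample_size(10,  0.1, 0.01))  # 691

# With that many examples, the failure probability n * (1 - eps/n)**m is below delta.
n, eps, delta = 100, 0.1, 0.1
m = sample_size(n, eps, delta)
print(n * (1 - eps / n) ** m < delta)   # True
```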