Machine Learning
Computational Learning Theory: The Theory of Generalization

Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others
The supervised learning setup:
– A learning algorithm takes labeled data and produces a hypothesis/model h
– Given a new example, h makes a prediction
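The two-stage pipeline above can be sketched in a few lines of Python. The threshold learner here is a hypothetical stand-in for any learning algorithm; the point is only the shape of the interface: labeled data in, hypothesis h out, then h maps a new example to a prediction.

```python
# Minimal sketch of the supervised learning pipeline:
# learn() maps labeled data to a hypothesis h,
# and h maps a new example to a prediction.

def learn(labeled_data):
    """Return a hypothesis h: a threshold halfway between the means
    of the positive and negative 1-D training examples."""
    pos = [x for x, y in labeled_data if y == 1]
    neg = [x for x, y in labeled_data if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0

labeled_data = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
h = learn(labeled_data)     # hypothesis/model h
prediction = h(0.8)         # prediction on a new example
```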
Notation: examples are written as <example, label> pairs
How good is our learning algorithm?
Hypothesis space H and target function f
– Eg: all n-conjunctions; all n-dimensional linear functions, …
– H is the set of functions that the learning algorithm explores
Can we find:
– A hypothesis h ∈ H such that h(x) = f(x) for all x ∈ S (the training set)?
– A hypothesis h ∈ H such that h(x) = f(x) for all x ∈ X (the entire instance space)?
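The first question, finding an h ∈ H consistent with the training set S, can be made concrete. The following sketch assumes a tiny hypothetical hypothesis space: all monotone conjunctions over three boolean features.

```python
# Sketch of "find h in H consistent with the training set S".
# H is the space of monotone conjunctions over n = 3 boolean features.
from itertools import combinations

def conjunction(indices):
    """Hypothesis: predict 1 iff all features at `indices` are 1."""
    return lambda x: int(all(x[i] for i in indices))

n = 3
# H: every monotone conjunction over n features (including the empty one)
H = [conjunction(c) for r in range(n + 1)
     for c in combinations(range(n), r)]

# Training set S: pairs (x, f(x)) for the target f(x) = x[0] AND x[2]
S = [((1, 0, 1), 1), ((1, 1, 0), 0), ((0, 1, 1), 0), ((1, 1, 1), 1)]

consistent = [h for h in H if all(h(x) == y for x, y in S)]
# Note: consistency on S does not by itself guarantee h(x) = f(x)
# on all of X -- that is exactly the generalization question.
```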
Consider a two dimensional instance space.
Not all points in the space are equally likely to exist as instances. For example, not every sequence of words is an email, and not every sequence of letters is a name.
That is, there is some probability that a point in the space of instances is an instance: a probability distribution over the instance space, which we can also picture as a contour plot.
We assume that any finite set of examples is drawn i.i.d. from this distribution.
We may not know what the distribution is, but we assume one exists and is fixed.
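The i.i.d. assumption can be illustrated with a sketch: below, the fixed distribution D is a hypothetical mixture of two Gaussian blobs in a 2-D instance space, and a finite training sample is just m independent draws from it.

```python
# Sketch: a finite sample drawn i.i.d. from a fixed distribution D
# over a 2-D instance space. D here is a hypothetical mixture of
# two Gaussian blobs; the learner never sees D itself, only draws.
import random

random.seed(0)

def draw_instance():
    """One draw from D: pick a blob center, then sample a point near it."""
    cx, cy = random.choice([(0.0, 0.0), (3.0, 3.0)])
    return (random.gauss(cx, 0.5), random.gauss(cy, 0.5))

m = 5
sample = [draw_instance() for _ in range(m)]   # m i.i.d. draws from D
```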
Instance space X
Target concept f labels all the points in some region as +ve
A hypothesis h labels all the points in a (possibly different) region as +ve
Error: the region where f and h disagree
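This picture gives the definition of the true error: the probability, under the distribution D, of landing in the region where f and h disagree. A sketch, with a hypothetical f (a disc), h (a square), and D (uniform on a box), estimating that probability by sampling:

```python
# Sketch: error(h) = Pr_{x ~ D}[ f(x) != h(x) ], estimated by sampling.
# f, h, and D below are hypothetical stand-ins for illustration.
import random

random.seed(0)

def f(x):     # target concept: positive inside the unit disc
    return x[0] ** 2 + x[1] ** 2 <= 1.0

def h(x):     # hypothesis: positive inside an axis-aligned unit square
    return abs(x[0]) <= 1.0 and abs(x[1]) <= 1.0

def draw():   # D: uniform over the box [-2, 2] x [-2, 2]
    return (random.uniform(-2, 2), random.uniform(-2, 2))

m = 100_000
error = sum(f(x) != h(x) for x in (draw() for _ in range(m))) / m
# The disagreement region is the square minus the disc, so the exact
# error is (4 - pi) / 16, roughly 0.054.
```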
Online learning protocol:
– The learner sees a single example and makes a prediction
– If it makes a mistake, it updates its hypothesis
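The mistake-driven protocol above can be sketched as a loop. The update rule here is the perceptron rule, used only as one concrete example of a learner that changes its hypothesis exactly when it errs.

```python
# Sketch of the online, mistake-driven protocol: see one example,
# predict, and update the hypothesis only when the prediction is wrong.
# The perceptron update is one concrete instance of such a learner.

def sign(z):
    return 1 if z >= 0 else -1

def perceptron(stream):
    """Process a stream of (x, y) pairs with y in {-1, +1}."""
    w = [0.0, 0.0]          # current hypothesis: a linear separator
    mistakes = 0
    for x, y in stream:
        pred = sign(w[0] * x[0] + w[1] * x[1])      # predict
        if pred != y:                                # mistake?
            w = [w[0] + y * x[0], w[1] + y * x[1]]   # update hypothesis
            mistakes += 1
    return w, mistakes

stream = [((1.0, 1.0), 1), ((-1.0, -1.0), -1), ((1.0, 0.5), 1)]
w, mistakes = perceptron(stream)
```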