Machine Learning
Supervised Learning: The Setup

Last lecture, we saw:
– What is learning?
– Learning as generalization
– The badges game

This lecture:
– More badges
– Formalizing supervised learning
– Instance space and features
Some slides based on lectures from Tom Dietterich, Dan Roth
(Full data on the class website, you can stare at it longer if you want)

Name            Label
Claire Cardie
Eric Baum
Haym Hirsh
Yoav Freund

The rule: if the last letter of the first name comes alphabetically before the last letter of the last name, then label = +; else label = -
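The rule above can be implemented directly. A minimal sketch; the labels it prints are just what the stated rule produces for these four names:

```python
# The rule from the slide: "+" if the last letter of the first name
# comes alphabetically before the last letter of the last name.
def badge_label(name):
    first, last = name.split()
    return "+" if first[-1].lower() < last[-1].lower() else "-"

for name in ["Claire Cardie", "Eric Baum", "Haym Hirsh", "Yoav Freund"]:
    print(name, badge_label(name))
```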
Running example: Automatically tag news articles

An instance: a news article that needs to be classified
A label: the tag assigned to the article

Instance Space: all possible news articles
Label Space: all possible labels
𝒴: Instance Space
The set of examples that need to be classified
E.g.: the set of all possible names, documents, sentences, images, emails, etc.

𝒵: Label Space
The set of all possible labels
E.g.: {Spam, Not-Spam}, {+, -}, etc.

The goal of learning: find the target function mapping 𝒴 to 𝒵
𝒴: Instance Space
The set of examples that need to be classified

𝒵: Label Space
The set of all possible labels

Labeled training data → Learning algorithm → A learned function 𝑓: 𝒴 → 𝒵

This is the training phase.
Can you think of other training protocols?
𝒴: Instance Space
The set of examples that need to be classified

𝒵: Label Space
The set of all possible labels

Evaluation: draw a test example 𝑦 ∈ 𝒴 and compare the learned prediction 𝑓(𝑦) to the target's 𝑔(𝑦). Are they different? How different?

Apply the model to many test examples and compare its predictions to the target's. Aggregate these results to get a quality measure.

Can we use these test examples during the training phase?
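The aggregation step is often just accuracy: the fraction of test examples on which the learned function agrees with the target. A sketch with hypothetical stand-ins for 𝑓 and 𝑔 (both functions here are invented for illustration):

```python
# Hypothetical target function g (normally hidden from the learner).
def g(y):
    return "+" if len(y) % 2 == 0 else "-"

# Hypothetical learned approximation f; it disagrees with g on one case.
def f(y):
    return "+" if len(y) % 2 == 0 and y != "Haym" else "-"

# Evaluation: apply both to many test examples and aggregate.
test_examples = ["Claire", "Eric", "Haym", "Yoav"]
correct = sum(1 for y in test_examples if f(y) == g(y))
accuracy = correct / len(test_examples)
print(accuracy)
```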
Inputs from the problem (e.g., news articles) are mapped to features.

For a training example (𝑦, 𝑔(𝑦)), the value of 𝑔(𝑦) is called its label.

The goal of learning: use the training examples to find a good approximation for 𝑔.

Questions?
Binary classification (the label space consists of two elements):
– Is an email spam or not?
– Given a user's movie preferences, will she like a new movie?
– Is a smartphone app malicious?
– Is a Twitter user a bot?
– Were these two documents written by the same person?
– Will the future value of a stock increase or decrease with respect to its current value?
What are the inputs to the problem? What are the features?
What is the prediction task?
What functions should the learning algorithm search over?
How do we learn from the labeled data?
What is success?
Learning is search over functions from the instance space 𝒴 to the label space 𝒵.

Designing an appropriate feature representation of the instance space is crucial.
Instances x ∈ 𝒴 are defined by features/attributes:
– Features could be Boolean
– Features could be real valued
– Features could be hand-crafted or themselves learned
A feature function maps an input to the problem (e.g., emails, names, images) to a feature vector.

Feature functions, also known as feature extractors:
– Their outputs are typically thought of as high-dimensional vectors
– An important part of the design of a learning-based solution

Feature vectors:
– Each dimension is one feature; we have d features in all
– Each x = [x_1, x_2, ⋯, x_d] is a point in the vector space with d dimensions
(The slide's worked example maps two example inputs to the feature values 5 and 3.)

What is the dimensionality of these feature vectors?
26 (one dimension per letter)

Vectors where exactly one dimension is 1 and all others are zero are called one-hot vectors.
This is the one-hot representation of the feature "The second letter of the name".
Features can be accumulated by concatenating the vectors
Something to think about: Why would we think that this is a bad feature?
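The one-hot representation and the concatenation idea above can be sketched in a few lines (the helper names are hypothetical, not from the slides):

```python
import string

# One-hot encoding: a 26-dimensional vector with a single 1 at the
# position of the given letter in the alphabet.
def one_hot_letter(letter):
    vec = [0] * 26
    vec[string.ascii_lowercase.index(letter.lower())] = 1
    return vec

# The feature "the second letter of the name" as a one-hot vector.
def one_hot_second_letter(name):
    return one_hot_letter(name[1])

# Features accumulated by concatenating their vectors: one-hot vectors
# for the first and second letters give a 52-dimensional vector.
def first_two_letters(name):
    return one_hot_letter(name[0]) + one_hot_second_letter(name)

print(one_hot_second_letter("Claire"))   # a single 1 at index 11 ('l')
print(len(first_two_letters("Claire")))  # 52
```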
What are the inputs to the problem? What are the features?
What is the learning task?
What functions should the learning algorithm search over?
How do we learn from the labeled data?
What is success?
Classification is the primary focus of this class
What are the inputs to the problem? What are the features?
What is the learning task?
What functions should the learning algorithm search over?
How do we learn from the labeled data?
What is success?
An unknown function f takes two inputs x1, x2 and outputs y = f(x1, x2). Can you learn this function? What is it?
(Assume that 1 stands for True and 0 stands for False.)

Now an unknown function f takes four inputs x1, x2, x3, x4 and outputs y = f(x1, x2, x3, x4). Can you learn this function? What is it?
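Why a handful of examples cannot pin down an arbitrary Boolean function: over n inputs there are 2^(2^n) Boolean functions, and each observed (distinct) input/output row only fixes one cell of the truth table. A small counting sketch (the choice of 7 observed rows is illustrative, not from the slides):

```python
# Number of distinct Boolean functions over n inputs: one output bit
# per truth-table row, and 2**n rows, so 2**(2**n) functions.
def num_boolean_functions(n):
    return 2 ** (2 ** n)

# Functions still consistent after observing some distinct labeled rows:
# each observed row fixes one truth-table cell, leaving the rest free.
def num_consistent(n, rows_observed):
    return 2 ** (2 ** n - rows_observed)

print(num_boolean_functions(2))   # 16
print(num_boolean_functions(4))   # 65536
print(num_consistent(4, 7))       # 512 functions remain consistent
```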
– We were looking at the space of all Boolean functions
– Instead, choose a hypothesis space that is not all possible functions (e.g., conjunctions without negations)
– How? Using some prior knowledge (or by guessing)
– But we need a hypothesis space that is flexible enough

(The "When in doubt, make an assumption" school of thought!)
Simple conjunctive rules: there are only 16 simple conjunctive rules over four variables.

Exercise: How many simple conjunctions are possible when there are n inputs instead of 4?

Is there a consistent hypothesis in this space?

No simple conjunction explains the data! (Confirm each counterexample by going through the list.) Our hypothesis space is too small, and the true function we were looking for is not in it.
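The count of 16 (and the answer to the exercise) can be checked by enumeration. A sketch, assuming the count includes the empty conjunction (the always-true rule), which makes the total 2^n:

```python
from itertools import combinations

# Enumerate simple conjunctions (no negations) over n variables:
# one rule per subset of variables, including the empty conjunction.
def simple_conjunctions(n):
    rules = []
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            rules.append(subset)
    return rules

print(len(simple_conjunctions(4)))   # 16, i.e. 2**4
```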
m-of-n rules: rules that are true when at least m of a chosen set of n variables are true.

Is there a consistent hypothesis in this space? Exercise: check if there is one. First, how many m-of-n rules are there for four variables?
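The counting question can be answered by enumeration. A sketch that counts (variable set, threshold) pairs; it treats rules as distinct even when two pairs compute the same function:

```python
from itertools import combinations

# Enumerate m-of-n rules over num_vars variables: pick a nonempty set
# of k variables, then a threshold m between 1 and k.
def m_of_n_rules(num_vars):
    rules = []
    for k in range(1, num_vars + 1):
        for subset in combinations(range(num_vars), k):
            for m in range(1, k + 1):
                rules.append((subset, m))
    return rules

print(len(m_of_n_rules(4)))   # 4*1 + 6*2 + 4*3 + 1*4 = 32
```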
The true target function is unknown to the learner; we may call it the oracle.
Hypothesis spaces can take many forms: linear functions, grammars, multi-layer deep networks, etc.
What are the inputs to the problem? What are the features?
What is the learning task?
What functions should the learning algorithm search over?
How do we learn from the labeled data?
What is success?
Much of the rest of this class addresses these questions.