 
Machine Learning 2007: Lecture 3
Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
Website: www.cwi.nl/~erven/teaching/0708/ml/
September 20, 2007

Overview
● Organisational Matters
● Hypothesis Spaces
● Method: Least Squares Linear Regression
● Being Informal about Feature Vectors
● Method: LIST-THEN-ELIMINATE for Concept Learning
  ✦ A Biased Hypothesis Space
  ✦ An Unbiased Hypothesis Space?

Organisational Matters
Course Organisation:
● Intermediate exam: October 25, 11.00 – 13.00 in 04A05.
● Biweekly exercises
This Lecture versus Mitchell:
● All of it is in the book (Chapters 1 and 2), except for “Being Informal About Feature Vectors”.
● The presentation is different though: We recognise methods from Mitchell as methods to deal with regression and classification.

Reminder of Machine Learning Categories
Prediction: Given data $D = y_1, \ldots, y_n$, predict how the sequence continues with $y_{n+1}$.
Regression: Given data $D = \begin{pmatrix} y_1 \\ \mathbf{x}_1 \end{pmatrix}, \ldots, \begin{pmatrix} y_n \\ \mathbf{x}_n \end{pmatrix}$, learn to predict the value of the label $y$ for any new feature vector $\mathbf{x}$. Typically $y$ can take infinitely many values. Acceptable if your prediction is close to the correct $y$.
Classification: Given data $D = \begin{pmatrix} y_1 \\ \mathbf{x}_1 \end{pmatrix}, \ldots, \begin{pmatrix} y_n \\ \mathbf{x}_n \end{pmatrix}$, learn to predict the class label $y$ for any new feature vector $\mathbf{x}$. Only finitely many categories. Your prediction is either correct or wrong.

Hypotheses and Hypothesis Spaces
Definition of a Hypothesis:
A hypothesis $h$ is a candidate description of the regularity or patterns in your data.
● Prediction example: $y_{n+1} = h(y_1, \ldots, y_n) = y_{n-1} + y_n$
● Regression example: $y = h(\mathbf{x}) = 5 x_1$
● Classification example: $y = h(\mathbf{x}) = \begin{cases} +1 & \text{if } 3 x_1 - 20 > 0; \\ -1 & \text{otherwise.} \end{cases}$
Definition of a Hypothesis Space:
A hypothesis space $\mathcal{H}$ is the set $\{h\}$ of hypotheses that are being considered.
● Regression example: $\{ h_a(\mathbf{x}) = a \cdot x_1 \mid a \in \mathbb{R} \}$

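The three example hypotheses and the one-parameter hypothesis space can be written as plain functions. This is only an illustrative sketch: the function names are hypothetical, and a feature vector is assumed to be a Python sequence whose first entry `x[0]` holds $x_1$.

```python
def h_prediction(ys):
    """Prediction example: the next value is the sum of the last two values."""
    return ys[-2] + ys[-1]


def h_regression(x):
    """Regression example: y = 5 * x_1 (x[0] holds x_1)."""
    return 5 * x[0]


def h_classification(x):
    """Classification example: +1 if 3 * x_1 - 20 > 0, otherwise -1."""
    return 1 if 3 * x[0] - 20 > 0 else -1


def make_h(a):
    """Return the member h_a of the hypothesis space {h_a(x) = a * x_1 | a in R}."""
    def h_a(x):
        return a * x[0]
    return h_a
```

Each real number `a` passed to `make_h` picks out one hypothesis from the space; `make_h(5)` is the regression example above.
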
Linear Regression
Linear Regression:
In linear regression the goal is to select a linear hypothesis that best captures the regularity in the data.
[Figure: plot of y against x; x axis from −10 to 15, y axis from −20 to 100]

Hypothesis Space of Linear Hypotheses
Linear Function:
$y = h_{\mathbf{w}}(\mathbf{x}) = w_0 + w_1 x_1 + \ldots + w_d x_d$
● $\mathbf{x} = (x_1, \ldots, x_d)^\top$ is a $d$-dimensional feature vector.
● $\mathbf{w} = (w_0, w_1, \ldots, w_d)^\top$ are called the weights.
Examples:
$h_{\mathbf{w}}(\mathbf{x}) = 2 + 9 x_1$    ($w_0 = 2, w_1 = 9$)
$h_{\mathbf{w}}(\mathbf{x}) = 3 + 16 x_1 - 2 x_3$    ($w_0 = 3, w_1 = 16, w_2 = 0, w_3 = -2$)
Hypothesis Space of All Linear Hypotheses:
$\mathcal{H} = \{ h_{\mathbf{w}} \mid \mathbf{w} \in \mathbb{R}^{d+1} \}$.

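A minimal sketch of how $h_{\mathbf{w}}(\mathbf{x})$ could be evaluated, assuming the weights and the feature vector are given as Python sequences with `w[0]` holding the intercept $w_0$. The function name `h` is chosen here for illustration only.

```python
def h(w, x):
    """Evaluate h_w(x) = w_0 + w_1 * x_1 + ... + w_d * x_d."""
    if len(w) != len(x) + 1:
        raise ValueError("w needs one more entry than x (the intercept w_0)")
    # Pair w_1..w_d with x_1..x_d and add the intercept w_0.
    return w[0] + sum(w_j * x_j for w_j, x_j in zip(w[1:], x))
```

For the second example on the slide, `h([3, 16, 0, -2], x)` ignores the second feature of `x` because $w_2 = 0$.
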
Example: A Linear Function with Noise
[Figure: plot of the data, y against x; x axis from −10 to 15, y axis from −20 to 100]
Data generated by a linear function $y = 6x + 20 + \epsilon$, where $\epsilon$ is noise with distribution $\mathcal{N}(0, 10)$.
Can we recover this function from the data alone?

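Data of this kind could be simulated roughly as follows. Several things here are assumptions rather than facts from the slide: the x values are drawn uniformly from the plotted range [−10, 15], the sample size `n` is left to the caller, and the noise level is passed explicitly as a standard deviation because the slide does not say whether the 10 in $\mathcal{N}(0, 10)$ is a variance or a standard deviation.

```python
import random


def generate_data(n, noise_std, seed=0):
    """Return n pairs (x, y) with y = 6x + 20 + eps, eps ~ Normal(0, noise_std**2).

    Assumption: x is uniform on [-10, 15], the range shown on the plot axis.
    """
    rng = random.Random(seed)  # private generator, so results are reproducible
    data = []
    for _ in range(n):
        x = rng.uniform(-10.0, 15.0)
        eps = rng.gauss(0.0, noise_std)  # second argument is a standard deviation
        data.append((x, 6.0 * x + 20.0 + eps))
    return data
```

Use `noise_std=10 ** 0.5` if the 10 is read as a variance, or `noise_std=10` if it is read as a standard deviation.
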
Determining Weights from the Data
Squared Error:
For given $\mathbf{w}$, we may evaluate the squared error of $h_{\mathbf{w}}$ on a single data-item $\begin{pmatrix} y_i \\ \mathbf{x}_i \end{pmatrix}$:
$\text{Squared Error} = (y_i - h_{\mathbf{w}}(\mathbf{x}_i))^2$
Least Squares Linear Regression:
Given data $D = \begin{pmatrix} y_1 \\ \mathbf{x}_1 \end{pmatrix}, \ldots, \begin{pmatrix} y_n \\ \mathbf{x}_n \end{pmatrix}$, select $\mathbf{w}$ to minimize the sum of squared errors $SSE(D)$ on all data:
$\min_{\mathbf{w}} SSE(D) = \min_{\mathbf{w}} \sum_{i=1}^{n} (y_i - h_{\mathbf{w}}(\mathbf{x}_i))^2$.

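For a single feature ($d = 1$) the minimizing weights have a well-known closed form, obtained by setting the derivatives of the SSE with respect to $w_0$ and $w_1$ to zero. The slide does not say how the minimum is computed, so the sketch below is one standard way to do it, not necessarily the method used in the course. Data items are assumed to be `(x, y)` pairs, and the function names are illustrative.

```python
def sse(w0, w1, data):
    """Sum of squared errors of h_w(x) = w0 + w1 * x over all (x, y) pairs."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in data)


def fit_least_squares(data):
    """Return (w0, w1) minimizing sse(w0, w1, data) for one-dimensional x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Spread of x around its mean, and co-variation of x with y.
    s_xx = sum((x - mean_x) ** 2 for x, _ in data)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is not determined")
    w1 = s_xy / s_xx
    w0 = mean_y - w1 * mean_x
    return w0, w1
```

On noise-free data from $y = 6x + 20$ this recovers the weights up to rounding; on noisy data the fitted line will generally differ somewhat from the generating function.
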
Linear Regression Example
The previous example again:
[Figure: plot of the data, y against x, with the original function and the least squares fit; x axis from −10 to 15, y axis from −20 to 100]
Original Function: $y = 6x + 20 + \epsilon$
Least Squares: $y = 6.38x + 17.37$

Inductive Bias
Least Squares Linear Regression:
● Only looks for linear patterns in the data.
  ✦ For example, it cannot discover $y = x_1^2$ even if it gets an infinite amount of data.
● Minimizes the sum of squared errors.
  ✦ Why not something else, like for example the sum of absolute errors?
    $\min_{\mathbf{w}} \sum_{i=1}^{n} | y_i - h_{\mathbf{w}}(\mathbf{x}_i) |$

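That the choice of error measure is itself a bias can be seen on a small made-up data set (hypothetical, not from the lecture) with one outlier: the two criteria rank the same pair of candidate hypotheses in opposite order. Function and variable names are illustrative.

```python
def h(w, x):
    """Linear hypothesis with one feature: h_w(x) = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def sum_squared_errors(w, data):
    return sum((y - h(w, x)) ** 2 for x, y in data)


def sum_absolute_errors(w, data):
    return sum(abs(y - h(w, x)) for x, y in data)


# Toy data: four points exactly on y = x, plus one outlier at x = 4.
data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 14)]
w_a = (0, 1)  # candidate y = x
w_b = (0, 2)  # candidate y = 2x

print(sum_squared_errors(w_a, data), sum_squared_errors(w_b, data))    # 100 50
print(sum_absolute_errors(w_a, data), sum_absolute_errors(w_b, data))  # 10 12
```

Squared error prefers $y = 2x$ because it penalises the single large residual heavily, while absolute error prefers $y = x$, which fits four of the five points exactly.
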
EnjoySport Representation 1
Numbering Attribute Values:
Attribute   Sky                       AirTemp        EnjoySport
Value       Sunny   Cloudy   Rainy    Warm   Cold    No   Yes
Encoding    1       2        3        1      2       1    2
Example:
Sky, AirTemp    EnjoySport   Representation
Sunny, Warm     Yes          $\mathbf{x} = (1, 1)^\top$, $y = 2$
Rainy, Cold     No           $\mathbf{x} = (3, 2)^\top$, $y = 1$
Sunny, Cold     Yes          $\mathbf{x} = (1, 2)^\top$, $y = 2$
● The difference between feature vectors has no clear meaning. For example $\begin{pmatrix} 3 \\ 2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$.

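The numbering scheme can be sketched as lookup tables. The dictionary and function names are illustrative; the codes are the ones from the table.

```python
SKY = {"Sunny": 1, "Cloudy": 2, "Rainy": 3}
AIR_TEMP = {"Warm": 1, "Cold": 2}
ENJOY_SPORT = {"No": 1, "Yes": 2}


def encode(sky, air_temp, enjoy_sport):
    """Return (x, y): the feature vector as a tuple, and the numeric label."""
    x = (SKY[sky], AIR_TEMP[air_temp])
    y = ENJOY_SPORT[enjoy_sport]
    return x, y


def difference(x_a, x_b):
    """Component-wise difference of two encoded feature vectors."""
    return tuple(a - b for a, b in zip(x_a, x_b))
```

`difference((3, 2), (1, 1))` gives `(2, 1)`, which happens to be the code for (Cloudy, Warm). Nothing about "Rainy, Cold minus Sunny, Warm" suggests that answer, which is the sense in which the arithmetic has no clear meaning.
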
EnjoySport Representation 2
Another Way to Do It:
Attribute   Sky                                                     AirTemp                          EnjoySport
Value       Sunny             Cloudy            Rainy               Warm            Cold             No   Yes
Encoding    $(1, 0, 0)^\top$  $(0, 1, 0)^\top$  $(0, 0, 1)^\top$    $(1, 0)^\top$   $(0, 1)^\top$    1    2

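A minimal sketch of this second encoding, in which each attribute value becomes a vector with a single 1 in its own position. The slide does not say how the per-attribute vectors are combined into one feature vector; concatenating them, as done below, is an assumption, and all names are illustrative.

```python
SKY_VALUES = ["Sunny", "Cloudy", "Rainy"]
AIR_TEMP_VALUES = ["Warm", "Cold"]


def one_hot(value, values):
    """Return a list with a 1 at the position of `value` in `values`, 0 elsewhere."""
    if value not in values:
        raise ValueError(f"unknown attribute value: {value!r}")
    return [1 if v == value else 0 for v in values]


def encode_features(sky, air_temp):
    """Assumed combination: concatenate the Sky and AirTemp indicator vectors."""
    return one_hot(sky, SKY_VALUES) + one_hot(air_temp, AIR_TEMP_VALUES)
```

With this encoding no attribute value is numerically "larger" than another, unlike the 1, 2, 3 numbering of Representation 1.
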