

  1. Statistical Machine Learning
     Lecture 08: Regression
     Kristian Kersting, TU Darmstadt, Summer Term 2020
     Based on slides from J. Peters

  2. Today's Objectives
     Make you understand how to learn a continuous function.
     Covered topics:
     Linear regression and its interpretations
     What is overfitting?
     Deriving linear regression from maximum likelihood estimation
     Bayesian linear regression

  3. Outline
     1. Introduction to Linear Regression
     2. Maximum Likelihood Approach to Regression
     3. Bayesian Linear Regression
     4. Wrap-Up

  4. Outline (section divider)
     Next: 1. Introduction to Linear Regression

  5. Introduction to Linear Regression: Reminder
     Our task is to learn a mapping f from input to output: f : I → O, y = f(x; θ).
     Input: x ∈ I (images, text, sensor measurements, ...)
     Output: y ∈ O
     Parameters: θ ∈ Θ (what needs to be "learned")
     Regression: learn a mapping into a continuous space, e.g. O = R, O = R^3, ...

  6. Introduction to Linear Regression: Motivation
     You want to predict the torques of a robot arm:
     y = I q̈ − µ q̇ + m l g sin(q)
       = [q̈, q̇, sin(q)] [I, −µ, m l g]^⊺
       = φ(x)^⊺ θ
     Can we do this with a data set D = { (x_i, y_i) | i = 1, ..., n }?
     Yes, this is a linear regression problem!
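     A minimal sketch of the feature-map view in NumPy (not from the original slides): the layout of φ and θ follows the slide, but the function name phi and all numeric constants for inertia, friction, and the gravity term are hypothetical values chosen only for illustration.

       import numpy as np

       # Feature map from the slide: phi(x) = [q_ddot, q_dot, sin(q)],
       # so that the torque is linear in theta = [I, -mu, m*l*g].
       def phi(q, q_dot, q_ddot):
           return np.array([q_ddot, q_dot, np.sin(q)])

       theta = np.array([0.5, -0.1, 1.2])  # hypothetical values for [I, -mu, m*l*g]

       # Torque predicted by the linear-in-parameters model y = phi(x)^T theta
       q, q_dot, q_ddot = 0.3, 1.0, -0.2
       y = phi(q, q_dot, q_ddot) @ theta
       print(y)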

  7. Introduction to Linear Regression: Least Squares Linear Regression
     We are given pairs of training data points and associated function values (x_i, y_i):
     X = { x_1, ..., x_n } with x_i ∈ R^d, and Y = { y_1, ..., y_n } with y_i ∈ R.
     Note: here we only treat the case y_i ∈ R; in general y_i can have more than one dimension, i.e. y_i ∈ R^f for some positive f.
     Start with a linear regressor: x_i^⊺ w + w_0 = y_i for all i = 1, ..., n,
     i.e. one linear equation for each training data point/label pair.
     This is exactly the same basic setup as for least-squares classification, only the target values are continuous.

  8. Least Squares Linear Regression
     x_i^⊺ w + w_0 = y_i for all i = 1, ..., n
     Step 1: define the augmented vectors x̂_i = [x_i; 1] and ŵ = [w; w_0].

  9. Least Squares Linear Regression (continued)
     Step 2: rewrite each equation as x̂_i^⊺ ŵ = y_i for all i = 1, ..., n.

  10. Least Squares Linear Regression (continued)
      Step 3: matrix-vector notation, X̂^⊺ ŵ = y,
      where X̂ = [x̂_1, ..., x̂_n] (each x̂_i is a column vector) and y = [y_1, ..., y_n]^⊺.

  11. Least Squares Linear Regression
      Step 4: find the least squares solution
      ŵ = arg min_w ‖X̂^⊺ w − y‖^2
      Setting the gradient to zero, ∇_w ‖X̂^⊺ w − y‖^2 = 0, yields
      ŵ = (X̂ X̂^⊺)^{-1} X̂ y
      A closed form solution!
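     A small sketch of the closed-form solution in NumPy (not from the original slides): the data are synthetic, rows of X_hat are samples (so X_hat corresponds to X̂^⊺ on the slide), and np.linalg.solve is used instead of an explicit inverse as a standard numerical choice.

       import numpy as np

       # Synthetic data, made up for illustration: n points in d dimensions.
       rng = np.random.default_rng(0)
       n, d = 50, 3
       X = rng.normal(size=(n, d))
       w_true, w0_true = np.array([1.0, -2.0, 0.5]), 0.3
       y = X @ w_true + w0_true + 0.1 * rng.normal(size=n)

       # Steps 1-3: append a constant 1 to each input so w_0 is absorbed into w_hat.
       X_hat = np.hstack([X, np.ones((n, 1))])

       # Step 4: closed-form least squares, solving (X_hat^T X_hat) w_hat = X_hat^T y.
       w_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
       print(w_hat)  # close to [1.0, -2.0, 0.5, 0.3]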

  12. Least Squares Linear Regression
      ŵ = (X̂ X̂^⊺)^{-1} X̂ y
      Where is the costly part of this computation?

  13. Least Squares Linear Regression
      The costly part is the inverse: X̂ X̂^⊺ is a D × D matrix, where D is the input dimension.
      Naive inversion takes O(D^3), but better methods exist.
      What can we do if the input dimension D is too large?
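     One reading of "better methods exist" is to solve the linear system rather than invert explicitly, e.g. via a Cholesky factorization. The sketch below reuses X_hat and y from the earlier snippet; the choice of scipy.linalg and the variable names are assumptions for illustration, not from the lecture.

       import numpy as np
       from scipy.linalg import cho_factor, cho_solve

       # Solve (X_hat^T X_hat) w = X_hat^T y via Cholesky: still O(D^3), but with a
       # smaller constant than forming a general inverse, and numerically more stable.
       A = X_hat.T @ X_hat
       b = X_hat.T @ y
       w_chol = cho_solve(cho_factor(A), b)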

  14. Least Squares Linear Regression
      What can we do if the input dimension D is too large?
      Gradient descent
      Work with fewer dimensions
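     A minimal gradient-descent alternative to the closed form (not from the original slides), assuming X_hat and y are set up as in the earlier snippet; the learning rate and step count are arbitrary illustration values.

       import numpy as np

       # Gradient descent on the squared error ||X_hat w - y||^2; avoids building
       # or inverting a D x D matrix when D is large.
       def lstsq_gradient_descent(X_hat, y, lr=0.01, n_steps=5000):
           w = np.zeros(X_hat.shape[1])
           for _ in range(n_steps):
               grad = 2.0 * X_hat.T @ (X_hat @ w - y) / len(y)  # averaged gradient
               w -= lr * grad
           return w

       # w_gd = lstsq_gradient_descent(X_hat, y)  # approximately the closed-form solution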

  15. Introduction to Linear Regression: Mechanical Interpretation
      (figure on the original slide)

  16. Introduction to Linear Regression: Geometric Interpretation
      Predicted outputs are linear combinations of features!
      The samples are projected into this feature space.
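     The geometric picture can be checked numerically: the fitted values are the orthogonal projection of y onto the span of the feature columns. A small sketch reusing X_hat and y from the earlier snippet; the hat-matrix construction is a standard identity and is not spelled out on the slide.

       import numpy as np

       # Fitted values are a projection: y_fit = H @ y with the "hat" matrix
       # H = X_hat (X_hat^T X_hat)^{-1} X_hat^T (rows of X_hat are samples).
       H = X_hat @ np.linalg.solve(X_hat.T @ X_hat, X_hat.T)
       y_fit = H @ y

       # H is (numerically) idempotent, as a projection should be: H @ H ≈ H.
       print(np.allclose(H @ H, H))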

  17. Introduction to Linear Regression: Polynomial Regression
      How can we fit arbitrary polynomials using least-squares regression?
      We introduce a feature transformation as before:
      y(x) = w^⊺ φ(x) = Σ_{i=0}^{M} w_i φ_i(x), with φ_0(x) = 1.
      The φ_i(·) are called the basis functions.
      This is still a linear model in the parameters w.
      E.g., for fitting a cubic polynomial: φ(x) = [1, x, x^2, x^3]^⊺
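     A small sketch of polynomial least squares for scalar inputs (not from the original slides): the function names poly_features, fit_poly, and predict_poly are ad hoc, and np.linalg.lstsq is used as a numerically robust stand-in for the closed-form solution above.

       import numpy as np

       # Basis functions phi(x) = [1, x, x^2, ..., x^M] for scalar inputs x.
       def poly_features(x, degree):
           x = np.asarray(x, dtype=float)
           return np.stack([x**i for i in range(degree + 1)], axis=-1)

       # Least-squares fit of the weights w in y(x) = w^T phi(x).
       def fit_poly(x, y, degree):
           Phi = poly_features(x, degree)              # n x (M+1) design matrix
           w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
           return w

       def predict_poly(w, x, degree):
           return poly_features(x, degree) @ w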

  18. Polynomial Regression: fit with a polynomial of degree 0 (constant value)
      (plot of the fitted curve against the data on the original slide)

  19. Polynomial Regression: fit with a polynomial of degree 1 (line)
      (plot of the fitted curve against the data on the original slide)

  20. Polynomial Regression: fit with a polynomial of degree 3 (cubic)
      (plot of the fitted curve against the data on the original slide)

  21. Polynomial Regression: fit with a polynomial of degree 9
      Massive overfitting!
      (plot of the fitted curve against the data on the original slide)
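     The overfitting effect can be reproduced with the helpers sketched above. The data here are made up (noisy samples of the usual textbook curve sin(2πx)); the slides' actual data are not given, so treat the numbers as illustrative only.

       import numpy as np

       # Fit the same 10 noisy points with degree-3 and degree-9 polynomials and
       # compare the error against the underlying curve
       # (assumes fit_poly / predict_poly from the previous snippet).
       rng = np.random.default_rng(1)
       x_train = rng.uniform(0.0, 1.0, size=10)
       y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=10)

       x_test = np.linspace(0.0, 1.0, 200)
       for degree in (3, 9):
           w = fit_poly(x_train, y_train, degree)
           mse = np.mean((predict_poly(w, x_test, degree) - np.sin(2 * np.pi * x_test)) ** 2)
           print(degree, mse)  # the degree-9 fit usually generalizes far worse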

  22. Outline (section divider)
      Next: 2. Maximum Likelihood Approach to Regression

  23. Maximum Likelihood Approach to Regression: Overfitting
      Relatively little data leads to overfitting; enough data leads to a good estimate.
      (two fit plots on the original slide)

  24. Maximum Likelihood Approach to Regression: Probabilistic Regression
      Assumption 1: our target function values are generated by adding noise to the function estimate,
      y = f(x, w) + ε
      where y is the target function value, f the regression function, x the input value, w the weights or parameters, and ε the noise.

  25. Probabilistic Regression (continued)
      Assumption 2: the noise is a Gaussian-distributed random variable,
      ε ∼ N(0, β^{-1})
      so that p(y | x, w, β) = N(y | f(x, w), β^{-1}),
      where f(x, w) is the mean and β^{-1} is the variance (β is the precision).
      Note that y is now a random variable with underlying probability distribution p(y | x, w, β).
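     For the linear-in-features case f(x, w) = φ(x)^⊺ w, the model's log-likelihood can be written down directly; a minimal sketch follows (the function name and the design-matrix convention are assumptions, not from the slides). Maximizing it with respect to w amounts to minimizing the sum-of-squares error, which is how the maximum likelihood view recovers least squares.

       import numpy as np

       # Gaussian log-likelihood of the data under y = Phi @ w + eps, eps ~ N(0, 1/beta),
       # i.e. sum_i log N(y_i | phi(x_i)^T w, 1/beta), with rows of Phi being phi(x_i)^T.
       def log_likelihood(w, beta, Phi, y):
           residuals = y - Phi @ w
           n = len(y)
           return 0.5 * n * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * np.sum(residuals ** 2)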

  26. Maximum Likelihood Approach to Regression: Probabilistic Regression
      (figure on the original slide)
