SLIDE 1

Linear Regression

4/14/17

SLIDE 2

Hypothesis Space

Supervised learning

  • For every input in the data set, we know the output

Regression

  • Outputs are continuous
  • A number, not a category label

The learned model:

  • A linear function mapping input to output
  • A weight for each feature (including bias)
SLIDE 3

Linear Models

In two dimensions: $f(x) = wx + b$

In d dimensions:

$$\vec{x} \equiv \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix}, \qquad f(\vec{x}) = \begin{bmatrix} w_b \\ w_0 \\ \vdots \\ w_d \end{bmatrix} \cdot \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}$$

We want to find the linear model that fits our data best. When have we seen a model like this before?
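As a concrete sketch (Python/NumPy; the function name and example values are ours, not from the slides), the d-dimensional model can be evaluated by prepending a constant 1 to the input, so the bias weight $w_b$ is handled like any other weight:

```python
import numpy as np

def linear_model(w, x):
    """Evaluate f(x) = [w_b, w_0, ..., w_d] . [1, x_0, ..., x_d].

    w has one more entry than x: its leading entry is the bias weight w_b.
    """
    return np.dot(w, np.concatenate(([1.0], x)))

# Example: f(x) = 2 + 3*x_0 - 1*x_1
w = np.array([2.0, 3.0, -1.0])
x = np.array([1.0, 4.0])
print(linear_model(w, x))  # 2 + 3*1 - 1*4 = 1.0
```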

SLIDE 4

Linear Regression

We want to find the linear model that fits our data best. Key idea: model data as a linear model plus noise. Pick the weights to minimize the noise magnitude.

$$f(\vec{x}) = \begin{bmatrix} w_b \\ w_0 \\ \vdots \\ w_d \end{bmatrix} \cdot \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix} + \epsilon$$

SLIDE 5

Squared Error

The learned model $\hat{f}$ and the true model $f$:

$$\hat{f}(\vec{x}) = \begin{bmatrix} w_b \\ w_0 \\ \vdots \\ w_d \end{bmatrix} \cdot \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}, \qquad f(\vec{x}) = \begin{bmatrix} w_b \\ w_0 \\ \vdots \\ w_d \end{bmatrix} \cdot \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix} + \epsilon$$

Define the error for a data point to be the squared distance between the correct output and the predicted output:

$$\left( f(\vec{x}) - \hat{f}(\vec{x}) \right)^2 = \epsilon^2$$

The error for the model is the sum of the point errors:

$$\sum_{\vec{x} \in \text{data}} \left( y - \hat{f}(\vec{x}) \right)^2 = \sum_{\vec{x} \in \text{data}} \epsilon_{\vec{x}}^2$$
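As a minimal sketch of this error in code (Python/NumPy; the function names are ours), using the design-matrix convention from the next slide, where data points are columns and the first row is all 1s:

```python
import numpy as np

def predict(w, X):
    """Apply the linear model to every data point.

    X is (d+1) x n with data points as columns and a first row of 1s,
    so X.T @ w computes w . [1, x_0, ..., x_d] for each point.
    """
    return X.T @ w

def sum_squared_error(w, X, y):
    """Sum over the data of (y - f_hat(x))^2."""
    residuals = y - predict(w, X)
    return np.sum(residuals ** 2)
```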

SLIDE 6

Minimizing Squared Error

Goal: pick weights that minimize squared error.

Approach #1: gradient descent. Your reading showed how to do this for 1D inputs:
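The reading's 1D derivation isn't reproduced here, but a hedged sketch of gradient descent on the squared error for 1D inputs looks like this (the learning rate and step count are illustrative choices, not values from the slides):

```python
import numpy as np

def gradient_descent_1d(x, y, lr=0.01, steps=1000):
    """Fit f(x) = w*x + b by gradient descent on the mean squared error.

    For E = mean((y - (w*x + b))^2):
      dE/dw = -2 * mean((y - (w*x + b)) * x)
      dE/db = -2 * mean(y - (w*x + b))
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        residual = y - (w * x + b)
        w += lr * 2 * np.mean(residual * x)  # step opposite the gradient
        b += lr * 2 * np.mean(residual)
    return w, b

# Example: noisy samples of y = 3x + 2
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 3.0 * x + 2.0 + 0.1 * rng.standard_normal(50)
print(gradient_descent_1d(x, y))  # roughly (3, 2)
```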

SLIDE 7

Minimizing Squared Error

Goal: pick weights that minimize squared error.

Approach #2 (the right way): analytical solution.

  • The gradient is 0 at the error minimum.
  • There is generally a unique global minimum.

$$\vec{w} = \left( X^T X \right)^{-1} X^T \vec{y}$$

$$X \equiv \begin{bmatrix} \vec{x}_0 & \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix} \equiv \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{00} & x_{01} & \cdots & x_{0n} \\ x_{10} & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d0} & x_{d1} & \cdots & x_{dn} \end{bmatrix}$$
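In code, this is a few lines of NumPy (a hedged sketch; we solve the normal equations with `np.linalg.solve` rather than forming the explicit inverse, which is the numerically safer route):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Solve the normal equations for the weight vector.

    X is (d+1) x n with data points as columns and a first row of 1s
    (the slide's convention), so A = X.T has one row per data point.
    Solving (A^T A) w = A^T y is equivalent to w = (A^T A)^{-1} A^T y
    but avoids computing a matrix inverse.
    """
    A = X.T
    return np.linalg.solve(A.T @ A, A.T @ y)

# Example: two input features plus the bias.
rng = np.random.default_rng(1)
pts = rng.standard_normal((2, 100))    # one data point per column
X = np.vstack([np.ones(100), pts])     # prepend the row of 1s
y = 2.0 + 3.0 * pts[0] - 1.0 * pts[1]  # noiseless targets for clarity
print(fit_linear_regression(X, y))     # approximately [2, 3, -1]
```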

SLIDE 8

Change of Basis

Polynomial regression is just linear regression with a change of basis: map each input to an expanded feature vector, then perform linear regression on the new representation.

$$\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} \longrightarrow \begin{bmatrix} x_0 \\ (x_0)^2 \\ x_1 \\ (x_1)^2 \\ \vdots \\ x_d \\ (x_d)^2 \end{bmatrix} \qquad\qquad \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} \longrightarrow \begin{bmatrix} x_0 \\ (x_0)^2 \\ (x_0)^3 \\ x_1 \\ (x_1)^2 \\ (x_1)^3 \\ \vdots \\ x_d \\ (x_d)^2 \\ (x_d)^3 \end{bmatrix}$$

quadratic basis (left), cubic basis (right)
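A minimal sketch of such a basis expansion (the function name is ours); after mapping every data point, the linear-regression machinery from the previous slides is reused unchanged:

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map [x_0, ..., x_d] to [x_0, x_0^2, ..., x_0^degree, x_1, ...].

    degree=2 gives the quadratic basis, degree=3 the cubic basis.
    """
    return np.concatenate([[xi ** p for p in range(1, degree + 1)] for xi in x])

# Example: the cubic basis of a 2D input.
x = np.array([2.0, 5.0])
print(polynomial_basis(x, 3))  # [2. 4. 8. 5. 25. 125.]
```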

SLIDE 9

Change of Basis Demo

SLIDE 10

Locally Weighted Regression

Recall locally weighted averaging from KNN. We can apply the same idea here: points that are further away should contribute less to the estimate. To estimate the value for a specific test point $\vec{x}_t$, compute a linear regression with the error weighted by distance:

$$\sum_{\vec{x} \in \text{data}} \frac{\left( y - \hat{f}(\vec{x}) \right)^2}{\text{dist}(\vec{x}_t, \vec{x})} = \sum_{\vec{x} \in \text{data}} \frac{\epsilon_{\vec{x}}^2}{\|\vec{x}_t - \vec{x}\|^2}$$
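A hedged sketch of this locally weighted fit: scale each data point by the square root of its weight, which turns the problem into ordinary least squares (the small `eps` guarding against a zero distance at the test point is our addition, not from the slides):

```python
import numpy as np

def locally_weighted_prediction(X, y, x_t, eps=1e-8):
    """Predict at x_t by minimizing sum_i (y_i - w . a_i)^2 / ||x_t - x_i||^2.

    X is (d+1) x n with a first row of 1s, as in the earlier slides.
    Multiplying row i of A = X.T and y_i by sqrt(weight_i) reduces the
    weighted problem to an ordinary least-squares solve.
    """
    A = X.T
    a_t = np.concatenate(([1.0], x_t))
    dists = np.sum((X[1:] - x_t[:, None]) ** 2, axis=0)  # ||x_t - x_i||^2
    s = np.sqrt(1.0 / (dists + eps))
    w = np.linalg.lstsq(A * s[:, None], y * s, rcond=None)[0]
    return a_t @ w
```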

SLIDE 11

Exam Topics

Covers the machine learning portion of the class.

  • Supervised learning
      • Regression
      • Classification
  • Unsupervised learning
      • Clustering
      • Dimensionality reduction
  • Semi-supervised learning
  • Reinforcement learning

Know the differences between these topics. Know what algorithms apply to which problems.

SLIDE 12

Machine Learning Algorithms

  • neural networks
      • perceptrons
      • backpropagation
      • auto-encoders
      • deep learning
  • decision trees
  • naive Bayes
  • k-nearest neighbors
  • support vector machines
  • locally-weighted average
  • linear regression
  • EM
  • K-means
  • Gaussian mixtures
  • hierarchical clustering
      • agglomerative
      • divisive
  • principal component analysis
  • growing neural gas
  • Q-learning
  • approximate Q-learning
  • ensemble learning