Lecture 20: Regression (Dr. Chengjiang Long, Computer Vision)


SLIDE 1

Lecture 20: Regression

  • Dr. Chengjiang Long

Computer Vision Researcher at Kitware Inc. Adjunct Professor at RPI. Email: longc3@rpi.edu

SLIDE 2

Recap Previous Lecture

SLIDE 3

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 4

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 5

Regression Overview

  • Logistic Regression
  • Neural Networks

Hierarchical clustering, Gaussian Mixture Model

SLIDE 6

One Example


SLIDE 7

Evaluation Metrics

  • Root mean-square error (RMSE)
  • RMSE is a popular measure of the error of a regression model. However, it can only be compared between models whose errors are measured in the same units.

  • Mean absolute error (MAE)
  • MAE has the same unit as the original data, and it can only be compared between models whose errors are measured in the same units. It is usually similar in magnitude to RMSE, but slightly smaller.
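
The slide's formulas did not survive extraction; the standard definitions, assuming n samples with targets y_i and predictions \hat{y}_i, are:

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - \hat{y}_i\right|
\]
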
SLIDE 8

Evaluation Metrics

  • Relative Squared Error (RSE)
  • Unlike RMSE, the relative squared error (RSE) can be compared between models whose errors are measured in different units.

  • Relative Absolute Error (RAE)
  • Like RSE, the relative absolute error (RAE) can be compared between models whose errors are measured in different units.
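
Assuming the standard definitions, where \bar{y} denotes the mean value of the targets:

\[
\mathrm{RSE} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|\,y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|\,y_i - \bar{y}\right|}
\]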

SLIDE 9

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 10

Linear Regression

  • Given data with n-dimensional input variables and one real-valued target variable.
  • The objective: find a function f that returns the best fit.
  • Assume that the relationship between X and y is approximately linear. The model can be represented as follows (w represents the coefficients and b is the intercept).
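
The slide's notation was lost in extraction; a standard statement of the setup, assuming m training pairs, is:

\[
\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad \mathbf{x}_i \in \mathbb{R}^{n},\; y_i \in \mathbb{R},
\qquad
f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b \approx y
\]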

SLIDE 11

Linear Regression

  • To find the best fit, we minimize the sum of squared errors -> least-squares estimation.
  • The solution can be found in closed form by taking the derivative of the objective function w.r.t. w and setting it to zero.
  • In MATLAB, the back-slash operator computes a least-squares solution.
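
The equations on this slide did not survive extraction; with a design matrix X (one row per sample, a trailing column of ones absorbing the intercept b) and target vector y, the least-squares objective and its closed-form solution are:

\[
\min_{\mathbf{w}}\;\lVert X\mathbf{w} - \mathbf{y}\rVert_2^2
\quad\Longrightarrow\quad
X^{\top}X\,\mathbf{w} = X^{\top}\mathbf{y},
\qquad
\mathbf{w}^{*} = \left(X^{\top}X\right)^{-1}X^{\top}\mathbf{y}
\]

In MATLAB this is w = X \ y; the NumPy analogue is numpy.linalg.lstsq(X, y).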

SLIDE 12

Linear Regression

To avoid over-fitting, a regularization term can be introduced (penalizing the magnitude of w).
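
A common choice is an L2 (ridge) penalty with weight \lambda; this is a standard form and may differ in detail from the slide:

\[
\min_{\mathbf{w}}\;\lVert X\mathbf{w} - \mathbf{y}\rVert_2^2 + \lambda\lVert\mathbf{w}\rVert_2^2
\quad\Longrightarrow\quad
\mathbf{w}^{*} = \left(X^{\top}X + \lambda I\right)^{-1}X^{\top}\mathbf{y}
\]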

SLIDE 13

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 14

Support Vector Regression

  • Find a function, f(x), with at most ε-deviation from the target y
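
The following slides present the formulation graphically; the standard soft-margin ε-SVR primal (as in the Smola and Schölkopf tutorial cited on slide 23) is:

\[
\min_{\mathbf{w},\, b,\, \xi,\, \xi^{*}}\;\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right)
\]
\[
\text{s.t.}\quad
y_i - \mathbf{w}^{\top}\mathbf{x}_i - b \le \varepsilon + \xi_i,
\qquad
\mathbf{w}^{\top}\mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^{*},
\qquad
\xi_i,\,\xi_i^{*} \ge 0
\]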

SLIDE 15

Support Vector Regression

SLIDE 16

Soft margin

SLIDE 17

How about a non-linear case?

SLIDE 18

Linear versus Non-linear SVR

SLIDE 19

Dual problem

SLIDE 20

Kernel trick

SLIDE 21

Dual problem for non-linear case

SLIDE 22

Architecture of a regression machine

[Alex J. Smola et al. A tutorial on support vector regression, 2004.] URL: https://alex.smola.org/papers/2004/SmoSch04.pdf

SLIDE 23

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 24

Logistic Regression

  • Takes a probabilistic approach to learning discriminative functions (i.e., a classifier)
  • Logistic regression model:
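
The model equation was lost in extraction; the standard form, with parameters θ = (w, b) and the sigmoid g, is:

\[
h_{\theta}(\mathbf{x}) = g\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}},
\qquad
h_{\theta}(\mathbf{x}) = P(y = 1 \mid \mathbf{x};\, \theta)
\]
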
SLIDE 25

Interpretation of Hypothesis Output

SLIDE 26

Another Interpretation

  • Equivalently, logistic regression assumes that
  • In other words, logistic regression assumes that the log odds is a linear function of x
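
Written out (the slide's equation did not survive extraction):

\[
\log\frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \mathbf{w}^{\top}\mathbf{x} + b
\]
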
SLIDE 27

Logistic Regression

  • Assume a threshold and...
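
The elided rule is presumably the usual 0.5 threshold on the predicted probability:

\[
\hat{y} = 1 \;\text{ if }\; h_{\theta}(\mathbf{x}) \ge 0.5 \;\Longleftrightarrow\; \mathbf{w}^{\top}\mathbf{x} + b \ge 0,
\qquad
\hat{y} = 0 \;\text{ otherwise}
\]
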
SLIDE 28

Non-Linear Decision Boundary

  • Can apply basis function expansion to the features, the same as with linear regression

SLIDE 29

Logistic Regression

SLIDE 30

Logistic Regression Objective Function

  • Can’t just use squared loss as in linear regression

– Using the logistic regression model results in a non-convex optimization
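
The convex alternative, which the following maximum-likelihood slides derive, is the negative log-likelihood (cross-entropy) cost over m training pairs with labels y_i ∈ {0, 1}:

\[
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[\, y_i \log h_{\theta}(\mathbf{x}_i) + (1 - y_i)\log\bigl(1 - h_{\theta}(\mathbf{x}_i)\bigr) \Bigr]
\]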

SLIDE 31

Deriving the Cost Function via Maximum Likelihood Estimation

SLIDE 32

Deriving the Cost Function via Maximum Likelihood Estimation

SLIDE 33

Intuition Behind the Objective

SLIDE 34

Intuition Behind the Objective

SLIDE 35

Intuition Behind the Objective

SLIDE 36

Regularized Logistic Regression

  • We can regularize logistic regression exactly as before
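
A minimal sketch, assuming the same L2 penalty used for linear regression (the intercept is conventionally left unregularized):

\[
J_{\mathrm{reg}}(\theta) = J(\theta) + \frac{\lambda}{2m}\lVert\mathbf{w}\rVert_2^2
\]
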
SLIDE 37

Gradient Descent for Logistic Regression
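
The update equations on these slides were lost in extraction; the standard gradient of the cross-entropy cost and the resulting descent step with learning rate α are:

\[
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_{\theta}(\mathbf{x}_i) - y_i\bigr)\, x_{ij},
\qquad
\theta_j \leftarrow \theta_j - \alpha\,\frac{\partial J(\theta)}{\partial \theta_j}
\]

(with x_{i0} = 1 for the intercept term).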

SLIDE 38

Gradient Descent for Logistic Regression

SLIDE 39

Gradient Descent for Logistic Regression

SLIDE 40

Multi-Class Classification

SLIDE 41

Multi-Class Logistic Regression

SLIDE 42

Multi-Class Logistic Regression

  • Split into One vs Rest:
  • Train a logistic regression classifier for each class i to predict the probability that y = i (see the prediction rule sketched below)
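
In symbols (a standard statement of one-vs-rest; the per-class notation is an assumption, not taken from the slide):

\[
h_{\theta}^{(i)}(\mathbf{x}) = P(y = i \mid \mathbf{x};\, \theta),
\qquad
\hat{y} = \arg\max_{i}\; h_{\theta}^{(i)}(\mathbf{x})
\]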

SLIDE 43

Implementing Multi-Class Logistic Regression
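
The slide's implementation details were not preserved; below is a minimal NumPy sketch of the one-vs-rest scheme just described. The function names, learning rate, and iteration count are illustrative assumptions, not taken from the slides.

```python
# One-vs-rest logistic regression: train one binary classifier per class,
# predict by taking the class with the highest sigmoid output.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_one_vs_rest(X, y, n_classes, lr=0.1, n_iters=1000):
    """Train one binary logistic regressor per class by gradient descent on the cross-entropy cost."""
    m, n = X.shape
    Xb = np.hstack([X, np.ones((m, 1))])        # absorb the intercept into the weight vector
    Theta = np.zeros((n_classes, n + 1))
    for i in range(n_classes):
        t = (y == i).astype(float)              # one-vs-rest binary targets for class i
        theta = np.zeros(n + 1)
        for _ in range(n_iters):
            h = sigmoid(Xb @ theta)             # predicted P(y = i | x)
            theta -= lr * (Xb.T @ (h - t)) / m  # gradient step on the cross-entropy cost
        Theta[i] = theta
    return Theta


def predict(Theta, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(sigmoid(Xb @ Theta.T), axis=1)  # pick the most probable class
```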

SLIDE 44

Outline

  • Regression Overview
  • Linear Regression
  • Support Vector Regression
  • Logistic Regression
  • Deep Neural Network for Regression
SLIDE 45

DNN Regression

  • For a two-layer MLP:
  • The network weights are adjusted to minimize an output cost function
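
A typical output cost for regression is the sum-of-squared-errors (SSE) used on the next slide; writing t for targets and out for network outputs, with p indexing training patterns and k indexing output units:

\[
E(\mathbf{w}) = \frac{1}{2}\sum_{p}\sum_{k}\bigl(t_{pk} - \mathrm{out}_{pk}\bigr)^{2}
\]
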
SLIDE 46

Computing the Partial Derivatives for Regression

  • We use SSE, and for a two-layer network the linear final outputs can be written as in the sketch below this list.
  • We can then use the chain rule for derivatives, as for the Single Layer Perceptron, to give the derivatives with respect to the two sets of weights.
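
The slide's equations did not survive extraction; a standard form, assuming first-layer weights w^{(1)}_{ij}, second-layer weights w^{(2)}_{jk}, and hidden activations h_{pj} = f(\sum_i w^{(1)}_{ij} x_{pi}) (the exact notation on the slide may differ):

\[
\mathrm{out}_{pk} = \sum_{j} w^{(2)}_{jk}\, h_{pj}
\]
\[
\frac{\partial E}{\partial w^{(2)}_{jk}} = -\sum_{p}\bigl(t_{pk} - \mathrm{out}_{pk}\bigr)\, h_{pj},
\qquad
\frac{\partial E}{\partial w^{(1)}_{ij}} = -\sum_{p}\sum_{k}\bigl(t_{pk} - \mathrm{out}_{pk}\bigr)\, w^{(2)}_{jk}\, f'\!\Bigl(\sum_{i'} w^{(1)}_{i'j} x_{pi'}\Bigr)\, x_{pi}
\]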

SLIDE 47

Deriving the Back Propagation Algorithm for Regression

  • All we now have to do is substitute our derivatives into the weight update equations.
  • Then, if the transfer function f(x) is a Sigmoid, we can use f′(x) = f(x)(1 – f(x)) to give the updates sketched below.
  • These equations constitute the Back-Propagation Learning Algorithm for Regression.
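
A standard form of the resulting updates, with learning rate η and sigmoid hidden activations h_{pj} as above (the slide's exact notation may differ):

\[
\Delta w^{(2)}_{jk} = \eta \sum_{p}\bigl(t_{pk} - \mathrm{out}_{pk}\bigr)\, h_{pj},
\qquad
\Delta w^{(1)}_{ij} = \eta \sum_{p}\Bigl[\sum_{k}\bigl(t_{pk} - \mathrm{out}_{pk}\bigr)\, w^{(2)}_{jk}\Bigr]\, h_{pj}\bigl(1 - h_{pj}\bigr)\, x_{pi}
\]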

SLIDE 48

Classification + Localization: Task

SLIDE 49

Idea #1: Localization as Regression

SLIDE 50

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (AlexNet, VGG, GoogLeNet)

SLIDE 51

Simple Recipe for Classification + Localization

  • Step 2: Attach a new fully-connected “regression head” to the network

SLIDE 52

Simple Recipe for Classification + Localization

  • Step 3: Train the regression head only with SGD and L2 loss

SLIDE 53

Simple Recipe for Classification + Localization

  • Step 4: At test time use both heads
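
A minimal PyTorch-style sketch of Steps 1-4. The AlexNet backbone, layer sizes, and hyper-parameters here are illustrative assumptions, not taken from the slides; the regression head is class-agnostic (a single box per image).

```python
# Classification + localization with a shared trunk and two heads (Steps 1-4).
import torch
import torch.nn as nn
import torchvision


class ClassifyAndLocalize(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.alexnet(pretrained=True)    # Step 1: pretrained classification model
        self.features = backbone.features                         # shared convolutional trunk
        self.avgpool = backbone.avgpool
        self.cls_head = backbone.classifier                       # existing classification head (class scores)
        self.reg_head = nn.Sequential(                            # Step 2: new fully-connected regression head
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4),                                    # class-agnostic box: (x, y, w, h)
        )

    def forward(self, x):
        f = torch.flatten(self.avgpool(self.features(x)), 1)
        return self.cls_head(f), self.reg_head(f)                  # Step 4: at test time, use both heads


model = ClassifyAndLocalize()
criterion = nn.MSELoss()                                            # Step 3: L2 loss on box coordinates
optimizer = torch.optim.SGD(model.reg_head.parameters(), lr=1e-3)   # train only the regression head

images, gt_boxes = torch.randn(2, 3, 224, 224), torch.rand(2, 4)    # dummy batch for illustration
_, pred_boxes = model(images)
loss = criterion(pred_boxes, gt_boxes)
loss.backward()
optimizer.step()
```
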
SLIDE 54

Per-class vs class agnostic regression

SLIDE 55

Where to attach the regression head?

SLIDE 56

Aside: Localizing multiple objects

SLIDE 57

Aside: Human Pose Estimation

SLIDE 58

DNN Regression Applications

  • Great results in:
  • Computer Vision
      ○ Object Localization / Detection as DNN Regression
      ○ Self-driving Steering Command Prediction
      ○ Human Pose Regression
  • Finance
      ○ Currency Exchange Rate
      ○ Stock Price Prediction
      ○ Forecasting Financial Time Series
      ○ Crude Oil Price Prediction

SLIDE 59

DNN Regression Applications

  • Great results in:
  • Atmospheric Sciences
      ○ Air Quality Prediction
      ○ Carbon Dioxide Pollution Prediction
      ○ Ozone Concentration Modeling
      ○ Sulphur Dioxide Concentration Prediction
  • Infrastructure
      ○ Road Tunnel Cost Estimation
      ○ Highway Engineering Cost Estimation
  • Geology / Physics
      ○ Meteorology and Oceanography Application
      ○ Pacific Sea Surface Temperature Prediction
      ○ Hydrological Modeling

SLIDE 60