Machine Learning Basics I2DL: Prof. Niessner, Prof. Leal-Taixé - PowerPoint PPT Presentation



SLIDE 1

Machine Learning Basics

SLIDE 2

Machine Learning

Task

SLIDE 3

Image Classification

SLIDE 4

Pose, Appearance, Illumination

SLIDE 5

Image Classification

Occlusions

SLIDE 6

Image Classification

Background clutter

SLIDE 7

Image Classification

Representation

SLIDE 8

Machine Learning

• How can we learn to perform image classification?

Task: Image classification
Experience: Data

SLIDE 9

Machine Learning

Unsupervised learning
• No label or target class
• Find out properties of the structure of the data
• Clustering (k-means, PCA, etc.)

Supervised learning

SLIDE 10

Machine Learning

Unsupervised learning / Supervised learning

SLIDE 11

Machine Learning

Unsupervised learning / Supervised learning
• Supervised learning: labels or target classes are given

SLIDE 12

Machine Learning

Unsupervised learning / Supervised learning
(Example images with labels: DOG, DOG, DOG, CAT, CAT, CAT)

SLIDE 13

Machine Learning

• How can we learn to perform image classification?

Experience: Data (training data and test data)
Underlying assumption: training and test data come from the same distribution.

SLIDE 14

Machine Learning

• How can we learn to perform image classification?

Task: Image classification
Experience: Data
Performance measure: Accuracy

SLIDE 15

Machine Learning

Unsupervised learning / Supervised learning / Reinforcement learning
• Reinforcement learning: agents interacting with an environment

SLIDE 16

Machine Learning

Unsupervised learning / Supervised learning / Reinforcement learning
• Reinforcement learning: agents receive a reward from the environment

SLIDE 17

Machine Learning

Unsupervised learning / Supervised learning / Reinforcement learning
• Reinforcement learning: agents receive a reward from the environment

SLIDE 18

A Simple Classifier

SLIDE 19

Nearest Neighbor

? (query sample of unknown class)

SLIDE 20

Nearest Neighbor

Assign the label of the closest training sample (by distance): NN classifier = dog

SLIDE 21

Nearest Neighbor

Take the majority label among the k closest training samples (by distance): k-NN classifier = cat

SLIDE 22

Nearest Neighbor

(Figure panels: The Data, NN Classifier, 5NN Classifier)
Source: https://commons.wikimedia.org/wiki/File:Data3classes.png

How does the NN classifier perform on training data?
Which classifier is more likely to perform best on test data?

SLIDE 23

Nearest Neighbor

• Hyperparameters:
  – Number of neighbors: k
  – Distance metric: L1 distance |x − c| or L2 distance ‖x − c‖₂ (x: query, c: stored training sample)
• These parameters are problem dependent.
• How do we choose these hyperparameters? (A minimal k-NN sketch follows below.)
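To make the k-NN idea concrete, here is a minimal sketch (not from the slides) of a k-nearest-neighbor classifier with a configurable k and a choice of L1/L2 distance; all function and variable names are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5, metric="l2"):
    """Classify one query point by majority vote among its k nearest training samples."""
    diff = X_train - x_query                      # (n, d) differences
    if metric == "l1":
        dists = np.abs(diff).sum(axis=1)          # L1 distance per training sample
    else:
        dists = np.sqrt((diff ** 2).sum(axis=1))  # L2 distance per training sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority label

# Toy usage with hypothetical 2D data and two classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0
```

Both k and the distance metric are the hyperparameters the slide refers to; they are chosen on validation data, as the next slides explain.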

SLIDE 24

Basic Recipe for Machine Learning

• Split your data to find your hyperparameters: 60% train / 20% validation / 20% test
• Other splits are also possible (e.g., 80%/10%/10%)

SLIDE 25

Basic Recipe for Machine Learning

• Split your data to find your hyperparameters: 60% train / 20% validation / 20% test
• The test set is only used once! (A split sketch follows below.)
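A minimal sketch of the 60/20/20 split described above; the function shuffles before splitting, and all names are illustrative.

```python
import numpy as np

def split_data(X, y, train=0.6, val=0.2, seed=0):
    """Shuffle and split into train / validation / test sets (test gets the remainder)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```

Hyperparameters (such as k in k-NN) are tuned on the validation set; the test set is touched exactly once at the very end.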

SLIDE 26

Cross Validation

Split the training data into N folds; in each run, one fold serves as validation and the rest as training (Run 1 ... Run 5 in the figure). A fold-splitting sketch follows below.
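A minimal sketch of N-fold cross-validation as described above, assuming a generic train_and_evaluate(...) routine supplied by the reader; everything here is illustrative, not the course's reference implementation.

```python
import numpy as np

def cross_validate(X, y, n_folds, train_and_evaluate, seed=0):
    """Average validation score over n_folds runs; each fold is the validation set once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        scores.append(train_and_evaluate(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return float(np.mean(scores))
```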

SLIDE 27

Cross Validation

• Split your data to find your hyperparameters: 60% train / 20% validation / 20% test
• Why do cross-validation? Why not just train and test?

SLIDE 28

Cross Validation

• Why do cross-validation? Why not just train and test?
• Remember: the test set is only used once!

SLIDE 29

Linear Decision Boundaries

This lecture: what are the pros and cons of using linear decision boundaries?

SLIDE 30

Linear Regression

SLIDE 31

Linear Regression

• Supervised learning
• Find a linear model that explains a target y given inputs x

SLIDE 32

Linear Regression

Training: data points {x_1:n, y_1:n} are fed to a Learner, which outputs the model parameters θ.
• Inputs x (e.g., image, measurement)
• Labels y (e.g., cat/dog)

SLIDE 33

Linear Regression

Training: data points {x_1:n, y_1:n} → Learner → model parameters θ (these can be the parameters of a neural network).
Testing: a new input x_{n+1} and the learned θ go into the Predictor, which outputs the estimate ŷ_{n+1}.

SLIDE 34

Linear Prediction

• A linear model is expressed in the form

  ŷᵢ = Σ_{k=1}^{d} x_{ik} θ_k

  where xᵢ is the input data (features), θ are the weights (i.e., the model parameters), and d is the input dimension.

SLIDE 35

Linear Prediction

• A linear model is expressed in the form

  ŷᵢ = θ₀ + Σ_{k=1}^{d} x_{ik} θ_k = θ₀ + x_{i1}θ₁ + x_{i2}θ₂ + ⋯ + x_{id}θ_d

  where θ₀ is the bias.

SLIDE 36

Linear Prediction

Example: predict the temperature of a building from
• x₁: outside temperature
• x₂: number of people
• x₃: sun exposure
• x₄: level of humidity
Each feature xₖ is multiplied by its weight θₖ.

SLIDE 37

Linear Prediction

• Stacking all n samples: ŷᵢ = θ₀ + Σ_{k=1}^{d} x_{ik} θ_k for i = 1, …, n.
• Prepending a column of ones to the data matrix absorbs the bias θ₀ into the weight vector, giving the compact form ŷ = Xθ.

SLIDE 38

Linear Prediction

  ŷ = Xθ

• X: input features, one row per sample; each sample has d features plus a leading 1 for the bias
• θ: model parameters (d weights and 1 bias)
• ŷ: prediction

(A small NumPy sketch follows below.)
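A minimal NumPy sketch of the prediction ŷ = Xθ with the bias absorbed via a column of ones; the feature values and weights below are made up for illustration.

```python
import numpy as np

def predict(X_raw, theta):
    """Linear prediction y_hat = X @ theta, where X prepends a ones column for the bias."""
    n = X_raw.shape[0]
    X = np.hstack([np.ones((n, 1)), X_raw])  # shape (n, d+1)
    return X @ theta                         # theta has d+1 entries: [bias, w1, ..., wd]

# Hypothetical example: 2 samples, 4 features (outside temp, people, sun, humidity)
X_raw = np.array([[25.0, 3.0, 0.8, 0.4],
                  [-10.0, 50.0, 0.1, 0.7]])
theta = np.array([18.0, 0.2, 0.05, 2.0, -1.0])  # [bias, w1, w2, w3, w4], made-up values
print(predict(X_raw, theta))
```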

SLIDE 39

Linear Prediction

Example: predicting the temperature of the building.
(Worked example on the slide: a 2×5 data matrix, each row starting with a 1 for the bias, multiplied by the weight vector θ to give the two predicted temperatures ŷ₁ and ŷ₂.)

SLIDE 40

Linear Prediction

(Same worked example as Slide 39.)

How do we obtain the model?

SLIDE 41

How to Obtain the Model?

Data points X → model parameters θ → estimation ŷ, compared against the labels (ground truth) y via a loss function, which drives the optimization.

SLIDE 42

How to Obtain the Model?

• Loss function: measures how good my estimation is (how good my model is) and tells the optimization method how to make it better.
• Optimization: changes the model in order to improve the loss function (i.e., to improve my estimation).

SLIDE 43

Linear Regression: Loss Function

Prediction: temperature of the building

SLIDE 44

Linear Regression: Loss Function

Prediction: temperature of the building

SLIDE 45

Linear Regression: Loss Function

  J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)²

We minimize this quantity; it is also called the objective function, energy, or cost function. (A sketch of this loss follows below.)
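A minimal sketch of the mean-squared-error loss J(θ) above; the function name is illustrative.

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean squared error: J = (1/n) * sum_i (y_hat_i - y_i)^2."""
    return float(np.mean((y_hat - y) ** 2))
```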

SLIDE 46

Optimization: Linear Least Squares

• Linear least squares: an approach to fit a linear model to the data.
• Convex problem; there exists a closed-form solution that is unique.

  min_θ J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)²

SLIDE 47

Optimization: Linear Least Squares

  min_θ J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)² = (1/n) Σ_{i=1}^{n} (xᵢθ − yᵢ)²

The estimation ŷᵢ = xᵢθ comes from the linear model; the sum runs over the n training samples.

SLIDE 48

Optimization: Linear Least Squares

  min_θ J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)² = (1/n) Σ_{i=1}^{n} (xᵢθ − yᵢ)²

Matrix notation (n training samples, each input vector has size d, n labels):

  min_θ J(θ) = (Xθ − y)ᵀ(Xθ − y)

SLIDE 49

Optimization: Linear Least Squares

  min_θ J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)² = (1/n) Σ_{i=1}^{n} (xᵢθ − yᵢ)²

Matrix notation:

  min_θ J(θ) = (Xθ − y)ᵀ(Xθ − y)

More on matrix notation in the next exercise session.

SLIDE 50

Optimization: Linear Least Squares

  min_θ J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)² = (1/n) Σ_{i=1}^{n} (xᵢθ − yᵢ)²
  min_θ J(θ) = (Xθ − y)ᵀ(Xθ − y)

The problem is convex, so the optimum is found where ∂J(θ)/∂θ = 0.

SLIDE 51

Optimization

  ∂J(θ)/∂θ = 2XᵀXθ − 2Xᵀy = 0   ⇒   θ = (XᵀX)⁻¹Xᵀy

We have found an analytical solution to a convex problem (details in the exercise session!).
True output: temperature of the building. Inputs: outside temperature, number of people, …
(A numerical sketch follows below.)
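A minimal sketch of the closed-form least-squares solution θ = (XᵀX)⁻¹Xᵀy derived above; in practice one solves the linear system rather than forming the inverse explicitly. Names and data are illustrative.

```python
import numpy as np

def fit_least_squares(X, y):
    """Closed-form solution of min_theta (X theta - y)^T (X theta - y)."""
    # Solve (X^T X) theta = X^T y instead of inverting X^T X (numerically more stable).
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: X already contains a leading column of ones for the bias.
X = np.array([[1.0, 25.0], [1.0, -10.0], [1.0, 5.0]])
y = np.array([22.0, 15.0, 18.0])
theta = fit_least_squares(X, y)
print(theta, X @ theta)  # learned parameters and the fitted predictions
```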

SLIDE 52

Is this the Best Estimate?

• Least squares estimate:

  J(θ) = (1/n) Σ_{i=1}^{n} (ŷᵢ − yᵢ)²

SLIDE 53

Maximum Likelihood

SLIDE 54

Maximum Likelihood Estimate

• True underlying distribution: p_data(y | X)
• Parametric family of distributions, controlled by parameter(s) θ: p_model(y | X, θ)

SLIDE 55

Maximum Likelihood Estimate

• A method of estimating the parameters of a statistical model given observations.
• The observations come from p_data(y | X); the model is p_model(y | X, θ).

SLIDE 56

Maximum Likelihood Estimate

• A method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximize the likelihood of making the observations given the parameters:

  θ_ML = arg max_θ p_model(y | X, θ)

SLIDE 57

Maximum Likelihood Estimate

• MLE assumes that the training samples are independent and generated by the same probability distribution (the "i.i.d." assumption):

  p_model(y | X, θ) = Π_{i=1}^{n} p_model(yᵢ | xᵢ, θ)

SLIDE 58

Maximum Likelihood Estimate

  θ_ML = arg max_θ Π_{i=1}^{n} p_model(yᵢ | xᵢ, θ)
  θ_ML = arg max_θ Σ_{i=1}^{n} log p_model(yᵢ | xᵢ, θ)

using the logarithmic property log(ab) = log a + log b.

SLIDE 59

Back to Linear Regression

  θ_ML = arg max_θ Σ_{i=1}^{n} log p_model(yᵢ | xᵢ, θ)

What shape does our probability distribution have?

SLIDE 60

Back to Linear Regression

What shape does our probability distribution p(yᵢ | xᵢ, θ) have?

SLIDE 61

Back to Linear Regression

Assuming a Gaussian / normal distribution for p(yᵢ | xᵢ, θ):

  yᵢ = 𝒩(xᵢθ, σ²) = xᵢθ + 𝒩(0, σ²),  i.e., the mean is xᵢθ.

Gaussian: yᵢ ~ 𝒩(μ, σ²) with density p(yᵢ) = (1/√(2πσ²)) e^(−(yᵢ − μ)² / (2σ²))

SLIDE 62

Back to Linear Regression

Assuming yᵢ = 𝒩(xᵢθ, σ²) = xᵢθ + 𝒩(0, σ²): what is p(yᵢ | xᵢ, θ)?

SLIDE 63

Back to Linear Regression

  p(yᵢ | xᵢ, θ) = (2πσ²)^(−1/2) e^(−(yᵢ − xᵢθ)² / (2σ²))

i.e., the Gaussian density with mean xᵢθ and variance σ².

SLIDE 64

Back to Linear Regression

  p(yᵢ | xᵢ, θ) = (2πσ²)^(−1/2) e^(−(yᵢ − xᵢθ)² / (2σ²))

Original optimization problem:

  θ_ML = arg max_θ Σ_{i=1}^{n} log p_model(yᵢ | xᵢ, θ)

SLIDE 65

Back to Linear Regression

  Σ_{i=1}^{n} log[(2πσ²)^(−1/2) e^(−(yᵢ − xᵢθ)² / (2σ²))]
    = Σ_{i=1}^{n} −(1/2) log(2πσ²) + Σ_{i=1}^{n} −(1/(2σ²)) (yᵢ − xᵢθ)²    (log and e cancel)
    = −(n/2) log(2πσ²) − (1/(2σ²)) (y − Xθ)ᵀ(y − Xθ)                       (matrix notation)

SLIDE 66

Back to Linear Regression

  θ_ML = arg max_θ Σ_{i=1}^{n} log p_model(yᵢ | xᵢ, θ)
       = arg max_θ [ −(n/2) log(2πσ²) − (1/(2σ²)) (y − Xθ)ᵀ(y − Xθ) ]

How can we find the estimate of theta? Setting ∂J(θ)/∂θ = 0 gives

  θ = (XᵀX)⁻¹Xᵀy

Details in the exercise session! (A numerical check follows below.)
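To connect the two views numerically, here is a small illustrative sketch that evaluates the Gaussian log-likelihood above for the least-squares solution and for a perturbed parameter vector; the least-squares θ scores highest because maximizing this likelihood is equivalent to minimizing the squared error.

```python
import numpy as np

def gaussian_log_likelihood(theta, X, y, sigma2=1.0):
    """Sum_i log N(y_i; x_i theta, sigma^2) for the linear-Gaussian model."""
    n = len(y)
    resid = y - X @ theta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)

# Hypothetical data (ones column included for the bias)
X = np.array([[1.0, 25.0], [1.0, -10.0], [1.0, 5.0], [1.0, 15.0]])
y = np.array([22.0, 15.0, 18.0, 20.0])
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)          # least-squares estimate
print(gaussian_log_likelihood(theta_ls, X, y))         # highest log-likelihood
print(gaussian_log_likelihood(theta_ls + 0.1, X, y))   # any perturbation scores lower
```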

SLIDE 67

Linear Regression

• The Maximum Likelihood Estimate (MLE) corresponds to the Least Squares Estimate (given the assumptions).
• We introduced the concepts of loss function and optimization to obtain the best model for regression.

SLIDE 68

Image Classification

SLIDE 69

Regression vs Classification

• Regression: predict a continuous output value (e.g., temperature of a room)
• Classification: predict a discrete value
  – Binary classification: output is either 0 or 1
  – Multi-class classification: set of N classes

SLIDE 70

Logistic Regression

CAT classifier

SLIDE 71

Sigmoid for Binary Predictions

The weighted sum Σ of the inputs (a constant 1 and the features x, combined with the weights θ) is passed through the sigmoid

  σ(x) = 1 / (1 + e^(−x))

The output can be interpreted as a probability: p(yᵢ = 1 | xᵢ, θ). (A sketch follows below.)
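A minimal sketch of the sigmoid prediction described above; names are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def predict_proba(X, theta):
    """p(y_i = 1 | x_i, theta) = sigmoid(x_i theta); X includes a ones column for the bias."""
    return sigmoid(X @ theta)
```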

SLIDE 72

Spoiler Alert: 1-Layer Neural Network

The same diagram (weighted sum followed by a sigmoid) is a one-layer neural network; its output can be interpreted as a probability p(yᵢ = 1 | xᵢ, θ).

SLIDE 73

Logistic Regression

• Probability of a binary output:

  ŷ = p(y = 1 | X, θ) = Π_{i=1}^{n} p(yᵢ = 1 | xᵢ, θ)

• The prediction of our sigmoid: ŷᵢ = σ(xᵢθ)

SLIDE 74

Logistic Regression

• Probability of a binary output:

  ŷ = p(y = 1 | X, θ) = Π_{i=1}^{n} p(yᵢ = 1 | xᵢ, θ)

• The prediction of our sigmoid: ŷᵢ = σ(xᵢθ)
• Model for coins (Bernoulli trial):

  p(z | φ) = φ^z (1 − φ)^(1−z) = { φ if z = 1;  1 − φ if z = 0 }

SLIDE 75

Logistic Regression

• Probability of a binary output, using the Bernoulli model for coins:

  ŷ = Π_{i=1}^{n} ŷᵢ^(yᵢ) (1 − ŷᵢ)^(1−yᵢ)

• The true labels yᵢ are 0 or 1; the prediction ŷᵢ of the sigmoid is continuous.

SLIDE 76

Logistic Regression: Loss Function

• Probability of a binary output:

  p(y | X, θ) = ŷ = Π_{i=1}^{n} ŷᵢ^(yᵢ) (1 − ŷᵢ)^(1−yᵢ)

• Maximum Likelihood Estimate:

  θ_ML = arg max_θ log p(y | X, θ)

SLIDE 77

Logistic Regression: Loss Function

  log p(y | X, θ) = Σ_{i=1}^{n} log[ ŷᵢ^(yᵢ) (1 − ŷᵢ)^(1−yᵢ) ]
                  = Σ_{i=1}^{n} [ yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ) ]

SLIDE 78

Logistic Regression: Loss Function

  ℒ(ŷᵢ, yᵢ) = yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)        (Maximize!)

  θ_ML = arg max_θ log p(y | X, θ)

For yᵢ = 1:  ℒ(ŷᵢ, 1) = log ŷᵢ

SLIDE 79

Logistic Regression: Loss Function

  ℒ(ŷᵢ, yᵢ) = yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)

For yᵢ = 1:  ℒ(ŷᵢ, 1) = log ŷᵢ
We want log ŷᵢ large; since the logarithm is a monotonically increasing function, we also want ŷᵢ large (1 is the largest value our model's estimate can take!).

SLIDE 80

Logistic Regression: Loss Function

  ℒ(ŷᵢ, yᵢ) = yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)

For yᵢ = 1:  ℒ(ŷᵢ, 1) = log ŷᵢ
For yᵢ = 0:  ℒ(ŷᵢ, 0) = log(1 − ŷᵢ)
We want log(1 − ŷᵢ) large, so we want ŷᵢ small (0 is the smallest value our model's estimate can take!).

SLIDE 81

Logistic Regression: Loss Function

  ℒ(ŷᵢ, yᵢ) = yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ)

• Referred to as the binary cross-entropy loss (BCE).
• Related to the multi-class loss you will see in this course (also called softmax loss).

SLIDE 82

Logistic Regression: Optimization

• Loss function: ℒ(ŷᵢ, yᵢ) = yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ), with ŷᵢ = σ(xᵢθ)
• Cost function (minimization):

  C(θ) = −(1/n) Σ_{i=1}^{n} ℒ(ŷᵢ, yᵢ) = −(1/n) Σ_{i=1}^{n} [ yᵢ log ŷᵢ + (1 − yᵢ) log(1 − ŷᵢ) ]

(A sketch follows below.)
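A minimal sketch of the binary cross-entropy cost C(θ) above, reusing the sigmoid from Slide 71; the small epsilon clamp is a common numerical-stability trick, not something from the slides.

```python
import numpy as np

def bce_cost(theta, X, y, eps=1e-12):
    """C(theta) = -(1/n) * sum_i [y_i log y_hat_i + (1 - y_i) log(1 - y_hat_i)]."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid predictions
    y_hat = np.clip(y_hat, eps, 1.0 - eps)       # avoid log(0)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
```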

SLIDE 83

Logistic Regression: Optimization

• No closed-form solution
• Make use of an iterative method: gradient descent
• Gradient descent: later on! (A small sketch follows below.)
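Since gradient descent is only previewed here, this is a hedged sketch of how it could minimize the BCE cost above; the learning rate, iteration count, and the standard gradient of the sigmoid + BCE combination are illustrative choices, not the course's reference implementation.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Gradient descent on the binary cross-entropy cost for logistic regression."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid predictions
        grad = X.T @ (y_hat - y) / n                 # gradient of the BCE cost
        theta -= lr * grad                           # step along the negative gradient
    return theta
```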

SLIDE 84

Why is Machine Learning so Cool?

• We can learn from experience
  -> intelligence, a certain ability to infer the future!
• Even linear models are often pretty good for complex phenomena, e.g., weather:
  – A linear combination of time-of-day, day-of-year, etc. is often pretty good

SLIDE 85

Many Examples of Logistic Regression

• Coronavirus models behave like logistic regressions
  – Exponential spread at the beginning
  – Plateaus when a certain portion of the population is infected/immune

https://www.worldometers.info/coronavirus

SLIDE 86

Many Examples of Logistic Regression

• Coronavirus models behave like logistic regressions
  – Exponential spread at the beginning
  – Plateaus when a certain portion of the population is infected/immune
• Think about good features:
  – Total population
  – Population density
  – Implementation of measures
  – Reasonable government?
  – Etc. (many more, of course)

https://www.worldometers.info/coronavirus

SLIDE 87

The Model Matters

• Each case requires a different model: linear vs. logistic
• Many models:
  – #coronavirus_infections cannot exceed #total_population
  – Munich housing prices seem exponential, though: no hard upper bound -> prices can always grow!
(A small comparison sketch follows below.)
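To illustrate the difference between the two growth behaviors mentioned above, here is a small sketch with made-up parameter values comparing exponential growth, which is unbounded, with logistic growth, which saturates at a capacity.

```python
import numpy as np

t = np.linspace(0, 10, 6)
exponential = np.exp(0.8 * t)                     # unbounded growth (e.g., prices)
capacity = 100.0                                  # upper bound (e.g., total population)
logistic = capacity / (1.0 + np.exp(-(t - 5.0)))  # S-curve that plateaus at the capacity
print(np.round(exponential, 1))
print(np.round(logistic, 1))
```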

SLIDE 88

Next Lectures

• Next exercise session: Math Recap II
• Next lecture (Lecture 3): jumping towards our first neural networks and computational graphs

SLIDE 89

References for Further Reading

• Cross-validation:
  – https://medium.com/@zstern/k-fold-cross-validation-explained-5aeba90ebb3
  – https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6
• General machine learning book:
  – Pattern Recognition and Machine Learning. C. Bishop.

SLIDE 90

See you next week!