Slides for 15-381/781 15-381/781 Fall 2016 - - PowerPoint PPT Presentation



SLIDE 1
  • Machine Learning, 15-381/781, Fall 2016

Instructors: Ariel Procaccia & Emma Brunskill. Slides courtesy of Zico Kolter.

SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6
[Figure: example inputs paired with their labels 2, 0, 8, 5]
SLIDE 7

SLIDE 8
[Figure: the example inputs with labels 2, 0, 8, 5, each label written as the output of a hypothesis: label = $h_\theta$(input)]
SLIDE 9

[Figure: label = $h_\theta$(input)]
SLIDE 10
SLIDE 11
SLIDE 12

[Figure: Peak Hourly Demand (GW) versus High Temperature (F)]

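SLIDE 13
  • Peak demand ≈ $\theta_1 \cdot (\text{High temperature}) + \theta_2$

where $\theta_1$ and $\theta_2$ are the parameters of the model.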

SLIDE 14

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data and linear regression prediction]

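SLIDE 15
  • Input features: $x^{(i)} \in \mathbb{R}^n$, $i = 1, \ldots, m$
    e.g.: $x^{(i)} \in \mathbb{R}^2 = (\text{High temperature}^{(i)}, 1)$
  • Outputs: $y^{(i)} \in \mathbb{R}$ (a regression task)
    e.g.: $y^{(i)} \in \mathbb{R} = \{\text{Peak demand}^{(i)}\}$
  • Model parameters: $\theta \in \mathbb{R}^n$
  • Hypothesis function: $h_\theta(x): \mathbb{R}^n \to \mathbb{R}$ returns a prediction of the output $y$; e.g. linear regression:

$$h_\theta(x) = x^T\theta = \sum_{i=1}^n x_i\theta_i$$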

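SLIDE 16
  • Loss function: $\ell: \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ measures the difference between a prediction $h_\theta(x)$ and the true output $y$; e.g. the squared loss:

$$\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$$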

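SLIDE 17
  • Given a collection of input features and outputs $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, m$, and a hypothesis function $h_\theta$, find the parameters $\theta$ that minimize the sum of losses:

$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$$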
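To make the pieces concrete, here is a minimal pure-Python sketch of the linear hypothesis, the squared loss, and the summed objective for the demand example. The helper names (`features`, `h`, `squared_loss`, `total_loss`) and the three temperature/demand pairs are hypothetical. They were chosen for illustration and do not come from the slides' dataset.

```python
def features(high_temp):
    """Feature vector x = (High temperature, 1); the constant 1 lets theta_2 act as an intercept."""
    return [high_temp, 1.0]


def h(theta, x):
    """Linear hypothesis h_theta(x) = x^T theta."""
    return sum(x_j * theta_j for x_j, theta_j in zip(x, theta))


def squared_loss(prediction, y):
    """Squared loss (h_theta(x) - y)^2."""
    return (prediction - y) ** 2


def total_loss(theta, xs, ys, loss=squared_loss):
    """Objective of the canonical problem: sum_i loss(h_theta(x^(i)), y^(i))."""
    return sum(loss(h(theta, x), y) for x, y in zip(xs, ys))


# Hypothetical data (not from the slides): three days of (temperature, demand).
xs = [features(t) for t in (70.0, 80.0, 90.0)]
ys = [1.8, 2.2, 2.6]
print(total_loss([0.04, -1.0], xs, ys))  # essentially 0: this theta fits these points exactly
print(total_loss([0.0, 0.0], xs, ys))    # 1.8^2 + 2.2^2 + 2.6^2 = 14.84, up to rounding
```

Passing `loss` as a parameter keeps the objective independent of the particular loss, which mirrors how the slide states the problem for a generic $\ell$.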

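SLIDE 18
[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data and linear regression prediction]

  • Least squares: with $h_\theta(x) = x^T\theta$ and $\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$, the problem becomes

$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) \equiv \min_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2$$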
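SLIDE 19
  • To solve $\min_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2$, compute the gradient:

$$\nabla_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2 = \sum_{i=1}^m \nabla_\theta \left(x^{(i)T}\theta - y^{(i)}\right)^2 = 2\sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta - y^{(i)}\right)$$

  • Gradient descent: repeatedly update

$$\theta \leftarrow \theta - \alpha \sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta - y^{(i)}\right)$$

where the step size $\alpha > 0$ absorbs the constant factor 2.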
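A quick way to check the gradient derived above is to compare it against a central finite-difference approximation. This pure-Python sketch does that. The function names, the toy data, and the test point $\theta$ are hypothetical.

```python
def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))


def objective(theta, X, y):
    """f(theta) = sum_i (x^(i)T theta - y^(i))^2."""
    return sum((dot(x_i, theta) - y_i) ** 2 for x_i, y_i in zip(X, y))


def analytic_gradient(theta, X, y):
    """2 * sum_i x^(i) (x^(i)T theta - y^(i)), as derived on the slide."""
    g = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        residual = dot(x_i, theta) - y_i
        for j, x_ij in enumerate(x_i):
            g[j] += 2 * x_ij * residual
    return g


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central differences: (f(theta + eps e_j) - f(theta - eps e_j)) / (2 eps)."""
    g = []
    for j in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[j] += eps
        minus[j] -= eps
        g.append((objective(plus, X, y) - objective(minus, X, y)) / (2 * eps))
    return g


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]  # hypothetical (x, 1) features
y = [1.0, 2.5, 5.5, 6.0]
theta = [0.3, -0.7]
print(analytic_gradient(theta, X, y))   # [-63.0, -32.0] up to rounding
print(numerical_gradient(theta, X, y))  # should agree to several decimal places
```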

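SLIDE 20
  • Alternatively, solve analytically: let $f(\theta) = \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2$ and set $\nabla_\theta f(\theta) = 0$:

$$\begin{aligned}
\sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta^\star - y^{(i)}\right) = 0
&\;\Longrightarrow\; \left(\sum_{i=1}^m x^{(i)}x^{(i)T}\right)\theta^\star = \sum_{i=1}^m x^{(i)}y^{(i)} \\
&\;\Longrightarrow\; \theta^\star = \left(\sum_{i=1}^m x^{(i)}x^{(i)T}\right)^{-1}\sum_{i=1}^m x^{(i)}y^{(i)}
\end{aligned}$$

The last step assumes that $\sum_{i=1}^m x^{(i)}x^{(i)T}$ is invertible.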
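The closed form can be sketched in pure Python. Instead of forming the inverse explicitly, this version solves the linear system $(\sum_i x^{(i)}x^{(i)T})\theta^\star = \sum_i x^{(i)}y^{(i)}$ by Gaussian elimination, which is usually the numerically preferable route. The helper names and the toy data, generated from $y = 2x + 1$, are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("sum of x x^T is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def least_squares_closed_form(X, y):
    """theta* solving (sum_i x^(i) x^(i)T) theta = sum_i x^(i) y^(i)."""
    n = len(X[0])
    A = [[sum(x[j] * x[k] for x in X) for k in range(n)] for j in range(n)]
    b = [sum(x[j] * y_i for x, y_i in zip(X, y)) for j in range(n)]
    return solve(A, b)


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]  # hypothetical (x, 1) features
y = [1.0, 3.0, 5.0, 7.0]                              # generated from y = 2x + 1
print(least_squares_closed_form(X, y))  # approximately [2.0, 1.0]
```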

SLIDE 21

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data and linear regression prediction]

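SLIDE 22
  • Why use the squared loss $\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$? Some alternatives:
  • Absolute loss: $\ell(h_\theta(x), y) = |h_\theta(x) - y|$
  • Deadband loss: $\ell(h_\theta(x), y) = \max\{0, |h_\theta(x) - y| - \epsilon\}$, $\epsilon \in \mathbb{R}_+$

[Figure: squared loss, absolute loss, and deadband loss plotted against $h_\theta(x_i) - y_i$]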

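SLIDE 23
  • For the absolute loss there is no closed-form expression for $\theta^\star$, but gradient descent still applies (using a subgradient where the loss has a kink):

$$\theta \leftarrow \theta - \alpha \sum_{i=1}^m x^{(i)}\,\mathrm{sign}\left(x^{(i)T}\theta - y^{(i)}\right)$$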
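Here is a pure-Python sketch of the three losses and of the sign-based update for the absolute loss. A subgradient step does not necessarily decrease the objective at every iteration, so the sketch keeps the best iterate seen so far. That is a common practice but an assumption here, not something from the slides. The step size, iteration count, helper names, and toy data are hypothetical.

```python
def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))


def squared_loss(pred, y):
    return (pred - y) ** 2


def absolute_loss(pred, y):
    return abs(pred - y)


def deadband_loss(pred, y, eps):
    """Zero while |pred - y| <= eps, then grows linearly."""
    return max(0.0, abs(pred - y) - eps)


def sign(z):
    return (z > 0) - (z < 0)


def total_absolute_loss(theta, X, y):
    return sum(absolute_loss(dot(x_i, theta), y_i) for x_i, y_i in zip(X, y))


def absolute_loss_descent(X, y, alpha, iters):
    """theta <- theta - alpha * sum_i x^(i) sign(x^(i)T theta - y^(i)); returns the best iterate."""
    n = len(X[0])
    theta = [0.0] * n
    best_theta, best_obj = list(theta), total_absolute_loss(theta, X, y)
    for _ in range(iters):
        g = [0.0] * n
        for x_i, y_i in zip(X, y):
            s = sign(dot(x_i, theta) - y_i)
            for j in range(n):
                g[j] += x_i[j] * s
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
        obj = total_absolute_loss(theta, X, y)
        if obj < best_obj:
            best_theta, best_obj = list(theta), obj
    return best_theta, best_obj


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]  # hypothetical (x, 1) features
y = [1.0, 3.0, 5.0, 7.0]
theta, obj = absolute_loss_descent(X, y, alpha=0.01, iters=2000)
print(theta, obj)  # total absolute loss falls from 16 at theta = 0 to well below 1
```

With a constant step size, the iterates keep oscillating near the optimum instead of settling exactly. A shrinking step size is the usual remedy if a precise solution is needed.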
SLIDE 24

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data with fits using squared loss, absolute loss, and deadband loss (eps = 0.1)]

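SLIDE 25
  • Probabilistic interpretation: suppose each output $y$ equals the prediction $h_\theta(x)$ plus some noise $\epsilon$:

$$y = h_\theta(x) + \epsilon$$

  • Assume Gaussian noise, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, with density

$$p(\epsilon) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\epsilon^2}{2\sigma^2}\right)$$

  • Then the probability of $y$ given $x$ and $\theta$ is

$$p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(h_\theta(x) - y)^2}{2\sigma^2}\right)$$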

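SLIDE 26
  • Assuming the examples are independent, the probability of all the outputs is

$$p(y^{(1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(m)}; \theta) = \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)$$

  • Maximum likelihood: choose $\theta$ to maximize this probability:

$$\begin{aligned}
\max_\theta \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)
&\equiv \min_\theta -\sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta) \\
&\equiv \min_\theta \sum_{i=1}^m \left(\log\left(\sqrt{2\pi}\,\sigma\right) + \frac{1}{2\sigma^2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\right) \\
&\equiv \min_\theta \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
\end{aligned}$$

So maximum likelihood estimation under Gaussian noise is exactly least squares.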
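The chain of equivalences can be checked numerically. For any residuals $r_i = h_\theta(x^{(i)}) - y^{(i)}$, the negative log-likelihood equals $m\log(\sqrt{2\pi}\,\sigma)$ plus the sum of squared residuals divided by $2\sigma^2$. The constant does not depend on $\theta$, so both objectives have the same minimizer. The function names and sample residuals below are hypothetical.

```python
import math


def gaussian_nll(residuals, sigma):
    """-sum_i log p(y^(i) | x^(i); theta) under y = h_theta(x) + eps, eps ~ N(0, sigma^2)."""
    total = 0.0
    for r in residuals:
        density = math.exp(-r * r / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
        total -= math.log(density)
    return total


def nll_from_squared_error(residuals, sigma):
    """m log(sqrt(2 pi) sigma) + (1 / (2 sigma^2)) * sum_i r_i^2."""
    m = len(residuals)
    sse = sum(r * r for r in residuals)
    return m * math.log(math.sqrt(2 * math.pi) * sigma) + sse / (2 * sigma**2)


residuals_a = [0.5, -1.0, 2.0]  # hypothetical residuals for one theta
residuals_b = [0.1, -0.2, 0.3]  # ...and for another theta with smaller squared error
sigma = 1.5
print(gaussian_nll(residuals_a, sigma), nll_from_squared_error(residuals_a, sigma))
print(gaussian_nll(residuals_b, sigma) < gaussian_nll(residuals_a, sigma))  # True
```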

SLIDE 27
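SLIDE 28
  • Gradient descent for the general problem $\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$:

Function $\theta$ = Gradient-Descent($\{(x^{(i)}, y^{(i)})\}, h_\theta, \ell, \alpha$)
    Initialize: $\theta \leftarrow 0$
    Repeat until convergence:
        $g \leftarrow 0$
        For $i = 1, \ldots, m$:
            $g \leftarrow g + \nabla_\theta \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$
        $\theta \leftarrow \theta - \alpha g$
    Return $\theta$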
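The Gradient-Descent pseudocode translates almost line for line into Python if the caller supplies the per-example gradient. In this sketch, `loss_grad` is a hypothetical callback name, and a fixed iteration count stands in for "repeat until convergence". The step size 0.02 works for this toy least-squares data, but it is an assumption and not a general recommendation.

```python
def gradient_descent(data, loss_grad, alpha, n, iters=1000):
    """Batch gradient descent following the Gradient-Descent pseudocode.

    data: list of (x, y) pairs; loss_grad(theta, x, y) returns the gradient of
    loss(h_theta(x), y) with respect to theta as a list of length n.
    """
    theta = [0.0] * n                      # Initialize: theta <- 0
    for _ in range(iters):
        g = [0.0] * n                      # g <- 0
        for x, y in data:                  # g <- g + grad of the i-th loss
            g = [g_j + d_j for g_j, d_j in zip(g, loss_grad(theta, x, y))]
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]  # theta <- theta - alpha g
    return theta


def squared_loss_grad(theta, x, y):
    """Gradient of (x^T theta - y)^2 with respect to theta: 2 x (x^T theta - y)."""
    residual = sum(x_j * t_j for x_j, t_j in zip(x, theta)) - y
    return [2 * x_j * residual for x_j in x]


# Hypothetical toy data generated from y = 2x + 1, with features (x, 1).
data = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0)]
print(gradient_descent(data, squared_loss_grad, alpha=0.02, n=2, iters=3000))  # approximately [2.0, 1.0]
```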

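SLIDE 29
  • Stochastic gradient descent: when $m$ is large, summing over all examples for every update is expensive, so update $\theta$ after each example instead:

Function $\theta$ = Stochastic-Gradient-Descent($\{(x^{(i)}, y^{(i)})\}, h_\theta, \ell, \alpha$)
    Initialize: $\theta \leftarrow 0$
    Repeat until convergence:
        For $i = 1, \ldots, m$:
            $\theta \leftarrow \theta - \alpha \nabla_\theta \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$
    Return $\theta$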
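Here is a Python sketch of the stochastic version. It takes one step per example. It also visits the examples in a freshly shuffled order on each pass, which is a common choice but an assumption here, since the pseudocode simply loops over $i = 1, \ldots, m$. The callback name, seed, step size, and data are hypothetical.

```python
import random


def sgd(data, loss_grad, alpha, n, epochs=1000, seed=0):
    """Stochastic gradient descent: one parameter update per example."""
    rng = random.Random(seed)
    theta = [0.0] * n
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # new random visiting order each pass (assumed, see lead-in)
        for i in order:
            x, y = data[i]
            theta = [t - alpha * d for t, d in zip(theta, loss_grad(theta, x, y))]
    return theta


def squared_loss_grad(theta, x, y):
    """Gradient of (x^T theta - y)^2 with respect to theta: 2 x (x^T theta - y)."""
    residual = sum(x_j * t_j for x_j, t_j in zip(x, theta)) - y
    return [2 * x_j * residual for x_j in x]


# Hypothetical toy data generated from y = 2x + 1, with features (x, 1).
data = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0)]
print(sgd(data, squared_loss_grad, alpha=0.02, n=2, epochs=2000))  # approximately [2.0, 1.0]
```

On this noise-free data, every per-example gradient vanishes at the solution, so a constant step converges. With noisy data, a constant step leaves $\theta$ fluctuating around the optimum, and a decreasing step size is typically used instead.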

SLIDE 30
SLIDE 31
SLIDE 32

[Figure: Power (watts) and Duration (seconds) for Fridge 1 and Fridge 2]

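SLIDE 33
  • Input features: $x^{(i)} \in \mathbb{R}^n$, $i = 1, \ldots, m$
    e.g.: $x^{(i)} \in \mathbb{R}^3 = (\text{Duration}^{(i)}, \text{Power}^{(i)}, 1)$
  • Outputs: $y^{(i)} \in \{-1, +1\}$ (a binary classification task)
    e.g.: $y^{(i)} = -1$ for Fridge 1 and $+1$ for Fridge 2
  • Model parameters: $\theta \in \mathbb{R}^n$
  • Hypothesis function: $h_\theta(x): \mathbb{R}^n \to \mathbb{R}$ returns a real-valued prediction of $y \in \{-1, +1\}$; the predicted class is $\mathrm{sign}(h_\theta(x))$
  • As before, we use linear predictors: $h_\theta(x) = x^T\theta$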

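SLIDE 34
  • Loss function for classification: $\ell: \mathbb{R} \times \{-1, +1\} \to \mathbb{R}_+$

[Figure: labels $y \in \{-1, +1\}$ plotted against input $x$]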

SLIDE 35

SLIDE 36

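SLIDE 37
  • 0-1 loss: $\ell(h_\theta(x), y) = 1$ if $y \neq \mathrm{sign}(h_\theta(x))$ and $0$ otherwise; equivalently

$$\ell(h_\theta(x), y) = \mathbf{1}\{y \cdot h_\theta(x) \le 0\}$$

[Figure: 0-1 loss plotted against $y \times h_\theta(x)$]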

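SLIDE 38
  • The 0-1 loss is hard to minimize directly (it is non-convex, and its gradient is zero wherever it is defined), so common alternatives are:
  • Hinge loss: $\ell(h_\theta(x), y) = \max\{1 - y \cdot h_\theta(x), 0\}$
  • Squared hinge loss: $\ell(h_\theta(x), y) = \max\{1 - y \cdot h_\theta(x), 0\}^2$
  • Logistic loss: $\ell(h_\theta(x), y) = \log\left(1 + e^{-y \cdot h_\theta(x)}\right)$
  • Exponential loss: $\ell(h_\theta(x), y) = e^{-y \cdot h_\theta(x)}$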
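All of these losses depend on $x$, $y$, and $\theta$ only through the margin $z = y \cdot h_\theta(x)$, so each one can be written as a one-argument function. The sketch below does that. The function names are hypothetical, and the logistic loss is rewritten for negative $z$ so that `math.exp` cannot overflow.

```python
import math

# Each loss is written as a function of the margin z = y * h_theta(x).


def zero_one(z):
    return 1.0 if z <= 0 else 0.0


def hinge(z):
    return max(1.0 - z, 0.0)


def squared_hinge(z):
    return max(1.0 - z, 0.0) ** 2


def logistic(z):
    # log(1 + e^{-z}); for z < 0 use -z + log(1 + e^{z}) so exp never overflows
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def exponential(z):
    return math.exp(-z)


for z in (-2.0, 0.0, 0.5, 2.0):
    print(z, zero_one(z), hinge(z), squared_hinge(z), round(logistic(z), 4), round(exponential(z), 4))
```

The hinge, squared hinge, and exponential losses all upper-bound the 0-1 loss. The logistic loss with a natural logarithm equals $\log 2 < 1$ at $z = 0$, so it becomes an upper bound only after rescaling, for example by using $\log_2$.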
SLIDE 39

[Figure: 0-1 loss, hinge loss, logistic loss, and exponential loss plotted against $y \times h_\theta(x)$]

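SLIDE 40
  • Support vector machine: hinge loss plus a penalty on the size of $\theta$:

$$\min_\theta \sum_{i=1}^m \max\left\{1 - y^{(i)} \cdot x^{(i)T}\theta, 0\right\} + \lambda\sum_{i=1}^n \theta_i^2$$

  • (Sub)gradient descent update:

$$\theta := \theta - \alpha\left(-\sum_{i=1}^m y^{(i)}x^{(i)}\,\mathbf{1}\left\{y^{(i)} \cdot x^{(i)T}\theta < 1\right\} + 2\lambda\theta\right)$$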
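Here is a pure-Python sketch of that update. Each example whose margin is below 1 contributes $-y^{(i)}x^{(i)}$, and the penalty contributes $2\lambda\theta$. The two-class toy data (two features plus a constant 1), $\lambda$, the step size $\alpha$, and the iteration count are all hypothetical choices for illustration.

```python
def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))


def svm_subgradient_descent(X, y, lam, alpha, iters):
    """Minimize sum_i max{1 - y_i x_i^T theta, 0} + lam * sum_j theta_j^2 by subgradient descent."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]  # gradient of the penalty lam * ||theta||^2
        for x_i, y_i in zip(X, y):
            if y_i * dot(x_i, theta) < 1:  # hinge term active: contributes -y_i x_i
                for j in range(n):
                    g[j] -= y_i * x_i[j]
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
    return theta


# Hypothetical, linearly separable toy data: (feature 1, feature 2, constant 1).
X = [[2.0, 2.0, 1.0], [3.0, 3.0, 1.0], [2.0, 3.0, 1.0],
     [-2.0, -1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -2.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
theta = svm_subgradient_descent(X, y, lam=0.01, alpha=0.01, iters=1000)
print([1 if dot(x, theta) > 0 else -1 for x in X])  # predicted labels, matching y
```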

SLIDE 41

[Figure: Power (watts) and Duration (seconds) for Fridge 1 and Fridge 2, with the classifier boundary]

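SLIDE 42
  • Logistic regression: logistic loss plus the same penalty on $\theta$:

$$\min_\theta \sum_{i=1}^m \log\left(1 + e^{-y^{(i)} \cdot x^{(i)T}\theta}\right) + \lambda\sum_{i=1}^n \theta_i^2$$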
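This objective is differentiable, so plain gradient descent applies. The gradient of $\log(1 + e^{-y\,x^T\theta})$ is $-y\,x\,\sigma(-y\,x^T\theta)$, where $\sigma(z) = 1/(1 + e^{-z})$. The sketch below uses numerically stable forms of $\log(1 + e^z)$ and $\sigma$. The data, $\lambda$, step size, iteration count, and helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))


def log1pexp(z):
    """Numerically stable log(1 + e^z)."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(theta, X, y, lam):
    """sum_i log(1 + exp(-y_i x_i^T theta)) + lam * sum_j theta_j^2."""
    return sum(log1pexp(-y_i * dot(x_i, theta)) for x_i, y_i in zip(X, y)) + lam * dot(theta, theta)


def logistic_regression_gd(X, y, lam, alpha, iters):
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]
        for x_i, y_i in zip(X, y):
            s = sigmoid(-y_i * dot(x_i, theta))  # weight of example i in the gradient
            for j in range(n):
                g[j] -= y_i * x_i[j] * s
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
    return theta


# Hypothetical toy data: (feature 1, feature 2, constant 1).
X = [[2.0, 2.0, 1.0], [3.0, 3.0, 1.0], [2.0, 3.0, 1.0],
     [-2.0, -1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -2.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
theta = logistic_regression_gd(X, y, lam=0.01, alpha=0.05, iters=500)
print(objective([0.0] * 3, X, y, 0.01), "->", objective(theta, X, y, 0.01))
```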

SLIDE 43

[Figure: plot of Power (watts) and Duration (seconds)]

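SLIDE 44
  • Probabilistic interpretation: model the probability of the label as

$$p(y \mid x; \theta) = \frac{1}{1 + \exp(-y \cdot h_\theta(x))}$$

  • Given examples $(x^{(i)}, y^{(i)})$, maximum likelihood estimation is equivalent to minimizing the logistic loss:

$$\max_\theta \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta) \equiv \min_\theta \sum_{i=1}^m \log\left(1 + \exp\left(-y^{(i)} \cdot h_\theta(x^{(i)})\right)\right)$$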
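SLIDE 45
  • Multiclass classification: $y \in \{0, 1, \ldots, k\}$
  • One-vs-all: for each class $j$, create the binary labels

$$\hat{y}^{(i)} = \begin{cases} +1 & \text{if } y^{(i)} = j \\ -1 & \text{otherwise} \end{cases}$$

    and train a binary classifier with parameters $\theta_j$
  • To classify an input $x$, predict the class $j$ with the largest $h_{\theta_j}(x)$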
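The one-vs-all recipe reduces to two small helpers. The first builds the binary labels $\hat{y}^{(i)}$ for class $j$, and the second picks the class with the largest score. This sketch assumes each $\theta_j$ has already been trained with one of the binary methods above (for example, hinge or logistic loss). The helper names and the parameter values in the usage example are hypothetical.

```python
def one_vs_all_labels(ys, j):
    """Binary labels for class j: +1 where y^(i) == j, -1 elsewhere."""
    return [1 if y == j else -1 for y in ys]


def predict_one_vs_all(thetas, x):
    """Return the class j whose linear score h_{theta_j}(x) = x^T theta_j is largest."""
    scores = {j: sum(a * b for a, b in zip(x, theta_j)) for j, theta_j in thetas.items()}
    return max(scores, key=scores.get)


print(one_vs_all_labels([0, 2, 1, 2], j=2))  # [-1, 1, -1, 1]

# Hypothetical, already-trained parameters for classes 0, 1, 2 (features end with a constant 1).
thetas = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [-1.0, -1.0, 0.5]}
print(predict_one_vs_all(thetas, [3.0, 1.0, 1.0]))    # 0
print(predict_one_vs_all(thetas, [-2.0, -2.0, 1.0]))  # 2
```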

SLIDE 46

[Figure: plot of Power (watts) and Duration (seconds)]

SLIDE 47
SLIDE 48

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data and linear regression prediction]

SLIDE 49

[Figure: Peak Hourly Demand (GW) versus High Temperature (F)]

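SLIDE 50
  • Nonlinear features, e.g.:

$$x^{(i)} \in \mathbb{R}^3 = \begin{bmatrix} (\text{High temperature}^{(i)})^2 \\ \text{High temperature}^{(i)} \\ 1 \end{bmatrix}$$

  • The hypothesis is still linear in $\theta$: $h_\theta(x) = x^T\theta$, so the same least-squares machinery applies.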

SLIDE 51

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data with a polynomial fit of degree d = 2]

SLIDE 52

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data with a polynomial fit of degree d = 4]

SLIDE 53

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data with a polynomial fit of degree d = 30]

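SLIDE 54
  • Training minimizes the loss on the observed examples,

$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right),$$

but what we actually care about is the loss on a new example $(x', y')$ that was not used for training.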
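A standard way to estimate performance on new examples is to hold out a validation set. The sketch below fits polynomial features $(t^d, \ldots, t, 1)$ of increasing degree $d$ by least squares on a training split, then compares training and validation error. The synthetic quadratic data, the even/odd split, the noise level, and the helper names are all hypothetical.

```python
import random


def poly_features(t, d):
    """(t^d, ..., t, 1): the degree-d generalization of (t^2, t, 1)."""
    return [t ** p for p in range(d, -1, -1)]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit(X, y):
    """Least squares via the normal equations (sum x x^T) theta = sum x y."""
    n = len(X[0])
    A = [[sum(x[j] * x[k] for x in X) for k in range(n)] for j in range(n)]
    b = [sum(x[j] * y_i for x, y_i in zip(X, y)) for j in range(n)]
    return solve(A, b)


def mse(theta, X, y):
    return sum((sum(a * b for a, b in zip(x, theta)) - y_i) ** 2 for x, y_i in zip(X, y)) / len(y)


def compare_degrees(train, valid, degrees):
    """Return {d: (training MSE, validation MSE)} for each polynomial degree d."""
    results = {}
    for d in degrees:
        X_tr, y_tr = [poly_features(t, d) for t, _ in train], [v for _, v in train]
        X_va, y_va = [poly_features(t, d) for t, _ in valid], [v for _, v in valid]
        theta = fit(X_tr, y_tr)
        results[d] = (mse(theta, X_tr, y_tr), mse(theta, X_va, y_va))
    return results


# Hypothetical data: y = t^2 plus Gaussian noise, split alternately into train/validation.
rng = random.Random(0)
data = [(t, t * t + rng.gauss(0.0, 0.1)) for t in (-2.0 + 4.0 * k / 29 for k in range(30))]
train, valid = data[0::2], data[1::2]
results = compare_degrees(train, valid, range(1, 6))
best_degree = min(results, key=lambda d: results[d][1])
for d, (tr, va) in results.items():
    print(f"d={d}: train {tr:.4f}, validation {va:.4f}")
print("selected degree:", best_degree)
```

Training error can only go down as $d$ grows, because each model family contains the previous one. Validation error has no such guarantee, which is why it is the quantity used to pick $d$.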

SLIDE 55

[Figure: Peak Hourly Demand (GW) versus High Temperature (F), split into a training set and a validation set]

SLIDE 56

[Figure: training set and validation set with a polynomial fit of degree d = 4]

SLIDE 57

[Figure: training set and validation set with a polynomial fit of degree d = 30]

SLIDE 58
SLIDE 59

[Figure: training and validation loss versus degree of polynomial]

SLIDE 60
SLIDE 61

θ

[Figure: Peak Hourly Demand (GW) versus High Temperature (F): observed data with a polynomial fit of degree d = 30]

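SLIDE 62
  • Regularization: penalize large parameters $\theta$ by solving

$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) + \lambda\sum_{i=1}^n \theta_i^2$$

where $\lambda \in \mathbb{R}_+$ is a free parameter that trades off a small training loss against small parameters $\theta$.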
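For the squared loss with a linear hypothesis, setting the gradient $2\sum_i x^{(i)}(x^{(i)T}\theta - y^{(i)}) + 2\lambda\theta$ to zero gives the closed form $\theta^\star = (\sum_i x^{(i)}x^{(i)T} + \lambda I)^{-1}\sum_i x^{(i)}y^{(i)}$, often called ridge regression. For $\lambda > 0$ the matrix is positive definite and therefore always invertible. The sketch below penalizes every entry of $\theta$, including the intercept, exactly as the sum above does. The toy data and helper names are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ridge_regression(X, y, lam):
    """theta* solving (sum_i x x^T + lam I) theta = sum_i x y."""
    n = len(X[0])
    A = [[sum(x[j] * x[k] for x in X) + (lam if j == k else 0.0) for k in range(n)] for j in range(n)]
    b = [sum(x[j] * y_i for x, y_i in zip(X, y)) for j in range(n)]
    return solve(A, b)


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]  # hypothetical (x, 1) features
y = [1.0, 3.0, 5.0, 7.0]                              # generated from y = 2x + 1
print(ridge_regression(X, y, 0.0))  # approximately [2.0, 1.0], plain least squares
print(ridge_regression(X, y, 1.0))  # shrunk toward 0: approximately [1.897, 0.923]
```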

SLIDE 63

[Figure: training set and validation set with a polynomial fit of degree d = 30, λ = 0]

SLIDE 64

[Figure: training set and validation set with a polynomial fit of degree d = 30, λ = 1]

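SLIDE 65
SLIDE 66
SLIDE 67
  • Kernel methods: use a hypothesis of the form

$$h_\theta(x) = \sum_{i=1}^m \theta_i K(x, x^{(i)})$$

where the kernel $K: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ measures the similarity between $x$ and the training example $x^{(i)}$. The parameters are now $\theta \in \mathbb{R}^m$, one per training example.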

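SLIDE 68
  • Nearest neighbor: predict the output of the closest training example,

$$h_\theta(x) = y^{\left(\arg\min_i \|x - x^{(i)}\|_2\right)}, \qquad \|x\|_2^2 = \sum_{i=1}^n x_i^2$$

  • The $k$-nearest-neighbor variant combines the outputs of the $k$ closest training examples.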
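Here is a pure-Python sketch of $k$-nearest-neighbor classification by majority vote. With `k=1`, it reduces to the nearest-neighbor hypothesis above. The Euclidean metric follows the slide. The vote rule, the tie-breaking (`Counter.most_common` returns the label encountered first among equal counts, i.e. the one belonging to the nearer example), and the example points are assumptions of this sketch.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote over the k training examples closest to x in Euclidean distance.

    With k = 1 this is h(x) = y^(argmin_i ||x - x^(i)||_2).
    """
    by_distance = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]  # hypothetical points
train_y = [-1, -1, +1, +1]
print(knn_predict(train_X, train_y, [0.2, 0.4]))       # -1
print(knn_predict(train_X, train_y, [5.5, 5.2], k=3))  # 1 (two of the three nearest are +1)
```

No parameters are fit here. The "model" is the stored training set itself, and all the work happens at prediction time.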

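SLIDE 69
  • Neural networks:

$$h_\theta(x) = \sigma\left(\theta_2^T \sigma(\Theta_1^T x)\right)$$

with parameters $\theta = \{\Theta_1 \in \mathbb{R}^{n \times p}, \theta_2 \in \mathbb{R}^p\}$ and $\sigma: \mathbb{R} \to \mathbb{R}$ the sigmoid $\sigma(z) = 1/(1 + \exp(-z))$, applied elementwise.

[Figure: network diagram with inputs $x_1, x_2, \ldots, x_n$, hidden units $z_1, z_2, \ldots, z_p$ connected by weights $\Theta_1$, and output $y$ connected by weights $\theta_2$]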
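The forward pass of this two-layer network can be sketched directly from the formula. The hidden units are $z = \sigma(\Theta_1^T x)$, and the output is $\sigma(\theta_2^T z)$. In this sketch, $\Theta_1$ is stored as a nested list with $n$ rows and $p$ columns. The function names and weight values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(Theta1, theta2, x):
    """h_theta(x) = sigmoid(theta2^T sigmoid(Theta1^T x)).

    Theta1: n x p nested list (row i holds the weights leaving input x_i),
    theta2: length-p list, x: length-n list. Returns (output, hidden units z).
    """
    n, p = len(Theta1), len(Theta1[0])
    z = [sigmoid(sum(Theta1[i][j] * x[i] for i in range(n))) for j in range(p)]
    output = sigmoid(sum(theta2[j] * z[j] for j in range(p)))
    return output, z


Theta1 = [[1.0, -1.0], [0.5, 2.0], [0.0, 1.0]]  # hypothetical weights, n = 3, p = 2
theta2 = [1.5, -0.5]
out, z = forward(Theta1, theta2, [1.0, 2.0, -1.0])
print(z, out)  # both hidden pre-activations equal 2, so out = sigmoid(sigmoid(2))
```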

SLIDE 70
SLIDE 71
  • Decision trees:

[Figure: a tree with split tests $x_2 \ge 2$ and $x_1 \ge -3$ and leaf predictions $h_\theta(x) = +1$, $h_\theta(x) = -1$, $h_\theta(x) = -1$]

SLIDE 72
  • Combine $k$ simpler hypotheses $h_i$ into a weighted vote:

$$h_\theta(x) = \sum_{i=1}^k \theta_i \,\mathrm{sign}(h_i(x))$$

SLIDE 73
SLIDE 74
SLIDE 75
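SLIDE 76
  • Unsupervised learning: input features $x^{(i)} \in \mathbb{R}^n$, $i = 1, \ldots, m$, with no corresponding outputs $y^{(i)}$
  • Model parameters: $\theta \in \mathbb{R}^k$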

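SLIDE 77
  • Hypothesis function $h_\theta: \mathbb{R}^n \to \mathbb{R}^n$ that approximately reconstructs its input: $h_\theta(x^{(i)}) \approx x^{(i)}$
  • Loss function $\ell: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$, e.g. $\ell(h_\theta(x), x) = \|h_\theta(x) - x\|_2^2$
  • $h_\theta$ must be restricted somehow; otherwise the identity $h_\theta(x) = x$ achieves zero loss and learns nothing.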

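SLIDE 78
  • $k$-means: represent the data with $k$ centers, $\theta = \{\mu^{(1)}, \ldots, \mu^{(k)}\}$, $\mu^{(i)} \in \mathbb{R}^n$
  • The hypothesis maps each point to its closest center:

$$h_\theta(x) = \mu^{\left(\arg\min_i \|x - \mu^{(i)}\|_2\right)}$$

  • Objective:

$$\min_\theta \sum_{i=1}^m \left\|x^{(i)} - h_\theta(x^{(i)})\right\|_2^2$$

  • This is typically minimized (approximately) by alternating two steps: assign each point to its closest center $\mu^{(i)}$, then move each $\mu^{(i)}$ to the mean of the points assigned to it.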
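The alternating procedure can be sketched in pure Python. Initialization matters for $k$-means. This sketch simply takes caller-supplied starting centers (here, the first two points), leaves a center in place if its cluster becomes empty, and stops once an iteration leaves the centers unchanged. These choices, the helper names, and the data are assumptions.

```python
def sq_dist(a, b):
    return sum((a_j - b_j) ** 2 for a_j, b_j in zip(a, b))


def kmeans(points, init_centers, max_iters=100):
    """Alternate assignment and mean-update steps until the centers stop changing."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iters):
        # Assignment step: h_theta(x) = closest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else center
            for center, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def kmeans_objective(points, centers):
    """sum_i ||x^(i) - h_theta(x^(i))||_2^2."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]  # hypothetical
centers = kmeans(points, init_centers=points[:2])
print(centers)                            # approximately [[0.333, 0.333], [10.333, 10.333]]
print(kmeans_objective(points, centers))  # approximately 2.667
```

Neither step can increase the objective, so the procedure always terminates. However, it may stop at a local minimum that depends on the starting centers.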

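SLIDE 79
  • Principal component analysis, viewed as a linear autoencoder: parameters $\theta = \{\Theta_1 \in \mathbb{R}^{n \times k}, \Theta_2 \in \mathbb{R}^{k \times n}\}$ with $k < n$
  • $h_\theta(x) = \Theta_1\Theta_2 x$, where $\Theta_2 x \in \mathbb{R}^k$ is a reduced-dimension representation of $x$
  • Objective:

$$\min_{\Theta_1, \Theta_2} \sum_{i=1}^m \left\|x^{(i)} - \Theta_1\Theta_2 x^{(i)}\right\|_2^2$$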
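For this objective, an optimal $\Theta_1\Theta_2$ projects onto the span of the top-$k$ eigenvectors of $\sum_i x^{(i)}x^{(i)T}$. Note that the objective as written does not subtract the data mean, although PCA usually does. The sketch below handles $k = 1$ with power iteration, taking $\Theta_2 = u^T$ and $\Theta_1 = u$ for the top eigenvector $u$. It assumes the all-ones starting vector is not orthogonal to $u$. The data, iteration count, and helper names are hypothetical.

```python
import math


def top_eigenvector(points, iters=200):
    """Power iteration on S = sum_i x^(i) x^(i)T (uncentered, matching the objective)."""
    n = len(points[0])
    S = [[sum(p[a] * p[b] for p in points) for b in range(n)] for a in range(n)]
    u = [1.0] * n  # assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        v = [sum(S[a][b] * u[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(v_a * v_a for v_a in v))
        u = [v_a / norm for v_a in v]
    return u


def reconstruct(u, x):
    """h(x) = Theta1 Theta2 x with Theta2 = u^T (1 x n) and Theta1 = u (n x 1)."""
    code = sum(u_a * x_a for u_a, x_a in zip(u, x))  # Theta2 x: a 1-dimensional representation
    return [u_a * code for u_a in u]


def reconstruction_error(u, points):
    """sum_i ||x^(i) - Theta1 Theta2 x^(i)||_2^2."""
    return sum(sum((x_a - r_a) ** 2 for x_a, r_a in zip(x, reconstruct(u, x))) for x in points)


points = [[1.0, 2.0], [2.0, 4.1], [-1.0, -1.9], [3.0, 6.0]]  # hypothetical, roughly along (1, 2)
u = top_eigenvector(points)
print(u)                                # close to (1, 2) / sqrt(5)
print(reconstruction_error(u, points))  # small, since the data are nearly one-dimensional
```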