15-381/781 Fall 2016: Machine Learning
Instructors: Ariel Procaccia & Emma Brunskill
Slides courtesy of …
[Figure: example training data as (input, label) pairs with labels 2, 0, 8, 5; a hypothesis $h_\theta$ maps each input to a predicted label.]
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F).]
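- $\text{Peak demand} \approx \theta_1 \cdot (\text{High temperature}) + \theta_2$, where $\theta_1$ and $\theta_2$ are parameters to be learned.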
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F): observed data and linear regression prediction.]
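- Input features: $x^{(i)} \in \mathbb{R}^n,\ i = 1, \ldots, m$
  - e.g.: $x^{(i)} \in \mathbb{R}^2 = \begin{bmatrix} \text{High temperature}^{(i)} \\ 1 \end{bmatrix}$
- Outputs: $y^{(i)} \in \mathbb{R}$
  - e.g.: $y^{(i)} \in \mathbb{R} = \text{Peak demand}^{(i)}$
- Model parameters: $\theta \in \mathbb{R}^n$
- Hypothesis function: $h_\theta(x) : \mathbb{R}^n \to \mathbb{R}$ returns a prediction of the output $y$
  - e.g. linear regression: $h_\theta(x) = x^T\theta = \sum_{i=1}^n x_i\theta_i$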
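- Loss function: $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ measures the difference between a prediction $h_\theta(x)$ and the true output $y$
  - e.g. squared loss: $\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$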
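- Given a set of examples $(x^{(i)}, y^{(i)}),\ i = 1, \ldots, m$, find the hypothesis $h_\theta$ (that is, the parameters $\theta$) that minimizes the total loss:
$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$$
- Different learning algorithms correspond to different choices of the hypothesis function $h_\theta$ and the loss function $\ell$.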
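- Linear regression uses the linear hypothesis $h_\theta(x) = x^T\theta$ and the squared loss $\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$, giving the optimization problem
$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) \equiv \min_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2$$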
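- To solve $\min_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2$, compute the gradient of the objective:
$$\nabla_\theta \sum_{i=1}^m \left(x^{(i)T}\theta - y^{(i)}\right)^2 = \sum_{i=1}^m \nabla_\theta \left(x^{(i)T}\theta - y^{(i)}\right)^2 = 2\sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta - y^{(i)}\right)$$
- Gradient descent repeats the following update with step size $\alpha$ (the constant factor 2 is absorbed into $\alpha$):
$$\theta \leftarrow \theta - \alpha \sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta - y^{(i)}\right)$$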
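A minimal pure-Python sketch of this update (no libraries assumed). The function name `gd_least_squares`, the step size, the iteration count, and the toy data are illustrative assumptions; the data lie exactly on a line so the result can be checked.

```python
# Illustrative sketch: names, step size, and toy data are hypothetical.

def gd_least_squares(X, y, alpha=0.01, iters=5000):
    """Batch gradient descent for min_theta sum_i (x_i^T theta - y_i)^2.

    Uses theta <- theta - alpha * sum_i x_i (x_i^T theta - y_i),
    i.e. the factor 2 from the gradient is folded into alpha.
    """
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x_i, y_i in zip(X, y):
            residual = sum(a * t for a, t in zip(x_i, theta)) - y_i
            for j in range(n):
                grad[j] += x_i[j] * residual
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data on the line y = 2 * temp + 1; the last feature is the constant 1.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gd_least_squares(X, y)
print(theta)  # approximately [2.0, 1.0]
```

If $\alpha$ is too large relative to the scale of $\sum_i x^{(i)}x^{(i)T}$, the iterates diverge instead of converging.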
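- Alternatively, find the minimizer in closed form: writing $f(\theta)$ for the objective, set $\nabla_\theta f(\theta) = 0$:
$$\begin{aligned}
\sum_{i=1}^m x^{(i)}\left(x^{(i)T}\theta^\star - y^{(i)}\right) = 0
&\;\Longrightarrow\; \left(\sum_{i=1}^m x^{(i)}x^{(i)T}\right)\theta^\star = \sum_{i=1}^m x^{(i)}y^{(i)} \\
&\;\Longrightarrow\; \theta^\star = \left(\sum_{i=1}^m x^{(i)}x^{(i)T}\right)^{-1}\sum_{i=1}^m x^{(i)}y^{(i)}
\end{aligned}$$
  (assuming $\sum_{i=1}^m x^{(i)}x^{(i)T}$ is invertible).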
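A sketch of the closed-form solution using plain Python lists and a small hand-written Gaussian-elimination helper (`solve_linear_system`, a hypothetical name). It solves the linear system in the middle step rather than forming the inverse explicitly, which is the numerically preferable route.

```python
# Illustrative sketch: helper names and toy data are hypothetical.

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def least_squares_closed_form(X, y):
    """Solve (sum_i x_i x_i^T) theta = sum_i x_i y_i for theta*."""
    n = len(X[0])
    A = [[sum(x[j] * x[k] for x in X) for k in range(n)] for j in range(n)]
    b = [sum(x[j] * y_i for x, y_i in zip(X, y)) for j in range(n)]
    return solve_linear_system(A, b)


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta_star = least_squares_closed_form(X, y)
print(theta_star)
```

For data lying exactly on the line $y = 2x + 1$, this recovers $\theta^\star \approx (2, 1)$.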
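- Why use the squared loss $\ell(h_\theta(x), y) = (h_\theta(x) - y)^2$? Other choices are possible:
  - Absolute loss: $\ell(h_\theta(x), y) = |h_\theta(x) - y|$
  - Deadband loss: $\ell(h_\theta(x), y) = \max\{0, |h_\theta(x) - y| - \epsilon\}$, with $\epsilon \in \mathbb{R}_+$
[Figure: squared loss, absolute loss, and deadband loss as functions of $h_\theta(x_i) - y_i$.]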
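The three losses can be written directly as functions of the prediction and the target. This sketch (hypothetical function names) makes the qualitative difference concrete: for a residual of 3 the squared loss is 9 while the absolute loss is 3, so the squared loss weights outliers much more heavily.

```python
# Illustrative sketch: function names are hypothetical.

def squared_loss(pred, y):
    return (pred - y) ** 2

def absolute_loss(pred, y):
    return abs(pred - y)

def deadband_loss(pred, y, eps):
    # Zero inside the band |pred - y| <= eps, growing linearly outside it.
    return max(0.0, abs(pred - y) - eps)

for r in (-2.0, -0.05, 0.5, 3.0):  # r = h_theta(x) - y
    print(r, squared_loss(r, 0.0), absolute_loss(r, 0.0), deadband_loss(r, 0.0, eps=0.1))
```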
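- For these losses there is generally no closed-form solution for $\theta^\star$, but gradient descent still applies; e.g. for the absolute loss (using a subgradient, since $|\cdot|$ is not differentiable at 0):
$$\theta \leftarrow \theta - \alpha \sum_{i=1}^m x^{(i)}\,\mathrm{sign}\left(x^{(i)T}\theta - y^{(i)}\right)$$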
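A minimal sketch of the subgradient update for the absolute loss, taking $\mathrm{sign}(0) = 0$. The names, step size, and data are illustrative assumptions. With only a constant feature, the minimizer of the total absolute loss is the median of the targets, which makes the result easy to check.

```python
# Illustrative sketch: names, step size, and data are hypothetical.

def sign(z):
    return (z > 0) - (z < 0)

def subgradient_descent_absolute(X, y, alpha=0.01, iters=2000):
    """Minimize sum_i |x_i^T theta - y_i| with the update
    theta <- theta - alpha * sum_i x_i * sign(x_i^T theta - y_i)."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        for x_i, y_i in zip(X, y):
            s = sign(sum(a * t for a, t in zip(x_i, theta)) - y_i)
            for j in range(n):
                g[j] += x_i[j] * s
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta

# Constant feature only: the absolute-loss minimizer is the median (3.0).
X = [[1.0]] * 5
y = [1.0, 2.0, 3.0, 10.0, 100.0]
theta = subgradient_descent_absolute(X, y)
print(theta)  # close to [3.0]; the outlier 100.0 barely matters
```

With a fixed step size the iterates end up oscillating within about $\alpha$ of the minimizer rather than settling exactly on it.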
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F): observed data with fits under squared loss, absolute loss, and deadband loss ($\epsilon = 0.1$).]
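- Probabilistic interpretation: suppose each output $y$ equals the prediction $h_\theta(x)$ plus Gaussian noise,
$$y = h_\theta(x) + \epsilon, \qquad p(\epsilon) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\epsilon^2}{2\sigma^2}\right)$$
- Then the probability of $y$ given $x$, parameterized by $\theta$, is
$$p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(h_\theta(x) - y)^2}{2\sigma^2}\right)$$
- Assuming the examples are independent,
$$p(y^{(1)}, \ldots, y^{(m)} \mid x^{(1)}, \ldots, x^{(m)}; \theta) = \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)$$
- Maximum likelihood estimation chooses $\theta$ to maximize this probability:
$$\begin{aligned}
\max_\theta \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)
&\equiv \min_\theta \; -\sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta) \\
&\equiv \min_\theta \sum_{i=1}^m \left(\log(\sqrt{2\pi}\sigma) + \frac{1}{2\sigma^2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2\right) \\
&\equiv \min_\theta \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
\end{aligned}$$
- Here $\equiv$ means the problems have the same optimal $\theta$: the term $\log(\sqrt{2\pi}\sigma)$ does not depend on $\theta$, and the positive factor $1/(2\sigma^2)$ does not change the minimizer. So least squares is maximum likelihood estimation under Gaussian noise.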
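A quick numerical check of the last equivalence, as a sketch with made-up data, $\sigma$, and candidate parameters: the negative log-likelihood differs from the sum of squared errors divided by $2\sigma^2$ only by the constant $m \log(\sqrt{2\pi}\sigma)$, so both objectives rank parameter vectors the same way.

```python
import math

# Illustrative sketch: the data, sigma, and candidate thetas are made up.

def gaussian_nll(X, y, theta, sigma):
    """-sum_i log p(y_i | x_i; theta) for y = x^T theta + eps, eps ~ N(0, sigma^2)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        pred = sum(a * t for a, t in zip(x_i, theta))
        total += math.log(math.sqrt(2 * math.pi) * sigma) + (pred - y_i) ** 2 / (2 * sigma ** 2)
    return total


def sum_squared_error(X, y, theta):
    return sum((sum(a * t for a, t in zip(x_i, theta)) - y_i) ** 2 for x_i, y_i in zip(X, y))


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.2, 2.9, 5.1, 6.8]
sigma = 0.5
const = len(X) * math.log(math.sqrt(2 * math.pi) * sigma)  # independent of theta
for theta in ([2.0, 1.0], [1.5, 2.0], [0.0, 0.0]):
    # The two printed numbers agree: NLL - const == SSE / (2 sigma^2).
    print(theta, gaussian_nll(X, y, theta, sigma) - const,
          sum_squared_error(X, y, theta) / (2 * sigma ** 2))
```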
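- Gradient descent applies to the general problem $\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$:

  function θ = Gradient-Descent({(x^(i), y^(i))}, h_θ, ℓ, α)
    Initialize: θ ← 0
    Repeat until convergence:
      g ← 0
      For i = 1, ..., m:
        g ← g + ∇_θ ℓ(h_θ(x^(i)), y^(i))
      θ ← θ − α g
    Return θ

- Each update of θ requires a pass over all m examples. Stochastic gradient descent instead updates θ after every example:

  function θ = SGD({(x^(i), y^(i))}, h_θ, ℓ, α)
    Initialize: θ ← 0
    Repeat until convergence:
      For i = 1, ..., m:
        θ ← θ − α ∇_θ ℓ(h_θ(x^(i)), y^(i))
    Return θ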
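A pure-Python sketch of both procedures, written against a generic per-example gradient `grad_loss(theta, x, y)`, a hypothetical interface standing in for $\nabla_\theta \ell(h_\theta(x), y)$. A fixed number of iterations replaces "until convergence", and the squared-loss gradient, step size, and toy data are illustrative assumptions.

```python
# Illustrative sketch: the grad_loss interface, names, and data are hypothetical.

def gradient_descent(data, grad_loss, n, alpha, iters):
    """Batch gradient descent: one update per full pass over the data."""
    theta = [0.0] * n
    for _ in range(iters):
        g = [0.0] * n
        for x_i, y_i in data:
            g = [gj + dj for gj, dj in zip(g, grad_loss(theta, x_i, y_i))]
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta


def sgd(data, grad_loss, n, alpha, epochs):
    """Stochastic gradient descent: one update per example."""
    theta = [0.0] * n
    for _ in range(epochs):
        for x_i, y_i in data:
            d = grad_loss(theta, x_i, y_i)
            theta = [t - alpha * dj for t, dj in zip(theta, d)]
    return theta


def squared_loss_grad(theta, x, y):
    # grad_theta (x^T theta - y)^2 = 2 x (x^T theta - y)
    r = sum(a * t for a, t in zip(x, theta)) - y
    return [2 * a * r for a in x]


data = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0)]
theta_gd = gradient_descent(data, squared_loss_grad, n=2, alpha=0.005, iters=5000)
theta_sgd = sgd(data, squared_loss_grad, n=2, alpha=0.005, epochs=5000)
print(theta_gd, theta_sgd)
```

Because these toy examples fit a line exactly, every per-example gradient vanishes at the solution, so SGD with a small constant step converges to it too; on noisy data a constant step leaves SGD fluctuating around the minimizer.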
[Figure: Duration (seconds) vs. Power (watts) for two appliances, Fridge 1 and Fridge 2.]
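- Input features: $x^{(i)} \in \mathbb{R}^n,\ i = 1, \ldots, m$
  - e.g.: $x^{(i)} \in \mathbb{R}^3 = (\text{Power}^{(i)}, \text{Duration}^{(i)}, 1)$
- Outputs: $y^{(i)} \in \{-1, +1\}$
  - e.g.: $y^{(i)}$ indicates which fridge (Fridge 1 or Fridge 2) example $i$ comes from
- Model parameters: $\theta \in \mathbb{R}^n$
- Hypothesis function: $h_\theta(x) : \mathbb{R}^n \to \mathbb{R}$
  - It returns a real number although $y$ is $-1$ or $+1$; the predicted class is $\mathrm{sign}(h_\theta(x))$
  - e.g. linear classifier: $h_\theta(x) = x^T\theta$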
- Loss function: $\ell : \mathbb{R} \times \{-1, +1\} \to \mathbb{R}_+$ measures how well the real-valued prediction $h_\theta(x)$ agrees with the label $y \in \{-1, +1\}$
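- e.g. 0-1 loss:
$$\ell(h_\theta(x), y) = \begin{cases} 0 & \text{if } y = \mathrm{sign}(h_\theta(x)) \\ 1 & \text{otherwise} \end{cases} = \mathbf{1}\{y \cdot h_\theta(x) \le 0\}$$
[Figure: 0-1 loss as a function of $y \times h_\theta(x)$.]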
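- Other classification losses, also functions of $y \cdot h_\theta(x)$:
  - Hinge loss: $\ell(h_\theta(x), y) = \max\{1 - y \cdot h_\theta(x), 0\}$
  - Squared hinge loss: $\ell(h_\theta(x), y) = \max\{1 - y \cdot h_\theta(x), 0\}^2$
  - Logistic loss: $\ell(h_\theta(x), y) = \log\left(1 + e^{-y \cdot h_\theta(x)}\right)$
  - Exponential loss: $\ell(h_\theta(x), y) = e^{-y \cdot h_\theta(x)}$
[Figure: 0-1 loss, hinge loss, logistic loss, and exponential loss as functions of $y \times h_\theta(x)$.]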
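The classification losses as functions of the real-valued prediction $h = h_\theta(x)$ and the label $y \in \{-1, +1\}$, in a sketch with hypothetical names. `math.log1p(z)` computes $\log(1 + z)$.

```python
import math

# Illustrative sketch: function names are hypothetical; h = h_theta(x), y in {-1, +1}.

def zero_one_loss(h, y):
    return 1.0 if y * h <= 0 else 0.0

def hinge_loss(h, y):
    return max(1.0 - y * h, 0.0)

def squared_hinge_loss(h, y):
    return max(1.0 - y * h, 0.0) ** 2

def logistic_loss(h, y):
    return math.log1p(math.exp(-y * h))  # log(1 + e^{-y h})

def exponential_loss(h, y):
    return math.exp(-y * h)

# Each loss as a function of the margin y * h (here with y = +1).
for margin in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(margin, zero_one_loss(margin, 1), hinge_loss(margin, 1),
          logistic_loss(margin, 1), exponential_loss(margin, 1))
```

Unlike the 0-1 loss, the other four decrease smoothly or piecewise-linearly in the margin, which gives gradient-based methods a useful descent direction.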
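- Support vector machine: hinge loss plus a regularization term with weight $\lambda$,
$$\min_\theta \sum_{i=1}^m \max\{1 - y^{(i)} \cdot x^{(i)T}\theta, 0\} + \lambda\sum_{i=1}^n \theta_i^2$$
- (Sub)gradient descent update:
$$\theta := \theta - \alpha\left(-\sum_{i=1}^m y^{(i)}x^{(i)}\,\mathbf{1}\{y^{(i)} \cdot x^{(i)T}\theta < 1\} + 2\lambda\theta\right)$$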
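A sketch of this subgradient-descent update for the regularized hinge loss. The toy data, $\lambda$, $\alpha$, and iteration count are assumptions (this is not the fridge dataset); as in the objective above, the penalty also applies to the weight on the constant (bias) feature.

```python
# Illustrative sketch: names, toy data, and hyperparameters are hypothetical.

def train_linear_svm(X, y, lam=0.01, alpha=0.01, iters=1000):
    """Subgradient descent on sum_i max{1 - y_i x_i^T theta, 0} + lam * sum_j theta_j^2."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]  # gradient of the regularizer
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(a * t for a, t in zip(x_i, theta))
            if margin < 1:  # hinge term is active for this example
                for j in range(n):
                    g[j] -= y_i * x_i[j]
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta


def predict(theta, x):
    # Predicted class is sign(h_theta(x)); ties (h = 0) are sent to -1 here.
    return 1 if sum(a * t for a, t in zip(x, theta)) > 0 else -1


# Each row is (feature_1, feature_2, 1); the trailing 1 acts as a bias feature.
X = [[2, 2, 1], [3, 3, 1], [2, 3, 1], [-2, -2, 1], [-3, -2, 1], [-2, -3, 1]]
y = [1, 1, 1, -1, -1, -1]
theta = train_linear_svm(X, y)
print([predict(theta, x) for x in X])
```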
[Figure: Fridge 1 and Fridge 2 data, Duration (seconds) vs. Power (watts), with the classifier boundary.]
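- Logistic regression: logistic loss with the same regularization term,
$$\min_\theta \sum_{i=1}^m \log\left(1 + e^{-y^{(i)} \cdot x^{(i)T}\theta}\right) + \lambda\sum_{i=1}^n \theta_i^2$$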
[Figure: Duration (seconds) vs. Power (watts) for the fridge data.]
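- Probabilistic interpretation of logistic regression: model
$$p(y \mid x; \theta) = \frac{1}{1 + \exp(-y \cdot h_\theta(x))}$$
- Given examples $(x^{(i)}, y^{(i)})$, maximum likelihood estimation gives
$$\min_\theta \; -\sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta) \equiv \min_\theta \sum_{i=1}^m \log\left(1 + \exp\left(-y^{(i)} \cdot h_\theta(x^{(i)})\right)\right)$$
  which is exactly the logistic loss.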
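A sketch connecting the probability model to the logistic loss, plus plain gradient descent on the regularized objective. It uses $\frac{d}{dh}\log(1 + e^{-y h}) = -y/(1 + e^{y h})$; the names, data, $\lambda$, and $\alpha$ are illustrative assumptions, and the plain `math.exp` calls would overflow for very large $|h|$.

```python
import math

# Illustrative sketch: names, toy data, and hyperparameters are hypothetical.

def prob(y, h):
    """p(y | x; theta) = 1 / (1 + exp(-y * h)), with h = h_theta(x), y in {-1, +1}."""
    return 1.0 / (1.0 + math.exp(-y * h))


def logistic_loss(h, y):
    return math.log1p(math.exp(-y * h))  # equals -log prob(y, h)


def train_logistic_regression(X, y, lam=0.01, alpha=0.05, iters=2000):
    """Gradient descent on sum_i log(1 + exp(-y_i x_i^T theta)) + lam * ||theta||^2."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]
        for x_i, y_i in zip(X, y):
            h = sum(a * t for a, t in zip(x_i, theta))
            w = 1.0 / (1.0 + math.exp(y_i * h))  # magnitude of d loss / d h
            for j in range(n):
                g[j] -= y_i * x_i[j] * w
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta


X = [[2, 2, 1], [3, 3, 1], [2, 3, 1], [-2, -2, 1], [-3, -2, 1], [-2, -3, 1]]
y = [1, 1, 1, -1, -1, -1]
theta = train_logistic_regression(X, y)
print([prob(1, sum(a * t for a, t in zip(x, theta))) for x in X])
```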
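- Multiclass classification, $y \in \{0, 1, \ldots, k\}$: the one-vs-all approach builds, for each class $j$, a binary problem with labels
$$\hat{y}^{(i)} = \begin{cases} 1 & \text{if } y^{(i)} = j \\ -1 & \text{otherwise} \end{cases}$$
- Train a separate parameter vector $\theta_j$ for each class $j$, and predict the class of $x$ as the $j$ with the largest $h_{\theta_j}(x)$.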
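A sketch of one-vs-all on top of a binary linear classifier. As an assumption, each binary problem is trained with the regularized hinge loss by subgradient descent; any binary learner that produces a real-valued score would work. The names and toy data are hypothetical.

```python
# Illustrative sketch: names, toy data, and hyperparameters are hypothetical.

def train_hinge(X, labels, lam=0.01, alpha=0.01, iters=3000):
    """Binary linear classifier: subgradient descent on hinge loss + lam * ||theta||^2."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]
        for x_i, y_i in zip(X, labels):
            if y_i * sum(a * t for a, t in zip(x_i, theta)) < 1:
                for j in range(n):
                    g[j] -= y_i * x_i[j]
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta


def one_vs_all(X, y, classes):
    """Train one theta_j per class j on labels y_hat = +1 if y == j else -1."""
    thetas = {}
    for j in classes:
        y_hat = [1 if y_i == j else -1 for y_i in y]
        thetas[j] = train_hinge(X, y_hat)
    return thetas


def predict_class(thetas, x):
    # Pick the class whose classifier gives the largest score x^T theta_j.
    return max(thetas, key=lambda j: sum(a * t for a, t in zip(x, thetas[j])))


X = [[0, 5, 1], [1, 6, 1], [-1, 6, 1],       # class 0
     [5, 0, 1], [6, 1, 1], [6, -1, 1],       # class 1
     [-5, -5, 1], [-6, -4, 1], [-4, -6, 1]]  # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
thetas = one_vs_all(X, y, classes=[0, 1, 2])
print([predict_class(thetas, x) for x in X])
```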
[Figure: Duration (seconds) vs. Power (watts).]
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F) over a wider range of temperatures.]
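- Nonlinear features, e.g.
$$x^{(i)} \in \mathbb{R}^3 = \begin{bmatrix} (\text{High temperature}^{(i)})^2 \\ \text{High temperature}^{(i)} \\ 1 \end{bmatrix}$$
  still with the linear hypothesis $h_\theta(x) = x^T\theta$ (linear in $\theta$, but quadratic in the temperature).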
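A sketch of the feature map, with a hypothetical helper name and made-up parameters: the model stays linear in $\theta$, and only the features change.

```python
# Illustrative sketch: helper names and parameter values are hypothetical.

def poly_features(t, d):
    """Map a scalar input t to the feature vector [t^d, ..., t^2, t, 1]."""
    return [t ** p for p in range(d, -1, -1)]

def h(theta, x):
    # Still the linear hypothesis h_theta(x) = x^T theta, now over nonlinear features.
    return sum(a * b for a, b in zip(x, theta))

x = poly_features(80.0, 2)   # [6400.0, 80.0, 1.0]
theta = [0.001, -0.1, 4.0]   # made-up parameters, not fitted to data
print(h(theta, x))           # a quadratic function of the temperature 80.0
```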
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F), observed data with polynomial fits of degree d = 2, d = 4, and d = 30.]
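- Training minimizes the loss on the training examples, $\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$, but the real goal is for $h_\theta$ to predict well on a new example $(x', y')$ not seen during training. A hypothesis $h_\theta$ with low training loss can still predict poorly on new data (overfitting).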
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F), data split into a training set and a validation set, shown alone and with polynomial fits of degree d = 4 and d = 30.]
[Figure: training loss and validation loss (log scale) vs. degree of polynomial.]
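A sketch of the hold-out procedure shown in these plots: fit each candidate degree on the training set and keep the one with the lowest validation loss. The helper names, the noise-free toy data, and the candidate degrees are assumptions chosen so the outcome is predictable.

```python
# Illustrative sketch: helper names, toy data, and candidate degrees are hypothetical.

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def poly_features(t, d):
    return [t ** p for p in range(d, -1, -1)]


def fit_least_squares(X, y):
    n = len(X[0])
    A = [[sum(x[j] * x[k] for x in X) for k in range(n)] for j in range(n)]
    b = [sum(x[j] * y_i for x, y_i in zip(X, y)) for j in range(n)]
    return solve_linear_system(A, b)


def mse(theta, X, y):
    return sum((sum(a * t for a, t in zip(x, theta)) - y_i) ** 2 for x, y_i in zip(X, y)) / len(y)


def select_degree(t_train, y_train, t_val, y_val, degrees):
    """Fit each degree on the training set; keep the lowest validation loss."""
    best = None
    for d in degrees:
        theta = fit_least_squares([poly_features(t, d) for t in t_train], y_train)
        val_loss = mse(theta, [poly_features(t, d) for t in t_val], y_val)
        if best is None or val_loss < best[1]:
            best = (d, val_loss)
    return best


# Toy data from y = t^2 - 2t + 3: even t for training, odd t for validation.
t_all = list(range(10))
y_all = [t * t - 2 * t + 3 for t in t_all]
t_train, y_train = t_all[0::2], y_all[0::2]
t_val, y_val = t_all[1::2], y_all[1::2]
best_degree, best_loss = select_degree(t_train, y_train, t_val, y_val, degrees=[1, 2])
print(best_degree, best_loss)
```

Raw powers $t^d$ become badly conditioned as $d$ grows (e.g. $d = 30$), so in practice the inputs are rescaled or a more stable solver is used.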
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F), observed data with the d = 30 polynomial fit.]
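- Regularization: add a penalty on the size of the parameters,
$$\min_\theta \sum_{i=1}^m \ell\left(h_\theta(x^{(i)}), y^{(i)}\right) + \lambda\sum_{i=1}^n \theta_i^2$$
  where $\lambda \in \mathbb{R}_+$ controls the trade-off between fitting the training data and keeping $\theta$ small.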
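A sketch of regularized least squares by gradient descent on exactly this objective (squared loss plus $\lambda \sum_i \theta_i^2$). The names, step size, and data are illustrative assumptions; larger $\lambda$ shrinks $\|\theta\|$.

```python
# Illustrative sketch: names, step size, and toy data are hypothetical.

def ridge_gradient_descent(X, y, lam, alpha=0.005, iters=5000):
    """Gradient descent on sum_i (x_i^T theta - y_i)^2 + lam * sum_j theta_j^2.
    Gradient: 2 * sum_i x_i (x_i^T theta - y_i) + 2 * lam * theta."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        g = [2 * lam * t for t in theta]
        for x_i, y_i in zip(X, y):
            r = sum(a * t for a, t in zip(x_i, theta)) - y_i
            for j in range(n):
                g[j] += 2 * x_i[j] * r
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta


X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
for lam in (0.0, 1.0, 10.0):
    theta = ridge_gradient_descent(X, y, lam)
    print(lam, theta, sum(t * t for t in theta))  # squared norm shrinks as lam grows
```

Setting the gradient to zero gives $\left(\sum_i x^{(i)}x^{(i)T} + \lambda I\right)\theta = \sum_i x^{(i)}y^{(i)}$; for $\lambda = 1$ on this data that is $\theta = (74/39, 36/39) \approx (1.897, 0.923)$, which the iterates approach.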
[Figure: Peak Hourly Demand (GW) vs. High Temperature (F), training and validation sets with the d = 30 polynomial fit, for λ = 0 and for λ = 1.]
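- Kernel-based hypothesis:
$$h_\theta(x) = \sum_{i=1}^m \theta_i K(x, x^{(i)})$$
  where $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a kernel function that measures the similarity between $x$ and $x^{(i)}$. The parameters are $\theta \in \mathbb{R}^m$, one per training example.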
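A sketch of the kernel hypothesis. The slides leave $K$ unspecified; as an assumption this uses the Gaussian (RBF) kernel $\exp(-\|x - z\|_2^2 / (2b^2))$ with bandwidth $b$, and the $\theta$ values are illustrative rather than fitted.

```python
import math

# Illustrative sketch: kernel choice, names, and parameter values are hypothetical.

def gaussian_kernel(x, z, bandwidth=1.0):
    """One common choice of K; the slides do not fix a particular kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def kernel_predict(theta, X_train, x, kernel=gaussian_kernel):
    """h_theta(x) = sum_i theta_i K(x, x_i), with one parameter per training example."""
    return sum(t * kernel(x, x_i) for t, x_i in zip(theta, X_train))


X_train = [[0.0], [1.0], [2.0]]
theta = [1.0, -1.0, 0.5]  # illustrative values, not fitted
print(kernel_predict(theta, X_train, [1.0]))
```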
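- Nearest neighbor: predict the output of the closest training example,
$$h_\theta(x) = y^{\left(\arg\min_i \|x - x^{(i)}\|_2\right)}, \qquad \|x\|_2^2 = \sum_{i=1}^n x_i^2$$
- k-nearest neighbors generalizes this by using the k closest training examples.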
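A sketch of (k-)nearest-neighbor prediction using `math.dist` (Python 3.8+) for the Euclidean distance. Averaging the k nearest outputs is an assumption suited to regression; for labels in $\{-1, +1\}$ with odd k, taking the sign of that average gives a majority vote.

```python
import math

# Illustrative sketch: names and toy data are hypothetical.

def knn_predict(X_train, y_train, x, k=1):
    """Average the outputs of the k training points closest to x.
    With k = 1 this is h(x) = y^(argmin_i ||x - x^(i)||_2)."""
    dists = [math.dist(x, x_i) for x_i in X_train]
    nearest = sorted(range(len(X_train)), key=lambda i: dists[i])[:k]
    return sum(y_train[i] for i in nearest) / k


X_train = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
y_train = [1.0, 2.0, 5.0]
print(knn_predict(X_train, y_train, [0.2, 0.0], k=1))
print(knn_predict(X_train, y_train, [0.2, 0.0], k=2))
```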
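- Neural network with one hidden layer:
$$h_\theta(x) = \sigma\left(\theta_2^T \sigma(\Theta_1^T x)\right)$$
  with parameters $\theta = \{\Theta_1 \in \mathbb{R}^{n \times p}, \theta_2 \in \mathbb{R}^p\}$ and $\sigma : \mathbb{R} \to \mathbb{R}$, $\sigma(z) = 1/(1 + \exp(-z))$, applied elementwise.
[Figure: network diagram with inputs $x_1, x_2, \ldots, x_n$, hidden units $z_1, z_2, \ldots, z_p$ (weights $\Theta_1$), and output $y$ (weights $\theta_2$).]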
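A sketch of the forward pass $h_\theta(x) = \sigma(\theta_2^T \sigma(\Theta_1^T x))$, storing $\Theta_1$ as a list of $n$ rows of length $p$. The names and weights are illustrative assumptions.

```python
import math

# Illustrative sketch: names and weights are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def two_layer_forward(Theta1, theta2, x):
    """h_theta(x) = sigma(theta2^T sigma(Theta1^T x)).
    Theta1 is n x p (n rows of length p); theta2 has length p."""
    n, p = len(Theta1), len(Theta1[0])
    z = [sigmoid(sum(Theta1[i][j] * x[i] for i in range(n))) for j in range(p)]  # hidden layer
    return sigmoid(sum(w * zj for w, zj in zip(theta2, z)))


Theta1 = [[0.5, -1.0], [1.0, 0.25]]  # n = 2 inputs, p = 2 hidden units
theta2 = [1.0, -2.0]
print(two_layer_forward(Theta1, theta2, [1.0, 2.0]))
```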
[Figure: decision tree with tests $x_2 \ge 2$ and $x_1 \ge -3$ and leaves $h_\theta(x) = +1$, $h_\theta(x) = -1$, $h_\theta(x) = -1$.]
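- Weighted combination of $k$ hypotheses $h_1, \ldots, h_k$:
$$h_\theta(x) = \sum_{i=1}^k \theta_i\,\mathrm{sign}\left(h_i(x)\right)$$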
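A sketch of the weighted combination, with hypothetical single-feature threshold rules ("decision stumps") as the base hypotheses $h_i$ and made-up weights $\theta_i$. Following the classification convention above, the final class is the sign of the combined score.

```python
# Illustrative sketch: base hypotheses and weights are hypothetical.

def sign(z):
    return (z > 0) - (z < 0)


def ensemble_score(weights, hypotheses, x):
    """h_theta(x) = sum_i theta_i * sign(h_i(x)); predict with sign(h_theta(x))."""
    return sum(w * sign(h(x)) for w, h in zip(weights, hypotheses))


hypotheses = [
    lambda x: x[1] - 2,  # +1 if x2 > 2
    lambda x: x[0] + 3,  # +1 if x1 > -3
    lambda x: -x[0],     # +1 if x1 < 0
]
weights = [0.5, 0.3, 0.2]
score = ensemble_score(weights, hypotheses, [1.0, 5.0])
print(score, sign(score))
```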
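- Unsupervised learning: input features $x^{(i)} \in \mathbb{R}^n,\ i = 1, \ldots, m$, with no outputs
- Model parameters: $\theta \in \mathbb{R}^k$
- Hypothesis function: $h_\theta : \mathbb{R}^n \to \mathbb{R}^n$ that approximately reconstructs its input, $h_\theta(x^{(i)}) \approx x^{(i)}$
- Loss function: $\ell : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$, e.g. $\ell(h_\theta(x), x) = \|h_\theta(x) - x\|_2^2$
- The trivial solution $h_\theta(x) = x$ must be ruled out, so $h_\theta$ is restricted to a simpler class of functions.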
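- k-means: represent each point by one of $k$ cluster centers, with parameters $\theta = \{\mu^{(1)}, \ldots, \mu^{(k)}\}$, $\mu^{(i)} \in \mathbb{R}^n$, and
$$h_\theta(x) = \mu^{\left(\arg\min_i \|x - \mu^{(i)}\|_2\right)}$$
- Objective:
$$\min_\theta \sum_{i=1}^m \left\|x^{(i)} - h_\theta(x^{(i)})\right\|_2^2$$
  solved (approximately) by alternating between assigning each point to its closest $\mu^{(i)}$ and updating each $\mu^{(i)}$ to the mean of the points assigned to it.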
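A sketch of this alternating procedure (often called Lloyd's algorithm). As a simplifying assumption, the centers start at the first $k$ points, whereas random initialization is more common; an empty cluster keeps its previous center. The names and toy data are hypothetical.

```python
# Illustrative sketch: names, initialization, and toy data are hypothetical.

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(X, k, iters=100):
    """Alternate between nearest-center assignment and moving centers to cluster means."""
    centers = [list(x) for x in X[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda c: sq_dist(x, centers[c]))
            clusters[j].append(x)
        new_centers = []
        for j in range(k):
            if clusters[j]:
                new_centers.append([sum(col) / len(clusters[j]) for col in zip(*clusters[j])])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return centers


def reconstruction_loss(X, centers):
    # sum_i ||x_i - h(x_i)||^2 where h(x) is the nearest center
    return sum(min(sq_dist(x, c) for c in centers) for x in X)


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers = k_means(X, 2)
print(sorted(centers), reconstruction_loss(X, centers))
```

Each step cannot increase the objective, so the procedure terminates, but it may stop at a local minimum that depends on the initialization.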
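- Principal component analysis: parameters $\theta = \{\Theta_1 \in \mathbb{R}^{n \times k}, \Theta_2 \in \mathbb{R}^{k \times n}\}$ with $k < n$, and
$$h_\theta(x) = \Theta_1\Theta_2 x$$
  where $\Theta_2 x \in \mathbb{R}^k$ is a lower-dimensional representation of $x$.
- Objective:
$$\min_{\Theta_1, \Theta_2} \sum_{i=1}^m \left\|x^{(i)} - \Theta_1\Theta_2 x^{(i)}\right\|_2^2$$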
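A sketch for $k = 1$: an optimal choice is $\Theta_1 = u$, $\Theta_2 = u^T$, where $u$ is the top eigenvector of $\sum_i x^{(i)}x^{(i)T}$ (no mean-centering, matching the objective above). Computing $u$ by power iteration from the all-ones vector is an assumption of this sketch; it fails if that start vector is orthogonal to $u$.

```python
import math

# Illustrative sketch: names, start vector, and toy data are hypothetical.

def top_principal_direction(X, iters=200):
    """Power iteration on C = sum_i x_i x_i^T; returns a unit vector u."""
    n = len(X[0])
    C = [[sum(x[a] * x[b] for x in X) for b in range(n)] for a in range(n)]
    u = [1.0] * n
    for _ in range(iters):
        v = [sum(C[a][b] * u[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(vi * vi for vi in v))  # zero only if u is in C's null space
        u = [vi / norm for vi in v]
    return u


def reconstruction_error(X, u):
    """sum_i ||x_i - u (u^T x_i)||^2, i.e. Theta1 = u, Theta2 = u^T."""
    total = 0.0
    for x in X:
        proj = sum(ui * xi for ui, xi in zip(u, x))  # 1-D representation Theta2 x
        total += sum((xi - ui * proj) ** 2 for xi, ui in zip(x, u))
    return total


X = [[2.0, 1.0], [4.0, 2.1], [-2.0, -0.9], [1.0, 0.6]]
u = top_principal_direction(X)
print(u, reconstruction_error(X, u))
```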