Midterm Review
Jia-Bin Huang Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Administrative
HW 2 due today.
HW 3 released tonight. Due March 25.
Final project
Midterm

HW 3: Multi-Layer Neural Network
1) Forward function of FC and ReLU
2) Backward function of FC and ReLU
3) Loss function (Softmax)
4) Construction of a two-layer network
5) Updating the weights by minimizing the loss
6) Construction of a multi-layer network
7) Final prediction and test accuracy
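A rough NumPy sketch (not the official HW 3 starter code; the function names and signatures are assumptions) of the modular forward/backward pieces that steps 1)-3) ask for:

import numpy as np

def fc_forward(x, w, b):
    # Fully-connected layer: out = x @ w + b; cache inputs for the backward pass.
    return x @ w + b, (x, w)

def fc_backward(dout, cache):
    x, w = cache
    dx = dout @ w.T              # gradient w.r.t. the layer input
    dw = x.T @ dout              # gradient w.r.t. the weights
    db = dout.sum(axis=0)        # gradient w.r.t. the bias
    return dx, dw, db

def relu_forward(x):
    return np.maximum(0, x), x   # cache x to remember where the ReLU was active

def relu_backward(dout, x):
    return dout * (x > 0)        # pass gradients only where the input was positive

def softmax_loss(scores, y):
    # Numerically stable softmax + average cross-entropy, plus the gradient w.r.t. the scores.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = scores.shape[0]
    loss = -np.log(probs[np.arange(n), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1.0
    return loss, dscores / n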
Data
Source: CS229 @ Stanford
instructor/TA/faculty review
Consider the following dataset $D$ in one-dimensional space, where $x^{(i)}, y^{(i)} \in \mathbb{R}$, $i \in \{1, 2, \ldots, |D|\}$:
$x^{(1)} = 0$, $y^{(1)} = -1$
$x^{(2)} = 1$, $y^{(2)} = $
$x^{(3)} = 2$, $y^{(3)} = 4$
We optimize the following program:
$\operatorname{argmin}_{\theta_0, \theta_1} \sum_{(x^{(i)}, y^{(i)}) \in D} \left( y^{(i)} - \theta_0 - \theta_1 x^{(i)} \right)^2$
(1) Please find the optimal $\theta_0^*, \theta_1^*$ given the dataset above. Show all the work.
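For checking the answer, the closed-form solution of one-dimensional least squares (a standard identity obtained by setting the partial derivatives of the objective to zero) is:

$\theta_1^* = \frac{\sum_i \left(x^{(i)} - \bar{x}\right)\left(y^{(i)} - \bar{y}\right)}{\sum_i \left(x^{(i)} - \bar{x}\right)^2}, \qquad \theta_0^* = \bar{y} - \theta_1^* \bar{x}, \qquad \text{where } \bar{x} = \frac{1}{|D|}\sum_i x^{(i)},\ \bar{y} = \frac{1}{|D|}\sum_i y^{(i)}.$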
Naive Bayes practice: estimate each parameter (one binary class variable and three binary features):
$P(Y = 1) = $
$P(X_1 = 1 \mid Y = 1) = $    $P(X_1 = 1 \mid Y = 0) = $
$P(X_2 = 1 \mid Y = 1) = $    $P(X_2 = 1 \mid Y = 0) = $
$P(X_3 = 1 \mid Y = 1) = $    $P(X_3 = 1 \mid Y = 0) = $
Given a dataset $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})\}$, the cost function for logistic regression is
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$,
where the hypothesis is $h_\theta(x) = \frac{1}{1 + \exp(-\theta^\top x)}$.
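For reference when answering the questions below, the gradient of this cost with respect to $\theta_j$ (a standard result, stated here without derivation) is:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$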
Questions:
Understand these concepts: loss function, margin.
[Figure: two-class data in the $(x_1, x_2)$ plane showing the margin]
Nearest neighbor classifier
Given a dataset $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)})\}$
Training: none (do nothing).
Prediction: $\hat{y} = h(x_{\text{test}}) = y^{(k)}$, where $k = \operatorname{argmin}_i\ D(x_{\text{test}}, x^{(i)})$
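A minimal NumPy sketch of this training-free prediction rule, assuming Euclidean distance for $D$ (the function name is illustrative):

import numpy as np

def nn_predict(X_train, y_train, x_test):
    dists = np.linalg.norm(X_train - x_test, axis=1)  # D(x_test, x^(i)) for every i
    k = np.argmin(dists)                              # index of the nearest neighbor
    return y_train[k]                                 # y_hat = y^(k)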
Linear regression
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$
Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
1) Gradient descent: Repeat $\{\ \theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}\ \}$
2) Solving the normal equation: $\theta = (X^\top X)^{-1} X^\top y$
Prediction: $\hat{y} = h_\theta(x_{\text{test}}) = \theta^\top x_{\text{test}}$
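A rough NumPy sketch of both training options and the prediction step (illustrative names; $X$ is assumed to carry a leading column of ones so that theta[0] is the bias term):

import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m      # (1/m) sum_i (h_theta(x^(i)) - y^(i)) x_j^(i)
        theta -= alpha * grad
    return theta

def normal_equation(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)  # theta = (X^T X)^{-1} X^T y, without an explicit inverse

def predict(theta, x_test):
    return theta @ x_test                     # y_hat = theta^T x_test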
Model $P(X_i \mid Y = y_k)$ (Categorical, Normal, etc.)
Regression - Solution differs because of objective function
Naive Bayes classifier: $h(x) = P(Y \mid X_1, X_2, \cdots, X_d) \propto P(Y) \prod_i P(X_i \mid Y)$
Maximum likelihood estimation: $J(\theta) = -\log P(\text{Data} \mid \theta)$
Maximum a posteriori estimation: $J(\theta) = -\log P(\text{Data} \mid \theta)\, P(\theta)$
Parameters: $\pi_k = P(Y = y_k)$;
(discrete $X_i$) $\theta_{ijk} = P(X_i = x_{ij} \mid Y = y_k)$;
(continuous $X_i$) mean $\mu_{ik}$, variance $\sigma_{ik}^2$, with $P(X_i \mid Y = y_k) = \mathcal{N}(X_i \mid \mu_{ik}, \sigma_{ik}^2)$
Prediction: $\hat{y} \leftarrow \operatorname{argmax}_{y_k}\ P(Y = y_k) \prod_i P(X_i^{\text{test}} \mid Y = y_k)$
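A minimal Gaussian Naive Bayes sketch in NumPy (continuous features, illustrative function names), following the estimation and prediction formulas above:

import numpy as np

def fit_gnb(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}               # P(Y = y_k)
    means  = {c: X[y == c].mean(axis=0) for c in classes}        # mu_ik
    varis  = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # sigma_ik^2 (small value added for stability)
    return classes, priors, means, varis

def predict_gnb(x_test, classes, priors, means, varis):
    def log_joint(c):
        # log P(Y = c) + sum_i log N(x_i | mu_ic, sigma_ic^2)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * varis[c])
                                + (x_test - means[c]) ** 2 / varis[c])
        return np.log(priors[c]) + log_lik
    return max(classes, key=log_joint)                           # argmax over y_k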
Sigmoid / logistic function: $g(z) = \frac{1}{1 + e^{-z}}$
The log-likelihood is concave, so there is a single maximum.
We can use linear regression and do linear fits on non-linear data transforms!
Logistic regression
Hypothesis: $h_\theta(x) = P(Y = 1 \mid X_1, X_2, \cdots, X_d) = \frac{1}{1 + e^{-\theta^\top x}}$
Cost function: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\!\left(h_\theta(x^{(i)}), y^{(i)}\right)$, where
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log h_\theta(x) & \text{if } y = 1 \\ -\log\!\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$
Gradient descent: Repeat $\{\ \theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}\ \}$
Prediction: $\hat{y} = h_\theta(x_{\text{test}}) = \frac{1}{1 + e^{-\theta^\top x_{\text{test}}}}$
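A rough NumPy sketch of logistic regression trained with this update rule (illustrative names; $X$ again carries a leading column of ones):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, alpha=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)                     # h_theta(x^(i)) for all examples
        theta -= alpha * (X.T @ (h - y)) / m       # same update form as linear regression
    return theta

def predict_logreg(theta, x_test):
    return sigmoid(theta @ x_test)                 # y_hat = P(Y = 1 | x_test)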
Practice: What classifier(s) for this data? Why?
[Figure: scatter plot of two-class data in the $(x_1, x_2)$ plane]
Practice: What classifier for this data? Why?
[Figure: scatter plot of two-class data in the $(x_1, x_2)$ plane]
Maximum likelihood estimation: choose $\theta$ that maximizes the probability of the observed data
$\hat{\theta}_{\text{MLE}} = \operatorname{argmax}_{\theta}\ P(\text{Data} \mid \theta)$
Maximum a posteriori estimation: choose $\theta$ that is most probable given the prior probability and the data
$\hat{\theta}_{\text{MAP}} = \operatorname{argmax}_{\theta}\ P(\theta \mid \text{Data}) = \operatorname{argmax}_{\theta}\ \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}$
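One standard illustration of the difference (not necessarily an example worked in lecture): with a Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2 I)$, MAP estimation reduces to L2-regularized maximum likelihood, since the log prior contributes a squared-norm penalty:

$\hat{\theta}_{\text{MAP}} = \operatorname{argmax}_{\theta}\ \log P(\text{Data} \mid \theta) + \log P(\theta) = \operatorname{argmin}_{\theta}\ -\log P(\text{Data} \mid \theta) + \frac{1}{2\tau^2} \lVert \theta \rVert_2^2$ (dropping constants independent of $\theta$).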
[Figure: training and validation error vs. training iterations, illustrating underfitting and overfitting]
[Figure: two-class data in the $(x_1, x_2)$ plane with the margin]
Neural networks: input, hidden layer, pre-activation, activation (ReLU, Sigmoid, Softmax); parameters: weight, bias
Training: gradient descent, back-propagation, initialization
[Diagram: two-layer neural network with inputs $x_0, x_1, x_2, x_3$, hidden units $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$, and output $h_\Theta(x)$]
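A minimal sketch of the forward pass this diagram depicts, assuming sigmoid activations; the weights below are random placeholders, not values from the lecture:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -1.2, 2.0])       # x_0 = 1 is the bias input
Theta1 = np.random.randn(3, 4) * 0.01     # maps 4 inputs -> 3 hidden units
Theta2 = np.random.randn(1, 4) * 0.01     # maps (bias + 3 hidden units) -> output

a2 = sigmoid(Theta1 @ x)                  # a_1^(2), a_2^(2), a_3^(2)
a2 = np.concatenate(([1.0], a2))          # prepend a_0^(2) = 1 (bias)
h = sigmoid(Theta2 @ a2)                  # h_Theta(x)
print(h)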