Machine Learning
Neural Networks: Backpropagation
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
This lecture
What is a neural network?
\min_w \sum_i L(NN(x_i, w), y_i)
Perhaps with a regularizer
Each minimizes a different loss function
y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2
z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)
z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)
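The forward computation of this small network can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary layout for the weights, and the numeric weight values are assumptions, not part of the original slides.

```python
import math

def sigmoid(s):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(x1, x2, wh, wo):
    """Forward pass of the two-input, two-hidden-unit example network.

    wh[(i, j)] is the weight from input i to hidden unit j (i = 0 is the bias).
    wo[i] is the weight from hidden unit i to the output (i = 0 is the bias).
    """
    z1 = sigmoid(wh[(0, 1)] + wh[(1, 1)] * x1 + wh[(2, 1)] * x2)
    z2 = sigmoid(wh[(0, 2)] + wh[(1, 2)] * x1 + wh[(2, 2)] * x2)
    y = wo[0] + wo[1] * z1 + wo[2] * z2
    return y, z1, z2

# Hypothetical weights, chosen only so the result is easy to check by hand.
wh = {(0, 1): 0.0, (1, 1): 0.0, (2, 1): 0.0,
      (0, 2): 0.0, (1, 2): 0.0, (2, 2): 0.0}
wo = {0: 1.0, 1: 2.0, 2: 4.0}

y, z1, z2 = forward(1.0, -1.0, wh, wo)
```

With all hidden weights set to zero, both hidden units output sigmoid(0) = 0.5, so the output is 1 + 2(0.5) + 4(0.5) = 4.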
Suppose the true label for this example is a number y_i. We can write the square loss for this example as:
L = \frac{1}{2} (y - y_i)^2
where y is the network's output; the factor 1/2 is a common convention that simplifies the derivative.
\min_w \sum_i L(NN(x_i, w), y_i)
\gamma_t: learning rate, many tweaks possible
The objective is not convex. Initialization can be important
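A minimal sketch of stochastic gradient descent on an objective of this form is given here. Everything in it is an assumption made for illustration: the function names, the decaying schedule for \gamma_t, and the tiny data set. To keep the result checkable, the demo fits a linear model with square loss, which is convex; for a neural network the same loop applies, but the objective is not convex and the starting weights matter.

```python
import random

def sgd(examples, w, grad_fn, epochs=300, gamma0=0.1, seed=0):
    """Stochastic gradient descent: one weight update per training example.

    grad_fn(w, x, y) returns the gradient of the loss on (x, y) as a list
    with the same length as w. The schedule gamma_t = gamma0 / (1 + 0.001 t)
    is one hypothetical choice among many possible ones.
    """
    rng = random.Random(seed)
    examples = list(examples)
    t = 0
    for _ in range(epochs):
        rng.shuffle(examples)          # visit examples in a new order each epoch
        for x, y in examples:
            gamma_t = gamma0 / (1.0 + 0.001 * t)
            g = grad_fn(w, x, y)
            w = [wi - gamma_t * gi for wi, gi in zip(w, g)]
            t += 1
    return w

def linear_square_loss_grad(w, x, y):
    """Gradient of 1/2 (w[0] + w[1] x - y)^2 with respect to w."""
    err = w[0] + w[1] * x - y
    return [err, err * x]

# Hypothetical noise-free data generated by y = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w = sgd(data, [0.0, 0.0], linear_square_loss_grad)
```

On this toy problem the learned weights end up close to [1, 2]. Swapping in a gradient function computed by backpropagation turns the same loop into a neural network trainer.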
Where are we
Questions?
Slide courtesy Richard Socher
y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2
z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)
z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)
We want to compute \partial L / \partial w^o_{ij} and \partial L / \partial w^h_{ij}
Applying the chain rule to compute the gradient (and remembering partial computations along the way to speed things up)
Backpropagation example
Output layer, bias weight w^o_{01}:
y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2
\partial L / \partial w^o_{01} = (\partial L / \partial y) (\partial y / \partial w^o_{01})
\partial L / \partial y = y - y_i (taking the square loss as L = \frac{1}{2} (y - y_i)^2)
\partial y / \partial w^o_{01} = 1
Output layer, weight w^o_{11}:
\partial L / \partial w^o_{11} = (\partial L / \partial y) (\partial y / \partial w^o_{11})
\partial y / \partial w^o_{11} = z_1
We have already computed the partial derivative \partial L / \partial y for the previous case. Cache it to speed up!
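The caching idea for the output layer can be sketched as follows. The function name and the dictionary keys are hypothetical, and the code assumes the square loss L = 1/2 (y - y_true)^2, so that the derivative of the loss with respect to the output is y - y_true. The gradient for w^o_{21} is not derived in the text; it follows the same pattern, with z_2 in place of z_1.

```python
def output_layer_gradients(y, y_true, z1, z2):
    """Gradients of the square loss with respect to the output-layer weights.

    dL/dy is computed once and reused for all three weights.
    """
    dL_dy = y - y_true          # cached partial derivative
    return {
        "w01": dL_dy * 1.0,     # dy/dw01 = 1 (bias)
        "w11": dL_dy * z1,      # dy/dw11 = z1
        "w21": dL_dy * z2,      # dy/dw21 = z2
    }

# Hypothetical forward-pass values, chosen for illustration only.
grads = output_layer_gradients(y=4.0, y_true=3.0, z1=0.5, z2=0.5)
```

Here dL/dy = 1, so the three gradients are simply 1, z1 and z2.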
Hidden layer, weight w^h_{22}:
y = w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2
z_2 = \sigma(w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2)
z_1 = \sigma(w^h_{01} + w^h_{11} x_1 + w^h_{21} x_2)
\partial L / \partial w^h_{22} = (\partial L / \partial y) (\partial y / \partial w^h_{22})
= (\partial L / \partial y) \partial / \partial w^h_{22} (w^o_{01} + w^o_{11} z_1 + w^o_{21} z_2)
= (\partial L / \partial y) (w^o_{11} \partial z_1 / \partial w^h_{22} + w^o_{21} \partial z_2 / \partial w^h_{22})
z_1 does not depend on w^h_{22}, so \partial z_1 / \partial w^h_{22} = 0:
= (\partial L / \partial y) w^o_{21} (\partial z_2 / \partial w^h_{22})
Call the argument of the logistic function s = w^h_{02} + w^h_{12} x_1 + w^h_{22} x_2, so that z_2 = \sigma(s):
\partial L / \partial w^h_{22} = (\partial L / \partial y) w^o_{21} (\partial z_2 / \partial s) (\partial s / \partial w^h_{22})
Each of these partial derivatives is easy:
\partial L / \partial y: the same derivative of the loss with respect to the output that the output-layer weights use
\partial z_2 / \partial s = z_2 (1 - z_2). Why? Because z_2(s) is the logistic function we have already seen
\partial s / \partial w^h_{22} = x_2
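One way to gain confidence in the chain-rule result for w^h_{22} is to compare it with a finite-difference estimate. The sketch here does that for the example network. The helper names, the weight values, and the input are all hypothetical, and the loss is assumed to be the square loss 1/2 (y - y_true)^2.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward_loss(x1, x2, y_true, wh, wo):
    """Return (loss, y, z2) for the example network with square loss."""
    z1 = sigmoid(wh["01"] + wh["11"] * x1 + wh["21"] * x2)
    z2 = sigmoid(wh["02"] + wh["12"] * x1 + wh["22"] * x2)
    y = wo["01"] + wo["11"] * z1 + wo["21"] * z2
    return 0.5 * (y - y_true) ** 2, y, z2

def grad_wh22(x1, x2, y_true, wh, wo):
    """Chain rule: dL/dw^h_22 = dL/dy * w^o_21 * dz2/ds * ds/dw^h_22."""
    _, y, z2 = forward_loss(x1, x2, y_true, wh, wo)
    dL_dy = y - y_true
    dz2_ds = z2 * (1.0 - z2)    # derivative of the logistic function
    ds_dw = x2
    return dL_dy * wo["21"] * dz2_ds * ds_dw

# Hypothetical weights and example, for illustration only.
wh = {"01": 0.1, "11": -0.2, "21": 0.3, "02": -0.1, "12": 0.4, "22": 0.2}
wo = {"01": 0.5, "11": -1.0, "21": 2.0}
x1, x2, y_true = 1.0, 2.0, 1.0

analytic = grad_wh22(x1, x2, y_true, wh, wo)

# Central finite difference on w^h_22 as an independent check.
eps = 1e-6
loss_plus = forward_loss(x1, x2, y_true, {**wh, "22": wh["22"] + eps}, wo)[0]
loss_minus = forward_loss(x1, x2, y_true, {**wh, "22": wh["22"] - eps}, wo)[0]
numeric = (loss_plus - loss_minus) / (2 * eps)
```

The two values should agree to several decimal places; a large gap usually points to a mistake in one of the chain-rule factors.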
Backpropagation