Computation Graphs
Philipp Koehn 29 September 2020
Neural Network Cartoon
A common way to illustrate a neural network: an input layer x, a hidden layer h, and an output layer y.
[Figure: cartoon of a feed-forward neural network with layers x, h, y]
The same network written as math:
h = sigmoid(W1 x + b1)
y = sigmoid(W2 h + b2)
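As a minimal sketch (not part of the slides), these two equations can be written directly in plain Python; the weight values below are the ones used in the PyTorch example later in these slides:

import math

def sigmoid_vec(v):
    return [1 / (1 + math.exp(-s)) for s in v]

def affine(W, x, b):
    # computes W x + b for a list-of-lists matrix W and list vectors x, b
    return [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]

W1, b1 = [[3, 4], [2, 3]], [-2, -4]
W2, b2 = [[5, -5]], [-2]
x = [1, 0]

h = sigmoid_vec(affine(W1, x, b1))   # h = sigmoid(W1 x + b1)
y = sigmoid_vec(affine(W2, h, b2))   # y = sigmoid(W2 h + b2)
print(h, y)                          # h ~ [0.73, 0.12], y ~ [0.74]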
The same computation can be drawn as a computation graph: x and W1 feed into a prod node, then a sum node with b1, then a sigmoid; the result and W2 feed another prod, a sum with b2, and a final sigmoid.
[Figure: computation graph with prod, sum, and sigmoid nodes for each layer]
Processing an input: the graph is first annotated with the parameter values (weights and biases), then the forward pass fills in the value computed at each node, one step at a time — the products, the sums, the hidden sigmoid (e.g. σ(−1.6) = .168), and finally the output sigmoid σ(1.18) = .765.
[Figure: computation graph annotated with parameter values (3.7, 2.9, 2.9, −5.2, −4.6, ...) and, slide by slide, the forward-pass values 0.0, 2.9, −1.6, .168, 1.18, .765]
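As a quick check of the sigmoid values appearing in the figure (a sketch, not part of the slides):

import math
def sigmoid(s):
    return 1 / (1 + math.exp(-s))
print(sigmoid(-1.6))   # ~0.168, as annotated at the hidden sigmoid node
print(sigmoid(1.18))   # ~0.765, the final output in the figure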
The computed output y is compared against a target value t, giving an error:
error = ½ (t − y)²
⇒ The cost function is also known as objective function, loss, gain, cost, ...
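For example, with target t = 1 and computed output y = .765 (the values used in the figures that follow), error = ½ · (1 − .765)² ≈ .028.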
Gradient descent: we view the error as a function of a weight λ, compute the gradient of error(λ) at the current value of λ, and move λ against the gradient.
[Figure: error(λ) curves with the gradient at the current λ, e.g. gradient = 1, gradient = 2, gradient = 0.2]
We only take a small step in that direction, because
– we are updating based on one training example and do not want to overfit to it
– we are also changing all the other parameters, so the curve will look different
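A minimal gradient descent sketch (an illustration, not from the slides): repeatedly move a single parameter λ against the gradient of a simple error curve error(λ) = (λ − 2)².

mu = 0.1                         # learning rate
lam = 0.0                        # current value of the parameter lambda
for step in range(25):
    gradient = 2 * (lam - 2)     # derivative of (lam - 2)**2
    lam = lam - mu * gradient    # small step against the gradient
print(lam)                       # approaches the minimum at lambda = 2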
The chain rule: for functions f and g, the composition f ∘ g maps x to f(g(x)), and its derivative is
(f ∘ g)′ = (f′ ∘ g) · g′, or equivalently F′(x) = f′(g(x)) g′(x)
In Leibniz notation, if z = f(y) and y = g(x), then
dz/dx = dz/dy · dy/dx = f′(y) g′(x) = f′(g(x)) g′(x)
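For example, with f(x) = x² and g(x) = 3x + 1, the composition is (f ∘ g)(x) = (3x + 1)², and the chain rule gives (f ∘ g)′(x) = f′(g(x)) · g′(x) = 2(3x + 1) · 3.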
This is what we need for weight updates. The output node computes y = σ(s) from the weighted sum s = Σk wk hk over the hidden values hk, and the error is E = ½ (t − y)². By the chain rule, the derivative with respect to a weight wk decomposes into
dE/dwk = dE/dy · dy/ds · ds/dwk
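As a sketch (not from the slides; the values of h, w, and t are made up for illustration), this decomposition can be checked numerically against finite differences:

import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

h = [0.9, 0.2]      # hidden layer values (made-up example)
w = [4.5, -5.2]     # final layer weights (made-up example)
t = 1.0             # target output

def forward(w):
    s = sum(wk * hk for wk, hk in zip(w, h))   # s = sum_k w_k h_k
    y = sigmoid(s)
    return y, 0.5 * (t - y) ** 2               # E = 1/2 (t - y)^2

y, E = forward(w)
dE_dy = y - t               # derivative of 1/2 (t - y)^2 with respect to y
dy_ds = y * (1 - y)         # derivative of the sigmoid
for k in range(len(w)):
    analytic = dE_dy * dy_ds * h[k]            # dE/dw_k = dE/dy dy/ds ds/dw_k
    w_shift = list(w)
    w_shift[k] += 1e-6
    numeric = (forward(w_shift)[1] - E) / 1e-6 # finite-difference estimate
    print(analytic, numeric)                   # the two values agree closely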
For training, the graph is extended with the target value t and an L2 error node on top of the final sigmoid.
[Figure: computation graph with added nodes t and L2]
Each node in the computation graph only needs to know its local derivative. If a value A feeds into a computation B, which in turn feeds into the error E, the chain rule gives
dE/dA = dE/dB · dB/dA
so the backward pass through the graph takes the incoming gradient dE/dB and multiplies it with the local derivative dB/dA. For example, a squaring node:
– forward computation: B = A²
– backward computation: dB/dA = dA²/dA = 2A
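A minimal sketch (assumed, not from the slides) of a graph node that implements exactly this pair of computations, here for B = A²:

class SquareNode:
    def forward(self, A):
        self.A = A                  # remember the input for the backward pass
        return A ** 2               # forward computation: B = A^2

    def backward(self, dE_dB):
        return dE_dB * 2 * self.A   # chain rule: dE/dA = dE/dB * dB/dA = dE/dB * 2A

node = SquareNode()
B = node.forward(3.0)               # B = 9.0
print(node.backward(1.0))           # dE/dA = 6.0 when dE/dB = 1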
Working backward through the graph, each type of node contributes its local derivative with respect to its input i (or inputs i1, i2):
– L2 node: do/di = d/di ½ (t − i)² = t − i
– sigmoid node: do/di = d/di σ(i) = σ(i) (1 − σ(i))
– sum node: do/di1 = d/di1 (i1 + i2) = 1, do/di2 = 1
– prod node: do/di1 = d/di1 (i1 i2) = i2, do/di2 = i1
[Figure: the computation graph with these local derivatives attached to the nodes]
The backward pass then fills in gradient values in the graph, starting at the error node. With target t = 1.0 and output y = .765, the error is ½ (1 − .765)² ≈ .0277, and the value passed back from the error node is .235. Through the final sigmoid this is multiplied with σ′(1.18) = .180, giving .180 × .235 = .0424 at the sum node. The sum node passes .0424 unchanged to b2 and to the product node, which yields the gradients .0382 and .00712 for W2 and −.220 toward the hidden sigmoid (via the weight −5.2). Through the hidden sigmoid this becomes −.0308, which is passed on to b1, W1, and the input x.
[Figure: computation graph with the gradient values (.235, .0424, .0382, .00712, −.220, −.0308, −.0260) filled in, one node per slide]
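The arithmetic in these figures can be retraced with a few lines of Python (a sketch, not part of the slides; it follows the slides' sign convention of using t − y at the error node):

import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

t, z = 1.0, 1.18                 # target and input to the final sigmoid (from the figure)
y = sigmoid(z)                   # ~0.765
print(0.5 * (t - y) ** 2)        # error ~0.028

g = t - y                        # ~0.235 at the error node
g = g * y * (1 - y)              # ~0.235 * 0.180 = 0.042 after the final sigmoid
print(g)
print(g * sigmoid(-1.6))         # ~0.0071: gradient for one component of W2
g = g * -5.2                     # ~-0.22: gradient toward the hidden sigmoid
h2 = sigmoid(-1.6)               # ~0.168
print(g * h2 * (1 - h2))         # ~-0.031 after the hidden sigmoid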
The gradients attached to the parameter nodes (for example .0424 for b2, .00712 for one component of W2, and −.0308 for components of b1 and W1) are the values needed for the weight updates.
[Figure: computation graph with the parameter gradients highlighted]
Each parameter is then updated by subtracting its gradient, scaled by a learning rate µ: for example, the output bias b2 becomes −2.0 − µ × .0424, and each weight (e.g. −5.2) is reduced by µ times its gradient in the same way.
[Figure: computation graph with the update "old value − µ × gradient" written at each parameter node]
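For example, with the learning rate µ = 0.1 used in the code below, the output bias would change from −2.0 to −2.0 − 0.1 × .0424 ≈ −2.004.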
Deep learning toolkits:
– define a computation graph
– provide data and a training strategy (e.g., batching)
– toolkit does the rest
– seamless support of GPUs
The PyTorch toolkit is installed with

pip install torch

and loaded in a Python program with

import torch
The parameters of the example network are defined as PyTorch tensors:

W = torch.tensor([[3,4],[2,3]], requires_grad=True, dtype=torch.float)
b = torch.tensor([-2,-4], requires_grad=True, dtype=torch.float)
W2 = torch.tensor([5,-5], requires_grad=True, dtype=torch.float)
b2 = torch.tensor([-2], requires_grad=True, dtype=torch.float)

Note the
– specification of their basic data type (float)
– indication to compute gradients (requires_grad=True)

Input and target value are also defined as tensors:

x = torch.tensor([1,0], dtype=torch.float)
t = torch.tensor([1], dtype=torch.float)
Forward computation:

s = W.mv(x) + b
h = torch.nn.Sigmoid()(s)
z = torch.dot(W2, h) + b2
y = torch.nn.Sigmoid()(z)
error = 1/2 * (t - y) ** 2

Note
– the PyTorch sigmoid function torch.nn.Sigmoid()
– multiplication between matrix W and vector x is mv
– multiplication between two vectors W2 and h is torch.dot
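For reference (computed by hand, not shown on the slide), these parameter values give s = (1, −2), h ≈ (.73, .12), z ≈ 1.06, y ≈ 0.74, and error ≈ 0.033.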
Backward computation:

error.backward()

The gradients can now be inspected:

>>> W2.grad
tensor([-0.0360, -0.0059])

Note
– when you run this code multiple times, gradients accumulate
– reset them with, e.g., W2.grad.data.zero_()
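These values are consistent with the chain rule from earlier (a hand calculation for orientation, not part of the slide): dE/dy = −(t − y) ≈ −.26, multiplied by σ′(z) ≈ .19 gives ≈ −.049 at the sum node, and multiplying with h ≈ (.73, .12) gives the W2 gradients (−.0360, −.0059).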
Training example: the XOR function

x  y  x ⊕ y
0  0    0
1  0    1
0  1    1
1  1    0

The training data is defined as a list of (input, target) pairs:

training_data = [ [ torch.tensor([0.,0.]), torch.tensor([0.]) ],
                  [ torch.tensor([1.,0.]), torch.tensor([1.]) ],
                  [ torch.tensor([0.,1.]), torch.tensor([1.]) ],
                  [ torch.tensor([1.,1.]), torch.tensor([0.]) ] ]
The training loop iterates over epochs and training examples; first the forward computation:

mu = 0.1
for epoch in range(1000):
    total_error = 0
    for item in training_data:
        x = item[0]
        t = item[1]
        # forward computation
        s = W.mv(x) + b
        h = torch.nn.Sigmoid()(s)
        z = torch.dot(W2, h) + b2
        y = torch.nn.Sigmoid()(z)
        error = 1/2 * (t - y) ** 2
        total_error = total_error + error
Then, still inside the loop over training examples, the backward computation and the weight updates; after each epoch the average error is printed:

        # backward computation
        error.backward()
        # weight updates
        W.data = W - mu * W.grad.data
        b.data = b - mu * b.grad.data
        W2.data = W2 - mu * W2.grad.data
        b2.data = b2 - mu * b2.grad.data
        W.grad.data.zero_()
        b.grad.data.zero_()
        W2.grad.data.zero_()
        b2.grad.data.zero_()
    print("error: ", total_error/4)
For batch training, we can instead back-propagate the error accumulated over all training examples: the per-example call

error.backward()

is replaced by a single call per epoch

total_error.backward()
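This works because the gradient of a sum is the sum of the gradients: back-propagating total_error once yields, at fixed parameter values, the same parameter gradients as adding up the per-example gradients.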
A more compact alternative processes all training examples at once, as single tensors:

x = torch.tensor([ [0.,0.], [1.,0.], [0.,1.], [1.,1.] ])
t = torch.tensor([ 0., 1., 1., 0. ])

The forward computation now uses the matrix-matrix product mm for the first layer and the matrix-vector product mv for the second:

s = x.mm(W) + b
h = torch.nn.Sigmoid()(s)
z = h.mv(W2) + b2
y = torch.nn.Sigmoid()(z)

The error is averaged over the batch before the backward pass:

error = 1/2 * (t - y) ** 2
mean_error = error.mean()
mean_error.backward()
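Here x has shape 4×2 (one row per training example), so s and h have shape 4×2, while z and y are vectors of length 4, matching t.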
The weight updates are as before:

# weight updates
W.data = W - mu * W.grad.data
b.data = b - mu * b.grad.data
W2.data = W2 - mu * W2.grad.data
b2.data = b2 - mu * b2.grad.data
PyTorch also offers a higher-level interface, where the network is defined as a class derived from torch.nn.Module:

class ExampleNet(torch.nn.Module):
    def __init__(self):
        super(ExampleNet, self).__init__()
        self.layer1 = torch.nn.Linear(2,2)
        self.layer2 = torch.nn.Linear(2,1)
        self.layer1.weight = torch.nn.Parameter(torch.tensor([[3.,2.],[4.,3.]]))
        self.layer1.bias = torch.nn.Parameter(torch.tensor([-2.,-4.]))
        self.layer2.weight = torch.nn.Parameter(torch.tensor([[5.,-5.]]))
        self.layer2.bias = torch.nn.Parameter(torch.tensor([-2.]))

    def forward(self, x):
        s = self.layer1(x)
        h = torch.nn.Sigmoid()(s)
        z = self.layer2(h)
        y = torch.nn.Sigmoid()(z)
        return y
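Note that torch.nn.Linear computes x Wᵀ + b, storing the weight matrix transposed; that is why layer1.weight is [[3.,2.],[4.,3.]] here, the transpose of the W = [[3,4],[2,3]] used in the earlier code.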
An instance of the network is created with

net = ExampleNet()
The core of the training loop uses the output out of the network object:

for iteration in range(1000):
    error = 1/2 * (t - out) ** 2
    mean_error = error.mean()
    print("error: ", mean_error.data)
    mean_error.backward()
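A complete loop could look like the following sketch; the optimizer, the forward call net(x), and the reshaping of the output are assumptions added here, not taken from the slides.

# Sketch only: optimizer choice, the net(x) call, and .squeeze(1) are assumed,
# not shown in the slides
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
for iteration in range(1000):
    out = net(x).squeeze(1)          # forward pass; flatten (4,1) -> (4,) to match t
    error = 1/2 * (t - out) ** 2
    mean_error = error.mean()
    print("error: ", mean_error.data)
    optimizer.zero_grad()            # reset gradients from the previous iteration
    mean_error.backward()            # backward pass
    optimizer.step()                 # gradient descent update of all parameters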
Code is available on the web page for the textbook: http://www.statmt.org/nmt-book/