NLP Programming Tutorial 8 – Recurrent Neural Nets
Graham Neubig
Nara Institute of Science and Technology (NAIST)
[Figure: a network applied at each position of a sequence, mapping inputs x1, x2, x3, x4 to outputs y1, y2, y3, y4]
[Figure: the same networks used for POS tagging, with input words "natural language processing is" and output tags JJ NN NN VBZ]
Binary prediction (2 choices): a book review ("Oh, man I love this book!" or "This book is so boring...") → is it positive? yes / no
Multi-class prediction (several choices): a tweet ("On the way to the park!" / "公園に行くなう!") → its language: English or Japanese
Structured prediction (millions of choices): a sentence ("I read a book") → its syntactic parse (tags such as N, VBD, DET, NN and phrases NP, VP, S)
[Figure: the step function and the sigmoid function, plotting p(y|x) against w⋅ϕ(x)]
The softmax function for multi-class prediction:
P(y|x) = e^(w⋅ϕ(x,y)) / Σỹ e^(w⋅ϕ(x,ỹ))
(numerator: the score of the current class; denominator: the sum over all classes)
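As a concrete reference, a minimal numpy sketch of this softmax computation (the function name and the example scores are only for illustration):

import numpy as np

def softmax(scores):
    # Subtract the maximum score for numerical stability, then exponentiate
    exp_scores = np.exp(scores - np.max(scores))
    # Normalize: each class's exponentiated score divided by the sum over all classes
    return exp_scores / np.sum(exp_scores)

print(softmax(np.array([2.0, 1.0, -1.0])))  # probabilities summing to 1, largest for the highest score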
find_best(p):
    y = 0
    for each element i in 1 .. len(p)-1:
        if p[i] > p[y]: y = i
    return y
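In numpy the same loop can be written with argmax; a one-line sketch that matches the pseudocode above (ties go to the first index in both cases):

import numpy as np

def find_best(p):
    return int(np.argmax(p))  # index of the highest-probability element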
find_best converts the probability distributions output by the network into a single predicted label.
create_one_hot(id, size):
    vec = np.zeros(size)
    vec[id] = 1
    return vec
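A self-contained version with a small usage example, assuming numpy:

import numpy as np

def create_one_hot(id, size):
    vec = np.zeros(size)  # a vector of zeros...
    vec[id] = 1           # ...with a single 1 at position id
    return vec

print(create_one_hot(2, 5))  # [0. 0. 1. 0. 0.]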
forward_nn(network, φ0):
    φ = [ φ0 ]                                    # Output of each layer
    for each layer i in 1 .. len(network):
        w, b = network[i-1]
        # Calculate the value based on the previous layer
        φ[i] = np.tanh( np.dot( w, φ[i-1] ) + b )
    return φ                                      # Return the values of all layers
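A runnable numpy version of the same forward pass might look as follows; the list-of-(w, b) layout of network follows the pseudocode, and the layer sizes in the usage example are arbitrary:

import numpy as np

def forward_nn(network, phi0):
    phi = [phi0]                                   # Output of each layer, starting from the input
    for w, b in network:                           # Each layer is a (weight matrix, bias vector) pair
        phi.append(np.tanh(np.dot(w, phi[-1]) + b))
    return phi                                     # Values of all layers

# Usage: a 4-dimensional input, one hidden layer of size 3, one output of size 2
rng = np.random.default_rng(0)
network = [(rng.normal(size=(3, 4)), np.zeros(3)),
           (rng.normal(size=(2, 3)), np.zeros(2))]
print(forward_nn(network, np.array([1.0, 0.0, 0.0, 0.0]))[-1])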
[Figure: the recurrent network unrolled over two time steps; at each step a tanh unit computes the hidden state ht from xt, ht−1, and a bias br, and a softmax unit computes the output distribution pt from ht with weights wo,h and bias bo]
ht = tanh(wr,h⋅ht−1 + wr,x⋅xt + br)
pt = softmax(wo,h⋅ht + bo)
forward_rnn(wr,x, wr,h, br, wo,h, bo, x):
    h = [ ]   # Hidden layers (at time t)
    p = [ ]   # Output probability distributions (at time t)
    y = [ ]   # Output values (at time t)
    for each time t in 0 .. len(x)-1:
        if t > 0:
            h[t] = tanh(wr,x⋅x[t] + wr,h⋅h[t-1] + br)
        else:
            h[t] = tanh(wr,x⋅x[t] + br)
        p[t] = softmax(wo,h⋅h[t] + bo)   # softmax (not tanh) for the output distribution
        y[t] = find_best(p[t])
    return h, p, y
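A sketch of how this pseudocode might be made runnable with numpy, using list appends instead of indexed assignment; softmax is repeated so the sketch is self-contained, and the parameter names are transliterations of the weights above:

import numpy as np

def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / np.sum(exp_scores)

def forward_rnn(w_rx, w_rh, b_r, w_oh, b_o, x):
    h, p, y = [], [], []                    # Hidden states, distributions, predicted labels
    for t in range(len(x)):
        if t > 0:
            h.append(np.tanh(np.dot(w_rx, x[t]) + np.dot(w_rh, h[t - 1]) + b_r))
        else:
            h.append(np.tanh(np.dot(w_rx, x[t]) + b_r))
        p.append(softmax(np.dot(w_oh, h[t]) + b_o))  # output distribution at time t
        y.append(int(np.argmax(p[t])))               # most probable label at time t
    return h, p, y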
Weights are learned by following the gradient of the probability (stochastic gradient training, including logistic regression):
    w = 0
    for I iterations:
        for each labeled pair x, y in the data:
            w += α * dP(y|x)/dw
(α times dP(y|x)/dw: the direction that will increase the probability of y)
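As a small illustration of this update, a sketch of one gradient step for binary logistic regression, using the sigmoid gradient given just below; the function name, the y in {1, -1} labeling, and the learning rate alpha are assumptions made for the example:

import numpy as np

def sgd_step(w, phi_x, y, alpha):
    # dP(y=1|x)/dw = phi(x) * e^(w.phi(x)) / (1 + e^(w.phi(x)))^2  (the sigmoid gradient)
    z = np.dot(w, phi_x)
    grad = phi_x * np.exp(z) / (1 + np.exp(z)) ** 2
    if y != 1:
        grad = -grad               # for the negative class, P(y|x) moves in the opposite direction
    return w + alpha * grad        # step in the direction that increases P(y|x)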
[Figure: the gradient of the sigmoid function, plotting dp(y|x)/d(w⋅ϕ(x)) against w⋅ϕ(x)]
dP(y=1|x)/dw = ϕ(x) e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x)))²
For the output weight w4 the gradient is known:
dP(y=1|x)/dw4 = h(x) e^(w4⋅h(x)) / (1 + e^(w4⋅h(x)))²
but for the hidden-layer weights it is not obvious:
dP(y=1|x)/dw1 = ?   dP(y=1|x)/dw2 = ?   dP(y=1|x)/dw3 = ?
[Figure: a network whose hidden layer h(x) is computed with weights w1, w2, w3, and whose output is computed from h(x) with weight w4]
The chain rule gives the missing gradients:
dP(y=1|x)/dw1 = dP(y=1|x)/d(w4⋅h(x)) × d(w4⋅h(x))/dh1(x) × dh1(x)/dw1
where the first factor is the error of the next unit (δ4), equal to the sigmoid gradient e^(w4⋅h(x)) / (1 + e^(w4⋅h(x)))², the second is the connecting weight, and the third is the gradient of this unit.
In general: calculate the error δi of each unit from the errors δ of the following units, the weights connecting them, and the unit's own gradient.
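A minimal numpy sketch of this rule for a tanh layer, assuming delta_next holds the errors of the following units and w_next the weights connecting this layer to them (rows indexed by the following units); the names are illustrative:

import numpy as np

def backprop_layer(delta_next, w_next, h):
    # Error of this layer = (errors of the following units, weighted by the connections)
    #                       * (gradient of this layer's tanh units, 1 - h^2)
    return np.dot(delta_next, w_next) * (1 - h ** 2)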
[Figure: the error from the output y propagated backward through the network, past w4 and on to w1, w2, w3]
[Figure: the recurrent network unrolled over time, with inputs x1, x2, x3, x4 and outputs y1, y2, y3, y4]
In training, the error at each output must be propagated backward through the whole sequence.
[Figure: the unrolled network with the output error δo,4 at the last time step, about to be propagated backward]
[Figure: as the output error δo,4 is propagated back through the unrolled network, its contribution shrinks at each step: medium, small, tiny, very tiny (the vanishing gradient)]
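A quick numeric illustration of why the error shrinks; the single recurrent weight, the activation value, and the number of steps are arbitrary:

import numpy as np

delta = np.array([1.0])      # error injected at the last time step
w_rh = np.array([[0.5]])     # one recurrent weight
h = np.array([0.8])          # a typical tanh activation
for t in range(4):
    # each step back multiplies by the weight and by the tanh gradient (1 - h^2)
    delta = np.dot(delta, w_rh) * (1 - h ** 2)
    print(t, delta)          # values shrink: 0.18, 0.032, 0.0058, 0.0010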
gradient_rnn(wr,x, wr,h, br, wo,h, bo, x, h, p, y'):
    initialize Δwr,x, Δwr,h, Δbr, Δwo,h, Δbo
    δr' = np.zeros(len(br))                        # Error from the following time step
    for each time t in len(x)-1 .. 0:
        p' = create_one_hot(y'[t])
        δo' = p' - p[t]                            # Output error
        Δwo,h += np.outer(h[t], δo'); Δbo += δo'   # Output gradient
        δr = np.dot(δr', wr,h) + np.dot(δo', wo,h) # Backprop from the next time step and the output
        δr' = δr * (1 - h[t]²)                     # tanh gradient
        Δwr,x += np.outer(x[t], δr'); Δbr += δr'   # Hidden gradient
        if t != 0:
            Δwr,h += np.outer(h[t-1], δr')
    return Δwr,x, Δwr,h, Δbr, Δwo,h, Δbo
update_weights(wr,x, wr,h, br, wo,h, bo, Δwr,x, Δwr,h, Δbr, Δwo,h, Δbo, λ):
    wr,x += λ * Δwr,x
    wr,h += λ * Δwr,h
    br += λ * Δbr
    wo,h += λ * Δwo,h
    bo += λ * Δbo
# Create features
create map x_ids, y_ids, array data
for each labeled pair x, y in the data:
    add (create_ids(x, x_ids), create_ids(y, y_ids)) to data
initialize net randomly
# Perform training
for I iterations:
    for each labeled pair x, y' in data:
        h, p, y = forward_rnn(net, x)
        Δ = gradient_rnn(net, x, h, p, y')
        update_weights(net, Δ, λ)
print net to weight_file
print x_ids, y_ids to id_file
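A sketch of the feature-creation step: mapping each word and tag to an integer ID so that words can be fed to the RNN as one-hot vectors. The create_ids helper and the example sentence follow the pseudocode and the earlier POS example; in practice the ID maps would be built over the whole corpus before fixing the one-hot size:

import numpy as np

def create_ids(tokens, ids):
    for token in tokens:
        if token not in ids:
            ids[token] = len(ids)      # assign a new ID to each unseen token
    return [ids[token] for token in tokens]

def create_one_hot(id, size):
    vec = np.zeros(size)
    vec[id] = 1
    return vec

x_ids, y_ids = {}, {}
words = "natural language processing is".split()
tags = "JJ NN NN VBZ".split()
word_ids = create_ids(words, x_ids)    # [0, 1, 2, 3]
tag_ids = create_ids(tags, y_ids)      # [0, 1, 1, 2]
x = [create_one_hot(i, len(x_ids)) for i in word_ids]   # one-hot inputs for the RNN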
Test input: data/wiki-en-test.norm
Evaluate: script/gradepos.pl data/wiki-en-test.pos my_answer.pos