Supported by European Union’s Seventh Framework Programme (FP7) under grant agreement no. 615688 (PRIME)
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
Gail Weiss, Yoav Goldberg, Eran Yahav
GRU < LSTM (!?)
them:
Unreasonable assumptions!
1993 proof:
Uses stack(s), maintained in certain dimension(s).
Zeros are pushed using division (using g = g/4 + 1/4).
In 32 bits, this reaches the limit after 15 pushes.
Allows processing steps beyond reading the input (not the standard use case!).
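The precision limit can be checked with a minimal sketch. It repeatedly applies the push g = g/4 + 1/4 in 32-bit floating point and counts how many pushes still change the stored value. The starting value 0.0, the helper names, and the counting convention are assumptions made for illustration, so the count it reports may differ from the figure quoted on the slide.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def push_zero(g: float) -> float:
    """Push a zero onto the stack encoded in g, keeping only 32-bit precision."""
    return to_float32(g / 4 + 0.25)


def pushes_until_saturation(start: float = 0.0, limit: int = 100) -> int:
    """Count the pushes that still change the stored 32-bit value.

    The start value is an assumption; the slide does not state one.
    """
    g = to_float32(start)
    for n in range(limit):
        nxt = push_zero(g)
        if nxt == g:  # the push no longer changes the encoding
            return n
        g = nxt
    return limit
```

Calling `pushes_until_saturation()` returns a small number, well below the number of bits in the representation: after that many pushes the encoded stack can no longer tell a further zero apart.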
We accept that all RNN types can simulate DFAs.
We show that LSTMs and IRNNs can also count,
and that the GRU and SRNN cannot.
reset
Fischer, Meyer, Rosenberg - 1968
Regular Languages (RL)
Context Free Languages (CFL)
Context Sensitive Languages (CSL)
Recursively Enumerable Languages (RE)
Palindromes
SKCMs cross the Chomsky Hierarchy!
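To make "crossing the hierarchy" concrete, here is a small sketch of a counter-based recogniser. It assumes SKCM refers to a simplified k-counter machine, that is, a finite control plus counters that can be incremented, decremented, and compared to zero. The function name and the two-counter layout are hypothetical choices for illustration. It accepts a^n b^n c^n, which is not context free, using only counters.

```python
def accepts_anbncn(word: str) -> bool:
    """Recognise a^n b^n c^n (n >= 1) with two counters and a tiny finite control."""
    order = "abc"
    phase = "a"  # finite control: the block of letters we are currently in
    ab = 0       # number of a's minus number of b's
    bc = 0       # number of b's minus number of c's
    for ch in word:
        if ch not in order or order.index(ch) < order.index(phase):
            return False  # unknown symbol, or letters out of order
        phase = ch
        if ch == "a":
            ab += 1
        elif ch == "b":
            ab -= 1
            bc += 1
        else:
            bc -= 1
    # Accept only if the last block was c's and both counters are back at zero.
    return phase == "c" and ab == 0 and bc == 0
```

Palindromes, by contrast, require remembering the order of the symbols read so far, which a fixed number of counters of this kind does not provide.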
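GRU
$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h)$
$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t$

LSTM
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$
$h_t = o_t \circ g(c_t)$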
GRU
Gates: $z_t \in (0,1)$, $r_t \in (0,1)$
Candidate vector: $\tilde{h}_t \in (-1,1)$
Update function: $h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t$

LSTM
Gates: $f_t \in (0,1)$, $i_t \in (0,1)$
Candidate vector: $\tilde{c}_t \in (-1,1)$
Update functions: $c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$, $h_t = o_t \circ g(c_t)$
GRU
$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t$, with $z_t \in (0,1)$ and $\tilde{h}_t \in (-1,1)$
Interpolation between $h_{t-1}$ and $\tilde{h}_t$.
Bounded!
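The boundedness of the GRU update can be checked numerically with a short sketch. Gates and candidates are drawn at random rather than computed from weights, which is an assumption made to keep the example small; the function names are hypothetical.

```python
import random


def gru_update(h_prev, z, h_cand):
    """Element-wise GRU state update: h_t = z * h_{t-1} + (1 - z) * h_cand."""
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def max_abs_state(steps=1000, dim=4, seed=0):
    """Run random GRU updates and return the largest |h| ever observed."""
    rng = random.Random(seed)
    h = [0.0] * dim
    largest = 0.0
    for _ in range(steps):
        z = [rng.random() for _ in range(dim)]                 # gate values in [0, 1)
        h_cand = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # candidate in [-1, 1]
        h = gru_update(h, z, h_cand)
        largest = max(largest, max(abs(v) for v in h))
    return largest
```

Because each new state is a weighted average of the previous state and a candidate in [-1, 1], `max_abs_state()` never exceeds 1 however many steps are run, so the state cannot act as an unbounded counter.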
LSTM
$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$, with $f_t, i_t \in (0,1)$ and $\tilde{c}_t \in (-1,1)$
Addition: with $f_t \approx 1$ and $i_t \approx 1$, $c_t \approx c_{t-1} + \tilde{c}_t$
LSTM, increase by 1: with $f_t \approx 1$, $i_t \approx 1$ and $\tilde{c}_t \approx 1$, $c_t \approx c_{t-1} + 1$
LSTM, decrease by 1: with $f_t \approx 1$, $i_t \approx 1$ and $\tilde{c}_t \approx -1$, $c_t \approx c_{t-1} - 1$
LSTM, do nothing: with $f_t \approx 1$ and $i_t \approx 0$, $c_t \approx c_{t-1}$
LSTM, reset: with $f_t \approx 0$ and $i_t \approx 0$, $c_t \approx 0$

GRU: interpolation, so the state is bounded!
LSTM: addition, so the cell state can count!
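The four counter operations can be sketched with a single scalar cell whose gates are fully saturated. Exact gate values of 0 and 1 are an idealisation (real sigmoid gates only approach them), and the operation names and the `OPS` table are hypothetical labels for illustration.

```python
def lstm_cell_update(c_prev: float, f: float, i: float, c_cand: float) -> float:
    """LSTM cell update for one dimension: c_t = f * c_{t-1} + i * c_cand."""
    return f * c_prev + i * c_cand


# Idealised (f, i, c_cand) settings for each counter operation.
OPS = {
    "inc":   (1.0, 1.0, 1.0),    # c_t = c_{t-1} + 1
    "dec":   (1.0, 1.0, -1.0),   # c_t = c_{t-1} - 1
    "nop":   (1.0, 0.0, 0.5),    # input gate closed: the candidate is ignored
    "reset": (0.0, 0.0, 0.5),    # both gates closed: the cell goes back to 0
}


def run(ops, c: float = 0.0) -> float:
    """Apply a sequence of counter operations to the cell value c."""
    for op in ops:
        f, i, c_cand = OPS[op]
        c = lstm_cell_update(c, f, i, c_cand)
    return c
```

For example, `run(["inc"] * 5 + ["dec"] * 2 + ["nop"])` leaves the cell at 3.0, and the value keeps growing with further increments because the update adds to the previous cell state rather than interpolating.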
SRNN
$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h) \in (0,1)$
Bounded!

IRNN
$h_t = \max(0, W_h x_t + U_h h_{t-1} + b_h)$
keep/reset, +0 / +1 (subtraction is done in a parallel, also increasing, counter)
Can count!
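One reading of the IRNN counting scheme is sketched below: two non-negative counters that each only ever receive +0 or +1, with "subtraction" obtained by comparing them at the end. Identity recurrent weights, the two-symbol alphabet {a, b}, and the function names are assumptions for illustration.

```python
def relu(x: float) -> float:
    """ReLU activation used by the IRNN."""
    return max(0.0, x)


def irnn_step(h, x):
    """One step on a 2-dimensional state with identity recurrence.

    h = (up, down) holds two increasing counters; x = (is_a, is_b) adds
    +1 to one of them and +0 to the other.
    """
    up, down = h
    is_a, is_b = x
    return (relu(up + is_a), relu(down + is_b))


def count_difference(word: str) -> float:
    """Return (#a - #b) for a word over the assumed alphabet {a, b}."""
    h = (0.0, 0.0)
    for ch in word:
        x = (1.0, 0.0) if ch == "a" else (0.0, 1.0)
        h = irnn_step(h, x)
    return h[0] - h[1]
```

With this layout, `count_difference("a" * 1000 + "b" * 1000)` is 0.0: neither counter is ever decreased, yet their difference tracks the count that a single signed counter would hold.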
Trained on $a^n b^n$ (on positive examples up to length 100)
[Figure: activations of the trained LSTM and GRU on $a^{1000} b^{1000}$]
Trained on $a^n b^n c^n$ (on positive examples up to length 50)
[Figure: activations of the trained LSTM and GRU on $a^{100} b^{100} c^{100}$]
and result in actual differences in expressive power
https://github.com/tech-srl/counting_dimensions
https://tinyurl.com/ybjkumrz