Parallelizable StackLSTM
Shuoyang Ding Philipp Koehn NAACL 2019 Structured Prediction Workshop Minneapolis, MN, United States June 7th, 2019
Outline
What is StackLSTM?
Parallelization Problem
Homogenizing Computation
Dyer et al. (2015) Ballesteros et al. (2017)
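Push:
read the previous hidden state h_{p(t)};
perform the LSTM forward computation with x(t) and h_{p(t)};
write the new hidden state to h_{p(t) + 1};
update the stack pointer: p(t+1) = p(t) + 1;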
Pop:
update the stack pointer: p(t+1) = p(t) - 1;
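The push and pop steps above can be sketched for a single sequence as follows. This is a minimal illustration and not the authors' implementation: `toy_cell` is a hypothetical scalar stand-in for the LSTM forward computation, and `PointerStack` and its `capacity` parameter are names invented for this example.

```python
import math


def toy_cell(x, h_prev):
    # Hypothetical stand-in for the LSTM forward computation (assumption).
    return math.tanh(x + 0.5 * h_prev)


class PointerStack:
    """Stack of hidden states addressed by a pointer p (illustrative only)."""

    def __init__(self, capacity):
        # Slot 0 holds the initial state of the empty stack.
        self.h = [0.0] * (capacity + 1)
        self.p = 0

    def push(self, x):
        assert self.p + 1 < len(self.h), "stack capacity exceeded"
        h_prev = self.h[self.p]                     # read h_{p(t)}
        self.h[self.p + 1] = toy_cell(x, h_prev)    # compute, write to h_{p(t)+1}
        self.p += 1                                 # p(t+1) = p(t) + 1

    def pop(self):
        assert self.p > 0, "cannot pop an empty stack"
        self.p -= 1                                 # p(t+1) = p(t) - 1

    def top(self):
        return self.h[self.p]
```

Note that push does four things while pop does one, so two sequences in the same batch that take different actions at the same time step perform different computations.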
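Push and pop, homogenized:
read the previous hidden state h_{p(t)};
perform the LSTM forward computation with x(t) and h_{p(t)};
write the new hidden state to h_{p(t) + 1};
update the stack pointer: p(t+1) = p(t) + op;
Use op = +1 for push and op = -1 for pop.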
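A rough sketch of the homogenized step over a batch, under stated assumptions: `toy_cell` is again a hypothetical scalar stand-in for the LSTM forward computation, `batched_step` is a name invented here, and plain Python lists replace tensors. Every batch element runs the same read, compute, and write; only the value of op differs.

```python
import math


def toy_cell(x, h_prev):
    # Hypothetical stand-in for the LSTM forward computation (assumption).
    return math.tanh(x + 0.5 * h_prev)


def batched_step(h, p, x, op):
    """Apply one time step to every sequence in the batch, in place.

    h  : list of per-sequence state buffers (slot 0 is the initial state)
    p  : list of stack pointers, one per sequence
    x  : list of inputs, one per sequence (any dummy value on a pop)
    op : list of +1 (push) or -1 (pop), one per sequence
    Assumes valid transitions: no pop on an empty stack, and buffers with
    at least one spare slot above the deepest stack reached.
    """
    for b in range(len(p)):
        h_prev = h[b][p[b]]                      # read h_{p(t)}
        h[b][p[b] + 1] = toy_cell(x[b], h_prev)  # compute, write to h_{p(t)+1}
        p[b] = p[b] + op[b]                      # p(t+1) = p(t) + op
```

On a pop, the value written to slot p(t) + 1 is never read: that slot lies above the new pointer and is overwritten by the next push that moves the pointer onto it. In a tensor library the loop over `b` would typically become one gather, one batched cell call, and one scatter, which is what makes the uniform step attractive for GPU execution.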
action embedding) perform better
[Figure: results versus batch size (8, 16, 32, 64, 128, 256), y-axis range 91 to 93; series: Ours, Ballesteros 2017]
architecture.
parsers of comparable performance within 1 hour.
paper code slides
https://github.com/shuoyangd/hoolock