Natural Language Processing
Machine Translation III
Dan Klein – UC Berkeley
Outline: Syntactic Models; Syntactic Decoding
From [Chiang et al., 2005]
≈2.6 billion words
≈6 months (CPU)
≈3.6 days (GPU)
>98% sparsity
Slide credit: Slav Petrov
[Petrov & Klein, 2007]
Grammar: S → NP VP
Skip spans / skip rules
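The two kinds of skipping named above can be sketched as an inside pass that never visits a pruned span and never evaluates a rule whose children are absent. This is a minimal sketch under stated assumptions: the span mask `allowed` is assumed to come from some earlier coarse pass (not shown), rules are indexed by left child, and every name here (`pruned_inside`, `lexical`, `allowed`) is hypothetical rather than taken from the lecture.

```python
from collections import defaultdict

def pruned_inside(n, lexical, rules, allowed):
    """Inside scores that skip pruned spans and inapplicable rules (sketch).

    n       -- sentence length
    lexical -- {(position, preterminal): score} for width-1 spans
    rules   -- [(parent, left, right, prob)] binary rules
    allowed -- set of (begin, end) spans kept by a coarse pass (assumed given)
    Returns (chart, applied): chart maps (begin, end) -> {symbol: score};
    applied counts the rule applications actually executed.
    """
    chart = defaultdict(dict)
    for (i, sym), s in lexical.items():
        chart[(i, i + 1)][sym] = s
    # Index rules by left child so only rules whose left child is present are visited.
    by_left = defaultdict(list)
    for parent, left, right, prob in rules:
        by_left[left].append((parent, right, prob))
    applied = 0
    for width in range(2, n + 1):
        for begin in range(n - width + 1):
            end = begin + width
            if (begin, end) not in allowed:
                continue  # skip span: pruned by the coarse pass
            cell = chart[(begin, end)]
            for split in range(begin + 1, end):
                left_cell = chart.get((begin, split), {})
                right_cell = chart.get((split, end), {})
                for left, left_score in left_cell.items():
                    for parent, right, prob in by_left.get(left, []):
                        right_score = right_cell.get(right)
                        if right_score is None:
                            continue  # skip rule: right child absent from this cell
                        cell[parent] = cell.get(parent, 0.0) + prob * left_score * right_score
                        applied += 1
    return chart, applied

chart, applied = pruned_inside(
    2, {(0, "NP"): 1.0, (1, "VP"): 0.5},
    [("S", "NP", "VP", 1.0), ("NP", "NP", "NP", 0.1)], {(0, 2)})
print(chart[(0, 2)], applied)  # {'S': 0.5} 1
```

Only one of the two candidate rules is evaluated here: NP → NP NP is skipped because no NP sits in the right cell.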
Warp: 32 threads
add.s32 %r1, %r631, %r0;
ld.global.f32 %f81, [%r1];
ld.global.f32 %f82, [%r34];
mul.ftz.f32 %f94, %f82, %f81;
mov.f32 %f95, 0f3E002E23;
mov.f32 %f96, 0f00000000;
mad.f32 %f93, %f94, %f95, %f96;
shl.b32 %r2, %r646, 8;
add.s32 %r3, %r658, %r2;
shl.b32 %r4, %r3, 2;
add.s32 %r5, %r631, %r4;
mul.lo.s32 %r6, %r646, 588;
shl.b32 %r7, %r6, 1;
add.s32 %r8, %r5, %r7;
ld.global.f32 %f83, [%r8];
mul.ftz.f32 %f98, %f82, %f83;
Warp Divergence
82
83
84
Warp Divergence
85
Warp Divergence
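On NVIDIA GPUs the 32 threads of a warp issue instructions together; when threads in one warp take different branches, the branches run one after another with the non-participating threads masked off. The toy cost model below is a simplification, not a hardware model: it assumes every branch path costs the same and counts how many serialized passes a warp needs and what fraction of lane-slots do useful work. The rule labels are borrowed from the grammar on these slides purely as branch names; `divergence_cost` is a hypothetical helper.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def divergence_cost(branch_taken):
    """Toy model of SIMT branch divergence.

    branch_taken -- one path label per thread in the warp.
    Each distinct path is executed once, serially, with the other threads
    masked off; all paths are assumed to cost the same.
    Returns (passes, utilization).
    """
    if len(branch_taken) != WARP_SIZE:
        raise ValueError("expected one label per thread in the warp")
    passes = len(set(branch_taken))  # distinct paths -> serialized passes
    # useful lane-slots divided by lane-slots issued across all passes
    utilization = len(branch_taken) / (passes * WARP_SIZE)
    return passes, utilization

uniform = ["S -> NP VP"] * 32
mixed = ["NP -> NP PP", "VP -> VP PP", "S -> NP VP", "PP -> IN NP"] * 8
print(divergence_cost(uniform))  # (1, 1.0)
print(divergence_cost(mixed))    # (4, 0.25)
```

Under this model a warp whose threads each follow a different one of four paths spends three quarters of its lane-slots idle.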
Coalescence
Dense, Uniform Computation
Warp Coalescence
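Coalescing can be illustrated by counting how many aligned memory segments a warp touches when thread t reads the score of item t for one symbol. The sketch assumes 128-byte segments and 4-byte floats (real transaction granularity depends on the GPU generation), and compares two hypothetical chart layouts: item-major `[item][symbol]` and symbol-major `[symbol][item]`. The function names are illustrative only.

```python
SEGMENT_BYTES = 128  # assumed transaction size; actual granularity varies by GPU
FLOAT_BYTES = 4
WARP_SIZE = 32

def transactions(addresses):
    """Number of distinct aligned segments touched by one warp-wide load."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

def load_addresses(layout, num_items, num_symbols, symbol):
    """Byte addresses when thread t reads score[item t][symbol]."""
    addrs = []
    for t in range(WARP_SIZE):
        if layout == "item_major":      # scores stored as [item][symbol]
            index = t * num_symbols + symbol
        elif layout == "symbol_major":  # scores stored as [symbol][item]
            index = symbol * num_items + t
        else:
            raise ValueError("unknown layout: " + layout)
        addrs.append(index * FLOAT_BYTES)
    return addrs

print(transactions(load_addresses("item_major", 32, 64, 5)))    # 32
print(transactions(load_addresses("symbol_major", 32, 64, 5)))  # 1
```

With the symbol-major layout the 32 reads are contiguous and fall into a single segment; with the item-major layout each thread's read lands in a different segment.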
Irregular, Sparse × Regular, Dense
[Canny, Hall, and Klein, 2013]
CKY Algorithm
for each sentence:
    for each span (begin, end), shortest spans first:
        for each split with begin < split < end:
            for each rule (P -> L R):
                score[begin, end, P] += ruleScore[P -> L R] * score[begin, split, L] * score[split, end, R]
Grammar Application / Item Queue
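The loop nest above can be made runnable for a single sentence as follows. This is a minimal sketch, assuming a dictionary chart keyed by `(begin, end, symbol)`, a lexicon that maps each word to preterminal scores, and binary rules given as `(P, L, R, rule_score)` tuples; these input formats and the name `cky_inside` are assumptions for illustration. Like the pseudocode, it sums over splits and rules (inside scores), and it visits spans in order of increasing width so both children are complete before a parent is scored.

```python
from collections import defaultdict

def cky_inside(words, lexicon, binary_rules):
    """Inside (sum-product) CKY for one sentence, mirroring the slide's loops.

    words        -- list of tokens
    lexicon      -- {word: {preterminal: score}}      (assumed input format)
    binary_rules -- [(P, L, R, rule_score)] for rules P -> L R
    Returns score, a dict keyed by (begin, end, symbol).
    """
    n = len(words)
    score = defaultdict(float)
    for i, word in enumerate(words):
        for tag, s in lexicon.get(word, {}).items():
            score[(i, i + 1, tag)] = s
    for width in range(2, n + 1):  # shortest spans first
        for begin in range(n - width + 1):
            end = begin + width
            for split in range(begin + 1, end):
                for P, L, R, rule_score in binary_rules:
                    left = score.get((begin, split, L), 0.0)
                    right = score.get((split, end, R), 0.0)
                    score[(begin, end, P)] += rule_score * left * right
    return score

lexicon = {"she": {"NP": 1.0}, "eats": {"V": 1.0}, "fish": {"NP": 1.0}}
rules = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 0.5)]
chart = cky_inside(["she", "eats", "fish"], lexicon, rules)
print(chart[(0, 3, "S")])  # 0.5
```

Every (span, split, rule) triple is visited whether or not its children are nonzero; that exhaustive inner loop is what the following slides reorganize into an item queue plus grammar application.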
for each sentence:
    for each span (begin, end):
        for each split:
            applyGrammar(begin, split, end)

for each parse item in sentence:
    applyGrammar(item)
Item Queue / Grammar Application
CPU / GPU
Item queue (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)
Grammar: S → NP VP
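A parse item (i, k, j) pairs a span (i, j) with a split point k. The items listed above are exactly the width-3 items of a 4-word sentence. The sketch below enumerates them and groups the whole sentence's items into batches by span width. Grouping by width is an assumption chosen because it is one valid schedule, not necessarily the parser's own: items of equal width depend only on narrower spans, so each batch can be queued at once and applied in parallel. The function names are hypothetical.

```python
def items_for_width(n, width):
    """All parse items (i, k, j) with j - i == width in an n-word sentence."""
    return [(i, k, i + width)
            for i in range(n - width + 1)
            for k in range(i + 1, i + width)]

def item_queue(n):
    """Batches of items, narrowest spans first; items within a batch are independent."""
    return [items_for_width(n, width) for width in range(2, n + 1)]

print(items_for_width(4, 3))  # [(0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)]
print([len(batch) for batch in item_queue(4)])  # [3, 4, 3]
```

In the CPU/GPU split shown on the slides, building such a queue is the cheap, irregular part, while applying the grammar to each queued item is the dense part handed to the GPU.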
[Canny, Hall, and Klein, 2013]
[Plot: sentences per second]
Grammar: S → NP VP
CPU queuing / GPU application
Warp
Items (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 6)
Symbols per item: S NP VP PP … (one copy for each of the 7 items)
Warp Divergence
Grammar: S → NP VP
GPU application
Rules: NP → NP PP; VP → VP PP; S → NP VP; PP → IN NP
Items (i, k, j) for each rule: (0, 1, 3), (0, 2, 3)
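Pairing each rule with the same list of items suggests reordering the work so that one rule is applied to a whole batch of items before moving to the next rule. The sketch compares that rule-major order with an item-major order (one item at a time, looping over every rule). Both compute identical scores; in the rule-major version every item in the inner loop executes the same rule at the same step, which is the uniform pattern a warp prefers. The rules come from the slide, but their probabilities and the chart scores are made up, and the function names are hypothetical.

```python
from collections import defaultdict

# Rules from the slide; the probabilities are invented for illustration.
RULES = [("NP", "NP", "PP", 0.3), ("VP", "VP", "PP", 0.2),
         ("S", "NP", "VP", 1.0), ("PP", "IN", "NP", 1.0)]

def item_major(items, score):
    """One item at a time, each looping over every rule."""
    out = defaultdict(float)
    for i, k, j in items:
        for p, l, r, w in RULES:
            out[(i, j, p)] += w * score.get((i, k, l), 0.0) * score.get((k, j, r), 0.0)
    return out

def rule_major(items, score):
    """One rule at a time, applied to every item in the batch."""
    out = defaultdict(float)
    for p, l, r, w in RULES:
        for i, k, j in items:  # same rule, different data
            out[(i, j, p)] += w * score.get((i, k, l), 0.0) * score.get((k, j, r), 0.0)
    return out

items = [(0, 1, 3), (0, 2, 3)]  # the items shown on the slide
score = {(0, 1, "NP"): 1.0, (1, 3, "VP"): 0.5, (0, 2, "NP"): 0.4, (2, 3, "PP"): 0.5}
print(rule_major(items, score)[(0, 3, "S")])  # 0.5
```

Because each parent symbol here has exactly one rule, both orders add the same terms in the same sequence, so the two result dictionaries are exactly equal.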
CPU / GPU
Items (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)
NP → NP PP (×5)
CPU / GPU
Items (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)
NP → NP PP (×3), VP → VP NP