Natural Language Processing Machine Translation III Dan Klein UC - - PowerPoint PPT Presentation

slide-1
SLIDE 1

1

Natural Language Processing

Machine Translation III

Dan Klein – UC Berkeley

slide-2
SLIDE 2

2

Syntactic Models

slide-29
SLIDE 29

29

Syntactic Decoding

slide-49
SLIDE 49

49

Flexible Syntax

slide-50
SLIDE 50

50

Soft Syntactic MT: From Chiang 2010

slide-52
SLIDE 52

52

Hiero Rules

From [Chiang et al., 2005]

slide-65
SLIDE 65

65

Exploiting GPUs

slide-66
SLIDE 66

66

Lots to Parse

≈2.6 billion words

slide-67
SLIDE 67

67

Lots to Parse

≈6 months (CPU)

slide-68
SLIDE 68

68

Lots to Parse

≈3.6 days (GPU)
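
The CPU and GPU figures on the "Lots to Parse" slides imply a speedup that can be checked with a little arithmetic. A minimal sketch, assuming one month is about 30.4 days (an assumption, not stated on the slides):

```python
# Rough speedup implied by the "Lots to Parse" slides.
# Assumption (not from the slides): one month is about 30.4 days.
DAYS_PER_MONTH = 30.4

cpu_days = 6 * DAYS_PER_MONTH   # "≈6 months (CPU)"
gpu_days = 3.6                  # "≈3.6 days (GPU)"

speedup = cpu_days / gpu_days
print(f"CPU ~{cpu_days:.0f} days, GPU {gpu_days} days, speedup ~{speedup:.0f}x")
```

Under that assumption the GPU run is roughly fifty times faster than the CPU run.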

slide-69
SLIDE 69

69

CPU Parsing

  • NLP algorithms achieve speed by exploiting sparsity.

>98% sparsity

Slide credit: Slav Petrov

[Petrov & Klein, 2007]

slide-70
SLIDE 70

70

CPU Parsing

Grammar: S -> NP VP

Skip Spans, Skip Rules

slide-71
SLIDE 71

71

CPU Parsing

CPU

slide-72
SLIDE 72

72

CPU Parsing

CPU

slide-73
SLIDE 73

73

CPU Parsing

CPU CPU

slide-74
SLIDE 74

74

The Future of Hardware

slide-75
SLIDE 75

75

The Future of Hardware

slide-76
SLIDE 76

76

The Future of Hardware

slide-77
SLIDE 77

77

The Future of Hardware

16384

slide-78
SLIDE 78

78

The Future of Hardware

32 Threads

slide-79
SLIDE 79

79

The Future of Hardware

Warp

add.s32 %r1, %r631, %r0;
ld.global.f32 %f81, [%r1];
ld.global.f32 %f82, [%r34];
mul.ftz.f32 %f94, %f82, %f81;
mov.f32 %f95, 0f3E002E23;
mov.f32 %f96, 0f00000000;
mad.f32 %f93, %f94, %f95, %f96;
shl.b32 %r2, %r646, 8;
add.s32 %r3, %r658, %r2;
shl.b32 %r4, %r3, 2;
add.s32 %r5, %r631, %r4;
mul.lo.s32 %r6, %r646, 588;
shl.b32 %r7, %r6, 1;
add.s32 %r8, %r5, %r7;
ld.global.f32 %f83, [%r8];
mul.ftz.f32 %f98, %f82, %f83;

slide-80
SLIDE 80

80

Warps

Warp

slide-81
SLIDE 81

81

Warps

Warp Divergence

slide-82
SLIDE 82

82

Warps

slide-83
SLIDE 83

83

Warps

slide-84
SLIDE 84

84

Warps

Warp Divergence

slide-85
SLIDE 85

85

Warps

Warp Divergence
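
Warp divergence can be illustrated with a toy cost model. The sketch below assumes the usual lockstep behaviour: when the 32 threads of a warp split across an if/else, the warp runs both sides one after the other, with the inactive threads masked off. The function name `warp_steps` and the cost of 10 steps per branch are illustrative choices, not taken from the slides.

```python
WARP_SIZE = 32  # "32 Threads" per warp

def warp_steps(branch_taken, cost_if, cost_else):
    """Toy lockstep cost of one if/else for a single warp."""
    assert len(branch_taken) == WARP_SIZE
    steps = 0
    if any(branch_taken):        # at least one thread needs the 'if' side
        steps += cost_if
    if not all(branch_taken):    # at least one thread needs the 'else' side
        steps += cost_else
    return steps

uniform = [True] * WARP_SIZE                        # all threads agree
divergent = [t % 2 == 0 for t in range(WARP_SIZE)]  # threads disagree

print(warp_steps(uniform, 10, 10))    # 10
print(warp_steps(divergent, 10, 10))  # 20
```

In this model a single disagreeing thread is enough to make the whole warp pay for both branches.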

slide-86
SLIDE 86

86

Warps

✔ ✗

Coalescence
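
Coalescence can be sketched the same way: count how many memory segments one warp touches in a single load. The 128-byte segment size and 4-byte floats are assumptions made for this toy model, and `transactions` is a hypothetical helper name.

```python
SEGMENT_BYTES = 128  # assumed size of one memory transaction
FLOAT_BYTES = 4      # assumed element size
WARP_SIZE = 32

def transactions(addresses):
    """Number of distinct segments touched by one warp-wide load."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

# Coalesced: thread t reads element t, so the warp reads one contiguous run.
coalesced = [FLOAT_BYTES * t for t in range(WARP_SIZE)]
# Uncoalesced: thread t reads element 32 * t, so every read lands in its own segment.
strided = [FLOAT_BYTES * 32 * t for t in range(WARP_SIZE)]

print(transactions(coalesced))  # 1
print(transactions(strided))    # 32
```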

slide-87
SLIDE 87

87

Designing GPU Algorithms

Dense, Uniform Computation

Warp Coalescence

slide-88
SLIDE 88

88

Designing GPU Algorithms

Irregular, Sparse: CPU

Regular, Dense: GPU

slide-89
SLIDE 89

89

Designing GPU Algorithms

Irregular, Sparse: CPU

Regular, Dense: GPU

[Canny, Hall, and Klein, 2013]

slide-90
SLIDE 90

90

Designing GPU Algorithms

CKY Algorithm

slide-91
SLIDE 91

91

CKY Parsing

for each sentence:
    for each span (begin, end):
        for each split:
            for each rule (P -> L R):
                score[begin, end, P] += ruleScore[P -> L R] * score[begin, split, L] * score[split, end, R]

Item Queue: the sentence, span, and split loops

Grammar Application: the rule loop
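
The loop nest above can be made runnable for a single sentence. This is a minimal sketch, not the implementation behind the slides: the function name `cky_inside`, the dictionary layout, and the three-word toy grammar are all assumptions for illustration, and the width-1 (lexical) initialisation is added because the slide only shows the binary-rule loops.

```python
from collections import defaultdict

def cky_inside(words, lexicon, rules):
    """Sum-product CKY: returns score[(begin, end, symbol)].

    lexicon: {(tag, word): prob}; rules: {(P, L, R): prob}, binary rules only.
    """
    n = len(words)
    score = defaultdict(float)
    for i, word in enumerate(words):            # width-1 spans
        for (tag, w), p in lexicon.items():
            if w == word:
                score[i, i + 1, tag] += p
    for width in range(2, n + 1):               # shorter spans first
        for begin in range(n - width + 1):
            end = begin + width
            for split in range(begin + 1, end):
                for (P, L, R), p in rules.items():
                    score[begin, end, P] += (
                        p * score[begin, split, L] * score[split, end, R]
                    )
    return score

# Hypothetical toy grammar and sentence.
lexicon = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
chart = cky_inside(["she", "eats", "fish"], lexicon, rules)
print(chart[0, 3, "S"])  # 0.25
```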

slide-92
SLIDE 92

92

CKY Parsing

for each sentence:
    for each span (begin, end):
        for each split:
            applyGrammar(begin, split, end)

Item Queue Grammar Application

slide-93
SLIDE 93

93

CKY Parsing

for each parse item in sentence:
    applyGrammar(item)

Item Queue Grammar Application
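
The split into an item queue and a grammar-application step can be sketched as two small functions. The names `parse_items` and `apply_grammar`, the dictionary-based chart, and the toy scores are assumptions for illustration. Items are generated shortest span first so that the child scores an item needs already exist when it is processed.

```python
def parse_items(n):
    """Yield items (begin, split, end) for an n-word sentence, shorter spans first."""
    for width in range(2, n + 1):
        for begin in range(n - width + 1):
            end = begin + width
            for split in range(begin + 1, end):
                yield (begin, split, end)

def apply_grammar(score, item, rules):
    """Apply every binary rule P -> L R to one queued item."""
    begin, split, end = item
    for (P, L, R), p in rules.items():
        left = score.get((begin, split, L), 0.0)
        right = score.get((split, end, R), 0.0)
        score[begin, end, P] = score.get((begin, end, P), 0.0) + p * left * right

# Hypothetical toy input: width-1 scores are assumed to be filled in already.
score = {(0, 1, "NP"): 0.5, (1, 2, "V"): 1.0, (2, 3, "NP"): 0.5}
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

for item in parse_items(3):      # the item queue
    apply_grammar(score, item, rules)

print(score[0, 3, "S"])  # 0.25
```

For a four-word sentence, the width-3 items produced by `parse_items` are (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), the same (i, k, j) triples that appear in the queue on the pipeline slide.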

slide-94
SLIDE 94

94

CKY Parsing

for each parse item in sentence:
    applyGrammar(item)

CPU GPU

slide-95
SLIDE 95

95

GPU Parsing Pipeline

CPU GPU

Queue

(i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)

Grammar: S -> NP VP

slide-96
SLIDE 96

96

Parsing Speed

[Canny, Hall, and Klein, 2013]

GPU: 190 s/sec, CPU: 10 s/sec

Sentences per second

slide-97
SLIDE 97

97

Exploiting Sparsity

Grammar: S -> NP VP

CPU Queuing, GPU Application

slide-98
SLIDE 98

98

Exploiting Sparsity

Grammar S NP VP

GPU Application

Grammar S NP VP

GPU Application

slide-99
SLIDE 99

99

Exploiting Sparsity

Warp

(1, 2, 4) (0, 1, 3) (0, 2, 3) (1, 3, 4)

(2, 3, 5) (2, 4, 5) (3, 4, 6)

3 2

slide-100
SLIDE 100

100

Exploiting Sparsity

(1, 2, 4) (0, 1, 3) (0, 2, 3) (1, 3, 4)

(2, 3, 5) (2, 4, 5) (3, 4, 6)

Each of the seven items is paired with the full symbol list: S, NP, VP, PP, …

slide-101
SLIDE 101

101

Exploiting Sparsity

Warp Divergence

slide-102
SLIDE 102

102

Exploiting Sparsity

Grammar S NP VP

GPU Application

slide-103
SLIDE 103

103

Exploiting Sparsity

Grammar rules grouped by parent symbol:

NP: NP -> NP PP
VP: VP -> VP PP
S: S -> NP VP
PP: PP -> IN NP

Queue (i, k, j): (0, 1, 3), (0, 2, 3)

NP queue (i, k, j): (0, 1, 3), (0, 2, 3)
VP queue (i, k, j): (0, 1, 3), (0, 2, 3)
S queue (i, k, j): (0, 1, 3), (0, 2, 3)
PP queue (i, k, j): (0, 1, 3), (0, 2, 3)
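
The per-symbol layout can be sketched as two grouping steps: rules are bucketed by parent symbol, and each item is copied into the queue of every parent symbol still permitted for its span. The `allowed` mask below is a hypothetical stand-in for whatever pruning decides which symbols survive on a span; it and the function names are assumptions, not taken from the slides.

```python
from collections import defaultdict

def rules_by_parent(rules):
    """Group binary rules (P, L, R) by their parent symbol P."""
    grouped = defaultdict(list)
    for (P, L, R) in rules:
        grouped[P].append((P, L, R))
    return dict(grouped)

def queues_by_parent(items, allowed):
    """Build one item queue per parent symbol.

    allowed: {(begin, end): set of parent symbols kept for that span}
    (a hypothetical pruning mask).
    """
    queues = defaultdict(list)
    for (i, k, j) in items:
        for symbol in allowed.get((i, j), ()):
            queues[symbol].append((i, k, j))
    return dict(queues)

rules = [("NP", "NP", "PP"), ("VP", "VP", "PP"), ("S", "NP", "VP"), ("PP", "IN", "NP")]
items = [(0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)]
allowed = {(0, 3): {"NP", "S"}, (1, 4): {"VP"}}  # illustrative values only

grammar = rules_by_parent(rules)
queues = queues_by_parent(items, allowed)
print(queues["NP"])  # [(0, 1, 3), (0, 2, 3)]
print(queues["VP"])  # [(1, 2, 4), (1, 3, 4)]
```

With this layout, all work drawn from one queue applies the same small block of rules, so the threads processing that queue follow the same code path.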

slide-104
SLIDE 104

104

Exploiting Sparsity

CPU GPU

NP Queue (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)

NP: NP -> NP PP

slide-105
SLIDE 105

105

Exploiting Sparsity

CPU GPU

VP Queue (i, k, j): (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)

NP: NP -> NP PP

VP: VP -> VP NP

slide-106
SLIDE 106

106

Parsing Speed

GPU (Minimum Risk): 190 s/sec, GPU (Viterbi): 405 s/sec, CPU: 10 s/sec