Human-Inspired Structured Prediction for Language and Biology (PowerPoint PPT Presentation)

SLIDE 5

Human-Inspired Structured Prediction for Language and Biology

Liang Huang
Principal Scientist, Baidu Research
Assistant Professor, Oregon State University

incremental & linear-time; simultaneous interpretation

natural language sequence => syntactic structure:
I eat sushi with tuna from Japan

RNA sequence => secondary structure:
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
[shown on the slide with positions 1-76 marked; a 76nt tRNA]

SLIDE 8

A Bit about Myself…

My PhD Graduates

Ashish Vaswani (USC, 2014; co-advised by David Chiang)
Senior Research Scientist, Google Brain
first author of the Transformer paper, “Attention is All You Need”

James Cross (OSU, 2016)
Research Scientist, Facebook
EMNLP 2016 Best Paper Honorable Mention

Kai Zhao (OSU, 2017)
Research Scientist, Google
11 top-conference papers (ACL/EMNLP/NAACL)

Mingbo Ma (OSU, 2018)
Research Scientist, Baidu Research USA
breakthrough in simultaneous translation

My Research: Efficient Structured Prediction

natural language sentence => syntactic structure:
I eat sushi with tuna from Japan

RNA sequence => secondary structure:
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

source-language sentence => target-language sequence:
Bush met Putin in Moscow => 布什在莫斯科与普京会晤
SLIDE 11

Language is Hard, Even for Humans

  • how many interpretations?

I saw her duck

lexical ambiguity

Aravind Joshi (1929-2018)

SLIDE 14

Language is Hard, Even for Humans

  • how many interpretations?

I eat sushi with tuna

structural ambiguity

Aravind Joshi (1929-2018)

SLIDE 16

Unexpected Structural Ambiguity

SLIDE 21

Language is Hard: Ambiguity Explosion

  • how many interpretations?

I saw her duck with a telescope in the garden ... ...

But humans can resolve these ambiguities incrementally, in linear time!

SLIDE 27

Human-Inspired NLP?

  • human sentence processing is well-known to be incremental and linear-time: O(n)
  • natural language processing algorithms are often non-incremental and slow: they often need the full sentence as input and run in superlinear time, e.g. O(n^3)

I eat sushi with tuna from Japan

[figures: a bi-directional RNN encoder over “<s> I do like eating fish </s>” with forward states f0…f5 and backward states b0…b5; seq-to-seq translation waits for the whole source sentence before emitting the target]

SLIDE 30

Three Stories on Incrementality and Instantaneity

incremental parsing: natural language sentence => syntactic structure
I eat sushi with tuna from Japan

linear-time RNA structure prediction: RNA sequence => secondary structure
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

simultaneous translation: source-language sentence => target-language sequence
Bush met Putin in Moscow => 布什在莫斯科与普京会晤

SLIDE 31

Part I: Simultaneous Translation

(Ma, Huang, et al, ArXiv 2018; under review)

SLIDE 35

Background: Consecutive vs. Simultaneous Interpretation

consecutive interpretation: multiplicative latency (x2)
simultaneous interpretation: additive latency (+3 secs)

simultaneous interpretation is extremely difficult:

  • only ~3,000 qualified simultaneous interpreters world-wide
  • each interpreter can only sustain for at most 10-30 minutes
  • the best interpreters can only cover ~60% of the source material

just use standard full-sentence translation (e.g., seq-to-seq)? no: this is one of the holy grails of AI, and we need fundamentally different ideas!

SLIDE 36

Our Breakthrough

Baidu World Conference, November 2017: full-sentence (non-simultaneous) translation, latency: one sentence (10+ secs), as in many other companies' systems
Baidu World Conference, November 2018 (our work): simultaneous translation achieved for the first time, latency ~3 secs

SLIDE 44

Challenge: Word Order Difference

  • e.g. translating from Subj-Obj-Verb (Japanese, German) to Subj-Verb-Obj (English)
  • German is underlyingly SOV, and Chinese is a mix of SVO and SOV
  • human simultaneous interpreters routinely “anticipate” (e.g., predicting the German verb) (Grissom et al, 2014)

full sentence: President Bush meets with Russian President Putin in Moscow
non-anticipative: President Bush (…… waiting ……) meets with Russian …
anticipative: President Bush meets with Russian President Putin in Moscow

SLIDE 53

Our Solution: Prefix-to-Prefix

seq-to-seq: … wait for the whole source sentence …, then translate:
  p(y_i | x_1 … x_n, y_1 … y_{i-1})
prefix-to-prefix (wait-k): wait k source words, then translate concurrently:
  p(y_i | x_1 … x_{i+k-1}, y_1 … y_{i-1})

  • standard seq-to-seq is only suitable for conventional full-sentence MT
  • we propose prefix-to-prefix, tailored to simultaneous MT
  • special case: the wait-k policy: the translation is always k words behind the source sentence
  • training in this way enables anticipation

wait-2 example:
source: 布什 (Bùshí, Bush) 总统 (zǒngtǒng, President) 在 (zài, in) 莫斯科 (Mòsīkē, Moscow) 与 (yǔ, with) 俄罗斯 (Éluósī, Russian) 总统 (zǒngtǒng, President) 普京 (Pǔjīng, Putin) 会晤 (huìwù, meet)
target: President Bush meets with Russian President Putin in Moscow
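
A minimal sketch of wait-k decoding (hypothetical code, not the paper's implementation; model.predict_next is an assumed hook returning the most likely next target word given the current source and target prefixes):

def wait_k_decode(model, source_stream, k, max_len=200):
    # Greedy wait-k decoding: the target stays k words behind the source.
    # model.predict_next(src, tgt) is an assumed hook returning
    # argmax_y p(y | x_1..x_{i+k-1}, y_1..y_{i-1}).
    src, tgt = [], []
    for word in source_stream:          # source words arrive one at a time
        src.append(word)
        if len(src) < k:                # still waiting for the first k words
            continue
        y = model.predict_next(src, tgt)
        if y == "</s>":
            return tgt
        tgt.append(y)
    while len(tgt) < max_len:           # source exhausted: decode the tail
        y = model.predict_next(src, tgt)
        if y == "</s>":
            break
        tgt.append(y)
    return tgt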

SLIDE 56

Research Demo

This is just our research demo. Our production system is better (shorter ASR latency).

source: 江泽民 对 法国 总统 的 来华 访问 表示 感谢 。
pinyin: jiāng zémín duì fǎguó zǒngtǒng de láihuá fǎngwèn biǎoshì gǎnxiè
gloss: Jiang Zemin / to / French / President / ’s / to-China / visit / express / gratitude
output: jiang zemin expressed his appreciation for the visit by french president .

SLIDE 57

Latency-Accuracy Tradeoff

SLIDE 59

Deployment Demo

This is a live recording from the Baidu World Conference on Nov 1, 2018.

SLIDE 63

Experimental Results (German => English)

German source: doch während man sich im kongress nicht auf ein vorgehen einigen kann , warten mehrere bundesstaaten nicht länger .
gloss: but while they self in congress not on one action agree can , wait several states not longer
English translation (simultaneous, wait-3): but , while congress does not agree on a course of action , several states no longer wait .
English translation (full-sentence baseline): but , while congressional action can not be agreed , several states are no longer waiting .

baselines: Gu et al. (2017) and full-sentence translation

consecutive wait (CW) on the example “I traveled to Ulm by train” (source gloss: “took I a train to Ulm”):
  • full-sentence baseline: wait 8 words, so CW = 8
  • Gu et al. (2017): wait 2 words, then 6 words, so CW = (2+6)/2 = 4
  • our wait-4 model: wait 4 words, then 1 word before each remaining output, so CW = (4+1+1+1+1)/5 = 1.6
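
The CW numbers above are averages over the maximal runs of consecutive reads; a minimal sketch (the read/write action strings below are illustrative encodings, not from the paper):

def consecutive_wait(actions):
    # CW: average length of the maximal runs of consecutive Reads ("R")
    # in a Read/Write action sequence.
    runs, run = [], 0
    for a in actions:
        if a == "R":
            run += 1
        elif run:                       # a Write ("W") ends a run of Reads
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return sum(runs) / len(runs) if runs else 0.0

print(consecutive_wait("RRRRWRWRWRWRWW"))   # wait-4 pattern above: 1.6
print(consecutive_wait("RRWRRRRRRWWWWW"))   # Gu et al. pattern above: 4.0
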
SLIDE 64

Summary of Innovations and Impact

  • first simultaneous translation approach with integrated anticipation
  • inspired by human simultaneous interpreters, who routinely anticipate
  • first simultaneous translation approach with arbitrary controllable latency
  • previous RL-based approaches can encourage but can’t enforce a latency limit
  • very easy to train and scalable — minor changes to any neural MT codebase
  • prefix-to-prefix is very general; can be used in other tasks with simultaneity

SLIDE 68

Next: Integrate Incremental Predictive Parsing

  • how to be smarter about when to wait and when to translate?

optional reordering:
source: 关于 (guānyú, about) 克林顿主义 (kèlíndùnzhǔyì, Clintonism) , 没有 (méiyǒu, no) 准确 (zhǔnquè, accurate) 的 (de) 定义 (dìngyì, definition)
reference translation: “There is no accurate definition of Clintonism.”
ideal simultaneous translation: About Clintonism, there is no accurate definition.

mandatory reordering (i.e., wait):
source: 习近平 (xíjìnpíng, Xi Jinping) 于 (yú, in) 2012 年 (nián, year) 在 (zài, in) 北京 (běijīng, Beijing) 当选 (dāngxuǎn, elected)
reference translation: “Xi Jinping was elected in Beijing in 2012”
ideal simultaneous translation: Xi Jinping ….. was elected…

syntactic patterns: (Chinese) PP VP => (English) VP PP; (Chinese) PP S => (English) PP S or S PP

SLIDE 69

Part II: Linear-Time Incremental Parsing

(Huang & Sagae, ACL 2010*; Goldberg, Zhao, Huang, ACL 2013; Zhao, Cross, Huang, EMNLP 2013; Mi & Huang, ACL 2015; Cross & Huang, ACL 2016; Cross & Huang, EMNLP 2016**; Hong and Huang, ACL 2018)

* best paper nominee; ** best paper honorable mention

constituency parsing vs. dependency parsing of “the man bit the dog” [tree figures omitted]

SLIDE 71

Motivations for Incremental Parsing

  • simultaneous translation
  • auto completion (search suggestions)
  • question answering
  • dialog
  • speech recognition
  • input method editor

SLIDE 74

Human Parsing vs. Compilers vs. NL Parsing

compilers: x = y + 3; parsed as (:= (id x) (+ (id y) (const 3))) in O(n)
NL parsing: I eat sushi with tuna from Japan in O(n^3)
human parsing: I eat sushi with tuna from Japan in O(n)

  • can we design NL parsing algorithms that are both fast and accurate, inspired by human sentence processing and compilers?
  • our idea: generalize PL parsing (the LR algorithm) to NL parsing, but keep it O(n)
  • challenge: how to deal with the ambiguity explosion in NL?
  • solution: linear-time dynamic programming — both fast and accurate!

SLIDE 77

Solution: linear-time, DP, and accurate!

  • very fast linear-time dynamic programming parser
  • explores exponentially many trees (and outputs a forest)
  • accurate parsing accuracy on English & Chinese

[plot: this work runs in O(n), vs. other parsers at O(n^2), O(n^2.4), and O(n^2.5); DP explores exponentially many trees, while non-DP beam search does not]

SLIDE 88

Incremental Parsing (Shift-Reduce)

I eat sushi with tuna from Japan

step  action     stack               queue
0     (start)    (empty)             I eat sushi ...
1     shift      I                   eat sushi with ...
2     shift      I eat               sushi with tuna ...
3     l-reduce   eat(I)              sushi with tuna ...
4     shift      eat(I) sushi        with tuna from ...
5a    r-reduce   eat(I, sushi)       with tuna from ...
5b    shift      eat(I) sushi with   tuna from Japan ...

(here eat(I) means “I” has been attached as the left child of “eat”; at step 5 both r-reduce (5a) and shift (5b) are possible: a shift-reduce conflict)
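
A minimal greedy shift-reduce sketch (assuming a learned score(action, stack, queue) function, which is left unspecified here):

def shift_reduce_parse(words, score):
    # Greedy shift-reduce dependency parsing: at each step take the
    # highest-scoring action among shift / l-reduce / r-reduce.
    stack, queue, arcs = [], list(words), []
    while queue or len(stack) > 1:
        actions = (["shift"] if queue else []) + \
                  (["l-reduce", "r-reduce"] if len(stack) >= 2 else [])
        best = max(actions, key=lambda a: score(a, stack, queue))
        if best == "shift":
            stack.append(queue.pop(0))
        elif best == "l-reduce":        # second-from-top becomes child of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))   # (head, dependent)
        else:                           # r-reduce: top becomes child of its left neighbor
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

The shift-reduce conflict at step 5 above is exactly where the score has to choose between attaching “sushi” now (r-reduce) and reading “with” first (shift).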

SLIDE 89

Greedy Search

  • each state => three new states (shift, l-reduce, r-reduce)
  • greedy search: always pick the best next state
  • “best” is defined by a score learned from data

SLIDE 92

Beam Search

  • each state => three new states (shift, l-reduce, r-reduce)
  • beam search: always keep the top-b states (see the sketch below)
  • still just a tiny fraction of the whole search space

psycholinguistic evidence: parallelism (Fodor et al, 1974; Gibson, 1991)
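
A sketch of the beam loop (assuming successors(state) yields (score_delta, next_state) pairs for the legal actions and states carry a cumulative score attribute; b=1 recovers greedy search):

import heapq

def beam_search(init_state, successors, n, b=8):
    # Keep only the top-b states after each of the 2n-1 actions
    # needed to parse an n-word sentence (n shifts, n-1 reduces).
    beam = [init_state]
    for _ in range(2 * n - 1):
        candidates = []
        for state in beam:
            for delta, nxt in successors(state):
                nxt.score = state.score + delta
                candidates.append(nxt)
        beam = heapq.nlargest(b, candidates, key=lambda s: s.score)
    return max(beam, key=lambda s: s.score)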

SLIDE 98

Dynamic Programming

  • each state => three new states (shift, l-reduce, r-reduce)
  • key idea of DP: share common subproblems
  • merge equivalent states => polynomial space

each DP state corresponds to exponentially many non-DP states: DP explores exponentially many trees, while non-DP beam search does not

graph-structured stack (Tomita, 1986)

(Huang and Sagae, 2010)

SLIDE 102

Merging (Ambiguity Packing)

  • two states are equivalent if they agree on features
  • because the same features guarantee the same cost
  • example: if we only care about the last 2 words on the stack, the states for “I eat sushi” fall into two equivalence classes, “... eat sushi” and “... I sushi” (a sketch follows below)

(Huang and Sagae, 2010)

psycholinguistic evidence (eye-tracking experiments): delayed disambiguation, e.g. “John and Mary had 2 papers” stays ambiguous until “each” (2 papers per person) or “together” (2 papers total) arrives (Frazier and Rayner (1990), Frazier (1999))
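
A sketch of the merging step (assuming parser states carry a score and a packed list of alternatives; signature is the feature extractor, e.g. the last two words on the stack):

from collections import defaultdict

def merge(states, signature):
    # Ambiguity packing: states with the same feature signature are
    # equivalent (same features guarantee same future costs), so keep
    # one best representative per class and pack the rest, as in a
    # graph-structured stack.
    classes = defaultdict(list)
    for state in states:
        classes[signature(state)].append(state)
    merged = []
    for group in classes.values():
        best = max(group, key=lambda s: s.score)
        best.packed += [s for s in group if s is not best]  # for later unpacking
        merged.append(best)
    return merged

# e.g. signature = lambda s: tuple(s.stack[-2:]) yields the two classes
# “... eat sushi” and “... I sushi” from the example above.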

SLIDE 103

Results: Fast and Accurate

Constituency parsing, PTB only, single model, end-to-end (Hong and Huang, 2018):

Parser                Note                                          F1 Score
Durrett + Klein 2015  cubic-time parser                             91.1
Cross + Huang 2016    original span parser (greedy)                 91.3
Liu + Zhang 2016      greedy / beam                                 91.7
Dyer et al. 2016      greedy / beam                                 91.7
Stern 2017a           cubic-time span-based parser                  91.79
Our Work              linear-time dynamic programming, span-based   91.97

[plot: chart parsing (cubic time) vs. this work]

SLIDE 104

Part III: Linear-Time RNA Structure Prediction

(Huang et al, 2019; under review)

SLIDE 106

Computational Linguistics => Computational Biology

biology: 1953 Watson & Crick: DNA double-helix
linguistics: 1955 Chomsky: context-free grammars
computer science: 1958 Backus & Naur: CFGs in programming languages; 1960s CKY parsing: O(n^3); 1965 Knuth: LR parsing: O(n); 1986 Tomita: generalized LR parsing
parsing: 2010: linear-time DP parsing (Huang & Sagae)
RNA: 1980s: O(n^3) CKY for RNA structures; 2018: linear-time RNA structure prediction

GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

[tree figures omitted]

SLIDE 109

RNAs and Structure Prediction

RNA sequence => secondary structure: structure prediction (“RNA folding”)
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

RNA has dual roles: informational (DNA => RNA => protein) and functional (non-coding RNAs); knowing structures can help infer function

SLIDE 115

RNA Secondary Structure Prediction

example: transfer RNA (tRNA), 76nt

input x:  GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
output y: (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....

allowed pairs: G-C, A-U, G-U; assume no crossing pairs

the dot-bracket output corresponds to a parse tree over the sequence

problem: standard structure prediction algorithms are way too slow: O(n^3)
solution: adapt my linear-time dynamic programming algorithms from parsing

SLIDE 117

How to Fold RNAs in Linear Time?

  • idea 0: tag each nucleotide from left to right as “(”, “)”, or “.”
  • maintain a stack: push on “(”, pop on “)”, skip on “.”
  • exhaustive: O(3^n) (a sketch of the stack discipline follows below)

5’ GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA 3’
   (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
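
A sketch of that stack discipline, checking one candidate tagging (enumerating all 3^n taggings with this check is the exhaustive algorithm; loop-length constraints are omitted here):

ALLOWED = {("G","C"), ("C","G"), ("A","U"), ("U","A"), ("G","U"), ("U","G")}

def is_valid(seq, dotbracket):
    # Scan left to right: push on '(', pop and check base-pair legality
    # on ')', skip '.'.
    stack = []
    for base, tag in zip(seq, dotbracket):
        if tag == "(":
            stack.append(base)
        elif tag == ")":
            if not stack or (stack.pop(), base) not in ALLOWED:
                return False
    return not stack                     # every '(' must be closed

print(is_valid("GCGC", "(())"))          # True: G-C and C-G pairs
print(is_valid("AAAA", "(())"))          # False: A-A is not an allowed pair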

SLIDE 119

How to Fold RNAs in Linear Time?

  • idea 1: DP by merging “equivalent states”
  • maintain graph-structured stacks
  • DP: O(n^3)

5’ GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA 3’
   (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....

SLIDE 123

How to Fold RNAs in Linear Time?

  • idea 2: approximate search: beam pruning
  • keep only the top b states per step
  • DP + beam: O(n) (a sketch follows below)

each DP state corresponds to exponentially many non-DP states

5’ GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA 3’
   (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
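
A LinearFold-style scaffold (a sketch only: initial_state, expand, and merge are caller-supplied helpers for the empty structure, the three tagging actions, and the packing of equivalent states, none of which are specified here):

def fold(seq, initial_state, expand, merge, b=100):
    # Scan left to right; at each position branch on '(' / ')' / '.',
    # merge equivalent states, then keep only the top-b states,
    # giving O(n * b) time overall.
    states = [initial_state()]
    for base in seq:
        nxt = []
        for s in states:
            nxt.extend(expand(s, base))   # apply '(' / ')' / '.' where legal
        states = sorted(merge(nxt), key=lambda s: s.score, reverse=True)[:b]
    return max(states, key=lambda s: s.score)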

SLIDE 126

Ambiguity Packing in Biology and Language

  • two states are “temporarily equivalent” if their rightmost unpaired brackets are the same

[lattice figure: prefixes such as “((” and “.(” are packed (written “?(”) while their rightmost unpaired brackets agree, and unpacked once a closing “)” disambiguates them]

psycholinguistic evidence (eye-tracking experiments): delayed disambiguation, e.g. “John and Mary had 2 papers … each / together” (Frazier and Rayner (1990), Frazier (1999))

SLIDE 127

Our Linear-Time Prediction is Much Faster…

running time per sequence:
  • 10,000nt (~HIV): ~4 min (cubic-time baselines) vs. 7 s (LinearFold)
  • 244,296nt (longest in RNAcentral): ~200 hrs vs. 120 s

[log-log plot of running time vs. length: Vienna RNAfold ~n^2.6, CONTRAfold MFE ~n^2.6, LinearFold b=100 ~n^1.0, LinearFold b=50 ~n^1.0]

with even slightly better prediction accuracy!!

SLIDE 129

… and Also More Accurate!

[bar charts: precision and recall of standard O(n^3) search vs. LinearFold O(n) search across RNA families (tRNA, 5S rRNA, SRP RNA, RNase P, tmRNA, Group I intron, telomerase RNA, 16S rRNA, 23S rRNA); * marks statistically significant differences]

  • especially on longer sequences and long-range base pairs

[bar charts: precision and recall broken down by base-pair distance]

SLIDE 130

Example: B. subtilis 16S rRNA (length: 1,552nt)

ground-truth vs. our work (linear-time) vs. the standard method (cubic-time) [structure figures omitted]

SLIDE 131

World’s Fastest RNA Structure Prediction Server

http://linearfold.eecs.oregonstate.edu:8080/

SLIDE 132

Incremental Parsing <=> Incremental Folding

  • humans process sentences incrementally
  • human language sentences evolve to be incrementally parsable
  • RNAs & proteins fold while being assembled
  • RNA & protein sequences evolve to be incrementally foldable
  • these might explain why linear-time search performs better than exact search

SLIDE 133

Fast Structure Prediction Enables RNA Design

RNA sequence => (structure prediction, “folding”) => RNA secondary structure => RNA 3D structure; design is the inverse problem
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

Professor Rhiju Das (Stanford Medical School): the EteRNA game (RNA design); detecting active TB using RNA design, which needs our fast RNA folding

SLIDE 134

Other Work, Recap, and Vision

SLIDE 139

Other Work: Structured Prediction: Linear-Time Learning

binary classification: input x, output y ∈ {-1, +1}
structured prediction: input x, structured output y, e.g. tagging (the man bit the dog => DT NN VB DT NN), constituency parsing, dependency parsing, translation (那人咬了狗 => the man bit the dog), scene parsing, image segmentation, protein folding

online learning with exact inference: predict z from x with the current model; update weights w if y ≠ z

with linear-time inexact inference (e.g. beam search) we proved new convergence theorems:
 (modified) online learning still converges with inexact search (a sketch follows below)
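
A sketch of one such online update, in the structured perceptron style (phi is the joint feature map and search the inexact linear-time decoder; the convergence results require violation-fixing variants such as early update, omitted here):

def perceptron_step(w, x, y, phi, search):
    # One online-learning step: decode with the current weights and,
    # if the prediction z differs from the gold y, move the weights
    # toward the gold features and away from the predicted ones.
    z = search(x, w)                     # inexact inference (e.g. beam)
    if z != y:
        for f, v in phi(x, y).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in phi(x, z).items():
            w[f] = w.get(f, 0.0) - v
    return w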

SLIDE 140

Other Work: Incremental Semantic Parsing

  • parse NL (e.g., English) into a formal meaning representation
  • type-driven parsing (formal semantics) + polymorphism (PL theory)
  • future work: (simultaneously) translating NL into PL (e.g., SQL)

What is the capital of the largest state by area?

SLIDE 141

Longer-Term Vision

linear-time search algorithms + grammar formalisms (context-free & beyond) + structured prediction with deep learning:
efficiently analyze and generate sequences with hierarchical structures: natural language, RNA/proteins, programming languages, music, etc.

examples: real-time accompaniment; protein folding; NL translation (那人咬了狗 => the man bit the dog); NL <=> PL translation, e.g.
self.plural = lambda n: int(n != 1)
<=> “self.plural is a lambda function with an argument n, which returns the result of the boolean expression n not equal to 1”

SLIDE 148

5-Year Vision in Natural Language Processing

  • simultaneous translation: from speech-to-text to speech-to-speech
  • incremental text-to-speech synthesis (language production is also incremental!)
  • incremental predictive parsing on the source side (to improve reordering)
  • incremental predictive parsing on the target side (to improve prosody)
  • helping the language-impaired with incremental parsing and simultaneous MT
  • simultaneous English <=> ASL translation; intelligent input systems
  • online grammatical error correction; automatic or computer-aided writing
  • incremental semantic parsing & code generation: NL => PL (e.g. SQL)
  • also the inverse problem of PL => NL translation (comment generation)
  • how do incremental predictive parsers compare with psycholinguistic data?

helping people communicate across linguistic and accessibility barriers

SLIDE 150

5-Year Vision in Computational Biology

  • linear-time incremental folding: from RNA to protein structure prediction
  • predicting crossing structures in RNAs/proteins: use linear-time parsing and mildly context-sensitive grammars (polynomial-time parsable; cf. the Chomsky hierarchy)
  • how does our beam search compare to real incremental folding in nature?

reestablishing the forgotten link between computational linguistics and structural biology

SLIDE 153

非常 感谢 您 来 听 我 的 演讲
Thank you very much for listening to my speech

Happy Chinese New Year!

natural language sentence => syntactic structure: I eat sushi with tuna from Japan
RNA sequence => secondary structure: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
source-language sentence => target-language sequence: Bush met Putin in Moscow => 布什在莫斯科与普京会晤

SLIDE 154

Backup Slide

SLIDE 156

Translation with Noisy Speech Input

  • neural MT is fragile, and automatic speech recognition output is noisy
  • our work (Liu et al, ArXiv 2018): robust neural MT using phonetic information