Human-Inspired Structured Prediction for Language and Biology
Liang Huang
Principal Scientist, Baidu Research; Assistant Professor, Oregon State University
natural language sequence: I eat sushi with tuna from Japan => syntactic structure
RNA sequence: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA => secondary structure
simultaneous interpretation
incremental & linear-time
My PhD Graduates
Ashish Vaswani (USC, 2014) (co-advised by David Chiang): Senior Research Scientist, Google Brain; first author of Transformer, “Attention is All You Need”
James Cross (OSU, 2016) (co-advised by David Chiang): Research Scientist; EMNLP 2016 Best Paper Honorable Mention
Kai Zhao (OSU, 2017) (co-advised by David Chiang): Research Scientist, Google; 11 top-conference papers (ACL/EMNLP/NAACL)
Mingbo Ma (OSU, 2018) (co-advised by David Chiang): Research Scientist, Baidu Research USA; breakthrough in simultaneous translation
My Research: Efficient Structured Prediction
natural language sentence: I eat sushi with tuna from Japan => syntactic structure
RNA sequence: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA => secondary structure
source language sentence: Bush met Putin in Moscow => target-language sequence: 布什在莫斯科与普京会晤
Aravind Joshi (1929-2018)
But humans can resolve these ambiguities incrementally, in linear time!
I eat sushi with tuna from Japan
O(n^3) vs. O(n)
[Figure: "<s> I do like eating fish </s>" with states f0 b0, f1 b1, ..., f5 b5 at positions 0 to 5]
[Figure: seq-to-seq translation; the target words are produced only after waiting for the whole source sentence]
incremental parsing: natural language sentence (I eat sushi with tuna from Japan) => syntactic structure
linear-time RNA structure prediction: RNA sequence (GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA) => secondary structure
simultaneous translation: source language sentence (Bush met Putin in Moscow) => target-language sequence (布什在莫斯科与普京会晤)
(Ma, Huang, et al, ArXiv 2018; under review)
consecutive interpretation: multiplicative latency (x2)
simultaneous interpretation: additive latency (+3 secs)
simultaneous interpretation is extremely difficult
interpreters world-wide
each interpreter can only sustain for at most 10-30 minutes
the best interpreters can only cover ~60% of the source material
cannot just use standard full-sentence translation (e.g., seq-to-seq)
need fundamentally different ideas!
Baidu World Conference, November 2017: full-sentence (non-simultaneous) translation; latency: one sentence (10+ secs)
Baidu World Conference, November 2018: simultaneous translation achieved for the first time; latency ~3 secs
and many other companies
Grissom et al, 2014
President Bush meets with Russian President Putin in Moscow
non-anticipative: President Bush (…… waiting ……) meets with Russian …
anticipative: President Bush meets with Russian President Putin in Moscow
seq-to-seq: … wait whole source sentence …, then produce the target words
$p(y_i \mid x_1 \dots x_n, y_1 \dots y_{i-1})$
prefix-to-prefix (wait-k): wait k words, then stay always k words behind source sentence
$p(y_i \mid x_1 \dots x_{i+k-1}, y_1 \dots y_{i-1})$
example (wait 2):
source: 布什 (Bùshí, Bush) 总统 (zǒngtǒng, President) 在 (zài, in) 莫斯科 (Mòsīkē, Moscow) 与 (yǔ, with) 俄罗斯 (Éluósī, Russian) 总统 (zǒngtǒng, President) 普京 (Pǔjīng, Putin) 会晤 (huìwù, meet)
target: President Bush meets with Russian President Putin in Moscow
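The wait-k formula above can be read as a fixed schedule of reading and writing. The following is a minimal sketch of that schedule only, not of the translation model: the function names and the READ/WRITE action labels are hypothetical, and capping the visible prefix at the source length n once the source has ended is an assumption that the formula on the slide does not spell out.

```python
from typing import List, Tuple


def visible_source_len(i: int, k: int, n: int) -> int:
    """Number of source words available when predicting target word i (1-indexed).

    Follows x_1 ... x_{i+k-1} from the wait-k formula. Capping at n (the
    source length) is an assumption for the end of the sentence.
    """
    return min(i + k - 1, n)


def wait_k_schedule(n: int, m: int, k: int) -> List[Tuple[str, int]]:
    """READ/WRITE actions for n source words and m target words under wait-k."""
    actions: List[Tuple[str, int]] = []
    read = 0
    for i in range(1, m + 1):
        # Read until the prefix needed for target word i is available.
        while read < visible_source_len(i, k, n):
            read += 1
            actions.append(("READ", read))
        actions.append(("WRITE", i))
    return actions


if __name__ == "__main__":
    # 9 source words, 9 target words, wait 2, as in the example above.
    for action in wait_k_schedule(n=9, m=9, k=2):
        print(action)
```

With k = 2 the schedule reads two source words, then alternates one write and one read, and finally writes the remaining target words once the source is exhausted.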
This is just our research demo. Our production system is better (shorter ASR latency).
江 泽民 对 法国 总统 的 来华 访问 表示 感谢 。
jiāng zémín duì fǎguó zǒngtǒng de láihuá fǎngwèn biǎoshì gǎnxiè
jiang zemin to French President ’s to-China visit express gratitude
jiang zemin expressed his appreciation for the visit by french president .
This is a live recording from the Baidu World Conference on Nov 1, 2018.
German source: doch während man sich im kongress nicht auf ein vorgehen einigen kann , warten mehrere bundesstaaten nicht länger .
gloss: but while they self in congress not on one action agree can wait several states not longer
English translation (simultaneous, wait 3): but , while congress does not agree on a course of action , several states no longer wait .
English translation (full-sentence baseline): but , while congressional action can not be agreed , several states are no longer waiting .
[Figure legend: Gu et al. (2017); full-sentence baselines]
full-sentence baseline: wait 8 words, then "I traveled to Ulm by train"; CW = 8
Gu et al. (2017): wait 2 words, then wait 6 words, "I traveled to Ulm by train"; CW = (2+6)/2 = 4
wait 4: wait 4 words, then "I took a train to Ulm" with waits 1 1 1 1
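The two CW values above are consistent with averaging the lengths of the consecutive wait segments. Here is a small sketch of that arithmetic; the function name and the list-of-waits input format are assumptions, and the definition is inferred from the slide's own computation (2+6)/2 = 4 rather than taken from a formal statement.

```python
from typing import Sequence


def consecutive_wait(wait_segments: Sequence[int]) -> float:
    """Average length of the wait segments (assumed reading of CW).

    Each entry is the number of source words read in one uninterrupted wait.
    """
    if not wait_segments:
        raise ValueError("need at least one wait segment")
    return sum(wait_segments) / len(wait_segments)


if __name__ == "__main__":
    print(consecutive_wait([8]))     # full-sentence baseline: one wait of 8 words
    print(consecutive_wait([2, 6]))  # two waits of 2 and 6 words
```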
关于 克林顿主义 , 没有 准确 的 定义
guānyú kèlíndùnzhǔyì , méiyǒu zhǔnquè de dìngyì
about Clintonism , no accurate def.
reference translation: “There is no accurate definition of Clintonism.”
ideal simultaneous: About Clintonism, there is no accurate definition.
(Chinese) PP S => (English) PP S or S PP

习近平 于 2012 年 在 北京 当选
xíjìnpíng yú 2012 nián zài běijīng dāngxuǎn
Xi Jinping in 2012 yr in Beijing elected
reference translation: “Xi Jinping was elected in Beijing in 2012”
mandatory reordering (i.e., wait): Xi Jinping ….. was elected…
(Chinese) PP VP => (English) VP PP
(Huang & Sagae, ACL 2010*; Goldberg, Zhao, Huang, ACL 2013; Zhao, Cross, Huang, EMNLP 2013; Mi & Huang, ACL 2015; Cross & Huang, ACL 2016; Cross & Huang, EMNLP 2016**; Hong and Huang, ACL 2018)
* best paper nominee ** best paper honorable mention
constituency parsing: the man bit the dog => (S (NP (DT the) (NN man)) (VP (VB bit) (NP (DT the) (NN dog))))
dependency parsing: the man bit the dog => bit -> man -> the; bit -> dog -> the
compilers: x = y + 3; => (:= (id x) (+ (id y) (const 3))), in O(n)
I eat sushi with tuna from Japan: O(n^3) vs. O(n)
inspired by human sentence processing and compilers?
[Figure: curves labeled O(n^2), O(n), O(n^2.4), O(n^2.5); this work; DP: exponential; non-DP beam search]
shift-reduce parsing of "I eat sushi with tuna from Japan" (dependents already attached are shown in parentheses after their head)
step  action    stack              queue
0               (empty)            I eat sushi ...
1     shift     I                  eat sushi with ...
2     shift     I eat              sushi with tuna ...
3     l-reduce  eat(I)             sushi with tuna ...
4     shift     eat(I) sushi       with tuna from ...
5a    r-reduce  eat(I, sushi)      with tuna from ...
5b    shift     eat(I) sushi with  tuna from Japan ...
steps 5a and 5b: shift-reduce conflict
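The action/stack/queue trace above can be replayed mechanically. The sketch below is a minimal, assumption-laden version: it takes the action sequence as given (a real parser has to choose between conflicting actions such as 5a and 5b), the function name is hypothetical, and l-reduce and r-reduce are interpreted as attaching the lower or upper of the top two stack items as a dependent of the other, which matches steps 3 and 5a.

```python
from typing import List, Sequence, Tuple

Arc = Tuple[str, str]  # (head, dependent)


def shift_reduce(words: Sequence[str], actions: Sequence[str]) -> Tuple[List[Arc], List[str], List[str]]:
    """Replay shift / l-reduce / r-reduce actions and return (arcs, stack, queue)."""
    stack: List[str] = []
    queue: List[str] = list(words)
    arcs: List[Arc] = []
    for action in actions:
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "l-reduce":
            # Top of stack becomes the head of the item below it.
            head = stack.pop()
            dependent = stack.pop()
            arcs.append((head, dependent))
            stack.append(head)
        elif action == "r-reduce":
            # Item below the top becomes the head of the top item.
            dependent = stack.pop()
            head = stack.pop()
            arcs.append((head, dependent))
            stack.append(head)
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs, stack, queue


if __name__ == "__main__":
    sentence = "I eat sushi with tuna from Japan".split()
    # Steps 1, 2, 3, 4, 5a from the trace above.
    steps = ["shift", "shift", "l-reduce", "shift", "r-reduce"]
    print(shift_reduce(sentence, steps))
```

Each word is shifted once and reduced at most once, so a single action sequence is linear in the sentence length; the difficulty lies in choosing among the exponentially many sequences.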
[Figure: search over the actions sh (shift), l-re (l-reduce), r-re (r-reduce)]
each DP state corresponds to exponentially many non-DP states
(Huang and Sagae, 2010)
DP: exponential
non-DP beam search
graph-structured stack
(Tomita, 1986)
(Huang and Sagae, 2010)
[Figure: stacks "I sushi", "I eat sushi", "eat sushi"]
two equivalence classes: "... eat sushi" and "... I sushi"
psycholinguistic evidence (eye-tracking experiments): delayed disambiguation
John and Mary had 2 papers each
John and Mary had 2 papers together
Frazier and Rayner (1990), Frazier (1999)
33
Parsre Note F1 Score
Durett + Klein 2015
cubic-time parser 91.1
Cross + Huang 2016
(greedy) 91.3
Liu + Zhang 2016
greedy / beam 91.7
Dyer et al. 2016
greedy / beam 91.7
Stern 2017a
cubic-time span-based parser 91.79 Our Work linear-time dynamic programming, span-based 91.97
Constituency parsing, PTB only, Single Model, End-to-End
(Hong and Huang, 2018)
c h a r t p a r s i n g ( c u b i c
i m e ) This Work
(Huang et al, 2019; under review)
linguistics
1955 Chomsky: context-free grammars
computer science
1958 Backus & Naur: CFGs in programming lang.
1960s CKY Parsing: O(n^3)
1965 Knuth: LR Parsing: O(n)
1986 Tomita: Generalized LR Parsing
2010: linear-time DP parsing (Huang & Sagae)
biology
1953 Watson & Crick: DNA double-helix
1980s: O(n^3) CKY for RNA structures
2018: linear-time RNA structure prediction
RNA sequence: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
structure prediction (“RNA folding”): RNA sequence => secondary structure
RNA has dual roles:
informational (DNA=>RNA=>protein)
functional (non-coding RNAs)
knowing the structure helps infer the function
example: transfer RNA (tRNA)
input x: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
output y: (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
allowed pairs: G-C A-U G-U
assume no crossing pairs
[Figure: tRNA secondary structure, positions 1 to 76, next to the parse tree (S (NP (DT the) (NN man)) (VP (VB bit) (NP (DT the) (NN dog)))); both O(n^3)]
problem: standard structure prediction algorithms are way too slow: O(n^3)
solution: adapt my linear-time dynamic programming algorithms from parsing
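The dot-bracket output and the pairing constraints above can be checked with a stack, in the same way balanced parentheses are parsed. This is a minimal sketch with hypothetical function names; it only validates a given structure against the allowed pairs G-C, A-U, G-U and does not predict anything. Because dot-bracket notation with a single bracket type cannot express crossing pairs, the no-crossing assumption holds by construction.

```python
from typing import List, Tuple

# Allowed base pairs from the slide, in both orientations.
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-indexed (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_valid_structure(seq: str, structure: str) -> bool:
    """True if lengths match and every encoded pair is an allowed pair."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in ALLOWED for i, j in pairs_from_dot_bracket(structure))


if __name__ == "__main__":
    trna = "GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA"
    fold = "(((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))...."
    print(len(pairs_from_dot_bracket(fold)), is_valid_structure(trna, fold))
```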
5’ GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA 3’
(((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
scan from 5’ to 3’, labeling each nucleotide with one of: ( . )
each DP state corresponds to exponentially many non-DP states
beam search
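To make the left-to-right scan with ( . ) actions and a beam concrete, here is a deliberately simplified sketch. It is not the LinearFold algorithm: the score is just the number of allowed base pairs (a swapped-in toy objective rather than an energy or learned model), there is no minimum hairpin length, and each state stores its entire stack of open positions, so the per-step cost is not constant. The function name and the default beam size are assumptions. States with identical stacks are merged, keeping only the best-scoring prefix, which is the flavor of packing the slides describe.

```python
from typing import Dict, Tuple

ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

Stack = Tuple[int, ...]        # positions of currently unmatched '('
Entry = Tuple[int, str]        # (number of pairs so far, structure prefix)


def beam_fold(seq: str, beam: int = 100) -> str:
    """Toy left-to-right beam search that maximizes the number of base pairs."""
    states: Dict[Stack, Entry] = {(): (0, "")}
    for j, base in enumerate(seq):
        candidates: Dict[Stack, Entry] = {}

        def add(stack: Stack, score: int, prefix: str) -> None:
            # Merge states with the same stack, keeping the higher score.
            if stack not in candidates or score > candidates[stack][0]:
                candidates[stack] = (score, prefix)

        for stack, (score, prefix) in states.items():
            add(stack, score, prefix + ".")                 # leave j unpaired
            add(stack + (j,), score, prefix + "(")          # open a pair at j
            if stack and (seq[stack[-1]], base) in ALLOWED:
                add(stack[:-1], score + 1, prefix + ")")    # close the latest open pair
        # Beam pruning: keep only the highest-scoring states.
        ranked = sorted(candidates.items(), key=lambda item: -item[1][0])
        states = dict(ranked[:beam])

    stack, (_, prefix) = max(states.items(), key=lambda item: item[1][0])
    chars = list(prefix)
    for i in stack:
        chars[i] = "."  # brackets that were never closed become unpaired
    return "".join(chars)


if __name__ == "__main__":
    print(beam_fold("GGGAAACCC"))
```

The number of states kept per position is bounded by the beam size, which is where the linear number of search steps comes from; storing full stacks in each state is the part a real implementation would replace with compact DP states.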
psycholinguistic evidence (eye-tracking experiments): delayed disambiguation
John and Mary had 2 papers each
John and Mary had 2 papers together
John and Mary … had 2 papers … together … each
Frazier and Rayner (1990), Frazier (1999)
[Figure: without packing, the prefixes ( (( ((. ((.) ((.)) and . .( .(. .(.) .(.). are kept separately; with packing, they are shared as ( ?( ?(. and unpacked into ((.) ((.)) and .(.)]
packing, unpacking
[Figure: running time per sequence (sec) vs. sequence length (nt)]
Vienna RNAfold: ~n^2.6
CONTRAfold MFE: ~n^2.6
LinearFold b=100: ~n^1.0
LinearFold b=50: ~n^1.0
10,000nt (~HIV): 4min vs. 7s
244,296nt (longest in RNAcentral): ~200hrs vs. 120s
with even slightly better prediction accuracy!!
[Figure: precision and recall (axis range 40 to 80) per RNA family (tRNA, 5S rRNA, SRP, RNaseP, tmRNA, Group I Intron, telomerase RNA, 16S rRNA, 23S rRNA); Standard O(n^3) search vs. LinearFold: O(n) search; some families are marked with *]
[Figure: precision and recall by base pair distance; Standard O(n^3) search vs. LinearFold: O(n) search]
[Figure: ground-truth vs. linear-time vs. standard method: cubic-time]
http://linearfold.eecs.oregonstate.edu:8080/
incrementally parsable
incrementally foldable
Chinese speech, English text, Chinese text
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
RNA sequence => RNA secondary structure => RNA 3D structure: structure prediction (“folding”)
reverse direction: design
Professor Rhiju Das, Stanford Medical School: EteRNA game (RNA design)
detecting active TB using RNA design which needs our fast RNA folding
binary classification: x => y=-1 or y=+1; exact inference x => z with weights w; learning: update weights if y ≠ z
structured prediction: x => y
tagging: the man bit the dog => DT NN VB DT NN
constituency parsing: the man bit the dog => (S (NP (DT the) (NN man)) (VP (VB bit) (NP (DT the) (NN dog))))
dependency parsing: the man bit the dog => bit -> man -> the; bit -> dog -> the
translation: 那 人 咬 了 狗 => the man bit the dog
scene parsing, image segmentation, protein folding, . . .
linear-time inexact inference
new convergence theorems: (modified) online learning still converges with inexact search
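The rule "update weights if y ≠ z" carries over from binary classification to structured outputs by comparing feature vectors of the gold structure y and the predicted structure z. The sketch below shows only the standard structured perceptron update on a tagging example; it does not implement the modified updates that the convergence theorems for inexact search refer to, and the feature function and names are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def perceptron_update(
    w: Counter,
    phi: Callable[[Sequence[str], Sequence[str]], Counter],
    x: Sequence[str],
    y: Sequence[str],
    z: Sequence[str],
) -> bool:
    """Standard structured perceptron step: w += phi(x, y) - phi(x, z) if y != z."""
    if list(y) == list(z):
        return False  # prediction is correct, no update
    for feature, value in phi(x, y).items():
        w[feature] += value
    for feature, value in phi(x, z).items():
        w[feature] -= value
    return True


def tag_features(words: Sequence[str], tags: Sequence[str]) -> Counter:
    """Toy feature map (an assumption): counts of (word, tag) pairs."""
    return Counter(zip(words, tags))


if __name__ == "__main__":
    weights: Counter = Counter()
    x = "the man bit the dog".split()
    y = "DT NN VB DT NN".split()   # gold tags from the slide
    z = "DT NN NN DT NN".split()   # a hypothetical wrong prediction
    perceptron_update(weights, tag_features, x, y, z)
    print(weights[("bit", "VB")], weights[("bit", "NN")])
```

Features shared by y and z cancel out, so only the parts where the prediction disagrees with the gold structure change the weights.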
What is the capital of the largest state by area?
linear-time search algorithms
grammar formalisms (context-free & beyond)
structured prediction with deep learning
natural language, RNA/proteins, programming languages, music, etc.
real-time accompaniment
protein folding
NL translation: 那 人 咬 了 狗 <=> the man bit the dog
NL <=> PL translation: "self.plural is a lambda function with an argument n, which returns result of boolean expression n not equal to 1" <=> self.plural = lambda n: int(n!=1)
helping people communicate across linguistic and accessibility barriers
Chomsky Hierarchy
reestablishing the forgotten link between computational linguistics and structural biology
natural language sentence: I eat sushi with tuna from Japan => syntactic structure
RNA sequence: GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA => secondary structure
source language sentence: Bush met Putin in Moscow => target-language sequence: 布什在莫斯科与普京会晤