Robsut Wrod Reocginiton
via Semi-Character RNN
Keisuke Sakaguchi, Kevin Duh, Matt Post, Ben Van Durme
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosnt mttaer in waht oredr the ltteers in a word are, the olny iprmoetnt tihng is
Masked priming: forward mask "########" (500 milliseconds), then prime "gadren" (60 milliseconds), then target "GARDEN"
Forster, K. I.; Davis, C.; Schoknecht, C.; and Carter, R. 1987. Masked priming with graphemically related forms: Repetition or partial activation? The Quarterly Journal of Experimental Psychology 39(2):211–251.
Condition | Example | # Fixations | Regression (%) | Avg. Fixation (ms)
Normal | The boy could not solve the problem so he asked for help. | 10.4 | 15.0 | 236
Internal | The boy cuold not slove the probelm so he aksed for help. | 11.4 | 17.6 | 244
End | The boy coudl not solev the problme so he askde for help. | 12.6 | 17.5 | 246
Begin | The boy oculd not oslve the rpoblem so he saked for help. | 13.0 | 21.5 | 259
Rayner, K.; White, S. J.; Johnson, R. L.; and Liversedge, S. P. 2006. Raeding wrods with jubmled lettres: There is a cost. Psychological Science 17(3):192–193.
[Figure: a row of four LSTM units, each feeding a softmax layer]
Kim, Y.; Jernite, Y.; Sontag, D.; and Rush, A. M. 2016. Character-aware neural language
– Should work for most words
– Except for anagrams: being/begin, quiet/quite, creditors/directors, views/wives, center/recent, licensed/declines
– Spelling checkers
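The anagram caveat can be checked with a small sketch. It assumes the caveat refers to an order-free (bag-of-characters) view of a word; the function names `bag_key` and `semi_character_key` are hypothetical and are not taken from the slides. The second key, which keeps the first and last characters fixed, is included only for comparison.

```python
def bag_key(word):
    """Order-free key: the sorted multiset of all characters in the word."""
    return "".join(sorted(word))


def semi_character_key(word):
    """Key that fixes the first and last characters and ignores internal order."""
    if len(word) < 2:
        return (word, "", "")
    return (word[0], "".join(sorted(word[1:-1])), word[-1])


# The anagram pairs listed on the slide.
ANAGRAM_PAIRS = [
    ("being", "begin"),
    ("quiet", "quite"),
    ("creditors", "directors"),
    ("views", "wives"),
    ("center", "recent"),
    ("licensed", "declines"),
]

for a, b in ANAGRAM_PAIRS:
    same_bag = bag_key(a) == bag_key(b)
    same_semi = semi_character_key(a) == semi_character_key(b)
    print(f"{a}/{b}: same bag key={same_bag}, same semi-character key={same_semi}")
```

Under this assumption, every listed pair collides on the order-free key, while fixing the first and last characters keeps the two words in each pair apart.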
– Jumble: Cambridge → Cmbarigde
– Delete: Cambridge → Camridge
– Insert: Cambridge → Cambpridge
– One type of noise to every word, except short words and numbers
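The three noise types can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `min_len=4` threshold for "short words", the lowercase insertion alphabet, and the choice to keep the first and last characters untouched for all three noise types (consistent with the Cambridge examples) are all assumptions.

```python
import random
import string


def jumble(word, rng):
    """Shuffle the internal characters; the first and last stay in place."""
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def delete(word, rng):
    """Drop one internal character (expects at least 3 characters)."""
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]


def insert(word, rng):
    """Insert one random lowercase letter between the first and last characters."""
    i = rng.randrange(1, len(word))
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def add_noise(sentence, noise, rng, min_len=4):
    """Apply one noise function to every word, skipping short words and numbers.

    Assumption: "short" means fewer than min_len characters. Tokens that are
    not purely alphabetic (numbers, words with punctuation) are left unchanged.
    """
    noisy = []
    for word in sentence.split():
        if len(word) < min_len or not word.isalpha():
            noisy.append(word)
        else:
            noisy.append(noise(word, rng))
    return " ".join(noisy)


rng = random.Random(0)  # fixed seed so runs are repeatable
for noise in (jumble, delete, insert):
    print(noise.__name__, "->", add_noise("Cambridge University in 1987", noise, rng))
```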
System | Jumble | Delete | Insert
CharCNN (Kim et al.) | 16 | 19 | 35
Enchant | 57 | 35 | 89
Commercial A | 54 | 60 | 93
Commercial B | 54 | 71 | 73
scRNN | 99 | 85 | 97
BPTT
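[Figure: scRNN architecture. Each of four input words is represented by three parts (b, i, e), passed through an LSTM, and then through a softmax layer. Input shown: "a rscheearch"; output shown: "to a research"]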
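One way to read the "b i e" input in the figure is as three concatenated vectors: a one-hot vector for the beginning character, a count vector for the internal characters, and a one-hot vector for the end character. The sketch below follows that reading; the 26-letter lowercase alphabet, the handling of one-character words, and the name `semi_character_vector` are assumptions made for illustration only.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase a-z only
INDEX = {ch: k for k, ch in enumerate(ALPHABET)}


def semi_character_vector(word):
    """Return the concatenation [b; i; e] as a flat list of integers.

    b: one-hot vector of the first character
    i: count vector (order ignored) of the internal characters
    e: one-hot vector of the last character
    Characters outside ALPHABET are dropped. For a one-character word the
    same character is used for both b and e (an assumption of this sketch).
    """
    n = len(ALPHABET)
    b, i, e = [0] * n, [0] * n, [0] * n
    chars = [ch for ch in word.lower() if ch in INDEX]
    if chars:
        b[INDEX[chars[0]]] = 1
        e[INDEX[chars[-1]]] = 1
        for ch in chars[1:-1]:
            i[INDEX[ch]] += 1
    return b + i + e


clean = semi_character_vector("Cambridge")
jumbled = semi_character_vector("Cmbarigde")
print(len(clean), clean == jumbled)
```

Because the internal characters are counted rather than ordered, a word and its internally jumbled form map to the same vector, while a deletion or insertion changes the counts.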
A simple model for correcting jumbled text
– Cooooool → cool
– Speling mistake → Spelling mistake
– Noise beyond word level