Robsut Wrod Reocginiton via Semi-Character RNN
Keisuke Sakaguchi, Kevin Duh, Matt Post, Ben Van Durme
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a word are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
Masked Priming
• Forward mask (500 milliseconds): ########
• Prime (60 milliseconds): gadren
• Target: GARDEN
Forster, K. I.; Davis, C.; Schoknecht, C.; and Carter, R. 1987. Masked priming with graphemically related forms: Repetition or partial activation? The Quarterly Journal of Experimental Psychology 39(2):211–251.
Eye movement tracking
Condition  Example                                                    # fixations  Regression (%)  Avg. fixation (ms)
Normal     The boy could not solve the problem so he asked for help.  10.4         15.0            236
Internal   The boy cuold not slove the probelm so he aksed for help.  11.4         17.6            244
End        The boy coudl not solev the problme so he askde for help.  12.6         17.5            246
Begin      The boy oculd not oslve the rpoblem so he saked for help.  13.0         21.5            259
Rayner, K.; White, S. J.; Johnson, R. L.; and Liversedge, S. P. 2006. Raeding wrods with jubmled lettres: There is a cost. Psychological Science 17(3):192–193.
Semi-Character Recurrent Net (scRNN)
[Figure: each word of the noisy input "Aoccdrnig to a rscheearch ..." is encoded as a (b, i, e) vector and fed to an LSTM; a softmax layer at each step outputs the corrected word, "According to a research ..."]
Input representation
$x_n = \begin{bmatrix} b_n \\ i_n \\ e_n \end{bmatrix}$
The word “University” is represented as
$b_n = \{U = 1\}$
$i_n = \{e = 1, i = 2, n = 1, s = 1, r = 1, t = 1, v = 1\}$
$e_n = \{y = 1\}$
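The "University" example can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: it reads b_n as the first character, i_n as a bag of the internal characters, and e_n as the last character, as the example suggests. The names `semi_character`, `to_vector`, and `ALPHABET` are hypothetical, and both the 52-letter alphabet and the handling of one-character words are assumptions.

```python
from collections import Counter
import string

# Assumption: a case-sensitive alphabet of ASCII letters only.
ALPHABET = string.ascii_letters


def semi_character(word):
    """Return (b, i, e): first character, internal character counts, last character."""
    if not word:
        raise ValueError("word must be non-empty")
    b = Counter(word[0])      # one-hot for the first character
    i = Counter(word[1:-1])   # bag of internal characters (order is discarded)
    e = Counter(word[-1])     # one-hot for the last character
    # Assumption: a one-character word sets both b and e and leaves i empty.
    return b, i, e


def to_vector(word, alphabet=ALPHABET):
    """Concatenate b, i, e into one count vector of length 3 * len(alphabet)."""
    b, i, e = semi_character(word)
    return [part[ch] for part in (b, i, e) for ch in alphabet]


b, i, e = semi_character("University")
print(dict(b), dict(i), dict(e))
# Jumbling internal letters leaves the vector unchanged.
print(to_vector("University") == to_vector("Uinervtisy"))
```

Because i_n ignores order, any jumble that keeps the first and last letters in place maps to the same input vector as the clean word.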
Alternatives
• Character vs. Word in output layer
• Different input representation, e.g. CharCNN
Kim, Y.; Jernite, Y.; Sontag, D.; and Rush, A. M. 2016. Character-aware neural language models. AAAI
Non-Neural Alternatives
• Search all permutations
– Should work for most words
– Except for anagrams: being/begin, quiet/quite, creditors/directors, views/wives, center/recent, licensed/declines
• Edit distance
– Spelling checkers
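One way to picture the permutation search, and why anagrams defeat it, is a dictionary indexed by each word's sorted letters. This is a rough sketch under assumptions: the tiny `VOCAB` list and the function names `letter_key`, `build_index`, and `candidates` are made up for illustration, and indexing by sorted letters stands in for enumerating permutations explicitly.

```python
from collections import defaultdict

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["cambridge", "quiet", "quite", "research"]


def letter_key(word):
    """All permutations of a word share the same sorted-letter key."""
    return "".join(sorted(word))


def build_index(vocabulary):
    """Map each sorted-letter key to the set of vocabulary words that have it."""
    index = defaultdict(set)
    for word in vocabulary:
        index[letter_key(word)].add(word)
    return index


def candidates(noisy, index):
    """Return every vocabulary word that is a permutation of the noisy word."""
    return sorted(index.get(letter_key(noisy), set()))


index = build_index(VOCAB)
print(candidates("cmabrigde", index))  # ['cambridge']
print(candidates("qiuet", index))      # ['quiet', 'quite']
```

The second lookup returns two candidates, so the search alone cannot decide between them without context.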
Experiment setup: Spelling correction
• Add noise to Penn Treebank:
– Jumble: Cambridge → Cmbarigde
– Delete: Cambridge → Camridge
– Insert: Cambridge → Cambpridge
– One type of noise to every word, except short words and numbers
• Training: Noisy text → Normal text
• Evaluation: % words corrected
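The three noise types and the evaluation metric above might be sketched as follows. This is an illustration, not the authors' pipeline. Several details are assumptions the slide does not state: `MIN_LEN` (what counts as a "short" word), keeping the first and last characters fixed for delete and insert (the slide's examples are consistent with this but do not say so), and drawing inserted characters from lowercase ASCII letters. All function names are hypothetical.

```python
import random
import string

# Assumption: words shorter than this are left untouched.
MIN_LEN = 4


def is_exempt(word):
    """Short words and tokens containing digits receive no noise."""
    return len(word) < MIN_LEN or any(ch.isdigit() for ch in word)


def jumble(word, rng):
    """Shuffle internal characters; first and last stay in place."""
    middle = list(word[1:-1])
    rng.shuffle(middle)  # may occasionally leave the word unchanged
    return word[0] + "".join(middle) + word[-1]


def delete(word, rng):
    """Remove one internal character."""
    k = rng.randrange(1, len(word) - 1)
    return word[:k] + word[k + 1:]


def insert(word, rng):
    """Insert one random lowercase letter between the first and last characters."""
    k = rng.randrange(1, len(word))
    return word[:k] + rng.choice(string.ascii_lowercase) + word[k:]


NOISE = {"jumble": jumble, "delete": delete, "insert": insert}


def add_noise(tokens, kind, seed=0):
    """Apply one noise type to every non-exempt token."""
    rng = random.Random(seed)
    return [w if is_exempt(w) else NOISE[kind](w, rng) for w in tokens]


def word_accuracy(predicted, reference):
    """Percentage of positions where the predicted word equals the reference word."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


tokens = ["Cambridge", "is", "in", "1987", "England"]
for kind in NOISE:
    print(kind, add_noise(tokens, kind))
```

Training pairs are then the noisy token list as input and the original token list as target.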
Accuracy Results (% words corrected)
System                Jumble  Delete  Insert
CharCNN (Kim et al.)  16      19      35
Enchant               57      35      89
Commercial A          54      60      93
Commercial B          54      71      73
scRNN                 99      85      97
Effect of context
[Figure omitted; surviving label: BPTT]
Effect of model size
[Figure omitted]
Summary
A simple model for correcting jumbled text
[Figure: scRNN architecture, mapping the noisy input "Aoccdrnig to a rscheearch ..." through (b, i, e) vectors, an LSTM, and a softmax layer to "According to a research ..."]
Discussion
• We can achieve 99% accuracy for this matched condition, but…
• One model for all noisy conditions?
– Cooooool → cool
– Speling mistake → Spelling mistake
– Noise beyond word level
More info
• Multilingual analysis of jumbled text: http://www.mrc-cbu.cam.ac.uk/personal/matt.davis/Cmabrigde/
• Paper (AAAI2017): http://www.cs.jhu.edu/~kevinduh/a/sakaguchi17robsut.pdf