
Sequence Prediction Using Neural Network Classifiers - Yanpeng Zhao (presentation)



  1. Sequence Prediction Using Neural Network Classifiers. Yanpeng Zhao, ShanghaiTech University. ICGI, Oct 7th, 2016, Delft, the Netherlands

  2. Sequence Prediction What’s the next symbol? 4 3 5 0 4 6 1 3 1 ?

  3. Classification Perspective. The input sequence 4 3 5 0 4 6 1 3 1 is fed to a multinomial classifier, which outputs the most likely next symbol from the label set -1 0 1 2 3 4 5 6.

  4. Representation of Inputs. Continuous vector representation of the discrete symbols: King – Man + Woman ≈ Queen. Images are from: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
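The analogy on this slide can be checked numerically with cosine similarity. Below is a tiny illustration using made-up 3-dimensional vectors (real word vectors are learned and much higher-dimensional; the numbers are purely illustrative, not from the presentation):

```python
import numpy as np

# Toy vectors chosen only to illustrate the king - man + woman ≈ queen property.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.8, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.8]),
    "queen": np.array([0.3, 0.8, 0.8]),
}
target = vecs["king"] - vecs["man"] + vecs["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Among the candidate words, "queen" lies closest to the analogy vector.
print(max(("woman", "queen", "king"), key=lambda w: cosine(vecs[w], target)))  # queen
```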

  5. Representation of Inputs. Construct inputs for the classifiers using the learned word vectors: the next symbol is predicted from the previous k = 15 symbols, each represented by a 30-dimension vector, and the word vectors are concatenated or stacked. Histories shorter than 15 symbols are left-padded with -2 (e.g. the input sample -2 -2 -2 -2 -2 4 3 5 0 4 6 1 3 1 2 paired with its next symbol as the label).
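A minimal sketch of this input construction, assuming the learned vectors sit in a NumPy array `embeddings` of shape (vocab_size, 30), that -2 marks left-padding for short histories, and that padding positions map to a zero vector (the actual padding vector is not specified on the slide):

```python
import numpy as np

CONTEXT = 15  # number of previous symbols used for prediction
DIM = 30      # dimension of each symbol's vector

def build_input(sequence, embeddings):
    """Concatenate the vectors of the last CONTEXT symbols into one sample."""
    window = list(sequence[-CONTEXT:])
    padded = [-2] * (CONTEXT - len(window)) + window   # left-pad short histories
    vectors = [np.zeros(DIM) if s == -2 else embeddings[s] for s in padded]
    return np.concatenate(vectors)                     # shape (CONTEXT * DIM,) = (450,)

# Example with the sequence from slide 2 and a toy embedding table.
embeddings = np.random.rand(8, DIM)                    # symbols 0..7 (toy values)
x = build_input([4, 3, 5, 0, 4, 6, 1, 3, 1], embeddings)
print(x.shape)                                         # (450,)
```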

  6. Neural Network Classifiers. At test time, the input test sequence 4 3 5 0 4 6 1 3 1 is mapped to word vectors and fed to a multinomial neural network classifier, which outputs the most likely next symbol from the labels -1 0 1 2 3 4 5 6.

  7. Multilayer Perceptrons (MLPs). Input x; Hidden Layer 1: z^(1) = W^(1) x + b^(1), a^(1) = f(z^(1)); Hidden Layer 2: z^(2) = W^(2) a^(1) + b^(2), a^(2) = f(z^(2)); Softmax Output: y = softmax(W^(3) a^(2) + b^(3)). The +1 nodes in the diagram are bias units.

  8. Multilayer Perceptrons (MLPs), with the dimensions used in the experiments: |x| = 450 (15 symbols with a 30-dimension vector for each symbol); the two hidden layers have 750 and 1000 units, respectively.
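A minimal NumPy sketch of this forward pass with the dimensions above (450 -> 750 -> 1000 -> softmax). The ReLU activation, the random initialization, and the number of output classes are assumptions for illustration, not details taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 8   # assumed size of the label set

# Weight matrices for the two hidden layers and the softmax output layer.
W1, b1 = rng.normal(scale=0.01, size=(750, 450)), np.zeros(750)
W2, b2 = rng.normal(scale=0.01, size=(1000, 750)), np.zeros(1000)
W3, b3 = rng.normal(scale=0.01, size=(NUM_CLASSES, 1000)), np.zeros(NUM_CLASSES)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x):
    a1 = np.maximum(0, W1 @ x + b1)    # hidden layer 1 (750 units)
    a2 = np.maximum(0, W2 @ a1 + b2)   # hidden layer 2 (1000 units)
    return softmax(W3 @ a2 + b3)       # distribution over the next symbol

probs = forward(rng.normal(size=450))
print(probs.argmax(), round(probs.sum(), 6))   # most likely class; probabilities sum to 1
```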

  9. Convolutional Neural Networks (CNNs). k = 15, d = 30. Filter windows (heights) of 10, 11, 12, 13, 14, 15; 200 feature maps for each window. CNN model architecture adapted from Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
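A hedged tf.keras sketch of this Kim-style architecture: the 15 x 30 input matrix of stacked symbol vectors is convolved with filter windows of heights 10 to 15 (200 feature maps each), max-pooled over time, concatenated, and passed to a softmax layer. The original CNN was implemented in MxNet, so this Keras version and the assumed number of output classes are only an illustration:

```python
import tensorflow as tf

NUM_CLASSES = 8   # assumed size of the label set

inp = tf.keras.Input(shape=(15, 30))               # k = 15 symbols, d = 30 dims
pooled = []
for h in range(10, 16):                            # filter window heights 10..15
    conv = tf.keras.layers.Conv1D(200, h, activation="relu")(inp)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))   # max-over-time
features = tf.keras.layers.Concatenate()(pooled)   # 6 windows * 200 maps = 1200
out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```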

  10. Long Short Term Memory Networks (LSTMs). Cell state update: C_t = f_t ⊗ C_{t-1} + i_t ⊗ C̃_t; output gate: o_t = σ(W_o [h_{t-1}, x_t] + b_o); hidden state: h_t = o_t ⊗ tanh(C_t). The number of time steps is 15, and h_t of dimension 32 is fed to a logistic regression classifier. Images are from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
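A minimal tf.keras sketch of this setup: 15 time steps of 30-dimensional symbol vectors, a 32-dimensional hidden state, and a softmax (multinomial logistic regression) head on the final hidden state. The slides only state that the LSTM was implemented in TensorFlow; this exact API usage and the number of classes are assumptions:

```python
import tensorflow as tf

NUM_CLASSES = 8   # assumed size of the label set

model = tf.keras.Sequential([
    # Unroll over 15 time steps; each step consumes one 30-dimensional vector.
    tf.keras.layers.LSTM(32, input_shape=(15, 30)),
    # Logistic regression classifier on the last hidden state h_15.
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```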

  11. Weighted n-Gram Model (WnGM). Each n-gram model assigns a probability distribution over the candidate labels; the prediction is the argmax of the weighted sum w_1 · P_{n_1} + w_2 · P_{n_2} + w_3 · P_{n_3} + ... of these distributions. We set n to 2, 3, 4, 5, 6 with weights 0.3, 0.2, 0.2, 0.15, 0.15, respectively.
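A hedged sketch of this weighted combination: each n-gram model contributes a probability distribution over the next symbol, and the prediction is the argmax of their weighted sum. How each individual n-gram distribution is estimated (counts, smoothing) is not specified on the slide, so `ngram_predict` is a hypothetical helper:

```python
import numpy as np

# Weights from the slide: n = 2..6 with weights 0.3, 0.2, 0.2, 0.15, 0.15.
WEIGHTS = {2: 0.30, 3: 0.20, 4: 0.20, 5: 0.15, 6: 0.15}

def weighted_ngram_predict(sequence, ngram_predict, num_labels):
    """ngram_predict(sequence, n) must return a length-num_labels distribution."""
    combined = np.zeros(num_labels)
    for n, w in WEIGHTS.items():
        combined += w * ngram_predict(sequence, n)
    return int(combined.argmax())   # label of the most likely next symbol
```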

  12. Overview of Experiments
  • Implementation: MLP & CNN were implemented in MXNet; LSTM was implemented in TensorFlow; code at https://bitbucket.org/thinkzhou/spice
  • System & hardware: CentOS 7.2 (64-bit) server with an Intel Xeon Processor E5-2697 v2 @ 2.70GHz and four Tesla K40m GPUs
  • Time cost: all models run on all datasets in less than 16 hours

  13. Detailed Scores on Public Test Sets. The total score on the private test sets is 10.160324.

  14. Discussion & Future Work
  • MLPs: make the best use of the symbol order information
  • CNNs: should use a problem-specific model architecture; update vectors while training; train a deep averaging network (DAN) [Mohit et al., 2015]
  • LSTMs
  • Future work: integrate neural networks into probabilistic grammatical models in sequence prediction
  [Bar chart: total scores by different models (3-Gram, SL, MLP, CNN, WnGram, LSTM) on the public test sets; values shown: 9.802, 9.593, 9.325, 9.237, 8.666, 7.444]

  15. Thanks
