SLIDE 1

Sequence Prediction Using Neural Network Classifiers

Yanpeng Zhao ShanghaiTech University

ICGI, Oct 7th, 2016, Delft, the Netherlands

SLIDE 2

Sequence Prediction

4 3 5 0 4 6 1 3 1 ?

What’s the next symbol?

SLIDE 3

Classification Perspective

[Diagram] The input sequence 4 3 5 0 4 6 1 3 1 is fed to a classifier, which outputs a multinomial distribution over the alphabet {-1, 0, 1, 2, 3, 4, 5, 6}; the most likely next symbol is the prediction.

SLIDE 4

Representation of Inputs

Continuous vector representation of the discrete symbols

King – Man + Woman ≈ Queen

Images are from: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

SLIDE 5

Representation of Inputs

Construct inputs for classifiers using learned word vectors

[Diagram] An input sample is a fixed-length window of the previous symbols (e.g. ... 4 3 5 0 4 6 1 3 1, left-padded with -2 where the history is shorter than the window), and its label is the next symbol (here 5).

Word vectors are concatenated or stacked. We predict the next symbol from the previous k = 15 symbols, each represented by a 30-dimensional vector.
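A minimal sketch (in Python/NumPy, not the authors' MXNet/TensorFlow code) of this input construction; the embedding table `emb`, the padding index, and the remapping of the alphabet to indices 0..7 are illustrative assumptions:

```python
import numpy as np

# Build one classifier input: look up the last K = 15 symbols in a learned
# embedding table and concatenate them into a single 450-dim vector.
K, DIM = 15, 30
N_SYMBOLS = 8            # assumed: alphabet plus the end marker, remapped to 0..7
PAD = N_SYMBOLS          # assumed: extra index reserved for padding

emb = np.random.randn(N_SYMBOLS + 1, DIM)   # stand-in for learned word vectors

def make_sample(history):
    """Concatenate the vectors of the last K symbols (left-padded)."""
    window = list(history[-K:])
    window = [PAD] * (K - len(window)) + window
    return np.concatenate([emb[s] for s in window])   # shape (K * DIM,) = (450,)

x = make_sample([4, 3, 5, 0, 4, 6, 1, 3, 1])
print(x.shape)   # (450,)
```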

SLIDE 6

Neural Network Classifiers

[Diagram] As on slide 3: the input test sequence 4 3 5 0 4 6 1 3 1 is fed to the neural network classifier, which outputs a multinomial distribution over the alphabet {-1, 0, 1, 2, 3, 4, 5, 6}; the most likely next symbol is the prediction.

SLIDE 7

Multilayer Perceptrons (MLPs)

[Diagram] A feed-forward network: Input → Hidden Layer 1 → Hidden Layer 2 → Softmax Output, with a bias unit (+1) feeding each layer. Forward pass:

z^(2) = W^(1) x + b^(1),  a^(2) = f(z^(2))
z^(3) = W^(2) a^(2) + b^(2),  a^(3) = f(z^(3))
y = softmax(W^(3) a^(3) + b^(3))

SLIDE 8

Multilayer Perceptrons (MLPs)

[Diagram] The same network as on slide 7: Input → Hidden Layer 1 → Hidden Layer 2 → Softmax Output, with bias units (+1).

|x| = 450: 15 symbols, with a 30-dimensional vector for each symbol; the hidden layer sizes are |a^(2)| = 750 and |a^(3)| = 1000.
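A minimal NumPy sketch of the forward pass described on these two slides; the activation function (ReLU), the random stand-in weights, and the output alphabet size of 8 are assumptions, not taken from the slides (the actual model was trained in MXNet):

```python
import numpy as np

# MLP forward pass: 450 -> 750 -> 1000 -> softmax over the alphabet.
def relu(z): return np.maximum(z, 0.0)           # assumed activation f

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
sizes = [450, 750, 1000, 8]                      # input, hidden 1, hidden 2, alphabet
Ws = [rng.normal(0, 0.01, (m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

a = rng.normal(size=450)                         # one input sample (15 * 30)
for W, b in zip(Ws[:-1], bs[:-1]):
    a = relu(W @ a + b)                          # hidden layers
p = softmax(Ws[-1] @ a + bs[-1])                 # distribution over next symbols
print(p.argmax(), p.sum())
```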

SLIDE 9

Convolutional Neural Networks (CNNs)

CNN model architecture adapted from Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.

k = 15, d = 30. Filter windows (heights) of 10, 11, 12, 13, 14, 15; 200 feature maps for each window.
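A minimal NumPy sketch of this CNN forward pass with the hyper-parameters above (the actual model was implemented in MXNet); the ReLU nonlinearity and the random stand-in filters are assumptions:

```python
import numpy as np

# Kim (2014)-style CNN: convolve h x d filter windows over the k x d input
# matrix of symbol vectors, max-pool over positions, concatenate the results.
rng = np.random.default_rng(0)
k, d, n_maps = 15, 30, 200
X = rng.normal(size=(k, d))                      # one input sample

pooled = []
for h in range(10, 16):                          # window heights 10..15
    F = rng.normal(0, 0.1, (n_maps, h * d))      # stand-in filters
    # slide each h x d window over the sequence
    windows = np.stack([X[i:i + h].ravel() for i in range(k - h + 1)])
    feats = np.maximum(windows @ F.T, 0.0)       # (positions, n_maps), ReLU assumed
    pooled.append(feats.max(axis=0))             # max over time -> (n_maps,)

z = np.concatenate(pooled)                       # 6 * 200 = 1200-dim feature vector
print(z.shape)                                   # fed to a softmax layer
```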

SLIDE 10

Long Short Term Memory Networks (LSTMs)

The standard LSTM updates at each time step t:

f_t = σ(W_f · [h_(t-1), x_t] + b_f)
i_t = σ(W_i · [h_(t-1), x_t] + b_i)
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)
C_t = f_t ⊗ C_(t-1) + i_t ⊗ C̃_t
o_t = σ(W_o · [h_(t-1), x_t] + b_o)
h_t = o_t ⊗ tanh(C_t)

Images are from: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The number of time steps is 15, and the final hidden state h_t, of dimension 32, is fed to a logistic regression classifier.
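A minimal NumPy sketch of this setup (the actual model was implemented in TensorFlow); the random stand-in weights and the 8-class output are assumptions:

```python
import numpy as np

# One LSTM pass: 15 time steps, hidden size 32, final hidden state fed
# to a logistic-regression (softmax) classifier.
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
T, d_in, d_h, n_classes = 15, 30, 32, 8
Wf, Wi, Wc, Wo = (rng.normal(0, 0.1, (d_h, d_h + d_in)) for _ in range(4))
bf = bi = bc = bo = np.zeros(d_h)

h = C = np.zeros(d_h)
for x_t in rng.normal(size=(T, d_in)):           # one input sequence of vectors
    z = np.concatenate([h, x_t])                 # [h_(t-1), x_t]
    f, i, o = sigmoid(Wf @ z + bf), sigmoid(Wi @ z + bi), sigmoid(Wo @ z + bo)
    C = f * C + i * np.tanh(Wc @ z + bc)         # cell state update
    h = o * np.tanh(C)                           # hidden state

logits = rng.normal(0, 0.1, (n_classes, d_h)) @ h  # classifier on final h_t
print(logits.argmax())
```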

SLIDE 11

Weighted n-Gram Model (WnGM)

[Diagram] Each n-gram model outputs a distribution over the next symbol, and the final prediction is their weighted sum, e.g. w_a · (0.11, 0.24, 0.21, 0.26, 0.18) + w_b · (0.01, 0.26, 0.29, 0.21, 0.23) + w_c · (0.28, 0.15, 0.25, 0.06, 0.26) = the mixture distribution, from which the most likely symbol is taken as the label.

We set n to 2, 3, 4, 5, and 6, with weights 0.3, 0.2, 0.2, 0.15, and 0.15, respectively.
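A minimal Python sketch of such a weighted n-gram mixture with the weights above; the count-based estimation, the skipping of unseen contexts, and the alphabet are illustrative assumptions:

```python
import numpy as np
from collections import Counter, defaultdict

# Each n-gram model predicts a next-symbol distribution from counts of
# (context, next symbol); the final prediction is the weighted sum.
ORDERS = {2: 0.3, 3: 0.2, 4: 0.2, 5: 0.15, 6: 0.15}   # n -> weight (slide 11)
ALPHABET = list(range(7))                              # assumed symbol set

def train(sequences):
    counts = {n: defaultdict(Counter) for n in ORDERS}
    for seq in sequences:
        for n in ORDERS:
            for i in range(len(seq) - n + 1):
                ctx, nxt = tuple(seq[i:i + n - 1]), seq[i + n - 1]
                counts[n][ctx][nxt] += 1
    return counts

def predict(counts, history):
    p = np.zeros(len(ALPHABET))
    for n, w in ORDERS.items():
        c = counts[n][tuple(history[-(n - 1):])]
        total = sum(c.values())
        if total:                                      # skip unseen contexts
            p += w * np.array([c[s] / total for s in ALPHABET])
    return p / p.sum() if p.sum() else np.full(len(ALPHABET), 1 / len(ALPHABET))

counts = train([[4, 3, 5, 0, 4, 6, 1, 3, 1, 5]])
print(predict(counts, [3, 1]).round(3))
```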

SLIDE 12

Overview of Experiments

  • Implementation
  • MLP & CNN were implemented in MXNet
  • LSTM was implemented in TensorFlow
  • https://bitbucket.org/thinkzhou/spice
  • System & Hardware
  • CentOS 7.2 (64-bit) server
  • Intel Xeon Processor E5-2697 v2 @ 2.70GHz & four Tesla K40m GPUs
  • Time cost
  • All models run on all datasets in less than 16 hours
SLIDE 13

Detailed Scores on Public Test Sets

The total score on the private test sets is 10.160324.

SLIDE 14

Discussion & Future Work

  • MLPs
  • make the best use of the symbol-order information
  • CNNs
  • should use a problem-specific model architecture
  • update the word vectors during training
  • train a deep averaging network (DAN) [Iyyer et al., 2015]
  • LSTMs
  • Future work
  • integrate neural networks into probabilistic grammatical models in sequence prediction

[Chart] Total scores by the different models on the public test sets:
3-Gram: 8.666, SL: 7.444, MLP: 9.802, CNN: 9.593, WnGram: 9.237, LSTM: 9.325
SLIDE 15

Thanks