SLIDE 1

Evaluation of Machine Learning Methods on SPiCe

Ichinari Sato (1), Kaizaburo Chubachi (2), Diptarama (1)

(1) Graduate School of Information Sciences, Tohoku University, Japan
(2) School of Engineering, Tohoku University, Japan


Team: ushitora

SLIDE 2

Agenda

  • Used methods:
    • XGBoost
    • LSTM
    • Mixture of Distributions Language Model [Neubig & Dyer, 2016]
    • Neural/n-gram Hybrid Language Model [Neubig & Dyer, 2016]


SLIDE 3

At The Beginning Of SPiCe

First of all, we started from two directions: XGBoost and deep learning.

SLIDE 4

Used Methods

  • n-gram based
  • n-gram & spectral learning combined [Balle, 2013]
  • XGBoost based [Chen & Guestrin, 2016]
  • Long Short-Term Memory (LSTM) [Zaremba et al., 2014]
  • XGBoost & LSTM combined
  • Neural/n-gram hybrid [Neubig & Dyer, 2016]


SLIDE 5

eXtreme Gradient Boosting (XGBoost)

  • XGBoost is a tree boosting system [Chen & Guestrin, 2016].
  • Tree ensemble model: the output is the sum of the predictions from each tree, e.g. ŷ = f1(x) + f2(x) = 2 + 0.9 = 2.9 (see the sketch below).
  • Training adds one tree at a time so as to minimize a loss function (log loss, mean squared error, etc.).
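A minimal sketch of the sum-of-trees idea, reproducing the slide's 2 + 0.9 = 2.9 example (which follows the toy figure in Chen & Guestrin's paper); the split rules in the two toy trees are illustrative stand-ins, not actual learned trees:

```python
# Toy tree ensemble: the boosted prediction is the sum of each tree's output.
def tree_1(x):
    # Hypothetical first tree: young boy?
    return 2.0 if x["age"] < 15 and x["is_male"] else -1.0

def tree_2(x):
    # Hypothetical second tree: daily computer user?
    return 0.9 if x["uses_computer_daily"] else -0.9

def ensemble_predict(x):
    return tree_1(x) + tree_2(x)  # sum of the predictions from each tree

boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
print(ensemble_predict(boy))  # 2.0 + 0.9 = 2.9
```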

SLIDE 6

XGBoost for Language Model

  • The input is the last 10 symbols, each encoded as a 1-hot vector (1st before, 2nd before, ..., 10th before) and concatenated into a single feature vector.
  • The training label is the next symbol in the given data (see the sketch below).
  • [Figure: example sequences such as "1 2 3" and "3 3 2 1" converted into 1-hot input vectors and next-symbol training labels, then fed into XGB.]
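A minimal sketch of this feature encoding, assuming the xgboost Python package; the sequences, alphabet size, and hyperparameters are toy stand-ins for the SPiCe data:

```python
import numpy as np
import xgboost as xgb  # assumes the xgboost Python package is installed

CONTEXT = 10  # the last 10 symbols are the features

def one_hot_windows(seqs, n_symbols):
    """Encode each position as (1-hot vectors of the 10 previous symbols, next symbol)."""
    X, y = [], []
    for seq in seqs:
        for i in range(1, len(seq)):
            vec = np.zeros(CONTEXT * n_symbols)
            for k in range(1, CONTEXT + 1):   # k-th symbol before position i
                if i - k < 0:                 # past the sequence start: leave zeros
                    break
                vec[(k - 1) * n_symbols + seq[i - k]] = 1.0
            X.append(vec)
            y.append(seq[i])                  # label = the symbol that comes next
    return np.array(X), np.array(y)

# Toy stand-in for the SPiCe data: symbols are 0-based integers.
seqs = [[0, 1, 2, 0, 1, 2], [2, 2, 1, 0, 1, 0]]
X, y = one_hot_windows(seqs, n_symbols=3)
clf = xgb.XGBClassifier(objective="multi:softprob")  # multiclass with log loss
clf.fit(X, y)
print(clf.predict_proba(X[:1]))                      # next-symbol distribution
```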

SLIDE 7

Long Short-Term Memory (LSTM) for LM

[Zaremba et al., 2014]

  • An LSTM node contains a memory cell whose contents are controlled by gates (input gate, output gate).
  • Within deep learning, LSTM is a recurrent architecture (RNN), in contrast to feed-forward architectures such as MLPs and CNNs.
  • The LSTM layer reads the sequence one symbol at a time; a fully connected layer on top outputs the prediction. We train it to predict each next symbol, then use it to predict the distribution of the symbol following the sequence (see the sketch below).
  • [Figure: an LSTM node (memory cell with input and output gates), an unrolled LSTM layer with a fully connected output layer, and toy train/predict sequences.]
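A minimal sketch of such a next-symbol LSTM language model in PyTorch; the layer sizes and the toy sequence are assumptions, not the team's actual configuration:

```python
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    """Sketch of a next-symbol LSTM language model (not the team's exact network)."""
    def __init__(self, n_symbols, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_symbols)  # fully connected output layer

    def forward(self, x):                 # x: (batch, time) integer symbol ids
        h, _ = self.lstm(self.embed(x))   # h: (batch, time, hidden_dim)
        return self.out(h)                # logits over the next symbol at each step

model = LSTMLM(n_symbols=4)
seq = torch.tensor([[1, 2, 1, 2, 1, 2]])  # toy training sequence
logits = model(seq)
# Train: each position predicts the following symbol.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 4), seq[:, 1:].reshape(-1))
loss.backward()
# Predict: distribution of the symbol that comes after the whole sequence.
next_symbol_probs = logits[:, -1].softmax(dim=-1)
```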

SLIDE 8

Public Test Score (1)

P      random  Ngrams  XGB    LSTM
0      0.771   0.969   0.985  0.920
1      0.380   0.836   0.879  0.914
2      0.501   0.822   0.888  0.913
3      0.500   0.780   0.848  0.882
4      0.082   0.554   0.590  0.589
5      0.057   0.651   0.787  0.751
6      0.068   0.744   0.698  0.729
7      0.139   0.668   0.783  0.589
8      0.060   0.593   0.609  0.637
9      0.308   0.895   0.890  0.922
10     0.140   0.465   0.595  0.559
11     0.000   0.335   -      0.509
12     0.404   0.728   0.623  0.677
13     0.004   0.429   0.400  0.473
14     0.129   0.331   0.376  0.371
15     0.138   0.259   0.263  0.155
total  2.910   9.090   9.229  9.670

(Totals are over problems 1-15. XGB has no score on problem 11: the problem was more difficult and caused a memory overflow.)
SLIDE 9

How To Combine

P      XGB    LSTM   XGB+LSTM
0      0.985  0.920  0.914
1      0.879  0.914  0.901
2      0.888  0.913  0.911
3      0.848  0.882  0.881
4      0.590  0.589  0.492
5      0.787  0.751  0.775
6      0.698  0.729  0.786
7      0.783  0.589  0.755
8      0.609  0.637  0.579
9      0.890  0.922  0.917
10     0.595  0.559  0.577
11     -      0.509  -
12     0.623  0.677  0.663
13     0.400  0.473  0.406
14     0.376  0.371  0.402
15     0.263  0.155  0.227
total  9.229  9.670  9.272

  • We combined XGB and LSTM by a simple linear sum of their predicted distributions and submitted the top 5 symbols (see the sketch below).
  • The n-gram & spectral learning combination is good, but XGB & LSTM combined is not good.
  • We must find a better ensemble method. What can we do?
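A minimal sketch of the linear-sum combination; the equal weight and the toy distributions are assumptions (the slide only says "simple linear sum"):

```python
import numpy as np

def combine_linear_sum(p_xgb, p_lstm, w=0.5):
    """Linear sum of two next-symbol distributions (w = 0.5 is an assumed weight)."""
    p = w * p_xgb + (1.0 - w) * p_lstm
    return p / p.sum()  # renormalize in case the inputs were not exactly normalized

p_xgb = np.array([0.10, 0.60, 0.05, 0.15, 0.10])   # toy XGB next-symbol distribution
p_lstm = np.array([0.20, 0.40, 0.10, 0.20, 0.10])  # toy LSTM next-symbol distribution
p = combine_linear_sum(p_xgb, p_lstm)
top5 = np.argsort(p)[::-1][:5]  # SPiCe submissions rank the 5 most likely next symbols
print(top5, p[top5])
```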

SLIDE 10

We Got A Chance

On June 3rd, Graham Neubig tweeted that they had published a paper on a new language model:

"We uploaded a paper that formulates neural and n-gram language models in one general framework. Please read it if you are interested in language models or machine learning models for NLP."

Generalizing and Hybridizing Count-based and Neural Language Models [EMNLP 2016]

SLIDE 11

Mixture of Distributions LM (MODLM)

  • The learning target is the mixture weights λ. The prediction is a mixture of K component distributions:

    P(x_i \mid c) = \sum_{k=1}^{K} \lambda_k(c) \, P_k(x_i \mid c)

    where c = x_1, x_2, ..., x_{i-1} is the context, the P_k are the component prediction distributions, and the λ_k(c) are their weights.

  • Matrix form (example with |Σ| = 3, K = 4): p = D λ, where column k of the |Σ| × K matrix D holds the k-th component distribution. An n-gram LM uses the 1-gram ... 4-gram distributions as columns, with heuristic weights λ (see the numeric sketch below).
  • A neural net LM fits the same form with D = I: the columns are Kronecker δ distributions, one per symbol, and λ is the network's softmax output.
  • n-gram LM + neural net LM: let's combine!

[Neubig & Dyer, 2016]
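A minimal numeric sketch of the matrix form with |Σ| = 3 and K = 4; the component distributions and weights are made-up toy values:

```python
import numpy as np

# Columns of D are the component distributions P_k(. | c); lam holds lambda_k(c).
D = np.array([[0.5, 0.6, 0.1, 0.2],   # toy 1-gram ... 4-gram distributions
              [0.3, 0.3, 0.8, 0.1],   # (each column sums to 1)
              [0.2, 0.1, 0.1, 0.7]])
lam = np.array([0.1, 0.2, 0.3, 0.4])  # mixture weights, sum to 1

p = D @ lam              # p[x] = sum_k lam[k] * P_k(x | c)
print(p, p.sum())        # a proper distribution over the 3 symbols

# A neural LM in the same framework: D = identity (one Kronecker delta per
# symbol), lam = the network's softmax output.
p_neural = np.eye(3) @ np.array([0.2, 0.5, 0.3])
```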

SLIDE 12

Block Dropout

  • Neural/n-gram Hybrid LM [Neubig & Dyer, 2016]: place the two matrices horizontally, D = [D_ngram  I], where D_ngram is the |Σ| × K n-gram matrix and I is the |Σ| × |Σ| identity, so λ now has |Σ| + K elements.
  • Problem: when this model learns λ, part of λ cannot learn; learning does not proceed for the weights belonging to the n-gram matrix.
  • Block dropout: for a random 50% of training examples, drop out a whole block of λ at once (in [Neubig & Dyer, 2016], the block of δ distributions coming from the neural net), forcing the model to learn the n-gram weights (see the sketch below).
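A minimal sketch of the hybrid matrix form with block dropout; the distributions, weights, and the assumption that the neural (identity) block is the one dropped follow my reading of Neubig & Dyer 2016, not the team's code:

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 3, 4                                    # |Sigma| = 3 symbols, K n-gram components

def hybrid_predict(D_ngram, lam, drop_neural=False):
    """Hybrid MODLM: D = [D_ngram | I], lambda of size K + |Sigma|."""
    D = np.hstack([D_ngram, np.eye(V)])        # place the two matrices horizontally
    if drop_neural:
        lam = lam.copy()
        lam[K:] = 0.0                          # zero the neural (identity) block
    lam = lam / lam.sum()                      # renormalize the mixture weights
    return D @ lam

D_ngram = rng.dirichlet(np.ones(V), size=K).T  # toy 1..4-gram distributions as columns
lam = rng.dirichlet(np.ones(K + V))            # toy mixture weights
print(hybrid_predict(D_ngram, lam))
print(hybrid_predict(D_ngram, lam, drop_neural=rng.random() < 0.5))  # 50% block dropout
```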

SLIDE 13

Public Test Score (2)

P      XGB    LSTM   Hybrid
1      0.879  0.914  0.911
2      0.888  0.913  0.910
3      0.848  0.882  0.885
4      0.590  0.589  0.564
5      0.787  0.751  0.767
6      0.698  0.729  0.852
7      0.783  0.589  0.630
8      0.609  0.637  0.642
9      0.890  0.922  0.956
10     0.595  0.559  0.542
11     -      0.509  0.489
12     0.623  0.677  0.770
13     0.400  0.473  0.496
14     0.376  0.371  0.370
15     0.263  0.155  0.260
total  9.229  9.670  10.045

  • Total score excluding problem 11: LSTM 9.161 < XGBoost 9.229 < Hybrid 9.556.
  • When we submitted to the private test, we chose XGB or Hybrid per problem: Hybrid for problems 1-3, 6, 8-9, and 11-13; XGB for problems 4-5, 7, 10, and 14-15.
  • Will the final result also be higher?

SLIDE 14

Final Result

P      public   private  model
1      0.9146   0.9135   Hybrid
2      0.9137   0.9083   Hybrid
3      0.8853   0.8862   Hybrid
4      0.6060   0.5514   XGB
5      0.7873   0.5514   XGB
6      0.8719   0.8364   Hybrid
7      0.7832   0.7846   XGB
8      0.6431   0.5890   Hybrid
9      0.9563   0.9353   Hybrid
10     0.5960   0.5519   XGB
11     0.5096   0.4265   Hybrid
12     0.7751   0.7629   Hybrid
13     0.4959   0.3834   Hybrid
14     0.4024   0.3681   XGB
15     0.2765   0.2609   XGB
total  10.4169  9.7098

Almost all scores decreased from the public test to the private test; we think our models were over-fitting. In the end, we ranked 2nd on the public test and 3rd on the private test.


SLIDE 15

Conclusion

  • We tried several methods, including a newly published one.
  • The hybrid model got the best score in public test.
  • Our models were over-fitting.
