Distilling Knowledge for Search-based Structured Prediction
Yijia Liu*, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Complex Model Wins
[ResNet, 2015] [He+, 2017]
[Figure: two bar charts comparing Baseline, search SOTA, Distillation, and Ensemble. Dependency Parsing: y-axis from 90 to 93. NMT: y-axis from 20.5 to 26.5. Annotated values: 0.6, 1.3, 0.8, 2.6]
[Figure, repeated on each slide below: bar chart of a distribution over the vocabulary {book, I, like, love, the, this}, y-axis ticks at 0.5 and 1]
$p(y \mid \text{I, like})$
$\arg\max\, p(y \mid \text{I, like})$, $\mathbb{1}(y = \text{this})$ ?
$\arg\max\, p(y \mid \text{I, like})$, $\mathbb{1}(y = \text{this})$
$\arg\max \sum_y q(y)\, p(y \mid \text{I, like})$
$p(y \mid \text{I, like})$, $\mathbb{1}(y = \text{this})$
$\sum_y q(y)\, p(y \mid \text{I, like})$
$\arg\max \sum_y q(y)\, p(y \mid \text{I, like})$, $\mathbb{1}(y = \text{this})$
$p(y \mid \text{I, like})$
$\sum_y q(y)\, p(y \mid \text{I, like})$, $\mathbb{1}(y = \text{this})$
$\sum_y q(y)\, p(y \mid \text{I, like, the})$
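The contrast between the one-hot reference target and the soft target q(y) on these slides can be sketched on the toy vocabulary. This is only an illustration under stated assumptions: the slides show the terms without a logarithm, while the sketch uses the usual cross-entropy form with a log; the probabilities are made up; and building q by averaging two hypothetical teacher distributions is an assumption, not something the slides state.

```python
import math

VOCAB = ["book", "I", "like", "love", "the", "this"]

def one_hot(word):
    """Reference target: all probability mass on the reference word."""
    return [1.0 if w == word else 0.0 for w in VOCAB]

def average(distributions):
    """Average several distributions over VOCAB (hypothetical teacher ensemble)."""
    n = len(distributions)
    return [sum(d[i] for d in distributions) / n for i in range(len(VOCAB))]

def cross_entropy(target, p):
    """-sum_y target(y) * log p(y); terms with target(y) == 0 are skipped."""
    return -sum(t * math.log(pi) for t, pi in zip(target, p) if t > 0.0)

# Made-up numbers, for illustration only.
p = [0.05, 0.05, 0.05, 0.05, 0.30, 0.50]        # student p(y | I, like)
teacher_a = [0.0, 0.0, 0.0, 0.0, 0.4, 0.6]      # hypothetical teacher 1
teacher_b = [0.1, 0.0, 0.0, 0.0, 0.4, 0.5]      # hypothetical teacher 2
q = average([teacher_a, teacher_b])             # soft target q(y)

nll = cross_entropy(one_hot("this"), p)         # hard target: only "this" counts
kd = cross_entropy(q, p)                        # soft target: "the" also contributes

print(f"hard-target loss: {nll:.4f}")
print(f"soft-target loss: {kd:.4f}")
```

With the one-hot target only the probability of "this" enters the loss, whereas the soft target also rewards mass placed on "the", which is the point the bar charts appear to make.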
Transition-based Dependency Parsing: Penn Treebank (Stanford dependencies)

Model                                          LAS
Baseline                                       90.83
Ensemble (20)                                  92.73
Distill (reference, α = 1.0)                   91.99
Distill (exploration)                          92.00
Distill (both)                                 92.14
Ballesteros et al. (2016) (dyn. oracle)        91.42
Andor et al. (2016) (local, B=1)               91.02

Neural Machine Translation: IWSLT 2014 de-en

Model                                          BLEU
Baseline                                       22.79
Ensemble (10)                                  26.26
Distill (reference, α = 0.8)                   24.76
Distill (exploration)                          24.64
Distill (both)                                 25.44
MIXER (Ranzato et al. 2015)                    20.73
Wiseman and Rush (2016) (local B=1)            22.53
Wiseman and Rush (2016) (global B=1)           23.83
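As a quick arithmetic check on the results above, the following sketch computes how much "Distill (both)" gains over the baseline and how far it remains below the ensemble. The scores are copied from the results; the dictionary layout and the function name `gaps` are illustrative choices only.

```python
# Scores copied from the reported results.
RESULTS = {
    "Dependency parsing (LAS)": {"baseline": 90.83, "ensemble": 92.73, "distill_both": 92.14},
    "NMT (BLEU)": {"baseline": 22.79, "ensemble": 26.26, "distill_both": 25.44},
}

def gaps(scores):
    """Return (gain over baseline, remaining gap to ensemble), rounded to 2 decimals."""
    gain = round(scores["distill_both"] - scores["baseline"], 2)
    remaining = round(scores["ensemble"] - scores["distill_both"], 2)
    return gain, remaining

for task, scores in RESULTS.items():
    gain, remaining = gaps(scores)
    print(f"{task}: +{gain:.2f} over baseline, {remaining:.2f} below ensemble")
```

On these numbers the distilled single model gains about 1.31 LAS and 2.65 BLEU over the baseline, and sits about 0.59 LAS and 0.82 BLEU below the ensemble.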
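[Figure: two plots with x-axis ticks from 1 down to 0.1 in steps of 0.1. Panels: Transition-based Parsing; Neural Machine Translation]
Values extracted from the Transition-based Parsing panel (data labels and axis ticks may be interleaved): 92.07 92.04 91.93 91.9 91.7 91.72 91.55 91.49 91.3 91.1 90.9
Values extracted from the Neural Machine Translation panel (data labels and axis ticks may be interleaved): 26.96 27.04 27.13 26.95 26.6 26.64 26.37 26.21 26.09 25.9 24.93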
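The weight reported with the reference-based distillation results (1.0 for parsing, 0.8 for NMT) and swept from 1 down to 0.1 in the plots is assumed here to be an interpolation coefficient between a distillation loss and a reference (negative log-likelihood) loss. The slides do not spell this out, so the formula and the names `interpolated_loss`, `kd_loss`, `nll_loss`, and `alpha` are assumptions made for illustration.

```python
def interpolated_loss(kd_loss, nll_loss, alpha):
    """Assumed form: alpha * distillation loss + (1 - alpha) * reference loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * kd_loss + (1.0 - alpha) * nll_loss

# Made-up loss values, for illustration only.
kd_loss, nll_loss = 1.0, 0.5
for alpha in (1.0, 0.8, 0.0):
    print(alpha, interpolated_loss(kd_loss, nll_loss, alpha))
```

Under this assumption, a weight of 1.0 trains on the soft target alone and a weight of 0 falls back to training on the reference alone.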