CS11-747 Neural Networks for NLP
Advanced Search Algorithms
Graham Neubig https://phontron.com/class/nn4nlp2018/
(Some Slides by Daniel Clothiaux)
Why search? So far, decoding has mostly been greedy: choose the single most likely output from the model's distribution at each step. But the locally best choice is not always globally best.
Beam search: instead of keeping only the single best path by probability/score, maintain multiple paths (hypotheses); expand each of them, then keep the top b of the expanded set.
Next word     P(next word)
Pittsburgh    0.4
New York      0.3
New Jersey    0.25
Other         0.05
Keep the b best hypotheses at each time step.
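A minimal sketch of this procedure in Python; the step_log_probs(prefix) interface, which returns a dict of next-word log probabilities given the current prefix, and the eos token are illustrative assumptions, not part of the original slides:

```python
def beam_search(step_log_probs, eos, beam_size=5, max_len=20):
    """Keep the beam_size best partial hypotheses at each time step."""
    beams = [(0.0, [])]   # (cumulative log probability, words so far)
    finished = []         # hypotheses that have produced eos
    for _ in range(max_len):
        # Expand every current hypothesis by every possible next word.
        candidates = []
        for score, prefix in beams:
            for word, lp in step_log_probs(prefix).items():
                candidates.append((score + lp, prefix + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep only the top beam_size of the expanded set.
        beams = []
        for score, prefix in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((score, prefix))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])
```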
Pruning by score: keep only hypotheses whose score is within a threshold α of the best score s1, i.e. keep hypothesis n iff sn + α > s1; otherwise prune it.
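A sketch of this rule; the list-of-pairs representation is an assumption for illustration:

```python
def threshold_prune(hypotheses, alpha):
    """Keep hypothesis n iff s_n + alpha > s1, where s1 is the best score.

    hypotheses: list of (score, output) pairs."""
    s1 = max(score for score, _ in hypotheses)
    return [(s, out) for s, out in hypotheses if s + alpha > s1]
```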
Beware of imbalanced action types: in transition-based parsing, for example, most candidate actions are Opens, vs. a few Closes and one Shift (Buckman et al., 2016).
What beam size should we use? It is task-dependent, often somewhere in the range of 5-100.
Problem: outputs have variable length, and since each added word lowers the cumulative log probability, search favors short hypotheses. One fix is to normalize the score by the output length.
Encoder–Decoder (Cho et al., 2014)
‘Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation’ (Wu et al., 2016)
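Sketches of two such normalizations: the simple per-word average, and the length penalty from Wu et al. (2016), lp(Y) = (5 + |Y|)^α / (5 + 1)^α with α around 0.6 (function names are illustrative):

```python
def simple_length_norm(log_prob, length):
    # Average log probability per word.
    return log_prob / length

def gnmt_length_norm(log_prob, length, alpha=0.6):
    # Divide by the GNMT penalty lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha.
    return log_prob / (((5.0 + length) ** alpha) / (6.0 ** alpha))
```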
Another option is to explicitly predict the output length, which varies between sentences (Eriguchi et al., 2016).
Why do bigger beams hurt? The model spreads probability mass over bad outputs because of uncertainty in the training data, especially copies of the source sentence (Ott et al., 2018); since copies are heavily weighted, expanding the beam compounds this problem.
Effective Inference for Generative Neural Parsing (Stern et al., 2017)
The search space is large: the number of word-generating actions (Generates) is equal to the vocabulary size. Hypotheses are only comparable if they have generated the same length of output, so they are bucketed by the number of Shifts and the actions taken after the ith Shift. To combat a bias in the search, certain high-scoring Shifts are immediately added to the next bucket (fast-tracking).
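A rough sketch of such bucketed search; the expand(state) interface, which returns scored successors and flags word-generating actions, is an assumption for illustration, and Stern et al.'s fast-track mechanism is omitted:

```python
def word_synchronous_search(init_state, expand, k=10, max_words=50):
    """Hypotheses compete only within their bucket, i.e. against others
    that have generated the same number of words.

    expand(state) -> list of (log_prob, next_state, is_word_action)"""
    bucket = [(0.0, init_state)]
    for _ in range(max_words):
        frontier = sorted(bucket, key=lambda h: h[0], reverse=True)[:k]
        next_bucket = []
        # Expand structural actions until k word-generating successors exist.
        while frontier and len(next_bucket) < k:
            score, state = frontier.pop(0)
            for lp, nxt, is_word in expand(state):
                (next_bucket if is_word else frontier).append((score + lp, nxt))
            frontier = sorted(frontier, key=lambda h: h[0], reverse=True)[:k]
        bucket = sorted(next_bucket, key=lambda h: h[0], reverse=True)[:k]
        if not bucket:
            break
    return bucket
```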
Mutual Information and Diverse Decoding Improve Neural Machine Translation (Li et al., 2016)
Modify the decoding objective to maximize mutual information between source and target: interpolate the scores of the translation models and a language model, e.g. log P(T|S) − λ log P(T), penalizing generic outputs.
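The paper's diverse decoding change can be sketched as an intra-sibling rank penalty: candidates expanded from the same parent are penalized by their within-parent rank, so one parent cannot monopolize the beam (the data layout here is an assumption):

```python
def diversity_rescore(expansions, gamma=1.0):
    """expansions: dict mapping parent_id -> list of (score, candidate).

    Subtracts gamma * rank from each candidate, ranked among siblings."""
    rescored = []
    for parent_id, siblings in expansions.items():
        ordered = sorted(siblings, key=lambda c: c[0], reverse=True)
        for rank, (score, cand) in enumerate(ordered):
            rescored.append((score - gamma * rank, parent_id, cand))
    return sorted(rescored, key=lambda c: c[0], reverse=True)
```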
(Shao et al., 2017)
Result: greater diversity! This is feasible because output distributions in conversation are less peaky: many different responses are valid.
Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement (Kool et al., 2019)
Key fact: perturbing each log probability with independent Gumbel noise and taking the largest element is equivalent to sampling from the categorical distribution; taking the top-k elements is sampling without replacement.
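For a single categorical distribution the trick is a few lines; extending it to whole sequences needs the paper's stochastic beam search, which is not shown here:

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng=None):
    """Sample k indices without replacement from softmax(log_probs)."""
    rng = rng or np.random.default_rng()
    gumbels = rng.gumbel(size=len(log_probs))  # i.i.d. Gumbel(0, 1) noise
    perturbed = np.asarray(log_probs) + gumbels
    return np.argsort(perturbed)[::-1][:k]     # indices of the k largest
```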
Human-like Natural Language Generation Using Monte Carlo Tree Search (Kumagai et al., 2016)
Sequence-to-Sequence Learning as Beam-Search Optimization (Wiseman et al., 2016)
A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models (Goyal et al., 2017)
A* search: expand states in order of f(n) = g(n) + h(n), where g(n) is the total cost along the path so far and h(n) is a heuristic estimate of the remaining cost to the goal. If the heuristic never overestimates that remaining cost (is admissible), the first complete path found is optimal.
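A textbook A* sketch over a generic search space; the successors and h callables are assumed interfaces:

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """successors(state) -> list of (step_cost, next_state);
    h(state) estimates the remaining cost to the goal."""
    tie = count()  # breaks heap ties so states never need comparing
    agenda = [(h(start), next(tie), 0.0, start, [start])]
    visited = set()
    while agenda:
        _, _, g, state, path = heapq.heappop(agenda)
        if is_goal(state):
            return g, path  # optimal if h is admissible
        if state in visited:
            continue
        visited.add(state)
        for step_cost, nxt in successors(state):
            if nxt not in visited:
                g2 = g + step_cost
                heapq.heappush(agenda, (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None
```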
A* CCG Parsing (Lewis and Steedman, 2014): the heuristic for a span is an optimistic estimate of the score of the structure outside of the current span (e.g., the best supertag score for each word not in the span).
Neural CCG Parsing (Lee et al., 2016): A* search in which a neural model provides the estimate of each span's outside score.
It is also possible to compute cheap approximate scores first and then lazily expand only the hypotheses with good scores; strategies like this are easy to throw on top of an existing model.
Train the model with search in mind: learn to score the potential next actions (say, the next word) so that search finds good outputs, directly optimizing evaluation metrics (e.g., BLEU) for MT models.
Actor-critic (e.g., Bahdanau et al., 2017): the actor is the decoder's policy over the potential next actions, and the critic estimates the Q value (expected future reward) of each action. Actor: π(a_t | s_t). Critic: Q(s_t, a_t), trained with TD (temporal difference) learning.
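A minimal PyTorch-style sketch of the two losses for one decoded sequence; the tensor layout and the per-step reward (e.g., incremental BLEU) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def actor_critic_losses(logits, q_vals, rewards, actions, gamma=1.0):
    """logits:  (T, V) actor scores over the vocabulary per step
    q_vals:  (T, V) critic estimates of future reward per action
    rewards: (T,)   per-step rewards; actions: (T,) actions taken"""
    probs = F.softmax(logits, dim=-1)
    q_taken = q_vals.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Critic: TD target r_t + gamma * V(s_{t+1}), with V(s) = E_pi[Q(s, a)].
    with torch.no_grad():
        v = (probs * q_vals).sum(dim=-1)
        td_target = rewards + gamma * torch.cat([v[1:], v.new_zeros(1)])
    critic_loss = F.mse_loss(q_taken, td_target)

    # Actor: raise the probability of actions the critic scores highly.
    actor_loss = -(probs * q_vals.detach()).sum(dim=-1).mean()
    return actor_loss, critic_loss
```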
This is similar to REINFORCE-style algorithms.
Particle filter-style search: adjust the effective beam width based on the certainty of the model's paths. Allocate a fixed number of particles across the paths in proportion to their probability, dropping any path that would get <1 particle; when the model is confident, fewer paths are kept, which can improve performance.
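A sketch of the allocation rule, normalizing over the current paths' log probabilities (the representation is an assumption):

```python
import math

def allocate_particles(paths, n_particles=100):
    """paths: list of (log_prob, output) pairs. Each path gets particles in
    proportion to its normalized probability; paths that would receive
    fewer than one particle are dropped."""
    m = max(lp for lp, _ in paths)                  # for numerical stability
    weights = [math.exp(lp - m) for lp, _ in paths]
    z = sum(weights)
    survivors = []
    for (lp, out), w in zip(paths, weights):
        share = n_particles * w / z
        if share >= 1.0:
            survivors.append((int(round(share)), lp, out))
    return survivors  # (particle count, log prob, output) triples
```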
Decoding directly from some models (e.g., a generative model) is difficult. One solution: produce an n-best list of candidates with an easy-to-search model A, then rescore and rerank the complete candidates with a separately trained model B.
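A sketch of the rerank step, interpolating model A's search score with model B's score for each complete candidate (names are assumptions):

```python
def rerank(nbest, score_b, lam=0.5):
    """nbest: list of (score_a, candidate) pairs from model A;
    score_b(candidate) is the separately trained model B's score."""
    return max(nbest, key=lambda c: (1 - lam) * c[0] + lam * score_b(c[1]))
```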