  1. Beam Search. Shahrzad Kiani and Zihao Chen. CSC2547 Presentation.

  2. Beam Search. Greedy search: always extend the single top-scored sequence (standard seq2seq decoding). Beam search: maintain the top K scored sequences (this paper).
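A minimal sketch of the contrast, assuming a hypothetical step_scores(prefix) function that returns next-token log-probabilities (a stand-in for Softmax(decoder(prefix))); greedy search is exactly this procedure with K = 1:

```python
import heapq

def beam_search(step_scores, eos, K, max_len):
    """Keep the top-K scored partial sequences at every step.

    step_scores(prefix) -> {token: log_prob} is an assumed helper, not
    from the paper; greedy search is the special case K = 1.
    """
    beam = [(0.0, [])]  # (cumulative log score, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beam:
            if seq and seq[-1] == eos:      # finished hypotheses carry over
                candidates.append((score, seq))
                continue
            for tok, logp in step_scores(seq).items():
                candidates.append((score + logp, seq + [tok]))
        beam = heapq.nlargest(K, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]  # best-scoring sequence
```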

  3. Seq2Seq Train and Test Issues. Gold sequence $z_{1:T} = [z_1, \dots, z_T]$; predicted sequence $\hat{z}_{1:T} = [\hat{z}_1, \dots, \hat{z}_T]$.
Word level:
● Training: $p_{\text{train}}(\hat{z}_t \mid z_{1:t-1}) = \text{Softmax}(\text{decoder}(z_{1:t-1}))$
● Testing: $p_{\text{test}}(\hat{z}_t \mid \hat{z}_{1:t-1}) = \text{Softmax}(\text{decoder}(\hat{z}_{1:t-1}))$
→ 1. Exposure Bias: training conditions on the gold prefix, testing on the model's own predictions.
Sentence level:
● $p_{\text{train}}(\hat{z}_{1:T} = z_{1:T}) = \prod_{t=1}^{T} p(\hat{z}_t = z_t \mid z_{1:t-1})$
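A sketch of where the two distributions diverge, using a hypothetical decoder_step(prefix) that returns a token-to-probability dict; the only difference between the two regimes is which prefix is fed back:

```python
def decode(decoder_step, gold, teacher_forcing):
    """Exposure bias in one loop: during training the decoder conditions
    on the gold prefix z_{1:t-1}; at test time it conditions on its own
    (possibly wrong) predictions."""
    prefix, outputs = [], []
    for t in range(len(gold)):
        probs = decoder_step(prefix)        # Softmax(decoder(prefix))
        pred = max(probs, key=probs.get)    # greedy pick, for illustration
        outputs.append(pred)
        # training feeds back gold z_t; testing feeds back predicted token
        prefix.append(gold[t] if teacher_forcing else pred)
    return outputs
```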

  4. Seq2Seq Train and Test Issues (continued)
Training Loss
● Maximize $p_{\text{train}}(\hat{z}_{1:T} = z_{1:T}) = \prod_{t=1}^{T} p(\hat{z}_t = z_t \mid z_{1:t-1})$
● Equivalently, minimize the Negative Log Likelihood (NLL): $\text{NLL} = -\ln \prod_{t=1}^{T} p(\hat{z}_t = z_t \mid z_{1:t-1}) = -\sum_{t=1}^{T} \ln p(\hat{z}_t = z_t \mid z_{1:t-1})$
Testing Evaluation
● Sequence-level metrics like BLEU

  5. Seq2Seq Train and Test Issues (continued). The same objective as above, annotated: training minimizes a word-level loss (NLL), while testing evaluates with sequence-level metrics like BLEU.
→ 2. Loss-Evaluation Mismatch
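A worked version of the word-level NLL, assuming a hypothetical step_probs(prefix) that returns the model's per-token probabilities:

```python
import math

def word_level_nll(step_probs, gold):
    """NLL = -sum_{t=1}^{T} ln p(z_hat_t = z_t | z_{1:t-1}); the gold
    prefix is fed back at every step (teacher forcing)."""
    nll = 0.0
    for t, z_t in enumerate(gold):
        p = step_probs(gold[:t])[z_t]   # p(z_t | gold prefix z_{1:t-1})
        nll -= math.log(p)
    return nll
```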

  6. Optimization Approach
1. Exposure Bias: the model is never exposed to its own errors during training
• Train with beam search
2. Loss-Evaluation Mismatch: loss is defined at the word level, evaluation at the sequence level
• Define a score for each sequence
• Define a search-based sequence loss

  7. Sequence Score: Constrained Beam Search Optimization (ConBSO)
• $\text{score}(\hat{z}_{1:T}) = \text{decoder}(\cdot)$: the unnormalized decoder output is used directly as the sequence score
• Hard constraint: set $\text{score}(\hat{z}_{1:t}) = -\infty$ for sequences that violate it
• $\hat{z}^{(K)}_{1:t}$: the sequence with the K-th ranked score
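One way the hard constraint can be realized in code (the violates predicate is illustrative, not from the paper): assigning $-\infty$ guarantees top-K selection never keeps a violating hypothesis:

```python
def apply_hard_constraint(candidates, violates):
    """ConBSO-style hard constraint: any candidate sequence that violates
    the constraint gets score -inf and is thus pruned by top-K selection."""
    return [(float("-inf") if violates(seq) else score, seq)
            for score, seq in candidates]
```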

  8. Search-Based Sequence Loss
$\mathcal{L}(\theta) = \sum_{t} \Delta(\hat{z}^{(K)}_{1:t}) \left[ 1 + \text{score}(\hat{z}^{(K)}_{1:t}) - \text{score}(z_{1:t}) \right]$
When $1 + \text{score}(\hat{z}^{(K)}_{1:t}) - \text{score}(z_{1:t}) > 0$:
• the gold sequence $z_{1:t}$ does not have a top-K score
• → Margin Violation

  9. Search-Based Sequence Loss (continued)
$\Delta(\hat{z}^{(K)}_{1:t})$:
• scaling factor that penalizes the prediction
• = 1 when there is a margin violation; = 0 when there is none
Goals:
• When t < T, avoid margin violations: force the gold sequence into the top K
• When t = T, force the gold sequence to be top 1, so set K = 1
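A sketch of the resulting training loss, assuming the scores of the K-th ranked hypothesis and of the gold prefix have already been computed at each step:

```python
def search_based_sequence_loss(kth_scores, gold_scores):
    """L(theta) = sum_t Delta_t * [1 + score(z_hat^(K)_{1:t}) - score(z_{1:t})],
    where Delta_t = 1 exactly when the margin is violated and 0 otherwise."""
    loss = 0.0
    for kth, gold in zip(kth_scores, gold_scores):
        violation = 1.0 + kth - gold
        if violation > 0:       # Delta = 1: gold fell out of the top K
            loss += violation
    return loss
```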

  10. Backpropagation Through Time (BPTT)
• Recall the loss function: $\mathcal{L}(\theta) = \sum_{t} \Delta(\hat{z}^{(K)}_{1:t}) \left[ 1 + \text{score}(\hat{z}^{(K)}_{1:t}) - \text{score}(z_{1:t}) \right]$
• On a margin violation, backpropagate through both $\text{score}(\hat{z}^{(K)}_{1:t})$ and $\text{score}(z_{1:t})$: $O(T)$
• A margin violation at every time step: worst case $O(T^2)$

  11. Learning as Search Optimization (LaSO)
• Normal case: update the beam with $\hat{z}^{(K)}_{1:t}$
• Margin violation case: update the beam with the gold prefix $z_{1:t}$ instead
• As a result, each incorrect sequence is an extension of the partial gold sequence, so BPTT only needs to maintain two sequences: $O(2T) = O(T)$
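A sketch of one LaSO-style search step under assumed expand and score_fn helpers; on a violation the beam restarts from the gold prefix, which is what keeps BPTT at $O(T)$:

```python
def laso_step(beam, gold_prefix, expand, score_fn, K):
    """Expand the beam and keep the top K; if the gold prefix fell out
    (margin violation), reset the beam to the gold hypothesis so the
    search continues from the partial gold sequence."""
    candidates = [cand for hyp in beam for cand in expand(hyp)]
    beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:K]
    if gold_prefix not in [seq for _, seq in beam]:
        beam = [(score_fn(gold_prefix), gold_prefix)]  # restart from gold
    return beam
```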

  12. Experiment on Word Ordering (e.g. "fish cat eat" → "cat eat fish")
Features
• Non-exhaustive search
• Hard constraint
• $\Delta(\hat{z}^{(K)}_{1:t})$ scaler: 0/1
Settings
• Dataset: PTB
• Metric: BLEU
[Image credit: Sequence-to-Sequence Learning as Beam Search Optimization, Wiseman et al., EMNLP '16]

  13. Conclusion. Alleviates the issues of seq2seq:
● Exposure Bias: train with beam search
● Loss-Evaluation Mismatch: a sequence-level cost function with $O(T)$ BPTT and a hard constraint
Overall: a variant of seq2seq with a beam-search training scheme

  14. Related Works and References
• Wiseman, Sam, and Alexander M. Rush. "Sequence-to-Sequence Learning as Beam-Search Optimization." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.
• Kool, Wouter, Herke Van Hoof, and Max Welling. "Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement." International Conference on Machine Learning. 2019.
• https://guillaumegenthial.github.io/sequence-to-sequence.html
• https://medium.com/@sharaf/a-paper-a-day-2-sequence-to-sequence-learning-as-beam-search-optimization-92424b490350
• https://www.facebook.com/icml.imls/videos/welcome-back-to-icml-2019-presentations-this-session-on-deep-sequence-models-inc/895968107420746/
• https://icml.cc/media/Slides/icml/2019/hallb(13-11-00)-13-11-00-4927-stochastic_beam.pdf
• https://vimeo.com/239248437
• Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to Sequence Learning with Neural Networks." Advances in Neural Information Processing Systems. 2014.
  • Proposes sequence-to-sequence learning with deep neural networks
• Daumé III, Hal, and Daniel Marcu. "Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction." Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005.
  • Proposes a framework for learning as search optimization, with two parameter updates and accompanying convergence theorems and bounds
• Gu, Jiatao, Daniel Jiwoong Im, and Victor O.K. Li. "Neural Machine Translation with Gumbel-Greedy Decoding." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
  • Proposes Gumbel-Greedy Decoding, which trains a generative network to predict translations under a trained model
