Deep learning: Deep dual learning


  1. Deep learning: Deep dual learning. Hamid Beigy, Sharif University of Technology, December 21, 2019. Some slides are adapted from slides by Tao Qin, Sreeja R Thoom, et al. (Hamid Beigy | Sharif university of technology | December 21, 2019 | 1 / 28)

  2. Table of contents
     1. Introduction
     2. Dual learning
     3. Dual supervised learning

  3. Introduction

  4. Three pillars of deep learning
     Big data: web pages, search logs, social networks, and new mechanisms for data collection such as conversation and crowd-sourcing.
     Big models: 1000+ layers, tens of billions of parameters.
     Big computing: CPU clusters, GPU clusters, TPU clusters, and FPGA farms, provided by Amazon, Azure, Alibaba, etc.

  5. Some challenges of deep learning
     Big-data challenge: today's deep learning relies heavily on huge amounts of human-labeled training data.

     Task                  | Typical training data
     Image classification  | Millions of labeled images
     Speech recognition    | Thousands of hours of annotated voice data
     Machine translation   | Tens of millions of bilingual sentence pairs

     Human labeling is in general very expensive, and it is hard, if not impossible, to obtain large-scale labeled data for rare domains.

  6. Machine translation
     How do we translate from a source language to a destination language? The main problems are:
     - How to translate words from the source language to the destination language?
     - How to order words in the destination language?
     - How to measure the goodness of a translation?
     - What type of corpus is needed (monolingual or bilingual)?
     - How to build a chain of translators (e.g., Persian → English → French)?

  7. Neural machine translation (NMT)
     In NMT, recurrent neural networks with LSTM or GRU units are used in an encoder-decoder architecture.
     [Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." ICLR 2015.]

  8. Neural machine translation (NMT)
     - A critical disadvantage of the fixed-length context vector design is its inability to remember long sentences.
     - The attention mechanism was proposed to help the decoder handle long source sentences in NMT.
     - Another critical disadvantage of this model is its training data: it needs a large bilingual corpus.
     - Dual learning was introduced to overcome the need for a large bilingual corpus.
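The attention idea above can be sketched numerically. Below is a minimal dot-product attention sketch in NumPy (Bahdanau et al. actually use a learned additive scoring function; the function names here are illustrative):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Score every source position against the current decoder state,
    then return the attention-weighted sum of encoder states."""
    scores = encoder_states @ decoder_state   # one score per source token
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # weighted context vector
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(3, 4))                 # 3 source tokens, hidden size 4
context, weights = attention_context(enc[1], enc)
```

Because the context vector is recomputed at every decoding step, the model is no longer forced to squeeze the whole source sentence into one fixed-length vector.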

  9. Dual learning

  10. Duality in machine translation
      Dual learning is an autoencoder-like mechanism for utilizing monolingual datasets.
      [Y. Xia, D. He, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma. Dual learning for machine translation. NIPS 2016.]

  11. Duality in speech processing
      Primal task g: y → z, speech recognition (e.g., audio → text "Welcome to Beijing!").
      Dual task h: z → y, speech synthesis.

  12. Duality in question answering and generation
      Primal task g: y → z, question answering.
      Dual task h: z → y, question generation.
      Example: answer passage "Parts of the immune system of higher organisms create peroxide, superoxide, and singlet oxygen to destroy invading microbes." ↔ question "For what purpose do organisms make peroxide and superoxide?"

  13. Duality in search and advertising
      Primal task g: y → z, search: find webpages for a given query (e.g., query "Amazon" → Amazon.com).
      Dual task h: z → y, advertising: suggest keywords for a given webpage.

  14. Structural duality in AI
      Structural duality is very common in artificial intelligence.

      AI task              | X → Y                     | Y → X
      Machine translation  | Translation from EN to CH | Translation from CH to EN
      Speech processing    | Speech recognition        | Text to speech
      Image understanding  | Image captioning          | Image generation
      Conversation         | Question answering        | Question generation
      Search engine        | Query-document matching   | Query/keyword suggestion

      Currently, most machine learning algorithms do not exploit structural duality for training and inference.

  15. Dual learning
      - A new learning framework that leverages the symmetric (primal-dual) structure of AI tasks to obtain effective feedback or regularization signals that enhance learning and inference.
      - If we do not have enough labeled data for training, can we use unlabeled data?
      - Dual unsupervised learning leverages structural duality to learn from unlabeled data.

  16. Dual learning (definitions)
      Let us define [Y. Xia et al. Dual learning for machine translation. NIPS 2016]:
      - D_A: corpus of language A.
      - D_B: corpus of language B.
      - P(· | s; θ_AB): translation model from A to B.
      - P(· | s; θ_BA): translation model from B to A.
      - LM_A(·): learned language model of A.
      - LM_B(·): learned language model of B.

  17. Dual learning (algorithm)
      - Sample a sentence s from corpus D_A.
      - Generate K translated sentences s_mid,1, s_mid,2, ..., s_mid,K from P(· | s; θ_AB).
      - Compute an intermediate reward for each sentence as r_1,k = LM_B(s_mid,k).
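The reward r_1,k can be sketched with a toy language model of language B. The bigram model and its probabilities below are invented purely for illustration:

```python
import math

# Hypothetical bigram probabilities for language B (illustrative only).
BIGRAM_P = {
    ("<s>", "welcome"): 0.2, ("welcome", "to"): 0.5,
    ("to", "beijing"): 0.1, ("beijing", "</s>"): 0.6,
}

def lm_reward(tokens, floor=1e-6):
    """r_1,k = LM_B(s_mid,k): average log-probability of the sampled
    translation under the target-side language model.  Unseen bigrams
    get a small floor probability."""
    seq = ["<s>"] + tokens + ["</s>"]
    logp = sum(math.log(BIGRAM_P.get(pair, floor))
               for pair in zip(seq, seq[1:]))
    return logp / (len(seq) - 1)
```

A fluent candidate scores higher than a scrambled one, so this reward favors translations that look like real sentences of B even when no reference translation exists.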

  18. Dual learning (algorithm)
      - Compute the communication reward for each sentence as r_2,k = ln P(s | s_mid,k; θ_BA).
      - Set the total reward of the k-th sentence to r_k = α r_1,k + (1 − α) r_2,k.
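The two rewards combine exactly as on the slide. A minimal sketch, assuming the reconstruction probability is supplied by the reverse model θ_BA:

```python
import math

def communication_reward(p_reconstruct):
    """r_2,k = ln P(s | s_mid,k; θ_BA): log-probability that the reverse
    model recovers the original sentence s from the sampled translation.
    p_reconstruct is assumed to come from the B→A model."""
    return math.log(p_reconstruct)

def total_reward(r1, r2, alpha=0.5):
    """r_k = α·r_1,k + (1 − α)·r_2,k: convex combination of the
    language-model reward and the reconstruction reward."""
    return alpha * r1 + (1 - alpha) * r2
```

A perfect reconstruction (probability 1) contributes reward 0; anything less contributes a negative reward, penalizing translations the reverse model cannot undo.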

  19. Dual learning (algorithm)
      - Compute the stochastic gradients with respect to θ_AB and θ_BA:
        ∇_{θ_AB} E[r] = (1/K) Σ_{k=1}^{K} r_k ∇_{θ_AB} ln P(s_mid,k | s; θ_AB)
        ∇_{θ_BA} E[r] = (1/K) Σ_{k=1}^{K} (1 − α) ∇_{θ_BA} ln P(s | s_mid,k; θ_BA)
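The first gradient is a REINFORCE (score-function) estimate. A toy sketch with a categorical "translation model" P(k; θ) = softmax(θ)_k, for which the score ∇_θ ln P(k; θ) has a closed form (the setup is invented for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_grad(theta, samples, rewards):
    """(1/K) Σ_k r_k ∇_θ ln P(k; θ) for P(k; θ) = softmax(θ)_k.
    For a categorical distribution, ∇_θ ln softmax(θ)_k = onehot(k) − softmax(θ)."""
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for k, r in zip(samples, rewards):
        onehot = np.zeros_like(theta)
        onehot[k] = 1.0
        grad += r * (onehot - probs)
    return grad / len(samples)

# Two samples: index 0 got reward 1, index 1 got reward 0.
g = reinforce_grad(np.zeros(3), samples=[0, 1], rewards=[1.0, 0.0])
```

Samples with high reward have their log-probability pushed up; the θ_BA gradient on the slide is the same estimator applied to the reconstruction term.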

  20. Dual learning (algorithm)
      - Update the model parameters:
        θ_AB ← θ_AB + γ_1 ∇_{θ_AB} E[r]
        θ_BA ← θ_BA + γ_2 ∇_{θ_BA} E[r]

  21. Dual learning algorithm (pseudocode)
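The pseudocode figure on this slide did not survive the transcript. As a runnable stand-in, here is a toy version of the whole loop: one source sentence with two candidate translations, a categorical θ_AB, and fixed (invented) language-model and reconstruction rewards. Only the θ_AB update is shown; the θ_BA update is symmetric:

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Invented rewards: candidate 0 is fluent and easy to back-translate.
LM_REWARD = [-0.5, -4.0]     # r_1,k = LM_B(s_mid,k)
RECON_LOGP = [-0.3, -5.0]    # r_2,k = ln P(s | s_mid,k; θ_BA)

def dual_learning_step(theta_AB, rng, K=100, alpha=0.5, gamma=0.1):
    """One policy-gradient step of the dual-learning loop (toy sketch)."""
    probs = softmax(theta_AB)
    grad = [0.0, 0.0]
    for _ in range(K):
        k = 0 if rng.random() < probs[0] else 1                 # sample s_mid,k
        r = alpha * LM_REWARD[k] + (1 - alpha) * RECON_LOGP[k]  # total reward r_k
        for j in range(2):   # accumulate r_k · ∇ ln P(k; θ_AB)
            grad[j] += r * ((1.0 if j == k else 0.0) - probs[j])
    grad = [g / K for g in grad]
    return [t + gamma * g for t, g in zip(theta_AB, grad)]      # θ_AB update

rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(50):
    theta = dual_learning_step(theta, rng)
```

After training, the probability of the fluent, reconstructable candidate grows, even though no parallel reference translation was ever used: the language model and the reverse model together provide the training signal.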

  22. Experimental results

  23. Experimental results
      Reconstruction performance (BLEU: geometric mean of n-gram precision).

  24. Experimental results
      BLEU for different source sentence lengths: the improvement is significant for long sentences.

  25. Experimental results
      Reconstruction examples.

  26. Dual supervised learning
