
Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks - PowerPoint PPT Presentation



  1. Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks. Nikolaos Pappas, Andrei Popescu-Belis. Idiap Research Institute. June 9, 2017, SwissText, Winterthur

  2. Topic Recognition: Spam filtering • Mailbox Optimization • Customer Support

  3. Question Answering: Reading/Navigation Assistant • Interactive Search. Question: Which of Gaudí's creations is his masterpiece? Answer: Sagrada Família

  4. Machine Translation: Document Translation • Dialogue Translation

  5. Fundamental Function: Representing Word Sequences • Goal: learn representations (distributed vectors) of word sequences that effectively encode the meaning / knowledge needed for Topic Recognition, Question Answering, Machine Translation, Summarization, … Can we benefit from multiple languages?

  6. Dealing with Multiple Languages: Monolingually • Solution? Separate models per language: language-dependent learning • linear growth of the parameters • lack of cross-language knowledge transfer • hierarchical modeling at the document level. Setup: documents X = {x_i | i = 1…n}, models f: X → Y, labels Y = {y_i | i = 1…n} (Kim, 2014; Tang et al., 2015; Lin et al., 2015; Yang et al., 2016). A rough sketch of the resulting parameter growth follows below.
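A hypothetical illustration of the linear parameter growth: one independent model per language, so the total parameter count scales with the number of languages. The model, vocabulary size, and dimensions below are purely illustrative stand-ins, not the cited architectures.

```python
import torch.nn as nn

VOCAB, HIDDEN, LABELS = 50_000, 100, 50        # illustrative sizes
LANGS = ["en", "de", "es", "ar"]

# One independent model per language: no parameters are shared across languages.
models = {
    lang: nn.Sequential(nn.EmbeddingBag(VOCAB, HIDDEN), nn.Linear(HIDDEN, LABELS))
    for lang in LANGS
}

total = sum(p.numel() for m in models.values() for p in m.parameters())
print(f"{total:,} parameters for {len(LANGS)} languages")  # grows linearly with |LANGS|
```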

  7. Dealing with Multiple Languages: Multilingually • Solution? Single model with an aligned input space: language-independent learning • constant number of parameters • common label sets across languages • modeling at the word level (Klementiev et al., 2012; Hermann and Blunsom, 2014; Gouws et al., 2015; Ammar et al., 2016). A sketch of the aligned input space follows below.
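A minimal, hypothetical sketch of the aligned-input-space idea: words from every language are looked up in one shared embedding table (as produced, for example, by the bilingual/multilingual embedding methods cited above), so a single downstream model can serve all languages. The toy vocabulary and random vectors below are stand-ins, not real trained embeddings.

```python
import numpy as np

# Stand-in shared embedding table: words from all languages live in one vector space.
shared_vocab = {"dog": 0, "hund": 1, "perro": 2, "economy": 3, "wirtschaft": 4}
shared_emb = np.random.randn(len(shared_vocab), 40).astype("float32")  # pretend trained vectors

def embed(tokens):
    """Map tokens from any language to vectors in the common space (unknowns skipped)."""
    ids = [shared_vocab[t] for t in tokens if t in shared_vocab]
    return shared_emb[ids]

# The same downstream parameters can then consume English and German input alike.
print(embed(["dog", "economy"]).shape)        # (2, 40)
print(embed(["hund", "wirtschaft"]).shape)    # (2, 40)
```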

  8. Dealing with Multiple Languages: Our Contribution • Solution: a single model trained over arbitrary label sets with an aligned input space (one model in place of Model_1, Model_2, …, Model_M): language-independent learning • sub-linear growth of parameters • arbitrary label sets across languages • hierarchical modeling at the document level

  9. Background: Hierarchical Attention Networks (HANs) (Yang et al., 2016) • Input: sequence of word vectors • Output: document vector u • Hierarchical structure: word-level and sentence-level abstraction layers, each with an encoder (H_w, H_s) and an attention mechanism (α_w, α_s), followed by a classification layer (W_c) with a cross-entropy loss • Training: SGD with ADAM. A minimal sketch of this architecture follows below.
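To make the hierarchical structure concrete, here is a minimal PyTorch-style sketch in the spirit of Yang et al. (2016); it is an illustration, not the authors' implementation, and all layer names and sizes are assumptions.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Additive attention that pools a sequence of hidden states into one vector."""

    def __init__(self, hidden_dim, att_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, att_dim)          # attention projection (W)
        self.context = nn.Linear(att_dim, 1, bias=False)    # context vector

    def forward(self, h):                                   # h: (batch, steps, hidden_dim)
        scores = self.context(torch.tanh(self.proj(h)))     # (batch, steps, 1)
        alpha = torch.softmax(scores, dim=1)                # attention weights
        return (alpha * h).sum(dim=1)                       # weighted sum of states


class HAN(nn.Module):
    """Word-level and sentence-level encoders with attention, then a classifier."""

    def __init__(self, emb_dim=40, hid=100, att=100, n_labels=50):
        super().__init__()
        self.word_enc = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)   # H_w
        self.word_att = Attention(2 * hid, att)                                       # alpha_w
        self.sent_enc = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)    # H_s
        self.sent_att = Attention(2 * hid, att)                                       # alpha_s
        self.classifier = nn.Linear(2 * hid, n_labels)                                # W_c

    def forward(self, docs):                 # docs: (batch, sents, words, emb_dim)
        b, s, w, e = docs.shape
        words, _ = self.word_enc(docs.view(b * s, w, e))
        sent_vecs = self.word_att(words).view(b, s, -1)      # one vector per sentence
        sents, _ = self.sent_enc(sent_vecs)
        doc_vec = self.sent_att(sents)                       # document vector u
        return self.classifier(doc_vec)                      # label scores
```

Training this with SGD/ADAM and a cross-entropy loss matches the setup described on the slide.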

  10. MHANs: Multilingual Hierarchical Attention Networks

  11. Multilingual Attention Networks: Computational Cost • Fewer parameters are needed. With per-language components marked by (l), the sharing configurations are: Θ_enc = {H_w, W_w^(l), H_s, W_s^(l), W_c^(l)}, Θ_att = {H_w^(l), W_w, H_s^(l), W_s, W_c^(l)}, Θ_both = {H_w, W_w, H_s, W_s, W_c^(l)}, Θ_mono = {H_w^(l), W_w^(l), H_s^(l), W_s^(l), W_c^(l)} • In particular, |Θ_both| ≤ |Θ_enc| ≤ |Θ_mono| and |Θ_both| ≤ |Θ_att| ≤ |Θ_mono|, with strict inequalities as soon as more than one language is modeled • Example with shared attention mechanisms: naive deep-learning multilingual adaptation fails! A sketch of the shared-attention configuration follows below.
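To make the sharing configurations concrete, the following sketch wires the shared-attention configuration (Θ_att): attention is shared across languages while encoders and classifiers stay per-language. It reuses the hypothetical Attention module from the HAN sketch above; the names, language set, and sizes are illustrative assumptions, not the authors' code, and the other configurations simply swap which modules go into the shared part.

```python
import torch.nn as nn

LANGS = ["en", "de", "ar"]            # illustrative language set


class MHAN(nn.Module):
    """Multilingual HAN, shared-attention configuration (Θ_att)."""

    def __init__(self, emb_dim=40, hid=100, att=100, n_labels=50):
        super().__init__()
        # Per language: word/sentence encoders (H_w^(l), H_s^(l)) and classifier (W_c^(l)).
        self.word_enc = nn.ModuleDict({
            l: nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True) for l in LANGS})
        self.sent_enc = nn.ModuleDict({
            l: nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True) for l in LANGS})
        self.classifier = nn.ModuleDict({l: nn.Linear(2 * hid, n_labels) for l in LANGS})
        # Shared across all languages: the attention mechanisms (W_w, W_s).
        self.word_att = Attention(2 * hid, att)   # Attention as defined in the HAN sketch above
        self.sent_att = Attention(2 * hid, att)

    def forward(self, docs, lang):                # docs: (batch, sents, words, emb_dim)
        b, s, w, e = docs.shape
        words, _ = self.word_enc[lang](docs.view(b * s, w, e))
        sent_vecs = self.word_att(words).view(b, s, -1)
        sents, _ = self.sent_enc[lang](sent_vecs)
        return self.classifier[lang](self.sent_att(sents))
```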

  12. Multilingual Attention Networks: Training Strategy • Minimize the sum of the per-language cross-entropy errors • Issue: naive consecutive training (one language after the other) biases the model • Instead, sample document-label pairs for each language in a cyclic fashion: (L_1, …, L_M)^(1) → … → (L_1, …, L_M)^(M) • Optimizer: SGD with ADAM (same as before). A sketch of this training loop follows below.
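A minimal sketch of the cyclic training strategy, assuming a model with the per-language interface of the MHAN sketch above and one batch iterator per language. The function name, loader layout, and the sigmoid cross-entropy (assuming multi-label topic targets) are assumptions, not the authors' code.

```python
import torch.nn.functional as F
import torch.optim as optim


def train_cyclic(model, loaders, steps_per_epoch, epochs=10, lr=1e-3):
    """Interleave one batch per language in a fixed cyclic order (L_1, ..., L_M),
    so the interleaved updates minimize the summed per-language cross-entropies
    without letting any single language dominate a long stretch of training."""
    optimizer = optim.Adam(model.parameters(), lr=lr)          # ADAM, as on the slide
    langs = list(loaders.keys())
    for _ in range(epochs):
        iters = {lang: iter(loaders[lang]) for lang in langs}  # restart iterators each epoch
        # steps_per_epoch must not exceed the number of batches in the shortest loader.
        for _ in range(steps_per_epoch):
            for lang in langs:                                 # cyclic: L_1 -> ... -> L_M
                docs, labels = next(iters[lang])               # labels: float multi-hot targets
                logits = model(docs, lang)
                loss = F.binary_cross_entropy_with_logits(logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```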

  13. Dataset: Deutsche Welle corpus (600k documents, 8 languages), tagged by journalists

  14. Full-resource Scenario: Bilingual Training • Setup: 40-d input, dense 100-d encoders, dense 100-d attention, ReLU activation • Multilingual models consistently outperform monolingual ones • Sharing the attention mechanism is the best configuration (on average) • Traditional (bag-of-words) vs. neural (en+ar, biGRU encoders): en 75.8% vs. 77.8%, ar 81.8% vs. 84.0%

  15. Low-resource Scenario: Bilingual Training • [Plot: improvement of the multilingual over the monolingual model vs. training percentage (0.5%, 5%, 50%); the improvement is highest with the smallest amount of training data.]

  16. Qualitative Analysis: English - German • True-positive difference: with labels sorted by frequency, the cumulative TP difference (multilingual vs. monolingual) increases over the entire spectrum • German: russland (21), berlin (19), irak (14), wahlen (13), nato (13) • English: germany (259), german (97), soccer (73), football (47), merkel (25)

  17. Qualitative Analysis: Interpretable Output

  18. Conclusion and Perspectives • New multilingual models learn shared document structures for text classification • They benefit full-resource and low-resource languages • Achieve better accuracy with fewer parameters • Are capable of cross-language transfer • Future work: remove the constraint of closed label sets, incorporate label information, apply to other NLU tasks

  19. Thank you! User group meeting, July 3, 2017, Caversham, UK: demos, technical talks, posters & discussions. Contact us if interested!

  20. References
  • Nikolaos Pappas and Andrei Popescu-Belis. 2017. Multilingual hierarchical attention networks for text classification. (submitted)
  • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively multilingual word embeddings. CoRR abs/1602.01925.
  • Stephan Gouws, Yoshua Bengio, and Gregory S. Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. 32nd International Conference on Machine Learning.
  • Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. 52nd Annual Meeting of the Association for Computational Linguistics.
  • Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. International Conference on Computational Linguistics.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. Conference on Empirical Methods in Natural Language Processing.
  • Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical recurrent neural network for document modeling. Conference on Empirical Methods in Natural Language Processing.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. Conference on Empirical Methods in Natural Language Processing.
