Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks


SLIDE 1

Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks

Nikolaos Pappas, Andrei Popescu-Belis

Idiap Research Institute

June 9, 2017
SwissText, Winterthur

SLIDE 2

Topic Recognition

Spam filtering, Mailbox Optimization, Customer Support

SLIDE 3

Question Answering

Reading/Navigation Assistant, Interactive Search

Question: Which of Gaudí's creations is his masterpiece?

Answer: Sagrada Família

SLIDE 4

Machine Translation

Document Translation, Dialogue Translation

SLIDE 5

Fundamental Function: Representing Word Sequences

  • Goal: Learn representations (distributed vectors) of word sequences which effectively encode the meaning / knowledge needed to perform:
    ✓ Topic Recognition
    • Question Answering
    • Machine Translation
    • Summarization

Can we benefit from multiple languages?

SLIDE 6

Dealing with Multiple Languages: Monolingually

  • Solution? Separate models per language
    • language-dependent learning
    • linear growth of the parameters
    • lack of cross-language knowledge transfer
    • hierarchical modeling at the document level

Documents X = {x_i | i = 1…n}, Labels Y = {y_i | i = 1…n}, Models f: X → Y

(Yang et al., 2016; Tang et al., 2015; Lin et al., 2015; Kim, 2014)

SLIDE 7

Dealing with Multiple Languages: Multilingually

  • Solution? Single model with aligned input space
    • language-independent learning
    • constant number of parameters
    • common label sets across languages
    • modeling at the word level

(Ammar et al., 2016; Gouws et al., 2015; Hermann and Blunsom, 2014; Klementiev et al., 2012)

SLIDE 8

Dealing with Multiple Languages: Our Contribution

  • Solution: Single model trained over arbitrary label sets with an aligned input space
    • language-independent learning
    • sub-linear growth of parameters
    • arbitrary label sets across languages
    • hierarchical modeling at the document level
SLIDE 9

Background: Hierarchical Attention Networks (HANs)

[Diagram: words → sentences → document hierarchy]

  • Input: sequence of word vectors
  • Output: document vector u
  • Hierarchical structure
    • word-level and sentence-level abstraction layers
    • encoders (H_w, H_s)
    • attention mechanisms (α_w, α_s)
  • Classification layer (W_c) + cross-entropy
  • Training: SGD with Adam

(Yang et al., 2016)
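The two-level structure above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: dense ReLU encoders stand in for the GRU encoders, and all names (`H_w`, `v_w`, `H_s`, `v_s`, `W_c`) and sizes (40-d inputs, 100-d layers, as on the results slide) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n_labels = 40, 100, 5

# Illustrative parameters: word/sentence encoders, attention vectors, classifier.
H_w = rng.normal(scale=0.1, size=(d_in, d_h))      # word-level encoder
v_w = rng.normal(scale=0.1, size=d_h)              # word-level attention vector
H_s = rng.normal(scale=0.1, size=(d_h, d_h))       # sentence-level encoder
v_s = rng.normal(scale=0.1, size=d_h)              # sentence-level attention vector
W_c = rng.normal(scale=0.1, size=(d_h, n_labels))  # classification layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(states, v):
    """Pool encoder states into one vector via attention weights."""
    alpha = softmax(states @ v)   # one weight per time step
    return alpha @ states         # weighted sum of states

def han_forward(document):
    """document: list of sentences, each an (n_words, d_in) array."""
    sent_vecs = []
    for sent in document:
        h = np.maximum(sent @ H_w, 0.0)   # encode words (ReLU stands in for a GRU)
        sent_vecs.append(attend(h, v_w))  # word attention -> sentence vector
    S = np.stack(sent_vecs)
    h = np.maximum(S @ H_s, 0.0)          # encode sentences
    u = attend(h, v_s)                    # sentence attention -> document vector u
    return softmax(u @ W_c)               # label distribution

doc = [rng.normal(size=(7, d_in)), rng.normal(size=(4, d_in))]
probs = han_forward(doc)
```

The attention weights computed in `attend` are what make the output interpretable: they show which words and sentences drove the prediction.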

SLIDE 10

MHANs: Multilingual Hierarchical Attention Networks

SLIDE 11

Multilingual Attention Networks: Computational Cost

  • Fewer parameters are needed (superscript (l) marks language-specific parameters):
    • θ_enc = {H_w, W_w^(l), H_s, W_s^(l), W_c^(l)}, θ_att = {H_w^(l), W_w, H_s^(l), W_s, W_c^(l)}
    • θ_both = {H_w, W_w, H_s, W_s, W_c^(l)}, θ_mono = {H_w^(l), W_w^(l), H_s^(l), W_s^(l), W_c^(l)}
  • The following inequalities are true:
  • Example with shared attention mechanisms

Naive DL multilingual adaptation fails!
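A back-of-the-envelope count shows why sharing grows sub-linearly in the number of languages M. The sizes below are assumptions for illustration (dense 100-d encoders over 40-d inputs as on the results slide; the label-set size k is invented), not figures from the paper.

```python
# Illustrative parameter counts: monolingual vs fully shared (theta_both).
d_in, d_h, k, M = 40, 100, 300, 8   # input dim, hidden dim, labels per language, languages

enc = d_in * d_h + d_h * d_h        # word + sentence encoder weights
att = 2 * d_h                       # word + sentence attention vectors
clf = d_h * k                       # per-language classification layer

mono = M * (enc + att + clf)        # one full model per language: linear in M
shared_both = enc + att + M * clf   # encoders + attention shared: only the
                                    # classifiers grow with M
```

Only the language-specific classification layers scale with M in the shared configuration, which is the sub-linear growth claimed on the contribution slide.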

SLIDE 12

Multilingual Attention Networks: Training Strategy

  • Minimize the sum of the cross-entropy errors
  • Issue: naive consecutive training biases the model
  • Sample document-label pairs for each language in a cyclic fashion: (L1, …, LM)^(1) → … → (L1, …, LM)^(M)
  • Optimizer: SGD with Adam (same as before)
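The cyclic sampling idea can be sketched as a round-robin schedule: instead of training each language to completion in turn (which biases the model toward the last language seen), draw one batch per language per cycle. The helper names (`cyclic_schedule`, `train_step`, the `data` iterators) are illustrative, not from the paper.

```python
from itertools import cycle, islice

def cyclic_schedule(languages, n_cycles):
    """Yield the languages as (L1, ..., LM) repeated n_cycles times."""
    return list(islice(cycle(languages), n_cycles * len(languages)))

def train(languages, data, n_cycles, train_step):
    """data: per-language batch iterators; train_step: one SGD/Adam update."""
    for lang in cyclic_schedule(languages, n_cycles):
        batch = next(data[lang])  # sample document-label pairs for this language
        train_step(lang, batch)   # update on this language's cross-entropy loss

order = cyclic_schedule(["en", "de", "ar"], n_cycles=2)
# order == ["en", "de", "ar", "en", "de", "ar"]
```

Each parameter update still minimizes a single language's cross-entropy; the cycling is what keeps the shared components balanced across languages.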
SLIDE 13

Dataset: Deutsche Welle Corpus (600k docs, 8 languages)

Tagged by journalists

SLIDE 14

Full-resource Scenario: Bilingual Training

  • Multilingual models consistently outperform monolingual ones
  • Sharing attention is the best configuration (on average)
  • Traditional (BoW) vs neural (en+ar, biGRU encoders):
    • en: 75.8% vs 77.8%
    • ar: 81.8% vs 84.0%

Input: 40-d; Encoders: Dense 100-d; Attention: Dense 100-d; Activation: ReLU

SLIDE 15

Low-resource Scenario: Bilingual Training

[Figure: improvement (low to high) over monolingual training at 50%, 5%, and 0.5% of the training data]

SLIDE 16

Qualitative Analysis: English - German

  • True positive difference (multi vs mono) increases over the entire spectrum
  • German: russland (21), berlin (19), irak (14), wahlen (13) and nato (13)
  • English: germany (259), german (97), soccer (73), football (47) and merkel (25)

[Figure: cumulative TP difference over labels sorted by frequency]

SLIDE 17

Qualitative Analysis: Interpretable Output

SLIDE 18

Conclusion and Perspectives

  • New multilingual models to learn shared document structures for text classification
  • Benefit full-resource and low-resource languages
  • Achieve better accuracy with fewer parameters
  • Capable of cross-language transfer
  • Future work
    • Remove the constraint of closed label sets
    • Incorporate label information
    • Apply to other NLU tasks
SLIDE 19

User group meeting

July 3, 2017, Caversham, UK
Demos, technical talks, posters & discussions. Contact us if interested!

Thank you

SLIDE 20

References

  • Nikolaos Pappas and Andrei Popescu-Belis. 2017. Multilingual hierarchical attention networks for text classification. (submitted)
  • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively multilingual word embeddings. CoRR abs/1602.01925.
  • Stephan Gouws, Yoshua Bengio, and Gregory S. Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. 32nd International Conference on Machine Learning.
  • Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. 52nd Annual Meeting of the Association for Computational Linguistics.
  • Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. International Conference on Computational Linguistics.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. Conference on Empirical Methods in Natural Language Processing.
  • Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical recurrent neural network for document modeling. Conference on Empirical Methods in Natural Language Processing.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. Conference on Empirical Methods in Natural Language Processing.