Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks


SLIDE 1

Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks

Nikolaos Pappas, Andrei Popescu-Belis

Idiap Research Institute

June 9, 2017
SwissText, Winterthur

SLIDE 2

Topic Recognition

Spam filtering, Mailbox Optimization, Customer Support

SLIDE 3

Question Answering

Reading/Navigation Assistant, Interactive Search

Question: Which of Gaudí's creations is his masterpiece?

Answer: Sagrada Família

SLIDE 4

Machine Translation

Document Translation, Dialogue Translation

SLIDE 5

Fundamental Function: Representing Word Sequences

  • Goal: Learn representations (distributed vectors) of word sequences which effectively encode the meaning / knowledge needed to perform:
    ✓ Topic Recognition
    • Question Answering
    • Machine Translation
    • Summarization

Can we benefit from multiple languages?

SLIDE 6

Dealing with Multiple Languages: Monolingually

  • Solution? Separate models per language
    • language-dependent learning
    • linear growth of the parameters
    • lack of cross-language knowledge transfer
    • hierarchical modeling at the document level

Documents X = {x_i | i = 1…n}, Labels Y = {y_i | i = 1…n}, Models f: X → Y

(Yang et al., 2016; Tang et al., 2015; Lin et al., 2015; Kim, 2014)

SLIDE 7

Dealing with Multiple Languages: Multilingually

  • Solution? Single model with aligned input space
    • language-independent learning
    • constant number of parameters
    • common label sets across languages
    • modeling at the word level

(Ammar et al., 2016; Gouws et al., 2015; Hermann and Blunsom, 2014; Klementiev et al., 2012)

SLIDE 8

Dealing with Multiple Languages: Our Contribution

  • Solution: Single model trained over arbitrary label sets with an aligned input space
    • language-independent learning
    • sub-linear growth of parameters
    • arbitrary label sets across languages
    • hierarchical modeling at the document level
SLIDE 9

Background: Hierarchical Attention Networks (HANs)

[Diagram: words → sentences → document hierarchy]

  • Input: sequence of word vectors
  • Output: document vector u
  • Hierarchical structure
    • word-level and sentence-level abstraction layers
    • encoders (H_w, H_s)
    • attention mechanisms (α_w, α_s)
  • Classification layer (W_c) + cross-entropy
  • Training: SGD with Adam

(Yang et al., 2016)
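The two-level structure above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: dense ReLU encoders stand in for the GRU encoders, and all names (`H_w`, `v_w`, `H_s`, `v_s`, `W_c`) and sizes (40-d inputs, 100-d layers, as on the results slide) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n_labels = 40, 100, 5

# Illustrative parameters: word/sentence encoders, attention vectors, classifier.
H_w = rng.normal(scale=0.1, size=(d_in, d_h))      # word-level encoder
v_w = rng.normal(scale=0.1, size=d_h)              # word-level attention vector
H_s = rng.normal(scale=0.1, size=(d_h, d_h))       # sentence-level encoder
v_s = rng.normal(scale=0.1, size=d_h)              # sentence-level attention vector
W_c = rng.normal(scale=0.1, size=(d_h, n_labels))  # classification layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(states, v):
    """Pool encoder states into one vector via attention weights."""
    alpha = softmax(states @ v)   # one weight per time step
    return alpha @ states         # weighted sum of states

def han_forward(document):
    """document: list of sentences, each an (n_words, d_in) array."""
    sent_vecs = []
    for sent in document:
        h = np.maximum(sent @ H_w, 0.0)   # encode words (ReLU stands in for a GRU)
        sent_vecs.append(attend(h, v_w))  # word attention -> sentence vector
    S = np.stack(sent_vecs)
    h = np.maximum(S @ H_s, 0.0)          # encode sentences
    u = attend(h, v_s)                    # sentence attention -> document vector u
    return softmax(u @ W_c)               # label distribution

doc = [rng.normal(size=(7, d_in)), rng.normal(size=(4, d_in))]
probs = han_forward(doc)
```

The attention weights computed in `attend` are what make the output interpretable: they show which words and sentences drove the prediction.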

SLIDE 10

MHANs: Multilingual Hierarchical Attention Networks

SLIDE 11

Multilingual Attention Networks: Computational Cost

  • Fewer parameters are needed (superscript (l) marks language-specific parameters):
    • θ_enc = {H_w, W_w^(l), H_s, W_s^(l), W_c^(l)}, θ_att = {H_w^(l), W_w, H_s^(l), W_s, W_c^(l)}
    • θ_both = {H_w, W_w, H_s, W_s, W_c^(l)}, θ_mono = {H_w^(l), W_w^(l), H_s^(l), W_s^(l), W_c^(l)}
  • The following inequalities are true:
  • Example with shared attention mechanisms

Naive DL multilingual adaptation fails!
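A back-of-the-envelope count shows why sharing grows sub-linearly in the number of languages M. The sizes below are assumptions for illustration (dense 100-d encoders over 40-d inputs as on the results slide; the label-set size k is invented), not figures from the paper.

```python
# Illustrative parameter counts: monolingual vs fully shared (theta_both).
d_in, d_h, k, M = 40, 100, 300, 8   # input dim, hidden dim, labels per language, languages

enc = d_in * d_h + d_h * d_h        # word + sentence encoder weights
att = 2 * d_h                       # word + sentence attention vectors
clf = d_h * k                       # per-language classification layer

mono = M * (enc + att + clf)        # one full model per language: linear in M
shared_both = enc + att + M * clf   # encoders + attention shared: only the
                                    # classifiers grow with M
```

Only the language-specific classification layers scale with M in the shared configuration, which is the sub-linear growth claimed on the contribution slide.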

SLIDE 12

Multilingual Attention Networks: Training Strategy

  • Minimize the sum of the cross-entropy errors
  • Issue: naive consecutive training biases the model
  • Sample document-label pairs for each language in a cyclic fashion: (L1, …, LM)^(1) → … → (L1, …, LM)^(M)
  • Optimizer: SGD with Adam (same as before)
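The cyclic sampling idea can be sketched as a round-robin schedule: instead of training each language to completion in turn (which biases the model toward the last language seen), draw one batch per language per cycle. The helper names (`cyclic_schedule`, `train_step`, the `data` iterators) are illustrative, not from the paper.

```python
from itertools import cycle, islice

def cyclic_schedule(languages, n_cycles):
    """Yield the languages as (L1, ..., LM) repeated n_cycles times."""
    return list(islice(cycle(languages), n_cycles * len(languages)))

def train(languages, data, n_cycles, train_step):
    """data: per-language batch iterators; train_step: one SGD/Adam update."""
    for lang in cyclic_schedule(languages, n_cycles):
        batch = next(data[lang])  # sample document-label pairs for this language
        train_step(lang, batch)   # update on this language's cross-entropy loss

order = cyclic_schedule(["en", "de", "ar"], n_cycles=2)
# order == ["en", "de", "ar", "en", "de", "ar"]
```

Each parameter update still minimizes a single language's cross-entropy; the cycling is what keeps the shared components balanced across languages.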
SLIDE 13

Dataset: Deutsche Welle Corpus (600k docs, 8 languages)

Tagged by journalists

SLIDE 14

Full-resource Scenario: Bilingual Training

  • Multilingual models consistently outperform monolingual ones
  • Sharing attention is the best configuration (on average)
  • Traditional (BoW) vs neural (en+ar, biGRU encoders):
    • en: 75.8% vs 77.8%
    • ar: 81.8% vs 84.0%

Input: 40-d; Encoders: Dense 100-d; Attention: Dense 100-d; Activation: ReLU

SLIDE 15

Low-resource Scenario: Bilingual Training

[Figure: improvement (low to high) over monolingual training at 50%, 5%, and 0.5% of the training data]

SLIDE 16

Qualitative Analysis: English - German

  • True positive difference (multi vs mono) increases over the entire spectrum
  • German: russland (21), berlin (19), irak (14), wahlen (13) and nato (13)
  • English: germany (259), german (97), soccer (73), football (47) and merkel (25)

[Figure: cumulative TP difference over labels sorted by frequency]

SLIDE 17

Qualitative Analysis: Interpretable Output

SLIDE 18

Conclusion and Perspectives

  • New multilingual models to learn shared document structures for text classification
  • Benefit full-resource and low-resource languages
  • Achieve better accuracy with fewer parameters
  • Capable of cross-language transfer
  • Future work
    • Remove the constraint of closed label sets
    • Incorporate label information
    • Apply to other NLU tasks
SLIDE 19

User group meeting

July 3, 2017, Caversham, UK
Demos, technical talks, posters & discussions. Contact us if interested!

Thank you

SLIDE 20

References

  • Nikolaos Pappas and Andrei Popescu-Belis. 2017. Multilingual hierarchical attention networks for text classification. (submitted)
  • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively multilingual word embeddings. CoRR abs/1602.01925.
  • Stephan Gouws, Yoshua Bengio, and Gregory S. Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. 32nd International Conference on Machine Learning.
  • Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. 52nd Annual Meeting of the Association for Computational Linguistics.
  • Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. International Conference on Computational Linguistics.
  • Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
  • Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. Conference on Empirical Methods in Natural Language Processing.
  • Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical recurrent neural network for document modeling. Conference on Empirical Methods in Natural Language Processing.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. Conference on Empirical Methods in Natural Language Processing.