June 9, 2017
Swisstext, Winterthur
Nikolaos Pappas, Andrei Popescu-Belis
Idiap Research Institute
Labeling Text in Several Languages with Multilingual Hierarchical Attention Networks
Topic Recognition, Spam Filtering, Mailbox Optimization
Question: Which of Gaudi's creations is his masterpiece?
Answer: Sagrada Família
✓ Topic Recognition
Can we benefit from multiple languages?
Documents: X = {x_i | i = 1…n}
Labels: Y = {y_i | i = 1…n}
Models: f: X → Y (Yang et al., 2016) (Tang et al., 2015) (Lin et al., 2015) (Kim, 2014)
Model
(Ammar et al., 2016) (Gouws et al., 2015) (Herman and Blunsom, 2014) (Klementiev et al., 2012)
Model 1, Model 2, …, Model M
Abstraction layers: words, sentences, document
(Yang et al., 2016)
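The three abstraction layers can be illustrated with a small attention-pooling sketch: word vectors are pooled into sentence vectors, and sentence vectors into one document vector. This is a simplified illustration in the spirit of hierarchical attention, not the authors' implementation. It assumes tanh-scored attention, leaves out the encoders that would normally precede each attention layer, and uses hypothetical names (`attention_pool`, `encode_document`, `init_params`), random weights, and toy dimensions.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, W, b, context):
    """Collapse a list of d-dim vectors into a single d-dim vector.

    score_t = context . tanh(W v_t + b)
    weights = softmax(scores)
    output  = sum_t weights_t * v_t
    """
    scores = []
    for v in vectors:
        u = [
            math.tanh(sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(c * u_i for c, u_i in zip(context, u)))
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def encode_document(doc, word_params, sent_params):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    # Word level: one vector per sentence.
    sentence_vectors = [attention_pool(s, *word_params)[0] for s in doc]
    # Sentence level: one vector for the whole document.
    return attention_pool(sentence_vectors, *sent_params)


def init_params(rng, dim, att_dim):
    # Random, untrained parameters, for illustration only.
    W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(att_dim)]
    b = [0.0] * att_dim
    context = [rng.gauss(0, 0.1) for _ in range(att_dim)]
    return W, b, context


rng = random.Random(0)
dim, att_dim = 8, 6  # toy sizes, not the sizes used in the talk
word_params = init_params(rng, dim, att_dim)
sent_params = init_params(rng, dim, att_dim)

# A toy document with three sentences of 5, 3, and 7 words.
doc = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_words)]
    for n_words in (5, 3, 7)
]
doc_vector, sent_weights = encode_document(doc, word_params, sent_params)
```

The returned `sent_weights` holds one weight per sentence and sums to 1, so it can be read as how much each sentence contributes to the document vector.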
Naive DL multilingual adaptation fails!
a cyclic fashion: (L_1, …, L_M)^(1) → … → (L_1, …, L_M)^(M)
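The cyclic order above can be sketched as a round-robin schedule over languages. This is a minimal illustration, not the authors' training code. It assumes the items being cycled are per-language mini-batches; the function name `cyclic_schedule` and the language codes are hypothetical.

```python
def cyclic_schedule(batches_by_language):
    """Yield (language, batch) pairs, visiting languages in a fixed cycle.

    Only complete rounds are produced: zip stops at the language with
    the fewest batches, so each language is visited equally often.
    """
    languages = list(batches_by_language)
    per_language = (batches_by_language[lang] for lang in languages)
    for round_batches in zip(*per_language):
        for lang, batch in zip(languages, round_batches):
            yield lang, batch


# Placeholder batches; in practice each would hold documents and labels.
batches = {
    "en": ["en-0", "en-1", "en-2"],
    "de": ["de-0", "de-1"],
    "fr": ["fr-0", "fr-1", "fr-2"],
}
order = [lang for lang, _ in cyclic_schedule(batches)]
# order == ['en', 'de', 'fr', 'en', 'de', 'fr']
```

Stopping at the shortest language is one possible design choice; re-sampling from the smaller languages so that every round stays complete would be another.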
Tagged by journalists
Input: 40-d; Encoders: Dense 100-d; Attentions: Dense 100-d; Activation: relu
Figure: improvement (low to high) at training percentages of 50%, 5%, and 0.5%.
russland (21), berlin (19), irak (14), wahlen (13) and nato (13)
germany (259), german (97), soccer (73), football (47) and merkel (25)
Figure: cumulative TP difference, with labels sorted by frequency.
structures for text classification
User group meeting
July 3, 2017, Caversham, UK
Demos, technical talks, posters & discussions
Contact us if interested!
Popescu-Belis, 2017 (submitted)
representations without word alignments. 32nd International Conference on Machine Learning.
semantics. 52nd Annual Meeting of the Association for Computational Linguistics.
representations of words. International Conference on Computational Linguistics.
attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
for sentiment classification. In Empirical Methods in Natural Language Processing.
network for document modeling. Conference on Empirical Methods in Natural Language Processing.
Methods in Natural Language Processing.