SLIDE 1

Ryan McDonald

Google NLU team
Google Linguistics team
Universal Dependencies Group

Multilingual Europe As A Challenge for Language Technology

SLIDE 2

META-Forum 2016

Europe & the Internet

Language of top 10M pages (W3Techs 2015)

[Pie chart segments: 54%, 32%, 14%; one segment is labeled "other"]

~600M internet users (~80% internet penetration)
SLIDE 3

Google: Mobile vs. Desktop

SLIDE 4

Search & Language Technology

Ten links

SLIDE 5

Search & Language Technology

SLIDE 6

Mobile & Language Technology

  • Generation & Text-to-speech
  • Speech recognition
  • Natural Language Understanding

SLIDE 7

Mobile & Language Technology

  • Order a pizza
  • Get weather
  • Q & A
  • Predictive info

SLIDE 8

Mobile & Language Technology

  • Mobile is the future
  • Language technologies are key to the mobile experience
  • Users demand native language support
  • Europe: large market, dozens of languages

SLIDE 9

Language Technology Productionization

NLU system

  • English version
  • Internationalize

SLIDE 10

Baked-in multilingualism in NLU

NLU system

Multilingual systems

SLIDE 11

End-to-end NN

Pros
  • Simple & flexible
  • Multilingual by nature?
  • High accuracy**

Cons
  • Hard to interpret
  • Need a lot of data**

SLIDE 12

Similarity / Paraphrase / NLI

Parikh et al. 2016

Multilingual??
  • Units/words?
  • Word order?
  • Structural bias?

SLIDE 13

Intermediate Representations

Useful (discrete) abstractions for i18n NLU?

NLU Abstraction layer

Ο πληθυσμός του Καναδά είναι 35,000,000 (The population of Canada is 35M)

SLIDE 14

NLU focused awards, May 2016

Morphosyntax

Ο Ιωάννης είδε τους γονείς του, όταν πήγε στην Αθήνα.
John saw family his when went to Athens
(John saw his parents when he went to Athens.)

Ήταν ευτυχής να τον δουν.
were happy to him see
(They were happy to see him.)

δουν:
  • number: plural
  • person: third
  • tense: subjunctive/future

Ήταν:
  • number: ?
  • person: third
  • tense: past

πήγε:
  • number: singular
  • person: third
  • tense: past

Missing pronoun realization

  1. Syntax to identify relevant verbs
  2. Morphology to piece together gender/number

NLP / Analysis
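The agreement check behind the missing-pronoun example can be sketched in a few lines. This is only an illustration, not the system described in the talk: it assumes step 1 (syntax picking out the relevant verbs and candidate nouns) has already been done, and the feature dictionaries and the `compatible_subjects` helper are hypothetical names introduced here.

```python
# Hand-written features copied from the slide; None marks an unknown value ("?").
VERBS = {
    "πήγε": {"number": "singular", "person": "third"},
    "δουν": {"number": "plural", "person": "third"},
    "Ήταν": {"number": None, "person": "third"},
}

# Candidate antecedents for the dropped subject pronoun (assumed given by syntax).
NOUNS = [
    {"form": "Ιωάννης", "number": "singular", "person": "third"},
    {"form": "γονείς", "number": "plural", "person": "third"},
]


def compatible_subjects(verb, nouns):
    """Return the forms of all nouns whose person/number do not clash with the verb."""
    matches = []
    for noun in nouns:
        clash = any(
            verb.get(key) is not None and verb[key] != noun[key]
            for key in ("number", "person")
        )
        if not clash:
            matches.append(noun["form"])
    return matches


for form, feats in VERBS.items():
    print(form, "->", compatible_subjects(feats, NOUNS))
```

With these toy features, πήγε is compatible only with Ιωάννης and δουν only with γονείς, while Ήταν stays ambiguous because its number is unknown, which is the "number: ?" case on the slide.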

SLIDE 15

Morphosyntax

Generation / TTS

population(Καναδάς, 35.000.000)

Ο πληθυσμός <masc-gen-sing-prep> <masc-gen-sing-Entity> <sing-cop> <pop>

+ nom-fem-plur

= Ο πληθυσμός του Καναδά είναι 35.000.000
= Ο πληθυσμός του Καναδά είναι τριάντα πέντε εκατομμύρια

SLIDE 16

Universal Dependencies (UD)

(Nivre et al. 2016)

✤ Content-head reigns supreme for dependencies
✤ UPOS + Morphology + lemma surface analysis

~26 European languages covered in v1.3
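To make the UD layers concrete, here is a minimal sketch that reads a sentence in the CoNLL-U format used by UD treebanks (ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The annotation of the Greek sentence is hand-written for illustration and is not taken from any released treebank; the `Token` type and the helper names are assumptions of this sketch.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int
    form: str
    lemma: str
    upos: str
    feats: dict
    head: int
    deprel: str


# Illustrative annotation (UD v1-style labels), written by hand for this sketch.
ROWS = [
    ("1", "Ο", "ο", "DET", "_", "Case=Nom|Gender=Masc|Number=Sing", "2", "det", "_", "_"),
    ("2", "Καναδάς", "Καναδάς", "PROPN", "_", "Case=Nom|Gender=Masc|Number=Sing", "3", "nsubj", "_", "_"),
    ("3", "έχει", "έχω", "VERB", "_", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"),
    ("4", "πολλά", "πολύς", "ADJ", "_", "Case=Acc|Gender=Neut|Number=Plur", "5", "amod", "_", "_"),
    ("5", "δέντρα", "δέντρο", "NOUN", "_", "Case=Acc|Gender=Neut|Number=Plur", "3", "dobj", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def parse_feats(field):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_conllu(block):
    """Parse one sentence block, skipping comments and non-integer IDs."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # multiword-token ranges such as "1-2"
            continue
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3],
                  parse_feats(cols[5]), int(cols[6]), cols[7])
        )
    return tokens


tokens = parse_conllu(SAMPLE)
root = next(t for t in tokens if t.head == 0)
print(root.form, [t.form for t in tokens if t.head == root.idx])
```

In this toy analysis the content word έχει is the root and the function word Ο attaches to its noun, which is the content-head convention named on the slide.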

SLIDE 17

Analysis: SyntaxNet

(Andor et al. 2016)

UD

(Chen & Manning 2014; Weiss et al. 2015)

SLIDE 18

Morphosyntactic Analysis @ Google

SLIDE 19

Generation: Entity Lexicons

Καναδάς: Gender=Masc, Case=Nom, Number=Sing
Καναδά: Gender=Masc, Case=Acc, Number=Sing
Καναδά: Gender=Masc, Case=Gen, Number=Sing

Ο Καναδάς έχει πολλά δέντρα. (Canada has many trees.)

Gender=Masc Case=Nom Number=Sing

Ο πληθυσμός του Καναδά είναι 35M. (The population of Canada is 35M.)

Gender=Masc Case=Gen Number=Sing

Είδα τον Καναδά με το τρένο. (I saw Canada by train.)

Gender=Masc Case=Acc Number=Sing

  • Inflectional table
  • Edit-distance
  • Unsup-morpher
  • etc.
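A small sketch of how such an entity lexicon could drive generation: the entry stores the entity's gender and number plus one surface form per case, and a template asks for the case it needs. The dictionary layout and the `noun_phrase` and `population_sentence` helpers are hypothetical, chosen only to mirror the slide's three Καναδάς entries; how the lexicon is populated in practice (inflectional tables, edit distance, unsupervised morphology) is not modeled here.

```python
# Toy entity lexicon mirroring the slide: one entry, three case forms.
ENTITY_LEXICON = {
    "Καναδάς": {
        "Gender": "Masc",
        "Number": "Sing",
        "forms": {"Nom": "Καναδάς", "Acc": "Καναδά", "Gen": "Καναδά"},
    }
}

# Greek definite article for masculine singular, keyed by (gender, number, case).
DEFINITE_ARTICLE = {
    ("Masc", "Sing", "Nom"): "ο",
    ("Masc", "Sing", "Gen"): "του",
    ("Masc", "Sing", "Acc"): "τον",
}


def noun_phrase(lemma, case):
    """Pick the article and entity form that agree in gender, number and case."""
    entry = ENTITY_LEXICON[lemma]
    article = DEFINITE_ARTICLE[(entry["Gender"], entry["Number"], case)]
    return f"{article} {entry['forms'][case]}"


def population_sentence(lemma, value):
    """Fill a fixed template whose entity slot requires the genitive."""
    return f"Ο πληθυσμός {noun_phrase(lemma, 'Gen')} είναι {value}."


print(population_sentence("Καναδάς", "35M"))
print(noun_phrase("Καναδάς", "Acc"))
```

The design point is that the template never hard-codes "του Καναδά"; it only states the required case, so the same template would work for any entity whose lexicon entry is complete.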

SLIDE 20

Summary

NLU + [graphic] =

  • Multilingualism in NLU from the ground up
  • End-to-end, Morphosyntax, more?

SLIDE 21

Thanks