SLIDE 1

COMPOSITIONAL MORPHOLOGY FOR WORD REPRESENTATIONS AND LANGUAGE MODELLING

Jan Botha, Phil Blunsom

ICML 2014, Beijing

SLIDE 6

MOTIVATION PROPOSED METHOD EXPERIMENTS

MOTIVATING EXAMPLE

WHAT WE SEE

The king finally abdicated after years of unkingly conduct .

Wait, what – unkingly?

unkingly /ʌnˈkɪŋli/: a word you have probably never seen, but still understand

⇒ compositional morphology in action

WHAT OUR MODELS SEE (MOSTLY)

10 2 95 529 11 88 21 50 74 239

SLIDE 9

MOTIVATING EXAMPLE 2

Other languages display still more variation

CZECH

CONJUGATION

čistit (to clean): čistím, čistíš, čistí, čistíme, čistíte, čistil, čištěn, čisti, čistěte, čistěme

TURKISH PRODUCTIVE DERIVATION

Avrupa (Europe)
Avrupalı (of Europe)
Avrupalılaş (become of Europe)
Avrupalılaştır (to Europeanise)
Avrupalılaştırama (be unable to Europeanise)
Avrupalılaştıramadık (we were unable to Europeanise)
. . .

⇒ we should model morphemes!

SLIDE 11

REPRESENTING WORDS

◮ Discrete set?

{a, aardvark, . . . , account, accounted, accounting, . . . }

◮ Vector space?

[Figure: the words a, aardvark, account and accounted shown as points in a 2-D space with axes x1 and x2]
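As a rough illustration of the difference, here is a minimal sketch. The 2-D coordinates are invented for this example (they are not read off the slide or taken from any trained model). Under the discrete-set view every pair of distinct words is equally unrelated, whereas word vectors allow graded similarity, measured here with cosine similarity.

```python
import math

# Hypothetical 2-D word vectors, invented for illustration only.
EMBEDDINGS = {
    "a": (0.1, 0.9),
    "aardvark": (0.9, 0.1),
    "account": (0.6, 0.7),
    "accounted": (0.55, 0.75),
}


def one_hot_similarity(w1: str, w2: str) -> float:
    """Discrete-set view: a word is similar only to itself."""
    return 1.0 if w1 == w2 else 0.0


def cosine_similarity(w1: str, w2: str) -> float:
    """Vector-space view: graded similarity between two word vectors."""
    u, v = EMBEDDINGS[w1], EMBEDDINGS[w2]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for other in ("accounted", "aardvark"):
        print(other, one_hot_similarity("account", other),
              round(cosine_similarity("account", other), 3))
```

With one-hot identities, "accounted" is exactly as far from "account" as "aardvark" is; with vectors, the morphologically related word can sit closer.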

SLIDE 12

EXTRACT FROM COLLOBERT & WESTON EMBEDDINGS

SLIDE 17

MORPHEME VECTORS

Existing word vectors already capture some morphology.

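◮ $\overrightarrow{\text{banks}} - \overrightarrow{\text{bank}} \approx \overrightarrow{\text{kings}} - \overrightarrow{\text{king}} \approx \overrightarrow{\text{queens}} - \overrightarrow{\text{queen}}$

(Mikolov et al. 2013)

Logical extension:

◮ $\overrightarrow{\text{kings}} \approx \overrightarrow{\text{king}} + \overrightarrow{\text{-s}}$

◮ $\overrightarrow{\text{unkingly}} \approx \overrightarrow{\text{un-}} + \overrightarrow{\text{king}} + \overrightarrow{\text{-ly}}$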

HOW TO...

◮ obtain morpheme vectors
◮ compose morpheme vectors
◮ do it all within a language model usable in an MT decoder
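A minimal sketch of the "logical extension", assuming purely additive composition with random toy morpheme vectors and hand-written segmentations (all hypothetical, nothing here is trained): if every word vector were the sum of its morpheme vectors, the offsets banks − bank, kings − king and queens − queen would all collapse to the vector for -s, which is the regularity the slide attributes, only approximately, to existing word vectors (Mikolov et al. 2013).

```python
import random

DIM = 4
rng = random.Random(0)

# Hypothetical morpheme vectors (random); a real model would learn these.
MORPHEMES = {m: [rng.gauss(0.0, 1.0) for _ in range(DIM)]
             for m in ("un-", "king", "bank", "queen", "-s", "-ly")}

# Hypothetical hand-written segmentations.
SEGMENTATION = {
    "king": ["king"], "kings": ["king", "-s"],
    "bank": ["bank"], "banks": ["bank", "-s"],
    "queen": ["queen"], "queens": ["queen", "-s"],
    "unkingly": ["un-", "king", "-ly"],
}


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def compose(word):
    """Word vector as the plain sum of its morpheme vectors."""
    vec = [0.0] * DIM
    for morpheme in SEGMENTATION[word]:
        vec = add(vec, MORPHEMES[morpheme])
    return vec


def close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol for a, b in zip(u, v))


if __name__ == "__main__":
    offsets = {w: sub(compose(w + "s"), compose(w)) for w in ("bank", "king", "queen")}
    # Every plural offset collapses to the -s vector.
    print(all(close(off, MORPHEMES["-s"]) for off in offsets.values()))  # True
```

Under this toy assumption the ≈ on the slide becomes an exact identity; with vectors learned independently per word, it only holds approximately.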

SLIDE 21

MORPHOLOGICAL COMPOSITION AS ADDITION

Literally, word = sum of its parts? Problems:

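◮ bag of morphemes: $\overrightarrow{\text{hang}} + \overrightarrow{\text{over}} = \overrightarrow{\text{over}} + \overrightarrow{\text{hang}}$, so hangover and overhang would get the same vector

◮ non-compositionality: $\overrightarrow{\text{greenhouse}} = \overrightarrow{\text{green}} + \overrightarrow{\text{house}}$, even though a greenhouse is not a green house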

PRAGMATIC SOLUTION

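include word identity as component too:

$\overrightarrow{\text{greenhouse}} \equiv \overrightarrow{\text{greenhouse}_{\text{id}}} + \overrightarrow{\text{green}_{\text{stem}}} + \overrightarrow{\text{house}_{\text{stem}}}$

$\overrightarrow{\text{unkingly}} \equiv \overrightarrow{\text{unkingly}_{\text{id}}} + \overrightarrow{\text{un}_{\text{pre}}} + \overrightarrow{\text{king}_{\text{stem}}} + \overrightarrow{\text{ly}_{\text{suf}}}$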
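A minimal sketch of this composed representation, assuming random toy vectors and hand-written segmentations. The factor names (`greenhouse_id`, `king_stem`, `ly_suf`, ...) simply mirror the subscripts on the slide, and the tags given to hang and over are illustrative. A word's vector is the sum of its identity vector plus one vector per tagged morpheme. The identity term lets hangover and overhang differ even though they share morphemes, and a word without an identity vector (standing in for an unseen word) can still be built from its known morphemes.

```python
import random

DIM = 4
rng = random.Random(1)

# Hypothetical factor vectors: one identity vector per known word ("_id")
# plus one vector per morpheme tagged by its type ("_pre", "_stem", "_suf").
FACTOR_NAMES = (
    "greenhouse_id", "green_stem", "house_stem",
    "unkingly_id", "un_pre", "king_stem", "ly_suf",
    "hangover_id", "overhang_id", "hang_stem", "over_stem",
)
FACTORS = {name: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for name in FACTOR_NAMES}

# Hypothetical segmentations; "kingly" deliberately has no identity vector,
# standing in for a word that was not seen in training.
MORPHS = {
    "greenhouse": ["green_stem", "house_stem"],
    "unkingly": ["un_pre", "king_stem", "ly_suf"],
    "hangover": ["hang_stem", "over_stem"],
    "overhang": ["over_stem", "hang_stem"],
    "kingly": ["king_stem", "ly_suf"],
}


def compose(word, use_identity=True):
    """Sum of the word's identity vector (when it has one) and its morpheme vectors."""
    names = list(MORPHS[word])
    if use_identity and f"{word}_id" in FACTORS:
        names.append(f"{word}_id")
    vec = [0.0] * DIM
    for name in names:
        vec = [a + b for a, b in zip(vec, FACTORS[name])]
    return vec


if __name__ == "__main__":
    # Without identity vectors, morpheme order is lost.
    print(compose("hangover", use_identity=False) == compose("overhang", use_identity=False))  # True
    # With identity vectors, the two words get different representations.
    print(compose("hangover") == compose("overhang"))  # False
```

The identity vector absorbs whatever is idiosyncratic about a word, while the shared morpheme vectors carry what related words have in common.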

SLIDE 22

SIMPLEST VECTOR-BASED PROBABILISTIC LM

LBL (Log-bilinear model)

(Mnih & Hinton, 2007; Mnih & Teh, 2012)

“colorless green ideas sleep furiously .”
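A worked form of the log-bilinear model, written in a common notation (an assumption; the slide's own diagram did not survive extraction). For an n-gram model, context vectors $\mathbf{q}$ of the previous $n-1$ words are combined through position-specific matrices $C_j$ into a predicted vector $\mathbf{p}$, which is then scored against a target vector $\mathbf{r}_w$ and bias $b_w$ for every word $w$. Using separate context and target vectors is a choice made for this sketch.

$$
\mathbf{p} = \sum_{j=1}^{n-1} C_j \, \mathbf{q}_{w_{i-j}}, \qquad
P(w_i = w \mid w_{i-n+1}, \dots, w_{i-1}) = \frac{\exp\left(\mathbf{p}^{\top}\mathbf{r}_w + b_w\right)}{\sum_{v \in V} \exp\left(\mathbf{p}^{\top}\mathbf{r}_v + b_v\right)}
$$

The model is bilinear in the context and target vectors, and the denominator sums over the entire vocabulary $V$.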

SLIDE 23

ADD MORPHEME VECTORS INSIDE LM

LBL++

“colorless green ideas sleep furiously .”
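A minimal sketch of plugging composed vectors into a log-bilinear LM, under stated assumptions: a trigram model, a tiny hypothetical vocabulary with hand-written factor lists, random vectors, full context matrices, and (as a choice of this sketch, not something the slide states) composition applied to both the context vectors `Q` and the target vectors `R`. It is an illustration of the idea, not the authors' implementation.

```python
import math
import random

DIM = 3
ORDER = 3  # trigram model: the two previous words form the history
rng = random.Random(2)

# Hypothetical vocabulary, each word listed with its factors
# (identity vector + tagged morphemes).
MORPHS = {
    "of": ["of_id", "of_stem"],
    "king": ["king_id", "king_stem"],
    "kings": ["kings_id", "king_stem", "s_suf"],
    "unkingly": ["unkingly_id", "un_pre", "king_stem", "ly_suf"],
    "conduct": ["conduct_id", "conduct_stem"],
}
FACTORS = sorted({f for fs in MORPHS.values() for f in fs})


def rand_vec():
    return [rng.gauss(0.0, 0.5) for _ in range(DIM)]


Q = {f: rand_vec() for f in FACTORS}  # context-side factor vectors
R = {f: rand_vec() for f in FACTORS}  # target-side factor vectors
BIAS = {w: 0.0 for w in MORPHS}
# C[j] is the DIM x DIM matrix for the word j+1 positions back.
C = [[rand_vec() for _ in range(DIM)] for _ in range(ORDER - 1)]


def compose(word, table):
    """Composed vector: sum of the word's factor vectors from `table`."""
    vec = [0.0] * DIM
    for f in MORPHS[word]:
        vec = [a + b for a, b in zip(vec, table[f])]
    return vec


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def predict(history):
    """p = sum_j C_j q_{w_{i-j}}, with each q composed from factors."""
    assert len(history) == ORDER - 1
    p = [0.0] * DIM
    # reversed(): the most recent word uses C[0], the one before it C[1].
    for j, word in enumerate(reversed(history)):
        p = [a + b for a, b in zip(p, matvec(C[j], compose(word, Q)))]
    return p


def next_word_distribution(history):
    """Softmax over the full vocabulary of p . r_w + b_w."""
    p = predict(history)
    scores = {w: sum(a * b for a, b in zip(p, compose(w, R))) + BIAS[w] for w in MORPHS}
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


if __name__ == "__main__":
    for word, prob in next_word_distribution(["of", "unkingly"]).items():
        print(f"{word}: {prob:.3f}")
```

Because king, kings and unkingly share the `king_stem` vector, anything learned about one of them also moves the others.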

SLIDE 25

COMPUTATIONAL EFFICIENCY

Problem: each probability query requires normalisation over the whole vocabulary.

◮ O(vocab size)
◮ rich morphology ⇒ large vocabulary

SOLUTION: DECOMPOSE MODEL USING WORD CLASSES

$P(\text{word} \mid \text{history}) = P(\text{class}(\text{word}) \mid \text{history}) \times P(\text{word} \mid \text{class}(\text{word}), \text{history})$

◮ use unsupervised Brown clustering

◮ each LM query becomes $2 \times O(\sqrt{\text{vocab size}})$ ⇒ fast enough for MT decoding
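A minimal sketch of the class decomposition, assuming a hypothetical hand-made assignment of nine words to three classes (a stand-in for Brown clusters) and toy scoring functions for one fixed history. Only the class scores and the scores of the words in one class are normalised, yet the result is still a proper distribution over the whole vocabulary.

```python
import math

# Hypothetical assignment of 9 words to 3 classes, standing in for
# Brown clusters: roughly sqrt(V) classes of roughly sqrt(V) words each.
CLASS_MEMBERS = {
    "c0": ["king", "kings", "queen"],
    "c1": ["abdicated", "ruled", "reigned"],
    "c2": ["unkingly", "royal", "regal"],
}
CLASS_OF = {w: c for c, words in CLASS_MEMBERS.items() for w in words}


def softmax(scores):
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def class_factored_prob(word, class_score, word_score):
    """P(word | h) = P(class(word) | h) * P(word | class(word), h).

    class_score(c) and word_score(w) are unnormalised scores for a fixed
    history h. Only the classes and the members of one class are scored.
    """
    c = CLASS_OF[word]
    p_class = softmax({k: class_score(k) for k in CLASS_MEMBERS})[c]
    p_word = softmax({w: word_score(w) for w in CLASS_MEMBERS[c]})[word]
    return p_class * p_word


def toy_class_score(c):
    """Hypothetical class scores for one fixed history."""
    return {"c0": 0.5, "c1": 1.0, "c2": -0.2}[c]


def toy_word_score(w):
    """Hypothetical word scores for the same history."""
    return 0.1 * len(w)


if __name__ == "__main__":
    total = sum(class_factored_prob(w, toy_class_score, toy_word_score) for w in CLASS_OF)
    print(round(total, 6))  # 1.0: still a proper distribution over the vocabulary
```

For a hypothetical vocabulary of 10⁶ words split into 10³ classes of 10³ words, one query scores about 10³ + 10³ = 2,000 items instead of 10⁶, which is the 2 × O(√vocab size) figure on the slide.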

SLIDE 27

EVALUATION OVERVIEW

Setup

◮ 4-gram models
◮ Czech, English, French, German, Spanish, Russian
◮ train on 20–50M tokens
◮ large vocabularies (exclude 5% of singletons)

Three evaluation contexts:

◮ Perplexity on test data
◮ Word similarity rating
◮ Machine translation

SLIDE 29

PERPLEXITY IMPROVEMENTS BY LANGUAGE

CLBL → CLBL++ test perplexity by language (the slide's bars show the improvement in %)

Language  CLBL  CLBL++
CS         683     643
DE         422     404
EN         281     273
ES         207     203
FR         232     227
RU         313     300
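The bars on this slide express the change as a percentage. Assuming that percentage is the perplexity reduction relative to the CLBL baseline (the slide does not state the formula), it can be recomputed from the perplexities on the slide with a few lines of arithmetic:

```python
# Perplexities from the slide, as (CLBL, CLBL++) pairs.
PERPLEXITY = {
    "CS": (683, 643), "DE": (422, 404), "EN": (281, 273),
    "ES": (207, 203), "FR": (232, 227), "RU": (313, 300),
}


def relative_reduction(before: float, after: float) -> float:
    """Perplexity reduction as a percentage of the baseline perplexity."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for lang, (clbl, clbl_pp) in PERPLEXITY.items():
        print(f"{lang}: {relative_reduction(clbl, clbl_pp):.1f}%")
    # CS 5.9%, DE 4.3%, EN 2.8%, ES 1.9%, FR 2.2%, RU 4.2%
```

Under this reading every value lies within the slide's 0 to 6 % axis, and the largest relative gain is for Czech.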

SLIDE 30

PERPLEXITY IMPROVEMENTS ON GERMAN

CLBL → CLBL++ (BREAK-DOWN BY TOKEN FREQUENCY)

[Figure: perplexity improvement in % (axis 5 to 20) per bin of test-token frequency: <10¹, <10², <10³, <10⁴, <10⁵, <10⁶, <10⁷]
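A break-down like this can be sketched by grouping test tokens by frequency and computing perplexity separately within each group. This assumes per-token natural-log probabilities are available, that each token's frequency is its training-data count (an assumption; the slide only says "test token frequency"), and that the bins are disjoint decades. The function names are hypothetical.

```python
import math
from collections import defaultdict


def frequency_bin(count):
    """Smallest k >= 1 with count < 10**k, i.e. the bin labelled '<10^k'.

    Assumes disjoint bins: [0, 10), [10, 100), [100, 1000), ...
    """
    k = 1
    while count >= 10 ** k:
        k += 1
    return k


def perplexity_by_bin(tokens, log_probs, counts):
    """exp(-mean natural-log probability) over the test tokens in each bin.

    counts maps a word to its frequency (hypothetically, its training count);
    unseen words count as 0 and land in the lowest bin.
    """
    total = defaultdict(float)
    n = defaultdict(int)
    for token, logp in zip(tokens, log_probs):
        k = frequency_bin(counts.get(token, 0))
        total[k] += logp
        n[k] += 1
    return {k: math.exp(-total[k] / n[k]) for k in sorted(n)}


if __name__ == "__main__":
    tokens = ["unkingly", "the", "king"]
    log_probs = [math.log(0.01), math.log(0.5), math.log(0.1)]
    counts = {"the": 2_000_000, "king": 3_000}
    print(perplexity_by_bin(tokens, log_probs, counts))
```

Running this for both models on the same test set and comparing bin by bin gives a per-bin relative improvement of the kind plotted on the slide.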

SLIDE 35

WORD SIMILARITY RATING

[Figure: Spearman ρ × 100 (axis 10 to 50) on EN (RW), EN (WS353), DE (Gur), DE (ZG) and FR (RG)]

◮ CLBL word vectors; unknown test word ⇒ generic $\overrightarrow{\text{unknown}}$
◮ CLBL++ composed vectors; unknown test word ⇒ generic $\overrightarrow{\text{unknown}}$
◮ CLBL++ composed vectors; unknown test word ⇒ known $\overrightarrow{\text{morphemes}}$
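The evaluation behind this chart can be sketched under stated assumptions: a model's similarity for a word pair is the cosine of the two word vectors, and the score is Spearman's ρ between those similarities and the human ratings (multiplied by 100, as on the slide). The hypothetical helper `vector_for` switches between the two back-off strategies for unknown words; the segmentation function passed to it is also an assumption.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined if either list is constant


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def vector_for(word, word_vecs, morph_vecs, segment, unk, use_morphemes):
    """Vector for one test word.

    word_vecs holds the model's vectors for known words (plain or composed).
    An unknown word gets the sum of its known morpheme vectors when
    use_morphemes is set and any exist; otherwise the generic unk vector.
    """
    if word in word_vecs:
        return word_vecs[word]
    if use_morphemes:
        known = [morph_vecs[m] for m in segment(word) if m in morph_vecs]
        if known:
            return [sum(parts) for parts in zip(*known)]
    return unk
```

A dataset score would then be `100 * spearman(human_ratings, model_sims)`, with `model_sims` holding the cosine for each pair. Note that with the generic fallback, two different unknown words receive identical vectors and therefore a cosine of 1.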

SLIDE 40

MACHINE TRANSLATION EVALUATION

How to use the LM?

◮ rescore n-best list < rescore lattice < decoder feature

Hierarchical phrase-based decoder (cdec)

◮ Baseline: Kneser-Ney LM feature
◮ Test: Kneser-Ney LM feature + CLBL feature

CLBL speed-up from:

◮ class decomposition
◮ caching normalisers on the fly
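A minimal sketch of caching normalisers on the fly, assuming the normaliser is keyed by the n-gram history and that the LM exposes an unnormalised scoring function. `CachedNormaliser` and `toy_score` are hypothetical names; the slide does not describe the authors' caching scheme in detail.

```python
import math


class CachedNormaliser:
    """Log-probabilities from unnormalised scores, with log Z(h) memoised per history h.

    A decoder typically asks for many candidate next words after the same
    history, so each O(|V|) normalisation sum is computed only once per history.
    """

    def __init__(self, score, vocab):
        self.score = score          # score(word, history) -> unnormalised log score
        self.vocab = list(vocab)
        self.cache = {}             # history tuple -> log Z(history)
        self.misses = 0             # number of normalisers actually computed

    def log_z(self, history):
        key = tuple(history)
        if key not in self.cache:
            self.misses += 1
            scores = [self.score(w, key) for w in self.vocab]
            top = max(scores)
            self.cache[key] = top + math.log(sum(math.exp(s - top) for s in scores))
        return self.cache[key]

    def log_prob(self, word, history):
        return self.score(word, tuple(history)) - self.log_z(history)


def toy_score(word, history):
    """Hypothetical scoring function standing in for the LM."""
    return 0.1 * len(word) + 0.05 * len(history)


if __name__ == "__main__":
    vocab = ["the", "king", "abdicated", "unkingly", "."]
    lm = CachedNormaliser(toy_score, vocab)
    history = ("after", "years", "of")
    total = sum(math.exp(lm.log_prob(w, history)) for w in vocab)
    print(round(total, 6), lm.misses)  # 1.0 1
```

Combined with the class decomposition, the same memoisation could be applied to the class normaliser (keyed by history) and to each within-class normaliser (keyed by history and class); this sketch caches only a single full-vocabulary normaliser.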

SLIDE 43

TRANSLATION QUALITY (BLEU)

FOR TRANSLATING INTO GIVEN LANGUAGE

[Figure: BLEU (higher is better; axis 12 to 26) for translating into CS, DE, ES, FR, RU and EN, comparing Kneser-Ney alone, Kneser-Ney with CLBL, and Kneser-Ney with CLBL++]

SLIDE 44

QUALITATIVE EVALUATION: ENGLISH AFFIX VECTORS

SLIDE 46

SUMMARY

Simple, scalable, unsupervised method for integrating morphology into a vector-based LM

◮ improvements in three evaluation settings
◮ translation with a normalised neural LM (NLM) works

Software released shortly

www.clg.ox.ac.uk/resources

{Jan.Botha,Phil.Blunsom}@cs.ox.ac.uk