COMPOSITIONAL MORPHOLOGY FOR WORD REPRESENTATIONS AND LANGUAGE MODELLING
Jan Botha, Phil Blunsom
ICML 2014, Beijing
MOTIVATION PROPOSED METHOD EXPERIMENTS
MOTIVATING EXAMPLE
WHAT WE SEE
The king finally abdicated after years of unkingly conduct.
Wait what – unkingly?
unkingly /ʌnˈkɪŋli/: a word you have probably never seen, but still understand
⇒ compositional morphology in action
WHAT OUR MODELS SEE (MOSTLY)
10 2 95 529 11 88 21 50 74 239
MOTIVATING EXAMPLE 2
Other languages display still more variation
CZECH CONJUGATION
čistit (to clean): čistím, čistíš, čistí, čistíme, čistíte, čistil, čištěn, čisti, čistěte, čistěme
TURKISH PRODUCTIVE DERIVATION
Avrupa (Europe)
Avrupalı (of Europe)
Avrupalılaş (become of Europe)
Avrupalılaştır (to Europeanise)
Avrupalılaştırama (be unable to Europeanise)
Avrupalılaştıramadık (we were unable to Europeanise)
. . .
⇒ we should model morphemes!
REPRESENTING WORDS
◮ Discrete set?
{a, aardvark, . . . , account, accounted, accounting, . . . }
◮ Vector space? (figure: the words a, aardvark, account and accounted as points in a plane with axes x1 and x2)
EXTRACT FROM COLLOBERT & WESTON EMBEDDINGS
MORPHEME VECTORS
Existing word vectors already capture some morphology.
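◮ $\overrightarrow{\text{banks}} - \overrightarrow{\text{bank}} \approx \overrightarrow{\text{kings}} - \overrightarrow{\text{king}} \approx \overrightarrow{\text{queens}} - \overrightarrow{\text{queen}}$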
(Mikolov et al. 2013)
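Logical extension:
◮ $\overrightarrow{\text{kings}} \approx \overrightarrow{\text{king}} + \overrightarrow{\text{-s}}$
◮ $\overrightarrow{\text{unkingly}} \approx \overrightarrow{\text{un-}} + \overrightarrow{\text{king}} + \overrightarrow{\text{-ly}}$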
HOW TO...
◮ obtain morpheme vectors
◮ compose morpheme vectors
◮ do it all within a language model usable in an MT decoder
MORPHOLOGICAL COMPOSITION AS ADDITION
Literally, word = sum of its parts?
Problems:
◮ bag of morphemes: $\overrightarrow{\text{hang}} + \overrightarrow{\text{over}} = \overrightarrow{\text{over}} + \overrightarrow{\text{hang}}$
◮ non-compositionality: $\overrightarrow{\text{greenhouse}} \neq \overrightarrow{\text{green}} + \overrightarrow{\text{house}}$
PRAGMATIC SOLUTION
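include word identity as component too:
$\overrightarrow{\text{greenhouse}} \equiv \overrightarrow{\text{greenhouse}}_{\text{id}} + \overrightarrow{\text{green}}_{\text{stem}} + \overrightarrow{\text{house}}_{\text{stem}}$
$\overrightarrow{\text{unkingly}} \equiv \overrightarrow{\text{unkingly}}_{\text{id}} + \overrightarrow{\text{un}}_{\text{pre}} + \overrightarrow{\text{king}}_{\text{stem}} + \overrightarrow{\text{ly}}_{\text{suf}}$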
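A minimal sketch of additive composition with a word-identity component, using made-up 3-dimensional vectors. The `compose` helper, the `(surface form, role)` keys and the choice to let a missing identity vector contribute zero are assumptions for illustration, not details taken from the slides.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Factor = Tuple[str, str]  # (surface form, role); roles used here: "id", "pre", "stem", "suf"


def compose(factors: List[Factor], table: Dict[Factor, Vector], dim: int) -> Vector:
    """Sum the vectors of a word's factors; factors missing from the table add nothing."""
    total = [0.0] * dim
    for factor in factors:
        vec = table.get(factor)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy factor vectors with made-up values (hypothetical, for illustration only).
table: Dict[Factor, Vector] = {
    ("hangover", "id"): [0.5, 0.0, 0.0],
    ("overhang", "id"): [0.0, 0.5, 0.0],
    ("hang", "stem"): [1.0, 0.0, 1.0],
    ("over", "stem"): [0.0, 1.0, 1.0],
    ("un", "pre"): [-1.0, 0.5, 0.0],
    ("king", "stem"): [2.0, 1.0, 0.5],
    ("ly", "suf"): [0.0, -0.5, 1.0],
}

# Same morphemes in a different order: only the identity component tells them apart.
hangover = compose([("hangover", "id"), ("hang", "stem"), ("over", "stem")], table, 3)
overhang = compose([("overhang", "id"), ("over", "stem"), ("hang", "stem")], table, 3)

# "unkingly" has no identity vector in the table, so only its morphemes contribute.
unkingly = compose(
    [("unkingly", "id"), ("un", "pre"), ("king", "stem"), ("ly", "suf")], table, 3
)
```

In this toy setting the morpheme sums of "hangover" and "overhang" coincide, so the identity vectors are what separate the two words, while an unseen word such as "unkingly" still receives a vector from its known morphemes.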
SIMPLEST VECTOR-BASED PROBABILISTIC LM
LBL (Log-bilinear model)
(Mnih & Hinton, 2007; Mnih & Teh, 2012)
“colorless green ideas sleep furiously .”
ADD MORPHEME VECTORS INSIDE LM
LBL++
“colorless green ideas sleep furiously .”
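A rough sketch of a log-bilinear prediction step in which every word vector is a sum of factor vectors. It follows the usual log-bilinear recipe (position-weighted sum of context vectors, dot-product scores, softmax), but the element-wise position weights, the `FACTORS` segmentation, all names and all numbers are assumptions made for illustration, not the presenters' implementation.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical segmentation: each word maps to its own identity plus its morphemes.
FACTORS: Dict[str, List[str]] = {
    "green": ["id:green", "green"],
    "ideas": ["id:ideas", "idea", "-s"],
    "sleep": ["id:sleep", "sleep"],
}


def compose(word: str, factor_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Word vector = sum of its factor vectors; factors without a vector add zero."""
    total = [0.0] * dim
    for factor in FACTORS[word]:
        vec = factor_vecs.get(factor, [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    return total


def lbl_distribution(
    context: List[str],
    context_factor_vecs: Dict[str, Vector],
    target_factor_vecs: Dict[str, Vector],
    position_weights: List[Vector],
    bias: Dict[str, float],
    dim: int,
) -> Dict[str, float]:
    """Return P(word | context) for every word in the toy vocabulary."""
    # Predicted vector: position-weighted (element-wise) sum of composed context vectors.
    predicted = [0.0] * dim
    for weights, word in zip(position_weights, context):
        q = compose(word, context_factor_vecs, dim)
        predicted = [p + c * x for p, c, x in zip(predicted, weights, q)]
    # Score each candidate word against the prediction, then normalise with a softmax.
    scores = {}
    for word in FACTORS:
        r = compose(word, target_factor_vecs, dim)
        scores[word] = sum(p * x for p, x in zip(predicted, r)) + bias[word]
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


# Made-up 2-dimensional parameters.
context_factor_vecs = {"green": [1.0, 0.0], "idea": [0.0, 0.5], "-s": [0.0, 0.5]}
target_factor_vecs = {"sleep": [1.0, 1.0], "green": [-1.0, 0.0], "idea": [0.0, -1.0]}
position_weights = [[1.0, 1.0], [1.0, 1.0]]
bias = {"green": 0.0, "ideas": 0.0, "sleep": 0.0}

probs = lbl_distribution(
    ["green", "ideas"], context_factor_vecs, target_factor_vecs, position_weights, bias, dim=2
)
```

Dropping the morpheme entries from `FACTORS` and keeping only the identity factors would reduce this to a plain word-vector model, which is one way to read the step from LBL to LBL++.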
COMPUTATIONAL EFFICIENCY
Problem: Each probability query requires normalisation over vocabulary.
◮ O(vocab size)
◮ rich morphology ⇒ large vocabulary
SOLUTION: DECOMPOSE MODEL USING WORD CLASSES
$P(\text{word} \mid \text{history}) = P(\text{class}(\text{word}) \mid \text{history}) \times P(\text{word} \mid \text{class}(\text{word}), \text{history})$
◮ use unsupervised Brown-clustering
◮ each LM query becomes $2 \times O(\sqrt{\text{vocab size}})$ ⇒ fast enough for MT-decoding
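The class decomposition above can be sketched as two small softmaxes instead of one large one. The raw scores below stand in for model outputs given some history; the class names, words, scores and helper names are all hypothetical, and the classes are hand-picked rather than produced by Brown clustering.

```python
import math
from typing import Dict, List


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalise raw scores into a probability distribution."""
    top = max(scores.values())
    exps = {key: math.exp(value - top) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def class_factored_prob(
    word: str,
    word_class: Dict[str, str],
    class_members: Dict[str, List[str]],
    class_scores: Dict[str, float],
    word_scores: Dict[str, float],
) -> float:
    """P(word | h) = P(class(word) | h) * P(word | class(word), h)."""
    cls = word_class[word]
    # First softmax: over the classes only.
    p_class = softmax(class_scores)[cls]
    # Second softmax: over the words inside the chosen class only.
    in_class = {w: word_scores[w] for w in class_members[cls]}
    p_word = softmax(in_class)[word]
    return p_class * p_word


def items_scored_per_query(vocab_size: int) -> int:
    """Approximate cost with about sqrt(V) classes of about sqrt(V) words each."""
    return 2 * math.isqrt(vocab_size)


# Toy vocabulary split into two hand-picked classes (hypothetical).
class_members = {"noun": ["king", "queen"], "verb": ["abdicated", "reigned"]}
word_class = {w: c for c, words in class_members.items() for w in words}
class_scores = {"noun": 1.0, "verb": 0.0}
word_scores = {"king": 0.5, "queen": -0.5, "abdicated": 2.0, "reigned": 0.0}

total = sum(
    class_factored_prob(w, word_class, class_members, class_scores, word_scores)
    for w in word_class
)
```

Because each word belongs to exactly one class, the factored probabilities still sum to one over the vocabulary, while a single query only normalises over the classes and the members of one class.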
EVALUATION OVERVIEW
Setup
◮ 4-gram models
◮ Czech, English, French, German, Spanish, Russian
◮ train on 20–50m tokens
◮ large vocabularies (exclude 5% of singletons)
Three evaluation contexts:
◮ Perplexity on test data
◮ Word similarity rating
◮ Machine translation
PERPLEXITY IMPROVEMENTS BY LANGUAGE
CLBL→CLBL++
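(bar chart: improvement in %, axis ticks 2, 4, 6, one bar per language, labelled with perplexity before→after)
CS: 683→643
DE: 422→404
EN: 281→273
ES: 207→203
FR: 232→227
RU: 313→300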
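As a worked check of the figures above, the sketch below computes the relative perplexity reduction for each language from the before→after pairs. Treating the chart's percentage axis as relative reduction is an assumption; the helper name is hypothetical.

```python
from typing import Dict, Tuple

# Perplexity before (CLBL) and after (CLBL++), as listed on the slide.
perplexity: Dict[str, Tuple[int, int]] = {
    "CS": (683, 643),
    "DE": (422, 404),
    "EN": (281, 273),
    "ES": (207, 203),
    "FR": (232, 227),
    "RU": (313, 300),
}


def relative_reduction(before: float, after: float) -> float:
    """Perplexity reduction as a percentage of the starting value."""
    return 100.0 * (before - after) / before


reductions = {
    lang: round(relative_reduction(before, after), 1)
    for lang, (before, after) in perplexity.items()
}
```

Under that reading, Czech shows the largest reduction (about 5.9%) and Spanish the smallest (about 1.9%), which fits the 2 to 6 range of the chart's axis.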
PERPLEXITY IMPROVEMENTS ON GERMAN
CLBL→CLBL++ (BREAK-DOWN BY TOKEN FREQUENCY)
(bar chart: improvement in %, axis ticks 5 to 20, by bins of test token frequency: <10^1, <10^2, <10^3, <10^4, <10^5, <10^6, <10^7)
WORD SIMILARITY RATING
(bar chart: Spearman ρ × 100, axis ticks 10 to 50, on the datasets EN(RW), EN(WS353), DE(Gur), DE(ZG), FR(RG); three series)
CLBL word vectors; unknown test word ⇒ generic $\overrightarrow{\text{unknown}}$
CLBL++ composed vectors; unknown test word ⇒ generic $\overrightarrow{\text{unknown}}$
CLBL++ composed vectors; unknown test word ⇒ known $\overrightarrow{\text{morphemes}}$
MACHINE TRANSLATION EVALUATION
How to use the LM?
◮ rescore n-best list < rescore lattice < decoder feature
Hierarchical phrase-based decoder (cdec)
◮ Baseline: Kneser-Ney LM feature
◮ Test: Kneser-Ney LM feature + CLBL feature
CLBL speed-up from:
◮ class decomposition
◮ cache normalisers on-the-fly
TRANSLATION QUALITY (BLEU)
FOR TRANSLATING INTO GIVEN LANGUAGE
(bar chart: BLEU, axis ticks 12 to 26, higher is better, per target language CS, DE, ES, FR, RU, EN; three series: Kneser-Ney, with CLBL, with CLBL++)