Language Modeling for Codeswitching
HILA GONEN PHD STUDENT AT YOAV GOLDBERG’S LAB BAR ILAN UNIVERSITY
Outline: Background (Codeswitching; Language Modeling and Perplexity), New Evaluation Method (Definition; Creation of data), …
Code-switching: "the alternation of two languages within a single discourse, sentence or constituent" (Poplack, 1980)

English-Spanish: "that es su tío that has lived with him like I don't know how like ya several years..."
("that's his uncle who has lived with him like, I don't know how, like several years already...")

French-Arabic: "mais les filles ta3na ysedkou n'import quoi ana hada face book jamais cheftou khlah kalbi"
("Our girls believe anything, I have never seen this Facebook before.")
A language model assigns, given a prefix of words, a probability for each word in the vocabulary to follow.
Example, next-word distribution after the prefix "I love":
  chocolate     0.6
  cheesecakes   0.1
  winter        0.08
  strawberries  0.06
  me            0.04
  …
Evaluating a language model through a full ASR system has drawbacks: it conflates LM performance with other aspects of the ASR system, and it is hard to replicate the evaluation procedure and fairly compare results.
We want to evaluate the language model independently of an ASR system.
Given a language model M and a test sequence of words w_1, …, w_n, the perplexity of M is

  PP(M) = P(w_1, …, w_n)^(-1/n)

where P(w_1, …, w_n) is the probability the model assigns to the sequence. The lower the perplexity, the better the LM is.
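As a concrete illustration, here is a minimal perplexity computation in plain Python; the uniform toy model below is an assumption for illustration, not from the slides:

```python
import math

def perplexity(log_probs):
    """Perplexity of a test sequence from per-token log-probabilities.

    PP = exp(-(1/n) * sum_i log P(w_i | w_1..w_{i-1})); lower is better.
    `log_probs` holds one natural-log probability per token.
    """
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Toy model that assigns probability 0.25 to every token:
lp = [math.log(0.25)] * 4
print(perplexity(lp))  # ≈ 4.0, as if choosing uniformly among 4 words
```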
Perplexity has known drawbacks: it does not always correlate with word-error-rate (WER) scores (Huang et al. 2018), and a model can achieve low perplexity while still assigning high probability to implausible sentences.
Seemingly small modeling choices (e.g. the vocabulary) result in a significant change in perplexity – why?
We do not want to evaluate the language model with perplexity
Gold data: transcribed speech of speakers in Florida, all of whom are bilingual in English and Spanish.
How do we obtain similar-sounding sentences to build the sets? We create them! For each gold sentence, we create alternative sentences of all types:
We do that using finite state transducers (FSTs) – to be explained
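The evaluation then reduces to checking, per set, whether the model ranks the gold sentence above all of its alternatives. A minimal sketch of that accuracy metric; the `score` function and toy sentences below are stand-ins for a real language model and real data:

```python
def evaluation_accuracy(sets, score):
    """Fraction of sets in which the model scores the gold sentence highest.

    Each set is (gold_sentence, [alternative_sentences]); `score` returns a
    model score (e.g. sentence log-probability), where higher is better.
    """
    correct = 0
    for gold, alternatives in sets:
        if all(score(gold) > score(alt) for alt in alternatives):
            correct += 1
    return correct / len(sets)

# Toy scorer that prefers shorter sentences (a stand-in for a real LM):
toy_score = lambda s: -len(s.split())
sets = [("smelly gato", ["smell y que to", "smelly que to"]),
        ("I love chocolate", ["I love choco late"])]
print(evaluation_accuracy(sets, toy_score))  # → 1.0
```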
An FST transduces a string x into a string y if there is a path from an initial state to a final state with x as its input labels and y as its output labels.
Formally, an FST is a 6-tuple (Q, Σ, Γ, I, F, Δ) such that: Q is a finite set of states, Σ is the input alphabet, Γ is the output alphabet, I ⊆ Q is the set of initial states, F ⊆ Q is the set of final states, and Δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q is the set of transitions.
Example: a transducer with the arc sad:happy (and identity arcs for every other word) maps
  "The girl is sad" → "The girl is happy"
  "This is a sad story" → "This is a happy story"
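The sad:happy example can be sketched as a single-state, word-level transducer in plain Python; this is a toy stand-in for a real FST library, not the actual implementation:

```python
def apply_fst(arcs, words):
    """Apply a single-state word-level transducer to a sentence.

    `arcs` maps input labels to output labels; any word without an arc
    passes through unchanged (an identity transition).
    """
    return [arcs.get(w, w) for w in words]

arcs = {"sad": "happy"}  # the arc sad:happy from the example
print(" ".join(apply_fst(arcs, "This is a sad story".split())))
# → "This is a happy story"
```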
We implement the creation of the dataset with Carmel, an FST toolkit, composing three transducers:
1. a sentence-to-phonemes FST
2. a phoneme-changing FST
3. a phonemes-to-sentence FST (the inverse of 1).
We use pronunciation dictionaries for both languages, e.g.:
  book → B UH K, cat → K AE T (English)
  libro → L IY B R OW, gato → G AA T OW (Spanish)
We allow minor changes in the phoneme sequence to increase flexibility:
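One simple way to realize such minor changes is to substitute a single phoneme with a confusable one. A sketch of that idea; the confusion pairs below are assumed for illustration only:

```python
def phoneme_variants(seq, confusable):
    """All sequences reachable by substituting at most one phoneme with a
    confusable one; a toy stand-in for the phoneme-changing FST."""
    variants = [list(seq)]  # the unchanged sequence is always allowed
    for i, p in enumerate(seq):
        for q in confusable.get(p, []):
            variants.append(list(seq[:i]) + [q] + list(seq[i + 1:]))
    return variants

confusable = {"G": ["K"], "OW": ["UW"]}  # assumed confusion pairs
for v in phoneme_variants(["G", "AA", "T", "OW"], confusable):
    print(" ".join(v))
```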
For the inverse direction we use the same pronunciation dictionaries, mapping the phoneme sequence back into words, e.g. "smell y que to" and "smelly que to".
To favor frequent words over infrequent ones, we add unigram probabilities to the edges of the transducer
Example of the full composition (sentence-to-phonemes FST → changing-phonemes FST → phonemes-to-sentence FST):
  Gold sentence: smelly gato (smelly:EN, gato:SP)
  Phoneme sequence: S M EH L IY G AA T OW
  Changed phoneme sequence: S M EH L IY K EY T UW (G→K, AA→EY, OW→UW)
  Alternative sequences: "smell y que to" (smell:EN, y:SP, que:SP, to:EN) and "smelly que to"
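The effect of unigram weights on the transducer's edges can be illustrated by scoring whole candidate word sequences and keeping the best path. A sketch with made-up unigram probabilities (all values here are assumptions for illustration):

```python
import math

def best_by_unigram(candidates, unigram_prob):
    """Pick the candidate word sequence with the highest sum of unigram
    log-probabilities; a stand-in for unigram weights on FST edges,
    where frequent words make a path cheaper."""
    def score(words):
        # Unknown words get a tiny floor probability instead of zero.
        return sum(math.log(unigram_prob.get(w, 1e-8)) for w in words)
    return max(candidates, key=score)

# Assumed unigram probabilities, for illustration only:
unigram = {"smelly": 0.002, "smell": 0.004, "y": 0.03, "que": 0.05, "to": 0.06}
print(best_by_unigram([["smell", "y", "que", "to"],
                       ["smelly", "que", "to"]], unigram))
# → ['smelly', 'que', 'to']
```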
Implementation details:
For each gold sentence we create a code-switched alternative and monolingual alternatives in both languages.
The alternatives are required to differ from the gold sentence and from each other, using some heuristics (e.g. more words from the less dominant language).
Each set is labeled code-switched/monolingual according to its gold sentence.
New evaluation method that enables comparison of a wide range of models:
Creating the dataset using FSTs makes it applicable to any language or language pair.
Standard architecture: a recurrent neural language model with dropout on the recurrent layers (Gal and Ghahramani, 2016).
Hyperparameters matter a lot (less so when only changing the training data).
Data: Codeswitching corpus only (Bangor Miami Corpus)
How can we improve over the baseline?
[chart: Baseline accuracy]
Additional monolingual data: subtitles of TV series (Tiedemann, 2009).
How do we efficiently incorporate monolingual data when training a CS LM?
We train a language model on both the monolingual and the CS data; the CS data is used at the end of each epoch – ALL:CS-Last.
[chart: Baseline vs. this model]
We train a language model on both the monolingual and the CS data; all sentences are shuffled together – ALL:Shuffled.
[chart: Baseline vs. this model]
We first pretrain the model on the monolingual sentences of both languages, with full sharing of parameters, and then use the codeswitched data to further train the model – FINETUNED-LM.
[chart: Baseline vs. this model]
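The training schemes above differ only in how monolingual and CS sentences are ordered during training. A sketch of the per-epoch data order; the scheme names are from the slides, but the implementation is an illustrative assumption:

```python
import random

def batch_order(mono, cs, scheme, seed=0):
    """Order of training sentences for one pass under each data scheme.

    ALL:CS-Last  - monolingual data first, CS data at the end of the epoch.
    ALL:Shuffled - all sentences shuffled together.
    FINETUNED    - the CS-only pass that follows monolingual pretraining
                   (the pretraining epochs use `mono` alone).
    """
    if scheme == "ALL:CS-Last":
        return mono + cs
    if scheme == "ALL:Shuffled":
        mixed = mono + cs
        random.Random(seed).shuffle(mixed)
        return mixed
    if scheme == "FINETUNED":
        return cs
    raise ValueError(scheme)

mono = ["en1", "en2", "sp1"]
cs = ["cs1", "cs2"]
print(batch_order(mono, cs, "ALL:CS-Last"))
# → ['en1', 'en2', 'sp1', 'cs1', 'cs2']
```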
Let’s look at the results more carefully: What is our accuracy when the gold sentence is CS and when it is monolingual?
Most of the improvement stems from sets with monolingual sentences as gold. This is not surprising! Recall that during the pretraining phase, the model is exposed to monolingual data only.
Can we do better than that?
Yes! Now we can help the model with negative examples we create, of all types: the model is trained to prefer the correct sentence over them.
Let s be the gold sentence and s_1, …, s_k be the other sentences in that set; the model is trained to score s above each s_i.
Done using the same technique: composition of FSTs.
We use the CS training set as our gold sentences.
We get a training set that is 10 times bigger than the original one.
Fine-tuning is done in the same manner: we create sentences from a random subset of the monolingual data.
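A natural discriminative objective over such a set treats it as a classification problem: a softmax over sentence scores, with cross-entropy on the gold sentence. This sketch shows one such loss; it is an illustrative choice, not necessarily the exact objective used in the work:

```python
import math

def discriminative_loss(scores, gold_index=0):
    """Cross-entropy over a set of sentence scores: pushes the gold
    sentence to outscore its generated negative alternatives.

    `scores` are model scores (e.g. sentence log-probabilities); a
    softmax over the set turns the ranking task into classification.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[gold_index] - log_z)

# Gold scores well above both negatives, so the loss is small:
print(round(discriminative_loss([2.0, -1.0, -0.5]), 3))  # → 0.124
```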
Dramatic improvements in cases where the gold sentence is CS:
As expected, we see that the less CS data we have, the more important it is to add monolingual data:
Examples of sentences the FINE-TUNED-DISCRIMINATIVE model identifies correctly while the FINETUNED-LM model does not
Examples of sentences that the FINE-TUNED-DISCRIMINATIVE model fails to identify
To conclude: we presented a new evaluation method for code-switching language models, based on sets of similar-sounding alternatives created with FSTs, together with training schemes that integrate monolingual data and a discriminative objective that substantially improves accuracy.