SLIDE 1

Language Modeling for Codeswitching

Hila Gonen, PhD student at Yoav Goldberg's lab, Bar-Ilan University

SLIDE 2

Outline

  • Background
    • Codeswitching
    • Language Modeling and Perplexity
  • New Evaluation Method
    • Definition
    • Creation of data set
  • Incorporation of Monolingual Data
  • Discriminative Training
  • Conclusion

SLIDE 3

Codeswitching

“the alternation of two languages within a single discourse, sentence or constituent”

(Poplack, 1980)

English – Spanish:
"that es su tío that has lived with him like I don't know how like ya several years..."
(that his uncle who has lived with him like, I don't know how, like several years already...)

French – Arabic:
"mais les filles ta3na ysedkou n'import quoi ana hada face book jamais cheftou khlah kalbi"
(Our girls believe anything, I have never seen this Facebook before.)

SLIDE 4

Codeswitching and its challenges

  • Very popular, mainly among bilingual communities
  • Extremely limited data
  • Non-standard platforms (spoken data, social media)
  • An important challenge for automatic speech recognition (ASR) systems

SLIDE 5

ASR with monolingual models

  • Output of IBM models: [example transcriptions shown as an image on the slide]
SLIDE 6

Language Modeling

  • The task of assigning a probability to a given sentence.
  • Useful for machine translation and for automatic speech recognition (ASR): the system produces several candidates and the LM scores them.
  • Given a word sequence, the LM estimates, for each word in the vocabulary, the probability that it follows the sequence.
  • Standard training: lots of unlabeled text is used; the training examples are all sentence prefixes, each paired with the word that follows (see the sketch below).
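
As a concrete illustration (not part of the original slides), here is a minimal sketch of both uses: estimating next-word probabilities and scoring a whole sentence via the chain rule. The toy bigram model and corpus are assumptions for the example only.

```python
# Minimal sketch: a toy bigram LM that estimates next-word probabilities
# and scores a sentence by the chain rule. Corpus and model are toys,
# not the LSTM models used in this talk.
from collections import Counter, defaultdict

corpus = [
    "<s> I love chocolate </s>".split(),
    "<s> I love winter </s>".split(),
    "<s> I love chocolate cheesecakes </s>".split(),
]

bigrams = defaultdict(Counter)
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[prev][word] += 1

def next_word_probs(prev):
    """P(w | prev) for every word observed after `prev` in training."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_prob(sent):
    """P(sentence) as the product of P(w_i | w_{i-1}) (chain rule)."""
    p = 1.0
    for prev, word in zip(sent, sent[1:]):
        p *= next_word_probs(prev).get(word, 0.0)
    return p

print(next_word_probs("love"))                             # chocolate 2/3, winter 1/3
print(sentence_prob("<s> I love chocolate </s>".split()))  # 1 * 1 * 2/3 * 1/2 = 1/3
```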

SLIDE 7

Language Modeling

Given the prefix "I love", the model estimates the probability of each word in the vocabulary to follow, e.g.:

  chocolate 0.6, cheesecakes 0.1, winter 0.08, strawberries 0.06, me 0.04, …

SLIDE 8

Automatic Speech Recognition (ASR)

  • Language models are traditionally used in the decoding process
  • The ASR system produces candidates for a given acoustic signal
  • The LM is used to rank the candidates: it needs to differentiate between "good" and "bad" sentences
  • ASR systems are hard to set up and tune, and are not standardized

SLIDE 9

Previous Work – LM for CS

  • Artificial CS data (Vu et al. 2012, Pratapa et al. 2018)
  • Syntactic constraints (Li and Fung 2012, 2014)
  • Factored LMs (Adel et al. 2013, 2014, 2015)
  • Most previous work depends on an ASR system (this conflates LM performance with other aspects of the ASR system, and makes the evaluation procedure hard to replicate and results hard to compare fairly)
  • No previous work compares to another

SLIDE 10

Previous Work – LM for CS


We want to evaluate the language model independently of an ASR system.

SLIDE 11

Perplexity (Standard LM Evaluation)

Given a language model $M$ and a test sequence of words $w_1, \dots, w_N$, the perplexity of $M$ over the sequence is defined as:

$$\mathrm{PP}(w_1, \dots, w_N) = P_M(w_1, \dots, w_N)^{-\frac{1}{N}}$$

where $P_M(w_1, \dots, w_N)$ is the probability the model assigns to the sequence.

The lower the perplexity, the better the LM is.
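
A small sketch of this definition, computed in log space for numerical stability (the per-token probabilities are made up for illustration):

```python
import math

def perplexity(token_probs):
    """PP = P(w_1..w_N)^(-1/N). `token_probs[i]` is the model's
    P(w_i | w_1..w_{i-1}); the product is taken in log space."""
    n = len(token_probs)
    log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(-log_prob / n)

print(perplexity([0.6, 0.5, 0.7, 0.4]))   # confident model: ~1.86
print(perplexity([0.1, 0.2, 0.1, 0.05]))  # unsure model: 10.0
```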

SLIDE 12

Shortcomings of Perplexity

  • Not always well aligned with the quality of a language model (Tran et al. 2018)
  • Better perplexities often do not translate to better word-error-rate (WER) scores (Huang et al. 2018)
  • Does not penalize assigning high probability to highly implausible sentences
  • Strong dependence on the vocabulary (e.g. word-based vs. char-based models)

SLIDE 13

Shortcomings of Perplexity - Example

  • We train a simple model on some data
  • We then measure the effect of the vocabulary:
    • We add words to the vocabulary
    • We train a model in the same manner, on the same data
    • The added words are never trained
  • This alone worsens perplexity by 2.37 points
  • Adding words, with no change in the training procedure, significantly changes perplexity – why? (see the sketch below)
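
A plausible mechanism, sketched with toy numbers (the talk's actual model and vocabulary sizes are not reproduced here): untrained vocabulary entries still receive softmax mass, so the probability of every true word drops and perplexity rises.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Trained logits for a 5-word toy vocabulary; the true next word is index 0.
trained = [3.0, 1.0, 0.5, 0.0, -1.0]
print(f"P(true word), original vocab: {softmax(trained)[0]:.3f}")  # ~0.778

# Add 100 words that are never trained (their logits stay at 0, the init).
# They still get softmax mass, so the true word's probability drops --
# and perplexity rises -- with no change to the training procedure.
extended = trained + [0.0] * 100
print(f"P(true word), extended vocab: {softmax(extended)[0]:.3f}")  # ~0.160
```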

SLIDE 14

Shortcomings of Perplexity - Example


We do not want to evaluate the language model with perplexity

SLIDE 15

New Evaluation Method

We seek a method that meets the following requirements:

  1. Prefers LMs that prioritize correct sentences.
  2. Does not depend on the vocabulary of the LM.
  3. Is independent of an ASR system.

SLIDE 16

New Evaluation Method

  • We suggest a method that simulates the task of an LM in ASR
  • The test data consists of sets of sentences:
    • A single gold sentence in each set
    • ~30 similar-sounding alternatives in each set
  • The LM should identify the gold sentence in each set
  • We use accuracy as our metric (see the sketch below)
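
In code, the evaluation reduces to an argmax per set. The sketch below assumes `score` is any sentence-level score, e.g. the LM's log-probability; the names are illustrative, not the authors' implementation.

```python
def ranking_accuracy(score, test_sets):
    """Each test set pairs one gold sentence with ~30 similar-sounding
    alternatives; the LM is credited when the gold gets the top score."""
    correct = 0
    for gold, alternatives in test_sets:
        best = max([gold] + alternatives, key=score)
        correct += (best == gold)
    return correct / len(test_sets)

# Toy usage with a stand-in scorer (a real LM's log-prob goes here):
toy_sets = [("smelly gato", ["smell y que to", "smelly que to"])]
print(ranking_accuracy(lambda s: -len(s.split()), toy_sets))  # 1.0
```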

SLIDE 17

New Evaluation Method

This method meets all of our requirements:

  1. Prefers LMs that prioritize correct sentences.
  2. Does not depend on the vocabulary of the LM.
  3. Is independent of an ASR system.

SLIDE 18

Codeswitching Corpus

Gold data:

  • Bangor Miami Corpus – transcripts of conversations by Spanish speakers in Florida, all of whom are bilingual in English
  • 45,621 sentences, split into train/dev/test
  • All three types: English, Spanish and CS sentences

Examples:

  • So I asked what was happening
  • Quieres un vaso de agua ? (Do you want a glass of water?)
  • Que by the way se vino ilegal (Who, by the way, came over illegally)

SLIDE 19

Our Created Data

How do we obtain similar-sounding sentences to build the sets? We create them!

For each gold sentence, we create alternative sentences of all types:

  • English sentences
  • Spanish sentences
  • CS sentences

We do this using finite state transducers (FSTs) – to be explained.

SLIDE 20

Examples from the Dataset

SLIDE 21

Dataset Statistics

SLIDE 22

Finite State Transducers (FSTs)

  • Similar to FSAs (finite state automata), but with an additional component of output (transitions have both input and output labels)
  • Capable of transforming one string into another
  • An FST can convert a string $x$ into a string $y$ if there is a path with $x$ as its input labels and $y$ as its output labels
  • Composition – FSTs can be composed
  • Weighted FSTs – transitions can be labelled with weights

SLIDE 23

Finite State Transducers (FSTs)

Formally, an FST is a 6-tuple $(Q, \Sigma, \Gamma, I, F, \delta)$ such that:

  • $Q$ – the set of states (finite)
  • $\Sigma$ – input alphabet (finite)
  • $\Gamma$ – output alphabet (finite)
  • $I$ – initial states (subset of $Q$)
  • $F$ – final states (subset of $Q$)
  • $\delta$ – transition function

SLIDE 24

FSTs – Toy Example

The transducer has one rewriting transition, sad:happy, and identity transitions (x:x) for every other word. It therefore rewrites "sad" as "happy" and copies everything else:

  The girl is sad → The girl is happy
  This is a sad story → This is a happy story
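
The toy transducer can be sketched in a few lines; a real FST also tracks states and epsilon transitions, which this word-level stand-in skips.

```python
# Word-level stand-in for the toy FST: one rewriting transition
# (sad:happy) plus identity transitions (x:x) for all other words.
REWRITES = {"sad": "happy"}

def toy_transduce(sentence):
    return " ".join(REWRITES.get(w, w) for w in sentence.split())

print(toy_transduce("The girl is sad"))      # The girl is happy
print(toy_transduce("This is a sad story"))  # This is a happy story
```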

SLIDE 25

Dataset Creation

We implement the creation of the dataset with Carmel, an FST toolkit, composing three FSTs:

  1. An FST for converting a sentence into a sequence of phonemes
  2. An FST that allows minor changes in the phoneme sequence
  3. An FST for decoding a sequence of phonemes into a sentence (the inverse of 1)

SLIDE 26

1. Sentence to Phonemes

We use pronunciation dictionaries for both languages:

  book__en   →  B UH K
  cat__en    →  K AE T
  libro__sp  →  L IY B R OW
  gato__sp   →  G AA T OW

SLIDE 27

2. Change Phoneme Sequence

We allow minor changes in the phoneme sequence to increase flexibility (the allowed changes are shown as a figure on the slide; an illustrative sketch follows):
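
The exact set of allowed changes appears only in the slide image; as an illustration, assume random substitutions between acoustically confusable phonemes. The SIMILAR table below is a hypothetical stand-in, not the authors' edit set.

```python
import random

# Hypothetical confusion table; the authors' actual edit set is on the slide.
SIMILAR = {"G": ["K"], "K": ["G"], "AA": ["EY"], "OW": ["UW"]}

def perturb(phonemes, p=0.5, seed=0):
    """Randomly swap each phoneme for a similar-sounding one with prob. p."""
    rng = random.Random(seed)
    return [rng.choice(SIMILAR[ph]) if ph in SIMILAR and rng.random() < p
            else ph for ph in phonemes]

print(perturb("S M EH L IY G AA T OW".split()))
```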

SLIDE 28

3. Phonemes to Sentence

We use the same pronunciation dictionaries, now in the decoding direction. A single phoneme sequence can decode into several sentences:

  S M EH L IY K EY T UW → smell y que to
  S M EH L IY K EY T UW → smelly que to

To favor frequent words over infrequent ones, we add unigram probabilities to the edges of the transducer

SLIDE 29

A worked example of the full pipeline:

Gold sentence: smelly gato
Phoneme sequence (sentence-to-phonemes FST, using smelly:EN → S M EH L IY and gato:SP → G AA T OW): S M EH L IY G AA T OW

Changed phoneme sequence (changing-phonemes FST, with G → K, AA → EY, OW → UW): S M EH L IY K EY T UW

Alternative sentences (phonemes-to-sentence FST):
  smell y que to (smell:EN, y:SP, que:SP, to:EN)
  smelly que to
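
A pure-Python sketch of the same pipeline on this example. The real system composes weighted FSTs in Carmel; here each step is a plain function, and the phoneme change is hard-coded rather than produced by the perturbation transducer. The dictionary entries are the toy ones from the slides.

```python
PRON = {  # toy pronunciation dictionary (word -> phoneme string)
    "smelly__en": "S M EH L IY", "smell__en": "S M EH L",
    "y__sp": "IY", "gato__sp": "G AA T OW",
    "que__sp": "K EY", "to__en": "T UW",
}

def to_phonemes(words):
    """Step 1: sentence -> phoneme sequence."""
    return " ".join(PRON[w] for w in words)

def decode(phonemes, prefix=()):
    """Step 3: all segmentations of a phoneme sequence into words."""
    if not phonemes:
        yield prefix
        return
    for word, pron in PRON.items():
        if phonemes == pron or phonemes.startswith(pron + " "):
            yield from decode(phonemes[len(pron):].lstrip(), prefix + (word,))

gold = ["smelly__en", "gato__sp"]
seq = to_phonemes(gold)                          # S M EH L IY G AA T OW
alt_seq = seq.replace("G AA T OW", "K EY T UW")  # step 2, hard-coded here
for alt in decode(alt_seq):
    print(" ".join(alt))  # smelly__en que__sp to__en / smell__en y__sp que__sp to__en
```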

SLIDE 30

Dataset Creation – cont.

Implementation details:

  • We can create monolingual and CS alternatives regardless of the type of the source sentence
  • When creating a code-switched alternative, we only convert a sampled part of the gold sentence
  • For CS alternatives, we use heuristics to encourage sentences to include both languages and to differ from each other (e.g. preferring more words from the less dominant language)
  • We randomly choose 250 sets in which the gold sentence is code-switched and 750 in which it is monolingual

SLIDE 31

So far…

A new evaluation method that enables comparison of a wide range of models:

  • Directly penalizes preferring "bad" sentences
  • Does not depend on the vocabulary
  • Independent of an ASR system

The dataset is created using FSTs, making the approach applicable to any language or language pair.

SLIDE 32

Baseline LM

Standard architecture:

  • A 2-layer LSTM followed by a softmax layer
  • Auto-batching
  • SGD optimization
  • Learning rate decreased according to dev performance
  • Gradient clipping, weight decay
  • LSTM dropout – the same dropout mask at each time step, including the recurrent layers (Gal and Ghahramani, 2016)

Parameters matter a lot (less so when only changing the training data). A sketch follows.
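
A sketch of such a baseline in PyTorch. Hyperparameters are placeholders, and nn.LSTM's built-in dropout is the standard variant, not the Gal and Ghahramani dropout used in the talk.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """2-layer LSTM followed by a softmax layer over the vocabulary."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                            dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, hidden)
        return self.out(hidden)                    # next-word logits

vocab_size = 10_000
model = LSTMLanguageModel(vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0, weight_decay=1e-6)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)  # dev-driven LR decay
loss_fn = nn.CrossEntropyLoss()

# One toy training step: predict every next token from its prefix.
batch = torch.randint(0, vocab_size, (8, 20))
logits = model(batch[:, :-1])
loss = loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # gradient clipping
optimizer.step()
# After each epoch: scheduler.step(dev_loss) to decay the learning rate.
```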

SLIDE 33

Baseline LM

Data: Codeswitching corpus only (Bangor Miami Corpus)

[Baseline accuracy shown as a table on the slide.]

How can we improve over the baseline?

SLIDE 34

Monolingual Data

  • Data for code-switching is relatively scarce
  • Monolingual data is easy to obtain
  • We use the OpenSubtitles2018 corpus of subtitles of movies and TV series (Tiedemann, 2009)


How do we efficiently incorporate monolingual data when training a CS LM?

SLIDE 35

Take 1

We train a language model on both the monolingual and the CS data. The CS data is used at the end of each epoch – ALL:CS-Last.

SLIDE 36

Take 1

We train a language model on both the monolingual and the CS data. The CS data is used at the end of each epoch – ALL:CS-Last.

[Results comparing this model to the baseline, shown as a table on the slide.]

SLIDE 37

Take 2

We train a language model on both the monolingual and the CS data. All sentences are shuffled together – ALL:Shuffled. (A sketch of both mixing strategies follows.)
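
The two mixing strategies amount to a different per-epoch ordering of the same sentences; a minimal sketch (function and variable names are ours, not the authors'):

```python
import random

def epoch_order(monolingual, cs, strategy, seed=0):
    """ALL:CS-Last -- monolingual data (shuffled) first, CS data at the end.
    ALL:Shuffled -- all sentences shuffled together."""
    rng = random.Random(seed)
    if strategy == "cs_last":
        data = list(monolingual)
        rng.shuffle(data)
        return data + list(cs)
    if strategy == "shuffled":
        data = list(monolingual) + list(cs)
        rng.shuffle(data)
        return data
    raise ValueError(strategy)
```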

SLIDE 38

Take 2

We train a language model on both the monolingual and the CS data. All sentences are shuffled together – ALL:Shuffled.

[Results comparing this model to the baseline, shown as a table on the slide.]

SLIDE 39

The better approach – Fine-tuning

  • We pre-train a model with the English and Spanish monolingual sentences
  • This essentially trains two monolingual models, but with full sharing of parameters
  • We then use the small amount of available codeswitched data to further train the model (see the sketch below)
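
As a sketch (epoch counts and helper names are placeholders, not the talk's settings), the recipe is just two training phases over the same shared-parameter model:

```python
def pretrain_then_finetune(model, monolingual, cs_data, train_epoch,
                           pretrain_epochs=20, finetune_epochs=5):
    """Phase 1: pre-train one model (full parameter sharing) on the
    English + Spanish monolingual sentences. Phase 2: continue training
    on the small codeswitched corpus."""
    for _ in range(pretrain_epochs):
        train_epoch(model, monolingual)
    for _ in range(finetune_epochs):
        train_epoch(model, cs_data)
    return model
```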

SLIDE 40

The better approach – Fine-tuning

[Results comparing the fine-tuned model to the baseline, shown as a table on the slide.]

SLIDE 41

Breaking down the Results

Let’s look at the results more carefully: What is our accuracy when the gold sentence is CS and when it is monolingual?

SLIDE 42

Breaking down the Results


Most of the improvement stems from sets with monolingual sentences as gold. This is not surprising! Recall that during the pretraining phase, the model is exposed to monolingual data only.

SLIDE 43

Breaking down the Results


Can we do better than that?

SLIDE 44

Discriminative Training

Yes! We can now help the model with negative examples that we create, of all types:

  • Recall that for our new evaluation, the LM only needs to identify the correct sentence
  • There is no need for the standard probabilistic setting anymore
  • We can score whole sentences, and add negative examples

SLIDE 45

Discriminative Training


  • Assign the gold sentence a higher score than the rest
  • Require the difference between the scores to be at least as large as the WER
  • The farther a sentence is from the gold one, the lower its score should be
  • Formally, let $s_0$ be the gold sentence and $s_1, \dots, s_k$ the other sentences in the set. The new loss (see the sketch below):

$$\mathcal{L} = \sum_{i=1}^{k} \max\Bigl(0,\; \mathrm{WER}(s_0, s_i) - \bigl(\mathrm{score}(s_0) - \mathrm{score}(s_i)\bigr)\Bigr)$$
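
A sketch of this loss (the exact formulation in the paper may differ in details such as scaling; WER is computed at the word level, and the scores here are stand-in tensors rather than real model outputs):

```python
import torch

def wer(gold, hyp):
    """Word error rate: word-level edit distance divided by gold length."""
    g, h = gold.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(g) + 1)]
    for i in range(len(g) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(g) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (g[i - 1] != h[j - 1]))  # substitution
    return d[-1][-1] / len(g)

def discriminative_loss(gold_score, alt_scores, margins):
    """Hinge loss: the gold score must exceed each alternative's score
    by a margin equal to that alternative's WER from the gold sentence."""
    losses = [torch.clamp(m - (gold_score - s), min=0.0)
              for s, m in zip(alt_scores, margins)]
    return torch.stack(losses).sum()

gold = "smelly gato"
alts = ["smell y que to", "smelly que to"]
margins = [wer(gold, a) for a in alts]  # farther sentences need bigger gaps
loss = discriminative_loss(torch.tensor(2.0),
                           [torch.tensor(1.8), torch.tensor(1.9)], margins)
print(loss.item())
```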
SLIDE 46

Creating Data for Discriminative Training

Done using the same technique: composition of FSTs.

  • We use the CS training set as our gold sentences
  • We also create sentences from a random subset of the monolingual data
  • We get a training set that is 10 times bigger than the original one
  • Fine-tuning is done in the same manner

SLIDE 47

Discriminative Training

SLIDE 48

Discriminative Training

Dramatic improvements in cases where the gold sentence is CS:

SLIDE 49

Improvements as a Function of Size of Data

As expected, we see that the less CS data we have, the more important it is to add monolingual data:

SLIDE 50

Some Examples

Examples of sentences that the FINE-TUNED-DISCRIMINATIVE model identifies correctly while the FINE-TUNED-LM model does not:

SLIDE 51

Some Examples

Examples of sentences that the FINE-TUNED-DISCRIMINATIVE model fails to identify

SLIDE 52

Conclusion

  • Perplexity can be replaced with our new ranking-based evaluation method:
    • Well-suited to the ASR motivation
    • Independent of the vocabulary and of ASR systems
  • The evaluation data is created using FSTs with the help of pronunciation dictionaries
  • Fine-tuning improves performance when little high-quality data is available
  • Discriminative training is extremely helpful in our scenario

SLIDE 53

Any Preguntas?

SLIDE 54

Thank you!
