Challenges and Techniques for Dialectal Arabic Speech Recognition and Machine Translation
Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi, Rehab Duwairi, Wolfgang Minker
Nov. 21, 2011
Qatar University, University of Illinois, Ulm
Page 2 Challenges and techniques for dialectal Arabic ASR and MT | Mohamed Elmahdy | Qatar | Doha | Nov. 21, 2011
Arabic varieties: Formal, Dialectal
Significant differences between Modern Standard Arabic (MSA) and dialectal Arabic
Introduction | Approaches | Experiments and results | Conclusions
Phonological: /t/ or /s/ in Egyptian Colloquial Arabic (ECA) instead of /T/ in MSA, e.g. /tala:tah/ (three) in ECA versus /Tala:Tah/ in MSA
Lexical: /t„ArAbE:zA/ (table) in ECA versus /t„awila/ in MSA
Syntactic: SVO word order in ECA versus VSO in MSA
Speech → Feature Extraction → Features → Decoder → Words
The decoder combines an Acoustic Model, a Pronunciation Model, and a Language Model
For dialectal Arabic, sparse and low-quality corpora are available
$$\hat{W} = \arg\max_{W \in L} P(O \mid W)\, P(W)$$
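The decision rule above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the presenters' decoder: the function name `decode`, the candidate list, and every score are hypothetical values chosen for the example, and a real decoder searches over word sequences rather than a handful of candidates.

```python
import math


def decode(acoustic_log_likelihoods, language_model):
    """Return the candidate W maximizing log P(O|W) + log P(W).

    acoustic_log_likelihoods: dict mapping candidate W -> log P(O|W)
    language_model: dict mapping candidate W -> P(W)
    Both tables are hypothetical inputs for this sketch.
    """
    best_candidate = None
    best_score = -math.inf
    for candidate, log_p_o_given_w in acoustic_log_likelihoods.items():
        p_w = language_model.get(candidate, 0.0)
        if p_w <= 0.0:
            continue  # the language model rules this candidate out
        score = log_p_o_given_w + math.log(p_w)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate


# Made-up scores: the acoustic model slightly prefers the MSA form,
# but a dialectal language model makes the ECA form far more likely.
acoustic = {"tala:tah": -10.0, "Tala:Tah": -9.5}
dialectal_lm = {"tala:tah": 0.6, "Tala:Tah": 0.1}
best = decode(acoustic, dialectal_lm)
```

The example shows why both models matter: with sparse dialectal data, a weak estimate of either P(O|W) or P(W) can flip the decision.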
Arabic sentence → Decoder → English sentence
The decoder combines a Translation Model and a Language Model
Large parallel corpora are required
For dialectal Arabic, parallel corpora are not available
$$\hat{E} = \arg\max_{E \in \text{English}} P(A \mid E)\, P(E)$$
and MT
→ Dialectal speech data where phonetic transcription is available
MSA is always a second language for any Arabic speaker
A large amount of MSA speech data (from a large number of speakers) implicitly covers all the acoustic features of the different Arabic dialects
Train an acoustic model on a large amount of MSA speech data
Adapt the MSA acoustic model with a small amount of dialectal speech data
Maximum Likelihood Linear Regression (MLLR)
Maximum A-Posteriori (MAP)
language and phoneme set
MSA → ECA with MAP or MLLR: acoustic model adaptation is not possible
AM adaptation is possible
Several phone mapping rules are applied
Map ECA phonemes to their origins in MSA (even if they are acoustically different)
Normalization (phone mapping rules), ECA → MSA
ECA: /b/ /g/ /j/ /e/ /i/ /o/ /u/ /t/ …
MSA: /b/ /dZ/ /i/ /u/ /t/ …
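Phoneme normalization of this kind amounts to a lookup applied to every phone in a pronunciation. The sketch below is illustrative only: the slide lists the two phoneme inventories but not the individual rules, so the pairs in `ECA_TO_MSA` (/g/ to /dZ/, /e/ to /i/, /o/ to /u/) are assumptions, and the function names are hypothetical.

```python
# Assumed mapping rules for illustration; the actual rule set is not given
# in the slides. Phones not listed are left unchanged.
ECA_TO_MSA = {"g": "dZ", "e": "i", "o": "u"}


def normalize_phone(phone, mapping=ECA_TO_MSA):
    """Map one ECA phone to its assumed MSA origin, keeping a ':' length mark."""
    length_mark = ":" if phone.endswith(":") else ""
    base = phone[:-1] if length_mark else phone
    return mapping.get(base, base) + length_mark


def normalize_pronunciation(phones, mapping=ECA_TO_MSA):
    """Normalize a whole pronunciation given as a list of phone symbols."""
    return [normalize_phone(p, mapping) for p in phones]


normalized = normalize_pronunciation(["g", "a", "m", "i:", "l"])
```

After normalization both corpora share one phoneme set, which is the precondition for adapting the MSA acoustic model with ECA data.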
MSA corpus → Phonemes Normalization → Normalized MSA corpus → Training → MSA acoustic model
ECA corpus → Phonemes Normalization → Normalized ECA corpus → Training → ECA baseline model
MSA acoustic model + Normalized ECA corpus → MLLR adaptation → MAP adaptation → ECA final model
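To make the two adaptation steps concrete, here is a minimal sketch of what each does to a single Gaussian mean. It uses the standard textbook forms (an affine transform for MLLR, a prior-weighted average for MAP) and is not taken from the presentation: the function names, the prior weight `tau`, and all numbers are assumptions, and the MLLR transform `A`, `b` is taken as already estimated rather than computed from data.

```python
def mllr_transform_mean(mean, A, b):
    """Apply an MLLR-style affine transform to a Gaussian mean: A * mean + b."""
    dim = len(mean)
    return [sum(A[i][j] * mean[j] for j in range(dim)) + b[i] for i in range(dim)]


def map_update_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of a Gaussian mean.

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    where gamma_t is the occupation probability of frame x_t and tau > 0
    controls how strongly the prior (here the MSA-derived mean) is trusted.
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Toy values: an MSA mean, a made-up global transform, ten made-up ECA frames.
msa_mean = [1.0, 2.0]
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, -0.5]
shifted = mllr_transform_mean(msa_mean, A, b)
eca_frames = [[2.0, 1.0]] * 10
adapted = map_update_mean(shifted, eca_frames, [1.0] * 10, tau=10.0)
```

With little dialectal data the MAP result stays close to the MSA-derived prior, and it moves toward the dialectal frames as more of them are observed.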
→ Dialectal speech data where phonetic transcription is available
→ Phonetic transcription is not possible/difficult
→ Short vowels are missing
→ Phonetic transcription is approximated to be word letters
→ Transcriptions are not available at all
→ Dialectal speech was automatically transcribed using an MSA model
→ Latin letters are used instead of Arabic ones
→ Include short vowels that are missing in traditional Arabic orthography
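The graphemic approximation in the list above, where a word's pronunciation is taken to be its letters, can be sketched as follows. This is an assumption-laden illustration rather than the presenters' lexicon builder: the function names are hypothetical, and dropping Unicode combining marks is only one plausible way to discard optional diacritics.

```python
import unicodedata


def graphemic_pronunciation(word):
    """Approximate a pronunciation as the sequence of the word's letters.

    Combining marks (Unicode category 'Mn', which covers Arabic short-vowel
    diacritics) are dropped, so diacritized and plain spellings coincide.
    """
    return [ch for ch in word if unicodedata.category(ch) != "Mn"]


def build_graphemic_lexicon(words):
    """Map each undiacritized word form to its letter-based pronunciation."""
    lexicon = {}
    for word in words:
        letters = graphemic_pronunciation(word)
        lexicon["".join(letters)] = letters
    return lexicon


lexicon = build_graphemic_lexicon(["كتاب"])
```

Each letter then acts as an acoustic unit, so the missing short vowels are absorbed implicitly into the models of the neighboring letters.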
→ 65% for training/adaptation
→ 35% for testing
$$\mathrm{WER} = \frac{\mathrm{Sub} + \mathrm{Del} + \mathrm{Ins}}{N}$$
41.8% Relative reduction in WER
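For readers who want to reproduce figures of this kind, the sketch below computes word error rate from a word-level edit distance and the relative reduction between two WER values. The function names and the example sentences are hypothetical, and the numbers are not the experiment's data.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / N, N = reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


example_wer = word_error_rate("a b c d", "a x c")
```

Relative reduction is measured against the baseline, so it expresses the share of the baseline's errors that were removed rather than a difference in percentage points.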
Consistent decrease in WER
→ Problems in ASR and MT for dialectal Arabic
→ Cross-lingual acoustic modeling for dialectal Arabic ASR
→ Improvements are observed in both phonemic and graphemic modeling
→ Consistent reduction in WER by adding more MSA data
→ Data collection (a focus is placed on the Qatari dialect)
→ Extension to all the Arabic dialects
→ Dialectal Arabic MT and LM