
Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details



  1. Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details

  2. Transtac: Two-Way S2S System
     - DARPA developed for
       - Checkpoints, medical and civil defense
     - Requirements
       - Two way
       - Eyes-free (no screen)
       - Portable
       - Usable by real users

  3. Transtac System
     - Close-talking microphone
     - Optional speech control
     - Push-to-talk buttons
     - Laptop secured in backpack
     - Small powerful speakers

  4. Transtac System Details
     - Two way system
       - 2 ASR systems: English and Iraqi
       - 2 way statistical translation
       - 2 synthesizers
     - Push-to-talk system
       - (Users don't like "translate everything mode")
     - Echo back ASR result
       - And then translation
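The push-to-talk flow on this slide (recognize, echo back the ASR result, then speak the translation) can be sketched as below. This is only an illustrative sketch: the `Direction` class, `handle_push_to_talk`, and the toy lambdas are hypothetical names, not Transtac components, and a two-way system would hold one `Direction` per language pair direction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Direction:
    """One direction of a two-way S2S system (all components are hypothetical stubs)."""
    recognize: Callable[[bytes], str]     # ASR: audio -> source-language text
    translate: Callable[[str], str]       # MT: source text -> target text
    speak_source: Callable[[str], bytes]  # TTS in the speaker's language (echo back)
    speak_target: Callable[[str], bytes]  # TTS in the listener's language


def handle_push_to_talk(audio: bytes, d: Direction) -> List[bytes]:
    """Process one push-to-talk utterance.

    Returns the audio to play, in order: first the echoed ASR result,
    then the spoken translation.
    """
    text = d.recognize(audio)
    echo = d.speak_source(text)
    translated = d.translate(text)
    return [echo, d.speak_target(translated)]


# Toy stand-ins so the sketch runs without real ASR/MT/TTS engines.
toy = Direction(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: text.upper(),
    speak_source=lambda text: ("echo:" + text).encode("utf-8"),
    speak_target=lambda text: ("say:" + text).encode("utf-8"),
)

playback = handle_push_to_talk(b"open the trunk", toy)
```

Playing the echo first gives the speaker a chance to notice a recognition error before the listener hears the translation.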

  5. Iraqi Language
     - Iraqi Arabic is a dialect
       - Most Iraqis write Modern Standard Arabic
       - Most Iraqis do not write their own dialect
     - No standardized spelling
       - Transtac project invented one
       - But Iraqis may not be used to it
     - Arabic (MSA and dialects)
       - Do not write short vowels in words

  6. Data for Training
     - Collected human mediated dialogs
     - Human acts as a machine
     - Passed a microphone back and forth
     - Try to get people not to talk at same time
     - Large number of collections (over 4 years)
     - 650 thousand sentence pairs
     - Many different speakers
     - Hand transcribed by experts (in Iraqi spelling)
     - Hand translated (source sentences and interpreter's)

  7. Iraqi ASR
     - Acoustic model from Iraqi data
       - Based on MSA phoneset
       - Needs to be small fast models
       - Discriminative training
       - Speaker specific adaptation
     - Lexicon
       - Based on LDC provided lexicon
       - Multiple pronunciations/typos still a problem
       - Statistically trained LTS rules
     - Language model
       - Trained on Iraqi input (and translated output)

  8. English ASR
     - Acoustic model
       - Originally using other models
       - Then trained from collected data
       - (Mostly military personnel)
     - Lexicon
       - Existing lexicon but needed to add military speak: MRAP, IED
     - Language model
       - Trained from data provided
       - Trained from "similar" data found on the web
       - Training from hand created "typical" examples

  9. TTS
     - Standard English TTS
       - Appropriate "command" voice
       - Unit selection
       - Added lots of military vocabulary
     - Iraqi TTS
       - Recorded from Iraqi radio announcer
       - Based on example sentences in the domain
       - LDC lexicon and LTS rules (same as ASR)
       - Hand tuned

  10. S2S Interface Issues
      - How do you teach people to use the system
        - "Transtac say instructions"
        - Not really sufficient
      - How can you tell it translated correctly
        - Give (speech) feedback
          - Backtranslation
          - ASR echo back

  11. S2S Interface Issues
      - How do you translate names
        - A correct translation/transliteration is hard to understand
      - Mark names in translations
        - "My name is … Abdullah"
        - "He lives on … al-Aqar … street"
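One possible reading of the "…" in the examples above is a marker (for instance a spoken pause) placed around a name so the listener can pick it out. The sketch below follows that reading; the function `mark_names`, the set of known names, and the choice of "…" as the marker are all assumptions for illustration, not the project's actual method.

```python
from typing import Iterable, List, Set


def mark_names(words: Iterable[str], names: Set[str], pause: str = "…") -> List[str]:
    """Insert a pause marker before each name, and after it unless it ends the sentence."""
    words = list(words)
    out: List[str] = []
    for i, word in enumerate(words):
        if word in names:
            # Avoid doubling the marker when two names are adjacent.
            if not out or out[-1] != pause:
                out.append(pause)
            out.append(word)
            if i < len(words) - 1:
                out.append(pause)
        else:
            out.append(word)
    return out


marked = " ".join(mark_names("He lives on al-Aqar street".split(), {"al-Aqar"}))
```

How names are detected in the first place is left out here; the sketch assumes they are already known.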

  12. S2S Evaluation (Transtac)
      - Offline tests
        - ASR->Text and Text->Text
        - Compare to translation references
        - WER and "BLEU" score
      - Online tests
        - Concept transfer (through defined scenarios)
        - Speed (number of concepts per minute)
        - (English speech masking)
      - Utility tests
        - Does it really work
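As a reminder of what the offline WER figure measures, here is a minimal sketch of word error rate: word-level edit distance between reference and hypothesis, divided by the reference length. The function name and the whitespace tokenization are assumptions for illustration; the evaluation's actual scoring tools and text normalization are not described in the slides, and BLEU is not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
wer = word_error_rate("where is the checkpoint", "where is checkpoint")
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.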

  13. Transtac Participants
      - Developer groups
        - IBM
        - SRI
        - BBN
        - CMU
        - USC
      - Evaluations
        - Twice a year in Iraqi (somewhere in DC)
        - One surprise language (Farsi, Bahasa Malay)
        - Other evaluations with military groups

  14. Does it work?
      - Yes, mostly
        - 27 concepts out of 30-ish turns
      - Systems are mostly similar
        - But some better than others
      - Other techniques
        - Belt/holster based PC with handheld speaker
        - Small PC in pouch
        - Chest mounted array microphone

  15. S2S ASR Advanced Issues
      - Tight coupling
        - ASR should output N-best
        - Translate all (lattice)
        - Choose best translation
        - (MT as a LM for ASR)
      - Remove disfluencies/hesitations
      - Add more relevant data
        - Automatically convert past tense/third person data to present tense/first+second person …
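The tight-coupling idea (translate every N-best ASR hypothesis, then choose the best translation) can be sketched as a simple rescoring loop. Everything specific here is an assumption for illustration: the function name, the use of log scores, and the weighted sum used to combine ASR and MT scores are not taken from the slides, and a real system would work over a lattice rather than a short list.

```python
from typing import Callable, List, Tuple


def pick_best_translation(
    nbest: List[Tuple[str, float]],
    translate: Callable[[str], Tuple[str, float]],
    mt_weight: float = 1.0,
) -> Tuple[str, str, float]:
    """Translate every ASR hypothesis and keep the best-scoring pair.

    nbest:     (hypothesis, ASR log score) pairs.
    translate: hypothetical MT call returning (translation, MT log score).
    Returns (hypothesis, translation, combined score).
    """
    if not nbest:
        raise ValueError("nbest must not be empty")
    best = None
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        total = asr_score + mt_weight * mt_score  # assumed combination rule
        if best is None or total > best[2]:
            best = (hypothesis, translation, total)
    return best


def toy_translate(text: str) -> Tuple[str, float]:
    # Toy MT stub: pretends sentences containing "trunk" translate more confidently.
    return text.upper(), (-0.5 if "trunk" in text else -3.0)


# The ASR slightly prefers the wrong hypothesis; the MT score overrules it.
choice = pick_best_translation(
    [("open the drunk", -0.9), ("open the trunk", -1.0)], toy_translate
)
```

In this toy run the MT score acts like an extra language model over the ASR output, which is the sense of "MT as a LM for ASR" on the slide.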

  16. S2S TTS Advanced Issues
      - MT output isn't grammatical
        - TTS doesn't care and just says it
        - TTS should try to say MT output with more breaks
      - TTS (unit selection)
        - As a LM on MT output
        - Choose the best translation based on what is said best
