
Speech Processing 11-492/18-492



  1. Speech Processing 11-492/18-492: Speech Synthesis Evaluation

  2. Evaluating Speech Synthesis
     - How good is the voice?
     - This voice is a 45.67
     - Is voice X better than voice Y?
     - Why?

  3. Evaluation
     - Objective measures
       - Run a program and get a number
     - Subjective measures
       - Have human listeners extract a score
     - Do objective and subjective scores correlate?
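One way to answer the correlation question is to compute, for each system, one objective score and its mean subjective score, then take the Pearson correlation across systems. The sketch below assumes exactly that setup; the five systems, the "distortion" score (lower is better) and the MOS values are hypothetical numbers for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-system scores for the same five voices:
# an objective distortion measure (lower is better) and a subjective MOS (higher is better).
objective = [5.1, 4.3, 6.0, 3.8, 4.9]
mos = [3.0, 3.6, 2.4, 4.1, 3.2]
print(round(pearson(objective, mos), 3))
```

A strongly negative value here would mean the objective measure tracks listener opinion (since lower distortion should mean higher MOS); a value near zero would mean it does not.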

  4. Human Tests
     - Synthesis people are warped
       - The more you listen, the better it becomes
       - They hear things others don’t
     - Non-synthesis people are warped
     - People are very sensitive to listening conditions
       - What question do you ask
       - What hardware you play it on
     - There are (at least) two orthogonal scales
       - Understandability
       - Naturalness

  5. Standard Tests
     - DRT: diagnostic rhyme tests
       - Test confusable phones
         - “bat” vs “pat”
       - Good for identifying phone errors
       - Sometimes in carrier sentences
         - Now we will say pat again.
       - Unit selection
         - Just include the standard test words in the database
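Because each DRT item is a two-way forced choice (“bat” or “pat”), raw percent correct is inflated by guessing. A minimal sketch, assuming responses are logged as hypothetical (spoken word, chosen word) pairs, scores them with the commonly used chance-corrected formula 100 * (R - W) / T, where R is right answers, W wrong answers and T total items, and tallies the confusions that point at specific phone errors.

```python
from collections import Counter


def drt_score(responses):
    """Chance-corrected DRT score: 100 * (right - wrong) / total.

    responses: list of (spoken_word, chosen_word) pairs from a
    two-alternative forced-choice rhyme test.
    """
    total = len(responses)
    if total == 0:
        raise ValueError("no responses")
    right = sum(1 for spoken, chosen in responses if spoken == chosen)
    wrong = total - right
    return 100.0 * (right - wrong) / total


def confusions(responses):
    """Count (spoken, heard) pairs the listener got wrong, to locate phone errors."""
    return Counter((spoken, chosen) for spoken, chosen in responses if spoken != chosen)


# Hypothetical responses from one listener.
responses = [("bat", "bat"), ("pat", "bat"), ("dune", "dune"), ("tune", "tune")]
print(drt_score(responses), confusions(responses))
```

Pure guessing gives a score near 0 rather than 50, which makes voices easier to compare.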

  6. Standard Tests
     - SUS: semantically unpredictable sentences
       - Det adj noun verb det adj noun
       - Automatically filled in with low-frequency words
         - The parklike holders threw the vague vegetables
         - The simplistic consonants swam the episcopal quartet
         - The dark geniuses woke the humane emptiness.
         - The masterly serials withdrew the collaborative brochure
       - Test for understandability
         - Ask users to type in what they hear
       - Good for discriminating between voices
       - Very hard even for fluent non-native speakers
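Filling the Det adj noun verb det adj noun template can be sketched as random choice per slot. The word lists here are hypothetical toy lists taken from the example sentences; a real SUS generator would presumably draw from a large lexicon filtered to low-frequency words.

```python
import random

# Hypothetical toy word lists (a real test would use many low-frequency words).
WORDS = {
    "det": ["the"],
    "adj": ["parklike", "vague", "simplistic", "episcopal", "humane"],
    "noun": ["holders", "vegetables", "consonants", "quartet", "emptiness"],
    "verb": ["threw", "swam", "woke", "withdrew"],
}
# Slot sequence from the slide: Det adj noun verb det adj noun.
TEMPLATE = ["det", "adj", "noun", "verb", "det", "adj", "noun"]


def make_sus(rng):
    """Build one semantically unpredictable sentence by filling each slot at random."""
    words = [rng.choice(WORDS[slot]) for slot in TEMPLATE]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


rng = random.Random(0)  # fixed seed so the same test set can be regenerated
for _ in range(3):
    print(make_sus(rng))
```

Because the sentences are grammatical but meaningless, listeners cannot use context to guess misheard words.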

  7. Standard tests
     - MOS: mean opinion scores
       - 1-5 quality, naturalness, “like it”
       - Take average score
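Taking the average is the whole MOS computation, but comparing two voices also needs some idea of how noisy that average is. A minimal sketch, assuming integer 1-5 ratings and a normal-approximation 95% confidence interval (z = 1.96), which is only reasonable with a fair number of ratings per voice:

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, confidence half-width) for a list of 1-5 ratings.

    The half-width uses a normal approximation: z * sample_stdev / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers 1-5")
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings for one voice.
mean, hw = mos_with_ci([3, 4, 4, 5, 3])
print(f"MOS = {mean:.2f} +/- {hw:.2f}")
```

If the intervals of two voices overlap heavily, the difference in their averages should not be over-interpreted.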

  8. Some experimental problems
     - Order of presentation
     - Other aids change perception
       - Showing the text makes it much easier
       - Having a talking head “improves” the synthesis
     - Hardware quality
       - Some voices are better on the telephone
       - Loudspeaker quality (headphone quality)
       - Room acoustics
       - Volume
     - Understandability
       - Harder if the listener is doing another task
     - Personal preference
       - Voice is fully understandable but “creepy”
       - Voice is incomprehensible but “funny”
       - Sounds like my grade school teacher
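One common mitigation for order-of-presentation effects (not something the slides prescribe) is to counterbalance the order across listeners. A minimal sketch using a cyclic Latin square, assuming hypothetical system labels "A" to "D": listener i hears the systems rotated by i positions, so within every block of len(systems) listeners each system appears in each presentation position exactly once.

```python
def latin_square_orders(systems, n_listeners):
    """Cyclic Latin square: listener i hears the systems rotated by i positions."""
    k = len(systems)
    return [[systems[(i + j) % k] for j in range(k)] for i in range(n_listeners)]


orders = latin_square_orders(["A", "B", "C", "D"], 4)
for listener, order in enumerate(orders):
    print(listener, order)
```

This balances position but not which system immediately precedes which (here "A" is always followed by "B"); a Williams design is the usual refinement when such carryover effects matter.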

  9. TTS Evaluation
     - How good are your ears?

  10. SUS Sentences
     - sus_00005
     - sus_00012
     - sus_00017
     - sus_00022

  11. SUS Sentences
     - The sorrowful premieres sang the ostentation gymnast
     - The temperamental gateways forgave the weatherbeaten finalist
     - The disruptive billboards blew the sugary endorsement
     - The serene adjustments foresaw the acceptable acquisition
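When listeners type what they hear, their transcriptions can be scored against the reference SUS text with a word error rate. A minimal sketch, assuming simple normalization (lowercase, punctuation stripped) and a standard word-level Levenshtein alignment; the misheard transcription in the example is invented.

```python
def normalize(text):
    """Lowercase, drop punctuation, split on whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("empty reference")
    # prev holds edit distances from the previous reference prefix to each hyp prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[len(hyp)] / len(ref)


ref = "The sorrowful premieres sang the ostentation gymnast"
typed = "the sorrowful premiers sang the ostentatious gymnast"  # invented listener answer
print(round(word_error_rate(ref, typed), 3))
```

Averaging this over all sentences and listeners gives one understandability figure per voice.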

  12. TTS Evaluation

  13. TTS Evaluation
     - In mud eels are, in mud none are
     - A 1918 state constitutional amendment made Massachusetts one of 23 states where citizens can enact laws by plebiscite.
     - Which is which?
       - The numbers are 25 and 34.
       - The numbers 20 5 and 34.
     - What is the temperature in Pittsburgh?

  14. Objective Synthesis Tests
     - Text analysis
       - How well do you cover NSWs (non-standard words)?
       - How well do you cover homographs?
     - Lexical coverage
       - How often do you see a new word?
     - Lexical correctness
       - How correct are pronunciations?
         - For unseen words
         - For seen words
     - Phonetic intelligibility
       - DRT tests
     - Semantic intelligibility
       - SUS tests
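Lexical coverage and lexical correctness reduce to simple counts once a lexicon and some reference pronunciations are available. A minimal sketch, assuming the lexicon is a dict from lowercase words to phone lists and that the phone symbols (ARPAbet-style strings) and words are hypothetical examples:

```python
def oov_rate(tokens, lexicon):
    """Fraction of running tokens not in the lexicon (lexical coverage = 1 - OOV rate)."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t.lower() not in lexicon)
    return missing / len(tokens)


def pron_accuracy(predicted, reference, lexicon):
    """Word-level pronunciation accuracy, split into seen (in lexicon) and unseen words.

    predicted, reference: dicts mapping word -> list of phones.
    Returns {"seen": accuracy or None, "unseen": accuracy or None}.
    """
    stats = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total]
    for word, ref_phones in reference.items():
        key = "seen" if word in lexicon else "unseen"
        stats[key][1] += 1
        if predicted.get(word) == ref_phones:
            stats[key][0] += 1
    return {k: (c / n if n else None) for k, (c, n) in stats.items()}


# Hypothetical lexicon and test data.
lexicon = {"the": ["dh", "ax"], "cat": ["k", "ae", "t"]}
print(oov_rate(["The", "cat", "sat"], lexicon))
reference = {"cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}
predicted = {"cat": ["k", "ae", "t"], "sat": ["s", "ah", "t"]}  # e.g. letter-to-sound output
print(pron_accuracy(predicted, reference, lexicon))
```

Reporting seen and unseen words separately matters because lexicon lookups are usually right, while the unseen-word figure measures the letter-to-sound rules.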

  15. Blizzard Challenge
     - Annual event since 2005 (15+ years)
     - Distribute large databases of speech
     - Participants
       - Build a voice
       - Synthesize a set of sentences
     - Listeners
       - Listen and grade results

  16. Blizzard Challenge
     - 2005: US English synthesis, 4 voices, 1 hour each
       - 4 teams plus “Studio” (human speech)
     - 2006: US English, 1 voice: 6 hours and 1 hour
       - 12 teams
     - 2007: US English, 1 voice: 9 hours and 1 hour
       - 14 teams
     - 2008: UK English 15 hours; Mandarin 5 hours
       - 19 teams
     - 2009: UK English 15 hours; Mandarin 5 hours
     - 2010: UK English 18 hours; Mandarin 6 hours
     - 2010 onward: audio books, Indian languages, speaking in noise
     - Split between industry and academia
     - Split between Asia, Europe, America (mostly Europe and Asia)

  17. Listeners
     - Three sets of listeners
       - Speech experts (participants)
       - Paid undergrads (native speakers)
       - Volunteers
     - Types of tests
       - MOS tests (1-5)
       - SUS tests
       - DRT tests
     - About 300 listeners in total

  18. Listening
     - Web based
       - So everyone did it in a different environment
       - But we got access to more people
     - Asked to do it in a quiet office with headphones
     - Could listen multiple times

  19. Blizzard Challenge Results
     - Speech experts
       - Like synthesis better
       - Understand synthesis better
     - Volunteers don’t always finish tests
     - Undergrads sometimes finish tests
       - (or put in filler answers)
     - Results were correlated over different subgroups
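Whether results are correlated over listener subgroups can be checked by ranking the systems by each group's mean score and comparing the two rankings. A minimal sketch using Spearman rank correlation (the Pearson correlation of the ranks, with tied values given their average rank); the per-group MOS values for four systems are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical mean scores for four systems from two listener groups.
experts_mos = [3.9, 3.1, 2.5, 4.2]
undergrad_mos = [3.4, 2.9, 2.2, 3.8]
print(spearman(experts_mos, undergrad_mos))
```

Rank correlation fits the observation on the slide: experts may rate synthesis higher overall, yet still order the systems the same way as other groups.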

  20. Application Tests
     - How does it work *in* the application?
       - With real application data
       - A good voice is not noticed
     - Have *real* users evaluate it
       - Give them a choice (even if artificial)
     - CEO chooses the one they like!
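Giving users a choice between two voices amounts to a forced-choice preference (A/B) test. A minimal sketch of analysing one, assuming "no preference" answers are discarded and using an exact two-sided binomial sign test against a 50/50 split; the choice of statistic is made here for illustration and is not taken from the slides.

```python
from math import comb


def preference_test(prefer_a, prefer_b):
    """Two-sided exact binomial (sign) test of H0: no preference (p = 0.5).

    Returns the p-value; ties are assumed to have been dropped beforehand.
    """
    n = prefer_a + prefer_b
    k = max(prefer_a, prefer_b)
    # P(X >= k) under Binomial(n, 0.5), doubled for two sides and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical outcome: 16 users preferred voice A, 4 preferred voice B.
print(preference_test(16, 4))
```

A small p-value says the preference is unlikely to be chance; it says nothing about why users preferred the voice, which is still worth asking.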
