Speech Processing 15-492/18-492
Current Topics and Future Challenges
Commercial and Research
Current and Future
- What are the hot topics in speech
- What currently works
- What could work soon (5-10 years)
- What are the industry hot topics
- What are the research challenges
Spoken Dialog: Now
- Industry:
  - Location-based querying
    - Google: 411, smartphone
    - Microsoft Live Search: smartphone
    - Yahoo (Vlingo)
  - BlackBerry, iPhone
    - (Owners have money)
  - How do you make money out of this…
Spoken Dialog: Now
- Research
  - Error recovery
  - Adaptive systems
  - Rapid deployment
  - Learning dialog structure from data
ASR: Now
- Industry
  - Moving from grammar-based to N-gram-based recognition
  - Broadcast news transcription for IR (information retrieval)
  - Robust speech recognition:
    - In the car, outdoors, in a noisy office
  - LM adaptation from other sources:
    - Using click-through data and search queries
  - Pronunciation variants ("wrong" ones too)
  - Medical transcription
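The move from grammar-based to N-gram-based recognition means that word sequences are scored by estimated probabilities. A hand-written grammar, by contrast, accepts only the sentences it lists. The following illustration is not taken from the lecture. It is a minimal Python sketch of a bigram model, and it assumes whitespace tokenization, `<s>`/`</s>` sentence markers, and unsmoothed maximum-likelihood estimates. The function names are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams; each sentence is padded with <s> and </s>."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams

def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood P(word | prev); 0.0 for an unseen context (no smoothing)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def sentence_prob(contexts, bigrams, sentence):
    """Product of bigram probabilities over the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(contexts, bigrams, prev, word)
    return p

if __name__ == "__main__":
    c, b = train_bigram(["find a restaurant", "find a hotel", "find the station"])
    print(bigram_prob(c, b, "find", "a"))              # 2/3
    print(sentence_prob(c, b, "find a restaurant"))    # 1 * 2/3 * 1/2 * 1 * 1 = 1/3
    print(sentence_prob(c, b, "find a station"))       # 0.0: "a station" never seen
```

Because this sketch is unsmoothed, a phrase such as "find a station" gets zero probability. Practical N-gram language models add smoothing or back-off (for example Katz back-off or Kneser-Ney) for that reason.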
ASR: Now
- Research:
  - Discriminative training
    - Acoustic parameter projections to discriminate between the correct answers and competitors
  - Robust recognition
    - Far-field microphones
    - Blind source separation
  - Out-of-vocabulary words
  - Unsupervised training
TTS: Now
- Industry
  - Building custom voices (and your voice)
  - Multilingual on small devices
    - E.g. for GPS navigation across Europe
  - Easy methods to build new languages
TTS: Now
- Research
  - Improving statistical synthesis
  - Rapid support in new languages
  - Emotional speech synthesis
  - Automatic building of voices from data
    - Without any human intervention
  - Synthesis beyond the sentence
    - Synthesis with more text analysis
Speech to Speech Translation
- Industry
  - One-way systems, domain-limited systems
  - Simple targeted cell phone systems
- Research
  - Two-way systems, large domains
  - One-way lecture/broadcast news
VC and SID: Now
- Voice conversion
  - Cross-lingual voice conversion
  - Emotion/style conversion
  - Conversion without training data
- Speaker ID
  - Accuracy on large data sets (> 1000 speakers)
  - Cross-channel/cross-language ID
  - More information in ID (prosody, vocabulary)
CALL: Now
- Industry
  - Pronunciation training
  - Scenario practice
- Research
  - Game-based tools
  - Measuring educational contribution
Speech Processing Future
- Hard challenges (PhD topics and beyond)
  - All on the research side
  - But maybe in research labs
Speech Reco without Speech
- Using other modalities
  - Lip movement, muscle movement
- Silent speech
  - No generated audio
  - Just think about the words
- Gesture recognition
Conversational Systems
- Participant in a meeting
- True conversational speech
  - Appropriate non-word speech generation
  - Know when to speak, when to laugh, when to listen
  - Appropriate conversational timing
  - Able to interrupt when it has something to say
- Have something to say
Summaries and Discussions
- Describe a paper/movie/event
  - Appropriate summary
  - Allow questions
  - Know when to use style/emotion
- Not just speech<->text
  - Understand more of the text content
Final Notes
- Don't forget to fill in the Faculty Course Evaluation
- Final homework due
  - Monday 8th, 3:30pm
- Final exam
  - Tuesday 16th, 1pm-4pm