Speech Processing 15-492/18-492
Using Speech with Computers
Alan W Black
August 2008
Overview
- Practical and Theory:
  - Understand concepts, Implement Solutions
- Speech Recognition
  - Speech to text
- Speech Synthesis
  - Text to Speech
- Spoken Dialog Systems
  - Interaction with machines
Course Schedule
- MWF 3:30-4:20
- DH 1117
- Lecturer: Alan W Black (awb@cs.cmu.edu)
- TA: David Huggins (dhuggins@cs.cmu.edu)
- http://www.speech.cs.cmu.edu/15-492/
Course Details
- Three lectures a week
- 4 Homeworks
  - Speech Recognition
  - Speech Synthesis
  - Spoken Dialog System
  - Other
- Final Exam
Homeworks
- (Mostly) Practical
- Build something that talks/can be spoken to
- Software and speech data will be provided
  - Will run on Windows/Linux or OSX
  - Access to Linux servers if required
- Written description of what you did
Schedule Details
- Week 1 (Aug 15th)
  - Applications, Human and Computer Speech Processing
- Week 2-4 (Sep 3rd): Speech Recognition
  - Signal representation, acoustic modeling
  - Language modeling, applications
  - Tuning, evaluation, expectations
Course Details
- Week 5-7 (22nd Sep): Speech Synthesis
  - Text processing, prosody, waveform synthesis
  - Building voices, evaluations, voice conversion
- Week 8 (13th Oct): Multilinguality
  - Supporting new languages efficiently
- Week 9-11 (20th Oct): Dialog Systems
  - VoiceXML, Mixed initiative, barge-in
  - Design, installation and tuning
Course Details
- Week 12 (10th Nov)
  - Speech to Speech translation
  - Language support, tight integration
- Week 13 (17th Nov)
  - Evaluation and expectations
- Week 14 (24th)
  - Speaker ID, Silent Speech, Conversion
  - What still needs to be done
- Week 15 (1st Dec)
  - Exam
Why Speech
- Most natural way to communicate
  - (For Humans)
- Not ideal for everything
  - Graphics and text can be better (sometimes)
  - Doesn’t compress well
  - Hard to search
Compression
- Alice in Wonderland
  - Text
    - 150K uncompressed
    - 43K compressed
  - Speech (2hrs 20mins)
    - 270M uncompressed
    - 600K compressed (mp3, 24KBS)
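The figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes K and M are decimal kilobytes and megabytes, that the uncompressed speech is 16 kHz, 16-bit mono PCM (which the slide does not state, but which lands close to 270M), and that "24KBS" means 24 kbit/s. The helper names `pcm_size_bytes` and `bitrate_size_bytes` are hypothetical. Under that last reading a constant-bitrate file of this length comes to about 25 MB, which does not match the slide's 600K, so that figure may refer to a different setting or unit.

```python
# Figures quoted on the slide (assumed decimal units: K = 1e3 bytes, M = 1e6 bytes).
TEXT_RAW_KB = 150
TEXT_COMPRESSED_KB = 43
SPEECH_RAW_MB = 270
SPEECH_COMPRESSED_KB = 600
DURATION_S = (2 * 60 + 20) * 60  # 2 hrs 20 mins in seconds


def pcm_size_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio. The defaults are assumptions, not from the slide."""
    return duration_s * sample_rate_hz * bytes_per_sample * channels


def bitrate_size_bytes(duration_s, bits_per_second):
    """Size of a constant-bitrate stream of the given duration."""
    return duration_s * bits_per_second // 8


text_ratio = TEXT_RAW_KB / TEXT_COMPRESSED_KB               # text compression ratio
speech_ratio = SPEECH_RAW_MB * 1000 / SPEECH_COMPRESSED_KB  # ratio implied by the slide
speech_vs_text = SPEECH_RAW_MB * 1000 / TEXT_RAW_KB         # raw speech vs raw text

print(f"text compresses about {text_ratio:.1f}x")
print(f"slide's speech figures imply about {speech_ratio:.0f}x")
print(f"raw speech is about {speech_vs_text:.0f}x larger than raw text")
print(f"16 kHz 16-bit mono PCM: {pcm_size_bytes(DURATION_S) / 1e6:.1f} MB")
print(f"24 kbit/s stream: {bitrate_size_bytes(DURATION_S, 24_000) / 1e6:.1f} MB")
```

Whatever the exact codec settings, the point of the slide holds: the same story takes orders of magnitude more storage as audio than as text.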
Searching
- Find all NPR broadcasts mentioning Obama
  - Listen to them all
- From lecture recordings
  - Find all occurrences of “this will be in the exam”
- So listen to it faster …
  - Normal 2x speed
  - 2x 4x 8x
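To make the contrast concrete: once a recording has a time-aligned transcript (for example, from a speech recognizer), finding a phrase is an ordinary text search, whereas without one the only option is to listen, possibly sped up. The sketch below is a minimal illustration under those assumptions. The `Segment` type, the helper names, and the transcript contents are all made up for the example, and the 50-minute duration is taken from the MWF 3:30-4:20 lecture slot.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start_s: float  # start time of the segment, in seconds
    text: str       # recognized words for the segment


def find_phrase(segments: List[Segment], phrase: str) -> List[float]:
    """Return start times of segments containing the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [seg.start_s for seg in segments if needle in seg.text.lower()]


def listening_time_s(duration_s: float, speedup: float) -> float:
    """Time needed to listen to a recording played back at the given speedup."""
    return duration_s / speedup


# Hypothetical transcript of one lecture (illustrative data only).
lecture = [
    Segment(0.0, "Welcome to the first lecture"),
    Segment(312.5, "Pay attention, this will be in the exam"),
    Segment(1840.0, "Again, this will be in the exam"),
]

hits = find_phrase(lecture, "this will be in the exam")
print("phrase found at (seconds):", hits)

LECTURE_S = 50 * 60  # one 50-minute lecture
for speedup in (1, 2, 4, 8):
    minutes = listening_time_s(LECTURE_S, speedup) / 60
    print(f"{speedup}x playback: {minutes:.2f} minutes of listening")
```

Even at 8x playback, listening still takes minutes per lecture, while the text search is effectively instant; the hard part is producing the transcript.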
Eyes/Hands Free
- Interaction when driving
  - Look at screen to see next turnoff
  - “In 200 yards turn right onto Murray Ave.”
- Blind users/Assistive technology
  - Text isn’t very useful
- Alerts
  - “Will self-destruct in 10 seconds” vs blinking light
- Telephone dialog systems
Speech Applications
- Command and Control
- Information Agents
- Speech to Speech Translation
- Speech summarization
  - Lecture or Meeting summarization
- Transcription/Dictation
- Speaker Identification
  - emotion/dialect/language
- Language Learning
“Hot” Commercial Applications
- Location-based services:
  - Yahoo GO
  - Google Maps
  - Microsoft Live Search
- All phone/PDA based
- Use speech-in
- Directions speech-out
Other Speech uses
- Spoken Dialog Systems
  - Let’s Go Public 412 268 3526 evenings 412 442 2000
  - Pittsburgh bus timetables by phone
- Assistive Technologies
  - Screen readers
  - Augmentative and assistive communication devices
- On-line Personalization
  - Blogcasts (your voice, or appropriate voice)
  - Game character customization
- Talking Heads
  - CMU’s roboceptionist
- Singing Synthesis
  - XML interface for song specification