Speech Processing 15-492/18-492: Using Speech with Computers (Alan W Black)

SLIDE 1

Speech Processing 15-492/18-492

Using Speech with Computers
Alan W Black
August 2008

SLIDE 2

Overview

  • Practical and Theory:
    • Understand concepts, Implement Solutions
  • Speech Recognition
    • Speech to text
  • Speech Synthesis
    • Text to Speech
  • Spoken Dialog Systems
    • Interaction with machines

SLIDE 3

Course Schedule

  • MWF 3:30-4:20
  • DH 1117
  • Lecturer: Alan W Black (awb@cs.cmu.edu)
  • TA: David Huggins (dhuggins@cs.cmu.edu)
  • http://www.speech.cs.cmu.edu/15-492/

SLIDE 4

Course Details

  • Three lectures a week
  • 4 Homeworks
    • Speech Recognition
    • Speech Synthesis
    • Spoken Dialog System
    • Other
  • Final Exam

SLIDE 5

Homeworks

  • (Mostly) Practical
  • Build something that talks/can be spoken to
  • Software and speech data will be provided
    • Will run on Windows/Linux or OSX
    • Access to Linux servers if required
  • Written description of what you did

SLIDE 6

Schedule Details

  • Week 1 (Aug 15th)
    • Applications, Human and Computer Speech Processing
  • Week 2-4 (Sep 3rd) Speech Recognition
    • Signal representation, acoustic modeling
    • Language modeling, applications
    • Tuning, evaluation, expectations

SLIDE 7

Course Details

  • Week 5-7 (22nd Sep) Speech Synthesis
    • Text processing, prosody, waveform synthesis
    • Building voices, evaluations, voice conversion
  • Week 8 (13th Oct) Multilinguality
    • Supporting new languages efficiently
  • Week 9-11 (20th Oct) Dialog Systems
    • VoiceXML, Mixed initiative, barge-in
    • Design, installation and tuning.

SLIDE 8

Course Details

  • Week 12 (10th Nov)
    • Speech to Speech translation
    • Language support, tight integration
  • Week 13 (17th Nov)
    • Evaluation and expectations
  • Week 14 (24th)
    • Speaker ID, Silent Speech, Conversion
    • What still needs to be done.
  • Week 15 (1st Dec)
    • Exam

SLIDE 9

Why Speech

  • Most natural way to communicate
    • (For Humans)
  • Not ideal for everything
    • Graphics and text can be better (sometimes)
    • Doesn’t compress well
    • Hard to search

SLIDE 10

Compression

  • Alice in Wonderland
    • Text
      • 150K uncompressed
      • 43K compressed
    • Speech (2hrs 20mins)
      • 270M uncompressed
      • 600K compressed (mp3, 24KBS)
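
The speech sizes above can be sanity-checked with a little arithmetic. The sketch below is not from the slides: it assumes 16 kHz, 16-bit, mono audio for the uncompressed recording and reads "24KBS" as a constant 24 kbit/s. Under those assumptions the raw size comes out close to the 270M quoted, while a 24 kbit/s encoding of the same duration comes out near 25 MB, so the 600K figure does not follow from these assumptions and the settings behind it are not given here.

```python
# Back-of-the-envelope audio sizes for a 2 hr 20 min recording.
# Assumptions (not stated on the slide): 16 kHz sampling, 16-bit samples,
# mono, and a constant-bitrate encoder at 24 kbit/s.

DURATION_S = (2 * 60 + 20) * 60  # 2 hrs 20 mins expressed in seconds


def pcm_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2, channels=1):
    """Size in bytes of uncompressed PCM audio."""
    return duration_s * sample_rate_hz * bytes_per_sample * channels


def cbr_bytes(duration_s, bitrate_bps):
    """Size in bytes of a constant-bitrate stream (8 bits per byte)."""
    return duration_s * bitrate_bps // 8


raw = pcm_bytes(DURATION_S)
mp3 = cbr_bytes(DURATION_S, 24_000)

print(f"uncompressed PCM: {raw / 1e6:.1f} MB")
print(f"24 kbit/s stream: {mp3 / 1e6:.1f} MB")
print(f"text compression ratio (150K -> 43K): {150 / 43:.1f}x")
```

Either way, the compressed speech stays far larger than the 43K of compressed text carrying the same words.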

SLIDE 11

Searching

  • Find all NPR broadcasts mentioning Obama
    • Listen to them all
  • From lecture recordings
    • Find all occurrences of “this will be in the exam”
  • So listen to it faster …
    • Normal 2x speed
    • 2x 4x 8x
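
To give a feel for why faster playback only partly helps, here is a rough sketch of the listening time involved. The numbers are assumptions for illustration: 50-minute lectures (the 3:30-4:20 slot), three per week, over 14 teaching weeks.

```python
# Hours needed to listen to a semester of lecture recordings at
# different playback speeds.
# Assumptions: 50-minute lectures, three per week, 14 teaching weeks.

LECTURE_MIN = 50
LECTURES = 3 * 14


def listening_hours(speedup):
    """Total listening time in hours at the given playback speed-up."""
    return LECTURE_MIN * LECTURES / speedup / 60


for speedup in (1, 2, 4, 8):
    print(f"{speedup}x: {listening_hours(speedup):.2f} hours")
```

Even at 8x, finding one phrase by ear still takes hours, whereas searching a text transcript of the same lectures is effectively instant.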

SLIDE 12

Eyes/Hands Free

  • Interaction when driving
    • Look at screen to see next turnoff
    • “In 200 yards turn right onto Murray Ave.”
  • Blind users/Assistive technology
    • Text isn’t very useful
  • Alerts
    • “Will self-destruct in 10 seconds” vs blinking light
  • Telephone dialog systems

SLIDE 13

Speech Applications

  • Command and Control
  • Information Agents
  • Speech to Speech Translation
  • Speech summarization
    • Lecture or Meeting summarization
  • Transcription/Dictation
  • Speaker Identification
    • emotion/dialect/language
  • Language Learning

SLIDE 14

“Hot” Commercial Applications

  • Location-based services:
    • Yahoo GO
    • Google Maps
    • Microsoft Live Search
  • All phone/PDA based
    • Use speech-in
    • Directions speech-out

SLIDE 15

Other Speech uses

  • Spoken Dialog Systems
    • Let’s Go Public 412 268 3526 evenings 412 442 2000
    • Pittsburgh bus timetables by phone
  • Assistive Technologies
    • Screen readers
    • Augmentative and assistive communication devices
  • On-line Personalization
    • Blogcasts (your voice, or appropriate voice)
    • Game character customization
  • Talking Heads
    • CMU’s roboceptionist
  • Singing Synthesis
    • XML interface for song specification

SLIDE 16