Speech Processing 15-492/18-492 Using Speech with Computers Alan W Black


SLIDE 1

Speech Processing 15-492/18-492

Using Speech with Computers Alan W Black August 2008

SLIDE 2

Overview

Practical and Theory:

Understand concepts, Implement Solutions

Speech Recognition

Speech to text

Speech Synthesis

Text to Speech

Spoken Dialog Systems

Interaction with machines
SLIDE 3

Course Schedule

MWF 3:30-4:20

DH 1117

Lecturer: Alan W Black (awb@cs.cmu.edu)

TA: David Huggins (dhuggins@cs.cmu.edu)

http://www.speech.cs.cmu.edu/15-492/

SLIDE 4

Course Details

Three lectures a week

4 Homeworks

Speech Recognition

Speech Synthesis

Spoken Dialog System

Other

Final Exam

SLIDE 5

Homeworks

(Mostly) Practical

Build something that talks/can be spoken to

Software and speech data will be provided

  Will run on Windows/Linux or OSX

  Access to Linux servers if required

Written description of what you did

SLIDE 6

Schedule Details

Week 1 (Aug 15th)

Applications, Human and Computer Speech Processing

Week 2-4 (Sep 3rd) Speech Recognition

Signal representation, acoustic modeling

Language modeling, applications

Tuning, evaluation, expectations

SLIDE 7

Course Details

Week 5-7 (22nd Sep) Speech Synthesis

Text processing, prosody, waveform synthesis

Building voices, evaluations, voice conversion

Week 8 (13th Oct) Multilinguality

Supporting new languages efficiently

Week 9-11 (20th Oct) Dialog Systems

VoiceXML, Mixed initiative, barge-in

Design, installation and tuning.

SLIDE 8

Course Details

Week 12 (10th Nov)

Speech to Speech translation

Language support, tight integration

Week 13 (17th Nov)

Evaluation and expectations

Week 14 (24th)

Speaker ID, Silent Speech, Conversion

What still needs to be done.

Week 15 (1st Dec)

Exam

SLIDE 9

Why Speech

Most natural way to communicate

(For Humans)

Not ideal for everything

Graphics and text can be better (sometimes)

Doesn’t compress well

Hard to search

SLIDE 10

Compression

Alice in Wonderland

Text

  150K uncompressed

  43K compressed

Speech (2hrs 20mins)

  270M uncompressed

  600K compressed (mp3, 24KBS)
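
The sizes above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the uncompressed recording is 16 kHz, 16-bit, mono PCM (the slide does not say) and reads "24KBS" as 24 kilobits per second. The function names are made up for this example.

```python
def pcm_bytes(seconds, sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Size of raw PCM audio in bytes (assumed format: 16 kHz, 16-bit, mono)."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


def encoded_bytes(seconds, kilobits_per_second):
    """Size of a constant-bitrate encoding in bytes."""
    return seconds * kilobits_per_second * 1000 // 8


duration_s = (2 * 60 + 20) * 60       # 2hrs 20mins in seconds
raw = pcm_bytes(duration_s)           # uncompressed speech
mp3 = encoded_bytes(duration_s, 24)   # "24KBS" read as 24 kbit/s (assumption)
text_ratio = 150 / 43                 # text: uncompressed / compressed

print(f"raw speech: {raw / 1e6:.1f} MB")
print(f"24 kbit/s:  {mp3 / 1e6:.1f} MB")
print(f"text compression ratio: {text_ratio:.1f}x")
```

Under these assumptions the raw size comes out close to the 270M on the slide. The 24 kbit/s estimate comes out far larger than 600K, so that figure may assume a different unit or encoding than the one guessed here.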

SLIDE 11

Searching

Find all NPR broadcasts mentioning Obama

Listen to them all

From lecture recordings

Find all occurrences of “this will be in the exam”

So listen to it faster …

Normal 2x speed

2x 4x 8x
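
For contrast with listening to everything, here is a minimal sketch of the same search once a recording has been turned into a time-aligned transcript (for example by a speech recognizer). The transcript format, the sample data, and the `find_phrase` helper are all hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

# Hypothetical time-aligned transcript: (start time in seconds, word).
Transcript = List[Tuple[float, str]]


def find_phrase(transcript: Transcript, phrase: str) -> List[float]:
    """Return the start time of every occurrence of `phrase`."""
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize case and strip simple punctuation before comparing.
    words = [w.lower().strip('.,!?"') for _, w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


# Made-up example data, not taken from any real lecture.
lecture = [
    (12.0, "Remember"), (12.4, "this"), (12.6, "will"), (12.8, "be"),
    (12.9, "in"), (13.0, "the"), (13.1, "exam."),
    (95.0, "This"), (95.2, "will"), (95.4, "be"), (95.5, "in"),
    (95.6, "the"), (95.8, "exam"), (96.0, "too"),
]

print(find_phrase(lecture, "this will be in the exam"))
```

The returned times are positions to jump to in the audio. The search is only as good as the transcript it runs over.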

SLIDE 12

Eyes/Hands Free

Interaction when driving

Look at screen to see next turnoff

“In 200 yards turn right onto Murray Ave.”

Blind users/ Assistive technology

Text isn’t very useful

Alerts

“Will self-destruct in 10 seconds” vs blinking light

Telephone dialog systems

SLIDE 13

Speech Applications

Command and Control

Information Agents

Speech to Speech Translation

Speech summarization

Lecture or Meeting summarization

Transcription/Dictation

Speaker Identification

emotion/dialect/language

Language Learning

SLIDE 14

“Hot” Commercial Applications

Location-based services:

Yahoo GO

Google Maps

Microsoft Live Search

All phone/PDA based

Use speech-in

Directions speech-

SLIDE 15

Other Speech uses

Spoken Dialog Systems

Let’s Go Public 412 268 3526 evenings 412 442 2000

Pittsburgh bus timetables by phone

Assistive Technologies

Screen readers

Augmentative and assistive communication devices

On-line Personalization

Blogcasts (your voice, or appropriate voice)

Game character customization

Talking Heads

CMU’s roboceptionist

Singing Synthesis

XML interface for song specification

SLIDE 16