Speech Processing 15-492/18-492: Speech Processing Current Topics

Dec 25, 2023

SLIDE 1

Speech Processing 15-492/18-492

Speech Processing: Current Topics and Future Challenges

Commercial and Research

SLIDE 2

Current and Future

What are the hot topics in Speech

What currently works

What could work soon (5-10 years)

What are the industry hot topics

What are the research challenges

SLIDE 3

Spoken Dialog: Now

Industry:

Location based querying

  Google: 411, smartphone

  Microsoft Live Search: smartphone

  Yahoo (Vlingo)

Blackberry, iPhone

  (Owners have money)

How do you make money out of this …

SLIDE 4

Spoken Dialog: Now

Research

Error recovery

Adaptive systems

Rapid deployment

Learning dialog structure from data

SLIDE 5

ASR: Now

Industry

Moving from grammar based to N-gram based

Broadcast news transcription of IR

Robust speech recognition:

  In car, outside, in noisy office

LM adaptation from other sources

  Using click through and search queries

Pronunciation variants (“wrong” ones too)

Medical transcription

SLIDE 6

ASR: Now

Research:

Discriminative training

  Acoustic parameter projections to discriminate between the correct answers and competitors

Robust recognition

  Far field microphones

  Blind source separation

Out of vocabulary words

Unsupervised training

SLIDE 7

TTS: Now

Industry

Building custom voices (and your voice)

Multilingual on small devices

  E.g. for GPS Navigation over Europe

Easy methods to build new languages

SLIDE 8

TTS: Now

Research

Improving statistical synthesis

Rapid support in new languages

Emotional speech synthesis

Automatic building of voices from data

  Without any human intervention

Synthesis beyond the sentence

  Synthesis with more text analysis

SLIDE 9

Speech to Speech Translation

Industry

One way systems, domain limited systems

Simple targeted cell phone systems

Research

Two way systems, large domains

One way lecture/broadcast news

SLIDE 10

VC and SID: Now

Voice conversion

Cross Lingual Voice Conversion

Emotion/style conversion

Conversion without training data

Speaker ID

Accuracy on large data sets (> 1000 speakers)

Cross channel/language ID

More information in ID (prosody, vocab)

SLIDE 11

CALL: Now

Industry

Pronunciation training

Scenario practicing

Research

Game based tools

Measuring educational contribution

SLIDE 12

Speech Processing Future

Hard challenges (PhD topics and beyond)

All on the research side

But maybe in Research Labs

SLIDE 13

Speech Reco without Speech

Using other modalities

Lip movement, muscle movement

Silent speech

  No generated audio

  Just think about the words

Gesture recognition

SLIDE 14

Conversational Systems

Participant in a meeting

True conversational speech

  Appropriate non-word speech generation

  Know when to speak, when to laugh, when to listen

  Appropriate timing conversation

  Able to interrupt when having something to say

Have something to say

SLIDE 15

Summaries and Discussions

Describe a paper/movie/event

Appropriate summary

Allow questions

Know when to use style/emotion

Not just speech<->text

  Understand more of the text content

SLIDE 16

Final Notes

Don’t forget to fill in Faculty Course Evaluation

Final Homework due

Monday 8th 3:30pm

Final exam

Tuesday 16th 1pm