Speech Processing 15-492/18-492: Speech Recognition Grammars and Other ASR Techniques



SLIDE 1

Speech Processing 15-492/18-492

Speech Recognition Grammars and Other ASR Techniques

SLIDE 2

But not just acoustics

  • Not all phones are equi-probable
  • Find the word sequence that maximizes the probability
  • Using Bayes’ Law
  • Combine models
      – Use HMMs to provide the acoustic probabilities
      – Use a language model to provide the word-sequence probabilities
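The maximization on this slide is conventionally written as follows (a standard formulation of the ASR decoding objective, not copied from the lecture):

```latex
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid O)
        = \operatorname*{arg\,max}_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \operatorname*{arg\,max}_{W} P(O \mid W)\,P(W)
```

Here O is the acoustic observation sequence, P(O | W) comes from the HMM acoustic model, and P(W) from the language model; P(O) does not depend on W, so it can be dropped from the maximization.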

SLIDE 3

Beyond n-grams

  • Trigram language models
  • Good for general ASR
  • More targeted models for dialog systems
  • Look for more structure

SLIDE 4

Formal Language Theory

  • Chomsky Hierarchy
  • Finite State Machines
  • Context Free Grammars
  • Context Sensitive Grammars
  • Generalized Rewrite Rules/Turing Machines
  • As LM or as understanding mechanism
  • Folded into the ASR or only run on its output

SLIDE 5

Finite State Machines

  • A trigram model is a word^2-state FSM
  • FSM for greeting:

  [FSM diagram: “Hello” | “Good” → (“Morning” | “Afternoon”)]
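The greeting FSM on this slide can be sketched as a transition table in a few lines of Python (the state names and representation are illustrative assumptions, not from the lecture):

```python
# Transition table for the greeting FSM: (state, word) -> next state.
GREETING_FSM = {
    ("start", "hello"): "end",
    ("start", "good"): "tod",        # a time-of-day word must follow
    ("tod", "morning"): "end",
    ("tod", "afternoon"): "end",
}

def fsm_accepts(words):
    """Return True if the word sequence is accepted by the greeting FSM."""
    state = "start"
    for w in words:
        key = (state, w.lower())
        if key not in GREETING_FSM:
            return False                 # no arc with this label: reject
        state = GREETING_FSM[key]
    return state == "end"                # must finish in the accepting state
```

For example, `fsm_accepts(["Good", "Morning"])` is accepted while `["Good"]` alone is not, because the machine halts in a non-final state.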

SLIDE 6

Finite State Grammar

  Sentence -> Start Greeting End
  Greeting -> "Hello"
  Greeting -> "Good" TOD
  TOD      -> "Morning"
  TOD      -> "Afternoon"

SLIDE 7

Context Free Grammar

  X -> Y Z
  Y -> "Terminal"
  Y -> NonTerminal NonTerminal NonTerminal NonTerminal
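A rule set in this style can be recognized with a small top-down search. The sketch below encodes the greeting grammar from the previous slide (the rule table mirrors that slide; the code itself is not from the lecture and assumes no left-recursive rules):

```python
# Hypothetical top-down recognizer. Terminals are quoted strings;
# non-terminals are keys into RULES, each mapping to a list of expansions.
RULES = {
    "Sentence": [["Greeting"]],
    "Greeting": [['"hello"'], ['"good"', "TOD"]],
    "TOD": [['"morning"'], ['"afternoon"']],
}

def parse(symbols, words):
    """Return True if `words` can be derived from the symbol sequence."""
    if not symbols:
        return not words                       # success only if input consumed
    first, rest = symbols[0], symbols[1:]
    if first.startswith('"'):                  # terminal: must match next word
        return bool(words) and words[0] == first.strip('"') \
            and parse(rest, words[1:])
    return any(parse(expansion + rest, words)  # non-terminal: try each rule
               for expansion in RULES[first])

def cfg_accepts(sentence):
    return parse(["Sentence"], sentence.lower().split())
```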

SLIDE 8

JSGF

  • Simple grammar formalism for ASR
  • Standard for writing ASR grammars
  • Actually finite state
  • http://www.w3.org/TR/jsgf
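The greeting grammar from the earlier slides might be written in JSGF roughly as follows (a sketch of the syntax; the grammar name and rule names are illustrative):

```jsgf
#JSGF V1.0;
grammar greeting;

public <greeting> = hello | good <tod>;
<tod> = morning | afternoon;
```

`public` marks the rule exposed to the recognizer; `<tod>` is a private helper rule, and `|` separates alternatives.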

SLIDE 9

Finite State Machines

  • Deterministic
      Each arc leaving a state has a unique label
      There always exists a deterministic machine representing a non-deterministic one
  • Minimal
      There exists an FSM with fewer (or equal) states that accepts the same language

SLIDE 10

Probabilistic FSMs

  • Each arc has a label and a probability
  • Collect probabilities from data
  • Can do smoothing like n-grams
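Attaching probabilities to the greeting FSM's arcs gives a model that scores word sequences instead of just accepting them. A minimal sketch (the arc probabilities here are made-up examples, not estimates from data):

```python
import math

# Arcs: state -> {word: (next_state, probability)}. Probabilities out of
# each state sum to 1; in practice they would be estimated from a corpus.
ARCS = {
    "start": {"hello": ("end", 0.6), "good": ("tod", 0.4)},
    "tod":   {"morning": ("end", 0.7), "afternoon": ("end", 0.3)},
}

def log_prob(words):
    """Log probability of a word sequence under the PFSM, or None if rejected."""
    state, lp = "start", 0.0
    for w in words:
        arc = ARCS.get(state, {}).get(w.lower())
        if arc is None:
            return None                  # no such arc: sequence rejected
        state, p = arc
        lp += math.log(p)                # multiply probabilities in log space
    return lp if state == "end" else None
```

Summing in log space avoids underflow on long sequences; smoothing would reserve some probability mass on each state for unseen words, just as with n-grams.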

SLIDE 11

Natural Language Processing

  • Probably mildly context sensitive
      i.e. you need context sensitive rules
  • But if we only accept context free
      Probably OK
  • If we only accept finite state
      Probably OK too

SLIDE 12

Writing Grammars for Speech

  • What do people say?
  • No, what do people *really* say!
  • Write examples:
      Please, I’d like a flight to Boston
      I want to fly to Boston
      What do you have going to Boston
      What about Boston
      Boston
  • Write rules grouping things together

SLIDE 13

Ignore the unimportant things

  • I’m terribly sorry but I would greatly appreciate if you might be able to help me find an acceptable flight to Boston.
  • I, I wanna want to go to ehm Boston.

SLIDE 14

What do people really say

  • A: see who else will somebody else important all the {mumble} the whole school are out for a week
  • B: oh really
  • A: {lipsmack} {breath} yeah
  • B: okay {breath} well when are you going to come up then
  • A: um let’s see well I guess I I could come up actually anytime
  • B: okay well how about now
  • A: now
  • B: yeah
  • A: have to work tonight –laugh–

SLIDE 15

Class based language models

  • Conflate all words in the same class
  • Cities, names, numbers, etc.
  • Can be automatic or designed
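The idea above can be sketched as a preprocessing step: every member of a class is mapped to a single class token before counting n-grams, so probability mass is shared across the whole class (the word list and class name below are illustrative assumptions):

```python
# Map class members to one shared token before n-gram counting.
CITIES = {"boston", "pittsburgh", "denver"}

def to_classes(words):
    """Replace class-member words with their class label."""
    return ["CITY" if w.lower() in CITIES else w.lower() for w in words]
```

After this mapping, a model trained on "fly to Boston" also assigns probability to "fly to Denver": the n-gram probability is over CITY, and P(word | CITY) is estimated (or set uniform) within the class.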

SLIDE 16

Adaptive Language Models

  • Update with new news stories
  • Update your language model every day
  • Update your language model with daily use
  • Using user-generated data (if ASR is good)

SLIDE 17

Combining models

  • Use “background” model
      General trigram model
  • Use specific model
      Grammar based
      Very localized
  • Combine
      Interpolated (just a weight factor)
      More elaborate combinations
          Maximum entropy models
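The interpolated combination above is just a weighted sum of the two models' probabilities. A minimal sketch (the weight and the two probability functions are placeholders, not values from the lecture):

```python
def interpolate(p_specific, p_background, lam=0.7):
    """Mix a specific and a background LM with a single weight factor.

    Each argument is a function (word, history) -> probability; lam is the
    weight on the specific model. Returns a function of the same shape.
    """
    def p(word, history):
        return lam * p_specific(word, history) + (1 - lam) * p_background(word, history)
    return p
```

Since both component distributions sum to 1 over the vocabulary, so does the mixture; lam is typically tuned on held-out data.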

SLIDE 18

Vocabulary size

  • Command and control: < 100 words, grammar based
  • Simple dialog: < 1000 words, grammar/trigram
  • Complex dialog: < 10K words, trigram (some grammar for control)
  • Dictation: < 64K words, trigram
  • Broadcast News: 256K plus, trigram (and lots of other possibilities)

SLIDE 19

Homework 1

  • Build a speech recognition system
      An acoustic model
      A pronunciation lexicon
      A language model
  • Note it takes time to build
  • What is your initial WER?
  • How did you improve it?
  • Submitted by 3:30pm Monday 29th Sep

SLIDE 20

WFSTs