Speech Recognition Results Confidence Measures, Hossein Ajorloo


May 24, 2023


SLIDE 1

What I’m Going to Talk About Introduction CM Methods Final remarks

Speech Recognition Results Confidence Measures

Hossein Ajorloo

Computer Engineering Department Sharif University of Technology, Tehran, Iran

Prof.:

Dr. H. Sameti

Hossein Ajorloo Speech Recognition Results Confidence Measures

SLIDE 2

What I’m Going to Talk About Introduction CM Methods Final remarks

Outline

1. What I’m Going to Talk About
2. Introduction
3. CM Methods
   - CM as Combination of Predictor Features
   - CM as Posterior Probability
   - CM as Utterance Verification
4. Final remarks


SLIDE 3


What I’m Going to Talk About

In speech recognition, confidence measures (CM) are used to evaluate the reliability of recognition results. In this survey, I summarize most of the research on confidence measures carried out over the past 10–12 years and present these approaches in three major categories:

- CM as a combination of predictor features
- CM as a posterior probability
- CM as utterance verification


SLIDE 6


What’s a CM?

Motivation for Defining CM
ASR performance usually degrades dramatically in real-world conditions because of ambient noise, speaker variation, channel distortion, etc. The ability to evaluate the reliability of recognition results is therefore regarded as a crucial technique for making an ASR system more useful and intelligent in many practical applications.

Definition of CM
A score (preferably between 0 and 1) that indicates the reliability of any recognition decision made by an ASR system.


SLIDE 9


Predictor Features

In the literature, a large portion of CM-related work searches for a predictor feature (or a set of features) that helps distinguish correctly recognized results from recognition errors. All predictor features are then combined in some way to generate a single score that indicates the correctness of the recognition decision.

Some common predictor features:

- Pure normalized likelihood score related
- N-best related
- Acoustic stability
- Hypothesis density
- Duration related
- Language model (LM) related
- Parsing related
- Posterior probability related
- Log-likelihood-ratio related
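To make two of the listed feature types concrete, here is a minimal Python sketch. It assumes a hypothetical N-best format of (word list, total log score) pairs and an already computed acoustic log-likelihood for the word segment; the function names are illustrative, not taken from any toolkit, and the N-best frequency ignores word timing, which real systems usually take into account.

```python
from typing import List, Tuple

# Hypothetical N-best format: (hypothesized word sequence, total log score).
NBest = List[Tuple[List[str], float]]


def nbest_word_frequency(word: str, nbest: NBest) -> float:
    """N-best related feature: fraction of N-best hypotheses containing `word`.

    Word timing is ignored here for simplicity.
    """
    if not nbest:
        raise ValueError("empty N-best list")
    hits = sum(1 for words, _ in nbest if word in words)
    return hits / len(nbest)


def frame_normalized_loglik(acoustic_loglik: float, num_frames: int) -> float:
    """Likelihood-related feature: acoustic log-likelihood per frame of the word segment."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return acoustic_loglik / num_frames


# Illustrative 4-best list for one utterance (made-up scores).
nbest = [
    (["call", "home"], -120.0),
    (["call", "hole"], -123.5),
    (["tall", "home"], -125.0),
    (["call", "home", "now"], -126.0),
]
print(nbest_word_frequency("call", nbest))  # 3 of 4 hypotheses contain "call" -> 0.75
print(nbest_word_frequency("hole", nbest))  # 1 of 4 -> 0.25
print(frame_normalized_loglik(-450.0, 50))  # -9.0
```

A word that survives in most competing hypotheses is more likely to be correct, which is the intuition behind N-best related features.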


SLIDE 12


Predictor Features (Cont.)

Combination of predictor features

An ideal predictor feature should provide strong information to separate correctly recognized words from misrecognitions, i.e., the distributions of the feature over the two classes should overlap only slightly. To improve performance, several different predictor features are usually combined, and many different combination models have been reported in the literature. In general, however, a combination improves overall performance substantially only when the individual components are statistically independent. This is clearly not the case for the predictor features above, many of which are derived from the same decoder scores.
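One common way to combine predictor features into a single score in (0, 1) is logistic regression. The slide does not prescribe a particular model, so treat this as one illustrative choice among the many reported. The sketch below uses only the Python standard library and plain batch gradient ascent; the tiny training set, the feature choice, and the hyperparameters `lr` and `epochs` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: Sequence[float], x: Sequence[float]) -> float:
    """Weighted sum of features plus bias (bias stored as the last weight)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_logistic(features: List[Sequence[float]], labels: List[int],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient ascent on the log-likelihood.

    labels: 1 = word correctly recognized, 0 = misrecognized.
    """
    dim = len(features[0])
    w = [0.0] * (dim + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            err = y - sigmoid(logit(w, x))  # derivative of the log-likelihood w.r.t. the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


# Made-up training data: [N-best word frequency, normalized likelihood score].
X = [[0.9, 1.5], [0.8, 1.0], [1.0, 2.0], [0.3, -1.0], [0.2, -0.5], [0.4, -1.5]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print(sigmoid(logit(w, [0.85, 1.2])))   # combined CM for a likely-correct word
print(sigmoid(logit(w, [0.25, -1.0])))  # combined CM for a likely error
```

The resulting score can be thresholded to accept or reject each word. Note that the two toy features move together, which mirrors the correlation problem described on the slide.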


SLIDE 18


Main Idea

Using MAP for CM
It is well known that the posterior probability p(W|X) in the standard maximum a posteriori (MAP) decision rule is a good candidate for a CM in speech recognition, since it is an absolute measure of how good the decision is.

The Basic Problem
It is very hard to estimate the posterior probability precisely because of the normalization term p(X) in its denominator. In practice, many different approaches have been proposed to approximate it, ranging from simple filler-based methods to complex word-graph-based approaches.


SLIDE 20


Theoretical View

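MAP Method

$$\hat{W} = \arg\max_{W \in \Sigma} p(W|X) = \arg\max_{W \in \Sigma} \frac{p(X|W)\,p(W)}{p(X)} = \arg\max_{W \in \Sigma} p(X|W)\,p(W)$$

where Σ is the set of all candidate word sequences. The last step holds because p(X) does not depend on W, so the decoder can ignore it; a CM based on p(W|X), however, needs p(X) explicitly.

p(X) in theory

$$p(X) = \sum_{H} p(X, H) = \sum_{H} p(H)\,p(X|H)$$

where the sum runs over all possible hypotheses H, which cannot be enumerated in practice.

Two families of methods approximate p(X):
- Filler-based methods, which try to calculate p(X) from a set of general filler or background models
- Lattice-based methods, which calculate p(X), and then p(W|X) in turn, from a word lattice or graph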
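The sum over H can be approximated by restricting it to an N-best list, a simpler relative of the lattice-based methods above (lattice methods instead run forward-backward over a word graph). This Python sketch assumes each hypothetical N-best entry carries the combined log score log p(X|H) + log p(H); the `scale` parameter is an assumed knob for flattening overly peaked posteriors and is not part of the slide's formulation.

```python
import math
from typing import List, Tuple

# Hypothetical N-best format: (word sequence, log p(X|H) + log p(H)).
NBest = List[Tuple[Tuple[str, ...], float]]


def log_sum_exp(values: List[float]) -> float:
    """Compute log(sum(exp(v))) without overflow or underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sentence_posteriors(nbest: NBest, scale: float = 1.0) -> List[float]:
    """Approximate p(H|X) for each N-best entry.

    p(X) is approximated by summing p(H) p(X|H) over the N-best list only,
    instead of over all possible hypotheses H.
    """
    scaled = [scale * score for _, score in nbest]
    # Normalizer over the (scaled) scores; equals the N-best estimate of log p(X) when scale == 1.
    log_px = log_sum_exp(scaled)
    return [math.exp(s - log_px) for s in scaled]


def word_posterior(word: str, nbest: NBest, scale: float = 1.0) -> float:
    """Sum the posteriors of all N-best hypotheses containing `word` (timing ignored)."""
    posts = sentence_posteriors(nbest, scale)
    return sum(p for (words, _), p in zip(nbest, posts) if word in words)


# Illustrative 3-best list with made-up scores.
nbest = [
    (("call", "home"), -120.0),
    (("call", "hole"), -121.0),
    (("tall", "home"), -123.0),
]
print(sentence_posteriors(nbest))
print(word_posterior("call", nbest))
```

Because the normalizer only covers the N-best list, these posteriors are biased upward when the list misses much of the probability mass; larger lists or full lattices reduce that bias.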


SLIDE 24


Utterance Verification (UV)

For a given speech segment X, assume that an ASR system recognizes it as word W, which is represented by an HMM λ_W. Utterance verification is a post-processing stage that examines the reliability of this hypothesized recognition result. In the UV framework, two complementary hypotheses are first formulated, the null hypothesis H0 and the alternative hypothesis H1:

- H0: X is correctly recognized and truly comes from λ_W
- H1: X is wrongly classified and does NOT come from λ_W


SLIDE 29


Utterance Verification (Cont.)

Then we test H0 against H1 to decide whether to accept or reject the recognition result. By the Neyman-Pearson lemma, when both hypotheses are simple (their densities are fully known), the optimal test is a likelihood ratio test (LRT):

$$\mathrm{LRT} = \frac{p(X|H_0)}{p(X|H_1)} \;\underset{H_1}{\overset{H_0}{\gtrless}}\; \tau$$

i.e., accept H0 if the ratio is at least the threshold τ, and accept H1 otherwise. LRT-based utterance verification provides a sound theoretical formulation for CM problems in ASR. The LRT score can be transformed into a CM by a monotonic one-to-one mapping function.
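A minimal Python sketch of the LRT decision and one possible monotonic mapping to a CM. It assumes the log-likelihoods of X under λ_W and under some stand-in model for H1 (for example an anti-model or filler model) are already available, since in practice p(X|H1) is unknown and must be approximated. Normalizing by the number of frames is a common practical choice rather than part of the formula above, and the sigmoid with its `slope` and `offset` parameters is an illustrative assumption.

```python
import math


def log_lrt(loglik_h0: float, loglik_h1: float, num_frames: int) -> float:
    """Frame-normalized log likelihood ratio: (log p(X|H0) - log p(X|H1)) / T.

    loglik_h0: log-likelihood of X under the hypothesized word HMM lambda_W.
    loglik_h1: log-likelihood under a stand-in for H1 (assumed here to be an
               anti-model or filler model, since the true H1 density is unknown).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (loglik_h0 - loglik_h1) / num_frames


def accept(llr: float, log_tau: float) -> bool:
    """LRT decision in the log domain: accept H0 (keep the result) iff llr >= log(tau)."""
    return llr >= log_tau


def lrt_to_cm(llr: float, slope: float = 1.0, offset: float = 0.0) -> float:
    """Map the log LRT into (0, 1) with a sigmoid, a monotonic one-to-one function.

    `slope` and `offset` are hypothetical parameters, normally tuned on held-out data.
    """
    if slope <= 0:
        raise ValueError("slope must be positive to keep the mapping increasing")
    z = slope * (llr - offset)
    if z >= 0:  # numerically stable logistic
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


llr = log_lrt(-400.0, -430.0, 60)  # 30 / 60 -> 0.5
print(llr, accept(llr, log_tau=0.0), lrt_to_cm(llr, slope=4.0))
```

Setting `offset` equal to log τ makes the accept/reject boundary coincide with CM = 0.5, so thresholding the CM at 0.5 reproduces the LRT decision.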


SLIDE 33


Open Problems

Despite a large amount of past research, I believe that robust speech recognition and confidence measures will remain two of the most active and influential research topics in the speech community for the foreseeable future.

- Studying the capabilities and limitations of CM

