SLIDE 1

Foundations of Language Science and Technology: Statistical Language Models

Dietrich Klakow

SLIDE 2

Using Language Models

SLIDE 3

How Speech Recognition works

[Diagram] The speech signal passes through feature extraction, producing a stream of feature vectors A. The search combines the acoustic model P(A|W) with the language model P(W):

$$\hat{W} = \operatorname*{argmax}_W \big[\, P(A\mid W)\; P(W) \,\big]$$

The output is the recognized word sequence Ŵ.
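
As a minimal sketch of the search step (the hypotheses and scores below are hypothetical; a real recognizer searches a large lattice rather than a short list), the combination of the two models is done in log space:

```python
import math

# Toy illustration of W_hat = argmax_W P(A|W) * P(W).
hypotheses = {
    "recognize speech":   {"p_acoustic": 1e-5, "p_lm": 1e-3},
    "wreck a nice beach": {"p_acoustic": 2e-5, "p_lm": 1e-6},
}

def score(h):
    # Combine acoustic and language model in log space to avoid underflow.
    return math.log(h["p_acoustic"]) + math.log(h["p_lm"])

best = max(hypotheses, key=lambda w: score(hypotheses[w]))
print(best)  # "recognize speech": the language model decides
```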

SLIDE 4

Guess the next word

What's in your hometown newspaper ???

SLIDE 5

Guess the next word

What's in your hometown newspaper today

SLIDE 6

Guess the next word

It's raining cats and ???

SLIDE 7

Guess the next word

It's raining cats and dogs

SLIDE 8

Guess the next word

President Bill ???

SLIDE 9

Guess the next word

President Bill Gates

SLIDE 10

Information Retrieval

  • Language models were introduced to information retrieval in 1998 by Ponte & Croft

[Diagram] A query Q is scored against documents D1 ... D7; the documents are ranked according to P(Q|Di)
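
A minimal sketch of query-likelihood ranking in this spirit, assuming a unigram document model with a simple linear smoothing term (the smoothing weight `lam` and the uniform background are illustrative choices, not the estimator from Ponte & Croft 1998):

```python
from collections import Counter

def p_query_given_doc(query, doc_tokens, vocab_size, lam=0.5):
    """Unigram query likelihood P(Q|D) with linear smoothing."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    p = 1.0
    for q in query:
        # mix the document's relative frequency with a uniform background
        p *= lam * counts[q] / n + (1 - lam) / vocab_size
    return p

docs = {"D1": "the cat sat".split(), "D2": "the dog barked".split()}
query = "cat sat".split()
ranking = sorted(docs, reverse=True,
                 key=lambda d: p_query_given_doc(query, docs[d], vocab_size=6))
print(ranking)  # D1 ranks above D2
```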

SLIDE 11

Measuring the Quality of Language Models

SLIDE 12

Definition of Perplexity

$$PP(w_1 \ldots w_N) = \exp\!\Big(-\frac{1}{N}\sum_{w,h} N(w,h)\,\log P(w\mid h)\Big) = P(w_1 \ldots w_N)^{-1/N}$$

P(w|h): language model. N(w,h): frequency of the sequence (h, w) in some test corpus. N: size of the test corpus.
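
A minimal sketch of the definition, assuming the model is supplied as a function `p(w, h)` over words and histories (the toy data and the uniform model are illustrative):

```python
import math

def perplexity(test_tokens, p, history=1):
    """exp( -1/N * sum_i log P(w_i | h_i) ).

    Summing log-probabilities position by position is the same as
    weighting log P(w|h) by the counts N(w,h) in the definition."""
    n = len(test_tokens)
    log_sum = 0.0
    for i, w in enumerate(test_tokens):
        h = tuple(test_tokens[max(0, i - history):i])
        log_sum += math.log(p(w, h))
    return math.exp(-log_sum / n)

# Hypothetical uniform model over a 4-word vocabulary.
uniform = lambda w, h: 1 / 4
print(perplexity("a b c d a b".split(), uniform))  # -> 4.0 (up to rounding)
```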

SLIDE 13

Interpretation

Calculate perplexity of uniform distribution (white board)
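
A short worked version of the whiteboard calculation: for the uniform distribution P(w|h) = 1/V over a vocabulary of size V,

$$PP = \exp\!\Big(-\frac{1}{N}\sum_{w,h} N(w,h)\,\log\frac{1}{V}\Big) = \exp(\log V) = V,$$

since the counts sum to the corpus size, $\sum_{w,h} N(w,h) = N$. Perplexity can therefore be read as the effective vocabulary size, or average branching factor, of the model.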

SLIDE 14

Perplexity and Word Error Rate

Perplexity and word error rate are correlated within error bars.

SLIDE 15

Estimating the Parameters of a Language Model

SLIDE 16

Goal

  • Minimize perplexity on training data

$$PP_{Train} = \exp\!\Big(-\frac{1}{N_{Train}}\sum_{w,h} N_{Train}(w,h)\,\log P(w\mid h)\Big)$$

SLIDE 17

Define likelihood

L = -log(PP)

$$L = \frac{1}{N_{Train}}\sum_{w,h} N_{Train}(w,h)\,\log P(w\mid h)$$

Minimizing perplexity ⇔ maximizing likelihood

How do we take the normalization constraint Σ_w P(w|h) = 1 into account?

SLIDE 18

Calculating the maximum likelihood estimate (white board)
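
A sketch of the whiteboard derivation: maximize the likelihood subject to Σ_w P(w|h) = 1 for every history h, with one Lagrange multiplier μ_h per history. Setting the derivative to zero,

$$\frac{\partial}{\partial P(w\mid h)}\left[\sum_{w',h'} N_{Train}(w',h')\log P(w'\mid h') - \sum_{h'}\mu_{h'}\Big(\sum_{w'} P(w'\mid h') - 1\Big)\right] = \frac{N_{Train}(w,h)}{P(w\mid h)} - \mu_h = 0,$$

gives P(w|h) = N_Train(w,h)/μ_h; enforcing the constraint forces μ_h = Σ_w N_Train(w,h) = N_Train(h), which yields the estimator on the next slide.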

SLIDE 19

Maximum likelihood estimator

$$P(w\mid h) = \frac{N_{Train}(w,h)}{N_{Train}(h)}$$

What's the problem?
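
A minimal sketch that makes the problem visible (toy corpus; names are illustrative): any event that never occurs in training gets probability zero, so a single unseen bigram drives the test-set likelihood to zero and the perplexity to infinity.

```python
from collections import Counter

train = "the cat sat on the mat".split()
bigrams = Counter(zip(train, train[1:]))
unigrams = Counter(train)

def p_ml(w, h):
    # Maximum likelihood estimate: P(w|h) = N(w,h) / N(h)
    return bigrams[(h, w)] / unigrams[h]

print(p_ml("cat", "the"))  # 0.5: seen in training
print(p_ml("dog", "the"))  # 0.0: unseen, so log P diverges and PP is infinite
```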

SLIDE 20

Backing-off and Smoothing

SLIDE 21

Absolute Discounting

  • See white board
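
The whiteboard version is not reproduced here; as a sketch, one common (interpolated) form of absolute discounting for a history h is

$$P(w\mid h) = \frac{\max\big(N_{Train}(w,h) - d,\; 0\big)}{N_{Train}(h)} + d\,\frac{n_+(h)}{N_{Train}(h)}\;\beta(w)$$

where 0 < d ≤ 1 is the discounting parameter, n_+(h) = |{w : N_Train(w,h) > 0}| counts the distinct words observed after h, and β(w) is the backing-off distribution (for example a unigram model). The mass subtracted from seen events is exactly the mass handed to β.
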
SLIDE 22

Influence of Discounting Parameter

SLIDE 23

Possible further Improvements

SLIDE 24

Linear Smoothing

$$P(w_n \mid w_{n-1}) = \lambda_1\,\frac{N_{Train}(w_{n-1}\,w_n)}{N_{Train}(w_{n-1})} + \lambda_2\,\frac{N_{Train}(w_n)}{N_{Train}} + (1 - \lambda_1 - \lambda_2)\,\frac{1}{V}$$

V: size of the vocabulary
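
A minimal sketch of this interpolation for bigrams (toy counts; the λ values are illustrative and would normally be tuned on held-out data):

```python
from collections import Counter

train = "the cat sat on the mat".split()
V = 1000                      # assumed vocabulary size
N = len(train)
uni = Counter(train)
bi = Counter(zip(train, train[1:]))

def p_linear(w, h, lam1=0.6, lam2=0.3):
    # lam1 * bigram + lam2 * unigram + (1 - lam1 - lam2) * uniform
    p_bi = bi[(h, w)] / uni[h] if uni[h] else 0.0
    p_uni = uni[w] / N
    return lam1 * p_bi + lam2 * p_uni + (1 - lam1 - lam2) / V

print(p_linear("cat", "the"))  # seen bigram: large probability
print(p_linear("dog", "the"))  # unseen bigram: small but nonzero
```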

SLIDE 25

Marginal Backing-Off (Kneser-Ney Smoothing)

  • Dedicated backing-off distributions (sketched below)
  • Usually about 10% to 20% reduction in perplexity
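
A sketch of the idea behind the dedicated backing-off distribution, in its standard Kneser-Ney formulation: back off not to raw unigram frequencies but to continuation counts,

$$\beta_{KN}(w) = \frac{\big|\{v : N_{Train}(v\,w) > 0\}\big|}{\sum_{w'} \big|\{v : N_{Train}(v\,w') > 0\}\big|},$$

i.e. the number of distinct histories that w follows, normalized. A word like "Francisco" may be frequent yet appear almost only after "San", so it receives a small backing-off probability.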

SLIDE 26

Class Language Models

  • Automatically group words into classes
  • Map all words in the language model to classes
  • Dramatic reduction in the number of parameters to estimate
  • Usually used in linear interpolation with a word language model (see the sketch below)
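
A sketch of the usual two-factor form of a class bigram model, where c(w) denotes the automatically learned class of word w (the exact factorization used in the lecture may differ):

$$P(w_n \mid w_{n-1}) = P\big(w_n \mid c(w_n)\big)\; P\big(c(w_n) \mid c(w_{n-1})\big)$$

With C classes and a vocabulary of size V, roughly V + C² parameters replace the V² parameters of a word bigram model, which is the dramatic reduction mentioned above.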

SLIDE 27

Summary

  • How to build a state-of-the-art, plain-vanilla language model:
  • Trigram
  • Absolute discounting
  • Marginal backing-off (Kneser-Ney smoothing)
  • Linear interpolation with a class model