An Integrated Framework for Margin-based Sequential Discriminative Training over Lattices using differenced Maximum Mutual Information (dMMI)



SLIDE 1

Overview Background Unification of Margin-modified MPE and MMI

An Integrated Framework for Margin-based Sequential Discriminative Training over Lattices using differenced Maximum Mutual Information (dMMI)

Erik McDermott - Google Inc. September 14, 2012

SLIDE 2

Overview

◮ Error-weighted training using explicit models of error (MPE/MWE/sMBR, etc.)

◮ Shifting of the loss function: “margin” (MCE, MPE, bMMI)

◮ Make the shift proportional to error.

◮ bMMI (Povey et al. 2008): implicit error model, just use an error-proportional shift.

◮ Extension of “point” use of margin to an integral over a margin interval → proposal of “differenced MMI” (dMMI)

◮ dMMI: margin- & error-dependent loss smoothing/integration

◮ Unifies margin-modified MMI and MPE

◮ More general than MPE, yet allows a simpler implementation using a difference of standard Forward-Backward statistics

◮ Bayesian view & further generalization.

SLIDE 3

Overview Background Unification of Margin-modified MPE and MMI Large-scale discriminative training Minimum Phone Error Margin-based variants; bMMI

Integrated system optimization

SLIDE 4

Non-uniform error for discriminative training

SLIDE 5

Minimum Phone Error (Povey 2002); Decision boundaries

SLIDE 6

MPE as multi-dimensional sigmoid

SLIDE 7

MPE derivative - String picture

SLIDE 8

Modified Forward-Backward for MPE over lattices

SLIDE 9

MPE derivative - Arc picture

SLIDE 10

New approaches based on margin

◮ Intuition: improve generalization by making the training problem “harder”.

◮ “Large-margin MCE” (Yu et al., 2007)

◮ Extension of McDermott & Katagiri (2004)’s Parzen window analysis of MCE → iteratively increase the MCE sigmoid bias term

◮ Applicable to implicit error models:

◮ “Large-margin HMMs” (Sha & Saul, 2007): insertion of fine-grained error (e.g. edit distance) into the margin term

◮ “Boosted MMI” (Povey et al., Saon & Povey, 2008)

◮ Heigold’s unified theory (2008): bring margin to standard MMI/MPE/MCE approaches

SLIDE 11

Linking ASR and Machine Learning

SLIDE 12

Modifying MPE/MMI with margin term

“Boost” likelihoods (Povey & Saon (2008), Heigold (2008)):
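The boosting formula itself was not captured in this transcript; the following reconstruction from the bMMI literature (Povey et al. 2008) is believed consistent with the slide. Here κ is the acoustic scale, A(W, W_r) is the raw accuracy of hypothesis W against reference W_r, and σ is the margin parameter (σ < 0 recovers the bMMI boosting factor −b):

```latex
% Margin-modified (boosted) MMI: each competitor's scaled likelihood is
% multiplied by a margin term proportional to its accuracy A(W, W_r).
F_{\mathrm{bMMI}}^{(\sigma)}(\Lambda) = \sum_r \log
  \frac{p_\Lambda(X_r \mid W_r)^{\kappa} \, P(W_r)}
       {\sum_{W} p_\Lambda(X_r \mid W)^{\kappa} \, P(W) \, e^{\sigma \kappa A(W, W_r)}}
```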

SLIDE 13

Margin-modified MPE
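The slide's formula is not captured; as a hedged reconstruction consistent with the boosted-MMI notation above (κ the acoustic scale, σ the margin parameter, A(W, W_r) the raw accuracy), margin-modified MPE is the expected accuracy under the margin-boosted posterior:

```latex
% Margin-modified MPE: expected accuracy under the boosted posterior.
F_{\mathrm{MPE}}^{(\sigma)}(\Lambda) = \sum_r \sum_{W}
  \frac{p_\Lambda(X_r \mid W)^{\kappa} \, P(W) \, e^{\sigma \kappa A(W, W_r)}}
       {\sum_{W'} p_\Lambda(X_r \mid W')^{\kappa} \, P(W') \, e^{\sigma \kappa A(W', W_r)}}
  \, A(W, W_r)
```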

SLIDE 14

Effect of margin on MPE loss

SLIDE 15

Margin-modified MMI (Povey & Saon, 2008)

SLIDE 16

Effect of margin on MMI loss

SLIDE 17

◮ 2300h Arabic Broadcast News (GALE)

◮ 2000h English conversational telephone speech (CTS)

SLIDE 18

Margin-modified MPE & MMI summary

“Boost” likelihoods (Povey & Saon (2008), Heigold (2008)):

SLIDE 19

Overview Background Unification of Margin-modified MPE and MMI Integrated framework Bayesian view & generalization

dMMI: the “integrated” framework

“Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training” (McDermott & Nakamura, Interspeech 2009)

◮ Mathematical link between margin-modified MPE and MMI

◮ Proposal of “dMMI”

SLIDE 20

MPE is the derivative of modified MMI!
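The derivation on this slide is not captured. A hedged reconstruction, consistent with the boosted-MMI and margin-modified MPE expressions used elsewhere in the deck: differentiating the denominator log-sum of the boosted MMI objective with respect to the margin parameter σ pulls down a factor of κ A(W, W_r) weighted by the boosted posterior, which is exactly (minus κ times) the margin-modified MPE objective:

```latex
% Differentiating the boosted denominator w.r.t. sigma yields the
% posterior-weighted expected accuracy, i.e. margin-modified MPE.
\frac{\partial}{\partial \sigma} \, F_{\mathrm{bMMI}}^{(\sigma)}(\Lambda)
  = -\kappa \, F_{\mathrm{MPE}}^{(\sigma)}(\Lambda)
```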

SLIDE 21

dMMI definition

Using previous result & Fundamental Theorem of Calculus:
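The definition is missing from the transcript; a reconstruction from the published dMMI formulation, consistent with the derivative relation above: integrating the margin-modified MPE loss over a margin interval [σ₁, σ₂] and applying the Fundamental Theorem of Calculus reduces the integral to a difference of two boosted MMI functionals:

```latex
F_{\mathrm{dMMI}}^{(\sigma_1,\sigma_2)}(\Lambda)
  = \frac{1}{\sigma_2 - \sigma_1}
    \int_{\sigma_1}^{\sigma_2} F_{\mathrm{MPE}}^{(\sigma)}(\Lambda)\, d\sigma
  = \frac{F_{\mathrm{bMMI}}^{(\sigma_1)}(\Lambda) - F_{\mathrm{bMMI}}^{(\sigma_2)}(\Lambda)}
         {\kappa \, (\sigma_2 - \sigma_1)}
```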

SLIDE 22

dMMI in practice

Just use “reverse-boosted” denominator lattice as numerator lattice:
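A minimal sketch of this idea, with an N-best list standing in for a lattice and hypothetical scores/accuracies (a real implementation would accumulate Forward-Backward statistics over lattice arcs): the gradient of the dMMI objective with respect to each hypothesis score is a difference of two sets of boosted posteriors, one boosted at σ₂ and one "reverse-boosted" at σ₁.

```python
import math

# Toy stand-in for a lattice: an N-best list, each entry a combined
# (acoustic + LM) log-score and a raw accuracy A(W, W_ref).
# All numbers are hypothetical.
KAPPA = 0.1                       # acoustic scale
HYPS = [                          # (log_score, accuracy)
    (-11.0, 3.0),                 # reference-like hypothesis
    (-10.7, 2.0),
    (-10.5, 1.0),
]

def boosted_posteriors(sigma):
    """Posteriors after boosting each hypothesis by exp(sigma * kappa * A)."""
    logs = [KAPPA * s + sigma * KAPPA * a for s, a in HYPS]
    m = max(logs)                 # log-sum-exp for numerical stability
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

def dmmi_score_gradient(sigma1, sigma2):
    """Gradient of the dMMI objective w.r.t. each hypothesis log-score:
    a difference of the sigma2-boosted and sigma1-('reverse-')boosted
    posteriors, scaled by the margin interval."""
    p1 = boosted_posteriors(sigma1)
    p2 = boosted_posteriors(sigma2)
    return [(b - a) / (sigma2 - sigma1) for a, b in zip(p1, p2)]
```

With σ₁ < 0 < σ₂, the high-accuracy hypothesis gains posterior mass under the σ₂ boost, so its gradient is positive (its score is pushed up), while low-accuracy competitors are pushed down; the gradient entries sum to zero, as the two posterior sets each sum to one.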

SLIDE 23

Approximating MPE

As the margin interval is reduced, dMMI converges to MPE. This property must hold for any correct implementation of bMMI and MPE!
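This convergence claim can be checked numerically on a toy N-best list (hypothetical scores and accuracies): as the interval [−ε, +ε] shrinks, the differenced log-partition quotient approaches the margin-modified MPE value (expected accuracy) at σ = 0.

```python
import math

# Toy N-best list with hypothetical (log_score, accuracy) pairs.
KAPPA = 0.1
HYPS = [(-11.0, 3.0), (-10.7, 2.0), (-10.5, 1.0)]

def log_z(sigma):
    """Log of the boosted summation over hypotheses (denominator functional)."""
    logs = [KAPPA * s + sigma * KAPPA * a for s, a in HYPS]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def mpe(sigma):
    """Margin-modified MPE: expected accuracy under the boosted posterior."""
    logs = [KAPPA * s + sigma * KAPPA * a for s, a in HYPS]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return sum(e / z * a for e, (_, a) in zip(exps, HYPS))

def dmmi(sigma1, sigma2):
    """dMMI as a difference of two boosted log-partition terms."""
    return (log_z(sigma2) - log_z(sigma1)) / (KAPPA * (sigma2 - sigma1))
```

Because dMMI over an interval equals the average of the margin-modified MPE value across that interval, a narrow interval such as (−0.001, 0.001) reproduces mpe(0.0) to high precision, while a wide or off-center interval does not.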

SLIDE 24

Integrated view of discriminative training

SLIDE 25

Leveraging approximated, shifted hinge functions

SLIDE 26

Gradient-based optimization using dMMI

SLIDE 27

dMMI as integral over margin prior

SLIDE 28

dMMI as building block for modeling general margin priors

SLIDE 29

Numerical approximation of arbitrary margin priors

◮ E.g. prior p(σ) = c exp(−c|σ|), used for Minimum Relative Entropy Discrimination (Jebara, 2004)

◮ Here: use the prior in the context of standard HMM-based discriminative training

◮ Approximate the prior using a sum of step functions (cf. Lebesgue integration)
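A sketch of the step-function construction on the same kind of toy N-best list (hypothetical scores; the normalized Laplacian prior (c/2)·exp(−c|σ|) is assumed): the prior-averaged loss ∫ p(σ)·F_MPE(σ) dσ is approximated by a sum of narrow dMMI terms, each weighted by the prior mass of its step, and compared against direct quadrature of the integrand.

```python
import math

# Hypothetical toy N-best list: (log_score, accuracy) pairs.
KAPPA, C = 0.1, 1.0
HYPS = [(-11.0, 3.0), (-10.7, 2.0), (-10.5, 1.0)]

def log_z(sigma):
    logs = [KAPPA * s + sigma * KAPPA * a for s, a in HYPS]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def dmmi(s1, s2):
    return (log_z(s2) - log_z(s1)) / (KAPPA * (s2 - s1))

def mpe(sigma):
    logs = [KAPPA * s + sigma * KAPPA * a for s, a in HYPS]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return sum(e / z * a for e, (_, a) in zip(exps, HYPS))

def laplace(sigma):
    """Normalized Laplacian margin prior (c/2) * exp(-c|sigma|)."""
    return 0.5 * C * math.exp(-C * abs(sigma))

def prior_loss_via_dmmi(lo=-6.0, hi=6.0, steps=200):
    """Step-function approximation: each step contributes
    prior-weight * step-width * dMMI over that step."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s1 = lo + i * width
        total += laplace(s1 + 0.5 * width) * width * dmmi(s1, s1 + width)
    return total

def prior_loss_direct(lo=-6.0, hi=6.0, steps=200):
    """Reference: midpoint quadrature of p(sigma) * F_MPE(sigma)."""
    width = (hi - lo) / steps
    return sum(laplace(lo + (i + 0.5) * width) * mpe(lo + (i + 0.5) * width) * width
               for i in range(steps))
```

Since dMMI over a step equals the average of the margin-modified MPE loss across that step, the two estimates agree up to quadrature error, illustrating how dMMI serves as a building block for arbitrary margin priors.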

SLIDE 30

Building a margin prior using dMMI

SLIDE 31

Summary

◮ MPE explicitly models non-uniform error, e.g. phone or word error including insertions, deletions & substitutions

◮ Margin-based “Boosted MMI” (bMMI):

◮ super-cheap approach for incorporating non-uniform error into the loss function;

◮ however, the objective is still (modified) Mutual Information, not an explicit model of error.

◮ “Differenced MMI” (dMMI) is a similarly cheap alternative that

◮ is explicitly linked to error;

◮ generalizes MPE;

◮ possibly offers better performance (Delcroix et al., ICASSP 2012; Kubo et al., Interspeech 2012);

◮ can be further generalized to define arbitrary margin priors for lattice-based discriminative training.
