An Integrated Framework for Margin-based Sequential Discriminative Training over Lattices using differenced Maximum Mutual Information (dMMI)



  1. An Integrated Framework for Margin-based Sequential Discriminative Training over Lattices using differenced Maximum Mutual Information (dMMI)
     Erik McDermott, Google Inc.
     September 14, 2012

  2. Overview
     ◮ Error-weighted training using explicit models of error (MPE/MWE/sMBR, etc.)
     ◮ Shifting of the loss function: "margin" (MCE, MPE, bMMI)
     ◮ Make the shift proportional to error.
     ◮ bMMI (Povey et al. 2008): implicit error model; just use an error-proportional shift.
     ◮ Extension of the "point" use of margin to an integral over a margin interval → proposal of "differenced MMI" (dMMI)
     ◮ dMMI: margin- and error-dependent loss smoothing/integration
     ◮ Unifies margin-modified MMI and MPE
     ◮ More general than MPE, yet allows a simpler implementation using a difference of standard Forward-Backward statistics
     ◮ Bayesian view & further generalization

  3. Integrated system optimization

  4. Non-uniform error for discriminative training

  5. Minimum Phone Error (Povey 2002): Decision boundaries
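
For reference, the MPE criterion named here can be written as an expected accuracy over lattice hypotheses. The notation below (X_r, s_r, A, kappa) is chosen for this summary and is not necessarily the slides' own:

    F_{MPE}(\theta) = \sum_r \frac{\sum_s p_\theta(X_r, s)^\kappa \, P(s) \, A(s, s_r)}{\sum_{s'} p_\theta(X_r, s')^\kappa \, P(s')}

where s ranges over the hypotheses in the lattice for utterance r, A(s, s_r) is the raw phone accuracy of s against the reference s_r, and \kappa is the usual acoustic scaling factor. Maximizing F_{MPE} maximizes the expected phone accuracy; equivalently, one can minimize the expected phone error L_{MPE} = \sum_r \mathbb{E}_\theta[E_r(s) \mid X_r], with E_r(s) the raw error count. The error-minimizing form is used in the sketches below.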

  6. MPE as multi-dimensional sigmoid

  7. MPE derivative - String picture

  8. Modified Forward-Backward for MPE over lattices
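
The per-arc statistic produced by this modified Forward-Backward pass is, in the standard lattice-MPE formulation (stated here from the published MPE literature rather than recovered from the slide itself):

    \gamma_q^{MPE} = \gamma_q \left( c(q) - c_r^{avg} \right)

where \gamma_q is the ordinary posterior of lattice arc q, c(q) is the average accuracy of the lattice paths passing through q, and c_r^{avg} is the expected accuracy over all paths in the lattice for utterance r. Arcs lying on better-than-average paths get positive weight; arcs on worse-than-average paths get negative weight.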

  9. MPE derivative - Arc picture

  10. New approaches based on margin
      ◮ Intuition: improve generalization by making the training problem "harder".
      ◮ "Large-margin MCE" (Yu et al., 2007)
        ◮ Extension of McDermott & Katagiri (2004)'s Parzen window analysis of MCE → iteratively increase the MCE sigmoid bias term
      ◮ Applicable to implicit error models:
        ◮ "Large-margin HMMs" (Sha & Saul, 2007): insertion of fine-grained error (e.g. edit distance) into the margin term
        ◮ "Boosted MMI" (Povey et al., Saon & Povey, 2008)
      ◮ Heigold's unified theory (2008): bring margin to standard MMI/MPE/MCE approaches

  11. Linking ASR and Machine Learning

  12. Modifying MPE/MMI with a margin term
      "Boost" likelihoods (Povey & Saon (2008), Heigold (2008)):
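
The "boosting" referred to here replaces each hypothesis score in the denominator by an error-scaled version. In the notation of this summary, with margin parameter \sigma and raw error E_r(s):

    p_\theta(X_r, s) \to p_\theta(X_r, s) \, e^{\sigma E_r(s)}

so hypotheses with more errors receive inflated scores, making the training problem "harder". bMMI writes the same idea with the raw accuracy and a boosting factor b, i.e. a factor e^{-b A(s, s_r)}; the two forms differ only by a per-utterance constant that cancels in the objective.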

  13. Margin-modified MPE
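
With the boosting transform above, the margin-modified MPE loss is the expected error under the boosted posterior (a sketch consistent with the definitions above; the slide's own equation is not preserved in this transcript):

    L_{MPE}(\sigma) = \sum_r \sum_s \frac{p_\theta(X_r, s) \, e^{\sigma E_r(s)}}{\sum_{s'} p_\theta(X_r, s') \, e^{\sigma E_r(s')}} \, E_r(s)

Setting \sigma = 0 recovers standard MPE in its error-minimizing form.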

  14. Effect of margin on MPE loss

  15. Margin-modified MMI (Povey & Saon, 2008)
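
Correspondingly, the margin-modified (boosted) MMI loss applies the same transform inside the MMI denominator (again a sketch in this summary's notation):

    L_{MMI}(\sigma) = -\sum_r \log \frac{p_\theta(X_r, s_r)}{\sum_s p_\theta(X_r, s) \, e^{\sigma E_r(s)}}

Here \sigma = 0 gives standard MMI, and \sigma > 0 gives bMMI.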

  16. Effect of margin on MMI loss

  17. ◮ 2300h Arabic Broadcast News (GALE)
      ◮ 2000h English conversational telephone speech (CTS)

  18. Margin-modified MPE & MMI summary
      "Boost" likelihoods (Povey & Saon (2008), Heigold (2008)), as in the transform shown after slide 12.

  19. dMMI: the "integrated" framework
      Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training (McDermott & Nakamura, Interspeech 2009)
      ◮ Mathematical link between margin-modified MPE and MMI
      ◮ Proposal of "dMMI"

  20. MPE is the derivative of modified MMI!
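
Written out in the notation above (sign conventions may differ from the slides), the claim is:

    \frac{\partial}{\partial \sigma} L_{MMI}(\sigma) = L_{MPE}(\sigma)

The check is one line: the reference term \log p_\theta(X_r, s_r) does not depend on \sigma, and differentiating the log-sum in the denominator gives \sum_s \frac{p_\theta(X_r, s) e^{\sigma E_r(s)}}{\sum_{s'} p_\theta(X_r, s') e^{\sigma E_r(s')}} E_r(s), which is exactly the per-utterance expected error under the boosted posterior, i.e. L_{MPE}(\sigma).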

  21. dMMI definition
      Using the previous result & the Fundamental Theorem of Calculus:
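
Integrating the previous identity over a margin interval [\sigma_1, \sigma_2] gives the dMMI loss (a reconstruction in this summary's notation):

    L_{dMMI}(\sigma_1, \sigma_2) = \frac{L_{MMI}(\sigma_2) - L_{MMI}(\sigma_1)}{\sigma_2 - \sigma_1} = \frac{1}{\sigma_2 - \sigma_1} \int_{\sigma_1}^{\sigma_2} L_{MPE}(\sigma) \, d\sigma

That is, dMMI equals the average of the margin-modified MPE loss over the interval, yet it is computed from just two MMI-style functionals.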

  22. dMMI in practice
      Just use a "reverse-boosted" denominator lattice as the numerator lattice:
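
The practical trick is that the reference terms cancel in the difference:

    L_{MMI}(\sigma_2) - L_{MMI}(\sigma_1) = \sum_r \left[ \log \sum_s p_\theta(X_r, s) e^{\sigma_2 E_r(s)} - \log \sum_s p_\theta(X_r, s) e^{\sigma_1 E_r(s)} \right]

So no separate numerator lattice is needed: the "numerator" is the denominator lattice boosted with \sigma_1 (typically negative, hence "reverse-boosted"), and the denominator is the same lattice boosted with \sigma_2.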

  23. Approximating MPE
      As the margin interval is reduced, dMMI converges to MPE. This property must hold for any correct implementation of bMMI and MPE!
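
Concretely, the convergence claim is:

    \lim_{\epsilon \to 0} L_{dMMI}(\sigma - \epsilon, \sigma + \epsilon) = L_{MPE}(\sigma)

which follows immediately from the integral form above: dMMI is the average of L_{MPE} over a shrinking interval around \sigma. This is also why it doubles as a sanity check for bMMI and MPE implementations.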

  24. Integrated view of discriminative training

  25. Leveraging approximated, shifted hinge functions

  26. Gradient-based optimization using dMMI
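
Since dMMI is a scaled difference of two boosted-MMI functionals, its gradient statistics are just the difference of two standard boosted forward-backward statistics. Below is a minimal self-contained Python sketch of this, using explicit path enumeration over a toy hypothesis set in place of a real lattice forward-backward pass; all names and numbers are illustrative, not from the talk:

    import math

    # Toy "lattice": alternative hypothesis paths, each with a total
    # log-likelihood and a raw error count against the reference.
    # A real system would run forward-backward over a lattice DAG;
    # explicit paths keep this sketch self-contained. Numbers are made up.
    paths = [
        {"loglik": -10.0, "error": 0},  # reference-like path
        {"loglik": -9.5, "error": 2},
        {"loglik": -11.0, "error": 1},
    ]

    def boosted_posteriors(paths, sigma):
        """Posterior of each path after boosting scores by exp(sigma * error)."""
        logw = [p["loglik"] + sigma * p["error"] for p in paths]
        m = max(logw)                        # log-sum-exp stabilization
        w = [math.exp(x - m) for x in logw]
        z = sum(w)
        return [x / z for x in w]

    def dmmi_path_weights(paths, sigma1, sigma2):
        """dMMI gradient weights: scaled difference of two boosted posteriors."""
        hi = boosted_posteriors(paths, sigma2)
        lo = boosted_posteriors(paths, sigma1)  # "reverse-boosted" (sigma1 < 0)
        scale = 1.0 / (sigma2 - sigma1)
        return [scale * (h - l) for h, l in zip(hi, lo)]

    print(dmmi_path_weights(paths, sigma1=-1.0, sigma2=1.0))

Both posterior vectors sum to one, so the dMMI weights sum to zero; and as \sigma_1 and \sigma_2 close in on a common value \sigma, the weights approach the margin-modified MPE weights \gamma_s(\sigma)(E_s - \bar{E}), matching the convergence property of slide 23.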

  27. dMMI as integral over margin prior
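
Equivalently, in Bayesian terms:

    L_{dMMI}(\sigma_1, \sigma_2) = \int L_{MPE}(\sigma) \, p(\sigma) \, d\sigma, \quad p(\sigma) = \mathrm{Uniform}[\sigma_1, \sigma_2]

dMMI is the expected margin-modified MPE loss under a uniform prior over the margin, which is the reading that the following slides generalize to arbitrary priors.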

  28. dMMI as building block for modeling general margin priors

  29. Numerical approximation of arbitrary margin priors
      ◮ E.g. the prior p(σ) = c exp(−c|σ|) used for Minimum Relative Entropy Discrimination (Jebara, 2004)
      ◮ Here: use the prior in the context of standard HMM-based discriminative training
      ◮ Approximate the prior using a sum of step functions (cf. Lebesgue integration), as sketched below
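
A minimal sketch of the step-function construction: approximate p(\sigma) by constant heights on intervals [a_i, b_i]; since \int_{a_i}^{b_i} L_{MPE}(\sigma) \, d\sigma = (b_i - a_i) \, L_{dMMI}(a_i, b_i), the loss \int L_{MPE}(\sigma) p(\sigma) d\sigma becomes a weighted sum of dMMI terms. The interval placement, the constant c, and all helper names below are illustrative assumptions, not values from the talk:

    import math

    def laplace_prior(sigma, c=1.0):
        """p(sigma) = c * exp(-c * |sigma|), the prior form cited on the slide."""
        return c * math.exp(-c * abs(sigma))

    def step_weights(prior, lo=-3.0, hi=3.0, n=12):
        """Approximate a margin prior by constant steps on n equal intervals.

        Returns (a_i, b_i, w_i) triples; the total loss is then approximated
        by sum_i w_i * L_dMMI(a_i, b_i), with weight w_i = height_i * (b_i - a_i).
        """
        width = (hi - lo) / n
        steps = []
        for i in range(n):
            a, b = lo + i * width, lo + (i + 1) * width
            height = prior(0.5 * (a + b))       # step height: prior at midpoint
            steps.append((a, b, height * width))
        return steps

    for a, b, w in step_weights(laplace_prior):
        print(f"L_dMMI({a:+.2f}, {b:+.2f}) weighted by {w:.4f}")

Each term in the resulting sum is an ordinary dMMI loss, so the whole construction stays within the difference-of-Forward-Backward-statistics implementation.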

  30. Building a margin prior using dMMI

  31. Summary
      ◮ MPE explicitly models non-uniform error, e.g. phone or word error including insertions, deletions & substitutions.
      ◮ Margin-based "Boosted MMI" (bMMI):
        ◮ super-cheap approach for incorporating non-uniform error into the loss function;
        ◮ however, the objective is still (modified) Mutual Information, not an explicit model of error.
      ◮ "Differenced MMI" (dMMI) is a similarly cheap alternative that:
        ◮ is explicitly linked to error;
        ◮ generalizes MPE;
        ◮ possibly offers better performance (Delcroix et al., ICASSP 2012; Kubo et al., Interspeech 2012);
        ◮ can be further generalized to define arbitrary margin priors for lattice-based discriminative training.
