Missing-data masks in all-combinations multi-band decoding


  1. Missing-data masks in all-combinations multi-band decoding
     morris@idiap.ch, http://www.idiap.ch/
     It is shown that, for MAP decoding with all-combination experts:
     • multi-band expert weighting can make use of the same soft missing-data mask as used with "missing-data" ASR
     • experts must be combined during, not before, decoding
     RESPITE meeting, 25-26 Jan 2002

  2. MAP decoder architectures
     MAP decoding => experts combined during Viterbi.
     [Figure: block diagrams of three decoder architectures, each built from a front end (FE), MLP or GMM estimators, a soft MD mask and a decoder that outputs the recognised word ("hello?")]
     • All-combination multi-band SMD HMM/MLP: separate MLPs estimate state posteriors for every combination of sub-bands.
     • All-combination multi-band SMD HMM/GMM: a single GMM uses marginal PDFs to estimate state posteriors for every combination of sub-bands.
     • Usual missing-data ASR (= SMD HMM/GMM): a single GMM uses marginal PDFs to estimate state densities for each data coefficient.
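The second architecture relies on marginal PDFs of a single GMM. A minimal sketch of that idea follows, assuming diagonal-covariance GMMs (so that marginalising over the discarded coefficients simply drops them from the product) and assuming state posteriors are obtained by Bayes' rule from state priors. All function names and the data layout are hypothetical and are not taken from the slides.

```python
import math
from itertools import combinations


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_gmm_loglik(x, gmm, dims):
    """Log p(x[dims] | state) for a diagonal GMM.

    gmm is a list of (weight, means, variances) components. With diagonal
    covariances the marginal over `dims` keeps only those coefficients.
    """
    terms = [
        math.log(w) + sum(log_gauss(x[d], mu[d], var[d]) for d in dims)
        for w, mu, var in gmm
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def state_posteriors(x, state_gmms, priors, dims):
    """P(state | x[dims]) by Bayes' rule over all states."""
    logs = [
        math.log(p) + marginal_gmm_loglik(x, gmm, dims)
        for gmm, p in zip(state_gmms, priors)
    ]
    top = max(logs)
    norm = sum(math.exp(v - top) for v in logs)
    return [math.exp(v - top) / norm for v in logs]


def band_combinations(bands):
    """Yield (band indices, coefficient indices) for every subset of bands."""
    for size in range(len(bands) + 1):
        for combo in combinations(range(len(bands)), size):
            yield combo, [d for g in combo for d in bands[g]]
```

With the empty combination no coefficient is used, so the posteriors returned by this sketch fall back to the state priors.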

  3. All-combinations posteriors based decoder can make use of same mask as usual missing-data decoder
     Notation
     • $Q$: state sequence for one utterance
     • $X$: spectrotemporal signal for one utterance
     • $\hat{\Omega}$: estimated SMD mask, $\hat{\omega}_{f,t} = P(x_{f,t}\ \text{clean})$
     • $M$: MD indicator mask, $\mu_{g,t} = 1 \Leftrightarrow$ band $(g,t)$ clean
     • $P_c = P(X\ \text{clean})$
     Usual missing-data (GMM) MAP objective
     $\hat{Q} = \arg\max_{Q} E[P(Q \mid X, \Theta)]$
     $E[P(Q \mid X, \Theta)] \propto \int p(X \mid Q)\, p(X \mid X_{obs})\, P(Q)\, dX$
     $p(X \mid X_{obs}) = P_c\, \delta(X - X_{obs}) + (1 - P_c)\, U(0, X_{obs})$
     Posteriors based (MLP) MAP objective previously tested
     $\hat{Q} = \arg\max_{Q,M} P(Q, M \mid X)$, assumes $\Omega = 0.5$
     Posteriors based (MLP) MAP objective using MD mask
     $\hat{Q} = \arg\max_{Q,M} P(Q, M \mid X)$, makes use of $\hat{\Omega}$
     $\quad = \arg\max_{Q,M} P(Q \mid X, M)\, P(M \mid X)$
     $P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$
     During Viterbi, each frame selects a single expert.

  4. Soft missing data mask for posteriors based decoder
     Per coefficient soft mask, P(x coeff(f,t) missing)
     $\hat{\Omega}_{coeffs}$: $\omega_{f,t} = P(x_{f,t}\ \text{clean})$, $f = 1 \ldots 6$, $t = 1 \ldots 8$
     $P_c = P(X\ \text{clean}) \cong \prod_{f,t} \omega_{f,t}$
     Per band mask, P(x band(g,t) missing)
     If P(band clean) = P(all components in band are clean),
     $\hat{\Omega}_{band}$: $\omega_{g,t} = \prod_{f \in g} \omega_{f,t}$, $g = 1 \ldots 2$, $t = 1 \ldots 8$
     $P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$
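The two products above can be illustrated with a short sketch. It assumes the coefficient mask is stored as a list of rows, one per coefficient, and that bands are given as lists of coefficient indices; both layouts and both function names are hypothetical. The weights are computed per frame, which is consistent with the utterance-level product because that product factorises over t.

```python
from itertools import product
from math import prod


def band_mask(coeff_mask, bands):
    """Per-band clean probabilities from per-coefficient ones.

    coeff_mask[f][t] is P(coefficient (f, t) clean); a band is taken to be
    clean only if all of its coefficients are clean.
    """
    n_frames = len(coeff_mask[0])
    return [
        [prod(coeff_mask[f][t] for f in band) for t in range(n_frames)]
        for band in bands
    ]


def combination_weights(band_probs):
    """P(M | X) at one frame for every clean/missing band combination.

    band_probs[g] is P(band g clean) at that frame. Keys are tuples of
    0/1 indicators, one per band (1 = band treated as clean).
    """
    weights = {}
    for combo in product((0, 1), repeat=len(band_probs)):
        w = 1.0
        for mu, omega in zip(combo, band_probs):
            w *= omega if mu else (1.0 - omega)
        weights[combo] = w
    return weights


# Example: 4 coefficients, 2 frames, 2 bands of 2 coefficients each.
coeff = [[0.9, 0.5], [0.8, 0.5], [0.5, 1.0], [0.4, 1.0]]
bands = [[0, 1], [2, 3]]
per_band = band_mask(coeff, bands)
frame0 = combination_weights([per_band[0][0], per_band[1][0]])
```

For each frame the weights over all combinations sum to one, so they can be used directly as expert weights.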

  5. Test results promising, even when $\Omega = 0.5$
     $\hat{Q} = \arg\max_{Q,M} P(Q, M \mid X)$
     [Figure: WER (0 to 100) against SNR/dB (0, 10, 20, clean), curves numbered 1 to 3; legend: baseline, MAP FC, SMD]
     Fig shows WER (Aurora, averaged over 4 noise conditions) for:
     1. baseline HMM/GMM
     2. HMM/GMM AC multi-stream, $\hat{Q} = \arg\max_{Q,M} P(Q, M \mid X)$ (assumes $\Omega = 0.5$). Stationary band mask. Streams are MFCC with 1st & 2nd differences.
     3. usual missing-data ASR = SMD HMM/GMM
