
SLIDE 1

RESPITE meeting, 25-26 Jan 2002, Page 1

morris@idiap.ch http://www.idiap.ch/

Missing-data masks in all-combinations multi-band decoding

It is shown that, for MAP decoding with all-combinations experts:

  • multi-band expert weighting can make use of the same soft missing-data mask as used with “missing-data” ASR
  • experts must be combined during, not before, decoding
SLIDE 2

MAP decoder architectures

MAP decoding => experts combined during Viterbi

  • All-combination multi-band SMD HMM/MLP: separate MLPs estimate state posteriors for every combination of sub-bands.
  • All-combination multi-band SMD HMM/GMM: a single GMM uses marginal PDFs to estimate state posteriors for every combination of sub-bands.
  • Usual missing-data ASR = SMD HMM/GMM: a single GMM uses marginal PDFs to estimate state densities for each data coefficient.

[Diagrams: one block diagram per architecture, with components FE, MLP (four blocks) or GMM, Soft MD Mask and decoder, and example output "hello?".]

SLIDE 3

All-combinations posteriors-based decoder can make use of the same mask as the usual missing-data decoder

Notation

  • $Q$: state sequence for one utterance
  • $X$: spectrotemporal signal for one utterance
  • $\hat{\Omega}$: estimated SMD mask, $\hat{\omega}_{f,t} = \hat{P}(x_{f,t}\ \text{clean})$
  • $M$: MD indicator mask, $(\mu_{g,t} = 1) \Leftrightarrow \text{band}_{g,t}\ \text{clean}$
  • $P_c = P(X\ \text{clean})$

Usual missing-data (GMM) MAP objective

$$\hat{Q} = \arg\max_{Q} E[P(Q \mid X, \Theta)]$$

$$E[P(Q \mid X, \Theta)] \propto P(Q) \int p(X \mid Q)\, p(X \mid X_{obs})\, dX$$

$$p(X \mid X_{obs}) = P_c\, \delta(X - X_{obs}) + (1 - P_c)\, U(0, X_{obs})$$

Posteriors based (MLP) MAP objective previously tested (assumes $\Omega = 0.5$)

$$\hat{Q} = \arg\max_{Q, M} P(Q \mid X, M)$$

Posteriors based (MLP) MAP objective using MD mask (makes use of $\hat{\Omega}$)

$$\hat{Q} = \arg\max_{Q, M} P(Q, M \mid X) = \arg\max_{Q, M} P(M \mid X)\, P(Q \mid X, M)$$

$$P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$$

During Viterbi, each frame selects a single expert.
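The statement that each frame selects a single expert during Viterbi can be illustrated with a small sketch. It is not the decoder used in the talk. It assumes that P(M|X) factorises over frames, so that the maximisation over M can be done frame by frame inside the recursion, and it uses the experts' frame posteriors directly as local scores (hybrid HMM/MLP systems often divide by state priors first). The function name `viterbi_select_expert` and the array layouts are hypothetical.

```python
import numpy as np

def viterbi_select_expert(post, weights, log_trans, log_init):
    """Joint search over states and per-frame experts (illustrative sketch).

    post:      (C, T, S) posteriors P(q | x_t, combination c) from each expert
    weights:   (C, T)    per-frame mask weights P(m_t = c | x_t)
    log_trans: (S, S)    log transition probabilities, [i, j] = from i to j
    log_init:  (S,)      log initial state probabilities
    Returns (states, experts): best state and chosen expert for every frame.
    """
    C, T, S = post.shape
    eps = 1e-300  # guards against log(0)

    # Log of weight * posterior for every expert, frame and state.
    joint = np.log(weights[:, :, None] + eps) + np.log(post + eps)
    best_expert = joint.argmax(axis=0)  # (T, S): winning expert per frame/state
    local = joint.max(axis=0)           # (T, S): local score after expert selection

    # Standard Viterbi recursion on the selected local scores.
    delta = log_init + local[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans   # rows: previous state, cols: next state
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + local[t]

    # Backtrace the best state sequence, then read off the expert used per frame.
    states = np.empty(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    experts = best_expert[np.arange(T), states]
    return states, experts
```

Because the mask term for frame t appears only in that frame's local score, taking the maximum over experts inside the recursion gives the same result as a joint maximisation over Q and M under the stated factorisation. Combining the experts before decoding would fix one expert per frame regardless of the state path.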

SLIDE 4

Soft missing data mask for posteriors based decoder

Per coefficient soft mask, P(x coeff(f,t) missing), $f = 1 \ldots 6$, $t = 1 \ldots 8$

$$\hat{\Omega}_{coeffs}: \quad \omega_{f,t} = \hat{P}(x_{f,t}\ \text{clean})$$

$$P_c = P(X\ \text{clean}) \cong \prod_{f,t} \omega_{f,t}$$

Per band mask, P(x band(g,t) missing), $g = 1 \ldots 2$, $t = 1 \ldots 8$

If P(band clean) = P(all components in band are clean),

$$\hat{\Omega}_{band}: \quad \omega_{g,t} = \prod_{f \in g} \omega_{f,t}$$

$$P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$$
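As a rough illustration of how a per-coefficient soft mask could be turned into band-level weights for every combination of sub-bands, the sketch below takes each band's clean probability as the product of its coefficients' clean probabilities, and weights each band combination by the product of ω over bands marked clean and (1 − ω) over the rest. The function names, the array layout and the split of six coefficients into two bands of three are assumptions made for this example only.

```python
from itertools import product

import numpy as np

def band_mask(coeff_mask, bands):
    """coeff_mask: (F, T) array of P(x[f, t] clean).
    bands: list of lists of coefficient indices, one list per band.
    Returns a (G, T) array with omega[g, t] = product over f in band g."""
    return np.stack([coeff_mask[list(b)].prod(axis=0) for b in bands])

def combination_weights(omega_band):
    """omega_band: (G, T) array of P(band g clean at frame t).
    Returns a dict mapping each tuple of G indicators (1 = band clean)
    to a length-T array of per-frame weights for that combination."""
    G, _ = omega_band.shape
    weights = {}
    for combo in product((0, 1), repeat=G):
        mu = np.array(combo, dtype=bool)[:, None]  # (G, 1), broadcast over frames
        per_band = np.where(mu, omega_band, 1.0 - omega_band)
        weights[combo] = per_band.prod(axis=0)
    return weights

# Example with the sizes shown on the slide: 6 coefficients, 8 frames, 2 bands.
rng = np.random.default_rng(0)
coeff_mask = rng.uniform(size=(6, 8))
omega_band = band_mask(coeff_mask, [[0, 1, 2], [3, 4, 5]])  # assumed band split
weights = combination_weights(omega_band)
```

With G bands there are 2^G combinations, and for each frame their weights sum to one, so they can be read as a distribution over which expert to trust at that frame.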

SLIDE 5

Test results promising, even when $\Omega = 0.5$

Fig shows WER (Aurora, av. over 4 noise conditions) for

  • 1. baseline HMM/GMM
  • 2. HMM/GMM AC multi-stream, $\hat{Q} = \arg\max_{Q, M} P(Q \mid X, M)$ (assumes $\Omega = 0.5$). Stationary band mask. Streams are MFCC with 1st & 2nd differences.
  • 3. Usual missing-data ASR = SMD HMM/GMM

[Figure: WER (axis ticks 10 to 100) against SNR/dB (10, 20, clean); curves labelled 1 to 3; legend: baseline, MAP FC, SMD.]