
SLIDE 1

Experiments with Multisource Decoding and ‘A priori’ Fragments

Speech and Hearing Research Group,

Dept. Computer Science,

University of Sheffield, UK

June 6, 2002

SLIDE 2

Some Experiments with Multisource Decoding and ‘A priori’ Fragments

The Multisource System

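[Figure: the multisource system. Noisy speech (a time-frequency representation) undergoes bottom-up processing to produce coherent fragments. The multisource decoder performs a top-down search over the speech and background fragments using the speech models, producing a word-sequence hypothesis and a speech/background segregation hypothesis.]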
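The decoder has to score competing speech/background segregation hypotheses against the speech models, which the summary slide ties to missing-data (MD) techniques. The following is a minimal, single-frame sketch under stated assumptions, not the Sheffield implementation: features are non-negative spectral energies, a speech-model state is a diagonal Gaussian, channels labelled speech are treated as reliable, and channels labelled background use bounded marginalisation (the clean speech value is assumed to lie between 0 and the observed value). All function names are hypothetical, and brute-force enumeration of one frame stands in for the search the real decoder performs inside an HMM decoder.

```python
import math
from itertools import product


def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def missing_data_loglik(frame, speech_mask, mean, var):
    """Score one frame against one diagonal-Gaussian state (hypothetical helper).

    Channels where speech_mask is True contribute a Gaussian log density;
    the rest use bounded marginalisation over [0, observed value]."""
    total = 0.0
    for x, is_speech, mu, v in zip(frame, speech_mask, mean, var):
        if is_speech:
            total += log_gauss(x, mu, v)
        else:
            sd = math.sqrt(v)
            p = norm_cdf((x - mu) / sd) - norm_cdf((0.0 - mu) / sd)
            total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def best_segregation(frame, fragments, mean, var):
    """Try every speech/background labelling of the fragments in one frame.

    fragments: disjoint lists of channel indices. Returns (score, labels)."""
    best = (float("-inf"), None)
    for labels in product((False, True), repeat=len(fragments)):
        mask = [False] * len(frame)
        for channels, is_speech in zip(fragments, labels):
            for ch in channels:
                mask[ch] = is_speech
        score = missing_data_loglik(frame, mask, mean, var)
        if score > best[0]:
            best = (score, labels)
    return best
```

For example, `best_segregation([5.0, 5.0, 9.0], [[0, 1], [2]], [5.0, 5.0, 1.0], [0.1, 0.1, 0.1])` labels the first fragment as speech and the second as background. Reliable channels contribute a density while marginalised channels contribute a probability, so with these toy values a larger model variance (e.g. 1.0) would make labelling the well-matched fragment as background score higher.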

Testing issues:

Need highly non-stationary test data to properly test the approach
Need a strategy that allows the back-end to be tested in isolation from the front-end

Jun 7, 2002 1

SLIDE 3

The Noise Sources

Violins

[Figure: spectrogram of the violin extract]

Drums

[Figure: spectrogram of the drum extract]

Speech (AURORA utterances spoken by a talker of the opposite gender to the target)

SLIDE 4

Constructing the test set

Aurora test set A clean utterances ordered by length

318 male/female pairs of matched length identified, i.e. 318 target utterances and 318 masking utterances

10-second drum and violin extracts downsampled to 8 kHz and filtered with the G.712 filter

Drum and violin masking noises for each of the 318 targets cut from the 10-second extracts

AURORA targets and masking noise mixed so that the SNR averages 0 dB during target speech
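The cutting and mixing steps might look like the sketch below. It assumes that "SNR averages 0 dB during target speech" means the ratio of mean target power to mean noise power measured over speech-active samples, with activity given by a boolean mask; the slides say neither how speech activity was found nor whether the SNR was averaged per frame. The random segment start is also an assumption, `cut_and_mix` is a hypothetical name, and downsampling and G.712 filtering are not shown.

```python
import numpy as np


def cut_and_mix(target, extract, speech_mask, snr_db=0.0, rng=None):
    """Cut a target-length segment from a longer noise extract and mix it in.

    target: 1-D float array (clean utterance, 8 kHz assumed)
    extract: 1-D float array (e.g. a 10 s drum or violin extract), len >= len(target)
    speech_mask: boolean array marking samples where target speech is active
    Returns (mixture, scaled_noise)."""
    rng = np.random.default_rng() if rng is None else rng
    target = np.asarray(target, dtype=float)
    n = len(target)
    if len(extract) < n:
        raise ValueError("noise extract is shorter than the target")
    start = int(rng.integers(0, len(extract) - n + 1))  # random segment start (assumption)
    noise = np.asarray(extract[start:start + n], dtype=float)
    # Mean power of each signal, measured only where the target is speaking.
    p_speech = np.mean(target[speech_mask] ** 2)
    p_noise = np.mean(noise[speech_mask] ** 2)
    # Choose the gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = gain * noise
    return target + scaled, scaled
```

Returning the scaled noise alongside the mixture keeps the premix signals available, which is what the a priori fragments on the next slide rely on.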

SLIDE 5

Using A Priori Test Fragments

Use knowledge of the signals prior to mixing to mark out a set of ‘ideal’ test fragments, i.e. each fragment contains energy from either the target or the mask.

[Figure: the multisource system as before, but with a priori fragments, derived from the separate speech source and noise source, supplied to the multisource decoder in place of the fragments from bottom-up processing of the noisy speech. The decoder still outputs a word-sequence hypothesis and a speech/background segregation hypothesis.]
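One way such fragments could be derived, sketched under assumptions the slides do not state: compute spectrograms of the target and the masker separately, label each time-frequency cell by whichever source has more energy, and take each 4-connected region of identically labelled cells as one fragment. The connectivity rule and tie-breaking (ties go to the masker) are illustrative choices, and `a_priori_fragments` is a hypothetical name.

```python
from collections import deque

import numpy as np


def a_priori_fragments(speech_spec, noise_spec):
    """Split the time-frequency plane into single-source fragments.

    speech_spec, noise_spec: (channels, frames) energies of the premix signals.
    Returns (frag_id, owners): frag_id[c, t] is the fragment index of each cell,
    and owners[k] is True if fragment k belongs to the target speech."""
    labels = speech_spec > noise_spec  # True where the target dominates
    n_ch, n_fr = labels.shape
    frag_id = np.full(labels.shape, -1, dtype=int)
    owners = []
    for c in range(n_ch):
        for t in range(n_fr):
            if frag_id[c, t] != -1:
                continue  # cell already belongs to a fragment
            k = len(owners)
            owners.append(bool(labels[c, t]))
            frag_id[c, t] = k
            queue = deque([(c, t)])
            # Breadth-first flood fill over 4-connected cells with the same label.
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < n_ch and 0 <= b < n_fr
                            and frag_id[a, b] == -1
                            and labels[a, b] == labels[c, t]):
                        frag_id[a, b] = k
                        queue.append((a, b))
    return frag_id, owners
```

Because every fragment is single-source by construction, a decoder given these fragments is tested on its ability to assign them, independent of front-end segmentation errors.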

SLIDE 6

Example fragments for Speech + Drums

[Figure: panels "mixed", "fragments" and "correct segmentation"; frequency 50 Hz to 4 kHz against time in seconds]

SLIDE 7

Example fragments for Speech + Speech

[Figure: panels "mixed", "fragments" and "correct segmentation"; frequency 50 Hz to 4 kHz against time in seconds]

SLIDE 8

Recognition Results

                 speech   violins   drums
Soft MD          30.1     54.3      47.0
Adaptive         29.4     76.7      45.0
a priori MD      94.8     94.0      94.4
fragments        42.4     65.4      58.6

Standard (two values only): 28.6, 7.9

i.e. disappointing results: the speech models contain insufficient information to organise the fragments.

SLIDE 9

Gender Dependency

[Figure: mixture of a male utterance "2 5 2 8 3" and a female utterance "7 0 9 4", with the correct segmentation (blue = male, red = female); frequency 50 Hz to 4 kHz against time in seconds. Male hypothesis: "− 5 − − 3"; female hypothesis: "7 0 9 4".]

Awaiting results...

SLIDE 10

High Frequency Recruitment

The decoder is given the correct segmentation in the low-frequency region. Can it selectively recruit the correct high-frequency fragments?

[Figure: mixture of "2 5 2 8 3" and "7 0 9 4", showing the correct speech fragments; frequency 50 Hz to 4 kHz against time in seconds. Decoder hypothesis: "8 5 3 8 3".]
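The recruitment question can be framed as a search over speech/background labellings of the high-frequency fragments while the low-frequency cells stay fixed. This brute-force sketch assumes a caller-supplied `score` function (hypothetical; in the real system it would be the decoder's missing-data likelihood under the speech models), and `recruit` is likewise an illustrative name. Its cost grows as 2^N in the number of candidate fragments, so a time-synchronous decoder would instead split hypotheses where fragments start and merge them where fragments end.

```python
from itertools import product


def recruit(fixed_speech, candidates, score):
    """Choose which candidate fragments to add to a fixed speech segmentation.

    fixed_speech: set of (channel, frame) cells known to be speech
                  (the a priori low-frequency segmentation).
    candidates:   list of sets of cells (high-frequency fragments) whose
                  labels are unknown.
    score:        callable taking a set of speech cells, returning a log score.
    Returns (labels, best_score); labels[i] is True if candidate i is recruited."""
    best_score, best_labels = float("-inf"), None
    for labels in product((False, True), repeat=len(candidates)):
        speech = set(fixed_speech)
        for frag, keep in zip(candidates, labels):
            if keep:
                speech |= frag
        s = score(speech)
        if s > best_score:
            best_score, best_labels = s, labels
    return best_labels, best_score
```

For testing the search itself, an oracle score such as "cells in the true speech set minus cells outside it" recovers the correct fragments; any real score must come from the speech models alone.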

SLIDE 11

Hi Freq Recruitment Results

                        speech   violins   drums
full a priori           94.8     94.4      94.4
low freq a priori       78.5     79.7      76.2
low freq + fragments    86.9     88.0      85.3

Kind of works... but need to check that the results are more than just chance. Suggests that the multisource decoder may work if a subset of the fragments can be identified as speech prior to decoding.
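One standard way to check whether a gain such as 78.5 to 86.9 (speech masker) is more than chance is an exact McNemar test on paired outcomes: run both systems on the same test items and count the items that only one of them gets right. This is an illustrative sketch, not the evaluation used here, and it needs per-item results that the slides do not give; `mcnemar_exact` is a hypothetical name.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: items correct under system A only; c: items correct under system B only.
    Under the null hypothesis each discordant item is equally likely to favour
    either system, so min(b, c) follows Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Items that both systems get right, or both get wrong, carry no information about the difference and are ignored by the test.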

SLIDE 12

Sequential Grouping

Modelling of primitive grouping forces that occur between fragments.

[Figure: fragments T1 to T6 along the time axis, with decoding hypotheses w1 to w8]

May be implemented by adding probabilities to decoding paths, cf. bigram/trigram language models. Need to be careful to preserve the Markov assumption, i.e. given the state, the future must be independent of the past. Work in progress...
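A minimal sketch of the bigram idea, assuming fragments are processed in time order, each has a per-label evidence score from the decoder (hypothetical inputs), and the grouping term is a table of probabilities P(current label | previous label). Because the transition depends only on the previous fragment's label, which is part of the Viterbi state, the Markov assumption holds. This simplifies the real problem: actual decoding paths also carry word hypotheses, and fragments can overlap in time. `viterbi_grouping` is a hypothetical name.

```python
import math


def viterbi_grouping(evidence, trans):
    """Best label sequence for time-ordered fragments with a grouping bigram.

    evidence: list of dicts {label: log-score}, one per fragment.
    trans:    dict {(prev_label, cur_label): probability}.
    Returns (path, log_score)."""
    labels = list(evidence[0])
    best = {l: evidence[0][l] for l in labels}  # best score ending in each label
    back = []  # back-pointers, one dict per transition
    for ev in evidence[1:]:
        new, ptr = {}, {}
        for cur in labels:
            # The grouping term only looks at the previous label: Markov property.
            prev = max(labels, key=lambda p: best[p] + math.log(trans[(p, cur)]))
            new[cur] = best[prev] + math.log(trans[(prev, cur)]) + ev[cur]
            ptr[cur] = prev
        best = new
        back.append(ptr)
    last = max(labels, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]
```

With a "sticky" grouping table (same label 0.9, change 0.1), a fragment whose evidence only weakly prefers background is pulled into the surrounding speech group; with a flat table (all 0.5) it keeps its own best label. A trigram-style term would need the state to hold the labels of the two previous fragments, otherwise the score would depend on history outside the state.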

SLIDE 13

Summary of Sheffield RESPITE work

[Diagram of work areas. Group labels: Missing Data; CTK; Multisource Decoder. Items: use of harmonicity information; evaluations on Aurora; representations, e.g. log vs cube-root compression; theoretical development; experiments with SNR masks; soft fragments; a priori fragments; GUI; code maintenance; development of data flow system; scripting language; Multisource Decoder development; decoder tuning; efficient MD computation; HMM decoder; gender dependency; soft masks; adaptive noise estimates; model combination.]
