SLIDE 1

Dual-Channel Acoustic Detection of Nasalization Statuses
X. Niu & J. van Santen

Introduction
Method: dual-channel acoustic model, dual-channel analysis method, nasalization feature extraction, nasalization detector
Experiments: simulation, speech materials, detection tasks, results
Conclusion

Dual-Channel Acoustic Detection of Nasalization Statuses

Xiaochuan Niu
Adviser: Jan P. H. van Santen

Center for Spoken Language Understanding
OGI School of Science & Engineering at OHSU

November 27, 2007, at SRI

SLIDE 2


Outline

1. Introduction
2. Method: dual-channel acoustic model; dual-channel analysis method; nasalization feature extraction; nasalization detector
3. Experiments: simulation; speech materials; detection tasks; results
4. Conclusion

SLIDE 3


Velopharyngeal control during speech

[Diagram: pharyngeal tract, nasal tract, oral tract, glottal folds, velum, velopharyngeal (VP) port]

Appropriate control of the VP port
  • Closure: fricatives, plosives, non-nasal vowels
  • Opening: nasals, nasal vowels, nasalized vowels

Lack of coordination
  • Resonance: hypo- or hyper-nasality
  • Airflow: nasal emission

SLIDE 4


Statuses of nasal resonance

Different oral-nasal articulatory configurations that can be identified perceptually from acoustic signals:
  • Vo: oral opening only (e.g. non-nasal vowels)
  • Ns: nasal opening only (e.g. nasals)
  • Nv: oral & nasal opening simultaneously (e.g. nasalized vowels)

Research motivation: non-invasive detection of nasalization statuses for
  • understanding the VP control mechanism during normal nasalization
  • analysis and enhancement of disordered speech with resonance problems


SLIDE 6


Acoustic features of nasalization (review)

Single-channel spectral characteristics of nasalized vowels
  • Qualitative observations: reduced amplitude and/or upward shift of F1; pole-zero pair in the F1 region; pole-zero pair in 200-500 Hz, etc.
  • Quantitative features: parameters of spectral "flatness"; general spectral envelope features (MFCC, etc.)

Dual-channel acoustic measurements
  • Energy balance (nasalance): E_n / (E_n + E_m)
  • Oral-nasal transfer ratio function (ONTRIF) analysis (Niu et al. 2005)
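The energy-balance (nasalance) measure can be sketched directly from the two channels. This is a minimal illustration, not the NasalView implementation; it assumes time-aligned, gain-calibrated oral and nasal signals, and the frame length and hop (20 ms / 10 ms at 16 kHz) are illustrative:

```python
import numpy as np

def nasalance(p_m, p_n, frame_len=320, hop=160):
    """Frame-wise nasalance E_n / (E_n + E_m).

    p_m: oral-channel samples, p_n: nasal-channel samples,
    assumed time-aligned and calibrated to the same gain.
    """
    n_frames = 1 + (len(p_m) - frame_len) // hop
    scores = np.empty(n_frames)
    for i in range(n_frames):
        s = slice(i * hop, i * hop + frame_len)
        e_m = np.sum(p_m[s] ** 2)            # oral-frame energy
        e_n = np.sum(p_n[s] ** 2)            # nasal-frame energy
        scores[i] = e_n / (e_n + e_m + 1e-12)  # guard against silence
    return scores
```

A purely oral frame scores near 0, a purely nasal frame near 1, which is why the measure is a natural one-dimensional nasality cue.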


SLIDE 8


Dual-channel acoustic model

Transmission-line model for a lossy cylindrical acoustic tube (Flanagan, 1972)

[Circuit diagram: an incremental tube section dx, with pressure P and volume velocity U at one port and P+dP, U+dU at the other; series elements R/2 and L/2 on each side, shunt elements C and G]

  • Acoustic waves in a tube are modeled as electrical waves in a transmission line
  • Sound pressure (P) corresponds to voltage; volume velocity (U) corresponds to current
  • Circuit parameters are determined by the physical properties of the tube
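As a concrete (and deliberately simplified) instance of the transmission-line view, the chain (ABCD) matrix of one uniform tube section can be computed and sections can be cascaded by matrix multiplication. The loss elements R and G of Flanagan's model are dropped here for brevity, and the constants and function names are illustrative, not from the talk:

```python
import numpy as np

RHO = 1.2e-3   # air density, g/cm^3 (cgs units)
C = 3.5e4      # speed of sound, cm/s

def tube_chain_matrix(length_cm, area_cm2, omega):
    """ABCD chain matrix of one LOSSLESS cylindrical tube section:
    [P_in, U_in]^T = M @ [P_out, U_out]^T."""
    z0 = RHO * C / area_cm2      # characteristic acoustic impedance
    bl = (omega / C) * length_cm  # propagation phase (lossless case)
    return np.array([[np.cos(bl),            1j * z0 * np.sin(bl)],
                     [1j * np.sin(bl) / z0,  np.cos(bl)]])

def tract_matrix(sections, omega):
    """Cascade several (length, area) sections: ordered matrix product."""
    m = np.eye(2, dtype=complex)
    for length_cm, area_cm2 in sections:
        m = m @ tube_chain_matrix(length_cm, area_cm2, omega)
    return m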

SLIDE 9


Dual-channel acoustic model

[Circuit network: glottal source (Pg, Ug) with subglottal impedance Zsub; pharyngeal section Tp up to the velopharyngeal junction at pressure Pv; oral branch Tm terminated in the load Zml; nasal branch Tn terminated in Znl; volume velocities Upi, Upo, Umi, Umo, Uni, Uno at the section ports]

Chain-matrix equations:

    [Pg, Upi]ᵀ = [Ap Bp; Cp Dp] [Pv, Upo]ᵀ
    [Pv, Umi]ᵀ = [Am Bm; Cm Dm] [Pml, Umo]ᵀ
    [Pv, Uni]ᵀ = [An Bn; Cn Dn] [Pnl, Uno]ᵀ
    Upo = Umi + Uni
    Usub = Ug + Upi

  • A circuit network represents voiced sound production through the nasal and oral channels (Childers, 2000)
  • Transmission properties of acoustic waves through the vocal tracts are modeled by chain-matrix equations
  • Coupling effects result from the constraints applied by the boundary equations



SLIDE 13


Dual-channel analysis method

Oral-nasal transfer ratio function (ONTRIF):

    T_n/m(ω) ≡ P_n(ω) / P_m(ω) = [(A_m Z_ml + B_m) Z_nr] / [(A_n Z_nl + B_n) Z_mr]

Properties of the ONTRIF
  • T_n/m(ω) is independent of the acoustic system below the VP port;
  • poles stem from the transfer admittance of the nasal cavity; zeros stem from the transfer admittance of the oral cavity; sinuses result in pole-zero pairs;
  • T_n/m(ω) can be estimated from dual-channel signals directly.


SLIDE 15


Dual-channel analysis method

Estimation of the ONTRIF
  • Assuming an ARMA structure in the z-domain,

        T_n/m(z) = B(z)/A(z) = (b_0 + b_1 z^-1 + b_2 z^-2 + · · · + b_N z^-N) / (1 + a_1 z^-1 + a_2 z^-2 + · · · + a_M z^-M),

  • parameters are estimated by minimizing the mean square error of the system
    [Block diagram: p_n(n) filtered by A(z) and p_m(n) filtered by B(z); the error e(n) is their difference]
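The mean-square-error minimization above can be posed as a linear "equation-error" least-squares problem: choose a and b so that A(z)·p_n(n) ≈ B(z)·p_m(n) over a frame. The sketch below is that generic formulation, not necessarily the paper's exact estimator; orders and names are illustrative:

```python
import numpy as np

def estimate_ontrif(p_n, p_m, M=10, N=10):
    """Equation-error least-squares fit of T(z) = B(z)/A(z), so that
    p_n[n] + sum_i a_i p_n[n-i] ≈ sum_j b_j p_m[n-j] over one frame.

    Returns (a, b) with a = [1, a_1..a_M] and b = [b_0..b_N].
    """
    L = len(p_n)
    k0 = max(M, N) + 1        # first sample with full history
    rows, targets = [], []
    for n in range(k0, L):
        row = np.concatenate([-p_n[n - 1:n - M - 1:-1],  # past nasal samples
                              p_m[n:n - N - 1:-1]])       # oral samples
        rows.append(row)
        targets.append(p_n[n])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets),
                                rcond=None)
    a = np.concatenate([[1.0], theta[:M]])
    b = theta[M:]
    return a, b
```

Because both unknown polynomials enter linearly, this avoids the nonlinear optimization that a true output-error fit of a rational transfer function would require.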


SLIDE 17


Nasalization feature extraction

Dual-channel data acquisition (NasalView system)
  • Headset with a sound-separating plate, 2 microphones, and a 2-channel amplifier

Generalized dual-channel model for all statuses
  • Nv: oral & nasal output
  • Vo: oral output & vibrations across the velum
  • Ns: nasal output & tissue radiation

SLIDE 18


Nasalization feature extraction

Short-time ONTRIF analysis (sample word: dean)

[Figure: nasal-oral signal; power spectrogram of the ONTRIF (frequency axis up to 10000 Hz); ONTRIF on a Mel-frequency scale (Mel bins 20-100); phone labels: .pau dcl d iy iy_~ n ax_~ .br .pau]

SLIDE 19


Nasalization feature extraction

Feature extraction algorithm

1. High-pass filter the oral and nasal signals, obtaining p_m(n) and p_n(n);
2. segment p_m(n) and p_n(n) into equal-length short-time frames with a fixed frame shift;
3. for each pair of oral and nasal frames,
   1. perform the ONTRIF estimation, obtaining T_n/m(z);
   2. evaluate T_n/m(z) to obtain the magnitude response, |T_n/m[k]|² (k is the frequency index);
   3. calculate the log-magnitude, log |T_n/m[k]|²;
   4. apply Mel-scaled triangle filters to the log-magnitude, obtaining M[i] (i is the index of Mel bins);
   5. apply a type-II discrete cosine transform (DCT-II) to M[i], obtaining coefficients C[j] (j is the index of the components);
   6. store the first N dimensions of C[j] as the feature vector of the current frame.
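Sub-steps 2-6 of step 3 above can be sketched for one frame as follows. The slides do not give the Mel formula, filter-edge placement, or FFT size, so the common HTK-style recipe and all parameter values here are assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # standard 2595*log10(1 + f/700) Mel map (an assumption)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular Mel-scaled filters over FFT bins 0..n_fft//2."""
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def ontrif_features(a, b, fb, n_fft=512, n_dims=25):
    """One frame: |T[k]|^2 on the FFT grid, log, Mel filtering,
    DCT-II, truncation to the first n_dims coefficients."""
    w = np.exp(-2j * np.pi * np.arange(n_fft // 2 + 1) / n_fft)
    mag2 = np.abs(np.polyval(b[::-1], w) / np.polyval(a[::-1], w)) ** 2
    m = fb @ np.log(mag2 + 1e-12)          # Mel-binned log magnitude M[i]
    i = np.arange(len(m))
    dct = np.cos(np.pi * np.outer(i, 2 * i + 1) / (2 * len(m))) @ m
    return dct[:n_dims]                    # C[0..n_dims-1]
```

A flat ratio function (B = A = 1) maps to an all-zero feature vector, which is a handy sanity check for the whole chain.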

SLIDE 21


Nasalization detector

Goal: to evaluate the dual-channel nasalization feature.

Bayes classifier for the 3 nasalization statuses
  • The state-conditional PDF, p(x|s), is modeled by a Gaussian or a GMM;
  • priors of the statuses are assumed to be equal;
  • the decision rule, for a feature vector x of a frame, is

        s* = argmax_{s_j} p(x|s_j),  s_j ∈ {Vo, Ns, Nv}.

SLIDE 23


Simulation

Purpose: to validate the ONTRIF analysis method with synthetic speech.

Design of an articulatory synthesizer
[Block diagram: area functions of the vocal tract → RLC parameter calculation → chain-matrix transfer-function calculation → IDFT, yielding the oral and nasal impulse responses h_m(n) and h_n(n); a glottal source generator produces u_g(n), which is convolved with h_m(n) and h_n(n) to give the oral and nasal outputs u_mo(n) and u_no(n)]

SLIDE 24


Simulation

Articulatory configuration for nasalized /aa/

[Figure: area functions (cm²) of the pharyngeal, oral, and nasal tracts vs. distance from the glottis (cm), marking the position of the VP opening and the position of a sinus opening]

SLIDE 25


Simulation

Power spectra of the pre-calculated and estimated oral-nasal transfer ratio functions of the nasalized /aa/

[Figure: pre-calculated vs. estimated ONTRIF magnitude (dB) over 500-5000 Hz, with Fn1 and Fsin marked]


SLIDE 27


Speech materials

Corpus design
  • 24 NVN and 24 CVC words in carrier sentences
  • N ∈ {/m/, /n/, /ng/}, C ∈ {/t/, /d/, /p/, /b/, /k/, /g/}, V ∈ {/iy/, /ae/, /aa/, /uw/}

Dual-channel corpus recorded with the NasalView
  • 3 male and 3 female native speakers of American English
  • 3 repetitions of the recording session of all sentences
  • gains of the two channels calibrated to the same level before each session
  • phoneme boundaries manually labeled: vowels in nasal contexts marked as nasalized, vowels in plosive contexts marked as non-nasalized

Pseudo-single-channel corpus generation
  • signals of the two channels arithmetically added up



SLIDE 31


Detection tasks

Dual-channel feature vs. single-channel feature
  • 25-dimensional ONTRIF features extracted from the dual-channel signals
  • 25-dimensional MFCC features extracted from the pseudo-single-channel signals
  • both use a 20 ms frame length and a 10 ms frame shift

Speaker-dependent (SD) task
  • for each speaker: 2 sessions of data for training, 1 session for testing
  • a Gaussian PDF trained for each class

Speaker-independent (SI) task
  • for each session: 5 speakers' data for training, 1 speaker's data for testing
  • a 4-component GMM trained for each class



SLIDE 35


Speaker-dependent (SD) task

Confusion matrices of frame classification rates (FCR) and token classification rates (TCR); columns are the reference statuses, rows the detected statuses (each FCR column sums to 100%):

                 FCR (%)                     TCR (%)
               Vo      Nv      Ns          Vo      Nv      Ns
Dual     Vo   97.38    1.06    0.17       98.84    0.00    0.00
         Nv    1.32   92.54    1.06        0.93   96.75    0.23
         Ns    1.30    6.41   98.77        0.23    3.25   99.77
Single   Vo   96.37    5.53    1.92       97.77    2.32    0.93
         Nv    2.10   85.73    3.14        0.23   96.98    1.86
         Ns    1.53    8.74   94.94        0.00    0.70   97.20
Samples       8104   10610   11044         432     431     858

SLIDE 36


Speaker-dependent (SD) task

  • Average frame recognition accuracy: 96.23% (dual) vs. 92.35% (single); McNemar test: significant at the 0.001 level
  • Average token recognition accuracy: 98.45% (dual) vs. 97.99% (single); McNemar test: not significant at the 0.001 level (p = 0.028)
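The McNemar test here compares the two classifiers' paired decisions on the same frames, using only the discordant counts. A sketch of the exact two-sided version follows; the counts in the usage note are illustrative, since the slides report only the resulting p-values:

```python
import math

def mcnemar_p(n01, n10):
    """Exact two-sided McNemar test on paired classifier decisions.

    n01: items classifier A got right and classifier B got wrong;
    n10: the reverse. Under H0 the discordant outcomes are
    Binomial(n01 + n10, 0.5).
    """
    n = n01 + n10
    k = min(n01, n10)
    # lower binomial tail, doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2.0 * tail)
```

For example, `mcnemar_p(0, 20)` (one classifier wins all 20 disagreements) falls well below 0.001, while an even split such as `mcnemar_p(5, 5)` gives p = 1.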

SLIDE 37


Speaker-independent (SI) task

Confusion matrices of frame classification rates (FCR) and token classification rates (TCR); columns are the reference statuses, rows the detected statuses (each FCR column sums to 100%):

                 FCR (%)                     TCR (%)
               Vo      Nv      Ns          Vo      Nv      Ns
Dual     Vo   92.97    5.74    0.55       95.83    6.96    0.70
         Nv    6.40   71.81    8.29        3.94   84.69   24.13
         Ns    0.63   22.45   91.16        0.23    8.35   75.17
Single   Vo   78.88   48.24   28.91       78.47   42.92   17.25
         Nv   15.12   43.28   13.57       13.43   42.00   11.42
         Ns    6.01    8.48   57.52        8.10   15.08   71.33

SLIDE 38


Speaker-independent (SI) task

  • Average frame recognition accuracy: 85.31% (dual) vs. 59.89% (single); McNemar test: significant at the 0.001 level
  • Average token recognition accuracy: 85.23% (dual) vs. 63.93% (single); McNemar test: significant at the 0.001 level

SLIDE 39


Conclusion

Summary
  • The proposed dual-channel ONTRIF feature is capable of discriminating different nasalization statuses;
  • the ONTRIF feature performs better than the single-channel MFCC feature in classification tasks;
  • the ONTRIF feature is more robust to speaker differences.

Future work
  • Direct usage: automatic nasality assessment
  • Phonetic study: a more accurate model of vowel production
  • Speech recognition: multi-channel acoustic front-end in adverse environments