Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech



slide-1
SLIDE 1

Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech

  • F. Bettens*, F. Grenez*, J. Schoentgen*,**

*Université Libre de Bruxelles **National Fund for Scientific Research

Belgium

slide-2
SLIDE 2

Vocal Dysperiodicities

Dysperiodicity (cause):

  • Pitch Breaks, Phonation Breaks, Timbre Breaks, … (cause: Transients)
  • Vibrations of Ventricular Folds or Ary-Epiglottic Ligaments, … (cause: “Parasitic” Vibrations)
  • Breathiness, Breathy Voice, Whispery Voice, … (cause: (Audible) Additive Noise Owing to Turbulence)
  • Vocal Jitter & Shimmer, Frequency & Amplitude Tremor (cause: Perturbations)
  • Diplophonia, Bi-Phonation, Random Vibrations (cause: Vocal Fold Dynamics)

slide-3
SLIDE 3

Existing Cues of Vocal Noise

  • Detection of individual vocal cycles (or harmonics)
  • Requirements:
    – Steady vowel fragments
    – (Pseudo)-Periodicity
  • Cues:
    – Period Perturbation Quotient
    – Amplitude Perturbation Quotient
    – Harmonics-to-Noise Ratio

slide-4
SLIDE 4

Objectives : Analyses of Dysperiodicities

  • Drop the requirement that speech fragments be:
    – (Pseudo)-Periodic
    – Steady
  • Analyse any speech fragment:
    – Modal Voices & (Very) Hoarse Voices
    – Sustained Vowels & Running Speech

slide-5
SLIDE 5

Motivation : Analysis of Running Speech

  • Voicing in running speech

    – Variable acoustic impedance
    – Voicing onsets & offsets
    – Variable pressure drops
    – Variable laryngeal positions

  • Voice Loading
slide-6
SLIDE 6

Double Linear Predictive Analysis

remove existing correlations ⇒ unpredictable noise component (Qi, 1999)

  • Conventional short-term linear prediction:

$$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n]$$

forward short-term prediction error, with x[n]: speech signal

  • Long-term linear prediction:

$$y'[n] = -\sum_{i=\ldots}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n]$$

forward double prediction error, with y[n] = e_S[n]
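The short-term stage of the double analysis can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the coefficients a_i are fitted with the autocorrelation method (Levinson-Durbin recursion), which the slide does not specify, and the names `lpc_coefficients` and `short_term_error` are hypothetical. The sign convention follows the slide: x'[n] = -Σ a_i x[n-i] and e_S[n] = x[n] - x'[n].

```python
def lpc_coefficients(x, order):
    """Levinson-Durbin recursion (assumed fitting method, not from the slides).

    Returns a_1..a_N such that x'[n] = -sum_i a_i * x[n-i].
    Assumes x is not all zeros and is not perfectly predictable.
    """
    # Autocorrelation values r[0..order]
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a[1:]


def short_term_error(x, a):
    """e_S[n] = x[n] - x'[n]; samples before the start of x count as zero."""
    e = []
    for n in range(len(x)):
        pred = -sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n >= i)
        e.append(x[n] - pred)
    return e


# Toy input: a decaying exponential standing in for a speech frame
x = [0.9 ** n for n in range(200)]
a = lpc_coefficients(x, order=2)
e_s = short_term_error(x, a)   # would serve as y[n] for the long-term stage
```

In the double analysis, `e_s` would then be passed to the long-term predictor as y[n].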

slide-7
SLIDE 7

Double Linear Predictive Analysis

$$x'[n] = -\sum_{i=1}^{N} a_i\, x[n-i] \;\Rightarrow\; e_S[n] = x[n] - x'[n]$$

$$y'[n] = -\sum_{i=\ldots}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L[n] = y[n] - y'[n]$$

Drawbacks:

– e_S[n] is an artificial signal
– the dysperiodicities in the weighted sum x′[n] are omitted
– e_L[n] is inflated to the right of unvoiced/voiced boundaries

⇒ Solutions:

⇒ remove the short-term linear predictive analysis stage
⇒ proceed to bi-directional analysis

slide-8
SLIDE 8
Bi-directional Long-term Prediction

with y[n] = x[n]: speech signal

  • Forward long-term linear prediction:

$$y'[n] = -\sum_{i=\ldots}^{M} b_i\, y[n-P-i] \;\Rightarrow\; e_L^{\rightarrow}[n] = y[n] - y'[n]$$

forward long-term prediction error

  • Backward long-term linear prediction:

$$y'[n] = -\sum_{i=\ldots}^{M} b_i\, y[n+P+i] \;\Rightarrow\; e_L^{\leftarrow}[n] = y[n] - y'[n]$$

backward long-term prediction error

  • Bi-directional long-term linear prediction: keep the “best” (frame by frame)

$$e_L[n] = \min\left( e_L^{\rightarrow}[n],\; e_L^{\leftarrow}[n] \right)$$

bi-directional long-term prediction error
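The frame-by-frame selection can be sketched as below. Several points are assumptions, not taken from the slides: the sum is taken over i = 0..M, the weights are fitted per frame by least squares with a tiny ridge term, "best" is read as "lower error energy in the frame", and the names `long_term_error` and `bidirectional_error` are hypothetical.

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small system A z = b."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / aug[r][r]
    return z


def long_term_error(y, start, stop, P, M, direction):
    """Error over frame [start, stop); direction=-1 uses past samples
    y[n-P-i], direction=+1 uses future samples y[n+P+i], i = 0..M (assumed)."""
    rows = [[y[n + direction * (P + i)] for i in range(M + 1)]
            for n in range(start, stop)]
    target = y[start:stop]
    K = M + 1
    # Normal equations for weights w_i = -b_i, with a small ridge for stability
    A = [[sum(r[a] * r[c] for r in rows) for c in range(K)] for a in range(K)]
    for a in range(K):
        A[a][a] += 1e-9 * (1.0 + A[a][a])
    rhs = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(K)]
    w = _solve(A, rhs)
    return [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, target)]


def bidirectional_error(y, P, M=2, frame=50, directions=(-1, 1)):
    """Per frame, keep the candidate error with the lowest energy."""
    out = []
    for start in range(0, len(y), frame):
        stop = min(start + frame, len(y))
        cands = []
        if -1 in directions and start - P - M >= 0:
            cands.append(long_term_error(y, start, stop, P, M, -1))
        if 1 in directions and stop - 1 + P + M < len(y):
            cands.append(long_term_error(y, start, stop, P, M, 1))
        if not cands:                      # no usable context: keep the signal
            cands.append(list(y[start:stop]))
        out.extend(min(cands, key=lambda c: sum(v * v for v in c)))
    return out
```

Calling `bidirectional_error(y, P, directions=(-1,))` gives a forward-only error for comparison. On a signal with a silence-to-voicing onset, the forward error is large in the first voiced frame, whereas the backward candidate can predict that frame from the following cycles.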

slide-9
SLIDE 9

Long-term Prediction Distance: P

P is set to the position of the maximum of the auto-correlation function

example: steady vowel [a] (dysphonic speaker)

⇒ P = 184 (2 cycles)
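Picking P from the auto-correlation maximum can be sketched as follows. The lag search range, the normalisation of the auto-correlation and the name `prediction_distance` are assumptions made for illustration; the slide only states that P is the position of the maximum.

```python
import math


def prediction_distance(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that maximises the normalised
    auto-correlation of x. Requires 1 <= min_lag <= max_lag < len(x)."""
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        head, tail = x[:-lag], x[lag:]
        num = sum(u * v for u, v in zip(head, tail))
        den = math.sqrt(sum(u * u for u in head) * sum(v * v for v in tail))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Synthetic periodic signal with a cycle length of 50 samples
x = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
     for n in range(400)]
P = prediction_distance(x, min_lag=20, max_lag=80)
```

As the slide's example shows (P = 184, 2 cycles), the maximum may fall on a multiple of the cycle length when the search range allows it.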

slide-10
SLIDE 10

Vocal Noise Cue

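Signal-to-Dysperiodicity Ratio:

$$\mathrm{SDR}_{dB} = 10 \log_{10} \frac{\sum_{n} x^2[n]}{\sum_{n} e_L^2[n]}$$

with x[n]: speech signal and e_L[n]: bi-directional long-term prediction error; the sums run over the analysis interval (the limits on the slide involve N and P)

[Figure: example, steady vowel [a]; speech signal x[n] and bi-directional long-term prediction error e_L[n] for a healthy speaker and a dysphonic speaker; reported SDR values: 31.2 dB and 10.1 dB]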
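The ratio itself is simple to compute once the prediction error is available. A minimal sketch, assuming both sequences cover the same analysis interval; the function name `sdr_db` is hypothetical.

```python
import math


def sdr_db(x, e):
    """Signal-to-dysperiodicity ratio in dB: 10*log10(sum x^2 / sum e^2).

    x: speech samples, e: prediction-error samples over the same interval.
    Raises ZeroDivisionError if the error energy is exactly zero.
    """
    signal_energy = sum(v * v for v in x)
    error_energy = sum(v * v for v in e)
    return 10.0 * math.log10(signal_energy / error_energy)


# An error ten times smaller in amplitude than the signal gives about 20 dB
value = sdr_db([3.0, 4.0], [0.3, 0.4])
```

A larger SDR means the prediction error carries less energy relative to the signal, i.e. less vocal noise.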

slide-11
SLIDE 11

Results 1 : Sentence (1 female speaker; modal phonation type) (http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)

[Figure: segments [il]; panels: speech signal, forward long-term prediction error, bi-directional long-term prediction error]

slide-12
SLIDE 12

Results 2 : Sentence (1 female speaker; 5 phonation types) (http://www.limsi.fr/VOQUAL/ : “Il est sorti avant le jour”)

Direction        Signal    Double prediction    Long-term prediction
                           SDR (dB)             SDR (dB)
Bi-directional   Modal     25.7                 19.5
Bi-directional   Rough1    16.9                 11.4
Bi-directional   Rough2    13.9                 8.0
Bi-directional   Rough3    9.8                  3.6
Bi-directional   Whisper   9.5                  3.2
Forward          Modal     25.4                 16.2
Forward          Rough1    16.8                 10.3
Forward          Rough2    13.7                 6.9
Forward          Rough3    9.6                  2.7
Forward          Whisper   9.3                  1.8

SDR bi-directional > SDR forward
SDR double > SDR long-term

slide-13
SLIDE 13

Conclusion

  • Forward & backward long-term prediction of speech enables the analysis of any speech signal with a view to the assessment of vocal noise (i.e. vocal dysperiodicities)
  • The analysis is not based on any assumptions regarding the periodicity or stationarity of the speech signals