Modeling speech using pole-zero models
Christian H. Kasess
Acoustics Research Institute
25.10.2012
Kasess (ARI) Vocal tract modeling SPL 2012 1 / 31
The vocal tract
Roughly divided into three cavities: pharyngeal, oral, nasal
Oral vowel
http://pegasus.cc.ucf.edu/ cnye/vocal tract pic.htm
http://health.tau.ac.il/Communication Disorders/noam
Correlation function... $\gamma(i) = E[y(n)\,y(n-i)]$
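The expectation in the correlation function is usually replaced by a time average over a finite frame. The following is a minimal sketch of that idea, assuming a biased estimator (division by the frame length); the function name `autocorrelation` and the choice of estimator are illustrative assumptions, not taken from the slides.

```python
def autocorrelation(y, max_lag):
    """Biased sample estimate of gamma(i) = E[y(n) y(n - i)] for i = 0..max_lag.

    Assumption: the expectation is approximated by an average over the frame,
    normalized by the full frame length n (the 'biased' estimator).
    """
    n = len(y)
    return [
        sum(y[k] * y[k - i] for k in range(i, n)) / n
        for i in range(max_lag + 1)
    ]


# Small worked example: gamma(0) = 30 / 4 = 7.5, gamma(1) = 20 / 4 = 5.0
gamma = autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=1)
```

The biased form is a common choice for linear prediction because it yields a positive semidefinite correlation matrix; an unbiased variant would divide by `n - i` instead.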
[Figure: level [dB] vs. f [Hz]. Legend: (15,0), RMS = 0.56; (10,5), RMS = 0.46; (15,5), RMS = 0.45; (20,20), RMS = 0.2]
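[Equations, only partly recoverable. Surviving terms, each with upper index $K-1$:]
$\ldots\; \frac{A(e^{i\omega_k},\theta')}{A(e^{i\omega_k},\theta_{i-1})} - \frac{B(e^{i\omega_k},\theta')}{A(e^{i\omega_k},\theta_{i-1})} \;\ldots$
$\ldots\; \frac{B(e^{i\omega_k},\theta')}{A(e^{i\omega_k},\theta')} \;\ldots\; A(e^{i\omega_k},\theta_{i-1}) \;\ldots$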
$\frac{\ldots}{A_m + A_{m+1}}$ and $z := \exp\left(i 2\pi \frac{f}{f_s}\right) = \exp\left(i 2\pi \frac{f}{c/(2l)}\right)$
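The relation between tube geometry and the variable z can be illustrated numerically. The sketch below assumes the usual concatenated-tube picture: sections of length l, speed of sound c, and effective sampling rate f_s = c / (2l). The numerator of the area ratio is not recoverable from the slide, so the sign convention (A_m - A_{m+1}) used here is an assumption, as are the function names and the default value of c.

```python
import cmath


def reflection_coefficients(areas):
    """Reflection coefficients at the junctions of adjacent tube sections.

    ASSUMPTION: sign convention (A_m - A_{m+1}) / (A_m + A_{m+1});
    the slide only preserves the denominator A_m + A_{m+1}.
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def delay_variable(f, section_length, c=350.0):
    """z = exp(i 2 pi f / f_s) with f_s = c / (2 l).

    ASSUMPTION: c = 350 m/s is an illustrative default for the speed of
    sound; section_length is in metres, f in Hz.
    """
    fs = c / (2.0 * section_length)
    return cmath.exp(2j * cmath.pi * f / fs)


# Equal neighbouring areas give no reflection; a widening gives a negative value
mu = reflection_coefficients([1.0, 1.0, 3.0])  # [0.0, -0.5]

# With l = 1 cm, f_s = 17500 Hz, so f = 8750 Hz lies at half the sampling rate
z = delay_variable(8750.0, 0.01)  # approximately -1
```

Under the opposite sign convention, all reflection coefficients simply change sign; the mapping from f to z is unaffected.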
[Figure labels: glottis, pharynx, velum, nasal cavity]
$\hat{B}(\mu,z) / \hat{A}(\mu,z)$
[Figure: reflection coefficients µ1 to µ6 for σ² = 0.02, σ² = 0.1, and GN; level [dB] vs. frequency [Hz] for the same three cases]
[Figure: two panels, level [dB] vs. frequency [Hz]; legend in each panel: σ² = 0.02, σ² = 0.1, GN]
[Figure: cross-section vs. distance from glottis [cm], one panel each for speakers 1, 2, and 3; labels in each panel: nasal, pharyngeal, /m/, /n/, IQR]