Lecture 17: LPC speech synthesis and autocorrelation-based pitch tracking
ECE 401, Signal and Image Analysis November 5, 2020
Outline
The LPC-10 speech synthesis model
The LPC-10 excitation model: white noise, pulse train
Linear …
Excitation, voiced speech (pitch period P): $e[n] = \sum_{p=-\infty}^{\infty} \delta[n - pP]$
Excitation, unvoiced speech: $e[n] \sim \mathcal{N}(0, 1)$
Binary control switch: voiced (P>0) vs. unvoiced (P=0)
Gain: G
Vocal tract: modeled by an LPC synthesis filter.
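The signal flow of this model (excitation, then gain, then the synthesis filter) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the lecture's code: the function name `lpc_synthesize` and its argument names are hypothetical, and the recursion assumes the synthesis filter has the form s[n] = G e[n] + sum over k of alpha_k s[n-k].

```python
import numpy as np

def lpc_synthesize(excitation, alpha, gain=1.0):
    """Run an excitation signal through an all-pole LPC synthesis filter.

    Assumed recursion: s[n] = gain * e[n] + sum_k alpha[k-1] * s[n-k].
    """
    excitation = np.asarray(excitation, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        s[n] = gain * excitation[n]
        # Add the contribution of previously synthesized samples.
        for k in range(1, len(alpha) + 1):
            if n - k >= 0:
                s[n] += alpha[k - 1] * s[n - k]
    return s

# Impulse response of a one-pole filter with alpha_1 = 0.5.
impulse_response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], alpha=[0.5])
```

For voiced frames the excitation would be a pulse train, and for unvoiced frames it would be white noise; the filter itself does not change.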
Image credit: By Morn - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24084756
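A pulse train with one unit pulse every P samples,
$e[n] = \sum_{p=-\infty}^{\infty} \delta[n - pP]$
has an average energy of only 1/P per sample. For the average energy to equal 1.0, we need to scale each pulse by $\sqrt{P}$:
$e[n] = \sqrt{P} \sum_{p=-\infty}^{\infty} \delta[n - pP]$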
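A quick numerical check of this scaling: the sketch below builds a pulse train whose pulses have height sqrt(P) and measures its average energy per sample. The function name `pulse_train` and the signal length are illustrative choices, not taken from the lecture.

```python
import numpy as np

def pulse_train(num_samples, period):
    """One pulse of height sqrt(period) every `period` samples, zeros elsewhere."""
    e = np.zeros(num_samples)
    e[::period] = np.sqrt(period)
    return e

e = pulse_train(num_samples=8000, period=80)
average_energy = np.mean(e ** 2)  # should be very close to 1.0
```

Each pulse contributes energy P, and there is one pulse per P samples, so the average comes out to 1 regardless of the period.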
Pitch period = 80 samples → the first pulse in frame 31 can't occur until the 70th sample of the frame.
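$\phi[0] = 0$
$\phi[n] = \phi[n-1] + \frac{2\pi}{P[n]}$
$e[n] = \begin{cases} \sqrt{P[n]} & \left\lfloor \frac{\phi[n]}{2\pi} \right\rfloor - \left\lfloor \frac{\phi[n-1]}{2\pi} \right\rfloor > 0 \\ 0 & \text{else} \end{cases}$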
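A phase accumulator of this kind can be sketched as follows. The sketch assumes the phase advances by 2*pi/P[n] per sample and that a pulse of height sqrt(P[n]) is emitted whenever the phase crosses a multiple of 2*pi; the function name `voiced_excitation` is hypothetical. Because the phase is carried from sample to sample, pulse spacing stays correct across frame boundaries even when the pitch period changes.

```python
import numpy as np

def voiced_excitation(pitch_period):
    """pitch_period[n] > 0 is the pitch period (in samples) in effect at sample n."""
    P = np.asarray(pitch_period, dtype=float)
    phase = np.zeros(len(P))   # phase[0] = 0
    e = np.zeros(len(P))
    for n in range(1, len(P)):
        phase[n] = phase[n - 1] + 2 * np.pi / P[n]
        # Emit a pulse when the phase has passed a new multiple of 2*pi.
        if np.floor(phase[n] / (2 * np.pi)) > np.floor(phase[n - 1] / (2 * np.pi)):
            e[n] = np.sqrt(P[n])
    return e, phase

e, phase = voiced_excitation(np.full(801, 64.0))
pulse_positions = np.flatnonzero(e)  # roughly every 64 samples
```

Floating-point rounding can move an individual pulse by one sample, so the spacing is 64 samples only to within plus or minus one.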
[Figure: phase $\phi[n]$ and excitation $e[n]$ versus sample number n; the phase axis is marked at $2\pi$, $4\pi$, $6\pi$, $8\pi$.]
A speech signal is not just white noise or a pulse train. In fact, each sample is highly predictable from previous samples:
$x[n] \approx \sum_{k=1}^{10} \alpha_k x[n-k]$
LPC takes advantage of this predictability!
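The LPC idea: define the prediction error
$e[n] = x[n] - \sum_{k=1}^{10} \alpha_k x[n-k]$
and choose the coefficients $\alpha_k$ so that the previous samples explain as much as they can, so that $e[n]$ is as close to zero as possible.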
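Minimizing the total squared error,
$\mathcal{E} = \sum_n e^2[n] = \sum_n \left( x[n] - \sum_{k=1}^{10} \alpha_k x[n-k] \right)^2$
by setting $\partial \mathcal{E} / \partial \alpha_m = 0$ gives
$R_{xx}[m] = \sum_{k=1}^{10} \alpha_k R_{xx}[|m - k|]$, for $1 \le m \le 10$
where $R_{xx}[m]$ is the autocorrelation of $x[n]$.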
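The minimization can be carried out numerically by solving the standard LPC normal equations, R[m] = sum over k of alpha_k R[|m-k|], as a linear system. The sketch below is one way to do that with NumPy; it assumes the autocorrelation is computed directly from a finite frame, and the function name `lpc_coefficients` is hypothetical.

```python
import numpy as np

def lpc_coefficients(x, order=10):
    """Solve R[m] = sum_k alpha_k R[|m-k|], m = 1..order, for the alpha_k."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation R[m] = sum_n x[n] x[n-m] for lags 0..order.
    R = np.array([np.dot(x[m:], x[:len(x) - m]) for m in range(order + 1)])
    # Matrix whose (m, k) entry is R[|m - k|].
    lag = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(R[lag], R[1:])

# Toy check: impulse response of s[n] = 1.6 s[n-1] - 0.8 s[n-2] + delta[n].
s = np.zeros(500)
for n in range(500):
    s[n] = (1.0 if n == 0 else 0.0)
    if n >= 1:
        s[n] += 1.6 * s[n - 1]
    if n >= 2:
        s[n] -= 0.8 * s[n - 2]
alpha = lpc_coefficients(s, order=2)  # close to [1.6, -0.8]
```

The toy signal uses order 2 because it was generated by a two-pole filter; asking for more coefficients than the signal supports can make the matrix nearly singular.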
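The analysis filter, which computes $e[n]$ from $x[n]$, is FIR, so it is always stable: even in the worst possible case, $|e[n]| \le 11\, \alpha_{\max} x_{\max}$.
The synthesis filter is IIR, so it can be unstable. For example, suppose the excitation is just a delta function ($e[n] = \delta[n]$), and suppose all of the $\alpha_k = 0$ except the first one, $\alpha_1 = 1.1$. Then
$s[n] = e[n] + 1.1\, s[n-1] = 1.1^n$
which overflows your 16-bit sample buffer after only 110 samples.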
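The overflow point is easy to confirm by running the recursion directly. The sketch below uses plain Python floats and compares against the largest signed 16-bit value; the function name `first_overflow_index` is illustrative.

```python
INT16_MAX = 32767  # largest value a signed 16-bit sample can hold

def first_overflow_index(a1=1.1):
    """Return the first n at which s[n] = e[n] + a1 * s[n-1] exceeds INT16_MAX."""
    s = 0.0
    n = 0
    while True:
        e = 1.0 if n == 0 else 0.0   # e[n] = delta[n]
        s = e + a1 * s               # s[n] = a1 ** n for this input
        if s > INT16_MAX:
            return n
        n += 1

print(first_overflow_index())  # prints 110
```

A single pole just outside the unit circle is enough to make the output grow without bound, even though the input is a single unit impulse.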
Fortunately, the LPC synthesis filter is causal, so it's easy to guarantee stability. We just need to make sure that all of the poles have magnitude less than 1:
$|r_k| < 1$
We find the poles like this:
$H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{k=1}^{10} \alpha_k z^{-k}} = \frac{1}{\prod_{k=1}^{10} \left( 1 - r_k z^{-1} \right)}$
in other words, $r_k = \text{roots}(A(z))$
…which you can do using np.roots, if you define the polynomial correctly. Then you just truncate the magnitude,
$r_k \leftarrow \min(|r_k|, 0.999)\, e^{j \angle r_k}$
…and then use np.poly to convert back from roots to polynomial.
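One way to carry out these steps with np.roots and np.poly is sketched below. It assumes the coefficients are stored as alpha_1 through alpha_order, so that the polynomial passed to np.roots is [1, -alpha_1, ..., -alpha_order] (highest power first); the function name `stabilize` and the `max_radius` parameter are hypothetical.

```python
import numpy as np

def stabilize(alpha, max_radius=0.999):
    """Pull any pole of 1/A(z) with magnitude above max_radius back inside the unit circle."""
    alpha = np.asarray(alpha, dtype=float)
    # z^order * A(z) has coefficients [1, -alpha_1, ..., -alpha_order].
    a_poly = np.concatenate(([1.0], -alpha))
    roots = np.roots(a_poly)
    # Truncate the magnitude but keep the angle of each pole.
    radius = np.minimum(np.abs(roots), max_radius)
    roots = radius * np.exp(1j * np.angle(roots))
    # Convert back to polynomial coefficients; imaginary parts are rounding noise
    # because the poles of a real filter come in conjugate pairs.
    new_poly = np.real(np.poly(roots))
    return -new_poly[1:]

fixed = stabilize([1.1])  # the unstable one-pole example becomes alpha_1 = 0.999
```

Coefficients whose poles are already inside the limit come back unchanged, up to rounding.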
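The autocorrelation of $x[n]$ is
$R_{xx}[m] = \sum_{n=-\infty}^{\infty} x[n]\, x[n-m]$
If $x[n]$ is nearly periodic with period P, then
$R_{xx}[P] = \sum_{n=-\infty}^{\infty} x[n]\, x[n-P] \approx \sum_{n=-\infty}^{\infty} x^2[n] = R_{xx}[0]$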
[Figure annotation: pitch period = 9 ms = 99 samples.]
$P = \arg\max_{P_{\min} \le m \le P_{\max}} R_{xx}[m]$
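The argmax search can be sketched directly: compute the autocorrelation at every candidate lag and keep the lag with the largest value. The function name `estimate_pitch_period` is hypothetical, and the sampling rate of 11025 Hz in the usage lines is an assumption inferred from "9 ms = 99 samples", not a value stated in the lecture text.

```python
import numpy as np

def estimate_pitch_period(x, p_min, p_max):
    """Return the lag in [p_min, p_max] that maximizes the autocorrelation of x."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(p_min, p_max + 1)
    # R[m] = sum_n x[n] x[n-m], computed only at the candidate lags.
    R = np.array([np.dot(x[m:], x[:len(x) - m]) for m in lags])
    return int(lags[np.argmax(R)])

fs = 11025                      # assumed sampling rate, in Hz
p_min = int(round(fs / 250))    # shortest period searched (about 250 Hz)
p_max = int(round(fs / 80))     # longest period searched (about 80 Hz)

frame = np.zeros(600)
frame[::99] = 1.0               # toy "voiced" frame: one pulse every 99 samples
period = estimate_pitch_period(frame, p_min, p_max)
```

Restricting the search to a lag range matters: lag 0 always has the largest autocorrelation, and multiples of the true period also produce peaks.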
The search limits, $P_{\min}$ and $P_{\max}$, are important for good performance:
$P_{\min}$ corresponds to a high woman's pitch, about $F_s / P_{\min} \approx 250$ Hz
$P_{\max}$ corresponds to a low man's pitch, about $F_s / P_{\max} \approx 80$ Hz
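Voiced speech is nearly periodic, $x[n+P] \approx x[n]$, which means that
$R_{xx}[P] \approx R_{xx}[0]$
Unvoiced speech is like white noise, $E[x[n]\, x[n-m]] \approx \delta[m]$, so
$R_{xx}[m] \approx \delta[m]$
which means that
$R_{xx}[P] \ll R_{xx}[0]$
If $R_{xx}[P] / R_{xx}[0] \ge$ threshold: say the frame is voiced.
If $R_{xx}[P] / R_{xx}[0] <$ threshold: say the frame is unvoiced.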
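The voiced/unvoiced decision reduces to comparing one autocorrelation ratio against a threshold. The sketch below is a minimal version; the function name `is_voiced` and the default threshold of 0.3 are assumptions for illustration, since the text does not give a threshold value.

```python
import numpy as np

def is_voiced(x, pitch_period, threshold=0.3):
    """Compare R[P] / R[0] against a threshold (the default value is hypothetical)."""
    x = np.asarray(x, dtype=float)
    r0 = np.dot(x, x)                                            # R[0], the frame energy
    rp = np.dot(x[pitch_period:], x[:len(x) - pitch_period])     # R[P]
    if r0 <= 0:
        return False   # an all-zero frame is treated as unvoiced
    return bool(rp / r0 >= threshold)

periodic = np.zeros(600)
periodic[::99] = 1.0                                  # toy voiced frame
noise = np.random.default_rng(0).standard_normal(600)  # toy unvoiced frame

voiced_flags = (is_voiced(periodic, 99), is_voiced(noise, 99))
```

In practice the threshold would be tuned on real speech; a value too high marks breathy voiced frames as unvoiced, and a value too low marks noise as voiced.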
[Figure: pitch period versus sample number n, with frame boundaries marked.]
Linear interpolation sounds much better. Use linear interpolation with a formula like
$P[n] = (1 - f) P_t + f P_{t+1}$
where
$P_t$ is the pitch period in frame t, and
$f = \frac{n - tL}{L}$ is how far sample n is from the beginning of frame t, as a fraction of the frame length L.
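The interpolation formula can be applied to a whole utterance at once. This sketch assumes every frame is voiced (a nonzero pitch period) and that frames have a fixed length in samples; the function name `interpolate_pitch` and its argument names are hypothetical.

```python
import numpy as np

def interpolate_pitch(frame_pitch, frame_length):
    """Per-sample pitch period, linearly interpolated between consecutive frames."""
    frame_pitch = np.asarray(frame_pitch, dtype=float)
    num_frames = len(frame_pitch)
    n = np.arange((num_frames - 1) * frame_length)   # sample indices
    t = n // frame_length                            # frame that contains sample n
    f = (n - t * frame_length) / frame_length        # fraction of the way through frame t
    return (1 - f) * frame_pitch[t] + f * frame_pitch[t + 1]

pitch_per_sample = interpolate_pitch([80, 100], frame_length=4)
# pitch_per_sample is [80, 85, 90, 95]
```

Unvoiced frames, which the model marks with P = 0, would need separate handling, since interpolating toward zero has no physical meaning.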
[Figure: linearly interpolated pitch period versus sample number n, with frame boundaries marked.]