

SLIDE 1

Lecture 17: LPC speech synthesis and autocorrelation-based pitch tracking

ECE 401, Signal and Image Analysis November 5, 2020

SLIDE 2

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 3

The LPC-10 speech synthesis model

SLIDE 4

The LPC-10 Speech Coder: Transmitted Parameters

Each frame is 54 bits, and is used to synthesize 22.5 ms of speech: (54 bits/frame)/(0.0225 seconds/frame) = 2400 bits/second.

  • Pitch: 7 bits/frame (127 distinguishable non-zero pitch periods)
  • Energy: 5 bits/frame (32 levels, on a log-energy scale)
  • 10 linear predictive coefficients (LPC): 41 bits/frame
  • Synchronization: 1 bit/frame
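The bit budget above can be checked directly (a trivial sketch; the variable names are illustrative):

```python
# Bit budget of one LPC-10 frame, as listed above:
bits_per_frame = 7 + 5 + 41 + 1      # pitch + energy + LPC + sync
print(bits_per_frame)                # → 54
# Each frame synthesizes 22.5 ms of speech:
print(bits_per_frame * 1000 / 22.5)  # → 2400.0 bits/second
```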
SLIDE 5

The LPC-10 speech synthesis model

๐ผ(๐‘“!")

Vocal Tract: Modeled by an LPC synthesis Filter.

๐‘ก[๐‘œ]

๐‘“ ๐‘œ = $

!"#$ $

๐œ€ ๐‘œ โˆ’ ๐‘ž๐‘„ ๐‘“ ๐‘œ ~๐’ช 0,1 Voiced Speech, pitch period P Unvoiced Speech Binary Control Switch: Voiced (P>0) vs. Unvoiced (P=0)

G

๐ป

Gain= ๐‘“%&'()*

slide-6
SLIDE 6

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 7

The LPC-10 speech synthesis model

๐ผ(๐‘“!")

Vocal Tract: Modeled by an LPC synthesis Filter.

๐‘ก[๐‘œ]

๐‘“ ๐‘œ = $

!"#$ $

๐œ€ ๐‘œ โˆ’ ๐‘ž๐‘„ ๐‘“ ๐‘œ ~๐’ช 0,1 Voiced Speech, pitch period P Unvoiced Speech Binary Control Switch: Voiced vs. Unvoiced

G

๐ป

Gain= ๐‘“%&'()*

slide-8
SLIDE 8

Unvoiced speech: e[n]=white noise

  • Use zero-mean, unit-variance Gaussian white noise, e[n] ~ N(0, 1)
  • The choice to use unvoiced speech is communicated by the special code word P = 0

By Morn - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index. php?curid=24084756

SLIDE 9

Voiced speech: e[n]=pulse train

  • The basic idea: an impulse train with pitch period P,

e[n] = Σ_{k=−∞}^{∞} δ[n − kP]

  • Modification #1: in order for the average energy to equal 1.0, we need to scale each pulse by √P (one pulse of amplitude √P every P samples gives an average power of P/P = 1):

e[n] = √P Σ_{k=−∞}^{∞} δ[n − kP]

SLIDE 10

Modification #2: the first pulse is not at n=0

Pitch period = 80 samples ⇒ first pulse in frame 31 can't occur until the 70th sample of the frame


SLIDE 11

A mechanism for keeping track of pitch phase from one frame to the next

  • Start out, at the beginning of the speech, with a pitch phase equal to zero:

φ[0] = 0

  • For every sample thereafter:
  • If the sample is unvoiced (P[n] = 0), don't increment the pitch phase
  • If the sample is voiced (P[n] > 0), then increment the pitch phase:

φ[n] = φ[n−1] + 2π/P[n]

  • Every time the phase passes a multiple of 2π, output a pitch pulse:

e[n] = √(P[n])  if ⌊φ[n]/2π⌋ − ⌊φ[n−1]/2π⌋ > 0,  and  e[n] = 0  otherwise
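A minimal sketch of this phase-accumulation mechanism, assuming a per-sample pitch-period array P[n] (the function name is illustrative; unvoiced noise filling is omitted):

```python
import numpy as np

def pitch_phase_excitation(P):
    """Pulse-train excitation from a per-sample pitch-period array P[n].

    A pulse of height sqrt(P[n]) is emitted whenever the accumulated
    phase crosses a multiple of 2*pi; unvoiced samples (P[n] = 0) leave
    the phase unchanged.
    """
    phase = 0.0
    e = np.zeros(len(P))
    for n, Pn in enumerate(P):
        if Pn > 0:  # voiced: advance the phase by 2*pi/P[n]
            new_phase = phase + 2 * np.pi / Pn
            if np.floor(new_phase / (2 * np.pi)) > np.floor(phase / (2 * np.pi)):
                e[n] = np.sqrt(Pn)  # the phase just crossed a 2*pi level
            phase = new_phase
    return e

# Constant pitch period of 80 samples: pulses land 80 samples apart
e = pitch_phase_excitation(np.full(250, 80))
print(np.nonzero(e)[0])
```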

SLIDE 12

The pitch phase method: generate an excitation pulse whenever pitch phase crosses a 2๐œŒ-level

[Figure: the phase φ[n] increases linearly with sample number n; each time it crosses one of the levels 2π, 4π, 6π, 8π, a pulse appears in e[n].]

SLIDE 13

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 14

Speech is predictable

  • Speech is not just white noise and pulse train. In fact, each sample is highly predictable from the previous samples:

x[n] ≈ Σ_{m=1}^{10} a_m x[n−m]

  • In fact, the pitch pulses are the only major exception to this predictability!

SLIDE 15

Linear predictive coding (LPC)

The LPC idea:

  • 1. Model the excitation as error:

e[n] = x[n] − Σ_{m=1}^{10} a_m x[n−m]

  • 2. Force the coefficients a_m to explain as much as they can, so that e[n] is as close to zero as possible.

[Figure: the error signal e[n] and the speech signal x[n].]

SLIDE 16

Linear predictive coding (LPC)

๐œ = ๐น ๐‘“![๐‘œ] = ๐น ๐‘ฆ ๐‘œ โˆ’ *

"#$ $%

๐›ฝ"๐‘ฆ[๐‘œ โˆ’ ๐‘—]

!

๐œ–๐œ ๐œ–๐›ฝ& = โˆ’2๐น ๐‘ฆ ๐‘œ โˆ’ ๐‘˜ ๐‘ฆ ๐‘œ โˆ’ *

"#$ $%

๐›ฝ"๐‘ฆ ๐‘œ โˆ’ ๐‘— Setting '(

')+ = 0 gives

๐น ๐‘ฆ ๐‘œ โˆ’ ๐‘˜ ๐‘ฆ[๐‘œ] = *

"#$ $%

๐›ฝ"๐น ๐‘ฆ ๐‘œ โˆ’ ๐‘˜ ๐‘ฆ[๐‘œ โˆ’ ๐‘—]

๐‘†,, ๐‘˜ ๐‘†,, |๐‘— โˆ’ ๐‘˜|

SLIDE 17

Linear predictive coding (LPC)

So we have a set of linked equations, for 1 ≤ k ≤ 10:

R_xx(k) = Σ_{j=1}^{10} a_j R_xx(|j−k|)

  • We can write these 10 equations as a 10×10 matrix equation: γ⃗ = R a⃗
  • …which immediately gives the solution: a⃗ = R⁻¹ γ⃗
  • …where

γ⃗ = [R_xx(1), …, R_xx(10)]ᵀ,
R = [ R_xx(0)  R_xx(1)  ⋯ ;  R_xx(1)  R_xx(0)  ⋯ ;  ⋮  ⋮  ⋱  R_xx(0) ]  (a Toeplitz matrix),
a⃗ = [a_1, …, a_10]ᵀ

SLIDE 18

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 19

Speech -> Excitation -> Speech

Now that we know how to find the LPC coefficients, we can imagine an end-to-end LPC analysis-by-synthesis:

LPC synthesis ๐‘ก[๐‘œ] ๐‘“[๐‘œ] Model excitation using pulse train and white noise LPC analysis ๐‘“[๐‘œ] ๐‘ฆ[๐‘œ]

๐‘“ ๐‘œ = ๐‘ฆ ๐‘œ โˆ’ *

,#$ $%

๐›ฝ,๐‘ฆ[๐‘œ โˆ’ ๐‘›] ๐‘ก ๐‘œ = ๐‘“ ๐‘œ + *

,#$ $%

๐›ฝ,๐‘ก[๐‘œ โˆ’ ๐‘›]

SLIDE 20

The LPC Analysis Filter

The LPC analysis filter is an all-zero filter (FIR = finite impulse response):

e[n] = x[n] − Σ_{m=1}^{10} a_m x[n−m]  ↔  E(z) = A(z) X(z)

…where…

A(z) = 1 − Σ_{m=1}^{10} a_m z^{−m}

SLIDE 21

The LPC Synthesis Filter

The LPC synthesis filter is an all-pole filter (IIR = infinite impulse response):

s[n] = e[n] + Σ_{m=1}^{10} a_m s[n−m]  ↔  S(z) = H(z) E(z)

…where…

H(z) = 1/A(z) = 1 / (1 − Σ_{m=1}^{10} a_m z^{−m})

SLIDE 22

Speech -> Excitation -> Speech

1 ๐ต(๐‘จ) ๐‘ก[๐‘œ] ๐‘“[๐‘œ] Excitation Model ๐ต ๐‘จ ๐‘“[๐‘œ] ๐‘ฆ[๐‘œ]

SLIDE 23

The Stability Problem

  • The analysis filter is guaranteed to be stable, as long as the coefficients are finite. Suppose you know that |x[n]| ≤ x_max and |a_m| ≤ a_max. Then, even in the worst possible case, |e[n]| ≤ 11 a_max x_max.
  • The synthesis filter has no such guarantee. For example, suppose e[n] is just a delta function (e[n] = δ[n]), and suppose all of the a_m = 0 except the first one, a_1 = 1.1. Then

s[n] = δ[n] + 1.1 s[n−1] = 1.1ⁿ,

which overflows your 16-bit sample buffer after only 110 samples (1.1¹¹⁰ ≈ 36,000 > 32,767). Your output will be full of NaNs, and you'll be saying "What happened…?"
SLIDE 24

How to Guarantee Stability

Fortunately, the LPC synthesis filter is causal, so it's easy to guarantee stability. We just need to make sure that all of the poles have magnitude less than 1:

|p_k| < 1

We find the poles like this:

H(z) = 1/A(z) = 1 / (1 − Σ_{m=1}^{10} a_m z^{−m}) = 1 / Π_{k=1}^{10} (1 − p_k z^{−1})

in other words, p_k = roots(A(z)), which you can compute using np.roots, if you define the polynomial correctly. Then you just truncate the magnitude,

p_k ← min(|p_k|, 0.999) e^{j∠p_k}

…and then use np.poly to convert back from roots to polynomial.

SLIDE 25

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 26

Autocorrelation is maximum at n=0

๐‘ 

** ๐‘œ =

*

,#+-

  • ๐‘ฆ ๐‘› ๐‘ฆ[๐‘› โˆ’ ๐‘œ]
SLIDE 27

Autocorrelation of a periodic signal

Suppose x[n] is periodic, x[n] = x[n−P]. Then the autocorrelation is also periodic:

r_xx[P] = Σ_{m=−∞}^{∞} x[m] x[m−P] = Σ_{m=−∞}^{∞} x²[m] = r_xx[0]

SLIDE 28

Autocorrelation of a periodic signal is periodic

[Figure: a voiced waveform and its autocorrelation; in both panels the pitch period is 9 ms = 99 samples.]

SLIDE 29

Autocorrelation pitch tracking

  • Compute the autocorrelation, r_xx[n]
  • Find the pitch period:

P = argmax_{P_min ≤ n ≤ P_max} r_xx[n]

  • The search limits, P_min and P_max, are important for good performance:
  • P_min corresponds to a high woman's pitch, about F_s/P_min ≈ 250 Hz
  • P_max corresponds to a low man's pitch, about F_s/P_max ≈ 80 Hz

[Figure: an autocorrelation, with the search range from P_min to P_max marked.]
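A minimal sketch of the tracker for a single frame, assuming illustrative search limits of 80-250 Hz (the function name and test signal are not from the lecture):

```python
import numpy as np

def track_pitch(frame, fs=8000, f_min=80.0, f_max=250.0):
    """Autocorrelation pitch tracker for one frame.

    Searches r_xx[n] for its maximum over pitch periods between
    P_min = fs/f_max and P_max = fs/f_min, and returns the period.
    """
    # r_xx[0...] : non-negative lags of the autocorrelation
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    P_min = int(fs / f_max)            # shortest period = highest pitch
    P_max = int(fs / f_min)            # longest period = lowest pitch
    return P_min + np.argmax(r[P_min : P_max + 1])

# A 125 Hz sawtooth at fs = 8000 Hz has a period of 64 samples:
fs, P_true = 8000, 64
n = np.arange(4 * P_true)
frame = (n % P_true) / P_true - 0.5
print(track_pitch(frame, fs))  # → 64
```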

SLIDE 30

The LPC-10 speech synthesis model

๐ผ(๐‘“!")

Vocal Tract: Modeled by an LPC synthesis Filter.

๐‘ก[๐‘œ]

๐‘“ ๐‘œ = $

!"#$ $

๐œ€ ๐‘œ โˆ’ ๐‘ž๐‘„ ๐‘“ ๐‘œ ~๐’ช 0,1 Voiced Speech, pitch period P Unvoiced Speech Binary Control Switch: Voiced (P>0) vs. Unvoiced (P=0)

G

๐ป

Gain= ๐‘“%&'()*

SLIDE 31

The voiced/unvoiced decision

  • ๐‘ฆ[๐‘œ] voiced: ๐‘ 

** ๐‘„ โ‰ˆ ๐‘  ** 0

  • ๐‘ฆ[๐‘œ] unvoiced (white noise):

๐‘ 

,, ๐‘œ โ‰ˆ ๐œ€[๐‘œ]

which means that ๐‘ 

,, ๐‘„ โ‰ช ๐‘  ,, 0

So a reasonable V/UV decision is:

  • .-- /

.-- % โ‰ฅ ๐‘ขโ„Ž๐‘ ๐‘“๐‘กโ„Ž๐‘๐‘š๐‘’: say the frame is voiced.

  • .-- /

.-- % < ๐‘ขโ„Ž๐‘ ๐‘“๐‘กโ„Ž๐‘๐‘š๐‘’: say the frame is

unvoiced. Setting threshold~0.25 works reasonably well.

voiced: ๐‘ฆ ๐‘œ + ๐‘„ โ‰ˆ ๐‘ฆ ๐‘œ unvoiced: E[๐‘ฆ ๐‘› ๐‘ฆ ๐‘› โˆ’ ๐‘œ ] โ‰ˆ ๐œ€[๐‘œ]

SLIDE 32

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours
SLIDE 33

Inter-frame interpolation of pitch contours

We don't want the pitch period to change suddenly at frame boundaries; it sounds weird.

[Figure: piecewise-constant pitch period vs. sample number n, with discontinuities at the frame boundaries.]

SLIDE 34

Inter-frame interpolation of pitch contours

  • Linear interpolation sounds much better. We can accomplish linear interpolation using a formula like

P[n] = (1 − c) P_t + c P_{t+1}

where

  • P_t is the pitch period in frame t
  • c = (n − tS)/S is how far sample n is from the beginning of frame t
  • S is the frame skip.

[Figure: the interpolated pitch period now changes smoothly across the frame boundaries.]
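This formula is exactly what np.interp computes; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def interpolate_contour(values, S):
    """Linearly interpolate a per-frame contour to a per-sample contour.

    values[t] is the parameter (e.g. pitch period) in frame t; S is the
    frame skip in samples. Sample n in frame t gets
    (1 - c) * values[t] + c * values[t + 1], with c = (n - t*S) / S.
    """
    t = np.arange(len(values)) * S            # sample index of each frame start
    n = np.arange((len(values) - 1) * S + 1)  # all samples to fill in
    return np.interp(n, t, values)

# Pitch period 80 in frame 0, 90 in frame 1, frame skip S = 4:
print(interpolate_contour([80.0, 90.0], S=4))  # → [80, 82.5, 85, 87.5, 90]
```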

SLIDE 35

Inter-frame interpolation of energy

Linear interpolation is also useful for energy, EXCEPT: it sounds better if we interpolate log energy, not energy:

log E_t = log( (1/M) Σ_{m=tS}^{tS+M−1} x²[m] )

SLIDE 36

Outline

  • The LPC-10 speech synthesis model
  • The LPC-10 excitation model: white noise, pulse train
  • Linear predictive coding: how to find the coefficients
  • Linear predictive coding: how to make sure the coefficients are stable
  • Autocorrelation-based pitch tracking
  • Inter-frame interpolation of pitch and energy contours