A Spectral-Temporal Method for Pitch Tracking



SLIDE 1

A Spectral-Temporal Method for Pitch Tracking

Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*

Department of Electrical and Computer Engineering
Old Dominion University, Norfolk, VA 23529, USA
* Currently at Binghamton University
09/17/2006

SLIDE 2

Outline

Introduction
Algorithm
  Algorithm overview
  The use of nonlinear processing
  Pitch tracking from the spectrum
Experimental evaluation
Conclusion

SLIDE 3

Introduction

Pitch (the fundamental frequency) applications
  Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.

Pitch detection algorithms
  “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al.
  => High accuracy for noisy speech reported using the harmonic dominance spectrum
  “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al.
  => Hybrid spectral-temporal processing for pitch tracking

SLIDE 4

Algorithm Overview

[Block diagram. Blocks as labeled on the slide:]
Original speech
Nonlinear processing (squared value)
FFT -> spectrum -> pitch tracking -> spectral F0 track
F0 candidates estimation (original speech) -> F0 candidates
F0 candidates estimation (squared value) -> F0 candidates
Candidates refinement -> refined F0 candidates
Final F0 determination using dynamic programming -> final F0
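The last block of the overview, final F0 determination using dynamic programming, can be illustrated with a generic sketch. The slide does not give the cost terms, so everything here is an assumption made for illustration: the function name `track_f0`, the local cost `1 - merit`, the relative-frequency-jump transition cost, and the weight `transition_weight`. It is not the cost function of the presented algorithm, only a minimal example of choosing one candidate per frame by dynamic programming.

```python
import numpy as np

def track_f0(candidates, merits, transition_weight=1.0):
    """Pick one F0 candidate per frame by minimizing a total path cost.

    candidates, merits: arrays of shape (n_frames, n_candidates).
    All candidate frequencies must be positive (no unvoiced handling here).
    Cost terms are illustrative assumptions, not taken from the slides.
    """
    candidates = np.asarray(candidates, dtype=float)
    merits = np.asarray(merits, dtype=float)
    n_frames, n_cands = candidates.shape

    local_cost = 1.0 - merits                  # assumed: high merit = low cost
    cost = np.zeros((n_frames, n_cands))       # best accumulated cost so far
    back = np.zeros((n_frames, n_cands), dtype=int)  # backpointers
    cost[0] = local_cost[0]

    for t in range(1, n_frames):
        prev = candidates[t - 1][None, :]      # shape (1, n_cands)
        curr = candidates[t][:, None]          # shape (n_cands, 1)
        jump = np.abs(curr - prev) / prev      # assumed relative-jump penalty
        total = cost[t - 1][None, :] + transition_weight * jump
        back[t] = np.argmin(total, axis=1)
        cost[t] = local_cost[t] + np.min(total, axis=1)

    # Trace the cheapest path backwards from the last frame.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmin(cost[-1]))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[np.arange(n_frames), path]

# Toy input: in frame 1 the doubled candidate (204 Hz) has the higher merit.
toy_candidates = [[100, 200], [102, 204], [101, 202], [103, 206]]
toy_merits = [[0.9, 0.5], [0.7, 0.8], [0.9, 0.6], [0.9, 0.7]]
toy_track = track_f0(toy_candidates, toy_merits)
```

In this toy input a frame-by-frame choice of the highest merit would jump to 204 Hz in the second frame, whereas the path cost keeps the track near 100 Hz.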

SLIDE 5

The Use of Nonlinear Processing

Restoration of the missing fundamental in telephone speech
A periodic sound is characterized by the spectrum of its harmonics
The signal with the fundamental missing can be approximated as

$$ y(t) = b_2 \cos(2\omega t) + b_3 \cos(3\omega t) $$

where the fundamental term $b_1 \cos(\omega t)$ is absent, $b_2 \cos(2\omega t)$ is the 1st harmonic, and $b_3 \cos(3\omega t)$ is the 2nd harmonic

After squaring and applying trigonometric identities

$$ y^2(t) = \frac{b_2^2 + b_3^2}{2} + b_2 b_3 \cos(\omega t) + \frac{b_2^2}{2} \cos(4\omega t) + b_2 b_3 \cos(5\omega t) + \frac{b_3^2}{2} \cos(6\omega t) $$

The fundamental reappears
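The effect of squaring can be checked numerically. The sketch below builds a signal containing only the 2ω and 3ω components, squares it, and reads the FFT amplitude at the fundamental. The sampling rate, fundamental frequency, amplitudes, and the helper name `harmonic_amplitudes` are assumptions chosen for illustration.

```python
import numpy as np

def harmonic_amplitudes(signal, fs, freqs):
    """Return the sinusoid amplitude seen by the FFT at each frequency in freqs."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / n  # scale non-DC bins to amplitude
    bins = [int(round(f * n / fs)) for f in freqs]
    return [float(spectrum[b]) for b in bins]

fs = 8000               # sampling rate in Hz (assumed)
f0 = 100.0              # fundamental frequency in Hz (assumed)
b2, b3 = 1.0, 0.5       # amplitudes of the two retained harmonics (assumed)
t = np.arange(fs) / fs  # one second, so every FFT bin is exactly 1 Hz wide

# Signal with the fundamental missing: only the 2*f0 and 3*f0 terms.
y = b2 * np.cos(2 * np.pi * 2 * f0 * t) + b3 * np.cos(2 * np.pi * 3 * f0 * t)
y_squared = y ** 2

amp_original = harmonic_amplitudes(y, fs, [f0])[0]
amp_squared = harmonic_amplitudes(y_squared, fs, [f0, 4 * f0, 5 * f0, 6 * f0])
```

Under these assumptions `amp_original` is numerically zero, while the squared signal carries a component at `f0` with amplitude `b2 * b3`, alongside the terms at 4, 5, and 6 times `f0`.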

SLIDE 6

Illustration of Nonlinear Processing

The telephone speech signal (top panel) and squared telephone signal (bottom panel) for one frame

SLIDE 7

Illustration of Nonlinear Processing

The magnitude spectrum for the telephone (top panel) and nonlinear processed signal (bottom panel)

SLIDE 8

Spectral Effects from Nonlinear Processing

The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)

[Figure: spectrum of the telephone speech (top) and spectrum of the nonlinear processed signal (bottom); axes: time (seconds), 18 to 23, and frequency (Hz), 100 to 400]

SLIDE 9

Pitch Tracking From the Spectrum

The pitch track from the spectrum refines the pitch candidates estimated from the temporal method
To achieve a noise-robust pitch track from the spectrum, an autocorrelation type of function is proposed

SLIDE 10

Autocorrelation Type of Function

The function takes into account multiple harmonics

[Figure: spectrum (frequency axis 100 to 1000 Hz) with windows at k, 2k, 3k, and 4k multiplied together, and the resulting autocorrelation type of function (frequency axis 100 to 400 Hz)]

Equation

$$ y(k) = \sum_{i=-WL/2}^{WL/2} \prod_{n=1}^{N+1} f(nk + i), \qquad k_{F\_min} < k < k_{F\_max} $$

f(i): The spectrum, k: Frequency index,
WL: Window length (20 Hz)
N: The number of harmonics (3)
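The equation above can be sketched directly as nested loops. The function name `harmonic_correlation`, the `bin_hz` parameter (frequency spacing of the spectrum samples), and the choice to treat out-of-range spectrum indices as zero are assumptions for illustration; the defaults for `wl_hz` and `n_harmonics` follow the slide (20 Hz and 3).

```python
import numpy as np

def harmonic_correlation(spectrum, bin_hz, f_min, f_max, wl_hz=20.0, n_harmonics=3):
    """Evaluate y(k) = sum_{i=-WL/2}^{WL/2} prod_{n=1}^{N+1} f(n*k + i).

    y(k) is computed for k_Fmin < k < k_Fmax and left at zero elsewhere.
    Spectrum samples outside the array are treated as zero (an assumption).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    half = int(round(wl_hz / (2.0 * bin_hz)))          # WL/2 in bins
    k_min = int(round(f_min / bin_hz))
    k_max = min(int(round(f_max / bin_hz)), len(spectrum))
    y = np.zeros(len(spectrum))
    for k in range(k_min + 1, k_max):                  # strict inequalities
        total = 0.0
        for i in range(-half, half + 1):
            product = 1.0
            for n in range(1, n_harmonics + 2):        # n = 1 .. N+1
                idx = n * k + i
                product *= spectrum[idx] if 0 <= idx < len(spectrum) else 0.0
            total += product
        y[k] = total
    return y

# Toy spectrum on a 1 Hz grid: a low floor plus peaks at 120, 240, 360, 480 Hz.
toy_spectrum = np.full(1001, 0.01)
toy_spectrum[[120, 240, 360, 480]] = 1.0
toy_y = harmonic_correlation(toy_spectrum, bin_hz=1.0, f_min=60.0, f_max=400.0)
toy_peak_hz = int(np.argmax(toy_y))
```

Because the product needs energy at k, 2k, 3k, and 4k at the same time, the toy output peaks at the 120 Hz fundamental and stays small at 60 Hz and 240 Hz, where only some of the four terms line up with spectral peaks.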

SLIDE 11

Peaks in Autocorrelation Type of Function

[Figure: spectrum (top; amplitude vs. frequency, up to 1200 Hz) and peaks in the autocorrelation type of function (bottom; amplitude vs. frequency, 50 to 450 Hz)]

A very prominent peak is observed in the proposed function

SLIDE 12

Candidate Insertion to Reduce Pitch Doubling/Halving

If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate
Similar logic is used to reduce pitch halving

[Figure: peaks in the autocorrelation type of function (amplitude vs. frequency, 50 to 400 Hz), showing the highest peak P1 and the inserted candidate P2 (Hz) = P1 (Hz) / 2]
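The insertion rule for reducing pitch doubling can be sketched as below. The function name `insert_half_frequency_candidate`, the `(frequency, merit)` tuple layout, the reading of "highest-ranking" as "highest merit", and the merit assigned to the inserted candidate are all assumptions; the slide states only the threshold and the half-frequency rule. The halving counterpart is not shown because the slide gives no details for it.

```python
def insert_half_frequency_candidate(candidates, threshold_hz=150.0):
    """Return a new candidate list with a half-frequency candidate added if needed.

    candidates: list of (frequency_hz, merit) tuples for one frame.
    If every candidate lies above threshold_hz, append a candidate at half the
    frequency of the highest-merit candidate.
    """
    if not candidates:
        return []
    result = list(candidates)
    if all(freq > threshold_hz for freq, _ in candidates):
        best_freq, best_merit = max(candidates, key=lambda c: c[1])
        # Assumption: the inserted candidate reuses the merit of the best one.
        result.append((best_freq / 2.0, best_merit))
    return result

with_insertion = insert_half_frequency_candidate([(220.0, 0.9), (330.0, 0.4)])
without_insertion = insert_half_frequency_candidate([(120.0, 0.9), (240.0, 0.5)])
```

In the first call both candidates exceed 150 Hz, so a 110 Hz candidate is appended; in the second call one candidate is already below the threshold and the list is returned unchanged.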

SLIDE 13

Experimental Evaluation

Database
  Keele pitch extraction database
  5 male and 5 female speakers, about 35 seconds per speaker
  High quality speech and telephone speech
  Additive Gaussian noise

Controls (reference pitch)
  Control C1: supplied in Keele database
  Control C2: computed from the laryngograph signal with the proposed algorithm
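The additive Gaussian noise condition listed under Database can be sketched as follows. The slides do not describe how the noise was generated or scaled, so this is only one common approach: white Gaussian noise scaled to a target signal-to-noise ratio computed over the whole signal. The function name `add_gaussian_noise`, the test tone, and the 5 dB value used in the example are assumptions.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the overall SNR is about snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: a 150 Hz tone at an assumed 16 kHz sampling rate, degraded to 5 dB SNR.
fs_noise = 16000
t_noise = np.arange(fs_noise) / fs_noise
clean = np.sin(2 * np.pi * 150.0 * t_noise)
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=np.random.default_rng(0))
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured SNR of a finite noise realization only approximates the target, which is why `measured_snr_db` is computed from the actual noise rather than assumed.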

SLIDE 14

Definition of Error Measures

Gross error
  The percentage of frames such that the pitch estimate of the tracker deviates significantly (typically 20%) from the reference pitch (control)
  Only evaluated in the voiced sections of the reference
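The gross error definition above can be written as a short function. The name `gross_error_percent` and the convention that a reference value of 0 marks an unvoiced frame are assumptions. A frame where the tracker outputs 0 in a voiced reference section deviates by 100% and is therefore counted as an error here, which may or may not match how the reported figures treat such frames.

```python
import numpy as np

def gross_error_percent(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of voiced reference frames whose estimate deviates by more
    than `threshold` (relative) from the reference pitch."""
    estimate = np.asarray(estimate_hz, dtype=float)
    reference = np.asarray(reference_hz, dtype=float)
    voiced = reference > 0  # assumed convention: 0 marks unvoiced frames
    if not np.any(voiced):
        return 0.0
    deviation = np.abs(estimate[voiced] - reference[voiced]) / reference[voiced]
    return float(100.0 * np.mean(deviation > threshold))

reference_track = [0, 100, 100, 200, 200]
estimated_track = [150, 105, 130, 100, 0]
error_20 = gross_error_percent(estimated_track, reference_track)        # 20% threshold
error_40 = gross_error_percent(estimated_track, reference_track, 0.40)  # 40% threshold
```

With these toy tracks the first frame is ignored because the reference is unvoiced. Of the four voiced frames, three deviate by more than 20% and two by more than 40%.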

SLIDE 15

Experiment 1 Results

Individual performance of the proposed algorithm

Method | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%)
YAAPT | C1 | 4.26 | 7.62 | 8.14 | 17.85
YAAPT* | C1 | 1.59 | 1.99 | 2.69 | 4.48
Spectral method | C1 | 4.23 | 4.45 | 6.52 | 6.95
NCCF | C1 | 3.58 | 4.52 | 8.00 | 16.61

YAAPT*: Using control C1 for the spectral pitch track
NCCF: Normalized cross correlation function, used as the temporal method in YAAPT

SLIDE 16

Experiment 2 Results

The results of the new method with various error thresholds

Error Threshold | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%)
10% | C1 | 5.46 | 7.31 | 9.39 | 16.14
10% | C2 | 4.18 | 6.06 | 7.77 | 14.78
20% | C1 | 2.90 | 3.65 | 4.86 | 7.45
20% | C2 | 1.56 | 2.16 | 3.27 | 5.85
40% | C1 | 2.25 | 2.44 | 2.75 | 3.63
40% | C2 | 0.91 | 1.06 | 0.99 | 2.05

SLIDE 17

Comparisons

Method | Control | Studio, Clean (%) | Studio, 5 dB Noise (%) | Telephone, Clean (%) | Telephone, 5 dB Noise (%)
Proposed Method | C1 | 2.90 | 3.65 | 4.86 (4.52*) | 7.45 (5.90*)
DASH | C1 | 2.81 | 2.32 | 3.73* | 4.15*
REPS | C1 | 2.68 | 2.98 | 6.91* | 8.49*
YIN | C1 | 2.57 | 7.22 | 7.55* | 14.6*

DASH, REPS, YIN: the results are reported in “Robust and accurate fundamental frequency estimation ... ,” Nakatani et al.
*: SRAEN filter simulated telephone speech

SLIDE 18

Conclusion

A new pitch-tracking algorithm has been developed which combines multiple information sources to enable accurate, robust F0 tracking
An analysis of errors indicates better performance for both high quality and telephone speech than previously reported performance for pitch tracking

Acknowledgements

This work was partially supported by JWFC 900