A Spectral-Temporal Method for Pitch Tracking
Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*
Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529, USA
* Currently at Binghamton University
09/17/2006

Outline
- Introduction
- Algorithm
  - Algorithm overview
  - The use of nonlinear processing
  - Pitch tracking from the spectrum
- Experimental evaluation
- Conclusion

Introduction
- Pitch (the fundamental frequency) applications
  - Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.
- Pitch detection algorithms
  - "Robust and accurate fundamental frequency estimation based on dominant harmonic components," Nakatani et al.
    => High accuracy for noisy speech reported using the harmonic dominance spectrum
  - "Yet another algorithm for pitch tracking (YAAPT)," Zahorian et al.
    => Hybrid spectral-temporal processing for pitch tracking

Algorithm Overview
[Block diagram; recoverable labels, in flow order]
- Nonlinear processing: original speech -> squared value of speech
- FFT spectrum -> pitch tracking -> spectral F0 track
- F0 candidates estimation (original speech) and F0 candidates estimation (squared value) -> F0 candidates
- Candidates refinement -> refined F0 candidates
- Final F0 determination using dynamic programming -> final F0

The Use of Nonlinear Processing
- Restoration of the missing fundamental in telephone speech
- A periodic sound is characterized by the spectrum of its harmonics
- A signal with the fundamental missing can be approximated as
  $$y(t) = b_2 \cos(2\omega t) + b_3 \cos(3\omega t)$$
  where the two terms are the 1st and 2nd harmonics and the fundamental term at $\omega$ is absent
- After squaring and applying trigonometric identities
  $$[y(t)]^2 = \frac{b_2^2 + b_3^2}{2} + b_2 b_3 \cos(\omega t) + \frac{b_2^2}{2} \cos(4\omega t) + b_2 b_3 \cos(5\omega t) + \frac{b_3^2}{2} \cos(6\omega t)$$
  The fundamental reappears in the term $b_2 b_3 \cos(\omega t)$
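The squaring argument on this slide can be checked numerically. The following is a minimal sketch, not the authors' code: the sampling rate, the fundamental and the amplitudes b2 and b3 are assumed values, and `amplitude_at` is a hypothetical helper introduced only for this illustration. It builds a signal containing only the 2nd and 3rd multiples of the fundamental, squares it, and reads off the amplitude at the fundamental before and after.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (assumed for illustration)
f0 = 100.0           # fundamental frequency in Hz (assumed)
b2, b3 = 1.0, 0.8    # amplitudes of the two harmonics (assumed)

# One second of signal, so every component falls exactly on an FFT bin.
t = np.arange(fs) / fs
y = b2 * np.cos(2 * np.pi * 2 * f0 * t) + b3 * np.cos(2 * np.pi * 3 * f0 * t)


def amplitude_at(signal, freq_hz, sample_rate):
    """Amplitude of the cosine component of `signal` at freq_hz (hypothetical helper)."""
    spectrum = np.fft.rfft(signal)
    bin_index = int(round(freq_hz * len(signal) / sample_rate))
    return 2.0 * np.abs(spectrum[bin_index]) / len(signal)


amp_before = amplitude_at(y, f0, fs)       # fundamental is missing
amp_after = amplitude_at(y ** 2, f0, fs)   # expansion predicts b2 * b3
print(amp_before, amp_after)
```

Under these assumptions the amplitude at the fundamental is numerically zero before squaring and equals the product b2 * b3 afterwards, matching the cos(ωt) term of the expansion.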

Illustration of Nonlinear Processing
- The telephone speech signal (top panel) and squared telephone signal (bottom panel) for one frame

Illustration of Nonlinear Processing
- The magnitude spectrum for the telephone (top panel) and nonlinear processed signal (bottom panel)

Spectral Effects from Nonlinear Processing
- The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)
[Figure: two panels, frequency (Hz, 100 to 400) versus time (seconds, 18 to 23). Top: spectrum of the telephone speech. Bottom: spectrum of the nonlinear processed signal.]

Pitch Tracking From the Spectrum
- The pitch track from the spectrum refines the pitch candidates estimated from the temporal method
- To achieve a noise-robust pitch track from the spectrum, an autocorrelation type of function is proposed

Autocorrelation Type of Function
- The function takes into account multiple harmonics
[Figure: two panels against frequency (Hz). Spectrum (0 to 1000 Hz), with windows of width WL at k, 2k, 3k and 4k whose values are multiplied together. Autocorrelation type of function (0 to 400 Hz).]
- Equation
  $$y(k) = \sum_{i=-WL/2}^{WL/2} \prod_{n=1}^{N+1} f(nk + i), \qquad F0\_min < k < F0\_max$$
  k: frequency index
  f(i): the spectrum
  WL: window length (20 Hz)
  N: the number of harmonics (3)
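The equation on this slide can be evaluated directly. Below is a minimal sketch under stated assumptions, not the YAAPT implementation: the spectrum is assumed to be sampled at 1 Hz per bin (so WL = 20 Hz is 20 bins), bins beyond the end of the spectrum are treated as zero, and the function name and the synthetic test spectrum are hypothetical.

```python
import numpy as np


def autocorr_type_function(spectrum, k_min, k_max, window_len=20, n_harmonics=3):
    """Evaluate y(k) = sum_i prod_{n=1}^{N+1} f(n*k + i) for k_min < k < k_max.

    `spectrum` is a magnitude spectrum indexed by bin; i runs from -WL/2 to WL/2.
    """
    half = window_len // 2
    if k_min + 1 - half < 0:
        raise ValueError("k_min is too small for the chosen window length")
    # Zero-pad so every index n*k + i is valid; out-of-range bins contribute zero.
    padded = np.concatenate([spectrum, np.zeros((n_harmonics + 1) * k_max + half + 1)])
    ks = np.arange(k_min + 1, k_max)
    y = np.zeros(len(ks))
    for idx, k in enumerate(ks):
        for i in range(-half, half + 1):
            product = 1.0
            for n in range(1, n_harmonics + 2):
                product *= padded[n * k + i]
            y[idx] += product
    return ks, y


# Synthetic spectrum (assumed): Gaussian peaks at multiples of 120 Hz, amplitude 1/m.
bins = np.arange(1001)
synthetic = sum(np.exp(-0.5 * ((bins - 120 * m) / 3.0) ** 2) / m for m in range(1, 7))

ks, y = autocorr_type_function(synthetic, k_min=60, k_max=400)
best_k = int(ks[np.argmax(y)])
print(best_k)
```

Because the product requires energy at k, 2k, 3k and 4k simultaneously, a candidate at twice the true fundamental is suppressed in this example once one of its multiples falls where the spectrum has no peak.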

Peaks in Autocorrelation Type of Function
[Figure: two panels, amplitude versus frequency (Hz). Top: spectrum (0 to 1200 Hz). Bottom: peaks in the autocorrelation type of function (0 to 450 Hz).]
- A very prominent peak is observed in the proposed function

Candidate Insertion to Reduce Pitch Doubling/Halving
- If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate
- Similar logic is used to reduce pitch halving
[Figure: peaks in the autocorrelation type of function, amplitude versus frequency (Hz, 0 to 400), with the inserted candidate P2 (Hz) = P1 (Hz) / 2]
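The doubling rule on this slide is simple enough to sketch. The function name, the list-based representation and the convention that candidates are ordered from highest to lowest rank are assumptions made for this illustration; the slide does not spell out the corresponding rule for halving, so it is not implemented here.

```python
def insert_half_frequency_candidate(candidates_hz, threshold_hz=150.0):
    """Return the candidate list, extended as described on the slide.

    `candidates_hz` is assumed to be ordered from highest to lowest rank.
    If every candidate exceeds the threshold, one more candidate is appended
    at half the frequency of the highest-ranking candidate.
    """
    result = list(candidates_hz)
    if result and all(c > threshold_hz for c in result):
        result.append(result[0] / 2.0)
    return result


# All candidates above 150 Hz: a candidate at 240 / 2 = 120 Hz is added.
print(insert_half_frequency_candidate([240.0, 180.0]))
# One candidate already at or below 150 Hz: the list is left unchanged.
print(insert_half_frequency_candidate([240.0, 120.0]))
```

Appending the new candidate at the end leaves its final rank to later stages; where it should sit in the ranking is not stated on the slide.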

Experimental Evaluation
- Database
  - Keele pitch extraction database
  - 5 male and 5 female speakers, about 35 seconds per speaker
  - High quality speech and telephone speech
  - Additive Gaussian noise
- Controls (reference pitch)
  - Control C1: supplied in the Keele database
  - Control C2: computed from the laryngograph signal with the proposed algorithm

Definition of Error Measures
- Gross error
  - The percentage of frames such that the pitch estimate of the tracker deviates significantly (typically 20%) from the reference pitch (control)
  - Only evaluated in the voiced sections of the reference
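The gross error definition above can be written out as a short function. This is a sketch under assumptions the slide does not state: unvoiced frames are taken to be coded as 0 Hz in the reference, the deviation is measured relative to the reference value, and the function name is hypothetical.

```python
import numpy as np


def gross_error_percent(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of reference-voiced frames whose estimate deviates by more than threshold.

    Assumption: a reference value of 0 marks an unvoiced frame, which is excluded.
    """
    estimate = np.asarray(estimate_hz, dtype=float)
    reference = np.asarray(reference_hz, dtype=float)
    voiced = reference > 0
    if not voiced.any():
        return 0.0
    deviation = np.abs(estimate[voiced] - reference[voiced]) / reference[voiced]
    return 100.0 * float(np.mean(deviation > threshold))


reference = [100.0, 100.0, 0.0, 200.0, 200.0]   # third frame is unvoiced
estimate = [105.0, 130.0, 150.0, 200.0, 100.0]
error = gross_error_percent(estimate, reference)
print(error)
```

With this convention, a tracker that reports 0 Hz in a frame the reference marks as voiced is counted as a gross error, since its relative deviation is 100%.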

Experiment 1 Results
- Individual performance of the proposed algorithm

Method            Control   Studio,     Studio,          Telephone,   Telephone,
                            Clean (%)   5 dB Noise (%)   Clean (%)    5 dB Noise (%)
YAAPT             C1        4.26        7.62             8.14         17.85
YAAPT*            C1        1.59        1.99             2.69         4.48
Spectral method   C1        4.23        4.45             6.52         6.95
NCCF              C1        3.58        4.52             8.00         16.61

YAAPT*: Using control C1 for the spectral pitch track
NCCF: Normalized cross correlation function, used as the temporal method in YAAPT

Experiment 2 Results
- The results of the new method with various error thresholds

Error       Control   Studio,     Studio,          Telephone,   Telephone,
Threshold             Clean (%)   5 dB Noise (%)   Clean (%)    5 dB Noise (%)
10%         C1        5.46        7.31             9.39         16.14
10%         C2        4.18        6.06             7.77         14.78
20%         C1        2.90        3.65             4.86         7.45
20%         C2        1.56        2.16             3.27         5.85
40%         C1        2.25        2.44             2.75         3.63
40%         C2        0.91        1.06             0.99         2.05

Comparisons

Method            Control   Studio,     Studio,          Telephone,     Telephone,
                            Clean (%)   5 dB Noise (%)   Clean (%)      5 dB Noise (%)
Proposed method   C1        2.90        3.65             4.86 (4.52*)   7.45 (5.90*)
DASH              C1        2.81        2.32             3.73*          4.15*
REPS              C1        2.68        2.98             6.91*          8.49*
YIN               C1        2.57        7.22             7.55*          14.6*

- DASH, REPS, YIN: the results are reported in "Robust and accurate fundamental frequency estimation ... ," Nakatani et al.
- *: SRAEN filter simulated telephone speech

Conclusion
- A new pitch-tracking algorithm has been developed which combines multiple information sources to enable accurate, robust F0 tracking
- An analysis of errors indicates better performance for both high quality and telephone speech than previously reported performance for pitch tracking
- Acknowledgements
  - This work was partially supported by JWFC 900
