GCT535: Sound Technology for Multimedia
Time-Stretching and Pitch-Shifting
Graduate School of Culture Technology, KAIST
Juhan Nam

Outline
§ Resampling
§ OverLap and Add (OLA) methods
  – SOLA
  – WSOLA
  – PSOLA
§ Phase Vocoder

Playback Rate Conversion
§ The "playback rate" is not necessarily equal to the recording rate
§ Adjusting the playback rate of recorded audio creates different tones
  – Sliding tape across the magnetic head at a variable speed
  – Slowing down: "monster-like"
  – Speeding up: "chipmunk-like"
  – The rate can even be negative: reverse playback
§ Demo
  – https://musiclab.chromeexperiments.com/Voice-Spinner

Resampling
§ Reconstruct the original signal and sample it at a new rate
  – For a digital system with a constant playback rate
    • Up-sampling makes the original sound play slower
    • Down-sampling makes the original sound play faster

Resampling by Reconstruction Lowpass Filters
§ As you recall from the topic of digital audio, the original signal can be reconstructed with the sinc function
  – Resampling the reconstructed signal is equivalent to interpolation with the reconstruction filter

Reconstruction Lowpass Filters (Interpolation Filters)
[Figure: impulse responses of four interpolation filters (windowed sinc, linear, 3rd-order B-spline, 3rd-order Lagrange); x-axis: sample time, from −5 to 5]
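
The interpolation view of resampling can be sketched in a few lines of Python. This is a minimal illustration rather than the course's implementation: the function names, the Hann taper on the sinc, and the default kernel widths are assumptions made for the example. The output sample at fractional position `m / ratio` is a weighted sum of the nearby input samples, with weights taken from the interpolation kernel.

```python
import math

def linear_kernel(t):
    """Triangular (linear-interpolation) kernel, nonzero for |t| < 1."""
    return max(0.0, 1.0 - abs(t))

def windowed_sinc_kernel(t, half_width=5):
    """Sinc kernel tapered by a Hann window spanning |t| < half_width (assumed taper)."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return sinc * window

def resample(x, ratio, kernel=linear_kernel, half_width=1):
    """Read x at fractional positions m / ratio; ratio > 1 yields more samples."""
    n_out = int(len(x) * ratio)
    y = []
    for m in range(n_out):
        pos = m / ratio                      # fractional read position in x
        center = math.floor(pos)
        acc = 0.0
        # Sum the input samples that fall under the kernel's support.
        for n in range(center - half_width + 1, center + half_width + 1):
            if 0 <= n < len(x):
                acc += x[n] * kernel(pos - n)
        y.append(acc)
    return y

# Up-sampling by 2: played at a constant rate, the result sounds slower and lower.
slow = resample([0.0, 1.0, 2.0, 3.0], 2.0)
smooth = resample([0.0, 1.0, 0.0, -1.0], 2.0, windowed_sinc_kernel, half_width=5)
```

For `ratio < 1` (down-sampling) the kernel would also have to be widened so that it lowpass-filters below the new Nyquist frequency; the sketch omits that step.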

Resampling
§ Resampling changes pitch, length and timbre at the same time!
[Figure from the DAFX book: original, speed down (up-sampling), speed up (down-sampling)]

How can we control pitch and length independently?
§ The answer is to process the samples at the frame level instead of the sample level
  – Each sample block preserves the local shape of the waveform
[Figure labels: sample block, analysis hop size, synthesis hop size]

Time-Stretching
§ Time-stretching (without pitch-shifting)
  – Time-stretching ratio: $\beta = I_s / I_a$ ($I_s$: synthesis hop size, $I_a$: analysis hop size)
  – If $\beta > 1$, the length increases
  – If $\beta < 1$, the length decreases
§ Algorithms
  – OLA
  – SOLA
  – PSOLA
  – Phase Vocoder

OverLap-and-Add (OLA)
§ A time-stretching algorithm that segments the waveform into frames, then overlaps and adds them
  – The overlapped region is cross-faded between two adjacent frames
  – Problem: fuzziness caused by the phase difference between the frames
[Figure labels: analysis hop size, synthesis hop size, fade-in, fade-out]
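
A minimal sketch of plain OLA, assuming a Hann window as the cross-fade and a fixed 50% overlap at the output; the function names, the default frame length, and the rounding of the analysis hop are choices made for this example only. Frames are read every `ana_hop` samples and written every `synth_hop` samples, so the output is roughly `beta` times as long as the input.

```python
import math

def hann(n_len):
    """Periodic Hann window; two copies offset by half a frame sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def ola_stretch(x, beta, frame_len=512):
    """Time-stretch x by beta = synthesis hop / analysis hop using plain OLA."""
    synth_hop = frame_len // 2                  # 50% overlap at the output
    ana_hop = max(1, round(synth_hop / beta))   # analysis hop implied by beta
    window = hann(frame_len)
    n_frames = max(0, (len(x) - frame_len) // ana_hop + 1)
    if n_frames == 0:
        return []
    y = [0.0] * ((n_frames - 1) * synth_hop + frame_len)
    for k in range(n_frames):
        a, s = k * ana_hop, k * synth_hop       # read and write positions
        for n in range(frame_len):
            y[s + n] += window[n] * x[a + n]    # the window acts as the cross-fade
    return y

# Stretch one second of a 220 Hz sine (8 kHz sampling) to about twice its length.
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
stretched = ola_stretch(tone, 2.0)
```

Because the frames are placed without regard to the waveform's phase, the overlapping regions of a tone like this generally do not line up, which is the source of the fuzziness noted above.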

Synchronized OverLap-and-Add (SOLA)
§ Reduce artifacts in OLA by shifting the overlapped region such that the two adjacent frames are maximally correlated
§ Synchronization by cross-correlation
  $X_{\mathrm{corr}}(l) = \sum_{n=0}^{L-1} x_1(n)\, x_2(n + l)$
  – Find the lag $l$ where the cross-correlation is maximum
  – Shift the next frame by the lag
[Figure: two waveform plots of adjacent frames before and after synchronization; labels: analysis hop size, synthesis hop size, $L$]
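
The lag search can be sketched directly from the cross-correlation sum. The function name `best_lag` and the restriction to non-negative lags up to a `max_lag` bound are assumptions for illustration; practical SOLA variants often normalize the correlation by the segment energies, which this sketch does not do.

```python
import math

def best_lag(x1, x2, max_lag):
    """Return the lag l in [0, max_lag] that maximizes sum_n x1[n] * x2[n + l].

    x1 is the overlap region of the current output (length L);
    x2 is the candidate region of the next frame (length >= L + max_lag).
    """
    L = len(x1)
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(x1[n] * x2[n + lag] for n in range(L))
        if score > best_score:
            best, best_score = lag, score
    return best

# A sine with a 50-sample period: the region x1 was cut 20 samples into x2,
# so the search should recover that offset.
x2 = [math.sin(2 * math.pi * n / 50) for n in range(300)]
x1 = x2[20:120]
lag = best_lag(x1, x2, max_lag=40)
```

The next frame is then shifted by `lag` samples before the cross-fade, so that the two waveforms are in phase where they overlap.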

Pitch-Synchronous OLA (PSOLA)
§ Analysis
  – Perform block-based pitch detection and find the pitch marks $u_i$; the pitch period is $Q(u_i) = u_{i+1} - u_i$
  – Extract a segment centered at every pitch mark $u_i$ using a Hanning window with length $M_i = 2Q(u_i)$ to ensure fade-in and fade-out
§ Synthesis for time-stretching
  – For every synthesis pitch mark $\tilde{u}_k$, search for the corresponding $u_i$ that minimizes $|\beta u_i - \tilde{u}_k|$
  – Overlap and add the selected segment
    • If $\beta > 1$, some segments will be repeated
    • If $\beta < 1$, some segments will be discarded
  – The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined so as to preserve the local pitch:
    $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)$
[Figure: PSOLA analysis (pitch marks, segments) and PSOLA time stretching (pitch marks, segments, synthesis pitch marks, overlap and add)]
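
The analysis step can be sketched as follows, assuming the pitch marks are already available as integer sample indices (pitch detection itself is outside this example). The function name `extract_segments` and the choice to skip marks whose window would run past the signal edges are assumptions. Each segment spans one period on either side of its mark and is tapered by a Hann window of length twice the local period.

```python
import math

def extract_segments(x, marks):
    """Cut one Hann-windowed segment of length 2 * period around each pitch mark."""
    segments = []
    for i in range(len(marks) - 1):
        period = marks[i + 1] - marks[i]         # local pitch period at mark i
        start, length = marks[i] - period, 2 * period
        if start < 0 or start + length > len(x):
            continue                             # assumed: skip marks too close to an edge
        window = [0.5 - 0.5 * math.cos(2 * math.pi * n / length)
                  for n in range(length)]
        segment = [w * x[start + n] for n, w in enumerate(window)]
        segments.append((marks[i], segment))     # keep the mark with its segment
    return segments

# A 100-sample-period sine with pitch marks placed once per period.
signal = [math.sin(2 * math.pi * n / 100) for n in range(500)]
segs = extract_segments(signal, [100, 200, 300, 400])
```

Since every segment fades in and out over exactly two periods, segments re-placed one period apart overlap by half and cross-fade smoothly.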

Pitch-Shifting
§ Using time-stretching and resampling
  – First, perform time-stretching with ratio $\beta$
  – Second, resample the output with the same ratio $\beta$
§ Problem
  – The timbre (i.e. the formants) changes due to the resampling
  – This is quite audible for the human voice (e.g. speech or singing)
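
A short worked example of why the two steps combine into a pure pitch shift. The symbols $f_0$ (fundamental frequency) and $T$ (duration) are introduced here for illustration, and the resampling step is assumed to be the one that shortens the stretched signal back to its original length at a constant playback rate.

```latex
\begin{align*}
\text{input:} \quad & (f_0,\; T) \\
\text{after time-stretching by } \beta: \quad & (f_0,\; \beta T) \\
\text{after resampling by } \beta: \quad & (\beta f_0,\; T)
\end{align*}
```

For instance, with $\beta = 1.5$ a 2-second tone at 440 Hz becomes a 3-second tone at 440 Hz after time-stretching, and a 2-second tone at 660 Hz after resampling. The spectral envelope is scaled by the same factor, which is the formant problem noted above.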

Pitch-Synchronous OLA (PSOLA)
§ PSOLA can be used for pitch-shifting
  – For every synthesis pitch mark $\tilde{u}_k$, search for the corresponding $u_i$ that minimizes $|u_i - \tilde{u}_k|$
  – Overlap and add the selected segment
    • If $\gamma > 1$, some segments will be repeated
    • If $\gamma < 1$, some segments will be discarded
  – The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined by scaling the local pitch period:
    $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)/\gamma$
§ It is possible to combine time-stretching (with the term $|\beta u_i - \tilde{u}_k|$) and pitch-shifting
§ This preserves the formants of the input sound!
[Figure: PSOLA pitch shifting (pitch marks, segments, synthesis pitch marks, overlap and add)]
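
The placement of synthesis pitch marks, with time-stretching and pitch-shifting combined, can be sketched as below. The function name `psola_marks`, the stopping rule (stop once the stretched position of the last usable analysis mark is passed), and the tie-breaking (the earlier mark wins) are assumptions for this example; the overlap-add of the selected segments is left out.

```python
def psola_marks(marks, beta=1.0, gamma=1.0):
    """Return (synthesis_mark, analysis_index) pairs for PSOLA synthesis.

    marks: analysis pitch marks in samples
    beta:  time-stretching ratio, gamma: pitch-shifting ratio
    """
    periods = [marks[i + 1] - marks[i] for i in range(len(marks) - 1)]
    usable = range(len(periods))                 # marks that have a known period
    end = beta * marks[len(periods) - 1]         # assumed stopping point
    out, u_syn = [], beta * marks[0]
    while u_syn <= end:
        # Pick the analysis mark whose stretched position is closest.
        i = min(usable, key=lambda j: abs(beta * marks[j] - u_syn))
        out.append((u_syn, i))
        u_syn += periods[i] / gamma              # next mark: scaled local period
    return out

marks = [0, 100, 200, 300, 400]
stretched = [i for _, i in psola_marks(marks, beta=2.0)]   # segments repeated
shifted = [u for u, _ in psola_marks(marks, gamma=2.0)]    # marks 50 samples apart
```

With `beta=2.0` each analysis segment is selected twice while the spacing between synthesis marks stays at one period, so the length doubles and the pitch is unchanged. With `gamma=2.0` the marks are half a period apart, so the pitch doubles while each segment keeps its original shape.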

Resources
§ TSM Toolbox
  – Time-scale modification code using WSOLA (waveform similarity OLA) and the phase vocoder
  – Additionally combined with harmonic-percussive source separation (HPSS)
  – https://www.audiolabs-erlangen.de/resources/MIR/TSMtoolbox/
