
GCT535: Sound Technology for Multimedia, Time-Stretching and Pitch-Shifting



  1. GCT535: Sound Technology for Multimedia, Time-Stretching and Pitch-Shifting
     Graduate School of Culture Technology, KAIST
     Juhan Nam

  2. Outline
     § Resampling
     § OverLap and Add (OLA) methods
       – SOLA
       – WSOLA
       – PSOLA
     § Phase Vocoder

  3. Playback Rate Conversion
     § The “playback rate” is not necessarily equal to the recording rate
     § Changing the playback rate of recorded audio changes its tone
       – Example: sliding tape across the magnetic head at a variable speed
       – Slowing down: “monster-like”
       – Speeding up: “chipmunk-like”
       – The rate can even be negative: reverse playback
     § Demo
       – https://musiclab.chromeexperiments.com/Voice-Spinner

  4. Resampling
     § Reconstruct the original signal and sample it at a new rate
       – For a digital system with a constant playback rate
         • Up-sampling makes the original sound play slower
         • Down-sampling makes the original sound play faster

  5. Resampling by Reconstruction Lowpass Filters
     § As you recall from the topic of digital audio, the original signal can be reconstructed by interpolating the samples with the sinc function
       – Resampling the reconstructed signal is equivalent to interpolating the samples with the reconstruction filter
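
The idea of resampling as interpolation can be sketched in a few lines. This is a minimal illustration, not the lecture's code: it uses linear interpolation (the simplest interpolation filter) through `numpy.interp`, and the function name `resample_linear` and its `rate` parameter are hypothetical. It applies no anti-aliasing lowpass filter, so speeding up material with strong high-frequency content would alias.

```python
import numpy as np

def resample_linear(x, rate):
    """Read x at `rate` input samples per output sample (linear interpolation).

    rate > 1 shortens the signal (plays faster, higher pitch);
    rate < 1 lengthens it (plays slower, lower pitch).
    Assumption: no anti-aliasing filter is applied before down-sampling.
    """
    x = np.asarray(x, dtype=float)
    if rate <= 0:
        raise ValueError("rate must be positive in this sketch")
    n_out = int(np.floor((len(x) - 1) / rate)) + 1
    positions = np.arange(n_out) * rate          # fractional read positions in x
    return np.interp(positions, np.arange(len(x)), x)
```

For example, `resample_linear(x, 2.0)` keeps every second sample, while `resample_linear(x, 0.5)` inserts an interpolated value between each pair of neighbors.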

  6. Reconstruction Lowpass Filters (Interpolation Filters)
     [Figure: four interpolation filters plotted against sample time (−5 to 5): windowed sinc, linear, 3rd-order B-spline, 3rd-order Lagrange]

  7. Resampling
     § Resampling changes pitch, length and timbre at the same time!
     [Figure panels: Original; Speed Down (Up-sampling); Speed Up (Down-sampling). Source: the DaFX book]

  8. How can we control pitch and length independently?
     § The answer is to process the signal at the frame level instead of the sample level
       – Each sample block preserves the local shape of the waveform
     [Figure labels: sample block, analysis hop size, synthesis hop size]

  9. Time-Stretching
     § Time-stretching (without pitch-shifting)
       – Time-stretching ratio: $\beta = I_s / I_a$ ($I_s$: synthesis hop size, $I_a$: analysis hop size)
       – If $\beta > 1$, increase the length
       – If $\beta < 1$, reduce the length
     § Algorithms
       – OLA
       – SOLA
       – PSOLA
       – Phase Vocoder

  10. OverLap-and-Add (OLA)
      § A time-stretching algorithm that segments the waveform into frames, then overlaps and adds them at a different hop size
        – The overlapped region is cross-faded between two adjacent frames
        – Problem: “fuzziness” caused by the phase difference between the overlapped frames
      [Figure labels: analysis hop size, synthesis hop size, fade-in, fade-out]
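
A plain OLA time-stretcher might look like the sketch below. It is an illustration under stated assumptions, not the lecture's implementation: the function name `ola_stretch`, the Hann window used for the cross-fade, and the default `frame_len` and `synth_hop` values are all choices made here. Frames are read at the analysis hop and written at the synthesis hop, and the output is divided by the summed window so the cross-fade gains add up to one.

```python
import numpy as np

def ola_stretch(x, beta, frame_len=1024, synth_hop=256):
    """Time-stretch x by the ratio beta = synthesis hop / analysis hop (plain OLA)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    ana_hop = synth_hop / beta                    # analysis hop size (may be fractional)
    window = np.hanning(frame_len)                # fade-in / fade-out shape (assumption)
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(out)
    for m in range(n_frames):
        a = int(round(m * ana_hop))               # read position in the input
        s = m * synth_hop                         # write position in the output
        out[s:s + frame_len] += window * x[a:a + frame_len]
        norm[s:s + frame_len] += window
    norm[norm < 1e-8] = 1.0                       # avoid dividing by zero at the edges
    return out / norm
```

Because the frames are placed without regard to the waveform's phase, a periodic input will generally show the artifacts described on the slide; the sketch makes no attempt to align frames.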

  11. Synchronized OverLap-and-Add (SOLA)
      § Reduce the artifacts of OLA by shifting the overlapped region so that the two adjacent frames are maximally correlated
        – Synchronization by cross-correlation: $X_{\mathrm{corr}}(l) = \sum_{n=0}^{L-1} x_1(n)\, x_2(n+l)$
        – Find the lag $l$ where the cross-correlation is maximum
        – Shift the next frame by the lag
      [Figure: two plots labeled with analysis hop size, synthesis hop size, and L]
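
The lag search at the heart of SOLA can be sketched as follows. This is a hedged illustration: the function name `best_lag` and the `max_lag` search range are hypothetical, only non-negative lags are searched, and the correlation is left unnormalized as in the formula above (practical implementations often normalize it by the segment energies).

```python
import numpy as np

def best_lag(x1, x2, max_lag):
    """Return (lag, corr) where lag in [0, max_lag] maximizes sum_n x1[n] * x2[n + lag]."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    L = len(x1)                                   # number of samples being compared
    if len(x2) < L + max_lag:
        raise ValueError("x2 must hold at least len(x1) + max_lag samples")
    # Unnormalized cross-correlation for each candidate lag
    corr = np.array([np.dot(x1, x2[l:l + L]) for l in range(max_lag + 1)])
    return int(np.argmax(corr)), corr
```

The returned lag is the number of samples by which the next frame would be shifted before cross-fading.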

  12. Pitch-Synchronous OLA (PSOLA)
      § Analysis
        – Perform block-based pitch detection and find the pitch marks $u_i$; the pitch period is $Q(u) = u_{i+1} - u_i$
        – Extract a segment centered at every pitch mark $u_i$ using a Hanning window with length $M_i = 2Q(u_i)$ to ensure fade-in and fade-out
      § Synthesis for time-stretching
        – For every synthesis pitch mark $\tilde{u}_k$, search the corresponding $u_i$ that minimizes $|\beta u_i - \tilde{u}_k|$
        – Overlap and add the selected segment
          • If $\beta > 1$, some segments will be repeated
          • If $\beta < 1$, some segments will be discarded
        – The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined to preserve local pitch: $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)$
      [Figure: PSOLA analysis (pitch marks, segments) and PSOLA time stretching (pitch marks, segments, synthesis pitch marks, overlap and add)]
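
The synthesis loop for PSOLA time-stretching can be sketched as below. It is an illustration only: pitch-mark detection is not shown, so the analysis pitch marks are assumed to be given as sample indices, and the name `psola_stretch` is hypothetical. A periodic Hann window of two pitch periods is used so that segments placed one period apart sum to unit gain.

```python
import numpy as np

def psola_stretch(x, marks, beta):
    """Time-stretch x by beta, given analysis pitch marks (sample indices)."""
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)                      # local pitch period at each mark
    marks = marks[:-1]                            # keep only marks that have a period
    out_len = int(round(beta * len(x)))
    out = np.zeros(out_len)
    t = float(beta * marks[0])                    # first synthesis pitch mark
    while t < beta * marks[-1]:
        # Analysis mark whose stretched position is closest to the synthesis mark
        i = int(np.argmin(np.abs(beta * marks - t)))
        q = int(periods[i])
        start, end = marks[i] - q, marks[i] + q   # two-period segment around the mark
        c = int(round(t))
        if start >= 0 and end <= len(x) and c - q >= 0 and c + q <= out_len:
            n = np.arange(2 * q)
            window = 0.5 - 0.5 * np.cos(2 * np.pi * n / (2 * q))  # periodic Hann
            out[c - q:c + q] += window * x[start:end]
        t += q                                    # next mark keeps the local pitch period
    return out
```

With `beta > 1` the same analysis segment is selected for several consecutive synthesis marks (repetition); with `beta < 1` some analysis segments are never selected.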

  13. Pitch-Shifting
      § Using time-stretching and resampling
        – First, perform time-stretching with ratio $\beta$
        – Second, resample the output by the same ratio $\beta$, which restores the original length and scales the pitch
      § Problem
        – Resampling changes the timbre (i.e., the formants)
        – This is quite audible for the human voice (e.g., speech or singing)
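
The two-step recipe can be sketched as a small wrapper. This is a hedged illustration: `pitch_shift` and its `stretch` argument are hypothetical names, where `stretch(x, beta)` stands for any time-stretching routine (OLA, SOLA, PSOLA, or a phase vocoder) supplied by the caller. The resampling step uses linear interpolation without an anti-aliasing filter, which is a simplification.

```python
import numpy as np

def pitch_shift(x, beta, stretch):
    """Scale pitch by beta while keeping the length roughly unchanged.

    stretch: caller-supplied function stretch(x, beta) -> time-stretched signal
             (hypothetical interface, not from the lecture).
    """
    stretched = np.asarray(stretch(x, beta), dtype=float)   # step 1: length * beta
    # Step 2: read the stretched signal beta times faster, which brings the
    # length back to about len(x) and multiplies every frequency by beta.
    n_out = int(np.floor((len(stretched) - 1) / beta)) + 1
    positions = np.arange(n_out) * beta
    return np.interp(positions, np.arange(len(stretched)), stretched)
```

Because step 2 scales the whole spectrum, the formants move together with the pitch, which is the timbre problem noted on the slide.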

  14. Pitch-Synchronous OLA (PSOLA)
      § PSOLA can be used for pitch-shifting
        – For every synthesis pitch mark $\tilde{u}_k$, search the corresponding $u_i$ that minimizes $|u_i - \tilde{u}_k|$
        – Overlap and add the selected segment
          • If $\gamma > 1$, some segments will be repeated
          • If $\gamma < 1$, some segments will be discarded
        – The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined from the local pitch period scaled by $1/\gamma$: $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)/\gamma$
      § It is possible to combine time-stretching (with the term $|\beta u_i - \tilde{u}_k|$) and pitch-shifting
      § This preserves the formant of the input sound!
      [Figure: PSOLA pitch shifting (pitch marks, segments, synthesis pitch marks, overlap and add)]
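
How the synthesis pitch marks are laid out when time-stretching and pitch-shifting are combined can be sketched without touching any audio. The function below is a hypothetical helper (`psola_schedule` is not a name from the lecture): it takes analysis pitch marks and returns, for each synthesis mark, its position and the index of the analysis segment that would be overlap-added there.

```python
import numpy as np

def psola_schedule(marks, beta=1.0, gamma=1.0):
    """List (synthesis_mark, analysis_index) pairs for PSOLA.

    beta:  time-stretching ratio, gamma: pitch-shifting ratio.
    """
    marks = np.asarray(marks, dtype=float)
    periods = np.diff(marks)                      # local pitch periods
    marks = marks[:-1]                            # keep only marks that have a period
    schedule = []
    t = beta * marks[0]                           # first synthesis pitch mark
    while t <= beta * marks[-1]:
        i = int(np.argmin(np.abs(beta * marks - t)))   # nearest (stretched) analysis mark
        schedule.append((t, i))
        t += periods[i] / gamma                   # spacing sets the output pitch
    return schedule
```

With marks every 100 samples, `gamma=2.0` yields synthesis marks every 50 samples with analysis segments reused, while `gamma=0.5` yields marks every 200 samples and skips every other segment. Since each segment is copied unchanged, its spectral envelope is retained.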

  15. Resources
      § TSM Toolbox
        – Time-scale modification (TSM) code using WSOLA (waveform similarity OLA) and phase vocoder
        – Additionally, with harmonic-percussive source separation (HPSS)
        – https://www.audiolabs-erlangen.de/resources/MIR/TSMtoolbox/
