GCT535: Sound Technology for Multimedia, Time-Stretching and Pitch-Shifting

SLIDE 1

GCT535: Sound Technology for Multimedia
Time-Stretching and Pitch-Shifting

Graduate School of Culture Technology
KAIST
Juhan Nam

SLIDE 2

Outline

§ Resampling
§ Overlap-and-Add (OLA) methods

– SOLA
– WSOLA
– PSOLA

§ Phase Vocoder

SLIDE 3

Playback Rate Conversion

§ The “playback rate” is not necessarily equal to the recording rate
§ Playing recorded audio back at a different rate changes its tone

– Running tape across the magnetic head at a variable speed
– Slowing down: “monster-like”
– Speeding up: “chipmunk-like”
– The rate can even be negative: reverse playback

§ Demo

– https://musiclab.chromeexperiments.com/Voice-Spinner

SLIDE 4

Resampling

§ Reconstruct the original signal and sample it at a new rate

– For a digital system with a constant playback rate

  • Up-sampling makes the original sound play back slower
  • Down-sampling makes the original sound play back faster

SLIDE 5

Resampling by Reconstruction Lowpass Filters

§ As covered in the digital audio topic, a band-limited signal can be reconstructed from its samples by sinc interpolation

– Resampling the reconstructed signal is equivalent to interpolating the samples with the reconstruction filter
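The interpolation view above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `resample_sinc`, the Hann-windowed kernel, and the `half_width` parameter are choices made here, not something taken from the slides.

```python
import numpy as np

def resample_sinc(x, ratio, half_width=16):
    """Resample x to about ratio * len(x) samples with a windowed-sinc kernel.

    ratio > 1 is up-sampling (slower at a fixed playback rate),
    ratio < 1 is down-sampling (faster at a fixed playback rate).
    """
    n_out = int(round(len(x) * ratio))
    t = np.arange(n_out) / ratio          # output instants in input-sample units
    cutoff = min(1.0, ratio)              # lower the cutoff when down-sampling
    y = np.zeros(n_out)
    for m, tm in enumerate(t):
        center = int(np.floor(tm))
        n = np.arange(center - half_width + 1, center + half_width + 1)
        valid = (n >= 0) & (n < len(x))   # drop taps that fall outside the signal
        d = tm - n[valid]                 # distance from each input sample
        window = 0.5 * (1.0 + np.cos(np.pi * d / half_width))  # Hann taper
        kernel = cutoff * np.sinc(cutoff * d) * window
        y[m] = np.dot(x[n[valid]], kernel)
    return y

# A 440 Hz tone recorded at 8 kHz, up-sampled by 2: twice as many samples,
# so it lasts twice as long (and sounds an octave lower) at the same playback rate.
sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(800) / sr)
y = resample_sinc(x, 2.0)
print(len(x), len(y))
```

A longer kernel (larger `half_width`) trades computation for a closer approximation of the ideal sinc reconstruction.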

SLIDE 6

Reconstruction Lowpass Filters (Interpolation Filters)

[Figure: four interpolation filters plotted over sample time from −5 to 5: windowed sinc, linear, 3rd-order B-spline, 3rd-order Lagrange]

SLIDE 7

Resampling

§ Resampling changes pitch, length and timbre at the same time!

[Figure, from the DAFX book: Original; Speed Down (Up-sampling); Speed Up (Down-sampling)]

SLIDE 8

How can we control pitch and length independently?

§ The answer is to process samples at the frame level instead of the sample level

– Each sample block preserves the local shape of the waveform

[Figure: sample blocks taken at the analysis hop size and re-placed at the synthesis hop size]

SLIDE 9

Time-Stretching

§ Time-Stretching (without pitch-shifting)

– Time-stretching ratio: $\beta = I_s / I_a$ ($I_s$: synthesis hop size, $I_a$: analysis hop size)
– If $\beta > 1$, increase the length
– If $\beta < 1$, reduce the length

§ Algorithms

– OLA
– SOLA
– PSOLA
– Phase Vocoder

SLIDE 10

OverLap-and-Add (OLA)

§ A time-stretching algorithm by segmenting, overlapping and adding the waveform

– The overlapped region is cross-faded between two adjacent frames
– Problem: the phase mismatch between the overlapped frames makes the result sound fuzzy

[Figure: frames taken at the analysis hop size are cross-faded (fade-out over fade-in) at the synthesis hop size]
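The segment, overlap and add steps can be sketched as follows. This is a minimal illustration, assuming a Hann window for the cross-fade and a division by the summed window weight to keep the gain flat; the function name `ola_stretch` and its default frame and hop sizes are hypothetical choices, not values from the slides.

```python
import numpy as np

def ola_stretch(x, beta, frame_len=1024, synth_hop=256):
    """Time-stretch x by beta with plain overlap-and-add (no synchronization)."""
    ana_hop = synth_hop / beta            # beta = synthesis hop / analysis hop
    window = np.hanning(frame_len)        # fade-in and fade-out of each frame
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    y = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(y)               # summed window weight per output sample
    for m in range(n_frames):
        a = int(round(m * ana_hop))       # read position in the input
        s = m * synth_hop                 # write position in the output
        y[s:s + frame_len] += x[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    return y / np.maximum(norm, 1e-8)

# One second of a 200 Hz tone at 8 kHz, stretched to roughly 1.5x its length.
x = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
y = ola_stretch(x, 1.5)
print(len(x), len(y))
```

Because nothing aligns the waveforms of adjacent frames, the overlapped regions can partially cancel, which is the artifact described above.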

SLIDE 11

Synchronized OverLap-and-Add (SOLA)

§ Reduces the artifacts of OLA by shifting each frame so that the two adjacent frames are maximally correlated over the overlapped region

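[Figure: two waveform plots; adjacent frames taken at the analysis hop size are placed at the synthesis hop size, with an overlapped region of length L]

– Synchronization by cross-correlation: $\mathrm{Xcorr}(l) = \sum_{n=0}^{L-1} x_1(n)\, x_2(n + l)$
– Find the lag $l$ where the cross-correlation is maximum
– Shift the next frame by the lag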
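A rough sketch of the synchronization step is given below. It is written under several assumptions: the lag is applied to the position where the frame is written in the output, the search covers lags 0 to `max_lag`, the correlation is computed over a fixed length `corr_len`, and the cross-fade is linear. The function name and all parameter defaults are hypothetical.

```python
import numpy as np

def sola_stretch(x, beta, frame_len=1024, ana_hop=256, max_lag=128, corr_len=256):
    """Time-stretch x by beta, aligning each frame by cross-correlation."""
    synth_hop = int(round(beta * ana_hop))
    if not (max_lag <= synth_hop <= frame_len - max_lag - corr_len):
        raise ValueError("synthesis hop out of range for these frame settings")
    n_frames = (len(x) - frame_len) // ana_hop + 1
    y = np.zeros(n_frames * synth_hop + frame_len + max_lag)
    y[:frame_len] = x[:frame_len]         # the first frame is copied as-is
    end = frame_len                       # number of valid output samples so far
    for m in range(1, n_frames):
        frame = x[m * ana_hop:m * ana_hop + frame_len]
        pos = m * synth_hop               # nominal write position
        # Cross-correlation between the existing output and the start of the frame
        xcorr = [np.dot(y[pos + l:pos + l + corr_len], frame[:corr_len])
                 for l in range(max_lag + 1)]
        start = pos + int(np.argmax(xcorr))   # shift the frame by the best lag
        ov = end - start                      # length of the overlapped region
        fade_in = np.linspace(0.0, 1.0, ov)
        y[start:end] = y[start:end] * (1.0 - fade_in) + frame[:ov] * fade_in
        y[end:start + frame_len] = frame[ov:]
        end = start + frame_len
    return y[:end]

# A tone with a 40-sample period: the aligned frames overlap in phase,
# so the output keeps the amplitude of the input.
x = np.sin(2 * np.pi * np.arange(8000) / 40)
y = sola_stretch(x, 1.5)
print(len(y), np.sqrt(np.mean(y ** 2)))
```

Practical implementations often normalize the cross-correlation by the energy of the two segments; the raw sum is used here to stay close to the formula above.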

SLIDE 12

Pitch-Synchronous OLA (PSOLA)

§ Analysis

– Perform block-based pitch detection and find the pitch marks $u_i$; the pitch period is $Q(u_i) = u_{i+1} - u_i$
– Extract a segment centered at every pitch mark $u_i$ using a Hanning window of length $M_i = 2Q(u_i)$ to ensure fade-in and fade-out

§ Synthesis for time-stretching

– For every synthesis pitch mark $\tilde{u}_k$, search for the analysis pitch mark $u_i$ that minimizes $|\beta u_i - \tilde{u}_k|$
– Overlap and add the selected segment

  • If $\beta > 1$, some segments will be repeated
  • If $\beta < 1$, some segments will be discarded

– The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined so as to preserve the local pitch

  • $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)$

[Figure: PSOLA analysis (pitch marks, segments) and PSOLA time stretching (pitch marks, segments, synthesis pitch marks, overlap and add)]
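The analysis and synthesis steps can be sketched as below, assuming the pitch marks are already known (pitch detection is outside the scope of this sketch). The function name `psola_stretch` is hypothetical, and each segment uses 2Q + 1 samples so that the window is symmetric around its pitch mark.

```python
import numpy as np

def psola_stretch(x, marks, beta):
    """Time-stretch x by beta given increasing analysis pitch marks (sample indices)."""
    marks = np.asarray(marks)
    periods = np.diff(marks)                      # local pitch period at each mark
    periods = np.append(periods, periods[-1])     # reuse the last period for the last mark
    y = np.zeros(int(np.ceil(beta * len(x))) + 2 * int(periods.max()) + 1)
    t = int(round(beta * marks[0]))               # first synthesis pitch mark
    while t <= beta * marks[-1]:
        i = int(np.argmin(np.abs(beta * marks - t)))   # closest analysis mark
        q = int(periods[i])
        lo, hi = marks[i] - q, marks[i] + q       # segment centered on the analysis mark
        if lo >= 0 and hi < len(x) and t - q >= 0:
            y[t - q:t + q + 1] += x[lo:hi + 1] * np.hanning(2 * q + 1)
        t += q                                    # keep the local pitch period
    return y[:int(round(beta * len(x)))]

# A tone with a 40-sample period and one pitch mark per period.
x = np.sin(2 * np.pi * np.arange(8000) / 40)
marks = np.arange(40, 8000 - 40, 40)
y = psola_stretch(x, marks, 1.5)
print(len(x), len(y))
```

In this toy case the output is 1.5 times longer while its period stays at 40 samples, because segments are repeated rather than resampled.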

SLIDE 13

Pitch-Shifting

§ Using Time-stretching and Resampling

– First, perform time-stretching with ratio $\beta$
– Second, resample the output by the same ratio $\beta$ to restore the original length

§ Problem

– The timbre (i.e. the formants) is changed by the resampling
– This is quite audible for the human voice (e.g. speech or singing)
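The effect of the two steps can be followed with a little bookkeeping. As a sketch, assume an input of $N$ samples with fundamental frequency $f_0$ and a formant at frequency $F$ (these symbols are introduced here for illustration), an ideal time-stretcher that leaves pitch and formants untouched, and a resampler that shrinks the sample count by a factor of $\beta$:

```latex
\begin{aligned}
\text{input:} \quad & N \text{ samples}, & \text{pitch } & f_0, & \text{formant } & F \\
\text{after time-stretching by } \beta: \quad & \beta N \text{ samples}, & \text{pitch } & f_0, & \text{formant } & F \\
\text{after resampling by } 1/\beta: \quad & N \text{ samples}, & \text{pitch } & \beta f_0, & \text{formant } & \beta F
\end{aligned}
```

The length returns to $N$ and the pitch is scaled by $\beta$, but the formant moves by the same factor, which is the timbre change listed under the problem above.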

SLIDE 14

Pitch-Synchronous OLA (PSOLA)

§ PSOLA can also be used for pitch-shifting with a ratio $\gamma$

– For every synthesis pitch mark $\tilde{u}_k$, search for the analysis pitch mark $u_i$ that minimizes $|u_i - \tilde{u}_k|$
– Overlap and add the selected segment

  • If $\gamma > 1$, some segments will be repeated
  • If $\gamma < 1$, some segments will be discarded

– The next synthesis pitch mark $\tilde{u}_{k+1}$ is determined by scaling the local pitch period by $1/\gamma$

  • $\tilde{u}_{k+1} = \tilde{u}_k + \tilde{Q}(\tilde{u}_k) = \tilde{u}_k + Q(u_i)/\gamma$

§ Time-stretching and pitch-shifting can be combined by minimizing $|\beta u_i - \tilde{u}_k|$ instead
§ This preserves the formants of the input sound!

[Figure: PSOLA pitch shifting (pitch marks, segments, synthesis pitch marks, overlap and add)]
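How the synthesis pitch marks are generated, and why segments end up repeated or discarded, can be illustrated without any audio. The sketch below assumes the analysis pitch marks are given; the function name `psola_pitch_shift_marks` is hypothetical.

```python
import numpy as np

def psola_pitch_shift_marks(marks, gamma):
    """Return the synthesis pitch marks and the analysis mark index chosen for each."""
    marks = np.asarray(marks, dtype=float)
    periods = np.append(np.diff(marks), marks[-1] - marks[-2])  # local pitch periods
    synth, chosen = [], []
    t = marks[0]
    while t <= marks[-1]:
        i = int(np.argmin(np.abs(marks - t)))   # closest analysis mark (no time-stretching)
        synth.append(t)
        chosen.append(i)
        t += periods[i] / gamma                 # new local period, scaled by 1 / gamma
    return np.array(synth), np.array(chosen)

marks = np.arange(0, 4001, 40)                  # 101 marks, period of 40 samples
up_marks, up_idx = psola_pitch_shift_marks(marks, 2.0)
down_marks, down_idx = psola_pitch_shift_marks(marks, 0.5)
print(len(marks), len(up_marks), len(down_marks))
```

With a ratio of 2 the synthesis marks are twice as dense, so each analysis segment is used about twice; with a ratio of 0.5 only about every other segment is used. The segments themselves are not resampled, which is why the formants stay in place.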

SLIDE 15

Resources

§ TSM Toolbox

– Time-scale modification code using WSOLA (waveform similarity OLA) and the phase vocoder
– Additionally, with harmonic-percussive source separation (HPSS)
– https://www.audiolabs-erlangen.de/resources/MIR/TSMtoolbox/
