GCT535 Sound Technology for Multimedia: Temporal Analysis (PowerPoint PPT Presentation)

SLIDE 1

GCT535- Sound Technology for Multimedia Temporal Analysis

Graduate School of Culture Technology KAIST Juhan Nam

SLIDE 2

Outlines

§ Temporal Analysis

– Introduction
– Human perception of tempo

§ Onset detection

– Definition
– Onset Detection Functions

§ Tempo Estimation

– Beat histogram
– Auto-correlation
– Comb-filter banks

§ Applications

SLIDE 3

Introduction

§ Rhythm

– A strong, regular, repeated pattern of movement or sound.

§ The most primitive and foundational element of music

– Melody, harmony, musical forms and other musical elements are arranged on the basis of rhythm

§ Human and rhythm

– Humans have an innate ability to perceive rhythm: heartbeat, walking
– Associated with motor control: dance, labor songs

SLIDE 4

Rhythm Analysis

§ Hierarchical structure of rhythm (meter)

– Division (tatum): the temporal atom, e.g. eighth or sixteenth note
– Beat (tactus): the most prominent level, the foot-tapping rate
– Measure (bar): the unit of the rhythmic pattern (and also of harmonic changes)

§ Notations

– Tempo: the speed of the beat, e.g. 90 bpm (beats per minute)
– Time signature: 4/4, 3/4, 6/8, ...


[Wikipedia]

SLIDE 5

Human Perception of Tempo

§ McKinney and Moelants (2006)

– Collected tapping data from 40 human subjects
– Initial synchronization delay and anticipation (by tempo estimation)
– Ambiguity in tempo: the beat or its division?


[From D. Ellis’ e4896 course slides]

SLIDE 6

Rhythm Analysis in MIR

§ A process of detecting moments of musical stress (accents) in an acoustic signal and filtering them so that the underlying periodicities are discovered
§ Onset Detection
§ Tempo Estimation
§ Beat Tracking

[Figure: processing pipeline, Onset Detection → Tempo Estimation → Beat Tracking, informed by musical knowledge (prior)]
SLIDE 7

Onset Detection

§ Identify the starting times of musical events

– Notes, drum sounds

§ Types of onsets

– Hard onsets: percussive sounds
– Soft onsets: source-driven sounds (e.g. singing voice, woodwinds, bowed strings)


[M.Muller]

SLIDE 8

Example: Onset Detection

[Figure: waveform (amplitude vs. time in seconds) with detected onsets]

Audio example: “Eat (꺼내먹어요)” by Zion.T

SLIDE 9

Onset Detection

§ Onset Detection Function (ODF)

– An instantaneous measure of temporal change in a signal
– Often called a “novelty” function

§ Types of ODFs

– Time-domain energy
– Spectral or sub-band energy
– Phase difference
– Statistical methods

SLIDE 10

Time-Domain Onset Detection

§ Local energy

– Onsets usually have high energy
– Effective for percussive sounds

[Figure: waveform and the resulting onset detection function, over time in seconds]

E(n) = Σ_m |y(n + m)|² · w(m)

w(m): window

SLIDE 11

Time-Domain Onset Detection

§ Local energy with half-wave rectification

– We are interested in increasing energy at onsets
– Take the positive differences of the local energy

I(s) = (s + |s|) / 2 = { s, s ≥ 0;  0, s < 0 }

D(n) = I(E(n + 1) − E(n))

[Figure: ODF before and after half-wave rectification, over time in seconds]
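To make the energy-based ODF concrete, here is a minimal NumPy sketch. The 1024-sample window, 512-sample hop, and Hann window are assumed values, not taken from the slides; the `use_log` flag implements the log-energy variant shown on the next slide.

```python
import numpy as np

def local_energy(y, win_len=1024, hop=512):
    """Windowed local energy E(n) of signal y (frame sizes are assumptions)."""
    w = np.hanning(win_len)
    n_frames = 1 + (len(y) - win_len) // hop
    E = np.empty(n_frames)
    for n in range(n_frames):
        frame = y[n * hop : n * hop + win_len]
        E[n] = np.sum((frame ** 2) * w)   # energy weighted by the window
    return E

def energy_odf(E, use_log=False, eps=1e-10):
    """Half-wave rectified (positive) differences of the (log-)energy."""
    if use_log:
        E = np.log(E + eps)               # small eps avoids log(0)
    d = np.diff(E)
    return np.maximum(d, 0.0)             # half-wave rectification I(s)
```

For a signal that is silent and then turns into a tone, the largest ODF value lands at the frame where the energy jumps.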

SLIDE 12

Time-Domain Onset Detection

§ Positive differences of the log-energy

– Human perception of sound intensity is logarithmic
– Note that we often add a small value before taking the log

D(n) = I(log(E(n + 1)) − log(E(n)))

[Figure: log-energy ODF over time in seconds]

SLIDE 13

Spectral-Based Onset Detection

§ Spectral Flux

– Sum of the positive differences of the log-compressed spectrogram
– The ODF changes depending on the amount of compression γ

SF(n) = Σ_k I(Z(n + 1, k) − Z(n, k))

Z(n, k) = log(1 + γ · |Y(n, k)|),  Y(n, k): STFT

[Figure: spectrogram (frequency in kHz vs. time in seconds) and the resulting spectral-flux ODF]
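The spectral flux above can be sketched in a few lines of NumPy. The FFT size, hop, Hann window, and the compression value γ = 10 are assumptions for illustration, not values from the slides.

```python
import numpy as np

def spectral_flux(y, n_fft=1024, hop=512, gamma=10.0):
    """Log-compressed spectral flux ODF; gamma controls the compression."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    Z = np.empty((n_frames, n_fft // 2 + 1))
    for n in range(n_frames):
        frame = y[n * hop : n * hop + n_fft] * w
        mag = np.abs(np.fft.rfft(frame))
        Z[n] = np.log1p(gamma * mag)           # Z(n,k) = log(1 + gamma |Y(n,k)|)
    d = np.diff(Z, axis=0)                     # frame-to-frame differences
    return np.sum(np.maximum(d, 0.0), axis=1)  # sum positive diffs over bins
```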

SLIDE 14

Phase Deviation

§ Sinusoidal components of a note is continuous while the note is sustained

– Abrupt change in phase means that there may be a new event


Phase continuation (e.g. during the sustain of a single note):

φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)

Deviation from the steady state, averaged over all N frequency bins:

Δφ_k(n) = φ_k(n) − 2φ_k(n − 1) + φ_k(n − 2) ≈ 0

ζ_p(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|

[From D. Ellis’ e4896 course slides]
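A minimal sketch of the phase-deviation ODF: the mean absolute second difference of the unwrapped STFT phase, per frame. The frame sizes are assumptions, not from the slides.

```python
import numpy as np

def phase_deviation(y, n_fft=1024, hop=512):
    """Mean |second difference| of unwrapped STFT phase over all bins."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    phases = np.empty((n_frames, n_fft // 2 + 1))
    for n in range(n_frames):
        frame = y[n * hop : n * hop + n_fft] * w
        phases[n] = np.angle(np.fft.rfft(frame))
    phi = np.unwrap(phases, axis=0)            # unwrap along the time axis
    dphi = phi[2:] - 2 * phi[1:-1] + phi[:-2]  # second difference per bin
    return np.mean(np.abs(dphi), axis=1)       # average over all bins
```

During a steady tone the second difference stays near zero; a sudden change (e.g. a new note) makes it spike.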

SLIDE 15

Post-Processing

§ DC removal

– Subtract the mean of ODF

§ Normalization

– Scaling level of ODF

§ Low-pass filtering

– Remove small peaks

§ Down-sampling

– For data reduction


[Figure: ODF with low-pass filtering (solid line)]

(Tzanetakis, 2010)
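The four post-processing steps above can be chained as in this sketch; the moving-average filter length and the down-sampling factor are assumed values, not from the slides.

```python
import numpy as np

def postprocess_odf(odf, lp_len=5, decimate=2):
    """DC removal, peak normalization, moving-average low-pass, down-sampling."""
    odf = odf - np.mean(odf)                    # DC removal
    peak = np.max(np.abs(odf))
    if peak > 0:
        odf = odf / peak                        # normalize the level
    kernel = np.ones(lp_len) / lp_len           # simple low-pass (moving average)
    odf = np.convolve(odf, kernel, mode='same') # removes small spiky peaks
    return odf[::decimate]                      # down-sample for data reduction
```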

SLIDE 16

Determining the Onsets

§ Peak Detection

– Peaks above a threshold are selected as onsets
– The threshold is often computed adaptively from the ODF
– The mean and the median are popular choices for computing the threshold

threshold = α + β · median(ODF),  α: offset, β: scaling

[Figure: ODF over time in seconds with an adaptive threshold; median with window size 5]
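Peak picking with the adaptive median threshold can be sketched as follows; the default α, β, and window size are assumed values for illustration.

```python
import numpy as np

def pick_onsets(odf, alpha=0.1, beta=1.0, win=5):
    """Local maxima of the ODF above alpha + beta * sliding median."""
    half = win // 2
    padded = np.pad(odf, half, mode='edge')
    med = np.array([np.median(padded[i:i + win]) for i in range(len(odf))])
    thresh = alpha + beta * med
    onsets = []
    for n in range(1, len(odf) - 1):
        is_peak = odf[n] > odf[n - 1] and odf[n] >= odf[n + 1]
        if is_peak and odf[n] > thresh[n]:
            onsets.append(n)
    return onsets
```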

SLIDE 17

References

§ J. Bello et al., “A Tutorial on Onset Detection in Music Signals”, 2005
§ S. Dixon, “Onset Detection Revisited”, 2006
§ S. Böck et al., “Evaluating the Online Capabilities of Onset Detection Methods”, 2012
§ S. Böck et al., “Maximum Filter Vibrato Suppression for Onset Detection”, 2013


SLIDE 18

Tempo Estimation

§ Estimate a regular time interval between beats

– Tempo is a global attribute of a song
– However, tempo often changes within a song

  • Intentionally: e.g. dramatic effect: Top 10 tempo changes
  • Unintentionally: e.g. re-mastering, live performance

– There are also local changes in the regularity: e.g. rubato


SLIDE 19

Tempo Estimation Methods

§ Auto-Correlation

– Find the periodicity as used in pitch detection

§ Discrete Fourier Transform

– Take the DFT of the ODF and find the periodicity

§ Comb-filter Banks

– Leverage the “oscillating nature” of musical beats


SLIDE 20

Auto-Correlation

§ The ACF is a generic method for detecting the periodicity of a signal

– Thus, it can be applied to the ODF to find a dominant period that may correspond to the tempo
– The ACF shows dominant peaks that indicate the dominant tempi

[Figure: onset detection function (spectral flux) and its auto-correlation, over time in seconds]
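The ACF-based tempo estimate can be sketched as below; the bpm search range follows common practice (e.g. 40 to 200 bpm, as on the beat-histogram slide), and `fps` is the ODF frame rate.

```python
import numpy as np

def tempo_from_odf(odf, fps, bpm_min=40.0, bpm_max=200.0):
    """Pick the ACF lag with the largest peak in a plausible bpm range."""
    odf = odf - np.mean(odf)                       # remove DC first
    acf = np.correlate(odf, odf, mode='full')[len(odf) - 1:]
    lag_min = int(fps * 60.0 / bpm_max)            # short lag = fast tempo
    lag_max = int(fps * 60.0 / bpm_min)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return 60.0 * fps / lag                        # convert lag to bpm
```

For an impulse train with one pulse every 0.5 s at 100 frames per second, the dominant lag is 50 frames, i.e. 120 bpm.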

SLIDE 21

Tempo Estimation using Tempo Prior

§ Tempo is estimated by multiplying the prior with the auto-correlation (observation)

– In a Bayesian sense, it is like a posterior
– The tempo prior can be calculated from the beat annotations of a dataset

  • The distribution fits a log-normal distribution well


Histogram of beats from a dataset [From D. Ellis’ e4896 course slides] (Klapuri, 2003)
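A sketch of weighting the ACF observation with a log-normal tempo prior; the prior mean of 120 bpm and width are assumed values, not the fitted parameters from the slides.

```python
import numpy as np

def tempo_posterior(acf, fps, mu_bpm=120.0, sigma=0.5):
    """Multiply the ACF (observation) by a log-normal tempo prior."""
    lags = np.arange(1, len(acf))
    bpm = 60.0 * fps / lags                        # tempo implied by each lag
    prior = np.exp(-0.5 * (np.log(bpm / mu_bpm) / sigma) ** 2)
    posterior = acf[1:] * prior                    # pointwise product
    best_lag = lags[np.argmax(posterior)]
    return 60.0 * fps / best_lag
```

The prior resolves the beat-vs-division ambiguity: given equal ACF peaks at 120 and 240 bpm, the weighted product favors 120.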

SLIDE 22

Beat Histogram

§ Discrete wavelet transform as a sub-band approach
§ Full-wave rectification to extract the envelope
§ Pick the three highest peaks of the auto-correlation in an appropriate range (40–200 bpm) and accumulate them over segments


(Tzanetakis, 2002)

SLIDE 23

Example of Beat Histogram


(Tzanetakis, 2002)

SLIDE 24

Beat Spectrum

§ Leverages the repetitive nature of music
§ Compute the cosine distance between the magnitude responses of every pair of frames
§ Visualize all pairs as a 2-D matrix S

– The matrix on the left shows 34 notes in the piece

§ The beat spectrum is derived by summing the matrix S along the diagonals


D_C(i, j) = (v_i · v_j) / (‖v_i‖ ‖v_j‖)

B(l) = Σ_{k∈R} S(k, k + l)

(Foote, 2001)
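A minimal sketch of the beat spectrum: build the cosine self-similarity matrix S and sum along its diagonals. Normalizing each diagonal sum by its length is my addition, so lags with fewer terms are comparable.

```python
import numpy as np

def beat_spectrum(frames):
    """Beat spectrum B(l) from a (n_frames, n_bins) feature matrix."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    v = frames / np.maximum(norms, 1e-12)          # unit-normalize each frame
    S = v @ v.T                                    # S(i,j) = cosine similarity
    n = len(frames)
    return np.array([np.trace(S, offset=l) / (n - l) for l in range(n // 2)])
```

For features that repeat with period 4 frames, B peaks at lags 0, 4, 8, ...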

SLIDE 25

Beat Spectrum

§ A more robust version can be obtained from the auto-correlation of the matrix S
§ The final beat spectrum is derived by summing over one variable

– The left plot shows five beats and a triplet within a beat.

§ A “beat spectrogram” can also be obtained from successive beat spectra


B(k, l) = Σ_{i,j} S(i, j) · S(i + k, j + l)

(Foote, 2001)

SLIDE 26

Tempogram

§ Compute the ODF from the half-wave rectified spectral flux
§ Compute the “Predominant Local Periodicity (PLP)”

– Obtain the frequency and phase that provide the maximum magnitude for the local ODF window
– Form a local sinusoidal kernel
– Accumulate the successive local sinusoidal kernels to form a PLP curve


(Grosche, 2009)

κ_n(m) = w(m − n) · cos(2π(ω̂m − φ̂))

SLIDE 27

Tempogram

§ Take the DFT or the ACF of the ODF

– Generates a Fourier tempogram or an auto-correlation tempogram

§ Cyclic Tempogram

– Accumulate the tempogram for integer multiples of a tempo (up to four octaves)


(Grosche, 2011)
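A Fourier tempogram can be sketched by evaluating local DFT coefficients of the ODF at tempo frequencies (bpm / 60 Hz). The window length, hop, and bpm grid here are assumed values, not from the slides.

```python
import numpy as np

def fourier_tempogram(odf, fps, win_len=200, hop=50, bpms=None):
    """Magnitude of windowed Fourier coefficients of the ODF per tempo."""
    if bpms is None:
        bpms = np.arange(40, 241)                    # candidate tempi in bpm
    w = np.hanning(win_len)
    freqs = bpms / 60.0                              # tempo in Hz
    t = np.arange(win_len) / fps
    kernel = np.exp(-2j * np.pi * freqs[:, None] * t[None, :]) * w
    n_frames = 1 + (len(odf) - win_len) // hop
    T = np.empty((len(bpms), n_frames))
    for n in range(n_frames):
        seg = odf[n * hop : n * hop + win_len]
        T[:, n] = np.abs(kernel @ seg)               # DFT at tempo frequencies
    return T
```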

SLIDE 28

Comb-Filter Banks

§ Also called resonant filter banks

– Comb-filter equation: y(t) = α · y(t − τ) + (1 − α) · x(t)
– Compute this for each candidate delay τ

§ Builds up rhythmic evidence (by anticipation?)

(Klapuri, 2006)
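The comb-filter recursion can be sketched directly; choosing α so that the feedback decays by half over one delay period is an assumption here (one common normalization), not a value from the slides.

```python
import numpy as np

def comb_filter_energy(odf, taus):
    """Output energy of y(t) = a*y(t - tau) + (1 - a)*x(t) per delay tau."""
    energies = []
    for tau in taus:
        a = 0.5 ** (1.0 / tau)            # half-life of one period (assumed)
        y = np.zeros(len(odf))
        for t in range(len(odf)):
            fb = y[t - tau] if t >= tau else 0.0
            y[t] = a * fb + (1.0 - a) * odf[t]
        energies.append(np.sum(y ** 2))
    return np.array(energies)
```

The delay whose filter resonates most strongly (highest output energy) corresponds to the beat period.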

SLIDE 29

Sub-band Filter Banks

§ A sub-band filter bank as a front-end process
§ Parallel ODFs for 6 bands
§ 150 resonators per band, covering all candidate tempo values (60–240 bpm)
§ Pick the delay that produces the highest peak as the tempo

– Beat tracking is possible directly from the result
– This is an advantage of the resonant filter-bank approach


(Scheirer, 1998)

SLIDE 30

Applications

§ Music Transcription

– As a sub module

§ Beat-Synchronous Audio Representations

– Audio editing in a DAW: cut, paste, time-stretching
– Beat-synchronous audio features (MFCC, chroma)

§ Music Performance

– Beat-synchronous digital audio effects: delay, flanger
– Robot performance: https://www.youtube.com/watch?v=AJ--LrnkR6Y

§ Music Classification and Retrieval


SLIDE 31

References

§ E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
§ J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm Analysis”, 2001
§ G. Tzanetakis, “Musical Genre Classification of Audio Signals”, 2002
§ A. Klapuri, “Analysis of the Meter of Acoustic Musical Signals”, 2006
§ P. Grosche and M. Müller, “Computing Predominant Local Periodicity Information in Music Recordings”, 2009
§ P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo Representation for Music Signals”, 2010
