GCT535 Sound Technology for Multimedia: Temporal Analysis (PowerPoint PPT Presentation)

SLIDE 1

GCT535- Sound Technology for Multimedia Temporal Analysis

Graduate School of Culture Technology KAIST Juhan Nam

SLIDE 2

Outlines

§ Temporal Analysis

– Introduction
– Human perception of tempo

§ Onset detection

– Definition
– Onset Detection Functions

§ Tempo Estimation

– Beat histogram
– Auto-correlation
– Comb-filter banks

§ Applications

SLIDE 3

Introduction

§ Rhythm

– A strong, regular, repeated pattern of movement or sound.

§ The most primitive and foundational element of music

– Melody, harmony, musical forms and other musical elements are arranged on the basis of rhythm

§ Human and rhythm

– Humans have an innate ability to perceive rhythm: heartbeat, walking
– Associated with motor control: dance, labor songs

SLIDE 4

Rhythm Analysis

§ Hierarchical structure of rhythm (meter)

– Division (tatum): the temporal atom, e.g. eighth or sixteenth note
– Beat (tactus): the most prominent level, the foot-tapping rate
– Measure (bar): the unit of the rhythmic pattern (and also of harmonic changes)

§ Notations

– Tempo: the speed of the beat, e.g. 90 bpm (beats per minute)
– Time signature: 4/4, 3/4, 6/8, ...


[Wikipedia]

SLIDE 5

Human Perception of Tempo

§ McKinney and Moelants (2006)

– Collected tapping data from 40 human subjects
– Initial synchronization delay and anticipation (by tempo estimation)
– Ambiguity in tempo: the beat or its division?


[From D. Ellis’ e4896 course slides]

SLIDE 6

Rhythm Analysis in MIR

§ A process of detecting moments of musical stress (accents) in an acoustic signal and filtering them so that the underlying periodicities are discovered
§ Onset Detection
§ Tempo Estimation
§ Beat Tracking

[Figure: processing pipeline, Onset Detection → Tempo Estimation → Beat Tracking, informed by musical knowledge (prior)]
SLIDE 7

Onset Detection

§ Identify the starting times of musical events

– Notes, drum sounds

§ Types of onsets

– Hard onsets: percussive sounds
– Soft onsets: source-driven sounds (e.g. singing voice, woodwinds, bowed strings)


[M.Muller]

SLIDE 8

Example: Onset Detection

[Figure: waveform (amplitude vs. time in seconds) with detected onsets]

Audio example: “Eat (꺼내먹어요)” by Zion.T

SLIDE 9

Onset Detection

§ Onset Detection Function (ODF)

– An instantaneous measure of temporal change in a signal
– Often called a “novelty” function

§ Types of ODFs

– Time-domain energy
– Spectral or sub-band energy
– Phase difference
– Statistical methods

SLIDE 10

Time-Domain Onset Detection

§ Local energy

– Onsets usually have high energy
– Effective for percussive sounds

[Figure: waveform and the resulting onset detection function, over time in seconds]

E(n) = Σ_m |y(n + m)|² · w(m)

w(m): window

SLIDE 11

Time-Domain Onset Detection

§ Local energy with half-wave rectification

– We are interested in increasing energy at onsets
– Take the positive differences of the local energy

I(s) = (s + |s|) / 2 = { s, s ≥ 0;  0, s < 0 }

D(n) = I(E(n + 1) − E(n))

[Figure: ODF before and after half-wave rectification, over time in seconds]
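To make the energy-based ODF concrete, here is a minimal NumPy sketch. The 1024-sample window, 512-sample hop, and Hann window are assumed values, not taken from the slides; the `use_log` flag implements the log-energy variant shown on the next slide.

```python
import numpy as np

def local_energy(y, win_len=1024, hop=512):
    """Windowed local energy E(n) of signal y (frame sizes are assumptions)."""
    w = np.hanning(win_len)
    n_frames = 1 + (len(y) - win_len) // hop
    E = np.empty(n_frames)
    for n in range(n_frames):
        frame = y[n * hop : n * hop + win_len]
        E[n] = np.sum((frame ** 2) * w)   # energy weighted by the window
    return E

def energy_odf(E, use_log=False, eps=1e-10):
    """Half-wave rectified (positive) differences of the (log-)energy."""
    if use_log:
        E = np.log(E + eps)               # small eps avoids log(0)
    d = np.diff(E)
    return np.maximum(d, 0.0)             # half-wave rectification I(s)
```

For a signal that is silent and then turns into a tone, the largest ODF value lands at the frame where the energy jumps.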

SLIDE 12

Time-Domain Onset Detection

§ Positive differences of the log-energy

– Human perception of sound intensity is logarithmic
– Note that we often add a small value before taking the log

D(n) = I(log(E(n + 1)) − log(E(n)))

[Figure: log-energy ODF over time in seconds]

SLIDE 13

Spectral-Based Onset Detection

§ Spectral Flux

– Sum of the positive differences of the log-compressed spectrogram
– The ODF changes depending on the amount of compression γ

SF(n) = Σ_k I(Z(n + 1, k) − Z(n, k))

Z(n, k) = log(1 + γ · |Y(n, k)|),  Y(n, k): STFT

[Figure: spectrogram (frequency in kHz vs. time in seconds) and the resulting spectral-flux ODF]
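The spectral flux above can be sketched in a few lines of NumPy. The FFT size, hop, Hann window, and the compression value γ = 10 are assumptions for illustration, not values from the slides.

```python
import numpy as np

def spectral_flux(y, n_fft=1024, hop=512, gamma=10.0):
    """Log-compressed spectral flux ODF; gamma controls the compression."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    Z = np.empty((n_frames, n_fft // 2 + 1))
    for n in range(n_frames):
        frame = y[n * hop : n * hop + n_fft] * w
        mag = np.abs(np.fft.rfft(frame))
        Z[n] = np.log1p(gamma * mag)           # Z(n,k) = log(1 + gamma |Y(n,k)|)
    d = np.diff(Z, axis=0)                     # frame-to-frame differences
    return np.sum(np.maximum(d, 0.0), axis=1)  # sum positive diffs over bins
```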

SLIDE 14

Phase Deviation

§ Sinusoidal components of a note is continuous while the note is sustained

– Abrupt change in phase means that there may be a new event


Phase continuation (e.g. during the sustain of a single note):

φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)

Deviation from the steady state, averaged over all N frequency bins:

Δφ_k(n) = φ_k(n) − 2φ_k(n − 1) + φ_k(n − 2) ≈ 0

ζ_p(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|

[From D. Ellis’ e4896 course slides]
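A minimal sketch of the phase-deviation ODF: the mean absolute second difference of the unwrapped STFT phase, per frame. The frame sizes are assumptions, not from the slides.

```python
import numpy as np

def phase_deviation(y, n_fft=1024, hop=512):
    """Mean |second difference| of unwrapped STFT phase over all bins."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    phases = np.empty((n_frames, n_fft // 2 + 1))
    for n in range(n_frames):
        frame = y[n * hop : n * hop + n_fft] * w
        phases[n] = np.angle(np.fft.rfft(frame))
    phi = np.unwrap(phases, axis=0)            # unwrap along the time axis
    dphi = phi[2:] - 2 * phi[1:-1] + phi[:-2]  # second difference per bin
    return np.mean(np.abs(dphi), axis=1)       # average over all bins
```

During a steady tone the second difference stays near zero; a sudden change (e.g. a new note) makes it spike.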

SLIDE 15

Post-Processing

§ DC removal

– Subtract the mean of ODF

§ Normalization

– Scaling level of ODF

§ Low-pass filtering

– Remove small peaks

§ Down-sampling

– For data reduction


[Figure: ODF with low-pass filtering (solid line)]

(Tzanetakis, 2010)
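The four post-processing steps above can be chained as in this sketch; the moving-average filter length and the down-sampling factor are assumed values, not from the slides.

```python
import numpy as np

def postprocess_odf(odf, lp_len=5, decimate=2):
    """DC removal, peak normalization, moving-average low-pass, down-sampling."""
    odf = odf - np.mean(odf)                    # DC removal
    peak = np.max(np.abs(odf))
    if peak > 0:
        odf = odf / peak                        # normalize the level
    kernel = np.ones(lp_len) / lp_len           # simple low-pass (moving average)
    odf = np.convolve(odf, kernel, mode='same') # removes small spiky peaks
    return odf[::decimate]                      # down-sample for data reduction
```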

SLIDE 16

Determining the Onsets

§ Peak Detection

– Peaks above a threshold are selected as onsets
– The threshold is often computed adaptively from the ODF
– The mean and the median are popular choices for computing the threshold

threshold = α + β · median(ODF),  α: offset, β: scaling

[Figure: ODF over time in seconds with an adaptive threshold; median with window size 5]
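Peak picking with the adaptive median threshold can be sketched as follows; the default α, β, and window size are assumed values for illustration.

```python
import numpy as np

def pick_onsets(odf, alpha=0.1, beta=1.0, win=5):
    """Local maxima of the ODF above alpha + beta * sliding median."""
    half = win // 2
    padded = np.pad(odf, half, mode='edge')
    med = np.array([np.median(padded[i:i + win]) for i in range(len(odf))])
    thresh = alpha + beta * med
    onsets = []
    for n in range(1, len(odf) - 1):
        is_peak = odf[n] > odf[n - 1] and odf[n] >= odf[n + 1]
        if is_peak and odf[n] > thresh[n]:
            onsets.append(n)
    return onsets
```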

SLIDE 17

References

§ J. Bello et al., “A Tutorial on Onset Detection in Music Signals”, 2005
§ S. Dixon, “Onset Detection Revisited”, 2006
§ S. Böck et al., “Evaluating the Online Capabilities of Onset Detection Methods”, 2012
§ S. Böck et al., “Maximum Filter Vibrato Suppression for Onset Detection”, 2013


SLIDE 18

Tempo Estimation

§ Estimate a regular time interval between beats

– Tempo is a global attribute of a song
– However, tempo often changes within a song

  • Intentionally: e.g. dramatic effect: Top 10 tempo changes
  • Unintentionally: e.g. re-mastering, live performance

– There are also local changes in the regularity: e.g. rubato


SLIDE 19

Tempo Estimation Methods

§ Auto-Correlation

– Find the periodicity as used in pitch detection

§ Discrete Fourier Transform

– Take the DFT of the ODF and find the periodicity

§ Comb-filter Banks

– Leverage the “oscillating nature” of musical beats


SLIDE 20

Auto-Correlation

§ The ACF is a generic method for detecting the periodicity of a signal

– Thus, it can be applied to the ODF to find a dominant period that may correspond to the tempo
– The ACF shows dominant peaks that indicate the dominant tempi

[Figure: onset detection function (spectral flux) and its auto-correlation, over time in seconds]
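The ACF-based tempo estimate can be sketched as below; the bpm search range follows common practice (e.g. 40 to 200 bpm, as on the beat-histogram slide), and `fps` is the ODF frame rate.

```python
import numpy as np

def tempo_from_odf(odf, fps, bpm_min=40.0, bpm_max=200.0):
    """Pick the ACF lag with the largest peak in a plausible bpm range."""
    odf = odf - np.mean(odf)                       # remove DC first
    acf = np.correlate(odf, odf, mode='full')[len(odf) - 1:]
    lag_min = int(fps * 60.0 / bpm_max)            # short lag = fast tempo
    lag_max = int(fps * 60.0 / bpm_min)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return 60.0 * fps / lag                        # convert lag to bpm
```

For an impulse train with one pulse every 0.5 s at 100 frames per second, the dominant lag is 50 frames, i.e. 120 bpm.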

SLIDE 21

Tempo Estimation using Tempo Prior

§ Tempo is estimated by multiplying the prior with the auto-correlation (observation)

– In a Bayesian sense, it is like a posterior
– The tempo prior can be calculated from the beat annotations of a dataset

  • The distribution fits a log-normal distribution well


Histogram of beats from a dataset [From D. Ellis’ e4896 course slides] (Klapuri, 2003)
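A sketch of weighting the ACF observation with a log-normal tempo prior; the prior mean of 120 bpm and width are assumed values, not the fitted parameters from the slides.

```python
import numpy as np

def tempo_posterior(acf, fps, mu_bpm=120.0, sigma=0.5):
    """Multiply the ACF (observation) by a log-normal tempo prior."""
    lags = np.arange(1, len(acf))
    bpm = 60.0 * fps / lags                        # tempo implied by each lag
    prior = np.exp(-0.5 * (np.log(bpm / mu_bpm) / sigma) ** 2)
    posterior = acf[1:] * prior                    # pointwise product
    best_lag = lags[np.argmax(posterior)]
    return 60.0 * fps / best_lag
```

The prior resolves the beat-vs-division ambiguity: given equal ACF peaks at 120 and 240 bpm, the weighted product favors 120.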

SLIDE 22

Beat Histogram

§ Discrete wavelet transform as a sub-band approach
§ Full-wave rectification to extract the envelope
§ Pick the three highest peaks of the auto-correlation in an appropriate range (40–200 bpm) and accumulate them over segments


(Tzanetakis, 2002)

SLIDE 23

Example of Beat Histogram


(Tzanetakis, 2002)

SLIDE 24

Beat Spectrum

§ Leverages the repetitive nature of music
§ Compute the cosine distance between the magnitude responses of every pair of frames
§ Visualize all pairs as a 2-D matrix S

– The matrix on the left shows 34 notes in the piece

§ The beat spectrum is derived by summing the matrix S along the diagonals


D_C(i, j) = (v_i · v_j) / (‖v_i‖ ‖v_j‖)

B(l) = Σ_{k∈R} S(k, k + l)

(Foote, 2001)
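A minimal sketch of the beat spectrum: build the cosine self-similarity matrix S and sum along its diagonals. Normalizing each diagonal sum by its length is my addition, so lags with fewer terms are comparable.

```python
import numpy as np

def beat_spectrum(frames):
    """Beat spectrum B(l) from a (n_frames, n_bins) feature matrix."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    v = frames / np.maximum(norms, 1e-12)          # unit-normalize each frame
    S = v @ v.T                                    # S(i,j) = cosine similarity
    n = len(frames)
    return np.array([np.trace(S, offset=l) / (n - l) for l in range(n // 2)])
```

For features that repeat with period 4 frames, B peaks at lags 0, 4, 8, ...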

SLIDE 25

Beat Spectrum

§ A more robust version can be obtained from the auto-correlation of the matrix S
§ The final beat spectrum is derived by summing over one variable

– The left plot shows five beats and a triplet within a beat.

§ A “beat spectrogram” can also be obtained from successive beat spectra


B(k, l) = Σ_{i,j} S(i, j) · S(i + k, j + l)

(Foote, 2001)

SLIDE 26

Tempogram

§ Compute the ODF from the half-wave rectified spectral flux
§ Compute the “Predominant Local Periodicity (PLP)”

– Obtain the frequency and phase that provide the maximum magnitude for the local ODF window
– Form a local sinusoidal kernel
– Accumulate the successive local sinusoidal kernels to form a PLP curve


(Grosche, 2009)

κ_n(m) = w(m − n) · cos(2π(ω̂m − φ̂))

SLIDE 27

Tempogram

§ Take the DFT or the ACF of the ODF

– Generates a Fourier tempogram or an auto-correlation tempogram

§ Cyclic Tempogram

– Accumulate the tempogram for integer multiples of a tempo (up to four octaves)


(Grosche, 2011)
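A Fourier tempogram can be sketched by evaluating local DFT coefficients of the ODF at tempo frequencies (bpm / 60 Hz). The window length, hop, and bpm grid here are assumed values, not from the slides.

```python
import numpy as np

def fourier_tempogram(odf, fps, win_len=200, hop=50, bpms=None):
    """Magnitude of windowed Fourier coefficients of the ODF per tempo."""
    if bpms is None:
        bpms = np.arange(40, 241)                    # candidate tempi in bpm
    w = np.hanning(win_len)
    freqs = bpms / 60.0                              # tempo in Hz
    t = np.arange(win_len) / fps
    kernel = np.exp(-2j * np.pi * freqs[:, None] * t[None, :]) * w
    n_frames = 1 + (len(odf) - win_len) // hop
    T = np.empty((len(bpms), n_frames))
    for n in range(n_frames):
        seg = odf[n * hop : n * hop + win_len]
        T[:, n] = np.abs(kernel @ seg)               # DFT at tempo frequencies
    return T
```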

SLIDE 28

Comb-Filter Banks

§ Also called resonant filter banks

– Comb-filter equation: y(t) = α · y(t − τ) + (1 − α) · x(t)
– Compute this for each candidate delay τ

§ Builds up rhythmic evidence (by anticipation?)

(Klapuri, 2006)
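The comb-filter recursion can be sketched directly; choosing α so that the feedback decays by half over one delay period is an assumption here (one common normalization), not a value from the slides.

```python
import numpy as np

def comb_filter_energy(odf, taus):
    """Output energy of y(t) = a*y(t - tau) + (1 - a)*x(t) per delay tau."""
    energies = []
    for tau in taus:
        a = 0.5 ** (1.0 / tau)            # half-life of one period (assumed)
        y = np.zeros(len(odf))
        for t in range(len(odf)):
            fb = y[t - tau] if t >= tau else 0.0
            y[t] = a * fb + (1.0 - a) * odf[t]
        energies.append(np.sum(y ** 2))
    return np.array(energies)
```

The delay whose filter resonates most strongly (highest output energy) corresponds to the beat period.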

SLIDE 29

Sub-band Filter Banks

§ A sub-band filter bank as a front-end process
§ Parallel ODFs for 6 bands
§ 150 resonators per band, covering all candidate tempo values (60–240 bpm)
§ Pick the delay that produces the highest peak as the tempo

– Beat tracking is possible directly from the result
– This is an advantage of the resonant filter-bank approach


(Scheirer, 1998)

SLIDE 30

Applications

§ Music Transcription

– As a sub module

§ Beat-Synchronous Audio Representations

– Audio editing in a DAW: cut, paste, time-stretching
– Beat-synchronous audio features (MFCC, chroma)

§ Music Performance

– Beat-synchronous digital audio effects: delay, flanger
– Robot performance: https://www.youtube.com/watch?v=AJ--LrnkR6Y

§ Music Classification and Retrieval


SLIDE 31

References

§ E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
§ J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm Analysis”, 2001
§ G. Tzanetakis, “Musical Genre Classification of Audio Signals”, 2002
§ A. Klapuri, “Analysis of the Meter of Acoustic Musical Signals”, 2006
§ P. Grosche and M. Müller, “Computing Predominant Local Periodicity Information in Music Recordings”, 2009
§ P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo Representation for Music Signals”, 2010
