Rhythm Transcription and Dynamic Programming
GCT634: Musical Applications of Machine Learning
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
- Overview of Automatic Music Transcription (AMT)
- Types of AMT Tasks
- Rhythmic Transcription
- Introduction
- Onset detection
- Tempo Estimation
- Dynamic Programming
- Beat Tracking
Overview of Automatic Music Transcription (AMT)
- Predicting musical score information from audio
- The primary score information is notes, which are arranged on the basis of
rhythm, harmony, and structure
- Analogous to automatic speech recognition (ASR) for speech signals
[Diagram: audio → transcription model → onsets, tempo, beat, key, chords, structure]
Types of AMT Tasks
- Rhythm transcription
- Onset detection
- Tempo estimation
- Beat tracking
- Tonal analysis
- Key estimation
- Chord recognition
- Timbre analysis
- Instrument identification
- Note transcription
- Monophonic note
- Polyphonic note
- Expression detection
(e.g. vibrato, pedal)
- Structure analysis
- Musical structure
- Musical boundary / repetition detection
- Highlight detection
We will mainly focus on these topics!
Overview of AMT Systems
- Acoustic model
- Estimate the target information given input audio (usually short segment)
- Musical knowledge
- Music theory (e.g. rhythm, harmony), performance (e.g. playability)
- Prior/Lexical model
- Statistical distribution of score-level music information (e.g. chord
progressions)
[Diagram: the acoustic model (audio-level) and the prior/lexical model with musical knowledge (score-level) feed the transcription model, which outputs beat, tempo, key, chords, and notes]
Introduction to Rhythm
- Rhythm
- A strong, regular, and repeated pattern of sound
- Distinguishes music from speech
- The most primitive and foundational element of music
- Melody, harmony and other musical elements are arranged on the basis of
rhythm
- Human and rhythm
- Humans have an innate ability for rhythm perception: heartbeat, walking
- Associated with motor control: dance, labor songs
Introduction to Rhythm
- Hierarchical structure of rhythm
- Beat (tactus): the most prominent level; the foot-tapping rate
- Division (tatum): the temporal atom, e.g. eighth or sixteenth notes
- Measure (bar): the unit of rhythmic patterns (and also of harmonic changes)
- Notations
- Tempo: beats per minute, e.g. 90 bpm
- Time signature: e.g. 4/4, 3/4, 6/8
[Wikipedia]
Human Perception of Tempo
- McKinney and Moelants (2006)
- Collected tapping data from 40 human subjects
- Initial synchronization delay and anticipation (by tempo estimation)
- Ambiguity in tempo: the beat or its division?
[D. Ellis’ e4896 slides]
Overview of Rhythm Transcription Systems
- Consists of several cascaded tasks that detect moments of
musical stress (accents) and their regularity
[Pipeline: onset detection → tempo estimation → beat tracking, informed by musical knowledge]
Onset Detection
- Identify the starting times of musical events
- Notes, drum sounds
- Types of onsets
- Hard onsets: percussive sounds
- Soft onsets: source-driven sounds (e.g. singing voice, woodwind, bowed
strings)
[M. Müller]
Example: Onset Detection
[Waveform (amplitude vs. time) of “Eat (꺼내먹어요)” by Zion.T; where are the onsets?]
Onset Detection Systems
- Onset detection function (ODF)
- Instantaneous measure of temporal change, often called “novelty” function
- Types: time-domain energy, spectral or sub-band energy, phase difference
- Decision algorithm
- Rule-based approach
- Learning-based approach
[Pipeline: audio → representations (feature extraction) → onset detection function → decision algorithm (classifier)]
Onset Detection Function (ODF)
- Types of ODFs
- Time-domain energy
- Spectral or sub-band energy
- Phase difference
Time-Domain Onset Detection
- Local energy
- The signal usually has high energy at onsets
- Effective for percussive sounds
- Various versions
- Frame-level energy
- Half-wave rectification
E(n) = Σ_m w(m)·x(n + m)²   (frame-level energy)
ODF(n) = HWR( E(n + 1) − E(n) )
HWR(s) = (s + |s|) / 2 = s if s ≥ 0, 0 if s < 0
[Plots: waveform and the two energy-based ODFs (frame-level energy and its half-wave rectified difference)]
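The energy-based ODF above can be sketched in a few lines of NumPy; the frame and hop sizes and the toy signal here are illustrative choices, not values from the lecture:

```python
import numpy as np

def energy_odf(x, frame_len=1024, hop=512):
    """Half-wave rectified difference of frame-level energy (sketch):
    E(n) = sum of squared samples in frame n,
    ODF(n) = HWR(E(n+1) - E(n)). Frame and hop sizes are illustrative."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    return np.maximum(np.diff(energy), 0.0)   # keep only energy increases

# Toy signal: silence, then a short burst -> one large ODF peak
x = np.zeros(8192)
x[4096:4096 + 512] = 1.0
odf = energy_odf(x)
```

The half-wave rectification matters: energy *decreases* (note offsets, decay) would otherwise produce spurious peaks.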
Spectral-Based Onset Detection
- Spectral Flux
- Sum of the positive differences of the log spectrogram
- The ODF changes depending on the amount of compression γ

SF(n) = Σ_k HWR( Z(n + 1, k) − Z(n, k) )
Z(n, k) = log(1 + γ·|Y(n, k)|),  Y(n, k): STFT

[Plots: log-compressed spectrogram and the resulting spectral-flux ODF]
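A minimal NumPy sketch of this spectral-flux ODF; the FFT size, hop, and compression factor are illustrative settings, not the lecture's:

```python
import numpy as np

def spectral_flux(x, n_fft=1024, hop=512, gamma=10.0):
    """Spectral-flux ODF on a log-compressed magnitude STFT (sketch):
    Z(n, k) = log(1 + gamma*|Y(n, k)|),
    SF(n) = sum_k HWR(Z(n+1, k) - Z(n, k))."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    Z = np.log1p(gamma * np.abs(np.fft.rfft(frames, axis=1)))  # compressed STFT
    return np.maximum(np.diff(Z, axis=0), 0.0).sum(axis=1)     # HWR + sum over bins

# A 440 Hz tone entering in the middle of a silent signal
sr = 8000
x = np.zeros(sr)
x[sr // 2:] = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
sf = spectral_flux(x)
```

The flux peaks at the frame where the tone enters; a larger gamma boosts the relative contribution of soft onsets.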
Phase Deviation
- The sinusoidal components of a note are continuous while the note is
sustained
- An abrupt change in phase means that there may be a new event
[D. Ellis’ e4896 slides]
- Phase continuation (e.g. during the sustain of a single note):
φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)
- Deviation from the steady state, averaged over all N frequency bins:
Δφ_k(n) = φ_k(n) − 2φ_k(n − 1) + φ_k(n − 2) ≈ 0
ζ(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|
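A sketch of this (unweighted) phase-deviation ODF; note that low-energy bins have essentially random phase, so practical systems often weight the deviation by bin magnitude. The test signal with a phase jump is my illustrative example:

```python
import numpy as np

def phase_deviation_odf(x, n_fft=512, hop=128):
    """Phase-deviation ODF (sketch): second difference of the STFT phase,
    wrapped to [-pi, pi) and averaged over bins. During a steady sustain the
    phase advances linearly and the deviation stays small; a new event breaks
    the linear prediction and raises the value. Low-energy bins add noise."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    phi = np.angle(np.fft.rfft(frames, axis=1))
    d2 = phi[2:] - 2 * phi[1:-1] + phi[:-2]        # second-order phase difference
    d2 = np.mod(d2 + np.pi, 2 * np.pi) - np.pi     # wrap back into [-pi, pi)
    return np.abs(d2).mean(axis=1)

# Example: a sinusoid with a sudden phase jump halfway through
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
x[sr // 2:] = np.sin(2 * np.pi * 440 * t[sr // 2:] + np.pi / 2)
pd = phase_deviation_odf(x)
```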
Post-Processing
- DC removal
- Subtract the mean of ODF
- Normalization
- Scaling level of ODF
- Low-pass filtering
- Remove small peaks
- Down-sampling
- For data reduction
[Plot: ODF with low-pass filtering shown as a solid line]
(Tzanetakis, 2010)
Onset Decision Algorithm
- Rule-based Approach: peak detection rule
- Peaks above thresholds are determined as onsets
- The thresholds are often adaptively computed from the ODF
- Averaging and median are popular choices to compute the thresholds
threshold = α + β·median(ODF),   α: offset, β: scaling factor
[Plot: ODF over time with the adaptive threshold (median, window size 5) overlaid]
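The peak-detection rule can be sketched as follows; the toy ODF and the values of α, β, and the window size are illustrative, and `scipy.signal.medfilt` computes the sliding median:

```python
import numpy as np
from scipy.signal import medfilt

def pick_onsets(odf, alpha=0.1, beta=1.0, win=5):
    """Rule-based peak picking (sketch): a frame is an onset if it is a local
    maximum of the ODF and exceeds the adaptive threshold
    alpha + beta * median(ODF) over a sliding window."""
    threshold = alpha + beta * medfilt(odf, kernel_size=win)
    onsets = []
    for n in range(1, len(odf) - 1):
        if odf[n] > odf[n - 1] and odf[n] >= odf[n + 1] and odf[n] > threshold[n]:
            onsets.append(n)
    return onsets

# Toy ODF with two clear peaks above the local median
odf = np.array([0.0, 0.1, 2.0, 0.1, 0.0, 0.1, 0.0, 3.0, 0.2, 0.0])
```

The median is preferred over the mean here because a single large peak inside the window barely moves it, keeping the threshold stable.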
Challenging Issue in Onset Detection: Vibrato
Onset detection using spectral flux
SuperFlux
- A state-of-the-art rule-based onset detection function
- S. Bock et al., “Maximum Filter Vibrato Suppression For Onset Detection”,
DAFx, 2013
- Step 1: log-spectrogram
- Makes harmonic partials have the same depth of vibrato contour
- Step 2: max-filtering
- Take the maximum in a window along the frequency axis
- The vibrato contours become thicker
Z(n, k) = log(1 + Σ_l |Y(n, l)|·G(l, k)),  Y(n, l): STFT, G: filter bank
Z_max(n, k) = max( Z(n, k − m : k + m) )
SuperFlux
[Figures: log-spectrogram and max-filtered log-spectrogram]
SuperFlux
- Step 3: SuperFlux ODF
- Take the positive difference over a distance of μ frames
- Assumption: the frame rate is high in onset detection (i.e. small hop size),
so μ is chosen according to the frame rate
- Step 4: peak-picking; a frame n is selected as an onset if
- 1) SF(n) = max( SF(n − w_pre : n + w_post) )   (local maximum)
- 2) SF(n) ≥ mean( SF(n − w'_pre : n + w'_post) ) + δ   (adaptive threshold)
- 3) n − n_previous-onset > combination width   (minimum onset distance)

SF(n) = Σ_k HWR( Z(n, k) − Z_max(n − μ, k) )
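The max-filter and distant-difference steps can be sketched with NumPy and SciPy. This is a rough sketch of the idea, not the reference implementation: the real SuperFlux operates on a log-frequency filter-bank spectrogram, and the parameters below are illustrative:

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

def superflux(Y, gamma=10.0, max_size=3, mu=2):
    """SuperFlux-style ODF (rough sketch after Boeck & Widmer, 2013):
    log-compress the magnitude spectrogram, max-filter along the frequency
    axis to widen vibrato contours, then sum the positive differences taken
    mu frames apart.

    Y: magnitude spectrogram, shape (n_frames, n_bins)."""
    Z = np.log1p(gamma * Y)
    Zmax = maximum_filter1d(Z, size=max_size, axis=1)  # frequency-axis max filter
    diff = Z[mu:] - Zmax[:-mu]                         # difference over mu frames
    return np.maximum(diff, 0.0).sum(axis=1)

# A real onset still produces a peak ...
onset_spec = np.zeros((10, 8))
onset_spec[5:, 3] = 1.0
sf_onset = superflux(onset_spec)
# ... while a "vibrato" line wobbling between adjacent bins is suppressed,
# because the max filter widens the contour enough to cover the wobble
vibrato_spec = np.zeros((10, 8))
for n in range(10):
    vibrato_spec[n, 3 + n % 2] = 1.0
sf_vibrato = superflux(vibrato_spec)
```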
SuperFlux
[Figure: peak-picking on the ODF from the max-filtered log-spectrogram]
Tempo Estimation
- Estimate a regular time interval between beats
- Tempo is a global attribute of a song, described in bpm or with labels such
as “mid-tempo”
- Tempo often changes within a song
- Intentionally: e.g. dramatic effect: Top 10 tempo changes
- Unintentionally: e.g. re-mastering, live performance
- There are also local tempo changes: e.g. rubato
Tempo Estimation Methods
- Auto-Correlation
- Find the periodicity as used in pitch detection
- Discrete Fourier Transform
- Use DFT over ODF and find the periodicity
- Comb-filter Banks
- Leverage the “oscillating nature” of musical beats
Auto-Correlation
- ACF is a generic method to detect periodicity of a signal
- Thus, it can be applied to the ODF to find a dominant period that may
correspond to the tempo
- The ACF shows dominant peaks that indicate candidate tempi
[Plots: onset detection function (spectral flux) and its auto-correlation]
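The ACF-based tempo estimate can be sketched as follows; the impulse-train ODF is a toy input, and the 60-240 bpm search range is a common convention rather than a fixed rule:

```python
import numpy as np

def tempo_from_acf(odf, frame_rate, bpm_range=(60, 240)):
    """Tempo estimation by auto-correlation (sketch): correlate the ODF with
    itself and pick the lag with the largest value inside a plausible BPM
    range, then convert that lag back to beats per minute."""
    odf = odf - odf.mean()                                  # DC removal
    acf = np.correlate(odf, odf, mode='full')[len(odf) - 1:]
    min_lag = int(frame_rate * 60 / bpm_range[1])
    max_lag = int(frame_rate * 60 / bpm_range[0])
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag

# Toy ODF: impulses every 50 frames at a 100 Hz frame rate -> 120 bpm
frame_rate = 100
odf = np.zeros(1000)
odf[::50] = 1.0
tempo = tempo_from_acf(odf, frame_rate)
```

Restricting the lag range is what resolves most of the octave ambiguity: lags for 60 and 240 bpm are both ACF peaks for this signal.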
Tempo Estimation Using Tempo Prior
- Tempo is estimated by multiplying the prior with the auto-
correlation (observation)
- The auto-correlation corresponds to a likelihood function
- Tempo prior can be calculated from beat annotations of a dataset
- The distribution is fit well by a log-normal distribution
Histogram of beats from a dataset
[D. Ellis’ e4896 slides] (Klapuri, 2003)
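The prior-times-likelihood combination can be sketched like this; the mode and width of the prior below are illustrative values, not parameters fitted to any beat-annotation dataset:

```python
import numpy as np

def lognormal_tempo_prior(bpm, mode=120.0, sigma=0.4):
    """Log-normal-shaped tempo prior (sketch): weights candidate tempi so
    that values near a typical tempo are preferred. mode and sigma are
    illustrative, not fitted values."""
    return np.exp(-0.5 * (np.log(bpm / mode) / sigma) ** 2)

# Resolve the octave ambiguity of an ACF "likelihood": three candidate
# tempi with equal correlation, reweighted by the prior
bpms = np.array([60.0, 120.0, 240.0])
likelihood = np.array([0.9, 0.9, 0.9])
posterior = lognormal_tempo_prior(bpms) * likelihood
```

Even with equal ACF evidence, the prior pushes the estimate toward the perceptually most plausible octave.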
Beat Spectrum
- Leverage the repetitive nature of music
- Algorithm
- Step 1: compute the cosine similarity between pairs of frames of the
magnitude spectrogram
- Step 2: sum the elements on the diagonals
(Foote, 2001)

S(j, k) = (v_j · v_k) / (‖v_j‖ ‖v_k‖),  v_j: magnitude spectrum of frame j
B(m) = Σ_l S(l, l + m)
Beat Spectrum
- A more robust version can be obtained from the 2D auto-correlation of the
similarity matrix
- The final beat spectrum is derived by summing over one axis
- A “beat spectrogram” can also be obtained from successive beat spectra

B(l, m) = Σ_{j,k} S(j, k) · S(j + l, k + m)

(Foote, 2001) [Figure: beat spectrum showing five beats and a triplet within a beat]
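The basic (diagonal-sum) beat spectrum can be sketched in NumPy; the repeating toy spectrogram is my illustrative input:

```python
import numpy as np

def beat_spectrum(S):
    """Beat spectrum (sketch after Foote, 2001): cosine similarity between
    all pairs of spectrogram frames, then average along each diagonal.
    A peak at lag m indicates a periodicity of m frames.

    S: magnitude spectrogram, shape (n_frames, n_bins)."""
    norm = np.linalg.norm(S, axis=1, keepdims=True)
    V = S / np.maximum(norm, 1e-12)   # unit-normalise each frame
    sim = V @ V.T                     # cosine self-similarity matrix
    n = len(S)
    return np.array([np.diag(sim, k=m).mean() for m in range(n)])

# Toy spectrogram: a 4-frame pattern repeated 4 times
S = np.tile(np.eye(4), (4, 1))
B = beat_spectrum(S)
```

For this input the beat spectrum is 1 at lags 0, 4, 8, ... and 0 elsewhere, reflecting the 4-frame period.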
Tempogram
- Model the onset function with a sinusoid as the predominant local
periodicity (PLP)
- Algorithm
- Step 1: compute the ODF from the half-wave rectified spectral flux
- Step 2: find the frequency and phase that maximize the correlation with the
ODF and form a local sinusoidal kernel
- Step 3: accumulate the successive local sinusoidal kernels to form a PLP
curve
- Step 4: take the DFT or auto-correlation
(Grosche, 2009)

κ_n(m) = w(m − n)·cos(2π(ω̂m − φ̂))
Tempogram
- Cyclic Tempogram
- Accumulate the tempogram over integer multiples of a tempo (up to four
octaves)
- Conceptually similar to the “chromagram”
(Grosche, 2011)
Comb-Filter Banks
- Also called resonant filter banks
- Comb filter equation
- Builds up rhythmic evidence (by anticipation?)
(Klapuri, 2006)

z(n) = y(n) + β·z(n − τ)
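The resonance behavior is easy to see in a direct implementation of the recurrence; the pulse train, delays, and feedback gain below are illustrative:

```python
import numpy as np

def comb_filter(odf, delay, beta=0.8):
    """Resonating comb filter (sketch): z(n) = y(n) + beta * z(n - delay).
    When the ODF pulses line up with the delay, the output builds up
    ("resonates"); mismatched delays stay weak."""
    z = np.zeros(len(odf))
    for n in range(len(odf)):
        z[n] = odf[n] + (beta * z[n - delay] if n >= delay else 0.0)
    return z

# Pulse train with period 10: a delay of 10 resonates, a delay of 7 does not
odf = np.zeros(100)
odf[::10] = 1.0
matched = comb_filter(odf, delay=10)
mismatched = comb_filter(odf, delay=7)
```

A bank of such filters, one per candidate tempo, turns tempo estimation into picking the delay with the strongest resonance.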
Sub-band Resonant Filter Banks
- Algorithm
- A sub-band filter bank as front-end processing
- Parallel ODFs for 6 bands
- 150 resonators for each band, covering all possible tempo values (60-240
bpm)
- Pick the delay that provides the highest peak as the tempo
(Scheirer, 1998)
Beat Tracking
- Estimate the position of beats in music
- Usually a subset of detected onsets selected by the tempo
Beat Tracking by the Resonator Model
- Once the resonator model chooses the tempo that returns the highest peaks,
its output produces a sequence of resonated peaks
- These peaks correspond to the beats
(Scheirer, 1998)
Beat Tracking by Dynamic Programming
- Find the optimal “hopping” path on music (Ellis, 2007)
- D({t_i}): score of the beat sequence {t_i}
- P(t_i): onset strength function (i.e. the ODF)
- G(Δt, τ): tempo (τ) consistency score, e.g. G(Δt, τ) = −(log(Δt / τ))²

D({t_i}) = Σ_{i=1}^{N} P(t_i) + β Σ_{i=2}^{N} G(t_i − t_{i−1}, τ)
Finding the Minimum-Cost-Path
- Naïve approach
- Find all paths from A to K and calculate the cost for each, and choose the
path that has the minimum cost.
- As the number of nodes increases, the number of possible paths increases
exponentially
[Figure: weighted graph with nodes A through K and edge costs]
Dynamic Programming (DP)
- Observation
- Suppose the minimum-cost path passes through a node p
- What is the minimum-cost path from A to p?
- It is just a sub-path of the minimum-cost path from A to K
- Thus, we don’t have to compute the cost from scratch; we can reuse the
costs computed for the previous nodes
Dynamic Programming (DP)
- The minimum cost is computed by the following equation:
- The minimum-cost-path can be found by tracing back the
computation
C_k(j) = O_k(j) + min_i { C_{k−1}(i) + c_ij }

C_k(j): minimum cost up to node j at stage k
O_k(j): local cost at node j
c_ij: transition cost from node i to node j
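The recurrence translates directly into code. A minimal pure-Python sketch; the tiny trellis and its costs are a toy example of mine, not the slide's A-K graph:

```python
def min_cost_path(local_cost, trans_cost):
    """Trellis dynamic programming (sketch of the recurrence):
    C_k(j) = O_k(j) + min_i { C_{k-1}(i) + c_ij }, with backpointers so
    the minimum-cost path can be traced back.

    local_cost[k][j]    : O_k(j), local cost of node j at stage k
    trans_cost[k][i][j] : c_ij, cost of the edge from node i (stage k)
                          to node j (stage k+1)"""
    C = list(local_cost[0])
    back = []
    for k in range(1, len(local_cost)):
        stage_C, stage_ptr = [], []
        for j, o in enumerate(local_cost[k]):
            best_i = min(range(len(C)),
                         key=lambda i: C[i] + trans_cost[k - 1][i][j])
            stage_ptr.append(best_i)
            stage_C.append(o + C[best_i] + trans_cost[k - 1][best_i][j])
        C, back = stage_C, back + [stage_ptr]
    j = min(range(len(C)), key=C.__getitem__)   # cheapest final node
    total, path = C[j], [j]
    for stage_ptr in reversed(back):            # trace the best path back
        j = stage_ptr[j]
        path.append(j)
    return total, path[::-1]

# Toy trellis: one start node, two middle nodes, one end node
local_cost = [[0], [2, 1], [0]]
trans_cost = [
    [[1, 3]],     # start -> middle nodes
    [[2], [5]],   # middle nodes -> end
]
total, path = min_cost_path(local_cost, trans_cost)
```

The work per stage is linear in the number of edges, so the exponential blow-up of the naive enumeration disappears.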
Applying DP to Beat Tracking
- To optimize:
- Define D*(t) as the best score up to time t and compute it for every t
- Also store Q(t), the predecessor time that returns the maximum score
- At the end of the sequence, trace back through Q(t), which returns the best
path {t_i}

D({t_i}) = Σ_{i=1}^{N} P(t_i) + β Σ_{i=2}^{N} G(t_i − t_{i−1}, τ)
D*(t) = P(t) + max_v { β·G(t − v, τ) + D*(v) }
Q(t) = argmax_v { β·G(t − v, τ) + D*(v) }

[Plot: ODF with candidate predecessor times v for a given time t]
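Putting the pieces together, a simplified beat tracker in the spirit of Ellis (2007); the search window, the β value, and the rule to extend only when the score improves are simplifications of mine, not the paper's exact formulation:

```python
import numpy as np

def beat_track(odf, period, beta=100.0):
    """Beat tracking by DP (simplified sketch after Ellis, 2007):
    D*(t) = P(t) + max_v { beta * G(t - v, period) + D*(v) },
    G(dt, tau) = -(log(dt / tau))**2, with backpointers Q(t) for traceback.

    period: target beat period in frames (from a tempo estimator)."""
    n = len(odf)
    D = np.asarray(odf, dtype=float).copy()   # D*(t), initialised to P(t)
    Q = -np.ones(n, dtype=int)                # backpointers Q(t)
    for t in range(n):
        lo, hi = max(0, t - 2 * period), t - period // 2
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        G = -np.log((t - taus) / period) ** 2    # tempo-consistency score
        scores = beta * G + D[taus]
        best = int(np.argmax(scores))
        if scores[best] > 0:                     # extend only if it improves
            D[t] = odf[t] + scores[best]
            Q[t] = taus[best]
    beats = [int(np.argmax(D))]                  # trace back from the best frame
    while Q[beats[-1]] >= 0:
        beats.append(int(Q[beats[-1]]))
    return beats[::-1]

# Toy ODF: impulses every 10 frames; with period=10 the tracker should
# recover exactly those frames as beats
odf = np.zeros(100)
odf[::10] = 1.0
beats = beat_track(odf, period=10)
```

The log-squared penalty is symmetric in ratio, so halving or doubling the beat interval is penalized equally, which is what keeps the path locked to the target tempo.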
Example of DP to Beat Tracking
References
- E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
- J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm
Analysis”, 2001
- G. Tzanetakis, “Musical Genre Classification of Audio Signals”, 2002
- A. Klapuri, “Analysis of the Meter of Acoustic Musical Signals”, 2006
- P. Grosche and M. Müller, “Computing Predominant Local Periodicity
Information in Music Recordings”, 2009
- P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo
Representation for Music Signals”, 2010
- D. Ellis, “Beat Tracking by Dynamic Programming”, 2007
- S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression for Onset
Detection”, 2013