Rhythm Transcription / Dynamic Programming (Graduate School of Culture Technology, KAIST) - PowerPoint PPT Presentation

SLIDE 1

GCT634: Musical Applications of Machine Learning

Rhythm Transcription Dynamic Programming

Graduate School of Culture Technology, KAIST Juhan Nam

SLIDE 2

Outline

  • Overview of Automatic Music Transcription (AMT)
    • Types of AMT Tasks
  • Rhythmic Transcription
    • Introduction
    • Onset detection
    • Tempo Estimation
    • Dynamic Programming
    • Beat Tracking
SLIDE 3

Overview of Automatic Music Transcription (AMT)

  • Predicting musical score information from audio
  • The primary score information is the notes, but they are arranged based on rhythm, harmony, and structure
  • Equivalent to automatic speech recognition (ASR) for speech signals

[Diagram: audio → transcription model → onsets, tempo, beat, key, chord, structure]

SLIDE 4

Types of AMT Tasks

  • Rhythm transcription
    • Onset detection
    • Tempo estimation
    • Beat tracking
  • Tonal analysis
    • Key estimation
    • Chord recognition
  • Timbre analysis
    • Instrument identification
  • Note transcription
    • Monophonic notes
    • Polyphonic notes
  • Expression detection (e.g. vibrato, pedal)
  • Structure analysis
    • Musical structure
    • Musical boundary / repetition detection
    • Highlight detection
SLIDE 5

Types of AMT Tasks

(Same list as the previous slide.)

We will mainly focus on these topics!

SLIDE 6

Overview of AMT Systems

  • Acoustic model
    • Estimates the target information given input audio (usually a short segment)
  • Musical knowledge
    • Music theory (e.g. rhythm, harmony) and performance practice (e.g. playability)
  • Prior/lexical model
    • Statistical distribution of score-level music information (e.g. chord progression)

[Diagram: audio-level acoustic model and score-level prior/lexical model, both informed by musical knowledge, feed the transcription model, which outputs beat, tempo, key, chords, and notes]

SLIDE 7

Introduction to Rhythm

  • Rhythm
    • A strong, regular, and repeated pattern of sound
    • Distinguishes music from speech
    • The most primitive and foundational element of music
    • Melody, harmony, and other musical elements are arranged on the basis of rhythm
  • Humans and rhythm
    • Humans have an innate sense of rhythm: heartbeat, walking
    • Associated with motor control: dance, labor songs
SLIDE 8

Introduction to Rhythm

  • Hierarchical structure of rhythm
    • Beat (tactus): the most prominent level; the foot-tapping rate
    • Division (tatum): the temporal atom, e.g. eighth or sixteenth notes
    • Measure (bar): the unit of the rhythmic pattern (and also of harmonic changes)
  • Notations
    • Tempo: beats per minute, e.g. 90 bpm
    • Time signature: e.g. 4/4, 3/4, 6/8

[Wikipedia]

SLIDE 9

Human Perception of Tempo

  • McKinney and Moelants (2006)
    • Collected tapping data from 40 human subjects
    • Initial synchronization delay and anticipation (by tempo estimation)
    • Ambiguity in tempo: the beat or its division?

[D. Ellis’ e4896 slides]

SLIDE 10

Overview of Rhythm Transcription Systems

  • Consists of several cascaded tasks that detect moments of musical stress (accents) and their regularity

[Diagram: onset detection → tempo estimation → beat tracking, informed by musical knowledge]

SLIDE 11

Onset Detection

  • Identify the starting times of musical events
    • Notes, drum sounds
  • Types of onsets
    • Hard onsets: percussive sounds
    • Soft onsets: source-driven sounds (e.g. singing voice, woodwinds, bowed strings)

[M.Muller]

SLIDE 12

Example: Onset Detection

[Waveform of “Eat (꺼내먹어요)” by Zion.T, amplitude vs. time: where are the onsets?]

SLIDE 13

Onset Detection Systems

  • Onset detection function (ODF)
    • An instantaneous measure of temporal change, often called a “novelty” function
    • Types: time-domain energy, spectral or sub-band energy, phase difference
  • Decision algorithm
    • Rule-based approach
    • Learning-based approach

[Diagram: audio representations → onset detection function (feature extraction) → decision algorithm (classifier)]

SLIDE 14

Onset Detection Function (ODF)

  • Types of ODFs
  • Time-domain energy
  • Spectral or sub-band energy
  • Phase difference
SLIDE 15

Time-Domain Onset Detection

  • Local energy
    • Usually have high energy at onsets
    • Effective for percussive sounds
  • Various versions
    • Frame-level energy
    • Half-wave rectification

E(n) = Σ_{m=−M}^{M} |x(n + m)|² w(m)   (x: waveform, w: window)

ODF(n) = H(E(n + 1) − E(n))

H(r) = (r + |r|) / 2 = r if r ≥ 0, 0 if r < 0   (half-wave rectification)

[Plots: waveform and the two energy-based ODFs over time]
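The two steps above (frame-level energy, then a half-wave rectified difference) can be sketched in a few lines of NumPy; the function and parameter names are illustrative, not from the slides:

```python
import numpy as np

def energy_odf(x, frame_len=1024, hop=512):
    """Frame energy E(n), then the half-wave rectified difference H(E(n+1) - E(n))."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[n * hop : n * hop + frame_len] ** 2)
                       for n in range(n_frames)])
    diff = np.diff(energy)            # E(n+1) - E(n)
    return np.maximum(diff, 0.0)      # half-wave rectification

# A short burst in silence produces a single positive peak in the ODF.
x = np.zeros(8000)
x[4000:4032] = 1.0
odf = energy_odf(x)
```

The rectification keeps only energy increases, so decaying note tails do not trigger spurious detections.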

SLIDE 16

Spectral-Based Onset Detection

  • Spectral flux
    • Sum of the positive differences of the log spectrogram
    • The ODF changes depending on the amount of compression γ

[Plots: log spectrogram and the resulting spectral-flux ODF over time]

ODF(n) = Σ_k H(Y(n + 1, k) − Y(n, k))

Y(n, k) = log(1 + γ·|X(n, k)|)   (X(n, k): STFT)
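A minimal spectral-flux sketch under the same definitions; the plain Hann window and the `gamma` compression value are assumptions for illustration:

```python
import numpy as np

def spectral_flux(x, n_fft=512, hop=256, gamma=10.0):
    """Sum of positive bin-wise differences of the log-compressed STFT magnitude."""
    n_frames = 1 + (len(x) - n_fft) // hop
    win = np.hanning(n_fft)
    frames = [win * x[n * hop : n * hop + n_fft] for n in range(n_frames)]
    Y = np.log1p(gamma * np.abs(np.fft.rfft(frames, axis=-1)))   # Y(n, k)
    diff = np.diff(Y, axis=0)                                    # Y(n+1, k) - Y(n, k)
    return np.maximum(diff, 0.0).sum(axis=1)                     # rectify, sum over bins

# A 440 Hz tone switched on at 0.5 s yields a flux peak near the onset frame.
sr = 8000
t = np.arange(sr) / sr
x = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
flux = spectral_flux(x)
```

During the steady part of the tone, successive magnitude spectra are nearly identical, so the flux stays near zero there.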

SLIDE 17

Phase Deviation

  • The sinusoidal components of a note are continuous while the note is sustained
  • An abrupt change in phase means that there may be a new event

Phase continuation (e.g. during the sustain of a single note):

φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)

Deviation from the steady state, averaged over all frequency bins:

Δφ_k(n) = φ_k(n) − 2φ_k(n − 1) + φ_k(n − 2) ≈ 0

ζ(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|

[D. Ellis’ e4896 slides]
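The measure ζ(n) can be sketched directly on a frames × bins matrix of STFT phases; wrapping the second difference to the principal value is an implementation detail the slide omits, and the toy inputs are our own:

```python
import numpy as np

def phase_deviation(phase):
    """zeta(n): mean |second-order phase difference| per frame; phase is (frames, bins)."""
    d2 = phase[2:] - 2 * phase[1:-1] + phase[:-2]   # second difference over frames
    d2 = np.angle(np.exp(1j * d2))                  # wrap to the principal value
    return np.abs(d2).mean(axis=1)

# Linearly advancing phases (steady sinusoids) give zeta near 0; random phases do not.
n_frames, n_bins = 50, 64
steady = np.outer(np.arange(n_frames), np.linspace(0, np.pi, n_bins))
noisy = np.random.default_rng(0).uniform(-np.pi, np.pi, (n_frames, n_bins))
zeta_steady = phase_deviation(steady)
zeta_noisy = phase_deviation(noisy)
```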

SLIDE 18

Post-Processing

  • DC removal
    • Subtract the mean of the ODF
  • Normalization
    • Scale the level of the ODF
  • Low-pass filtering
    • Remove small peaks
  • Down-sampling
    • For data reduction

[Plot: ODF with its low-pass filtered version shown as a solid line] (Tzanetakis, 2010)

SLIDE 19

Onset Decision Algorithm

  • Rule-based Approach: peak detection rule
  • Peaks above thresholds are determined as onsets
  • The thresholds are often adaptively computed from the ODF
  • Averaging and median are popular choices to compute the thresholds

threshold = α + β · median(ODF)   (α: offset, β: scaling)

[Plot: ODF with an adaptive threshold computed by a median filter of window size 5]
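A rule-based decision sketch using the adaptive median threshold above; the window size and the α, β constants are illustrative:

```python
import numpy as np

def pick_onsets(odf, alpha=0.1, beta=1.0, w=5):
    """Report local maxima of the ODF that exceed alpha + beta * local median."""
    half = w // 2
    onsets = []
    for n in range(1, len(odf) - 1):
        window = odf[max(0, n - half) : n + half + 1]
        threshold = alpha + beta * np.median(window)
        if odf[n] > odf[n - 1] and odf[n] >= odf[n + 1] and odf[n] > threshold:
            onsets.append(n)
    return onsets

odf = np.array([0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.05, 0.0, 0.0])
onsets = pick_onsets(odf)   # the small 0.05 bump at n = 10 falls below the threshold
```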

SLIDE 20

Challenging Issue in Onset Detection: Vibrato

Onset detection using spectral flux

SLIDE 21

SuperFlux

  • A state-of-the-art rule-based onset detection function
    • S. Böck et al., “Maximum Filter Vibrato Suppression For Onset Detection”, DAFx, 2013
  • Step 1: log spectrogram
    • Makes harmonic partials have the same depth of vibrato contour
  • Step 2: max filtering
    • Take the maximum within a window along the frequency axis
    • The vibrato contours become thicker

Y(n, k) = log(1 + |X(n, k)| · F(k))   (X(n, k): STFT, F: filterbank weighting)

Y_max(n, k) = max( Y(n, k − m : k + m) )   (maximum over neighboring frequency bins)
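A sketch of the two SuperFlux ingredients (log compression and a maximum filter along the frequency axis), combined with the lagged positive difference introduced on a later slide; the toy input imitates vibrato by letting the active bin wobble between two adjacent bins, and `gamma`, `max_width`, `nu` are illustrative values:

```python
import numpy as np

def superflux(S, gamma=10.0, max_width=3, nu=1):
    """Log-compress S (frames x bins), max-filter along frequency, lagged positive diff."""
    Y = np.log1p(gamma * S)
    pad = max_width // 2
    Yp = np.pad(Y, ((0, 0), (pad, pad)), mode="edge")
    Ymax = np.stack([Yp[:, i : i + Y.shape[1]] for i in range(max_width)]).max(axis=0)
    diff = Y[nu:] - Ymax[:-nu]           # current frame vs. max-filtered earlier frame
    return np.maximum(diff, 0.0).sum(axis=1)

# "Vibrato": energy alternating between bins 10 and 11 from frame to frame.
S = np.zeros((8, 20))
S[::2, 10] = 1.0
S[1::2, 11] = 1.0
sf_max = superflux(S, max_width=3)     # the max filter absorbs the wobble
sf_plain = superflux(S, max_width=1)   # plain spectral flux fires on every wobble
```

With `max_width=1` the maximum filter is a no-op and the function degenerates to ordinary lagged spectral flux, which is exactly the vibrato-sensitive behavior SuperFlux is designed to avoid.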

SLIDE 22

SuperFlux

(Same content as the previous slide.)

SLIDE 23

SuperFlux

[Figures: log spectrogram and max-filtered log spectrogram]

SLIDE 24

SuperFlux

  • Step 3: SuperFlux
    • Take the difference over a frame distance ν
    • Assumption: the frame rate is high in onset detection (i.e. a small hop size)
  • Step 4: peak-picking with three conditions:
    • 1) SF(n) = max( SF(n − pre_max : n + post_max) )
    • 2) SF(n) ≥ mean( SF(n − pre_avg : n + post_avg) ) + δ
    • 3) n − n_previous_onset > combination width

SF(n) = Σ_k H(Y(n + ν, k) − Y_max(n, k))

(ν ≥ 1: frame distance for the difference, chosen according to the frame rate; the current frame is compared against the max-filtered spectrogram ν frames earlier)
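The three peak-picking conditions can be sketched as follows; the window sizes, δ, and the combination width are illustrative values, not the paper's defaults:

```python
import numpy as np

def superflux_peaks(sf, pre_max=1, post_max=1, pre_avg=5, post_avg=5,
                    delta=0.1, combination_width=3):
    """Onsets: local maxima above the local mean + delta, spaced at least
    combination_width frames after the previously reported onset."""
    onsets, last = [], -combination_width - 1
    for n in range(len(sf)):
        w_max = sf[max(0, n - pre_max) : n + post_max + 1]
        w_avg = sf[max(0, n - pre_avg) : n + post_avg + 1]
        if (sf[n] == w_max.max()
                and sf[n] >= w_avg.mean() + delta
                and n - last > combination_width):
            onsets.append(n)
            last = n
    return onsets

sf = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
peaks = superflux_peaks(sf)
```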

SLIDE 25

SuperFlux

[Figures: max-filtered log spectrogram and the peak-picking result]

SLIDE 26

Tempo Estimation

  • Estimate the regular time interval between beats
  • Tempo is a global attribute of a song: e.g. a bpm value, or a label such as “mid-tempo song”
  • Tempo often changes within a song
    • Intentionally, e.g. for dramatic effect: “Top 10 tempo changes”
    • Unintentionally, e.g. re-mastering, live performance
  • There are also local tempo changes, e.g. rubato
SLIDE 27

Tempo Estimation Methods

  • Auto-correlation
    • Find the periodicity, as used in pitch detection
  • Discrete Fourier transform
    • Apply the DFT to the ODF and find the periodicity
  • Comb-filter banks
    • Leverage the “oscillating nature” of musical beats
SLIDE 28

Auto-Correlation

  • The ACF is a generic method to detect the periodicity of a signal
  • Thus, it can be applied to the ODF to find a dominant period that may correspond to the tempo
  • The ACF shows dominant peaks that indicate the dominant tempi

[Plots: onset detection function (spectral flux) and its auto-correlation]
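A sketch of ACF-based tempo estimation on an ODF; `fps` is the ODF frame rate, and the bpm search range is an assumption:

```python
import numpy as np

def tempo_from_odf(odf, fps, bpm_range=(60, 240)):
    """Return the tempo whose lag maximizes the ODF auto-correlation."""
    odf = odf - odf.mean()                         # remove DC so lag 0 doesn't dominate
    acf = np.correlate(odf, odf, mode="full")[len(odf) - 1 :]
    lo = int(round(fps * 60.0 / bpm_range[1]))     # lag of the fastest tempo
    hi = int(round(fps * 60.0 / bpm_range[0]))     # lag of the slowest tempo
    lag = lo + int(np.argmax(acf[lo : hi + 1]))
    return 60.0 * fps / lag

# Impulses every 50 frames at 100 fps correspond to 120 bpm.
odf = np.zeros(1000)
odf[::50] = 1.0
tempo = tempo_from_odf(odf, fps=100)
```

Restricting the search to a bpm range is a crude stand-in for the tempo prior discussed on the next slide: without it, an octave-related lag (here 240 or 60 bpm) could win.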

SLIDE 29

Tempo Estimation Using Tempo Prior

  • Tempo is estimated by multiplying the prior with the auto-correlation (the observation)
    • The auto-correlation corresponds to a likelihood function
  • The tempo prior can be calculated from the beat annotations of a dataset
    • The distribution fits a log-normal distribution well

Histogram of beats from a dataset

[D. Ellis’ e4896 slides] (Klapuri, 2003)

SLIDE 30

Beat Spectrum

  • Leverages the repetitive nature of music
  • Algorithm
    • Step 1: compute the cosine similarity between pairs of magnitude-spectrogram frames
    • Step 2: sum the elements along the diagonals

T(j, k) = (v_j · v_k) / (‖v_j‖ ‖v_k‖)   (v_j: magnitude spectrum of frame j)

C(m) = Σ_l T(l, l + m)

(Foote, 2001)
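Foote's two steps in NumPy; the rows of `V` stand in for magnitude-spectrum frames, and the periodic toy input (period 3) is our own:

```python
import numpy as np

def beat_spectrum(V):
    """Cosine self-similarity T(j, k), then sums along the diagonals C(m)."""
    U = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    T = U @ U.T                                        # T(j, k)
    return np.array([np.trace(T, offset=m) for m in range(len(V))])

# Frames repeating with period 3 produce beat-spectrum peaks at lags 3, 6, 9, ...
V = np.eye(3)[np.arange(12) % 3]
C = beat_spectrum(V)
```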
SLIDE 31

Beat Spectrum

  • A more robust version can be obtained from the 2D auto-correlation of the similarity matrix
  • The final beat spectrum is derived by summing over one axis
  • The plot shows five beats and a triplet within a beat
  • A “beat spectrogram” can also be obtained from successive beat spectra

C(l, m) = Σ_{j,k} T(j, k) · T(j + l, k + m)

(Foote, 2001) [Plot: beat spectrum showing five beats and a triplet within a beat]

SLIDE 32

Tempogram

  • Algorithm
    • Step 1: compute the ODF from the half-wave rectified spectral flux
    • Step 2: obtain the frequency and phase that give the maximum correlation with the ODF, and form a local sinusoidal kernel
    • Step 3: accumulate the successive local sinusoidal kernels to form a PLP curve
    • Step 4: take the DFT or the auto-correlation
  • This models the onset function with a sinusoid as the predominant local periodicity (PLP)

κ_n(m) = w(m − n) cos(2π(ω̂·m − φ̂))

(Grosche, 2009)

SLIDE 33

Tempogram

  • Cyclic tempogram
    • Accumulates the tempogram over integer multiples of a tempo (up to four octaves)
    • Conceptually similar to the “chromagram”

(Grosche, 2011)

SLIDE 34

Comb-Filter Banks

  • Also called resonant filter banks
  • Comb filter equation:

z(n) = y(n) + β · z(n − τ)

  • Builds up rhythmic evidence (by anticipation?)

(Klapuri, 2006)
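The comb-filter recursion is a one-liner per sample: a delay matched to the inter-onset interval accumulates (“resonates”), while a mismatched one does not. Delays here are in ODF frames, and the β value is illustrative:

```python
import numpy as np

def comb_filter(y, delay, beta=0.8):
    """z(n) = y(n) + beta * z(n - delay)."""
    z = np.zeros(len(y))
    for n in range(len(y)):
        z[n] = y[n] + (beta * z[n - delay] if n >= delay else 0.0)
    return z

odf = np.zeros(200)
odf[::40] = 1.0
matched = comb_filter(odf, delay=40)       # builds up: 1, 1.8, 2.44, ...
mismatched = comb_filter(odf, delay=37)    # peaks never reinforce each other
```

A bank of such filters, one per candidate delay, turns tempo estimation into picking the filter with the largest output peak.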

SLIDE 35

Sub-band Resonant Filter Banks

  • Algorithm
    • A sub-band filter bank as front-end processing
    • Parallel ODFs for 6 bands
    • 150 resonators per band, covering all candidate tempo values (60–240 bpm)
    • Pick the delay that produces the highest peak as the tempo

(Scheirer, 1998)

SLIDE 36

Beat Tracking

  • Estimate the position of beats in music
  • Usually a subset of detected onsets selected by the tempo
SLIDE 37

Beat Tracking by the Resonator Model

  • Once the resonator model chooses the tempo that returns the highest peaks, its output is a sequence of resonated peaks
  • These peaks correspond to the beats

(Scheirer, 1998)

SLIDE 38
Beat Tracking by Dynamic Programming

  • Find the optimal “hopping” path over the music (Ellis, 2007)
    • D({tᵢ}): score of the beat sequence {tᵢ}
    • P(tᵢ): onset strength function (i.e. the ODF)
    • G(Δt, T): tempo (T) consistency score, e.g. G(Δt, T) = −(log(Δt / T))²

D({tᵢ}) = Σ_{i=1}^{N} P(tᵢ) + β Σ_{i=2}^{N} G(tᵢ − tᵢ₋₁, T)

SLIDE 39

Finding the Minimum-Cost-Path

  • Naïve approach
  • Find all paths from A to K and calculate the cost for each, and choose the

path that has the minimum cost.

  • As the number of nodes increases, the number of possible paths increases

exponentially

A C B D E F G H 2 4 3 3 6 2 4 2 2 3 2 5 4 1 2 3 3 1 5 3 I J K 7 4 5 6 3 3 5 7 4 3 2 3 2

SLIDE 40

Dynamic Programming (DP)

  • Observation
  • Say the minimum-cost-path passes by a node p,
  • What is the minimum-cost-path from A to p ?
  • It is just a sub-path of the minimum-cost-path from A to K.
  • Thus, we don’t have to compute the cost from scratch; we can use the cost

computed from the previous nodes.

A C B D E F G H 2 4 3 3 6 2 4 2 2 3 2 5 4 1 2 3 3 1 5 3 I J K 7 4 5 6 3 3 5 7 4 3 2 3 2

SLIDE 41

Dynamic Programming (DP)

  • The minimum cost is computed by the following recurrence:
  • The minimum-cost path can then be found by tracing back the computation

C_k(j) = O_k(j) + min_i { C_{k−1}(i) + c_ij }

(C_k(j): cost up to node j, O_k(j): local cost at node j, c_ij: transition cost from i to j)

[Graph: nodes A–K with edge costs]
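The recurrence and traceback in plain Python; the small trellis and its costs are made up for illustration:

```python
def min_cost_path(local_cost, trans_cost):
    """C_k(j) = O_k(j) + min_i (C_{k-1}(i) + c_ij); traceback recovers the path."""
    n_stages, n_nodes = len(local_cost), len(local_cost[0])
    C = [row[:] for row in local_cost]
    back = [[0] * n_nodes for _ in range(n_stages)]
    for k in range(1, n_stages):
        for j in range(n_nodes):
            i = min(range(n_nodes), key=lambda i: C[k - 1][i] + trans_cost[i][j])
            back[k][j] = i
            C[k][j] = local_cost[k][j] + C[k - 1][i] + trans_cost[i][j]
    best_final = min(range(n_nodes), key=lambda j: C[-1][j])   # cheapest final node
    cost = C[-1][best_final]
    path = [best_final]
    for k in range(n_stages - 1, 0, -1):       # follow back-pointers to the start
        path.append(back[k][path[-1]])
    return path[::-1], cost

local = [[0, 0], [1, 10], [0, 0]]   # O_k(j), 3 stages x 2 nodes
trans = [[1, 5], [5, 1]]            # c_ij
path, cost = min_cost_path(local, trans)
```

Each stage reuses the previous stage's costs, so the work is O(stages × nodes²) instead of exponential in the number of stages.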

SLIDE 42

Applying DP to Beat Tracking

  • To optimize:
    • Define D*(t) as the best score up to time t, and compute it for every t
    • Also store Q(t), the previous time that returns the maximum score
    • At the end of the sequence, trace back through Q(t), which returns the best path {tᵢ}

D({tᵢ}) = Σ_{i=1}^{N} P(tᵢ) + β Σ_{i=2}^{N} G(tᵢ − tᵢ₋₁, T)

D*(t) = P(t) + max_τ { β·G(t − τ, T) + D*(τ) }

Q(t) = argmax_τ { β·G(t − τ, T) + D*(τ) }

[Plot: ODF over time, with D*(t) accumulated across candidate beat times t and predecessors τ]
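A sketch of this Ellis-style recursion; the search window τ ∈ [t − 2T, t − T/2], the β value, and the greedy "only link when it improves the score" rule are our assumptions, not prescribed by the slides:

```python
import numpy as np

def track_beats(odf, period, beta=100.0):
    """D*(t) = P(t) + max_tau (beta * G(t - tau, T) + D*(tau)); traceback via Q."""
    n = len(odf)
    D = odf.astype(float)                  # best score ending with a beat at t
    Q = np.full(n, -1, dtype=int)          # back-pointer to the previous beat
    for t in range(1, n):
        lo, hi = max(0, t - 2 * period), max(0, t - period // 2)
        if lo >= hi:
            continue
        taus = np.arange(lo, hi)
        G = -np.log((t - taus) / period) ** 2      # tempo consistency score
        score = beta * G + D[taus]
        best = int(np.argmax(score))
        if score[best] > 0:                        # only link when it helps
            D[t] += score[best]
            Q[t] = taus[best]
    t = int(np.argmax(D))                          # best-scoring final beat
    beats = [t]
    while Q[t] >= 0:
        t = Q[t]
        beats.append(t)
    return beats[::-1]

odf = np.zeros(300)
odf[::50] = 1.0
beats = track_beats(odf, period=50)
```

G is zero when the spacing t − τ matches the tempo period exactly and increasingly negative as it deviates, so the traceback locks onto the evenly spaced ODF peaks.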

SLIDE 43

Example of DP to Beat Tracking

SLIDE 44

References

  • E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
  • J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm Analysis”, 2001
  • G. Tzanetakis and P. Cook, “Musical Genre Classification of Audio Signals”, 2002
  • A. Klapuri et al., “Analysis of the Meter of Acoustic Musical Signals”, 2006
  • P. Grosche and M. Müller, “Computing Predominant Local Periodicity Information in Music Recordings”, 2009
  • P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo Representation for Music Signals”, 2010
  • D. Ellis, “Beat Tracking by Dynamic Programming”, 2007
  • S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression for Onset Detection”, 2013