Rhythm Transcription and Dynamic Programming
GCT634: Musical Applications of Machine Learning
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
- Overview of Automatic Music Transcription (AMT)
- Types of AMT Tasks
- Rhythmic Transcription
- Introduction
- Onset detection
- Tempo Estimation
- Dynamic Programming
- Beat Tracking
Overview of Automatic Music Transcription (AMT)
- Predicting musical score information from audio
- The primary score information is notes, which are arranged on the basis of
rhythm, harmony, and structure
- Analogous to automatic speech recognition (ASR) for speech signals
[Diagram: audio → transcription model → onsets, tempo, beat, key, chords, structure]
Types of AMT Tasks
- Rhythm transcription
- Onset detection
- Tempo estimation
- Beat tracking
- Tonal analysis
- Key estimation
- Chord recognition
- Timbre analysis
- Instrument identification
- Note transcription
- Monophonic note
- Polyphonic note
- Expression detection
(e.g. vibrato, pedal)
- Structure analysis
- Musical structure
- Musical boundary / repetition detection
- Highlight detection
We will mainly focus on these topics!
Overview of AMT Systems
- Acoustic model
- Estimate the target information given input audio (usually short segment)
- Musical knowledge
- Music theory (e.g. rhythm, harmony), performance (e.g. playability)
- Prior/Lexical model
- Statistical distribution of score-level music information (e.g. chord
progressions)
[Diagram: the acoustic model (audio-level) and the prior/lexical model with musical knowledge (score-level) feed the transcription model, which outputs beat, tempo, key, chords, and notes]
Introduction to Rhythm
- Rhythm
- A strong, regular, and repeated pattern of sound
- Distinguishes music from speech
- The most primitive and foundational element of music
- Melody, harmony and other musical elements are arranged on the basis of
rhythm
- Human and rhythm
- Humans have an innate ability for rhythm perception: heartbeat, walking
- Associated with motor control: dance, labor songs
Introduction to Rhythm
- Hierarchical structure of rhythm
- Beat (tactus): the most prominent level; the foot-tapping rate
- Division (tatum): the temporal atom, e.g. eighth or sixteenth notes
- Measure (bar): the unit of rhythmic patterns (and also of harmonic changes)
- Notations
- Tempo: beats per minute, e.g. 90 bpm
- Time signature: e.g. 4/4, 3/4, 6/8
[Wikipedia]
Human Perception of Tempo
- McKinney and Moelants (2006)
- Collected tapping data from 40 human subjects
- Initial synchronization delay and anticipation (by tempo estimation)
- Ambiguity in tempo: the beat or its division?
[D. Ellis’ e4896 slides]
Overview of Rhythm Transcription Systems
- Consists of several cascaded tasks that detect moments of
musical stress (accents) and their regularity
[Pipeline: onset detection → tempo estimation → beat tracking, informed by musical knowledge]
Onset Detection
- Identify the starting times of musical events
- Notes, drum sounds
- Types of onsets
- Hard onsets: percussive sounds
- Soft onsets: source-driven sounds (e.g. singing voice, woodwind, bowed
strings)
[M. Müller]
Example: Onset Detection
[Waveform (amplitude vs. time) of “Eat (꺼내먹어요)” by Zion.T; where are the onsets?]
Onset Detection Systems
- Onset detection function (ODF)
- Instantaneous measure of temporal change, often called “novelty” function
- Types: time-domain energy, spectral or sub-band energy, phase difference
- Decision algorithm
- Rule-based approach
- Learning-based approach
[Pipeline: audio → representations (feature extraction) → onset detection function → decision algorithm (classifier)]
Onset Detection Function (ODF)
- Types of ODFs
- Time-domain energy
- Spectral or sub-band energy
- Phase difference
Time-Domain Onset Detection
- Local energy
- The signal usually has high energy at onsets
- Effective for percussive sounds
- Various versions
- Frame-level energy
- Half-wave rectification
E(n) = Σ_m w(m)·x(n + m)²   (frame-level energy)
ODF(n) = HWR( E(n + 1) − E(n) )
HWR(s) = (s + |s|) / 2 = s if s ≥ 0, 0 if s < 0
[Plots: waveform and the two energy-based ODFs (frame-level energy and its half-wave rectified difference)]
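The energy-based ODF above can be sketched in a few lines of NumPy; the frame and hop sizes and the toy signal here are illustrative choices, not values from the lecture:

```python
import numpy as np

def energy_odf(x, frame_len=1024, hop=512):
    """Half-wave rectified difference of frame-level energy (sketch):
    E(n) = sum of squared samples in frame n,
    ODF(n) = HWR(E(n+1) - E(n)). Frame and hop sizes are illustrative."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    return np.maximum(np.diff(energy), 0.0)   # keep only energy increases

# Toy signal: silence, then a short burst -> one large ODF peak
x = np.zeros(8192)
x[4096:4096 + 512] = 1.0
odf = energy_odf(x)
```

The half-wave rectification matters: energy *decreases* (note offsets, decay) would otherwise produce spurious peaks.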
Spectral-Based Onset Detection
- Spectral Flux
- Sum of the positive differences of the log spectrogram
- The ODF changes depending on the amount of compression γ

SF(n) = Σ_k HWR( Z(n + 1, k) − Z(n, k) )
Z(n, k) = log(1 + γ·|Y(n, k)|),  Y(n, k): STFT

[Plots: log-compressed spectrogram and the resulting spectral-flux ODF]
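A minimal NumPy sketch of this spectral-flux ODF; the FFT size, hop, and compression factor are illustrative settings, not the lecture's:

```python
import numpy as np

def spectral_flux(x, n_fft=1024, hop=512, gamma=10.0):
    """Spectral-flux ODF on a log-compressed magnitude STFT (sketch):
    Z(n, k) = log(1 + gamma*|Y(n, k)|),
    SF(n) = sum_k HWR(Z(n+1, k) - Z(n, k))."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    Z = np.log1p(gamma * np.abs(np.fft.rfft(frames, axis=1)))  # compressed STFT
    return np.maximum(np.diff(Z, axis=0), 0.0).sum(axis=1)     # HWR + sum over bins

# A 440 Hz tone entering in the middle of a silent signal
sr = 8000
x = np.zeros(sr)
x[sr // 2:] = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
sf = spectral_flux(x)
```

The flux peaks at the frame where the tone enters; a larger gamma boosts the relative contribution of soft onsets.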
Phase Deviation
- The sinusoidal components of a note are continuous while the note is
sustained
- An abrupt change in phase means that there may be a new event
[D. Ellis’ e4896 slides]
- Phase continuation (e.g. during the sustain of a single note):
φ_k(n) − φ_k(n − 1) ≈ φ_k(n − 1) − φ_k(n − 2)
- Deviation from the steady state, averaged over all N frequency bins:
Δφ_k(n) = φ_k(n) − 2φ_k(n − 1) + φ_k(n − 2) ≈ 0
ζ(n) = (1/N) Σ_{k=1}^{N} |Δφ_k(n)|
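A sketch of this (unweighted) phase-deviation ODF; note that low-energy bins have essentially random phase, so practical systems often weight the deviation by bin magnitude. The test signal with a phase jump is my illustrative example:

```python
import numpy as np

def phase_deviation_odf(x, n_fft=512, hop=128):
    """Phase-deviation ODF (sketch): second difference of the STFT phase,
    wrapped to [-pi, pi) and averaged over bins. During a steady sustain the
    phase advances linearly and the deviation stays small; a new event breaks
    the linear prediction and raises the value. Low-energy bins add noise."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    phi = np.angle(np.fft.rfft(frames, axis=1))
    d2 = phi[2:] - 2 * phi[1:-1] + phi[:-2]        # second-order phase difference
    d2 = np.mod(d2 + np.pi, 2 * np.pi) - np.pi     # wrap back into [-pi, pi)
    return np.abs(d2).mean(axis=1)

# Example: a sinusoid with a sudden phase jump halfway through
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
x[sr // 2:] = np.sin(2 * np.pi * 440 * t[sr // 2:] + np.pi / 2)
pd = phase_deviation_odf(x)
```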
Post-Processing
- DC removal
- Subtract the mean of ODF
- Normalization
- Scaling level of ODF
- Low-pass filtering
- Remove small peaks
- Down-sampling
- For data reduction
[Plot: ODF with low-pass filtering shown as a solid line]
(Tzanetakis, 2010)
Onset Decision Algorithm
- Rule-based Approach: peak detection rule
- Peaks above thresholds are determined as onsets
- The thresholds are often adaptively computed from the ODF
- Averaging and median are popular choices to compute the thresholds
threshold = α + β·median(ODF),   α: offset, β: scaling factor
[Plot: ODF over time with the adaptive threshold (median, window size 5) overlaid]
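The peak-detection rule can be sketched as follows; the toy ODF and the values of α, β, and the window size are illustrative, and `scipy.signal.medfilt` computes the sliding median:

```python
import numpy as np
from scipy.signal import medfilt

def pick_onsets(odf, alpha=0.1, beta=1.0, win=5):
    """Rule-based peak picking (sketch): a frame is an onset if it is a local
    maximum of the ODF and exceeds the adaptive threshold
    alpha + beta * median(ODF) over a sliding window."""
    threshold = alpha + beta * medfilt(odf, kernel_size=win)
    onsets = []
    for n in range(1, len(odf) - 1):
        if odf[n] > odf[n - 1] and odf[n] >= odf[n + 1] and odf[n] > threshold[n]:
            onsets.append(n)
    return onsets

# Toy ODF with two clear peaks above the local median
odf = np.array([0.0, 0.1, 2.0, 0.1, 0.0, 0.1, 0.0, 3.0, 0.2, 0.0])
```

The median is preferred over the mean here because a single large peak inside the window barely moves it, keeping the threshold stable.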
Challenging Issue in Onset Detection: Vibrato
Onset detection using spectral flux
SuperFlux
- A state-of-the-art rule-based onset detection function
- S. Bock et al., “Maximum Filter Vibrato Suppression For Onset Detection”,
DAFx, 2013
- Step 1: log-spectrogram
- Makes harmonic partials have the same depth of vibrato contour
- Step 2: max-filtering
- Take the maximum in a window along the frequency axis
- The vibrato contours become thicker
Z(n, k) = log(1 + Σ_l |Y(n, l)|·G(l, k)),  Y(n, l): STFT, G: filter bank
Z_max(n, k) = max( Z(n, k − m : k + m) )
SuperFlux
[Figures: log-spectrogram and max-filtered log-spectrogram]
SuperFlux
- Step 3: SuperFlux ODF
- Take the positive difference over a distance of μ frames
- Assumption: the frame rate is high in onset detection (i.e. small hop size),
so μ is chosen according to the frame rate
- Step 4: peak-picking; a frame n is selected as an onset if
- 1) SF(n) = max( SF(n − w_pre : n + w_post) )   (local maximum)
- 2) SF(n) ≥ mean( SF(n − w'_pre : n + w'_post) ) + δ   (adaptive threshold)
- 3) n − n_previous-onset > combination width   (minimum onset distance)

SF(n) = Σ_k HWR( Z(n, k) − Z_max(n − μ, k) )
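The max-filter and distant-difference steps can be sketched with NumPy and SciPy. This is a rough sketch of the idea, not the reference implementation: the real SuperFlux operates on a log-frequency filter-bank spectrogram, and the parameters below are illustrative:

```python
import numpy as np
from scipy.ndimage import maximum_filter1d

def superflux(Y, gamma=10.0, max_size=3, mu=2):
    """SuperFlux-style ODF (rough sketch after Boeck & Widmer, 2013):
    log-compress the magnitude spectrogram, max-filter along the frequency
    axis to widen vibrato contours, then sum the positive differences taken
    mu frames apart.

    Y: magnitude spectrogram, shape (n_frames, n_bins)."""
    Z = np.log1p(gamma * Y)
    Zmax = maximum_filter1d(Z, size=max_size, axis=1)  # frequency-axis max filter
    diff = Z[mu:] - Zmax[:-mu]                         # difference over mu frames
    return np.maximum(diff, 0.0).sum(axis=1)

# A real onset still produces a peak ...
onset_spec = np.zeros((10, 8))
onset_spec[5:, 3] = 1.0
sf_onset = superflux(onset_spec)
# ... while a "vibrato" line wobbling between adjacent bins is suppressed,
# because the max filter widens the contour enough to cover the wobble
vibrato_spec = np.zeros((10, 8))
for n in range(10):
    vibrato_spec[n, 3 + n % 2] = 1.0
sf_vibrato = superflux(vibrato_spec)
```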
SuperFlux
[Figure: peak-picking on the ODF from the max-filtered log-spectrogram]
Tempo Estimation
- Estimate a regular time interval between beats
- Tempo is a global attribute of a song, described in bpm or with labels such
as “mid-tempo”
- Tempo often changes within a song
- Intentionally: e.g. dramatic effect: Top 10 tempo changes
- Unintentionally: e.g. re-mastering, live performance
- There are also local tempo changes: e.g. rubato
Tempo Estimation Methods
- Auto-Correlation
- Find the periodicity as used in pitch detection
- Discrete Fourier Transform
- Use DFT over ODF and find the periodicity
- Comb-filter Banks
- Leverage the “oscillating nature” of musical beats
Auto-Correlation
- ACF is a generic method to detect periodicity of a signal
- Thus, it can be applied to the ODF to find a dominant period that may
correspond to the tempo
- The ACF shows dominant peaks that indicate candidate tempi
[Plots: onset detection function (spectral flux) and its auto-correlation]
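The ACF-based tempo estimate can be sketched as follows; the impulse-train ODF is a toy input, and the 60-240 bpm search range is a common convention rather than a fixed rule:

```python
import numpy as np

def tempo_from_acf(odf, frame_rate, bpm_range=(60, 240)):
    """Tempo estimation by auto-correlation (sketch): correlate the ODF with
    itself and pick the lag with the largest value inside a plausible BPM
    range, then convert that lag back to beats per minute."""
    odf = odf - odf.mean()                                  # DC removal
    acf = np.correlate(odf, odf, mode='full')[len(odf) - 1:]
    min_lag = int(frame_rate * 60 / bpm_range[1])
    max_lag = int(frame_rate * 60 / bpm_range[0])
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag

# Toy ODF: impulses every 50 frames at a 100 Hz frame rate -> 120 bpm
frame_rate = 100
odf = np.zeros(1000)
odf[::50] = 1.0
tempo = tempo_from_acf(odf, frame_rate)
```

Restricting the lag range is what resolves most of the octave ambiguity: lags for 60 and 240 bpm are both ACF peaks for this signal.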
Tempo Estimation Using Tempo Prior
- Tempo is estimated by multiplying the prior with the auto-
correlation (observation)
- The auto-correlation corresponds to a likelihood function
- Tempo prior can be calculated from beat annotations of a dataset
- The distribution is fit well by a log-normal distribution
Histogram of beats from a dataset
[D. Ellis’ e4896 slides] (Klapuri, 2003)
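The prior-times-likelihood combination can be sketched like this; the mode and width of the prior below are illustrative values, not parameters fitted to any beat-annotation dataset:

```python
import numpy as np

def lognormal_tempo_prior(bpm, mode=120.0, sigma=0.4):
    """Log-normal-shaped tempo prior (sketch): weights candidate tempi so
    that values near a typical tempo are preferred. mode and sigma are
    illustrative, not fitted values."""
    return np.exp(-0.5 * (np.log(bpm / mode) / sigma) ** 2)

# Resolve the octave ambiguity of an ACF "likelihood": three candidate
# tempi with equal correlation, reweighted by the prior
bpms = np.array([60.0, 120.0, 240.0])
likelihood = np.array([0.9, 0.9, 0.9])
posterior = lognormal_tempo_prior(bpms) * likelihood
```

Even with equal ACF evidence, the prior pushes the estimate toward the perceptually most plausible octave.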
Beat Spectrum
- Leverage the repetitive nature of music
- Algorithm
- Step 1: compute the cosine similarity between pairs of frames of the
magnitude spectrogram
- Step 2: sum the elements on the diagonals
(Foote, 2001)

S(j, k) = (v_j · v_k) / (‖v_j‖ ‖v_k‖),  v_j: magnitude spectrum of frame j
B(m) = Σ_l S(l, l + m)
Beat Spectrum
- A more robust version can be obtained from the 2D auto-correlation of the
similarity matrix
- The final beat spectrum is derived by summing over one axis
- A “beat spectrogram” can also be obtained from successive beat spectra

B(l, m) = Σ_{j,k} S(j, k) · S(j + l, k + m)

(Foote, 2001) [Figure: beat spectrum showing five beats and a triplet within a beat]
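The basic (diagonal-sum) beat spectrum can be sketched in NumPy; the repeating toy spectrogram is my illustrative input:

```python
import numpy as np

def beat_spectrum(S):
    """Beat spectrum (sketch after Foote, 2001): cosine similarity between
    all pairs of spectrogram frames, then average along each diagonal.
    A peak at lag m indicates a periodicity of m frames.

    S: magnitude spectrogram, shape (n_frames, n_bins)."""
    norm = np.linalg.norm(S, axis=1, keepdims=True)
    V = S / np.maximum(norm, 1e-12)   # unit-normalise each frame
    sim = V @ V.T                     # cosine self-similarity matrix
    n = len(S)
    return np.array([np.diag(sim, k=m).mean() for m in range(n)])

# Toy spectrogram: a 4-frame pattern repeated 4 times
S = np.tile(np.eye(4), (4, 1))
B = beat_spectrum(S)
```

For this input the beat spectrum is 1 at lags 0, 4, 8, ... and 0 elsewhere, reflecting the 4-frame period.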
Tempogram
- Model the onset function with a sinusoid as the predominant local
periodicity (PLP)
- Algorithm
- Step 1: compute the ODF from the half-wave rectified spectral flux
- Step 2: find the frequency and phase that maximize the correlation with the
ODF and form a local sinusoidal kernel
- Step 3: accumulate the successive local sinusoidal kernels to form a PLP
curve
- Step 4: take the DFT or auto-correlation
(Grosche, 2009)

κ_n(m) = w(m − n)·cos(2π(ω̂m − φ̂))
Tempogram
- Cyclic Tempogram
- Accumulate the tempogram over integer multiples of a tempo (up to four
octaves)
- Conceptually similar to the “chromagram”
(Grosche, 2011)
Comb-Filter Banks
- Also called resonant filter banks
- Comb filter equation
- Builds up rhythmic evidence (by anticipation?)
(Klapuri, 2006)

z(n) = y(n) + β·z(n − τ)
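The resonance behavior is easy to see in a direct implementation of the recurrence; the pulse train, delays, and feedback gain below are illustrative:

```python
import numpy as np

def comb_filter(odf, delay, beta=0.8):
    """Resonating comb filter (sketch): z(n) = y(n) + beta * z(n - delay).
    When the ODF pulses line up with the delay, the output builds up
    ("resonates"); mismatched delays stay weak."""
    z = np.zeros(len(odf))
    for n in range(len(odf)):
        z[n] = odf[n] + (beta * z[n - delay] if n >= delay else 0.0)
    return z

# Pulse train with period 10: a delay of 10 resonates, a delay of 7 does not
odf = np.zeros(100)
odf[::10] = 1.0
matched = comb_filter(odf, delay=10)
mismatched = comb_filter(odf, delay=7)
```

A bank of such filters, one per candidate tempo, turns tempo estimation into picking the delay with the strongest resonance.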
Sub-band Resonant Filter Banks
- Algorithm
- A sub-band filter bank as front-end processing
- Parallel ODFs for 6 bands
- 150 resonators for each band, covering all possible tempo values (60-240
bpm)
- Pick the delay that provides the highest peak as the tempo
(Scheirer, 1998)
Beat Tracking
- Estimate the position of beats in music
- Usually a subset of detected onsets selected by the tempo
Beat Tracking by the Resonator Model
- Once the resonator model chooses the tempo that returns the highest peaks,
its output produces a sequence of resonated peaks
- These peaks correspond to the beats
(Scheirer, 1998)
Beat Tracking by Dynamic Programming
- Find the optimal “hopping” path on music (Ellis, 2007)
- D({t_i}): score of the beat sequence {t_i}
- P(t_i): onset strength function (i.e. the ODF)
- G(Δt, τ): tempo (τ) consistency score, e.g. G(Δt, τ) = −(log(Δt / τ))²

D({t_i}) = Σ_{i=1}^{N} P(t_i) + β Σ_{i=2}^{N} G(t_i − t_{i−1}, τ)
Finding the Minimum-Cost-Path
- Naïve approach
- Find all paths from A to K and calculate the cost for each, and choose the
path that has the minimum cost.
- As the number of nodes increases, the number of possible paths increases
exponentially
[Figure: weighted graph with nodes A through K and edge costs]
Dynamic Programming (DP)
- Observation
- Suppose the minimum-cost path passes through a node p
- What is the minimum-cost path from A to p?
- It is just a sub-path of the minimum-cost path from A to K
- Thus, we don’t have to compute the cost from scratch; we can reuse the
costs computed for the previous nodes
Dynamic Programming (DP)
- The minimum cost is computed by the following equation:
- The minimum-cost-path can be found by tracing back the
computation
C_k(j) = O_k(j) + min_i { C_{k−1}(i) + c_ij }

C_k(j): minimum cost up to node j at stage k
O_k(j): local cost at node j
c_ij: transition cost from node i to node j
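The recurrence translates directly into code. A minimal pure-Python sketch; the tiny trellis and its costs are a toy example of mine, not the slide's A-K graph:

```python
def min_cost_path(local_cost, trans_cost):
    """Trellis dynamic programming (sketch of the recurrence):
    C_k(j) = O_k(j) + min_i { C_{k-1}(i) + c_ij }, with backpointers so
    the minimum-cost path can be traced back.

    local_cost[k][j]    : O_k(j), local cost of node j at stage k
    trans_cost[k][i][j] : c_ij, cost of the edge from node i (stage k)
                          to node j (stage k+1)"""
    C = list(local_cost[0])
    back = []
    for k in range(1, len(local_cost)):
        stage_C, stage_ptr = [], []
        for j, o in enumerate(local_cost[k]):
            best_i = min(range(len(C)),
                         key=lambda i: C[i] + trans_cost[k - 1][i][j])
            stage_ptr.append(best_i)
            stage_C.append(o + C[best_i] + trans_cost[k - 1][best_i][j])
        C, back = stage_C, back + [stage_ptr]
    j = min(range(len(C)), key=C.__getitem__)   # cheapest final node
    total, path = C[j], [j]
    for stage_ptr in reversed(back):            # trace the best path back
        j = stage_ptr[j]
        path.append(j)
    return total, path[::-1]

# Toy trellis: one start node, two middle nodes, one end node
local_cost = [[0], [2, 1], [0]]
trans_cost = [
    [[1, 3]],     # start -> middle nodes
    [[2], [5]],   # middle nodes -> end
]
total, path = min_cost_path(local_cost, trans_cost)
```

The work per stage is linear in the number of edges, so the exponential blow-up of the naive enumeration disappears.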
Applying DP to Beat Tracking
- To optimize:
- Define D*(t) as the best score up to time t and compute it for every t
- Also store Q(t), the predecessor time that returns the maximum score
- At the end of the sequence, trace back through Q(t), which returns the best
path {t_i}

D({t_i}) = Σ_{i=1}^{N} P(t_i) + β Σ_{i=2}^{N} G(t_i − t_{i−1}, τ)
D*(t) = P(t) + max_v { β·G(t − v, τ) + D*(v) }
Q(t) = argmax_v { β·G(t − v, τ) + D*(v) }

[Plot: ODF with candidate predecessor times v for a given time t]
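Putting the pieces together, a simplified beat tracker in the spirit of Ellis (2007); the search window, the β value, and the rule to extend only when the score improves are simplifications of mine, not the paper's exact formulation:

```python
import numpy as np

def beat_track(odf, period, beta=100.0):
    """Beat tracking by DP (simplified sketch after Ellis, 2007):
    D*(t) = P(t) + max_v { beta * G(t - v, period) + D*(v) },
    G(dt, tau) = -(log(dt / tau))**2, with backpointers Q(t) for traceback.

    period: target beat period in frames (from a tempo estimator)."""
    n = len(odf)
    D = np.asarray(odf, dtype=float).copy()   # D*(t), initialised to P(t)
    Q = -np.ones(n, dtype=int)                # backpointers Q(t)
    for t in range(n):
        lo, hi = max(0, t - 2 * period), t - period // 2
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        G = -np.log((t - taus) / period) ** 2    # tempo-consistency score
        scores = beta * G + D[taus]
        best = int(np.argmax(scores))
        if scores[best] > 0:                     # extend only if it improves
            D[t] = odf[t] + scores[best]
            Q[t] = taus[best]
    beats = [int(np.argmax(D))]                  # trace back from the best frame
    while Q[beats[-1]] >= 0:
        beats.append(int(Q[beats[-1]]))
    return beats[::-1]

# Toy ODF: impulses every 10 frames; with period=10 the tracker should
# recover exactly those frames as beats
odf = np.zeros(100)
odf[::10] = 1.0
beats = beat_track(odf, period=10)
```

The log-squared penalty is symmetric in ratio, so halving or doubling the beat interval is penalized equally, which is what keeps the path locked to the target tempo.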
Example of DP to Beat Tracking
References
- E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
- J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm
Analysis”, 2001
- G. Tzanetakis, “Musical Genre Classification of Audio Signals”, 2002
- A. Klapuri, “Analysis of the Meter of Acoustic Musical Signals”, 2006
- P. Grosche and M. Müller, “Computing Predominant Local Periodicity
Information in Music Recordings”, 2009
- P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo
Representation for Music Signals”, 2010
- D. Ellis, “Beat Tracking by Dynamic Programming”, 2007
- S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression for Onset
Detection”, 2013