  1. GCT634: Musical Applications of Machine Learning. Rhythm Transcription and Dynamic Programming. Graduate School of Culture Technology, KAIST. Juhan Nam

  2. Outline • Overview of Automatic Music Transcription (AMT) - Types of AMT tasks • Rhythm Transcription - Introduction - Onset detection - Tempo estimation • Dynamic Programming - Beat tracking

  3. Overview of Automatic Music Transcription (AMT) • Predicting musical score information from audio - The primary score information is the notes, but they are arranged on the basis of rhythm, harmony and structure - Equivalent to automatic speech recognition (ASR) for speech signals [Diagram: model predicting beats, onsets, tempo, chords, key and structure from audio]

  4. Types of AMT Tasks
  • Rhythm transcription - Onset detection - Tempo estimation - Beat tracking
  • Note transcription - Monophonic note - Polyphonic note - Expression detection (e.g. vibrato, pedal)
  • Tonal analysis - Key estimation - Chord recognition
  • Structure analysis - Musical structure - Musical boundary / repetition detection - Highlight detection
  • Timbre analysis - Instrument identification

  5. Types of AMT Tasks (same list as the previous slide) We will mainly focus on these topics!

  6. Overview of AMT Systems • Acoustic model - Estimates the target information given the input audio (usually a short segment) • Musical knowledge - Music theory (e.g. rhythm, harmony), performance (e.g. playability) • Prior/lexical model - Statistical distribution of the score-level music information (e.g. chord progressions) [Diagram: audio-level acoustic model and transcription model, informed by musical knowledge and a score-level prior/lexical model, producing beats, tempo, key, chords and notes]

  7. Introduction to Rhythm • Rhythm - A strong, regular, and repeated pattern of sound - Distinguishes music from speech • The most primitive and foundational element of music - Melody, harmony and other musical elements are arranged on the basis of rhythm • Humans and rhythm - Humans have an innate ability for rhythm perception: heartbeat, walking - Associated with motor control: dance, labor songs

  8. Introduction to Rhythm • Hierarchical structure of rhythm - Beat (tactus): the most prominent level, the foot-tapping rate - Division (tatum): the temporal atom, e.g. eighth or sixteenth notes - Measure (bar): the unit of rhythmic patterns (and also of harmonic changes) • Notations - Tempo: beats per minute, e.g. 90 bpm - Time signature: e.g. 4/4, 3/4, 6/8 [Wikipedia]
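For reference, converting a tempo in bpm into a beat period and a number of analysis frames per beat is simple arithmetic; the frame rate below is an illustrative assumption, not a value from the slides.

```python
# Beat period and frames per beat for the 90 bpm example above.
bpm = 90
fps = 100                              # assumed ODF frame rate (frames per second)
beat_period = 60.0 / bpm               # 60 s / 90 beats ~= 0.667 s between beats
frames_per_beat = fps * beat_period    # ~66.7 ODF frames per beat at 100 fps
```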

  9. Human Perception of Tempo • McKinney and Moelants (2006) - Collected tapping data from 40 human subjects - Initial synchronization delay and anticipation (by tempo estimation) - Ambiguity in tempo: the beat or its division? [D. Ellis' e4896 slides]

  10. Overview of Rhythm Transcription Systems • Consists of several cascaded tasks that detect moments of musical stress (accents) and their regularity [Diagram: onset detection, tempo estimation and beat tracking stages, informed by musical knowledge]

  11. Onset Detection • Identify the starting times of musical events - Notes, drum sounds [M. Müller] • Types of onsets - Hard onsets: percussive sounds - Soft onsets: source-driven sounds (e.g. singing voice, woodwinds, bowed strings)

  12. Example: Onset Detection [Plot: waveform of "Eat (꺼내먹어요)" by Zion.T, amplitude vs. time in seconds]

  13. Onset Detection Systems [Diagram: audio representations (feature extraction) → onset detection function → decision algorithm (classifier)] • Onset detection function (ODF) - An instantaneous measure of temporal change, often called a "novelty" function - Types: time-domain energy, spectral or sub-band energy, phase difference • Decision algorithm - Rule-based approach - Learning-based approach

  14. Onset Detection Function (ODF) • Types of ODFs - Time-domain energy - Spectral or sub-band energy - Phase difference

  15. Time-Domain Onset Detection • Local energy - Usually high energy at onsets - Effective for percussive sounds • Various versions - Frame-level energy: $\mathrm{ODF}(n) = E(n) = \sum_{m=-N/2}^{N/2} \big| x(n+m)\, w(m) \big|^2$ - Half-wave rectification: $\mathrm{ODF}(n) = H\big(E(n+1) - E(n)\big)$, where $H(s) = \frac{s + |s|}{2} = \begin{cases} s, & s \ge 0 \\ 0, & s < 0 \end{cases}$ [Plots: waveform and the two ODFs vs. time in seconds]
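A minimal NumPy sketch of the frame-level energy ODF with half-wave rectification described on this slide; the frame and hop sizes are illustrative assumptions, not values prescribed by the slides.

```python
import numpy as np

def energy_odf(x, frame_size=1024, hop_size=512):
    """Frame-level energy ODF with a half-wave rectified first difference.

    x: mono audio signal (1-D numpy array). Frame/hop sizes are
    illustrative choices.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop_size
    # Frame-level (windowed) energy E(n)
    energy = np.array([
        np.sum((x[n * hop_size:n * hop_size + frame_size] * window) ** 2)
        for n in range(n_frames)
    ])
    # Half-wave rectification H(s) = (s + |s|) / 2 keeps only energy increases
    diff = np.diff(energy)
    return np.maximum(diff, 0.0)
```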

  16. Spectral-Based Onset Detection • Spectral flux - Sum of the positive differences of the log-compressed spectrogram - The ODF changes depending on the amount of compression $\gamma$: $Y(n,k) = \log\big(1 + \gamma\, |X(n,k)| \big)$, where $X(n,k)$ is the STFT - $\mathrm{ODF}(n) = \sum_{k=1}^{K/2} H\big( Y(n+1,k) - Y(n,k) \big)$ [Plots: log spectrogram (kHz vs. time) and the spectral-flux ODF vs. time in seconds]
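A corresponding sketch of the spectral-flux ODF on a log-compressed magnitude spectrogram, using SciPy's STFT; gamma and the STFT parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def spectral_flux_odf(x, sr, n_fft=2048, hop=512, gamma=100.0):
    """Spectral-flux ODF: sum of positive differences of Y = log(1 + gamma*|X|)."""
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    Y = np.log1p(gamma * np.abs(X))               # log-compressed spectrogram
    diff = np.diff(Y, axis=1)                     # frame-to-frame difference per bin
    return np.sum(np.maximum(diff, 0.0), axis=0)  # sum positive differences over bins
```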

  17. Phase Deviation • The sinusoidal components of a note are continuous while the note is sustained - An abrupt change in phase means that there may be a new event [D. Ellis' e4896 slides] • Phase continuation (e.g. during the sustain of a single note): $\phi_k(n) - \phi_k(n-1) \approx \phi_k(n-1) - \phi_k(n-2)$, i.e. $\Delta\phi_k(n) = \phi_k(n) - 2\phi_k(n-1) + \phi_k(n-2) \approx 0$ • Deviation from the steady state, averaged over all frequency bins: $\zeta_p(n) = \frac{1}{N} \sum_{k=1}^{N} \big| \Delta\phi_k(n) \big|$
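A sketch of a mean phase-deviation ODF along the lines of the formula above; the STFT parameters are illustrative, and the phase is unwrapped along time before taking the second difference.

```python
import numpy as np
from scipy.signal import stft

def phase_deviation_odf(x, sr, n_fft=2048, hop=512):
    """Mean phase deviation: average |second difference of phase| over bins."""
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    phi = np.unwrap(np.angle(X), axis=1)          # unwrapped phase per frequency bin
    # Delta-phi(n) = phi(n) - 2*phi(n-1) + phi(n-2)
    dphi = phi[:, 2:] - 2.0 * phi[:, 1:-1] + phi[:, :-2]
    return np.mean(np.abs(dphi), axis=0)          # average over frequency bins
```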

  18. Post-Processing • DC removal - Subtract the mean of the ODF • Normalization - Scale the level of the ODF • Low-pass filtering - Remove small peaks • Down-sampling - For data reduction [Plot: ODF before and after low-pass filtering (solid line)] (Tzanetakis, 2010)
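The four post-processing steps can be chained as in this sketch; the cutoff frequency and down-sampling factor are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def postprocess_odf(odf, fps, cutoff_hz=10.0, downsample=2):
    """DC removal, normalization, low-pass filtering and down-sampling
    of an ODF sampled at `fps` frames per second."""
    odf = odf - np.mean(odf)                      # DC removal
    odf = odf / (np.max(np.abs(odf)) + 1e-12)     # normalization
    b, a = butter(2, cutoff_hz / (fps / 2.0))     # low-pass to remove small peaks
    odf = filtfilt(b, a, odf)
    return decimate(odf, downsample)              # down-sample for data reduction
```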

  19. Onset Decision Algorithm • Rule-based approach: peak detection rules - Peaks above a threshold are determined to be onsets - The threshold is often computed adaptively from the ODF - Averaging and the median are popular choices for computing the threshold: $\text{threshold} = \alpha + \beta \cdot \mathrm{median}(\mathrm{ODF})$, with $\alpha$: offset, $\beta$: scaling, and the median taken over a local window [Plot: ODF and adaptive threshold (median with window size 5) vs. time in seconds]
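A sketch of the rule-based peak picker with the adaptive median threshold above; alpha, beta and the window size are illustrative values. The returned indices are ODF frame indices; multiply by the hop size over the sampling rate to get times in seconds.

```python
import numpy as np
from scipy.ndimage import median_filter

def pick_onsets(odf, alpha=0.1, beta=1.5, window=5):
    """Peak picking with threshold(n) = alpha + beta * median(ODF around n)."""
    threshold = alpha + beta * median_filter(odf, size=window)
    peaks = []
    for n in range(1, len(odf) - 1):
        # local maximum above the adaptive threshold -> onset candidate
        if odf[n] > odf[n - 1] and odf[n] >= odf[n + 1] and odf[n] > threshold[n]:
            peaks.append(n)
    return np.array(peaks)
```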

  20. Challenging Issue in Onset Detection: Vibrato [Figure: onset detection using spectral flux]

  21. SuperFlux • A state-of-the-art rule-based onset detection function - S. Böck et al., "Maximum Filter Vibrato Suppression for Onset Detection", DAFx, 2013 • Step 1: log-spectrogram - Makes the harmonic partials have the same depth of vibrato contour: $Y(n,m) = \log\big( 1 + |X(n,k)| \cdot F(k,m) \big)$, where $X(n,k)$ is the STFT and $F(k,m)$ is the filterbank • Step 2: max-filtering - Take the maximum within a window along the frequency axis, so the vibrato contours become thicker: $Y_{\max}(n,m) = \max\big( Y(n, m-l : m+l) \big)$
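A rough sketch of Steps 1 and 2, using librosa's mel filterbank as a stand-in for the filterbank F and a maximum filter along the frequency axis; the band count and filter size are assumptions, not the paper's exact settings.

```python
import numpy as np
import librosa
from scipy.ndimage import maximum_filter1d

def max_filtered_log_spectrogram(x, sr, n_fft=2048, hop=512, n_bands=138, max_size=3):
    """SuperFlux-style front end (sketch): filterbank log-spectrogram, then a
    maximum filter over frequency so vibrato contours become thicker."""
    S = np.abs(librosa.stft(x, n_fft=n_fft, hop_length=hop))     # |X(n, k)|
    F = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_bands)  # filterbank F(k, m)
    Y = np.log1p(F @ S)                                          # Y(n, m), log-compressed
    Y_max = maximum_filter1d(Y, size=max_size, axis=0)           # max over frequency bands
    return Y, Y_max
```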


  23. SuperFlux [Figure: log-spectrogram and max-filtered log-spectrogram]

  24. SuperFlux • Step 3: SuperFlux function - Take the positive difference over a frame distance $\mu$, where $\mu \ge 1$ is chosen according to the frame rate - Assumption: the frame rate in onset detection is high (i.e. a small hop size $h$): $SF^*(n) = \sum_{m} H\big( Y_{\max}(n+\mu, m) - Y_{\max}(n, m) \big)$ • Step 4: peak-picking - Frame $n$ is selected as an onset if 1) $SF^*(n) = \max\big( SF^*(n - pre_{\max} : n + post_{\max}) \big)$, 2) $SF^*(n) \ge \mathrm{mean}\big( SF^*(n - pre_{avg} : n + post_{avg}) \big) + \varepsilon$, and 3) $n - n_{\text{previous onset}} > \text{combination width}$
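A sketch of Steps 3 and 4 operating on the max-filtered spectrogram from the previous sketch; mu, the window sizes, epsilon and the combination width are illustrative values given in frames, not the paper's settings.

```python
import numpy as np

def superflux_odf(Y_max, mu=1):
    """Step 3: half-wave rectified difference between frames `mu` frames apart."""
    diff = Y_max[:, mu:] - Y_max[:, :-mu]
    return np.sum(np.maximum(diff, 0.0), axis=0)

def superflux_peaks(sf, pre_max=3, post_max=3, pre_avg=10, post_avg=7,
                    eps=0.1, combination_width=5):
    """Step 4: peak picking with the three conditions listed on the slide."""
    onsets, last = [], -np.inf
    for n in range(len(sf)):
        lo_max, hi_max = max(0, n - pre_max), min(len(sf), n + post_max + 1)
        lo_avg, hi_avg = max(0, n - pre_avg), min(len(sf), n + post_avg + 1)
        is_local_max = sf[n] == np.max(sf[lo_max:hi_max])        # condition 1
        above_mean = sf[n] >= np.mean(sf[lo_avg:hi_avg]) + eps   # condition 2
        far_enough = (n - last) > combination_width              # condition 3
        if is_local_max and above_mean and far_enough:
            onsets.append(n)
            last = n
    return np.array(onsets)
```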

  25. SuperFlux [Figure: max-filtered log-spectrogram and the peak-picking result]

  26. Tempo Estimation • Estimate the regular time interval between beats - Tempo is a global attribute of a song, e.g. its bpm, or a label such as "mid-tempo song" • Tempo often changes within a song - Intentionally: e.g. for dramatic effect (Top 10 tempo changes) - Unintentionally: e.g. re-mastering, live performance • There are also local tempo changes: e.g. rubato

  27. Tempo Estimation Methods • Auto-correlation - Find the periodicity of the ODF, as in pitch detection • Discrete Fourier Transform - Apply the DFT to the ODF and find the periodicity (see the sketch after this slide) • Comb-filter banks - Leverage the "oscillating nature" of musical beats
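As a companion to the DFT item above, a minimal sketch that reads the tempo off the magnitude spectrum of the ODF; the BPM search range is an assumption.

```python
import numpy as np

def dft_tempo(odf, fps, min_bpm=40, max_bpm=240):
    """DFT-based tempo estimate: strongest ODF spectrum peak inside a BPM range."""
    spectrum = np.abs(np.fft.rfft(odf - np.mean(odf)))
    freqs = np.fft.rfftfreq(len(odf), d=1.0 / fps)   # in Hz (beats per second)
    bpm = freqs * 60.0
    valid = (bpm >= min_bpm) & (bpm <= max_bpm)
    return bpm[valid][np.argmax(spectrum[valid])]
```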

  28. Auto-Correlation • The ACF is a generic method for detecting the periodicity of a signal - It can thus be applied to the ODF to find a dominant period that may correspond to the tempo - The ACF shows dominant peaks that indicate the dominant tempi [Plots: onset detection function (spectral flux) and its auto-correlation vs. time in seconds]
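A minimal sketch of the ACF-based tempo estimate described on this slide: compute the autocorrelation of the ODF, restrict the lags to a plausible beat-period range (an assumption), and convert the dominant lag to bpm.

```python
import numpy as np

def acf_tempo(odf, fps, min_bpm=40, max_bpm=240):
    """Tempo from the dominant autocorrelation lag of the ODF."""
    odf = odf - np.mean(odf)
    acf = np.correlate(odf, odf, mode='full')[len(odf) - 1:]   # non-negative lags
    min_lag = int(fps * 60.0 / max_bpm)                        # shortest beat period
    max_lag = int(fps * 60.0 / min_bpm)                        # longest beat period
    lag = min_lag + np.argmax(acf[min_lag:max_lag])
    return 60.0 * fps / lag                                    # lag (frames) -> bpm
```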
