GCT634: Musical Applications of Machine Learning
Polyphonic Music Transcription and Non-negative Matrix Factorization
Juhan Nam, Graduate School of Culture Technology, KAIST
Outlines
- Introduction
- Score-Audio Alignment
- Multi-Pitch Estimation
- Non-negative Matrix Factorization (NMF)
Polyphonic Music Transcription
- Converting an acoustic musical signal into some form of music
notation
- MIDI piano roll, staff notation
- Note information: pitch, onset, offset, loudness
Model Input Output
Related Tasks
- Multi-pitch estimation
- Single source: piano, guitar
- Multiple source: quartet (woodwind, string)
- Predominant F0 estimation
- Melody extraction, singing melody
- Drum transcription
- Kick, snare, high-hat
- Let’s listen to a piece and try to transcribe (hum) the melody
Two Directions
- Performance transcription
- Detecting exact timing and dynamics of notes (micro-timing with 10ms
resolution or so)
- Frame-level: onset, offset, intensity
- Piano-roll notation is usually used (performance score)
- Score transcription
- Transform performance into staff notation
- Note-level: tempo, beat, downbeat
- Rhythmic transcription (tempo, beat, downbeat) → temporal quantization
- Expression detection (pedal, articulation), often phrase-level
- Instrument identification
- Very challenging
Score and Performance
MIDI (score) Valentina Lisitsa Vladimir Horowitz
Where Are The Differences?
- Tempo
- Note-level, (note onset/offset timings), phrase-level, song-level
- Dynamics
- Note-level, (note velocity), phrase-level, song-level
- Different interpretation of musical expressions in score
- Temporal: ritardando, rubato
- Dynamics: piano, forte, crescendo, …
- Play techniques or articulation: legato, staccato
- Mood and emotion: dolce, grazioso
Score-to-Audio Alignment
- Temporal alignment between score and audio from a piece of
music
- Audio-to-audio and MIDI-to-MIDI (either one is performance) are possible
- Why do we synchronize them?
- Automatic page turning
- Performance analysis
- Score following
- Auto-accompaniment
[Müller]
Algorithm Overview
- Choose feature representations to compare
- Often, MIDI is converted to audio for alignment in the same feature space
- Compute a similarity matrix between two features sequences
- All possible combinations of local feature pairs
- Find a path that makes the best alignment on the similarity
matrix
- Dynamic Time Warping (DTW)
Dynamic Programming
[Diagram: feature sequence #1 and feature sequence #2 are compared in a similarity matrix; first compute the local similarity, then find the best path]
Feature Representations
- Audio feature representations
- Frequent choice for piano music is chroma
- CENS: Chroma Energy Normalized Statistics (Müller, 2005)
[Figure: CENS features of the MIDI rendering and the Lisitsa recording]
- Similarity between every pair of frame-level features
- Euclidean or cosine distance
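The pairwise comparison above can be sketched in a few lines of NumPy. This is a minimal illustration (not the course's actual code), assuming the chroma features are stored as 12 × frames arrays:

```python
import numpy as np

def cosine_similarity_matrix(F1, F2, eps=1e-8):
    """Pairwise cosine similarity between two chroma sequences.

    F1: (12, N) chroma features of sequence 1 (e.g. rendered from MIDI)
    F2: (12, M) chroma features of sequence 2 (e.g. from the recording)
    Returns an (N, M) matrix S with S[n, m] = cos(F1[:, n], F2[:, m]).
    """
    # Normalize each frame to unit length, then a dot product is the cosine
    F1 = F1 / (np.linalg.norm(F1, axis=0, keepdims=True) + eps)
    F2 = F2 / (np.linalg.norm(F2, axis=0, keepdims=True) + eps)
    return F1.T @ F2

# Toy usage with random "chroma" frames
S = cosine_similarity_matrix(np.random.rand(12, 5), np.random.rand(12, 7))
```

Since chroma features are non-negative, cosine similarity lies in [0, 1], which makes the similarity matrix easy to visualize.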
Similarity Matrix
Finding the Optimal Path
- There are so many possible paths from one corner to another
[Figure: similarity matrix and its 3D surface plot for Schumann's Träumerei, Lisitsa recording vs. MIDI]
- Finding the optimal path is analogous to figuring out a trail route
that you can take with minimum effort when hiking
Dynamic Time Warping
- Finding an (N, M)-warped path of length L
- P = (p1, p2, p3, …, pL) where pi = (ni, mi)
- Three conditions
- Boundary condition: p1 = (1, 1), pL = (N, M)
- Monotonicity condition: n1 ≤ n2 ≤ … ≤ nL and m1 ≤ m2 ≤ … ≤ mL
- Step-size condition: move only upward, rightward, or diagonally (upper-right)
[Müller]
Dynamic Time Warping : Bad Examples
[Müller]
Dynamic Programming for DTW
- Algorithm
- Initialization:
D(n,1) = sum(C(1:n,1)), n = 1…N
D(1,m) = sum(C(1,1:m)), m = 1…M
- Recurrence Relation:
For each n = 2…N, for each m = 2…M:
D(n,m) = C(n,m) + min( D(n−1,m), D(n,m−1), D(n−1,m−1) )
- Termination:
D(N,M) is the total alignment distance
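The initialization, recurrence, and back-tracking above can be sketched directly in NumPy. This is a minimal illustration of the dynamic program, not an optimized implementation:

```python
import numpy as np

def dtw_cost(C):
    """Accumulated-cost matrix D over a local cost matrix C (N x M),
    using D(n,m) = C(n,m) + min(D(n-1,m), D(n,m-1), D(n-1,m-1))."""
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])   # boundary: D(n,1) = sum(C(1:n,1))
    D[0, :] = np.cumsum(C[0, :])   # boundary: D(1,m) = sum(C(1,1:m))
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n-1, m], D[n, m-1], D[n-1, m-1])
    return D

def dtw_path(D):
    """Back-track the optimal warping path from (N-1, M-1) to (0, 0)."""
    n, m = D.shape[0] - 1, D.shape[1] - 1
    path = [(n, m)]
    while n > 0 or m > 0:
        if n == 0:
            m -= 1
        elif m == 0:
            n -= 1
        else:
            # move to the cheapest of the three allowed predecessors
            n, m = min([(n-1, m-1), (n-1, m), (n, m-1)], key=lambda s: D[s])
        path.append((n, m))
    return path[::-1]
```

For a cost matrix that is zero on the diagonal, the optimal path follows the diagonal, as expected.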
Dynamic Programming for DTW
- Toy Example
[Müller] Similarity Matrix (C) Accumulated cost (D)
Score and Audio Alignment by DTW
[Figure: similarity matrix C(i,j) and accumulated cost D(i,j) for the score-audio pair]
Limitations
- The optimal path is obtained only after we arrive at the destination (by
back-tracking)
- In other words, DTW works offline
- What if the sequences are very long?
- Online version of DTW?
- Every frame is equally important
- In general, humans are more sensitive to note onsets
- Perceptually, every frame is not equally important
Online DTW
- Set a moving search window and
calculate the cost only within the window
- Time and space cost: from quadratic to linear
- The movement is determined by the
position that gives a minimum cost within the current window. If the position is ...
- Corner: move both up and right (alternately)
- Upper edge: move up
- Right edge: move right
[Figure: the on-line time warping algorithm with search window c = 4, showing the order of evaluation for a particular sequence of row and column increments; all calculated cells are framed in bold, and the optimal path is coloured grey]
[Dixon, 2005]
Automatic Page Turner (JKU, Austria)
Onset-sensitive Alignment
- We are sensitive to the time alignment of note onsets
- The plain similarity matrix gives no additional weight to onsets
- DLNCO features: Decaying Locally-adapted Normalized Chroma Onset
- Capture only onset strength on chroma features
- Normalize onset energy and note length (by an artificially-created note tail)
[Ewert, 2009]
Demo: PerformScore
- https://jdasam.github.io/PerformScore/
Multi-pitch Estimation
- Two types of polyphonic settings
- Polyphonic instruments: piano, guitar
- Ensemble of monophonic instruments: woodwind quintet, string quartet,
chorale
- Three levels of subtasks
- First-level: frame-wise estimation of pitches and polyphony (number of
notes)
- Second-level: tracking pitch within a note based on temporal continuity
- Third-level: tracking notes for each sound source, usually for ensembles of
monophonic instruments
Challenges
- Many sources are mixed and played simultaneously
- They are likely to be harmonically related in music
- Some sources can be masked by others
- Content changes continuously by musical expressions (e.g. vibrato)
- Compromises
- Transcribe as many source sounds as possible
- Only dominant sources: melody, bass, drum
Frame-wise Multi-pitch Estimation
- Three categories of approaches
- Iterative F0 search: repeatedly finds the predominant F0 and removes its
related sources
- Joint source estimation: examines possible combinations of multiple
sources, e.g., NMF
- Classification-based approach: no prior knowledge of musical acoustics;
relies only on supervised learning
Iterative F0 estimation
- Based on repeated cancellation of harmonic overtones of
detected F0s (Klapuri, 2003)
- Procedure
1. Initialize the residual to the original signal
2. Detect the predominant F0 (based on the harmonic sieve method)
3. Apply spectral smoothing to the harmonics of the detected F0
4. Cancel the smoothed harmonics from the residual
5. Repeat steps 2–4 until the residual is sufficiently flat
F0 detection and sound cancellation from the mixture:
Y_R(k) ← max( Y_R(k) − d·Y_D(k), 0 )
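The cancellation step above is a one-liner in NumPy. A minimal sketch, where `d` is an attenuation factor as in the formula (its value here is an illustrative assumption):

```python
import numpy as np

def cancel_harmonics(Y_residual, Y_detected, d=0.8):
    """One cancellation step of the iterative F0 estimator:
    Y_R(k) <- max(Y_R(k) - d * Y_D(k), 0)

    Y_residual: magnitude spectrum of the residual mixture
    Y_detected: smoothed harmonic spectrum of the detected F0
    d: attenuation factor (illustrative value, not from the slides)
    """
    # Subtract the detected source's smoothed harmonics, clipping at zero
    return np.maximum(Y_residual - d * Y_detected, 0.0)
```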
Iterative F0 estimation
Spectral Smoothness
[Figure from ECE 477 - Computer Audition, Zhiyao Duan, 2014: spectral smoothness in iterative estimation]
Iterative F0 estimation
- Advantages
- Deterministic: only by signal processing and no data-driven training
- Can handle inharmonicity (e.g. piano) and vibrato
- Limitations
- F0 estimation becomes unreliable as the iterations proceed
- Spectral smoothing is not accurate enough
Joint Source Estimation
- Based on a model for sound mixture
- All sources compete with each other to explain the mixture; we find the
subset that is most likely
- The number of sources is limited
- Non-negative matrix factorization (NMF) has been most widely explored
Joint Source Estimation
- How many spectral templates can explain the source?
Joint Source Estimation
- We can explain the spectrogram with three spectral basis vectors (𝑋)
and corresponding activations (𝐼)
- Can we decompose 𝑊 into 𝑋 and 𝐼 automatically?
𝑊 ≈ 𝑋𝐼
Non-negative Matrix Factorization (NMF)
- A matrix factorization in which all elements are constrained to be
non-negative
- 𝑊 (𝑁 × 𝑂 matrix): original data (e.g. spectrogram)
- 𝑋 (𝑁 × 𝐿 matrix): 𝐿 basis vectors (e.g. dictionary)
- 𝐼 (𝐿 × 𝑂 matrix): activation matrix (e.g. weights or gains)
- Note that this provides a compressed representation
- A low-rank approximation
𝑊 (𝑁 × 𝑂) ≈ 𝑋 (𝑁 × 𝐿) · 𝐼 (𝐿 × 𝑂)
Algorithm for NMF
- 𝑊 is known; 𝑋 and 𝐼 are unknown. How?
- Alternate the estimation (similar to the EM algorithm)
- Start with a random 𝑋
- Estimate an 𝐼 given 𝑋
- Estimate a new 𝑋 given 𝐼
- Repeat until convergence
- If the distance is Euclidean, solve the following:
- Estimate 𝐼 given 𝑋: 𝐼 = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑊 (least squares!)
- Make 𝐼 non-negative: 𝐼 = max(𝐼, 0)
- Estimate 𝑋 given 𝐼: 𝑋 = 𝑊𝐼ᵀ(𝐼𝐼ᵀ)⁻¹ (least squares!)
- Make 𝑋 non-negative: 𝑋 = max(𝑋, 0)
- Repeat until convergence
- The problems
- Requires pseudoinverses at every iteration: expensive, with stability issues
- Gaussian assumption on the approximation error
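The projected alternating-least-squares scheme above can be sketched as follows, with `np.linalg.lstsq` standing in for the pseudoinverse solves. A minimal illustration under the slides' notation (W ≈ XI), not a robust implementation:

```python
import numpy as np

def nmf_als(W, L, n_iter=20, seed=0):
    """Projected alternating least squares for W ~= X @ I:
    solve each least-squares subproblem, then clip negatives to zero.

    W: (N, O) non-negative data matrix (e.g. magnitude spectrogram)
    L: number of basis vectors
    """
    rng = np.random.default_rng(seed)
    X = rng.random((W.shape[0], L))
    for _ in range(n_iter):
        # I = argmin ||X I - W||  (normal equations via lstsq), then clip
        I = np.maximum(np.linalg.lstsq(X, W, rcond=None)[0], 0.0)
        # X = argmin ||X I - W||  solved as I^T X^T ~= W^T, then clip
        X = np.maximum(np.linalg.lstsq(I.T, W.T, rcond=None)[0].T, 0.0)
    return X, I
```

On exactly low-rank non-negative data this converges quickly; the clipping step is what makes it a heuristic rather than an exact solver.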
Algorithm for NMF
min_{𝑋,𝐼 ≥ 0} Σₙ,ₘ (𝑊ₙₘ − (𝑋𝐼)ₙₘ)²
𝑊̂ = 𝑋𝐼
Algorithm for NMF
- Instead, we use a special distance
- A variant of the Kullback-Leibler divergence (KL divergence)
- “Multiplicative” (magic) update rules
- Estimate 𝑋: X_{nl} ← X_{nl} · ( Σₘ (𝑊/(𝑋𝐼))_{nm} I_{lm} ) / ( Σₘ I_{lm} )
- Estimate 𝐼: I_{lm} ← I_{lm} · ( Σₙ X_{nl} (𝑊/(𝑋𝐼))_{nm} ) / ( Σₙ X_{nl} )
- Repeat until convergence
- This is much faster and needs no matrix inversion!
min_{𝑋,𝐼 ≥ 0} Σₙ,ₘ ( 𝑊ₙₘ log( 𝑊ₙₘ / (𝑋𝐼)ₙₘ ) − 𝑊ₙₘ + (𝑋𝐼)ₙₘ )
(Lee and Seung, 2000)
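The multiplicative updates above translate almost line-for-line into NumPy. A minimal sketch in the slides' notation (W ≈ XI); the small `eps` guards against division by zero and is an implementation assumption:

```python
import numpy as np

def nmf_kl(W, L, n_iter=50, seed=0, eps=1e-9):
    """NMF with the Lee-Seung multiplicative updates for the
    (generalized) KL divergence, approximating W ~= X @ I."""
    rng = np.random.default_rng(seed)
    N, O = W.shape
    X = rng.random((N, L)) + eps
    I = rng.random((L, O)) + eps
    for _ in range(n_iter):
        # update activations: I <- I * (X^T (W/(XI))) / (column sums of X)
        R = W / (X @ I + eps)
        I *= (X.T @ R) / (X.sum(axis=0)[:, None] + eps)
        # update basis: X <- X * ((W/(XI)) I^T) / (row sums of I)
        R = W / (X @ I + eps)
        X *= (R @ I.T) / (I.sum(axis=1)[None, :] + eps)
    return X, I
```

Note that every operation is an element-wise multiply or a matrix product: non-negativity is preserved automatically, and no matrix inversion is needed.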
Property of NMF
- The learned basis vectors (𝑋) capture “parts”
- An example is explained by a combination of the parts (e.g. additive
synthesis)
- The basis vectors are more structured and interpretable
Interpretation of NMF on spectrogram
- Columns of the spectrogram are a weighted sum of basis
vectors
Interpretation of NMF on spectrogram
- The whole spectrogram is approximated as a sum of matrix
“layers”, each of which is explained by one spectral component.
Source Separation by NMF
- We can separate each source
Resynthesized results:
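Given an NMF decomposition, each source layer can be extracted with a soft "Wiener-style" mask. A minimal sketch under the assumption that S is a magnitude spectrogram and S ≈ X @ I:

```python
import numpy as np

def separate_layers(S, X, I, eps=1e-9):
    """Split a magnitude spectrogram S into per-component layers.

    Layer k is S weighted by the mask outer(X[:,k], I[k,:]) / (X I),
    so the layers sum back (approximately) to S.
    """
    V = X @ I + eps  # full model reconstruction
    return [(np.outer(X[:, k], I[k, :]) / V) * S for k in range(X.shape[1])]
```

Each layer can then be resynthesized, e.g. by pairing it with the mixture phase and inverting the STFT.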
Supervised Learning
- Perform NMF separately for isolated training data of each
source in a mixture
- Pre-learn individual models for each source, e.g., W1, W2, and W3
- Combine them into a single model W = [W1 W2 W3] that explains a mixture
- Given a mixture V, perform the NMF (Fix W and update H only)
- Then, the activation H indicates the strength of F0s
- Usually needs sparsity and temporal continuity on H (Virtanen, 2007)
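The "fix W, update H" step can be sketched with the same KL multiplicative rule, simply skipping the basis update. A minimal illustration, assuming V is the mixture spectrogram and W the pre-learned, concatenated basis:

```python
import numpy as np

def activations_fixed_basis(V, W, n_iter=20, eps=1e-9, seed=0):
    """Supervised NMF step: keep the pre-learned basis W fixed and
    update only the activations H with the KL multiplicative rule."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # H <- H * (W^T (V/(WH))) / (column sums of W)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return H
```

With W = [W1 W2 W3], the rows of H corresponding to each block indicate the activation strength of that source's templates (e.g. F0s) over time.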
Supervised Learning
[Diagram: NMF on source 1 learns W1; NMF on source 2 learns W2; NMF on the mixture with W = [W1 W2] fixed yields H = [H1 H2]; the training-time H1 and H2 are thrown away]
Semi-supervised Learning
- Problem in supervised learning
- It is difficult to have training data of all individual sources.
- Unknown sources are mixed in the majority of real-world scenarios
- Semi-supervised Learning
- Learn spectral basis (i.e. dictionaries) for available sources, say, W1
- In testing phase, add new spectral basis W2 which explains the remaining
sources in the mixture
- Fix the trained W1 and update W2 only in the NMF iteration
Semi-supervised Learning
[Diagram: NMF on the known source learns W1; on the mixture, NMF runs with W = [W1 W2], where W2 is initialized with random numbers, yielding H = [H1 H2]; the training-time H1 is thrown away]
Unsupervised Learning
- We have no information about the individual sources
- Update both W and H for the mixture sound
- Need additional constraints
- Spectral harmonicity and smoothness on W (Vincent, 2010)
- Very difficult!
Unsupervised Learning
[Figure, Vincent 2010: supervised NMF (fixed W) vs. unsupervised NMF (adapted W) vs. unsupervised NMF with harmonicity & smoothness constraints on W]
Issues
- Number of basis vectors (K)
- Too small
- Reconstruction errors will increase
- The model underfits
- Too large
- Does not learn parts (the distribution of spectral basis vectors becomes sparse)
- The model becomes too general, so it can also explain other sources well
- Sparsity is often added to the activation in order to learn “parts”, for example:
min_{W,H ≥ 0} D(V ∥ WH) + λ‖H‖₁
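One common way to realize the L1 penalty in the KL multiplicative scheme is to add the sparsity weight to the denominator of the activation update. A hedged sketch (the value of `lam` is an assumption, and the exact regularized update varies across papers):

```python
import numpy as np

def sparse_kl_update_H(V, W, H, lam=0.1, eps=1e-9):
    """One multiplicative activation update for KL-NMF with an L1
    penalty lam * ||H||_1: the penalty simply enlarges the
    denominator, shrinking the activations toward zero."""
    return H * (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + lam + eps)
```

Larger `lam` yields sparser activations (each entry shrinks more per update), at the cost of a worse reconstruction.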
Joint Source Estimation
- Advantages
- Compositional model: applicable to any mixture
- Models can be extended well with additional constraints: e.g. source-filter
model, inharmonicity
- Limitations
- Can be computationally expensive: long inference time due to iterative updates
- Modeled pitches are usually discrete
Classification-Based Transcription
- Train a binary classifier for each note
- Each classifier is trained with two groups of audio features: one including
the note and the other not including it
- 88 classifiers for polyphonic piano transcription
[Diagram: audio features feed per-note classifiers (C4, C#4, …), each outputting on/off; in feature space, frames including C4 are separated from frames not including C4]
Classification-Based Transcription
- Often trained with real music data (not single notes)
- There are abundant MIDI files for classical piano music; it is easy to render
audio from them, e.g. using software synthesizers or player pianos
[Figure: MIDI piano roll and the corresponding audio spectrogram]
Classification-Based Transcription
- Audio features
- Auditory filter bank
- Spectrogram or Log-spectrogram
- Classifiers
- Support vector machines
- Neural Network
- Multi-label classification problem
- Approach #1: a separate binary classifier for each note; select
balanced training sets for each note
- Approach #2: cross-entropy between the binary label vector and the predicted
output (this is more commonly used)
[Diagram: neural network with input, hidden layers, and an 88-dimensional output]
Viterbi Decoding
- Temporal smoothing of predicted outputs
- A separate HMM for each note: binary state (note on/off)
- 88 initial state distributions (2×1) and transition probability matrices (2×2)
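The per-note smoothing can be sketched as Viterbi decoding of a 2-state HMM over the classifier's frame-wise probabilities. A minimal illustration; the initial and transition probabilities here are illustrative numbers, not values from the slides:

```python
import numpy as np

def viterbi_binary(p_on, p_init=(0.9, 0.1), trans=((0.99, 0.01), (0.02, 0.98))):
    """Viterbi decoding of a 2-state (off=0 / on=1) HMM, where p_on
    holds the classifier's frame-wise note-on probabilities."""
    T = len(p_on)
    p = np.asarray(p_on, dtype=float)
    emit = np.stack([1.0 - p, p])                 # (2, T) emission probs
    logA = np.log(np.asarray(trans))
    delta = np.log(np.asarray(p_init)) + np.log(emit[:, 0] + 1e-12)
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA            # scores[i, j]: from i to j
        back[:, t] = np.argmax(scores, axis=0)
        delta = scores.max(axis=0) + np.log(emit[:, t] + 1e-12)
    states = np.zeros(T, dtype=int)
    states[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                 # back-track the best path
        states[t - 1] = back[states[t], t]
    return states
```

Running 88 such decoders independently, one per note, turns noisy frame-wise predictions into temporally consistent on/off segments.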
[Figure, Nam 2011: input spectrogram, hidden-layer activation, SVM output, and HMM-smoothed note on/off output]
References
- G. Widmer, “In Search of the Horowitz Factor”, 2003
- S. Dixon, “Live Tracking of Musical Performance Using On-line Time Warping”, 2005
- S. Ewert, “High Resolution Audio Synchronization Using Chroma Onset Features”, 2009
- A. Klapuri, “Multiple Fundamental Frequency Estimation Based on Harmonicity and Spectral Smoothness”, 2003
- T. Virtanen, “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria”, 2007
- E. Vincent, “Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation”, 2010
- G. Poliner, “A Discriminative Model for Polyphonic Piano Transcription”, 2007
- D. Lee and H. Seung, “Algorithms for Non-negative Matrix Factorization”, 2000
- J. Nam, “A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations”, 2011