SLIDE 1 Music Processing Meinard Müller
Lecture
Music Synchronization
International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de
SLIDE 2 Book: Fundamentals of Music Processing
Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de
SLIDE 5 Chapter 3: Music Synchronization
3.1 Audio Features
3.2 Dynamic Time Warping
3.3 Applications
3.4 Further Notes
As a first music processing task, we study in Chapter 3 the problem of music synchronization. The objective is to temporally align compatible representations of the same piece of music. Considering this scenario, we explain the need for musically informed audio features. In particular, we introduce the concept of chroma-based music features, which capture properties that are related to harmony and melody. Furthermore, we study an alignment technique known as dynamic time warping (DTW), a concept that is applicable to the analysis of general time series. For its efficient computation, we discuss an algorithm based on dynamic programming, a widely used method for solving a complex problem by breaking it down into a collection of simpler subproblems.
SLIDE 6
Music Data
SLIDE 7
Music Data
SLIDE 8
Music Data
SLIDE 9
Music Data
Various interpretations of Beethoven’s Fifth: Bernstein, Karajan, Gould (piano), MIDI (piano)
SLIDE 10
Music Synchronization: Audio-Audio
Given: Two different audio recordings of the same underlying piece of music. Goal: Find for each position in one audio recording the musically corresponding position in the other audio recording.
SLIDE 11 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in seconds)
SLIDE 12 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in seconds)
SLIDE 13
Music Synchronization: Audio-Audio
Application: Interpretation Switcher
SLIDE 14 Music Synchronization: Audio-Audio
Two main steps:
1.) Audio features
- Robust but discriminative
- Chroma features
- Robust to variations in instrumentation, timbre, dynamics
- Correlate to harmonic progression
2.) Alignment procedure
- Deals with local and global tempo variations
- Needs to be efficient
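To make the idea of chroma features more concrete, here is a minimal Python/NumPy sketch that folds the magnitude spectrum of a single frame into 12 pitch classes. It assumes equal-tempered tuning with A4 = 440 Hz and simply assigns each FFT bin to its nearest MIDI pitch; the function name `chroma_from_spectrum` is hypothetical, and the chroma variant used in the lecture (filter banks, smoothing, normalization) may differ.

```python
import numpy as np


def chroma_from_spectrum(mag, fs, n_fft):
    """Fold a magnitude spectrum (one STFT frame) into a 12-dim chroma vector.

    Simplified sketch: each FFT bin k (frequency k * fs / n_fft) is assigned to
    its nearest MIDI pitch, and its power is added to that pitch class
    (C = 0, C# = 1, ..., B = 11). The result has unit Euclidean norm.
    """
    chroma = np.zeros(12)
    freqs = np.arange(len(mag)) * fs / n_fft
    for f, a in zip(freqs[1:], mag[1:]):  # skip the DC bin (f = 0 Hz)
        pitch = int(np.round(12 * np.log2(f / 440.0) + 69))  # MIDI pitch, A4 = 69
        chroma[pitch % 12] += a ** 2
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma


# Usage: a windowed pure tone at 392 Hz (G4) should map mainly to pitch class G (= 7).
fs, n_fft = 22050, 4096
t = np.arange(n_fft) / fs
frame = np.sin(2 * np.pi * 392.0 * t) * np.hanning(n_fft)
mag = np.abs(np.fft.rfft(frame))
chroma = chroma_from_spectrum(mag, fs, n_fft)
print(int(np.argmax(chroma)))  # 7
```

Because all octaves of a pitch land in the same bin, such a representation is largely insensitive to timbre and instrumentation while still reflecting the harmonic content.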
SLIDE 15 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in seconds)
SLIDE 16 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in indices)
SLIDE 17 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in indices)
SLIDE 18 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in indices), with the pitch class G marked in both
SLIDE 19 Music Synchronization: Audio-Audio
Figure: Beethoven’s Fifth, recordings by Karajan and Gould (time in indices), with the pitch class E♭ marked in both
SLIDE 20 Music Synchronization: Audio-Audio
Figure: Karajan and Gould recordings (time in indices)
SLIDE 21 Music Synchronization: Audio-Audio
Cost matrix
Figure: Karajan vs. Gould (both time axes in indices)
SLIDE 22 Music Synchronization: Audio-Audio
Cost matrix
Figure: Karajan vs. Gould (both time axes in indices)
SLIDE 23 Music Synchronization: Audio-Audio
Optimal alignment (cost-minimizing warping path)
Figure: Karajan vs. Gould (both time axes in indices)
SLIDE 24
Music Synchronization: Audio-Audio
Cost matrix
SLIDE 25
Music Synchronization: Audio-Audio
Optimal alignment (cost-minimizing warping path)
SLIDE 26 Music Synchronization: Audio-Audio
Optimal alignment (cost-minimizing warping path)
Figure: Karajan vs. Gould (both time axes in indices)
SLIDE 27
Music Synchronization: Audio-Audio
How to compute the alignment?
- Cost matrices
- Dynamic programming
- Dynamic Time Warping (DTW)
SLIDE 28 Applications
Music Library
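Freude, schöner Götterfunken, Tochter aus Elysium, Wir betreten feuertrunken, Himmlische dein Heiligtum. Deine Zauber binden wieder, Was die Mode streng geteilt; Alle Menschen werden Brüder, Wo dein sanfter Flügel weilt. Wem der große Wurf gelungen, Eines Freundes Freund zu sein, Wer ein holdes Weib errungen, Mische seinen Jubel ein!
(English: Joy, beautiful spark of divinity, daughter of Elysium, we enter, drunk with fire, heavenly one, your sanctuary. Your magic binds again what custom has sternly divided; all men become brothers where your gentle wing abides. Whoever has succeeded in the great attempt to be a friend's friend, whoever has won a lovely wife, let him join in the jubilation!)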
SLIDE 29 Music Synchronization: MIDI-Audio
SLIDE 30
Music Synchronization: MIDI-Audio
Diagram: MIDI (= metadata), audio recording, automated annotation
Sonification of annotations
SLIDE 31
Music Synchronization: MIDI-Audio
- Automated audio annotation
- Accurate audio access after MIDI-based retrieval
- Automated tracking of MIDI note parameters during audio playback
SLIDE 32
Music Synchronization: MIDI-Audio
Diagram: MIDI (= reference, score), audio recording, tempo information
SLIDE 33 Performance Analysis: Tempo Curves
Figure panels: reference version (time in beats), performed version (time in seconds), alignment, local tempo (tempo in BPM)
SLIDE 34 Performance Analysis: Tempo Curves
Figure panels: reference version (time in beats), performed version (time in seconds), alignment, local tempo (tempo in BPM)
1 beat lasting 2 seconds ≙ 30 BPM
SLIDE 35 Performance Analysis: Tempo Curves
Figure panels: reference version (time in beats), performed version (time in seconds), alignment, local tempo (tempo in BPM)
1 beat lasting 1 second ≙ 60 BPM
SLIDE 36 Performance Analysis: Tempo Curves
Figure panels: reference version (time in beats), performed version (time in seconds), alignment, local tempo (tempo in BPM)
1 beat lasting 0.4 seconds ≙ 150 BPM
SLIDE 37 Performance Analysis: Tempo Curves
Figure panels: reference version (time in beats), performed version (time in seconds), alignment, tempo curve (tempo in BPM)
The tempo curve is obtained by interpolation.
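The beat-duration arithmetic above (tempo in BPM = 60 / seconds per beat) and the interpolation step can be sketched in a few lines of Python/NumPy. The beat times are hypothetical values chosen to reproduce the 30, 60, and 150 BPM examples; attaching each local tempo to the beat at the start of its interval and interpolating linearly are assumptions of this sketch, not necessarily the lecture's exact procedure.

```python
import numpy as np


def local_tempo(beat_times):
    """Local tempo (BPM) between consecutive beats.

    beat_times: performance times (seconds) of beats 1, 2, 3, ..., e.g. read
    off from the alignment between reference and performed version.
    """
    seconds_per_beat = np.diff(np.asarray(beat_times, dtype=float))
    return 60.0 / seconds_per_beat


def tempo_curve(beat_times, query_beats):
    """Tempo curve via linear interpolation; the tempo of the interval
    [b, b + 1] is attached to beat position b (assumption of this sketch)."""
    tempi = local_tempo(beat_times)
    beat_positions = np.arange(1, len(tempi) + 1)
    return np.interp(query_beats, beat_positions, tempi)


# Hypothetical beat times: beats lasting 2 s, 1 s, and 0.4 s
beats = [0.0, 2.0, 3.0, 3.4]
print(local_tempo(beats))         # approximately 30, 60, 150 BPM
print(tempo_curve(beats, [1.5]))  # approximately 45 BPM
```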
SLIDE 38 Schumann: Träumerei
Performance Analysis: Tempo Curves
Performance:
Figure: performance (time in seconds)
SLIDE 39 Schumann: Träumerei
Performance Analysis: Tempo Curves
Figure: score (reference), measures 1 to 8; performance (time in seconds)
SLIDE 40 Schumann: Träumerei
Performance Analysis: Tempo Curves
Strategy: Compute score-audio synchronization and derive the tempo curve.
Figure: score (reference), measures 1 to 8; performance (time in seconds)
SLIDE 41 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: score (reference) and tempo curve (tempo in BPM over time in measures 1 to 8)
SLIDE 42 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: score (reference) and tempo curves (tempo in BPM over time in measures 1 to 8)
SLIDE 43 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: score (reference) and tempo curves (tempo in BPM over time in measures 1 to 8)
SLIDE 44 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: score (reference) and tempo curves (tempo in BPM over time in measures 1 to 8)
?
SLIDE 45 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: tempo curves (tempo in BPM over time in measures 1 to 8)
What can be done if no reference is available?
SLIDE 46 Performance Analysis: Tempo Curves
Schumann: Träumerei
Figure: tempo curves (tempo in BPM over time in measures 1 to 8)
What can be done if no reference is available? → Tempo and Beat Tracking
SLIDE 47
Music Synchronization: Image-Audio
Image Audio
SLIDE 48
Music Synchronization: Image-Audio
Image Audio
SLIDE 49
Music Synchronization: Image-Audio
Image Audio Convert data into common mid-level feature representation
SLIDE 50 Music Synchronization: Image-Audio
Image Audio
Image Processing: Optical Music Recognition
Convert data into common mid-level feature representation
SLIDE 51 Music Synchronization: Image-Audio
Image Audio
Image Processing: Optical Music Recognition
Audio Processing: Fourier Analysis
Convert data into common mid-level feature representation
SLIDE 52 Music Synchronization: Image-Audio
Image Audio
Image Processing: Optical Music Recognition
Audio Processing: Fourier Analysis
SLIDE 53
Application: Score Viewer
Music Synchronization: Image-Audio
SLIDE 54 Music Synchronization: Lyrics-Audio
Ich träumte von bunten Blumen, so wie sie wohl blühen im Mai
(English: I dreamt of colorful flowers, such as bloom in May)
SLIDE 55 Music Synchronization: Lyrics-Audio
Ich träumte von bunten Blumen, so wie sie wohl blühen im Mai
Extremely difficult!
SLIDE 56 Music Synchronization: Lyrics-Audio
Ich träumte von bunten Blumen, so wie sie wohl blühen im Mai
Lyrics-Audio → Lyrics-MIDI + MIDI-Audio
SLIDE 57 Music Synchronization: Lyrics-Audio
Lyrics-Audio → Lyrics-MIDI + MIDI-Audio
Ich träumte von bunten Blumen, so wie sie wohl blühen im Mai
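The decomposition Lyrics-Audio → Lyrics-MIDI + MIDI-Audio amounts to composing two time mappings: syllable onsets known in MIDI time are pushed through the MIDI-audio alignment. A minimal sketch with NumPy's `np.interp`, assuming the alignment is given as strictly increasing pairs of corresponding time stamps in seconds; the function name and all data are hypothetical.

```python
import numpy as np


def compose_alignment(t_midi, midi_times, audio_times):
    """Map MIDI time stamps to audio time stamps via a MIDI-audio alignment.

    midi_times / audio_times: corresponding time stamps (seconds) of the
    alignment, both strictly increasing. np.interp expects increasing sample
    points, so repeated entries caused by horizontal or vertical warping path
    steps should be removed beforehand (assumption of this sketch).
    """
    return np.interp(t_midi, midi_times, audio_times)


# Hypothetical data: syllable onsets in MIDI time (from Lyrics-MIDI) and
# alignment pairs (from MIDI-Audio synchronization).
syllable_onsets_midi = [1.0, 3.0]
midi_times = [0.0, 2.0, 4.0]
audio_times = [0.0, 3.0, 5.0]
print(compose_alignment(syllable_onsets_midi, midi_times, audio_times))  # 1.5 s and 4.0 s
```

Linear interpolation between alignment pairs is one simple choice; it only needs the alignment to be monotone, which the warping path conditions guarantee.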
SLIDE 58
Score-Informed Source Separation
SLIDE 59
Score-Informed Source Separation
SLIDE 60
Score-Informed Source Separation
SLIDE 61 Score-Informed Source Separation
Experimental results for separating the left and right hands in piano recordings:
Composer | Piece | Database | Results (L, R, Eq, Org)
Bach | BWV 875, Prelude | SMD |
Chopin | | SMD |
Chopin | | European Archive |
SLIDE 62 Score-Informed Source Separation
Audio editing
Figure: two spectrogram excerpts (frequency in Hz, time in seconds); the first marks 523 Hz, the second 554 Hz
SLIDE 63
Dynamic Time Warping
SLIDE 64 Dynamic Time Warping
- Well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions.
- Intuitively, sequences are warped in a non-linear fashion to match each other.
- Originally used to compare different speech patterns in automatic speech recognition.
SLIDE 65 Dynamic Time Warping
Figure: Sequence X = (x1, ..., x9) and Sequence Y = (y1, ..., y7)
SLIDE 66 Dynamic Time Warping
Figure: Sequence X = (x1, ..., x9) and Sequence Y = (y1, ..., y7)
Time alignment of two time-dependent sequences, where the aligned points are indicated by the arrows.
SLIDE 67 Dynamic Time Warping
Time alignment of two time-dependent sequences, where the aligned points are indicated by the arrows.
Figure: the same alignment shown as cells of a 9 × 7 grid (Sequence X: indices 1 to 9; Sequence Y: indices 1 to 7)
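SLIDE 68 The objective of DTW is to compare two (time-dependent) sequences $X := (x_1, x_2, \ldots, x_N)$ of length $N \in \mathbb{N}$ and $Y := (y_1, y_2, \ldots, y_M)$ of length $M \in \mathbb{N}$. Here, $x_n, y_m \in \mathcal{F}$ for $n \in [1:N]$ and $m \in [1:M]$ are suitable features that are elements from a given feature space denoted by $\mathcal{F}$.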
Dynamic Time Warping
SLIDE 69 To compare two different features $x, y \in \mathcal{F}$, one needs a local cost measure, which is defined to be a function
$$c: \mathcal{F} \times \mathcal{F} \to \mathbb{R}.$$
Typically, $c(x,y)$ is small (low cost) if $x$ and $y$ are similar to each other, and otherwise $c(x,y)$ is large (high cost).
Dynamic Time Warping
SLIDE 70
Dynamic Time Warping
Evaluating the local cost measure $c$ for each pair of elements of the sequences $X$ and $Y$, one obtains the cost matrix $C \in \mathbb{R}^{N \times M}$ defined by $C(n,m) := c(x_n, y_m)$ for $n \in [1:N]$ and $m \in [1:M]$. Then the goal is to find an alignment between $X$ and $Y$ having minimal overall cost. Intuitively, such an optimal alignment runs along a “valley” of low cost within the cost matrix $C$.
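A minimal Python/NumPy sketch of the cost matrix $C(n,m) = c(x_n, y_m)$. The local cost measure is passed in as a function; `cost_matrix` and `cosine_cost` are hypothetical names, and using 1 minus the cosine similarity for chroma vectors is an assumed (if common) choice, not one fixed by the text above. The toy sequences with $c(x,y) = |x-y|$ are purely illustrative.

```python
import numpy as np


def cost_matrix(X, Y, c):
    """C(n, m) = c(x_n, y_m), stored 0-indexed as C[n - 1, m - 1]."""
    return np.array([[c(x, y) for y in Y] for x in X], dtype=float)


def cosine_cost(x, y, eps=1e-12):
    """1 - cosine similarity: about 0 for parallel vectors, 1 for orthogonal ones."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)


# Toy example with scalar features and c(x, y) = |x - y|
C = cost_matrix([1, 3, 3, 8, 1], [2, 0, 0, 8, 7, 2], lambda x, y: abs(x - y))
print(C.shape)  # (5, 6)
```

The rows of `C` correspond to the elements of $X$ and the columns to those of $Y$, so the low-cost “valley” is visible directly in the array.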
SLIDE 71 Dynamic Time Warping
Figure: cost matrix C (both axes: time in indices)
SLIDE 72 Dynamic Time Warping
Figure: cost matrix C with the entry C(5,6) highlighted (both axes: time in indices)
SLIDE 73
Dynamic Time Warping
Cost matrix C
SLIDE 74
Dynamic Time Warping
Cost matrix C C(5,6)
SLIDE 75
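Dynamic Time Warping
The next definition formalizes the notion of an alignment. A warping path is a sequence $P = (p_1, \ldots, p_L)$ with $p_\ell = (n_\ell, m_\ell) \in [1:N] \times [1:M]$ for $\ell \in [1:L]$ satisfying the following three conditions:
- Boundary condition: $p_1 = (1,1)$ and $p_L = (N,M)$
- Monotonicity condition: $n_1 \leq n_2 \leq \ldots \leq n_L$ and $m_1 \leq m_2 \leq \ldots \leq m_L$
- Step size condition: $p_{\ell+1} - p_\ell \in \{(1,0), (0,1), (1,1)\}$ for $\ell \in [1:L-1]$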
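The three conditions translate directly into a checker. This Python sketch (the function name `is_warping_path` is hypothetical) uses 1-indexed cells as in the definition; since every allowed step is nonnegative in both coordinates, the step size check also enforces monotonicity.

```python
def is_warping_path(P, N, M):
    """Check the warping path conditions for P = [(n_1, m_1), ..., (n_L, m_L)]
    with 1-indexed cells in [1:N] x [1:M]."""
    if not P or P[0] != (1, 1) or P[-1] != (N, M):
        return False  # boundary condition violated
    if any(not (1 <= n <= N and 1 <= m <= M) for n, m in P):
        return False  # cell outside [1:N] x [1:M]
    steps = {(1, 0), (0, 1), (1, 1)}
    for (n1, m1), (n2, m2) in zip(P, P[1:]):
        if (n2 - n1, m2 - m1) not in steps:
            # step size condition violated; because all allowed steps are
            # nonnegative, this also rules out monotonicity violations
            return False
    return True


print(is_warping_path([(1, 1), (2, 2), (3, 3), (4, 4), (4, 5), (5, 6)], 5, 6))  # True
print(is_warping_path([(1, 1), (3, 2), (5, 6)], 5, 6))  # False (step size)
```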
SLIDE 76 Dynamic Time Warping
Figure: warping path in the 9 × 7 grid of cells (Sequence X: indices 1 to 9; Sequence Y: indices 1 to 7)
Each matrix entry (cell) corresponds to a pair of indices, e.g., cell = (6,3).
Boundary cells: $p_1 = (1,1)$ and $p_L = (N,M) = (9,7)$
SLIDE 77 Dynamic Time Warping
Correct warping path
Figure: warping path in the 9 × 7 grid and the corresponding alignment of Sequence X (x1, ..., x9) with Sequence Y (y1, ..., y7)
SLIDE 78 Dynamic Time Warping
Violation of boundary condition
Figure: warping path in the 9 × 7 grid and the corresponding alignment of Sequence X (x1, ..., x9) with Sequence Y (y1, ..., y7)
SLIDE 79 Dynamic Time Warping
Violation of monotonicity condition
Figure: warping path in the 9 × 7 grid and the corresponding alignment of Sequence X (x1, ..., x9) with Sequence Y (y1, ..., y7)
SLIDE 80 Dynamic Time Warping
Violation of step size condition
Figure: warping path in the 9 × 7 grid and the corresponding alignment of Sequence X (x1, ..., x9) with Sequence Y (y1, ..., y7)
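SLIDE 81 The total cost $c_P(X,Y)$ of a warping path $P$ between $X$ and $Y$ with respect to the local cost measure $c$ is defined as
$$c_P(X,Y) := \sum_{\ell=1}^{L} c(x_{n_\ell}, y_{m_\ell}) = \sum_{\ell=1}^{L} C(n_\ell, m_\ell).$$
Furthermore, an optimal warping path between $X$ and $Y$ is a warping path $P^*$ having minimal total cost among all possible warping paths. The DTW distance $\mathrm{DTW}(X,Y)$ between $X$ and $Y$ is then defined as the total cost of $P^*$:
$$\mathrm{DTW}(X,Y) := c_{P^*}(X,Y) = \min\{c_P(X,Y) \mid P \text{ is a warping path}\}.$$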
Dynamic Time Warping
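The definition of $\mathrm{DTW}(X,Y)$ as a minimum over all warping paths can be evaluated literally for tiny inputs. This brute-force Python sketch (hypothetical function names) enumerates every warping path and sums the cost matrix entries along it; it is only meant to mirror the definition, since the number of paths grows exponentially. The cost matrix in the usage lines comes from the toy sequences $X = (1,3,3,8,1)$, $Y = (2,0,0,8,7,2)$ with $c(x,y) = |x-y|$.

```python
def total_cost(P, C):
    """c_P(X, Y): sum of C(n_l, m_l) over the (1-indexed) cells of P."""
    return sum(C[n - 1][m - 1] for n, m in P)


def all_warping_paths(N, M):
    """Enumerate every warping path from (1, 1) to (N, M); tiny N, M only."""
    def extend(path):
        n, m = path[-1]
        if (n, m) == (N, M):
            yield list(path)
            return
        for dn, dm in ((1, 0), (0, 1), (1, 1)):  # allowed step sizes
            if n + dn <= N and m + dm <= M:
                path.append((n + dn, m + dm))
                yield from extend(path)
                path.pop()
    yield from extend([(1, 1)])


def dtw_brute_force(C):
    """DTW(X, Y) straight from the definition: minimal total cost over all paths."""
    return min(total_cost(P, C) for P in all_warping_paths(len(C), len(C[0])))


C = [[1, 1, 1, 7, 6, 1],
     [1, 3, 3, 5, 4, 1],
     [1, 3, 3, 5, 4, 1],
     [6, 8, 8, 0, 1, 6],
     [1, 1, 1, 7, 6, 1]]
print(dtw_brute_force(C))  # 9
```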
SLIDE 82 Dynamic Time Warping
- The optimal warping path is not unique (in general).
- DTW does not (in general) define a metric, since it may not satisfy the triangle inequality.
- There exist exponentially many warping paths.
- How can $P^*$ be computed efficiently?
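To see how quickly the number of warping paths grows, the following Python sketch counts them with a small recursion: every path ending at $(n,m)$ arrives from $(n-1,m)$, $(n,m-1)$, or $(n-1,m-1)$. The function name is hypothetical; the resulting counts are known as Delannoy numbers.

```python
def num_warping_paths(N, M):
    """Number of warping paths from (1, 1) to (N, M) with step sizes
    (1, 0), (0, 1), (1, 1)."""
    # W[n][m] counts paths from (1, 1) to (n, m); row and column 0 are zero padding.
    W = [[0] * (M + 1) for _ in range(N + 1)]
    W[1][1] = 1
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if (n, m) != (1, 1):
                W[n][m] = W[n - 1][m] + W[n][m - 1] + W[n - 1][m - 1]
    return W[N][M]


print([num_warping_paths(k, k) for k in (2, 3, 5, 10)])  # [3, 13, 321, 1462563]
```

Already for two sequences of length 10 there are more than a million warping paths, so evaluating the definition path by path is not an option for real feature sequences.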
SLIDE 83
Dynamic Time Warping
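Notation: For $n \in [1:N]$ and $m \in [1:M]$, let $X(1:n) := (x_1, \ldots, x_n)$, $Y(1:m) := (y_1, \ldots, y_m)$, and $D(n,m) := \mathrm{DTW}(X(1:n), Y(1:m))$. The matrix $D$ is called the accumulated cost matrix. The entry $D(n,m)$ specifies the cost of an optimal warping path that aligns $X(1:n)$ with $Y(1:m)$.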
SLIDE 84
Dynamic Time Warping
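Lemma:
(i) $D(N,M) = \mathrm{DTW}(X,Y)$
(ii) $D(1,1) = C(1,1)$
(iii) $D(n,1) = \sum_{k=1}^{n} C(k,1)$ for $n \in [1:N]$ and $D(1,m) = \sum_{k=1}^{m} C(1,k)$ for $m \in [1:M]$
(iv) $D(n,m) = C(n,m) + \min\{D(n-1,m-1), D(n-1,m), D(n,m-1)\}$ for $n \in [2:N]$ and $m \in [2:M]$
Proof: (i) – (iii) are clear by definition.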
SLIDE 85
Dynamic Time Warping
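Proof of (iv): Induction via $n$ and $m$: Let $n \in [2:N]$, $m \in [2:M]$, and let $q = (q_1, \ldots, q_{L-1}, q_L)$ be an optimal warping path for $X(1:n)$ and $Y(1:m)$. Then $q_L = (n,m)$ (boundary condition). Let $q_{L-1} = (a,b)$. The step size condition implies $(a,b) \in \{(n-1,m-1), (n-1,m), (n,m-1)\}$. The warping path $(q_1, \ldots, q_{L-1})$ must be optimal for $X(1:a)$ and $Y(1:b)$. Thus,
$$D(n,m) = c_{(q_1, \ldots, q_{L-1})}(X(1:a), Y(1:b)) + C(n,m) = D(a,b) + C(n,m) = C(n,m) + \min\{D(n-1,m-1), D(n-1,m), D(n,m-1)\}.$$
■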
SLIDE 86 Dynamic Time Warping
Accumulated cost matrix
Given the two feature sequences $X$ and $Y$, the matrix $D$ is computed recursively.
- Initialize $D$ using (ii) and (iii) of the lemma.
- Compute $D(n,m)$ for $n > 1$ and $m > 1$ using (iv).
- $\mathrm{DTW}(X,Y) = D(N,M)$ using (i).
Note:
- Complexity O(NM).
- Dynamic programming: “overlapping-subproblem property”
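The recursive computation of $D$ can be sketched in Python with NumPy as follows. The function name `accumulated_cost_matrix` is hypothetical, and indices are 0-based in code, so `D[n-1, m-1]` stores $D(n,m)$. The usage lines take the toy sequences $X = (1,3,3,8,1)$, $Y = (2,0,0,8,7,2)$ with $c(x,y) = |x-y|$.

```python
import numpy as np


def accumulated_cost_matrix(C):
    """Accumulated cost matrix D from the cost matrix C via the lemma
    (0-indexed storage: D[n - 1, m - 1] holds D(n, m))."""
    C = np.asarray(C, dtype=float)
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])  # (ii), (iii): D(n, 1) = C(1, 1) + ... + C(n, 1)
    D[0, :] = np.cumsum(C[0, :])  # (iii): D(1, m) = C(1, 1) + ... + C(1, m)
    for n in range(1, N):
        for m in range(1, M):
            # (iv): D(n, m) = C(n, m) + min{D(n-1, m-1), D(n-1, m), D(n, m-1)}
            D[n, m] = C[n, m] + min(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1])
    return D  # (i): DTW(X, Y) = D[-1, -1]


X = [1, 3, 3, 8, 1]
Y = [2, 0, 0, 8, 7, 2]
C = np.abs(np.subtract.outer(X, Y))  # C(n, m) = |x_n - y_m|
D = accumulated_cost_matrix(C)
print(D[-1, -1])  # 9.0
```

Each entry is filled once from three already computed neighbors, which gives the O(NM) running time and memory noted above.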
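SLIDE 87 Dynamic Time Warping
Optimal warping path
Given to the algorithm is the accumulated cost matrix $D$. The optimal path $P^* = (p_1, \ldots, p_L)$ is computed in reverse order of the indices starting with $p_L = (N,M)$. Suppose $p_\ell = (n,m)$ has been computed. In case $(n,m) = (1,1)$, one must have $\ell = 1$ and we are done. Otherwise,
$$p_{\ell-1} := \begin{cases} (1, m-1), & \text{if } n = 1,\\ (n-1, 1), & \text{if } m = 1,\\ \operatorname{argmin}\{D(n-1,m-1), D(n-1,m), D(n,m-1)\}, & \text{otherwise,} \end{cases}$$
where we take the lexicographically smallest pair in case “argmin” is not unique.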
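A Python sketch of this backtracking procedure (hypothetical function name; NumPy array `D` with 0-based indices, result reported as 1-indexed cells). The three candidates are listed in lexicographic order, and `min` over `(value, cell)` tuples falls back to comparing the cells when values tie, which yields the lexicographically smallest pair. The matrix `D` in the usage lines is the accumulated cost matrix of $X = (1,3,3,8,1)$, $Y = (2,0,0,8,7,2)$ with $c(x,y) = |x-y|$.

```python
import numpy as np


def optimal_warping_path(D):
    """Backtrack P* from the accumulated cost matrix D (0-indexed array).
    Returns the path as 1-indexed cells (n, m) from (1, 1) to (N, M)."""
    n, m = D.shape[0] - 1, D.shape[1] - 1  # start at p_L = (N, M)
    path = [(n, m)]
    while (n, m) != (0, 0):
        if n == 0:
            m -= 1  # first row: only (1, m - 1) is possible
        elif m == 0:
            n -= 1  # first column: only (n - 1, 1) is possible
        else:
            # argmin with lexicographic tie-breaking via tuple comparison
            _, (n, m) = min((D[n - 1, m - 1], (n - 1, m - 1)),
                            (D[n - 1, m], (n - 1, m)),
                            (D[n, m - 1], (n, m - 1)))
        path.append((n, m))
    return [(a + 1, b + 1) for a, b in reversed(path)]


D = np.array([[1, 2, 3, 10, 16, 17],
              [2, 4, 5, 8, 12, 13],
              [3, 5, 7, 10, 12, 13],
              [9, 11, 13, 7, 8, 14],
              [10, 10, 11, 14, 13, 9]], dtype=float)
print(optimal_warping_path(D))  # [(1, 1), (2, 2), (3, 3), (4, 4), (4, 5), (5, 6)]
```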
SLIDE 88 Dynamic Time Warping
Summary
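Figure: accumulated cost matrix $D$ illustrating the recursion: $D(n,m)$ is computed from $D(n-1,m-1)$, $D(n-1,m)$, and $D(n,m-1)$; the first row $D(1,m)$ and the first column $D(n,1)$ are initialized first; $D(N,M) = \mathrm{DTW}(X,Y)$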
SLIDE 89
Dynamic Time Warping
Summary
SLIDE 90 Dynamic Time Warping
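Example: $X = (1, 3, 3, 8, 1)$, $Y = (2, 0, 0, 8, 7, 2)$, local cost measure $c(x,y) = |x - y|$
Cost matrix $C$ (row $n$, columns $m = 1, \ldots, 6$):
n = 1: 1 1 1 7 6 1
n = 2: 1 3 3 5 4 1
n = 3: 1 3 3 5 4 1
n = 4: 6 8 8 0 1 6
n = 5: 1 1 1 7 6 1
Accumulated cost matrix $D$ (row $n$, columns $m = 1, \ldots, 6$):
n = 1: 1 2 3 10 16 17
n = 2: 2 4 5 8 12 13
n = 3: 3 5 7 10 12 13
n = 4: 9 11 13 7 8 14
n = 5: 10 10 11 14 13 9
Optimal warping path: $P^* = ((1,1), (2,2), (3,3), (4,4), (4,5), (5,6))$ with $\mathrm{DTW}(X,Y) = D(5,6) = 9$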
SLIDE 91
Dynamic Time Warping
Step size condition: $\Sigma = \{(1,0), (0,1), (1,1)\}$
SLIDE 92
Dynamic Time Warping
Step size condition: $\Sigma = \{(2,1), (1,2), (1,1)\}$
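With $\Sigma = \{(2,1), (1,2), (1,1)\}$, the recursion changes to $D(n,m) = C(n,m) + \min\{D(n-1,m-1), D(n-2,m-1), D(n-1,m-2)\}$, starting from $D(1,1) = C(1,1)$. In the Python/NumPy sketch below (hypothetical function name), cells that no admissible path can reach keep the value $+\infty$; note that such paths may skip elements of $X$ or $Y$. The usage lines reuse the toy sequences $X = (1,3,3,8,1)$, $Y = (2,0,0,8,7,2)$ with $c(x,y) = |x-y|$.

```python
import numpy as np


def accumulated_cost_sigma2(C):
    """Accumulated cost matrix for step sizes {(2, 1), (1, 2), (1, 1)}.
    Unreachable cells hold +inf (0-indexed: result[n - 1, m - 1] = D(n, m))."""
    C = np.asarray(C, dtype=float)
    N, M = C.shape
    # Two rows/columns of +inf padding so D[n - 2, ...] and D[..., m - 2] are valid.
    D = np.full((N + 2, M + 2), np.inf)
    D[2, 2] = C[0, 0]  # D(1, 1) = C(1, 1)
    for n in range(2, N + 2):
        for m in range(2, M + 2):
            if (n, m) == (2, 2):
                continue
            D[n, m] = C[n - 2, m - 2] + min(D[n - 1, m - 1],   # step (1, 1)
                                            D[n - 2, m - 1],   # step (2, 1)
                                            D[n - 1, m - 2])   # step (1, 2)
    return D[2:, 2:]


X = [1, 3, 3, 8, 1]
Y = [2, 0, 0, 8, 7, 2]
D = accumulated_cost_sigma2(np.abs(np.subtract.outer(X, Y)))
print(D[-1, -1])  # 5.0
```

The lower value compared with the standard step sizes comes from the skipped elements, since only the visited cells contribute to the total cost.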
SLIDE 93
Dynamic Time Warping
Step size conditions
SLIDE 94
Dynamic Time Warping
- Computation via dynamic programming
- Memory requirements and running time: O(NM)
- Problem: Infeasible for large N and M
- Example: Feature resolution 10 Hz, pieces 15 min → N, M ~ 10,000; N · M ~ 100,000,000
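As a quick check of these numbers, the arithmetic below uses the exact duration (15 min · 60 s/min · 10 Hz = 9,000 frames, which the slide rounds to 10,000). The 8 bytes per matrix entry are an assumption (one 64-bit float per entry of $D$).

```python
# Back-of-the-envelope size of the DTW matrices for the example above
feature_rate_hz = 10          # features per second
duration_s = 15 * 60          # 15-minute pieces
N = M = feature_rate_hz * duration_s
cells = N * M
bytes_per_cell = 8            # assumption: one 64-bit float per entry
print(N, cells, cells * bytes_per_cell / 1e6)  # 9000 81000000 648.0
```

So even a single accumulated cost matrix needs several hundred megabytes, which motivates the constraint strategies that follow.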
SLIDE 95
Dynamic Time Warping
Global constraints: Sakoe-Chiba band, Itakura parallelogram
SLIDE 96
Dynamic Time Warping
Global constraints: Sakoe-Chiba band, Itakura parallelogram
Problem: Optimal warping path not in constraint region
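A global constraint can be imposed by leaving all cells outside the admissible region at $+\infty$. The Python/NumPy sketch below restricts DTW to a Sakoe-Chiba band of half-width `w` around the straight line from $(1,1)$ to $(N,M)$; the function name and this particular band parameterization are assumptions. The usage lines show a tiny case in which the band with `w = 0` excludes the optimal warping path, so the constrained result is worse than the unconstrained one.

```python
import numpy as np


def dtw_sakoe_chiba(C, w):
    """DTW(X, Y) restricted to a Sakoe-Chiba band of half-width w (in cells)
    around the line from (1, 1) to (N, M); +inf if no warping path fits."""
    C = np.asarray(C, dtype=float)
    N, M = C.shape
    D = np.full((N + 1, M + 1), np.inf)  # D[n, m] holds D(n, m), 1-indexed
    D[0, 0] = 0.0  # padding so that D(1, 1) = C(1, 1)
    for n in range(1, N + 1):
        center = 1 + (n - 1) * (M - 1) / max(N - 1, 1)  # diagonal position in row n
        for m in range(1, M + 1):
            if abs(m - center) <= w:  # cells outside the band stay at +inf
                D[n, m] = C[n - 1, m - 1] + min(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1])
    return D[N, M]


X = [0, 1, 1]
Y = [0, 0, 1]
C = np.abs(np.subtract.outer(X, Y))
print(dtw_sakoe_chiba(C, 10), dtw_sakoe_chiba(C, 0))  # 0.0 1.0
```

With `w = 10` the band covers the whole matrix and the result equals unconstrained DTW; with `w = 0` only the diagonal remains.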
SLIDE 97
Dynamic Time Warping
Multiscale approach: Compute optimal warping path on coarse level
SLIDE 98
Dynamic Time Warping
Multiscale approach: Project on fine level
SLIDE 99
Dynamic Time Warping
Multiscale approach: Specify constraint region
SLIDE 100
Dynamic Time Warping
Multiscale approach: Compute constrained optimal warping path
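The four multiscale steps can be sketched with just two resolution levels in Python/NumPy. Everything here is an illustrative assumption rather than the lecture's implementation: the coarse level averages pairs of consecutive frames, the local cost is the Euclidean distance, each coarse cell is projected onto its 2 × 2 block of fine cells, and the block is enlarged by `radius` cells to form the constraint region. All function names are hypothetical. The result is a valid warping path whose cost can only be equal to or higher than the unconstrained optimum.

```python
import numpy as np


def euclidean_cost(X, Y):
    """Cost matrix C[n, m] = ||X[n] - Y[m]|| for arrays of shape (length, dim)."""
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)


def dtw_in_region(C, region):
    """DTW restricted to cells where the boolean array `region` is True.
    Returns (total cost, 0-indexed optimal warping path)."""
    N, M = C.shape
    D = np.full((N + 1, M + 1), np.inf)  # padded: D[n, m] holds D(n, m)
    D[0, 0] = 0.0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if region[n - 1, m - 1]:  # cells outside the region stay at +inf
                D[n, m] = C[n - 1, m - 1] + min(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1])
    n, m = N, M
    path = [(n - 1, m - 1)]
    while (n, m) != (1, 1):  # backtracking with lexicographic tie-breaking
        _, (n, m) = min((D[n - 1, m - 1], (n - 1, m - 1)),
                        (D[n - 1, m], (n - 1, m)),
                        (D[n, m - 1], (n, m - 1)))
        path.append((n - 1, m - 1))
    return D[N, M], path[::-1]


def downsample(F):
    """Coarse level: halve the feature rate by averaging pairs of frames."""
    return np.array([F[i:i + 2].mean(axis=0) for i in range(0, len(F), 2)])


def multiscale_dtw(X, Y, radius=2):
    """Two-level sketch of the multiscale approach."""
    N, M = len(X), len(Y)
    # 1) Optimal warping path on the coarse level (unconstrained)
    C_coarse = euclidean_cost(downsample(X), downsample(Y))
    _, coarse_path = dtw_in_region(C_coarse, np.ones(C_coarse.shape, dtype=bool))
    # 2) + 3) Project each coarse cell onto the fine level and enlarge by `radius`
    region = np.zeros((N, M), dtype=bool)
    for i, j in coarse_path:
        region[max(2 * i - radius, 0):min(2 * i + 2 + radius, N),
               max(2 * j - radius, 0):min(2 * j + 2 + radius, M)] = True
    # 4) Constrained optimal warping path on the fine level
    return dtw_in_region(euclidean_cost(X, Y), region)


rng = np.random.default_rng(0)
X, Y = rng.random((20, 12)), rng.random((15, 12))
cost, path = multiscale_dtw(X, Y)
print(cost, path[0], path[-1])
```

The `radius` parameter controls the trade-off discussed next: a larger region is more robust against coarse-level errors but costs more computation.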
SLIDE 101 Dynamic Time Warping
Multiscale approach
Good trade-off between efficiency and robustness?
- Suitable features?
- Suitable resolution levels?
- Size of constraint regions?
Suitable parameters depend very much on the application!