GCT535- Sound Technology for Multimedia Music and Audio Alignment - - PowerPoint PPT Presentation



SLIDE 1

GCT535- Sound Technology for Multimedia Music and Audio Alignment

Graduate School of Culture Technology KAIST Juhan Nam

SLIDE 2

Outline

§ Musical Representations

– Score, Audio, MIDI

§ Music and Audio Alignment

– Synchronization Framework
– Dynamic Time Warping
– Dynamic Programming

SLIDE 3

Music Representations

§ Score

– Abstract symbols of musical events

§ Audio

– Concrete (or actual) renditions of the score as sound

§ MIDI

– A series of events

  • Note messages: note on/off (onset and offset), note number, note velocity
  • Control messages: pedal on/off, pitch wheel, modulation, …

– Can be either score-like abstract event sequences or a recording of note/control events from actual performance

SLIDE 4

Symbols and Performances

§ MIDI (score)
§ Valentina Lisitsa
§ Vladimir Horowitz

SLIDE 5

Where do the differences come from?

§ Musical expressions

– Temporal: ritardando, rubato
– Dynamics: piano, forte, crescendo, …
– Play techniques: legato, staccato
– Mood and emotion: dolce, grazioso

§ Different styles of performers

– Temporal: tempo (global) and note onset/offset timings (local)
– Dynamics

§ Moreover…

– Variation in key, rhythm, chord, melody, instrumentation (e.g. cover songs)
– Tuning

SLIDE 6

Music Synchronization

§ Temporally align different representations of a piece of music

– Audio to Audio
– Audio to Score

§ Why do we synchronize them?

– Score following
– Auto-accompaniment
– Related

  • Variable time-stretching
  • Audio classification


[from M. Müller’s book]

SLIDE 7

Synchronization Framework

§ Choose feature representations to compare

– Often, MIDI is converted to audio so that both sides are aligned in the same feature space

§ Compute a similarity matrix between the two feature sequences

– All possible combinations of local feature pairs

§ Find a path that makes the best alignment on the similarity matrix

– Dynamic Time Warping (DTW)


[Figure: Feature Seq. #1 and Feature Seq. #2 → compute local similarity → Similarity Matrix → find the best path (Dynamic Programming)]

SLIDE 8

Feature Representations

§ Frequent choices of audio feature representations

– Spectrogram, Chroma, MFCC, …


[Figure: CENS, normalized chroma features (Müller, 2005), for the MIDI and Lisitsa versions]

SLIDE 9

Similarity Matrix

§ M by N matrix

– The (i, j) element is the similarity between the i-th vector of an M-long feature sequence and the j-th vector of an N-long feature sequence
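
The M by N construction above can be sketched in a few lines of plain Python. The slide does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names `cosine_similarity` and `similarity_matrix` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat silent (all-zero) frames as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(seq_a, seq_b):
    """Return an M by N matrix; entry (i, j) compares seq_a[i] with seq_b[j]."""
    return [[cosine_similarity(a, b) for b in seq_b] for a in seq_a]


# Toy two-dimensional "features": M = 3 frames against N = 2 frames.
S = similarity_matrix([[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 1]])
```

A cost matrix for DTW can then be taken as one minus the similarity, which is again a common convention rather than something stated on the slide.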

SLIDE 10

Finding the Optimal Path

[Figure: Schumann−Traumerei−Lisitsa vs. Schumann−Traumerei−MIDI]

§ You can move in only three directions

– Up, right, diagonal

§ The number of possible paths for an M by N matrix is ???
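
One way to explore the question above is to count the paths with a small dynamic program: the number of paths into a cell is the sum of the counts of the three cells it can be reached from. The sketch below assumes the path starts in one corner and ends in the opposite corner; `count_warping_paths` is a name made up for this example.

```python
def count_warping_paths(m, n):
    """Count corner-to-corner paths in an m by n grid using up/right/diagonal steps."""
    counts = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == 0 or j == 0:
                counts[i][j] = 1  # only one way along the first row or column
            else:
                counts[i][j] = (
                    counts[i - 1][j] + counts[i][j - 1] + counts[i - 1][j - 1]
                )
    return counts[m - 1][n - 1]


# The count grows very quickly with the matrix size.
small = [count_warping_paths(k, k) for k in (2, 3, 4)]
```

Even a 4 by 4 matrix already admits 63 paths, which is why enumerating them for real feature sequences is not practical.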

SLIDE 11

3D Surface Plot of Similarity Matrix

§ Finding the optimal path is like finding the hiking trail that takes the least effort.

SLIDE 12

Dynamic Time Warping

§ Finding an (N, M)-warping path of length L

– P = (p_1, p_2, …, p_L) where p_l = (n_l, m_l)

§ Three conditions

– Boundary condition: p_1 = (1, 1), p_L = (N, M)
– Monotonicity condition

  • n_1 <= n_2 <= … <= n_L
  • m_1 <= m_2 <= … <= m_L

– Step size condition

  • Move only upward, rightward, or diagonally (upper-right)
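
The three conditions can be checked mechanically. The following is a minimal sketch using the slide's 1-based indices; `is_warping_path` is a hypothetical helper name. Note that the step size condition already implies monotonicity, so only the boundary and step checks are coded.

```python
def is_warping_path(path, n_len, m_len):
    """Check the boundary and step size conditions for a list of (n, m) pairs."""
    if not path or path[0] != (1, 1) or path[-1] != (n_len, m_len):
        return False  # boundary condition violated
    allowed_steps = {(1, 0), (0, 1), (1, 1)}  # up, right, diagonal
    for (n0, m0), (n1, m1) in zip(path, path[1:]):
        if (n1 - n0, m1 - m0) not in allowed_steps:
            return False  # step size (and therefore monotonicity) violated
    return True


good = is_warping_path([(1, 1), (2, 2), (2, 3), (3, 3)], 3, 3)
bad_jump = is_warping_path([(1, 1), (3, 3)], 3, 3)
bad_backward = is_warping_path([(1, 1), (2, 2), (1, 3), (2, 3), (3, 3)], 3, 3)
```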

SLIDE 13

Dynamic Time Warping : Bad Examples

SLIDE 14

Dynamic Programming

§ Finding the minimum-cost-path

§ Naïve approach

– Find all paths from A to K and calculate the cost for each
– Choose the path that has the minimum cost
– However, as the number of nodes increases, the number of paths increases exponentially

[Figure: graph with nodes A to K and numeric costs]

SLIDE 15

Dynamic Programming

§ Observation

– Say the minimum-cost-path passes through a node p
– What is the minimum-cost-path from A to p?
– It is just a sub-path of the minimum-cost-path from A to K
– Thus, we don’t have to compute the cost from scratch; we can reuse the cost computed for the previous nodes

[Figure: graph with nodes A to K and numeric costs]

SLIDE 16

Dynamic Programming

§ The minimum cost is computed by the following equation:

$$C_k(j) = O_k(j) + \min_i \{ C_{k-1}(i) + c_{ij} \}$$

– $C_k(j)$: cost up to node j
– $O_k(j)$: local cost at node j
– $c_{ij}$: transition cost from i to j

§ The minimum-cost-path can be found by tracing back the computation

[Figure: graph with nodes A to K and numeric costs]
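
The recurrence and the trace-back can be sketched for a graph organised in stages. The slide's A to K graph cannot be recovered from the transcript, so the toy costs below are made up, and the layout of `local_cost` and `transition_cost` is an assumption of this example.

```python
def min_cost_path(local_cost, transition_cost):
    """Stage-wise DP.

    local_cost[k][j]         : O_k(j), local cost of node j at stage k
    transition_cost[k][i][j] : c_ij, from node i at stage k to node j at stage k + 1
    Returns (minimum total cost, list of chosen node indices, one per stage).
    """
    cost = [list(local_cost[0])]  # C_0(j) = O_0(j)
    back = []                     # best predecessor for every node of every stage
    for k in range(1, len(local_cost)):
        prev = cost[-1]
        row, ptr = [], []
        for j, o in enumerate(local_cost[k]):
            best_i = min(
                range(len(prev)),
                key=lambda i: prev[i] + transition_cost[k - 1][i][j],
            )
            row.append(o + prev[best_i] + transition_cost[k - 1][best_i][j])
            ptr.append(best_i)
        cost.append(row)
        back.append(ptr)
    # Trace back from the cheapest node of the last stage.
    j = min(range(len(cost[-1])), key=lambda i: cost[-1][i])
    total = cost[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, path


# Hypothetical 3-stage graph: 1 start node, 2 middle nodes, 1 end node.
total, path = min_cost_path(
    local_cost=[[0], [1, 2], [0]],
    transition_cost=[[[4, 1]], [[1], [2]]],
)
```

Each node is visited once per incoming edge, so the work grows with the number of edges rather than with the number of paths.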

SLIDE 17

DP for Dynamic Time Warping (DTW)

§ Algorithm

– Initialization:

C(n,1) = sum(O(1:n,1)), n = 1…N
C(1,m) = sum(O(1,1:m)), m = 1…M

– Recurrence Relation:

For each m = 2…M
  For each n = 2…N
    C(n,m) = O(n,m) + min{ C(n-1,m), C(n,m-1), C(n-1,m-1) }

– Termination:

C(N,M) is the distance
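
The three steps above translate fairly directly into code. This is a minimal sketch, not the implementation behind the slides: it uses 0-based indices instead of the slide's 1-based ones, takes a precomputed local cost matrix O as a list of lists, and adds a backtracking pass to recover the path. The name `dtw` and the toy sequences are illustrative.

```python
def dtw(cost):
    """Return (C(N, M), warping path) for a local cost matrix given as a list of rows."""
    n_len, m_len = len(cost), len(cost[0])
    acc = [[0.0] * m_len for _ in range(n_len)]
    # Initialization: cumulative sums along the first column and first row.
    acc[0][0] = cost[0][0]
    for n in range(1, n_len):
        acc[n][0] = acc[n - 1][0] + cost[n][0]
    for m in range(1, m_len):
        acc[0][m] = acc[0][m - 1] + cost[0][m]
    # Recurrence: cheapest of the three allowed predecessors.
    for n in range(1, n_len):
        for m in range(1, m_len):
            acc[n][m] = cost[n][m] + min(
                acc[n - 1][m], acc[n][m - 1], acc[n - 1][m - 1]
            )
    # Backtracking from the end cell to the start cell.
    n, m = n_len - 1, m_len - 1
    path = [(n, m)]
    while (n, m) != (0, 0):
        if n == 0:
            m -= 1
        elif m == 0:
            n -= 1
        else:
            n, m = min(
                [(n - 1, m - 1), (n - 1, m), (n, m - 1)],
                key=lambda p: acc[p[0]][p[1]],
            )
        path.append((n, m))
    path.reverse()
    return acc[-1][-1], path


# Toy example: the second sequence repeats the value 2 once.
x, y = [1, 2, 3], [1, 2, 2, 3]
O = [[abs(a - b) for b in y] for a in x]
distance, path = dtw(O)
```

On this toy input the path stays on one frame of `x` while `y` repeats its middle value, which is the warping behaviour the algorithm is meant to capture.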

SLIDE 18

DP for Dynamic Time Warping (DTW)

§ Toy Example

SLIDE 19

Score and Audio Alignment by DTW

[Figure: local cost matrix O(i,j) and accumulated cost matrix C(i,j)]

SLIDE 20

Limitations

§ The optimal path is obtained only after we arrive at the destination (by backtracking)

– i.e. DTW works offline
– What if the sequences are very long?
– Online version of DTW?

§ Every frame is treated as equally important

– In general, humans are more sensitive to note onsets
– Perceptually, not every frame is equally important

SLIDE 21

Online DTW

§ Set a moving search window and calculate the cost only within the window

– Time and space cost: quadratic → linear

§ The movement is determined by the position that gives the minimum cost within the current window. If the position is ...

– Corner: move both up and right (alternately)
– Upper edge: move up
– Right edge: move right

Figure 2: An example of the on-line time warping algorithm with search window c = 4, showing the order of evaluation for a particular sequence of row and column increments. The axes represent the variables t and j (see Figure 1) respectively. All calculated cells are framed in bold, and the optimal path is coloured grey.

[Dixon, 2005]
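
The effect of restricting the computation to a window can be illustrated with a simpler stand-in: a fixed diagonal band of half-width `width`. This is not Dixon's on-line algorithm, whose window moves adaptively as frames arrive; it is an offline sketch that only shows how the number of evaluated cells drops. The function name and the band shape are assumptions of this example, and the full matrix is still allocated here for readability.

```python
import math


def banded_dtw_distance(cost, width):
    """DTW distance restricted to cells with |n - m| <= width.

    Returns (distance, number of evaluated cells). The distance is infinite
    when the band does not connect the two corners.
    """
    n_len, m_len = len(cost), len(cost[0])
    inf = math.inf
    acc = [[inf] * m_len for _ in range(n_len)]
    evaluated = 0
    for n in range(n_len):
        for m in range(max(0, n - width), min(m_len, n + width + 1)):
            evaluated += 1
            if n == 0 and m == 0:
                best = 0.0
            else:
                best = min(
                    acc[n - 1][m] if n > 0 else inf,
                    acc[n][m - 1] if m > 0 else inf,
                    acc[n - 1][m - 1] if n > 0 and m > 0 else inf,
                )
            acc[n][m] = cost[n][m] + best
    return acc[-1][-1], evaluated


x, y = [1, 2, 3], [1, 2, 2, 3]
O = [[abs(a - b) for b in y] for a in x]
banded = banded_dtw_distance(O, width=1)
too_narrow = banded_dtw_distance(O, width=0)
```

With a band of fixed width the number of evaluated cells grows roughly linearly with the sequence length, at the price of missing alignments that stray outside the band.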

SLIDE 22

Onset-sensitive Alignment

§ We are sensitive to the time alignment on note onsets.

– The similarity matrix gives no additional weight to onsets

§ DLNCO Features

– Decaying Locally Adaptive Normalized Chroma Onset
– Capture only onset strength on chroma features
– Normalize onset energy and note length (by artificially-created note tail)


[Ewert, 2009]

SLIDE 23

Onset-sensitive Alignment


Score Following Results on the RWC dataset

Demo: https://www.audiolabs-erlangen.de/resources/MIR/SyncRWC60

SLIDE 24

DTW in Matlab

§ Check out: http://labrosa.ee.columbia.edu/matlab/dtw/

SLIDE 25

Beat Tracking using Dynamic Programming

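§ Find the optimal “hopping” path that accords with the onset detection function and the estimated tempo:

– $P(u)$ is the onset detection function
– $G(\Delta u, \upsilon)$ is the temporal consistency score:

$$G(\Delta u, \upsilon) = -\left(\log \frac{\Delta u}{\upsilon}\right)^2$$

§ Recast it as dynamic programming

– Maximize the following equation over the beat times $u_1, \dots, u_N$:

$$D(\{u_i\}) = \sum_{i=1}^{N} P(u_i) + \beta \sum_{i=2}^{N} G(u_i - u_{i-1}, \upsilon)$$

– The best score of a beat sequence ending at time $u$ follows the recursion below, where $u'$ ranges over candidate times of the previous beat:

$$D(u) = P(u) + \max_{u'} \{ \beta \, G(u - u', \upsilon) + D(u') \}$$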
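
The recursion can be sketched on a frame-indexed onset curve. Several choices below are assumptions made for the example and are not given on the slide: times are frame indices, `period` is the ideal beat spacing in frames, the previous beat is searched between half a period and two periods back, and `beta = 100.0` is an arbitrary weight. The name `track_beats` is hypothetical.

```python
import math


def track_beats(onset, period, beta=100.0):
    """Return beat frame indices maximizing onset strength plus tempo consistency.

    onset  : list of onset strengths P(u), one per frame
    period : ideal beat spacing in frames (must be at least 2)
    """
    n = len(onset)
    score = [0.0] * n  # D(u)
    prev = [-1] * n    # best previous beat for each frame, -1 if none
    for u in range(n):
        best, arg = None, -1
        # Assumed search range for the previous beat: [u - 2*period, u - period/2].
        for v in range(max(0, u - 2 * period), u - period // 2 + 1):
            g = -(math.log((u - v) / period)) ** 2  # temporal consistency G
            cand = beta * g + score[v]
            if best is None or cand > best:
                best, arg = cand, v
        score[u] = onset[u] + (best if best is not None else 0.0)
        prev[u] = arg
    # Trace back from the best-scoring frame.
    u = max(range(n), key=lambda i: score[i])
    beats = []
    while u != -1:
        beats.append(u)
        u = prev[u]
    beats.reverse()
    return beats


# Synthetic onset curve with an impulse every 10 frames.
onset_curve = [1.0 if i % 10 == 0 else 0.0 for i in range(50)]
beats = track_beats(onset_curve, period=10)
```

On this synthetic input the tracker returns the impulse positions; on real onset curves the result depends strongly on the tempo estimate and on the weight `beta`.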

SLIDE 26

Beat Tracking By DP in Matlab

§ Check out: http://www.ee.columbia.edu/ln/rosa/matlab/beat_simple/

SLIDE 27

Applications

§ Performance analysis

– Understand human performances

  • e.g. “In search of the Horowitz Factor” (G. Widmer, 2003)

– Performance evaluation for music education and entertainment

§ Interactive music notation system

– Score following: tracking notes or measures
– Automatic page turner
– Score-synchronized music listening

§ Auto-accompaniment

– Roger Dannenberg’s work
– IRCAM Antescofo
– Sonation Cadenza

SLIDE 28

Applications

[Figure: Interpretation Switcher and Score Viewer]

SLIDE 29

References

  • S. Dixon, “Live Tracking of Musical Performance Using On-line Time Warping”, 2005
  • G. Widmer, “In Search of the Horowitz Factor”, 2003
  • S. Ewert, “High Resolution Audio Synchronization Using Chroma Onset Features”, 2009