
GCT535 Sound Technology for Multimedia: Music and Audio Alignment

GCT535 Sound Technology for Multimedia: Music and Audio Alignment. Graduate School of Culture Technology, KAIST. Juhan Nam. Outline: Musical Representations (Score, Audio, MIDI); Music and Audio Alignment (Synchronization Framework)


  1. GCT535: Sound Technology for Multimedia. Music and Audio Alignment. Graduate School of Culture Technology, KAIST. Juhan Nam

  2. Outline § Musical Representations – Score, Audio, MIDI § Music and Audio Alignment – Synchronization Framework – Dynamic Time Warping – Dynamic Programming

  3. Music Representations § Score – Abstract symbols of musical events § Audio – Concrete (actual) renditions of the score as sound § MIDI – A series of events • Note messages: note on/off (onset and offset), note number, note velocity • Control messages: pedal on/off, pitch wheel, modulation, … – Can be either a score-like abstract event sequence or a recording of note/control events from an actual performance

  4. Symbols and Performances § MIDI (score) § Valentina Lisitsa § Vladimir Horowitz

  5. Where do the differences come from? § Musical expressions – Temporal: ritardando, rubato – Dynamics: piano, forte, crescendo, … – Playing techniques: legato, staccato – Mood and emotion: dolce, grazioso § Different styles of performers – Temporal: tempo (global) and note onset/offset timings (local) – Dynamics § Moreover… – Variation in key, rhythm, chord, melody, instrumentation (e.g. cover songs) – Tuning

  6. Music Synchronization § Temporally align different representations of a piece of music – Audio to Audio – Audio to Score § Why do we synchronize them? – Score following – Auto-accompaniment – Related • Variable time-stretching • Audio classification [from M. Müller’s book]

  7. Synchronization Framework § Choose feature representations to compare – Often, MIDI is converted to audio so that alignment happens in the same feature space § Compute a similarity matrix between the two feature sequences – All possible combinations of local feature pairs § Find the path that gives the best alignment on the similarity matrix – Dynamic Time Warping (DTW) [Diagram: Feature Seq. #1 and Feature Seq. #2 → Similarity Matrix (compute local similarity) → Dynamic Programming (find the best path)]

  8. Feature Representations § Frequent choices of audio feature representations – Spectrogram, Chroma, MFCC, … [Figure: CENS (normalized chroma) features of the MIDI and Lisitsa versions (Müller, 2005)]

  9. Similarity Matrix § M by N matrix – The (i, j) element is the similarity between the i-th vector of an M-long feature sequence and the j-th vector of an N-long feature sequence
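
The construction above can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity as the local measure (the slide does not name one) and feature sequences stored as lists of equal-length vectors; the function names and the toy data are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(seq_a, seq_b):
    """Return an M-by-N matrix S where S[i][j] compares seq_a[i] with seq_b[j]."""
    return [[cosine_similarity(x, y) for y in seq_b] for x in seq_a]

# Hypothetical toy features: 3 frames vs. 4 frames of 2-dimensional vectors.
seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = similarity_matrix(seq_a, seq_b)
```

For DTW, which minimizes cost, such a similarity is usually turned into a local cost, for example `1 - S[i][j]`.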

  10. Finding the Optimal Path § You can move in only three directions – Up, right, diagonal § The number of possible paths for an M by N matrix is ??? [Figure: similarity matrix; vertical axis: Schumann − Traumerei − MIDI, horizontal axis: Schumann − Traumerei − Lisitsa]
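
One way to get a feel for the question on this slide is to count the paths directly. The sketch below assumes paths start at cell (1, 1) and end at cell (M, N) and uses only the three moves listed; the function name is hypothetical. The resulting counts are known as Delannoy numbers.

```python
def count_warping_paths(M, N):
    """Count paths from cell (1, 1) to cell (M, N) using up, right, and diagonal steps."""
    # counts[i][j] holds the number of paths that reach cell (i + 1, j + 1).
    counts = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            if i == 0 or j == 0:
                counts[i][j] = 1  # only one way along the first row or column
            else:
                counts[i][j] = (counts[i - 1][j]        # arrived by one step along i
                                + counts[i][j - 1]      # arrived by one step along j
                                + counts[i - 1][j - 1])  # arrived diagonally
    return counts[M - 1][N - 1]

for size in (2, 3, 4, 10):
    print(size, count_warping_paths(size, size))
```

The count already exceeds a million for a 10 by 10 matrix, which is why enumerating every path is impractical for matrices with hundreds of frames per side.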

  11. 3D Surface Plot of Similarity Matrix § Finding the optimal path is like finding the hiking trail that takes the least effort.

  12. Dynamic Time Warping § Finding an (N, M)-warping path of length L – P = (p1, p2, p3, …, pL) where pi = (ni, mi) § Three conditions – Boundary condition: p1 = (1, 1), pL = (N, M) – Monotonicity condition • n1 <= n2 <= … <= nL • m1 <= m2 <= … <= mL – Step size condition • Move only upward, rightward, or diagonally (upper-right)
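
The three conditions can be checked mechanically. The sketch below assumes a path is given as a list of 1-based (n, m) tuples; the function name is hypothetical. Because each allowed step increases n or m by 0 or 1 (and never both by 0), passing the step size check also guarantees monotonicity.

```python
def is_warping_path(path, N, M):
    """Check the boundary, monotonicity, and step size conditions for a list of (n, m) pairs."""
    # Boundary condition: start at (1, 1) and end at (N, M).
    if not path or path[0] != (1, 1) or path[-1] != (N, M):
        return False
    # Step size condition: each move changes (n, m) by (1, 0), (0, 1), or (1, 1).
    # This implies the monotonicity condition, so no separate check is needed.
    allowed_steps = {(1, 0), (0, 1), (1, 1)}
    for (n0, m0), (n1, m1) in zip(path, path[1:]):
        if (n1 - n0, m1 - m0) not in allowed_steps:
            return False
    return True

print(is_warping_path([(1, 1), (2, 2), (2, 3), (3, 4)], N=3, M=4))  # valid path
print(is_warping_path([(1, 1), (3, 4)], N=3, M=4))                  # skips cells
```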

  13. Dynamic Time Warping: Bad Examples

  14. Dynamic Programming § Finding the minimum-cost-path § Naïve approach – Find all paths from A to K and calculate the cost for each – Choose the path that has the minimum cost. – However, as the number of nodes increases, the number of paths increases exponentially. [Figure: weighted graph with nodes A, B, C, D, E, F, G, H, I, J, K]

  15. Dynamic Programming § Observation – Say the minimum-cost-path passes through a node p. – What is the minimum-cost-path from A to p? – It is just a sub-path of the minimum-cost-path from A to K. – Thus, we don’t have to compute the cost from scratch; we can reuse the cost computed for the previous nodes. [Figure: weighted graph with nodes A, B, C, D, E, F, G, H, I, J, K]

  16. Dynamic Programming § The minimum cost is computed by the following equation: $C_k(j) = O_k(j) + \min_i \{ C_{k-1}(i) + c_{ij} \}$ – $C_k(j)$: cost up to node j – $O_k(j)$: local cost at node j – $c_{ij}$: transition cost from i to j § The minimum-cost-path can be found by tracing back the computation [Figure: weighted graph with nodes A, B, C, D, E, F, G, H, I, J, K]
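
The recurrence and the trace-back step can be sketched on a small staged graph. The layout (nodes grouped into stages k = 0, 1, 2, …), the function name, and all cost values below are hypothetical; they are not the weights from the slide's figure, which cannot be recovered from this transcript.

```python
def min_cost_path(local, trans):
    """
    local[k][j]: local cost O_k(j) of node j in stage k.
    trans[k][i][j]: transition cost c_ij from node i in stage k to node j in stage k + 1.
    Returns (minimum total cost, list of chosen node indices, one per stage).
    """
    cost = [list(local[0])]  # C_0(j) = O_0(j)
    back = []                # back[k - 1][j] = best predecessor of node j in stage k
    for k in range(1, len(local)):
        stage_cost, stage_back = [], []
        for j in range(len(local[k])):
            # C_{k-1}(i) + c_ij for every predecessor i
            candidates = [cost[k - 1][i] + trans[k - 1][i][j]
                          for i in range(len(local[k - 1]))]
            best_i = min(range(len(candidates)), key=candidates.__getitem__)
            stage_cost.append(local[k][j] + candidates[best_i])
            stage_back.append(best_i)
        cost.append(stage_cost)
        back.append(stage_back)
    # Trace back from the cheapest node in the final stage.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = [j]
    for stage_back in reversed(back):
        j = stage_back[j]
        path.append(j)
    path.reverse()
    return total, path

# Hypothetical 3-stage graph: one start node, two middle nodes, one end node.
local = [[0], [1, 0], [0]]
trans = [[[2, 5]], [[6], [1]]]
total, path = min_cost_path(local, trans)
print(total, path)
```

Each stage reuses the costs stored for the previous stage, so the work grows with the number of edges rather than with the number of paths.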

  17. DP for Dynamic Time Warping (DTW) § Algorithm – Initialization: C(n,1) = sum(O(1:n,1)), n = 1…N; C(1,m) = sum(O(1,1:m)), m = 1…M – Recurrence relation: for each m = 2…M, for each n = 2…N: C(n,m) = O(n,m) + min{ C(n-1,m), C(n,m-1), C(n-1,m-1) } – Termination: C(N,M) is the distance
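
The algorithm above can be sketched as follows, with back-tracking added to recover the path. This is a minimal illustration, assuming the local cost matrix O is a list of lists; the function name, the toy sequences, and the absolute-difference local cost are hypothetical choices. Indices are 0-based internally and converted to the slide's 1-based (n, m) pairs in the returned path.

```python
def dtw(O):
    """
    O: N-by-M local cost matrix. Returns (distance, path), where path is a list of
    1-based (n, m) pairs from (1, 1) to (N, M).
    """
    N, M = len(O), len(O[0])
    C = [[0.0] * M for _ in range(N)]
    # Initialization: cumulative sums along the first column and first row.
    C[0][0] = O[0][0]
    for n in range(1, N):
        C[n][0] = C[n - 1][0] + O[n][0]
    for m in range(1, M):
        C[0][m] = C[0][m - 1] + O[0][m]
    # Recurrence: local cost plus the cheapest of the three predecessors.
    for n in range(1, N):
        for m in range(1, M):
            C[n][m] = O[n][m] + min(C[n - 1][m], C[n][m - 1], C[n - 1][m - 1])
    # Back-tracking from (N, M) to (1, 1); ties prefer the diagonal predecessor.
    n, m = N - 1, M - 1
    path = [(n + 1, m + 1)]
    while (n, m) != (0, 0):
        if n == 0:
            m -= 1
        elif m == 0:
            n -= 1
        else:
            candidates = [(C[n - 1][m - 1], n - 1, m - 1),
                          (C[n - 1][m], n - 1, m),
                          (C[n][m - 1], n, m - 1)]
            _, n, m = min(candidates, key=lambda c: c[0])
        path.append((n + 1, m + 1))
    path.reverse()
    return C[N - 1][M - 1], path

# Hypothetical toy sequences with absolute difference as the local cost.
x = [1, 2, 3, 3]
y = [1, 2, 2, 3]
O = [[abs(a - b) for b in y] for a in x]
distance, path = dtw(O)
print(distance, path)
```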

  18. DP for Dynamic Time Warping (DTW) § Toy Example

  19. Score and Audio Alignment by DTW [Figure: accumulated cost C(i,j) and local cost O(i,j)]

  20. Limitations § The optimal path is obtained only after we arrive at the destination (by backtracking) – i.e. DTW works offline – What if the sequences are very long? – Online version of DTW? § DTW treats every frame as equally important – In general, humans are more sensitive to note onsets – Perceptually, not every frame is equally important

  21. Online DTW § Set a moving search window and calculate the cost only within the window – Time and space cost: quadratic → linear § The movement is determined by the position that gives the minimum cost within the current window. If the position is ... – Corner: move both up and right (alternatively) – Upper edge: move up – Right edge: move right [Figure 2: An example of the on-line time warping algorithm with search window c = 4, showing the order of evaluation for a particular sequence of row and column increments. The axes represent the variables t and j (see Figure 1) respectively. All calculated cells are framed in bold, and the optimal path is coloured grey. (Dixon, 2005)]

  22. Onset-sensitive Alignment § We are sensitive to the time alignment of note onsets. – The similarity matrix gives no additional weight to onsets § DLNCO Features – Decaying Locally-adapted Normalized Chroma Onset – Capture only onset strength on chroma features – Normalize onset energy and note length (by artificially-created note tail) [Ewert, 2009]

  23. Onset-sensitive Alignment § Demo: https://www.audiolabs-erlangen.de/resources/MIR/SyncRWC60 § Score following results on the RWC dataset

  24. DTW in Matlab § Check out: http://labrosa.ee.columbia.edu/matlab/dtw/

  25. Beat Tracking using Dynamic Programming § Find the optimal “hopping” path that accords with the onset detection function and the estimated tempo: $D(\{u_i\}) = \sum_{i=1}^{N} P(u_i) + \beta \sum_{i=2}^{N} G(u_i - u_{i-1}, \upsilon)$ – $P(u)$ is the onset detection function – $G(\Delta u, \upsilon)$ is the temporal consistency score: $G(\Delta u, \upsilon) = -\left(\log \frac{\Delta u}{\upsilon}\right)^2$ § Recast it as a dynamic programming problem – Maximize the following equation: $D(u) = P(u) + \max_{u'} \{ \beta\, G(u - u', \upsilon) + D(u') \}$, where $u'$ ranges over candidate previous beat times
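
The recursion can be sketched as a forward pass plus a trace-back. This is a simplified illustration, not the referenced Matlab code: it interprets υ as the beat period in frames, restricts candidate previous beats to between half a period and two periods back, and starts the trace-back from the best score in the final period. All three are assumptions, and the function name, default weight, and test signal are hypothetical.

```python
import math

def track_beats(onset, period, beta=100.0):
    """
    onset: list of onset detection function values P(u), one per frame.
    period: estimated beat period in frames (assumed meaning of the tempo term).
    beta: weight of the temporal consistency score G.
    Returns the selected beat positions as frame indices.
    """
    if period < 2:
        raise ValueError("period must be at least 2 frames")
    n = len(onset)
    if n == 0:
        return []
    score = list(onset)  # D(u); stays equal to P(u) when no previous beat is reachable
    back = [-1] * n      # best previous beat for each frame, -1 if none
    for u in range(n):
        lo = max(0, u - 2 * period)
        hi = u - period // 2  # assumed search range: half to two periods back
        best, best_prev = None, -1
        for prev in range(lo, hi + 1):
            g = -(math.log((u - prev) / period) ** 2)  # temporal consistency G
            candidate = beta * g + score[prev]
            if best is None or candidate > best:
                best, best_prev = candidate, prev
        if best is not None:
            score[u] = onset[u] + best
            back[u] = best_prev
    # Trace back from the best-scoring frame within the last period.
    u = max(range(max(0, n - period), n), key=score.__getitem__)
    beats = []
    while u != -1:
        beats.append(u)
        u = back[u]
    beats.reverse()
    return beats

# Hypothetical onset function: an impulse every 10 frames.
onset = [1.0 if u % 10 == 0 else 0.0 for u in range(50)]
print(track_beats(onset, period=10))
```

A larger `beta` makes the tracker hold more rigidly to the estimated period; a smaller value lets it follow strong onsets that deviate from it.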

  26. Beat Tracking by DP in Matlab § Check out: http://www.ee.columbia.edu/ln/rosa/matlab/beat_simple/

  27. Applications § Performance analysis – Understand human performances • e.g. “In search of the Horowitz Factor” (G. Widmer, 2003) – Performance evaluation for music education and entertainment § Interactive music notation system – Score following: tracking notes or measures – Automatic page turner – Score-synchronized music listening § Auto-accompaniment – Roger Dannenberg’s work – IRCAM Antescofo – Sonation Cadenza

  28. Applications [Figures: Interpretation Switcher; Score viewer]

  29. References • S. Dixon, “Live Tracking of Musical Performance Using On-line Time Warping”, 2005 • G. Widmer, “In Search of the Horowitz Factor”, 2003 • S. Ewert, “High Resolution Audio Synchronization Using Chroma Onset Features”, 2009
