
Audio Files Realignment by Dynamic Time Warping (DTW)


  1. Audio Files Realignment by Dynamic Time Warping (DTW). Florian Picard, Florian Tilquin, June 27, 2013. Outline: Introduction; Dynamic Time Warping; Audio files realignment; Speech recognition; Conclusion.

  2. What is DTW? A matching algorithm that estimates similarity and establishes a pointwise correspondence between two time series U and V. In sound processing it is used for the realignment of audio files and for speech recognition. It is robust to time fluctuations (interpretation, rhythm, noise...).

  3. Signal representation: spectrograms

  5. Definition of a path. Let U and V be two sequences of vectors in $\mathbb{R}^n$, of length M and N respectively. We define a "path of size K", denoted P, as a sequence of K points in $\{1, \dots, M\} \times \{1, \dots, N\}$. Figure: Example of path.

  9. Cost function and set of paths $\mathcal{P}$. Cost function: $W(P) = \sum_{k=1}^{K} \| U_{i(k)} - V_{j(k)} \|_2$, where $(i(k), j(k))$ is the k-th point of the path P. Definition of $\mathcal{P}$: paths in $\mathcal{P}$ are continuous, monotonic and bounded. Figures: (a) Continuity, monotonicity; (b) Example of a path in $\mathcal{P}$. We want to minimize W over $\mathcal{P}$: $P^* = \arg\min_{P \in \mathcal{P}} W(P)$.
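
The cost W(P) and the path constraints can be checked on a toy example. This is only a sketch under stated assumptions: "continuous" is taken to mean that each step moves by at most one index in each sequence, "monotonic" that indices never decrease, and "bounded" that the path starts at (1, 1) and ends at (M, N). The function names are hypothetical and the slides do not spell out these exact conditions.

```python
import math

def euclidean(u, v):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def path_cost(U, V, path):
    """W(P): sum of frame distances along a path of 1-based (i, j) pairs."""
    return sum(euclidean(U[i - 1], V[j - 1]) for i, j in path)

def is_admissible(path, M, N):
    """Assumed reading of 'continuous, monotonic and bounded'."""
    if not path or path[0] != (1, 1) or path[-1] != (M, N):
        return False  # boundary condition (assumption)
    allowed_steps = {(1, 0), (0, 1), (1, 1)}  # monotonic, unit-sized steps
    return all((i2 - i1, j2 - j1) in allowed_steps
               for (i1, j1), (i2, j2) in zip(path, path[1:]))

# Toy one-dimensional "spectrogram" frames, purely illustrative.
U = [[0.0], [1.0], [2.0]]
V = [[0.0], [2.0]]
path = [(1, 1), (2, 1), (3, 2)]
print(path_cost(U, V, path), is_admissible(path, len(U), len(V)))
```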

  10. The DTW algorithm. We denote by D the matrix of distances: $D = (\| U_i - V_j \|_2)_{i,j}$. We define a cost matrix C by recursion: C(i, j) is the minimal cost of a path ending at (i, j). Figure: Construction of the cost matrix. Knowing C, we can easily deduce $P^*$.
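
A minimal sketch of this recursion and of the backtracking step, assuming the classical three-predecessor rule C(i, j) = D(i, j) + min(C(i-1, j), C(i, j-1), C(i-1, j-1)) with unit weights. The slides only show the construction as a figure, so the exact recursion used by the authors may differ, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw(U, V):
    """Return (minimal cost, optimal path) with 1-based (i, j) indices."""
    M, N = len(U), len(V)
    INF = math.inf
    # Row 0 and column 0 form an infinite border, so the recursion
    # needs no special cases on the first row and column.
    C = [[INF] * (N + 1) for _ in range(M + 1)]
    C[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            d = euclidean(U[i - 1], V[j - 1])
            C[i][j] = d + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    # Backtrack from (M, N) to (1, 1) through the cheapest predecessor.
    i, j = M, N
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min((C[i - 1][j - 1], i - 1, j - 1),
                      (C[i - 1][j], i - 1, j),
                      (C[i][j - 1], i, j - 1))
        path.append((i, j))
    path.reverse()
    return C[M][N], path

# Toy one-dimensional frames, purely illustrative.
cost, path = dtw([[0.0], [1.0], [2.0]], [[0.0], [2.0]])
print(cost, path)
```

The full matrix costs O(MN) time and memory, which is the motivation for computing only part of C on long recordings.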

  11. Variant: parameters α, β, γ, p, δ. α, β, γ put weights on the transitions of the path; p is the percentage of coefficients actually computed in C; δ controls the location of the end of the path [1]. Figure: Parameters α, β, γ, p, δ.
      [1] L. Rabiner, A. Rosenberg, S. Levinson, Considerations on DTW algorithm for spoken word recognition.
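
One possible reading of these parameters is sketched below. Everything beyond the parameter names is an assumption: α is applied to the step from (i-1, j), β to the diagonal step and γ to the step from (i, j-1); p is interpreted as the half-width of a band around the diagonal, as a fraction of N; δ is interpreted as the fraction of the last frames of V at which the path is allowed to end. The authors' actual definitions may differ.

```python
import math

def euclidean(u, v):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_variant(U, V, alpha=1.0, beta=1.0, gamma=1.0, p=1.0, delta=0.0):
    """Return (cost, end index j) under the assumed parameter semantics."""
    M, N = len(U), len(V)
    INF = math.inf
    C = [[INF] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            # Assumed band rule: skip cells too far from the diagonal.
            # If p is too small, the end may be unreachable (cost inf).
            if abs(j - i * N / M) > p * N:
                continue
            d = euclidean(U[i - 1], V[j - 1])
            if i == 1 and j == 1:
                C[i][j] = d  # the starting cell carries no transition weight
            else:
                C[i][j] = min(C[i - 1][j] + alpha * d,
                              C[i - 1][j - 1] + beta * d,
                              C[i][j - 1] + gamma * d)
    # Assumed end relaxation: the path may stop on any of the last
    # int(delta * N) + 1 frames of V, while still reaching frame M of U.
    first_end = max(1, N - int(delta * N))
    return min((C[M][j], j) for j in range(first_end, N + 1))

# Toy frames: V carries one extra, unrelated trailing frame.
U = [[0.0], [1.0]]
V = [[0.0], [1.0], [5.0]]
print(dtw_variant(U, V))             # forced to end at j = 3
print(dtw_variant(U, V, delta=0.4))  # may end one frame earlier
```

With the default values the function reduces to the unweighted recursion over the whole matrix.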

  12. Audio file realignment. We recorded 18 vinyl records (30-second extracts) into WAV, some of them altered (jump, acceleration...); U = reference signal, V = tested signal. We tested our algorithm on the normalized spectrograms. Objectives: use the path given by the algorithm in order to output a pointwise time correspondence between the two signals, and to reconstruct the altered signal so that both signals can be played together.
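
Both objectives can be illustrated from a path alone. The sketch below is not the authors' reconstruction method: it assumes frames spaced by a fixed hop (the hypothetical parameter `hop_seconds`), and it rebuilds the tested signal frame by frame by picking, for each reference frame, the middle one of the tested frames matched to it.

```python
def time_correspondence(path, hop_seconds):
    """Map each 1-based path point (i, j) to a pair of times in seconds."""
    return [((i - 1) * hop_seconds, (j - 1) * hop_seconds) for i, j in path]

def warp_to_reference(V_frames, path, M):
    """Resample the frames of V onto the M-frame time axis of U."""
    matches = {i: [] for i in range(1, M + 1)}
    for i, j in path:
        matches[i].append(j)
    warped = []
    for i in range(1, M + 1):
        js = matches[i]  # non-empty for any path covering every i
        warped.append(V_frames[js[len(js) // 2] - 1])
    return warped

# Toy path aligning a 3-frame reference with a 2-frame tested signal.
path = [(1, 1), (2, 1), (3, 2)]
print(time_correspondence(path, hop_seconds=0.5))
print(warp_to_reference(["a", "b"], path, M=3))
```

Picking a single frame per reference index is the simplest choice; averaging or interpolating the matched frames would be an alternative.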

  13. Influence of parameters. Figures: (a) Influence of the diagonal transition parameter β, here set to 2; (b) Influence of the truncation parameter p, here set to 0.5; (c) Influence of the boundary parameter δ, here set to 0.2.

  14. Isolated word recognition. DTW gives the minimal cost of a realignment between two audio files, which is independent of time fluctuations. It therefore provides a distance measure adapted to word recognition. Data: 13 subjects (3 females and 10 males); 31 words, some of them similar ("mène-mère", "irruption-éruption", "rateau-bateau"...).
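
Using the DTW cost as a distance, a word can be recognised by picking the closest reference recording. The sketch below assumes a simple nearest-template rule with unit-weight DTW; the slides do not state the decision rule actually used, and the templates are toy one-dimensional sequences rather than real spectrograms.

```python
import math

def euclidean(u, v):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_cost(U, V):
    """Minimal alignment cost between two sequences of feature vectors."""
    M, N = len(U), len(V)
    INF = math.inf
    C = [[INF] * (N + 1) for _ in range(M + 1)]
    C[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            d = euclidean(U[i - 1], V[j - 1])
            C[i][j] = d + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    return C[M][N]

def recognise(query, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda word: dtw_cost(query, templates[word]))

# Hypothetical templates, one per word from a reference speaker.
templates = {
    "bateau": [[0.0], [1.0], [2.0]],
    "rateau": [[5.0], [6.0], [7.0]],
}
# Same shape as "bateau", but spoken more slowly.
query = [[0.0], [0.0], [1.0], [2.0], [2.0]]
print(recognise(query, templates))
```

The slower query still matches its template at zero cost, which is the robustness to time fluctuations described above.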

  15. Rematch matrix. Figure: match matrix between reference speakers (Sp1 to Sp13) and test speakers (Sp1 to Sp13), on a color scale from good match to bad match.

  16. Confusion matrices. Figures: (a) Word confusion matrix (spoken word against detected word); (b) Speaker confusion matrix (speaker against detected speaker). Color scale: frequency, from 0 to 1.

  17. Conclusion. What we were able to do during this internship: carry out extensive bibliographical research on the subject and a detailed state of the art; provide MATLAB code for the algorithm studied; put it to the test on a large amount of data that we collected; explore two applications of the algorithm in order to test its precision and its limits; submit a research article on the subject, containing our results and observations, to the journal IPOL/SPOL.

  18. Sets of records. 3 sets of records, recorded in WAV (sample rate: 44100 Hz); various modifications (jumps, accelerations, decelerations, offset...); various origins (speech, classical music, instrumental music). For each of these records, we managed to realign the altered records with the reference one.

  19. Le concerto de la mer
      Track  Modifications
      1      Our reference
      2      Acceleration at 10", deceleration at 20"
      3      A jump in the record
      4      With another vinyl disc of the same music
      5      Played faster
      6      Played louder
      7      Played backward
      8      Everything (except playing backward)
      Table: Modifications on Le concerto de la mer

  20. Mozart K550-1
      Track  Modifications
      1      Our reference
      2      Deceleration at 15"
      3      Acceleration at 15"
      4      Played faster
      5 & 6  With another vinyl disc of the same music
      Table: Modifications on K.550-1

  21. Le petit prince
      Track  Modifications
      1      Our reference
      2 & 3  With another vinyl disc of the same music (huge offset)
      4      Speed variations
      Table: Modifications on Le petit prince

  22. Effect of acceleration. We show here the result of a DTW between U = reference file (Mozart) and V = tested file (Mozart accelerated). Figure: Change in speed (acceleration).

  23. Speakers' characteristics. Speakers were male and female, and some subjects speak with an accent, heavy or slight.
      Sp1   Male, no accent
      Sp2   Male, no accent
      Sp3   Male, no accent
      Sp4   Female, no accent
      Sp5   Male, no accent
      Sp6   Male, no accent
      Sp7   Male, no accent
      Sp8   Male, no accent
      Sp9   Male, heavy accent
      Sp10  Male, slight accent
      Sp11  Female, slight accent
      Sp12  Female, no accent
      Sp13  Male, no accent

  24. Word list
      1 Mène       9 Soleil      17 Mêler      25 Poteau
      2 Usuel     10 Forme       18 Limonade   26 Vision
      3 Sommeil   11 Mère        19 Tomate     27 Oseille
      4 Camion    12 Canon       20 Oreiller   28 Même
      5 Tome      13 Mireille    21 Groseille  29 Éruption
      6 Mémoire   14 Bateau      22 Passoire   30 Télévision
      7 Pareil    15 Homme       23 Rome       31 Rateau
      8 Abricot   16 Irruption   24 Bravo
      Table: List of the 31 words pronounced, in the order of pronunciation followed by all speakers.

  25. Speaker to speaker matches. Figures: word-to-word match matrices (words 1 to 31 on each axis), on a color scale from good match to bad match: (a) Sp2 - Sp3 (Words (Sp2) against Words (Sp3)); (b) Sp4 - Sp10 (Words (Sp4) against Words (Sp10)).

  26. Word recognition: path examples. Figures: (a) Mène (Sp2) - Même (Sp3); (b) Sommeil (Sp2) - Sommeil (Sp3); (c) Rateau (Sp2) - Usuel (Sp3).
