Audio Files Realignment by Dynamic Time Warping (DTW)

Florian Picard, Florian Tilquin
June 27, 2013

Outline: Introduction; Dynamic Time Warping; Audio files realignment; Speech recognition; Conclusion

What is DTW?
A matching algorithm that estimates similarity and establishes a pointwise correspondence between two time series U and V.
In sound processing: realignment of audio files; speech recognition.
Robust to time fluctuations (interpretation, rhythm, noise...).

Signal representation: spectrograms
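As an illustration of this representation, here is a minimal sketch of a magnitude spectrogram in pure Python. The frame length, hop size, Hann window and per-frame unit-norm normalization are assumptions made for the example, not the settings used by the authors; in practice an FFT library would replace the naive DFT. Each resulting frame is one vector of the sequences U and V used in the following slides.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of |DFT| values per windowed frame."""
    # Hann window (assumed choice) to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, keeping only the non-redundant bins 0..frame_len/2.
        column = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column.append(abs(acc))
        frames.append(column)
    return frames

def normalize(frames):
    """Scale each frame to unit Euclidean norm (one possible normalization)."""
    out = []
    for column in frames:
        norm = math.sqrt(sum(x * x for x in column))
        out.append([x / norm if norm > 0 else 0.0 for x in column])
    return out

# Toy signal: a pure tone falling exactly on DFT bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = normalize(spectrogram(tone))
```

Here `spec` is a list of frames, each a list of 33 non-negative values whose largest entry sits at the bin of the tone.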

Definition of a path
Let U and V be two sequences of vectors in $\mathbb{R}^n$, of size M and N respectively. We define a "path of size K" P as a sequence of K points in $\{1, \dots, M\} \times \{1, \dots, N\}$.
Figure: Example of path

Cost function and set of paths $\mathcal{P}$
Cost function: $W(P) = \sum_{k=1}^{K} \| U_{i(k)} - V_{j(k)} \|_2$, where $(i(k), j(k))$ is the k-th point of P.
Definition of $\mathcal{P}$: paths in $\mathcal{P}$ are continuous, monotonic and bounded.
Figure: (a) Continuity, monotonicity; (b) Example of path in $\mathcal{P}$
We want to minimize W over $\mathcal{P}$: $P^* = \arg\min_{P \in \mathcal{P}} W(P)$
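The cost function and the path constraints can be sketched as follows. The reading of the constraints is an assumption: "continuous and monotonic" is taken to mean that each step moves by (1, 0), (0, 1) or (1, 1), and "bounded" that the path starts at the first pair of frames and ends at the last one. The function names are hypothetical, and indices are 0-based in the code.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two frames (vectors of equal length)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def path_cost(U, V, path):
    """W(P): sum of the distances between the frames paired by the path."""
    return sum(euclidean(U[i], V[j]) for i, j in path)

def is_admissible(path, M, N):
    """Check the assumed constraints: fixed endpoints, unit steps, no going back."""
    if not path or path[0] != (0, 0) or path[-1] != (M - 1, N - 1):
        return False
    allowed = {(1, 0), (0, 1), (1, 1)}
    return all((i2 - i1, j2 - j1) in allowed
               for (i1, j1), (i2, j2) in zip(path, path[1:]))

# Toy one-dimensional sequences (made-up data).
U = [[0.0], [1.0], [2.0]]
V = [[0.0], [2.0]]
P = [(0, 0), (1, 0), (2, 1)]
cost = path_cost(U, V, P)          # |0-0| + |1-0| + |2-2|
valid = is_admissible(P, len(U), len(V))
```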

The DTW Algorithm
We denote by D the matrix of distances: $D = (\| U_i - V_j \|_2)_{i,j}$.
We define a cost matrix C by recursion: C(i, j) is the minimal cost of a path finishing at (i, j).
Figure: Construction of the cost matrix
Knowing C, we can easily deduce $P^*$.
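The slide does not spell out the recursion. A minimal sketch, assuming the classical one, C(i, j) = D(i, j) + min(C(i-1, j), C(i, j-1), C(i-1, j-1)), with the path recovered by backtracking from the last cell, could look like this (0-based indices, hypothetical function name):

```python
import math

def dtw(U, V):
    """Return (minimal cost, optimal path) under the classical DTW recursion."""
    M, N = len(U), len(V)
    INF = float("inf")

    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    D = [[dist(U[i], V[j]) for j in range(N)] for i in range(M)]
    C = [[INF] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    C[i - 1][j] if i > 0 else INF,
                    C[i][j - 1] if j > 0 else INF,
                    C[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
            C[i][j] = D[i][j] + best

    # Backtrack: from the last cell, move to the cheapest admissible predecessor.
    i, j = M - 1, N - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((C[i - 1][j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((C[i - 1][j], (i - 1, j)))
        if j > 0:
            candidates.append((C[i][j - 1], (i, j - 1)))
        _, (i, j) = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    path.reverse()
    return C[M - 1][N - 1], path

# Toy example: V is U with its first frame repeated (a short delay).
U = [[0.0], [1.0], [2.0], [3.0]]
V = [[0.0], [0.0], [1.0], [2.0], [3.0]]
total, path = dtw(U, V)
```

On this toy input the path absorbs the repeated frame at the start and then follows the diagonal.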

Variant: parameters α, β, γ, p, δ
α, β, γ put weights on the transitions of the path;
p is the percentage of coefficients actually computed in C;
δ controls the location of the end of the path [1].
Figure: Parameters α, β, γ, p, δ
[1] Considerations on DTW algorithm for spoken word recognition, L. Rabiner, A. Rosenberg, S. Levinson
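One possible way to weight the transitions is sketched below. It is an interpretation, not the authors' definition: the local distance is multiplied by α for a step in i only, by β for a diagonal step (a later slide calls β the diagonal transition parameter) and by γ for a step in j only. The assignment of α and γ to the two non-diagonal directions is an assumption, and the parameters p and δ are not modeled here.

```python
import math

def weighted_dtw_cost(U, V, alpha=1.0, beta=1.0, gamma=1.0):
    """Minimal path cost with per-transition weights (assumed semantics)."""
    M, N = len(U), len(V)
    INF = float("inf")

    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    C = [[INF] * N for _ in range(M)]
    C[0][0] = dist(U[0], V[0])  # starting cell, left unweighted
    for i in range(M):
        for j in range(N):
            if i == 0 and j == 0:
                continue
            d = dist(U[i], V[j])
            options = []
            if i > 0:
                options.append(C[i - 1][j] + alpha * d)      # step in i only
            if i > 0 and j > 0:
                options.append(C[i - 1][j - 1] + beta * d)   # diagonal step
            if j > 0:
                options.append(C[i][j - 1] + gamma * d)      # step in j only
            C[i][j] = min(options)
    return C[M - 1][N - 1]

# Toy data: with equal weights the diagonal is cheapest; penalizing it
# (beta = 3) makes a two-step detour preferable.
U = [[0.0], [2.0]]
V = [[1.0], [3.0]]
plain = weighted_dtw_cost(U, V)
penalized = weighted_dtw_cost(U, V, beta=3.0)
```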

Audio file realignment
We recorded 18 vinyl records (30-second extracts) into WAV, some of them altered (jump, acceleration...);
U = reference signal, V = tested signal;
We tested our algorithm on the normalized spectrograms.
Objectives
Use the path given by the algorithm in order to:
output a pointwise time correspondence between the two signals;
reconstruct the altered signal so that both signals can be played together.
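The two objectives can be illustrated at the level of spectrogram frames. This is only a sketch under assumptions: the path is a list of 0-based (i, j) pairs covering every reference frame, `hop_seconds` is a hypothetical frame spacing, and the "reconstruction" simply picks one test frame per reference frame. Turning the realigned frames back into audio is not covered.

```python
def time_correspondence(path, hop_seconds):
    """Pair each reference frame time with the mean time of its matched test frames."""
    matches = {}
    for i, j in path:
        matches.setdefault(i, []).append(j)
    return [(i * hop_seconds, hop_seconds * sum(js) / len(js))
            for i, js in sorted(matches.items())]

def realign(V, path, M):
    """Rebuild a test sequence with M frames, one per reference frame."""
    chosen = {}
    for i, j in path:
        chosen[i] = j  # keep the last test frame paired with reference frame i
    return [V[chosen[i]] for i in range(M)]

# Toy data: the test signal V starts one frame late compared with U.
U = [[0.0], [1.0], [2.0], [3.0]]
V = [[0.0], [0.0], [1.0], [2.0], [3.0]]
path = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)]

pairs = time_correspondence(path, hop_seconds=0.5)
aligned = realign(V, path, len(U))
```

After realignment, `aligned` has as many frames as the reference, so both sequences can be compared or played frame by frame.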

Influence of parameters
Figure: (a) Influence of the diagonal transition parameter β, here set to 2; (b) Influence of the truncation parameter p, here set to 0.5; (c) Influence of the boundary parameter δ, here set to 0.2

Isolated word recognition
DTW gives the minimal cost of a realignment between two audio files, which is independent of time fluctuations.
It provides a distance measure adapted to word recognition.
Data
13 subjects (3 females and 10 males);
31 words, some of them similar ("mène-mère", "irruption-éruption", "rateau-bateau"...).
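Using the DTW cost as a distance, a word can be recognized by comparing it with one reference recording per word and keeping the closest. The sketch below assumes this nearest-template rule and the classical recursion; the slides do not state the decision rule actually used, and the feature sequences are made-up toy data, not real spectrograms.

```python
import math

def dtw_cost(U, V):
    """Minimal DTW cost between two sequences of frames (classical recursion)."""
    M, N = len(U), len(V)
    INF = float("inf")
    # Extra row and column of infinities simplify the boundary handling.
    C = [[INF] * (N + 1) for _ in range(M + 1)]
    C[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(U[i - 1], V[j - 1])))
            C[i][j] = d + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    return C[M][N]

def recognize(test, templates):
    """Return the label of the template with the smallest DTW cost to `test`."""
    return min(templates, key=lambda word: dtw_cost(test, templates[word]))

# Hypothetical one-dimensional "features" for two similar words.
templates = {
    "bateau": [[0.0], [1.0], [2.0]],
    "rateau": [[5.0], [1.0], [2.0]],
}
# A slower pronunciation of the first template: frames are repeated.
test = [[0.0], [0.0], [1.0], [2.0], [2.0]]
detected = recognize(test, templates)
```

Because repeated frames cost nothing under DTW, the slower utterance still matches its own template exactly.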

Rematch matrix
Figure: match matrix between reference speakers (Sp1 to Sp13) and test speakers (ordered Sp10, Sp11, Sp12, Sp13, Sp4, Sp7, Sp9, Sp1, Sp2, Sp3, Sp5, Sp6, Sp8); color scale from good match to bad match

Confusion matrices
Figure: (a) Word confusion matrix (spoken word vs. detected word); (b) Speaker confusion matrix (speaker vs. detected speaker); color scale: frequency from 0 to 1
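A confusion matrix of this kind can be computed from pairs of spoken and detected labels. The sketch below assumes that rows are the spoken words, columns the detected words, and that each row is normalized to frequencies between 0 and 1; the labels and outcomes are invented for the example.

```python
def confusion_matrix(true_labels, predicted_labels, labels):
    """Row-normalized confusion matrix: rows = spoken, columns = detected."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for spoken, detected in zip(true_labels, predicted_labels):
        counts[index[spoken]][index[detected]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no sample stay at zero instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Made-up recognition outcomes for two similar words.
labels = ["mène", "mère"]
spoken = ["mène", "mène", "mère", "mère"]
detected = ["mène", "mère", "mère", "mère"]
matrix = confusion_matrix(spoken, detected, labels)
```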

Conclusion
What we were able to do during this internship:
carry out extensive bibliographical research on the subject and a detailed state of the art;
provide MATLAB code for the algorithm studied;
put it to the test with a large amount of data that we collected;
explore two applications of the algorithm in order to test its precision and its limits;
submit a research article on the subject, containing our results and observations, to the journal IPOL/SPOL.

Sets of records
3 sets of records, recorded in WAV (sample rate: 44100 Hz);
various modifications (jumps, accelerations, decelerations, offset...);
various origins (speech, classical music, instrumental music).
For each of those records, we managed to realign the altered records with the reference one.

Le concerto de la mer
Track   Modifications
1       Our reference
2       Acceleration at 10", deceleration at 20"
3       A jump in the record
4       With another vinyl disc of the same music
5       Played faster
6       Played louder
7       Played backward
8       Everything (except playing backward)
Table: Modifications on Le concerto de la mer

Mozart K550-1
Track   Modifications
1       Our reference
2       Deceleration at 15"
3       Acceleration at 15"
4       Played faster
5 & 6   With another vinyl disc of the same music
Table: Modifications on K.550-1

Le petit prince
Track   Modifications
1       Our reference
2 & 3   With another vinyl disc of the same music (huge offset)
4       Speed variations
Table: Modifications on Le petit prince

Effect of acceleration
We show here the result of a DTW between U = reference file (Mozart) and V = tested file (Mozart accelerated).
Figure: Change in speed (acceleration)

Speakers characteristics
Speakers were male and female, and some subjects spoke with an accent, heavy or slight.
Sp1     Male, no accent
Sp2     Male, no accent
Sp3     Male, no accent
Sp4     Female, no accent
Sp5     Male, no accent
Sp6     Male, no accent
Sp7     Male, no accent
Sp8     Male, no accent
Sp9     Male, heavy accent
Sp10    Male, slight accent
Sp11    Female, slight accent
Sp12    Female, no accent
Sp13    Male, no accent

Word list
1 Mène        9 Soleil      17 Mêler      25 Poteau
2 Usuel       10 Forme      18 Limonade   26 Vision
3 Sommeil     11 Mère       19 Tomate     27 Oseille
4 Camion      12 Canon      20 Oreiller   28 Même
5 Tome        13 Mireille   21 Groseille  29 Éruption
6 Mémoire     14 Bateau     22 Passoire   30 Télévision
7 Pareil      15 Homme      23 Rome       31 Rateau
8 Abricot     16 Irruption  24 Bravo
Table: List of the 31 words pronounced, in the order of pronunciation followed by all speakers.

Speaker to speaker matches
Figure: word-to-word match matrices over the 31 words; (a) Sp2 - Sp3; (b) Sp4 - Sp10; color scale from good match to bad match

Word recognition: path examples
Figure: (a) Mène (Sp2) - Même (Sp3); (b) Sommeil (Sp2) - Sommeil (Sp3); (c) Rateau (Sp2) - Usuel (Sp3)
