Audio Files Realignment by Dynamic Time Warping (DTW)

SLIDE 1

Introduction Dynamic Time Warping Audio files realignment Speech recognition Conclusion

Audio Files Realignment by Dynamic Time Warping (DTW)

Florian Picard, Florian Tilquin June 27, 2013

SLIDE 2

What is DTW?

A matching algorithm that estimates the similarity between two time series U and V and establishes a pointwise correspondence between them.

In sound processing, it is used for:

  • realignment of audio files;
  • speech recognition.

It is robust to time fluctuations (interpretation, rhythm, noise...).

SLIDE 3

Signal representation: spectrograms
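
As an illustration of this representation, here is a minimal NumPy sketch of a magnitude spectrogram. It is not the authors' Matlab code: the frame length, hop size, Hann window and per-frame normalization are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram: one row per frequency bin, one column per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[k * hop : k * hop + frame_len] * window
                       for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def normalize(spec, eps=1e-12):
    """Scale every frame (column) to unit Euclidean norm (assumed normalization)."""
    return spec / (np.linalg.norm(spec, axis=0, keepdims=True) + eps)

# One second of a 440 Hz tone sampled at 44100 Hz
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
S = normalize(spectrogram(tone))
```

Each column of `S` is then one vector of the sequence U (or V) that DTW compares.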

SLIDE 5

Definition of a path

Let $U$ and $V$ be two sequences of vectors in $\mathbb{R}^n$, of sizes $M$ and $N$ respectively. We define a "path of size $K$", $P$, as a sequence of $K$ points in $\{1, \dots, M\} \times \{1, \dots, N\}$: $P = \big((i(k), j(k))\big)_{1 \le k \le K}$.

Figure: Example of a path

SLIDE 9

Cost function and set of paths $\mathcal{P}$

Cost function:

$$W(P) = \sum_{k=1}^{K} \lVert U_{i(k)} - V_{j(k)} \rVert_2$$

Definition of $\mathcal{P}$: paths in $\mathcal{P}$ are continuous, monotonic and bounded.

Figure: (a) Continuity and monotonicity; (b) Example of a path in $\mathcal{P}$

We want to minimize $W$ over $\mathcal{P}$: $P^* = \arg\min_{P \in \mathcal{P}} W(P)$
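
To make the cost and the constraints concrete, here is a small Python sketch. It takes the Euclidean norm as the local distance and reads the three constraints in the usual way (steps of at most one index in each direction, never backwards, from the first pair to the last pair); that reading, the 0-based indices and the function names are assumptions of this example.

```python
import numpy as np

def path_cost(U, V, path):
    """W(P): sum over the path of the distance between matched vectors."""
    return sum(np.linalg.norm(U[i] - V[j]) for i, j in path)

def is_admissible(path, M, N):
    """Check boundedness, continuity and monotonicity (0-based indices)."""
    bounded = path[0] == (0, 0) and path[-1] == (M - 1, N - 1)
    steps = [(i2 - i1, j2 - j1) for (i1, j1), (i2, j2) in zip(path, path[1:])]
    # Continuity and monotonicity: advance by at most one index, never backwards
    return bounded and all(s in {(1, 0), (0, 1), (1, 1)} for s in steps)

U = np.array([[0.0], [1.0], [2.0]])
V = np.array([[0.0], [2.0]])
P = [(0, 0), (1, 0), (2, 1)]
cost = path_cost(U, V, P)          # |0 - 0| + |1 - 0| + |2 - 2|
valid = is_admissible(P, 3, 2)
```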

SLIDE 10

The DTW Algorithm

We denote by $D$ the matrix of distances: $D(i, j) = \lVert U_i - V_j \rVert_2$. We define a cost matrix $C$ by recursion: $C(i, j)$ is the minimal cost of a path finishing at $(i, j)$.

Figure: Construction of the cost matrix

Knowing $C$, we can easily deduce $P^*$.
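
The construction of C and the recovery of the optimal path can be sketched as follows. This is the textbook unweighted recursion with three allowed transitions, written in Python rather than the authors' Matlab; the function name `dtw` and the 0-based indices are choices made for this example.

```python
import numpy as np

def dtw(U, V):
    """Return (minimal cost, optimal path) for sequences U (M x n) and V (N x n)."""
    M, N = len(U), len(V)
    # D[i, j] = Euclidean distance between U[i] and V[j]
    D = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=2)
    # C has an extra border of +inf so that every cell has three predecessors
    C = np.full((M + 1, N + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    # Backtrack from the end of both sequences, always moving to the cheapest
    # predecessor (the diagonal one is preferred when costs are tied)
    i, j, path = M, N, []
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda m: C[m])
    return C[M, N], path[::-1]

U = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
V = np.array([0.0, 0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)   # first sample repeated
cost, path = dtw(U, V)
```

Here V is U with its first sample held twice, so the cost is zero and the path starts with one step along V before following the diagonal.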

SLIDE 11

Variant: parameters α, β, γ, p, δ

  • α, β, γ put weights on the transitions of the path;
  • p is the percentage of coefficients actually computed in C;
  • δ controls the location of the end of the path [1].

Figure: Parameters α, β, γ, p, δ
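
One common way to weight the transitions is shown below. The slides only state that β is attached to the diagonal transition; assigning α to the step in $i$ and γ to the step in $j$, and multiplying the local distance by the weight, are assumptions of this sketch.

```latex
C(i, j) = \min
\begin{cases}
  C(i-1, j)   + \alpha \, D(i, j), \\
  C(i-1, j-1) + \beta  \, D(i, j), \\
  C(i, j-1)   + \gamma \, D(i, j),
\end{cases}
\qquad C(1, 1) = D(1, 1).
```

With $\alpha = \beta = \gamma = 1$ this reduces to the unweighted recursion.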

[1] Considerations on DTW algorithm for spoken word recognition, L. Rabiner, A. Rosenberg, S. Levinson

SLIDE 12

Audio file realignment

  • We recorded 18 vinyl records (extracts of 30 seconds) into wav, some of them altered (jump, acceleration...);
  • U = reference signal, V = tested signal;
  • We tested our algorithm on the normalized spectrograms.

Objectives

Use the path given by the algorithm in order to:

  • output a pointwise time correspondence between the two signals;
  • reconstruct the altered signal to play both signals together.
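
The first objective amounts to converting frame indices along the path into times. A minimal sketch, assuming each spectrogram frame advances by `hop` samples (a hypothetical parameter) and that every reference frame appears at least once in the path:

```python
import numpy as np

def time_correspondence(path, hop, sample_rate=44100):
    """Map each matched frame pair (i, j) to a pair of times in seconds."""
    return np.asarray(path, dtype=float) * hop / sample_rate

def warp_times(path, hop, n_ref_frames, sample_rate=44100):
    """For every reference frame, the time in the tested signal it is matched to.

    If a reference frame is matched to several tested frames, their mean is used.
    """
    matched = [[] for _ in range(n_ref_frames)]
    for i, j in path:
        matched[i].append(j)
    return np.array([np.mean(js) for js in matched]) * hop / sample_rate

path = [(0, 0), (1, 1), (1, 2), (2, 3)]
pairs = time_correspondence(path, hop=441)        # 441 samples = 0.01 s per frame
warped = warp_times(path, hop=441, n_ref_frames=3)
```

Reconstructing the altered signal would then resample V at the times in `warped`; that step is not shown here.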

SLIDE 13

Influence of parameters

Figure: (a) Influence of the diagonal transition parameter β, here set to 2; (b) Influence of the truncation parameter p, here set to 0.5; (c) Influence of the boundary parameter δ, here set to 0.2

SLIDE 14

Isolated word recognition

DTW gives the minimal cost of a realignment between two audio files, which is independent of time fluctuations. It therefore provides a distance measure adapted to word recognition.

Data

  • 13 subjects (3 females and 10 males);
  • 31 words, some of them similar ("mène-mère", "irruption-éruption", "rateau-bateau"...).
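
One simple way to use this distance for recognition is to return the reference word with the smallest DTW cost. The sketch below assumes that nearest-reference rule and uses toy one-dimensional sequences in place of spectrogram frames; the helper names are hypothetical.

```python
import numpy as np

def dtw_cost(U, V):
    """Minimal DTW cost between two sequences of feature vectors (rows)."""
    D = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=2)
    C = np.full((len(U) + 1, len(V) + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, len(U) + 1):
        for j in range(1, len(V) + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return C[-1, -1]

def recognize(test, references):
    """Return the label of the reference word with the smallest DTW cost."""
    return min(references, key=lambda word: dtw_cost(test, references[word]))

# Toy one-dimensional "words"; real inputs would be spectrogram frames
references = {
    "bateau": np.array([[0.0], [1.0], [2.0], [1.0]]),
    "rateau": np.array([[3.0], [1.0], [2.0], [1.0]]),
}
# The same shape as "bateau", pronounced more slowly
slow_bateau = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0]])
detected = recognize(slow_bateau, references)
```

The slowed-down sequence still aligns with "bateau" at zero cost, which is the robustness to time fluctuations the slide refers to.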

SLIDE 15

Rematch matrix

Figure: Rematch matrix. Axes: reference speaker and test speaker (Sp1 to Sp13); color scale from bad match to good match.

SLIDE 16

Confusion matrices

Figure: (a) Word confusion matrix (axes: spoken word, detected word); (b) Speaker confusion matrix (axes: speaker, detected speaker). Color scale: frequency.
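
Such a matrix can be built from pairs of (true, detected) labels. The sketch below assumes integer class indices and normalizes each row to frequencies; the sample labels are made up for illustration and are not results from the study.

```python
import numpy as np

def confusion_matrix(spoken, detected, n_classes):
    """Row-normalized confusion matrix: entry (s, d) is the frequency with
    which class s was detected as class d."""
    counts = np.zeros((n_classes, n_classes))
    for s, d in zip(spoken, detected):
        counts[s, d] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with no trial stay at zero instead of dividing by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Illustrative labels only
spoken = [0, 0, 1, 1, 2, 2]
detected = [0, 0, 1, 2, 2, 2]
F = confusion_matrix(spoken, detected, 3)
```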

SLIDE 17

Conclusion

What we were able to do during this internship:

  • carry out an extended bibliographical search on the subject and a detailed state of the art;
  • provide Matlab code for the algorithm studied;
  • put it to the test with a large amount of data that we collected;
  • explore two applications of the algorithm in order to test its precision and its limits;
  • submit a research article on the subject, containing our results on observations, to the journal IPOL/SPOL.

SLIDE 18

Sets of records

  • 3 sets of records, recorded in wav (sample rate: 44100 Hz);
  • Various modifications (jumps, accelerations, decelerations, offset...);
  • Various origins (speech, classical music, instrumental music).

For each of those records, we managed to realign the altered records with the reference one.

SLIDE 19

Le concerto de la mer

Track  Modifications
1      Our reference
2      Acceleration at 10", deceleration at 20"
3      A jump in the record
4      With another vinyl disc of the same music
5      Played faster
6      Played louder
7      Played backward
8      Everything (except playing backward)

Table: Modifications on Le concerto de la mer

SLIDE 20

Mozart K550-1

Track  Modifications
1      Our reference
2      Deceleration at 15"
3      Acceleration at 15"
4      Played faster
5 & 6  With another vinyl disc of the same music

Table: Modifications on Mozart K550-1

SLIDE 21

Le petit prince

Track  Modifications
1      Our reference
2 & 3  With another vinyl disc of the same music (huge offset)
4      Speed variations

Table: Modifications on Le petit prince

SLIDE 22

Effect of acceleration

We show here the result of a DTW between U = reference file (Mozart), and V = tested file (Mozart accelerated).

Figure: Change in speed (acceleration)
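
A change in speed shows up as a change in the slope of the path. As a rough sketch, the overall slope can be estimated with a least-squares line fit; the path below is hypothetical and stands for a tested file played twice as fast as the reference, not for the Mozart result itself.

```python
import numpy as np

def speed_ratio(path):
    """Least-squares slope of the path: tested frames per reference frame."""
    i, j = np.asarray(path, dtype=float).T
    slope, _ = np.polyfit(i, j, 1)
    return slope

# Hypothetical path: two reference frames are matched to each tested frame
path = [(i, i // 2) for i in range(100)]
ratio = speed_ratio(path)   # close to 0.5
```

A slope below 1 means the tested file advances through fewer frames than the reference, that is, it was accelerated.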

SLIDE 23

Speaker characteristics

Speakers were male and female, and some subjects speak with an accent, heavy or slight.

Speaker  Characteristics
Sp1      Male, no accent
Sp2      Male, no accent
Sp3      Male, no accent
Sp4      Female, no accent
Sp5      Male, no accent
Sp6      Male, no accent
Sp7      Male, no accent
Sp8      Male, no accent
Sp9      Male, heavy accent
Sp10     Male, slight accent
Sp11     Female, slight accent
Sp12     Female, no accent
Sp13     Male, no accent

SLIDE 24

Word list

1  Mène       9  Soleil     17  Mêler      25  Poteau
2  Usuel      10 Forme      18  Limonade   26  Vision
3  Sommeil    11 Mère       19  Tomate     27  Oseille
4  Camion     12 Canon      20  Oreiller   28  Même
5  Tome       13 Mireille   21  Groseille  29  Éruption
6  Mémoire    14 Bateau     22  Passoire   30  Télévision
7  Pareil     15 Homme      23  Rome       31  Rateau
8  Abricot    16 Irruption  24  Bravo

Table: List of the 31 words pronounced, in the order of pronunciation followed by all speakers.

SLIDE 25

Speaker to speaker matches

Figure: Word-to-word matches between two speakers (axes: word indices; color: good match or bad match). (a) Sp2 - Sp3; (b) Sp4 - Sp10

SLIDE 26

Word recognition: path examples

Figure: DTW paths for three word pairs. (a) Mène (Sp2) - Même (Sp3); (b) Sommeil (Sp2) - Sommeil (Sp3); (c) Rateau (Sp2) - Usuel (Sp3)
