Implementation of DTW and DDTW algorithm on Cell Broadband Engine

SLIDE 1

IBM - CVUT Student Research Projects

Implementation of DTW and DDTW algorithm on Cell Broadband Engine

Pavel Bazika (bazikp1@fel.cvut.cz)

SLIDE 2

What do DTW and DDTW do?

  • DTW (dynamic time warping) is a dynamic-programming-based algorithm
  • It compares two time series of different lengths and computes a similarity measure
  • DDTW (derivative dynamic time warping) is an improved version of the DTW algorithm

[Figure: corresponding points between the two time series]
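
To make the bullets above concrete, here is a minimal Python sketch of both measures. It is not the Cell implementation from the slides. The absolute difference as local cost is an assumption, and the derivative estimate follows the commonly cited DDTW formulation (average of the left slope and half the slope through both neighbours), which the slides do not spell out. The function names are hypothetical.

```python
def dtw_distance(a, b):
    """DTW cost between two numeric sequences (classic dynamic programming)."""
    inf = float("inf")
    # prev[j] = accumulated cost of aligning the processed prefix of a with b[:j]
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)  # assumed local distance
            # extend the cheapest of the three allowed predecessor alignments
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def derivative(q):
    """Estimated derivative used by DDTW (assumed formulation, see lead-in)."""
    if len(q) < 3:
        raise ValueError("need at least three samples")
    inner = [((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2) / 2
             for i in range(1, len(q) - 1)]
    # the endpoints reuse the estimate of their nearest neighbour
    return [inner[0]] + inner + [inner[-1]]


def ddtw_distance(a, b):
    """DDTW: run DTW on the estimated derivatives instead of the raw values."""
    return dtw_distance(derivative(a), derivative(b))
```

In this sketch, two series with the same shape but a constant offset, such as `[0, 1, 2, 3]` and `[10, 11, 12, 13]`, have a DDTW cost of zero while their plain DTW cost is positive.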

SLIDE 3

DTW and DDTW applications

  • Speech recognition
  • Handwritten signature verification
SLIDE 4

DTW calculation scheme

[Figure: DTW calculation scheme. Labels: Time seq 1, Time seq 2, SIMDized distance computation, minimum evaluation, computed relation between series (minimum cost path)]
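
The stages named in the scheme can be sketched in plain Python as three steps: fill the distance matrix, accumulate it with the minimum evaluation, and backtrack the minimum cost path. This is a scalar sketch, not the SIMD code from the slides. The squared difference as local distance and all function names are assumptions.

```python
def distance_matrix(a, b):
    """Stage 1: local distance between every pair of samples (assumed squared difference)."""
    return [[(x - y) ** 2 for y in b] for x in a]


def accumulate(dist):
    """Stage 2: accumulated cost; each cell adds the minimum of its three predecessors."""
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = acc[0][j - 1]
            elif j == 0:
                best = acc[i - 1][0]
            else:
                best = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist[i][j] + best
    return acc


def backtrack(acc):
    """Stage 3: walk from the last cell back to (0, 0) along the cheapest predecessors."""
    i, j = len(acc) - 1, len(acc[0]) - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            # min() returns the first minimal candidate, so ties prefer the diagonal
            candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            i, j = min(candidates, key=lambda c: acc[c[0]][c[1]])
        path.append((i, j))
    path.reverse()
    return path
```

Each pair `(i, j)` in the returned path says that sample `i` of the first series corresponds to sample `j` of the second.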

SLIDE 5

Computing Euclidean distance

  • Four distances are computed in one step
  • The distance matrix is distributed across the available SPUs
  • Each SPU then calculates its part of the matrix
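
The distribution described above can be illustrated with a small Python sketch. It only models the idea: a loop over chunks of four stands in for one SIMD step, and a sequential loop over row bands stands in for the SPUs. Splitting the matrix by contiguous row bands is an assumption (the slides do not say how the matrix is partitioned), as are the squared-difference distance and the function names.

```python
LANES = 4  # the slides state that four distances are computed per step


def split_rows(n_rows, n_workers):
    """Contiguous (start, stop) row ranges, one per worker, sizes differing by at most one."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def distance_row(x, b):
    """One matrix row, produced LANES distances at a time."""
    row = []
    for j in range(0, len(b), LANES):
        chunk = b[j:j + LANES]
        # stands in for a single vector instruction over up to four elements
        row.extend([(x - y) ** 2 for y in chunk])
    return row


def distributed_distance_matrix(a, b, n_workers):
    """Assemble the matrix from the bands each (simulated) worker would compute."""
    matrix = []
    for start, stop in split_rows(len(a), n_workers):
        band = [distance_row(a[i], b) for i in range(start, stop)]
        matrix.extend(band)
    return matrix
```

Because no cell of the distance matrix depends on any other, the bands can be computed in any order, which is what makes this stage easy to spread across processing units.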
SLIDE 6

Proposed SIMDization of DTW

  • Four new cells are computed in one step
  • Each matrix cell is read from memory into a register only once

[Figure: computation direction]
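
One common way to compute several accumulated-cost cells at once is to sweep the matrix along anti-diagonals, because cells on the same anti-diagonal do not depend on each other. Whether the proposed SIMDization uses exactly this direction cannot be recovered from the slide, so the Python sketch below is an assumption. It groups the cells of each anti-diagonal into batches of four and does not model the register reuse mentioned above. The function name is hypothetical.

```python
LANES = 4  # four new cells per step, as stated on the slide


def dtw_wavefront(dist):
    """Accumulated DTW cost computed anti-diagonal by anti-diagonal."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    # padded matrix: acc[i + 1][j + 1] belongs to dist[i][j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for d in range(n + m - 1):  # d = i + j identifies one anti-diagonal
        i_lo = max(0, d - (m - 1))
        i_hi = min(n - 1, d)
        cells = [(i, d - i) for i in range(i_lo, i_hi + 1)]
        for k in range(0, len(cells), LANES):
            group = cells[k:k + LANES]
            # every cell in the group reads only diagonals d - 1 and d - 2,
            # so the whole group could be evaluated by one vector operation
            values = [
                dist[i][j] + min(acc[i][j + 1], acc[i + 1][j], acc[i][j])
                for i, j in group
            ]
            for (i, j), value in zip(group, values):
                acc[i + 1][j + 1] = value
    return acc[n][m]
```

The result is the same as with a row-by-row fill; only the order in which cells are visited changes.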

SLIDE 7

Job distribution

[Figure: job distribution. Labels: pipeline processing, matrix transfer, control signals]
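
The figure itself is not recoverable, so the exact pipeline is unknown. As one possible reading, the Python sketch below assumes that each pipeline stage owns a band of rows and processes it one column block at a time, starting a block as soon as the stage above has finished the same block. The schedule, the block sizes, and the function names are all assumptions. Matrix transfers and control signals are not modelled.

```python
def pipeline_schedule(n_stages, n_blocks):
    """Per time step, the (stage, column_block) pairs that may run concurrently."""
    steps = []
    for t in range(n_stages + n_blocks - 1):
        # stage s lags s steps behind stage 0
        active = [(s, t - s) for s in range(n_stages) if 0 <= t - s < n_blocks]
        steps.append(active)
    return steps


def pipelined_dtw(dist, n_stages, block_width):
    """Accumulated DTW cost, filled in the order given by the pipeline schedule."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    # padded matrix: acc[i + 1][j + 1] belongs to dist[i][j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    rows_per_stage = -(-n // n_stages)  # ceiling division
    n_blocks = -(-m // block_width)
    for active in pipeline_schedule(n_stages, n_blocks):
        # blocks listed in the same step depend only on earlier steps
        for s, c in active:
            row_stop = min(n, (s + 1) * rows_per_stage)
            col_stop = min(m, (c + 1) * block_width)
            for i in range(s * rows_per_stage, row_stop):
                for j in range(c * block_width, col_stop):
                    acc[i + 1][j + 1] = dist[i][j] + min(
                        acc[i][j + 1], acc[i + 1][j], acc[i][j]
                    )
    return acc[n][m]
```

With 2 stages and 3 column blocks, the schedule has 4 steps, and both stages are busy in the two middle steps. That overlap is the benefit a pipeline of this kind would provide.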

SLIDE 8

Platform comparison

[Figure: processing time [ms] versus matrix size for PPU, PPU SIMD, SPU, and Pentium 4. An annotation marks the matrix size from which the SPUs are the fastest.]

SLIDE 9

What I've done

  • Distance matrix filling is SIMDized
  • Minimum computation is also SIMDized
  • Work is distributed to pipelined SPUs
  • The PPU backtracks the result through the filled matrix
  • Implemented on different platforms
  • Implementations measured and compared
SLIDE 10

Future work

  • Asynchronous DMA transfers
  • Tune the algorithm parameters