Implementation of K-means on the Cell Broadband Engine



SLIDE 1

IBM - CVUT Student Research Projects

Implementation of K-means on the Cell Broadband Engine

Filip Šimek (simekf1@fel.cvut.cz)

SLIDE 2

About K-means

  • a data clustering algorithm
  • places k centroids in an n-dimensional space so that intra-cluster variance (sum of squared distances between points and their nearest centroid) is minimal
  • a large number of points leaves room for parallelization
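
To make the quantity being minimized concrete, here is a minimal Python sketch of the intra-cluster variance. It is an illustration only, not code from the project: the function names are hypothetical, and squared Euclidean distance is assumed as the distance measure.

```python
# Illustrative sketch; the function names are hypothetical, not from the slides.
# Assumption: squared Euclidean distance is the distance measure.

def squared_distance(p, q):
    """Squared Euclidean distance between two n-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def intra_cluster_variance(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(squared_distance(p, centroids[nearest_centroid(p, centroids)])
               for p in points)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [(0.0, 1.0), (10.0, 1.0)]   # one centroid per natural group
bad = [(5.0, 0.0), (5.0, 2.0)]     # both centroids between the groups

print(intra_cluster_variance(points, good))  # 4.0
print(intra_cluster_variance(points, bad))   # 100.0
```

Each point contributes independently to the sum, which is why a large number of points can be split across processing units.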

SLIDE 3

K-means application example: reducing the number of colors in an image

[Images: truecolor original; the image with 16 colors found using K-means]
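
The color-reduction idea can be sketched as follows: treat every pixel as a point in RGB space, let K-means find k palette colors, then replace each pixel with its nearest palette color. This is a hedged illustration with hypothetical names (`find_palette`, `quantize`) and an assumed seeding rule. The tiny example uses k=2 for readability, whereas the slide's image used 16 colors.

```python
# Illustrative sketch; names and the seeding rule are assumptions, not from the slides.

def dist2(a, b):
    """Squared distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_palette(pixels, k, iterations=10):
    """Run a few K-means iterations on RGB pixels and return k palette colors."""
    # Assumption: seed the palette with k pixels evenly spaced in the input.
    step = len(pixels) // k
    palette = [pixels[i * step] for i in range(k)]
    for _ in range(iterations):
        sums = [[0, 0, 0] for _ in range(k)]
        counts = [0] * k
        for px in pixels:
            j = min(range(k), key=lambda c: dist2(px, palette[c]))
            counts[j] += 1
            for ch in range(3):
                sums[j][ch] += px[ch]
        for j in range(k):
            if counts[j]:  # keep the old color if a cluster went empty
                palette[j] = tuple(s // counts[j] for s in sums[j])
    return palette


def quantize(pixels, palette):
    """Replace every pixel with the nearest palette color."""
    return [min(palette, key=lambda c: dist2(px, c)) for px in pixels]


pixels = [(250, 10, 10), (240, 20, 0), (10, 10, 250), (0, 0, 240)]
palette = find_palette(pixels, k=2)
print(palette)                    # [(245, 15, 5), (5, 5, 245)]
print(quantize(pixels, palette))  # two reddish pixels, then two bluish pixels
```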

SLIDE 4

Implementation

1st step - scalar implementation

  • the program runs on the PPU of Cell Broadband Engine with no parallelism
  • easy to implement or port from another scalar implementation
  • does not take full advantage of CBE performance
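
For orientation, a scalar K-means loop of the kind described in this step might look like the sketch below. It is written in portable Python rather than PPU C code. The function name, the convergence test, and the handling of empty clusters are assumptions and are not taken from the project.

```python
# Illustrative scalar sketch; not the project's code. Names are hypothetical.

def kmeans_scalar(points, centroids, max_iterations=100):
    """Single-threaded K-means: assign each point, then recompute the means."""
    k, dim = len(centroids), len(centroids[0])
    centroids = [list(c) for c in centroids]
    assignment = [None] * len(points)
    for _ in range(max_iterations):
        changed = False
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for i, p in enumerate(points):
            # Find the nearest centroid with plain scalar arithmetic.
            best, best_d = 0, float("inf")
            for j, c in enumerate(centroids):
                d = 0.0
                for a, b in zip(p, c):
                    d += (a - b) * (a - b)
                if d < best_d:
                    best, best_d = j, d
            if assignment[i] != best:
                assignment[i] = best
                changed = True
            counts[best] += 1
            for t in range(dim):
                sums[best][t] += p[t]
        if not changed:  # assumed stopping rule: no point switched cluster
            break
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]
centroids, assignment = kmeans_scalar(points, [(0.0, 0.0), (10.0, 0.0)])
print(centroids)   # [[1.0, 2.0], [9.0, 2.0]]
print(assignment)  # [0, 0, 1, 1]
```

The innermost loop over coordinates is the part a vector implementation would replace with SIMD operations.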

SLIDE 5

Implementation

2nd step - vector implementation

  • the PPU distributes work among one or more SPUs, which use SIMD instructions
  • a) multiple serial calls to one SPU thread: occupies only one SPE, so the others can perform other operations in a pipeline
  • b) multiple SPUs running in parallel on multiple data
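
Variant b) can be pictured as a split-and-merge step: the coordinator (the PPU's role) cuts the points into chunks, each worker (an SPU's role) returns per-cluster partial sums and counts for its chunk, and the coordinator merges them into new centroids. The sketch below uses Python threads purely as a stand-in for SPE threads. All names are hypothetical, and nothing here reflects the actual Cell SDK interface.

```python
# Illustrative sketch; threads stand in for SPE threads. Names are hypothetical.
from concurrent.futures import ThreadPoolExecutor


def partial_step(chunk, centroids):
    """Worker role: per-cluster coordinate sums and counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    return sums, counts


def parallel_step(points, centroids, workers=4):
    """Coordinator role: one K-means iteration over contiguous chunks."""
    size = max(1, -(-len(points) // workers))  # ceiling division, at least 1
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda ch: partial_step(ch, centroids), chunks))
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        n = sum(counts[j] for _, counts in results)
        if n == 0:  # empty cluster keeps its previous centroid
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][t] for sums, _ in results) / n for t in range(dim)])
    return new_centroids


points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]
print(parallel_step(points, [(0.0, 0.0), (10.0, 0.0)], workers=2))
# [[1.0, 2.0], [9.0, 2.0]]
```

Variant a) corresponds to calling `partial_step` on the chunks one after another from a single worker, with the same merge at the end.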

SLIDE 6

What's done

  • scalar implementation of the algorithm running on the PPU
  • operation on n-dimensional data
  • organisation of data into vectors
  • PPU loop creating a single SPE thread as part of the implementation of step 2a)
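
The slides do not say how the data is organised into vectors. One common option, sketched here as an assumption, is to zero-pad each n-dimensional point to a multiple of four values, since a 128-bit SIMD register holds four 32-bit floats, and to accumulate the distance lane by lane. The Python below only mimics that layout. `LANES`, `pack`, and `squared_distance_packed` are hypothetical names.

```python
# Illustrative sketch of a 4-wide layout; an assumption, not the project's format.

LANES = 4  # a 128-bit SIMD register holds four 32-bit floats


def pack(point):
    """Zero-pad a point to a multiple of LANES and split it into 4-wide vectors."""
    padded = list(point) + [0.0] * (-len(point) % LANES)
    return [tuple(padded[i:i + LANES]) for i in range(0, len(padded), LANES)]


def squared_distance_packed(a, b):
    """Accumulate lane-wise squared differences, then sum the four lanes once."""
    acc = [0.0] * LANES
    for va, vb in zip(a, b):
        for lane in range(LANES):  # one SIMD subtract and multiply-add on real hardware
            diff = va[lane] - vb[lane]
            acc[lane] += diff * diff
    return sum(acc)


p = pack((1.0, 2.0, 3.0, 4.0, 5.0))
q = pack((0.0, 0.0, 0.0, 0.0, 0.0))
print(p)                              # [(1.0, 2.0, 3.0, 4.0), (5.0, 0.0, 0.0, 0.0)]
print(squared_distance_packed(p, q))  # 55.0
```

Zero padding is harmless for this computation because padded lanes contribute a difference of zero in both operands.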

SLIDE 7

Future plans

  • code running on the SPU
  • implementation of step 2b)
  • possibility to separate learning and classification
  • call K-means multiple times to reduce error
  • use DMA effectively
  • implement a unified interface to integrate this code into a larger toolkit
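
Two of these plans, separating learning from classification and calling K-means several times to reduce error, can be sketched together. The code below is a hypothetical illustration, not the planned interface. The names `learn`, `classify`, and `learn_best` and the seeded random initialisation are all assumptions.

```python
# Illustrative sketch; function names and seeding strategy are assumptions.
import random


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify(point, centroids):
    """Classification only: index of the nearest learned centroid."""
    return min(range(len(centroids)), key=lambda j: dist2(point, centroids[j]))


def learn(points, k, seed, iterations=50):
    """Learning only: K-means from a seeded random start; returns centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            members[classify(p, centroids)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        updated = [[sum(col) / len(m) for col in zip(*m)] if m else c
                   for m, c in zip(members, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def error(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(dist2(p, centroids[classify(p, centroids)]) for p in points)


def learn_best(points, k, restarts=5):
    """Call K-means several times and keep the run with the lowest error."""
    runs = [learn(points, k, seed) for seed in range(restarts)]
    return min(runs, key=lambda c: error(points, c))


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best = learn_best(pts, k=2)
print(classify((0.2, 0.3), best), classify((10.5, 10.5), best))
```

Because `classify` needs only the centroids, new points can be labelled later without repeating the learning phase.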