Cloudster: K-means algorithm for cloud computing

SLIDE 1

Cloudster ? Design User interaction Future enhancements

Cloudster

K-means algorithm for cloud computing

{stephane.caron, guillaume.claret, anisse.ismaili, jacques-henri.jourdan, michael.mathieu, mathieu.prevot, guillaume.seguin, yingjie.xu}@ens.fr

École Normale Supérieure - Department of Computer Science

May 20, 2009

Cloudster team Cloudster

SLIDE 2

1. Cloudster?
  • K-means algorithm
  • About Cloudster
2. Design
3. User interaction
  • The CLI way
  • The web way
4. Future enhancements
  • Future (possible) core features
  • Upcoming samples

SLIDE 3

Cloudster?

SLIDE 4

K-means algorithm

Goal: given N objects, optimally partition them into K clusters.

Basic algorithm:
  • Randomly initialize groups
  • Iterate:
      • foreach point p:
          • Find nearest centroid C(p)
          • Add p to the C(p) group
      • Update centroids
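The loop above can be sketched in a few lines. This is a minimal illustration in Python, not Cloudster's C# code: the function names are hypothetical, points are assumed to be tuples of numbers compared with squared Euclidean distance, and "randomly initialize groups" is realized here by picking K random points as the starting centroids, which is only one common way to do it.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=20, seed=0):
    """Return (centroids, groups) after a fixed number of iterations."""
    rng = random.Random(seed)
    # Random initialization (assumption): k distinct points become the first centroids.
    centroids = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # Find the nearest centroid C(p) and add p to its group.
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        # Update centroids; an empty group keeps its previous centroid.
        centroids = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups
```

A fixed iteration count keeps the sketch short; a real implementation would more likely stop once the assignments no longer change.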

SLIDE 5

About Cloudster

  • Generic k-means implementation: feel free to feed it your own distance and centroid computation functions!
  • Highly scalable: uses the Windows Azure cloud-computing platform
  • Written in C#, using the .NET framework
  • BSD licensed, open development at http://cloudster.sourceforge.net
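To illustrate what "feeding" the algorithm a distance and a centroid function could look like, here is a small Python sketch. It is an assumption about the general idea only: the slides do not show the actual C# signatures, and `generic_step` and its parameters are hypothetical names.

```python
from statistics import median


def generic_step(entities, centroids, distance, centroid):
    """One k-means iteration driven entirely by user-supplied functions.

    distance(a, b) returns a number; centroid(group) returns a representative
    of a non-empty group. Both are plug-ins chosen by the caller.
    """
    groups = [[] for _ in centroids]
    for e in entities:
        nearest = min(range(len(centroids)), key=lambda i: distance(e, centroids[i]))
        groups[nearest].append(e)
    # Empty groups keep their previous centroid.
    return [centroid(g) if g else c for g, c in zip(groups, centroids)]


# Example plug-ins (hypothetical): absolute difference and median on plain numbers.
new_centroids = generic_step(
    entities=[1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    centroids=[0.0, 15.0],
    distance=lambda a, b: abs(a - b),
    centroid=median,
)
```

Because the core loop never inspects the entities itself, the same step works for vectors, strings or sequences as long as the two functions are consistent with each other.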

SLIDE 6

Design

SLIDE 7

Design

[Architecture diagram. Labels, in order of appearance: Blob, CoreLib.dll, ClusterJob, EntityJob, IDistance, Sample.dll, ICentroid, Tables, EntityCluster, Cluster, Status, Tasks, Worker, Queue]
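The diagram only survives as labels, so the following Python sketch is one plausible reading of the Worker / Queue / Tasks part, not a description of Cloudster's actual design. It assumes tasks are chunks of entities placed on a queue, that each worker returns per-centroid partial sums, and it uses an in-process `queue.Queue` with threads as a stand-in for cloud queues and worker roles. Entities are plain numbers to keep it short; all names are hypothetical.

```python
import queue
import threading


def worker(tasks, results, centroids):
    """Pull chunks of 1-D entities off the task queue until it is empty."""
    while True:
        try:
            chunk = tasks.get_nowait()
        except queue.Empty:
            return
        # Partial result: per-centroid [sum, count] for this chunk.
        partial = [[0.0, 0] for _ in centroids]
        for x in chunk:
            i = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            partial[i][0] += x
            partial[i][1] += 1
        results.put(partial)


def parallel_iteration(entities, centroids, chunk_size=2, n_workers=3):
    """One k-means iteration: split into tasks, run workers, merge partials."""
    tasks, results = queue.Queue(), queue.Queue()
    for start in range(0, len(entities), chunk_size):
        tasks.put(entities[start:start + chunk_size])
    threads = [threading.Thread(target=worker, args=(tasks, results, centroids))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so the results queue is complete.
    totals = [[0.0, 0] for _ in centroids]
    while not results.empty():
        for i, (s, n) in enumerate(results.get()):
            totals[i][0] += s
            totals[i][1] += n
    # Centroids with no assigned entity keep their previous value.
    return [s / n if n else c for (s, n), c in zip(totals, centroids)]
```

Merging sums and counts rather than raw entities keeps the data sent back by each worker small, which matters when the entities live in remote storage.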

SLIDE 8

User interaction

SLIDE 9

The CLI way

Three separate tools:
  • The Builder, which initializes the blob storage and tables and uploads the initial entities
  • The Tester, which starts the algorithm (either the sequential one or the cloud-computed one)
  • The Evaluator, which computes the score of the current algorithm state
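The slides do not say how the Evaluator's score is defined. A common choice for k-means is the within-cluster sum of squared distances, sketched below in Python under that assumption; `score` and its parameters are hypothetical names.

```python
def score(groups, centroids, distance):
    """Sum of squared distances from every entity to its group's centroid.

    Lower values mean tighter clusters. This particular metric is an
    assumption; the slides do not specify what the Evaluator computes.
    """
    return sum(distance(e, c) ** 2 for g, c in zip(groups, centroids) for e in g)


# Example on plain numbers with absolute difference as the distance.
example_score = score(
    groups=[[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]],
    centroids=[2.0, 11.0],
    distance=lambda a, b: abs(a - b),
)
```

Comparing this value between the sequential and the cloud-computed runs would be one way to check that both reach clusterings of similar quality.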

SLIDE 10

The web way

A remote web interface, built on Azure web roles.

Preferred way to interact with the cloud (easier, better, faster):
  • Unifies the CLI tools into a single interface
  • Enables thorough monitoring of algorithm state (tasks, results)
  • Enables case-specific visualisations of algorithm results

SLIDE 11

Future enhancements

SLIDE 12

Future (possible) core features

  • Use reflection to unify involved tools
  • Blob storage handling improvements:
      • Assign workers to specific groups of entities
      • Improve entities cache
      • Store multiple entities in each blob
  • Split computations and storage queries to dedicated threads
  • Enable the user to add/remove entities on the fly
  • Table repair tool

SLIDE 13

Upcoming samples

  • Sparse vectors sample
  • Image comparison sample, based on the GIST algorithm (currently investigating some implementation bugs)
  • DNA sequence comparison sample, based on NAligner, using the FASTA file format
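As an idea of what a sparse vectors sample might supply, here is a Python sketch of a distance and a centroid function for vectors stored as `{index: value}` dictionaries. The representation, the Euclidean metric and the function names are assumptions made for illustration; the slides give no detail about the planned sample.

```python
import math


def sparse_distance(a, b):
    """Euclidean distance between sparse vectors stored as {index: value} dicts."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def sparse_centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors.

    Missing indices count as zero; components whose sum is zero are dropped
    so the result stays sparse.
    """
    n = len(vectors)
    sums = {}
    for v in vectors:
        for k, x in v.items():
            sums[k] = sums.get(k, 0.0) + x
    return {k: s / n for k, s in sums.items() if s != 0.0}
```

Only the indices present in at least one operand are visited, so the cost depends on the number of non-zero entries rather than on the full dimension.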

SLIDE 14

Questions?
