On the complex network clustering using DryadLINQ, Stojan Trajanovski



SLIDE 1

Open source project study

On the complex network clustering using DryadLINQ

Stojan Trajanovski (st508)

MPhil in Advanced Computer Science

Data Centric Networking (R202)

SLIDE 2

Data Centric Networking (R202) presenter: Stojan Trajanovski (st508)


Motivation: why go parallel in complex network analysis?

  • Online social networks, Internet graph
    • millions of users (Facebook, Twitter, …)
    • increased computational complexity
  • Why is it promising?
    • some actions are fully independent
    • increased hardware performance
      • multi-core
      • network clusters, global cloud clusters

SLIDE 3


Motivation: why use PLINQ/DryadLINQ?

  • Inherited LINQ behaviour
    • declarative and imperative programming
    • T-SQL-like syntax in your code
    • no more SQL Server stored procedures
    • optimized performance
    • inherited SELECT, GROUP BY / ORDER BY
  • + Dryad/parallel processing
    • optimized job management
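To make the SELECT / GROUP BY / ORDER BY idea concrete for network data, here is a minimal sketch. It is written in Python, not LINQ, and it is not taken from the slides: the edge list `edges` and the helper `degree_table` are hypothetical names chosen for illustration. It computes node degrees from an undirected edge list using the same three query steps.

```python
from collections import Counter

# Hypothetical toy graph: undirected edges between named nodes.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def degree_table(edge_list):
    """Return (node, degree) pairs, highest degree first."""
    # "SELECT": project every edge onto its two endpoints.
    endpoints = (node for edge in edge_list for node in edge)
    # "GROUP BY node": count how often each node appears.
    degrees = Counter(endpoints)
    # "ORDER BY degree DESC, node ASC".
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for node, degree in degree_table(edges):
        print(node, degree)
```

In a LINQ-style system the same query is written declaratively and the runtime decides how to execute it, which is what allows a parallel or distributed back end to be substituted without rewriting the query.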

SLIDE 4


Why not? (mainly purely technical reasons)

  • problems even within Microsoft's own platform
    • requires a .NET environment anyway
    • evaluated only on the newest Microsoft OSs
    • head node:
      • Windows Server 2008 or later (problems with Server 2003)
      • more than 500 GB of disk, 8 GB of memory
    • computational nodes: at least Windows 7
    • no Windows Azure support
  • Did someone mention Linux/MacOS? ☺

SLIDE 5


My application/solution: using PLINQ/DryadLINQ for network clustering

  • K-means clustering
    • the parallel version performs better
  • The approach:
    • parallelize the method
  • The results:
    • significantly better time performance
  • To do:
    • more clustering approaches, comparison …
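The slides do not show how the method was parallelized. One common option, sketched here in Python and not to be read as the author's PLINQ/DryadLINQ code, is to parallelize the assignment step of K-means: each point's nearest centroid can be found independently of every other point. The names `nearest`, `assign_chunk`, `kmeans` and the `workers` parameter are hypothetical, and the sketch assumes that the nodes have already been mapped to numeric feature vectors and that the first `k` points serve as initial centroids.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def assign_chunk(chunk, centroids):
    """Assignment step for one chunk; chunks do not depend on each other."""
    return [nearest(p, centroids) for p in chunk]


def kmeans(points, k, iterations=10, workers=4):
    # Assumption: the first k points are used as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Parallel part: every chunk is labelled independently.
            parts = pool.map(assign_chunk, chunks, [centroids] * len(chunks))
            labels = [label for part in parts for label in part]
            # Sequential part: recompute each centroid as the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = tuple(sum(x) / len(members)
                                         for x in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, k=2))
```

The thread pool only illustrates the structure. In CPython, CPU-bound work of this kind would need a process pool or a distributed runtime to give a real speed-up, which is the role PLINQ and DryadLINQ play in the .NET setting.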

SLIDE 6


Some plots: parallel vs. non-parallel LINQ (dataset:)

Different values of N = {100, 200, 500, 1000}

SLIDE 7


  • Questions?
  • Short discussion
  • Still work in progress ...