MaRVIN: a distributed platform for massive RDF processing



SLIDE 1

MaRVIN is:

  • a platform for processing lots of RDF data (now: computing RDFS/OWL closure)

MaRVIN scales by:

  • distributing computation over many nodes
  • approximate (sound but incomplete) reasoning
  • anytime convergence (more complete over time)

MaRVIN runs on:

  • in principle: any grid, using Ibis middleware
  • currently: the DAS-3 distributed supercomputer (300 nodes)
  • soon: a wide-area peer-to-peer network

MaRVIN: a distributed platform for massive RDF processing

George Anadiotis, Spyros Kotoulas, Eyal Oren, Ronny Siebes, Frank van Harmelen
Niels Drost, Roelof Kemp, Jason Maassen, Frank J. Seinstra, Henri E. Bal
Vrije Universiteit Amsterdam

"a brain the size of a planet"

SLIDE 2

Main loop: divide-conquer-and-swap

  • 1. divide: split input data in chunks
  • 2. conquer: each node reads some chunks and computes their closure
  • 3. swap: each node removes all its triples: it sends some to central storage and sends the others to some peer

repeat 2-3 ad infinitum
anytime incremental results

Currently:

  • running on DAS-3, a five-cluster grid system
  • max. 271 machines, 791 cores (2.4 GHz, 4 GB RAM)
  • suffering from growing pains
  • reading data @100ktps/min/node (1B in 1hr, on 100 nodes)
  • producing data @15-25ktps/min/node
SLIDE 3

Closure on the dataset (computed) is just one example.
Infrastructure for experiments over massive RDF data

Questions:

  • network overhead vs benefit?
  • scalability (nodes and data)?
  • output quality (anytime behaviour)?
  • routing policy?
  • modular architecture:
      • change initial data distribution
      • change functionality of node
      • change routing policy
      • ...
  • real-time logging, visualisation, analysis
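
One way to picture the "change routing policy" point is to treat a policy as a plain function that maps a triple to a node index, so that policies can be swapped without touching the rest of the loop. The sketch below is only an illustration under that assumption: `RoutingPolicy`, `random_routing`, `subject_hash_routing` and `distribute` are hypothetical names, and neither policy is claimed to be one that MaRVIN actually uses.

```python
import random
import zlib
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
# A routing policy picks a destination node index for a triple.
RoutingPolicy = Callable[[Triple, int], int]


def random_routing(rng: random.Random) -> RoutingPolicy:
    """Policy that sends each triple to a uniformly random node."""
    def route(triple: Triple, n_nodes: int) -> int:
        return rng.randrange(n_nodes)
    return route


def subject_hash_routing(triple: Triple, n_nodes: int) -> int:
    """Policy that sends triples with the same subject to the same node."""
    return zlib.crc32(triple[0].encode("utf-8")) % n_nodes


def distribute(triples: List[Triple], n_nodes: int,
               route: RoutingPolicy) -> List[List[Triple]]:
    """Assign every triple to exactly one node according to the policy."""
    nodes: List[List[Triple]] = [[] for _ in range(n_nodes)]
    for t in triples:
        nodes[route(t, n_nodes)].append(t)
    return nodes
```

For example, `distribute(triples, 4, subject_hash_routing)` and `distribute(triples, 4, random_routing(random.Random(1)))` run the same distribution step with two different policies, which is the kind of comparison the questions on this slide call for.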

EXPERIMENT AND EVALUATE
A TOOL FOR THE RESEARCH COMMUNITY

contact: Eyal Oren
http://larkc.eu/marvin