MaRVIN: a distributed platform for massive RDF processing

Aug 05, 2023

SLIDE 1

MaRVIN is:
a platform for processing lots of RDF data (now: computing RDFS/OWL closure)

MaRVIN scales by:
distributing computation over many nodes
approximate (sound but incomplete) reasoning
anytime convergence (more complete over time)

MaRVIN runs on:
in principle: any grid, using Ibis middleware
currently: the DAS-3 distributed supercomputer (300 nodes)
soon: a wide-area peer-to-peer network
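To make "sound but incomplete" concrete, here is a minimal sketch of closure computation on a single node. It is not MaRVIN's code: the function `rdfs_closure`, the `ex:` data, and the restriction to two standard RDFS entailment rules (rdfs9 and rdfs11) are assumptions made for illustration. A node that sees only part of the data derives a subset of the full closure: nothing wrong, but not everything.

```python
# Illustrative sketch only; names and data are hypothetical, not MaRVIN's API.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Forward-chain two RDFS rules until no new triple appears.

    rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    rdfs9:  (x type a), (a subClassOf b)       => (x type b)
    """
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        typ = [(s, o) for s, p, o in closed if p == TYPE]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
            for x, cls in typ:
                if cls == a:
                    new.add((x, TYPE, b))
        if new <= closed:          # fixpoint reached
            return closed
        closed |= new

data = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}
full = rdfs_closure(data)

# A node holding only part of the input (here: without the Mammal triple).
partial = rdfs_closure({t for t in data if t[0] != "ex:Mammal"})

print(partial <= full)                              # sound: nothing outside the full closure
print(("ex:tom", TYPE, "ex:Animal") in partial)     # incomplete: this one is missed
```

Exchanging triples between nodes over time is what lets the partial results grow towards the full closure.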

MaRVIN: a distributed platform for massive RDF processing

George Anadiotis, Spyros Kotoulas, Eyal Oren, Ronny Siebes, Frank van Harmelen
Niels Drost, Roelof Kemp, Jason Maassen, Frank J. Seinstra, Henri E. Bal
Vrije Universiteit Amsterdam

"a brain the size of a planet"

SLIDE 2

Main loop: divide-conquer-and-swap

1. divide: split input data into chunks
2. conquer: each node:

reads some chunks, computes closure

3. swap: each node:

removes all triples: sends some to central storage, sends others to some peer

repeat 2-3 ad infinitum
anytime incremental results

Currently:

running on DAS-3, a five-cluster grid system
max. 271 machines, 791 cores (2.4 GHz, 4 GB RAM)
suffering from growing pains
reading data @100ktps/min/node (1B in 1hr, on 100 nodes)
producing data @15-25ktps/min/node
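
The divide-conquer-and-swap loop on this slide can be simulated in a single process. The sketch below is a toy under stated assumptions, not the MaRVIN implementation: nodes are plain Python sets, reasoning is reduced to subClassOf transitivity, peers are chosen uniformly at random, and every triple is both copied to central storage and forwarded to a peer (the slide says only that some go to storage and others to a peer). All function names are hypothetical.

```python
import random

SUBCLASS = "rdfs:subClassOf"

def closure(triples):
    """Transitive closure of subClassOf only (a stand-in for RDFS/OWL reasoning)."""
    closed = set(triples)
    while True:
        new = {(a, SUBCLASS, d)
               for a, p, b in closed if p == SUBCLASS
               for c, q, d in closed if q == SUBCLASS and b == c}
        if new <= closed:
            return closed
        closed |= new

def divide(triples, n_nodes):
    """Step 1: split the input into one chunk per node (round-robin)."""
    ordered = sorted(triples)
    return [set(ordered[i::n_nodes]) for i in range(n_nodes)]

def run(triples, n_nodes=3, rounds=20, seed=0):
    """Steps 2-3 repeated; 'ad infinitum' is cut off after `rounds` for the demo."""
    rng = random.Random(seed)
    nodes = divide(triples, n_nodes)
    storage = set()                               # central storage: the anytime result
    for _ in range(rounds):
        inboxes = [set() for _ in range(n_nodes)]
        for chunk in nodes:
            derived = closure(chunk)              # step 2: conquer
            storage |= derived                    # step 3: copy to central storage (assumption)
            for t in sorted(derived):             # ...and forward each triple to a random peer
                inboxes[rng.randrange(n_nodes)].add(t)
        nodes = inboxes                           # each node restarts from what it received
    return storage

# A chain C0 < C1 < ... < C5 of subclass statements as toy input.
chain = {(f"ex:C{i}", SUBCLASS, f"ex:C{i + 1}") for i in range(5)}
result = run(chain)
print(len(result), "of", len(closure(chain)), "closure triples found so far")
```

Whatever the routing, the central storage never contains a triple outside the true closure, and it only grows from round to round, which is the anytime behaviour the slide refers to. How quickly it approaches completeness depends on the routing policy.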

SLIDE 3

Closure on the dataset (computed): just one example
Infrastructure for experiments over massive RDF data

Questions:

network overhead vs benefit?
scalability (nodes and data)?
output quality (anytime behaviour)?
routing policy?
modular architecture:
change initial data distribution
change functionality of node
change routing policy
...
real-time logging, visualisation, analysis
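
One way to picture the modular architecture listed above is as three swappable functions: one for the initial data distribution, one for what a node does, and one for routing. The sketch below assumes that reading; the `Experiment` class and every policy function in it are hypothetical names for illustration, not MaRVIN's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Distribute = Callable[[Iterable[Triple], int], List[Set[Triple]]]
Process = Callable[[Set[Triple]], Set[Triple]]
Route = Callable[[Triple, int, int], int]     # (triple, sender, n_nodes) -> receiver

@dataclass
class Experiment:
    distribute: Distribute    # initial data distribution
    process: Process          # functionality of a node
    route: Route              # routing policy
    n_nodes: int = 4

    def step(self, nodes: List[Set[Triple]]) -> List[Set[Triple]]:
        """One conquer-and-swap round under the configured policies."""
        inboxes: List[Set[Triple]] = [set() for _ in range(self.n_nodes)]
        for sender, chunk in enumerate(nodes):
            for t in self.process(chunk):
                inboxes[self.route(t, sender, self.n_nodes)].add(t)
        return inboxes

# Example policies (all hypothetical).
def round_robin(triples, n):
    ordered = sorted(triples)
    return [set(ordered[i::n]) for i in range(n)]

def identity(chunk):
    return set(chunk)         # a node that does no reasoning at all

def next_neighbour(triple, sender, n):
    return (sender + 1) % n   # pass everything around a ring

def by_subject(triple, sender, n):
    return sum(map(ord, triple[0])) % n   # same subject always lands on the same node

data = {
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "ex:c"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:c", "ex:p", "ex:d"),
}
ring = Experiment(round_robin, identity, next_neighbour, n_nodes=2)
grouped = Experiment(round_robin, identity, by_subject, n_nodes=2)
start = ring.distribute(data, 2)
after_ring = ring.step(start)
after_grouped = grouped.step(start)
```

Changing the routing policy then means replacing one argument, which keeps experiments that compare policies easy to set up.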

EXPERIMENT AND EVALUATE
A TOOL FOR THE RESEARCH COMMUNITY
contact: Eyal Oren
http://larkc.eu/marvin