
MaRVIN: a distributed platform for massive RDF processing



  1. MaRVIN: a distributed platform for massive RDF processing
     George Anadiotis, Spyros Kotoulas, Eyal Oren, Ronny Siebes, Frank van Harmelen, Niels Drost, Roelof Kemp, Jason Maassen, Frank J. Seinstra, Henri E. Bal (Vrije Universiteit Amsterdam)
     "a brain the size of a planet"
     MaRVIN is:
     - a platform for processing lots of RDF data (now: computing RDFS/OWL closure)
     MaRVIN scales by:
     - distributing computation over many nodes
     - approximate (sound but incomplete) reasoning
     - anytime convergence (more complete over time)
     MaRVIN runs on:
     - in principle: any grid, using Ibis middleware
     - currently: the DAS-3 distributed supercomputer (300 nodes)
     - soon: a wide-area peer-to-peer network
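To make "computing RDFS closure" concrete, here is a minimal single-machine sketch. It is not MaRVIN's code: it assumes triples are plain string tuples, covers only two standard RDFS entailment rules (rdfs9 and rdfs11), and uses a made-up `ex:` vocabulary for illustration.

```python
# Triples are (subject, predicate, object) tuples of plain strings.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until no new triple appears."""
    closure = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = set()
        for s, p, o in closure:
            for sub, sup in subclass_pairs:
                if sub != o:
                    continue
                if p == SUBCLASS:   # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, sup))
                elif p == TYPE:     # rdfs9: instances inherit superclasses
                    new.add((s, TYPE, sup))
        if new <= closure:          # fixpoint reached
            return closure
        closure |= new

# Hypothetical example data (the ex: names are illustrative only).
data = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}
derived = rdfs_closure(data) - data
```

Here `derived` holds the triples that were implied by the input but not stated in it. A distributed, approximate reasoner would aim to produce a growing subset of this set over time.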

  2. Main loop: divide-conquer-and-swap
     1. divide: split the input data into chunks
     2. conquer: each node reads some chunks and computes their closure
     3. swap: each node removes all its triples, sending some to central storage and the others to some peer
     Repeat steps 2-3 ad infinitum: anytime, incremental results.
     Currently:
     - running on DAS-3, a five-cluster grid system
     - max. 271 machines, 791 cores (2.4 GHz, 4 GB RAM)
     - suffering from growing pains
     - reading data @100ktps/min/node (1B in 1hr, on 100 nodes)
     - producing data @15-25ktps/min/node
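The divide-conquer-and-swap loop can be simulated in a single process. The sketch below is an illustration under stated assumptions, not the MaRVIN implementation: all triples are `rdfs:subClassOf` statements, a coin flip with hypothetical probability `p_store` decides whether a triple goes to central storage or to a peer, peers are chosen uniformly at random, and the loop stops after a fixed number of rounds instead of running forever.

```python
import random

SUBCLASS = "rdfs:subClassOf"

def local_closure(triples):
    """Transitive closure over the subClassOf triples one node holds."""
    closure = set(triples)
    while True:
        new = {(a, SUBCLASS, d)
               for a, _, b in closure
               for c, _, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def divide_conquer_swap(triples, n_nodes=3, rounds=5, p_store=0.3, seed=0):
    rng = random.Random(seed)
    ordered = sorted(triples)
    # 1. divide: split the input into one chunk per node
    nodes = [set(ordered[i::n_nodes]) for i in range(n_nodes)]
    storage = set()  # central storage; grows over time (anytime results)
    for _ in range(rounds):
        inboxes = [set() for _ in range(n_nodes)]
        for held in nodes:
            # 2. conquer: close over whatever this node currently holds
            for triple in sorted(local_closure(held)):
                # 3. swap: every triple leaves the node (assumed coin flip)
                if rng.random() < p_store:
                    storage.add(triple)
                else:
                    # Random peer; in this sketch it may be the sender itself.
                    inboxes[rng.randrange(n_nodes)].add(triple)
        nodes = inboxes
    return storage, nodes

# Hypothetical input: a chain C0 < C1 < ... < C6 of class names.
chain = {(f"C{i}", SUBCLASS, f"C{i + 1}") for i in range(6)}
storage, nodes = divide_conquer_swap(chain)
found = storage.union(*nodes)   # everything known anywhere in the system
full = local_closure(chain)     # reference: closure computed on one machine
```

Whatever the random choices, `found` never contains a triple outside `full` (soundness) and never loses an input triple. It may still miss inferences whose premises never met on the same node, which is the incompleteness the slides describe.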

  3. Closure on the dataset (computed): just one example
     Infrastructure for experiments over massive RDF data
     Questions:
     - network overhead vs. benefit?
     - scalability (nodes and data)?
     - output quality (anytime behaviour)?
     - routing policy?
     - modular architecture:
       - change initial data distribution
       - change functionality of node
       - change routing policy
       - ...
     - real-time logging, visualisation, analysis
     EXPERIMENT AND EVALUATE
     A TOOL FOR THE RESEARCH COMMUNITY
     MaRVIN: a distributed platform for massive RDF processing
     contact: Eyal Oren, http://larkc.eu/marvin
