Big Data processing with Hadoop
Luca Pireddu
CRS4—Distributed Computing Group
April 18, 2012
luca.pireddu@crs4.it (CRS4) Big Data processing with Hadoop April 18, 2012 1 / 44
Outline
1 Motivation
  Big Data
  Parallelizing Big Data
(the, 1) (quick, 1) (brown, 1) (fox, 1) (green, 1) (fox, 1) (ate, 1) (lazy, 1) (the, 1)
(the, 1) (quick, 1) (brown, 1) (fox, 1) (ate, 1) (lazy, 1) (fox, 1) (green, 1) (the, 1)
(quick, 1) (brown, 1) (fox, 2) (ate, 1) (the, 2) (lazy, 1) (green, 1)
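The pairs above look like the classic MapReduce word count: each word is emitted with a count of 1, pairs are grouped by word, and the counts are summed. The following is a minimal single-process sketch of that flow in plain Python, not Hadoop code and not taken from the slides. The function names and the input lines are assumptions; the input was chosen only so that it yields the same words as the pairs listed above.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, values):
    """Reduce step: sum all values collected for one word."""
    return word, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, values) for word, values in grouped.items())


# Hypothetical input (an assumption): the original input text is not in the source.
LINES = ["the quick brown fox", "the lazy green fox ate"]

if __name__ == "__main__":
    for word, count in sorted(word_count(LINES).items()):
        print(word, count)
```

With this input, "the" and "fox" each end up with a count of 2 and every other word with 1, which matches the reduced pairs above. In a real Hadoop job the map and reduce functions would run on different nodes and the framework would perform the shuffle.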
Image courtesy of Maneesh Varshney
[Figure labels: Processing capacity; Sequencing capacity]
1 Establish a single-node baseline throughput measure
2 Compare throughput/node of baseline, old CSGP workflow and Seal
3 Compare wall-clock runtimes
4 Evaluate scalability characteristics
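The evaluation steps above rest on two simple quantities: throughput per node and how well runtime scales as nodes are added. The sketch below shows one plausible way to compute them. The function names, the definition of scaling efficiency, and every number in the example are assumptions made for illustration; none of them come from the slides or from measured results.

```python
def throughput_per_node(read_pairs, seconds, nodes):
    """Read pairs processed per second per node."""
    if seconds <= 0 or nodes <= 0:
        raise ValueError("seconds and nodes must be positive")
    return read_pairs / (seconds * nodes)


def scaling_efficiency(base_nodes, base_seconds, nodes, seconds):
    """Achieved speedup divided by ideal (linear) speedup.

    A value of 1.0 means perfectly linear scaling from base_nodes to nodes.
    """
    if min(base_nodes, base_seconds, nodes, seconds) <= 0:
        raise ValueError("all arguments must be positive")
    speedup = base_seconds / seconds
    ideal_speedup = nodes / base_nodes
    return speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical figures, for illustration only (not measurements).
    pairs = 1_200_000
    print(throughput_per_node(pairs, seconds=100, nodes=16))
    print(scaling_efficiency(base_nodes=16, base_seconds=100, nodes=32, seconds=60))
```

Normalizing by node count makes a single-node baseline directly comparable with cluster runs of different sizes, which is presumably why the comparison is stated as throughput/node.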
[Figure: throughput in read pairs / sec / node for baseline and for seal with n = 16, 32, 64, 96]
[Figure labels: MR1, MR3, MR8]