bioKepler - September, 2012
bioKepler.org
Ilkay ALTINTAS, Ph.D.
Deputy Coordinator for Research, San Diego Supercomputer Center, UCSD Lab Director, Scientific Workflow Automation Technologies altintas@sdsc.edu
Parallelization techniques: Applying Map, Reduce and Cross concepts using bioActors
Grid Computing aims to “enable resource sharing and
Figure 1 FROM: “Cloud Computing and Grid Computing 360-Degree Compared”, Ian Foster, Yong Zhao, Ioan Raicu, Shiyong Lu. Grid Computing Environments Workshop (GCE), 2008.
– Examples: MPI and OpenMP
– Hard to implement
– Original sequential tools cannot be reused

– Examples: SGE and Condor
– Original sequential tools can be reused
– Create small jobs by splitting data or tasks
– Hard to achieve data locality for each job

– Examples: Hadoop and Stratosphere
– Original sequential tools can be reused
– Support customized and automatic data partitioning and distribution
– Support data locality for each job through a special distributed file system such as HDFS
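The "split the data, reuse the sequential tool" idea shared by the last two approaches can be sketched in a few lines of Python. This is only an illustrative sketch, not bioKepler, Hadoop or Stratosphere code: `split`, `sequential_tool` and `run_partitioned` are hypothetical names, the "tool" simply sums sequence lengths, and a local thread pool stands in for a cluster. Data locality is not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Partition records into n_parts chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def sequential_tool(chunk):
    """Hypothetical stand-in for an unmodified sequential tool.

    Here it just returns the total length of the sequences in one chunk.
    """
    return sum(len(record) for record in chunk)


def run_partitioned(records, n_parts=4):
    """Run the sequential tool on each partition, then merge the results."""
    chunks = split(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(sequential_tool, chunks))  # map phase
    return sum(partials)  # reduce phase


total = run_partitioned(["ACGT", "AC", "G", "TTTT", "A"], n_parts=2)
```

The tool itself never changes; only the partitioning and the final merge are added around it, which is why sequential tools can be reused in these approaches.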
[Figure: data partitions D1 to D8]
http://www.stratosphere.eu
with other DDP engines, such as Hadoop
execute with data partition
partition for each execution
Query data partition for each execution
Same reduce sub-workflow as in the Map workflow
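To make the Cross concept concrete, here is a minimal Python sketch in which every query partition is paired with every reference partition and the per-pair results are then merged by key. It illustrates the general pattern only and is not the bioKepler or Stratosphere API: `cross`, `match` and `reduce_by_key` are hypothetical helpers, and `match` is a toy substring search standing in for a real alignment tool.

```python
from collections import defaultdict
from itertools import product


def cross(left_parts, right_parts, fn):
    """Apply fn to every (left partition, right partition) combination."""
    return [fn(left, right) for left, right in product(left_parts, right_parts)]


def match(queries, references):
    """Hypothetical stand-in for an alignment tool.

    Emits (query, reference) for each reference that contains the query.
    """
    return [(q, ref) for q in queries for ref in references if q in ref]


def reduce_by_key(pairs):
    """Group values by key, as a reduce step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


query_parts = [["AC"], ["GT"]]
reference_parts = [["ACGT", "TTTT"], ["GGAC"]]

# Cross phase: 2 query partitions x 2 reference partitions = 4 executions.
hits = [hit for part in cross(query_parts, reference_parts, match) for hit in part]

# Reduce phase: collect all hits for the same query.
result = reduce_by_key(hits)
```

Because the reduce step only groups hits by query, the same reduce logic would work unchanged if the hits came from a plain Map over query partitions instead of a Cross.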
[Figure labels: DDP Blast (specific), DDP Generic (generic), sub-workflow, workflow, larger workflow, results, workflow library, DDP Director, bioActor Library; user: workflow developer]
Daniel Crawl