SLIDE 1

Efficient Scaling Up of Parallel Graph Algorithms for Genome-Scale Biological Problems on Cray XT

Kevin Thomas, Cray Inc.

SLIDE 2

Outline

• Biological networks
• Graph algorithms and terminology
• Implementation of a parallel graph algorithm
• Optimization of single-thread performance
• Lessons learned

May 2008 Cray Inc. Proprietary Slide 2

SLIDE 3

Analysis of Biological Networks

Analysis of biological networks is an increasingly used tool in biology. There are numerous types of biological networks:

• Gene Expression
• Protein Interaction
• Metabolic
• Phylogenetic
• Signal Transduction

Analysis of biological networks requires the solution of combinatorial problems:

• Maximal and maximum clique
• Vertex cover
• Dominating set
• Shortest path

SLIDE 4

Biological Applications of Maximal Clique Enumeration


[Diagram: MCE at the center, linked to its applications]

• Structural Alignment
• Gene Expression
• Functional Protein Relationships
• Tertiary Structure
• Genome Mapping

SLIDE 5

• Graphs are composed of vertices connected by edges
• A clique is a set of vertices which are pair-wise connected
• A maximal clique cannot include any additional vertex and still remain a clique
• (a,c,d,e) is a maximal clique

Graphs and Cliques

[Figure: five-vertex graph with vertices a, b, c, d, e]
SLIDE 6

Finding all of the maximal cliques of a graph: (a,b,d) and (a,c,d,e)

Maximal Clique Enumeration

[Figure: the two maximal cliques, (a,b,d) and (a,c,d,e), highlighted on the graph]
SLIDE 7

Maximal Clique Enumeration

Brute Force Search

SLIDE 8

Applying a backtracking algorithm results in a search tree

Maximal Clique Enumeration

[Figure: backtracking search tree over the graph, branching on a, b, c, d, e]
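The backtracking search can be sketched as a Bron-Kerbosch-style recursion. This is a minimal Python sketch, not the implementation from the talk; the graph is the five-vertex example from the figures.

```python
def enumerate_maximal_cliques(adj):
    """Yield each maximal clique of a graph given as {vertex: set_of_neighbors}."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)              # nothing can extend it: maximal
            return
        for v in list(candidates):
            # Branch: add v, keep only vertices still adjacent to everything.
            yield from expand(clique | {v},
                              candidates & adj[v],
                              excluded & adj[v])
            candidates.remove(v)              # backtrack: sub-tree for v done
            excluded.add(v)
    yield from expand(set(), set(adj), set())

# Five-vertex graph from the slides.
adj = {'a': {'b', 'c', 'd', 'e'}, 'b': {'a', 'd'}, 'c': {'a', 'd', 'e'},
       'd': {'a', 'b', 'c', 'e'}, 'e': {'a', 'c', 'd'}}
print(sorted(enumerate_maximal_cliques(adj)))
# → [['a', 'b', 'd'], ['a', 'c', 'd', 'e']]
```

The `excluded` set is what makes every reported clique maximal: a clique is emitted only when no already-processed vertex could still extend it.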

SLIDE 9

Parallel Maximal Clique Enumeration

• The search tree is divided into independent sub-trees
• Unexplored sub-trees are represented as candidate paths
• The candidate paths are placed into per-thread work pools

[Figure: four threads, each with a work pool of candidate paths]
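Seeding the pools might look like the following sketch (hypothetical: `seed_work_pools` and the pair encoding of a candidate path are assumptions, not details given in the talk):

```python
from collections import deque

def seed_work_pools(adj, num_threads):
    """Deal one candidate path (an unexplored sub-tree) per root vertex, round-robin."""
    pools = [deque() for _ in range(num_threads)]
    for i, v in enumerate(sorted(adj)):
        # Encode a candidate path as (partial clique, remaining candidates).
        pools[i % num_threads].append(({v}, set(adj[v])))
    return pools

adj = {'a': {'b', 'd'}, 'b': {'a', 'd'}, 'c': {'d'}, 'd': {'a', 'b', 'c'}}
pools = seed_work_pools(adj, num_threads=2)
# pool 0 holds the sub-trees rooted at a and c; pool 1 holds b and d
```

The compact (clique, candidates) pair is what makes a unit of work cheap to move between threads or processes later.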

SLIDE 10

Load Balancing

The work pools can become unbalanced over time

[Figure: unbalanced work pools: Thread 2 holds six candidate paths while Thread 1 holds one]

Dynamic load balancing through work stealing

SLIDE 11

Two levels of load balancing

Thread level

• Used when one thread of a process becomes idle
• Balances work within a single process
• Each thread acts on its own to steal work from other threads
• Locks are used to prevent race conditions

Process level

• Used when all threads of a process become idle
• Local master thread sends a request to another process
• Remote master thread responds to the request
• Master thread must poll for incoming requests while performing the main computation
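Thread-level stealing can be sketched with one lock per pool, roughly as below. This is a hedged Python illustration; the actual code was presumably compiled multithreaded code, and `WorkPool` and `steal` are invented names.

```python
import threading
from collections import deque

class WorkPool:
    """A per-thread pool of candidate paths, guarded by a lock."""
    def __init__(self):
        self.items = deque()
        self.lock = threading.Lock()      # prevents owner/thief races

    def push(self, item):
        with self.lock:
            self.items.append(item)

    def pop(self):
        with self.lock:
            return self.items.pop() if self.items else None

def steal(pools, my_id):
    """Called when pool my_id is empty: take one item from any other pool."""
    for i, pool in enumerate(pools):
        if i != my_id:
            item = pool.pop()
            if item is not None:
                return item
    return None                           # every pool is dry

pools = [WorkPool() for _ in range(4)]
pools[2].push("candidate path")
stolen = steal(pools, my_id=0)            # idle thread 0 steals from thread 2
```

Because each thread acts on its own, the lock around each pool is the only coordination needed at this level; only when `steal` returns `None` does the process escalate to process-level balancing.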

SLIDE 12


Load balancing between processes

[Figure: Process 2, out of work, sends a request to Process 1; Process 1 responds with a candidate path]

SLIDE 13

Termination

• Process-level load balancing attempts are made until all processes have been checked
• When no process has work to share, the idle state is entered
• To synchronize globally, an idle notification is sent to each process
• When all processes are idle, the job can terminate
• 2(N-1)² messages are required for termination

SLIDE 14

Adjacency Test – Linear List

An important MCE operation is testing two vertices for adjacency. The graph representation uses a vertex adjacency list:

• Each vertex has a list of adjacent vertices
• An adjacency test requires a list traversal
• A linked list is easy to build, but slow to search
• A linear list (array) is faster to search
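For a linear list kept sorted, the adjacency test can use binary search rather than a full traversal. A small Python sketch (the slides do not specify sorted arrays or binary search; that is an assumption here):

```python
import bisect

# Sorted neighbor arrays for the five-vertex example graph.
adjacency = {'a': ['b', 'c', 'd', 'e'], 'b': ['a', 'd'], 'c': ['a', 'd', 'e'],
             'd': ['a', 'b', 'c', 'e'], 'e': ['a', 'c', 'd']}

def is_adjacent_array(u, v):
    """Binary search for v in u's sorted neighbor array: O(log degree) per test.
    A linked list would force an O(degree) pointer chase instead."""
    neighbors = adjacency[u]
    i = bisect.bisect_left(neighbors, v)
    return i < len(neighbors) and neighbors[i] == v
```

The array also wins on locality: neighbors sit in contiguous memory, while linked-list nodes scatter across the heap.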

[Figure: per-vertex adjacency lists for the five-vertex graph]
SLIDE 15

Adjacency Test – Bit Matrix

• Adjacency bit matrix has a fast, constant-time lookup
• Memory requirement is N²


    a   b   c   d   e
a   •   1   1   1   1
b   1   •       1
c   1       •   1   1
d   1   1   1   •   1
e   1       1   1   •
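A bit-matrix lookup reduces the test to one bit operation. Here is a Python sketch using one integer per row as a bit vector; a production implementation would more likely pack rows into fixed-width machine words, and the names below are illustrative:

```python
vertices = ['a', 'b', 'c', 'd', 'e']
index = {v: i for i, v in enumerate(vertices)}
edges = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'),
         ('b', 'd'), ('c', 'd'), ('c', 'e'), ('d', 'e')]

rows = [0] * len(vertices)                # rows[i] = matrix row i packed into bits
for u, v in edges:
    rows[index[u]] |= 1 << index[v]       # undirected: set both directions
    rows[index[v]] |= 1 << index[u]

def is_adjacent_bits(u, v):
    """Constant-time test: read bit v of row u."""
    return (rows[index[u]] >> index[v]) & 1 == 1
```

The N² bits of storage buy an O(1) test with no probing at all, which is why this wins the throughput comparison below.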

SLIDE 16

Adjacency Test – Hash Table

• Adjacency hash table has a fast, constant-time lookup (but not as fast as the bit matrix)
• Memory requirement is cN (2N in this example)
• Data structure is a sparse linear list, but access is direct through key hashing


[Figure: adjacency entries stored in a sparse table of size 2N, located by hashing the vertex key]
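The "sparse linear list accessed through key hashing" can be sketched as an open-addressed table of edge keys sized at roughly twice the entry count. Open addressing with linear probing is an assumption here; the slide only says access is direct through key hashing.

```python
TABLE_SIZE = 32                            # roughly 2x the number of entries
table = [None] * TABLE_SIZE

def _slot(u, v):
    return hash((u, v)) % TABLE_SIZE       # home slot from the key hash

def insert_edge(u, v):
    for key in ((u, v), (v, u)):           # undirected: store both directions
        i = _slot(*key)
        while table[i] is not None and table[i] != key:
            i = (i + 1) % TABLE_SIZE       # linear probe past collisions
        table[i] = key

def is_adjacent_hash(u, v):
    i = _slot(u, v)
    while table[i] is not None:            # an empty slot ends the probe chain
        if table[i] == (u, v):
            return True
        i = (i + 1) % TABLE_SIZE
    return False

for u, v in [('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'),
             ('b', 'd'), ('c', 'd'), ('c', 'e'), ('d', 'e')]:
    insert_edge(u, v)
```

Lookup is expected O(1) at this load factor, but the occasional probe chain is why it trails the bit matrix, while using cN rather than N² memory.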

SLIDE 17

Adjacency Test Performance Comparison

[Chart: adjacency-test throughput in cliques/second (50,000 to 300,000) for Linear List, Hash Table, and Bit Matrix]

SLIDE 18

SMP Versus DMP Programming

[Chart: run time in seconds (64.0 to 64.8) for 1 process with 8 threads; 2 processes with 4 threads each; 4 processes with 2 threads each; 8 processes with 1 thread each]

SLIDE 19

Parallel Scaling on quad-core Cray XT4

• At 2048 processes, compute time is 2.1 seconds
• Overhead due to message passing is 0.43 seconds
• Graph contains 3472 vertices; 2.6 billion maximal cliques found

[Chart: speedup versus processes from 1 to 2048, Ideal and pDFS curves]

SLIDE 20

Conclusion

Explicit decomposition at the thread level enabled easier implementation of MPI

• Independent work already identified
• Compact representation of units of work

Additional work

• Improved load balancing by grouping processes
• Parallel I/O optimization

SLIDE 21

Conclusion

Research group members

• Nagiza F. Samatova, North Carolina State University and Oak Ridge National Laboratory
• Matthew Schmidt, North Carolina State University and Oak Ridge National Laboratory
• Byung-Hoon Park, Oak Ridge National Laboratory
• Kevin Thomas, Cray Inc.

Thank you! Questions?
