Efficient Parallel Implementations of Multiple Sequence Alignment



SLIDE 1

Outline: Introduction; Multiple Sequence Alignment; Distributed Memory; Shared Memory; Computational Results; Conclusions and Future Work.

Efficient Parallel Implementations of Multiple Sequence Alignment Using BSP/CGM Model

Jucele F. A. Vasconcellos, Christiane Nishibe, Nalvo F. Almeida and Edson N. Cáceres

Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande - MS, Brazil

February 15, 2014

SLIDE 2


Motivation and Goals

Multiple sequence alignment is an important tool in bioinformatics:

Extract biological similarities; Predict protein structure; Reconstruct phylogeny; Illustrate mutation events; Assess sequence conservation.

Goals: Design a BSP/CGM algorithm and implement it on a manycore architecture; Compare it with a message-passing implementation.

SLIDE 3

Definition

Five input sequences:

s1 = A T T G C C A T T
s2 = A T G G C C A T T
s3 = A T C C A A T T T T
s4 = A T C T T C T T
s5 = A C T G A C C

[Figure: a multiple sequence alignment of the five input sequences.]

SLIDE 4


Approaches

Exact Algorithms:

Carrillo-Lipman.

Progressive and Iterative Algorithms:

ClustalW; Muscle; T-Coffee; FFTNSI; Gusfield.

Consistency Based Algorithms:

CBA.

SLIDE 5

Building one alignment

S = ACTTCCAGA
T = AGTTCCGGAGG

Each cell covers three cases: Si aligned with Tj, Si aligned with a gap, or a gap aligned with Tj.

M_{i,j} = \max \{ M_{i-1,j-1} + p[S_i, T_j], \; M_{i-1,j} + \mathrm{gap}, \; M_{i,j-1} + \mathrm{gap} \}

Scores in the example, as read from the matrix and the column scores: match +1, mismatch -1, gap -2.

        A   G   T   T   C   C   G   G   A   G   G
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
A  -2   1  -1  -3  -5  -7  -9 -11 -13 -15 -17 -19
C  -4  -1   0  -2  -4  -4  -6  -8 -10 -12 -14 -16
T  -6  -3  -2   1  -1  -3  -5  -7  -9 -11 -13 -15
T  -8  -5  -4  -1   2   0  -2  -4  -6  -8 -10 -12
C -10  -7  -6  -3   0   3   1  -1  -3  -5  -7  -9
C -12  -9  -8  -5  -2   1   4   2   0  -2  -4  -6
A -14 -11 -10  -7  -4  -1   2   3   1   1  -1  -3
G -16 -13 -10  -9  -6  -3   0   3   4   2   2   0
A -18 -15 -12 -11  -8  -5  -2   1   2   5   3   1

S =  A  C  T  T  C  C  -  -  A  G  A
T =  A  G  T  T  C  C  G  G  A  G  G
    +1 -1 +1 +1 +1 +1 -2 -2 +1 +1 -1  = +1
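
The recurrence for M can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `needleman_wunsch` and its parameter names are assumptions, and the default scores (match +1, mismatch -1, gap -2) are the ones implied by the example on this slide. The traceback prefers the diagonal move when several moves are optimal.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_s, aligned_t)."""
    m, n = len(s), len(t)
    # M[i][j] = best score for aligning s[:i] with t[:j]
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = i * gap
    for j in range(1, n + 1):
        M[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + p,   # S_i aligned with T_j
                          M[i - 1][j] + gap,     # S_i aligned with a gap
                          M[i][j - 1] + gap)     # gap aligned with T_j

    # Traceback from the bottom-right corner, preferring the diagonal.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        p = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + p:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and M[i][j] == M[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return M[m][n], "".join(reversed(a)), "".join(reversed(b))


score, aligned_s, aligned_t = needleman_wunsch("ACTTCCAGA", "AGTTCCGGAGG")
print(score)
print(aligned_s)
print(aligned_t)
```

The quadratic table is what the parallel versions later in the deck partition among processors or GPU threads.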

SLIDE 6

Calculate the pairwise alignments

s1 = A T T G C C A T T
s2 = A T G G C C A T T
s3 = A T C C A A T T T T
s4 = A T C T T C T T
s5 = A C T G A C C

For k sequences there are k(k − 1)/2 pairwise alignments (10 for k = 5):

s1 = A T T G C C A T T
s2 = A T G G C C A T T

s1 = A T T G C C A - - T T
s3 = A - T C C A A T T T T

s1 = A T T G C C A T T
s4 = A T C T T C - T T

s1 = A T T G C C A T T
s5 = A C T G - - A C C

s2 = A T G G C C A - - T T
s3 = A T - C C A A T T T T

s2 = A T G G C C A T T
s4 = A T C T T C - T T

s2 = A - T G G C C A T T
s5 = A C T G A C C - - -

s3 = A T C C A A T T T T
s4 = A T - C - T T C T T

s3 = A T C C A A T T T T
s5 = A - C T G A - - C C

s4 = A T C T T C T T
s5 = A - C T G A C C

SLIDE 7

Find the center sequence Sc

Pairwise alignments with their scores:

s1 = A T T G C C A T T
s2 = A T G G C C A T T = 7

s1 = A T T G C C A - - T T
s3 = A - T C C A A T T T T = -2

s1 = A T T G C C A T T
s4 = A T C T T C - T T = 0

s1 = A T T G C C A T T
s5 = A C T G - - A C C = -3

s2 = A T G G C C A - - T T
s3 = A T - C C A A T T T T = -2

s2 = A T G G C C A T T
s4 = A T C T T C - T T = 0

s2 = A - T G G C C A T T
s5 = A C T G A C C - - - = -4

s3 = A T C C A A T T T T
s4 = A T - C - T T C T T = 0

s3 = A T C C A A T T T T
s5 = A - C T G A - - C C = -7

s4 = A T C T T C T T
s5 = A - C T G A C C = -3

Score table (last column: sum of aln(si, sj) over the other sequences):

      s1    s2    s3    s4    s5    Sum
s1           7    -2     0    -3      2
s2     7          -2     0    -4      1
s3    -2    -2           0    -7    -11
s4     0     0     0          -3     -3
s5    -3    -4    -7    -3          -17

s1 has the largest total (2), so Sc = s1.
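
The center selection can be reproduced with a short Python sketch. It scores each pairwise alignment column by column and picks the sequence with the largest total. The names `ALIGNMENTS`, `alignment_score` and `find_center` are illustrative assumptions; the scores (match +1, mismatch -1, gap -2) are the ones implied by the slides, and the gapped rows are copied from the pairwise alignments listed in this deck.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # scores implied by the slides' example

# Gapped rows of the ten pairwise alignments, keyed by sequence indices.
ALIGNMENTS = {
    (1, 2): ("ATTGCCATT", "ATGGCCATT"),
    (1, 3): ("ATTGCCA--TT", "A-TCCAATTTT"),
    (1, 4): ("ATTGCCATT", "ATCTTC-TT"),
    (1, 5): ("ATTGCCATT", "ACTG--ACC"),
    (2, 3): ("ATGGCCA--TT", "AT-CCAATTTT"),
    (2, 4): ("ATGGCCATT", "ATCTTC-TT"),
    (2, 5): ("A-TGGCCATT", "ACTGACC---"),
    (3, 4): ("ATCCAATTTT", "AT-C-TTCTT"),
    (3, 5): ("ATCCAATTTT", "A-CTGA--CC"),
    (4, 5): ("ATCTTCTT", "A-CTGACC"),
}


def alignment_score(row_a, row_b):
    """Sum of column scores of two equally long gapped rows."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


def find_center(alignments, k):
    """Return (index of the sequence with the largest score total, totals)."""
    totals = {i: 0 for i in range(1, k + 1)}
    for (i, j), (row_i, row_j) in alignments.items():
        score = alignment_score(row_i, row_j)
        totals[i] += score
        totals[j] += score
    return max(totals, key=totals.get), totals


center, totals = find_center(ALIGNMENTS, 5)
print(center, totals)
```

Each pairwise score is counted once for each of its two sequences, which is why the table is symmetric.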
SLIDE 8

Construct the alignment and add the alignment to the MSA

Pairwise alignments between the center s1 and the other sequences:

s1 = A T T G C C A T T
s2 = A T G G C C A T T

s1 = A T T G C C A - - T T
s3 = A - T C C A A T T T T

s1 = A T T G C C A T T
s4 = A T C T T C - T T

s1 = A T T G C C A T T
s5 = A C T G - - A C C

Resulting multiple sequence alignment:

s1 = A T T G C C A - - T T
s2 = A T G G C C A - - T T
s3 = A - T C C A A T T T T
s4 = A T C T T C - - - T T
s5 = A C T G - - A - - C C
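
The merge step can be sketched in Python under the usual "once a gap, always a gap" rule. When a pairwise alignment opens a gap in the center, a gap column is added to every row already in the MSA; when the MSA already has a gap column, the new sequence receives a gap there. The function names `merge_into_msa` and `center_star` are assumptions made for illustration, not the authors' code.

```python
def merge_into_msa(msa, center_aln, other_aln):
    """Add other_aln to msa; msa[0] is the (gapped) center row."""
    old_center = msa[0]
    new_rows = [[] for _ in msa]
    new_other = []
    i = j = 0  # i walks the MSA columns, j walks the pairwise alignment
    while i < len(old_center) or j < len(center_aln):
        msa_gap = i < len(old_center) and old_center[i] == "-"
        aln_gap = j < len(center_aln) and center_aln[j] == "-"
        if aln_gap and not msa_gap:
            # New gap in the center: open a gap column in all existing rows.
            for row in new_rows:
                row.append("-")
            new_other.append(other_aln[j])
            j += 1
        elif msa_gap and not aln_gap:
            # Existing gap column: the new sequence gets a gap.
            for k, row in enumerate(new_rows):
                row.append(msa[k][i])
            new_other.append("-")
            i += 1
        else:
            # Same center character (or a gap in both): copy the column.
            for k, row in enumerate(new_rows):
                row.append(msa[k][i])
            new_other.append(other_aln[j])
            i += 1
            j += 1
    return ["".join(r) for r in new_rows] + ["".join(new_other)]


def center_star(pairwise):
    """pairwise: list of (center_row, other_row) gapped alignments."""
    first_center, first_other = pairwise[0]
    msa = [first_center, first_other]
    for center_aln, other_aln in pairwise[1:]:
        msa = merge_into_msa(msa, center_aln, other_aln)
    return msa


msa = center_star([
    ("ATTGCCATT", "ATGGCCATT"),      # s1, s2
    ("ATTGCCA--TT", "A-TCCAATTTT"),  # s1, s3
    ("ATTGCCATT", "ATCTTC-TT"),      # s1, s4
    ("ATTGCCATT", "ACTG--ACC"),      # s1, s5
])
for row in msa:
    print(row)
```

The sketch assumes that every pairwise alignment contains the same center sequence once its gaps are removed.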

SLIDE 9

BSP/CGM Model

[Figure: BSP/CGM supersteps. Processors P0, P1, P2, ..., Pp−1 alternate computation rounds (local computation) and communication rounds (global communication), separated by synchronization barriers.]

O(p) rounds of communication; O(mn/p) local memory.

SLIDE 10

Wavefront Strategy

[Figure: wavefront strategy. An m × n matrix is partitioned into blocks of m/p rows and n/p columns among processors P0, ..., Pp−1, and the blocks are labeled with the round in which they are computed. The example uses two sequences of length 8 (G G A C T G C C and G T T C C G T A) and four processors P1 to P4.]
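
A block wavefront schedule can be sketched in Python as follows. The layout is an assumption made for illustration (the figure's exact layout is not recoverable from the extracted text): the matrix is cut into a p × p grid of blocks, processor i owns the i-th strip, and block (i, j) can start only after blocks (i − 1, j) and (i, j − 1) are finished. Under these assumptions block (i, j) runs in round i + j, which gives 2p − 1 rounds, in line with the O(p) communication rounds of the BSP/CGM model. The function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(p):
    """Return, for each round, the list of (processor, block) pairs active in it.

    Assumes a p x p grid of blocks where processor i handles block j in
    round i + j (rounds are numbered from 0).
    """
    rounds = []
    for r in range(2 * p - 1):
        active = [(proc, r - proc) for proc in range(p) if 0 <= r - proc < p]
        rounds.append(active)
    return rounds


for r, active in enumerate(wavefront_schedule(4)):
    print(r, active)
```

Only one processor is busy in the first and last rounds; all p processors are busy only in the middle round, which limits the achievable speedup of this scheme.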

SLIDE 11

MPI Implementation

1 Calculate the k(k−1)/2 pairwise alignments;
2 Find the center sequence Sc;
3 Calculate the pairwise alignment between Sc and the other sequences;
4 Construct the alignment and add the alignment to the MSA.

for 1 ≤ x ≤ k do
    P1 sends a subsequence of Sx, where Sx ≠ Sc;
    Algorithm Pairwise (p, i, Sc, Sx);
    Each Pj constructs a part of the alignment between Sc and Sx and sends the alignment to P1;
    P1 adds the alignment between Sc and Sx to the MSA;
end for

SLIDE 12

Wavefront Strategy

[Figure: wavefront strategy on an m × n matrix. Cells are labeled by anti-diagonal: H1, H2, H3 at one corner, D4, D5, ..., Dm+n−5, Dm+n−4 in the middle, and Hm+n−3, Hm+n−2, Hm+n−1 at the opposite corner.]
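
The anti-diagonal order in the figure can be sketched in Python. All cells with the same i + j depend only on the two previous anti-diagonals, so they can be computed independently of each other (for example, one GPU thread per cell). The sketch below runs sequentially and only illustrates the traversal order; the function name `score_by_antidiagonals` and the default scores (match +1, mismatch -1, gap -2) are assumptions, not the authors' kernel.

```python
def score_by_antidiagonals(s, t, match=1, mismatch=-1, gap=-2):
    """Fill the alignment matrix one anti-diagonal at a time.

    Returns (alignment score, number of anti-diagonals processed).
    """
    m, n = len(s), len(t)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = i * gap
    for j in range(1, n + 1):
        M[0][j] = j * gap

    diagonals = 0
    for d in range(2, m + n + 1):            # d = i + j for interior cells
        # Every cell of this loop is independent of the others.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            p = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + p,
                          M[i - 1][j] + gap,
                          M[i][j - 1] + gap)
        diagonals += 1
    return M[m][n], diagonals


print(score_by_antidiagonals("ACTTCCAGA", "AGTTCCGGAGG"))
```

The first and last anti-diagonals contain very few cells, so they offer little parallelism compared with the long ones in the middle of the matrix.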

SLIDE 13

CUDA Implementation

1 Calculate the pairwise alignment;
2 Find the center sequence Sc;
3 Calculate the pairwise alignment between Sc and the other sequences;
4 Construct the alignment and add the alignment to the MSA.

for 1 ≤ x ≤ k do
    Copy to device the sequence Sx, Sx ≠ Sc;
    Host and device calculate the pairwise alignment between Sc and Sx;
    Host constructs the alignment between Sc and Sx;
    Host adds the alignment between Sc and Sx to the MSA;
end for

SLIDE 14

Resources

Carleton Cluster

64 processors: AMD Opteron 2.2 GHz; Cache: 1024 KB; Memory: 8 GB.

Desktop - CUDA

Processor: Intel Core 2 Quad 2.83 GHz; Cache: 6144 KB; Memory: 4 GB.

GeForce GTX 460: 336 CUDA Cores; GPU Clock rate: 1.50 GHz; Global memory: 1024 MBytes.

Quadro FX 380: 16 CUDA Cores; GPU Clock rate: 1.10 GHz; Global memory: 255 MBytes.

SLIDE 15


Input Data

Number of sequences: 8, 10, 12 and 14; Length of sequences: 1024, 4096, 8192 and 16384.

SLIDE 16

MPI Results

Execution time (s) by number of sequences and number of processors P:

No. of Seqs   P = 1     P = 2     P = 4     P = 8     P = 16   P = 32   P = 64
8             154.527   115.399   64.645    35.362    22.970   11.817   9.513
10            214.526   166.007   97.864    54.472    32.158   19.957   13.173
12            299.001   239.347   139.494   77.189    47.121   26.733   15.295
14            429.346   317.902   183.771   101.305   59.804   35.033   23.248

[Figure: execution time (s) versus number of CPUs (1, 2, 4, 8, 16, 32, 64) for 8, 10, 12 and 14 sequences.]
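
Speedup and parallel efficiency are not reported on the slides, but they follow directly from the times in the MPI table. The Python sketch below uses the standard definitions, speedup T(1)/T(P) and efficiency speedup/P; the names `TIMES`, `PROCS` and `speedup_and_efficiency` are illustrative.

```python
PROCS = [1, 2, 4, 8, 16, 32, 64]

# Execution times in seconds, copied from the MPI results table.
TIMES = {
    8:  [154.527, 115.399, 64.645, 35.362, 22.970, 11.817, 9.513],
    10: [214.526, 166.007, 97.864, 54.472, 32.158, 19.957, 13.173],
    12: [299.001, 239.347, 139.494, 77.189, 47.121, 26.733, 15.295],
    14: [429.346, 317.902, 183.771, 101.305, 59.804, 35.033, 23.248],
}


def speedup_and_efficiency(times, procs):
    """Return a list of (P, speedup, efficiency) relative to the P = 1 time."""
    t1 = times[0]
    return [(p, t1 / t, t1 / t / p) for p, t in zip(procs, times)]


for n_seqs, times in TIMES.items():
    p, s, e = speedup_and_efficiency(times, PROCS)[-1]
    print(f"{n_seqs} sequences, P = {p}: speedup {s:.2f}, efficiency {e:.2f}")
```

For 14 sequences the run on 64 processors is about 18.5 times faster than on one processor, an efficiency of roughly 0.29.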

SLIDE 17

CUDA Results

CUDA execution time (s) by number of sequences and sequence length:

No. of Seqs   1024    4096     8192     16384
8             0.344   4.837    21.195   33.288
10            0.527   7.448    32.586   47.956
12            0.755   10.588   46.418   64.895
14            1.010   14.245   62.551   84.254

Times by number of processors P (the sequence length for this table is not preserved in the extracted text):

No. of Seqs   P = 1   P = 2   P = 4   P = 8   P = 16   P = 32   P = 64
8             1.779   1.331   0.822   0.463   0.269    0.196    0.803
10            2.715   2.141   1.276   0.717   0.405    0.300    0.398
12            3.943   2.982   1.866   1.019   0.588    0.471    0.487
14            5.735   3.935   2.481   1.402   0.796    0.577    0.665

Times by number of processors P (same values as the MPI Results table):

No. of Seqs   P = 1     P = 2     P = 4     P = 8     P = 16   P = 32   P = 64
8             154.527   115.399   64.645    35.362    22.970   11.817   9.513
10            214.526   166.007   97.864    54.472    32.158   19.957   13.173
12            299.001   239.347   139.494   77.189    47.121   26.733   15.295
14            429.346   317.902   183.771   101.305   59.804   35.033   23.248

SLIDE 18

14 sequences, 16384 characters

[Figure: execution time (s), on a scale from 0 to 450, for MPI with P = 1, 2, 4, 8, 16, 32, 64 and for the GPU; on the axis the GPU entry sits between P = 8 and P = 16.]

SLIDE 19


Conclusions

Scalable implementations; Use different architectures; CUDA/GPGPU is suitable for BSP/CGM algorithms.

SLIDE 20


Future Work

Improve memory utilization in CUDA; Improve the utilization of threads, kernels and SMs; Compare the results with other approaches.

SLIDE 21


Thank you!