A Tile-based Parallel Viterbi Algorithm for Biological Sequence - - PowerPoint PPT Presentation



SLIDE 1

A Tile-based Parallel Viterbi Algorithm for Biological Sequence Alignment on GPU with CUDA

Zhihui Du (1), Zhaoming Yin (2), David A. Bader (3)

(1) Department of Computer Science, Tsinghua University
(2) School of Software and Microelectronics, Peking University
(3) School of Computing, Georgia Institute of Technology

SLIDE 2

Contents

1. Background
2. Parallel Algorithm
3. Streaming Algorithm
4. Tile-based Algorithm
5. Experiments

SLIDE 3

Background

SLIDE 4

The Importance of Accelerating the Viterbi Algorithm

[Figure labels: counting, modeling, initial Viterbi scoring, join Viterbi scoring, Viterbi aligning]

  • In tests of SATCHMO, the Viterbi algorithm accounts for about 80% of the computation time.

SLIDE 5

Dynamic Programming Matrix

[Figure: dynamic programming matrix. Each cell holds a log score and a state label (MA, MI, IN, DE, or N/A).]

Each candidate score is: forward variable + log(transition probability) + log(emission probability)

-1.85 + log(0.5) + log(0.8) = -2.25
-1.4 + log(0.1) + log(0.5) = -2.7
-2.6 + log(0.3) + log(0.5) = -3.4

Newly computed cells shown on the slide: -2.25, -2.45, -3.0, with state labels MA, MA, DE.
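The three sums on this slide can be checked numerically. The sketch below assumes that the leading numbers are negative log scores and that log means base 10; under those assumptions the results match the printed values to the precision shown. The function name `candidate_score` and the variable names are illustrative, not taken from the authors' code.

```python
import math

def candidate_score(forward, transition, emission):
    """Forward variable plus log10 of the transition and emission probabilities."""
    return forward + math.log10(transition) + math.log10(emission)

# (forward variable, transition probability, emission probability) per candidate,
# read off the slide under the assumptions stated above.
candidates = [(-1.85, 0.5, 0.8), (-1.4, 0.1, 0.5), (-2.6, 0.3, 0.5)]
scores = [candidate_score(*c) for c in candidates]

# Viterbi keeps the largest candidate score for the cell.
best_index = max(range(len(scores)), key=scores.__getitem__)
best_score = scores[best_index]

for c, s in zip(candidates, scores):
    print(c, round(s, 2))
print("best candidate:", best_index)
```

Working in log space turns the product of probabilities into a sum, which avoids underflow on long sequences.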

SLIDE 6

Trace Back

[Figure: trace back through the dynamic programming matrix, following the state labels from the final cells (-2.25, -2.45, -3.0) to produce the result path.]
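Trace back recovers the best path by following stored predecessor pointers from the best final cell to the first column. The sketch below is a generic two-state Viterbi with traceback, not the profile HMM shown on the slides; the states `X` and `Y`, all probabilities, and the function name `viterbi` are made up for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best log10 score, best state path) for a small HMM."""
    # score[t][s]: best log score of any path that ends in state s at step t.
    score = [{s: math.log10(start_p[s]) + math.log10(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states,
                       key=lambda p: score[t - 1][p] + math.log10(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log10(trans_p[prev][s])
                           + math.log10(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the best final state to the first step.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][last], path

# Hypothetical toy model: X mostly emits "a", Y mostly emits "b".
states = ("X", "Y")
start_p = {"X": 0.5, "Y": 0.5}
trans_p = {"X": {"X": 0.8, "Y": 0.2}, "Y": {"X": 0.2, "Y": 0.8}}
emit_p = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}

best, path = viterbi("aabb", states, start_p, trans_p, emit_p)
print(path)
```

Only the predecessor table is needed for the trace back, which is why the forward pass stores it alongside the scores.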

SLIDE 7

Related Work

  • W. Liu, B. Schmidt, G. Voss, W. Müller-Wittig. "Streaming Algorithms for Biological Sequence Alignment on GPUs." IEEE TPDS, Vol. 18, No. 9 (2007), pp. 1270-1281.
  • Y. Munekawa, F. Ino, and K. Hagihara. "Design and Implementation of the Smith-Waterman Algorithm on the CUDA-Compatible GPU." 8th IEEE International Conference on BioInformatics and BioEngineering, pages 1-6, Oct. 2008.

SLIDE 8

Related Work

  • S.A. Manavski, G. Valle. "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment." BMC Bioinformatics. 2008 Mar 26;9 Suppl 2:S10.
  • R. Horn, M. Houston, P. Hanrahan. "ClawHMMer: A streaming HMMer-search implementation." Proc. Supercomputing (2005).

SLIDE 9

Parallel Algorithm

SLIDE 10

Wave-front Algorithm

Wave-front algorithm: the computation fills the matrix like the advancing front of a wave, where each block's value is calculated from the values of the previously calculated blocks.
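The wave-front order can be sketched on the CPU as a walk over anti-diagonals: every cell with the same i + j depends only on cells from earlier anti-diagonals, so all cells of one front could be computed in parallel. This is a minimal sketch that uses the longest-common-subsequence recurrence as a stand-in for the Viterbi recurrence, since it has the same left, upper, and upper-left dependencies; the function names are illustrative.

```python
def antidiagonals(rows, cols):
    """Yield one list of (i, j) cells per wave front; cells in a list share i + j."""
    for d in range(rows + cols - 1):
        lo = max(0, d - cols + 1)
        hi = min(rows - 1, d)
        yield [(i, d - i) for i in range(lo, hi + 1)]

def fill_wavefront(a, b):
    """Fill an LCS-length table front by front (stand-in recurrence)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for front in antidiagonals(rows, cols):
        # Cells in `front` are mutually independent; on a GPU each one
        # could be handled by its own thread.
        for i, j in front:
            if i == 0 or j == 0:
                continue  # border cells stay 0
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table

fronts = list(antidiagonals(2, 3))
table = fill_wavefront("AAAATTTT", "AATTTT")
print(fronts)
print(table[-1][-1])
```

The number of cells per front grows and then shrinks, so parallelism is limited near the matrix corners.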

SLIDE 11

Streaming Algorithm

SLIDE 12

If the Sequence Length is Too Long!

[Figure: at Step i and Step i+1, a kernel runs on the device and data moves between host and device.]

Streaming Algorithm: Transferring data between host and device.
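The slide gives no detail beyond the host/device exchange at each step, so the following is only one possible reading: the device keeps just the data needed for the next step, and each finished step is copied back to the host. The sketch simulates this with plain Python lists standing in for host and device memory, and uses the longest-common-subsequence recurrence as a stand-in for Viterbi; `stream_rows` and the variable names are hypothetical.

```python
def stream_rows(a, b):
    """Compute LCS-length rows while keeping only one previous row 'on the device'."""
    host_rows = [[0] * (len(b) + 1)]   # stands in for host memory (row 0)
    device_prev = host_rows[0][:]      # stands in for device memory
    for i in range(1, len(a) + 1):
        # "Kernel" for step i: needs only the previous row.
        device_cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                device_cur[j] = device_prev[j - 1] + 1
            else:
                device_cur[j] = max(device_prev[j], device_cur[j - 1])
        # "Transfer" the finished step back to the host, then reuse the buffer.
        host_rows.append(device_cur[:])
        device_prev = device_cur
    return host_rows

rows = stream_rows("CTAC", "CAAT")
print(len(rows), rows[-1][-1])
```

The device-side footprint stays at two rows regardless of sequence length, at the cost of one transfer per step.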

SLIDE 13

Tile-based Algorithm

SLIDE 14

The Tile-based Algorithm

Step 1: Use homologous segments to divide long sequences.

Original sequences:
  AAAATTTTCTACAAACAATAAAAAAA
  AATTTTCTACAAAAACAATAAA

Find homologous segments:
  AAAATTTT CTAC AAA CAAT AAAAAAA
  AATTTT CTAC AAAAA CAAT AAA

Align independently:
  AAAATTTT CTAC A--AA CAAT AAAAAAA
  AA--TTTT CTAC AAAAA CAAT A----AA

See:

  • K. Katoh, K. Misawa, K. Kuma, and T. Miyata. "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform." Nucleic Acids Res. 30:3059-3066, 2002.

SLIDE 15

The Tile-based Algorithm

[Figure: find homologous segment pairs, then divide the sequences (shaded area).]

Step 2: Align sub-matrices.
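Steps 1 and 2 can be sketched on the example sequences from slide 14. The sketch assumes the shared segments (CTAC and CAAT, read off the slide's example) are already known; how they are found is not shown here. `split_at_anchors` is a hypothetical helper, not a function from the authors' implementation.

```python
def split_at_anchors(seq, anchors):
    """Split seq into pieces, cutting out each anchor in left-to-right order."""
    pieces, pos = [], 0
    for anchor in anchors:
        start = seq.index(anchor, pos)  # raises ValueError if the anchor is absent
        pieces.append(seq[pos:start])   # region before the anchor
        pieces.append(anchor)           # the shared segment itself
        pos = start + len(anchor)
    pieces.append(seq[pos:])            # tail after the last anchor
    return pieces

seq1 = "AAAATTTTCTACAAACAATAAAAAAA"
seq2 = "AATTTTCTACAAAAACAATAAA"
anchors = ["CTAC", "CAAT"]

pieces1 = split_at_anchors(seq1, anchors)
pieces2 = split_at_anchors(seq2, anchors)

# Each pair is one tile: a small sub-matrix that can be aligned on its own.
tiles = list(zip(pieces1, pieces2))
for left, right in tiles:
    print(left, "|", right)
```

Because each tile is much smaller than the full matrix, it is more likely to fit in device memory, and the tiles do not depend on one another.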

SLIDE 16
SLIDE 17

Experiments

SLIDE 18

Results

Execution Time (Seconds) / Speedup

Seq-Length        Serial   Simple Wave-front   Streaming           Tile-based
                  Time     Time     Speedup    Time     Speedup    Time     Speedup
100         DW    0.73     0.37     1.97       0.38     1.92       0.28     2.61
            RW    0.017    0.007    2.42       0.02     0.85       0.006    2.83
            DL    0.063    0.008    7.87       0.023    2.74       0.007    9
            RL    0.027    0.007    3.86       0.023    1.17       0.007    3.86
200         DW    2.34     0.39     6          0.44     5.32       0.39     6
            RW    0.05     0.03     1.67       0.061    0.82       0.028    1.79
            DL    0.324    0.035    9.26       0.065    4.98       0.029    11.17
            RL    0.142    0.035    4.06       0.065    2.18       0.029    4.9
300         DW    5.89     0.42     14.02      0.46     12.8       0.43     13.7
            RW    0.12     0.068    1.76       0.1      1.2        0.055    2.18
            DL    0.647    0.07     9.26       0.112    5.78       0.054    11.98
            RL    0.283    0.068    4.16       0.116    2.44       0.054    5.24
400         DW    9.93     0.50     19.86      0.52     19.1       0.45     22.07
            RW    0.21     0.13     1.61       0.159    1.32       0.098    2.14
            DL    1.112    0.12     9.27       0.2      5.56       0.099    11.23
            RL    0.485    0.122    3.98       0.174    2.79       0.097    5
500         DW    15.9     0.54     29.44      0.54     29.44      0.52     30.58
            RW    0.34     0.19     1.78       0.239    1.42       0.174    1.95
            DL    1.783    0.198    9          0.262    6.8        0.155    11.5
            RL    0.783    0.191    4.10       0.251    3.12       0.153    5.12
1000        DW    62.1     0.99     62.73      1.10     56.45      0.86     72.21
            RW    1.34     0.64     2.09       0.686    1.95       0.554    2.42
            DL    6.98     0.64     10.91      0.725    9.63       0.53     13.17
            RL    3.07     0.635    4.83       0.62     4.952      0.512    6.0

Test platform: Intel Core 2 (CPU), NVIDIA GeForce 9800 (GPU)
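The speedup figures in the results appear to be the serial time divided by the parallel time. As a quick check under that assumption, the sketch below recomputes the speedups for the length-100 DW row (serial 0.73 s); the function and variable names are illustrative.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup of a parallel run relative to the serial baseline."""
    return serial_seconds / parallel_seconds

serial = 0.73  # length-100 DW row, serial time in seconds
parallel_times = {
    "simple wave-front": 0.37,
    "streaming": 0.38,
    "tile-based": 0.28,
}

speedups = {name: speedup(serial, t) for name, t in parallel_times.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.2f}")
```

A value below 1, as in some RW streaming rows, means the parallel version was slower than the serial one for that case.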

SLIDE 19

Test of Streaming Algorithm

[Figure: percentage of time (0% to 100%) spent on computation versus transmission for DW, RW, DL, and RL at sequence lengths 100, 200, 300, 400, 500, and 1000.]

SLIDE 20

Test of Tile-based Algorithm

SLIDE 21

Test of Long Sequences

[Figure: two panels plotting time (seconds) against sequence length (1800 to 4800) for the tile-based and streaming algorithms on Windows and Linux. Panel titles: "Under Linux System" and "Windows System".]

SLIDE 22

Future work

1: Experiments on multiple GPUs
2: New architectures such as Fermi

Q&A?