
IEEE TRANSACTIONS ON COMPUTERS, VOL. 59, NO. 1, JANUARY 2010, p. 29

Network-on-Chip Hardware Accelerators for Biological Sequence Alignment

Souradip Sarkar, Student Member, IEEE, Gaurav Ramesh Kulkarni, Student Member, IEEE, Partha Pratim Pande, Member, IEEE, and Ananth Kalyanaraman, Member, IEEE

Abstract: The most pervasive compute operation carried out in almost all bioinformatics applications is pairwise sequence homology detection (or sequence alignment). Due to exponentially growing sequence databases, computing this operation at a large scale is becoming expensive. An effective approach to speed up this operation is to integrate a very high number of processing elements in a single chip so that the massive scales of fine-grain parallelism inherent in several bioinformatics applications can be exploited efficiently. Network-on-Chip (NoC) is a very efficient method to achieve such large-scale integration. In this work, we propose to bridge the gap between data generation and processing in bioinformatics applications by designing NoC architectures for the sequence alignment operation. Specifically, we 1) propose optimized NoC architectures for different sequence alignment algorithms that were originally designed for distributed memory parallel computers and 2) provide a thorough comparative evaluation of their respective performance and energy dissipation. While accelerators using other hardware architectures such as FPGA, General Purpose Graphics Processing Unit (GPU), and the Cell Broadband Engine (CBE) have been previously designed for sequence alignment, the NoC paradigm enables integration of a much larger number of processing elements on a single chip and also offers a higher degree of flexibility in placing them along the die to suit the underlying algorithm. The results show that our NoC-based implementations can provide above 10^2- to 10^3-fold speedup over other hardware accelerators and above 10^4-fold speedup over traditional CPU architectures. This is significant because it will drastically reduce the time required to perform the millions of alignment operations that are typical in large-scale bioinformatics projects. To the best of our knowledge, this work embodies the first attempt to accelerate a bioinformatics application using NoC.

Index Terms: Network-on-chip, bioinformatics, DNA/protein sequence alignment, on-chip parallelism, hardware acceleration.

1 INTRODUCTION

The bioinformatics community faces a daunting challenge today because the rate of data generation is rapidly outpacing the rate at which it can be computationally processed. Propelled by recent technological breakthroughs in high-throughput DNA and protein sequencing, experimentalists are generating data at unprecedented rates. For example, the GenBank database [1], which is the largest public repository for molecular sequence data, is continuing to double in size every 18 months since its inception in the early 1990s. Sequence data are subsequently used in further computational analysis that can ultimately lead to the discovery and fundamental understanding of the genetic composition in organisms.

The most predominant compute operation that is carried out in nearly all sequence analysis projects is pairwise sequence alignment, which aims at measuring the similarity between two DNA or protein "sequences" (represented as strings over a fixed alphabet). The operation is performed using a dynamic programming (DP) algorithm [2], [3] that computes a two-dimensional table, with rows and columns representing the character sequence of the two strings being compared. The operation is used almost on a daily basis by molecular biologists, and also in all genome sequencing projects of any scale. While the task of carrying out a single pairwise sequence comparison is in itself computationally lightweight (in milliseconds) on traditional machines, performing millions to billions of such comparisons could become easily prohibitive. For example, a recent analysis that computed pairwise alignments for over 28 million metagenomic sequences [4] took an aggregate of 10^6 CPU hours, a task that took months to complete after parallelization at the coarse level using a combination of 2,300 processors and high-end memory systems. Our experiments show that running the multiple sequence alignment tool ClustalW [5] even on hundreds of sequences requires several hours on state-of-the-art workstations.

To speed up the data processing, several hardware accelerators have been proposed recently including, but not limited to, [6], [7], [8], [9], [10]. Among these, the use of FPGA-based reconfigurable hardware platforms, Graphics Processing Unit (GPU), and Cell Broadband Engine (CBE) is notable. The principal advantages of using FPGA-, GPU-, or CBE-based systems are fast prototyping and ease of implementation. These systems primarily rely on software and use an existing hardware platform to map algorithms. In the aforementioned schemes, as the basic hardware is not tailor-made for the applications under consideration, the achievable speedup is typically limited; the accelerators built for sequence alignment based on FPGAs, GPUs, and

The authors are with the School of Electrical Engineering and Computer Science, Washington State University, PO Box 642752, Pullman, WA 99164-2752. E-mail: (ssarkar, gkulkarn, pande, ananth)@eecs.wsu.edu.

Manuscript received 10 Nov. 2008; revised 11 Mar. 2009; accepted 16 Apr. 2009; published online 10 Sept. 2009. Recommended for acceptance by R. Marculescu. For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-2008-11-0565. Digital Object Identifier no. 10.1109/TC.2009.133.

0018-9340/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society.
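As an illustration of the two-dimensional DP table described in the introduction, the following is a minimal sketch of a global alignment score computation. It is not the authors' implementation: the function name alignment_score, the linear gap penalty, and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen only for the example.

```python
def alignment_score(a: str, b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the best score for aligning a[:i] with b[:j].
    table = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,               # gap in b
                table[i][j - 1] + gap,               # gap in a
            )
    return table[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))
```

Each cell depends only on its upper, left, and upper-left neighbors, so cells on the same anti-diagonal can be computed independently of one another. This is one common source of fine-grain parallelism in this kind of table; whether the architectures in this paper exploit it in this way is not stated in the text above.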
