Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA
U N C L A S S I F I E D
SWAMP+: Enhanced Smith-Waterman Search for Parallel Models
Shannon Steinfadt, Ph.D.
Los Alamos National Laboratory
shannon@lanl.gov
Slide 2 Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA - LA-UR-12-20189
gcggacgctccacg-tgtc--c---ct-cgccgcgccc-cgtctacc gggccctcctggctcccaacagcttctcagttc ccacttc ||:|:||||::|-|::|--|--||-|-|:|:|::| ||-|:||
Slide 3
Query:IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA Subject:MFCVQCEQTIRTPAGNGCSYAQGMCGKTAETSDLQDLLIAALQGLSAWAVKAREYGIINHDVDSFAPRAFFST
LTNVNFDSPRIVGYAREAIALREALKAQCLAVDANARVDNPMADLQLVSDDLGELQRQAAEFTPNKDKAAIGENILGLRL LCLYGLKGAAAYMEHAHVLGQYDNDIYAQYHKIMAWLGTWPADMNALLECSMEIGQMNFKVMSILDAGETGKYGHPTPTQ VNVKATAGKCILISGHDLKDLYNLLEQTEGTGVNVYTHGEMLPAHGYPELRKFKHLVGNYGSGWQNQQVEFARFPGPIVM TSNCIIDPTVGAYDDRIWTRSIVGWPGVRHLDGDDFSAVITQAQQMAGFPYSEIPHLITVGFGRQTLLGAADTLIDLVSR EKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGKYRFNKLEFGDIEGLPRLVDAGQCNDAYSAIILAVTLAE KLGCGVNDLPLSLVLSWFEQKAIVILLTLLSLGVKNIVTGPTAPGFLTPDLLAVLNEKFGLRSITTVEEDMKQLLSA
Slide 4
Query:   VIA-EPYRE-RLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDK
: : :: : :: : : : :
Subject: LVSREKLRHIFLLGGCDGARGERHYFTDFATSVPDDCLILTLACGK
Slide 5
(derived by humans) (preserved by evolution)
Homologous Sequences
Slide 18
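\[
\begin{aligned}
D_{i,j} &= \max\{\, D_{i-1,j},\; C_{i-1,j} - \sigma \,\} - g \\
I_{i,j} &= \max\{\, I_{i,j-1},\; C_{i,j-1} - \sigma \,\} - g \\
C_{i,j} &= \max\{\, D_{i,j},\; I_{i,j},\; C_{i-1,j-1} + d(S1_i, S2_j),\; 0 \,\}
\end{aligned}
\]
g: gap extension cost; σ: gap opening cost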
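A minimal sequential sketch of a Smith-Waterman scoring pass with affine gaps may help make the recurrence concrete. It assumes the common Gotoh-style form with three matrices (D for deletions, I for insertions, C for the combined score), a gap opening cost σ (`sigma`) and a gap extension cost `g`. The match/mismatch values standing in for d(S1_i, S2_j) and the function name are illustrative assumptions, not taken from the slides.

```python
def sw_affine_score(s1, s2, match=2, mismatch=-1, sigma=2, g=1):
    """Return (best_score, (i, j)) of the best local alignment end cell.

    Assumed recurrence (Gotoh-style affine gaps, g > 0):
      D[i][j] = max(D[i-1][j], C[i-1][j] - sigma) - g
      I[i][j] = max(I[i][j-1], C[i][j-1] - sigma) - g
      C[i][j] = max(D[i][j], I[i][j], C[i-1][j-1] + d(s1[i], s2[j]), 0)
    """
    m, n = len(s1), len(s2)
    # Row 0 and column 0 start at zero, as in a local alignment.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    I = [[0] * (n + 1) for _ in range(m + 1)]
    C = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j], C[i - 1][j] - sigma) - g
            I[i][j] = max(I[i][j - 1], C[i][j - 1] - sigma) - g
            # Illustrative substitution score: a real search would use a
            # scoring matrix such as BLOSUM for proteins.
            d = match if s1[i - 1] == s2[j - 1] else mismatch
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + d, 0)
            if C[i][j] > best:
                best, best_pos = C[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(sw_affine_score("ACGTACGT", "ACGTTACGT"))
```

With these illustrative parameters a single-character gap costs σ + g = 3, so eight matches separated by one gap score 16 - 3 = 13.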
Slide 21
and Seeberg
Slide 22
Sequential matrix values
Slide 23
Tilted data arrangement to parallelize and process a diagonal at a time.
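The reason a diagonal can be processed at once is that every cell (i, j) depends only on its north, west, and north-west neighbours, which all lie on the two previous anti-diagonals. The sketch below lists the cells of each anti-diagonal for an m x n matrix; the function name and the 1-based cell indexing are assumptions made for illustration.

```python
def anti_diagonals(m, n):
    """Yield the cells (i, j), 1-based, of each anti-diagonal i + j = k.

    All cells in one yielded list are mutually independent, so they could
    be computed simultaneously, one cell per processing element.
    """
    for k in range(2, m + n + 1):
        first_row = max(1, k - n)
        last_row = min(m, k - 1)
        yield [(i, k - i) for i in range(first_row, last_row + 1)]


if __name__ == "__main__":
    for step, cells in enumerate(anti_diagonals(2, 3), start=1):
        print(step, cells)
```

For m = 2 and n = 3 this yields m + n - 1 = 4 steps instead of the m * n = 6 steps of a cell-by-cell sweep.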
Slide 27
Used PEs Unused PEs Order of Computations
Slide 28
SIMD with special associative features
Fine-grained parallelism
Designed for fast associative searches
Content-based searches, not memory address
Slide 30
1) Read in S1 and S2
In Active PEs (those with data values for S1 or S2):
2) Initialize the two-dimensional variables D[$], I[$], C[$] to zeros.
3) “Shift” or slide string S2 to create a tilted matrix
4) For every anti_diagonal (a_d) from 2 to m+n-1 do in parallel {
5)    If S2[$,a_d] is valid (S2[$,a_d] ≠ “@” and S2[$,a_d] ≠ “-”) then {
6.1)     Calculate score for a deletion for D[$,a_d]
6.2)     Calculate score for an insertion for I[$,a_d]
6.3)     Calculate matrix score for C[$,a_d] }
7)    local_maxPE = MAXDEX(C[$, a_d])
8)    if C[local_maxPE, a_d] > max_Val then {
9.1)     max_PE = local_maxPE
9.2)     max_Val = C[local_maxPE, a_d] } }
10) Return max_Val, max_PE
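The steps above can be emulated sequentially to check the logic. In this sketch each row index i plays the role of one PE, the inner loop stands in for the lockstep parallel update of an anti-diagonal, and a `max` over the active PEs stands in for the MAXDEX reduction. The scoring parameters, the function name, and the affine-gap recurrence used in steps 6.1 to 6.3 are assumptions; this is plain Python, not ASC code.

```python
def swamp_wavefront(s1, s2, match=2, mismatch=-1, sigma=2, g=1):
    """Emulate the anti-diagonal sweep; return (max_val, (row, col))."""
    m, n = len(s1), len(s2)
    if m == 0 or n == 0:
        return 0, (0, 0)
    # Step 2: D, I and C start at zero.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    I = [[0] * (n + 1) for _ in range(m + 1)]
    C = [[0] * (n + 1) for _ in range(m + 1)]
    max_val, max_cell = 0, (0, 0)
    # Step 4: one iteration per anti-diagonal (cells with i + j == k).
    for k in range(2, m + n + 1):
        # Step 5: only PEs holding a valid character on this diagonal work.
        active = range(max(1, k - n), min(m, k - 1) + 1)
        for i in active:  # conceptually simultaneous across PEs
            j = k - i
            D[i][j] = max(D[i - 1][j], C[i - 1][j] - sigma) - g   # 6.1
            I[i][j] = max(I[i][j - 1], C[i][j - 1] - sigma) - g   # 6.2
            d = match if s1[i - 1] == s2[j - 1] else mismatch
            C[i][j] = max(D[i][j], I[i][j], C[i - 1][j - 1] + d, 0)  # 6.3
        # Step 7: index of the PE holding the diagonal's maximum.
        local_pe = max(active, key=lambda i: C[i][k - i])
        # Steps 8 and 9: keep the running maximum and where it occurred.
        if C[local_pe][k - local_pe] > max_val:
            max_val = C[local_pe][k - local_pe]
            max_cell = (local_pe, k - local_pe)
    return max_val, max_cell


if __name__ == "__main__":
    print(swamp_wavefront("ACGTACGT", "ACGTTACGT"))
```

Because cells on one anti-diagonal read only values from the two previous diagonals, the order of the inner loop does not affect the result.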
Slide 31
Used PEs Unused PEs Order of Computations
Slide 33
measurements using ASC language and emulator.
shown with the dashed line.
Predictions calculated using linear regression and the least squares method.
Slide 34
- Spatial information
- Length of comparisons
- Identify regulatory regions and motifs
Slide 36
Associative SIMD Model - ASC
ctcgccgcgc ggcggacgct ccacgtgtcc cccgtctacc gggccctcct ggctcccaac agcttctcag ttcccacttc
Slide 37
50 GFLOPS peak performance
25 W average power dissipation
Slide 38
- Maximum
- Any Responders
- Pick One
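As a rough illustration of these three associative operations, the sketch below models each PE as one list position and a responder set as a list of booleans. The function names and the choice of "lowest-numbered responder" for Pick One are assumptions for illustration; real ASC hardware or emulators may define the details differently.

```python
def maximum(values, responders):
    """Largest value among the PEs that are currently responders."""
    return max(v for v, r in zip(values, responders) if r)


def any_responders(responders):
    """True if at least one PE matched the last search."""
    return any(responders)


def pick_one(responders):
    """Index of one responding PE (here: the lowest-numbered), or None."""
    return next((i for i, r in enumerate(responders) if r), None)


if __name__ == "__main__":
    scores = [3, 9, 4, 9]                   # one value per PE
    everyone = [True] * len(scores)
    top = maximum(scores, everyone)         # global maximum
    hits = [v == top for v in scores]       # content-based search for it
    if any_responders(hits):
        print(top, pick_one(hits))
```

Finding the PE that holds the best alignment score then takes a constant number of such operations rather than a scan over memory addresses.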
Slide 39
1) Read in S1 and S2
In Active PEs (those with data values for S1 or S2):
2) Initialize Row 0, Col 0 variables D[$], I[$], C[$] to zeros.
3) For each PE, shift S2 down 1, copy entire string
4) For every a_d from 2 to m+n-1 do in parallel {
5)    If S2[$,a_d] ≠ “@” and S2[$,a_d] ≠ “-” then {
6.1)     Calculate score for deletion and insertion for D[$,a_d]; Calculate matrix score for C[$,a_d] }
7)    local_maxPE = getPE(max_int(C[$, a_d]))
8)    if C[local_maxPE] > max_Val then {
9.1)     max_PE = local_maxPE
9.2)     max_Val = C[local_maxPE] } }
10) Return max_Val, max_PE
11) Perform traceback
12) Mark aligned values
13) Run alignment for all k values (2-12)
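Steps 11 to 13 can be read as: trace back the best alignment, mark the aligned characters so they cannot match again, and repeat to obtain further, non-overlapping sub-alignments. The sketch below is one possible interpretation under stated assumptions: it uses a linear gap cost for brevity (not the affine scheme), sentinel objects for masking, and hypothetical `mask_s1` / `mask_s2` flags to choose which sequence is masked. None of these names come from the slides.

```python
MASK_1 = object()  # marks used positions in the first sequence
MASK_2 = object()  # distinct sentinel, so two masked positions never match


def local_align(s1, s2, match=2, mismatch=-1, gap=2):
    """Smith-Waterman with linear gaps; return (score, aligned index pairs)."""
    m, n = len(s1), len(s2)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = match if s1[i - 1] == s2[j - 1] else mismatch
            C[i][j] = max(0, C[i - 1][j - 1] + d,
                          C[i - 1][j] - gap, C[i][j - 1] - gap)
            if C[i][j] > best:
                best, bi, bj = C[i][j], i, j
    # Traceback from the best cell until a zero score is reached.
    pairs, i, j = [], bi, bj
    while i > 0 and j > 0 and C[i][j] > 0:
        d = match if s1[i - 1] == s2[j - 1] else mismatch
        if C[i][j] == C[i - 1][j - 1] + d:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif C[i][j] == C[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def k_alignments(s1, s2, k, mask_s1=True, mask_s2=True):
    """Return up to k (score, pairs) results, masking aligned positions."""
    a, b = list(s1), list(s2)
    results = []
    for _ in range(k):
        score, pairs = local_align(a, b)
        if score == 0:          # nothing left to align
            break
        results.append((score, pairs))
        for i, j in pairs:      # mark aligned values
            if mask_s1:
                a[i] = MASK_1
            if mask_s2:
                b[j] = MASK_2
    return results


if __name__ == "__main__":
    for score, pairs in k_alignments("ACGT", "ACGTACGT", 2, mask_s1=False):
        print(score, pairs)
```

Masking only the second sequence lets one region of the first sequence align to several regions of the second; masking both forces every sub-alignment to use fresh characters on both sides.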
Slide 41
Figure: Average Calculation with Eight Highest Outliers Removed. y-axis: Cycle Counts (10000 to 100000); x-axis: Sequence Lengths (10 to 96). Series: 1 Alignment; 2 Alignments (1st, 2nd); 3 Alignments (1st to 3rd); 4 Alignments (1st to 4th); 5 Alignments (1st to 5th).
Slide 42
Figure: Average Traceback Cycle Counts for all Alignments. y-axis: cycle counts (50000 to 400000); x-axis: Sequence Lengths (10 to 96). Series: 1 Alignment; 2 Alignments (1st, 2nd); 3 Alignments (1st to 3rd); 4 Alignments (1st to 4th); 5 Alignments (1st to 5th).
First (Longest) Traceback Cycle Counts Shorter (2 through k)
Alignments in this instance
Slide 45
Consists of a personality & application
768 GCUPS peak from single machine
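GCUPS (giga cell updates per second) is the usual throughput measure for Smith-Waterman implementations: the number of dynamic-programming cells filled, divided by the run time in seconds and by 10^9. The helper names below are illustrative, and the sequence sizes in the usage example are made-up round numbers rather than measurements.

```python
def gcups(query_len, db_len, seconds):
    """Billions of matrix cells updated per second for one search."""
    return (query_len * db_len) / (seconds * 1e9)


def seconds_at_rate(query_len, db_len, rate_gcups):
    """Ideal run time if the given GCUPS rate were sustained."""
    return (query_len * db_len) / (rate_gcups * 1e9)


if __name__ == "__main__":
    # Hypothetical workload: a 1,000-residue query against 10^9 residues.
    print(gcups(1_000, 1_000_000_000, 10.0))
    print(seconds_at_rate(1_000, 1_000_000_000, 768))
```

A peak figure is an upper bound; sustained rates on real data depend on sequence lengths and I/O.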
Slide 46
searches
Slide 47
4 AEs each contain 4 tiles
short strings
‡Slide courtesy of Convey Computer
Slide 48
‡Graph courtesy of Convey Computer
Slide 49
1) Run parallel Convey cnysws application with traceback for two files containing sequences (S1^) and database queries (S2^)
2) Capture alignment information for all query and database matches
For each query:
   For each database match:
3)       Copy query and sequence into altered{Query,Database}_n_n.faa
4)       Mark aligned bases for each query-database match
While # of iterations < k-1:
   For each pair of files created in instruction 3:
5)       Run cnysws
6)       Repeat Step 2
7)       If score for hit * δ < current score:
7.1)        Track match
7.2)        Mark aligned bases as matched for query-database match
8) Output the k sub-alignments
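The acceptance test in step 7 can be sketched in isolation. The reading assumed here is that "score for hit" is the score of the first (best) alignment for a query-database pair, and that a later sub-alignment is tracked only when its score exceeds that score scaled by δ. The function name, the value of δ, and the scores are hypothetical; no cnysws invocation is shown because its command-line interface is not given in the slides.

```python
def track_subalignments(first_score, later_scores, delta, k):
    """Return the scores kept as the k sub-alignments (at most k values).

    first_score  : score of the initial alignment (steps 1 and 2)
    later_scores : scores from the re-runs on the masked files, in order
    delta        : fraction of the first score a later hit must exceed
    """
    kept = [first_score]
    for current in later_scores[: max(k - 1, 0)]:   # at most k-1 iterations
        if first_score * delta < current:           # step 7
            kept.append(current)                    # step 7.1
    return kept


if __name__ == "__main__":
    # Hypothetical scores: 80 and 60 pass a threshold of 100 * 0.5 = 50.
    print(track_subalignments(100, [80, 40, 60], delta=0.5, k=4))
```

A larger δ keeps only sub-alignments nearly as strong as the first; a smaller δ admits weaker secondary matches.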
Slide 50
Figure: Average Computation Times. Axes: Time (seconds) and Sequence Length.
Externally
*Other files are single sequence to sequence
2 amino acid (AA) queries
Slide 51
- Makes use of Farrar extensions with SSE as well as North neighbor approximations
evolutionary models of similarity
Slide 52
- Single-to-multiple
- Multiple-to-single
- Multiple-to-multiple
Slide 53
system in addition to compute time
Slide 54
Biology, vol. 162, pp. 705-708, Dec 15 1982.
Search Tool," Journal of Molecular Biology, vol. 215, pp. 403-410, 1990.
BMC Bioinformatics, vol. 12, 2011.
Applications in the Biosciences (CABIOS), vol. 13, pp. 145 - 150, 1997.
implementations," Bioinformatics (Oxford, England), vol. 23, pp. 156-161, Jan 15 2007.
using parallel processing on common microprocessors," Bioinformatics (Oxford, England), vol. 16,
…Please see paper for full reference list
Slide 55
Contact Info:
shannon@lanl.gov http://www.SwampAlign.com