SLIDE 1

Engineering Motif Search for Large Motifs

Petteri Kaski¹, Juho Lauri², Suhas Thejaswi¹

¹ Department of Computer Science, Aalto University, Espoo, Finland
² Nokia Bell Labs, Dublin, Ireland

Symposium on Experimental Algorithms (SEA 2018), L'Aquila, Italy, Friday 29 June 2018

SLIDE 2

Summit — Oak Ridge National Laboratory ($200M)

27,648 × NVIDIA GV100 GPUs, 141,557,760 cores, 15 MW

SLIDE 3

NVIDIA DGX-1 — Aalto University ($100K)

8 × NVIDIA GV100 GPUs, 40,960 cores, 3 kW

SLIDE 4

  • ∼2400 GHz of GF(2^8) field multiplications
    — 2400 GHz = 2.4 trillion multiplications/sec
  • ∼6 terabytes/sec of memory bandwidth
    — at 1 bit = 1 cm^2, 6 terabytes ≈ 4800 km^2
  • the Rome metropolitan city area is 5,352 km^2

Source: Wikipedia

SLIDE 5

Outline

  • Background on motif search
  • Engineering a practical implementation of constrained multilinear sieving for massively vector-parallel microarchitectures (shared-memory multi-GPU systems)
  • Experiments

What do we want?

  • vector parallelization
  • saturate memory bandwidth
  • offload to multiple GPUs
SLIDE 6

Motif search problem

Data: a vertex-colored graph H (the host graph)

Query: a multiset M of colors (the motif)

Question: does H contain a connected subgraph whose multiset of vertex colors is exactly M?
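As a concrete (and exponential-time) point of reference, the decision problem can be checked by brute force on tiny instances: enumerate connected vertex subsets of size |M| and compare color multisets. The sketch below is illustrative only and is not the authors' algorithm or code; all names and limits (MAXN, MAXC) are assumptions.

```c
#include <stdbool.h>

enum { MAXN = 16, MAXC = 8 };

/* Is the vertex set `mask` connected in the subgraph induced by `mask`? */
static bool connected_mask(int n, bool adj[MAXN][MAXN], unsigned mask) {
    int start = -1;
    for (int v = 0; v < n; v++)
        if (mask & (1u << v)) { start = v; break; }
    if (start < 0) return false;
    unsigned seen = 1u << start;
    bool grew = true;
    while (grew) {                      /* closure over vertices in `mask` */
        grew = false;
        for (int u = 0; u < n; u++) {
            if (!(seen & (1u << u))) continue;
            for (int v = 0; v < n; v++)
                if ((mask & (1u << v)) && adj[u][v] && !(seen & (1u << v))) {
                    seen |= 1u << v;
                    grew = true;
                }
        }
    }
    return seen == mask;
}

/* Does the host graph (adjacency matrix adj, vertex colors color[]) contain
 * a connected subgraph whose color multiset equals the motif of size k? */
bool has_motif(int n, bool adj[MAXN][MAXN], const int color[MAXN],
               const int motif[], int k) {
    int want[MAXC] = {0};
    for (int i = 0; i < k; i++) want[motif[i]]++;
    for (unsigned mask = 1; mask < (1u << n); mask++) {
        int pc = 0;
        for (int v = 0; v < n; v++)
            if (mask & (1u << v)) pc++;
        if (pc != k) continue;          /* wrong subset size */
        int have[MAXC] = {0};
        for (int v = 0; v < n; v++)
            if (mask & (1u << v)) have[color[v]]++;
        bool same = true;
        for (int c = 0; c < MAXC; c++)
            if (have[c] != want[c]) { same = false; break; }
        if (same && connected_mask(n, adj, mask)) return true;
    }
    return false;
}
```

For example, on the path 0–1–2 with colors {0, 1, 0}, the motif {0, 1} matches (vertices 0 and 1) but {0, 0} does not: vertices 0 and 2 carry the right colors yet are not connected.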

SLIDE 7

Data, query, and one match

SLIDE 8

Complexity

  • NP-complete if M has at least two colors
  • Fixed-parameter tractable (FPT)
  • Solvable in time linear in the size of H (but exponential in the size of M)

Shown to be FPT by Fellows, Fertin, Hermelin, Vialette, ICALP 2007

SLIDE 9

FPT race

Authors             | Time complexity       | Venue
--------------------+-----------------------+-----------
Fellows et al.      | O(∼87^k poly(n, m))   | ICALP 2007
Betzler et al.      | O(4.32^k poly(n, m))  | CPM 2008
Guillemot & Sikora  | O(4^k poly(n, m))     | MFCS 2010
Koutis              | O(2.54^k poly(n, m))  | IPL 2012
Björklund et al.    | O(2^k k^2 m)          | STACS 2013

n — number of vertices, m — number of edges, k — motif size

An O*((2 − ε)^k) algorithm for motif search would imply an O*((2 − δ)^n) algorithm for set cover.

SLIDE 10

Algorithm

SLIDE 14

Constrained multilinear sieving

Converting a combinatorial problem into an algebraic one: detecting a multilinear monomial in a multivariate polynomial

  • Björklund, Kaski and Kowalik, STACS 2013 / Algorithmica 2016
  • Randomized decision algorithm (YES/NO)
  • A YES answer is always correct — no false positives
  • A NO answer is a false negative with probability at most k · 2^(−b+1)

Arithmetic over GF(2^b)

SLIDE 18

High-level algorithm (Björklund, Kaski, Kowalik)

Output YES if and only if the sum of 2^k evaluations of a multivariate polynomial P(x, y) is non-zero

  • 2^k evaluations at 2^k points (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(2^k), y^(2^k)) that depend on the motif M and on random bits
  • The multivariate polynomial is defined by the host graph H; it has n + 2(k − 1)m variables and degree 2k − 1
  • One evaluation takes O(k^2 m M(b)) time
  • Overall runtime O(2^k k^2 m M(b))

M(b) — the complexity of multiplication in GF(2^b)

SLIDE 19

CPU implementation (ALENEX 2015)

Handles large graphs — but how about large motifs? The complexity is exponential in the motif size.

Open source — https://github.com/pkaski/motif-search

SLIDE 20

Design considerations — shared-memory multi-GPU systems

Positives

  • High arithmetic throughput and memory bandwidth
  • Massive vector-parallelism
    — for example, the NVIDIA DGX-1 has ∼40,000 cores

Negatives

  • High memory latency
    — the bandwidth comes at the cost of latency
  • Lack of hardware support for finite-field arithmetic
    — the PCLMULQDQ instruction speeds up finite-field arithmetic on CPUs

SLIDE 21

Design considerations — shared-memory multi-GPU systems

Using the available bandwidth

  • Keeping the pipeline busy
    — memory accesses and arithmetic operations proceed simultaneously
    — hide latency with enough instructions "in flight"
  • Coalesced memory access
    — access memory in blocks
  • Bit-sliced finite-field arithmetic
    — to overcome the lack of hardware support
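Bit-slicing can be sketched as follows: store w field elements across the w lanes of a set of machine words, with word i holding bit i of every lane, so that one GF(2^8) multiplication becomes a fixed sequence of AND/XOR operations executed on all lanes at once. The CPU sketch below (64 lanes in uint64_t words) is illustrative and is not the paper's GPU code; the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B, the AES polynomial) is an assumption made for the example.

```c
#include <stdint.h>

/* Scalar reference: GF(2^8) multiply, reduction poly x^8+x^4+x^3+x+1. */
uint8_t gf256_mul_ref(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        uint8_t hi = a & 0x80;
        a <<= 1;
        if (hi) a ^= 0x1B;   /* reduce the overflowed x^8 term */
        b >>= 1;
    }
    return r;
}

/* Bit-sliced GF(2^8) multiply: 64 independent multiplications at once.
 * a[i] holds bit i of each of the 64 lanes (likewise b and r). */
void gf256_mul_sliced(uint64_t r[8], const uint64_t a[8], const uint64_t b[8]) {
    uint64_t t[15] = {0};
    /* Schoolbook polynomial multiplication over GF(2): AND is the
     * coefficient product, XOR the coefficient sum, on all lanes. */
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            t[i + j] ^= a[i] & b[j];
    /* Fold degrees 14..8 back using x^8 = x^4 + x^3 + x + 1. */
    for (int d = 14; d >= 8; d--) {
        uint64_t c = t[d];
        t[d - 4] ^= c;   /* x^4 */
        t[d - 5] ^= c;   /* x^3 */
        t[d - 7] ^= c;   /* x^1 */
        t[d - 8] ^= c;   /* x^0 */
    }
    for (int i = 0; i < 8; i++) r[i] = t[i];
}

/* Convenience: run one multiplication through lane 0 of the sliced form. */
uint8_t gf256_mul_via_slices(uint8_t x, uint8_t y) {
    uint64_t a[8], b[8], r[8];
    for (int i = 0; i < 8; i++) {
        a[i] = (uint64_t)((x >> i) & 1);
        b[i] = (uint64_t)((y >> i) & 1);
    }
    gf256_mul_sliced(r, a, b);
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) out |= (uint8_t)((r[i] & 1) << i);
    return out;
}
```

The point of the sliced form is that the inner loop contains only word-wide AND/XOR, which maps directly onto GPU integer units without any hardware carry-less multiply.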

SLIDE 22

Vertex-localized sieve

Base case, for all i ∈ [n] and L ⊆ [k]:

    P_{i,1}(ζ_L, α) = ζ^L_i

For each s = 2, 3, …, k, i ∈ [n], and L ⊆ [k]:

    P_{i,s}(ζ_L, α) = Σ_{j ∈ Γ_H(i)} α_{s,(i,j)} · Σ_{s1+s2=s, s1,s2 ≥ 1} P_{i,s1}(ζ_L, α) · P_{j,s2}(ζ_L, α)

Finally, sum at each vertex:

    Q_{i,k}(µ, ν, α) = Σ_{L ⊆ [k]} P_{i,k}(ζ_L, α)

Parallelization over vertices i ∈ [n] (n threads) and over L ⊆ [k] (2^k threads); the parallelization over L vectorizes up to 2^k threads.
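The recurrence above can be sketched as a sequential dynamic program for a single evaluation point (one fixed L). This toy C version over GF(2^8) is illustrative only: the array names, the field, and the reduction polynomial 0x11B are assumptions, and the real implementation parallelizes the same computation over i and L.

```c
#include <stdint.h>

enum { MAXN = 8, MAXK = 8 };

/* GF(2^8) multiply, reduction poly x^8+x^4+x^3+x+1 (an assumption). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        uint8_t hi = a & 0x80;
        a <<= 1;
        if (hi) a ^= 0x1B;
        b >>= 1;
    }
    return r;
}

/* One evaluation point of the vertex-localized sieve, bottom-up:
 *   P[1][i] = z[i]                                 (z[i] stands in for ζ^L_i)
 *   P[s][i] = sum_{j in N(i)} alpha[s][i][j] *
 *             sum_{s1+s2=s, s1,s2>=1} P[s1][i] * P[s2][j]
 * Addition in GF(2^8) is XOR. */
void sieve_eval(int n, int k,
                uint8_t adj[MAXN][MAXN],             /* 1 iff edge {i,j} */
                const uint8_t z[MAXN],
                uint8_t alpha[MAXK + 1][MAXN][MAXN],
                uint8_t P[MAXK + 1][MAXN]) {
    for (int i = 0; i < n; i++) P[1][i] = z[i];
    for (int s = 2; s <= k; s++)
        for (int i = 0; i < n; i++) {
            uint8_t acc = 0;
            for (int j = 0; j < n; j++) {
                if (!adj[i][j]) continue;            /* j ranges over Γ_H(i) */
                uint8_t inner = 0;
                for (int s1 = 1; s1 < s; s1++)       /* s2 = s - s1 */
                    inner ^= gf_mul(P[s1][i], P[s - s1][j]);
                acc ^= gf_mul(alpha[s][i][j], inner);
            }
            P[s][i] = acc;
        }
}
```

On a two-vertex path with all relevant α set to 1 and z = {2, 3}, the recurrence gives P[2][0] = 2 · 3 = 6 in GF(2^8), matching a hand computation.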

SLIDE 23

Workloads and uniformity

D workers work on each project (vertex): on the CPU the workers are CPU threads, on the GPU they are GPU threads.

  • D divides 2^k; execution in each CPU thread is mostly independent
  • All threads (typically 32) of a GPU warp execute the same instructions

SLIDE 24

Workloads and uniformity

Each of the n vertices is assigned D worker threads:

  • workloads of shape n × D (single GPU)
  • workloads of shape M × n × D (M GPUs)

Each project (vertex) has a different completion time.

SLIDE 25

Memory layout and coalescence

Coalescence — each thread does the same "boring" work.

Privates L and L′ work on the same vertex i, evaluating the same recurrence

    P_{i,s}(ζ_L, α) = Σ_{j ∈ Γ_H(i)} α_{s,(i,j)} · Σ_{s1+s2=s, s1,s2 ≥ 1} P_{i,s1}(ζ_L, α) · P_{j,s2}(ζ_L, α)

for L and L′, respectively: each thread executes the same sequence of instructions on different data.

SLIDE 26

Memory layout and coalescence

Platoons C and C′ work on distinct vertices i and i′, respectively, evaluating the same recurrence for P_{i,s}(ζ_L, α) and P_{i′,s}(ζ_L, α). Each platoon has D privates.

SLIDE 27

Memory layout and coalescence

  • Resources are scalars; space is memory (in words)
  • Each private accesses its A space in each iteration; each load/store accesses A words of data
  • n × D workers; memory layout of shape n × D × A (total space U)

SLIDE 28

Inner loop in CUDA

For each s = 2, 3, …, k, i ∈ [n], and L ⊆ [k]:

    P_{i,s}(ζ_L, α) = Σ_{j ∈ Γ_H(i)} α_{s,(i,j)} · Σ_{s1+s2=s, s1,s2 ≥ 1} P_{i,s1}(ζ_L, α) · P_{j,s2}(ζ_L, α)

    for (index_t s1 = 1; s1 < s; s1++) {
        index_t s2 = s - s1;
        index_t s1i = LINE_IDX(n, gl, s1, i, a);
        line_t p_s1i;
        LINE_LOAD(p_s1i, d_s, seg, s1i);     /* load P_{i,s1} */
        index_t s2j = LINE_IDX(n, gl, s2, j, a);
        line_t p_s2j;
        LINE_LOAD(p_s2j, d_s, seg, s2j);     /* load P_{j,s2} */
        line_t p_s1i_s2j;
        LINE_MUL(p_s1i_s2j, p_s1i, p_s2j);   /* line multiplication */
        LINE_ADD(p_sij, p_sij, p_s1i_s2j);   /* accumulate into P_{i,s} */
    }

SLIDE 29

Experiments

SLIDE 30

Hardware configurations

  • CPU compute node: 2 × 2.6-GHz Intel Xeon E5-2690 v3 (Haswell microarchitecture, 12 cores/CPU, 24 cores total), 30 MiB L3 cache, 128 GiB main memory (8 × 16 GiB DDR4-2133)
  • NVIDIA DGX-1: 8 × 1312-MHz NVIDIA GV100 GPU (Volta microarchitecture, 5120 cores/GPU, 40,960 cores total), 128 GiB of on-device memory (8 × 16 GiB 4096-bit HBM2)

SLIDE 31

Experiments

  • Scaling as k increases (fixed m)
    — observe exponential scaling
  • Scaling as m increases (fixed k)
    — observe linear scaling
  • Topology invariance
    — graph topology should not matter much
  • Error rate (false-negative probability)
    — repeats required to find all vertices with at least one match

SLIDE 32

Runtime — motif size scaling (k)

[Plot: decision time (s, log scale from 10^-2 to 10^4) versus motif size k from 10 to 30, for the CPU compute node and one GPU V100]

GPU linetype — 32 × GF(2^8) bit-sliced; CPU linetype — 64 × GF(2^8) bit-packed. Random d-regular graphs (m ≈ 10^4 fixed).

SLIDE 33

Runtime — motif size scaling (k)

[Plot: decision time (s, log scale) versus motif size k from 10 to 30, for 1 × GPU V100 and 8 × GPU V100]

32 × GF(2^8) bit-sliced linetype; random d-regular graphs (m ≈ 10^4 fixed).

SLIDE 34

Speedup

 k | CPU compute node | NVIDIA DGX-1 | Speedup
---+------------------+--------------+--------
11 |        0.0828 s  |    0.1180 s  |   0.70
12 |        0.1553 s  |    0.0938 s  |   1.66
13 |        0.3808 s  |    0.1046 s  |   3.64
14 |        0.7768 s  |    0.1025 s  |   7.58
15 |        1.7244 s  |    0.1111 s  |  15.52
16 |        3.9035 s  |    0.1474 s  |  26.48
17 |        8.7340 s  |    0.1906 s  |  45.82
18 |       19.3674 s  |    0.3564 s  |  54.34
19 |       42.9873 s  |    0.6480 s  |  66.34
20 |       94.2593 s  |    1.2425 s  |  75.86

The CPU implementation is multi-threaded and vectorized with AVX2 (Björklund, Kaski, Kowalik, Lauri, ALENEX 2015).

GPU linetype — 32 × GF(2^8) bit-sliced; CPU linetype — 64 × GF(2^8) bit-packed. Random d-regular graphs (m ≈ 10^4 fixed).

SLIDE 35

Memory bandwidth — motif size scaling (k)

[Plot: memory bandwidth (1000–7000 GiB/s) versus motif size k from 10 to 30, for 1 × GPU V100 and 8 × GPU V100]

32 × GF(2^8) bit-sliced linetype; random d-regular graphs (m ≈ 10^4 fixed).

SLIDE 36

Runtime — edge-linear scaling (m)

[Plot: decision time (s, log scale) versus number of edges m from 10^4 to 10^7, for 1 × GPU V100 and 8 × GPU V100]

32 × GF(2^8) bit-sliced linetype; random d-regular graphs (k = 10 fixed).

SLIDE 37

Topology invariance

[Plot: decision time (s, log scale) versus number of edges m from 10^5 to 10^7, for regular, clique, and power-law (d = 0.5, d = 1.0) graphs, and the Google, Douban, WordNet, StackOverflow, Discogs, and MovieLens datasets]

Different workloads arise from varying vertex degrees; an arbitrary graph topology means arbitrary memory accesses. 32 × GF(2^8) bit-sliced linetype; motif size k = 10 fixed.

SLIDE 38

False-negative probability (vertex-localization)

[Plot: false-negative rate (0.02–0.1 %) versus number of edges m from 10^2 to 10^7, 1 × GPU V100]

32 × GF(2^8) bit-sliced linetype; k-path graph (k = 10 fixed); each vertex is incident to exactly one match. The false-negative probability is at most k · 2^(−b+1).

SLIDE 39

Number of repeats (vertex-localization)

[Plot: number of repeats (1–6) versus number of edges m from 10^2 to 10^7]

32 × GF(2^8) bit-sliced linetype; k-path graph with motif size k = 10 fixed; each vertex is incident to exactly one match.

SLIDE 41

Summary

  • Motif search is practical for small m and large k
  • With sufficient implementation effort, GPUs can outperform CPUs in motif search
    — for large k, vectorization and offloading to multiple GPUs pays off
  • It is possible to saturate the empirical memory bandwidth while simultaneously performing arithmetic calculations
  • Bit-sliced finite-field arithmetic overcomes the lack of hardware support
    — multiple repeats can overcome the high false-negative probability of a small field size

https://github.com/pkaski/motif-localized

Thank you