Communication-Efficient String Sorting Timo Bingmann, Peter Sanders, - - PowerPoint PPT Presentation


Communication-Efficient String Sorting. Timo Bingmann, Peter Sanders, Matthias Schimek. 2020-05-18 @ IPDPS'20. Institute of Theoretical Informatics, Algorithmics.


slide-1
SLIDE 1

INSTITUTE OF THEORETICAL INFORMATICS – ALGORITHMICS

Communication-Efficient String Sorting

Timo Bingmann, Peter Sanders, Matthias Schimek · 2020-05-18 @ IPDPS’20

s0: Antidisestablishmentarianism
s1: Floccinaucinihilipilification
s2: Honorificabilitudinitatibus

KIT – The Research University in the Helmholtz Association

www.kit.edu

slide-2
SLIDE 2

Abstract

There has been surprisingly little work on algorithms for sorting strings on distributed-memory parallel machines. We develop efficient algorithms for this problem based on the multi-way merging principle. These algorithms inspect only characters that are needed to determine the sorting order. Moreover, communication volume is reduced by also communicating (roughly) only those characters and by communicating repetitions of the same prefixes only once. Experiments on up to 1280 cores reveal that these algorithms are often more than five times faster than previous algorithms.

This document is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Timo Bingmann, Peter Sanders, Matthias Schimek – Communication-Efficient String Sorting Institute of Theoretical Informatics – Algorithmics May 18th, 2020


slide-3
SLIDE 3

Why String Sorting?

string: array of characters over alphabet Σ, e.g. "string"

sorted string set: sorted lexicographically

⇒ like in a dictionary

characteristics of string sets

s0: algorithm0
s1: compare
s2: comparison0
s3: prefix0

#strings n, #characters N, sum of distinguishing prefix lengths D

⇒ multidimensional data

only published distributed string sorting algorithm: one paragraph in [Fischer and Kurpicz, ALENEX’19]
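The three characteristics n, N and D can be computed with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes that every string counts one implicit end-of-string character (the 0 in the figure), which is a convention chosen here rather than something the slide fixes.

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i


def string_set_stats(strings):
    """Return (n, N, D) for a string set.

    Assumption: strings are passed without terminator, and one implicit
    end-of-string character per string is counted in N and may be part
    of a distinguishing prefix.
    """
    ordered = sorted(strings)
    n = len(ordered)
    N = sum(len(s) + 1 for s in ordered)
    D = 0
    for i, s in enumerate(ordered):
        # In sorted order the longest LCP with any other string
        # is attained at one of the two neighbours.
        left = lcp(ordered[i - 1], s) if i > 0 else 0
        right = lcp(s, ordered[i + 1]) if i + 1 < n else 0
        D += min(max(left, right) + 1, len(s) + 1)
    return n, N, D


n, N, D = string_set_stats(["algorithm", "compare", "comparison", "prefix"])
print(n, N, D)
```

For the four example strings, only "compare" and "comparison" need long prefixes to be told apart, so D is much smaller than N.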


slide-4
SLIDE 4

String Sorting Toolbox

Sequential Sorting: String Radix Sort, Multikey Quicksort, ...

[Kärkkäinen et al., SPIRE’08], [Bentley and Sedgewick, SODA’97]

evaluation of many sequential algorithms in [Bingmann ’18]

needed: string sorting + Longest Common Prefix (LCP) array computation

LCP  string
⊥    algorithm
2    alpha
5    alphabet
0    character
1    complete
4    computer
6    computing
2    copy

Multiway Merging: LCP Losertree

[Bingmann et al., Algorithmica’17]

exploit LCP values to save character comparisons

[Figure: LCP-Merge of sorted sequences of (LCP, string) pairs such as (2, aab), (1, acb), (2, aac), (0, bca)]
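How LCP values save character comparisons can be seen in a two-way merge. The sketch below is not the LCP losertree from the cited work; it is a simpler binary LCP-aware merge written for illustration, and all function names are assumptions. It stores 0 instead of ⊥ for the first entry of each LCP array.

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i


def lcp_array(sorted_strings):
    """LCP of each string with its predecessor (0 for the first string)."""
    return [0] + [lcp(sorted_strings[i - 1], sorted_strings[i])
                  for i in range(1, len(sorted_strings))]


def lcp_merge(a, a_lcp, b, b_lcp):
    """Merge two sorted sequences, returning (strings, LCP array)."""
    out, out_lcp = [], []
    i = j = 0
    la = lb = 0  # LCP of a[i] / b[j] with the last string written to out
    while i < len(a) and j < len(b):
        if la == lb:
            # Only here characters are compared, starting at position la.
            h = la
            while h < len(a[i]) and h < len(b[j]) and a[i][h] == b[j][h]:
                h += 1
            take_a = h == len(a[i]) or (h < len(b[j]) and a[i][h] < b[j][h])
            loser_lcp = h
        else:
            # The head sharing the longer prefix with the last output is smaller.
            take_a = la > lb
            loser_lcp = min(la, lb)
        if take_a:
            out.append(a[i]); out_lcp.append(la)
            lb = loser_lcp
            i += 1
            la = a_lcp[i] if i < len(a) else 0
        else:
            out.append(b[j]); out_lcp.append(lb)
            la = loser_lcp
            j += 1
            lb = b_lcp[j] if j < len(b) else 0
    # Copy the remaining sequence; only its first LCP value needs adjusting.
    rest, rest_lcp, k, first = (a, a_lcp, i, la) if i < len(a) else (b, b_lcp, j, lb)
    if k < len(rest):
        out.append(rest[k]); out_lcp.append(first)
        out.extend(rest[k + 1:]); out_lcp.extend(rest_lcp[k + 1:])
    return out, out_lcp


a = ["algorithm", "alphabet", "complete", "computing"]
b = ["alpha", "character", "computer", "copy"]
merged, merged_lcp = lcp_merge(a, lcp_array(a), b, lcp_array(b))
print(list(zip(merged_lcp, merged)))
```

Whenever the two stored LCP values differ, the order is decided without touching a single character, and the output LCP array comes for free.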


slide-5
SLIDE 5

String Sorting Toolbox

LCP Compression

LCP  string     compressed
⊥    algorithm  algorithm
2    alpha      pha
5    alphabet   bet
0    character  character
1    complete   omplete
4    computer   uter
6    computing  ing
2    copy       py

each longest common prefix is sent only once
compression: iterate over strings + LCP array
decompression: iterate over compressed strings + LCP array
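Both directions of LCP compression are a single pass over the strings and the LCP array. The following is a minimal sketch with hypothetical function names; it uses 0 in place of ⊥ for the first string and ignores how the compressed strings would be serialized for sending.

```python
def lcp_compress(sorted_strings, lcps):
    """Drop from each string the prefix it shares with its predecessor."""
    return [s[l:] for s, l in zip(sorted_strings, lcps)]


def lcp_decompress(suffixes, lcps):
    """Rebuild the strings from the remaining suffixes and the LCP array."""
    strings = []
    previous = ""
    for suffix, l in zip(suffixes, lcps):
        previous = previous[:l] + suffix
        strings.append(previous)
    return strings


strings = ["algorithm", "alpha", "alphabet", "character",
           "complete", "computer", "computing", "copy"]
lcps = [0, 2, 5, 0, 1, 4, 6, 2]
compressed = lcp_compress(strings, lcps)
print(compressed)
print(lcp_decompress(compressed, lcps) == strings)
```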


slide-6
SLIDE 6

Distributed Merge String Sort (MS)

[Figure: each PE performs local sorting, followed by the Distributed Partitioning Algorithm and the String Exchange, and then merging]

Local Sorting
String Radix Sort
new: String Radix Sort + LCP array

String Exchange
no compression
new: LCP compression

Merging
plain losertree
new: LCP losertree


slide-7
SLIDE 7

Distributed Merge String Sort (MS)

[Figure: each PE performs regular sampling; sorting of the sample sets + final splitter selection yields p − 1 final splitters; each PE then partitions its strings]

Partitioning
equidistant sampling
gather + seq. sort
new: hypercube quicksort [Axtmann and Sanders, ALENEX’17]
broadcast final splitters
partitioning
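The overall structure of MS (local sorting, sampling, splitter selection, partitioning, exchange, merging) can be tried out in a single process. The sketch below simulates p PEs as Python lists; it is an illustration under simplifying assumptions, not the authors' implementation: samples are sorted with a plain sequential sort instead of hypercube quicksort, there is no LCP compression, and the function name and the `samples_per_pe` parameter are invented here.

```python
import bisect
import heapq


def merge_sort_simulation(local_inputs, samples_per_pe=2):
    """Simulate distributed merge string sort; returns one sorted list per PE."""
    p = len(local_inputs)

    # 1. Local sorting on every PE.
    local = [sorted(strings) for strings in local_inputs]

    # 2. Regular (equidistant) sampling of each sorted local array.
    samples = []
    for strings in local:
        if strings:
            step = len(strings) / (samples_per_pe + 1)
            samples += [strings[int((k + 1) * step)] for k in range(samples_per_pe)]
    if not samples:
        return [[] for _ in range(p)]

    # 3. Sort all samples and select p - 1 final splitters.
    samples.sort()
    splitters = [samples[(k + 1) * len(samples) // p] for k in range(p - 1)]

    # 4. Partition each local array by the splitters and
    # 5. exchange: bucket k of every PE is sent to PE k.
    received = [[] for _ in range(p)]
    for strings in local:
        cuts = [0] + [bisect.bisect_right(strings, s) for s in splitters] + [len(strings)]
        for k in range(p):
            received[k].append(strings[cuts[k]:cuts[k + 1]])

    # 6. Multiway merging of the received sorted runs.
    return [list(heapq.merge(*runs)) for runs in received]


pes = [["string", "alpha", "sorting", "character"],
       ["scale", "algo", "prefix", "compare"],
       ["copy", "computer", "complete", "computing"]]
for rank, part in enumerate(merge_sort_simulation(pes)):
    print(rank, part)
```

Concatenating the per-PE results in rank order gives the globally sorted sequence; how evenly the strings are spread depends on the quality of the splitters.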


slide-8
SLIDE 8

Partitioning – Sampling Approaches

string-based sampling
Goal: equal number of strings per bucket
sampling of string array
provable upper bounds

character-based sampling
Goal: equal number of characters per bucket
sampling of character array
provable upper bounds

[Figure for both approaches: example input a a a a a a a b 0 c d 0 e f f f f f f f]
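The difference between the two sampling approaches shows up as soon as string lengths are skewed. The sketch below draws k equidistant samples either from the string array or from the concatenated character array; the function names and the exact rounding of sample positions are assumptions made for this example.

```python
import bisect
from itertools import accumulate


def string_based_samples(sorted_strings, k):
    """k samples at equidistant positions of the string array."""
    n = len(sorted_strings)
    return [sorted_strings[(i + 1) * n // (k + 1)] for i in range(k)]


def character_based_samples(sorted_strings, k):
    """k samples at equidistant positions of the character array."""
    # ends[i] = number of characters in strings 0..i
    ends = list(accumulate(len(s) for s in sorted_strings))
    total = ends[-1]
    samples = []
    for i in range(k):
        position = (i + 1) * total // (k + 1)
        # The sample is the string that contains this character position.
        samples.append(sorted_strings[bisect.bisect_right(ends, position)])
    return samples


strings = ["aaaaaaaaaaaa", "b", "c", "d", "e", "f"]
print(string_based_samples(strings, 1))
print(character_based_samples(strings, 1))
```

With one long string and five short ones, the string-based sample falls in the middle of the array, while the character-based sample falls inside the long string, which balances characters rather than string counts.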


slide-9
SLIDE 9

Prefix Doubling String Merge Sort (PDMS)

PE1: Antidisestablishmentarianism0
PE2: Floccinaucinihilipilification0
PE3: Honorificabilitudinitatibus

same main structure as before
use distributed Single-Shot Bloom Filter (dSBF) [Sanders et al., IEEE BigData’13] to approximate distinguishing prefixes with distributed duplicate detection
only operate on those characters
calculate only the permutation for sorting (exchanging further characters is optional)
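The idea of approximating distinguishing prefixes by prefix doubling and duplicate detection on hash values can be simulated in one process. This is a sketch under stated assumptions: a global counter stands in for the distributed filter, BLAKE2b stands in for the hash function (which the slides do not specify), prefix lengths follow the sequence 1, 2, 4, ..., and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def default_hash(prefix):
    """Stand-in hash function (an assumption of this sketch)."""
    digest = hashlib.blake2b(prefix.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_distinguishing_prefixes(pes, hash_fn=default_hash):
    """pes: one list of strings per simulated PE.

    Returns, per PE, an upper bound on each distinguishing prefix length
    (counting one implicit end-of-string character).
    """
    result = [[0] * len(strings) for strings in pes]
    active = [(p, i) for p, strings in enumerate(pes) for i in range(len(strings))]
    length = 1
    while active:
        # Every PE hashes the current prefixes of its undecided strings.
        hashes = {(p, i): hash_fn(pes[p][i][:length]) for p, i in active}
        # Duplicate detection; in the distributed setting each hash value
        # would be sent to the PE responsible for its range.
        counts = Counter(hashes.values())
        remaining = []
        for p, i in active:
            s = pes[p][i]
            if counts[hashes[(p, i)]] == 1 or length > len(s):
                result[p][i] = min(length, len(s) + 1)
            else:
                remaining.append((p, i))
        active = remaining
        length *= 2  # prefix doubling
    return result


pes = [["alpha", "character", "sorting", "string"],
       ["algo", "compare", "prefix", "scale"]]
print(approx_distinguishing_prefixes(pes))
```

A hash collision between different prefixes is a false positive: the affected strings simply stay active for another round, so the result can only overestimate the prefix length, never underestimate it.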


slide-10
SLIDE 10

Distinguishing Prefix Computation

[Figure: axis of hash values h(si) from 1 to 19, shown for two PEs]
PE 1: s0 = alpha0, s1 = character0, s2 = sorting0, s3 = string0
PE 2: s0 = algo0, s1 = compare0, s2 = prefix0, s3 = scale0


slide-11
SLIDE 11

Distinguishing Prefix Computation

h(si) on PE 1: s0 2, s1 19, s2 7, s3 7
h(si) on PE 2: s0 2, s1 19, s2 13, s3 7


slide-12
SLIDE 12

Distinguishing Prefix Computation

h(si) on PE 1: s0 2, s1 19, s2 7, s3 7
h(si) on PE 2: s0 2, s1 19, s2 13, s3 7
messages from PE 1: m1 := [2, 7], m2 := [19]
messages from PE 2: m1 := [2, 7], m2 := [13, 19]


slide-14
SLIDE 14

Distinguishing Prefix Computation

h(si) on PE 1: s0 5, s1 15, s2 7, s3 0
h(si) on PE 2: s0 5, s1 11, s2 (no value), s3 0


slide-15
SLIDE 15

Distinguishing Prefix Computation

h(si) on PE 1: s0 5, s1 15, s2 7, s3 0
h(si) on PE 2: s0 5, s1 11, s2 (no value), s3 0
messages from PE 1: m1 := [0, 5, 7], m2 := [15]
messages from PE 2: m1 := [0, 5], m2 := [11]


slide-17
SLIDE 17

Distinguishing Prefix Computation

h(si) on PE 1: s0 10, s1 (no value), s2 (no value), s3 3
h(si) on PE 2: s0 14, s1 (no value), s2 (no value), s3 18


slide-18
SLIDE 18

Distinguishing Prefix Computation

h(si) on PE 1: s0 10, s1 (no value), s2 (no value), s3 3
h(si) on PE 2: s0 14, s1 (no value), s2 (no value), s3 18
messages from PE 1: m1 := [3], m2 := [10]
messages from PE 2: m1 := [], m2 := [14, 18]


slide-20
SLIDE 20

Distinguishing Prefix Computation

Implementation Remarks on dSBF
Golomb encoding for hash values
no need to materialize Bloom filter ⇒ merge received sequences
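Both remarks can be illustrated with short Python functions. The sketch below encodes the gaps of a sorted hash sequence with Rice coding, the power-of-two special case of Golomb coding, and finds duplicates by merging the received sorted sequences instead of building a filter. The function names, the bit-string representation and the parameter k are assumptions of this example, not details taken from the talk.

```python
import heapq


def rice_encode(sorted_values, k):
    """Encode the gaps of a sorted sequence as a string of '0'/'1' bits."""
    bits = []
    previous = 0
    for value in sorted_values:
        delta = value - previous
        previous = value
        quotient, remainder = delta >> k, delta & ((1 << k) - 1)
        bits.append("1" * quotient + "0")               # unary quotient
        if k:
            bits.append(format(remainder, "0{}b".format(k)))  # k-bit remainder
    return "".join(bits)


def rice_decode(bits, k):
    """Inverse of rice_encode."""
    values = []
    position = previous = 0
    while position < len(bits):
        quotient = 0
        while bits[position] == "1":
            quotient += 1
            position += 1
        position += 1  # skip the terminating '0' of the unary part
        remainder = int(bits[position:position + k], 2) if k else 0
        position += k
        previous += (quotient << k) | remainder
        values.append(previous)
    return values


def duplicate_hashes(received_sequences):
    """Hash values occurring more than once across sorted received sequences."""
    duplicates = set()
    previous = None
    for value in heapq.merge(*received_sequences):
        if value == previous:
            duplicates.add(value)
        previous = value
    return duplicates


encoded = rice_encode([2, 7, 13, 19], k=2)
print(encoded, rice_decode(encoded, k=2))
print(duplicate_hashes([[19], [13, 19]]))
```

Because the hash values arrive sorted, small gaps dominate and compress well, and a single merge pass over the received sequences is enough to spot equal neighbours.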


slide-21
SLIDE 21

Experimental Evaluation – Setup

Input Data
D/N-Generator (n=9, ℓ=6, D/N=0.5)

s0: aaaaa0
s1: aabaa0
s2: aacaa0
s3: abaaa0
s4: abbaa0
s5: abcaa0
s6: acaaa0
s7: acbaa0
s8: accaa0

weak scaling with D/N-Generator
strong scaling with COMMONCRAWL and DNAREADS

Hardware (ForHLR I at KIT)
2 deca-core Intel Xeon E5-2670 v2 (2.5 GHz) and 64 GB RAM per compute node
InfiniBand 4X FDR interconnect

Algorithms
FKmerge: from Fischer and Kurpicz [ALENEX’19]
hQuick: distributed quicksort
our merge sort: MS-simple (no LCP-comp), MS (LCP-comp)
our prefix doubling merge sort: PDMS-Golomb, PDMS
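
A generator that reproduces the nine example strings could look as follows. This is a guess at the construction based only on the example above, not the authors' generator: it assumes that ℓ counts the terminating 0, that the first ℓ·D/N characters encode the string index as a number over the smallest sufficient alphabet, and that the rest is padded with 'a'. The function name and parameters are hypothetical.

```python
def dn_generator(n, length, ratio):
    """Generate n strings of `length` characters (terminator included,
    but not returned) whose distinguishing prefixes cover roughly
    `ratio` of all characters.

    Assumption: works for small examples only, since the alphabet is
    limited to the letters 'a'..'z'.
    """
    d = min(length - 1, max(1, round(ratio * length)))  # prefix length
    sigma = 2
    while sigma ** d < n:  # smallest alphabet with at least n prefixes
        sigma += 1
    strings = []
    for index in range(n):
        digits = []
        value = index
        for _ in range(d):
            value, digit = divmod(value, sigma)
            digits.append(chr(ord("a") + digit))
        prefix = "".join(reversed(digits))
        strings.append(prefix + "a" * (length - 1 - d))
    return strings


for s in dn_generator(9, 6, 0.5):
    print(s + "0")
```

Raising the ratio makes the strings differ later, so more characters must be inspected and communicated; lowering it leaves long tails that no sorting algorithm needs to look at.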
slide-22
SLIDE 22

D/N-Generator(n=p·500K, ℓ=500, D/N=?)

[Plots: time (s) and bytes sent per string versus # of PEs (20, 40, 80, 160, 320, 640, 1,280), one panel per D/N ratio 0.0, 0.25, 0.5, 0.75, 1.0; algorithms: FKmerge, hQuick, MS-simple, MS, PDMS-Golomb, PDMS]

slide-23
SLIDE 23

Strong Scaling with Real-World Inputs

[Plots: time (s) and bytes sent per string versus number of PEs (160, 320, 640, 1,280) for COMMONCRAWL (82 GB) and DNAREADS (125 GB); algorithms: FKmerge, hQuick, MS-simple, MS, PDMS-Golomb, PDMS]

slide-24
SLIDE 24

Conclusion

Summary
two new communication-efficient string sorting algorithms:
distributed string merge sort (MS)
distributed prefix-doubling string merge sort (PDMS)
theory and experimental evaluation
different strategies best for low and high D/N-ratios

Source code and recording of talk: https://panthema.net/2020/0518-distributed-string-sorting

Future Work
improve balancing by considering strings and characters
can one show lower bounds?

Questions via email to bingmann@kit.edu