Parallel Sorting Algorithms

Course 01727 Parallel Programming
Prof. Dr. Jörg Keller
Parallelism and VLSI Group
Department of Mathematics and Computer Science

Overview

  • Why Parallel Sorting?
  • Parallel Quicksort
  • Bitonic Sort
  • Parallel Merge Sort
  • Summary

Course 01727 Parallel Programming Parallelism and VLSI Group

  • Prof. Dr. J. Keller

Slide 2 Department of Mathematics and Computer Science

Why Parallel Sorting?

  • One of the most important subroutines
  • Heavily investigated for more than 40 years
  • Large data sets
  • Looks quite sequential
  • More difficult than numerics: little computation, mainly control and data movement


Why Parallel Sorting? – cont'd

  • Lots of parallel algorithms
  • Three representatives:

top-down / divide-and-conquer: quicksort
sorting network: bitonic sort
bottom-up: merge sort

  • Concentrate on shared memory
  • Hints for message passing
  • The last two have been used on the Cell BE processor


Quicksort I

  • Reminder: sequential quicksort

qsort(int a[n]) {
  choose pivot a[i];
  alow  = {all a[j] with a[j] < a[i]};  // partition array
  ahigh = {all a[j] with a[j] > a[i]};
  qsort(alow); qsort(ahigh);            // divide
  a = concat(alow, a[i], ahigh);        // conquer
}

  • Complexity: O(n log n) on average
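The pseudocode above can be sketched as runnable Python; unlike the slide's version, this sketch keeps elements equal to the pivot in a middle list, so duplicate keys are handled:

```python
def quicksort(a):
    """Out-of-place quicksort: partition, recurse, concatenate."""
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]                # pivot choice is arbitrary here
    low   = [x for x in a if x < pivot]   # "alow"
    mid   = [x for x in a if x == pivot]  # pivot(s) kept separately
    high  = [x for x in a if x > pivot]   # "ahigh"
    return quicksort(low) + mid + quicksort(high)  # "conquer": concat
```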


Quicksort II

  • Pivot can be chosen randomly
  • Better: draw random sample of size O(sqrt(n))

choosing the pivot as the sample median improves the balance between alow and ahigh

  • Pivot randomly attached to one of the partitions

Randomly, to avoid continued imbalance
Attachment avoids separate treatment, e.g. in concat


Quicksort III

  • Partition implemented as reordering

left = 0; right = n-1;
do {
  while (a[left]  < a[i]) left++;
  while (a[right] > a[i]) right--;
  if (left < right) exchange(a[left++], a[right--]);
} while (left < right);

  • Avoids separate arrays alow, ahigh (in-situ)

Pointers suffice, concat implicit
Cache friendly
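The in-place reordering can be written out as a textbook Hoare partition, here as a Python sketch (a slight variant of the slide's loop, with the pointer-crossing check made explicit):

```python
def hoare_partition(a, lo, hi):
    """Reorder a[lo..hi] in place around pivot a[lo].
    Returns j such that a[lo..j] <= pivot <= a[j+1..hi]."""
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:       # scan from the left
            i += 1
        j -= 1
        while a[j] > pivot:       # scan from the right
            j -= 1
        if i >= j:                # pointers crossed: done
            return j
        a[i], a[j] = a[j], a[i]   # exchange, no extra arrays needed
```

Recursing on a[lo..j] and a[j+1..hi] then gives an in-situ quicksort.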


Quicksort IV

  • Two scenarios for Parallelization:
  • data already in shared memory, processors all running
  • data must be read in, processors must be started seq.
  • Latter: runtime Ω(n), hence speedup only O(log n)
  • OK for p = O(log n), i.e. small processor count
  • Simple parallelization:

qsort(ahigh) done on different processor if size > n/p

  • Runtime: sequence of partitions n+n/2+n/4+…=O(n)

plus seq sorts O(n/p*log(n/p))
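The simple parallelization can be sketched with Python threads: after each partition step, the upper half is handed to another thread as long as it is larger than n/p. Illustrative only; the helper names are mine, the partition here is a Lomuto variant rather than the slide's loop, and CPython's GIL limits real speedup for compute-bound code like this.

```python
import threading

def par_qsort(a, lo, hi, threshold):
    """Sort a[lo:hi) in place; spawn a thread for the right part
    while it is bigger than `threshold` (the n/p of the slide)."""
    if hi - lo <= 1:
        return
    # Lomuto-style partition around pivot a[hi-1]
    pivot, i = a[hi - 1], lo
    for j in range(lo, hi - 1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi - 1] = a[hi - 1], a[i]
    if hi - (i + 1) > threshold:          # right part big enough?
        t = threading.Thread(target=par_qsort, args=(a, i + 1, hi, threshold))
        t.start()
        par_qsort(a, lo, i, threshold)    # left part on this processor
        t.join()
    else:
        par_qsort(a, lo, i, threshold)
        par_qsort(a, i + 1, hi, threshold)
```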


Quicksort V

  • Advanced: accelerate partition step!
  • Approach 1: flatten hierarchy (Sample Sort)

choose p-1 pivots initially
each proc i partitions its n/p elements of array a into p partial lists l_ij according to the pivots
each proc j gathers all partial lists l_ij into list l_j
each proc j sorts list l_j sequentially


Quicksort VI

  • Analysis:

partition time O(n/p)
seq sort O(n/p*log(n/p))

  • Advantages:

no recursive calls
can also be used on message-passing machines (one all-to-all communication)

  • Disadvantage: not in-situ, the lists l_ij need a separate array
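A sequential Python sketch of the sample-sort data flow (the sample size and bucket layout are my choices, not from the slide); in the parallel version each processor partitions its own n/p elements, and each bucket is sorted by its own processor:

```python
import random
from bisect import bisect_right

def sample_sort(a, p):
    """Flattened-hierarchy quicksort: p-1 pivots, p buckets."""
    sample = sorted(random.sample(a, min(len(a), p * p)))
    pivots = [sample[(i + 1) * len(sample) // p] for i in range(p - 1)]
    buckets = [[] for _ in range(p)]       # the "partial lists"
    for x in a:                            # partition step
        buckets[bisect_right(pivots, x)].append(x)
    # each proc j would sort bucket j sequentially:
    return [x for b in buckets for x in sorted(b)]
```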


Quicksort VII

  • Approach 2: Tsigas' algorithm
  • Keep divide-and-conquer, parallelize partition loop
  • Each proc partitions part of array of size n/p

Then re-order partial partitions (details below)

  • Partition processors into two sets

Choose the number of processors for each partition in proportion to the partition sizes


Quicksort VIII

  • Partition done pagewise
  • Page = block of constant size
  • For each proc: at most one page with elements from both partitions
  • Partition these pages sequentially in time O(p)
  • Or in parallel in time O(log p)
  • Re-order pages so that each partition lies in consecutive memory locations


Quicksort IX

  • Implementation:

instead of left/right keep leftblock, rightblock

  • Concurrent access to leftblock and rightblock

managed either by lock or by fetch-and-add primitive


Bitonic Sort I

  • No sequential counterpart!
  • A sequence of numbers a=a1,…,an is called bitonic if

either there is a k such that a1 ≤ … ≤ ak ≥ … ≥ an

  • or the sequence can be rotated to that form
  • Lemma (Batcher, 1968): If a is bitonic, then

a' = min(a1, an/2+1), …, min(an/2, an)
a'' = max(a1, an/2+1), …, max(an/2, an)
are both bitonic and max(a') ≤ min(a'')

  • Kind of divide rule for bitonic sequences
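The Lemma's divide step (often called a half-cleaner) is easy to check in Python; a minimal sketch:

```python
def half_clean(a):
    """Split a bitonic sequence per Batcher's Lemma:
    elementwise min/max of the two halves."""
    h = len(a) // 2
    lo = [min(a[i], a[i + h]) for i in range(h)]   # a'
    hi = [max(a[i], a[i + h]) for i in range(h)]   # a''
    return lo, hi
```

For the bitonic sequence 1,3,5,7,6,4,2,0 this yields a' = [1,3,2,0] and a'' = [6,4,5,7]; both are bitonic and max(a') = 3 ≤ 4 = min(a'').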


Bitonic Sort II

  • Consequence of the Lemma:

sortb(int a[n], int which) { // a must be bitonic
  compute a', a'' according to Lemma if which == asc;
  exchange max and min if which == desc;
  return concat(sortb(a', which), sortb(a'', which));
}

  • Analysis:

bitonic seq can be sorted in time O(log n) with n proc.s

  • Note: asc/desc order needed in a minute
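sortb can be sketched in Python as follows (recursive, out of place; the length must be a power of two). The asc/desc choice is folded into the direction of the compare-exchange rather than literally exchanging max and min:

```python
def bitonic_merge(a, asc=True):
    """Sort a bitonic sequence of power-of-two length."""
    if len(a) <= 1:
        return list(a)
    h = len(a) // 2
    b = list(a)
    for i in range(h):                     # compute a', a'' per the Lemma
        if (b[i] > b[i + h]) == asc:
            b[i], b[i + h] = b[i + h], b[i]
    return bitonic_merge(b[:h], asc) + bitonic_merge(b[h:], asc)  # concat
```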


Bitonic Sort III

  • Turn arbitrary sequence into bitonic sequence

by sorting its halves in ascending and descending order:

sort(int a[n], int which) { // a is an arbitrary sequence
  sort(a[1..n/2], asc);
  sort(a[n/2+1..n], desc); // now bitonic
  sortb(a, which);
}

  • Analysis for n proc.s:

T(n) = T(n/2) + O(log n) = O((log n)^2)

  • Not optimal but constant is very small
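Putting both pieces together as a self-contained Python sketch (power-of-two length; bitonic_merge plays the role of sortb):

```python
def bitonic_merge(a, asc=True):
    """sortb: sort a bitonic sequence of power-of-two length."""
    if len(a) <= 1:
        return list(a)
    h = len(a) // 2
    b = list(a)
    for i in range(h):
        if (b[i] > b[i + h]) == asc:
            b[i], b[i + h] = b[i + h], b[i]
    return bitonic_merge(b[:h], asc) + bitonic_merge(b[h:], asc)

def bitonic_sort(a, asc=True):
    """Sort an arbitrary sequence of power-of-two length:
    make it bitonic by sorting the halves in opposite order."""
    if len(a) <= 1:
        return list(a)
    h = len(a) // 2
    left  = bitonic_sort(a[:h], True)    # ascending half
    right = bitonic_sort(a[h:], False)   # descending half -> bitonic
    return bitonic_merge(left + right, asc)
```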


Bitonic Sort IV

  • Example n=8


Bitonic Sort V

  • Bitonic sort is an example of a sorting network

i.e. it was intended for hardware

  • In software: oblivious, i.e. control flow indep. of data
  • With p processors:

simple: each processor simulates n/p comparators
better: stop recursion when size n/p is reached, then sort sequentially


Bitonic Sort VI

  • Call sequence rolls out in time into:

seqsort(n/p)   1     O(n/p*log(n/p))
sortb(2n/p)    2     O(n/p + n/p*log(n/p))
sortb(4n/p)    4     O(2*n/p + n/p*log(n/p))
…
sortb(n/2)     p/2   O((log p - 1)*n/p + n/p*log(n/p))
sortb(n)       p     O(log p * n/p + n/p*log(n/p))

  • Parallel time O(n/p * log p * (log p + log(n/p))) on p proc.s


Bitonic Sort VII

  • Oops, can we prove Batcher's Lemma?
  • Either two-step proof:

first prove Lemma for 0-1-sequences
then construct mapping from arbitrary seq to 0-1-seq

  • Direct proof:

Restrict to bitonic sequences a1 ≤ … ≤ ak ≥ … ≥ an
OK because rotating does not affect the properties of a', a''
Restrict to k ≥ n/2
OK because otherwise consider the sequence an…a1.


Bitonic Sort VIII

  • Restrict to case an/2 > an

Otherwise a1…an/2 ascending an/2…an bitonic

  • There exists i with k ≤ i ≤ n-1 such that

ai-n/2 ≤ ai and ai+1-n/2 > ai+1

For l = n/2…i:  min(al-n/2, al) = al-n/2
For l = i+1…n:  min(al-n/2, al) = al

  • Properties follow.


Bitonic Sort IX

  • Example


Merge Sort I

  • Merge: take two sorted blocks of length k

combine into one sorted block of length 2k

  • Merge(int a[k], int b[k], int c[2k]) {
      int ap=0, bp=0, cp=0;
      while (cp < 2k) {
        if (bp >= k || (ap < k && a[ap] < b[bp])) // guard exhausted inputs
          c[cp++] = a[ap++];
        else
          c[cp++] = b[bp++];
      }
    }

  • Time: O(k)


Merge Sort II

  • Idea: input data of length n = n sorted blocks of length 1
  • in round i, merge n/k sorted blocks of length k = 2^i into n/(2k) sorted blocks of length 2k = 2^(i+1)

  • After log n rounds: 1 sorted block of length n
  • Analysis: Time O(n log n)
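The round structure can be sketched directly in Python, with the bounds checks the Merge pseudocode glosses over:

```python
def merge(a, b):
    """Combine two sorted lists into one sorted list."""
    c, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i]); i += 1
        else:
            c.append(b[j]); j += 1
    return c + a[i:] + b[j:]        # append the leftover tail

def merge_sort_rounds(a):
    """Bottom-up merge sort: n blocks of length 1, log n rounds."""
    blocks = [[x] for x in a]
    while len(blocks) > 1:          # one round: pairwise merges
        blocks = [merge(blocks[i], blocks[i + 1]) if i + 1 < len(blocks)
                  else blocks[i]
                  for i in range(0, len(blocks), 2)]
    return blocks[0] if blocks else []
```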


Merge Sort III

  • Widely used for external sort

Merge of blocks can be done by reading blocks pagewise
Is cache friendly!

  • Preprocessing: don't start with blocks of size 1

For memory of size m: load data of size m, sort in memory, store blocks of size m

  • Reduces number of rounds to log(n/m)


Merge Sort IV

  • Parallelization simple if ≥2p blocks available

Then p merges can work in parallel

  • Analysis:

first log n – log p rounds take time O(n/p) each
last log p rounds take time 2n/p + 4n/p + … + n = O(n)

  • Simple parallel merge sort takes time O(n + n/p*log(n/p)) on p proc.s

Optimal for p ≤ log n


Merge Sort V

  • Improve rounds with <2p blocks: parallelize merge routine
  • Merge with 2 processors:

split each input block ai into two parts ai', ai'' such that
length(aleft') + length(aright') = length(aleft'') + length(aright'')
max{aleft', aright'} ≤ min{aleft'', aright''}

  • Then: merge(aleft, aright) =

concat(merge(aleft', aright'), merge(aleft'', aright''))


Merge Sort VI

  • Correctness guaranteed by 2nd property:

both proc.s can work independently

  • Speedup comes from 1st property:

both proc.s have to work on half the data

  • In practice: relax first property

allow slight imbalance, but reduce time to split blocks


Merge Sort VII

  • Very simple method:

in block a0 of length k, take elements at positions k/2-2c, k/2-c, k/2, k/2+c, k/2+2c

  • For each element, find its position in block a1 (bin. search)
  • Take elements from a1 and search their positions in a0
  • Split with the element giving best balance
  • Time O(log k)
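A Python sketch of this split search (the value of c and the tie-breaking are my choices). Each candidate from a0 is binary-searched in a1; by construction everything left of the chosen split is ≤ everything right of it, so the two halves can then be merged independently:

```python
from bisect import bisect_left

def find_split(a0, a1, c=4):
    """Pick (i, j) so that a0[:i] + a1[:j] is roughly half the data
    and the max of the left parts <= the min of the right parts."""
    k, target = len(a0), (len(a0) + len(a1)) // 2
    best = None
    for i in (k//2 - 2*c, k//2 - c, k//2, k//2 + c, k//2 + 2*c):
        if not 0 <= i < k:
            continue
        j = bisect_left(a1, a0[i])        # position of a0[i] in a1
        imbalance = abs(i + j - target)   # how far from an even split
        if best is None or imbalance < best[0]:
            best = (imbalance, i, j)
    return best[1], best[2]
```

Processor 1 then merges a0[:i] with a1[:j], processor 2 merges a0[i:] with a1[j:], and concatenating the results gives the full merge.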


Merge Sort VIII

  • So far: dancehall parallelism
  • On message-passing machines:

run all mergers of the merge tree concurrently
forward results pagewise, a kind of tree pipeline
Problem: distribution of mergers onto proc.s

  • Further measures: SIMD parallelism in merge


Summary I

  • Sorting algorithms are fascinating

Parallel sorting algorithms are even more fascinating

  • Though rather old, still area of active research
  • New architectures demand new or varied algorithms
  • Principles mostly easy to grasp
  • Engineering parallel sorting algorithms for performance is tedious and difficult


Summary II

  • All algorithmic paradigms come into play
  • In this lecture, only most common algorithms
  • Many more: e.g. parallel rank sort
  • So: stay tuned to news on sorting


Summary III

  • Thanks a lot for your attention
  • Questions?
