A Parallel, In-Place, Rectangular Matrix Transpose Algorithm



SLIDE 1

A Parallel, In-Place, Rectangular Matrix Transpose Algorithm

Computational Complexity Analysis

Stefan Amberger

ICA & RISC amberger.stefan@gmail.com

SLIDE 2

Table of Contents

1. Introduction
2. Revision of TRIP
3. Analysis of Computational Complexity
   a. Work
   b. Span
   c. Parallelism
   d. Generalizations

SLIDE 3

Introduction

SLIDE 4

Work: “execution time on one processor”, i.e. all vertices of the computation dag; approximation: the number of nodes of the computation dag.

Span: “execution time on infinitely many processors”, i.e. the length of the critical path of the computation dag; approximation: the number of nodes on the critical path.

Introduction

Introduction to Algorithms, Third Edition, p. 777ff

Computational Complexity of Parallel Algorithms

Parallelism

  • average amount of work per step along the critical path
  • maximum possible speedup
  • limit on the possibility of attaining perfect speedup
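These definitions can be made concrete with a toy example (mine, not from the talk): counting work and span for a balanced binary-tree reduction over n values, where each vertex of the computation dag is one node of the recursion tree.

```python
# Toy illustration (not from the talk): count work and span of a
# balanced binary-tree reduction by walking its computation dag.

def work_span(n):
    """Return (work, span) in dag nodes for a reduction over n leaves."""
    if n == 1:
        return 1, 1                      # a leaf is a single vertex
    wl, sl = work_span(n // 2)           # left subtree
    wr, sr = work_span(n - n // 2)       # right subtree
    # work sums all vertices; span follows the longest path to the root
    return wl + wr + 1, max(sl, sr) + 1

w, s = work_span(8)
print(w, s, w / s)   # work 15, span 4, parallelism 15/4 = 3.75
```

Parallelism, the ratio work/span, is then the maximum speedup any number of processors can deliver.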

SLIDE 5

Revision of TRIP

SLIDE 6

TRIP: if the matrix is rectangular, TRIP transposes sub-matrices, then combines the results with merge or split.

merge: first rotates the middle part of the array, then recursively merges the left and right parts of the array.

split: first recursively splits the left and right parts of the array, then rotates the middle part of the array.
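The transcript omits the algorithm listings, so the following is a minimal sequential Python sketch of the structure just described, assuming the power condition (dimensions are powers of two). All names (`trip`, `merge`, `split`, `rol`) and the exact index arithmetic are my reconstruction, and the real TRIP would execute the two recursive transpose calls in parallel.

```python
# Sketch (my reconstruction, not the authors' code) of the TRIP structure:
# a rows x cols matrix stored row-major in a[lo : lo + rows*cols].

def rol(a, lo, hi, k):
    """Left rotation (circular shift) of a[lo:hi] by k, via triple reversal."""
    def rev(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    k %= hi - lo
    rev(lo, lo + k); rev(lo + k, hi); rev(lo, hi)

def merge(a, lo, hi, blk):
    """Interleave [x1..xn, y1..yn] (blocks of size blk) to [x1,y1,..,xn,yn]."""
    n = (hi - lo) // (2 * blk)           # blocks per side (a power of two)
    if n <= 1:
        return
    half = (n // 2) * blk
    rol(a, lo + half, hi - half, half)   # rotate the middle part, then
    merge(a, lo, lo + 2 * half, blk)     # recursively merge the left
    merge(a, lo + 2 * half, hi, blk)     # and right parts

def split(a, lo, hi, blk):
    """Inverse of merge: de-interleave [x1,y1,..,xn,yn] to [x1..xn, y1..yn]."""
    n = (hi - lo) // (2 * blk)
    if n <= 1:
        return
    half = (n // 2) * blk
    split(a, lo, lo + 2 * half, blk)     # recursively split the left
    split(a, lo + 2 * half, hi, blk)     # and right parts, then
    rol(a, lo + half, hi - half, half)   # rotate the middle part

def trip(a, lo, rows, cols):
    """In-place transpose of the sub-array a[lo : lo + rows*cols]."""
    if rows == cols:                     # base case: square, swap pairs
        for i in range(rows):
            for j in range(i + 1, cols):
                p, q = lo + i * cols + j, lo + j * cols + i
                a[p], a[q] = a[q], a[p]
    elif rows > cols:                    # tall: transpose halves, then merge
        h = (rows // 2) * cols
        trip(a, lo, rows // 2, cols)     # (these two calls run in
        trip(a, lo + h, rows // 2, cols) #  parallel in the real TRIP)
        merge(a, lo, lo + rows * cols, rows // 2)
    else:                                # wide: split, then transpose halves
        h = rows * (cols // 2)
        split(a, lo, lo + rows * cols, cols // 2)
        trip(a, lo, rows, cols // 2)
        trip(a, lo + h, rows, cols // 2)

a = [1, 2, 3, 4, 5, 6, 7, 8]             # 4 x 2: rows (1,2),(3,4),(5,6),(7,8)
trip(a, 0, 4, 2)
print(a)                                  # [1, 3, 5, 7, 2, 4, 6, 8]
```

In the tall case, after the two halves are transposed, merge interleaves their rows; the wide case is the mirror image, de-interleaving with split before recursing.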

SLIDE 7

Analysis of Computational Complexity

SLIDE 8

Matrix dimensions M x N and N x M: the recursive calls are symmetric. TRIP’s recursive calls are either all merge or all split.

Restriction to Powers of Two

“power condition”

SLIDE 9

Work


  • 1. Example: Basic Algorithms
  • 2. TRIP Result & Proof Sketch
SLIDE 10


Work Example

Work of Base Algorithms
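The slide's formulas are not in the transcript, but the flavor of such a count can be shown for one base algorithm. The assumption (mine) is that rol is implemented by triple reversal, the usual in-place rotation technique; it then performs about n swaps on an array of length n.

```python
# Illustration (numbers mine, not the slide's): count the swaps of a
# triple-reversal left rotation, one plausible in-place rol implementation.

def rol_count(a, k):
    """Left-rotate list a by k in place; return the number of swaps."""
    swaps = 0
    def rev(i, j):
        nonlocal swaps
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            swaps += 1
            i, j = i + 1, j - 1
    k %= len(a)
    rev(0, k); rev(k, len(a)); rev(0, len(a))
    return swaps

a = list(range(8))
print(rol_count(a, 3), a)   # 7 swaps -> [3, 4, 5, 6, 7, 0, 1, 2]
```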

SLIDE 11

Show that, under the power condition, for an M x N matrix

Result

Work of TRIP

METHOD: do not count vertices in the computation dag directly; count inner nodes in the recursion tree and swaps:

  • function calls (nodes in the recursive call trees)
  • memory accesses (swaps)
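This counting method can be illustrated on the square base case (example mine): an in-place n x n transpose performs exactly n(n-1)/2 swaps, one per pair above the diagonal.

```python
# Example of the counting method (mine): instrument an in-place
# n x n square transpose and count its swaps directly.

def square_transpose_swaps(n):
    """Transpose an n x n row-major matrix in place; return the swap count."""
    a = list(range(n * n))
    swaps = 0
    for i in range(n):
        for j in range(i + 1, n):       # one swap per pair above the diagonal
            a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
            swaps += 1
    return swaps

print(square_transpose_swaps(8))        # 28 == 8 * 7 / 2
```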
SLIDE 12

Work of TRIP
Work of merge
Work of rol

Work of TRIP

Proof Sketch

SLIDE 13

The TRIP recursion is analogous for tall and wide matrices. The only difference:

  • in merge, rol is called before the recursive merge call
  • in split, rol is called after the recursive split call

This difference does not change the amount of work of TRIP.

Work of TRIP

wide matrices

SLIDE 14


Visualization

Work as function of Matrix Dimensions

SLIDE 15

Span


  • 1. Example: Basic Algorithms
  • 2. TRIP Result
SLIDE 16


Span Example

Span of Base Algorithms

SLIDE 17

Calculate the span of the tall matrix transpose: count levels and swaps on the critical path. This includes the span of

  • creating the divide tree
  • combining the nodes via merge/split (themselves recursive procedures)
  • square-transposing in the leaf nodes

Span of TRIP

Result

SLIDE 18


Visualization

Span as function of Matrix Dimensions

SLIDE 19

Parallelism

SLIDE 20

Rectangular Matrices / Square Matrices

calculation:

  • divide the work by the span
  • case distinction: rectangular / square
  • simplification using Landau symbols

Result

SLIDE 21

Generalizations


power condition unsatisfied

SLIDE 22


Example: 7 x 5 Matrix

SLIDE 23


Generalization

Power Condition not Satisfied

SLIDE 24


Generalization

Power Condition not Satisfied

SLIDE 25


Generalization

Power Condition not Satisfied

SLIDE 26

Thank you

SLIDE 27

Revision of TRIP

SLIDE 28

TRIP Algorithm

If the matrix is rectangular, TRIP transposes sub-matrices, then combines the results with merge or split.

SLIDE 29

merge combines the transposes of sub-matrices of tall matrices. merge first rotates the middle part of the array, then recursively merges the left and right parts of the array.

rol(arr, k) … left rotation (circular shift) of the array arr by k elements

merge Algorithm
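The merge listing itself is not in the transcript; the following is a compact Python sketch (mine) of the description above, interleaving two runs of equal-sized blocks. For brevity the rol here is slice-based rather than the in-place rotation an actual implementation would use.

```python
def rol(a, lo, hi, k):
    """Left rotation (circular shift) of a[lo:hi] by k (slice version)."""
    a[lo:hi] = a[lo + k:hi] + a[lo:lo + k]

def merge(a, lo, hi, blk):
    """Interleave [x1..xn, y1..yn] (blocks of size blk) to [x1,y1,..,xn,yn]."""
    n = (hi - lo) // (2 * blk)          # blocks per side; assumed a power of two
    if n <= 1:
        return
    half = (n // 2) * blk
    rol(a, lo + half, hi - half, half)  # first rotate the middle part,
    merge(a, lo, lo + 2 * half, blk)    # then recursively merge the
    merge(a, lo + 2 * half, hi, blk)    # left and right parts

a = ['x1', 'x2', 'x3', 'x4', 'y1', 'y2', 'y3', 'y4']
merge(a, 0, len(a), 1)
print(a)   # ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4']
```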

SLIDE 30

split combines the transposes of sub-matrices of wide matrices. split first recursively splits the left and right parts of the array, then rotates the middle part of the array. split and merge are inverses of each other.

split Algorithm
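Correspondingly, a sketch of split (again mine, with a slice-based rol for brevity): applied to an interleaved array it restores the block order, i.e. it undoes merge.

```python
def rol(a, lo, hi, k):
    """Left rotation (circular shift) of a[lo:hi] by k (slice version)."""
    a[lo:hi] = a[lo + k:hi] + a[lo:lo + k]

def split(a, lo, hi, blk):
    """De-interleave [x1,y1,..,xn,yn] (blocks of size blk) to [x1..xn, y1..yn]."""
    n = (hi - lo) // (2 * blk)          # blocks per side; assumed a power of two
    if n <= 1:
        return
    half = (n // 2) * blk
    split(a, lo, lo + 2 * half, blk)    # first recursively split the
    split(a, lo + 2 * half, hi, blk)    # left and right parts,
    rol(a, lo + half, hi - half, half)  # then rotate the middle part

a = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4']
split(a, 0, len(a), 1)
print(a)   # ['x1', 'x2', 'x3', 'x4', 'y1', 'y2', 'y3', 'y4']
```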

SLIDE 31

Work Proof

SLIDE 32

Calculate the work of the tall matrix transpose:

  • spanning the divide tree
  • combining the nodes via merge/split (themselves recursive procedures)
  • square-transposing in the leaf nodes

Work of TRIP

Overview

SLIDE 33

Combining Nodes via merge


Work of TRIP

Proof - TRIP Tree

Spanning Divide Tree

SLIDE 34


Work of TRIP

Proof - Merge Tree

Combining via merge, rotate sub-arrays

SLIDE 35

Integrate rol result into merge work


Work of TRIP

Proof - Merge Tree

SLIDE 36

Integrate merge result into TRIP work


Work of TRIP

Proof - TRIP Tree

SLIDE 37

Recap


Work of TRIP

Proof - Square Transpose

SLIDE 38

Lower bound on the number of inner nodes: purely ternary tree. Upper bound on the number of inner nodes: purely quaternary tree.

Work of TRIP

Proof - Square Transpose

SLIDE 39

Integrate the square transpose result into the TRIP work: work of the square transpose (including swapping)

Work of TRIP

Proof - TRIP Tree

SLIDE 40

Conclusions

SLIDE 41

The novel algorithm TRIP transposes rectangular matrices

  • correctly
  • in place
  • in a highly parallel manner

Conclusions

SLIDE 42

Roadmap

1. Work
2. Span
3. Parallelism