SLIDE 1

Alternating paths and US patent #5,905,666

Alan J. Hoffman John A. Tomlin William R. Pulleyblank IBM

SLIDE 2

US Patent 5,905,666 PROCESSING SYSTEM AND METHOD FOR PERFORMING SPARSE MATRIX MULTIPLICATION BY REORDERING VECTOR BLOCKS

SLIDE 3

The problem: Efficiently computing Ax and yA

  • Linear programming:

    – Primal: max cx, Ax = b, x ≥ 0
    – Dual: min yb, yA ≥ c
    – Requires computation of Ax and yA

  • Simplex algorithm:

    – Small number of computations of Ax, but many computations of yA when performing pricing
      • Critical to performance of algorithm

  • Interior algorithms:

    – Many fewer iterations, but require similar number of computations of Ax and yA
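A minimal Python sketch of the two products, assuming A is stored column by column as lists of row indices and values. The names `cols_idx`, `cols_val`, `y_times_A` and `A_times_x` are illustrative and do not come from the slides. It shows that yA reads y at each column's row indices, while Ax adds into the result at those indices:

```python
# Hypothetical column-wise storage of the 3 x 3 matrix
#   [1 0 2]
#   [0 3 5]
#   [4 0 6]
cols_idx = [[0, 2], [1], [0, 1, 2]]              # row indices of nonzeros, per column
cols_val = [[1.0, 4.0], [3.0], [2.0, 5.0, 6.0]]  # matching nonzero values


def y_times_A(y, cols_idx, cols_val):
    """Row vector times matrix: one gather and dot product per column."""
    return [sum(y[i] * a for i, a in zip(idx, val))
            for idx, val in zip(cols_idx, cols_val)]


def A_times_x(x, cols_idx, cols_val, n_rows):
    """Matrix times column vector: each nonzero is added into out[row]."""
    out = [0.0] * n_rows
    for xj, idx, val in zip(x, cols_idx, cols_val):
        for i, a in zip(idx, val):
            out[i] += a * xj
    return out
```

The gather in `y_times_A` only reads y, whereas `A_times_x` writes to positions chosen by the row indices.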

SLIDE 4

Computational details - yA

[Figure: the row vector y multiplied by a sparse matrix A, drawn as a pattern of nonzeros (#) and zeros (.)]

Problem: the matrices are typically sparse in practice, with 4 to 8 nonzeros per column.

Partition the matrix based on the number of nonzeros per column.

[Figure: for each part of the partition, an array of the row indices of the nonzeros and an array of the nonzero values]
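The partition step could be sketched as follows. This is an illustration under assumptions, not the patented procedure: columns are given as hypothetical `cols_idx` / `cols_val` lists, and every group of columns with the same nonzero count d is turned into d dense rows of indices and d dense rows of values.

```python
from collections import defaultdict


def partition_by_count(cols_idx, cols_val):
    """Group columns by nonzero count d and store each group as d dense rows.

    Returns {d: (member_columns, index_rows, value_rows)}.
    """
    groups = defaultdict(list)
    for j, idx in enumerate(cols_idx):
        groups[len(idx)].append(j)

    blocks = {}
    for d, members in groups.items():
        # Row k of the block holds the k-th nonzero of every member column.
        index_rows = [[cols_idx[j][k] for j in members] for k in range(d)]
        value_rows = [[cols_val[j][k] for j in members] for k in range(d)]
        blocks[d] = (members, index_rows, value_rows)
    return blocks


cols_idx = [[0, 2], [1], [0, 1, 2], [1, 3]]
cols_val = [[1.0, 4.0], [3.0], [2.0, 5.0, 6.0], [7.0, 8.0]]
blocks = partition_by_count(cols_idx, cols_val)
```

Here `blocks[2]` collects columns 0 and 3 into a dense 2 x 2 pair of index and value arrays.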

SLIDE 5

Now we can focus on dense submatrices

[Figure: the vector y alongside one dense block, stored as a values array and an indices array]

Normal (scalar) computation: 18 machine cycles per calculation; 15 cycles initialization

Vector unit computation: 180 machine cycles to start up; 4 cycles per calculation
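The break-even length can be checked with a few lines of Python. The sketch assumes the fixed cost is paid once per operation and the per-calculation cost once per element; the function names are illustrative.

```python
def scalar_cycles(n):
    """Scalar cost for n elements: 15 cycles initialization + 18 per calculation."""
    return 15 + 18 * n


def vector_cycles(n):
    """Vector unit cost for n elements: 180 cycles startup + 4 per calculation."""
    return 180 + 4 * n


def break_even():
    """Smallest n for which the vector unit is no slower than scalar code."""
    n = 1
    while vector_cycles(n) > scalar_cycles(n):
        n += 1
    return n
```

At 11 elements the scalar loop is still cheaper (213 versus 224 cycles); at 12 the vector unit wins (228 versus 231 cycles).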

  • Require approximately 12 elements to break even
  • so, do computation by rows
  • Use indices to select components of y
  • Use “accumulator” to compute y*Aj

[Figure: A, y, the expanded indices, the expanded y, a row of A, and the product y*A]
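The row-wise scheme can be sketched in Python as below. Plain lists stand in for vector registers, and `block_yA`, `index_rows` and `value_rows` are hypothetical names. Each pass gathers the components of y selected by one index row (the "expanded y"), multiplies by the matching row of values, and adds into an accumulator with one slot per column of the block.

```python
def block_yA(y, index_rows, value_rows):
    """Compute y*A_j for every column j of a dense block, one block row at a time."""
    width = len(index_rows[0])
    acc = [0.0] * width                       # accumulator: one entry per column
    for idx_row, val_row in zip(index_rows, value_rows):
        expanded_y = [y[i] for i in idx_row]  # gather y at this row's indices
        acc = [s + yi * a for s, yi, a in zip(acc, expanded_y, val_row)]
    return acc


# Block with two columns: column 0 has nonzeros in rows 0 and 2 of A,
# column 1 has nonzeros in rows 1 and 3.
result = block_yA([1.0, 2.0, 3.0, 4.0],
                  index_rows=[[0, 1], [2, 3]],
                  value_rows=[[1.0, 7.0], [4.0, 8.0]])
```

Each vector operation has the length of the block width rather than the 4 to 8 nonzeros of a single column, which is what lets it pass the break-even length.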

SLIDE 6

What goes wrong with the computation of Ax?

[Figure: the product of A and x giving Ax, with the rows of indices of nonzeros for the block]

Entries in the index rows now specify which component of Ax is being computed. If all entries in each index row are distinct, then it works. If there are duplicate entries, then we can attempt to permute the entries in each column to eliminate duplicates.
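Why duplicates matter can be illustrated with a small simulation. The model is an assumption, not something stated on the slide: a vector update is taken to gather all target components of Ax, add the products, and scatter the results back, so two lanes aimed at the same component overwrite each other. The function names are illustrative.

```python
def vector_style_update(out, x_block, idx_row, val_row):
    """Gather, add, scatter for one index row, all lanes at once (simulated)."""
    gathered = [out[i] for i in idx_row]
    updated = [g + a * xj for g, a, xj in zip(gathered, val_row, x_block)]
    for i, u in zip(idx_row, updated):
        out[i] = u                      # a duplicate index keeps only the last write


def scalar_update(out, x_block, idx_row, val_row):
    """Reference result: one element at a time, so every contribution is kept."""
    for i, a, xj in zip(idx_row, val_row, x_block):
        out[i] += a * xj


out_v, out_s = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
vector_style_update(out_v, [1.0, 1.0], [2, 2], [4.0, 8.0])
scalar_update(out_s, [1.0, 1.0], [2, 2], [4.0, 8.0])
```

With the duplicated index 2 the simulated vector update loses the contribution 4.0, while distinct indices give identical results in both versions.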

SLIDE 7

Example

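Suppose A has 16 rows, and our block has 12 columns and 3 rows.

Index matrix (each column corresponds to a column of the submatrix of A; each entry is a row of A):

    13   8  11  10  12  14   7   9   4   6   1   6
     8   3   3  11   4   9   8  11  15   5   4   2
     6   7   5   7  14   2   2  15  16   1   3   1

[Figure: bipartite graph representation, with the columns of the submatrix of A on one side and the rows of A on the other]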
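The conflicts in this example can be listed with a short Python check, assuming the 36 entries are read row by row as a 3 x 12 index matrix. That reading is the one under which every column holds three distinct row indices. The helper name `row_duplicates` is illustrative.

```python
from collections import Counter

index_matrix = [
    [13, 8, 11, 10, 12, 14, 7, 9, 4, 6, 1, 6],
    [8, 3, 3, 11, 4, 9, 8, 11, 15, 5, 4, 2],
    [6, 7, 5, 7, 14, 2, 2, 15, 16, 1, 3, 1],
]


def row_duplicates(matrix):
    """For each index row, the sorted indices that occur more than once."""
    return [sorted(v for v, n in Counter(row).items() if n > 1) for row in matrix]


duplicates = row_duplicates(index_matrix)
```

Every row of the example contains at least one repeated index, so none of the three rows could be applied as a single vector update of Ax in its current arrangement.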

SLIDE 8

Applying Koenig’s edge coloring theorem

  • Let d be the number of rows in the submatrix. Permuting the entries in each column, so that no row contains duplicates, is equivalent to edge coloring the bipartite graph with d colors.

  • By Koenig’s theorem, this is possible if and only if each node of the bipartite graph has degree at most d, which is equivalent to each index appearing at most d times in the submatrix.

  • We need an efficient way to obtain the edge coloring. If there is no coloring, because some index occurs too often, we can permute these to the last row and reduce the size of the block.
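The degree condition is cheap to test before any reordering is attempted. The following Python sketch uses the hypothetical helper name `can_reorder`; it counts how often each index appears in a block and reports the indices that exceed d.

```python
from collections import Counter


def can_reorder(index_matrix):
    """Return (ok, offenders): ok is True when no index occurs more than d times,
    where d is the number of rows of the block."""
    d = len(index_matrix)
    counts = Counter(v for row in index_matrix for v in row)
    offenders = sorted(v for v, n in counts.items() if n > d)
    return (not offenders), offenders


# d = 2 and three columns that all contain index 1: no valid reordering exists.
blocked = can_reorder([[1, 1, 1],
                       [2, 3, 4]])

# d = 2 and index 1 used twice: a reordering exists.
feasible = can_reorder([[1, 1, 5],
                        [2, 3, 4]])
```

A column node always has degree exactly d, so only the index side of the bipartite graph needs checking.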
SLIDE 9

Alternating path based proof

  • Faber, Ehrenfeucht and Kierstead (1982) presented an alternating path based proof:

    – Start with any coloring of the edges.
    – If some node v is incident with two edges of the same color c1, then some color c2 must be missing at v.
    – Switch the colors of the edges along a c1-c2 alternating path starting at v.

[Figure: a c1-c2 alternating path at node v]

SLIDE 10

Matrix interpretation

  • Start with any permutations in the columns of the submatrix; process columns left to right, top to bottom.

[Figure: a row with a duplicate c1, and a row not containing c1 in the processed part whose entry in the same column is c2; swap these two entries]

If this creates a c2 conflict with processed elements, it involves the original row; repeat.
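One way to realize the reordering is the textbook alternating-path edge coloring, sketched below in Python. This is an illustration, not the exact procedure of the slides or the patent: entries are placed one at a time instead of starting from an arbitrary permutation, and the names `reorder_block`, `col_of` and `val_of` are hypothetical. Rows of the block play the role of colors. The sketch assumes the entries within each column are distinct and that no index occurs more than d times.

```python
def reorder_block(index_matrix):
    """Permute entries within each column so that no row repeats an index."""
    d, width = len(index_matrix), len(index_matrix[0])
    col_of = [dict() for _ in range(d)]  # col_of[r][v]: column where row r holds v
    val_of = [dict() for _ in range(d)]  # val_of[r][j]: index placed at row r, column j

    for j in range(width):
        for v in [index_matrix[r][j] for r in range(d)]:
            free_at_col = [r for r in range(d) if j not in val_of[r]]
            free_at_val = [r for r in range(d) if v not in col_of[r]]
            if not free_at_val:
                raise ValueError(f"index {v} occurs more than {d} times")
            common = [r for r in free_at_col if r in free_at_val]
            if common:
                a = common[0]            # a row free for both: no swaps needed
            else:
                a, b = free_at_col[0], free_at_val[0]
                # Walk the a-b alternating path that starts at index v.
                path, node, color = [], v, a
                while node in col_of[color]:
                    jj = col_of[color][node]
                    path.append((color, jj, node))
                    color = a + b - color
                    if jj not in val_of[color]:
                        break
                    node = val_of[color][jj]
                    path.append((color, jj, node))
                    color = a + b - color
                # Exchange rows a and b along the path (swaps within columns).
                for c, jj, w in path:
                    del col_of[c][w]
                    del val_of[c][jj]
                for c, jj, w in path:
                    col_of[a + b - c][w] = jj
                    val_of[a + b - c][jj] = w
            col_of[a][v] = j
            val_of[a][j] = v

    return [[val_of[r][j] for j in range(width)] for r in range(d)]


example = [
    [13, 8, 11, 10, 12, 14, 7, 9, 4, 6, 1, 6],
    [8, 3, 3, 11, 4, 9, 8, 11, 15, 5, 4, 2],
    [6, 7, 5, 7, 14, 2, 2, 15, 16, 1, 3, 1],
]
reordered = reorder_block(example)
```

Applied to the 3 x 12 example block (read row by row), the result keeps every column's entries but arranges them so that each row holds 12 distinct indices. The path never reaches the column being filled, because that column has no entry yet in row a.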

SLIDE 11

Computational results

Model     Rows     Columns   Average # of nonzeros per column   Nonzeros
MPS4      90482    219249    2.3                                502943
Mod42     40022    110475    2.2                                244439
Mod41     34952    94752     2.2                                210211
Degen3    1412     1727      14.2                               24465
PILOTS    1374     3361      12.1                               40757
25FV47    715      1484      6.7                                9994
Degen2    381      473       7.5                                3580
BandM     185      340       4.9                                1670

SLIDE 12

Computational results

Model     Reordering time   New Ax time   Total yA time   Old Ax time   No. of Ax calcs.   No. of yA calcs.
MPS4      3.361             43.543        18.187          82.826        408                244
Mod42     1.289             17.169        5.863           26.746        278                166
Mod41     1.340             11.209        3.793           17.672        213                127
Degen3    0.089             0.462         0.186           0.676         93                 53
PILOTS    0.216             1.662         0.523           2.440         184                109
25FV47    0.044             0.416         0.099           0.486         134                79
Degen2    0.020             0.062         0.028           0.101         68                 40
BandM     0.007             0.043         0.018           0.056         79                 46

SLIDE 13

Reference

A.J. Hoffman, W.R. Pulleyblank, J.A. Tomlin, On computing Ax and π^T A, when A is sparse, Annals of Numer. Math. 4 (1997) 359-367.