Alternating paths and US patent #5,905,666
Alan J. Hoffman, John A. Tomlin, William R. Pulleyblank
IBM
US Patent 5,905,666 PROCESSING SYSTEM AND METHOD FOR PERFORMING SPARSE MATRIX MULTIPLICATION BY REORDERING VECTOR BLOCKS
The problem: Efficiently computing Ax and yA
- Linear programming:
– Primal: max cx, Ax = b, x ≥ 0
– Dual: min yb, yA ≥ c
– Requires computation of Ax and yA
- Simplex algorithm:
– Small number of computations of Ax, but many computations of yA when performing pricing
- Critical to performance of algorithm
- Interior point algorithms:
– Many fewer iterations, but require similar number of computations of Ax and yA
Computational details - yA
[Figure: the row vector y beside the sparsity pattern of A (# marks a nonzero); each group of columns is stored as a block of row indices of nonzeros and a block of nonzero values]
- Problem: in practice the matrices are sparse, typically with 4 to 8 nonzeros per column
- Partition the matrix based on the number of nonzeros per column
Now we can focus on dense submatrices
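The partitioning step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `partition_by_column_count` and the input layout (each column as a list of `(row_index, value)` pairs) are assumptions made for the example.

```python
from collections import defaultdict


def partition_by_column_count(columns):
    """Group sparse columns by their number of nonzeros.

    columns: list of columns, each a list of (row_index, value) pairs
             (a hypothetical input layout chosen for this sketch).
    Returns {k: (col_ids, index_block, value_block)}, where both blocks
    have k rows and one column per original column with k nonzeros.
    """
    groups = defaultdict(list)
    for j, col in enumerate(columns):
        groups[len(col)].append(j)

    blocks = {}
    for k, col_ids in groups.items():
        # Row r of each block holds the r-th nonzero of every column in the group.
        index_block = [[columns[j][r][0] for j in col_ids] for r in range(k)]
        value_block = [[columns[j][r][1] for j in col_ids] for r in range(k)]
        blocks[k] = (col_ids, index_block, value_block)
    return blocks


# Three columns: two with 2 nonzeros, one with 1 nonzero.
example_columns = [[(0, 2.0), (2, 1.0)], [(1, 3.0)], [(0, 4.0), (1, 5.0)]]
example_blocks = partition_by_column_count(example_columns)
```

Each resulting pair of blocks is fully dense, so every row of a block can be handled as one vector operation.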
[Figure: a dense block, with y, the nonzero values and the row indices]
- Normal (scalar) computation: 18 machine cycles per calculation; 15 cycles initialization
- Vector Unit computation: 180 machine cycles to start up, 4 cycles per calculation
- The vector unit requires approximately 12 elements to break even
- So, do the computation by rows of the dense block
- Use the indices to select components of y
- Use an “accumulator” to compute y*Aj
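The break-even figure can be checked with a little arithmetic. The sketch below assumes a simple linear cost model (startup cost plus a per-element cost) built from the cycle counts quoted on the slide; the function names are illustrative only.

```python
import math


def scalar_cycles(n):
    """Assumed cost model: 15 cycles initialization + 18 cycles per element."""
    return 15 + 18 * n


def vector_cycles(n):
    """Assumed cost model: 180 cycles startup + 4 cycles per element."""
    return 180 + 4 * n


# Solve 15 + 18 n = 180 + 4 n, i.e. n = 165 / 14, and round up.
break_even = math.ceil((180 - 15) / (18 - 4))
```

Under this model the vector unit is slower for 11 elements and faster from 12 elements on, which is why the computation is organized along the long rows of a block rather than its short columns.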
[Figure: computing y*A by rows. Labels: A, y, expanded indices, expanded y, row of A, y*A]
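The row-wise computation of y*A on one dense block can be sketched in plain Python, with list comprehensions standing in for vector instructions. The name `ya_block` and the block layout (one list per block row) are assumptions for this example.

```python
def ya_block(y, index_block, value_block):
    """Compute y*A_j for every column j of a dense block, one block row at a time.

    index_block[r][j] is the row index of the r-th nonzero of column j,
    value_block[r][j] is the corresponding nonzero value.
    """
    n = len(index_block[0]) if index_block else 0
    accumulator = [0.0] * n
    for idx_row, val_row in zip(index_block, value_block):
        # "Expanded y": gather the components of y selected by this index row.
        expanded_y = [y[i] for i in idx_row]
        # One elementwise multiply-add over the whole row.
        accumulator = [a + ey * v
                       for a, ey, v in zip(accumulator, expanded_y, val_row)]
    return accumulator


y_example = [1.0, 2.0, 3.0]
ya_example = ya_block(y_example, [[0, 0], [2, 1]], [[2.0, 4.0], [1.0, 5.0]])
```

Here duplicates in an index row are harmless, because each position of the accumulator belongs to a different column and the gather only reads from y.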
What goes wrong with the computation of Ax?
[Figure: x * A = Ax, with the indices of nonzeros]
Entries in the index rows now specify which component of Ax is being computed. If all entries in each index row are distinct, the computation works. If an index row contains duplicate entries, two elements of the same vector operation target the same component of Ax, so we can attempt to permute the entries within each column to eliminate the duplicates.
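The failure mode can be illustrated with a toy model of a vector update. The sketch assumes that a vector gather/add/scatter reads all targets before writing any of them; this is a modelling assumption for illustration, not a description of a specific machine, and `vector_scatter_add` is a hypothetical name.

```python
def vector_scatter_add(result, idx_row, contributions):
    """Toy model of one vector update: gather, add, then scatter.

    All reads happen before any write, so when idx_row contains a
    duplicate index only the last write to that component survives.
    """
    gathered = [result[i] for i in idx_row]
    summed = [g + c for g, c in zip(gathered, contributions)]
    for i, s in zip(idx_row, summed):
        result[i] = s
    return result


# Distinct indices: every contribution lands in its own component.
ok = vector_scatter_add([0.0, 0.0, 0.0], [0, 1, 2], [1.0, 2.0, 4.0])

# Duplicate index 0: the contribution 1.0 is overwritten by 4.0.
lost = vector_scatter_add([0.0, 0.0, 0.0], [0, 1, 0], [1.0, 2.0, 4.0])
```

With the duplicate, component 0 ends up as 4.0 instead of the correct 5.0, which is why each index row must contain distinct entries.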
Example
Index matrix:

13  8 11 10 12 14  7  9  4  6  1  6
 8  3  3 11  4  9  8 11 15  5  4  2
 6  7  5  7 14  2  2 15 16  1  3  1

Suppose A has 16 rows, and our block has 12 columns and 3 rows.
[Figure: bipartite graph representation, with the columns of the submatrix of A on one side and the rows of A on the other; each entry of the index matrix is an edge]
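The bipartite graph of the example can be inspected with a few lines of Python. The sketch assumes the 36 entries of the index matrix are read as 3 rows of 12 columns, which is the layout under which every column contains three distinct row indices.

```python
from collections import Counter

# Index matrix of the example, read as 3 rows x 12 columns (assumed layout).
INDEX_MATRIX = [
    [13, 8, 11, 10, 12, 14, 7, 9, 4, 6, 1, 6],
    [8, 3, 3, 11, 4, 9, 8, 11, 15, 5, 4, 2],
    [6, 7, 5, 7, 14, 2, 2, 15, 16, 1, 3, 1],
]

# Edges of the bipartite graph: (column of the block, row index of A).
edges = [(j, v) for row in INDEX_MATRIX for j, v in enumerate(row)]

# Degree of each row-of-A node = number of times the index appears.
degree = Counter(v for _, v in edges)
max_degree = max(degree.values())

# Rows of the index matrix that currently contain a duplicate entry.
rows_with_duplicates = [r for r, row in enumerate(INDEX_MATRIX)
                        if len(set(row)) < len(row)]
```

Under this reading every row of the index matrix contains duplicates, yet no index occurs more than 3 times, so the degree condition for a 3-coloring is met.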
Applying Koenig’s edge coloring theorem
- Let d be the number of rows in the submatrix. Permuting the entries in each column, so that no row contains duplicates, is equivalent to edge coloring the bipartite graph with d colors.
- By Koenig’s theorem, this is possible if and only if each node of the bipartite graph has degree at most d, which is equivalent to each index appearing at most d times in the submatrix.
- We need an efficient way to obtain the edge coloring. If there is no coloring, because some index occurs too often, we can permute these to the last row and reduce the size of the block.
Alternating path based proof
- Faber, Ehrenfeucht and Kierstead (1982) presented an alternating path based proof:
– Start with any coloring of the edges.
– If some node v is incident with two edges of the same color c1, then some color c2 must be missing at v.
– Switch the colors of the edges in a c1-c2 alternating path starting at v.
[Figure: node v before and after the switch]
Matrix interpretation
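- Start with any permutations in the columns of the submatrix; process columns left to right, top to bottom.
- Suppose the current entry c1 duplicates a c1 in the processed part of its row (the row with duplicate c1).
- Find a row not containing c1 in its processed part, and let c2 be the entry of the current column in that row. Swap these two entries.
- If this creates a c2 conflict with processed elements, it involves the original row; repeat.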
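One way to obtain such a reordering is the textbook alternating-path edge coloring for bipartite graphs, sketched below. It inserts entries one at a time rather than repairing an initial arrangement, so it is an illustration of the same idea and not necessarily the exact procedure of the patent. The name `reorder_block` and the 3 x 12 reading of the example are assumptions.

```python
from collections import Counter

INDEX_MATRIX = [
    [13, 8, 11, 10, 12, 14, 7, 9, 4, 6, 1, 6],
    [8, 3, 3, 11, 4, 9, 8, 11, 15, 5, 4, 2],
    [6, 7, 5, 7, 14, 2, 2, 15, 16, 1, 3, 1],
]


def reorder_block(block):
    """Permute the entries within each column so that no row has duplicates.

    Colors are the d rows of the block. Raises ValueError if some index
    appears more than d times (no d-coloring exists).
    """
    d, n = len(block), len(block[0])
    counts = Counter(v for row in block for v in row)
    if max(counts.values()) > d:
        raise ValueError("some index appears more than d times")
    in_col = [[None] * d for _ in range(n)]   # in_col[j][c]: index in row c of column j
    in_idx = {v: [None] * d for v in counts}  # in_idx[v][c]: column holding v in row c

    def first_free(slots):
        return next(c for c, x in enumerate(slots) if x is None)

    for j in range(n):
        for r in range(d):
            v = block[r][j]
            a = first_free(in_col[j])   # row still empty in column j
            b = first_free(in_idx[v])   # row not yet containing v
            if in_idx[v][a] is not None:
                # Row a already contains v: collect the a/b alternating path from v.
                path, at_index, node, color = [], True, v, a
                while True:
                    nxt = in_idx[node][color] if at_index else in_col[node][color]
                    if nxt is None:
                        break
                    col, idx = (nxt, node) if at_index else (node, nxt)
                    path.append((col, idx, color))
                    node, at_index = nxt, not at_index
                    color = b if color == a else a
                # Swap rows a and b along the path: erase first, then rewrite.
                for col, idx, c in path:
                    in_col[col][c] = None
                    in_idx[idx][c] = None
                for col, idx, c in path:
                    new = b if c == a else a
                    in_col[col][new] = idx
                    in_idx[idx][new] = col
            in_col[j][a] = v
            in_idx[v][a] = j
    return [[in_col[j][c] for j in range(n)] for c in range(d)]


reordered = reorder_block(INDEX_MATRIX)
```

Because the graph is bipartite, the alternating path starting at v can never reach the column being filled, so row a stays free there and the new entry can always be placed.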
Computational results
Model     Rows     Columns   Nonzeros   Average # of nonzeros per column
MPS4      90482    219249    502943      2.3
Mod42     40022    110475    244439      2.2
Mod41     34952     94752    210211      2.2
Degen3     1412      1727     24465     14.2
PILOTS     1374      3361     40757     12.1
25FV47      715      1484      9994      6.7
Degen2      381       473      3580      7.5
BandM       185       340      1670      4.9
Computational results
Model     Reordering time   Old Ax time   New Ax time   Total yA time   No. of Ax calcs.   No. of yA calcs.
MPS4       3.361            82.826        43.543        18.187          408                244
Mod42      1.289            26.746        17.169         5.863          278                166
Mod41      1.340            17.672        11.209         3.793          213                127
Degen3     0.089             0.676         0.462         0.186           93                 53
PILOTS     0.216             2.440         1.662         0.523          184                109
25FV47     0.044             0.486         0.416         0.099          134                 79
Degen2     0.020             0.101         0.062         0.028           68                 40
BandM      0.007             0.056         0.043         0.018           79                 46
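The timings can be compared with a short calculation. The sketch assumes the flattened figures map to columns as (reordering time, old Ax time, new Ax time) per model and that all three are in the same unit; both are readings of the slide, not stated facts, and the helper `summarize` is hypothetical.

```python
# model: (reordering time, old Ax time, new Ax time), as read from the slide.
RESULTS = {
    "MPS4": (3.361, 82.826, 43.543),
    "Mod42": (1.289, 26.746, 17.169),
    "Mod41": (1.340, 17.672, 11.209),
    "Degen3": (0.089, 0.676, 0.462),
    "PILOTS": (0.216, 2.440, 1.662),
    "25FV47": (0.044, 0.486, 0.416),
    "Degen2": (0.020, 0.101, 0.062),
    "BandM": (0.007, 0.056, 0.043),
}


def summarize(results):
    """Return {model: (Ax speedup, net saving after paying for reordering)}."""
    summary = {}
    for model, (reorder, old_ax, new_ax) in results.items():
        speedup = old_ax / new_ax
        net_saving = old_ax - (new_ax + reorder)
        summary[model] = (speedup, net_saving)
    return summary


summary = summarize(RESULTS)
```

Under this reading, the new Ax time plus the one-off reordering time is below the old Ax time for every model listed.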