Format Abstraction for Sparse Tensor Algebra Compilers
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe
Sparse tensors are a natural way of representing real-world data
[Figure: a sparse three-dimensional tensor with axes Users (Peter, Paul, Mary, Bob, Sam, Billy, Lilly, Hilde, ...), Products (Kindle, Dubliners, The Iliad, Monitor, Sweater, Laptop, Candide, Jacket, ...), and Words (Quality, Durable, ...); the stored entries are small integers.]
Dense storage: 107 exabytes. Sparse storage: 13 gigabytes.
Many different formats for storing tensors exist
Matrices: ELLPACK, DIA, CSR, Coordinate matrix, DCSC, CSB, DCSR, CSC, Dense array matrix, Block DIA, BCSR, BCOO, SELL, Skyline, BELL, LIL
Vectors: Hash maps, Sparse vector, Dense array vector
Higher-order tensors: CSF, Coordinate tensor, Dense array tensor, Mode-generic tensor, HiCOO, F-COO
Applications: Unstructured mesh simulation [Bell and Garland 2009], Thermal simulation, Data analytics, Image processing, CNN with block-sparse weights [Gray et al. 2017]
There is no universally superior tensor format
[Figure: bar charts of normalized time (axis from 0.0 to 1.5) in three panels: CSR vs. DIA, CSR vs. DIA (one bar annotated 186x), and CSR vs. BCSR.]
[Figure: two pipelines side by side. Left, [Kjolstad et al. 2017]: a code generator that reads tensors stored as pos and crd arrays. Right, this work: a code generator that reads several different storage layouts through a format abstraction.]
Storing sparse tensors efficiently requires additional metadata
[Figure: a 3-by-4 matrix with nonzeros A to F stored as a dense array; every element, zero or nonzero, occupies one of positions 0 to 11.]
row(6) = 6 / 4 = 1
col(6) = 6 % 4 = 2
locate(1,2) = 1 * 4 + 2 = 6
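The index arithmetic on this slide can be written out as a small sketch. The function names `dense_locate`, `dense_row`, and `dense_col` are illustrative and do not come from the talk; the layout is assumed to be row-major with `ncols` columns.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major dense layout: element (i, j) of a matrix with `ncols`
   columns lives at position i * ncols + j of the values array. */
static size_t dense_locate(size_t i, size_t j, size_t ncols) {
    assert(j < ncols);
    return i * ncols + j;
}

/* Inverse mapping: recover the row of a position by integer division. */
static size_t dense_row(size_t p, size_t ncols) {
    assert(ncols > 0);
    return p / ncols;
}

/* Inverse mapping: recover the column of a position by remainder. */
static size_t dense_col(size_t p, size_t ncols) {
    assert(ncols > 0);
    return p % ncols;
}
```

With `ncols = 4`, `dense_locate(1, 2, 4)` gives 6, and `dense_row(6, 4)` and `dense_col(6, 4)` give 1 and 2, matching the slide. No metadata is needed because the position itself encodes the coordinates.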
[Figure: the same matrix stored as only its six nonzeros, A B C D E F, at positions 0 to 5.]
row(3) = ???
col(3) = ???
Once the zeros are dropped, a position alone no longer determines its coordinates.
Coordinates of tensor elements can be encoded in many ways
Coordinate: rows = 0 0 1 1 1 2, cols = 0 2 1 2 3 3, vals = A B C D E F
CSR: pos = 0 2 5 6, cols = 0 2 1 2 3 3, vals = A B C D E F
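As a minimal sketch of how each encoding answers the row(3) and col(3) question, the functions below look up coordinates in the two layouts. The array contents are read off the slide figure with its zero entries restored, so treat them as an assumption; the function names are illustrative only.

```c
#include <assert.h>

enum { NNZ = 6, NROWS = 3 };

/* Example arrays (assumed: reconstructed from the slide figure). */
static const int coo_rows[NNZ] = {0, 0, 1, 1, 1, 2};
static const int coo_cols[NNZ] = {0, 2, 1, 2, 3, 3};
static const int csr_pos[NROWS + 1] = {0, 2, 5, 6};
static const int csr_cols[NNZ] = {0, 2, 1, 2, 3, 3};

/* Coordinate format: both coordinates are stored per nonzero. */
static int coo_row(int p) { assert(0 <= p && p < NNZ); return coo_rows[p]; }
static int coo_col(int p) { assert(0 <= p && p < NNZ); return coo_cols[p]; }

/* CSR: the column is stored per nonzero; the row is the index i of the
   segment [pos[i], pos[i + 1]) that contains position p. */
static int csr_row(int p) {
    assert(0 <= p && p < NNZ);
    int i = 0;
    while (csr_pos[i + 1] <= p) i++;
    return i;
}
static int csr_col(int p) { assert(0 <= p && p < NNZ); return csr_cols[p]; }
```

Both encodings place position 3 (value D) at row 1, column 2. The coordinate format pays one extra integer per nonzero; CSR pays one integer per row plus a search when the row of an arbitrary position is needed.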
Computing with different formats can require very different code

Coordinate ✕ Dense array

    /* One loop over all stored nonzeros of B. */
    for (int pB = B1_pos[0]; pB < B1_pos[1]; pB++) {
      int i = B1_crd[pB];
      int j = B2_crd[pB];
      int pC = i * N + j;
      int pA = i * N + j;
      A[pA] = B[pB] * C[pC];
    }

CSR ✕ Dense array

    /* Outer loop over rows, inner loop over the nonzeros of each row. */
    for (int i = 0; i < M; i++) {
      for (int pB = B2_pos[i]; pB < B2_pos[i + 1]; pB++) {
        int j = B2_crd[pB];
        int pC = i * N + j;
        int pA = i * N + j;
        A[pA] = B[pB] * C[pC];
      }
    }

CSR ✕ Coordinate

    int pC1 = C1_pos[0];
    while (pC1 < C1_pos[1]) {
      int i = C1_crd[pC1];
      /* Find the end of the run of C entries that share row i. */
      int C1_segend = pC1 + 1;
      while (C1_segend < C1_pos[1] && C1_crd[C1_segend] == i)
        C1_segend++;
      /* Merge row i of B with row i of C, keeping shared columns. */
      int pB2 = B2_pos[i];
      int pC2 = pC1;
      while (pB2 < B2_pos[i + 1] && pC2 < C1_segend) {
        int jB2 = B2_crd[pB2];
        int jC2 = C2_crd[pC2];
        int j = min(jB2, jC2);
        int pA = i * N + j;
        if (jB2 == j && jC2 == j)
          A[pA] = B[pB2] * C[pC2];
        if (jB2 == j) pB2++;
        if (jC2 == j) pC2++;
      }
      pC1 = C1_segend;
    }
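To make the contrast concrete, here is a sketch that wraps the Coordinate ✕ Dense array and CSR ✕ Dense array loops from the slide as functions so that they can be run on the same input. The function names and parameter lists are assumptions added for illustration, and the output array is assumed to be zero-initialized by the caller.

```c
#include <assert.h>

/* A = B .* C (element-wise), B in coordinate format,
   A and C dense row-major with N columns. */
static void mul_coo_dense(int N,
                          const int *B1_pos, const int *B1_crd,
                          const int *B2_crd, const double *B,
                          const double *C, double *A) {
    assert(N > 0);
    for (int pB = B1_pos[0]; pB < B1_pos[1]; pB++) {
        int i = B1_crd[pB];
        int j = B2_crd[pB];
        A[i * N + j] = B[pB] * C[i * N + j];
    }
}

/* Same computation, B in CSR with M rows. */
static void mul_csr_dense(int M, int N,
                          const int *B2_pos, const int *B2_crd,
                          const double *B,
                          const double *C, double *A) {
    assert(M >= 0 && N > 0);
    for (int i = 0; i < M; i++) {
        for (int pB = B2_pos[i]; pB < B2_pos[i + 1]; pB++) {
            int j = B2_crd[pB];
            A[i * N + j] = B[pB] * C[i * N + j];
        }
    }
}
```

Given the same matrix B stored both ways, the two functions should fill A identically; only the loop structure that reaches each nonzero differs.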
Hand-coding support for a wide range of formats is infeasible
Coordinate ✕ Dense array, CSR ✕ Dense array, CSR ✕ Coordinate
Dense array ✕ Dense array, Coordinate ✕ Coordinate, CSR ✕ CSR
DIA ✕ DIA, DIA ✕ Dense array, DIA ✕ Coordinate, DIA ✕ CSR
ELLPACK ✕ ELLPACK, ELLPACK ✕ Dense array, ELLPACK ✕ Coordinate, ELLPACK ✕ CSR, ELLPACK ✕ DIA
BCSR ✕ BCSR, BCSR ✕ Dense array, BCSR ✕ Coordinate
Dense array ✕ CSR ✕ CSR, Coordinate ✕ CSR ✕ CSR, CSR ✕ CSR ✕ CSR
Dense array ✕ Coordinate ✕ CSR, Dense array ✕ Dense array ✕ CSR, Coordinate ✕ Coordinate ✕ CSR
DIA ✕ Coordinate ✕ Dense array, DIA ✕ Coordinate ✕ CSR, DIA ✕ Dense array ✕ CSR, DIA ✕ CSR ✕ CSR
DIA ✕ Coordinate ✕ Coordinate, DIA ✕ Dense array ✕ Dense array, DIA ✕ DIA ✕ CSR, DIA ✕ DIA ✕ Coordinate, DIA ✕ DIA ✕ Dense array
ELLPACK ✕ ELLPACK ✕ DIA, ELLPACK ✕ CSR ✕ DIA, ELLPACK ✕ BCSR ✕ DIA, DIA ✕ DIA ✕ DIA
Dense array ✕ Dense array ✕ Dense array, Dense array ✕ Dense array ✕ Sparse vector, Dense array ✕ Dense array ✕ Hash map
Dense array ✕ Sparse vector ✕ Sparse vector, Dense array ✕ Sparse vector ✕ Hash map, Dense array ✕ Hash map ✕ Sparse vector
Dense array ✕ Sparse vector ✕ Dense array, Coordinate ✕ Dense array ✕ Dense array, Coordinate ✕ Sparse vector ✕ Dense array, Coordinate ✕ Dense array ✕ Hash map
Coordinate ✕ Sparse vector ✕ Hash map, Coordinate ✕ Hash map ✕ Sparse vector, CSR ✕ Dense array ✕ Dense array, CSR ✕ Dense array ✕ Sparse vector, CSR ✕ Hash map ✕ Sparse vector
CSR ✕ Hash map ✕ Dense array, CSR ✕ Sparse vector ✕ Dense array, DIA ✕ Dense array ✕ Dense array, DIA ✕ Hash map ✕ Dense array, ELLPACK ✕ Dense array ✕ Sparse vector
Outline: Format Abstraction, Code Generation, Evaluation
[Figure: outline thumbnails. Format abstraction: Mode-generic tensor = (Compressed, Singleton, Dense, Dense); DIA = (Dense, Range, Offset). Code generation: a decision tree that asks "x supports locate?", "y supports locate?", and "x unordered and y ordered?" and selects one of three strategies: co-iterate over x and y, iterate over x and locate into y, or iterate over y and locate into x.]
Tensor formats can be viewed as compositions of level formats
[Figure: a three-dimensional tensor with nonzeros A, B, C, D, E, F, G, H, J, stored one level per dimension: slices in a Compressed level, rows in a Singleton level, columns in a Dense level.]
The same level formats can be composed in many ways
Level formats: Dense, Compressed, Singleton
CSR = (Dense, Compressed)
  Dense level (rows): size 3
  Compressed level (columns): pos = 0 2 5 6, crd = 0 2 1 2 3 3
  Values: A B C D E F
Coordinate = (Compressed, Singleton)
  Compressed level (rows): pos = 0 6, crd = 0 0 1 1 1 2
  Singleton level (columns): crd = 0 2 1 2 3 3
  Values: A B C D E F
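The composition idea can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the interface used by taco or defined in the paper: the `Level` struct, `level_children`, `level_coord`, and `to_dense` are hypothetical names, and the example arrays are reconstructed from the slide figure.

```c
#include <assert.h>

typedef enum { DENSE, COMPRESSED, SINGLETON } LevelKind;

typedef struct {
    LevelKind kind;
    int size;        /* DENSE: number of coordinates in this dimension */
    const int *pos;  /* COMPRESSED: segment bounds per parent position */
    const int *crd;  /* COMPRESSED and SINGLETON: stored coordinates */
} Level;

/* Range [*begin, *end) of positions in this level that belong to
   position `parent` of the level above. */
static void level_children(const Level *lvl, int parent, int *begin, int *end) {
    if (lvl->kind == DENSE) {
        *begin = parent * lvl->size;
        *end = *begin + lvl->size;
    } else if (lvl->kind == COMPRESSED) {
        *begin = lvl->pos[parent];
        *end = lvl->pos[parent + 1];
    } else { /* SINGLETON: exactly one child, at the parent's position */
        *begin = parent;
        *end = parent + 1;
    }
}

/* Coordinate stored at position p of this level. */
static int level_coord(const Level *lvl, int parent, int p) {
    if (lvl->kind == DENSE) return p - parent * lvl->size;
    return lvl->crd[p];
}

/* Walk any two-level format and scatter its values into a dense
   row-major array with `ncols` columns. */
static void to_dense(const Level lv[2], const double *vals, int ncols, double *out) {
    int b1, e1, b2, e2;
    assert(ncols > 0);
    level_children(&lv[0], 0, &b1, &e1);
    for (int p1 = b1; p1 < e1; p1++) {
        int i = level_coord(&lv[0], 0, p1);
        level_children(&lv[1], p1, &b2, &e2);
        for (int p2 = b2; p2 < e2; p2++) {
            int j = level_coord(&lv[1], p1, p2);
            out[i * ncols + j] = vals[p2];
        }
    }
}
```

Describing CSR as `{DENSE, COMPRESSED}` and the coordinate format as `{COMPRESSED, SINGLETON}` lets the same `to_dense` routine traverse both. A code generator would specialize these per-level rules into format-specific loops rather than branch on the level kind at run time.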
Level formats: Dense, Compressed, Singleton, Hashed, Range, Offset

Tensor format         Level formats                          Reference
Coordinate matrix     Compressed, Singleton
CSR                   Dense, Compressed                      [Tinney and Walker 1967]
Coordinate tensor     Compressed, Singleton, Singleton
BCSR                  Dense, Compressed, Dense, Dense        [Im and Yelick 1998]
DIA                   Dense, Range, Offset                   [Saad 2003]
Block DIA             Dense, Range, Offset, Dense, Dense
Hash map vector       Hashed                                 [Patwary et al. 2015]
Hash map matrix       Hashed, Hashed
ELLPACK               Dense, Dense, Singleton                [Kincaid et al. 1989]
Mode-generic tensor   Compressed, Singleton, Dense, Dense    [Baskaran et al. 2012]
CSB                   Dense, Dense, Compressed, Singleton    [Buluç et al. 2009]
Dense array tensor    Dense, Dense, Dense
17
for (int i = 0; i < m; i++) { for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) { int j = B2_idx[pB2]; int pA2 = (i * n) + j; int pB3 = B3_pos[pB2]; int pc1 = c1_pos[0]; while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) { int kB = B3_idx[pB3]; int kc = c1_idx[pc1]; int k = min(kB, kc); if (kB == k && kc == k) { a[pA2] += b[pB3] * c[pc1]; } if (kB == k) pB3++; if (kc == k) pc1++; } } }
Tensor Algebra Compiler (taco)
c : (compressed)
<latexit sha1_base64="znx8HB3iu6URMn3Pu43WiyqTY=">ACAHicbVC7SgNBFJ31GeNr1cLCZjAIsQm7IihWQRvLCOYBSQizk7vJkNmdZeauGJY0/oqNhSK2foadf+PkUWjigYHDOfdw54gkcKg5307S8srq2vruY385tb2zq67t18zKtUcqlxJpRsBMyBFDFUKGRaGBRIKEeDG7Gfv0BtBEqvsdhAu2I9WIRCs7QSh3kNMrWmwhPGLGVWSzxkB3dNpxC17Jm4AuEn9GCmSGSsf9anUVTyOIkUtmTNP3EmxnTKPgEkb5VmogYXzAetC0NGYRmHY2OWBET6zSpaHS9sVIJ+rvRMYiY4ZRYCcjhn0z743F/7xmiuFlOxNxkiLEfLoTCVFRcdt0K7QwFEOLWFcC/tXyvtM462s7wtwZ8/eZHUzkq+5XfnhfL1rI4cOSLHpEh8ckHK5JZUSJVwMiLP5JW8OU/Oi/PufExHl5xZ5oD8gfP5Awmdlg0=</latexit><latexit sha1_base64="znx8HB3iu6URMn3Pu43WiyqTY=">ACAHicbVC7SgNBFJ31GeNr1cLCZjAIsQm7IihWQRvLCOYBSQizk7vJkNmdZeauGJY0/oqNhSK2foadf+PkUWjigYHDOfdw54gkcKg5307S8srq2vruY385tb2zq67t18zKtUcqlxJpRsBMyBFDFUKGRaGBRIKEeDG7Gfv0BtBEqvsdhAu2I9WIRCs7QSh3kNMrWmwhPGLGVWSzxkB3dNpxC17Jm4AuEn9GCmSGSsf9anUVTyOIkUtmTNP3EmxnTKPgEkb5VmogYXzAetC0NGYRmHY2OWBET6zSpaHS9sVIJ+rvRMYiY4ZRYCcjhn0z743F/7xmiuFlOxNxkiLEfLoTCVFRcdt0K7QwFEOLWFcC/tXyvtM462s7wtwZ8/eZHUzkq+5XfnhfL1rI4cOSLHpEh8ckHK5JZUSJVwMiLP5JW8OU/Oi/PufExHl5xZ5oD8gfP5Awmdlg0=</latexit><latexit sha1_base64="znx8HB3iu6URMn3Pu43WiyqTY=">ACAHicbVC7SgNBFJ31GeNr1cLCZjAIsQm7IihWQRvLCOYBSQizk7vJkNmdZeauGJY0/oqNhSK2foadf+PkUWjigYHDOfdw54gkcKg5307S8srq2vruY385tb2zq67t18zKtUcqlxJpRsBMyBFDFUKGRaGBRIKEeDG7Gfv0BtBEqvsdhAu2I9WIRCs7QSh3kNMrWmwhPGLGVWSzxkB3dNpxC17Jm4AuEn9GCmSGSsf9anUVTyOIkUtmTNP3EmxnTKPgEkb5VmogYXzAetC0NGYRmHY2OWBET6zSpaHS9sVIJ+rvRMYiY4ZRYCcjhn0z743F/7xmiuFlOxNxkiLEfLoTCVFRcdt0K7QwFEOLWFcC/tXyvtM462s7wtwZ8/eZHUzkq+5XfnhfL1rI4cOSLHpEh8ckHK5JZUSJVwMiLP5JW8OU/Oi/PufExHl5xZ5oD8gfP5Awmdlg0=</latexit><latexit 
sha1_base64="znx8HB3iu6URMn3Pu43WiyqTY=">ACAHicbVC7SgNBFJ31GeNr1cLCZjAIsQm7IihWQRvLCOYBSQizk7vJkNmdZeauGJY0/oqNhSK2foadf+PkUWjigYHDOfdw54gkcKg5307S8srq2vruY385tb2zq67t18zKtUcqlxJpRsBMyBFDFUKGRaGBRIKEeDG7Gfv0BtBEqvsdhAu2I9WIRCs7QSh3kNMrWmwhPGLGVWSzxkB3dNpxC17Jm4AuEn9GCmSGSsf9anUVTyOIkUtmTNP3EmxnTKPgEkb5VmogYXzAetC0NGYRmHY2OWBET6zSpaHS9sVIJ+rvRMYiY4ZRYCcjhn0z743F/7xmiuFlOxNxkiLEfLoTCVFRcdt0K7QwFEOLWFcC/tXyvtM462s7wtwZ8/eZHUzkq+5XfnhfL1rI4cOSLHpEh8ckHK5JZUSJVwMiLP5JW8OU/Oi/PufExHl5xZ5oD8gfP5Awmdlg0=</latexit>B : (dense, compressed, compressed)
<latexit sha1_base64="4q1Sk4lxncbqFoMX5S1GLw3t+GQ=">ACIXicbVDLSgMxFM34rPVdekmWIQKIjMiWFyVunFZwT6gLSWTuW2DmcyQ3BHL0F9x46+4caFId+LPmD4W2vZA4OSce29yjx9LYdB1v52V1bX1jc3MVnZ7Z3dvP3dwWDNRojlUeSQj3fCZASkUVFGghEasgYW+hLr/eDv260+gjYjUAw5iaIesp0RXcIZW6uSKZXpDCy2EZ0wDUAaG53R641FoJxkDwTLprJPLuxfuBHSReDOSJzNUOrlRK4h4EoJCLpkxTc+NsZ0yjYJLGZbiYGY8UfWg6alioVg2ulkwyE9tUpAu5G2RyGdqH87UhYaMwh9Wxky7Jt5bywu85oJdovtVKg4QVB8+lA3kRQjOo6LBkIDRzmwhHEt7F8p7zPNONpQszYEb37lRVK7vPAsv7/Kl8qzODLkmJyQAvHINSmRO1IhVcLJC3kjH+TeXenS9nNC1dcWY9R+QfnJ9fTfykRA=</latexit><latexit sha1_base64="4q1Sk4lxncbqFoMX5S1GLw3t+GQ=">ACIXicbVDLSgMxFM34rPVdekmWIQKIjMiWFyVunFZwT6gLSWTuW2DmcyQ3BHL0F9x46+4caFId+LPmD4W2vZA4OSce29yjx9LYdB1v52V1bX1jc3MVnZ7Z3dvP3dwWDNRojlUeSQj3fCZASkUVFGghEasgYW+hLr/eDv260+gjYjUAw5iaIesp0RXcIZW6uSKZXpDCy2EZ0wDUAaG53R641FoJxkDwTLprJPLuxfuBHSReDOSJzNUOrlRK4h4EoJCLpkxTc+NsZ0yjYJLGZbiYGY8UfWg6alioVg2ulkwyE9tUpAu5G2RyGdqH87UhYaMwh9Wxky7Jt5bywu85oJdovtVKg4QVB8+lA3kRQjOo6LBkIDRzmwhHEt7F8p7zPNONpQszYEb37lRVK7vPAsv7/Kl8qzODLkmJyQAvHINSmRO1IhVcLJC3kjH+TeXenS9nNC1dcWY9R+QfnJ9fTfykRA=</latexit><latexit sha1_base64="4q1Sk4lxncbqFoMX5S1GLw3t+GQ=">ACIXicbVDLSgMxFM34rPVdekmWIQKIjMiWFyVunFZwT6gLSWTuW2DmcyQ3BHL0F9x46+4caFId+LPmD4W2vZA4OSce29yjx9LYdB1v52V1bX1jc3MVnZ7Z3dvP3dwWDNRojlUeSQj3fCZASkUVFGghEasgYW+hLr/eDv260+gjYjUAw5iaIesp0RXcIZW6uSKZXpDCy2EZ0wDUAaG53R641FoJxkDwTLprJPLuxfuBHSReDOSJzNUOrlRK4h4EoJCLpkxTc+NsZ0yjYJLGZbiYGY8UfWg6alioVg2ulkwyE9tUpAu5G2RyGdqH87UhYaMwh9Wxky7Jt5bywu85oJdovtVKg4QVB8+lA3kRQjOo6LBkIDRzmwhHEt7F8p7zPNONpQszYEb37lRVK7vPAsv7/Kl8qzODLkmJyQAvHINSmRO1IhVcLJC3kjH+TeXenS9nNC1dcWY9R+QfnJ9fTfykRA=</latexit><latexit 
sha1_base64="4q1Sk4lxncbqFoMX5S1GLw3t+GQ=">ACIXicbVDLSgMxFM34rPVdekmWIQKIjMiWFyVunFZwT6gLSWTuW2DmcyQ3BHL0F9x46+4caFId+LPmD4W2vZA4OSce29yjx9LYdB1v52V1bX1jc3MVnZ7Z3dvP3dwWDNRojlUeSQj3fCZASkUVFGghEasgYW+hLr/eDv260+gjYjUAw5iaIesp0RXcIZW6uSKZXpDCy2EZ0wDUAaG53R641FoJxkDwTLprJPLuxfuBHSReDOSJzNUOrlRK4h4EoJCLpkxTc+NsZ0yjYJLGZbiYGY8UfWg6alioVg2ulkwyE9tUpAu5G2RyGdqH87UhYaMwh9Wxky7Jt5bywu85oJdovtVKg4QVB8+lA3kRQjOo6LBkIDRzmwhHEt7F8p7zPNONpQszYEb37lRVK7vPAsv7/Kl8qzODLkmJyQAvHINSmRO1IhVcLJC3kjH+TeXenS9nNC1dcWY9R+QfnJ9fTfykRA=</latexit>A : (dense, dense) Aij = X
k
Bijkck
[Kjolstad et al. 2017]
18
for (int i = 0; i < m; i++) { for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) { int j = B2_idx[pB2]; int pA2 = (i * n) + j; int pB3 = B3_pos[pB2]; int pc1 = c1_pos[0]; while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) { int kB = B3_crd[pB3]; int kc = c1_crd[pc1]; int k = min(kB, kc); if (kB == k && kc == k) { A[pA2] += B[pB3] * c[pc1]; } if (kB == k) pB3++; if (kc == k) pc1++; } } }
Aij = X
k
Bijk · ck
<latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit 
sha1_base64="ck8pdC+ekZH4nUmSP+ZG7r8lEyk=">AB2XicbZDNSgMxFIXv1L86Vq1rN8EiuCozbnQpuHFZwbZCO5RM5k4bmskMyR2hDH0BF25EfC93vo3pz0JbDwQ+zknIvSculLQUBN9ebWd3b/+gfugfNfzjk9Nmo2fz0gjsilzl5jnmFpXU2CVJCp8LgzyLFfbj6f0i7+gsTLXTzQrMr4WMtUCk7O6oyaraAdLMW2IVxDC9YaNb+GS7KDUJxa0dhEFBUcUNSaFw7g9LiwUXUz7GgUPNM7RtRxzi6dk7A0N+5oYkv394uKZ9bOstjdzDhN7Ga2MP/LBiWlt1EldVESarH6KC0Vo5wtdmaJNChIzRxwYaSblYkJN1yQa8Z3HYSbG29D7odOn4MoA7ncAFXEMIN3MEDdKALAhJ4hXdv4r15H6uat6tDP4I+/zBzjGijg=</latexit><latexit sha1_base64="n5e1ptEuTL43cmV9t1ImHPSoNvM=">ACf3icbZFda9swFIZl76Ndlm1pb3sjWga7aINkN7FNGXTbTS87aNpCbIysKkW+QNJLgShH7O/tLv+m8pClu6A4KH9+jVOTqnaARXGqEHz3/1+s3bnd13vf9Dx8/Dfb616puJWUTWota3hZEMcErNtFcC3bSEbKQrCbYvmjy9/cM6l4XV3pVcOykiwqPueUaCflg98mXT8ylYsiM2iI4hEKT4/RcBwmCMcOEI7COLDfcsN/WQu/wlS1ZW6W1mxZ8Sg8TQLnCKMxDpGDOE5QgO3zrp03pTOag23bBEaB3HSFVqHgwiHUTyhubOZPB0XMOvgS8gSOwict8Ced1bQtWaWpIEpNMWp0ZojUnApme2mrWEPokizY1GFSqYys+7Jws9OmcF5Ld2pNFyrfzsMKZValYW7WRJ9p7Zznfi/3LTV8zgzvGpazSr6VGjeCqhr2O0FzrhkVIuVA0Ild71Cekckodptr+eGgLe/BKugyF2/BOBXADsEXgEzsEFuAQTQL0d78Qbe5Hf9wM/eRqX723mtg/+Cf/sEWMFufg=</latexit><latexit sha1_base64="n5e1ptEuTL43cmV9t1ImHPSoNvM=">ACf3icbZFda9swFIZl76Ndlm1pb3sjWga7aINkN7FNGXTbTS87aNpCbIysKkW+QNJLgShH7O/tLv+m8pClu6A4KH9+jVOTqnaARXGqEHz3/1+s3bnd13vf9Dx8/Dfb616puJWUTWota3hZEMcErNtFcC3bSEbKQrCbYvmjy9/cM6l4XV3pVcOykiwqPueUaCflg98mXT8ylYsiM2iI4hEKT4/RcBwmCMcOEI7COLDfcsN/WQu/wlS1ZW6W1mxZ8Sg8TQLnCKMxDpGDOE5QgO3zrp03pTOag23bBEaB3HSFVqHgwiHUTyhubOZPB0XMOvgS8gSOwict8Ced1bQtWaWpIEpNMWp0ZojUnApme2mrWEPokizY1GFSqYys+7Jws9OmcF5Ld2pNFyrfzsMKZValYW7WRJ9p7Zznfi/3LTV8zgzvGpazSr6VGjeCqhr2O0FzrhkVIuVA0Ild71Cekckodptr+eGgLe/BKugyF2/BOBXADsEXgEzsEFuAQTQL0d78Qbe5Hf9wM/eRqX723mtg/+Cf/sEWMFufg=</latexit><latexit 
sha1_base64="9QtCk7M63G3IchN+Z5Vs/Le7raY=">ACinicbZHfatswFMZld1u7rNvS9nI3YmGwiy1IdhPblELX7WKXLSxtITZGVpRUi/wHS4EoYfpK+1ubzM5zWBLe0Dw4zvn0znSKRrBlUbot+fvPHv+YnfvZe/V/us3b/sHh1eqbiVlE1qLWt4URDHBKzbRXAt20hGykKw62L5tctf3zGpeF390KuGZSVZVHzOKdFOyv3Jl1fMpWLIjNoiOIRCo8/oeE4TBCOHSAchXFgv+SG/7QWnsJUtWVultZsWfEoPE4C5wijMQ6RgzhOUIDteWdOm9KZ7WGW7YIjYM46Rqtw0GEwygeWUNzZ7J5f/A3Bx8D3sAbOIi7/9KZzVtS1ZpKohSU4wanRkiNaeC2V7aKtYQuiQLNnVYkZKpzKxnsvCDU2ZwXkt3Kg3X6r8OQ0qlVmXhKkuib9V2rhOfyk1bPY8zw6um1ayiD43mrYC6ht1e4IxLRrVYOSBUcjcrpLdEqrd9nruE/D2kx/DVTDEji/R4Ox8x174B14Dz4CDCJwBr6DCzAB1Nv1PntjL/L3/cBP/JOHUt/beI7Af+F/+wOGoLrq</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit 
sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit><latexit 
sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">ACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJoc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz9J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWdOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS7fXcJ+DdJz+G62CIHX8dDU7Pt9xAN6BY/ABYBCBU3AJrsAUG/f+RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit>taco generates code dimension by dimension
18
for (int i = 0; i < m; i++) { for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) { int j = B2_idx[pB2]; int pA2 = (i * n) + j; int pB3 = B3_pos[pB2]; int pc1 = c1_pos[0]; while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) { int kB = B3_crd[pB3]; int kc = c1_crd[pc1]; int k = min(kB, kc); if (kB == k && kc == k) { A[pA2] += B[pB3] * c[pc1]; } if (kB == k) pB3++; if (kc == k) pc1++; } } }
A_{ij} = \sum_k B_{ijk} \cdot c_k
taco generates code dimension by dimension
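The kernel above can be exercised on a small input. The following is a minimal sketch, assuming B stores its first dimension densely and its second and third dimensions in compressed (pos/crd) form, and that c is a compressed sparse vector. The names `ttv` and `ttv_example`, the argument order, and the sample data are illustrative and do not come from the slides.

```c
/* Sketch: A(i,j) = sum_k B(i,j,k) * c(k).
 * Assumed layout: B = (dense i, compressed j, compressed k), c = compressed
 * vector, A = dense m x n matrix in row-major order, zeroed by the caller. */
static void ttv(int m, int n,
                const int *B2_pos, const int *B2_crd,
                const int *B3_pos, const int *B3_crd, const double *B_vals,
                const int *c1_pos, const int *c1_crd, const double *c_vals,
                double *A) {
    for (int i = 0; i < m; i++) {
        /* Dense first level: the position in level 1 is i itself. */
        for (int pB2 = B2_pos[i]; pB2 < B2_pos[i + 1]; pB2++) {
            int j = B2_crd[pB2];
            int pA2 = i * n + j;
            int pB3 = B3_pos[pB2];
            int pc1 = c1_pos[0];
            /* Merge the sorted k coordinates of B(i,j,:) and c. */
            while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
                int kB = B3_crd[pB3];
                int kc = c1_crd[pc1];
                int k = kB < kc ? kB : kc;
                if (kB == k && kc == k) A[pA2] += B_vals[pB3] * c_vals[pc1];
                if (kB == k) pB3++;
                if (kc == k) pc1++;
            }
        }
    }
}

/* Hypothetical 2x2x3 example:
 * B(0,0,0)=1, B(0,0,2)=2, B(1,1,1)=3, B(1,1,2)=4 and c = [10, 0, 5]. */
static void ttv_example(double A[4]) {
    const int B2_pos[] = {0, 1, 2};
    const int B2_crd[] = {0, 1};
    const int B3_pos[] = {0, 2, 4};
    const int B3_crd[] = {0, 2, 1, 2};
    const double B_vals[] = {1, 2, 3, 4};
    const int c1_pos[] = {0, 2};
    const int c1_crd[] = {0, 2};
    const double c_vals[] = {10, 5};
    for (int p = 0; p < 4; p++) A[p] = 0.0;
    ttv(2, 2, B2_pos, B2_crd, B3_pos, B3_crd, B_vals,
        c1_pos, c1_crd, c_vals, A);
}
```

Calling `ttv_example` fills a 2 x 2 result in which only A(0,0) and A(1,1) are nonzero, because only those (i,j) fibers of B are stored.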
Dense Compressed
Dense Compressed
Compressed
Hashed Compressed
Singleton Dense
Range Compressed
Offset Dense Compressed
Hashed Compressed
Hashed Singleton Compressed
Singleton
Dense Compressed Singleton
Dense
Hand-coding support for a wide range of level formats is also infeasible
Code generation is performed in two stages
High-level algorithm Runnable code
Compressed
Hashed
How to compute with different data structures How to compute with multiple operands
Tensor algebra computations can be expressed in terms of high-level operations on tensor operands
[Figure: example operand tensors with entries labeled A through U]
Level formats declare whether they support various high-level operations
Random access Iteration
Dense Compressed Hashed Singleton Range Offset
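One way to picture such a declaration is a small capability table. This is only a sketch: the type and function names are hypothetical, and the entries are inferred from the examples later in the deck (random-access code is shown for Dense and Hashed, while Compressed and Singleton operands are handled by iteration). Range and Offset are left out because the extracted slide text does not say what they support.

```c
#include <stdbool.h>

/* Hypothetical encoding of the per-format capability declarations. */
typedef enum { DENSE, COMPRESSED, HASHED, SINGLETON, NUM_LEVEL_FORMATS } LevelFormat;

typedef struct {
    const char *name;
    bool random_access; /* can look up the position of a given coordinate */
    bool iteration;     /* can enumerate the stored coordinates */
} LevelCapabilities;

/* Values are assumptions inferred from the surrounding slides. */
static const LevelCapabilities LEVEL_CAPS[NUM_LEVEL_FORMATS] = {
    [DENSE]      = {"Dense",      true,  true},
    [COMPRESSED] = {"Compressed", false, true},
    [HASHED]     = {"Hashed",     true,  true},
    [SINGLETON]  = {"Singleton",  false, true},
};

static bool supports_random_access(LevelFormat f) {
    return LEVEL_CAPS[f].random_access;
}

static bool supports_iteration(LevelFormat f) {
    return LEVEL_CAPS[f].iteration;
}
```

A code generator could consult such a table without knowing anything else about how a format lays out its data.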
Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations
Level formats: Dense, Compressed, Hashed, Singleton, Range, Offset
Operation considered: random access
Operand formats Compressed and Hashed: iterate over B and random access C
Operand formats Compressed and Singleton: simultaneously iterate over B and C
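The two strategies named on this slide can be sketched for a one-dimensional product. The selection rule below is a deliberate simplification (prefer random access into C, then into B, otherwise co-iterate), and every name in it is hypothetical; a dense array stands in for any operand that supports random access.

```c
#include <stdbool.h>

typedef enum { ITERATE_B_LOCATE_C, ITERATE_C_LOCATE_B, COITERATE } Strategy;

/* Simplified rule (an assumption, not taco's actual heuristic):
 * use random access when an operand offers it, else merge both. */
static Strategy choose_strategy(bool b_random_access, bool c_random_access) {
    if (c_random_access) return ITERATE_B_LOCATE_C;
    if (b_random_access) return ITERATE_C_LOCATE_B;
    return COITERATE;
}

/* Iterate over compressed B and random access a dense C. */
static double iterate_locate_dot(const int *b_crd, const double *b_val, int b_n,
                                 const double *c_dense) {
    double sum = 0.0;
    for (int pb = 0; pb < b_n; pb++) {
        sum += b_val[pb] * c_dense[b_crd[pb]];
    }
    return sum;
}

/* Simultaneously iterate over compressed B and C (sorted coordinates). */
static double coiterate_dot(const int *b_crd, const double *b_val, int b_n,
                            const int *c_crd, const double *c_val, int c_n) {
    double sum = 0.0;
    int pb = 0, pc = 0;
    while (pb < b_n && pc < c_n) {
        int jb = b_crd[pb];
        int jc = c_crd[pc];
        int j = jb < jc ? jb : jc;
        if (jb == j && jc == j) sum += b_val[pb] * c_val[pc];
        if (jb == j) pb++;
        if (jc == j) pc++;
    }
    return sum;
}
```

Both functions compute the same value when given the same logical vectors; they differ only in how the matching element of C is found.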
Level formats also specify how they support high-level operations

Random access
  Dense:
    int pB2 = pB1 * N + j;
  Hashed:
    int pB2 = j % W + pB1 * W;
    if (crd[pB2] != j && crd[pB2] != -1) {
      int end = pB2;
      do {
        pB2 = (pB2 + 1) % W + pB1 * W;  // keep probing inside the W slots owned by pB1
      } while (crd[pB2] != j && crd[pB2] != -1 && pB2 != end);
    }
    if (crd[pB2] == j) {

Iteration
  Dense:
    for (int j = 0; j < N; j++) {
      int pB2 = pB1 * N + j;
  Compressed:
    for (int pB2 = pos[pB1]; pB2 < pos[pB1+1]; pB2++) {
      int j = crd[pB2];
  Hashed:
    for (int pB2 = pB1 * W; pB2 < (pB1 + 1) * W; pB2++) {
      int j = crd[pB2];
      if (j != -1) {
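The fragments on this slide can be wrapped into small standalone functions. This is a sketch under stated assumptions: the function names are hypothetical, -1 marks an empty hash slot as on the slide, each parent position owns W consecutive slots, and the probe step adds `parent_pos * W` so that probing stays inside the parent's own slots.

```c
/* Random access into a dense level: the position is pure arithmetic. */
static int dense_locate(int parent_pos, int N, int j) {
    return parent_pos * N + j;
}

/* Random access into a hashed level with linear probing.
 * Returns the position of coordinate j, or -1 if j is not stored. */
static int hashed_locate(const int *crd, int parent_pos, int W, int j) {
    int base = parent_pos * W;
    int p = j % W + base;
    if (crd[p] != j && crd[p] != -1) {
        int end = p;
        do {
            p = (p + 1) % W + base; /* wrap around within this parent's slots */
        } while (crd[p] != j && crd[p] != -1 && p != end);
    }
    return crd[p] == j ? p : -1;
}

/* Iteration over a compressed level: copies the stored coordinates of
 * one parent into out and returns how many there are. */
static int compressed_coords(const int *pos, const int *crd,
                             int parent_pos, int *out) {
    int n = 0;
    for (int p = pos[parent_pos]; p < pos[parent_pos + 1]; p++) {
        out[n++] = crd[p];
    }
    return n;
}

/* Iteration over a hashed level: scans all W slots and skips empty ones. */
static int hashed_coords(const int *crd, int parent_pos, int W, int *out) {
    int n = 0;
    for (int p = parent_pos * W; p < (parent_pos + 1) * W; p++) {
        int j = crd[p];
        if (j != -1) out[n++] = j;
    }
    return n;
}
```

Note that iterating a hashed level visits coordinates in slot order rather than sorted order, which is one reason a compiler may prefer random access for that format.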
Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations
Compressed
Hashed
for every element b in B:
  find corresponding element c in C
  A[i][j] = b * c;
for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  find corresponding element c in C
  A[i][j] = B[pB2] * c;
}
for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  int pC2 = j % W + pC1 * W;
  if (C2_crd[pC2] != j && C2_crd[pC2] != -1) {
    int end = pC2;
    do {
      pC2 = (pC2 + 1) % W + pC1 * W;  // keep probing inside the W slots owned by pC1
    } while (C2_crd[pC2] != j && C2_crd[pC2] != -1 && pC2 != end);
  }
  if (C2_crd[pC2] == j) {
    A[i][j] = B[pB2] * C[pC2];
  }
}
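To see the specialized loop run end to end, it can be placed inside a row loop. The sketch below assumes both operands store rows densely (so the row positions pB1 and pC1 equal i), that B's columns are compressed and C's columns are hashed with W slots per row, and that A is a dense row-major array. The function names and the sample data are hypothetical.

```c
/* Sketch: A(i,j) = B(i,j) * C(i,j) with B = (dense, compressed) and
 * C = (dense, hashed). A must be zeroed by the caller. */
static void mul_compressed_hashed(int m, int n, int W,
                                  const int *B2_pos, const int *B2_crd,
                                  const double *B_vals,
                                  const int *C2_crd, const double *C_vals,
                                  double *A) {
    for (int i = 0; i < m; i++) {
        int pB1 = i; /* dense row level of B */
        int pC1 = i; /* dense row level of C */
        /* Iterate over the stored columns of row i of B. */
        for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
            int j = B2_crd[pB2];
            /* Random access into C's hashed level (linear probing). */
            int pC2 = j % W + pC1 * W;
            if (C2_crd[pC2] != j && C2_crd[pC2] != -1) {
                int end = pC2;
                do {
                    pC2 = (pC2 + 1) % W + pC1 * W;
                } while (C2_crd[pC2] != j && C2_crd[pC2] != -1 && pC2 != end);
            }
            if (C2_crd[pC2] == j) {
                A[i * n + j] = B_vals[pB2] * C_vals[pC2];
            }
        }
    }
}

/* Hypothetical 2x4 example with W = 2 hash slots per row of C.
 * B: (0,1)=2, (0,3)=3, (1,0)=4.  C: (0,2)=7, (0,3)=10, (1,0)=5, (1,2)=6. */
static void mul_example(double A[8]) {
    const int B2_pos[] = {0, 2, 3};
    const int B2_crd[] = {1, 3, 0};
    const double B_vals[] = {2, 3, 4};
    const int C2_crd[] = {2, 3, 0, 2};
    const double C_vals[] = {7, 10, 5, 6};
    for (int p = 0; p < 8; p++) A[p] = 0.0;
    mul_compressed_hashed(2, 4, 2, B2_pos, B2_crd, B_vals, C2_crd, C_vals, A);
}
```

Only coordinates stored in both operands produce output, so the example yields nonzeros at (0,3) and (1,0) and nothing for B's entry at (0,1).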
The same process can be repeated dimension by dimension
int iB = 0;
int C0_pos = C0_pos_arr[0];
while (C0_pos < C0_pos_arr[1]) {
  int iC = C0_idx_arr[C0_pos];
  int C0_end = C0_pos + 1;
  if (iC == iB)
    while ((C0_end < C0_pos_arr[1]) && (C0_idx_arr[C0_end] == iB)) {
      C0_end++;
    }
  if (iC == iB) {
    int B1_pos = B1_pos_arr[iB];
    int C1_pos = C0_pos;
    while ((B1_pos < B1_pos_arr[iB + 1]) && (C1_pos < C0_end)) {
      int jB = B1_idx_arr[B1_pos];
      int jC = C1_idx_arr[C1_pos];
      int j = min(jB, jC);
      int A1_pos = (iB * A1_size) + j;
      int C1_end = C1_pos + 1;
      if (jC == j)
        while ((C1_end < C0_end) && (C1_idx_arr[C1_end] == j)) {
          C1_end++;
        }
      if ((jB == j) && (jC == j)) {
        int B2_pos = B2_pos_arr[B1_pos];
        int C2_pos = C1_pos;
        while ((B2_pos < B2_pos_arr[B1_pos + 1]) && (C2_pos < C1_end)) {
          int kB = B2_idx_arr[B2_pos];
          int kC = C2_idx_arr[C2_pos];
          int k = min(kB, kC);
          int A2_pos = (A1_pos * A2_size) + k;
          if ((kB == k) && (kC == k)) {
            A_val_arr[A2_pos] = B_val_arr[B2_pos] + C_val_arr[C2_pos];
          } else if (kB == k) {
            A_val_arr[A2_pos] = B_val_arr[B2_pos];
          } else {
            A_val_arr[A2_pos] = C_val_arr[C2_pos];
          }
          if (kB == k) B2_pos++;
          if (kC == k) C2_pos++;
        }
        while (B2_pos < B2_pos_arr[B1_pos + 1]) {
          int kB0 = B2_idx_arr[B2_pos];
          int A2_pos0 = (A1_pos * A2_size) + kB0;
          A_val_arr[A2_pos0] = B_val_arr[B2_pos];
          B2_pos++;
        }
        while (C2_pos < C1_end) {
          int kC0 = C2_idx_arr[C2_pos];
          int A2_pos1 = (A1_pos * A2_size) + kC0;
          A_val_arr[A2_pos1] = C_val_arr[C2_pos];
          C2_pos++;
        }
      } else if (jB == j) {
        for (int B2_pos0 = B2_pos_arr[B1_pos]; B2_pos0 < B2_pos_arr[B1_pos + 1]; B2_pos0++) {
          int kB1 = B2_idx_arr[B2_pos0];
          int A2_pos2 = (A1_pos * A2_size) + kB1;
          A_val_arr[A2_pos2] = B_val_arr[B2_pos0];
        }
      } else {
        for (int C2_pos0 = C1_pos; C2_pos0 < C1_end; C2_pos0++) {
          int kC1 = C2_idx_arr[C2_pos0];
          int A2_pos3 = (A1_pos * A2_size) + kC1;
          A_val_arr[A2_pos3] = C_val_arr[C2_pos0];
        }
      }
      if (jB == j) B1_pos++;
      if (jC == j) C1_pos = C1_end;
    }
    while (B1_pos < B1_pos_arr[iB + 1]) {
      int jB0 = B1_idx_arr[B1_pos];
      int A1_pos0 = (iB * A1_size) + jB0;
      for (int B2_pos1 = B2_pos_arr[B1_pos]; B2_pos1 < B2_pos_arr[B1_pos + 1]; B2_pos1++) {
        int kB2 = B2_idx_arr[B2_pos1];
        int A2_pos4 = (A1_pos0 * A2_size) + kB2;
        A_val_arr[A2_pos4] = B_val_arr[B2_pos1];
      }
      B1_pos++;
    }
    while (C1_pos < C0_end) {
      int jC0 = C1_idx_arr[C1_pos];
      int A1_pos1 = (iB * A1_size) + jC0;
      int C1_end0 = C1_pos + 1;
      while ((C1_end0 < C0_end) && (C1_idx_arr[C1_end0] == jC0)) {
        C1_end0++;
      }
      for (int C2_pos1 = C1_pos; C2_pos1 < C1_end0; C2_pos1++) {
        int kC2 = C2_idx_arr[C2_pos1];
        int A2_pos5 = (A1_pos1 * A2_size) + kC2;
        A_val_arr[A2_pos5] = C_val_arr[C2_pos1];
      }
      C1_pos = C1_end0;
    }
  } else {
    for (int B1_pos0 = B1_pos_arr[iB]; B1_pos0 < B1_pos_arr[iB + 1]; B1_pos0++) {
      int jB1 = B1_idx_arr[B1_pos0];
      int A1_pos2 = (iB * A1_size) + jB1;
      for (int B2_pos2 = B2_pos_arr[B1_pos0]; B2_pos2 < B2_pos_arr[B1_pos0 + 1]; B2_pos2++) {
        int kB3 = B2_idx_arr[B2_pos2];
        int A2_pos6 = (A1_pos2 * A2_size) + kB3;
        A_val_arr[A2_pos6] = B_val_arr[B2_pos2];
      }
    }
  }
  if (iC == iB) C0_pos = C0_end;
  iB++;
}
while (iB < B0_size) {
  for (int B1_pos1 = B1_pos_arr[iB]; B1_pos1 < B1_pos_arr[iB + 1]; B1_pos1++) {
    int jB2 = B1_idx_arr[B1_pos1];
    int A1_pos3 = (iB * A1_size) + jB2;
    for (int B2_pos3 = B2_pos_arr[B1_pos1]; B2_pos3 < B2_pos_arr[B1_pos1 + 1]; B2_pos3++) {
      int kB4 = B2_idx_arr[B2_pos3];
      int A2_pos7 = (A1_pos3 * A2_size) + kB4;
      A_val_arr[A2_pos7] = B_val_arr[B2_pos3];
    }
  }
  iB++;
}
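The kernel above appears to add two three-dimensional tensors by merging their coordinates one dimension at a time. Its innermost pattern is easier to see in one dimension. The following is a minimal sketch, assuming two compressed sparse vectors with sorted coordinates and a dense, zero-initialized output; the function name `add_compressed` is hypothetical.

```c
/* Sketch: A = b + c for compressed sparse vectors b and c.
 * The first loop merges coordinates present in either operand;
 * the two tail loops copy whatever remains once one operand runs out. */
static void add_compressed(const int *b_crd, const double *b_val, int b_n,
                           const int *c_crd, const double *c_val, int c_n,
                           double *A) {
    int pb = 0, pc = 0;
    while (pb < b_n && pc < c_n) {
        int kB = b_crd[pb];
        int kC = c_crd[pc];
        int k = kB < kC ? kB : kC;
        if (kB == k && kC == k) {
            A[k] = b_val[pb] + c_val[pc]; /* stored in both operands */
        } else if (kB == k) {
            A[k] = b_val[pb];             /* stored only in b */
        } else {
            A[k] = c_val[pc];             /* stored only in c */
        }
        if (kB == k) pb++;
        if (kC == k) pc++;
    }
    while (pb < b_n) { A[b_crd[pb]] = b_val[pb]; pb++; }
    while (pc < c_n) { A[c_crd[pc]] = c_val[pc]; pc++; }
}
```

Unlike multiplication, addition must visit the union of the stored coordinates, which is why the merge loop needs three cases and two tail loops; the generated kernel repeats this shape at every level.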
The same process can be repeated dimension by dimension
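To make the per-dimension step easier to see, here is a minimal one-dimensional sketch of the merge pattern that the generated kernel repeats at each level: co-iterate while both operands have entries, then drain whichever operand is left. The function name sparse_add_1d and its array layout are hypothetical and chosen only for illustration; the sketch assumes both inputs are sorted, contain no duplicate indices, and that the dense output is zero-initialised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): adds two sparse vectors, each
 * stored as a sorted list of (index, value) pairs, into a dense array a.
 * Assumes a is zero-initialised and large enough for every index. */
void sparse_add_1d(const int *b_idx, const double *b_val, size_t b_nnz,
                   const int *c_idx, const double *c_val, size_t c_nnz,
                   double *a) {
  assert(a != NULL);
  size_t pb = 0, pc = 0;

  /* Co-iterate while both operands still have entries. */
  while (pb < b_nnz && pc < c_nnz) {
    int ib = b_idx[pb];
    int ic = c_idx[pc];
    int i = ib < ic ? ib : ic; /* smallest coordinate not yet visited */
    if (ib == i && ic == i) {
      a[i] = b_val[pb] + c_val[pc];
    } else if (ib == i) {
      a[i] = b_val[pb];
    } else {
      a[i] = c_val[pc];
    }
    if (ib == i) pb++;
    if (ic == i) pc++;
  }

  /* Drain the operand that still has entries, if any. */
  for (; pb < b_nnz; pb++) a[b_idx[pb]] = b_val[pb];
  for (; pc < c_nnz; pc++) a[c_idx[pc]] = c_val[pc];
}
```

In the three-dimensional kernel the same shape appears once per index variable (i, j, k), with the inner loops nested inside the branch where both operands share the outer coordinate.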
Evaluation
Format Abstraction & Code Generation
[Figure: formats expressed as compositions of level formats. Mode-generic tensor: Compressed, Singleton, Dense, Dense. DIA: Dense, Range, Offset.]
[Figure: flowchart for choosing how to iterate over two operands x and y. Decision nodes: "x supports locate?", "y supports locate?", "x unordered and y ordered?". Outcomes: "co-iterate over x and y", "iterate over x and locate into y", "iterate over y and locate into x".]
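The decision questions on this slide can be sketched as a small selection function. The yes/no edges of the flowchart are not recoverable from the slide text, so the branch layout below is an assumption: co-iterate when neither operand supports locate, iterate over the operand that lacks locate when only one supports it, and, when both do, iterate over y only if x is unordered and y is ordered. The type and function names (LevelProps, MergeStrategy, choose_strategy, strategy_name) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of what one operand's level can do. */
typedef struct {
  bool supports_locate; /* random access by coordinate is available */
  bool ordered;         /* coordinates are visited in sorted order */
} LevelProps;

typedef enum {
  CO_ITERATE,
  ITERATE_X_LOCATE_Y,
  ITERATE_Y_LOCATE_X
} MergeStrategy;

/* Assumed reading of the flowchart; the edge labels are not in the source. */
MergeStrategy choose_strategy(LevelProps x, LevelProps y) {
  if (!x.supports_locate && !y.supports_locate) return CO_ITERATE;
  if (!x.supports_locate) return ITERATE_X_LOCATE_Y; /* only y can locate */
  if (!y.supports_locate) return ITERATE_Y_LOCATE_X; /* only x can locate */
  /* Both can locate: iterate over the ordered operand when only y is. */
  if (!x.ordered && y.ordered) return ITERATE_Y_LOCATE_X;
  return ITERATE_X_LOCATE_Y;
}

/* Returns the slide's wording for each outcome. */
const char *strategy_name(MergeStrategy s) {
  switch (s) {
    case CO_ITERATE:         return "co-iterate over x and y";
    case ITERATE_X_LOCATE_Y: return "iterate over x and locate into y";
    case ITERATE_Y_LOCATE_X: return "iterate over y and locate into x";
  }
  assert(!"unreachable");
  return "";
}
```

Under these assumptions, a compressed operand paired with a dense operand is handled by iterating over the compressed one and locating into the dense one, which avoids scanning the dense operand's zeros.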
Our technique supports a wide range of disparate tensor formats
Systems compared: This work, taco [Kjolstad et al. 2017], Intel MKL, SciPy, MTL4, Tensor Toolbox, TensorFlow
Formats listed: Sparse vector, Hash map vector, Coordinate matrix, CSR, DCSR, ELL, DIA, BCSR, CSB, DOK, LIL, Skyline, Banded, Coordinate tensor, CSF, Mode-generic
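One way to picture why a single technique can cover this many formats is to describe each format as a list of per-dimension level formats, as the talk does for Mode-generic (Compressed, Singleton, Dense, Dense) and DIA (Dense, Range, Offset). The sketch below is only an illustration: the C types, the FORMATS table, and find_format are hypothetical, and the CSR and Coordinate matrix rows are the usual compositions (Dense, Compressed and Compressed, Singleton), stated here as an assumption rather than taken from this slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Level formats named in the talk. */
typedef enum { DENSE, COMPRESSED, SINGLETON, RANGE, OFFSET } LevelFormat;

#define MAX_LEVELS 4

/* Hypothetical record: a tensor format is an ordered list of level formats. */
typedef struct {
  const char *name;
  size_t num_levels;
  LevelFormat levels[MAX_LEVELS];
} TensorFormat;

static const TensorFormat FORMATS[] = {
  {"CSR",                 2, {DENSE, COMPRESSED}},                   /* assumed */
  {"Coordinate matrix",   2, {COMPRESSED, SINGLETON}},               /* assumed */
  {"DIA",                 3, {DENSE, RANGE, OFFSET}},                /* from the talk */
  {"Mode-generic tensor", 4, {COMPRESSED, SINGLETON, DENSE, DENSE}}, /* from the talk */
};

/* Looks up a format by name; returns NULL when the name is unknown. */
const TensorFormat *find_format(const char *name) {
  assert(name != NULL);
  for (size_t f = 0; f < sizeof FORMATS / sizeof FORMATS[0]; f++) {
    if (strcmp(FORMATS[f].name, name) == 0) return &FORMATS[f];
  }
  return NULL;
}
```

With a table like this, adding a format means adding a row, not writing new kernels by hand, provided a code generator understands each level format.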
Our technique generates efficient code
[Figure: four bar charts of normalized time, one per kernel.]
Coordinate SpMV: This work, SciPy, Intel MKL, MTL4, TensorFlow
DIA SpMV: This work, SciPy, Intel MKL
CSR Addition: This work, SciPy, Intel MKL, MTL4
Coordinate MTTKRP: This work, Tensor Toolbox
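For readers unfamiliar with the benchmarked kernels, the two SpMV variants can be sketched by hand as follows. These are illustrative versions, not the code generated by this work or used by any of the compared libraries. The function names and array layouts are assumptions: the coordinate kernel takes parallel row/col/val arrays, and the DIA kernel assumes vals[d * nrows + i] holds the entry at row i on the diagonal with offset offsets[d]. Both expect the caller to zero y first.

```c
#include <assert.h>
#include <stddef.h>

/* Coordinate (COO) SpMV sketch: y += A * x, one nonzero per array slot. */
void coo_spmv(size_t nnz, const int *row, const int *col, const double *val,
              const double *x, double *y) {
  assert(y != NULL);
  for (size_t p = 0; p < nnz; p++) {
    y[row[p]] += val[p] * x[col[p]];
  }
}

/* DIA SpMV sketch: y += A * x, where diagonal d covers entries (i, i + offsets[d]).
 * Assumed layout: vals[d * nrows + i] stores the entry in row i of diagonal d. */
void dia_spmv(int nrows, int ncols, int ndiags, const int *offsets,
              const double *vals, const double *x, double *y) {
  assert(y != NULL);
  for (int d = 0; d < ndiags; d++) {
    for (int i = 0; i < nrows; i++) {
      int j = i + offsets[d];
      if (j >= 0 && j < ncols) { /* skip the padded part of the diagonal */
        y[i] += vals[d * nrows + i] * x[j];
      }
    }
  }
}
```

The coordinate kernel touches only stored nonzeros but reads two index arrays per entry, while the DIA kernel needs no per-entry indices at the cost of storing padding, which is one reason the best format depends on the matrix.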
In conclusion…
Supporting many disparate tensor formats is essential for performance
We can automatically generate kernels that compute with disparate tensor formats
Adding support for even more tensor formats is straightforward
This work supported by:
tensor-compiler.org