Sparse tensors are a natural way of representing real-world data
[Figure: a sparse tensor whose axes list people (Peter, Paul, Mary, Bob, Sam, Billy, Lilly, Hilde, …), products (Kindle, Dubliners, The Iliad, Monitor, Sweater, Laptop, Candide, Jacket, …), and words (Quality, Durable, Poor, …); only a few entries are nonzero]
Dense storage: 107 exabytes Sparse storage: 13 gigabytes
There exist many different formats for storing tensors
DOK MSR LNK ELL DIA CSR COO DCSC USS DCSR CSC DNS BDIA BCSR BCOO SELL SKY BELL LIL VBR JAD BND CSB
Different formats suit different uses, such as efficient insertions, structured stencils, and unstructured mesh simulations.
Applications must work with tensors in different formats for performance
[Figure: timelines of "Construct tensor T" followed by "Compute with tensor T" under three strategies; horizontal axis: time]
Only COO: construct tensor T in COO, compute with tensor T in COO
Only DIA: construct tensor T in DIA, compute with tensor T in DIA
Hybrid: construct tensor T in COO, convert COO → DIA, compute with tensor T in DIA
Manually implementing support for efficient conversion between all combinations of formats is infeasible
[Figure: each source format (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …) connected to each target format (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …)]
// CSR -> ELL
int K = 0;
for (int i = 0; i < N; i++) {
  int ncols = A_pos[i+1] - A_pos[i];
  K = max(K, ncols);               // K = widest row
}
int* B_crd = new int[K * N]();
double* B_vals = new double[K * N]();
for (int i = 0; i < N; i++) {
  int count = 0;
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = count++;
    int pB2 = k * N + i;
    B_crd[pB2] = j;
    B_vals[pB2] = A_vals[pA2];
  }
}

// COO -> CSR
int* count = new int[N]();
for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
  int i = A1_crd[pA1];
  count[i]++;                      // nonzeros per row
}
int* B_pos = new int[N + 1];
B_pos[0] = 0;
for (int i = 0; i < N; i++) {
  B_pos[i + 1] = B_pos[i] + count[i];
}
int* B_crd = new int[B_pos[N]];
double* B_vals = new double[B_pos[N]];
for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
  int i = A1_crd[pA1];
  int j = A2_crd[pA1];
  int pB2 = B_pos[i]++;            // next free slot in row i
  B_crd[pB2] = j;
  B_vals[pB2] = A_vals[pA1];
}
for (int i = 0; i < N; i++) {      // undo the increments: shift B_pos right by one
  B_pos[N - i] = B_pos[N - i - 1];
}
B_pos[0] = 0;

// CSR -> DIA
bool* nz = new bool[2 * N - 1]();
for (int i = 0; i < N; i++) {
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = j - i;
    nz[k + N - 1] = true;          // diagonal k holds a nonzero
  }
}
int* B_perm = new int[2 * N - 1];
int K = 0;
for (int i = -N + 1; i < N; i++) {
  if (nz[i + N - 1]) B_perm[K++] = i;
}
double* B_vals = new double[K * N]();
int* B_rperm = new int[2 * N - 1];
for (int i = 0; i < K; i++) {
  B_rperm[B_perm[i] + N - 1] = i;
}
for (int i = 0; i < N; i++) {
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = j - i;
    int pB1 = B_rperm[k + N - 1];
    int pB2 = pB1 * N + i;
    B_vals[pB2] = A_vals[pA2];
  }
}
Hand-optimized libraries limit support for efficient conversion to a few combinations of formats
[Figure: only some of the connections between source formats (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …) and target formats (ELL, DIA, BCSR, COO, JAD, BND, SKY, …) remain]
Inefficient conversion eliminates the benefit of using different formats
[Figure: timelines of three strategies; horizontal axis: time]
Only COO: construct tensor T in COO, compute with tensor T in COO
Only DIA: construct tensor T in DIA, compute with tensor T in DIA
Hybrid w/ libraries: construct tensor T in COO, convert COO → CSR, convert CSR → DIA, compute with tensor T in DIA
Automatic Generation of Efficient Sparse Tensor Format Conversion Routines
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe
A compiler can generate efficient conversion routines from standalone specifications for each tensor format
[Figure: source formats (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …) and target formats (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …)]
Our technique generates efficient code
[Figure: bar chart of normalized time (axis ticks 1 to 5) for COO → CSR, CSR → CSC, CSR → DIA, CSC → DIA, and COO → DIA; series: This work, SPARSKIT, Intel MKL]
Being able to generate efficient conversion routines lets users exploit different formats for performance
[Figure: timelines of four strategies; horizontal axis: time]
Only COO: construct tensor T in COO, compute with tensor T in COO
Only DIA: construct tensor T in DIA, compute with tensor T in DIA
Hybrid w/ libraries: construct tensor T in COO, convert COO → CSR, convert CSR → DIA, compute with tensor T in DIA
Hybrid w/ our approach: construct tensor T in COO, convert COO → DIA, compute with tensor T in DIA
Coordinate Remappings
Attribute Queries
Different tensor formats arrange nonzeros in memory in different ways
Example 4 × 6 matrix (a dot marks a zero):
      j = 0 1 2 3 4 5
  i = 0   A . B . . .
  i = 1   C D . . . .
  i = 2   . E F . G .
  i = 3   . . H . . J
CSR:
  pos:  0 2 4 7 9
  crd:  0 2 0 1 1 2 4 2 5
  vals: A B C D E F G H J
DIA (N = 4, M = 6, K = 3):
  perm: -1 0 2
  vals: 0 C E H | A D F 0 | B 0 G J
BCSR (BI = 2, BJ = 3):
  pos:  0 1 3
  crd:  0 0 1
  vals: A 0 B C D 0 | 0 E F 0 0 H | 0 G 0 0 0 J
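To make the CSR arrays concrete, here is a minimal C++ sketch of how they are read. The `Csr` struct and the functions `example_csr` and `csr_at` are illustrative names, not part of the talk; the letters A..J are encoded as the values 1..9, and the `crd` array is the one consistent with the DIA layout of the same 4 × 6 matrix.

```cpp
#include <vector>

// Hypothetical container for a CSR matrix; field names follow the slide.
struct Csr {
  int rows;
  std::vector<int> pos;      // pos[i]..pos[i+1] delimits row i in crd/vals
  std::vector<int> crd;      // column coordinate of each stored nonzero
  std::vector<double> vals;  // value of each stored nonzero
};

// The 4 x 6 example matrix, with A..J encoded as 1..9 (an assumption made
// only so the values can be checked numerically).
Csr example_csr() {
  return Csr{4,
             {0, 2, 4, 7, 9},
             {0, 2, 0, 1, 1, 2, 4, 2, 5},
             {1, 2, 3, 4, 5, 6, 7, 8, 9}};
}

// Returns the entry at (i, j), or 0.0 if no nonzero is stored there.
double csr_at(const Csr& m, int i, int j) {
  for (int p = m.pos[i]; p < m.pos[i + 1]; p++) {
    if (m.crd[p] == j) return m.vals[p];
  }
  return 0.0;
}
```

Looking up an entry only scans the nonzeros of one row, which is why CSR suits row-wise computation but not cheap insertion.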
Coordinate remapping captures how nonzeros are arranged in memory
Each nonzero of the example matrix with its coordinates and its diagonal offset j-i:
  vals:  A  B  C  D  E  F  G  H  J
  i:     0  0  1  1  2  2  2  3  3
  j:     0  2  0  1  1  2  4  2  5
  j-i:   0  2 -1  0 -1  0  2 -1  2
Ordered by j-i, the nonzeros are grouped by diagonal:
  vals:  C  E  H  A  D  F  B  G  J
  i:     1  2  3  0  1  2  0  2  3
  j:     0  1  2  0  1  2  2  4  5
  j-i:  -1 -1 -1  0  0  0  2  2  2
This is the order in which DIA stores them (N = 4, M = 6, K = 3; perm: -1 0 2; vals: C E H, A D F, B G J), so DIA is described by the remapping
(i,j) -> (j-i,i,j)
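The effect of the remapping (i,j) -> (j-i,i,j) can be checked with a small C++ sketch. It applies the remapping to the nine nonzeros of the example matrix and orders them lexicographically by the remapped coordinates. The explicit `std::sort` is used here only to make the grouping visible; it is an illustration, not how the generated conversion code works. `Nonzero`, `example_nonzeros`, `order_by_diagonal`, and `vals_in_order` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// One nonzero with its original coordinates; val is the slide's letter.
struct Nonzero {
  int i;
  int j;
  char val;
};

// Nonzeros of the 4 x 6 example matrix in row-major order.
std::vector<Nonzero> example_nonzeros() {
  return {{0, 0, 'A'}, {0, 2, 'B'}, {1, 0, 'C'}, {1, 1, 'D'}, {2, 1, 'E'},
          {2, 2, 'F'}, {2, 4, 'G'}, {3, 2, 'H'}, {3, 5, 'J'}};
}

// Orders nonzeros by the remapped coordinates (j-i, i, j), which places
// nonzeros of the same diagonal next to each other.
std::vector<Nonzero> order_by_diagonal(std::vector<Nonzero> nz) {
  std::sort(nz.begin(), nz.end(), [](const Nonzero& a, const Nonzero& b) {
    return std::make_tuple(a.j - a.i, a.i, a.j) <
           std::make_tuple(b.j - b.i, b.i, b.j);
  });
  return nz;
}

// Concatenates the letters in storage order, for easy comparison.
std::string vals_in_order(const std::vector<Nonzero>& nz) {
  std::string s;
  for (const Nonzero& n : nz) s += n.val;
  return s;
}
```

Calling `vals_in_order(order_by_diagonal(example_nonzeros()))` yields the letters in the same diagonal-by-diagonal order as the DIA `vals` array on the slide.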
Compiler uses coordinate remapping to generate code to reorder nonzeros
(i,j) -> (j-i,i,j)
for (int bi = 0; bi < M / BI; bi++) {
  for (int bj = 0; bj < N / BJ; bj++) {
    for (int i = bi * BI; i < (bi + 1) * BI; i++) {
      for (int j = bj * BJ; j < (bj + 1) * BJ; j++) {
        if (B[i,j] != 0.0) {
          Identify segment d in vals that corresponds to j - i
          Identify position p in d that corresponds to i and j
          vals[p] = B[i,j]
        }
      }
    }
  }
}
[Animation: the nonzeros of the example matrix are visited one at a time and written into the DIA vals array (N = 4, M = 6, K = 3; perm: -1 0 2), ending with vals: C E H, A D F, B G J]
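The two "identify" steps in the loop nest above can be made concrete for a CSR source. The following C++ sketch is a variant of the hand-written CSR → DIA routine shown earlier in the deck, adapted (as an assumption) to a rectangular N × M matrix so that it runs on the 4 × 6 example; `Dia` and `csr_to_dia` are illustrative names.

```cpp
#include <vector>

// Hypothetical DIA container: perm lists the offsets j-i of the stored
// diagonals; vals holds perm.size() * N entries, one segment per diagonal,
// indexed by row within the segment.
struct Dia {
  std::vector<int> perm;
  std::vector<double> vals;
};

Dia csr_to_dia(int N, int M, const std::vector<int>& pos,
               const std::vector<int>& crd, const std::vector<double>& vals) {
  // Offsets j-i range over [-(N-1), M-1]; shift by N-1 to index arrays.
  std::vector<bool> nonempty(N + M - 1, false);
  for (int i = 0; i < N; i++) {
    for (int p = pos[i]; p < pos[i + 1]; p++) {
      nonempty[crd[p] - i + N - 1] = true;
    }
  }
  // Assign each nonempty diagonal a segment number, in increasing offset.
  Dia out;
  std::vector<int> segment(N + M - 1, -1);
  for (int k = -(N - 1); k < M; k++) {
    if (nonempty[k + N - 1]) {
      segment[k + N - 1] = static_cast<int>(out.perm.size());
      out.perm.push_back(k);
    }
  }
  // Write each nonzero to (segment of j-i, row i); other slots stay zero.
  out.vals.assign(out.perm.size() * N, 0.0);
  for (int i = 0; i < N; i++) {
    for (int p = pos[i]; p < pos[i + 1]; p++) {
      const int d = segment[crd[p] - i + N - 1];
      out.vals[d * N + i] = vals[p];
    }
  }
  return out;
}
```

With A..J encoded as 1..9, the example matrix gives `perm = {-1, 0, 2}` and three segments of four values each, with explicit zeros where a diagonal has no entry in a row.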
Coordinate Remappings Attribute Queries
Reordering a tensor’s nonzeros without explicitly sorting them requires knowing statistics about the tensor
COO (input, nonzeros in arbitrary order):
  rows: 0 1 1 2 0 3 3 2 2
  cols: 0 0 1 1 2 2 5 2 4
  vals: A C D E B H J F G
CSR (output):
  pos:  0 2 4 7 9
  crd:  0 2 0 1 1 2 4 2 5
  vals: A B C D E F G H J
[Animation: without statistics, the COO nonzeros are inserted into pos, crd, and vals one at a time, and each insertion updates pos and moves previously stored entries so that rows stay contiguous]
With the number of nonzeros in each row known up front:
  i:   0 1 2 3
  nnz: 2 2 3 2
[Animation: a running sum of nnz gives pos: 0 2 4 7 9, after which each COO nonzero is written directly to its final position in crd and vals]
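The statistic-driven approach can be sketched in C++ as follows: count the nonzeros per row, turn the counts into `pos` with a running sum, then place every nonzero directly. This mirrors the hand-written COO → CSR routine shown earlier in the deck; the names `CsrArrays` and `coo_to_csr` are illustrative, and a separate `next` cursor array is used here (an assumption made for readability) instead of incrementing and then shifting `pos`.

```cpp
#include <vector>

// Hypothetical bundle of the three CSR arrays.
struct CsrArrays {
  std::vector<int> pos;
  std::vector<int> crd;
  std::vector<double> vals;
};

CsrArrays coo_to_csr(int N, const std::vector<int>& rows,
                     const std::vector<int>& cols,
                     const std::vector<double>& vals) {
  const int nnz = static_cast<int>(rows.size());

  // Statistic: number of nonzeros in each row.
  std::vector<int> count(N, 0);
  for (int p = 0; p < nnz; p++) count[rows[p]]++;

  // Running sum of the counts gives the start of each row.
  CsrArrays out;
  out.pos.assign(N + 1, 0);
  for (int i = 0; i < N; i++) out.pos[i + 1] = out.pos[i] + count[i];

  // Place each nonzero at the next free slot of its row; no sorting and
  // no shifting of previously stored entries is needed.
  out.crd.resize(nnz);
  out.vals.resize(nnz);
  std::vector<int> next(out.pos.begin(), out.pos.end() - 1);
  for (int p = 0; p < nnz; p++) {
    const int q = next[rows[p]]++;
    out.crd[q] = cols[p];
    out.vals[q] = vals[p];
  }
  return out;
}
```

Within a row, nonzeros keep their input order. For the COO listing above this happens to produce sorted columns, but that is a property of the example rather than of the routine.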
Converting tensors to different formats requires knowing different statistics about the tensors
[Figure: the example matrix annotated once per target format: SKY; BND, with markers lb and ub; DIA, with diagonals labeled by their offsets]
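As a rough illustration of how the required statistics differ, the C++ sketch below computes several of them in one pass over COO coordinates: per-row counts (used for CSR `pos`), the widest row (the `K` of the CSR → ELL routine earlier in the deck), and the set of nonempty diagonals (used for DIA). The interpretation of `lb` and `ub` as the largest values of i-j and j-i is an assumption about the BND annotation, not something the slide states. `Stats` and `compute_stats` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical bundle of statistics; which format uses which is noted inline.
struct Stats {
  std::vector<int> row_nnz;  // nonzeros per row (CSR)
  int max_row_nnz;           // widest row (ELL)
  std::set<int> diagonals;   // offsets j-i that hold a nonzero (DIA)
  int lb;                    // assumed: largest i-j over all nonzeros (BND)
  int ub;                    // assumed: largest j-i over all nonzeros (BND)
};

Stats compute_stats(int N, const std::vector<int>& rows,
                    const std::vector<int>& cols) {
  Stats s{std::vector<int>(N, 0), 0, {}, 0, 0};
  for (std::size_t p = 0; p < rows.size(); p++) {
    const int i = rows[p];
    const int j = cols[p];
    s.row_nnz[i]++;
    s.diagonals.insert(j - i);
    s.lb = std::max(s.lb, i - j);
    s.ub = std::max(s.ub, j - i);
  }
  for (int c : s.row_nnz) s.max_row_nnz = std::max(s.max_row_nnz, c);
  return s;
}
```

Every field is an aggregation over the coordinates of the nonzeros, and none of them needs the values themselves.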
Attribute queries express tensor statistics as aggregations over the coordinates of nonzeros
select [i] -> count(j) as nnz
Result on the example matrix:
  i:   0 1 2 3
  nnz: 2 2 3 2
Compiler generates code to compute attribute queries by reducing them to sparse tensor computations
select [i] -> count(j) as Q
$\forall i\,\forall j \quad Q_i \mathrel{+}= \mathrm{map}(B_{ij}, 1)$
[Figure: each nonzero of the example matrix B is mapped to 1, and summing along each row gives Q = 2, 2, 3, 2 for i = 0, 1, 2, 3]
B is CSC:
for (int j = 0; j < N; j++) {
  for (int pB = pos[j]; pB < pos[j+1]; pB++) {
    int i = crd[pB];
    Q[i] += 1;
  }
}
B is COO:
for (int pB = 0; pB < NNZ; pB++) {
  int i = rows[pB];
  Q[i] += 1;
}
B is CSR:
$B'_i \equiv (\mathrm{pos}[i+1] - \mathrm{pos}[i])$
$\forall i \quad Q_i = B'_i$
for (int i = 0; i < N; i++) {
  Q[i] = pos[i+1] - pos[i];
}
In conclusion…
Efficient sparse tensor conversion routines can be automatically generated from per-format specifications
Adding support for new sparse tensor formats is straightforward
This work was supported by:
tensor-compiler.org