

SLIDE 1

Sparse tensors are a natural way of representing real-world data


SLIDE 4

Sparse tensors are a natural way of representing real-world data.

[Figure: a 3-way tensor of reviewers (Peter, Paul, Mary, Bob, Sam, Billy, Lilly, Hilde, …) × products (Kindle, Dubliners, The Iliad, Monitor, Sweater, Laptop, Candide, Jacket, …) × words (Quality, Durable, Poor, …), with only a handful of nonzero counts]

Dense storage: 107 exabytes. Sparse storage: 13 gigabytes.
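The gap between dense and sparse storage follows from simple arithmetic: dense storage pays for every entry, while coordinate (COO) storage pays only per nonzero. The sketch below uses small hypothetical dimensions and nonzero counts, not the slide's figures:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Back-of-the-envelope storage estimate for a sparse tensor.
// Dense storage holds one 8-byte value per entry; COO holds one
// 4-byte coordinate per dimension plus one 8-byte value per nonzero.
std::int64_t dense_bytes(const std::vector<std::int64_t>& dims) {
  std::int64_t entries = 1;
  for (std::int64_t d : dims) entries *= d;
  return entries * 8;
}

std::int64_t coo_bytes(std::size_t order, std::int64_t nnz) {
  return nnz * (static_cast<std::int64_t>(order) * 4 + 8);
}
```

For a hypothetical 1000 × 1000 × 1000 tensor with 10,000 nonzeros, dense storage needs 8 GB while COO needs about 200 KB, which illustrates why the ratio explodes at real-world scales.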

SLIDE 5

There exist many different formats for storing tensors:

DOK, MSR, LNK, ELL, DIA, CSR, COO, DCSC, USS, DCSR, CSC, DNS, BDIA, BCSR, BCOO, SELL, SKY, BELL, LIL, VBR, JAD, BND, CSB, …

Different formats suit different needs, such as efficient insertions, structured stencils, or unstructured mesh simulations.

SLIDE 9

Applications must work with tensors in different formats for performance. Three strategies, shown on a timeline:

Only COO:  construct tensor T in COO, then compute with tensor T in COO
Only DIA:  construct tensor T in DIA, then compute with tensor T in DIA
Hybrid:    construct tensor T in COO, convert COO → DIA, then compute with tensor T in DIA

SLIDE 16

Manually implementing support for efficient conversion between all combinations of formats is infeasible: every pair drawn from CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, … needs its own hand-written routine. For example:

// CSR → ELL
int K = 0;
for (int i = 0; i < N; i++) {
  int ncols = A_pos[i+1] - A_pos[i];
  K = max(K, ncols);
}
int* B_crd = new int[K * N]();
double* B_vals = new double[K * N]();
for (int i = 0; i < N; i++) {
  int count = 0;
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = count++;
    int pB2 = k * N + i;
    B_crd[pB2] = j;
    B_vals[pB2] = A_vals[pA2];
  }
}

// COO → CSR
int count[N] = {0};
for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
  int i = A1_crd[pA1];
  count[i]++;
}
int* B_pos = new int[N + 1];
B_pos[0] = 0;
for (int i = 0; i < N; i++) {
  B_pos[i + 1] = B_pos[i] + count[i];
}
int* B_crd = new int[B_pos[N]];
double* B_vals = new double[B_pos[N]];
for (int pA1 = A_pos[0]; pA1 < A_pos[1]; pA1++) {
  int i = A1_crd[pA1];
  int j = A2_crd[pA1];
  int pB2 = B_pos[i]++;
  B_crd[pB2] = j;
  B_vals[pB2] = A_vals[pA1];
}
for (int i = 0; i < N; i++) {
  B_pos[N - i] = B_pos[N - i - 1];
}
B_pos[0] = 0;

// CSR → DIA
bool nz[2 * N - 1] = {0};
for (int i = 0; i < N; i++) {
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = j - i;
    nz[k + N - 1] = true;
  }
}
int* B_perm = new int[2 * N - 1];
int K = 0;
for (int i = -N + 1; i < N; i++) {
  if (nz[i + N - 1]) B_perm[K++] = i;
}
double* B_vals = new double[K * N]();
int* B_rperm = new int[2 * N - 1];
for (int i = 0; i < K; i++) {
  B_rperm[B_perm[i] + N - 1] = i;
}
for (int i = 0; i < N; i++) {
  for (int pA2 = A_pos[i]; pA2 < A_pos[i+1]; pA2++) {
    int j = A_crd[pA2];
    int k = j - i;
    int pB1 = B_rperm[k + N - 1];
    int pB2 = pB1 * N + i;
    B_vals[pB2] = A_vals[pA2];
  }
}

SLIDE 19

Hand-optimized libraries limit support for efficient conversion to a few combinations of formats: conversions are typically routed through a single format such as CSR, rather than provided directly between every pair of ELL, DIA, BCSR, COO, JAD, BND, SKY, ….

SLIDE 23

Inefficient conversion eliminates the benefit of using different formats:

Only COO:             construct tensor T in COO, then compute with tensor T in COO
Only DIA:             construct tensor T in DIA, then compute with tensor T in DIA
Hybrid w/ libraries:  construct tensor T in COO, convert COO → CSR, then CSR → DIA, then compute with tensor T in DIA

SLIDE 24

Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

SLIDE 25

A compiler can generate efficient conversion routines from standalone specifications for each tensor format (CSR, ELL, DIA, BCSR, COO, JAD, BND, SKY, …).

SLIDE 30

Our technique generates efficient code.

[Figure: normalized time (1–5) of this work vs. SPARSKIT and Intel MKL for COO → CSR, CSR → CSC, CSR → DIA, CSC → DIA, and COO → DIA conversions]

SLIDE 32

Being able to generate efficient conversion routines lets users exploit different formats for performance:

Only COO:               construct tensor T in COO, then compute with tensor T in COO
Only DIA:               construct tensor T in DIA, then compute with tensor T in DIA
Hybrid w/ libraries:    construct tensor T in COO, convert COO → CSR, then CSR → DIA, then compute with tensor T in DIA
Hybrid w/ our approach: construct tensor T in COO, convert COO → DIA directly, then compute with tensor T in DIA

SLIDE 33

Coordinate Remappings / Attribute Queries

SLIDE 35

Different tensor formats arrange nonzeros in memory in different ways. For one example sparse matrix with nonzeros A–J:

CSR:   pos = [0 2 4 7 9], crd = [0 2 1 2 1 2 4 2 5], vals = [A B C D E F G H J]

DIA:   stores each nonempty diagonal as a dense segment, with a perm array of diagonal offsets; vals = [C E H | A D F | B G J]

BCSR:  pos = [0 1 3], crd = [0 0 1], vals = dense BI × BJ blocks containing A–J
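Assuming the CSR arrays shown (pos = [0 2 4 7 9] and crd = [0 2 1 2 1 2 4 2 5]), a short sketch of how CSR is traversed: pos[i]..pos[i+1] delimits row i's slice of crd and vals, so each nonzero's (i, j) coordinates can be recovered without storing them explicitly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Recover (i, j) coordinates from CSR index arrays: pos[i]..pos[i+1]
// delimits row i's entries in crd, and crd holds the column indices.
std::vector<std::pair<int, int>> csr_coords(const std::vector<int>& pos,
                                            const std::vector<int>& crd) {
  std::vector<std::pair<int, int>> out;
  for (int i = 0; i + 1 < (int)pos.size(); i++)
    for (int p = pos[i]; p < pos[i + 1]; p++)
      out.push_back({i, crd[p]});  // row i, column crd[p]
  return out;
}
```

With the slide's arrays this yields nine coordinates, starting at (0, 0) for value A and ending at (3, 5) for value J.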

SLIDE 39

Coordinate remapping captures how nonzeros are arranged in memory. Listing each nonzero with its coordinates (i, j) and its derived coordinate j − i, then ordering the entries by (j − i, i, j), yields exactly the DIA arrangement of the values: [C E H | A D F | B G J].

SLIDE 44

Coordinate remapping captures how nonzeros are arranged in memory. DIA's arrangement is described by the remapping

(i,j) -> (j-i,i,j)

which orders each nonzero first by the diagonal it lies on (j − i) and then by its position within that diagonal.

SLIDE 49

Compiler uses coordinate remapping to generate code to reorder nonzeros. Given the remapping (i,j) -> (j-i,i,j), the generated loop nest visits each nonzero of the source tensor B and stores it into its DIA position:

for (int bi = 0; bi < M / BI; bi++) {
  for (int bj = 0; bj < N / BJ; bj++) {
    for (int i = bi * BI; i < (bi + 1) * BI; i++) {
      for (int j = bj * BJ; j < (bj + 1) * BJ; j++) {
        if (B[i,j] != 0.0) {
          // identify segment d in vals that corresponds to j - i
          // identify position p in d that corresponds to i and j
          vals[p] = B[i,j];
        }
      }
    }
  }
}
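The effect of the remapping (i,j) -> (j-i,i,j) can be sketched concretely. The helper below (a hypothetical illustration, not the compiler's generated code) groups COO nonzeros by their diagonal offset k = j − i, the remapping's leading coordinate, producing one dense zero-padded segment per diagonal as DIA requires:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Group COO nonzeros by diagonal offset k = j - i (the first coordinate
// of the remapping (i,j) -> (j-i,i,j)); each nonempty diagonal becomes a
// dense segment indexed by row i, as in the DIA format.
std::map<int, std::vector<double>> to_dia(const std::vector<int>& rows,
                                          const std::vector<int>& cols,
                                          const std::vector<double>& vals,
                                          int nrows) {
  std::map<int, std::vector<double>> diags;
  for (std::size_t p = 0; p < vals.size(); p++) {
    int k = cols[p] - rows[p];                // remapped coordinate j - i
    auto& seg = diags[k];
    if (seg.empty()) seg.assign(nrows, 0.0);  // dense, zero-padded segment
    seg[rows[p]] = vals[p];                   // position within segment is i
  }
  return diags;
}
```

For a 2 × 2 matrix with nonzeros at (0,0), (0,1), and (1,1), this yields two diagonals, k = 0 holding the entries at (0,0) and (1,1) and k = 1 holding the entry at (0,1) plus padding.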

SLIDE 57

Coordinate Remappings / Attribute Queries

SLIDE 58

Reordering a tensor’s nonzeros without explicitly sorting them requires knowing statistics about the tensor. The example matrix is given in COO (rows, cols, vals arrays whose entries are not in row-major order) and must be converted to CSR (pos = [0 2 4 7 9], with crd and vals in row-major order).

SLIDE 61

Reordering a tensor’s nonzeros without explicitly sorting them requires knowing statistics about the tensor.

[Figure: animation inserting the COO entries A–J one at a time into the CSR crd and vals arrays while the pos array is updated, ending with pos = [2 4 7 9]]

SLIDE 68

Reordering a tensor’s nonzeros without explicitly sorting them requires knowing statistics about the tensor. The statistic needed here is the number of nonzeros in each row:

i:    0 1 2 3
nnz:  2 2 3 2

A running prefix sum over these counts builds the pos array step by step (2, then 2 4, then 2 4 7, then 2 4 7 9), after which every COO entry can be scattered directly into its final position in crd and vals.
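The prefix-sum construction of pos from per-row counts can be sketched as a small helper (an illustration, not the generated code):

```cpp
#include <cassert>
#include <vector>

// Build the CSR pos array from per-row nonzero counts via a prefix sum:
// pos[i+1] is the running total of counts through row i, so
// pos[i]..pos[i+1] delimits row i's segment of crd and vals.
std::vector<int> build_pos(const std::vector<int>& nnz) {
  std::vector<int> pos(nnz.size() + 1, 0);
  for (std::size_t i = 0; i < nnz.size(); i++)
    pos[i + 1] = pos[i] + nnz[i];
  return pos;
}
```

With the counts from the slide, build_pos({2, 2, 3, 2}) produces [0 2 4 7 9], matching the CSR pos array used throughout the example.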

SLIDE 72

Converting tensors to different formats requires knowing different statistics about the tensors: SKY needs the extent of each row's nonzeros, BND needs the lower and upper bandwidths (lb, ub), and DIA needs the set of nonempty diagonals (−3, −2, −1, 1, 2 in the example).

SLIDE 75

Attribute queries express tensor statistics as aggregations over the coordinates of nonzeros. The number of nonzeros in each row, for instance, is the query

select [i] -> count(j) as nnz

which for the example matrix yields:

i:    0 1 2 3
nnz:  2 2 3 2

SLIDE 80

Compiler generates code to compute attribute queries by reducing them to sparse tensor computations. The query

select [i] -> count(j) as Q

reduces to the computation

∀i ∀j  Q_i += map(B_ij, 1)

which maps every nonzero B_ij to 1 and sums along j, giving Q = [2 2 3 2] for the example matrix.

SLIDE 83

Compiler generates code to compute attribute queries by reducing them to sparse tensor computations. For ∀i ∀j  Q_i += map(B_ij, 1), the code generated for select [i] -> count(j) as Q depends on B's format:

B is CSC:
for (int j = 0; j < N; j++) {
  for (int pB = pos[j]; pB < pos[j+1]; pB++) {
    int i = crd[pB];
    Q[i] += 1;
  }
}

B is COO:
for (int pB = 0; pB < NNZ; pB++) {
  int i = rows[pB];
  Q[i] += 1;
}

B is CSR: the counts can be read directly off the pos array, since Q_i = B'_i ≡ pos[i+1] − pos[i]:
for (int i = 0; i < N; i++) {
  Q[i] = pos[i+1] - pos[i];
}

SLIDE 89

In conclusion…

Efficient sparse tensor conversion routines can be automatically generated from per-format specifications.
Our technique makes it simple to fully exploit disparate tensor formats for performance.
Adding support for new sparse tensor formats is straightforward.

This work was supported by:

tensor-compiler.org