Format Abstraction for Sparse Tensor Algebra Compilers Stephen Chou , - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Format Abstraction for Sparse Tensor Algebra Compilers

Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

slide-2
SLIDE 2

2

Sparse tensors are a natural way of representing real-world data

slide-5
SLIDE 5

Sparse tensors are a natural way of representing real-world data

[Figure: a sparse three-dimensional tensor of product-review data with axes Users (Peter, Paul, Mary, Bob, Sam, Billy, Lilly, Hilde, …), Words (Quality, Durable, …), and Products (Kindle, Dubliners, The Iliad, Monitor, Sweater, Laptop, Candide, Jacket, …).]

Dense storage: 107 exabytes Sparse storage: 13 gigabytes

slide-13
SLIDE 13

Many different formats for storing tensors exist

Formats shown: ELLPACK, DIA, CSR, Coordinate matrix, DCSC, CSB, DCSR, CSC, Dense array matrix, Block DIA, BCSR, BCOO, SELL, Skyline, BELL, LIL, Hash maps, Sparse vector, Dense array vector, CSF, Coordinate tensor, Dense array tensor, Mode-generic tensor, HiCOO, F-COO

Example applications: Thermal simulation; Unstructured mesh simulation [Bell and Garland 2009]; CNN with block-sparse weights [Gray et al. 2017]; Image processing; Data analytics

slide-14
SLIDE 14

4

There is no universally superior tensor format

slide-18
SLIDE 18

4

There is no universally superior tensor format

[Figure: bar charts of normalized time (axis from 0.0 to 1.5) for y = Ax in three panels: CSR vs. DIA, CSR vs. DIA (one bar labeled 186x), and CSR vs. BCSR.]

slide-22
SLIDE 22

[Kjolstad et al. 2017] vs. This work

[Figure: both panels show a code generator fed by example tensor data structures (pos, crd, and offset arrays, with dimensions labeled M, N, and W). In this work, a format abstraction is shown alongside the code generator.]

Format abstraction

slide-24
SLIDE 24

Storing sparse tensors efficiently requires additional metadata

[Figure: a 3 × 4 matrix with six nonzeros, A to F, stored as a dense row-major array whose positions run from 0 to 11.]

row(6) = 6 / 4 = 1
col(6) = 6 % 4 = 2

locate(1,2) = 1 * 4 + 2 = 6
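The row, col, and locate arithmetic on the dense-array slides can be sketched in C as below. This is a minimal illustration that assumes a row-major layout; the function names and the n_cols parameter are hypothetical and do not come from the slides.

```c
#include <stddef.h>

/* Position of element (i, j) in a row-major dense array with n_cols columns. */
size_t dense_locate(size_t i, size_t j, size_t n_cols) {
    return i * n_cols + j;
}

/* Row coordinate of the element stored at position p. */
size_t dense_row(size_t p, size_t n_cols) {
    return p / n_cols;
}

/* Column coordinate of the element stored at position p. */
size_t dense_col(size_t p, size_t n_cols) {
    return p % n_cols;
}
```

With four columns, dense_locate(1, 2, 4) gives 6, and dense_row(6, 4) and dense_col(6, 4) give 1 and 2, matching the slide. No extra metadata is needed because the position alone determines the coordinates.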

slide-30
SLIDE 30

Storing sparse tensors efficiently requires additional metadata

[Figure: the same matrix with only its six nonzeros, A to F, stored contiguously at positions 0 to 5.]

row(3) = ???
col(3) = ???

slide-33
SLIDE 33

Coordinates of tensor elements can be encoded in many ways

Coordinate

rows: 0 0 1 1 1 2
cols: 0 2 1 2 3 3
vals: A B C D E F

slide-36
SLIDE 36

Coordinates of tensor elements can be encoded in many ways

CSR

pos:  0 2 5 6
cols: 0 2 1 2 3 3
vals: A B C D E F
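To make the role of the CSR metadata concrete, the sketch below recovers the coordinates of a stored value from pos and cols arrays. It is an illustration under assumed names (csr_row_of and csr_col_of are hypothetical helpers), and it uses a simple linear scan where a binary search would also work.

```c
#include <stddef.h>

/* Row that owns stored position p in a CSR matrix.
   pos has n_rows + 1 entries; row i owns positions pos[i] .. pos[i + 1] - 1.
   Returns n_rows if p is not a valid stored position. */
size_t csr_row_of(const int *pos, size_t n_rows, int p) {
    for (size_t i = 0; i < n_rows; i++) {
        if (pos[i] <= p && p < pos[i + 1]) {
            return i;
        }
    }
    return n_rows;
}

/* Column of stored position p: CSR stores it explicitly. */
int csr_col_of(const int *cols, int p) {
    return cols[p];
}
```

The column is a direct array read, while the row has to be found by locating p within pos. A coordinate matrix avoids that search by storing a row array with one entry per nonzero, at the cost of more memory.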

slide-41
SLIDE 41

10

Computing with different formats can require very different code

A = B ∘ C

slide-44
SLIDE 44

Computing with different formats can require very different code

A = B ∘ C

Coordinate ✕ Dense array

for (int pB = B1_pos[0]; pB < B1_pos[1]; pB++) {
  int i = B1_crd[pB];
  int j = B2_crd[pB];
  int pC = i * N + j;
  int pA = i * N + j;
  A[pA] = B[pB] * C[pC];
}

CSR ✕ Dense array

for (int i = 0; i < M; i++) {
  for (int pB = B2_pos[i]; pB < B2_pos[i + 1]; pB++) {
    int j = B2_crd[pB];
    int pC = i * N + j;
    int pA = i * N + j;
    A[pA] = B[pB] * C[pC];
  }
}

CSR ✕ Coordinate

int pC1 = C1_pos[0];
while (pC1 < C1_pos[1]) {
  int i = C1_crd[pC1];
  int C1_segend = pC1 + 1;
  while (C1_segend < C1_pos[1] && C1_crd[C1_segend] == i) C1_segend++;
  int pB2 = B2_pos[i];
  int pC2 = pC1;
  while (pB2 < B2_pos[i + 1] && pC2 < C1_segend) {
    int jB2 = B2_crd[pB2];
    int jC2 = C2_crd[pC2];
    int j = min(jB2, jC2);
    int pA = i * N + j;
    if (jB2 == j && jC2 == j) A[pA] = B[pB2] * C[pC2];
    if (jB2 == j) pB2++;
    if (jC2 == j) pC2++;
  }
  pC1 = C1_segend;
}
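As a self-contained variant of the Coordinate ✕ Dense array kernel, the sketch below wraps the loop in a function with explicit parameters. The function name, parameter names, and the use of double values are assumptions made for illustration; the caller is assumed to have zero-initialized A.

```c
/* A = B ∘ C (element-wise product), where B is a coordinate matrix with nnz
   stored entries and A, C are row-major dense arrays with n_cols columns.
   Entries of A that correspond to zeros of B are left untouched. */
void coo_times_dense(int nnz, const int *B_rows, const int *B_cols,
                     const double *B_vals, const double *C, double *A,
                     int n_cols) {
    for (int pB = 0; pB < nnz; pB++) {
        int i = B_rows[pB];      /* row coordinate of this nonzero */
        int j = B_cols[pB];      /* column coordinate of this nonzero */
        int p = i * n_cols + j;  /* dense position shared by A and C */
        A[p] = B_vals[pB] * C[p];
    }
}
```

Only the nonzeros of B are visited, and each one is matched with its partner in C through the dense locate computation i * n_cols + j.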

slide-49
SLIDE 49

Hand-coding support for a wide range of formats is infeasible

A = B ∘ C

Coordinate ✕ Dense array; CSR ✕ Dense array; CSR ✕ Coordinate;
Dense array ✕ Dense array; Coordinate ✕ Coordinate; CSR ✕ CSR;
DIA ✕ DIA; DIA ✕ Dense array; DIA ✕ Coordinate; DIA ✕ CSR;
ELLPACK ✕ ELLPACK; ELLPACK ✕ Dense array; ELLPACK ✕ Coordinate; ELLPACK ✕ CSR; ELLPACK ✕ DIA;
BCSR ✕ BCSR; BCSR ✕ Dense array; BCSR ✕ Coordinate

A = B ∘ C ∘ D

Dense array ✕ CSR ✕ CSR; Coordinate ✕ CSR ✕ CSR; CSR ✕ CSR ✕ CSR;
Dense array ✕ Coordinate ✕ CSR; Dense array ✕ Dense array ✕ CSR; Coordinate ✕ Coordinate ✕ CSR;
DIA ✕ Coordinate ✕ Dense array; DIA ✕ Coordinate ✕ CSR; DIA ✕ Dense array ✕ CSR; DIA ✕ CSR ✕ CSR;
DIA ✕ Coordinate ✕ Coordinate; DIA ✕ Dense array ✕ Dense array; DIA ✕ DIA ✕ CSR; DIA ✕ DIA ✕ Coordinate; DIA ✕ DIA ✕ Dense array;
ELLPACK ✕ ELLPACK ✕ DIA; ELLPACK ✕ CSR ✕ DIA; ELLPACK ✕ BCSR ✕ DIA; DIA ✕ DIA ✕ DIA

y = Ax + z

Dense array ✕ Dense array ✕ Dense array; Dense array ✕ Dense array ✕ Sparse vector; Dense array ✕ Dense array ✕ Hash map;
Dense array ✕ Sparse vector ✕ Sparse vector; Dense array ✕ Sparse vector ✕ Hash map; Dense array ✕ Hash map ✕ Sparse vector;
Dense array ✕ Sparse vector ✕ Dense array; Coordinate ✕ Dense array ✕ Dense array; Coordinate ✕ Sparse vector ✕ Dense array; Coordinate ✕ Dense array ✕ Hash map;
Coordinate ✕ Sparse vector ✕ Hash map; Coordinate ✕ Hash map ✕ Sparse vector; CSR ✕ Dense array ✕ Dense array; CSR ✕ Dense array ✕ Sparse vector; CSR ✕ Hash map ✕ Sparse vector;
CSR ✕ Hash map ✕ Dense array; CSR ✕ Sparse vector ✕ Dense array; DIA ✕ Dense array ✕ Dense array; DIA ✕ Hash map ✕ Dense array; ELLPACK ✕ Dense array ✕ Sparse vector
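A rough way to see why the list grows so quickly: if each operand of an expression can independently use any one of F formats, an expression with k operands has F^k ordered format combinations, each potentially needing its own kernel. The sketch below computes that count. The function name is hypothetical, and the model is an assumption: it counts ordered combinations and ignores both symmetric cases and the choice of output format.

```c
/* Number of ordered format combinations for one expression, assuming each
   of `operands` inputs independently uses one of `formats` formats. */
unsigned long long kernel_count(unsigned formats, unsigned operands) {
    unsigned long long count = 1;
    for (unsigned k = 0; k < operands; k++) {
        count *= formats;
    }
    return count;
}
```

Under this model, six formats give 36 combinations for a two-operand expression and 216 for a three-operand one, before any additional expressions are considered.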

slide-50
SLIDE 50

Format Abstraction & Code Generation

Mode-generic tensor: Compressed, Singleton, Dense, Dense
DIA: Dense, Range, Offset

[Figure: decision tree for combining two operands x and y. It asks whether x supports locate, whether y supports locate, and whether x is unordered and y is ordered, and selects one of three strategies: co-iterate over x and y, iterate over x and locate into y, or iterate over y and locate into x.]

Evaluation

slide-53
SLIDE 53

Tensor formats can be viewed as compositions of level formats

[Figure: a three-dimensional tensor with nine nonzeros (A, B, C, D, E, F, G, H, J) and the arrays that store it. The three levels are labeled Slices, Rows, and Columns, and the level formats shown are Compressed, Singleton, and Dense.]

slide-60
SLIDE 60

The same level formats can be composed in many ways

Level formats: Dense, Compressed, Singleton

CSR = Dense, Compressed

Dense level: size 3
Compressed level: pos 0 2 5 6; crd 0 2 1 2 3 3
vals: A B C D E F

slide-65
SLIDE 65

The same level formats can be composed in many ways

Level formats: Dense, Compressed, Singleton

Coordinate = Compressed, Singleton

Compressed level: pos 0 6; crd 0 0 1 1 1 2
Singleton level: crd 0 2 1 2 3 3
vals: A B C D E F
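One way to see that CSR and Coordinate are two compositions over the same data is to convert the row level of one into the row level of the other. The sketch below expands the pos array of a Dense row level followed by a Compressed column level into the explicit row coordinates that a Compressed row level followed by a Singleton column level would store. The function name and signature are hypothetical, and the column coordinates and values are assumed to be shared unchanged between the two formats.

```c
/* Expand CSR row ranges into explicit per-nonzero row coordinates.
   pos has n_rows + 1 entries; rows_out must have room for pos[n_rows] ints.
   Returns the number of coordinates written (the number of nonzeros). */
int csr_rows_to_coo(const int *pos, int n_rows, int *rows_out) {
    int n = 0;
    for (int i = 0; i < n_rows; i++) {
        /* Every stored position in row i's range gets row coordinate i. */
        for (int p = pos[i]; p < pos[i + 1]; p++) {
            rows_out[n++] = i;
        }
    }
    return n;
}
```

The outer loop plays the part of the Dense level (every row is visited), and the inner loop plays the part of the Compressed level (only stored positions are visited).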

slide-74
SLIDE 74

The same level formats can be composed in many ways

Level formats: Dense, Compressed, Singleton, Hashed, Range, Offset

Tensor format: Level formats
Coordinate matrix: Compressed, Singleton
CSR [Tinney and Walker, 1967]: Dense, Compressed
Coordinate tensor: Compressed, Singleton, Singleton
BCSR [Im and Yelick 1998]: Dense, Compressed, Dense, Dense
DIA [Saad 2003]: Dense, Range, Offset
Block DIA: Dense, Range, Offset, Dense, Dense
Hash map vector: Hashed
Hash map matrix: Hashed, Hashed
ELLPACK [Kincaid et al. 1989]: Dense, Dense, Singleton
Mode-generic tensor [Baskaran et al. 2012]: Compressed, Singleton, Dense, Dense
CSB [Buluç et al. 2009]: Dense, Dense, Compressed, Singleton
Dense array tensor: Dense, Dense, Dense

[Patwary et al. 2015] is cited for the hash map formats.
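The idea of describing a tensor format as a list of per-level formats can be sketched as a small data structure. Everything named below (LevelFormat, TensorFormat, MAX_LEVELS, count_levels, and the constant names) is hypothetical and is not taco's actual API; the level lists themselves are taken from the slide.

```c
/* The six level formats named on the slide. */
typedef enum { DENSE, COMPRESSED, SINGLETON, HASHED, RANGE, OFFSET } LevelFormat;

#define MAX_LEVELS 4

/* A tensor format is an ordered list of level formats, one per level. */
typedef struct {
    const char *name;
    int n_levels;
    LevelFormat levels[MAX_LEVELS];
} TensorFormat;

static const TensorFormat CSR_FORMAT  = {"CSR", 2, {DENSE, COMPRESSED}};
static const TensorFormat COO_MATRIX  = {"Coordinate matrix", 2, {COMPRESSED, SINGLETON}};
static const TensorFormat BCSR_FORMAT = {"BCSR", 4, {DENSE, COMPRESSED, DENSE, DENSE}};
static const TensorFormat DIA_FORMAT  = {"DIA", 3, {DENSE, RANGE, OFFSET}};

/* Count how many levels of a tensor format use the given level format. */
int count_levels(const TensorFormat *f, LevelFormat lf) {
    int n = 0;
    for (int l = 0; l < f->n_levels; l++) {
        if (f->levels[l] == lf) {
            n++;
        }
    }
    return n;
}
```

Adding a new tensor format in this representation is one more table entry rather than a new set of hand-written kernels, which is the point of composing formats from a small set of level formats.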

slide-75
SLIDE 75

Tensor Algebra Compiler (taco) [Kjolstad et al. 2017]

$A_{ij} = \sum_k B_{ijk} c_k$

A : (dense, dense)
B : (dense, compressed, compressed)
c : (compressed)

for (int i = 0; i < m; i++) {
  int pB1 = i;  // B's first level is dense, so its position equals i
  for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
    int j = B2_idx[pB2];
    int pA2 = (i * n) + j;
    int pB3 = B3_pos[pB2];
    int pc1 = c1_pos[0];
    while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
      int kB = B3_idx[pB3];
      int kc = c1_idx[pc1];
      int k = min(kB, kc);
      if (kB == k && kc == k) {
        a[pA2] += b[pB3] * c[pc1];
      }
      if (kB == k) pB3++;
      if (kc == k) pc1++;
    }
  }
}

slide-76
SLIDE 76

taco generates code dimension by dimension

$A_{ij} = \sum_k B_{ijk} \cdot c_k$

for (int i = 0; i < m; i++) {
  int pB1 = i;  // B's first level is dense, so its position equals i
  for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
    int j = B2_crd[pB2];
    int pA2 = (i * n) + j;
    int pB3 = B3_pos[pB2];
    int pc1 = c1_pos[0];
    while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
      int kB = B3_crd[pB3];
      int kc = c1_crd[pc1];
      int k = min(kB, kc);
      if (kB == k && kc == k) {
        A[pA2] += B[pB3] * c[pc1];
      }
      if (kB == k) pB3++;
      if (kc == k) pc1++;
    }
  }
}

slide-77
SLIDE 77

18

for (int i = 0; i < m; i++) {
  for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
    int j = B2_idx[pB2];
    int pA2 = (i * n) + j;
    int pB3 = B3_pos[pB2];
    int pc1 = c1_pos[0];
    while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
      int kB = B3_crd[pB3];
      int kc = c1_crd[pc1];
      int k = min(kB, kc);
      if (kB == k && kc == k) {
        A[pA2] += B[pB3] * c[pc1];
      }
      if (kB == k) pB3++;
      if (kc == k) pc1++;
    }
  }
}

$A_{ij} = \sum_k B_{ijk} \cdot c_k$

taco generates code dimension by dimension
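
The loop nest on this slide can be turned into a self-contained function for experimentation. What follows is a minimal sketch, not taco's actual generated signature: the function name `ttv_csf`, the parameter list, and the uniform `crd` array naming are assumptions made for illustration. It assumes B has a dense first level (so the position in that level is simply `i`) followed by two compressed levels, c is a compressed vector, and A is a dense row-major m × n array that the caller has zero-initialized.

```c
/* Sketch: A(i,j) = sum over k of B(i,j,k) * c(k).
 * Assumed layout (hypothetical, for illustration only):
 *   B: level 1 dense (size m), levels 2 and 3 compressed (pos/crd arrays)
 *   c: compressed vector, so c1_pos has exactly two entries
 *   A: dense m x n, row-major, zero-initialized by the caller */
void ttv_csf(int m, int n,
             const int *B2_pos, const int *B2_crd,
             const int *B3_pos, const int *B3_crd, const double *B_vals,
             const int *c1_pos, const int *c1_crd, const double *c_vals,
             double *A)
{
    for (int i = 0; i < m; i++) {
        /* Dense first level: the position in level 1 equals i. */
        for (int pB2 = B2_pos[i]; pB2 < B2_pos[i + 1]; pB2++) {
            int j = B2_crd[pB2];
            int pA2 = i * n + j;
            int pB3 = B3_pos[pB2];
            int pc1 = c1_pos[0];
            /* Two-way merge over k: only coordinates present in both
             * B(i,j,:) and c contribute to the product. */
            while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
                int kB = B3_crd[pB3];
                int kc = c1_crd[pc1];
                int k = kB < kc ? kB : kc;
                if (kB == k && kc == k) {
                    A[pA2] += B_vals[pB3] * c_vals[pc1];
                }
                /* Advance whichever operand held the smaller coordinate. */
                if (kB == k) pB3++;
                if (kc == k) pc1++;
            }
        }
    }
}
```

Each loop corresponds to one dimension of B, which is the sense in which the code is generated dimension by dimension: a plain loop for the dense level, a pos/crd loop for the compressed level, and a merge loop where two sparse operands meet.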

slide-80
SLIDE 80

19

[Figure: example operand format combinations built from Dense and Compressed levels, joined by × and +]

Hand-coding support for a wide range of level formats is also infeasible

slide-81
SLIDE 81

19

[Figure: many more operand format combinations built from Dense, Compressed, Hashed, Singleton, Range, and Offset levels, joined by × and +]

Hand-coding support for a wide range of level formats is also infeasible

slide-82
SLIDE 82

20

Code generation is performed in two stages

slide-83
SLIDE 83

20

Code generation is performed in two stages

Compressed

×

Hashed

slide-84
SLIDE 84

20

Code generation is performed in two stages

High-level algorithm

Compressed

×

Hashed

slide-85
SLIDE 85

20

Code generation is performed in two stages

High-level algorithm Runnable code

Compressed

×

Hashed

slide-86
SLIDE 86

20

Code generation is performed in two stages

High-level algorithm Runnable code

Compressed

×

Hashed

How to compute with different data structures

slide-87
SLIDE 87

20

Code generation is performed in two stages

High-level algorithm

Runnable code

Compressed × Hashed

How to compute with different data structures

How to compute with multiple operands

slide-88
SLIDE 88

21

Tensor algebra computations can be expressed in terms of high-level operations on tensor operands

[Figure: tensor operands B and C with individually labeled elements]

slide-92
SLIDE 92

22

Level formats declare whether they support various high-level operations

Dense Compressed Hashed Singleton Range Offset

slide-93
SLIDE 93

22

Level formats declare whether they support various high-level operations

Random access Iteration

Dense Compressed Hashed Singleton Range Offset

slide-97
SLIDE 97

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

slide-98
SLIDE 98

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

B ∘ C

slide-99
SLIDE 99

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

B ∘ C

Compressed × Hashed

slide-100
SLIDE 100

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

B ∘ C

Compressed × Hashed

Iterate over B and random access C

slide-101
SLIDE 101

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

B ∘ C

Compressed × Singleton

slide-102
SLIDE 102

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Dense Compressed Hashed Singleton Range Offset

Random access

B ∘ C

Compressed × Singleton

Simultaneously iterate over B and C
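
The decision illustrated on these slides can be sketched as a small capability check. This is a hypothetical illustration for an element-wise product, not taco's internal API: the `LevelFormat` record, the `Strategy` enum, and `choose_strategy` are names invented here, and which capabilities a given format declares is supplied by the caller.

```c
#include <stdbool.h>

/* Hypothetical record of the high-level operations a level format declares. */
typedef struct {
    const char *name;
    bool random_access;
    bool iteration;
} LevelFormat;

typedef enum {
    ITERATE_B_LOCATE_C,  /* iterate over B, random access into C */
    ITERATE_C_LOCATE_B,  /* iterate over C, random access into B */
    COITERATE            /* simultaneously iterate over B and C */
} Strategy;

/* Pick a strategy for B ∘ C (element-wise multiply): if one operand can be
 * iterated and the other supports random access, avoid merging; otherwise
 * fall back to simultaneous iteration over both operands. */
Strategy choose_strategy(LevelFormat b, LevelFormat c)
{
    if (b.iteration && c.random_access) return ITERATE_B_LOCATE_C;
    if (c.iteration && b.random_access) return ITERATE_C_LOCATE_B;
    return COITERATE;
}
```

If Compressed and Singleton are declared as iteration-only and Hashed as supporting both operations (an assumption inferred from the two examples above), the function selects "iterate over B and random access C" for Compressed × Hashed and simultaneous iteration for Compressed × Singleton.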

slide-103
SLIDE 103

24

Random access

Dense Hashed

Level formats also specify how they support high-level operations

Compressed

Iteration

. . .

slide-104
SLIDE 104

24

int pB2 = pB1 * N + j;

Random access

Dense Hashed

Level formats also specify how they support high-level operations

Compressed

Iteration

. . .

slide-105
SLIDE 105

24

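/* Dense: random access */
int pB2 = pB1 * N + j;

/* Hashed: random access (linear probing; -1 marks an empty slot) */
int pB2 = j % W + pB1 * W;
if (crd[pB2] != j && crd[pB2] != -1) {
  int end = pB2;
  do {
    pB2 = (pB2 + 1) % W + pB1 * W;
  } while (crd[pB2] != j && crd[pB2] != -1 && pB2 != end);
}
if (crd[pB2] == j) {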

Random access

Dense Hashed

Level formats also specify how they support high-level operations

Compressed

Iteration

. . .

slide-106
SLIDE 106

24

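/* Dense: random access */
int pB2 = pB1 * N + j;

/* Hashed: random access (linear probing; -1 marks an empty slot) */
int pB2 = j % W + pB1 * W;
if (crd[pB2] != j && crd[pB2] != -1) {
  int end = pB2;
  do {
    pB2 = (pB2 + 1) % W + pB1 * W;
  } while (crd[pB2] != j && crd[pB2] != -1 && pB2 != end);
}
if (crd[pB2] == j) {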

Random access

Dense Hashed

Level formats also specify how they support high-level operations

Compressed

Iteration

/* Dense: iteration */
for (int j = 0; j < N; j++) {
  int pB2 = pB1 * N + j;

/* Compressed: iteration */
for (int pB2 = pos[pB1]; pB2 < pos[pB1+1]; pB2++) {
  int j = crd[pB2];

/* Hashed: iteration (skip empty slots) */
for (int pB2 = pB1 * W; pB2 < (pB1 + 1) * W; pB2++) {
  int j = crd[pB2];
  if (j != -1) {

. . .
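
To make the hashed random-access fragment concrete, here is a minimal sketch of it as a standalone function. The name `hashed_locate` and its return convention are assumptions for illustration, not part of taco. It assumes each parent position owns a segment of `W` slots in `crd`, that `-1` marks an empty slot, that coordinates are non-negative, and that collisions were resolved by linear probing within the segment.

```c
/* Sketch of random access into a hashed level.
 * crd: W slots per parent position; -1 marks an empty slot.
 * Returns the position of coordinate j under parent position p,
 * or -1 if j is not stored there. Assumes j >= 0 and W > 0. */
int hashed_locate(const int *crd, int W, int p, int j)
{
    int base = p * W;    /* first slot of this parent's segment */
    int slot = j % W;    /* home slot of j within the segment */
    int start = slot;
    do {
        int stored = crd[base + slot];
        if (stored == j) return base + slot;   /* found */
        if (stored == -1) return -1;           /* empty slot: j is absent */
        slot = (slot + 1) % W;                 /* probe next slot, wrapping */
    } while (slot != start);
    return -1;                                 /* segment full, j absent */
}
```

Keeping the probe index relative to the segment means the wrap-around stays inside the `W` slots that belong to parent position `p`.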

slide-107
SLIDE 107

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

B ∘ C

Compressed × Hashed

slide-108
SLIDE 108

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

for every element b in B:
  find corresponding element c in C
  A[i][j] = b * c;

B ∘ C

Compressed × Hashed

slide-110
SLIDE 110

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

B ∘ C (Compressed × Hashed)

for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  find corresponding element c in C
  A[i][j] = B[pB2] * c;
}

slide-112
SLIDE 112

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

B ∘ C (Compressed × Hashed)

for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  int pC2 = j % W + pC1 * W;
  if (C2_crd[pC2] != j && C2_crd[pC2] != -1) {
    int end = pC2;
    do {
      pC2 = (pC2 + 1) % W;
    } while (C2_crd[pC2] != j && C2_crd[pC2] != -1 && pC2 != end);
  }
  if (C2_crd[pC2] == j) {
    A[i][j] = B[pB2] * C[pC2];
  }
}
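
To make the specialized loop concrete, here is a self-contained C sketch that computes one row of A = B ∘ C with B's row stored as a compressed level and C's row stored as a hashed level. The function name `multiply_row_compressed_hashed`, the value arrays `B_vals` and `C_vals`, and the dense output row `A_row` are hypothetical. The probe wraps around inside row `pC1`'s bucket of `W` slots, which is an assumption about the intended hash layout.

```c
#include <assert.h>

/* Hypothetical kernel for one row of A = B o C (element-wise product).
 * B's row is iterated through its compressed level (B2_pos/B2_crd);
 * each coordinate j is then looked up in C's hashed level (C2_crd),
 * where -1 marks an empty slot. A_row is a dense output row that the
 * caller has zero-initialized. */
void multiply_row_compressed_hashed(const int *B2_pos, const int *B2_crd,
                                    const double *B_vals, int pB1,
                                    const int *C2_crd, const double *C_vals,
                                    int pC1, int W, double *A_row) {
  assert(W > 0);
  for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
    int j = B2_crd[pB2];

    /* Inlined hashed random access: linear probing within the bucket. */
    int base = pC1 * W;
    int start = j % W;
    int k = start;
    while (C2_crd[base + k] != j && C2_crd[base + k] != -1) {
      k = (k + 1) % W;
      if (k == start) break;     /* bucket is full and j is not in it */
    }
    int pC2 = base + k;

    if (C2_crd[pC2] == j) {      /* j is stored in both B and C */
      A_row[j] = B_vals[pB2] * C_vals[pC2];
    }
  }
}
```

The outer loop is the compressed level's iteration code and the inner probe is the hashed level's random access code, so the kernel never scans C's empty slots.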

slide-114
SLIDE 114

26

The same process can be repeated dimension by dimension

A_ijk = B_ijk + C_ijk

int iB = 0;
int C0_pos = C0_pos_arr[0];
while (C0_pos < C0_pos_arr[1]) {
  int iC = C0_idx_arr[C0_pos];
  int C0_end = C0_pos + 1;
  if (iC == iB)
    while ((C0_end < C0_pos_arr[1]) && (C0_idx_arr[C0_end] == iB)) {
      C0_end++;
    }
  if (iC == iB) {
    int B1_pos = B1_pos_arr[iB];
    int C1_pos = C0_pos;
    while ((B1_pos < B1_pos_arr[iB + 1]) && (C1_pos < C0_end)) {
      int jB = B1_idx_arr[B1_pos];
      int jC = C1_idx_arr[C1_pos];
      int j = min(jB, jC);
      int A1_pos = (iB * A1_size) + j;
      int C1_end = C1_pos + 1;
      if (jC == j)
        while ((C1_end < C0_end) && (C1_idx_arr[C1_end] == j)) {
          C1_end++;
        }
      if ((jB == j) && (jC == j)) {
        int B2_pos = B2_pos_arr[B1_pos];
        int C2_pos = C1_pos;
        while ((B2_pos < B2_pos_arr[B1_pos + 1]) && (C2_pos < C1_end)) {
          int kB = B2_idx_arr[B2_pos];
          int kC = C2_idx_arr[C2_pos];
          int k = min(kB, kC);
          int A2_pos = (A1_pos * A2_size) + k;
          if ((kB == k) && (kC == k)) {
            A_val_arr[A2_pos] = B_val_arr[B2_pos] + C_val_arr[C2_pos];
          } else if (kB == k) {
            A_val_arr[A2_pos] = B_val_arr[B2_pos];
          } else {
            A_val_arr[A2_pos] = C_val_arr[C2_pos];
          }
          if (kB == k) B2_pos++;
          if (kC == k) C2_pos++;
        }
        while (B2_pos < B2_pos_arr[B1_pos + 1]) {
          int kB0 = B2_idx_arr[B2_pos];
          int A2_pos0 = (A1_pos * A2_size) + kB0;
          A_val_arr[A2_pos0] = B_val_arr[B2_pos];
          B2_pos++;
        }
        while (C2_pos < C1_end) {
          int kC0 = C2_idx_arr[C2_pos];
          int A2_pos1 = (A1_pos * A2_size) + kC0;
          A_val_arr[A2_pos1] = C_val_arr[C2_pos];
          C2_pos++;
        }
      } else if (jB == j) {
        for (int B2_pos0 = B2_pos_arr[B1_pos]; B2_pos0 < B2_pos_arr[B1_pos + 1]; B2_pos0++) {
          int kB1 = B2_idx_arr[B2_pos0];
          int A2_pos2 = (A1_pos * A2_size) + kB1;
          A_val_arr[A2_pos2] = B_val_arr[B2_pos0];
        }
      } else {
        for (int C2_pos0 = C1_pos; C2_pos0 < C1_end; C2_pos0++) {
          int kC1 = C2_idx_arr[C2_pos0];
          int A2_pos3 = (A1_pos * A2_size) + kC1;
          A_val_arr[A2_pos3] = C_val_arr[C2_pos0];
        }
      }
      if (jB == j) B1_pos++;
      if (jC == j) C1_pos = C1_end;
    }
    while (B1_pos < B1_pos_arr[iB + 1]) {
      int jB0 = B1_idx_arr[B1_pos];
      int A1_pos0 = (iB * A1_size) + jB0;
      for (int B2_pos1 = B2_pos_arr[B1_pos]; B2_pos1 < B2_pos_arr[B1_pos + 1]; B2_pos1++) {
        int kB2 = B2_idx_arr[B2_pos1];
        int A2_pos4 = (A1_pos0 * A2_size) + kB2;
        A_val_arr[A2_pos4] = B_val_arr[B2_pos1];
      }
      B1_pos++;
    }
    while (C1_pos < C0_end) {
      int jC0 = C1_idx_arr[C1_pos];
      int A1_pos1 = (iB * A1_size) + jC0;
      int C1_end0 = C1_pos + 1;
      while ((C1_end0 < C0_end) && (C1_idx_arr[C1_end0] == jC0)) {
        C1_end0++;
      }
      for (int C2_pos1 = C1_pos; C2_pos1 < C1_end0; C2_pos1++) {
        int kC2 = C2_idx_arr[C2_pos1];
        int A2_pos5 = (A1_pos1 * A2_size) + kC2;
        A_val_arr[A2_pos5] = C_val_arr[C2_pos1];
      }
      C1_pos = C1_end0;
    }
  } else {
    for (int B1_pos0 = B1_pos_arr[iB]; B1_pos0 < B1_pos_arr[iB + 1]; B1_pos0++) {
      int jB1 = B1_idx_arr[B1_pos0];
      int A1_pos2 = (iB * A1_size) + jB1;
      for (int B2_pos2 = B2_pos_arr[B1_pos0]; B2_pos2 < B2_pos_arr[B1_pos0 + 1]; B2_pos2++) {
        int kB3 = B2_idx_arr[B2_pos2];
        int A2_pos6 = (A1_pos2 * A2_size) + kB3;
        A_val_arr[A2_pos6] = B_val_arr[B2_pos2];
      }
    }
  }
  if (iC == iB) C0_pos = C0_end;
  iB++;
}
while (iB < B0_size) {
  for (int B1_pos1 = B1_pos_arr[iB]; B1_pos1 < B1_pos_arr[iB + 1]; B1_pos1++) {
    int jB2 = B1_idx_arr[B1_pos1];
    int A1_pos3 = (iB * A1_size) + jB2;
    for (int B2_pos3 = B2_pos_arr[B1_pos1]; B2_pos3 < B2_pos_arr[B1_pos1 + 1]; B2_pos3++) {
      int kB4 = B2_idx_arr[B2_pos3];
      int A2_pos7 = (A1_pos3 * A2_size) + kB4;
      A_val_arr[A2_pos7] = B_val_arr[B2_pos3];
    }
  }
  iB++;
}
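
The generated kernel repeats one pattern at every dimension: a merge loop with three cases (coordinate present in both operands, only in B, only in C), followed by tail loops for whichever operand still has entries. A one-dimensional sketch of that pattern in C follows, assuming two sorted sparse vectors added into a zero-initialized dense output; the function name `add_sparse_into_dense` and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical one-dimensional version of the addition pattern:
 * A = B + C, where B and C are sparse vectors with sorted coordinates
 * and A is a dense array that the caller has zero-initialized. */
void add_sparse_into_dense(const int *B_crd, const double *B_val, int B_nnz,
                           const int *C_crd, const double *C_val, int C_nnz,
                           double *A) {
  assert(B_nnz >= 0 && C_nnz >= 0);
  int pB = 0, pC = 0;

  /* Merge loop: runs while both operands still have entries. */
  while (pB < B_nnz && pC < C_nnz) {
    int kB = B_crd[pB];
    int kC = C_crd[pC];
    int k = kB < kC ? kB : kC;
    if (kB == k && kC == k) {
      A[k] = B_val[pB] + C_val[pC];   /* stored in both operands */
    } else if (kB == k) {
      A[k] = B_val[pB];               /* stored only in B */
    } else {
      A[k] = C_val[pC];               /* stored only in C */
    }
    if (kB == k) pB++;
    if (kC == k) pC++;
  }

  /* Tail loops: copy what remains of the operand that is not exhausted. */
  while (pB < B_nnz) {
    A[B_crd[pB]] = B_val[pB];
    pB++;
  }
  while (pC < C_nnz) {
    A[C_crd[pC]] = C_val[pC];
    pC++;
  }
}
```

Unlike multiplication, addition must visit every coordinate stored in either operand, which is why the tail loops are required here.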

slide-116
SLIDE 116

27

Evaluation

Mode-generic tensor: Compressed, Singleton, Dense, Dense
DIA: Dense, Range, Offset
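
One way to picture a format as a composition of level formats is as an ordered list of level kinds. The sketch below is illustrative only: the `LevelKind` enum, the `TensorFormat` struct, and `describe_format` are hypothetical names, and the two level sequences are read off the slide in the order they appear.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding of the level formats named on the slides. */
typedef enum {
  LEVEL_DENSE, LEVEL_COMPRESSED, LEVEL_SINGLETON,
  LEVEL_RANGE, LEVEL_OFFSET, LEVEL_HASHED
} LevelKind;

#define MAX_LEVELS 4

/* A tensor format is a name plus an ordered sequence of level kinds. */
typedef struct {
  const char *name;
  int num_levels;
  LevelKind levels[MAX_LEVELS];
} TensorFormat;

static const TensorFormat MODE_GENERIC = {
  "Mode-generic tensor", 4,
  {LEVEL_COMPRESSED, LEVEL_SINGLETON, LEVEL_DENSE, LEVEL_DENSE}
};

static const TensorFormat DIA = {
  "DIA", 3, {LEVEL_DENSE, LEVEL_RANGE, LEVEL_OFFSET}
};

const char *level_name(LevelKind kind) {
  switch (kind) {
    case LEVEL_DENSE:      return "Dense";
    case LEVEL_COMPRESSED: return "Compressed";
    case LEVEL_SINGLETON:  return "Singleton";
    case LEVEL_RANGE:      return "Range";
    case LEVEL_OFFSET:     return "Offset";
    case LEVEL_HASHED:     return "Hashed";
  }
  return "Unknown";
}

/* Writes "Name: Level, Level, ..." into buf and returns the number of
 * characters the full description needs, excluding the terminator. */
int describe_format(const TensorFormat *f, char *buf, size_t cap) {
  assert(f && buf && cap > 0);
  assert(strlen(f->name) > 0 && f->num_levels <= MAX_LEVELS);
  int n = snprintf(buf, cap, "%s:", f->name);
  for (int i = 0; i < f->num_levels && n >= 0 && (size_t)n < cap; i++) {
    n += snprintf(buf + n, cap - (size_t)n, "%s %s",
                  i ? "," : "", level_name(f->levels[i]));
  }
  return n;
}
```

In this picture, supporting a new format means supplying a new sequence of level kinds rather than writing new kernels by hand.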

[Flowchart: choosing how to merge two operands x and y. Decision nodes: "x supports locate?", "y supports locate?" (asked on both branches), and "x unordered and y ordered?". Outcomes: co-iterate over x and y; iterate over x and locate into y; iterate over y and locate into x.]

Format Abstraction & Code Generation

slide-117
SLIDE 117

28

Our technique supports a wide range of disparate tensor formats

[Table: tensor format support by system]
Systems compared: This work; taco [Kjolstad et al. 2017]; Intel MKL; SciPy; MTL4; Tensor Toolbox; TensorFlow
Formats compared: Sparse vector; Hash map vector; Coordinate matrix; CSR; DCSR; ELL; DIA; BCSR; CSB; DOK; LIL; Skyline; Banded; Coordinate tensor; CSF; Mode-generic

slide-127
SLIDE 127

29

Our technique generates efficient code

[Figure: four bar charts of normalized time, one per kernel]
Coordinate SpMV: This work, SciPy, Intel MKL, MTL4, TensorFlow
DIA SpMV: This work, SciPy, Intel MKL
CSR Addition: This work, SciPy, Intel MKL, MTL4
Coordinate MTTKRP: This work, Tensor Toolbox

slide-134
SLIDE 134

30

In conclusion…

Supporting many disparate tensor formats is essential for performance

We can automatically generate kernels that compute with disparate tensor formats

Adding support for even more tensor formats is straightforward

This work supported by:

tensor-compiler.org