BitWeaving: Fast Scans for Main Memory Data Processing


slide-1
SLIDE 1

BitWeaving: Fast Scans for Main Memory Data Processing

Yinan Li and Jignesh M. Patel

University of Wisconsin-Madison

slide-7
SLIDE 7

Motivation


  • Example TPC-H Query 6

SELECT SUM(l_extendedprice * l_discount)
FROM lineitem
WHERE l_shipdate BETWEEN Date AND Date + 1 year
  AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
  AND l_quantity < Quantity

Code sizes: l_shipdate 12 bits, l_discount 4 bits, l_quantity 6 bits

Main memory analytics DBMSs convert native column values to codes.

Code size: 4-12 bits
Word size: 64 bits
SIMD word size: 256 bits

[Figure: a single code inside a CPU register]

Underutilizes the processor word!

slide-9
SLIDE 9

Motivation


  • Example TPC-H Query 6

SELECT SUM(l_extendedprice * l_discount)
FROM lineitem
WHERE l_shipdate BETWEEN Date AND Date + 1 year
  AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
  AND l_quantity < Quantity

Code sizes: l_shipdate 12 bits, l_discount 4 bits, l_quantity 6 bits

Main memory analytics DBMSs convert native column values to codes.

Code size: 4-12 bits
Word size: 64 bits
SIMD word size: 256 bits

[Figure: several codes packed side by side inside one CPU register]

Intra-cycle parallelism!

slide-13
SLIDE 13

BitWeaving

  • In this talk, we introduce BitWeaving

– A fast scan method
– for column-oriented databases

  • Fully exploits intra-cycle parallelism
  • How: By “gainfully” using every bit in every processor word.

slide-14
SLIDE 14

BitWeaving: Two Flavors

BitWeaving/H (Horizontal bit organization)
[Figure: the bits of each code run horizontally inside a processor word]

BitWeaving/V (Vertical bit organization)
[Figure: the bits of each code run vertically across processor words]

slide-18
SLIDE 18

Framework

  • Targets single-table scans
  • Column-scalar scan: scan on a single column

– Produces a result bit vector, with one bit for each input tuple to indicate the matching tuples

  • Complex predicates in the scan: logical AND and OR operations on these result bit vectors

slide-25
SLIDE 25

Framework – Example

SELECT SUM(l_discount * l_price)
FROM lineitem
WHERE l_shipdate BETWEEN Date AND Date + 1 year
  AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
  AND l_quantity < Quantity

[Figure: column-scalar scans on l_shipdate, l_discount, and l_quantity each produce a result bit vector; two AND operations combine them into the final result bit vector]

Convert to a RID List
RID List: 9, 15

Fetch codes from projection columns: l_price, l_discount

Aggregation
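The combination step can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the three bit vectors below are invented so that the final result matches the RID list 9, 15 shown on the slide, and the helper name `to_rid_list` and the 1-based, leftmost-first RID numbering are assumptions.

```python
def to_rid_list(bit_vector, n_tuples):
    """Return the positions of set bits (assumed 1-based, leftmost bit = tuple 1)."""
    return [i + 1 for i in range(n_tuples)
            if (bit_vector >> (n_tuples - 1 - i)) & 1]

N_TUPLES = 16

# Hypothetical result bit vectors, one per column-scalar scan (invented data).
shipdate_bv = 0b1010110110010110
discount_bv = 0b1101001010101011
quantity_bv = 0b0111000011110011

# A conjunctive predicate is a bitwise AND of the per-column results.
final_bv = shipdate_bv & discount_bv & quantity_bv

rids = to_rid_list(final_bv, N_TUPLES)
print(rids)  # [9, 15]
```

An OR of predicates would use `|` in the same way; only the tuples in the RID list need their projection-column codes fetched for the aggregation.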

slide-27
SLIDE 27

Framework – Example

BitWeaving/V columns

[Figure: the same query and scan pipeline as on the previous slides (result bit vectors combined with AND, RID List: 9, 15, aggregation over l_price and l_discount), with all five columns labeled BitWeaving/V]

slide-30
SLIDE 30

Framework – Example

Mixing of BitWeaving/V and BitWeaving/H columns

[Figure: the same query and scan pipeline as on the previous slides (result bit vectors combined with AND, RID List: 9, 15, aggregation over l_price and l_discount), with three columns labeled BitWeaving/H and two labeled BitWeaving/V]

Same result format

slide-31
SLIDE 31

Outline

  • Motivation & Overview
  • BitWeaving/V
  • BitWeaving/H
  • Evaluations
  • Conclusion


slide-33
SLIDE 33

BitWeaving/V

  • Storage layout

– Bit-level columnar data organization, i.e., it is like a bit-level column store.

  • Column-scalar scan

– Predicate evaluation is converted to logical computation on these “words of bits”

  • Based on the idea of the bit-sliced index*. Two differences:

– Segmented storage layout
– Early pruning technique

*Bit-sliced paper: P. E. O'Neil and D. Quass. Improved query performance with variant indexes. SIGMOD '97

slide-41
SLIDE 41

BitWeaving/V – Storage Layout

Column codes: c1, c2, ..., c19
Code size: 4 bits
Segment 1: codes c1 to c8 = 10, 12, 3, 6, 9, 7, 1, 0

Segment 1 in memory (one bit per code in each word, in the order c1 to c8):

Word 1: 1 1 0 0 1 0 0 0   The first (most significant) bits of the 8 codes
Word 2: 0 1 0 1 0 1 0 0   The second bits of the 8 codes
Word 3: 1 0 1 1 0 1 0 0   The third bits of the 8 codes
Word 4: 0 0 1 0 1 1 1 0   The last (least significant) bits of the 8 codes
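The layout above amounts to a bit-level transpose of the segment. A minimal Python sketch follows, assuming an 8-bit "word" as in the slide's example and assuming c1 maps to the leftmost bit of each word; the function name `to_vertical_segment` is hypothetical and not taken from the paper.

```python
def to_vertical_segment(codes, code_bits):
    """Transpose one segment of codes into `code_bits` words.

    Word i holds bit i (counted from the most significant bit) of every
    code in the segment; the first code ends up in the leftmost bit.
    """
    words = []
    for i in range(code_bits):
        shift = code_bits - 1 - i  # position of bit i inside a code
        word = 0
        for code in codes:
            word = (word << 1) | ((code >> shift) & 1)
        words.append(word)
    return words

codes = [10, 12, 3, 6, 9, 7, 1, 0]  # c1..c8 from the slide
words = to_vertical_segment(codes, code_bits=4)
for i, w in enumerate(words, start=1):
    print(f"Word {i}: {w:08b}")
# Word 1: 11001000
# Word 2: 01010100
# Word 3: 10110100
# Word 4: 00101110
```

A real implementation would use 64-bit (or SIMD-width) words, so one segment would hold 64 or more codes rather than 8.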

slide-46
SLIDE 46

BitWeaving/V – Column-scalar Scan

Column codes c1 to c8 (Segment 1): 10, 12, 3, 6, 9, 7, 1, 0

Predicate: code < 5?
Literal 5 = 0 1 0 1

Result for c1 to c8 after each word (✔ match, ✖ no match, ? not yet determined):

After word 1: ✖ ✖ ? ? ✖ ? ? ?
After word 2: ✖ ✖ ✔ ? ✖ ? ✔ ✔
After word 3: ✖ ✖ ✔ ✖ ✖ ✖ ✔ ✔
After word 4: ✖ ✖ ✔ ✖ ✖ ✖ ✔ ✔

Exploits intra-cycle parallelism! Uses every bit in every processor word!

slide-49
SLIDE 49

BitWeaving/V – Column-scalar Scan

[Figure: the same scan example (codes 10, 12, 3, 6, 9, 7, 1, 0; predicate code < 5; the codes 3, 1, and 0 match), shown next to the four words of Segment 1 as they are laid out in the memory space]

The layout of the segment exactly matches the access pattern of column-scalar scans.
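One way to express this scan with word-level logic is to keep two bit masks per segment: one marking codes already known to be smaller than the literal, and one marking codes that are still equal to the literal on all bits seen so far. The Python sketch below follows that idea for the slide's example; it is an illustrative reconstruction, the names `scan_less_than`, `m_lt`, and `m_eq` are assumptions, and the 8-bit word width is taken from the example rather than from real hardware.

```python
def scan_less_than(words, literal, code_bits, seg_size):
    """Evaluate `code < literal` for all codes of one vertical segment.

    words[i] holds bit i (most significant first) of every code.
    Returns a result bit vector with one bit per code.
    """
    full = (1 << seg_size) - 1
    m_lt = 0      # codes already known to be < literal
    m_eq = full   # codes equal to the literal on all bits seen so far
    for i, w in enumerate(words):
        lit_bit = (literal >> (code_bits - 1 - i)) & 1
        c = full if lit_bit else 0          # literal bit replicated across the word
        m_lt |= m_eq & ~w & c               # code bit 0, literal bit 1: smaller
        m_eq &= ~(w ^ c) & full             # keep only codes whose bit matches
    return m_lt

# Segment 1 from the slide: codes 10, 12, 3, 6, 9, 7, 1, 0 (c1 is the leftmost bit).
words = [0b11001000, 0b01010100, 0b10110100, 0b00101110]
result = scan_less_than(words, literal=5, code_bits=4, seg_size=8)
print(f"{result:08b}")  # 00100011 -> c3, c7, c8 match (codes 3, 1, 0)
```

Each loop iteration touches exactly one word and processes all codes of the segment at once, which is why reading the words of a segment sequentially matches the scan's access pattern.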

slide-51
SLIDE 51

BitWeaving/V – Early Pruning

[Figure: the same scan example (codes 10, 12, 3, 6, 9, 7, 1, 0; predicate code < 5). After the third word, every code is already decided as a match or a non-match, so the fourth word does not need to be read.]

Early pruning: terminate the predicate evaluation on a segment when all results have been determined.
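Early pruning can be sketched by stopping as soon as no code is still undetermined. The Python below is a simplified illustration under stated assumptions: it checks the stopping condition before every word (a real implementation may check less often), and the names `scan_less_than_pruned` and `words_read` are hypothetical.

```python
def scan_less_than_pruned(words, literal, code_bits, seg_size):
    """Like a plain vertical `code < literal` scan, but stop once every
    code in the segment is decided. Returns (result bits, words read)."""
    full = (1 << seg_size) - 1
    m_lt, m_eq = 0, full
    words_read = 0
    for i, w in enumerate(words):
        if m_eq == 0:          # no code is still tied with the literal
            break              # remaining words cannot change the result
        words_read += 1
        lit_bit = (literal >> (code_bits - 1 - i)) & 1
        c = full if lit_bit else 0
        m_lt |= m_eq & ~w & c
        m_eq &= ~(w ^ c) & full
    return m_lt, words_read

# Segment 1 from the slide: codes 10, 12, 3, 6, 9, 7, 1, 0.
words = [0b11001000, 0b01010100, 0b10110100, 0b00101110]
result, words_read = scan_less_than_pruned(words, literal=5, code_bits=4, seg_size=8)
print(f"{result:08b}", words_read)  # 00100011 3
```

In this example the scan reads three of the four words; with wider codes the savings can be larger, since the low-order words are often never touched.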

slide-52
SLIDE 52

Outline

  • Motivation & Overview
  • BitWeaving/V
  • BitWeaving/H
  • Evaluations
  • Conclusion


slide-57
SLIDE 57

BitWeaving/H

  • Storage layout

– Packs codes “horizontally” into processor words

  • Column-scalar scan

– Parallel predicate evaluation on packed codes

  • Shares a similar basic idea with the IBM Blink method*. Two differences:

– Uses an extra bit in each code
– Staggers codes across words inside a segment

*Blink paper: R. Johnson, V. Raman, R. Sidle, and G. Swart. Row-wise parallel predicate evaluation. VLDB '08

More details about BitWeaving/H are in the paper!
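The slides leave the details to the paper, so the following Python is only a sketch of how an extra bit per code can enable parallel comparison on packed codes. It assumes each k-bit code sits in a (k+1)-bit field whose top (extra) bit is 0, and it uses a standard arithmetic trick: adding the literal to the bitwise complement of each code sets the extra bit exactly when code < literal, without carrying into the neighboring field. All function names are hypothetical, and the staggering of codes across words is not shown.

```python
def repeat_field(value, field_bits, n_fields):
    """Build a word that repeats `value` in every field."""
    word = 0
    for _ in range(n_fields):
        word = (word << field_bits) | value
    return word

def pack_codes(codes, code_bits):
    """Pack codes left to right into (code_bits + 1)-bit fields, extra bit 0."""
    word = 0
    for code in codes:
        word = (word << (code_bits + 1)) | code
    return word

def less_than_packed(x_word, literal, code_bits, n_codes):
    """Set the extra bit of each field where code < literal."""
    field_bits = code_bits + 1
    low_mask = repeat_field((1 << code_bits) - 1, field_bits, n_codes)
    delim_mask = repeat_field(1 << code_bits, field_bits, n_codes)
    y_word = repeat_field(literal, field_bits, n_codes)
    # Per field: literal + (2^k - 1 - code) >= 2^k  exactly when code < literal.
    return (y_word + (x_word ^ low_mask)) & delim_mask

def delimiter_bits(result, code_bits, n_codes):
    """Extract the extra bit of every field, first code first."""
    field_bits = code_bits + 1
    return [(result >> (field_bits * (n_codes - 1 - j) + code_bits)) & 1
            for j in range(n_codes)]

codes = [10, 12, 3, 6, 9, 7, 1, 0]   # eight 4-bit codes -> 40 bits, fits in 64
x_word = pack_codes(codes, code_bits=4)
result = less_than_packed(x_word, literal=5, code_bits=4, n_codes=len(codes))
matches = delimiter_bits(result, code_bits=4, n_codes=len(codes))
print(matches)  # [0, 0, 1, 0, 0, 0, 1, 1]
```

One addition, one XOR, and one AND evaluate the predicate on all eight codes at once, which is the kind of parallel predicate evaluation on packed codes the slide refers to.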

slide-58
SLIDE 58

Outline

  • Motivation & Overview
  • BitWeaving/V
  • BitWeaving/H
  • Evaluations
  • Conclusion


slide-59
SLIDE 59

Evaluation – System

  • Intel Xeon X5650

– 64-bit ALU
– 128-bit SIMD
– 12 MB L3 cache

  • 24 GB memory

slide-60
SLIDE 60

Evaluation - Micro-benchmark

  • Query: SELECT COUNT(*) FROM R WHERE R.a < C
  • 1 billion tuples
  • Uniform distribution
  • Selectivity: 10%
  • Single thread execution

slide-63
SLIDE 63

Evaluation - Micro-benchmark

[Chart: cycles per code (y-axis, 0 to 10) versus size of code in bits (x-axis, 4 to 32). Series: Naive, SIMD]

2X: SIMD parallelism

SIMD paper: T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. SIMD-Scan: Ultra fast in-memory table scan using on-chip vector processing units. PVLDB '09

slide-64
SLIDE 64

Evaluation - Micro-benchmark

[Chart: cycles per code (y-axis, 0 to 10) versus size of code in bits (x-axis, 4 to 32). Series: Naive, SIMD, BL]

Blink paper (BL): R. Johnson, V. Raman, R. Sidle, and G. Swart. Row-wise parallel predicate evaluation. VLDB '08

slide-65
SLIDE 65

Evaluation - Micro-benchmark

[Chart: cycles per code (y-axis, 0 to 10) versus size of code in bits (x-axis, 4 to 32). Series: Naive, SIMD, Bit-sliced, BL]

Bit-sliced paper: P. E. O'Neil and D. Quass. Improved query performance with variant indexes. SIGMOD '97

slide-67
SLIDE 67

Evaluation - Micro-benchmark

[Chart: cycles per code (y-axis, 0 to 10) versus size of code in bits (x-axis, 4 to 32). Series: Naive, SIMD, Bit-sliced, BL, BitWeaving/H]

3X-4X speedup over BL:
1) Use the extra (delimiter) bit
2) Staggered vertical layout

slide-70
SLIDE 70

Evaluation - Micro-benchmark

[Chart: cycles per code (y-axis, 0 to 10) versus size of code in bits (x-axis, 4 to 32). Series: Naive, SIMD, Bit-sliced, BL, BitWeaving/V, BitWeaving/H]

3X-4X over Bit-sliced: Fewer cache misses
2X speedup: Early pruning

slide-73
SLIDE 73

Evaluation - TPC-H Query 6

  • TPC-H Query 6:

SELECT SUM(l_extendedprice * l_discount)
FROM lineitem
WHERE l_shipdate BETWEEN Date AND Date + 1 year
  AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
  AND l_quantity < Quantity

  • Scale factor 10 (~10GB)
  • Selectivity: ~2%

Code sizes (selection): l_shipdate 12 bits, l_discount 4 bits, l_quantity 6 bits
Code sizes (projection): l_extendedprice 24 bits, l_discount 4 bits

slide-74
SLIDE 74

Evaluation - TPC-H Query 6

[Chart: cycles per tuple (y-axis, 0 to 40) for Naive, SIMD, BL, Bit-sliced, BW/H, and BW/V, broken down into scan on l_discount, scan on l_quantity, scan on l_shipdate, and aggregation]

slide-75
SLIDE 75

Evaluation - TPC-H (Denormalized)

  • TPC-H Q4, Q5, Q12, Q14, Q17, Q19
  • Materialized primary-key foreign-key joins in these queries
  • Statistics

– Code size (in selection): 2-12 bits
– Code size (in projection): 3-24 bits
– Predicates: between, less than, equality, inequality, in
– # predicates (in selection): 1-18

slide-76
SLIDE 76

Evaluation - TPC-H (Denormalized)

[Chart: speedup over the Naive method (y-axis, 0 to 30) for TPC-H queries Q4, Q5, Q12, Q14, Q17, and Q19. Series: Naive, SIMD, BL, Bit-sliced, BW/H, BW/V]

slide-77
SLIDE 77

Outline

  • Motivation & Introduction
  • BitWeaving/V
  • BitWeaving/H
  • Evaluations
  • Conclusions


slide-81
SLIDE 81

Conclusions

BitWeaving: A new method to use all the bits in a processor word gainfully.

Two flavors: BitWeaving/H and BitWeaving/V.

BitWeaving is faster than state-of-the-art scan methods, in some cases by an order of magnitude.

slide-82
SLIDE 82

Q & A

  • Thanks
