Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and - - PowerPoint PPT Presentation

lect ure 11 advanced database
SMART_READER_LITE
LIVE PREVIEW

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and - - PowerPoint PPT Presentation

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @ Andy_Pavlo // 15- 721 // Spring 2018 DATABASE TALK Oracle In-Memory Database Engine Monday February 26 th @ 4:30pm GHC 4401


slide-1
SLIDE 1

System Catalogs and Database Compression

@ Andy_Pavlo // 15- 721 // Spring 2018

ADVANCED DATABASE SYSTEMS Lect ure # 11

slide-2
SLIDE 2

CMU 15-721 (Spring 2018)

DATABASE TALK

Oracle In-Memory Database Engine

→ Monday February 26th @ 4:30pm → GHC 4401

http://db.cs.cmu.edu/events/db-seminar-spring- 2018-ajit-mylavarapu-oracle/

slide-3
SLIDE 3

CMU 15-721 (Spring 2018)

System Catalogs Compression Background Naïve Compression OLAP Columnar Compression

3

slide-4
SLIDE 4

CMU 15-721 (Spring 2018)

SYSTEM CATALO GS

Almost every DBMS stores their a database's catalog in itself.

→ Wrap object abstraction around tuples. → Specialized code for "bootstrapping" catalog tables.

The entire DBMS should be aware of transactions in order to automatically provide ACID guarantees for DDL commands and concurrent txns.

4

slide-5
SLIDE 5

CMU 15-721 (Spring 2018)

SCH EM A CH AN GES

ADD COLUMN:

→ NSM: Copy tuples into new region in memory. → DSM: Just create the new column segment

DROP COLUMN:

→ NSM #1: Copy tuples into new region of memory. → NSM #2: Mark column as "deprecated", clean up later. → DSM: Just drop the column and free memory.

CHANGE COLUMN:

→ Check whether the conversion is allowed to happen. Depends on default values.

5

slide-6
SLIDE 6

CMU 15-721 (Spring 2018)

IN DEXES

CREATE INDEX:

→ Scan the entire table and populate the index. → Have to record changes made by txns that modified the table while another txn was building the index. → When the scan completes, lock the table and resolve changes that were missed after the scan started.

DROP INDEX:

→ Just drop the index logically from the catalog. → It only becomes "invisible" when the txn that dropped it

  • commits. All existing txns will still have to update it.

6

slide-7
SLIDE 7

CMU 15-721 (Spring 2018)

SEQ UEN CES

Typically stored in the catalog. Used for maintaining a global counter

→ Also called "auto-increment" or "serial" keys

Sequences are not maintained with the same isolation protection as regular catalog entries.

→ Rolling back a txn that incremented a sequence does not rollback the change to that sequence.

7

slide-8
SLIDE 8

CMU 15-721 (Spring 2018)

O BSERVATIO N

I/O is the main bottleneck if the DBMS has to fetch data from disk. In-memory DBMSs are more complicated

→ Compressing the database reduces DRAM requirements and processing.

Key trade-off is speed vs. compression ratio

→ In-memory DBMSs (always?) choose speed.

8

slide-9
SLIDE 9

CMU 15-721 (Spring 2018)

REAL- WO RLD DATA CH ARACTERISTICS

Data sets tend to have highly skewed distributions for attribute values.

→ Example: Zipfian distribution of the Brown Corpus

Data sets tend to have high correlation between attributes of the same tuple.

→ Example: Zip Code to City, Order Date to Ship Date

9

slide-10
SLIDE 10

CMU 15-721 (Spring 2018)

DATABASE CO M PRESSIO N

Goal #1: Must produce fixed-length values. Goal #2: Allow the DBMS to postpone decompression as long as possible during query execution. Goal #3: Must be a lossless scheme.

10

slide-11
SLIDE 11

CMU 15-721 (Spring 2018)

LO SSLESS VS. LO SSY CO M PRESSIO N

When a DBMS uses compression, it is always lossless because people don’t like losing data. Any kind of lossy compression is has to be performed at the application level. Some new DBMSs support approximate queries

→ Example: BlinkDB, SnappyData, XDB, Oracle (2017)

11

slide-12
SLIDE 12

CMU 15-721 (Spring 2018)

ZO N E M APS

Pre-computed aggregates for blocks of data. DBMS can check the zone map first to decide whether it wants to access the block.

12

Zone Map

val 100 400 280 1400 type MIN MAX AVG SUM 5 COUNT

Original Data

val 100 200 300 400 400

SELECT * FROM table WHERE val > 600

slide-13
SLIDE 13

CMU 15-721 (Spring 2018)

CO M PRESSIO N GRAN ULARITY

Choice #1: Block-level

→ Compress a block of tuples for the same table.

Choice #2: Tuple-level

→ Compress the contents of the entire tuple (NSM-only).

Choice #3: Attribute-level

→ Compress a single attribute value within one tuple. → Can target multiple attributes for the same tuple.

Choice #4: Column-level

→ Compress multiple values for one or more attributes stored for multiple tuples (DSM-only).

13

slide-14
SLIDE 14

CMU 15-721 (Spring 2018)

N AÏVE CO M PRESSIO N

Compress data using a general purpose algorithm. Scope of compression is only based on the data provided as input.

→ LZO (1996), LZ4 (2011), Snappy (2011), Zstd (2015)

Considerations

→ Computational overhead → Compress vs. decompress speed.

14

slide-15
SLIDE 15

CMU 15-721 (Spring 2018)

N AÏVE CO M PRESSIO N

Choice #1: Entropy Encoding

→ More common sequences use less bits to encode, less common sequences use more bits to encode.

Choice #2: Dictionary Encoding

→ Build a data structure that maps data segments to an

  • identifier. Replace those segments in the original data

with a reference to the segments position in the dictionary data structure.

15

slide-16
SLIDE 16

CMU 15-721 (Spring 2018)

MYSQ L IN N O DB CO M PRESSIO N

16

[1,2,4,8] KB

Source: MySQL 5.7 Documentation

Buffer Pool Disk Pages

Compressed page0 mod log Compressed page1 mod log Compressed page2 mod log

slide-17
SLIDE 17

CMU 15-721 (Spring 2018)

MYSQ L IN N O DB CO M PRESSIO N

16

[1,2,4,8] KB

Source: MySQL 5.7 Documentation

Buffer Pool Disk Pages

Compressed page0 mod log Compressed page0 mod log Compressed page1 mod log Compressed page2 mod log

Updates

slide-18
SLIDE 18

CMU 15-721 (Spring 2018)

MYSQ L IN N O DB CO M PRESSIO N

16

16 KB [1,2,4,8] KB

Source: MySQL 5.7 Documentation

Buffer Pool Disk Pages

Uncompressed page0

Compressed page0 mod log Compressed page0 mod log Compressed page1 mod log Compressed page2 mod log

Updates

slide-19
SLIDE 19

CMU 15-721 (Spring 2018)

N AÏVE CO M PRESSIO N

The data has to be decompressed first before it can be read and (potentially) modified.

→ This limits the “scope” of the compression scheme.

These schemes also do not consider the high-level meaning or semantics of the data.

17

slide-20
SLIDE 20

CMU 15-721 (Spring 2018)

O BSERVATIO N

We can perform exact-match comparisons and natural joins on compressed data if predicates and data are compressed the same way.

→ Range predicates are more tricky…

18

SELECT * FROM users WHERE name = 'Andy'

NAME SALARY Andy 99999 Prashanth 88888

slide-21
SLIDE 21

CMU 15-721 (Spring 2018)

O BSERVATIO N

We can perform exact-match comparisons and natural joins on compressed data if predicates and data are compressed the same way.

→ Range predicates are more tricky…

18

SELECT * FROM users WHERE name = 'Andy' SELECT * FROM users WHERE name = XX

NAME SALARY Andy 99999 Prashanth 88888 NAME SALARY

XX AA YY BB

slide-22
SLIDE 22

CMU 15-721 (Spring 2018)

CO LUM N AR CO M PRESSIO N

Run-length Encoding Bitmap Encoding Delta Encoding Incremental Encoding Mostly Encoding Dictionary Encoding

19

slide-23
SLIDE 23

CMU 15-721 (Spring 2018)

RUN- LEN GTH EN CO DIN G

Compress runs of the same value in a single column into triplets:

→ The value of the attribute. → The start position in the column segment. → The # of elements in the run.

Requires the columns to be sorted intelligently to maximize compression opportunities.

21

DATABASE COMPRESSION SIGMOD RECORD 1993

slide-24
SLIDE 24

CMU 15-721 (Spring 2018)

RUN- LEN GTH EN CO DIN G

22

Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M

slide-25
SLIDE 25

CMU 15-721 (Spring 2018)

RUN- LEN GTH EN CO DIN G

22

Compressed Data

id 2 1 4 3 7 6 9 8 sex (F,3,1) (M,0,3) (F,5,1) (M,4,1) (M,6,2)

Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M

RLE Triplet

  • Value
  • Offset
  • Length
slide-26
SLIDE 26

CMU 15-721 (Spring 2018)

RUN- LEN GTH EN CO DIN G

22

Compressed Data

id 2 1 4 3 7 6 9 8 sex (F,3,1) (M,0,3) (F,5,1) (M,4,1) (M,6,2)

Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M

RLE Triplet

  • Value
  • Offset
  • Length
slide-27
SLIDE 27

CMU 15-721 (Spring 2018)

RUN- LEN GTH EN CO DIN G

22

Compressed Data Sorted Data

id 2 1 6 3 9 8 7 4 sex M M M M M M F F id 2 1 6 3 9 7 7 4 sex (F,7,2) (M,0,6)

RLE Triplet

  • Value
  • Offset
  • Length
slide-28
SLIDE 28

CMU 15-721 (Spring 2018)

BITM AP EN CO DIN G

Store a separate Bitmap for each unique value for a particular attribute where an offset in the vector corresponds to a tuple.

→ The ith position in the Bitmap corresponds to the ith tuple in the table. → Typically segmented into chunks to avoid allocating large blocks of contiguous memory.

Only practical if the value cardinality is low.

23

MODEL 204 ARCHITECTURE AND PERFORMANCE High Performance Transaction Systems 1987

slide-29
SLIDE 29

CMU 15-721 (Spring 2018)

BITM AP EN CO DIN G

24

Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M

slide-30
SLIDE 30

CMU 15-721 (Spring 2018)

BITM AP EN CO DIN G

24

Compressed Data Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M id 2 1 4 3 7 6 9 8 M 1 1 1 1 1 1 F 1 1 sex

slide-31
SLIDE 31

CMU 15-721 (Spring 2018)

BITM AP EN CO DIN G

24

Compressed Data Original Data

id 2 1 4 3 7 6 9 8 sex M M F M F M M M id 2 1 4 3 7 6 9 8 M 1 1 1 1 1 1 F 1 1 sex

slide-32
SLIDE 32

CMU 15-721 (Spring 2018)

BITM AP EN CO DIN G: EXAM PLE

Assume we have 10 million tuples. 43,000 zip codes in the US.

→ 10000000 × 32-bits = 40 MB → 10000000 × 43000 = 53.75 GB

Every time a txn inserts a new tuple, we have to extend 43,000 different bitmaps.

25

CREATE TABLE customer_dim ( id INT PRIMARY KEY, name VARCHAR(32), email VARCHAR(64), address VARCHAR(64), zip_code INT );

slide-33
SLIDE 33

CMU 15-721 (Spring 2018)

DELTA EN CO DIN G

Recording the difference between values that follow each other in the same column.

→ The base value can be stored in-line or in a separate look- up table. → Can be combined with RLE to get even better compression ratios.

26

Original Data

time 12:01 12:00 12:03 12:02 12:04 temp 99.4 99.5 99.6 99.5 99.4

slide-34
SLIDE 34

CMU 15-721 (Spring 2018)

DELTA EN CO DIN G

Recording the difference between values that follow each other in the same column.

→ The base value can be stored in-line or in a separate look- up table. → Can be combined with RLE to get even better compression ratios.

26

Original Data

time 12:01 12:00 12:03 12:02 12:04 temp 99.4 99.5 99.6 99.5 99.4

Compressed Data

time +1 12:00 +1 +1 +1 temp

  • 0.1

99.5 +0.1 +0.1

  • 0.2
slide-35
SLIDE 35

CMU 15-721 (Spring 2018)

DELTA EN CO DIN G

Recording the difference between values that follow each other in the same column.

→ The base value can be stored in-line or in a separate look- up table. → Can be combined with RLE to get even better compression ratios.

26

Original Data

time 12:01 12:00 12:03 12:02 12:04 temp 99.4 99.5 99.6 99.5 99.4

Compressed Data

time (+1,4) 12:00 temp

  • 0.1

99.5 +0.1 +0.1

  • 0.2

Compressed Data

time +1 12:00 +1 +1 +1 temp

  • 0.1

99.5 +0.1 +0.1

  • 0.2
slide-36
SLIDE 36

CMU 15-721 (Spring 2018)

IN CREM EN TAL EN CO DIN G

Type of delta encoding whereby common prefixes

  • r suffixes and their lengths are recorded so that

they need not be duplicated. This works best with sorted data.

27

Original Data

rob robbed robbing robot

slide-37
SLIDE 37

CMU 15-721 (Spring 2018)

IN CREM EN TAL EN CO DIN G

Type of delta encoding whereby common prefixes

  • r suffixes and their lengths are recorded so that

they need not be duplicated. This works best with sorted data.

27

Original Data

rob robbed robbing robot

Common Prefix

slide-38
SLIDE 38

CMU 15-721 (Spring 2018)

IN CREM EN TAL EN CO DIN G

Type of delta encoding whereby common prefixes

  • r suffixes and their lengths are recorded so that

they need not be duplicated. This works best with sorted data.

27

Original Data

rob robbed robbing robot

Common Prefix

  • rob
slide-39
SLIDE 39

CMU 15-721 (Spring 2018)

IN CREM EN TAL EN CO DIN G

Type of delta encoding whereby common prefixes

  • r suffixes and their lengths are recorded so that

they need not be duplicated. This works best with sorted data.

27

Original Data

rob robbed robbing robot

Common Prefix

  • rob

robb rob

slide-40
SLIDE 40

CMU 15-721 (Spring 2018)

IN CREM EN TAL EN CO DIN G

Type of delta encoding whereby common prefixes

  • r suffixes and their lengths are recorded so that

they need not be duplicated. This works best with sorted data.

27

Original Data

rob robbed robbing robot

Common Prefix

  • rob

robb rob

Compressed Data

rob bed ing

  • t

3 4 3 Prefix Length Suffix

slide-41
SLIDE 41

CMU 15-721 (Spring 2018)

M O STLY EN CO DIN G

When the values for an attribute are “mostly” less than the largest size, you can store them as a smaller data type.

→ The remaining values that cannot be compressed are stored in their raw form.

28

Source: Redshift Documentation

Original Data

int64 4 2 6 99999999 8

Compressed Data

mostly8 4 2 6 XXX 8

  • ffset

3 value 99999999

slide-42
SLIDE 42

CMU 15-721 (Spring 2018)

DICTIO N ARY CO M PRESSIO N

Replace frequent patterns with smaller codes. Most pervasive compression scheme in DBMSs. Need to support fast encoding and decoding. Need to also support range queries.

29

DICTIONARY- BASED ORDER- PRESERVING STRING COMPRESSION FOR MAIN MEMORY COLUMN STORES SIGMOD 2009

slide-43
SLIDE 43

CMU 15-721 (Spring 2018)

DICTIO N ARY CO M PRESSIO N

When to construct the dictionary? What is the scope of the dictionary? How do we allow for range queries? How do we enable fast encoding/decoding?

30

slide-44
SLIDE 44

CMU 15-721 (Spring 2018)

DICTIO N ARY CO N STRUCTIO N

Choice #1: All At Once

→ Compute the dictionary for all the tuples at a given point

  • f time.

→ New tuples must use a separate dictionary or the all tuples must be recomputed.

Choice #2: Incremental

→ Merge new tuples in with an existing dictionary. → Likely requires re-encoding to existing tuples.

31

slide-45
SLIDE 45

CMU 15-721 (Spring 2018)

DICTIO N ARY SCO PE

Choice #1: Block-level

→ Only include a subset of tuples within a single table. → Potentially lower compression ratio, but can add new tuples more easily.

Choice #2: Table-level

→ Construct a dictionary for the entire table. → Better compression ratio, but expensive to update.

Choice #3: Multi-Table

→ Can be either subset or entire tables. → Sometimes helps with joins and set operations.

32

slide-46
SLIDE 46

CMU 15-721 (Spring 2018)

M ULTI- ATTRIBUTE EN CO DIN G

Instead of storing a single value per dictionary entry, store entries that span attributes.

→ I’m not sure any DBMS actually implements this.

33

Original Data

202 val2 101 202 101 202 101 val1 B A C A B A 101 C 101 B

slide-47
SLIDE 47

CMU 15-721 (Spring 2018)

M ULTI- ATTRIBUTE EN CO DIN G

Instead of storing a single value per dictionary entry, store entries that span attributes.

→ I’m not sure any DBMS actually implements this.

33

Original Data Compressed Data

202 val2 101 202 101 202 101 val1 B A C A B A 101 C 101 B val2 101 202 101 val1 B A C code YY XX ZZ val1+val2 YY XX ZZ XX YY ZZ XX YY

slide-48
SLIDE 48

CMU 15-721 (Spring 2018)

EN CO DIN G / DECO DIN G

A dictionary needs to support two operations:

→ Encode: For a given uncompressed value, convert it into its compressed form. → Decode: For a given compressed value, convert it back into its original form.

No magic hash function will do this for us. We need two data structures to support

  • perations in both directions.

34

slide-49
SLIDE 49

CMU 15-721 (Spring 2018)

O RDER- PRESERVIN G CO M PRESSIO N

The encoded values need to support sorting in the same order as original values.

35

Original Data

name Andrea Prashanth Andy Dana

slide-50
SLIDE 50

CMU 15-721 (Spring 2018)

O RDER- PRESERVIN G CO M PRESSIO N

The encoded values need to support sorting in the same order as original values.

35

Original Data

name Andrea Prashanth Andy Dana

Compressed Data

code 10 20 30 40 value Andrea Andy Dana Prashanth name 10 40 20 30

slide-51
SLIDE 51

CMU 15-721 (Spring 2018)

O RDER- PRESERVIN G CO M PRESSIO N

The encoded values need to support sorting in the same order as original values.

35

SELECT * FROM users WHERE name LIKE 'And%' Original Data

name Andrea Prashanth Andy Dana

Compressed Data

code 10 20 30 40 value Andrea Andy Dana Prashanth name 10 40 20 30

SELECT * FROM users WHERE name BETWEEN 10 AND 20

slide-52
SLIDE 52

CMU 15-721 (Spring 2018)

O RDER- PRESERVIN G CO M PRESSIO N

36

SELECT name FROM users WHERE name LIKE 'And%'

Still have to perform seq scan

Original Data

name Andrea Prashanth Andy Dana

Compressed Data

code 10 20 30 40 value Andrea Andy Dana Prashanth name 10 40 20 30

slide-53
SLIDE 53

CMU 15-721 (Spring 2018)

O RDER- PRESERVIN G CO M PRESSIO N

36

SELECT name FROM users WHERE name LIKE 'And%' SELECT DISTINCT name FROM users WHERE name LIKE 'And%'

Still have to perform seq scan Only need to access dictionary

Original Data

name Andrea Prashanth Andy Dana

Compressed Data

code 10 20 30 40 value Andrea Andy Dana Prashanth name 10 40 20 30

slide-54
SLIDE 54

CMU 15-721 (Spring 2018)

DICTIO N ARY IM PLEM EN TATIO N S

Hash Table:

→ Fast and compact. → Unable to support range and prefix queries.

B+Tree:

→ Slower than a hash table and takes more memory. → Can support range and prefix queries.

37

slide-55
SLIDE 55

CMU 15-721 (Spring 2018)

SH ARED- LEAVES TREES

38

DICTIONARY- BASED ORDER- PRESERVING STRING COMPRESSION FOR MAIN MEMORY COLUMN STORES SIGMOD 2009

Decode Index Encode Index Decode Index

value aab code 10 aae 20 aaf 30 aaz 40 value zzb code 960 zzm 970 zzx 980 zzz 990

Sorted Shared Leaf

slide-56
SLIDE 56

CMU 15-721 (Spring 2018)

SH ARED- LEAVES TREES

38

DICTIONARY- BASED ORDER- PRESERVING STRING COMPRESSION FOR MAIN MEMORY COLUMN STORES SIGMOD 2009

Decode Index Encode Index Decode Index

value aab code 10 aae 20 aaf 30 aaz 40 value zzb code 960 zzm 970 zzx 980 zzz 990

Original Value Encoded Value

Sorted Shared Leaf

slide-57
SLIDE 57

CMU 15-721 (Spring 2018)

SH ARED- LEAVES TREES

38

DICTIONARY- BASED ORDER- PRESERVING STRING COMPRESSION FOR MAIN MEMORY COLUMN STORES SIGMOD 2009

Decode Index Encode Index Decode Index

value aab code 10 aae 20 aaf 30 aaz 40 value zzb code 960 zzm 970 zzx 980 zzz 990

Original Value Encoded Value Encoded Value Original Value

Sorted Shared Leaf

slide-58
SLIDE 58

CMU 15-721 (Spring 2018)

SH ARED- LEAVES TREES

38

DICTIONARY- BASED ORDER- PRESERVING STRING COMPRESSION FOR MAIN MEMORY COLUMN STORES SIGMOD 2009

Decode Index Encode Index Decode Index

value aab code 10 aae 20 aaf 30 aaz 40 value zzb code 960 zzm 970 zzx 980 zzz 990

Original Value Encoded Value Encoded Value Original Value

Sorted Shared Leaf Incremental Encoding

slide-59
SLIDE 59

CMU 15-721 (Spring 2018)

PARTIN G TH O UGH TS

Dictionary encoding is probably the most useful compression scheme because it does not require pre-sorting. The DBMS can combine different approaches for even better compression. It is important to wait as long as possible during query execution to decompress data.

39

slide-60
SLIDE 60

CMU 15-721 (Spring 2018)

N EXT CLASS

Physical vs. Logical Logging

40