Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data - PowerPoint PPT Presentation

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2018

3 Type Representation In-Memory Data Layout Storage Models System Catalogs CMU 15-721 (Spring 2018)

4 DATA O RGAN IZATIO N Index Fixed-Length Variable-Length Data Blocks Data Blocks Block Id + Offset CMU 15-721 (Spring 2018)

5 DATA O RGAN IZATIO N One can think of an in-memory database as just a large array of bytes. → The schema tells the DBMS how to convert the bytes into the appropriate type. → Each tuple is prefixed with a header that contains its meta-data. Storing tuples with as fixed-length data makes it easy to compute the starting point of any tuple. Mapping virtual memory pages to database pages. CMU 15-721 (Spring 2018)

6 M EM O RY PAGES OS maps physical pages to virtual memory pages. The CPU's MMU maintains a TLB that contains the physical address of a virtual memory page. → The TLB resides in the CPU caches. → It can't obviously store every possible every possible entry for a large memory machine. When you allocate a block of memory, the allocator keeps that it aligned to page boundaries. CMU 15-721 (Spring 2018)

7 TRAN SPAREN T H UGE PAGES Maintain larger pages automatically (2MB to 1GB) → Each page has to be a contiguous blocks of memory. → Greatly reduces the # of TLB entries With THP, the OS will to reorganize pages in the background to keep things compact. → Split larger pages into smaller pages. → Combine smaller pages into larger pages. → Can cause the DBMS process to stall on memory access. Almost every DBMS says to disable this feature: → Oracle, MemSQL, NuoDB, MongoDB, Sybase IQ Source: Alexandr Nikitin CMU 15-721 (Spring 2018)

8 DATA REPRESEN TATIO N INTEGER / BIGINT / SMALLINT / TINYINT → C/C++ Representation FLOAT / REAL vs. NUMERIC / DECIMAL → IEEE-754 Standard / Fixed-point Decimals VARCHAR / VARBINARY / TEXT / BLOB → Pointer to other location if type is ≥ 64-bits → Header with length and address to next location (if segmented), followed by data bytes. TIME / DATE / TIMESTAMP → 32/64-bit integer of (micro)seconds since Unix epoch CMU 15-721 (Spring 2018)

9 VARIABLE PRECISIO N N UM BERS Inexact, variable-precision numeric type that uses the “native” C/C++ types. Store directly as specified by IEEE-754. Typically faster than arbitrary precision numbers. → Example: FLOAT , REAL / DOUBLE CMU 15-721 (Spring 2018)

10 VARIABLE PRECISIO N N UM BERS Rounding Example #include <stdio.h> Output int main( int argc, char * argv[]) { float x = 0.1; x+y = 0.30000001192092895508 float y = 0.2; 0.3 = 0.29999999999999998890 printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); } CMU 15-721 (Spring 2018)

11 FIXED PRECISIO N N UM BERS Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable. → Example: NUMERIC , DECIMAL Typically stored in a exact, variable-length binary representation with additional meta-data. → Like a VARCHAR but not stored as a string Demo… CMU 15-721 (Spring 2018)

12 PO STGRES: N UM ERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage CMU 15-721 (Spring 2018)

13 DATA LAYO UT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); CMU 15-721 (Spring 2018)

13 DATA LAYO UT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address) CMU 15-721 (Spring 2018)

14 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much. CMU 15-721 (Spring 2018)

14 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER Andy |64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much. CMU 15-721 (Spring 2018)

15 N ULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null. → Have to use more space than just a single bit because this messes up with word alignment. CMU 15-721 (Spring 2018)

16 DISCLAIM ER The truth is that you only need to worry about word-alignment for cache lines (e.g., 64 bytes). I’m going to show you the basic idea using 64 -bit words since it’s easier to see… CMU 15-721 (Spring 2018)

17 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] id INT PRIMARY KEY , cdate TIMESTAMP , color CHAR(2) , zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

17 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , cdate TIMESTAMP , id color CHAR(2) , zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

17 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate color CHAR(2) , zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

17 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c 16-bits color CHAR(2) , zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

17 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c zipc 16-bits color CHAR(2) , 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

18 WO RD- ALIGN ED TUPLES If the CPU fetches a 64-bit value that is not word- aligned, it has three choices: → Execute two reads to load the appropriate parts of the data word and reassemble them. → Read some unexpected combination of bytes assembled into a 64-bit word. → Throw an exception Source: Levente Kurusa CMU 15-721 (Spring 2018)

19 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 00000000 0000 64-bits 00000000 0000 cdate TIMESTAMP , id cdate c zipc 00000000 0000 00000000 0000 16-bits color CHAR(2) , 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); CMU 15-721 (Spring 2018)

20 STO RAGE M O DELS N -ary Storage Model (NSM) Decomposition Storage Model (DSM) Hybrid Storage Model CMU 15-721 (Spring 2018)

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data - PowerPoint PPT Presentation

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2018 3 Type Representation In-Memory Data Layout Storage Models System Catalogs CMU 15-721 (Spring 2018) 4 DATA O RGAN IZATIO N

Lect 12a - Delaunay Triangulations Lect 12b - Delaunay Triangulations Lect 12c - Delaunay

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Lect ure # 13 ADVANCED DATABASE SYSTEMS Checkpoint Protocols @ Andy_Pavlo // 15- 721 //

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ Andy_Pavlo // 15- 721 // Spring

Lect ure # 14 ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2018 2

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Part II) @

Lect ure # 24 ADVANCED DATABASE SYSTEMS Databases on New Hardware @ Andy_Pavlo // 15- 721 //

Lect ure # 22 ADVANCED DATABASE SYSTEMS Vectorized Execution (Part II) @ Andy_Pavlo // 15-

Lect 14a - Line Arrangements: Definitions and Zone Theorem Lect 14b - Line Arrangements:

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Taking it all Offline with SQL Anywhere Eric Farrar, Product Manager Sybase iAnywhere March 5,

Magnitudes in the University of Utah Instrumental Earthquake Catalog: July 1962 - present

Chapter 11 Current Programmed Control Buck converter L i s (t) i L (t) The peak transistor

EECS 192: Mechatronics Design Lab Discussion 4: Power Systems written by: Richard Ducky Lin

Basic Rails Projects Professor Larry Heimann Application Design & Development Information

Introduction slide 3 Example DB slide 6

Database Access via Programming Languages SQL is a direct query language; as such, it has

Module 5: XML Modeling & Storage The major aspects of storing XML include Concepts: Data and

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data - PowerPoint PPT Presentation

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2018 3 Type Representation In-Memory Data Layout Storage Models System Catalogs CMU 15-721 (Spring 2018) 4 DATA O RGAN IZATIO N

Lect 12a - Delaunay Triangulations Lect 12b - Delaunay Triangulations Lect 12c - Delaunay

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

Lect ure # 13 ADVANCED DATABASE SYSTEMS Checkpoint Protocols @ Andy_Pavlo // 15- 721 //

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ Andy_Pavlo // 15- 721 // Spring

Lect ure # 14 ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2018 2

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Part II) @

Lect ure # 24 ADVANCED DATABASE SYSTEMS Databases on New Hardware @ Andy_Pavlo // 15- 721 //

Lect ure # 22 ADVANCED DATABASE SYSTEMS Vectorized Execution (Part II) @ Andy_Pavlo // 15-

Lect 14a - Line Arrangements: Definitions and Zone Theorem Lect 14b - Line Arrangements:

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Taking it all Offline with SQL Anywhere Eric Farrar, Product Manager Sybase iAnywhere March 5,

Magnitudes in the University of Utah Instrumental Earthquake Catalog: July 1962 - present

Chapter 11 Current Programmed Control Buck converter L i s (t) i L (t) The peak transistor

EECS 192: Mechatronics Design Lab Discussion 4: Power Systems written by: Richard Ducky Lin

Basic Rails Projects Professor Larry Heimann Application Design &amp; Development Information

Introduction slide 3 Example DB slide 6

Database Access via Programming Languages SQL is a direct query language; as such, it has

Module 5: XML Modeling &amp; Storage The major aspects of storing XML include Concepts: Data and

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Basic Rails Projects Professor Larry Heimann Application Design & Development Information

Module 5: XML Modeling & Storage The major aspects of storing XML include Concepts: Data and