advanced database systems
play

ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ - PowerPoint PPT Presentation

Lect ure # 09 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 DATA O RGAN IZATIO N Variable-Length Fixed-Length Index Data Blocks Data Blocks Block Id +


  1. Lect ure # 09 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2019

  2. CMU 15-721 (Spring 2019) 2 DATA O RGAN IZATIO N Variable-Length Fixed-Length Index Data Blocks Data Blocks Block Id + Offset

  3. CMU 15-721 (Spring 2019) 3 DATA O RGAN IZATIO N One can think of an in-memory database as just a large array of bytes. → The schema tells the DBMS how to convert the bytes into the appropriate type. → Each tuple is prefixed with a header that contains its meta-data. Storing tuples with as fixed-length data makes it easy to compute the starting point of any tuple.

  4. CMU 15-721 (Spring 2019) 4 Type Representation Data Layout / Alignment Storage Models System Catalogs

  5. CMU 15-721 (Spring 2019) 5 DATA REPRESEN TATIO N INTEGER / BIGINT / SMALLINT / TINYINT → C/C++ Representation FLOAT / REAL vs. NUMERIC / DECIMAL → IEEE-754 Standard / Fixed-point Decimals TIME / DATE / TIMESTAMP → 32/64-bit int of (micro/milli)seconds since Unix epoch VARCHAR / VARBINARY / TEXT / BLOB → Pointer to other location if type is ≥ 64-bits → Header with length and address to next location (if segmented), followed by data bytes.

  6. CMU 15-721 (Spring 2019) 6 VARIABLE PRECISIO N N UM BERS Inexact, variable-precision numeric type that uses the “native” C/C++ types. Store directly as specified by IEEE-754. Typically faster than arbitrary precision numbers. → Example: FLOAT , REAL / DOUBLE

  7. CMU 15-721 (Spring 2019) 7 VARIABLE PRECISIO N N UM BERS Rounding Example #include <stdio.h> Output int main( int argc, char * argv[]) { float x = 0.1; x+y = 0.30000001192092895508 float y = 0.2; 0.3 = 0.29999999999999998890 printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

  8. CMU 15-721 (Spring 2019) 8 FIXED PRECISIO N N UM BERS Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable. → Example: NUMERIC , DECIMAL Typically stored in a exact, variable-length binary representation with additional meta-data. → Like a VARCHAR but not stored as a string Postgres Demo

  9. CMU 15-721 (Spring 2019) 9 PO STGRES: N UM ERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

  10. CMU 15-721 (Spring 2019) 9 PO STGRES: N UM ERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

  11. CMU 15-721 (Spring 2019) 10 DATA LAYO UT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address)

  12. CMU 15-721 (Spring 2019) 11 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much.

  13. CMU 15-721 (Spring 2019) 11 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER id Andy |64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much.

  14. CMU 15-721 (Spring 2019) 12 N ULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null. → Have to use more space than just a single bit because this messes up with word alignment.

  15. CMU 15-721 (Spring 2019) 12 N ULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null. → Have to use more space than just a single bit because this messes up with word alignment.

  16. CMU 15-721 (Spring 2019) 13 DISCLAIM ER The truth is that you only need to worry about word-alignment for cache lines (e.g., 64 bytes). I’m going to show you the basic idea using 64 -bit words since it’s easier to see…

  17. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] id INT PRIMARY KEY , cdate TIMESTAMP , color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  18. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , cdate TIMESTAMP , id color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  19. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  20. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c 16-bits color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  21. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c zipc 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  22. CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c zipc 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  23. CMU 15-721 (Spring 2019) 15 WO RD- ALIGN ED TUPLES Approach #1: Perform Extra Reads → Execute two reads to load the appropriate parts of the data word and reassemble them. Approach #2: Random Reads → Read some unexpected combination of bytes assembled into a 64-bit word. Approach #3: Reject → Throw an exception Source: Levente Kurusa

  24. CMU 15-721 (Spring 2019) 16 WO RD- ALIGN M EN T: PADDIN G Add empty bits after attributes to ensure that tuple is word aligned. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 00000000 0000 64-bits 00000000 0000 cdate TIMESTAMP , id cdate c zipc 00000000 0000 00000000 0000 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  25. CMU 15-721 (Spring 2019) 17 WO RD- ALIGN M EN T: REO RDERIN G Switch the order of attributes in the tuples' physical layout to make sure they are aligned. → May still have to use padding. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 000000000000 64-bits 000000000000 cdate TIMESTAMP , id zipc cdate c 000000000000 000000000000 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

  26. CMU 15-721 (Spring 2019) 18 CM U- DB ALIGN M EN T EXPERIM EN T Processor: 1 socket, 4 cores w/ 2×HT Workload: Insert Microbenchmark Avg. Throughput No Alignment 0.523 MB/sec Optimization #1 11.7 MB/sec Optimization #2 814.8 MB/sec Source: Tianyu Li

  27. CMU 15-721 (Spring 2019) 19 STO RAGE M O DELS N -ary Storage Model (NSM) Decomposition Storage Model (DSM) Hybrid Storage Model COLUMN- STORES VS. ROW- STORES: HOW DIFFERENT ARE THEY REALLY? SIGMOD 2008

  28. CMU 15-721 (Spring 2019) 19 STO RAGE M O DELS N -ary Storage Model (NSM) Decomposition Storage Model (DSM) Hybrid Storage Model COLUMN- STORES VS. ROW- STORES: HOW DIFFERENT ARE THEY REALLY? SIGMOD 2008

  29. CMU 15-721 (Spring 2019) 20 N- ARY STO RAGE M O DEL (N SM ) The DBMS stores all of the attributes for a single tuple contiguously. Ideal for OLTP workloads where txns tend to operate only on an individual entity and insert- heavy workloads. Use the tuple-at-a-time iterator model.

  30. CMU 15-721 (Spring 2019) 21 N SM : PH YSICAL STO RAGE Choice #1: Heap-Organized Tables → Tuples are stored in blocks called a heap. → The heap does not necessarily define an order. Choice #2: Index-Organized Tables → Tuples are stored in the primary key index itself. → Not quite the same as a clustered index.

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend