ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ - PowerPoint PPT Presentation

Lect ure # 09 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2019

CMU 15-721 (Spring 2019) 2 DATA O RGAN IZATIO N Variable-Length Fixed-Length Index Data Blocks Data Blocks Block Id + Offset

CMU 15-721 (Spring 2019) 3 DATA O RGAN IZATIO N One can think of an in-memory database as just a large array of bytes. → The schema tells the DBMS how to convert the bytes into the appropriate type. → Each tuple is prefixed with a header that contains its meta-data. Storing tuples with as fixed-length data makes it easy to compute the starting point of any tuple.

CMU 15-721 (Spring 2019) 4 Type Representation Data Layout / Alignment Storage Models System Catalogs

CMU 15-721 (Spring 2019) 5 DATA REPRESEN TATIO N INTEGER / BIGINT / SMALLINT / TINYINT → C/C++ Representation FLOAT / REAL vs. NUMERIC / DECIMAL → IEEE-754 Standard / Fixed-point Decimals TIME / DATE / TIMESTAMP → 32/64-bit int of (micro/milli)seconds since Unix epoch VARCHAR / VARBINARY / TEXT / BLOB → Pointer to other location if type is ≥ 64-bits → Header with length and address to next location (if segmented), followed by data bytes.

CMU 15-721 (Spring 2019) 6 VARIABLE PRECISIO N N UM BERS Inexact, variable-precision numeric type that uses the “native” C/C++ types. Store directly as specified by IEEE-754. Typically faster than arbitrary precision numbers. → Example: FLOAT , REAL / DOUBLE

CMU 15-721 (Spring 2019) 7 VARIABLE PRECISIO N N UM BERS Rounding Example #include <stdio.h> Output int main( int argc, char * argv[]) { float x = 0.1; x+y = 0.30000001192092895508 float y = 0.2; 0.3 = 0.29999999999999998890 printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

CMU 15-721 (Spring 2019) 8 FIXED PRECISIO N N UM BERS Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable. → Example: NUMERIC , DECIMAL Typically stored in a exact, variable-length binary representation with additional meta-data. → Like a VARCHAR but not stored as a string Postgres Demo

CMU 15-721 (Spring 2019) 9 PO STGRES: N UM ERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

CMU 15-721 (Spring 2019) 10 DATA LAYO UT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address)

CMU 15-721 (Spring 2019) 11 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much.

CMU 15-721 (Spring 2019) 11 VARIABLE- LEN GTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER id Andy |64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT Andy has the worst VALUES (" Andy has the worst hygiene that I have ever hygiene that I have ever seen. I hate seen. I hate him so much. " ); LENGTH NEXT him so much.

CMU 15-721 (Spring 2019) 12 N ULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null. → Have to use more space than just a single bit because this messes up with word alignment.

CMU 15-721 (Spring 2019) 13 DISCLAIM ER The truth is that you only need to worry about word-alignment for cache lines (e.g., 64 bytes). I’m going to show you the basic idea using 64 -bit words since it’s easier to see…

CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] id INT PRIMARY KEY , cdate TIMESTAMP , color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , cdate TIMESTAMP , id color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c 16-bits color CHAR(2) , 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 14 WO RD- ALIGN ED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 64-bits cdate TIMESTAMP , id cdate c zipc 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 15 WO RD- ALIGN ED TUPLES Approach #1: Perform Extra Reads → Execute two reads to load the appropriate parts of the data word and reassemble them. Approach #2: Random Reads → Read some unexpected combination of bytes assembled into a 64-bit word. Approach #3: Reject → Throw an exception Source: Levente Kurusa

CMU 15-721 (Spring 2019) 16 WO RD- ALIGN M EN T: PADDIN G Add empty bits after attributes to ensure that tuple is word aligned. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 00000000 0000 64-bits 00000000 0000 cdate TIMESTAMP , id cdate c zipc 00000000 0000 00000000 0000 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 17 WO RD- ALIGN M EN T: REO RDERIN G Switch the order of attributes in the tuples' physical layout to make sure they are aligned. → May still have to use padding. CREATE TABLE AndySux ( char[] 32-bits id INT PRIMARY KEY , 000000000000 64-bits 000000000000 cdate TIMESTAMP , id zipc cdate c 000000000000 000000000000 16-bits color CHAR(2) , 32-bits 64-bit Word 64-bit Word 64-bit Word 64-bit Word zipcode INT );

CMU 15-721 (Spring 2019) 18 CM U- DB ALIGN M EN T EXPERIM EN T Processor: 1 socket, 4 cores w/ 2×HT Workload: Insert Microbenchmark Avg. Throughput No Alignment 0.523 MB/sec Optimization #1 11.7 MB/sec Optimization #2 814.8 MB/sec Source: Tianyu Li

CMU 15-721 (Spring 2019) 19 STO RAGE M O DELS N -ary Storage Model (NSM) Decomposition Storage Model (DSM) Hybrid Storage Model COLUMN- STORES VS. ROW- STORES: HOW DIFFERENT ARE THEY REALLY? SIGMOD 2008

CMU 15-721 (Spring 2019) 20 N- ARY STO RAGE M O DEL (N SM ) The DBMS stores all of the attributes for a single tuple contiguously. Ideal for OLTP workloads where txns tend to operate only on an individual entity and insert- heavy workloads. Use the tuple-at-a-time iterator model.

CMU 15-721 (Spring 2019) 21 N SM : PH YSICAL STO RAGE Choice #1: Heap-Organized Tables → Tuples are stored in blocks called a heap. → The heap does not necessarily define an order. Choice #2: Index-Organized Tables → Tuples are stored in the primary key index itself. → Not quite the same as a clustered index.

ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ - PowerPoint PPT Presentation

Lect ure # 09 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 DATA O RGAN IZATIO N Variable-Length Fixed-Length Index Data Blocks Data Blocks Block Id +

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

A Graphical Notation for Design Pattern Detection Haneen Dabain Department of Computer Science

Data-Driven Reserve Prices for Social Advertising Auctions at LinkedIn Tingting Cui Lijun Peng

Information Systems An Overview of Database Management Nikolaj Popov Research Institute for

Tom Nichols VP PC Games, North America Aeria Games & Entertainment Agenda Aeria Games?

Strigi in KDE4 the power of indices Jos van den Oever Strigi aKademy 2007 History of free

J/ production in high multiplicity events in proton-proton collisions at the ALICE experiment

Content Management Systems Week 9 INFM 603 Muddiest Points How JSON differs from XML

Giving a science research talk How to (and to not) leave your audience behind Brian W. Mulligan

Sambuz

Useful Links

Newsletter

Mail Us

ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ - PowerPoint PPT Presentation

Lect ure # 09 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 DATA O RGAN IZATIO N Variable-Length Fixed-Length Index Data Blocks Data Blocks Block Id +

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

A Graphical Notation for Design Pattern Detection Haneen Dabain Department of Computer Science

Data-Driven Reserve Prices for Social Advertising Auctions at LinkedIn Tingting Cui Lijun Peng

Information Systems An Overview of Database Management Nikolaj Popov Research Institute for

Tom Nichols VP PC Games, North America Aeria Games &amp; Entertainment Agenda Aeria Games?

Strigi in KDE4 the power of indices Jos van den Oever Strigi aKademy 2007 History of free

J/ production in high multiplicity events in proton-proton collisions at the ALICE experiment

Content Management Systems Week 9 INFM 603 Muddiest Points How JSON differs from XML

Giving a science research talk How to (and to not) leave your audience behind Brian W. Mulligan

Sambuz

Useful Links

Newsletter

Mail Us

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Tom Nichols VP PC Games, North America Aeria Games & Entertainment Agenda Aeria Games?