DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #3: STORAGE MODELS

2 LAST CLASS Implications of availability of large DRAM chips for database systems → Buffer Management → Concurrency Control → Logging and Recovery → Query Processing How do these components fit together? How does a SQL query get executed within the system?

3 ANATOMY OF A DATABASE SYSTEM Process Manager Connection Manager + Admission Control Query Parser Query Processor Query Optimizer Query Executor Query Lock Manager (Concurrency Control) Transactional Access Methods (or Indexes) Storage Manager Buffer Pool Manager Log Manager Shared Utilities Memory Manager + Disk Manager Networking Manager Source: Anatomy of a Database System

4 ANATOMY OF A DATABASE SYSTEM Process Manager Connection Manager + Admission Control Query Parser Query Processor Query Optimizer Query Executor Query Lock Manager (Concurrency Control) Transactional Access Methods (or Indexes) Storage Manager Buffer Pool Manager Log Manager Shared Utilities Memory Manager + Disk Manager Networking Manager Source: Anatomy of a Database System

5 ANATOMY OF A DATABASE SYSTEM Process Manager → Manages client connections Query Processor → Parse, plan and execute queries on top of storage manager Transactional Storage Manager → Knits together buffer management, concurrency control, logging and recovery Shared Utilities → Manage hardware resources across threads

6 TODAY’S AGENDA Field Storage Format (Type Representation) Tuple Storage Format Table Storage Format (Storage Models)

7 DATA ORGANIZATION Index Fixed-Length Variable-Length Data Blocks Data Blocks Block Id + Offset

8 DATA ORGANIZATION One can think of an in-memory database as just a large array of bytes. → The schema tells the DBMS how to convert the bytes into the appropriate type (e.g., INTEGER , DATE ). → Each tuple is prefixed with a header that contains meta- data (e.g., last modified time-stamp). Storing tuples with as fixed-length data makes it easy to compute the starting point of any tuple. → No tuple indirection array as in the case of slotted pages with variable-length tuples in disk-oriented systems

9 MEMORY PAGES OS maps physical pages to virtual memory pages. The CPU's MMU maintains a TLB that contains the physical address of a virtual memory page. → The TLB resides in the CPU caches. → It can't obviously store every possible every possible entry for a large memory machine. When you allocate a block, the memory allocator keeps that it aligned to page boundaries to reduce memory fragmentation (e.g., glibc malloc ).

10 TRANSPARENT HUGE PAGES (THP) Maintain larger pages automatically (2MB to 1GB) → Each page has to be a contiguous blocks of memory. → Greatly reduces the # of TLB entries (meta-data) With THP, the OS will to reorganize pages in the background to keep things compact. → Split larger pages into smaller pages. → Combine smaller pages into larger pages. → Can cause the DBMS process to stall on memory access. Almost every DBMS says to disable this feature: → Oracle, MemSQL, NuoDB, MongoDB, Sybase IQ Source: Alexandr Nikitin

11 FIELD STORAGE FORMAT (TYPES) INTEGER / BIGINT / SMALLINT / TINYINT → C/C++ Representation (endianness depends on CPU) FLOAT / REAL/DOUBLE vs. NUMERIC / DECIMAL → Floating-point Decimals / Fixed-point Decimals VARCHAR / VARBINARY / TEXT / BLOB → Pointer to other location if type is ≥ 64-bits → Header with length and address to next location (if segmented), followed by data bytes. TIME / DATE / TIMESTAMP → 32/64-bit integer of (micro)seconds since Unix epoch

12 VARIABLE PRECISION NUMBERS Inexact, variable-precision floating point type that uses the “native” C/C++ types → Example: FLOAT , REAL , DOUBLE → FLOAT(n) : n is number of bits that are used to store the mantissa of the float number → REAL = FLOAT(24) → DOUBLE = FLOAT(53) Store directly as specified by IEEE-754. Typically faster than fixed precision numbers.

13 VARIABLE PRECISION NUMBERS Rounding Example #include <stdio.h> int main( int argc, char * argv[]) { float x = 0.1; float y = 0.2; printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

14 VARIABLE PRECISION NUMBERS Rounding Example #include <stdio.h> Output int main( int argc, char * argv[]) { float x = 0.1; x+y = 0.30000001192092895508 float y = 0.2; 0.3 = 0.29999999999999998890 printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

15 FIXED PRECISION NUMBERS Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable. → Example: NUMERIC , DECIMAL → NUMERIC = DECIMAL Typically stored in a exact, variable-length binary representation with additional meta-data. → Like a VARCHAR but not stored as a string → 2 times slower to sum one million values.

16 POSTGRES: NUMERIC typedef unsigned char NumericDigit; typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

17 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

18 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

19 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; NumericDigit *digits; } numeric;

20 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric;

21 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

24 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( id INT PRIMARY KEY , value BIGINT );

25 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT );

28 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address)

29 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address) Reinterpret cast does not compile to any instructions. It instructs the compiler to treat the sequence of bits as if it is of <int32_t> type.

30 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux VALUES (” His jokes are the worst that I have ever heard. I hate him so much. " );

31 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux VALUES (” His jokes are the worst that I have ever heard. I hate him so much. " );

32 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT His jokes are the VALUES (” His jokes are the worst that I have ever heard. worst that I have ever heard. I hate I hate him so much. " ); LENGTH NEXT him so much.

33 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER His |64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT His jokes are the VALUES (” His jokes are the worst that I have ever heard. worst that I have ever heard. I hate I hate him so much. " ); LENGTH NEXT him so much.

34 NULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Limits the number of columns. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null (not in header). → Have to use more space than just a single bit because this messes up with word alignment. Increases column size.

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #3: STORAGE MODELS 2 LAST CLASS Implications of availability of large DRAM chips for database systems Buffer Management Concurrency Control Logging

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

CS411: Two Perspectives on DBMS User perspective CS411 how to use a database system

Database Systems Database Systems 1 Creating a Database System Design Construction

National Address Database National Address Database What is a National Address Database?

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

DATABASE SYSTEMS Introduction to MySQL Database System Course, 2016 AGENDA FOR TODAY

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Database Management System (DBMS) DBMS contains information about a particular enterprise

Distributed Databases 1 19.1 Distributed Database System A distributed database system

Distributed Databases Distributed database management system A distributed database (DDB) is

CS 10: Problem solving via Object Oriented Programming Info Retrieval ADT Overview List

CIS 500 Software Foundations Recursion Fall 2005 2 November CIS 500, 2

Darrell Bethea May 12, 2011 1 Homework 0 due tonight Grades will be posted on Blackboard

LSM-trie: An LSM-tree-based Ultra- Large Key-Value Store for Small Data Xingbo Wu, Yuehai Xu, Zili

NON-GAAP FINANCIAL MEASURES Quarter Ended September 30, 2018 1 NON-GAAP FINANCIAL MEASURES We

This presentation contains certain forward-looking statements within the meaning our properties

Network Economics -- Lecture 3: Incen5ves in online

Fourth-Quarter 2018 Results January 30, 2019 Safe Harbor This presentation includes forward -