Data Management Systems Access Methods Data representation in - - PowerPoint PPT Presentation

data management systems
SMART_READER_LITE
LIVE PREVIEW

Data Management Systems Access Methods Data representation in - - PowerPoint PPT Presentation

Data Management Systems Access Methods Data representation in memory Pages and Blocks Indexing Slotted pages Access Methods in context Records Compression Gustavo Alonso Institute of Computing Platforms Department of Computer


slide-1
SLIDE 1

Data Management Systems

  • Access Methods
  • Pages and Blocks
  • Indexing
  • Access Methods in context

Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich

1 Pages and Blocks

Data representation in memory Slotted pages Records Compression

slide-2
SLIDE 2

Architecture of a database

Blocks, files, segments Pages in memory Physical records Logical records (tuples) Logical data (tables, schemas) Relations, views Queries, Transactions (SQL) Record Interface Record Access Page access File Access Application Logical view (logical data) Access Paths Physical data in memory Page structure Storage allocation

2 Pages and Blocks

Cloud storage and file system (Amazon S3)

slide-3
SLIDE 3

Goals of the lecture

  • Discuss how data is represented within blocks
  • Block/Page organization
  • Slotted pages
  • Record identifiers (tuple identifiers)
  • Header information for blocks and tuples
  • Column vs Row storage
  • Compression

Pages and Blocks 3

slide-4
SLIDE 4

In perspective

  • Data in a database is stored in blocks
  • Blocks are parts of extents
  • Extents are part of segments
  • The notion of pages in a database
  • Hardware pages (the atomic unit to write to storage, usually 4KB)
  • OS Page (the unit used by the OS to implement virtual memory, usually 4KB)
  • Database Page = a block, anywhere between 512B and 32 KB; examples:

Oracle 2KB to 32KB (typical 8KB), MS SQL Server 8KB, MySQL 16KB

  • Trend is towards larger block sizes (incurs less overhead)

Pages and Blocks 4

slide-5
SLIDE 5

Blocks and pages

Pages and Blocks 5

slide-6
SLIDE 6

Finding the page you need

  • Segment – Extent – Block/Page
  • The pages allocated to a given object (a table, an index, and captured in a segment) are

managed through lists

  • These lists are stored as part of the segment header
  • See “Segments and File Storage” slide 28

Pages and Blocks 6

USED FREE Block ID (Extent,Offset) Block ID (Extent,Offset) Space Available

slide-7
SLIDE 7

Potential bottleneck

  • The free and used lists of a segment are a potential bottleneck,

especially for transactions that will result in modifications

  • How these lists are implemented affects performance:
  • Use several free lists so that concurrent transactions can look for free space in

parallel rather than conflict on access to the free list

  • Make the traversal of the free list fast and keep the list small (shorter the

larger the pages, sort the free list by available size, cache positions …)

  • Make sure holes can be efficiently found (store the available space in each

page in incremental steps by using a small amount of bits)

Pages and Blocks 7

slide-8
SLIDE 8

Finding tuples within a page

  • Slotted pages:
  • Each page has a header (checksum,

version, transaction visibility, compression information, utilization, etc.)

  • Each tuple gets an id (typically, block ID

and offset)

  • The page maintains a list of the “slots”

storing tuples in a page by storing a pointer (offset) to the beginning of each tuple

  • Needed to store tuples of different sizes

Pages and Blocks 8

HEADER

slide-9
SLIDE 9

Why slotted pages

  • To support variable length tuples
  • Tuples get a permanent tuple id (record id) that does not change
  • When data does not change, it uses the space very efficiently (this is the

most common case)

  • When data changes, it requires careful management:
  • If a tuple is modified and becomes larger, use the original space to store a pointer to

the new location of the tuple (can be in another block)

  • If a tuple is deleted, just remove the pointer
  • If a tuple is modified and becomes smaller, just leave the unnecessary space empty
  • For insertion, look for a page with enough free space and compact the page if

needed

Pages and Blocks 9

slide-10
SLIDE 10

Block structure: slotted pages

  • A block is structured as follows:
  • Header: address and type of segment

(index, table, etc.)

  • Table directory: schema of the table

stored in the block

  • Row directory: pointers to the actual

tuples stored in the block

  • Free space
  • Row data (tuples) stored in the block
  • The directory grows downwards, the

space for tuples is used upwards

Pages and Blocks 10

slide-11
SLIDE 11

Optimizing the use of blocks I

  • Percentage Free
  • Determines how much space in

each block is reserved for updating tuples instead of using it for storing new tuples

  • This is needed because an update

can result in a bigger tuple than the original one UPDATE T SET Adress = “AStreetWithAVeryLongName” WHERE LegiNr = 12345678

Pages and Blocks 11

slide-12
SLIDE 12

Percentage Free

  • The use of “Percentage Free” is to

avoid the fragmentation that would

  • ccur if pages do not have enough

space to modify a tuple that becomes larger:

  • Every modification will result in an

indirection to another page

  • A few of them are acceptable, too many

would slow down access

  • This space is reserved for growing

tuples rather than for inserting

Pages and Blocks 12

slide-13
SLIDE 13

Optimizing the use of Blocks II

  • Percentage used
  • Determines how much space needs to

be free in a block before the free space can be used to insert new tuples

  • Blocks are unavailable to inserting new

tuples until they have the given amount

  • f free space
  • It is needed because if updates can

make tuples smaller (freeing up space), they can also make them bigger (needing space). The combination of both parameters avoids thrashing on the page

Pages and Blocks 13

slide-14
SLIDE 14

Percentage Used

  • The use of “Percentage Used” is to

avoid having to constantly move a block from the used list to the free list

  • The block keeps track of how much space

it has used and it is free

  • Only when enough free can it be used for

inserting (probably for several inserts)

  • Otherwise a block can constantly go

from FREE to USED to FREE with single tuple deletes and inserts, adding

  • verhead in managing the lists

Pages and Blocks 14

INSERT Tuple 1 Tuple 2 Tuple 3 … Space for one tuple

slide-15
SLIDE 15

Pages and Blocks 15

slide-16
SLIDE 16

Record Layout

Pages and Blocks 16

slide-17
SLIDE 17

Structure of a record (tuple)

  • A tuple contains:
  • Header (validity flags for deletion,

visibility info for concurrency control, bit map of null values, …)

  • Attributes
  • Data for each non-null attribute (or a

pointer to the data)

  • Relational engines do not store

schema information in the tuple (types of the attributed are known), schema-less systems need to store the structure of the tuple since everyone can be different

Pages and Blocks 17

ID Name Dept. Sem. 1 John D-INFK 3 2 Mary D-ITET 5 4 6 John D-INFK 3 4 6 Mary D-ITET 5 STRING STRING INTEGER Schema information length length data data int representation Actual tuples in memory

slide-18
SLIDE 18

Optimizing the record layout

Pages and Blocks 18

4 6 John D-INFK 3 3 4 John D-INFK 6 Intuitive serial representation but linear time to access each attribute Instead of length, store offsets. That way the record has a fixed sized part at the beginning and a varied sized part at the tail. Pointers point to tail of attribute. Each attribute can be accessed in constant time 3 4 John D-INFK 6 Reorder the attributes, place variable length data at the end. Better performance.

slide-19
SLIDE 19

Data Types

  • Integer Numbers
  • Usually represented the same format as C or C++ (standard in hardware

architectures)

  • Real Numbers
  • IEEE-754 standard for variable precision
  • Fixed point representations for fixed precision (e.g., Oracle numbers). Avoids

rounding errors, variable length by storing all digits plus where the decimal point is (not stored as a string)

  • Strings and BLOBS
  • Length and data
  • Time, coordinates, points …
  • System specific

Pages and Blocks 19

slide-20
SLIDE 20

Corner cases

  • When tuples are very big or some attribute is very big, instead of

storing the whole tuple, one stores the fixed part of the tuple and a pointer to the variable, large size of the tuple (potentially in some

  • ther page)
  • Used for BLOBs (Binary Large Objects)
  • What is a large object is typically in reference to the page size (larger than

some size, a page, half a page, more than one page …)

  • Usually, those large attributes are not processed by queries, putting them

somewhere else speeds up scanning of the page as we do not need to scan the large object

  • Examples: the attribute is a piece of text (a long string) or a photo
  • BLOBs can be very large (more than one block!)

Pages and Blocks 20

slide-21
SLIDE 21

Pages and Blocks 21

DLN DOB EXP

  • F. Name
  • L. Name
  • M. Name

Photo Street # 99999999 08/04/75 08/05/23 Janice SAMPLE Ann Main Street 123 BLOB Several options

  • Store as a BLOB on another block(s)
  • It can be read with the data
  • It does not affect scans
  • Accessible form the database
  • Store the name of a file where the BLOB is
  • Does not take space on the database
  • It does not affect scans
  • Requires to go somewhere else to retrieve
slide-22
SLIDE 22

Row vs column store

Pages and Blocks 22

slide-23
SLIDE 23

Row-store (or n-ary storage model)

  • Row store:
  • Tuples are stored as described so far, all their attributes together
  • A tuple can be quickly accessed and retrieved

Pages and Blocks 23

ID Name Dept. Sem. 1 John D-INFK 3 2 Mary D-ITET 5 4 6 John D-INFK 3 4 6 Mary D-ITET 5 TABLE SLOTTED PAGE HEADER HEADER

slide-24
SLIDE 24

Pages and Blocks 24

“Data page layouts for relational databases on deep memory hierarchies”; The VLDB Journal, November 2002

slide-25
SLIDE 25

Row store is for OLTP

  • OLTP = Online Transaction Processing
  • Typical of banking applications, shopping carts, etc.
  • Transactional operations (updates)
  • Operations mostly on a single tuple

Pages and Blocks 25

SELECT balance FROM Accounts WHERE acc-no = 123456789 INDEX Block

slide-26
SLIDE 26

Row store cumbersome for complex queries

Pages and Blocks 26

SELECT SUM(balance) FROM Accounts WHERE Accounts.balance > 1000 GROUP BY (Accounts.nationality) Block 1 Block 2 Block 3 Needs to retrieve all pages for the table Needs to scan all tuples Discards most of the data as it only needs two attributes Problematic for large tables => this is a lot of wasted bandwidth

slide-27
SLIDE 27

Column store

  • Instead of storing by tuples, store the data by columns
  • A block now contains columns (organized as slotted pages)

Pages and Blocks 27

Name Dept Sem Age John D-INFK 3 21 Mary D-ITET 5 20 John Mary D-INFK D-ITET 3 5 21 20 Block Row Store John Mary D-INFK D-ITET 3 5 21 20 Block Block Block Block Column Store

slide-28
SLIDE 28

Pages and Blocks 28

“Data page layouts for relational databases on deep memory hierarchies”; The VLDB Journal, November 2002

slide-29
SLIDE 29

Only needs to read the attributes involved in the query Blocks contain only what is needed No extra blocks and no extra data has to be read

Processing on row store

Pages and Blocks 29

SELECT SUM(balance) FROM Accounts WHERE Accounts.balance > 1000 GROUP BY (Accounts.nationality) Name Name Name Blocks Acc_# Acc-# Balance Balance Country Accounts

slide-30
SLIDE 30

Pages and Blocks 30

The Design and Implementation of Modern Column-Oriented Database Systems Foundations and Trends in Databases

  • Vol. 5, No. 3 (2012) 197–280

https://stratos.seas.harvard.edu/files/stratos/files/columnstoresfntdbs.pdf

slide-31
SLIDE 31

Column store and vectorized processing

  • Modern processors heavily used

vectorized processing to speed things up

  • SIMD/AVX allows to perform an
  • peration simultaneously on a vector of

values

  • A column store presents the data exactly

in the vectorized representation needed to exploit SIMD

  • Very useful for numeric values and bit

comparisons

Pages and Blocks 31

slide-32
SLIDE 32

Column stores and memory hierarchy

  • Column stores started to be heavily

used to address the memory wall

  • Column stores are a more compact

representation where a cache line is likely to bring data you want to see

  • Far better cache utilization than with

row stores

Pages and Blocks 32

CACHE CACHE ROW STORE COLUMN STORE

slide-33
SLIDE 33

Historical Overview

  • Idea is very old and originally applied to data on disks
  • Horizontal partitioning (sharding)
  • Vertical partitioning (stores columns or groups of columns on different files)
  • Today, column store is used in all analytical databases and many main

memory databases

  • Improves bandwidth across memory/caches
  • Eliminates needs to deal with unwanted attributes
  • Compresses better
  • Can exploit SIMD

Pages and Blocks 33

slide-34
SLIDE 34

When column store is no that good

  • Column store suffers when:
  • Most of the attributes are needed anyway
  • When the tuple or parts of it needs to be reconstructed (as intermediate

results or final results)

  • For updates and modifications
  • Alternative: PAX (Partition Attributes Across) representation
  • A block contains several tuples but organized as a column store
  • Reconstructing the tuple does not require to access several pages
  • “Data page layouts for relational databases on deep memory hierarchies”;

The VLDB Journal, November 2002

Pages and Blocks 34

slide-35
SLIDE 35

Pages and Blocks 35

“Data page layouts for relational databases on deep memory hierarchies”; The VLDB Journal, November 2002

slide-36
SLIDE 36

Pages and Blocks 36

“Data page layouts for relational databases on deep memory hierarchies”; The VLDB Journal, November 2002 PAX makes tuple reconstruction easier but space management is far more complex due to the mini-page layout Alternative = Hybrid NSM/PAX “A hybrid page layout integrating PAX and NSM” HP Laboratories HPL-2012-240

slide-37
SLIDE 37

Compression

Pages and Blocks 37

slide-38
SLIDE 38

Compression is for bandwidth

  • Databases optimize performance
  • Compression is not used to save space but to save bandwidth
  • Trade-off CPU cycles to compress/decompress vs memory bandwidth (today,

the CPOU wins by a large margin)

  • Possible to process data in compressed form
  • Many different options that depend on data organization
  • Dictionary compression
  • Run length encoding
  • Delta encoding
  • Bit-vector representations (bitmaps)

Pages and Blocks 38

slide-39
SLIDE 39

Compression approaches

  • Dictionary compression
  • Build a dictionary mapping long entries to, e.g., integers or small numbers
  • Example: use country codes (<200) instead of country names as strings = one

byte is enough

  • Can be applied to any finite collection of names (e.g., departments, cantons,

provinces within a country, etc.)

  • Dictionary automatically built when data is loaded
  • Data can be processed in compressed form (it is encoded rather than

compressed), dictionary used for query rewrite and result rewrite

  • Very much used in many systems, especially for analytics

Pages and Blocks 39

slide-40
SLIDE 40

Compression approaches

  • Frame of reference
  • Many attributes have value locality, they can be represented as a

delta over some base:

  • 1007, 1017, 1090 = 1000, 1, 17, 90
  • Useful for many sets of data with a common base (e.g., for dates: days since

1900)

  • Can be combined with delta encoding for sorted lists of data (store the

difference to the previous value rather than the value)

  • Allows operations over the compressed data

Pages and Blocks 40

slide-41
SLIDE 41

Compression approaches

  • Run Length encoding
  • If a value appears repeated many times, just store it once and how many

times it appears

  • Useful for attributes with low cardinality (e.g., departments)
  • Used in columnar representation as the data does not even need to be stored
  • In row stores, used for long strings with repeated characters
  • Can compress the data significantly but makes processing more difficult and

makes the encoding variable in size

Pages and Blocks 41

slide-42
SLIDE 42

Compression approaches

  • Bit-vector representations (bitmaps)
  • For every value that an attribute might take, construct a bitmap as follows:
  • Create an array as long as the number of tuples
  • If tuple i has value x for that attribute, position i in the bitmap x is set to 1
  • Bitmaps act as an index and can be used to process queries just by looking at

the bitmap (selections, joins on that attribute, group by on that attribute …)

  • Bitmaps can be further compressed using run length encoding (makes query

processing slightly more complicated)

Pages and Blocks 42

slide-43
SLIDE 43

Summary

  • Databases process large amounts of data
  • How the data is organized significantly affects performance
  • Organization in pages
  • Finding the data quickly
  • Not having to do too many changes when data is updated
  • Bandwidth optimization
  • Cache optimization
  • Compression and processing over compressed data
  • The ideas presented apply not only to databases but to any system

storing and processing large data collections

Pages and Blocks 43