Data Management Systems Access Methods Denormalized tables Pages - - PowerPoint PPT Presentation

▶

Jan 12, 2023 5 likes •146 views

Data Management Systems Access Methods Denormalized tables Pages and Blocks Indexing Log structured databases Access Methods in context No indexes Gustavo Alonso Institute of Computing Platforms Department of Computer Science

SLIDE 1

Data Management Systems

Access Methods
Pages and Blocks
Indexing
Access Methods in context

Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich

1 Access Methods in context

Denormalized tables Log structured databases No indexes

SLIDE 2

Alternative designs

There are many different versions of what we have studied regarding

access methods

New and old ideas keep being applied because the context changes:
Denormalized tables = store more than one table in the same blocks
It is a form of materialized view with a join already performed
Useful if tables are more often accessed together than on their own
Log Structured representation = do not store tuples but only updates to them
Main memory databases for OLTP
Cloud storage
Avoid creating indexes
Cloud databases (Snowflake)
Database Cracking (MonetDB)

Access Methods in context 2

SLIDE 3

Denormalized tables

Access Methods in context 3

SLIDE 4

Merging tables in advance

Normal forms are rules used to eliminate redundancy in the schema
f a database
They force to split tables into differnet tables until each one represents a

distinct concept

Good idea to save space but often tables will be joined in almost

every query

Some systems allow to cluster tables into the same segment so that

blocks contain data form both tables (indexed by the common attributes/keys)

Access Methods in context 4

SLIDE 5

Clustered Tables (Oracle)

Clustered tables make sense

when the tables are really processed together most of the time

Reduces I/O, saves space
Tuples that are related are

stored together in the same block, reducing access time

It is like a materialized join
Updates can become expensive

Access Methods in context 5

SLIDE 6

Log structured databases

Access Methods in context 6

SLIDE 7

Log structured file

Instead of storing tuples and modifying them as needed, keep a record of

how the data was modified (a log):

Inserts records the entire tuple
Delete records that the tuple is invalidated
Updates records the attributes that have been modified
New entries are appended at the end of the file

Block INSERT id = 17, Val = 3 DELETE id = 11 UPDATE id = 25, Val = 7 UPDATE id = 98, Val = 14 New entries appended at the end

Access Methods in context 7

SLIDE 8

Log structured database

It is much faster to update a file sequentially than to do random

accesses (even in SSDs)

In cloud storage, file storage is typically append only
It minimizes the cost of making the data persistent (very expensive for

OLTP databases with many transactions)

Not for heavy query analytics but very good for in memory

transactional workloads

Access Methods in context 8

SLIDE 9

Optimizing log structured representations

Periodically compact the log:
Remove the history and add one entry for every tuple
Index the tuples and their modifications (entry could contain pointer

to previous entry referring to that tuple)

It actually matches what often happens in many applications.
No updates in place
Record history of operations
Periodically apply all operations
Increasingly being used in a number of databases, especially cloud

databases

Access Methods in context 9

SLIDE 10

No indexes

Access Methods in context 10

SLIDE 11

Snowflake: Micro-partitions instead of indexes

Micro-partitions are Snowflake’s name for extents
What is interesting is how they are organized to facilitate query

processing

Size ranges between 50 and 500 MB (before compression, the data is always

compressed when in S3)

Each micro partition has metadata describing what is inside
The metadata can be read without reading the whole micro-partition
The metadata is used to read just the part of the micro-partition that is

relevant

Data in the micro-partition is stored in columnar form (by columns not by

rows)

Storage in Context 11

SLIDE 12

Storage in Context 12

https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html

SLIDE 13

Pruning based on metadata

The header for a micro-partition

contains information about the data. SELECT * FROM T WHERE age > 45

If header contains the min and max

age in the data, we can decide we do not need that micro-partition simply by looking at the header

Same as indexing built in the micro-

partitions

Storage in Context 13

SLIDE 14

Database Cracking

Pioneered in MonetDB (open source

column store)

Do not create an index, instead, build

the index incrementally while the data is being processed

Based on creating a copy of the

column

Do it in successive passes so that the

cost is not too high

Initial queries expensive, later cost is

amortized as work has already been done

Access Methods in context 14