Data Management Systems Access Methods Denormalized tables Pages - - PowerPoint PPT Presentation

data management systems
SMART_READER_LITE
LIVE PREVIEW

Data Management Systems Access Methods Denormalized tables Pages - - PowerPoint PPT Presentation

Data Management Systems Access Methods Denormalized tables Pages and Blocks Indexing Log structured databases Access Methods in context No indexes Gustavo Alonso Institute of Computing Platforms Department of Computer Science


slide-1
SLIDE 1

Data Management Systems

  • Access Methods
  • Pages and Blocks
  • Indexing
  • Access Methods in context

Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich

1 Access Methods in context

Denormalized tables Log structured databases No indexes

slide-2
SLIDE 2

Alternative designs

  • There are many different versions of what we have studied regarding

access methods

  • New and old ideas keep being applied because the context changes:
  • Denormalized tables = store more than one table in the same blocks
  • It is a form of materialized view with a join already performed
  • Useful if tables are more often accessed together than on their own
  • Log Structured representation = do not store tuples but only updates to them
  • Main memory databases for OLTP
  • Cloud storage
  • Avoid creating indexes
  • Cloud databases (Snowflake)
  • Database Cracking (MonetDB)

Access Methods in context 2

slide-3
SLIDE 3

Denormalized tables

Access Methods in context 3

slide-4
SLIDE 4

Merging tables in advance

  • Normal forms are rules used to eliminate redundancy in the schema
  • f a database
  • They force to split tables into differnet tables until each one represents a

distinct concept

  • Good idea to save space but often tables will be joined in almost

every query

  • Some systems allow to cluster tables into the same segment so that

blocks contain data form both tables (indexed by the common attributes/keys)

Access Methods in context 4

slide-5
SLIDE 5

Clustered Tables (Oracle)

  • Clustered tables make sense

when the tables are really processed together most of the time

  • Reduces I/O, saves space
  • Tuples that are related are

stored together in the same block, reducing access time

  • It is like a materialized join
  • Updates can become expensive

Access Methods in context 5

slide-6
SLIDE 6

Log structured databases

Access Methods in context 6

slide-7
SLIDE 7

Log structured file

  • Instead of storing tuples and modifying them as needed, keep a record of

how the data was modified (a log):

  • Inserts records the entire tuple
  • Delete records that the tuple is invalidated
  • Updates records the attributes that have been modified
  • New entries are appended at the end of the file

Block INSERT id = 17, Val = 3 DELETE id = 11 UPDATE id = 25, Val = 7 UPDATE id = 98, Val = 14 New entries appended at the end

Access Methods in context 7

slide-8
SLIDE 8

Log structured database

  • It is much faster to update a file sequentially than to do random

accesses (even in SSDs)

  • In cloud storage, file storage is typically append only
  • It minimizes the cost of making the data persistent (very expensive for

OLTP databases with many transactions)

  • Not for heavy query analytics but very good for in memory

transactional workloads

Access Methods in context 8

slide-9
SLIDE 9

Optimizing log structured representations

  • Periodically compact the log:
  • Remove the history and add one entry for every tuple
  • Index the tuples and their modifications (entry could contain pointer

to previous entry referring to that tuple)

  • It actually matches what often happens in many applications.
  • No updates in place
  • Record history of operations
  • Periodically apply all operations
  • Increasingly being used in a number of databases, especially cloud

databases

Access Methods in context 9

slide-10
SLIDE 10

No indexes

Access Methods in context 10

slide-11
SLIDE 11

Snowflake: Micro-partitions instead of indexes

  • Micro-partitions are Snowflake’s name for extents
  • What is interesting is how they are organized to facilitate query

processing

  • Size ranges between 50 and 500 MB (before compression, the data is always

compressed when in S3)

  • Each micro partition has metadata describing what is inside
  • The metadata can be read without reading the whole micro-partition
  • The metadata is used to read just the part of the micro-partition that is

relevant

  • Data in the micro-partition is stored in columnar form (by columns not by

rows)

Storage in Context 11

slide-12
SLIDE 12

Storage in Context 12

https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html

slide-13
SLIDE 13

Pruning based on metadata

  • The header for a micro-partition

contains information about the data. SELECT * FROM T WHERE age > 45

  • If header contains the min and max

age in the data, we can decide we do not need that micro-partition simply by looking at the header

  • Same as indexing built in the micro-

partitions

Storage in Context 13

slide-14
SLIDE 14

Database Cracking

  • Pioneered in MonetDB (open source

column store)

  • Do not create an index, instead, build

the index incrementally while the data is being processed

  • Based on creating a copy of the

column

  • Do it in successive passes so that the

cost is not too high

  • Initial queries expensive, later cost is

amortized as work has already been done

Access Methods in context 14