Data Management Systems Storage Management Basic principles Memory - - PowerPoint PPT Presentation

data management systems
SMART_READER_LITE
LIVE PREVIEW

Data Management Systems Storage Management Basic principles Memory - - PowerPoint PPT Presentation

Data Management Systems Storage Management Basic principles Memory hierarchy Blocks instead of pages Segments and file storage Tablespaces, segments, extents Database buffer cache Updates, free space Storage


slide-1
SLIDE 1

Data Management Systems

  • Storage Management
  • Memory hierarchy
  • Segments and file storage
  • Database buffer cache
  • Storage techniques in context
  • Basic principles
  • Blocks instead of pages
  • Tablespaces, segments, extents
  • Updates, free space

Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich

Storage - Segments and File storage 1

slide-2
SLIDE 2

Storage in databases

  • Remember the two key guarantees provided by a database in regard

to storage

  • Data is persistent
  • Data is recoverable
  • Even when failures occur!!
  • Add as well the following property:
  • Physical data independence
  • Taken together these three aspects play a big role on how databases

store data to maintain these properties while still achieving the necessary performance.

Storage - Segments and File storage 2

slide-3
SLIDE 3

Storage Management

Physical storage Blocks, files, segments Pages in memory Physical records Logical records (tuples) Logical data (tables, schemas) Relations, views Queries, Transactions (SQL) Record Interface Record Access Page access File Access Application Logical view (logical data) Access Paths Physical data in memory Page structure Storage allocation

3 Storage - Segments and File storage

slide-4
SLIDE 4

Disclaimers

  • As usual:
  • Standard database architectures based on slow, hard drive disks (HDD) with a

high latency for seek operations (random access)

  • Assuming most of the database is not in memory
  • Today, some things are different:
  • More main memory available
  • Different storage media (SSD, NVM, network attached storage)
  • Principles remain the same and illustrate the underlying problem

rather than a particular implementation

Storage - Segments and File storage 4

slide-5
SLIDE 5

Explanations using a real system

  • Many of the explanations that follow are based on Oracle’s database
  • Cover all the basics
  • Provide a good example of how a real system works
  • Demonstrate the many tuning knobs available in a database
  • Prove why some people consider databases too complicated (more about this later in the course)
  • Ideas and concepts are nevertheless generic and they are somewhat similar across

systems

  • Logical storage:
  • https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage-

structures.html#GUID-4AF2D61A-8675-4D48-97A4-B20F401ADA16

  • https://docs.oracle.com/cd/B19306_01/server.102/b14220/logical.htm
  • Disk storage:
  • https://docs.oracle.com/cd/B19306_01/server.102/b14220/physical.htm
  • https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/physical-storage-

structures.html#GUID-008A1F08-9C75-4E9F-A70B-41FB942C60B4

Storage - Segments and File storage 5

slide-6
SLIDE 6

Problem statement

  • A database is doing many things at the same time (in the same way

an Operating System is managing many different user processes at the same time)

  • Each “thing” (a query, a system process, a database component)

active at any point in time needs its own logical view of the data (and correspondingly, of memory, and disk). This is the same as the OS giving a process the impression it is alone in the machine.

  • A database engine creates such virtual, logical views of the system

using its own mechanisms (and different from the OS)

Storage - Segments and File storage 6

slide-7
SLIDE 7

Logical and Physical storage in Oracle 19

  • Entity-relationship diagram

(crow’s foot implies one-to-many)

  • Logical:
  • Tablespaces
  • Segments
  • Extent
  • Block
  • Physical:
  • Data File
  • OS block

Storage - Segments and File storage 7 https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D

slide-8
SLIDE 8

Tablespaces

  • A tablespace is a logical data unit in the database:
  • Schema related:
  • A table
  • An index
  • Several tables (clustered tables)
  • Engine related:
  • Data structures for the database engine (result buffers, undo buffers, etc.)
  • A tablespace provides a logical representation of the principle of spatial

locality (keep together what belongs together)

  • Does not necessarily mean all data is continuous
  • It means all the information and all the data of a tablespace is under the same

umbrella

  • Space (memory/disk) is allocated to tablespaces

Storage - Segments and File storage 8

slide-9
SLIDE 9

Segments, Extents, and Blocks (example)

Storage - Segments and File storage 9 https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D

slide-10
SLIDE 10

Generalizing

  • A given logical object in a database (a table, an index) is “stored” in a

tablespace

  • A tablespace is organized into segments (each schema object has a

segment)

  • Segments have space allocated to them in the form of extents. A segment

can have several extents

  • Extents are sets of contiguously allocated data blocks. Extents are mapped

to one data file (and typically to a file)

  • Data blocks are the smallest allocation unit of space (not necessarily a

page, databases typically use blocks larger than an OS page). The size of the blocks is a tunable parameter.

Storage - Segments and File storage 10

slide-11
SLIDE 11

Translating

  • Tablespace = keep together what needs to be kept together
  • Segment = an object in the schema (or part of an object if it is

partitioned)

  • Extent = groups of continuously allocated pages
  • Blocks = glorified pages
  • Why so complicated?:
  • Tablespaces = logical locality
  • Segments = allocate (virtual) space to objects
  • Extents = allocate contiguous physical space
  • Blocks = the unit of space allocation

Storage - Segments and File storage 11

slide-12
SLIDE 12

More explanations and analogies

  • Tablespaces provide a logical unit to refer to the storage of well

defined entities (a table, an index, a buffer, etc.)

  • Virtual space is allocated to a Tablespace in a segment. A segment

acts as a form of virtual memory where everything belonging to the same entity appears continuous and can be treated as a single unit

  • Actual space is allocated to a segment through extents. Extents are

sets of blocks that are physically contiguous on storage and can be allocated as a whole

  • The space in a segment is divided into blocks which typically

correspond to several OS pages.

Storage - Segments and File storage 12

slide-13
SLIDE 13

Segments I

  • A segment allocates the equivalent of virtual memory to a tablespace
  • Common setup:
  • Tablespace -> table
  • Tablespace has one segment
  • Segment has one or more extents
  • These structures, like all examples that follow, are created to provide

physical data independence. Higher layers only need to know in which tablespace a table is. This allows the engine to change everything below by simply changing the pointers to the segment, to the extents, etc.

Storage - Segments and File storage 13

slide-14
SLIDE 14

Segments II

  • The mapping of a segment to a tablespace simplifies the

manipulation of database entities:

  • CREATE TABLE T
  • DROP TABLE T
  • CREATE INDEX I ON table_name (attributes)
  • DROP INDEX I
  • If we need more space for a table or an index, the table is still

referred to by its segment, but the segment indexes as many extents as needed.

  • If we partition a table, each partition has its own segment but the

table is still referred to by its tablespace

Storage - Segments and File storage 14

slide-15
SLIDE 15

Extents

Storage - Segments and File storage 15

  • Extents provide optimal access by

ensuring logically continuous allocation of blocks (blocks can end up non continuous on disk because of RAID, for instance)

  • Acts as a form of virtual address

space:

  • Allocate and release blocks to the

extent

  • Easy sequential search
  • Can be dropped as an unit

https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D

The big square is a data file! Space available in the data file

slide-16
SLIDE 16

Dynamic extents

  • When a segment is created, it

is allocated an extent (how big? => tunable parameter)

  • If more space is needed,

another extent is created

  • The new extent is not

contiguous with the previous

  • ne and can be on a different

data file

Storage - Segments and File storage 16

The big square is a data file!

slide-17
SLIDE 17

Why extents?

  • Better than the alternatives:

Storage - Segments and File storage 17

Static file mapping Segment (starting address, size) Easy to maintain Highly efficient (performance) Poor utilization (space) No flexibility Dynamic extents Dynamic block mapping Difficult to maintain Non contiguous Poor performance Maximum flexibility

slide-18
SLIDE 18

Why extents?

  • Databases must optimize along many dimensions.
  • A static file mapping is very easy to manage but induces

fragmentation and provides no flexibility

  • Dynamic block mapping is extremely flexible but data is not

contiguous and is expensive to maintain

  • Extents are a compromise:
  • An extent provides a static mapping to a set of blocks (like a static file

mapping)

  • When more space is needed, extents are dynamically allocated (like with

dynamic block mapping)

Storage - Segments and File storage 18

slide-19
SLIDE 19

Managing space in extents

  • Extents are collections of blocks.
  • How large should a extent be?
  • What do we do when we need more extents?
  • Modern databases offer many tunable parameters to control extents
  • Uniform allocation: all extents are of the same size (tunable) and when more

space is needed, another extent is added

  • Automatic: when more space is needed, a new extent is added and its size
  • ptimized according to come criterion.
  • Typical database design trade-off: we want a very large extent to have

the data contiguous but that induces fragmentation and it is not

  • efficient. Instead, allocate groups of contiguous blocks as needed.

Storage - Segments and File storage 19

slide-20
SLIDE 20

Allocating and sizing new extents

  • A good way to allocate new

extents is to increase their size exponentially (~1.25)

  • Better to allocate a bit more than

to be constantly allocating new extents (if a table grows, it is likely to keep growing)

  • Bounds the size of the extent

directory

Storage - Segments and File storage 20

Dynamic extents

slide-21
SLIDE 21

Blocks vs OS pages

  • Operating systems deal with pages to implement virtual memory
  • Databases typically store a lot of data:
  • A table, an index might span many OS pages
  • Allocating space page by page at OS granularity is too much overhead
  • If one OS page belongs to a table, it is likely we will also be interested in the
  • ther pages allocated to that table (locality)
  • Hence, databases use blocks rather than OS pages
  • This is why, although most databases today use the OS I/O system for

writing to disk, they manage their own files and memory representation

  • Later on we will see the structure of database pages in memory

Storage - Segments and File storage 21

slide-22
SLIDE 22

Block structure: slotted pages

  • A block is structured as follows:
  • Header: address and type of segment

(index, table, etc.)

  • Table directory: schema of the table

stored in the block

  • Row directory: pointers to the actual

tuples stored in the block

  • Free space
  • Row data (tuples) stored in the block
  • The directory grows downwards, the

space for tuples is used upwards

Storage - Segments and File storage 22

slide-23
SLIDE 23

Slotted pages

  • The row directory allows to insert

tuples anywhere in the block (and change their position) while maintain a simple addressing schema:

(block, slot)

  • How the space within a block is

managed depends on many

  • ptimizations
  • We will look into the block

structure in more detail later on

Storage - Segments and File storage 23

slide-24
SLIDE 24

Optimizing the use of blocks I

  • Percentage Free
  • Determines how much space in

each block is reserved for updating tuples instead of using it for storing new tuples

  • This is needed because an update

can result in a bigger tuple than the original one UPDATE T SET Adress = “AStreetWithAVeryLongName” WHERE LegiNr = 12345678

Storage - Segments and File storage 24

slide-25
SLIDE 25

Optimizing the use of Blocks II

  • Percentage used
  • Determines how much space needs to

be free in a block before the free space can be used to insert new tuples

  • Blocks are unavailable to inserting new

tuples until they have the given amount

  • f free space
  • It is needed because if updates can

make tuples smaller (freeing up space), they can also make them bigger (needing space). The combination of both parameters avoids thrashing on the page

Storage - Segments and File storage 25

slide-26
SLIDE 26

Storage - Segments and File storage 26

slide-27
SLIDE 27

Fragmentation within blocks

  • Like any unit of storage, blocks can suffer form fragmentation.

However, compaction is a very expensive procedure

  • Compaction often done only when the block has enough space for an

INSERT or UPDATE but the space is not contiguous. Otherwise, the available space is used without reorganizing the tuples in the block.

  • UPDATE might require to move a tuple from one block to another if

the new size does not fit in the block where the tuple is. In such a case, the tuple is inserted into a new block and the original space is used to store a pointer to the new row. The ID of the tuple does not change (which means an indirection is needed to find the tuple).

Storage - Segments and File storage 27

slide-28
SLIDE 28

Where to find space?

  • A segment contains one or more free lists
  • Not done at the extent level because the search for space would be more

complex

  • A free list contains pointers to blocks that have usable free space
  • Using several free lists helps avoiding contention when performing

parallel inserts or updates

  • The free list is updated as transactions execute INSERT, DELETE, or

UPDATE statements using the rules established with PCTFREE and PCTUSED

Storage - Segments and File storage 28

slide-29
SLIDE 29

Part 2 Part 2

Tables have no order

  • Recall that tables have no order on the tuples?
  • A property of the relational model
  • A consequence of how data is organized
  • Example with extents

Storage - Segments and File storage 29

Table T Extent Part 1 Part 1 Part 3 Part 3 Part 4 Part 4

SELECT * FROM T ORDER BY …

slide-30
SLIDE 30

The art of writing to disk

  • Databases provide concurrency control and recovery.
  • Concurrency control: database is correct even if several transactions modified

the data at the same tie

  • Recovery: the state of the database can be recovered in the event of failures
  • This makes relational database engines unique and imposes a lot of

design and performance constraints

  • It is, nevertheless, one of the major advantages of database engines
  • We will study concurrency control and recovery in more detail later
  • n, here we just point out some of the implications in terms of

dealing with storage.

Storage - Segments and File storage 30

slide-31
SLIDE 31

Shadow paging

Storage - Segments and File storage 31

SEGMENT Data Free List EXTENT 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 8 9 10 11 12 13 14 15 16 17 18 UPDATE T SET attribute_1 = 1000 WHERE ID = 17 Modifies a tuple in page 5

slide-32
SLIDE 32

Shadow paging

Storage - Segments and File storage 32

SEGMENT Data Free List EXTENT 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 8 9 10 11 12 13 14 15 16 17 18 UPDATE T SET attribute_1 = 1000 WHERE ID = 17 Modifies a tuple in page 5 Creates a copy of page 5 and puts it into page 8 5’ Does the update on the new page

slide-33
SLIDE 33

Shadow paging

Storage - Segments and File storage 33

SEGMENT Data Free List EXTENT 1 2 3 4 5 6 7 1 2 3 4 8 6 7 8 9 10 11 12 13 14 15 16 17 18 8 9 10 11 12 13 14 15 16 17 18 UPDATE T SET attribute_1 = 1000 WHERE ID = 17 Modifies a tuple in page 5 Creates a copy of page 5 and puts it into page 8 Does the update on the new page When the transaction commits, makes page 8 the correct page and marks the old one as free 5

slide-34
SLIDE 34

Shadow Paging, pros and cons

  • Shadow paging is good for:
  • Recovery: it is enough to make sure that only clean pages (committed) are

written to disk and then the disk contains only clean data

  • Recovery: undo of an uncommitted transaction is easy, throw the page away
  • Concurrency Control: A transaction can use the copies it makes without

interfering with queries which are reading the original copies (snapshot isolation)

  • Shadow paging is bad because:
  • Page access and management is complicated (clean vs dirty pages, copy a

page, refresh lists on the segment when transaction commits, etc.)

  • Every update ends up moving a page somewhere else, destroying locality

Storage - Segments and File storage 34

slide-35
SLIDE 35

When to use shadow paging

  • Shadow paging makes sense when the overhead of creating a copy of

the page and managing the copies is small enough

  • Main memory (e.g., use the Copy on Write feature of the OS)
  • Possibly with Flash and NVM

Storage - Segments and File storage 35

slide-36
SLIDE 36

Delta Files

  • Using delta files, before

modifying a page, one makes a copy of the page

  • The copy is stored in a delta file

and can be used to undo the changes

  • The update is performed directly
  • n the original page
  • The copy in the delta file can be

discarded when no longer needed

Storage - Segments and File storage 36

EXTENT 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 EXTENT 1 2 3 4 5’ 6 7 8 9 10 11 12 13 1415 16 17 18 5 DELTA

slide-37
SLIDE 37

Implementing delta files

  • Delta files are used for many different purposes and in many different

ways

  • Keep the old data in the delta file
  • Favors commits, simplifies undo, allows looking at older data
  • Keep the new data in the delta file
  • Favors aborts, allows to delay the propagation of updates
  • Oracle
  • used rollback segments (segments specifically used to store the old copy of

the data)

  • Now it uses undo-tablespaces

Storage - Segments and File storage 37

slide-38
SLIDE 38

Summary Segments and file storage

  • I/O is a big component of the performance of a database engine
  • Many different architectures and possibilities
  • Many opportunities for optimization and trade-offs
  • Plays a big role in concurrency control and recovery (see later)
  • Plays a big role in how data is accessed and manipulated (see later)
  • Many details changing as storage evolves from disk to Flash to NVM

to network attached storage …

Storage - Segments and File storage 38