Data Storage and Disk Structure A Simple Implementation of DBMS - - PDF document

data storage and disk structure a simple implementation
SMART_READER_LITE
LIVE PREVIEW

Data Storage and Disk Structure A Simple Implementation of DBMS - - PDF document

Data Storage and Disk Structure A Simple Implementation of DBMS One file per table Students(name, id, dept) in a file Students A meta symbol # to separate attributes Smith#123#CS Johnson#522#EE Database schema in a


slide-1
SLIDE 1

Data Storage and Disk Structure

slide-2
SLIDE 2

CMPT 454: Database II -- Storage and Disk Structure 2

A Simple Implementation of DBMS

  • One file per table

– Students(name, id, dept) in a file Students – A meta symbol “#” to separate attributes

Smith#123#CS Johnson#522#EE …

  • Database schema in a special file Schema

Students#name#STR#id#INT#dept#STR Depts#name#STR#office#Str …

slide-3
SLIDE 3

CMPT 454: Database II -- Storage and Disk Structure 3

Naïve Query Answering

SELECT * FROM Students WHERE dept = ‘CS’ | CSStud

  • Read file Schema to determine the attributes of relation Student and

their types

  • Check that condition dept = ‘CS’ is semantically valid for Students
  • Create a new file CSStud
  • Read file Students, for each line

– Check condition dept = ‘CS’, if it is true then write the line as a tuple to file CSStud

  • Add to the file Schema a line about CSStud
  • Problems

– If we change EE to ECON in one tuple in Students, the entire file has to be rewritten – Even if we look for one student, we have to read the whole file – If multiple users read/write file Students simultaneously, what would happen?

slide-4
SLIDE 4

CMPT 454: Database II -- Storage and Disk Structure 4

Handling Joins

SELECT office FROM Students, Depts WHERE Students.name = ‘Smith’ AND Students.dept = Depts.name;

  • Algorithm

FOR each tuple s in Students DO FOR each tuple d in depts DO IF s.name = ‘Smith’ AND s.dept = d.dept THEN write d.office as a tuple to the output

  • More problems

– Why do we need to match a student “Cindy” with all departments? – I/O Complexity: O(n2), costly! – What if the system crashes?

slide-5
SLIDE 5

CMPT 454: Database II -- Storage and Disk Structure 5

Storage Device Hierarchy

  • How should we store

data on disks so that queries can be answered efficiently?

  • How can we organize

disks effectively so that a database built on top can be more efficient and robust?

slide-6
SLIDE 6

CMPT 454: Database II -- Storage and Disk Structure 6

Where Is DMBS?

Disk Tertiary storage Main memory Cache Virtual memory File system DBMS

slide-7
SLIDE 7

CMPT 454: Database II -- Storage and Disk Structure 7

Volatile/Nonvolatile, Trustworthy Storage

  • A volatile device “forgets” what is stored in it when the power goes off

– Main memory, cache

  • A nonvolatile device keeps its contents intact even for long periods

when the device is turned off or there is a power failure

– Disk, CD, USB flash memory, …

  • When a customer and a bank have controversy on a transaction … Let

us check the transaction record – oops, the records are stored on a disk in the bank …

  • Trustworthy: once the data is stored, it cannot be changed later

– WORM model: write once read many, e.g., CD ROM, ROM memory chips – WORM storage systems

slide-8
SLIDE 8

CMPT 454: Database II -- Storage and Disk Structure 8

Moving Head Disk Mechanism

slide-9
SLIDE 9

CMPT 454: Database II -- Storage and Disk Structure 9

Disk Controller

  • Control the actuator to move the head

assembly

  • Select a surface to read or write
  • Transfer the bits read/to be written between

main memory and the desired sector

Bus CPU Main memory Disk controller Disk Disk Disk Disk

slide-10
SLIDE 10

CMPT 454: Database II -- Storage and Disk Structure 10

Disk Access Characteristics

  • How is a desired data block read?

– The heads are positioned at the cylinder containing the track on which the block is located – The sectors containing the block move under the disk head as the entire disk assembly rotates

  • Latency of the disk: the time taken between the

moment at which the command to read a block is issued and the time that the contents of the block appear in main memory

slide-11
SLIDE 11

CMPT 454: Database II -- Storage and Disk Structure 11

Breakdown of Disk Latency

  • The time taken by processor and disk controller to process the request

– Typically a fraction of a millisecond, can be neglected

  • Seek time: the time to position the head assembly at the proper

cylinder

– Typically tens of milliseconds

  • Rotation latency: the time for the disk to rotate so the first of the sectors

containing the block reaches the head

– A typical disk rotates completely once very 10 milliseconds, thus, the expected rotational latency is about 5 milliseconds

  • Transfer time: the time it takes the sectors of the block and any gaps

between them to rotate past the head

– If a disk has 250 kb per track and rotates once in 10 milliseconds, the read rate is 25 mb per second

slide-12
SLIDE 12

CMPT 454: Database II -- Storage and Disk Structure 12

Breakdown of Disk Latency

slide-13
SLIDE 13

CMPT 454: Database II -- Storage and Disk Structure 13

Organizing Data by Cylinders

  • To reduce seek time, we can store data that is likely to be accessed

together on a single cylinder or several adjacent cylinders

  • If all bocks to read/write are on a single track or on a cylinder

consecutively, only one seek time and one rotational latency are needed

  • Writing blocks is quite analogous to reading a block

– If it is required to verify whether the written block is correct, wait for an additional rotation and read each sector back

  • Modifying blocks

– Read the block into main memory – Make changes to the block in the main memory copy of the block – Write the new contents of block back to the disk – If necessary, verify the write

slide-14
SLIDE 14

CMPT 454: Database II -- Storage and Disk Structure 14

Stripping and Mirroring

  • Bit-level stripping

– Write bit i of each byte to disk i – Access is sped up for 8 times

  • Block-level stripping

– With an array of n disks, block I of the disk array is written to disk (i mod n) + 1 and use block ⎣i / n⎦ of the disk

  • Mirroring: maintaining multiple copies
slide-15
SLIDE 15

CMPT 454: Database II -- Storage and Disk Structure 15

Stripping versus Mirroring

Block-level stripping

slide-16
SLIDE 16

CMPT 454: Database II -- Storage and Disk Structure 16

RAID Levels 0, 1, 2, and 3

  • Redundant Array of Independent/Inexpensive Drives
  • RAID 0: striping at the block level, no redundancy
  • RAID 1 mirroring and block striping
  • RAID 2: byte level striping + error-correcting codes
  • RAID 3: bit-interleaved parity organization
slide-17
SLIDE 17

CMPT 454: Database II -- Storage and Disk Structure 17

RAID Levels 4, 5, and 6

  • RAID 4: block-level striping + a parity block
  • RAID 5: parity distributed in all disks
  • RAID 6: storing more redundant information to recover

from multiple disk failures

slide-18
SLIDE 18

CMPT 454: Database II -- Storage and Disk Structure 18

RAM Model and I/O Model

  • RAM Model: when data can be held in main memory, the

bottleneck of computation is CPU

– Each data item can be accessed using approximate the same amount of time

  • I/O Model: if a block needs to be moved between disk and

main memory, the time taken to perform the read/write is much longer than the time likely to be used to manipulate the data in main memory

– Databases often cannot fit into main memory

  • The number of block access is a good approximation to the

time needed by the algorithm and should be minimized

slide-19
SLIDE 19

CMPT 454: Database II -- Storage and Disk Structure 19

Sorting in Main Memory

  • Sorting 5,000,000 tuples, each tuple takes 128

bytes, totally 640 Mb data – cannot fit into a computer with 512 Mb main memory

  • Each disk block has 16 kb and can contain 128

tuples – 39,063 disk blocks

  • Quicksort? Fastest if all data is in memory

– Need to randomly access data items, many accesses to disk blocks if data is on disk

slide-20
SLIDE 20

CMPT 454: Database II -- Storage and Disk Structure 20

Merge Sort

  • Basis: if the length of the list is
  • ne, return
  • Induction: if the list is of length

more than one, divide the list into two lists that are either of the same length or as close as possible, recursively sort the two sublists and then merge the resulting sorted sublists into

  • ne sorted list
  • The time to merge two lists in

main memory is O(n1+n2)

  • Log2n phases in total, total cost

O(n log n)

slide-21
SLIDE 21

CMPT 454: Database II -- Storage and Disk Structure 21

Two-Phase, Multiway Merge Sort

  • Phase 1 – sort main memory sized

partitions into sorted sublists

– Fill all available main memory with blocks from the original tuples to be sorted – Sort the records that are in main memory – Write the sorted records from main memory onto new blocks of disk, form

  • ne sorted list
  • Phase 2 – merge all the sorted

sublists into a single sorted list

– The number of sublists is smaller than the number of blocks in the available main memory

slide-22
SLIDE 22

CMPT 454: Database II -- Storage and Disk Structure 22

How to Merge?

  • Find the smallest key among the first remaining elements of all the lists

– A linear search of the heads of all sublists which are in main memory

  • Move the smallest element to the first available position of the output

block

  • If the output block is full, write it to disk and reinitialize the same buffer

in main memory to hold the next output block

  • If the block from which the smallest element was just taken is now

exhausted of records, read the next block from the same sorted sublist into the same buffer

  • Cost analysis

– Blocks are read in an unpredictable order – Every block holding records from one of the sorted lists is read from disk

  • nly once

– The number of block written to disk is the same of the blocks holding the sublists – Each record in a sublist will be moved to the output buffer exactly once

slide-23
SLIDE 23

CMPT 454: Database II -- Storage and Disk Structure 23

Generalization to Multiple Phases

  • A third phase can be used if there are too

many sublists

  • A small computer has 100 Mb for sorting
  • A block has 16 kb, 6400 blocks can be held

into main memory

  • A third phase is needed only if the size of

the data is more than 100 Mb * (6400-1) = 624.9 Gb

slide-24
SLIDE 24

CMPT 454: Database II -- Storage and Disk Structure 24

First-Come-First-Serve Scheduling

Cylinder

  • f request

First time available Time completed 2000 10 20 30 4.42 6000 13.84 14000 27.26 4000 42.68 16000 60.10 10000 71.52 2000 2000 + |6000-2000| 4000 + |14000-6000| 8000 + |4000-14000| 10000 + |16000-4000| 12000 + |10000-16000| 6000 = 42000 2000 4000 6000 10000 14000 16000

slide-25
SLIDE 25

CMPT 454: Database II -- Storage and Disk Structure 25

Elevator Algorithm

Cylinder

  • f request

First time available Time completed 2000 10 20 30 4.42 6000 13.84 14000 27.26 4000 57.52 16000 34.68 10000 46.10 2000 2000 + |6000-2000| 4000 + |14000-6000| 8000 + |16000-14000| 2000 + |10000-16000| 6000 + |4000-10000| 6000 = 28000 2000 4000 6000 10000 14000 16000

slide-26
SLIDE 26

CMPT 454: Database II -- Storage and Disk Structure 26

Disk Scheduling

  • The elevator algorithm can achieve good

performance on average

– In our example, it saves 14000 (1/3 of 42000 in first-come-first-serve method)

  • The more different request, the better

performance the elevator algorithm

  • The elevator algorithm is not optimal

– Can you give an example where the elevator algorithm performs worse than the first-come- first serve method?

slide-27
SLIDE 27

CMPT 454: Database II -- Storage and Disk Structure 27

Prefetching

slide-28
SLIDE 28

CMPT 454: Database II -- Storage and Disk Structure 28

Accelerating Access to Disk

R/W in a predictable sequence Scattered R/W requests, unpredictable Cylinder-based organization X Multiple disks X X Mirroring X X Elevator algorithm X Prefetching X More disks (and thus money) are needed for using multiple disks and mirroring

slide-29
SLIDE 29

CMPT 454: Database II -- Storage and Disk Structure 29

Fixed-Length Records and Free List

slide-30
SLIDE 30

CMPT 454: Database II -- Storage and Disk Structure 30

Slotted-Page Structure

slide-31
SLIDE 31

CMPT 454: Database II -- Storage and Disk Structure 31

Sequential File Organization

  • Search key: a set of attributes on which a

query is often conducted

slide-32
SLIDE 32

CMPT 454: Database II -- Storage and Disk Structure 32

Insertion into a Sequential File

slide-33
SLIDE 33

CMPT 454: Database II -- Storage and Disk Structure 33

Multitable Clustering File Structure