Modification of Records How to handle the following operations on - - PowerPoint PPT Presentation

modification of records
SMART_READER_LITE
LIVE PREVIEW

Modification of Records How to handle the following operations on - - PowerPoint PPT Presentation

Modification of Records How to handle the following operations on the record level? 1. Insertion 2. Deletion 3. Update 1 1. Insertion Easy case: records not in sequence Insert new record at end of file If records are fixed-length,


slide-1
SLIDE 1

Modification of Records

How to handle the following operations

  • n the record level?
  • 1. Insertion
  • 2. Deletion
  • 3. Update

1

slide-2
SLIDE 2
  • 1. Insertion
  • Easy case: records not in sequence
  • Insert new record at end of file
  • If records are fixed-length, insert new

record in deleted slot

  • Difficult case: records are sorted
  • Find position and slide following records
  • If records are sequenced by linking, insert
  • verflow blocks

2

slide-3
SLIDE 3
  • 2. Deletion
  • a. Immediately reclaim space by shifting
  • ther records or removing overflows
  • b. Mark deleted and list as free for re-use
  • Tradeoffs:
  • How expensive is immediate reclaim?
  • How much space is wasted?

3

slide-4
SLIDE 4

Problem with Deletion

  • Dangling pointers:
  • When using physical addresses:
  • When using logical addresses:

4

R1 ? Never reused May be reused ID LOC 7788 Never reuse ID 7788 nor space in the map

slide-5
SLIDE 5
  • 3. Update
  • If records are fixed-length and the
  • rder is not affected:
  • Fetch the record, modify it, write it back
  • Otherwise:
  • Delete the old record
  • Insert the new record overwriting the

tombstones from the deletion

5

slide-6
SLIDE 6

Pointer Swizzling

  • Swizzling = replacement of physical

addresses by memory addresses when loading blocks into memory

  • Automatic Swizzling: swizzle all

addresses when loading a block (need to swizzle all pointer from and to the block)

  • Swizzling on Demand: use addresses

which are invalid as memory addresses

6

slide-7
SLIDE 7

Data Organizaton

  • There are millions of ways to organize

the data on disk

  • Flexibility Space Utilization

Complexity Performance

7

slide-8
SLIDE 8

Summary 9

More things you should know:

  • Memory Hierarchy
  • Storage on harddisks
  • Values, Records, Blocks, Files
  • Storing and modifying records

8

slide-9
SLIDE 9

Index Structures

9

slide-10
SLIDE 10

Finding Records

  • How do we find the records for a query?
  • Example: SELECT * FROM Sells
  • Need to examine every block in every file
  • Group blocks into files by relation!
  • Example: SELECT * FROM Sells

WHERE price = 20;

  • Need to examine every block in the file

10

slide-11
SLIDE 11

Finding Records

  • Use of indexes allows to narrow search

to (almost) only the relevant blocks

11

Index Blocks Holding records Value Matching records

  • Indexes can be dense or sparse
slide-12
SLIDE 12

Dense Index

12

Sequential File 20 10 40 30 60 50 80 70 100 90 Dense Index

10 20 30 40 50 60 70 80 90 100 110 120

slide-13
SLIDE 13

Sparse Index

13

Sequential File 20 10 40 30 60 50 80 70 100 90 Sparse Index

10 30 50 70 90 110 130 150 170 190 210 230

2nd level

10 90 170 250 330 410 490 570

slide-14
SLIDE 14

40 30

  • Delete 40

Deletion from Sparse Index

14

20 10 60 50 80 70

10 30 50 70 90 110 130 150

40 30 30

slide-15
SLIDE 15
  • Delete 30

Deletion from Sparse Index

15

20 10 60 50 80 70

10 30 50 70 90 110 130 150

40 30 40 30 40 40

10 40 50 70 90 110 130 150

slide-16
SLIDE 16
  • Delete 30 & 40

Deletion from Sparse Index

16

20 10 60 50 80 70

10 30 50 70 90 110 130 150

40 30 40 30

10 50 70 90 110 130 150

slide-17
SLIDE 17
  • Insert 35

Insertion into Sparse Index

17

20 10 60 50 80 70

10 30 50 70 90 110 130 150

30 35 30

slide-18
SLIDE 18
  • Insert 25

Insertion into Sparse Index

18

20 10 60 50 80 70

10 30 50 70 90 110 130 150

35 30 25

slide-19
SLIDE 19

Sparse vs Dense

  • Sparse uses less index space per record

(can keep more of index in memory)

  • Sparse allows multi-level indexes
  • Dense can tell if record exists without

accessing it

  • Dense needed for secondary indexes
  • Primary index = order of records in storage
  • Secondary index = impose different order

19

slide-20
SLIDE 20

Secondary Index

20

Sequential File 40 20 20 10 30 50 50 10 20 60 Secondary Index

10 10 20 20 20 30 40 50 50 60

2nd level

10 20 50

Careful when Looking for 20

slide-21
SLIDE 21

Secondary Index

21

Sequential File 40 20 20 10 30 50 50 10 20 60 Secondary Index

10 20 30 40 50 60

2nd level

10 50

slide-22
SLIDE 22

Combining Indexes

  • Just intersect buckets in memory!

22

Beer index Sells Price index OC 20

  • SELECT * FROM Sells WHERE beer =

“Od.Cl.“ AND price = “20“

C.Ch.

slide-23
SLIDE 23

Conventional Indexes

  • Sparse, Dense, Multi-level, ...
  • Advantages:
  • Simple
  • Sequential index is good for scans
  • Disadvantage:
  • Inserts expensive
  • Lose sequentiality and balance

23

slide-24
SLIDE 24

Example: Unbalanced Index

24

10 20 30 40 50 60 70 80 90 33 39 31 35 36 32 38 34

  • verflow area

(not sequential)

slide-25
SLIDE 25

B+Trees

25

slide-26
SLIDE 26

Idea

  • Conventional indexes are fixed-level
  • Give up sequentiality of the index in

favour of balance

  • B+Tree = variant of B-Tree
  • Allows index tree to grow as needed
  • Ensures that all blocks are between half

used and completely full

26

slide-27
SLIDE 27

Characteristics

  • Parameter n determines number of keys

and pointers per node

  • Key size 4 and pointer size 8 allows for

maximal n = 340 (4n + 8(n+1) < 4096)

  • Leafs contain at least n/2 key-pointer pairs

to records and a pointer to the next leaf

  • Interior nodes contain at least (n-1)/2 keys

and at least n/2 pointers to other nodes

  • No restrictions for the root node

27

slide-28
SLIDE 28

Example: B+Tree (n=3)

28

3 6 9 23 31 37 11 15 17 64 85 42 57 64 11 23 42

slide-29
SLIDE 29

Example: Leaf node

29

42 57

To record With key 42 To record With key 57 To next leaf

slide-30
SLIDE 30

Example: Interior node

30

To keys K < 11 To keys 11 ≤ K < 23

11 23

To keys 23 ≤ K

slide-31
SLIDE 31

Restrictions

31

Full node

  • min. node

Non-leaf Leaf

11 23 42 64 11 15 17 64 85

Counts even when null

slide-32
SLIDE 32

Insertion

  • If there is place in the appropriate leaf,

just insert it there

  • Otherwise:
  • Split the leaf in two and divide the keys
  • Insert the smallest value reachable through

the right node into the parent node

  • Recurse until there is enough room
  • Special case: Splitting the root results in

a new root

32

slide-33
SLIDE 33

Example: Insertion

  • Insert 85

33

3 6 9 11 17 23 31 37 42 57 11 23 42 42 57 85

slide-34
SLIDE 34

Example: Insertion

  • Insert 15

34

3 6 9 11 17 23 31 37 11 23 42 42 57 85 11 15 17

slide-35
SLIDE 35

Example: Insertion

  • Insert 64

35

3 6 9 23 31 37 11 23 42 42 57 85 11 15 17 64 85 42 57 64 11 23 42 42

slide-36
SLIDE 36

Deletion

  • If there are enought keys left in the

appropriate leaf, just delete the key

  • Otherwise:
  • If there is a direct sibling with more than

minimum key, steal one!

  • If not, join the node with a direct sibling and

delete the smallest value reachable through the former right sibling from its parent

  • Special case: If the root contains only
  • ne pointer after deletion, delete it

36

slide-37
SLIDE 37

Example: Deletion

  • Delete 9

37

3 6 9 23 31 37 11 15 17 64 85 42 57 64 11 23 42 3 6 9 3 6

slide-38
SLIDE 38

Example: Deletion

  • Delete 3

38

3 6 23 31 37 11 15 17 64 85 42 57 64 11 23 42 3 6 6 6 11 15 17 15 23

slide-39
SLIDE 39

Example: Deletion

  • Delete 11

39

6 11 23 31 37 15 17 64 85 42 57 64 15 23 42 6 11 6 6 15 17 23

slide-40
SLIDE 40

Example: Deletion

  • Delete 17, 37

40

23 31 37 64 85 42 57 64 42 6 15 17 23 6 15 23 31

slide-41
SLIDE 41

Example: Deletion

  • Delete 31

41

64 85 42 57 64 42 23 6 15 23 31 23 6 15 23 6 15 23 42 64

slide-42
SLIDE 42

Efficiency

  • Need to load one block for each level!
  • With n = 340 and an average fill of 255

pointers, we can index 255^3 = 16.6 million records in only 3 levels

  • There are at most 342 blocks in the first

two levels

  • First two levels can be kept in memory

using less than 1.4 Mbyte

  • Only need to access one block!

42

slide-43
SLIDE 43

Range Queries

  • Queries often restrict an attribute to a

range of values

  • Example:

SELECT * FROM Sells WHERE beer > 20;

  • Records are found efficiently by searching

for value 20 and then traversing the leafs

  • Can also be used if there is both an upper

and a lower limit

43

slide-44
SLIDE 44

Summary 10

More things you should know:

  • Dense Index, Sparse Index
  • Multi-Level Indexes
  • Primary vs Secondary Index
  • Structure of B+Trees
  • Insertion and Deletion in B+Trees

44