Indexes Database Systems: The Complete Book Ch. 13.1-13.3, - - PowerPoint PPT Presentation

indexes
SMART_READER_LITE
LIVE PREVIEW

Indexes Database Systems: The Complete Book Ch. 13.1-13.3, - - PowerPoint PPT Presentation

Indexes Database Systems: The Complete Book Ch. 13.1-13.3, 14.1-14.2 1 2 3 $88 $24 4 $88 $24 Hardcover (heavy) Paperback (light) 5 $88 $24 Hardcover (heavy) Paperback (light) Bigger Small 6 $88 $24 Hardcover (heavy) Paperback


slide-1
SLIDE 1

Indexes

Database Systems: The Complete Book

  • Ch. 13.1-13.3, 14.1-14.2

1

slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

slide-4
SLIDE 4

4

$88 $24

slide-5
SLIDE 5

5

$88 $24 Hardcover (heavy) Paperback (light)

slide-6
SLIDE 6

6

$88 $24 Hardcover (heavy) Paperback (light) Bigger Small

slide-7
SLIDE 7

7

$88 $24 Hardcover (heavy) Paperback (light) Bigger Small Good ToC/Index Bad ToC/Index

slide-8
SLIDE 8

The Memory Hierarchy

Fast (but small) Big (but slow)

8

slide-9
SLIDE 9

Data Organization

Heap Clustered/ Sorted Indexed Records stored in any order

Records grouped together or stored in sorted order,

Secondary file used to organize data records What are the benefits/drawbacks of each method? When do we use each method? Does it matter what medium the data is being stored on?

9

slide-10
SLIDE 10

IO Operations are Bad

10

slide-11
SLIDE 11

Recap / GroupWork

SELECT o.FirstName, o.LastName FROM Officers o WHERE o.Rank >= 3 AND ( o.Ship = 1701 OR o.Ship = 2000 )

What is an equivalent Relational Algebra expression?

What is the maximum working set size? What is the time complexity?

11

slide-12
SLIDE 12

Query Evaluation

  • A query plan identifies the evaluation path.
  • Individual operators express primitive operations.
  • Select, project, join, sort, etc…
  • Individual operators can be evaluated in isolation.
  • e.g., Select: Drop rows that fail the predicate
  • … but sometimes combinations of operators are better.
  • e.g., Select+Cross Product vs Join

12

slide-13
SLIDE 13

Let’s Consider Select…

SELECT o.FirstName, o.LastName FROM Officers o WHERE o.Rank >= 3 AND ( o.Ship = 1701 OR o.Ship = 2000 )

How would you evaluate this query? How would you organize the data for this query?

13

slide-14
SLIDE 14

14

Problem

Select searches for data Checking every data value is correct, but not efficient

Solution

Organize the data!

What are some ways of organizing the data?

slide-15
SLIDE 15

Organizing the Data

  • Solution 1: Sort
  • Store the data sorted
  • Solution 2: Partition (e.g., Hash)
  • Deterministically create ‘buckets’ of data.
  • Solution 3: Organize References
  • Store/organize ‘pointers’ to the data.

What are some pros and cons for each solution?

15

slide-16
SLIDE 16

Indexing (high level)

1, 5 10,5 3,9 1,8 5,1 6,1

16

A, B

slide-17
SLIDE 17

Indexing (high level)

1, 5 10,5 3,9 1,8 5,1 6,1 Data Sorted on A

(clustered index)

Pointers Sorted on B

(unclustered index)

5 9 8 1 1 5 Want Efficient Lookups on Both A and B!

17

slide-18
SLIDE 18

Back to Select

σA = 1 σA < 1 σA = 1 AND B = 2

How would you sort your data for… (and how would you evaluate it)

18

slide-19
SLIDE 19

Data Organization

  • Each clause in a CNF boolean formula must be true.
  • API: Give me all records (or record IDs) that satisfy this

predicate (these predicates)

  • Equality search: All records with field X = ‘Y’
  • Officer.Ship = ‘1701A’
  • Range search: All records with field X ∈ [Y, Z]
  • Officer.Rank ∈ [3, +∞)

19

slide-20
SLIDE 20

Problem…

20

Let’s say you have 220 blocks (~4GB) of data sorted on A How many IOs are required to find one A? In general, for N blocks, how many IOs? log2(N)

Why?

slide-21
SLIDE 21

“Searching”

21

1 2 3 4 5 6 7 8 9 10 All Things Things < 5 Things > 5

Things < 2 2 < Things < 5

“Find 3”

As you search, you are effectively building a binary tree.

slide-22
SLIDE 22

Shorter Trees

22

Binary Tree → Log 2 Depth N-ary Tree → Log N Depth

slide-23
SLIDE 23

Tree-Based Indexes

23

slide-24
SLIDE 24

The ISAM Datastructure

24

… … … … … … …

Non-Leaf Pages Leaf Pages

p0 p1 p2 p3 p4 k1 k2 k3 k4 … Non-Leaf Page

Leaf Pages contain <K, RID> or <K, Record> pairs

slide-25
SLIDE 25

Constructing an ISAM Index

25

1) Allocate (sequential) leaf pages 2) Ensure that the data on the leaf pages is sorted 3) Build the non-leaf pages (in arbitrary order)

… … … … … … …

slide-26
SLIDE 26

ISAM Index Searches

26

Equality: Start at root, use key comparisons to find leaf Range: Use key comparisons to find start and end page Scan all pages in between start/end leaves.

… … … … … … …

slide-27
SLIDE 27

Constructing an ISAM Index

27

… … … … … … …

Do you see any problems with this?

slide-28
SLIDE 28

Updating an ISAM Index

28

… … … … … … …

1) When creating the index leave free space in each leaf page 2) The index stays the same, new data is added to the free space 3) If a leaf page overflows, we create an overflow page (or more)

slide-29
SLIDE 29

An Example ISAM

29

10,15 33 20 40 63 51 20,27 33,37 40,46 23 , 48 41 42 51,55 63,97

slide-30
SLIDE 30

B+ Trees

30

Index Entries Data Entries

Data pages not sequential - Need linked list for traversals

slide-31
SLIDE 31

B+ Trees

31

2, 3,5, 7

17 13 30 24

14,16,_,_ 19,20,22,_ 24,27,29,_ 33,34,38,39

Search proceeds as in ISAM via key comparisons Find 5. Find 15. Find [24,∞)

slide-32
SLIDE 32

B+ Tree Invariants

32

  • Keep space open for insertions in inner/data nodes.
  • ‘Split’ nodes when they’re full
  • Avoid under-using space
  • ‘Merge’ nodes when they’re under-filled
  • Maintain Invariant: All Nodes ≥ 50% Full
  • (Exception: The Root)
slide-33
SLIDE 33

Example

33

Inner Nodes: 4 values, 5 pointers Data Nodes: 4 values

slide-34
SLIDE 34

17 13 30 24

Inserting into B+ Trees

34

2, 3,5, 7 14,16 19,20,22 24,27,29 33,34,38,39

Insert 8

slide-35
SLIDE 35

17 13 30 24

Inserting into B+ Trees

35

2, 3,5, 7 2, 3 5,7,8 ,

5

Copy <5> into parent index Insert 8

slide-36
SLIDE 36

17 13 30 24

Inserting into B+ Trees

36

Copy <5> into parent index

5

Move <17> into parent index : Root Split!

slide-37
SLIDE 37

17 13 30 24

Inserting into B+ Trees

37

Copy <5> into parent index

5

Move <17> into parent index : Root Split!

slide-38
SLIDE 38

Inserting into B+ Trees

38

5, 7,8 14,16 19,20,22 24,27,29 33,34,38,39

13 30 24 5 17

2, 3

Why do we move, rather than copy the 17? Are we guaranteed to satisfy our occupancy guarantee?

slide-39
SLIDE 39

Deleting from B+ Trees

39

5, 7,8 14,16 19,20,22 24,27,29 33,34,38,39

13 30 24 5 17

2, 3

Delete 19 Delete 20

20,22 22 22,24 27,29

27

slide-40
SLIDE 40

27

Deleting from B+ Trees

40

5, 7,8 14,16 33,34,38,39

13 30 5 17

2, 3

Delete 24

22,24 27,29 22 22,27,29

slide-41
SLIDE 41

Non-Leaf Redistribution

41

5, 7,8 14,16 22,27,29

20 13 5 17 22

2, 3 17,18 20,21

30

33,34,38

slide-42
SLIDE 42

Non-Leaf Redistribution

42

5, 7,8 14,16 22,27,29

20 13 5 17 22

2, 3 17,18 20,21

30

33,34,38

Intuitively, we rotate index entries 17-22 through the root