Database System Architecture: Index Structures
Hector Garcia-Molina, Stijn Vansummeren
Index structure
- Any data structure that takes as input a search key and efficiently returns the collection of matching records
Sequential File
[Figure: records 10, 20, …, 100 stored two per block]
Dense Index
[Figure: sequential file as above; the dense index holds one entry per record: 10, 20, 30, …, 120]
Sparse Index
[Figure: sequential file as above; the sparse index holds one entry per block: 10, 30, 50, …, 230]
Sparse 2nd Level
[Figure: first-level sparse index 10, 30, …, 230; a second sparse level indexes the first: 10, 90, 170, 250, …, 570]
Question:
- Can we build a dense, 2nd-level index for a dense index?
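The dense/sparse contrast can be sketched in code. A minimal sketch in Python over the running example (records 10…100, two per block); the list-based blocks and the helper names `dense_lookup`/`sparse_lookup` are assumptions for illustration, not from the slides:

```python
# Dense vs. sparse index lookup over the running example.
from bisect import bisect_right

blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
dense  = [(k, b) for b, blk in enumerate(blocks) for k in blk]  # one entry per record
sparse = [blk[0] for blk in blocks]                             # one entry per block

def dense_lookup(key):
    # Binary-search the index; a matching entry names the block directly,
    # and a miss is known without ever touching the file.
    keys = [k for k, _ in dense]
    i = bisect_right(keys, key) - 1
    return dense[i][1] if i >= 0 and keys[i] == key else None

def sparse_lookup(key):
    # Find the last block whose first key <= key, then scan that block.
    b = bisect_right(sparse, key) - 1
    return b if b >= 0 and key in blocks[b] else None

print(dense_lookup(60), sparse_lookup(60))  # 2 2
print(dense_lookup(65), sparse_lookup(65))  # None None
```

Note that the sparse index must read a data block to rule out key 65, while the dense index answers from the index alone.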
Sparse vs. Dense Tradeoff
- Sparse: less index space per record, so more of the index can be kept in memory
- Dense: can tell if a record exists without accessing the file
(Later:
– sparse better for insertions
– dense needed for secondary indexes)
Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes
Duplicate keys
[Running example: sequential file with duplicates, blocks (10,10) (10,20) (20,30) (30,30) (40,45)]

Dense index, one way to implement?
[Figure: one index entry per record: 10, 10, 10, 20, 20, 30, 30, 30, …]

Dense index, better way?
[Figure: one index entry per distinct key: 10, 20, 30, 40]

Sparse index, one way?
[Figure: one entry per block, holding the block’s first key: 10, 10, 20, 30 (careful if looking for 20 or 30!)]

Sparse index, another way?
[Figure: place the first new key from each block: 10, 20, 30, 30]
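The “better way” dense index (one entry per distinct key, pointing at the first matching record) can be sketched as follows; the list-based layout and the `first_of` helper are assumptions for illustration:

```python
# Dense index over a file with duplicate keys: one entry per distinct key.
from bisect import bisect_left

records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]  # sorted, with duplicates

# Index: (key, position of the first record with that key).
index = []
for pos, k in enumerate(records):
    if not index or index[-1][0] != k:
        index.append((k, pos))

def first_of(key):
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    return index[i][1] if i < len(keys) and keys[i] == key else None

print(first_of(30))  # 5
```

Because the file is sorted, all duplicates of a key are found by scanning forward from the returned position.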
Deletion from sparse index
[Running example: file blocks (10,20) (30,40) (50,60) (70,80); sparse index 10, 30, 50, 70, 90, 110, 130, 150]

- Delete record 40: remove it from its block; since 40 is not the first key of its block, the index is unchanged.
- Delete record 30: 40 becomes the first record of its block, so index entry 30 is updated to 40.
- Delete records 30 & 40: the block becomes empty; its index entry is removed and the following entries (50, 70, …) move up.
Deletion from dense index
[Running example: same file; dense index 10, 20, 30, 40, 50, 60, 70, 80]

- Delete record 30: remove its index entry; record 40 slides up in its block, and the remaining index entries shift up accordingly.
Insertion, sparse index case
[Running example: file blocks (10,20) (30) (40,50) (60); sparse index 10, 30, 40, 60]

- Insert record 34: our lucky day! We have free space where we need it: 34 goes into the block starting with 30, and the index is unchanged.
- Insert record 15: the first block is full. Illustrated: immediate reorganization, where record 20 is pushed into the next block and index entry 30 becomes 20. Variation: insert a new block (chained file) and update the index.
- Insert record 25: use an overflow block (reorganize later…).
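The three insertion outcomes (free space, immediate reorganization, overflow block) can be sketched with fixed-capacity blocks; `CAP`, the block layout, and the single-step reorganization policy are simplifying assumptions, not the slides’ exact algorithm:

```python
# Insertion into a sequential file with a sparse index, three outcomes.
CAP = 2  # records per block (assumed)

def insert(blocks, overflow, key):
    # Target block: last block whose first key <= key.
    b = max((i for i, blk in enumerate(blocks) if blk and blk[0] <= key),
            default=0)
    if len(blocks[b]) < CAP:                        # (a) free space: lucky day
        blocks[b] = sorted(blocks[b] + [key])
    elif b + 1 < len(blocks) and len(blocks[b + 1]) < CAP:
        blk = sorted(blocks[b] + [key])             # (b) immediate reorganization:
        blocks[b] = blk[:CAP]                       #     push largest key onward
        blocks[b + 1] = sorted(blocks[b + 1] + blk[CAP:])
    else:
        overflow.setdefault(b, []).append(key)      # (c) overflow block

blocks, ovf = [[10, 20], [30], [40, 50], [60]], {}
insert(blocks, ovf, 34)
print(blocks)  # [[10, 20], [30, 34], [40, 50], [60]]

blocks2, ovf2 = [[10, 20], [30], [40, 50], [60]], {}
insert(blocks2, ovf2, 15)
print(blocks2)  # [[10, 15], [20, 30], [40, 50], [60]]

insert(blocks, ovf, 25)  # block 0 full, block 1 now full too: overflow
print(ovf)  # {0: [25]}
```

After case (b) the sparse index entry for the second block must change from 30 to 20, mirroring the slide.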
Insertion, dense index case
- Similar
- Often more expensive . . .
Secondary indexes
[Running example: file sequenced on another field; secondary-key values appear in the order 50, 30, 70, 20, 40, 80, 10, 100, 60, 90]

- Sparse index (entries 30, 20, 80, 100, 90, …) does not make sense! The file is not sorted on the secondary key, so a block’s first key says nothing about the block’s contents.
Secondary indexes
[Same file: 50, 30, 70, 20, 40, 80, 10, 100, 60, 90]

- Dense index: sorted entries 10, 20, 30, 40, 50, 60, 70, … pointing to the records, plus a sparse high level 10, 50, 90, …
With secondary indexes:
- Lowest level is dense
- Other levels are sparse
Also: Pointers are record pointers (not block pointers; not computed)
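A dense secondary index over an unsorted file can be sketched as sorted (key, record-pointer) pairs; here list positions stand in for record pointers, an assumption for illustration:

```python
# Dense secondary index with record pointers over an unsorted file.
from bisect import bisect_left

records = [50, 30, 70, 20, 40, 80, 10, 100, 60, 90]  # secondary-key values

# Dense lowest level: sorted (key, record pointer) pairs.
dense = sorted((key, pos) for pos, key in enumerate(records))

def find(key):
    i = bisect_left(dense, (key, -1))
    return dense[i][1] if i < len(dense) and dense[i][0] == key else None

print(find(40))  # 4: record with key 40 sits at position 4 in the file
```

A sparse level could then be built on top of `dense` itself, since the index, unlike the file, is sorted.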
Duplicate values & secondary indexes
[Running example: file with secondary-key values 10, 20, 40, 20, 40, 10, 40, 10, 40, 30]

- One option: a dense entry per record: 10, 10, 10, 20, 20, 30, 40, 40, 40, 40, … Problem: excess overhead!
  - disk space
  - search time
- Another option: one entry per distinct key, each followed by all its record pointers. Problem: variable-size records in the index!
- Buckets: one entry per distinct key (10, 20, 30, 40, 50, 60, …), each pointing into a bucket of record pointers.
Why “bucket” idea is useful
Indexes on EMP(name, dept, floor, …): name is the primary index; dept and floor are secondary indexes.
Query: Get employees in (Toy Dept) ^ (2nd floor)
→ Intersect the Toy bucket of the dept index with the 2nd-floor bucket of the floor index to get the set of matching EMP’s
This idea is used in text information retrieval.
Documents: “…the cat is fat…”, “…was raining cats and dogs…”, “…Fido the dog…”
Inverted lists: cat → {documents mentioning “cat”}, dog → {documents mentioning “dog”}
IR QUERIES
- Find articles with “cat” and “dog”
- Find articles with “cat” or “dog”
- Find articles with “cat” and not “dog”
- Find articles with “cat” in title
- Find articles with “cat” and “dog” within 5 words
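The boolean queries above map directly onto set operations over inverted lists. A sketch using the three documents from the slide; the crude plural-stripping “stemmer” is an assumption for the example only:

```python
# Inverted lists for boolean IR queries.
docs = {
    1: "the cat is fat",
    2: "was raining cats and dogs",
    3: "Fido the dog",
}

inverted = {}
for d, text in docs.items():
    for w in text.lower().split():
        stem = w.rstrip("s") or w  # crude stemming: strip trailing 's'
        inverted.setdefault(stem, set()).add(d)

cat, dog = inverted["cat"], inverted["dog"]
print(sorted(cat & dog))  # "cat" and "dog":     [2]
print(sorted(cat | dog))  # "cat" or "dog":      [1, 2, 3]
print(sorted(cat - dog))  # "cat" and not "dog": [1]
```

Title and proximity queries need richer postings (field tags, word positions), but the bucket-intersection idea is the same.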
Summary so far
- Conventional index
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/insertion
– Secondary indexes
– Buckets of postings lists
Outline/summary
- Conventional Indexes
- Sparse vs. dense
- Primary vs. secondary
- B trees (→ next)
- B+trees vs. indexed sequential
- Hashing schemes
Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
Example: index (sequential)
[Figure: sequential index 10, 20, …, 90 with continuous free space; after inserting 31–39, the new entries land in an overflow area (not sequential)]

NEXT: another type of index
– Give up on sequentiality of index
– Try to get “balance”
B+Tree Example, n=3
[Figure: root [100]; left child [30]; right child [120|150|180]; leaves (3,5,11) (30,35) (100,101,110) (120,130) (150,156,179) (180,200), chained by sequence pointers]
Sample non-leaf node:
[Figure: keys 57 | 81 | 95; the four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95]
Sample leaf node:
[Figure: keys 57 | 81 | 95 with pointers to the records with keys 57, 81, and 95; one incoming pointer from a non-leaf node; the last pointer links to the next leaf in sequence]
In textbook’s notation, n=3:
[Figure: a leaf (30,35) and a non-leaf (30,35) drawn in the textbook’s box notation]
Lookup record(s) with key = 35, n=3
[Figure: same tree; the search follows root [100] → [30] → leaf (30,35), where 35 is found]

Lookup record(s) with key = 40, n=3
[Figure: same tree; the search reaches leaf (30,35) and finds no 40]

Range query: lookup record(s) with 35 <= key <= 150, n=3
[Figure: same tree; locate the leaf containing 35, then follow the leaf sequence pointers until a key exceeds 150]
- The I/O cost of a lookup in a B+tree equals the length of the longest path from the root to a leaf
- Hence, the goal is to keep this longest path as short as possible
- In particular: we want all leaves at the same depth in the tree (and hence want a balanced tree)
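A lookup over the n=3 example tree can be sketched as follows; the dict-based node layout is an assumption for illustration, not the textbook’s representation:

```python
# B+tree lookup: descend from the root to the one leaf that could hold the key.
from bisect import bisect_right

def lookup(node, key):
    while not node["leaf"]:
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        i = bisect_right(node["keys"], key)
        node = node["children"][i]
    return key in node["keys"]

# The n=3 example tree from the slides:
leaves = [{"leaf": True, "keys": ks} for ks in
          [[3, 5, 11], [30, 35], [100, 101, 110],
           [120, 130], [150, 156, 179], [180, 200]]]
for a, b in zip(leaves, leaves[1:]):
    a["next"] = b  # sequence pointers for range queries
left  = {"leaf": False, "keys": [30], "children": leaves[0:2]}
right = {"leaf": False, "keys": [120, 150, 180], "children": leaves[2:6]}
root  = {"leaf": False, "keys": [100], "children": [left, right]}

print(lookup(root, 35))  # True
print(lookup(root, 40))  # False
```

Every lookup visits exactly one node per level, which is why keeping the tree shallow and balanced bounds the I/O cost.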
Size of nodes: n keys, n+1 pointers (fixed)

Don’t want nodes to be too empty:
- Non-leaf: use at least ⌈(n+1)/2⌉ pointers
- Leaf: use at least ⌊(n+1)/2⌋ pointers to data

[Figure: full node vs. minimum node for n=3, non-leaf and leaf; all n+1 pointer slots count, even if null]
B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
(3) Number of pointers/keys for B+tree:

                     Max ptrs  Max keys  Min ptrs (→ data)  Min keys
Non-leaf (non-root)  n+1       n         ⌈(n+1)/2⌉          ⌈(n+1)/2⌉ − 1
Leaf (non-root)      n+1       n         ⌊(n+1)/2⌋          ⌊(n+1)/2⌋
Root                 n+1       n         2                  1
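The occupancy table can be computed for any n; the rounding conventions (non-leaf rounds up, leaf rounds down) follow the usual B+tree rules:

```python
# Max/min occupancy bounds for B+tree nodes of order n.
from math import ceil, floor

def bplus_bounds(n):
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil((n + 1) / 2),
                     "min_keys": ceil((n + 1) / 2) - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_data_ptrs": floor((n + 1) / 2),
                     "min_keys": floor((n + 1) / 2)},
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 2, "min_keys": 1},
    }

print(bplus_bounds(3))  # for n=3: non-leaf min 2 ptrs / 1 key; leaf min 2 keys
```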
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
(a) Insert key = 32, n=3
[Figure: 32 fits in leaf (30,31), giving (30,31,32); no change to the index above]
(b) Insert key = 7, n=3
[Figure: leaf (3,5,11) overflows and splits into (3,5) and (7,11); key 7 is copied up into the parent]
(c) Insert key = 160, n=3
[Figure: leaf (150,156,179) splits into (150,156) and (160,179); inserting 160 overflows the full non-leaf (100,120,150,180), which splits in turn, pushing a key up; 180 ends up in the new right non-leaf]
(d) New root: insert key = 45, n=3
[Figure: leaves (1,2,3) (10,12) (20,25) (30,32,40) under non-leaf (10,20,30); leaf (30,32,40) splits into (30,32) and (40,45), the full non-leaf splits in turn, and a new root with key 30 is created]
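The leaf-overflow step (case b) can be sketched as a single split function; the list-based leaf is an assumption, while the split point ⌈(n+1)/2⌉ follows the minimum-occupancy rules above:

```python
# Splitting an overfull B+tree leaf: left keeps ceil((n+1)/2) keys,
# and the first key of the new right leaf is copied (not moved) upward.
from math import ceil

def split_leaf(keys, n):
    """keys has n+1 entries (one too many). Return (left, right, key_up)."""
    cut = ceil((n + 1) / 2)
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]

print(split_leaf([3, 5, 7, 11], n=3))  # ([3, 5], [7, 11], 7)
```

This reproduces example (b): inserting 7 into leaf (3,5,11) yields leaves (3,5) and (7,11) with key 7 copied into the parent.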
Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
(b) Coalesce with sibling: delete 50, n=4
[Figure: leaves (10,20,30) and (40,50) under non-leaf (10,40,100); after deleting 50, the underfull leaf (40) is merged into (10,20,30,40), and separator key 40 is dropped from the parent]
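Case (b) can be sketched as a merge plus separator removal; the list-based representation is an assumption for illustration:

```python
# Coalescing an underfull leaf with its left sibling:
# merge the keys and drop the separator key from the parent.
def coalesce_leaves(left, right, parent_keys, sep_index):
    """Merge `right` into `left`; remove parent_keys[sep_index]."""
    left = left + right
    parent_keys = parent_keys[:sep_index] + parent_keys[sep_index + 1:]
    return left, parent_keys

# Example (b): after deleting 50, leaf (40) merges into (10,20,30).
print(coalesce_leaves([10, 20, 30], [40], [10, 40, 100], 1))
# ([10, 20, 30, 40], [10, 100])
```

If the merged parent drops below its own minimum, the same step repeats one level up, which is exactly case (d).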
(c) Redistribute keys: delete 50, n=4
[Figure: leaves (10,20,30,35) and (40,50); after deleting 50, the underfull leaf borrows 35 from its sibling, and the parent separator 40 becomes 35]
(d) Non-leaf coalesce: delete 37, n=4
[Figure: leaves (1,3) (10,14) (20,22) (25,26) (30,37) (40,45) under non-leaves with root key 25; after deleting 37, leaf (30) coalesces with a sibling, the underfull non-leaf coalesces in turn, and the tree loses a level, giving a new root]
Outline/summary
- Conventional Indexes
- Sparse vs. dense
- Primary vs. secondary
- B trees
- B+trees vs. indexed sequential
- Hashing schemes (→ next)