1
Overview of Storage and Indexing
CMPSCI 645 Feb 28, 2008
Slides Courtesy of R. Ramakrishnan and J. Gehrke
Overview of Storage and Indexing CMPSCI 645 Feb 28, 2008 Slides - - PowerPoint PPT Presentation
Overview of Storage and Indexing CMPSCI 645 Feb 28, 2008 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query Executor File & Access Methods Lock Manager Log Manager
1
Slides Courtesy of R. Ramakrishnan and J. Gehrke
2
Disk Space Manager
File & Access Methods
Buffer Manager Query Parser Query Rewriter Query Optimizer Query Executor Lock Manager Log Manager
3
Disks: Can retrieve random page at fixed cost
Tapes: Can only read pages in sequence
Page: Unit of information read from or written to disk
Disk space manager:
Page I/O:
4
5
Access methods: routines to manage various disk-based
File of records:
Indexes:
6
7
An index on a file speeds up selections on the search
An index contains a collection of data entries, and
8
Leaf pages contain data entries, and are chained (prev & next) Non-leaf pages have index entries; only used to direct searches:
P0 K 1 P 1 K 2 P 2 K m P m
index entry
Non-leaf Pages Pages (Sorted by search key) Leaf
9
Equality selection: find 28*? 29*? Range selection: find all > 15* and < 30* Insert/delete: Find data entry in leaf, then change it.
2* 3*
Root
17
30 14* 16* 33* 34* 38* 39* 13 5 7* 6* 8* 22* 24* 27 27* 29*
10
h 1 2 3 … … … … … … N-1
11
12
13
14
15
16
Index entries Data entries direct search for (Index File) (Data file) Data Records data entries Data entries Data Records
CLUSTERED UNCLUSTERED
17
18
19
20
21
22
# of leaf pages: B*0.1/67%=0.15B; Cost of index I/O: 0.15BD # of data entries on a leaf page: R*10*0.67=6.7R; Cost of data I/O: 0.15B*6.7R*D=BDR
# of data entry pages: B*0.1/80%=0.125B; Cost of index I/O: 0.125BD # of data entries on a hash page: R*10*0.80=8R; Cost of data I/O: 0.125B*8R*D=BDR
D(log2B + #matching pages)
D(logF1.5B + #matching pages)
.15B)
D(logF.15B + #matching recs)