Data Management Systems
- Access Methods
- Pages and Blocks
- Indexing
- Access Methods in context
Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich
1 Indexing
Hashing B+ trees Other indexing techniques
Data Management Systems Access Methods Hashing Pages and Blocks - - PowerPoint PPT Presentation
Data Management Systems Access Methods Hashing Pages and Blocks Indexing B+ trees Access Methods in context Other indexing techniques Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zrich
Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich
1 Indexing
Hashing B+ trees Other indexing techniques
Indexing 2
databases
cache, lock table)
something is located
Indexing 3
Indexing 4
https://en.wikipedia.org/wiki/Hash_function https://en.wikipedia.org/wiki/Hash_table
database, however, the hash function has to be computationally cheap since it is used very often
cardinality of the attribute (4 byte keys = 4 GB table)
does not work for almost anything else)
Indexing 5
another entry in a linked list
efficient
the linked lists (more blocks but linked list traversal within the same block)
Indexing 6
a collision occurs, we look for an empty slot in the hash table using some rule
next slot(s)
functions, if collisions with first, use the second, if …
Indexing 7
Indexing 8
Indexing 9
00 01 10 11 Hash buckets (2 bits) h Database blocks 100110110 Hashed key Hash function 1
Indexing 10
00 01 10 11 Hash buckets (2 bits) h Database blocks 100110110 Hashed key Hash function 00 10 1
more space, allowing for more splitting
Indexing 11
000 001 010 011 Hash buckets (3 bits) h Database blocks 100110110 Hashed key Hash function 1 100 101 110 111
data block
unneeded entries
Indexing 12
Indexing 13
00 01 10 11 Hash buckets (2 bits) Database blocks Split pointer
A split pointer is used to indicate which bucket will be split in case of overflow
Indexing 14
000 01 10 11 Hash buckets (2 bits) Database blocks Split pointer
When overflow occurs Chain the block that overflows Split the bucket indicated by the pointer Move the pointer
001 Entries above the split pointer use a second hash function that targets the expanded range (mod n, mod 2n)
the data
split, the data will be reorganized
length, etc.
page (instead of doubling)
Indexing 15
chaining)
points to the actual data blocks
Indexing 16
Indexing 17
https://w6113.github.io/files/papers/btreesurvey-graefe.pdf
between k/2 and k child nodes
(unless it is also a leaf)
but B+ trees; even if they say they use B-trees!!
Indexing 18
https://en.wikipedia.org/wiki/B-tree#/media/File:B-tree.svg
pages for variable length data
Indexing 19
From Oracle documentation (11g)
where “value” is the value that is being indexed and “key” is the pointer to the tuple, typically as a row id or tuple id (recall: this is at least a block id and an offset)
as above)
the key. If there is no key, the engine assigns random keys and indexes them
Indexing 20
the index indicates => table is physically stored in a sorted manner
Indexing 21
attributes or a table is searched by several attributes
(a1,b1) < (a2,b2) <=> (a1 < b1) V ( a1 = b1 ˄ a2 < b2 )
beginning of the key
Indexing 22
department_id, last_name, salary
attributes (in that order)
sorted order
table and result is already sorted
Indexing 23
SELECT department_id, last_name, salary FROM employees WHERE salary > 5000 ORDER BY department_id, last_name;
not unique (e.g., the department in the table of students of ETH)
duplicates
entries
tuple the entry refers too (otherwise, they are all the same)
Indexing 24
the root
binary search to look for the correct entry
pointer
Indexing 25
Indexing 26
SELECT * FROM T WHERE T.x > 11 AND t.x < 222 Find first tuple that matches Traverse leaves until last match found
leaf
Indexing 27
leaf
both leaves
Indexing 28
creates conflicting, concurrent accesses
Indexing 29
to propagate all the way back to the root
Indexing 30
tree
Indexing 31
constantly being produced, inserting into a B+ tree is a problem:
copies of the block are produced
4321 and 5321)
Indexing 32
type of the attribute, this can become expensive
that have the same effect
Indexing 33
aacd bdsr dftq psta bdsr aacd bdsr dftq psta c
likely to be very similar (same as Frame of Reference compression)
Indexing 34
bdse bdsm bdso bdsr bdsb bdsm bdsr b c r bds bdse bdsm bdso bdsr
Indexing 35
good for slow storage devices
fast storage
Indexing 36
indexes
size than the data blocks
the index affects working on the data
the data
Indexing 37
Indexing 38
the results; low selectivity => many tuples in the results)
Indexing 39
value with an array where position i is set to 1 if tuple i contains that value:
Indexing 40
Name Country Juan Spain Mary UK Ueli Switzerland 1 … Spain 1 … UK 1 … Spain
indexes for low selectivity queries
Indexing 41
and store some basic statistics:
data needed is within that group
table without having to read all the data, only the aggregates
Indexing 42
Indexing 43