CSE 373: B-trees — Michael Lee — Wednesday, Jan 31, 2018



SLIDE 1

CSE 373: B-trees

Michael Lee Wednesday, Jan 31, 2018

1

SLIDE 2

Motivation

What we’ve done so far: study different dictionary implementations:
◮ ArrayDictionary
◮ SortedArrayDictionary
◮ Binary search trees
◮ AVL trees
◮ Hash tables
They all make one common assumption: all our data is stored in memory, in RAM.

2



SLIDE 5

Motivation

New challenge: what if our data is too large to store all in RAM? (For example, if we were trying to implement a database.) How can we do this efficiently? Two techniques:
◮ A tree-based technique: excels at range lookups (e.g. “find all users with an age between 20 and 30”, where “age” is the key)
◮ A hash-based technique: excels at specific key-value pair lookups

3

SLIDE 6

A tree-based technique

Idea 1: Use an AVL tree. Suppose the tree has a height of 50. In the best case, how many disk accesses do we need to make? In the worst case? In the best case, the nodes we want happen to be stored in RAM, so we need zero accesses. In the worst case, each node is stored on a different page on disk, so we need to make 50 accesses.

4


SLIDE 8

M-ary search trees

Idea 1:
◮ Instead of having each node have 2 children, make it have M children. Each node contains a sorted array of children nodes.
◮ Pick M so that each node fits into a single page.
Example: (diagram of an M-ary search tree omitted)

5



SLIDE 11

M-ary search trees

◮ What is the height of an M-ary search tree in terms of M and n? Assume the tree is balanced. The height is approximately logM(n).
◮ What is the worst-case runtime of get(...)? We need to examine logM(n) nodes. Per node, we need to find the child to pick. We can do so using binary search: log2(M). Total runtime: height · workPerNode = logM(n) · log2(M).

6
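As a sanity check on this analysis, here is a small Python sketch (not from the slides, the function name is my own) that evaluates height · workPerNode. The product logM(n) · log2(M) collapses to log2(n), so increasing M does not reduce the total number of comparisons; its payoff is fewer disk accesses per lookup.

```python
import math

def mary_search_cost(n, m):
    """Approximate comparison count of get() in a balanced M-ary search
    tree: ~log_M(n) nodes on a root-to-leaf path, times log2(M)
    comparisons for the binary search within each node."""
    height = math.log(n, m)    # ~ log_M(n) levels
    per_node = math.log2(m)    # binary search among up to M children
    return height * per_node   # total ~ log2(n), independent of M

# Changing M leaves the total comparison count at ~log2(n):
print(round(mary_search_cost(10**6, 2), 2))    # ~19.93
print(round(mary_search_cost(10**6, 256), 2))  # ~19.93
```

The point of the exercise: M buys shallower trees (fewer pages touched), not fewer comparisons.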


SLIDE 13

M-ary trees

With M-ary trees, how many disk accesses do we make, assuming each node is stored on one page? Is it logM(n), or logM(n) log2(M)? It’s logM(n) log2(M)! When doing binary search, we need to check the child to see if its key is the one we should pick.

7

SLIDE 14

B-Trees

Idea 2:
◮ Rather than visiting each child, what if we stored the info we need in the parent – store keys?
◮ To avoid redundancy, store values only in leaf nodes.
Internal node: a node that stores only keys and pointers to children nodes.
Leaf node: a node that stores only keys and values.

8


SLIDE 16

B-Trees

An example:

[diagram: root with keys 10 | 20 | 30; leaves of key-value pairs: (1,a)(5,b)(9,f) · (10,k)(15,a)(17,c)(18,d)(19,z) · (25,m)(26,e)(27,a)(29,a) · (31,a)(32,b)(33,f)]

9

SLIDE 17

B-Trees

A larger example (values in leaf nodes omitted):

[diagram: three-level B-tree; root keys 15 | 40; internal-node keys 4 10 · 15 20 25 30 · 45 60; leaf keys ranging from 1 to 100]

10

SLIDE 18

B-tree invariants

The B-tree invariants

1. The B-tree node type invariant
2. The B-tree order invariant
3. The B-tree structure invariant

11

SLIDE 19

The B-tree node type invariant

B-tree node type invariant A B-tree has two types of node: internal nodes, and leaf nodes.

12


SLIDE 21

The B-tree node type invariant

B-tree internal node: an internal node contains M pointers to children and M − 1 sorted keys. Note: M > 2 must be true. Example of an internal node where M = 6:

[diagram: internal node with 5 sorted keys K and 6 child pointers]

B-tree leaf node: a leaf node contains L key-value pairs, sorted by key. Example of a leaf node where L = 3:

[diagram: leaf node with 3 key-value pairs (K, V)]

Note: M and L are parameters the creator of the B-tree must pick.

13
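The two node types can be sketched as small classes; this is an illustrative layout (class and field names are my own, not the slides’), with assertions enforcing the counts described above. The leaf matches the L = 3 example; the internal node uses M = 4 for brevity.

```python
class InternalNode:
    """Stores only sorted keys plus pointers to children: M children,
    M - 1 keys."""
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1  # M children, M - 1 keys
        assert keys == sorted(keys)            # keys kept in sorted order
        self.keys, self.children = keys, children

class LeafNode:
    """Stores only key-value pairs, sorted by key (up to L of them)."""
    def __init__(self, pairs):
        assert [k for k, _ in pairs] == sorted(k for k, _ in pairs)
        self.pairs = pairs

# L = 3 leaf from the earlier example tree:
leaf = LeafNode([(1, "a"), (5, "b"), (9, "f")])
# An internal node with M = 4 (placeholder strings stand in for children):
root = InternalNode([10, 20, 30], ["c0", "c1", "c2", "c3"])
```

In a real implementation each child pointer would reference another node (or a page id on disk) rather than a placeholder string.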

SLIDE 22

The B-tree order invariant

B-tree order invariant: for any given key k, all subtrees to the left may only contain keys x that satisfy x < k. All subtrees to the right may only contain keys x that satisfy x ≥ k. This means the subtree between two adjacent keys a and b may only contain keys x that satisfy a ≤ x < b.

Example:

[diagram: node with keys 3 | 7 | 12 | 21; subtree key ranges x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, 21 ≤ x]

14


SLIDE 24

The B-tree structure invariant

B-tree structure when n ≤ L: if n ≤ L, the root node is a leaf.

[diagram: the whole tree is a single leaf node, here containing the key 12]

B-tree structure when n > L: when n > L, the root node MUST be an internal node containing 2 to M children. All other internal nodes must have ⌈M/2⌉ to M children. All leaf nodes must have ⌈L/2⌉ to L items.

In other words: all nodes must be at least half-full. The only exception is the root, which can have as few as 2 children.

15


SLIDE 26

Why?

◮ Why must M > 2? Otherwise, we could end up with a linked list.
◮ Why do we insist almost all nodes must be at least half-full? It lets us ensure the tree stays balanced.
◮ Why is the root allowed to have as few as 2 children? If n is relatively small compared to M and L, it may not be possible for the root to actually be half-full.

16




SLIDE 30

B-tree get

Try running get(6), get(39)

[diagram: root keys 12 | 44; internal nodes (6) · (20 27 34) · (50); leaves [1 2 3] [6 8 9 10] · [12 14 16 17 19] [20 22 24] [27 28 32] [34 38 39 41] · [44 47 49] [50 60 70]]

What’s the worst-case runtime of get(...)? Num disk accesses? Runtime roughly the same as M-ary trees: log2(L) + logM(n) log2(M). Number of disk accesses is logM(n).

17
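The descent that get(...) performs can be sketched as follows; this is a minimal illustration assuming a hypothetical dict-based node layout (not the slides’ actual code), built from the first example tree’s keys and values.

```python
from bisect import bisect_right

def btree_get(node, key):
    """Walk from the root to a leaf, binary-searching the signpost keys
    at each internal node, then look the key up inside the leaf."""
    while "children" in node:                # still at an internal node
        i = bisect_right(node["keys"], key)  # binary search: log2(M) work
        node = node["children"][i]           # one disk access per level
    return node["pairs"].get(key)            # search within the leaf

# The earlier example: root keys 10 | 20 | 30, four leaves.
tree = {
    "keys": [10, 20, 30],
    "children": [
        {"pairs": {1: "a", 5: "b", 9: "f"}},
        {"pairs": {10: "k", 15: "a", 17: "c", 18: "d", 19: "z"}},
        {"pairs": {25: "m", 26: "e", 27: "a", 29: "a"}},
        {"pairs": {31: "a", 32: "b", 33: "f"}},
    ],
}
print(btree_get(tree, 17))  # c
print(btree_get(tree, 6))   # None (absent key)
```

Using `bisect_right` means a key equal to a signpost descends into the right child, matching the order invariant (x ≥ k goes right).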

SLIDE 31

B-tree put

Suppose we have an empty B-tree where M = 3 and L = 3. Try inserting 3, 18, 14, 30: After inserting 3, 18, 14:

[diagram: a single leaf [3 14 18]]

We want to insert 30, but leaf node is out of space. So, SPLIT the node:

[diagram: root (18) → leaves [3 14] [18 30]]

18



SLIDE 34

B-tree put

Next, try inserting 32 and 36.

[diagram: root (18) → leaves [3 14] [18 30]]

After inserting 32:

[diagram: root (18) → leaves [3 14] [18 30 32]]

We want to insert 36, but the leaf node is full! So, we SPLIT again:

[diagram: root (18 32) → leaves [3 14] [18 30] [32 36]]

19



SLIDE 37

B-tree put

Next, try inserting 15 and 16.

[diagram: root (18 32) → leaves [3 14] [18 30] [32 36]]

After inserting 15:

[diagram: root (18 32) → leaves [3 14 15] [18 30] [32 36]]

We try inserting 16. The node is full, so we SPLIT:

[diagram: overfull root (15 18 32) → leaves [3 14] [15 16] [18 30] [32 36]; three keys is too many for M = 3]

What do we do now?

20



SLIDE 40

B-tree put

Solution: Recursively split the parent!

[diagram: two internal nodes (15) → [3 14] [15 16] and (32) → [18 30] [32 36]]

Then create a new root!

[diagram: new root (18); (15) → [3 14] [15 16]; (32) → [18 30] [32 36]]

21



SLIDE 43

B-tree put

Now, try inserting 12, 40, 45, and 38.

[diagrams: before: root (18); (15) → [3 14] [15 16]; (32) → [18 30] [32 36]. After: root (18); (15) → [3 12 14] [15 16]; (32 40) → [18 30] [32 36 38] [40 45]]

Note: make sure to always fill each “signpost” key with the smallest value in the subtree to its right.

22




SLIDE 47

B-tree put

1. Insert data in the correct leaf in sorted order.
2. If the leaf now has L + 1 items, overflow. Split the leaf into two new nodes:
◮ Original leaf gets the ⌈(L + 1)/2⌉ smaller items
◮ New leaf gets the ⌊(L + 1)/2⌋ larger items
Attach the new child and key to the parent (preserving sorted order).
3. Recursively continue overflowing if necessary. Note: for internal nodes, split using M instead of L.
4. If the root overflows, make a new root.

23
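Step 2 above can be sketched as a small function; this is a sketch under the assumption that a leaf is a plain sorted list and L = 3, as in the running example (the function name is my own).

```python
def insert_and_split(leaf, item, L=3):
    """Insert item into a sorted leaf; if it overflows past L items,
    split it and return the key to push up into the parent."""
    leaf = sorted(leaf + [item])
    if len(leaf) <= L:
        return leaf, None, None       # fits: no overflow
    mid = (L + 2) // 2                # ceil((L + 1) / 2) smaller items stay
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]      # signpost = new leaf's smallest key

# The running example: inserting 30 into the full leaf [3, 14, 18]:
print(insert_and_split([3, 14, 18], 30))  # ([3, 14], [18, 30], 18)
```

This reproduces the first split on SLIDE 31: the leaf splits into [3 14] and [18 30], and 18 becomes the parent’s signpost key.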





SLIDE 52

B-tree put analysis

What is the worst-case runtime?
◮ Time to find correct leaf: Θ(logM(n) log2(M))
◮ Time to insert into leaf: Θ(L)
◮ Time to split leaf: Θ(L)
◮ Time to split parent: Θ(M)
◮ Number of parents we might have to split: Θ(logM(n))
Overall runtime: timeFindLeaf + timeModifyLeaf + timeModifyParents
Putting it all together: Θ(logM(n) log2(M) + L + M logM(n)) = Θ(L + M logM(n))

24


SLIDE 54

B-tree put analysis

Note: runtime in the worst case is Θ(L + M logM(n)). However, splits are very rare! And splitting all the way to the root is even rarer. This means the average runtime is often better (often just Θ(1) or Θ(L)). And at the end of the day, the number of disk accesses matters more: it’s still Θ(logM(n)) no matter how many splits we do.

25

SLIDE 55

B-tree remove

Now, try deleting 32 then 15. The starting B-tree:

[diagram: root (18); (15) → [3 12 14] [15 16]; (32 40) → [18 30] [32 36 38] [40 45]]

After deleting 32:

[diagram: root (18); (15) → [3 12 14] [15 16]; (32 40) → [18 30] [36 38] [40 45]]

26


SLIDE 57

B-tree remove

What happens if we try deleting 15? Problem: invariant is broken!

[diagram: after deleting 15, the leaf [16] is below half-full: root (18); (15) → [3 12 14] [16]; (32 40) → [18 30] [36 38] [40 45]]

Solution: We fjx invariant by adopting a neighbor’s child!

[diagram: the leaf adopts 14 from its neighbor, and the signpost becomes 14: root (18); (14) → [3 12] [14 16]; (32 40) → [18 30] [36 38] [40 45]]

27


SLIDE 59

B-tree remove

Now, try deleting 16. Problem: adopting would break invariant!

[diagrams: after deleting 16, the leaf [14] is below half-full and its neighbor [3 12] has nothing to spare: root (18); (14) → [3 12] [14]; (32 40) → [18 30] [36 38] [40 45]]

Solution: adopt recursively!

[diagram: root (36); (18) → [3 12 14] [18 30]; (40) → [36 38] [40 45]]

28

slide-61
SLIDE 61

B-tree remove

Now, try deleting 16. Problem: adopting would break invariant!

18 15 3 12 14 32 40 18 30 36 38 40 45

Solution: adopt recursively!

36 3 12 14 32 40 18 30 36 38 40 45 36 18 3 12 14 18 30 40 36 38 40 45

28


SLIDE 63

B-tree remove

Now, try deleting 14 and 18. After deleting 14:

[diagrams: before: root (36); (18) → [3 12 14] [18 30]; (40) → [36 38] [40 45]. After deleting 14: root (36); (18) → [3 12] [18 30]; (40) → [36 38] [40 45]]

We try and delete 18....

[diagram: root (36); after deleting 18 its leaves merge, leaving the left internal node with only one child; (40) → [36 38] [40 45]]

29


SLIDE 65

B-tree remove

Problem: invariant is broken, adopting recursively doesn’t work:

[diagram: root (36); the left internal node has only one child; (40) → [36 38] [40 45]]

Solution: Merge!

[diagram: the two internal nodes merge and become the new root (36 40), with three leaf children]

30




SLIDE 69

B-tree remove

1. Remove data from the correct leaf.
2. If the leaf now has fewer than ⌈L/2⌉ items, underflow.
◮ If a neighbor has more than ⌈L/2⌉ items, adopt one!
◮ Otherwise, merge with the neighbor.
3. If we merged, the parent has one fewer child. Recursively underflow if necessary (note: for internal nodes, we use M instead of L).
4. If we merge all the way up to the root and the root now has only one child, delete the root and make its child the root.

31
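The adopt-or-merge decision in step 2 can be sketched like this; an illustrative fragment (names are my own) assuming L = 3 and a single left neighbor, mirroring the adoption and merge steps from the earlier examples.

```python
import math

def fix_underflow(leaf, neighbor, L=3):
    """Decide what to do after a deletion: nothing, adopt from the
    neighbor, or merge with it. Leaves are plain sorted lists."""
    min_items = math.ceil(L / 2)            # half-full threshold
    if len(leaf) >= min_items:
        return "ok", leaf, neighbor         # invariant still holds
    if len(neighbor) > min_items:           # neighbor can spare an item
        donated = neighbor[-1]              # assume neighbor is to the left
        return "adopt", sorted(leaf + [donated]), neighbor[:-1]
    return "merge", sorted(leaf + neighbor), []

# The earlier examples: [16] adopts 14; later, [14] must merge.
print(fix_underflow([16], [3, 12, 14]))  # ('adopt', [14, 16], [3, 12])
print(fix_underflow([14], [3, 12]))      # ('merge', [3, 12, 14], [])
```

A full implementation would also update the parent’s signpost key after an adoption and remove a key after a merge, which is what makes the fix-up recursive.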




SLIDE 73

B-tree remove analysis

What is the worst-case runtime?
◮ Time to find correct leaf: Θ(logM(n) log2(M))
◮ Time to remove from leaf: Θ(L)
◮ Time to adopt/merge with neighbor: Θ(L)
◮ Time to adopt/merge in parent: Θ(M)
◮ Number of parents we might have to fix: Θ(logM(n))
Putting it all together: Θ(L + M logM(n))
As before, average-case runtime is frequently better because merges are very rare.

32





SLIDE 78

Picking M and L

Our original goal: make a disk-friendly dictionary. Why are B-trees so disk-friendly?
◮ All relevant information about a single node fits in one page.
◮ We use as much of the page as we can: each node contains many keys that are all brought in at once with a single disk access, basically “for free”.
◮ The time needed to do a binary search within a node is insignificant compared to disk access time.

33




SLIDE 82

Picking M and L

So, how do we make sure a B-tree node actually fits in one page? How do we pick M and L? Suppose we know the following:

1. One key is k bytes
2. One pointer is p bytes
3. One value is v bytes

Two questions:
◮ What is the size of an internal node? Mp + (M − 1)k
◮ What is the size of a leaf node? L(k + v)

34



SLIDE 85

Picking M and L

We know Mp + (M − 1)k is the size of one internal node, and L(k + v) is the size of a leaf node. Let’s say one page (aka one block) takes up B bytes. Goal: pick the largest M and L that satisfy these two inequalities:

Mp + (M − 1)k ≤ B
L(k + v) ≤ B

If we do the math:

M = ⌊(B + k) / (p + k)⌋
L = ⌊B / (k + v)⌋

35
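These formulas are easy to evaluate directly; here is a quick sketch with illustrative numbers (the page and field sizes are my own assumptions, not from the slides): a 4 KiB page, 4-byte keys, 8-byte pointers, 16-byte values.

```python
# Assumed sizes, in bytes: page B, key k, pointer p, value v.
B, k, p, v = 4096, 4, 8, 16

M = (B + k) // (p + k)   # largest M with M*p + (M - 1)*k <= B
L = B // (k + v)         # largest L with L*(k + v) <= B

print(M, L)  # 341 204

# Both picks fill the page as much as possible: the chosen value fits,
# and one more would not.
assert M * p + (M - 1) * k <= B < (M + 1) * p + M * k
assert L * (k + v) <= B < (L + 1) * (k + v)
```

With these assumed sizes, each internal node fans out to 341 children, so a three-level tree already indexes tens of millions of keys.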
SLIDE 86

Summary

What we’ve done so far: study different dictionary implementations. These implementations all assume data is all stored in RAM.
◮ ArrayDictionary
◮ SortedArrayDictionary
◮ Binary search trees
◮ AVL trees
◮ Hash tables
What if we have a lot of data that must be stored on disk? Use a B-tree, which we intentionally designed to take advantage of how memory is accessed in computers.

36




SLIDE 90

Summary

What you should know for the midterm:
◮ The motivation behind why we made B-trees
◮ How to pick an optimal M and L
◮ A high-level understanding of the B-tree invariants (e.g. be able to recognize when a B-tree is broken)
◮ The get algorithm
What you should know for the final:
◮ The put and remove algorithms
◮ A more detailed understanding of the B-tree invariants

37