B-trees: Carola Wenk, 9/11/14, CMPS 2200 Intro. to Algorithms (PowerPoint presentation)



  1. CMPS 2200, Fall 2014: B-trees. Carola Wenk, 9/11/14. CMPS 2200 Intro. to Algorithms

  2. External memory dictionary
     Task: Given a large amount of data that does not fit into main memory, process it into a dictionary data structure.
     • Need to minimize the number of disk accesses
     • With each disk read, read a whole block of data
     • Construct a balanced search tree that uses one disk block per tree node
     • Each node needs to contain more than one key

  3. k-ary search trees
     A k-ary search tree T is defined as follows. For each node x of T:
     • x has at most k children (i.e., T is a k-ary tree)
     • x stores an ordered list of pointers to its children, and an ordered list of keys
     • For every internal node: #keys = #children - 1
     • x fulfills the search tree property: keys in subtree rooted at i-th child ≤ i-th key < keys in subtree rooted at (i+1)-st child
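
As a hedged illustration, the definition can be turned into a small checker. The Python sketch below is hypothetical (the names `Node` and `is_k_ary_search_tree` are made up for this example) and assumes integer keys; it walks the tree and passes down the key range that each subtree must respect under the search tree property.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    """Node of a k-ary search tree: ordered keys plus child pointers."""
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def is_k_ary_search_tree(node: Node, k: int,
                         lo: Optional[int] = None, hi: Optional[int] = None) -> bool:
    """Check the definition for the subtree rooted at node.

    Every key in this subtree must satisfy lo < key <= hi, where the bounds
    come from the keys of the ancestors (None means unbounded).
    """
    # x has at most k children.
    if len(node.children) > k:
        return False
    # Internal node: #keys = #children - 1.
    if node.children and len(node.keys) != len(node.children) - 1:
        return False
    # The keys form an ordered list.
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        return False
    # The keys respect the range inherited from the ancestors.
    if any((lo is not None and key <= lo) or (hi is not None and key > hi)
           for key in node.keys):
        return False
    # Search tree property: child i (0-based) holds keys in (keys[i-1], keys[i]].
    bounds = [lo] + node.keys + [hi]
    return all(is_k_ary_search_tree(child, k, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children))


good = Node([10, 25], [Node([2, 6, 8]), Node([12, 20]), Node([30])])
bad = Node([10, 25], [Node([2, 11]), Node([12, 20]), Node([30])])  # 11 > 10
print(is_k_ary_search_tree(good, 4))  # True
print(is_k_ary_search_tree(bad, 4))   # False
print(is_k_ary_search_tree(good, 2))  # False: the root has 3 > 2 children
```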

  4. Example of a 4-ary tree
     [Figure: example of a 4-ary tree]

  5. Example of a 4-ary search tree
     [Figure: a 4-ary search tree on the keys 1, 2, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 20, 21, 23, 24, 25, 27, 30, 40, 45, 50]

  6. B-tree
     A B-tree T with minimum degree k ≥ 2 is defined as follows:
     1. T is a (2k)-ary search tree
     2. Every node, except the root, stores at least k-1 keys (every internal non-root node has at least k children)
     3. The root must store at least one key
     4. All leaves have the same depth

  7. B-tree with k = 2
     [Figure: root (10, 25); middle level (6), (12, 15, 21), (30, 45); leaves (2), (7, 8), (11), (14), (20), (23, 24), (27), (40), (50)]
     1. T is a (2k)-ary search tree

  8. B-tree with k = 2
     [Figure: the same B-tree]
     2. Every node, except the root, stores at least k-1 keys

  9. B-tree with k = 2
     [Figure: the same B-tree]
     3. The root must store at least one key

  10. B-tree with k = 2
     [Figure: the same B-tree]
     4. All leaves have the same depth

  11. B-tree with k = 2
     [Figure: the same B-tree]
     Remark: This is a 2-3-4 tree (every internal node has 2, 3, or 4 children).
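
The four B-tree properties can likewise be checked mechanically. The sketch below is a hypothetical helper (`check_btree` and `leaf` are illustration names), and it assumes that a (2k)-ary search tree node holds at most 2k-1 keys, matching the notion of a full node used for insertion. The example tree encodes the k = 2 figure from slides 7-11; its node layout was reconstructed from the key order in the figure, so treat it as an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def leaf(*keys: int) -> Node:
    return Node(list(keys))


def check_btree(root: Node, k: int) -> bool:
    """Check B-tree properties 1-4 for minimum degree k."""
    leaf_depths: Set[int] = set()

    def visit(node: Node, depth: int, lo: Optional[int], hi: Optional[int],
              is_root: bool) -> bool:
        # 1. (2k)-ary search tree: at most 2k children and (assumed) at most
        #    2k-1 keys, i.e. a "full" node in the insertion slides.
        if len(node.children) > 2 * k or len(node.keys) > 2 * k - 1:
            return False
        if node.children and len(node.keys) != len(node.children) - 1:
            return False
        if any(a > b for a, b in zip(node.keys, node.keys[1:])):
            return False
        if any((lo is not None and x <= lo) or (hi is not None and x > hi)
               for x in node.keys):
            return False
        # 2. Non-root nodes store at least k-1 keys; 3. the root at least one key.
        if len(node.keys) < (1 if is_root else k - 1):
            return False
        if not node.children:
            leaf_depths.add(depth)  # 4. is checked once all leaves are seen
            return True
        bounds = [lo] + node.keys + [hi]
        return all(visit(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    return visit(root, 0, None, None, True) and len(leaf_depths) == 1


# The k = 2 example, with the node layout reconstructed from the slide figure.
example = Node([10, 25], [
    Node([6], [leaf(2), leaf(7, 8)]),
    Node([12, 15, 21], [leaf(11), leaf(14), leaf(20), leaf(23, 24)]),
    Node([30, 45], [leaf(27), leaf(40), leaf(50)]),
])
print(check_btree(example, 2))  # True
print(check_btree(example, 3))  # False: node (6) has fewer than k-1 = 2 keys
```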

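  12. Height of a B-tree
     Theorem: For a B-tree with minimum degree k ≥ 2 which stores n keys and has height h, it holds that $h \le \log_k \frac{n+1}{2}$.
     Proof: The root (level 0) stores at least one key and therefore has at least 2 children; every other internal node has at least k children. Hence level 1 has at least 2 nodes, and level i ≥ 1 has at least $2k^{i-1}$ nodes:
     $$\#\text{nodes} \ge 1 + 2 + 2k + 2k^2 + \dots + 2k^{h-1}$$
     The root stores at least 1 key and every other node at least k-1 keys, so
     $$n = \#\text{keys} \ge 1 + (k-1)\sum_{i=0}^{h-1} 2k^i = 1 + 2(k-1)\cdot\frac{k^h - 1}{k-1} = 2k^h - 1.$$
     Solving for h gives $k^h \le \frac{n+1}{2}$, i.e., $h \le \log_k \frac{n+1}{2}$.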
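
The counting argument in the proof can be checked numerically. This small Python sketch (the function names `min_keys` and `height_bound` are hypothetical) adds up the sparsest possible B-tree level by level, as in the proof, and confirms both the closed form $2k^h - 1$ and that such a tree meets the height bound with equality.

```python
import math

# Hypothetical helper names; a numeric check of the proof, not code from the slides.


def min_keys(k: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree k and height h can store.

    Level 0 is the root (1 node, >= 1 key); level i >= 1 has >= 2*k**(i-1)
    nodes, each with >= k-1 keys, exactly as in the proof.
    """
    return 1 + sum(2 * k ** (i - 1) * (k - 1) for i in range(1, h + 1))


def height_bound(n: int, k: int) -> float:
    """Upper bound log_k((n+1)/2) on the height of a B-tree with n keys."""
    return math.log((n + 1) / 2, k)


for k in (2, 3, 10):
    for h in range(6):
        n = min_keys(k, h)
        assert n == 2 * k ** h - 1  # closed form from the proof
        # The sparsest tree of height h meets the bound with equality.
        assert math.isclose(height_bound(n, k), h, abs_tol=1e-9)

print(min_keys(2, 3))  # 15
```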

  13. B-tree search
     B-TREE-SEARCH(x, key)
       i ← 1
       while i ≤ #keys of x and key > i-th key of x do
         i++
       if i ≤ #keys of x and key = i-th key of x then
         return (x, i)
       if x is a leaf then
         return NIL
       else
         b = DISK-READ(i-th child of x)
         return B-TREE-SEARCH(b, key)
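
A minimal Python rendering of B-TREE-SEARCH, assuming all nodes are already in memory, so `disk_read` is a hypothetical stand-in that just returns its argument. Python lists are 0-based, so the loop test becomes `i < len(x.keys)`, and NIL becomes `None`. The example tree is the k = 2 tree from slides 7-11, with its layout reconstructed from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


def disk_read(node: Node) -> Node:
    """Stand-in for DISK-READ: nodes already live in memory here."""
    return node


def btree_search(x: Node, key: int) -> Optional[Tuple[Node, int]]:
    """B-TREE-SEARCH with 0-based indices; returns (node, index) or None."""
    i = 0
    # Linear scan over the keys of x: O(k) comparisons per node.
    while i < len(x.keys) and key > x.keys[i]:
        i += 1
    if i < len(x.keys) and key == x.keys[i]:
        return x, i
    if x.is_leaf:
        return None
    # One disk read per level on the way down.
    return btree_search(disk_read(x.children[i]), key)


def leaf(*keys: int) -> Node:
    return Node(list(keys))


root = Node([10, 25], [
    Node([6], [leaf(2), leaf(7, 8)]),
    Node([12, 15, 21], [leaf(11), leaf(14), leaf(20), leaf(23, 24)]),
    Node([30, 45], [leaf(27), leaf(40), leaf(50)]),
])
found = btree_search(root, 23)
print(found[0].keys, found[1])  # [23, 24] 0
print(btree_search(root, 9))    # None
```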

  14. B-tree search runtime
     • O(k) per node
     • Path has height h = O(log_k n)
     • CPU time: O(k log_k n)
     • Disk accesses: O(log_k n)
     Disk accesses are more expensive than CPU time.
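
To make the gap between CPU time and disk accesses concrete, here is a worked calculation with made-up parameters: n = 10^9 keys and minimum degree k = 512 (hypothetical values; a real k depends on the block size and the key size). It uses the height bound from slide 12.

```python
import math

# Hypothetical numbers for illustration only.
n = 10**9  # number of keys
k = 512    # minimum degree

# Height bound from the theorem: h <= log_k((n + 1) / 2).
h_max = math.floor(math.log((n + 1) / 2, k))
# A search visits one node per level (levels 0..h), i.e. at most h + 1 nodes.
btree_reads = h_max + 1
# For comparison: a perfectly balanced binary search tree with n keys
# has ceil(log2(n + 1)) levels.
bst_levels = math.ceil(math.log2(n + 1))

print(h_max, btree_reads, bst_levels)  # 3 4 30
```

So with these numbers at most 4 nodes, and hence at most 4 disk reads, lie on any root-to-leaf path, against about 30 levels for a balanced binary search tree.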

  15. B-tree insert
     • There are different insertion strategies; we cover just one of them
     • Make one pass down the tree:
       • The goal is to insert the new key into a leaf
       • Search where the key should be inserted
       • Only descend into non-full nodes:
         • If a node is full, split it, then continue descending
     • Splitting the root node is the only way a B-tree grows in height

  16. B-TREE-SPLIT-CHILD(x, i, y)
     • Node y, the i-th child of x, is full: it has 2k-1 keys
     • Split the full node y into two nodes y and z with k-1 keys each
     • The median key S of y moves up into y's parent x
     • [Figure: example for k = 4]
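
A minimal Python sketch of B-TREE-SPLIT-CHILD, assuming in-memory nodes and 0-based child indices (`Node` and `split_child` are hypothetical names). The example uses k = 4, so the full child has 7 keys and its median is S, as on the slide; the other letters are made up for illustration. As in the one-pass insertion, the parent x is assumed not to be full.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    keys: List[Any]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def split_child(x: Node, i: int, k: int) -> None:
    """Split the full child y = x.children[i] (2k-1 keys) of a non-full x.

    Afterwards y keeps the k-1 smallest keys, a new node z gets the k-1
    largest keys, and the median key moves up into x between y and z.
    """
    y = x.children[i]
    assert len(y.keys) == 2 * k - 1, "only full nodes are split"
    median = y.keys[k - 1]
    z = Node(y.keys[k:], y.children[k:])  # right half (plus children if internal)
    y.keys = y.keys[:k - 1]               # left half stays in y
    y.children = y.children[:k]
    x.keys.insert(i, median)              # median now separates y and z
    x.children.insert(i + 1, z)


k = 4
y = Node(list("PQRSTUV"))                 # full: 2k - 1 = 7 keys
x = Node(["N", "W"], [Node(["M"]), y, Node(["X"])])
split_child(x, 1, k)
print(x.keys)                                  # ['N', 'S', 'W']
print(x.children[1].keys, x.children[2].keys)  # ['P', 'Q', 'R'] ['T', 'U', 'V']
```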

  17. Split root: B-TREE-SPLIT-CHILD(s, 1, r)
     • The full root node r is split in two.
     • A new root node s is created.
     • s contains the median key H of r and has the two halves of r as children.
     • [Figure: example for k = 4]

  18. B-TREE-INSERT(T, key)
       r = root[T]
       if (#keys in r) = 2k-1 then          // root r is full
         // insert new root node:
         s ← ALLOCATE-NODE()                // s is an internal node whose only child is r
         root[T] ← s
         // split old root r to be two children of new root s
         B-TREE-SPLIT-CHILD(s, 1, r)
         B-TREE-INSERT-NONFULL(s, key)
       else
         B-TREE-INSERT-NONFULL(r, key)

  19. B-TREE-INSERT-NONFULL(x, key)
       if x is a leaf then
         insert key at the correct (sorted) position in x
         DISK-WRITE(x)
       else
         let c be the i-th child of x, the child which by the search tree property should contain key
         DISK-READ(c)
         if c is full then                  // c contains 2k-1 keys
           B-TREE-SPLIT-CHILD(x, i, c)
           c = child of x which should contain key   // one of the two halves of the old c
         B-TREE-INSERT-NONFULL(c, key)
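
Putting the pieces together, the following Python sketch implements the one-pass insertion (B-TREE-INSERT, B-TREE-INSERT-NONFULL and the split) under the assumption that all nodes live in memory, so DISK-READ and DISK-WRITE are dropped. The class and method names are hypothetical, and the standard library's `bisect.bisect_left` stands in for the linear scan that locates the child; for distinct keys both give the same index.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Iterator, List

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    keys: List[Any] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list means leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


class BTree:
    """One-pass insertion; nodes stay in memory, so no DISK-READ/DISK-WRITE."""

    def __init__(self, k: int) -> None:
        assert k >= 2, "minimum degree must be at least 2"
        self.k = k
        self.root = Node()

    def _is_full(self, node: Node) -> bool:
        return len(node.keys) == 2 * self.k - 1

    def _split_child(self, x: Node, i: int) -> None:
        # B-TREE-SPLIT-CHILD, 0-based: the median of y = x.children[i] moves into x.
        k, y = self.k, x.children[i]
        z = Node(y.keys[k:], y.children[k:])
        x.keys.insert(i, y.keys[k - 1])
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:k - 1], y.children[:k]

    def insert(self, key: Any) -> None:
        r = self.root
        if self._is_full(r):
            s = Node(children=[r])   # new root; r becomes its only child
            self.root = s
            self._split_child(s, 0)  # the only way the tree grows in height
            self._insert_nonfull(s, key)
        else:
            self._insert_nonfull(r, key)

    def _insert_nonfull(self, x: Node, key: Any) -> None:
        # First i with key <= x.keys[i]: the child the search tree property picks.
        i = bisect.bisect_left(x.keys, key)
        if x.is_leaf:
            x.keys.insert(i, key)    # insert at the sorted position
            return
        if self._is_full(x.children[i]):
            self._split_child(x, i)
            if key > x.keys[i]:      # key belongs in the new right half z
                i += 1
        self._insert_nonfull(x.children[i], key)


def inorder(node: Node) -> Iterator[Any]:
    """Yield all keys in sorted order (follows from the search tree property)."""
    for i, key in enumerate(node.keys):
        if not node.is_leaf:
            yield from inorder(node.children[i])
        yield key
    if not node.is_leaf:
        yield from inorder(node.children[-1])


tree = BTree(k=2)
for key in [10, 25, 6, 12, 15, 21, 30, 45, 11, 14, 2, 7, 8, 20, 23, 24, 27, 40, 50]:
    tree.insert(key)
print(list(inorder(tree.root)))  # the 19 keys in sorted order
```

The tree built this way from the same 19 keys need not have the same shape as the figure on slides 7-11, since the shape depends on the insertion order.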

  20. Insert example (k = 3)
     Initial tree: root [G M P X]; leaves [A C D E] [J K] [N O] [R S T U V] [Y Z]
     • Insert B: the leaf [A C D E] is not full, so B is simply added.
       Result: root [G M P X]; leaves [A B C D E] [J K] [N O] [R S T U V] [Y Z]

  21. Insert example (k = 3), cont.
     • Insert Q: the leaf [R S T U V] is full, so it is split and its median T moves up into the root; Q then goes into [R S].
       Result: root [G M P T X]; leaves [A B C D E] [J K] [N O] [Q R S] [U V] [Y Z]

  22. Insert example (k = 3), cont.
     • Insert L: the root [G M P T X] is full, so it is split and its median P becomes the new root; L then goes into the leaf [J K].
       Result: root [P]; middle level [G M] [T X]; leaves [A B C D E] [J K L] [N O] [Q R S] [U V] [Y Z]

  23. Insert example (k = 3), cont.
     • Insert F: the leaf [A B C D E] is full, so it is split and its median C moves up into [G M]; F then goes into [D E].
       Result: root [P]; middle level [C G M] [T X]; leaves [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]
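
The insertion example can be replayed in code. This Python sketch (function names hypothetical, nodes kept in memory) builds the initial k = 3 tree as read off the slide figure, inserts B, Q, L and F, and prints the tree level by level after each step, with each node's keys joined into one string.

```python
import bisect
from dataclasses import dataclass, field
from typing import List

# Hypothetical names throughout; a sketch, not code from the slides.


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def split_child(x: Node, i: int, k: int) -> None:
    """Move the median of the full child x.children[i] up into x."""
    y = x.children[i]
    z = Node(y.keys[k:], y.children[k:])
    x.keys.insert(i, y.keys[k - 1])
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:k - 1], y.children[:k]


def insert_nonfull(x: Node, key: str, k: int) -> None:
    i = bisect.bisect_left(x.keys, key)
    if not x.children:                        # leaf: insert at sorted position
        x.keys.insert(i, key)
        return
    if len(x.children[i].keys) == 2 * k - 1:  # child is full: split it first
        split_child(x, i, k)
        if key > x.keys[i]:
            i += 1
    insert_nonfull(x.children[i], key, k)


def insert(root: Node, key: str, k: int) -> Node:
    """B-TREE-INSERT; returns the (possibly new) root."""
    if len(root.keys) == 2 * k - 1:           # full root: tree grows by one level
        root = Node(children=[root])
        split_child(root, 0, k)
    insert_nonfull(root, key, k)
    return root


def levels(root: Node) -> List[List[str]]:
    """Nodes level by level, each node's keys joined into one string."""
    result, frontier = [], [root]
    while frontier:
        result.append(["".join(n.keys) for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return result


k = 3
leaves = [Node(list(s)) for s in ["ACDE", "JK", "NO", "RSTUV", "YZ"]]
root = Node(list("GMPX"), leaves)
for key in "BQLF":
    root = insert(root, key, k)
    print(key, levels(root))
# B [['GMPX'], ['ABCDE', 'JK', 'NO', 'RSTUV', 'YZ']]
# Q [['GMPTX'], ['ABCDE', 'JK', 'NO', 'QRS', 'UV', 'YZ']]
# L [['P'], ['GM', 'TX'], ['ABCDE', 'JKL', 'NO', 'QRS', 'UV', 'YZ']]
# F [['P'], ['CGM', 'TX'], ['AB', 'DEF', 'JKL', 'NO', 'QRS', 'UV', 'YZ']]
```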

  24. Runtime of B-TREE-INSERT
     • O(k) runtime per node
     • Path has height h = O(log_k n)
     • CPU time: O(k log_k n)
     • Disk accesses: O(log_k n)
     Disk accesses are more expensive than CPU time.

  25. Deletion of an element
     • Similar to insertion, but a bit more complicated
     • If sibling nodes no longer hold enough keys, they are merged into a single node
     • Same complexity as insertion: O(k log_k n) CPU time and O(log_k n) disk accesses
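
The slides describe deletion only briefly. As a partial sketch, the Python snippet below shows just the merge step, assuming two adjacent children of x that both hold the minimum of k-1 keys: they are combined with the separating key from x into a single node of 2k-1 keys, which reverses B-TREE-SPLIT-CHILD. `merge_children` is a hypothetical name, and a complete deletion routine needs further cases that this sketch leaves out (for example, when x is the root and loses its last key, the merged node becomes the new root).

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical names throughout; only the merge step of deletion is sketched.


@dataclass
class Node:
    keys: List[Any]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def merge_children(x: Node, i: int, k: int) -> None:
    """Merge x.children[i] and x.children[i + 1], which both hold k-1 keys.

    The separating key x.keys[i] moves down into the merged node, which then
    holds (k-1) + 1 + (k-1) = 2k-1 keys; x loses one key and one child.
    """
    y, z = x.children[i], x.children[i + 1]
    assert len(y.keys) == len(z.keys) == k - 1, "sketch covers only this case"
    y.keys = y.keys + [x.keys.pop(i)] + z.keys
    y.children = y.children + z.children
    x.children.pop(i + 1)


k = 3
x = Node(list("CGM"), [Node(list("AB")), Node(list("DE")),
                       Node(list("JK")), Node(list("NO"))])
merge_children(x, 0, k)
print(x.keys, [c.keys for c in x.children])
# ['G', 'M'] [['A', 'B', 'C', 'D', 'E'], ['J', 'K'], ['N', 'O']]
```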

  26. B-trees: Conclusion
     • B-trees are balanced (2k)-ary search trees
     • The degree of each node is bounded from above and below using the parameter k
     • All leaves are at the same depth
     • No rotations are needed: during insertion (or deletion), the balance is maintained by node splitting (or node merging)
     • The tree grows (shrinks) in height only by splitting (or merging) the root
