CS 225 Data Structures, October 19 – Intro Kd-trees and Btrees


slide-1
SLIDE 1

CS 225

Data Structures

October 19 – Intro Kd-trees and Btrees

G Carl Evans

slide-2
SLIDE 2

Range-based Searches

Balanced BSTs are useful structures for range-based and nearest-neighbor searches.

Q: Consider points in 1D: p = {p1, p2, …, pn}.
…what points fall in [11, 42]?

Ex:

3 6 11 33 41 44 55

slide-3
SLIDE 3

Range-based Searches

Q: Consider points in 1D: p = {p1, p2, …, pn}.
…what points fall in [11, 42]?

Tree construction:

slide-4
SLIDE 4

Range-based Searches

6 3 11 33 44 41

3 6 11 33 41 44 55

Q: Consider points in 1D: p = {p1, p2, …, pn}.
…what points fall in [11, 42]?
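A minimal sketch of the 1D range query, assuming the points live in a `std::set` (an ordered container, typically implemented as a balanced BST). The function name `rangeQuery` is hypothetical and not from the slides.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Return every point p with lo <= p <= hi, in sorted order.
std::vector<int> rangeQuery(const std::set<int>& points, int lo, int hi) {
    assert(lo <= hi);                      // the interval must be well-formed
    auto first = points.lower_bound(lo);   // first element >= lo
    auto last = points.upper_bound(hi);    // first element > hi
    return std::vector<int>(first, last);  // copy the half-open run [first, last)
}
```

Called on the points 3 6 11 33 41 44 55 with the interval [11, 42], this returns 11, 33, 41. The two boundary lookups each take O(log n) time, and walking the run between them costs time proportional to the number of points reported.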

slide-5
SLIDE 5

Range-based Searches

Consider points in 2D: p = {p1, p2, …, pn}.
Q: What points are in the rectangle: [ (x1, y1), (x2, y2) ]?
Q: What is the nearest point to (x1, y1)?

[Figure: points p1 through p7 plotted in 2D]

slide-6
SLIDE 6

Range-based Searches

Consider points in 2D: p = {p1, p2, …, pn}.

Tree construction:

[Figure: points p1 through p7 plotted in 2D]

slide-7
SLIDE 7

Range-based Searches

[Figure: a tree over the points p1 through p7, next to the same points plotted in 2D]

slide-8
SLIDE 8

kD-Trees

[Figure: a tree over the points p1 through p7, next to the same points plotted in 2D]
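One common way to build a kd-tree over 2D points is to split on the median, alternating between x and y at each level. The sketch below assumes that scheme; the slides do not spell out the construction, and the names `Point`, `KdNode`, `buildKd`, and `kdRangeQuery` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { double x, y; };

struct KdNode {
    Point pt{};
    std::unique_ptr<KdNode> left, right;
};

// Build a subtree over pts[lo, hi); split on x at even depth, y at odd depth.
static std::unique_ptr<KdNode> buildRange(std::vector<Point>& pts,
                                          std::ptrdiff_t lo, std::ptrdiff_t hi, int depth) {
    if (lo >= hi) return nullptr;
    std::ptrdiff_t mid = lo + (hi - lo) / 2;
    auto less = [depth](const Point& a, const Point& b) {
        return depth % 2 == 0 ? a.x < b.x : a.y < b.y;
    };
    // Place the median at position mid; smaller-or-equal points land to its left.
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, less);
    auto node = std::make_unique<KdNode>();
    node->pt = pts[static_cast<std::size_t>(mid)];
    node->left = buildRange(pts, lo, mid, depth + 1);
    node->right = buildRange(pts, mid + 1, hi, depth + 1);
    return node;
}

std::unique_ptr<KdNode> buildKd(std::vector<Point> pts) {
    return buildRange(pts, 0, static_cast<std::ptrdiff_t>(pts.size()), 0);
}

// Collect every point inside the axis-aligned rectangle [lo, hi] (inclusive).
void kdRangeQuery(const KdNode* node, Point lo, Point hi, int depth, std::vector<Point>& out) {
    if (node == nullptr) return;
    assert(lo.x <= hi.x && lo.y <= hi.y);
    const Point& p = node->pt;
    if (lo.x <= p.x && p.x <= hi.x && lo.y <= p.y && p.y <= hi.y) out.push_back(p);
    double split = depth % 2 == 0 ? p.x : p.y;
    double qlo = depth % 2 == 0 ? lo.x : lo.y;
    double qhi = depth % 2 == 0 ? hi.x : hi.y;
    // Only descend into a side that the rectangle can overlap.
    if (qlo <= split) kdRangeQuery(node->left.get(), lo, hi, depth + 1, out);
    if (split <= qhi) kdRangeQuery(node->right.get(), lo, hi, depth + 1, out);
}
```

The rectangle query prunes any subtree that lies entirely on the wrong side of the splitting coordinate, which is the 2D analogue of skipping a BST subtree in the 1D range search.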

slide-9
SLIDE 9

B-Trees

Q: Can we always fit our data in main memory?
Q: Where else can we keep our data?

However, our big-O has assumed uniform time for all operations.
slide-10
SLIDE 10

Vast Differences in Time

A 3GHz CPU performs 3m operations in _______.

Old Argument: “Disk Storage is Slow”

  • Bleeding-edge storage is pretty fast:

SSD

  • Large Disks (25 TB+) still have slow throughput:

New Argument: “The Cloud is Slow!”

slide-11
SLIDE 11

5 3 6 4 2 8 10 9 12 11 1 7

AVLs on Disk

slide-12
SLIDE 12

Real Application

Imagine storing TikTok profiles for everyone in the US:
How many records?
How much data in total?
How deep is the AVL tree?
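A rough way to think about the depth question: any binary tree with n nodes has height at least floor(log2(n)), so an AVL tree cannot be shallower than that. The sketch below computes this lower bound; the helper name `minBinaryTreeHeight` is hypothetical, and the record count used in the note afterwards is an illustrative round number, not a figure from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Smallest possible height (counted in edges) of a binary tree with n nodes,
// which equals floor(log2(n)).
int minBinaryTreeHeight(std::uint64_t n) {
    assert(n > 0);  // an empty tree has no meaningful height here
    int height = 0;
    while (n > 1) {
        n /= 2;     // each extra level can at most double the node count
        ++height;
    }
    return height;
}
```

Assuming roughly 330 million records, `minBinaryTreeHeight(330000000)` gives 28, so a search could follow more than two dozen child pointers, and on disk each of those may be a separate seek.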

slide-13
SLIDE 13

BTree Motivations

Knowing that we have large seek times for data, we want to:

slide-14
SLIDE 14

BTree (of order m)

Goal: Minimize the number of reads!

Build a tree that uses ______________________ / node [1 network packet] [1 disk block]

-3 8 23 25 31 42 43 55

m=9

slide-15
SLIDE 15

BTree Insertion

A BTree of order m is an m-way tree:

  • All keys within a node are ordered
  • All leaves hold no more than m-1 keys.

m=5

slide-16
SLIDE 16

BTree Insertion

When a BTree node reaches m keys:

m=5
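The usual response to an overfull node is to split it around its median key: the median moves up into the parent, and the keys on either side become two separate nodes. A minimal sketch of just the key-splitting step follows, assuming integer keys; `SplitResult` and `splitFullNode` are hypothetical names, and the choice of the upper-middle key for an even key count is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SplitResult {
    std::vector<int> left;   // keys that stay in the left node
    int median = 0;          // key that is pushed up into the parent
    std::vector<int> right;  // keys that move to the new right node
};

// Split the sorted keys of an overfull node around the middle key.
SplitResult splitFullNode(const std::vector<int>& keys) {
    assert(keys.size() >= 3);  // need at least one key on each side of the median
    std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(keys.size() / 2);
    SplitResult result;
    result.left.assign(keys.begin(), keys.begin() + mid);
    result.median = keys[static_cast<std::size_t>(mid)];
    result.right.assign(keys.begin() + mid + 1, keys.end());
    return result;
}
```

With m=5, a node that has grown to the five keys 3 8 23 25 31 would split into a left node holding 3 8, a right node holding 25 31, and the median 23 pushed up. If the parent then reaches m keys itself, the same step repeats one level higher.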

slide-17
SLIDE 17

BTree Recursive Insert

-3 8 23 25 31 42 43 55

m=3

slide-18
SLIDE 18

BTree Recursive Insert

-3 8 23 25 31 42 43 55

m=3

slide-19
SLIDE 19

BTree Visualization/Tool

https://www.cs.usfca.edu/~galles/visualization/BTree.html

slide-20
SLIDE 20

Btree Properties

A BTree of order m is an m-way tree:

  • All keys within a node are ordered
  • All leaves hold no more than m-1 keys.
  • All internal nodes have exactly one more child than keys
  • The root can be a leaf or have [2, m] children.
  • All non-root, internal nodes have [ceil(m/2), m] children.
  • All leaves are on the same level
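The per-node properties above can be turned into a small shape check. This is a sketch under assumptions: the function names are hypothetical, a node with zero children is treated as a leaf, and the checks cover only the key and child counts of a single node (not key ordering, and not the rule that all leaves sit on the same level).

```cpp
#include <cassert>
#include <cstddef>

// ceil(m / 2) computed in integer arithmetic.
std::size_t minChildren(std::size_t m) { return (m + 1) / 2; }

// Check one node's key and child counts against the order-m rules.
bool nodeShapeOk(std::size_t m, std::size_t keys, std::size_t children, bool isRoot) {
    assert(m >= 3);                           // smallest order this sketch handles
    if (keys > m - 1) return false;           // no node holds more than m-1 keys
    if (children == 0) return true;           // leaf: only the key cap applies here
    if (children != keys + 1) return false;   // internal: one more child than keys
    std::size_t lowest = isRoot ? 2 : minChildren(m);
    return lowest <= children && children <= m;
}
```

For m=5, an internal non-root node needs between 3 and 5 children, so a node with 1 key and 2 children passes only when it is the root.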
slide-21
SLIDE 21

BTree

3 17 16 28 48 8 1 2 6 7 25 26 29 45 12 14 52 53 55 68

slide-22
SLIDE 22

BTree Search

-3 8 23 25 31 42 43 55

-11 60

slide-23
SLIDE 23

BTree Search

bool Btree::_exists(BTreeNode & node, const K & key) {
  // Skip every key smaller than the search key (keys are sorted ascending).
  unsigned i;
  for ( i = 0; i < node.keys_ct_ && node.keys_[i] < key; i++) { }

  // The key is stored in this node.
  if ( i < node.keys_ct_ && key == node.keys_[i] ) {
    return true;
  }

  // A leaf has no children left to search.
  if ( node.isLeaf() ) {
    return false;
  } else {
    // Child i holds the keys between keys_[i-1] and keys_[i].
    BTreeNode nextChild = node._fetchChild(i);
    return _exists(nextChild, key);
  }
}

-3 8 23 25 31 42 43 55

-11 60
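To experiment with the search logic without a disk-backed node type, here is a self-contained in-memory sketch. It assumes integer keys and children held directly in memory; `Node`, `exists`, and `makeLeaf` are hypothetical stand-ins, not the course's `BTreeNode` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> keys;                        // sorted ascending
    std::vector<std::unique_ptr<Node>> children;  // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};

// Return true if key is stored in the subtree rooted at node.
bool exists(const Node& node, int key) {
    // An internal node must have exactly one more child than keys.
    assert(node.isLeaf() || node.children.size() == node.keys.size() + 1);
    std::size_t i = 0;
    while (i < node.keys.size() && node.keys[i] < key) ++i;  // skip smaller keys
    if (i < node.keys.size() && node.keys[i] == key) return true;
    if (node.isLeaf()) return false;
    return exists(*node.children[i], key);  // descend into the i-th child
}

// Convenience helper for building small example trees.
std::unique_ptr<Node> makeLeaf(std::vector<int> keys) {
    auto leaf = std::make_unique<Node>();
    leaf->keys = std::move(keys);
    return leaf;
}
```

Each recursive call touches one node, so the number of nodes visited is at most the height of the tree plus one. In a disk-backed version, that count is the number of seeks.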

slide-24
SLIDE 24

BTree Analysis

The height of the BTree determines the maximum number of ____________ possible when searching the data.

…and the height of the structure is: ______________.

Therefore: The number of seeks is no more than __________.

…suppose we want to prove this!

slide-25
SLIDE 25

BTree Analysis

In our AVL Analysis, we saw finding an upper bound on the height (given n) is the same as finding a lower bound on the nodes (given h). We want to find a relationship for BTrees between the number of keys (n) and the height (h).

slide-26
SLIDE 26

BTree Analysis

Strategy: We will first count the number of nodes, level by level. Then, we will add the minimum number of keys per node (n). The minimum number of nodes will tell us the largest possible height (h), allowing us to find an upper-bound on height.

slide-27
SLIDE 27

BTree Analysis

The minimum number of nodes for a BTree of order m at each level:

root:
level 1:
level 2:
level 3:
…
level h:

slide-28
SLIDE 28

BTree Analysis

The total number of nodes is the sum of all of the levels:

slide-29
SLIDE 29

BTree Analysis

The total number of keys:

slide-30
SLIDE 30

BTree Analysis

The smallest total number of keys is:

So an inequality about n, the total number of keys:

Solving for h, since h is the number of seek operations:
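One way the level-by-level counting argument can be worked out is sketched below. It rests on assumptions not stated on the slides: the shorthand t = ⌈m/2⌉ is introduced here, the root is taken to sit at level 0 with at least 1 key and 2 children, every other node is taken to hold at least t − 1 keys (with internal nodes having at least t children), and m ≥ 3 so that t ≥ 2.

```latex
\begin{align*}
\text{Minimum nodes per level:}\quad
  & \text{root: } 1,\quad \text{level 1: } 2,\quad \text{level 2: } 2t,\quad
    \text{level 3: } 2t^{2},\ \dots,\ \text{level } h\text{: } 2t^{h-1} \\
\text{Minimum total nodes:}\quad
  & 1 + 2\sum_{k=0}^{h-1} t^{k} \;=\; 1 + 2\,\frac{t^{h}-1}{t-1} \\
\text{Minimum total keys:}\quad
  & 1 + 2\,\frac{t^{h}-1}{t-1}\,(t-1) \;=\; 2t^{h} - 1 \\
\text{Bound on } n\text{:}\quad
  & n \;\ge\; 2t^{h} - 1
    \;\Longrightarrow\;
    h \;\le\; \log_{t}\!\left(\frac{n+1}{2}\right)
\end{align*}
```

Under these assumptions the height, and with it the number of seeks in a search, grows as O(log_m n).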

slide-31
SLIDE 31

BTree Analysis

Given m=101, a tree of height h=4 has:

Minimum Keys:

Maximum Keys:
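The two counts can be checked with a few lines of integer arithmetic. This sketch assumes the convention that the root is at level 0 (so height h means h+1 levels), that the minimum tree has 1 key in the root and ceil(m/2) - 1 keys in every other node, and that the maximum tree has m-1 keys in every node. The helper names `ipow`, `minKeys`, and `maxKeys` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// base raised to exp, by repeated multiplication (no overflow checking).
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp-- > 0) result *= base;
    return result;
}

// Fewest keys in a BTree of order m and height h: 2 * ceil(m/2)^h - 1.
std::uint64_t minKeys(std::uint64_t m, unsigned h) {
    assert(m >= 3);
    std::uint64_t t = (m + 1) / 2;  // ceil(m / 2)
    return 2 * ipow(t, h) - 1;
}

// Most keys in a BTree of order m and height h: m^(h+1) - 1.
std::uint64_t maxKeys(std::uint64_t m, unsigned h) {
    assert(m >= 3);
    return ipow(m, h + 1) - 1;
}
```

Under these assumptions, `minKeys(101, 4)` is 13,530,401 and `maxKeys(101, 4)` is 10,510,100,500, so four levels below the root are enough to index somewhere between about 13.5 million and 10.5 billion keys.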