CS 2412 Data Structures
Chapter 8 Multiway Trees
This chapter studies multiway trees. These trees can be used for external search. When the data to be searched is large, it is not practical to load all of it into memory. Searching in high-speed memory is much faster than searching on external devices (hard disks, CDs, etc.). The idea of external search: each time, read one block of information into memory and decide which block to search next.
Data Structure 2015
- R. Wei
Definition An m-way tree has the following properties.
- Each node has 0 to m subtrees.
- A node with k ≤ m subtrees contains k subtrees and k − 1 data entries.
- The keys of the data entries are ordered: key_1 ≤ key_2 ≤ · · · ≤ key_{k−1}.
- The key values in the first subtree (the 0th subtree) are all less than key_1; the key values in the ith subtree are all greater than or equal to key_i but less than key_{i+1}.
- All subtrees are themselves multiway trees.
A 2-way tree is a BST.
We also want the multiway tree to be balanced.
Definition A B-tree is an m-way tree with the following additional properties:
- The root is either a leaf or it has 2 to m subtrees.
- All internal nodes have at least ⌈m/2⌉ nonnull subtrees.
- All leaf nodes are at the same level.
- A leaf node has at least ⌈m/2⌉ − 1 entries.
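The bounds in this definition translate directly into a few constants. The following is a minimal sketch, assuming order m = 5. The names ORDER and MIN_ENTRIES appear in the C code later in this chapter, but their definitions are not shown in these slides, so the values here are an assumption; MAX_ENTRIES, MIN_SUBTREES and entryCountIsValid are hypothetical names added for illustration.

```c
#include <assert.h>

/* Assumed order of the B-tree (m). */
enum { ORDER = 5 };

/* A node holds at most m - 1 entries. */
enum { MAX_ENTRIES = ORDER - 1 };

/* ceil(m / 2), computed with integer arithmetic. */
enum { MIN_SUBTREES = (ORDER + 1) / 2 };

/* A non-root node holds at least ceil(m / 2) - 1 entries. */
enum { MIN_ENTRIES = MIN_SUBTREES - 1 };

/* Returns 1 if a non-root node with n entries satisfies the B-tree bounds. */
int entryCountIsValid(int n)
{
    assert(n >= 0);
    return n >= MIN_ENTRIES && n <= MAX_ENTRIES;
}
```

With ORDER set to 5, a non-root node must hold between 2 and 4 entries.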
Data structure of a B-tree. The structure of an m-way tree node: each entry of a node contains data and a pointer to its right subtree. A node contains a first pointer to the subtree whose entries are less than the key of the first entry, a count of the number of entries currently in the node, and an array of entries. The array can be of size m.
The main operations on B-trees are insert, delete, traverse, and search.
B-tree insertion: B-tree insertion takes place at a leaf node.
- Locate the leaf node where the data can be inserted.
- If the node is not full (it has fewer than m − 1 entries), insert the data into this node.
- If the node is full (the overflow condition), split the node into two nodes.
A B-tree grows from the bottom up.
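The non-full case above amounts to shifting entries and dropping the new one into place. The C code later in this chapter calls a helper named _insertEntry whose body is not shown in these slides; the sketch below is one plausible version, not the author's. It assumes ORDER is 5, and unlike the calls in the chapter's code it updates numEntries itself.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 5   /* assumed order; a node holds at most ORDER - 1 entries */

typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Hypothetical helper: insert newEntry at position ndx of a node that is
   not full, shifting the later entries one slot to the right.
   This sketch increments numEntries itself. */
void insertEntry(NODE* node, int ndx, ENTRY newEntry)
{
    assert(node->numEntries < ORDER - 1);
    assert(ndx >= 0 && ndx <= node->numEntries);
    for (int i = node->numEntries; i > ndx; i--)
        node->entries[i] = node->entries[i - 1];
    node->entries[ndx] = newEntry;
    node->numEntries++;
}
```

The caller is responsible for choosing ndx so that the keys stay in order.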
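Example Insert 11, 21, 14, 78, 97 into a B-tree of order 5: the first four keys fill the root as 11, 14, 21, 78. Inserting 97 overflows it, so the node splits: the median 21 moves up into a new root, with 11, 14 in the left node and 78, 97 in the right node.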
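The overflow step in this example can be simulated on a single leaf. This is a simplified sketch, not the chapter's implementation: it stores plain int keys with no subtrees, assumes order 5, and the names LEAF and leafInsert are hypothetical.

```c
#include <assert.h>

enum { ORDER = 5, MAX_KEYS = ORDER - 1 };

/* A simplified leaf that stores plain int keys (no subtrees). */
typedef struct { int count; int keys[MAX_KEYS]; } LEAF;

/* Insert key into a leaf in sorted order. If the leaf overflows, split it:
   the lower half stays in *leaf, the upper half moves to *right, and the
   median is returned through *median. Returns 1 on a split, 0 otherwise. */
int leafInsert(LEAF* leaf, int key, LEAF* right, int* median)
{
    int all[MAX_KEYS + 1];
    int n = 0;
    int i = 0;

    assert(leaf->count >= 0 && leaf->count <= MAX_KEYS);

    /* Merge the existing keys and the new key into one sorted array. */
    while (i < leaf->count && leaf->keys[i] <= key)
        all[n++] = leaf->keys[i++];
    all[n++] = key;
    while (i < leaf->count)
        all[n++] = leaf->keys[i++];

    if (n <= MAX_KEYS)
    {
        /* No overflow: copy the keys back. */
        for (i = 0; i < n; i++)
            leaf->keys[i] = all[i];
        leaf->count = n;
        return 0;
    }

    /* Overflow: promote the middle key and split the rest. */
    int mid = n / 2;
    *median = all[mid];
    leaf->count = mid;
    for (i = 0; i < mid; i++)
        leaf->keys[i] = all[i];
    right->count = n - mid - 1;
    for (i = 0; i < right->count; i++)
        right->keys[i] = all[mid + 1 + i];
    return 1;
}
```

Inserting 11, 21, 14, 78 fills the leaf; inserting 97 then reports a split with median 21, leaving 11, 14 on the left and 78, 97 on the right.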
The algorithm of B-tree insert:
- If the B-tree is empty, create the root and insert the first entry.
- If the B-tree is not empty, call the insert-node algorithm, which finds the location, inserts the entry, and performs any necessary updates (on overflow, split the node and install the median entry in the parent, and so on).
- If the root needs to split, create a new root.
Algorithm BTreeInsert (tree, data)
  if (tree empty)
    create new node
    set left subtree of node to null
    move data to first entry in new node
    set subtree of first entry to null
    set tree root to address of new node
    set number of entries to 1
  else
    insertNode (tree, data, upEntry)
  end if
  if (tree higher)
    create new node
    move upEntry to first entry in new node
    set left subtree of the new node to tree
    set tree root to new node
    set number of entries to 1
  end if
Algorithm searchNode (nodePtr, target)
  if (target < key in first entry)
    return 0
  end if
  set walker to number of entries - 1
  loop (target < entry key[walker])
    decrement walker
  end loop
  return walker
This function returns the index of the last entry whose key ≤ target, or 0 if target < the key of the first entry in the node.
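The searchNode pseudocode maps closely onto C. The chapter's code calls _searchNode (tree, root, dataInPtr), but its body is not shown in these slides, so the version below is a sketch under assumptions: it takes the comparator directly instead of the tree, ORDER is assumed to be 5, and compareInt is a hypothetical comparator for int keys.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 5   /* assumed order */

typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Hypothetical comparator for int keys: negative, zero, or positive. */
int compareInt(void* a, void* b)
{
    int x = *(int*)a;
    int y = *(int*)b;
    return (x > y) - (x < y);
}

/* Scan from the last entry toward the first and return the index of the
   last entry whose key is <= target, or 0 if target < the first key. */
int searchNode(int (*compare)(void*, void*), NODE* node, void* target)
{
    assert(node != NULL && node->numEntries > 0);
    if (compare(target, node->entries[0].dataPtr) < 0)
        return 0;
    int walker = node->numEntries - 1;
    while (compare(target, node->entries[walker].dataPtr) < 0)
        walker--;
    return walker;
}
```

Because index 0 is returned both when the target is below the first key and when it falls at or after it, the caller has to compare against entry 0 again to tell the two cases apart, which is what the chapter's _insert and _delete do.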
Algorithm splitNode (node, entryNdx, newEntryLow, upEntry)
  create new node
  move high entries to new node
  if (entryNdx < minimum entries)
    insert upEntry in node
  else
    insert upEntry in new node
  end if
  move median data to upEntry
  make new node first pointer the right subtree of median data
  make new node the right subtree of upEntry
B-tree deletion: B-tree deletion is a little more complicated than insertion.
- Search for the data to be deleted. If it cannot be found, print an error message and quit.
- If the data is found, delete it. Two cases need to be considered: the data is in a leaf node or in a non-leaf node.
- If an underflow occurs after the deletion (a leaf node has fewer than ⌈m/2⌉ − 1 entries, or an internal node has fewer than ⌈m/2⌉ non-null subtrees), an adjustment must be made.
The following algorithm deletes an entry. It handles two special situations: an empty tree, and a root that is empty after the deletion. The details of handling underflow are left to algorithm delete.
Algorithm BTreeDelete (tree, dltKey)
  if (tree empty)
    return false
  end if
  delete (tree, dltKey, success)
  if (success)
    if (tree number of entries zero)
      set tree to left subtree
    end if
  end if
  return success
The following algorithm deletes the entry from a leaf node and returns the value of underflow.
Algorithm deleteEntry (node, entryNdx)
  delete entry at entryNdx from node
  shift entries after deleted entry to left
  if (number of entries less than minimum entries)
    return true
  else
    return false
  end if
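The deleteEntry pseudocode can be sketched in C as follows. The chapter's code calls _deleteEntry (root, entryNdx) without showing its body, so this is an illustrative version only; ORDER is assumed to be 5 and MIN_ENTRIES is derived from it as ⌈m/2⌉ − 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ORDER       5                        /* assumed order */
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)  /* ceil(m/2) - 1 */

typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Remove the entry at entryNdx from a leaf, close the gap by shifting the
   following entries left, and report whether the node now underflows. */
bool deleteEntry(NODE* node, int entryNdx)
{
    assert(entryNdx >= 0 && entryNdx < node->numEntries);
    for (int i = entryNdx; i < node->numEntries - 1; i++)
        node->entries[i] = node->entries[i + 1];
    node->numEntries--;
    return node->numEntries < MIN_ENTRIES;
}
```

The data itself is not freed here; ownership of dataPtr is left to the caller in this sketch.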
When deleting an entry in an internal node, we must find substitute data. We use the immediate predecessor, which is the largest entry in the left subtree of the entry to be deleted. Within that subtree, the largest entry is found by following the rightmost subtree down to a leaf.
Algorithm deleteMid (node, entryNdx, subtree)
  if (no rightmost subtree)
    // predecessor in a leaf node
    move predecessor's data to deleted entry
    set underflow if node entries less than minimum
  else
    set underflow to deleteMid (node, entryNdx, right subtree)
    if (underflow)
      set underflow to reFlow (root, entryNdx)
    end if
  end if
  return underflow
When a node underflows, we need to make an adjustment, which we call reflow. Suppose one of the subtrees contains an underflow node. Two situations need to be considered:
- If the other subtree has more entries than the minimum number, we just move an entry to the underflow node. This is called balance.
- If the other subtree has only the minimum number of entries, we combine the two nodes into one node together with the root entry. This is called combine.
Algorithm reflow (root, entryNdx)
  if (rightTree entries greater minimum entries)
    borrowRight (root, entryNdx, leftTree, rightTree)
    set underflow to false
  else if (leftTree entries greater minimum entries)
    borrowLeft (root, entryNdx, leftTree, rightTree)
    set underflow to false
  else
    combine (root, entryNdx, leftTree, rightTree)
    if (root numEntries less minimum entries)
      set underflow to true
    else
      set underflow to false
    end if
  end if
  return underflow
Algorithm borrowLeft (root, entryNdx, left, right)
  shift all elements one to the right
  move root data to first entry in right
  move right first pointer to right subtree of first entry
  move left last right pointer to right first pointer
  move left last entry data to root at entryNdx
In the above algorithm, when an entry is moved, the corresponding pointers are also adjusted. (To see why this is needed, consider the case where the underflow node is not a leaf node.) The algorithm borrowRight is similar.
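The slides give C code for _borrowRight but not for borrowLeft. The sketch below follows the five pseudocode steps above one by one; it is an illustration under assumptions (ORDER is assumed to be 5, and the function name and exact statement order are mine), not the author's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 5   /* assumed order */

typedef struct { void* dataPtr; struct node* rightPtr; } ENTRY;
typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

/* Rotate one entry from the left sibling, through the parent entry at
   entryNdx, into the underflowing right sibling. */
void borrowLeft(NODE* root, int entryNdx, NODE* left, NODE* right)
{
    assert(left->numEntries > 0 && right->numEntries < ORDER - 1);

    /* Step 1: shift all entries in right one slot to the right. */
    for (int i = right->numEntries; i > 0; i--)
        right->entries[i] = right->entries[i - 1];

    /* Steps 2 and 3: the root data moves down to the first entry of right,
       and the old first pointer becomes that entry's right subtree. */
    right->entries[0].dataPtr  = root->entries[entryNdx].dataPtr;
    right->entries[0].rightPtr = right->firstPtr;
    right->numEntries++;

    /* Step 4: the last subtree of left becomes the first subtree of right. */
    int last = left->numEntries - 1;
    right->firstPtr = left->entries[last].rightPtr;

    /* Step 5: the last entry of left moves up into the root. */
    root->entries[entryNdx].dataPtr = left->entries[last].dataPtr;
    left->numEntries--;
}
```

For example, with a parent entry 50, a left node holding 10, 20, 30 and a right node holding 60, the call leaves 30 in the parent, 10, 20 on the left, and 50, 60 on the right.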
Algorithm combine (root, entryNdx, left, right)
  move parent entry to first open entry in left subtree
  move right subtree first subtree to moved parent left subtree
  move entries from right subtree to end of left subtree
  shift root data to left
Example
As with a BST, a B-tree is traversed in inorder. The difference is that, except in leaf nodes, the data in a node are not all processed at the same time: each entry is processed between the traversals of the subtrees on either side of it.
Algorithm BTreeTraversal (root)
  set scanCount to 0
  set nextSubTree to root left subtree
  loop (scanCount <= number of entries)
    if (nextSubTree not null)
      BTreeTraversal (nextSubTree)
    end if
    if (scanCount < number of entries)
      process (entry[scanCount])
      set nextSubTree to current entry right subtree
    end if
    increment scanCount
  end loop
The B-tree search algorithm follows the same idea as searching a binary search tree, but we must find the node and then find the entry within that node. In this case, we need to return both the node and the location of the entry in that node. Recursion is used to find the node. Within a node, the target is compared against the entries from the last entry to the first.
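typedef struct
{
    void*        dataPtr;     // pointer to the entry's data
    struct node* rightPtr;    // subtree with keys >= this entry's key
} ENTRY;

typedef struct node
{
    struct node* firstPtr;    // subtree with keys < the first entry's key
    int          numEntries;  // number of entries currently in the node
    ENTRY        entries[ORDER - 1];
} NODE;

typedef struct
{
    int   count;              // number of entries in the tree
    NODE* root;
    int   (*compare) (void* argu1, void* argu2);
} BTREE;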
void* BTree_Search (BTREE* tree, void* targetPtr)
{
    if (tree->root)
        return _search (tree, targetPtr, tree->root);
    else
        return NULL;
} // BTree_Search
void* _search (BTREE* tree, void* targetPtr, NODE* root)
{
    int entryNo;

    if (!root)
        return NULL;
    if (tree->compare(targetPtr, root->entries[0].dataPtr) < 0)
        return _search (tree, targetPtr, root->firstPtr);
    entryNo = root->numEntries - 1;
    while (tree->compare(targetPtr, root->entries[entryNo].dataPtr) < 0)
        entryNo--;
    if (tree->compare(targetPtr, root->entries[entryNo].dataPtr) == 0)
        return (root->entries[entryNo].dataPtr);
    return (_search (tree, targetPtr, root->entries[entryNo].rightPtr));
} // _search
void BTree_Traverse (BTREE* tree, void (*process) (void* dataPtr))
{
    // Statements
    if (tree->root)
        _traverse (tree->root, process);
    return;
} // end BTree_Traverse
void _traverse (NODE* root, void (*process) (void* dataPtr))
{
    int   scanCount;
    NODE* ptr;

    scanCount = 0;
    ptr = root->firstPtr;
    while (scanCount <= root->numEntries)
    {
        if (ptr)
            _traverse (ptr, process);
        // Subtree processed -- get next entry
        if (scanCount < root->numEntries)
        {
            process (root->entries[scanCount].dataPtr);
            ptr = root->entries[scanCount].rightPtr;
        } // if scanCount
        scanCount++;
    } // while
    return;
} // _traverse
void BTree_Insert (BTREE* tree, void* dataInPtr)
{
    bool  taller;
    NODE* newPtr;
    ENTRY upEntry;

    if (tree->root == NULL)
        // Empty Tree. Insert first node
        if (newPtr = (NODE*)malloc(sizeof (NODE)))
        {
            newPtr->firstPtr = NULL;
            newPtr->numEntries = 1;
            newPtr->entries[0].dataPtr = dataInPtr;
            newPtr->entries[0].rightPtr = NULL;
            tree->root = newPtr;
            (tree->count)++;
            for (int i = 1; i < ORDER - 1; i++)
            {
                newPtr->entries[i].dataPtr = NULL;
                newPtr->entries[i].rightPtr = NULL;
            } // for
            return;
        } // if malloc
        else
            printf("Overflow error 100 in BTree_Insert\a\n"), exit (100);

    taller = _insert (tree, tree->root, dataInPtr, &upEntry);
    if (taller)
    {
        // Tree has grown. Create new root
        newPtr = (NODE*)malloc(sizeof(NODE));
        if (newPtr)
        {
            newPtr->entries[0] = upEntry;
            newPtr->firstPtr = tree->root;
            newPtr->numEntries = 1;
            tree->root = newPtr;
        } // if newPtr
        else
            printf("Overflow error 101\a\n"), exit (100);
    } // if taller
    (tree->count)++;
    return;
} // BTree_Insert
bool _insert (BTREE* tree, NODE* root, void* dataInPtr, ENTRY* upEntry)
{
    int   compResult;
    int   entryNdx;
    bool  taller;
    NODE* subtreePtr;

    if (!root)
    {
        (*upEntry).dataPtr = dataInPtr;
        (*upEntry).rightPtr = NULL;
        return true; // tree taller
    } // if NULL tree

    entryNdx = _searchNode (tree, root, dataInPtr);
    compResult = tree->compare(dataInPtr, root->entries[entryNdx].dataPtr);
    if (entryNdx <= 0 && compResult < 0)
        // in node's first subtree
        subtreePtr = root->firstPtr;
    else
        // in entry's right subtree
        subtreePtr = root->entries[entryNdx].rightPtr;
    taller = _insert (tree, subtreePtr, dataInPtr, upEntry);

    if (taller)
    {
        if (root->numEntries >= ORDER - 1)
        {
            // Need to create new node
            _splitNode (root, entryNdx, compResult, upEntry);
            taller = true;
        } // node full
        else
        {
            if (compResult >= 0)
                // New data >= current entry -- insert after
                _insertEntry(root, entryNdx + 1, *upEntry);
            else
                // Insert before current entry
                _insertEntry(root, entryNdx, *upEntry);
            (root->numEntries)++;
            taller = false;
        } // else
    } // if taller
    return taller;
} // _insert
void _splitNode (NODE* node, int entryNdx, int compResult, ENTRY* upEntry)
{
    int   fromNdx;
    int   toNdx;
    NODE* rightPtr;

    rightPtr = (NODE*)malloc(sizeof (NODE));
    if (!rightPtr)
        printf("Overflow Error 101 in _splitNode\a\n"), exit (100);

    if (entryNdx < MIN_ENTRIES)
        fromNdx = MIN_ENTRIES;
    else
        fromNdx = MIN_ENTRIES + 1;
    toNdx = 0;
    rightPtr->numEntries = node->numEntries - fromNdx;
    while (fromNdx < node->numEntries)
        rightPtr->entries[toNdx++] = node->entries[fromNdx++];
    node->numEntries = node->numEntries - rightPtr->numEntries;

    if (entryNdx < MIN_ENTRIES)
    {
        if (compResult < 0)
            _insertEntry (node, entryNdx, *upEntry);
        else
            _insertEntry (node, entryNdx + 1, *upEntry);
    } // if
    else
    {
        _insertEntry (rightPtr, entryNdx - MIN_ENTRIES, *upEntry);
        (rightPtr->numEntries)++;
        (node->numEntries)--;
    } // else

    upEntry->dataPtr = node->entries[MIN_ENTRIES].dataPtr;
    upEntry->rightPtr = rightPtr;
    rightPtr->firstPtr = node->entries[MIN_ENTRIES].rightPtr;
    return;
} // _splitNode
bool BTree_Delete (BTREE* tree, void* dltKey)
{
    bool  success;
    NODE* dltPtr;

    if (!tree->root)
        return false;
    _delete (tree, tree->root, dltKey, &success);
    if (success)
    {
        (tree->count)--;
        if (tree->root->numEntries == 0)
        {
            dltPtr = tree->root;
            tree->root = tree->root->firstPtr;
            free (dltPtr);
        } // root empty
    } // success
    return success;
} // BTree_Delete
bool _delete (BTREE* tree, NODE* root, void* dltKeyPtr, bool* success)
{
    NODE* leftPtr;
    NODE* subTreePtr;
    int   entryNdx;
    int   underflow;

    if (!root)
    {
        *success = false;
        return false;
    } // null tree

    entryNdx = _searchNode (tree, root, dltKeyPtr);
    if (tree->compare(dltKeyPtr, root->entries[entryNdx].dataPtr) == 0)
    {
        *success = true;
        if (root->entries[entryNdx].rightPtr == NULL)
            underflow = _deleteEntry (root, entryNdx);
        else
        {
            if (entryNdx > 0)
                leftPtr = root->entries[entryNdx - 1].rightPtr;
            else
                leftPtr = root->firstPtr;
            underflow = _deleteMid (root, entryNdx, leftPtr);
            if (underflow)
                underflow = _reFlow (root, entryNdx);
        } // else internal node
    } // else found entry
    else
    {
        if (tree->compare (dltKeyPtr, root->entries[0].dataPtr) < 0)
            subTreePtr = root->firstPtr;
        else
            subTreePtr = root->entries[entryNdx].rightPtr;
        underflow = _delete (tree, subTreePtr, dltKeyPtr, success);
        if (underflow)
            underflow = _reFlow (root, entryNdx);
    } // else not found
    return underflow;
} // _delete
bool _deleteMid (NODE* root, int entryNdx, NODE* subtreePtr)
{
    int  dltNdx;
    int  rightNdx;
    bool underflow;

    if (subtreePtr->firstPtr == NULL)
    {
        // leaf located. Exchange data & delete leaf
        dltNdx = subtreePtr->numEntries - 1;
        root->entries[entryNdx].dataPtr = subtreePtr->entries[dltNdx].dataPtr;
        --subtreePtr->numEntries;
        underflow = subtreePtr->numEntries < MIN_ENTRIES;
    } // if leaf
    else
    {
        // Not located. Traverse right for predecessor
        rightNdx = subtreePtr->numEntries - 1;
        underflow = _deleteMid (root, entryNdx, subtreePtr->entries[rightNdx].rightPtr);
        if (underflow)
            underflow = _reFlow (subtreePtr, rightNdx);
    } // else traverse right
    return underflow;
} // _deleteMid
bool _reFlow (NODE* root, int entryNdx)
{
    NODE* leftTreePtr;
    NODE* rightTreePtr;
    bool  underflow;

    if (entryNdx == 0)
        leftTreePtr = root->firstPtr;
    else
        leftTreePtr = root->entries[entryNdx - 1].rightPtr;
    rightTreePtr = root->entries[entryNdx].rightPtr;

    if (rightTreePtr->numEntries > MIN_ENTRIES)
    {
        _borrowRight (root, entryNdx, leftTreePtr, rightTreePtr);
        underflow = false;
    } // if borrow right
    else
    {
        // Can't borrow from right--try left
        if (leftTreePtr->numEntries > MIN_ENTRIES)
        {
            _borrowLeft (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = false;
        } // if borrow left
        else
        {
            // Can't borrow. Must combine nodes.
            _combine (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = (root->numEntries < MIN_ENTRIES);
        } // else combine
    } // else borrow right
    return underflow;
} // _reFlow
void _borrowRight (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int shifter;

    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;

    root->entries[entryNdx].dataPtr = rightTreePtr->entries[0].dataPtr;
    rightTreePtr->firstPtr = rightTreePtr->entries[0].rightPtr;
    shifter = 0;
    while (shifter < rightTreePtr->numEntries - 1)
    {
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter + 1];
        ++shifter;
    } // while
    --rightTreePtr->numEntries;
    return;
} // _borrowRight
void _combine (NODE* root, int entryNdx, NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int fromNdx;
    int shifter;

    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;
    --root->numEntries;

    fromNdx = 0;
    toNdx++;
    while (fromNdx < rightTreePtr->numEntries)
        leftTreePtr->entries[toNdx++] = rightTreePtr->entries[fromNdx++];
    leftTreePtr->numEntries += rightTreePtr->numEntries;
    free (rightTreePtr);

    shifter = entryNdx;
    while (shifter < root->numEntries)
    {
        root->entries[shifter] = root->entries[shifter + 1];
        shifter++;
    } // while
    return;
} // _combine
BTREE* BTree_Create (int (*compare) (void* argu1, void* argu2))
{
    BTREE* tree;

    tree = (BTREE*) malloc (sizeof (BTREE));
    if (tree)
    {
        tree->root = NULL;
        tree->count = 0;
        tree->compare = compare;
    } // if
    return tree;
} // BTree_Create
void BTree_Print (BTREE* tree)
{
    _print (tree->root, 0);
    return;
} // BTree_Print

void _print (NODE* root, int level)
{
    int   scanCount;
    NODE* ptr;
    void* voidPtr;

    // Statements
    if (root)
    {
        // Print from the last entry to the first
        scanCount = root->numEntries - 1;
        while (scanCount >= 0)
        {
            ptr = root->entries[scanCount].rightPtr;
            // Test for subtree
            if (ptr)
                _print (ptr, level + 1);
            // Subtree processed -- print current entry
            printf("(%02d)", level);
            for (int i = 1; i <= level; i++ )
                printf (" ." );
            voidPtr = root->entries[scanCount].dataPtr;
            printf("%4d", *((int*)voidPtr));   // assumes the data are ints
            printf("\t--Node: %p\n", (void*)root);
            scanCount--;
        } // while
        // Process first pointer
        if (root->firstPtr)
            _print (root->firstPtr, level + 1);
    } // if root
    return;
} // _print
Some special B-trees and variations:
- 2-3 tree: a B-tree of order 3 (suitable for internal search).
- 2-3-4 tree: a B-tree of order 4 (suitable for internal search).
- B*tree: when a node overflows, instead of being split immediately, its data are redistributed among the node's siblings where possible.
- B+tree: some data need to be processed both randomly and sequentially. In a B+tree, all data are stored in the leaf nodes. The keys in the internal nodes are used only for searching. Each leaf node has one additional pointer that points to the next leaf node.
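The extra next-leaf pointer is what makes sequential processing cheap in a B+tree: once the first leaf is found, all keys can be read in order without revisiting internal nodes. The following is a minimal sketch of that scan only; the LEAF layout, LEAF_CAPACITY and scanLeaves are hypothetical names, and building the tree is out of scope here.

```c
#include <assert.h>
#include <stddef.h>

enum { LEAF_CAPACITY = 4 };   /* assumed number of keys per leaf */

/* Hypothetical B+tree leaf: sorted keys plus a link to the next leaf. */
typedef struct leaf
{
    int          count;
    int          keys[LEAF_CAPACITY];
    struct leaf* next;
} LEAF;

/* Walk the leaf chain from first, copying keys in ascending order into
   out (at most maxOut of them). Returns the number of keys copied. */
int scanLeaves(const LEAF* first, int* out, int maxOut)
{
    int n = 0;
    for (const LEAF* leaf = first; leaf != NULL; leaf = leaf->next)
    {
        assert(leaf->count >= 0 && leaf->count <= LEAF_CAPACITY);
        for (int i = 0; i < leaf->count && n < maxOut; i++)
            out[n++] = leaf->keys[i];
    }
    return n;
}
```

Random access still goes through the internal nodes from the root; only the sequential pass uses the leaf chain.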
Tries A trie is a multiway tree that is used to search for keys treated as a sequence of characters (letters or digits, for example).
For example, to search for the key begin, we first find b, then be, then beg, and so on. In this scheme, the root has 26 children, and each node may have at most 26 children, so the trie is based on a 26-way tree. In English, there are no words beginning with 'bb', 'bc', 'bf', 'bg', · · · , so the corresponding nodes can be pruned.
To prune the tree, we cut all of the branches that are not needed. For example, if no key starts with the letter X, then at level 0 the X pointer is null. Similarly, after the letter Q, the only valid letter is U, so all the pointers in the Q branch except U are set to null.
As an example, we display a trie that contains only 5 letters: A, B, C, E, and T. Each node contains an array of 5 pointers. A pointer in the node points to the letter's node if the letter exists. In this example, most pointers are null.
Algorithm searchTrie (dictionary, word)
  set root to dictionary
  set ltrNdx to 0
  loop (root not null)
    if (root entry equals word)
      return true
    end if
    if (ltrNdx >= word length)
      return false
    end if
    set chNdx to word[ltrNdx]
    set root to chNdx subtree
    increment ltrNdx
  end loop
  return false
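The same walk can be sketched in C. Note the simplification: the pseudocode above compares a stored entry at each node against the word, whereas this sketch uses the common variant with an end-of-word flag per node over the 26 lowercase letters. All names here (TRIE, trieNewNode, trieInsert, trieSearch) are hypothetical, and nodes are never freed in this short example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { ALPHABET = 26 };   /* lowercase 'a'..'z' only (assumption) */

typedef struct trieNode
{
    bool             isWord;            /* a key ends at this node */
    struct trieNode* child[ALPHABET];   /* null pointer = pruned branch */
} TRIE;

/* Allocate a node with no children; returns NULL if allocation fails. */
TRIE* trieNewNode(void)
{
    TRIE* node = malloc(sizeof(TRIE));
    if (node)
    {
        node->isWord = false;
        for (int i = 0; i < ALPHABET; i++)
            node->child[i] = NULL;
    }
    return node;
}

/* Add word to the trie, creating branches only where needed. */
bool trieInsert(TRIE* root, const char* word)
{
    assert(root != NULL);
    for (; *word; word++)
    {
        if (*word < 'a' || *word > 'z')
            return false;
        int ndx = *word - 'a';
        if (!root->child[ndx])
        {
            root->child[ndx] = trieNewNode();
            if (!root->child[ndx])
                return false;
        }
        root = root->child[ndx];
    }
    root->isWord = true;
    return true;
}

/* Follow one pointer per letter; a null pointer means the key is absent. */
bool trieSearch(const TRIE* root, const char* word)
{
    for (; root && *word; word++)
    {
        if (*word < 'a' || *word > 'z')
            return false;
        root = root->child[*word - 'a'];
    }
    return root != NULL && root->isWord;
}
```

After inserting begin and bet, a search for beg follows existing pointers but fails on the end-of-word flag, while a search for cat stops at the first null pointer.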