DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

43 BULK INSERT The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up.

44 BULK INSERT The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up. Keys: 3, 7, 9, 13, 6, 1

45 BULK INSERT The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up. Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13

46 BULK INSERT The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up. Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13 1 3 6 7 9 13

47 BULK INSERT The fastest/best way to build a B+Tree is to first sort the keys and then build the index from the bottom up. Keys: 3, 7, 9, 13, 6, 1 Sorted Keys: 1, 3, 6, 7, 9, 13 6 9 1 3 6 7 9 13

48 POINTER SWIZZLING Nodes use page ids to reference other nodes in the index. The DBMS has to get the memory location from the page table during traversal. If a page is pinned in the buffer pool, then we can store raw pointers instead of page ids, thereby removing the need to get address from the page table.

49 POINTER SWIZZLING Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to get the memory location from the page table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

50 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to get the memory location from the page table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

51 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to get the memory location from the page table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

52 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to Page #2 get the memory location from the page table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

53 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to Page #2 get the memory location from the page table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, Page #2 → <Page*> then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

54 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to Page #2 get the memory location from the page Page #3 table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, Page #2 → <Page*> then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

55 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to Page #2 get the memory location from the page Page #3 table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, Page #2 → <Page*> Page #3 → <Page*> then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

56 POINTER SWIZZLING Find Key>3 Nodes use page ids to reference other 6 9 nodes in the index. The DBMS has to <Page*> get the memory location from the page <Page*> table during traversal. 1 3 6 7 If a page is pinned in the buffer pool, then we can store raw pointers instead Buffer Pool of page ids, thereby removing the need Header Header Header 1 2 3 to get address from the page table.

57 MEMORY POOLS We don’t want to be calling malloc and free anytime we need to add or delete a node. This could lead to a system call. → If you call malloc to request 10 bytes of memory, the allocator may invoke the sbrk (or mmap ) system call to request 4K bytes from OS. → Then, when you call malloc next time to request another 10 bytes, it may not have to issue a system call; instead, it may return a pointer within allocated memory.

58 MEMORY POOLS If all the nodes are the same size, then the index can maintain a pool of available nodes. → Insert : Grab a free node, otherwise create a new one. → Delete: Add the node back to the free pool. Need some policy to decide when to retract the pool size (garbage collection & de-fragmentation).

59 GARBAGE COLLECTION We need to know when it is safe to reclaim memory for deleted nodes in a latch-free index. → Reference Counting → Epoch-based Reclamation → Hazard Pointers → Many others … K2 K3 K4 V2 V3 V4

62 GARBAGE COLLECTION We need to know when it is safe to reclaim memory for deleted nodes in a latch-free index. → Reference Counting → Epoch-based Reclamation → Hazard Pointers → Many others … X K2 K3 K4 V2 V3 V4

63 GARBAGE COLLECTION We need to know when it is safe to reclaim memory for deleted nodes in a latch-free index. → Reference Counting → Epoch-based Reclamation → Hazard Pointers → Many others … K2 K4 V2 V4

64 GARBAGE COLLECTION We need to know when it is safe to reclaim memory for deleted nodes in a latch-free index. → Reference Counting → Epoch-based Reclamation → Hazard Pointers → Many others … K2 K4 V2 V4

65 REFERENCE COUNTING Maintain a counter for each node to keep track of the number of threads that are accessing it. → Increment the counter before accessing. → Decrement it when finished. → A node is only safe to delete when the count is zero. This has bad performance for multi-core CPUs → Incrementing/decrementing counters causes a lot of cache coherence traffic.

66 OBSERVATION We don’t actually care about the actual value of the reference counter. We only need to know when it reaches zero. We don’t have to perform garbage collection immediately when the counter reaches zero. Source: Stephen Tu

67 EPOCH GARBAGE COLLECTION Maintain a global epoch counter that is periodically updated (e.g., every 10 ms). → Keep track of what threads enter the index during an epoch and when they leave. Mark the current epoch of a node when it is marked for deletion. → The node can be reclaimed once all threads have left that epoch (and all preceding epochs). Also known as Read-Copy-Update (RCU) in Linux.

68 NON-UNIQUE INDEXES Approach #1: Duplicate Keys → Use the same node layout but store duplicate keys multiple times. Approach #2: Value Lists → Store each key only once and maintain a linked list of unique values. MODERN B-TREE TECHNIQUES NOW PUBLISHERS 2010

69 NON-UNIQUE: DUPLICATE KEYS B+Tree Leaf Node Level Slots Prev Next ¤ ¤ # # Sorted Keys K2 • • • Kn K1 K1 K1 K2 Values • • • ¤ ¤ ¤ ¤ ¤ ¤

72 NON-UNIQUE: VALUE LISTS B+Tree Leaf Node Level Slots Prev Next ¤ ¤ # # Sorted Keys K5 • • • Kn K1 K2 K3 K4 Values ¤ ¤ ¤ ¤ ¤ • • •

73 NON-UNIQUE: VALUE LISTS B+Tree Leaf Node Level Slots Prev Next ¤ ¤ # # Sorted Keys K5 • • • Kn K1 K2 K3 K4 Values ¤ ¤ ¤ ¤ ¤ • • •

74 VARIABLE LENGTH KEYS Approach #1: Pointers → Store the keys as pointers to the tuple’s attribute. Approach #2: Variable Length Nodes → The size of each node in the index can vary. → Requires careful memory management. Approach #3: Padding → Always pad the key to be max length of the key type. Approach #4: Key Map / Indirection → Embed an array of pointers that map to the key + value list within the node.

75 KEY MAP / INDIRECTION B+Tree Leaf Node Level Slots Prev Next ¤ ¤ # # Key Map ¤ ¤ ¤ Key+Values Andy V1 Obama V2 Prashanth V3

79 KEY MAP / INDIRECTION B+Tree Leaf Node Level Slots Prev Next ¤ ¤ # # Key Map A·¤ O·¤ P·¤ ¤ ¤ ¤ Key+Values Andy V1 Obama V2 Prashanth V3

80 PREFIX COMPRESSION Store a minimum prefix that is needed to correctly route probes into the index. Since keys are sorted in lexicographical order, there will be a lot of duplicated prefixes. abcdefghijk lmnopqrstuv … … … …

81 PREFIX COMPRESSION Store a minimum prefix that is needed to correctly route probes into the index. Since keys are sorted in lexicographical order, there will be a lot of duplicated prefixes. abcdefghijk lmnopqrstuv … … … …

82 PREFIX COMPRESSION Store a minimum prefix that is needed to correctly route probes into the index. Since keys are sorted in lexicographical order, there will be a lot of duplicated prefixes. abc lmn … … … …

83 PREFIX COMPRESSION Store a minimum prefix that is needed to correctly route probes into the index. Since keys are sorted in lexicographical order, there will be a lot of duplicated prefixes. abc lmn Andre Andy Annie … … … …

84 PREFIX COMPRESSION Store a minimum prefix that is needed to correctly route probes into the index. Since keys are sorted in lexicographical order, there will be a lot of duplicated prefixes. abc lmn Andre Andy Annie … … … … An dre dy nie

85 ADAPATIVE RADIX TREE (ART) Uses digital representation of keys to examine prefixes 1-by-1 instead of comparing entire key. Radix trees properties: → The height of the tree depends on the length of keys. (unlike B+tree where height depends on the number of keys) → Does not require rebalancing → The path to a leaf node represents the key of the leaf → Keys are stored implicitly and can be reconstructed from paths. → Structure does not depend on order of key insertion THE ADAPTIVE RADIX TREE: ARTFUL INDEXING FOR MAIN-MEMORY DATABASES ICDE 2013

86 TRIE VS. RADIX TREE Trie (Re`trie’val - 1959) H E A V L ¤ T L E ¤ O Keys: HELLO , HAT , HAVE ¤

91 TRIE VS. RADIX TREE Trie (Re`trie’val - 1959) Radix Tree H H E A ELLO A ¤ V VE L ¤ T ¤ T ¤ L E ¤ O Keys: HELLO , HAT , HAVE ¤

92 TRIE VS. RADIX TREE Trie (Re`trie’val - 1959) Radix Tree H H E A ELLO A ¤ V VE L ¤ T ¤ T ¤ L E ¤ O Keys: HELLO , HAT , HAVE ¤

93 ART: ADAPTIVELY SIZED NODES The index supports four different internal node types with different capacities. Pack in multiple digits into a single node to improve cache locality.

94 ART: MODIFICATIONS H ELLO A VE T ¤ ¤ ¤

95 ART: MODIFICATIONS Operation: Insert HAIR H ELLO A VE T ¤ ¤ ¤

96 ART: MODIFICATIONS Operation: Insert HAIR H ELLO A VE T IR ¤ ¤ ¤ ¤

97 ART: MODIFICATIONS Operation: Insert HAIR H Operation: Delete HAT , HAVE ELLO A VE T IR ¤ ¤ ¤ ¤

98 ART: MODIFICATIONS Operation: Insert HAIR H Operation: Delete HAT , HAVE ELLO A VE T IR ¤ ¤ ¤ ¤

99 ART: MODIFICATIONS Operation: Insert HAIR H Operation: Delete HAT , HAVE ELLO A IR ¤ ¤

100 ART: MODIFICATIONS Operation: Insert HAIR H Operation: Delete HAT , HAVE ELLO A IR ¤ ¤

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #12: OLTP INDEXES (PART II) 2 TODAYS AGENDA B+Tree Overview Index Implementation Issues ART Index 3 LOGISTICS Reminder: Problem set due on Feb 21 st .

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

CS411: Two Perspectives on DBMS User perspective CS411 how to use a database system

Database Systems Database Systems 1 Creating a Database System Design Construction

National Address Database National Address Database What is a National Address Database?

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

DATABASE SYSTEMS Introduction to MySQL Database System Course, 2016 AGENDA FOR TODAY

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Database Management System (DBMS) DBMS contains information about a particular enterprise

Distributed Databases 1 19.1 Distributed Database System A distributed database system

Distributed Databases Distributed database management system A distributed database (DDB) is

Application of evolutionary computation to the advanced image processing Farid Ghareh Mohammadi

Propagating Soft Table Constraints Nicolas Paris with Christophe Lecoutre, Olivier Roussel and

LArSystematics: A systematic shift framework for LArSoft Luke Pickering, K. McFarland, K. Mahn,

Non-Hodgkin lymphoma State of the art of treatment Stephen M. Ansell, MD, PhD Chair, Lymphoma

Block Interaction: A Generative Summarization Scheme for Frequent Patterns Ruoming Jin Kent

CSc 337 LECTURE 23: REGULAR EXPRESSIONS What is form validation? validation : ensuring that

to the Open Universe Initiative in developing countries Igor Molotov International Scientific

The abstract art of composing SDN applications Pedro A. Aranda Telefonica