Hash Tables
Lecture #06, Database Systems
Andy Pavlo, Computer Science, Carnegie Mellon Univ.
15-445/15-645, Fall 2018

UPCOMING DATABASE EVENTS
MapD Talk
→ Thursday Sept 20th @ 12:00pm
→ CIC 4th Floor
CMU 15-445/645 (Fall 2018)

ADMINISTRIVIA
Project #1 is due Wednesday Sept 26th @ 11:59pm
Homework #2 is due Friday Sept 28th @ 11:59pm

REMINDER
If you have a question during the lecture, raise your hand and stop me. Do not come up to the front after the lecture.
There are no stupid questions (*).

COURSE STATUS
We are now going to talk about how to support the DBMS's execution engine to read/write data from pages.
Two types of data structures:
→ Hash Tables
→ Trees
System layers shown on the slide (top to bottom): Query Planning, Operator Execution, Access Methods, Buffer Pool Manager, Disk Manager

DATA STRUCTURES
Internal Meta-data
Core Data Storage
Temporary Data Structures
Table Indexes

DESIGN DECISIONS
Data Organization
→ How we lay out the data structure in memory/pages and what information to store to support efficient access.
Concurrency
→ How to enable multiple threads to access the data structure at the same time without causing problems.

HASH TABLES
A hash table implements an associative array abstract data type that maps keys to values.
It uses a hash function to compute an offset into the array, from which the desired value can be found.

STATIC HASH TABLE
Allocate a giant array that has one slot for every element that you need to record.
To find an entry, mod the key by the number of elements to find the offset in the array.
[Figure: hash(key) selects one of the slots 0, 1, 2, …, n. In the first build the slots hold abc, Ø (empty), def, …, xyz; in the second build the keys shown are abcdefghi, xyz123, and defghijk.]

ASSUMPTIONS
You know the number of elements ahead of time.
Each key is unique.
Perfect hash function.
→ If key1≠key2, then hash(key1)≠hash(key2)
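A minimal sketch of the static table described above, assuming all three conditions hold. The class name `StaticHashTable` is hypothetical, and Python's built-in `hash()` stands in for the hash function; neither comes from the slides. The sketch raises an error as soon as the perfect-hash assumption is violated, which is exactly the case the rest of the lecture has to handle.

```python
class StaticHashTable:
    """Fixed-size table that relies on unique keys and a collision-free hash."""

    def __init__(self, num_slots):
        # None marks an empty slot (the "Ø" in the figure).
        self.slots = [None] * num_slots

    def _offset(self, key):
        # Assumption: Python's built-in hash() plays the role of hash(key).
        return hash(key) % len(self.slots)

    def put(self, key, value):
        i = self._offset(key)
        if self.slots[i] is not None and self.slots[i][0] != key:
            # Two different keys mapped to one slot: the assumptions are broken.
            raise ValueError("collision: the perfect-hash assumption does not hold")
        self.slots[i] = (key, value)

    def get(self, key):
        entry = self.slots[self._offset(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None
```

With small integer keys (which hash to themselves in CPython), `put(0, "abc")` and `put(2, "def")` fill slots 0 and 2, while a later `put(10, ...)` into an 8-slot table collides with key 2 and is rejected.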

HASH TABLE
Design Decision #1: Hash Function
→ How to map a large key space into a smaller domain.
→ Trade-off between being fast vs. collision rate.
Design Decision #2: Hashing Scheme
→ How to handle key collisions after hashing.
→ Trade-off between allocating a large hash table vs. additional instructions to find/insert keys.

TODAY'S AGENDA
Hash Functions
Static Hashing Schemes
Dynamic Hashing Schemes

HASH FUNCTIONS
We don't want to use a cryptographic hash function for our join algorithm.
We want something that is fast and will have a low collision rate.

HASH FUNCTIONS
MurmurHash (2008)
→ Designed to be a fast, general-purpose hash function.
Google CityHash (2011)
→ Based on ideas from MurmurHash2.
→ Designed to be faster for short keys (<64 bytes).
Google FarmHash (2014)
→ Newer version of CityHash with better collision rates.
CLHash (2016)
→ Fast hashing function based on carry-less multiplication.
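To make "fast, non-cryptographic hash function" concrete, here is a small sketch of 64-bit FNV-1a. FNV-1a is not one of the functions listed above; it is swapped in only because it fits in a few lines, and it makes no claim about matching their speed or collision rates. The helper `slot_for` is a hypothetical name showing how a hash value is reduced to a table offset.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF           # keep the running hash within 64 bits


def fnv1a_64(data: bytes) -> int:
    """Hash a byte string with 64-bit FNV-1a (XOR the byte, then multiply)."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h


def slot_for(key: str, num_slots: int) -> int:
    """Map a large key space (arbitrary strings) into a small domain of slots."""
    return fnv1a_64(key.encode("utf-8")) % num_slots
```

The work per key is one XOR and one multiply per byte, which is the kind of cost profile a DBMS wants for hash tables, in contrast to a cryptographic hash that spends extra work on properties a join does not need.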

HASH FUNCTION BENCHMARKS
Intel Core i7-8700K @ 3.70GHz
[Figure: two line charts of throughput (MB/sec) vs. key size (1 to 251 bytes) for std::hash, MurmurHash3, CityHash, FarmHash, and CLHash, with key sizes 32, 64, 128, and 192 bytes marked. The first chart's y-axis runs to 18000 MB/sec, the second to 36000 MB/sec.]
Source: Fredrik Widlund

STATIC HASHING SCHEMES
Approach #1: Linear Probe Hashing
Approach #2: Robin Hood Hashing
Approach #3: Cuckoo Hashing

LINEAR PROBE HASHING
Single giant table of slots.
Resolve collisions by linearly searching for the next free slot in the table.
→ To determine whether an element is present, hash to a location in the index and scan for it.
→ Have to store the key in the index to know when to stop scanning.
→ Insertions and deletions are generalizations of lookups.

LINEAR PROBE HASHING
[Figure: insertion walkthrough for keys A through F. hash(key) picks a slot, and each slot stores <key>|<value>. The builds recovered here show B, A, C, and D placed in consecutive slots, with E and F still to be inserted.]
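The scheme above can be sketched as follows. This is an illustration rather than the course's implementation: the class name `LinearProbeTable` is hypothetical, Python's built-in `hash()` stands in for the hash function, and deletion uses tombstone markers, which is one common approach that the slides do not spell out.

```python
_EMPTY = object()      # slot that has never held an entry
_TOMBSTONE = object()  # slot whose entry was deleted (assumed deletion strategy)


class LinearProbeTable:
    def __init__(self, num_slots):
        self.slots = [_EMPTY] * num_slots

    def _probe(self, key):
        # Visit every slot once, starting at the hashed position and wrapping.
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def insert(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # the key cannot be stored beyond a never-used slot
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = i  # reusable, but keep scanning for the key
            elif slot[0] == key:
                self.slots[i] = (key, value)  # key already present: overwrite
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def lookup(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return None  # the stored keys tell us when to stop scanning
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return False
            if slot is not _TOMBSTONE and slot[0] == key:
                self.slots[i] = _TOMBSTONE  # keep later entries reachable
                return True
        return False
```

Insert and delete both start with the same scan as lookup, which is the sense in which they generalize it. The tombstone matters because clearing a slot outright would cut the probe chain and hide any key that had been pushed past it.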

NON-UNIQUE KEYS
Choice #1: Separate Linked List
→ Store values in a separate storage area for each key.
→ Example value lists: XYZ → value1, value2, value3; ABC → value1, value2
Choice #2: Redundant Keys
→ Store duplicate key entries together in the hash table.
→ Example slots: XYZ|value1, ABC|value1, XYZ|value2, XYZ|value3, ABC|value2
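Both choices can be sketched with the XYZ/ABC example from the slide. The function names `insert_redundant` and `lookup_all` are hypothetical, and the redundant-keys variant assumes a linear-probe table with no deletions.

```python
from collections import defaultdict

pairs = [("XYZ", "value1"), ("ABC", "value1"), ("XYZ", "value2"),
         ("XYZ", "value3"), ("ABC", "value2")]

# Choice #1: one entry per key, pointing at a separate list of values.
value_lists = defaultdict(list)
for key, value in pairs:
    value_lists[key].append(value)


# Choice #2: every (key, value) pair gets its own slot, so keys repeat.
def insert_redundant(slots, key, value):
    n = len(slots)
    start = hash(key) % n
    for step in range(n):
        i = (start + step) % n
        if slots[i] is None:
            slots[i] = (key, value)
            return
    raise RuntimeError("table is full")


def lookup_all(slots, key):
    """Collect every value stored under key; stop at the first empty slot."""
    n = len(slots)
    start = hash(key) % n
    found = []
    for step in range(n):
        slot = slots[(start + step) % n]
        if slot is None:
            break
        if slot[0] == key:
            found.append(slot[1])
    return found
```

With redundant keys a lookup cannot stop at the first match; it has to keep scanning until an empty slot to be sure it has seen every duplicate.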

OBSERVATION
To reduce the # of wasteful comparisons, it is important to avoid collisions of hashed keys.
This requires a hash table with ~2x the number of slots as the number of elements.
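One way to see why the extra slots help is to count probes at different fill levels. The sketch below is a toy experiment, not a benchmark: `average_probes` is a hypothetical helper that fills a linear-probe table with random integer keys (using `key % num_slots` as the hash) and reports the average number of slots examined per insert.

```python
import random


def average_probes(num_slots, num_keys, seed=0):
    """Average slots examined per insert when num_keys go into num_slots."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    slots = [None] * num_slots
    total_probes = 0
    # sample() yields distinct keys, matching the unique-key assumption.
    for key in rng.sample(range(10**9), num_keys):
        i = key % num_slots
        probes = 1
        while slots[i] is not None:
            i = (i + 1) % num_slots  # collision: try the next slot
            probes += 1
        slots[i] = key
        total_probes += probes
    return total_probes / num_keys


half_full = average_probes(4096, 2048)    # ~2x slots per element
nearly_full = average_probes(4096, 3686)  # roughly 90% full
```

Comparing `half_full` with `nearly_full` shows the probe count climbing as the table fills, which is the wasted comparison work the ~2x sizing rule is meant to avoid.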

ROBIN HOOD HASHING
Variant of linear probe hashing that steals slots from "rich" keys and gives them to "poor" keys.
→ Each key tracks the number of positions it is from its optimal position in the table.
→ On insert, a key takes the slot of another key if the first key is farther away from its optimal position than the second key.

ROBIN HOOD HASHING
[Figure: insertion walkthrough for keys A through F. Each slot stores <key>|<value> plus, in brackets, the # of "jumps" from the key's first position.]
→ A is stored in its first position: A[0].
→ B is stored in its first position: B[0].
→ C hashes to A's slot. A[0] == C[0], so A stays and C moves to the next slot: C[1].
→ D hashes to C's slot. C[1] > D[0], so C stays and D moves to the next slot: D[1].
→ E hashes to A's slot. A[0] == E[0] and C[1] == E[1], so E keeps moving. At D's slot, D[1] < E[2], so E takes that slot and D is pushed onward.
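The walkthrough above can be reproduced with a short sketch. The class name `RobinHoodTable` is hypothetical and Python's built-in `hash()` stands in for the hash function. The early exit in `lookup` is a commonly used consequence of the swapping rule, not something stated on the slides.

```python
class RobinHoodTable:
    def __init__(self, num_slots):
        # Each occupied slot holds (key, value, distance from first position).
        self.slots = [None] * num_slots

    def insert(self, key, value):
        n = len(self.slots)
        i = hash(key) % n
        entry = (key, value, 0)
        for _ in range(n):
            resident = self.slots[i]
            if resident is None:
                self.slots[i] = entry
                return
            if resident[0] == entry[0]:
                # Same key already stored: overwrite the value in place.
                self.slots[i] = (entry[0], entry[1], resident[2])
                return
            if resident[2] < entry[2]:
                # The resident is "richer" (closer to home): swap and carry it.
                self.slots[i], entry = entry, resident
            entry = (entry[0], entry[1], entry[2] + 1)
            i = (i + 1) % n
        raise RuntimeError("table is full")

    def lookup(self, key):
        n = len(self.slots)
        i = hash(key) % n
        for distance in range(n):
            resident = self.slots[i]
            # Assumed optimization: a resident closer to home than we are now
            # means the key would have displaced it, so the key is absent.
            if resident is None or resident[2] < distance:
                return None
            if resident[0] == key:
                return resident[1]
            i = (i + 1) % n
        return None
```

Using integer keys 1, 0, 9, 2, 17 in an 8-slot table mirrors A, B, C, D, E: keys 1, 9, and 17 all hash to slot 1 and key 2 hashes to slot 2, so inserting 17 ends with it taking slot 3 at distance 2 and key 2 moving to slot 4.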

CUCKOO HASHING
Use multiple hash tables with different hash functions.
→ On insert, check every table and pick any one that has a free slot.
→ If no table has a free slot, evict the element from one of them and then re-hash it to find a new location.
Look-ups and deletions are always O(1) because only one location per hash table is checked.

CUCKOO HASHING
[Figure: Hash Table #1 and Hash Table #2, each probed with its own hash function.]
→ Insert A: compute hash1(A) and hash2(A); A|val is stored in a free slot.
→ Insert B: compute hash1(B) and hash2(B); B|val is stored in a free slot.
→ Insert C: compute hash1(C) and hash2(C); C|val takes the slot that B occupied. The recovered builds end here.
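A sketch of the two-table scheme follows. The class name `CuckooTable`, the `max_kicks` limit, and the choice to always start evicting from the first table are assumptions made for illustration; the slides leave the victim choice open. The two hash functions are passed in by the caller so the behaviour is easy to trace.

```python
class CuckooTable:
    def __init__(self, num_slots, hash_fns, max_kicks=32):
        # hash_fns: a pair of functions, one per table, mapping key -> slot.
        self.tables = [[None] * num_slots, [None] * num_slots]
        self.hash_fns = hash_fns
        self.max_kicks = max_kicks  # assumed bound to detect eviction cycles

    def _slot(self, which, key):
        return self.hash_fns[which](key) % len(self.tables[which])

    def lookup(self, key):
        # Only one candidate slot per table is ever checked.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def delete(self, key):
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = None
                return True
        return False

    def insert(self, key, value):
        # If the key is already stored, overwrite it in place.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        # Otherwise pick any table whose candidate slot is free.
        for which in (0, 1):
            i = self._slot(which, key)
            if self.tables[which][i] is None:
                self.tables[which][i] = (key, value)
                return
        # Both slots are taken: evict, then re-hash the victim into the other table.
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which
        # The entry still in hand has no slot; a real table would rebuild
        # with new hash functions or more slots and re-insert everything.
        raise RuntimeError(f"eviction limit reached; unplaced entry: {entry!r}")
```

For example, with 4 slots per table, `hash_fns=(lambda k: k, lambda k: k // 4)`, and integer keys, inserting 1, 5, and 21 forces an eviction: key 21 collides in both tables, takes key 1's slot in the first table, and key 1 is re-hashed into the second table.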
