Databases and keys Integer keys A database stores records with - PowerPoint PPT Presentation

CS206

Databases and keys

A database stores records with various attributes.

The database can be represented as a table, where each row is a record, and each column is an attribute.

    Number     Name     Dept     Alias
    20090612   오재훈   산디과   Pichu
    20100202   강상익   무학     Cleffa
    20100311   손호진   무학     Bulbasaur

Databases often designate one attribute as the key. The key has to be unique: every key appears on only one row. A table with keys is a keyed table.

We want to find records (rows) by key, so the keyed table is a map: key → record.

Integer keys

Let’s make a keyed table of all the students in the class, with the student number as the key.

    class Student():
        def __init__(self, id, name, dept, alias):
            ...  # store the four attributes

In Python, we represent a keyed table as a dictionary:

    db[id] = Student(id, name, dept, alias)

How does it manage to find the value for a key so fast?

First idea: Using a list with 100 slots, we use the last two digits of the student number as the index.

    Number     Name     Dept     Alias
    20100874   정민수   무학     Pikachu
    20080174   방태수   산디과   Mew

But the last two digits are not unique: we have collisions!

Chaining

Chaining: Each slot is actually a linked list of (key, value) pairs stored in this slot. (We need the key!)

    73
    74 → (20100874, 정민수) → (20080174, 방태수)
    75

To search for the key 20080174, we access the table at index 74, and then search through the linked list.

Analysis

We assume the hash function is good: It should distribute the items on the slots uniformly.

Analysis of hash tables assumes that the hash function is random: Each slot is equally likely to be chosen. The choices for two different items are independent.

Consider insertion/deletion/searching an item x. The running time is proportional to the length of the chain for x. This is equal to the number of items y for which h(y) = h(x). For a given y, this happens with probability 1/N. Summed over all y, the expected chain length is n/N. Here n is the number of items, and N is the table size.

Load factor: The load factor λ of a hash table is n/N. The expected running time is O(1 + λ).
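The chaining idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ChainedTable and its methods are made up for this example, Python lists stand in for the linked lists, and the slot index is the key modulo the number of slots (the last two digits when there are 100 slots).

```python
class ChainedTable:
    """Hash table with chaining; each slot holds a chain of (key, value) pairs."""

    def __init__(self, num_slots=100):
        self.num_slots = num_slots
        # One (initially empty) chain per slot. A Python list stands in
        # for the linked list used on the slides.
        self.slots = [[] for _ in range(num_slots)]
        self.size = 0

    def _index(self, key):
        # With 100 slots this is the last two digits of the student number.
        return key % self.num_slots

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: replace its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # we store the key too, to tell items apart
        self.size += 1

    def find(self, key):
        # Cost is proportional to the length of this one chain.
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def load_factor(self):
        return self.size / self.num_slots


table = ChainedTable()
table.insert(20100874, "Pikachu")
table.insert(20080174, "Mew")
print(table.find(20080174))     # searches the chain stored at index 74
print(table.load_factor())
```

Both student numbers end in 74, so they share one chain, and the lookup has to compare keys inside that chain.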

Open addressing

We could make the data structure much more compact if we could avoid the linked lists and store all data in the table.

Open addressing: items may be stored at a slot different from the one given by their hash code.

Closed addressing: items must be stored at the slot given by their hash code: chaining.

Easiest form of open addressing: Linear probing.
Start at the slot given by the hash code.
If it is already in use, try the next, and continue until a free slot is found.

Linear probing

[Figure: a table with slots 0 to 9, shown after each insertion.]

    insert: 89 18          slot 8: 18, slot 9: 89
    insert: 89 18 49       slot 0: 49, slot 8: 18, slot 9: 89
    insert: 89 18 49 58    slot 0: 49, slot 1: 58, slot 8: 18, slot 9: 89
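The insertion rule can be sketched as follows. The function name insert_linear is hypothetical, and the sketch assumes (as the example figure suggests, but the slides do not state) that the hash code of an integer key x in a table with 10 slots is x mod 10, wrapping around from slot 9 to slot 0.

```python
def insert_linear(table, key):
    """Insert key with linear probing and return the slot that was used."""
    n = len(table)
    start = key % n                   # assumed hash code: key mod table size
    for step in range(n):
        i = (start + step) % n        # try the next slot, wrapping around
        if table[i] is None:          # free slot found
            table[i] = key
            return i
    raise RuntimeError("hash table is full")


table = [None] * 10
positions = [insert_linear(table, k) for k in (89, 18, 49, 58)]
print(positions)
print(table)
```

Under that assumption, 49 collides with 89 in slot 9 and wraps to slot 0, while 58 probes slots 8, 9 and 0 before it lands in slot 1.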

Linear probing

    insert: 89 18 49 58 9               slot 0: 49, slot 1: 58, slot 2: 9, slot 8: 18, slot 9: 89
    insert: 89 18 49 58 9, delete 58    slot 0: 49, slot 1: available, slot 2: 9, slot 8: 18, slot 9: 89

Find operation: Need to search sequentially until key found or empty slot found.

Delete operation: Slot is marked as available (can be reused at insertion, but is not the same as an empty slot).

Analysis of linear probing

How far do we have to search to insert a new item?

Simplified analysis: Let’s assume all slots are filled with equal and independent probability. So each slot is filled with probability λ = n/N.

The expected number of probes (slots considered) until we find a free slot is 1/(1 − λ).

The load factor λ ranges from 0 (empty hash table) to 1 (completely full hash table). When it approaches 1, the hash table becomes very inefficient, and needs to be enlarged.

Real behavior of linear probing

Unfortunately, the probabilities are not independent:

Experiment 1: Fill each slot with probability λ = 0.7.
[Figure: occupancy pattern of the slots.] Average number of probes: 2.4

Experiment 2: Insert λ ∗ 100 = 70 items with linear probing.
[Figure: occupancy pattern of the slots.] Average number of probes: 4.4

Same with λ = 0.9: 6.9 versus 24.0
[Figure: occupancy patterns for both experiments.]

N = 10000, and repeating 1000 times (average number of probes):

    λ       Experiment 1   Experiment 2
    0.5          2.0            2.5
    0.7          3.3            6.0
    0.9         10.0           49.5
    0.95        20.0          182.1
    0.99       100.0         1750.5

Linear probing causes clustering in the hash table.
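The find and delete rules for linear probing can be sketched like this. The names ProbingTable, EMPTY and DELETED are illustrative only, the hash code is again assumed to be the key modulo the table size, and keys are assumed to be distinct integers (the sketch does not check for duplicates on insertion).

```python
EMPTY = None
DELETED = object()   # marker for an "available" slot left behind by delete


class ProbingTable:
    def __init__(self, num_slots=10):
        self.slots = [EMPTY] * num_slots

    def _probe(self, key):
        # Yield the slots in probing order, starting at the hash code.
        n = len(self.slots)
        start = key % n
        for step in range(n):
            yield (start + step) % n

    def insert(self, key):
        for i in self._probe(key):
            # Both empty and available (deleted) slots can be reused.
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key
                return i
        raise RuntimeError("hash table is full")

    def find(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:       # a truly empty slot ends the search
                return None
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return i
        return None

    def delete(self, key):
        i = self.find(key)
        if i is not None:
            self.slots[i] = DELETED          # mark as available, not as empty
        return i


table = ProbingTable()
for key in (89, 18, 49, 58, 9):
    table.insert(key)
table.delete(58)
print(table.find(9))    # still found: the search walks past the deleted slot
```

If delete simply reset slot 1 to empty, the search for 9 would stop there and wrongly report that 9 is missing. That is why an available slot is not the same as an empty slot.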

Real analysis of linear probing

Assuming that the hash function behaves randomly, the expected number of probes for an insertion (or unsuccessful search) is (for N → ∞):

    \frac{1}{2} \left( 1 + \frac{1}{(1 - \lambda)^2} \right)

Linear probing works very well when the hash function is good and the load factor λ is small, say λ ≤ 0.5.

Linear probing is more sensitive to bad hash functions than chaining.

Load factor includes items that have been deleted! When there are too many deleted items, we need to rehash the table.

Hash functions

What do we do if the key is not an integer?

We use two functions:

    Hash code              h1 : keys → integers
    Compression function   h2 : integers → [0, N − 1]

Index in hash table is computed as h2(h1(key)).

Ideally, the hash function should map keys uniformly at random to an index into the hash table.

Resizing hash tables: We change the compression function only, and then need to rehash all elements.

Compression functions

Hash codes and compression functions are a bit of a black art. It is easy to mess up.

An obvious compression function is h2(x) = x mod N.

It only works well if N is a prime number.

A better compression function is

    h(x) = ((ax + b) mod p) mod N,

where a, b, and p are positive integers, p is a large prime, and p ≫ N. N does not need to be prime.

Why is the function above good? Because it works in practice...

Hash Codes

A good hash code for strings:

    def hash_code(s):
        h = 0
        for ch in s:
            h = (127 * h + ord(ch)) % 16908799
        return h

Mix up the bits. Each character has a different effect.

Bad hash codes:
• Sum up the codes of the letters (too small, and anagrams collide).
• Take the first three letters (“pre” is common, “xzq” never occurs).
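The two-step scheme h2(h1(key)) can be sketched as below. The hash_code function is the one from the slides. Everything else is an assumption made for illustration: the names letter_sum, compress and index, the constants A and B (arbitrary positive integers), and the choice of the prime P = 2^31 − 1.

```python
def hash_code(s):
    """Hash code from the slides: every character influences the result."""
    h = 0
    for ch in s:
        h = (127 * h + ord(ch)) % 16908799
    return h


def letter_sum(s):
    """A bad hash code, for comparison: anagrams always collide."""
    return sum(ord(ch) for ch in s)


# Illustrative constants only: P is a large prime, A and B are arbitrary.
P = 2**31 - 1
A = 31415
B = 27183


def compress(x, n, a=A, b=B, p=P):
    """Compression function ((a*x + b) mod p) mod n, mapping x into [0, n-1]."""
    return ((a * x + b) % p) % n


def index(key, n):
    """Slot of a string key in a table with n slots: h2(h1(key))."""
    return compress(hash_code(key), n)


print(hash_code("ab"), hash_code("ba"))    # the order of the letters matters
print(letter_sum("ab"), letter_sum("ba"))  # equal sums: a collision
print(index("Pikachu", 100))
```

When the table is resized, only the argument n passed to compress changes. The hash codes stay the same, but every item still has to be reinserted because its slot index may differ.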

Hash codes and equality

Python set and dict compute a hash code by calling the builtin function hash. This uses the method __hash__ of the object.

set and dict only work correctly if the following “contract” is observed:

    If obj1 == obj2 then hash(obj1) == hash(obj2).

If you define __eq__ for a class, you also need to define __hash__ (at least if you want to use it as a key...)

Mutable keys are dangerous! If you change a key in the hash table, you cannot find it anymore.

Python documentation says: An object is hashable if it has a hash value which never changes during its lifetime ...
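A minimal sketch of a class that observes this contract is shown below. It reuses the Student attributes from the earlier slides. Treating two students as equal exactly when their id matches is a design choice made for this example, not something the slides prescribe.

```python
class Student:
    def __init__(self, id, name, dept, alias):
        self.id = id
        self.name = name
        self.dept = dept
        self.alias = alias

    def __eq__(self, other):
        # Assumption for this sketch: the student number alone decides equality.
        if not isinstance(other, Student):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        # Hash exactly the data that __eq__ compares, so equal objects
        # always get equal hash codes.
        return hash(self.id)


a = Student(20100874, "정민수", "무학", "Pikachu")
b = Student(20100874, "정민수", "무학", "Pikachu")

seen = {a}
print(a == b, hash(a) == hash(b))
print(b in seen)    # b is found although it is a different object
```

In Python 3, a class that defines __eq__ without __hash__ becomes unhashable, so its instances cannot be put into a set or used as dict keys at all. The sketch also relies on id never being changed after the object has been stored, in line with the warning about mutable keys.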


