Acknowledgement HashTable The set of slides have used materials - PowerPoint PPT Presentation

HashTable
CISC4080, Computer Algorithms
CIS, Fordham Univ.
Instructor: X. Zhang
Spring 2018

Acknowledgement
• This set of slides uses materials from the following resources:
  • Slides for textbook by Dr. Y. Chen from Shanghai Jiaotong Univ.
  • Slides from Dr. M. Nicolescu from UNR
  • Slide sets by Dr. K. Wayne from Princeton
    • which in turn have borrowed materials from other resources
  • Other online resources

Support for Dictionary
• Dictionary ADT: a dynamic set of elements supporting INSERT, DELETE, SEARCH operations
  • elements have distinct key fields
  • DELETE, SEARCH by key
• Different ways to implement Dictionary
  • unsorted array
    • insert O(1), delete O(n), search O(n)
  • sorted array
    • insert O(n), delete O(n), search O(log n)
  • binary search tree
    • insert O(log n), delete O(log n), search O(log n) (when the tree is balanced)
  • linked list …
• Can we have “almost” constant time insert/delete/search?

Towards constant time
• Direct address table: use key as index into the array
  • T[i] stores the element whose key is i
  • Insert(element(2, Alice)): T[2] = element(2, Alice);
  • Delete(element(4)): T[4] = NULL;
  • Search(element(5)): return T[5];
• [Figure: table T indexed by key, holding entries such as (2, Alice), (4, Bob), (5, Ed), with NULL in unused slots]
• U: the set of all possible key values
• K: actual set of keys in your data
• How big is the table?
  • big enough to have one slot for every possible key
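The direct address table described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the `Element` type, the class name `DirectAddressTable`, and the use of `std::unique_ptr` to stand in for the NULL slots are hypothetical choices, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical element type: an integer key plus some satellite data.
struct Element {
    std::size_t key;
    std::string name;
};

// Direct address table: slot i holds the element whose key is i, or nothing.
class DirectAddressTable {
public:
    // universe_size is |U|: one slot for every possible key 0 .. |U|-1.
    explicit DirectAddressTable(std::size_t universe_size)
        : slots_(universe_size) {}

    // T[key] = element; O(1).
    void insert(const Element& e) {
        assert(e.key < slots_.size());
        slots_[e.key] = std::make_unique<Element>(e);
    }

    // T[key] = NULL; O(1).
    void remove(std::size_t key) {
        assert(key < slots_.size());
        slots_[key].reset();
    }

    // return T[key]; O(1). A null pointer means "no such element".
    const Element* search(std::size_t key) const {
        assert(key < slots_.size());
        return slots_[key].get();
    }

private:
    std::vector<std::unique_ptr<Element>> slots_;
};
```

All three operations touch exactly one slot, which is where the constant running time comes from; the cost is that the table needs |U| slots no matter how few keys are stored.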

Case studies
• A web server: maintains all active clients’ info, using IP addr. as key
• Universe of keys: the set of all possible IPv4 addr., |U| = 2^32
  • much bigger than total # of active clients
• Too big to use direct access table:
  • a table with 2^32 entries, if each entry is 32 bytes, then 128 GB is needed!
• How to have constant accessing time, while not requiring huge memory usage?

Hash Table
• Hash Table: use a (hash) function to map key to index of the table (array)
  • Element x is stored in T[h(x.key)]
  • hash function: int hash (Key k) // return value 0…m-1
• U: the set of all possible key values
• K: actual set of keys in your data
• Collision: when two different keys are mapped to the same index
• Can collision be avoided? Is it possible to design a hash function that is one-to-one?
  • Hint: domain and codomain of hash()?

HashTable Operations
• If there is no collision:
  • Insert
    • Table[h(“John”)] = Element(“John”, 25000)
  • Delete
    • Table[h(“John”)] = NULL
  • Search
    • return Table[h(“John”)]
• All constant time O(1)

Hashing: unavoidable collision
• a large universe set U
• A set K of actually occurring keys, |K| << |U| (much, much smaller)
• Table T of size m, m = Θ(|K|), so that we don’t waste memory space
• A hash function h: U → {0, 1, …, m-1}
• Given |U| > m, the hash function is many-to-one
  • by the pigeonhole principle
• Collisions cannot be avoided, but their chances can be reduced using a “good” hash function
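The pigeonhole argument above can be checked on a small example. The sketch below is illustrative only: `hash_mod` and `find_collision` are hypothetical names, and the division-style hash is just one possible choice of h. Whenever more than m distinct keys are supplied, some pair must share a slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Division-style hash used only for illustration: maps a key to 0 .. m-1.
std::size_t hash_mod(std::uint32_t key, std::size_t m) {
    assert(m > 0);
    return key % m;
}

// Returns the positions (in `keys`) of the first two keys found to share a slot,
// or {keys.size(), keys.size()} if no two keys collide.
std::pair<std::size_t, std::size_t>
find_collision(const std::vector<std::uint32_t>& keys, std::size_t m) {
    const std::size_t none = keys.size();
    std::vector<std::size_t> owner(m, none);  // owner[s] = index of the key already in slot s
    for (std::size_t i = 0; i < keys.size(); ++i) {
        std::size_t s = hash_mod(keys[i], m);
        if (owner[s] != none) return {owner[s], i};  // slot already taken: collision
        owner[s] = i;
    }
    return {none, none};
}
```

For instance, with m = 5 and the six keys 10, 21, 32, 43, 54, 15, the first five keys fill slots 0 to 4 and the sixth key (15) must land on an occupied slot, here slot 0, which already holds key 10.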

Hash Function
• A hash function h: U → {0, 1, …, m-1}. Given an element x, x is stored in T[h(x.key)]
• Good hash function:
  • fast to compute
  • Ideally, map any key equally likely to any of the slots, independent of other keys
• Hash Function:
  • first stage: map non-integer key to integer
  • second stage: map integer to [0…m-1]

First stage: any type to integer
• Any basic type is represented in binary
• Composite type which is made up of basic types
  • a character string (each char is coded as an int by ASCII code), e.g., “pt”
    • add all chars up: ‘p’+‘t’ = 112+116 = 228
    • radix notation: ‘p’*128+‘t’ = 14452
      • treat “pt” as a base 128 number…
  • a point type: (x,y), an ordered pair of int
    • x+y
    • ax+by // pick some non-zero constants a, b
    • …
  • IP address: four integers in range of 0…255
    • add them up
    • radix notation: 150*256^3 + 108*256^2 + 68*256 + 26

Hash Function: second stage
• Division method: divide integer by m (size of hash table) and take remainder
  • h(key) = key mod m
  • if key values are uniformly distributed over all integer values, the above hash function is uniform
• But often data are not randomly distributed
  • What if m = 100 and all keys have the same last two digits?
  • Similarly, if m = 2^p, then the result is simply the lowest-order p bits
• Rule of thumb: choose m to be a prime not too close to exact powers of 2

Hash Function: second stage
• Multiplication method: pick a constant A in the range of (0,1)
  • h(key) = floor(m * (key*A mod 1))
  • take the fractional part of key*A, and multiply it by m
  • e.g., m = 10000, h(123456) = 41
• Advantage: m could be an exact power of 2…
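The two stages above can be sketched in C++ as follows. The function names are hypothetical, and the default constant A = (sqrt(5) - 1)/2 ≈ 0.618 is an assumption: the slides do not state which A they use, although this value does reproduce the example h(123456) = 41 for m = 10000.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>

// First stage: read a string as a base-128 number ("radix notation").
// Only safe for short ASCII strings: 9 characters already fill 63 bits.
std::uint64_t radix128(const std::string& s) {
    assert(s.size() <= 9);
    std::uint64_t value = 0;
    for (unsigned char c : s) value = value * 128 + c;
    return value;
}

// First stage: pack the four parts of an IPv4 address as a base-256 number.
std::uint32_t ipv4_to_int(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) | (std::uint32_t(c) << 8) | d;
}

// Second stage, division method: h(k) = k mod m.
std::size_t hash_division(std::uint64_t k, std::size_t m) {
    assert(m > 0);
    return static_cast<std::size_t>(k % m);
}

// Second stage, multiplication method: h(k) = floor(m * frac(k * A)), 0 < A < 1.
// The default A is an assumption, not taken from the slides. Doubles lose
// precision for very large k, so this version is meant for moderate key values.
std::size_t hash_multiplication(std::uint64_t k, std::size_t m,
                                double A = (std::sqrt(5.0) - 1.0) / 2.0) {
    assert(m > 0 && A > 0.0 && A < 1.0);
    double product = static_cast<double>(k) * A;
    double fraction = product - std::floor(product);  // fractional part of k*A
    return static_cast<std::size_t>(std::floor(m * fraction));
}
```

With these definitions, `radix128("pt")` gives 14452 as on the slide, and `hash_division(123456, 100)` gives 56, the last two digits of the key, which illustrates why m = 100 is a poor choice when keys share their trailing digits.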

Multiplication Method
• [Figure: illustration of the multiplication method]

Exercise
• Write a hash function that maps string type to a hash table of size 250
  • First stage: using radix notation
    • “Hello!” => ‘H’*128^5 + ‘e’*128^4 + … + ‘!’
  • Second stage:
    • x mod 250
• How do you implement it efficiently?
  • Recall the modular arithmetic theorems:
    • (x+y) mod n = ((x mod n)+(y mod n)) mod n
    • (x * y) mod n = ((x mod n)*(y mod n)) mod n
    • (x^e) mod n = (x mod n)^e mod n

Exercise
• Write a hash function that maps a point type as below to a hash table of size 100
  class point{
    int x, y;
  }

Collision Resolution
• Recall that h(.) is not one-to-one, so it maps multiple keys to the same slot:
  • for distinct k1, k2, h(k1) = h(k2) => collision
• Two different ways to resolve collision
  • Chaining: store colliding keys in a linked list (bucket) at the hash table slot
    • dynamic memory allocation, storing pointers (overhead)
  • Open addressing: if slot is taken, try another, and another (a probing sequence)
    • clustering problem
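One possible answer to the two exercises above is sketched below; it is not presented as the intended solution. For the string hash, the modular arithmetic rules allow reducing mod m after every character (Horner's rule), so the running value stays small however long the string is. The names `hash_string`, `hash_point`, `Point`, and the constants a = 31, b = 17 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Radix-128 hash of a string, reduced mod m at every step (Horner's rule),
// so the intermediate value never exceeds (m - 1) * 128 + 255.
std::size_t hash_string(const std::string& s, std::size_t m) {
    assert(m > 0);
    std::size_t h = 0;
    for (unsigned char c : s) h = (h * 128 + c) % m;
    return h;
}

struct Point {
    int x;
    int y;
};

// (a*x + b*y) mod m; a = 31 and b = 17 are arbitrary illustrative constants.
std::size_t hash_point(const Point& p, std::size_t m) {
    assert(m > 0);
    const long long a = 31, b = 17;
    long long v = (a * p.x + b * p.y) % static_cast<long long>(m);
    if (v < 0) v += static_cast<long long>(m);  // C++ % can yield a negative remainder
    return static_cast<std::size_t>(v);
}
```

For example, `hash_string("pt", 250)` returns 202, which equals 14452 mod 250, and `hash_string("Hello!", 250)` returns 181 without ever forming the full base-128 value.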

Chaining
• Chaining: store colliding elements in a linked list at the same hash table slot
  • if all keys are hashed to the same slot, the hash table degenerates to a linked list
• [Figure: hash table slots pointing to chains of elements] Here a doubly-linked list is used
• C++: NodePtr T[m];
• STL: vector<list<HashedObject>> T;

Chaining: operations
• Insert (T,x):
  • insert x at the head of T[h(x.key)]
  • Running time (worst and best case): O(1)
• Search (T,k)
  • search for an element with key k in list T[h(k)]
• Delete (T,x)
  • Delete x from the list T[h(x.key)]
• Running time of search and delete: proportional to the length of the list stored in T[h(x.key)]

Chaining: analysis
• Consider a hash table T with m slots that stores n elements
  • load factor α = n/m
• If any given element is equally likely to hash into any of the m slots, independently of where any other element is hashed to, then the average length of lists is α = n/m
  • search and delete take Θ(1 + α) on average
• If all keys are hashed to the same slot, the hash table degenerates to a linked list
  • search and delete take Θ(n)

Collision Resolution
• Open addressing: store colliding elements elsewhere in the table
  • Advantage: no need for dynamic allocation, no need to store pointers
• When inserting:
  • examine (probe) a sequence of positions in the hash table until an empty slot is found
  • e.g., linear probing: if T[h(x.key)] is taken, try slots h(x.key)+1, h(x.key)+2, … (mod m)
• When searching/deleting:
  • examine (probe) a sequence of positions in the hash table until the element is found
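Returning to chaining, the operations and the load factor can be sketched with the STL layout mentioned on the slide (a vector of lists). This is a minimal sketch under stated assumptions: the `Record` type, the class name `ChainedHashTable`, and the division-method hash are hypothetical choices, and insert assumes the key is not already present.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical element type with an integer key.
struct Record {
    int key;
    std::string value;
};

class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t m) : table_(m) { assert(m > 0); }

    // O(1): push at the head of the chain; assumes the key is not already present.
    void insert(const Record& r) {
        table_[slot(r.key)].push_front(r);
        ++count_;
    }

    // Time proportional to the length of the chain at T[h(key)].
    const Record* search(int key) const {
        for (const Record& r : table_[slot(key)])
            if (r.key == key) return &r;
        return nullptr;
    }

    // Returns true if an element with this key was found and removed.
    bool remove(int key) {
        std::list<Record>& chain = table_[slot(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->key == key) {
                chain.erase(it);
                --count_;
                return true;
            }
        }
        return false;
    }

    // Load factor alpha = n / m.
    double load_factor() const {
        return static_cast<double>(count_) / table_.size();
    }

private:
    // Division-method hash, adjusted so negative keys still map to 0 .. m-1.
    std::size_t slot(int key) const {
        long long m = static_cast<long long>(table_.size());
        long long r = key % m;
        return static_cast<std::size_t>(r < 0 ? r + m : r);
    }

    std::vector<std::list<Record>> table_;
    std::size_t count_ = 0;
};
```

With m = 4, keys 1 and 5 both hash to slot 1 and end up in the same chain; searching for either walks that chain, which is why the running time depends on the chain length and, on average, on the load factor.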

Open Addressing
• Hash function: extended to a probe sequence (m functions):
  • h_i(x), i = 0, 1, …, m-1
• Insert element with key x: if h_0(x) is taken, try h_1(x), and then h_2(x), until an empty/deleted slot is found
• Search for key x: if the element at h_0(x) is not a match, try h_1(x), and then h_2(x), … until a matching element is found or an empty slot is reached
• Delete key x: mark its slot as DELETED

Linear Probing
• Probing sequence
  • h_i(x) = (h(x)+i) mod m
  • probe sequence: h(x), h(x)+1, h(x)+2, …
  • Continue until an empty slot is found
• Problem: primary clustering
  • if there are multiple keys mapped to a slot, the slots after it tend to be occupied
  • Reason: all keys use the same probing: +1, +2, …

Quadratic Probing
• Probe sequence: h_i(x) = (h(x) + c_1*i + c_2*i^2) mod m
  • h_0(x) = h(x) mod m
  • h_1(x) = (h(x) + c_1 + c_2) mod m
  • h_2(x) = (h(x) + 2c_1 + 4c_2) mod m
  • …
• Problem:
  • secondary clustering
  • choose c_1, c_2, m carefully so that all slots are probed

Double Hashing
• Use two functions f_1, f_2
• Probe sequence:
  • h_0(x) = f_1(x) mod m
  • h_1(x) = (f_1(x) + f_2(x)) mod m
  • h_2(x) = (f_1(x) + 2f_2(x)) mod m, …
• f_2(x) and m must be relatively prime for the entire hash table to be searched/used
  • Two integers a, b are relatively prime if their greatest common divisor is 1
  • e.g., m = 2^k, f_2(x) odd
  • or, m prime, 0 < f_2(x) < m
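The open addressing operations above, with linear probing and DELETED markers, can be sketched as follows. The class name `LinearProbingTable`, the restriction to non-negative int keys, and the helper `double_hash_probe` are hypothetical; insert assumes the key is not already in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open-addressing table of non-negative int keys using linear probing:
// h_i(x) = (h(x) + i) mod m, with h(x) = x mod m.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t m) : slots_(m), state_(m, EMPTY) { assert(m > 0); }

    // Returns false if the table is full. Assumes key is not already present.
    bool insert(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t s = probe(key, i);
            if (state_[s] != OCCUPIED) {  // EMPTY or DELETED slots can be reused
                slots_[s] = key;
                state_[s] = OCCUPIED;
                return true;
            }
        }
        return false;
    }

    bool contains(int key) const { return find(key) != slots_.size(); }

    // Marks the slot DELETED instead of EMPTY so later searches keep probing past it.
    bool remove(int key) {
        std::size_t s = find(key);
        if (s == slots_.size()) return false;
        state_[s] = DELETED;
        return true;
    }

private:
    enum State { EMPTY, OCCUPIED, DELETED };

    std::size_t probe(int key, std::size_t i) const {
        assert(key >= 0);
        return (static_cast<std::size_t>(key) + i) % slots_.size();
    }

    // Returns the slot holding key, or slots_.size() if absent.
    std::size_t find(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t s = probe(key, i);
            if (state_[s] == EMPTY) break;  // an EMPTY slot ends the search
            if (state_[s] == OCCUPIED && slots_[s] == key) return s;
        }
        return slots_.size();
    }

    std::vector<int> slots_;
    std::vector<State> state_;
};

// Double hashing probe: h_i(x) = (f1 + i * f2) mod m, given f1 = f_1(x), f2 = f_2(x).
std::size_t double_hash_probe(std::size_t f1, std::size_t f2, std::size_t i, std::size_t m) {
    assert(m > 0);
    return (f1 % m + (i % m) * (f2 % m)) % m;
}
```

The DELETED marker matters: if a removed slot were reset to EMPTY, a later search for a key that had probed past that slot would stop early and miss it. For double hashing, with m = 8 an odd step such as f2 = 3 visits all 8 slots, while an even step such as f2 = 2 visits only 4 of them, which is the relatively-prime condition in action.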
