Fill out the Brown Computer Science Survey you got in your email!
Only takes 5 min!
If you didn’t receive the survey, email litofish@cs.brown.edu
All multiple choice!
percentageproject.com
Sets, Dictionaries & Hash Tables
CS16: Introduction to Data Structures & Algorithms Spring 2020
Q: how would you build a (basic) search engine?
What’s so Hard about Search Engines?
Search Through Each Page?
Outline
Dictionary
Dictionary ADT
FALSE otherwise
Q: how can we implement a dictionary?
Array-based Dictionary
Is O(n) good enough? What if
key/value pairs?
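To make the O(n) cost concrete, here is a minimal sketch of an array-based dictionary. The class name `ArrayDictionary` and its method names are assumptions chosen for illustration, not names from the slides. Every lookup scans the pairs one by one, which is where the linear running time comes from.

```python
class ArrayDictionary:
    """Dictionary backed by an unsorted list of (key, value) pairs (illustrative)."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        # Scan for an existing key first so that keys stay unique: O(n).
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def get(self, key):
        # Linear scan over all pairs: O(n) in the worst case.
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)


d = ArrayDictionary()
d.put("00472885", "David Laidlaw")
d.put("00943855", "Kaila Jeter")
print(d.get("00943855"))
```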
Q: can we do better?
Yes! with a Hash Table
& h(x)
Dictionary vs. Hash Table
Yes! with a Hash Table
Building a Dictionary w/ a Hash Table
Hash Table — Add
keys: banner IDs
values: names
00472885 David Laidlaw
00943855 Kaila Jeter
00238494 Alejandro Molina
00745911 Chantal Toupin
Building a Dictionary w/ a Hash Table
Overcoming Collisions
& h(x)
A buckets:
FYI: there are many other ways to overcome collisions,
e.g., linear probing, quadratic probing, cuckoo hashing, …
Hash Table
table: array
h: hash function

function add(k, v):
    index = h(k)
    table[index].append(k, v)

function get(k):
    index = h(k)
    for (key, val) in table[index]:
        if key = k:
            return val
    error(“key not found”)

add: O(1) if computing hash function is O(1)
get: runtime depends on bucket size
Hash Table
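The add/get pseudocode can be turned into runnable Python roughly as follows. This is a sketch under assumptions: the class name `ChainedHashTable`, the default of 7 buckets, and the fallback to Python's built-in `hash` are illustrative choices, not part of the slides. As in the pseudocode, `add` simply appends and does not check for duplicate keys.

```python
class ChainedHashTable:
    """Hash table whose buckets are lists of (key, value) pairs (illustrative)."""

    def __init__(self, num_buckets=7, hash_fn=None):
        # table: array of buckets, each bucket starts empty.
        self._table = [[] for _ in range(num_buckets)]
        # h: hash function; defaults to the built-in hash reduced mod the table size.
        self._h = hash_fn or (lambda k: hash(k) % num_buckets)

    def add(self, k, v):
        # O(1) if computing the hash function is O(1); no duplicate check.
        self._table[self._h(k)].append((k, v))

    def get(self, k):
        # Running time depends on the size of the bucket being scanned.
        for key, val in self._table[self._h(k)]:
            if key == k:
                return val
        raise KeyError("key not found")


t = ChainedHashTable(num_buckets=7, hash_fn=lambda k: k % 7)
t.add(472885, "David Laidlaw")
t.add(231924, "Lauren Ho")
print(t.get(231924))
```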
Hash Table — Add
keys: banner IDs
values: names
h(key) = key % 7
Array of buckets w/ key/value pairs
00472885 David Laidlaw
00943855 Kaila Jeter
00238494 Alejandro Molina
00745911 Chantal Toupin
00543163 Surbhi Madan
00231924 Lauren Ho
Hash Table — Get
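As a worked example, the short script below places the six banner IDs from the slide into seven buckets using h(key) = key % 7. The variable names are assumptions for illustration. Running it shows that two buckets end up holding two pairs each, which is exactly the collision case that chaining handles.

```python
# Banner IDs and names taken from the slide.
banner_ids = {
    "00472885": "David Laidlaw",
    "00943855": "Kaila Jeter",
    "00238494": "Alejandro Molina",
    "00745911": "Chantal Toupin",
    "00543163": "Surbhi Madan",
    "00231924": "Lauren Ho",
}


def h(key):
    # Interpret the banner ID as an integer, then reduce mod 7.
    return int(key) % 7


buckets = [[] for _ in range(7)]
for key, name in banner_ids.items():
    buckets[h(key)].append((key, name))

for i, bucket in enumerate(buckets):
    print(i, bucket)
```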
keys: banner IDs
values: names
h(key) = key % 7
Array of buckets w/ key/value pairs
00472885 David Laidlaw
00943855 Kaila Jeter
00238494 Alejandro Molina
00745911 Chantal Toupin
00543163 Surbhi Madan
00231924 Lauren Ho
Key to look up: 00543163
What is the worst-case run time of Get?
Hash Table with Chaining
Q: can we do better than O(n)?
Beating O(n) — Idea #1
Beating O(n) — Idea #2
Banner ID Hashing
Activity #1
Form groups of 10
Beating O(n) — Idea #2
Yes
Since keys are not necessarily random, we make the hash function random
Universal Hash Functions
If we pick a universal hash function at random and use it on a set of keys, then it is very likely that the function will “spread” the keys (roughly) evenly
Example of Universal Hash Functions
a3, a4 at random between 0 and p-1
a3=105, a4=83
k3=89, k4=18
$h(k) = \sum_{i=1}^{4} a_i \cdot k_i \bmod p$
h(00238918) = 50
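The formula can be sketched in Python as below. Several details are assumptions made for illustration: the helper names `split_key` and `make_hash`, splitting an 8-digit banner ID into four two-digit chunks (consistent with k3=89, k4=18 for 00238918), and the prime p = 109. The slide's own p, a1, and a2 are not recoverable here, so this sketch does not try to reproduce the value 50.

```python
import random


def split_key(key, num_chunks=4, chunk_digits=2):
    # Pad to 8 digits, then cut into two-digit chunks k1..k4 (assumed layout).
    digits = str(key).zfill(num_chunks * chunk_digits)
    return [int(digits[i:i + chunk_digits])
            for i in range(0, len(digits), chunk_digits)]


def make_hash(p, num_chunks=4, rng=random):
    # Choose each coefficient a_i at random between 0 and p-1.
    a = [rng.randrange(p) for _ in range(num_chunks)]

    def h(key):
        k = split_key(key, num_chunks)
        # h(k) = (a1*k1 + a2*k2 + a3*k3 + a4*k4) mod p
        return sum(ai * ki for ai, ki in zip(a, k)) % p

    return h, a


h, a = make_hash(p=109)  # p = 109 is an assumed prime, not the slide's value
print(a, h("00238918"))
```

Each call to `make_hash` draws fresh coefficients, so the same key generally lands in a different bucket under each randomly chosen function.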
Hash Table with UHFs
Hash Table with UHFs
Why does Universal Hashing Work?
Proof of Universal Hashing
Inverses
Modular Arithmetic
Analysis
Analysis
Step 1: Equivalent Formulation
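$h(x_1, x_2, x_3, x_4) = h(y_1, y_2, y_3, y_4)$
$a_1 x_1 + \cdots + a_4 x_4 \equiv a_1 y_1 + \cdots + a_4 y_4 \pmod{p}$ (by definition)
$a_4 x_4 - a_4 y_4 \equiv (a_1 y_1 + a_2 y_2 + a_3 y_3) - (a_1 x_1 + a_2 x_2 + a_3 x_3) \pmod{p}$ (move things)
$a_4 \cdot (x_4 - y_4) \equiv c \pmod{p}$ (the right-hand side is just some number; let’s call it $c$)
$a_4 \equiv c \cdot (x_4 - y_4)^{-1} \pmod{p}$ ($x_4$ and $y_4$ are different)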
Step 2: Probability of Equiv. Formulation
$a_4 \equiv c \cdot (x_4 - y_4)^{-1} \pmod{p}$
$x_4 - y_4 \neq 0$
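The equivalent formulation says that, once a1, a2, a3 are fixed, exactly one value of a4 makes the two keys collide, so a randomly chosen a4 causes a collision with probability 1/p. The brute-force check below illustrates this for one hypothetical choice of numbers: p = 101, the fixed coefficients, and the two keys are all assumptions picked for the demonstration. It relies on `pow(x, -1, p)` for the modular inverse, which needs Python 3.8 or later.

```python
def h(a, k, p):
    # h(k) = (a1*k1 + ... + a4*k4) mod p
    return sum(ai * ki for ai, ki in zip(a, k)) % p


p = 101                 # assumed prime, larger than any two-digit chunk
a_fixed = [3, 58, 77]   # hypothetical fixed a1, a2, a3
x = [0, 23, 89, 18]     # chunks of one key
y = [0, 24, 89, 42]     # chunks of a different key, with x4 != y4

# Try every possible a4 and keep the ones that make x and y collide.
colliding = [a4 for a4 in range(p)
             if h(a_fixed + [a4], x, p) == h(a_fixed + [a4], y, p)]

# c = (a1*y1 + a2*y2 + a3*y3) - (a1*x1 + a2*x2 + a3*x3) mod p
c = (sum(ai * yi for ai, yi in zip(a_fixed, y))
     - sum(ai * xi for ai, xi in zip(a_fixed, x))) % p
# a4 = c * (x4 - y4)^(-1) mod p
predicted = (c * pow((x[3] - y[3]) % p, -1, p)) % p

print(colliding, predicted)
```

The list `colliding` contains a single value, and it matches the one predicted by the formula.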
Putting it all Together
End of Universal Hashing Proof
Summary
Q: what can we build from dictionaries?
A (Basic) Search Engine
A (Basic) Search Engine
Build Index
function build_index(page_list):
    index = dict()
    counts = dict()
    for page in page_list:
        for word in page:
            try:
                count = counts.get(word)
            except KeyError:
                counts.put(word, 0)
                count = counts.get(word)
            counts.put(word, count + 1)
            key = word + str(counts.get(word))
            index.put(key, page.url)
    return index
Search Index
def search_index(index, word):
A (Basic) Search Engine
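A runnable sketch of the index, using Python's built-in `dict` in place of the dictionary ADT. The `Page` class and the body of `search_index` are assumptions: the slides give only the signature of `search_index`, so the lookup loop here is one plausible way to read back the keys that `build_index` writes (word + "1", word + "2", and so on).

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page type: a URL plus the words found on the page.
    url: str
    words: list


def build_index(page_list):
    index = {}
    counts = {}
    for page in page_list:
        for word in page.words:
            # The n-th occurrence of a word is stored under the key word + str(n).
            counts[word] = counts.get(word, 0) + 1
            index[word + str(counts[word])] = page.url
    return index


def search_index(index, word):
    # Assumed lookup: walk word1, word2, ... until a key is missing.
    urls = []
    i = 1
    while word + str(i) in index:
        urls.append(index[word + str(i)])
        i += 1
    return urls


pages = [
    Page("a.example", ["hash", "table"]),
    Page("b.example", ["hash", "set"]),
]
index = build_index(pages)
print(search_index(index, "hash"))
```

With this key scheme, a word that occurs twice on the same page contributes that page's URL twice.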
Sets
Set ADT
FALSE otherwise
arbitrary order)
Set Data Structure
Sets from Hash Tables
Expected O(1) Expected O(1)
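One way to build a set from a hash table is to store each element as a key and ignore the value. The sketch below uses Python's built-in `dict` as the underlying hash table. The class name `HashSet` echoes the slide title, but the method names are assumptions for illustration.

```python
class HashSet:
    """Set built on top of a hash table; only the keys matter (illustrative)."""

    def __init__(self):
        self._table = {}

    def add(self, item):
        # Store the item as a key; the value is a placeholder.
        self._table[item] = True

    def remove(self, item):
        # Raises KeyError if the item is not in the set.
        del self._table[item]

    def contains(self, item):
        # True if the item is in the set, False otherwise.
        return item in self._table

    def size(self):
        return len(self._table)

    def items(self):
        # Callers should not rely on the order of the returned elements.
        return list(self._table)


s = HashSet()
s.add("00472885")
s.add("00472885")  # adding a duplicate has no effect
s.add("00943855")
print(s.contains("00472885"), s.size())
```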
HashMap vs. HashSet