Fill out the Brown Computer Science Survey you got in your email! - PowerPoint PPT Presentation

Fill out the Brown Computer Science Survey you got in your email! percentageproject.com Only takes 5 min! All multiple choice! If you didn’t receive the survey, email litofish@cs.brown.edu

Sets, Dictionaries & Hash Tables CS16: Introduction to Data Structures & Algorithms Spring 2020

Q: how would you build a (basic) search engine?

What’s so Hard about Search Engines?

Search Through Each Page?
‣ Assume Google indexes 200 billion pages
‣ If we scan 1 page in 1 microsecond
  ‣ each search would take 200,000 seconds, or about 55 hours
‣ How can we improve search time?
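The 55-hour figure follows directly from the two numbers on the slide. A quick check of the arithmetic (the variable names are illustrative, not from the deck):

```python
# Back-of-the-envelope check of the linear-scan estimate.
PAGES = 200_000_000_000        # 200 billion pages (from the slide)
PAGES_PER_SECOND = 1_000_000   # 1 page per microsecond (from the slide)

total_seconds = PAGES / PAGES_PER_SECOND
total_hours = total_seconds / 3600

print(f"{total_seconds:,.0f} seconds = {total_hours:.1f} hours")
```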

Outline
‣ Sets
‣ Dictionaries
‣ Hash Tables
‣ Ex: Search engine

Dictionary
‣ Collection of key/value pairs
  ‣ distinct and unordered keys
‣ Supports value lookup by key
‣ Also known as a map
  ‣ “maps” keys to values
‣ Examples
  ‣ name → address
  ‣ word → definition

Dictionary ADT
‣ add(key, value):
  ‣ adds key/value pair to dict.
‣ object get(key):
  ‣ returns value mapped to key
‣ remove(key):
  ‣ removes key/value pair
‣ int size():
  ‣ returns number of key/value pairs
‣ boolean isEmpty():
  ‣ returns TRUE if dict. is empty; FALSE otherwise

Q: how can we implement a dictionary?

Array-based Dictionary
‣ Can we use an expandable array A?
‣ add(k, v):
  ‣ store (k, v) in first empty cell of A
  ‣ takes O(1) if you keep track of first empty cell
‣ get(k):
  ‣ scan A to find value with key=k
  ‣ takes O(n)
‣ remove(k):
  ‣ scan A to find pair with key=k & remove
  ‣ takes O(n)
‣ Is O(n) good enough? What if our dictionary stores 200B key/value pairs?
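A minimal sketch of the array-based dictionary described above, assuming a Python list as the expandable array. The class and method names are illustrative. Like the slide, add does not check whether the key is already present.

```python
class ArrayDictionary:
    """Dictionary ADT backed by an expandable array of (key, value) pairs."""

    def __init__(self):
        self._pairs = []  # a Python list plays the role of the expandable array A

    def add(self, key, value):
        # Append to the first empty cell at the end: O(1) amortized.
        self._pairs.append((key, value))

    def get(self, key):
        # Scan every pair until the key matches: O(n).
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        # Scan for the pair, then delete it: O(n).
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                del self._pairs[i]
                return
        raise KeyError(key)

    def size(self):
        return len(self._pairs)

    def is_empty(self):
        return not self._pairs


d = ArrayDictionary()
d.add("name", "address")
d.add("word", "definition")
print(d.get("word"))  # definition
```

Every get and remove walks the list from the front, which is where the O(n) cost comes from.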

Q: can we do better?

Yes! with a Hash Table
‣ Hash tables are composed of
  ‣ an array A
  ‣ and a “hash” function h: X ⟶ Y

Dictionary vs. Hash Table
‣ A dictionary (or map) is an abstract data type
  ‣ can be implemented using many different data structures
‣ A hash table is a dictionary data structure
  ‣ one specific way to implement a dictionary

Yes! with a Hash Table
‣ A hash function is a function h: X ⟶ Y that
  ‣ shrinks: maps elements from a large input space X to a smaller output space Y
  ‣ is well spread: h spreads elements of X over Y

Building a Dictionary w/ a Hash Table
‣ Choose a hash function h: X ⟶ Y with
  ‣ X = “universe of keys” and Y = “indices of array”
‣ add(k, v)
  ‣ set A[h(k)]=v which is O(1)
‣ get(k)
  ‣ return v=A[h(k)] which is O(1)
‣ remove(k)
  ‣ delete A[h(k)] which is O(1)

Hash Table: Add
‣ keys: banner IDs
‣ values: names
‣ pairs added to the table
  ‣ 00472885 → David Laidlaw
  ‣ 00943855 → Kaila Jeter
  ‣ 00745911 → Chantal Toupin
  ‣ 00238494 → Alejandro Molina

Building a Dictionary w/ a Hash Table
‣ Q: What is the problem with this?
‣ Remember that |Y| < |X|
  ‣ (here |X| denotes size of X)
‣ …so some keys in X will be hashed to the same location!
  ‣ this follows from the pigeonhole principle
  ‣ there just isn’t enough room in Y to fit all of X
‣ …therefore some values in array will be overwritten
  ‣ two keys hashing to the same location is called a collision
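A small sketch of the overwrite problem. The table size of 7 and the modulo hash function are assumptions chosen only for illustration. The two keys are banner IDs from this deck that happen to share a remainder mod 7.

```python
M = 7  # assumed table size, for illustration only


def h(key):
    # Assumed hash function: remainder modulo the table size.
    return key % M


A = [None] * M  # one value per cell, no buckets


def add(key, value):
    A[h(key)] = value  # overwrites whatever was already in the cell


def get(key):
    return A[h(key)]


add(472885, "David Laidlaw")
add(231924, "Lauren Ho")  # 231924 % 7 == 472885 % 7 == 0, so this collides

print(get(472885))  # Lauren Ho: the first value was overwritten
```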

Overcoming Collisions
‣ Hash Table with Chaining
  ‣ store multiple values at each array location
  ‣ each array cell stores a “bucket” of pairs
  ‣ can implement bucket as a list or expandable array or …
‣ FYI: there are many other approaches
  ‣ e.g., linear probing, quadratic probing, cuckoo hashing, …

Hash Table
table: array
h: hash function

function add(k, v):
    index = h(k)
    table[index].append(k, v)
‣ O(1) if computing hash function is O(1)

function get(k):
    index = h(k)
    for (key, val) in table[index]:
        if key = k:
            return val
    error(“key not found”)
‣ runtime depends on bucket size
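The pseudocode above translates almost line for line into Python. This is a sketch under a few assumptions: buckets are Python lists, the missing-key error is a KeyError, and the class name, table size and modulo hash function in the usage lines are illustrative.

```python
class ChainedHashTable:
    def __init__(self, num_buckets, hash_function):
        self.table = [[] for _ in range(num_buckets)]  # one list per bucket
        self.h = hash_function

    def add(self, k, v):
        # O(1) provided the hash function itself runs in O(1).
        index = self.h(k)
        self.table[index].append((k, v))

    def get(self, k):
        # Runtime is proportional to the size of the bucket being scanned.
        index = self.h(k)
        for key, val in self.table[index]:
            if key == k:
                return val
        raise KeyError("key not found")


table = ChainedHashTable(7, lambda key: key % 7)
table.add(472885, "David Laidlaw")
table.add(231924, "Lauren Ho")  # lands in the same bucket as the previous key
print(table.get(472885))  # David Laidlaw
print(table.get(231924))  # Lauren Ho
```

Both pairs survive in the same bucket, so neither value is overwritten.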

Hash Table
‣ Let’s do another example but with Chaining!
‣ We’ll use the following hash function
  ‣ h(banner_id) = banner_id % 7

Hash Table: Add
‣ Array of buckets w/ key/value pairs
‣ keys: banner IDs
‣ values: names
‣ h(key) = key % 7
‣ pairs added to the table
  ‣ 00472885 → David Laidlaw
  ‣ 00943855 → Kaila Jeter
  ‣ 00745911 → Chantal Toupin
  ‣ 00238494 → Alejandro Molina
  ‣ 00231924 → Lauren Ho
  ‣ 00543163 → Surbhi Madan
‣ resulting buckets
  ‣ bucket 0: (00472885, David Laidlaw), (00231924, Lauren Ho)
  ‣ bucket 3: (00943855, Kaila Jeter)
  ‣ bucket 4: (00238494, Alejandro Molina)
  ‣ bucket 5: (00745911, Chantal Toupin), (00543163, Surbhi Madan)
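The bucket assignments in this example can be checked by running the hash function over the six banner IDs. This is a sketch: the IDs are stored as strings to keep their leading zeros and converted to integers before hashing, and the variable names are illustrative.

```python
students = [
    ("00472885", "David Laidlaw"),
    ("00943855", "Kaila Jeter"),
    ("00745911", "Chantal Toupin"),
    ("00238494", "Alejandro Molina"),
    ("00231924", "Lauren Ho"),
    ("00543163", "Surbhi Madan"),
]

NUM_BUCKETS = 7
buckets = [[] for _ in range(NUM_BUCKETS)]

for banner_id, name in students:
    index = int(banner_id) % NUM_BUCKETS  # h(key) = key % 7
    buckets[index].append((banner_id, name))

for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Buckets 0 and 5 end up with two pairs each, buckets 3 and 4 with one, and the rest stay empty.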

Hash Table: Get
‣ Array of buckets w/ key/value pairs
‣ keys: banner IDs
‣ values: names
‣ h(key) = key % 7
‣ key being looked up: 00543163
‣ buckets
  ‣ bucket 0: (00472885, David Laidlaw), (00231924, Lauren Ho)
  ‣ bucket 3: (00943855, Kaila Jeter)
  ‣ bucket 4: (00238494, Alejandro Molina)
  ‣ bucket 5: (00745911, Chantal Toupin), (00543163, Surbhi Madan)
‣ What is the worst-case run time of Get?

Hash Table with Chaining
‣ What is the worst-case runtime of Get?
  ‣ ≈ size of largest bucket
‣ What is the size of largest bucket?
  ‣ assume we have n students and a table of size m
  ‣ if h “spreads” keys roughly evenly then
    ‣ each bucket has size ≈ n/m
    ‣ ex: if n=150 and m=7 each bucket has size ≈ 150/7 ≈ 21
‣ But what is the size of the largest bucket asymptotically?
  ‣ assume m is a constant (i.e., it does not grow as a function of n)
  ‣ each bucket has size ≈ n/m = n/c = O(n)
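A short experiment illustrating the n/m estimate. The banner IDs here are drawn at random, which is an assumption (real IDs need not be random), and the seed is arbitrary.

```python
import math
import random

n, m = 150, 7
rng = random.Random(16)                  # fixed seed so the run is repeatable
ids = rng.sample(range(100_000_000), n)  # n distinct IDs of at most 8 digits

sizes = [0] * m
for banner_id in ids:
    sizes[banner_id % m] += 1            # h(key) = key % 7

print("bucket sizes:", sizes)
print("average:", round(n / m, 1), "largest:", max(sizes))
```

However the keys fall, the largest bucket holds at least ⌈n/m⌉ = 22 pairs, so with m fixed the worst-case cost of Get grows linearly in n.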

Q: Can we do better than O(n) ?

Beating O(n): Idea #1
‣ Idea: use large table
  ‣ Banner IDs have 8 digits so max ID is 99,999,999
  ‣ Use table of size m=100,000,000
  ‣ w/ hash function h(key)=key
‣ Are there any collisions in this case?
  ‣ no collisions because every pair gets its own cell
‣ What is run time of Get?
  ‣ O(1) since we don’t need to scan buckets
‣ What is the problem with this approach?
  ‣ what if we only store 150 students? we’re wasting 99,999,850 cells
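The wasted-space figure is simple subtraction. A quick check, with illustrative variable names:

```python
MAX_ID = 99_999_999   # largest 8-digit banner ID
m = MAX_ID + 1        # one cell per possible ID, since h(key) = key
n = 150               # students actually stored

wasted = m - n
print(f"{wasted:,} of {m:,} cells are never used")
```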

Beating O(n): Idea #2
‣ Idea: use a table of size equal to the number of students + “good” hash function
  ‣ set the table size to m=n
  ‣ use a hash function h that spreads keys well
‣ No wasted space since n = m
  ‣ in other words, “table size” = “number of students”
‣ If h spreads keys roughly evenly then each bucket has size
  ‣ ≈ n/m = n/n = 1 = O(1)
‣ What hash function should we use?
  ‣ Suppose n = 150 (i.e., we want to insert 150 students)
  ‣ should we use the hash function h(key) = key % 150?

Activity #1: Banner ID Hashing
‣ Form groups of 10
‣ 5 min

Beating O(n): Idea #2
‣ Idea #2 relied on an assumption:
  ‣ if h spreads keys roughly evenly then each bucket has size ≈ n/m = n/n = 1 = O(1)
‣ Will h(ID)=ID%11 spread banner IDs evenly?
  ‣ it depends on the banner IDs…
  ‣ if banner IDs are chosen randomly then Yes
‣ But what if next year all banner IDs are multiples of 11?
  ‣ Then all banner IDs will map to 0!
  ‣ So there will be one bucket with all IDs
  ‣ so worst-case runtime of Get will be O(n)
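The bad case is easy to reproduce. The IDs below are hypothetical, constructed so that every one is a multiple of 11:

```python
M = 11


def h(banner_id):
    return banner_id % M


# Hypothetical "next year" IDs: every one is a multiple of 11.
ids = [11 * i for i in range(1, 151)]

sizes = [0] * M
for banner_id in ids:
    sizes[h(banner_id)] += 1

print(sizes)  # [150, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

All 150 keys land in bucket 0, so Get has to scan a bucket of size n.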

Since keys are not necessarily random, we make the hash function random

Universal Hash Functions
‣ Special “families” of hash functions
  ‣ UHF = {h_1, h_2, …, h_q}
  ‣ designed so that if we pick a function from the family at random and use it on a set of keys, then it is very likely that the function will “spread” the keys (roughly) evenly
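The slides do not say which family to use. As one possible illustration, the sketch below uses the classic construction h(k) = ((a·k + b) mod p) mod m, with p a prime larger than any key and a, b chosen at random. The prime, the function name and the test keys are all assumptions made for this example.

```python
import random

P = 1_000_000_007  # a prime larger than any 8-digit banner ID


def random_hash_function(m, rng=random):
    """Pick h(k) = ((a*k + b) % P) % m with a and b chosen at random."""
    a = rng.randrange(1, P)  # 1 <= a < P
    b = rng.randrange(0, P)  # 0 <= b < P
    return lambda key: ((a * key + b) % P) % m


m = 11
h = random_hash_function(m)

# Adversarial keys for h(ID) = ID % 11: every ID is a multiple of 11.
ids = [11 * i for i in range(1, 151)]

sizes = [0] * m
for banner_id in ids:
    sizes[h(banner_id)] += 1

print(sizes)  # varies from run to run
```

Because a and b are picked after the keys are fixed, no single set of keys is bad for every function in the family. A run typically spreads these 150 keys over all 11 buckets instead of piling them into one.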
