CompSci 201: Collections, Hashing, Objects

SLIDE 1

CompSci 201: Collections, Hashing, Objects

1/31/2020 CompSci 201, Spring 2020 1

Susan Rodger January 31, 2020

SLIDE 2

G is for …

  • Git
  • Version control that's ubiquitous
  • Garbage Collection
  • Java recycles
  • Google
  • How to find Stack Overflow

SLIDE 3

Announcements

  • Assignment P1 due yesterday
  • You are in the grace period through midnight
  • APT-3 due Tues, Feb 4
  • Can still turn in Friday until 11:59pm
  • Discussion 4 on Feb 3
  • Prediscussion: do before, out today
  • Reading on calendar
  • Slowing down… nothing posted…

SLIDE 4

Plan for the Day

  • Generic classes: ArrayList to HashSet
  • From ArrayList to HashSet to Collections to …
  • From Object.equals to Object.hashCode
  • Everything is an Object: what can an object do?
  • Maps, Interfaces, Analysis
  • Next week and next assignment

SLIDE 5

ArrayList Review

  • What is an ArrayList?
  • A class that "wraps an array"
  • Part of the java.util collections hierarchy (it implements java.util.List)
  • Almost an array: constant-time access to any element given an index (independent of N)
  • How are elements added?
  • When the internal array is full, a new, larger array is allocated, values are copied, and adding continues
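For reference, client code using the real java.util.ArrayList might look like this. Only standard JDK calls are used; the words are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayListReview {
    // Build a small list through the List interface; ArrayList is one implementation.
    // The words are made-up sample data.
    static List<String> demo() {
        List<String> words = new ArrayList<>();
        words.add("apple");        // stored at index 0
        words.add("banana");       // index 1
        words.add("cherry");       // index 2
        words.set(1, "blueberry"); // overwrite index 1 in place
        return words;
    }

    public static void main(String[] args) {
        List<String> words = demo();
        // get(i) indexes straight into the backing array, so its cost
        // does not depend on how many elements the list holds.
        System.out.println(words.get(1)); // blueberry
        System.out.println(words);        // [apple, blueberry, cherry]
        System.out.println(words.size()); // 3
    }
}
```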

SLIDE 6

DIYAD ArrayList

  • Do It Yourself Algorithm and Data Structure
  • SimpleStringArrayList: some methods
  • GrowableStringArrayList: more methods
  • Differences between +100, +1000, and *2

  • Helper methods are private: checkSize()

SLIDE 7

SimpleStringArrayList

  • DIYAD: I want to write an ArrayList class
  • State: an array (instance variable)
  • Methods:
  • Constructor: create an array of fixed size
  • Add an element to the array
  • Get an element from the array
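The SimpleStringArrayList code on the two "part" slides did not survive extraction. What follows is a hypothetical sketch of such a class, not the course's code: the field names myStorage and mySize, the capacity parameter, and throwing IllegalStateException when the array is full are all assumptions.

```java
/**
 * Hypothetical fixed-capacity list of Strings (not the course's actual code).
 */
public class SimpleStringArrayList {
    private String[] myStorage; // backing array; its size never changes
    private int mySize;         // how many slots are in use

    /** Create the backing array once, with a fixed capacity. */
    public SimpleStringArrayList(int capacity) {
        myStorage = new String[capacity];
        mySize = 0;
    }

    /** Append s at the end; fails once the fixed-size array is full. */
    public void add(String s) {
        if (mySize == myStorage.length) {
            throw new IllegalStateException("array is full at " + mySize + " elements");
        }
        myStorage[mySize] = s;
        mySize += 1;
    }

    /** Return the element at index: one array access, independent of size. */
    public String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + mySize);
        }
        return myStorage[index];
    }

    public int size() {
        return mySize;
    }

    public static void main(String[] args) {
        SimpleStringArrayList list = new SimpleStringArrayList(3);
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.get(2)); // c
        try {
            list.add("d"); // a fourth element does not fit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // array is full at 3 elements
        }
    }
}
```

The fixed capacity is exactly the weakness the later slides point out: once the array is full, add has nowhere to go.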

SLIDE 8

SimpleStringArrayList (part 1)

SLIDE 9

SimpleStringArrayList (part 2)

SLIDE 10

GrowableStringArrayList

  • DIYAD: write another ArrayList class

SLIDE 11

DIYAD ArrayList

  • Do It Yourself Algorithm and Data Structure
  • SimpleStringArrayList: some methods
  • GrowableStringArrayList: more methods
  • Differences between these two classes?
  • Growable: grows as needed, not fixed size
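Here is a hypothetical sketch of a growable version, again not the course's code. It assumes an initial capacity of 1 and a private checkSize() helper that doubles the array; the myCopies counter is an addition kept only to make the cost of resizing visible.

```java
import java.util.Arrays;

/**
 * Hypothetical list of Strings that grows as needed (not the course's code).
 * The private helper checkSize() doubles the backing array when it is full.
 */
public class GrowableStringArrayList {
    private String[] myStorage = new String[1]; // assumed initial capacity
    private int mySize = 0;
    private long myCopies = 0; // elements copied by resizes, kept only for analysis

    /** Make room for one more element, doubling the array if it is full. */
    private void checkSize() {
        if (mySize == myStorage.length) {
            myCopies += mySize;
            myStorage = Arrays.copyOf(myStorage, myStorage.length * 2);
        }
    }

    public void add(String s) {
        checkSize();
        myStorage[mySize] = s;
        mySize += 1;
    }

    public String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + mySize);
        }
        return myStorage[index];
    }

    public int size() { return mySize; }
    public int capacity() { return myStorage.length; }
    public long copies() { return myCopies; }

    public static void main(String[] args) {
        GrowableStringArrayList list = new GrowableStringArrayList();
        for (int k = 0; k < 1000; k++) {
            list.add("s" + k);
        }
        // 1000 elements, capacity 1024, 1 + 2 + 4 + ... + 512 = 1023 copies
        System.out.println(list.size() + " " + list.capacity() + " " + list.copies()); // 1000 1024 1023
    }
}
```

Adding 1,000 strings costs 1,023 copies in total, fewer than two copies per element, which is the linear behavior the analysis slides describe.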

SLIDE 12

GrowableStringArrayList (part 1)

SLIDE 13

GrowableStringArrayList (part 2)

SLIDE 14

GrowableStringArrayList (part 3)

SLIDE 15

Analysis via Pictures Again

  • Growing array by doubling each time
  • Create/copy 1, 2, 4, 8, 16, …, $2^N$
  • If $X = 2^N$, we've created $2 \cdot 2^N - 1$, or $2X - 1$
  • Roughly X, where "roughly" is defined later
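The doubling total follows from the standard geometric-series step: double the sum and subtract, and every term cancels except the two ends.

```latex
\begin{aligned}
S &= 1 + 2 + 4 + \dots + 2^N \\
2S &= 2 + 4 + \dots + 2^N + 2^{N+1} \\
2S - S &= 2^{N+1} - 1 \\
S &= 2^{N+1} - 1 = 2 \cdot 2^N - 1 = 2X - 1 \quad \text{when } X = 2^N
\end{aligned}
```

So building an array of X elements by doubling allocates and copies fewer than 2X elements in total.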

SLIDE 16

Analysis of Diyad ArrayLists

  • SimpleStringArrayList
  • Add 10,000 strings? OK. Add one more? BAD
  • GrowableStringArrayList
  • Add as many strings as memory allows: how?
  • ConformingArrayList
  • Is-a java.util.List, also stores any Object type
  • Must implement the methods of the List interface

SLIDE 17

DIYAD Ideas

  • Move from String to GrowableString to Generic
  • Lots of work to fit in with the Collections hierarchy
  • For our own work? Easier! All of Java? Harder!
  • Differences between +10, +1000, *2, and *1.2
  • How do we measure empirically?
  • How do we measure analytically?
  • Private method checkSize()

SLIDE 18

Diyad ArrayList Growth

  • When the internal array is full? Create new, copy, use
  • Efficient add, get, set when done repeatedly
  • Not efficient if resize with +1, +100, +1000
  • Efficient if resize with *2 or *1.25

SLIDE 19

Analysis with Math+Pictures

  • If we grow by adding 1 (or 100 or 1000)
  • Copy 1, then 2, then 3, then … then N
  • $1 + 2 + \dots + N = N(N+1)/2$
  • Same shape for 100 + 200 + 300 + …, with each term scaled by 100
  • Roughly $N^2$
  • Divide by 2, multiply by 100
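The closed form comes from writing the sum forwards and backwards. Each of the N pairs adds to N + 1, and the 100-at-a-time version is the same sum scaled by 100.

```latex
\begin{aligned}
S &= 1 + 2 + \dots + N \\
S &= N + (N-1) + \dots + 1 \\
2S &= \underbrace{(N+1) + (N+1) + \dots + (N+1)}_{N \text{ terms}} = N(N+1) \\
S &= \frac{N(N+1)}{2} \\
100 + 200 + \dots + 100N &= 100 \cdot \frac{N(N+1)}{2} = 50N^2 + 50N
\end{aligned}
```

In both cases the leading term grows like $N^2$. That is what "roughly $N^2$" means once constants are ignored.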

SLIDE 20

Analysis via Math+Pictures Again

  • Growing array by doubling each time
  • Create/copy 1, 2, 4, 8, 16, …, $2^N$
  • Total is $1 + 2 + \dots + 2^N = 2^{N+1} - 1$
  • If $X = 2^N$, we've created $2 \cdot 2^N - 1$, or $2X - 1$
  • Roughly X, where "roughly" is defined later

SLIDE 21

Runtimes summarized

  • Re-sizing geometrically and additively
  • Allocate new array, copy all pointers/references


Time by growth policy:

size        grow *2   grow *1.25   grow +10000
1000000     0.028     0.051         1.507
2000000     0.037     0.087         1.585
3000000     0.053     0.117         2.740
4000000     0.066     0.153         5.146
5000000     0.117     0.218         7.304
6000000     0.121     0.338         8.315
7000000     0.143     0.303        10.428
8000000     0.211     0.398        14.233
9000000     0.270     0.452        21.434
10000000    0.260     0.468        21.927
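The same pattern shows up without a stopwatch if we count element copies instead of measuring time. This is a model under stated assumptions, not the benchmark behind the table: geometric growth starts at capacity 1, additive growth starts at capacity equal to the increment, and every resize copies all stored elements.

```java
/**
 * Counts element copies made while appending n elements to a growable array
 * under different growth policies. Counting copies instead of timing keeps
 * the comparison independent of the machine. Starting capacities are assumptions.
 */
public class GrowthCopies {
    /** Capacity starts at 1 and is multiplied by factor (> 1) whenever the array is full. */
    static long copiesGeometric(long n, double factor) {
        long copies = 0;
        long capacity = 1;
        while (capacity < n) {       // the array fills before all n elements fit
            copies += capacity;      // every element already stored is copied
            // always grow by at least one slot, even when capacity * factor truncates
            capacity = Math.max(capacity + 1, (long) (capacity * factor));
        }
        return copies;
    }

    /** Capacity starts at increment and grows by increment whenever the array is full. */
    static long copiesAdditive(long n, long increment) {
        long copies = 0;
        long capacity = increment;
        while (capacity < n) {
            copies += capacity;
            capacity += increment;
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.printf("%10s %12s %12s %14s%n", "size", "*2", "*1.25", "+10000");
        for (long n = 1_000_000; n <= 10_000_000; n += 1_000_000) {
            System.out.printf("%10d %12d %12d %14d%n", n,
                    copiesGeometric(n, 2.0),
                    copiesGeometric(n, 1.25),
                    copiesAdditive(n, 10_000));
        }
    }
}
```

With +10000 growth, going from 1,000,000 to 2,000,000 elements takes the copy count from 49,500,000 to 199,000,000, roughly four times as many. Doubling stays below 2n copies for every n.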

SLIDE 22

Diyad ArrayList Summary

  • If we grow additively: +1, or +100, or +1000
  • Performance is quadratic: for an array of N elements we expect $N^2$ time (allocate/copy)
  • If we grow geometrically: *2, *1.2, *3
  • Performance is linear: for an array of N elements we expect N time (allocate/copy)
  • Ignore constants: $N^2/2$ or $100 N^2$ or $200N$ or …

SLIDE 23

WOTO

http://bit.ly/201spring20-0131-1

SLIDE 24

Maria Klawe

  • President of Harvey Mudd
  • Dean of Engineering at Princeton, ACM Fellow, College Dropout (and re-enroller)

Coding is today's language of creativity. All our children deserve a chance to become creators instead consumers of computer science. I personally believe that the most important thing we have to do today is use technology to address societal problems, especially in developing regions

SLIDE 25

Generic ConformingArrayList

  • Rather than String, use a generic type parameter
  • Can use E, T, Type, any identifier: <E>
  • Similar to code for GrowableStringArrayList
  • java.util.List
  • Interface
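Writing every java.util.List method by hand is a lot of work. The sketch below takes a shortcut and extends java.util.AbstractList, a real JDK skeletal class. AbstractList supplies contains, indexOf, iterator, toString, equals, and more once get, set, size, add(int, E), and remove(int) exist. The class name follows the slide, but the body is a hypothetical sketch, not the course's implementation.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical generic growable list that is-a java.util.List. */
public class ConformingArrayList<E> extends AbstractList<E> {
    private Object[] myStorage = new Object[1]; // Java can't create a new E[] directly
    private int mySize = 0;

    /** Double the backing array when it is full. */
    private void checkSize() {
        if (mySize == myStorage.length) {
            myStorage = Arrays.copyOf(myStorage, myStorage.length * 2);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= mySize) throw new IndexOutOfBoundsException("index " + index);
        return (E) myStorage[index];
    }

    @Override
    public E set(int index, E element) {
        E old = get(index); // also checks bounds
        myStorage[index] = element;
        return old;
    }

    @Override
    public int size() {
        return mySize;
    }

    @Override
    public void add(int index, E element) {
        if (index < 0 || index > mySize) throw new IndexOutOfBoundsException("index " + index);
        checkSize();
        // shift later elements one slot right to make room
        System.arraycopy(myStorage, index, myStorage, index + 1, mySize - index);
        myStorage[index] = element;
        mySize += 1;
        modCount++; // lets inherited iterators detect concurrent modification
    }

    @Override
    public E remove(int index) {
        E old = get(index);
        System.arraycopy(myStorage, index + 1, myStorage, index, mySize - index - 1);
        mySize -= 1;
        myStorage[mySize] = null; // drop the stale reference
        modCount++;
        return old;
    }

    public static void main(String[] args) {
        List<String> list = new ConformingArrayList<>();
        list.add("x"); // inherited add(E) calls add(size(), e)
        list.add("y");
        list.add("z");
        System.out.println(list);               // [x, y, z]
        System.out.println(list.contains("y")); // true: inherited, uses .equals
    }
}
```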

SLIDE 26

Can E be anything? String, Point, …

  • Method .equals that works as expected for E!
  • Internal array myStorage contains Objects
  • ConformingArrayList<String>
  • Which .equals is called: Object's or String's?
  • Runtime decision, not compile-time decision
  • What does elt reference/point to? String!!!
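The runtime decision can be seen directly. In this small sketch (the class, the indexOf helper, and Box are hypothetical), the array's static type is Object[], yet elt.equals(target) runs String.equals when elt refers to a String. For a class that does not override equals, it falls back to Object.equals, which compares identity.

```java
/** Hypothetical demo: .equals is chosen by the object's runtime class. */
public class EqualsDispatch {
    /** Linear search over the first size slots; returns the index of the first match or -1. */
    static int indexOf(Object[] myStorage, int size, Object target) {
        for (int k = 0; k < size; k++) {
            Object elt = myStorage[k];  // static type is Object
            if (elt.equals(target)) {   // dynamic dispatch picks the runtime class's equals
                return k;
            }
        }
        return -1;
    }

    /** Does NOT override equals, so Object.equals (identity) is used. */
    static class Box {
        final int value;
        Box(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        Object[] strings = {"apple", "banana"};
        // new String(...) is a different object with the same characters
        System.out.println(indexOf(strings, 2, new String("banana"))); // 1

        Object[] boxes = {new Box(1), new Box(2)};
        System.out.println(indexOf(boxes, 2, new Box(2)));              // -1
    }
}
```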

SLIDE 27

Why Diyad?

  • Traditionally use ArrayList<E>: client code
  • Understand methods via API
  • Problem solving in many contexts
  • Efficiency: a.get(1) as fast as a.get(1000)
  • Why efficient? Understanding by analysis
  • From the internal array, which is efficient
  • From doubling on resize rather than adding one

SLIDE 28

Toward Applications

  • We can speak with a limited vocabulary
  • Learn vocabulary then speak, then read
  • We can also write code similarly
  • Eventually debugging may require understanding how .equals works
  • https://arxiv.org/pdf/1711.00975.pdf

Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

SLIDE 29

Massive Data sets

  • How do we find what #hashtags are trending on Twitter in real time?
  • 6,000 tweets/second, 350,000/minute, …
  • Do we weight by tweeter importance?
  • Must be able to look up very quickly; cannot skim through all hashtags/all data
  • Conveniently, we use hashing and hash tables!
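As a toy illustration of why hashing helps (not how Twitter actually detects trends), a java.util.HashMap can count hashtags with one expected constant-time lookup per tag, no matter how many distinct tags are already stored. The sample tweets are made up, and the lowercase normalization is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class HashtagCounts {
    /** Count words starting with '#', case-insensitively (normalization is an assumption). */
    static Map<String, Integer> count(String[] tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.split("\\s+")) {
                if (word.startsWith("#") && word.length() > 1) {
                    // expected O(1) hash lookup/update per hashtag
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tweets = { // made-up sample tweets
            "Go Duke! #GoDuke #basketball",
            "#godevils #GoDuke",
            "studying hashing #compsci201",
        };
        Map<String, Integer> counts = count(tweets);
        System.out.println(counts.get("#goduke"));     // 2
        System.out.println(counts.get("#basketball")); // 1
    }
}
```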

SLIDE 30

Toward Understanding HashSet

  • Adding objects to HashSet<..> to avoid duplicates
  • We'll see with Point class, doesn't work
  • We'll see with String class, does work
  • Just as we needed to add .equals() …
  • We need to add .hashCode()
  • Need some knowledge of Object and the internals of HashSet<..>: how does set.add(X) work?
  • Every object can convert itself to a number
  • Ask not what you can do to an object …
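A minimal sketch of the Point behavior described above (these Point classes are hypothetical stand-ins, not the course's Point.java). Overriding equals alone is not enough for HashSet. Two equal points still usually get different inherited hash codes, so the set never compares them with equals. Adding a consistent hashCode, here via the real java.util.Objects.hash, fixes it, and String already overrides both methods.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PointSets {
    /** Hypothetical: overrides equals but NOT hashCode, breaking the hashCode contract. */
    static class BadPoint {
        final int x, y;
        BadPoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BadPoint)) return false;
            BadPoint p = (BadPoint) o;
            return x == p.x && y == p.y;
        }
    }

    /** Hypothetical: overrides equals and hashCode consistently. */
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points give equal hash codes
        }
    }

    public static void main(String[] args) {
        Set<BadPoint> bad = new HashSet<>();
        bad.add(new BadPoint(1, 2));
        bad.add(new BadPoint(1, 2));
        // Almost always 2: the inherited hash codes differ, so equals is never consulted
        System.out.println(bad.size());

        Set<Point> good = new HashSet<>();
        good.add(new Point(1, 2));
        good.add(new Point(1, 2));
        System.out.println(good.size()); // 1

        Set<String> words = new HashSet<>();
        words.add("hello");
        words.add(new String("hello"));
        System.out.println(words.size()); // 1: String overrides both methods
    }
}
```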

SLIDE 31

Making .contains efficient

  • Why is ArrayList.contains(..) slow?
  • Search through entire list to find something
  • If list is sorted can we do better?
  • Think of a number between 1 and 1,024, I'll tell you high, low, correct: how many guesses needed?
  • How do you search for a book in the stacks?
  • That's not what you do in the stacks?
  • What about in ancient times …
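If the values are sorted, each high/low answer halves what remains. This sketch of the guessing game (class and method names are hypothetical) counts the guesses made by always picking the middle. For 1 to 1,024 it never needs more than 11, compared with up to 1,024 checks for a front-to-back scan like ArrayList.contains.

```java
/** Hypothetical guessing-game helper illustrating binary search. */
public class GuessingGame {
    /** Guesses needed to find secret in [lo, hi] by always guessing the middle. */
    static int guesses(int secret, int lo, int hi) {
        int count = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            count++;
            if (mid == secret) {
                return count;          // "correct"
            } else if (mid < secret) {
                lo = mid + 1;          // "too low": discard the lower half
            } else {
                hi = mid - 1;          // "too high": discard the upper half
            }
        }
        throw new IllegalArgumentException("secret not in [lo, hi]");
    }

    public static void main(String[] args) {
        int worst = 0;
        for (int secret = 1; secret <= 1024; secret++) {
            worst = Math.max(worst, guesses(secret, 1, 1024));
        }
        System.out.println("worst case: " + worst + " guesses"); // worst case: 11 guesses
    }
}
```

For sorted java.util.List values, the JDK already provides this as Collections.binarySearch.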

SLIDE 32

Simple Example Hashing

Want a mapping of Soc Sec Num to Names

  • Duke's CS Student Union wants to be able to quickly find out info about its members. Also add, delete, and update members.
  • Doesn't need members sorted.

267-89-5431 John Smith
703-25-6141 Jack Adams
319-86-2115 Betty Harris
476-82-5120 Rose Black

  • Hash table indices run from 0 to 10 (size 11)
  • Possible hash function: H(ssn) = last 2 digits mod 11

SLIDE 33

Have a list of size 11 from 0 to 10


  • Insert these into the list
  • Insert as (key, value) tuple (267-89-5431, John Smith) (in example, only showing name)
  • H(267-89-5431) = 31 % 11 = 9: John Smith
  • H(703-25-6141) = 41 % 11 = 8: Jack Adams
  • H(319-86-2115) = 15 % 11 = 4: Betty Harris
  • H(476-82-5120) = 20 % 11 = 9: Rose Black (collides with John Smith)
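The insertions above can be replayed in a few lines. This sketch uses the example's hash function, H(ssn) = last two digits mod 11. Giving each slot its own list, so that John Smith and Rose Black can share slot 9, is an assumption for this example; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SsnHashTable {
    static final int TABLE_SIZE = 11; // slots 0..10

    /** H(ssn) = last two digits mod 11. */
    static int hash(String ssn) {
        int lastTwo = Integer.parseInt(ssn.substring(ssn.length() - 2));
        return lastTwo % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // Assumption: each slot holds a list, so colliding keys can share a slot.
        List<List<String>> table = new ArrayList<>();
        for (int k = 0; k < TABLE_SIZE; k++) {
            table.add(new ArrayList<>());
        }

        String[][] members = {
            {"267-89-5431", "John Smith"},
            {"703-25-6141", "Jack Adams"},
            {"319-86-2115", "Betty Harris"},
            {"476-82-5120", "Rose Black"},
        };
        for (String[] m : members) {
            table.get(hash(m[0])).add(m[1]); // store just the name, as in the example
        }

        for (int k = 0; k < TABLE_SIZE; k++) {
            System.out.println(k + ": " + table.get(k));
        }
        // slot 4: [Betty Harris], slot 8: [Jack Adams], slot 9: [John Smith, Rose Black]
    }
}
```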

SLIDE 34

Finding an Object's number ..

  • Every object has a .hashCode() method
  • Returns int value, used as "locker number"
  • Could return 39, 2, 57, … even -321
  • Ideally uses properties of object to compute
  • Cannot guarantee different for every Object!
  • Search items in same locker
  • Use .equals to find in locker
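Because .hashCode() can be negative (for example -321), hashCode % size is not a safe locker index in Java: % keeps the sign of the left operand. This sketch (class and method names are hypothetical) uses the real Math.floorMod, which always returns a value in [0, size). The common Math.abs workaround fails for Integer.MIN_VALUE, whose absolute value does not fit in an int.

```java
/** Hypothetical helper mapping hash codes to locker numbers. */
public class LockerNumber {
    /** Map any int hash code, even a negative one, to a locker in [0, tableSize). */
    static int locker(int hashCode, int tableSize) {
        return Math.floorMod(hashCode, tableSize);
    }

    public static void main(String[] args) {
        int h = -321;
        System.out.println(h % 11);                      // -2: % keeps the sign of h
        System.out.println(locker(h, 11));               // 9: always a valid index
        System.out.println(Math.abs(Integer.MIN_VALUE)); // -2147483648: abs can't fix this one
        System.out.println(locker("Duke".hashCode(), 11));
    }
}
```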

SLIDE 35

Ideal world? Real world!

SLIDE 36

Hash Metaphor and Pseudocode

  • Finite number of lockers, or buckets, or table entries
  • Each locker stores an ArrayList for hash collisions
  • In real world, might be another structure in locker
  • Given an object, find its locker/bucket number
  • locker # = o.hashCode() % table_size, adjusted to be non-negative since .hashCode() can be negative
  • Search through locker to see if target is there

for (Object o : locker) if (o.equals(target)) return true;
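The locker pseudocode can be turned into a small runnable set. This hypothetical LockerHashSet is not java.util.HashSet: it uses a fixed number of lockers with no resizing, does not support null, and uses Math.floorMod to keep locker numbers non-negative. It still shows the two steps, find the locker with hashCode and then search only that locker with equals.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the locker metaphor: a fixed number of lockers,
 * each an ArrayList holding every object whose hash maps there.
 */
public class LockerHashSet<E> {
    private final List<List<E>> myLockers = new ArrayList<>();

    public LockerHashSet(int tableSize) {
        for (int k = 0; k < tableSize; k++) {
            myLockers.add(new ArrayList<>());
        }
    }

    /** Locker number from hashCode, made non-negative with floorMod. */
    private List<E> lockerFor(Object o) {
        return myLockers.get(Math.floorMod(o.hashCode(), myLockers.size()));
    }

    public boolean contains(Object target) {
        for (E o : lockerFor(target)) { // only one locker is searched
            if (o.equals(target)) return true;
        }
        return false;
    }

    /** Adds x unless an equal object is already present; returns true if added. */
    public boolean add(E x) {
        if (contains(x)) return false;
        lockerFor(x).add(x);
        return true;
    }

    public static void main(String[] args) {
        LockerHashSet<String> set = new LockerHashSet<>(11);
        System.out.println(set.add("apple"));             // true
        System.out.println(set.add(new String("apple"))); // false: an equal String is there
        System.out.println(set.contains("apple"));        // true
        System.out.println(set.contains("pear"));         // false
    }
}
```

With a fixed table size, lockers grow as more items are added. Real hash tables resize to keep the lockers short.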

SLIDE 37

Point.hashCode

  • Convert a Point to a number
  • Try to make every point a different number
  • That's not possible!!
  • For method below, what non-equal points have same .hashCode()?
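The method from the slide did not survive extraction, so here is a hypothetical stand-in whose hashCode adds the coordinates. A pigeonhole argument shows why distinct numbers for all points are impossible: two int coordinates give 2^64 possible points, but an int hash code has only 2^32 values. Both formulas below are illustrations, not the course's Point.

```java
public class PointHashCollisions {
    /** Hypothetical Point whose hashCode adds the coordinates. */
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return x + y; // legal (equal points agree) but collides along each line x + y = k
        }
    }

    /** A common alternative formula; it still has collisions. */
    static int hash31(int x, int y) {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(2, 1);
        System.out.println(a.equals(b));                   // false
        System.out.println(a.hashCode() == b.hashCode());  // true: both are 3
        System.out.println(hash31(0, 31) == hash31(1, 0)); // true: both are 31
    }
}
```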

SLIDE 38

Inefficient but Correct .hashCode

  • Suppose .hashCode() simply returns 5
  • Every Point goes in the same locker
  • There are always collisions, but we try to minimize them. How are collisions resolved?
  • Can we modify PointDriver.java to stress-test?
  • How many different points can be made?
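PointDriver.java is not shown, so this is a separate hypothetical stress test. Its Point has a correct equals and a hashCode that always returns 5, which is legal but puts every point in one locker. The set stays correct, with duplicates still rejected via equals, but each add has to compare against many stored points. Expect the times to grow much faster than the number of points; exact numbers depend on the machine and JDK.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashStress {
    /** Hypothetical Point: correct equals, hashCode always 5 (legal but slow). */
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 5; // every Point lands in the same locker
        }
    }

    /** Adds every point of a side-by-side grid twice; returns the set size. */
    static int fill(int side) {
        Set<Point> set = new HashSet<>();
        for (int round = 0; round < 2; round++) {
            for (int x = 0; x < side; x++) {
                for (int y = 0; y < side; y++) {
                    set.add(new Point(x, y)); // second round: duplicates rejected via equals
                }
            }
        }
        return set.size();
    }

    public static void main(String[] args) {
        for (int side : new int[] {20, 40, 80}) {
            long start = System.nanoTime();
            int size = fill(side);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d points, %.3f s%n", size, secs);
        }
    }
}
```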

SLIDE 39

The hashCode contract

  • Every object has a .hashCode() method
  • Inherited from Object, but typically overridden
  • Use @Override and read online
  • Must respect .equals(): if a.equals(b), then
  • a.hashCode() == b.hashCode()
  • Converse not true! There will be collisions

SLIDE 40

When Strings Collide

  • Generate strings that will collide
  • Find such strings in the wild
  • http://hg.openjdk.java.net/jdk7u/jdk7u6/jdk/file/8c2c5d63a17e/src/share/classes/java/lang/String.java

9/20/19 Compsci 201, Fall 2019: Interfaces, Lists, Sets, Big-Oh 44

String          hashCode
ayay            3009136
aybZ            3009136
bZay            3009136
bZbZ            3009136

String          hashCode
buzzards        -931102253
righto          -931102253
snitz           109586548
unprecludible   109586548
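These collisions can be checked directly. String.hashCode is documented as s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1] in int arithmetic, so it wraps on overflow. The hash helper below recomputes that formula by hand for comparison; the class and helper names are hypothetical. The first four strings collide because "ay" and "bZ" have the same two-character hash (97*31 + 121 = 98*31 + 90 = 3128).

```java
public class StringCollisions {
    /** Recomputes String.hashCode's documented formula with Horner's rule. */
    static int hash(String s) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = 31 * h + s.charAt(k); // int overflow wraps around, as in the JDK
        }
        return h;
    }

    public static void main(String[] args) {
        String[] words = {"ayay", "aybZ", "bZay", "bZbZ", "buzzards", "righto", "snitz", "unprecludible"};
        for (String w : words) {
            System.out.println(w + " " + w.hashCode() + " " + hash(w));
        }
        System.out.println("ay".hashCode() + " " + "bZ".hashCode()); // 3128 3128
    }
}
```

buzzards and righto both wrap around to -931102253, which shows that collisions can also come from overflow rather than from matching character blocks.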

SLIDE 41

WOTO (correctness counts)

http://bit.ly/201spring20-0131-2

SLIDE 42

Work in 201

  • How important are APTs?
  • How important are APT quizzes?
  • How important are assignments?
  • Earlier assignments, later assignments?
  • How important are reading and in-class WOTOs?
  • How important are reading quizzes?
