slide-1
SLIDE 1

Hashing

14 September 2020 OSU CSE 1

slide-2
SLIDE 2

Performance of Set (and Map)

  • How long does it take to execute each of the methods of Set2 (similarly Map2), which use a Queue as the data representation?

  • Assume that each call to a Queue kernel method executes in constant time, i.e., that the duration of a call is independent of the values of all the arguments, including the receiver

slide-3
SLIDE 3

Standard Methods

  • For almost every type in the OSU CSE components, including Queue, each of the three Standard methods (newInstance, clear, and transferFrom) takes constant time to execute

slide-4
SLIDE 4

Queue Kernel Methods

Method (Op)    Execution Time (T_Op)
enqueue(x)     T_enqueue = c1
dequeue()      T_dequeue = c2
length         T_length = c3

slide-5
SLIDE 5

Set Kernel Methods

Method         Execution Time
add(x)
remove(x)
contains(x)
size

slide-6
SLIDE 6

Set Kernel Methods

Method         Execution Time
add(x)
remove(x)
contains(x)
size

Look at the method body in Set2, and figure out how much work it does...

slide-7
SLIDE 7

Set Kernel Methods

Method         Execution Time
add(x)         c4
remove(x)
contains(x)
size

It simply enqueues its argument; plus, there is some constant-time overhead just to make the call to add.

slide-8
SLIDE 8

Set Kernel Methods

Method         Execution Time
add(x)         c4
remove(x)
contains(x)
size

Look at the method body in Set2, and figure out how much work it does...

slide-9
SLIDE 9

Set Kernel Methods

Method         Execution Time
add(x)         c4
remove(x)      c5·|this| + c6
contains(x)
size

It has to search through a Queue containing all the Set’s elements.

slide-10
SLIDE 10

Set Kernel Methods

Method         Execution Time
add(x)         c4
remove(x)      c5·|this| + c6
contains(x)
size

Raising the question: a worst case, an average case, ...?

slide-11
SLIDE 11

Set Kernel Methods

Method         Execution Time
add(x)         c4
remove(x)      c5·|this| + c6
contains(x)    c7·|this| + c8
size           c9

slide-12
SLIDE 12

Linear Search

  • Linear search is the algorithm that examines, potentially, every item in a collection (e.g., code like moveToFront in Set2 and Map2) until it finds what it’s looking for

– The name reflects the fact that its execution time is a linear function of the size of the collection (e.g., c7·|this| + c8)
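As a minimal sketch of linear search (not the actual Set2 or Map2 code), the Java below scans a java.util.ArrayDeque, used here only as a stand-in for the OSU CSE Queue, and counts how many items it examines. The class name LinearSearchSketch and the method itemsExamined are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical illustration; not part of the OSU CSE components. */
class LinearSearchSketch {

    /**
     * Returns how many items are examined before x is found, or the
     * number of items in q if x is absent (every item was examined).
     */
    static <T> int itemsExamined(Queue<T> q, T x) {
        int examined = 0;
        for (T item : q) {
            examined++;
            if (item.equals(x)) {
                return examined;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>();
        for (int i = 0; i < 1000; i++) {
            q.add(i);
        }
        System.out.println(itemsExamined(q, 0));   // item at the front
        System.out.println(itemsExamined(q, 999)); // item at the back
        System.out.println(itemsExamined(q, -1));  // item not present
    }
}
```

In the worst case (the item is last or absent) the count equals the size of the collection, which is where the linear term in c7·|this| + c8 comes from.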

slide-13
SLIDE 13

Some Common Execution Times

[Plot: T(n) versus n]

T(n): execution (“running”) time of some code as a function of the “size” of its input.

slide-14
SLIDE 14

Some Common Execution Times

[Plot: T(n) versus n]

n: “size” of the input for some code.

slide-15
SLIDE 15

Some Common Execution Times

[Plot: T(n) versus n]

Constant time, e.g., c

slide-16
SLIDE 16

Some Common Execution Times

[Plot: T(n) versus n]

Log time, e.g., a·log(n) + b

slide-17
SLIDE 17

Some Common Execution Times

[Plot: T(n) versus n]

Linear time, e.g., a·n + b

slide-18
SLIDE 18

Some Common Execution Times

[Plot: T(n) versus n]

n log n time, e.g., a·n·log(n) + b

slide-19
SLIDE 19

Some Common Execution Times

[Plot: T(n) versus n]

Quadratic time, e.g., a·n^2 + b·n + c

slide-20
SLIDE 20

Some Common Execution Times

[Plot: T(n) versus n]

Exponential time, e.g., 2^n
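To get a feel for how differently these functions grow, the following sketch tabulates them for a few input sizes. It assumes, purely for illustration, that every coefficient is 1, every lower-order term is 0, and the logarithm is base 2; the class and method names are hypothetical.

```java
/** Hypothetical helper for comparing the growth rates named above. */
class GrowthRatesSketch {

    /** Log time, using log base 2. */
    static double logTime(long n) {
        return Math.log(n) / Math.log(2);
    }

    /** n log n time. */
    static double nLogN(long n) {
        return n * logTime(n);
    }

    /** Quadratic time. */
    static double quadratic(long n) {
        return (double) n * n;
    }

    /** Exponential time. */
    static double exponential(long n) {
        return Math.pow(2, n);
    }

    public static void main(String[] args) {
        System.out.printf("%6s %8s %10s %12s %14s%n",
                "n", "log n", "n log n", "n^2", "2^n");
        for (long n : new long[] {8, 16, 32, 64}) {
            System.out.printf("%6d %8.1f %10.1f %12.0f %14.3e%n",
                    n, logTime(n), nLogN(n), quadratic(n), exponential(n));
        }
    }
}
```

Even at these small sizes the exponential column dwarfs the others, which is why the order of magnitude of T(n) usually matters more than its constant factors.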

slide-21
SLIDE 21

Faster Execution?

  • Option 1 (preferred): Reduce the order of magnitude of the running time

– Example: Change from quadratic time to linear time, or linear time to log time

  • Option 2 (better than nothing): Reduce the constant factor that multiplies the dominant term of the running time

– Example: Change from a larger slope for a linear function to a smaller slope

slide-22
SLIDE 22

Faster Execution

[Plot: T(n) versus n]

Reduce by order of magnitude: a·n + b

slide-23
SLIDE 23

Faster Execution

[Plot: T(n) versus n]

Reduce by order of magnitude: c·log(n) + d

slide-24
SLIDE 24

Faster Execution

[Plot: T(n) versus n]

Reduce by a constant factor: a·n + b

slide-25
SLIDE 25

Faster Execution

[Plot: T(n) versus n]

Reduce by a constant factor: (a/10)·n + b

slide-26
SLIDE 26

Example: Faster Linear Search

  • Goal: Reduce the constant factor in the execution time of linear search, i.e., reduce it from a·n + b to something like (a/10)·n + b

  • Approach: Reduce the number of items that need to be examined to find the one you’re looking for, because, e.g.: (a/10)·n + b = a·(n/10) + b

slide-27
SLIDE 27

Hashing: The Intuition

  • Instead of searching through all the items, store the items in many smaller buckets and search through only one bucket that

    1. Can be quickly identified, and
    2. Must contain the item you’re looking for

slide-30
SLIDE 30

How To Identify The Bucket

  • Suppose you need to search through n items of type T, and you decide to organize the items into m buckets

  • Given x of type T, compute from it some integer value h(x)

  • Look in bucket number h(x) mod m

slide-31
SLIDE 31

How To Identify The Bucket

The buckets have indices 0, 1, ..., m-1 in an array of buckets called a hash table.

slide-32
SLIDE 32

How To Identify The Bucket

The function that maps each value of type T to an integer is called the hash function.

slide-33
SLIDE 33

How To Identify The Bucket

By “reducing” the hash function result modulo m, you are guaranteed to get the index of some bucket.

slide-34
SLIDE 34

How To Identify The Bucket

The insight for hashing: if you put the item in this bucket when you store it, then it is the only place you need to look for it when searching.
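One practical detail when turning "h(x) mod m" into Java: the % operator takes the sign of its left operand, so a negative hash value would give a negative result rather than a bucket index in 0..m-1. The sketch below uses Math.floorMod instead; the helper name bucketIndex is hypothetical and not taken from the slides.

```java
/** Hypothetical helper; the name bucketIndex is not from the slides. */
class BucketIndexSketch {

    /** Returns h mod m as a value in 0..m-1, even when h is negative. */
    static int bucketIndex(int h, int m) {
        return Math.floorMod(h, m);
    }

    public static void main(String[] args) {
        int m = 3;
        for (int h : new int[] {13, 5, 0, -2}) {
            System.out.println(h + " -> bucket " + bucketIndex(h, m)
                    + " (Java's h % m gives " + (h % m) + ")");
        }
    }
}
```

For non-negative hash values the two agree; they differ only for negative ones, e.g., -2 lands in bucket 1 with floorMod while -2 % 3 is -2.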

slide-35
SLIDE 35

Set Representation With Hashing

  • Suppose the data representation for a new Set implementation, say Set4, uses an instance variable like this:

    /**
     * Buckets for hashing.
     */
    private Set<T>[] hashTable;

slide-36
SLIDE 36

Set Representation With Hashing

[Figure: an abstract Set and its data representation using several “little Sets”]

slide-37
SLIDE 37

Set Representation With Hashing

Can we really do this: use Sets in the representation of a Set? Why is it not circular?
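One plausible reading of why this is not circular is that the "little Sets" can use a different, simpler representation than the Set being built. The sketch below is not the OSU CSE Set4: it is a hypothetical class in which each bucket is a plain java.util.List searched linearly (playing the role of a Set2-like little set), and in which add silently ignores duplicates, which is an assumption of this sketch rather than the components' contract.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a hashing-based set; not the OSU CSE Set4.
 * Each bucket is a plain list searched linearly, standing in for a
 * "little Set" with a simpler representation.
 */
class BucketSetSketch<T> {

    private final List<List<T>> hashTable = new ArrayList<>();
    private int size = 0;

    /** Creates an empty set with m buckets (m must be positive). */
    BucketSetSketch(int m) {
        for (int i = 0; i < m; i++) {
            hashTable.add(new ArrayList<>());
        }
    }

    /** Picks the one bucket that may hold x. */
    private List<T> bucketFor(T x) {
        return hashTable.get(Math.floorMod(x.hashCode(), hashTable.size()));
    }

    /** Linear search, but through one bucket only. */
    boolean contains(T x) {
        return bucketFor(x).contains(x);
    }

    /** Adds x if absent, so the set never holds duplicates. */
    void add(T x) {
        List<T> bucket = bucketFor(x);
        if (!bucket.contains(x)) {
            bucket.add(x);
            size++;
        }
    }

    /** Removes x if present; reports whether it was there. */
    boolean remove(T x) {
        boolean removed = bucketFor(x).remove(x);
        if (removed) {
            size--;
        }
        return removed;
    }

    int size() {
        return size;
    }
}
```

Every operation touches a single bucket, so its cost depends on that bucket's length rather than on the size of the whole set.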

slide-38
SLIDE 38

Details

  • Suppose further (for illustration purposes) that:

– T = Integer
– h(x) = x
– m = |$this.hashTable| = 3

slide-39
SLIDE 39

Details

Here and in upcoming contracts, we’ll model Java arrays as mathematical strings.

slide-40
SLIDE 40

Examples

Abstract (this)    Concrete ($this.hashTable)
{}                 <{}, {}, {}>
{13}
{5, 13}
{-2, 13}

slide-41
SLIDE 41

Examples

Abstract (this)    Concrete ($this.hashTable)
{}                 <{}, {}, {}>
{13}               <{}, {13}, {}>
{5, 13}
{-2, 13}

slide-42
SLIDE 42

Examples

Abstract (this)    Concrete ($this.hashTable)
{}                 <{}, {}, {}>
{13}               <{}, {13}, {}>
{5, 13}
{-2, 13}

Why is 13 in bucket 1? h(x) mod m = h(13) mod 3 = 13 mod 3 = 1

slide-43
SLIDE 43

Examples

Abstract (this)    Concrete ($this.hashTable)
{}                 <{}, {}, {}>
{13}               <{}, {13}, {}>
{5, 13}            <{}, {13}, {5}>
{-2, 13}

slide-44
SLIDE 44

Examples

Abstract (this)    Concrete ($this.hashTable)
{}                 <{}, {}, {}>
{13}               <{}, {13}, {}>
{5, 13}            <{}, {13}, {5}>
{-2, 13}           <{}, {-2, 13}, {}>
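The rows of this table can be reproduced mechanically. The sketch below assumes T = Integer, h(x) = x, and m = 3 as stated earlier, and uses java.util.TreeSet for each bucket only so that the printed contents come out in a predictable order; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch reproducing the examples with h(x) = x. */
class HashTableExamplesSketch {

    /** Distributes xs into m buckets using bucket number x mod m. */
    static List<TreeSet<Integer>> represent(int m, int... xs) {
        List<TreeSet<Integer>> hashTable = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            hashTable.add(new TreeSet<>());
        }
        for (int x : xs) {
            // floorMod keeps the index in 0..m-1 for negative x as well
            hashTable.get(Math.floorMod(x, m)).add(x);
        }
        return hashTable;
    }

    public static void main(String[] args) {
        System.out.println(represent(3));         // [[], [], []]
        System.out.println(represent(3, 13));     // [[], [13], []]
        System.out.println(represent(3, 5, 13));  // [[], [13], [5]]
        System.out.println(represent(3, -2, 13)); // [[], [-2, 13], []]
    }
}
```

Note that -2 shares bucket 1 with 13 because -2 mod 3 = 1 in the mathematical sense, while 5 goes to bucket 2 because 5 mod 3 = 2.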

slide-45
SLIDE 45

Two-Level Thinking

[Diagram: two levels, with each abstract Set value paired with its concrete representation]

Abstract level    Concrete level
{}                <{}, {}, {}>
{13}              <{}, {13}, {}>
{5, 13}           <{}, {13}, {5}>
{-2, 13}          <{}, {-2, 13}, {}>

slide-46
SLIDE 46

The hashCode Method

  • In Java, the type Object defines this instance method to compute h, i.e., as the programmatic version of a hash function:

    public int hashCode()

  • As a best practice, nearly every type should override the default implementation of this method, because the default rarely meets the requirements necessary for the hashing idea to work!

slide-47
SLIDE 47

Requirements

  • The only requirement for the hashing idea to give correct behavior is that the hash function h should be a total function

  • In programming terms:

– hashCode has no precondition (so every PhoneNumber value has an int hash value)
– hashCode always returns the same int hash value for the same PhoneNumber value

slide-48
SLIDE 48

Requirements

The second condition, that hashCode always returns the same int hash value for the same PhoneNumber value, is the part that is not satisfied by the default implementation of hashCode that comes with Object.
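A minimal sketch of an override that meets this requirement follows. The class PhoneNumberSketch is hypothetical, and storing the digits in a String is an assumption made for illustration; the point is only that hashCode is computed from the value, so equal values always produce equal hash values.

```java
/**
 * Hypothetical PhoneNumber type for illustration; the digits are
 * stored as a String, which is an assumption of this sketch.
 */
final class PhoneNumberSketch {

    private final String digits;

    PhoneNumberSketch(String digits) {
        this.digits = digits;
    }

    /** Two phone numbers are equal when their digits are equal. */
    @Override
    public boolean equals(Object obj) {
        return obj instanceof PhoneNumberSketch
                && this.digits.equals(((PhoneNumberSketch) obj).digits);
    }

    /** Computed from the value, so equal values give equal hashes. */
    @Override
    public int hashCode() {
        return this.digits.hashCode();
    }

    public static void main(String[] args) {
        PhoneNumberSketch a = new PhoneNumberSketch("6142926446");
        PhoneNumberSketch b = new PhoneNumberSketch("6142926446");
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
        // Identity-based hashes, like those Object.hashCode typically
        // returns, are usually different for two distinct objects.
        System.out.println(System.identityHashCode(a)
                == System.identityHashCode(b));
    }
}
```

Without the override, two distinct objects holding the same digits would typically hash differently, so a search could look in the wrong bucket.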

slide-49
SLIDE 49

Good Hash Functions

  • To result in good performance of the hashing idea (not just correct behavior), hashCode should also:

– Give different output values for different input values
– Execute in constant time

slide-50
SLIDE 50

Good Hash Functions

Why can this not always be achieved?

slide-51
SLIDE 51

Example

  • Suppose type T is PhoneNumber, modeled as follows:

    PHONE_NUMBER_MODEL is string of integer
        exemplar p
        constraint |p| = 10 and
            entries(p) is subset of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

slide-52
SLIDE 52

Example

Maybe you need a Set<PhoneNumber> in developing a “contacts app” for a phone.

slide-53
SLIDE 53

Possible Hash Functions

  • The length of the phone-number string:

h("6142926446") = 10

  • The numerical value of the area code (first three digits) of the phone-number string:

h("6142926446") = 614

  • The numerical value of the last four digits of the phone-number string:

h("6142926446") = 6446

slide-54
SLIDE 54

Possible Hash Functions

There are many more options as well; how do you choose one?
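The three candidate hash functions are easy to write down in Java. The sketch below treats a phone number as a String of ten digit characters, which is an assumption for illustration; the class and method names are hypothetical.

```java
/** Hypothetical implementations of the three candidate hash functions. */
class PhoneHashSketch {

    /** Length of the phone-number string. */
    static int byLength(String p) {
        return p.length();
    }

    /** Numerical value of the area code (first three digits). */
    static int byAreaCode(String p) {
        return Integer.parseInt(p.substring(0, 3));
    }

    /** Numerical value of the last four digits. */
    static int byLastFour(String p) {
        return Integer.parseInt(p.substring(p.length() - 4));
    }

    public static void main(String[] args) {
        String p = "6142926446";
        System.out.println(byLength(p));   // 10
        System.out.println(byAreaCode(p)); // 614
        System.out.println(byLastFour(p)); // 6446
    }
}
```

All three are legitimate hash functions in the sense of correctness; they differ in how evenly they are likely to spread real phone numbers across buckets.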

slide-55
SLIDE 55

An Empirical Matter

  • How well hashing distributes the data among buckets depends, in part, on the data themselves

  • Your worst enemy, knowing your hash function, could always provide data that would result in no performance gain over linear search

– Everything might fall into one bucket...
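A small experiment makes the point concrete. The sketch below counts how many items land in each bucket for a given hash function; the phone numbers are made up for illustration, m = 5 is an arbitrary choice, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical experiment; the sample phone numbers are made up. */
class BucketDistributionSketch {

    /** Counts how many items land in each of m buckets under h. */
    static int[] bucketSizes(List<String> items,
            ToIntFunction<String> h, int m) {
        int[] sizes = new int[m];
        for (String item : items) {
            sizes[Math.floorMod(h.applyAsInt(item), m)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up contacts that all share one area code.
        List<String> contacts = Arrays.asList(
                "6145550101", "6145550112", "6145550123",
                "6145550134", "6145550145", "6145550156");
        int m = 5;
        // Area code as hash: every item falls into the same bucket.
        System.out.println(Arrays.toString(bucketSizes(contacts,
                p -> Integer.parseInt(p.substring(0, 3)), m)));
        // Last four digits as hash: the items spread out.
        System.out.println(Arrays.toString(bucketSizes(contacts,
                p -> Integer.parseInt(p.substring(6)), m)));
    }
}
```

With the area code as the hash, all six contacts share one bucket and a search degenerates to linear search over the whole set; with the last four digits, the same data spreads across all five buckets.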

slide-56
SLIDE 56

Resources

  • Big Java (4th ed), Sections 16.3-16.4

– https://library.ohio-state.edu/record=b8540788~S7
