Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions



slide-1
SLIDE 1

Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions

  • R. Joshua Tobin and David Malone

11 October 2012

slide-2
SLIDE 2

Hash Functions

We are talking about hash functions for consistent assignment. For example,

  • Hash tables,
  • Balancing packets across network paths (CEF, LAG, ECMP),
  • Service load balancing (BIG-IP),
  • Packets to CPUs (Microsoft RSS),
  • etc.

These are usually not cryptographic strength! Collisions are relatively easy to find.

slide-3
SLIDE 3

Outline

  • 1. Background motivation.
  • 2. Idea: learning and generating collisions.
  • 3. Three examples, each covering 3.1 the hash, 3.2 the attack, 3.3 the results.
  • 4. Conclusion.

There is an analysis of each attack in the paper.

slide-4
SLIDE 4

Background Motivation

  • Algorithmic Complexity Attacks (Crosby and Wallach, 2003).
  • Some algorithms have very different typical-case and worst-case costs.
  • Attack by choosing the input to be the worst case.
  • Can be applied to hash tables, sorting, string matching, ...
  • Hashes are canonical examples.
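
As a rough illustration of the gap between typical and worst case, the sketch below counts chain comparisons in a toy chained hash table. The hash (`weak_hash`, a plain byte sum) and both key sets are assumptions made up for this example; they are not taken from the talk.

```python
def weak_hash(key: bytes, buckets: int) -> int:
    """Deliberately weak hash (hypothetical): byte sum modulo the bucket count."""
    return sum(key) % buckets


def insert_cost(keys, buckets: int = 64) -> int:
    """Insert keys into a chained table and return the total chain comparisons."""
    table = [[] for _ in range(buckets)]
    comparisons = 0
    for key in keys:
        chain = table[weak_hash(key, buckets)]
        comparisons += len(chain)  # a real table scans the chain for duplicates
        chain.append(key)
    return comparisons


# Typical input: 1024 distinct keys that spread evenly over the buckets.
typical = [bytes([i % 256, i // 256]) for i in range(1024)]

# Attack input: 1024 distinct keys with the same byte sum, so one shared bucket.
attack = [bytes([i % 256, 255 - i % 256, i // 256, 3 - i // 256])
          for i in range(1024)]

print(insert_cost(typical), insert_cost(attack))
```

The attack keys all land in one chain, so the cost grows quadratically with the number of keys instead of staying close to linear.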
slide-5
SLIDE 5

Demonstration attack

[Figure: packets forwarded (pps) against time (s), comparing a random attack with a complexity attack.]

slide-6
SLIDE 6

How to Fix?

  • In general, use an algorithm with a good worst case.
  • Hash functions are too useful to give up, though.
  • Using crypto-strength hashes is often too slow?
  • What happens if the hash used is a secret?

Choose your hash randomly from a family on startup. (Advisories are still being released on this issue.)
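
A minimal sketch of "choose your hash randomly from a family on startup", assuming integer keys and a Carter-Wegman style family `((a*x + b) mod p) mod m`. The class name `UniversalHash` and the prime `P` are illustrative choices, not something the talk specifies. It shows only the keying step; it is not a claim that this family resists the collision-learning attacks the talk goes on to describe.

```python
import secrets

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be smaller than P


class UniversalHash:
    """One member of the family h(x) = ((a*x + b) mod P) mod buckets."""

    def __init__(self, buckets: int, a: int = None, b: int = None):
        self.buckets = buckets
        # Draw the secret key once, at startup, unless a key is given (for tests).
        self.a = a if a is not None else 1 + secrets.randbelow(P - 1)
        self.b = b if b is not None else secrets.randbelow(P)

    def __call__(self, x: int) -> int:
        return ((self.a * x + self.b) % P) % self.buckets


h = UniversalHash(buckets=16)  # a fresh, secret member of the family
print(h(12345))                # bucket index; differs from run to run
```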

slide-7
SLIDE 7

Hash Costs

[Figure: CPU time (us) per hash for Xor, Jenkins, Pearson, Universal, MD5, SHA and SHA256 on five CPUs: Geode 500MHz, Core 2 Duo 2.66GHz, Athlon 64 2.6GHz, Xeon 3GHz and Atom 1.6GHz.]

slide-8
SLIDE 8

Idea — Learning from collisions

  • 1. You usually can’t observe the hash output.
  • 2. You can often observe collisions (e.g. timing hash lookups, processing time, reordering, traceroute, server IDs, ...).
  • 3. By design, different hashes in the family should have different collisions.
  • 4. Observing collisions leaks information about the hash in use.

Can we use this to identify the hash function or generate collisions?

slide-9
SLIDE 9

Example 1: Small Hash Family

  • 1. Often the hash is keyed by an integer or a few bits.
  • 2. Suppose the number of hashes is small enough to iterate through.
  • 3. For example, Bob Jenkins’s hash in RFC 5475.
  • 4. Use 4 bits of output (e.g. 16 routes).
slide-10
SLIDE 10

Example 1: Small Hash Family

Attack:

  • 1. Make a list of all hashes.
  • 2. Find two colliding inputs (Birthday Paradox).
  • 3. Remove hashes that do not collide on these inputs.
  • 4. Repeat until one hash left.
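
The four steps above can be sketched as follows. This is a toy under stated assumptions: the family is 1024 keys of a keyed BLAKE2b truncated to 4 bits (a stand-in, not the Jenkins hash from the slides), and the side channel is modelled by a hypothetical `collides(x, y)` callback that reveals only whether two inputs collide under the secret key.

```python
import hashlib
import itertools


def keyed_hash(key: int, data: bytes) -> int:
    """Stand-in keyed hash with 4 bits of output (an assumption, not Jenkins)."""
    digest = hashlib.blake2b(data, key=key.to_bytes(2, "big"),
                             digest_size=1).digest()
    return digest[0] & 0x0F


def identify_key(collides, family_size: int = 1024):
    """Return (key, probes_used) using only collision observations."""
    candidates = list(range(family_size))        # 1. list all hashes
    probes = []
    for n in itertools.count():
        probe = n.to_bytes(4, "big")
        for earlier in probes:
            if collides(earlier, probe):         # 2. an observed collision
                candidates = [k for k in candidates   # 3. drop non-colliders
                              if keyed_hash(k, earlier) == keyed_hash(k, probe)]
        probes.append(probe)
        if len(candidates) == 1:                 # 4. stop when one hash is left
            return candidates[0], len(probes)


secret = 777  # hidden from the attacker; only collisions are revealed
oracle = lambda x, y: keyed_hash(secret, x) == keyed_hash(secret, y)
print(identify_key(oracle))
```

With 16 output values, each observed collision is expected to remove about 15 in 16 of the wrong candidates, which is why only a handful of probe strings is needed for a family of this size.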
slide-11
SLIDE 11

Example 1: Small Hash Family

[Figure: number of probe strings against number of hashes (10 to 1e+09, log scale), showing attempts, an optimistic estimate and a conservative estimate.]

slide-12
SLIDE 12

Example 2: Pearson’s Hash

In 1990 Pearson proposed a neat, fast, randomly keyed hash, using a random permutation T of the byte values and xor (⊕). To hash a string of bytes:

  • 1. h ← 0
  • 2. foreach (byte[i])

h ← T[byte[i] ⊕ h]

  • 3. return h

The family is really big: 256! (256 factorial) permutations.
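
A direct transcription of the pseudocode into Python, as a small sketch. The helper `make_table` and its `seed` parameter are assumptions added so the example is reproducible; a real deployment would draw the permutation from a secure random source.

```python
import random


def make_table(seed=None):
    """Return a random permutation T of the byte values 0..255 (the key)."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table


def pearson(table, data: bytes) -> int:
    """Pearson's hash: h <- T[byte xor h] for each input byte."""
    h = 0
    for byte in data:
        h = table[byte ^ h]
    return h


T = make_table(seed=2012)
print(pearson(T, b"hash pile ups"))  # a value in 0..255 that depends on T
```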

slide-13
SLIDE 13

Example 2: Pearson’s Hash

Attack: Recover the permutation.

  • 1. Insert all strings x000...0 and 0y00...0.
  • 2. Algebra: these collide in pairs (a, b) where T(a) = T(0) ⊕ b.
  • 3. From the collisions, we know the pairs (using 2*256 strings).
  • 4. T(0) is the remaining unknown (a small family, so we get it in 256 + a small number of strings).

The attack generalises to replacing bytes and xor with any group.
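
The recovery can be simulated in a few lines. In this sketch the side channel is modelled by a hypothetical `observe(strings)` function that returns groups of colliding strings without revealing hash values; two-byte strings stand in for x000...0 and 0y00...0, and four-byte random probes are used to pin down T(0). These modelling choices are assumptions for illustration.

```python
import random


def pearson(table, data: bytes) -> int:
    h = 0
    for byte in data:
        h = table[byte ^ h]
    return h


def make_observer(secret_table):
    """Stand-in side channel: groups of colliding strings, no hash values."""
    def observe(strings):
        buckets = {}
        for s in strings:
            buckets.setdefault(pearson(secret_table, s), []).append(s)
        return list(buckets.values())
    return observe


def recover_table(observe, rng):
    xs = [bytes([x, 0]) for x in range(256)]       # strings x0
    ys = [bytes([0, y]) for y in range(1, 256)]    # strings 0y (00 is in xs)
    offset = [0] * 256                             # offset[x] = T[x] xor T[0]
    for group in observe(xs + ys):
        x = next(s[0] for s in group if s[1] == 0)
        y = next(s[1] for s in group if s[0] == 0)
        offset[x] = y                              # collision means T[x] = T[0] xor y
    # T[0] is the only unknown: try all 256 candidate tables.
    candidates = [[t0 ^ off for off in offset] for t0 in range(256)]
    extra = 0
    while len(candidates) > 1:
        probe = bytes(rng.randrange(256) for _ in range(4))
        extra += 1
        # The x0 strings cover every bucket, so the probe has exactly one mate.
        mate = next(g for g in observe(xs + [probe]) if probe in g)[0]
        candidates = [t for t in candidates
                      if pearson(t, probe) == pearson(t, mate)]
    return candidates[0], extra


secret = list(range(256))
random.Random(7).shuffle(secret)
table, extra = recover_table(make_observer(secret), random.Random(1))
print(table == secret, extra)
```

Each random probe is expected to rule out nearly all wrong guesses for T(0), so the loop normally ends after one or two extra strings.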

slide-14
SLIDE 14

Example 2: Pearson’s Hash

[Figure: fraction of trials against the number of random strings hashed to recover T, comparing 1,000,000 trials with the prediction.]

slide-15
SLIDE 15

Example 3: Toeplitz Hash

Microsoft have a standard for network cards to hand off packets to CPUs (RSS). The key K is a longish bit string.

  • 1. r ← 0
  • 2. foreach bit b in input:

if (b == 1) r ← r ⊕ left-most 32 bits of K
shift K left 1 bit position

  • 3. return r

In practice you use 1–7 bits and might pass through a lookup table to choose CPU.
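
A minimal Python sketch of the pseudocode above, assuming the input bits are read most-significant bit first and the key is given as bytes. Instead of shifting K, it reads a 32-bit window that slides one position per input bit, which is equivalent. The key used here is made up; it is not an official RSS test vector.

```python
def toeplitz(key: bytes, data: bytes) -> int:
    """Toeplitz hash: xor a sliding 32-bit window of the key per set input bit."""
    key_bits = len(key) * 8
    data_bits = len(data) * 8
    if data_bits > key_bits - 31:
        raise ValueError("key too short for this input length")
    k = int.from_bytes(key, "big")
    r = 0
    for i in range(data_bits):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        if bit:
            # Left-most 32 bits of K after i left shifts.
            r ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return r


key = bytes([1, 2, 3, 4]) + bytes(36)      # hypothetical 40-byte key
print(hex(toeplitz(key, b"\x80")))         # only the first input bit is set
cpu_index = toeplitz(key, b"\xc0") & 0x0F  # e.g. keep 4 bits (assumed low bits)
```

Because every step is an xor of key windows, the hash of `a xor b` equals the xor of the hashes of `a` and `b`.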

slide-16
SLIDE 16

Example 3: Toeplitz Hash

Attack: It’s linear over $\mathbb{Z}_2$, use some linear algebra.

  • 1. Choose the bits of the input you control. Set one to zero at a time.
  • 2. Group the bits according to which collide ($E_1, \ldots, E_l$).
  • 3. For any even-sized subsets $E'_1, \ldots, E'_l$ of $E_1, \ldots, E_l$,

$$h\Bigl(x + \sum_{e \in E'_i} e\Bigr) = h(x) + \sum_{e \in E'_i} h(e) = h(x),$$

  • 4. So every even-sized subset collection gives a collision.

Can work with other linear functions too, but more effective for low index.
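
The attack steps can be sketched as below, under several assumptions: the attacker controls all 64 input bits, the victim uses the low 4 bits of the hash, the base input x is all ones (so flipping a bit sets it to zero), and a hypothetical `observe(inputs)` function returns only groups of colliding inputs. Only pairs, the smallest even-sized subsets, are generated; larger even-sized subsets and combinations across groups collide for the same reason.

```python
import itertools


def toeplitz(key: bytes, data: bytes) -> int:
    key_bits = len(key) * 8
    k = int.from_bytes(key, "big")
    r = 0
    for i in range(len(data) * 8):
        if (data[i // 8] >> (7 - i % 8)) & 1:
            r ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return r


def make_observer(key: bytes, out_bits: int):
    """Stand-in side channel: groups of colliding inputs, no hash values."""
    mask = (1 << out_bits) - 1
    def observe(inputs):
        buckets = {}
        for d in inputs:
            buckets.setdefault(toeplitz(key, d) & mask, []).append(d)
        return list(buckets.values())
    return observe


def flip(data: bytes, positions) -> bytes:
    out = bytearray(data)
    for p in positions:
        out[p // 8] ^= 0x80 >> (p % 8)
    return bytes(out)


def colliding_inputs(observe, base: bytes, limit: int):
    """Return up to `limit` inputs that collide with `base`."""
    probes = {flip(base, [p]): p for p in range(len(base) * 8)}  # step 1
    groups = [[probes[d] for d in g] for g in observe(list(probes))]  # step 2
    found = []
    for group in groups:
        # Bits in one group have equal h(e), so any pair cancels (steps 3 and 4).
        for pair in itertools.combinations(group, 2):
            found.append(flip(base, pair))
            if len(found) == limit:
                return found
    return found


key = bytes((i * 37 + 11) % 256 for i in range(40))  # made-up secret key
base = b"\xff" * 8
hits = colliding_inputs(make_observer(key, 4), base, 20)
print(len(hits))
```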

slide-17
SLIDE 17

Example 3: Toeplitz Hash

[Figure: mean lookup time against the number of basis bits used by the attacker, for the base attack on linear indirection, the base attack on non-linear indirection, and the modified attack on non-linear indirection.]

slide-18
SLIDE 18

Conclusion

  • 1. Algorithmic Complexity Attacks.
  • 2. For hashes, choosing from a family is useful.
  • 3. However, collisions leak information.
  • 4. This means you need to choose the family carefully.
  • 5. A small family is bad.
  • 6. Structure such as linearity or a group structure is bad.