How fast are hash functions? D. J. Bernstein, University of Illinois

SLIDE 1

How fast are hash functions?

D. J. Bernstein

University of Illinois at Chicago

NSF ITR–0716498, in cooperation with IST–2002–507932 ECRYPT

SLIDE 2

Depends on the volume of data being hashed. For small data volumes one can often see big costs for finalization, block padding, etc.

Depends on the CPU. E.g., hashing 1000 bytes with OpenSSL SHA-512 takes 118366 cycles on a Pentium 3 but only 16822 cycles on an Athlon 64.
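To make the small-input overhead concrete, here is a minimal sketch assuming the standard SHA-512 padding rule from FIPS 180: the message is followed by one 0x80 byte, zero bytes, and a 16-byte length field, and the total must fill whole 128-byte blocks. The helper names sha512_blocks and cycles_per_byte are hypothetical; the first counts compression-function calls for an n-byte message, the second just divides a cycle count by the input length.

```c
/* Block count for SHA-512 (FIPS 180): message, one 0x80 byte, zero
   padding and a 16-byte length field must fill whole 128-byte blocks,
   so at least 17 bytes of padding are always added. */
unsigned long long sha512_blocks(unsigned long long n)
{
    return (n + 17 + 127) / 128;  /* ceil((n + 17) / 128) */
}

/* Average cost per input byte; assumes bytes > 0. */
double cycles_per_byte(double cycles, unsigned long long bytes)
{
    return cycles / (double)bytes;
}
```

For instance, sha512_blocks(1) and sha512_blocks(111) are both 1 while sha512_blocks(112) is 2, and a 1000-byte input needs 8 compression calls rather than 1000/128 ≈ 7.8. Applied to the figures above, cycles_per_byte gives about 118.4 cycles/byte on the Pentium 3 and about 16.8 on the Athlon 64.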

SLIDE 3–6

Depends on the hash function.

What is the fastest function (given a CPU and data volume)? “Surely a broken function.”

What is the fastest function that hasn’t been broken?

What is the fastest function for which half the rounds haven’t been broken?

SLIDE 7

Experience shows that slower functions attract much less interest. The fastest functions are the most tempting cryptanalytic targets. If they survive scrutiny, users favor them. E.g., AES selected Rijndael; eSTREAM’s software profile selected HC-128, Rabbit, Salsa20/12, and Sosemanuk.

SLIDE 8–9

Does hash speed really matter? Usually it doesn’t! Most computers aren’t flooded with cryptographic operations; they can afford many more rounds. Some computers are flooded, but is hashing the real issue?

“Yes: I’m using HMAC-MD5 on every packet, including denial-of-service forgeries. I can’t afford HMAC-SHA-256!” That’s an obsolete application. We have faster non-hash MACs that inspire more confidence.

SLIDE 10

Better example: “This computer is busy verifying public-key signatures.” My favorite example: Internet DNS security (currently nonexistent) would need DNS caches to verify signatures on every packet. These caches are centralized, hard to split, and often very heavily loaded. Hashing, even for short packets, can easily dominate the time for signature verification!

SLIDE 11

Measuring what users will see

Look back at 1999.03 Bassham, “NIST’s Efficiency Testing for Round1 AES Candidates.” Remember CRYPTON? Fastest cipher in NIST’s tables! Fastest cipher in NIST’s graphs! 720 cycles key setup; 669 cycles encrypt. NIST’s numbers for Rijndael: 6787 cycles key setup; 809 cycles encrypt.

SLIDE 12

Quite different figures in 1999 Schneier–Kelsey–Whiting–Wagner–Hall–Ferguson, “AES performance comparisons.” Faster CRYPTON encryption: 955 cycles key setup; 345 cycles encrypt. But much faster Rijndael: 850 cycles key setup; 291 cycles encrypt.

SLIDE 13

Users who care about speed will use the faster Rijndael software, not the painfully slow software that NIST tested. So the painfully slow software was discarded, and NIST’s tests were disregarded. 1999.10 NIST: “CRYPTON is ... slower than either Rijndael or Twofish on most platforms.”

SLIDE 14

1999 Schneier–Kelsey–Whiting–Wagner–Hall–Ferguson: “Performance is only important in assembly language. ... Any application which has speed as a requirement will code the encryption algorithm in assembly. ... Optimized assembly implementations of AES will be available on the Internet. If performance is critical, it will be in assembly.”

SLIDE 15

I still see some speed papers saying that a “fair comparison” of algorithms means a comparison of speeds of novice student implementations in Java. That’s “fair”? No. It’s idiotic. Or dishonest: “We couldn’t compete, so we rewrote the competition to slow it down.” Benchmarks, just like users, should focus on the fastest implementations available. Designers with slow software should try to speed it up.

SLIDE 16

The importance of automation

During the AES competition, Biham and Knudsen proposed a 2× security margin. NIST decided on (1 + ε)×. But NIST refused to consider reduced-round Serpent etc.: “Changing the number of rounds would impact the large amount of performance analysis from Rounds 1 and 2. All performance data for the modified algorithm would need to be either estimated or performed again.”

SLIDE 17

2004: eSTREAM called for stream-cipher submissions implementing a particular API. Received more than 30 submissions from 97 cryptographers. De Cannière published software to time the submissions. The software has been run on dozens of different CPUs. Designers have seen the results and provided faster implementations of the submissions. Timings were easily updated.

SLIDE 18

2006, joint work with Lange: eBATS (ECRYPT Benchmarking of Asymmetric Systems). DH; signatures; encryption. New benchmarking software: much more data collected, improved portability, etc. As in eSTREAM benchmarking, anyone can submit new software or improve existing software. More than 20 submissions so far, providing more than 100 public-key systems. All submissions are then measured on many computers.

SLIDE 19

Today’s announcement: eBASH (ECRYPT Benchmarking of All Submitted Hashes). Anyone can submit a new hash function or a new implementation of an old function. We’ll measure all the software on many CPUs. The benchmarking software has been further improved: e.g., automatic ABI stratification, separating 32-bit compilers from 64-bit compilers on amd64, etc. Suggestions? Talk to us!

SLIDE 20

What eBASH does

Currently we’re measuring the time to hash 0 bytes, the time to hash 1 byte, the time to hash 2 bytes, ..., the time to hash 8192 bytes. Aligned input and output: typical for speed-oriented applications, but maybe we should also measure unaligned. Currently 1 core, 1 thread: no hint of speedups from massively parallel computation of, e.g., tree-structured hashes.
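A per-length measurement of this kind can be sketched as follows, under stated assumptions. It reuses the crypto_hash-style signature from the eBASH API examples, plugs in toy_hash, a hypothetical non-cryptographic stand-in (64-bit FNV-1a) so the sketch compiles on its own, and times batches of calls with portable clock(), reporting the median batch. The real eBASH software counts CPU cycles; the batch sizes and the use of a median here are assumptions, not eBASH’s documented method.

```c
#include <stdlib.h>
#include <time.h>

/* Signature modeled on crypto_hash_sha512_openssl from the eBASH examples. */
typedef void (*hash_fn)(unsigned char *out, const unsigned char *in,
                        unsigned long long inlen);

/* Hypothetical stand-in: 64-bit FNV-1a. NOT a cryptographic hash; it only
   gives the timing loop something to call. Writes 8 bytes to out. */
void toy_hash(unsigned char *out, const unsigned char *in,
              unsigned long long inlen)
{
    unsigned long long h = 14695981039346656037ULL;  /* FNV offset basis */
    for (unsigned long long i = 0; i < inlen; i++) {
        h ^= in[i];
        h *= 1099511628211ULL;                       /* FNV prime */
    }
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(h >> (8 * i));      /* little-endian */
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Middle element after sorting (the median when n is odd); sorts v in place. */
double median(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_double);
    return v[n / 2];
}

#define BATCHES 15   /* assumption: timing samples per input length */
#define CALLS 1000   /* assumption: calls per sample, to beat clock() granularity */

static volatile unsigned char sink;  /* keeps the calls from being optimized away */

/* Median CPU time in seconds per call to hash inlen bytes of in.
   The local out buffer holds up to 64 bytes of hash output. */
double time_hash(hash_fn h, const unsigned char *in, unsigned long long inlen)
{
    unsigned char out[64];
    double sample[BATCHES];
    for (int b = 0; b < BATCHES; b++) {
        clock_t t0 = clock();
        for (int c = 0; c < CALLS; c++) {
            h(out, in, inlen);
            sink ^= out[0];
        }
        sample[b] = (double)(clock() - t0) / CLOCKS_PER_SEC / CALLS;
    }
    return median(sample, BATCHES);
}
```

A driver would then loop len = 0, 1, ..., 8192 over one aligned buffer (for example `static _Alignas(64) unsigned char buf[8192];` in C11) and call time_hash(toy_hash, buf, len) for each length, replacing toy_hash with a real hash function.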

SLIDE 21

Measurement machines run Linux, BSD, Solaris, etc., on a wide variety of CPUs. Largest machine contributor: NMI Build and Test Laboratory at the University of Wisconsin. We’re experimenting with ARMs but haven’t included any yet. Currently no 8-bit CPUs. No FPGAs. No ASICs. Will hardware benchmarking be stuck forever in the dark ages?

SLIDE 22

For each hash function, we currently try 796 combinations of C compilers and compiler options. Measurements of the function use the compiler and options that hash 1536 bytes most efficiently. “Have to write in C?” No; can use C++ or asm. Need other options? Tell us!
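The selection rule above amounts to taking a minimum over the timed builds. A minimal sketch, assuming each compiler-and-options build has already been timed on a 1536-byte input; struct build and pick_best are hypothetical names, not part of the eBASH software.

```c
#include <stddef.h>

/* One compiled variant of a hash function. */
struct build {
    const char *compiler;   /* compiler and options used for this build */
    double cycles_1536;     /* measured cycles to hash 1536 bytes */
};

/* Index of the build that hashes 1536 bytes fastest, or -1 if n == 0.
   All other timings of the function would then come from this build. */
int pick_best(const struct build *b, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || b[i].cycles_1536 < b[best].cycles_1536)
            best = (int)i;
    return best;
}
```

Fixing a single input length for the choice keeps the selection simple; the chosen build is then measured at every length from 0 to 8192 bytes.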

SLIDE 23

eBASH API examples

The eBASH software has two files that measure the OpenSSL SHA-512 software. hash/sha512/openssl/hash.c:

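```c
#include <openssl/sha.h>

/* eBASH entry point: hash inlen bytes at in, write the 64-byte digest to out. */
void crypto_hash_sha512_openssl(
  unsigned char *out,
  const unsigned char *in,
  unsigned long long inlen)
{
  SHA512(in, inlen, out);  /* OpenSSL one-shot: init, update, finalize */
}
```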

SLIDE 24

hash/sha512/openssl/hash.h:

```c
#include <openssl/opensslv.h>
#define CRYPTO_hash_sha512_openssl_VERSION OPENSSL_VERSION_TEXT
#define CRYPTO_hash_sha512_openssl_BYTES 64
```

The VERSION macro is optional; its value is copied into the database of timings.

SLIDE 25

Integrating Whirlpool: hash/whirlpool/ref contains two files copied from the Whirlpool reference code; a hash.h defining BYTES as 64; and an easy 19-line hash.c built from the lower-level functions in the reference code. hash/whirlpool/checksum (also optional) contains an expected hash output. Any compiled code that fails to produce this output is left out of the measurements.
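The checksum rule can be sketched as a comparison between the digest a build produces and the expected output stored in the checksum file. This is a minimal sketch assuming the file holds the digest as lowercase hexadecimal, optionally followed by a newline; to_hex and checksum_matches are hypothetical helpers, not eBASH code, and the real checksum format may differ.

```c
#include <stddef.h>
#include <string.h>

/* Write len bytes as lowercase hex into hex, which must hold 2*len+1 chars. */
void to_hex(char *hex, const unsigned char *bytes, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        hex[2 * i] = digits[bytes[i] >> 4];
        hex[2 * i + 1] = digits[bytes[i] & 15];
    }
    hex[2 * len] = '\0';
}

/* 1 if the len-byte digest out matches expected (assumed: lowercase hex,
   optionally ending in a newline), else 0. A build returning 0 would be
   left out of the measurements. */
int checksum_matches(const unsigned char *out, size_t len, const char *expected)
{
    char hex[2 * 64 + 1];          /* room for digests up to 64 bytes */
    if (len > 64)
        return 0;
    to_hex(hex, out, len);
    if (strncmp(hex, expected, 2 * len) != 0)
        return 0;
    /* reject trailing junk such as an extra hex digit */
    return expected[2 * len] == '\0' || expected[2 * len] == '\n';
}
```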

SLIDE 26

Examples of easy graphs to draw

WARNING: These graphs currently include only a few hash-function implementations. If you have a faster implementation, sorry: it isn’t in these graphs. Advertise it by submitting it to eBASH! We’ll easily update benchmarks, graphs, etc.

SLIDE 27

MD5 cycles on orpheus, x86 arch, Pentium 3 672:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 28

MD5 cycles on fireball, x86 arch, Pentium 4 f12:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 29

MD5 cycles on nmi-0056, x86 arch, Xeon f41:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 30

MD5 cycles on thoth, x86 arch, Athlon:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 31

MD5 cycles on katana, x86 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 32

MD5 cycles on nmi-0104, x86 arch, Pentium D f64:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 33

MD5 cycles on nmi-0020, ia64 arch, Itanium II:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 34

MD5 cycles on gggg, ppc32 arch, PowerPC G4 7410:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 35

MD5 cycles on nmi-0056, amd64 arch, Xeon f41:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 36

MD5 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 37

MD5 cycles on nmi-0104, amd64 arch, Pentium D f64:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 38

MD5 cycles on mace, amd64 arch, Athlon 64 X2:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 39

MD5 cycles/byte on orpheus, x86 arch, Pentium 3 672:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 40

MD5 cycles/byte on fireball, x86 arch, Pentium 4 f12:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 41

MD5 cycles/byte on nmi-0056, x86 arch, Xeon f41:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 42

MD5 cycles/byte on thoth, x86 arch, Athlon:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 43

MD5 cycles/byte on katana, x86 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 44

MD5 cycles/byte on nmi-0104, x86 arch, Pentium D f64:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 45

MD5 cycles/byte on nmi-0020, ia64 arch, Itanium II:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 46

MD5 cycles/byte on gggg, ppc32 arch, PowerPC G4 7410:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 47

MD5 cycles/byte on nmi-0056, amd64 arch, Xeon f41:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 48

MD5 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 49

MD5 cycles/byte on nmi-0104, amd64 arch, Pentium D f64:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 50

MD5 cycles/byte on mace, amd64 arch, Athlon 64 X2:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 51

SHA-512 cycles on orpheus, x86 arch, Pentium 3 672:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 52

SHA-512 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 53

SHA-512 cycles/byte on orpheus, x86 arch, Pentium 3 672:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 54

SHA-512 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 55

MD4 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 56

MD5 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 57

SHA-1 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 58

SHA-256 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 59

SHA-512 cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 60

Whirlpool cycles on katana, amd64 arch, Core 2 Duo:

[Graph: cycles (vertical, up to 6000) vs. input length (horizontal, up to 1000 bytes).]

SLIDE 61

MD4 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 62

MD5 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 63

SHA-1 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 64

SHA-256 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 65

SHA-512 cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]

SLIDE 66

Whirlpool cycles/byte on katana, amd64 arch, Core 2 Duo:

[Graph: cycles/byte (vertical, log scale 1 to 100) vs. input length (horizontal, log scale 1 to 10^4 bytes).]