Sorting integer arrays: security, speed, and verification (PowerPoint PPT Presentation)


SLIDE 1

Sorting integer arrays: security, speed, and verification

  • D. J. Bernstein

SLIDE 2

Bob’s laptop screen:

  From: Alice
  Thank you for your submission. We received many interesting papers, and unfortunately your

Bob assumes this message is something Alice actually sent. But today’s “security” systems fail to guarantee this property. Attacker could have modified or forged the message.
SLIDE 3

Trusted computing base (TCB)

TCB: portion of computer system that is responsible for enforcing the users’ security policy.

Security policy for this talk: If message is displayed on Bob’s screen as “From: Alice” then message is from Alice.

If TCB works correctly, then message is guaranteed to be from Alice, no matter what the rest of the system does.

SLIDE 4

Examples of attack strategies:

  • 1. Attacker uses buffer overflow in a device driver to control Linux kernel on Alice’s laptop.
  • 2. Attacker uses buffer overflow in a web browser to control disk files on Bob’s laptop.

Device driver is in the TCB. Web browser is in the TCB. CPU is in the TCB. Etc.

Massive TCB has many bugs, including many security holes. Any hope of fixing this?

SLIDE 5

Classic security strategy: Rearchitect computer systems to have a much smaller TCB. Carefully audit the TCB.

e.g. Bob runs many VMs: VM A (Alice data), VM C (Charlie data), · · · TCB stops each VM from touching data in other VMs.

Browser in VM C isn’t in TCB. Can’t touch data in VM A, if TCB works correctly. Alice also runs many VMs.

SLIDE 6

Cryptography

How does Bob’s laptop know that incoming network data is from Alice’s laptop?

Cryptographic solution: Message-authentication codes.

  • Alice’s message, k → authenticated message → untrusted network → authenticated message, k → Alice’s message
  • Alice’s message, k → authenticated message → untrusted network → modified message, k → “Alert: forgery!”
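The message-authentication flow above can be made concrete with a short sketch. The talk does not commit to a particular MAC, so the choice of HMAC-SHA-256 (from Python’s standard library) and the 32-byte tag layout are illustrative assumptions:

```python
import hashlib
import hmac

def authenticate(k: bytes, message: bytes) -> bytes:
    """Alice: append a tag computed from the shared secret k."""
    tag = hmac.new(k, message, hashlib.sha256).digest()
    return message + tag

def verify(k: bytes, packet: bytes) -> bytes:
    """Bob: recompute the tag; reject any packet whose tag doesn't match."""
    message, tag = packet[:-32], packet[-32:]
    expected = hmac.new(k, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("Alert: forgery!")
    return message

k = b"shared 32-byte secret..........."   # Alice and Bob both know k
packet = authenticate(k, b"From: Alice\nThank you for your submission.")
assert verify(k, packet) == b"From: Alice\nThank you for your submission."

# An attacker on the untrusted network modifies the packet:
tampered = packet.replace(b"Alice", b"Chuck")
try:
    verify(k, tampered)
except ValueError as e:
    print(e)  # Alert: forgery!
```

Note the use of `hmac.compare_digest` rather than `==`: a naive byte-by-byte comparison would itself leak timing information about the expected tag.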

SLIDE 7

Important for Alice and Bob to share the same secret k. What if attacker was spying on their communication of k?

Solution 1: Public-key encryption.

  • k, public key aG → ciphertext → network → ciphertext, private key a → k
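A toy model of this flow: the slide writes the public key as aG (elliptic-curve notation); the sketch below expresses the same idea in multiplicative notation, with public key g^a mod p and a Diffie-Hellman-style exchange deriving the shared secret k. The group parameters here are deliberately tiny illustrative assumptions, nowhere near real security:

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 with q prime; g generates the order-q subgroup.
# Far too small for real security -- illustration only.
q = 1019
p = 2 * q + 1          # 2039, also prime
g = 4                  # a square, so it generates the order-q subgroup

# Alice: private key a, public key g^a (the slide's "aG" in additive notation).
a = secrets.randbelow(q - 1) + 1
A = pow(g, a, p)

# Bob: picks an ephemeral secret b, sends g^b; both sides derive the same k
# without k (or a, or b) ever crossing the untrusted network.
b = secrets.randbelow(q - 1) + 1
B = pow(g, b, p)

k_bob   = hashlib.sha256(str(pow(A, b, p)).encode()).digest()  # (g^a)^b
k_alice = hashlib.sha256(str(pow(B, a, p)).encode()).digest()  # (g^b)^a
assert k_alice == k_bob
```

The derived k then plays exactly the role of the shared secret in the message-authentication flow of the previous slide.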
SLIDE 8

Solution 2: Public-key signatures.

  • m, private key a → signed message → network → signed message, public key aG → m

No more shared secret k but Alice still has secret a. Cryptography requires TCB to protect secrecy of keys, even if user has no other secrets.
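The signature flow can be modeled with a toy Schnorr signature, again in multiplicative notation (public key g^a standing in for the slide’s aG). This is an illustrative sketch under assumed tiny parameters, not the scheme the talk has in mind; it shows why only Alice needs the secret a while anyone with the public key can verify:

```python
import hashlib
import secrets

# Toy Schnorr signatures. Parameters far too small for real security.
q = 1019               # prime order of the subgroup
p = 2 * q + 1          # 2039, prime
g = 4                  # generator of the order-q subgroup

def keygen():
    a = secrets.randbelow(q - 1) + 1   # Alice's secret
    return a, pow(g, a, p)             # (secret a, public key g^a)

def H(R: int, m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(str(R).encode() + m).digest(), "big") % q

def sign(a: int, m: bytes):
    r = secrets.randbelow(q - 1) + 1
    R = pow(g, r, p)                   # commitment
    s = (r + H(R, m) * a) % q          # response binds m to the secret a
    return R, s

def verify(A: int, m: bytes, sig) -> bool:
    R, s = sig
    # g^s = g^r * g^(e*a) = R * A^e, so the check passes iff s was
    # computed with the secret a matching the public key A.
    return pow(g, s, p) == (R * pow(A, H(R, m), p)) % p

a, A = keygen()
sig = sign(a, b"From: Alice")
assert verify(A, b"From: Alice", sig)                 # Bob accepts
R, s = sig
assert not verify(A, b"From: Alice", (R, (s + 1) % q))  # tampered signature rejected
```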

SLIDE 9

Constant-time software

Large portion of CPU hardware: optimizations depending on addresses of memory locations. Consider data caching, instruction caching, parallel cache banks, store-to-load forwarding, branch prediction, etc.

Many attacks (e.g. TLBleed from 2018 Gras–Razavi–Bos–Giuffrida) show that this portion of the CPU has trouble keeping secrets.
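This discipline is exactly what constant-time sorting (the talk’s title topic, e.g. Bernstein’s djbsort) relies on: replace data-dependent branches with a branchless compare-exchange inside a fixed sorting network. The sketch below only models the arithmetic data flow in Python; actual constant-time behavior has to be established in C or assembly on real hardware, and the mask trick assumes values of magnitude below 2^62:

```python
# Branchless compare-exchange ("minmax"): the core step of a sorting network.
# No branch and no memory address depends on the secret values, so the
# instruction and access pattern is independent of the data being sorted.

def minmax(x: int, y: int) -> tuple[int, int]:
    """Return (min, max) without a data-dependent branch."""
    m = (y - x) >> 63        # all-ones mask (-1) iff y < x, else 0
    c = (x ^ y) & m          # XOR-swap mask: nonzero only when a swap is needed
    return x ^ c, y ^ c

def sort4(a: list[int]) -> list[int]:
    """Sort 4 elements with a fixed sorting network: the sequence of
    compare-exchanges never depends on the data, unlike quicksort."""
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        a[i], a[j] = minmax(a[i], a[j])
    return a

assert sort4([4, 1, 3, 2]) == [1, 2, 3, 4]
```

Contrast this with `if x > y: swap`, whose branch predictor and code path leak which comparisons succeeded, i.e. information about the secret array.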

SLIDE 10

Typical literature on this topic: Understand this portion of CPU. But details are often proprietary, not exposed to security review. Try to push attacks further. This becomes very complicated. Tweak the attacked software to try to stop the known attacks.

For researchers: This is great! For auditors: This is a nightmare. Many years of security failures. No confidence in future security.

slide-49
SLIDE 49

9

Constant-time software portion of CPU hardware:

  • ptimizations depending on

addresses of memory locations. Consider data caching, instruction caching, rallel cache banks, re-to-load forwarding, prediction, etc. attacks (e.g. TLBleed from Gras–Razavi–Bos–Giuffrida) that this portion of the CPU trouble keeping secrets.

10

Typical literature on this topic: Understand this portion of CPU. But details are often proprietary, not exposed to security review. Try to push attacks further. This becomes very complicated. Tweak the attacked software to try to stop the known attacks. For researchers: This is great! For auditors: This is a nightmare. Many years of security failures. No confidence in future security. The “constant-time” Don’t give to this p (1987 Goldreich, Oblivious domain-sp

slide-50
SLIDE 50

9

software CPU hardware: depending on memory locations. caching, caching, banks, rwarding, rediction, etc. (e.g. TLBleed from Gras–Razavi–Bos–Giuffrida)

  • rtion of the CPU

eeping secrets.

10

Typical literature on this topic: Understand this portion of CPU. But details are often proprietary, not exposed to security review. Try to push attacks further. This becomes very complicated. Tweak the attacked software to try to stop the known attacks. For researchers: This is great! For auditors: This is a nightmare. Many years of security failures. No confidence in future security. The “constant-time” Don’t give any secrets to this portion of the (1987 Goldreich, 1990 Oblivious RAM; 2004 domain-specific for


slide-60
SLIDE 60

11

The “constant-time” solution: Don’t give any secrets to this portion of the CPU. (1987 Goldreich, 1990 Ostrovsky: Oblivious RAM; 2004 Bernstein: domain-specific for better speed.) TCB analysis: Need this portion of the CPU to be correct, but don’t need it to keep secrets. Makes auditing much easier. Good match for attitude and experience of CPU designers: e.g., Intel issues errata for correctness bugs, not for information leaks.

12

Case study: Constant-time sorting. Serious risk within 10 years: Attacker has quantum computer breaking today’s most popular public-key crypto (RSA and ECC; e.g., finding a, given aG). 2017: Hundreds of people submit 69 complete proposals to international competition for post-quantum crypto standards. Subroutine in some submissions: sort array of secret integers, e.g. sort 768 32-bit integers.


slide-71
SLIDE 71

13

How to sort secret data without any secret addresses? Typical sorting algorithms— merge sort, quicksort, etc.— choose load/store addresses based on secret data. Usually also branch based on secret data. One submission to competition: “Radix sort is used as constant-time sorting algorithm.” Some versions of radix sort avoid secret branches. But data addresses in radix sort still depend on secrets.

14

Foundation of solution: a comparator sorting 2 integers: (x, y) → (min{x, y}, max{x, y}). Easy constant-time exercise in C. Warning: C standard allows compiler to screw this up. Even easier exercise in asm.


slide-81
SLIDE 81

15

Combine comparators into a sorting network for more inputs. Example of a sorting network: [comparator diagram omitted in this transcript]

16

Positions of comparators in a sorting network are independent of the input. Naturally constant-time. But (n² − n)/2 comparators produce complaints about performance as n increases. Speed is a serious issue in the post-quantum competition. “Cost” is evaluation criterion; “we’d like to stress this once again on the forum that we’d really like to see more platform-optimized implementations”; etc.

slide-85
SLIDE 85


17

void int32_sort(int32 *x, int64 n)
{
  int64 t, p, q, i;
  if (n < 2) return;
  t = 1;
  while (t < n - t) t += t;
  for (p = t; p > 0; p >>= 1) {
    for (i = 0; i < n - p; ++i)
      if (!(i & p))
        minmax(x + i, x + i + p);
    for (q = t; q > p; q >>= 1)
      for (i = 0; i < n - q; ++i)
        if (!(i & p))
          minmax(x + i + p, x + i + q);
  }
}


slide-89
SLIDE 89


18

Previous slide: C translation of 1973 Knuth “merge exchange”, which is a simplified version of 1968 Batcher “odd-even merge” sorting networks. ≈n(log₂ n)²/4 comparators. Much faster than bubble sort. Warning: many other descriptions of Batcher’s sorting networks require n to be a power of 2. Also, Wikipedia says “Sorting networks … are not capable of handling arbitrarily large inputs.”


slide-93
SLIDE 93


19

This constant-time sorting code, with vectorization (for Haswell):

  • Constant-time sorting code included in 2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal “NTRU Prime” software release; revamped for higher speed.

  • New: “djbsort” constant-time sorting code.


slide-97
SLIDE 97


20

The slowdown for constant time. Massive fast-sorting literature. 2015 Gueron–Krasnov: AVX and AVX2 (Haswell) optimization of quicksort. For 32-bit integers: ≈45 cycles/byte for n ≈ 2¹⁰, ≈55 cycles/byte for n ≈ 2²⁰. Slower than “the radix sort implementation of IPP, which is the fastest in-memory sort we are aware of”: 32, 40 cycles/byte. IPP: Intel’s Integrated Performance Primitives library.

slide-98
SLIDE 98

19

constant-time sorting code vectorization (for Haswell)

  • Constant-time sorting code

included in 2017 Bernstein–Chuengsatiansup– Lange–van Vredendaal “NTRU Prime” software release revamped for higher speed

  • New: “djbsort”

constant-time sorting code

20

The slowdown for constant time Massive fast-sorting literature. 2015 Gueron–Krasnov: AVX and AVX2 (Haswell) optimization of

  • quicksort. For 32-bit integers:

≈45 cycles/byte for n ≈ 210, ≈55 cycles/byte for n ≈ 220. Slower than “the radix sort implemented of IPP, which is the fastest in-memory sort we are aware of”: 32, 40 cycles/byte. IPP: Intel’s Integrated Performance Primitives library. Constant-time again on

slide-99
SLIDE 99

19

constant-time sorting code vectorization (for Haswell) Constant-time sorting code in 2017 Bernstein–Chuengsatiansup– Vredendaal software release revamped for higher speed “djbsort” constant-time sorting code

20

The slowdown for constant time Massive fast-sorting literature. 2015 Gueron–Krasnov: AVX and AVX2 (Haswell) optimization of

  • quicksort. For 32-bit integers:

≈45 cycles/byte for n ≈ 210, ≈55 cycles/byte for n ≈ 220. Slower than “the radix sort implemented of IPP, which is the fastest in-memory sort we are aware of”: 32, 40 cycles/byte. IPP: Intel’s Integrated Performance Primitives library. Constant-time results, again on Haswell CPU



21

Constant-time results, again on Haswell CPU core:

2017 BCLvV: 6.5 cycles/byte for n ≈ 2¹⁰, 33 cycles/byte for n ≈ 2²⁰.

2018 djbsort: 2.5 cycles/byte for n ≈ 2¹⁰, 15.5 cycles/byte for n ≈ 2²⁰.

No slowdown. New speed records!

Warning: Comparison for n ≈ 2²⁰ involves microarchitecture details beyond Haswell core. Should measure all code on same CPU.


22

How can an n(log n)² algorithm beat standard n log n algorithms?

Answer: well-known trends in CPU design, reflecting fundamental hardware costs of various operations.

Every cycle, Haswell core can do 8 "min" ops on 32-bit integers + 8 "max" ops on 32-bit integers. Loading a 32-bit integer from a random address: much slower. Conditional branch: much slower.


23

Verification

Sorting software is in the TCB. Does it work correctly? Test the sorting software on many random inputs, increasing inputs, decreasing inputs. Seems to work.

But are there occasional inputs where this sorting software fails to sort correctly? History: Many security problems involve occasional inputs where TCB works incorrectly.


24

For each used n (e.g., 768):

C code → (normal compiler) → machine code → (symbolic execution) → fully unrolled code → (new peephole optimizer) → unrolled min-max code → (new sorting verifier) → "yes, code works".

25

Symbolic execution: use existing "angr" library, with tiny new patches for eliminating byte splitting, adding a few missing vector instructions.

Peephole optimizer: recognize instruction patterns equivalent to min, max.

Sorting verifier: decompose DAG into merging networks. Verify each merging network using generalization of 2007 Even–Levi–Litman, correction of 1990 Chung–Ravikumar.


26

First djbsort release, verified int32 on AVX2: https://sorting.cr.yp.to

Includes the sorting code; automatic build-time tests; simple benchmarking program; verification tools. Web site shows how to use the verification tools.

Next release planned: verified ARM NEON code and verified portable code.