Constant-time square-and-multiply

D. J. Bernstein
University of Illinois at Chicago; Ruhr University Bochum

    def pow256bit(x,e):
        y = 1
        for i in reversed(range(256)):
            y = y*y
            if 1&(e>>i):
                y = y*x
        return y

This code uses 256 squarings, plus 1 extra multiplication for each bit set in e.

Problem when e is secret: time leaks the number of bits set in e.

“I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.”

But time still depends on e, even if each multiplication takes time independent of inputs.
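A minimal sketch of both points, assuming arithmetic modulo a small prime p and an 8-bit exponent to keep the output readable (the slides leave the ring implicit; `trace`, `p`, and the `bits` parameter are illustrative names, not from the talk). It records one letter per multiplication: S for a squaring, M for an extra multiply.

```python
# Sketch: record the operation sequence of the branching square-and-multiply.
# Assumption: arithmetic is modulo p; the slides leave the ring implicit.

def trace(x, e, p, bits=8):
    """Return (x**e mod p, string of operations) for a bits-bit exponent e."""
    y = 1
    ops = []
    for i in reversed(range(bits)):
        y = y * y % p
        ops.append("S")
        if 1 & (e >> i):          # branch on a secret bit
            y = y * x % p
            ops.append("M")
    return y, "".join(ops)

p = 1009                          # small prime, chosen only for illustration
_, t1 = trace(3, 0b11110000, p)   # weight 4
_, t2 = trace(3, 0b00001111, p)   # weight 4, different bit positions
_, t3 = trace(3, 0b11111100, p)   # weight 6

print(len(t1), len(t2), len(t3))  # 12 12 14
print(t1)                         # SMSMSMSMSSSS
print(t2)                         # SSSSSMSMSMSM
```

The operation count is the bit length plus the weight, so fixing the weight fixes the count. The first two exponents still produce different operation sequences, because the branch taken at each step follows the secret bits.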

Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost.

Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time.

... so time is a function of RAM addresses.

Avoid all data flow from secrets to RAM addresses.
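One common way to follow this rule, sketched here under the assumption that the secret is an index into a small table: read every entry and keep the wanted one with arithmetic, so the sequence of addresses touched is the same for every secret. The name `ct_lookup` is hypothetical, and Python itself gives no timing guarantees; the point is only the data flow.

```python
def ct_lookup(table, secret_index):
    """Return table[secret_index] after reading every entry of table."""
    result = 0
    for k, entry in enumerate(table):
        # hit is 1 for the wanted entry and 0 otherwise; the comparison is
        # assumed to be constant-time, like the arithmetic in the slides
        hit = int(k == secret_index)
        # arithmetic select: result becomes entry when hit is 1, else unchanged
        result = result + (entry - result) * hit
    return result

table = [11, 22, 33, 44]
print([ct_lookup(table, s) for s in range(4)])   # [11, 22, 33, 44]
```

The cost is a full pass over the table per lookup, which is why this is practical only for small tables.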

Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality.

How CPU runs a program (example of “code = data”):

    while True:
        insn = RAM[state.ip]
        state = execute(state,insn)

ip (“instruction pointer” or “program counter”): address in RAM of next instruction.
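A toy version of this loop, to make the connection concrete. The instruction names (`JZ`, `ADD`, `HALT`), the `run` helper, and the four-instruction program are all invented for illustration. The sketch logs every address used to fetch an instruction; a conditional jump on a secret value changes that log, which is why a secret-dependent branch is also a secret-dependent RAM address.

```python
def run(RAM, secret):
    """Run a toy program; return (accumulator, list of fetched addresses)."""
    ip, acc = 0, 0
    fetched = []
    while True:
        fetched.append(ip)        # address used to fetch the instruction
        op, arg = RAM[ip]
        if op == "HALT":
            return acc, fetched
        elif op == "JZ":          # jump to arg if secret is zero
            ip = arg if secret == 0 else ip + 1
        elif op == "ADD":
            acc, ip = acc + arg, ip + 1

RAM = [
    ("JZ", 2),     # 0: skip the next instruction when secret == 0
    ("ADD", 5),    # 1: executed only when secret != 0
    ("ADD", 1),    # 2
    ("HALT", 0),   # 3
]

print(run(RAM, 0))   # (1, [0, 2, 3])
print(run(RAM, 1))   # (6, [0, 1, 2, 3])
```

Address 1 is fetched only when the secret is nonzero, so anything that observes which instruction addresses were touched learns about the secret.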

Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.

    def pow256bit(x,e):
        y = 1
        for i in reversed(range(256)):
            y = y*y
            yx = y*x
            bit = 1&(e>>i)
            y = y+(yx-y)*bit
        return y

If bit is 0 then yx computation is an unused “dummy operation”.
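A quick check of this version, assuming arithmetic modulo a prime p (the slides leave the ring implicit; the extra parameter `p`, the name `pow256bit_mod`, and the choice p = 2^255 - 19 are illustrative). It also counts the squarings and multiplies by x to confirm that the count no longer depends on e. Python's big-integer arithmetic is not constant-time, so this only checks correctness and operation counts.

```python
import random

def pow256bit_mod(x, e, p):
    """Square-and-always-multiply modulo p; returns (result, multiplication count)."""
    y = 1
    mults = 0
    for i in reversed(range(256)):
        y = y * y % p
        yx = y * x % p                 # computed even when the bit is 0
        mults += 2
        bit = 1 & (e >> i)
        y = (y + (yx - y) * bit) % p   # arithmetic select: yx if bit else y
    return y, mults

p = 2**255 - 19
rng = random.Random(1)
for _ in range(5):
    x = rng.randrange(p)
    e = rng.getrandbits(256)
    y, mults = pow256bit_mod(x, e, p)
    assert y == pow(x, e, p)           # agrees with Python's built-in modular pow
    assert mults == 512                # always 256 squarings + 256 multiplies
```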

Another approach, not well known:

    def pow256bit(x,e):
        y,i,j = 1,255,0
        while i >= 0:
            if j == 0:
                y = y*y
                if 1&(e>>i):
                    j = 1
                else:
                    i = i-1
            else:
                y = y*x
                i,j = i-1,0
        return y

This is like CPU’s perspective on original square-and-multiply.

j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop.

Each “instruction” here includes exactly one multiply.

Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU.
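To see the one-multiply-per-instruction claim, this sketch reruns the state machine modulo p (an assumption, as is the name `pow256bit_steps`) and counts loop iterations. Each iteration performs exactly one multiplication, so the count comes out as 256 plus the weight of e.

```python
def pow256bit_steps(x, e, p):
    """State-machine square-and-multiply modulo p; returns (result, iterations)."""
    y, i, j = 1, 255, 0
    steps = 0
    while i >= 0:
        steps += 1
        if j == 0:                # "instruction 0": square
            y = y * y % p
            if 1 & (e >> i):
                j = 1             # next instruction: multiply by x
            else:
                i = i - 1
        else:                     # "instruction 1": multiply by x
            y = y * x % p
            i, j = i - 1, 0
    return y, steps

p = 2**255 - 19                   # illustrative modulus
e = 0b1011                        # weight 3
y, steps = pow256bit_steps(5, e, p)
print(y == pow(5, e, p), steps)   # True 259
```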

Following data-flow rules, assuming all arithmetic (including shifts by i etc.) is constant-time, assuming e has weight exactly 128:

    def pow256bit(x,e):
        y,i,j = 1,255,0
        while i >= 0:
            z = y+(x-y)*j
            y = y*z
            bit = 1&(e>>i)
            i = i-(j|(1-bit))
            j = bit&(1-j)
        return y
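A sketch that exercises this version on random exponents of weight exactly 128, again assuming arithmetic modulo a prime p (the parameter `p`, the name `pow256bit_w128`, and the step counter are additions for illustration). It checks the result against Python's built-in `pow` and confirms that the loop always runs 256 + 128 = 384 times.

```python
import random

def pow256bit_w128(x, e, p):
    """Branch-free state machine modulo p; returns (result, iterations)."""
    y, i, j = 1, 255, 0
    steps = 0
    while i >= 0:
        steps += 1
        z = (y + (x - y) * j) % p     # z = x if j else y
        y = y * z % p                 # square (j == 0) or multiply by x (j == 1)
        bit = 1 & (e >> i)
        i = i - (j | (1 - bit))       # advance after a multiply or a 0 bit
        j = bit & (1 - j)             # multiply next iff bit is set and we just squared
    return y, steps

p = 2**255 - 19
rng = random.Random(2)
for _ in range(5):
    e = sum(1 << k for k in rng.sample(range(256), 128))   # weight exactly 128
    x = rng.randrange(p)
    y, steps = pow256bit_w128(x, e, p)
    assert y == pow(x, e, p)
    assert steps == 384
```

The loop condition still reads i, but with the weight fixed at 128 the loop exits after the same number of iterations for every e.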

Allowing any weight ≤128:

    def pow256bitweightle128(x,e):
        y,i,j = 1,255,0
        for loop in range(384):
            z = y+(x-y)*j
            z = z+(1-z)*(i<0)
            y = y*z
            bit = 1&(e>>max(i,0))
            i = i-(j|(1-bit))
            j = bit&(1-j)
        assert i < 0
        return y
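A sketch that runs this fixed-length version for several weights, assuming arithmetic modulo a prime p (the parameter `p` and the name `pow256bitweightle128_mod` are illustrative). Once i drops below 0 the remaining iterations multiply by 1, so lighter exponents are padded with dummy steps up to 384. An exponent of weight 129 needs 385 steps and trips the final assertion.

```python
import random

def pow256bitweightle128_mod(x, e, p):
    """Fixed 384-iteration version modulo p (p is an assumption)."""
    y, i, j = 1, 255, 0
    for loop in range(384):
        z = (y + (x - y) * j) % p
        z = (z + (1 - z) * (i < 0)) % p   # once i < 0, multiply by 1 (dummy step)
        y = y * z % p
        bit = 1 & (e >> max(i, 0))
        i = i - (j | (1 - bit))
        j = bit & (1 - j)
    assert i < 0                          # fails if the weight of e exceeds 128
    return y

p = 2**255 - 19
rng = random.Random(3)
for weight in (0, 1, 64, 128):
    e = sum(1 << k for k in rng.sample(range(256), weight))
    x = rng.randrange(2, p)
    assert pow256bitweightle128_mod(x, e, p) == pow(x, e, p)

try:
    pow256bitweightle128_mod(2, (1 << 129) - 1, p)   # weight 129
    too_heavy_rejected = False
except AssertionError:
    too_heavy_rejected = True
print(too_heavy_rejected)                 # True when assertions are enabled
```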

Exercise: constant-time ECC scalar mult with sliding windows.