SLIDE 1 1
Constant-time square-and-multiply
University of Illinois at Chicago; Ruhr University Bochum
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y if 1&(e>>i): y = y*x return y
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e.
SLIDE 2 1
Constant-time square-and-multiply
University of Illinois at Chicago; Ruhr University Bochum
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y if 1&(e>>i): y = y*x return y
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.”
SLIDE 3 1
Constant-time square-and-multiply
University of Illinois at Chicago; Ruhr University Bochum
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y if 1&(e>>i): y = y*x return y
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs.
SLIDE 4
1
Constant-time re-and-multiply Bernstein University of Illinois at Chicago; University Bochum
pow256bit(x,e): in reversed(range(256)): y*y 1&(e>>i): = y*x return y
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs. Hardware is inherently CPU designers
SLIDE 5
1
re-and-multiply Illinois at Chicago; Bochum
pow256bit(x,e): reversed(range(256)):
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs. Hardware reality: Accessing is inherently expensive CPU designers try
SLIDE 6
1
Chicago;
reversed(range(256)):
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs. Hardware reality: Accessing is inherently expensive. CPU designers try to reduce
SLIDE 7
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost.
SLIDE 8
2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time.
SLIDE 9 2
This code uses 256 squarings, plus 1 extra multiplication for each bit set in e. Problem when e is secret: time leaks number of bits set in e. “I’ll choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then there are no more leaks.” — Time still depends on e, even if each multiplication takes time independent of inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses.
SLIDE 10 2
code uses 256 squarings, extra multiplication h bit set in e. Problem when e is secret: time number of bits set in e. choose secret 256-bit e with exactly 128 bits set. There are enough of these e, and then are no more leaks.” Time still depends on e, if each multiplication time independent of inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses. Example: from secrets Often describ for softw the same
SLIDE 11 2
256 squarings, multiplication in e. is secret: time bits set in e. cret 256-bit e with
e, and then re leaks.” depends on e, multiplication endent of inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses. Example: Avoid all from secrets to branch Often described as for software, but comes the same hardware
SLIDE 12 2
rings, time e. e with There are then , inputs.
3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses. Example: Avoid all data flow from secrets to branch condi Often described as a separate for software, but comes from the same hardware reality.
SLIDE 13 3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses.
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality.
SLIDE 14 3
Hardware reality: Accessing RAM is inherently expensive. CPU designers try to reduce cost. Example: “L1 cache” typically has 32KB of recently used data. This cache inspects RAM addresses, performs various computations on addresses to try to save time. : : : so time is a function of RAM
- addresses. Avoid all data flow
from secrets to RAM addresses.
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality. How CPU runs a program (example of “code = data”):
while True: insn = RAM[state.ip] state = execute(state,insn)
ip (“instruction pointer” or “program counter”): address in RAM of next instruction.
SLIDE 15 3
are reality: Accessing RAM inherently expensive. designers try to reduce cost. Example: “L1 cache” typically 32KB of recently used data. cache inspects RAM addresses, performs various computations on addresses to save time. time is a function of RAM
- addresses. Avoid all data flow
secrets to RAM addresses.
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality. How CPU runs a program (example of “code = data”):
while True: insn = RAM[state.ip] state = execute(state,insn)
ip (“instruction pointer” or “program counter”): address in RAM of next instruction. Standard to follow Square and
def pow256bit(x,e): y = 1 for i y = yx = bit y = return
If bit is an unused
SLIDE 16
3
y: Accessing RAM ensive. try to reduce cost. cache” typically recently used data. ects RAM rms various addresses time. function of RAM all data flow RAM addresses.
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality. How CPU runs a program (example of “code = data”):
while True: insn = RAM[state.ip] state = execute(state,insn)
ip (“instruction pointer” or “program counter”): address in RAM of next instruction. Standard square-and-m to follow these data-flo Square and always
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx an unused “dummy
SLIDE 17 3
Accessing RAM reduce cost. ypically data. rious sses
flow addresses.
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality. How CPU runs a program (example of “code = data”):
while True: insn = RAM[state.ip] state = execute(state,insn)
ip (“instruction pointer” or “program counter”): address in RAM of next instruction. Standard square-and-multiply to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation an unused “dummy operation”.
SLIDE 18
4
Example: Avoid all data flow from secrets to branch conditions. Often described as a separate rule for software, but comes from the same hardware reality. How CPU runs a program (example of “code = data”):
while True: insn = RAM[state.ip] state = execute(state,insn)
ip (“instruction pointer” or “program counter”): address in RAM of next instruction.
5
Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation is an unused “dummy operation”.
SLIDE 19 4
Example: Avoid all data flow secrets to branch conditions. described as a separate rule tware, but comes from same hardware reality. CPU runs a program (example of “code = data”):
True: = RAM[state.ip] = execute(state,insn)
(“instruction pointer” or rogram counter”): address
5
Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation is an unused “dummy operation”. Another
def pow256bit(x,e): y,i,j while if j y if else: else: y i,j return
SLIDE 20
4
all data flow branch conditions. as a separate rule comes from re reality. a program de = data”):
RAM[state.ip] execute(state,insn)
pointer” or ter”): address instruction.
5
Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation is an unused “dummy operation”. Another approach,
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
SLIDE 21 4
flow conditions. rate rule from data”):
execute(state,insn)
address instruction.
5
Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation is an unused “dummy operation”. Another approach, not well kno
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
SLIDE 22
5
Standard square-and-multiply fix to follow these data-flow rules: Square and always multiply.
def pow256bit(x,e): y = 1 for i in reversed(range(256)): y = y*y yx = y*x bit = 1&(e>>i) y = y+(yx-y)*bit return y
If bit is 0 then yx computation is an unused “dummy operation”.
6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
SLIDE 23 5
Standard square-and-multiply fix follow these data-flow rules: and always multiply.
pow256bit(x,e): in reversed(range(256)): y*y = y*x = 1&(e>>i) y+(yx-y)*bit return y
is 0 then yx computation is unused “dummy operation”.
6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
This is lik
j is “instruction 0 if at top 1 if in middle Each “instruction” includes
SLIDE 24 5
re-and-multiply fix data-flow rules: ys multiply.
pow256bit(x,e): reversed(range(256)): 1&(e>>i) y+(yx-y)*bit
yx computation is my operation”.
6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
This is like CPU’s
- riginal square-and-multiply
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of lo Each “instruction” includes exactly one
SLIDE 25 5
ltiply fix rules: multiply.
reversed(range(256)):
computation is eration”.
6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
This is like CPU’s perspective
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply
SLIDE 26 6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply.
SLIDE 27 6
Another approach, not well known:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: if j == 0: y = y*y if 1&(e>>i): j = 1 else: i = i-1 else: y = y*x i,j = i-1,0 return y
7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply. Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU.
SLIDE 28 6
Another approach, not well known:
pow256bit(x,e): = 1,255,0 i >= 0: j == 0: = y*y if 1&(e>>i): j = 1 else: i = i-1 else: = y*x i,j = i-1,0 return y
7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply. Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU. Following assuming i shifts etc.) assuming
def pow256bit(x,e): y,i,j while z = y = bit i = j = return
SLIDE 29 6
roach, not well known:
pow256bit(x,e): 1,255,0 1&(e>>i): i-1,0
7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply. Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU. Following data-flow assuming all arithmetic i shifts etc.) is constant-time, assuming e weight
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
SLIDE 30 6
ell known:
7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply. Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU. Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
SLIDE 31 7
This is like CPU’s perspective on
- riginal square-and-multiply.
j is “instruction pointer”: 0 if at top of loop, 1 if in middle of loop. Each “instruction” here includes exactly one multiply. Try to choose instruction set with big useful operations, avoiding control overhead. Analogous to designing CPU.
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
SLIDE 32
7
like CPU’s perspective on riginal square-and-multiply. “instruction pointer”: top of loop, middle of loop. “instruction” here includes exactly one multiply. choose instruction set big useful operations, avoiding control overhead. Analogous to designing CPU.
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
Allowing
def pow256bitweightle128(x,e): y,i,j for loop z = z = y = bit i = j = assert return
SLIDE 33 7
CPU’s perspective on re-and-multiply. pointer”:
loop. “instruction” here
instruction set
designing CPU.
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
Allowing any weight
def pow256bitweightle128(x,e): y,i,j = 1,255,0 for loop in range(384): z = y+(x-y)*j z = z+(1-z)*(i<0) y = y*z bit = 1&(e>>max(i,0)) i = i-(j|(1-bit)) j = bit&(1-j) assert i < 0 return y
SLIDE 34 7
ective on re-and-multiply. multiply. set erations,
CPU.
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
Allowing any weight ≤128:
def pow256bitweightle128(x,e): y,i,j = 1,255,0 for loop in range(384): z = y+(x-y)*j z = z+(1-z)*(i<0) y = y*z bit = 1&(e>>max(i,0)) i = i-(j|(1-bit)) j = bit&(1-j) assert i < 0 return y
SLIDE 35
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
9
Allowing any weight ≤128:
def pow256bitweightle128(x,e): y,i,j = 1,255,0 for loop in range(384): z = y+(x-y)*j z = z+(1-z)*(i<0) y = y*z bit = 1&(e>>max(i,0)) i = i-(j|(1-bit)) j = bit&(1-j) assert i < 0 return y
SLIDE 36
8
Following data-flow rules, assuming all arithmetic (including i shifts etc.) is constant-time, assuming e weight exactly 128:
def pow256bit(x,e): y,i,j = 1,255,0 while i >= 0: z = y+(x-y)*j y = y*z bit = 1&(e>>i) i = i-(j|(1-bit)) j = bit&(1-j) return y
9
Allowing any weight ≤128:
def pow256bitweightle128(x,e): y,i,j = 1,255,0 for loop in range(384): z = y+(x-y)*j z = z+(1-z)*(i<0) y = y*z bit = 1&(e>>max(i,0)) i = i-(j|(1-bit)) j = bit&(1-j) assert i < 0 return y
Exercise: constant-time ECC scalar mult with sliding windows.