Probabilistic Assertions, Adrian Sampson, University of Washington


Expressing and Verifying Probabilistic Assertions. Adrian Sampson and Pavel Panchekha (University of Washington), Todd Mytkowicz and Kathryn S. McKinley (Microsoft Research), Dan Grossman (University of Washington).


SLIDE 1

Expressing and Verifying Probabilistic Assertions

Adrian Sampson, Pavel Panchekha (University of Washington)
Todd Mytkowicz, Kathryn S. McKinley (Microsoft Research)
Dan Grossman, Luis Ceze (University of Washington)

PLDI 2014

SLIDE 2

Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.

SLIDE 3

assert file != NULL

test, verify, check

SLIDE 4

assert file != NULL

e

e must hold on every execution

SLIDE 5

assert e

Approximate Computing: this approximate image is close to its precise version; k-means clustering is likely to converge even on unreliable hardware

Mobile and Sensing: sensor error does not render the app’s conclusions useless

Obfuscation for Data Privacy: obfuscated data is still useful in aggregate

SLIDE 6

assert e

Traditional assertions are insufficient for programs with probabilistic behavior.

Approximate Computing: this approximate image is close to its precise version; k-means clustering is likely to converge even on unreliable hardware

Mobile and Sensing: sensor error does not render the app’s conclusions useless

Obfuscation for Data Privacy: obfuscated data is still useful in aggregate

SLIDE 7

true_avg = average(salaries)
private_avg = average(obfuscate(salaries))
assert true_avg - private_avg <= 10,000

Assertions are insufficient for private-data obfuscation
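
A minimal Python sketch of why a plain assert is a poor fit here: because obfuscate adds random noise, the same assertion passes on some runs and fails on others. The noise model (Gaussian, standard deviation 1000), the salary data, and the tighter bound of 100 are assumptions chosen only to make the effect visible; they are not taken from the slides.

```python
import random

def obfuscate(salaries, rng, sigma=1000.0):
    # Assumed noise model: independent Gaussian noise on every salary.
    return [s + rng.gauss(0.0, sigma) for s in salaries]

def average(xs):
    return sum(xs) / len(xs)

def assertion_holds(salaries, rng, bound):
    # Mirrors the slide's assert, returning the outcome instead of raising.
    true_avg = average(salaries)
    private_avg = average(obfuscate(salaries, rng))
    return true_avg - private_avg <= bound

rng = random.Random(0)
salaries = [24_000.0 + 1_000.0 * i for i in range(20)]  # hypothetical data
runs = 2_000
passes = sum(assertion_holds(salaries, rng, bound=100.0) for _ in range(runs))
pass_rate = passes / runs
print(f"assertion held on {pass_rate:.1%} of {runs} runs")
```

The assertion neither always holds nor always fails, so "must hold on every execution" is the wrong question to ask of this program.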

SLIDE 8

true_avg = average(salaries)
private_avg = average(obfuscate(salaries))
assert true_avg - private_avg <= 10,000

Assertions are insufficient for private-data obfuscation

obfuscate(salaries): probability distribution

SLIDE 9

assert e

Assertion

SLIDE 10

passert e, p, c

Probabilistic assertion

SLIDE 11

Probabilistic assertion

passert e, p, c

e must hold with probability p at confidence c

SLIDE 12

Probabilistic assertion

passert e, p, c

test? verify? check?

SLIDE 13

How to verify a probabilistic assertion

probabilistic program → ?

passert e, p, c

// Add zero-mean Gaussian noise to a single value.
float obfuscated(float n) {
  return n + gaussian(0.0, 1000.0);
}

// Average the obfuscated salaries; COUNT is the number of entries.
float average_salary(float* salaries) {
  float total = 0.0;
  for (int i = 0; i < COUNT; ++i)
    total += obfuscated(salaries[i]);
  float avg = total / COUNT;
  p_avg = ...;
}

SLIDE 14

How to verify a probabilistic assertion naively

probabilistic program (average_salary with passert e, p, c) → ?
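
One plausible reading of the naive approach is stress testing: run the program many times and compare the observed pass rate with p. The sketch below assumes that reading. The names stress_test and salary_program, the salary data, and the 300 bound are hypothetical, and the check ignores the confidence c entirely, which is part of what makes it naive.

```python
import random

def stress_test(program, runs, p, rng):
    """Run program repeatedly and report whether the observed pass rate reaches p."""
    passes = sum(1 for _ in range(runs) if program(rng))
    rate = passes / runs
    return rate >= p, rate

def salary_program(rng):
    # Hypothetical stand-in for average_salary: Gaussian noise (stddev 1000)
    # is added to each salary, then the averages are compared.
    salaries = [30_000.0] * 50
    true_avg = sum(salaries) / len(salaries)
    private_avg = sum(s + rng.gauss(0.0, 1000.0) for s in salaries) / len(salaries)
    return abs(true_avg - private_avg) <= 300.0

ok, rate = stress_test(salary_program, runs=5_000, p=0.9, rng=random.Random(1))
print(f"pass rate {rate:.3f}, accepted: {ok}")
```

Every run re-executes the whole program, so the cost grows with both the program's running time and the number of runs.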

SLIDE 15

How to verify a probabilistic assertion with statistical reasoning

for statistical models: queries & inference
Church, Infer.NET, [Sankaranarayanan+ PLDI 2013], [Hur+ PLDI 2014], ⋮

for probabilistic software: passert → ?

SLIDE 16

How to verify a probabilistic assertion efficiently and accurately

probabilistic program (average_salary with passert e, p, c)
distribution extraction via symbolic execution
Bayesian network IR
statistical optimizations
verification

SLIDE 17

How to verify a probabilistic assertion efficiently and accurately

probabilistic program (average_salary with passert e, p, c)
distribution extraction via symbolic execution
Bayesian network IR
statistical optimizations
verification

implementation for LLVM & Clang

SLIDE 19

Distribution extraction: random draws are symbolic

b = a + gaussian(0.0, 1.0)

heap: a = 4.2
symbolic heap: a = 4.2, b = 4.2 + G(0,1)
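
A small Python sketch of the idea, under the assumption that the symbolic heap maps variable names to expression nodes. The classes Const, Gaussian, and Add and the helper show are hypothetical names for illustration, not the verifier's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Gaussian:
    mean: float
    std: float

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def show(node):
    """Render a symbolic expression as text."""
    if isinstance(node, Const):
        return str(node.value)
    if isinstance(node, Gaussian):
        return f"G({node.mean}, {node.std})"
    return f"({show(node.left)} + {show(node.right)})"

# Symbolic heap: variable name -> expression node.
heap = {"a": Const(4.2)}

# b = a + gaussian(0.0, 1.0): no number is drawn; the draw stays symbolic.
heap["b"] = Add(heap["a"], Gaussian(0.0, 1.0))

print(show(heap["b"]))  # (4.2 + G(0.0, 1.0))
```

Because nothing is sampled while the heap is built, the result describes the distribution of b rather than one outcome of it.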

SLIDE 20

Concrete vs. symbolic semantics

program + input → nondeterministic concrete execution → outputs

SLIDE 21

Concrete vs. symbolic semantics

program + input → nondeterministic concrete execution → outputs

program + input → deterministic symbolic execution → nondeterministic sampling → outputs

SLIDE 22

input: a = 4.2
b = gaussian(0.0, 1.0)

a: 4.2
b: G(0,1)

SLIDE 23

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b

a: 4.2
b: G(0,1)
c: a + b

SLIDE 24

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b
d = c + b

a: 4.2
b: G(0,1)
c: a + b
d: c + b

SLIDE 26

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b
d = c + b
if b > 0.5
  e = 2.0
else
  e = 4.0

a: 4.2
b: G(0,1)
c: a + b
d: c + b
e: if b > 0.5 then 2.0 else 4.0

SLIDE 27

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b
d = c + b
if b > 0.5
  e = 2.0
else
  e = 4.0
passert e <= 3.0, 0.9, 0.9

a: 4.2
b: G(0,1)
c: a + b
d: c + b
e: if b > 0.5 then 2.0 else 4.0
passert: e ≤ 3.0

SLIDE 28

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b
d = c + b
if b > 0.5
  e = 2.0
else
  e = 4.0
passert e <= 3.0, 0.9, 0.9

[expression dag without variable names: 4.2, G(0,1), +, +, > 0.5, ? (then 2.0, else 4.0), ≤ 3.0]
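
As a sanity check on the running example, the Python sketch below executes it concretely many times and estimates how often e <= 3.0 holds. Since e is 2.0 exactly when b > 0.5, the estimate should approach 1 − Φ(0.5), where Φ is the standard normal CDF. The function name run_once, the seed, and the sample count are arbitrary choices for illustration.

```python
import math
import random

def run_once(rng):
    """One concrete execution of the running example; returns whether e <= 3.0."""
    a = 4.2
    b = rng.gauss(0.0, 1.0)
    c = a + b
    d = c + b  # c and d are computed but do not affect the assertion
    e = 2.0 if b > 0.5 else 4.0
    return e <= 3.0

rng = random.Random(42)
n = 100_000
estimate = sum(run_once(rng) for _ in range(n)) / n

# Closed form for comparison: P(b > 0.5) = 1 - Phi(0.5) for a standard normal b.
exact = 0.5 * (1.0 - math.erf(0.5 / math.sqrt(2.0)))

print(f"estimated {estimate:.4f}, exact {exact:.4f}")
```

Under this reading the probability is roughly 0.31, well below the 0.9 that the example's passert asks for.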

SLIDE 29

input: a = 4.2
b = gaussian(0.0, 1.0)
c = a + b
d = c + b
if b > 0.5
  e = 2.0
else
  e = 4.0
passert e <= 3.0, 0.9, 0.9

[expression dag without variable names: 4.2, G(0,1), +, +, > 0.5, ? (then 2.0, else 4.0), ≤ 3.0]

input: a = unif(2.0, 9.0)

SLIDE 30

concrete input (salary = $24,000): ≈ testing
input distribution (salary = uniform(…)): ≈ static analysis

SLIDE 31

More in the paper

Arrays & pointers
Loops
External code
Probabilistic path pruning

SLIDE 32

Distribution extraction produces an expression dag

[expression dag: 4.2, G(0,1), +, +, > 0.5]

Bayesian network


SLIDE 34

Distribution extraction produces an expression dag

[expression dag: 4.2, G(0,1), +, +, > 0.5]

Bayesian network

nodes: random variables
edges: dependence
directed & acyclic
random draws only at leaves
sample in a single pass
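
The last two properties can be sketched in Python: because randomness enters only at the leaves and the graph is acyclic, one memoized traversal yields a consistent sample for every node. The Node class and sample_dag function are hypothetical names, and the operator set is limited to what the running example needs.

```python
import random

class Node:
    """One dag node: an operator plus its operand nodes or constants."""
    def __init__(self, op, *args):
        self.op = op
        self.args = args

def sample_dag(root, rng):
    """Evaluate every node reachable from root once; return values keyed by node id."""
    values = {}

    def visit(node):
        key = id(node)
        if key in values:          # shared node: reuse the value already drawn
            return values[key]
        if node.op == "const":
            result = node.args[0]
        elif node.op == "gauss":   # leaf: the only place randomness enters
            result = rng.gauss(node.args[0], node.args[1])
        elif node.op == "+":
            result = visit(node.args[0]) + visit(node.args[1])
        elif node.op == ">":
            result = visit(node.args[0]) > visit(node.args[1])
        else:
            raise ValueError(f"unknown operator: {node.op}")
        values[key] = result
        return result

    visit(root)
    return values

# Running example: b is shared by c and d, so it must be drawn only once.
a = Node("const", 4.2)
b = Node("gauss", 0.0, 1.0)
c = Node("+", a, b)
d = Node("+", c, b)
values = sample_dag(d, random.Random(3))
print(values[id(b)], values[id(d)])
```

Memoizing by node identity is what keeps the two uses of b consistent within a single sample.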

SLIDE 35

probabilistic program (average_salary with passert e, p, c)
distribution extraction via symbolic execution
Bayesian network IR
statistical optimizations
verification

implementation for LLVM & Clang

SLIDE 36

statistical property

passert verifier

optimization

SLIDE 37

Bayesian-network IR enables new optimizations

G + Gʹ ⇒ Gʹʹ

$$X \sim G(\mu_X, \sigma_X^2), \quad Y \sim G(\mu_Y, \sigma_Y^2), \quad Z = X + Y \;\Rightarrow\; Z \sim G(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$$

SLIDE 38

Bayesian-network IR enables new optimizations

U × c ⇒ Uʹ

$$X \sim U(a, b), \quad Y = cX \;\Rightarrow\; Y \sim U(ca, cb)$$

SLIDE 39

Bayesian-network IR enables new optimizations

U ≤ c ⇒ B

$$X \sim U(a, b), \quad Y \sim X \le c, \quad a \le c \le b \;\Rightarrow\; Y \sim B\left(\frac{c - a}{b - a}\right)$$
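
The three rewrites can be sketched as Python functions over distribution parameters. This is an illustration under stated assumptions, not the verifier's implementation: the Gaussian rule assumes X and Y are independent, the scaling rule assumes c > 0, and the type and function names are hypothetical.

```python
from typing import NamedTuple

class Gaussian(NamedTuple):
    mean: float
    var: float

class Uniform(NamedTuple):
    lo: float
    hi: float

class Bernoulli(NamedTuple):
    p: float

def add_gaussians(x: Gaussian, y: Gaussian) -> Gaussian:
    # Sum of independent Gaussians: means add, variances add.
    return Gaussian(x.mean + y.mean, x.var + y.var)

def scale_uniform(x: Uniform, c: float) -> Uniform:
    # Assumes c > 0 so that the interval endpoints keep their order.
    if c <= 0:
        raise ValueError("this sketch only handles c > 0")
    return Uniform(c * x.lo, c * x.hi)

def uniform_le_const(x: Uniform, c: float) -> Bernoulli:
    # Applies only when a <= c <= b, as on the slide.
    if not (x.lo <= c <= x.hi):
        raise ValueError("c must lie inside [a, b]")
    return Bernoulli((c - x.lo) / (x.hi - x.lo))

print(add_gaussians(Gaussian(1.0, 4.0), Gaussian(2.0, 9.0)))
print(scale_uniform(Uniform(2.0, 9.0), 2.0))
print(uniform_le_const(Uniform(2.0, 9.0), 5.5))
```

Each rule replaces a small subgraph with a single node whose distribution is known in closed form, so fewer nodes remain to be sampled.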

SLIDE 40

Central Limit Theorem collapses large sums

D + D + … + D ⇒ G

$$X_1, X_2, \ldots, X_n \sim D, \quad Y = \sum_i X_i \;\Rightarrow\; Y \sim G(n\mu_D, n\sigma_D^2)$$
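
A quick numerical check of the rule in Python, taking D to be uniform on [0, 1] (mean 1/2, variance 1/12) as an arbitrary choice. The sample mean and variance of many sums should land close to nμ and nσ². The seed and sample sizes are arbitrary, and the Gaussian form is an approximation that improves as n grows.

```python
import random
import statistics

def sum_of_draws(n, rng):
    """Y = X_1 + ... + X_n with each X_i drawn independently from U(0, 1)."""
    return sum(rng.uniform(0.0, 1.0) for _ in range(n))

rng = random.Random(7)
n = 100
sums = [sum_of_draws(n, rng) for _ in range(5_000)]

predicted_mean = n * 0.5          # n * mu_D
predicted_var = n * (1.0 / 12.0)  # n * sigma_D^2

observed_mean = statistics.fmean(sums)
observed_var = statistics.pvariance(sums)

print(f"mean: predicted {predicted_mean:.2f}, observed {observed_mean:.2f}")
print(f"variance: predicted {predicted_var:.2f}, observed {observed_var:.2f}")
```

Collapsing the sum means the verifier can draw one Gaussian sample in place of n draws from D.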

SLIDE 41

probabilistic program (average_salary with passert e, p, c)
distribution extraction via symbolic execution
Bayesian network IR
statistical optimizations
verification

implementation for LLVM & Clang

SLIDE 42

Verification via direct evaluation

[diagram: D + D + … + D and constant c ⇒ B]

SLIDE 43

Verification via hypothesis testing

[diagram: expression dag over D, G(0,1), D with +, ÷, >; passert parameters p, c]
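
The slide does not say which statistical test the verifier uses, so the Python sketch below swaps in a simple stand-in: a fixed sample size from Hoeffding's inequality, which bounds the estimate's error by epsilon with the requested confidence. The function names, the three-way result, and the epsilon parameter are all assumptions for illustration.

```python
import math
import random

def samples_needed(epsilon, confidence):
    # Hoeffding: P(|p_hat - p_true| >= epsilon) <= 2 * exp(-2 * n * epsilon^2).
    # Solve for the smallest n that pushes the bound below 1 - confidence.
    return math.ceil(math.log(2.0 / (1.0 - confidence)) / (2.0 * epsilon ** 2))

def check_passert(sample_e, p, confidence, epsilon, rng):
    """Estimate Pr[e] by sampling and compare it with p, allowing an epsilon margin."""
    n = samples_needed(epsilon, confidence)
    p_hat = sum(1 for _ in range(n) if sample_e(rng)) / n
    if p_hat - epsilon >= p:
        return "holds"
    if p_hat + epsilon < p:
        return "fails"
    return "inconclusive"

def example_e(rng):
    # e <= 3.0 in the running example is equivalent to b > 0.5 with b ~ G(0, 1).
    return rng.gauss(0.0, 1.0) > 0.5

print(samples_needed(0.05, 0.9))
print(check_passert(example_e, p=0.9, confidence=0.9, epsilon=0.05, rng=random.Random(5)))
```

A sequential test could stop earlier when the estimate is far from p; the fixed-size version is used here only because it is short.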

SLIDE 44

probabilistic program (average_salary with passert e, p, c)
distribution extraction via symbolic execution
Bayesian network IR
statistical optimizations
verification

implementation for LLVM & Clang

SLIDE 45

Probabilistic assertions for C and C++

.c LLVM IR LLVM IR Native Code

strawman stress-tester

SLIDE 46

Probabilistic programs used in the evaluation

sensing: gpswalk
privacy: salary, salary-abs
approximate computing: kmeans, sobel, hotspot, inversek2j

SLIDE 47

Running time vs. stress testing

[bar chart: time relative to baseline (y-axis 0.0 to 1.2), each bar split into analyze and sample; benchmarks: gpswalk, salary, salary-abs, kmeans, sobel, hotspot, inversek, h.mean; bars: B = baseline]

SLIDE 48

Running time vs. stress testing

[bar chart: time relative to baseline (y-axis 0.0 to 1.2), each bar split into analyze and sample; benchmarks: gpswalk, salary, salary-abs, kmeans, sobel, hotspot, inversek, h.mean; bars: B = baseline, N = no statistical optimizations]

SLIDE 49

Running time vs. stress testing

24× faster than baseline verifier on average

Mostly analysis time

[bar chart: time relative to baseline (y-axis 0.0 to 1.2), each bar split into analyze and sample; benchmarks: gpswalk, salary, salary-abs, kmeans, sobel, hotspot, inversek, h.mean; bars: B = baseline, N = no statistical optimizations, O = optimized]

SLIDE 50

Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.