Expressing and Verifying Probabilistic Assertions
Adrian Sampson (University of Washington), Pavel Panchekha (University of Washington), Todd Mytkowicz (Microsoft Research), Kathryn S. McKinley (Microsoft Research), Dan Grossman (University of Washington)
PLDI 2014
Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.
assert file != NULL
An assertion can be tested, verified, or checked.
assert e: e must hold on every execution.
Traditional assertions are insufficient for programs with probabilistic behavior.
Mobile and sensing: sensor error does not render the app’s conclusions useless.
Obfuscation for data privacy.
Assertions are insufficient for private-data obfuscation
true_avg = average(salaries)
private_avg = average(obfuscate(salaries))
assert true_avg - private_avg <= 10,000
Here obfuscate(salaries) yields a probability distribution over values, so private_avg is random.
Assertion: assert e
Probabilistic assertion: passert e, p, c
e must hold with probability p at confidence c.
Can a probabilistic assertion be tested, verified, or checked?
How to verify a probabilistic assertion
Given a probabilistic program containing passert e, p, c:

    float obfuscated(float n) {
        return n + gaussian(0.0, 1000.0);  /* add Gaussian noise */
    }

    float average_salary(float* salaries) {
        float total = 0.0;
        for (int i = 0; i < COUNT; ++i)
            total += obfuscated(salaries[i]);
        float avg = total / COUNT;
        p_avg = ...;
    }

How to verify a probabilistic assertion naively
Stress testing: run the probabilistic program many times and count how often e holds.
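The naive approach can be sketched as a stress test: execute the program many times with real random draws and report the fraction of runs in which the asserted expression held. This is an illustration only, not the authors' tool. It assumes the second argument of gaussian is a standard deviation, approximates a Gaussian draw by summing 12 uniform draws (to stay within the C standard library), and uses hypothetical names (run_once, estimate_probability) and a made-up COUNT of 5.

```c
#include <stdlib.h>

#define COUNT 5  /* hypothetical number of salaries */

/* Approximate Gaussian draw (assumption, not the slide's gaussian):
   the sum of 12 uniform(0,1) draws has mean 6 and variance 1,
   so (sum - 6) is roughly standard normal. */
double gaussian(double mean, double stddev) {
    double sum = 0.0;
    for (int i = 0; i < 12; ++i)
        sum += (double)rand() / ((double)RAND_MAX + 1.0);
    return mean + stddev * (sum - 6.0);
}

double obfuscated(double n) {
    return n + gaussian(0.0, 1000.0);
}

/* One concrete execution: returns 1 if true_avg - private_avg <= bound. */
int run_once(const double *salaries, double bound) {
    double total = 0.0, private_total = 0.0;
    for (int i = 0; i < COUNT; ++i) {
        total += salaries[i];
        private_total += obfuscated(salaries[i]);
    }
    double true_avg = total / COUNT;
    double private_avg = private_total / COUNT;
    return true_avg - private_avg <= bound;
}

/* Fraction of `trials` executions in which the asserted expression held. */
double estimate_probability(const double *salaries, double bound, int trials) {
    int hits = 0;
    for (int t = 0; t < trials; ++t)
        hits += run_once(salaries, bound);
    return (double)hits / trials;
}
```

Every estimate costs `trials` complete executions of the program, which is why this approach serves only as a point of comparison.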
How to verify a probabilistic assertion with statistical reasoning
For statistical models, queries & inference: Church, Infer.NET, [Sankaranarayanan+ PLDI 2013], [Hur+ PLDI 2014], ⋮
For probabilistic software: passert?
How to verify a probabilistic assertion efficiently and accurately
Distribution extraction via symbolic execution → Bayesian network IR → statistical verification → ✓
Distribution extraction: random draws are symbolic
b = a + gaussian(0.0, 1.0)
With a = 4.2, the symbolic heap maps a to 4.2 and b to the expression 4.2 + G(0,1), where G(0,1) is a symbolic draw from a standard Gaussian.
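A minimal sketch of what "random draws are symbolic" could look like in code: instead of calling a random number generator, the statement b = a + gaussian(0.0, 1.0) builds an expression node that records the draw. The node layout and the names sym_const, sym_gaussian, sym_add, and exec_example are hypothetical and chosen for illustration; they are not taken from the authors' implementation.

```c
#include <stdlib.h>

typedef enum { NODE_CONST, NODE_GAUSS, NODE_ADD } node_kind;

typedef struct node {
    node_kind kind;
    double value;                 /* NODE_CONST: the constant */
    double mean, stddev;          /* NODE_GAUSS: parameters of the draw */
    const struct node *lhs, *rhs; /* NODE_ADD: operands */
} node;

/* Allocate a zeroed node; nodes are never freed in this sketch. */
node *new_node(node_kind kind) {
    node *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->kind = kind;
    return n;
}

node *sym_const(double v) {
    node *n = new_node(NODE_CONST);
    n->value = v;
    return n;
}

/* A random draw is recorded as a node, not sampled. */
node *sym_gaussian(double mean, double stddev) {
    node *n = new_node(NODE_GAUSS);
    n->mean = mean;
    n->stddev = stddev;
    return n;
}

node *sym_add(const node *lhs, const node *rhs) {
    node *n = new_node(NODE_ADD);
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Symbolic execution of: b = a + gaussian(0.0, 1.0) */
node *exec_example(const node *a) {
    return sym_add(a, sym_gaussian(0.0, 1.0));
}
```

Calling exec_example(sym_const(4.2)) yields an addition node whose operands are the constant 4.2 and a Gaussian node. No random number generator is involved, so the result is the same on every run.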
Concrete vs. symbolic semantics
Concrete execution: program + input, nondeterministic.
Symbolic execution: program + input, deterministic.
Figure: step-by-step construction of the expression dag for variables a through e. Nodes: the constant 4.2, the symbolic draw G(0,1), two additions, the comparisons > 0.5 and ≤ 3.0, and a conditional (if/then/else) over the constants 2.0 and 4.0.
Concrete input (salary = $24,000): ≈ testing.
Input distribution (salary = uniform(…)): ≈ static analysis.
More in the paper
Arrays & pointers
Loops
External code
Probabilistic path pruning
Distribution extraction produces an expression dag
Figure: expression dag with nodes 4.2, G(0,1), +, +, and > 0.5.
The expression dag is a Bayesian network:
nodes: random variables
edges: dependence
directed & acyclic
Because the network is directed and acyclic and its only sources of randomness are the random draws, it can be sampled in a single pass.
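Single-pass sampling can be sketched as one forward sweep over the nodes in topological order: each node's value depends only on values already computed. The representation below (an array of bn_node with operand indices) and the function names are hypothetical, and the Gaussian draw is approximated by a sum of 12 uniform draws to stay within the C standard library.

```c
#include <stdlib.h>

typedef enum { N_CONST, N_GAUSS, N_ADD, N_GT } bn_kind;

typedef struct {
    bn_kind kind;
    double a, b;   /* N_CONST: a = value; N_GAUSS: a = mean, b = stddev */
    int lhs, rhs;  /* N_ADD, N_GT: indices of earlier nodes */
} bn_node;

/* Approximate Gaussian draw (assumption): sum of 12 uniforms minus 6. */
double approx_gaussian(double mean, double stddev) {
    double sum = 0.0;
    for (int i = 0; i < 12; ++i)
        sum += (double)rand() / ((double)RAND_MAX + 1.0);
    return mean + stddev * (sum - 6.0);
}

/* Nodes must be in topological order: every operand index is smaller
   than the index of the node that uses it. One forward pass then
   produces a joint sample of all nodes in out[0..n-1]. */
void sample_network(const bn_node *nodes, int n, double *out) {
    for (int i = 0; i < n; ++i) {
        const bn_node *nd = &nodes[i];
        switch (nd->kind) {
        case N_CONST: out[i] = nd->a; break;
        case N_GAUSS: out[i] = approx_gaussian(nd->a, nd->b); break;
        case N_ADD:   out[i] = out[nd->lhs] + out[nd->rhs]; break;
        case N_GT:    out[i] = out[nd->lhs] > out[nd->rhs] ? 1.0 : 0.0; break;
        }
    }
}
```

For a network encoding (4.2 + G(0,1)) > 0.5, the last entry of out is 1.0 or 0.0 for that sample. Repeating the pass gives independent samples of the asserted expression without re-running the original program.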
Statistical verification: a passert is a statistical property, checked by the passert verifier.
Bayesian-network IR enables new optimizations
Sum of independent Gaussians: $X \sim G(\mu_X, \sigma_X^2)$, $Y \sim G(\mu_Y, \sigma_Y^2)$, $Z = X + Y \Rightarrow Z \sim G(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$
Uniform scaled by a constant (for $c > 0$): $X \sim U(a, b)$, $Y = cX \Rightarrow Y \sim U(ca, cb)$
Uniform compared with a constant: $X \sim U(a, b)$, $Y = (X \le c)$, $a \le c \le b \Rightarrow Y \sim B\left(\frac{c - a}{b - a}\right)$
Central Limit Theorem collapses large sums
$X_1, X_2, \ldots, X_n \sim D$, $Y = \sum_i X_i \Rightarrow Y \sim G(n\mu_D, n\sigma_D^2)$ (approximately, for large $n$ and independent draws)
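The rewrite rules above can be sketched as small functions on distribution parameters. This is an illustration under assumptions, not the authors' optimizer: the type and function names are hypothetical, Gaussians are stored as (mean, variance), and the sum rules assume independent operands.

```c
typedef struct { double mean, var; } gauss_dist;  /* G(mean, var) */
typedef struct { double lo, hi; } unif_dist;      /* U(lo, hi)   */

/* Z = X + Y for independent Gaussians: means and variances add. */
gauss_dist gauss_add(gauss_dist x, gauss_dist y) {
    gauss_dist z = { x.mean + y.mean, x.var + y.var };
    return z;
}

/* Y = c * X for uniform X and nonzero c; a negative c swaps the endpoints. */
unif_dist unif_scale(unif_dist x, double c) {
    unif_dist y = { c * x.lo, c * x.hi };
    if (c < 0.0) {
        double t = y.lo;
        y.lo = y.hi;
        y.hi = t;
    }
    return y;
}

/* Parameter of the Bernoulli Y = (X <= c) for X ~ U(lo, hi).
   Outside [lo, hi] the probability is clamped to 0 or 1. */
double unif_le_prob(unif_dist x, double c) {
    if (c <= x.lo) return 0.0;
    if (c >= x.hi) return 1.0;
    return (c - x.lo) / (x.hi - x.lo);
}

/* Central Limit Theorem approximation for the sum of n independent
   draws from a distribution with the given mean and variance. */
gauss_dist clt_sum(double mean, double var, int n) {
    gauss_dist y = { n * mean, n * var };
    return y;
}
```

For example, unif_le_prob applied to U(0, 10) and c = 2.5 gives (2.5 - 0) / (10 - 0) = 0.25, so the comparison node becomes a Bernoulli with parameter 0.25 and needs no sampling.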
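Verification via direct evaluation
Figure: a comparison (≤ c) over a sum of draws from D simplifies to a single Bernoulli node B. ✓
When the network reduces to a Bernoulli, its parameter can be compared with p directly.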
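Verification via hypothesis testing
Figure: a network that does not simplify to a single node (nodes D, G(0,1), +, ÷, >) is sampled, and a hypothesis test over the samples decides passert e, p, c. ✓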
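The slides name hypothesis testing without giving the test in this transcript, so the sketch below substitutes a deliberately simple, conservative one: a fixed sample size derived from Chebyshev's inequality, P(|freq - q| ≥ eps) ≤ 1 / (4 n eps²). This is an assumption made for illustration; the authors' verifier may use a different and tighter test. The names samples_needed and decide, and the tolerance parameter eps, are hypothetical.

```c
#include <stddef.h>

typedef enum { PASSERT_HOLDS, PASSERT_FAILS, PASSERT_INCONCLUSIVE } verdict;

/* Number of samples so that the observed frequency is within eps of the
   true probability with confidence c, using Chebyshev's inequality:
   1 / (4 n eps^2) <= 1 - c  =>  n >= 1 / (4 (1 - c) eps^2). */
size_t samples_needed(double eps, double c) {
    double x = 1.0 / (4.0 * (1.0 - c) * eps * eps);
    size_t n = (size_t)x;
    if ((double)n < x) ++n;  /* round up */
    return n;
}

/* Decide passert e, p, c from `hits` successes out of n samples,
   where n >= samples_needed(eps, c). */
verdict decide(size_t hits, size_t n, double p, double eps) {
    double freq = (double)hits / (double)n;
    if (freq - eps >= p) return PASSERT_HOLDS;  /* true probability above p */
    if (freq + eps < p)  return PASSERT_FAILS;  /* true probability below p */
    return PASSERT_INCONCLUSIVE;                /* too close to p to call */
}
```

With eps = 0.05 and c = 0.95 the bound asks for about 2,000 samples. An inconclusive verdict means the observed frequency lies within eps of p, so a smaller eps (and more samples) would be needed to decide.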
Implementation for LLVM & Clang
Probabilistic assertions for C and C++
Figure: toolchain .c → LLVM IR → LLVM IR → native code.
Baseline for comparison: a strawman stress-tester.
Probabilistic programs used in the evaluation
sensing: gpswalk
privacy: salary, salary-abs
approximate computing: kmeans, sobel, hotspot, inversek2j
Running time vs. stress testing
Figure: bar chart of time relative to baseline (axis 0.0 to 1.2), split into analysis and sampling time, for gpswalk, salary, salary-abs, kmeans, sobel, hotspot, inversek, and the harmonic mean. Bars: B = baseline, N = no statistical optimizations, O = a third configuration whose label is not recoverable from the slide text.
Verifier: 24× faster than the baseline on average. Mostly analysis time.
Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.