Symbolic Finite Automata, Margus Veanes, April 5, 2014, VSSE'14

SLIDE 1

Applications of Symbolic Finite Automata

Margus Veanes

VSSE'14, Grenoble, France, April 5, 2014

SLIDE 2

Overview

  • Are SFAs applicable to analysis of software evolution?
  • Symbolic Finite Automaton (SFA): automata modulo theories
  • Main properties: Boolean closed, succinct for large alphabets
  • Symbolic finite transducers: SFA with symbolic outputs
  • Current Applications

– Testing (unit, fuzz)
– Regex processing
– Web security
– SMT theory plugin
– Backend for MSO

  • Extensions

– look-ahead
– trees
– registers

SLIDE 3

Automata based analysis of software evolution?

  • Possible extension of graph based approaches

– SFAs are directed graphs
– In addition to structural properties such as cyclicity and rank, one can talk about regularity and language

  • Possible extension of FSA based approaches

– Not bound to a small finite alphabet
– The alphabet can be rich, possibly infinite

  • Brings in an aspect of model based analysis

– SFA can act as a model or oracle

SLIDE 4

Mile-high view

[Diagram: Software v1 evolves into Software v2. The traces of each version are monitored and used to learn SFA1 and SFA2, respectively.]

Questions: L(SFA1) = L(SFA2) ? trace ∈ L(SFA1) ?

SLIDE 5

Possible scenario

Prog.v1 = loop{t=now; critical_code; save(now-t)}
Prog.v2 = loop{t=now; critical_code_upd; save(now-t)}

SFA1: transitions labeled 0-255
regex: [\0-\xFF]+
trace: [56,150,500] ∉ L(SFA1)
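
The membership question in this scenario can be sketched in Python. This is a minimal illustration, assuming a trace is a list of integer timings and that SFA1 accepts exactly the nonempty sequences of values in 0..255 (the regex [\0-\xFF]+). The name `in_sfa1` is hypothetical.

```python
def in_sfa1(trace):
    """Return True if the trace is a nonempty sequence of values in 0..255."""
    # Guard carried by the transitions of SFA1: 0 <= x <= 255.
    guard = lambda x: 0 <= x <= 255
    return len(trace) > 0 and all(guard(x) for x in trace)

print(in_sfa1([56, 150, 200]))  # True
print(in_sfa1([56, 150, 500]))  # False: 500 violates the guard
```

A trace of the updated program that falls outside L(SFA1), such as the one containing 500, signals that the timing behavior has changed.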

SLIDE 6

Symbolic Finite Automaton (SFA)

  • Alphabet is an effective Boolean Algebra A
  • Labels are predicates over A

Symbolic transition between p and q: λx. 'a' ≤ x ≤ 'd'

  • One symbolic transition denotes many concrete transitions between p and q:

'a', 'b', 'c', 'd' (one for each x ∈ 〚'a' ≤ x ≤ 'd'〛)

SLIDE 7

SFA Execution Example

States p and q; transition guards even(x) and odd(x).

Input: 1 2 5 3
State sequence: p p q p p

p is final ⇒ accept the input
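
The run above can be replayed with a small Python sketch. The transition table is an assumption reconstructed from the state sequence on the slide: the moves p to p on odd, p to q on even, and q to p on odd are forced by the run, while the even self-loop on q is a guess, since the run never exercises it.

```python
def is_even(x):
    return x % 2 == 0

def is_odd(x):
    return x % 2 == 1

# (source, guard, target); the last entry is assumed, not shown by the run.
TRANSITIONS = [
    ("p", is_odd, "p"),
    ("p", is_even, "q"),
    ("q", is_odd, "p"),
    ("q", is_even, "q"),
]
INITIAL, FINAL = "p", {"p"}

def run(word):
    """Return the list of visited states, or None if no transition applies."""
    state, visited = INITIAL, [INITIAL]
    for x in word:
        targets = [t for (s, guard, t) in TRANSITIONS if s == state and guard(x)]
        if not targets:
            return None
        state = targets[0]  # guards leaving a state are disjoint here
        visited.append(state)
    return visited

def accepts(word):
    visited = run(word)
    return visited is not None and visited[-1] in FINAL

print(run([1, 2, 5, 3]))      # ['p', 'p', 'q', 'p', 'p']
print(accepts([1, 2, 5, 3]))  # True
```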

SLIDE 8

Alphabet: Effective Boolean Algebra

Domain D, predicates Ψ, denotation 〚·〛 : Ψ → 2^D

SLIDE 9

Alphabet SMT^int

  • D = Integers
  • Ψ = integer linear arithmetic formulas (with one fixed free variable)
  • 〚φ ∧ ψ〛 = 〚φ〛 ∩ 〚ψ〛
  • 〚⊥〛 = ∅, 〚¬φ〛 = D \ 〚φ〛
  • Satisfiability: 〚φ〛 ≠ ∅

SLIDE 10

Alphabet 2^{a,b}

Algebra: D = {a,b}, Ψ = {∅,{a},{b},{a,b}}, 〚·〛 = id, ∨ = ∪, ∧ = ∩, ⊤ = {a,b}, ⊥ = ∅, ¬ = set complement

regex: a*b(a|b)*

SFA over 2^{a,b}: states p and q; transition labels {a}, {b}, {a,b}

SLIDE 11

Alphabet 2^bvk

  • D = {n | 0 ≤ n < 2^k}
  • Ψ = BDDs of depth k
  • Boolean operations are BDD operations
  • Below 〚β_i〛 = {n ∈ D | i'th bit of n is 1}

β_i has fixed size independent of i

SLIDE 12

Boolean operations over SFAs

  • Intersection (product of transitions)

A1: p1 --φ1--> q1
A2: p2 --φ2--> q2
A1×A2: (p1,p2) --φ1∧φ2--> (q1,q2)

Delete the product transition when φ1∧φ2 is unsat.
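
The product of transitions can be sketched as follows. This is a simplified illustration, not the talk's implementation: guards are plain Python functions, satisfiability is decided by brute force over a small finite domain (an assumption; a real implementation would call a solver for the label theory), and the two automata A and B are hypothetical. Only the transition product is shown; reachability from initial states is ignored.

```python
from itertools import product

DOMAIN = range(12)  # assumption: finite domain so that sat is brute force

def psi(k):
    """psi_k(x) holds when x is divisible by k."""
    return lambda x: x % k == 0

def conj(f, g):
    return lambda x: f(x) and g(x)

def neg(f):
    return lambda x: not f(x)

def is_sat(f):
    return any(f(x) for x in DOMAIN)

def intersect(trans_a, trans_b):
    """Pair up transitions; drop pairs whose conjoined guard is unsat."""
    result = []
    for (p1, g1, q1), (p2, g2, q2) in product(trans_a, trans_b):
        guard = conj(g1, g2)
        if is_sat(guard):
            result.append(((p1, p2), guard, (q1, q2)))
    return result

# Hypothetical automata, given as lists of (source, guard, target).
A = [("a1", psi(2), "a2"), ("a2", psi(6), "a2")]
B = [("b1", psi(3), "b2"), ("b2", neg(psi(3)), "b2")]

for src, guard, dst in intersect(A, B):
    print(src, "->", dst, "witnesses:", [x for x in DOMAIN if guard(x)])
```

Of the four candidate pairs, the one with guard psi(6) ∧ ¬psi(3) has no witness and is deleted, so three product transitions remain.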

SLIDE 13

Boolean operations over SFAs

  • Complementation

(first determinize then swap final and nonfinal states)

[Diagram: an SFA with states p, q, r is determinized into an SFA over the state sets {p}, {q}, {q,r}, {r}; transitions with unsatisfiable guards are deleted.]

SLIDE 14

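Intersection example

let ψ_k(x) ≡ ((x mod k) = 0)

[Diagram: SFA A over states a1, a2 and SFA B over states b1, b2, with guards built from ψ_2, ψ_3 and ψ_6. The product A×B has states (a1,b1), (a1,b2), (a2,b2); the transition into (a2,b2) carries the guard ψ_2∧ψ_3, and a product transition whose guard is unsatisfiable is deleted (marked X).]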

SLIDE 15

Are SFAs a useful extension of classical automata?

  • Can classical automata theory and algorithms be extended to work modulo large (infinite) alphabets Σ?
  • The answer is nontrivial. For example:

– NFA determinization is O(|Σ| 2^n)
– DFA minimization is O(|Σ| n log n)

What happens when Σ is infinite?

SLIDE 16

Why care about symbolic representation at all?

  • Scalability.

– Explicit expansion is expensive even in the finite case (take e.g. ASCII, where |Σ| = 2^7)

  • String analysis

– typically Σ is UTF16, |Σ| = 2^16

  • Often characters are lifted to integers and arithmetic operations are used

  • List processing

– elements are integers or have composite types, such as tuples or lists

SLIDE 17

Perhaps SFAs reduce to NFAs?

  • Given an SFA, create an NFA whose characters are the minterms of the predicates occurring in the SFA
  • Minterms(φ, ψ) = {φ∧ψ, φ∧¬ψ, ¬φ∧ψ, ¬φ∧¬ψ} (keep satisfiable combinations only)
  • May blow up exponentially, e.g., the following SFA has 2^k minterms (alphabet 2^bvk)
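
Minterm generation can be sketched in a few lines of Python. This is an illustration under stated assumptions: predicates are Python functions, satisfiability is checked by enumerating a small finite domain rather than by a solver, and the function name `minterms` is hypothetical. The bit predicates stand in for the β_i of the 2^bvk alphabet.

```python
from itertools import product

def minterms(predicates, domain):
    """Return the satisfiable minterms as (signs, witnesses) pairs.

    signs[i] is True when predicate i occurs positively, False when negated.
    """
    result = []
    for signs in product([True, False], repeat=len(predicates)):
        witnesses = [x for x in domain
                     if all(p(x) == s for p, s in zip(predicates, signs))]
        if witnesses:  # keep satisfiable combinations only
            result.append((signs, witnesses))
    return result

k = 3
# beta_i(x): the i'th bit of x is 1
betas = [lambda x, i=i: ((x >> i) & 1) == 1 for i in range(k)]
print(len(minterms(betas, range(2 ** k))))  # 8, i.e. 2**k

# Dependent predicates give fewer minterms: divisibility by 6 implies by 2.
div = [lambda x: x % 2 == 0, lambda x: x % 6 == 0]
print(len(minterms(div, range(12))))  # 3
```

The k independent bit predicates produce all 2^k sign combinations, which is the exponential blow-up; the divisibility pair shows that unsatisfiable combinations are pruned.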

SLIDE 18

We also want output

  • ... transducers

SLIDE 19

Symbolic Finite Transducer (SFT)

  • Labels are guarded transformation functions

Concrete transitions between p and q (1920 transitions): '\x80' / "\xC2\x80" … '\x7FF' / "\xDF\xBF"

Symbolic transition between p and q: λx. 80₁₆ ≤ x ≤ 7FF₁₆ / [C0₁₆ | x(10,6), 80₁₆ | x(5,0)]

– the part before "/" is the guard
– the outputs use bitvector operations; x(i,j) denotes bits i down to j of x
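
The symbolic transition above can be checked with a short Python sketch. The function names `guard` and `output` are hypothetical; the code counts the concrete transitions that the single symbolic transition stands for and compares the output function with Python's built-in UTF-8 encoder.

```python
def guard(x):
    # 0x80 <= x <= 0x7FF: code points that need a two-byte UTF-8 encoding
    return 0x80 <= x <= 0x7FF

def output(x):
    # [C0 | bits 10..6 of x, 80 | bits 5..0 of x]
    return [0xC0 | (x >> 6), 0x80 | (x & 0x3F)]

concrete = [x for x in range(0x10000) if guard(x)]
print(len(concrete))  # 1920

# Cross-check every concrete transition against the built-in encoder.
assert all(bytes(output(x)) == chr(x).encode("utf-8") for x in concrete)
print(bytes(output(0x80)), bytes(output(0x7FF)))  # b'\xc2\x80' b'\xdf\xbf'
```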

SLIDE 20

SFT Execution Example

States p and q; transition labels even(x)/[x, x], odd(x)/[x-1], even(x)/[], odd(x)/[x-1].

Input tape: 1 2 5 3
State sequence: p p q p p
Output tape: 0 2 2 4 2
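
A small Python sketch can replay this run. The assignment of labels to transitions is an assumption reconstructed from the state sequence and the output: even(x)/[x, x] must leave p, since input 2 read in p yields "2 2", which leaves even(x)/[] for q; its target q is a guess. Under these assumptions the first step, odd(1)/[1-1], contributes a leading 0 to the output.

```python
def is_even(x):
    return x % 2 == 0

def is_odd(x):
    return x % 2 == 1

# (source, guard, output function, target); reconstructed, see lead-in.
TRANSITIONS = [
    ("p", is_odd,  lambda x: [x - 1], "p"),
    ("p", is_even, lambda x: [x, x],  "q"),
    ("q", is_odd,  lambda x: [x - 1], "p"),
    ("q", is_even, lambda x: [],      "q"),
]

def transduce(word, state="p"):
    """Return (visited states, output tape) for the given input tape."""
    states, output = [state], []
    for x in word:
        for src, guard, out, dst in TRANSITIONS:
            if src == state and guard(x):
                output.extend(out(x))
                state = dst
                states.append(state)
                break
        else:
            raise ValueError(f"no transition from {state} on {x}")
    return states, output

states, output = transduce([1, 2, 5, 3])
print(states)  # ['p', 'p', 'q', 'p', 'p']
print(output)  # [0, 2, 2, 4, 2]
```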

SLIDE 21

Some Applications of SFAs/SFTs

  • SFAs:

– Regex support in parameterized unit testing
– Password generation

  • SFTs:

– Analysis of string encoders/decoders
– Security analysis of sanitizers

SLIDE 22

Application 1: Regexes in parameterized unit testing

  • Rex component in Pex
  • Generate values for s that reach the return branches

– s is a string of Unicode characters (16-bit bit-vectors)

bool IsValidEmail(string s) {
    string r1 = @"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";
    string r2 = @"^\d.*$";
    if (System.Text.RegularExpressions.Regex.IsMatch(s, r1))
        if (System.Text.RegularExpressions.Regex.IsMatch(s, r2))
            return false; //branch 1
        else
            return true; //branch 2
    else
        return false; //branch 3
}

Branch 1, solve: s ∈ L(r1) ∩ L(r2) [e.g. s = "3@a.b"]
Branch 2, solve: s ∈ L(r1) \ L(r2) [e.g. s = "a@b.c"]
Branch 3, solve: s ∉ L(r1) [e.g. s = "a@..c"]
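
The three example solutions can be checked against the branches with a Python sketch. Python's `re` module is used here as a stand-in for the .NET regex engine (an assumption; the two engines agree on these patterns and inputs), and the helper `branch` is hypothetical. The sketch only verifies given solutions; it does not generate them the way Rex does.

```python
import re

R1 = r"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$"
R2 = r"^\d.*$"

def branch(s):
    """Return the number of the return branch that IsValidEmail(s) reaches."""
    if re.search(R1, s):
        if re.search(R2, s):
            return 1  # s in L(r1) and in L(r2)
        return 2      # s in L(r1) but not in L(r2)
    return 3          # s not in L(r1)

for s in ["3@a.b", "a@b.c", "a@..c"]:
    print(s, "-> branch", branch(s))
```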

SLIDE 23

Application 2: Password generation

Given constraints:

  • Length is k: "^[\x21-\x7E]{k}$"
  • Contains 2 capital letters: "[A-Z].*[A-Z]"
  • Contains a digit: "\d"
  • Contains a non-word character: "\W"

Generate random instances, uniformly distributed, that match all of the above conditions.

Example for k=4: http://www.rise4fun.com/Rex/4nE
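
The constraints can be tried out with a Python sketch. Note the swapped-in technique: this is plain rejection sampling over uniformly random strings of length k, which also yields a uniform distribution over the matching strings, and it is not the SFA-based generation that Rex performs. The names `matches_all` and `generate` are hypothetical.

```python
import random
import re

K = 4
CONSTRAINTS = [
    r"^[\x21-\x7E]{%d}$" % K,  # length is K, printable ASCII only
    r"[A-Z].*[A-Z]",           # contains 2 capital letters
    r"\d",                     # contains a digit
    r"\W",                     # contains a non-word character
]
ALPHABET = [chr(c) for c in range(0x21, 0x7F)]

def matches_all(s):
    return all(re.search(r, s) for r in CONSTRAINTS)

def generate(rng):
    """Draw uniform random strings until one satisfies every constraint."""
    while True:
        candidate = "".join(rng.choice(ALPHABET) for _ in range(K))
        if matches_all(candidate):
            return candidate

rng = random.Random(0)  # seeded only to make runs repeatable
print([generate(rng) for _ in range(3)])
```

Rejection sampling gets slow when the constraints are tight relative to the candidate space, which is one reason to work on the intersection automaton instead.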

SLIDE 24

Application 3: String analysis (motivating scenario)

req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/

Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php
Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

1) security check: req must not contain "../"
2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"

Result: access granted to "../../private/"

Analysis question: Does utf8decode reject overlong utf8-encodings such as "%C0%AE" for '.'?

SLIDE 25

Application 3 (cont.): SFA Example

  • Utf8 validator (for up to 2 octet encodings)

– Rejects invalid utf8 encoded strings

Regex Rutf8: ^([\x00-\x7F]|[\xC2-\xDF][\x80-\xBF])*$

Accepts "../../"
Rejects "..%C0%AF../"

SFA with states p and q; transition guards:

– λx. 0 ≤ x ≤ 7F₁₆
– λx. C2₁₆ ≤ x ≤ DF₁₆
– λx. 80₁₆ ≤ x ≤ BF₁₆
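
The validator regex can be exercised directly on byte strings. The sketch below uses Python's `re` module on bytes and `urllib.parse.unquote_to_bytes` to undo the percent-encoding; the helper name `is_valid_utf8_2` is hypothetical. It also shows that Python's own strict decoder rejects the overlong encoding of '.'.

```python
import re
from urllib.parse import unquote_to_bytes

# Rutf8: valid UTF-8 restricted to one- and two-octet encodings.
RUTF8 = re.compile(rb"^([\x00-\x7F]|[\xC2-\xDF][\x80-\xBF])*$")

def is_valid_utf8_2(percent_encoded):
    """Percent-decode the input and match the raw bytes against Rutf8."""
    return RUTF8.match(unquote_to_bytes(percent_encoded)) is not None

print(is_valid_utf8_2("../../"))        # True
print(is_valid_utf8_2("..%C0%AF../"))   # False: C0 is never a valid lead byte

# A strict decoder rejects the overlong form "%C0%AE" of '.'.
try:
    unquote_to_bytes("%C0%AE").decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
print(overlong_rejected)  # True
```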

SLIDE 26

Application 3 (cont.): Complete Rutf8

[Diagram of the complete Rutf8 automaton not preserved.]

SLIDE 27

Application 3 (cont.): Analysis scenario

  • Valid inputs

A = SFA(Rutf8)

  • Invalid inputs (attack vectors)

Ac = Complement(A)

  • Inputs accepted by Utf8Decode

D = Domain(Utf8Decode)

  • Does Utf8Decode accept an invalid input?

Ac ∩ D ≠ ∅ ? (e.g. "%c0%ae%c0%ae" ∈ D)

SLIDE 28

We also want to handle outputs

  • Want to analyze questions such as:

Does Utf8Encode produce a bad output?

∃x(Utf8Encode(x) ∈ Complement(SFA(Rutf8))) ?

  • SFA + outputs = SFT

SLIDE 29

SFT Example

  • Utf8 encoder

– Input: valid utf16 encoded string
– Output: equivalent utf8 encoded string

For example utf8encode("\uFF28\uFF29") = "\xEF\xBC\xA8\xEF\xBC\xA9"

  • The SFT has 5 states & 11 transitions
  • Equiv. classical transducer has 2^16 transitions

SLIDE 30

Bek (a frontend language for SFTs)

program smileycipher(w) {
  return iter(c in w) {
    case(true):
      yield(0xD83D, (c - 'a') + 0xDE00);
  };
}

http://www.rise4fun.com/Bek/ZH0

SLIDE 31

Why SFTs and SFAs?

  • They have good algebraic properties (POPL'12)

– SFTs are closed under composition
– Equivalence is decidable in the single-valued case
– The domain of an SFT is an SFA
– SFAs are closed under Boolean operations

  • Useful for various analysis tasks

SLIDE 32

Property analysis (USENIX Sec'11)

  • Does it matter if a sanitizer is applied twice? Idempotence: A ∘ A = A. If some "input string" gives different results through A ∘ A and through A, then A is not idempotent.
  • Does order of sanitizers matter? Commutativity: A ∘ B = B ∘ A. If some "input string" gives different results through A ∘ B and through B ∘ A, then A and B are not commutative.

SLIDE 33

Safety analysis

  • Algorithms for SFAs and SFTs: extensions of classical algorithms modulo the theory of the alphabet
  • Example: suppose good output = "catfree"

Catfree = [^\uDE38-\uDE40]*
bad output: ContainsACat = Complement(Catfree)

  • ∃x(smileycipher(x) ∈ ContainsACat) ?

Does there exist an input x that causes a "cat" in the output? Decided with a solver, via the set {x | smileycipher(x) ∈ ContainsACat}.

http://www.rise4fun.com/Bek/nDx
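
The safety question can be answered by brute force in a Python sketch. This is a stand-in for the solver-based analysis, not the Bek toolchain: `smileycipher` re-implements the Bek program on lists of 16-bit code units, the mask `& 0xFFFF` is an assumption about bit-vector wraparound, and the search enumerates single-character inputs only.

```python
def smileycipher(w):
    """Map each char c to the code units 0xD83D, (c - 'a') + 0xDE00."""
    out = []
    for c in w:
        out.append(0xD83D)
        out.append(((ord(c) - ord("a")) + 0xDE00) & 0xFFFF)  # assumed 16-bit
    return out

def contains_a_cat(units):
    # Complement of Catfree = [^\uDE38-\uDE40]*
    return any(0xDE38 <= u <= 0xDE40 for u in units)

witnesses = [c for c in range(0x10000) if contains_a_cat(smileycipher(chr(c)))]
print([hex(c) for c in witnesses])
# ['0x99', '0x9a', '0x9b', '0x9c', '0x9d', '0x9e', '0x9f', '0xa0', '0xa1']
```

The answer to the question is yes: any input containing a character in the range U+0099 to U+00A1 produces a code unit in the forbidden range.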

SLIDE 34

Extensions of SFAs and SFTs

  • ESFT

– Extended SFT with look-ahead (VMCAI’13, CAV'13)
– BEX compiler (PSI’14)

  • STT

– Symbolic tree transducers (PSI’11)
– STT with regular lookahead (PLDI’14)

  • ST

– SFT with registers (POPL’12), i.e. not finite state

SLIDE 35

Questions?

Links:

– Bek http://rise4fun.com/Bek/tiutorial
– Bex http://rise4fun.com/Bex/tutorial
– Rex http://rise4fun.com/rex/
– Fast http://rise4fun.com/Fast/tutorial