Symbolic Finite Automata (Margus Veanes, April 5, 2014, VSSE'14)


  1. Applications of Symbolic Finite Automata. Margus Veanes. April 5, 2014, VSSE'14, Grenoble, France

  2. Overview
     • Are SFAs applicable to the analysis of software evolution?
     • Symbolic Finite Automaton (SFA): automata modulo theories
     • Main properties: closed under Boolean operations, succinct for large alphabets Σ
     • Symbolic finite transducers: SFAs with symbolic outputs
     • Current applications
       – Testing (unit, fuzz)
       – Regex processing
       – Web security
       – SMT theory plugin
       – Backend for MSO
     • Extensions
       – look-ahead
       – trees
       – registers

  3. Automata based analysis of software evolution?
     • Possible extension of graph based approaches
       – SFAs are directed graphs
       – In addition to structural properties such as cyclicity and rank, one can talk about regularity and language
     • Possible extension of FSA based approaches
       – Not bound to a small finite alphabet
       – The alphabet can be rich, possibly infinite
     • Brings in an aspect of model based analysis
       – An SFA can act as a model or oracle

  4. Mile-high view
     [Diagram: traces of Software v1 are used to learn SFA₁, which can also monitor traces; the software evolves into v2, whose traces are used to learn SFA₂. Questions posed: L(SFA₁) = L(SFA₂)? trace ∈ L(SFA₁)?]

  5. Possible scenario
     • Prog.v1 = loop{ t = now; critical_code; save(now-t) }
     • Learned regex: [\0-\xFF]+, i.e. SFA₁ has a single guard 0-255
     • Prog.v2 = loop{ t = now; critical_code_upd; save(now-t) }
     • trace: [56, 150, 500] ∉ L(SFA₁)

  6. Symbolic Finite Automaton (SFA)
     • The alphabet is an effective Boolean algebra A
     • Labels are predicates over A
     • One symbolic transition, p → q with guard λx. 'a' ≤ x ≤ 'd', denotes many concrete transitions p → q, one for each x ∈ ⟦'a' ≤ x ≤ 'd'⟧: 'a', 'b', 'c', 'd'

  7. SFA Execution Example
     [Diagram: states p and q with transitions guarded by odd(x) and even(x)]
     • Input: 1 2 5 3
     • States visited: p p q p p
     • p is final ⇒ accept the input
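
The run on slide 7 can be replayed with a minimal Python sketch. The edge layout below (which guard sits on which edge) is an assumption inferred from the state sequence p p q p p; the names `Transition`, `run` and `accepts` are illustrative and not part of any SFA library.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Transition:
    source: str
    guard: Callable[[int], bool]  # one symbolic label stands for many characters
    target: str

def odd(x: int) -> bool:
    return x % 2 == 1

def even(x: int) -> bool:
    return x % 2 == 0

# Assumed layout, reconstructed from the run shown on the slide.
TRANSITIONS = [
    Transition("p", odd, "p"),
    Transition("p", even, "q"),
    Transition("q", even, "q"),
    Transition("q", odd, "p"),
]
FINAL = {"p"}

def run(word: List[int], start: str = "p") -> Optional[List[str]]:
    """Return the visited states, or None if no transition applies."""
    states = [start]
    for x in word:
        targets = [t.target for t in TRANSITIONS
                   if t.source == states[-1] and t.guard(x)]
        if not targets:
            return None
        states.append(targets[0])  # this particular SFA is deterministic
    return states

def accepts(word: List[int]) -> bool:
    states = run(word)
    return states is not None and states[-1] in FINAL
```

Calling `run([1, 2, 5, 3])` reproduces the state sequence from the slide, and `accepts([1, 2, 5, 3])` holds because the run ends in the final state p.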

  8. Alphabet: Effective Boolean Algebra
     [Diagram: predicates Ψ are mapped by ⟦·⟧ to subsets of the domain D, i.e. ⟦·⟧ : Ψ → 2^D]

  9. Alphabet SMT^int
     • D = integers
     • Ψ = integer linear arithmetic formulas (with one fixed free variable)
     • ⟦φ ∧ ψ⟧ = ⟦φ⟧ ∩ ⟦ψ⟧
     • ⟦⊥⟧ = ∅, ⟦¬φ⟧ = D \ ⟦φ⟧
     • Satisfiability: ⟦φ⟧ ≠ ∅

  10. Alphabet 2^{a,b}
     • D = {a,b}, Ψ = {∅, {a}, {b}, {a,b}}
     • ⊥ = ∅, ⊤ = {a,b}; Boolean operations are ∩, ∪ and complement; ⟦·⟧ = id
     • SFA over 2^{a,b}: p loops on {a}, p → q on {b}, q loops on {a,b}
     • regex: a*b(a|b)*

  11. Alphabet 2^bvk
     • D = {n | 0 ≤ n < 2^k}
     • Ψ = BDDs of depth k
     • Boolean operations are BDD operations
     • ⟦β_i⟧ = {n ∈ D | the i'th bit of n is 1}; the BDD β_i has fixed size, independent of i

  12. Boolean operations over SFAs
     • Intersection (product of transitions)
       – A₁ has a transition p₁ → q₁ with guard φ₁
       – A₂ has a transition p₂ → q₂ with guard φ₂
       – A₁ × A₂ has a transition (p₁,p₂) → (q₁,q₂) with guard φ₁ ∧ φ₂
       – Delete the transition when φ₁ ∧ φ₂ is unsatisfiable
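
The product rule on slide 12 can be sketched in a few lines of Python. As an assumption for illustration, guards are represented as subsets of a small finite domain (a powerset algebra like the one on slide 10), so conjunction is set intersection and satisfiability is non-emptiness. The automata `A` and `B` are hypothetical and are not the ones drawn on slide 14.

```python
from itertools import product

DOMAIN = frozenset(range(64))  # small finite stand-in for the alphabet

def multiples_of(k):
    """Guard 'x mod k = 0' as a subset of DOMAIN."""
    return frozenset(n for n in DOMAIN if n % k == 0)

def intersect(a1, a2):
    """Product of two SFAs given as lists of (source, guard, target)."""
    result = []
    for (p1, g1, q1), (p2, g2, q2) in product(a1, a2):
        guard = g1 & g2      # conjunction of guards
        if guard:            # keep the transition only if it is satisfiable
            result.append(((p1, p2), guard, (q1, q2)))
    return result

# Hypothetical example automata.
A = [("a1", multiples_of(2), "a2"),
     ("a2", DOMAIN - multiples_of(2), "a1")]
B = [("b1", multiples_of(3), "b2"),
     ("b2", multiples_of(4), "b1")]

AB = intersect(A, B)
```

Of the four candidate product transitions, the one combining "odd" with "multiple of 4" has an empty guard and is dropped, and the guard "multiple of 2 and of 3" coincides with "multiple of 6". The sketch does not prune product states that are unreachable from the initial state.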

  13. Boolean operations over SFAs
     • Complementation (first determinize, then swap final and nonfinal states)
     [Diagram: determinization of a state p with transitions to q and r yields the subset states {p}, {q}, {r} and {q,r}; transitions with unsatisfiable guards are deleted]

  14. Intersection example
     • Let φ_k(x) ≝ ((x mod k) = 0)
     [Diagram: SFA A over states a₁, a₂ and SFA B over states b₁, b₂, with guards built from φ₂, φ₃ and φ₆, together with their product A × B over pairs of states]

  15. Are SFAs a useful extension of classical automata?
     • Can classical automata theory and algorithms be extended to work modulo large (infinite) alphabets Σ?
     • The answer is nontrivial. For example:
       – NFA determinization is O(|Σ| 2^n)
       – DFA minimization is O(|Σ| n log n)
     • What happens when Σ is infinite?

  16. Why care about symbolic representation at all?
     • Scalability
       – Explicit expansion is expensive even in the finite case (take e.g. ASCII, where |Σ| = 2^7)
     • String analysis
       – Typically Σ is UTF16, |Σ| = 2^16
       – Characters are often lifted to integers so that arithmetic operations can be used
     • List processing
       – Elements are integers or have composite types, such as tuples or lists

  17. Perhaps reduce SFAs to NFAs?
     • Given an SFA, create an NFA whose characters are the minterms of the predicates occurring in the SFA
     • Minterms(φ, ψ) = {φ ∧ ψ, φ ∧ ¬ψ, ¬φ ∧ ψ, ¬φ ∧ ¬ψ} (keep satisfiable combinations only)
     • May blow up exponentially, e.g. [figure: an SFA over the alphabet 2^bvk that has 2^k minterms]
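
The minterm construction and its blow-up can be checked with a short Python sketch. As an assumption for illustration, predicates are again subsets of a finite domain; the bit predicates play the role of the β_i from the 2^bvk slide, and `K = 4` is an arbitrary small choice.

```python
def minterms(predicates, domain):
    """All satisfiable conjunctions that pick each predicate or its negation."""
    regions = [frozenset(domain)]
    for phi in predicates:
        refined = []
        for region in regions:
            for part in (region & phi, region - phi):
                if part:  # keep satisfiable combinations only
                    refined.append(part)
        regions = refined
    return regions

K = 4
DOMAIN = frozenset(range(2 ** K))  # k-bit bitvectors as integers

def bit(i):
    """Predicate 'the i-th bit of n is 1' as a subset of DOMAIN."""
    return frozenset(n for n in DOMAIN if (n >> i) & 1)

def multiples_of(k):
    return frozenset(n for n in DOMAIN if n % k == 0)

bit_minterms = minterms([bit(i) for i in range(K)], DOMAIN)
mod_minterms = minterms([multiples_of(2), multiples_of(3)], DOMAIN)
```

With k bit predicates every element of the domain ends up in its own minterm, so there are 2^k of them, while the two divisibility predicates give only four. The minterms always partition the domain, which is what lets them serve as the characters of a classical NFA.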

  18. We also want output
     • ... transducers

  19. Symbolic Finite Transducer (SFT)
     • Labels are guarded transformation functions
     • Symbolic transition p → q: guard λx. 80₁₆ ≤ x ≤ 7FF₁₆, output [C0₁₆ | x⟨10,6⟩, 80₁₆ | x⟨5,0⟩] (bitvector operations, where x⟨i,j⟩ extracts bits i down to j of x)
     • It stands for 1920 concrete transitions p → q: '\x80' / "\xC2\x80", …, '\x7FF' / "\xDF\xBF"
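
The single symbolic transition on slide 19 can be expanded and checked with a small Python sketch. The function names are illustrative; the bit extractions x⟨10,6⟩ and x⟨5,0⟩ are written as a shift and a mask, which is an interpretation of the slide's notation that matches its two concrete examples.

```python
def guard(x: int) -> bool:
    """Guard of the symbolic transition: 0x80 <= x <= 0x7FF."""
    return 0x80 <= x <= 0x7FF

def output(x: int) -> bytes:
    """Two-octet output [0xC0 | x<10,6>, 0x80 | x<5,0>]."""
    high = 0xC0 | (x >> 6)    # bits 10..6 of x (x has at most 11 bits here)
    low = 0x80 | (x & 0x3F)   # bits 5..0 of x
    return bytes([high, low])

def concrete_transitions() -> dict:
    """Expand the symbolic transition over all 16-bit characters."""
    return {x: output(x) for x in range(0x10000) if guard(x)}
```

Expanding the guard yields exactly the 1920 concrete transitions mentioned on the slide, and on this range `output(x)` agrees with Python's own `chr(x).encode("utf-8")`.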

  20. SFT Execution Example
     [Diagram: states p and q with transitions labeled odd(x)/[x-1], even(x)/[], even(x)/[x, x] and odd(x)/[x-1]]
     • Input tape: 1 2 5 3
     • States visited: p p q p p
     • Output tape: 0 2 2 4 2
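
The transducer run on slide 20 can be replayed in Python. The assignment of labels to edges below is an assumption, chosen because it is the layout that reproduces both the state sequence and the output tape from the slide; `RULES` and `transduce` are illustrative names.

```python
def odd(x):
    return x % 2 == 1

def even(x):
    return x % 2 == 0

# (source, guard, output function, target); layout inferred from the slide's run.
RULES = [
    ("p", odd,  lambda x: [x - 1], "p"),
    ("p", even, lambda x: [x, x],  "q"),
    ("q", even, lambda x: [],      "q"),
    ("q", odd,  lambda x: [x - 1], "p"),
]

def transduce(word, start="p"):
    """Return (visited states, output tape) for the given input tape."""
    state, states, out = start, [start], []
    for x in word:
        for src, guard, fn, dst in RULES:
            if src == state and guard(x):
                out.extend(fn(x))  # each guarded rule may emit 0..n symbols
                state = dst
                break
        else:
            raise ValueError(f"no transition from {state} on {x}")
        states.append(state)
    return states, out
```

Running `transduce([1, 2, 5, 3])` gives the states p p q p p and the output tape 0 2 2 4 2, as on the slide.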

  21. Some Applications of SFAs/SFTs
     • SFAs:
       – Regex support in parameterized unit testing
       – Password generation
     • SFTs:
       – Analysis of string encoders/decoders
       – Security analysis of sanitizers

  22. Application 1: Regexes in parameterized unit testing
     • Rex component in Pex
     • Generate values for s that reach the return branches
       – s is a string of Unicode characters (16-bit bit-vectors)

         bool IsValidEmail(string s) {
           string r1 = @"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";
           string r2 = @"^\d.*$";
           if (System.Text.RegularExpressions.Regex.IsMatch(s, r1))
             if (System.Text.RegularExpressions.Regex.IsMatch(s, r2))
               return false; // branch 1
             else
               return true;  // branch 2
           else
             return false;   // branch 3
         }

     • Branch 1: solve s ∈ L(r1) ∩ L(r2) [e.g. s = "3@a.b"]
     • Branch 2: solve s ∈ L(r1) \ L(r2) [e.g. s = "a@b.c"]
     • Branch 3: solve s ∉ L(r1) [e.g. s = "a@..c"]
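
The three witness strings on slide 22 can be checked against the two regexes with Python's `re` module. This is only a sketch that classifies a given string; it does not generate inputs the way the slide describes for Rex. The helper `branch` is an illustrative name, and .NET and Python regex semantics are assumed to agree on these two particular patterns.

```python
import re

R1 = re.compile(r"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$")
R2 = re.compile(r"^\d.*$")

def branch(s: str) -> int:
    """Return which return branch of IsValidEmail the string s reaches."""
    if R1.search(s):
        return 1 if R2.search(s) else 2
    return 3

def is_valid_email(s: str) -> bool:
    # Only branch 2 returns true in the slide's method.
    return branch(s) == 2
```

The string "3@a.b" matches both patterns (branch 1), "a@b.c" matches only r1 (branch 2), and "a@..c" fails r1 because no alphanumeric character follows the "@" (branch 3).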

  23. Application 2: Password generation
     Given constraints:
     • Length is k: "^[\x21-\x7E]{k}$"
     • Contains 2 capital letters: "[A-Z].*[A-Z]"
     • Contains a digit: "\d"
     • Contains a non-word character: "\W"
     Generate random instances, uniformly distributed, that match all of the above conditions.
     k=4: http://www.rise4fun.com/Rex/4nE
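
The four constraints can be combined into a simple membership check in Python. This sketch only tests whether a candidate lies in the intersection of the four regex languages; it does not perform the uniform random generation described on the slide. The function names are illustrative, and Python's `re` semantics are assumed to match the intended ones for ASCII input.

```python
import re

def constraints(k: int):
    """The slide's four regex constraints, instantiated for length k."""
    return [
        re.compile(r"^[\x21-\x7E]{%d}$" % k),  # exactly k printable ASCII chars
        re.compile(r"[A-Z].*[A-Z]"),           # at least two capital letters
        re.compile(r"\d"),                     # a digit
        re.compile(r"\W"),                     # a non-word character
    ]

def satisfies_all(s: str, k: int) -> bool:
    """True if s is in the intersection of all four languages."""
    return all(c.search(s) for c in constraints(k))
```

For k = 4, "AB1!" satisfies every constraint, whereas "Ab1!" has only one capital letter and "ABC1" has no non-word character.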

  24. Application 3: String analysis (motivating scenario)
     • req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/
     • 1) security check: req must not contain "../"
     • 2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"
       – access granted to "../../private/"
     • Analysis question: does utf8decode reject overlong utf8-encodings such as "%C0%AE" for '.'?
     • Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php
     • Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

  25. Application 3 (cont.): SFA Example
     • Utf8 validator (for up to 2 octet encodings)
       – Rejects invalid utf8 encoded strings
     • Regex R_utf8: ^([\x00-\x7F]|[\xC2-\xDF][\x80-\xBF])*$
     • SFA: p loops on λx. 0 ≤ x ≤ 7F₁₆; p → q on λx. C2₁₆ ≤ x ≤ DF₁₆; q → p on λx. 80₁₆ ≤ x ≤ BF₁₆
     • Accepts "../../"
     • Rejects "..%C0%AF../"
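
The two-state validator can be written down directly with interval guards. This Python sketch assumes p is both the initial and the only final state, which is what the regex R_utf8 implies; the percent-escapes in the slide's examples are taken to stand for the raw octets.

```python
# (source, low, high, target): the guard is low <= x <= high
TRANSITIONS = [
    ("p", 0x00, 0x7F, "p"),  # one-octet encoding
    ("p", 0xC2, 0xDF, "q"),  # lead octet of a two-octet encoding
    ("q", 0x80, 0xBF, "p"),  # continuation octet
]

def is_valid_utf8_2(data: bytes) -> bool:
    """Accept octet strings matching R_utf8 (up to 2-octet encodings)."""
    state = "p"
    for x in data:
        for src, low, high, dst in TRANSITIONS:
            if src == state and low <= x <= high:
                state = dst
                break
        else:
            return False  # no guard is satisfied: reject
    return state == "p"
```

The octet C0₁₆ falls outside every guard leaving p, so the overlong encoding in "..%C0%AF../" is rejected, while plain "../../" is accepted.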

  26. Application 3 (cont.): Complete R_utf8 [figure]

  27. Application 3 (cont.): Analysis scenario
     • Valid inputs: A = SFA(R_utf8)
     • Invalid inputs (attack vectors): Aᶜ = Complement(A)
     • Inputs accepted by Utf8Decode: D = Domain(Utf8Decode)
     • Does Utf8Decode accept an invalid input? Aᶜ ∩ D ≠ ∅? (e.g. "%c0%ae%c0%ae" ∈ D)

  28. We also want to handle outputs
     • Want to analyze questions such as: does Utf8Encode produce a bad output?
       – ∃x (Utf8Encode(x) ∈ Complement(SFA(R_utf8)))?
     • SFA + outputs = SFT

  29. SFT Example
     • Utf8 encoder
       – Input: valid utf16 encoded string
       – Output: equivalent utf8 encoded string
     • For example, utf8encode("\uFF28\uFF29") = "\xEF\xBC\xA8\xEF\xBC\xA9"
     • The SFT has 5 states and 11 transitions; the equivalent classical transducer has 2^16 transitions

  30. Bek (a frontend language for SFTs)

         program smileycipher(w) {
           return iter (c in w) {
             case (true):
               yield (0xD83D, (c - 'a') + 0xDE00);
           };
         }

     http://www.rise4fun.com/Bek/ZH0
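
For readers without access to Bek, the effect of `smileycipher` can be approximated in Python. This is a sketch, not a translation validated against the Bek tool: it assumes the two yielded values are UTF-16 code units (a high and a low surrogate) and that the input consists of lowercase ASCII letters, so that the second unit stays within the low-surrogate range.

```python
def smileycipher(w: str) -> str:
    """Map each character c to the surrogate pair (0xD83D, (c - 'a') + 0xDE00)."""
    units = []
    for c in w:
        units.append(0xD83D)                          # fixed high surrogate
        units.append((ord(c) - ord("a")) + 0xDE00)    # low surrogate depends on c
    # Assemble the 16-bit code units and decode them as UTF-16.
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le")
```

Under these assumptions 'a' maps to U+1F600, 'b' to U+1F601, and so on, so every input character becomes one emoticon code point.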
