modular static analysis of string manipulations

Modular static analysis of string manipulations in C programs

Matthieu Journault, Antoine Miné, Abdelraouf Ouadjaout. August 30, 2018. This work is supported by the European Research Council under Consolidator Grant Agreement 681393.


  1. Modular static analysis of string manipulations in C programs

     Matthieu Journault, Antoine Miné, Abdelraouf Ouadjaout
     August 30, 2018

     This work is supported by the European Research Council under Consolidator Grant Agreement 681393 – MOPSA.

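  2. Introductory example

     Program 1: strcpy

         while (*q != '\0') {
             *p = *q;
             p++;
             q++;
         }
         *p = *q;

     [Figure: buffer dest of allocated size dest_a, with p at offset o_p; buffer src of length src_l, with q at offset o_q. The bytes of src before index src_l are ≠ '\0', the byte at index src_l is '\0', and the bytes after it are unknown.]

     No out-of-bounds access occurs if src_l − o_q < dest_a − o_p, src_l ≥ o_q, and src_l < src_a.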
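The safety condition of the introductory example can be read as a runtime check. The sketch below is only an illustration: the names `strcpy_in_bounds` and `checked_copy` are hypothetical, sizes are taken as `size_t`, and src_l is assumed to be the index of the first '\0' in src (or src_a when there is none).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the slide's sufficient condition for Program 1.
 *   dest_a: allocated size of dest      o_p: offset of p inside dest
 *   src_a : allocated size of src       o_q: offset of q inside src
 *   src_l : index of the first '\0' in src */
static bool strcpy_in_bounds(size_t dest_a, size_t o_p,
                             size_t src_a, size_t src_l, size_t o_q)
{
    if (src_l >= src_a) return false;  /* src_l < src_a: the terminator is inside src */
    if (src_l < o_q)    return false;  /* src_l >= o_q: q has not passed the terminator */
    if (o_p > dest_a)   return false;  /* guards the unsigned subtraction below */
    /* src_l - o_q bytes are copied, then the terminator itself */
    return (src_l - o_q) < (dest_a - o_p);
}

/* Program 1 guarded by the check; returns false without copying when unsafe. */
static bool checked_copy(char *dest, size_t dest_a, size_t o_p,
                         const char *src, size_t src_a, size_t o_q)
{
    assert(dest != NULL && src != NULL);
    size_t src_l = 0;
    while (src_l < src_a && src[src_l] != '\0') src_l++;
    if (!strcpy_in_bounds(dest_a, o_p, src_a, src_l, o_q)) return false;

    char *p = dest + o_p;
    const char *q = src + o_q;
    while (*q != '\0') {
        *p = *q;
        p++;
        q++;
    }
    *p = *q;
    return true;
}
```

The static analysis aims to establish the same inequality at analysis time, over all executions, instead of testing it on each call.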

  3. Modular analysis?

     Goal: infer a summary of the strcpy function so that a top-down analysis does not have to reanalyze its body.

     Program 2: strcpy

         char* strcpy(char* p, char* q) {
             while (*q != '\0') {
                 *p = *q;
                 p++;
                 q++;
             }
             *p = *q;
         }

     Problems:
     • What do pointers point to?
     • What are the aliasing patterns?

  4. Table of contents

     1. Cell abstract domain
     2. String abstraction
     3. Going modular
     4. Implementation
     5. Conclusion

  5. Language

     Figure 1: Syntax of the language

         lval       ≜ * scalarType expr | v                      (v ∈ 𝒱)
         expr       ≜ cst                                        (cst ∈ ℕ)
                    | & lval
                    | expr ⋄ expr                                (⋄ ∈ {+, ≤, ...})
         stmt       ≜ v = malloc(e)                              (v ∈ 𝒱, e ∈ expr)
                    | type v                                     (v ∈ 𝒱)
                    | ···

         intType    ≜ s8 | s16 | s32 | s64 | u8 | u16 | u32 | u64
         scalarType ≜ intType | ptr
         type       ≜ scalarType
                    | type[n]                                    (n ∈ ℕ)
                    | struct { u_0 : type, ..., u_{n−1} : type }
                    | union { u_0 : type, ..., u_{n−1} : type }

     • array, union type, struct type
     • string
     • dynamic allocation
     • pointer arithmetic
     • pointer dereference with arbitrary types

     Both low-level (pointer casts) and high-level (string length) considerations are involved.

  6. Cell abstract domain

  7. Cell definition

     Byte-level memory representation:

         u:  0    0    1    2
         v:  'a'  'b'  'c'  '\0'  'n'

     Cell memory representation:

         ⟨u, 0, u32⟩ = 258
         ⟨v, 0, u8⟩  = 'a'
         ⟨v, 1, u8⟩  = 'b'
         ⟨v, 2, u8⟩  = 'c'
         ⟨v, 3, u8⟩  = '\0'
         ⟨v, 4, u8⟩  = 'n'

     Cell definition:

         Cell ≜ { ⟨V, o, t⟩ | V ∈ 𝒱, t ∈ scalarType, 0 ≤ o ≤ sizeof(typeof(V)) − sizeof(t) }
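A cell can be pictured as a typed view on a range of bytes of one variable. The sketch below is a hypothetical encoding (the `cell` struct and both functions are not from the slides). It assumes big-endian byte order, which is what makes the bytes 0 0 1 2 of u read as 258.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of a cell <V, o, t>. */
typedef struct {
    const uint8_t *base;   /* bytes of the variable V */
    size_t var_size;       /* sizeof(typeof(V)) */
    size_t offset;         /* o */
    size_t type_size;      /* sizeof(t): 1 for u8, 4 for u32 */
} cell;

/* Membership test of the definition: 0 <= o <= sizeof(typeof(V)) - sizeof(t). */
static bool cell_is_valid(cell c)
{
    return c.type_size <= c.var_size && c.offset <= c.var_size - c.type_size;
}

/* Reads the unsigned value of a valid cell, most significant byte first
 * (assumption: big-endian layout). */
static uint32_t cell_read(cell c)
{
    assert(cell_is_valid(c) && c.type_size <= 4);
    uint32_t value = 0;
    for (size_t i = 0; i < c.type_size; i++)
        value = (value << 8) | c.base[c.offset + i];
    return value;
}
```

Reading ⟨u, 0, u32⟩ over the bytes 0 0 1 2 gives 1 × 256 + 2 = 258, while ⟨v, 2, u32⟩ is rejected because only three bytes of v remain at offset 2.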

  8. Pointers and numerical values

     Pointer representation:
     • a set of all possible base variables pointed to (⊆ 𝒱): P_C ≜ C|_ptr → ℘(𝒱 ∪ {NULL, invalid})
     • a numerical variable coding for the offset of the pointer

     Example (abstract states shown between statements):

         ⟨∅, ⊤, ∅⟩
         u32 a=1;
         ⟨{a ≜ ⟨a, 0, u32⟩}, a ↦ 1, ∅⟩
         u32* p=&a;
         ⟨{a, p ≜ ⟨p, 0, ptr⟩}, (a ↦ 1, p ↦ 0), {p ↦ {a}}⟩

  9. Abstraction

     How should pointer offsets and numerical cell values be abstracted? Use a numerical domain to express constraints on both, including potentially relational constraints between pointer offsets and numerical variable values.

     N♯_C: a numerical domain over the cell set C.

         D♯_m ≜ { ⟨C, R♯, P⟩ | C ⊆ Cell, R♯ ∈ N♯_C, P ∈ P_C }

     Remarks:
     • Dynamic set of cells
     • Recency abstraction used for dynamic memory allocations

  10. String abstraction

  11. Introduction

      Domain presentation:
      • A set V ⊆ 𝒱 of string variables.
      • For s ∈ V, s_l and s_a denote the length and the allocated size of buffer s.
      • Enrich the numerical domain to account for the length and allocated size of buffers.
      • Partition memory into zones handled by the cell domain and zones handled by the string domain.

      Example:

          char[3] s;
          int a = 0;
          s[a] = 'u';
          s[a+1] = '\0';

      Resulting abstract state:

          ⟨{a ≜ ⟨a, 0, s32⟩}, (a = 0, s_l = 1, s_a = 3), ∅⟩
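On a concrete buffer, the pair (s_l, s_a) can be computed directly, which gives an intuition for what the string domain tracks. This is a minimal sketch with hypothetical names (`string_summary`, `summarize`). Reporting s_l = s_a when no '\0' is present is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical concrete counterpart of the abstract pair (s_l, s_a). */
typedef struct {
    size_t length;      /* s_l: index of the first '\0' (assumed s_a if none) */
    size_t allocated;   /* s_a: allocated size of the buffer in bytes */
} string_summary;

static string_summary summarize(const char *buf, size_t allocated)
{
    assert(buf != NULL);
    string_summary s = { allocated, allocated };
    for (size_t i = 0; i < allocated; i++) {
        if (buf[i] == '\0') {
            s.length = i;   /* first terminator found */
            break;
        }
    }
    return s;
}
```

For the example of this slide, the buffer holds 'u' then '\0', so the summary is length 1 and allocated size 3, matching s_l = 1 and s_a = 3.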

  12. Computable Galois connection with the Cell abstract domain

      The translation functions to_cell(s, S♯) and from_cell(s, S♯) are computable. They move the handling of a string s from the string domain to the cell domain, and from the cell domain to the string domain, respectively.

      Example:

          String domain state:

              ⟨∅, (s_l = 9, s_a = 15), ∅⟩

          to_cell(s, ·) maps it to, and from_cell(s, ·) maps it back from, the cell domain state:

              ⟨{s_0 ≜ ⟨s, 0, u8⟩, ..., s_9 ≜ ⟨s, 9, u8⟩, ..., s_15 ≜ ⟨s, 15, u8⟩},
               (s_0 ≥ 1, s_0 ≤ 255, s_1 ≥ 1, s_1 ≤ 255, ..., s_9 = 0),
               ∅⟩
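The two translations can be sketched over per-byte value ranges. Everything below is illustrative: `byte_range`, `length_range`, and both function bodies are assumptions, s_l is taken to be known exactly in `to_cell`, and one range is produced per index 0 .. s_a − 1.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int lo, hi; } byte_range;        /* possible values of one u8 cell */
typedef struct { size_t lo, hi; } length_range;   /* possible values of s_l */

/* to_cell sketch: expands an exact (s_l, s_a) into one range per byte. */
static void to_cell(size_t s_l, size_t s_a, byte_range *cells)
{
    assert(cells != NULL && s_l < s_a);
    for (size_t i = 0; i < s_a; i++) {
        if (i < s_l)       cells[i] = (byte_range){ 1, 255 };  /* before the terminator */
        else if (i == s_l) cells[i] = (byte_range){ 0, 0 };    /* the terminator */
        else               cells[i] = (byte_range){ 0, 255 };  /* unconstrained */
    }
}

/* from_cell sketch: s_l is at least the first index whose cell may be 0,
 * and at most the first index whose cell must be 0 (s_a if there is none). */
static length_range from_cell(const byte_range *cells, size_t s_a)
{
    assert(cells != NULL);
    length_range l = { s_a, s_a };
    for (size_t i = 0; i < s_a; i++)
        if (cells[i].lo <= 0 && 0 <= cells[i].hi) { l.lo = i; break; }
    for (size_t i = 0; i < s_a; i++)
        if (cells[i].lo == 0 && cells[i].hi == 0) { l.hi = i; break; }
    return l;
}
```

Applying `to_cell` and then `from_cell` to s_l = 9, s_a = 15 recovers the length exactly, which mirrors the round trip of the example.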

  13. Operators and transformers

      Operators: unify strings and cells, and rely on the numerical domain operators.

      Transformers: definitions are only needed for:
      • S♯⟦s[e_1] = e_2⟧(S♯) where s ∈ V
      • E♯⟦s[e]⟧(S♯) where s ∈ V

      Remarks on the analyzer:
      • The analyzer provides dynamic expression transformations.
      • Evaluations yield a disjunctive form on expressions and abstract states (℘(expr × D♯)).

  14. Evaluation

          E♯⟦s[e]⟧(S♯) = ⋃_{(tests, eval) ∈ table} (eval, ⟦tests⟧(S♯))

          case     tests on offset            evaluation
          before   0 ≤ e ∧ e < l ∧ e < a      [1; 255]
          at       0 ≤ e ∧ e = l ∧ e < a      0
          after    0 ≤ e ∧ e > l ∧ e < a      [0; 255]
          error    e > a ∨ e < 0              ∅

      [Figure: buffer of allocated size a whose bytes before index l are ≠ 0, whose byte at index l is 0, and whose later bytes are unknown; the three regions are labelled before, at, and after.]
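For a concrete offset e, the evaluation table reduces to a case split. The sketch below uses hypothetical names (`interval`, `classify`, `eval_read`) and assumes l and a are known exactly with 0 ≤ l ≤ a. It also treats e = a as an error, which is an assumption: index a lies outside a buffer of size a, and no other row of the table covers it.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed interval of byte values; empty models the error row. */
typedef struct { bool empty; int lo; int hi; } interval;

typedef enum { CASE_BEFORE, CASE_AT, CASE_AFTER, CASE_ERROR } eval_case;

/* Classifies offset e against length l and allocated size a. */
static eval_case classify(long e, long l, long a)
{
    assert(0 <= l && l <= a);
    if (e < 0 || e >= a) return CASE_ERROR;   /* assumption: e == a is an error too */
    if (e < l)  return CASE_BEFORE;
    if (e == l) return CASE_AT;
    return CASE_AFTER;
}

/* Possible values of s[e], following the evaluation column of the table. */
static interval eval_read(long e, long l, long a)
{
    switch (classify(e, l, a)) {
    case CASE_BEFORE: return (interval){ false, 1, 255 };
    case CASE_AT:     return (interval){ false, 0, 0 };
    case CASE_AFTER:  return (interval){ false, 0, 255 };
    default:          return (interval){ true, 0, 0 };
    }
}
```

In the abstract domain e, l, and a are not single values, so several rows can apply at once; this is why the evaluation returns a disjunction of (expression, state) pairs rather than one interval.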

  15. Transformation

          S♯⟦s[e_1] = e_2⟧(S♯) = ⋃_{(tests, transf) ∈ table} (⟦transf⟧⟦tests⟧(S♯))

          case   tests on offsets                   tests on rhs   transformation
          set0   e_1 ≥ 0 ∧ e_1 ≤ l ∧ e_1 < a        e_2 = 0        l ← e_1

      [Figure: buffer of allocated size a with bytes ≠ 0 before index l, 0 at index l, and unknown bytes after; the set0 case writes 0 at or before index l.]

  16. Transformation

          S♯⟦s[e_1] = e_2⟧(S♯) = ⋃_{(tests, transf) ∈ table} (⟦transf⟧⟦tests⟧(S♯))

          case      tests on offsets                   tests on rhs   transformation
          set0      e_1 ≥ 0 ∧ e_1 ≤ l ∧ e_1 < a        e_2 = 0        l ← e_1
          setnon0   e_1 ≥ 0 ∧ e_1 = l ∧ e_1 < a        e_2 ≠ 0        l ← [e_1 + 1; a]

      [Figure: buffer of allocated size a with bytes ≠ 0 before index l, 0 at index l, and unknown bytes after; the setnon0 case writes a non-zero byte at index l.]

  17. Transformation

          S♯⟦s[e_1] = e_2⟧(S♯) = ⋃_{(tests, transf) ∈ table} (⟦transf⟧⟦tests⟧(S♯))

          case        tests on offsets                   tests on rhs   transformation
          set0        e_1 ≥ 0 ∧ e_1 ≤ l ∧ e_1 < a        e_2 = 0        l ← e_1
          setnon0     e_1 ≥ 0 ∧ e_1 = l ∧ e_1 < a        e_2 ≠ 0        l ← [e_1 + 1; a]
          unchanged   e_1 ≥ 0 ∧ e_1 < l ∧ e_1 < a        e_2 ≠ 0        (none)
          ...         ...                                ...            ...

      [Figure: buffer of allocated size a with bytes ≠ 0 before index l, 0 at index l, and unknown bytes after; the unchanged case writes a non-zero byte before index l.]
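The three rows shown in the transformation table can be sketched for concrete e_1, e_2 and an exactly known length l. The names `length_result` and `assign_byte` are hypothetical. The rows elided on the slide are deliberately not guessed: the sketch reports them as unhandled.

```c
#include <assert.h>
#include <stdbool.h>

/* Interval of possible string lengths after the write;
 * handled is false for the rows the slide elides. */
typedef struct { bool handled; long lo; long hi; } length_result;

/* Effect of s[e1] = e2 on the length, for exact length l and allocated size a. */
static length_result assign_byte(long e1, long e2, long l, long a)
{
    assert(0 <= l && l <= a);
    bool in_bounds = (e1 >= 0 && e1 < a);
    if (in_bounds && e1 <= l && e2 == 0)
        return (length_result){ true, e1, e1 };      /* set0: l <- e1 */
    if (in_bounds && e1 == l && e2 != 0)
        return (length_result){ true, e1 + 1, a };   /* setnon0: l <- [e1 + 1; a] */
    if (in_bounds && e1 < l && e2 != 0)
        return (length_result){ true, l, l };        /* unchanged */
    return (length_result){ false, 0, 0 };           /* rows not shown on the slide */
}
```

The setnon0 row loses precision on purpose: once the terminator is overwritten, the next '\0' can be anywhere after e_1, so the new length is only known to lie in [e_1 + 1; a].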

  18. Example

      Program 3: strcpy

          while (*q != '\0') {
              *p = *q;
              p++;
              q++;
          }
          *p = *q;

      Abstract state before the loop:

          ⟨{p, q}, (p = 0, q = 0, 0 ≤ s_l < s_a, 0 ≤ t_l < t_a), {p ↦ s, q ↦ t}⟩

      Abstract state inside the loop:

          ⟨{p, q}, (−p + q = 0, s_l ≥ p, t_l ≥ p + 1, ...), {p ↦ s, q ↦ t}⟩

      Abstract state after the final assignment:

          ⟨{p, q}, (t_l = s_l, q = s_l, s_l ≥ 0, s_a ≥ s_l + 1, ...), {p ↦ s, q ↦ t}⟩

  19. Going modular

  20. Avoid losing precision, gain scalability

      Goals:
      • Function analysis should be done with call-site information (pointer aliasing, variable ranges): top-down analysis.
      • In a classic top-down analysis, function calls are inlined.
      • Use the analysis of the function body to infer a summary.
      • Replace further analyses of the function body by uses of the summary.

      Incrementation function

      Program 4: incr

          int incr(int x) {
              return (x+1);
          }

      • The semantics of incr cannot be represented exactly by a list of tabulated input/output pairs.
      • ⇒ Use a relational domain: ⟦{x = x′ + 1}⟧ = λx. x + 1

  21. Modular analysis

      In practice, summaries are a set {(F_0, R_0), (F_1, R_1), ...} such that:
      • F_i is a precondition (an abstract element)
      • R_i is a postcondition (an abstract relation).

      When encountering a function call made from an abstract state S♯:
      • If there exists some F_i such that S♯ ⊑ F_i, return ⟦R_i⟧(S♯).
      • Otherwise, if the number of summaries is low, perform a relational analysis of the function body starting from S♯ (yielding R) and store the newly found (S♯, R) relation.
      • Otherwise, choose some summary (F_i, R_i), perform a relational analysis of the function body starting from F_i ▽ S♯ (yielding R), and store the newly found (F_i ▽ S♯, R) relation.
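The three cases of the summary lookup can be sketched with intervals as abstract states. This is a toy model under several assumptions: relations are restricted to the form ret = x + k, `analyze_body` is a stand-in that always returns the relation of incr, the table holds at most two summaries, widening jumps to large fixed bounds instead of infinity, and in the third case the widened summary replaces summary 0. None of these choices come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { long lo, hi; } interval;               /* abstract state: range of x */
typedef struct { long k; } relation;                    /* abstract relation: ret = x + k */
typedef struct { interval pre; relation post; } summary;

enum { MAX_SUMMARIES = 2 };                             /* assumed bound on the table */
typedef struct {
    summary items[MAX_SUMMARIES];
    size_t count;
    size_t body_analyses;                               /* how often the body was analyzed */
} table;

/* Inclusion test: S is included in F. */
static bool leq(interval s, interval f) { return f.lo <= s.lo && s.hi <= f.hi; }

/* Interval widening of F by S: unstable bounds jump to a fixed extreme. */
static interval widen(interval f, interval s)
{
    interval w = f;
    if (s.lo < f.lo) w.lo = -1000000;                   /* stand-in for -infinity */
    if (s.hi > f.hi) w.hi = 1000000;                    /* stand-in for +infinity */
    return w;
}

/* Stand-in for the relational analysis of the body of incr. */
static relation analyze_body(table *t, interval pre)
{
    (void)pre;
    t->body_analyses++;
    return (relation){ 1 };
}

static interval apply(relation r, interval s)
{
    return (interval){ s.lo + r.k, s.hi + r.k };
}

/* Handles one call made from abstract state s. */
static interval call(table *t, interval s)
{
    assert(t != NULL && s.lo <= s.hi);
    for (size_t i = 0; i < t->count; i++)               /* case 1: reuse a summary */
        if (leq(s, t->items[i].pre)) return apply(t->items[i].post, s);

    if (t->count < MAX_SUMMARIES) {                     /* case 2: analyze from s */
        relation r = analyze_body(t, s);
        t->items[t->count++] = (summary){ s, r };
        return apply(r, s);
    }
    interval f = widen(t->items[0].pre, s);             /* case 3: analyze from widened input */
    relation r = analyze_body(t, f);
    t->items[0] = (summary){ f, r };
    return apply(r, s);
}
```

Because the postcondition is a relation rather than a set of output values, applying a stored summary to a smaller input state keeps the precision of that input: a call from [2, 5] returns [3, 6] even though the summary was computed from [0, 10].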

  22. Not just numerical

      Incrementation function

          ...
          incr(p)
          ...
          incr(q)
          ...
          void incr(int* x) {
              *x = *x + 1;
              return;
          }

      Abstract state at the call incr(p):

          ⟨{p, a, b}, (a ≥ 0, p = 0), {p ↦ a}⟩

      Abstract state at the call incr(q):

          ⟨{q, c}, (c = 0, q = 0), {q ↦ c}⟩

  23. Not just numerical

      Incrementation function

          ...
          incr(p)
          ...
          incr(q)
          ...
          void incr(int* x) {
              *x = *x + 1;
              return;
          }

      Abstract state at the call incr(p):

          ⟨{p, a, b}, (a ≥ 0, p = 0), {p ↦ a}⟩

      Generalized input for incr(p):

          ⟨{p, α}, (α ≥ 0, p = 0), {p ↦ α}⟩

      Abstract state at the call incr(q):

          ⟨{q, c}, (c = 0, q = 0), {q ↦ c}⟩

      Input generalization:
      • Remove useless information (unreachable blocks) from the input.
      • Universally quantify some memory blocks (α here is a symbolic cell) ⇒ framing.

  24. Relations on structured states

          void incr(int* x) {
              *x = *x + 1;
              return;
          }

      with filter:

          ⟨{x, α}, (α ≥ 0, x = 0), {x ↦ α}⟩

      Discover relations of the form (S_i, N, S_o):

          S_i = {x, α}, {x ↦ α}
          N   = {α′ ≥ 0, x = 0, α = α′ + 1, x = x′}
          S_o = {x, α}, {x ↦ α}

      ⇒ The relation can be reused for other calls to incr.
