 
Modular static analysis of string manipulations in C programs
Matthieu Journault, Antoine Miné, Abdelraouf Ouadjaout (a)
August 30, 2018
(a) This work is supported by the European Research Council under Consolidator Grant Agreement 681393 – MOPSA.
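Introductory example

Program 1: strcpy
    while (*q != '\0') {
      *p = *q;
      p++;
      q++;
    }
    *p = *q;

(Figure: p points at offset $o_p$ inside dest, whose allocated size is $dest_a$, so $dest_a - o_p$ bytes remain; q points at offset $o_q$ inside src, whose bytes are ≠ '\0' up to the first '\0' at offset $src_l$, so $src_l - o_q$ characters are copied before the terminator.)

Is the copy safe? There is no out-of-bounds access if $src_l - o_q < dest_a - o_p$, $src_l \ge o_q$ and $src_l < src_a$.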
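The safety condition can be read as a predicate over the five quantities in the figure. The sketch below assumes that o_p and o_q are non-negative byte offsets and that every quantity fits in a long; the name strcpy_in_bounds is hypothetical and not part of the analyzer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: sufficient condition for Program 1 to run without
 * out-of-bounds accesses. p points at offset o_p of a buffer of allocated
 * size dest_a; q points at offset o_q of a buffer of allocated size src_a
 * whose first '\0' is at offset src_l. Offsets are assumed >= 0. */
bool strcpy_in_bounds(long dest_a, long o_p, long src_a, long src_l, long o_q) {
    /* q must not start past the terminator, otherwise the loop scans
     * bytes that the length src_l says nothing about */
    bool q_before_end = src_l >= o_q;
    /* the terminator must lie inside the source allocation */
    bool src_terminated = src_l < src_a;
    /* src_l - o_q characters plus the '\0' are written from o_p on,
     * so they fit iff o_p + (src_l - o_q) + 1 <= dest_a */
    bool dest_fits = src_l - o_q < dest_a - o_p;
    return q_before_end && src_terminated && dest_fits;
}
```

For instance, copying "hello" (src_l = 5, src_a = 6) from offset 0 into a 6-byte buffer at offset 0 satisfies 5 − 0 < 6 − 0, while a 5-byte destination fails the last conjunct.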
Modular analysis?
Goal: infer a summary of the strcpy function to avoid reanalyzing it during a top-down analysis.

Program 2: strcpy
    char* strcpy(char* p, char* q) {
      while (*q != '\0') {
        *p = *q;
        p++;
        q++;
      }
      *p = *q;
    }

Problems:
• What do the pointers point to?
• What are the aliasing patterns?
Table of contents
1. Cell abstract domain
2. String abstraction
3. Going modular
4. Implementation
5. Conclusion
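Language

Figure 1: Syntax of the language
$\mathit{lval} \triangleq *_{\mathit{scalarType}}\ \mathit{expr} \mid v \qquad v \in \mathcal{V}$
$\mathit{expr} \triangleq \mathit{cst} \mid \&\,\mathit{lval} \mid \mathit{expr} \diamond \mathit{expr} \qquad \mathit{cst} \in \mathbb{N},\ \diamond \in \{+, \le, \dots\}$
$\mathit{stmt} \triangleq v = \mathtt{malloc}(e) \mid \mathit{type}\ v \mid \cdots \qquad v \in \mathcal{V},\ e \in \mathit{expr}$
$\mathit{intType} \triangleq \mathtt{s8} \mid \mathtt{s16} \mid \mathtt{s32} \mid \mathtt{s64} \mid \mathtt{u8} \mid \mathtt{u16} \mid \mathtt{u32} \mid \mathtt{u64}$
$\mathit{scalarType} \triangleq \mathit{intType} \mid \mathtt{ptr}$
$\mathit{type} \triangleq \mathit{scalarType} \mid \mathit{type}[n] \mid \mathtt{struct}\{u_0 : \mathit{type}, \dots, u_{n-1} : \mathit{type}\} \mid \mathtt{union}\{u_0 : \mathit{type}, \dots, u_{n-1} : \mathit{type}\} \qquad n \in \mathbb{N}$

The language covers:
• arrays, union types, struct types
• strings
• dynamic allocation
• pointer arithmetic
• pointer dereference with arbitrary types
It therefore involves both low-level (pointer casts) and high-level (string length) considerations.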
Cell abstract domain
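Cell definition

Byte-level memory representation:
  u: 0 | 0 | 1 | 2   (the four bytes of 258)
  v: 'a' | 'b' | 'c' | '\0' | 'n'

Cell memory representation:
  $\langle u, 0, \mathtt{u32}\rangle = 258$
  $\langle v, 0, \mathtt{u8}\rangle = $ 'a', $\langle v, 1, \mathtt{u8}\rangle = $ 'b', $\langle v, 2, \mathtt{u8}\rangle = $ 'c', $\langle v, 3, \mathtt{u8}\rangle = $ '\0', $\langle v, 4, \mathtt{u8}\rangle = $ 'n'

Cell definition:
$\mathit{Cell} \triangleq \{\langle V, o, t\rangle \mid V \in \mathcal{V},\ t \in \mathit{scalarType},\ 0 \le o \le \mathtt{sizeof}(\mathtt{typeof}(V)) - \mathtt{sizeof}(t)\}$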
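The membership condition of Cell can be checked mechanically. A minimal sketch, assuming a 64-bit target where ptr occupies 8 bytes and identifying a base variable only by its name and size; the type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar types of Figure 1. */
typedef enum { S8, S16, S32, S64, U8, U16, U32, U64, PTR } scalar_type;

/* Byte sizes; 8 bytes for ptr is an assumption (64-bit target). */
static size_t scalar_size(scalar_type t) {
    switch (t) {
    case S8:  case U8:  return 1;
    case S16: case U16: return 2;
    case S32: case U32: return 4;
    case S64: case U64: case PTR: return 8;
    }
    return 0; /* not reached for valid enum values */
}

/* A cell <V, o, t>; the base variable is described by name and size only. */
typedef struct {
    const char *var;   /* base variable V */
    size_t var_size;   /* sizeof(typeof(V)) */
    size_t offset;     /* o (non-negative by construction) */
    scalar_type type;  /* t */
} cell;

/* 0 <= o <= sizeof(typeof(V)) - sizeof(t), written as an addition so that
 * a type larger than V does not cause unsigned underflow. */
bool cell_is_valid(cell c) {
    return c.offset + scalar_size(c.type) <= c.var_size;
}
```

With u of size 4 and v of size 5 as in the figure, <u, 0, u32> and <v, 4, u8> are cells, whereas <v, 2, u32> is not, since 2 + 4 > 5.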
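Pointers and numerical values

Pointer representation:
• a set of all possible base variables pointed to ($\subseteq \mathcal{V}$):
  $\mathcal{P}_{\mathcal{C}} \triangleq \mathcal{C}|_{\mathtt{ptr}} \to \wp(\mathcal{V} \cup \{\mathtt{NULL}, \mathtt{invalid}\})$
• a numerical variable coding for the offset of the pointer.

Example:
    u32 a = 1;
    u32* p = &a;
Abstract states:
  initially: $\langle \emptyset, \top, \emptyset\rangle$
  after the declaration of a: $\langle \{a \triangleq \langle a, 0, \mathtt{u32}\rangle\}, a \mapsto 1, \emptyset\rangle$
  after the declaration of p: $\langle \{a, p \triangleq \langle p, 0, \mathtt{ptr}\rangle\}, \{a \mapsto 1, p \mapsto 0\}, \{p \mapsto \{a\}\}\rangle$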
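A sketch of the pointer representation: a set of possible bases paired with an offset. To stay self-contained it encodes bases as a bitmask over a hypothetical fixed universe (a, b, NULL, invalid) and keeps the offset as an interval, whereas the slides keep the offset as a variable of the numerical domain, which may be relational.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed universe of bases, one bit each. */
#define BASE_A       0x1u
#define BASE_B       0x2u
#define BASE_NULL    0x4u
#define BASE_INVALID 0x8u

/* Abstract pointer: possible bases and an interval for the byte offset
 * (an interval stands in for the numerical domain of the slides). */
typedef struct {
    uint32_t bases;
    long off_lo, off_hi;
} abs_ptr;

/* p = &v : exactly one base, at offset 0 */
abs_ptr addr_of(uint32_t base) {
    abs_ptr p = { base, 0, 0 };
    return p;
}

/* p + k, with k already scaled to bytes: bases unchanged, offset shifted */
abs_ptr ptr_add(abs_ptr p, long k) {
    p.off_lo += k;
    p.off_hi += k;
    return p;
}

/* join, e.g. after an if/else: union of bases, hull of offsets */
abs_ptr ptr_join(abs_ptr x, abs_ptr y) {
    abs_ptr r;
    r.bases = x.bases | y.bases;
    r.off_lo = x.off_lo < y.off_lo ? x.off_lo : y.off_lo;
    r.off_hi = x.off_hi > y.off_hi ? x.off_hi : y.off_hi;
    return r;
}
```

The join loses the link between which base was taken and which offset applies; keeping the offset in the numerical domain, as the slides do, additionally lets it appear in relational constraints with other cells.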
Abstraction

Pointer offsets and numerical cell values? Use a numerical domain to express constraints on both (potentially relational constraints between pointer offsets and numerical variable values).
$\mathcal{N}^\sharp_{C}$: a numerical domain over the cell set $C$.
$\mathcal{D}^\sharp_m \triangleq \{\langle C, R^\sharp, P\rangle \mid C \subseteq \mathit{Cell},\ R^\sharp \in \mathcal{N}^\sharp_C,\ P \in \mathcal{P}_C\}$

Remarks:
• Dynamic set of cells
• Recency abstraction used for dynamic memory allocations
String abstraction
Introduction

Domain presentation:
• A set $V \subseteq \mathcal{V}$ of string variables.
• For $s \in V$: $s_l$ and $s_a$ denote the length and the allocated size of buffer s.
• Enrich the numerical domain to account for the length and allocated size of buffers.
• Partition memory into zones handled either by the cell domain or by the string domain.

Example:
    char[3] s;
    int a = 0;
    s[a] = 'u';
    s[a+1] = '\0';
Resulting abstract state: $\langle \{a \triangleq \langle a, 0, \mathtt{s32}\rangle\}, \{a = 0,\ s_l = 1,\ s_a = 3\}, \emptyset\rangle$
Here a is handled by the cell domain and s by the string domain.
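On a concrete buffer, s_l is the offset of the first '\0' and s_a is the buffer size. A sketch replaying the example above; treating a buffer with no '\0' as having a length equal to its size is our own convention, not something stated on the slide, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* String summary of one buffer: length s_l and allocated size s_a. */
typedef struct { size_t len; size_t alloc; } str_abs;

/* len is the offset of the first '\0' in buf[0..n); by our convention
 * len == alloc means "no terminator in bounds". */
str_abs abstract_string(const char *buf, size_t n) {
    str_abs r = { n, n };
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            r.len = i;
            break;
        }
    }
    return r;
}
```

Running the four statements of the example on a concrete char s[3] and calling abstract_string(s, sizeof s) gives len = 1 and alloc = 3, i.e. s_l = 1 and s_a = 3.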
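Computable Galois connection with the cell abstract domain

Translation functions $\mathit{to\_cell}(s, S^\sharp)$ (resp. $\mathit{from\_cell}(s, S^\sharp)$): computable functions that move the handling of string s from the string domain to the cell domain (resp. from the cell domain to the string domain).

Example, for a buffer s with $s_l = 9$ and $s_a = 15$:
$\mathit{to\_cell}(s, \langle \emptyset, \{s_l = 9,\ s_a = 15\}, \emptyset\rangle) = \langle \{s_0 \triangleq \langle s, 0, \mathtt{u8}\rangle, \dots, s_{14} \triangleq \langle s, 14, \mathtt{u8}\rangle\}, \{1 \le s_0 \le 255, \dots, 1 \le s_8 \le 255,\ s_9 = 0,\ 0 \le s_{10} \le 255, \dots, 0 \le s_{14} \le 255\}, \emptyset\rangle$
and $\mathit{from\_cell}(s, \cdot)$ maps the right-hand state back to the left-hand one.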
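The two translation functions can be sketched with intervals standing in for the numerical domain, assuming the length is known exactly when moving to cells. from_cell recovers a length interval: at least the number of leading bytes that cannot be 0, at most the offset of the first byte that must be 0 (the size if there is none). Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Interval for one byte cell <s, i, u8>. */
typedef struct { int lo, hi; } itv;

/* to_cell sketch: from an exact length len <= alloc, constrain each byte. */
void to_cell(size_t len, size_t alloc, itv *cells) {
    for (size_t i = 0; i < alloc; i++) {
        if (i < len)       cells[i] = (itv){ 1, 255 }; /* before: non-zero   */
        else if (i == len) cells[i] = (itv){ 0, 0 };   /* the terminator     */
        else               cells[i] = (itv){ 0, 255 }; /* after: any byte    */
    }
}

/* from_cell sketch: bounds on the offset of the first '\0'. */
void from_cell(const itv *cells, size_t alloc, size_t *len_lo, size_t *len_hi) {
    size_t lo = 0;
    while (lo < alloc && cells[lo].lo >= 1)  /* bytes that cannot be 0 */
        lo++;
    size_t hi = lo;
    while (hi < alloc && !(cells[hi].lo == 0 && cells[hi].hi == 0))
        hi++;                                /* first byte that must be 0 */
    *len_lo = lo;
    *len_hi = hi;
}
```

On the slide's example (s_l = 9, s_a = 15) the round trip returns exactly [9, 9]; once cell 3 may hold 0, from_cell can only conclude s_l ∈ [3, 9].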
Operators and transformers

Operators: unify strings and cells, and rely on the numerical domain operators.

Transformers: only the following need dedicated definitions:
• $\mathbb{S}^\sharp[\![\, s[e_1] = e_2 \,]\!](S^\sharp)$ where $s \in V$
• $\mathbb{E}^\sharp[\![\, s[e] \,]\!](S^\sharp)$ where $s \in V$

Remarks on the analyzer:
• The analyzer provides dynamic expression transformations.
• Evaluations yield a disjunctive form over expressions and abstract states ($\wp(\mathit{expr} \times \mathcal{D}^\sharp)$).
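Evaluation

$\mathbb{E}^\sharp[\![\, s[e] \,]\!](S^\sharp) = \{\, (\mathit{eval}, [\![\mathit{tests}]\!](S^\sharp)) \mid (\mathit{tests}, \mathit{eval}) \in \mathit{table} \,\}$

case   | tests on offset                     | evaluation
before | $0 \le e \land e < l \land e < a$   | $[1; 255]$
at     | $0 \le e \land e = l \land e < a$   | $0$
after  | $0 \le e \land e > l \land e < a$   | $[0; 255]$
error  | $e \ge a \lor e < 0$                | $\emptyset$

Here $l = s_l$ and $a = s_a$.
(Figure: buffer s with nonzero bytes before offset l, '\0' at offset l, and unknown bytes after it up to a.)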
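A sketch of the evaluation table that returns one disjunct per feasible case, assuming a constant offset e, a constant allocated size a, and an interval for the length (the slides use a general, possibly relational, numerical domain). Each disjunct carries the value of s[e] and the length interval refined by that case's tests; all names are hypothetical.

```c
#include <assert.h>

typedef enum { BEFORE, AT, AFTER, ERROR } eval_case;

/* One disjunct: case, value interval of s[e] (empty for ERROR),
 * and the length interval after applying the case's tests. */
typedef struct {
    eval_case kind;
    int val_lo, val_hi;
    long l_lo, l_hi;
} disjunct;

/* Sketch of E#[s[e]] with constant e and a, length in [l_lo, l_hi].
 * Writes the feasible disjuncts to out and returns how many there are. */
int eval_char(long e, long a, long l_lo, long l_hi, disjunct out[3]) {
    int n = 0;
    if (e < 0 || e >= a) {                     /* error: out of bounds */
        out[n++] = (disjunct){ ERROR, 0, -1, l_lo, l_hi };
        return n;
    }
    if (l_hi > e)                              /* before: e < l */
        out[n++] = (disjunct){ BEFORE, 1, 255, l_lo > e + 1 ? l_lo : e + 1, l_hi };
    if (l_lo <= e && e <= l_hi)                /* at: e = l */
        out[n++] = (disjunct){ AT, 0, 0, e, e };
    if (l_lo < e)                              /* after: e > l */
        out[n++] = (disjunct){ AFTER, 0, 255, l_lo, l_hi < e - 1 ? l_hi : e - 1 };
    return n;
}
```

With l ∈ [0, 9] and e = 2, the three in-bounds cases are all feasible and refine l to [3, 9], [2, 2] and [0, 1] respectively; e = a = 10 yields only the error case.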
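Transformation

$\mathbb{S}^\sharp[\![\, s[e_1] = e_2 \,]\!](S^\sharp) = \bigsqcup_{(\mathit{tests}, \mathit{transf}) \in \mathit{table}} [\![\mathit{transf}]\!]([\![\mathit{tests}]\!](S^\sharp))$

case      | tests on offsets                            | tests on rhs | transformation
set0      | $e_1 \ge 0 \land e_1 \le l \land e_1 < a$   | $e_2 = 0$    | $l \leftarrow e_1$
setnon0   | $e_1 \ge 0 \land e_1 = l \land e_1 < a$     | $e_2 \ne 0$  | $l \leftarrow [e_1 + 1; a]$
unchanged | $e_1 \ge 0 \land e_1 < l \land e_1 < a$     | $e_2 \ne 0$  | (none)
...       | ...                                         | ...          | ...

(Figure: buffer s with nonzero bytes before offset l, '\0' at offset l, and unknown bytes after it up to a.)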
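The transformer can be sketched in the same simplified setting: constant e1 and a, an interval for l, and only the knowledge of whether e2 is zero. The rows with e1 > l are elided on the slide; the sketch assumes they leave l unchanged, since a write past the first '\0' cannot move it. The result is the join of the feasible rows; names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Length interval, possibly empty. */
typedef struct { long lo, hi; bool empty; } litv;

static litv join(litv x, long lo, long hi) {
    if (x.empty) return (litv){ lo, hi, false };
    return (litv){ lo < x.lo ? lo : x.lo, hi > x.hi ? hi : x.hi, false };
}

/* Sketch of S#[s[e1] = e2] on the length of s. Returns false in the
 * out-of-bounds case (no post-state), otherwise writes the new interval. */
bool assign_char(long e1, bool e2_is_zero, long a, long l_lo, long l_hi, litv *out) {
    if (e1 < 0 || e1 >= a) return false;
    litv r = { 0, 0, true };
    if (e2_is_zero && e1 <= l_hi)                   /* set0: l <- e1 */
        r = join(r, e1, e1);
    if (!e2_is_zero && l_lo <= e1 && e1 <= l_hi)    /* setnon0: l <- [e1+1; a] */
        r = join(r, e1 + 1, a);
    if (!e2_is_zero && e1 < l_hi)                   /* unchanged, refined to l > e1 */
        r = join(r, l_lo > e1 + 1 ? l_lo : e1 + 1, l_hi);
    if (e1 > l_lo)              /* e1 > l: elided rows, assumed to keep l */
        r = join(r, l_lo, l_hi < e1 - 1 ? l_hi : e1 - 1);
    *out = r;
    return true;
}
```

Replaying the earlier example char[3] s with an unknown initial length l ∈ [0, 3]: s[0] = 'u' gives l ∈ [1, 3], then s[1] = '\0' gives l = 1, matching s_l = 1.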
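Example

Program 3: strcpy
    while (*q != '\0') {
      *p = *q;
      p++;
      q++;
    }
    *p = *q;

Inferred abstract states (s is the destination, t the source; in the numerical part, p and q stand for the offsets of the pointers):
  before the loop: $\langle \{p, q\}, \{p = 0,\ q = 0,\ 0 \le s_l < s_a,\ 0 \le t_l < t_a\}, \{p \mapsto s, q \mapsto t\}\rangle$
  inside the loop: $\langle \{p, q\}, \{-p + q = 0,\ s_l \ge p,\ t_l \ge p + 1, \dots\}, \{p \mapsto s, q \mapsto t\}\rangle$
  after the final assignment: $\langle \{p, q\}, \{t_l = s_l,\ q = s_l,\ s_l \ge 0,\ s_a \ge s_l + 1, \dots\}, \{p \mapsto s, q \mapsto t\}\rangle$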
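The reported relations can be sanity-checked on a concrete run. A hypothetical harness that executes Program 3 with integer offsets p and q in place of the pointers and asserts each relation at the matching point; unlike the analysis, a single execution says nothing about other inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the first '\0' in b[0..n), or n if there is none. */
static size_t bounded_len(const char *b, size_t n) {
    size_t i = 0;
    while (i < n && b[i] != '\0') i++;
    return i;
}

/* Runs Program 3 on s (size s_a) and t (size t_a), checking the relations
 * from the slide; returns the final length of s. */
size_t strcpy_checked(char *s, size_t s_a, const char *t, size_t t_a) {
    size_t p = 0, q = 0;                    /* p = 0, q = 0 */
    size_t t_l = bounded_len(t, t_a);
    assert(bounded_len(s, s_a) < s_a);      /* 0 <= s_l < s_a */
    assert(t_l < t_a);                      /* 0 <= t_l < t_a */
    while (t[q] != '\0') {
        assert(p < s_a);                    /* write stays in bounds */
        s[p] = t[q];
        assert(q == p);                     /* -p + q = 0 */
        assert(bounded_len(s, s_a) >= p);   /* s_l >= p */
        assert(t_l >= p + 1);               /* t_l >= p + 1 */
        p++;
        q++;
    }
    assert(p < s_a);
    s[p] = t[q];                            /* copies the '\0' */
    size_t s_l = bounded_len(s, s_a);
    assert(t_l == s_l && q == s_l);         /* t_l = s_l, q = s_l */
    assert(s_a >= s_l + 1);                 /* s_a >= s_l + 1 */
    return s_l;
}
```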
Going modular
Avoid losing precision, gain scalability

Goals:
• Analyze functions with call-site information (pointer aliasing, variable ranges): top-down analysis.
• In a classic top-down analysis, function calls are inlined.
• Use the analysis of a function body to infer a summary.
• Replace further analyses of the function body by uses of the summary.

Incrementation function

Program 4: incr
    int incr(int x) {
      return (x+1);
    }

• The semantics of incr cannot be represented exactly by a list of tabulated input/output pairs.
• ⇒ use a relational domain: $[\![\{x = x' + 1\}]\!] = \lambda x.\, x + 1$, where $x'$ denotes the input value of x.
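A single relation covers all inputs at once. A minimal sketch, assuming the relation has the shape x = x' + d and that inputs are described by intervals over mathematical integers (overflow is ignored); the names are hypothetical.

```c
#include <assert.h>

/* Input/output relation x = x' + d, where x' is the input value. */
typedef struct { long d; } shift_rel;

typedef struct { long lo, hi; } interval;

/* Image of the input interval [lo, hi] under the relation. */
interval apply_shift(shift_rel r, interval in) {
    interval out = { in.lo + r.d, in.hi + r.d };
    return out;
}
```

The incr summary is d = 1: the input interval [0, 9] maps to [1, 10] without one table entry per input value.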
Modular analysis

In practice, summaries are sets $\{(F_0, R_0), (F_1, R_1), \dots\}$ such that:
• $F_i$ is a precondition (an abstract element);
• $R_i$ is a postcondition (an abstract relation).

When encountering a function call made from an abstract state $S^\sharp$:
• If some $F_i$ satisfies $S^\sharp \sqsubseteq F_i$, return $[\![R_i]\!](S^\sharp)$.
• Otherwise, if the number of summaries is low, perform a relational analysis of the function body starting from $S^\sharp$ (yielding $R$) and store the new pair $(S^\sharp, R)$.
• Otherwise, choose some summary $(F_i, R_i)$, perform a relational analysis of the function body starting from $F_i \mathbin{\triangledown} S^\sharp$ (yielding $R$) and store the new pair $(F_i \mathbin{\triangledown} S^\sharp, R)$.
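The call-handling policy can be sketched for incr with a one-variable interval domain. Assumptions, all ours: the relational body analysis is replaced by a stub returning x = x' + 1, "low" means fewer than 2 summaries, the summary chosen for generalization is the first one, and the widened entry overwrites it (sound here because F_0 ∇ S♯ contains F_0).

```c
#include <assert.h>
#include <limits.h>

typedef struct { long lo, hi; } itv;          /* abstract state of x */
typedef struct { itv pre; long d; } summary;  /* (F_i, R_i), R_i : x = x' + d */

#define MAX_SUMMARIES 2                       /* assumed "low" threshold */

static summary table[MAX_SUMMARIES];
static int n_summaries = 0;
static int body_analyses = 0;                 /* relational analyses of the body */

/* a is included in b */
static int included(itv a, itv b) { return b.lo <= a.lo && a.hi <= b.hi; }

/* interval widening f ∇ s, LONG_MIN/LONG_MAX standing for infinities */
static itv widen(itv f, itv s) {
    itv r = { s.lo < f.lo ? LONG_MIN : f.lo, s.hi > f.hi ? LONG_MAX : f.hi };
    return r;
}

/* stub for the relational analysis of incr's body: always x = x' + 1 */
static long analyze_body(itv from) { (void)from; body_analyses++; return 1; }

/* apply R : x = x' + d to s, keeping infinite bounds infinite */
static itv apply(long d, itv s) {
    itv r = { s.lo == LONG_MIN ? LONG_MIN : s.lo + d,
              s.hi == LONG_MAX ? LONG_MAX : s.hi + d };
    return r;
}

itv call_incr(itv s) {
    for (int i = 0; i < n_summaries; i++)
        if (included(s, table[i].pre))        /* S ⊑ F_i: reuse R_i */
            return apply(table[i].d, s);
    if (n_summaries < MAX_SUMMARIES) {        /* few summaries: analyze from S */
        summary fresh = { s, analyze_body(s) };
        table[n_summaries++] = fresh;
        return apply(fresh.d, s);
    }
    itv f = widen(table[0].pre, s);           /* generalize (F_0, R_0) */
    summary gen = { f, analyze_body(f) };
    table[0] = gen;
    return apply(gen.d, s);
}
```

Calling from [0, 10], then [2, 5], then [20, 30] triggers two body analyses (the second call reuses the first summary); a fourth call from [−5, 0] widens [0, 10] to [LONG_MIN, 10] and triggers a third.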
Not just numerical

Incrementation function
    ...
    incr(p);
    ...
    incr(q);
    ...
    void incr(int* x) {
      *x = *x + 1;
      return;
    }

State at the call incr(p): $\langle \{p, a, b\}, \{a \ge 0,\ p = 0\}, \{p \mapsto a\}\rangle$
Generalized input for this call: $\langle \{p, \alpha\}, \{\alpha \ge 0,\ p = 0\}, \{p \mapsto \alpha\}\rangle$
State at the call incr(q): $\langle \{q, c\}, \{c = 0,\ q = 0\}, \{q \mapsto c\}\rangle$

Input generalization:
• Remove useless information (unreachable blocks) from the input.
• Universally quantify some memory blocks ($\alpha$ here is a symbolic cell) ⇒ framing.
Relations on structured states

    void incr(int* x) {
      *x = *x + 1;
      return;
    }
with filter: $\langle \{x, \alpha\}, \{\alpha \ge 0,\ x = 0\}, \{x \mapsto \alpha\}\rangle$

Discover relations of the form $(S_i, N, S_o)$:
$S_i = \langle \{x, \alpha\}, \{x \mapsto \alpha\}\rangle$
$N = \{\alpha' \ge 0,\ x = 0,\ \alpha = \alpha' + 1,\ x = x'\}$
$S_o = \langle \{x, \alpha\}, \{x \mapsto \alpha\}\rangle$
⇒ The relation can be reused for other calls to incr.
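Reusing the summary at a call site amounts to checking the filter and instantiating the symbolic cell α with the cell the argument points to; every other cell is framed out and left untouched. A sketch with intervals, where the caller supplies the pointer offset and the target cell; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { long lo, hi; } itv;

/* Caller-side view of the argument: offset of the pointer and the
 * interval of the int cell it points to (the cell bound to alpha). */
typedef struct { itv offset; itv *target; } call_arg;

/* Summary of incr over the symbolic cell alpha:
 *   filter    alpha' >= 0, x = 0
 *   relation  alpha = alpha' + 1, x = x'
 * Returns false when the caller's state is not included in the filter. */
bool apply_incr_summary(call_arg arg) {
    bool offset_ok = arg.offset.lo == 0 && arg.offset.hi == 0;
    bool alpha_ok = arg.target->lo >= 0;
    if (!offset_ok || !alpha_ok)
        return false;           /* summary not applicable at this call */
    arg.target->lo += 1;        /* alpha = alpha' + 1 on the bound cell */
    arg.target->hi += 1;
    return true;                /* x = x': the pointer is unchanged */
}
```

With a taken in [0, 100] for the example, both calls from the previous slide pass the filter: a becomes [1, 101] and c = 0 becomes 1, while b is never touched.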