Modular static analysis of string manipulations in C programs


Matthieu Journault, Antoine Miné, Abdelraouf Ouadjaout, August 30, 2018. This work is supported by the European Research Council under Consolidator Grant Agreement 681393.


SLIDE 1

Modular static analysis of string manipulations in C programs¹

Matthieu Journault, Antoine Miné, Abdelraouf Ouadjaout, August 30, 2018

¹ This work is supported by the European Research Council under Consolidator Grant Agreement 681393 – MOPSA.

SLIDE 2

Introductory example

```c
while (*q != '\0') {
  *p = *q;
  p++;
  q++;
}
*p = *q;
```

Program 1: strcpy

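(Figure: p points into buffer dest at offset $o_p$, so $dest_a - o_p$ bytes remain in dest; q points into buffer src at offset $o_q$, so $src_l - o_q$ characters remain before the '\0' that terminates src.)

No out-of-bounds access if $src_l - o_q < dest_a - o_p$, $src_l \ge o_q$ and $src_l < src_a$.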
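The condition above can be read as a plain predicate over concrete offsets and sizes. The following C sketch is only an illustration: the function name `strcpy_in_bounds` and its parameters are hypothetical stand-ins for dest_a, o_p, src_l, o_q and src_a, and the analyzer itself checks this condition on abstract values rather than on numbers.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when copying the string that starts at
   offset o_q of src into dest at offset o_p stays in bounds.
   dest_a: allocated size of dest; src_l: length of src (index of its
   first '\0'); src_a: allocated size of src. */
int strcpy_in_bounds(size_t dest_a, size_t o_p,
                     size_t src_l, size_t o_q, size_t src_a) {
    if (o_p > dest_a) return 0;    /* p already past the end of dest */
    if (src_l < o_q) return 0;     /* q is past the terminating '\0' */
    if (src_l >= src_a) return 0;  /* src is not '\0'-terminated in bounds */
    /* src_l - o_q characters plus the '\0' must fit in dest_a - o_p bytes */
    return src_l - o_q < dest_a - o_p;
}
```

For instance, copying a 9-character string (src_l = 9, src_a = 10) to offset 0 of a 10-byte buffer satisfies the condition, while a 13-character string does not. Unsigned sizes force the explicit o_p ≤ dest_a guard, which the mathematical formulation gets for free.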

SLIDE 3

Modular analysis?

Goal: infer a summary of the strcpy function, to avoid reanalyzing it in a top-down analysis.

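```c
char* strcpy(char* p, char* q) {
  char* dest = p;  /* start of the destination, returned at the end */
  while (*q != '\0') {
    *p = *q;
    p++;
    q++;
  }
  *p = *q;         /* copy the terminating '\0' */
  return dest;     /* strcpy returns its destination argument */
}
```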

Program 2: strcpy

Problems:

  • What do pointers point to?
  • What are the aliasing patterns?

SLIDE 4

Table of contents

  • 1. Cell abstract domain
  • 2. String abstraction
  • 3. Going modular
  • 4. Implementation
  • 5. Conclusion

SLIDE 5

Language

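$$
\begin{array}{rcl}
\mathit{intType} & ::= & s8 \mid s16 \mid s32 \mid s64 \mid u8 \mid u16 \mid u32 \mid u64 \\
\mathit{scalarType} & ::= & \mathit{intType} \mid \mathit{ptr} \\
\mathit{type} & ::= & \mathit{scalarType} \mid \mathit{type}[n] \ (n \in \mathbb{N}) \mid \mathtt{struct}\{u_0 : \mathit{type}, \dots, u_{n-1} : \mathit{type}\} \mid \mathtt{union}\{u_0 : \mathit{type}, \dots, u_{n-1} : \mathit{type}\} \\
\mathit{lval} & ::= & *_{\mathit{scalarType}}\, \mathit{expr} \mid v \ (v \in \mathcal{V}) \\
\mathit{expr} & ::= & \mathit{cst} \ (\mathit{cst} \in \mathbb{N}) \mid \&\mathit{lval} \mid \mathit{expr} \diamond \mathit{expr} \ (\diamond \in \{+, \le, \dots\}) \\
\mathit{stmt} & ::= & v = \mathtt{malloc}(e) \ (v \in \mathcal{V},\ e \in \mathit{expr}) \mid \mathit{type}\ v \ (v \in \mathcal{V}) \mid \cdots
\end{array}
$$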

Figure 1: Syntax of the language

  • array, union type, struct type
  • string
  • dynamic allocation
  • pointer arithmetic
  • pointer dereference with arbitrary types

Both low-level (pointer casts) and high-level (string length) considerations.
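The cell definition used later relies on sizeof(typeof(V)) for every type of this language. A minimal sketch of how these types could be represented and sized in C, under two assumptions the slides do not state: pointers take 8 bytes and structs have no padding. The names `enum kind`, `struct type` and `type_size` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical representation of the types of Figure 1. */
enum kind { S8, S16, S32, S64, U8, U16, U32, U64, PTR, ARRAY, STRUCT, UNION };

struct type {
    enum kind kind;
    const struct type *elem;     /* ARRAY: element type */
    size_t n;                    /* ARRAY: length; STRUCT/UNION: field count */
    const struct type **fields;  /* STRUCT/UNION: field types */
};

/* Size in bytes, assuming 8-byte pointers and no struct padding. */
size_t type_size(const struct type *t) {
    size_t s = 0;
    switch (t->kind) {
    case S8:  case U8:  return 1;
    case S16: case U16: return 2;
    case S32: case U32: return 4;
    case S64: case U64: case PTR: return 8;
    case ARRAY: return t->n * type_size(t->elem);
    case STRUCT:                  /* fields laid out one after another */
        for (size_t i = 0; i < t->n; i++) s += type_size(t->fields[i]);
        return s;
    case UNION:                   /* fields overlap: size of the largest */
        for (size_t i = 0; i < t->n; i++) {
            size_t fs = type_size(t->fields[i]);
            if (fs > s) s = fs;
        }
        return s;
    }
    return 0;
}
```

Under these assumptions, struct{u0 : u32, u1 : u8[5]} has size 9 and the corresponding union has size 5.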

SLIDE 6

Cell abstract domain

SLIDE 7

Cell definition

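(Figure: the same memory shown at byte level and as cells. At byte level, u holds the 32-bit value 258 and v holds the bytes 'a', 'b', 'c', '\0', 'n'. As cells: ⟨u, 0, u32⟩ = 258, ⟨v, 0, u8⟩ = 'a', ⟨v, 1, u8⟩ = 'b', ⟨v, 2, u8⟩ = 'c', ⟨v, 3, u8⟩ = '\0', ⟨v, 4, u8⟩ = 'n'.)

Cell definition:

$$\mathit{Cell} = \{\langle V, o, t\rangle \mid V \in \mathcal{V},\ t \in \mathit{scalarType},\ 0 \le o \le \mathrm{sizeof}(\mathrm{typeof}(V)) - \mathrm{sizeof}(t)\}$$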
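The side condition on o is what keeps a cell inside its base variable. A C sketch of that check follows; the struct and function names are hypothetical and types are represented only by their sizes. For the 5-byte buffer v of the figure, ⟨v, 4, u8⟩ is a valid cell while ⟨v, 2, u32⟩ is not, since it would cover bytes 2 to 5.

```c
#include <stddef.h>

/* Hypothetical encoding of a cell <V, o, t>, with V and t given by their
   sizes in bytes. */
struct cell {
    const char *var;   /* name of the base variable V */
    size_t var_size;   /* sizeof(typeof(V)) */
    size_t offset;     /* o (unsigned, so 0 <= o always holds) */
    size_t type_size;  /* sizeof(t) */
};

/* A cell is well formed iff 0 <= o <= sizeof(typeof(V)) - sizeof(t). */
int cell_is_valid(const struct cell *c) {
    if (c->type_size > c->var_size) return 0;  /* t does not fit in V at all */
    return c->offset <= c->var_size - c->type_size;
}
```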

SLIDE 8

Pointers and numerical values

Pointer representation

  • a set of all possible base variables pointed to ($\subseteq \mathcal{V}$):
    $P_C = C|_{ptr} \to \wp(\mathcal{V} \cup \{\mathrm{NULL}, \mathrm{invalid}\})$
  • a numerical variable encoding the offset of the pointer

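Example

```c
u32 a = 1;
u32* p = &a;
```

Abstract states ⟨cells, numerical values, points-to⟩:

  • initially: $\langle \emptyset,\ \top,\ \emptyset\rangle$
  • after the first statement: $\langle \{a \triangleq \langle a, 0, u32\rangle\},\ \{a \to 1\},\ \emptyset\rangle$
  • after the second statement: $\langle \{a,\ p \triangleq \langle p, 0, ptr\rangle\},\ \{a \to 1,\ p \to 0\},\ \{p \to \{a\}\}\rangle$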
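A pointer is therefore a pair (set of possible bases, offset). The sketch below assumes a bitmask for the base set and an interval for the offset, and uses them to decide whether reading a scalar of a given size through the pointer is safe. The bit layout, the interval (the analyzer keeps the offset in a possibly relational numerical domain) and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical base-set encoding: bit i stands for variable i (i < 30);
   two extra bits stand for NULL and invalid. */
#define BASE_NULL    (1u << 30)
#define BASE_INVALID (1u << 31)

struct abs_ptr {
    unsigned bases;       /* subset of V plus {NULL, invalid} */
    long off_lo, off_hi;  /* interval over-approximating the offset */
};

/* var_sizes[i] = sizeof(typeof(variable i)).
   Returns 1 if reading t_size bytes through p is in bounds for every
   possible base and offset. */
int deref_is_safe(const struct abs_ptr *p, const size_t *var_sizes,
                  int nvars, size_t t_size) {
    if (p->bases & (BASE_NULL | BASE_INVALID)) return 0;
    if (p->off_lo < 0) return 0;
    for (int i = 0; i < nvars; i++) {
        if (!(p->bases & (1u << i))) continue;
        /* same bound as in the cell definition: o <= sizeof(V) - sizeof(t) */
        if (t_size > var_sizes[i] ||
            (size_t)p->off_hi > var_sizes[i] - t_size) return 0;
    }
    return 1;
}
```

In the example above, a is variable 0 (4 bytes) and p targets {a} with offset 0, so a u32 read through p is safe; the same read at offset 1, or with NULL among the bases, is not.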

SLIDE 9

Abstraction

Pointer offsets and numerical cell values? Use a numerical domain to express constraints on both (possibly relational constraints between pointer offsets and numerical variable values). $N^\sharp_C$: a numerical domain over the cell set C.

$$D^\sharp_m = \{\langle C, R^\sharp, P\rangle \mid C \subseteq \mathit{Cell},\ R^\sharp \in N^\sharp_C,\ P \in P_C\}$$

Remarks

  • Dynamic set of cells
  • Recency abstraction used for dynamic memory allocations

SLIDE 10

String abstraction

SLIDE 11

Introduction

Domain presentation

  • A set $V \subseteq \mathcal{V}$ of string variables.
  • For $s \in V$: $s_l$ and $s_a$ denote the length and the allocated size of buffer s.
  • Enrich the numerical domain to account for the length and allocated size of buffers.
  • Partition memory into zones handled either by the cell domain or by the string domain.

Example

```c
char s[3];
int a = 0;
s[a] = 'u';
s[a+1] = '\0';
```

Resulting state: $\langle \{a \triangleq \langle a, 0, s32\rangle\},\ \{a = 0,\ s_l = 1,\ s_a = 3\},\ \emptyset\rangle$

SLIDE 12

Computable Galois connection with the Cell abstract domain

Translation functions: to_cell(s, S♯) (resp. from_cell(s, S♯)) are computable functions that move the handling of string s from the string domain to the cell domain (resp. from the cell domain to the string domain).

Example, where to_cell(s, ·) maps the first state to the second and from_cell(s, ·) maps it back:

  • string view: $\langle \emptyset,\ \{s_l = 9,\ s_a = 15\},\ \emptyset\rangle$
  • cell view: $\langle \{s_0 \triangleq \langle s, 0, u8\rangle, \dots, s_9 \triangleq \langle s, 9, u8\rangle, \dots, s_{14} \triangleq \langle s, 14, u8\rangle\},\ \{s_0 \ge 1,\ s_0 \le 255,\ s_1 \ge 1,\ s_1 \le 255, \dots, s_9 = 0\},\ \emptyset\rangle$
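When s_l and s_a are known constants, to_cell amounts to splitting the buffer into three zones: bytes before s_l are in [1, 255], the byte at s_l is 0, and the remaining bytes are in [0, 255]. The C sketch below covers only that exact case (the real translation works on abstract, possibly non-constant, bounds and from_cell would return a range of lengths); `to_cell_exact`, `from_cell_exact` and `struct range` are hypothetical names.

```c
#include <stddef.h>

/* Range of a u8 cell. */
struct range { int lo, hi; };

/* Hypothetical to_cell for constant s_l < s_a: out[i] receives the
   range of cell <s, i, u8> for 0 <= i < s_a. */
void to_cell_exact(size_t s_l, size_t s_a, struct range *out) {
    for (size_t i = 0; i < s_a; i++) {
        if (i < s_l)
            out[i] = (struct range){1, 255};  /* before the '\0' */
        else if (i == s_l)
            out[i] = (struct range){0, 0};    /* the '\0' itself */
        else
            out[i] = (struct range){0, 255};  /* unconstrained bytes */
    }
}

/* Hypothetical from_cell for the same setting: returns s_l when the
   ranges pin down the first '\0', and -1 when they do not. */
long from_cell_exact(const struct range *in, size_t s_a) {
    for (size_t i = 0; i < s_a; i++) {
        if (in[i].lo == 0 && in[i].hi == 0) return (long)i; /* surely '\0' */
        if (in[i].lo == 0) return -1;      /* may be '\0': length unknown */
    }
    return -1;                             /* no '\0' within the buffer */
}
```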

SLIDE 13

Operators and transformers

Operators: unify strings and cells, then rely on the numerical domain operators.

Transformers: definitions are only needed for

  • $\mathbb{S}^\sharp[\![ s[e_1] = e_2 ]\!](S^\sharp)$ where $s \in V$
  • $\mathbb{E}^\sharp[\![ s[e] ]\!](S^\sharp)$ where $s \in V$

Remarks on the analyzer:

  • The analyzer provides dynamic expression transformations.
  • Evaluations yield a disjunctive form over expressions and abstract states ($\wp(\mathit{expr} \times D^\sharp)$).

SLIDE 14

Evaluation

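$$\mathbb{E}^\sharp[\![ s[e] ]\!](S^\sharp) = \bigcup_{(\mathit{tests},\, \mathit{eval}) \in \mathit{table}} \{(\mathit{eval},\ \mathit{tests}(S^\sharp))\}$$

case: tests on offset → evaluation

  • before: 0 ≤ e ∧ e < l ∧ e < a → [1; 255]
  • at: 0 ≤ e ∧ e = l ∧ e < a → 0
  • after: 0 ≤ e ∧ e > l ∧ e < a → [0; 255]
  • error: e ≥ a ∨ e < 0 → ∅

(Figure: a buffer of allocated size a whose bytes before offset l are ≠ 0 ("before"), whose byte at offset l is 0 ("at"), and whose bytes from l + 1 up to a are unknown ("after").)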
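On concrete values exactly one row of the table applies, so the table can be read as a classification function. In this sketch the names are hypothetical and e, l, a are plain integers; the analyzer instead applies every row's tests to the abstract state and keeps one (value, state) pair per satisfiable row, which is where the disjunctive form of evaluations comes from.

```c
/* Cases of the evaluation table. */
enum eval_case { EVAL_BEFORE, EVAL_AT, EVAL_AFTER, EVAL_ERROR };

/* Concrete reading of the table for s[e], where l is the length and a
   the allocated size of s. On success, [*lo, *hi] is the range of the
   byte read; on EVAL_ERROR no value is produced. */
enum eval_case eval_read(long e, long l, long a, int *lo, int *hi) {
    if (e < 0 || e >= a) return EVAL_ERROR;      /* out of bounds */
    if (e < l)  { *lo = 1; *hi = 255; return EVAL_BEFORE; }
    if (e == l) { *lo = 0; *hi = 0;   return EVAL_AT; }
    *lo = 0; *hi = 255;                          /* e > l */
    return EVAL_AFTER;
}
```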

SLIDE 17

Transformation

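$$\mathbb{S}^\sharp[\![ s[e_1] = e_2 ]\!](S^\sharp) = \bigsqcup_{(\mathit{tests},\, \mathit{transf}) \in \mathit{table}} \mathit{transf}(\mathit{tests}(S^\sharp))$$

case: tests on offsets; tests on rhs → transformation

  • set0: e1 ≥ 0 ∧ e1 ≤ l ∧ e1 < a; e2 = 0 → l ← e1
  • setnon0: e1 ≥ 0 ∧ e1 = l ∧ e1 < a; e2 ≠ 0 → l ← [e1 + 1; a]
  • unchanged: e1 ≥ 0 ∧ e1 < l ∧ e1 < a; e2 ≠ 0 → l unchanged
  • …

(Figure: the same buffer layout as for evaluation, with bytes ≠ 0 before offset l, 0 at offset l, and unknown bytes up to a.)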
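The same reading works for the transformation table. This C sketch starts from an exactly known length l and returns the interval of possible new lengths; the function name and interface are hypothetical, and the rows hidden behind the ellipsis are filled with an assumption based on concrete C semantics: writing at an offset e1 > l never changes the length of the string.

```c
/* Concrete reading of the transformation table for s[e1] = e2, starting
   from an exactly known length l (allocated size a). On success, the
   possible new lengths are [*l_lo, *l_hi]; returns 0 on the error case. */
int assign_char(long e1, int e2, long l, long a, long *l_lo, long *l_hi) {
    if (e1 < 0 || e1 >= a) return 0;          /* out-of-bounds write */
    if (e2 == 0 && e1 <= l) {                 /* set0: new '\0' at e1 */
        *l_lo = *l_hi = e1;
    } else if (e2 != 0 && e1 == l) {          /* setnon0: '\0' overwritten */
        *l_lo = e1 + 1;                       /* next '\0' is unknown */
        *l_hi = a;
    } else {                                  /* unchanged: e1 < l with e2 != 0,
                                                 or (assumed) any e1 > l */
        *l_lo = *l_hi = l;
    }
    return 1;
}
```

The setnon0 row yields an interval because the position of the next '\0' after the overwritten terminator is not known from l and a alone.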

SLIDE 18

Example

```c
while (*q != '\0') {
  *p = *q;
  p++;
  q++;
}
*p = *q;
```

Program 3: strcpy

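Abstract states at the start, inside the loop, and at the end (numerical constraints on p and q refer to their offsets):

  • $\langle \{p, q\},\ \{p = 0,\ q = 0,\ 0 \le s_l < s_a,\ 0 \le t_l < t_a\},\ \{p \to s,\ q \to t\}\rangle$
  • $\langle \{p, q\},\ \{-p + q = 0,\ s_l \ge p,\ t_l \ge p + 1, \dots\},\ \{p \to s,\ q \to t\}\rangle$
  • $\langle \{p, q\},\ \{t_l = s_l,\ q = s_l,\ s_l \ge 0,\ s_a \ge s_l + 1, \dots\},\ \{p \to s,\ q \to t\}\rangle$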
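These relations can be cross-checked on concrete runs, keeping p and q as offsets as in the abstract states. The harness below is a hypothetical illustration built with assert: it samples a few executions, whereas the analysis covers all of them, and the precondition t_l < s_a is assumed so that the concrete copy stays in bounds.

```c
#include <assert.h>
#include <string.h>

/* Program 3's loop on concrete buffers: s is the destination (allocated
   size sa), t the source (allocated size ta). p and q are kept as offsets,
   as in the abstract states, and the inferred relations are asserted. */
size_t strcpy_with_invariants(char *s, size_t sa, const char *t, size_t ta) {
    size_t tl = 0, p = 0, q = 0;
    while (tl < ta && t[tl] != '\0') tl++;  /* t_l, bounded by t_a */
    assert(tl < ta);                        /* 0 <= t_l < t_a */
    assert(tl < sa);                        /* assumed: the copy fits in s */
    while (t[q] != '\0') {
        assert(q == p);                     /* -p + q = 0 */
        assert(tl >= p + 1);                /* t_l >= p + 1 */
        s[p] = t[q];
        p++;
        q++;
    }
    s[p] = t[q];                            /* copies the terminating '\0' */
    size_t sl = strlen(s);
    assert(tl == sl && q == sl);            /* t_l = s_l, q = s_l */
    assert(sa >= sl + 1);                   /* s_a >= s_l + 1 */
    return sl;
}
```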

SLIDE 19

Going modular

SLIDE 20

Avoid losing precision, gain scalability

Goals:

  • Function analysis should use call-site information (pointer aliasing, variable ranges): top-down analysis
  • Classic top-down analysis: function calls are inlined
  • Use the analysis of the function body to infer a summary
  • Replace further analyses of the function body by uses of the summary

Incrementation function

```c
int incr(int x) {
  return (x+1);
}
```

Program 4: incr

  • The semantics of incr cannot be represented exactly by a list of tabulated input/output pairs.
  • ⇒ use a relational domain: $\{x = x' + 1\} = \lambda x.\, x + 1$
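A relation answers every calling context at once, whereas a table of input/output pairs only answers the inputs it lists. The following C sketch stores the summary of incr as an affine map x = coef · x′ + shift and applies it to an interval of inputs; the types and names are hypothetical, and integer overflow is ignored.

```c
/* Hypothetical summary of incr as an affine relation between the
   output x and the input x': x = coef * x' + shift. */
struct affine { long coef, shift; };

struct interval { long lo, hi; };

/* Image of an input interval under the relation (coef >= 0 assumed,
   overflow ignored). */
struct interval apply_summary(struct affine r, struct interval in) {
    struct interval out = { r.coef * in.lo + r.shift,
                            r.coef * in.hi + r.shift };
    return out;
}
```

With coef = 1 and shift = 1, the single relation maps [0, 5] to [1, 6] and [−3, −3] to [−2, −2], with no reanalysis of the body.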

SLIDE 21

Modular analysis

In practice, summaries are sets $\{(F_0, R_0), (F_1, R_1), \dots\}$ such that:

  • $F_i$ is a precondition (an abstract element);
  • $R_i$ is a postcondition (an abstract relation).

When encountering a function call made from an abstract state $S^\sharp$:

  • If there exists some $F_i$ such that $S^\sharp \sqsubseteq F_i$, return $R_i(S^\sharp)$.
  • Otherwise, if the number of summaries is low, perform a relational analysis of the function body starting from $S^\sharp$ (yielding R) and store the newly found relation $(S^\sharp, R)$.
  • Otherwise, choose some summary $(F_i, R_i)$, perform a relational analysis of the function body starting from $F_i \triangledown S^\sharp$ (yielding R), and store the newly found relation $(F_i \triangledown S^\sharp, R)$.
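The call-handling policy above can be sketched with intervals as the abstract domain and incr as the analyzed function. Everything here is an assumption made for illustration: the threshold `MAX_SUMMARIES`, the choice of the first summary when the table is full, replacing that summary rather than appending, standard interval widening, and a stand-in `analyze_body` that returns the relation x = x′ + 1 instead of analyzing code.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_SUMMARIES 2      /* hypothetical threshold for "low" */

/* Interval abstract value; LONG_MIN / LONG_MAX stand for -inf / +inf. */
struct itv { long lo, hi; };

/* Summary (F_i, R_i): precondition pre, relation x = x' + shift. */
struct summary { struct itv pre; long shift; };

static struct summary table[MAX_SUMMARIES];
static int nsum = 0;
int body_analyses = 0;       /* how many times the body was analyzed */

/* a is included in b */
static int itv_leq(struct itv a, struct itv b) {
    return b.lo <= a.lo && a.hi <= b.hi;
}

/* Standard interval widening: unstable bounds jump to infinity. */
static struct itv itv_widen(struct itv a, struct itv b) {
    struct itv r = { b.lo < a.lo ? LONG_MIN : a.lo,
                     b.hi > a.hi ? LONG_MAX : a.hi };
    return r;
}

/* Stand-in for the relational analysis of incr's body from input s:
   it does not analyze code, it returns the known relation x = x' + 1. */
static struct summary analyze_body(struct itv s) {
    struct summary r = { s, 1 };
    body_analyses++;
    return r;
}

/* Handles a call incr(x) from abstract state s; returns the output. */
struct itv call_incr(struct itv s) {
    struct summary *use = NULL;
    for (int i = 0; i < nsum && use == NULL; i++)
        if (itv_leq(s, table[i].pre))
            use = &table[i];                  /* S <= F_i: reuse R_i */
    if (use == NULL && nsum < MAX_SUMMARIES) {
        table[nsum] = analyze_body(s);        /* store new (S, R) */
        use = &table[nsum++];
    } else if (use == NULL) {
        /* table full: widen the first summary's precondition */
        table[0] = analyze_body(itv_widen(table[0].pre, s));
        use = &table[0];
    }
    /* apply R to s; infinite bounds stay infinite (overflow ignored) */
    struct itv out = {
        s.lo == LONG_MIN ? LONG_MIN : s.lo + use->shift,
        s.hi == LONG_MAX ? LONG_MAX : s.hi + use->shift };
    return out;
}
```

For instance, a call from [0, 5] computes a first summary and a later call from [2, 3] reuses it. Once both slots are used (say by [0, 5] and [10, 10]), a call from [20, 30] triggers a reanalysis from [0, 5] ▽ [20, 30] = [0, +∞], which subsequent calls such as [100, 200] then reuse.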

SLIDE 23

Not just numerical

Incrementation function

```c
/* caller */
...
incr(p);
...
incr(q);
...

/* callee */
void incr(int* x) {
  *x = *x + 1;
  return;
}
```

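  • State at the call incr(p): $\langle \{p, a, b\},\ \{a \ge 0,\ p = 0\},\ \{p \to a\}\rangle$
  • Generalized input: $\langle \{p, \alpha\},\ \{\alpha \ge 0,\ p = 0\},\ \{p \to \alpha\}\rangle$
  • State at the call incr(q): $\langle \{q, c\},\ \{c = 0,\ q = 0\},\ \{q \to c\}\rangle$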

Input generalization

  • Remove useless information (unreachable blocks, such as b above) from the input
  • Universally quantify some memory blocks (α here is a symbolic cell) ⇒ framing

SLIDE 24

Relations on structured states

```c
void incr(int* x) {
  *x = *x + 1;
  return;
}
```

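with filter: $\langle \{x, \alpha\},\ \{\alpha \ge 0,\ x = 0\},\ \{x \to \alpha\}\rangle$

Discover relations of the form $(S_i, N, S_o)$:

  • $S_i = \langle \{x, \alpha\},\ \{x \to \alpha\}\rangle$
  • $N = \{\alpha' \ge 0,\ x = 0,\ \alpha = \alpha' + 1,\ x = x'\}$
  • $S_o = \langle \{x, \alpha\},\ \{x \to \alpha\}\rangle$

⇒ The relation can be reused for other calls to incr.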
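Reusing the relation at a call site amounts to checking the filter and binding α to the cell that the argument points to; every other cell is left untouched (framing). A C sketch with interval values follows, where the struct names, the function name and the finite intervals are assumptions (the slide's a ≥ 0 has no upper bound).

```c
/* Interval value of a cell (finite bounds assumed). */
struct itv { long lo, hi; };

/* Hypothetical call-site view for incr(ptr): alpha is bound to the cell
   ptr points to, and offset is ptr's offset in that cell's block. */
struct call_site { struct itv *alpha; long offset; };

/* Applies the summary (filter {alpha >= 0, x = 0}, relation
   alpha = alpha' + 1, x = x'). Returns 0 if the filter does not hold. */
int apply_incr_summary(struct call_site c) {
    if (c.offset != 0 || c.alpha->lo < 0)
        return 0;              /* filter violated: summary not applicable */
    c.alpha->lo += 1;          /* alpha = alpha' + 1 */
    c.alpha->hi += 1;
    return 1;                  /* cells other than alpha are not touched */
}
```

Both calls of the earlier example pass the filter once α is bound to a or to c, while a cell such as b is never read or written.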

SLIDE 25

Example

```c
void strcat(char* dest, char* src)
{
  int i; int j;
  for (i=0; dest[i] != '\0'; i++) ;
  for (j=0; src[j] != '\0'; j++)
    dest[i+j] = src[j];
  dest[i+j] = '\0';
}
```

Program 5: strcat

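Summary:

When:

  • dest points to β at offset 0
  • src points to γ at offset 0

Then: $\{\beta'_l = \gamma_l + \beta_l,\ \beta'_a = \beta_a,\ \gamma'_a = \gamma_a,\ \gamma'_l = \gamma_l,\ \gamma'_l \ge 0,\ \gamma'_l \le \gamma'_a - 1,\ \gamma'_l \le \beta'_l,\ \beta'_l \le \beta'_a - 1\}$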
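The summary can be sanity-checked on concrete calls. The sketch below runs Program 5 (renamed `strcat_prog5` to avoid clashing with the C library's strcat) and tests the constraints of the postcondition; the helper names are hypothetical, lengths are recomputed from the buffers instead of being tracked abstractly, and calls that would overflow dest are rejected up front rather than executed.

```c
#include <stddef.h>

/* Program 5, renamed so that it does not clash with the library strcat. */
void strcat_prog5(char* dest, char* src) {
    int i; int j;
    for (i = 0; dest[i] != '\0'; i++) ;
    for (j = 0; src[j] != '\0'; j++)
        dest[i + j] = src[j];
    dest[i + j] = '\0';
}

/* Length of the string in a buffer of allocated size a (a if no '\0'). */
size_t len_in(const char* b, size_t a) {
    size_t n = 0;
    while (n < a && b[n] != '\0') n++;
    return n;
}

/* Checks the postcondition on one call where dest is the whole buffer
   beta (size beta_a) and src the whole buffer gamma (size gamma_a).
   Returns 1 if it holds, 0 if not, -1 if the call is rejected because
   it would read or write out of bounds. */
int check_strcat_summary(char* dest, size_t beta_a, char* src, size_t gamma_a) {
    size_t beta_l = len_in(dest, beta_a), gamma_l = len_in(src, gamma_a);
    if (beta_l >= beta_a || gamma_l >= gamma_a) return -1; /* unterminated */
    if (beta_l + gamma_l >= beta_a) return -1;             /* would overflow */
    strcat_prog5(dest, src);
    size_t beta_l2 = len_in(dest, beta_a), gamma_l2 = len_in(src, gamma_a);
    return beta_l2 == gamma_l + beta_l      /* beta'_l = gamma_l + beta_l */
        && gamma_l2 == gamma_l              /* gamma'_l = gamma_l */
        && gamma_l2 <= gamma_a - 1          /* gamma'_l <= gamma'_a - 1 */
        && gamma_l2 <= beta_l2              /* gamma'_l <= beta'_l */
        && beta_l2 <= beta_a - 1;           /* beta'_l <= beta'_a - 1 */
}
```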

SLIDE 26

Implementation

SLIDE 27

Implementation

Mopsa: a framework for the development of static analyzers by abstract interpretation, written in OCaml, that (for now) supports a subset of C and Python.

Tests: focused on comparison with previous work:

  • Simon and King: 2 examples successfully analyzed
  • Allamigeon et al.: 1 example successfully analyzed
  • Dor et al.: 7 out of 9 Web2c examples successfully analyzed (the other two use C features not yet supported)

SLIDE 28

Example of successfully analyzed programs (1/2)

```c
#include <string.h>

typedef struct {
  char* f;
} s;
char buf[10];

void init(s* x) {
  x[1].f = buf;
}
int main() {
  s a[2][2];
  s* ptr = (s*) &(a[1]);
  init(ptr);
  ptr = (s*) &(a[0]);
  strcpy(a[1][1].f, "strcpy ok");      /* 10 bytes: fits in buf */
  strcpy(a[1][1].f, "strcpy not ok");  /* line 15: 14 bytes do not fit in buf[10] */
}
```

Program 6: string in struct

  • From Allamigeon et al.
  • strcpy is called on a string stored in a struct, itself stored in a matrix, and accessed through pointer manipulations.
  • No false positive; the error is found at line 15.

SLIDE 29

Example of successfully analyzed programs (2/2)

```c
#include <stdio.h>   /* BUFSIZ */
#include <string.h>  /* strcpy */

extern char buf[];   /* global buffer, defined elsewhere */

char* insert_long(char* cp)
{
  char tbuf[BUFSIZ];
  int i;
  for (i=0; &buf[i] < cp; ++i)
    tbuf[i] = buf[i];
  strcpy(&tbuf[i], "(long)");
  strcpy(&tbuf[i + 6], cp);
  strcpy(buf, tbuf);
  return cp + 6;
}
```

Program 7: insert_long

  • From Dor et al., taken from Web2c.
  • Inserts the string "(long)" into the global variable buf at the position pointed to by cp.
  • Modular relation inferred for insert_long: the length of buf increases by 6 (under some conditions).

SLIDE 30

Conclusion

SLIDE 31

Conclusion

Results

  • Development of a low-level C static analyzer performing higher-level abstractions
  • Lifting of the analyzer to a precise modular framework based on IO relations, partitioning, and input generalization

Perspectives

  • Analysis of larger code bases, where modularity is necessary to scale
  • Better pointer aliasing and memory block modularity
  • Partitioning and framing with numerical considerations
  • Improve the relations between the string and cell abstract domains (interesting dynamic partitioning? reduced product?)
