Shape Analysis via Symbolic Memory Graphs and Conversion from - - PowerPoint PPT Presentation

shape analysis via symbolic memory graphs and conversion
SMART_READER_LITE
LIVE PREVIEW

Shape Analysis via Symbolic Memory Graphs and Conversion from - - PowerPoint PPT Presentation

Shape Analysis via Symbolic Memory Graphs and Conversion from Pointers to Containers Kamil Dudka Luk a s Hol k Petr Peringer Marek Trt k Tom a s Vojnar Brno University of Technology, Czech Republic Dagstuhl, 2/11/2015


slide-1
SLIDE 1

Shape Analysis via Symbolic Memory Graphs and Conversion from Pointers to Containers

Kamil Dudka Luk´ aˇ s Hol´ ık Petr Peringer Marek Trt´ ık Tom´ aˇ s Vojnar

Brno University of Technology, Czech Republic Dagstuhl, 2/11/2015

slide-2
SLIDE 2

Shape Analysis via Symbolic Memory Graphs [Dudka, Peringer, V., SAS’13]

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 2 / 25

slide-3
SLIDE 3

Symbolic Memory Graphs (SMGs)

An example of a kernel-style linked list used in Linux:

...

hfo nfo pfo

list_head custom_record next prev next prev next prev

2+ DLS

size(ptr),

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 3 / 25

slide-4
SLIDE 4

Symbolic Memory Graphs (SMGs)

An example of a kernel-style linked list used in Linux:

...

hfo nfo pfo

list_head custom_record next prev next prev next prev

An SMG describing the data structure above:

2+ DLS

hfo,lst hfo,fst 0,ptr 0,reg pfo,ptr size(ptr),ptr nfo,ptr

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 3 / 25

slide-5
SLIDE 5

Symbolic Memory Graphs (SMGs)

An example of a kernel-style linked list used in Linux:

...

hfo nfo pfo

list_head custom_record next prev next prev next prev

An SMG describing the data structure above:

2+ DLS

hfo,lst hfo,fst 0,ptr 0,reg pfo,ptr size(ptr),ptr nfo,ptr

SMGs are directed graphs consisting of:

◮ objects (allocated space) and values (addresses, integers), ◮ has-value and points-to edges. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 3 / 25

slide-6
SLIDE 6

SMGs: Labelling of Objects (1/2)

2+ DLS

hfo,lst hfo,fst 0,ptr 0,reg pfo,ptr size(ptr),ptr nfo,ptr

Objects are further divided into:

◮ regions, i.e., individual blocks of memory, ◮ optional regions, i.e., a region or NULL, ◮ doubly-linked list segments (DLSs), ◮ singly-linked list segments (SLSs),

Each object has some (constant) size in bytes and a validity flag.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 4 / 25

slide-7
SLIDE 7

SMGs: Labelling of Objects (2/2)

Each DLS is given by a head, next, and prev field offset.

...

hfo nfo pfo

list_head custom_record next prev next prev next prev

DLSs can be of length:

◮ N+ for any N ≥ 0 or ◮ 0 or 1. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 5 / 25

slide-8
SLIDE 8

SMGs: Labelling of Objects (2/2)

Each DLS is given by a head, next, and prev field offset.

...

hfo nfo pfo

list_head custom_record next prev next prev next prev

DLSs can be of length:

◮ N+ for any N ≥ 0 or ◮ 0 or 1.

Nodes of DLSs can point to objects that are:

◮ shared: each node points to the same object, or ◮ nested: each node points to a separate copy of the object.

  • Implemented by tagging objects by their nesting level.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 5 / 25

slide-9
SLIDE 9

SMGs: Has-Value and Points-To Edges

a1

region1 region2

size1

  • ffset1

size2

  • ffset2
  • ffset1, ptr
  • ffset2, reg

a1

size=size1 size=size2

Memory SMG

has-value points-to region1 region2 edge edge

Has-value edges – from objects to values, labelled by:

◮ field offset, ◮ type of the value stored in the field. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 6 / 25

slide-10
SLIDE 10

SMGs: Has-Value and Points-To Edges

a1

region1 region2

size1

  • ffset1

size2

  • ffset2
  • ffset1, ptr
  • ffset2, reg

a1

size=size1 size=size2

Memory SMG

has-value points-to region1 region2 edge edge

Has-value edges – from objects to values, labelled by:

◮ field offset, ◮ type of the value stored in the field.

Points-to edges – from values (addresses) to objects, labelled by:

◮ target offset, ◮ target specifier: first/last/each node of a DLS,

  • specifier each node: used for back-links from nested objects.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 6 / 25

slide-11
SLIDE 11

SMGs: Join Operator

Traverses two SMGs and tries to join simultaneously encountered objects.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 7 / 25

slide-12
SLIDE 12

SMGs: Join Operator

Traverses two SMGs and tries to join simultaneously encountered objects. Objects being joined must be locally compatible (same size, nesting level, DLS linking offsets, ...).

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 7 / 25

slide-13
SLIDE 13

SMGs: Join Operator

Traverses two SMGs and tries to join simultaneously encountered objects. Objects being joined must be locally compatible (same size, nesting level, DLS linking offsets, ...). DLSs join with regions or DLSs.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 7 / 25

slide-14
SLIDE 14

SMGs: Join Operator

Traverses two SMGs and tries to join simultaneously encountered objects. Objects being joined must be locally compatible (same size, nesting level, DLS linking offsets, ...). DLSs join with regions or DLSs. If the above fails, try to insert a DLS

  • f length 0+ into one of the SMGs.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 7 / 25

slide-15
SLIDE 15

SMGs: Entailment Checking

The join of SMGs is re-used: G1 ⊑ G2 tested by computing G1 ⊔ G2 while checking that G1 consists of less general objects.

2+ 1+ 0+ 0+

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 8 / 25

slide-16
SLIDE 16

SMGs: Abstraction

Collapsing uninterrupted sequences of compatible objects (same size, nesting level, field offsets, ...) into DLSs. Uses join of the sub-SMGs under the nodes to be collapsed to see whether they are compatible too. 0+ 1+ 0+ 0+ 0+ Distinguishes cases of shared and private sub-SMGs. Heuristic control of the choice of sequences to collapse:

◮ ratio of loss of precision and number of collapsed objects. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 9 / 25

slide-17
SLIDE 17

Predator: An Analyser Based on SMGs

slide-18
SLIDE 18

Predator: An Overview

An analyser based on SMGs. In addition to the basic features of SMGs, with a (partial) support of:

◮ pointer arithmetics, address alignment, ◮ interval-based pointers, interval-sized objects, ◮ block operations, re-intepretation of nullified blocks.

Verification of low-level system code (such as Linux kernel) that manipulates dynamic data structures. Proving absence of memory safety errors (invalid dereferences, buffer

  • verruns, memory leaks, ...).

Implemented as an open source GCC plug-in:

http://www.fit.vutbr.cz/research/groups/verifit/tools/predator

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 11 / 25

slide-19
SLIDE 19

From Pointers to (List) Containers [Dudka, Hol´ ık, Peringer, Trt´ ık, V., VMCAI’16]

slide-20
SLIDE 20

From Pointers to Containers: The Goal

typedef struct SNode { int x; struct SNode *f; struct SNode *b; } Node; #define NEW(T) (T*)malloc(sizeof(T)) 1 Node *h=0, *t=0; list<Node> L; 2 while (nondet()) { while (nondet()) { 3 Node *p=NEW(Node); Node *p=L.push_back(); 4 if (h==NULL) 5 h=p; 6 else 7 t->f=p; 8 p->f=NULL; 9 p->x=0; p->x=0; 10 p->b=t; 11 t=p; 12 } } ... ... 13 while (t) { while (!L.empty()) 14 Node *p=t->b; L.pop_back(); 15 if (p) p->n=NULL; 16 else h=NULL; 17 free(t); 18 t=p; 19 } Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 13 / 25

slide-21
SLIDE 21

From Pointers to Containers: Motivation

Recognition of containers and container operations has many applications: automatic parallelization, profiling and optimization, fault tolerance, program signatures for detection of plagiarism or malware, program understanding, debugging and automatic bug finding,

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 14 / 25

slide-22
SLIDE 22

From Pointers to Containers: Motivation

Recognition of containers and container operations has many applications: automatic parallelization, profiling and optimization, fault tolerance, program signatures for detection of plagiarism or malware, program understanding, debugging and automatic bug finding, simplification of program verification,

◮ separation of pointer and non-pointer analyses. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 14 / 25

slide-23
SLIDE 23

From Pointers to Containers: Motivation

Recognition of containers and container operations has many applications: automatic parallelization, profiling and optimization, fault tolerance, program signatures for detection of plagiarism or malware, program understanding, debugging and automatic bug finding, simplification of program verification,

◮ separation of pointer and non-pointer analyses.

So far done via unsound dynamic approaches, possibly with human help.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 14 / 25

slide-24
SLIDE 24

Input of the Proposed Program Transformation (1/3)

1 A CFG annotated by:

◮ shape invariants,

  • encoded using a chosen formalism: e.g., SMGs (can be changed),

◮ source/target links between shape invariants and their elements:

  • for tracking the life-cycle of allocated memory nodes,
  • easy extension of common shape analysers.

11 t=p p h t

L

p h t

L

p h t p=

L

h t h t

L L

2

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 15 / 25

slide-25
SLIDE 25

Input of the Proposed Program Transformation (2/3)

2 Container shapes to be searched for in shape invariants:

◮ Specified in the formalism used for encoding shape invariants. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 16 / 25

slide-26
SLIDE 26

Input of the Proposed Program Transformation (2/3)

2 Container shapes to be searched for in shape invariants:

◮ Specified in the formalism used for encoding shape invariants. ◮ Currently, fixed to null-terminated doubly-linked lists,

  • up to the concrete linking fields and type of nodes used.
  • Generalization to other containers possible.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 16 / 25

slide-27
SLIDE 27

Input of the Proposed Program Transformation (2/3)

2 Container shapes to be searched for in shape invariants:

◮ Specified in the formalism used for encoding shape invariants. ◮ Currently, fixed to null-terminated doubly-linked lists,

  • up to the concrete linking fields and type of nodes used.
  • Generalization to other containers possible.

◮ In SMGs: null-terminated, doubly-linked sequences of

regions and DLSs.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 16 / 25

slide-28
SLIDE 28

Input of the Proposed Program Transformation (3/3)

3 Specification of destructive container operations consisting of:

◮ destructive pointer updates x → sel = y, ◮ allocation and deallocation. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 17 / 25

slide-29
SLIDE 29

Input of the Proposed Program Transformation (3/3)

3 Specification of destructive container operations consisting of:

◮ destructive pointer updates x → sel = y, ◮ allocation and deallocation.

Destructive container operations specified using sets of pairs of in/out abstract heap configurations:

◮ encoded using the chosen formalism, e.g., SMGs, ◮ with linked elements of the in/out configurations, ◮ sets: different variants of the operations,

  • e.g., insertion into empty/non-empty container.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 17 / 25

slide-30
SLIDE 30

Input of the Proposed Program Transformation (3/3)

3 Specification of destructive container operations consisting of:

◮ destructive pointer updates x → sel = y, ◮ allocation and deallocation.

Destructive container operations specified using sets of pairs of in/out abstract heap configurations:

◮ encoded using the chosen formalism, e.g., SMGs, ◮ with linked elements of the in/out configurations, ◮ sets: different variants of the operations,

  • e.g., insertion into empty/non-empty container.

4 Specification of non-destructive container operations:

◮ Currently, a fixed set of such operations – up to the selectors used:

  • iterators: moving along a data structure (e.g., next, prev),
  • initialisation of iterators (e.g., front, tail),
  • tests (e.g., emptiness, end of iteration).

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 17 / 25

slide-31
SLIDE 31

Searching for Destructive Container Operations (1/2)

Automatically derive which low-level pointer statements implement a given destructive container operation based on

◮ which pointer links change and ◮ which regions appear/disappear. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 18 / 25

slide-32
SLIDE 32

Searching for Destructive Container Operations (1/2)

Automatically derive which low-level pointer statements implement a given destructive container operation based on

◮ which pointer links change and ◮ which regions appear/disappear. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 18 / 25

slide-33
SLIDE 33

Searching for Destructive Container Operations (1/2)

Automatically derive which low-level pointer statements implement a given destructive container operation based on

◮ which pointer links change and ◮ which regions appear/disappear.

Along the input annotated CFG, look for transformation chains (TCs):

◮ memory-safe permutations of the implementing statements, ◮ possibly interleaved with other non-interfering statements. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 18 / 25

slide-34
SLIDE 34

Searching for Destructive Container Operations (2/2)

An example of a transformation chain:

1 h= 2 nondet() 3 p=malloc() 4 h!= 7 t->f=p 8 p->f= 9 p->x=0 10 p->b=t t= 12 5 h=p p h t h== 11 t=p h t h t h t h t

L L

h t h t

L L

p h t p h t p h t p h t

L

p h t

L

p h t

L

p h t

L

p h t p h t p h t p h t p h t p h t p h t p h t

L

p h t

L

p h t 2 1 2 1 p=

L

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 19 / 25

slide-35
SLIDE 35

Replacement Locations

Replacement location: a location where a call of a container operation can be inserted, replacing the implementing pointer statements.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 20 / 25

slide-36
SLIDE 36

Replacement Locations

Replacement location: a location where a call of a container operation can be inserted, replacing the implementing pointer statements. Finding replacement locations:

◮ move non-implementing statements

to the prefix/suffix of a transformation chain,

◮ consider locations in between

  • f the prefix/suffix.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 20 / 25

slide-37
SLIDE 37

Replacement Locations

Replacement location: a location where a call of a container operation can be inserted, replacing the implementing pointer statements. Finding replacement locations:

◮ move non-implementing statements

to the prefix/suffix of a transformation chain,

◮ consider locations in between

  • f the prefix/suffix.

1 h= 2 nondet() 3 p=malloc() 4 h!= 7 t->f=p 8 p->f= 9 p->x=0 10 p->b=t t= 12 5 h=p p h t h== 11 t=p h t h t h t h t

L L

h t h t

L L

p h t p h t p h t p h t

L

p h t

L

p h t

L

p h t

L

p h t p h t p h t p h t p h t p h t p h t p h t

L

p h t

L

p h t 2 1 2 1 p=

L

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 20 / 25

slide-38
SLIDE 38

Replacement Recipes

Replacement recipe: assigns to each transformation chain τ

1 a container operation ∆τ, 2 a replacement location ℓτ, and 3 in/out parameter valuations σin

τ /σout τ

.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 21 / 25

slide-39
SLIDE 39

Replacement Recipes

Replacement recipe: assigns to each transformation chain τ

1 a container operation ∆τ, 2 a replacement location ℓτ, and 3 in/out parameter valuations σin

τ /σout τ

.

Local consistency:

◮ each TC τ is minimal and implements ∆τ wrt. ℓτ and σin

τ /σout τ

.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 21 / 25

slide-40
SLIDE 40

Replacement Recipes

Replacement recipe: assigns to each transformation chain τ

1 a container operation ∆τ, 2 a replacement location ℓτ, and 3 in/out parameter valuations σin

τ /σout τ

.

Local consistency:

◮ each TC τ is minimal and implements ∆τ wrt. ℓτ and σin

τ /σout τ

.

Global consistency:

◮ In all traces, the same replacement locations and implementing edges. ◮ Overlapping TCs agree on replacement locations, implementing edges,

and implemented operations.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 21 / 25

slide-41
SLIDE 41

Replacement Recipes

Replacement recipe: assigns to each transformation chain τ

1 a container operation ∆τ, 2 a replacement location ℓτ, and 3 in/out parameter valuations σin

τ /σout τ

.

Local consistency:

◮ each TC τ is minimal and implements ∆τ wrt. ℓτ and σin

τ /σout τ

.

Global consistency:

◮ In all traces, the same replacement locations and implementing edges. ◮ Overlapping TCs agree on replacement locations, implementing edges,

and implemented operations.

Connectedness:

◮ a container is manipulated destructively by container operations only. ◮ Natural: otherwise conversions from/to containers needed! Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 21 / 25

slide-42
SLIDE 42

Replacement Recipes

Replacement recipe: assigns to each transformation chain τ

1 a container operation ∆τ, 2 a replacement location ℓτ, and 3 in/out parameter valuations σin

τ /σout τ

.

Local consistency:

◮ each TC τ is minimal and implements ∆τ wrt. ℓτ and σin

τ /σout τ

.

Global consistency:

◮ In all traces, the same replacement locations and implementing edges. ◮ Overlapping TCs agree on replacement locations, implementing edges,

and implemented operations.

Connectedness:

◮ a container is manipulated destructively by container operations only. ◮ Natural: otherwise conversions from/to containers needed!

Implementation: find as many TCs as possible, then prune them.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 21 / 25

slide-43
SLIDE 43

Code Replacement, Non-destructive Operations

Insert assignment of in/out parameters of container operations.

◮ Need to agree for all overlapping TCs: otherwise prune again.

Insert calls of container operations, remove their pointer implementation.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 22 / 25

slide-44
SLIDE 44

Code Replacement, Non-destructive Operations

Insert assignment of in/out parameters of container operations.

◮ Need to agree for all overlapping TCs: otherwise prune again.

Insert calls of container operations, remove their pointer implementation. Non-destructive container operations:

◮ Detected by looking at annotations of two neighbouring locations only. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 22 / 25

slide-45
SLIDE 45

Code Replacement, Non-destructive Operations

Insert assignment of in/out parameters of container operations.

◮ Need to agree for all overlapping TCs: otherwise prune again.

Insert calls of container operations, remove their pointer implementation. Non-destructive container operations:

◮ Detected by looking at annotations of two neighbouring locations only. ◮ Iteration:

  • A pointer always moved along the direction of a certain selector.

◮ Iteration initialization:

  • A pointer always moved to some distinguished location of a container.

◮ Tests:

  • A branch followed if a certain predicate over the container holds

(e.g., being empty).

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 22 / 25

slide-46
SLIDE 46

Experiments (1/2)

Very prototypical implementation in Predator-adt:

◮ http://www.fit.vutbr.cz/research/groups/verifit/tools/predator-adt/

No code generation, just a CFG transformation.

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 23 / 25

slide-47
SLIDE 47

Experiments (1/2)

Very prototypical implementation in Predator-adt:

◮ http://www.fit.vutbr.cz/research/groups/verifit/tools/predator-adt/

No code generation, just a CFG transformation. Benchmarks:

◮ 18 programs using different implementations of typical list operations:

  • insertion – push back, push front, ..., insert,
  • removal – pop back, pop front,
  • iteration, tests.

◮ Further variants generated by legal permutations of the statements. ◮ Programs creating, traversing, filtering, and searching lists taken from

the benchmark suite of SLAyer.

◮ Programs using DLLs with head/tail pointers. ◮ Programs dealing with circular DLLs. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 23 / 25

slide-48
SLIDE 48

Experiments (2/2)

Combination of Predator and J2BP:

◮ Predator: shape analysis, but (almost) no non-pointer data. ◮ J2BP: predicate abstraction over containers, no shape analysis. Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 24 / 25

slide-49
SLIDE 49

Experiments (2/2)

Combination of Predator and J2BP:

◮ Predator: shape analysis, but (almost) no non-pointer data. ◮ J2BP: predicate abstraction over containers, no shape analysis. ◮ Due to separation of analyses, we verified programs that neither

Predator nor J2BP could handle:

  • Insertion into a sorted list and checking that it remains sorted.
  • Correctness of rewriting selected values in a container, ...

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 24 / 25

slide-50
SLIDE 50

Future Work

More generic formalisation covering richer classes of containers, more generic implementation, more flexible specification of non-destructive container operations, support of iterative destructive container operations, ..., container recognition in open programs, concurrent programs, ...

Dudka, Hol´ ık, Peringer, Trt´ ık, Vojnar SMGs for going from Pointers to Containers 25 / 25