Shape Analysis
Alon Milchgrub
Overview: Lisp review; the concrete semantics; the abstraction function; the abstract semantics; discussion.
In Lisp everything is a list.
The command cons concatenates two objects by creating a new object with pointers to both of the original ones.
The commands car and cdr are used to access the first and second elements of such an object.
(cons 'pine '(fir oak maple)) returns (pine fir oak maple)
(car '(pine fir oak maple)) returns pine
(cdr '(pine fir oak maple)) returns (fir oak maple)
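The pointer structure behind cons, car and cdr can be sketched in Python. This is only an illustration under assumptions: the class `Cons` and the helpers `from_list` and `to_list` are hypothetical names, and `None` stands in for nil.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cons:
    """A cons cell: one pointer to the first element, one to the rest."""
    car: Any
    cdr: Any


def cons(a, b):
    # Creates a new object that points to both arguments; neither is copied.
    return Cons(a, b)


def from_list(items):
    """Build a nil-terminated chain of cons cells (None plays the role of nil)."""
    result = None
    for item in reversed(items):
        result = cons(item, result)
    return result


def to_list(cell):
    """Walk the cdr pointers and collect the car of every cell."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


tail = from_list(["fir", "oak", "maple"])
whole = cons("pine", tail)
```

Here `whole.cdr` is the very same object as `tail`, which is the kind of sharing through pointers that shape analysis tracks.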
Let PVar be the set of pointer variables in a program.
A shape graph is a directed graph with two types of edges: variable-edges E_v and selector-edges E_s.
E_v is a set of pairs of the form ⟨x, n⟩ where x ∈ PVar and n is a shape-node.
E_s is a set of triples of the form ⟨s, sel, t⟩ where sel ∈ {car, cdr} and s and t are shape-nodes.
A shape graph is deterministic (a DSG) if at most one edge leaves every variable in PVar and at most one car edge and one cdr edge leave every shape-node.
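The determinism condition can be checked mechanically. The following is a minimal sketch, assuming a shape graph is given as two Python sets (pairs for variable-edges, triples for selector-edges); the function name `is_deterministic` and the node names are illustrative only.

```python
from collections import Counter


def is_deterministic(ev, es):
    """ev: set of (variable, node) pairs; es: set of (source, sel, target) triples."""
    # Number of edges leaving each variable.
    var_out = Counter(var for var, _ in ev)
    # Number of edges leaving each node, counted per selector.
    sel_out = Counter((src, sel) for src, sel, _ in es)
    return (all(count <= 1 for count in var_out.values())
            and all(count <= 1 for count in sel_out.values()))


dsg_ev = {("x", "l1"), ("y", "l2")}
dsg_es = {("l2", "cdr", "l1")}
# Two cdr edges leaving l2: no longer deterministic.
ambiguous_es = dsg_es | {("l2", "cdr", "l3")}
```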
x := new
y := new
y.cdr := x
z := new
x.car := z
y := nil
y := x.car
z := nil
z := x
[Figure: DSG over the variables x, y, z and the locations l1, l2, l3]
The transformations applied to the shape graph are defined by the concrete semantics ⟦st⟧_DSG : DSG → DSG.
Let v be a control flow graph vertex and pathsTo(v) the set of paths in the control flow graph from start to predecessors of v.
Then the collecting semantics is defined as follows:
\[ cs(v) = \{\, [\![ st(v_k) ]\!]_{DSG}( \cdots ( [\![ st(v_1) ]\!]_{DSG}( \langle \emptyset, \emptyset \rangle ) ) \cdots ) \mid [v_1, \ldots, v_k] \in pathsTo(v) \,\} \]
This is the set of possible shape graphs at v.
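One way to picture the concrete semantics is as a small interpreter over deterministic shape graphs. The sketch below is an assumption-laden illustration, not the definition from the slides: the statement encoding (`"new"`, `"nil"`, `"copy"`, `"store"`, `"load"`), the function `step` and the location names l1, l2, ... are hypothetical, garbage collection is omitted, and dereferencing nil simply raises a `KeyError`.

```python
import itertools


def step(stmt, ev, es, fresh):
    """Apply one statement to a DSG and return the new (ev, es).

    ev maps variable -> location, es maps (location, selector) -> location.
    A variable or selector that is absent from its map is nil.
    """
    ev, es = dict(ev), dict(es)          # work on copies, keep inputs intact
    op = stmt[0]
    if op == "new":                      # x := new
        ev[stmt[1]] = next(fresh)
    elif op == "nil":                    # x := nil
        ev.pop(stmt[1], None)
    elif op == "copy":                   # x := y
        _, x, y = stmt
        if y in ev:
            ev[x] = ev[y]
        else:
            ev.pop(x, None)
    elif op == "store":                  # x.sel := y
        _, x, sel, y = stmt
        if y in ev:
            es[(ev[x], sel)] = ev[y]
        else:
            es.pop((ev[x], sel), None)
    elif op == "load":                   # x := y.sel
        _, x, y, sel = stmt
        target = es.get((ev[y], sel))
        if target is None:
            ev.pop(x, None)
        else:
            ev[x] = target
    else:
        raise ValueError(f"unknown statement {stmt!r}")
    return ev, es


program = [
    ("new", "x"),
    ("new", "y"),
    ("store", "y", "cdr", "x"),
    ("new", "z"),
    ("store", "x", "car", "z"),
]
fresh = ("l%d" % i for i in itertools.count(1))
ev, es = {}, {}                          # the empty shape graph
for stmt in program:
    ev, es = step(stmt, ev, es, fresh)
```

After these five statements x, y and z point to l1, l2 and l3 respectively, with a cdr edge from l2 to l1 and a car edge from l1 to l3.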
A static shape graph (SSG) is a pair ⟨SG, is_shared⟩, where SG is a shape graph whose shape-nodes are a subset of {n_X | X ⊆ PVar}, and is_shared is a function from the shape-nodes of SG to {true, false}.
Semantically, is_shared(n) = true indicates that n may represent a location that is pointed to by more than one pointer on the heap.
Given a DSG, the mapping α generates an SSG by replacing each concrete location by the set of pointer variables pointing to that location (after gc).
For the image of α(DSG), is_shared(n_Z) = true ⇔ n_Z represents a concrete location that is pointed to by more than one pointer on the heap.
[Figure: a DSG over the variables x, y, t, t1 and the locations l1 to l5, and its abstraction with the shape-nodes n_{x,t1}, n_{y}, n_{t} and n_∅]
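The abstraction of a single DSG can be sketched as follows. This is an illustration under assumptions: the DSG is taken to be already garbage-collected, shape-nodes n_X are represented by the `frozenset` X itself, and the function name `abstract` is hypothetical.

```python
from collections import Counter


def abstract(ev, es):
    """Map a DSG (ev: var -> location, es: (location, sel) -> location) to an SSG.

    Returns (abs_ev, abs_es, is_shared), where a shape-node n_X is
    represented by the frozenset X of variables pointing to the location.
    """
    locations = set(ev.values()) | {src for src, _ in es} | set(es.values())
    # Each location is named by the set of variables that point to it.
    name = {loc: frozenset(v for v, tgt in ev.items() if tgt == loc)
            for loc in locations}
    abs_ev = {(v, name[loc]) for v, loc in ev.items()}
    abs_es = {(name[src], sel, name[tgt]) for (src, sel), tgt in es.items()}
    # A node is shared if some location it represents has more than one
    # incoming selector edge.
    indegree = Counter(es.values())
    is_shared = {node: False for node in name.values()}
    for loc in locations:
        if indegree[loc] > 1:
            is_shared[name[loc]] = True
    return abs_ev, abs_es, is_shared


# A four-cell list: x points to the head, y to the last cell.
ev = {"x": "l1", "y": "l4"}
es = {("l1", "cdr"): "l2", ("l2", "cdr"): "l3", ("l3", "cdr"): "l4"}
abs_ev, abs_es, is_shared = abstract(ev, es)
```

The two middle cells l2 and l3 are pointed to by no variable, so both collapse into the single summary node n_∅, which gets a cdr self-loop.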
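For a set of shape graphs S the abstraction function α is defined as follows:
\[ \alpha(S) = \bigsqcup_{DSG \in S} \alpha(DSG) \]
where for two SSGs SG = ⟨⟨E_v, E_s⟩, is_shared⟩ and SG′ = ⟨⟨E_v′, E_s′⟩, is_shared′⟩:
\[ SG \sqcup SG' = \langle \langle E_v \cup E_v',\; E_s \cup E_s' \rangle,\; is\_shared \lor is\_shared' \rangle \]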
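The join is a componentwise union, which a few lines of Python make concrete. This sketch assumes an SSG is a triple of two edge sets and a dictionary for is_shared, with shape-nodes represented as frozensets of variable names; treating a node that is missing from one operand as not shared there is an assumption of the sketch.

```python
def join(ssg_a, ssg_b):
    """Join two SSGs given as (ev, es, is_shared) triples."""
    ev_a, es_a, shared_a = ssg_a
    ev_b, es_b, shared_b = ssg_b
    nodes = set(shared_a) | set(shared_b)
    # Pointwise "or"; a node absent from one side counts as not shared there.
    is_shared = {n: shared_a.get(n, False) or shared_b.get(n, False)
                 for n in nodes}
    return ev_a | ev_b, es_a | es_b, is_shared


X, E = frozenset({"x"}), frozenset()
one_cell = ({("x", X)}, set(), {X: False})
two_cells = ({("x", X)}, {(X, "cdr", E)}, {X: False, E: False})
joined = join(one_cell, two_cells)
```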
For a single DSG the shape-nodes of α(DSG) represent disjoint sets of pointers.
Let S be a set of DSGs, and α(S) = ⟨⟨E_v, E_s⟩, is_shared⟩. Then it follows that for all ⟨n_X, sel, n_Y⟩ ∈ E_s either X = Y or X ∩ Y = ∅.
[Figure: an SSG over the variables x, y, t, t1 with the shape-nodes n_{x,t1}, n_{y}, n_{t} and n_∅]
In order for the abstraction to be useful, one should be able to compute it directly by transforming the static shape graph, rather than by abstracting the concrete shape graphs.
For this purpose the SSG meaning function ⟦st⟧_SSG : SSG → SSG is defined.
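x := new
[Figure, concrete and abstract: concretely x points to a fresh location l_new; abstractly x points to the shape-node n_{x}]
x := y
[Figure, concrete and abstract: concretely x points to the location of y (l_i with t1, or l_j with t2); abstractly n_{y,t1} and n_{y,t2} become n_{y,t1,x} and n_{y,t2,x}]
x.cdr := y
[Figure, concrete and abstract: concrete locations l_i, l_j, l_k; shape-nodes n_{x}, n_{y} and n_{y,x}]
x := y.cdr
[Figure, abstract: shape-nodes n_{y}, n_{t1} and n_{t2} before the statement; n_{t1} becomes n_{t1,x} after it]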
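The abstract effect of x := y shown above, where every shape-node whose name contains y gains x, can be sketched in Python. This is an illustration under assumptions: x is taken to be nil before the statement, shape-nodes n_Z are represented by the frozenset Z, and the function name `assign_var` is hypothetical.

```python
def assign_var(x, y, ssg):
    """Abstract x := y on an SSG (ev, es, is_shared); assumes x is nil beforehand."""
    ev, es, is_shared = ssg

    def rename(node):
        # n_Z becomes n_{Z ∪ {x}} exactly when y ∈ Z.
        return node | {x} if y in node else node

    new_ev = {(v, rename(n)) for v, n in ev}
    # x now points wherever y points.
    new_ev |= {(x, rename(n)) for v, n in ev if v == y}
    new_es = {(rename(s), sel, rename(t)) for s, sel, t in es}
    new_shared = {rename(n): flag for n, flag in is_shared.items()}
    return new_ev, new_es, new_shared


# y may point to the same cell as t1 or to the same cell as t2.
Y1, Y2 = frozenset({"y", "t1"}), frozenset({"y", "t2"})
before = (
    {("y", Y1), ("t1", Y1), ("y", Y2), ("t2", Y2)},
    {(Y1, "cdr", Y2)},
    {Y1: False, Y2: False},
)
after_ev, after_es, after_shared = assign_var("x", "y", before)
```

No sharing information changes, because a variable assignment adds no pointer on the heap.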
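The abstract semantics associates an SSG, SG_v, with every control-flow vertex v, defined by:
\[ SG_v = \begin{cases} \langle \langle \emptyset, \emptyset \rangle,\; \lambda n.\, false \rangle & \text{if } v = start \\ \bigsqcup_{u \in pred(v)} [\![ st(u) ]\!]_{SSG}(SG_u) & \text{otherwise} \end{cases} \]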
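A system of equations of this form is typically solved by iterating until nothing changes. The sketch below shows such an iteration in generic form; it is not the algorithm from the slides. The function `solve` and its parameters are hypothetical, and the usage example substitutes a toy domain (sets of statement labels joined by union) for SSGs so that the block stays short.

```python
def solve(pred, start, stmt, transfer, join, bottom, initial):
    """Iterate SG_v = join over u in pred(v) of transfer(stmt[u], SG_u) to a fixed point.

    pred maps every vertex to the list of its predecessors.
    """
    value = {v: bottom for v in pred}
    value[start] = initial
    changed = True
    while changed:
        changed = False
        for v in pred:
            if v == start:
                continue                 # the start vertex keeps its initial value
            new = bottom
            for u in pred[v]:
                new = join(new, transfer(stmt[u], value[u]))
            if new != value[v]:
                value[v] = new
                changed = True
    return value


# Toy instance: start -> a -> b -> exit, with a back edge b -> a.
pred = {"start": [], "a": ["start", "b"], "b": ["a"], "exit": ["b"]}
stmt = {"start": "s0", "a": "s1", "b": "s2", "exit": "s3"}
result = solve(
    pred, "start", stmt,
    transfer=lambda s, val: val | {s},   # records which statements were applied
    join=lambda p, q: p | q,
    bottom=frozenset(),
    initial=frozenset(),
)
```

The loop terminates here because the values only grow within a finite domain; for SSGs the analogous argument is that the shape-nodes are drawn from the finite set {n_X | X ⊆ PVar}.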
Theorem (Correctness): For every control-flow graph vertex v:
"Strong Nullification": When processing a statement of the type x.sel0 := y, the sel0 edges currently emanating from x are always removed.
Materialization: When processing a statement of the type x := y.sel0, the algorithm creates a copy of y.sel0 and thus is able to un-summarize shape-nodes.
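The edge-removal step of strong nullification can be sketched as follows. This covers only the removal, not the insertion of the new edge to y, and it leaves is_shared untouched, which is a simplification assumed here. The function name `nullify_selector` is hypothetical and shape-nodes are represented as frozensets of variable names.

```python
def nullify_selector(x, sel0, ssg):
    """Remove every sel0 edge leaving a shape-node that x points to."""
    ev, es, is_shared = ssg
    # All shape-nodes x may point to (possibly several after a join).
    sources = {node for var, node in ev if var == x}
    new_es = {(s, sel, t) for s, sel, t in es
              if not (s in sources and sel == sel0)}
    # is_shared is copied unchanged in this sketch.
    return ev, new_es, dict(is_shared)


X, E = frozenset({"x"}), frozenset()
ssg = (
    {("x", X)},
    {(X, "cdr", E), (X, "car", E), (E, "cdr", E)},
    {X: False, E: True},
)
_, es_after, _ = nullify_selector("x", "cdr", ssg)
```

Only the cdr edge leaving n_{x} disappears; the car edge and the cdr self-loop on the summary node n_∅ remain.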
The shape analysis algorithm presented is able to verify shape preservation
properties of data structures like lists, lists containing a cycle and trees.
What are possible uses of this kind of analysis?
What are possible extensions of this method?
What are possible flaws of this method?
Is it scalable?