SLIDE 1

Simplified data-flow analysis with Datalog.

by François Gauthier

SLIDE 2

It all began with a qualifying exam…

Among the heap of papers I had to read, there was this one: "Cloning-based context-sensitive pointer alias analysis using binary decision diagrams" by J. Whaley and M. S. Lam, presented at the Programming Language Design and Implementation (PLDI) conference.

SLIDE 3

… and a points-to analysis

Where the authors claimed that the following 4 lines compute a basic points-to analysis:

vP(v, h) :− vP0(v, h).
vP(v1, h) :− assign(v1, v2), vP(v2, h).
hP(h1, f, h2) :− store(v1, f, v2), vP(v1, h1), vP(v2, h2).
vP(v2, h2) :− load(v1, f, v2), vP(v1, h1), hP(h1, f, h2).

SLIDE 4

Really?

SLIDE 5

Yes!

SLIDE 6

Datalog – Basics

  • Datalog is a logic programming language that is syntactically a subset of Prolog.

  • Datalog operates on facts and rules.
  • A fact is declared like this:

– parent("Bill", "Mary").

  • It can be read as:

– Bill is the parent of Mary, or
– Mary is the parent of Bill.

  • The implementer chooses the meaning.
SLIDE 7

Datalog – Basics (cont.)

  • A Datalog program consists of a set of rules that derive new facts.

  • A rule consists of two parts, a head and a body:

– ancestor(?X,?Y) :- parent(?X,?Y).
– ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).

  • The :- symbol separates the head (left) from the body (right).

  • Commas in the body stand for AND.
  • ? indicates a variable.
SLIDE 8

Datalog – Understanding rules

  • The following rule:

– ancestor(?X,?Y) :- parent(?X,?Y).

reads as: Y is an ancestor of X if Y is a parent of X.

  • Similarly, the following rule:

– ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).

reads as: Y is an ancestor of X if Z is a parent of X and Y is an ancestor of Z.
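
As a rough illustration of this reading, a single application of the recursive rule can be sketched in Python as a join on the shared variable Z. The two facts and the names `parent`, `ancestor` and `derived` are chosen here for the example only; this is not how a Datalog engine is actually implemented.

```python
# Facts are tuples; parent(x, z) is read "z is a parent of x", as on the slide.
parent = {("W", "Y"), ("Y", "D")}

# Facts already produced by the first rule: ancestor(X, Y) :- parent(X, Y).
ancestor = set(parent)

# One application of: ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
# The comma is a logical AND; the shared variable Z forces both atoms
# to agree on the same individual.
derived = {
    (x, y)
    for (x, z) in parent
    for (z2, y) in ancestor
    if z == z2
}

print(sorted(derived))
```

Here the only match is Z = "Y": Y is a parent of W and D is an ancestor of Y, so the rule derives that D is an ancestor of W.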

SLIDE 9

Ancestors - Initial facts

parent("C", "D").
parent("Y", "D").
parent("C", "Z").
parent("Y", "Z").
parent("A", "B").
parent("A", "C").
parent("W", "Y").
parent("W", "X").

[Figure: family graph over the individuals A, B, C, D, W, X, Y and Z]

SLIDE 10

Ancestors – Rules and queries

  • Recall the rules of our ancestors program:

– ancestor(?X,?Y) :- parent(?X,?Y).
– ancestor(?X,?Y) :- parent(?X,?Z), ancestor(?Z,?Y).

  • These rules are evaluated iteratively until no new ancestor facts can be derived (a fixpoint).

  • A query in Datalog is expressed like this:

– ?-ancestor("W", ?Ancestor).
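
The iteration to a fixpoint can be sketched in Python as follows. This is a naive bottom-up loop written for illustration, assuming the eight parent facts from the previous slide; the function name `ancestors_fixpoint` is made up for this sketch, and real Datalog engines use more efficient evaluation strategies.

```python
def ancestors_fixpoint(parent):
    """Naive bottom-up evaluation of the two ancestor rules."""
    ancestor = set()
    while True:
        # ancestor(X, Y) :- parent(X, Y).
        new_facts = set(parent)
        # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
        new_facts |= {
            (x, y)
            for (x, z) in parent
            for (z2, y) in ancestor
            if z == z2
        }
        # Fixpoint: stop when a full round derives nothing new.
        if new_facts <= ancestor:
            return ancestor
        ancestor |= new_facts


# parent(x, y) is read "y is a parent of x".
parent = {
    ("C", "D"), ("Y", "D"), ("C", "Z"), ("Y", "Z"),
    ("A", "B"), ("A", "C"), ("W", "Y"), ("W", "X"),
}

ancestor = ancestors_fixpoint(parent)

# Equivalent of the query ?-ancestor("W", ?Ancestor).
answer = sorted(y for (x, y) in ancestor if x == "W")
print(answer)  # ['D', 'X', 'Y', 'Z']
```

The query is just a filter over the computed relation: Y and X are parents of W, and D and Z are reached through Y.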

SLIDE 11

What about data-flow analysis?

Java code

public String name (String type){
1:   String a = "Anonymous";
2:   if(type.equals("cat"))
3:     a = "Garfield";
4:   else if(type.equals("dog"))
5:     a = "Snoopy";
6:   else
7:     a = "Blob";
8:   return a;
}

Control-flow graph

[Figure: control-flow graph over statements 1, 2, 3, 4, 5, 7 and 8]

SLIDE 12

Reaching definitions – Initial facts

Java code

public String name (String type){
1:   String a = "Anonymous";
2:   if(type.equals("cat"))
3:     a = "Garfield";
4:   else if(type.equals("dog"))
5:     a = "Snoopy";
6:   else
7:     a = "Blob";
8:   return a;
}

Initial facts

assign(1, "a"). assign(3, "a"). assign(5, "a"). assign(7, "a").

SLIDE 13

Reaching definitions – Initial facts

Initial facts

follows(1,2).
follows(2,3).
follows(2,4).
follows(4,5).
follows(4,7).
follows(3,8).
follows(5,8).
follows(7,8).

Control-flow graph

[Figure: control-flow graph over statements 1, 2, 3, 4, 5, 7 and 8]

SLIDE 14

Reaching definitions - Rules

reach(?i,?x,?j) :- assign(?i,?x), follows(?i,?j).
reach(?d,?x,?j) :- reach(?d,?x,?i), follows(?i,?j), !assign(?j,?x).
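
To see what these two rules derive on the example, here is a small Python sketch that evaluates them naively over the assign and follows facts from the previous slides. The function name `reaching_definitions` and the variable `at_return` are illustrative assumptions, not part of the original programs.

```python
assign = {(1, "a"), (3, "a"), (5, "a"), (7, "a")}
follows = {(1, 2), (2, 3), (2, 4), (4, 5), (4, 7), (3, 8), (5, 8), (7, 8)}


def reaching_definitions(assign, follows):
    """Naive evaluation of the two reach rules, exactly as written on the slide."""
    # reach(i, x, j) :- assign(i, x), follows(i, j).
    reach = {(i, x, j) for (i, x) in assign for (i2, j) in follows if i == i2}
    while True:
        # reach(d, x, j) :- reach(d, x, i), follows(i, j), !assign(j, x).
        new_facts = {
            (d, x, j)
            for (d, x, i) in reach
            for (i2, j) in follows
            if i == i2 and (j, x) not in assign
        }
        if new_facts <= reach:  # fixpoint reached
            return reach
        reach |= new_facts


reach = reaching_definitions(assign, follows)

# Definitions of "a" that reach the return statement (line 8).
at_return = sorted(d for (d, x, j) in reach if x == "a" and j == 8)
print(at_return)  # [3, 5, 7]
```

The negated atom only consults the input relation assign, which never changes during evaluation, so the loop stays monotone and terminates. The sketch keeps the slide's formulation, where the negation tests the target statement j; since the first rule does not consult assign(j, x), a program with two consecutive assignments to the same variable would need extra care, but the example has none.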

SLIDE 15

Back to points-to…

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Representation

[Figure: points-to graph with variables snoopy, odie, f1, f2 and myDog, heap objects o1, o2 and o3, and two edges labelled food]

SLIDE 16

Initial facts – vPointsTo0

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Facts

vPointsTo0("snoopy","o1").
vPointsTo0("odie","o2").
vPointsTo0("f1","o3").

SLIDE 17

Initial facts – store

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Facts

store("snoopy","food","f1"). store("odie","food","f2").

SLIDE 18

Initial facts – load

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Facts

load("snoopy","food","f2").

SLIDE 19

Initial facts – assign

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Facts

assign("myDog","odie").

SLIDE 20

Initial facts – putting it all together

Java code

o1: Dog snoopy = new Dog();
o2: Dog odie = new Dog();
o3: Food f1 = new Food();
snoopy.food = f1;
Food f2 = snoopy.food;
odie.food = f2;
Dog myDog = odie;

Facts

vPointsTo0("snoopy","o1").
vPointsTo0("odie","o2").
vPointsTo0("f1","o3").
store("snoopy","food","f1").
load("snoopy","food","f2").
store("odie","food","f2").
assign("myDog","odie").

SLIDE 21

Points-to – Rules

We are interested in finding:

  1. Which heap objects a variable can point to.
  2. Which heap objects a field can point to.

Outputs will be stored in two relations:

  1. vPointsTo(?v, ?o) – Variable v points to object o.
  2. hPointsTo(?o1, ?f, ?o2) – The field f of object o1 points to object o2.

SLIDE 22

Points-to – Rules (cont.)

Initialization:

vPointsTo(?v, ?o) :- vPointsTo0(?v, ?o).

Assignments (v1 = v2):

vPointsTo(?v1, ?o) :- assign(?v1, ?v2), vPointsTo(?v2, ?o).

SLIDE 23

Points-to – Rules (cont.)

Stores (v1.f = v2):

hPointsTo(?o1, ?f, ?o2) :- store(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), vPointsTo(?v2, ?o2).

SLIDE 24

Points-to – Rules (cont.)

Loads (v2 = v1.f):

vPointsTo(?v2, ?o2) :- load(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), hPointsTo(?o1, ?f, ?o2).

SLIDE 25

Points-to – Putting all rules together

vPointsTo(?v, ?o) :- vPointsTo0(?v, ?o).
vPointsTo(?v1, ?o) :- assign(?v1, ?v2), vPointsTo(?v2, ?o).
hPointsTo(?o1, ?f, ?o2) :- store(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), vPointsTo(?v2, ?o2).
vPointsTo(?v2, ?o2) :- load(?v1, ?f, ?v2), vPointsTo(?v1, ?o1), hPointsTo(?o1, ?f, ?o2).
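
As a sanity check, the four rules can be run by hand on the Snoopy example. The Python sketch below evaluates them with a naive fixpoint loop over the facts from the "putting it all together" slide. The function name `points_to` and the short names `vp` and `hp` (for vPointsTo and hPointsTo) are assumptions of this sketch, and the loop is only meant to mirror the rules, not to reflect how a BDD-based or production Datalog engine works.

```python
def points_to(vp0, assign, store, load):
    """Naive fixpoint over the four points-to rules."""
    vp = set(vp0)  # vPointsTo(v, o) :- vPointsTo0(v, o).
    hp = set()
    while True:
        # vPointsTo(v1, o) :- assign(v1, v2), vPointsTo(v2, o).
        new_vp = {(v1, o) for (v1, v2) in assign for (v, o) in vp if v == v2}
        # hPointsTo(o1, f, o2) :- store(v1, f, v2), vPointsTo(v1, o1), vPointsTo(v2, o2).
        new_hp = {
            (o1, f, o2)
            for (v1, f, v2) in store
            for (a, o1) in vp if a == v1
            for (b, o2) in vp if b == v2
        }
        # vPointsTo(v2, o2) :- load(v1, f, v2), vPointsTo(v1, o1), hPointsTo(o1, f, o2).
        new_vp |= {
            (v2, o2)
            for (v1, f, v2) in load
            for (a, o1) in vp if a == v1
            for (h1, g, o2) in hp if h1 == o1 and g == f
        }
        if new_vp <= vp and new_hp <= hp:  # nothing new: fixpoint
            return vp, hp
        vp |= new_vp
        hp |= new_hp


vp0 = {("snoopy", "o1"), ("odie", "o2"), ("f1", "o3")}
assign = {("myDog", "odie")}
store = {("snoopy", "food", "f1"), ("odie", "food", "f2")}
load = {("snoopy", "food", "f2")}

vp, hp = points_to(vp0, assign, store, load)
print(sorted(vp))
print(sorted(hp))
```

On these facts the loop needs a few rounds: f2 can only be resolved once the store into snoopy.food has populated hPointsTo, and the store into odie.food in turn depends on f2. The final result has myDog pointing to o2, f2 pointing to o3, and the food field of both o1 and o2 pointing to o3.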

SLIDE 26

Application to security

function read($file, $privilege) {
  if($privilege)
    $handle = fopen($file, "r");  // Protected by the 'read' privilege.
  else
    error('You cannot read that file');
}
...
$file = 'prescriptions.txt';
...
$canRd = user_can('read');
...
read($file, $canRd);

SLIDE 27

Results on Moodle 1.9.5

Syntactic analysis: 992 security checks detected.
Intra-procedural, flow-insensitive: 1062 security checks detected.
Intra-procedural, flow-sensitive: 1063 security checks detected (removed an ambiguity).
Inter-procedural, flow-insensitive: 1072 security checks detected.

SLIDE 28

Conclusion

You can find the Datalog programs I developed (both intra- and inter-procedural) in: Alias-aware propagation of simple pattern-based properties in PHP applications, SCAM 2012.

That's all folks!