Domain-Specific Languages for Program Analysis (Mark Hills, OOPSLE)



slide-1
SLIDE 1

Domain-Specific Languages for Program Analysis

Mark Hills
OOPSLE 2015: Open and Original Problems in Software Language Engineering
March 6, 2015
Montreal, Canada

http://www.rascal-mpl.org

1

slide-2
SLIDE 2

Overview

  • A Starting Example: DCFlow
  • Other Early-Stage Ideas
  • Summary extraction from documentation
  • Trace processing
  • Discussion

2

slide-3
SLIDE 3

Say you need a control flow graph…

[Figure: control flow graph with nodes entry, x := 3, x (a condition with true and false branches), y := 10, y := 15, and exit]

3

slide-4
SLIDE 4

Building control flow graph extractors

  • First, define how to represent control flow graphs
  • Then, pick a language; hopefully we can reuse the first part for different languages, but maybe not…
  • Next, define the control flow rules, using your favorite language (such as Rascal, of course…)
  • Finally, define something that uses the graph; this makes sure the data structure is rich enough to be useful as well…
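
The steps above can be sketched in a few lines. This is a minimal illustration in Python (not Rascal), under stated assumptions: the `CFG` class, its method names, and the small conditional program are all made up for this sketch, and which branch leads to which assignment is assumed.

```python
from collections import defaultdict


class CFG:
    """A minimal control flow graph: nodes plus labeled directed edges."""

    def __init__(self):
        self.succ = defaultdict(list)  # node -> [(target, edge_label)]
        self.nodes = set()

    def add_edge(self, src, dst, label=""):
        self.nodes.update((src, dst))
        self.succ[src].append((dst, label))

    def reachable(self, start="entry"):
        """Nodes reachable from start: a tiny analysis that uses the graph."""
        seen, work = set(), [start]
        while work:
            node = work.pop()
            if node in seen:
                continue
            seen.add(node)
            work.extend(dst for dst, _ in self.succ[node])
        return seen


# Hypothetical program: x := 3; if x then y := 10 else y := 15
# (the assignment of branches to labels is an assumption).
cfg = CFG()
cfg.add_edge("entry", "x := 3")
cfg.add_edge("x := 3", "x")
cfg.add_edge("x", "y := 10", "true")
cfg.add_edge("x", "y := 15", "false")
cfg.add_edge("y := 10", "exit")
cfg.add_edge("y := 15", "exit")
```

Here the flow rules are written by hand as `add_edge` calls; the reachability check plays the role of "something that uses the graph" and would expose a representation that is too poor to be useful.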

4

slide-6
SLIDE 6

What if we want to work with another language?

  • May be able to reuse base CFG definition (but maybe not)
  • Cannot reuse flow definition (unless CFG def is the same and features have identical semantics; the flow rules are specific to the features being defined)
  • Cannot easily reuse analysis (since CFG definition and semantics differ)

So, we write the entire thing over again (and again, and again…)

6

slide-7
SLIDE 7

DCFlow: Declarative Control Flow

  • Declarative DSL for defining control flow rules
  • Generates Rascal code to build intraprocedural control flow graphs with reusable library of CFG concepts
  • Provides basic visualization to allow graphs to be rendered in GraphViz dot
  • Provides ignore mechanism to indicate which language constructs we are not trying to define
  • IDE provides basic checking to aid user (with more coming)
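
To make the visualization step concrete, here is a minimal Python sketch of turning labeled CFG edges into GraphViz dot text. It is not DCFlow's generated Rascal code; the `to_dot` function and the edge triples are hypothetical names chosen for illustration.

```python
def to_dot(edges, name="cfg"):
    """Render (source, target, label) triples as GraphViz dot text."""
    lines = [f"digraph {name} {{"]
    for src, dst, label in edges:
        # Only emit a label attribute when the edge actually has one.
        attr = f' [label="{label}"]' if label else ""
        lines.append(f'  "{src}" -> "{dst}"{attr};')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges for a small loop-like graph.
edges = [
    ("entry", "cond", ""),
    ("cond", "body", "true"),
    ("cond", "exit", "false"),
]
dot_text = to_dot(edges)
```

The resulting text can be saved to a file and rendered with the `dot` command-line tool.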

7

slide-8
SLIDE 8

DCFlow Architecture

[Figure: architecture diagram. Components: DCFlow Definition; DCFlow Translator (Rascal); Source Program (Input Language); DCFlow Libraries (Rascal); Language-Specific Functions (Rascal); CFG Builder Modules (Rascal); CFG Construction (Rascal); Control Flow Graphs (Rascal); CFG Visualization (Rascal); GraphViz Visualizations (GraphViz, dot)]

8

slide-11
SLIDE 11

Building up an example: plus

  • What should plus do?

      binaryOperation(Expr left, Expr right, plus())

  • Run left, then run right, then add them together

      rule EXP::add = left --> right --> self;

  • That’s it!

11
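
One way to read the rule `left --> right --> self` is as two edges: from the last node of `left` to the first node of `right`, and from the last node of `right` to the addition itself. The Python sketch below spells that reading out; it is an interpretation, not the code DCFlow generates, and `Var`, `Add`, and `flow` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def flow(expr, edges):
    """Append the edges for expr to `edges`; return (first node, last node)."""
    if isinstance(expr, Var):
        return expr, expr  # a leaf is both its own entry and its own exit
    l_first, l_last = flow(expr.left, edges)
    r_first, r_last = flow(expr.right, edges)
    edges.append((l_last, r_first))  # left --> right
    edges.append((r_last, expr))     # right --> self
    return l_first, expr


# Nodes are compared by value here, so repeated subexpressions would
# collapse into one node; a real builder would use unique labels.
edges = []
expr = Add(Var("a"), Add(Var("b"), Var("c")))
first, last = flow(expr, edges)
```

For `a + (b + c)` the traversal starts at `a`, ends at the outer addition, and produces four edges, two per `Add` node.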

slide-14
SLIDE 14

Something more complex: while loops

  • What should while do?

      \while(Expr cond, list[Stmt] body)

  • The exp is the first and last thing we should do
  • A footer is useful as a target for break and continue
  • We need a back-edge, and it would be nice to label others

      rule STATEMENT::whileStat = create(footer),
          ^exp -conditionTrue-> body -backedge-> exp,
          exp -conditionFalse-> $footer;

14
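
The while rule can be unpacked into explicit labeled edges. The Python sketch below is one plausible reading, not DCFlow's output: it assumes `^exp` marks the condition as the entry node and `$footer` refers to the node made by `create(footer)`. The function name `while_flow` and the string node labels are hypothetical.

```python
def while_flow(cond, body):
    """Labeled edges for `while (cond) body`; body is a list of node labels."""
    footer = "footer"  # create(footer): a target for break and continue
    edges = []
    if body:
        edges.append((cond, body[0], "conditionTrue"))  # exp -conditionTrue-> body
        for a, b in zip(body, body[1:]):
            edges.append((a, b, ""))                    # body statements in order
        edges.append((body[-1], cond, "backedge"))      # body -backedge-> exp
    else:
        # Assumed behavior: an empty body loops straight back to the condition.
        edges.append((cond, cond, "backedge"))
    edges.append((cond, footer, "conditionFalse"))      # exp -conditionFalse-> $footer
    return cond, footer, edges


entry, exit_node, edges = while_flow("i < 3", ["s1", "s2"])
```

The condition is both the first node reached and the last node evaluated before leaving, which matches the bullet "the exp is the first and last thing we should do".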

slide-15
SLIDE 15

Design Decisions

  • Focus on abstract syntax trees (should almost work on Rascal concrete syntax, but there are some differences)
  • Leverage reified types for generation and checking
  • Try to ensure added features are general (don’t want to add something just because PHP or Java needs it)
  • Make sure generated code is understandable; it should look close to what you would write yourself

15

slide-16
SLIDE 16

How about for other domains?

  • Idea 1: Program tracing
  • Internal DSL: goal is to build this as a library in Rascal
  • Allow filter functions to keep or discard events of interest
  • Use closures to support registration of handlers for specific events or event patterns
  • What we have now: rudimentary tracing for PHP programs using Rascal and xdebug (running over TCP sockets)
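
As a rough picture of what such an internal DSL could look like, here is a minimal Python sketch (the goal on the slide is a Rascal library, so this is only an analogy). The `Tracer` class, its `keep`/`on`/`feed` methods, and the dictionary-shaped events are all hypothetical; they do not reflect xdebug's actual protocol.

```python
class Tracer:
    """Filters decide which events are kept; handlers react to matching events."""

    def __init__(self):
        self.filters = []   # predicates: an event is kept only if all return True
        self.handlers = []  # (pattern, callback) pairs

    def keep(self, predicate):
        self.filters.append(predicate)
        return self  # returning self allows chained, DSL-like registration

    def on(self, pattern, callback):
        self.handlers.append((pattern, callback))
        return self

    def feed(self, event):
        if not all(f(event) for f in self.filters):
            return False  # discarded by a filter
        for pattern, callback in self.handlers:
            # A pattern matches when every key/value pair agrees with the event.
            if all(event.get(k) == v for k, v in pattern.items()):
                callback(event)
        return True


calls = []  # captured by the closure registered below
tracer = (
    Tracer()
    .keep(lambda e: not e["file"].startswith("vendor/"))
    .on({"kind": "call"}, lambda e: calls.append(e["name"]))
)

# Hypothetical events, as a tracing backend might deliver them.
events = [
    {"kind": "call", "name": "main", "file": "index.php"},
    {"kind": "call", "name": "autoload", "file": "vendor/autoload.php"},
    {"kind": "return", "name": "main", "file": "index.php"},
]
kept = [tracer.feed(e) for e in events]
```

The handler is an ordinary closure over `calls`, which is the property the slide relies on: handlers can accumulate state without the tracing library knowing about it.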

16

slide-17
SLIDE 17

How about for other domains?

  • Idea 2: Summary extraction
  • Libraries make it harder to analyze code, we may not know what these libraries actually do
  • Extract function/procedure/method summaries from existing documentation: basic info such as signatures, types, maybe ability to attach more advanced info
  • No work on this yet, still deciding what makes sense; currently works for PHP by extracting very generic HTML representation and using Rascal to match over it
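
The extraction idea can be illustrated with a small Python sketch that pulls a signature out of an HTML fragment. Everything specific here is an assumption: the fragment, the `methodsynopsis` class name, the function `count_items`, and the `summarize` helper are invented for illustration and are not taken from the talk or from any real documentation page.

```python
import re
from html.parser import HTMLParser


class SynopsisText(HTMLParser):
    """Collect the text inside the element whose class is `methodsynopsis`."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside the synopsis element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth or ("class", "methodsynopsis") in attrs:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def summarize(html):
    """Return a summary dict for a `type name(type $param, ...)` synopsis."""
    parser = SynopsisText()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())  # normalize whitespace
    match = re.fullmatch(r"(\w+) (\w+)\((.*)\)", text)
    if match is None:
        return None
    returns, name, params = match.groups()
    pairs = [tuple(p.split()) for p in params.split(",") if p.strip()]
    return {"name": name, "returns": returns, "params": pairs}


# Hypothetical documentation fragment (no void tags such as <br>, which
# this simple depth counter does not handle).
fragment = (
    '<p>Counts things.</p><div class="methodsynopsis">'
    '<span class="type">int</span> <span class="methodname">count_items</span>('
    '<span class="type">array</span> <span class="parameter">$items</span>, '
    '<span class="type">bool</span> <span class="parameter">$deep</span>)</div>'
)
summary = summarize(fragment)
```

A summary in this shape (name, return type, parameter types) is the "basic info" mentioned above; richer annotations would need more structure than a flat signature line.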

17

slide-18
SLIDE 18

Related work

  • “Extensible intraprocedural flow analysis at the abstract syntax tree level”, Söderberg, Ekman, Hedin, Magnusson
  • Uses attribute grammars to represent control flow
  • Reference attributes represent edges
  • Collection attributes represent inverse relations (e.g., pred)
  • Higher-order attributes allow building new AST nodes (e.g., entry and exit)

slide-19
SLIDE 19

Related work

  • Spoofax: NaBL, language for incremental type checking
  • DHAL and variants for data flow analysis
  • Related conceptually: use domain-specific languages for specific analysis-related tasks
  • Direct language support: Rascal, TXL, Spoofax, ASF+SDF, etc.

slide-20
SLIDE 20

Discussion

20

slide-21
SLIDE 21

Discussion: Some possible topics…

  • What opportunities are there for creating DSLs for program analysis? Which parts of the process would be best for this?
  • Which is best: internal or external? What circumstances drive this?
  • Is this even a good idea? Why not just use Rascal (or something else, if you must…)?

21

slide-22
SLIDE 22

Which design decisions are important?

  • Focus on abstract syntax trees (should almost work on Rascal concrete syntax, but there are some differences)
  • Leverage reified types for generation and checking
  • Try to ensure added features are general (don’t want to add something just because PHP or Java needs it)
  • Make sure generated code is understandable; it should look close to what you would write yourself

22