slide-1
SLIDE 1

Using ANTLR In Cybersecurity

Stuart Maclean

Applied Physics Laboratory University of Washington stuart@apl.uw.edu

Seattle Java User Group, June 2017

Maclean (APL/UW) ANTLR Seajug June 2017 1 / 52

slide-2
SLIDE 2

Outline

1. Motivation
2. ANTLR: Another Tool for Language Recognition
3. Building With ANTLR
4. ANTLR And Maven
5. A Toy Expression Language
6. Evaluating A Program With Embedded Actions
7. Representing Programs As Trees
8. Tree Visualizations
9. ANTLR Runtime API, Tree Manipulations
10. Manipulating C Code Using ANTLR Trees
11. Automating Code Generation For Program Analysis Via API Hooking
12. Conclusion

slide-3
SLIDE 3

Motivation

Motivation, Goals

My goal for this work was to auto-generate C code to be used in program execution analysis, via a technique called API-Hooking. My goal for this talk is to show how useful the ANTLR tool was in achieving my work goal, but also how useful it could be in solving problems in your domain.


slide-4
SLIDE 4

Motivation

What This Talk Is And Is Not

Original idea was to present work on disk imaging infrastructure, written largely in Java. See www.osdfcon.org/2016-event. Suited to a digital forensics audience.

Recent idea was to compare/contrast my auto-generated API hook code with e.g. Cuckoo Sandbox (www.cuckoosandbox.org) and to introduce and discuss Microsoft Detours. Suited to a C, program analysis audience.

Ended up focusing on ANTLR, suited to a Java audience!

slide-5
SLIDE 5

ANTLR: Another Tool for Language Recognition

What is ANTLR?

From www.antlr.org: ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. Terence Parr (University of San Francisco) is the maniac behind ANTLR and has been working on language tools since 1989.


slide-6
SLIDE 6

ANTLR: Another Tool for Language Recognition

A Computer Science Book With Humor!

From the Definitive ANTLR Reference preface (Parr)...

My office mate was an astrophysicist named Kevin, who told me on multiple occasions that only physicists do real work and that programmers merely support physicists. Because all I do is build language tools to support programmers, I am at least two levels of indirection away from doing anything useful.

and Parr’s guiding principle...

Why program by hand in five days what you can spend five years of your life automating?

slide-7
SLIDE 7

ANTLR: Another Tool for Language Recognition

Why Use ANTLR?

You are learning evaluators and interpreters (REPL)
You are building a compiler for a new language (!)
You need to translate XML into JSON (!)
You need to translate English into Finnish (???)

The first step of any text-processing system is recognizing the input. Instead of writing a recognizer, or parser, by hand, let a parser generator do the heavy lifting. ANTLR is such a tool.

slide-8
SLIDE 8

ANTLR: Another Tool for Language Recognition

ANTLR, Other Resources

Before I forget or run out of time...

The Definitive ANTLR Reference (from which our Expr grammar was adapted).
Enforcing Strict Model-View Separation in Template Engines (Parr).
www.antlr.org.
Detours: Binary Interception of Win32 Functions (Hunt, Brubacher).
Java code for Expr parsers presented here available in a code bundle.
For release: github.com/UW-APL-EIS/wicajo

slide-9
SLIDE 9

Building With ANTLR

ANTLR In Action

Create a grammar file T.g for some language T. Then run ANTLR. It produces Java source code for parsing language T:

    $ java -cp /path/to/antlr-3.0.jar org.antlr.Tool T.g
    $ ls
    T.g TLexer.java TParser.java

Then build a test rig in e.g. RunT.java, which uses the ANTLR-generated TParser class. The current directory is on the class path so that javac can find the generated sources:

    $ javac -cp /path/to/antlr-3.0.jar:. RunT.java
    $ java -cp /path/to/antlr-3.0.jar:. RunT

slide-10
SLIDE 10

ANTLR And Maven

ANTLR and Maven

Maven and ANTLR play well together. Just reference the ANTLR plugin and runtime:

    $ cat pom.xml
    <dependency>
      <groupId>org.antlr</groupId>
      <artifactId>antlr-runtime</artifactId>
      <version>3.4</version>
    </dependency>
    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr3-maven-plugin</artifactId>
      <version>3.4</version>
    </plugin>

slide-11
SLIDE 11

ANTLR And Maven

Maven’s Standardized FileSystem

    $ ls
    pom.xml
    src/main/antlr3/T.g
    src/main/java/RunT.java
    $ mvn compile
    target/generated-sources/antlr3/TLexer.java
    target/generated-sources/antlr3/TParser.java
    target/classes/*

ANTLR’s parser generator binds to the generate-sources lifecycle phase, so the parser sources exist before the compile phase runs.

slide-12
SLIDE 12

A Toy Expression Language

A Toy Expression Language

Define a mini/toy programming language. It will contain identifiers, numbers, assignments. In our language, statements end in ';'. Valid 'programs' in this language include:

    a = 45;
    copyOfA = a;

To make things more interesting, we'll add four binary operators '+', '-', '*', '/', and parenthesized expressions, e.g. '( a + 4 )'. Some more valid programs:

    a = 45;
    sum = 4;
    someCalc = 45 + (4 - 3) / ( 3 + (5 + 2) * b);
    myVar = c + a * b;

How then to write a program to recognize all programs in this language?

slide-13
SLIDE 13

A Toy Expression Language

A Toy Language Expressed in ANTLR (13 Lines)

    $ cat Expr.g
    grammar Expr;

    // Lexer part, three token types: whitespace, numbers, identifiers
    WS  : (' '|'\t'|'\n'|'\r')+ { skip(); } ;
    INT : '0' .. '9'+ ;
    ID  : ('a' .. 'z' | 'A' .. 'Z')+ ;

    // Parser part, five rules
    prog     : stat+ ;
    stat     : expr ';' | ID '=' expr ';' | ';' ;
    expr     : multExpr (( '+' | '-' ) multExpr)* ;
    multExpr : atom (( '*' | '/' ) atom)* ;
    atom     : INT | ID | '(' expr ')' ;
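To make the lexer part concrete, here is a hand-written Java sketch of what the three lexer rules describe. It is not ANTLR's generated code; the class name `ExprLexSketch`, the `Tok` type and the extra `SYM` kind for literals such as '=' and ';' are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of the three Expr lexer rules; this is not ANTLR output. */
final class ExprLexSketch {
    /** INT and ID mirror the grammar; SYM (an assumption) covers literals such as '=' and ';'. */
    enum Kind { INT, ID, SYM }

    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static List<Tok> tokenize(String in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int j = i;
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;                                   // WS rule: consumed and skipped
            } else if (isDigit(c)) {                   // INT rule: one or more digits
                while (j < in.length() && isDigit(in.charAt(j))) j++;
                out.add(new Tok(Kind.INT, in.substring(i, j)));
                i = j;
            } else if (isLetter(c)) {                  // ID rule: one or more letters
                while (j < in.length() && isLetter(in.charAt(j))) j++;
                out.add(new Tok(Kind.ID, in.substring(i, j)));
                i = j;
            } else if ("=;+-*/()".indexOf(c) >= 0) {   // literals used by the parser rules
                out.add(new Tok(Kind.SYM, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("copyOfA = a + 45;")) {
            System.out.println(t.kind + " " + t.text);
        }
    }
}
```

Even for three rules the hand-written version is several times longer than the grammar, which is the point of letting a generator write it.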

slide-21
SLIDE 21

A Toy Expression Language

ANTLR-Generated Java Code

Given Expr.g, ANTLR produces ExprLexer.java and ExprParser.java. Rules in the grammar become methods in the parser, making the generated parser a recursive-descent parser:

    public class ExprParser extends DebugParser {
        // $ANTLR start "prog"
        // expr/Expr.g:37:1: prog : ( stat )+ ;
        public final void prog() throws RecognitionException { ... }

        // $ANTLR start "stat"
        // expr/Expr.g:40:1: stat : ( expr ';' | ID '=' expr ';' | ';' );
        public final void stat() throws RecognitionException { ... }
    }
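The "one method per rule" idea can be sketched by hand. The recognizer below is an illustration only, not what ANTLR emits: it works directly on characters instead of a token stream, it stops at the first error instead of recovering, and the class name `ExprRecognizerSketch` and its helpers are assumptions.

```java
/** Hand-written recognizer for the Expr grammar: one method per rule. Not ANTLR output. */
final class ExprRecognizerSketch {
    private final String in;
    private int pos = 0;

    private ExprRecognizerSketch(String in) { this.in = in; }

    /** Returns true if the whole input is a valid Expr program. */
    static boolean accepts(String input) {
        try {
            new ExprRecognizerSketch(input).prog();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // prog : stat+ ;
    private void prog() {
        do { stat(); } while (peek() != '\0');
    }

    // stat : expr ';' | ID '=' expr ';' | ';' ;
    private void stat() {
        if (peek() == ';') { pos++; return; }
        int start = pos;
        if (isLetter(peek())) {               // try the assignment alternative first
            id();
            if (peek() == '=') { pos++; expr(); expect(';'); return; }
            pos = start;                      // not an assignment: rewind, parse as expr
        }
        expr();
        expect(';');
    }

    // expr : multExpr (('+' | '-') multExpr)* ;
    private void expr() {
        multExpr();
        while (peek() == '+' || peek() == '-') { pos++; multExpr(); }
    }

    // multExpr : atom (('*' | '/') atom)* ;
    private void multExpr() {
        atom();
        while (peek() == '*' || peek() == '/') { pos++; atom(); }
    }

    // atom : INT | ID | '(' expr ')' ;
    private void atom() {
        char c = peek();
        if (isDigit(c)) {
            while (pos < in.length() && isDigit(in.charAt(pos))) pos++;
        } else if (isLetter(c)) {
            id();
        } else if (c == '(') {
            pos++; expr(); expect(')');
        } else {
            throw new IllegalStateException("unexpected input at " + pos);
        }
    }

    /** Skips whitespace, then returns the next character, or '\0' at end of input. */
    private char peek() {
        while (pos < in.length() && " \t\n\r".indexOf(in.charAt(pos)) >= 0) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    private void id() {
        while (pos < in.length() && isLetter(in.charAt(pos))) pos++;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(accepts("myVar = c + a * b;"));
        System.out.println(accepts("1 = 2;"));
    }
}
```

Each grammar alternative maps to a branch and each `( ... )*` loop maps to a `while`, which is the same shape the generated `prog()` and `stat()` methods follow.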

slide-22
SLIDE 22

A Toy Expression Language

ANTLR-Generated Java Code II

Perform a quick line count:

    $ wc -l src/main/antlr3/Expr.g
    13
    $ wc -l target/generated-sources/antlr3/*.java
    781 ExprLexer.java
    647 ExprParser.java

We wrote 13 lines, and ANTLR wrote 1428. That’s my kind of job-share!

slide-23
SLIDE 23

A Toy Expression Language

Alternative Output — Python

As well as Java (the default), ANTLR can produce parsers in other target languages! Thus your evaluator/compiler/translator could be in e.g. Python or C:

    $ cat ExprPy.g
    grammar ExprPy;
    options {
        language=Python;
    }
    // rest of grammar identical to original

    $ ls
    target/generated-sources/antlr3/ExprPyLexer.py
    target/generated-sources/antlr3/ExprPyParser.py

slide-24
SLIDE 24

A Toy Expression Language

Alternative Output — C

    $ cat ExprC.g
    grammar ExprC;
    options {
        language=C;
    }

    $ ls
    target/generated-sources/antlr3/ExprCLexer.[ch]
    target/generated-sources/antlr3/ExprCParser.[ch]

All done with a templating engine called StringTemplate. One template for each output language. Core generator logic unchanged! Other languages too, see ANTLR docs.

slide-25
SLIDE 25

A Toy Expression Language

Testing The Expr Grammar

    import org.antlr.runtime.*;
    import org.antlr.runtime.debug.ParseTreeBuilder;

    public class ExprRunner {
        static void parse( String input ) throws RecognitionException {
            // Listener that records a parse tree as rules are entered and exited
            ParseTreeBuilder ptb = new ParseTreeBuilder( "prog" );
            // Pipeline: characters -> lexer -> tokens -> parser
            CharStream cs = new ANTLRStringStream( input );
            Lexer lex = new ExprLexer( cs );
            TokenStream tokens = new CommonTokenStream( lex );
            ExprParser parser = new ExprParser( tokens, ptb );
            parser.prog();   // start rule
            System.out.println( ptb.getTree().toStringTree() );
        }
    }

slide-32
SLIDE 32

A Toy Expression Language

ExprRunner In Action

    $ java -cp myJar:antlrJar ExprRunner
    1;
    (<grammar prog> (prog (stat (expr (multExpr (atom 1))) ;)))
    pi = 3;
    rad = 89;
    dia = 2 * rad;
    x = a * (5 - (3 / 2 - 6 * z) + 27);
    1 = 2;
    a = ;
    b = 7;

Running the code, we note ANTLR’s error handling: it reports errors and continues parsing. The test rig works, but nothing really happens. We want a calculator!

slide-35
SLIDE 35

Evaluating A Program With Embedded Actions

The Expr Grammar With Embedded Actions I

For the generated parser to do something, we add actions. These go right in the grammar file:

    $ cat ExprActions.g
    @parser::header {
        import java.util.HashMap;
        import java.util.Map;
    }
    @members {
        Map<String,Integer> memory = new HashMap<>();
    }
    prog: stat+ ;

slide-36
SLIDE 36

Evaluating A Program With Embedded Actions

The Expr Grammar With Embedded Actions II

    stat: expr ';'        { System.out.println( $expr.value ); }
        | ID '=' expr ';' { memory.put( $ID.text, $expr.value ); }
        | ';'
        ;

    expr returns [int value]
        : e=multExpr       { $value = $e.value; }
          ( '+' e=multExpr { $value += $e.value; }
          | '-' e=multExpr { $value -= $e.value; }
          )*
        ;

slide-37
SLIDE 37

Evaluating A Program With Embedded Actions

The Expr Grammar With Embedded Actions III

    multExpr returns [int value]
        : e=atom       { $value = $e.value; }
          ( '*' e=atom { $value *= $e.value; }
          | '/' e=atom { $value /= $e.value; }
          )*
        ;

    atom returns [int value]
        : INT          { $value = Integer.parseInt( $INT.text ); }
        | ID           { Integer v = memory.get( $ID.text );
                         if( v == null ) {
                             // Report the unknown identifier; $value keeps its default of 0
                             System.err.println( "undefined variable " + $ID.text );
                         } else
                             $value = v;
                       }
        | '(' expr ')' { $value = $expr.value; }
        ;
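What the embedded actions compute can be mirrored in a hand-written evaluator, shown here as a rough sketch rather than as ANTLR's generated code. Each rule method returns the `int` that the grammar's `returns [int value]` clause declares, and a map plays the role of `memory`. The class name `ExprEvalSketch`, the `run` method and the choice to throw on an undefined variable are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hand-written evaluator mirroring the embedded actions; not ANTLR output. */
final class ExprEvalSketch {
    private final String in;
    private int pos = 0;
    private final Map<String, Integer> memory = new HashMap<>();  // plays the role of @members
    private final List<Integer> printed = new ArrayList<>();      // collected instead of println

    private ExprEvalSketch(String in) { this.in = in; }

    /** Evaluates a program and returns the values of its expression statements, in order. */
    static List<Integer> run(String program) {
        ExprEvalSketch e = new ExprEvalSketch(program);
        do { e.stat(); } while (e.peek() != '\0');
        return e.printed;
    }

    private void stat() {
        if (peek() == ';') { pos++; return; }
        int start = pos;
        if (isLetter(peek())) {
            String name = word();
            if (peek() == '=') {                 // ID '=' expr ';'  -> memory.put
                pos++;
                memory.put(name, expr());
                expect(';');
                return;
            }
            pos = start;                         // plain expression that starts with an ID
        }
        printed.add(expr());                     // expr ';'  -> print the value
        expect(';');
    }

    private int expr() {
        int value = multExpr();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int rhs = multExpr();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    private int multExpr() {
        int value = atom();
        while (peek() == '*' || peek() == '/') {
            char op = in.charAt(pos++);
            int rhs = atom();
            value = (op == '*') ? value * rhs : value / rhs;   // integer division
        }
        return value;
    }

    private int atom() {
        char c = peek();
        if (c == '(') { pos++; int v = expr(); expect(')'); return v; }
        if (c >= '0' && c <= '9') {
            int s = pos;
            while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') pos++;
            return Integer.parseInt(in.substring(s, pos));
        }
        if (isLetter(c)) {
            String name = word();
            Integer v = memory.get(name);
            if (v == null) throw new IllegalStateException("undefined variable " + name);
            return v;
        }
        throw new IllegalStateException("unexpected input at " + pos);
    }

    private String word() {
        int s = pos;
        while (pos < in.length() && isLetter(in.charAt(pos))) pos++;
        return in.substring(s, pos);
    }

    private char peek() {
        while (pos < in.length() && " \t\n\r".indexOf(in.charAt(pos)) >= 0) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(run("a = 5; b = 4 * a; a; b;"));
    }
}
```

The value flows upward through return values exactly as `$value` does in the grammar, so the parsing logic and the evaluation logic end up interleaved in both versions.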

slide-38
SLIDE 38

Evaluating A Program With Embedded Actions

Testing The Expr Grammar With Embedded Actions

    import org.antlr.runtime.*;
    import org.antlr.runtime.debug.ParseTreeBuilder;

    public class ExprWithActionsRunner {
        static void parse( String input ) throws RecognitionException {
            CharStream cs = new ANTLRStringStream( input );
            Lexer lex = new ExprActionsLexer( cs );
            TokenStream tokens = new CommonTokenStream( lex );
            ParseTreeBuilder ptb = new ParseTreeBuilder( "prog" );
            ExprActionsParser parser = new ExprActionsParser( tokens, ptb );
            parser.prog();   // the embedded actions run as the input is parsed
        }
    }

Almost the same as before: only the lexer and parser class names differ. All the actions are in the ANTLR-generated code.

slide-39
SLIDE 39

Evaluating A Program With Embedded Actions

ExprActionsRunner In Action I

    $ java -cp myJar:antlrJar ExprActionsRunner
    > a = 5; b = 4 * a; a; b;
    5
    20
    > l geometry.exp
    2
    4
    8
    12
    4

Run the demo for a clearer picture!

slide-40
SLIDE 40

Evaluating A Program With Embedded Actions

ExprActionsRunner In Action II

    $ cat circle.exp
    pi = 3;
    rad = 4;
    dia = 2 * rad;
    area = pi * rad * rad;
    vol = 4 / 3 * pi * rad * rad * rad;
    pi; rad; dia; area; vol;

    $ java -cp myJar:antlrJar ExprActionsRunner circle.exp
    3
    4
    8
    48
    192
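The `vol` result of 192 follows from the grammar's `int` values and the left-to-right loop in `multExpr`: `4 / 3` truncates to 1 before any multiplication happens. A small Java sketch of the same arithmetic; the class and method names are made up for illustration, and the reordered variant is hypothetical rather than part of the talk.

```java
/** Worked arithmetic for the vol line of circle.exp, using Java int semantics. */
final class IntDivisionSketch {
    /** Same operand order as circle.exp: 4 / 3 truncates to 1 first. */
    static int volAsWritten(int pi, int rad) {
        return 4 / 3 * pi * rad * rad * rad;
    }

    /** Hypothetical reordering: multiply first, divide last. */
    static int volDivideLast(int pi, int rad) {
        return 4 * pi * rad * rad * rad / 3;
    }

    public static void main(String[] args) {
        System.out.println(volAsWritten(3, 4));   // 192, as in the demo output
        System.out.println(volDivideLast(3, 4));  // 256
    }
}
```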

slide-41
SLIDE 41

Representing Programs As Trees

Tree Generation

The embedded actions in the previous example can only go so far. For any moderately complex input, e.g. programming language source code, evaluating the input as you read it is infeasible. An intermediate form called an abstract syntax tree is needed. Great time to learn recursion. ANTLR produces these trees automagically!


slide-42
SLIDE 42

Representing Programs As Trees

The Expr Grammar With Tree Construction I

    $ cat ExprTree.g
    grammar ExprTree;

    options { output=AST; }

    tokens {
        // Dummy tokens needed for source-source translations
        PROG; STAT; PARENS;
    }

    prog: stat+ -> ^(PROG stat+) ;

slide-45
SLIDE 45

Representing Programs As Trees

The Expr Grammar With Tree Construction II

Subtree generation for rules stat, expr and multExpr:

    // STAT dummy tokens at subtree roots. Can thus discard the ';'
    stat: expr ';'          -> ^(STAT expr)
        | ID '=' expr ';'   -> ^(STAT ID '=' expr)
        | ';'
        ;

    expr:     multExpr (( '+'^ | '-'^ ) multExpr)* ;

    multExpr: atom (( '*'^ | '/'^ ) atom)* ;

slide-49
SLIDE 49

Representing Programs As Trees

The Expr Grammar With Tree Construction III

Subtree generation for atoms:

    atom: INT
        | ID
        /* Discard any parenthesis source token, but root the new
           subtree with a PARENS dummy token (in our case) */
        | '(' expr ')' -> ^(PARENS expr)
        ;
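The tree shapes these rewrite rules describe can be sketched with a small hand-written builder. This is an illustration of the resulting structure, not ANTLR's tree construction code: the `Node` class, the `ExprAstSketch` name and the decision to drop a bare ';' statement are assumptions. Operators become subtree roots (as `'+'^` requests), while `PROG`, `STAT` and `PARENS` are the dummy roots.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written AST builder following the ExprTree rewrite rules; not ANTLR output. */
final class ExprAstSketch {
    /** Minimal tree node: a label plus ordered children. */
    static final class Node {
        final String text;
        final List<Node> kids = new ArrayList<>();
        Node(String text, Node... children) {
            this.text = text;
            for (Node c : children) kids.add(c);
        }
        /** LISP-style rendering: a leaf prints as its text, otherwise (root child child ...). */
        String toStringTree() {
            if (kids.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node k : kids) sb.append(' ').append(k.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final String in;
    private int pos = 0;

    private ExprAstSketch(String in) { this.in = in; }

    static Node parse(String program) {
        ExprAstSketch p = new ExprAstSketch(program);
        Node prog = new Node("PROG");                 // ^(PROG stat+)
        do {
            Node s = p.stat();
            if (s != null) prog.kids.add(s);          // assumption: a bare ';' adds nothing
        } while (p.peek() != '\0');
        return prog;
    }

    private Node stat() {
        if (peek() == ';') { pos++; return null; }
        int start = pos;
        if (isLetter(peek())) {
            Node id = new Node(word());
            if (peek() == '=') {
                pos++;
                Node rhs = expr();
                expect(';');
                return new Node("STAT", id, new Node("="), rhs);   // ^(STAT ID '=' expr)
            }
            pos = start;
        }
        Node e = expr();
        expect(';');
        return new Node("STAT", e);                                // ^(STAT expr)
    }

    private Node expr() {
        Node left = multExpr();
        while (peek() == '+' || peek() == '-') {
            String op = String.valueOf(in.charAt(pos++));
            left = new Node(op, left, multExpr());    // operator becomes the new root
        }
        return left;
    }

    private Node multExpr() {
        Node left = atom();
        while (peek() == '*' || peek() == '/') {
            String op = String.valueOf(in.charAt(pos++));
            left = new Node(op, left, atom());
        }
        return left;
    }

    private Node atom() {
        char c = peek();
        if (c == '(') {
            pos++;
            Node inner = expr();
            expect(')');
            return new Node("PARENS", inner);         // ^(PARENS expr)
        }
        if (isLetter(c)) return new Node(word());
        if (c >= '0' && c <= '9') {
            int s = pos;
            while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') pos++;
            return new Node(in.substring(s, pos));
        }
        throw new IllegalStateException("unexpected input at " + pos);
    }

    private String word() {
        int s = pos;
        while (pos < in.length() && isLetter(in.charAt(pos))) pos++;
        return in.substring(s, pos);
    }

    private char peek() {
        while (pos < in.length() && " \t\n\r".indexOf(in.charAt(pos)) >= 0) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(parse("a = 5; b = 4 * a;").toStringTree());
    }
}
```

Keeping a `PARENS` node, rather than discarding parentheses entirely, is what lets a later pass emit the program back out as source with the original grouping.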

slide-50
SLIDE 50

Representing Programs As Trees

Testing The Expr Grammar With Tree Construction

    import org.antlr.runtime.*;
    import org.antlr.runtime.debug.ParseTreeBuilder;
    import org.antlr.runtime.tree.*;

    public class ExprWithTreesRunner {
        static void parse( String input ) throws RecognitionException {
            CharStream cs = new ANTLRStringStream( input );
            Lexer lex = new ExprTreeLexer( cs );
            TokenStream tokens = new CommonTokenStream( lex );
            ParseTreeBuilder ptb = new ParseTreeBuilder( "prog" );
            ExprTreeParser parser = new ExprTreeParser( tokens, ptb );
            // With output=AST, each rule returns an object carrying its subtree
            ExprTreeParser.prog_return r = parser.prog();
            Tree t = (Tree)r.getTree();
            process( t );   // process() is not shown on the slide
        }
    }

slide-51
SLIDE 51

Representing Programs As Trees

ExprTreeRunner In Action

    $ java -cp myJar:antlrJar ExprTreeRunner
    > a = 5; b = 4 * a;
    [1]
    > foo = bar + 3 * baz;
    [2]
    > d 2                 display program 2 (via dot,png)
    > e 2                 emit program 2 back out as source
    > load circle.exp     load a program file
    [3]
    > ps                  list loaded programs
    > w                   emit all loaded programs as source

slide-57
SLIDE 57

Tree Visualizations

Tree Visualization – DOT

Graphviz contains a tool called dot, which takes files in the dot format and can produce graphics, e.g. PNGs. The ANTLR runtime includes a class to convert a Tree into a dot file:

    DOTTreeGenerator dtg = new DOTTreeGenerator();
    StringTemplate st = dtg.toDOT( someTree );
    File dotFile = new File( "someTree.dot" );
    // try-with-resources closes the writer, so the output is flushed to disk
    try( PrintWriter pw = new PrintWriter( new FileWriter( dotFile ) ) ) {
        pw.println( st );
    }

    # apt-get install graphviz
    $ dot someTree.dot -Tpng > someTree.png
    $ display someTree.png

See www.graphviz.org/content/dot-language
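For a feel of what such a dot file contains, here is a self-contained sketch that walks a tiny tree and emits one node line per tree node and one edge line per parent/child pair. It does not reproduce the exact output of ANTLR's DOTTreeGenerator; the `Node` class, the `n0`, `n1`, ... naming scheme and the class name `DotSketch` are assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Emits Graphviz dot text for a small tree; a sketch, not ANTLR's DOTTreeGenerator. */
final class DotSketch {
    static final class Node {
        final String text;
        final List<Node> kids;
        Node(String text, Node... kids) {
            this.text = text;
            this.kids = Arrays.asList(kids);
        }
    }

    static String toDot(Node root) {
        StringBuilder sb = new StringBuilder("digraph tree {\n");
        emit(root, new int[] { 0 }, sb);
        return sb.append("}\n").toString();
    }

    /** Writes the node and its subtree in preorder; returns the id given to the node. */
    private static int emit(Node n, int[] counter, StringBuilder sb) {
        int id = counter[0]++;
        sb.append("  n").append(id)
          .append(" [label=\"").append(escape(n.text)).append("\"];\n");
        for (Node k : n.kids) {
            int kid = emit(k, counter, sb);
            sb.append("  n").append(id).append(" -> n").append(kid).append(";\n");
        }
        return id;
    }

    /** Escapes backslashes and double quotes so the label stays a valid dot string. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Node tree = new Node("*", new Node("4"), new Node("a"));
        System.out.print(toDot(tree));
    }
}
```

The text printed by `main` can be saved to a file and passed to the same `dot ... -Tpng` command shown above.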

slide-58
SLIDE 58

ANTLR Runtime API, Tree Manipulations

ANTLR-Derived Tree For The Circle Expr Program: Tree-to-Dot-to-PNG

Green-colored nodes are tokens from the input stream:


slide-59
SLIDE 59

ANTLR Runtime API, Tree Manipulations

ANTLR Runtime API — Trees

ANTLR grammars of the output=AST variety produce tree objects. We can then manipulate those trees, and have fun with recursion:

    package org.antlr.runtime.tree;

    public interface Tree {
        void addChild( Tree t );
        void deleteChild( int index );
        int getChildCount();
        Tree getChild( int indx );
        Tree locateFirstChild( int type );
        int getType();
        String getText();
        void replaceChildren( int i, int j, Tree t );
        Tree dupNode();
    }
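The "fun with recursion" usually takes the shape of a method that handles one node and then calls itself on each child. The sketch below shows three such walks over a stand-in `Node` class, which is an assumption made so the example runs without the ANTLR jar; with the real runtime the same loops would use `getChildCount()` and `getChild(i)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Recursive walks over a minimal stand-in for a tree; not the ANTLR Tree interface itself. */
final class TreeWalkSketch {
    static final class Node {
        final String text;
        final List<Node> kids;
        Node(String text, Node... kids) {
            this.text = text;
            this.kids = Arrays.asList(kids);
        }
    }

    /** Number of nodes in the subtree rooted at n. */
    static int size(Node n) {
        int total = 1;
        for (Node k : n.kids) total += size(k);
        return total;
    }

    /** Number of levels in the subtree rooted at n; a single leaf has depth 1. */
    static int depth(Node n) {
        int deepest = 0;
        for (Node k : n.kids) deepest = Math.max(deepest, depth(k));
        return deepest + 1;
    }

    /** Texts of the leaves, left to right. */
    static List<String> leaves(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        if (n.kids.isEmpty()) { out.add(n.text); return; }
        for (Node k : n.kids) collect(k, out);
    }

    public static void main(String[] args) {
        // Tree for 4 * (a + 1), with the operators as subtree roots
        Node t = new Node("*", new Node("4"), new Node("+", new Node("a"), new Node("1")));
        System.out.println(size(t) + " nodes, depth " + depth(t) + ", leaves " + leaves(t));
    }
}
```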

slide-60
SLIDE 60

ANTLR Runtime API, Tree Manipulations

ANTLR Runtime API — Tokens

Character sequences from the input are captured in tokens. Each tree node holds a token, which has a type (NUMBER, IDENTIFIER, etc) as well as its text:

    package org.antlr.runtime;

    public interface Token {
        int getType();
        String getText();
        void setText( String s );   // !
        int getTokenIndex();
        int getLine();
    }

slide-61
SLIDE 61

ANTLR Runtime API, Tree Manipulations

Tree Mutations — Some Fun With ExprTreeRunner

    > load circle.exp
    [1]
    > load geometry.exp
    [2]
    > ps
    > e 2                 see what we have
    > ri 3 12             replace any number 3 with 12
    > rid height seajug   rename an ID
    > rp 5                replace any parened expression with 5 (sed?)
    > slr                 swap leftmost, rightmost
    > e 2                 see what we have now
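The demo's implementation is not shown on the slides, so the following is only a guess at how commands like `ri` and `rid` could work: a recursive pass that rewrites the text of every node with a given type and text. The `Node` class, the `INT`/`ID`/`OP` constants and the `replaceText` method are all hypothetical; with the ANTLR runtime the rewrite would go through `Token.setText`.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "replace number" and "rename ID" style tree mutations; names are hypothetical. */
final class TreeMutationSketch {
    static final int INT = 1, ID = 2, OP = 3;   // hypothetical token types

    static final class Node {
        final int type;
        String text;                             // mutable, like Token.setText
        final List<Node> kids = new ArrayList<>();
        Node(int type, String text, Node... children) {
            this.type = type;
            this.text = text;
            for (Node c : children) kids.add(c);
        }
        String toStringTree() {
            if (kids.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node k : kids) sb.append(' ').append(k.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** Rewrites every node of the given type whose text equals from; returns the hit count. */
    static int replaceText(Node n, int type, String from, String to) {
        int hits = 0;
        if (n.type == type && n.text.equals(from)) {
            n.text = to;
            hits++;
        }
        for (Node k : n.kids) hits += replaceText(k, type, from, to);
        return hits;
    }

    public static void main(String[] args) {
        // Tree for 3 * rad * rad
        Node t = new Node(OP, "*",
                new Node(OP, "*", new Node(INT, "3"), new Node(ID, "rad")),
                new Node(ID, "rad"));
        replaceText(t, INT, "3", "12");          // in the spirit of "ri 3 12"
        replaceText(t, ID, "rad", "seajug");     // in the spirit of "rid rad seajug"
        System.out.println(t.toStringTree());
    }
}
```

Matching on the node type as well as the text is what distinguishes this from a textual search and replace: an identifier that happens to contain "3" would be left alone.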

slide-69
SLIDE 69

ANTLR Runtime API, Tree Manipulations

ANTLR Versions

ANTLR constructs shown here apply to version 3 (quite old now). Other ANTLR 3 features are tree grammars and text generation via templates. ANTLR 4 is current version. Tree grammars (even ASTs?) deprecated in favor of parse tree listeners (??) I still use v3 since the C grammar I started with was a v3 document (C.g). New users will go with v4.

slide-78
SLIDE 78

Manipulating C Code Using ANTLR Trees

The C Programmer’s Interview

So far, we have seen that we can manipulate programs in the simple Expr language using ANTLR trees. If it can be done for one language, why not others? Like C:

signal
signal( )
signal( , )
signal(int sig, )
signal(int sig, void (*H)(int) )
void (*signal(int sig, void (*H)(int) ))(int)
void (*signal(int sig, void (*H)(int) ))(int);

What about the tree a C compiler would build when parsing that code? Visualize that too?
Moral? The grammar for C is way more complex than that of Expr!

slide-79
SLIDE 79

Manipulating C Code Using ANTLR Trees

ANTLR-Derived Tree For Signal: Tree-to-Dot-to-PNG

Green-colored nodes in the tree diagram would produce output in any source-to-source translation.

slide-80
SLIDE 80

Manipulating C Code Using ANTLR Trees

Windows Program Execution

[Diagram: MyApp.exe makes calls such as CreateFile( args ); and RegDeleteKey( args ); into the system DLLs that sit between it and the hardware: kernel32.dll (900+; CreateFile, DeleteFile), advapi32.dll (700+; RegDeleteKey), ws2.dll (100+; connect, listen).]

slide-86
SLIDE 86

Automating Code Generation For Program Analysis Via API Hooking

API Hooking Permits Program Monitoring

[Diagram: Unknown.exe executes x = winFunc(a,b,c); the call into the Windows API function winFunc is redirected to winFuncHook in our Hooks module, which passes a,b,c and then x to Logging.]

WinAPI CALL made.
Hooked function JUMPs to our installed hook.
The hook logs the original parameters.
The hook CALLs the real function (skipping over the JUMP).
The hook logs the real function’s result.
The hook RETURNs. Due to the CALL+JUMP+RETURN, the instruction pointer is now back at the original call site.
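The control flow above can be modeled in plain Java. This is only an analogy under stated assumptions: real API hooking patches machine code, whereas here the "JUMP" is a swappable function reference. `WinFunc`, `realWinFunc` and `installHook` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class HookFlowSketch {
    // Hypothetical shape of the API function being monitored.
    interface WinFunc { int call(int a, int b, int c); }

    // Where the hook records what it sees.
    static final List<String> log = new ArrayList<>();

    // Stand-in for the real Windows API function.
    static int realWinFunc(int a, int b, int c) { return a + b + c; }

    // The slot all callers go through; installing a hook rewrites this slot.
    static WinFunc winFunc = HookFlowSketch::realWinFunc;

    static void installHook() {
        final WinFunc real = winFunc;      // remember how to reach the real function
        winFunc = (a, b, c) -> {
            log.add("args " + a + "," + b + "," + c);  // log the original parameters
            int x = real.call(a, b, c);                // call the real function
            log.add("result " + x);                    // log its result
            return x;                                  // back to the call site
        };
    }

    public static void main(String[] args) {
        installHook();
        int x = winFunc.call(1, 2, 3);     // the caller is unaware of the hook
        System.out.println(x + " " + log); // 6 [args 1,2,3, result 6]
    }
}
```

The caller's code is unchanged by the hook, which is the property that makes the technique useful for monitoring unknown programs.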

slide-87
SLIDE 87

Automating Code Generation For Program Analysis Via API Hooking

API Hooking Problem Statement I

Want to monitor all calls to some Windows function, say CreateFileA, taking note of the file name, access mode, etc. passed in. Given this API, from windows.h:

HANDLE WINAPI CreateFileA(
  _In_ LPCTSTR lpFileName,
  _In_ DWORD dwDesiredAccess,
  _In_ DWORD dwShareMode,
  _In_opt_ LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  _In_ DWORD dwCreationDisposition,
  _In_ DWORD dwFlagsAndAttributes,
  _In_opt_ HANDLE hTemplateFile
);

slide-88
SLIDE 88

Automating Code Generation For Program Analysis Via API Hooking

API Hooking Problem Statement II

to hook that call, we have to write this new code:

HANDLE (WINAPI * CreateFileA_VAR)( LPCTSTR lpFileName, otherArgs ) = CreateFileA;

HANDLE WINAPI CreateFileA_HOOK( LPCTSTR lpFileName, otherArgs ) {
  LOG( "In CreateFileA", lpFileName, otherArgs );
  HANDLE result = CreateFileA_VAR( lpFileName, otherArgs );
  LOG( "Result is X" );
  return result;
}

DetourAttach( &CreateFileA_VAR, CreateFileA_HOOK ); // Only Detours call

Oh, and the same for the other 2000 functions in the Windows API!

slide-95
SLIDE 95

Automating Code Generation For Program Analysis Via API Hooking

API Hooking Solution, Partially At Least

Adapt ANTLR’s C.g grammar to do tree construction (like ExprTree.g).
Load windows/*.h to produce (monster) trees.
Via ANTLR’s Tree API, mutate those trees to produce the new C code we need.
Solve the LOG signature problem.

slide-96
SLIDE 96

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — Preparation

How to gather all the functions describing the Windows API? Do what any C programmer would do: inspect the header files.
Running the preprocessor on a one-line C program will deliver tons:

windows9> type grabFuncDecls.c
#include <windows.h>
windows9> cl /P /C grabFuncDecls.c

Now take the preprocessed output (grabFuncDecls.i) to an ANTLR C parser, read it in and transform it via tree manipulations!

slide-97
SLIDE 97

Automating Code Generation For Program Analysis Via API Hooking

Windows C as Java Objects (WICAJO)

Idea: Use ANTLR to convert C function declarations from Windows C header files into Java objects, specifically ANTLR trees.
Mutate those trees as needed to compose new C code with functions which are able to monitor and log program behavior.
Compile the new functions and inject them into other programs using API hooking technologies, e.g. Microsoft Detours.
Collect the logs to infer program execution patterns.
Also applicable to e.g. Linux, but not sure if a Detours equivalent exists?

slide-98
SLIDE 98

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — WICAJO Shell

As per our interactive ExprTree runner, only applied to C programs, not Expr programs:

$ wicajosh -d grabFuncDecls.i     -d = no dot files, too many funcs?
> fs        list loaded functions
> ts        list loaded typedefs
> df F      display tree for func F
> dt T      display tree for typedef T
> pf F      a function pointer for F
> rv F      a return variable for F
> rvr F     a resolved return variable for F
> pd N F    info on Nth argument to F
> pdr N F   resolved info on Nth argument to F

slide-99
SLIDE 99

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — WICAJO API I

The WICAJO shell makes use of the WICAJO API, Java classes representing C function declarations. The API, with example return values for the CreateFileA Windows function:

public class FunctionDeclaration {
  FunctionDeclaration( org.antlr.runtime.tree.Tree t );
  String pointer( String s );
    -> "HANDLE (WINAPI * CreateFileA" + s + ") (LPCTSTR lpFileName, otherArgs )"
  String text( String s );
    -> "HANDLE WINAPI " + s + " (LPCTSTR lpFileName, otherArgs )"
  String result( String s );
    -> "HANDLE " + s
  String args();
    -> "lpFileName, otherArgs"
}
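To show how these four methods combine into hook code, here is a minimal sketch, not the WICAJO implementation. It assumes the declaration's pieces are supplied directly as strings instead of being extracted from an ANTLR tree. `FunctionDeclarationSketch`, `param` and `hookSource` are hypothetical names, `LOG` is the placeholder from the problem statement, and only two of CreateFileA's parameters are used for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: fields are filled in by hand, not from an ANTLR tree.
class FunctionDeclarationSketch {
    final String returnType, convention, name;
    final List<String> types = new ArrayList<>();
    final List<String> names = new ArrayList<>();

    FunctionDeclarationSketch(String returnType, String convention, String name) {
        this.returnType = returnType;
        this.convention = convention;
        this.name = name;
    }

    // Add one parameter; returns this so calls can be chained.
    FunctionDeclarationSketch param(String type, String pname) {
        types.add(type);
        names.add(pname);
        return this;
    }

    // "type name, type name, ..." as it appears in a declaration.
    private String paramList() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < types.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(types.get(i)).append(' ').append(names.get(i));
        }
        return sb.toString();
    }

    // Function pointer variable declaration, name suffixed with s.
    String pointer(String s) {
        return returnType + " (" + convention + " * " + name + s + ") (" + paramList() + ")";
    }

    // Function header with the function renamed to s.
    String text(String s) {
        return returnType + " " + convention + " " + s + " (" + paramList() + ")";
    }

    // Declaration of a variable s able to hold the return value.
    String result(String s) { return returnType + " " + s; }

    // Argument names only, as used in a call expression.
    String args() { return String.join(", ", names); }

    // Compose the pointer variable and hook function as C text.
    String hookSource() {
        return pointer("_VAR") + " = " + name + ";\n"
             + text(name + "_HOOK") + " {\n"
             + "  LOG( \"In " + name + "\", " + args() + " );\n"
             + "  " + result("result") + " = " + name + "_VAR( " + args() + " );\n"
             + "  LOG( \"Result is X\" );\n"
             + "  return result;\n"
             + "}\n";
    }

    public static void main(String[] a) {
        FunctionDeclarationSketch f =
            new FunctionDeclarationSketch("HANDLE", "WINAPI", "CreateFileA")
                .param("LPCTSTR", "lpFileName")
                .param("DWORD", "dwDesiredAccess");
        System.out.print(f.hookSource());
    }
}
```

Running the same composition over every parsed declaration is what would turn roughly 2000 hand-written hooks into a loop.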

slide-100
SLIDE 100

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — WICAJO API II

We also need extraction of info from each function parameter:

public class ParameterDeclaration {
  ParameterDeclaration( org.antlr.runtime.tree.Tree t );
  String type()
    -> "LPCTSTR"
  String name()
    -> "lpFileName"
  String start()
    -> "lpFileName"
  String length()
    -> "strlen(lpFileName)"
}

The last two calls return the C code snippets we would pass to a logging call: where to log from (start) and how many bytes (length). Note how WICAJO has inferred, via (recursive!) typedef resolution, that lpFileName is really a string, though its given type was LPCTSTR.

slide-101
SLIDE 101

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — Typedefs, Arghhh!

Windows headers make frequent use of typedefs:

typedef char CHAR;
typedef CHAR* LPCSTR;
int someFunc( int a, LPCSTR b, float c );

In order to log argument b properly in the hook for someFunc, we’d have to resolve all argument types to their native C types. In the case above, the byte count should be strlen(b).
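Recursive typedef resolution of this kind can be sketched with a simple alias table. This is an illustration under assumptions, not WICAJO's code: types are handled as plain strings rather than ANTLR trees, qualifiers such as const are ignored, and the fallbacks for non-string types (`&name` and `sizeof(name)`) are guesses. `TypedefResolverSketch` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class TypedefResolverSketch {
    // alias -> underlying type, e.g. "LPCSTR" -> "CHAR*"
    private final Map<String, String> typedefs = new HashMap<>();

    // Record "typedef underlying alias;"
    void typedef(String underlying, String alias) {
        typedefs.put(alias, underlying);
    }

    // Expand the base name of a type until no typedef applies,
    // carrying along any pointer stars found on the way.
    String resolve(String type) {
        String t = type.trim();
        int star = t.indexOf('*');
        String base = (star < 0 ? t : t.substring(0, star)).trim();
        String stars = star < 0 ? "" : t.substring(star).replace(" ", "");
        String underlying = typedefs.get(base);
        if (underlying == null) return base + stars; // native type reached
        return resolve(underlying) + stars;          // recurse through the alias
    }

    // C snippet for where logging should start reading.
    // Assumption: strings are logged from the pointer, everything else by address.
    String start(String type, String name) {
        return resolve(type).equals("char*") ? name : "&" + name;
    }

    // C snippet for how many bytes to log.
    // Assumption: anything that is not a char* falls back to sizeof.
    String length(String type, String name) {
        return resolve(type).equals("char*")
                ? "strlen(" + name + ")"
                : "sizeof(" + name + ")";
    }

    public static void main(String[] args) {
        TypedefResolverSketch r = new TypedefResolverSketch();
        r.typedef("char", "CHAR");
        r.typedef("CHAR*", "LPCSTR");
        System.out.println(r.resolve("LPCSTR"));     // char*
        System.out.println(r.length("LPCSTR", "b")); // strlen(b)
        System.out.println(r.length("int", "a"));    // sizeof(a)
    }
}
```

The recursion matters because Windows aliases are often layered several levels deep before a native C type appears.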

slide-102
SLIDE 102

Automating Code Generation For Program Analysis Via API Hooking

C Code Manipulation — Typedefs, Done!

The typedef resolution of parameter declarations (and of any return value) can be visualized in the WICAJO interactive shell, e.g.

$ wicajosh
> l signal.c
> l signaltd.c
> ef 1       vanilla signal function, named signal1
> ef 2       signal defined via typedefs, named signal2
> pd 2 1     info on 2nd arg to signal1, OK
> pd 2 2     info on 2nd arg to signal2, how big?
> pdr 2 2    info on 2nd arg to signal2, now OK

See the earlier ParameterDeclaration class API for details.

slide-103
SLIDE 103

Conclusion

Conclusions

Parr’s guiding principle revisited...
Why program by hand in five days what you can spend five years of your life automating?
I have spent five-plus years dabbling with this automated generation of C code for the purposes of API hooking and the overall goal of black-box program analysis. I likely could have done it all by hand in three years (but certainly not in five days!).
Parsing C code? Just say no.
Seriously, ANTLR is an amazing tool and library and is great fun too.
