Source Code Manipulation Dr. Vadim Zaytsev aka @grammarware UvA, - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Source Code Manipulation

  • Dr. Vadim Zaytsev aka @grammarware

UvA, MSc SE, 30 November 2015

slide-2
SLIDE 2

Roadmap

W44 Introduction (V.Zaytsev)
W45 Metaprogramming (J.Vinju)
W46 Reverse Engineering (V.Zaytsev)
W47 Software Analytics (M.Bruntink)
W48 Clone Management (M.Bruntink)
W49 Source Code Manipulation (V.Zaytsev)
W50 Legacy and Renovation (TBA)
W51 Conclusion (V.Zaytsev)

slide-3
SLIDE 3

I

slide-4
SLIDE 4

Compiler

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, p. 300.

Figure: Lexical Analysis, Syntax Analysis, Intermediate Code Generation, Machine Code Generation, Interpretation

slide-5
SLIDE 5

Generated code

* Preferably * avoid any evolution * regenerate on sync * Possibly * bidirectional link * Properties: * correctness, speed, size, energy…

F.Ferreira, B.Pientka, Bidirectional Elaboration of Dependently Typed Programs, PPDP 2014.

slide-6
SLIDE 6

Supercompilation

* History * partial evaluation (1964, L.A.Lombardi & B.Raphael?) * supercompilation (1966, Valentin Turchin) * local simplification (1975-) * subgoal abstraction (1975) * symbolic execution (1976, James C. King) * mixed computation (1977, Andrei Ershov) * Futamura projections (1983, Yoshihiko Futamura) * abstract interpretation (1977, P. & R. Cousot) * . . .

slide-7
SLIDE 7

Supercompilation

* Given F(X,Y), find G(X) = F(X,z) * partial application (currying) * partial evaluation (residual) * Also covers: * lazy evaluation * theorem proving * problem solving

slide-8
SLIDE 8

Supercompilation

* map f $ map g xs
 
 * map (f . g) xs
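In a strict language such as C, the effect of this rewrite can be sketched as loop fusion: two traversals with an intermediate buffer become one traversal with no buffer. The functions `f` and `g` below are hypothetical placeholders; the fusion is only valid because they have no side effects whose order matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element functions standing in for the slide's f and g. */
static int g(int x) { return x + 1; }
static int f(int x) { return 2 * x; }

/* map f $ map g xs: two traversals and an intermediate buffer tmp. */
void map_then_map(const int *xs, int *tmp, int *out, size_t n) {
    assert(n == 0 || (xs != NULL && tmp != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) tmp[i] = g(xs[i]);
    for (size_t i = 0; i < n; i++) out[i] = f(tmp[i]);
}

/* map (f . g) xs: one traversal, no intermediate buffer. */
void map_composed(const int *xs, int *out, size_t n) {
    assert(n == 0 || (xs != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) out[i] = f(g(xs[i]));
}
```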

slide-9
SLIDE 9

Supercompilation

* let ones = 1:ones
 in map (\ x -> x + 1) ones
 
 * let twos = 2:twos
 in twos

slide-10
SLIDE 10

Supercompilation

* sum x = case x of
    []   -> 0
    x:xs -> x + sum xs
  range i n = case i>n of
    True  -> []
    False -> i:range (i+1) n
  main n = sum (range 0 n)
* main2 i n = if i>n
    then 0
    else i + main2 (i+1) n
  main n = main2 1 n
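A C transcription of the two programs above, as a sketch: the first version materialises the intermediate list in a heap array before summing it, while the residual `main2` accumulates the sum directly and allocates nothing. Starting `main2` at 1 rather than 0 is harmless because the only dropped element is 0. The names `range`, `sum_of_range` and `fused_main` are illustrative choices, not from the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* range i n from the slide: materialise the list [i..n] in a heap array. */
static int *range(int i, int n, size_t *len) {
    *len = (i > n) ? 0 : (size_t)((long)n - i) + 1;
    int *xs = malloc((*len > 0 ? *len : 1) * sizeof *xs);
    assert(xs != NULL);
    for (size_t k = 0; k < *len; k++) xs[k] = i + (int)k;
    return xs;
}

/* main n = sum (range 0 n): build the intermediate list, then fold it. */
long sum_of_range(int n) {
    size_t len;
    int *xs = range(0, n, &len);
    long total = 0;
    for (size_t k = 0; k < len; k++) total += xs[k];
    free(xs);
    return total;
}

/* main2 i n from the slide: no list is built; recursion depth grows with n,
   just as in the Haskell version. */
long main2(int i, int n) { return i > n ? 0 : i + main2(i + 1, n); }

/* main n = main2 1 n */
long fused_main(int n) { return main2(1, n); }
```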

slide-11
SLIDE 11

Generative SE

* Program generator * a program that produces programs * in a high-level language
 * Structured program generation * any generated program should type check * (it will be type-checked before running anyway) * (any error is a bug in the generator)

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-12
SLIDE 12

Everyone’s Doing It!

* sqlProg = "SELECT name FROM" + tableName + "WHERE id = " + id;
* sqlProg = new SelectStmt(
    new Column("name"),
    table,
    new WhereClause(new Column("id"),
                    id));

Yannis Smaragdakis, GTTSE 2015 Tutorial
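A small C sketch contrasting the two styles. Taken literally, the string pasting on the slide leaves out the spaces around the table name, which a structured representation avoids because the separators are written once, in the renderer. The `SelectStmt` struct here is a hypothetical flat stand-in for the slide's `SelectStmt`/`Column`/`WhereClause` objects, not any real library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* String pasting as on the slide: "SELECT name FROM" + tableName + "WHERE id = " + id. */
void paste_select(const char *table, int id, char *buf, size_t size) {
    char num[16];
    snprintf(num, sizeof num, "%d", id);
    /* sketch-level bounds check; disappears if NDEBUG is defined */
    assert(strlen("SELECT name FROM") + strlen(table) + strlen("WHERE id = ") + strlen(num) < size);
    strcpy(buf, "SELECT name FROM");
    strcat(buf, table);
    strcat(buf, "WHERE id = ");
    strcat(buf, num);
}

/* Hypothetical structured query: one field per syntactic slot. */
typedef struct {
    const char *column;
    const char *table;
    const char *where_column;
    int where_value;
} SelectStmt;

/* Render the structure to SQL text; clause separators live only here. */
void render_select(const SelectStmt *s, char *buf, size_t size) {
    assert(s != NULL && buf != NULL);
    int written = snprintf(buf, size, "SELECT %s FROM %s WHERE %s = %d",
                           s->column, s->table, s->where_column, s->where_value);
    assert(written >= 0 && (size_t)written < size);   /* output not truncated */
}
```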

slide-13
SLIDE 13

Everyone’s Doing It!

* template<int X, int Y>
  struct Adder
  { enum { result = X + Y }; };
* aspect S
  {
    declare parents:
      Car implements Serializable;
  }

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-14
SLIDE 14

Everyone’s Doing It!

* expr = `[7 + i]; 
 
 * stmt = `[
 if (i > 0) return #[expr];
 ];

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-15
SLIDE 15

Staging

* Scala, MetaML, MetaOCaml, … * Explicit delaying of computation * quote * unquote * run/eval

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-16
SLIDE 16

MetaOCaml

let even n = (n mod 2) = 0;;
let square x = x * x;;
let rec power n x =
  if n = 0 then 1
  else if even n then square (power (n/2) x)
  else x * (power (n-1) x)
;;
let power5 = fun x -> (power 5 x) ;;

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-17
SLIDE 17

MetaOCaml

let even n = (n mod 2) = 0;;
let square x = x * x;;
let rec powerS n x =
  if n = 0 then .<1>.
  else if even n then .<square .~(powerS (n/2) x)>.
  else .<.~x * .~(powerS (n-1) x)>.;;
let power5 = !. .<fun x -> .~(powerS 5 .<x>.)>.;;

Yannis Smaragdakis, GTTSE 2015 Tutorial
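To make the residual program concrete, here is a C sketch with the general `power` from the slides next to a hand-written `power5`, which is what the staged `powerS 5` should generate once the recursion on the static `n` is unfolded: only arithmetic on the dynamic `x` remains. Writing the residual code by hand in C is an illustration only; C has no quote/unquote, so nothing here is generated automatically.

```c
#include <assert.h>

static int square(int x) { return x * x; }

/* power n x from the slide: both arguments known only at run time. */
int power(int n, int x) {
    assert(n >= 0);
    if (n == 0) return 1;
    if (n % 2 == 0) return square(power(n / 2, x));
    return x * power(n - 1, x);
}

/* Residual program for n = 5, unfolded by hand:
     power 5 x = x * power 4 x
               = x * square (power 2 x)
               = x * square (square (power 1 x))
               = x * square (square (x * power 0 x))
   i.e. fun x -> x * square (square (x * 1)). */
int power5(int x) { return x * square(square(x * 1)); }
```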

slide-18
SLIDE 18

Scala

def powerS (n : Int, x : Rep[Int]) : Rep[Int] = {
  if (n == 0) 1
  else if (n % 2 == 0) {
    val result = powerS(n/2, x)
    result * result
  }
  else x * powerS(n-1, x)
}
def powerTest(x : Rep[Int]) : Rep[Int] = powerS(5, x)

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-19
SLIDE 19

Java + MorphJ

class LogMe<class X> extends X {
  <R,A*>[m] for ( public R m(A) : X.methods )
  public R m (A a) {
    R result = super.m(a);
    System.out.println(result);
    return result;
  }
}

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-20
SLIDE 20

Java + MorphJ

class Listify<Subj> {
  Subj ref;
  Listify(Subj s) {ref = s;}
  <R,A>[m] for (public R m(A): Subj.methods)
  public R m (List<A> a) {
    // … call m for all elements
  }
}

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-21
SLIDE 21

Java + SafeGen

#defgen MakeDelegator ( input(Class c) => !Abstract(c) ) {
  #foreach( Class c : input(c) ) {
    public class Delegator extends #[c] {
      #foreach(Method m : MethodOf(m, c) & !Private(m)) {
        #[m.Modifiers] #[m.Type] #[m] ( #[m.Formals] ) {
          return super.#[m](#[m.ArgNames]);
        }
      }
    }
  }
}

Yannis Smaragdakis, GTTSE 2015 Tutorial

slide-22
SLIDE 22

Pigs from Sausages

* Interactive disassembly * IDA Pro * Tool-independent * Dava, Boomerang, dcc * Compiler-specific * javac: Mocha, Jad, Jasmin, Wingdis, SourceAgain

slide-23
SLIDE 23

Decompilation uses

* recover lost source code * adapt to another platform * check security-critical code * find malware * inspect vulnerabilities * learn algorithms & data formats

Mike Van Emmerik, http://www.program-transformation.org/Transform/WhyDecompilation

slide-24
SLIDE 24

Decompilation

* Load binary code into virtual memory * Parse / disassemble * Recognise compilation patterns * Build control flow graph * Perform data flow analysis * Perform control flow analysis * Restructure intermediate result * Generate high-level code

C.Cifuentes, K.J.Gough, Decompilation of Binary Programs, SPE 25(7), 1995
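As an illustration of the "build control flow graph" step, here is a sketch of the classic leader rule used to split code into basic blocks: the first instruction, every jump or branch target, and every instruction following a jump or branch start a new block. The toy `Insn` encoding is a hypothetical assumption, and this textbook rule is not claimed to be the exact procedure of Cifuentes and Gough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instructions: plain operation, unconditional jump,
   or conditional branch; target is an instruction index. */
typedef enum { OP, JUMP, BRANCH } Kind;
typedef struct { Kind kind; size_t target; } Insn;

/* Mark basic-block leaders and return the number of basic blocks. */
size_t find_leaders(const Insn *code, size_t n, bool *leader) {
    for (size_t i = 0; i < n; i++) leader[i] = false;
    if (n == 0) return 0;
    leader[0] = true;                            /* entry point */
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == OP) continue;
        assert(code[i].target < n);
        leader[code[i].target] = true;           /* jump/branch target */
        if (i + 1 < n) leader[i + 1] = true;     /* fall-through successor */
    }
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++) blocks += leader[i];
    return blocks;
}
```

Edges of the control flow graph then run from the last instruction of each block to its targets and, for conditional branches, to the next block.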

slide-25
SLIDE 25

Disasm advice

* Do not underestimate debuggers * ptrace, gdb, windbg * winice, softice, linice * vmware, dosbox, bochs, xen, parallels * Obfuscation & deobfuscation * elfcrypt, upx, burneye, shiva * Learn system software * Beware of anti-hacking hacks

slide-26
SLIDE 26

Up-compilation

* CSS to SASS * ~70% less code * ~5% less padding * ~10% in mixins * ~8% to children * ~2 CSS decls per SASS var

A.Polet, Re-engineering Cascading Style Sheets by preprocessing and refactoring, MSc thesis, Master Software Engineering, Universiteit van Amsterdam, August 23, 2015, 92 pages, supervisor: Dr. Vadim Zaytsev. http://www.software-engineering-amsterdam.nl


slide-27
SLIDE 27

Part I: Conclusion

* Compilation and code generation * Supercompilation * Generative programming * morphing as improved generics * staging as guided evaluation * You want meta-type safety

slide-28
SLIDE 28

II

slide-29
SLIDE 29

Language Conversion

* Everybody lies. * Syntax swap is NEVER a solution. * not even OS/VS COBOL to VS COBOL II * Wrapping is NOT a solution! * Component wrapping COULD be a solution for a while. * Two wrongs make a right, almost.

A.A.Terekhov, C.Verhoef, The Realities of Language Conversions, IEEE Software 2000.

slide-30
SLIDE 30

Language Conversion

A.A.Terekhov, C.Verhoef, The Realities of Language Conversions, IEEE Software 2000.

Figure labels: Native construct, Simulated construct (original language); Native construct, Simulated construct, No construct (target language)

slide-31
SLIDE 31

Language Conversion

A.A.Terekhov, C.Verhoef, The Realities of Language Conversions, IEEE Software 2000.

Figure: from Original program to Target program by Syntax swap and Restructuring (twice)

slide-32
SLIDE 32

Codegen properties

* Correctness * Speed * Size * Memory use * Network demands * Energy * . . .

F.Ferreira, B.Pientka, Bidirectional Elaboration of Dependently Typed Programs, PPDP 2014.

slide-33
SLIDE 33

Correct codegen

* semantic preservation * …under special conditions * protect from logical errors * verification * testing

slide-34
SLIDE 34

Bit flip

* Software-Implemented Hardware Fault Tolerance (SIHFT) * Measurement unit: * FIT (Failures In Time: failures per 10^9 hours of operation; 10^9 hours ≈ 114155 years) * Reasons for SEU (Single Event Upsets) * natural radiation * chip temperature instability * malicious intervention * experimental technology * Known victims * Sun, Toyota

M.Heing-Becker, T.Kamph, S.Schupp, Bit-error injection for software developers, CSMR-WCRE 2014
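A minimal C sketch of the idea: injecting a single-bit error into a 32-bit word to emulate an SEU, plus one simple software detection scheme, assuming a value is stored alongside its bitwise complement so that any single flipped bit makes the two copies disagree. The names `flip_bit`, `Protected`, `protect` and `is_intact` are hypothetical and not taken from the cited paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit-error injection: flip bit `bit` of a 32-bit word. */
uint32_t flip_bit(uint32_t word, unsigned bit) {
    assert(bit < 32);
    return word ^ (UINT32_C(1) << bit);
}

/* Duplication with complement: the shadow copy holds ~value. */
typedef struct { uint32_t value, shadow; } Protected;

Protected protect(uint32_t v) {
    Protected p = { v, ~v };
    return p;
}

/* A single flipped bit in either copy breaks value == ~shadow. */
bool is_intact(Protected p) { return p.value == (uint32_t)~p.shadow; }
```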

slide-35
SLIDE 35

Fast code

* Optimisation * traditional semantic-preserving * Supercompilation * partial evaluation * Folding/unfolding * inlining functions

slide-36
SLIDE 36

Code optimisation

* By basic blocks * Construct data dependency graphs * Convert to SSA * (Static Single Assignment) * Eliminate common subexpressions * Form a ladder sequence * Allocate registers, pseudo-, memory…

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §9.1.2.
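The "eliminate common subexpressions" step within a basic block can be sketched with local value numbering, a standard technique closely related to the dependency-graph construction above (it is a swapped-in technique, not necessarily the book's presentation). Each variable maps to a value number, and an instruction whose (operator, operand values) combination was already seen recomputes an existing value. The `Instr` encoding with single-letter variables is a hypothetical simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-address instruction: dst = a op b. */
typedef struct { char dst, a, op, b; } Instr;

enum { MAX_EXPRS = 64 };

/* Sets redundant[i] when instruction i recomputes a value already computed
   earlier in the block; returns how many such instructions were found.
   A real pass must also check that some variable still holds that value. */
int find_common_subexpressions(const Instr *block, int n, bool *redundant) {
    int vn[256] = {0};   /* value number per variable; 0 = not seen yet */
    struct { char op; int left, right, value; } seen[MAX_EXPRS];
    int next = 1, entries = 0, found = 0;
    assert(n >= 0 && n <= MAX_EXPRS);
    for (int i = 0; i < n; i++) {
        unsigned char a = (unsigned char)block[i].a;
        unsigned char b = (unsigned char)block[i].b;
        unsigned char d = (unsigned char)block[i].dst;
        if (vn[a] == 0) vn[a] = next++;   /* value live on block entry */
        if (vn[b] == 0) vn[b] = next++;
        int value = 0;
        for (int k = 0; k < entries && value == 0; k++)
            if (seen[k].op == block[i].op && seen[k].left == vn[a] && seen[k].right == vn[b])
                value = seen[k].value;
        redundant[i] = (value != 0);
        if (value != 0) {
            found++;
        } else {
            value = next++;
            seen[entries].op = block[i].op;
            seen[entries].left = vn[a];
            seen[entries].right = vn[b];
            seen[entries].value = value;
            entries++;
        }
        vn[d] = value;   /* dst now names this value */
    }
    return found;
}
```

Because a redefinition gives the variable a fresh value number, `x = a+b; a = c+d; y = a+b` is correctly not reported as a common subexpression.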

slide-37
SLIDE 37

Code optimisation

* By rewriting * Prepare instruction patterns * “load constant”, “multiply registers”, “add from memory”, etc. * Traverse the tree bottom-up thrice * Instruction collection * Instruction selection * Code generation

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §9.1.4.

slide-38
SLIDE 38

Folding/unfolding

* If code occurs several times * fold into a function and call it * If a function is rarely called * unfold its body * Balancing * statically: with thresholds * dynamically: search-based

slide-39
SLIDE 39

Folding/unfolding

* Function inlining * void f(void)
 { ... 
 
 print_square( i++ );
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-40
SLIDE 40

Folding/unfolding

* Function inlining * void f(void)
 { ... 
 
 printf ("square = %d\n", (i++)*(i++));
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-41
SLIDE 41

Folding/unfolding

* Function inlining * void f(void)
 { ... 
 {int n=i++;
 printf ("square = %d\n", n*n); }
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.
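The difference between the two inlined versions can be checked in C. Since `(i++)*(i++)` is undefined behaviour, this sketch replaces `i++` with a hypothetical helper `post_increment` (function calls keep the naive version well-defined); the callee returns the square instead of printing it. Starting from i = 3: the original call and correct inlining give 9 and leave i at 4, while naive textual substitution gives 3*4 = 12 and leaves i at 5.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for i++: returns the old value and increments *p. */
static int post_increment(int *p) {
    assert(p != NULL);
    return (*p)++;
}

/* The callee, returning the square rather than printing it. */
static int square(int n) { return n * n; }

/* Before inlining: the argument is evaluated once, then passed. */
int call_square(int *i) { return square(post_increment(i)); }

/* Naive inlining: each use of n is replaced by the argument expression,
   so the side effect happens twice (call order is unspecified, but the
   product is the same either way). */
int inline_naive(int *i) { return post_increment(i) * post_increment(i); }

/* Correct inlining: bind the argument to a fresh local first,
   as in {int n=i++; ...}. */
int inline_correct(int *i) {
    int n = post_increment(i);
    return n * n;
}
```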

slide-42
SLIDE 42

Folding/unfolding

* Function inlining + supercompilation * void f(void)
 { ... 
 {int n=3;
 printf ("square = %d\n", n*n); }
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-43
SLIDE 43

Folding/unfolding

* Function inlining + supercompilation * void f(void)
 { ... 
 {int n=3;
 printf ("square = %d\n", 3*3); }
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-44
SLIDE 44

Folding/unfolding

* Function inlining + supercompilation * void f(void)
 { ... 
 {int n=3;
 printf ("square = %d\n", 9); }
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-45
SLIDE 45

Folding/unfolding

* Function inlining + supercompilation * void f(void)
 { ... 
 
 printf ("square = %d\n", 9);
 ... 
 }
 void print_square(int n)
 { printf ("square = %d\n", n*n); }

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §7.3.3.

slide-46
SLIDE 46

Size matters

* Aggressive suppression of unused code * Virtual machines + intermediate code * Honest compression (e.g., LZ) * Reliance on hardware * Traditional optimisations

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §9.2

slide-47
SLIDE 47

Superoptimisation

signum (x)
int x;
{
  if (x>0) return 1;
  else if (x<0) return -1;
  else return 0;
}

H.Massalin. Superoptimizer: A look at the smallest program, ASPLOS 1987.

slide-48
SLIDE 48

Superoptimisation

(x in d0)
add.l d0, d0
subx.l d1, d1
negx.l d0
addx.l d1, d1
(result in d1)

H.Massalin. Superoptimizer: A look at the smallest program, ASPLOS 1987.

slide-49
SLIDE 49

Superoptimisation

(x in d0)
add.l d0, d0
subx.l d1, d1
negx.l d0
addx.l d1, d1
(result in d1)

H.Massalin. Superoptimizer: A look at the smallest program, ASPLOS 1987.

Suppose d0 is negative:
add.l: sign bit in the carry flag!
subx.l: d1 is now -1
negx.l: d0≠0, the carry is set!
addx.l: -1+1 -1 = -1

slide-50
SLIDE 50

Superoptimisation

(x in d0)
add.l d0, d0
subx.l d1, d1
negx.l d0
addx.l d1, d1
(result in d1)

H.Massalin. Superoptimizer: A look at the smallest program, ASPLOS 1987.

Suppose d0 is positive:
add.l: no carry set
subx.l: d1 is now 0
negx.l: d0≠0, the carry is set!
addx.l: 0+1 - 0 = 1

slide-51
SLIDE 51

Superoptimisation

(x in d0)
add.l d0, d0
subx.l d1, d1
negx.l d0
addx.l d1, d1
(result in d1)

H.Massalin. Superoptimizer: A look at the smallest program, ASPLOS 1987.

Suppose d0 is zero:
add.l: d0 is 0, no carry set
subx.l: d1 is now 0
negx.l: d0 is 0, no carry set
addx.l: double 0 is still 0
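The three case analyses above can be checked mechanically. Below is a sketch that emulates the four 68000 instructions in C with 32-bit registers, modelling only the X (extend) flag, which these instructions set like the carry and which subx, negx and addx consume. The emulation is an illustrative assumption written for this walkthrough, not code from Massalin's paper.

```c
#include <assert.h>
#include <stdint.h>

/* The straightforward version from the slide, in modern C. */
int signum(int32_t x) {
    if (x > 0) return 1;
    else if (x < 0) return -1;
    else return 0;
}

/* Emulation of: add.l d0,d0; subx.l d1,d1; negx.l d0; addx.l d1,d1 */
int signum_superopt(int32_t x) {
    uint32_t d0 = (uint32_t)x, d1 = 0, X;

    /* add.l d0,d0: the sign bit is shifted out into carry/X */
    X = d0 >> 31;
    d0 = d0 + d0;

    /* subx.l d1,d1: d1 = d1 - d1 - X, i.e. 0 or 0xFFFFFFFF (-1); X unchanged */
    d1 = d1 - d1 - X;

    /* negx.l d0: d0 = 0 - d0 - X; borrows (X = 1) unless d0 and X are both 0 */
    uint32_t borrow = (d0 != 0 || X != 0);
    d0 = 0u - d0 - X;   /* the negated value itself is not needed afterwards */
    X = borrow;

    /* addx.l d1,d1: d1 = d1 + d1 + X, leaving 0xFFFFFFFF, 0 or 1 */
    d1 = d1 + d1 + X;

    assert(d1 == 0 || d1 == 1 || d1 == UINT32_MAX);
    return d1 == UINT32_MAX ? -1 : (int)d1;
}
```

Note that for x = INT32_MIN doubling yields d0 = 0, but X is already 1, so negx still borrows and the result is -1.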

slide-52
SLIDE 52

Superoptimisation

* State of the art: * brute force is viable * optimisation database caches solutions * enumerate everything possible * harvest beforehand * canonicalisation up to equivalence * stochastic search for large code segments * adapting memory subsystems * used in GCC (prooflink: PLDI-1992-GranlundK)

J.G.Wingbermuehle, R.K.Cytron, R.D.Chamberlain, Superoptimization of Memory Subsystems, LCTES 2014. http://bibtex.github.io/LCTES-2014-WingbermuehleCC.html

slide-53
SLIDE 53

Power consumption

* More computation occurs on gadgets * Save energy to increase uptime * Reduce costs * Limit peak heat dissipation * Conceptually (weakly) linked to * sustainability

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §9.3

slide-54
SLIDE 54

Power consumption

* Fast code uses less energy * gcc -O1 saves 20% * P = V × A * reduce CPU voltage * Replace the scheduler * minimise changed bits

D.Grune, K.v.Reeuwijk, H.E.Bal, C.J.H.Jacobs, K.Langendoen, Modern Compiler Design, 2ed, 2012, §9.3

SUCH GREEN SO ENERGY WOW
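One way to make "minimise changed bits" measurable is to count bit transitions between consecutive words sent over a bus, a rough proxy for switching activity; a power-aware scheduler would prefer orderings with fewer transitions. The sketch below computes that metric and, purely as an illustration of the metric (not as the book's technique), compares counting in binary with counting in Gray code. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits that differ between two words (Hamming distance). */
static unsigned changed_bits(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Total bit transitions when the words are sent one after another. */
unsigned total_transitions(const uint32_t *words, size_t n) {
    assert(n == 0 || words != NULL);
    unsigned total = 0;
    for (size_t i = 1; i < n; i++) total += changed_bits(words[i - 1], words[i]);
    return total;
}

/* Reflected binary Gray code: consecutive values differ in exactly one bit. */
uint32_t gray(uint32_t x) { return x ^ (x >> 1); }
```

For the sequence 0..7, plain binary causes 11 bit transitions while the Gray-coded sequence causes 7.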

slide-55
SLIDE 55

Part II Conclusion

* Language conversion does not exist * Code should be correct * even with bit flips * Code can be made small * Nothing beats superoptimisation * Try to conserve energy * just make it fast and optimal

slide-56
SLIDE 56

Conclusion

* General technology is easy * Topics touched * Compilation: up, down, generative * Optimisation: speed, size, power * Topics untouched * test suite optimisation * software transplantation

slide-57
SLIDE 57

Questions?

Y U NO SUBMIT REVIEW?!