SLIDE 1

Compilers and computer architecture: Just-in-time compilation

Martin Berger, 1 December 2019

Email: M.F.Berger@sussex.ac.uk, Office hours: Wed 12-13 in Chi-2R312

SLIDE 2

Recall the function of compilers

SLIDE 3

Welcome to the cutting edge

SLIDE 5

Welcome to the cutting edge

Compilers are used to translate from programming languages humans can understand to machine code executable by computers. Compilers come in two forms:

◮ Conventional ahead-of-time (AOT) compilers, where translation is done once, long before program execution.

◮ Just-in-time (JIT) compilers, where translation of program fragments happens at the last possible moment and is interleaved with program execution.

We spend the whole term learning about the former. Today I want to give you a brief introduction to the latter.

SLIDE 6

Why learn about JIT compilers?

SLIDE 9

Why learn about JIT compilers?

In the past, dynamically typed languages (e.g. Python, JavaScript) were much slower than statically typed languages (a factor of 10 or worse). Even OO languages (e.g. Java) were a lot slower than procedural languages like C.

In the last couple of years, this gap has narrowed considerably. JIT compilers were the main cause of this performance revolution.

JIT compilers are cutting (bleeding) edge technology, and considerably more complex than normal compilers, which are already non-trivial. Hence today's presentation will be massively simplifying.

SLIDE 10

If JIT compilers are the answer ... what is the problem?

SLIDE 11

If JIT compilers are the answer ... what is the problem?

Let’s look at two examples. Remember the compilation of objects and classes?

[Diagram: each instance of A or B holds a dptr to its class’s method table; the table’s entries point to the code for the method bodies (f_A, g_A, f_B), which are shared by all instances. B overrides f but inherits g_A from A.]

To deal with inheritance of methods, invoking a method is indirect, via the method table. Each invocation has to follow two pointers. Without inheritance, there is no need for indirection.
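To make the two indirections concrete, here is a minimal, hypothetical sketch (my illustration, not the JVM's actual object layout); DispatchSketch, methodTableA/B and invokeF are invented names, and the method table is simulated by an array of function objects:

import java.util.function.IntUnaryOperator;

class DispatchSketch {
    // "Method table" for class A: slot 0 holds the code pointer for f.
    static final IntUnaryOperator[] methodTableA = { n -> n };
    // "Method table" for class B: f is overridden, so slot 0 differs.
    static final IntUnaryOperator[] methodTableB = { n -> 2 * n };

    // Invoking o.f(n): follow the object's dptr to its method table,
    // then follow the slot's pointer to the code. Two indirections.
    static int invokeF(IntUnaryOperator[] dptr, int n) {
        return dptr[0].applyAsInt(n);
    }

    public static void main(String[] args) {
        System.out.println(invokeF(methodTableA, 21)); // prints 21
        System.out.println(invokeF(methodTableB, 21)); // prints 42
    }
}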

SLIDE 13

If JIT compilers are the answer ... what is the problem?

Of course an individual indirection takes < 1 nanosecond on a modern CPU. So why worry? Answer: loops!

interface I { int f(int n); }

class A implements I {
    public int f(int n) { return n; }
}

class B implements I {
    public int f(int n) { return 2 * n; }
}

class Main {
    public static void main(String[] args) {
        I o = new A();
        for (int i = 0; i < 1000000; i++) {
            for (int j = 0; j < 1000000; j++) {
                o.f(i + j);
            }
        }
    }
}

Performance penalties add up.

SLIDE 16

If JIT compilers are the answer ... what is the problem?

But, I hear you say, it’s obvious, even at compile time, that the object o is of class A. A good optimising compiler should be able to work this out and replace the indirect invocation of f with a cheaper direct jump.

class Main {
    public static void main(String[] args) {
        I o = new A();
        for (int i = 0; i < 1000000; i++) {
            for (int j = 0; j < 1000000; j++) {
                o.f(i + j);
            }
        }
    }
}

Yes, in this simple example, a good optimising compiler can do this. But what about the following?

SLIDE 22

If JIT compilers are the answer ... what is the problem?

class Main {
    public static void main(String[] args) {
        I o = null;
        if (args[0].equals("hello"))
            o = new A();
        else
            o = new B();
        for (int i = 0; i < 1000000; i++) {
            for (int j = 0; j < 1000000; j++) {
                o.f(i + j);
            }
        }
    }
}

Now the type of o is determined only at run-time. What is the problem? Not enough information at compile-time to carry out the optimisation! At run-time we do have this information, but that’s too late (for normal compilers).

(Aside: can you see a hack to deal with this problem in an AOT compiler?)
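One possible such hack (my guess at the intended answer, not spelled out on the slide): the AOT compiler can emit one specialised copy of the loop nest per plausible receiver class, and select between them with a single up-front type test, so that inside each copy the receiver's class is known exactly and the call can be devirtualised. A sketch, reusing I, A and B from above:

class MainDevirtualised {
    public static void main(String[] args) {
        I o = args[0].equals("hello") ? new A() : new B();
        // One type test outside the loops selects a specialised copy.
        if (o instanceof A) {
            A oa = (A) o;   // receiver class now known exactly
            for (int i = 0; i < 1000000; i++)
                for (int j = 0; j < 1000000; j++)
                    oa.f(i + j);   // candidate for a direct jump
        } else {
            B ob = (B) o;
            for (int i = 0; i < 1000000; i++)
                for (int j = 0; j < 1000000; j++)
                    ob.f(i + j);
        }
    }
}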

SLIDE 26

If JIT compilers are the answer ... what is the problem?

Dynamically typed languages have a worse problem. Simplifying a little, variables in dynamically typed languages store not just the usual value, e.g. 3, but also the type of the value, e.g. Int, and sometimes even more. Whenever you carry out an innocent operation like

x = x + y

under the hood something like the following happens:

let tx = typeof( x )
let ty = typeof( y )
if ( tx == Int && ty == Int )
    let vx = value( x )
    let vy = value( y )
    let res = integer_addition( vx, vy )
    x_result_part = res
    x_type_part = Int
else
    ... // even more complicated
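As a concrete (and heavily simplified) model of such tagged values, here is a hypothetical Java sketch; DynValue, its Tag field and dynAdd are inventions for illustration, not any real VM's representation:

class DynValue {
    enum Tag { INT, STR }   // the run-time type tag
    Tag tag;
    int intVal;             // payload when tag == INT
    String strVal;          // payload when tag == STR

    static DynValue ofInt(int v) {
        DynValue d = new DynValue();
        d.tag = Tag.INT;
        d.intVal = v;
        return d;
    }

    // What "x + y" costs at run-time: check both tags before adding.
    static DynValue dynAdd(DynValue x, DynValue y) {
        if (x.tag == Tag.INT && y.tag == Tag.INT)
            return ofInt(x.intVal + y.intVal);
        // string concatenation etc.: even more complicated
        throw new UnsupportedOperationException("non-Int addition");
    }

    public static void main(String[] args) {
        System.out.println(dynAdd(ofInt(1), ofInt(2)).intVal); // prints 3
    }
}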

SLIDE 28

If JIT compilers are the answer ... what is the problem?

Imagine this in a nested loop!

for ( int i = 0; i < 1000000; i++ ) {
    for ( int j = 0; j < 1000000; j++ ) {
        let tx = typeof( x )
        let ty = typeof( y )
        if ( tx == Int && ty == Int )
            let vx = value( x )
            let vy = value( y )
            let res = integer_addition( vx, vy )
            x_result_part = res
            x_type_part = Int
        ...

This is painful. This is why dynamically typed languages are slow(er).

SLIDE 33

If JIT compilers are the answer ... what is the problem?

But ... in practice, variables usually do not change their types in inner loops. Why? Because typically innermost loops work on big, uniform data structures (usually big arrays). So the compiler should move the type-checks outside the loops.

SLIDE 35

If JIT compilers are the answer ... what is the problem?

Recall that in dynamically typed languages

for ( int i = 0; i < 1000000; i++ ) {
    for ( int j = 0; j < 1000000; j++ ) {
        a[ i, j ] = a[ i, j ] + 1
    }
}

is really

for ( int i = 0; i < 1000000; i++ ) {
    for ( int j = 0; j < 1000000; j++ ) {
        let ta = typeof( a[ i, j ] )   // always same
        let t1 = typeof( 1 )           // always same
        if ( ta == Int && t1 == Int ) {
            let va = value( a[ i, j ] )
            let v1 = value( 1 )        // simplifying
            let res = integer_addition( va, v1 )
            a[ i, j ]_result_part = res
            a[ i, j ]_type_part = Int
        } else {
            ...
        }
    }
}

SLIDE 37

If JIT compilers are the answer ... what is the problem?

So the program from the last slide can be optimised to

let ta = typeof( a )
let t1 = typeof( 1 )
if ( ta == Array [...] of Int && t1 == Int ) {
    for ( int i = 0; i < 1000000; i++ ) {
        for ( int j = 0; j < 1000000; j++ ) {
            let va = value( a[ i, j ] )
            let v1 = value( 1 )   // simplifying
            let res = integer_addition( va, v1 )
            a[ i, j ]_result_part = res
        }
    }
} else {
    ...
}

Alas, at compile-time the compiler does not have enough information to make this optimisation safely.
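Continuing the hypothetical DynValue sketch from earlier (again my own illustration): if a dynamic array carries a single element-type tag, the check can be hoisted and performed once, leaving a raw int loop inside.

class DynArraySketch {
    // A uniform dynamic array: one type tag covers every element.
    DynValue.Tag elemTag = DynValue.Tag.INT;
    int[] ints = new int[1_000_000];

    void incrementAll() {
        if (elemTag == DynValue.Tag.INT) {   // checked once, outside the loop
            for (int i = 0; i < ints.length; i++)
                ints[i] += 1;                // no tags inside the hot loop
        } else {
            // fall back to the slow, fully tag-checked path
            throw new UnsupportedOperationException("slow path");
        }
    }
}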

SLIDE 40

If JIT compilers are the answer ... what is the problem?

Let’s summarise the situation.

◮ Certain powerful optimisations cannot be done at compile-time, because the compiler has not got enough information to know they are safe.

◮ At run-time we have enough information to carry out these optimisations.

Hmmm, what could we do ...

SLIDE 44

How about we compile and optimise only at run-time? But there is no run-time if we don’t have a compilation process, right? Enter interpreters!

SLIDE 45

Interpreters

SLIDE 48

Interpreters

Recall from the beginning of the course that interpreters are a second way to run programs.

[Diagram: a compiler translates the source program into an executable, which is then, at runtime, run on the data to produce output; an interpreter takes the source program and the data directly at runtime and produces the output.]

◮ Compilers generate a program that has an effect on the world.

◮ Interpreters affect the world directly.

SLIDE 49

Interpreters


◮ The advantage of compilers is that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation), and the results of this work are shared in every execution. The interpreter has to redo this work every time.

◮ The advantage of interpreters is that they are much simpler than compilers.

SLIDE 50

JIT compiler, key idea

SLIDE 59

JIT compiler, key idea

Interpret the program, and compile (parts of) the program at run-time. This suggests the following questions.

◮ When shall we compile, and which parts of the program?

◮ How do interpreter and compiled program interact?

◮ But most of all: compilation is really slow, especially optimising compilation. Don’t we make performance worse if we slow an already slow interpreter down with a lengthy compilation process?

In other words, we are facing the following conundrum:

◮ We want to optimise as much as possible, because optimised programs run faster.

◮ We want to optimise as little as possible, because running the optimisers is really slow.

Hmmmm ...

SLIDE 61

Pareto principle and compiler/interpreter ∆ to our rescue

[Diagram: timelines for interpreter vs compiler. The interpreter starts running immediately, but its cost is paid on every execution; the compiler spends time compiling first, a cost paid only once, after which the compiled program runs.]

Interpretation is much faster than (optimising) compilation. But a compiled program is much faster than interpretation. And we have to compile only once. Combine this with the Pareto principle, and you have a potent weapon at hand.

SLIDE 62

Pareto principle, aka 80-20 rule

SLIDE 66

Pareto principle, aka 80-20 rule

Vilfredo Pareto, late 19th / early 20th century Italian economist, noticed:

◮ 80% of the land in Italy was owned by 20% of the population.

◮ 20% of the pea pods in his garden contained 80% of the peas.

This principle applies in many other areas of life, including program execution: the great majority of a program’s execution time is spent running in a tiny fragment of the code. Such code is referred to as hot.

SLIDE 67

Putting the pieces together

SLIDE 71

Putting the pieces together

[Diagram: the interpreter vs compiler timelines from before.]

Clearly, compiling code that is executed infrequently at run-time will slow down execution. The trade-offs are different for hot code: an innermost loop may be executed billions of times, and the more often it runs, the more optimising compilation pays off.

Pareto’s principle tells us that (typically) a program contains some hot code. With the information available at run-time, we can aggressively optimise such hot code and get a massive speed-up. The rest is interpreted. The sluggishness of interpretation doesn’t matter, because it accounts for only a fraction of program execution time.

SLIDE 72

There is just one problem ... how do we find hot code?

SLIDE 79

There is just one problem ... how do we find hot code?

Remember, at compile time the optimiser couldn’t work it out (reliably). Let’s use counters at run-time! We instrument the interpreter with counters that are incremented every time a method is called, or every time we go round a loop.

Whenever one of these counters reaches a threshold, we assume that the associated code is hot. We compile that hot code, and jump to the compiled code. (Making this play nice with garbage collection, exceptions, concurrency and debugging isn’t easy ...) When the compiled code terminates, we switch back to interpretation.
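A minimal sketch of such instrumentation, assuming a toy interpreter that calls a hook at every loop back-edge; the counter map, the threshold value and compileAndRun are illustrative inventions:

import java.util.HashMap;
import java.util.Map;

class HotnessCounters {
    static final int THRESHOLD = 10_000;   // illustrative value
    static final Map<Integer, Integer> counters = new HashMap<>();

    // The interpreter calls this on every backward jump (loop head).
    static void onBackwardJump(int target) {
        int n = counters.merge(target, 1, Integer::sum);
        if (n == THRESHOLD)
            compileAndRun(target);         // the loop at 'target' is hot
    }

    static void compileAndRun(int target) {
        // Placeholder: a real JIT would compile the loop at 'target'
        // and jump into the generated machine code.
        System.out.println("loop at " + target + " is hot; compiling");
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) onBackwardJump(42);
    }
}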

SLIDE 80

In a picture

SLIDE 81

In a picture

[Flowchart: Source code → Interpret, incrementing counters as we go. Hot code? No: keep interpreting. Yes: compile and optimise the hot code, execute the compiled hot code to termination, then return to the interpreter.]

SLIDE 82

Aside

SLIDE 85

Aside

Have you noticed that Java programs start up quite slowly? This is because at the beginning, everything is interpreted, hence slow. Then JIT compilation starts, also slow. Eventually, the hot code is detected and compiled with a great deal of optimisation. Then execution gets really fast.
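(For example, HotSpot's -Xint flag forces pure interpretation, and -Xcomp forces everything to be compiled up front; comparing either with the default adaptive mode makes this warm-up behaviour easy to observe.)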

SLIDE 86

The devil is in the details

SLIDE 91

The devil is in the details

This picture omits many subtleties. Chief among them: making the handover of control from the interpreter to the compiled code and back work seamlessly. Also, we don’t want to recompile code, so we typically use a cache of already compiled code. And how do we actually do the optimisations, taking the information available at run-time into account? Etc. etc.

SLIDE 92

JIT compilers summary

SLIDE 96

JIT compilers summary

JIT compilers are the cutting edge of compiler technology. They were first conceived (in rudimentary form) in the 1960s, but came to life in the last 10 years or so. JIT compilers are very complicated. The JVM, probably the best-known JIT compiler, took an estimated 1000+ person-years to build. So what’s next in compiler technology? Let me introduce you to ...

SLIDE 97

Tracing JIT compilers

SLIDE 98

Tracing JIT compilers

Tracing JIT compilers are a form of JIT compilation where optimisation is especially aggressive.

SLIDE 102

Tracing JIT compilers

Hot code can contain code that is not used (much). Imagine the compilation of:

for ( x = 1 to 1000000 )
    for ( y = 1 to 1000000 )
        try
            a[ x ][ y ] = a[ x+1 ][ a[ y-1 ][ y+1 ] ]
        catch ...   // error handling

Clearly the try-catch block sits in an innermost loop, so it is potentially hot code. But if the programmer does a good job, the exception handling will never be triggered. Yet we have all this exception-handling code (which tends to be large) in the hot loop. This causes all manner of problems, e.g. cache locality is destroyed.

SLIDE 105

Tracing JIT compilers

It is difficult, even at run-time (!), to find such rarely used parts. Why can’t we use counters? Yes, but ... counters only give us some of the relevant information; for good optimisation we need more. Traces give us this information. What are traces?

SLIDE 112

Tracing JIT compilers

Tracing JIT compilers have not one, but several compilers (or interpreters) inside (simplifying greatly).

After the interpreter has found hot code, the hot code is compiled and executed once (called the tracing execution). During the tracing execution, the machine code actually executed is recorded, yielding the trace of the hot code. Note that if the machine code to be traced branches, only the branch actually taken ends up in the trace. Traces are linear: no branching. This makes the optimisation algorithms much simpler and faster.

Once tracing has finished, e.g. once the body of the hot loop has been executed once, the trace is analysed and optimised. Based on this analysis, another compiler generates another (highly optimised) executable, which is run to termination; then control goes back to the interpreter.
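To make "linear trace with guards" concrete, here is a hypothetical sketch (my own illustration, not a real tracer's data structures): the taken branch is recorded as straight-line operations, and the untaken branch becomes a guard that bails out to the interpreter when its assumption fails.

import java.util.List;

class TraceSketch {
    interface Op { boolean exec(int[] regs); }   // false = guard failed

    // Trace recorded for the loop body "if (r0 > 0) { r1 += r0; r0 -= 1 }"
    // when the branch r0 > 0 was taken during the tracing execution:
    static final List<Op> trace = List.of(
        regs -> regs[0] > 0,                            // guard: r0 > 0
        regs -> { regs[1] += regs[0]; return true; },   // r1 = r1 + r0
        regs -> { regs[0] -= 1; return true; }          // r0 = r0 - 1
    );

    public static void main(String[] args) {
        int[] regs = { 5, 0 };
        // Re-run the linear trace until a guard fails; at that point a
        // real system would fall back to the interpreter.
        outer:
        while (true) {
            for (Op op : trace)
                if (!op.exec(regs)) break outer;
        }
        System.out.println(regs[1]);   // prints 15 = 5+4+3+2+1
    }
}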

SLIDE 113

Tracing JIT compilers

Analysing and optimising the trace:

◮ Find out if variables change type in the loop; if not, move the type-checking out of the loop. (For dynamically typed languages.)

◮ Find out if objects change type in the loop; if not, use short-cut method invocations, with no need to go via the method table.

◮ Let the interpreter handle the rarely used parts of the hot loop (e.g. error handling).

◮ ...

◮ Finally, enter the third phase, the ’normal’ execution of the optimised trace.

SLIDE 114

A tracing JIT compiler in a picture

[Flowchart: Source code → Interpret, incrementing counters. Hot code? No: keep interpreting. Yes: compile, execute the compiled code while recording a trace, analyse the trace, optimise the trace, then execute the optimised trace.]

SLIDE 117

Difficulties

As with normal JIT compilers, we have to orchestrate the interplay of all these compiler phases, e.g. the handover of control from interpreter to compiler, to tracing, to execution of the optimised trace, and back. Garbage collection, exceptions, concurrency etc. must all also work.

Typical optimisations: type specialisation, bypassing method invocation, function inlining, register allocation, dead-code elimination. Etc. etc.

SLIDE 118

Example compilers

SLIDE 123

Example compilers

The JVM (from Oracle). It is a method-based JIT compiler, meaning that methods are the units of compilation. It is not tracing.

The first implementation of a tracing JIT was HP’s Dynamo. It does not compile from a high-level language to a low-level language; instead it optimises machine code.

HotpathVM was the first tracing JIT for a high-level language (Java).

TraceMonkey, one of Firefox’s JavaScript implementations, was the first JIT compiler for JavaScript. (NB: Firefox’s current SpiderMonkey is not tracing.)

It is hard to say exactly who uses what (e.g. Apple’s Safari), since companies rarely say what they’re using, and they can use more than one. Trade secrets.

SLIDE 125

Example compilers

Open source: PyPy, a meta-tracing framework for Python. Meta-tracing, what’s that?

SLIDE 126

Meta-tracing?

SLIDE 133

Meta-tracing

Background: writing compilers is hard, writing optimising compilers is harder, writing JIT compilers is harder still, but writing tracing JIT compilers is the hardest.

Designers of new programming languages cannot realistically produce a good code generator for a new language. Typically, language designers write interpreters for new languages. But that means the new language is hampered by slow execution. This impedes progress in programming languages.

Great idea: how about using a JIT compiler to compile the interpreter, hoping that JITing will speed up the interpreter, and hence the new language?

This idea is ingenious, simple, old and ... wrong! The problem is that interpreter loops are the kind of loops that JITs do not optimise well. Let’s explain this in detail.

SLIDE 134

Why JIT compilers can’t optimise interpreter loops

SLIDE 137

Why JIT compilers can’t optimise interpreter loops

An interpreter is a big loop that gets the next command and acts on it, e.g.

while true do:
    cmd = getNextCommand
    if cmd is:
        "x := E"              then ...
        "if C then M else N"  then ...
        "while C do M"        then ...
        "repeat M until C"    then ...
        "print(M)"            then ...
        ...

Now, JIT compilers are really good at optimising loops, so why do they fail with interpreter loops?

SLIDE 142

Key requirements for good JIT optimisation of loops

The essence of JIT compilation is tight inner loops that are executed a large number of times. This insight can be split into separate parts.

◮ Because they are executed a large number of times, the effect of the optimisation is magnified.

◮ Optimising these inner loops heavily gives substantial performance benefits.

◮ Each iteration (or at least most of them) does the same thing.

The last requirement is violated in interpreter loops.

SLIDE 145

Why can’t interpreter loops be JITed?

The problem is that the source language to be interpreted has loops too. Let’s assume this is the program we are interpreting.

while i > 0:
    j = j+i
    i = i-1

This gives rise to something like the following bytecode:

loop: br r17 exit
      add r21 r33 r21
      subabs r33 1 r33
      jump loop
exit: ...

SLIDE 149

Why can’t interpreter loops be JITed?

Let’s have the bytecode and the bytecode interpreter side by side:

loop: br r17 exit
      add r21 r33 r21
      subabs r33 1 r33
      jump loop
exit: ...

while true:
    op = mem[ pc ]
    pc = pc+1
    case op = br:
        r = mem[ pc ]
        pc = pc+1
        if mem[ r ] == 0:
            pc := mem[ pc ]
    case op = add:
        r1 = mem[ pc ]
        pc = pc+1
        ...

Now every round of the interpreter loop takes a different branch. The tracing JIT can optimise just one branch through the loop. This is the worst-case scenario: we pay the price of tracing and optimisation (since the loop is executed a lot), only to throw away the optimisation and go back to interpretation.

SLIDE 154

Why can’t interpreter loops be JITed?

Profiling detects the wrong loop as hot code! We want profiling to detect the (code corresponding to the) user loop, not the interpreter loop. Note that the (code corresponding to the) user loop consists of several rounds of the interpreter loop. This is too difficult for profiling to detect, since user programs can vary greatly.

SLIDE 156

Why can’t interpreter loops be JITed?

The interpreter writer knows what the user loops are like:

while true do:
    cmd = getNextCommand
    if cmd is:
        "x := E"              then ...
        "if C then M else N"  then ...
        "while C do M"        then ...
        "repeat M until C"    then ...
        "print(M)"            then ...
        ...

The idea of meta-tracing is to let the interpreter writer annotate the interpreter code with ’hooks’ that tell the tracing JIT compiler where user loops start and end. The profiler can then identify the hot loops in (the interpretation of) user code.

SLIDE 159

Why can’t interpreter loops be JITed?

while true do:
    beginInterpreterLoop
    cmd = getNextCommand
    if cmd is:
        "x := E"           then ...
        "while C do M"     then beginUserLoop ... endUserLoop
        "repeat M until C" then beginUserLoop ... endUserLoop
        ...
    endInterpreterLoop

The annotations are used by the profiler for finding hot user loops. The user loops are then traced & optimised. Result: speedup! And it is simple to annotate an interpreter.

SLIDE 167

Meta-tracing as game changer in PL development

The real advantage of this is that it divides the problem of developing a high-performance JIT compiler for a language into several parts, each of which separately is much more manageable:

1. Develop a (meta-)tracing JIT compiler. Hard, but needs to be done only once.

2. Develop an interpreter for the given source language. Easy!

3. Add annotations in the interpreter to expose user loops. Easy!

4. Run the interpreter using the tracing JIT from (1). Easy!

The tracing JIT from (1) can be reused for an unlimited number of language interpreters. Once a meta-tracing JIT is available, we can easily develop new languages and have high-performance compilers for them (almost) for free. The PyPy meta-tracing framework runs Python substantially faster than e.g. the CPython implementation.

SLIDE 168

Brief remarks on performance

SLIDE 172

Brief remarks on performance

JIT compilers are built upon many trade-offs. Although JIT compilers can give lightning-fast execution on typical programs, their worst-case execution time can be dreadful.

JIT compilers work best for languages that do a lot of stuff at run-time (e.g. type-checking). For bare-bones languages like C, there is little to optimise at run-time, and code generated by a conventional C compiler with heavy (hence slow) optimisation will almost always beat a modern JIT compiler.

SLIDE 173

Compiler development in industry

SLIDE 176

Compiler development in industry

Lots of research is going on into compilers, both conventional and (tracing) JITs. It’s super high-tech.

Big companies (Google, Microsoft, Oracle, Intel, Arm, Apple) compete heavily on the quality (e.g. speed, energy usage) of their compilers. They have large teams working on this, and find it difficult to hire, because advanced compiler knowledge is rare.

Much work is left to be done.

SLIDE 177

Interested?

SLIDE 180

Interested?

Compilers (and related subjects) make a great subject for final-year projects. There is also the JRA (Junior Research Assistant) scheme in the summer 2019. Feel free to talk to me about this.
