SLIDE 1

Compilers and computer architecture: introduction

Martin Berger 1

Thanks to Chad MacKinney, Alex Jeffery, Justin Crow, Jim Fielding, Shaun Ring and Vilem Liepelt for suggestions and corrections. Thanks to Benjamin Landers for the RARS simulator.

September 2019

1 Email: M.F.Berger@sussex.ac.uk. Office hours: Wed 12-13 in Chi-2R312.

1 / 150

SLIDE 8

Administrative matters: lecturer

◮ Name: Martin Berger
◮ Email: M.F.Berger@sussex.ac.uk
◮ Web: http://users.sussex.ac.uk/~mfb21/compilers
◮ Lecture notes etc: http://users.sussex.ac.uk/~mfb21/compilers/material.html (linked from Canvas)
◮ Office hour: after the Wednesday lectures, and on request (please arrange by email; see http://users.sussex.ac.uk/~mfb21/cal for available time slots)
◮ My room: Chichester II, 312

SLIDE 13

Administrative matters: dates, times and assessment

◮ Lectures: two lectures per week,
  Wednesday 11-12 Lec PEV1-1A7; Friday 17-18 RICH-AS3
◮ Tutorials: please see your timetables. The TA is Shaun Ring, sr410@sussex.ac.uk
◮ There will (probably) be PAL sessions, more soon.
◮ Assessment: coursework (50%) and unseen examination (50%). Both courseworks involve writing parts of a compiler. Due dates for courseworks: Fri, 8 Nov 2019, and Fri, 20 Dec 2019, both 18:00.

SLIDE 15

Questions welcome!

Please, ask questions ...

◮ during the lesson
◮ at the end of the lesson
◮ in my office hours (see http://users.sussex.ac.uk/~mfb21/cal for available time slots)
◮ by email: M.F.Berger@sussex.ac.uk
◮ on Canvas
◮ in the tutorials
◮ in the course’s Discord channel (invite is on Canvas)
◮ any other channels (e.g. Telegram, TikTok ...)?

Please don’t wait until the end of the course to tell me about any problems you may encounter.

SLIDE 19

Prerequisites

Good Java programming skills are indispensable. This course is not about teaching you how to program. “Good” in this context means you can do most questions classified as “Easy” on e.g. https://leetcode.com/ without problems (= without looking up the answer, and in 1 hour or less). I also recommend that you familiarise yourself with the material on “Shell Tools and Scripting” and “Command-line Environment” in: https://missing.csail.mit.edu/

It helps if you have already seen e.g. regular expressions, FSMs etc. But we will cover all this from scratch.

It helps if you have already seen a CPU, e.g. know what a register or a stack pointer is.

SLIDE 23

Course content

I’m planning to give a fairly orthodox compilers course that shows you all parts of a compiler. At the end of this course you should be able to write a full-blown compiler yourself and implement programming languages.

We will also look at computer architecture, although more superficially.

This will take approximately 9 weeks, so we have time at the end for some advanced material. I’m happy to tailor the course to your interests, so please let me know what you want to hear about.

SLIDE 27

Coursework

Evaluation of assessed courseworks will (largely) be by automated tests. This is quite different from what you’ve seen so far. The reason for this new approach is threefold.

◮ Compilers are complicated algorithms, and it’s beyond human capabilities to find subtle bugs by hand.
◮ Realism. In industry you don’t get paid for being nice, or for having code that “almost” works.
◮ Fairness. Automatic testing removes the subjective element.

Note that if you make a basic error in your compiler then it is quite likely that every test fails and you will get 0 points. So it is really important that you test your code thoroughly before submission. I encourage you to share tests and testing frameworks with other students: as tests are not part of the deliverable, you may share them. Of course, the compiler itself must be written by yourself.

SLIDE 36

Plan for today’s lecture

Whirlwind overview of the course.

◮ Why study compilers?
◮ What is a compiler?
◮ Compiler structure
◮ Lexical analysis
◮ Syntax analysis
◮ Semantic analysis, type-checking
◮ Code generation

SLIDE 41

Why study compilers?

To become a good programmer, you need to understand what happens ‘under the hood’ when you write programs in a high-level language.

To understand low-level languages (assembler, C/C++, Rust, Go) better. Those languages are of prime importance, e.g. for writing operating systems, embedded code and generally code that needs to be fast (e.g. computer games, ML frameworks such as TensorFlow).

Most large programs have a tendency to embed a programming language. The skill to quickly write an interpreter or compiler for such embedded languages is invaluable.

But most of all: compilers are extremely amazing, beautiful and one of the all-time great examples of human ingenuity. After 70 years of refinement, compilers are a paradigm case of beautiful software structure (modularisation). I hope it inspires you.

SLIDE 48

Overview: what is a compiler?

A compiler is a program that translates programs in one programming language to programs in another programming language. The translation should preserve meaning (what do “preserve” and “meaning” mean in this context?).

[Diagram: source program → compiler → target program, with error messages as a side output]

Typically, the input language (called the source language) is more high-level than the output language (called the target language). Examples:

◮ Source: Java, target: JVM bytecode.
◮ Source: JVM bytecode, target: ARM/x86 machine code.
◮ Source: TensorFlow, target: GPU/TPU machine code.

SLIDE 51

Example translation: source program

Here is a little program. (What does it do?)

int testfun( int n ){
   int res = 1;
   while( n > 0 ){
      n--;
      res *= 2;
   }
   return res;
}

Using clang -S this translates to the following x86 machine code ...
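Answering the slide’s question: the loop doubles res once per iteration, n times, so the function computes 2^n for n ≥ 0. A direct Java transliteration (the class name is ours, chosen for illustration) makes this easy to check:

```java
// Java transliteration of the slide's C function.
// res is doubled once per loop iteration, so testfun(n) = 2^n for n >= 0.
public class Testfun {
    static int testfun(int n) {
        int res = 1;
        while (n > 0) {
            n--;
            res *= 2;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(testfun(10)); // prints 1024 = 2^10
    }
}
```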

SLIDE 52

Example translation: target program

_testfun:                               ## @testfun
	.cfi_startproc
## BB#0:
	pushq	%rbp
Ltmp0:
	.cfi_def_cfa_offset 16
Ltmp1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Ltmp2:
	.cfi_def_cfa_register %rbp
	movl	%edi, -4(%rbp)
	movl	$1, -8(%rbp)
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
	cmpl	$0, -4(%rbp)
	jle	LBB0_3
## BB#2:                                ## in Loop: Header=BB0_1 Depth=1
	movl	-4(%rbp), %eax
	addl	$4294967295, %eax       ## imm = 0xFFFFFFFF
	movl	%eax, -4(%rbp)
	movl	-8(%rbp), %eax
	shll	$1, %eax
	movl	%eax, -8(%rbp)
	jmp	LBB0_1
LBB0_3:
	movl	-8(%rbp), %eax
	popq	%rbp
	retq
	.cfi_endproc

SLIDE 58

Compilers have a beautifully simple structure

[Diagram: source program → analysis phase → code generation → generated program]

In the analysis phase two things happen:

◮ Analysing if the program is well-formed (e.g. checking for syntax and type errors).
◮ Creating a representation of the source program structure that is convenient (for a computer) for further processing: the abstract syntax tree (AST) and the symbol table.

The executable program is then generated from the AST in the code generation phase.

Let’s refine this.

SLIDE 61

Compiler structure

Compilers have a beautifully simple structure. This structure was arrived at by breaking a hard problem (compilation) into several smaller problems and solving them separately. This has the added advantage of allowing us to retarget compilers (change the source or target language) quite easily.

[Diagram: source program → lexical analysis → syntax analysis → semantic analysis (e.g. type checking) → intermediate code generation → optimisation → code generation → translated program]

SLIDE 64

Compiler structure

Interesting question: when do these phases happen? In the past, they all happened at ... compile-time. Now some happen at run-time, in just-in-time (JIT) compilers. This has a profound influence on the choice of algorithms and on performance.

SLIDE 67

Compiler structure

Another interesting question: do you notice something about all these phases? The phases are purely functional, in that each takes one input and returns one output. Modern programming languages like Haskell, OCaml, F#, Rust or Scala are ideal for writing compilers.
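The “one input, one output” observation above can be sketched as function composition, even in Java. This is only an illustration: the Token and Ast types and the phase bodies are placeholders invented here, not the course’s real data structures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch: each phase is a pure function from one representation to the
// next, so the whole compiler is just their composition.
public class Pipeline {
    record Token(String text) {}
    record Ast(List<Token> children) {}

    static List<Token> lex(String source) {            // String -> token list
        return Arrays.stream(source.trim().split("\\s+")).map(Token::new).toList();
    }
    static Ast parse(List<Token> tokens) {             // token list -> AST (stub)
        return new Ast(tokens);
    }
    static String codegen(Ast ast) {                   // AST -> "code" (stub)
        return "generated code for " + ast.children().size() + " tokens";
    }

    public static void main(String[] args) {
        Function<String, String> compiler =
            ((Function<String, List<Token>>) Pipeline::lex)
                .andThen(Pipeline::parse)
                .andThen(Pipeline::codegen);
        System.out.println(compiler.apply("int res = 1 ;"));
        // prints: generated code for 5 tokens
    }
}
```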

SLIDE 68

Phases: Overview

◮ Lexical analysis
◮ Syntactic analysis (parsing)
◮ Semantic analysis (type-checking)
◮ Intermediate code generation
◮ Optimisation
◮ Code generation

SLIDE 70

Phases: Lexical analysis

[Pipeline diagram with the lexical-analysis phase highlighted]

SLIDE 74

Phases: Lexical analysis

What is the input to a compiler? A (often long) string, i.e. a sequence of characters.

Strings are not an efficient data structure for a compiler to work with (= generate code from). Instead, compilers generate code from a more convenient data structure called “abstract syntax trees” (ASTs). We construct the AST of a program in two phases:

◮ Lexical analysis, where the input string is converted into a list of tokens.
◮ Parsing, where the AST is constructed from a token list.

SLIDE 78

Phases: Lexical analysis

In lexical analysis, a string is converted into a list of tokens. Example: the program

int testfun( int n ){
   int res = 1;
   while( n > 0 ){
      n--;
      res *= 2;
   }
   return res;
}

is (could be) represented as the list

T_int, T_ident ( "testfun" ), T_left_brack, T_int,
T_ident ( "n" ), T_rightbrack, T_left_curly_brack,
T_int, T_ident ( "res" ), T_eq, T_num ( 1 ),
T_semicolon, T_while, ...

SLIDE 81

Phases: Lexical analysis

T_int, T_ident ( "testfun" ), T_left_brack, T_int,
T_ident ( "n" ), T_rightbrack, T_left_curly_brack,
T_int, T_ident ( "res" ), T_eq, T_num ( 1 ),
T_semicolon, T_while, ...

Why is this interesting?

◮ It abstracts from irrelevant detail (e.g. the syntax of keywords, whitespace, comments).
◮ It makes the next phase (parsing) much easier.
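A toy lexer of the kind described above can be written in a few lines with regular expressions. This is a sketch only: the token names echo the slide, but the regex-per-token design and the class name are our own, not the coursework’s required interface (real lexers are usually generated, as later slides explain).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer: converts a string into a list of tokens, skipping whitespace.
public class Lexer {
    // One named alternative per token class.
    static final Pattern TOKEN = Pattern.compile(
        "(?<ws>\\s+)"
      + "|(?<num>\\d+)"
      + "|(?<id>[A-Za-z_][A-Za-z0-9_]*)"
      + "|(?<sym>[(){};=*<>+-])");

    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length() && m.find(pos) && m.start() == pos) {
            if (m.group("ws") == null) {               // drop whitespace
                if (m.group("num") != null)      tokens.add("T_num(" + m.group() + ")");
                else if (m.group("id") != null)  tokens.add(keywordOrIdent(m.group()));
                else                             tokens.add("T_sym(" + m.group() + ")");
            }
            pos = m.end();
        }
        if (pos != src.length())
            throw new IllegalArgumentException("lexical error at position " + pos);
        return tokens;
    }

    // Keywords look like identifiers, so they are separated out here.
    static String keywordOrIdent(String s) {
        return switch (s) {
            case "int"    -> "T_int";
            case "while"  -> "T_while";
            case "return" -> "T_return";
            default       -> "T_ident(" + s + ")";
        };
    }

    public static void main(String[] args) {
        System.out.println(lex("int res = 1 ;"));
        // prints: [T_int, T_ident(res), T_sym(=), T_num(1), T_sym(;)]
    }
}
```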

SLIDE 83

Phases: syntax analysis (parsing)

[Pipeline diagram with the syntax-analysis phase highlighted]

SLIDE 87

Phases: syntax analysis (parsing)

This phase converts the program (a list of tokens) into a tree, the AST of the program (compare with the DOM of a webpage). This is a very convenient data structure because syntax checking (type-checking) and code generation can be done by walking the AST (cf. the visitor pattern). But how is a program a tree?

while( n > 0 ){
   n--;
   res *= 2;
}

[AST diagram:
T_while
├── T_greater
│   ├── T_var ( n )
│   └── T_num ( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var ( n )
    └── T_update
        ├── T_var ( res )
        └── T_mult
            ├── T_var ( res )
            └── T_num ( 2 )]

SLIDE 92

Phases: syntax analysis (parsing)

◮ The AST is often implemented as a tree of linked objects.
◮ The compiler writer must design the AST data structure carefully so that it is easy to build (during syntax analysis) and easy to walk (during code generation).
◮ The performance of the compiler strongly depends on the AST, so a lot of optimisation goes into it in industrial-strength compilers.
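The “tree of linked objects” idea can be sketched in Java for the while-loop AST shown earlier. The node names mirror the slide’s tokens; the pretty-printing walk is a stand-in for the type-checking and code-generation walks that come later. This is an illustration, not the coursework’s required AST design.

```java
// Each AST node is an object linked to its children; walking the tree
// (here: show()) visits every node once.
public class Ast {
    interface Node { String show(); }
    record Var(String name) implements Node {
        public String show() { return name; } }
    record Num(int value) implements Node {
        public String show() { return "" + value; } }
    record Greater(Node l, Node r) implements Node {
        public String show() { return "(" + l.show() + " > " + r.show() + ")"; } }
    record Mult(Node l, Node r) implements Node {
        public String show() { return "(" + l.show() + " * " + r.show() + ")"; } }
    record Decrement(Var v) implements Node {
        public String show() { return v.show() + "--"; } }
    record Update(Var v, Node rhs) implements Node {
        public String show() { return v.show() + " = " + rhs.show(); } }
    record Seq(Node fst, Node snd) implements Node {          // the ';' node
        public String show() { return fst.show() + "; " + snd.show(); } }
    record While(Node cond, Node body) implements Node {
        public String show() { return "while " + cond.show() + " { " + body.show() + " }"; } }

    public static void main(String[] args) {
        // The AST of:  while( n > 0 ){ n--; res *= 2; }
        Node loop = new While(
            new Greater(new Var("n"), new Num(0)),
            new Seq(new Decrement(new Var("n")),
                    new Update(new Var("res"), new Mult(new Var("res"), new Num(2)))));
        System.out.println(loop.show());
        // prints: while (n > 0) { n--; res = (res * 2) }
    }
}
```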

SLIDE 96

Phases: syntax analysis (parsing)

The construction of the AST has another important role: syntax checking, i.e. checking whether the program is syntactically valid! This dual role is because the rules for constructing the AST are essentially exactly the rules that determine the set of syntactically valid programs. Here the theory of formal languages (context-free and context-sensitive languages, and finite automata) is of prime importance. We will study this in detail.

SLIDE 99

Phases: syntax analysis (parsing)

Great news: the generation of lexical analysers and parsers can be automated by using parser generators (e.g. lex, yacc). Decades of research have gone into parser generators, and in practice they generate better lexers and parsers than most programmers would be able to write. Alas, parser generators are quite complicated beasts, and in order to understand them, it is helpful to understand formal languages and lexing/parsing. The best way to understand this is to write a toy lexer and parser.
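As a taste of the “toy parser” suggestion above, here is a minimal hand-written recursive-descent parser: one method per grammar rule, for the grammar Expr ::= Term (('+'|'-') Term)*, Term ::= Factor ('*' Factor)*, Factor ::= number | '(' Expr ')'. For brevity it evaluates as it parses; a compiler would build an AST instead. The grammar and class are invented for illustration, not taken from the coursework.

```java
// Recursive-descent parser/evaluator for arithmetic expressions.
// Each grammar rule becomes one method; precedence falls out of the
// rule nesting ('*' binds tighter because Term sits below Expr).
public class Parser {
    private final String src;
    private int pos = 0;

    Parser(String src) { this.src = src.replaceAll("\\s+", ""); }

    static int eval(String s) {
        Parser p = new Parser(s);
        int v = p.expr();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("trailing input at " + p.pos);
        return v;
    }

    private int expr() {                       // Expr ::= Term (('+'|'-') Term)*
        int v = term();
        while (peek('+') || peek('-')) v = (next() == '+') ? v + term() : v - term();
        return v;
    }
    private int term() {                       // Term ::= Factor ('*' Factor)*
        int v = factor();
        while (peek('*')) { next(); v *= factor(); }
        return v;
    }
    private int factor() {                     // Factor ::= number | '(' Expr ')'
        if (peek('(')) { next(); int v = expr(); expect(')'); return v; }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + start);
        return Integer.parseInt(src.substring(start, pos));
    }
    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }
    private char next() { return src.charAt(pos++); }
    private void expect(char c) {
        if (!peek(c)) throw new IllegalArgumentException("expected " + c);
        next();
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*3"));     // prints 7: '*' binds tighter than '+'
        System.out.println(eval("(1+2)*3"));   // prints 9
    }
}
```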

SLIDE 101

Phases: semantic analysis

[Pipeline diagram with the semantic-analysis phase highlighted]

SLIDE 106

Phases: semantic analysis

While parsing can reject syntactically invalid programs, it cannot reject semantically invalid ones: more complicated ‘semantic’ mistakes are harder to catch. Examples:

void main() {
   i = 7
   int i = 7
   ...

if ( 3 + true ) > "hello" then ...

These are caught with semantic analysis. The key technology is types. Modern languages like Scala, Rust, Haskell, OCaml and F# employ type inference.
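The type-checking idea can be sketched as another walk over a (tiny) AST: compute the type of each subtree and reject mismatches such as the slide’s 3 + true. The two types, node shapes and error message here are illustrative only, not a fragment of any real checker.

```java
// Minimal type checker for an expression language with ints and booleans.
// check(e) returns the type of e, or throws on a type error.
public class TypeCheck {
    enum Type { INT, BOOL }

    interface Expr {}
    record Num(int n) implements Expr {}
    record Bool(boolean b) implements Expr {}
    record Add(Expr l, Expr r) implements Expr {}

    static Type check(Expr e) {
        if (e instanceof Num) return Type.INT;
        if (e instanceof Bool) return Type.BOOL;
        if (e instanceof Add a) {
            // '+' demands two ints and produces an int.
            if (check(a.l()) == Type.INT && check(a.r()) == Type.INT) return Type.INT;
            throw new IllegalStateException("type error: + needs two ints");
        }
        throw new IllegalStateException("unknown AST node");
    }

    public static void main(String[] args) {
        System.out.println(check(new Add(new Num(1), new Num(2))));   // prints INT
        try {
            check(new Add(new Num(3), new Bool(true)));               // the slide's 3 + true
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```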

SLIDE 108

Phases: intermediate code generation

[Pipeline diagram with the intermediate-code-generation phase highlighted]

slide-109
SLIDE 109

Phases: intermediate code generation

109 / 150

slide-110
SLIDE 110

Phases: intermediate code generation

There are many different CPUs with different machine

  • languages. Often the machine language changes subtly from

CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is by using in essence two compilers.

110 / 150

slide-111
SLIDE 111

Phases: intermediate code generation

There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, by using two compilers.

◮ Develop an intermediate language that captures the essence of almost all machine languages.

111 / 150

slide-112
SLIDE 112

Phases: intermediate code generation

There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, by using two compilers.

◮ Develop an intermediate language that captures the essence of almost all machine languages.

◮ Compile to this intermediate language.

112 / 150

slide-113
SLIDE 113

Phases: intermediate code generation

There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, by using two compilers.

◮ Develop an intermediate language that captures the essence of almost all machine languages.

◮ Compile to this intermediate language.
◮ Do compiler optimisations in the intermediate language.

113 / 150

slide-114
SLIDE 114

Phases: intermediate code generation

There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, by using two compilers.

◮ Develop an intermediate language that captures the essence of almost all machine languages.

◮ Compile to this intermediate language.
◮ Do compiler optimisations in the intermediate language.
◮ Translate the intermediate representation to the target machine language. This step can be seen as a mini-compiler.

114 / 150

slide-115
SLIDE 115

Phases: intermediate code generation

There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, by using two compilers.

◮ Develop an intermediate language that captures the essence of almost all machine languages.

◮ Compile to this intermediate language.
◮ Do compiler optimisations in the intermediate language.
◮ Translate the intermediate representation to the target machine language. This step can be seen as a mini-compiler.

◮ If we want to retarget the compiler to a new machine language, only this last step needs to be rewritten. Nice data abstraction.

115 / 150

slide-116
SLIDE 116

Phases: optimiser

116 / 150

slide-117
SLIDE 117

Phases: optimiser

[Phases diagram: Source program → Lexical analysis → Syntax analysis → Semantic analysis (e.g. type checking) → Intermediate code generation → Optimisation → Code generation → Translated program]

117 / 150

slide-118
SLIDE 118

Phases: optimiser

118 / 150

slide-119
SLIDE 119

Phases: optimiser

Translating a program often introduces various inefficiencies that make the program, e.g., run slowly, use a lot of memory, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies by replacing the inefficient program with a more efficient version (without changing the meaning of the program).

119 / 150

slide-120
SLIDE 120

Phases: optimiser

Translating a program often introduces various inefficiencies that make the program, e.g., run slowly, use a lot of memory, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies by replacing the inefficient program with a more efficient version (without changing the meaning of the program). Most code optimisation problems are difficult (NP-complete or undecidable), so optimisers are expensive to run and often (but not always) lead to only modest improvements. They are also difficult algorithmically. These difficulties are exacerbated for JITs, because JITs are executed at program run-time.

120 / 150

slide-121
SLIDE 121

Phases: optimiser

Translating a program often introduces various inefficiencies that make the program, e.g., run slowly, use a lot of memory, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies by replacing the inefficient program with a more efficient version (without changing the meaning of the program). Most code optimisation problems are difficult (NP-complete or undecidable), so optimisers are expensive to run and often (but not always) lead to only modest improvements. They are also difficult algorithmically. These difficulties are exacerbated for JITs, because JITs are executed at program run-time. However, some optimisations are easy, e.g. inlining of functions: if a function is short (e.g. computing the sum of two numbers), replacing a call to the function with its body can lead to faster code. (What is the disadvantage of this?)
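The effect of inlining can be sketched at source level (real optimisers inline in the intermediate representation; the functions here are invented for illustration):

```python
def add(a, b):                     # a short function: a prime inlining candidate
    return a + b

def f_before(x):
    return add(x, 1) * add(x, 2)   # two function calls at run-time

def f_after(x):                    # what the optimiser produces:
    return (x + 1) * (x + 2)       # calls replaced by the function body

# Both versions compute the same value, but f_after pays no call
# overhead. The price: the body is duplicated, so code size grows.
```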

121 / 150

slide-122
SLIDE 122

Phases: code generation

122 / 150

slide-123
SLIDE 123

Phases: code generation

[Phases diagram: Source program → Lexical analysis → Syntax analysis → Semantic analysis (e.g. type checking) → Intermediate code generation → Optimisation → Code generation → Translated program]

123 / 150

slide-124
SLIDE 124

Phases: code generation

124 / 150

slide-125
SLIDE 125

Phases: code generation

This straightforward phase translates the generated intermediate code to machine code. As machine code and intermediate code are much alike, this ’mini-compiler’ is simple and fast.
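As a sketch of how simple this mapping can be (the register allocation scheme and instruction names are simplified assumptions, not the course's actual backend), each three-address instruction might become one RISC-V-style assembly instruction:

```python
# Hypothetical backend: map three-address instructions like
# "t2 = t1 + c" to RISC-V-style assembly, assuming every name
# already lives in a register (a real code generator must also
# allocate registers and handle memory).

OPS = {"+": "add", "*": "mul"}

def codegen(instr, regs):
    """Translate one instruction; regs maps names to registers."""
    dest, _, left, op, right = instr.split()
    regs[dest] = f"x{len(regs) + 5}"      # pick the next free register
    return f"{OPS[op]} {regs[dest]}, {regs[left]}, {regs[right]}"

regs = {"a": "x1", "b": "x2", "c": "x3"}
codegen("t1 = a * b", regs)     # "mul x8, x1, x2"
codegen("t2 = t1 + c", regs)    # "add x9, x8, x3"
```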

125 / 150

slide-126
SLIDE 126

Compilers vs interpreters

126 / 150

slide-127
SLIDE 127

Compilers vs interpreters

Interpreters are a second way to run programs.

127 / 150

slide-128
SLIDE 128

Compilers vs interpreters

Interpreters are a second way to run programs.

[Diagram: a compiler translates the source program into an executable, which is then run on the data to produce the output; an interpreter takes the source program and the data and produces the output directly, at runtime. The diagram also marks where syntax errors are reported in each setup.]

128 / 150

slide-129
SLIDE 129

Compilers vs interpreters

Interpreters are a second way to run programs.

[Diagram: a compiler translates the source program into an executable, which is then run on the data to produce the output; an interpreter takes the source program and the data and produces the output directly, at runtime. The diagram also marks where syntax errors are reported in each setup.]

◮ The advantage of compilers is that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation), and the results of this work are shared in every execution. The interpreter has to redo this work every time.

129 / 150

slide-130
SLIDE 130

Compilers vs interpreters

Interpreters are a second way to run programs.

[Diagram: a compiler translates the source program into an executable, which is then run on the data to produce the output; an interpreter takes the source program and the data and produces the output directly, at runtime. The diagram also marks where syntax errors are reported in each setup.]

◮ The advantage of compilers is that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation), and the results of this work are shared in every execution. The interpreter has to redo this work every time.

◮ The advantage of interpreters is that they are much simpler than compilers.

130 / 150

slide-131
SLIDE 131

Compilers vs interpreters

Interpreters are a second way to run programs.

[Diagram: a compiler translates the source program into an executable, which is then run on the data to produce the output; an interpreter takes the source program and the data and produces the output directly, at runtime. The diagram also marks where syntax errors are reported in each setup.]

◮ The advantage of compilers is that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation), and the results of this work are shared in every execution. The interpreter has to redo this work every time.

◮ The advantage of interpreters is that they are much simpler than compilers. We won’t say much more about interpreters in this course.
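To illustrate that simplicity, a complete interpreter for arithmetic expressions fits in a few lines (the tuple encoding is an assumption for this sketch): it produces no machine code at all, it just walks the program, redoing that work on every run:

```python
def interpret(expr):
    """Directly evaluate an expression tree; nothing is compiled."""
    if not isinstance(expr, tuple):
        return expr                       # a literal is its own value
    op, left, right = expr
    l, r = interpret(left), interpret(right)
    if op == "+":
        return l + r
    if op == "*":
        return l * r
    raise ValueError(f"unknown operator: {op}")

interpret(("+", 2, ("*", 3, 4)))    # evaluates to 14, from scratch, every run
```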

131 / 150

slide-132
SLIDE 132

Literature

132 / 150

slide-133
SLIDE 133

Literature

Compilers are among the most studied and best understood parts of informatics. Many good books exist. Here are some of my favourites, although I won’t follow any of them closely.

133 / 150

slide-134
SLIDE 134

Literature

Compilers are among the most studied and best understood parts of informatics. Many good books exist. Here are some of my favourites, although I won’t follow any of them closely.

◮ Modern Compiler Implementation in Java (second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast.

134 / 150

slide-135
SLIDE 135

Literature

Compilers are among the most studied and best understood parts of informatics. Many good books exist. Here are some of my favourites, although I won’t follow any of them closely.

◮ Modern Compiler Implementation in Java (second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast.

◮ Compilers - Principles, Techniques and Tools (second edition) by Alfred V. Aho, Monica Lam, Ravi Sethi, and Jeffrey D. Ullman. The first edition of this book is the classic text on compilers, known as the “Dragon Book”, but it is a bit obsolete by now. The second edition is substantially expanded and goes well beyond the scope of our course. For my liking, the book is a tad long.

135 / 150

slide-136
SLIDE 136

Literature

136 / 150

slide-137
SLIDE 137

Literature

Some other material:

137 / 150

slide-138
SLIDE 138

Literature

Some other material:

◮ Engineering a Compiler, by Keith Cooper, Linda Torczon.

138 / 150

slide-139
SLIDE 139

Literature

Some other material:

◮ Engineering a Compiler, by Keith Cooper, Linda Torczon.
◮ Alex Aiken’s Stanford University online course on compilers. This course covers similar ground to ours, but goes more in depth. I was quite influenced by Aiken’s course when I designed ours.

139 / 150

slide-140
SLIDE 140

Literature

Some other material:

◮ Engineering a Compiler, by Keith Cooper, Linda Torczon.
◮ Alex Aiken’s Stanford University online course on compilers. This course covers similar ground to ours, but goes more in depth. I was quite influenced by Aiken’s course when I designed ours.

◮ Computer Architecture - A Quantitative Approach (sixth edition) by John Hennessy and David Patterson. This is the ’bible’ for computer architecture. It goes way beyond what is required for our course, but is very well written by some of the world’s leading experts on computer architecture. Well worth studying.

140 / 150

slide-141
SLIDE 141

How to enjoy and benefit from this course

141 / 150

slide-142
SLIDE 142

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

142 / 150

slide-143
SLIDE 143

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.

143 / 150

slide-144
SLIDE 144

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.
◮ Use the tutorials to get feedback on your solutions

144 / 150

slide-145
SLIDE 145

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.
◮ Use the tutorials to get feedback on your solutions
◮ Substantial lab exercise should bring it all together

145 / 150

slide-146
SLIDE 146

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.
◮ Use the tutorials to get feedback on your solutions
◮ Substantial lab exercise should bring it all together
◮ Ask questions, in the lectures, in the labs, on Canvas or in person!

146 / 150

slide-147
SLIDE 147

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.
◮ Use the tutorials to get feedback on your solutions
◮ Substantial lab exercise should bring it all together
◮ Ask questions, in the lectures, in the labs, on Canvas or in person!
◮ Design your own mini-languages and write compilers for them.

147 / 150

slide-148
SLIDE 148

How to enjoy and benefit from this course

◮ Assessed coursework is designed to reinforce and integrate lecture material; it’s designed to help you pass the exam

◮ Go look at the past papers - now.
◮ Use the tutorials to get feedback on your solutions
◮ Substantial lab exercise should bring it all together
◮ Ask questions, in the lectures, in the labs, on Canvas or in person!
◮ Design your own mini-languages and write compilers for them.
◮ Have a look at real compilers. There are many free, open-source compilers, e.g. GCC, LLVM, TCC, MiniML, OCaml, the Scala compiler, and GHC, the Haskell compiler.

148 / 150

slide-149
SLIDE 149

Feedback

In this module, you will receive feedback through:

◮ The mark and comments on your assessment
◮ Feedback to the whole class on assessment and exams
◮ Feedback to the whole class on lecture understanding
◮ Model solutions
◮ Worked examples in class and lecture
◮ Verbal comments and discussions with tutors in class
◮ Discussions with your peers on problems
◮ Online discussion forums
◮ One to one sessions with the tutors

The more questions you ask, the more you participate in discussions, the more you engage with the course, the more feedback you get.

149 / 150

slide-150
SLIDE 150

Questions?

150 / 150