Go Compiler for COMP-520 (Vincent Foley-Bourgon, Sable Lab, McGill University)



slide-1
SLIDE 1

Go Compiler for COMP-520

Vincent Foley-Bourgon

Sable Lab McGill University

November 2014

slide-2
SLIDE 2

Agenda

◮ COMP-520
◮ Go
◮ My implementation
  ◮ Lexer gotchas
  ◮ Parser gotchas
◮ Recap

Questions welcome during presentation

2 / 47

slide-4
SLIDE 4

COMP-520

◮ Introduction to compilers
◮ Project-oriented
◮ Being updated
◮ One possible project: a compiler for Go
◮ Super fun, you should take it!

3 / 47

slide-5
SLIDE 5

Go

slide-6
SLIDE 6

Go

◮ Created by Unix old-timers (Ken Thompson, Rob Pike) who happen to work at Google
◮ Helps with issues they see at Google (e.g. complexity, compilation times)
◮ Imperative with some OO concepts
  ◮ Methods and interfaces
  ◮ No classes or inheritance
◮ Focus on concurrency (goroutines and channels)
◮ GC
◮ Simple, easy to remember semantics
◮ Open source

5 / 47

slide-7
SLIDE 7

Why Go for a compilers class?

◮ Language is simple
◮ Detailed online specification
◮ Encompasses all the classical compiler phases
◮ Allows students to work with a language that is quickly growing in popularity

6 / 47

slide-8
SLIDE 8

Current work

slide-9
SLIDE 9

My compiler

◮ Explore the implementation of Go
◮ Pinpoint the tricky parts
◮ Find a good subset
  ◮ Useful for writing programs
  ◮ Covers the important compiler topics
  ◮ Limits implementation drudgery

8 / 47

slide-10
SLIDE 10

Tools

◮ Language: OCaml 4.02
◮ Lexer generator: ocamllex (ships with OCaml)
◮ Parser generator: Menhir (LR(1), separate from OCaml)

9 / 47

slide-11
SLIDE 11

Why OCaml?

◮ Good lexer and parser generators
◮ Algebraic data types are ideal to create ASTs and other IRs
◮ Pattern matching is great for acting upon ASTs
◮ I like it!

10 / 47

slide-12
SLIDE 12

Lexer

slide-13
SLIDE 13

Lexer

◮ Written with ocamllex
◮ ∼270 lines of code
◮ Go spec gives all the necessary details
◮ One tricky part: automatic semi-colon insertion

12 / 47

slide-15
SLIDE 15

Semi-colons

What you write:

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        x := math.Sqrt(18)
        fmt.Println(x)
    }

What the parser expects:

    package main;

    import (
        "fmt";
        "math";
    );

    func main() {
        x := math.Sqrt(18);
        fmt.Println(x);
    };

14 / 47

slide-16
SLIDE 16

Semi-colons

When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line’s final token is

◮ an identifier
◮ a literal
◮ one of the keywords break, continue, fallthrough, or return
◮ one of the operators and delimiters ++, --, ), ], or }
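The rule above can be observed directly with Go's standard go/scanner package, which reports an automatically inserted semicolon as a SEMICOLON token whose literal is "\n". The sketch below is only an illustration in Go, not part of the OCaml implementation described in this talk; the helper name insertedSemicolons and the file name "example.go" are made up for the example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// insertedSemicolons scans src with Go's standard scanner and returns, for
// every automatically inserted semicolon, the token that preceded it.
func insertedSemicolons(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, comments skipped

	var before []string
	prev := ""
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// An inserted semicolon carries the literal "\n"; an explicit one carries ";".
		if tok == token.SEMICOLON && lit == "\n" {
			before = append(before, prev)
		}
		// Operators and delimiters have an empty literal, so fall back to the token name.
		if lit != "" {
			prev = lit
		} else {
			prev = tok.String()
		}
	}
	return before
}

func main() {
	src := "package main\nimport (\n\t\"fmt\"\n\t\"math\"\n)\n" +
		"func main() {\n\tx := math.Sqrt(18)\n\tfmt.Println(x)\n}\n"
	for _, t := range insertedSemicolons(src) {
		fmt.Printf("semicolon inserted after %s\n", t)
	}
}
```

For the sample program this should report seven insertion points, matching the seven semicolons in the "what the parser expects" version of the program.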

15 / 47

slide-17
SLIDE 17

Solution

    rule next_token = parse
      (* ... *)
      | "break" { T_break }
      | '\n'    { next_token lexbuf }

16 / 47

slide-19
SLIDE 19

Solution

    rule next_token = parse
      (* ... *)
      | "break" { yield lexbuf T_break }
      | '\n'    { if needs_semicolon lexbuf
                  then yield lexbuf T_semi_colon
                  else next_token lexbuf }
      | "//"    { line_comment lexbuf }

    and line_comment = parse
      | '\n'    { if needs_semicolon lexbuf
                  then yield lexbuf T_semi_colon
                  else next_token lexbuf }
      | _       { line_comment lexbuf }
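The same idea can be sketched as a small hand-written lexer in Go: remember the last token that was emitted, and decide at every newline (including the one that ends a line comment) whether a semicolon has to be produced. This is a toy illustration under simplifying assumptions (ASCII only, integer literals, no escapes in strings); the names lex and needsSemicolon are hypothetical and the ocamllex code above remains the actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords lists the Go keywords; only four of them trigger insertion.
var keywords = map[string]bool{
	"break": true, "case": true, "chan": true, "const": true, "continue": true,
	"default": true, "defer": true, "else": true, "fallthrough": true, "for": true,
	"func": true, "go": true, "goto": true, "if": true, "import": true,
	"interface": true, "map": true, "package": true, "range": true, "return": true,
	"select": true, "struct": true, "switch": true, "type": true, "var": true,
}

func isWordByte(c byte) bool {
	return c == '_' || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9')
}

// needsSemicolon mirrors the insertion rule: identifiers, literals, four
// keywords, and the tokens ++ -- ) ] } end a statement at a line break.
func needsSemicolon(last string) bool {
	switch last {
	case "":
		return false
	case "break", "continue", "fallthrough", "return", "++", "--", ")", "]", "}":
		return true
	}
	return (isWordByte(last[0]) || last[0] == '"') && !keywords[last]
}

// lex splits ASCII source into tokens, inserting ";" at qualifying line ends.
func lex(src string) []string {
	var toks []string
	last := ""
	emit := func(t string) {
		toks = append(toks, t)
		last = t
	}
	i := 0
	for i < len(src) {
		c, rest := src[i], src[i:]
		switch {
		case c == '\n':
			if needsSemicolon(last) {
				emit(";")
			}
			i++
		case c == ' ' || c == '\t' || c == '\r':
			i++
		case strings.HasPrefix(rest, "//"):
			// Skip the comment body but leave its newline for the case above.
			for i < len(src) && src[i] != '\n' {
				i++
			}
		case isWordByte(c):
			j := i
			for j < len(src) && isWordByte(src[j]) {
				j++
			}
			emit(src[i:j])
			i = j
		case c == '"':
			j := i + 1
			for j < len(src) && src[j] != '"' {
				j++
			}
			if j < len(src) {
				j++ // include the closing quote
			}
			emit(src[i:j])
			i = j
		case strings.HasPrefix(rest, "++"), strings.HasPrefix(rest, "--"), strings.HasPrefix(rest, ":="):
			emit(rest[:2])
			i += 2
		default:
			emit(string(c))
			i++
		}
	}
	if needsSemicolon(last) { // end of input is treated like a line end here
		emit(";")
	}
	return toks
}

func main() {
	fmt.Println(strings.Join(lex("x := f(1) // comment\nbreak\n\nfor {\n}\n"), " "))
}
```

Because an inserted ";" becomes the last token, a following blank line never produces a second semicolon, which is the same effect the "non-blank line" wording of the rule asks for.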

18 / 47

slide-20
SLIDE 20

Pause philosophique

Is Go lexically a regular language?

19 / 47

slide-21
SLIDE 21

Lexer

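Supports most of the Go specification, with these exceptions:

◮ Unicode characters are not allowed in identifiers
◮ No Unicode support in char and string literals
◮ The second semi-colon insertion rule (a semi-colon may be omitted before a closing ")" or "}") is not supported, so the semi-colon must be written out:

    func () int { return 42; }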

20 / 47

slide-22
SLIDE 22

Parser

slide-23
SLIDE 23

Parser & AST

◮ Parser written with Menhir
◮ Parser: ∼600 lines of code (incomplete)
◮ AST: ∼200 lines of code
◮ Some constructs are particularly tricky!

22 / 47

slide-28
SLIDE 28

Tricky construct #1: function parameters

    func substr(string, int, int)                   // unnamed arguments
    func substr(str string, start int, length int)  // named arguments, long form
    func substr(str string, start, length int)      // named arguments, short form
    func substr(string, start, length int)          // three parameters of type int
    func substr(str string, start int, int)         // syntax error

23 / 47

slide-29
SLIDE 29

Tricky construct #1: function parameters

    func substr(string, int, int)           // unnamed arguments
    func substr(string, start, length int)  // three parameters of type int

24 / 47

slide-31
SLIDE 31

Tricky construct #1: function parameters

How do we tell named parameters from unnamed ones?

◮ Read a list of entries, each either "type" or "identifier type"
◮ Process the list to see whether every entry is a lone type or at least one entry is "identifier type"
◮ Generate the correct AST nodes (i.e. ParamUnnamed(type) or ParamNamed(id, type))

Only named parameters for the project.
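The two-pass approach described above can be sketched in Go as follows. This is a simplified illustration, not the Menhir grammar actions from the talk: types are plain strings, and the names entry, param and resolve are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is one comma-separated item of a parameter list as first read by the
// parser: a lone word ("int", "start") or a pair ("str string").
type entry struct {
	first  string // a type, or a name when second is set
	second string // the type of a pair; empty for a lone word
}

// param is a resolved parameter; name is empty for unnamed parameters.
type param struct {
	name, typ string
}

// resolve turns the raw entries into parameters once the whole list is known.
func resolve(entries []entry) ([]param, error) {
	named := false
	for _, e := range entries {
		if e.second != "" {
			named = true
		}
	}
	params := make([]param, len(entries))
	if !named {
		// Every entry is a lone word, so every entry is a type.
		for i, e := range entries {
			params[i] = param{typ: e.first}
		}
		return params, nil
	}
	// At least one pair: every lone word is a name. Walk right to left so
	// each name picks up the type of the nearest pair that follows it.
	typ := ""
	for i := len(entries) - 1; i >= 0; i-- {
		e := entries[i]
		if e.second != "" {
			typ = e.second
		}
		if typ == "" {
			return nil, errors.New("syntax error: mixed named and unnamed parameters")
		}
		params[i] = param{name: e.first, typ: typ}
	}
	return params, nil
}

func main() {
	// func substr(string, start, length int)
	fmt.Println(resolve([]entry{{"string", ""}, {"start", ""}, {"length", "int"}}))
	// func substr(str string, start int, int)
	fmt.Println(resolve([]entry{{"str", "string"}, {"start", "int"}, {"int", ""}}))
}
```

The first call yields three parameters of type int, one of them named string; the second is rejected because the trailing lone word has no type to attach to.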

25 / 47

slide-32
SLIDE 32

Tricky construct #2: Calls, conversions and built-ins

From the Go FAQ: [...] Second, the language has been designed to be easy to analyze and can be parsed without a symbol table.

26 / 47

slide-35
SLIDE 35

Tricky construct #2: Calls, conversions and built-ins

Type conversions in Go look like function calls:

    int(3.2) // type conversion
    fib(24)  // function call ... probably

◮ It depends: is fib a type?
◮ How do we generate the proper AST node?
◮ We need to keep track of identifiers in scope, i.e. a symbol table
◮ More complex parsing:

    call ::= expr or type '(' expr* ')'    e.g. []*int(z)
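To make the "it depends" concrete, here is a minimal sketch of a scoped symbol table that decides whether a call-shaped node is a conversion or a function call. It is an assumption-laden illustration in Go: the types kind and scope and the function classify are invented for this example and do not come from the compiler presented here.

```go
package main

import "fmt"

// kind records what an identifier denotes in a scope.
type kind int

const (
	undefined kind = iota
	typeName
	funcName
)

// scope is one level of a hypothetical symbol table; inner scopes shadow outer ones.
type scope struct {
	symbols map[string]kind
	parent  *scope
}

func newScope(parent *scope) *scope {
	return &scope{symbols: map[string]kind{}, parent: parent}
}

// lookup searches the innermost scope first, then its parents.
func (s *scope) lookup(name string) kind {
	for sc := s; sc != nil; sc = sc.parent {
		if k, ok := sc.symbols[name]; ok {
			return k
		}
	}
	return undefined
}

// classify decides what the generic node callee(args) means in scope s.
func classify(s *scope, callee string) string {
	switch s.lookup(callee) {
	case typeName:
		return "conversion"
	case funcName:
		return "call"
	default:
		return "undefined"
	}
}

func main() {
	universe := newScope(nil)
	universe.symbols["int"] = typeName

	file := newScope(universe)
	file.symbols["fib"] = funcName
	fmt.Println(classify(file, "int"), classify(file, "fib"))

	// A local declaration such as "type fib int" shadows the function.
	local := newScope(file)
	local.symbols["fib"] = typeName
	fmt.Println(classify(local, "fib"))
}
```

The shadowing case is why the parser alone cannot settle the question: the same text fib(24) is a call in one scope and a conversion in another.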

27 / 47

slide-37
SLIDE 37

Tricky construct #2: Calls, conversions and built-ins

Built-ins also look like function calls:

    xs := make([]int, 3) // [0, 0, 0]
    xs = append(xs, 1)   // [0, 0, 0, 1]
    len(xs)              // 4

What’s different?

◮ The first parameter of a built-in can be a type
◮ call ::= expr or type '(' expr or type* ')'
◮ Very difficult to get right: expr and type conflict (i.e. identifier)
  ◮ Factor the type non-terminals (expr-term-factor)
◮ AST “pollution”

29 / 47

slide-38
SLIDE 38

Tricky construct #2: Calls, conversions and built-ins

Specialized node:

    FunCall
      Id("fib")
      Int(24)

Generalized node (each child is tagged Expr or Type):

    Call
      Expr: Id("fib")
      Expr: Int(24)

30 / 47

slide-39
SLIDE 39

Tricky construct #2: Calls, conversions and built-ins

    Call
      Expr: Id("make")
      Type: Slice(Id("int"))
      Expr: Int(3)

    Call
      Expr: Id("int")
      Expr: Float(3.2)

    Call
      Type: Ptr(int)
      Expr: Id("ptr")
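For comparison, Go's own go/parser package also uses a single call node for all three situations: function calls, conversions and built-ins all come back as *ast.CallExpr, and a type may appear in an argument position. The short program below shows this with parser.ParseExpr; the helper describe is a made-up name for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// describe parses src as an expression and reports the syntactic category of
// the callee and of the first argument of the resulting call node.
func describe(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	call, ok := expr.(*ast.CallExpr)
	if !ok || len(call.Args) == 0 {
		return "", fmt.Errorf("%q is not a call with arguments", src)
	}
	return fmt.Sprintf("fun=%T arg0=%T", call.Fun, call.Args[0]), nil
}

func main() {
	for _, src := range []string{"fib(24)", "int(3.2)", "make([]int, 3)"} {
		d, err := describe(src)
		fmt.Println(src, "->", d, err)
	}
}
```

The parser cannot tell fib(24) from int(3.2), so both produce the same shape; deciding which one is a conversion is left to a later phase that has scope information.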

31 / 47

slide-41
SLIDE 41

Pause philosophique

What does it mean to parse a language?

◮ For theorists: does a sequence of symbols belong to a language?
◮ For compiler writers: can I generate a semantically-precise AST from this sequence of symbols?

32 / 47

slide-43
SLIDE 43

Tricky construct #3: chan directionality

◮ chan int: channel of ints
◮ chan<- int: send-only channel of ints
◮ <-chan int: receive-only channel of ints

What is chan <- chan int?

    chan<- (chan int)
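The answer can be checked against the Go toolchain with the reflect package: the outer channel type should be send-only and its element type an ordinary bidirectional channel. This is a small verification sketch; the helper names outerIsSendOnly and elemIsBidirectional are invented for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// outerIsSendOnly reports whether v's type is a send-only channel.
func outerIsSendOnly(v interface{}) bool {
	t := reflect.TypeOf(v)
	return t.Kind() == reflect.Chan && t.ChanDir() == reflect.SendDir
}

// elemIsBidirectional reports whether v's type is a channel whose element
// type is itself a bidirectional channel.
func elemIsBidirectional(v interface{}) bool {
	t := reflect.TypeOf(v)
	if t.Kind() != reflect.Chan {
		return false
	}
	e := t.Elem()
	return e.Kind() == reflect.Chan && e.ChanDir() == reflect.BothDir
}

func main() {
	var c chan<- chan int // the type in question; a nil value is enough
	fmt.Println(outerIsSendOnly(c), elemIsBidirectional(c))

	var d chan (<-chan int) // the other grouping, written with parentheses
	fmt.Println(outerIsSendOnly(d), elemIsBidirectional(d))
}
```

The arrow binds to the leftmost chan, so the unparenthesized form is a send-only channel carrying bidirectional channels; the other grouping has to be requested with explicit parentheses.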

33 / 47

slide-44
SLIDE 44

Tricky construct #3: chan directionality

How do we implement this?

◮ Apply expr-term-factor factorization to types
◮ No PEDMAS for types
◮ Remember expr or type? Fun times :/
◮ Complicates parser

34 / 47

slide-45
SLIDE 45

Semantics

slide-46
SLIDE 46

Semantics - interfaces

A type implicitly implements an interface if it has the right methods.

    package main

    import "fmt"

    type Point struct{ x, y int }

    type Summable interface {
        Sum() int
    }

    func (p Point) Sum() int { return p.x + p.y }

    func Test(s Summable) { fmt.Println(s.Sum()) }

    func main() {
        p := Point{3, 4}
        Test(p) // Point never declares that it implements Summable
    }
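Inside a type checker, implicit implementation boils down to comparing method sets. The sketch below shows one simple way this could be done, assuming method signatures are represented as plain strings; methodSet and implements are hypothetical names, and a real checker would compare structured types instead.

```go
package main

import "fmt"

// methodSet maps a method name to its signature, kept as a string for simplicity.
type methodSet map[string]string

// implements reports whether typ has every method that iface requires,
// with an identical signature.
func implements(typ, iface methodSet) bool {
	for name, want := range iface {
		got, ok := typ[name]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	point := methodSet{"Sum": "func() int"}
	summable := methodSet{"Sum": "func() int"}
	stringer := methodSet{"String": "func() string"}

	fmt.Println(implements(point, summable)) // Point has Sum() int
	fmt.Println(implements(point, stringer)) // Point has no String method
}
```

An empty interface has no required methods, so with this definition every type implements it, which matches Go's behaviour.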

36 / 47

slide-47
SLIDE 47

Semantics - constants

Go does not perform automatic type conversions:

    var x int = 15     // OK
    var y float64 = x  // Error

Constants are “untyped” however:

    var x int = 15      // OK
    var y float64 = 15  // OK
    var z int = 3.14    // Error

Constants are high-precision:

    const Pi = 3.1415926535897932384626433832795028841971693993751
    const HalfPi = Pi / 2 // Also high-precision
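A short runnable example makes both properties visible: untyped constants take their type from context, and constant arithmetic is exact even when intermediate values fit in no machine type. The names Big, Small, toFloat and fifteen are chosen for this illustration only.

```go
package main

import "fmt"

const (
	Big   = 1 << 100  // too large for any integer type, fine as an untyped constant
	Small = Big >> 98 // evaluated exactly at compile time
)

// toFloat shows that a variable needs an explicit conversion.
func toFloat(x int) float64 {
	return float64(x) // "return x" would be a compile-time type error
}

// fifteen shows that the untyped constant 15 takes its type from context.
func fifteen() float64 {
	return 15
}

func main() {
	fmt.Println(Small, toFloat(15), fifteen())
}
```

A compiler that simplifies constants by giving every literal a fixed type immediately loses both behaviours, which is the trade-off behind treating constants as a candidate for simplification in a course subset.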

37 / 47

slide-48
SLIDE 48

Status

slide-49
SLIDE 49

Status - Lexer

◮ As complete as I want it at the moment
◮ Don’t intend to add Unicode support

39 / 47

slide-50
SLIDE 50

Status - Parser

◮ Lacks support for many constructs (e.g. type switches, implicitly initialized consts, chan directionality, etc.)
◮ Some cheating to make productions easier (e.g. disallow parenthesized types in expr or type)

40 / 47

slide-51
SLIDE 51

Status - AST

◮ In the process of simplifying the AST (e.g. generalized call node)
◮ Create a new, semantically richer AST that will be a result of type checking: type and scope information will be used to create appropriate nodes

41 / 47

slide-52
SLIDE 52

Status - Semantic analysis

◮ On-going, currently doing basic types
◮ Scrapped some phases that turned out to be unnecessary
◮ Cheating by simplifying the rules of constants (e.g. var x float64 = 3 is a type error)
◮ Haven’t started thinking about interface types yet; maybe leave them out

42 / 47

slide-53
SLIDE 53

Status - Code generation

◮ Not started at the moment
◮ Probably going to target JS or C
◮ JS: easy support for closures and multi-value returns

43 / 47

slide-54
SLIDE 54

Misc.

◮ How to support a mini stdlib?
◮ GC? Allocate and forget, that’s my motto!

44 / 47

slide-55
SLIDE 55

Language subset

slide-56
SLIDE 56

Language subset

◮ No concurrency support (channels, goroutines, select)
◮ Simplify some of the syntax (unnamed parameters)
◮ Eliminate “exotic” features (complex numbers, iota)
◮ Simplify constants
◮ Eliminate methods and interfaces
◮ No GC

46 / 47

slide-57
SLIDE 57

Questions?