
Compiler Construction, October 31, 2018 (PowerPoint presentation)

Mayer Goldberg, Ben-Gurion University

Chapter 2: Goals, Agenda


  1. Delimiters in various languages
     ▶ C & Scheme: Spaces, tabs, newlines, carriage returns, and form feeds are examples of whitespace.
     ▶ Java: Literal newline characters may not occur inside a string literal (you must use \n). Otherwise, similar to C & Scheme.
     ▶ Python: Leading tabs are not mere whitespace, because they have a clear syntactic function: they denote the nesting level.

  2. Concrete vs Abstract syntax
     Artifacts of the concrete syntax:
     ▶ Delimiters & whitespace
     ▶ Parentheses, brackets, braces, and other grouping and nesting mechanisms (e.g., begin...end)
     Re-examine the concrete and abstract syntax for the factorial function, and notice what's gone!

  3. Concrete vs Abstract syntax (continued)
     The concrete syntax:
       (define fact
         (lambda (n)
           (if (zero? n)
               1
               (* n (fact (- n 1))))))
     The abstract syntax:
     [Figure: the abstract syntax tree for fact]
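As a rough illustration of what is left once the concrete syntax is gone, the following OCaml sketch encodes the abstract syntax of fact as a value of a small variant type. The type `expr`, its constructors, and the helper `size` are hypothetical names chosen for this example, not the course's definitions. No parentheses, whitespace, or other grouping artifacts survive in the tree; only the structure does.

```ocaml
(* A hypothetical abstract syntax for a tiny Scheme subset. *)
type expr =
  | Const of int                     (* numeric literal, e.g. 1 *)
  | Var of string                    (* variable reference, e.g. n *)
  | If of expr * expr * expr         (* (if test then else) *)
  | Applic of expr * expr list       (* (f arg1 arg2 ...) *)
  | Lambda of string list * expr     (* (lambda (params ...) body) *)
  | Define of string * expr          (* (define name value) *)

(* The AST for:
   (define fact (lambda (n) (if (zero? n) 1 (* n (fact (- n 1)))))) *)
let fact_ast =
  Define ("fact",
    Lambda (["n"],
      If (Applic (Var "zero?", [Var "n"]),
          Const 1,
          Applic (Var "*",
                  [Var "n";
                   Applic (Var "fact",
                           [Applic (Var "-", [Var "n"; Const 1])])]))))

(* Count the nodes of an AST, to show that the tree can be traversed
   without any reference to the original characters. *)
let rec size = function
  | Const _ | Var _ -> 1
  | If (t, c, a) -> 1 + size t + size c + size a
  | Applic (f, args) ->
      List.fold_left (fun acc e -> acc + size e) (1 + size f) args
  | Lambda (_, body) -> 1 + size body
  | Define (_, e) -> 1 + size e
```

Here `size fact_ast` evaluates to 16: the 16 nodes of the tree, with none of the 12 pairs of parentheses of the concrete syntax.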

  4. The pipeline of the compiler
     Basic concepts:
     🗹 Concrete syntax
     🗹 Abstract syntax
     🗹 Abstract Syntax-Tree (AST)
     🗹 Token
     🗹 Delimiter
     🗹 Whitespace

  5. More on parsing
     To parse computer programs in a given language, we rely on:
     ▶ Grammars with which to express the syntax of the language
        ▶ There are different kinds of grammars (CFG, CSG, two-level, etc.)
        ▶ There are different languages for expressing syntax in a grammar (e.g., BNF)
     ▶ Algorithms for parsing programs as per kind of grammar
     ▶ Techniques (e.g., parsing combinators, DCGs)
     Parser generator: Takes a description of the grammar for a language, and generates a parser. For example, yacc, bison, nearley, etc.
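To give a feel for the parsing-combinator technique listed above, here is a minimal OCaml sketch. It assumes that a parser is a function from a list of characters to a parsed value plus the remaining characters, and that failure is signalled by raising an exception; the names `No_match`, `char_pred`, `star`, `plus`, `pack`, `digit`, and `nat` are hypothetical and do not refer to any particular library.

```ocaml
(* A parser consumes a prefix of the input and returns (value, rest),
   or raises No_match if it cannot parse. *)
exception No_match

type 'a parser = char list -> 'a * char list

(* Parse one character satisfying a predicate. *)
let char_pred (pred : char -> bool) : char parser = function
  | c :: rest when pred c -> (c, rest)
  | _ -> raise No_match

(* Zero or more repetitions of p (Kleene star); never fails. *)
let rec star (p : 'a parser) : 'a list parser = fun s ->
  match p s with
  | (x, rest) ->
      let (xs, rest') = star p rest in
      (x :: xs, rest')
  | exception No_match -> ([], s)

(* One or more repetitions of p. *)
let plus (p : 'a parser) : 'a list parser = fun s ->
  let (x, rest) = p s in
  let (xs, rest') = star p rest in
  (x :: xs, rest')

(* Transform the value produced by a parser. *)
let pack (p : 'a parser) (f : 'a -> 'b) : 'b parser = fun s ->
  let (x, rest) = p s in
  (f x, rest)

(* A natural number: one or more decimal digits, folded into an int. *)
let digit = char_pred (fun c -> '0' <= c && c <= '9')
let nat : int parser =
  pack (plus digit)
    (List.fold_left (fun acc c -> 10 * acc + (Char.code c - Char.code '0')) 0)

(* Helper: convert a string to a list of characters. *)
let chars_of_string s = List.init (String.length s) (String.get s)
```

For example, `nat (chars_of_string "123abc")` returns `(123, ['a'; 'b'; 'c'])`. Each combinator is an ordinary function, so larger parsers are built by composing smaller ones rather than by generating code from a grammar file.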

  6. The pipeline of the compiler (continued)
     [Figure: the compiler pipeline. chars → Scanner → tokens → Reader → sexprs → Tag-Parser → ASTs → Semantic Analyser → ASTs → Code Generator → asm / mach lang. The Reader and the Tag-Parser together make up the Parser.]
     Scanning
     ▶ Going from characters to tokens
     ▶ Identifying & grouping characters into tokens for words, numbers, strings, etc.
     ▶ Parsing over tokens is more efficient than parsing over characters
        ☞ As the parser examines various ways to parse the code, the parser can avoid re-identifying and re-building complex tokens such as numbers, strings, etc.
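The scanning step can be sketched in OCaml as follows. This is a hypothetical, much-simplified scanner: the `token` type and the `scan` function are illustrative names, and only parentheses, unsigned decimal integers, and symbols are recognized (no strings, characters, or comments).

```ocaml
(* Tokens for a tiny, hypothetical subset of Scheme. *)
type token =
  | LParen
  | RParen
  | Number of int
  | Symbol of string

let is_whitespace c = c = ' ' || c = '\t' || c = '\n' || c = '\r'
let is_digit c = '0' <= c && c <= '9'
let is_delimiter c = is_whitespace c || c = '(' || c = ')'

(* True if the (non-empty) word consists only of decimal digits. *)
let all_digits w =
  let rec loop k = k >= String.length w || (is_digit w.[k] && loop (k + 1)) in
  loop 0

(* Group characters into tokens; whitespace only separates tokens. *)
let scan (s : string) : token list =
  let n = String.length s in
  (* Index of the first delimiter at or after position i. *)
  let rec word_end i =
    if i < n && not (is_delimiter s.[i]) then word_end (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | c when is_whitespace c -> go (i + 1) acc
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | _ ->
          let j = word_end i in
          let word = String.sub s i (j - i) in
          let tok =
            if all_digits word then Number (int_of_string word) else Symbol word
          in
          go j (tok :: acc)
  in
  go 0 []
```

For instance, `scan "(- n 1)"` yields `[LParen; Symbol "-"; Symbol "n"; Number 1; RParen]`. Once the number `1` has been turned into `Number 1`, a parser that backtracks over this input never has to re-read its digits.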

  7. The pipeline of the compiler (continued)
     Reading
     ▶ In LISP/Prolog, the parser is split into two components:
        ▶ The reader, or the parser for the data language
        ▶ The tag-parser, or the parser for the source code
     ▶ In LISP/Scheme/Racket/Clojure/etc., the abstract syntax for the data is the concrete syntax for the code
     ▶ In Prolog, the abstract syntax for the data is the abstract syntax for the code
        ▶ Prolog is the programming language with the most powerful capabilities of reflection, i.e., code examining and working with itself.

  8. The pipeline of the compiler (continued)
     Reading: Summary
     ▶ In programming languages in which the syntax of code is not a part of the syntax of data, concrete syntax is given as a stream of characters
     ▶ In programming languages in which the syntax of code is part of the syntax of data, things are a bit more complex:
        ▶ The concrete syntax of data is a stream of characters
        ▶ The concrete syntax of code is the abstract syntax of the data
     ▶ In Scheme, the language of data is called S-expressions (more on this, later)

  9. The pipeline of the compiler (continued)
     Tag-Parsing
     ▶ The tag-parser takes sexprs and returns [ASTs for] exprs
     ▶ Languages other than from the LISP & Prolog families do not split parsing into a reader & tag-parser
     ▶ In such languages, parsing goes directly from tokens to [ASTs for] expressions
     ☞ Every valid program "used to be" [i.e., before tag-parsing] a valid sexpr
     ☞ Not every valid sexpr is a valid program!
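To illustrate why not every valid sexpr is a valid program, here is a much-reduced tag-parser sketch in OCaml. It assumes hypothetical `sexpr` and `expr` types covering only numbers, symbols, pairs, `if`, and applications; real Scheme has many more special forms. The sexpr `(if 1)` is perfectly good data, yet the tag-parser rejects it.

```ocaml
(* A hypothetical, reduced S-expression type (the reader's output). *)
type sexpr =
  | Nil
  | Number of int
  | Symbol of string
  | Pair of sexpr * sexpr

(* A hypothetical, reduced AST type (the tag-parser's output). *)
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Applic of expr * expr list

exception Syntax_error of string

(* Build a proper list from an OCaml list (handy for writing examples). *)
let rec list_to_sexpr = function
  | [] -> Nil
  | x :: xs -> Pair (x, list_to_sexpr xs)

(* Convert a proper list of sexprs into an OCaml list. *)
let rec list_of_sexpr = function
  | Nil -> []
  | Pair (a, d) -> a :: list_of_sexpr d
  | _ -> raise (Syntax_error "improper list")

(* The tag-parser: dispatch on the "tag" (first element) of each form. *)
let rec tag_parse (s : sexpr) : expr =
  match s with
  | Number n -> Const n
  | Symbol "if" -> raise (Syntax_error "reserved word if")
  | Symbol x -> Var x
  | Pair (Symbol "if", rest) ->
      (match list_of_sexpr rest with
       | [t; c; a] -> If (tag_parse t, tag_parse c, tag_parse a)
       | _ -> raise (Syntax_error "malformed if"))
  | Pair (f, args) ->
      Applic (tag_parse f, List.map tag_parse (list_of_sexpr args))
  | Nil -> raise (Syntax_error "empty application ()")
```

Both `(if 1 2 3)` and `(if 1)` come out of the reader as sexprs; only the first becomes an `expr`, while the second raises `Syntax_error`.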

  10. The pipeline of the compiler (continued)
     Question: A parser should:
     ✗ Perform optimizations
     ✗ Evaluate expressions
     ✗ Raise type-mismatch errors
     ✗ Find potential runtime errors (null-pointer dereferences, array-index errors, etc.)
     ✓ Validate the structure of input programs against a syntactic specification

  11. The pipeline of the compiler (continued)
     Question: Using an AST, it is impossible to:
     ✗ Perform code reformatting/beautification/style-checking
     ✗ Perform optimizations
     ✗ Output a new program which is semantically equivalent to the input program (code generation)
     ✗ Refactor the input program
     ✓ Generate a list of all the comments in the code

  12. The pipeline of the compiler (continued)
     Semantic Analysis
     ▶ Annotate the ASTs
     ▶ Compute addresses
     ▶ Annotate tail-calls
     ▶ Type-check code
     ▶ Perform optimizations
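One of the annotations listed above, marking tail calls, can be sketched as a pass from ASTs to ASTs. The sketch assumes a hypothetical `expr` type in which `ApplicTP` marks an application in tail position; the function name `annotate_tc` is likewise illustrative. The rules encoded: a lambda body is in tail position, the branches of an `if` inherit the position of the `if`, and the test, operator, and operands are never in tail position.

```ocaml
type expr =
  | Const of int
  | Var of string
  | If of expr * expr * expr
  | Lambda of string list * expr
  | Applic of expr * expr list
  | ApplicTP of expr * expr list   (* application in tail position *)

(* in_tail: is e in tail position within its enclosing lambda? *)
let rec annotate_tc (in_tail : bool) (e : expr) : expr =
  match e with
  | Const _ | Var _ -> e
  | If (t, c, a) ->
      If (annotate_tc false t, annotate_tc in_tail c, annotate_tc in_tail a)
  | Lambda (params, body) -> Lambda (params, annotate_tc true body)
  | Applic (f, args) | ApplicTP (f, args) ->
      let f' = annotate_tc false f in
      let args' = List.map (annotate_tc false) args in
      if in_tail then ApplicTP (f', args') else Applic (f', args')

(* The lambda from fact:
   (lambda (n) (if (zero? n) 1 (* n (fact (- n 1))))) *)
let fact_body =
  Lambda (["n"],
    If (Applic (Var "zero?", [Var "n"]),
        Const 1,
        Applic (Var "*",
                [Var "n";
                 Applic (Var "fact", [Applic (Var "-", [Var "n"; Const 1])])])))
```

Running `annotate_tc false fact_body` marks only the application of `*` as a tail call; the recursive call to `fact` is an operand of `*`, so it stays an ordinary `Applic`.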

  13. The pipeline of the compiler (continued)
     Code Generation
     ▶ Generate a stream of instructions in
        ▶ assembly language
        ▶ machine language
        ▶ some other target language…
     ▶ Build executable
     ▶ Perform low-level optimizations

  14. The compiler for the course
     Our compiler project
     ▶ Written in ocaml
     ▶ Support a subset of Scheme + extensions
     ▶ Support two simple optimizations
     ▶ Compile to x86/64
     ▶ Run on Linux
     What our project shall lack
     ▶ Support for the full language of Scheme
     ▶ Support for garbage collection
     ▶ Self-compilation

  15. S-expressions
     ▶ We're going to learn about syntax by studying the syntax of Scheme
        ▶ After all, we're writing a Scheme compiler…
        ▶ It's relatively simple, compared to the syntax of C, Java, Python, and many other languages
        ▶ It comes with some interesting twists
     ▶ Scheme comes with two languages:
        ▶ A language for code
        ▶ A language for data
       and there's a tricky relationship between the two.
     ▶ The key to understanding the syntax of Scheme is to think about data

  16. The Language of Data
     What is a language of data? A language in which to
     ▶ Describe arbitrarily-complex data
        ▶ Possibly multi-dimensional, deeply nested
        ▶ Polymorphic
     ▶ Access components easily and efficiently

  17. The Language of Data (continued)
     Today many languages of data are known:
     ▶ S-expressions (the first: 1959)
     ▶ Functors (1972)
     ▶ Datalog (1977)
     ▶ SGML (1986)
     ▶ MS DDE (1987)
     ▶ CORBA (1991)
     ▶ MS COM (1993)
     ▶ MS DCOM (1996)
     ▶ XML (1996)
     ▶ JSON (2001)

  18. The Language of Data (continued)
     What makes S-expressions and Functors unique?
     ▶ They're the first… 😊
     ▶ They're supported natively, as part of specific programming languages
        ▶ S-expressions are supported by LISP-based languages, including Scheme & Racket
        ▶ Functors are supported by Prolog-based languages
     ▶ The language of programming is a [strict] subset of the language of data

  19. The Language of Data (continued)
     Think for a moment about the language of XML:
     ▶ It's not supported natively by any programming language
     ▶ Most modern languages (Java, Python, etc.) support it via libraries
     ▶ No programming language is written in XML:
         <package name="Foo">
           <class name="Foo">
             <method name="goo">
               <something>...</something>, etc.
               ...
             </method>
           </class>
         </package>
       This would be cumbersome, and weird!

  20. The Language of Data (continued)
     However, if some programming language both
     ▶ Supported XML as its data language
     ▶ Were itself written in XML
     Then a parser for XML could also read programs written in that language:
     ▶ Writing interpreters, compilers, and other language-tools would have been much simpler!
     ▶ Reflection (code examining code) would be simple

  21. The Language of Data (continued)
     This is the case with S-expressions:
     ▶ They are the data language for LISP-based languages, including Scheme
     ▶ LISP-based languages are written using S-expressions
     ▶ Writing interpreters and compilers in LISP-based languages is much simpler than in other languages
     ▶ Computational reflection was invented in LISP!
     ▶ This is the real reason behind all these parentheses in Scheme:
        ▶ A very simple language
        ▶ Supports core types: pairs, vectors, symbols, strings, numbers, booleans, the empty list, etc.
        ▶ A syntactic compromise that is great for expressing both code and data

  22. S-expressions (continued)
     Back to S-expressions
     ▶ S-expressions were invented along with LISP, in 1959
     ▶ S-expression stands for Symbolic Expression
     ▶ The term was intended to distinguish them from numerical expressions
     ▶ Before LISP (and long after it was invented), most computation concerned itself with numbers
     ▶ Computer languages were great at "crunching numbers", but working with non-numeric data types was difficult
        ▶ String libraries were non-standard and uncommon
        ▶ Polymorphic data was unheard of
        ▶ Nested data structures needed to be implemented from scratch, usually with arrays of characters and/or integers…

  23. S-expressions (continued)
     Back to S-expressions
     Then S-expressions were invented as part of a very dynamic programming language (LISP):
     ▶ Working with data structures became considerably simpler
        ▶ Trivially allocated (no pointer arithmetic)
        ▶ Polymorphic (lists of lists of numbers and strings and vectors of booleans and…)
        ▶ Easy to access sub-structures (no pointer arithmetic)
        ▶ Easy to modify (in an easygoing, functional style)
        ▶ Easy to redefine
        ▶ Automatically deallocated (garbage collection)
     ▶ Treating code as data became considerably simpler

  24. S-expressions (continued)
     Several fields were invented using LISP and its tools:
     ▶ Symbolic Mathematics (Macsyma, a precursor to Wolfram Mathematica)
     ▶ Artificial Intelligence
     ▶ Computer adventure game generation languages (MDL, ZIL)

  25. S-expressions (continued)
     Definition: S-expressions
     The language is made up of
     ▶ The empty list: ()
     ▶ Booleans: #f, #t
     ▶ Characters: #\a, #\Z, #\space, #\return, #\x05d0, etc.
     ▶ Strings: "abc", "Hello\nWorld\t\x05d0;hi!", etc.
     ▶ Numbers: -23, #x41, 2/3, 2-3i, 2.34, -2.34+3.5i
     ▶ Symbols: abc, lambda, define, fact, list->string
     ▶ Pairs: (a . b), (a b c), (a (2 . #f) "moshe")
     ▶ Vectors: #(), #(a b ((1 . 2) #f) "moshe")
     Traditionally, non-pairs are known as atoms.
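Since our compiler is written in ocaml, the reader will eventually need an OCaml type for S-expressions. The following is only a hypothetical sketch of such a type, with one constructor per category above; numbers are simplified to integers and floats (fractions and complex numbers are omitted), and characters are limited to 8-bit OCaml `char`s. The names `sexpr`, `is_atom`, and `example` are illustrative.

```ocaml
(* A hypothetical OCaml representation of S-expressions. *)
type sexpr =
  | Nil                          (* () *)
  | Bool of bool                 (* #f, #t *)
  | Char of char                 (* #\a, #\Z; 8-bit only in this sketch *)
  | String of string             (* "abc" *)
  | Int of int                   (* -23, #x41 *)
  | Float of float               (* 2.34 *)
  | Symbol of string             (* abc, lambda, list->string *)
  | Pair of sexpr * sexpr        (* (a . b) *)
  | Vector of sexpr list         (* #(a b c) *)

(* Traditionally, every non-pair is an atom. *)
let is_atom = function
  | Pair _ -> false
  | _ -> true

(* The pair (a (2 . #f) "moshe"), written out as nested pairs. *)
let example =
  Pair (Symbol "a",
        Pair (Pair (Int 2, Bool false),
              Pair (String "moshe", Nil)))
```

Note that there is no list constructor: as in Scheme, a list is just nested `Pair`s ending in `Nil`.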

  26. S-expressions (continued)
     Proper & improper lists
     ▶ The name LISP comes from LISt Processing.
     ▶ In fact, LISP has no direct support for lists.
     ▶ LISP has ordered pairs
        ▶ Ordered pairs are created using cons
        ▶ The first and second projections over ordered pairs are car and cdr.
        ▶ For all x, y:
           ▶ (car (cons x y)) ≡ x
           ▶ (cdr (cons x y)) ≡ y
        ▶ The ordered pair of x and y can be written as (x . y)

  27. S-expressions (continued)
     The dot rules
     Two rules govern how ordered pairs are printed:
     ▶ Rule 1: For any E, the ordered pair (E . ()) is printed as (E), which looks like a list of 1 element.
     ▶ Rule 2: For any E1, E2, …, the ordered pair (E1 . (E2 …)) is printed as (E1 E2 …)
     ▶ These rules only affect how pairs are printed
     ▶ These rules give us a canonical representation for pairs
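The two dot rules translate directly into a printer. The sketch below assumes a hypothetical, reduced `sexpr` type with only `Nil`, symbols, and pairs; `to_string` prints a whole S-expression, and `print_tail` prints whatever follows the car of a pair inside the same parentheses.

```ocaml
type sexpr =
  | Nil
  | Symbol of string
  | Pair of sexpr * sexpr

(* Print an S-expression using the dot rules. *)
let rec to_string (s : sexpr) : string =
  match s with
  | Nil -> "()"
  | Symbol x -> x
  | Pair (car, cdr) -> "(" ^ to_string car ^ print_tail cdr ^ ")"

(* Print the cdr of a pair, without opening new parentheses. *)
and print_tail (cdr : sexpr) : string =
  match cdr with
  | Nil -> ""                                            (* Rule 1 *)
  | Pair (car, cdr') -> " " ^ to_string car ^ print_tail cdr'   (* Rule 2 *)
  | atom -> " . " ^ to_string atom          (* improper end: keep the dot *)
```

With this printer, `(a . (b . c))` comes out as `(a b . c)` and `(a . ())` as `(a)`. A dot is printed only when the rightmost cdr is not `()`.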

  28. S-expressions (continued)
     Example
     ▶ The pair (a . (b . c)) is printed as (a b . c)
     [Figure: box diagram. The outer pair has car a and, as its cdr, a second pair whose car is b and whose cdr is c.]

  29. S-expressions (continued)
     Example
     ▶ The pair ((a . (b . ())) . ((c . (d . ())) . ())) is printed as ((a b) (c d))
     [Figure: box diagram. The outer pair's car is the list (a b); its cdr is a pair whose car is the list (c d) and whose cdr is ().]

  30. S-expressions (continued)
     ▶ Lists in Scheme can come in two forms, proper lists and improper lists.
     ▶ When we just speak of lists, we usually mean proper lists.
     ▶ Most of the list-processing functions (length, map, etc.) work with proper lists.

  31. S-expressions (continued)
     Proper lists
     ▶ Proper lists are nested ordered pairs the rightmost cdr of which is the empty list (aka nil)
     ▶ Testing for pairs is cheap, and is done by means of the builtin predicate pair?
     ▶ Testing for lists is expensive, since it traverses nested, ordered pairs, until it reaches their rightmost cdr. This is done by means of the builtin predicate list?

  32. S-expressions (continued)
     Proper lists
     Here's a definition for list?:
       (define list?
         (lambda (e)
           (or (null? e)
               (and (pair? e)
                    (list? (cdr e))))))

  33. S-expressions (continued)
     Improper lists
     ▶ Pairs that are not proper lists are improper lists.
     ▶ Improper lists end with a rightmost cdr that is not nil
     ▶ List-processing procedures such as length, map, etc., do not work over improper lists
     ▶ There is no builtin procedure for testing improper lists, but it could be written as follows:
       (define improper-list?
         (lambda (e)
           (and (pair? e)
                (not (list? (cdr e))))))
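The Scheme definitions of list? and improper-list? carry over almost word for word to OCaml. The sketch assumes a hypothetical, reduced `sexpr` type; `is_pair`, `is_list`, and `is_improper_list` are illustrative names. `is_pair` looks only at the outermost constructor, while `is_list` must walk every cdr, which is exactly the cost difference described above.

```ocaml
type sexpr =
  | Nil
  | Symbol of string
  | Pair of sexpr * sexpr

(* Constant time: inspect the outermost constructor only. *)
let is_pair = function Pair _ -> true | _ -> false

(* Linear time: follow the cdrs until reaching the rightmost one,
   mirroring (or (null? e) (and (pair? e) (list? (cdr e)))). *)
let rec is_list (e : sexpr) : bool =
  match e with
  | Nil -> true
  | Pair (_, cdr) -> is_list cdr
  | _ -> false

(* Mirrors improper-list?: a pair whose cdr is not a proper list. *)
let is_improper_list (e : sexpr) : bool =
  match e with
  | Pair (_, cdr) -> not (is_list cdr)
  | _ -> false
```

For example, `(a b c)` is a proper list, `(a b . c)` is an improper list, and the symbol `a` is neither.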

  34. S-expressions (continued)
     Self-evaluating forms
     Booleans, numbers, characters, and strings are self-evaluating forms. You can evaluate them directly at the prompt:
       > 123
       123
       > "abc"
       "abc"
       > #t
       #t
       > #\m
       #\m

  35. S-expressions (continued)
     Other forms
     The empty list, pairs, and vectors cannot be evaluated directly at the prompt:
     ▶ Entering an empty list or a vector or an improper list at the prompt generates a run-time error.
     ▶ Entering a symbol at the prompt causes Scheme to attempt to evaluate a variable by the same name
     ▶ Entering a proper list, that is not the empty list, at the prompt causes Scheme to attempt to evaluate an application:
       > (a b c)
       Exception: variable b is not bound
       Type (debug) to enter the debugger.

  36. S-expressions: quote & friends
     To evaluate S-expressions that are not self-evaluating, we must use the form quote:
     ▶ The special form quote can be written in two ways:
        ▶ (quote <sexpr>)
        ▶ '<sexpr>
       Both forms are equivalent
     ▶ When you type abc at the Scheme prompt, you're evaluating the variable abc
     ▶ When you type 'abc at the Scheme prompt, you're evaluating the literal symbol abc
     ▶ The value of the literal symbol abc is just itself, which is why when you type 'abc at the Scheme prompt, you get back abc

  37. S-expressions: quote & friends
     ▶ When you type () at the Scheme prompt, you're evaluating an application with no function and no arguments! This is a syntax-error!
     ▶ When you type '() at the Scheme prompt, you're evaluating a literal empty list
     ▶ The value of the literal empty list is just itself, which is why when you type '() at the Scheme prompt, you get back ()

  38. S-expressions: quote & friends
     ▶ When you type (a b c) at the Scheme prompt, you're evaluating the application of the procedure a to the arguments b and c, which are variables
     ▶ When you type '(a b c) at the Scheme prompt, you're evaluating the literal list (a b c)
     ▶ The value of the literal list (a b c) is just (a b c), which is why when you type '(a b c) at the Scheme prompt, you get back (a b c).
     ▶ Quoting a self-evaluating S-expression is possible, and redundant:
       > '2
       2
       > (+ '2 '3)
       5

  39. S-expressions: quote & friends
     So what does quote do?
     ▶ The quote form does nothing
        ▶ It is not a procedure
        ▶ It doesn't take an argument
     ▶ It delimits a constant, literal S-expression
     ▶ The syntactic function of quote in Scheme is the same as the syntactic function of braces { ... } in C in defining literal data:
       const int A[] = {4, 9, 6, 3, 5, 1};

  40. S-expressions: quote & friends
     Meet quasiquote
     ▶ Similarly to quote, the form quasiquote can be written in two ways:
        ▶ (quasiquote <sexpr>)
        ▶ `<sexpr>
     ▶ quasiquote is also used to define data:
        ▶ `abc is the same as 'abc
        ▶ `(a b c) is the same as '(a b c)
     ▶ But quasiquote has two neat tricks!

  41. S-expressions: quote & friends
     Meet quasiquote
     ▶ The following two forms may occur within a quasiquote-expression:
        ▶ The unquote form:
           ▶ (unquote <sexpr>)
           ▶ ,<sexpr>
        ▶ The unquote-splicing form:
           ▶ (unquote-splicing <sexpr>)
           ▶ ,@<sexpr>
     ▶ Both unquote & unquote-splicing are used to mix in dynamic data into the data defined with quasiquote

  42. S-expressions: quote & friends
     Meet quasiquote
       > '(a (+ 1 2 3) b)
       (a (+ 1 2 3) b)
       > '(a ,(+ 1 2 3) b)
       (a ,(+ 1 2 3) b)
       > `(a (+ 1 2 3) b)
       (a (+ 1 2 3) b)
       > `(a ,(+ 1 2 3) b)
       (a 6 b)
       > `(a ,(append '(x y) '(z w)) b)
       (a (x y z w) b)
       > `(a ,@(append '(x y) '(z w)) b)
       (a x y z w b)

  43. S-expressions: quote & friends
     Meet quasiquote
     ▶ The expression `(a ,(append '(x y) '(z w)) b) is equivalent to (cons 'a (cons (append '(x y) '(z w)) '(b)))
     ▶ The expression `(a ,@(append '(x y) '(z w)) b) is equivalent to (cons 'a (append (append '(x y) '(z w)) '(b)))
     ▶ The difference between unquote & unquote-splicing is that
        ▶ unquote mixes in an expression using cons
        ▶ unquote-splicing mixes in an expression using append
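The cons-versus-append distinction can be checked outside Scheme as well. This OCaml sketch assumes a hypothetical, reduced `sexpr` type and hand-written `cons`, `append`, `of_list`, and `sym` helpers (illustrative names, not library functions); it builds both expansions above and compares the results.

```ocaml
type sexpr =
  | Nil
  | Symbol of string
  | Pair of sexpr * sexpr

let cons a d = Pair (a, d)
let sym s = Symbol s

(* Scheme-style append of two lists; this sketch rejects an
   improper first argument. *)
let rec append l1 l2 =
  match l1 with
  | Nil -> l2
  | Pair (a, d) -> Pair (a, append d l2)
  | _ -> invalid_arg "append: not a proper list"

(* Build a proper list from an OCaml list. *)
let rec of_list = function
  | [] -> Nil
  | x :: xs -> Pair (x, of_list xs)

(* The value of (append '(x y) '(z w)), i.e. the list (x y z w). *)
let xyzw = append (of_list [sym "x"; sym "y"]) (of_list [sym "z"; sym "w"])

(* unquote: the list is consed in as a single element, giving (a (x y z w) b) *)
let with_unquote = cons (sym "a") (cons xyzw (of_list [sym "b"]))

(* unquote-splicing: its elements are appended in, giving (a x y z w b) *)
let with_splicing = cons (sym "a") (append xyzw (of_list [sym "b"]))
```

The two results differ only in where `xyzw` ends up: as one element of the outer list (via `cons`), or spliced into it (via `append`).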

  44. S-expressions: quote & friends
     Meet quasiquote
     ▶ Together, quasiquote, unquote, & unquote-splicing are known as the quasiquote mechanism or the backquote mechanism
     ▶ The quasiquote mechanism allows us to create data by template, that is, by specifying the shape of the data
     ▶ In Scheme, convenient ways to create data translate immediately into convenient ways to create code
     ▶ Therefore we expect the quasiquote mechanism to have useful applications within programming languages
     ▶ We can turn code that computes something into code that shows us a computation…

  45. S-expressions: quote & friends
     Consider the familiar factorial function:
       (define fact
         (lambda (n)
           (if (zero? n)
               1
               (* n (fact (- n 1))))))

  46. S-expressions: quote & friends
     We use the quasiquote mechanism to convert the application (* n (fact (- n 1))) into code that describes what factorial does:
       (define fact
         (lambda (n)
           (if (zero? n)
               1
               `(* ,n ,(fact (- n 1))))))
     Running (fact 5) now gives:
       > (fact 5)
       (* 5 (* 4 (* 3 (* 2 (* 1 1)))))
     As you can see, factorial now returns a trace of the computation.
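The same trick can be approximated in OCaml, although OCaml has no quasiquote and code is not data there. This sketch (the name `fact_trace` is hypothetical) builds the trace as a string rather than as an S-expression, so it only shows the shape of the idea: replace the multiplication with a description of the multiplication.

```ocaml
(* The ordinary factorial. *)
let rec fact n = if n = 0 then 1 else n * fact (n - 1)

(* The tracing variant: instead of multiplying, describe the
   multiplication, in the same way the quasiquoted Scheme version does.
   This sketch produces a string, not an S-expression. *)
let rec fact_trace n =
  if n = 0 then "1"
  else Printf.sprintf "(* %d %s)" n (fact_trace (n - 1))
```

Here `fact_trace 5` returns the string `"(* 5 (* 4 (* 3 (* 2 (* 1 1)))))"`. In Scheme the trace is itself a valid S-expression, and in fact a valid program that evaluates to 120; in OCaml we would need a parser to get back from the string to something we could run.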

  47. S-expressions: quote & friends
     We are now going to use the quasiquote mechanism to get Scheme to teach us about the structure of S-expressions. Consider the following code:
       (define foo
         (lambda (e)
           (cond ((pair? e)
                  (cons (foo (car e))
                        (foo (cdr e))))
                 ((or (null? e) (symbol? e)) e)
                 (else e))))
     What does this program do?

  48. S-expressions: quote & friends
     Let's call foo with some arguments:
       > (foo 'a)
       a
       > (foo 123)
       123
       > (foo '())
       ()
       > (foo '(a b c))
       (a b c)

  49. S-expressions: quote & friends
     Looking over the code again
       (define foo
         (lambda (e)
           (cond ((pair? e)
                  (cons (foo (car e))
                        (foo (cdr e))))
                 ((or (null? e) (symbol? e)) e)
                 (else e))))
     we notice that:
     ▶ The 2nd and 3rd ribs of the cond overlap [we could have removed the 2nd]
     ▶ All atoms are left unchanged
     ▶ All pairs are duplicated, while recursing over the car and cdr of the pair
     So foo does nothing, though it does it recursively! ☺

  50. S-expressions: quote & friends
     We now use the quasiquote mechanism to cause foo to generate a trace:
       (define foo
         (lambda (e)
           (cond ((pair? e)
                  `(cons ,(foo (car e))
                         ,(foo (cdr e))))
                 ((or (null? e) (symbol? e)) `',e)
                 (else e))))

  51. S-expressions: quote & friends
     Running foo now gives us some interesting data:
       > (foo 'a)
       'a
       > (foo '(a b c))
       (cons 'a (cons 'b (cons 'c '())))
       > (foo '(a 1 b 2))
       (cons 'a (cons 1 (cons 'b (cons 2 '()))))
       > (foo 123)
       123
       > (foo '((a b) (c d)))
       (cons
         (cons 'a (cons 'b '()))
         (cons (cons 'c (cons 'd '())) '()))

  52. S-expressions: quote & friends
     ▶ Using the quasiquote mechanism, we got foo to describe how S-expressions are created using the most basic API.
     ▶ We should really add support for proper lists and vectors!
     ▶ In fact, the name describe is far more appropriate than foo…
     Let's rewrite foo…

  53. S-expressions: quote & friends
       (define describe
         (lambda (e)
           (cond ((list? e)
                  `(list ,@(map describe e)))
                 ((pair? e)
                  `(cons ,(describe (car e))
                         ,(describe (cdr e))))
                 ((vector? e)
                  `(vector
                    ,@(map describe
                           (vector->list e))))
                 ((or (null? e) (symbol? e)) `',e)
                 (else e))))
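For comparison, here is a hypothetical OCaml sketch of describe over a reduced `sexpr` type (integers, symbols, pairs, vectors). It follows the same case analysis as the Scheme version (proper lists first, then pairs, then vectors, then atoms) but returns the construction code as a string; `describe`, `is_list`, and `list_of_sexpr` are illustrative names.

```ocaml
type sexpr =
  | Nil
  | Int of int
  | Symbol of string
  | Pair of sexpr * sexpr
  | Vector of sexpr list

let rec is_list = function
  | Nil -> true
  | Pair (_, d) -> is_list d
  | _ -> false

(* Elements of a proper list (only called after is_list succeeds). *)
let rec list_of_sexpr = function
  | Pair (a, d) -> a :: list_of_sexpr d
  | _ -> []

(* Return Scheme code, as a string, that constructs e. *)
let rec describe (e : sexpr) : string =
  match e with
  | _ when is_list e ->
      String.concat " " ("(list" :: List.map describe (list_of_sexpr e)) ^ ")"
  | Pair (a, d) -> Printf.sprintf "(cons %s %s)" (describe a) (describe d)
  | Vector es -> String.concat " " ("(vector" :: List.map describe es) ^ ")"
  | Symbol s -> "'" ^ s
  | Int n -> string_of_int n
  | Nil -> "'()"   (* unreachable: Nil is caught by the is_list case *)
```

As in the Scheme version, `()` itself is a proper list, so it is described as `(list)`. Describing the two-element list whose elements are the symbols `quote` and `a` gives `(list 'quote 'a)`, which is exactly what the reader produces for `''a`.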

  54. S-expressions: quote & friends
     Running describe on various S-expressions is very instructive:
       > (describe '(a b c))
       (list 'a 'b 'c)
       > (describe '#(a b c))
       (vector 'a 'b 'c)
       > (describe '(a b . c))
       (cons 'a (cons 'b 'c))
       > (describe ''a)
       (list 'quote 'a)
     Wait! What's with the last example?!

  55. S-expressions: quote & friends
     Recall what we said about quote, quasiquote, unquote, & unquote-splicing:
     ▶ (quote <sexpr>) ≡ '<sexpr>
     ▶ (quasiquote <sexpr>) ≡ `<sexpr>
     ▶ (unquote <sexpr>) ≡ ,<sexpr>
     ▶ (unquote-splicing <sexpr>) ≡ ,@<sexpr>
     Now we get to see this happen…

  56. S-expressions: quote & friends
     Now we get to see this happen:
       > (describe ''<sexpr>)
       (list 'quote '<sexpr>)
       > (describe '`<sexpr>)
       (list 'quasiquote '<sexpr>)
       > (describe ',<sexpr>)
       (list 'unquote '<sexpr>)
       > (describe ',@<sexpr>)
       (list 'unquote-splicing '<sexpr>)
     Rule: Every Scheme expression used to be an S-expression when it was little!

  57. S-expressions: quote & friends
     Question: What is (length '''''''''''''''''moshe)?
     ✗ 17
     ✗ 16
     ✗ Generates an error message!
     ✗ 1
     ✓ 2

  58. S-expressions: quote & friends
     Explanation
     (length '''''''''''''''''moshe) is the same as (length '(quote <something>)), where <something> is '''''''''''''''moshe, but that really doesn't matter! We are still computing the length of a list of size 2:
     ▶ The first element of the list is the symbol quote
     ▶ The second element of the list is '''''''''''''''moshe

  59. S-expressions: quote & friends (continued)
     Question: The structure of the S-expression ''a in Scheme is:
     ✗ Just the symbol a
     ✗ The proper list (quote . (a . ()))
     ✗ The proper list (quote . (quote . (a . ())))
     ✗ An invalid S-expression
     ✓ The nested proper list (quote . ((quote . (a . ())) . ()))

  60. Further reading

  61. Chapter 2 Goals
     Agenda
     🗹 The pipeline of the compiler
     🗹 Introduction to syntactic analysis
     ☞ Further steps in ocaml
     ☞ Ocaml
        ▶ Types
        ▶ References
        ▶ Modules & signatures
        ▶ Functional programming in ocaml

  62. Introduction to ocaml (2)
     Still need to cover
     To program in ocaml effectively in this course, we still need to learn some additional topics:
     ▶ Defining new data types
     ▶ Assignments, side-effects
     What we shan't cover
     Object Orientation: Once you're comfortable with ocaml, you might like to pick up the object-oriented layer. As object-orientation goes, you should find it to be sophisticated and expressive.

  63. Types
     New types are defined using the type statement:
       type fraction = {numerator : int; denominator : int};;
     The above statement defines a new type fraction as a record consisting of two fields: numerator & denominator, both of type int.

  64. Types (continued)
     Once fraction has been defined, the underlying system recognizes it for all records with these fields & types:
       # {numerator = 2; denominator = 3};;
       - : fraction = {numerator = 2; denominator = 3}
       # {denominator = 3; numerator = 2};;
       - : fraction = {numerator = 2; denominator = 3}
     Notice that the order of the fields in a record is immaterial, because the fields are accessed through their names, which are converted consistently into offsets.

  65. Types (continued)
     The type-inference engine in ocaml will correctly infer newly-defined types:
       let add_fractions f1 f2 =
         match f1, f2 with
         | {numerator = n1; denominator = d1},
           {numerator = n2; denominator = d2} ->
           {numerator = n1 * d2 + n2 * d1;
            denominator = d1 * d2};;
     And of course:
       # add_fractions {numerator = 2; denominator = 3}
                       {numerator = 4; denominator = 5};;
       - : fraction = {numerator = 22; denominator = 15}
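add_fractions does not reduce its result: adding {numerator = 1; denominator = 2} to itself yields {numerator = 4; denominator = 4}. The sketch below adds a hypothetical `normalize` function (the names `gcd` and `normalize` are ours) that reduces a fraction to lowest terms with Euclid's algorithm, keeping the sign in the numerator. It assumes denominators are non-zero and also shows pattern-matching on a record in a function parameter.

```ocaml
type fraction = {numerator : int; denominator : int}

let add_fractions f1 f2 =
  match f1, f2 with
  | {numerator = n1; denominator = d1},
    {numerator = n2; denominator = d2} ->
    {numerator = n1 * d2 + n2 * d1;
     denominator = d1 * d2}

(* Greatest common divisor by Euclid's algorithm; the result is >= 0. *)
let rec gcd a b = if b = 0 then abs a else gcd b (a mod b)

(* Reduce to lowest terms and make the denominator positive.
   Assumes the denominator is non-zero. *)
let normalize {numerator = n; denominator = d} =
  let g = gcd n d in
  let sign = if d < 0 then -1 else 1 in
  {numerator = sign * n / g; denominator = sign * d / g}
```

With this, `normalize (add_fractions {numerator = 1; denominator = 2} {numerator = 1; denominator = 2})` gives `{numerator = 1; denominator = 1}`, and `normalize {numerator = 3; denominator = -6}` gives `{numerator = -1; denominator = 2}`.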

  66. Types (continued)
     We can define disjoint types as follows:
       type number =
         | Int of int
         | Frac of fraction
         | Float of float;;
     Think of the | as disjunction. The initial | is optional in ocaml.

  67. Types (continued)
     We can now define a list of numbers as follows:
       # [Int 3;
          Frac {numerator = 3; denominator = 4};
          Float (4.0 *. atan(1.0))];;
       - : number list =
       [Int 3; Frac {numerator = 3; denominator = 4}; Float 3.14159265358979312]
     Notice that ocaml had no trouble identifying each of the three elements of the list as belonging to type number.

  68. Types (continued)
     Working with disjoint types
     Use match to dispatch over the corresponding type constructor, and make sure you handle each and every possibility!
       let number_to_string x =
         match x with
         | Int n -> Format.sprintf "%d" n
         | Frac {numerator = num; denominator = den} ->
            Format.sprintf "%d/%d" num den
         | Float x -> Format.sprintf "%f" x;;

  69. Types (continued)
     Working with disjoint types (continued)
     And here's how it looks:
       # number_to_string (Int 234);;
       - : string = "234"
       # number_to_string (Frac {numerator = 2; denominator = 5});;
       - : string = "2/5"
       # number_to_string (Float 234.234);;
       - : string = "234.234000"

  70. References
     Let us take another look at the record-type. Recall the definition of fraction:
       # type fraction = {numerator : int; denominator : int};;
       type fraction = { numerator : int; denominator : int; }
     In the function add_fractions we used pattern-matching to access the record-fields.

  71. References (continued)
     Ocaml lets you access fields directly, using the dot-notation that is familiar from object-oriented programming:
       # {numerator = 3; denominator = 5}.numerator;;
       - : int = 3
       # {numerator = 3; denominator = 5}.denominator;;
       - : int = 5

  72. References (continued)
     Ocaml offers a special record-type known as a reference.
     ▶ References are derived types. For any type α, we can have a type α ref.
     ▶ References are records with a single field contents
     ▶ References have a special syntax ! to dereference the field:
       # {contents = 1234};;
       - : int ref = {contents = 1234}
       # {contents = 1234}.contents;;
       - : int = 1234
       # ! {contents = 1234};;
       - : int = 1234

  73. References (continued)
     ▶ References have a special syntax := for assignment
     ▶ This is how assignments are managed in ocaml
       # let x = ref 1234;;
       val x : int ref = {contents = 1234}
       # x;;
       - : int ref = {contents = 1234}
       # !x;;
       - : int = 1234
       # x := 4567;;
       - : unit = ()
       # x;;
       - : int ref = {contents = 4567}
       # !x;;
       - : int = 4567
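A common use of references is to give a closure private, mutable state. The sketch below uses only the standard `ref`, `!`, and `:=`; the names `make_counter` and `incr_explicit` are hypothetical. Because `'a ref` is a record with a single mutable field `contents`, the assignment `r := v` can also be written with the record syntax `r.contents <- v`, as `incr_explicit` shows.

```ocaml
(* A counter whose state lives in a reference cell. The variable count
   itself is never reassigned; only the contents of the record it names
   is updated with :=. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    count := !count + 1;
    !count

(* The same kind of update, spelled out with record-field syntax. *)
let incr_explicit (r : int ref) = r.contents <- r.contents + 1
```

Each call to `make_counter ()` allocates a fresh reference, so two counters do not share state: after calling one counter twice, the first call to a second counter still returns 1.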

  74. References (continued)
     ▶ It is not possible to perform assignments on variables
     ▶ It is only possible to change the fields of reference types
       # let x = "abc";;
       val x : string = "abc"
       # x := "def";;
       Characters 0-1:
         x := "def";;
         ^
       Error: This expression has type string but an expression was expected of type 'a ref

  75. References (continued)
     ▶ You can define a reference type of any other type, including other reference types:
       # let x = ref (ref 1234);;
       val x : int ref ref = {contents = {contents = 1234}}
       # x := ref 5678;;
       - : unit = ()
       # x;;
       - : int ref ref = {contents = {contents = 5678}}
       # !x := 9876;;
       - : unit = ()
       # x;;
       - : int ref ref = {contents = {contents = 9876}}
