This slide contains no jokes. How to Write Compilers and solve data transformation problems.


  1. This slide contains no jokes.

  2. How to Write Compilers and solve data transformation problems. Shevek shevek@anarres.org shevek@nebula.com

  3. Introduction ● I have written a lot of compilers. ● Sequential and parallel languages. ● Machine code and interpreted. ● Some of them didn't work! ● This presentation contains practical knowledge. ● Almost nothing in it is researched. ● The gaps are stuff I don't know, or don't need. ● Some basic material is omitted. ● Read a book. ● Ask questions! Already!

  4. Nebula http://www.nebula.com/careers/

  5. Example of a Compiler

  6. The Mini-language Philosophy ● Describe your problem in some language. – Implement the description in code: 1 month. – Or write a compiler for that language: 1 month. ● Update the problem description. – Reimplement the description in code: 1 month. – Or run your compiler again: 1 week!

  7. Why Write a Compiler? ● Save time over the life of a project. ● Simplify the runtime. ● Detect errors earlier.

  8. Errors in Development [Diagram labels: IDE highlights bug; From static analyzer; Code leaves developer's desktop; Runtime errors detected; Say prayers; Kind of expensive.]

  9. Why Not Write a Compiler? ● Reinventing world, plus dog. ● Usually an expression parser. ● Write a library for an existing language. ● Use fluent APIs. ● Cognitive load. ● This applies to annotations as well. – Note to self: Rag on Python a bit. ● Confusing education with production.

  10. General Engineering ● TIP: Write type-safe code. ● TIP: Write synchronous code. ● TIP: Write code with a well-defined thread contract. ● TIP: Use monads or immutable structures.

  11. Overview of a Compiler ● Treat the stages independently. ● Give each stage a contract. ● TIP: Make each stage check its input. ● TIP: Treat the input as a read-only structure.

  12. Example Stage: The Front End ● Each phase: ● Consumes some previous outputs. ● Produces one immutable output. ● The last phase ● Produces a consolidated output structure, which is also immutable. ● TIP: Use Context and InstanceMap patterns.
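
The phase contract above can be sketched as follows. This is only an illustration: the slides do not define the InstanceMap pattern, so the sketch assumes it means a map holding at most one instance per class. Every name in it (`PhaseSketch`, `InstanceMap`, `Tokens`, `WordCount`) is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhaseSketch {
    /** Assumed meaning of "InstanceMap": one output per type, never overwritten. */
    static final class InstanceMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void put(Class<T> type, T value) {
            if (values.putIfAbsent(type, value) != null)
                throw new IllegalStateException("Output already produced: " + type.getSimpleName());
        }

        <T> T get(Class<T> type) {
            Object value = values.get(type);
            if (value == null)
                throw new IllegalStateException("Missing input: " + type.getSimpleName());
            return type.cast(value);
        }
    }

    /** Immutable output of the first (hypothetical) phase. */
    static final class Tokens {
        final List<String> words;
        Tokens(List<String> words) { this.words = List.copyOf(words); }
    }

    /** Immutable output of the second (hypothetical) phase. */
    static final class WordCount {
        final int count;
        WordCount(int count) { this.count = count; }
    }

    /** Phase 1: consumes the source text, produces Tokens. */
    static void tokenize(String source, InstanceMap context) {
        context.put(Tokens.class, new Tokens(Arrays.asList(source.trim().split("\\s+"))));
    }

    /** Phase 2: consumes Tokens (checked on lookup), produces WordCount. */
    static void count(InstanceMap context) {
        Tokens tokens = context.get(Tokens.class);
        context.put(WordCount.class, new WordCount(tokens.words.size()));
    }

    public static void main(String[] args) {
        InstanceMap context = new InstanceMap();
        tokenize("select a from t", context);
        count(context);
        System.out.println(context.get(WordCount.class).count); // 4
    }
}
```

Because a phase can only add a new output and never replace an existing one, a bad value can be traced to the single phase that produced it.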

  13. Mistakes ● Writing an ugly monolith. ● Cannot debug or modify. ● Modifying data structures. ● Cannot localize bugs. ● No clear contract for the structure. ● Attaching metadata to the parse tree. ● It looks like a Christmas tree.

  14. Assembling Compilers ● A compiler S → T might consist of: ● S → X, X → Y, Y → T ● TIP: Write the data structures first. ● TIP: Use service discovery. ● TIP: Declare phase dependencies. ● TIP: Use jgrapht. ● TIP: Dijkstra's shortest path. ● TIP: Print everything. ● TIP: Use graphviz.
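
One way to read the tips above: if every phase declares what it consumes and produces, the chain S → X → Y → T can be found by a shortest-path search. The sketch below is an assumption-laden illustration, not the author's code. It does not use jgrapht, and plain breadth-first search stands in for Dijkstra (the two agree when every phase has the same cost). `PhasePlanner` and `Phase` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhasePlanner {
    /** A hypothetical phase: translates one representation into another. */
    static final class Phase {
        final String from;
        final String to;
        Phase(String from, String to) { this.from = from; this.to = to; }
        @Override public String toString() { return from + " -> " + to; }
    }

    /** Breadth-first search for the shortest chain of phases from source to target. */
    static List<Phase> plan(List<Phase> phases, String source, String target) {
        Map<String, Phase> reachedBy = new HashMap<>(); // representation -> phase that produced it
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) break;
            for (Phase phase : phases) {
                if (phase.from.equals(current) && !phase.to.equals(source)
                        && !reachedBy.containsKey(phase.to)) {
                    reachedBy.put(phase.to, phase);
                    queue.add(phase.to);
                }
            }
        }
        if (!source.equals(target) && !reachedBy.containsKey(target))
            throw new IllegalArgumentException("No chain of phases from " + source + " to " + target);
        // Walk backwards from the target, then reverse to get execution order.
        List<Phase> chain = new ArrayList<>();
        for (String at = target; !at.equals(source); at = reachedBy.get(at).from)
            chain.add(reachedBy.get(at));
        Collections.reverse(chain);
        return chain;
    }

    public static void main(String[] args) {
        List<Phase> phases = List.of(
                new Phase("S", "X"), new Phase("X", "Y"), new Phase("Y", "T"), new Phase("X", "Z"));
        System.out.println(plan(phases, "S", "T")); // [S -> X, X -> Y, Y -> T]
    }
}
```

Printing the planned chain, as here, is in the spirit of "print everything": the plan itself becomes something you can inspect.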

  15. Language Design ● Languages should have low information content. ● Low information content (or high redundancy) increases the ability of the compiler to detect errors. ● An arbitrary (randomly generated) input should have a high probability of being invalid. ● Note to self: Rag on Perl a bit, but not as much as Python. ● See @Override in Java; C++ had no equivalent until the override specifier in C++11. ● The output of a compiler will probably have very high information content.
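
A small illustration of the @Override point, with hypothetical class names. The annotation tells the compiler nothing it could not work out for itself, and that redundancy is exactly what lets it catch a mistake.

```java
final class OverrideSketch {
    static class Shape {
        String describe() { return "shape"; }
    }

    static class Circle extends Shape {
        // Redundant information: the compiler already knows whether this overrides.
        // With the annotation, a misspelling such as "descirbe" becomes a
        // compile-time error instead of silently declaring a new method.
        @Override
        String describe() { return "circle"; }
    }

    public static void main(String[] args) {
        Shape shape = new Circle();
        System.out.println(shape.describe()); // circle
    }
}
```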

  16. More Language Design ● The programmer has better things to do than: ● Format code. – Make your code auto-formattable. – Do not make whitespace significant. ● Work out the valid options for … – Parameter types to an overloaded function. – Available symbols, variables or types. ● Keep track of whether a variable exists. – Grrrrr. ● You might not need a syntax. ● Just add semantics to an existing syntax, e.g. XML.

  17. Lexer and Parser ● The lexer turns a sequence of characters into a sequence of tokens. ● The parser turns the sequence of tokens into a parse tree. ● They are generally rules-based.
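
To make the lexer half of this concrete, here is a minimal hand-written sketch. The token types and the class name `LexerSketch` are assumptions for illustration; a production lexer would normally be generated from rules, as the slide says.

```java
import java.util.ArrayList;
import java.util.List;

final class LexerSketch {
    enum Type { IDENT, NUMBER, SYMBOL }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    /** Turns a sequence of characters into a sequence of tokens. */
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c)) {
                // Identifier: a letter followed by letters or digits.
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(Type.IDENT, input.substring(start, i)));
            } else if (Character.isDigit(c)) {
                // Number: a run of digits.
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Type.NUMBER, input.substring(start, i)));
            } else {
                // Anything else is a one-character symbol.
                i++;
                tokens.add(new Token(Type.SYMBOL, input.substring(start, i)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("a0 < 5")); // [IDENT(a0), SYMBOL(<), NUMBER(5)]
    }
}
```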

  18. Parsers: Ambiguity ● Sentence = NounPhrase Verb NounPhrase . ● NounPhrase = Article Adjective? Noun ● Input: The old man the boats. ● What happened?

  19. Parsers: A Parse Tree

  20. Parsers: LL ● An example parser for SQL

          public Statement parse_statement() {
              Token t = token();
              switch (t.type) {
                  case SELECT:
                      return parse_select();
                  ...
              }
          }

          public Select parse_select() {
              Select s = new Select();
              s.expressions = parse_expression_list();
              s.tables = parse_from_list();
              s.where = parse_where_clause();
              return s;
          }

  21. Parsers: LL s.expressions = parse_expression_list(); s.tables = parse_from_list(); s.where = parse_where_clause();
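
A runnable variant of the same idea, offered as a sketch only. It assumes a drastically simplified SQL in which tokens are separated by whitespace (so commas need spaces around them) and the where clause is a single name. `LlSketch` and its methods are hypothetical and not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LlSketch {
    static final class Select {
        final List<String> expressions = new ArrayList<>();
        final List<String> tables = new ArrayList<>();
        String where; // null when there is no where clause
    }

    private final List<String> tokens;
    private int pos = 0;

    LlSketch(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("Unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String keyword) {
        String t = next();
        if (!t.equals(keyword))
            throw new IllegalStateException("Expected " + keyword + " but saw " + t);
    }

    /** The next token alone decides which rule to descend into. */
    Select parseStatement() {
        if ("select".equals(peek())) return parseSelect();
        throw new IllegalStateException("Unknown statement: " + peek());
    }

    /** select := 'select' list 'from' list [ 'where' name ] */
    Select parseSelect() {
        Select s = new Select();
        expect("select");
        parseList(s.expressions);
        expect("from");
        parseList(s.tables);
        if ("where".equals(peek())) { next(); s.where = next(); }
        return s;
    }

    /** list := name ( ',' name )* */
    private void parseList(List<String> out) {
        out.add(next());
        while (",".equals(peek())) { next(); out.add(next()); }
    }

    public static void main(String[] args) {
        Select s = new LlSketch("select a , b from t where c").parseStatement();
        System.out.println(s.expressions + " " + s.tables + " " + s.where); // [a, b] [t] c
    }
}
```

Each grammar rule becomes one method, which is why LL parsers are easy to write by hand and to step through in a debugger.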

  24. Parsers: LR ● An LR parser looks at the top few tokens on the stack, and decides one of two things: ● Push the new token onto the stack. ● Reduce the top of the stack to a compound token. ● LR parsers tend to be automatically generated. ● TIP: Always use LR parsers.

  25. Parsers: LR ● Stack: select expression_list from_list where a0 < 5 ... ● Rule: a0 < 5 → expr

  26. Parsers: LR ● Stack: select expression_list from_list where expr and b1 > 7 ... ● Rule: b1 > 7 → expr

  27. Parsers: LR ● Stack: select expression_list from_list where expr and expr ... ● Rule: expr and expr → expr

  28. Parsers: LR ● Stack: select expression_list from_list where expr ... ● Rule: where expr → where_clause
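
The four steps above can be replayed with a toy shift-reduce loop. This is a hand-written sketch with three hard-coded rules and one token of lookahead, not a table-driven LR parser and not generated code. It covers only the where clause, and `ShiftReduceSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShiftReduceSketch {
    /** Anything that is not a keyword, operator or compound token counts as an operand. */
    private static boolean isOperand(String s) {
        return !Arrays.asList("where", "and", "<", ">", "expr", "where_clause").contains(s);
    }

    /** Replaces the top `count` stack entries with one compound token. */
    private static void reduce(List<String> stack, int count, String result) {
        for (int i = 0; i < count; i++) stack.remove(stack.size() - 1);
        stack.add(result);
    }

    /** Tries each rule against the top of the stack; returns true if one applied. */
    private static boolean tryReduce(List<String> stack, String lookahead) {
        int n = stack.size();
        if (n >= 3) {
            String a = stack.get(n - 3), op = stack.get(n - 2), b = stack.get(n - 1);
            // Rule: operand < operand -> expr (likewise for >)
            if ((op.equals("<") || op.equals(">")) && isOperand(a) && isOperand(b)) {
                reduce(stack, 3, "expr");
                return true;
            }
            // Rule: expr and expr -> expr
            if (a.equals("expr") && op.equals("and") && b.equals("expr")) {
                reduce(stack, 3, "expr");
                return true;
            }
        }
        // Rule: where expr -> where_clause, but only at end of input;
        // otherwise a following "and" would be left with nothing to attach to.
        if (n >= 2 && stack.get(n - 2).equals("where") && stack.get(n - 1).equals("expr")
                && lookahead == null) {
            reduce(stack, 2, "where_clause");
            return true;
        }
        return false;
    }

    static List<String> parse(List<String> tokens) {
        List<String> stack = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            stack.add(tokens.get(i)); // shift
            String lookahead = i + 1 < tokens.size() ? tokens.get(i + 1) : null;
            while (tryReduce(stack, lookahead)) { /* keep reducing */ }
        }
        return stack;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("where", "a0", "<", "5", "and", "b1", ">", "7");
        System.out.println(parse(tokens)); // [where_clause]
    }
}
```

The lookahead check on the last rule is the interesting part: deciding between shifting and reducing is exactly the decision a generated LR table encodes.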

  29. Parsers: Ambiguity ● Expressions: – 5 + 6 – 3 * (5 + 6) – (5 + 6) – (6) ● Lists: – (6, 7, 8) – (6, 7) – () – (6) ● F^HOops! ● Perl has +{}, @x = (6), $x = (6) ● Look for Meredith Patterson's 28c3 talk.

  30. Parsers: Shift-Reduce Conflict

          statement:
              {ifelse} if ( condition ) then statement else statement
            | {if} if ( condition ) then statement ;

      Input: if (a) if (bar) foo(); else baz();

      Reading 1 (else belongs to the inner if):

          if (a)
              if (bar)
                  foo();
              else
                  baz();

      Reading 2 (else belongs to the outer if):

          if (a)
              if (bar)
                  foo();
          else
              baz();

  31. Parsers: Shift-Reduce Conflict

          statement:
              {ifelse} if ( condition ) then statement else statement
            | {if} if ( condition ) then statement ;

      Input: if (a) if (bar) foo(); else baz();
      ● Initial stack: if[0] condition if[1] condition statement else ...
      ● Rule: ... if condition statement → statement
        Result: if[0] condition statement[1]
      ● Rule: ... if condition statement [else statement] → (shift else)
        Result: if[0] condition if[1] condition statement else
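
Choosing the shift in the conflict above attaches each else to the nearest open if. The sketch below shows that outcome using hand-written recursive descent rather than an LR parser, purely because it is shorter. It assumes a stripped-down token stream with no parentheses or semicolons, and `DanglingElseSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class DanglingElseSketch {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElseSketch(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("Unexpected end of input");
        return tokens.get(pos++);
    }

    /** statement := 'if' condition statement [ 'else' statement ] | name */
    String parseStatement() {
        String t = next();
        if (!t.equals("if")) return t; // a plain statement such as foo
        String condition = next();
        String then = parseStatement();
        if ("else".equals(peek())) {
            // Greedy choice, equivalent to preferring the shift:
            // the innermost open 'if' takes the else.
            next();
            return "if(" + condition + ", " + then + ", " + parseStatement() + ")";
        }
        return "if(" + condition + ", " + then + ")";
    }

    public static void main(String[] args) {
        System.out.println(new DanglingElseSketch("if a if bar foo else baz").parseStatement());
        // if(a, if(bar, foo, baz))
    }
}
```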

  32. Parsers: Factoring in LR

          statement =
              {no_dangling} no_dangling_statement { -> no_dangling_statement.statement }
            | {dangling} dangling_statement { -> dangling_statement.statement }
            ;

          /* productions NOT ending in 'statement' */
          no_dangling_statement { -> statement } =
              {comp} compound_statement { -> compound_statement.statement }
            | {exp} expression_statement { -> expression_statement.statement }
            | {jmp} jump_statement { -> jump_statement.statement }
            | {if_else} kw_if tok_lpar expression tok_rpar
                  no_dangling_statement kw_else [other]:no_dangling_statement
                  { -> New statement.if_else(expression,
                           no_dangling_statement.statement, other.statement) }
            | {catch} catch_statement { -> catch_statement.statement }
            ;

          /* productions ending in 'statement' */
          dangling_statement { -> statement } =
              {label} labeled_statement { -> labeled_statement.statement }
            | {select} selection_statement { -> selection_statement.statement }
            | {iter} iteration_statement { -> iteration_statement.statement }
            ;

  33. Parsers: Priority in Bison ● For reference only:

          ...
          %nonassoc LOWER_THAN_ELSE
          %nonassoc L_ELSE
          ...
          %%
          ...
          statement
              : ...
              | L_IF '(' nv_list_exp ')' statement opt_else
              | ...
              ;

          opt_else
              : %prec LOWER_THAN_ELSE
              | L_ELSE statement
              ;

  34. Parsers: PEGs ● Like an LR CFG but where rules have priority.

          statement:
              {ifelse} if ( condition ) then statement else statement
            | {if} if ( condition ) then statement ;

      ● PEG makes a deterministic decision in case of ambiguity. ● This also means your language design is a dog's breakfast.

  35. SableCC ● A beautiful Java LR parser generator. ● Somewhat under-documented. ● TIP: Use SableCC.

  36. SableCC: Example Grammar ● A fragment of a SableCC grammar:

          /* 6.5.9 equality-expression */
          equality_expression { -> expression } =
              {no} relational_expression { -> relational_expression.expression }
            | {eq} equality_expression tok_eq_eq relational_expression
                  { -> New expression.eq(
                           equality_expression.expression,
                           relational_expression.expression) }
            | {ne} equality_expression tok_ne relational_expression
                  { -> New expression.ne(
                           equality_expression.expression,
                           relational_expression.expression) }
            ;

          /* 6.5.10 AND-expression */
          and_expression { -> expression } =
              {no} equality_expression { -> equality_expression.expression }
            | {and} and_expression tok_and equality_expression
                  { -> New expression.bitwise_and(
                           and_expression.expression,
                           equality_expression.expression) }
            ;

  37. Concrete Syntax Trees ● Lists are parsed using recursion.

          /* 6.5.2 argument-expression-list */
          argument_expression_list =
              {single} assignment_expression
            | {list} argument_expression_list tok_comma assignment_expression
            ;

      ● Ugly, and doesn't even fit on the slide. ● We can embed code to disentangle it.

          argument_list
              : argument { $$ = newAV(); av_push($$, $1); }
              | argument_list ',' argument { av_push($1, $3); $$ = $1; }
              ;
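
The same disentangling can also be done after parsing, by walking the left-recursive tree and collecting its items into a flat list. The node class below is a hypothetical stand-in for whatever a parser generator would emit for the {single} and {list} alternatives; it is not SableCC's generated API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListFlattenSketch {
    /** Hypothetical CST node for: list = {single} item | {list} list ',' item */
    static final class ListNode {
        final ListNode prefix; // null for the {single} alternative
        final String item;
        ListNode(ListNode prefix, String item) { this.prefix = prefix; this.item = item; }
    }

    /** Walks from the last item back to the first, then restores source order. */
    static List<String> flatten(ListNode node) {
        List<String> items = new ArrayList<>();
        for (ListNode n = node; n != null; n = n.prefix) items.add(n.item);
        Collections.reverse(items);
        return items;
    }

    public static void main(String[] args) {
        // The tree a left-recursive rule builds for "a, b, c": ((a), b), c
        ListNode cst = new ListNode(new ListNode(new ListNode(null, "a"), "b"), "c");
        System.out.println(flatten(cst)); // [a, b, c]
    }
}
```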
