
1DL321: Kompilatorteknik I (Compiler Design 1)



  1. Administrivia
     • Lecturer:
       – Kostis Sagonas ( kostis@it.uu.se )
     • Course home page: http://user.it.uu.se/~kostis/Teaching/KT1-12/
     • Assistants:
       – Stavros Aronis ( stavros.aronis@it.uu.se )
       – Andreas Löscher ( andreas.loscher@it.uu.se )
       – responsible for the lessons and the assignments

     1DL321: Kompilatorteknik I (Compiler Design 1)
     Introduction to Programming Language Design and to Compilation

     Course Structure
     • Course has theoretical and practical aspects
     • Need both in programming languages!
     • Written examination = theory (4 points)
       – first exam scheduled for 11th January 2013
     • Three assignments = practice (1 point)
       – Electronic hand-in to the assistants before the corresponding deadline
       – You can submit one late assignment if you need to, but it cannot be later than the deadline of the next assignment (for 1 and 2) or the exam (for 3)

     Course Literature

  2. Academic Honesty
     • For assignments you are allowed to work in pairs (but no threesomes/foursomes/...)
     • Don’t use work from uncited sources
       – Including old assignments
     PLAGIARISM

     The Compiler Project
     • A follow-up course
       – that will be taught in period 3
       – and will allow you to see the material you have learned in KT1 in practice
       – by building a complete compiler
       – for a small (toy?) language

     How are Languages Implemented?
     • Two major strategies:
       – Interpreters (older, less studied)
       – Compilers (newer, much more studied)
     • Interpreters run programs “as is”
       – Little or no preprocessing
     • Compilers do extensive preprocessing

     Language Implementations
     • Batch compilation systems dominate
       – gcc
     • Some languages are primarily interpreted
       – Java bytecode
       – Postscript
     • Some environments (e.g. Lisp) provide both
       – Interpreter for development
       – Compiler for production
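     The two strategies can be contrasted on a toy arithmetic language. This is only a sketch, not taken from the slides: the tuple encoding of expressions, the stack-machine instructions (PUSH, ADD, MUL) and the function names are all assumptions made for illustration. The interpreter walks the expression "as is" on every run; the compiler does its preprocessing once and yields instructions that a separate loop executes.

```python
# An expression is a number or a tuple (op, left, right) with op "+" or "*".
# This encoding and all names below are hypothetical, chosen for the sketch.

def interpret(expr):
    """Interpreter: evaluate the expression directly, with no preprocessing."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_expr(expr, code=None):
    """Compiler: translate the expression once into stack-machine code."""
    code = [] if code is None else code
    if isinstance(expr, (int, float)):
        code.append(("PUSH", expr))
    else:
        op, left, right = expr
        compile_expr(left, code)    # operands first (postorder) ...
        compile_expr(right, code)
        code.append(("ADD",) if op == "+" else ("MUL",))  # ... then the operator
    return code

def run(code):
    """Execute compiled code on a simple operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

expr = ("+", 1, ("*", 2, 3))
code = compile_expr(expr)
```

     Here interpret(expr) and run(code) compute the same value; the difference is that code can be kept and re-run without looking at expr again.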

  3. (Short) History of High-Level Languages
     • 1953 IBM develops the 701
     • Till then, all programming done in assembly
     • Problem: Software costs exceeded hardware costs!
     • John Backus: “Speedcoding”
       – An interpreter
       – Ran 10-20 times slower than hand-written assembly

     FORTRAN I
     • 1954 IBM develops the 704
     • John Backus
       – Idea: translate high-level code to assembly
       – Many thought this impossible
         • Had already failed in other projects
     • 1954-7 FORTRAN I project
     • By 1958, >50% of all software is in FORTRAN
     • Cut development time dramatically
       – (2 weeks → 2 hours)

     FORTRAN I
     • The first compiler
       – Produced code almost as good as hand-written
       – Huge impact on computer science
     • Led to an enormous body of theoretical work
     • Modern compilers preserve the outlines of the FORTRAN I compiler

     The Structure of a Compiler
     1. Lexical Analysis
     2. Syntax Analysis
     3. Semantic Analysis
     4. IR Optimization
     5. Code Generation
     6. Low-level Optimization
     The first 3 phases can be understood by analogy to how humans comprehend natural languages (e.g. Swedish or English).

  4. Lexical Analysis
     • First step: recognize words.
       – Smallest unit above letters
           This is a sentence.
     • Note the
       – Capital “T” (start of sentence symbol)
       – Blank “ ” (word separator)
       – Period “.” (end of sentence symbol)

     More Lexical Analysis
     • Lexical analysis is not trivial. Consider:
           ist his ase nte nce
     • Plus, programming languages are typically more cryptic than English:
           *p->f++ = -.12345e-5

     And More Lexical Analysis
     • Lexical analyzer divides program text into “words” or “tokens”
           if (x == y) then z = 1; else z = 2;
     • Units:
           if, (, x, ==, y, ), then, z, =, 1, ;, else, z, =, 2, ;

     Parsing
     • Once words are understood, the next step is to understand the sentence structure
     • Parsing = Diagramming Sentences
       – The diagram is a tree
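     The division of the example statement into units can be reproduced with a few regular expressions. This is a minimal sketch, not the course's lexer: the token class names (KEYWORD, IDENT, NUMBER, OP, PUNCT) and the tokenize function are assumptions, and only the symbols that occur in the example are handled.

```python
import re

# Token classes are tried in order. KEYWORD comes before IDENT so that "if" is
# not read as an identifier, and "==" comes before "=" so that the comparison
# is not split into two assignment signs.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("OP",      r"==|="),
    ("PUNCT",   r"[();]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token class, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize("if (x == y) then z = 1; else z = 2;")
lexemes = [lexeme for _, lexeme in tokens]
```

     The lexemes list contains the same units as the slide, in the same order; blanks act only as separators and never become tokens.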

  5. Diagramming a Sentence (1) / Diagramming a Sentence (2)
     [Figure: two tree diagrams of the sentence below. Labels appearing across the two diagrams: subject, object, noun phrase (twice), verb phrase; the root of each diagram is "sentence".]
           This      line   is     a         longer      sentence
           article   noun   verb   article   adjective   noun

     Parsing Programs
     • Parsing program expressions is the same
     • Consider:
           If (x == y) then z = 1; else z = 2;
     • Diagrammed:
           x == y        z = 1         z = 2
           relation      assignment    assignment
           predicate     then-stmt     else-stmt
                         if-then-else

     Semantic Analysis
     • Once the sentence structure is understood, we can try to understand its “meaning”
       – But meaning is too hard for compilers
     • Most compilers perform limited analysis to catch inconsistencies
     • Some optimizing compilers do more analysis to improve the performance of the program
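     The if-then-else diagram can be built mechanically by a small recursive-descent parser. The sketch below is an illustration under assumptions, not the method taught in the course: the grammar covers only the example statement, the Parser class and its method names are made up, and the tree is encoded as nested tuples whose first element names the node (relation, assignment, if-then-else) as in the diagram.

```python
import re

def lex(text):
    # Words, numbers, "==" and single symbols; any other character is
    # silently skipped, which is acceptable only for a sketch.
    return re.findall(r"[A-Za-z_]\w*|\d+|==|[=();]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, want=None):
        """Consume the next token; if `want` is given it must match exactly."""
        tok = self.peek()
        if tok is None or (want is not None and tok != want):
            raise SyntaxError(f"expected {want!r}, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        if self.peek() == "if":
            return self.if_statement()
        return self.assignment()

    def if_statement(self):
        self.expect("if")
        self.expect("(")
        predicate = self.relation()
        self.expect(")")
        self.expect("then")
        then_stmt = self.statement()
        self.expect("else")
        else_stmt = self.statement()
        return ("if-then-else", predicate, then_stmt, else_stmt)

    def relation(self):
        left = self.expect()      # operands are not validated in this sketch
        self.expect("==")
        right = self.expect()
        return ("relation", left, "==", right)

    def assignment(self):
        target = self.expect()
        self.expect("=")
        value = self.expect()
        self.expect(";")
        return ("assignment", target, value)

def parse(text):
    parser = Parser(lex(text))
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return tree

tree = parse("if (x == y) then z = 1; else z = 2;")
```

     The resulting tree has if-then-else at the root with three children: the relation x == y as predicate and the two assignments as then- and else-statement.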

  6. Semantic Analysis in English
     • Example:
           Jack said Jerry left his assignment at home.
       What does “his” refer to? Jack or Jerry?
     • Even worse:
           Jack said Jack left his assignment at home?
       How many Jacks are there?
       Which one left the assignment?

     Semantic Analysis in Programming Languages
     • Programming languages define strict rules to avoid such ambiguities
     • This C++ code prints “4”; the inner definition is used
           {
             int Jack = 3;
             {
               int Jack = 4;
               cout << Jack;
             }
           }

     More Semantic Analysis
     • Compilers perform many semantic checks besides variable bindings
     • Example:
           Arnold left her homework at home.
     • A “type mismatch” between her and Arnold; we know they are different people
       – Presumably Arnold is male

     Optimization
     • No strong counterpart in English, but akin to editing
     • Automatically modify programs so that they
       – Run faster
       – Use less memory/cache/power
       – In general, conserve some resource more economically
     • The compilers project has no optimization component
       – for those interested, there is also the “Advanced Compiler Design (KT2)” course!
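     The rule that the inner definition of Jack wins is typically implemented with a chain of symbol tables, one per block. The following is a hedged sketch of that idea in Python rather than C++; the Scope class and its declare/lookup methods are hypothetical names, not part of any compiler mentioned in the slides.

```python
class Scope:
    """One block's symbol table, linked to the enclosing block's table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

    def declare(self, name, value):
        # A declaration only affects the current block; an outer binding
        # with the same name is shadowed, not overwritten.
        self.bindings[name] = value

    def lookup(self, name):
        # Search the innermost block first, then walk outwards.
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise NameError(f"undeclared identifier {name!r}")

outer = Scope()
outer.declare("Jack", 3)        # { int Jack = 3;
inner = Scope(parent=outer)
inner.declare("Jack", 4)        #   { int Jack = 4;
printed = inner.lookup("Jack")  #     cout << Jack; } }
```

     Looking up Jack from the inner scope finds the inner binding first; once the inner block ends, lookups in the outer scope see the original binding again.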

  7. Optimization Example
           X = Y * 0 is the same as X = 0
     NO! Valid for integers, but not for floating point numbers

     Code Generation
     • Produces assembly code (usually)
     • A translation into another language
       – Analogous to human translation

     Intermediate Languages
     • Many compilers perform translations between successive intermediate forms
       – All but first and last are intermediate languages internal to the compiler
       – Typically there is one IL
     • IL’s generally ordered in descending level of abstraction
       – Highest is source
       – Lowest is assembly

     Intermediate Languages (Cont.)
     • IL’s are useful because lower levels expose features hidden by higher levels
       – registers
       – memory/frame layout
       – etc.
     • But lower levels obscure high-level meaning
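     Why the rewrite of Y * 0 to 0 fails for floating point can be checked directly. The sketch below assumes IEEE 754 arithmetic, which Python floats follow on common platforms; the helper name fold_is_exact is made up for illustration. Infinity and NaN times zero give NaN, and a negative number times zero gives negative zero, which compares equal to 0 but has a different sign bit.

```python
import math

def fold_is_exact(y):
    """True if replacing y * 0 by 0 would yield an identical result."""
    product = y * 0
    if isinstance(product, float):
        # NaN never compares equal to 0; -0.0 does, so also check the sign.
        return product == 0 and math.copysign(1.0, product) == 1.0
    return product == 0

integer_case = fold_is_exact(7)               # integers: always safe
positive_case = fold_is_exact(2.5)            # 2.5 * 0 is +0.0
negative_case = fold_is_exact(-2.5)           # -2.5 * 0 is -0.0
infinity_case = fold_is_exact(float("inf"))   # inf * 0 is NaN
nan_case = fold_is_exact(float("nan"))        # NaN * 0 is NaN
```

     Only the first two cases are exact, so an optimizer may apply this rewrite to integer operands but not blindly to floating-point ones.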

  8. Issues
     • Compiling is almost this simple, but there are many pitfalls
     • Example: How are erroneous programs handled?
     • Language design has big impact on compiler
       – Determines what is easy and hard to compile
       – Course theme: many trade-offs in language design

     Compilers Today
     • The overall structure of almost every compiler adheres to our outline
     • The proportions have changed since FORTRAN
       – Early:
         • lexical analysis, parsing most complex, expensive
       – Today:
         • semantic analysis and optimization dominate all other phases; lexing and parsing are well-understood and cheap

     Current Trends in Compilation
     • Compilation for speed is less interesting. However, there are exceptions:
       – scientific programs
       – advanced processors (Digital Signal Processors, advanced speculative architectures, GPUs)
     • Ideas from compilation used for improving code reliability:
       – memory safety
       – detecting data races
       – security properties
       – ...

     Programming Language Economics
     • Programming languages are designed to fill a void
       – enable a previously difficult/impossible application
       – orthogonal to language design quality (almost)
     • Programming training is the dominant cost
       – Languages with a big user base are replaced rarely
       – Popular languages become ossified
       – but it is easy to start in a new niche...

  9. Why so many Programming Languages?
     • Application domains have distinctive (and sometimes conflicting) needs
     • Examples:
       – Scientific computing: High performance
       – Business: report generation
       – Artificial intelligence: symbolic computation
       – Systems programming: efficient low-level access
       – Other special purpose languages...

     Topic: Language Design
     • No universally accepted metrics for design
     • “A good language is one people use”
     • NO!
       – Is COBOL the best language?
     • Good language design is hard

     Language Evaluation Criteria
                         Criteria
       Characteristic    Readability   Writability   Reliability
       Simplicity        YES           YES           YES
       Data types        YES           YES           YES
       Syntax design     YES           YES           YES
       Abstraction                     YES           YES
       Expressivity                    YES           YES
       Type checking                                 YES
       Exceptions                                    YES

     History of Ideas: Abstraction
     • Abstraction = detached from concrete details
     • Necessary for building software systems
     • Modes of abstraction:
       – Via languages/compilers
         • higher-level code; few machine dependencies
       – Via subroutines
         • abstract interface to behavior
       – Via modules
         • export interfaces which hide implementation
       – Via abstract data types
         • bundle data with its operations
