Compiling a Language



  1. Universidade Federal de Minas Gerais – Department of Computer Science – Programming Languages Laboratory – Compiling a Language – DCC 888

  2. Dealing with Programming Languages
     • LLVM gives developers many tools to interpret or compile a language:
       – The intermediate representation
       – Lots of analyses and optimizations
     • We can work on a language that already exists, e.g., C, C++, Java, etc.
     • We can design our own language.
       – We need a front end to convert programs in the source language to LLVM IR.
     When is it worth designing a new language?
     [Figure: compilation pipeline, labeled with machine-independent optimizations (such as constant propagation) and machine-dependent optimizations (such as register allocation).]

  3. The Simple Calculator
     • To illustrate this capacity of LLVM, let's design a very simple programming language:
       – A program is a function application
       – A function contains only one argument x
       – Only the integer type exists
       – The function body contains only additions, multiplications, references to x, and integer constants in Polish notation
     1) Can you understand why we got each of these values?
     2) What is the grammar of our language?
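     The rules above can be tried out without LLVM. The following is a minimal sketch, not the course's code: it evaluates a prefix expression directly from its tokens. The names `evalPrefix` and `run`, and the choice to return 0 on malformed input, are assumptions made for this illustration.

```cpp
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Evaluates one expression in Polish (prefix) notation, reading
// whitespace-separated tokens from `in`. Hypothetical helper: the slides
// compile the expression with LLVM instead of interpreting it.
int evalPrefix(std::istringstream& in, int x) {
  std::string tk;
  if (!(in >> tk)) return 0;            // ran out of tokens (assumed policy)
  if (tk == "x") return x;              // reference to the single argument
  if (tk == "+" || tk == "*") {
    int lhs = evalPrefix(in, x);        // first operand
    int rhs = evalPrefix(in, x);        // second operand
    return tk == "+" ? lhs + rhs : lhs * rhs;
  }
  if (std::isdigit(static_cast<unsigned char>(tk[0])))
    return std::atoi(tk.c_str());       // integer constant
  return 0;                             // unknown token (assumed policy)
}

// Convenience wrapper: evaluate `program` with the argument bound to x.
int run(const std::string& program, int x) {
  std::istringstream in(program);
  return evalPrefix(in, x);
}
```

     With the argument 4, `run("* x x", 4)` gives 16, `run("+ x * x 2", 4)` gives 12, and `run("* x + x 2", 4)` gives 24, which are the results the driver reports on the execution-engine slide.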

  4. The Architecture of Our Compiler
     [Figure: class diagram connecting Lexer, Parser, Driver, and Expr, with NumExpr, VarExpr, AddExpr, and MulExpr below Expr.]
     1) Can you guess the meaning of the different arrows?
     2) Can you guess the role of each class?
     3) What would be a good execution mode for our system?

  5. The Execution Engine
     Our execution engine parses the expression, converts it to a function written in LLVM IR, JIT compiles this function, and runs it with the argument passed to the program on the command line.

         $> ./driver 4
         * x x
         Result: 16

         $> ./driver 4
         + x * x 2
         Result: 12

         $> ./driver 4
         * x + x 2
         Result: 24

     LLVM IR generated for the last example:

         ; ModuleID = 'Example'
         define i32 @fun(i32 %x) {
         entry:
           %addtmp = add i32 %x, 2
           %multmp = mul i32 %x, %addtmp
           ret i32 %multmp
         }

     Let's start with our lexer. Which tokens do we have?

  6. Lexer.h: The Lexer
     • A lexer is a program that divides a string of characters into tokens.
       – A token is a terminal in our grammar, i.e., a symbol that is part of the alphabet of our language.
       – Lexers can be easily implemented as finite automata.

         #ifndef LEXER_H
         #define LEXER_H
         #include <cstdio>   // getchar, EOF
         #include <string>
         class Lexer {
           public:
             std::string getToken();
             Lexer() : lastChar(' ') {}
           private:
             // Stored as int, the type getchar() returns, so that EOF
             // stays distinguishable from every valid character.
             int lastChar;
             inline char getNextChar() {
               char c = static_cast<char>(lastChar);
               lastChar = getchar();
               return c;
             }
         };
         #endif

     1) Again: which kind of tokens do we have?
     2) Can you guess the implementation of the getToken() method?

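  7. Lexer.cpp: Implementation of the Lexer

         #include <cctype>   // isspace, isalpha, isalnum, isdigit
         #include <cstdio>   // getchar, EOF
         #include "Lexer.h"
         std::string Lexer::getToken() {
           // Skip white space between tokens.
           while (isspace(lastChar)) {
             lastChar = getchar();
           }
           if (isalpha(lastChar)) {
             // Identifier: a letter followed by letters or digits.
             std::string idStr;
             do {
               idStr += getNextChar();
             } while (isalnum(lastChar));
             return idStr;
           } else if (isdigit(lastChar)) {
             // Number: a sequence of digits.
             std::string numStr;
             do {
               numStr += getNextChar();
             } while (isdigit(lastChar));
             return numStr;
           } else if (lastChar == EOF) {
             // End of input: the empty string marks "no more tokens".
             return "";
           } else {
             // Anything else is a single-character operator.
             std::string operatorStr;
             operatorStr = getNextChar();
             return operatorStr;
           }
         }

     1) Would you be able to represent this lexer as a state machine?
     2) We must now define the parser. How can we implement it?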
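     To experiment with the token rules without typing on standard input, the same three cases can be applied to a string. This is only a sketch under that assumption: `tokenize` is a name invented here, and it returns all tokens at once instead of one per call as getToken() does.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits `input` with the same rules as getToken(): identifiers (a letter
// followed by letters or digits), numbers (digits), and single-character
// operators. `tokenize` is a hypothetical helper, not part of the slides.
std::vector<std::string> tokenize(const std::string& input) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < input.size()) {
    unsigned char c = static_cast<unsigned char>(input[i]);
    if (std::isspace(c)) { ++i; continue; }      // start state: skip blanks
    std::size_t start = i;
    if (std::isalpha(c)) {                       // identifier state
      while (i < input.size() &&
             std::isalnum(static_cast<unsigned char>(input[i]))) ++i;
    } else if (std::isdigit(c)) {                // number state
      while (i < input.size() &&
             std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
    } else {                                     // operator: one character
      ++i;
    }
    tokens.push_back(input.substr(start, i - start));
  }
  return tokens;
}
```

     For example, `tokenize("+ x * x 2")` yields the five tokens `+`, `x`, `*`, `x`, `2`. Each branch corresponds to one state of the automaton that the first question asks about.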

  8. Parser.h: Parsing
     • Parsing is the act of transforming a string of tokens into a syntax tree ♤.

         #ifndef PARSER_H
         #define PARSER_H
         #include <string>
         class Expr;
         class Lexer;
         class Parser {
           public:
             Parser(Lexer* argLexer) : lexer(argLexer) {}
             Expr* parseExpr();
           private:
             Lexer* lexer;
         };
         #endif

     1) What are these forward declarations good for?
     2) Do you understand this syntax?
     3) What does the parser return?
     ♤: it used to be one of the most important problems in computer science.

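  9. Syntax Trees
     • The parser produces syntax trees.
     [Figure: the syntax trees of three expressions. "* x x" is a * node with children x and x; "+ x * x 2" is a + node with children x and the subtree * (x, 2); "* x + x 2" is a * node with children x and the subtree + (x, 2).]
     How can we implement these trees in C++?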

  10. Expr.h: The Nodes of the Tree

         #ifndef AST_H
         #define AST_H
         #include "llvm/IR/IRBuilder.h"

         class Expr {
           public:
             virtual ~Expr() {}
             virtual llvm::Value *gen(llvm::IRBuilder<> *builder,
                                      llvm::LLVMContext& con) const = 0;
         };

         class NumExpr : public Expr {
           public:
             NumExpr(int argNum) : num(argNum) {}
             llvm::Value *gen(llvm::IRBuilder<> *builder,
                              llvm::LLVMContext& con) const;
             static const unsigned int SIZE_INT = 32;
           private:
             const int num;
         };

         class VarExpr : public Expr {
           public:
             llvm::Value *gen(llvm::IRBuilder<> *builder,
                              llvm::LLVMContext& con) const;
             static llvm::Value* varValue;
         };

         class AddExpr : public Expr {
           public:
             AddExpr(Expr* op1Arg, Expr* op2Arg) : op1(op1Arg), op2(op2Arg) {}
             llvm::Value *gen(llvm::IRBuilder<> *builder,
                              llvm::LLVMContext& con) const;
           private:
             const Expr* op1;
             const Expr* op2;
         };

         class MulExpr : public Expr {
           public:
             MulExpr(Expr* op1Arg, Expr* op2Arg) : op1(op1Arg), op2(op2Arg) {}
             llvm::Value *gen(llvm::IRBuilder<> *builder,
                              llvm::LLVMContext& con) const;
           private:
             const Expr* op1;
             const Expr* op2;
         };
         #endif

     There is a gen method that is a bit weird. We shall look into it later.
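     The shape of this hierarchy can be explored without installing LLVM. The sketch below mirrors it with an `eval` method in place of `gen`. All names here (`Node`, `Num`, `Var`, `Add`, `Mul`, and the lowercase builder helpers) are hypothetical, and the use of `std::unique_ptr` is a choice of this sketch, not of the slides, which use raw pointers.

```cpp
#include <memory>
#include <utility>

// LLVM-free mirror of the Expr hierarchy: one abstract base class and
// one subclass per kind of node. eval() plays the role that gen() plays
// in the slides, but computes a value instead of producing IR.
struct Node {
  virtual ~Node() {}
  virtual int eval(int x) const = 0;
};
typedef std::unique_ptr<Node> NodePtr;

struct Num : Node {                       // integer constant
  explicit Num(int n) : value(n) {}
  int eval(int) const override { return value; }
  const int value;
};
struct Var : Node {                       // the single variable x
  int eval(int x) const override { return x; }
};
struct Add : Node {                       // op1 + op2
  Add(NodePtr a, NodePtr b) : op1(std::move(a)), op2(std::move(b)) {}
  int eval(int x) const override { return op1->eval(x) + op2->eval(x); }
  NodePtr op1, op2;
};
struct Mul : Node {                       // op1 * op2
  Mul(NodePtr a, NodePtr b) : op1(std::move(a)), op2(std::move(b)) {}
  int eval(int x) const override { return op1->eval(x) * op2->eval(x); }
  NodePtr op1, op2;
};

// Small builders so that trees can be written as nested calls.
NodePtr num(int n) { return NodePtr(new Num(n)); }
NodePtr var() { return NodePtr(new Var()); }
NodePtr add(NodePtr a, NodePtr b) { return NodePtr(new Add(std::move(a), std::move(b))); }
NodePtr mul(NodePtr a, NodePtr b) { return NodePtr(new Mul(std::move(a), std::move(b))); }
```

     The tree for `* x + x 2` is then `mul(var(), add(var(), num(2)))`, and calling `eval(4)` on it gives 24.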

  11. Going Back into the Parser
     • Our parser will build a syntax tree.
     [Figure: the Parser receives "+ x * x 2" and builds the tree + (x, * (x, 2)) through the calls

         new AddExpr(
             new VarExpr(),
             new MulExpr(
                 new VarExpr(),
                 new NumExpr(2)
             )
         )
     ]
     The Polish notation really simplifies parsing. We already have the tree, and without parentheses!
     So, how can we implement our parser?
     [Photo: Jan Łukasiewicz, father of the Polish notation]

  12. Parser.cpp: The Parser's Implementation

         #include <cctype>    // isdigit
         #include <cstdlib>   // atoi
         #include "Expr.h"
         #include "Lexer.h"
         #include "Parser.h"
         Expr* Parser::parseExpr() {
           std::string tk = lexer->getToken();
           if (tk == "") {
             return NULL;
           } else if (isdigit(tk[0])) {
             return new NumExpr(atoi(tk.c_str()));
           } else if (tk[0] == 'x') {
             return new VarExpr();
           } else if (tk[0] == '+') {
             Expr *op1 = parseExpr();
             Expr *op2 = parseExpr();
             return new AddExpr(op1, op2);
           } else if (tk[0] == '*') {
             Expr *op1 = parseExpr();
             Expr *op2 = parseExpr();
             return new MulExpr(op1, op2);
           } else {
             return NULL;
           }
         }

     1) Why is checking the first character of each token already enough to avoid any ambiguity?
     2) Now we need a way to translate trees into LLVM IR. How can we do it?
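     One way to see the tree that this recursion recovers is to print it back with explicit parentheses. The sketch below follows the same recursive structure as parseExpr(), but returns a fully parenthesized infix string instead of Expr nodes. `parenthesize` and `toInfix` are names invented for this illustration, and the "?" placeholder for missing input is an assumption.

```cpp
#include <sstream>
#include <string>

// Reads one prefix expression from `in` and returns it in fully
// parenthesized infix form. An operator consumes exactly two
// sub-expressions; any other token is treated as a leaf.
std::string parenthesize(std::istringstream& in) {
  std::string tk;
  if (!(in >> tk)) return "?";                 // missing operand (assumed marker)
  if (tk == "+" || tk == "*") {
    std::string lhs = parenthesize(in);        // first sub-expression
    std::string rhs = parenthesize(in);        // second sub-expression
    return "(" + lhs + " " + tk + " " + rhs + ")";
  }
  return tk;                                   // x or an integer constant
}

// Convenience wrapper over a whole program string.
std::string toInfix(const std::string& program) {
  std::istringstream in(program);
  return parenthesize(in);
}
```

     Here `toInfix("+ x * x 2")` gives `(x + (x * 2))` and `toInfix("* x + x 2")` gives `(x * (x + 2))`: the grouping that infix notation needs parentheses for is already fixed by the token order in prefix notation.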

  13. Expr.cpp: The Translator

         #include "Expr.h"

         llvm::Value* VarExpr::varValue = NULL;

         llvm::Value* NumExpr::gen
         (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
           return llvm::ConstantInt::get
             (llvm::Type::getInt32Ty(context), num);
         }

         llvm::Value* VarExpr::gen
         (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
           llvm::Value* var = VarExpr::varValue;
           return var ? var : NULL;
         }

         llvm::Value* AddExpr::gen
         (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
           llvm::Value* v1 = op1->gen(builder, context);
           llvm::Value* v2 = op2->gen(builder, context);
           return builder->CreateAdd(v1, v2, "addtmp");
         }

         llvm::Value* MulExpr::gen
         (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
           llvm::Value* v1 = op1->gen(builder, context);
           llvm::Value* v2 = op2->gen(builder, context);
           return builder->CreateMul(v1, v2, "multmp");
         }

     Our implementation has a small hack: our language has only one variable, which we have decided to call 'x'. This variable must be represented by an LLVM value, which is the argument of the function that we will create. Thus, we need a way to inform the translator of this value. We do it through a static variable varValue. That is the only static variable that we are using in this class.
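     The gen methods walk the tree in post-order: both operands are translated first, and only then is the instruction that combines them emitted. The sketch below imitates that walk by writing text that resembles LLVM IR, without using LLVM at all. Everything in it is hypothetical (`Tree`, `leaf`, `node`, `genText`, `translate`), and the temporaries are simply numbered `%t0`, `%t1`, and so on, which is not how IRBuilder names values.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Compact tree node for this sketch only (the slides use one class per
// kind): kind is 'n' for a constant, 'x' for the variable, and '+' or
// '*' for the two operators.
struct Tree {
  char kind;
  int num;
  std::shared_ptr<Tree> op1, op2;
};

std::shared_ptr<Tree> leaf(char kind, int num = 0) {
  return std::make_shared<Tree>(Tree{kind, num, nullptr, nullptr});
}
std::shared_ptr<Tree> node(char kind, std::shared_ptr<Tree> a,
                           std::shared_ptr<Tree> b) {
  return std::make_shared<Tree>(Tree{kind, 0, a, b});
}

// Post-order walk: translate both operands, then emit the instruction
// that combines them. Returns the text that names this node's value.
std::string genText(const Tree& t, std::ostringstream& out, int& next) {
  if (t.kind == 'n') return std::to_string(t.num);   // constant: no instruction
  if (t.kind == 'x') return "%x";                    // the function argument
  std::string v1 = genText(*t.op1, out, next);
  std::string v2 = genText(*t.op2, out, next);
  std::string name = "%t" + std::to_string(next++);
  out << name << " = " << (t.kind == '+' ? "add" : "mul")
      << " i32 " << v1 << ", " << v2 << "\n";
  return name;
}

// Emits the body of the function: the instructions, then the return.
std::string translate(const Tree& t) {
  std::ostringstream out;
  int next = 0;
  std::string result = genText(t, out, next);
  out << "ret i32 " << result << "\n";
  return out.str();
}
```

     For the tree of `* x + x 2`, built as `node('*', leaf('x'), node('+', leaf('x'), leaf('n', 2)))`, `translate` produces an `add` of `%x` and 2, then a `mul` of `%x` and that result, then the `ret`. This is the same instruction order as the IR listed on the execution-engine slide, only with different value names.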

  14. Driver.cpp: The Driver's Skeleton

         int main(int argc, char** argv) {
           if (argc != 2) {
             llvm::errs() << "Inform an argument to your expression.\n";
             return 1;
           } else {
             llvm::LLVMContext context;
             llvm::Module *module = new llvm::Module("Example", context);
             llvm::Function *function = createEntryFunction(module, context);
             module->dump();
             llvm::ExecutionEngine* engine = createEngine(module);
             JIT(engine, function, atoi(argv[1]));
           }
         }

     The procedure that creates an LLVM function is not that complicated. Can you guess its implementation?

  15. Driver.cpp: Creating an LLVM Function
     This code is not "that" complicated, but it is not super straightforward either, so we will go a bit more carefully over it.

         llvm::Function *createEntryFunction(
             llvm::Module *module,
             llvm::LLVMContext &context) {
           llvm::Function *function =
             llvm::cast<llvm::Function>(
               module->getOrInsertFunction("fun",
                 llvm::Type::getInt32Ty(context),
                 llvm::Type::getInt32Ty(context),
                 (llvm::Type *)0)
             );
           // Create the entry block and point the builder at it.
           llvm::BasicBlock *bb = llvm::BasicBlock::Create(context, "entry", function);
           llvm::IRBuilder<> builder(context);
           builder.SetInsertPoint(bb);
           // Name the argument and publish it to the translator.
           llvm::Argument *argX = function->arg_begin();
           argX->setName("x");
           VarExpr::varValue = argX;
           // Parse the expression from standard input and translate it.
           Lexer lexer;
           Parser parser(&lexer);
           Expr* expr = parser.parseExpr();
           llvm::Value* retVal = expr->gen(&builder, context);
           builder.CreateRet(retVal);
           return function;
         }

     Let's start with this humongous call (getOrInsertFunction). What do you think it is doing?
