Programming Languages, Janyl Jumadinova, September 10-15, 2020

  1. Programming Languages
     Janyl Jumadinova, September 10-15, 2020

  2. Most Important Steps in Compilation

  3. Lexical Analysis
     Lexical analysis produces a “token stream” in which the program is reduced to a sequence of tokens, each consisting of a token type (with its identifying number) and the actual string in the program that corresponds to it.
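
As a rough illustration of such a token stream, here is a minimal scanner sketch in Python. The token types, their numbering, and the function name `token_stream` are assumptions made for this example; they are not taken from the slides.

```python
import re

# Hypothetical token types for a toy language; the order fixes each type's number.
TOKEN_SPEC = [
    ("WHILE", r"while\b"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("INT", r"[0-9]+"),
    ("LE", r"<="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),  # white space separates tokens and is then dropped
]
TYPE_NUMBER = {name: number for number, (name, _) in enumerate(TOKEN_SPEC)}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def token_stream(source):
    """Return a list of (type name, type number, actual string) triples."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "SKIP":
            tokens.append((kind, TYPE_NUMBER[kind], match.group()))
        pos = match.end()
    return tokens
```

Calling `token_stream("while (x <= 10)")` yields one triple per token, starting with `("WHILE", 0, "while")`; the white space is consumed but never appears in the stream.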

  5. Lexical Analysis
     For each token type, give a description, either:
     - a literal string, such as “≤” or “while”, to describe an operator or reserved word; or
     - a <rule>: the rule <unsigned int> might stand for “a sequence of one or more digits”; the rule <identifier> might stand for “a letter followed by a sequence of zero or more letters or digits.”
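
The two kinds of description can be sketched in Python as follows. The literal set, the regular expressions standing in for <unsigned int> and <identifier>, and the helper name `describe` are assumptions chosen for illustration.

```python
import re

# Literal descriptions: the lexeme must equal the string exactly.
LITERALS = {"≤", "while"}

# Rule descriptions, written as regular expressions (an assumed encoding):
UNSIGNED_INT = re.compile(r"[0-9]+")               # one or more digits
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # a letter, then letters or digits


def describe(lexeme):
    """Say which kind of description matches the lexeme, or None if none does."""
    if lexeme in LITERALS:  # checked first, so "while" is not taken for an identifier
        return "literal"
    if UNSIGNED_INT.fullmatch(lexeme):
        return "unsigned int"
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    return None
```

Checking literals before rules is one simple way to keep reserved words from being classified as identifiers; real scanners may resolve that overlap differently.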

  10. Typical Tokens in Programming Languages
      - Operators and punctuation: + - * / ( ) [ ] ; : :: < <= == = != ! ...
        Each of these is a distinct lexical class.
      - Keywords: if while for goto return switch void ...
        Each of these is also a distinct lexical class (not a string).
      - Identifiers (variables): a single ID lexical class, but parameterized by the actual identifier (often a pointer into a symbol table).
      - Integer constants: a single INT lexical class, but parameterized by the numeric value.
      - Other constants (string, floating point, boolean, ...), etc.
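
One possible way to model these lexical classes is a token record with an optional attribute. In this sketch the `Token` class, the `classify` helper, and the use of a list index as the "pointer" into the symbol table are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

KEYWORDS = {"if", "while", "for", "goto", "return", "switch", "void"}


@dataclass(frozen=True)
class Token:
    lexical_class: str               # e.g. "WHILE", "<=", "ID", "INT"
    attribute: Optional[int] = None  # symbol-table index for ID, numeric value for INT


def classify(lexeme, symbols):
    """Map one non-empty lexeme to a Token; `symbols` is a list used as the symbol table."""
    if lexeme in KEYWORDS:
        return Token(lexeme.upper())      # each keyword is its own class
    if lexeme.isascii() and lexeme.isdigit():
        return Token("INT", int(lexeme))  # one INT class, parameterized by value
    if lexeme[0].isalpha() and lexeme.isalnum():
        if lexeme not in symbols:
            symbols.append(lexeme)
        return Token("ID", symbols.index(lexeme))  # one ID class, parameterized by entry
    return Token(lexeme)                  # each operator or punctuation mark is its own class
```

With an initially empty list `symbols`, `classify("count", symbols)` returns `Token("ID", 0)` and records `count` in the table, so a later occurrence of the same name maps to the same entry.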

  11. Lexical Complications
      Most modern languages are free-form: layout doesn’t matter, and white space separates tokens.
      Alternatives: in Haskell and Python, indentation and layout can imply grouping.
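
Python's own scanner shows how layout can be turned into tokens. The sketch below uses the standard-library `tokenize` module; the sample source text and the helper name `layout_tokens` are made up for this example.

```python
import io
import tokenize

# A made-up snippet: the indented line is grouped under the `if`.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def layout_tokens(source):
    """Return the names of the layout tokens (INDENT/DEDENT) found in the source."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names
```

For `SOURCE`, the scanner emits an `INDENT` token before `y = 1` and a `DEDENT` token before `z = 2`, so the grouping implied by indentation reaches the parser as ordinary tokens.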

  12. Regular Expressions used for Scanning
      Defined over some alphabet Σ. For programming languages, the alphabet is usually ASCII or Unicode.
      If re is a regular expression, L(re) is the language (set of strings) generated by re.

  13. Fundamentals of Regular Expressions (REs)
      These are the basic building blocks that other REs are built from.

  15. Operations on REs
      Precedence, from highest to lowest: (R), R*, R1 R2, R1 | R2.
      Parentheses can be used to group REs as needed.
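
The effect of this precedence can be checked with Python's `re` module, whose pattern syntax follows the same ordering for `*`, concatenation, and `|`. The helper name `accepts` is an assumption for this sketch.

```python
import re


def accepts(pattern, text):
    """True if the whole text belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


# Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
star_before_concat = accepts("ab*", "abbb") and not accepts("ab*", "abab")
grouped_star = accepts("(ab)*", "abab")

# Concatenation binds tighter than alternation: a|bc means a|(bc), not (a|b)c.
concat_before_alt = accepts("a|bc", "a") and not accepts("a|bc", "ac")
grouped_alt = accepts("(a|b)c", "ac") and not accepts("(a|b)c", "a")
```

All four flags evaluate to `True`; the parenthesized patterns show how grouping overrides the default binding.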

  16. Examples

  17. Abbreviations on REs
      There are common abbreviations used for convenience.

  18. Example
      Possible syntax for numeric constants:
        digit  ::= [0-9]
        digits ::= digit+
        number ::= digits ( . digits )? ( [eE] (+ | -)? digits )?
      Notice that this allows (unnecessary) leading 0s, e.g., 00045.6. (The 0 in 0 or in 0.14 is a necessary 0.)
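
A direct transcription of this definition into a Python regular expression might look like the sketch below. The names `NUMBER` and `is_number` are assumptions; the literal `.` is escaped because an unescaped dot matches any character in Python's syntax.

```python
import re

# digits ( . digits )? ( [eE] (+ | -)? digits )?
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_number(text):
    """True if the whole text is a numeric constant under the definition above."""
    return NUMBER.fullmatch(text) is not None
```

Under this definition `is_number("00045.6")` is `True`, which is exactly the leading-zero behavior the slide points out, while strings such as `"1."` or `".5"` are rejected because digits are required on both sides of the point.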

  19. Example
      Possible syntax for numeric constants:
        digit         ::= [0-9]
        nonzero_digit ::= [1-9]
        digits        ::= digit+
        number        ::= (0 | nonzero_digit digits?) ( . digits )? ( [eE] (+ | -)? digits )?
      Requiring the integer part to be either 0 or to start with a nonzero digit rules out unnecessary leading 0s.
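
The named rules can also be composed step by step, much as the grammar does. This is a sketch under the assumption that each rule is kept as a Python string and spliced into the next; the constant names and `is_strict_number` are hypothetical.

```python
import re

DIGIT = "[0-9]"
NONZERO_DIGIT = "[1-9]"
DIGITS = f"{DIGIT}+"

# (0 | nonzero digit digits?) ( . digits )? ( [eE] (+ | -)? digits )?
NUMBER = re.compile(
    rf"(0|{NONZERO_DIGIT}({DIGITS})?)(\.{DIGITS})?([eE][+-]?{DIGITS})?"
)


def is_strict_number(text):
    """True if the whole text is a numeric constant without extra leading 0s."""
    return NUMBER.fullmatch(text) is not None
```

Here `is_strict_number("00045.6")` is `False` while `"0"`, `"0.14"`, and `"45.6"` are still accepted. Leading 0s remain legal after the point and in the exponent, since those parts are plain digits in the definition.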

  20. RE Practice: https://regexone.com/

  23. Syntactic Analysis
      The syntax of a language is described by a grammar that specifies the legal combinations of tokens.
      Grammars are often specified in BNF notation (“Backus-Naur Form”):
        <item1> ::= valid replacements for <item1>
        <item2> ::= valid replacements for <item2>

  24. Alternative Notations
      There are several syntax notations for productions in common use; all mean the same thing. E.g.:
        ifStmt ::= if ( expr ) statement
        ifStmt → if ( expr ) statement
        <ifStmt> ::= if ( <expr> ) <statement>

  28. Example: Grammar for Pigese (or Pigish?)
      A formal grammar for a “pig language” could be:
        PigTalk ::= oink PigTalk    (Rule 1)
                  | oink!           (Rule 2)
      PigTalk can then generate, for example:
        1. PigTalk ::= oink!                (Rule 2)
        2. PigTalk ::= oink PigTalk         (Rule 1)
                   ::= oink oink!           (Rule 2)
        3. PigTalk ::= oink PigTalk         (Rule 1)
                   ::= oink oink PigTalk    (Rule 1)
                   ::= oink oink oink!      (Rule 2)
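
The two rules translate almost line for line into a small generator and a recursive recognizer. In this sketch a sentence is assumed to be a list of tokens, each either `oink` or `oink!`; the function names are hypothetical.

```python
def generate(n):
    """Apply Rule 1 n times, then finish with Rule 2."""
    return ["oink"] * n + ["oink!"]


def is_pigtalk(tokens):
    """Check a token list against the grammar, one rule per branch."""
    if tokens == ["oink!"]:              # Rule 2: PigTalk ::= oink!
        return True
    if tokens and tokens[0] == "oink":   # Rule 1: PigTalk ::= oink PigTalk
        return is_pigtalk(tokens[1:])
    return False
```

For example, `generate(2)` gives the tokens of `oink oink oink!`, matching derivation 3, and `is_pigtalk(["oink"])` is `False` because every derivation must end with Rule 2.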

  32. Grammars (Context-free Grammars)
      - A collection of VARIABLES (things that can be replaced by other things), also called NON-TERMINALS.
      - A collection of TERMINALS (“constants”, strings that can’t be replaced).
      - One special variable called the START SYMBOL.
      - A collection of RULES, also called PRODUCTIONS.
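
These four components map naturally onto a small record type. The sketch below encodes the PigTalk grammar from the earlier slide; the `Grammar` class, its field names, and the `apply` helper are assumptions for illustration, not a standard representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset  # non-terminals
    terminals: frozenset  # strings that cannot be replaced
    start: str            # the start symbol
    productions: tuple    # pairs (variable, tuple of replacement symbols)

    def is_well_formed(self):
        """Check that the four components are consistent with each other."""
        if self.start not in self.variables or self.variables & self.terminals:
            return False
        symbols = self.variables | self.terminals
        return all(
            head in self.variables and all(s in symbols for s in body)
            for head, body in self.productions
        )


PIG = Grammar(
    variables=frozenset({"PigTalk"}),
    terminals=frozenset({"oink", "oink!"}),
    start="PigTalk",
    productions=(
        ("PigTalk", ("oink", "PigTalk")),  # Rule 1
        ("PigTalk", ("oink!",)),           # Rule 2
    ),
)


def apply(grammar, form, rule_index):
    """Replace the leftmost occurrence of the rule's variable in `form` (a tuple)."""
    head, body = grammar.productions[rule_index]
    i = form.index(head)  # raises ValueError if the variable does not occur
    return form[:i] + body + form[i + 1:]
```

Starting from `(PIG.start,)`, applying rule index 0 and then rule index 1 yields `("oink", "oink!")`, a form with no variables left.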
