Programming Languages Janyl Jumadinova September 10-15, 2020
Most Important Steps in Compilation
Lexical Analysis
Lexical analysis produces a “token stream” in which the program is reduced to a sequence of token types, each with its identifying number and the actual string (in the program) corresponding to it.
Lexical Analysis
For each token type, give a description, either:
  a literal string, such as “≤” or “while”, to describe an operator or reserved word; or
  a <rule>: the rule <unsigned int> might stand for “a sequence of one or more digits”; the rule <identifier> might stand for “a letter followed by a sequence of zero or more letters or digits.”
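As a minimal sketch, the two example rules can be written as Python regular expressions. The constant names and the `matches` helper are hypothetical, and the character ranges assume ASCII letters and digits, which the slide does not specify.

```python
import re

# Hypothetical names; each pattern follows the wording of the rule on the slide.
UNSIGNED_INT = re.compile(r"[0-9]+")              # a sequence of one or more digits
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # a letter, then zero or more letters or digits


def matches(rule, text):
    """Return True when the whole string fits the rule."""
    return rule.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["2020", "x1", "1x"]:
        print(text, matches(UNSIGNED_INT, text), matches(IDENTIFIER, text))
```

Using `fullmatch` rather than `match` matters here: it requires the rule to cover the entire string, so `1x` is rejected as an identifier.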
Typical Tokens in Programming Languages
Operators and Punctuation
  + - * / ( ) [ ] ; : :: < <= == = != ! ...
  Each of these is a distinct lexical class
Keywords
  if while for goto return switch void ...
  Each of these is also a distinct lexical class (not a string)
Identifiers (variables)
  A single ID lexical class, but parameterized by the actual identifier (often a pointer into a symbol table)
Integer constants
  A single INT lexical class, but parameterized by numeric value
Other constants (string, floating point, boolean, ...), etc.
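The token classes above can be illustrated with a small scanner. This is a sketch under stated assumptions, not the course's implementation: the `(class, value)` tuple layout, the upper-case keyword class names, and the `tokenize` function are all hypothetical choices, and identifiers carry their text directly instead of a symbol-table pointer.

```python
import re

KEYWORDS = {"if", "while", "for", "goto", "return", "switch", "void"}

# Longer operators are listed before their prefixes so "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(r"""
    (?P<INT>[0-9]+)
  | (?P<ID>[A-Za-z][A-Za-z0-9]*)
  | (?P<OP>::|<=|==|!=|[-+*/()\[\];:<=!])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Turn source text into a list of (lexical class, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                                 # white space only separates tokens
        if kind == "INT":
            tokens.append(("INT", int(text)))        # one class, parameterized by value
        elif kind == "ID" and text in KEYWORDS:
            tokens.append((text.upper(), None))      # each keyword is its own class
        elif kind == "ID":
            tokens.append(("ID", text))              # one class, parameterized by name
        else:
            tokens.append((text, None))              # each operator is its own class
    return tokens


if __name__ == "__main__":
    print(tokenize("while (x1 <= 42) x1 = x1 + 1;"))
```

Keywords are matched by the identifier rule first and then looked up in a set, a common way to keep the regular expression small while still giving each keyword a distinct class.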
Lexical Complications
Most modern languages are free-form
  Layout doesn’t matter
  White space separates tokens
Alternatives
  Haskell, Python: indentation and layout can imply grouping
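Python's own scanner shows how layout can imply grouping: the standard-library `tokenize` module emits explicit INDENT and DEDENT tokens where the indentation level changes. The sketch below only collects those two token types; the `SOURCE` string and the `layout_tokens` helper are hypothetical.

```python
import io
import token
import tokenize

# Hypothetical sample program: one indented block followed by a top-level statement.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def layout_tokens(source):
    """Return the names of the INDENT/DEDENT tokens Python's scanner emits."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (token.INDENT, token.DEDENT):
            names.append(token.tok_name[tok.type])
    return names


if __name__ == "__main__":
    print(layout_tokens(SOURCE))
```

A program with no indented block produces neither token, so the parser sees grouping only where the layout introduces it.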
Regular Expressions used for Scanning
Defined over some alphabet Σ. For programming languages, the alphabet is usually ASCII or Unicode.
If re is a regular expression, L(re) is the language (set of strings) generated by re.
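To make L(re) concrete, the sketch below lists every string up to a given length over a small alphabet that a pattern generates. The `language` helper is hypothetical, and it uses Python's `re` syntax, which is richer than the formal regular expressions discussed here, so it should be read as an illustration only.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that belong to L(pattern)."""
    compiled = re.compile(pattern)
    members = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string.
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                members.append(s)
    return members


if __name__ == "__main__":
    print(language("a|b", "ab", 2))
    print(language("a*", "ab", 3))
```

L(re) is often infinite (for example, L(a*)), which is why the sketch caps the string length instead of trying to return the whole set.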
Fundamentals of Regular Expressions (REs)
These are the basic building blocks that other REs are built from.
Operations on REs
Precedence, from highest to lowest: (R), R*, R1R2, R1 | R2. Parentheses can be used to group REs as needed.
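The precedence order can be checked with Python's `re` module, which follows the same ordering for these operators. The `accepts` helper and the sample patterns are hypothetical and chosen only to show where parentheses change the meaning.

```python
import re


def accepts(pattern, text):
    """Return True when `text` as a whole is in the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
    print(accepts("ab*", "abbb"), accepts("ab*", "abab"), accepts("(ab)*", "abab"))
    # Concatenation binds tighter than alternation: a|bc means a|(bc), not (a|b)c.
    print(accepts("a|bc", "bc"), accepts("a|bc", "ac"), accepts("(a|b)c", "ac"))
```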
Examples
Abbreviations on REs
There are common abbreviations used for convenience.
Example
Possible syntax for numeric constants
  digit ::= [0-9]
  digits ::= digit+
  number ::= digits ( . digits )? ( [eE] (+ | -)? digits )?
Notice that this allows unnecessary leading 0s, e.g., 00045.6. (The 0s in 0 and 0.14, by contrast, are necessary.)
Example
Possible syntax for numeric constants
  digit ::= [0-9]
  nonzero_digit ::= [1-9]
  digits ::= digit+
  number ::= (0 | nonzero_digit digits?) ( . digits )? ( [eE] (+ | -)? digits )?
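The two numeric-constant definitions can be compared side by side by translating them into Python patterns. This is a sketch: the constant names (`NUMBER_LOOSE`, `NUMBER_STRICT`, and the fragments they are built from) are hypothetical, and the translation assumes the `.` in the grammar is a literal decimal point.

```python
import re

# Fragments named after the grammar rules on the slides.
DIGIT = r"[0-9]"
NONZERO_DIGIT = r"[1-9]"
DIGITS = rf"{DIGIT}+"
FRACTION = rf"(\.{DIGITS})?"           # ( . digits )?
EXPONENT = rf"([eE][+-]?{DIGITS})?"    # ( [eE] (+ | -)? digits )?

# First version: number ::= digits ( . digits )? ( [eE] (+ | -)? digits )?
NUMBER_LOOSE = re.compile(DIGITS + FRACTION + EXPONENT)

# Second version: number ::= (0 | nonzero_digit digits?) ...
NUMBER_STRICT = re.compile(rf"(0|{NONZERO_DIGIT}({DIGITS})?)" + FRACTION + EXPONENT)


def is_number(rule, text):
    """Return True when the whole string is a number under the given rule."""
    return rule.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["00045.6", "0", "0.14", "45.6e-3", "45."]:
        print(text, is_number(NUMBER_LOOSE, text), is_number(NUMBER_STRICT, text))
```

Only the integer part differs between the two patterns, which is why 00045.6 is accepted by the first and rejected by the second while 0 and 0.14 pass both.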
RE Practice: https://regexone.com/
Syntactic Analysis
The syntax of a language is described by a grammar that specifies the legal combinations of tokens.
Grammars are often specified in BNF notation (“Backus-Naur Form”):
  <item1> ::= valid replacements for <item1>
  <item2> ::= valid replacements for <item2>
Alternative Notations
There are several syntax notations for productions in common use; all mean the same thing. E.g.:
  ifStmt ::= if ( expr ) statement
  ifStmt → if ( expr ) statement
  <ifStmt> ::= if ( <expr> ) <statement>
Example: Grammar for Pigese (or Pigish?)
A formal grammar for a “pig language” could be:
  PigTalk ::= oink PigTalk   (Rule 1)
            | oink!          (Rule 2)
PigTalk can then generate, for example:
  1. PigTalk ::= oink!                 (Rule 2)
  2. PigTalk ::= oink PigTalk          (Rule 1)
             ::= oink oink!            (Rule 2)
  3. PigTalk ::= oink PigTalk          (Rule 1)
             ::= oink oink PigTalk     (Rule 1)
             ::= oink oink oink!       (Rule 2)
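The two PigTalk rules map directly onto a pair of recursive functions, one that generates a sentence and one that recognizes it. The function names `pig_talk` and `is_pig_talk` are hypothetical, and the recognizer assumes its input has already been split into the tokens `oink` and `oink!`.

```python
def pig_talk(oinks):
    """Derive the sentence with `oinks` oinks: Rule 1 applied (oinks - 1) times, then Rule 2."""
    if oinks < 1:
        raise ValueError("PigTalk derives at least one oink")
    if oinks == 1:
        return "oink!"                        # Rule 2: PigTalk ::= oink!
    return "oink " + pig_talk(oinks - 1)      # Rule 1: PigTalk ::= oink PigTalk


def is_pig_talk(words):
    """Recognize a list of tokens by mirroring the two rules."""
    if words == ["oink!"]:
        return True                           # Rule 2
    if words[:1] == ["oink"]:
        return is_pig_talk(words[1:])         # Rule 1
    return False


if __name__ == "__main__":
    sentence = pig_talk(3)
    print(sentence, is_pig_talk(sentence.split()))
```

Each recursive call corresponds to one application of Rule 1, so the call depth matches the number of derivation steps shown in the example.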
Grammars (Context-free Grammars)
  Collection of VARIABLES (things that can be replaced by other things), also called NON-TERMINALS.
  Collection of TERMINALS (“constants”, strings that can’t be replaced).
  One special variable called the START SYMBOL.
  Collection of RULES, also called PRODUCTIONS.
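The four components can be held in a small data structure. The sketch below is one possible encoding, not a standard one: the `Grammar` class, its field names, and the `derive` helper are hypothetical, and the PigTalk grammar from the earlier example is used as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset      # non-terminals
    terminals: frozenset      # strings that can't be replaced
    start: str                # the start symbol
    productions: tuple        # (variable, replacement) pairs, numbered from 1


PIG = Grammar(
    variables=frozenset({"PigTalk"}),
    terminals=frozenset({"oink", "oink!"}),
    start="PigTalk",
    productions=(
        ("PigTalk", ("oink", "PigTalk")),     # Rule 1
        ("PigTalk", ("oink!",)),              # Rule 2
    ),
)


def derive(grammar, rule_numbers):
    """Apply the numbered rules in order, each to the leftmost variable, from the start symbol."""
    form = [grammar.start]
    for number in rule_numbers:
        head, body = grammar.productions[number - 1]
        index = next((i for i, sym in enumerate(form) if sym in grammar.variables), None)
        if index is None or form[index] != head:
            raise ValueError(f"rule {number} does not apply to {form}")
        form[index:index + 1] = body          # replace the variable with the rule's right side
    return form


if __name__ == "__main__":
    print(derive(PIG, [1, 1, 2]))
```

A derivation is finished once the list contains only terminals; applying another rule after that raises an error because no variable is left to replace.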