Parsing Principles of Programming Languages Colorado School of - PowerPoint PPT Presentation

Parsing
Principles of Programming Languages
Colorado School of Mines
https://lambda.mines.edu
CSCI-400

Activity & Overview

Learning Group Activity

Review the learning group activity with your group. Compare your solutions to the practice problems. Did anyone have any issues with the problems?

Then, as a learning group, work on a regular expression to match double-quoted string literals. In each line below, what you should match is the double-quoted string literal:

print("Hello, World!")
if (strcspn(cmdline, "'\"`") != strlen(cmdline)) {
printf("<text:p text:style-name=\"Glossary\">");
escape("\"1 < 2\"")

Try your regular expressions at the Python REPL.
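One candidate answer, offered as a sketch rather than the official solution: it assumes C-style escapes, where a backslash escapes the character after it (as in all four examples). The helper name `find_string_literals` is hypothetical.

```python
import re

# Candidate pattern (an assumption, not the official solution):
# an opening quote, then any run of characters that are neither a quote
# nor a backslash, or a backslash escape such as \" , then a closing quote.
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')

def find_string_literals(line):
    """Return every double-quoted string literal in `line`, quotes included."""
    return STRING_LITERAL.findall(line)

# Raw strings keep the backslashes exactly as they appear in the source lines.
examples = [
    r'''print("Hello, World!")''',
    r'''if (strcspn(cmdline, "'\"`") != strlen(cmdline)) {''',
    r'''printf("<text:p text:style-name=\"Glossary\">");''',
    r'''escape("\"1 < 2\"")''',
]

for line in examples:
    print(find_string_literals(line))
```

The `\\.` alternative is what lets an escaped quote such as `\"` stay inside the literal instead of ending it.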

Parsing: High Level Overview

Suppose we have the following source code (which, presumably, might be for some programming language):

alpha = beta + gamma * 4

How does our language implementation know what to do with this code?
How can we represent this code in memory in a way that makes it easy to evaluate or compile?
How do we determine the order of operations on this expression so that we compute beta + (gamma * 4) rather than (beta + gamma) * 4?
How do we handle cases where programmers write the same expression but with different spacing or style, like this: alpha=beta+gamma *4

Parsing: Goal is Code to AST

The goal of parsing is to convert textual source code into a representation that makes it easy to interpret or compile. We typically represent this using an abstract syntax tree. The abstract syntax tree for alpha = beta + gamma * 4 is shown.

[Figure: abstract syntax tree for alpha = beta + gamma * 4, with nodes Assign (id: alpha), Sum, Product, and Value leaves (id: beta, id: gamma, int: 4)]

The tree conveys order of operations and nesting of parentheses: Product is a child of Sum here.
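To see why nesting Product under Sum fixes the order of operations, here is a minimal sketch that encodes the tree as nested Python tuples and evaluates it. The tuple encoding, the omission of the figure's Value wrapper nodes, and the values chosen for beta and gamma are all assumptions for illustration.

```python
# Hypothetical tuple encoding of the AST (not from the slides):
#   ("Assign", name, expr), ("Sum", left, right), ("Product", left, right),
#   ("Id", name), ("Int", value)
tree = ("Assign", "alpha",
        ("Sum", ("Id", "beta"),
                ("Product", ("Id", "gamma"), ("Int", 4))))

def evaluate(node, env):
    """Evaluate an AST node; an Assign node stores its result in env."""
    kind = node[0]
    if kind == "Int":
        return node[1]
    if kind == "Id":
        return env[node[1]]
    if kind == "Sum":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "Product":
        return evaluate(node[1], env) * evaluate(node[2], env)
    if kind == "Assign":
        env[node[1]] = evaluate(node[2], env)
        return env[node[1]]
    raise ValueError(f"unknown node type {kind!r}")

env = {"beta": 1, "gamma": 2}
evaluate(tree, env)
print(env["alpha"])  # 9, i.e. 1 + (2 * 4), not (1 + 2) * 4 = 12
```

Because the Product subtree is evaluated before the Sum that contains it, no parentheses need to be stored in the tree.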

Parsing: Two Stages

Parsers are typically implemented using two stages:
Lexical Analysis: During lexical analysis, the input is tokenized to produce a sequence of tokens.
Syntactic Analysis: During syntactic analysis, the tokens from lexical analysis are formed into an abstract syntax tree.

Lexical Analysis

Lexical Analysis

During lexical analysis, we tokenize the input into a list of tokens, each consisting of two fields: a token type and (optionally) data. Tokens which won't appear in the AST are called control tokens: these control the operation of the parser.

Lexical analysis (LA) of alpha=beta+gamma*4 produces: Id(alpha), Equals, Id(beta), Plus, Id(gamma), Times, Int(4)

Lexical Analysis: Implementation

import re

# Equals, Id, and the other token classes are assumed to be defined elsewhere.
tokens_p = re.compile(r'''
    \s*(?:
        (=)|(\+)|(\*)    # operators
      | (-?\d+)          # integers
      | (\w+)            # identifiers
      | (.)              # error
    )\s*''', re.VERBOSE)

def tokenize(code):
    for m in tokens_p.finditer(code):
        if m.group(1):
            yield Equals()
        # ... (branches for groups 2-4)
        elif m.group(5):
            yield Id(m.group(5))
        elif m.group(6):
            raise SyntaxError(f"unexpected character {m.group(6)!r}")
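A self-contained sketch of the same tokenizer follows. The token classes and the Plus, Times, and Int branches are assumptions filled in from the regex groups (group 2 is `+`, group 3 is `*`, group 4 is an integer); each class plays the role of the token type and its field, if any, holds the data.

```python
import re
from dataclasses import dataclass

# Hypothetical token classes: the class is the token type,
# and a field (when present) carries the token's data.
@dataclass(frozen=True)
class Equals: pass

@dataclass(frozen=True)
class Plus: pass

@dataclass(frozen=True)
class Times: pass

@dataclass(frozen=True)
class Int:
    value: int

@dataclass(frozen=True)
class Id:
    name: str

tokens_p = re.compile(r'''
    \s*(?:
        (=)|(\+)|(\*)    # operators
      | (-?\d+)          # integers
      | (\w+)            # identifiers
      | (.)              # error
    )\s*''', re.VERBOSE)

def tokenize(code):
    """Yield one token object per lexeme; surrounding whitespace is skipped."""
    for m in tokens_p.finditer(code):
        if m.group(1):
            yield Equals()
        elif m.group(2):
            yield Plus()
        elif m.group(3):
            yield Times()
        elif m.group(4):
            yield Int(int(m.group(4)))
        elif m.group(5):
            yield Id(m.group(5))
        elif m.group(6):
            raise SyntaxError(f"unexpected character {m.group(6)!r}")

print(list(tokenize("alpha = beta + gamma *4")))
```

Because `\s*` surrounds every lexeme, `alpha=beta+gamma*4` and `alpha = beta + gamma *4` produce the same token list, which is one answer to the spacing question from the overview.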

Syntactic Analysis

Syntactic Analysis

During syntactic analysis, we turn the token stream from lexical analysis into an abstract syntax tree. In general, there are two ways to parse a stream of tokens:
Top-Down: form the node at the root of the syntax tree, then recursively form its children.
Bottom-Up: start by forming the leaf nodes, then form their parents.

Syntactic analysis (SA) turns Id(alpha), Equals, Id(beta), Plus, Id(gamma), Times, Int(4) into an AST.
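For contrast with the bottom-up approach, here is a minimal top-down (recursive-descent) sketch for this deck's Assign/Sum/Product/Value grammar. Assumptions: tokens are written as (type, data) pairs for brevity, the AST is nested tuples, and the grammar's left-recursive rules are rewritten as loops, since a recursive-descent function cannot call itself first without consuming input.

```python
def parse(tokens):
    """Recursive-descent (top-down) parser; tokens are (type, data) pairs."""
    pos = 0

    def peek():
        # Type of the next token, or "EOF" once the input is exhausted.
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        # Consume the next token if it has the expected type; return its data.
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        pos += 1
        return tokens[pos - 1][1]

    def value():
        # Value -> Int | Id
        if peek() == "Int":
            return ("Int", expect("Int"))
        return ("Id", expect("Id"))

    def product():
        # Product -> Product Times Value, with the left recursion turned into a loop
        node = value()
        while peek() == "Times":
            expect("Times")
            node = ("Product", node, value())
        return node

    def sum_():
        # Sum -> Sum Plus Product, likewise written as a loop
        node = product()
        while peek() == "Plus":
            expect("Plus")
            node = ("Sum", node, product())
        return node

    # Assign -> Id Equals Sum: the root production is chosen first.
    target = expect("Id")
    expect("Equals")
    tree = ("Assign", target, sum_())
    if peek() != "EOF":
        raise SyntaxError(f"unexpected trailing {peek()}")
    return tree

tokens = [("Id", "alpha"), ("Equals", None), ("Id", "beta"), ("Plus", None),
          ("Id", "gamma"), ("Times", None), ("Int", 4)]
print(parse(tokens))
```

Building each new node from the previous `node` inside the loop keeps Plus and Times left-associative, matching what the left-recursive rules express.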

Language Grammars

In order to parse a language, we need a notation to formalize the constructs of our language. We define a set of production rules that state what the various constructs are formed of:

Assign → Id Equals Sum
Sum → Sum Plus Product
Sum → Product
Product → Product Times Value
Product → Value
Value → Int
Value → Id

This is a specific kind of context-free grammar called an LR grammar (LR stands for a Left-to-right scan producing a Rightmost derivation). Its rules are also left-recursive (e.g., Sum → Sum Plus Product), which makes it convenient for shift-reduce parsers (coming up!).
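To make the production rules concrete, here is a sketch that encodes them as Python data and replays a rightmost derivation of the example's token types, starting from Assign. The encoding, the rule numbering, and the helper `derive_rightmost` are assumptions for illustration.

```python
# The grammar as (LHS, RHS) pairs, numbered in the order listed above.
GRAMMAR = [
    ("Assign", ("Id", "Equals", "Sum")),          # 0
    ("Sum", ("Sum", "Plus", "Product")),          # 1
    ("Sum", ("Product",)),                        # 2
    ("Product", ("Product", "Times", "Value")),   # 3
    ("Product", ("Value",)),                      # 4
    ("Value", ("Int",)),                          # 5
    ("Value", ("Id",)),                           # 6
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def derive_rightmost(start, rule_numbers):
    """Apply each numbered rule to the rightmost nonterminal; return every sentential form."""
    form = [start]
    forms = [tuple(form)]
    for n in rule_numbers:
        lhs, rhs = GRAMMAR[n]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions or form[positions[-1]] != lhs:
            raise ValueError(f"rule {n} does not apply to {form}")
        i = positions[-1]
        form[i:i + 1] = rhs          # replace the nonterminal with the rule's RHS
        forms.append(tuple(form))
    return forms

for f in derive_rightmost("Assign", [0, 1, 3, 5, 4, 6, 2, 4, 6]):
    print(" ".join(f))
```

Read in reverse, the rule sequence (6, 4, 2, 6, 4, 5, 3, 1, 0) is the order in which a shift-reduce parser reduces this input, since LR-style parsers build a rightmost derivation backwards.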

Shift-Reduce Parsing

Shift-reduce is a type of bottom-up parser. We place a cursor at the beginning of the token stream, and at each step apply one of two transitions:
Shift: move the cursor one token to the right.
Reduce: match a production rule to the symbols directly to the left of the cursor, replacing them with the LHS of the production rule.

We refer to the token just to the right of the cursor as the lookahead token. We use the lookahead token to determine whether the symbols to the left of the cursor can be unambiguously reduced; otherwise, we shift.

Example on Whiteboard: using our grammar to build an AST with shift-reduce parsing.
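The whiteboard example can be approximated in code. This is a hand-written sketch, not a table-driven LR parser: the reduce decisions are lookahead checks worked out by hand for this one grammar (an assumption for illustration), whereas a real LR parser would consult a table generated from the grammar. Tokens are (type, data) pairs and the AST is nested tuples, both hypothetical encodings. The stack holds everything to the left of the cursor.

```python
def shift_reduce_parse(tokens):
    """Bottom-up parse of Assign/Sum/Product/Value with one token of lookahead."""
    stack = []   # (symbol, ast_node) pairs: everything left of the cursor
    pos = 0      # index of the lookahead token in `tokens`

    def top(n):
        # Symbols of the top n stack entries, or None if the stack is too short.
        return tuple(sym for sym, _ in stack[-n:]) if len(stack) >= n else None

    def reduce(n, lhs, node):
        # Replace the top n entries with the production's left-hand side.
        del stack[-n:]
        stack.append((lhs, node))

    while True:
        la = tokens[pos][0] if pos < len(tokens) else "EOF"
        if top(1) == ("Int",):                                   # Value -> Int
            reduce(1, "Value", ("Int", stack[-1][1]))
        elif top(1) == ("Id",) and la != "Equals":               # Value -> Id (not the target)
            reduce(1, "Value", ("Id", stack[-1][1]))
        elif top(3) == ("Product", "Times", "Value"):            # Product -> Product Times Value
            reduce(3, "Product", ("Product", stack[-3][1], stack[-1][1]))
        elif top(1) == ("Value",):                               # Product -> Value
            reduce(1, "Product", stack[-1][1])
        elif top(3) == ("Sum", "Plus", "Product") and la != "Times":  # Sum -> Sum Plus Product
            reduce(3, "Sum", ("Sum", stack[-3][1], stack[-1][1]))
        elif top(1) == ("Product",) and la != "Times":           # Sum -> Product
            reduce(1, "Sum", stack[-1][1])
        elif top(3) == ("Id", "Equals", "Sum") and la == "EOF":  # Assign -> Id Equals Sum
            reduce(3, "Assign", ("Assign", stack[-3][1], stack[-1][1]))
        elif pos < len(tokens):                                  # nothing to reduce: shift
            stack.append(tokens[pos])
            pos += 1
        elif top(1) == ("Assign",) and len(stack) == 1:          # accept
            return stack[0][1]
        else:
            raise SyntaxError(f"cannot parse; stack holds {[s for s, _ in stack]}")

tokens = [("Id", "alpha"), ("Equals", None), ("Id", "beta"), ("Plus", None),
          ("Id", "gamma"), ("Times", None), ("Int", 4)]
print(shift_reduce_parse(tokens))
```

The `la != "Times"` checks are where the lookahead matters: with Times next, reducing Product to Sum would be premature, so the parser shifts instead, which is what makes gamma * 4 group before the addition.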
