CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple - PDF document

CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple standard format to describe the productions motivation of a grammar so that we can do proofs and constructions more readily on the grammar. We now must show that all grammars can be converted into this normal form. In Chomsky Normal Form (CNF) all productions are of one of the following three types: A → BC ( A, B, C ∈ V ) A → a ( A ∈ V and a ∈ Σ) S → ǫ (where S ∈ V and is the start symbol) S does not occur on the right hand side of any production. (( T, α ) ∈ P ⇒ n S ( α ) = 0) • remove S from the right side of productions • remove epsilon productions not from the start symbol ( A → ǫ and A � = S ) • remove mixing terminals and non-terminals on right hand side ( A → abCD ) • remove more than 1 alphabet symbols on right hand side ( A → aA ) • remove more than 2 variables in its righthand side ( A → ABC ) • remove productions of length one that are to variables ( A → B ) That is, remove any productions that violate the rules. To remove the epsilon transitions we just think about what variables can derive epsilon, and replace them by omiting them in productions that create that variable. For example if the variable A ⇒ ⋆ ǫ , and we have productions like B → ABC then we add production B → BC . We need to include S → ǫ if S ⇒ ⋆ ǫ as the base case. For the unit productions ( A → B ) we just allow A to derive everything B can derive. So if we had A → B and B → CD | aa we just add A → CD and A → aa and remove A → B To remove alphabet symbols and variables in our rules we simply promote symbols to a variable and include a rule from that variable to the symbol as its only derivation. If we had A → aB we can add a new rule X a → a and replace A → aB with A → X a B

Finally we need to get rid of productions that have more than 2 variables on the right hand side (all that we still need to deal with, that is, these are the only non-CNF productions that remain in our grammar) and we do this by chaining them as follows: A → BCDE we replace this with: A → BA 1 A 1 → CA 2 A 2 → DE Removing epsilon productions 1. determine the nullable variable set (variables that can derive ǫ A ⇒ ⋆ ǫ ): (a) let T be the set of all variables A such that A → ǫ is a production in our grammar. (base case) (b) for all right hand sides of a production B → x 1 x 2 . . . x r , if all x i are nullable then B is nullable. Add B to our set T of nullable variables. (recursive case) (c) continues until no new variables can be added to our nullable set. 2. Add new productions where any nullable variables A is replaced by either A or ǫ in all possible ways. For example, suppose that C, D are nullable and A, B are not and we have a production of the form H → CADB then we would add the following productions to our grammar H → ADB and H → CAB and H → AB to accomodate the possibilities of the variables going to the empty string. 3. Remove all productions of the form A → ǫ 4. If S is in our nullable set T then add the production S → ǫ Question: Why is it insufficent to write “ remove all productions A → ǫ except for S ”? An example: C → DE D → FG E → ǫ F → ǫ G → ǫ Round 1: T = { E, F, G } Round 2: T = { D, E, F, G } Round 3: T = { C, D, E, F, G }

Remove Unit Productions A unit production is A → B or more generally A ⇒ ⋆ B 1. remove all ǫ productions 2. for each pair of variables ( A, B ) such that A ⇒ ⋆ B and B → α is a production ( α ∈ ( V ∪ Σ) ⋆ ) we add the production A → α to our grammar 3. remove all unit productions If B can derive some sentinels forms and A can derive B in some number of steps, we allow A to derive all sentinel forms that B can derive directly instead of requiring that A first turn into B through an effective unit production and instead derive the things that B can derive directly as rules in our grammar. After we remove all rules of the form A → B we no longer have any more rules in our grammar of the form A → α where | α | = 1 and α ∈ V ⋆ An algorithm to find pairs ( A, B ) such that A ⇒ ⋆ B is the following: For each variable A create a set T A := { A } Then, we do the following: for all productions of the form X → Y where X ∈ T A then we add Y to T A Example of this algorithm: A → BF | D | a B → CA | b | E C → c D → B | AC | d E → CC F → E | f Initially, we create the following sets: T A = { A } T B = { B } T C = { C } T D = { D } T E = { E } T F = { F } then after running through the set of rules in our grammar once, we get the following for our sets: T A = { A, D } T B = { B, E } T C = { C } T D = { D, B } T E = { E } T F = { F, E } We repeat this recursive step until we are not adding anything new to our sets. T A = { A, D, B } T B = { B, E } T C = { C }

T D = { D, B, E } T E = { E } T F = { F, E } We run it again: T A = { A, D, B, E } T B = { B, E } T C = { C } T D = { D, B, E } T E = { E } T F = { F, E } We then run it again, notice that we don’t change any of the sets, and terminate. Now we have a bunch of sets T X such that for all Y ∈ T X it is the case that X ⇒ ⋆ Y . Note that this corresponds to a unit production since both X and Y are variables. Then we can continue with the algorithm to explicitly add rules for X corresponding to what Y can derive. At this point we have only productions Promoting Symbols to Variables in our grammar of the form: S → ǫ A → a A → α where | α | ≥ 2 ∧ α ∈ ( V ∪ Σ) ⋆ We want to ensure that instead all such productions where | α | ≥ 2 are strings over V ⋆ and not over ( V ∪ Σ) ⋆ We make a new variable for each symbol that appears on the right hand side and a production for new variable to derive the symbol. if a ∈ Σ and we see that α = βaγ then we add a new variable X a to our grammar with a production X a → a as its only production and then we rewrite the production βaγ as βX a γ After we do this for all terminal symbols on the right hand side of productions with length greater or equal to 2, we now have that all such productions are strings over variables alone, not over alphabet symbols. A → BCDE we effectively replace it with Chain long forms into pairs a bunch of variables and productions, each with length two on the right hand side, and each effectively continuing along the chain. A → BX 1 X 1 → CX 2 X 2 → DE Seen as parse trees, the difference is: A B C D E is the original rule and

A B X 1 C X 2 D E is the chained varient After doing this step, we end up with a grammar that has only productions involving variables having length 2. As a result, all productions now are compliant to the format of CNF and at no step in our process did we change the language that the grammar produced. Therefore, we have taken an arbitrary grammar and converted it into CNF.

CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple - PDF document

CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple standard format to describe the productions motivation of a grammar so that we can do proofs and constructions more readily on the grammar. We now must show that all grammars can

Chomsky Normal Form Chomsky Normal Form Chomsky Normal Form A context free grammar is in

Chomsky Normal Form We introduce Chomsky Normal Form, which is used to answer questions about

4.9: Chomsky Normal Form In this section, we study a special form of grammars called Chomsky

Chomsky and Greibach Normal Forms Chomsky and Greibach Normal Forms p.1/24 Simplifying

CKY Algorithm, Chomsky Normal Form Scott Farrar CLMA, University of Washington January 13, 2010

Chomsky normal form (CNF) I Purpose: a simplified form of grammars Every rule must be either A

Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National

Linear regression How to measure the accuracy of linear regression models Linear Regression

M =fl Jirka Hana Syntax Chomsky et al. Standard Theory Government and binding (GB)

Review of Texas Tax Code, Chapter 313 ESC Region 3 SARA LEON & ASSOCIATES, LLC Chapter 313,

Smith Normal Form and Combinatorics Richard P . Stanley Smith Normal Form and Combinatorics

Normal Form Games 2-12-16 Game Representations Extensive Form Game Normal Form Game

CS 301 Lecture 10 Chomsky Normal Form Stephen Checkoway February 19, 2018 1 / 23 More CFLs

MA/CSSE 474 Theory of Computation Removing Ambiguity Chomsky Normal Form Pushdown Automata

Last Class Recursive Descent Parsing and CYK ANLP: Lecture 13 Chomsky normal form grammars

Homework Homework #3 returned Chomsky Normal Form Homework #4 due today Homework #5

Theory of Computer Science C2. Regular Languages: Finite Automata Gabriele R oger University

COMP 110 Introduction to Programming Fall 2015 Time: TR 9:30 10:45 Room: AR 121 (Hanes Art

Program Fundamentals /* HelloWorld.java * The classic Hello, world! program */ class

CS 115 Lecture 3 A first look at Python Neil Moore Department of Computer Science University of

Fundamentele Informatica 1 (I&E) najaar 2015

CSEE 3827: Fundamentals of Computer Systems Boolean Logic & Boolean Algebra Agenda (M&K

Overview Basic Matlab Operations Starting Matlab Using Matlab as a calculator

Announcements Wednesday, August 30 WeBWorK due on Friday at 11:59pm. The first quiz is on

CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple - PDF document

CPSC 313: Chomsky Normal Form October 25, 2020 We want a simple standard format to describe the productions motivation of a grammar so that we can do proofs and constructions more readily on the grammar. We now must show that all grammars can

Chomsky Normal Form Chomsky Normal Form Chomsky Normal Form A context free grammar is in

Chomsky Normal Form We introduce Chomsky Normal Form, which is used to answer questions about

4.9: Chomsky Normal Form In this section, we study a special form of grammars called Chomsky

Chomsky and Greibach Normal Forms Chomsky and Greibach Normal Forms p.1/24 Simplifying

CKY Algorithm, Chomsky Normal Form Scott Farrar CLMA, University of Washington January 13, 2010

Chomsky normal form (CNF) I Purpose: a simplified form of grammars Every rule must be either A

Simplification of CFG and Normal Forms Wen-Guey Tzeng Computer Science Department National

Linear regression How to measure the accuracy of linear regression models Linear Regression

M =fl Jirka Hana Syntax Chomsky et al. Standard Theory Government and binding (GB)

Review of Texas Tax Code, Chapter 313 ESC Region 3 SARA LEON &amp; ASSOCIATES, LLC Chapter 313,

Smith Normal Form and Combinatorics Richard P . Stanley Smith Normal Form and Combinatorics

Normal Form Games 2-12-16 Game Representations Extensive Form Game Normal Form Game

CS 301 Lecture 10 Chomsky Normal Form Stephen Checkoway February 19, 2018 1 / 23 More CFLs

MA/CSSE 474 Theory of Computation Removing Ambiguity Chomsky Normal Form Pushdown Automata

Last Class Recursive Descent Parsing and CYK ANLP: Lecture 13 Chomsky normal form grammars

Homework Homework #3 returned Chomsky Normal Form Homework #4 due today Homework #5

Theory of Computer Science C2. Regular Languages: Finite Automata Gabriele R oger University

COMP 110 Introduction to Programming Fall 2015 Time: TR 9:30 10:45 Room: AR 121 (Hanes Art

Program Fundamentals /* HelloWorld.java * The classic Hello, world! program */ class

CS 115 Lecture 3 A first look at Python Neil Moore Department of Computer Science University of

Fundamentele Informatica 1 (I&amp;E) najaar 2015

CSEE 3827: Fundamentals of Computer Systems Boolean Logic &amp; Boolean Algebra Agenda (M&amp;K

Overview Basic Matlab Operations Starting Matlab Using Matlab as a calculator

Announcements Wednesday, August 30 WeBWorK due on Friday at 11:59pm. The first quiz is on

Review of Texas Tax Code, Chapter 313 ESC Region 3 SARA LEON & ASSOCIATES, LLC Chapter 313,

Fundamentele Informatica 1 (I&E) najaar 2015

CSEE 3827: Fundamentals of Computer Systems Boolean Logic & Boolean Algebra Agenda (M&K