Towards more complex grammar systems: Some basic formal language theory

Detmar Meurers: Intro to Computational Linguistics I OSU, LING 684.01

Overview

  • Grammars, or: how to specify linguistic knowledge
  • Automata, or: how to process with linguistic knowledge
  • Levels of complexity in grammars and automata: the Chomsky hierarchy


Grammars

A grammar is a 4-tuple (N, Σ, S, P) where

  • N is a finite set of non-terminal symbols
  • Σ is a finite set of terminal symbols, with N ∩ Σ = ∅
  • S is a distinguished start symbol, with S ∈ N
  • P is a finite set of rewrite rules of the form α → β, with α, β ∈ (N ∪ Σ)∗ and α including at least one non-terminal symbol.


A simple example

N = {S, NP, VP, Vi, Vt, Vs}
Σ = {John, Mary, laughs, loves, thinks}
S = S
P = { S → NP VP,
      VP → Vi,
      VP → Vt NP,
      VP → Vs S,
      NP → John,
      NP → Mary,
      Vi → laughs,
      Vt → loves,
      Vs → thinks }
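The example grammar can be put to work directly. Below is a minimal sketch (the rule encoding and function name are my own) that enumerates sentences of L(G) up to a given length by breadth-first rewriting of the leftmost non-terminal:

```python
from collections import deque

# The example grammar: each non-terminal maps to the right-hand
# sides of its rules (encoding is my own).
RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["Vi"], ["Vt", "NP"], ["Vs", "S"]],
    "NP": [["John"], ["Mary"]],
    "Vi": [["laughs"]],
    "Vt": [["loves"]],
    "Vs": [["thinks"]],
}

def generate(max_len):
    """Breadth-first search over sentential forms, rewriting the leftmost
    non-terminal; collect sentences (all-terminal forms) while pruning
    forms longer than max_len."""
    sentences = set()
    queue = deque([("S",)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:                      # no non-terminal left: a sentence
            sentences.add(" ".join(form))
        elif len(form) <= max_len:           # expand the leftmost non-terminal
            for rhs in RULES[form[idx]]:
                queue.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return sentences

print(sorted(generate(4)))
```

With max_len = 4 this yields sentences such as "John laughs", "Mary loves John", and "John thinks Mary laughs"; since VP → Vs S re-introduces S, the language is infinite, so some length bound is needed.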


How does a grammar define a language?

Assume α, β, γ, δ ∈ (N ∪ Σ)∗, with α containing at least one non-terminal.

  • A sentential form for a grammar G is defined as:
    − The start symbol S of G is a sentential form.
    − If αβγ is a sentential form and there is a rewrite rule β → δ, then αδγ is a sentential form.
  • α (directly or immediately) derives β if α → β ∈ P. One writes:
    − α ⇒∗ β if β is derived from α in zero or more steps
    − α ⇒+ β if β is derived from α in one or more steps
  • A sentence is a sentential form consisting only of terminal symbols.
  • The language L(G) generated by the grammar G is the set of all sentences which can be derived from the start symbol S, i.e., L(G) = {γ | S ⇒∗ γ}.
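The derivation relation can be checked mechanically. A small sketch (the encoding is my own) for the example grammar above verifies that each sentential form in a derivation follows from the previous one by a single rule application:

```python
# The example grammar again (encoding is my own).
RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["Vi"], ["Vt", "NP"], ["Vs", "S"]],
    "NP": [["John"], ["Mary"]],
    "Vi": [["laughs"]],
    "Vt": [["loves"]],
    "Vs": [["thinks"]],
}

def one_step(src, dst):
    """True iff dst follows from src by rewriting exactly one
    non-terminal with one of its rules (i.e. src directly derives dst)."""
    for i, sym in enumerate(src):
        for rhs in RULES.get(sym, []):
            if src[:i] + rhs + src[i + 1:] == dst:
                return True
    return False

# Sentential forms of a derivation of "John thinks Mary laughs":
derivation = [
    ["S"],
    ["NP", "VP"],
    ["John", "VP"],
    ["John", "Vs", "S"],
    ["John", "thinks", "S"],
    ["John", "thinks", "NP", "VP"],
    ["John", "thinks", "Mary", "VP"],
    ["John", "thinks", "Mary", "Vi"],
    ["John", "thinks", "Mary", "laughs"],
]
assert all(one_step(a, b) for a, b in zip(derivation, derivation[1:]))
```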


Processing with grammars: automata

An automaton in general has three components:

  • an input tape, divided into squares, with a read-write head positioned over one of the squares
  • an auxiliary memory characterized by two functions:
    − fetch: memory configuration → symbols
    − store: memory configuration × symbol → memory configuration
  • a finite-state control relating the two components.


Different levels of complexity in grammars and automata

Let A, B ∈ N, x ∈ Σ, α, β, γ ∈ (N ∪ Σ)∗, and δ ∈ (N ∪ Σ)+. Then:

Type   Memory      Automaton   Rule                 Grammar name
0      Unbounded   TM          α → β                General rewrite
1      Bounded     LBA         β A γ → β δ γ        Context-sensitive
2      Stack       PDA         A → β                Context-free
3      None        FSA         A → xB, A → x        Right-linear

Abbreviations:
− TM: Turing machine
− LBA: linear-bounded automaton
− PDA: push-down automaton
− FSA: finite-state automaton


Type 3: Right-Linear Grammars and FSAs

A right-linear grammar is a 4-tuple (N, Σ, S, P) with P a finite set of rewrite rules of the form α → β, where α ∈ N and β ∈ {γδ | γ ∈ Σ∗, δ ∈ N ∪ {ε}}, i.e.:
− left-hand side of a rule: a single non-terminal, and
− right-hand side of a rule: a string containing at most one non-terminal, as the rightmost symbol.

Right-linear grammars are formally equivalent to left-linear grammars.

A finite-state automaton consists of
− a tape
− a finite-state control
− no auxiliary memory


A regular language example: (ab|c)ab∗(a|cb)?

Right-linear grammar:

N = {Expr, X, Y, Z}
Σ = {a, b, c}
S = Expr
P = { Expr → ab X,
      Expr → c X,
      X → a Y,
      Y → b Y,
      Y → Z,
      Z → a,
      Z → cb,
      Z → ε }

[Figure: finite-state transition network with states 1–5 accepting the same language]
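As a sanity check, Python's re module can test the regular expression from the slide directly (the helper name is my own); re.fullmatch requires the entire string to match, mirroring acceptance of the whole input by the finite-state automaton:

```python
import re

# The slide's regular expression, compiled once.
PATTERN = re.compile(r"(ab|c)ab*(a|cb)?")

def accepts(s: str) -> bool:
    """True iff the whole string is in the language (helper name my own)."""
    return PATTERN.fullmatch(s) is not None

# Print only the accepted strings from a mix of candidates:
print([s for s in ["aba", "ca", "ababb", "cabbcb", "ab", "ba"] if accepts(s)])
```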


Thinking about regular languages

− A language is regular iff one can define an FSM (or regular expression) for it.
− An FSM only has a fixed amount of memory, namely its number of states.
− Strings longer than the number of states, in particular also those of any infinite language, must result from a loop in the FSM.
− Pumping Lemma: if for a sufficiently long string there is no such loop, the string cannot be part of a regular language.


Type 2: Context-Free Grammars and Push-Down Automata

A context-free grammar is a 4-tuple (N, Σ, S, P) with P a finite set of rewrite rules of the form α → β, where α ∈ N and β ∈ (Σ ∪ N)∗, i.e.:
− left-hand side of a rule: a single non-terminal, and
− right-hand side of a rule: a string of terminals and/or non-terminals.

A push-down automaton is a
− finite-state automaton, with a
− stack as auxiliary memory.


A context-free language example: aⁿbⁿ

Context-free grammar:

N = {S}
Σ = {a, b}
S = S
P = { S → a S b,
      S → ε }

Push-down automaton: [Figure: a single-state automaton that pushes x for each a read, pops x for each b read, and accepts on an empty stack]
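The push-down automaton's behavior can be sketched with an explicit stack (the function name is my own): each a pushes a marker x, each b pops one, and the input is accepted iff the stack is empty at the end:

```python
def accepts_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 0) with an explicit stack, mimicking the PDA."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:            # an a after a b: not of the form a^n b^n
                return False
            stack.append("x")     # push a marker for each a
        elif ch == "b":
            seen_b = True
            if not stack:         # more b's than a's
                return False
            stack.pop()           # pop one marker per b
        else:
            return False          # symbol outside Σ = {a, b}
    return not stack              # accept iff the counts match exactly

print([s for s in ["", "ab", "aabb", "aab", "ba", "abab"] if accepts_anbn(s)])
```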


Type 1: Context-Sensitive Grammars and Linear-Bounded Automata

A rule of a context-sensitive grammar
− rewrites at most one non-terminal from the left-hand side;
− the right-hand side of a rule is required to be at least as long as the left-hand side, i.e. the grammar only contains rules of the form α → β with |α| ≤ |β|, and optionally S → ε with the start symbol S not occurring in any β.

A linear-bounded automaton is a
− finite-state automaton, with an
− auxiliary memory which cannot exceed the length of the input string.


A context-sensitive language example: aⁿbⁿcⁿ

Context-sensitive grammar:

N = {S, B, C}
Σ = {a, b, c}
S = S
P = { S → a S B C,
      S → a b C,
      b B → b b,
      b C → b c,
      c C → c c,
      C B → B C }
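The context-sensitive rules can be checked in the same mechanical way. The sketch below (string encoding is my own, with non-terminals upper-case) verifies a derivation of aabbcc step by step:

```python
# The context-sensitive grammar as (lhs, rhs) string pairs;
# upper-case letters are non-terminals (encoding is my own).
RULES = [
    ("S", "aSBC"), ("S", "abC"),
    ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
    ("CB", "BC"),
]

def one_step(src, dst):
    """True iff dst follows from src by one rule application at some position."""
    for lhs, rhs in RULES:
        i = src.find(lhs)
        while i != -1:
            if src[:i] + rhs + src[i + len(lhs):] == dst:
                return True
            i = src.find(lhs, i + 1)
    return False

# Sentential forms of a derivation of "aabbcc":
derivation = ["S", "aSBC", "aabCBC", "aabBCC", "aabbCC", "aabbcC", "aabbcc"]
assert all(one_step(a, b) for a, b in zip(derivation, derivation[1:]))
```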


Type 0: General Rewrite Grammar and Turing Machines

  • In a general rewrite grammar there are no restrictions on the form of a rewrite rule.
  • A Turing machine has an unbounded auxiliary memory.
  • Any language for which there is a recognition procedure can be defined, but the recognition problem is not decidable.


Properties of different language classes

Languages are sets of strings, so that one can apply set operations to languages and investigate the results for particular language classes.

Some closure properties:
− All language classes are closed under union with themselves.
− All language classes are closed under intersection with regular languages.
− The class of context-free languages is not closed under intersection with itself.

Proof: the intersection of the two context-free languages L1 and L2 is not context-free:
− L1 = {aⁿbⁿcⁱ | n ≥ 1 and i ≥ 0}
− L2 = {aʲbⁿcⁿ | n ≥ 1 and j ≥ 0}
− L1 ∩ L2 = {aⁿbⁿcⁿ | n ≥ 1}


Criteria under which to evaluate grammar formalisms

There are three kinds of criteria:
− linguistic naturalness
− mathematical power
− computational effectiveness and efficiency

The weaker the type of grammar:
− the stronger the claim made about possible languages
− the greater the potential efficiency of the parsing procedure

Reasons for choosing a stronger grammar class:
− to capture the empirical reality of actual languages
− to provide for elegant analyses capturing more generalizations (→ more “compact” grammars)


Language classes and natural languages

Natural languages are not regular:

(1) a. The mouse escaped.
    b. The mouse that the cat chased escaped.
    c. The mouse that the cat that the dog saw chased escaped.
    d. ...

(2) a. aa
    b. abba
    c. abccba
    d. ...

Center-embedding of arbitrary depth needs to be captured to model language competence.
→ Not possible with a finite-state automaton.
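Pattern (2) consists of strings of the form w·wᴿ (a string followed by its mirror image), a classic non-regular language whose recognition needs unbounded memory. A sketch with an explicit stack (the function name is my own; it simply takes the midpoint of the string as the turning point):

```python
def is_mirror(s: str) -> bool:
    """Recognize even-length mirror strings w followed by reversed w."""
    if len(s) % 2:                   # odd length: cannot be w + reversed(w)
        return False
    half = len(s) // 2
    stack = list(s[:half])           # push the first half
    for ch in s[half:]:              # pop while reading the second half
        if not stack or stack.pop() != ch:
            return False
    return True

print([s for s in ["aa", "abba", "abccba", "ab", "abc"] if is_mirror(s)])
```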


Language classes and natural languages (cont.)

  • Any finite language is a regular language.
  • The argument that natural languages are not regular relies on competence as an idealization, not performance.
  • Note that even if English were regular, a context-free grammar characterization could be preferable on the grounds that it is more transparent than one using only finite-state methods.


Accounting for the facts vs. linguistically sensible analyses

Looking at grammars from a linguistic perspective, one can distinguish their
− weak generative capacity, considering only the set of strings generated by a grammar
− strong generative capacity, considering the set of strings and their syntactic analyses generated by a grammar

Accordingly, two grammars can be strongly or weakly equivalent.


Example for weakly equivalent grammars

Example string: if x then if y then a else b

Grammar 1:
P = { S → if T then S else S,
      S → if T then S,
      S → a,
      S → b,
      T → x,
      T → y }


First analysis (else attached to the outer if):
  [S if [T x] then [S if [T y] then [S a]] else [S b]]

Second analysis (else attached to the inner if):
  [S if [T x] then [S if [T y] then [S a] else [S b]]]


Grammar 2: a weakly equivalent grammar eliminating the ambiguity (it only licenses the second structure).

P = { S1 → if T then S1,
      S1 → if T then S2 else S1,
      S1 → a,
      S1 → b,
      S2 → if T then S2 else S2,
      S2 → a,
      S2 → b,
      T → x,
      T → y }
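The ambiguity difference can be checked concretely. The sketch below (a naive top-down enumerator of my own; it assumes no left recursion and no ε-rules, and is exponential in general but fine for this sentence) counts the parse trees each grammar assigns to the example string:

```python
# Grammars 1 and 2 from the slides as dicts (encoding is my own);
# dict keys are non-terminals, everything else is a terminal token.
G1 = {
    "S": [["if", "T", "then", "S", "else", "S"],
          ["if", "T", "then", "S"], ["a"], ["b"]],
    "T": [["x"], ["y"]],
}
G2 = {
    "S1": [["if", "T", "then", "S1"],
           ["if", "T", "then", "S2", "else", "S1"], ["a"], ["b"]],
    "S2": [["if", "T", "then", "S2", "else", "S2"], ["a"], ["b"]],
    "T": [["x"], ["y"]],
}

def count_parses(grammar, start, tokens):
    """Count parse trees by naive enumeration over rules and input splits."""
    def parse_seq(seq, toks):
        # number of ways the symbol sequence seq derives exactly toks
        if not seq:
            return 1 if not toks else 0
        head, rest = seq[0], seq[1:]
        if head not in grammar:                # terminal: must match next token
            return parse_seq(rest, toks[1:]) if toks and toks[0] == head else 0
        total = 0
        for rhs in grammar[head]:              # non-terminal: try each rule
            for i in range(len(toks) + 1):     # and each split of the input
                left = parse_seq(rhs, toks[:i])
                if left:
                    total += left * parse_seq(rest, toks[i:])
        return total
    return parse_seq([start], list(tokens))

sent = "if x then if y then a else b".split()
print(count_parses(G1, "S", sent), count_parses(G2, "S1", sent))
```

Grammar 1 assigns the sentence two trees (the two analyses above), Grammar 2 exactly one, while both accept the same strings.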


Reading assignment

  • Ch. 2 “Basic Formal Language Theory” and Ch. 3 “Formal Languages and Natural Languages” of our Lecture Notes
  • Ch. 13 “Language and Complexity” of Jurafsky and Martin (2000)

Good background reading/reference books on the topic:

  • “Elements of the Theory of Computation.” H. R. Lewis, C. H. Papadimitriou. Prentice-Hall, 2nd ed., 1998.
  • “Introduction to Automata Theory, Languages, and Computation.” John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman. Addison-Wesley, 2nd ed., 2001; or the 1979 edition by John E. Hopcroft and Jeffrey D. Ullman.
