Data Structures and Algorithms III

Data Structures and Algorithms III, WS 19–20, SfS / University of Tübingen, Ç. Çöltekin. Lecture topic: formal languages.


  1. Data Structures and Algorithms III
     Çağrı Çöltekin, ccoltekin@sfs.uni-tuebingen.de
     University of Tübingen, Seminar für Sprachwissenschaft
     Winter Semester 2019–2020
     Sections: Practical matters, Formal languages, Languages and Complexity, Formal & natural languages

     Practical matters
     The second part of the course will be somewhat different:
     • The focus will shift more towards Computational Linguistics topics / applications
     • We will review more specialized data structures and algorithms (e.g., automata, parsing)
     • Some overlap with the parsing class (but with more emphasis on the practical side)
     • Less focus on programming

     Assignments
     • Assignment policy is similar to the first part of the course
     • Three more assignments:
       – Finite state automata
       – Finite state transducers
       – Parsing
     • There will also be some in-class exercises. They are part of the course work; they are not 'optional'

     An overview of the upcoming topics
     • Background on formal languages and automata (today)
     • Finite state automata and regular languages
     • Finite state transducers (FST)
       – FSTs and computational morphology
     • Dependency grammars and dependency parsing
     • Context-free grammars and constituency parsing

     This lecture (formal languages and automata: an overview)
     • Background: some definitions on phrase structure grammars and rewrite rules
     • Chomsky hierarchy of (formal) language classes
     • Background: computational complexity
     • Automata, their relation to formal languages
     • Formal languages and automata in natural language processing
     • A brief note on learnability of natural languages

     Why study formal languages
     • Formal languages are an important area of the theory of computation
     • They originate from linguistics, and they have been used in formal/computational linguistics

     Definitions: Alphabet
     • An alphabet is a set of symbols
     • We generally denote an alphabet using the symbol Σ
     • In our examples, we will use lowercase ASCII letters for individual symbols, e.g., Σ = {a, b, c}
     • 'Alphabet' here does not match the everyday use of the word:
       – In some cases one may want to use a binary alphabet, Σ = {0, 1}
       – If we want to define a grammar for arithmetic operations, we may want to have Σ = {0, 1, 2, 3, ..., 9, +, −, ×, /}
       – If we are interested in natural language syntax, our alphabet is the set of natural language words, Σ = {the, on, cat, dog, mat, sat, ...}

     Definitions: Strings
     • A string over an alphabet is a finite sequence of symbols from the alphabet
       – a, ab, acbcaa are example strings over Σ = {a, b, c}
     • The empty string is denoted by ϵ
     • Σ* denotes all strings that can be formed using the alphabet Σ, including the empty string ϵ
     • Σ+ is a shorthand for Σ* − {ϵ}
     • Similarly, a* means the symbol a repeated zero or more times, and a+ means a repeated one or more times
     • We use a^n for exactly n repetitions of a
     • The length of a string u is denoted by |u|, e.g., |abc| = 3, or if u = aabbcc, |u| = 6
     • Concatenation of two strings u and v is denoted by uv, e.g., for u = ab and v = ca, uv = abca
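     The string definitions above can be tried out directly. The following is a minimal Python sketch, not part of the original slides: the helper name `strings_up_to` and the variable names are hypothetical, and since Σ* is infinite the code only enumerates the strings up to a chosen length.

```python
from itertools import product

SIGMA = ("a", "b", "c")  # the example alphabet Σ = {a, b, c}


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first.

    Hypothetical helper: Σ* itself is infinite, so we only list a finite part.
    Assumes each symbol is a single character, so joining them is unambiguous.
    """
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)  # n == 0 gives the empty string ϵ


# Finite slice of Σ*: 1 + 3 + 9 = 13 strings of length 0, 1 and 2.
sigma_star_part = list(strings_up_to(SIGMA, 2))
# Σ+ = Σ* − {ϵ}: the same strings without the empty string.
sigma_plus_part = [s for s in sigma_star_part if s != ""]

u, v = "ab", "ca"
assert u + v == "abca"        # concatenation uv
assert len("abc") == 3        # |abc| = 3
assert len("aabbcc") == 6     # |u| = 6 for u = aabbcc
assert "a" * 3 == "aaa"       # a^3: exactly three repetitions of a
assert len(sigma_star_part) == 13 and len(sigma_plus_part) == 12
```

     For an alphabet of words, such as Σ = {the, on, cat, ...}, a list or tuple of symbols would be a safer string representation than a joined Python `str`.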

  2. Definitions: Language
     • A (formal) language is a set of strings over an alphabet
       – The set of strings of length 2 over {0, 1}: {00, 01, 10, 11}
       – The set of strings with an even number of 1's over {0, 1}: {ϵ, 101, 0, 11, 11110, ...}
       – The set of strings that retain alphabetical ordering over {a, b, c}: {a, ab, abc, ac, abcc, ...}
       – The set of strings of words that form grammatically correct English sentences
     • Strings that are members of a language are called sentences (or sometimes words) of the language

     Definitions: Grammar
     • A grammar is a finite description of a language
     • A common way of specifying a grammar is based on a set of rewrite rules (or phrase structure rules)
     • We represent non-terminal symbols with uppercase letters
     • We represent terminal symbols with lowercase letters
     • S is the start symbol
     • If a string can be generated from S using the rewrite rules, the string is a valid sentence in the language
     Grammar:
       S → A B
       S → S A B
       A → a
       B → b
     Q: What does this grammar define?

     Grammars and derivations
     Grammar:
       S → A B
       S → S A B
       A → a
       B → b
     Derivation of abab:
       S ⇒ SAB
       SAB ⇒ ABAB
       ABAB ⇒ aBAB
       aBAB ⇒ abAB
       abAB ⇒ abaB
       abaB ⇒ abab
     • Intermediate strings of terminals and non-terminals are called sentential forms
     • S ⇒* abab: the string is in the language
     Q: What if the string was not in the language?
     Q: Is there another derivation sequence?

     Phrase structure grammars: more formally
     A phrase structure grammar is a tuple G = (Σ, N, S, R) where
       Σ is an alphabet of terminal symbols
       N is a set of non-terminal symbols
       S is a special 'start' symbol, S ∈ N
       R is a set of rules of the form α → β, where α and β are strings over Σ ∪ N
     A string u is in the language defined by G if it can be derived from S.
     Example rules:
       S → A B
       S → S A B
       A → a
       B → b

     Chomsky hierarchy of (formal) languages
     • Defined for formalizing natural language syntax
     • Definitions are in terms of the restrictions on the production rules of the grammar
     • Each language class corresponds to a class of (abstract) machines
     • Also part of the theory of computation
     • Other well-studied classes exist
     Language classes, from most to least restricted: Regular, Context Free, Context Sensitive, Recursively Enumerable

     Regular grammars
     Right regular:
       1. A → a
       2. A → aB
       3. A → ϵ
     Left regular:
       1. A → a
       2. A → Ba
       3. A → ϵ
     • Least expressive, but easy to process
     • Used in many NLP applications
     • Defines the set of languages expressed by regular expressions
     • Regular grammars define only regular languages (but the reverse is not true: a regular language can also be defined by a grammar that is not regular)
     • We will discuss it in more detail soon

     Regular grammars: an example
     Write a right- and a left-regular grammar for ab*c. Derive the string abbbc using one of your grammars.
     Left regular:
       S → Ac
       A → Ab
       A → a
       S ⇒ Ac ⇒ Abc ⇒ Abbc ⇒ Abbbc ⇒ abbbc
     Right regular:
       S → aA
       A → bA
       A → c
       S ⇒ aA ⇒ abA ⇒ abbA ⇒ abbbA ⇒ abbbc
     These grammars are weakly equivalent: they generate the same language, but the derivations differ.
     Can you define a regular grammar for
     • a^n b^n?
     • a^5 b^5?

     Context-free grammars (CFG)
     CFG rules have the form A → α, where
       A is a single non-terminal
       α is a possibly empty sequence of terminals and non-terminals
     • More expressive than regular languages
     • The syntax of programming languages is based on CFGs
     • Many applications for natural languages too (more on this later)
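     The derivations above can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: the function `derives` and the rule-list names are hypothetical, and the search assumes that no rule shortens a sentential form (true for the three grammars shown here, but not for grammars with ϵ-rules). It searches breadth-first over sentential forms, and compares the two regular grammars with the regular expression ab*c.

```python
import re
from collections import deque
from itertools import product


def derives(rules, start, target):
    """Return True if `target` can be derived from `start` using `rules`.

    `rules` is a list of (left-hand side, right-hand side) string pairs.
    Assumption: no rule makes a sentential form shorter, so any form longer
    than `target` can be discarded and the search is guaranteed to stop.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:  # rewrite every occurrence of lhs, one at a time
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return False


# Uppercase letters are non-terminals, lowercase letters are terminals.
EXAMPLE = [("S", "AB"), ("S", "SAB"), ("A", "a"), ("B", "b")]
RIGHT_REGULAR = [("S", "aA"), ("A", "bA"), ("A", "c")]
LEFT_REGULAR = [("S", "Ac"), ("A", "Ab"), ("A", "a")]

assert derives(EXAMPLE, "S", "abab")       # S ⇒* abab
assert not derives(EXAMPLE, "S", "abba")   # no derivation exists
assert derives(RIGHT_REGULAR, "S", "abbbc")
assert derives(LEFT_REGULAR, "S", "abbbc")

# Weak equivalence, checked on all strings over {a, b, c} up to length 4:
# both grammars accept exactly the strings matched by the regex ab*c.
for n in range(5):
    for symbols in product("abc", repeat=n):
        s = "".join(symbols)
        expected = re.fullmatch(r"ab*c", s) is not None
        assert derives(RIGHT_REGULAR, "S", s) == expected
        assert derives(LEFT_REGULAR, "S", s) == expected
```

     Exhaustive search like this is only meant to illustrate the definitions; it says nothing about efficient recognition, which is the role of the automata discussed in the course.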
