Language-Processing Problems Roland Backhouse DIMACS, 8th July, - PowerPoint PPT Presentation

1 Language-Processing Problems Roland Backhouse DIMACS, 8th July, 2003

2 Introduction “Factors” and the “factor matrix” were introduced by Conway (1971). He used them very effectively in, for example, constructing biregulators. Conway’s discussion is wordy, making it difficult to understand. There are also occasional errors which are difficult to detect and add to the confusion. (“The theorem does prevent E from occurring twice” should read “The theorem does not prevent E from occurring twice.”)

3 KMP Failure Function (pattern aabaa)
node i:             1  2  3  4  5
failure node f(i):  0  1  0  1  2
Factor Graph (language $\Sigma^{*}aabaa$)
[Figure: factor graph on nodes 0 to 5, with a chain of edges labelled a, a, b, a, a and further edges labelled b and ε.]
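The failure-function row of the table can be reproduced with the standard KMP border computation. This is a minimal sketch; the function name `failure_function` is an illustrative choice, and the list is 0-based, so entry `i - 1` holds f(i).

```python
def failure_function(pattern: str) -> list[int]:
    """Return f, where f[i - 1] is the failure node of node i (nodes are 1-based)."""
    f = [0] * len(pattern)
    k = 0  # length of the longest proper border of the prefix read so far
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = f[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        f[i] = k
    return f


print(failure_function("aabaa"))  # [0, 1, 0, 1, 2]
```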

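4 Language Problems
Grammar: $S ::= aSS \mid \varepsilon$.
Is-empty: $S = \phi \;\equiv\; (\{a\} = \phi \,\lor\, S = \phi \,\lor\, S = \phi) \,\land\, \{\varepsilon\} = \phi$.
Nullable: $\varepsilon \in S \;\equiv\; (\varepsilon \in \{a\} \,\land\, \varepsilon \in S \,\land\, \varepsilon \in S) \,\lor\, \varepsilon \in \{\varepsilon\}$.
Shortest word length: $\#S = (\#a + \#S + \#S) \downarrow \#\varepsilon$.
Non-Example: $aa \in S \;\not\equiv\; (aa \in \{a\} \,\land\, aa \in S \,\land\, aa \in S) \,\lor\, aa \in \{\varepsilon\}$.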
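The first three equations can each be solved by brute-force iteration, starting from the value that the empty language would give (empty: true, nullable: false, shortest length: infinity). The sketch below assumes exactly the grammar S ::= aSS | ε; the helper `iterate` is a hypothetical name introduced for illustration.

```python
import math


def iterate(step, start):
    """Apply step repeatedly from start until the value stops changing."""
    value = start
    while (nxt := step(value)) != value:
        value = nxt
    return value


# Is-empty: S = φ  ≡  ({a} = φ ∨ S = φ ∨ S = φ) ∧ {ε} = φ
# {a} and {ε} are non-empty, so both of those tests are False.
is_empty = iterate(lambda e: (False or e or e) and False, True)

# Nullable: ε ∈ S  ≡  (ε ∈ {a} ∧ ε ∈ S ∧ ε ∈ S) ∨ ε ∈ {ε}
nullable = iterate(lambda n: (False and n and n) or True, False)

# Shortest word length: #S = (#a + #S + #S) ↓ #ε, with #a = 1 and #ε = 0.
shortest = iterate(lambda k: min(1 + k + k, 0), math.inf)

print(is_empty, nullable, shortest)  # False True 0
```

No such iteration is available for the non-example, because membership of aa in a concatenation is not determined by membership of aa in the parts.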

5 Fusion
Many problems are expressed in the form $\mathit{evaluate} \circ \mathit{generate}$, where generate generates a (possibly infinite) candidate set of solutions, and evaluate selects a best solution.
Examples: $\mathit{shortest} \circ \mathit{path}$, $(x{\in}) \circ L$.
Solution method is to fuse the generation and evaluation processes, eliminating the need to generate all candidate solutions.

6 Conditions for Fusion
Fusion is made possible when
• evaluate is an adjoint in a Galois connection,
• generate is expressed as a fixed point.
Algorithms for solving the resulting fixed point equation include
• brute-force iteration,
• Knuth’s generalisation of Dijkstra’s shortest path algorithm.
Solution method typically involves generalising the problem.

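7 Galois Connections
Suppose $\mathcal{A} = (A, \sqsubseteq)$ and $\mathcal{B} = (B, \preceq)$ are partially ordered sets and suppose $F \in A \leftarrow B$ and $G \in B \leftarrow A$. Then $(F, G)$ is a Galois connection of $\mathcal{A}$ and $\mathcal{B}$ iff, for all $x \in B$ and $y \in A$,
$F(x) \sqsubseteq y \;\equiv\; x \preceq G(y)$.
Examples
Negation: $\neg p \Rightarrow q \;\equiv\; p \Leftarrow \neg q$.
Ceiling function: $\lceil x \rceil \leq n \;\equiv\; x \leq n$.
Maximum: $x \uparrow y \leq z \;\equiv\; x \leq z \,\land\, y \leq z$.
Even (divisible by two): $(\mathsf{if}\ b \rightarrow 2\ \Box\ \neg b \rightarrow 1\ \mathsf{fi}) \mathrel{\backslash} m \;\equiv\; b \Rightarrow \mathit{even}(m)$.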
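The ceiling, maximum and even examples can be spot-checked by brute force over a small range of values. This is only a sanity check under stated assumptions, not a proof: the sample ranges are arbitrary, and `k \ m` is read here as "k divides m".

```python
import math
from fractions import Fraction

xs = [Fraction(p, 4) for p in range(-12, 13)]  # sample rationals in [-3, 3]
ns = range(-4, 5)                              # sample integers

# Ceiling: ⌈x⌉ ≤ n  ≡  x ≤ n, for rational x and integer n.
ceiling_ok = all((math.ceil(x) <= n) == (x <= n) for x in xs for n in ns)

# Maximum: x ↑ y ≤ z  ≡  x ≤ z ∧ y ≤ z.
max_ok = all(
    (max(x, y) <= z) == (x <= z and y <= z)
    for x in ns for y in ns for z in ns
)


def divides(k: int, m: int) -> bool:
    """k \\ m in the slide's notation (assumed reading: k divides m)."""
    return m % k == 0


# Even: (if b → 2 □ ¬b → 1 fi) \ m  ≡  b ⇒ even(m).
even_ok = all(
    divides(2 if b else 1, m) == ((not b) or m % 2 == 0)
    for b in (False, True)
    for m in range(-10, 11)
)

print(ceiling_ok, max_ok, even_ok)
```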

8 Parsing
$x \in S \Rightarrow b \;\equiv\; S \subseteq \mathsf{if}\ b \rightarrow \Sigma^{*}\ \Box\ \neg b \rightarrow \Sigma^{*} - \{x\}\ \mathsf{fi}$.
Shortest Word (Path)
Let $\Sigma^{\geq k}$ denote the set of all words over alphabet $\Sigma$ of length at least $k$. Let $\#S$ denote the length of a shortest word in the language $S$.
$\#S \geq k \;\equiv\; S \subseteq \Sigma^{\geq k}$.
(Most common application is when $S$ is the set of paths from one node to another in a graph.)

9 Fusion Theorem
$F(\mu_{\preceq}\, g) = \mu_{\sqsubseteq}\, h$ provided that
• $F$ is a lower adjoint in a Galois connection of $\sqsubseteq$ and $\preceq$ (see brief summary of definition below),
• $F \circ g = h \circ F$.
Galois Connection
$F(x) \sqsubseteq y \;\equiv\; x \preceq G(y)$.
$F$ is called the lower adjoint and $G$ the upper adjoint.

10 Language Recognition
Problem: For given word $x$ and grammar $G$, determine $x \in L(G)$. That is, implement $(x{\in}) \circ L$.
Language $L(G)$ is the least fixed point (with respect to the subset relation) of a monotonic function.
$(x{\in})$ is the lower adjoint in a Galois connection of languages (ordered by the subset relation) and booleans (ordered by implication).
(Recall, $x \in S \Rightarrow b \;\equiv\; S \subseteq \mathsf{if}\ b \rightarrow \Sigma^{*}\ \Box\ \neg b \rightarrow \Sigma^{*} - \{x\}\ \mathsf{fi}$.)

11 Nullable Languages
Problem: For given grammar $G$, determine $\varepsilon \in L(G)$, i.e. implement $(\varepsilon{\in}) \circ L$.
Solution: Easily expressed as a fixed point computation.
Works because:
• The function $(x{\in})$ is a lower adjoint in a Galois connection (for all $x$, but in particular for $x = \varepsilon$).
• For all languages $S$ and $T$, $\varepsilon \in S \cdot T \;\equiv\; \varepsilon \in S \,\land\, \varepsilon \in T$.

12 Problem Generalisation
Problem: For given grammar $G$, determine whether all words in $L(G)$ have even length, i.e. implement $\mathit{alleven} \circ L$.
The function alleven is a lower adjoint in a Galois connection. Specifically, for all languages $S$ and booleans $b$,
$\mathit{alleven}(S) \Leftarrow b \;\equiv\; S \subseteq \mathsf{if}\ \neg b \rightarrow \Sigma^{*}\ \Box\ b \rightarrow (\Sigma \cdot \Sigma)^{*}\ \mathsf{fi}$.
Nevertheless, fusion doesn’t work (directly) because
• there is no $\otimes$ such that, for all languages $S$ and $T$, $\mathit{alleven}(S \cdot T) \;\equiv\; \mathit{alleven}(S) \otimes \mathit{alleven}(T)$.
Solution: Generalise by tupling: compute simultaneously alleven and allodd.
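One way to realise the tupling is to record, for each nonterminal, the set of word-length parities that occur in its language; alleven and allodd are then both read off that set. The sketch below is an illustration under stated assumptions, not the construction from the talk: the grammar encoding (a dict from nonterminal to a list of productions, each a list of symbols), the function name `parities`, and the sample grammars are all hypothetical.

```python
def parities(grammar, terminals):
    """Least fixed point giving, for each nonterminal, the set of word-length
    parities (0 = even, 1 = odd) that occur in its language."""
    par = {n: frozenset() for n in grammar}  # start from the empty language

    def of_symbol(sym):
        # A terminal is a single letter, hence a word of odd length.
        return frozenset({1}) if sym in terminals else par[sym]

    def of_production(prod):
        result = frozenset({0})  # parity of the empty word
        for sym in prod:
            # Parities of a concatenation are sums of parities, modulo 2.
            result = frozenset((p + q) % 2 for p in result for q in of_symbol(sym))
        return result

    changed = True
    while changed:
        changed = False
        for n, prods in grammar.items():
            new = frozenset().union(*(of_production(p) for p in prods))
            if new != par[n]:
                par[n], changed = new, True
    return par


# Hypothetical grammars: S ::= aSa | bb | ε   and   T ::= a | TT
grammar = {"S": [["a", "S", "a"], ["b", "b"], []], "T": [["a"], ["T", "T"]]}
par = parities(grammar, {"a", "b"})
alleven = {n: 1 not in p for n, p in par.items()}
allodd = {n: 0 not in p for n, p in par.items()}
print(alleven, allodd)
```

The pair (alleven, allodd) is compositional under concatenation, which alleven alone is not: knowing that T contains words of both parities is what rules out alleven for T·T.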

13 General Context-Free Parsing
Problem: For given grammar $G$, determine $x \in L(G)$, i.e. implement $(x{\in}) \circ L$.
Not (in general) expressible as a fixed point computation. Fusion fails because: for all $x \neq \varepsilon$, there is no $\otimes$ such that, for all languages $S$ and $T$,
$x \in S \cdot T \;\equiv\; (x \in S) \otimes (x \in T)$.
CYK: Let $F(S)$ denote the relation $\langle i, j :: x[i\,..\,j) \in S \rangle$.
Works because:
• The function $F$ is a lower adjoint.
• For all languages $S$ and $T$, $F(S \cdot T) = F(S) \bullet F(T)$, where $B \bullet C$ denotes the composition of relations $B$ and $C$.
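The generalisation from the boolean x ∈ S to the relation F(S) can be sketched as a brute-force fixed-point iteration over sets of index pairs. This is in the spirit of CYK rather than the textbook table-filling algorithm (no normal form is required here); the grammar encoding, the function name `recognise`, and the sample grammar are assumptions made for illustration.

```python
def recognise(grammar, start, x):
    """Decide x ∈ L(grammar) by computing, for every nonterminal N, the
    relation F(N) = {(i, j) | x[i:j] ∈ L(N)} as a least fixed point."""
    n = len(x)
    rel = {nt: set() for nt in grammar}        # F of the empty language
    identity = {(i, i) for i in range(n + 1)}  # F({ε})

    def of_symbol(sym):
        if sym in grammar:
            return rel[sym]
        # F({sym}) for a single-character terminal sym
        return {(i, i + 1) for i in range(n) if x[i] == sym}

    def compose(b, c):
        # Relational composition B • C
        return {(i, k) for (i, j) in b for (j2, k) in c if j == j2}

    def of_production(prod):
        # F(S · T) = F(S) • F(T), applied symbol by symbol
        result = identity
        for sym in prod:
            result = compose(result, of_symbol(sym))
        return result

    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            new = set().union(*(of_production(p) for p in prods))
            if new != rel[nt]:
                rel[nt], changed = new, True
    return (0, n) in rel[start]


# Hypothetical grammar of balanced strings: S ::= aSbS | ε
dyck = {"S": [["a", "S", "b", "S"], []]}
print(recognise(dyck, "S", "aabb"), recognise(dyck, "S", "aab"))  # True False
```

The iteration terminates because each relation only grows and there are finitely many index pairs for a fixed word x.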

14 Language Inclusion
Problem: For fixed (regular) language $E$ and varying $S$, determine $S \subseteq E$.
Example: Emptiness test: $S \subseteq \phi$.
Example: Pattern matching: given pattern $P$, for each prefix $t$ of text $T$, evaluate $\{t\} \subseteq \Sigma^{*} \cdot \{P\}$.
Example: All words are of even length: $S \subseteq (\Sigma \cdot \Sigma)^{*}$.

15 Language Inclusion
Problem: For fixed (regular) language $E$ and varying $S$, determine $S \subseteq E$.
• Function $({\subseteq}E)$ is a lower adjoint. Specifically, $S \subseteq E \Leftarrow b \;\equiv\; S \subseteq \mathsf{if}\ b \rightarrow E\ \Box\ \neg b \rightarrow \Sigma^{*}\ \mathsf{fi}$.
• But, for $E \neq \phi$ and $E \neq \Sigma^{*}$, there is no $\otimes$ such that, for all languages $S$ and $T$, $S \cdot T \subseteq E \;\equiv\; (S \subseteq E) \otimes (T \subseteq E)$.
Solution (Oege de Moor): Use factor theory to derive generalisation.

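16 Factors
For all languages $S$, $T$ and $U$,
$S \cdot T \subseteq U \;\equiv\; T \subseteq S \backslash U$,
$S \cdot T \subseteq U \;\equiv\; S \subseteq U / T$.
Note: $S \backslash (U/T) = (S \backslash U)/T$. Hence, write $S \backslash U / T$.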
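The note follows from the two defining equivalences alone. A short derivation, in which the arbitrary language X is introduced only for the purpose of comparing the two sides:

```latex
\begin{align*}
X \subseteq (S \backslash U)/T
  &\equiv X \cdot T \subseteq S \backslash U     && \text{second equivalence} \\
  &\equiv S \cdot X \cdot T \subseteq U          && \text{first equivalence} \\
  &\equiv S \cdot X \subseteq U/T                && \text{second equivalence} \\
  &\equiv X \subseteq S \backslash (U/T)         && \text{first equivalence}
\end{align*}
```

Since the two languages have the same subsets X, they are equal.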

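17 Left and Right Factors
Define the functions $\triangleright$ and $\triangleleft$ by
$X^{\triangleright} = E/X$, $X^{\triangleleft} = X \backslash E$.
By definition, the range of $\triangleright$ is the set of left factors of $E$ and the range of $\triangleleft$ is the set of right factors of $E$.
We also have the Galois connection:
$X \subseteq Y^{\triangleright} \;\equiv\; Y \subseteq X^{\triangleleft}$.
Hence,
$X^{\triangleright\triangleleft\triangleright} = X^{\triangleright}$, $X^{\triangleleft\triangleright\triangleleft} = X^{\triangleleft}$, $E^{\triangleright\triangleleft} = E = E^{\triangleleft\triangleright}$.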
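The Galois connection on this slide is a direct consequence of the definition of factors. A short derivation, spelled out step by step:

```latex
\begin{align*}
X \subseteq Y^{\triangleright}
  &\equiv X \subseteq E/Y               && \text{definition of } \triangleright \\
  &\equiv X \cdot Y \subseteq E         && \text{factors} \\
  &\equiv Y \subseteq X \backslash E    && \text{factors} \\
  &\equiv Y \subseteq X^{\triangleleft} && \text{definition of } \triangleleft
\end{align*}
```

The three-operator identities then follow from the usual cancellation properties of a Galois connection.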

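18 The Factor Matrix
Let $\mathcal{L}$ denote the set of left factors of $E$. Define the factor matrix of $E$ to be the binary operator $\backslash$ restricted to $\mathcal{L} \times \mathcal{L}$. Thus entries in the matrix take the form $L_0 \backslash L_1$ where $L_0$ and $L_1$ are left factors of $E$.
The factor matrix of $E$ is denoted by $[[E]]$. It is a reflexive, transitive matrix:
$[[E]] = [[E]]^{*}$.
The row and column containing an individual factor, a left factor, a right factor, and $E$ itself are given by:
$U \backslash E / V = U^{\triangleleft\triangleright} \backslash V^{\triangleright}$,
$V^{\triangleright} = E^{\triangleright} \backslash V^{\triangleright}$,
$U^{\triangleleft} = U^{\triangleleft\triangleright} \backslash E^{\triangleleft\triangleright}$,
$E = E^{\triangleright} \backslash E^{\triangleleft\triangleright}$.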

19 Using the Factor Matrix
Problem: For fixed regular language $E$ and varying $S$, determine $S \subseteq E$.
Generalisation: For fixed regular language $E$ and varying $S$, determine the relation $S \subseteq [[E]]$. (Formally, the relation $\langle L, M :: S \subseteq L \backslash M \rangle$ where $L$ and $M$ range over the left factors of $E$.)
Works because:
$S \cdot T \subseteq [[E]] \;\equiv\; (S \subseteq [[E]]) \bullet (T \subseteq [[E]])$,
where $B \bullet C$ denotes the composition of relations $B$ and $C$.

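20 Proof
We have to show that
$S \cdot T \subseteq U^{\triangleright} \backslash W^{\triangleright} \;\equiv\; \langle \exists V :: S \subseteq U^{\triangleright} \backslash V^{\triangleright} \,\land\, T \subseteq V^{\triangleright} \backslash W^{\triangleright} \rangle$.
First,
$S \cdot T \subseteq E$
= { unit of conjunction }
$S \cdot T \subseteq E \,\land\, \mathit{true}$
= { factors, $T^{\triangleright} = E/T$; cancellation }
$S \subseteq T^{\triangleright} \,\land\, T^{\triangleright} \cdot T \subseteq E$
= { factors, $T^{\triangleright\triangleleft} = T^{\triangleright} \backslash E$ }
$S \subseteq T^{\triangleright} \,\land\, T \subseteq T^{\triangleright\triangleleft}$.
Whence: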

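21
$S \cdot T \subseteq U^{\triangleright} \backslash W^{\triangleright}$
= { factors, definition of $W^{\triangleright}$ }
$U^{\triangleright} \cdot S \cdot T \cdot W \subseteq E$
= { above, with $S, T := U^{\triangleright} \cdot S,\ T \cdot W$ }
$U^{\triangleright} \cdot S \subseteq (T \cdot W)^{\triangleright} \,\land\, T \cdot W \subseteq (T \cdot W)^{\triangleright\triangleleft}$
= { factors }
$S \subseteq U^{\triangleright} \backslash (T \cdot W)^{\triangleright} \,\land\, T \subseteq (T \cdot W)^{\triangleright\triangleleft} / W$
= { $U^{\triangleleft} / W = U \backslash W^{\triangleright}$ }
$S \subseteq U^{\triangleright} \backslash (T \cdot W)^{\triangleright} \,\land\, T \subseteq (T \cdot W)^{\triangleright} \backslash W^{\triangleright}$
⇒ { one-point rule }
$\langle \exists V :: S \subseteq U^{\triangleright} \backslash V^{\triangleright} \,\land\, T \subseteq V^{\triangleright} \backslash W^{\triangleright} \rangle$
⇒ { Leibniz }
$\langle \exists V :: S \cdot T \subseteq U^{\triangleright} \backslash V^{\triangleright} \cdot V^{\triangleright} \backslash W^{\triangleright} \rangle$
⇒ { cancellation }
$S \cdot T \subseteq U^{\triangleright} \backslash W^{\triangleright}$.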

22 Summary
• Use of fusion as programming method.
• Problem generalisation involves generalising the algebra in the solution domain.
• Factor theory as basis for language inclusion problems.
Challenges
• Efficient computation of factor matrices.
• Extension to non-regular languages.

23 References
J.H. Conway, “Regular Algebra and Finite Machines”, Chapman and Hall, London, 1971.
R.C. Backhouse and B.A. Carré, “Regular algebra applied to path-finding problems”, J. Institute of Mathematics and its Applications, vol. 15, pp. 161–186, 1975.
R.C. Backhouse and R.K. Lutz, “Factor graphs, failure functions and bi-trees”, Automata, Languages and Programming, LNCS 52, pp. 61–75, 1977.
R. Backhouse, “Fusion on Languages”, 10th European Symposium on Programming, LNCS 2028, pp. 107–121, 2001.
O. de Moor, S. Drape, D. Lacey and G. Sittampalam, “Incremental program analysis via language factors” (www.comlab.ox.ac.uk/oucl/work/oege.demoor/pubs.htm).
For related publications on fixed points, Galois connections and mathematics of program construction, see www.cs.nott.ac.uk/~rcb/papers
