  1. Natural Language Processing
     CSCI 4152/6509, Lecture 27: Parsing with Prolog
     Instructor: Vlado Keselj
     Time and date: 09:35–10:25, 13-Mar-2020
     Location: Dunn 135
     CSCI 4152/6509, Vlado Keselj, Lecture 27

  2. Previous Lecture
     Context-Free Grammars review continued:
       ◮ formal definition
       ◮ inducing a grammar from parse trees
       ◮ derivations
       ◮ some notions and terminology
     Bracket representation of a parse tree
     CYK Chart Parsing Algorithm
     Chomsky Normal Form (CNF)
     CYK algorithm (started)
     CYK Algorithm example

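  3. Explanation of Index Use in CYK
     [Figure: a span of length j that starts at position i covers words i to i+j−1. It is split into a left part of length l (words i to i+l−1, entry β[i,l,k1]) and a right part of length j−l (words i+l to i+j−1, entry β[i+l,j−l,k2]); together they determine the entry β[i,j,k].]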

  4. CYK Algorithm
     Require: sentence = w_1 ... w_n, and a CFG in CNF with nonterminals N_1 ... N_m, where N_1 is the start symbol
     Ensure: parsed sentence
       allocate matrix β ∈ {0,1}^(n×n×m) and initialize all entries to 0
       for i ← 1 to n do
         for all rules N_k → w_i do
           β[i,1,k] ← 1
       for j ← 2 to n do
         for i ← 1 to n−j+1 do
           for l ← 1 to j−1 do
             for all rules N_k → N_k1 N_k2 do
               β[i,j,k] ← β[i,j,k] OR (β[i,l,k1] AND β[i+l,j−l,k2])
       return β[1,n,1]
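
The recurrence at the heart of the algorithm can be tried out directly in Prolog. The following is only a sketch, assuming SWI-Prolog: it evaluates the same recurrence top-down by recursion instead of filling the β matrix bottom-up, so it is not the table-filling procedure itself and does no memoization. The predicate names rule/3, lex/2, beta/4 and recognize/1, and the toy CNF grammar, are made up for illustration.

```prolog
:- use_module(library(lists)).   % nth1/3

% Toy grammar in Chomsky Normal Form (hypothetical, for illustration only).
% rule(K, K1, K2) stands for N_k -> N_k1 N_k2; lex(K, W) stands for N_k -> w.
rule(s, np, vp).
rule(np, d, n).
lex(d, the).
lex(n, dog).
lex(n, dogs).
lex(vp, run).
lex(vp, runs).

% beta(Words, I, J, K) holds when nonterminal K derives the J words
% that start at position I (1-based), mirroring the entry beta[i,j,k].
beta(Words, I, 1, K) :-
    nth1(I, Words, W),
    lex(K, W).
beta(Words, I, J, K) :-
    J > 1,
    Jm1 is J - 1,
    between(1, Jm1, L),          % split point l = 1 .. j-1
    rule(K, K1, K2),
    beta(Words, I, L, K1),       % left part:  beta[i, l, k1]
    I2 is I + L,
    J2 is J - L,
    beta(Words, I2, J2, K2).     % right part: beta[i+l, j-l, k2]

% recognize(Words) succeeds when the start symbol s spans the whole input.
recognize(Words) :-
    length(Words, N),
    N >= 1,
    once(beta(Words, 1, N, s)).

% ?- recognize([the,dog,runs]).   succeeds
% ?- recognize([runs,the,dog]).   fails
```

Because nothing is stored, the same span may be recomputed many times; the matrix in the algorithm above exists precisely to avoid that.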

  5. Parsing Natural Languages
     Must deal with possible ambiguities
     Decide whether to make a phrase structure or dependency parser
     When parsing natural language, there are generally two approaches:
       1. Backtracking to find all parse trees
       2. Chart parsing

  6. Parsing with Prolog
     We will go over a brief Prolog review
       ◮ more details are provided in the lab
     Implicative normal form:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ q_1 ∨ q_2 ∨ ... ∨ q_m
     If m ≤ 1, then the clause is called a Horn clause.
     If resolution is applied to two Horn clauses, the result is again a Horn clause.
     Inference with Horn clauses is relatively efficient.
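
A small worked case may make the closure property concrete. The atoms r and s below are hypothetical and chosen only for illustration: resolving two Horn clauses on the shared atom q yields a clause that still has at most one positive literal.

```latex
\begin{align*}
C_1 &: \; p_1 \land p_2 \Rightarrow q
      && \text{i.e. } \lnot p_1 \lor \lnot p_2 \lor q \\
C_2 &: \; q \land r \Rightarrow s
      && \text{i.e. } \lnot q \lor \lnot r \lor s \\
\text{resolvent} &: \; p_1 \land p_2 \land r \Rightarrow s
      && \text{i.e. } \lnot p_1 \lor \lnot p_2 \lor \lnot r \lor s
\end{align*}
```

Each parent contributes at most one positive literal, and the resolved-upon literal q is removed, so the resolvent cannot end up with more than one.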

  7. Rules
     A Horn clause with m = 1 is called a rule:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ q_1
     It is expressed in Prolog as:
       q1 :- p1, p2, ..., p_n.

  8. Facts
     A rule with an empty body (n = 0, m = 1) is called a fact:
       ⊤ ⇒ q_1
     It is expressed in Prolog as:
       q1.
     A Horn clause with m = 0 has no conclusion:
       p_1 ∧ p_2 ∧ ... ∧ p_n ⇒ ⊥
     It is expressed in Prolog as:
       :- p1, p2, ..., p_n.
     and it acts as a goal (query) rather than a fact.

  9. Rabbit and Franklin Example
     The 'rabbit and franklin' example in Prolog:
       hare(rabbit).
       turtle(franklin).
       faster(X,Y) :- hare(X), turtle(Y).
     Save the program in a file and load the file.
     After loading the file, at the Prolog prompt, type:
       faster(rabbit,franklin).
     Try:
       faster(X,franklin).
     and
       faster(X,Y).

  10. Unification and Backtracking
      Two important features of Prolog: unification and backtracking
      What happens after we type:
        ?- faster(rabbit,franklin).
      Prolog will search for a 'matching' fact or head of a rule:
        faster(rabbit,franklin) and faster(X,Y) :- ...
      'Matching' here means unification.
      Unification is an operation of making two terms equal by substituting variables with some terms.
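
Unification can be explored on its own with the built-in =/2 operator, which succeeds exactly when its two arguments unify. The sketch below bundles a few cases into one predicate; unify_demo and the terms point/2 and pair/2 are hypothetical names used only for illustration.

```prolog
% unify_demo succeeds if every unification below behaves as the comments say.
unify_demo :-
    % Variables on one side are bound to the terms on the other side.
    faster(rabbit, franklin) = faster(X, Y),
    X == rabbit, Y == franklin,
    % Bindings can flow in both directions within one unification.
    point(A, 2) = point(1, B),
    A == 1, B == 2,
    % A repeated variable must take the same value everywhere.
    \+ ( pair(C, C) = pair(a, b) ),
    % Terms with different functors never unify.
    \+ ( hare(rabbit) = turtle(rabbit) ).

% ?- unify_demo.   succeeds
```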

  11. Unification and Backtracking (2)
      After unifying faster(rabbit,franklin) and faster(X,Y) with the substitution X ← rabbit and Y ← franklin, the rule becomes:
        faster(rabbit,franklin) :- hare(rabbit), turtle(franklin).
      The Prolog interpreter will now try to satisfy the predicates on the right-hand side, hare(rabbit) and turtle(franklin), and it will easily succeed because both are given as facts.
      If it does not succeed, it can generally try other options through backtracking.
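
Backtracking becomes visible once there is more than one way to satisfy a goal. In the sketch below, the facts hare(peter) and turtle(shelly) and the helper all_pairs/1 are hypothetical additions, not part of the lecture example; findall/3 simply collects the solutions in the order in which backtracking would produce them.

```prolog
hare(rabbit).
hare(peter).          % hypothetical extra fact
turtle(franklin).
turtle(shelly).       % hypothetical extra fact

faster(X, Y) :- hare(X), turtle(Y).

% all_pairs(Pairs) gathers every solution of faster(X, Y), in search order.
all_pairs(Pairs) :- findall(X-Y, faster(X, Y), Pairs).

% ?- faster(X, franklin).
%    X = rabbit ;  X = peter.
% ?- all_pairs(Ps).
%    Ps = [rabbit-franklin, rabbit-shelly, peter-franklin, peter-shelly].
```

The order of the answers follows the order of the facts: Prolog fixes the first hare, tries every turtle, and only then backtracks to the next hare.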

  12. Variables
      Variable names start with an uppercase letter or an underscore ('_').
      '_' is a special, anonymous variable; two occurrences of this variable can represent different values, with no connection between them.
      Examples:
        ?- faster(rabbit,franklin).
        Yes ;
        ...
        ?- faster(rabbit,X).
        X = franklin ;
        ...
        ?- hare(X).
        X = rabbit ;
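
The difference between a named variable and the anonymous variable can be seen in a minimal sketch. The predicates same/2, any_pair/2 and has_hare/0 are hypothetical names introduced only for this illustration.

```prolog
% A named variable used twice must stand for the same term both times.
same(X, X).

% Each '_' is independent, so the two arguments may differ.
any_pair(_, _).

% '_' is handy when a value must exist but is not needed afterwards.
hare(rabbit).
has_hare :- hare(_).

% ?- same(a, a).       succeeds
% ?- same(a, b).       fails
% ?- any_pair(a, b).   succeeds
% ?- has_hare.         succeeds
```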

  13. Lists (Arrays), Structures
      Lists are implemented as linked lists. Structures (records) are expressed as terms.
      Examples:
        In program: person(john,public,'123-456').
        Interactively: ?- person(john,X,Y).
      [] is an empty list.
      A list is created as a nested term, usually with the special functor '.' (dot):
        ?- is_list(.(a, .(b, .(c, [])))).

  14. List Notation
      .(a, .(b, .(c, []))) is the same as [a,b,c]
      This is also equivalent to:
        [ a | [ b | [ c | [] ]]]
      or
        [ a, b | [ c ] ]
      A frequent Prolog expression is [H|T], where H is the head of the list and T is the tail, which is another list.
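
The [H|T] pattern is easiest to appreciate through unification against concrete lists. This is a small sketch; first/2 and rest/2 are hypothetical helper names. One caveat: in SWI-Prolog version 7 and later the list constructor is no longer '.', so the explicit dot form may not load there, while the bracket and bar notations work in any Prolog.

```prolog
% first(List, H): H is the head (first element) of a non-empty list.
first([H|_], H).

% rest(List, T): T is the tail (everything after the head).
rest([_|T], T).

% ?- first([a,b,c], H).                  H = a
% ?- rest([a,b,c], T).                   T = [b,c]
% ?- rest([a], T).                       T = []
% ?- first([], H).                       fails: [] has no head
% ?- [a,b,c] = [a | [b | [c | []]]].     succeeds
% ?- [a,b,c] = [a, b | [c]].             succeeds
```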

  15. Example: Calculating Factorial
        factorial(0,1).
        factorial(N,F) :- N>0, M is N-1, factorial(M,FM), F is FM*N.
      After saving in factorial.prolog and loading into Prolog:
        ?- ['factorial.prolog'].
        % factorial.prolog compiled 0.00 sec, 1,000 bytes
        Yes
        ?- factorial(6,X).
        X = 720 ;

  16. Using Prolog to Parse NL
      Example: Let us consider a simple CFG to parse the following two sentences: “the dog runs” and “the dogs run”
      The grammar is:
        S  -> NP VP
        NP -> D N
        D  -> the
        N  -> dog
        N  -> dogs
        VP -> run
        VP -> runs

  17. Control structures
      Example (testing membership of a list):
        member(X, [X|_]).
        member(X, [_|L]) :- member(X,L).
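
The two clauses can be exercised both as a test and as a generator. In this sketch the predicate is renamed my_member/2 (a hypothetical name) only to avoid clashing with the member/2 that many Prolog systems already provide in their list library.

```prolog
% Base case: X is the head of the list.
my_member(X, [X|_]).
% Recursive case: skip the head and look for X in the tail.
my_member(X, [_|L]) :- my_member(X, L).

% ?- my_member(b, [a,b,c]).                   succeeds
% ?- my_member(d, [a,b,c]).                   fails
% ?- findall(X, my_member(X, [a,b,c]), Xs).   Xs = [a,b,c]
```

With an unbound first argument, backtracking enumerates the elements one by one, which is why findall/3 returns the whole list.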

  18. Using Difference Lists
      The problem of parsing using this grammar can be expressed in the following way in Prolog:
        s(S,R) :- np(S,I), vp(I,R).
        np(S,R) :- d(S,I), n(I,R).
        d([the|R], R).
        n([dog|R], R).
        n([dogs|R], R).
        vp([run|R], R).
        vp([runs|R], R).

  19. Parsing using Difference Lists
      Save this in the file parse.prolog. At the Prolog prompt we type:
        ?- ['parse.prolog'].
        % parse.prolog compiled 0.00 sec, 1,888 bytes
        Yes
        ?- s([the,dog,runs],[]).
        Yes
        ?- s([runs,the,dog],[]).
        No
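
It may help to query the individual predicates and watch how each one consumes a prefix of the input and hands back the remainder. The sketch below repeats the grammar so that it is self-contained; the queries in the comments are illustrative additions, including the hypothetical extra word fast.

```prolog
% Each predicate takes the input list S and returns the remainder R
% that is left after the phrase has been consumed from the front of S.
s(S,R)  :- np(S,I), vp(I,R).
np(S,R) :- d(S,I), n(I,R).
d([the|R], R).
n([dog|R], R).
n([dogs|R], R).
vp([run|R], R).
vp([runs|R], R).

% ?- d([the,dog,runs], R).        R = [dog,runs]
% ?- np([the,dog,runs], R).       R = [runs]
% ?- s([the,dog,runs], R).        R = []
% ?- s([the,dog,runs,fast], R).   R = [fast]   (a sentence followed by leftover input)
```

Requiring the remainder to be [] is what turns "some prefix is a sentence" into "the whole input is a sentence".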

  20. Basic Definite Clause Grammar (DCG)
      DCG: Prolog's built-in mechanism for parsing
      Example:
        s --> np, vp.
        np --> d, n.
        d --> [the].
        n --> [dog].
        n --> [dogs].
        vp --> [run].
        vp --> [runs].
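
A DCG is normally invoked through phrase/2, which supplies the input list and requires that nothing is left over. The sketch below repeats the grammar and lists a few queries; calling s/2 directly relies on the usual translation of each rule into a predicate with two extra list arguments, which is assumed here to behave as in SWI-Prolog.

```prolog
s  --> np, vp.
np --> d, n.
d  --> [the].
n  --> [dog].
n  --> [dogs].
vp --> [run].
vp --> [runs].

% ?- phrase(s, [the,dog,runs]).   succeeds
% ?- s([the,dog,runs], []).       succeeds (same parse, called directly)
% ?- phrase(np, [the,dogs]).      succeeds
% ?- phrase(s, [the,dog]).        fails: the verb phrase is missing
% ?- phrase(s, [the,dog,run]).    succeeds: this grammar does not check agreement
```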

  21. Building a Parse Tree
      A parse tree can be built in the following way:
        s(s(Tn,Tv)) --> np(Tn), vp(Tv).
        np(np(Td,Tn)) --> d(Td), n(Tn).
        d(d(the)) --> [the].
        n(n(dog)) --> [dog].
        n(n(dogs)) --> [dogs].
        vp(vp(run)) --> [run].
        vp(vp(runs)) --> [runs].
      At the Prolog prompt we type and obtain:
        ?- s(X, [the, dog, runs], []).
        X = s(np(d(the),n(dog)),vp(runs)) ;

  22. Handling Agreement
        s(s(Tn,Tv)) --> np(Tn,A), vp(Tv,A).
        np(np(Td,Tn),A) --> d(Td), n(Tn,A).
        d(d(the)) --> [the].
        n(n(dog),sg) --> [dog].
        n(n(dogs),pl) --> [dogs].
        vp(vp(run),pl) --> [run].
        vp(vp(runs),sg) --> [runs].
      This grammar will accept the sentences “the dog runs” and “the dogs run” but not “the dog run” and “the dogs runs”.
      Other phenomena can be modeled in a similar fashion.
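
The effect of the shared agreement variable A can be checked with a few queries. The sketch repeats the grammar so that it loads on its own; the queries in the comments are illustrative.

```prolog
% The same variable A in np and vp forces subject and verb to agree in number.
s(s(Tn,Tv))     --> np(Tn,A), vp(Tv,A).
np(np(Td,Tn),A) --> d(Td), n(Tn,A).
d(d(the))       --> [the].
n(n(dog),sg)    --> [dog].
n(n(dogs),pl)   --> [dogs].
vp(vp(run),pl)  --> [run].
vp(vp(runs),sg) --> [runs].

% ?- s(T, [the,dogs,run], []).    T = s(np(d(the),n(dogs)),vp(run))
% ?- s(T, [the,dog,runs], []).    T = s(np(d(the),n(dog)),vp(runs))
% ?- s(_, [the,dog,run], []).     fails: sg noun, pl verb
% ?- s(_, [the,dogs,runs], []).   fails: pl noun, sg verb
```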

  23. Embedded Code
      We can embed additional Prolog code using braces, e.g.:
        s(T) --> np(Tn), vp(Tv), {T = s(Tn,Tv)}.
      and so on for the remaining rules. This is another way of building the parse tree.
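
One possible way to carry the idea through the whole grammar is sketched below. It is an assumption about how the remaining rules might look, not the lecture's own code: the lexicon facts noun/1 and verb/1 are hypothetical, and they are consulted from inside the braces to show that embedded goals can do more than build the tree.

```prolog
% Hypothetical lexicon, looked up from inside the braces.
noun(dog).
noun(dogs).
verb(run).
verb(runs).

% Goals in { } are ordinary Prolog calls and consume no input.
s(T)  --> np(Tn), vp(Tv), {T = s(Tn,Tv)}.
np(T) --> d(Td), n(Tn),   {T = np(Td,Tn)}.
d(T)  --> [the],          {T = d(the)}.
n(T)  --> [W], {noun(W), T = n(W)}.
vp(T) --> [W], {verb(W), T = vp(W)}.

% ?- phrase(s(T), [the,dog,runs]).   T = s(np(d(the),n(dog)),vp(runs))
% ?- phrase(s(_), [the,cat,runs]).   fails: cat is not in the lexicon
```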
