
Dependency-Based Hybrid Syntactic Analysis for Languages with a Rather Free Word Order



  1. Dependency-Based Hybrid Syntactic Analysis for Languages with a Rather Free Word Order • Guntis Bārzdiņš, Normunds Grūzītis, Gunta Nešpore and Baiba Saulīte • Institute of Mathematics and Computer Science, University of Latvia • NODALIDA 2007, Tartu, May 25-26, 2007

  2. SemTi-Kamols Project • Initial intention: – Integration of Latvian and the latest semantic web technologies • Natural language is a challenge and a good measure for advanced semantic web development – Ontology based natural language processing • Text meaning representation (TMR) • Current course: – Grammatical analysis of a raw text • Creation of a morphologically and syntactically annotated corpus • Chunking of Latvian web pages – Model building from a controlled text (TMR)

  3. The Problem • To develop a grammar model that would facilitate TMR • Top-down approach [Diagram: ontology fragment with the nodes T, EVENT, OBJECT, ..., CHANGE_LOCATION and the words move, run, relating syntactic structures to semantic structures] – What is the sense of a word? • Bottom-up approach – Conveyance of the meaning from the surface to the model

  4. Dependency Grammars • Bottom-up approach by drawing functional links between words • Good for inflective languages: declensions, free word order • Facilitates analysis via a relatively small set of rules • Convenience (linguistic tradition) & flexibility • Direct mapping to semantic structures: verb / predicate / event • Example: vīrs [Noun, Nom] paņēma [Verb] grāmatu [Noun, Sg, Acc] (the man) (took) (the book) and grāmatu paņēma vīrs (the book) (is taken) (by the man) receive the same links in either word order – subject: [Noun,Nom] -> [Verb], mapped to agent – object: [Noun,Sg,Acc] -> [Verb], mapped to object
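The word-order independence of such rules can be illustrated with a small sketch. The rule table, the feature names and the `link` function below are assumptions modelled on the slide's example, not the authors' grammar or parser; rules match on morphological features only, so both orderings of the sentence yield the same links.

```python
# Illustrative sketch only: RULES and the feature names are assumptions
# modelled on the slide's example, not the authors' actual grammar.

# Each rule: (function, semantic role, dependant features, head features)
RULES = [
    ("subject", "agent", {"noun", "nom"}, {"verb"}),
    ("object", "object", {"noun", "sg", "acc"}, {"verb"}),
]

def link(words):
    """Return sorted (dependant, function, role, head) links for a list of
    (form, features) pairs. Linear position is never consulted."""
    links = []
    for i, (dep_form, dep_feats) in enumerate(words):
        for j, (head_form, head_feats) in enumerate(words):
            if i == j:
                continue
            for function, role, dep_req, head_req in RULES:
                # A rule fires when the required features are subsets
                # of the features carried by dependant and head.
                if dep_req <= dep_feats and head_req <= head_feats:
                    links.append((dep_form, function, role, head_form))
    return sorted(links)

man = ("vīrs", {"noun", "sg", "nom"})
took = ("paņēma", {"verb", "past"})
book = ("grāmatu", {"noun", "sg", "acc"})

svo = link([man, took, book])   # vīrs paņēma grāmatu
ovs = link([book, took, man])   # grāmatu paņēma vīrs
```

Both calls produce the same two links, which is the property the slide attributes to dependency rules for inflective languages.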

  5. Hybrid Grammars • Constituency-based approaches with dependency elements – Head-driven phrase structure grammar: S → NP {GEN,NUM,nom} VP {GEN,NUM}; NP → Det Adj {GEN,NUM,CASE} Noun {GEN,NUM,CASE}; VP → Verb NP {acc} – TIGER annotation scheme • Nodes = constituents, edges = functions • Discontinuous constituents • Dependency-based approaches with constituency elements – Our hybrid parsing method • Good for rather free word order languages with analytical word forms

  6. x-Words • The concept of an "x-word" is the core idea of the method • Acts as a glue between the two worlds due to its dual nature – A non-terminal symbol in the phrase structure grammar, which during parsing substitutes all the entities forming the constituent – A regular word that can act as a head for depending words and/or as a dependant of another head word, according to the dependency grammar • Example pattern: ([_,[v,aux,Tense,Nr,_]], [_,[v,aux,past,Nr,_]], [_,[v,main,Ø,Ø,Trans]]) → [x-verb,[v,main,Tense,Nr,Trans,perf]], as in ir bijis jādod: 'have had to give' • Simple sentence structure + dependencies & agreements
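The x-verb pattern for ir bijis jādod can be read as a reduction with shared variables. The sketch below is one possible reading, not the authors' implementation: capitalised names (Tense, Nr, Trans) are treated as variables, "_" as a wildcard, and "0" stands in for the empty value Ø. The feature values given to the three tokens and the helper names `unify` and `reduce_xword` are assumptions for illustration.

```python
# Illustrative sketch: a three-word pattern reduced to one x-verb.
# Token feature values below are assumed, not taken from the real lexicon.

PATTERN = [
    ["v", "aux", "Tense", "Nr", "_"],
    ["v", "aux", "past", "Nr", "_"],
    ["v", "main", "0", "0", "Trans"],
]
RESULT = ["v", "main", "Tense", "Nr", "Trans", "perf"]

def is_var(term):
    """Variables are written with an initial capital letter."""
    return term[:1].isupper()

def unify(template, features, bindings):
    """Match one template against one feature list; return the extended
    bindings, or None when a constant or an earlier binding disagrees."""
    if len(template) != len(features):
        return None
    new = dict(bindings)
    for t, f in zip(template, features):
        if t == "_":
            continue
        if is_var(t):
            if new.setdefault(t, f) != f:
                return None
        elif t != f:
            return None
    return new

def reduce_xword(tokens):
    """Return ("x-verb", features) if the tokens match PATTERN, else None."""
    if len(tokens) != len(PATTERN):
        return None
    bindings = {}
    for template, (_form, feats) in zip(PATTERN, tokens):
        bindings = unify(template, feats, bindings)
        if bindings is None:
            return None
    return ("x-verb", [bindings.get(t, t) for t in RESULT])

tokens = [
    ("ir", ["v", "aux", "present", "sg", "intrans"]),
    ("bijis", ["v", "aux", "past", "sg", "intrans"]),
    ("jādod", ["v", "main", "0", "0", "trans"]),
]
xverb = reduce_xword(tokens)
```

Because Nr occurs in both auxiliary templates, a number mismatch between the two auxiliaries makes the reduction fail, which is how agreement is enforced in this sketch.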

  7. Implementation • A-Table: a list of "simple" word forms along with their morphological features • Acquired on-the-fly via morphological analysis • X-Table: a list of multi-word patterns • x-Words can be nested in other x-Words • Explicit and implicit constituents may interleave • B-Table: a list of possible head-dependant pairs • Simple word and x-word heads/dependants are treated equally

     A-Table
     Word         Morphological Features
     vasarā       [n,f,sg,loc]
     var          [v,aux,present,pl,trans]
     peldēties    [v,m,inf,0,intrans]

     X-Table
     x-Word       Morphology    Constituents
     x-coord      ...           ...
     x-prep       ...           ...
     x-verb       ...           ...

     B-Table
     Function     Head                 Dependant
     modifier     [_,{v,m}]            [_,{n,loc}]
     subject      [x-verb,{v,m,Nr}]    [_,{n,Nr,nom}]
     attribute    [_,{n}]              [_,{n,gen}]

  8. An Example Parse Tree [Figure: parse tree of the sentence below. Edges carry syntactic functions (modifier, subject, attribute) and semantic roles (TIME, PLACE, AGENT, ATTRIBUTE, OWNER); nodes are Verb, Noun, Adj, Adv, Prep, Pron, Conj, Modal, Comma and the x-words x-Prep, x-Sub, x-Coord, x-Verb] Vasarā lieli un mazi bērni dodas uz Baltijas jūru, kur viņi var peldēties. (In summer) (big) (and) (small) (kids) (are going) (to) (the Baltic) (sea) (where) (they) (can) (swim)

  9. Graphical Notation – Nested Boxes

  10. Computational Challenge • DG: high computational complexity • PSG: low computational complexity • Hybrid structure parser: extremely high computational complexity • The parsing algorithm and the grammar are incomplete – Exponential complexity relative to the length of a sentence – Only fragments (chunks) can be fully parsed in complex real-life sentences • The task is to find the longest parseable chunks – Run the parser on all sub-sequences of the sentence – x-Words are devices that cancel off substrings • Chunking reveals the non-parseable fragments
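The search for the longest parseable chunks can be sketched as follows. This is a minimal illustration under stated assumptions: "sub-sequences" are taken to be contiguous spans, chunks are chosen greedily from left to right (the slides do not say how competing chunks are selected), and `can_parse` is a hypothetical stand-in for the real parser.

```python
# Illustrative sketch: greedy longest-chunk cover of a token list.
# can_parse is a hypothetical predicate standing in for the real parser.

def longest_chunks(tokens, can_parse):
    """At each position take the longest contiguous span accepted by
    can_parse; positions where no span parses are skipped.
    Returns a list of (start, end) pairs with end exclusive."""
    chunks = []
    i, n = 0, len(tokens)
    while i < n:
        for j in range(n, i, -1):          # try the longest span first
            if can_parse(tokens[i:j]):
                chunks.append((i, j))
                i = j
                break
        else:
            i += 1                         # nothing parses from here
    return chunks

def unparsed(tokens, chunks):
    """Positions not covered by any chunk: the non-parseable fragments."""
    covered = {k for start, end in chunks for k in range(start, end)}
    return [k for k in range(len(tokens)) if k not in covered]

# Toy stand-in: a span "parses" unless it contains the unknown token "?".
toy_tokens = ["a", "b", "?", "c", "d", "e"]
toy_chunks = longest_chunks(toy_tokens, lambda span: "?" not in span)
toy_gaps = unparsed(toy_tokens, toy_chunks)
```

Even this simple cover calls the parser on a quadratic number of spans, so with a parser of exponential cost the overall run time stays dominated by the parser itself.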

  11. Syntactic Phenomena & x-Words (1) • Types of analytical forms of a predicate that are currently described in our grammar: – perfect tenses and moods – passive voice – semantic modifiers – nominal and adverbial predicates • Example [Figure: the Noun is the subject of an x-Verb formed by Aux + Adj; the Adv is a modifier of the Adj]: persiks ir ļoti sulīgs (a peach) (is) (very) (juicy) • Pattern: ([_,[v,aux,Tense,Nr,Prs]], [_,[adj,Gen,Nr,nom]]) → [x-verb,[v,m,Tense,Nr,Prs,Gen,nom]]

  12. Syntactic Phenomena & x-Words (2) • Prepositional phrases: preposition + nomen in an appropriate form • The nomen may be involved in other dependencies • Example [Figure: an x-Prep formed by Prep + Noun is a modifier of the Verb; a second Noun and an Adj are linked inside it by attribute relations]: skatīties pa gaišas istabas logu (to look) (through) (a light) (of room) (a window)

  13. Syntactic Phenomena & x-Words (3) • Coordinated parts of a sentence can be regarded as an x-word, as they have the same syntactic role • Morphological features are in agreement, thus can be inherited with no loss of information • Example [Figure: an x-Coord formed by Verb + Conj + Verb, with a Noun as subject and a Noun as object]: meitene sēž un lasa grāmatu (a girl) (is sitting) (and) (reading) (a book)

  14. Syntactic Phenomena & x-Words (4) • Subordinate clauses are seen as x-words as well – They link to the principal clauses as single parts of a sentence (both syntactically and semantically) – Typically they are dependants of a single (x-)word: • noun: attributive clause • verb: object clause • verb: modifier clause • Coordinated clauses – Could be joined under an artificial node, analogous to coordinated parts of a sentence – However, semantically each clause is treated as a separate sentence • Both types of clauses can be expanded to a full-fledged simple sentence structure • Extremely high computational complexity

  15. Discontinuous Constituents • One of the main issues when dealing with a phrase-structure grammar • Non-projective parse trees are a very rare phenomenon in dependency grammars – A dependency grammar is essentially not based on constituents – At the root there is a verb to which all the other syntactic primitives are connected • Written text & neutral word order vs. speech • Discontinuous x-words are implicitly covered by the natural interleaving of dependants within them – Dependants that linearly stand inside an x-word must be connected to a particular constituent of it, not to the x-word as a whole
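The attachment restriction on interleaved dependants can be written as a small check. This is a sketch of the stated constraint only; the function name, the position encoding and the "x-word" marker are assumptions, and no other grammar constraints are modelled.

```python
# Illustrative sketch of the interleaving constraint; names and the
# position encoding are assumptions, not the authors' data model.

def attachment_allowed(xword_positions, dependant_pos, head):
    """head is either the string "x-word" (attach to the x-word as a whole)
    or the integer position of a single word.
    A dependant standing linearly between the first and last constituent
    of the x-word may not attach to the x-word as a whole."""
    inside = min(xword_positions) < dependant_pos < max(xword_positions)
    if head == "x-word":
        return not inside
    return True  # attachment to a particular word is not restricted here

# skatīties(0) pa(1) gaišas(2) istabas(3) logu(4); x-Prep = pa + logu
x_prep = {1, 4}
to_whole = attachment_allowed(x_prep, 3, "x-word")   # istabas -> x-Prep
to_noun = attachment_allowed(x_prep, 3, 4)           # istabas -> logu
```

In the prepositional-phrase example, istabas stands between pa and logu, so it has to attach to logu rather than to the x-Prep as a whole.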

  16. Evaluation • Performance and complexity aspects have not been considered much • The method has been implemented in a running parser of Latvian • Currently we have formalized ~450 patterns of x-words and ~200 dependency rules – The table of morphological features for each sentence is built on-the-fly – A significant amount of work is still pending to accomplish a nearly complete coverage of syntax – Consistency of the set of rules – Chunking facilitates debugging and improvement of the grammar • If a parse tree is syntactically valid there is no reason to reject it (while selectional restrictions are not considered) • Morphological analysis → syntax parsing → syntactic and lexical valences → model builder

  17. Future Plans • Coverage of the grammar • Quantitative: – Run the chunker on all Latvian texts available • Raw-text corpus (~100 million running words) • Web corpus (~4 GB) • Exploitation of the BalticGRID infrastructure • Average chunk size statistics for various classes of documents – Use results to iteratively... • create a partially annotated Latvian text corpus • improve the chunker and the grammar • Qualitative: – Sofie treebank (just started)
