slide-1
SLIDE 1

Dependency-Based Hybrid Syntactic Analysis for Languages with a Rather Free Word Order

Guntis Bārzdiņš, Normunds Grūzītis, Gunta Nešpore and Baiba Saulīte Institute of Mathematics and Computer Science University of Latvia

NODALIDA 2007, Tartu, May 25-26, 2007

slide-2
SLIDE 2

SemTi-Kamols Project

  • Initial intention:

– Integration of Latvian and the latest semantic web technologies

  • Natural language is a challenge and a good measure for advanced semantic web development

– Ontology-based natural language processing

  • Text meaning representation (TMR)
  • Current course:

– Grammatical analysis of a raw text

  • Creation of a morphologically and syntactically annotated corpus
  • Chunking of Latvian web pages

– Model building from a controlled text (TMR)

slide-3
SLIDE 3
The Problem

  • To develop a grammar model that facilitates TMR
  • Top-down approach

– What is the sense of a word?

  • Bottom-up approach

– Conveyance of the meaning from the surface to the model

[Figure: mapping between syntactic structures and semantic structures; labels shown include move, run, CHANGE_LOCATION, EVENT, OBJECT, ...]

slide-4
SLIDE 4
Dependency Grammars

  • Bottom-up approach by drawing functional links between words
  • Good for inflective languages: declensions, free word order
  • Facilitates analysis via a relatively small set of rules
  • Convenience (linguistic tradition) & flexibility
  • Direct mapping to semantic structures: verb / predicate / event

Rule (dependant -> head)     Syntactic function   Semantic role
[Noun,Nom] -> [Verb]         subject              agent
[Noun,Sg,Acc] -> [Verb]      object               object

[Figure: the same links drawn for two word orders.
vīrs (the man) paņēma (took) grāmatu (the book): Noun [Nom], Verb, Noun [Sg,Acc]
grāmatu (the book) paņēma (is taken) vīrs (by the man): Noun [Sg,Acc], Verb, Noun [Nom]
In both, the Noun [Nom] is linked to the Verb as subject / agent and the Noun [Sg,Acc] as object / object.]
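
The two rules on this slide can be tried out with a minimal sketch. It is an illustration only, not the authors' parser: the `LEXICON` feature sets and the `link` function are assumptions introduced here. It shows that links chosen by morphological features come out the same for both word orders.

```python
from itertools import permutations

# Rules from the slide: dependant features, head category,
# syntactic function, semantic role.
RULES = [
    ({"Noun", "Nom"}, "Verb", "subject", "agent"),
    ({"Noun", "Sg", "Acc"}, "Verb", "object", "object"),
]

# Hypothetical mini-lexicon; the feature sets are assumptions for illustration.
LEXICON = {
    "vīrs": {"Noun", "Sg", "Nom"},
    "grāmatu": {"Noun", "Sg", "Acc"},
    "paņēma": {"Verb"},
}

def link(words):
    """Return the set of (dependant, function, role, head) links for the words."""
    links = set()
    for dep, head in permutations(words, 2):
        for dep_feats, head_cat, function, role in RULES:
            # A rule fires when the dependant has all required features
            # and the head belongs to the required category.
            if dep_feats <= LEXICON[dep] and head_cat in LEXICON[head]:
                links.add((dep, function, role, head))
    return links

svo = link(["vīrs", "paņēma", "grāmatu"])
ovs = link(["grāmatu", "paņēma", "vīrs"])
```

Because the rules never consult word positions, `svo` and `ovs` contain the same two links.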

slide-5
SLIDE 5
Hybrid Grammars

  • Constituency-based approaches with dependency elements

– Head-driven phrase structure grammar
– TIGER annotation scheme

  • Nodes = constituents, edges = functions
  • Discontinuous constituents
  • Dependency-based approaches with constituency elements

– Our hybrid parsing method

  • Good for rather free word order languages with analytical word forms

S -> NP{GEN,NUM,nom} VP{GEN,NUM}
NP -> Det Adj{GEN,NUM,CASE} Noun{GEN,NUM,CASE}
VP -> Verb NP{acc}

slide-6
SLIDE 6

x-Words

  • A concept of “x-word”, which in a sense is the core idea
  • Acts as a glue between the two worlds due to its dual nature

– A non-terminal symbol in the phrase structure grammar, which as such substitutes all entities forming this constituent during the parsing process
– A regular word that can act as a head for depending words and/or as a dependent of another head word according to the dependency grammar

  • Simple sentence structure + dependencies & agreements

Example pattern for ir bijis jādod (‘have had to give’):

([_,[v,aux,Tense,Nr,_]], [_,[v,aux,past,Nr,_]], [_,[v,main,Ø,Ø,Trans]]) -> [x-verb,[v,main,Tense,Nr,Trans,perf]]
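
One way to read the x-verb pattern on this slide is as a small unification problem. Capitalised names (Tense, Nr, Trans) are variables that must take the same value wherever they occur, `_` matches anything, and everything else is a constant. The sketch below works under those assumptions; the function names and the feature values given to ir, bijis and jādod are hypothetical.

```python
# Assumption: these names are the variables of the pattern; "_" is a wildcard.
VARIABLES = {"Tense", "Nr", "Trans"}

PATTERN = [
    ["v", "aux", "Tense", "Nr", "_"],
    ["v", "aux", "past", "Nr", "_"],
    ["v", "main", "Ø", "Ø", "Trans"],
]
RESULT = ["v", "main", "Tense", "Nr", "Trans", "perf"]

def match(pattern, features, bindings):
    """Unify one feature list with one pattern element.

    Returns the extended bindings, or None if the element does not match.
    """
    if len(pattern) != len(features):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, features):
        if p == "_":
            continue
        if p in VARIABLES:
            # Bind on first use; later uses must agree with the bound value.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings

def build_x_verb(words):
    """Collapse a word sequence into an x-verb, or return None."""
    if len(words) != len(PATTERN):
        return None
    bindings = {}
    for pat, feats in zip(PATTERN, words):
        bindings = match(pat, feats, bindings)
        if bindings is None:
            return None
    return ["x-verb", [bindings.get(term, term) for term in RESULT]]

# Hypothetical feature lists for "ir bijis jādod".
ir = ["v", "aux", "present", "sg", "0"]
bijis = ["v", "aux", "past", "sg", "0"]
jadod = ["v", "main", "Ø", "Ø", "trans"]

x_verb = build_x_verb([ir, bijis, jadod])
mismatch = build_x_verb([ir, ["v", "aux", "past", "pl", "0"], jadod])
```

With these values the three words collapse into one x-verb that inherits tense and number from the auxiliaries and transitivity from the main verb. The second call fails because the two auxiliaries disagree in number.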

slide-7
SLIDE 7

Implementation

  • A list of “simple” word forms along with their morphological features
  • Acquired on-the-fly via morphological analysis
  • A list of multi-word patterns
  • x-Words can be nested in other x-Words
  • Explicit and implicit constituents may interleave
  • A list of possible head-dependant pairs
  • Simple word and x-word heads/dependants are treated equally

A-Table
Word         Morphological Features
vasarā       [n,f,sg,loc]
var          [v,aux,present,pl,trans]
peldēties    [v,m,inf,0,intrans]

X-Table
Constituents   Morphology   x-Word
...            ...          x-verb
...            ...          x-prep
...            ...          x-coord

B-Table
Dependant        Head                 Function
[_,{n,gen}]      [_,{n}]              attribute
[_,{n,Nr,nom}]   [x-verb,{v,m,Nr}]    subject
[_,{n,loc}]      [_,{v,m}]            modifier
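
As a rough illustration of how the A-Table and B-Table might be held in memory and queried, here is a minimal sketch. The representation is an assumption, not the authors' implementation: items are (x-word type, features) pairs, `_` in a rule matches any type, and `Nr` is treated as an agreement variable over `sg` and `pl`. The X-Table is omitted because its rows are elided on the slide, and the features of bērni and the x-verb item are hypothetical.

```python
# A-Table rows taken from the slide.
A_TABLE = {
    "vasarā": ["n", "f", "sg", "loc"],
    "var": ["v", "aux", "present", "pl", "trans"],
    "peldēties": ["v", "m", "inf", "0", "intrans"],
}

# B-Table rows from the slide: (dependant, head, function).
B_TABLE = [
    (("_", {"n", "gen"}), ("_", {"n"}), "attribute"),
    (("_", {"n", "Nr", "nom"}), ("x-verb", {"v", "m", "Nr"}), "subject"),
    (("_", {"n", "loc"}), ("_", {"v", "m"}), "modifier"),
]

# Assumption: capitalised names are agreement variables with these values.
VARIABLES = {"Nr": {"sg", "pl"}}

def simple(word):
    """Wrap an A-Table word as an item with no x-word type."""
    return (None, A_TABLE[word])

def side_matches(required, item):
    """Check the x-word type and the constant features of one rule side."""
    req_type, req_feats = required
    item_type, item_feats = item
    if req_type != "_" and req_type != item_type:
        return False
    constants = {f for f in req_feats if f not in VARIABLES}
    return constants <= set(item_feats)

def agrees(dep_req, head_req, dep, head):
    """Variables shared by both rule sides must take the same value."""
    for var in dep_req[1] & head_req[1] & set(VARIABLES):
        if VARIABLES[var] & set(dep[1]) != VARIABLES[var] & set(head[1]):
            return False
    return True

def functions(dep, head):
    """List the syntactic functions the B-Table allows for dep -> head."""
    return [
        fn
        for dep_req, head_req, fn in B_TABLE
        if side_matches(dep_req, dep)
        and side_matches(head_req, head)
        and agrees(dep_req, head_req, dep, head)
    ]

modifier = functions(simple("vasarā"), simple("peldēties"))

# Hypothetical items: a plural nominative noun and an x-verb in the plural.
berni = (None, ["n", "m", "pl", "nom"])
x_verb_pl = ("x-verb", ["v", "m", "pl"])
subject = functions(berni, x_verb_pl)
no_agreement = functions(berni, ("x-verb", ["v", "m", "sg"]))
```

Simple words and x-words pass through the same `functions` lookup, which mirrors the last bullet: heads and dependants of either kind are treated equally.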

slide-8
SLIDE 8

An Example Parse Tree

Vasarā (In summer) lieli (big) un (and) mazi (small) bērni (kids) dodas (are going) uz (to) Baltijas (the Baltic) jūru (sea), kur (where) viņi (they) var (can) peldēties (swim).

[Figure: parse tree of the sentence above. Node labels: Noun, Adj, Conj, Adv, Verb, Prep, Pron, Modal, Comma, x-Coord, x-Prep, x-Verb, x-Sub. Syntactic functions on the edges: subject, modifier, attribute. Semantic roles: AGENT, TIME, PLACE, OWNER, ATTRIBUTE.]

slide-9
SLIDE 9

Graphical Notation – Nested Boxes

slide-10
SLIDE 10

Computational Challenge

  • The parsing algorithm and the grammar are incomplete

– Exponential complexity, relative to the length of a sentence
– Only fragments (chunks) can be fully parsed in complex real-life sentences

  • The task is to find the longest parseable chunks

– Run the parser on all sub-sequences of the sentence
– x-Words are devices that cancel out substrings

  • Chunking reveals the non-parseable fragments

[Figure: computational complexity of the approaches. PSG: low; DG: high; hybrid structure parser: extremely high.]
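
The chunking idea can be sketched as follows. This is a simplified illustration under stated assumptions: `can_parse` stands in for the real parser, sub-sequences are taken to be contiguous spans, and longer spans are preferred greedily. None of these details are confirmed by the slides.

```python
def longest_chunks(tokens, can_parse):
    """Cover a sentence with the longest parseable contiguous chunks.

    Returns (chunks, gaps): chunks are (start, end) spans, end exclusive;
    gaps are the positions that no parseable chunk covers.
    """
    n = len(tokens)
    # Enumerate all contiguous spans, longest first, then leftmost first.
    spans = sorted(
        ((i, j) for i in range(n) for j in range(i + 1, n + 1)),
        key=lambda span: (span[0] - span[1], span[0]),
    )
    covered = [False] * n
    chunks = []
    for i, j in spans:
        # Take a span only if it does not overlap an already chosen chunk.
        if not any(covered[i:j]) and can_parse(tokens[i:j]):
            chunks.append((i, j))
            covered[i:j] = [True] * (j - i)
    gaps = [k for k in range(n) if not covered[k]]
    return sorted(chunks), gaps

# Toy stand-in for the parser: any chunk containing "X" is unparseable.
sentence = ["a", "b", "X", "c", "d", "e"]
chunks, gaps = longest_chunks(sentence, lambda chunk: "X" not in chunk)
```

In this toy run the chunks are (0, 2) and (3, 6), and position 2 is reported as a gap, which is how chunking exposes the non-parseable fragment. The sketch may call the parser on a number of spans quadratic in the sentence length, on top of the cost of each parse.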

slide-11
SLIDE 11

Syntactic Phenomena & x-Words (1)

  • Types of analytical forms of a predicate that are currently described in our grammar:

– perfect tenses and moods
– passive voice
– semantic modifiers
– nominal and adverbial predicates

([_,[v,aux,Tense,Nr,Prs]],[_,[adj,Gen,Nr,nom]]) -> [x-verb,[v,m,Tense,Nr,Prs,Gen,nom]]

[Figure: dependency diagram. Words: persiks (a peach), ir (is), ļoti (very), sulīgs (juicy). Node labels: Noun, Aux, Adv, Adj, x-Verb. Edge labels: subject, modifier.]

slide-12
SLIDE 12

Syntactic Phenomena & x-Words (2)

  • Prepositional phrases: preposition + nominal in an appropriate form
  • The nominal may be involved in other dependencies

[Figure: dependency diagram. Words: skatīties (to look), pa (through), gaišas (a light), istabas (of room), logu (a window). Node labels: Verb, Prep, Adj, Noun, Noun, x-Prep. Edge labels: modifier, attribute, attribute.]
slide-13
SLIDE 13

Syntactic Phenomena & x-Words (3)

  • Coordinated parts of sentence can be regarded as an x-word, as they have the same syntactic role
  • Morphological features are in agreement, thus can be inherited with no loss of information

[Figure: dependency diagram. Words: meitene (a girl), sēž (is sitting), un (and), lasa (reading), grāmatu (a book). Node labels: Noun, Verb, Conj, Verb, Noun, x-Coord. Edge labels: subject, object.]

slide-14
SLIDE 14

Syntactic Phenomena & x-Words (4)

  • Subordinate clauses are seen as x-words as well

– They link to the principal clauses as single parts of a sentence (both syntactically and semantically)
– Typically they are dependants of a single (x-)word:

  • noun attributive clause
  • verb object clause
  • verb modifier clause
  • Coordinated clauses

– Could be joined under an artificial node, analogous to coordinated parts of a sentence
– However, semantically each clause is treated as a separate sentence

  • Both types of clauses can be expanded to a full-fledged simple sentence structure
  • Extremely high computational complexity
slide-15
SLIDE 15

Discontinuous Constituents

  • One of the main issues when dealing with a phrase-structure grammar
  • Non-projective parse trees are a very rare phenomenon in dependency grammars

– A dependency grammar is essentially not based on constituents
– At the root there is a verb to which all the other syntactic primitives are connected

  • Written text & neutral word order vs. speech
  • Discontinuous x-words are implicitly covered by the natural interleaving of dependants within them

– Dependants that linearly stand inside an x-word may not be connected to the x-word as a whole, only to a particular constituent of it
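
The attachment restriction in the last sub-bullet can be stated as a small check. The sketch below is an illustration under assumptions: word positions are integers, `WHOLE` is a hypothetical marker for attaching to the x-word as a whole, and the word order persiks ir ļoti sulīgs is assumed for the example.

```python
WHOLE = "x-word"  # hypothetical marker: attach to the x-word as a whole

def attachment_allowed(xword_positions, dependant_position, head):
    """Apply the interleaving restriction to one proposed attachment.

    A dependant standing linearly inside the x-word must not attach to
    the x-word as a whole. Attachments to individual words are not
    restricted by this particular check.
    """
    inside = min(xword_positions) < dependant_position < max(xword_positions)
    return not (inside and head == WHOLE)

# Assumed word order: persiks(0) ir(1) ļoti(2) sulīgs(3).
# The x-verb is formed by ir and sulīgs, so ļoti stands inside it.
x_verb = {1, 3}
adverb_to_whole = attachment_allowed(x_verb, 2, WHOLE)
adverb_to_adjective = attachment_allowed(x_verb, 2, 3)
subject_to_whole = attachment_allowed(x_verb, 0, WHOLE)
```

Here ļoti may attach to sulīgs but not to the x-verb as a whole, while persiks, standing outside the x-verb, may attach to it as a whole.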

slide-16
SLIDE 16
Evaluation

  • Performance and complexity aspects have not been considered much
  • The method has been implemented in a running parser of Latvian
  • Currently we have formalized ~450 patterns of x-words and ~200 dependency rules

– The table of morphological features for each sentence is built on-the-fly
– A significant amount of work is still pending to achieve nearly complete coverage of syntax
– Consistency of the set of rules
– Chunking facilitates debugging and improvement of the grammar

  • If a parse tree is syntactically valid there is no reason to reject it (while selectional restrictions are not considered)
  • Morphological analysis -> syntax parsing -> syntactic and lexical valences -> model builder

slide-17
SLIDE 17

Future Plans

  • Coverage of the grammar
  • Quantitative:

– Run the chunker on all Latvian texts available

  • Raw-text corpus (~100 million running words)
  • Web corpus (~4 GB)
  • Exploitation of the BalticGRID infrastructure
  • Average chunk size statistics for various classes of documents

– Use results to iteratively...

  • create a partially annotated Latvian text corpus
  • improve the chunker and the grammar
  • Qualitative:

– Sofie treebank (just started)

slide-18
SLIDE 18

Conclusion

  • The proposed model can be used to describe languages with either rather free or strict word order
  • Adaptation of the parser requires “only” the three tables to be produced, declaring the morphology and syntax of the particular language
  • Straightforward compatibility between the syntactic and semantic structures is an important argument
  • Construction of a wide coverage grammar seems to be more convenient via a layer of the hybrid approach
  • Computational complexity and performance require further investigation and optimizations

slide-19
SLIDE 19

Thank you!

Questions?

www.semti-kamols.lv