Urdu and the Modular Architecture of ParGram



SLIDE 1

Preview Introduction Overall Architecture Morphology Syntax Prosody Semantics Conclusion

Urdu and the Modular Architecture of ParGram

Tina Bögel, Miriam Butt, Annette Hautli, Sebastian Sulger

Universität Konstanz

CLT 2009, Lahore

SLIDE 2

1 Introduction
2 Overall Architecture
3 Morphology
4 Syntax
5 Prosody
6 Semantics
7 Conclusion

SLIDE 3

Introduction

  • ParGram: NLP project based on Lexical Functional Grammar (LFG)
      • building large-scale, robust grammars
      • larger grammars: English, French, German, Japanese, Norwegian, Turkish
      • smaller grammars: Arabic, Chinese, Georgian, Malagasy, Urdu, Welsh
  • LFG parsing and generation using a modular type of architecture
  • this talk: description of the modules used for the grammar; short demos of two of the modules

SLIDE 4

Introduction

  • modules of the grammar: tokenizer, morphological analyzer, syntactic rules, prosodic projection, semantics interface
  • modules are connected using the development platform XLE
  • the ParGram architecture design allows for robust, large-scale parsing and generation and a satisfactory treatment of language-specific phenomena

SLIDE 5

Overall Architecture

  • tokenizer and morphological analyzer: finite-state machines using the Xerox Finite-State Calculus (xfst)
  • morphological analyzer feeds into the syntactic rules component
      • morphological tagging interacts with syntactic rules
      • syntactic analyses are informed by theoretical work within LFG
      • c(onstituent)-structure and f(unctional)-structure are produced by the XLE platform
  • phonological rules, built in the syntactic module, rephrase prosody (result: p-structure)
  • syntactic structures provide the basis for semantic analysis

SLIDE 6

Overall Architecture

    tokenizer & morphology (fst)
                ↓
    syntax (f- and c-structure)  →  prosody (p-structure)
                ↓
    semantics (xfr ordered rewriting)

Overall modular architecture of the ParGram Urdu grammar
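The data flow in this diagram can be sketched in a few lines of Python. This is only a toy illustration of the module ordering, not the actual xfst/XLE/XFR implementation: every function name, the `+Unknown` tag, and the placeholder structures are assumptions made for the example. The single lexicon entry reuses the "laRkI" analysis shown on the morphology slide.

```python
from typing import Any, Dict, List

# Toy lexicon (assumption): one entry, taken from the morphology slide.
TOY_LEXICON: Dict[str, str] = {"laRkI": "laRk+Noun+Fem+Sg"}


def tokenize(text: str) -> List[str]:
    """Stand-in for the finite-state tokenizer: split on whitespace."""
    return text.split()


def analyze_morphology(tokens: List[str]) -> List[str]:
    """Stand-in for the finite-state analyzer: one tag string per token.

    Unknown tokens get a hypothetical "+Unknown" tag.
    """
    return [TOY_LEXICON.get(token, token + "+Unknown") for token in tokens]


def parse_syntax(analyses: List[str]) -> Dict[str, Any]:
    """Stand-in for XLE: return placeholder c- and f-structures."""
    stems = [analysis.split("+")[0] for analysis in analyses]
    return {"c_structure": ["ROOT"] + stems, "f_structure": {"STEMS": stems}}


def project_prosody(syntax: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the prosodic projection; it reads the syntactic output."""
    return {"p_structure": list(syntax["c_structure"][1:])}


def build_semantics(syntax: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for XFR rewriting; it also reads the syntactic output."""
    return {"semantic_form": dict(syntax["f_structure"])}


def run_pipeline(text: str) -> Dict[str, Any]:
    """Run the modules in the order shown in the architecture diagram."""
    tokens = tokenize(text)
    analyses = analyze_morphology(tokens)
    syntax = parse_syntax(analyses)
    return {
        "tokens": tokens,
        "morphology": analyses,
        "syntax": syntax,
        "prosody": project_prosody(syntax),
        "semantics": build_semantics(syntax),
    }


if __name__ == "__main__":
    print(run_pipeline("laRkI"))
```

Because each stage only consumes the output of the stage before it, any single stand-in could be swapped for a different implementation without touching the others, which is the point of the modular design.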

SLIDE 7

Morphology

  • implemented using a finite-state machine
  • functions as a “black box”
      • usable tag output for XLE, but could be replaced by other morphological resources
      • morphology is a stand-alone resource and may be used for other applications
  • connected up to the syntax using a morphology-syntax interface
      • morphological information can easily be extracted from the finite-state machine
      • system allows broad vocabulary coverage
      • system allows description of language-specific morphological phenomena like reduplication, future formation, etc.

SLIDE 8

Morphology

  • sample output of morphology lexicon (MORPHOLOGY):

    laRk+Noun+Fem+Sg

  • tags are used as input for the syntax component (INTERFACE):

    +Fem   GEND   xle   @(GEND fem)
    +Sg    NUM    xle   @(NUM sg)

  • features are displayed in c- and f-structure (SYNTAX):

    c-structure (CS 1):

    N
      NOUN-S_BASE   laRk
      N-T_BASE      +Noun
      GEND_BASE     +Fem
      NUM_BASE      +Sg

    f-structure for "laRkI":

    PRED    'laRk'
    NTYPE   [ NSEM [ COMMON count ]
              NSYN common ]
    GEND fem, NUM sg, PERS 3
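The step from tag string to features can be illustrated with a small Python sketch. It assumes a plain dictionary as the tag-to-feature table; the two entries are modeled on the `+Fem` and `+Sg` interface lines in this slide, but the dictionary layout and the function name `analysis_to_features` are inventions for this example and do not reflect XLE's own file format.

```python
from typing import Dict, Tuple

# Tag-to-feature table (assumed layout); entries mirror the two
# interface lines "+Fem GEND ... (GEND fem)" and "+Sg NUM ... (NUM sg)".
TAG_TO_FEATURE: Dict[str, Tuple[str, str]] = {
    "+Fem": ("GEND", "fem"),
    "+Sg": ("NUM", "sg"),
}


def analysis_to_features(analysis: str) -> Dict[str, str]:
    """Turn a morphological analysis such as 'laRk+Noun+Fem+Sg' into features."""
    stem, *tags = analysis.split("+")
    features = {"PRED": stem}
    for tag in tags:
        key = "+" + tag
        # Tags without a table entry (here: +Noun) contribute no feature.
        if key in TAG_TO_FEATURE:
            attribute, value = TAG_TO_FEATURE[key]
            features[attribute] = value
    return features


if __name__ == "__main__":
    print(analysis_to_features("laRk+Noun+Fem+Sg"))
    # → {'PRED': 'laRk', 'GEND': 'fem', 'NUM': 'sg'}
```

Keeping the table separate from the lookup function mirrors the "black box" idea: a different morphological resource only needs to emit tags that the table knows about.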

SLIDE 9

Syntax

  • syntax component is at the core of the Urdu grammar
  • theoretical background: LFG
  • well-studied framework (∼30 years) with computational usability
  • c- and f-structures used for syntactic representation
      • c-structure: basic constituent structure (“tree”) and linear precedence (∼ what parts belong together)
      • f-structure: encodes syntactic functions and properties

SLIDE 10

Syntax

c-structure (CS 1):

    ROOT
      S
        KP
          NP
            N   nAdyA
        VCmain
          Vmain
            V   hasI

f-structure for "nAdyA hasI":

    PRED      'has<[1:nAdyA]>'
    SUBJ      [ PRED      'nAdyA'
                NTYPE     [ NSEM [ PROPER [ PROPER-TYPE name ] ]
                            NSYN proper ]
                SEM-PROP  [ SPECIFIC + ]
                CASE nom, GEND fem, NUM sg, PERS 3 ]
    CHECK     [ _VMORPH [ _MTYPE infl ]
                _RESTRICTED - ]
    TNS-ASP   [ ASPECT [ IMPF -, PERF +, PROG - ]
                MOOD indicative ]
    CLAUSE-TYPE decl, LEX-SEM unerg, PASSIVE -, VFORM perf, VTYPE main

  • size: 40 phrase-structure rules, annotated for syntactic function
  • coverage: basic clauses with free word order, verbal complex, tense and aspect, causative verbs, complex predicates
  • Demo: Complex Predicates
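As a rough illustration of the two representations for "nAdyA hasI", the c-structure can be held as a nested tuple tree and the f-structure as a nested dictionary. This is a sketch under assumptions: the Python encoding, the helper names `terminal_yield` and `lookup`, and the bracketing of the tree (KP and VCmain as daughters of S) are choices made for the example, and only a subset of the f-structure attributes is included.

```python
from typing import Any, List

# c-structure as (label, child, ...) tuples; leaves are plain strings.
# The bracketing is an assumed reading of the flattened tree.
C_STRUCTURE = (
    "ROOT",
    ("S",
     ("KP", ("NP", ("N", "nAdyA"))),
     ("VCmain", ("Vmain", ("V", "hasI")))),
)

# f-structure as a nested dict (subset of the attributes on the slide).
F_STRUCTURE = {
    "PRED": "has<[1:nAdyA]>",
    "SUBJ": {"PRED": "nAdyA", "CASE": "nom", "GEND": "fem",
             "NUM": "sg", "PERS": "3"},
    "CLAUSE-TYPE": "decl",
    "VTYPE": "main",
}


def terminal_yield(node: Any) -> List[str]:
    """Collect the words of a c-structure in linear order."""
    if isinstance(node, str):
        return [node]
    words: List[str] = []
    for child in node[1:]:  # node[0] is the category label
        words.extend(terminal_yield(child))
    return words


def lookup(f_structure: dict, *path: str) -> Any:
    """Follow a path of attributes through a nested f-structure."""
    value: Any = f_structure
    for attribute in path:
        value = value[attribute]
    return value


if __name__ == "__main__":
    print(terminal_yield(C_STRUCTURE))
    print(lookup(F_STRUCTURE, "SUBJ", "GEND"))
```

The tree answers questions about word order and grouping, while the dictionary answers questions about grammatical functions, for example which value the subject carries for gender.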

SLIDE 11

Prosody

  • LFG architecture allows for additional projections
  • prosody implemented as an additional layer on top of syntax
  • prosodic information is important for the correct understanding/disambiguation of a sentence
  • experimental p(rosodic)-structure in order to model complex phonological properties of clitics, especially Urdu Ezafe

SLIDE 12

Prosody - Urdu Ezafe

  • Urdu Ezafe: loan construction from Persian
  • calls for a modifier (adjective or noun) to the right of the head noun: not in line with the usual Urdu head-final pattern
  • example:

    a. sher=e   panjAb               b. sadA=e    buland
       lion=Ez  Punjab                  voice=Ez  high
       ‘A/The lion of Punjab’           ‘high voice’

SLIDE 13

Prosody - Urdu Ezafe

  • modifier is licensed by Ezafe
  • Ezafe is the head of the Ezafe phrase constituent
  • complement of Ezafe modifies the head noun

SLIDE 14

Prosody - Urdu Ezafe

  • Ezafe is a clitic: it attaches to the end of a constituent, which is not possible for inflectional morphology
  • example shows the clitic status of Ezafe:

    [maal      o    daulat]=e   dunyaa
     material  and  wealth=Ez   world
    ‘the material and wealth of the world’

  • within prosody, the clitic Ezafe is integrated into the prosodic phrase to its left; this is not modeled at the level of syntax
  • additional level: p-structure
  • Ezafe is coded as part of the head noun within p-structure
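The mismatch between the syntactic and the prosodic grouping of Ezafe can be made concrete with a toy Python sketch. It is not the grammar's p-structure implementation: the function names, the tuple and list encodings, and the restriction to two-word inputs of the form `head=e modifier` are all assumptions made for this illustration.

```python
from typing import List, Tuple


def syntactic_grouping(phrase: str) -> Tuple[str, Tuple[str, str]]:
    """Group 'head=e modifier' the way the syntax does.

    Ezafe heads its own phrase and takes the modifier as its complement,
    so the result is (head noun, (Ezafe, complement)).
    """
    host, modifier = phrase.split()
    head, ezafe = host.split("=")
    return (head, (ezafe, modifier))


def prosodic_grouping(phrase: str) -> List[str]:
    """Group the same string the way prosody does.

    The clitic stays with the host on its left, so the result is
    [host plus Ezafe, modifier].
    """
    host, modifier = phrase.split()
    return [host, modifier]


if __name__ == "__main__":
    print(syntactic_grouping("sher=e panjAb"))  # → ('sher', ('e', 'panjAb'))
    print(prosodic_grouping("sher=e panjAb"))   # → ['sher=e', 'panjAb']
```

The same string thus receives two different bracketings, which is why a separate projection is a natural place to record the prosodic one.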

SLIDE 15

Semantics

  • f-structures within XLE are coded in Prolog
  • for semantics, we take the Prolog code and apply ordered rewrite rules (XFR) to it
      • reasonable approach, as f-structures are equivalent to quasi-logical forms
  • the input f-structure is consumed step by step by the rewrite rules
  • XLE produces the output semantic form
  • world knowledge may also be included (the English ParGram grammar uses WordNet as a knowledge base)
  • Demo: Semantics module
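The idea of ordered rewriting, where rules apply in a fixed sequence and each rule consumes the facts it matches, can be sketched in Python. This does not use XFR syntax or XLE's Prolog format: the fact tuples, the two rules, and the output labels `arg1` and `event` are hypothetical and chosen only to show why rule order matters.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, str, str]
Rule = Callable[[Set[Fact]], Set[Fact]]

# Toy input facts (assumed encoding) loosely modeled on "nAdyA hasI".
INPUT_FACTS: Set[Fact] = {
    ("PRED", "f0", "has"),
    ("SUBJ", "f0", "f1"),
    ("PRED", "f1", "nAdyA"),
}


def rule_subject(facts: Set[Fact]) -> Set[Fact]:
    """Consume SUBJ(f, g) and PRED(g, w); produce arg1(f, w)."""
    out = set(facts)
    for fact in facts:
        if fact[0] != "SUBJ":
            continue
        _, clause, subject = fact
        for other in facts:
            if other[0] == "PRED" and other[1] == subject:
                out -= {fact, other}
                out.add(("arg1", clause, other[2]))
    return out


def rule_predicate(facts: Set[Fact]) -> Set[Fact]:
    """Consume every remaining PRED(f, w); produce event(f, w)."""
    return {("event", f[1], f[2]) if f[0] == "PRED" else f for f in facts}


def rewrite(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply the rules once each, strictly in the given order."""
    for rule in rules:
        facts = rule(facts)
    return facts


if __name__ == "__main__":
    print(sorted(rewrite(INPUT_FACTS, [rule_subject, rule_predicate])))
    print(sorted(rewrite(INPUT_FACTS, [rule_predicate, rule_subject])))
```

With the subject rule first, the result contains only `arg1` and `event` facts. With the order reversed, the predicate rule consumes the subject's PRED fact too early, so the subject rule can no longer fire and the SUBJ fact is left over.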

SLIDE 16

Conclusion

  • Urdu ParGram project: devoted to developing a large-scale, broad-coverage LFG parsing and generation grammar using XLE
  • pipeline architecture: single components may be used in other contexts
  • informed by well-studied linguistic insights from LFG theory
  • currently experimenting with additional annotation using p-structure (prosody) and XFR rewriting (semantics)
  • LFG/XLE methodology: powerful, effective, proven and tested
