Towards a Computational Semantic Analyzer for Urdu

SLIDE 1

Towards a Computational Semantic Analyzer for Urdu

Annette Hautli Miriam Butt

Department of Linguistics, University of Konstanz

9th Workshop on Asian Linguistic Resources, IJCNLP ’11

SLIDES 2-4

Motivation

1. Advances in the computational processing of Urdu
2. Increasing amount of lexical resources for Urdu available

Task

Gathering information from various resources and putting it together to form one coherent resource for Urdu.

Challenge

What formalism can we employ to put this information together? And what are the particular challenges with respect to Urdu?

SLIDES 5-6

Taking stock

Urdu is still a language with comparatively few linguistic resources

Syntactic parsers:

◮ Treebank-based PCFG parser (Abbas, 2002)
◮ Urdu dependency parser trained with MaltParser (Ali and Hussain, 2010)
◮ Urdu ParGram grammar based on LFG (Butt and King 2004, Bögel et al. 2009)

Lexical resources:

◮ Emille corpus (Baker et al., 2004)
◮ “Experiences in Building Urdu Wordnet” (Adeeba and Hussain, 2011)
◮ Urdu WordNet based on Hindi WordNet (Ahmed and Hautli, 2009)
◮ Automatic collection of Urdu multiwords (Hautli and Sulger, 2011)
◮ Development of a lexical resource for Urdu verbs

SLIDES 7-12

The Urdu ParGram grammar

Parser based on the formalism of Lexical Functional Grammar (Bresnan and Kaplan 1981), run on the development platform XLE (Crouch et al. 2011)
The Urdu ParGram grammar is part of an international effort to create parallel grammars for different languages

transliteration (fst)
↓
tokenizer & morphology (fst)
↓
syntax (xle lfg)
↓
semantics (xfr ordered rewriting)

SLIDE 13

The Urdu ParGram grammar

us  nE   t3ul AbEb  mEN  sEb    kHAyA
he  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

c-structure (CS 1), nodes in depth-first order:

ROOT Sadj S KP NP PRON us K nE KP NP N t3ul AbEb K mEN KP NP N sEb VCmain V kHAyA

f-structure for "us nE t3ul AbEb mEN sEb kHAyA" (outermost index 58):

PRED         'kHA<[1:vuh], [26:sEb]>'
SUBJ [1]     PRED 'vuh', NTYPE [NSYN pronoun], CASE erg, NUM sg, PERS 3
OBJ [26]     PRED 'sEb', NTYPE [NSEM [COMMON count], NSYN common], CASE nom, GEND masc, NUM sg, PERS 3
ADJUNCT [7]  PRED 't3ul AbEb', NTYPE [NSEM [PROPER [PROPER-TYPE location]], NSYN proper], SEM-PROP [SPECIFIC +], CASE loc, NUM sg, PERS 3
LEX-SEM      [AGENTIVE +]
TNS-ASP      [ASPECT perf, MOOD indicative]
CLAUSE-TYPE decl, PASSIVE -, VTYPE main

SLIDE 15

The xfr rewrite system

Rewriting and flattening of f-structure facts by rewrite rules (Crouch and King, 2003)

◮ SUBJ(%1,%2) ==> subj(%1,%2).

Each clause is embedded in a context in which predications are true or false
Allows for the incorporation of lexical resources such as WordNet and VerbNet via a database interface
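The rewrite rule above can be mimicked with a few lines of Python. This is only a toy model under stated assumptions, not the xfr engine: facts are plain tuples, the names `RULES` and `rewrite` are hypothetical, and the `OBJ` and `ADJUNCT` rules are assumed by analogy with the `SUBJ` rule shown on the slide.

```python
# Toy model of ordered rewriting over flat f-structure facts.
# Facts are (functor, arg1, arg2) tuples; this is NOT the xfr engine.

# Each rule maps an input functor to an output functor, mirroring
# SUBJ(%1,%2) ==> subj(%1,%2). The OBJ and ADJUNCT rules are assumptions.
RULES = [
    ("SUBJ", "subj"),
    ("OBJ", "obj"),
    ("ADJUNCT", "mod"),
]


def rewrite(facts, rules=RULES):
    """Apply the rules in order; each rule rewrites every fact it matches."""
    for source, target in rules:
        facts = [
            (target, a, b) if functor == source else (functor, a, b)
            for functor, a, b in facts
        ]
    return facts


facts = [
    ("SUBJ", "kHA:25", "vuh:1"),
    ("OBJ", "kHA:25", "sEb:21"),
    ("ADJUNCT", "kHA:25", "t3ul AbEb:7"),
]
rewritten = rewrite(facts)
```

Because the rules are applied one after another, the output of an earlier rule is visible to later ones, which is the sense of "ordered" assumed here.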

SLIDES 16-19

The xfr rewrite system

What information would we like to get from a semantic representation?

us  nE   t3ul AbEb  mEN  sEb    kHAyA
he  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

1. What predications hold in the context of the sentence?
2. What are the thematic roles of the grammatical functions?
3. What is the lexical information contained in the sentence?

SLIDE 20

The xfr semantics

1. What predications hold in the context of the sentence?

us  nE   t3ul AbEb  mEN  sEb    kHAyA
he  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

context_head(t,kHA:25),
in_context(t,role(subj,kHA:25,vuh:1)),
in_context(t,role(obj,kHA:25,sEb:21)),
in_context(t,role(mod,kHA:25,'t3ul AbEb':7)).

SLIDE 21

The xfr semantics

2. What are the thematic roles of this sentence?

Development of a lexical resource for Urdu verbs in the style of VerbNet

◮ Assignment of thematic roles to the grammatical functions
◮ kHA ‘to eat’: subj → Agent, obj → Patient
◮ VerbNet information is stored in a database which can be accessed by the xfr system
◮ The xfr rules replace the grammatical functions with the thematic roles from the database

Locational information is available from the f-structure representation and is put directly into the semantic representation
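The role assignment described above can be sketched as a lookup. This is a minimal illustration, not the actual database interface: `VERB_FRAMES` and `assign_roles` are hypothetical names, a Python dictionary stands in for the verb database, and only the kHA entry is taken from the slide.

```python
# Hypothetical stand-in for the verb database: verb -> {function: role}.
# Only the kHA 'to eat' entry comes from the slides.
VERB_FRAMES = {
    "kHA": {"subj": "Agent", "obj": "Patient"},
}


def assign_roles(facts, locations, frames=VERB_FRAMES):
    """Replace grammatical functions with thematic roles where known.

    `locations` holds the dependents that the f-structure already marks
    as locations; they receive the role 'Location' directly.
    """
    labelled = []
    for function, head, dependent in facts:
        verb = head.split(":")[0]  # 'kHA:25' -> 'kHA'
        role = frames.get(verb, {}).get(function)
        if role is None and dependent in locations:
            role = "Location"
        # Fall back to the grammatical function if no role is found.
        labelled.append((role or function, head, dependent))
    return labelled


facts = [
    ("subj", "kHA:25", "vuh:1"),
    ("obj", "kHA:25", "sEb:21"),
    ("mod", "kHA:25", "t3ul AbEb:7"),
]
roles = assign_roles(facts, locations={"t3ul AbEb:7"})
```

Keeping the frames in a separate table means new verbs can be added without touching the rewriting logic, which is the design the slide attributes to the database interface.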

SLIDE 22

The xfr semantics

us  nE   t3ul AbEb  mEN  sEb    kHAyA
he  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

context_head(t,kHA:25),
in_context(t,role('Agent',kHA:25,vuh:1)),
in_context(t,role('Patient',kHA:25,sEb:21)),
in_context(t,role('Location',kHA:25,'t3ul AbEb':7)).

SLIDE 23

The xfr semantics

3. What is the lexical information contained in the sentence?

us      nE   t3ul AbEb  mEN  sEb    kHAyA
he/she  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

kHA ‘to eat’: ingestive verb where the agent consumes an edible object
us ‘he/she’: living thing that performs the eating event
sEb ‘apple’: fruit that is the object of consumption
t3ul AbEb ‘Tel Aviv’: location
mEN ‘in’: indicates that the event takes place in Tel Aviv

SLIDE 24

The xfr semantics

The lexical information in our system comes from Urdu WordNet, which is built on the basis of Hindi WordNet (Ahmed and Hautli 2010)

transliterate Urdu input to Hindi
↓
look up and extract all information from Hindi WordNet
↓
remove the gloss (synset description and example sentence)
↓
store the lexical information in an xfr-accessible database
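The four steps above can be sketched end to end. Everything concrete here is an assumption made for illustration: `TRANSLIT` and `HINDI_WORDNET` are tiny invented stand-ins for a real transliterator and for Hindi WordNet, the gloss text is a placeholder, and SQLite merely plays the part of the xfr-accessible database.

```python
import sqlite3

# Toy stand-ins (invented for this sketch, not real resource contents).
TRANSLIT = {"sEb": "सेब"}  # Urdu transliteration -> Hindi key
HINDI_WORDNET = {
    "सेब": {
        "pos": "noun",
        "hypernym": "fruit",
        "gloss": "placeholder synset description and example sentence",
    },
}


def build_entry(urdu_word):
    """Steps 1-3: transliterate, look up, and drop the gloss."""
    hindi_key = TRANSLIT[urdu_word]          # step 1: transliterate
    entry = dict(HINDI_WORDNET[hindi_key])   # step 2: look up (copy)
    entry.pop("gloss", None)                 # step 3: remove the gloss
    return entry


def store(conn, urdu_word, entry):
    """Step 4: store attribute/value pairs in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lexicon (word TEXT, attr TEXT, val TEXT)"
    )
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?, ?)",
        [(urdu_word, attr, val) for attr, val in entry.items()],
    )


conn = sqlite3.connect(":memory:")
entry = build_entry("sEb")
store(conn, "sEb", entry)
rows = conn.execute(
    "SELECT attr, val FROM lexicon WHERE word = ? ORDER BY attr", ("sEb",)
).fetchall()
```

Dropping the gloss before storage keeps only the relational information, which is the part a rewrite rule can actually query.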

SLIDE 25

The xfr semantics

Inclusion of all resources:

Location: t3ul AbEb ‘Tel Aviv’
↑
Ingestion: kHA ‘to eat’
↙ Agent: vuh ‘he/she’ (Animate Thing)
↘ Patient: sEb ‘apple’ (Fruit/Tree)

SLIDES 26-28

The xfr semantics

Treatment of spatial expressions (Ahmed, 2010):

us  nE   t3ul AbEb  mEN  sEb    kHAyA
he  Erg  Tel Aviv   in   apple  eat.Perf.M.Sg
‘He/She ate an apple in Tel Aviv.’

context_head(t,kHA:25),
in_context(t,role('Agent',kHA:25,vuh:1)),
in_context(t,role('Patient',kHA:25,sEb:21)),
in_context(t,role('Location',kHA:25,location:100)),
in_context(t,role(figure,location:100,vuh:1)),
in_context(t,role(ground,location:100,'t3ul AbEb':7)),
in_context(t,role(configuration,'t3ul AbEb':7,in)).
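The decomposition of the location into figure, ground and configuration can be written as a small helper. This is a sketch under assumptions: `spatial_facts` is a hypothetical name, facts are modelled as tuples, and the identifier `location:100` is simply copied from the example above.

```python
def spatial_facts(verb, figure, ground, configuration, loc_id="location:100"):
    """Expand a single Location role into a figure/ground/configuration triple.

    The verb points to an abstract location node; that node relates the
    located entity (figure) to the reference object (ground), and the
    ground carries the spatial configuration contributed by the adposition.
    """
    return [
        ("Location", verb, loc_id),
        ("figure", loc_id, figure),
        ("ground", loc_id, ground),
        ("configuration", ground, configuration),
    ]


location_facts = spatial_facts(
    verb="kHA:25",
    figure="vuh:1",
    ground="t3ul AbEb:7",
    configuration="in",
)
```

Introducing the intermediate location node is what lets the place name and the adposition be represented separately instead of as one opaque modifier.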

SLIDES 29-30

The xfr semantics

Treatment of modality: modality is mostly expressed constructionally (Bhatt et al. 2011)

vuh  t3ul AbEb  mEN  sEb    kHA  pA-yA
he   Tel Aviv   in   apple  eat  find-Perf
‘He/She was able to eat an apple in Tel Aviv.’

c-structure (CS 1), nodes in depth-first order:

ROOT Sadj S KP NP PRON vuh KP NP N t3ul AbEb K mEN VCmain V kHA Vmod pAyA

f-structure for "vuh t3ul AbEb mEN sEb kHA pAyA" (outermost index 74):

PRED         'pA<[56:kHA]>[1:vuh]'
SUBJ [1]     PRED 'vuh', NTYPE [NSYN pronoun], CASE nom, GEND masc, NUM sg, PERS 3
XCOMP [56]   PRED 'kHA<[1:vuh], [24:sEb]>'
             SUBJ [1:vuh]
             OBJ [24] PRED 'sEb', NTYPE [NSEM [COMMON count], NSYN common], CASE nom, GEND masc, NUM sg, PERS 3
             LEX-SEM [AGENTIVE +], PASSIVE -
ADJUNCT [5]  PRED 't3ul AbEb', NTYPE [NSEM [PROPER [PROPER-TYPE location]], NSYN proper], SEM-PROP [SPECIFIC +], CASE loc, NUM sg, PERS 3
LEX-SEM      [AGENTIVE -]
TNS-ASP      [ASPECT perf, MOOD indicative]
CLAUSE-TYPE decl, VTYPE main

SLIDE 31

The xfr semantics

Treatment of modality: modality in Urdu is mostly expressed constructionally

vuh  t3ul AbEb  mEN  sEb    kHA  pA-yA
he   Tel Aviv   in   apple  eat  find-Perf
‘He/She was able to eat an apple in Tel Aviv.’

context_head(t,pA:6),
context_head(ctx(kHA:25),kHA:25),
in_context(t,ABIL(pA,ctx(kHA:25))),
in_context(t,role(Holder Of Obligation,pA:6,vuh:1)),
in_context(ctx,role('Agent',kHA:25,vuh:1)),
in_context(ctx,role('Patient',kHA:25,sEb:21)),
in_context(ctx,role('Location',kHA:25,'t3ul AbEb':7)).
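The effect of the modal on the representation is that the facts of the main verb move from the top-level context `t` into an embedded context. A rough Python sketch of that step follows; `embed_under_modal` is a hypothetical helper, facts are modelled as (context, fact) pairs, and the role label is copied verbatim from the example above.

```python
def embed_under_modal(modal, operator, holder, verb, verb_facts):
    """Place the verb's facts in an embedded context headed by the verb.

    Only the modal operator and its holder are asserted in the top-level
    context 't'; the eating event itself holds in ctx(verb).
    """
    ctx = f"ctx({verb})"
    facts = [
        ("t", ("context_head", modal)),
        (ctx, ("context_head", verb)),
        ("t", (operator, modal, ctx)),
        ("t", ("role", "Holder Of Obligation", modal, holder)),
    ]
    facts.extend((ctx, fact) for fact in verb_facts)
    return facts


verb_facts = [
    ("role", "Agent", "kHA:25", "vuh:1"),
    ("role", "Patient", "kHA:25", "sEb:21"),
    ("role", "Location", "kHA:25", "t3ul AbEb:7"),
]
modal_facts = embed_under_modal("pA:6", "ABIL", "vuh:1", "kHA:25", verb_facts)
top_level = [fact for context, fact in modal_facts if context == "t"]
embedded = [fact for context, fact in modal_facts if context == "ctx(kHA:25)"]
```

Separating the two contexts is what prevents the representation from claiming that the eating actually took place when only the ability is asserted.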

SLIDE 32

Challenges

An extreme case: ‘eat’ expressions in Hindi/Urdu (Hook and Pardeshi, 2009)

Employment of ‘eat’ in idiomatic expressions
About 160 ‘eat’ expressions for Hindi/Urdu
Variety of uses due to loan translations from Persian

SLIDE 33

Challenges

h2asan=nE  kEk=kO  kHAyA
h2asan     cake    eat.Perf.Sg.Masc
‘Hasan ate the cake.’
eat = Agent, Patient

inqilAbI       fikar    zang  kHA  jAEgI
revolutionary  thought  rust  eat  go.Fut.Fem.Sg
‘Revolutionary thinking will gather rust.’
eat (gather rust) = Patient, Theme

is    sAl=kI  mandI     sheyar-bAzAr  kHA  gAyI
this  year    slowdown  stockmarket   eat  go.Perf.Fem.Sg
‘This year’s slowdown wrecked (lit. devoured) the stock market.’
eat (wreck) = Agent, Theme
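One way to picture the problem is that a single verb needs several role frames, selected by sense. The sketch below is purely illustrative: `EAT_FRAMES` and `label_arguments` are hypothetical names, the sense labels are taken from the examples above, and pairing the roles with the arguments in subject-object order is an assumption.

```python
# Role frames for kHA 'eat', keyed by sense (frames as listed above).
EAT_FRAMES = {
    "eat": ("Agent", "Patient"),
    "gather rust": ("Patient", "Theme"),
    "wreck": ("Agent", "Theme"),
}


def label_arguments(sense, arguments, frames=EAT_FRAMES):
    """Pair the arguments of an 'eat' expression with the roles of its sense."""
    roles = frames[sense]
    if len(roles) != len(arguments):
        raise ValueError(f"sense {sense!r} expects {len(roles)} arguments")
    return dict(zip(roles, arguments))


literal = label_arguments("eat", ["h2asan", "kEk"])
idiom = label_arguments("gather rust", ["inqilAbI fikar", "zang"])
```

The lookup itself is trivial; the hard part, which this sketch does not address, is deciding which sense a given sentence instantiates.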

SLIDE 34

Tasks ahead

Further develop the lexical resources for Urdu
Define concrete application areas for the semantic representation and adjust it accordingly
Develop evaluation standards for semantic representations and run large-scale experiments

◮ Hindi TreeBank could provide some semantic information

Work on the theoretical semantic analysis of the language

SLIDE 35

Summary

The xfr rewrite system is an adequate way of combining various resources in one tool

◮ Lexical information from WordNet
◮ Verb frames from a verb resource

Based on a detailed syntactic analysis, the semantic representation can go deeper, e.g.

◮ spatial expressions
◮ modality constructions

SLIDE 36

Thank you!