The Metagrammar Goes Multilingual: A Crosslinguistic Look at the V2 Phenomenon



slide-1
SLIDE 1

The Metagrammar Goes Multilingual

A Crosslinguistic Look at the V2 Phenomenon
Owen Rambow, Tatjana Scheffler
Thanks to SinWon Yoon, Aravind K. Joshi, Alexandra Kinyon
22 June 2007

Rambow & Scheffler ( ) Multilingual Metagrammar: V2 22 juin 2007 1 / 41

slide-2
SLIDE 2

Outline

◮ Introduction: Multilingual Metagrammars
◮ Verb-second in a Multilingual Metagrammar
  ◮ The V2 Phenomenon: German and Yiddish
  ◮ Crosslinguistic Analysis of V2
  ◮ Our Implementation
  ◮ Sample Derivations
◮ Implementation & Evaluation

slide-3
SLIDE 3

Introduction: Multilingual Metagrammars


slide-4
SLIDE 4

Introduction: Multilingual Metagrammars

Multilingual Metagrammars

◮ Traditional focus: Grammar development
◮ Our focus: Linguistic generalizations
◮ Our approach: Find cross-linguistic and framework-neutral syntactic invariants

slide-5
SLIDE 5

Introduction: Multilingual Metagrammars

Goal of Project

◮ Our goal is theoretical: we want to identify cross-linguistic principles and language-specific parameters (as does theoretical syntax)
◮ Metagrammars (MGs) factor common properties of TAG elementary trees
◮ Theoretical syntax makes empirical claims; they are not usually systematically validated
◮ XMG allows us to validate the empirical claims we make!
◮ Side benefit: actually generating usable grammars for new languages (theory and applications are intertwined in NLP!)

slide-6
SLIDE 6

Introduction: Multilingual Metagrammars

Methodology of the Project

1. Identify the issue to be handled: head position (V2) and scrambling
2. Choose the theoretical solution to be implemented and verified in XMG: head features and underspecification
3. Implement:
   ◮ Take the existing metagrammar for Korean
   ◮ Hypothesize a division into Universal Grammar (UG) and a Korean component
   ◮ Add German, modifying the UG where necessary
   ◮ Adapt the German grammar into Yiddish: very few, if any, changes to UG necessary
4. Generate the grammar
5. Validate against (naturally occurring) data

slide-7
SLIDE 7

Introduction: Multilingual Metagrammars

Major New Issue When Going Multilingual: Heads

◮ One language: relative position of verb and arguments determines word order
◮ Two languages: want language-independent generalizations about syntax; prototypical example: adverbs in English and French (Pollock):
    E: Charles (often) eats (*often) beans
    F: Charles (*souvent) mange (souvent) des haricots
◮ Solution: claim verbal heads are in different positions on the projection in E and F, but the adverb is always adjoined to VP
◮ In some languages (like German and Yiddish), it is clear that verbs can be in different positions on the projection, anyway
◮ For some languages (Korean), there is very little evidence for this notion, even cross-linguistically

slide-8
SLIDE 8

Introduction: Multilingual Metagrammars

Sample cross-linguistic and cross-framework syntactic invariants

◮ Finite number of syntactic categories (NP, PP, etc.)
◮ Notion of subcategorization (Candito’s dimension 1)
◮ Finite number of syntactic functions (subject, object, etc.)
◮ Existence of valency alternations (Candito’s dimension 2)
◮ Argument realization, word order effects (such as V2 or wh-movement) (extension of Candito’s dimension 3)

slide-9
SLIDE 9

Introduction: Multilingual Metagrammars

UG components

◮ A TAG elementary tree is defined by a projection, a subcat frame, and a set of heads and their positions
◮ The set of heads is a function of valency alternations and of argument realization choices
◮ Much is underspecified in UG: e.g., category of head and arguments, order of head and sisters, etc.

slide-10
SLIDE 10

Introduction: Multilingual Metagrammars

UG components (2)

◮ Universal diathesis alternations: passive, causative
◮ Spec heads, non-spec heads
◮ Specifier arguments, non-specifier arguments

slide-11
SLIDE 11

Verb-second in a Multilingual Metagrammar


slide-12
SLIDE 12

Verb-second in a Multilingual Metagrammar The V2 Phenomenon: German and Yiddish

The Verb-Second Phenomenon (V2)

(1) a.   [Auf dem Weg]  sieht [der Junge] [eine Ente].
          on  the path  sees   the boy     a    duck
         ‘On the path, the boy sees a duck.’
    b. * [Auf dem Weg]  [der Junge] sieht [eine Ente].
          on  the path   the boy    sees   a    duck
         Int.: ‘On the path, the boy sees a duck.’

◮ Finite verb is required to be located in “second position”
◮ V2 languages include German, Dutch, Yiddish, Frisian, Icelandic, Mainland Scandinavian, and Kashmiri
◮ Small-scale linguistic variation: behavior in embedded clauses differs

slide-13
SLIDE 13

Verb-second in a Multilingual Metagrammar The V2 Phenomenon: German and Yiddish

V2 in German

(2) a. Der Junge sieht eine Ente auf dem Weg.
       the boy   sees  a    duck on  the path
       ‘On the path, the boy sees a duck.’
    b. ..., dass der Junge auf dem Weg  eine Ente sieht.
       ..., that the boy   on  the path a    duck sees
       ‘..., that the boy sees a duck on the path.’

◮ Main clauses exhibit V2 in German
◮ Embedded clauses with complementizers are verb-final

          Main Clauses   Embedded Clauses
German    V2             V-Final

slide-14
SLIDE 14

Verb-second in a Multilingual Metagrammar The V2 Phenomenon: German and Yiddish

V2 in Yiddish

(3) a. Oyfn   veg  zet  dos yingl a katshke.
       on-the path sees the boy   a duck
       ‘On the path, the boy sees a duck.’
    b. ..., az   dos yingl zet  a katshke oyfn   veg
       ..., that the boy   sees a duck    on-the path
       ‘..., that the boy sees a duck on the path.’

◮ As a verb-second language, Yiddish exhibits V2 in main clauses
◮ Yiddish embedded clauses must also be V2

          Main Clauses   Embedded Clauses
German    V2             V-Final
Yiddish   V2             V2

slide-15
SLIDE 15

Verb-second in a Multilingual Metagrammar Crosslinguistic Analysis of V2

Methodology

Idea: Basic V2 phenomenon is the same in all V2 languages
Our Approach: Crosslinguistic generalizations are captured in one Metagrammar using different heads (see Rambow and Santorini, 1995)

slide-16
SLIDE 16

Verb-second in a Multilingual Metagrammar Our Implementation

Dealing With Word Order Variation in a Metagrammar

Verbal trees are determined by:

1. A subcategorization frame
2. Valency alternations (e.g., voice)
3. Argument realizations
4. A topology, which encodes the position and characteristics of the verbal head

slide-17
SLIDE 17

Verb-second in a Multilingual Metagrammar Our Implementation

Topology

A topology is a combination of the projection and any compatible head(s).

Projection
◮ Empty verbal head plus its maximal projection
◮ Different types of clauses defined by features:
  ◮ non-finite clauses: [I:−]
  ◮ root V2 clauses: [Top:+]
  ◮ finite clauses: [M:+, I:+]

Heads
◮ Introduce categorial features
◮ The list of possible heads differs from language to language (similar to, but different from, Gerdes 2002)
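The clause types on this slide can be read as feature patterns that a node's feature structure either satisfies or not. A minimal sketch of that reading follows; the dictionary encoding, the ASCII "+"/"-" values and the function names are assumptions made for illustration and are not taken from XMG or from the authors' metagrammar.

```python
# Clause-type patterns copied from the slide; the dict encoding is an assumption.
CLAUSE_PATTERNS = {
    "non-finite": {"I": "-"},
    "root V2": {"Top": "+"},
    "finite": {"M": "+", "I": "+"},
}


def matches(features, pattern):
    """True if every feature value required by the pattern is present in features."""
    return all(features.get(name) == value for name, value in pattern.items())


def clause_types(features):
    """Return the labels of all patterns that the feature structure satisfies."""
    return [label for label, pattern in CLAUSE_PATTERNS.items()
            if matches(features, pattern)]


# Root of a V2 head and the lowest VP of the projection, as feature dicts.
v2_root = {"C": "-", "M": "+", "Top": "+", "I": "+"}
lowest_vp = {"C": "-", "M": "-", "Top": "-", "I": "-"}

print(clause_types(v2_root))    # ['root V2', 'finite']
print(clause_types(lowest_vp))  # ['non-finite']
```

A structure can satisfy more than one pattern: the V2 root is both a root V2 clause and a finite clause.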

slide-18
SLIDE 18

Verb-second in a Multilingual Metagrammar Our Implementation

A Finite Projection

(indentation marks dominance)

VP [ M:+, I:+ ]
  VP [ C:−, M:−, Top:−, I:− ]
    ε

slide-19
SLIDE 19

Verb-second in a Multilingual Metagrammar Our Implementation

The Heads Define the Topology of Clauses

Properties of the verbal heads (feature inventory) determine the positions of arguments and adjuncts:

◮ I (finite tense and subject-verb agreement): creates a specifier position for agreement, but allows recursion (i.e., adjunction at IP)
◮ Top (topic): a feature which creates a specifier position for the topic and which does not allow recursion
◮ M (mood): a feature with semantic content (to be defined), but no specifier
◮ C (complementizer): a lexical feature introduced only by complementizers

slide-20
SLIDE 20

Verb-second in a Multilingual Metagrammar Our Implementation

Some Simplified German Heads

(indentation marks daughters, listed left to right)

Head 1 (finite V-final):
VP [ C:−, M:−, Top:−, I:+ ]
  VP [ C:−, M:−, Top:−, I:− ]
  v

Head 2 (V2-Subject):
VP [ C:−, M:+, Top:+, I:+ ]
  v
  VP [ C:−, M:−, Top:−, I:− ]

Head 4 (Complementizer):
VP [ C:+, M:+, Top:−, I:+ ]
  comp
  VP [ C:−, M:−, Top:−, I:+ ]

slide-21
SLIDE 21

Verb-second in a Multilingual Metagrammar Our Implementation

German vs. Yiddish Heads

German:
#  What                            Features Introduced  Directionality
1  Verb (clause-final)             +I                   head-final
2  Verb (V2, subject-initial)      +M, +Top, +I         head-initial
3  Verb (V2, non-subject-initial)  +M, +Top             head-initial
4  Complementizer                  +C, +M               head-initial

Yiddish:
#  What                            Features Introduced  Directionality
1  Verb                            +I                   head-initial
2  Verb (V2, subject-initial)      +M, +Top, +I         head-initial
3  Verb (V2, non-subject-initial)  +M, +Top             head-initial
4  Complementizer                  +C                   head-initial
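To make the comparison concrete, the two head inventories can be written down as data and compared mechanically. The sketch below does only that; the `Head` record and the `differing_heads` helper are hypothetical names introduced for illustration, not part of the authors' implementation.

```python
from typing import NamedTuple


class Head(NamedTuple):
    label: str
    features: frozenset  # features the head introduces
    direction: str       # "head-initial" or "head-final"


# Inventories transcribed from the two tables on this slide.
GERMAN = {
    1: Head("Verb (clause-final)", frozenset({"+I"}), "head-final"),
    2: Head("Verb (V2, subject-initial)", frozenset({"+M", "+Top", "+I"}), "head-initial"),
    3: Head("Verb (V2, non-subject-initial)", frozenset({"+M", "+Top"}), "head-initial"),
    4: Head("Complementizer", frozenset({"+C", "+M"}), "head-initial"),
}
YIDDISH = {
    1: Head("Verb", frozenset({"+I"}), "head-initial"),
    2: Head("Verb (V2, subject-initial)", frozenset({"+M", "+Top", "+I"}), "head-initial"),
    3: Head("Verb (V2, non-subject-initial)", frozenset({"+M", "+Top"}), "head-initial"),
    4: Head("Complementizer", frozenset({"+C"}), "head-initial"),
}


def differing_heads(a, b):
    """Head numbers whose introduced features or directionality differ."""
    return [n for n in sorted(a)
            if (a[n].features, a[n].direction) != (b[n].features, b[n].direction)]


print(differing_heads(GERMAN, YIDDISH))  # [1, 4]
```

On this encoding the two languages differ only in head 1 (directionality) and head 4 (the German complementizer also introduces +M); the two V2 heads are identical.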

slide-22
SLIDE 22

Verb-second in a Multilingual Metagrammar Sample Derivations

Derivation of a German sentence

Head 4 (Comp) + Object-Non-Topicalized + Subject-Non-Topicalized + Projection + Head 1 (V-final)
that + a duck + a boy + ε + saw

Head 4 (Comp), "that":
VP [ C:+, M:+, I:+ ]
  comp
  VP [ C:−, M:−, I:+ ]

Object-Non-Topicalized, "a duck":
VP
  NPObj
  VP

Subject-Non-Topicalized, "a boy":
VP [ I:+ ]
  NPSubj
  VP [ I:+ ]

Projection (ε):
VP [ M:+, I:+ ]
  VP [ C:−, M:−, Top:−, I:− ]

Head 1 (V-final), "saw":
VP [ C:−, M:−, Top:−, I:+ ]
  VP [ C:−, M:−, Top:−, I:− ]
  v

slide-23
SLIDE 23

Verb-second in a Multilingual Metagrammar Sample Derivations

Derived German Tree

VP [ C:+, M:+, Top:−, I:+ ]
  comp⋄
  VP [ C:−, M:−, Top:−, I:+ ]
    NPObj
    VP [ C:−, M:−, Top:−, I:+ ]
      NPSubj
      VP [ C:−, M:−, Top:−, I:+ ]
        VP [ C:−, M:−, Top:−, I:− ]
          ε
        v⋄

slide-24
SLIDE 24

Verb-second in a Multilingual Metagrammar Sample Derivations

Derivation of a Yiddish Sentence

Head 4 (Comp) + Subject-Topicalized + Head 2 + Object-Non-Topicalized + Projection
that + a boy + saw + a duck + ε

Head 4 (Comp), "that":
VP [ C:+, M:+, I:+ ]
  comp
  VP [ C:−, M:+, I:+ ]

Subject-Topicalized, "a boy":
VP [ I:+ ]
  NPSubj
  VP [ I:+ ]

Head 2, "saw":
VP [ C:−, M:+, Top:+, I:+ ]
  v
  VP [ C:−, M:−, Top:−, I:− ]

Object-Non-Topicalized, "a duck":
VP
  VP
  NPObj

Projection (ε):
VP [ M:+, I:+ ]
  VP [ C:−, M:−, Top:−, I:− ]

slide-25
SLIDE 25

Verb-second in a Multilingual Metagrammar Sample Derivations

Derived Yiddish Tree

VP [ C:+, M:+, I:+ ]
  comp⋄
  VP [ C:−, M:+, I:+ ]
    NPSubj
    VP [ C:−, M:+, Top:+, I:+ ]
      v⋄
      VP [ C:−, M:−, Top:−, I:− ]
        VP [ C:−, M:−, Top:−, I:− ]
          ε
        NPObj
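Reading the leaves of the derived German and Yiddish trees from left to right gives the surface word order of the embedded clause. The sketch below encodes both derived trees as nested tuples with the feature structures omitted; the encoding, the `WORDS` lexicon (taken from the English glosses in the derivation slides) and the function names are assumptions made for illustration.

```python
# Trees as (label, children) tuples; leaves are plain strings.
# Feature structures are omitted, only the tree shape of the derived trees is kept.
GERMAN_TREE = ("VP", ["comp", ("VP", ["NPObj", ("VP", ["NPSubj",
               ("VP", [("VP", ["ε"]), "v"])])])])
YIDDISH_TREE = ("VP", ["comp", ("VP", ["NPSubj", ("VP", ["v",
                ("VP", [("VP", ["ε"]), "NPObj"])])])])

# English glosses used on the derivation slides.
WORDS = {"comp": "that", "NPSubj": "a boy", "NPObj": "a duck", "v": "saw"}


def leaves(tree):
    """Collect the leaf labels of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def word_order(tree):
    """Spell out the leaves, skipping the empty head ε."""
    return " ".join(WORDS[leaf] for leaf in leaves(tree) if leaf != "ε")


print(word_order(GERMAN_TREE))   # that a duck a boy saw
print(word_order(YIDDISH_TREE))  # that a boy saw a duck
```

The German tree yields a verb-final embedded clause, while the Yiddish tree keeps the verb in second position after the complementizer.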

slide-26
SLIDE 26

Implementation & Evaluation


slide-27
SLIDE 27

Implementation & Evaluation

German Grammar – Coverage

◮ A head for the verb-final finite verb, which introduces the feature [+I] only.
◮ Two heads for the verb in V2 position: [+M+Top], [+M+Top+I].
◮ The same two heads, as empty heads, for embedded questions and relative clauses.
◮ Scrambling (through underspecification), passive, particle verbs, 8 subcategorization frames including clausal subcategorization, embedded wh-questions and relative clauses
◮ We do not yet handle VP topicalization.
◮ The metagrammar generates a grammar with 1,357 verb trees.
◮ Development of the German grammar from the Korean grammar: ∼ 4 person-weeks.

slide-28
SLIDE 28

Implementation & Evaluation

German Testsuite and Evaluation

◮ Part 2 of the TIGER corpus, sentences 1–7999 for development (kept 8000–9999 for evaluation)
◮ Extracted finite clauses without discontinuous constituents
◮ 5,254 clauses (tokens), of 200 types
◮ Coverage: 93.3% of tokens, 67.0% of clause types
◮ Error Analysis:
  1. Unexpected constructions which are not normally grammatical (such as verb-final clauses without complementizer or wh-word, or clauses with an accusative object but no subject)
  2. Problems with wh-questions.
◮ Correctness: inspected 15% of the grammar (177 of 1,177 trees): 6 doubtful trees, none of which is clearly wrong
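The gap between token coverage and type coverage arises because a few frequent clause types account for most clause tokens. The sketch below shows how the two figures could be computed; the toy counts and type names are hypothetical and are not TIGER data, and the `coverage` function is an illustration rather than the authors' evaluation script.

```python
from collections import Counter


def coverage(clause_counts, covered_types):
    """Return (token coverage, type coverage) as fractions between 0 and 1."""
    total_tokens = sum(clause_counts.values())
    covered_tokens = sum(count for clause_type, count in clause_counts.items()
                         if clause_type in covered_types)
    covered_type_count = sum(1 for clause_type in clause_counts
                             if clause_type in covered_types)
    return covered_tokens / total_tokens, covered_type_count / len(clause_counts)


# Hypothetical toy test suite: clause type -> number of clause tokens.
toy_suite = Counter({
    "NPsubj V NPobj": 70,
    "PP V NPsubj NPobj": 20,
    "rare-type-1": 6,
    "rare-type-2": 4,
})
# Hypothetical set of clause types the generated grammar can analyse.
covered = {"NPsubj V NPobj", "PP V NPsubj NPobj"}

token_cov, type_cov = coverage(toy_suite, covered)
print(f"{token_cov:.1%} of tokens, {type_cov:.1%} of types")
# 90.0% of tokens, 50.0% of types
```

In the toy suite, covering two of four types already covers 90 of 100 tokens, the same pattern as the reported 93.3% of tokens against 67.0% of types.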

slide-29
SLIDE 29

Implementation & Evaluation

Yiddish Grammar – Coverage

◮ SVO with widespread scrambling: S Aux V O, S Aux O V
◮ Objects can be right- or left-adjoining
◮ Heads: [+M+Top+I] for subject-initial clauses, [+M+Top] for non-subject-initial clauses, empty [+I] head licenses the subject in the Mittelfeld; [+C] complementizer head
◮ V1 head [+M−Top] to account for the verb-initial matrix clauses.
◮ Nominal/Sentential objects, ditransitives
◮ Grammar Development: ∼ 2 person-weeks

slide-30
SLIDE 30

Implementation & Evaluation

Yiddish Testsuite and Evaluation

◮ Testsuite:
  ◮ Royte Pomerantsen, 249 stories in 6730 sentences
  ◮ Extracted all matrix IPs and CPs, all CPs embedded under declaratives
  ◮ Simplification: 3751 clause tokens, 376 types
◮ Coverage: 78.9% of tokens, 33.8% of types
◮ Error Analysis (based on 10% of 249 missed types):

Error                                        #
corpus errors                                5
embedded infinitives                         8
missing subcat frames                        7
misanalysis of topicalized S-complements     4
Total                                       24
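The figures on this slide can be cross-checked with a little arithmetic: 376 types at 33.8% type coverage leave about 249 missed types, and the error table accounts for 24 of them, roughly the stated 10% sample. The variable names below are chosen for illustration only; the numbers are those on the slide.

```python
types_total = 376
type_coverage = 0.338

# Types not covered by the grammar, rounded to a whole number of types.
missed_types = round(types_total * (1 - type_coverage))

# Error table from the slide.
errors = {
    "corpus errors": 5,
    "embedded infinitives": 8,
    "missing subcat frames": 7,
    "misanalysis of topicalized S-complements": 4,
}
analysed = sum(errors.values())

print(missed_types)                               # 249
print(analysed, f"{analysed / missed_types:.1%}")  # 24 9.6%
```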

slide-31
SLIDE 31

Implementation & Evaluation

Conclusion and Outlook

◮ Metagrammar captures common elements in and among grammars
◮ Ideal for representing cross-linguistic generalizations
◮ Korean, German and Yiddish look a lot alike in a metagrammar
◮ Explicit modelling of head positions through features
◮ Multilingual metagrammar for variation among V2 languages
◮ Very fast development of grammars for new languages is possible

slide-32
SLIDE 32

Implementation & Evaluation

Some Desiderata for XMG

◮ More elegant multiple inheritance
◮ Say: take any combination of heads that works (solved by Denys?)
◮ Feature hierarchy/other feature interdependencies (example: if [M:+] then [I:+]) (principle?)
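The feature interdependency in the last item, if [M:+] then [I:+], can be pictured as an implication checked over feature structures. The sketch below is an assumption about what such a check might look like in plain Python; it is not XMG syntax and not an existing XMG principle. The three head roots are those from the simplified German heads, and the last structure is a hypothetical counterexample.

```python
def violates(features, if_feature, then_feature):
    """True if features has [if_feature:+] without [then_feature:+].

    A missing then_feature counts as a violation here; a real principle
    might instead fill in the missing value.
    """
    return features.get(if_feature) == "+" and features.get(then_feature) != "+"


# Root feature structures of the simplified German heads, plus one
# hypothetical structure that breaks the constraint.
structures = {
    "head 1 (V-final)": {"C": "-", "M": "-", "Top": "-", "I": "+"},
    "head 2 (V2-subject)": {"C": "-", "M": "+", "Top": "+", "I": "+"},
    "head 4 (complementizer)": {"C": "+", "M": "+", "Top": "-", "I": "+"},
    "hypothetical bad node": {"C": "-", "M": "+", "Top": "-", "I": "-"},
}

offenders = [name for name, features in structures.items()
             if violates(features, "M", "I")]
print(offenders)  # ['hypothetical bad node']
```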

slide-33
SLIDE 33

The Metagrammar Goes Multilingual: A Cross-Linguistic Look at the V2-Phenomenon

Thank You!


slide-34
SLIDE 34

The Metagrammar Goes Multilingual: Free Word Order in SOV language

Scrambling

As SOV languages, German and Korean both have scrambling.
Scrambling is the permutation of elements: all 4! = 24 orders of the four bracketed constituents are acceptable in Korean and in German.

(4) [hyeongi gongjangi]  [samchonege]  [gagureul]    [samiljeone]    baedakhaessda.
    a local company-nom  the uncle-dat furniture-acc three days ago  delivered has
    ‘A local company has delivered the furniture to the uncle three days ago’

(5) ... (dass) [eine hiesige Firma] [dem Onkel] [die Möbel] [vor drei Tagen] zugestellt hat.

slide-35
SLIDE 35

The Metagrammar Goes Multilingual: Free Word Order in SOV language

Scrambling in the Metagrammar

◮ Free order through underspecification

Metagrammar (tree fragments whose relative order is left unspecified):

[VP NPSubj VP]
[VP NPObj VP]
[VP [V sees]]

(Compiled) Grammar:

[VP NPSubj [VP NPObj [VP [V sees]]]]
[VP NPObj [VP NPSubj [VP [V sees]]]]
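The step from underspecified fragments to the compiled grammar can be imitated by enumerating every order in which the argument fragments are stacked above the verb. This is a brute-force stand-in for the metagrammar compiler, which solves tree descriptions rather than listing permutations; the function name and the bracket notation are assumptions made for illustration.

```python
from itertools import permutations


def compile_trees(arguments, verb):
    """Return one right-branching bracketed tree per order of the arguments."""
    trees = []
    for order in permutations(arguments):
        tree = f"[VP [V {verb}]]"
        # Attach arguments from the innermost (last) to the outermost (first).
        for argument in reversed(order):
            tree = f"[VP {argument} {tree}]"
        trees.append(tree)
    return trees


for tree in compile_trees(["NPSubj", "NPObj"], "sees"):
    print(tree)
# [VP NPSubj [VP NPObj [VP [V sees]]]]
# [VP NPObj [VP NPSubj [VP [V sees]]]]

# Four scrambling constituents, as in the Korean and German examples.
print(len(compile_trees(["NPNom", "NPDat", "NPAcc", "Adv"], "delivered")))  # 24
```

Two arguments give the two compiled trees shown on the slide, and four constituents give the 4! = 24 orders of the scrambling examples.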