Om-Omission and Filler-Gap Dependencies, Gosse Bouma, Centre for Language and Cognition, University of Groningen

SLIDE 1

Outline: Optional Om | Corpus Study | Filler Gap | Om and Gaps | Collocations

Om-Omission and Filler-Gap Dependencies

Gosse Bouma

Centre for Language and Cognition University of Groningen

Structure and Evidence in Linguistics, Stanford, April 2013

Gosse Bouma 1/25

SLIDE 2

Optional Complementizer Om

What explains the absence or presence of om?

De Indiërs aarzelen te investeren in Uganda
The Indians hesitate to invest in Uganda
'The Indians hesitate to invest in Uganda'

Moser had overwogen om zijn avontuur af te blazen
Moser had considered COMP his adventure PRT to cancel
'Moser had considered cancelling his adventure'

SLIDE 5

Optional Complementizer Om

What explains the absence or presence of om?

De Indiërs aarzelen om te investeren in Uganda
The Indians hesitate COMP to invest in Uganda
'The Indians hesitate to invest in Uganda'

Moser had overwogen zijn avontuur af te blazen
Moser had considered his adventure PRT to cancel
'Moser had considered cancelling his adventure'

Filler-gap dependencies as predictor
  Gaps inside om-te-infinitives are generally considered acceptable
  But they hardly occur in corpus data (this talk)

SLIDE 6

The Dutch complementizer Om

Om as optional complementizer in to-infinitive complements
(category of the governor in brackets: V = verb, A = adjective, P = preposition, N = noun)

De Kok vraagt[V] (om) 1 procent van hun inkomen te geven aan het fonds
De Kok asks (COMP) 1 percent of their income to give to the fund
'De Kok asks to donate 1 percent of their income to the fund'

Ik ben niet vrij[A] (om) daarover te spreken
I am not free (COMP) about-that to speak
'I am not free to speak about that'

Ik hou er niet van[P] (om) Beverly Hills af te kammen
I like there not PRT (COMP) Beverly Hills PRT to disrespect
'I do not like to criticize Beverly Hills'

Huurders krijgen het recht[N] (om) mee te praten
tenants obtain the right (COMP) with to talk
'Tenants obtain the right to have a say'

SLIDE 7

Historical Development

IJbema 2002
  Om originated as a preposition
  Later used as a complementizer in purpose modifier clauses
  Use as a complementizer in complement clauses is a recent development (rare before 1750)

SLIDE 8

Disapproval in Prescriptive Linguistics

Overview from Jansen 1987
  Brill (1852): no objections
  Woordenboek der Nederlandse Taal, 1869 (lemma om): "Om behoort altijd een doel, eene bestemming, of eene strekking aan te wijzen" (Om should always indicate a goal, purpose, or consequence)
  WNT, 1934 (lemma te): no objections
  Van Es and Van Caspel (1971-75): Om is superfluous, typical of informal language
  "Nog in 1973 moet de redactie [van Onze Taal] inzenders die om als 'slokdarmgeluid' betitelen verdraagzaamheid voorhouden" (Even in 1973 the editors of Onze Taal had to plead for tolerance with contributors who described om as a guttural sound)
  Algemene Nederlandse Spraakkunst (1984): In spoken language there is a preference for om; leaving om out makes a formal impression

SLIDE 9

That-deletion in English complement clauses

The athlete realized that her goals would be difficult to achieve

Syntactic Complexity
  Features that play a role in predicting the presence of that: complexity of the complement clause (CC), distance between governor and CC, frequency of governor, complexity of CC subject, subject starts with that, ...

Lexical bias (Roland et al 2006)

\[ \text{that-bias}(\text{governor}) = \ln \frac{\text{CCs with that}}{\text{CCs without that}} \]

Information Density (Jaeger 2010)

\[ \text{complement-bias}(\text{governor}) = \ln \frac{\text{occurrences with CC}}{1 - \text{occurrences with CC}} \]
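The two bias measures on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the sketch assumes that "occurrences with CC" is a proportion between 0 and 1 (the share of a governor's occurrences that take a complement clause), which is the reading under which the second formula is a log-odds.

```python
import math


def that_bias(ccs_with_that: int, ccs_without_that: int) -> float:
    """Log ratio of complement clauses with 'that' to those without it."""
    return math.log(ccs_with_that / ccs_without_that)


def complement_bias(proportion_with_cc: float) -> float:
    """Log-odds that an occurrence of the governor takes a complement clause.

    Assumes the argument is a proportion strictly between 0 and 1.
    """
    return math.log(proportion_with_cc / (1 - proportion_with_cc))


# Hypothetical counts for one governor, for illustration only.
example_that_bias = that_bias(ccs_with_that=300, ccs_without_that=100)
example_complement_bias = complement_bias(0.5)
```

A positive that-bias means the governor favours complements with that; a value of zero means the two variants are equally frequent.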

SLIDE 13

Corpus Study

Data
  Twente Newspaper Corpus (approx. 400M words)
  Corpus of Spoken Dutch (10M words)

Annotation
  Automatically parsed with the HPSG-inspired Alpino parser for Dutch (van Noord 2006)
  Output is a dependency analysis (with phrasal nodes)

SLIDE 14

Alpino Dependency Analysis

[Dependency tree; numbers in parentheses are word positions, [1] marks a shared index]

top
  smain
    su [1]: ik (0)
    hd: spreek_af (1)
    svp: af (2)
    vc: ti
      cmp: te (5)
      body: inf
        su [1]
        mod: vandaag (3)
        svp: thuis (4)
        hd: blijf_thuis (6)

Ik spreek af vandaag thuis te blijven
'I arrange to stay at home today'

SLIDE 15

Alpino Dependency Analysis

[Dependency tree; numbers in parentheses are word positions]

top
  smain
    hd: ben (2)
    vc: ppart
      mod: pp
        hd: in (0)
        obj1: OESO_verband (1)
      hd: spreek_af (3)
      vc: oti
        cmp: om (4)
        body: ti
          cmp: te (10)
          body: inf
            obj1: np
              det: die (5)
              hd: subsidie (6)
            mod: pp
              hd: vanaf (7)
              obj1: mwu
                mwp: 1 (8)
                mwp: januari (9)
            hd: schrap (11)

In OESO-verband is afgesproken om die subsidies vanaf 1 januari te schrappen
'In the OECD context it was agreed to stop those subsidies as of 1 January'

SLIDE 16

Spoken vs. Written

Percentage Om in spoken and written material for selected verbs

[Figure: percentage of om per verb, spoken vs. written (percentage axis, ticks 20 to 80). Verbs shown: vergeet, weiger, besluit, overweeg, dwing, raad_aan, beslis, spreek_af, neem_voor, vraag, maak, verplicht, nodig_uit, vind]

SLIDE 17

Om-bias does not correlate with Complement-bias

[Figure: scatter plot of Om-bias (percentage) against Complement-bias (percentage), one labelled point per governor]

Governors shown: aarzel, acht, belemmer, belet, beloof, beschouw, beslis, besluit, best-doe, besta, bestem_voor, beveel_aan, beweeg, bied_aan, breng_op, daag_uit, de-tijd-geef, de-tijd-heb, de-tijd-krijg, doe-aan, draag_op, durf_aan, dwing, geen-been-zie-in, haal_over, help_mee, in-de-gelegenheid-stel, in-het-werk-stel, in-hoofd-haal, kans-schoon-zie, kans-zie, kijk_uit, kom_overeen, kondig_aan, laat_na, laat_toe, maan, machtig, moedig_aan, motiveer, nodig-heb, nodig_uit, noem, noodzaak, op-de-nominatie-sta, op-het-punt-sta, opper, overreed, overtuig, overweeg, presteer, prikkel, raad_aan, raad_af, risico-loop, schroom, smeek, sommeer, spoor_aan, spreek_af, spreek_af-met, sta-trappel, sta_toe, stem-ga_op, stimuleer, suggereer, tot-doel-heb, tot-taak-heb, van-plan-ben, verbied, verdien, vergeet, verhinder, verleid, veroorloof, verplicht, vertik, verzuim, vind, voor-elkaar-krijg, vraag, waag, waarschuw, zeg_toe, zet_aan, zich-beijver, zich-geroepen-voel, zich-haast, zich-maak_op, zich-neem_voor, zich-permitteer, zich-schaam, zich-span_in, zich-ten-doel-stel, zich-zet_in

SLIDE 18

Predicting Om-deletion in Dutch

Hard to go beyond lexical bias
  Approx. 25% of the to-infinitival complements are introduced by om
  Om-bias of the governing verb is by far the strongest predictor
  Syntactic complexity plays a small but significant role
  Semantic association plays a small but significant role
  Results are (far) less clear than for English that-deletion

SLIDE 19

Filler-gap dependencies inside Om-te Infinitives?

Island Constraints vs. Processing Difficulties

"Our goal is to assess certain arguments that have been made to the effect that grammatical constraints MUST be involved in island phenomena ... it is our contention that independently motivated processing factors can successfully explain a substantial amount of the judgment variation that has been used to motivate island constraints within grammar."
Hofmeister, Philip, Laura Staum Casasanto, and Ivan A. Sag. In press. Islands in the Grammar? Standards of Evidence.

Role of Corpus Research
  Can automatically annotated treebanks provide any insights in this discussion?
  Non-local filler-gap dependencies are not very frequent in text
  Parse results (automatic annotation) are not very accurate for such cases

SLIDE 21

Filler-gap dependencies inside Om-te Infinitives?

Extraction from OTI
  Linguistic literature suggests that such extractions are acceptable

Hans Bennis (2000), Adjectives and Argument Structure
  Waar_i is Jan bang (om) over t_i te praten
  'What is Jan afraid to talk about'

Broekhuis, den Besten, Hoekstra, and Rutten (1995), Infinitival complementation in Dutch: On remnant extraposition
  Wat_i heeft Jan geprobeerd om t_i te lezen
  'What has Jan tried to read'
  "it must be mentioned that the complementizer is preferably dropped"

Challenges for Corpus Study
  Wh-questions hardly occur in the newspaper corpus
  Relative clauses do occur frequently
  But most filler-gap dependencies are 'local'...

SLIDE 23

Gap Location in Relative Clauses

15,361 relative clauses in 100,000 sentences (Wikipedia sample)

[Figure: percentage of local vs. non-local gaps (percentage axis, ticks 20 to 60)]

[Figure: percentage per dependency relation (su, mod, predc, obj1, pc, other), relatives with a local gap compared with simple main clauses (percentage axis, ticks 10 to 70)]

SLIDE 24

Distribution of Gaps in Relatives dominating an (O)TI

Relatives with Object gap in (O)TI complement

een tekstbericht dat Tankink besloot voorlopig te bewaren
'a text message that Tankink decided to store for the moment'

het werk wat veel jongeren vertikken om te doen
'the work that many young people refuse to do'

                                                            Count     Perc
Sentences containing a verb with an (O)TI complement      285,000   100
Relatives containing a verb with an (O)TI complement       22,059     7.74
Relatives with 'local' filler-gap dependency               17,449     6.12
Relatives with non-local filler-gap dependency              4,610     1.62
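The percentages in the table follow directly from the counts, with the 285,000 sentences as the common denominator. A minimal Python check, in which the variable names are illustrative and only the counts come from the slide:

```python
# Counts copied from the table on this slide.
TOTAL_OTI_SENTENCES = 285_000

counts = {
    "relatives with (O)TI complement": 22_059,
    "local filler-gap dependency": 17_449,
    "non-local filler-gap dependency": 4_610,
}

# Each share is expressed as a percentage of all (O)TI sentences.
shares = {
    label: round(100 * count / TOTAL_OTI_SENTENCES, 2)
    for label, count in counts.items()
}

for label, share in shares.items():
    print(f"{label}: {share}%")
```

The local and non-local counts add up to the 22,059 relatives, so the two lower rows partition the second row.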

SLIDE 29

Relatives with Object gap in (O)TI

Examples automatically extracted from the corpus
849 occurrences (manually checked)

  //node[@rel="rhd" and
         @index = ..//node[@cat="ti" and @rel="vc"]
                    //node[@rel="obj1"]/@index and
         not(@index = //node[@rel="su"]/@index)]
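To make the logic of the query concrete, here is a rough Python sketch that applies the same three conditions with the standard library's `xml.etree.ElementTree`, whose limited XPath support cannot evaluate the query directly. The XML fragment is a hand-made toy in an Alpino-like format, not corpus data, and the function name is hypothetical; it is meant as a reading aid, not as the procedure used in the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy fragment: one relative with an object gap inside a
# te-infinitive, and one relative with a local subject gap.
TOY_TREEBANK = """
<alpino_ds>
  <node rel="top" cat="top">
    <node rel="mod" cat="rel">
      <node rel="rhd" index="1" word="dat"/>
      <node rel="body" cat="ssub">
        <node rel="su" index="2" word="Tankink"/>
        <node rel="hd" word="besloot"/>
        <node rel="vc" cat="ti">
          <node rel="cmp" word="te"/>
          <node rel="body" cat="inf">
            <node rel="su" index="2"/>
            <node rel="obj1" index="1"/>
            <node rel="hd" word="bewaren"/>
          </node>
        </node>
      </node>
    </node>
    <node rel="mod" cat="rel">
      <node rel="rhd" index="3" word="die"/>
      <node rel="body" cat="ssub">
        <node rel="su" index="3"/>
        <node rel="hd" word="slaapt"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def object_gap_relatives(root):
    """Return rhd nodes coindexed with an obj1 inside a ti/vc, but with no su."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    subject_indices = {
        n.get("index") for n in root.iter("node")
        if n.get("rel") == "su" and n.get("index") is not None
    }
    hits = []
    for rhd in root.iter("node"):
        index = rhd.get("index")
        if rhd.get("rel") != "rhd" or index is None or index in subject_indices:
            continue
        parent = parent_of.get(rhd)
        if parent is None:
            continue
        # '..//node[...]': te-infinitive complements below the parent of rhd.
        for ti in parent.iter("node"):
            if ti is parent or ti.get("cat") != "ti" or ti.get("rel") != "vc":
                continue
            # '//node[@rel="obj1"]/@index': an object gap with the same index.
            if any(n is not ti and n.get("rel") == "obj1" and n.get("index") == index
                   for n in ti.iter("node")):
                hits.append(rhd)
                break
    return hits


matches = object_gap_relatives(ET.fromstring(TOY_TREEBANK))
matched_words = [node.get("word") for node in matches]
```

On the toy fragment only the relativizer of the first clause qualifies, because the second relativizer shares its index with a subject.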

SLIDE 30

All (10) relatives with Object gap in OTI (1)

het werk wat veel jongeren vertikken om te doen
'the work which many young people refuse to do'

de sociale problemen die de kunst geacht wordt om op te lossen
'the social problems that art is supposed to solve'

een nieuwszender die de kabelmaatschappijen niet verplicht zullen zijn om in het pakket op de kabel door te geven
'a news channel that cable companies will not be obliged to include in their offers'

de plichten die ze het beneden hun waardigheid achten om thuis te vervullen
'the duties that they consider it beneath their dignity to fulfil at home'

wat ik vergeten ben om te vertellen
'what I forgot to tell'

De maatregelen welke men van plan is om door te voeren
'The measures which one is planning to implement'

SLIDE 31

All (10) relatives with Object gap in OTI (2)

(all from the same day/article:)

Bijna heb ik gedaan wat ik me heb voorgenomen om te doen
'Almost, I did what I had planned to do'

mensen die doen wat ze zich hebben voorgenomen om te doen
'people who do what they have planned to do'

doen wat je je hebt voorgenomen om te doen
'doing what you have planned to do'

doen wat je je voorneemt om te doen
'doing what you plan to do'

SLIDE 32

Relatives with Object gap in TI (839 cases)

Het is dit kantoortje dat Eva Joly besluit voortaan zelf schoon te houden
'It is this office that Eva Joly decides to clean herself from now on'

dingen die de overheid besloot niet te doen
'things that the government decided not to do'

de Golf GTI, die VW-importeur Pon (nog) niet van plan is naar Nederland te halen
'the Golf GTI, which VW importer Pon does not (yet) plan to bring to the Netherlands'

brandbommen, die men van plan was op het terrein van het azc te gooien
'fire bombs that one was planning to throw onto the premises of the refugee center'

het presidentiële vliegtuig dat hij had beloofd direct van de hand te zullen doen
'the presidential plane that he had promised to get rid of immediately'

de problemen die Schröder belooft met harde hand te zullen bestrijden
'the problems which Schröder promises to attack fiercely'

SLIDE 33

Filler-gap dependencies in (O)TI

Statistics

         Count   Perc
  TI       839   98.8
  OTI       10    1.2

Filler-gap dependency predicts absence of om
  A non-local filler-gap dependency accurately predicts the absence of om
  But non-local filler-gap dependencies are very scarce in general: 0.3% of all relevant data
  So they are hardly useful for a statistical model predicting om
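The percentages on this slide can be checked against the counts. The last ratio assumes that "all relevant data" refers to the 285,000 sentences containing a verb with an (O)TI complement reported earlier in the talk; that denominator is an inference, not something the slide states.

\[
\frac{839}{849} \approx 98.8\%, \qquad \frac{10}{849} \approx 1.2\%, \qquad \frac{849}{285{,}000} \approx 0.3\%
\]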

SLIDE 35

Collocational effects

Relative frequency of verbs governing TI in filler-gap dependencies and in general TI corpus

[Figure: relative frequency per verb, filler-gap vs. general (axis ticks 5 to 15). Verbs shown: acht, besluit, van-plan-ben, beloof, verplicht, beoog, vergeet, verwacht, dwing, verbied, weiger]

SLIDE 36

Achten (consider, be supposed to)

De eerbied [die] we geacht worden jegens het koningshuis te koesteren
'The respect that we are supposed to cherish for the royal family'

de vrouw die hij geacht wordt te schaduwen
'the woman that he is supposed to shadow'

antwoorden die ook minister Zalm wordt geacht met overtuiging uit te spreken
'answers which minister Zalm, too, is supposed to utter with conviction'

degenen die hij geacht wordt in de gaten te houden
'those whom he is supposed to keep an eye on'

concurrentiestrijd die de uitvoerders worden geacht aan te gaan
'competition which those responsible are supposed to engage in'

Edelgassen doen toch [wat] ze geacht worden niet te doen
'Noble gases after all do what they are supposed not to do'

werk dat in Nederland de leraar geacht wordt er ook nog eens bij te doen
'work which, in the Netherlands, the teacher is supposed to do on the side'

SLIDE 37

Conclusions

Syntax and Lexicon
  Syntactic variation is clearly governed by lexical items (that-deletion, om-omission)
  Distribution of filler-gap dependencies shows collocational effects

Large (Automatically) Annotated Treebanks
  Provide detailed syntactic information
  Allow study of frequency effects in syntactic constructions
  Allow search for rare syntactic phenomena:
    Relative clauses with non-local filler-gap dependencies
    ...containing a specific clausal complement
    ...and a specific verbal head

SLIDE 38

Greetings from Groningen!