
Complex phenomena deserve complex explanations – choosing how to think in Finnish

Antti Arppe University of Helsinki QITL2, Osnabrück, 2.6.2006

Overview

  • Introduction & Background aka ’Theory’
  • Goals, Corpora, & ’Quantitative’ Methods
  • Results
  • Discussion
  • Conclusions

Theory and concepts

  • contextuality of usage and meaning: “You shall know a word by the company that it keeps!” (J. R. Firth)

– words are in a structural and semantic relationship with others in their context
– the choice (i.e. usage) and meaning of words are interconnected with their context
– in a language with a free word order such as Finnish, (functional) dependency grammar (Tesnière) is a practical way to explore such structural relationships

  • non-modularity of language – constructionality

– regularities in co-occurrence and structure can be observed at a continuum of levels from individual words and synonym groups to general semantic groupings or parts-of-speech -> Construction Grammar

  • synonymy

– some word pairs or groups have relatively similar meanings
– in some contexts such words can be interchanged with each other without an essential change in the meaning of the entire utterance

Introduction – Modelling of lexical choice in computational theory

  • In the case of semantically similar words, especially near-synonyms, at least three levels have been suggested (Edmonds and Hirst 2002):

1) conceptual-semantic level
2) subconceptual/stylistic-semantic level, and
3) syntactic-semantic level

Factors influencing lexical choice on the syntactic-semantic level

  • (mainly) lexicographically motivated corpus-based studies show differences in the use of semantically similar words, i.e. synonyms, in e.g. their:

1) lexical context
  • e.g. English powerful vs. strong in Biber et al. 1998
2) syntactic structures of which they form part
  • e.g. English begin vs. start in Biber et al. 1998
3) semantic classification of some particular argument
  • e.g. English shake verbs in Atkins & Levin 1996
4) style-associated text type in which they are used
  • e.g. Biber 1998

  • while the above studies have focused on English, with minimal morphology, similar differentiation has also been shown in languages with extensive morphology such as Finnish:

5) with respect to the inflectional forms and the associated morphosyntactic features in which synonyms are used
  • Finnish miettiä and pohtia ‘think, ponder, reflect, consider’ in Arppe and Arppe & Järvikivi 2002
  • tärkeä vs. keskeinen ‘important, central’ in Jantunen 2002

Critical assessment of these results – monocausality

  • The mentioned studies are typically monofactorial/monocausal, focusing on one linguistic category or one feature within a category (at a time)

– HOWEVER, Jantunen (2002) does go through a wide range of categories, but does not quantitatively evaluate their interactions
– Gries (2003) has argued convincingly for a holistic approach using multifactorial (i.e. multivariate) statistical methods

  • HOWEVER, these multivariate methods build upon univariate and bivariate analysis


Critical assessment – dichotomous setups

  • The mentioned studies typically concern synonym pairs instead of groups with more than two members

– powerful vs. strong, start vs. begin, miettiä vs. pohtia, tärkeä vs. keskeinen
– BUT ALSO Gries’ own study of particle placement concerns a dichotomous choice between two alternative constructions
– this has also been noted earlier by Divjak and Gries (forthcoming), motivating their exceptional study of nine Russian verbs meaning ’try’

  • However, lexicographical reality, clearly evident both in dictionaries and in language use, often indicates that there are more than just two members to a synonym group

– THOUGH full interchangeability for more than two synonyms may be prima facie rarer, there are probably at least some contexts where any one in a group of more than two synonyms could be substituted for another without major reservation
– Consequently, the differences observed between some pair might change, diminish or even disappear when studied within the entire group

Subsequent goals, methods and corpora in this study

  • Explore and develop corpus-based and statistical (quantitative) methodology with an aim to:

– Extend from dichotomous to polytomous (more than two) setups

  • Inclusion of other members of the THINK synonym group, with roughly similar magnitudes of frequency (common translations in boldface):

– ajatella: 1 intend 2 plan 3 imagine, fancy, conceive (conceive of sth) 4 ponder 5 reflect 6 think, think of, give a thought to, figure 7 consider 8 take from some perspective 9 regard, make of (sth)
– miettiä: 1 think 2 meditate, ponder (meditate on sth) 3 reflect 4 contemplate, conceive (conceive [of] sth), consider, mull [over], wonder (wonder about sth), give a thought to, muse, cast about for 5 think twice, thoroughly
– pohtia: 1 deliberate, consider, ponder, think over 2 contemplate, discuss (discuss sth), debate, talk over, puzzle, think in terms of 3 wonder (wonder about sth) 4 turn over, chew over 5 kick around / about 6 (think out loud) talk about
– harkita: 1 contemplate 2 ponder, deliberate, think over 3 weigh, weigh up 4 consider, think of, think in terms of 5 think, entertain 6 think out 7 be considering [doing sth]

Goals (cont’d) …

– Extend from (simple) monofactorial to (complex) multifactorial models of explanation of lexical choice

  • Inclusion of all practically available linguistic and extra-linguistic contextual information

– Morphological features and inflectional structure
– Syntactic arguments (according to dependency grammar as implemented in the Functional Dependency Parser for Finnish by Connexor, influenced by Tesnière 195X)
– Semantic classifications of syntactic arguments (according to WordNet in the case of nominal lexemes and loosely adapting the semantic primitives of Wierzbicka in the case of non-nominal adverbs)

  • Building upon Gries’ framework (2003) of combining various statistical methods

– χ² test, Cramér's V, lambda (Goodman-Kruskal), correlation and uncertainty coefficient (UC, Theil) for discovering significant individual features
– Regression analysis for studying the simultaneous influence and interaction of significant features
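As a minimal sketch of three of the univariate measures named above (χ² statistic, Cramér's V and Theil's uncertainty coefficient), the following computes them for a feature-by-verb contingency table. The counts in `example` are invented for illustration and are not taken from the study; the function names are likewise hypothetical, and the code assumes that no row or column of the table is empty.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: chi-square scaled to the range 0..1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))

def entropy(counts):
    """Shannon entropy (natural log) of a list of counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def uncertainty_coefficient(table):
    """Theil's U(column | row): share of column uncertainty removed by knowing the row."""
    n = sum(sum(row) for row in table)
    h_col = entropy([sum(col) for col in zip(*table)])
    h_col_given_row = sum(sum(row) / n * entropy(row) for row in table)
    return (h_col - h_col_given_row) / h_col

# Invented counts: rows = feature present / absent,
# columns = ajatella, miettiä, pohtia, harkita.
example = [[120, 40, 30, 10],
           [380, 260, 270, 190]]
print(chi_square(example), cramers_v(example), uncertainty_coefficient(example))
```

With rows as the feature and columns as the verbs, `uncertainty_coefficient` reads as how much knowing the feature reduces uncertainty about the verb.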

Goals (cont’d)

  • Extend from traditional written corpora such as newspapers or published literature (formal, standardized and monologic in nature) to more informal material with a dialogic character

– In addition to two months of Helsingin Sanomat, Finland’s largest daily newspaper, from January-February 1995
  • 3.3M words with 1750 instances of the studied verbs
– Inclusion of six months of Finnish Internet discussion group material from 2002-2003
  • sfnet.keskustelu.ihmissuhteet (human relationships) and sfnet.keskustelu.politiikka (politics)
  • 400K words with 1654 instances of the studied verbs
– Newspaper/Newsgroup section, author and quotation/body information available from both sources as extralinguistic context
– In addition, various aspects of repetition were also included as extralinguistic context (first use within article/posting, repetition of the preceding verb, individual preceding verbs of the same group)
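To compare how densely the studied verbs occur in the two corpora, the instance counts can be normalised per million words. This is a small arithmetic sketch; the word totals are the rounded figures given on the slide (3.3M and 400K), so the resulting rates are approximate, and the dictionary keys are labels chosen here.

```python
# Rounded corpus sizes and instance counts as given on the slide.
corpora = {
    "newspaper": {"words": 3_300_000, "instances": 1750},
    "newsgroups": {"words": 400_000, "instances": 1654},
}

def per_million(instances, words):
    """Relative frequency as instances per million words."""
    return instances / words * 1_000_000

rates = {name: per_million(c["instances"], c["words"]) for name, c in corpora.items()}
for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} instances per million words")
```

On these rounded figures the newsgroup material contains the studied verbs several times more densely than the newspaper material.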

Current descriptions of Finnish THINK verbs

  • Pajunen: Verbien argumenttirakenne ‘Argument Structure of [Finnish] Verbs’ (2001: 62-63)

– “Primary-B verbs [i.e. mental verbs], with the exception of speech verbs and some descriptive perception verbs, in general have a flat [classificatory] structure. … In classes with very flat structure these relationships [hyponym-hypernym] are rare and classificatory structure consists of minor sets which are in loose co-hyponymic relationships to each other (i.e. contrast groups)”

Current descriptions ...

  • Pajunen (2001: 313-319)

– [käsittää], ajatella:
  • x-arg: subject A:ab:
  • y-arg: object, clause argument=subordinate clause, participle, infinitive
  • Agentivity: volitional participation in state or event, sensing and/or perceiving
– harkita:
  • x-arg: subject:A:(a)b:
  • y-arg: object, clause argument
  • Agentivity: (volitional participation in state or event), sensing and/or perceiving


Current descriptions ...

– Arguments

  • x-arg:
– ∀ human referent
  • y-arg:
– abstract notion > concrete object, state-of-affairs, human referent
  • z-arg:
– goal: stimulus (“world-to-mind”)
– result (“mind-to-world”)

Current descriptions (Perussanakirja ’Standard dictionary of Finnish’)

  • ajatella 67*C (other members of the THINK group marked in boldface)

  • 1. yhdistää käsitteitä ja mielteitä tietoisesti toisiinsa (us. jnk ongelman ratkaisemiseksi), miettiä, harkita, pohtia, tuumia, järkeillä, päätellä, aprikoida, punnita. Ajatella loogisesti, selkeästi. Lupasi ajatella asiaa. Olen ajatellut sinua. Ajatella jkta pahalla. En tullut sitä ajatelleeksi. Tapaus antoi ajattelemisen [= vakavan harkinnan] aihetta. Ajatella ääneen puhua itsekseen.
  • 2. asennoitua, suhtautua, olla jtak mieltä jstak, arvella. Samoin, toisin ajattelevat. Porvarillisesti ajattelevat kansalaiset. Mitä ajattelet asiasta? Ajattelin, että olisi parasta luopua hankkeesta.
  • 3. kuvitella, olettaa, pitää mahdollisena, otaksua. Suoran ajateltu jatke. Tauti, jonka aiheuttajaksi on ajateltu virusta. Ajatellaanpa, että - -. Paras ajateltavissa oleva. Pahinta, mitä ajatella saattaa.
  • 4. kiinnittää huomiota jhk, ottaa jtak huomioon, pitää jtak silmällä, mielessä. Ajatella omaa etuaan, toisten parasta. Toimia seurauksia ajattelematta. Paras vaihtoehto tulevaisuutta ajatellen paremmin: tulevaisuuden kannalta.
  • 5. harkita, aikoa, suunnitella, tuumia. Ajatteli jäädä eläkkeelle, eläkkeelle jäämistä. Tehtaan paikaksi on ajateltu Torniota.
  • 6. vars. ark. huudahduksissa huomiota kiinnittämässä t. sanontaa tehostamassa. Ajatteles, mitä sillä rahalla olisi saanut! Ajatella, että hän on jo aikuinen!

Current descriptions ...

  • miettiä 61*C (other members of the THINK group marked in boldface)

  • 1. ajatella, harkita, pohtia, punnita, tuumia, aprikoida, järkeillä, mietiskellä. Mitäpä mietit? Asiaa täytyy vielä miettiä. Mietin juuri, kannattaako ollenkaan lähteä. Vastasi sen enempää miettimättä. Miettiä päänsä puhki.
  • 2. suunnitella; keksiä (miettimällä). Miettiä uusia kepposia. Oli miettinyt hyvän selityksen.

  • pohtia 61*F

– ajatella jtak perusteellisesti, eri mahdollisuuksia arvioiden, harkita, miettiä, tuumia, ajatella, järkeillä, punnita, aprikoida. Pohtia arvoitusta, ongelmaa. Pohtia kysymystä joka puolelta. Pohtia keinoja asian auttamiseksi.

  • harkita 69 (harkitsematon, harkitseva, harkittu ks. erikseen)

  • 1. ajatella perusteellisesti, eri mahdollisuuksia arvioiden, pohtia, punnita, puntaroida, miettiä; suunnitella. Harkita ehdotusta, tilannetta. Asiaa kannattaa harkita. Ottaa jtak harkittavaksi, harkittavakseen. Asiaa tarkoin harkittuani päätin - -. Lääkkeitä on käytettävä harkiten. Yhtiö harkitsee toiminnan laajentamista.
  • 2. päätyä jhk perusteellisen ajattelun nojalla, tulla jhk päätelmään, katsoa jksik. Harkitsi parhaaksi vaieta. Sen mukaan kuin kohtuulliseksi harkitaan. Näin olen asian harkinnut.

Monofactorial results

  • 5771 (syntactic) contextual features observed altogether in the two corpora
  • 340 features with statistically significant differences in their distributions among the four studied verbs (83 morphological features, 208 syntactic argument+semantic/morphological features, 43 syntactic argument+lexemes, and 18 extralinguistic features)
  • Some statistically significant individual lexemes as syntactic arguments without a semantic classification inspired an additional classification of semantically similar arguments, e.g.

– tarkka <- tarkkaan, tarkoin ’careful/meticulous’ -> vakavasti ’seriously’, oikeasti ’really/earnestly’, perusteellisesti ’thoroughly’, tarkasti ’thoroughly’, huolellisesti ’carefully’, syvään <- syvä ’in depth’ -> SX_MAN.SEM_THOROUGH in MANNER, or
– vielä ’still’, enää ’anymore’, edelleen ’ever still’, jo ’already’, yhä ’evermore’, edelleen <- edelle ’ever still’, erää <- erä ’for now’, [ei] koommin ’[not since]’, vasta ’just (since a short while)’ -> SX_DUR.SEM_OPEN in DURATION

  • Of the statistically significant features

– 185 were logically associated with some other feature, so they were excluded
– 39 correlated with another feature to the extent that the other was discarded from further analysis

  • Resulting in 106 features for further analysis (37 morphological features, 51 syntactic argument+semantic features, 15 syntactic argument+lexemes, and 17 extralinguistic features)

Multifactorial results – selection of a heuristic method for polytomous regression

  • logistic regression can be extended from dichotomous to polytomous cases with several heuristics, which are based on a set of dichotomous logistic regression models (those explicitly observed in this study marked in boldface)

– one vs. rest (Rifkin & Klautau 2004)
  • ajatella vs. miettiä+pohtia+harkita, miettiä vs. ajatella+pohtia+harkita, ...
– double-round-robin aka pairwise (Fürnkranz 2002)
  • ajatella vs. miettiä, ajatella vs. pohtia, ..., miettiä vs. ajatella, ...
– nested dichotomies
  • e.g. ajatella vs. (miettiä vs. (pohtia vs. harkita)) or (ajatella vs. miettiä) vs. (pohtia vs. harkita)
– ensemble of nested dichotomies, i.e. ENDs (Frank & Kramer 2004)
  • an aggregate of (a random selection of) all possible nestings
– multinomial logistic models
  • a single logistic model
  • presuppose a baseline, e.g. most frequent or most prototypical one (in this case ajatella) vs. the rest
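The sets of dichotomous models implied by the first three heuristics can be enumerated directly. The following is a sketch only: the function names are hypothetical, and a nesting is represented here as a nested tuple rather than as fitted models.

```python
from itertools import permutations

VERBS = ["ajatella", "miettiä", "pohtia", "harkita"]

def one_vs_rest(lexemes):
    """One binary model per lexeme: that lexeme against all the others."""
    return [(lex, tuple(other for other in lexemes if other != lex)) for lex in lexemes]

def double_round_robin(lexemes):
    """One binary model per ordered pair, so each pair is covered in both directions."""
    return list(permutations(lexemes, 2))

def nested_dichotomies(lexemes):
    """All distinct binary nestings of the lexemes, as nested tuples."""
    lexemes = list(lexemes)
    if len(lexemes) == 1:
        return [lexemes[0]]
    first, rest = lexemes[0], lexemes[1:]
    nestings = []
    # Keep `first` on the left side so that mirror images are not counted twice;
    # the last mask (all of `rest` on the left) is skipped because the right
    # side must not be empty.
    for mask in range(2 ** len(rest) - 1):
        left = [first] + [lex for i, lex in enumerate(rest) if (mask >> i) & 1]
        right = [lex for i, lex in enumerate(rest) if not (mask >> i) & 1]
        for left_tree in nested_dichotomies(left):
            for right_tree in nested_dichotomies(right):
                nestings.append((left_tree, right_tree))
    return nestings

print(len(one_vs_rest(VERBS)))         # 4 models
print(len(double_round_robin(VERBS)))  # 12 models
print(len(nested_dichotomies(VERBS)))  # 15 possible nestings
```

Each single nesting of four lexemes consists of three binary splits, so one nested-dichotomy model set needs three dichotomous regressions, while an ensemble draws on some or all of the 15 nestings.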

Comparison of the pros and cons of the various heuristic methods

  • one-vs-rest
Nmodels=Nlexemes
+: Provides direct probabilities -> lex(max(P(context))) can be selected in evaluation
+: Highlights features separating one lexeme from the rest
-: May not uncover a distinguishing feature which is approximately equally common also with another lexeme

  • double-round-robin
Nmodels=Nlexemes*(Nlex-1)
+: Discovers thoroughly all pairwise differences among the lexemes
-: Does not provide probabilities directly -> has to rely on some heuristic, e.g. voting scheme for selection in evaluation
-: For a lexeme contrasting positively with another lexeme and negatively with a third lexeme provides a contradictory aggregate result -> may exaggerate differences within the group as a whole

Comparison (cont’d) ...

  • nested dichotomies
Nmodels=Nlexemes-1
+: Provides direct probabilities for each lexeme in a context
-: Selecting one appropriate nesting may be difficult or impossible

  • ensemble of nested dichotomies
Nmodels(Nlex)=(2Nlex-3)*Nmodels(Nlex-1); Nmodels(1)=1
+: Can take into account different perspectives represented by several different nestings
+: Provides direct probabilities for each lexeme as averages of each nesting
 
  • multinomial models
Nmodels=1
-: Does not provide distinguishing features for the baseline lexeme
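The two selection rules mentioned in the comparison (picking the lexeme with the highest probability for one-vs-rest, and a voting scheme for the pairwise models) might be sketched as follows. The probabilities are invented for illustration, the function names are hypothetical, and for brevity the pairwise example lists each pair in one direction only; ties in the vote are resolved arbitrarily by insertion order.

```python
from collections import Counter

def select_one_vs_rest(probabilities):
    """probabilities: lexeme -> P(lexeme | context) from its lexeme-vs-rest model."""
    return max(probabilities, key=probabilities.get)

def select_by_voting(pairwise_probabilities):
    """pairwise_probabilities: (a, b) -> P(a rather than b | context).
    Each pairwise model casts one vote for its preferred lexeme."""
    votes = Counter()
    for (a, b), p in pairwise_probabilities.items():
        votes[a if p >= 0.5 else b] += 1
    return votes.most_common(1)[0][0]

# Invented probabilities for a single context.
ovr = {"ajatella": 0.55, "miettiä": 0.20, "pohtia": 0.15, "harkita": 0.10}
pairwise = {
    ("ajatella", "miettiä"): 0.7,
    ("ajatella", "pohtia"): 0.6,
    ("ajatella", "harkita"): 0.8,
    ("miettiä", "pohtia"): 0.4,
    ("miettiä", "harkita"): 0.6,
    ("pohtia", "harkita"): 0.7,
}
print(select_one_vs_rest(ovr))
print(select_by_voting(pairwise))
```

The voting rule discards the size of each pairwise probability, which is one reason the pairwise heuristic does not yield a direct probability per lexeme.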

Evaluation results – one vs. all

  • five rounds with 2/3 random holdout sample of the corpus data for training and the remaining 1/3 for evaluation

  • 2269 training cases -> 1135 test cases
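A repeated random holdout of this kind can be sketched as below. The total of 3404 cases is the sum of the two figures above (2269 + 1135), which also equals the 1750 + 1654 verb instances from the two corpora; the function name, the seed and the use of integers as stand-ins for the annotated cases are assumptions made for the example.

```python
import random

def holdout_rounds(cases, rounds=5, train_fraction=2 / 3, seed=0):
    """Yield (training, test) splits from independent random shuffles of the cases."""
    rng = random.Random(seed)
    n_train = int(len(cases) * train_fraction)
    for _ in range(rounds):
        shuffled = list(cases)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

cases = list(range(3404))  # stand-ins for the annotated verb instances
splits = list(holdout_rounds(cases))
for training, test in splits:
    print(len(training), len(test))
```

Means and standard deviations over the rounds, as reported in the evaluation tables, would then be computed from the per-round results.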

          Recall.Total  Recall.Total.%
Mean             734.4           64.71
Std.Dev           17.9            1.58

tau (Kendall)  0.5461221  0.02402176

          Test.Mean  Test/All.%  Recall.Mean  Recall.%  Recall.Std.Dev
ajatella      493.6       43.49        419.2     84.94            15.6
harkita       131.2       11.56         59.2     45.45             2.4
miettiä       274.2       24.16        126.6     46.24             5.8
pohtia        236.0       20.79        129.4     54.91             6.2

          Recall.Std.Dev.%  Precision.Mean  Precision.%
ajatella              2.36           564.6        74.25
harkita               4.41           107.8        55.15
miettiä               2.59           234.4        54.04
pohtia                1.70           228.2        56.90
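The percentages in the one-vs-all results can be approximated from the mean counts. This sketch assumes an interpretation of the column names that the slide does not spell out: Test.Mean as the mean number of test cases of a verb, Recall.Mean as the mean number of those predicted correctly, and Precision.Mean as the mean number of times the verb was predicted. Because the reported percentages appear to be averaged per round, ratios of the mean counts only agree with them to within a fraction of a percentage point.

```python
def percent(part, whole):
    return 100.0 * part / whole

# verb: (Test.Mean, Recall.Mean, Precision.Mean) from the one-vs-all results
one_vs_all = {
    "ajatella": (493.6, 419.2, 564.6),
    "harkita": (131.2, 59.2, 107.8),
    "miettiä": (274.2, 126.6, 234.4),
    "pohtia": (236.0, 129.4, 228.2),
}

recall = {verb: percent(correct, tested) for verb, (tested, correct, _) in one_vs_all.items()}
precision = {verb: percent(correct, predicted) for verb, (_, correct, predicted) in one_vs_all.items()}

total_correct = sum(correct for _, correct, _ in one_vs_all.values())
total_tested = sum(tested for tested, _, _ in one_vs_all.values())
overall = percent(total_correct, total_tested)

for verb in one_vs_all:
    print(f"{verb}: recall about {recall[verb]:.1f}%, precision about {precision[verb]:.1f}%")
print(f"overall: about {overall:.1f}%")
```

Under this reading, recall asks how many instances of a verb were found, and precision asks how many predictions of that verb were right.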

Evaluation – double-round-robin

          Recall.Total  Recall.Total.%
Mean             735.0           64.76
Std.Dev           20.4            1.79

tau (Kendall)  0.5612979  0.02145662

          Test.Mean  Test/All.%  Recall.Mean  Recall.%  Recall.Std.Dev
ajatella      512.8       45.18        426.0     83.05            17.5
harkita       119.6       10.54         52.0     43.43             5.8
miettiä       259.6       22.87        123.4     47.53            13.1
pohtia        243.0       21.41        133.6     55.02            12.7

          Recall.Std.Dev.%  Precision.Mean  Precision.%
ajatella              1.50           551.2        77.27
harkita               3.55           110.0        47.40
miettiä               3.98           244.8        50.41
pohtia                5.49           229.0        58.39

Evaluation – nested dichotomies

Nesting            Nest.Mean  Nest.St.Dev.  %.Nest/Test  Nest.Rank
(a, (m, (p, h)))       722.4         17.10        63.65          7
(a, (p, (m, h)))       577.4          7.47        50.87         15
(a, (h, (m, p)))       721.0         13.21        63.52         10
(m, (a, (p, h)))       724.2         13.14        63.81          4
(m, (p, (a, h)))       729.4         15.81        64.26          1
(m, (h, (a, p)))       721.6          9.24        63.58          8
(p, (a, (m, h)))       719.0         14.49        63.35         13
(p, (m, (a, h)))       728.8         12.32        64.21          2
(p, (h, (a, m)))       726.8         15.51        64.04          3
(h, (a, (m, p)))       720.8         13.77        63.51         12
(h, (m, (a, p)))       721.6         11.28        63.58          8
(h, (p, (a, m)))       721.0         12.86        63.52         10
((a, m), (p, h))       723.2         14.32        63.72          6
((a, p), (m, h))       715.8         13.01        63.07         14
((a, h), (m, p))       723.2          8.76        63.72          6

Evaluation – ensemble of nested dichotomies

          Recall.Total  Recall.Total.%
Mean             733.6           64.64
Std.Dev            9.6            0.85

tau (Kendall)  0.5527388  0.01366292

          Test.Mean  Test/All.%  Recall.Mean  Recall.%  Recall.Std.Dev
ajatella      495.0       43.61        422.0     85.27             7.0
harkita       138.2       12.18         59.8     43.33             2.6
miettiä       263.6       23.22        121.0     45.88             9.8
pohtia        238.2       20.99        130.8     55.01             6.1

          Recall.Std.Dev.%  Precision.Mean  Precision.%
ajatella              1.64           567.0        74.42
harkita               2.06           106.2        56.38
miettiä               2.62           237.2        51.07
pohtia                3.68           224.6        58.48

Evaluation – comparison of results

  • prediction accuracy

– one-vs-all: 64.71%
– double-round-robin: 64.76%
– nested dichotomies: in 14/15 cases 63-64%
– ENDs: 64.64%

  • OBVIOUSLY no significant difference in prediction performance
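An informal way to see why the differences are negligible is to compare the spread of the mean accuracies with the round-to-round standard deviations reported in the evaluation tables. This is only a rough sanity check under that comparison, not a significance test, and the variable names are chosen here.

```python
# method: (mean accuracy in %, standard deviation in % over the five rounds)
accuracy = {
    "one-vs-all": (64.71, 1.58),
    "double-round-robin": (64.76, 1.79),
    "ENDs": (64.64, 0.85),
}

means = [mean for mean, _ in accuracy.values()]
spread = max(means) - min(means)
smallest_sd = min(sd for _, sd in accuracy.values())

print(f"spread of means: {spread:.2f} percentage points")
print(f"smallest standard deviation: {smallest_sd:.2f} percentage points")
```

The spread between the methods is several times smaller than even the smallest standard deviation across rounds.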


Results - model coefficients

  • See Appendix 1 (at the end) for the full results of the aggregated model preferences with the one-vs-all heuristic
  • See Appendix 2 (at the end) for the full results of the aggregated model preferences with the double-round-robin heuristic

Model coefficients – comparison – AGENT

  • ajatella
– one-vs-all: -GROUP
– round-robin: -2ND, -3RD, -PASS, ---GROUP
  • miettiä
– one-vs-all: +SING
– round-robin: +SING, +GROUP
  • pohtia
– one-vs-all: +PASS, -SING, +GROUP
– round-robin: -1ST, +2ND, +3RD, -SING, +PASS, ++GROUP
  • harkita
– one-vs-all: -GROUP, -INDIVIDUAL
– round-robin: +1ST, +/-GROUP

+: positive significant coefficient; -: negative significant coefficient (in each verb-vs-rest or pairwise verb-vs-verb comparison); similar coefficients marked in boldface

Model coefficients – comparison – PATIENT

  • ajatella
– one-vs-all: +INDIVIDUAL, +GROUP, -NOTION, -ATTRIBUTE, -COMMUNICATION, -ACTIVITY
– round-robin: ++INDIVIDUAL, ++GROUP, ---NOTION, -ATTRIBUTE, ---COMMUNICATION, ---ACTIVITY
  • miettiä
– one-vs-all: +NOTION, +COMMUNICATION
– round-robin: -INDIVIDUAL, -GROUP, +NOTION, +COMMUNICATION, +/-ACTIVITY
  • pohtia
– one-vs-all: -INDIVIDUAL, +NOTION, +ATTRIBUTE, +COMMUNICATION
– round-robin: -INDIVIDUAL, -GROUP, ++NOTION, +ATTRIBUTE, +COMMUNICATION, +/-ACTIVITY
  • harkita
– one-vs-all: +ACTIVITY
– round-robin: +/-NOTION, +COMMUNICATION, +++ACTIVITY

+: positive significant coefficient; -: negative significant coefficient (in each verb-vs-rest or pairwise verb-vs-verb comparison); similar coefficients marked in boldface

Discussion

  • doubling the number of features (51 -> 106) included in training the multi-level models increases prediction accuracy by less than ten percent (58-59% -> 64-65%)
  • both one-vs-rest and double-round-robin uncover practically the same feature-lexeme associations
  • BUT double-round-robin brings forth essentially more distinctive features than one-vs-all
  • Person/Number distinctions are marginalized when considered together with other factors with the one-vs-all heuristic

Further issues

  • the remaining, not insignificant proportion (~35%) of incorrect predictions with all the studied methods

– does this represent truly synonymous, i.e. interchangeable cases, which could be explored with experimentation as in Arppe & Järvikivi 2002, or
– is this a result of some still missing features in the models
– how much does the aggregation process of the individual dichotomous regression models in the double-round-robin method contribute to this inaccuracy

  • collinearity of the factors should be scrutinized
  • double-round-robin method: how to aggregate the coefficient values of the various component dichotomous regression models
  • differences between the registers (formal i.e. newspaper vs. informal i.e. newsgroup) should be studied
  • morphological family size (’MFS’) effects (cf. Schreuder & Baayen 1997, De Jong 2002, Moscoso del Prado Martín et al. 2004), i.e. noun/adjective derivations of the studied verbs, should also be observed

Conclusions

  • A wide range of different linguistic (morphological, lexical, syntactic and semantic) and extralinguistic (register, repetition) features appear to influence the choice of the studied synonymous verbs
  • Univariate, bivariate and multivariate statistical methods each play an essential role in the discovery of both these factors and their relative weights and interactions


Appendix 1 – Full results of aggregated model coefficient preferences with one-vs-all heuristic

(-) + Z_ANL_SING + Z_ANL_PASS

  • +

Z_ANL_NEG

  • +

Z_ANL_IMP

  • Z_ANL_KOND
  • (+)

(+) Z_ANL_IND

  • +

Z_TRA (+)

  • Z_NOM

+

  • Z_INE

+ Z_ESS (-) + Z_ABE (-) +

  • Z_PCP2
  • Z_PCP1

(-) +

  • Z_INF4

+

  • Z_INF3
  • +

Z_INF2 + Z_INF1 harkita Pohtia miettiä ajatella Morphological features

+: positive significant coefficient; -: negative significant coefficient (in each verb-vs-rest or pairwise verb-vs-verb comparison); (+) or (-): coefficients with a p-value .05<p<.1

Appendix 1 – cont’d

(+) SX_GOA.SEM_NOTION

  • +

SX_GOA

  • +

SX_SOU (+) +

  • SX_LX_se_PRON.SX_PAT
  • +

SX_LX_että_CS.SX_PAT + +

  • SX_PAT.INDIRECT_QUESTION

+ +

  • SX_PAT.DIRECT_QUOTE
  • +

SX_PAT.PARTICIPLE

  • +

SX_PAT.INFINITIVE +

  • SX_PAT.SEM_ATTRIBUTE

+

  • SX_PAT.SEM_ACTIVITY

(+) +

  • SX_PAT.SEM_COMMUNICATION

+ +

  • SX_PAT.SEM_NOTION

+ SX_PAT.SEM_GROUP

  • +

SX_PAT.SEM_INDIVIDUAL

  • +

SX_PAT (-) +

  • SX_AGE.SEM_GROUP
  • SX_AGE.SEM_INDIVIDUAL

harkita pohtia miettiä ajatella Syntactic arguments

Appendix 1 – cont’d

harkita pohtia miettiä ajatella Syntactic arguments (-) + (-) SX_FRQ.SEM_OFTEN + (-) SX_FRQ.SEM_AGAIN

  • +
  • SX_DUR.SEM_SHORT
  • SX_DUR.SEM_OPEN

+

  • SX_DUR.SEM_LONG

+ SX_TMP.SEM_TIME

  • (+)

SX_TMP.PHR_CLAUSE +

  • SX_TMP

(-) +

  • SX_LOC.SEM_LOCATION

(-) + SX_LOC.SEM_GROUP (-) +

  • SX_LOC.SEM_EVENT

+ SX_QUA.SEM_LITTLE +

  • SX_MAN.SEM_THOROUGH

(-) + SX_MAN.SEM_NEGATIVE

  • +

SX_MAN.SEM_GENERIC

  • +
  • +

SX_MAN.SEM_FRAME + SX_MAN.SEM_DIFFER + SX_MAN.SEM_CONCUR (-) SX_MAN.SEM_ALONE

Appendix 1 – cont’d

harkita pohtia miettiä ajatella Syntactic arguments + SX_LX_voida_V.SX_AAUX +

  • SX_LX_tarvita_V.SX_AAUX

+

  • SX_LX_kannattaa_V.SX_AAUX
  • SX_LX_joutua_V.SX_AAUX
  • SX_LX_alkaa_V.SX_AAUX

+

  • SX_CV

(-) SX_CAUX +

  • SX_AAUX

+ (-) SX_COMP + SX_LX_mukaan_PSP.SX_META (+) (-) SX_META +

  • SX_CND

(-) SX_RSN

Appendix 1 – cont’d

  • +

Z_REPEAT

  • +

+ Z_QUOTE

  • (-)

+ Z_EXTRA_DE_politiikka (-)

  • +

+ Z_EXTRA_DE_ihmissuhteet + + (-)

  • Z_EXTRA_DE_hs95_YO

+

  • Z_EXTRA_DE_hs95_UL

(+)

  • Z_EXTRA_DE_hs95_TA

(+)

  • Z_EXTRA_DE_hs95_PO

(+) Z_EXTRA_DE_hs95_NH (-) + Z_EXTRA_DE_hs95_MP + (-) Z_EXTRA_DE_hs95_MN (-) +

  • Z_EXTRA_DE_hs95_KU

+ (-) Z_EXTRA_DE_hs95_KA harkita pohtia miettiä ajatella Extralinguistic features

Appendix 2 – Full results of aggregated model coefficient preferences with double-round-robin heuristic

+

  • Z_ANL_PASS
  • +

Z_ANL_SING +

  • Z_ANL_THIRD

+

  • Z_ANL_SECOND

(+) (-) Z_ANL_FIRST

  • ++

Z_ANL_NEG (-)

  • (+)+

+ Z_ANL_IMP (+)

  • +

(-)+ Z_ANL_KOND

  • ++

+ Z_ANL_IND

  • (-)
  • (+)
  • +++

Z_TRA ‘come to [think]’

  • +

Z_NOM +(-) +(+)+ +-

  • Z_INE
  • (-)
  • ++(+)

Z_ESS (-)(-) (+) (+) Z_ABE (+) + (-)- Z_PTV ++

  • (-)

+(+)

  • Z_PCP2

+

  • (-)

(+) Z_PCP1

  • (+)

++

  • (-)

Z_INF4 (-) +(+)

  • Z_INF3

(+)

  • (-)(-)

(+) + Z_INF2

  • ++
  • Z_INF1

harkita pohtia miettiä ajatella Morphological features

+: positive significant coefficient; -: negative significant coefficient (in each verb-vs-rest or pairwise verb-vs-verb comparison); (+) or (-): coefficients with a p-value .05<p<.1


Appendix 2 – cont’d

  • +++

SX_GOA (-)

  • (-)

(+)(+)+ SX_SOU (-) + +(+)

  • SX_LX_se_PRON.SX_PAT ‘consider

that!’

  • +++

SX_LX_että_CS.SX_PAT ’that’ +(+) +(-)

  • SX_PAT.DIRECT_QUOTE

+-(-) +(+)- +++

  • SX_PAT.INDIRECT_QUESTION

+

  • +

SX_PAT.PARTICIPLE +

  • +

SX_PAT.INFINITIVE +++ +- +-

  • SX_PAT.SEM_ACTIVITY

(+) + + (-)-- SX_PAT.SEM_COMMUNICATION +

  • SX_PAT.SEM_ATTRIBUTE

+- ++ +

  • SX_PAT.SEM_NOTION

(-)

  • +(+)

SX_PAT.SEM_GROUP (-)

  • +(+)

SX_PAT.SEM_INDIVIDUAL (-)-- +++ +- (+)- SX_PAT +- (+)+ +

  • -(-)

SX_AGE.SEM_GROUP SX_AGE.SEM_INDIVIDUAL harkita pohtia miettiä ajatella Syntactic arguments

Appendix 2 – cont’d

harkita pohtia miettiä ajatella Syntactic arguments

  • ++
  • SX_FRQ.SEM_OFTEN

+ (+)

  • (-)

SX_FRQ.SEM_AGAIN

  • (+)

++

  • (-)

SX_DUR.SEM_SHORT + + +

  • SX_DUR.SEM_OPEN

+(-)

  • +(+)+
  • SX_DUR.SEM_LONG

(-) +(+)+

  • SX_TMP.SEM_TIME

+

  • +

+ SX_TMP.PHR_CLAUSE + +

  • SX_TMP

+- ++

  • SX_LOC.SEM_LOCATION
  • (-)

++(+)

  • SX_LOC.SEM_GROUP

(-)- ++ (+)

  • SX_LOC.SEM_EVENT

(-) (+) SX_QUA.SEM_MUCH (-) +(+)

  • SX_QUA.SEM_LITTLE

+ +

  • SX_MAN.SEM_THOROUGH

(-)

  • (+)+

SX_MAN.SEM_NEGATIVE

  • +

SX_MAN.SEM_GENERIC

  • +(+)
  • (-)

++ SX_MAN.SEM_FRAME (-) (+) SX_MAN.SEM_DIFFER (-) (+) SX_MAN.SEM_CONCUR

Appendix 2 – cont’d

harkita pohtia miettiä ajatella Syntactic arguments

  • ++
  • SX_LX_tarvita_V.SX_AAUX ’need to

[think] +++

  • +-+
  • SX_LX_kannattaa_V.SX_AAUX ’be

worth [thinking]’ (+) +

  • (-)

SX_LX_joutua_V.SX_AAUX ’have to [think]’

  • (-)

(+)(+) + +-(-) SX_LX_alkaa_V.SX_AAUX ’start [thinking]’ +- ++

  • SX_CV

(-) (+) SX_CAUX +

  • +

SX_AAUX (-)- ++

  • (+)

SX_COMP ++(+) (-)

  • SX_LX_mukaan_PSP.SX_META

’according to [someone]’ +

  • SX_META

+(+)+

  • (-)
  • SX_CND

(+) (-) SX_RSN

Appendix 2 – cont’d

  • +++

Z_REPEAT +

  • +

+ Z_QUOTE

  • +

++ Z_EXTRA_DE_politiikka

  • +

+ Z_EXTRA_DE_ihmissuhteet ++ +(+)

  • (-)
  • Z_EXTRA_DE_hs95_YO

+

  • Z_EXTRA_DE_hs95_UL

+ +

  • Z_EXTRA_DE_hs95_TA

+ (+) +

  • -(-)

Z_EXTRA_DE_hs95_PO

  • ++

Z_EXTRA_DE_hs95_MP +

  • Z_EXTRA_DE_hs95_MN

+

  • Z_EXTRA_DE_hs95_MA
  • ++
  • Z_EXTRA_DE_hs95_KU

(+) (-) Z_EXTRA_DE_hs95_KN ++(+) (-)

  • Z_EXTRA_DE_hs95_KA

harkita pohtia miettiä ajatella Extralinguistic features