



  1. Need and Competition
     Deconstructing Quantitative Productivity
     Anke Lüdeling, Berlin
     Marco Baroni, Bologna/Forlì
     Stefan Evert, Osnabrück

     Outline
     • qualitative and quantitative productivity
     • need
     • competition
     • the 'too much' data
     • need
     • competition

     Qualitative and quantitative productivity
     • In a generative model a morphological process is either possible (grammatical) or not (ungrammatical)
       → qualitative productivity, availability
     • morphologists have always wanted to express something like 'the ease with which a process can apply' (witness expressions like 'very productive', 'marginally productive' etc.)
       → quantitative productivity, profitability
     • Baayen 1989, 1992, Baayen & Lieber 1991, Plag 1999, Bauer 2001, Lüdeling & Evert 2003, Meibauer, Guttropf & Scherer 2004, Nishimoto 2004, …

     productivity measures
     • a number of measures have been proposed, based on proportion of unseen types to types or on number of restrictions (e.g., Booij 1977, Aronoff 1976)
     • these have been criticized on linguistic and on mathematical grounds
     • following the work of Harald Baayen (1989, 1992 etc.) productivity measures are proposed that are based on the distribution of types and tokens produced by a given word-formation process (most well-known: Baayen's P)

     productivity measures: the basic idea
     • select a word-formation process
     • count the types and tokens of all complex words in a given corpus (this already implies a lot of qualitative analysis, see Lüdeling, Evert & Heid 2000)
     • calculate a productivity measure (e.g., Baayen's P)
     • the measures rely on low-frequency types
     • basically: the more low-frequency types are generated by the wf process, the more productive it is, because low-frequency types indicate new formations

     productivity measures: frequency spectrum
     [figure not recovered]
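The basic idea on the slides above can be sketched in a few lines of Python. The sketch assumes the usual definition of Baayen's P as the number of hapax legomena divided by the number of tokens of the process (V_1 / N). The token list is invented toy data, not corpus counts, and the function names are hypothetical.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency m to V_m, the number of types occurring exactly m times."""
    type_counts = Counter(tokens)          # type -> token frequency
    return Counter(type_counts.values())   # frequency m -> number of types V_m


def baayen_p(tokens):
    """Baayen's P: hapax legomena divided by token count (V_1 / N)."""
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("P is undefined for an empty sample")
    return frequency_spectrum(tokens)[1] / n_tokens


# Invented toy sample of -itis formations (assumption, for illustration only)
toy_tokens = (
    ["Telefonitis"] * 3
    + ["Aufschieberitis"] * 2
    + ["Festivalitis", "Meetingitis"]
)

spectrum = frequency_spectrum(toy_tokens)
print(dict(sorted(spectrum.items())))  # number of types per frequency class
print(baayen_p(toy_tokens))            # 2 hapaxes out of 7 tokens
```

In this toy sample two of the four types are hapaxes, so P is 2/7. A real application would first need the qualitative filtering the slides mention.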

  2. productivity measures: criticism
     • mathematical: not possible to directly compare productivity measures for processes with different corpus sizes (fitting of models for extrapolation difficult)
       → discussed before, see Baayen 2001, Evert and Baroni 2005, Gaeta and Ricca (to appear)
     • empirical: measures dependent on size and design of corpus
       → discussed before, used as a measure in stylometry (Tweedie and Baayen 1998) and diachronic productivity studies (Scherer 2005)
     • linguistic: interpretation of the measure as purely linguistic and as inherent property of a single wf process
       → topic of this talk

     interpretation of productivity measures
     • "An important property of P is that it expresses in a very real sense the probability that new types will be encountered when the item sample is increased. [...] The main interest of P is that it is the quantitative formalization of the linguistic notion of productivity." (Baayen 1992, 115)
     • "We argue that a measure of productivity based on the token frequencies of types, specifically on the number of hapax legomena for a given affix in a corpus, comes very close to according with our intuitions about productivity." (Baayen & Lieber 1991, 801)

     linguistic problems of productivity measures
     • all measures of productivity rely on corpus counts and are interpreted as indices of the independent degree of linguistic productivity of a wf process
     • however: the corpus counts are influenced by a number of factors (even if we assume a balanced corpus)
     • the counts therefore reflect a 'mixture' of
       • need - extra-linguistic
       • competition - linguistic, sociolinguistic, psycholinguistic
       • persistence - psycholinguistic
       • 'inherent' productivity? - linguistic
       • ...

     need
     • corpus counts are influenced by the need to express a given thought/concept
       Die Möglichkeit zur Bildung von Zuss. aus zwei Substantiven ist unbegrenzt. Ob solche aber wirklich gebildet werden, hängt natürlich vom Bedürfnis ab (Paul 1920, 15)
       "The possibility to form noun-noun compounds is unlimited. Whether they are actually formed, however, depends on the need"
       Words are only formed as and when there is a need for them [. . . ] (Bauer 2001, 143)
     • the need to express something depends on fashion, the political situation etc. (Plag 1999)
       → extra-linguistic factors

     need and measures of productivity
     • typical interpretation: productivity of ri-
     • reflects the need (extralinguistic) mixed with the 'inherent productivity' (linguistic)
     • for single wf processes corpus counts do not reflect productivity

     competition
     • corpus counts are influenced by competition
     • any need can be expressed in (in principle infinitely) many ways, morphological and syntactic
     • not only competition in terms of truth-functional semantics: connotation, register, etc.
     • some of the realizations are closer to each other than others (competition cannot be modeled as random noise)
     • some are more likely than others
     • the likelihood of the competitors influences the likelihood of each process
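The mathematical criticism on this page (measures are not directly comparable across different corpus sizes) can be illustrated with a small sketch. It again assumes P = hapaxes / tokens and computes P on growing prefixes of one sample. The token sequence is invented toy data and `productivity_at` is a hypothetical helper, not a function from any of the cited works.

```python
from collections import Counter


def productivity_at(tokens, n):
    """Baayen's P (hapaxes / tokens) over the first n tokens of a sample."""
    if not 0 < n <= len(tokens):
        raise ValueError("n must be between 1 and len(tokens)")
    type_counts = Counter(tokens[:n])
    hapaxes = sum(1 for count in type_counts.values() if count == 1)
    return hapaxes / n


# Invented toy sequence of -wahn formations in order of occurrence (assumption)
toy_sample = [
    "Kaufwahn", "Sparwahn", "Kaufwahn", "Putzwahn",
    "Kaufwahn", "Sparwahn", "Kaufwahn", "Jugendwahn",
    "Sparwahn", "Kaufwahn", "Putzwahn", "Kaufwahn",
]

for n in (4, 8, 12):
    print(n, round(productivity_at(toy_sample, n), 3))
```

For this sample P falls from 0.5 at 4 tokens to 0.25 at 8 tokens and to about 0.083 at 12 tokens, although the process is the same throughout. Two P values taken at different sample sizes therefore cannot simply be compared.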

  3. aside: competition in linguistics
     • competition among well-formed objects plays a role in many linguistic fields (typically not in generative linguistics proper):
       • historical linguistics: language change, variation
       • sociolinguistics: dialects, registers, variation
       • morphology: type blocking, token blocking (Plag 1999 → no genuine competition in wf)

     aside: competition in linguistics
     • Optimality Theory
       • competition between constraints
       • competition between candidates to find the optimal one
         - most candidates not well-formed
       • mainly descriptive, mostly no fully worked-out mathematical model of competition
     • Minimalism
       • principles of economy

     'inherent' productivity
     • does it exist?
     • how can we go about studying it?
       • find morphological processes that express the same need (qualitative)
       • select suitable corpus
       • find instances of the processes in the corpus
       • develop a model to account for their distribution (we are still working on this!)

     the 'too much' corpus
     • the 'too much' data
     • need
     • competition

     find morphological processes expressing the same need
     • must pertain to very specific need
     • relatively 'rare' wf processes
     • candidate instances of wf processes must be easy to spot by automated means
     • the 'too much' data: several word formation processes that express the notion that somebody is doing too much of something and have an 'illness' connotation
     • all instances of compounding

     the 'too much' heads
     • non-medical -itis, as in Telefonitis 'using the telephone too much'
     • wahn, as in Abbawahn 'playing too much music by Abba'
     • hysterie, as in Absicherungshysterie 'worrying too much about security'
     • zwang, as in Ausgehzwang 'having to go out too often'
     • sucht, as in Ausstattungssucht 'using too much equipment (in a movie)'
     • besessenheit, as in Besitzbesessenheit 'being obsessed about one's possessions'
     • obsession, as in Computerobsession 'being obsessed about computers'
     • manie/mania, as in Handymanie 'using the mobile too much'

  4. selecting a suitable corpus
     • we need a large corpus (Lüdeling & Evert 2005)
     • deWaC: more than 1.5 billion tokens of German from the Web (Baroni & Kilgarriff 2006)

     collecting the data
     • all potential forms in corpus extracted with regular expressions
     • de-duping, clumpiness effects
     • manual preprocessing necessary
       • noise
       • semantics

     collecting the data: noise
     • the regular expressions find words that are not built by the targeted wf processes
       → these have to be thrown out
     • typos in the data that can be clearly recognized are normalized:
       Effizienswahn / Effizienzwahn 'obsessing about efficiency'

     collecting the data: other readings
     • all heads have medical readings
       → have to be thrown out
       -itis 'inflammation', as in Arthritis 'inflammation of the joints'
       sucht 'addiction', as in Drogensucht 'drug addiction'
     • with all heads we find compounds that have readings other than the "too much" reading
       → have to be thrown out
       Behördenzwang 'force by the authorities'
       Medienhysterie 'hysteria caused by the media'

     competition 1: categorical
     [table; cell marks lost in extraction]
     columns: besessenheit, hysterie, -itis, manie, obsession, sucht, wahn, zwang
     rows: simplex N, complex N, deverbal N, V, Adj, neocl, Engl

     competition 2: in context
     • is there competition in a given context?
     • speaker's perspective: is there a choice between several options to express the same concept?
     • comparable contexts in the data (our analysis)
     • very small Web-experiment (10 participants) with 'too-much' contexts and specific contexts, ratings from 1 (very good) to 6 (unacceptable)
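The extraction and clean-up steps under "collecting the data" might look roughly like the following Python sketch. The regular expression, the helper name `extract_candidates` and the toy sentences are assumptions for illustration only. The slides do not give the actual expressions used on deWaC. The exclusion and normalization entries are limited to the examples named on the slides. In practice this filtering was done manually.

```python
import re
from collections import Counter

# The 'too much' heads listed in the talk
HEADS = ["itis", "wahn", "hysterie", "zwang", "sucht",
         "besessenheit", "obsession", "manie", "mania"]

# Assumed pattern: a capitalised word with at least one further letter
# before one of the heads, so that simplex nouns like "Sucht" are skipped.
CANDIDATE = re.compile(
    r"\b[A-ZÄÖÜ][A-Za-zÄÖÜäöüß]+?(?:" + "|".join(HEADS) + r")\b"
)

# Clearly recognizable typo from the slides, mapped to its normal form
NORMALIZE = {"Effizienswahn": "Effizienzwahn"}

# Forms without the 'too much' reading (examples from the slides)
EXCLUDE = {"Arthritis", "Drogensucht", "Behördenzwang", "Medienhysterie"}


def extract_candidates(lines):
    """Count candidate formations, skipping verbatim duplicate lines."""
    counts = Counter()
    seen = set()
    for line in lines:
        if line in seen:  # crude de-duping of repeated Web text
            continue
        seen.add(line)
        for match in CANDIDATE.finditer(line):
            word = NORMALIZE.get(match.group(0), match.group(0))
            if word not in EXCLUDE:
                counts[word] += 1
    return counts


# Invented toy input (assumption, not deWaC data)
toy_lines = [
    "Die Telefonitis im Büro nimmt zu.",
    "Die Telefonitis im Büro nimmt zu.",
    "Der Effizienswahn und der Effizienzwahn der Chefs.",
    "Arthritis ist keine Telefonitis.",
    "Medienhysterie und Computerobsession.",
]

counts = extract_candidates(toy_lines)
print(counts.most_common())
```

The pattern deliberately over-generates: it also finds Arthritis and Medienhysterie, which is why the exclusion list stands in for the manual preprocessing step.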
