SLIDE 1

LEXICAL PRODUCTIVITY: THEORETICAL ISSUES AND QUANTITATIVE MEASURES

ISABELLA CHIARI

Dipartimento di Studi Filologici, Linguistici e Letterari Università La Sapienza di Roma isabella.chiari@uniroma1.it

Chiari, I. Lexical Productivity: Theoretical Issues and Quantitative Measures (QITL2, 2006)

Some references

  • Kocourek, R. 1996. The prefix post- in contemporary English terminology. Terminology, 3(1), pp. 85-110.
  • Nakagawa, H. 2000. Automatic term recognition based on statistics of compound nouns. Terminology, 6(2), pp. 195-210.
  • Diaz Vera, J. 2003. Lexical and Non Lexical Linguistic Variation in the Vocabulary of Old English. Atlantis, 21(1), pp. 29-30.
  • Kageura, K. 2004. Quantitative Portraits of Lexical Elements. In S. Ananiadou & P. Zweigenbaum (eds.), COLING 2004 CompuTerm 2004: 3rd International Workshop on Computational Terminology, Geneva: COLING, pp. 75-8.
  • Bolasco, S. 2005. “Statistica testuale e text mining: alcuni paradigmi applicativi”. Quaderni di Statistica, 7, pp.

SLIDE 2

Potential lexical productivity (Kageura)

  • In the lexicological sphere, “which correspond to theoretical sphere of discourse as represented by the given document set”
  • “d(i) = how many compounds t can potentially make”

Productivity (Bolasco)

“capability of producing different forms from a specific lexeme or root”

  • the more frequent a lexeme is in a text, the more probable is the occurrence of its derivations in that text
  • Application on proper names

SLIDE 3

What lexical productivity?

  • Connected forms: derivations, compounding, abbreviations, conversions, blendings, complex lexemes and idioms, including recursive classes of the previous types

  • Textual typologies
  • Diachronic trends

Power of producing connected forms by any word formation process, in a given period of time, in a given set of texts.

[Figure: connected forms of the base FILM. Prefixed elements: anti-, crito-, fanta-, inter-, maxi-, mega-, meta-, contro-, mini-, tele-, porno-, radio-, super-, video-, prime-, auto-, bio-, in-, pre-. Suffixed or compounded elements: -are, -ografia, -maker, -balletto, -culto, -documentario, -documento, -inchiesta, -opera, -scandalo, -tv, -verità, -abile, -ato, -ino, -ico, -(a)mente, -ità, -accio, -one, -etto, -ina, -istico, -izzazione, -(o)teca, -(o)logo, -logia, -geno, -s.]

SLIDE 4

Why lexical productivity?

  • Loanwords monitoring, integration and adaptation
  • Neologisms monitoring and integration
  • Term representativeness and keyword extraction
  • Data mining, specificity indexes
  • Lexicographic (statistical) profiling (headwords selection and description)

Critical issues

  • Communication needs (Martinet)
  • Competition (synonymy)
  • Specific semantic field and domain
  • Socio-cultural role in the community
  • Marginality vs. centrality

SLIDE 5

Two perspectives

  • Synchronic lexical productivity: lexeme usage, through assessment, in a given corpus, of types and tokens of connected forms
  • Diachronic lexical productivity: trends in lexical productivity observed at regular intervals, thus taking into account possible variation of specific connected form usage

Indicators

  • LEX. PROD.: the total number of new lexemes produced by any word formation process
  • TYPE PROD.: the total number of types (inflected forms) produced
  • TOKEN PROD.: the total number of tokens (frequency) of all the connected forms

LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD  TOT (LOAN+CF)  % TOKEN PROD
film         115,818        58         93       8,416        124,234             7
bar            9,971         8         12       1,066         11,037            10
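
The three indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Indicators`, `indicators` and `percent_token_prod`, the input layout (connected lexeme mapped to its inflected forms and their frequencies) and the toy counts for filmare and filmino are invented here, not taken from the study. The slide does not define the percentage column; reading it as TOKEN PROD divided by TOT (LOAN+CF) is an inference that matches the film and bar figures after rounding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Indicators:
    lex_prod: int    # number of connected lexemes
    type_prod: int   # number of inflected forms (types)
    token_prod: int  # summed frequency of all connected forms


def indicators(connected: Dict[str, Dict[str, int]]) -> Indicators:
    """connected maps each connected lexeme to {inflected form: frequency}."""
    lex_prod = len(connected)
    type_prod = sum(len(forms) for forms in connected.values())
    token_prod = sum(sum(forms.values()) for forms in connected.values())
    return Indicators(lex_prod, type_prod, token_prod)


def percent_token_prod(lex_freq: int, token_prod: int) -> float:
    """Assumed reading of the % column: TOKEN PROD / (LEX FREQ + TOKEN PROD)."""
    total = lex_freq + token_prod
    return 100.0 * token_prod / total if total else 0.0


# Toy input: the counts are invented for illustration only.
toy = {
    "filmare": {"filmare": 4, "filmato": 3, "filmano": 1},
    "filmino": {"filmino": 2, "filmini": 5},
}
print(indicators(toy))  # Indicators(lex_prod=2, type_prod=5, token_prod=15)

# Figures from the slide: LEX FREQ and TOKEN PROD for film and bar.
print(round(percent_token_prod(115_818, 8_416), 1))  # 6.8 (shown as 7)
print(round(percent_token_prod(9_971, 1_066), 1))    # 9.7 (shown as 10)
```

Keeping the base frequency (LEX FREQ) separate from the connected-form counts makes it easy to compare a loanword's own usage with the usage of the forms built on it.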

SLIDE 6

Application on loanwords’ integration

Lexical borrowing

  • Integration
  • Adaptation
  • Nativization
  • How can we interpret lexical productivity as an index of loanwords’ integration?
  • What are the main productivity trends that can be inferred from data?
  • How do they correlate with other factors influencing integration?

Examples on loan word selection

  • The 50 loanwords with the highest frequency of usage in VELI (Vocabolario elettronico della lingua italiana. Il vocabolario del 2000, edited by T. De Mauro, IBM Italia, Milano 1989)
  • VELI contains lexemes already attested before 1990 (the starting date of the Rep90 corpus)

SLIDE 7

Inclusion/selection criteria

Loanwords were included if:

a) they were attested in GRADIT (Grande dizionario italiano dell’uso, De Mauro, 1999-2003, UTET);
b) they were included both in the usage list and in the frequency list of VELI;
c) they were simple bases or direct applications of word formation processes (not only club was included, but also management, leader, network);
d) they were derivational forms in the top 50 whose simple stem was not: the great majority of the loanwords are simple stems, but in this case the derivational form was included (as with marketing and market).

Exclusion criteria

Loanwords were excluded if:

a) they were homographs of other loanwords present in the corpus, as attested in GRADIT (as with golf, “pullover” and “sport”);
b) they were commonly present in proper names and entity names (as with bank, city);
c) they were composite expressions (as with made in Italy);
d) their connected forms could not be clearly distinguished from originally Italian words (as with import and “importare”);
e) they were derivational forms whose stem was already in the selected list (as with designer and design, or leadership and leader): so design and leader were included.
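
The selection procedure can be sketched as a filter over a frequency-ranked candidate list. This is a minimal sketch under assumptions, not the author's procedure: the `Candidate` record, its flag names and the `select` function are hypothetical, the flags would have to be set by hand from GRADIT and VELI, and the order of the example list is invented. Only the example words and the reasons for keeping or dropping them come from the criteria above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    form: str
    stem: Optional[str] = None    # base loanword, if this is a derived form
    homograph: bool = False       # exclusion a)
    proper_name: bool = False     # exclusion b)
    composite: bool = False       # exclusion c)
    native_overlap: bool = False  # exclusion d)


def select(ranked: List[Candidate], limit: int = 50) -> List[str]:
    """Filter candidates that are already ranked by descending frequency."""
    eligible = [
        c for c in ranked
        if not (c.homograph or c.proper_name or c.composite or c.native_overlap)
    ]
    top = eligible[:limit]
    kept_forms = {c.form for c in top}
    # Exclusion e): drop a derived form when its stem is itself in the list.
    # A derived form whose stem is absent stays (inclusion criterion d).
    # Note: the result can be shorter than `limit` after this step.
    return [c.form for c in top if c.stem is None or c.stem not in kept_forms]


# Illustrative ranking; the order is invented, the words are from the slides.
ranked = [
    Candidate("design"),
    Candidate("golf", homograph=True),
    Candidate("designer", stem="design"),
    Candidate("marketing", stem="market"),
    Candidate("bank", proper_name=True),
    Candidate("made in Italy", composite=True),
    Candidate("import", native_overlap=True),
    Candidate("leader"),
    Candidate("leadership", stem="leader"),
]
print(select(ranked))  # ['design', 'marketing', 'leader']
```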

SLIDE 8

Rep90 corpus

  • Large newspaper corpus Rep90, built from texts extracted from the daily newspaper “La Repubblica” (1990-1999), kindly made available by Sergio Bolasco
  • Total corpus: more than 270 million running words (more than 20 million occurrences per year)
  • The list produced by Bolasco includes 291,649 inflected forms
  • TALTAC (v.1) text mining, manual processing and cleaning-up

[Figure: Total occurrences of base and of connected forms. Chart of loanword frequency with the series LEX FREQ and TOKEN PROD for film, computer, leader, sport, premier, record, manager, club, partner, boss, rock, pool, spot, test, bar, business, holding, killer, sponsor, clan, show, caffè, dossier, tour, boom.]

SLIDE 9

LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD      TOT  % TOKEN PROD
SPORT         21,716        34         55      21,783   43,499          50.1
SPONSOR        8,996        30         70       5,029   14,025          35.9
STOP           4,145         8         21       2,138    6,283            34
SHOCK          2,243         7         19       1,070    3,313            32
DESIGN         1,850         2          3         688    2,538          27.1
STRESS         3,771         8         34       1,225    4,996            25
ROBOT          1,250        16         30         344    1,594          21.6
SPOT          11,102        13         15          87   11,189           0.8
DOSSIER        7,272         2          3          27    7,299           0.4
LOOK           1,744         1          1           6    1,750             -
MEETING        1,932         1          1           3    1,935             -
POOL          11,400         3          3          15   11,415             -
RAID           2,843         1          1           2    2,845             -
MANAGEMENT     4,877         -          -           -    4,877             -
STAFF          4,281         -          -           -    4,281             -

LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD      TOT  % TOKEN PROD
FILM         115,818        58         93       8,416  124,234           6.8
SPONSOR        8,996        30         70       5,029   14,025          35.9
SPORT         21,716        34         55      21,783   43,499          50.1
COMPUTER      17,938        30         48       1,461   19,399           7.5
CLUB          15,529        28         36         434   15,963           2.7
SPOT          11,102        13         15          87   11,189           0.8
DESIGN         1,850         2          3         688    2,538          27.1
MANAGEMENT     4,877         -          -           -    4,877             -
STAFF          4,281         -          -           -    4,281             -
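
The figures on this slide suggest that the indicators rank loanwords differently. The sketch below re-sorts nine of the rows (LEX FREQ, TYPE PROD and TOKEN PROD as given on the slide) by three keys. It is an illustration only: the variable names are invented, the empty cells for management and staff are assumed to mean zero, and the share is recomputed as TOKEN PROD over the total, which is an inferred reading of the % column.

```python
# (loanword, lex_freq, type_prod, token_prod), values as given on the slide.
# Assumption: the empty cells for management and staff are treated as 0.
rows = [
    ("film", 115_818, 93, 8_416),
    ("sponsor", 8_996, 70, 5_029),
    ("sport", 21_716, 55, 21_783),
    ("computer", 17_938, 48, 1_461),
    ("club", 15_529, 36, 434),
    ("spot", 11_102, 15, 87),
    ("design", 1_850, 3, 688),
    ("management", 4_877, 0, 0),
    ("staff", 4_281, 0, 0),
]


def share(row):
    """Percentage of all occurrences that belong to connected forms."""
    _, lex_freq, _, token_prod = row
    return 100.0 * token_prod / (lex_freq + token_prod)


def ranking(key):
    """Loanwords sorted by the given key, highest first."""
    return [r[0] for r in sorted(rows, key=key, reverse=True)]


by_freq = ranking(lambda r: r[1])   # base frequency
by_types = ranking(lambda r: r[2])  # number of inflected connected forms
by_share = ranking(share)           # relative weight of connected forms

print(by_freq[:3])   # ['film', 'sport', 'computer']
print(by_types[:3])  # ['film', 'sponsor', 'sport']
print(by_share[:3])  # ['sport', 'sponsor', 'design']
```

In this subset film leads on raw frequency and on the number of types, while sport, sponsor and design lead once connected forms are weighed against the base frequency.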

SLIDE 10

Interconnections

  • Ability for a lexeme to be inserted in recipient language morphological processes
  • Ability to import donor language morphological processes
  • Ability to develop new senses
  • Dispersion over text typologies and contents
  • Representativeness, key-wordliness
  • Competition (formal, semantic, syntactic)
  • Ability to be integrated into the phonological and phonotactic system of the recipient language (RL)

Open issues and further research

  • Language internal and language external constraints
  • Textual typology and register
  • Diachronic trends
  • Classification of different word classes
  • Applications in corpus linguistics and computational linguistics