Presenting the new General Service List:
Rationale, method, implications Vaclav Brezina, Dana Gablasova
Overview
1. Vocabulary lists: So what?
2. New GSL: compilation procedure
3. Basic English vocabulary: stable and new words
4. Looking ahead
West’s GSL: Why does it matter today?
1) Pedagogical purpose
2) Research purpose
1) Pedagogical purpose
Which of these words are useful for learners of English?
Nation (1990, 2001), Nation & Newton (1997: 239)
1) Pedagogical purpose (cont.)
98%
Ijima & Horie (2010); Beglar (2010); Matsuoka & Hirsh (2010); He & Sirinthon (2010); Coxhead, Stevens & Tinkle (2010); Matsumo, Tsutsumi, Matsuo & Gilbert (2010); Chon (2011); Lubliner & Hiebert (2011); Wang, Shen & Masataki (2011); Anderson & Platten (2011); Smith (2011); Millar, Budgell & Kwong (2011); Kokkinakis, Skoldberg & Henriksen (2012); Parent (2012); Fukushima, Watanabe, Kinjo, Yoshihara & Suzuki (2012); Webb & Nation (2012); Cuningham (2012); Coxhead (2012); Budgell (2013)
2) Research purpose
So why do we need a NEW GSL?
West’s (1953 [1936]) GSL
shilling footman milkmaid telegraph timely stoppage invaluable
West’s selection based on:
1) Frequency
2) Subjective criteria:
3) One corpus (1936)
4) Organising principle: word families
So why do we need a NEW GSL?
Task 1: types, lemmas, word families
dog, dogs, develop, develops, developed, developing, undeveloped, underdeveloped, development, developments, value, values, valuable, invaluable, train, trains, trained, training, trainer, trainers
Words: types, lemmas, word families
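The difference between the three counting units can be made concrete with the Task 1 word forms. The sketch below is only an illustration: the two mappings are hand-made assumptions (they ignore part of speech, so "training" is simply treated as a form of "train"), not the presenters' answer key.

```python
# Word forms (types) from Task 1.
TYPES = [
    "dog", "dogs", "develop", "develops", "developed", "developing",
    "undeveloped", "underdeveloped", "development", "developments",
    "value", "values", "valuable", "invaluable",
    "train", "trains", "trained", "training", "trainer", "trainers",
]

# Assumption: illustrative lemma for each inflected form (part of speech ignored).
LEMMA_OF = {
    "dog": "dog", "dogs": "dog",
    "develop": "develop", "develops": "develop",
    "developed": "develop", "developing": "develop",
    "undeveloped": "undeveloped", "underdeveloped": "underdeveloped",
    "development": "development", "developments": "development",
    "value": "value", "values": "value",
    "valuable": "valuable", "invaluable": "invaluable",
    "train": "train", "trains": "train", "trained": "train", "training": "train",
    "trainer": "trainer", "trainers": "trainer",
}

# Assumption: illustrative word-family headword for each lemma
# (a family also gathers derived forms, not just inflections).
FAMILY_OF = {
    "dog": "dog",
    "develop": "develop", "undeveloped": "develop",
    "underdeveloped": "develop", "development": "develop",
    "value": "value", "valuable": "value", "invaluable": "value",
    "train": "train", "trainer": "train",
}


def count_units(types):
    """Return (number of types, number of lemmas, number of word families)."""
    lemmas = {LEMMA_OF[t] for t in types}
    families = {FAMILY_OF[lemma] for lemma in lemmas}
    return len(set(types)), len(lemmas), len(families)


print(count_units(TYPES))  # (20, 10, 4)
```

Under these assumed mappings the same 20 word forms collapse into 10 lemmas but only 4 word families, which is why the choice of organising principle changes the apparent size of a wordlist so much.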
Our wordlist
Quantitative paradigm:
1) Frequency
2) Dispersion
3) Stability across corpora
Organising principle: lemma
ARF (average reduced frequency)
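ARF (average reduced frequency) folds frequency and dispersion into one score: a word that occurs evenly throughout a corpus keeps an ARF close to its raw frequency, while a word whose occurrences are bunched together is discounted. The function below is a minimal sketch of the commonly used definition, not the presenters' own implementation; it assumes the corpus is treated as one circular sequence of tokens and that `positions` holds the 0-based token offsets of the word (both names are illustrative).

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF = (1 / v) * sum(min(d_i, v)), where v = corpus_size / frequency
    and d_i are the distances between consecutive occurrences, with the
    first distance wrapping around the end of the corpus."""
    frequency = len(positions)
    if frequency == 0:
        return 0.0
    ordered = sorted(positions)
    v = corpus_size / frequency  # average distance if spread perfectly evenly
    # Cyclic distance from the last occurrence round to the first one.
    distances = [ordered[0] + corpus_size - ordered[-1]]
    # Distances between neighbouring occurrences.
    distances += [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(min(d, v) for d in distances) / v


# Same raw frequency (4) in a 100-token corpus, very different dispersion.
even = average_reduced_frequency([0, 25, 50, 75], 100)
clumped = average_reduced_frequency([10, 11, 12, 13], 100)
print(even, clumped)
```

With the evenly spread positions the score equals the raw frequency of 4, while the clumped word scores only slightly above 1, so ranking by ARF favours words that are both frequent and widely distributed.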
Method: RQs
items in different general language corpora?
corpora?
Method: Data
Corpus      Tokens         Period  Variety of English  Spoken component  Sample size                Texts          Sampled text-types
LOB         1.14 million   1961    British             NO                2k words of each text      500            15 genres of writing
BNC         112 million    1990s   British             YES (10%)         40-50k words of each text  4,049          Imaginative (20%) and informative (70%) writing + speech (10%)
BE06        1.15 million   2005-7  British             NO                2k words of each text      500            15 genres of writing
EnTenTen12  12.97 billion  2012    International       NO                whole documents included   21.55 million  www – a wide range
Method: Procedure
1. Creation of wordlists based on the four corpora (LOB, BNC, BE06, EnTenTen12).
2. Comparison of wordlists pairwise (RQ1).
3. Identification of a common lexical core among the four wordlists and extraction of the shared items (RQ2).
4. Identification of lexical items reflecting recent vocabulary changes in the English language based on BE06 and EnTenTen12.
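The four procedure steps map naturally onto set operations over ranked wordlists. The sketch below runs them on toy data; the scores, the cut-off of three items, and all function names are hypothetical. In particular, the rule used for step 4 (items shared by the two most recent corpora but absent from the core) is an assumption, since the slides do not state the exact criterion.

```python
from itertools import combinations


def top_n(scores, n):
    """Step 1: the n highest-scoring lemmas, best first (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:n]]


def pairwise_overlap(wordlists):
    """Step 2: number and share of items common to each pair of wordlists."""
    result = {}
    for (name_a, list_a), (name_b, list_b) in combinations(wordlists.items(), 2):
        shared = set(list_a) & set(list_b)
        result[(name_a, name_b)] = (len(shared), len(shared) / len(list_a))
    return result


def common_core(wordlists):
    """Step 3: items present in every wordlist."""
    return set.intersection(*(set(items) for items in wordlists.values()))


def recent_items(wordlists, recent_names, core):
    """Step 4 (assumed criterion): shared by all recent corpora, not in the core."""
    recent = set.intersection(*(set(wordlists[name]) for name in recent_names))
    return recent - core


# Toy scores standing in for per-corpus lemma scores (hypothetical values).
toy_scores = {
    "LOB": {"the": 9, "time": 7, "shilling": 5, "letter": 4},
    "BNC": {"the": 9, "time": 8, "letter": 5, "computer": 4},
    "BE06": {"the": 9, "time": 7, "computer": 6, "website": 5},
    "EnTenTen12": {"the": 9, "time": 8, "computer": 7, "website": 6},
}

wordlists = {name: top_n(scores, 3) for name, scores in toy_scores.items()}
overlaps = pairwise_overlap(wordlists)
core = common_core(wordlists)
innovations = recent_items(wordlists, ["BE06", "EnTenTen12"], core)

print(sorted(core))         # ['the', 'time']
print(sorted(innovations))  # ['computer']
```

In the actual study each list holds 3,000 lemmas rather than three, but the logic of intersecting the lists stays the same.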
Results
Wordlists compared           Shared items   Correlation
LOB-3000 / BNC-3000          2,497 (83.2%)  rs=.870, p<.001
LOB-3000 / BE06-3000         2,458 (81.9%)  rs=.832, p<.001
LOB-3000 / EnTenTen12-3000   2,352 (78.4%)  rs=.762, p<.001
BNC-3000 / BE06-3000         2,514 (83.8%)  rs=.870, p<.001
BNC-3000 / EnTenTen12-3000   2,428 (80.9%)  rs=.819, p<.001
BE06-3000 / EnTenTen12-3000  2,518 (83.9%)  rs=.826, p<.001
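Each pairwise result combines two figures: the share of the 3,000 items that two lists have in common, and a Spearman rank correlation (rs) over the items they share. The sketch below shows one way such figures could be computed. It assumes the shared items are re-ranked from 1 to n within each list before the classic no-ties formula is applied; that may differ from the procedure the presenters actually used, and the function names are illustrative.

```python
def shared_percentage(shared_count, list_size=3000):
    """Share of a wordlist that also appears in the other list, in percent."""
    return round(100 * shared_count / list_size, 1)


def spearman_on_shared(list_a, list_b):
    """Spearman's rs over the items two ranked lists have in common.

    Assumes each list is ordered best-first and contains no duplicates.
    Shared items are re-ranked 1..n inside each list, so there are no ties
    and rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    shared = set(list_a) & set(list_b)
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared items")
    order_a = [w for w in list_a if w in shared]
    order_b = [w for w in list_b if w in shared]
    rank_b = {w: r for r, w in enumerate(order_b, start=1)}
    d_squared = sum((r - rank_b[w]) ** 2 for r, w in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# The reported LOB-3000 / BNC-3000 overlap: 2,497 shared items.
print(shared_percentage(2497))  # 83.2

# Toy ranked lists: identical order on shared items, then fully reversed.
print(spearman_on_shared(["a", "b", "c", "d"], ["a", "b", "c", "e"]))  # 1.0
print(spearman_on_shared(["a", "b", "c"], ["c", "b", "a"]))            # -1.0
```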
Results (cont.)
Word class                   Overlap
nouns                        1009
verbs (+ modals)             488+10
adjectives                   317
adverbs                      166
conjunctions & prepositions  63
pronouns                     22
other                        47
TOTAL                        2,122
Lexical innovations: examples
New words (forms): Internet, website, online, email
New meanings/functions of old words: user, via, network, client, mobile, file, web
Old words with recent prominence: medium, phone, key, technology, guy, kid, environment, computer, movie, definitely
Lexical core: 2,122 items
Lexical innovations: 374 items
new-GSL: 2,496 lemmas
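The reported figures can be cross-checked with simple arithmetic: the word-class counts should add up to the lexical core, and the core plus the lexical innovations should give the size of the new-GSL. A small sketch of that check follows; the label "other" for the final 47 items is a placeholder assumption, because the slide text does not preserve that row's label.

```python
# Word-class breakdown of the shared lexical core, as reported on the slide.
WORD_CLASS_COUNTS = {
    "nouns": 1009,
    "verbs": 488,
    "modals": 10,
    "adjectives": 317,
    "adverbs": 166,
    "conjunctions & prepositions": 63,
    "pronouns": 22,
    "other": 47,  # placeholder label: the row name is not legible in the source
}

LEXICAL_INNOVATIONS = 374

lexical_core = sum(WORD_CLASS_COUNTS.values())
new_gsl_size = lexical_core + LEXICAL_INNOVATIONS

print(lexical_core)  # 2122
print(new_gsl_size)  # 2496
```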
Wordlist    Number of items
            Types   Lemmas  Word families
new-GSL     5,115   2,496
West’s GSL  7,826   4,114   2,000
Task 2
NOUNS (word, rank): cake, cloud, colour, computer, Internet, letter, lover, man, people, prayer, sex, society, time, TV, woman
ADJECTIVES (word, rank): afraid, bad, blue, good, key, new, red, stupid
A = first 500; B = 501 – 1000; C = 1001-2500
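The three bands used in Task 2 amount to a simple lookup on a word's rank in the new-GSL. A minimal sketch, with a hypothetical function name:

```python
def rank_band(rank):
    """Map a new-GSL rank to the Task 2 band: A (1-500), B (501-1000), C (1001-2500)."""
    if not 1 <= rank <= 2500:
        raise ValueError("rank must be between 1 and 2500")
    if rank <= 500:
        return "A"
    if rank <= 1000:
        return "B"
    return "C"


for rank in (45, 543, 1422):
    print(rank, rank_band(rank))
```

For example, a word ranked 45 falls in band A, one ranked 543 in band B, and one ranked 1422 in band C.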
NOUNS (new-GSL rank): 45, 79, 105, 198, 543, 546, 701, 724, 861, 1422, 1484, 2385, 2429, 2454, 2457
ADJECTIVES (new-GSL rank): 73, 87, 160, 304, 660, 712, 1018, 1878, 2367, 2438
Summary
Going back to the two target groups of users:
a) Practitioners: creating a usable interface
b) Researchers: American supplement (methodological questions/decisions to make)
Looking ahead
[Diagram: User input, Tree tagger, new-GSL, Lexical complexity OUTPUT]
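The planned tool takes user input, lemmatises it with a tagger, compares the lemmas with the new-GSL, and outputs a lexical complexity profile. The sketch below illustrates only the comparison step and makes several assumptions: the text is taken to be already lemmatised (no tagger is called), the wordlist is a three-item toy set, and the output measure, the share of tokens covered by the list, is a stand-in for whatever the tool will actually report.

```python
def new_gsl_coverage(lemmas, new_gsl):
    """Share of tokens whose lemma is on the wordlist (case-insensitive)."""
    if not lemmas:
        return 0.0
    wordlist = {item.lower() for item in new_gsl}
    covered = sum(1 for lemma in lemmas if lemma.lower() in wordlist)
    return covered / len(lemmas)


# Hypothetical, already-lemmatised input and a toy wordlist.
toy_new_gsl = {"the", "Internet", "be"}
toy_lemmas = ["the", "Internet", "be", "ubiquitous"]

print(new_gsl_coverage(toy_lemmas, toy_new_gsl))  # 0.75
```

A lower coverage figure would suggest a text with more vocabulary outside the general core, which is one simple way to read lexical complexity.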