Presenting the new General Service List: Rationale, method, implications



slide-1
SLIDE 1

Presenting the new General Service List: Rationale, method, implications

Vaclav Brezina, Dana Gablasova

slide-2
SLIDE 2
Overview

  • 1. Vocabulary lists: So what?
  • 2. New GSL: compilation procedure
  • 3. Basic English vocabulary: stable and new words
  • 4. Looking ahead

slide-3
SLIDE 3

West’s GSL: Why does it matter today?

1) Pedagogical purpose
2) Research purpose

slide-4
SLIDE 4


1) Pedagogical purpose

Which of these words are useful for learners of English?

slide-5
SLIDE 5

1) Pedagogical purpose (cont.)

98%

Nation (1990, 2001), Nation & Newton (1997: 239)

slide-6
SLIDE 6

2) Research purpose

Ijima & Horie (2010); Beglar (2010); Matsuoka & Hirsh (2010); He & Sirinthon (2010); Coxhead, Stevens & Tinkle (2010); Matsumo, Tsutsumi, Matsuo & Gilbert (2010); Chon (2011); Lubliner & Hiebert (2011); Wang, Shen & Masataki (2011); Anderson & Platten (2011); Smith (2011); Millar, Budgell & Kwong (2011); Kokkinakis, Skoldberg & Henriksen (2012); Parent (2012); Fukushima, Watanabe, Kinjo, Yoshihara & Suzuki (2012); Webb & Nation (2012); Cuningham (2012); Coxhead (2012); Budgell (2013)

slide-7
SLIDE 7

So why do we need a NEW GSL?


slide-8
SLIDE 8


West’s (1953 [1936]) GSL

shilling footman milkmaid telegraph timely stoppage invaluable

slide-9
SLIDE 9

So why do we need a NEW GSL?

West’s selection based on:

1) Frequency
2) Subjective criteria:

  • Ease of learning
  • Necessity
  • Neutrality

3) One corpus (1936)
4) Organising principle: word families

slide-10
SLIDE 10


Task 1: types, lemmas, word families

dog, dogs, develop, develops, developed, developing, undeveloped, underdeveloped, development, developments, value, values, valuable, invaluable, train, trains, trained, training, trainer, trainers

slide-11
SLIDE 11
Words: types, lemmas, word families

Word families (4):

  • 1. dog, dogs
  • 2. develop, develops, developed, developing, undeveloped, underdeveloped, development, developments
  • 3. value, values, valuable, invaluable
  • 4. train, trains, trained, training, trainer, trainers

Types (20):

  • 1. dog
  • 2. dogs
  • 3. develop
  • 4. develops
  • 5. developed
  • 6. developing
  • 7. undeveloped
  • 8. underdeveloped
  • 9. development
  • 10. developments
  • 11. value
  • 12. values
  • 13. valuable
  • 14. invaluable
  • 15. train
  • 16. trains
  • 17. trained
  • 18. training
  • 19. trainer
  • 20. trainers

Lemmas (13):

  • 1. dog, dogs
  • 2. develop, develops, developed, developing
  • 3. developing (ADJ)
  • 4. undeveloped
  • 5. underdeveloped
  • 6. development, developments
  • 7. value, values
  • 8. valuable
  • 9. invaluable
  • 10. train, trains (NOUN)
  • 11. train, trains, trained, training (VERB)
  • 12. training (NOUN)
  • 13. trainer, trainers
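
The three ways of counting the same 20 word forms can be checked with a small sketch. The dictionaries below are a hand-made annotation that follows the groupings on this slide; reading the two "train" lemmas as noun and verb is an assumption, and the names `LEMMA_FORMS` and `FAMILY_OF` are illustrative only.

```python
# Lemma -> word forms, following the 13-item lemma grouping on the slide.
# The NOUN/VERB labels on "train" are an assumption made for this sketch.
LEMMA_FORMS = {
    "dog": ["dog", "dogs"],
    "develop": ["develop", "develops", "developed", "developing"],
    "developing (ADJ)": ["developing"],
    "undeveloped": ["undeveloped"],
    "underdeveloped": ["underdeveloped"],
    "development": ["development", "developments"],
    "value": ["value", "values"],
    "valuable": ["valuable"],
    "invaluable": ["invaluable"],
    "train (NOUN)": ["train", "trains"],
    "train (VERB)": ["train", "trains", "trained", "training"],
    "training (NOUN)": ["training"],
    "trainer": ["trainer", "trainers"],
}

# Lemma -> word-family headword, following the 4-item family grouping.
FAMILY_OF = {
    "dog": "dog",
    "develop": "develop",
    "developing (ADJ)": "develop",
    "undeveloped": "develop",
    "underdeveloped": "develop",
    "development": "develop",
    "value": "value",
    "valuable": "value",
    "invaluable": "value",
    "train (NOUN)": "train",
    "train (VERB)": "train",
    "training (NOUN)": "train",
    "trainer": "train",
}

# A type is a distinct word form, so forms shared by two lemmas count once.
types = {form for forms in LEMMA_FORMS.values() for form in forms}
lemmas = set(LEMMA_FORMS)
families = {FAMILY_OF[lemma] for lemma in lemmas}

print(len(types), len(lemmas), len(families))  # 20 13 4
```

The same material therefore yields 20, 13 or 4 items depending on the counting unit, which is why the organising principle of a wordlist affects its reported size.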
slide-12
SLIDE 12

Our wordlist

Quantitative paradigm:

1) Frequency
2) Dispersion
3) Stability across corpora

Organising principle: Lemma

ARF
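
ARF is read here as average reduced frequency, a measure that discounts a word's raw frequency when its occurrences are clumped together rather than spread through the corpus. The sketch below follows the commonly cited definition (cyclic distances between occurrences, each capped at the average interval N/f); the function name, the toy positions and the corpus size of 100 are assumptions for illustration, not the authors' implementation.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF for one word, given the token positions where it occurs.

    positions   -- 0-based token offsets of the word in the corpus
    corpus_size -- total number of tokens in the corpus (N)
    """
    f = len(positions)
    if f == 0:
        return 0.0
    positions = sorted(positions)
    v = corpus_size / f  # average interval between occurrences
    # Distance from the last occurrence back round to the first (cyclic),
    # followed by the distances between neighbouring occurrences.
    distances = [positions[0] + corpus_size - positions[-1]]
    distances += [b - a for a, b in zip(positions, positions[1:])]
    # Each distance contributes at most one full interval.
    return sum(min(d, v) for d in distances) / v


# Four occurrences spread evenly through a 100-token corpus.
even = average_reduced_frequency([0, 25, 50, 75], 100)
# Four occurrences clumped at the very beginning.
clumped = average_reduced_frequency([0, 1, 2, 3], 100)
print(even, clumped)
```

With even spacing the value equals the raw frequency (4.0); with all four occurrences adjacent it drops to 1.12, so a single measure reflects both frequency and dispersion.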

slide-13
SLIDE 13
Method: RQs

  • RQ1: Is there a substantial overlap between frequent lexical items in different general language corpora?
  • RQ2: What is the lexical core common to different language corpora?

slide-14
SLIDE 14

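Method: Data

LOB

  • Tokens: 1.14 million
  • Period: 1961
  • Variety of English: British
  • Spoken component: NO
  • Sample size: 2k words of each text
  • No. of texts: 500
  • Sampled text-types: 15 genres of writing

BNC

  • Tokens: 112 million
  • Period: 1990s
  • Variety of English: British
  • Spoken component: YES (10%)
  • Sample size: 40-50k words of each text
  • No. of texts: 4,049
  • Sampled text-types: Imaginative (20%) and informative (70%) writing + speech (10%)

BE06

  • Tokens: 1.15 million
  • Period: 2005-7
  • Variety of English: British
  • Spoken component: NO
  • Sample size: 2k words of each text
  • No. of texts: 500
  • Sampled text-types: 15 genres of writing

EnTenTen12

  • Tokens: 12.97 billion
  • Period: 2012
  • Variety of English: International
  • Spoken component: NO
  • Sample size: whole documents included
  • No. of texts: 21.55 million
  • Sampled text-types: www: a wide range of documents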
slide-15
SLIDE 15

Method: Procedure

  • 1. Creation of wordlists based on the four corpora (LOB, BNC, BE06, EnTenTen12).
  • 2. Comparison of wordlists pairwise (RQ1).
  • 3. Identification of a common lexical core among the four wordlists and extraction of the shared items (RQ2).
  • 4. Identification of lexical items reflecting recent vocabulary changes in the English language based on BE06 and EnTenTen12.
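
The four steps of the procedure map onto simple set operations, sketched below. The four tiny wordlists are invented toy data, not the real top-3,000 lists, and the rule used for step 4 (items shared by BE06 and EnTenTen12 but absent from the core) is an assumption, since the slide does not state the exact criterion.

```python
from itertools import combinations

# Step 1: one wordlist per corpus (toy data for illustration only).
wordlists = {
    "LOB": {"time", "man", "letter", "shilling"},
    "BNC": {"time", "man", "letter", "computer"},
    "BE06": {"time", "man", "computer", "website"},
    "EnTenTen12": {"time", "man", "computer", "website"},
}

# Step 2: pairwise comparison, counting the items two lists share.
pairwise = {
    (a, b): len(wordlists[a] & wordlists[b])
    for a, b in combinations(wordlists, 2)
}

# Step 3: the common lexical core is what all four lists share.
core = set.intersection(*wordlists.values())

# Step 4 (assumed rule): items in both recent corpora that are not in the core.
recent = (wordlists["BE06"] & wordlists["EnTenTen12"]) - core

print(pairwise)
print(sorted(core), sorted(recent))
```

In the toy data the core is `time` and `man`, while `computer` and `website` surface as recent items.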

slide-16
SLIDE 16

Results

Pairwise overlap between wordlists: shared items (% of 3,000), rs, p

  • LOB-3000 / BNC-3000: 2,497 (83.2%), rs = .870, p < .001
  • LOB-3000 / BE06-3000: 2,458 (81.9%), rs = .832, p < .001
  • LOB-3000 / EnTenTen12-3000: 2,352 (78.4%), rs = .762, p < .001
  • BNC-3000 / BE06-3000: 2,514 (83.8%), rs = .870, p < .001
  • BNC-3000 / EnTenTen12-3000: 2,428 (80.9%), rs = .819, p < .001
  • BE06-3000 / EnTenTen12-3000: 2,518 (83.9%), rs = .826, p < .001
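
The two figures reported for each pair of wordlists can be reproduced in outline as follows. The percentage is the shared-item count over 3,000; rs is taken to be Spearman's rank correlation computed over the shared items, which is an assumption about how the authors proceeded. The toy ranks and function names are illustrative, and the formula used assumes no tied ranks.

```python
def overlap_percentage(shared_items, list_size=3000):
    """Share of a wordlist that also appears in the other list, in percent."""
    return round(100 * shared_items / list_size, 1)


def spearman_rs(ranks_a, ranks_b):
    """Spearman's rank correlation over the items two ranked lists share.

    ranks_a, ranks_b -- dicts mapping word -> rank in each list (no ties).
    """
    shared = sorted(set(ranks_a) & set(ranks_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared items")

    def rerank(ranks):
        # Re-rank the shared items 1..n in the order they hold in one list.
        ordered = sorted(shared, key=lambda word: ranks[word])
        return {word: i + 1 for i, word in enumerate(ordered)}

    ra, rb = rerank(ranks_a), rerank(ranks_b)
    d_squared = sum((ra[w] - rb[w]) ** 2 for w in shared)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Toy ranked lists (invented), sharing four of their five items.
list_a = {"time": 1, "people": 2, "man": 3, "letter": 4, "shilling": 5}
list_b = {"time": 1, "man": 2, "people": 3, "computer": 4, "letter": 5}

print(overlap_percentage(2497))     # 83.2, the LOB/BNC figure
print(spearman_rs(list_a, list_b))  # approximately 0.8 for the toy lists
```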

slide-17
SLIDE 17

Results (cont.)

Overlap by word class:

  • nouns: 1009
  • verbs (+ modals): 488+10
  • adjectives: 317
  • adverbs: 166
  • conjunctions & prepositions: 63
  • pronouns: 22
  • other (gram. words): 47
  • TOTAL: 2,122 (71%)

Lexical innovations, with examples:

  • New words (forms): Internet, website, online, email
  • New meanings/functions of old words: user, via, network, client, mobile, file, web
  • Old words with recent prominence: medium, phone, key, technology, guy, kid, environment, computer, movie, definitely

slide-18
SLIDE 18

new-GSL

  • Lexical core: 2,122 items
  • Lexical innovations: 374 items
  • new-GSL: 2,496 lemmas

Number of items per wordlist:

  • new-GSL: 5,115 types; 2,496 lemmas
  • West’s GSL: 7,826 types; 4,114 lemmas; 2,000 word families
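
The totals on the results slides can be cross-checked with a few lines of arithmetic. The counts are copied from the slides; treating 71% as the share of a 3,000-item wordlist is an assumption based on the "-3000" list names.

```python
# Word-class counts of the items shared by all four wordlists.
overlap_by_class = {
    "nouns": 1009,
    "verbs": 488,
    "modals": 10,
    "adjectives": 317,
    "adverbs": 166,
    "conjunctions & prepositions": 63,
    "pronouns": 22,
    "other (gram. words)": 47,
}

lexical_core = sum(overlap_by_class.values())
# Assumed denominator: the 3,000 items of each corpus wordlist.
core_share = round(100 * lexical_core / 3000, 1)

lexical_innovations = 374
new_gsl_lemmas = lexical_core + lexical_innovations

print(lexical_core, core_share, new_gsl_lemmas)  # 2122 70.7 2496
```

The word-class counts sum to the 2,122-item core, which is about 71% of 3,000, and adding the 374 innovations gives the 2,496 lemmas of the new-GSL.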

slide-19
SLIDE 19

Task 2

A = first 500; B = 501-1000; C = 1001-2500

NOUNS (word, rank): cake, cloud, colour, computer, Internet, letter, lover, man, people, prayer, sex, society, time, TV, woman

ADJECTIVES (word, rank): afraid, bad, blue, good, key, new, old, organic, red, stupid

slide-20
SLIDE 20

NOUNS (word: new-GSL rank)

  • 1. TIME: 45
  • 2. PEOPLE: 79
  • 3. MAN: 105
  • 4. WOMAN: 198
  • 5. society: 543
  • 6. letter: 546
  • 7. Internet: 701
  • 8. colour: 724
  • 9. computer: 861
  • 10. sex: 1422
  • 11. TV: 1484
  • 12. cloud: 2385
  • 13. lover: 2429
  • 14. prayer: 2454
  • 15. cake: 2457

ADJECTIVES (word: new-GSL rank)

  • 1. GOOD: 73
  • 2. NEW: 87
  • 3. OLD: 160
  • 4. BAD: 304
  • 5. key: 660
  • 6. red: 712
  • 7. blue: 1018
  • 8. afraid: 1878
  • 9. organic: 2367
  • 10. stupid: 2438

slide-21
SLIDE 21


slide-22
SLIDE 22
Summary

  • LOB, BNC, BE06, EnTenTen12
  • 71% overlap: lexical core + new items
  • new GSL: 1) frequency 2) dispersion 3) stability across corpora
  • new GSL: same coverage with half of the lemmas

slide-23
SLIDE 23

Looking ahead

Going back to the two target groups of users:

a) Practitioners: creating usable interface
b) Researchers: American supplement (methodological questions/decisions to make)

slide-24
SLIDE 24

Text box: User input

Tree tagger

new-GSL

Lexical complexity OUTPUT
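
One way to picture the planned tool is as a lookup of tagged, lemmatised input against new-GSL ranks. The sketch below assumes the input has already been lemmatised (standing in for the tagger step) and reports the share of tokens in each rank band (A = first 500, B = 501-1000, C = 1001-2500) plus off-list items. The output format, the function names and the six-word rank table (ranks taken from the Task 2 answers) are all assumptions for illustration.

```python
from collections import Counter

# A few new-GSL ranks from the Task 2 answers; a real tool would load the full list.
NEW_GSL_RANK = {
    "time": 45, "good": 73, "people": 79, "new": 87,
    "computer": 861, "cake": 2457,
}


def band(rank):
    """Map a new-GSL rank to band A, B or C; None means the lemma is off-list."""
    if rank is None:
        return "off-list"
    if rank <= 500:
        return "A"
    if rank <= 1000:
        return "B"
    return "C"


def lexical_profile(lemmas, ranks=NEW_GSL_RANK):
    """Share of tokens per band for a list of lemmatised tokens."""
    labels = ("A", "B", "C", "off-list")
    if not lemmas:
        return {label: 0.0 for label in labels}
    counts = Counter(band(ranks.get(lemma)) for lemma in lemmas)
    return {label: counts[label] / len(lemmas) for label in labels}


# Hypothetical lemmatised user input.
text = ["good", "people", "new", "computer", "cake", "blockchain", "time", "time"]
profile = lexical_profile(text)
print(profile)
```

For the sample input, five of the eight tokens fall in band A and one each in band B, band C and off-list.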

slide-25
SLIDE 25

Thank(1081) you(25)!
