SLIDE 1 THE APPLICATIONS OF CORPUS LINGUISTICS IN LANGUAGE TEACHING AND ASSESSMENT International Conference on Language Teaching and Assessment, August 21st-23rd, 2017, UIN Syarif Hidayatullah Jakarta.
Clarence Green Nanyang Technological University NIE3-03-118, 1 Nanyang Walk, Singapore 637616 clarence.green@nie.edu.sg
NANYANG TECHNOLOGICAL UNIVERSITY National Institute of Education
SLIDE 2 Outline:
- A (very) brief history of Corpus Linguistics in Language Assessment
- The Turn to Academic and Disciplinary Literacy
- Frequency and Language Acquisition
- Wordlists, word-families and lemmas
- Collocations and Phrase lists
- Useful assessment tools for teachers and researchers
- Data-driven Learning: Assessing what a learner ‘can do’.
SLIDE 3 The Applications of Corpus Linguistics in Language Teaching and Assessment
- Corpus Linguistics: study of language via large, computer-readable collections of authentic
text or speech.
- There are few areas of contemporary language teaching and assessment where the field has not had an
impact (Biber & Conrad, 2010; Römer, 2010).
- This talk considers how teachers and language learners can draw on CL for data-driven
learning, effective curriculum design and proficiency assessment.
SLIDE 4 A (very) brief history of Corpus Linguistics and Language Assessment
- Benefits of ELT informed by corpus linguistics gaining ever more recognition (Römer, 2011).
- Yet the insight is quite old: CL originates in the context of language education.
THORNDIKE (1921) - A psychologist: frequency as behavioral learning.
- 1921-1944 frequency-based ‘wordbooks’: “enable a teacher to know not only the general
importance of each word so far as frequency of occurrence measures that, but also its importance in current popular reading” (Thorndike and Lorge 1944: 1).
- Wanted high school graduation to be conditional on mastering the 15,000 most frequent English words (in
his book). Classroom learning could be sequenced from more frequent words to less frequent words.
SLIDE 5 The ELT/ESL Turn: The General Service List
- The General Service List/GSL (West 1953) shifted the field more into second language learning.
- West (1953) explicitly, though not exclusively, had adult second language learners in mind. For
nearly 50 years the GSL was the core list in Applied Linguistics.
- GSL: claimed to be the 2000 most frequent headwords in English.
- We might call such material corpus-based rather than corpus-driven.
SLIDE 6 The ELT/ESL Turn
- Past 30 years: increasing alignment between corpus-informed material and the adult second
language learner in the university setting.
- Partly due to the shift towards English as a global lingua franca and the globalization of the university.
- West’s (1953) GSL: no fewer than two recent updates: Brezina & Gablasova (2013) and Browne,
Culligan & Phillips (2013), a “guideline for L2 vocabulary learning”.
- Both are publicly available and free for teaching and research.
SLIDE 7 The Academic Turn: The Academic Wordlist
- Naturally, along with the university context came interest in academic language.
- E.g. Coxhead’s (2000) seminal, corpus-derived Academic Wordlist (AWL).
- Coxhead’s (2000: 229) methodology was far more rigorous than the historical lists. For example, the
AWL used word families (Bauer & Nation, 1993), applied minimum frequency and range criteria, excluded GSL words, and was derived from a corpus of university reading material.
- Pedagogical impact has been immense: 570 word families; ordered by frequency bands; covers
around 10% of the running words in academic texts.
SLIDE 8 The Academic Turn: The Academic Vocabulary List
- Gardner and Davies (2014: 3) critique the AWL: word families (WF) do not capture part of speech (POS).
- E.g. react (a verb meaning respond) is in the same WF as reactor (a noun, most often a source of
nuclear power).
- Other problems: the AWL relied on the GSL, which was more than 50 years old.
- Gardner and Davies (2014) thus developed the Academic Vocabulary List (AVL): a lemma-based list ranked by
frequency and derived from the academic subsection of the Corpus of Contemporary American English (Davies 2010).
- Top ten AVL lemmas: 1. study.n 2. group.n 3. system.n 4. social.j 5. provide.v 6. however.r 7. research.n 8. level.n 9. result.n 10. include.v
SLIDE 9 Recent Challenges: Discipline-specific language
- Same AWL frequency band: assess and economy; yet economy is likely more frequent in
Economics, and assess in Education.
- Hyland (2008), 3.4 million words of journals and dissertations in engineering, biology, applied
linguistics and business: “over half the items in each list do not occur at all in any other discipline”.
- Result: a recent wave of discipline-specific wordlists: business (Konstantakis 2007), medical (Lei and Liu 2016), engineers (Todd 2017), nurses (Yang 2015), sciences (Coxhead & Hirsh, 2007), agriculture (Martinez et al. 2009).
SLIDE 10 Frequency, Psycholinguistics and Assessment
- How many words does a learner need to know?
- Corpus linguistics has shown that the most frequent 2000 words in a wordlist of English based
on a large corpus account for about 83% of the running words in texts.
- Approx. 95% if a student knows the top 10,000 word families (WF).
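A minimal sketch of how a coverage figure of this kind can be computed. It assumes a toy tokenizer and a wordlist ranked from the sample text itself; published figures are based on word families and a large reference corpus. The names tokenize and coverage are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (toy tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(tokens, ranked_words, top_n):
    """Return the share of running words (tokens) covered by the top_n ranked words."""
    if not tokens:
        return 0.0
    known = set(ranked_words[:top_n])
    return sum(1 for t in tokens if t in known) / len(tokens)


# Hypothetical mini-corpus; a real wordlist would come from millions of words.
corpus = "the cat sat on the mat and the dog sat on the log"
tokens = tokenize(corpus)
ranked = [word for word, _ in Counter(tokens).most_common()]

for n in (1, 3, 5):
    print(n, round(coverage(tokens, ranked, n), 2))
```

Coverage here is counted over tokens, not types, which is why a small set of very frequent words can account for a large share of any text.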
SLIDE 11 Frequency, Psycholinguistics and Assessment
- Why all this emphasis on frequency? George Zipf (1902–1950) & Zipf’s law: a word’s frequency is roughly inversely proportional to its frequency rank.
SLIDE 12 Frequency, Psycholinguistics and Assessment
- Tells us much about the psychology of human communication:
- We aim to maximize efficiency and economy in language.
- Function words tied to grammatical relations are the core building blocks of language.
- Closed class items are repeated much more than lexical items, many of which are used only once.
- Can also surmise that frequency effects create: function words, closed class, polysemy.
- Hapax legomena (words that occur only once) and assessment: the long tail of the distribution, in which most lexical words occur.
- If I collected some essays, computed a wordlist ordered by frequency and found one student’s work had a longer hapax tail to their Zipfian distribution, what might this suggest? A metric of proficiency.
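A rough sketch of the hapax idea as a number: the proportion of word types in an essay that occur exactly once. The function name hapax_ratio, the tokenizer and the two sample essays are hypothetical, and the measure is sensitive to text length, so it only makes sense for essays of similar length.

```python
import re
from collections import Counter


def hapax_ratio(text):
    """Return the proportion of word types that occur exactly once in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = [word for word, count in counts.items() if count == 1]
    return len(hapaxes) / len(counts)


# Two hypothetical essays of similar length.
essay_a = "the test was good and the test was good"
essay_b = "the examination proved rigorous and the outcome was favourable"

print(hapax_ratio(essay_a))
print(hapax_ratio(essay_b))
```

A higher ratio means a longer tail of once-only words, which is the pattern the slide suggests may reflect a wider productive vocabulary.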
SLIDE 13 Frequency, Psycholinguistics and Assessment
- Word Frequency Effect in psycholinguistics: words higher on the Zipf curve are easier to process.
- Lexical decision experiments tell us that frequent words/phrases derived from corpora have faster reaction times.
- Sensitivity to the frequency of academic words thus becomes a psychometric tool for assessing linguistic proficiency (Akbari, 2014).
SLIDE 14 Put into words:
- 1. Humanities students were generally faster at processing language than science students.
- 2. 3rd years/graduates were generally faster than 1st years at processing academic language.
- 3. 3rd year students were faster on high frequency words in their discipline.
Implications: frequent experience with the language of a discipline changes the resting activation of academic words in your brain to make language processing easier.
Artificially inflating the frequency of discipline-specific words and phrases in reading material could speed up the lowering of the processing cost of disciplinary language. Such words and phrases can be derived from subject-specific corpora.
SLIDE 15 Implications for lang. teaching & assessment: Interim Summary
- As a general rule of Language Cognition, children/adults acquire high frequency words
(and grammatical patterns) first and then progress through the frequency bands.
- Selecting vocabulary for language learning by pedagogical intuition is therefore highly
inefficient.
- Frequency of exposure reinforces cognition, makes processing easier, promotes
proficiency.
- If students learn vocabulary by following a band progression up to the most
frequent 10,000 word families in English, they will know ≈ 95-98% of the words in any given text (Nation, 2006).
- This is a natural scaffolding (i+1 in Krashen’s sense). If a student has acquired the first 1000,
it is safe to assume they are developmentally ready for the next 1000.
SLIDE 16 Assessment Applications: Tools
- Linguists have realized that proficiency can be tested by trying to determine which frequency
band level a learner is currently at:
- 1. Vocabulary Levels Test: test your learners on words from within the same frequency band, e.g.
the first 1000, second 1000 etc., up to 20,000 word families.
http://www.lextutor.ca/tests/levels/productive/2ka.html
- 2. Vocabulary Size Test: 140 questions, 10 at each of 14 thousand-levels. Should give you a good
idea of the number of English words students know.
http://www.lextutor.ca/tests/levels/recognition/1_14k/
Nation’s website: http://www.victoria.ac.nz/lals/about/staff/paul-nation#vocab-tests
SLIDE 17 Assessment: Writing
- Proficient writers should use more words from the lower-frequency bands of a corpus than less proficient
writers.
- Assessing this can be done via automatic corpus analysis, e.g. with COCA:
http://www.wordandphrase.info/analyzeText.asp
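To make the idea concrete, here is a small sketch of a frequency-band profile for a piece of writing. It is not the implementation of the COCA tool linked above; the function name band_profile, the eight-word ranked list and the band size of 3 are assumptions chosen only for illustration.

```python
import re
from collections import Counter


def band_profile(text, ranked_words, band_size=1000):
    """Count the tokens of text that fall in each frequency band of ranked_words.

    Band 1 holds the band_size most frequent words, band 2 the next band_size,
    and so on. Tokens that are not on the list are counted under "off-list".
    """
    band_of = {word: i // band_size + 1 for i, word in enumerate(ranked_words)}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(band_of.get(token, "off-list") for token in tokens)


# Hypothetical ranked list (most frequent first) with tiny bands of 3 words.
ranked = ["the", "of", "and", "student", "write", "essay", "salient", "corpus"]
profile = band_profile("The student analysed the salient corpus", ranked, band_size=3)
print(profile)
```

Comparing the share of tokens in the higher-numbered (less frequent) bands across essays gives a simple picture of which writers draw on rarer vocabulary.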
SLIDE 18 What else can a corpus offer? Phraseology and Collocations
- If I am strong I am powerful. But why do I drink strong tea, and never powerful tea?
- Collocation: given W1, a higher probability of W2.
- “The principle of idiom is that a language user has available to him [sic] a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments” (Sinclair 1991: 110).
- Co-occurrence is meaningful:
(Conrad & Biber 2004)
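One common way to quantify how strongly two words attract each other is pointwise mutual information (PMI). The sketch below is a swapped-in illustration, not a measure named on the slides: it looks only at directly adjacent word pairs, and the function name pmi and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information of w2 directly following w1 (adjacent pairs only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair = bigrams[(w1, w2)]
    if pair == 0:
        return float("-inf")  # the pair never occurs together in this sample
    p_pair = pair / (len(tokens) - 1)
    p_w1 = unigrams[w1] / len(tokens)
    p_w2 = unigrams[w2] / len(tokens)
    return math.log2(p_pair / (p_w1 * p_w2))


# Hypothetical toy sample; real collocation work needs a large corpus.
tokens = "i drink strong tea and strong coffee but he has a powerful car".split()
print(pmi(tokens, "strong", "tea"))
print(pmi(tokens, "powerful", "tea"))
```

A positive score means the pair occurs together more often than chance would predict; in a large corpus strong tea would score well above powerful tea.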
SLIDE 19 What do you notice about the collocates of student and book in COCA?
- Student: the words that you subjectively associate with it can be objectively extracted from corpora.
- Book: something that is read, written by an author, published, etc.
- Words can often be defined by ‘the company they keep’ (Firth, 1968)
- Vocabulary input in SLA can/should be contextualized by easily obtainable corpus information
SLIDE 20 Assessing Phrasal and Collocational Knowledge
- Easy assessment tools for students’ level of proficiency with discourse beyond the word.
- LEXTUTOR (Cobb, 2017) has compiled many of the published CL-informed assessment tools.
- 1. PHRASE TEST based on the Martinez & Schmitt (2012) frequency-ordered phrase list:
http://www.lextutor.ca/tests/levels/recognition/phrasal/
- 2. Word association test based on collocations (Read, 1998) http://www.lextutor.ca/tests/associates/
SLIDE 21 The Applications of Corpus Linguistics in Language Teaching and Assessment: A current project (Green & Lambert, in progress)
- Current research at National Institute of Education, Nanyang Technological University
- Large corpus study: a series of discipline-specific academic vocabulary lists, targeted at the
secondary school ELT context, and extensive material on their collocates.
- Eight core secondary school subjects are being covered: Biology, Math, History, Physics,
Geography, Chemistry, English, and Economics
SLIDE 22 The Applications of Corpus Linguistics in Language Teaching and Assessment: (Green & Lambert, in progress)
- Corpus based on secondary school textbooks; the majority of textbooks in the corpus (82%) had
publication dates within the past five years.
- To ensure that this material was likely to represent that which students encountered in their
subjects, texts were largely taken from approved textbook lists.
SLIDE 23 The Applications of Corpus Linguistics in Language Teaching and Assessment: (Green & Lambert, in progress)
Methods to determine the usefulness of a word for English Language Teaching:
- 1. Minimum Frequency: minimum 28.57 per million words in the discipline.
- 2. Range: in more than 50% of texts in the discipline corpus.
- 3. Dispersion: homogeneity of occurrence; lemmas with low dispersion (0.5) excluded.
- 4. Range Ratio: occurs in 50% of the texts in the discipline with at least 20% of its expected frequency.
- 5. Discipline-specific Frequency Ratio and Keyness: needed to be a keyword with a ratio 3 times higher than in other disciplines.
- 6. Major Part of Speech: noun, verb, adjective or adverb.
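The first two criteria can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual pipeline: texts are assumed to be already tokenized and lemmatized, the function name passes_frequency_and_range is hypothetical, and dispersion, range ratio, keyness and part of speech are left out because the slides do not specify how they were calculated.

```python
MIN_PER_MILLION = 28.57  # criterion 1: minimum frequency per million words
MIN_RANGE = 0.5          # criterion 2: must occur in more than 50% of texts


def passes_frequency_and_range(word, texts):
    """Check criteria 1 and 2 for word, where texts is a list of token lists."""
    total_tokens = sum(len(text) for text in texts)
    if total_tokens == 0:
        return False
    occurrences = sum(text.count(word) for text in texts)
    per_million = occurrences / total_tokens * 1_000_000
    text_range = sum(1 for text in texts if word in text) / len(texts)
    return per_million >= MIN_PER_MILLION and text_range > MIN_RANGE


# Hypothetical, tiny "discipline corpus" of three lemmatized texts.
biology_texts = [
    ["cell", "divide"],
    ["the", "cell", "wall"],
    ["plant", "grow"],
]

print(passes_frequency_and_range("cell", biology_texts))
print(passes_frequency_and_range("plant", biology_texts))
```

With a corpus this small every attested word clears the per-million threshold, so only the range check does any work here; on a corpus of realistic size both filters matter.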
SLIDE 24 Conclusion Applications of Corpus Linguistics in Language Teaching and Assessment
- Language teaching and assessment material, whether vocabulary, grammar, curriculum,
proficiency tests etc., can be informed by important information from corpus research that has psychological authenticity.
- Corpus-derived material can help us as teachers move beyond intuitive selection of the words and
discourse material to teach, and can help us improve our language teaching and assessment.
- Much of the state-of-the-art empirical research (packaged as classroom assessment tools) is
freely available online and easy to use!
SLIDE 25
PhD (Applied Linguistics) EdD (English Language and Literature) MA (Applied Linguistics) MEd (English)
THANK YOU!
SLIDE 26
References
Brezina, V., and D. Gablasova. 2013. ‘Is there a core general vocabulary? Introducing the New General Service List’, Applied Linguistics 36/1: 1-22.
Cobb, T. 2012. The Compleat Lexical Tutor for Data-driven Learning on the Web. Montreal: University of Quebec, http://lexutor.ca/
Coxhead, A. 2000. ‘A new academic word list’, TESOL Quarterly 34: 213-238.
Dolch, E. 1936. ‘A basic sight vocabulary’, The Elementary School Journal 36/6: 456-460.
Firth, J.R. 1957. Papers in Linguistics 1934–1951. Oxford University Press.
Fry, E. 1957. ‘Developing a word list for remedial reading’, Elementary English 34/7: 456-458.
Gardner, D., and M. Davies. 2014. ‘A new academic vocabulary list’, Applied Linguistics 35/3: 305-327.
Garnier, M., and N. Schmitt. 2015. ‘The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses’, Language Teaching Research 19/6: 645-666.
Hyland, K. 2008. ‘As can be seen: Lexical bundles and disciplinary variation’, English for Specific Purposes 27/1: 4-21.
Hyland, K., and P. Tse. 2007. ‘Is there an academic vocabulary?’, TESOL Quarterly 41/2: 235-253.
Martinez, R., and N. Schmitt. 2012. ‘A phrasal expressions list’, Applied Linguistics 33/3: 299-320.
Nation, P. 2008. Teaching ESL/EFL Reading and Writing. Routledge.
Römer, U. 2011. ‘Corpus research applications in second language teaching’, Annual Review of Applied Linguistics 31: 205-225.
Simpson-Vlach, R., and N. Ellis. 2010. ‘An academic formulas list: New methods in phraseology research’, Applied Linguistics 31/4: 487-512.
Sinclair, J. (ed.). 1995. COBUILD English Collocations. Harper Collins.
Thorndike, E.L. 1921. The Teacher’s Word Book. Teachers College, Columbia University.