Look for monodisciplinary journals by finding those with a high - - PowerPoint PPT Presentation




slide-1
SLIDE 1

CORPUS APPROACHES TO THE LANGUAGE OF INTERDISCIPLINARY RESEARCH ARTICLES

Aug 2013 – Aug 2015 ESRC-funded project: ES/K007300/1

Paul Thompson

with Susan Hunston, Akira Murakami, and Dominik Vajn

slide-2
SLIDE 2


BACKGROUND

• A substantial amount of work has been carried out on academic discourse in corpus linguistics (e.g., Biber; Hyland).

• The research has typically drawn clear boundaries between disciplines (e.g., history vs biology) or between levels of research (e.g., pure vs applied).

• In recent times there has been an expansion in ‘interdisciplinary research’.

• Nevertheless, there has been little work on the linguistic nature of interdisciplinary research as opposed to general research discourse and disciplinary research discourse.

slide-3
SLIDE 3

MAIN AIM

• To achieve a fuller understanding of the distinctive features of discourse practices in interdisciplinary research and of how they differ from discourse practices in conventional disciplines

Global Environmental Change: a successful interdisciplinary journal published by Elsevier

slide-4
SLIDE 4

DATA

• Full holdings of a successful IDR journal, Global Environmental Change, 1990-2010
  - 675 articles

• Holdings of 5 IDR journals (interdisciplinary) and 5 specialist journals (monodisciplinary), 2001-2010

• Surveys, interviews with editors, board members, authors

Data-driven approach: collect data and see what comes out of it

slide-5
SLIDE 5

Monodisciplinary journals are identified as those with a high clustering coefficient, and multidisciplinary journals as those with a low clustering coefficient.

A journal whose connected journals are themselves well connected to one another is said to have a high clustering coefficient.
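The idea can be sketched with the local clustering coefficient of a node in an undirected graph: the share of pairs of a journal's neighbours that are themselves linked. This is only an illustration. The slides do not say which definition or which citation data were used, and the journal names and links below are hypothetical.

```python
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = graph[node] - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)

# Hypothetical undirected citation links between journals (not project data).
links = [
    ("Mono", "A"), ("Mono", "B"), ("Mono", "C"),
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("Multi", "X"), ("Multi", "Y"), ("Multi", "Z"),
]
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

mono_cc = local_clustering(graph, "Mono")    # neighbours A, B, C are all linked to one another
multi_cc = local_clustering(graph, "Multi")  # neighbours X, Y, Z share no links
```

Under this definition "Mono" scores 1.0 and "Multi" scores 0.0, which matches the high/low contrast described on the slide.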

slide-6
SLIDE 6

R & Sketch Engine

slide-7
SLIDE 7

PAPER LABELLING

• Categories
  1. Empirical
  2. Policy discussion
  3. Research agenda / Research Framework
  4. Other

• Disagreements were resolved by negotiation.

slide-8
SLIDE 8

AGREED LABELLING

[Chart: agreed labelling by year, 1991-2010; series: Empirical, Policy, Agenda/Framework, Other; vertical axis ticks from 5 to 45]

slide-9
SLIDE 9

INTRODUCTION

• Corpus data can be explored ‘top-down’ or ‘bottom-up’

• Texts can be grouped by text-external criteria or by text-internal criteria

• We have taken different bottom-up approaches to the data:
  - Multidimensional Analysis [6 new dimensions]
  - Keyword analysis
  - Topic modelling

• Following slides present topic modelling and the results of our analyses.

slide-10
SLIDE 10

CORPUS

• All the full research papers in the 11 journals

• Main text only

• 11,703 papers

• 56 million words

slide-11
SLIDE 11


FEATURES OF TOPIC MODELS

• Probabilistic topic modelling (TM) is a machine learning technique that automatically identifies “topics” in a given corpus (Blei, 2012)

• Latent Dirichlet allocation (LDA)

• Automatically identifies “topics” in a given corpus
  - keywords in each topic
  - distribution of topics in each document (a document consists of multiple topics)

• Methodologically,
  - Bag-of-words approach → single words
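For reference, the generative assumptions behind LDA can be written as follows. The notation is the conventional textbook one and does not appear in the slides: K topics, D documents, N_d words in document d, and Dirichlet hyperparameters α and η.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
\end{align*}
```

Here β_k corresponds to the keywords in each topic and θ_d to the distribution of topics in each document. Word order plays no role, which is the bag-of-words assumption.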

slide-12
SLIDE 12

DIVISION OF PAPERS

• A research paper
  - is longer than a typical document targeted in topic models
  - can contain multiple topics

• Better to divide papers into multiple parts

• This allows the investigation of topic transition within papers as well.
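One way such a division could work is sketched below. The slides do not describe the actual procedure, so the function name, the grouping rule and the `min_words` threshold are all assumptions made for illustration: consecutive paragraphs are grouped until a minimum length is reached, and each resulting text records its relative position in the paper.

```python
def divide_paper(paragraphs, min_words=300):
    """Group consecutive paragraphs into texts of at least min_words words.

    Returns (text, position) pairs, where position is the relative location
    (0.0 to 1.0) of the midpoint of the text within the paper, counted in words.
    """
    total = max(sum(len(p.split()) for p in paragraphs), 1)
    chunks, current, current_len, seen = [], [], 0, 0
    for para in paragraphs:
        current.append(para)
        current_len += len(para.split())
        if current_len >= min_words:
            chunks.append((" ".join(current), (seen + current_len / 2) / total))
            seen += current_len
            current, current_len = [], 0
    if current:  # leftover paragraphs form a final, shorter text
        chunks.append((" ".join(current), (seen + current_len / 2) / total))
    return chunks

# Hypothetical paper of four paragraphs (200, 150, 250 and 100 words).
paper = ["intro " * 200, "method " * 150, "result " * 250, "discuss " * 100]
paper_texts = divide_paper(paper, min_words=300)
```

With this toy input the paper is split into two texts of 350 words each, centred at 25% and 75% of the way through the paper.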

slide-13
SLIDE 13

DETAILS

• Punctuation, numbers, and the standard list of stop words were removed.

• All the words were stemmed using the Porter stemmer (e.g., require → requir, analysis → analysi).

• Only terms that appear in at least 0.1% of all the documents were targeted.

• Each text was tagged with information on where in the paper its paragraph(s) appeared.
  - e.g., 70% from the beginning of the paper

• This created 140,816 ‘texts’ with an average length of 248 words per text (SD = 57.6) after removal of the stop words

• topicmodels package in R
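The cleaning steps above can be approximated as follows. This is a Python sketch, not the project's R code. The stop list is a tiny stand-in for the "standard list", stemming is left out because the standard library has no Porter stemmer (NLTK's `PorterStemmer` would be one option), and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list; the "standard list" used in the project is not reproduced here.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "were", "at"}

def tokenize(text):
    """Lowercase, keep alphabetic runs only (drops punctuation and numbers), remove stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def filter_by_document_frequency(docs, min_share=0.001):
    """Keep only terms that occur in at least min_share of the documents (0.001 = 0.1%)."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    threshold = min_share * len(docs)
    keep = {term for term, df in doc_freq.items() if df >= threshold}
    return [[t for t in doc if t in keep] for doc in docs]

raw = [
    "The samples were centrifuged at 4000 rpm.",
    "Samples of lake water, and sediment.",
    "Climate policy in 1990.",
]
docs = [tokenize(t) for t in raw]
# min_share is raised to 0.5 here so the filter has a visible effect on three documents.
filtered = filter_by_document_frequency(docs, min_share=0.5)
```

In this toy run only "samples" survives the filter, because it is the only term found in at least half of the three documents.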

slide-14
SLIDE 14

NUMBER OF TOPICS

• No agreed way to automatically determine the number of topics.

• We built topic models with 10, 20, 30, ..., 90, 100 topics

• Each topic in the model with 100 topics looked interpretable. → 100 topics

slide-15
SLIDE 15

BY-TEXT TOPIC DISTRIBUTION

slide-16
SLIDE 16

THINGS WE CAN DO

• Identify prominent topics at different positions of a paper

• Compare prominent topics across papers or across journals

• Cluster papers according to topic distribution

inter alia
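The first of these uses can be sketched as follows, assuming each text comes with its relative position in the paper and a list of topic proportions from a fitted model. The function name, the binning scheme and the numbers are hypothetical.

```python
def mean_topic_share_by_position(texts, n_bins=10):
    """Average each topic's proportion over texts falling in the same position bin.

    texts: iterable of (position, proportions), with position in [0, 1] and
    proportions a list holding one value per topic.
    Returns {bin_index: [mean proportion per topic]}.
    """
    sums, counts = {}, {}
    for position, proportions in texts:
        b = min(int(position * n_bins), n_bins - 1)  # position 1.0 goes in the last bin
        if b not in sums:
            sums[b] = [0.0] * len(proportions)
            counts[b] = 0
        for k, p in enumerate(proportions):
            sums[b][k] += p
        counts[b] += 1
    return {b: [s / counts[b] for s in sums[b]] for b in sorted(sums)}

# Hypothetical proportions for two topics in four texts (not project data).
topic_texts = [
    (0.05, [0.9, 0.1]),
    (0.15, [0.7, 0.3]),
    (0.85, [0.2, 0.8]),
    (0.95, [0.0, 1.0]),
]
profile = mean_topic_share_by_position(topic_texts, n_bins=2)
```

In this toy data the first topic dominates the first half of the paper and the second topic dominates the second half. The same per-bin averages could be compared across journals.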

slide-17
SLIDE 17

BY-PAPER TOPIC DISTRIBUTION

slide-18
SLIDE 18

Topic 7

By-Paper Topic Distribution

slide-19
SLIDE 19
slide-20
SLIDE 20

countri develop global world nation econom industri intern million year popul growth environment import major trade decad centuri economi

slide-21
SLIDE 21

WITHIN-PAPER TOPIC DISTRIBUTION

Each line is a different topic – there are a hundred in total

slide-22
SLIDE 22

WITHIN-PAPER TOPIC DISTRIBUTION

slide-23
SLIDE 23

KEYWORDS OF THE FOUR TOPICS

• Topic 3: use, dna, primer, sequenc, clone, pcr, fragment, rna, cdna, probe, perform, hybrid, isol, amplifi, min, total, follow, gene, cycl, product

• Topic 40: min, buffer, use, extract, incub, contain, solut, centrifug, assay, protein, describ, determin, wash, mixtur, supernat, gel, prepar, homogen, acid, follow

• Topic 55: sampl, use, extract, standard, determin, analysi, column, analyz, method, analys, digest, analyt, concentr, abc, dri, acid, filter, mass, min, detect

• Topic 97: collect, use, place, sampl, water, chamber, day, dri, remov, week, solut, filter, contain, experi, store, replic, plastic, diamet, pot, tube

slide-24
SLIDE 24

TOPICS PROMINENT TOWARDS THE END OF PAPERS

slide-25
SLIDE 25

TOPICS PROMINENT TOWARDS THE END OF PAPERS

• Topic 50: will, can, may, need, requir, must, target, howev, limit, possibl, current, futur, like, make, becom, potenti, necessari, provid, exist, exampl

• Topic 53: may, suggest, like, might, howev, possibl, evid, support, appear, associ, although, result, also, seem, find, hypothesi, strong, fact, explain, occur

slide-26
SLIDE 26

TOPICS PROMINENT AT THE BEGINNING OF PAPERS

slide-27
SLIDE 27

TOPICS PROMINENT AT THE BEGINNING OF PAPERS

• Topic 1: pollut, deposit, sourc, atmospher, air, area, lichen, element, moss, main, monitor, load, industri, isotop, contribut, anthropogen, particul, dust, concentr, major

• Topic 7: countri, develop, global, world, nation, econom, industri, intern, million, year, popul, growth, environment, import, major, trade, mani, decad, centuri, economi

• Topic 32: etaal [=et al], sediment, lake, river, contamin, concentr, water, mercuri, effluent, organ, studi, estuari, environ, wastewat, bay, mehg, figa, pollut, aquat, sourc

• Topic 72: studi, rice, mani, china, recent, howev, wide, includ, also, sever, import, high, common, various, major, larg, well, report, varieti, although

slide-28
SLIDE 28

TOPICS PROMINENT AT THE BEGINNING AND THE END OF PAPERS

slide-29
SLIDE 29

TOPICS PROMINENT AT THE BEGINNING AND THE END OF PAPERS

• Topic 31: rural, social, communiti, local, cultur, econom, polit, place, within, new, way, discours, relat, peopl, argu, particular, ident, societi, natur, construct

• Topic 33: method, use, approach, techniqu, can, applic, appli, develop, requir, propos, base, provid, limit, altern, advantag, allow, howev, combin, work, need

• Topic 58: research, paper, section, discuss, focus, approach, work, develop, analysi, issu, understand, framework, scienc, literatur, process, address, knowledg, review, studi, provid

• Topic 81: chang, climat, scenario, adapt, impact, vulner, futur, global, assess, capac, project, polici, uncertainti, respons, will, current, region, rise, warm, ipcc

• Topic 87: process, system, biolog, agent, physic, theori, inform, organ, natur, can, concept, dynam, intern, quantum, principl, mechan, one, environ, space, idea

slide-30
SLIDE 30

area urban citi popul local hous household rural resid locat park counti migrat district residenti access build town centr villag

slide-31
SLIDE 31

farm agricultur farmer product food organ produc market practic household incom local livestock labour manag consum convent econom coffe activ

The topic revolves around agriculture from a ‘human’ point of view

slide-32
SLIDE 32

Topic 100: veget graze grassland grass pastur cover manag intens studi cattl product year area biomass stock anim nativ forag meadow livestock

Topic 11: crop yield wheat harvest maiz system year grain product weed rotat cultiv fertil manag fallow practic field tillag corn cereal

slide-33
SLIDE 33

Water

Topic 32: etaal sediment lake river contamin concentr water mercuri effluent organ studi estuari environ wastewat bay mehg figa pollut aquat sourc

Predominantly Environmental Pollution

slide-34
SLIDE 34

Different senses of water:

Resource and part of ecological system

Topic 19: water irrig reservoir potenti surfac suppli storag capac condit use avail releas limit qualiti system also demand evapor inflow balanc

Topic 84: rainfal river runoff event stream catchment flood basin watersh hydrolog flow area discharg wetland rain drainag eros slope storm intens

slide-35
SLIDE 35

CONCLUSION

• Topic models are useful in exploring the content of the papers in the corpus.

• “Topics” identified in topic models are generally interpretable based on domain knowledge.

• Topic models help us identify keywords at different positions in papers.

slide-36
SLIDE 36

Topic 81: chang climat scenario adapt impact vulner futur global assess capac project polici uncertainti respons will current region rise warm ipcc

Topic 58: research paper section discuss focus approach work develop analysi issu understand framework scienc literatur process address knowledg review studi provid

How about Global Environmental Change? Which topics are typical of GEC?

slide-37
SLIDE 37

SHELL NOUNS (BENITEZ 2014, 2015)

• Semantically unspecific abstract nouns that encapsulate and label

• Variously called:
  - ‘general nouns’ (Halliday & Hasan 1976; Mahlberg 2005)
  - ‘Vocabulary 3 items’ (Winter 1977)
  - ‘enumerables’ and ‘advance labelling’ (Tadros 1985)
  - ‘anaphoric nouns’ (Francis 1986)
  - ‘carrier nouns’ (Ivanič 1991)
  - ‘labels’ (Francis 1994)
  - ‘shell nouns’ (Hunston & Francis 2000; Schmid 2000)
  - ‘signalling nouns’ (Flowerdew 2003).

slide-38
SLIDE 38

SHELL NOUNS (FURTHER)

• Frequently the head of definite or demonstrative noun phrases (e.g. the argument, this issue)

• Shell-noun phrases expedite cognitive processing of the text by transforming complex pieces of information into single discourse entities

• They also offer writers the opportunity to express their stance on the meaning underlying a particular discourse segment

slide-39
SLIDE 39

TOP SHELL NOUNS IN GLOBAL ENVIRONMENTAL CHANGE

• Scenario (6882), model (6851), study (6369), example (5044), time (5024), result (4794), effect (4685), problem (4684), issue (4639), approach (4336), way (3257), question (2334), paper (2087), point (1975), method (1901)

[lemma="scenario|study|example|effect|problem|issue|approach|way|question|paper|point" & tag="NN.?"]

slide-40
SLIDE 40

The top four journals in terms of relative use of these shell nouns are three ID journals and one ‘transdisciplinary’ journal

[lemma="scenario|study|example|effect|problem|issue|approach|way|question|paper|point" & tag="NN.?"]
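Outside Sketch Engine, the query shown on this slide can be approximated on any lemmatised, POS-tagged token list. The sketch below assumes tokens arrive as (lemma, tag) pairs and that attribute patterns must match the whole value. "Relative use" is taken here to mean a per-million-token rate, which the slides do not specify. The function names and sample tokens are hypothetical.

```python
import re

SHELL_LEMMAS = re.compile(
    r"scenario|study|example|effect|problem|issue|approach|way|question|paper|point"
)
NOUN_TAG = re.compile(r"NN.?")

def count_shell_nouns(tokens):
    """Count (lemma, tag) pairs where both attributes match the query patterns in full."""
    return sum(
        1 for lemma, tag in tokens
        if SHELL_LEMMAS.fullmatch(lemma) and NOUN_TAG.fullmatch(tag)
    )

def per_million(count, corpus_size):
    """Relative frequency per million tokens, so corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size

# Hypothetical tagged tokens (not project data).
sample = [
    ("this", "DT"), ("issue", "NN"), ("point", "VB"),
    ("approach", "NNS"), ("viewpoint", "NN"), ("model", "NN"),
]
hits = count_shell_nouns(sample)
```

Only "issue" and "approach" are counted in the sample: "point" is tagged as a verb, "viewpoint" does not match a lemma in full, and "model" is not in the query.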

slide-41
SLIDE 41

EXAMPLE

• 3463 for example

• 244 an example

• 64 another example

• 60 one example

slide-42
SLIDE 42

Interactive metadiscourse:

Code glosses (Vande Kopple 1985):

rephrasing, explaining, elaborating, and exemplifying

there is a stronger emphasis on composition and distribution of economic activity than on growth of output. For example, although there are many definitions of sustainability, it is often taken to mean that the well-being of the current generation should not be advanced at the expense of future generations.

which means; this/that means; for instance; for example; such as; that is to say
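A rough way to count these code glosses in running text is sketched below. It uses plain phrase matching and a per-1,000-word rate. Both the matching rule and the normalisation are assumptions rather than the project's method, and the sample sentence is invented.

```python
import re

# Phrases listed on the slide; "this/that means" is expanded into two entries.
CODE_GLOSSES = [
    "which means", "this means", "that means",
    "for instance", "for example", "such as", "that is to say",
]
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in CODE_GLOSSES) + r")\b",
    re.IGNORECASE,
)

def code_gloss_rate(text, per=1000):
    """Code glosses per `per` whitespace-separated words (0.0 for empty text)."""
    words = len(text.split())
    hits = len(PATTERN.findall(text))
    return hits * per / words if words else 0.0

# Invented sample sentence (not from the corpus).
sample_text = "Some metals, such as mercury, accumulate. For example, in lakes."
rate = code_gloss_rate(sample_text)
```

The sample has two glosses in ten words, which gives a rate of 200 per 1,000 words. Normalising by length matters because longer texts have more room for glosses.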

slide-43
SLIDE 43

Code glosses (Vande Kopple 1985):

rephrasing, explaining, elaborating, and exemplifying

slide-44
SLIDE 44

[Chart residue: label ‘Multidisciplinary journals’; axis ticks 4K, 5K, 6K, 7K]

The same four journals that have relatively high use of shell nouns also have relatively high use of code glosses

slide-45
SLIDE 45
slide-46
SLIDE 46

CODE GLOSSES: EXAMPLES

• All these toxic substances should have direct impact on plasmalemma and cellular machinery, therefore, reducing protoplast survival [29,14]. Driselase, for instance, has often been used after purification [29]. PS [monodisciplinary]

• Public activity generally occurred in the aftermath of the Montreal Protocol, and it simply accelerated the movement which had already been put in place. In the UK, for instance, Friends of the Earth ... In the USA ... [GEC]

• We are talking about gradual and slow processes, where both positive and negative effects may only be seen years and decades after changes in emissions. Take, for instance, the Norwegian situation ... [GEC]

In GEC, the examples are typically extended elaborations on propositions

slide-47
SLIDE 47
  • Discussed a bottom-up approach to exploration of research articles: topic-modelling

  • topic-modelling is usually used for short texts; here it is used to identify probable topics in sections of texts, enabling us to assess 'topic' relations to places within texts

  • analysis tells us primarily what texts are about but also something about what writers write about, and how, in different parts of texts

  • some evidence of higher use of metadiscourse and of some shell nouns in ID articles, suggesting that such texts are more outward-facing. The use of metadiscourse is constrained by text length, however.

slide-48
SLIDE 48

TO FOLLOW THE IDRD PROJECT

• Visit:
  - idrd-bham.info/

• Or:
  - @IDRD_bham