The lexico-grammar of stance: an exploratory analysis of scientific texts



slide-1
SLIDE 1

The lexico-grammar of stance: an exploratory analysis of scientific texts

Stefania Degaetano & Elke Teich
Universität des Saarlandes, Saarbrücken

Linguistische Profile interdisziplinärer Register (2006-2009)
Register im Kontakt (2011-2014)

slide-2
SLIDE 2

Overview

  • Background & Motivation
  • Corpus
  • Methodology
  • Analysis & Results
  • Discussion & Future Work

23.02.2011 2

slide-3
SLIDE 3

Background & Motivation

Growing interest in meaning-oriented analysis of texts

  • Descriptive linguistics/corpus linguistics

(Halliday 1985, Biber et al. 1999, Martin & White 2005, Reis 1999, Hunston & Thompson 2003)

  • Computational linguistics

(Pang & Lee 2008, Liu 2010, Taboada et al. forthcoming)


slide-4
SLIDE 4

Background & Motivation

Meaning potential is associated with functions (metafunctions)

  • ideational

– expression of experience, including processes within and beyond the self and phenomena of the external world and of consciousness

  • interpersonal

– personal participation; it expresses the speaker’s role in the speech situation, his personal commitment and his interaction with others

  • textual

– concerned with the creation of text; it expresses the structure of information, and the relation of each part of the discourse to the whole and to the setting

(Halliday 1973: 351)


slide-5
SLIDE 5

Background & Motivation

Understanding of interpersonal meaning is still fragmentary because

it is realized in a variety of forms

  • phrasal/clausal, e.g., it is important that, it is obligatory to
  • lexical, e.g., modal verbs, modal adverbs
  • many ambiguous lexemes (connotations!)

it is extremely context-dependent (register)

  • e.g., You should write an outline. (British National Corpus) vs. It is obligatory to write an outline.

slide-6
SLIDE 6

Background & Motivation

  • Contribute to a better understanding of how interpersonal meaning is expressed
  • Registers of scientific writing

commonalities/differences across different scientific disciplines in expressing interpersonal meaning

“The notion of register is typically described as functional variation” (Quirk et al. 1985:15), i.e., variation according to type of situational context.

slide-7
SLIDE 7

Corpus

Darmstadt Scientific Text Corpus (DaSciTex)

  • full English journal articles (early 2000s)
  • approx. 17 million words
  • tokenized, POS-tagged, lemmatized
  • currently being diachronically extended (1960s/70s)

(Teich & Holtz 2009, Teich & Fankhauser 2010)

slide-8
SLIDE 8

Methodology

Stance

  • one kind of interpersonal meaning
  • refers to how writers convey personal feelings and assessments in addition to propositional content
  • three kinds of meaning associated with stance

− epistemic (e.g. probably, it is possible to)
− attitudinal (e.g. surprisingly, it is important to)
− style (e.g. honestly, briefly)

(cf. Conrad & Biber 2003)

(this kind of interpersonal meaning is also known under other labels: ‘evaluation’ (Hunston & Thompson 2003), ‘appraisal’ (Martin 2003), ‘hedging’ (Hyland 1996))

slide-9
SLIDE 9

Methodology

Stance realized by lexico-grammatical patterns

“[…] if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and if there is a clear meaning associated with it […]”

(Hunston & Francis 2000: 37)


slide-10
SLIDE 10

Methodology

Examples

it is ADJ to-INF

It is, however, possible to call this result into question. (C1-Linguistics)

it is ADJ that

It is clear that in some cases nesting is correlated with […]. (C2-Biology)

this v-link ADJ for/to/if

This is difficult to do for the algorithm. (A-ComputerScience)

make it ADJ

[…] two facts make it possible to classify the genes. (C2-Biology)

dt most ADJ n

[…] since they have the most important optimization potential […]. (B3-CAD)

evaluative noun of

Its main drawback lies in the difficulty of obtaining a large set […] (B1-CompLing)

(cf. Degaetano 2010)

slide-11
SLIDE 11

Methodology

Extraction of pattern instances

Corpus Query Processor (Evert 2005)

  • searches by means of regular expressions
  • one very common pattern is the it is ADJ to-INF pattern, e.g. it is easy to

"it|It" [pos="VB.*"][]{0,3}[pos="J.*"] "to";
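The query on this slide can be approximated outside CQP. The following is a minimal sketch, assuming the text is available as (word, POS) pairs with Penn-Treebank-style tags (the slides do not name the tagset). The function name find_it_adj_to and the hand-tagged sample sentence are illustrative, not part of the original study.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag); Penn-style tags are an assumption


def find_it_adj_to(tokens: List[Token], max_gap: int = 3) -> List[Tuple[int, int, str]]:
    """Mimic the query  "it|It" [pos="VB.*"] []{0,3} [pos="J.*"] "to".

    Returns (start, end, adjective) per hit; `end` is the index of "to".
    """
    hits = []
    for i, (word, _) in enumerate(tokens):
        if word not in ("it", "It"):
            continue
        # the token right after "it" must carry a verb tag (VB.*)
        if i + 1 >= len(tokens) or not re.fullmatch(r"VB.*", tokens[i + 1][1]):
            continue
        # allow 0..max_gap arbitrary tokens before the adjective
        for gap in range(max_gap + 1):
            j = i + 2 + gap
            if j + 1 >= len(tokens):
                break
            if re.fullmatch(r"J.*", tokens[j][1]) and tokens[j + 1][0] == "to":
                hits.append((i, j + 1, tokens[j][0]))
                break  # keep only the first (shortest) match for this "it"
    return hits


# Hand-tagged version of the C1-Linguistics example (tags are illustrative)
sample = [
    ("It", "PRP"), ("is", "VBZ"), (",", ","), ("however", "RB"), (",", ","),
    ("possible", "JJ"), ("to", "TO"), ("call", "VB"), ("this", "DT"),
    ("result", "NN"), ("into", "IN"), ("question", "NN"), (".", "."),
]
print(find_it_adj_to(sample))
```

The three tokens between "is" and "possible" show why the query allows a gap of up to three arbitrary tokens.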

slide-12
SLIDE 12

Methodology

Examples


slide-13
SLIDE 13

Methodology

Stance & meaning groups

EPISTEMIC stance

POSSIBILITY

  • possible, feasible
  • impossible, unfeasible

ATTITUDINAL stance

IMPORTANCE

  • important, necessary, relevant, vital, essential, significant
  • trivial, unimportant, unnecessary

COMPLEXITY

  • difficult, hard
  • easy, simple

Other

  • interesting, intriguing
  • worthwhile, worth
  • natural, common, customary
  • reasonable, plausible
  • useful, instructive, advantageous, helpful
  • sufficient, enough
  • desirable

(classified according to WordNet)
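The meaning groups on this slide can be expressed as a simple lookup. This is a sketch only, assuming the adjective lists as far as they can be read from the slide. The names MEANING_GROUPS, ADJ_TO_GROUP and meaning_group are hypothetical, and mapping every unlisted adjective to OTHERS is an assumption (the original classification was done with WordNet, which is not reproduced here).

```python
# Adjective lists as read from the slide; anything not listed falls into OTHERS
MEANING_GROUPS = {
    "POSSIBILITY": ["possible", "feasible", "impossible", "unfeasible"],
    "IMPORTANCE": [
        "important", "necessary", "relevant", "vital", "essential",
        "significant", "trivial", "unimportant", "unnecessary",
    ],
    "COMPLEXITY": ["difficult", "hard", "easy", "simple"],
}

# Invert the mapping once so each lookup is a single dictionary access
ADJ_TO_GROUP = {
    adj: group for group, adjs in MEANING_GROUPS.items() for adj in adjs
}


def meaning_group(adjective: str) -> str:
    """Return the meaning group for an adjective (case-insensitive)."""
    return ADJ_TO_GROUP.get(adjective.lower(), "OTHERS")


for adj in ["possible", "important", "easy", "interesting"]:
    print(adj, "->", meaning_group(adj))
```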

slide-14
SLIDE 14

Analysis

Stance as expressed by the it is ADJ to-INF pattern

  • differences / commonalities across different registers of DaSciTex in terms of stance
  • Do the ‘mixed disciplines’ show differences in comparison to computer science and their ‘pure disciplines’?

slide-15
SLIDE 15

Analysis 1 – Results

Epistemic vs. attitudinal stance

subcorpus   epistemic        attitudinal
            F      %         F      %
A           186    32.75     382    67.25
B1          72     29.51     172    70.49
B2          144    33.64     284    66.36
B3          133    28.60     332    71.40
B4          164    38.86     258    61.14
C1          129    32.74     265    67.26
C2          75     35.38     137    64.62
C3          153    36.17     270    63.83
C4          205    35.59     371    64.41

A: computer science
Mixed disciplines: B1: computational linguistics, B2: bioinformatics, B3: computer aided design, B4: microelectronics
Pure disciplines: C1: linguistics, C2: biology, C3: mechanical engineering, C4: electrical engineering
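The percentages in this table follow directly from the frequencies: each share is the frequency divided by the row total of epistemic plus attitudinal instances. A small sketch that recomputes them from the F columns; the dictionary COUNTS and the helper stance_shares are illustrative names.

```python
# (epistemic F, attitudinal F) per subcorpus, copied from the table
COUNTS = {
    "A": (186, 382),
    "B1": (72, 172), "B2": (144, 284), "B3": (133, 332), "B4": (164, 258),
    "C1": (129, 265), "C2": (75, 137), "C3": (153, 270), "C4": (205, 371),
}


def stance_shares(epistemic: int, attitudinal: int) -> tuple:
    """Return (epistemic %, attitudinal %) rounded to two decimals."""
    total = epistemic + attitudinal
    return (round(100 * epistemic / total, 2),
            round(100 * attitudinal / total, 2))


for name, (epi, att) in COUNTS.items():
    epi_pct, att_pct = stance_shares(epi, att)
    print(f"{name:<3} epistemic {epi_pct:6.2f} %   attitudinal {att_pct:6.2f} %")
```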

slide-16
SLIDE 16

Analysis 1 – Results

Summary

Within the it is ADJ to-INF pattern

– more attitudinal stance expressed by the IMPORTANCE and COMPLEXITY-group
– less epistemic stance expressed by the POSSIBILITY-group

slide-17
SLIDE 17

Analysis 2 – Results

Meaning groups

[Chart: proportions (0% to 100%) of the meaning groups POSSIBILITY, IMPORTANCE, COMPLEXITY and OTHERS]

slide-18
SLIDE 18

Analysis 2 – Results

IMPORTANCE-group

[Bar chart: share of the IMPORTANCE-group in %, mixed disciplines vs. pure disciplines. Data labels as extracted (assignment to subcorpora not recoverable): 28,28; 28,27; 40; 35,55; 27,66; 28,3; 36,41; 25,87; 12,5]

slide-19
SLIDE 19

Analysis 2 – Results

COMPLEXITY-group

[Bar chart: share of the COMPLEXITY-group in %, mixed disciplines vs. pure disciplines. Data labels as extracted (assignment to subcorpora not recoverable): 35,39; 31,15; 24,07; 22,80; 18,72; 22,59; 25,00; 18,20; 25,17]

slide-20
SLIDE 20

Analysis 2 – Results

Significant differences in DaSciTex

corpora    p-value      signif.   direction
B1 – A     3.099e-07    s         +
B2 – A     5.979e-10    s         +
B3 – A     < 2.2e-16    s         +
B4 – A     < 2.2e-16    s         +
B1 – C1    0.0385       s         +
B2 – C2    0.8106       ns
B3 – C3    0.07039      ns
B4 – C4    5.099e-05    s         +

[direction is given per meaning group (POSSIBILITY, IMPORTANCE, COMPLEXITY, OTHERS); the column placement of the marks is not recoverable]

A: computer science
Mixed disciplines: B1: computational linguistics, B2: bioinformatics, B3: computer aided design, B4: microelectronics
Pure disciplines: C1: linguistics, C2: biology, C3: mechanical engineering, C4: electrical engineering
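The slides do not state which test produced these p-values, and they do not give the per-group counts behind them. Purely as an illustration of how such a pairwise comparison can be computed, here is a sketch of a Pearson chi-square test on a 2×2 table. Both the choice of test and the input are assumptions: the input uses the epistemic/attitudinal frequencies of B4 and A from Analysis 1, not the meaning-group counts, so the result is not comparable to the p-values on this slide. chi_square_2x2 is a hypothetical helper.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: B4, A; columns: epistemic, attitudinal (frequencies from Analysis 1)
chi2, p = chi_square_2x2(164, 258, 186, 382)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

For tables with more than two columns, such as the four meaning groups, a library routine like scipy.stats.chi2_contingency would be the more usual choice.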

slide-21
SLIDE 21

Analysis 2 – Results

Summary Mixed disciplines

  • 1. make more use of the IMPORTANCE-group than computer science (A)
  • 2. bioinformatics (B2) and computer aided design (B3) similar to their pure disciplines
  • 3. very pronounced distinctness of microelectronics (B4) (differs in the same way from A and C4)
  • 4. less pronounced difference of computational linguistics (B1)

slide-22
SLIDE 22

Analysis 3

Thing evaluated

Examples

1. It is important to evaluate the winglets [...] (C3)
2. Thus, it is important to model the functionality (C4)
3. It is important to note that the shape [...] (C3)
4. At this point, however, it is important to highlight the following [...] (C4)

slide-23
SLIDE 23

Analysis 3 – Results

important + cognitive verb

lexical verb   F      %
note           152    58.91
understand     18     6.98
consider       17     6.59
observe        14     5.43
realize        10     3.88
notice         9      3.49
recognize      6      2.33
remark         5      1.94
remember       5      1.94

important + note: different functional status; formulaic expression with stylistic meaning?!

slide-24
SLIDE 24

Analysis 3 – Results

ADJ + note within the it is ADJ to-INF pattern

ADJ            Frequency   %
important      152         51.70
interesting    109         37.07
worthwhile     10          3.40
worth          6           2.04
worthy         3           1.02
instructive    2           0.68
easy           2           0.68
significant    2           0.68
pertinent, surprising, critical, essential, useful, possible, crucial, sufficient (each occurring once)   8   2.72
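A frequency table of this kind can be derived from extracted pattern instances by counting the adjective of every hit that contains a given verb. A sketch, assuming each instance is stored as an (adjective, verb) pair; the helper adjective_table is a hypothetical name and the sample instances are made up for illustration, not corpus data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def adjective_table(instances: Iterable[Tuple[str, str]],
                    verb: str) -> List[Tuple[str, int, float]]:
    """Count adjectives co-occurring with `verb`; return (ADJ, frequency, %) rows."""
    counts = Counter(adj.lower() for adj, v in instances if v.lower() == verb)
    total = sum(counts.values())
    # most_common() sorts by descending frequency
    return [(adj, freq, round(100 * freq / total, 2))
            for adj, freq in counts.most_common()]


# Invented sample of (adjective, verb) pairs from "it is ADJ to VERB" hits
sample_instances = [
    ("important", "note"), ("interesting", "note"), ("important", "note"),
    ("important", "evaluate"), ("worth", "note"),
]
for row in adjective_table(sample_instances, "note"):
    print(row)
```

Applied to the full set of hits, the same calculation gives the shares in the table, e.g. 152 of 294 instances for important.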

slide-25
SLIDE 25

Analysis 3 – Results

Occurrences of note in DaSciTex

Type of occurrence                    Frequency   %
it is ADJ to note                     294         64.33
note not within the pattern           163         35.67
note (base form) total in DaSciTex    457

slide-26
SLIDE 26

Analysis 3 – Results

it is ADJ to note within DaSciTex

[Bar chart: share of it is ADJ to note in %, mixed disciplines vs. pure disciplines. Data labels as extracted (assignment to subcorpora not recoverable): 11,9; 20,75; 13,95; 10,2; 7,82; 11,9; 7,82; 7,82; 10,2; 9,52]

slide-27
SLIDE 27

Analysis 3 – Results

Summary

it is ADJ to note

  • most often used with important and interesting
  • basic form of note in DaSciTex more often used within the it is ADJ to-INF pattern; seems to be used as a formulaic expression
  • relatively frequently used by microelectronics (B4)

slide-28
SLIDE 28

Discussion & Future Work

  • investigate additional verbs occurring within the it is ADJ to-INF pattern (process types: mental, material, verbal, relational)
  • investigate additional patterns to find more evidence of the tendencies of cross-disciplinary variation
  • explore the constraints between evaluative category and thing evaluated for

– potentially discriminatory effects between scientific disciplines
– automatic attribution of the value of the evaluative category to the thing evaluated

  • explore automated approaches for annotation of interpersonal expressions and probabilistic methods for corpus comparison

slide-29
SLIDE 29

Discussion & Future Work

  • knowledge of how evaluative patterns are constructed brings a better understanding of interpersonal meaning, e.g. stance
  • the pattern approach allows a fairly easy identification of particular stance expressions in large corpora
  • this knowledge may be used to improve existing systems in sentiment analysis

– i.e., the classification approach and its extraction pattern learning algorithms (Wiebe and Riloff (2003)) and
– the evaluative category and the thing evaluated could be automatically identified

  • interpersonal meaning is context-dependent (register)

slide-30
SLIDE 30

Thank you for your attention! Any questions?

slide-31
SLIDE 31

References

Douglas Biber, Stig Johansson, and Geoffrey Leech. Longman Grammar of Spoken and Written English. Longman, Harlow, 1999.

Susan Conrad and Douglas Biber. Adverbial marking of stance in speech and writing. In Susan Hunston and Geoff Thompson, editors, Evaluation in Text, Authorial Stance and the Construction of Discourse, pages 56–73. Oxford University Press Inc., New York, 2003.

Stefania Degaetano. Evaluation in Academic Research Articles across Scientific Disciplines. Master’s thesis, Technische Universität Darmstadt, 2010.

Stefan Evert. The CQP Query Language Tutorial. IMS Stuttgart, 2005. CWB version 2.2.b90.

M.A.K. Halliday. Explorations in the functions of language. Arnold, London, 1973.

M.A.K. Halliday. An Introduction to Functional Grammar. Arnold, London, 1985.

Ken Hyland. Academic Discourse, editor. Continuum, London, 2009.

Susan Hunston and Gill Francis. Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English. Studies in Corpus Linguistics. John Benjamins Publishing, Amsterdam/Philadelphia, 2000.

slide-32
SLIDE 32

References

Susan Hunston and Geoff Thompson, editors. Evaluation in Text: Authorial stance and the construction of discourse. Oxford University Press, Oxford, 2003.

Bing Liu. Sentiment Analysis and Subjectivity. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing. CRC Press, Goshen, Connecticut, USA, 2nd edition, 2010.

Jim R. Martin. Beyond Exchange: APPRAISAL Systems in English. In Susan Hunston and Geoff Thompson, editors, Evaluation in Text, Authorial Stance and the Construction of Discourse, pages 56–73. Oxford University Press Inc., New York, 2003.

Jim R. Martin and Peter R.R. White. The Language of Evaluation, Appraisal in English. Palgrave Macmillan, London & New York, 2005.

Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2):1–135, 2008.

Randolph Quirk, Sidney Greenbaum, Jan Svartvik, and Geoffrey Leech. A Comprehensive Grammar of the English Language. Longman, 1985.

slide-33
SLIDE 33

References

Marga Reis. On sentence types in German: An enquiry into the relationship between grammar and pragmatics. Interdisciplinary Journal for Germanic Linguistics and Semiotics Analysis, 4:195–236, 1999.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics, forthcoming.

Elke Teich and Mônica Holtz. Scientific registers in contact. An exploration of the lexico-grammatical properties of interdisciplinary discourses. International Journal of Corpus Linguistics, 14(4):524–548, 2009.

Elke Teich and Peter Fankhauser. Exploring a corpus of scientific texts using data mining. In Gries S., Wulff S. and M. Davies, editors, Corpus-linguistic applications: Current studies, new directions, Rodopi, Amsterdam and New York: 233-247, 2010.

Janyce Wiebe and Ellen Riloff. Learning extraction patterns for subjective expressions. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan, July 2003.