SLIDE 1

Visualising Linguistic Evolution in Academic Discourse

Verena Lyding, Ekaterina Lapshinova, Stefania Degaetano, Henrik Dittmann, Chris Culy

Joint Workshop of LINGVIS & UNCLH EACL-2012 Avignon, France

V.Lyding, E.Lapshinova (EURAC, Saarland U) Visualising Linguistic Evolution 23 April 2012 1 / 32

SLIDE 2

Overview

1. Introduction

2. Data to Analyse
  • Lexico-grammatical Features
  • Resources & Feature Extraction

3. Structured Parallel Coordinates
  • SPC Visualisation
  • Customisation and Interactive Features
  • Visual Analysis of Registers with SPC

4. Interpreting Visualisation Results
  • Case Study I - changes in variable TENOR
  • Case Study II - changes in variable FIELD

5. Conclusion and Future Work

SLIDE 3

REGICO: Registers in Contact
Ekaterina Lapshinova, Stefania Degaetano, Elke Teich
FR 4.6 Applied Linguistics, Interpreting and Translation Studies, Saarland University, Saarbrücken

LInfoVis
Verena Lyding, Henrik Dittmann, Chris Culy
Institute for Specialised Communication and Multilingualism, EURAC, Bozen-Bolzano

SLIDE 4

Introduction

Aims

Create procedures to visualise diachronic language change in academic discourse with the help of Structured Parallel Coordinates (SPC), cf. (Culy et al., 2011) ⇒ to facilitate the analysis and interpretation of complex data

Motivation

  • study diachronic changes with a focus on contact registers
  • changes are reflected by linguistic features
  • we determine and describe tendencies of features, which may become rarer or more frequent, or cluster in new ways
⇒ challenge: the amount and complexity of the interrelated data

SLIDE 5

Data to Analyse: Lexico-grammatical Features

Register Analysis

Registers are patterns of language according to use in context
  • cf. (Halliday & Hasan, 1989)

Linguistic variation according to contexts of use, described by the variables field, tenor and mode
  • cf. Systemic Functional Linguistics (SFL) and register theory, e.g. (Quirk, 1985), (Halliday & Hasan, 1989) and (Biber, 1995)

Particular settings of these variables are associated with certain lexico-grammatical features ⇒ their co-occurrences indicate distinctive registers (e.g., the language of linguistics in academic discourse).

SLIDE 6

Data to Analyse: Lexico-grammatical Features

Recent Language Change

Changes in contexts of use (variables) and in language use (features) ⇒ features become rarer or more frequent, or cluster in novel ways ⇒ existing registers become obsolete, new ones evolve
  • cf. (Mair, 2006): changes in preferences of lexico-grammatical selection in English in the 1960s vs. the 1990s

Our focus: new registers that evolve through contact between disciplines (e.g. the language of bioinformatics, a contact register between biology and computer science)

SLIDE 7

Data to Analyse: Lexico-grammatical Features

Case Study I - changes in variable TENOR

TENOR: modality, i.e. modal verbs grouped according to (Biber, 1999) into obligation, permission and volition

Categories of meaning (feature) and their realisation:
  • obligation/necessity: must, should, etc.
  • permission/possibility/ability: can, could, may, etc.
  • volition/prediction: will, would, shall, etc.
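The following Python sketch roughly illustrates how the grouping above becomes feature frequencies per subcorpus. It makes three simplifying assumptions. First, it uses a plain lemma lookup. The corpus itself relies on the annotated `modal_meaning` attribute, which a lookup only approximates. Second, it includes only the modals listed above. Third, it normalises per million tokens. The function name and the normalisation base are hypothetical.

```python
from collections import Counter

# Hypothetical lookup built only from the modals listed on this slide (Biber, 1999);
# the "etc." members of each category are deliberately left out.
MODAL_MEANING = {
    "must": "obligation", "should": "obligation",
    "can": "permission", "could": "permission", "may": "permission",
    "will": "volition", "would": "volition", "shall": "volition",
}
CATEGORIES = ("obligation", "permission", "volition")


def modal_meaning_frequencies(lemmas, per=1_000_000):
    """Frequency of each modal meaning in a subcorpus, normalised per `per` tokens."""
    if not lemmas:
        # An empty subcorpus has no modals; avoid dividing by zero.
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(MODAL_MEANING[lemma] for lemma in lemmas if lemma in MODAL_MEANING)
    return {category: counts[category] * per / len(lemmas) for category in CATEGORIES}
```

For example, `modal_meaning_frequencies(["we", "must", "use", "it", "can", "run", "and", "will", "stop", "can"], per=1000)` returns `{'obligation': 100.0, 'permission': 200.0, 'volition': 100.0}`.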

SLIDE 8

Data to Analyse: Lexico-grammatical Features

Case Study II - changes in variable FIELD

FIELD: verb valency patterns
  • competing grammatical variants, e.g. valency patterns, show trends in the development of grammatical features, cf. (Mair, 2006)

Valency patterns (feature) and examples:
  • VERB+inf: help do sth.
  • VERB+obj+inf: help sb. do sth.
  • VERB+to-inf: help to do sth.

SLIDE 9

Data to Analyse: Resources & Feature Extraction

SciTex

  • DaSciTex: from the early 2000s, approx. 17 million words
  • SaSciTex: from the 1970s/early 1980s, approx. 17 million words

[Figure: SciTex disciplines: computer science (A), mixed disciplines (B1-B4) and specialised disciplines (C1-C4)]
  • COMPUTER SCIENCE (A)
  • COMPUTATIONAL LINGUISTICS (B1) and LINGUISTICS (C1)
  • BIOINFORMATICS (B2) and BIOLOGY (C2)
  • DIGITAL CONSTRUCTION (B3) and MECHANICAL ENGINEERING (C3)
  • MICROELECTRONICS (B4) and ELECTRICAL ENGINEERING (C4)

  • cf. (Degaetano et al., 2012) and (Teich & Fankhauser, 2010)

SLIDE 10

Data to Analyse: Resources & Feature Extraction

Corpus Annotations

  • automatic: token, lemma, part-of-speech, chunk; text register, text year, division, etc. (metadata)
  • semi-automatic: cohesive devices, evaluative patterns
  • manual: transitivity

SLIDE 11

Data to Analyse: Resources & Feature Extraction

Extraction from Corpora

with the Corpus Query Processor (CQP), cf. (Evert, 2005)
  • positional attributes: word, pos, lemma
  • structural attributes: s, text, text_title, text_author, text_year, text_ad, cohesion, cohesion_device, modal, modal_meaning, evaluation, evaluation_pattern

SLIDE 12

Data to Analyse: Resources & Feature Extraction

Examples of Extraction

Case I Extraction: Modal Meanings

Query building blocks (comments in brackets), each surrounded by context, and sentences extracted from SciTex:
  • [_._modal_meaning="permission"] (category of permission) [pos="V.*"] (verb): Each edge can transmit a single packet in each time step
  • [_._modal_meaning="obligation"] (category of obligation) [pos="V.*"] (verb): S must remove at least bj jobs
  • [_._modal_meaning="volition"] (category of volition) [pos="V.*"] (verb): We shall use s adversary trees

SLIDE 13

Data to Analyse: Resources & Feature Extraction

Examples of Extraction

Case II Extraction: Valency Patterns

Query building blocks (comments in brackets):
  [pos="V.*"&lemma="help"]          (verb help)
  [pos="TO"]?                       (optional to)
  (                                 (object start)
  [pos="DT|PP|PDT"]?                (one or none determiner)
  [pos="RB.*|JJ.*|VVN|N.*"]{0,3}    (up to 3 modifiers)
  [pos="POS"]?                      (one or none possessive)
  [pos="N.*|PP"]?                   (noun or pronoun)
  )                                 (object end)
  [pos="V(V|B|H)"]                  (infinitive)

Sentences extracted from SciTex and their valency patterns:
  • It also helps organise routine review of recordings ⇒ VERB+inf
  • The power available with the system helps the programmer refrain from resisting changes ⇒ VERB+obj+inf
  • Lemma 1 helps to set the inductive basis for k ⇒ VERB+to-inf
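For readers without a CWB installation, the query above can be approximated in Python. This is a hedged sketch, not the authors' extraction code. It makes three assumptions:

  • tokens are available as (word, lemma, tag) triples with TreeTagger-style tags;
  • each token is encoded as the string `lemma/TAG ` (the names `encode`, `tok`, `HELP_QUERY` and `classify_help` are hypothetical);
  • each building block translates one by one into a regular expression.

The valency pattern is then read off the match. A `to` gives VERB+to-inf, a non-empty object gives VERB+obj+inf, and anything else gives VERB+inf.

```python
import re


def encode(tokens):
    """Encode (word, lemma, pos) triples as 'lemma/TAG ' chunks for regex matching."""
    return "".join(f"{lemma}/{pos} " for _word, lemma, pos in tokens)


def tok(tags):
    """Regex for one token whose whole POS tag matches the alternation `tags`."""
    return rf"[^ ]+/(?:{tags}) "


# One line per CQP building block; the trailing space in tok() anchors each tag,
# as CQP anchors its attribute regexes.
HELP_QUERY = re.compile(
    r"(?<!\S)help/V[^ /]* "                                  # [pos="V.*"&lemma="help"]
    + rf"(?P<to>{tok('TO')})?"                                # [pos="TO"]?
    + "(?P<obj>"                                              # ( object start
    + rf"(?:{tok('DT|PP|PDT')})?"                             # [pos="DT|PP|PDT"]?
    + rf"(?:{tok('RB[^ /]*|JJ[^ /]*|VVN|N[^ /]*')}){{0,3}}"   # up to 3 modifiers
    + rf"(?:{tok('POS')})?"                                   # [pos="POS"]?
    + rf"(?:{tok('N[^ /]*|PP')})?"                            # [pos="N.*|PP"]?
    + ")"                                                     # ) object end
    + tok("VV|VB|VH")                                         # [pos="V(V|B|H)"] infinitive
)


def classify_help(tokens):
    """Return the valency pattern of the first HELP match, or None if nothing matches."""
    match = HELP_QUERY.search(encode(tokens))
    if match is None:
        return None
    if match.group("to"):
        return "VERB+to-inf"
    return "VERB+obj+inf" if match.group("obj") else "VERB+inf"
```

The building blocks keep the slide's order, with the optional to placed before the object. As a result, "help sb. to do sth." is not matched, just as with the CQP query.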

SLIDE 14

Data to Analyse: Resources & Feature Extraction

Extraction Output

Preparation for Analysis
  • extracted material is sorted according to registers
  • data is transformed into JSON format for input to SPC

Analysis Aims
  • register analysis of A-B-C triples ⇒ whether B disciplines are more similar to A or to C, or distinct from both
  • diachronic analysis ⇒ two time periods in SciTex (1970/80s vs. 2000s)
  • a more fine-grained diachronic analysis ⇒ by publication year
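The slide states only that the sorted extraction output is transformed into JSON for SPC; the actual SPC input schema is not given. A minimal sketch follows. It assumes one JSON record per register/feature pair, with one numerical field per time period so that each record can become one line across the time axes. All key names are hypothetical.

```python
import json


def to_spc_json(frequencies):
    """Group {(register, feature, time): value} into one JSON record per (register, feature).

    Each time period becomes its own numerical field; the key names are hypothetical.
    """
    records = {}
    for (register, feature, time), value in frequencies.items():
        record = records.setdefault((register, feature), {"register": register, "feature": feature})
        record[time] = value
    # Sort by (register, feature) so that the output order is stable.
    return json.dumps([records[key] for key in sorted(records)], indent=2)
```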

SLIDE 15

Structured Parallel Coordinates: SPC Visualisation

Structured Parallel Coordinates (SPC)

SPC (Culy et al., 2011) are a specialisation of the Parallel Coordinates visualisation (d’Ocagne, 1885; Inselberg, 1985, 2009).

The Parallel Coordinates visualisation provides:
  • a two-dimensional representation of multidimensional data
  • data dimensions on vertical axes, lined up horizontally
  • related data points connected by colored lines between the axes
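To make the Parallel Coordinates principle concrete, the hypothetical function below computes the vertices of one record's line. It follows three rules:

  • axes are spaced evenly along x;
  • categorical values are spread evenly along their axis;
  • numerical values are scaled linearly between the axis minimum and maximum.

The drawing size, the record layout and the function name are assumptions for illustration, not the SPC implementation.

```python
def polyline(record, axes, domains, width=600.0, height=300.0):
    """Vertices (x, y) of one record's line across the vertical axes.

    axes:    dimension names in left-to-right order
    domains: per axis, a list of categories (categorical axis) or a (min, max) pair (numerical axis)
    """
    if len(axes) < 2:
        raise ValueError("parallel coordinates need at least two axes")
    step = width / (len(axes) - 1)            # axes are spaced evenly along x
    points = []
    for i, axis in enumerate(axes):
        value, domain = record[axis], domains[axis]
        if isinstance(domain, list):          # categorical: categories spread evenly along the axis
            frac = domain.index(value) / max(len(domain) - 1, 1)
        else:                                 # numerical: linear scale between min and max
            lo, hi = domain
            frac = (value - lo) / (hi - lo) if hi > lo else 0.5
        points.append((i * step, height * (1 - frac)))  # screen coordinates: y grows downwards
    return points
```

Connecting the returned points in order draws the record's line. A categorical register axis followed by one numerical axis per time period mirrors the axis layout described for the subcorpus comparisons.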

SLIDE 16

Structured Parallel Coordinates: SPC Visualisation

Parallel Coordinates

[Figure: example visualising car features, taken from the Protovis page http://mbostock.github.com/protovis/ex/cars.html]

SLIDE 17

Structured Parallel Coordinates: SPC Visualisation

SPC for language data

  • adaptation of Parallel Coordinates to accommodate language data, e.g. as derived from corpora
  • customised for representing ordered characteristics within and across dimensions
    • e.g. in the n-grams with frequencies application of SPC, ordered axes represent the linear ordering of words in text
    • e.g. visual separation of ordered and unordered axes
  • refined modes of interaction
    • e.g. non-contiguous selection of values

SLIDE 18

Structured Parallel Coordinates: SPC Visualisation

N-grams with Frequencies application

Pronouns used with happy and sad ⇒ It is sad > It’s sad > One was sad > It was sad > We were sad

SLIDE 19

Structured Parallel Coordinates: Customisation and Interactive Features

SPC for analysing language change

In SPC, data dimensions are placed on different axes:
  • subcorpus characteristics, lexico-grammatical features, and their frequencies

Numerical axes are ordered according to the time or register of the subcorpus, i.e.
  • by time: corpus from the 1970/80s → corpus from the 2000s
  • by register: computer science → mixed discipline (e.g. bioinformatics) → specialised discipline (e.g. biology)

SLIDE 20

Structured Parallel Coordinates: Customisation and Interactive Features

Subcorpus Comparisons - adjustments

For analysing language change:
  • changes in linguistic features over time
  • changes in linguistic features across registers

SPC subcorpus comparison application, based on the n-grams with frequencies application:
  • ordered numerical axes follow (unordered) categorical axes
  • discrete line coloring for the distinction of categorical variables
  • switching between comparable and individual numerical scales
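The switch between comparable and individual numerical scales can be pictured as a choice of axis domains. The sketch below is an assumption about how such a switch might work, not the SPC code. With `comparable=True`, every numerical axis shares one (min, max) range. Otherwise, each axis spans only its own values. The function name is hypothetical.

```python
def numeric_domains(records, numeric_axes, comparable=True):
    """(min, max) per numerical axis; comparable=True gives all axes one shared range."""
    per_axis = {
        axis: (min(r[axis] for r in records), max(r[axis] for r in records))
        for axis in numeric_axes
    }
    if not comparable:
        return per_axis                       # individual scales: each axis spans its own values
    lo = min(low for low, _ in per_axis.values())
    hi = max(high for _, high in per_axis.values())
    return {axis: (lo, hi) for axis in numeric_axes}  # comparable scales: one shared range
```

With a shared range, a line rising from the 1970/80s axis to the 2000s axis reflects an actual increase in frequency. Individual ranges instead spread the lines out within each axis, which makes small differences easier to see.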

SLIDE 21

Structured Parallel Coordinates: Customisation and Interactive Features

SPC subcorpus comparison application

Visualising linguistic features by register and time

[Figure: visualisation of HELP plus complements by register]

SLIDE 22

Structured Parallel Coordinates: Customisation and Interactive Features

Visual analysis of registers

The SPC visualisation allows for the display of multidimensional data and for dynamic interaction with the data:

  • comparable vs. individual numerical scales
  • discrete vs. scaled coloring of lines

→ OVERVIEW

  • selection of data points for dynamic filtering
  • line coloring according to axis in focus

→ FOCUS

  • highlighting of axes on mouseover
  • written summary of the record

→ DETAILS

⇒ support for the detection of patterns, tendencies and outliers
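Dynamic filtering with non-contiguous selections can be thought of as a per-axis membership test. The sketch below is an assumption about the logic, not SPC code. A record stays visible only if its value on every constrained axis falls within the selected categories or within any of the selected ranges. The function name and record layout are hypothetical.

```python
def filter_records(records, selections):
    """Keep the records that satisfy every axis selection.

    selections maps an axis name to either a set of categories (categorical axis)
    or a list of (lo, hi) ranges (numerical axis); several ranges on the same axis
    form a non-contiguous selection.
    """
    def selected(value, selection):
        if isinstance(selection, (set, frozenset)):
            return value in selection
        return any(lo <= value <= hi for lo, hi in selection)

    return [
        record for record in records
        if all(selected(record[axis], sel) for axis, sel in selections.items())
    ]
```

For instance, the following selection keeps the A and C1 lines with low or high values in the 2000s and hides everything in between:

```python
{"register": {"A", "C1"}, "2000s": [(0, 10), (90, 100)]}
```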

SLIDE 23

Interpreting Visualisation Results: Case Study I - changes in variable TENOR

Analysing modal meanings

Investigation of changes in usage: obligation, permission and volition ⇒ DEMO

SLIDE 24

Interpreting Visualisation Results: Case Study I - changes in variable TENOR

Modal Meanings by Time

  • selection of permission, focus on registers: a remarkably smaller increase for some data sets → Electrical Engineering domain (C4)
  • selection of single disciplines, focus on registers:
    • A-B1-C1 (Linguistics): B is closer to C than to A for all modal meanings in the 1970/80s
    • A-B2-C2 (Biology): no remarkable differences in tendency for volition; a stronger decrease in C than in A and B for obligation; for permission, the increase in B lies between the increases for C and A

Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons

SLIDE 25

Interpreting Visualisation Results: Case Study I - changes in variable TENOR

Modal Meanings by Register

  • focus on time: modal meanings behave similarly over time
  • detailed analysis with selection of single modal meanings:
    • obligation: strongest decrease for B3 to C3; strongest increase for B1-C1 and B2-C2
    • permission: strongest decrease for B1 to C1; strongest increase for B3-C3
    • volition: strongest decrease for B2 to C2
  • detailed analysis with selection of Biology:
    • e.g. focus on obligation: contrary tendencies for B and C over time
    • focus on registers, normalised values: C values remained stable, B values decreased over time

Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons
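The tendencies above were read off the visualisation. A numerical counterpart could rank registers by how much a feature changed between the two periods. The sketch below assumes that change is measured as the relative difference between the periods; the slides do not specify a measure. Function names and the input layout are hypothetical.

```python
def relative_change(frequencies, old="1970/80s", new="2000s"):
    """Relative change per register: (new - old) / old, skipping registers with old == 0."""
    return {
        register: (periods[new] - periods[old]) / periods[old]
        for register, periods in frequencies.items()
        if periods[old]
    }


def strongest_decrease_and_increase(changes):
    """Registers with the most negative and the most positive change."""
    return min(changes, key=changes.get), max(changes, key=changes.get)
```

For example, take invented values of 100 → 150 for B1 and 200 → 100 for C1. Here `relative_change` gives 0.5 and -0.5, so C1 shows the strongest decrease and B1 the strongest increase.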

SLIDE 26

Interpreting Visualisation Results: Case Study II - changes in variable FIELD

Analysing verb complements

Investigation of changes in usage: HELP plus complements ⇒ DEMO

SLIDE 27

Interpreting Visualisation Results: Case Study II - changes in variable FIELD

HELP plus Complements by Time

  • focus on verbs: frequency ordering of verb constructions for all registers: HELP+To+Inf, HELP+Inf (1970/80s) ≥ HELP+Obj+Inf (catching up in the 2000s)
  • selection of HELP+To+Inf, focus on disciplines: increase over time for B3 (Mechanical), decrease in A and B4 (Electrical), moderate changes in the other disciplines
  • selection of HELP+Inf and HELP+Obj+Inf, focus on verbs: some distinct tendencies:
    • A, B3 and B4/C4 strongly increasing
    • B1/C1 and B2/C2 changing moderately

Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons

SLIDE 28

Interpreting Visualisation Results: Case Study II - changes in variable FIELD

HELP plus Complements by Register

  • focus on verbs: HELP+Inf behaves most uniformly across all registers
  • selection of HELP+Inf, focus on time: relatively stable across subdisciplines, differences between B2 and C2
  • selection of HELP+Obj+Inf, focus on disciplines: relative occurrences in B3 and C3 inverted from the 1970/80s to the 2000s
  • registers laid out in detail, focus on verbs:
    • B3/C3 show inverted tendencies over time for HELP+Obj+Inf and, less so, for HELP+To+Inf
    • B4/C4 show relative stability over the time periods for all verb constructions

Link: www.eurac.edu/linfovis > LInfoVis programs and resources > Structured Parallel Coordinates > (sub)corpus comparisons

SLIDE 29

Conclusion and Future Work

Conclusion

  • we could show that visualisation allows us to gain an overview and detect tendencies → a complex set of data in one display
  • interactive features allow us to focus dynamically on different aspects of the data → filtering and highlighting of specific subsets for detailed analyses

⇒ SPC facilitate our diachronic register analysis

SLIDE 30

Conclusion and Future Work

Future Work

Data Analysis
  • use different data layouts to feed several SPC visualisations
  • focus on further features for the three contextual variables, e.g. conjunctive relations expressing cohesion for mode
  • analyse several linguistic features at the same time (feature sets for register variation)
  • provide a more fine-grained diachronic analysis (by publication year)

SLIDE 31

Conclusion and Future Work

Future Work continued

Technical Enhancements
  • function for automatically restructuring the underlying data to create different layouts, e.g. merging axes with categorical values (e.g., the register and discipline axes)
  • introduction of a 'summary' category on each data dimension (the sum of all individual values)
  • function for selecting data items based on crossings or declination of their connecting lines
  • changing the visualisation of overlapping lines (e.g. using semi-transparent or stacked lines)
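One of the planned enhancements is selecting data items by crossings or declination of their connecting lines. This could rest on simple pairwise comparisons between two adjacent axes. The sketch below is hypothetical: the function names and record layout are assumptions, and "declination" is read here simply as a line that goes down from left to right.

```python
def crossing_pairs(records, left, right):
    """Index pairs of records whose lines cross between two adjacent axes.

    Two lines cross when their order on the left axis is the reverse of their
    order on the right axis. An increasing linear scale keeps the order of values
    on each axis, so the result does not depend on comparable vs. individual scales.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            d_left = records[i][left] - records[j][left]
            d_right = records[i][right] - records[j][right]
            if d_left * d_right < 0:          # opposite signs: the order flips
                pairs.append((i, j))
    return pairs


def declining(records, left, right):
    """Indices of records whose value drops from the left axis to the right axis.

    Compares raw values, which matches the drawn slope only with comparable scales.
    """
    return [i for i, record in enumerate(records) if record[right] < record[left]]
```

The pairwise loop is quadratic in the number of lines. That is likely acceptable for the few dozen register/feature lines in a subcorpus comparison, but it would need a sweep-based approach for much larger data sets.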

SLIDE 32

Thank you!

Questions? Comments? Suggestions?

verena.lyding@eurac.edu
e.lapshinova@mx.uni-saarland.de
www.eurac.edu/linfovis