 
Visualising Linguistic Evolution in Academic Discourse
Verena Lyding, Ekaterina Lapshinova, Stefania Degaetano, Henrik Dittmann, Chris Culy
Joint Workshop of LINGVIS & UNCLH, EACL-2012, Avignon, France
V.Lyding, E.Lapshinova (EURAC, Saarland U) Visualising Linguistic Evolution 23 April 2012 1 / 32

Overview
1 Introduction
2 Data to Analyse
    Lexico-grammatical Features
    Resources & Feature Extraction
3 Structured Parallel Coordinates
    SPC Visualisation
    Customisation and Interactive Features
    Visual Analysis of Registers with SPC
4 Interpreting Visualisation Results
    Case Study I - changes in variable TENOR
    Case Study II - changes in variable FIELD
5 Conclusion and Future Work

REGICO: Registers in Contact
  Ekaterina Lapshinova, Stefania Degaetano, Elke Teich
  FR 4.6 Applied Linguistics, Interpreting and Translation Studies
  Saarland University, Saarbrücken

LinfoVis
  Verena Lyding, Henrik Dittmann, Chris Culy
  Institute for Specialised Communication and Multilingualism
  EURAC, Bozen-Bolzano

Introduction

Aims
- create procedures to visualise diachronic language changes in academic discourse with the help of Structured Parallel Coordinates (SPC), cf. (Culy et al., 2011)
  ⇒ to facilitate analysis and interpretation of complex data

Motivation
- study diachronic changes with a focus on contact registers
- changes are reflected by linguistic features
- we determine and describe tendencies of features, which might become rarer, more frequent or cluster in new ways
  ⇒ the amount and complexity of the interrelated data call for visual support

Data to Analyse: Lexico-grammatical Features

Register Analysis
- Registers are patterns of language according to use in context, cf. (Halliday&Hasan, 1989)
- Linguistic variation according to contexts of use, with the variables field, tenor and mode, cf. Systemic Functional Linguistics (SFL) and register theory, e.g., (Quirk, 1985), (Halliday&Hasan, 1989) and (Biber, 1995)
- Particular settings of these variables are associated with certain lexico-grammatical features
  ⇒ co-occurrences of features indicate distinctive registers (e.g., the language of linguistics in academic discourse)

Recent Language Change
  changes in contexts of use (variables) and language use (features)
  ⇒ features become rarer or more frequent, and cluster in novel ways
  ⇒ existing registers become obsolete, new ones evolve
- cf. (Mair, 2006): changes in preferences of lexico-grammatical selection in English in the 1960s vs. the 1990s
- Our focus: new registers that evolve in contact of disciplines (e.g. the language of bioinformatics, a contact register to biology and computer science)

Case Study I - changes in variable tenor

TENOR: modality
modal verbs grouped according to (Biber, 1999): obligation, permission and volition

  categories of meaning (feature)     realisation
  obligation/necessity                must, should, etc.
  permission/possibility/ability      can, could, may, etc.
  volition/prediction                 will, would, shall, etc.
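
The grouping of modal verbs into meaning categories can be sketched as a simple lookup. This is a minimal illustration, not the annotation procedure used for SciTex. It follows the Biber (1999) grouping, in which must and should express obligation/necessity and can, could and may express permission/possibility/ability. Only the verbs named on the slide are included; the function names are hypothetical.

```python
from collections import Counter

# Only the modal verbs named on the slide are listed; the slide's "etc."
# is left open rather than filled in.
MODAL_MEANINGS = {
    "obligation": ["must", "should"],
    "permission": ["can", "could", "may"],
    "volition": ["will", "would", "shall"],
}

# Invert the table so that each verb maps to its meaning category.
VERB_TO_MEANING = {
    verb: meaning
    for meaning, verbs in MODAL_MEANINGS.items()
    for verb in verbs
}


def modal_meaning(token):
    """Return the meaning category of a modal verb, or None for other words."""
    return VERB_TO_MEANING.get(token.lower())


def count_meanings(tokens):
    """Count how often each meaning category occurs in a list of tokens."""
    return Counter(m for m in map(modal_meaning, tokens) if m is not None)


if __name__ == "__main__":
    print(count_meanings("We shall use this and it can fail".split()))
```

A lookup by word form ignores context, so it cannot separate, say, the permission and possibility readings of "may"; the slide's categories already merge those readings.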

Case Study II - changes in variable field

FIELD: verb valency patterns
Competing grammatical variants, e.g. valency patterns, show the trends in the development of grammatical features, cf. (Mair, 2006)

  valency patterns (feature)    example
  VERB+inf                      help do sth.
  VERB+obj+inf                  help sb. do sth.
  VERB+to-inf                   help to do sth.

Data to Analyse: Resources & Feature Extraction

SciTex
- DaSciTex: from the early 2000s, approx. 17 million words
- SaSciTex: from the 1970s/early 1980s, approx. 17 million words

[Figure: SciTex corpus design. Computer Science (A) in the centre; Linguistics (C1), Biology (C2), Mechanical Engineering (C3) and Electrical Engineering (C4) around it; the contact registers Computational Linguistics (B1), Bioinformatics (B2), Digital Construction (B3) and Microelectronics (B4) lie between A and C1-C4.]

cf. (Degaetano et al., 2012) and (Teich&Fankhauser, 2010)

Corpus Annotations
  automatic         token, lemma, part-of-speech, chunk;
                    text register, text year, division, etc. (metadata)
  semi-automatic    cohesive devices, evaluative patterns
  manual            transitivity

Extractions From Corpora
with the Corpus Query Processor (CQP), cf. (Evert 2005)

Positional Attributes:
  word, pos, lemma
Structural Attributes:
  s
  text, text_title, text_author, text_year, text_ad
  cohesion, cohesion_device
  modal, modal_meaning
  evaluation, evaluation_pattern

Examples of Extraction

Case I Extraction: Modal Meanings

  query building blocks               comments                  sentences extracted from SciTex
                                      context                   Each edge
  [_._modal_meaning="permission"]     category of permission    can
  [pos="V.*"]                         verb                      transmit
                                      context                   a single packet in each time step

                                      context                   S
  [_._modal_meaning="obligation"]     category of obligation    must
  [pos="V.*"]                         verb                      remove
                                      context                   at least bj jobs

                                      context                   We
  [_._modal_meaning="volition"]       category of volition      shall
  [pos="V.*"]                         verb                      use
                                      context                   s adversary trees
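
The logic of the two-token query (a token annotated with a modal meaning, followed by a verb) can be emulated on a small list of annotated tokens. The sketch below is not CQP and does not reproduce its behaviour in detail. The token dictionaries, the tag names and the function name are assumptions made for illustration.

```python
import re


def find_modal_verb_pairs(tokens, meaning):
    """Return (modal, verb) word pairs for one modal meaning category.

    Emulates a query of the form
        [modal meaning = <meaning>] [pos = "V.*"]
    on a list of token dicts with the keys "word", "pos" and,
    for modal verbs only, "modal_meaning".
    """
    hits = []
    for first, second in zip(tokens, tokens[1:]):
        is_modal = first.get("modal_meaning") == meaning
        is_verb = re.fullmatch(r"V.*", second["pos"]) is not None
        if is_modal and is_verb:
            hits.append((first["word"], second["word"]))
    return hits


# Hypothetical annotated tokens modelled on the "We shall use ..." example;
# the part-of-speech tags are illustrative.
tokens = [
    {"word": "We", "pos": "PP"},
    {"word": "shall", "pos": "MD", "modal_meaning": "volition"},
    {"word": "use", "pos": "VV"},
    {"word": "adversary", "pos": "NN"},
    {"word": "trees", "pos": "NNS"},
]

if __name__ == "__main__":
    print(find_modal_verb_pairs(tokens, "volition"))
```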

Examples of Extraction

Case II Extraction: Valency Patterns

  query building blocks                comments
  [pos="V.*" & lemma="help"]           verb
  [pos="TO"]?                          optional to
  (                                    object start
  [pos="DT|PP|PDT"]?                   one or none determiner
  [pos="RB.*|JJ.*|VVN|N.*"]{0,3}       up to 3 modifiers
  [pos="POS"]?                         one or none possessive
  [pos="N.*|PP"]?                      noun or pronoun
  )                                    object end
  [pos="V(V|B|H)"]                     infinitive

  sentences extracted from SciTex                                                      valency pattern
  It also helps organise routine review of recordings                                  ⇒ VERB+inf
  The power available with the system helps the programmer refrain from resisting changes   ⇒ VERB+obj+inf
  Lemma 1 helps to set the inductive basis for k                                       ⇒ VERB+to-inf
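
The way a match of this query maps to one of the three valency patterns can be sketched with a regular expression over the part-of-speech tags that follow the verb. This is an illustration under assumptions, not the extraction pipeline itself. The decision rule (a matched "to" gives VERB+to-inf, otherwise a non-empty object gives VERB+obj+inf, otherwise VERB+inf), the function name and the tags in the usage example are assumed.

```python
import re

# The tags are joined as "TAG TAG TAG " (one trailing space per tag), so
# every alternative below consumes whole tags only.
AFTER_HELP = re.compile(
    r"(?P<to>TO )?"                           # optional "to"
    r"(?P<obj>"                               # object start
    r"(?:(?:DT|PP|PDT) )?"                    # one or none determiner
    r"(?:(?:RB\S*|JJ\S*|VVN|N\S*) ){0,3}"     # up to 3 modifiers
    r"(?:POS )?"                              # one or none possessive
    r"(?:(?:N\S*|PP) )?"                      # noun or pronoun
    r")"                                      # object end
    r"V(?:V|B|H) "                            # infinitive
)


def classify_help(tagged):
    """Classify each use of the lemma "help" in (word, lemma, pos) triples."""
    patterns = []
    for i, (_, lemma, pos) in enumerate(tagged):
        if lemma != "help" or not pos.startswith("V"):
            continue
        following = "".join(p + " " for _, _, p in tagged[i + 1:])
        match = AFTER_HELP.match(following)
        if match is None:
            continue
        if match.group("to"):
            patterns.append("VERB+to-inf")
        elif match.group("obj"):
            patterns.append("VERB+obj+inf")
        else:
            patterns.append("VERB+inf")
    return patterns


if __name__ == "__main__":
    # Illustrative tagging of one extracted sentence fragment.
    sentence = [
        ("helps", "help", "VVZ"),
        ("the", "the", "DT"),
        ("programmer", "programmer", "NN"),
        ("refrain", "refrain", "VV"),
    ]
    print(classify_help(sentence))
```

Leaving the backtracking to the regular expression engine avoids a hand-written greedy scan, which could consume the object's head noun as a modifier and then fail to find the infinitive.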

Extraction Output

Preparation for Analysis
- extracted material is sorted according to registers
- data is transformed into JSON format for input to SPC

Analysis Aims
- register analysis of A-B-C triples:
  ⇒ whether B disciplines are more similar to A or C or distinct from both
- diachronic analysis:
  ⇒ two time periods in SciTex (70/80s vs. 2000s)
- a more fine-grained diachronic analysis: publication year
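
The preparation steps (sorting hits by register, serialising to JSON) and the A-B-C comparison can be sketched as follows. The JSON layout is hypothetical and is not the input format SPC actually expects. The distance measure (sum of absolute differences between relative frequencies) is likewise an assumption, and the hits are invented toy data, not SciTex results.

```python
import json
from collections import Counter, defaultdict


def relative_frequencies(hits):
    """Group (register, feature) hits by register and normalise the counts."""
    counts = defaultdict(Counter)
    for register, feature in hits:
        counts[register][feature] += 1
    freqs = {}
    for register, counter in counts.items():
        total = sum(counter.values())
        freqs[register] = {f: n / total for f, n in counter.items()}
    return freqs


def distance(p, q):
    """Sum of absolute differences between two feature distributions."""
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in set(p) | set(q))


def closer_neighbour(freqs, a, b, c):
    """Report whether register b is closer to a or to c, or equidistant."""
    d_a = distance(freqs[b], freqs[a])
    d_c = distance(freqs[b], freqs[c])
    if d_a == d_c:
        return "equidistant"
    return a if d_a < d_c else c


# Invented toy hits for one A-B-C triple (not corpus results).
hits = [
    ("A", "obligation"), ("A", "obligation"), ("A", "obligation"), ("A", "volition"),
    ("B1", "obligation"), ("B1", "volition"),
    ("C1", "volition"), ("C1", "volition"),
]

freqs = relative_frequencies(hits)

if __name__ == "__main__":
    print(json.dumps(freqs, indent=2, sort_keys=True))
    print(closer_neighbour(freqs, "A", "B1", "C1"))
```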

Structured Parallel Coordinates: SPC Visualisation

Structured Parallel Coordinates (SPC)
- SPC (Culy et al. 2011) are a specialisation of the Parallel Coordinates visualisation (d’Ocagne 1885; Inselberg 1985, 2009)
- The Parallel Coordinates visualisation provides:
  - two-dimensional representation of multidimensional data
  - data dimensions on vertical axes, lined up horizontally
  - related data points are connected by colored lines between axes
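
The geometry behind a Parallel Coordinates plot can be sketched without any plotting library: each dimension becomes a vertical axis at a fixed horizontal position, and each record becomes a polyline that crosses every axis at its rescaled value. This is a generic illustration, not the SPC implementation. The function name and the toy records are assumptions.

```python
def polylines(records, dimensions):
    """Map each record to a polyline of (x, y) points, one point per axis.

    x is the index of the axis (axes are lined up horizontally);
    y is the record's value rescaled to [0, 1] along that axis.
    """
    bounds = {}
    for dim in dimensions:
        values = [r[dim] for r in records]
        bounds[dim] = (min(values), max(values))

    lines = []
    for r in records:
        points = []
        for x, dim in enumerate(dimensions):
            lo, hi = bounds[dim]
            # A constant dimension is drawn at mid-height.
            y = 0.5 if hi == lo else (r[dim] - lo) / (hi - lo)
            points.append((x, y))
        lines.append(points)
    return lines


# Invented toy records with two numeric dimensions.
cars = [
    {"cylinders": 4, "weight": 1000},
    {"cylinders": 8, "weight": 2000},
    {"cylinders": 6, "weight": 1500},
]

lines = polylines(cars, ["cylinders", "weight"])

if __name__ == "__main__":
    for line in lines:
        print(line)
```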

Parallel Coordinates
Example visualising car features
[Figure: Parallel Coordinates plot of car features]
Taken from protovis page: http://mbostock.github.com/protovis/ex/cars.html

SPC for language data
Adaptation of Parallel Coordinates to accommodate language data, e.g. as derived from corpora
- customised for representing ordered characteristics within and across dimensions
  - e.g. in the n-grams with frequencies application of SPC, ordered axes represent the linear ordering of words in text
  - e.g. visual separation of ordered and unordered axes
- refined modes of interaction
  - e.g. non-contiguous selection of values
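
The data behind an n-grams with frequencies view can be sketched as follows: each n-gram contributes one value per word position (the ordered axes) plus a frequency (an unordered axis). The record layout and function names are hypothetical and do not reproduce the format used by SPC; the text is an invented toy example.

```python
from collections import Counter


def ngrams_ending_in(tokens, targets, n=3):
    """Count the n-grams whose last word is one of the target words."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram[-1] in targets:
            counts[gram] += 1
    return counts


def to_records(counts):
    """Build one record per n-gram: ordered word positions plus a frequency."""
    return [
        {"words": list(gram), "frequency": freq}
        for gram, freq in counts.most_common()
    ]


# Invented toy text, not corpus data.
tokens = "it is sad . it is sad . we were happy".split()
records = to_records(ngrams_ending_in(tokens, {"happy", "sad"}))

if __name__ == "__main__":
    for record in records:
        print(record)
```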

N-grams with Frequencies application
Pronouns used with happy and sad
[Figure: SPC view of n-grams with frequencies]
⇒ It is sad > It’s sad > One was sad > It was sad > We were sad