Using Measures of Linguistic Complexity to Assess German L2 - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Using Measures of Linguistic Complexity to Assess German L2 Proficiency in Learner Corpora under Consideration of Task-Effects

Zarah Weiß Eberhard Karls Universität Tübingen

Kolloquium Korpuslinguistik und Phonetik (SS17) Humboldt Universität zu Berlin

June 7th, 2017

slide-2
SLIDE 2

Table of Contents

1 Introduction
2 Theoretical Background
  • Complexity in SLA
  • Task Effects in SLA
3 Data
  • Corpora
  • Task Annotations
4 Measuring Complexity
  • Automatic System
  • Complexity Measures
5 Descriptive Cross-Corpus Analysis
6 Inferential Regression Modeling
  • Set Up
  • Study 1: Task Effects
  • Study 2: Performance Effects
7 Conclusion

slide-4
SLIDE 4

Introduction

Overview

  • Complexity analysis is a productive approach at the intersection of SLA and CL
  • Automatic assessment of proficiency, readability, essay scoring, etc.
  • Started in the 1930s with superficial measures (text length in words, word length in characters) (Frogner 1933; Thorndike 1921)
  • Development towards more sophisticated, easily obtainable feature sets with progress in NLP

→ Increasing number of diverse complexity measures
→ Increasing availability of text analysis systems facilitating complexity analysis
→ Lack of consistency in findings and interpretation of measures

slide-5
SLIDE 5

Introduction

Criticism

  • Lack of theoretical foundation of feature implementation, selection, and interpretation (Bulté and Housen 2014; Housen, Vedder, and Kuiken 2012; Pallotti 2009)
  • Assumption of homogeneous learner profiles (Crossley and McNamara 2011; Jarvis et al. 2003)
  • Disregard for task differences and task effects (Alexopoulou et al. 2017; Tracy-Ventura and Myles 2015)

Solutions

  • Modeling of heterogeneous learner profiles, e.g. L1 backgrounds (Crossley and McNamara 2011), learning strategies (Jarvis et al. 2003)
  • Modeling of task effects (Alexopoulou et al. 2017; Polio and Park 2016; Tracy-Ventura and Myles 2015)

slide-6
SLIDE 6

Research Questions

1 How do measures of complexity model German L2 proficiency?
2 To what extent is this influenced by cognitive or functional task effects?
3 Does a retrospective analysis of German learner corpora with diverse task backgrounds improve complexity-based L2 proficiency modeling?

slide-7
SLIDE 7

Procedure

  • Selection of 2 German learner corpora with diverse task backgrounds: Merlin (Abel et al. 2013), Falko Georgetown (Falko Georgetown Dokumentation 2007)
  • Automatic extraction of 398 measures of complexity
  • Extract data from close transcription of learner data
  • Manual extraction of cognitive and functional task factors from task descriptions provided in corpus documentations
  • Descriptive cross-corpus analysis for 100+ measures
  • Inferential GAM regression analysis with task interactions for data-driven feature selection
  • Within-corpus classification experiment on Merlin
slide-10
SLIDE 10

Complexity in SLA

Overview

  • Assessment of language performance to measure
      • Text readability
      • L2 and L1 proficiency
      • L2 and L1 development
      • Writing performance
      • Task performance
      • ...
  • Dimensions of language performance
      1 Complexity
      2 Accuracy
      3 Fluency

→ CAF

slide-11
SLIDE 11

Complexity in SLA

CAF Definitions

Fluency: native-like production speed (Pallotti 2009; Wolfe-Quintero, Inagaki, and Kim 1998)

  • e.g. production rate units in time (if available); sometimes: frequency or length of sentences, t-units, utterances, clauses, and phrases

Accuracy: native-like production error rate (Housen, Vedder, and Kuiken 2012; Wolfe-Quintero, Inagaki, and Kim 1998)

  • e.g. error-free t-units, overall error count

Complexity: elaborateness, variedness, and inter-relatedness of a system (Ellis and Barkhuizen 2005; Rescher 1998)

  • e.g. type token ratio, word frequency, etc. for lexical complexity
  • e.g. modifiers per phrase, words per clause, clauses per t-unit, etc. for syntactic complexity
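
Two of the measures named above, type token ratio and sentence length in words, can be sketched in a few lines. This is a minimal illustration only: the regex tokenizer, the punctuation-based sentence splitter, and the sample text are assumptions made for the example and are not the tool chain used in the study.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a crude regex stand-in for a real tokenizer."""
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def words_per_sentence(text):
    """Mean sentence length in words, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


# Hypothetical sample text, not taken from either corpus.
sample = "Der Bär isst Marmelade. Der Bär ist glücklich."
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)      # 6 types over 8 tokens
wps = words_per_sentence(sample)    # two sentences of four words each
```

Plain type token ratio is sensitive to text length, which is one reason systems usually report several lexical diversity variants side by side.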

slide-12
SLIDE 12

Complexity in SLA

CAF Criticism

Vagueness of definitions used in empirical studies

  • Does sentence length measure fluency or complexity? (Wolfe-Quintero, Inagaki, and Kim 1998)
  • What is more complex: adäquat or angemessen?

Lack of explicit norm in complexity definition

  • Is the observed complexity adequate? (Pallotti and Ferrari 2008; Pallotti 2015)
  • High complexity in simple tasks indicates lack of socio-linguistic awareness (Ortega 2003; Pallotti 2015)

slide-13
SLIDE 13

Complexity in SLA

CAF Criticism

  • Lack of theoretical underpinnings for measures
  • Complexity = continuous, temporal progression
  • Complexity = competence
  • Complexity = difficulty

Figure: L2 complexity and related constructs (Bulté and Housen 2014)

slide-14
SLIDE 14

Taxonomy

Overview

Figure: Taxonomy by Housen, Vedder, and Kuiken 2012

slide-15
SLIDE 15

Taxonomy

Difficulty vs. Complexity

  • Complexity: general validity across groups
  • Difficulty: valid for specific groups (ability dependent)
  • Cf. task complexity vs. task difficulty by Robinson 2001

Figure: Taxonomy by ibid.

slide-16
SLIDE 16

Taxonomy

System vs. Structure

(1) The small, happy bear from Peru ate all the orange marmalade.

  • Structure complexity: prenominal modifiers, postnominal modifiers
  • System complexity: NP length, number of modifiers

Figure: Taxonomy by Housen, Vedder, and Kuiken 2012

slide-17
SLIDE 17

Taxonomy

Formal vs. Functional

(2) She carrie-s1 her dog-s2

  • 3rd P. Sg. Pres. (s1) vs. plural marker (s2)
  • Same formal complexity
  • s1 functionally more complex

Figure: Taxonomy by Housen, Vedder, and Kuiken 2012

slide-18
SLIDE 18

Taxonomy

Benefits

  • Taxonomy is not exhaustive
  • Helps to clarify how measures relate to complexity
  • Helps to interpret findings
  • Helps to assess coverage of measures included in studies
slide-20
SLIDE 20

Overview

  • In task-based language teaching (TBLT), CAF serve as dimensions of task performance
  • Task factors known to influence CAF measures from research on TBLT (Polio and Park 2016; Robinson 2001; Skehan 1996)
  • Task effects rarely analyzed in corpus-based studies (Tracy-Ventura and Myles 2015)
  • Functional task factors vs. cognitive task factors

→ Cognitive task factors assumed to influence attentional resources dedicated to CAF
→ Functional task factors assumed to functionally require or inhibit aspects of CAF

slide-21
SLIDE 21

Cognitive Task Factors

Skehan’s Limited Attentional Capacity Model

  • Differentiates
      1 Code complexity
      2 Cognitive complexity (processing cost, familiarity)
      3 Communicative stress (mode, time, stakes, etc.)

→ Deplete attentional resources from CAF

Robinson’s Cognition Hypothesis

  • Differentiates
      1 Task complexity
      2 Task conditions
      3 Task difficulty

→ More elements, tempo-spatial dislocation, and reasoning demands direct resources to complexity and accuracy
→ More planning time, num. tasks, and prior knowledge deplete resources from complexity and accuracy

slide-22
SLIDE 22

Functional Task Factors

  • LACM (Limited Attentional Capacity Model) and CH (Cognition Hypothesis) make contradictory predictions about task effects on CAF
  • Findings are inconsistent (Tracy-Ventura and Myles 2015; Yoon and Polio 2016)
  • Functional task effects argued to be stronger (Biber, Gray, and Poonpon 2011; Yoon and Polio 2016)
  • Assumed to functionally require or inhibit aspects of CAF
  • Examples: discourse type, text genre, topic
  • Most often comparison of argumentative, narrative, and descriptive texts

slide-25
SLIDE 25

Merlin

In a Nutshell

  • Cross-sectional corpus of 1,033 German L2 writings
  • Elicited in official standardized language certification tests (Abel et al. 2013)
  • Test levels ranging from A1 to C1
  • Proficiency scores assigned by 2 expert raters ranging from A1 to C2

Figure: Distribution of proficiency scores grouped by test level (x-axis: overall CEFR score, a1 to c2; y-axis: number of essays; grouping: META_CEFR_LevelOfTest, a1 to c1).

slide-26
SLIDE 26

Merlin

Tasks

Task | Test | Frequency | Counts per overall score (A1 to C2; non-empty cells only, in ascending order)
Going swimming | A1 | 56 | 8 45 3
Apartment search | A1 | 77 | 11 50 16
Child birth | A1 | 74 | 25 41 8
Ticket offer | A2 | 66 | 5 28 31 2
Pet sitting | A2 | 72 | 32 40
Housing office | A2 | 70 | 4 43 22 1
Announce visit | B1 | 67 | 2 31 29 5
Happy birthday | B1 | 70 | 24 38 8
Happy new year | B1 | 73 | 2 10 54 7
Application | B2 | 69 | 1 22 42 4
Work complaint | B2 | 70 | 1 20 47 2
Information request | B2 | 65 | 24 41
Housing situation | C1 | 72 | 7 52 13 1
Learning German | C1 | 42 | 1 26 15 2
Traditions & Assimilation | C1 | 90 | 16 62 12 1

Table: Mapping of tasks to test levels, task frequency, and their distribution across overall proficiency scores (A1 to C2).

slide-27
SLIDE 27

Falko Georgetown

In a Nutshell

  • Partially longitudinal corpus of 209 German L2 writings by 123 students
  • Elicited in curricular writing courses at Georgetown University (Falko Georgetown Dokumentation 2007; Reznicek et al. 2007)
  • Course levels 1 to 4 for intermediate to advanced learners of German
  • No external validation of proficiency besides course levels

Figure: Texts written by learners who contributed multiple writings to Falko

slide-28
SLIDE 28

Falko Georgetown

Tasks

Task | Total | Level 1 | Level 2 | Level 3 | Level 4
Write a letter | 21 | 21 | - | - | -
Continue a novel | 28 | - | 28 | - | -
Write an article | 28 | - | - | 28 | -
Write a speech | 16 | - | - | - | 16
Book review | 116 | 19 | 25 | 23 | 49

Table: Frequency of tasks across course levels in the Falko Georgetown L2 corpus.

slide-30
SLIDE 30

Overview

  • Annotate 15 cognitive and functional task factors
  • Goal: disentangle correlation of course / test level and task
  • Based on task descriptions in supplementary material (Falko Georgetown Dokumentation 2007; Merlin project 2014a,b,c,d,e,f,g,h,i,j,k,l,m,n,o)
  • Follow approach by Alexopoulou et al. 2017
slide-31
SLIDE 31

Operationalizations

Cognitive Factors

Code complexity: instructions provided no, few, or detailed language material to draw from

Cognitive complexity: require reasoning about writing structure vs. outline or refer to known structure

Shared context: temporal and spatial dislocation (here/there; now/then)

Reasoning demands: quantity and elaborateness of spatial reasoning, i.e. referencing a location without extra-linguistic support, and reasoning about other people’s intentions, beliefs, desires, or relationships

Referenced elements: number of discourse referents minimally required in solution

Perspective: requires perspective of i) self, ii) someone else, iii) multiple other people

slide-32
SLIDE 32

Operationalizations

Functional Factors

Genre: text category; cf. task descriptions

Audience: recipient; cf. task descriptions (partially grouped)

Formality: tone; cf. task descriptions or inferred from genre and audience

Task theme: general topic; professional/occupational interests, public social affairs, small talk, or (by extension) goal-oriented personal matters (demand)

Task type: determined by a combination of functional needs and genre; argumentative, narrative, descriptive/expositional, and instructional

slide-33
SLIDE 33

Tasks in Merlin

Task | Test Level | Time | Expected Words | Genre | Audience | Formality | Theme | Task Type | Code Complexity | Cognitive Complexity | Shared Context | Reasoning | Referenced Elements | Perspective
Going swimming | A1 | 45 Min. | 30 | Email | Friend | Informal | Demand | Descriptive | High | Low | T & T | Low | Few | Own
Apartment search | A1 | 45 Min. | 30 | Email | Friend | Informal | Demand | Instructive | High | Low | H & N | Low | Few | Own
Child birth | A1 | 45 Min. | 30 | Letter | Friend | Informal | Small talk | Descriptive | High | Low | H & N | Low | Few | Own
Ticket offer | A2 | 50 Min. | 40 | Letter | Friend | Informal | Demand | Instructional | High | Low | H & N | Medium | Few | Own
Pet sitting | A2 | 50 Min. | 40 | Email | Friend | Informal | Demand | Instructional | High | Low | H & N | Medium | Few | Own
Housing office | A2 | 50 Min. | 40 | Letter | Agency | Informal | Demand | Descriptive | High | Low | H & N | Low | Few | Own
Announce visit | B1 | 30 Min. | 128 | Letter | Friend | Informal | Small talk | Narrative | Low | Low | H & N | Low | Many | Own
Happy birthday | B1 | 30 Min. | 128 | Letter | Friend | Informal | Small talk | Narrative | Low | Low | H & N | Medium | Many | Own
Happy new year | B1 | 30 Min. | 128 | Letter | Friend | Informal | Small talk | Narrative | Low | Low | T & T | Medium | Many | Own
Application | B2 | 30 Min. | 150 | Letter | Agency | Formal | Profession | Argumentative | High | Medium | H & N | Medium | Few | Own
Work complaint | B2 | 30 Min. | 150 | Letter | Agency | Formal | Profession | Argumentative | High | Medium | T & T | Medium | Many | Own
Information request | B2 | 30 Min. | 150 | Letter | Agency | Formal | Profession | Descriptive | High | Low | H & N | Medium | Few | Own
Housing situation | C1 | 60 Min. | 200 | Essay | Public | Formal | Society | Descriptive | High | High | H & N | Medium | Open | Own & others
Learning German | C1 | 60 Min. | 200 | Essay | Public | Formal | Society | Argumentative | High | High | H & N | High | Open | Own & others
Traditions & Assimilation | C1 | 60 Min. | 200 | Essay | Public | Formal | Society | Argumentative | High | High | H & N | High | Open | Own & others

Table: Properties and task factors annotated for Merlin tasks.

slide-34
SLIDE 34

Tasks in Falko Georgetown

Task | Test Level | Audience | Genre | Formality | Theme | Task Type | Code Complexity | Cognitive Complexity | Shared Context | Reasoning | Referenced Elements | Perspective
Write a letter | 1 | Friend | Letter | Informal | Small talk | Instructional | High | Low | T & T | Low | Few | Own
Continue a novel | 2 | Public | Novel | Informal | Mystery | Narrative | Low | Low | T & T | Medium | Few | Other
Write an article | 3 | Public | Article | Formal | Society | Descriptive | Low | High | T & T | Medium | Many | Own & others
Write a speech | 4 | Public | Speech | Formal | Society | Argumentative | Low | Low | T & T | High | Open | Own & others
Book review | 1-4 | Public | Review | Formal | Society | Argumentative | High | High | T & T | High | Open | Own & others

Table: Properties and task factors annotated for Falko Georgetown L2 tasks.

slide-37
SLIDE 37

Overview

  • 398 measures of elaborateness and variedness of various domains
  • Extracted automatically using elaborate NLP tool chain
  • Written by Galasso 2014; Hancke 2013; Weiß 2015, 2017
  • Domains:
      1 Language use
      2 Human language processing
      3 Discourse & encoding of meaning
      4 Theoretical linguistics (lexico-semantics, syntax, morphology)

slide-38
SLIDE 38

Pipeline

Figure: System pipeline from plain text corpus to feature analysis.

slide-39
SLIDE 39

Resources

Task | Component | Version | Model
Tokenization and sentence segmentation | OpenNLP | 1.6.0 | default
POS tagging, lemmatization, morphological analysis, dependency parsing | Mate tools | 3.6.0 | default
Compound splitting | JWordSplitter | 3.4.0 | default
Constituency parsing | Stanford PCFG parser | 3.6.0 | default
Topological field parsing | Berkeley parser | 1.7.0 | cf. Ramon Ziai

Table: NLP components used in the complexity analysis pipeline.

slide-41
SLIDE 41

Language Use

Domains:

  • Corpus and psycho-linguistics

Measures:

  • Word frequencies: less frequent = more sophisticated/complex
  • Age of acquisition: later AoA = more sophisticated/complex

Implemented:

  • Frequency databases: dlexDB, SUBTLEX-DE, Google Books 2000
  • AoA approximation based on KCT (Lavalley, Berkling, and Stüker 2015)
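
A frequency-based measure of this kind can be sketched as a table lookup. The frequency values below are invented for illustration; real values would come from a resource such as dlexDB or SUBTLEX-DE, whose access interfaces are not shown here. The function names are likewise hypothetical.

```python
import math

# Hypothetical per-million frequencies, invented for this example only.
TOY_FREQ = {"haus": 500.0, "und": 20000.0, "adäquat": 2.0, "angemessen": 30.0}


def mean_log_frequency(tokens, freq):
    """Mean log10 frequency over the tokens found in the frequency table."""
    known = [math.log10(freq[t]) for t in tokens if t in freq]
    return sum(known) / len(known) if known else float("nan")


def type_coverage(tokens, freq):
    """Share of distinct word forms that have an entry in the table."""
    types = set(tokens)
    return sum(t in freq for t in types) / len(types) if types else 0.0


common_text = ["haus", "und", "angemessen"]
rare_text = ["haus", "und", "adäquat"]

common_score = mean_log_frequency(common_text, TOY_FREQ)
rare_score = mean_log_frequency(rare_text, TOY_FREQ)
coverage = type_coverage(["haus", "und", "xyz"], TOY_FREQ)
```

Under the "less frequent = more sophisticated" reading, the text with the lower mean log frequency counts as lexically more sophisticated, so `rare_text` scores as more complex than `common_text` with these toy values.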

slide-42
SLIDE 42

Human Language Processing

Domains:

  • Cognitive science, psycho-linguistics and information theory

Measures:

  • Cognitive processing costs as identified by processing time, reading time, etc.
  • Storage and integration of discourse referents consume cognitive resources
  • Long distances between referents increase these costs

Implemented:

  • Dependency Locality Theory (DLT) by Gibson 2000; Shain et al. 2016
  • Verb argument distances in syllables (Weiß 2015)
slide-43
SLIDE 43

Discourse & Encoding of Meaning

Domains:

  • Psychology, psycho-linguistics

Measures:

  • Propositional idea density (Brown et al. 2008): more propositions = more complex encoding of meaning
  • Connectives
  • Co-referential expressions
  • Grammatical transitions

→ cause more cohesive writing and complex discourse

Implemented:

  • PID (Louwerse et al. 2004)
  • Connectives as listed by Duden (Gr) 2009
  • Local and global overlap of linguistic material
  • Co-referential expressions (pronouns, articles, etc.)
  • Local transitions of grammatical roles (Barzilay and Lapata 2008; Galasso 2014; Todirascu et al. 2013)
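
The idea of local versus global overlap of linguistic material can be illustrated with nouns. The sketch below assumes that "local" means adjacent sentence pairs, "global" means all sentence pairs, and that counts are normalized by the number of sentences; the exact definitions used in the system may differ. Noun extraction is taken as given, so each sentence is already a set of nouns.

```python
from itertools import combinations


def overlap_per_sentence(sentences):
    """Return (local, global) overlap counts divided by the sentence count.

    sentences: list of sets, one set of nouns per sentence.
    local:  adjacent sentence pairs sharing at least one noun.
    global: all sentence pairs sharing at least one noun.
    """
    n = len(sentences)
    if n == 0:
        return 0.0, 0.0
    local = sum(1 for a, b in zip(sentences, sentences[1:]) if a & b)
    global_ = sum(1 for a, b in combinations(sentences, 2) if a & b)
    return local / n, global_ / n


# Hypothetical four-sentence text, nouns only.
doc = [{"bär", "marmelade"}, {"bär", "peru"}, {"wetter"}, {"marmelade"}]
local_overlap, global_overlap = overlap_per_sentence(doc)
```

Here only the first two sentences overlap locally, while the first and last sentence add a second overlap at the global level.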

slide-44
SLIDE 44

Theoretical Linguistics

Lexico-Semantics:

  • Measures: concreteness, relatedness, diversity, and variation
  • Implemented: TTR, lexical TTR, GermaNet semantic relations

Syntax:

  • Measures: clausal complexity (subordination), phrasal complexity (modification)
  • Implemented: dependent clause ratios, modifier ratios, complex NP, periphrastic constructions, etc.

Morphology:

  • Measures: inflection, derivation, composition
  • Implemented: nominalization, tense, compound depth, etc.
slide-46
SLIDE 46

Overview

  • Plots of measures with 95% confidence intervals
  • Compare proficiency trajectory across corpora and task profiles
  • Sample of over 100 complexity measures
  • Grouped under theoretical considerations into concepts
  • Selected to represent at least one concept per domain
  • All 398 measures at http://www.sfs.uni-tuebingen.de/~zweiss/ma-thesis/supplementary-material/complexity-plots/
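
The per-level means with 95% confidence intervals behind such plots can be sketched as follows. This assumes a normal approximation (mean ± 1.96 standard errors); the plots in the talk may have used a different interval estimate. The data rows are invented.

```python
import math
import statistics
from collections import defaultdict


def mean_ci95(values):
    """Mean with a normal-approximation 95% confidence interval."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def summarize(rows):
    """rows: (level, value) pairs. Returns level -> (mean, lower, upper)."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    # stdev needs at least two observations per group
    return {lvl: mean_ci95(v) for lvl, v in sorted(groups.items()) if len(v) > 1}


# Invented values of one measure (e.g. words per NP) at two levels.
rows = [("a1", 2.8), ("a1", 3.0), ("a1", 3.2),
        ("b1", 3.4), ("b1", 3.6), ("b1", 3.8)]
summary = summarize(rows)
```

Non-overlapping intervals between adjacent levels are what makes a measure visually informative in the trajectory plots.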

slide-47
SLIDE 47

Human Language Processing

DLT-V and syllable distance measures

Figure: Merlin. Panels plotted against proficiency score (a1, a2, b1, b2, c): adjHighICAreas/Vfin, totalICAtVfin/Vfin, maxTotalIC/Vfin, syllablesInMF/MF, dist1stArgToVerb/VerbWithDistArg.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plotted against course (1 to 4): adjHighICAreas/Vfin, totalICAtVfin/Vfin, maxTotalIC/Vfin, syllablesInMF/MF, dist1stArgToVerb/VerbWithDistArg.

slide-48
SLIDE 48

Discourse & Encoding of Meaning

Overlap of linguistic material

Figure: Merlin. Panels plotted against proficiency score (a1, a2, b1, b2, c): localNounOverlapsPerSentence, globalNounOverlapsPerSentence, localArgOverlapsPerSentence, globalArgOverlapsPerSentence, localContentOverlapsPerSentence, globalContentOverlapsPerSentence, localStemOverlapsPerSentence, globalStemOverlapsPerSentence.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plotted against course (1 to 4): localNounOverlapsPerSentence, globalNounOverlapsPerSentence, localArgOverlapsPerSentence, globalArgOverlapsPerSentence, localContentOverlapsPerSentence, globalContentOverlapsPerSentence, localStemOverlapsPerSentence, globalStemOverlapsPerSentence.

slide-49
SLIDE 49

Syntactic Complexity

Complex NPs

Figure: Merlin. Panels plotted against proficiency score (a1, a2, b1, b2, c): words/np, npDeps/npWithDeps, npMods/np, attrParticiples/np, clausalNounMods/np, comparativeNounMods/np, determiners/np, possessiveNounMods/np, prenominalMods/np, postnominalMods/np, coverageModifierTypes.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plotted against course (1 to 4): words/np, npDeps/npWithDeps, npMods/np, attrParticiples/np, clausalNounMods/np, comparativeNounMods/np, determiners/np, possessiveNounMods/np, prenominalMods/np, postnominalMods/np, coverageModifierTypes.

slide-50
SLIDE 50

Morphological Complexity

Inflection measures

Figure: Merlin. Panels plotted against proficiency score (a1, a2, b1, b2, c): nominatives/noun, accusatives/noun, genitives/noun, datives/noun, vfin/verb, infiniteVerbs/verb, participleVerbs/verb, imperatives/vfin, subjunctives/vfin, indicatives/vfin, 1stPersonInfl/vfin, 2ndPersonInfl/vfin, 3rdPersonInfl/vfin.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plotted against course (1 to 4): nominatives/noun, accusatives/noun, genitives/noun, datives/noun, vfin/verb, infiniteVerbs/verb, participleVerbs/verb, imperatives/vfin, subjunctives/vfin, indicatives/vfin, 1stPersonInfl/vfin, 2ndPersonInfl/vfin, 3rdPersonInfl/vfin.

slide-53
SLIDE 53

Overview

  • Regression analysis to predict proficiency from complexity and task factors
  • Use ordinal generalized additive regression models (GAMs)
  • 2 studies on Merlin: i) task effects; ii) performance effects
  • Studies on Falko Georgetown not reported here
slide-54
SLIDE 54

Ordinal Generalized Additive Regression Models

Overview

GAMs

  • Extension of linear regression models
  • Use splines as smooths for controlled introduction of non-linear relations
  • Highly interpretable, with predictive power similar to ML techniques like SVM
  • Share requirements of regression models: normal, uncorrelated predictors
  • Support 1 predictor per 15 to 20 data points

Ordinal Regression

  • Link function to non-exponential distribution by Wood 2006
  • Estimates boundaries between classes

→ preserves the ordering of classes without introducing quantity
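
How estimated class boundaries turn a single latent score into ordered class probabilities can be sketched as below. The sketch assumes a cumulative logit formulation, P(Y ≤ k) = σ(θ_k − η); the cut points and the latent score are invented, and the model fitted in the study is not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(eta, cutpoints):
    """Probability of each ordered class given latent score eta.

    cutpoints: ascending boundaries between neighbouring classes,
    so k cut points define k + 1 classes.
    """
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


levels = ["A1", "A2", "B1", "B2", "C"]
cutpoints = [-1.0, 2.0, 5.0, 8.0]   # hypothetical boundaries
probs = class_probabilities(3.5, cutpoints)
predicted = levels[probs.index(max(probs))]
```

Only the order of the levels enters the model: the distances between cut points are estimated from the data rather than fixed in advance, which is the sense in which no quantity is imposed on the scale.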

slide-55
SLIDE 55

Model Design

Iterative, data-driven model approach

1 Rank measures by information gain using WEKA
2 Test most informative measure for normality; normalize if necessary
3 Test for correlation of predictors
  • a. If Pearson correlation |r| < 0.70: add measure to model
  • b. Else: remove correlated measures, add measure to model
4 Smooth measures unless they are linear
5 If changes lead to significant model improvement (χ2 test), keep them
6 Repeat until 20 iterations have not yielded a better model or the model contains n/15 measures (1 predictor per 15 data points)
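
The correlation step of this procedure can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: the information-gain ranking is taken as given, normalization and smoothing are omitted, the stop after 20 unsuccessful iterations is left out, and `improves` is a hypothetical callback standing in for the χ2 model comparison.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select(ranked, data, improves, threshold=0.70, max_measures=None):
    """ranked: measure names, most informative first.
    data: name -> list of values.
    improves(old, new) -> bool: hypothetical model comparison."""
    selected = []
    for name in ranked:
        if max_measures is not None and len(selected) >= max_measures:
            break
        # drop already selected measures that correlate with the candidate
        kept = [s for s in selected
                if abs(pearson(data[s], data[name])) < threshold]
        candidate = kept + [name]
        if improves(selected, candidate):
            selected = candidate
    return selected


def toy_improves(old, new):
    """Toy criterion for the example: a larger model counts as better."""
    return len(new) > len(old)


data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],   # almost collinear with "a"
    "c": [5, 1, 4, 2, 3],      # weakly correlated with "a"
}
chosen = select(["a", "b", "c"], data, toy_improves)
```

With the toy criterion, "b" replaces nothing because swapping it for the correlated "a" does not enlarge the model, while "c" is added alongside "a". In a real run, `max_measures` would carry the cap on the number of measures.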

slide-57
SLIDE 57

Study 1: Task Effects

Set Up

Figure: Model formula of Merlin interaction model predicting overall CEFR scores from scaled and transformed complexity measures.

slide-58
SLIDE 58

Study 1: Task Effects

Model Fit

  • R² = 0.7660
  • Approximately homoscedastic residual errors with µ = 0.04; sd = 7.26 after outlier removal
  • Severe outliers across all model variants: assumption of idiosyncratic properties in texts
  • Outliers systematically include learners who performed above or below test level

→ Prompted performance effect analysis in Study 2

slide-59
SLIDE 59

Study 1: Task Effects

Model Fit

Model | AIC | Df | REML | Edf | Compared with | χ2 | Edf difference | Pr(> χ2)
Complexity | 1315.05 | 30.37 | 658.56 | 19 | - | - | - | -
Reference | 1287.08 | 28.41 | 642.77 | 20 | Complexity | 15.790 | 1 | 1.914e-08
Interaction | 1281.00 | 39.27 | 628.84 | 31 | Complexity | 29.717 | 12 | 2.861e-08
Interaction | - | - | - | - | Reference | 13.928 | 11 | 0.003

Table: Model comparison for complexity, reference, and interaction models built on the Merlin data.
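
The p-values in this comparison can be recomputed by hand. The sketch assumes that each test is a χ2 test on twice the difference in REML scores, with the Edf difference as degrees of freedom; this is an assumption about how the table was produced, not something the slides state. The χ2 tail function is a small stdlib-only implementation for integer degrees of freedom.

```python
import math


def chi2_sf(stat, df):
    """Upper tail probability of a chi-square distribution, integer df.

    Uses Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    with Q(1/2, x) = erfc(sqrt(x)) and Q(1, x) = exp(-x).
    """
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def compare(reml_difference, edf_difference):
    """p-value for a model comparison, assuming stat = 2 * REML difference."""
    return chi2_sf(2.0 * reml_difference, edf_difference)


p_reference_vs_complexity = compare(15.790, 1)
p_interaction_vs_complexity = compare(29.717, 12)
p_interaction_vs_reference = compare(13.928, 11)
```

Under this assumption the three values agree with the reported Pr(> χ2) column up to rounding, which supports reading the χ2 column as a difference in REML scores.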
slide-60
SLIDE 60

Study 1: Task Effects

Model Discussion

A. parametric coefficients

Term | Estimate | Std. Error | t-value | p-value
(Intercept) | 8.3759 | 0.3833 | 21.8509 | < 0.0001
hasTransitionsFromSubjectToNot[TRUE] | -0.5349 | 0.2387 | -2.2408 | 0.0250
has3rdPersPossessivePronouns[TRUE] | -0.8906 | 0.2030 | -4.3873 | < 0.0001
containsToInfinitives[TRUE] | -0.5541 | 0.2282 | -2.4284 | 0.0152
halfModalClusterPerVP | 0.1831 | 0.1011 | 1.8113 | 0.0701
logSumNonTerminalNodesPerSentence | 1.9714 | 0.1785 | 11.0435 | < 0.0001
avgVTotalIntegrationCostAtFiniteVerb | 0.3705 | 0.1059 | 3.4968 | 0.0005
lexTypesFoundInDlexPerLexType | 0.8840 | 0.0942 | 9.3858 | < 0.0001

Table: Interaction model: linear measures.

slide-61
SLIDE 61

Study 1: Task Effects

Model Discussion

A. parametric coefficients

| | Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 8.3759 | 0.3833 | 21.8509 | < 0.0001 |
| usesConjunctionalClauses[TRUE] | −0.6051 | 0.3173 | −1.9074 | 0.0565 |
| logATFBand2PerTypesFoundInDlex | −0.3003 | 0.1091 | −2.7528 | 0.0059 |
| typeTokenRato | 1.2853 | 0.2038 | 6.3068 | < 0.0001 |
| logSumNonTerminalNodesPerWord | −0.7130 | 0.1598 | −4.4619 | < 0.0001 |
| TaskTheme[Society] | 0.4921 | 0.7085 | 0.6947 | 0.4873 |
| TaskTheme[Profession] | 1.0774 | 0.5508 | 1.9560 | 0.0505 |
| TaskTheme[Smalltalk] | −0.8117 | 0.3529 | −2.3004 | 0.0214 |
| usesConjunctionalClauses:TaskTheme[Society] | 2.1839 | 0.9603 | 2.2742 | 0.0230 |
| usesConjunctionalClauses:TaskTheme[Profession] | −0.4185 | 0.5417 | −0.7726 | 0.4398 |
| usesConjunctionalClauses:TaskTheme[Smalltalk] | 0.5155 | 0.4714 | 1.0937 | 0.2741 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Society] | −0.1827 | 0.4194 | −0.4357 | 0.6631 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Profession] | 0.5517 | 0.3530 | 1.5628 | 0.1181 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Smalltalk] | 0.5392 | 0.2197 | 2.4539 | 0.0141 |
| typeTokenRato:TaskTheme[Society] | −0.4750 | 0.3634 | −1.3072 | 0.1912 |
| typeTokenRato:TaskTheme[Profession] | −0.5975 | 0.3998 | −1.4947 | 0.1350 |
| typeTokenRato:TaskTheme[Smalltalk] | −0.8335 | 0.2925 | −2.8494 | 0.0044 |
| logSumNonTerminalNodesPerWord:TaskTheme[Society] | −0.9369 | 0.4216 | −2.2224 | 0.0263 |
| logSumNonTerminalNodesPerWord:TaskTheme[Profession] | −0.1522 | 0.3409 | −0.4465 | 0.6552 |
| logSumNonTerminalNodesPerWord:TaskTheme[Smalltalk] | 0.2680 | 0.2344 | 1.1429 | 0.2531 |

Table: Interaction model: interaction measures.
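To read the interaction rows, the effect of a measure within a task theme is the main-effect estimate plus the matching interaction estimate. The following is a minimal sketch, assuming treatment (dummy) coding in which the task theme that does not appear in the table serves as the reference level, and reading the lost sign markers in the table as minus signs. The dictionary and function names are illustrative and only two measures are included.

```python
# Estimates copied from the interaction model table (two measures only).
MAIN_EFFECT = {
    "typeTokenRato": 1.2853,
    "logSumNonTerminalNodesPerWord": -0.7130,
}
INTERACTION = {
    ("typeTokenRato", "Society"): -0.4750,
    ("typeTokenRato", "Profession"): -0.5975,
    ("typeTokenRato", "Smalltalk"): -0.8335,
    ("logSumNonTerminalNodesPerWord", "Society"): -0.9369,
    ("logSumNonTerminalNodesPerWord", "Profession"): -0.1522,
    ("logSumNonTerminalNodesPerWord", "Smalltalk"): 0.2680,
}


def slope_in_theme(measure, theme=None):
    """Slope of a measure for a task theme; None selects the reference theme."""
    base = MAIN_EFFECT[measure]
    if theme is None:
        return base
    return base + INTERACTION[(measure, theme)]


for theme in (None, "Society", "Profession", "Smalltalk"):
    label = theme or "reference theme"
    print(label, round(slope_in_theme("typeTokenRato", theme), 4))
```

Under these assumptions the type-token ratio keeps a positive slope in every theme, but the slope is much flatter for Smalltalk tasks than for the reference theme.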

slide-62
SLIDE 62

Study 1: Task Effects

Model Discussion

B. smooth terms

| | edf | Ref.df | F-value | p-value |
|---|---|---|---|---|
| s(charactersPerWord) | 2.7714 | 3.5484 | 18.5670 | 0.0007 |
| s(numberOfSentencesSquared) | 4.6262 | 5.7193 | 254.0399 | < 0.0001 |

Table: Interaction model: smoothed measures.

Figure: Smooths of Merlin interaction model.

slide-63
SLIDE 63

Study 1: Task Effects

Classification Experiment

| Model | µ F1 | ±SD | µ Recall | ±SD | µ Precision | ±SD |
|---|---|---|---|---|---|---|
| Majority Baseline | 7.37 | 11.59 | 7.44 | 11.33 | 7.37 | 11.37 |
| Complexity | 70.97 | 4.25 | 71.63 | 4.74 | 72.30 | 4.09 |
| Reference | 71.32 | 4.33 | 71.78 | 4.87 | 72.74 | 4.10 |
| Interaction | 72.17 | 4.43 | 72.69 | 4.94 | 73.39 | 4.15 |

Table: Weighted average precision, recall, and F1 score for complexity, reference, and interaction model for 10 iterations of 10-fold cross-validation.

slide-64
SLIDE 64

Study 1: Task Effects

Classification Experiment

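| Predicted↓ / Observed→ | A1 | A2 | B1 | B2 | C |
|---|---|---|---|---|---|
| A1 | 25.5 | 10.1 | 0.0 | 0.0 | 0.0 |
| A2 | 29.5 | 241.3 | 45.4 | 0.0 | 0.0 |
| B1 | 0.0 | 52.6 | 233.5 | 37.8 | 0.0 |
| B2 | 0.0 | 0.0 | 49.1 | 243.1 | 37.9 |
| C | 0.0 | 0.0 | 0.0 | 10.1 | 8.1 |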

Table: Averaged confusion matrix for classification of L2 proficiency in Merlin using the interaction model.
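Support-weighted precision, recall, and F1 can be derived from a confusion matrix like the one above. The following is a minimal sketch with illustrative names, using the averaged counts as input and weighting each level by its number of observed texts. Scores computed from an averaged matrix need not coincide with scores averaged over individual cross-validation folds, so the output is not expected to match the classification table exactly.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C"]

# Rows are predicted levels, columns are observed levels (averaged counts).
CONFUSION = [
    [25.5, 10.1, 0.0, 0.0, 0.0],
    [29.5, 241.3, 45.4, 0.0, 0.0],
    [0.0, 52.6, 233.5, 37.8, 0.0],
    [0.0, 0.0, 49.1, 243.1, 37.9],
    [0.0, 0.0, 0.0, 10.1, 8.1],
]


def weighted_scores(matrix):
    """Return support-weighted (precision, recall, F1) for a square matrix."""
    n = len(matrix)
    predicted_totals = [sum(row) for row in matrix]
    observed_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    total = sum(observed_totals)
    precision = recall = f1 = 0.0
    for k in range(n):
        hits = matrix[k][k]
        p_k = hits / predicted_totals[k] if predicted_totals[k] else 0.0
        r_k = hits / observed_totals[k] if observed_totals[k] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = observed_totals[k] / total  # share of observed texts at level k
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    return precision, recall, f1


precision, recall, f1 = weighted_scores(CONFUSION)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With support weighting, the weighted recall equals the overall accuracy, that is, the diagonal sum divided by the total count.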

slide-65
SLIDE 65

1 Introduction 2 Theoretical Background

Complexity in SLA Task Effects in SLA

3 Data

Corpora Task Annotations

4 Measuring Complexity

Automatic System Complexity Measures

5 Descriptive Cross-Corpus Analysis 6 Inferential Regression Modeling

Set Up Study 1: Task Effects Study 2: Performance Effects

7 Conclusion

slide-66
SLIDE 66

Study 2: Performance Effects

Set Up

Figure: Model formula of Merlin success model predicting overall CEFR scores from scaled and transformed complexity measures

slide-67
SLIDE 67

Study 2: Performance Effects

Model Fit

  • R² = 0.9000
  • Approximately homoscedastic residual errors with µ = −0.14; sd = 12.91 after outlier removal
  • Same outliers as before, except for under-performing learners
  • Outliers still systematically include learners who performed above or below test level

slide-68
SLIDE 68

Study 2: Performance Effects

Model Fit

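| Model | AIC | Df | REML | Edf | Comparison with | χ2 | Edf diff. | Pr(> χ2) |
|---|---|---|---|---|---|---|---|---|
| Complexity | 1315.05 | 30.37 | 658.56 | 19 | | | | |
| Reference model | 1287.08 | 28.41 | 642.77 | 20 | | | | |
| Interaction model | 1281.00 | 39.27 | 628.84 | 31 | | | | |
| Success model | 821.11 | 35.76 | 401.19 | 26 | Complexity | 257.36 | 7 | < 2e−16 |
| Success model | | | | | Reference | 241.573 | 6 | < 2e−16 |
| Success model | | | | | Interaction | 227.65 | −5 | |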

Table: Model comparison for reference, complexity, interaction, and success GAMs modeling L2 proficiency from complexity measures and task theme on the Merlin data.

slide-69
SLIDE 69

Study 2: Performance Effects

Model Discussion

A. parametric coefficients

| | Estimate | Std. Error | t-value | p-value |
|---|---|---|---|---|
| (Intercept) | 3.8610 | 0.5186 | 7.4455 | < 0.0001 |
| hasTransitionsFromSubjectToNot[TRUE] | −0.8565 | 0.2732 | −3.1350 | 0.0017 |
| has3rdPersPossessivePronouns[TRUE] | −1.3267 | 0.2556 | −5.1903 | < 0.0001 |
| containsToInfinitives[TRUE] | −0.7246 | 0.2701 | −2.6825 | 0.0073 |
| usesConjunctionalClauses[TRUE] | −0.5514 | 0.2698 | −2.0434 | 0.0410 |
| logATFBand2PerTypesFoundInDlex | −0.3972 | 0.1202 | −3.3054 | 0.0009 |
| avgVTotalIntegrationCostAtFiniteVerb | 0.4765 | 0.1380 | 3.4522 | 0.0006 |
| lexTypesFoundInDlexPerLexType | 0.9649 | 0.1132 | 8.5218 | < 0.0001 |
| typeTokenRato | 1.2797 | 0.1877 | 6.8176 | < 0.0001 |
| sumNonTerminalNodesPerWord | −0.8316 | 0.1398 | −5.9464 | < 0.0001 |
| logSumNonTerminalNodesPerSentence | 2.4829 | 0.2093 | 11.8655 | < 0.0001 |
| Passed[TRUE] | 6.6843 | 0.3018 | 22.1510 | < 0.0001 |
| TaskTheme[Society] | 11.5649 | 0.6668 | 17.3437 | < 0.0001 |
| TaskTheme[Profession] | 7.2479 | 0.5982 | 12.1158 | < 0.0001 |
| TaskTheme[Smalltalk] | 0.9101 | 0.2796 | 3.2550 | 0.0011 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Society] | 0.4881 | 0.5980 | 0.8163 | 0.4144 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Profession] | 1.1930 | 0.4812 | 2.4795 | 0.0132 |
| logATFBand2PerTypesFoundInDlex:TaskTheme[Smalltalk] | 0.3248 | 0.2428 | 1.3376 | 0.1810 |

B. smooth terms

| | edf | Ref.df | F-value | p-value |
|---|---|---|---|---|
| s(charactersPerWord):Passed[FALSE] | 2.5964 | 3.2239 | 5.3214 | 0.1503 |
| s(charactersPerWord):Passed[TRUE] | 1.3498 | 1.6284 | 6.5613 | 0.0297 |
| s(numberOfSentencesSquared):Passed[FALSE] | 3.6322 | 4.5433 | 69.6021 | < 0.0001 |
| s(numberOfSentencesSquared):Passed[TRUE] | 4.3517 | 5.3657 | 306.2779 | < 0.0001 |

Table: Summary of success model predicting Merlin overall CEFR scores from scaled and transformed complexity measures in Merlin. Uses ’demand’ as

slide-70
SLIDE 70

Study 2: Performance Effects

Model Discussion

Figure: Smooths of Merlin success model.

slide-71
SLIDE 71

Study 2: Performance Effects

Model Discussion

  • Most task theme interactions become uninformative
  • Still significantly different slopes, but not enough new variance explained

  • Especially: texts about society heavily confounded with failed tests
  • Unclear relationship between performance and task theme
slide-72
SLIDE 72

Study 2: Performance Effects

Classification Experiment

| Model | µ F1 | ±SD | µ Recall | ±SD | µ Precision | ±SD |
|---|---|---|---|---|---|---|
| Majority Baseline | 7.37 | 11.59 | 7.44 | 11.33 | 7.37 | 11.37 |
| Complexity | 71.20 | 4.25 | 71.89 | 4.71 | 72.53 | 4.03 |
| Reference | 71.32 | 4.33 | 71.78 | 4.87 | 72.74 | 4.10 |
| Interaction | 72.17 | 4.43 | 72.69 | 4.94 | 73.39 | 4.15 |
| Success | 84.98 | 2.75 | 85.60 | 2.80 | 85.28 | 2.74 |

Table: Weighted average precision, recall, and F1 score for complexity, reference, interaction and success model for 10 iterations of 10-fold cross-validation.

slide-73
SLIDE 73

Study 2: Performance Effects

Classification Experiment

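| Pred.↓ / Obs.→ | A1 | A2 | B1 | B2 | C |
|---|---|---|---|---|---|
| A1 | 28.1 | 12.7 | 0.0 | 0.0 | 0.0 |
| A2 | 26.9 | 260.9 | 34.5 | 0.0 | 0.0 |
| B1 | 0.0 | 30.4 | 271.9 | 18.2 | 0.0 |
| B2 | 0.0 | 0.0 | 21.6 | 271.9 | 6.0 |
| C | 0.0 | 0.0 | 0.0 | 0.4 | 40.0 |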

Table: Averaged confusion matrix for classification of L2 proficiency in Merlin using the success model.

slide-74
SLIDE 74

1 Introduction 2 Theoretical Background

Complexity in SLA Task Effects in SLA

3 Data

Corpora Task Annotations

4 Measuring Complexity

Automatic System Complexity Measures

5 Descriptive Cross-Corpus Analysis 6 Inferential Regression Modeling

Set Up Study 1: Task Effects Study 2: Performance Effects

7 Conclusion

slide-75
SLIDE 75

Conclusion

Findings

How do measures of complexity model German L2 proficiency?

  • Most indices of the same concept tend to develop homogeneously and stably across corpora
  • Most indices develop homogeneously across corpora
  • Data-driven feature selection approaches yield a diverse set of measures
  • GAMs are highly interpretable, yet show considerable predictive power

slide-76
SLIDE 76

Conclusion

Findings

To what extent is this influenced by cognitive or functional task-effects?

  • Some measures are more stable across heterogeneous task backgrounds (human language processing, complex NPs)
  • Other measures are less stable
  • Stable measures especially promising for systems evaluating diverse task backgrounds
  • Task factors seem to predominantly affect local measures of structural complexity
  • Further research on this required
slide-77
SLIDE 77

Conclusion

Findings

Does a retrospective analysis of German learner corpora with diverse task backgrounds improve complexity-based L2 proficiency modeling?

  • Post-hoc annotation straightforward if task documentation available
  • Suited to decrease confound of tasks and course levels
  • Task factors improve model fit significantly and decrease non-linearity
  • Interactions seem unstable, models suffer from wide standard deviation
  • Results lack interpretability due to skewed distribution
  • Analysis improves situation, but idiosyncratic distributional properties of data remain problematic

slide-78
SLIDE 78

Future Work

Next:

1 Investigation of task and performance effects on more balanced data sets
2 Study of adequacy of L2 complexity by comparing results on comparable L1 productions (Falko)
3 Make complexity code used here publicly available in CTAP (Chen and Meurers 2016)

Also interesting:

  • Analysis of task type interactions in data
  • Cross-corpus testing of success model
  • Systematically assess sensitivity of system and structure complexity to task effects
  • Systematic validation of measure validity on L2 data
slide-79
SLIDE 79

Thank you for your attention! Questions?

slide-80
SLIDE 80

References I

Abel, Andrea et al. (2013). merlin: A Trilingual Learner Corpus illustrating European Reference Levels. LRC 2013. Bergen, Norway.

Alexopoulou, Theodora et al. (2017). “Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques”. In: Language Learning, pp. 1–29.

Barzilay, Regina and Mirella Lapata (2008). “Modeling local coherence: An entity-based approach”. In: Computational Linguistics 34, pp. 1–34.

Biber, Douglas, Bethany Gray, and Kornwepa Poonpon (2011). “Should we use characteristics of conversation to measure grammatical complexity in L2 writing development?” In: Tesol Quarterly 45, pp. 5–35.

Brown, Cati et al. (2008). “Automatic measurement of propositional idea density from part-of-speech tagging”. In: Behavior research methods 40.2, pp. 540–545.

slide-81
SLIDE 81

References II

Bulté, Bram and Alex Housen (2014). “Conceptualizing and measuring short-term changes in L2 writing complexity”. In: Journal of Second Language Writing 26, pp. 42–65.

Chen, Xiaobin and Detmar Meurers (2016). “CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis”. In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pp. 113–119.

Crossley, Scott A. and Danielle S. McNamara (2011). “Shared features of L2 writing: Intergroup homogeneity and text classification”. In: Journal of Second Language Writing 20, pp. 271–285.

Duden (Gr) (2009). Deutsche Grammatik. Ed. by Ursula Hoberg and Rudolf Hoberg. 4th ed. Vol. 4. Der kleine Duden. Berlin, Germany: Dudenverlag.

Ellis, R. and G. Barkhuizen (2005). Analysing learner language. Oxford: Oxford University Press.

Falko Georgetown Dokumentation (2007). Humboldt-Universität zu Berlin.

slide-82
SLIDE 82

References III

Frogner, Ellen (1933). “Problems of sentence structure in pupils’ themes”. In: English Journal 22, pp. 742–749.

Galasso, Sabrina (2014). Exploring Textual Cohesion Characteristics for German Readability Classification. B.A. Thesis.

Gibson, Edward (2000). “The dependency locality theory: A distance-based theory of linguistic complexity”. In: Image, language, brain, pp. 95–126.

Hancke, Julia (2013). “Automatic Prediction of CEFR Proficiency Levels Based on Linguistic Features of Learner Language”. MA thesis. Eberhard Karls Universität Tübingen.

Hancke, Julia, Sowmya Vajjala, and Detmar Meurers (2012). “Readability Classification for German using lexical, syntactic and morphological features”. In: Proceedings of COLING. Mumbai, pp. 1063–1080.

Housen, Alex, Ineke Vedder, and Folkert Kuiken (2012). “Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA”. In: vol. 32. Language Learning & Language Teaching. Amsterdam, Philadelphia: John Benjamins Publishing. Chap. 1–2.

slide-83
SLIDE 83

References IV

Jarvis, Scott et al. (2003). “Exploring multiple profiles of highly rated learner compositions”. In: Journal of Second Language Writing 12, pp. 377–403.

Kyle, Kristopher (2016). “Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication”. PhD thesis. Georgia State University.

Lavalley, Rémi, Kay Berkling, and Sebastian Stüker (2015). “Preparing Children’s Writing Database for Automated Processing”. In: Workshop on L1 Teaching, Learning and Technology (L1TLT). Leipzig, Germany, pp. 9–15.

Louwerse, Max M. et al. (2004). “Variation in language and cohesion across written and spoken registers”. In: Proceedings of the 26th Annual Meeting of the Cognitive Science Society, pp. 843–848.

Lu, Xiaofei (2010). “Automatic analysis of syntactic complexity in second language writing”. In: International Journal of Corpus Linguistics 15.4, pp. 474–496.
slide-84
SLIDE 84

References V

McNamara, Danielle S. et al. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press.

Merlin project (2014a). task description: Essay: why it’s of value to learn German. http://merlin-platform.eu/.

Merlin project (2014b). task description: Formal letter: apply for internship in sales department. http://merlin-platform.eu/.

Merlin project (2014c). task description: Formal letter: ask for information at Au pair Agency. http://merlin-platform.eu/.

Merlin project (2014d). task description: Formal letter: Au pair writes letter of complaint to Agency. http://merlin-platform.eu/.

Merlin project (2014e). task description: Formal letter to housing office. http://merlin-platform.eu/.

Merlin project (2014f). task description: Informal e-mail: arrange an appointment with a friend to go swimming together. http://merlin-platform.eu/.

Merlin project (2014g). task description: Informal e-mail: ask a friend for help with finding an apartment. http://merlin-platform.eu/.

slide-85
SLIDE 85

References VI

Merlin project (2014h). task description: Informal letter: ask friend to take care of pet. http://merlin-platform.eu/.

Merlin project (2014i). task description: Informal letter: birthday congratulations. http://merlin-platform.eu/.

Merlin project (2014j). task description: Informal letter: congratulate to birth of a child. http://merlin-platform.eu/.

Merlin project (2014k). task description: Informal letter for New Year to a friend. http://merlin-platform.eu/.

Merlin project (2014l). task description: Informal letter: offer a ticket not used to a friend. http://merlin-platform.eu/.

Merlin project (2014m). task description: Informal letter to a friend announcing a visit. http://merlin-platform.eu/.

Merlin project (2014n). task description: Online article: about sticking to one’s traditions and "assimilation" in a new environment. http://merlin-platform.eu/.

Merlin project (2014o). task description: Report: about the housing situation. http://merlin-platform.eu/.
slide-86
SLIDE 86

References VII

Ortega, Lourdes (2003). “Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing”. In: Applied Linguistics 24, pp. 492–518.

Pallotti, G. and S. Ferrari (2008). “La variabilità situazionale dell’interlingua: Implicazioni per la ricerca acquisizionale e il testing linguistico”. In: Competenze Lessicali e Discorsive nell’Acquisizione di Lingue Seconde.

Pallotti, Gabriele (2009). “CAF: Defining, Refining and Differentiating Constructs”. In: Applied Linguistics 30.4, pp. 590–601.

Pallotti, Gabriele (2015). “A simple view of linguistic complexity”. In: Second Language Research 31.1, pp. 117–134.

Polio, Charlene and J.-H. Park (2016). “Language development in second language writing”. In: Handbook of second and foreign language writing. Ed. by R. Manchón and P. K. Matsuda. Mouton de Gruyter.

Rescher, Nicholas (1998). Complexity: A philosophical overview. Transaction Publishers.

Reznicek, Marc et al. Das Falko-Handbuch Korpusaufbau und Annotationen. Humboldt-Universität zu Berlin.
slide-87
SLIDE 87

References VIII

Robinson, Peter (2001). “Task Complexity, Task Difficulty, and Task Production: Exploring Interactions in a Componential Framework”. In: Applied Linguistics 22.1, pp. 27–57.

Shain, Cory et al. (2016). “Memory access during incremental sentence processing causes reading time latency”. In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pp. 49–58.

Skehan, Peter (1996). “A Framework for the Implementation of Task-based Instruction”. In: Applied Linguistics 17.1, pp. 38–62.

Thorndike, E. L. (1921). “Word Knowledge in the Elementary School”. In: Teachers College Record 28.5, pp. 334–370.

Todirascu, Amalia et al. (2013). “Coherence and cohesion for the assessment of text readability”. In: Natural Language Processing and Cognitive Science 11, pp. 11–19.

Tracy-Ventura, Nicole and Florence Myles (2015). “The importance of task variability in the design of learner corpora for SLA research”. In: International Journal of Learner Corpus Research 1.1, pp. 58–95.

slide-88
SLIDE 88

References IX

von der Brück, Tim and Sven Hartrumpf (2007). “A Semantically Oriented Readability Checker for German”. In: Proceedings of the 3rd Language & Technology Conference, pp. 270–274.

von der Brück, Tim, Sven Hartrumpf, and Hermann Helbig (2008). “A Readability Checker with Supervised Learning Using Deep Indicators”. In: Informatica 32, pp. 429–435.

Weiß, Zarah Leonie (2015). More Linguistically Motivated Features of Language Complexity in Readability Classification of German Textbooks: Implementation and Evaluation. B.A. Thesis. Tübingen, Germany.

Weiß, Zarah Leonie (2017). “Using Measures of Linguistic Complexity to Assess German L2 Proficiency in Learner Corpora under Consideration of Task-Effects”. MA thesis. Eberhard Karls Universität Tübingen.

Wolfe-Quintero, Kate, Shunji Inagaki, and Hae-Young Kim (1998). Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity. Second Language Teaching & Curriculum Center.

slide-89
SLIDE 89

References X

Wood, Simon N. (2006). Generalized additive models: an introduction with R. CRC press.

Yoon, Hyung-Jo and Charlene Polio (2016). “The Linguistic Development of Students of English as a Second Language in Two Written Genres”. In: Tesol Quarterly.

slide-90
SLIDE 90

Robinson’s Cognition Hypothesis

Figure: Task complexity, condition, and difficulty (Robinson 2001, p. 30, Figure 1).

slide-91
SLIDE 91

Language Use

DlexDB frequencies

Figure: Merlin. Panels plot each measure against proficiency score (a1, a2, b1, b2, c): ATFreq/LTD, typeFreq/LTD, lemmaFreq/LTD, logATF/LTD, logTypeFreq/LTD, logLemmaFreq/LTD, logATFBand1/LTD to logATFBand6/LTD, typesNotInDlex/LT.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plot the same measures against course (1 to 4).

slide-92
SLIDE 92

Discourse & Encoding of Meaning

Pronouns, articles and names

Figure: Merlin. Panels plot each measure against proficiency score (a1, a2, b1, b2, c): pronouns/TIS, persPron/TIS, possPron/TIS, 3PPersPron/TIS, 3PPossPron/TIS, 3PPers&PossPron/TIS, 1PPersPron/TIS, 1PPossPron/TIS, 1PPers&PossPron/TIS, 2PPersPron/TIS, 2PPossPron/TIS, 2PPers&PossPron/TIS, defArt/TIS, indefArt/TIS, properNamesPerSentence.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plot the same measures against course (1 to 4).

slide-93
SLIDE 93

Lexical Complexity

Lexical Variation

Figure: Merlin. Panels plot each measure against proficiency score (a1, a2, b1, b2, c): lexTypes/lexToken, lexTypes/Token, lexVerbTypes/lexToken, lexVerbTypes/lexVerbs, (LexVerbTypes/lexVerbs)^2, corrLexVerbTypes/lexVerb, lexVerbs/token, nouns/lexToken, nouns/token, verbs/noun, adjectives/lexToken, adverbs/lexToken, adj+adv/lexToken.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plot the same measures against course (1 to 4).

slide-94
SLIDE 94

Syntactic Complexity

Periphrastic grammatical measures

Figure: Merlin. Panels plot each measure against proficiency score (a1, a2, b1, b2, c): eventivePassive/finClause, passives/finClause, quasiPassives/finClause, sein/verbs, haben/verbs, simplePresent/vfin, simplePast/vfin, presentPerfect/vfin, pastPerfect/vfin, future1/vfin, future2/vfin, coverageTenses, coveragePeriphrasticTenses.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plot the same measures against course (1 to 4).

slide-95
SLIDE 95

Syntactic Complexity

Dependent clause measures

Figure: Merlin. Panels plot each measure against proficiency score (a1, a2, b1, b2, c): clauses/sentence, depClauses/sentence, conjClauses/sentence, depClausesWithConj/sentence, depClausesWithoutConj/sentence, interrogativeClauses/sentence, relativeClauses/sentence.

Figure: Falko GT L2; △: curricular tasks, ◦: book reviews. Panels plot the same measures against course (1 to 4).

slide-96
SLIDE 96

Generalized Additive Regression Models

From Linear to Additive Models

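$$\hat{y} = \eta + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma^2) \text{ and } \eta = \beta_0 + \sum_{i=1}^{I} x_i \beta_i \tag{1}$$

$$g(\hat{y}) = \eta + \epsilon, \quad \text{where } \eta = \beta_0 + \sum_{i=1}^{I} x_i \beta_i \tag{2}$$

$$g(\hat{y}) = \eta + \epsilon, \quad \text{where } \eta = \beta_0 + \sum_{i=1}^{I} s_i(x_i) \tag{3}$$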

slide-97
SLIDE 97

Generalized Additive Regression Models

From Linear to Additive Models

$$g(\hat{y}) = \eta + \epsilon, \quad \text{where } \eta = \beta_0 + \sum_{i=1}^{I} s_i(x_i) \tag{4}$$

$$s(x) = \sum_{k=1}^{K} b_k(x) \beta_k, \tag{5}$$

$$s(x) = \sum_{c=1}^{C+1} x^{c-1} \beta_c \tag{6}$$
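Equations (5) and (6) can be sketched in a few lines of Python: a smooth is a weighted sum of basis functions, and the polynomial basis is the special case in which the c-th basis function raises x to the power c − 1. The function names are illustrative and the coefficients are made up for the example; a fitted GAM would estimate them from data.

```python
def smooth(x, basis_functions, betas):
    """Evaluate s(x) as the sum of b_k(x) * beta_k over all basis functions."""
    return sum(b(x) * beta for b, beta in zip(basis_functions, betas))


def polynomial_basis(degree):
    """Return the basis functions x**0, x**1, ..., x**degree."""
    # The default argument binds each power at definition time.
    return [lambda x, power=c: x ** power for c in range(degree + 1)]


betas = [1.0, 2.0, 3.0]        # hypothetical coefficients
basis = polynomial_basis(2)    # C = 2, so C + 1 = 3 basis functions
value = smooth(2.0, basis, betas)  # 1*1 + 2*2 + 3*4 = 17
print(value)
```

Swapping `polynomial_basis` for another list of basis functions changes the shape the smooth can take without changing how it is evaluated.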

slide-98
SLIDE 98

Generalized Additive Regression Models

Regression Splines

Figure: Single cubic basis function (left) and full cubic regression spline (right), cf. Wood 2006, p. 147, Figure 4.1.
slide-99
SLIDE 99

Generalized Additive Regression Models

Regression Splines

Figure: A rank 7 thin plate regression spline preceded by its weighted basis functions, cf. Wood 2006, p. 153, Figure 4.5.

slide-100
SLIDE 100

Generalized Additive Regression Models

Ordinal Models

$$u = \eta + \epsilon, \quad \text{where } \eta = \beta_0 + \sum_{i=1}^{I} s_i(x_i), \text{ and } u \in (-\infty, \infty) \tag{7}$$

  • Ordinal data neither numeric nor nominal
  • Ordinal distribution not covered in exponential link functions g()
  • Solution by Wood 2006: partition (−∞, ∞) into K bins using K−1 boundaries

  • Estimate latent variable u with regression model
  • Assign ordinal category based on interval in which u falls
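The last step of this procedure, assigning a category from the interval in which u falls, can be sketched as follows. The boundary values are hypothetical placeholders, not the boundaries estimated in the studies, and the function name is illustrative. A value that lies exactly on a boundary is placed in the higher category here, which is a choice made for the sketch.

```python
from bisect import bisect_right

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C"]


def assign_level(u, boundaries, levels=CEFR_LEVELS):
    """Map a latent value u to an ordinal level using K-1 sorted boundaries."""
    if len(boundaries) != len(levels) - 1:
        raise ValueError("need exactly K-1 boundaries for K levels")
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be in increasing order")
    # bisect_right counts how many boundaries are <= u, which is the bin index.
    return levels[bisect_right(boundaries, u)]


# Hypothetical boundaries that split the real line into five bins.
boundaries = [-1.0, 2.0, 5.0, 8.0]
for u in (-3.0, 0.0, 4.2, 9.5):
    print(u, assign_level(u, boundaries))
```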
slide-101
SLIDE 101

Generalized Additive Regression Models

Ordinal Models

Figure: Mapping of latent variable u to CEFR levels using estimated boundaries from the interaction model of Study 1.

slide-102
SLIDE 102

Automatic System

Overview

English Systems

  • Coh-Metrix Web Interface (McNamara et al. 2014)
  • Syntactic and Lexical Complexity analyzer (Lu 2010)
  • Linguistic Analysis Tool (Kyle 2016)
  • Common Text Analysis Platform (Chen and Meurers 2016)

German Systems

  • DeLite Readability Checker (von der Brück and Hartrumpf 2007; von der Brück, Hartrumpf, and Helbig 2008)
  • Tübinger complexity code (Hancke, Vajjala, and Meurers 2012; Weiß 2015)