Broad Linguistic Modeling is Beneficial for German L2 Proficiency Assessment

Zarah Weiß and Detmar Meurers, Eberhard Karls University, Tübingen

Learner Corpus Research Conference Bolzano/Bozen, 5-7 October 2017

October 6th, 2017


Outline

1. Introduction
2. Merlin Data
3. Study 1
4. Study 2
5. Conclusion


Introduction

L2 Complexity

  • Dimensions of L2 performance: Complexity, Accuracy, Fluency (Housen, Vedder, and Kuiken 2012)
  • Complexity: elaborateness, variedness, and inter-relatedness of a system (Ellis and Barkhuizen 2005; Rescher 1998)
  • Research on the operationalization of complexity into implementable features (Crossley, Kyle, and McNamara 2016; Hancke, Vajjala, and Meurers 2012; Kyle 2016; Lu and Ai 2015)
  • Assessment of, e.g., proficiency, readability, essay scoring

Introduction

L2 Complexity

  • Increasing number of diverse complexity measures available due to advances in NLP technology
  • Made available through text analysis systems, e.g. Coh-Metrix (Crossley and McNamara 2012), the Linguistic Analysis tool (Kyle 2016), CTAP (Chen and Meurers 2016)
  • Most studies use only a few established measures of linguistic elaborateness
→ Include measures from various theoretical backgrounds
→ Include measures of variedness as well as elaborateness


Introduction

Complexity Analysis System

  • 398 measures of language elaborateness and variedness, extracted by an elaborate NLP tool chain
  • To be integrated into CTAP (Chen and Meurers 2016) by end of 2017
  • Cover the domains of
    1. Theoretical linguistics (syntax, lexico-semantics, morphology)
    2. Discourse & encoding of meaning
    3. Language use
    4. Human language processing


Introduction

Measures of the Linguistic System

Theoretical Linguistics

  • Lexico-semantic: lexical diversity, variation, and density; semantic relations (Lu 2011; McCarthy and Jarvis 2007)
  • Syntactic: dependent clause ratios, modifier ratios, complex NPs, periphrastic constructions, etc. (Kyle 2016; Wolfe-Quintero, Inagaki, and Kim 1998)
  • Morphological: inflection, derivation, and composition (François and Fairon 2012; Hancke, Vajjala, and Meurers 2012)

Discourse & Encoding of Meaning

  • Connectives
  • Local and global overlap of linguistic material and co-referential expressions (pronouns, articles, etc.)
  • Local transitions of grammatical roles (Barzilay and Lapata 2008; Todirascu et al. 2013)
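To make one family of these measures concrete, here is a minimal Python sketch of two lexical diversity indices, the plain type-token ratio and a length-corrected variant. This is illustrative only, not the tool chain's actual implementation, and the example sentence is invented.

```python
import math

def lexical_diversity(tokens):
    """Two common diversity measures over a token list: the plain
    type-token ratio (TTR) and a corrected TTR that divides by
    sqrt(2N) to reduce sensitivity to text length."""
    n = len(tokens)
    types = len({t.lower() for t in tokens})
    return {"ttr": types / n, "cttr": types / math.sqrt(2 * n)}

# 7 tokens, 6 types ("Hund" repeats)
print(lexical_diversity("der Hund sieht den Hund im Park".split()))
```

Length correction matters here because the essays span test levels and therefore differ strongly in length, which the plain TTR conflates with diversity.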


Introduction

Psycho-Linguistic Measures

Language Use

  • Word frequencies (dlexDB, SUBTLEX-DE, Google Books 2000)
  • Approximate age of active use based on the Karlsruhe Children's Texts corpus (Lavalley, Berkling, and Stüker 2015)

Human Language Processing

  • Cognitive processing cost based on Dependency Locality Theory (DLT; Gibson 2000; Shain et al. 2016)
  • Argument-verb distances, dependency lengths
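The distance-based measures can be sketched as follows; the parse arcs below are a hypothetical example standing in for the output of the actual NLP tool chain, and linear distance is only a simple proxy for DLT-style integration cost.

```python
def dependency_stats(arcs):
    """arcs: (head, dependent) pairs as 1-based token positions.
    Returns the summed and the longest linear dependency length,
    two simple proxies for distance-based processing cost."""
    lengths = [abs(h - d) for h, d in arcs]
    return sum(lengths), max(lengths, default=0)

# Hypothetical arcs for: 1 Der 2 Mann 3 den 4 ich 5 gestern 6 sah 7 lacht
arcs = [(2, 1), (7, 2), (6, 3), (6, 4), (6, 5), (2, 6)]
print(dependency_stats(arcs))  # (16, 5)
```

Longer dependencies force the reader to hold material in memory across more intervening words, which is why such sums correlate with sentence complexity.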

Merlin Data

Overview

  • 1,033 German L2 texts in the German section of the Merlin corpus (Abel et al. 2013)
  • Elicited in official standardized language certification tests for 5 CEFR test levels (A1-C1)
  • Rated by human experts trained on the CEFR-based Merlin rating grid (Wisniewski et al. 2013)
  • Holistic proficiency scores range from A1 to C2

Merlin Data

Distribution across Proficiency Scores

[Bar chart: Overall CEFR Score (a1-c2) vs. Number of Essays, grouped by test level (a1-c1)]

Figure: Number of essays per holistic proficiency score grouped by test level.


Study 1

Methods

  • Analysis of overall proficiency scores (A1-B2, with C1 and C2 merged into C) on non-normalized data
  • SVM classification with SMO with a linear kernel (K=1)
  • Feature ranking with information gain
→ Implementations from the WEKA machine learning toolkit (Frank, Hall, and Witten 2016; Hall et al. 2009)
  • Training and testing with 10 repetitions of 10-fold cross-validation
  • Correlation analysis with Pearson correlation, retaining measures with −0.7 ≤ r ≤ 0.7
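The experiments themselves ran in WEKA; as a rough scikit-learn analogue of the same pipeline (synthetic data, and mutual information standing in for WEKA's information-gain ranking), the setup might look like this sketch:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the complexity features and proficiency labels.
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=20),  # keep top-ranked features per fold
    SVC(kernel="linear", C=1.0),             # linear-kernel SVM, cf. SMO (K=1)
)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro")
print(f"mean macro-F1 over 10x10 CV: {scores.mean():.3f}")
```

Putting the feature ranking inside the pipeline matters: it is re-fit on each training fold, so the selection step cannot leak information from the test fold.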

Study 1

Classification Results

Set           n    Pre.  Rec.  F1
Majority      1    10.3  32.0  15.6
All           366  67.6  68.2  67.4
IG 100        100  67.9  71.0  68.5
Language Use  54   54.6  59.9  56.6
HLP           32   49.4  54.7  51.7
Discourse     84   58.0  63.4  59.8
Clausal       147  60.3  64.3  61.3
Phrasal       28   57.0  62.1  58.8
Lex/Sem       66   66.0  69.1  66.3
Morph         43   56.2  61.5  58.8

Table: Precision, recall, and F1 scores across feature sets for proficiency classification with 10×10-CV SMO (K=1).


Study 1

Classification Results

Obs.\Pred.   A1   A2   B1   B2    C
A1           24   32    1    -    -
A2           20  225   61    -    -
B1            1   56  219   54    1
B2            -    4   44  231   14
C             -    -    1   36    9

Table: Averaged confusion matrix for IG 100 model.


Most Informative Measures

Information Gain Ranking

#   Measure
1   Number of tokens
2   Corrected type token ratio
7   Sum of longest dependencies per sentence
14  Dependent clauses with conjunction per dep. clause
15  Cvg. of NP modifier types
16  Dep. clauses per sentence
25  P(not → not) per transition
26  Verbs per sentence
30  VP modifiers per VP
34  NT per word

Table: Top 10 uncorrelated measures.

#   Measure
36  SD of verb cluster size
38  VZ per sentence
39  P(not → object) per transition
40  Total integration cost at finite verb per finite verb
42  Syllables per token
43  HDD
49  Cvg. of verb cluster size
52  Cvg. of verb cluster types
56  P(object → not) per transition
57  Words in VPs per VP

Table: Top 20 uncorrelated measures.


Study 2

Outline

  • Ordinal GAM for nonlinear regression modeling with 10 iterations of 10-fold cross-validation
  • Implemented in R with the packages mgcv (Wood 2003, 2004, 2011), itsadug (van Rij, Wieling, Baayen, and van Rijn 2016), and caret (Kuhn et al. 2016)
  • Investigate interactions of complexity measures with the functional task factor task theme


Study 2

Task Effects

  • Task factors known to influence CAF (Ellis and Barkhuizen 2005; Foster and Tavakoli 2009; Tavakoli and Foster 2011; Tracy-Ventura and Myles 2015)
→ Functional factors: communicative goal, text type, task type (Ravid and Tolchinsky 2002; Yoon and Polio 2016)
→ Cognitive factors: cognitive and code complexity (Skehan 1998; Skehan and Foster 1997), spatio-temporal dislocation (Robinson 2001)
  • Annotated 15 task factors in Merlin based on the task descriptions, following Alexopoulou, Michel, Murakami, and Meurers 2017, in Weiß 2017
  • In Merlin: 3 tasks per test level (Merlin project 2014a-o)


Study 2

Test Levels in Merlin

[Bar chart: Test Level (a1-c1) vs. Number of Essays, grouped by overall CEFR score (a1-c2)]

Figure: Number of essays per test level grouped by holistic proficiency score.


Study 2

Feature Selection

1. Rank measures with 10 iterations of 10-fold cross-validation of information gain ranking
2. Remove measures not approximating a normal distribution after transformation (log, square, binarization)
3. Build increasingly complex models from uncorrelated measures (|r| ≤ 0.70)
4. Introduce smooths for measures with non-linear effects
5. Repeat until no significantly better model fit is reached (χ², α < 0.05) within 20 iterations
→ 13 complexity measures
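The correlation filter in step 3 can be sketched like this (illustrative Python with synthetic data; the greedy keep-or-drop strategy and the variable names are assumptions, only the ±0.70 threshold comes from the slide):

```python
import numpy as np

def uncorrelated_subset(X, ranking, r_max=0.7):
    """Walk the ranked feature indices (best first) and keep a feature
    only if its Pearson |r| with every already-kept feature is <= r_max."""
    kept = []
    for j in ranking:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a,                                    # feature 0
                     a + rng.normal(scale=0.01, size=200),  # near-duplicate of 0
                     rng.normal(size=200)])                 # independent feature
print(uncorrelated_subset(X, [0, 1, 2]))  # [0, 2]: feature 1 is dropped
```

Ordering by the information-gain ranking first means that whenever two measures are near-duplicates, the more informative one survives the filter.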


Study 2

Model Discussion

Figure: Summary of interaction model predicting overall CEFR scores. Uses ’demand’ as reference level for task theme.


Study 2

Model Discussion

A. parametric coefficients                        Estimate  Std. Error  t-value   p-value
(Intercept)                                         8.3759      0.3833  21.8509  < 0.0001
Has transitions from subject to not                -0.5349      0.2387  -2.2408    0.0250
Has 3rd person possessive pronouns                 -0.8906      0.2030  -4.3873  < 0.0001
Contains to infinitives                            -0.5541      0.2282  -2.4284    0.0152
Uses conjunctional clauses                         -0.6051      0.3173  -1.9074    0.0565
Half modal cluster per VP                           0.1831      0.1011   1.8113    0.0701
Log sum non-terminal nodes per sentence             1.9714      0.1785  11.0435  < 0.0001
Log avg. type freq. band 2 per types in dlexDB     -0.3003      0.1091  -2.7528    0.0059
Avg. total integration cost at finite verb          0.3705      0.1059   3.4968    0.0005
Lexical types found in dlexDB per lexical type      0.8840      0.0942   9.3858  < 0.0001
Type token ratio                                    1.2853      0.2038   6.3068  < 0.0001
Log sum non-terminal nodes per word                -0.7130      0.1598  -4.4619  < 0.0001
...

B. smooth terms                                        edf    Ref.df   F-value   p-value
s(Characters per word)                              2.7714    3.5484   18.5670    0.0007
s(Squared number of sentences)                      4.6262    5.7193  254.0399  < 0.0001

Table: Summary of interaction model predicting overall CEFR scores. Uses 'demand' as reference level for task theme.


Study 2

Classification Results

Set         n     Pre.  Rec.  F1    R²
Majority    1     10.3  32.0  15.6  NA
IG 100      100   67.9  71.0  68.5  NA
GAM         13    72.4  71.6  71.0  74.1
GAM + Task  14+4  73.4  72.7  72.2  76.6

Table: Precision, recall, and F1 score comparison of Study 1 and Study 2 for proficiency classification with 10×10-CV.


Study 2

Classification Results

Obs.\Pred.   A1   A2   B1   B2    C
A1           26   30    -    -    -
A2           10  241   53    -    -
B1            -   45  234   49    -
B2            -    -   38  243   10
C             -    -    -   38    8

Table: Averaged confusion matrix for interaction model.
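For reference, per-class precision and recall can be recovered from a confusion matrix like this one; the values below are transcribed from the slide, and since the matrix is averaged over CV folds, the derived numbers only approximate the reported aggregate scores.

```python
import numpy as np

labels = ["A1", "A2", "B1", "B2", "C"]
cm = np.array([  # rows = observed class, columns = predicted class
    [26,  30,   0,   0,  0],
    [10, 241,  53,   0,  0],
    [ 0,  45, 234,  49,  0],
    [ 0,   0,  38, 243, 10],
    [ 0,   0,   0,  38,  8],
])
recall = cm.diagonal() / cm.sum(axis=1)     # correct / all observed per class
precision = cm.diagonal() / cm.sum(axis=0)  # correct / all predicted per class
for lab, p, r in zip(labels, precision, recall):
    print(f"{lab}: precision={p:.2f}, recall={r:.2f}")
```

The banded structure of the matrix is the key point: nearly all errors confuse adjacent CEFR levels, not distant ones.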


Conclusion

  • Higher classification performance for diverse feature sets (Study 1)
  • Grammatical variedness among the most informative measures (Study 1)
  • Most informative measures cover all linguistic domains (Studies 1 & 2)
  • Holds for small feature set selections as well as for larger ones (Study 2)

→ Broad linguistic modeling is beneficial for German L2 proficiency assessment


Thank you for your attention! Questions?


References I

Abel, Andrea et al. (2013). "MERLIN: A Trilingual Learner Corpus Illustrating European Reference Levels". LRC 2013. Bergen, Norway.

Alexopoulou, Theodora et al. (2017). "Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques". In: Language Learning, pp. 1–29.

Barzilay, Regina and Mirella Lapata (2008). "Modeling local coherence: An entity-based approach". In: Computational Linguistics 34, pp. 1–34.

Chen, Xiaobin and Detmar Meurers (2016). "CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis". In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pp. 113–119.

Crossley, Scott A., Kristopher Kyle, and Danielle S. McNamara (2016). "The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality". In: Journal of Second Language Writing 32, pp. 1–16.


References II

Crossley, Scott A. and Danielle S. McNamara (2012). "Predicting second language writing proficiency: the roles of cohesion and linguistic sophistication". In: Journal of Research in Reading 35.2, pp. 115–135.

Ellis, R. and G. Barkhuizen (2005). Analysing learner language. Oxford: Oxford University Press.

Foster, Pauline and Parvaneh Tavakoli (2009). "Native speakers and task performance: Comparing effects on complexity, fluency, and lexical density". In: Language Learning 59.4, pp. 866–896.

François, Thomas and Cédrick Fairon (2012). "An 'AI readability' formula for French as a foreign language". In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics. Jeju Island, Korea, pp. 466–477.

Frank, Eibe, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques". 4th ed. Morgan Kaufmann.


References III

Gibson, Edward (2000). "The dependency locality theory: A distance-based theory of linguistic complexity". In: Image, language, brain, pp. 95–126.

Hall, Mark et al. (2009). "The WEKA Data Mining Software: An Update". In: SIGKDD Explorations 11.1.

Hancke, Julia, Sowmya Vajjala, and Detmar Meurers (2012). "Readability Classification for German using lexical, syntactic and morphological features". In: Proceedings of COLING. Mumbai, pp. 1063–1080.

Housen, Alex, Ineke Vedder, and Folkert Kuiken (2012). Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA. Vol. 32. Language Learning & Language Teaching. Amsterdam/Philadelphia: John Benjamins Publishing. Chap. 1–2.

Kuhn, Max et al. (2016). caret: Classification and Regression Training. R package version 6.0-73.


References IV

Kyle, Kristopher (2016). "Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication". PhD thesis. Georgia State University.

Lavalley, Rémi, Kay Berkling, and Sebastian Stüker (2015). "Preparing Children's Writing Database for Automated Processing". In: Workshop on L1 Teaching, Learning and Technology (L1TLT). Leipzig, Germany, pp. 9–15.

Lu, Xiaofei (2011). "The relationship of lexical richness to the quality of ESL learners' oral narratives". In: The Modern Language Journal 96.2, pp. 190–208.

Lu, Xiaofei and Haiyang Ai (2015). "Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds". In: Journal of Second Language Writing 29, pp. 16–27.

McCarthy, Philip M. and Scott Jarvis (2007). "A theoretical and empirical evaluation of vocd". In: Language Testing 24, pp. 459–488.

Merlin project (2014a). Task description: Essay: why it's of value to learn German. http://merlin-platform.eu/.

References V

Merlin project (2014b). Task description: Formal letter: apply for internship in sales department. http://merlin-platform.eu/.

Merlin project (2014c). Task description: Formal letter: ask for information at au pair agency. http://merlin-platform.eu/.

Merlin project (2014d). Task description: Formal letter: au pair writes letter of complaint to agency. http://merlin-platform.eu/.

Merlin project (2014e). Task description: Formal letter to housing office. http://merlin-platform.eu/.

Merlin project (2014f). Task description: Informal e-mail: arrange an appointment with a friend to go swimming together. http://merlin-platform.eu/.

Merlin project (2014g). Task description: Informal e-mail: ask a friend for help with finding an apartment. http://merlin-platform.eu/.

Merlin project (2014h). Task description: Informal letter: ask friend to take care of pet. http://merlin-platform.eu/.

Merlin project (2014i). Task description: Informal letter: birthday congratulations. http://merlin-platform.eu/.

References VI

Merlin project (2014j). Task description: Informal letter: congratulate to birth of a child. http://merlin-platform.eu/.

Merlin project (2014k). Task description: Informal letter for New Year to a friend. http://merlin-platform.eu/.

Merlin project (2014l). Task description: Informal letter: offer a ticket not used to a friend. http://merlin-platform.eu/.

Merlin project (2014m). Task description: Informal letter to a friend announcing a visit. http://merlin-platform.eu/.

Merlin project (2014n). Task description: Online article: about sticking to one's traditions and "assimilation" in a new environment. http://merlin-platform.eu/.

Merlin project (2014o). Task description: Report: about the housing situation. http://merlin-platform.eu/.

Ravid, Dorit and Liliana Tolchinsky (2002). "Developing linguistic literacy: A comprehensive model". In: Journal of Child Language 29.2, pp. 417–447.

References VII

Rescher, Nicholas (1998). Complexity: A philosophical overview. Transaction Publishers.

Robinson, Peter (2001). "Task Complexity, Task Difficulty, and Task Production: Exploring Interactions in a Componential Framework". In: Applied Linguistics 22.1, pp. 27–57.

Shain, Cory et al. (2016). "Memory access during incremental sentence processing causes reading time latency". In: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, pp. 49–58.

Skehan, Peter (1998). A cognitive approach to language learning. Oxford: Oxford University Press.

Skehan, Peter and Pauline Foster (1997). "Task type and task processing conditions as influences on foreign language performance". In: Language Teaching Research 1.3, pp. 185–211.

Tavakoli, Parvaneh and Pauline Foster (2011). "Task design and second language performance: The effect of narrative type on learner output". In: Language Learning 61, pp. 37–72.


References VIII

Todirascu, Amalia et al. (2013). "Coherence and cohesion for the assessment of text readability". In: Natural Language Processing and Cognitive Science 11, pp. 11–19.

Tracy-Ventura, Nicole and Florence Myles (2015). "The importance of task variability in the design of learner corpora for SLA research". In: International Journal of Learner Corpus Research 1.1, pp. 58–95.

van Rij, Jacolien et al. (2016). itsadug: Interpreting Time Series and Autocorrelated Data Using GAMMs. R package version 2.2.

Weiß, Zarah Leonie (2017). "Using Measures of Linguistic Complexity to Assess German L2 Proficiency in Learner Corpora under Consideration of Task-Effects". MA thesis. Eberhard Karls Universität Tübingen.

Wisniewski, Katrin et al. (2013). "MERLIN: An Online Trilingual Learner Corpus Empirically Grounding the European Reference Levels in Authentic Learner Data". In: ICT for Language Learning 2013. Florence, Italy.


References IX

Wolfe-Quintero, Kate, Shunji Inagaki, and Hae-Young Kim (1998). Second Language Development in Writing: Measures of Fluency, Accuracy & Complexity. Second Language Teaching & Curriculum Center.

Wood, Simon N. (2003). "Thin-plate regression splines". In: Journal of the Royal Statistical Society (B) 65.1, pp. 95–114.

Wood, Simon N. (2004). "Stable and efficient multiple smoothing parameter estimation for generalized additive models". In: Journal of the American Statistical Association 99, pp. 673–686.

Wood, Simon N. (2011). "Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models". In: Journal of the Royal Statistical Society 72.1, pp. 3–36.

Yoon, Hyung-Jo and Charlene Polio (2016). "The Linguistic Development of Students of English as a Second Language in Two Written Genres". In: TESOL Quarterly.