Roadmap: On annotating learner corpora - PowerPoint PPT Presentation


ICALL: Part IV. Roadmap: On annotating learner corpora. Detmar Meurers, Intelligent Computer-Assisted Language Learning, Universität Tübingen


SLIDE 1

ICALL: Part IV: On annotating learner corpora
Detmar Meurers, Universität Tübingen

Outline:
◮ Learner Corpora: Why they're useful; On compiling learner corpora; Why annotate corpora; Data in SLA research
◮ Error annotation & beyond: Error annotation; Linguistic Annotation
◮ Annotation quality: Why it's important; DECCA: Variation n-gram error detection
◮ A Concrete Case: NOCE Corpus; Linguistic Information; Tokenization; POS-Tagging; Representation: XML, TEI; Automatic POS-Tagging
◮ Analyzing learner language: Sources of Evidence; Mismatching Evidence; Mismatch-free errors
◮ Conclusion

Intelligent Computer-Assisted Language Learning

Part IV: On Annotating Learner Corpora
Detmar Meurers (Universität Tübingen)

Based on joint research with Luiz Amaral, Holger Wunsch, Ana Díaz-Negrillo, Salvador Valera; cf. also: Díaz-Negrillo/Meurers/Valera/Wunsch (2009): Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. http://purl.org/dm/papers/diaz-negrillo-et-al-09.html

European Summer School in Language, Logic, and Information, Bordeaux, July 27–31, 2009

Roadmap

◮ Which role can learner corpora play in Foreign Language Teaching & Second Language Acquisition (SLA) research?
◮ Why is linguistic annotation relevant?
◮ How can high quality annotation be obtained?
◮ Corpus Representation: A Concrete Case
  ◮ The NOCE (NOn-native Corpus of English) learner corpus
  ◮ XML and TEI representation of the annotated corpus
  ◮ Towards linguistic annotation of NOCE
◮ Analyzing learner language:
  ◮ sources of evidence for POS annotation
  ◮ mismatches in combining evidence

Learner Corpora

◮ Learner corpora can serve
  ◮ as a teaching resource for Foreign Language Teaching materials design,
  ◮ provide insights into typical student needs, and
  ◮ contribute an empirical basis for theories of Second Language Acquisition.
◮ Depending on the corpus composition, a corpus can support qualitative and quantitative analysis of the examples found.

On compiling learner corpora

◮ Many current learner language corpora consist of essays.
◮ Yet learners produce language in a wide range of contexts, naturalistic or instructed, e.g.,
  ◮ email and chat messages
  ◮ answering reading or listening comprehension questions
  ◮ asking questions in information gap activities
⇒ To obtain corpora representative of learner language, it is important to include language produced in a variety of contexts, ideally also including longitudinal data.
◮ Including explicit task contexts in the meta-information of a corpus can also provide constraining information useful for interpreting learner language.
  ◮ e.g., it's easier to infer what a learner wanted to say if one knows the text they are answering questions about.
SLIDE 2

Annotation of Learner Corpora

◮ Effective querying of corpora for specific phenomena often requires reference to corpus annotation.
◮ To find relevant classes of examples, the terminology used to single out learner language aspects of interest needs to be mapped to instances in the corpus (Meurers 2005; Meurers & Müller 2009).
◮ Annotations function as an index to classes of data which cannot easily be identified in the surface form.

Annotation of Learner Corpora (cont.)

◮ Example: Finding all sentences containing modal verbs using only the surface forms is possible, but involves specifying a long list of all forms of all modal verbs.
◮ Even so, sentences where "can" is not actually a modal would be wrongly identified:
  (1) Pass me a can of beer.
  (2) I can tuna for a living.
◮ Many search patterns cannot be specified in finite form, e.g., finding all sentences with past participle verbs.
◮ What type of learner language annotations are needed to support the searches for the data which are important for FLT and SLA research?
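The contrast can be sketched with a toy POS-annotated corpus (hypothetical tokens with Penn Treebank tags, not drawn from any actual corpus): querying the annotation retrieves only the genuinely modal use of "can", while a surface search also returns the noun and full-verb uses.

```python
# A toy POS-annotated corpus (hypothetical; Penn Treebank tags, MD = modal).
corpus = [
    ("She", "PRP"), ("can", "MD"), ("swim", "VB"), (".", "."),
    ("Pass", "VB"), ("me", "PRP"), ("a", "DT"), ("can", "NN"),
    ("of", "IN"), ("beer", "NN"), (".", "."),
    ("I", "PRP"), ("can", "VBP"), ("tuna", "NN"), (".", "."),
]

# Surface search: every occurrence of "can", including the noun in
# "a can of beer" and the full verb in "I can tuna".
surface_hits = [tok for tok, tag in corpus if tok.lower() == "can"]

# Annotation-based search: the annotation acts as an index, so only the
# genuinely modal occurrence is returned.
modal_hits = [tok for tok, tag in corpus if tag == "MD"]
```

Here the surface query returns three hits, two of them false positives; the tag-based query returns exactly the modal one.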

Data in SLA research

Clahsen & Muysken (1986)

◮ They studied word order acquisition in German by native speakers of Romance languages.
◮ Stages of acquisition:
  1. S (Aux) V O
  2. (AdvP/PP) S (Aux) V O
  3. S V[+fin] O V[-fin]
  4. XP V[+fin] S O
  5. S V[+fin] (Adv) O
  6. dass S O V[+fin]
◮ Stage 2 example: [Früher]AdvP [ich]S [kannte]V [den Mann]O ('earlier I knew the man')
◮ Stage 4 example: [Früher]AdvP [kannte]V[+fin] [ich]S [den Mann]O ('earlier knew I the man')
◮ How is the data characterized?
  ◮ lexical and syntactic categories and functions

Data in SLA research

Kanno (1997), Pérez-Leroux & Glass (1997)

◮ They studied the use of overt and null pronouns by non-native speakers of Japanese and Spanish.
◮ Examples:
  (3) Nadie dice que él ganará el premio.
      nobody says that he will-win the prize
      'Nobody_i says that he_*i/j will win the prize.'
  (4) Nadie dice que pro ganará el premio.
      nobody says that pro will-win the prize
      'Nobody_i says that he_i/j will win the prize.'
◮ How is the data characterized?
  ◮ syntactic functions and semantic relations
  ◮ not overtly expressed but interpreted elements
SLIDE 3

Annotation: Error annotation and beyond

◮ The annotation of learner corpora has focused on errors made by the learners (Granger 2003; Díaz-Negrillo & Fernández-Domínguez 2006).
◮ Yet, SLA research essentially observes correlations of linguistic properties, whether erroneous or not.
◮ Even research focusing on learner errors needs to identify correlations with linguistic properties, e.g., to identify
  ◮ overuse/underuse of certain patterns
  ◮ measures of language development (Developmental Sentence Scoring, Index of Productive Syntax, . . . )

Error annotation

Ambiguity and representation

◮ An error annotation scheme needs to support
  ◮ unambiguous and consistent identification of errors, which generally involves identifying the target intended by the learner
  ◮ a unique representation of the identified error
◮ Annotation scheme design thus requires answering questions such as:
  ◮ Where can which ambiguities be reliably resolved, given what linguistic context or other information (learner, task)?
  ◮ In a hierarchical tagset (i.e., different levels of specificity), how is consistency of the level of annotation achieved?
⇒ Only distinctions reliably identified given information present in a corpus or its meta-information should be included in an annotation scheme.

Error annotation

Ambiguity and representation (cont.)

◮ Identifying the nature of the error
  ◮ Example: The man eat cheese.
  ◮ agreement error: The man_3s eat_not(3s) cheese.
  ◮ tense error, intended was: The man ate cheese.
◮ Localizing and representing the error
  ◮ Which single, unique way is chosen to annotate an identified error, e.g., for binary relations?
  ◮ Example for marking a subject-verb agreement error:
    ◮ on the subject: The man eat cheese.
    ◮ on the verb: The man eat cheese.
    ◮ on an annotated relation: The man →agr eat cheese.
◮ The problem is non-trivial given that
  ◮ suffixes in fusional languages combine multiple features (e.g., person, number, gender, case)
  ◮ often multiple relations are established (e.g., D–A–A–N)

Annotation of linguistic properties

◮ Annotation schemes have been developed for a wide range of linguistic properties, including
  ◮ part-of-speech and morphology
  ◮ syntactic constituency or lexical dependency structures
  ◮ semantics (word senses, coreference), discourse structure
◮ Each type of annotation typically requires an extensive manual annotation effort → gold standard corpora
◮ Automatic annotation tools learning from such gold standard annotation are becoming available, but the quality of automatic annotation drops significantly for text differing from the gold standard training material.
◮ Interdisciplinary collaboration between FLT, SLA and Computational Linguistics is crucial to adapt annotation schemes and methods to learner language corpora.
◮ Very little research on this so far (but cf. de Haan 2000; de Mönnink 2000; van Rooy & Schäfer 2002, 2003)
SLIDE 4

The importance of high-quality annotation

Precision of search

◮ By precision of search we are referring to: Of the results to the query, how many represent the learner language patterns searched for?
◮ False positives can result in two ways:
  ◮ The term used for the query also characterizes patterns other than the ones we are interested in.
  ◮ Some of the annotations the query refers to are incorrect.
◮ Requirements on precision of search
  ◮ for qualitative analysis: needs to be high enough to find relevant examples among the false positives.
  ◮ for quantitative analysis: for reliable results, very high precision is required, in particular where specific rare language phenomena are concerned (and as known from Zipf's curse, most things occur rarely).

The importance of high-quality annotation

Recall of search

◮ By recall of search we are referring to: How many of the intended examples that are in principle in the corpus are in fact found by the query?
◮ Requirements on recall of search
  ◮ for qualitative analysis: any results found are useful, but there is a danger of partial blindness if example subclasses are not captured by the query approximating the target phenomenon.
  ◮ for quantitative analysis: maximizing recall is crucial for reliable quantitative results.
⇒ Where the query characterizing the target phenomenon is expressed in terms of the annotation, the quality and consistency of the annotation are important.
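The two measures just defined can be made concrete in a few lines (a minimal sketch over hypothetical corpus positions, not tied to any particular query tool):

```python
def precision_recall(relevant, retrieved):
    """Precision: share of query results that are genuine hits.
    Recall: share of the genuine instances in the corpus that the
    query actually finds."""
    relevant, retrieved = set(relevant), set(retrieved)
    true_positives = len(relevant & retrieved)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical corpus positions: 4 genuine instances of the phenomenon,
# the query returns 3 hits, one of which is a false positive
# (e.g., due to an incorrect annotation).
p, r = precision_recall(relevant={1, 2, 3, 4}, retrieved={3, 4, 5})
```

In this toy case precision is 2/3 (one false positive) and recall 1/2 (two genuine instances missed), illustrating how incorrect or missing annotation degrades both measures.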

Annotation quality

Methods for obtaining quality

◮ How can a high quality gold standard be obtained?
  ◮ Annotate the corpus several times and independently, then test interannotator agreement (Brants & Skut 1998)
  ◮ Keep only reliably and consistently identifiable distinctions, described in a detailed manual, including an appendix on hard cases (Voutilainen & Järvinen 1995; Sampson & Babarczy 2003)
  ◮ Detection of annotation errors through automatic analysis of comparable data recurring in the corpus → DECCA (Dickinson & Meurers 2003a,b, 2005; Boyd et al. 2008)
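Interannotator agreement is commonly quantified with a chance-corrected measure such as Cohen's kappa; a minimal sketch (the tag sequences are made-up illustration data, not actual annotation):

```python
from collections import Counter

def cohen_kappa(ann1, ann2):
    """Cohen's kappa: observed agreement between two annotators, corrected
    for the agreement expected by chance given each annotator's own label
    distribution."""
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    counts1, counts2 = Counter(ann1), Counter(ann2)
    expected = sum((counts1[lab] / n) * (counts2[lab] / n)
                   for lab in set(counts1) | set(counts2))
    return (observed - expected) / (1 - expected)

# Two annotators' POS tags for the same four tokens (made-up data):
annotator_a = ["NN", "VB", "NN", "DT"]
annotator_b = ["NN", "VB", "JJ", "DT"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 3 of 4 tokens (observed agreement 0.75), but chance agreement from their label distributions is 0.25, giving kappa of 2/3; distinctions that cannot be agreed on above chance are candidates for removal from the scheme.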

DECCA: Variation n-gram error detection

◮ Variation: multiple occurrences, with different annotations
  a) ambiguity: different annotations correctly label the same material used in different contexts
  b) annotation error: annotation is inconsistent across comparable occurrences
◮ Variation between constituent and non-constituent: the string "market/NN received/VBD its/PRP$ biggest/JJS jolt/NN last/JJ month/NN from/IN Campeau/NNP Corp./NNP ,/," occurs twice in the treebank; in one occurrence "last month" is bracketed as a temporal NP (NP-TMP), in the other it is left without a constituent label.
SLIDE 5

DECCA: Variation n-gram error detection (cont.)

◮ Variation between two syntactic category labels:
  (5) maturity next Tuesday: "next Tuesday" is labeled NP twice, PP once
◮ Efficient methods for detecting such annotation errors have been developed for a range of annotation types (Dickinson & Meurers 2003a,b, 2005; Boyd et al. 2008):
  ◮ positional: words, part-of-speech
  ◮ binary relations: lexical dependencies
  ◮ structural domains: chunks, constituents
◮ Python code is freely available from the project website: http://decca.osu.edu
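The core idea can be sketched in a few lines. This is a simplified toy version of the variation n-gram method for POS annotation (fixed trigram contexts only), not the actual DECCA code, which is available from the project website:

```python
from collections import defaultdict

def variation_trigrams(tagged):
    """Toy variation n-gram detector: find word trigrams that recur in the
    corpus with different tags on the middle word (the variation nucleus).
    Each hit is either a genuine ambiguity or an annotation error."""
    contexts = defaultdict(set)
    for i in range(1, len(tagged) - 1):
        words = (tagged[i - 1][0], tagged[i][0], tagged[i + 1][0])
        contexts[words].add(tagged[i][1])  # tag assigned to the middle word
    return {ngram: tags for ngram, tags in contexts.items() if len(tags) > 1}

# "last" is tagged JJ in one occurrence of "in last month" and RB in the
# other -- exactly the kind of inconsistency flagged for human review.
tagged = [("in", "IN"), ("last", "JJ"), ("month", "NN"), ("and", "CC"),
          ("in", "IN"), ("last", "RB"), ("month", "NN")]
variations = variation_trigrams(tagged)
```

The real method extends the surrounding context to the longest recurring n-grams (larger identical contexts make an annotation error more likely than an ambiguity) and generalizes to constituency and dependency annotation.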

A Concrete Case

◮ The NOCE learner corpus (Díaz-Negrillo 2009)
◮ Towards linguistic annotation
◮ Corpus representation
  ◮ XML
  ◮ TEI
◮ Exploring automatic POS annotation of learner language
◮ What does it mean to POS-annotate learner language?

The NOCE Learner Corpus

◮ Participants
  ◮ Writing by 1st/2nd year students of English at the universities of Granada and Jaén
  ◮ Learner information included: age, level, L2 exposure, motivation, etc.
◮ Task
  ◮ Written texts (argumentative, descriptive, narrative)
  ◮ Around 250 words per text
  ◮ Topics chosen from 3 suggestions or free writing
◮ Internal structure
  ◮ 3 text collections per academic year
  ◮ 4 years (2003–2005; 2007–2009)

NOCE: Corpus Structure

SLIDE 6

NOCE: Corpus Size


NOCE: Annotation

◮ EYES (ExplicitlY Encoded Surface modifications): 100% of corpus annotated
  ◮ Struckout units
  ◮ Late insertions
  ◮ Reordering of units
  ◮ Missing/unreadable text
◮ EARS (Error Annotation and Retrieval System): ≈25% of corpus annotated
  ◮ Spelling
  ◮ Punctuation
  ◮ Word, phrase and clause grammar
  ◮ Lexis
◮ How about adding linguistic information?

First Step: Tokenization

◮ Maps the input string into a series of tokens (words)
◮ Tokenization is
  ◮ language dependent: e.g., English uses spaces to delimit words (vs. Chinese) (but: in spite of, insofar as)
  ◮ character-set dependent: e.g., accented characters
  ◮ application dependent: e.g., are there 1 or 2 tokens in
    ◮ pronunciation vs. named entity: US
    ◮ abbreviation vs. sentence-ending: Mass.
    ◮ hyphenated words: text-based
    ◮ contractions: I'm, gonna, cannot
◮ Learner spelling mistakes such as additional or missing spaces can create problems for tokenization, e.g.:
  (6) I , saw , John , inthe , park , the , other , day .
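A minimal regular-expression tokenizer illustrates the problem (a naive sketch, not the tokenizer actually used for NOCE):

```python
import re

def tokenize(text):
    """Naive English tokenizer: words (keeping contractions like "I'm"
    together) and single punctuation marks as separate tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

# The learner's missing space survives as the single unknown token "inthe",
# which downstream tools such as POS taggers then have to guess at.
tokens = tokenize("I saw John inthe park the other day.")
```

Since the tokenizer splits only at whitespace and punctuation, no rule ever separates "inthe" into "in" + "the"; correcting such cases requires spelling normalization before or during tokenization.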

Second Step: POS-Tagging

◮ Automatic assignment of part-of-speech tags to each token
◮ Three freely available taggers:
  ◮ Stanford Tagger (Stanford University NLP Group)
  ◮ TnT (Universität des Saarlandes, Saarbrücken)
  ◮ TreeTagger (University of Stuttgart)
◮ All three taggers use the Penn Treebank tagset
  ◮ Fairly general tag inventory, limited number of categories
◮ All three taggers come with models trained on the same newspaper texts (Wall Street Journal)
  ◮ Comparable results
◮ Performance is known to degrade on other text genres
  ◮ Learner essays ≠ newspaper text
SLIDE 7

Representing rich information: XML

◮ Many different types of information:
  ◮ Learner information
  ◮ Learner text
  ◮ Error tags and editorial tags
  ◮ Tokenization of the text
  ◮ POS tags
◮ How can we keep the information in the same file, but still clearly separated? ⇒ Use XML

XML: Representation of annotation

◮ Primary data: the text content of the <w> tags
◮ Edited-out data: enclosed in <C> tags
◮ POS tags: attributes on each token

<?xml version="1.0" encoding="ISO-8859-15"?>
<corpus>
  <w id='w520' pos-stt='IN' pos-tnt='IN' pos-tt='IN'>inside</w>
  <C>
    <w id='w521' pos-stt='NN' pos-tnt='(' pos-tt='('>(</w>
    <w id='w522' pos-stt='DT' pos-tnt='DT' pos-tt='DT'>the</w>
    <w id='w523' pos-stt='NN' pos-tnt='NN' pos-tt='NN'>cassette</w>
    <w id='w524' pos-stt='NN' pos-tnt=')' pos-tt=')'>)</w>
  </C>
  <w id='w525' pos-stt='DT' pos-tnt='DT' pos-tt='DT'>a</w>
  <w id='w526' pos-stt='JJ' pos-tnt='JJ' pos-tt='JJ'>small</w>
  <w id='w527' pos-stt='NN' pos-tnt='NN' pos-tt='NN'>cassette</w>
  <w id='w528' pos-stt='.' pos-tnt='.' pos-tt='SENT' sb='true'>.</w>
</corpus>
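A sketch of how such markup can be queried with Python's standard xml.etree library, here pulling out the tokens on which the three taggers disagree; the <w> elements are copied from the example above (pos-stt = Stanford, pos-tnt = TnT, pos-tt = TreeTagger):

```python
import xml.etree.ElementTree as ET

# Token markup as in the slide's example: one POS attribute per tagger.
xml = """<corpus>
<w id='w521' pos-stt='NN' pos-tnt='(' pos-tt='('>(</w>
<w id='w522' pos-stt='DT' pos-tnt='DT' pos-tt='DT'>the</w>
<w id='w523' pos-stt='NN' pos-tnt='NN' pos-tt='NN'>cassette</w>
<w id='w524' pos-stt='NN' pos-tnt=')' pos-tt=')'>)</w>
</corpus>"""

root = ET.fromstring(xml)
# Tokens whose three tagger attributes are not all identical -- useful
# starting points when checking the automatic annotation by hand.
disagreements = [(w.get("id"), w.text) for w in root.iter("w")
                 if len({w.get("pos-stt"), w.get("pos-tnt"), w.get("pos-tt")}) > 1]
```

On this fragment the taggers disagree on the two parentheses (the Stanford model assigns NN where the others assign the punctuation tags), exactly the kind of token worth manual inspection.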

XML: TEI header

◮ TEI: Text Encoding Initiative (http://www.tei-c.org)
◮ TEI headers in NOCE contain information about:
  ◮ Who compiled the corpus and where
  ◮ The tasks the learners carried out
  ◮ The learners (proficiency level, their reasons for learning English, native language(s), location, . . . )
  ◮ The tools used to produce the corpus
  ◮ . . .
◮ Particularly important for interdisciplinary research as it provides comprehensive and standardized information

XML: More on the benefits

◮ Standard XML tools help quickly find cases where
  ◮ annotators forgot to type in closing error tags
  ◮ accidentally interleaved error tags were annotated
  ◮ error tags were mistyped

<?xml version="1.0" encoding="ISO-8859-15"?>
<corpus>
To <LX.VR.IT.CC.MS>practice basketball, football
<PN.CM.OM></PN.CM.OM> tennis <PN.EP.OV>...</PN.EP.OV>
</LX.VR.IT.CC.MS> is a form
<PG.CS.CP.NN.RE.NF.MS> to <LX.VR.IT.CC.MS> delete
</PG.CS.CP.NN.RE.NF.MS> fats and sugars </LX.VR.IT.CC.MS>.
</corpus>
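Such checks can be as simple as running the annotated text through a standard XML parser; a sketch using Python's xml.etree (the EARS-style tag names and the mistyped closing tag are invented for illustration):

```python
import xml.etree.ElementTree as ET

def annotation_problem(xml_text):
    """Return None if the annotated text is well-formed XML, otherwise the
    parser's complaint -- this surfaces forgotten closing tags, interleaved
    error tags and mistyped tag names as soon as a file is checked."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

ok = annotation_problem(
    "<corpus>To <LX.VR.IT.CC.MS>practice</LX.VR.IT.CC.MS> sport</corpus>")
# Hypothetical mistyped closing tag (OM vs. MO):
bad = annotation_problem("<corpus><PN.CM.OM>text</PN.CM.MO></corpus>")
```

The first fragment parses cleanly; the second is rejected with a mismatched-tag message, pointing the annotator at the typo. (The interleaved tags in the example above would be rejected the same way.)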
SLIDE 8

XML Schema: definition of annotation schemes

◮ Provide an exact definition of the annotation scheme
◮ Typos and confusions can be automatically detected while you type
  ◮ e.g., <VBB> instead of <VBP> (verb, present, sg, ¬3rd)

POS tagging of NOCE: An experiment

Setup

◮ Used 3 POS taggers trained on newspaper text
  ◮ TreeTagger, TnT tagger, Stanford tagger
◮ Tagged the error-annotated section in NOCE
  ◮ 179 texts ≈ 44 000 words

Results

◮ Manually evaluated the POS tags assigned by the taggers to 10 texts by 10 different participants (1850 words)
◮ Accuracy of automatically assigned tags
  ◮ TreeTagger: 94.95%
  ◮ TnT Tagger: 94.03%
  ◮ Stanford Tagger: 88.11%

30 / 46
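Accuracy figures of this kind come from a token-by-token comparison against the manually corrected tags; a sketch with invented tag sequences:

```python
# Sketch of the evaluation: per-token accuracy of automatically
# assigned tags against gold-standard tags. The two sequences below
# are invented for illustration, not taken from NOCE.

def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    assert len(gold) == len(predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

gold = ["PRP", "VBP", "IN", "NN", "VBZ", "TO", "NNS"]
pred = ["PRP", "VBP", "IN", "NN", "NNS", "TO", "NNS"]  # one tag differs
print("%.2f%%" % (100 * tagging_accuracy(gold, pred)))  # 85.71%
```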

POS tagging of NOCE: Some issues

Spelling

(7) I think that university teachs to people [. . . ]

Word boundaries

(8) They can’t pay their studies and more over they have to pay a flat [. . . ]

◮ Found lower performance for expressions which do not exist in English (in line with de Haan 2000; van Rooy & Schäfer 2002)
◮ But is tagging learner language really just a robustness issue, like adapting taggers to another domain?
◮ What does it mean for a POS tag to be correct for learner language?!

31 / 46

Sources of Evidence for POS analysis

◮ POS analysis based on evidence in the text:
  ◮ information in lexical entries
    (9) I was surprised by the word of the day.
  ◮ information encoded in morphology
    (10) There is a lot of construction going on here.
  ◮ information conveyed by distribution
    (11) The old man the boat.
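One way to keep the three sources of evidence apart is to record them separately per token; a sketch with illustrative field names of our choosing, not an established format:

```python
# Sketch: record each source of evidence separately instead of
# collapsing them into a single POS tag.
from dataclasses import dataclass

@dataclass
class POSEvidence:
    token: str
    lexical: str         # POS suggested by the lexical entry
    morphological: str   # POS suggested by the token's morphology
    distributional: str  # POS suggested by the syntactic context

# "man" in (11) "The old man the boat": the lexicon favours a noun
# reading, the bare form is morphologically uninformative ('?'),
# but the distribution demands a verb.
man = POSEvidence("man", lexical="noun", morphological="?",
                  distributional="verb")
print(man)
```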

32 / 46

Systematic POS categories for learner language

◮ POS tagging learner language usually handled as a domain transfer (robustness) problem
  ◮ train/develop on native language
  ◮ apply post-correction
◮ Are POS tags designed for native language suitable for systematically describing learner language?
◮ Can they make interesting properties of learner language explicit?
◮ We argue for developing a new POS category system that can better represent learner language

33 / 46

Case 1: Stem-Distribution mismatch

(12) [. . .] you can find a big vary of beautiful beaches [. . .]
     Stem: verb | Distribution: noun | Morphology: ?
(13) [. . .] they are very kind and friendship.
     Stem: noun | Distribution: adjective | Morphology: ?

34 / 46

Case 1: Stem-Distribution mismatch

(14) [. . .] that’s the reason because I went to Tunisia twice.
     Stem: conjunction | Distribution: wh-pronoun | Morphology: ?
(15) RED helped him during he was in the prison.
     Stem: preposition | Distribution: conjunction | Morphology: ?

35 / 46

Case 2: Stem-Distrib./Stem-Morph. mismatch

(16) [. . .] one of the favourite places to visit for many foreigns.
     Stem: adjective | Distribution: noun | Morphology: noun / verb 3rd sg
(17) [. . .] to be choiced for a job [. . .]
     Stem: noun / adjective | Distribution: verb | Morphology: verb

36 / 46

Case 2: Stem-Distrib./Stem-Morph. mismatch

(18) [. . .] and dark politicals will be defeated.
(19) [. . .] internet have some “pages” that contents something so horrible [. . .]

Derivational morphology and inflectional morphology point to different POS: further splitting within slots?

37 / 46

Case 3: Stem-Morphology mismatch

(20) [. . .] this film is one of the bests ever customes [. . .]
     Stem: adjective (noun / verb) | Distribution: adjective | Morphology: noun / verb 3rd sg
(21) [. . .] television, radio are very subjectives [. . .]
     Stem: adjective / noun | Distribution: adjective | Morphology: noun / verb 3rd sg

38 / 46

Case 4: Distribution-Morphology mismatch

(22) [. . .] for almost every jobs nowadays [. . .]
     Stem: noun | Distribution: noun sg | Morphology: noun pl / verb 3rd sg
(23) [. . .] it has grew up a lot specially after 1996 [. . .]
     Stem: verb | Distribution: verb past participle | Morphology: verb past tense

39 / 46

Case 4: Distribution-Morphology mismatch

(24) [. . .] if he want to know this [. . .]
(25) This first year have been wonderful [. . .]
     Stem: verb | Distribution: verb 3rd person sg | Morphology: verb non-3rd sg
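The four mismatch cases can be derived mechanically from the (stem, distribution, morphology) analyses above; a simplified sketch under one assumption of ours: '?' is uninformative, and a coarser value such as "verb" counts as compatible with a more specific one such as "verb 3rd sg".

```python
# Sketch of classifying mismatch cases from a
# (stem, distribution, morphology) triple; the case labels follow
# the slides, the encoding is our illustrative simplification.

def compatible(a, b):
    # '?' is uninformative; a coarser value is compatible with a
    # more specific one (prefix relation as a crude stand-in).
    return "?" in (a, b) or a.startswith(b) or b.startswith(a)

def classify(stem, distribution, morphology):
    """Name the pairs of evidence sources that disagree."""
    sources = [("stem", stem), ("distribution", distribution),
               ("morphology", morphology)]
    mismatches = [n1 + "-" + n2
                  for i, (n1, v1) in enumerate(sources)
                  for n2, v2 in sources[i + 1:]
                  if not compatible(v1, v2)]
    return mismatches or ["no mismatch"]

# (13) "they are very kind and friendship": Case 1
print(classify("noun", "adjective", "?"))  # ['stem-distribution']
# (24) "if he want to know this": Case 4
print(classify("verb", "verb 3rd sg", "verb non-3rd sg"))
# ['distribution-morphology']
```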

40 / 46

Mismatch-free learner language

Realization using wrong allomorph
(26) The mayority of people that die in Irak are childs [. . .]
(27) He runned to buy one [. . .]

41 / 46

Mismatch-free learner language

Realization using wrong stem
(28) [. . .] the 11th March cames to our minds.

42 / 46

Mismatch-free learner language

Duplicate inflection
(29) Childrens spend so much time [. . .]
(30) [. . .] it stresseses me a lot.

43 / 46

Mismatch-free learner language

Inappropriate word-formation rules
(31) [. . .] internet can modificate [. . .]
(32) [. . .] different socialities and ways of life.

44 / 46

Mismatch-free learner language

Creative lexis
(33) [. . .] people shouldn’t be menospreciated because of the music they listen to [. . .] (menospreciados (Spanish): undervalued)
(34) [. . .] for many raisons.

45 / 46

Conclusion

◮ Data collected in learner corpora can in principle provide empirical insights for the development & validation of theories
◮ We discussed
  ◮ linguistic annotation of learner corpora to support effective querying for example patterns discussed in SLA research
  ◮ design criteria for an error annotation scheme
  ◮ practical aspects of XML/TEI encoding of learner corpora
◮ We argued for an approach to the POS analysis of learner language which distinguishes
  ◮ lexical information
  ◮ morphological information
  ◮ distribution
  to obtain a systematic classification of POS properties capturing native-like text as well as learner innovations.
⇒ The (automatic) analysis of learner language collected in corpora provides many interesting challenges and opportunities.
46 / 46

References

Boyd, A., M. Dickinson & D. Meurers (2008). On Detecting Errors in Dependency Treebanks. Research on Language and Computation 6(2), 113–137. URL http://purl.org/dm/papers/boyd-et-al-09.html.

Brants, T. & W. Skut (1998). Automation of Treebank Annotation. In Proceedings of New Methods in Language Processing. Sydney, Australia. URL http://wing.comp.nus.edu.sg/acl/W/W98/W98-1207.pdf.

Clahsen, H. & P. Muysken (1986). The availability of Universal Grammar to adult and child learners: A study of the acquisition of German word order. Second Language Research 2, 93–119. URL http://slr.sagepub.com/cgi/reprint/2/2/93.pdf.

de Haan, P. (2000). Tagging non-native English with the TOSCA-ICLE tagger. In Mair & Hundt (2000), pp. 69–79.

de Mönnink, I. (2000). Parsing a learner corpus. In Mair & Hundt (2000), pp. 81–90.

Díaz-Negrillo, A. (2009). EARS: A User’s Manual. Munich, Germany: LINCOM Academic Reference Books.

Díaz-Negrillo, A. & J. Fernández-Domínguez (2006). Error Tagging Systems for Learner Corpora. Revista Española de Lingüística Aplicada (RESLA) 19, 83–102. URL http://dialnet.unirioja.es/servlet/fichero articulo?codigo=2198610&orden=72810.

Dickinson, M. & W. D. Meurers (2003a). Detecting Errors in Part-of-Speech Annotation. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL-03). Budapest, Hungary, pp. 107–114. URL http://purl.org/dm/papers/dickinson-meurers-03.html.

Dickinson, M. & W. D. Meurers (2003b). Detecting Inconsistencies in Treebanks. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT-03). Växjö, Sweden, pp. 45–56. URL http://purl.org/dm/papers/dickinson-meurers-tlt03.html.

Dickinson, M. & W. D. Meurers (2005). Detecting Errors in Discontinuous Structural Annotation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL ’05). pp. 322–329. URL http://www.aclweb.org/anthology-new/P05-1040.

Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal 20(3), 465–480. URL http://purl.org/calico/granger03.pdf.

Kanno, K. (1997). The acquisition of null and overt pronominals in Japanese by English speakers. Second Language Research 13, 265–287.

Mair, C. & M. Hundt (eds.) (2000). Corpus Linguistics and Linguistic Theory. Amsterdam: Rodopi.

Meurers, W. D. (2005). On the use of electronic corpora for theoretical linguistics. Case studies from the syntax of German. Lingua 115(11), 1619–1639. URL http://purl.org/dm/papers/meurers-03.html.

Meurers, W. D. & S. Müller (2009). Corpora and Syntax (Article 42). In A. Lüdeling & M. Kytö (eds.), Corpus Linguistics, Berlin: Mouton de Gruyter, vol. 2 of Handbooks of Linguistics and Communication Science, pp. 920–933. URL http://purl.org/dm/papers/meurers-mueller-09.html.

Pérez-Lerroux, A. & W. Glass (1997). OPC effects in the L2 acquisition of Spanish. In A. Pérez-Lerroux & W. Glass (eds.), Contemporary Perspectives on the Acquisition of Spanish, Somerville, MA: Cascadilla Press, vol. 1, pp. 149–165.

Sampson, G. & A. Babarczy (2003). Limits to annotation precision. In Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03). pp. 61–68. URL http://www.grsampson.net/Alta.html.

van Rooy, B. & L. Schäfer (2002). The Effect of Learner Errors on POS Tag Errors during Automatic POS Tagging. Southern African Linguistics and Applied Language Studies 20, 325–335.

van Rooy, B. & L. Schäfer (2003). An Evaluation of Three POS Taggers for the Tagging of the Tswana Learner English Corpus. In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference, Lancaster University (UK), 28–31 March 2003. Vol. 16 of University Centre for Computer Corpus Research on Language Technical Papers, pp. 835–844.

Voutilainen, A. & T. Järvinen (1995). Specifying a shallow grammatical representation for parsing purposes. In Proceedings of the 7th Conference of the EACL. Dublin, Ireland. URL http://www.aclweb.org/anthology/E95-1029.

46 / 46