

SLIDE 1


Towards Focus Detection in Content Assessment

Ramon Ziai

Universität Tübingen, SFB 833, Project A4

Second TüBerlin Workshop on the Analysis of Learner Language, Tübingen, December 5, 2011


SLIDE 2


Outline

Introduction
  ◮ Project Background
  ◮ Empirical Basis
Information Structure (IS) in Content Assessment
  ◮ Given/New
  ◮ Focus/Background
Annotation Experiment
  ◮ Data Selection
  ◮ Annotation Criteria
  ◮ Results
Conclusions


SLIDE 3


What is Content Assessment?

◮ The task of evaluating an answer to a question with respect to meaning, given a concrete linguistic context.

◮ Context here means: the explicit question and the reading text in Reading Comprehension (RC) data.

◮ Answers to RC questions are produced by language learners; the language can be ill-formed and requires robust computational processing.


SLIDE 4


Content Assessment: A Simple Example

Meurers, Ziai, Ott & Kopp (2011b)

Q: Was sind die Kritikpunkte, die Leute über Hamburg äußern?
   'What are the objections people have about Hamburg?'

TA: Der Gestank von Fisch und Schiffsdiesel an den Kais.
    'The stink of fish and ship diesel at the quays.'

SA: Der Geruch zon Fisherr und Schiffsdiesel beim Hafen.
    'The smell of fish and ship diesel at the port.'

The slide aligns TA and SA words with the link types Ident, Spelling, SemType and Similarity (a toy illustration follows below).
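As a rough sketch of what these alignment link types capture (not the actual CREG assessment system; the synonym list and threshold below are invented for the example), one could label a token pair like this:

```python
# Toy sketch of the alignment labels above; the threshold and the
# synonym list are illustrative assumptions, not the real system.
from difflib import SequenceMatcher

SYNONYMS = {("gestank", "geruch")}  # stand-in for a semantic-type resource

def align_label(ta_token, sa_token):
    a, b = ta_token.lower(), sa_token.lower()
    if a == b:
        return "Ident"                        # identical tokens
    if SequenceMatcher(None, a, b).ratio() > 0.6:
        return "Spelling"                     # near-identical: spelling variant
    if (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return "SemType"                      # semantically related words
    return "Similarity"                       # weaker, phrase-level relatedness

print(align_label("Gestank", "Geruch"))   # -> SemType
print(align_label("Fisch", "Fisherr"))    # -> Spelling
```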


SLIDE 5


CREG

Corpus of Reading Comprehension Exercises in German Meurers, Ott & Ziai (2011a)

◮ Together with Ohio State University and Kansas University, we are collecting a corpus of
  ◮ reading texts,
  ◮ questions on the reading texts,
  ◮ student answers to the questions, and
  ◮ target answers pre-specified by teachers.

◮ All student answers are rated by two annotators at OSU and KU with respect to meaning.
  ◮ Binary: correct, incorrect
  ◮ Detailed: correct answer, missing concept, extra concept, blend, non-answer


SLIDE 6


CREG: Numbers and Sizes

◮ Snapshot from September 6, 2011:
  ◮ 118 texts
  ◮ 752 questions
  ◮ 1,059 target answers
  ◮ 20,851 student answers

SLIDE 7


Today: IS of answers to RC questions

◮ So far, our approach has no notion of the information required by the question; it only compares student and target answers.

◮ Comparable systems such as C-rater (Leacock & Chodorow 2003) or Willow (Pérez Marín 2007) also don't take the question into account.

◮ However, questions naturally impose IS requirements on answers.

◮ Research questions:
  ◮ What are the necessary IS notions? Given/New or Focus/Background?
  ◮ Can instances of them be identified reliably?

SLIDE 8


IS in Content Assessment

◮ In order for a system to distinguish between potentially relevant and irrelevant content, a partitioning of the answers would be useful.

◮ So far, the system can only recognize previously mentioned lexical material (simple Givenness).

◮ This simple case of Givenness is easy to detect (see the sketch below), but does it suffice for the task?
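To make "simple Givenness" concrete, here is a minimal sketch (not the project's actual implementation): an answer token counts as given if its lowercased form already occurred in the context, i.e. the question and the reading text.

```python
def given_tokens(answer_tokens, context_tokens):
    """Mark each answer token as given (True) or new (False), where
    'given' means the lowercased form already occurred in the context
    (question + reading text)."""
    context = {t.lower() for t in context_tokens}
    return [(tok, tok.lower() in context) for tok in answer_tokens]

# Toy usage with the Mozart example from the next slide:
question = "Welche kulturelle Persönlichkeit ist für die Salzburger am wichtigsten ?".split()
answer = "Mozart ist für die Salzburger am wichtigsten".split()
print(given_tokens(answer, question))
# 'Mozart' comes out as new; everything else is given.
```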


SLIDE 9


Given/New: Example where it works

Question: Welche kulturelle Persönlichkeit ist für die Salzburger am wichtigsten?
  'Which cultural personality is most important to the people of Salzburg?'

Target Answer: Mozart [ist für die Salzburger am wichtigsten]G.

Student Answer: Mozart [ist die kulturelle Persönlichkeit für die Salzburger an wichtigsten]G.


SLIDE 10


Given/New: Why it works

◮ No extraneous, irrelevant new information.

◮ [Mozart] is the only new piece of information and also the correct answer.

➥ When the new information is identical to the requested information, a Given/New distinction works as intended.

◮ What happens in other cases?


SLIDE 11


Given/New: Example where it doesn’t work

Question: An was denken viele Menschen, wenn sie von Weißrussland hören?
  'What do many people think of when they hear about Belarus?'

Target Answer: [Sie denken an]G die Tschernobyl-Katastrophe von 1986.
  '[They think of]G the Chernobyl disaster of 1986.'

Student Answer: [Ausländer denken bei Weißrussland]G weniger an Urlaub, sondern eher an die Tschernobyl-Katastrophe von 1986. Damals explodierten in der Sowjetunion Teile eines Atomkraftwerks und wurden einige Regionen Weißrusslands von der radioaktiven Strahlung verseucht.
  '[When it comes to Belarus, foreigners think]G less of vacation and rather of the Chernobyl disaster of 1986. Back then, parts of a nuclear power plant in the Soviet Union exploded and some regions of Belarus were contaminated by the radioactive fallout.'


SLIDE 12


Given/New: Why it doesn’t work

◮ Lots of new and unrequested information in the student answer.

◮ The relevant information [die Tschernobyl-Katastrophe von 1986] is included, but cannot be distinguished as such.

➥ Newness is not an ideal category for identifying requested answer content.


SLIDE 13


Given/New: Why it really doesn’t work

Extreme case: alternative questions.

Question: Ist die Wohnung in einem Neubau oder einem Altbau?
  'Is the apartment in a new building or an old building?'

Target Answer: [Die Wohnung ist in einem Neubau]G.

Student Answer: [Die Wohnung ist in einem Neubau]G.

➥ The requested information is Given here, so Newness doesn't help at all!

◮ What about Focus/Background?


SLIDE 14


What kind of focus?

◮ We need a notion of focus that selects the minimal acceptable answer out of the whole answer content.

◮ A pragmatic notion of focus, with no relation to the prosody layer.

◮ It needs to be able to cover phrases, because acceptable answers rarely consist of single words.
➥ Similar to the concept of "focus phrase" in ??

◮ It needs to correspond to the type of information requested by the question.


SLIDE 15


Focus/Background: Example

Question: An was denken viele Menschen, wenn sie von Weißrussland hören?

Target Answer: Sie denken an [die Tschernobyl-Katastrophe von 1986]F.

Student Answer: Ausländer denken bei Weißrussland [weniger an Urlaub, sondern eher an die Tschernobyl-Katastrophe von 1986]F. Damals explodierten in der Sowjetunion Teile eines Atomkraftwerks und wurden einige Regionen Weißrusslands von der radioaktiven Strahlung verseucht.


SLIDE 16


Focus/Background: Example 2

Question: Ist die Wohnung in einem Neubau oder einem Altbau?

Target Answer: Die Wohnung ist [in einem Neubau]F.

Student Answer: Die Wohnung ist [in einem Neubau]F.


SLIDE 17


Focus Annotation Experiment

◮ In order to learn about focus annotation and find out whether it is feasible, we annotated a selection of question/answer sets in our data.

◮ Given a certain level of consistency, this can also be used later as training data for a computational approach.

◮ Previous attempts at annotating focus (e.g. Ritz et al. 2008) were moderately successful.

◮ However, we believe that given the explicit questions in our corpus, our job is easier and can thus be done more consistently.


SLIDE 18


Data Selection

◮ Course level: intermediate and upwards.

◮ Only correct answers with agreement between both raters in the binary decision task.

◮ For each question, we chose the longest student answer ⇒ less chance of the minimal answer being identical to the whole answer (see the sketch after this list).

◮ The resulting data set consists of 82 questions, target answers and student answers.

◮ The data set is tokenized, but no other preprocessing is done.
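A minimal sketch of this selection step, assuming a hypothetical record layout (field names are invented for illustration; the real CREG data model differs):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Answer:
    question_id: str      # hypothetical field names, for illustration only
    tokens: List[str]     # tokenized student answer
    rating1: str          # binary rating by annotator 1: "correct"/"incorrect"
    rating2: str          # binary rating by annotator 2

def select_longest_correct(answers: List[Answer]) -> List[Answer]:
    """For each question, keep only answers both raters marked 'correct',
    then pick the longest one in tokens."""
    best = {}
    for a in answers:
        if a.rating1 == a.rating2 == "correct":
            current = best.get(a.question_id)
            if current is None or len(a.tokens) > len(current.tokens):
                best[a.question_id] = a
    return list(best.values())
```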


SLIDE 19


Build on Answer Types

◮ Since we have explicit questions and a correspondence between focus and requested content, we build our annotation scheme on so-called Answer Types, as used in the Question Answering literature (e.g. Li & Roth 2002).

◮ An Answer Type is a label for the type of requested content; we use Entity, Reason, Description, Place, Time, Degree and Polar.

◮ Idea: Label the question part that defines the Answer Type, such as the wh-phrase in wh-questions (a toy sketch follows below).

◮ Depending on that, label the focused phrase in the target and student answer with the same type!
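To make the idea concrete, here is a toy sketch of such a labeling rule; the wh-word-to-type mapping is an illustrative guess, not the scheme actually used in the annotation:

```python
# Map the wh-word opening a German question to one of the slide's
# Answer Types; the mapping is a plausible illustration only.
WH_TO_TYPE = {
    "wer": "Entity", "was": "Entity", "welche": "Entity",
    "warum": "Reason", "wieso": "Reason",
    "wo": "Place", "wohin": "Place",
    "wann": "Time",
    "wie": "Description",  # "wie viel(e)" would rather be Degree
}

def answer_type(question_tokens):
    """Guess the Answer Type from the first token; questions without
    a wh-word (e.g. yes/no or 'oder'-alternatives) default to Polar."""
    return WH_TO_TYPE.get(question_tokens[0].lower(), "Polar")

print(answer_type("Wann beginnt der Film ?".split()))            # -> Time
print(answer_type("Ist die Wohnung in einem Neubau ?".split()))  # -> Polar
```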


SLIDE 20


Annotation Criteria

Focus marking in the answer should

◮ cover the minimal part of the answer that can stand on its own and answer the question,

◮ ideally separate relevant from irrelevant content with respect to the question,

◮ obey syntactic borders, such as phrase boundaries,

◮ avoid Given material unless it is necessary to answer the question.

The annotation was carried out independently by two annotators in the EXMARaLDA tool (Schmidt 2004). Thanks to Philip Schulz for annotation and discussion.


SLIDE 21


Annotation Example


SLIDE 22


Quantitative Results: Measures

◮ In order to assess annotation consistency, we compared the two annotators in several ways (a sketch of these measures follows below):
  ◮ Exact match of the focused part; only full matches count.
  ◮ Mean token overlap of the focused parts, to account for partial matches.
  ◮ Unlabeled percentage agreement per token.
  ◮ Labeled percentage agreement per token.
  ◮ Cohen's Kappa on a token basis, as a standard agreement measure.

◮ Each of the above was calculated separately for questions, target answers and student answers.
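A sketch of how such measures can be computed; the exact definitions used in the talk (in particular how the mean token overlap is normalized) are not spelled out on the slides, so the normalization by the longer span below is an assumption:

```python
def exact_match(spans_a, spans_b):
    """Fraction of items whose focused (start, end) token span is
    identical for both annotators."""
    return sum(a == b for a, b in zip(spans_a, spans_b)) / len(spans_a)

def token_overlap(span_a, span_b):
    """Token overlap of two (start, end) spans, here normalized by the
    longer span (an assumption; other normalizations are possible)."""
    shared = max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    longer = max(span_a[1] - span_a[0], span_b[1] - span_b[0])
    return shared / longer

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa over per-token labels (e.g. focus vs. background,
    or the Answer Type labels for the labeled variant)."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)
```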


SLIDE 23


Quantitative Results: Numbers

                   Span-based            Token-based
                   Exact    Overlap      Unlabeled   Labeled   κ
Questions          74.4%    85.1%        95.7%       91.8%     0.72
Target answers     47.6%    88.5%        89.9%       79.3%     0.65
Student answers    29.3%    77.4%        86.8%       80.9%     0.57


SLIDE 24


Quantitative Results: Discussion

◮ Classifying and annotating questions is easier than consistently identifying the focus in answers.

◮ Learner language is harder to annotate than the target answer material.

◮ Agreement measures on a per-token basis are acceptable ⇒ the task seems feasible.

◮ The difference between exact match and overlap suggests that the boundaries of focused material need to be better defined.


SLIDE 25


Conclusions

◮ A Focus/Background distinction is a better fit for the task of Content Assessment than a Given/New one, because it allows marking the minimal answer.

◮ Focus marking in Reading Comprehension data appears to be feasible.

◮ The domain of focus marking is still a problem; many smaller mismatches occur.
➥ Constrain focus syntactically according to the Answer Type, e.g. an Entity must be an NP.

◮ The set of Answer Types needs to be revised.

◮ Proper guidelines with a detailed description of the conventions are necessary.


SLIDE 26


The End

Thank you!


SLIDE 27


References

Leacock, C. & M. Chodorow (2003). C-rater: Automated Scoring of Short-Answer Questions. Computers and the Humanities 37, 389–405.

Li, X. & D. Roth (2002). Learning Question Classifiers. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). Taipei, Taiwan, pp. 1–7.

Meurers, D., N. Ott & R. Ziai (2011a). On the Creation and Analysis of a Reading Comprehension Exercise Corpus: Evaluating Meaning in Context. In T. Schmidt & K. Wörner (eds.), Multilingual Corpora and Multilingual Corpus Analysis, Hamburg Studies in Multilingualism. Amsterdam and Philadelphia: John Benjamins. Submitted.

Meurers, D., R. Ziai, N. Ott & J. Kopp (2011b). Evaluating Answers to Reading Comprehension Questions in Context: Results for German and the Role of Information Structure. In Proceedings of the TextInfer 2011 Workshop on Textual Entailment. Edinburgh, Scotland, UK: Association for Computational Linguistics, pp. 1–9.

Pérez Marín, D. R. (2007). Adaptive Computer Assisted Assessment of Free-Text Students' Answers: An Approach to Automatically Generate Students' Conceptual Models. Ph.D. thesis, Universidad Autónoma de Madrid.

Ritz, J., S. Dipper & M. Götze (2008). Annotation of Information Structure: An Evaluation Across Different Types of Texts. In Proceedings of the 6th International Conference on Language Resources and Evaluation. Marrakech, Morocco, pp. 2137–2142.

Schmidt, T. (2004). Transcribing and Annotating Spoken Language with EXMARaLDA. In Proceedings of the LREC-Workshop on XML Based Richly Annotated Corpora, Lisbon 2004. Paris: ELRA.