SLIDE 1

Outline:

◮ Introduction & Motivation
◮ ICALL context: System architecture, Exercise design, Error types
◮ Morphological analysis: Lexicon, Error detection
◮ Constructing the Lexicon
◮ Summary & Outlook
◮ References

Russian Morphological Processing for ICALL

Markus Dickinson and Joshua Herring

Dept. of Linguistics, Indiana University

ACL Workshop on Building Educational Applications, Columbus, OH, June 19, 2008

SLIDE 5

Introduction & Motivation

Intelligent computer-aided language learning (ICALL) systems are ideal for language pedagogy

◮ provide additional practice outside the classroom
◮ aid awareness of language forms & rules (see Amaral and Meurers 2006)

However:

◮ Few ICALL systems are in existence today
  ◮ German (Heift and Nicholson 2001)
  ◮ Portuguese (Amaral and Meurers 2006, 2007)
  ◮ Japanese (Nagata 1995)
◮ Processing of ill-formed learner text focuses on a limited set of languages and language types
  ◮ See Vandeventer Faltin (2003) and references therein

⇒ Should expand to more language families

SLIDE 9

Re-usability

Significant overhead in developing an ICALL system

Effort in producing an ICALL system can be reduced by:

◮ reusing system architecture

◮ evaluating and optimizing the architecture

◮ adapting existing NLP tools

◮ and/or developing resource-light technology

It is important to determine where and how reuse of technology is appropriate

SLIDE 14

Russian ICALL

We are developing an ICALL system for beginning learners of Russian

◮ Based on the TAGARELA system for Portuguese (Amaral and Meurers 2006, 2007)
  ◮ Q1: How can the technology in TAGARELA be adapted for efficient & accurate use with Russian?
◮ Requires development of techniques to parse ill-formed input for a morphologically rich language
  ◮ Q2: What kind of processing do we need, and are existing NLP tools reusable for this purpose?
    ◮ Q2a: What is the context for processing (i.e., the exercise requirements)?
    ◮ Q2b: What are the expected types of morphological errors?

SLIDE 17

System architecture

From TAGARELA, we retain:

◮ Modular separation of activities from analysis
  ◮ Each activity type has its own directory, which eases:
    ◮ loading different kinds of external files (e.g., sound)
    ◮ calling different processing tools (Amaral 2007)
◮ Web processing code
  ◮ e.g., code for handling user logins, design of user databases (for tracking learner information)
  ◮ Minimizes the amount of online overhead in our system, allowing us to focus on linguistic processing
◮ The idea of annotation-based processing (cf. Amaral and Meurers 2007)
  ◮ Before error detection/diagnosis, annotate learner input with linguistic properties that can be automatically determined
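The annotate-then-diagnose order described above can be sketched roughly as follows. This is only an illustration, not TAGARELA's code: the `Token` class, the `annotate` and `diagnose` functions, and the three-entry toy lexicon are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Toy lexicon (an assumption for illustration): surface form -> properties.
TOY_LEXICON = {
    "я": {"pos": "PRON", "person": 1, "number": "sg"},
    "думаю": {"pos": "VERB", "person": 1, "number": "sg"},
    "думает": {"pos": "VERB", "person": 3, "number": "sg"},
}


@dataclass
class Token:
    form: str
    annotations: dict = field(default_factory=dict)


def annotate(text):
    """Step 1: attach every property that can be determined automatically."""
    tokens = []
    for form in text.lower().split():
        props = TOY_LEXICON.get(form)
        tokens.append(Token(form, dict(props) if props else {"unknown": True}))
    return tokens


def diagnose(tokens):
    """Step 2: diagnosis reads the annotations rather than the raw string."""
    return [t.form for t in tokens if t.annotations.get("unknown")]
```

Keeping the two steps separate means later diagnosis modules can be swapped per activity without touching the annotation code.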

SLIDE 21

Exercise design

Goals of the system:

◮ Support an 8-week “survival” Russian course

  ◮ Basics of the language
  ◮ Contextualized practice to support traveling to Russia
◮ Cover a range of exercises, all of which require some morphosyntactic analysis of Russian
  ◮ listening, video-based narratives, reading practice, exercises centered around maps and locations, ...

A simple example of a Russian verbal exercise:

(1) Вчера      он   ___   (видеть)   фильм.
    vchera     on   ___   (videt’)   fil’m
    Yesterday  he   ___   (to see)   a film

⇒ This set-up constrains the types of errors learners can make

SLIDE 26

Expected error types (1)

We focus on morphological errors, as these are common across exercises

1. Inappropriate verb stem
   1.1 Always inappropriate (spelling error)
       ◮ Requires some spell-checking technology
   1.2 Inappropriate for this context
       ◮ Requires activity model specifying appropriate verbs

External needs: lexicon, spell checker

SLIDE 31

Expected error types (2)

2. Inappropriate verb affix
   2.1 Always inappropriate (spelling error)
   2.2 Always inappropriate for verbs
       ◮ ев is an appropriate nominal ending, but not a verbal one:
         (2) *начина-ев
              begin-??
   2.3 Inappropriate for this verb
       ◮ ит is for a different verb conjugation:
         (3) *начина-ит (cf. начина-ет)
              begin-3s

External needs: lexicon, spell checker
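The three affix sub-types can be told apart with simple table lookups, as in this minimal sketch. The affix inventories and the `classify_affix` function are hypothetical and deliberately tiny. A real system would draw them from the lexicon.

```python
# Toy present-tense endings per conjugation class (illustrative, incomplete).
VERB_AFFIXES = {
    1: {"ю", "ешь", "ет", "ем", "ете", "ют"},
    2: {"ю", "ишь", "ит", "им", "ите", "ят"},
}
# A few nominal endings (illustrative, incomplete).
NOMINAL_AFFIXES = {"ев", "ов", "ам", "ами"}
# Conjugation class of each known verb stem (assumed lexicon entry).
STEM_CONJUGATION = {"начина": 1}


def classify_affix(stem, affix):
    """Return 'ok' or the label of the affix error sub-type."""
    conj = STEM_CONJUGATION[stem]
    if affix in VERB_AFFIXES[conj]:
        return "ok"
    if any(affix in endings for endings in VERB_AFFIXES.values()):
        return "2.3 inappropriate for this verb"
    if affix in NOMINAL_AFFIXES:
        return "2.2 always inappropriate for verbs"
    return "2.1 always inappropriate (spelling error)"
```

The order of the checks matters: an ending is only treated as a spelling error once it has failed to match any known verbal or nominal ending.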

SLIDE 33

Expected error types (3)

3. Inappropriate combination of stem and affix
   ◮ The verb for ’can’ varies between the stems мог and мож (e.g., мож-ем ’we can’)
     (4) *мож-у (cf. мог-у)
          can-1s

External needs: lexicon
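A lexicon that records which affixes each stem alternant takes is enough to catch this error type. The sketch below assumes a hypothetical `PARADIGM` table and `check_combination` function, covering only the present tense of this one verb.

```python
# Stem alternants of the verb 'can' and the affixes each one combines with.
PARADIGM = {
    "мог": {"у": "1sg", "ут": "3pl"},
    "мож": {"ешь": "2sg", "ет": "3sg", "ем": "1pl", "ете": "2pl"},
}


def check_combination(stem, affix):
    """Return (diagnosis, suggested_form) for a stem + affix pair."""
    if stem not in PARADIGM:
        return ("unknown stem", None)
    if affix in PARADIGM[stem]:
        return ("ok", stem + affix)
    # Both parts are valid on their own; look for the alternant that fits.
    for other_stem, affixes in PARADIGM.items():
        if other_stem != stem and affix in affixes:
            return ("type 3: wrong stem for this affix", other_stem + affix)
    return ("affix not in paradigm", None)
```

For example, `check_combination("мож", "у")` reports a type 3 error and suggests `могу`, which is the kind of information feedback to the learner can build on.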

SLIDE 38

Expected error types (4)

4. Well-formed word in inappropriate context
   4.1 Inappropriate agreement features
       ◮ Need to know best analysis in context of verb & subject
         (5) *Я думает
              I think-3sg
   4.2 Inappropriate verb form (tense, (im)perfective, etc.)
       ◮ Activity model can often indicate correct form, e.g., perfective (completed action) or imperfective
       ◮ Need to know best analysis in context, e.g., infinitive verb is governed by a verb selecting for infinitive

External needs: morphological analyzer, POS tagger
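Once the subject and verb have been analyzed, the agreement check in 4.1 reduces to comparing feature values. In this rough sketch the feature tables and the `agreement_error` function are hypothetical stand-ins for the output of a morphological analyzer and POS tagger.

```python
# (person, number) features, as an analyzer might return them (toy data).
SUBJECT_FEATURES = {
    "я": (1, "sg"), "ты": (2, "sg"), "он": (3, "sg"),
    "она": (3, "sg"), "мы": (1, "pl"),
}
VERB_FEATURES = {
    "думаю": (1, "sg"), "думаешь": (2, "sg"),
    "думает": (3, "sg"), "думаем": (1, "pl"),
}


def agreement_error(subject, verb):
    """Return None if subject and verb agree, else a short diagnosis."""
    s = SUBJECT_FEATURES[subject.lower()]
    v = VERB_FEATURES[verb.lower()]
    if s == v:
        return None
    return f"subject is {s[0]}{s[1]} but verb is {v[0]}{v[1]}"
```

Applied to example (5), `agreement_error("Я", "думает")` yields a 1sg versus 3sg mismatch even though both words are well-formed on their own.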

SLIDE 42

Using the error taxonomy

Even for simple exercises, there is a range of errors, requiring new technology

◮ Error types #1 through #3 make no use of context
  ◮ Only need information from the activity model and lexicon to tell whether the word is valid
  ◮ Priority is thus to develop or acquire a lexicon
◮ Error type #4 requires contextual information, as the words are well-formed
  ◮ Requires morphological analysis, based on a lexicon
  ◮ Ideally, the lexicon design should be integrated with morphological analysis
◮ No category for argument structure misuse or word order variation, as these are syntactic errors, not morphological

SLIDE 47

Morphological analysis

Annotation of input must be able to determine morphological properties, independent of surrounding context

◮ We cannot assume well-formed input, as traditional morphological analyzers do
◮ We need ready access to alternative analyses, especially for learner innovations

(6) душ-у
    soul-N.acc? *shower-V.1s?

◮ We need easy implementation of activity-specific heuristics, e.g., weighting analyses

Finite State Morphology is ideal for this purpose (see, e.g., Roark and Sproat 2007)
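One way to picture activity-specific weighting is to keep every alternative analysis and let the activity re-rank them. The sketch below is an assumption-laden illustration: the weights, the `boost` factor, and the `rank_analyses` function are invented here, and a weighted finite-state toolkit would express the same idea with arc weights.

```python
# Competing analyses of the input in example (6): (lemma, pos, tag, base weight).
# The base weights are made-up numbers for illustration only.
DUSHU_ANALYSES = [
    ("душа", "N", "acc.sg", 1.0),  # well-formed noun reading, 'soul'
    ("душ", "V", "1sg", 0.6),      # learner innovation, 'shower' used as a verb
]


def rank_analyses(analyses, expected_pos, boost=2.0):
    """Re-rank analyses, favoring the part of speech the activity expects."""
    scored = [
        (weight * (boost if pos == expected_pos else 1.0), lemma, pos, tag)
        for lemma, pos, tag, weight in analyses
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

In a verb-conjugation exercise, calling `rank_analyses(DUSHU_ANALYSES, "V")` moves the verbal reading to the top, while the noun reading stays available as a fallback.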

SLIDE 50

The nature of the lexicon

Goal: Accurately obtain partial information from well-formed and ill-formed input

Proposal: Use a fully-specified lexicon, implemented as a Finite State Transducer (FST), indexed by both word edges

◮ Russian morphological information is at word edges, i.e., prefixes and suffixes
◮ Analysis proceeds by working inwards, one character at a time, beginning at each end of an input item
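Working inwards from both edges can be illustrated with plain Python sets standing in for the transducer. This is a simplified sketch, not the FST itself: the `PREFIXES`, `SUFFIXES`, and `STEMS` inventories and the `edge_analyses` function are hypothetical.

```python
# Tiny illustrative inventories; a real lexicon would be far larger.
PREFIXES = {"по", "за"}
SUFFIXES = {"ю": "1sg", "ет": "3sg", "ем": "1pl"}
STEMS = {"дума"}


def edge_analyses(word):
    """Peel characters off both edges and keep splits the lexicon licenses."""
    results = []
    for i in range(len(word)):             # left cut moves inwards
        prefix = word[:i]
        if prefix and prefix not in PREFIXES:
            continue
        for j in range(len(word), i, -1):  # right cut moves inwards
            suffix = word[j:]
            if suffix and suffix not in SUFFIXES:
                continue
            core = word[i:j]
            if core in STEMS:
                results.append((prefix, core, suffix))
    return results
```

Here `edge_analyses("подумаю")` returns the single split `("по", "дума", "ю")`. If the middle of a word were misspelled, the edge material could still be recognized, which is the partial information the goal above asks for.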

SLIDE 55

Lexical chains

Specifically, morphological endings are stored as separate chains, attached to the main chain as appropriate

◮ Read symbols from input string one at a time, building a set of hypotheses about the proper analysis

  ◮ set of legal continuations of the current string
  ◮ set of continuations that can be obtained through application of a repair operation (insert, delete, etc.)

Consider дума-ю (‘think-1sg’):

◮ Up to morpheme boundary, identical to some form of дума (duma), ‘parliament’

◮ At hypothesized boundary, both competing hypotheses (‘think’ and ‘parliament’) are possible

  ◮ For ‘think’, continuing to ю is legal
  ◮ For ‘parliament’, continuing to ю requires a repair
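The дума-ю example can be traced with a minimal sketch. Everything here is an assumption made for illustration, not the system's actual code: the toy lexicon, the names `ENDING_CHAINS`, `STEMS`, `Hypothesis`, `step` and `analyze`, and the limit of one repair per hypothesis. Only two repair operations (substituting a symbol, deleting an extra input symbol) are modelled.

```python
from typing import NamedTuple

# Toy data (assumed): ending chains are stored once and referenced by name.
ENDING_CHAINS = {
    "verb_1conj": {"ю": "1sg", "ешь": "2sg", "ет": "3sg"},
    "noun_a_decl": {"а": "nom.sg", "ы": "gen.sg", "у": "acc.sg"},
}
STEMS = [("дума", "think", "verb_1conj"), ("дум", "parliament", "noun_a_decl")]


class Hypothesis(NamedTuple):
    gloss: str
    feats: str
    target: str          # the lexicon form this hypothesis is following
    pos: int             # number of target symbols consumed so far
    repairs: tuple = ()  # repair operations applied so far


def initial_hypotheses():
    """One hypothesis per stem + ending combination, nothing consumed yet."""
    return {
        Hypothesis(gloss, feats, stem + ending, 0)
        for stem, gloss, chain in STEMS
        for ending, feats in ENDING_CHAINS[chain].items()
    }


def step(hyps, symbol, max_repairs=1):
    """Consume one input symbol and return the updated hypothesis set."""
    out = set()
    for h in hyps:
        expected = h.target[h.pos] if h.pos < len(h.target) else None
        if expected == symbol:
            out.add(h._replace(pos=h.pos + 1))  # legal continuation
        elif len(h.repairs) < max_repairs:
            if expected is not None:  # repair: substitute the input symbol
                note = f"substitute {symbol}->{expected}"
                out.add(h._replace(pos=h.pos + 1, repairs=h.repairs + (note,)))
            # repair: delete the extra input symbol
            out.add(h._replace(repairs=h.repairs + (f"delete {symbol}",)))
    return out


def analyze(word):
    """Return completed analyses, those needing the fewest repairs first."""
    hyps = initial_hypotheses()
    for symbol in word:
        hyps = step(hyps, symbol)
    done = [(h.gloss, h.feats, h.repairs) for h in hyps if h.pos == len(h.target)]
    return sorted(done, key=lambda a: (len(a[2]), a))


for analysis in analyze("думаю"):
    print(analysis)
```

In this toy setting the ‘think’ reading survives without repairs, while the ‘parliament’ reading survives only by deleting the final ю.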


Information for feedback

As it changes state, the transducer will add information to the current set of analyses:

◮ Append input symbol to output

◮ Add morphological features, generally when a transition crosses a morphological boundary

◮ Add corrections on the input string, when phonological processes have been misapplied

Hypothesizing morpheme boundaries means we can:

◮ segment word into its likely component parts

◮ analyze each part independently of the others

  ◮ e.g., ignore an erroneous morpheme while identifying an adjoining correct morpheme
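The three kinds of information can be pictured as fields of one analysis record that grows as arcs are taken. The sketch below is hypothetical throughout: the `Analysis` record, the hand-built `PATH` for книг-и (‘book’, one gen.sg reading only), and the `MISAPPLIED` table are illustrative assumptions. The example word is not from the slides; it uses the spelling rule that и, not ы, is written after г as a stand-in for a misapplied process.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    output: str = ""                                  # input symbols copied so far
    features: list = field(default_factory=list)      # morphological features
    corrections: list = field(default_factory=list)   # (index, seen, expected)


# Hypothetical path for книг-и: (expected symbol, features added on this arc).
# Only the arc that crosses the morpheme boundary contributes features.
PATH = [
    ("к", []), ("н", []), ("и", []), ("г", []),
    ("и", ["noun", "gen.sg"]),
]

# Assumed table of (seen, expected) pairs that count as a misapplied rule.
MISAPPLIED = {("ы", "и")}


def transduce(word, path=PATH):
    """Walk the path; return the accumulated analysis, or None on failure."""
    if len(word) != len(path):
        return None
    analysis = Analysis()
    for i, (seen, (expected, feats)) in enumerate(zip(word, path)):
        if seen != expected:
            if (seen, expected) not in MISAPPLIED:
                return None                            # no repair available
            analysis.corrections.append((i, seen, expected))
        analysis.output += seen                        # append input symbol
        analysis.features.extend(feats)                # add features, if any
    return analysis


print(transduce("книгы"))
```

Because the correction is stored with its position, feedback can point at the ending alone while the stem analysis stays intact.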


Efficiency

Is fully specifying every word wasteful of memory?

◮ Since the lexicon is an FST, sections shared across forms will only be stored once

  ◮ stems which require such affixes simply point to them

◮ Added advantage: analyzer operating over FST lexicon retains explicit knowledge of state

  ◮ easy to entertain competing analyses (Ćavar 2008)
  ◮ easy to return to previous points in an analysis to resolve ambiguities (cf., e.g., Beesley and Karttunen 2003)

The error taxonomy prevents all possible paths from being simultaneously entertained
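The memory argument can be checked on a toy scale. In this sketch (the `Node` class, the `cont` pointer and the three-verb lexicon are all assumptions made for illustration), one ending chain is built once and every stem points to it; the result is compared with a lexicon that spells out every full form.

```python
class Node:
    def __init__(self):
        self.arcs = {}     # symbol -> Node
        self.feats = None  # set on states that end a word form
        self.cont = None   # pointer to a shared ending chain, if any


def add_path(root, symbols, feats=None):
    """Add a chain of arcs for `symbols` below `root`; return its last node."""
    node = root
    for s in symbols:
        node = node.arcs.setdefault(s, Node())
    if feats is not None:
        node.feats = feats
    return node


def count_nodes(root):
    """Count distinct states reachable from `root` (shared ones only once)."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.arcs.values())
        if node.cont is not None:
            stack.append(node.cont)
    return len(seen)


def accepts(root, word):
    """Return the features of `word`, or None if it is not in the lexicon."""
    node = root
    for s in word:
        if s not in node.arcs and node.cont is not None:
            node = node.cont   # follow the pointer into the ending chain
        if s not in node.arcs:
            return None
        node = node.arcs[s]
    return node.feats


ENDINGS = {"ю": "1sg", "ешь": "2sg", "ет": "3sg",
           "ем": "1pl", "ете": "2pl", "ют": "3pl"}
STEMS = ["дума", "начина", "чита"]

# Shared layout: the ending chain is stored once; each stem points to it.
verb_chain = Node()
for ending, feats in ENDINGS.items():
    add_path(verb_chain, ending, feats)
shared = Node()
for stem in STEMS:
    add_path(shared, stem).cont = verb_chain

# Fully spelled-out layout: every stem carries its own copy of the endings.
full = Node()
for stem in STEMS:
    for ending, feats in ENDINGS.items():
        add_path(full, stem + ending, feats)

print(count_nodes(shared), count_nodes(full))
```

Both layouts accept the same forms; the shared one needs fewer states, and the gap widens with every stem added to the same paradigm.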


Sketch of error detection

Analyzer will try to build a path based on information it has

◮ Inappropriate ending for a verb

(7) *начина-ев begin-??

◮ Analyzers working from both directions will find same morpheme boundary

◮ Analyses of начина- and of -ев are easily identified as incompatible


◮ Inappropriate ending for this verb

(7) *начина-ит begin-3s (cf. начина-ет)

◮ Analyzers working from both directions will find same morpheme boundary

◮ Analyses of начина- and of -ит do not match in features

◮ Morphological information from affix will enable the repair operation of substitution to find the right continuation.
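A minimal sketch of this diagnosis for *начина-ит, under stated assumptions: the two-stem lexicon, the class labels `conj1`/`conj2`, and the functions `forward_boundary`, `backward_boundary` and `diagnose` are hypothetical names, and real analyzers would walk the FST rather than test substrings.

```python
# Toy data (assumed): stems with a conjugation class, endings per class.
STEMS = {"начина": ("begin", "conj1"), "говор": ("speak", "conj2")}
ENDINGS = {
    "conj1": {"ю": "1sg", "ешь": "2sg", "ет": "3sg"},
    "conj2": {"ю": "1sg", "ишь": "2sg", "ит": "3sg"},
}


def forward_boundary(word):
    """Left to right: index just after the longest known stem, or None."""
    best = None
    for i in range(1, len(word) + 1):
        if word[:i] in STEMS:
            best = i
    return best


def backward_boundary(word):
    """Right to left: index where the longest known ending starts, or None."""
    known = {e for chain in ENDINGS.values() for e in chain}
    best = None
    for i in range(len(word) - 1, -1, -1):
        if word[i:] in known:
            best = i
    return best


def diagnose(word):
    fwd, bwd = forward_boundary(word), backward_boundary(word)
    if fwd is None or fwd != bwd:
        return {"status": "no shared boundary"}
    stem, ending = word[:fwd], word[fwd:]
    gloss, stem_class = STEMS[stem]
    if ending in ENDINGS[stem_class]:
        return {"status": "ok", "gloss": gloss,
                "feats": ENDINGS[stem_class][ending]}
    # Features carried by the affix, read off whichever class licenses it.
    feats = next(f for chain in ENDINGS.values()
                 for e, f in chain.items() if e == ending)
    # Substitution: the ending of the stem's own class with the same features
    # (every class in this toy data has one).
    repair = next(e for e, f in ENDINGS[stem_class].items() if f == feats)
    return {"status": "mismatch", "gloss": gloss, "feats": feats,
            "suggestion": stem + repair}


print(diagnose("начинаит"))
```

Both directions agree on the boundary after начина-, the affix contributes the 3sg features, and substitution then picks -ет from the stem's own class.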


Constructing the Lexicon

◮ Lexicon generation can be done semi-automatically

◮ We need:

  ◮ Freely-available corpus (Sharoff et al. 2008)
  ◮ A handful of inflected forms to derive common morphological paradigms
  ◮ Unsupervised morphology learner like Linguistica (Goldsmith and Hu 2004)
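One way to picture the semi-automatic step is a crude pass that proposes stems from corpus words given a few seed endings. This is only an illustrative stand-in and not Linguistica's algorithm; `SEED_ENDINGS`, `propose_stems` and the `min_attested` threshold are assumptions, and the proposals would still need manual review.

```python
from collections import defaultdict

# Seed endings, assumed to come from a handful of inflected forms.
SEED_ENDINGS = ("ю", "ешь", "ет")


def propose_stems(corpus_words, endings=SEED_ENDINGS, min_attested=2):
    """Group candidate stems by the seed endings they are attested with."""
    attested = defaultdict(set)
    for word in corpus_words:
        for ending in endings:
            if word.endswith(ending) and len(word) > len(ending):
                attested[word[:-len(ending)]].add(ending)
    # Keep a stem only if enough different endings support it.
    return {stem: sorted(found) for stem, found in attested.items()
            if len(found) >= min_attested}


words = ["думаю", "думает", "начинаю", "начинаешь", "начинает", "билет"]
print(propose_stems(words))
```

The noun билет happens to end in ет, but no second ending supports the stem бил, so the threshold filters it out.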


Summary & Outlook

Summary:

◮ An FST lexicon provides a way to do morphological error analysis on learner language in Russian that is:

  1. easily optimizable for learner environments
  2. accurate without sacrificing generality
  3. flexible enough to detect even unanticipated errors

◮ We believe this approach is applicable to a number of languages

Next Steps:

  1. Construction of lexicon for small subset of the language relevant to our exercises
  2. Performing/testing error detection and diagnosis on top of the linguistic analysis
  3. Addition of linguistic analysis beyond the word level, operating in parallel with the morphological analyzer


Acknowledgments

We would like to thank

◮ Detmar Meurers and Luiz Amaral for providing us with the TAGARELA source code & insights into ICALL systems

◮ Anna Feldman and Jirka Hana for advice on Russian resources

◮ Two anonymous reviewers for insightful comments

This research was supported by grant P116S070001 through the U.S. Department of Education’s Fund for the Improvement of Postsecondary Education.


References

Amaral, Luiz (2007). Designing Intelligent Language Tutoring Systems: integrating Natural Language Processing technology into foreign language teaching. Ph.D. thesis, The Ohio State University.

Amaral, Luiz and Detmar Meurers (2006). Where does ICALL Fit into Foreign Language Teaching? Talk given at CALICO Conference. University of Hawaii, http://purl.org/net/icall/handouts/calico06-amaral-meurers.pdf.

Amaral, Luiz and Detmar Meurers (2007). Putting activity models in the driver’s seat: Towards a demand-driven NLP architecture for ICALL. Talk given at EUROCALL. University of Ulster, Coleraine Campus, http://purl.org/net/icall/handouts/eurocall07-amaral-meurers.pdf.

Beesley, Kenneth R. and Lauri Karttunen (2003). Finite State Morphology. CSLI Publications.

Ćavar, Damir (2008). The Croatian Language Repository: Quantitative and Qualitative Resources for Linguistic Research and Language Technologies. Invited talk, Indiana University Department of Linguistics, January 2008.

Clemenceau, David (1997). Finite-State Morphology: Inflections and Derivations in a Single Framework Using Dictionaries and Rules. In Emmanuel Roche and Yves Schabes (eds.), Finite State Language Processing, The MIT Press.

Goldsmith, John and Yu Hu (2004). From Signatures to Finite State Automata. In Midwest Computational Linguistics Colloquium (MCLC-04). Bloomington, IN.

Heift, Trude and Devlan Nicholson (2001). Web delivery of adaptive and interactive language tutoring. International Journal of Artificial Intelligence in Education 12(4), 310–325.

Koskenniemi, Kimmo (1983). Two-level morphology: a general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.

Murray, Janet H. (1995). Lessons Learned from the Athena Language Learning Project: Using Natural Language Processing, Graphics, Speech Processing, and Interactive Video for Communication-Based Language Learning. In V. Melissa Holland, Michelle R. Sams and Jonathan D. Kaplan (eds.), Intelligent Language Tutors: Theory Shaping Technology, Lawrence Erlbaum Associates, chap. 13, pp. 243–256.

Nagata, Noriko (1995). An Effective Application of Natural Language Processing in Second Language Instruction. CALICO Journal 13(1), 47–67.

Roark, Brian and Richard Sproat (2007). Computational Approaches to Morphology and Syntax. Oxford University Press.

Sharoff, Serge, Mikhail Kopotev, Tomaž Erjavec, Anna Feldman and Dagmar Divjak (2008). Designing and evaluating Russian tagsets. In Proceedings of LREC 2008. Marrakech.

Vandeventer Faltin, Anne (2003). Syntactic error diagnosis in the context of computer assisted language learning. Thèse de doctorat, Université de Genève, Genève.
