SLIDE 1

Turkish morphology in WebLicht

Çağrı Çöltekin

University of Tübingen Seminar für Sprachwissenschaft

SFCM 2015

SLIDE 2

Turkish NLP in WebLicht environment Turkish morphology with a single example

Turkish NLP pipeline in WebLicht

▶ Tokenization
▶ Morphological analysis
▶ Morphological disambiguation
▶ Dependency parsing

This short talk covers only some of the challenges that morphological complexity creates for Turkish NLP.

Ç. Çöltekin, SfS / University of Tübingen SFCM 2015 1 / 8

SLIDE 4

The classical example

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

‘You were (evidentially) one of those who we may not be able to convert to an Istanbulite’

SLIDE 6

Productive derivational morphology

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

  • -lu makes adjectives/nouns from nouns

    ▶ İstanbul-lu ‘someone from Istanbul’
    ▶ Stuttgart-lı ‘someone from Stuttgart’

  • -laş makes verbs from adjectives/nouns, with the meaning ‘to become ...’

    ▶ İstanbul-lu-laş- ‘to become an Istanbulite’
    ▶ diktatör-leş- ‘to become a dictator’

Some challenges:

▶ A lexicon of all derived words is not feasible
▶ Ambiguity: the same suffix may have both lexicalized and productive usage
▶ Some suffixes repeat (göz-lük-lük ‘place for eye glasses’, göz-lük-çü-lük ‘profession of making or selling eye glasses’)
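
The alternations in the examples above (-lu/-lı, -laş/-leş, -lük) follow Turkish vowel harmony, which is one reason derived forms are generated by rule rather than listed. The following is a minimal sketch of that rule only, not the analyzer used in the talk. The names `realize` and `derive` and the template notation (`A` for the a/e vowel, `I` for the ı/i/u/ü vowel) are assumptions made for illustration, and consonant alternations and exceptional loanwords are ignored.

```python
# Each vowel is classified as (front?, rounded?).
VOWEL_CLASS = {
    "a": (False, False), "ı": (False, False),
    "o": (False, True),  "u": (False, True),
    "e": (True, False),  "i": (True, False),
    "ö": (True, True),   "ü": (True, True),
}
# Explicit upper-case map: Python's str.lower() does not follow Turkish casing.
UPPER = {"A": "a", "I": "ı", "O": "o", "U": "u",
         "E": "e", "İ": "i", "Ö": "ö", "Ü": "ü"}
# Four-way harmonic vowel, keyed by (front?, rounded?).
FOUR_WAY = {(False, False): "ı", (True, False): "i",
            (False, True): "u", (True, True): "ü"}


def last_vowel_class(stem):
    """Return (front?, rounded?) for the last vowel of the stem."""
    for ch in reversed(stem):
        ch = UPPER.get(ch, ch)
        if ch in VOWEL_CLASS:
            return VOWEL_CLASS[ch]
    raise ValueError("stem has no vowel: " + stem)


def realize(template, stem):
    """Fill the harmonic vowels of a suffix template for the given stem."""
    front, rounded = last_vowel_class(stem)
    two_way = "e" if front else "a"
    return template.replace("A", two_way).replace("I", FOUR_WAY[(front, rounded)])


def derive(stem, *templates):
    """Attach suffix templates one by one, harmonizing each with the form so far."""
    for template in templates:
        stem += realize(template, stem)
    return stem


print(derive("İstanbul", "lI", "lAş"))   # İstanbullulaş
print(derive("diktatör", "lAş"))         # diktatörleş
print(derive("göz", "lIk", "lIk"))       # gözlüklük
```

Because `derive` accepts any number of templates, the same suffix can be applied repeatedly, which mirrors the göz-lük-lük case and hints at why the set of derived words cannot be enumerated in a lexicon.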

SLIDE 8

Voice suffixes

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

  • -tır is the causative marker

    ▶ İstanbul-lu-laş-tır ‘to cause someone to become an Istanbulite’
    ▶ oku-t-tur-… ‘…to cause someone to cause someone to read’

▶ The passive suffix may also be repeated
▶ Theoretically unbounded number of suffixes
▶ Even if the number is limited, representing it as a typical feature is problematic
▶ Ambiguity: some repeated forms are for emphasis, not for double causation
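
Why a repeated suffix is awkward as "a typical feature" can be seen in a small sketch. It assumes a flat scheme with one value per feature name; the label `Voice=Cau` and the helper names are illustrative choices, not the tagset used in the talk.

```python
from collections import Counter


def to_feature_map(pairs):
    """Collapse (feature, value) pairs into one value per feature name."""
    return dict(pairs)


def causative_depth(pairs):
    """Count causative markers when the analysis is kept as a sequence."""
    return Counter(pairs)[("Voice", "Cau")]


single_causative = [("Voice", "Cau")]                    # oku-t-
double_causative = [("Voice", "Cau"), ("Voice", "Cau")]  # oku-t-tur-

# The flat map cannot tell the two analyses apart.
print(to_feature_map(single_causative) == to_feature_map(double_causative))  # True

# Keeping the sequence preserves the repetition.
print(causative_depth(single_causative), causative_depth(double_causative))  # 1 2
```

Keeping the sequence retains the number of markers, but it still does not say whether a doubled form means double causation or mere emphasis, so the ambiguity noted above remains.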

SLIDE 10

Other verbal inflections

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

  • -a/-(y)abil indicate ability/possibility, -ma is the negative marker

    ▶ İstanbul-…-a-ma- ‘not to be able to cause someone to become an Istanbulite’
    ▶ İstanbul-…-a-ma-yabil- ‘may not be able to cause someone to become an Istanbulite’

▶ Nothing new here: repetition and ambiguity again
▶ A finite verb may have about 10 inflectional suffixes marking voice, tense, aspect, modality and person/number

SLIDE 12

Subordination

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

  • -ecek makes a subordinate clause

    ▶ İstanbul-…-ecek ‘someone who may not possibly be converted to an Istanbulite’
    ▶ Now the word acts like a noun (referring to a person)

  • -ler is the plural marker
  • -imiz (normally) marks the possessor (first person plural)

    ▶ ev-imiz ‘our house’
    ▶ but here it marks the subject of the subordinate clause

  • -den marks the ablative case

    ▶ İstanbul-…-ecek-ler-imiz-den ‘of those we may not be able to convert to an Istanbulite’

▶ We have two POS tags with inflections: the verb of the subordinate clause and the resulting noun
▶ Features may conflict: the verb has Person=1 while the noun has Person=3

SLIDE 15

Copular suffixes

İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz

  • -(y)miş marks past tense and evidentiality; the copula part ‘(y)’ is dropped because of the phonological context
  • -siniz marks second person plural

▶ Now we have three POS tags, two of them are predicates
▶ The predicates have different feature values and different subjects

⟨İstanbul-lu-laş-tır-a-ma-yabil⟩⟨-ecek-ler-imiz-den⟩⟨miş-siniz⟩
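
One way to picture the three-way split is to give each bracketed unit its own POS tag and feature set instead of one set for the whole word. The sketch below is only an illustration under stated assumptions: the `Unit` class, the tag names (`VERB`, `NOUN`) and the feature labels are hypothetical. Person=1 for the subordinate verb and Person=3 for the noun come from the talk, and -siniz is read as second person plural, in line with the gloss ‘You were …’.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    form: str
    pos: str                                  # assumed tag names
    feats: dict = field(default_factory=dict)  # assumed feature labels


def split_units(bracketed):
    """Split '⟨a⟩⟨b⟩⟨c⟩' into its bracketed parts."""
    return [part for part in bracketed.replace("⟩", "").split("⟨") if part]


WORD = "⟨İstanbul-lu-laş-tır-a-ma-yabil⟩⟨-ecek-ler-imiz-den⟩⟨miş-siniz⟩"

# One (POS, features) pair per unit, in order.
ANNOTATIONS = [
    ("VERB", {"Person": "1"}),                                   # subordinate verb
    ("NOUN", {"Person": "3", "Number": "Plur", "Case": "Abl"}),  # resulting noun
    ("VERB", {"Person": "2", "Number": "Plur"}),                 # copular predicate
]

units = [Unit(form, pos, feats)
         for form, (pos, feats) in zip(split_units(WORD), ANNOTATIONS)]

# Per-unit features keep all three Person values apart.
person_values = [u.feats["Person"] for u in units]
print(person_values)   # ['1', '3', '2']

# A single word-level feature map keeps only the last value written.
flat = {}
for u in units:
    flat.update(u.feats)
print(flat["Person"])  # 2
```

The units can only be identified after morphological analysis and disambiguation, which is why tokenization for syntax cannot be a purely string-level step here.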

SLIDE 16

Summary

▶ Theoretically unbounded, repeated suffixes
▶ Large number of tags means sparsity for machine learning methods
▶ Multiple POS tags, multiple syntactic units in a single word
▶ Multiple/conflicting feature values
▶ Parts of a word may participate in different syntactic relations
▶ Tokenization (for syntax) depends on morphological analysis/disambiguation
▶ Ambiguity
▶ Free word order

SLIDE 17

Morphological complexity in the real world

[Bar chart: frequency by number of surface morphemes]

Number of surface morphemes   Frequency
1                             3257
2                             1190
3                             1041
4                              421
5                              104
6                               27
7                                8

*Counts over a corpus of approx. 6K hand-annotated tokens, excl. punctuation.
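
A few summary figures can be derived from the chart. The sketch below assumes that the seven frequencies listed with the chart are the counts for tokens with 1 to 7 surface morphemes, in that order; this reading is an assumption, although the counts then sum to 6048, which agrees with the stated corpus size of approx. 6K tokens.

```python
# Assumed pairing of morpheme count -> token frequency, read off the chart.
counts = {1: 3257, 2: 1190, 3: 1041, 4: 421, 5: 104, 6: 27, 7: 8}

total_tokens = sum(counts.values())
total_morphemes = sum(n * c for n, c in counts.items())

mean_morphemes = total_morphemes / total_tokens
multi_morpheme_share = 1 - counts[1] / total_tokens

print(total_tokens)                        # 6048
print(round(mean_morphemes, 2))            # 1.85
print(round(multi_morpheme_share * 100))   # 46
```

Under that reading, a token carries about 1.85 surface morphemes on average, and roughly 46% of tokens have more than one.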

SLIDE 18

An example dependency analysis

Kaygımız   terörün   durdurulama   –ması   –ydı
NOUN       NOUN      VERB          NOUN    VERB

Dependency labels in the figure: nsubj, nsubj, cop, acl (the arcs are not preserved in this transcript)

‘Our worry was (the fact) that terror could not be stopped’
