Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages

Lukáš Kyjánek, Zdeněk Žabokrtský, Magda Ševčíková, Jonáš Vidra

Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

20th September 2019, DeriMo 2019

Kyjánek et al. Universal Derivations DeriMo 2019 1 / 31

Let me choose any language, for example English. . .

No!


Universal Derivations 0.5

[Figure: sample derivational families from the harmonized resources, each built around a word meaning roughly 'judge, evaluate, value': Czech soudit, Portuguese avaliar, Latin aestimo, Finnish arvo, Spanish evaluar, French évaluer, Estonian hindama, German bewerten, English evaluate and Persian ارز. Nodes are lexemes, mostly labelled with their part of speech. Resources shown: DErivBase, Word Formation Latin, DeriNet.ES, Démonette, The Polish Word-Formation Network, FinnWordNet, DeriNet.FA, English WordNet, NomLex-PT, EstWordNet, DeriNet.]


Outline

▸ Motivation; the success story of Universal Dependencies
▸ Diversity of existing derivational resources
▸ Design decision on which our harmonization is based...
  ...with special attention paid to trees
▸ Universal Derivations collection: basic properties


Motivation

▸ Growing interest in derivational morphology in recent years.
▸ 50+ existing derivational data resources for 20+ languages.
▸ Difficult to work with in a single experiment, because of:
  ▸ different methodologies and different formal models,
  ▸ different file formats and incompatible software tools (tools for annotation, querying, visualization, etc.),
  ▸ publication under various licenses (or no publication at all), etc.

Multilingual language resources in other domains

Pushing towards shared annotation schemes has proved very fertile elsewhere, because:

▸ new schemes become less language-dependent...
  ...and more independent of local linguistic traditions,
▸ sharing software tools (for annotation, visualization, querying, ...) becomes possible,
▸ the barrier for under-resourced languages is lower,
▸ typological studies become simpler,
▸ competition in shared tasks becomes a huge source of energy.

Perhaps the most convincing example: Universal Dependencies!


A brief history of multilingual treebank collections

Some steps in the evolution:

▸ 2006: 13 languages in the CoNLL-X shared task dataset
▸ 2011: 29 languages in HamleDT
▸ 2019: 85 languages in Universal Dependencies


The case of Universal Dependencies

UD is an obvious success in terms of the number of languages covered. It results from the collaboration of a (still growing) community! What can we learn from this harmonization story?

Lesson No. 1: gain project momentum from snowballing

A positive feedback effect (snowballing, rich-get-richer principle):

▸ the more languages are covered, the more attractive the collection becomes, and the more new languages are added...

Why were CoNLL 2006, 2007, or 2009 or HamleDT not sufficient to start the snowballing? Hard to say.

▸ Maybe a super-critical initial investment of energy is needed.
▸ Maybe an attractive brand matters most.
▸ Maybe the licensing policy.
▸ Maybe they were just lucky.

Evolution is unpredictable. Still, snowballing can help a lot.


Lesson No. 2: simplicity is the key

[With a little bit of exaggeration]

Better simple than perfectly linguistically adequate.

▸ Trees are clearly insufficient for syntax? Who cares, trees are simple; let's start with trees, and the other things can be solved later.

Better simple than expressive.

▸ Multilayer schemes are powerful, but complex. Let's start with a single structure per sentence; the rest will be solved later.

Better simple than flexible.

▸ XML is versatile, but non-trivial to process. Let's stick to a simple plain-text file format with a fixed number of columns.
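To give a feel for how little code such a format needs, here is a minimal Python sketch of a reader for CoNLL-U, the plain-text format used in UD (the slide does not name it): ten tab-separated columns per token, comment lines starting with `#`, and blank lines between sentences. It keeps only the basic tree and skips multiword-token ranges and empty nodes; `Token` and `read_conllu` are our own illustrative names, not part of any UD tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# CoNLL-U always has exactly these ten tab-separated columns, in this order.
COLUMNS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")


@dataclass
class Token:
    index: int    # position in the sentence, starting at 1
    form: str
    upos: str
    head: int     # index of the governing token; 0 marks the root
    deprel: str


def read_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences, keeping only basic-tree tokens."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        # Multiword-token ranges ("1-2") and empty nodes ("8.1") are not tree nodes.
        if "-" in row["ID"] or "." in row["ID"]:
            continue
        current.append(Token(int(row["ID"]), row["FORM"], row["UPOS"],
                             int(row["HEAD"]), row["DEPREL"]))
    if current:                           # the text may lack a final blank line
        sentences.append(current)
    return sentences
```

Splitting on tabs and reading fixed positions is all the parsing there is, which is the kind of simplicity the slide argues for.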


Diversity across word-formation resources

OK, lessons taken, so let's return to word formation. How diverse are the existing resources actually? Let's have a look at how a derivational family is represented formally.


Representation of derivational families in existing resources

We observed basically four distinct approaches to representing a derivational family:

1. just as an unstructured set,
2. as a rooted tree,
3. as a less constrained graph, e.g. a weakly connected graph,
4. or just implicitly, by overlaps in constituency trees representing the internal structure of a word.

5. LEARNED YESTERDAY: morpheme-centric graphs (LiLa)
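A minimal Python sketch contrasting the first three representations on lexemes of the French prospecter family. The derivational edges are assumed for illustration only and are not taken from Démonette; the fourth, implicit representation via constituency trees is not modelled. All function names are hypothetical.

```python
from collections import Counter

# 1) Unstructured set: membership only, no internal structure.
as_set = {"prospecter", "prospecteur", "prospectrice", "prospectif", "prospection"}

# 2) Rooted tree: every lexeme except the root maps to exactly one parent.
as_tree = {
    "prospecteur": "prospecter",
    "prospectrice": "prospecteur",
    "prospection": "prospecter",
    "prospectif": "prospecter",
}

# 3) Less constrained graph: (parent, child) edges; a lexeme may have
#    several parents, here "prospectif" gets a second, alternative base.
as_graph = {(parent, child) for child, parent in as_tree.items()}
as_graph.add(("prospection", "prospectif"))


def members_of_tree(tree: dict) -> set:
    """All lexemes of a family stored as child -> parent."""
    return set(tree) | set(tree.values())


def members_of_graph(edges: set) -> set:
    """All lexemes touched by some edge."""
    return {lexeme for edge in edges for lexeme in edge}


def multi_parent_lexemes(edges: set) -> set:
    """Lexemes with more than one parent, which a tree cannot express."""
    in_degree = Counter(child for _, child in edges)
    return {child for child, n in in_degree.items() if n > 1}
```

All three agree on who belongs to the family; they differ in how much structure they record, and only the graph can give a lexeme two bases.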


How do the existing resources represent a derivational family?

[Figure: panels A to D illustrating these approaches: the French family of prospecter.V (prospecteur.N, prospectrice.N, prospectif.A, prospection.N); the family of adaptirati.V (adaptacija.N, adaptacijski.A, adaptiranje.N, adaptator.N, adaptiran.A); the Czech family of kočka.N 'cat' (kotě.N, koťátko.N, koččin.A, kočkovat.V, kočkování.N, pokočkovat.V); and the internal structure of the Dutch lexeme aandrijfas.N, segmented as aan + drijf + as.]


Universal Derivations (UDer)

▸ a newly created collection of word-formation resources, trying to go as multilingual as possible
▸ an admittedly imitative title: a shameless attempt at replicating the UD success story
▸ the current version (UDer 0.5) is publicly available in the LINDAT/Clarin repository


UDer’s design decision

a lexeme-centric graph-based approach inherited from DeriNet 2.0:

▸ a node represents a lexeme
▸ an oriented edge represents a derivational relation
▸ a (rooted) tree represents a derivational family
▸ the whole vocabulary of a language is represented by a forest
▸ additional links can be stored as extra non-tree edges
▸ space for other annotation components (morpheme segmentation, semantic labels, etc.)
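The design above can be sketched as a small in-memory data structure. In this Python sketch, `Lexeme` and `Forest` are hypothetical names, not the API of DeriNet or any UDer tooling: each lexeme has at most one tree parent, and further relations (say, the second base of a compound) go into a separate list of non-tree edges. The Czech chain soudit → posoudit → posudek and the choice of the compound's head as its tree parent are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare lexemes by identity: they are graph nodes
class Lexeme:
    lemma: str
    pos: str
    # The single tree edge plus its inverse; hidden from repr to avoid recursion.
    parent: Lexeme | None = field(default=None, repr=False)
    children: list[Lexeme] = field(default_factory=list, repr=False)
    # Second-class, non-tree relations (e.g. the other base of a compound).
    extra_parents: list[Lexeme] = field(default_factory=list, repr=False)
    # Room for further annotation such as morpheme segmentation.
    segmentation: list[str] | None = None


class Forest:
    """The whole vocabulary: one rooted tree per derivational family."""

    def __init__(self) -> None:
        self.lexemes: list[Lexeme] = []

    def add(self, lemma: str, pos: str) -> Lexeme:
        lexeme = Lexeme(lemma, pos)
        self.lexemes.append(lexeme)
        return lexeme

    def derive(self, parent: Lexeme, child: Lexeme) -> None:
        """Add a tree edge, refusing anything that would break treeness."""
        if child.parent is not None:
            raise ValueError(f"{child.lemma} already has a tree parent")
        node = parent
        while node is not None:              # walk up: is child an ancestor?
            if node is child:
                raise ValueError("the edge would create a cycle")
            node = node.parent
        child.parent = parent
        parent.children.append(child)

    def link(self, parent: Lexeme, child: Lexeme) -> None:
        """Store an additional relation as an extra, non-tree edge."""
        child.extra_parents.append(parent)

    def root_of(self, lexeme: Lexeme) -> Lexeme:
        while lexeme.parent is not None:
            lexeme = lexeme.parent
        return lexeme

    def family(self, lexeme: Lexeme) -> list[str]:
        """Lemmas of the tree containing `lexeme`, in depth-first pre-order."""
        stack, lemmas = [self.root_of(lexeme)], []
        while stack:
            node = stack.pop()
            lemmas.append(node.lemma)
            stack.extend(reversed(node.children))
        return lemmas

    def families(self) -> list[Lexeme]:
        """Roots of all trees, i.e. one representative per family."""
        return [lexeme for lexeme in self.lexemes if lexeme.parent is None]
```

Because `derive` is the only way to add a tree edge, treeness is enforced at insertion time, while `link` can record anything else without disturbing it.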

Why trees?

Just three conditions are implied:

▸ acyclic
▸ single-rooted
▸ connected

Is there any risk that some of them are violated in our data?!
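The three conditions can be checked mechanically. This Python sketch (a hypothetical helper `tree_violations`, not part of UDer) uses union-find: an edge joining two lexemes that are already connected closes a cycle when edge directions are ignored, more than one parentless lexeme breaks single-rootedness, and more than one component breaks connectedness. It assumes every edge endpoint is listed among the nodes.

```python
def tree_violations(nodes, edges):
    """Return the names of the tree conditions violated by a family.

    nodes: iterable of lexemes; edges: iterable of (parent, child) pairs.
    """
    nodes = list(nodes)
    link = {n: n for n in nodes}          # union-find forest over lexemes

    def find(x):
        while link[x] != x:
            link[x] = link[link[x]]       # path halving keeps lookups short
            x = link[x]
        return x

    violations = set()
    has_parent = set()
    for parent, child in edges:
        has_parent.add(child)
        a, b = find(parent), find(child)
        if a == b:
            violations.add("acyclic")     # edge closes a cycle (directions ignored)
        else:
            link[a] = b
    if sum(1 for n in nodes if n not in has_parent) != 1:
        violations.add("single-rooted")
    if len({find(n) for n in nodes}) != 1:
        violations.add("connected")
    return violations
```

A family passing all three checks has exactly one root and every other lexeme has exactly one parent, i.e. it is a rooted tree.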

Condition 1: always acyclic?

Sometimes violated. Example: a systematic pattern in which adding a prefix, adding a suffix, or adding both produces valid lexemes:

[Figure: a diamond-shaped family: X → pref+X, X → X+suff, pref+X → pref+X+suff and X+suff → pref+X+suff, so pref+X+suff has two bases.]

Luckily, there's a simple workaround: let's store only a tree-shaped skeleton (preferably chosen according to some rules) and consider it a shortcut representation of a richer structure.

[Figure: the same family reduced to a tree-shaped skeleton, with pref+X+suff attached to only one of its two bases.]

They do it in UD too! (An argument by logical fallacy; hopefully nobody notices.) E.g., coordination structures are cyclic, but they are represented as trees in UD.
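One way to pick the skeleton, sketched in Python under an assumed rule: for every lexeme with several parents, keep the highest-scoring parent as its tree parent and demote the others to secondary edges, so no relation is thrown away. The scoring function here (prefer the parent sharing the longest initial string with the child, i.e. attach by suffixation) is purely illustrative and not the rule used in UDer; `extract_skeleton` and `shared_start` are hypothetical names. The sketch resolves multiple parents in an acyclic input; it does not break directed cycles.

```python
import os.path


def extract_skeleton(edges, score):
    """Split (parent, child) edges into tree edges and secondary edges.

    For each child, the parent with the highest score(parent, child) becomes its
    tree parent (ties go to the parent listed first); the remaining parents are
    kept as secondary, non-tree edges.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    tree, secondary = [], []
    for child, candidates in parents.items():
        best = max(candidates, key=lambda p: score(p, child))
        tree.append((best, child))
        secondary.extend((p, child) for p in candidates if p != best)
    return tree, secondary


def shared_start(parent, child):
    """Illustrative rule: prefer the parent the child extends by suffixation."""
    return len(os.path.commonprefix([parent, child]))
```

On the diamond, this keeps prefX → prefXsuff as the tree edge and stores Xsuff → prefXsuff as a secondary edge.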

Condition 2: always single-rooted?

Sometimes violated too. Example: composition; a compound has two bases, so a structure containing both of them has two roots.

Workaround: let's allow inserting "second-class" edges.

They do it in UD too: secondary predication ("She declared the cake beautiful").

Condition 3: always connected?

Sometimes violated too. Example: nabízet (to offer) and pobízet (to urge) feel like siblings, but there is no lexeme bízet.

na[bíz]et vy[bíz]et po[bíz]et

Workaround: introduce fictitious lexemes.

*bízet → na[bíz]et, vy[bíz]et, po[bíz]et

They do it in UD too: in "Sue likes coffee and Bill tea.", an additional (empty) node is inserted for the elided verb.
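A Python sketch of this workaround under simplifying assumptions: parentless lexemes that start with one of a few prefixes are grouped by the remainder; if the remainder is an existing lexeme it becomes the parent, otherwise a group of two or more siblings gets a shared fictitious parent marked with `*`, as in *bízet. The three-prefix inventory and the helper name `attach_orphans` are hypothetical; a real resource would rely on proper morphological analysis rather than string matching.

```python
PREFIXES = ("na", "vy", "po")   # hypothetical, deliberately tiny inventory


def attach_orphans(orphans, known, prefixes=PREFIXES):
    """Propose (parent, child) edges for parentless prefixed lexemes.

    Orphans are grouped by what remains after stripping a prefix. If the
    remainder is an existing lexeme it becomes the parent; otherwise a group of
    two or more siblings gets a fictitious parent marked with '*'.
    """
    groups = {}
    for lemma in orphans:
        for prefix in prefixes:
            if lemma.startswith(prefix) and len(lemma) > len(prefix):
                groups.setdefault(lemma[len(prefix):], []).append(lemma)
                break
    edges = []
    for base, members in groups.items():
        if base in known:
            edges.extend((base, m) for m in members)
        elif len(members) > 1:
            edges.extend(("*" + base, m) for m in members)
    return edges
```

A lone orphan with an unattested remainder is left alone, since a fictitious parent would connect nothing.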


Once again, why trees?

A tree is an irresistibly attractive data structure. Compared to unrestricted graphs, "treeness" simplifies all kinds of algorithmic processing. It also simplifies evaluation, such as measuring inter-annotator agreement or the success of cross-lingual projection.
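For instance, because every lexeme in a tree has at most one parent, inter-annotator agreement can be computed lexeme by lexeme, in the spirit of the unlabelled attachment score used for dependency trees. This Python sketch (our illustration, with the hypothetical name `parent_agreement`, not necessarily the measure used by the authors) compares two annotations given as child-to-parent mappings.

```python
def parent_agreement(tree_a, tree_b):
    """Share of lexemes that two annotations attach to the same parent.

    Each annotation maps every lexeme to its parent (None for a root); because
    trees give each lexeme at most one parent, lexemes can be compared one by one.
    """
    if tree_a.keys() != tree_b.keys():
        raise ValueError("the annotations cover different lexemes")
    if not tree_a:
        raise ValueError("nothing to compare")
    same = sum(tree_a[lexeme] == tree_b[lexeme] for lexeme in tree_a)
    return same / len(tree_a)
```

With an unrestricted graph, each lexeme could carry any number of parents and one would first have to decide how to match edge sets.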

Come on, why trees? Seriously!

Perhaps the most influential reason: the law of the hammer

Law of the hammer

A cognitive bias: if one's basic tool is a hammer, one tends to look for nails. In our case: after a decade or two of treebanking, one sees trees everywhere. Conclusion: rooted trees fit derivation perfectly, Q.E.D.


What if the input resource is not tree-based?

We can't have our cake and eat it too:

▸ harmonization means reducing the diversity,
▸ e.g., if a weakly connected graph is used to represent a family, we extract its tree-shaped skeleton.

Compromise: the other information is not lost, but stored in a less prominent place.


Resources integrated in Universal Derivations 0.5

▸ Démonette 1.2 (French)
▸ DeriNet 2.0 (Czech)
▸ DeriNet.ES (Spanish)
▸ DeriNet.FA (Persian)
▸ DErivBase 2.0 (German)
▸ English WordNet 3.0 (English)
▸ EstWordNet 2.1 (Estonian)
▸ FinnWordNet 2.0 (Finnish)
▸ Nomlex-PT 2017 (Portuguese)
▸ Polish WFN (Polish)
▸ Word Formation Latin (Latin)


UDer 0.5 – basic statistical properties

After harmonization:

| Resource | Language | Lexemes | Relations | Families | License |
|---|---|---|---|---|---|
| Démonette 1.2 | French | 21,290 | 13,808 | 7,482 | CC BY-NC-SA |
| DeriNet 2.0 | Czech | 1,027,665 | 808,682 | 218,383 | CC BY-NC-SA |
| DeriNet.ES | Spanish | 151,173 | 36,935 | 114,238 | CC BY-NC-SA |
| DeriNet.FA | Persian | 43,357 | 35,745 | 7,612 | CC BY-NC-SA |
| DErivBase 2.0 | German | 280,775 | 44,830 | 235,945 | CC BY-SA 3.0 |
| English WordNet 3.0 | English | 13,813 | 7,855 | 5,958 | CC BY-NC-SA |
| EstWordNet 2.1 | Estonian | 988 | 507 | 481 | CC BY-SA 3.0 |
| FinnWordNet 2.0 | Finnish | 20,035 | 13,687 | 6,348 | CC BY 3.0 |
| Nomlex-PT 2017 | Portuguese | 7,020 | 4,201 | 2,819 | CC BY 4.0 |
| Polish WFN 0.5 | Polish | 262,887 | 189,217 | 73,670 | CC BY-NC-SA |
| Word Formation Latin | Latin | 29,708 | 22,641 | 5,320 | CC BY-NC-SA |
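Because every family is a rooted tree, the three counts constrain each other. Writing N for the number of lexemes, T for the number of families, k_i for the number of lexemes in family i and |E| for the number of tree edges, each family contributes k_i − 1 edges, so, assuming every counted relation is a tree edge:

```latex
|E| \;=\; \sum_{i=1}^{T} (k_i - 1) \;=\; N - T
\qquad\Longrightarrow\qquad T \;=\; N - |E|,
\qquad \text{e.g. D\'emonette 1.2: } 21\,290 - 13\,808 = 7\,482
```

Rows whose relation counts also include non-tree edges need not satisfy this identity.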


UDer 0.5 – basic statistical properties, cont.

| Resource | Singleton nodes | Nodes per tree (avg / max) | Tree depth (avg / max) | Out-degree (avg / max) | Noun [%] | Adj [%] | Verb [%] | Adv [%] | Other [%] |
|---|---|---|---|---|---|---|---|---|---|
| Démonette 1.2 | 69 | 2.8 / 12 | 1.1 / 4 | 1.8 / 8 | 63.0 | 2.5 | 34.5 | – | – |
| DeriNet 2.0 | 96,208 | 4.7 / 1638 | 0.8 / 10 | 1.1 / 40 | 44.0 | 34.8 | 5.5 | 15.7 | – |
| DeriNet.ES | 98,325 | 1.3 / 35 | 0.2 / 5 | 0.3 / 14 | – | – | – | – | – |
| DeriNet.FA | | 5.7 / 180 | 1.5 / 6 | 3.3 / 114 | – | – | – | – | – |
| DErivBase 2.0 | 215,823 | 1.2 / 51 | 0.1 / 7 | 0.1 / 13 | 85.5 | 9.9 | 4.6 | – | – |
| English WordNet 3.0 | 65 | 2.3 / 6 | 1.0 / 1 | 1.3 / 6 | 56.9 | – | 43.1 | – | – |
| EstWordNet 2.1 | 21 | 2.1 / 3 | 1.0 / 2 | 1.0 / 3 | 15.9 | 29.0 | 7.9 | 47.2 | – |
| FinnWordNet 2.0 | 3 | 3.2 / 36 | 1.5 / 9 | 1.5 / 13 | 55.3 | 29.2 | 15.5 | – | – |
| Nomlex-PT 2017 | 17 | 2.5 / 7 | 1.0 / 1 | 1.5 / 7 | 59.8 | – | 40.2 | – | – |
| Polish WFN 0.5 | 41,332 | 3.6 / 214 | 1.0 / 8 | 1.1 / 38 | – | – | – | – | – |
| Word Formation Latin | 63 | 5.6 / 130 | 1.5 / 6 | 3.0 / 42 | 46.0 | 27.4 | 23.8 | – | 2.8 |
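Tree-based statistics like these can be computed directly from child-to-parent links. The slide does not define the aggregation, so this Python sketch assumes that size, depth and largest out-degree are computed per tree and then averaged and maximized over all trees, singletons included; `tree_statistics` is a hypothetical helper, not UDer tooling.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def tree_statistics(parent_of: dict[str, str | None]) -> dict[str, tuple[float, int]]:
    """(average, maximum) over all trees of size, depth and largest out-degree.

    `parent_of` maps every lexeme to its tree parent, or to None for a root;
    every lexeme, roots included, must appear as a key.
    """
    children: dict[str, list[str]] = defaultdict(list)
    roots = []
    for lexeme, parent in parent_of.items():
        if parent is None:
            roots.append(lexeme)
        else:
            children[parent].append(lexeme)

    sizes, depths, degrees = [], [], []
    for root in roots:
        size = depth = degree = 0
        stack = [(root, 0)]                 # iterative DFS: (node, distance from root)
        while stack:
            node, dist = stack.pop()
            size += 1
            depth = max(depth, dist)
            degree = max(degree, len(children[node]))
            stack.extend((child, dist + 1) for child in children[node])
        sizes.append(size)
        depths.append(depth)
        degrees.append(degree)

    return {name: (mean(values), max(values))
            for name, values in (("nodes", sizes), ("depth", depths),
                                 ("out-degree", degrees))}
```

Under this assumption a singleton counts as a tree of size 1, depth 0 and out-degree 0, which pulls the averages down for resources with many singletons.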


Universal Derivations sample, once again

[Figure: the sample of harmonized derivational families from the "Universal Derivations 0.5" slide, shown again.]
  • u
d c
  • v
a t e l n ý A u s
  • u
d c
  • v
á n í N usoudcovací A u s
  • u
d c
  • v
a v š í A vysuzovávaně D v y s u z
  • v
á v a n
  • s
t N v y s u z
  • v
á v a t e l n ě D vysuzovávatelnost N u s u z
  • v
á v a n ě D usuzovávanost N u s u z
  • v
á v a t e l n ě D usuzovávatelnost N u s u z
  • v
a t e l č i n A rozsuzovávaně D r
  • z
s u z
  • v
á v a n
  • s
t N rozsuzovávatelně D r
  • z
s u z
  • v
á v a t e l n
  • s
t N rozsuzovatelčin A p ř i s u z
  • v
á v a n ě D přisuzovávanost N přisuzovávatelně D přisuzovávatelnost N p
  • s
u z
  • v
á v a n ě D p
  • s
u z
  • v
á v a n
  • s
t N posuzovávatelně D posuzovávatelnost N posuzovatelčin A posuzovatelsky D posuzovatelskost N
  • d
s u z
  • v
á v a n ě D
  • dsuzovávanost N
  • dsuzovávatelně D
  • d
s u z
  • v
á v a t e l n
  • s
t N
  • dsuzovatelčin A
  • dsouzenkynin A
  • dsouzenecky D
  • d
s
  • u
z e n e c k
  • s
t N soudničkářčin A s
  • u
d n i č k á ř s k y D s
  • u
d n i č k á ř s k
  • s
t N sudovávaně D s u d
  • v
á v a n
  • s
t N s u d
  • v
á v a t e l n ě D sudovávatelnost N n a s u d
  • v
á v a n ý A nasudovávatelný A n a s u d
  • v
á v á n í N n a s u d
  • v
á v a c í A n a s u d
  • v
á v a j í c í A nasudovaně D nasudovanost N nasudovatelně D n a s u d
  • v
a t e l n
  • s
t N s
  • u
d c
  • v
á v a n ě D soudcovávanost N soudcovávatelně D s
  • u
d c
  • v
á v a t e l n
  • s
t N
  • d
s
  • u
d c
  • v
á v a n ý A
  • dsoudcovávatelný A
  • dsoudcovávání N
  • d
s
  • u
d c
  • v
á v a c í A
  • d
s
  • u
d c
  • v
á v a j í c í A
  • dsoudcovaně D
  • dsoudcovanost N
  • d
s
  • u
d c
  • v
a t e l n ě D
  • d
s
  • u
d c
  • v
a t e l n
  • s
t N usoudcovávaný A usoudcovávatelný A u s
  • u
d c
  • v
á v á n í N usoudcovávací A usoudcovávající A usoudcovaně D u s
  • u
d c
  • v
a n
  • s
t N u s
  • u
d c
  • v
a t e l n ě D u s
  • u
d c
  • v
a t e l n
  • s
t N n a s u d
  • v
á v a n ě D n a s u d
  • v
á v a n
  • s
t N n a s u d
  • v
á v a t e l n ě D nasudovávatelnost N
  • dsoudcovávaně D
  • d
s
  • u
d c
  • v
á v a n
  • s
t N
  • dsoudcovávatelně D
  • dsoudcovávatelnost N
u s
  • u
d c
  • v
á v a n ě D u s
  • u
d c
  • v
á v a n
  • s
t N usoudcovávatelně D usoudcovávatelnost N

avaliar VERB avaliador NOUN avaliação NOUN

aestimo VERB aestimabilis ADJ aestimatio NOUN aestimator NOUN aestimatorius ADJ coaestimo VERB peraestimo VERB inaestimabilis ADJ arvokas ADJ a r v

  • e

s i n e N O U N arvokkuus NOUN a r v

  • s

t a a V E R B arvo NOUN arvonmääritys NOUN arvioida VERB arvostettu ADJ arvoinen ADJ a r v i

  • i

t s i j a N O U N arvon määrittäminen NOUN arviointi NOUN arviomies NOUN a r v i

  • i

v a A D J arvostelija NOUN arviolaskelma NOUN a r v i

  • i

j a N O U N a r v i

  • N

O U N a r v

  • s

t u s N O U N arvohenkilö NOUN a r v

  • t

t a a V E R B a r v i

  • i

d a v e r

  • n

s u u r u u s V E R B a r v

  • s

t e l l a V E R B a r v

  • s

t e l u N O U N

evaluar evaluable evaluación evaluador evaluativo evaluatorio e v a l u a b i l i d a d inevaluable autoevaluación reevaluación évaluer VERB évaluateur NOUN évaluation NOUN évaluatif ADJ

hindama VERB hindamisi ADV

bewerten VERB B e w e r t u n g N O U N b e w e r t e n d A D J b e w e r t e t A D J umbewerten VERB unterbewerten VERB ü b e r b e w e r t e n V E R B F e h l b e w e r t u n g N O U N Bewertete NOUN überbewertet ADJ Unterbewertung NOUN u n t e r b e w e r t e t A D J Ü b e r b e w e r t u n g N O U N

evaluate VERB evaluation NOUN evaluator NOUN

DErivBase Word Formation Latin DeriNet.ES Démonette The Polish Word-Formation Network FinnWordNet DeriNet.FA English WordNet NomLex-PT EstWordNet DeriNet

ارز ا ر ز آ و ر ی ارزﻧﺪه ارزﻫﺎ ارزی ا ر ز ﯾ ﺎ ب ارزﻧﺪﻫﺎش ارزﻧﺪﻫﺎی ا ر ز ﻧ ﺪ ﻫ ﺘ ﺮ ﯾ ﻦ ارزﻫﺎی ا ر ز ﯾ ﺎ ﺑ ﯽ


slide-51
SLIDE 51

Future perspectives

We are not dogmatic about UDer’s design decisions, not at all. Our main ambition is to provide the initial momentum and start a snowball effect. Our lexeme-centric representation may end up serving only as “Wittgenstein’s ladder” and be replaced

▸ by a paradigm-centric approach,
▸ by a morpheme-centric approach,
▸ or by something completely new . . . who knows?
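The lexeme-centric representation mentioned above can be pictured as one rooted tree per derivational family: each node is a lexeme (a lemma plus its part of speech) and each edge links a derived lexeme to its base. The Python sketch below is a minimal, hypothetical model of that idea. The class Lexeme, its methods and the helper family_size are our own illustrative names, not the UDer file format or the API of any UDer tool, and the links between the Czech lexemes (borrowed from the figure's vocabulary) are simplified for illustration rather than copied from DeriNet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical lexeme-centric model: one node per lexeme, one edge to its base.
@dataclass(eq=False)
class Lexeme:
    lemma: str
    pos: str  # part-of-speech tag, e.g. V, N, A, D
    # Base lexeme this one is derived from; None for the root of the family.
    parent: Optional[Lexeme] = field(default=None, repr=False)
    # Lexemes derived directly from this one.
    children: list[Lexeme] = field(default_factory=list, repr=False)

    def derive(self, lemma: str, pos: str) -> Lexeme:
        """Create a lexeme derived from this one and attach it as a child."""
        child = Lexeme(lemma, pos, parent=self)
        self.children.append(child)
        return child

    def root(self) -> Lexeme:
        """Follow base links upward until reaching the family root."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def path_to_root(self) -> list[str]:
        """List 'lemma POS' labels from this lexeme up to the root."""
        path: list[str] = []
        node: Optional[Lexeme] = self
        while node is not None:
            path.append(f"{node.lemma} {node.pos}")
            node = node.parent
        return path


def family_size(root: Lexeme) -> int:
    """Count all lexemes in the derivational family rooted at `root`."""
    return 1 + sum(family_size(child) for child in root.children)


# Illustrative (simplified) fragment of the Czech "soudit" family.
soudit = Lexeme("soudit", "V")
posoudit = soudit.derive("posoudit", "V")
rozsoudit = soudit.derive("rozsoudit", "V")
rozsuzovat = rozsoudit.derive("rozsuzovat", "V")
rozsuzovani = rozsuzovat.derive("rozsuzování", "N")

print(" <- ".join(rozsuzovani.path_to_root()))
# rozsuzování N <- rozsuzovat V <- rozsoudit V <- soudit V
print(family_size(soudit))  # 5
```

Restricting each lexeme to a single base keeps every family a tree, which makes walking to the root trivial; compounds or lexemes with several bases would need a richer structure than this sketch provides.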

slide-52
SLIDE 52

Take home message

There is a collection of derivational databases for 11 languages, converted into the same format:

▸ publicly available in the LINDAT/CLARIN repository under Creative Commons licenses,
▸ searchable using an online query interface.

We will be happy if you start using it . . . and we will be even happier if you allow us to include your data.


slide-53
SLIDE 53

Acknowledgements

We would like to thank all brave men and women who made their own derivational resources publicly available under open licenses.


slide-54
SLIDE 54

Time for a demo?


slide-55
SLIDE 55

Thank you!

https://ufal.mff.cuni.cz/universal-derivations
