SLIDE 1

The Treatment of Word Formation in the LiLa Knowledge Base

Eleonora Litta, Marco Passarotti and Francesco Mambrini DeriMo 2019 | ÚFAL, Prague | 19-20 September 2019

SLIDE 3

Research question

State of affairs

We have built and collected (for Latin and other languages):

◮ Textual Resources
◮ Lexical Resources
◮ NLP Tools

Scattered and unconnected

Eleonora Litta, Marco Passarotti and Francesco Mambrini | The Treatment of Word Formation in the LiLa Knowledge Base

SLIDE 7

Research need

Making sense

To make sense of this quantity of empirical data:

◮ to extract maximum benefit from our research investments
◮ to impact and improve the life of Classicists through exploitable computational resources and tools

From Information to Knowledge

SLIDE 8

LiLa Knowledge Base

Approach: Linked Data paradigm

2018-2023

A collection of interoperable linguistic resources (and NLP tools) described with the same vocabulary for knowledge description

Interlinking as a Form of Interaction

SLIDE 14

LiLa Knowledge Base

Conceptual and structural interoperability

LiLa is based on an ontology made of:

◮ Individuals: instances of objects (one specific token, lemma etc.)
◮ Classes: types of objects/concepts (token, lemma, PoS etc.)
◮ Data properties: attributes that objects can/must have (morphological features for lemmas/tokens)
◮ Object properties: ways in which classes and individuals can be related to one another, expressed as RDF triples; labels come from a restricted vocabulary of knowledge description (e.g. hasLemma, hasPoS)

Each component of the ontology is uniquely identified through a URI.
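To make the four components concrete, here is a minimal sketch in plain Python. It stores statements as (subject, predicate, object) tuples. The `EX` namespace, the individual identifiers and the `writtenForm` property are hypothetical placeholders, not LiLa's actual URIs or vocabulary. Only `hasLemma` and `hasPoS` come from the slide. A real knowledge base would rely on an RDF library and a triplestore rather than a Python list.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# The EX namespace and all identifiers below are hypothetical, not LiLa's URIs.
EX = "http://example.org/lila/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(str):
    """A plain data value, used as the object of a data property."""


TRIPLES = [
    # Individuals assigned to classes
    (EX + "token/1", RDF_TYPE, EX + "Token"),
    (EX + "lemma/stella", RDF_TYPE, EX + "Lemma"),
    # Object properties: relate one individual to another (URI -> URI)
    (EX + "token/1", EX + "hasLemma", EX + "lemma/stella"),
    (EX + "lemma/stella", EX + "hasPoS", EX + "pos/noun"),
    # Data property (hypothetical name): attaches a literal value (URI -> literal)
    (EX + "lemma/stella", EX + "writtenForm", Literal("stella")),
]


def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def is_data_property(predicate):
    """True if the predicate is used and all of its objects are literals."""
    objs = [o for _, p, o in TRIPLES if p == predicate]
    return bool(objs) and all(isinstance(o, Literal) for o in objs)
```

For example, `objects(EX + "token/1", EX + "hasLemma")` follows the object property from the token individual to its lemma. Because every component is identified by a URI, this lookup is a plain string match.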

SLIDE 15

LiLa Knowledge Base

Lexically-based architecture and (meta)data sources

[Figure: architecture diagram with the components Morpho_Feats, Form/Lemma, NLP_Tools, Textual_Ress, Token, Lexical_Ress]

SLIDE 18

Word Formation Latin

recap

WFL: Word formation-based lexical resource for Classical Latin

◮ WFRs (word formation rules) are modelled as directed one-to-many input-output relations between lemmas (based on the Item-and-Arrangement (I&A) model of grammatical description)
◮ Morphotactic approach: each WF process is treated individually as the application of a single rule in a given order
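The procedural, morphotactic view can be sketched in Python. Each derivation is a directed input-output record produced by one rule. A lemma can feed several outputs, which gives the one-to-many relation. The derivational history of a lemma is the ordered chain of single-rule steps that leads to it. The lemmas below form a real Latin family. The rule labels (`V-To-N` and so on) and the analysis itself are hypothetical illustrations, not WFL's own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One application of a single WFR: a directed input -> output edge."""
    input_lemma: str
    output_lemma: str
    rule: str   # hypothetical rule label
    affix: str  # affix introduced by the rule


# Illustrative family; labels and analysis are assumptions, not WFL data.
DERIVATIONS = [
    Derivation("amo", "amator", "V-To-N", "-tor"),
    Derivation("amo", "amatio", "V-To-N", "-tio"),
    Derivation("amator", "amatorius", "N-To-A", "-ius"),
    Derivation("amatorius", "amatorie", "A-To-Adv", "-e"),
]


def outputs_of(lemma):
    """One-to-many: all lemmas derived from `lemma` in a single step."""
    return [d.output_lemma for d in DERIVATIONS if d.input_lemma == lemma]


def rule_chain(lemma):
    """Ordered single-rule steps from the family root down to `lemma`.

    Assumes each lemma has at most one input and that there are no cycles,
    as in a directed, tree-like family.
    """
    by_output = {d.output_lemma: d for d in DERIVATIONS}
    chain = []
    while lemma in by_output:
        step = by_output[lemma]
        chain.append(step)
        lemma = step.input_lemma
    return list(reversed(chain))
```

With this data, `rule_chain("amatorie")` returns three steps in order: *amo* > *amator*, *amator* > *amatorius*, *amatorius* > *amatorie*. Each step applies exactly one rule.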

SLIDE 20

WFL Online

https://wfl.marginalia.it

◮ Relationships between lemmas of the same “word formation family” are represented as the edges of a directed graph with a hierarchical, tree-like structure
◮ A node is a lemma, and an edge is the WFR used to derive the output lemma from the input one, together with any affix involved
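A small Python sketch shows how such a family tree could be rendered. Nodes are lemmas, and each edge carries the WFR label and the affix. The edge list and rule labels are hypothetical illustrations, not the content of WFL Online.

```python
from collections import defaultdict

# Hypothetical edges of one WF family: (input lemma, output lemma, WFR, affix).
EDGES = [
    ("amo", "amator", "V-To-N", "-tor"),
    ("amo", "amatio", "V-To-N", "-tio"),
    ("amo", "redamo", "V-To-V", "re(d)-"),
    ("amator", "amatorius", "N-To-A", "-ius"),
]


def build_children(edges):
    """Index edges by input lemma: node -> list of (child, WFR, affix)."""
    children = defaultdict(list)
    for parent, child, rule, affix in edges:
        children[parent].append((child, rule, affix))
    return children


def render(node, children, depth=0, label=""):
    """Depth-first walk returning one indented line per lemma node."""
    lines = ["    " * depth + node + label]
    for child, rule, affix in children.get(node, []):
        # The edge label records the WFR and the affix it introduces.
        lines.extend(render(child, children, depth + 1, f"  <- {rule} {affix}"))
    return lines
```

Calling `print("\n".join(render("amo", build_children(EDGES))))` prints a tree rooted in *amo*, with *amatorius* nested one level below *amator*.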

SLIDE 21

WFL I&A

Problems

But: directed graphs are not fully satisfactory for representing the whole range of relationships within a word formation family. Main problems:

◮ Directionality
◮ Non-linear derivations
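The non-linear case can be illustrated with a small Python check. It assumes that a tree-like family allows each lemma at most one input. A compound such as *agricola* (from *ager* and *colo*) needs two incoming edges, so it cannot fit into a single hierarchy. The edge list is an illustration, not WFL data.

```python
from collections import Counter

# Hypothetical (input, output) derivation edges. "agricola" is a compound
# built from two input lemmas, "ager" and "colo".
EDGES = [
    ("ager", "agricola"),
    ("colo", "agricola"),
    ("colo", "cultor"),
]


def nodes_with_several_inputs(edges):
    """Lemmas that would need more than one parent in the graph."""
    in_degree = Counter(output for _, output in edges)
    return sorted(lemma for lemma, n in in_degree.items() if n > 1)


def is_tree_like(edges):
    """Checks only the one-input-per-lemma constraint of a tree (not cycles)."""
    return not nodes_with_several_inputs(edges)
```

Here `nodes_with_several_inputs(EDGES)` flags *agricola*. A strictly hierarchical representation would therefore have to drop one of its two inputs.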

SLIDE 25

Paradigmatic approach to WF: Requirements

◮ No directionality: necessary to accommodate those lemmas for which the derivational process is not of the simplex (or simpler) > complex type
◮ The CELL has a central role in the paradigm (predictability and regularity)
◮ Each cell must be described in both its morphological characteristics and its semantic features, due to the underlying role of semantics in accounting for derivational processes

SLIDE 29

Word Formation in LiLa

Different approach to Word Formation:

◮ Structure: declarative rather than procedural
◮ No directionality
◮ No morphotaxis

SLIDE 33

Background Theory:

Construction Morphology (CxM)

◮ Construction: [co(n)[stell](a)(t)io]N (more specific)
◮ Schema: [co(n)[x](t)io]N (more generalised)
◮ Constructions and schemas are word-based and declarative
◮ Perfect fit for LiLa: words are described in terms of their formative elements, which can be organised as connected classes of objects in an ontology
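As a rough illustration, the surface form of the schema [co(n)[x](t)io]N can be approximated in Python with a regular expression. A named group plays the role of the open slot x. This sketch makes strong assumptions. A CxM schema pairs form with meaning, which a regular expression cannot capture. The optional (a) of the specific construction is simply absorbed into the slot here.

```python
import re

# Rough surface encoding of [co(n)[x](t)io]N: "co" plus an optional "n",
# an open slot x (at least one character), then an optional "t" and "io".
SCHEMA = re.compile(r"^con?(?P<x>.+?)t?io$")


def fill_slot(lemma, pos):
    """Return the filler of slot x if the noun `lemma` fits the schema, else None."""
    if pos != "N":  # the schema only licenses nouns
        return None
    match = SCHEMA.match(lemma)
    return match.group("x") if match else None
```

Matching is declarative: the pattern states what a well-formed instance looks like, with no rule applied step by step. For *constellatio* the slot is filled by *stella*.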

SLIDE 39

Word formation in LiLa

Three classes of objects:

  • 1. Lemmas
  • 2. Affixes (prefixes and suffixes)
  • 3. Bases (connectors between lemmas of the same WF family)

Connected by three possible relationships:

  • 1. hasPrefix
  • 2. hasSuffix
  • 3. hasBase
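The three classes and three relations can be sketched in plain Python as (subject, relation, object) statements. The lemmas are real Latin words, but the base identifiers (`base:AM`, `base:COL`) are hypothetical placeholders. Naming base nodes is still listed as future work, and the affix analysis is illustrative. Lemmas are connected only through a shared base, so retrieving a family requires no choice of direction.

```python
# Hypothetical LiLa-style statements: (subject, relation, object).
# Base identifiers and affix analyses are illustrative placeholders.
STATEMENTS = [
    ("lemma:amo", "hasBase", "base:AM"),
    ("lemma:amator", "hasBase", "base:AM"),
    ("lemma:amator", "hasSuffix", "suffix:-tor"),
    ("lemma:amatio", "hasBase", "base:AM"),
    ("lemma:amatio", "hasSuffix", "suffix:-tio"),
    ("lemma:redamo", "hasBase", "base:AM"),
    ("lemma:redamo", "hasPrefix", "prefix:re(d)-"),
    ("lemma:cultor", "hasBase", "base:COL"),
    ("lemma:cultor", "hasSuffix", "suffix:-tor"),
]


def family(lemma):
    """All lemmas sharing at least one base with `lemma` (no direction implied)."""
    bases = {o for s, r, o in STATEMENTS if s == lemma and r == "hasBase"}
    return sorted({s for s, r, o in STATEMENTS if r == "hasBase" and o in bases})


def lemmas_with(relation, affix):
    """Lemmas linked to a given affix via hasPrefix or hasSuffix."""
    return sorted(s for s, r, o in STATEMENTS if r == relation and o == affix)
```

`family("lemma:amatio")` returns the same four lemmas as `family("lemma:amo")`, because membership is defined by the shared base rather than by a derivation path.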

SLIDE 40

Stella - WFL

SLIDE 41

STELL - LiLa

SLIDE 45

WFL in LiLa

Prototype

LiLa triplestore available at: https://lila-erc.eu/data/

Connecting WF info with various linguistic resources, e.g.:

◮ Find all occurrences of lemmas from the same WF family in the corpora connected in LiLa
◮ Find all occurrences of nouns displaying agent/instrument (-tor) and action (-tio) suffixes that function as subjects of verbs in the Latin treebanks connected in LiLa
◮ Count the frequency of the 15 most used affixes attached to nouns
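The third query amounts to a grouped count. In practice it would run against the triplestore itself. The Python sketch below only mimics the counting step in memory, over hypothetical records of the form (lemma, part of speech, attached affixes). The default `k=15` matches the slide.

```python
from collections import Counter

# Hypothetical records: (lemma, part of speech, affixes attached to the lemma).
RECORDS = [
    ("amator", "NOUN", ["-tor"]),
    ("amatio", "NOUN", ["-tio"]),
    ("cultor", "NOUN", ["-tor"]),
    ("constellatio", "NOUN", ["con-", "-tio"]),
    ("redamo", "VERB", ["re(d)-"]),
    ("amatorius", "ADJ", ["-ius"]),
]


def top_noun_affixes(records, k=15):
    """Count affixes on noun lemmas and return the k most frequent."""
    counts = Counter(
        affix
        for _, pos, affixes in records
        if pos == "NOUN"
        for affix in affixes
    )
    return counts.most_common(k)
```

Affixes on non-noun lemmas (here *re(d)-* and *-ius*) are filtered out before counting.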

SLIDE 46

Future Plans

Many!

◮ Find a way of defining and naming all "base" nodes
◮ Possibly add semantic information specific to word formation to the LiLa Knowledge Base
◮ Enlarge the lexical basis for which WF information is provided, adding the Medieval Latin lemmas contained in Lemlat

SLIDE 47

Conclusions

Summing up

Added value of including WFL in the LiLa Knowledge Base:

◮ allows for a clearer, less assumption-laden, less problematic way of describing words in terms of their formative elements
◮ lets us connect a lexical resource with the realisation of its words in texts

SLIDE 48

Thank you!

Get in touch

The LiLa Team

Università Cattolica del Sacro Cuore, CIRCSE Research Centre
Largo Gemelli 1, 20123 Milan, Italy
info@lila-erc.eu
https://lila-erc.eu
https://github.com/CIRCSE
@ERC_LiLa

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 769994).