slide-1
SLIDE 1

Segmentation strategies for inflection class inference

Sacha Beniamine (LLF), Benoît Sagot (Alpage)
Université Paris Diderot
Décembrettes, Toulouse

slide-2
SLIDE 2

▶ The concept of Inflection Classes (IC) is widely used to analyse inflectional systems.
▶ The definition of ICs is crucial for many linguistic and psycholinguistic studies, yet they are often taken for granted.
▶ No consensus on how to obtain the classification.
▶ We explore the concept through computational means (Brown and Evans; Lee and Goldsmith; Bonami):
  ▶ Formal definitions of the concept
  ▶ Large datasets
  ▶ Reproducible classifications
  ▶ Commensurable across languages
  ▶ Basis for theoretical and typological comparisons

slide-5
SLIDE 5

Groups of lexemes that inflect alike.

‘hold’    təniʁ  tjɛ̃   tjɛn   təny
‘finish’  finiʁ  fini  finis  fini
‘hate’    aiʁ    ɛ     ais    ai
‘peel’    pəle   pɛl   pɛl    pəle
‘wash’    lave   lav   lav    lave
‘press’   tase   tas   tas    tase

slide-7
SLIDE 7

1. What form should an IC system take?
2. What Inflectional Realisations should we infer from the data?
3. How do we measure which lexemes inflect alike?
4. How do we find the best classes among all possible ones?

slide-8
SLIDE 8

Table of Contents

1. What form should an Inflection class (IC) system take?
2. What generalisations should we infer from the data?
3. How do we assess which lexemes inflect alike?
4. How do we find the best classes among all possible ones?
5. Results and discussion
6. Conclusion

slide-9
SLIDE 9

▶ Insight from Canonical Typology (Corbett).

An ideal inflection class system is a partition of the set of lexemes that is:

▶ Cohesive: maximal homogeneity within classes
▶ Distinctive: maximal heterogeneity between classes

▶ In most languages, each of these criteria leads to a different partition:
  ▶ favouring cohesion: numerous small, similar classes
  ▶ favouring distinction: fewer large classes with exceptions

‘hold’    təniʁ  tjɛ̃   tjɛn   təny
‘finish’  finiʁ  fini  finis  fini
‘hate’    aiʁ    ɛ     ais    ai
‘peel’    pəle   pɛl   pɛl    pəle
‘wash’    lave   lav   lav    lave
‘press’   tase   tas   tas    tase

slide-15
SLIDE 15

▶ Dressler and Thornton’s terminology:
  ▶ Micro-classes: numerous small, similar classes.
  ▶ Macro-classes: fewer large classes with exceptions.
▶ Combined in a hierarchy (Corbett and Fraser; Dressler and Thornton; Brown and Evans).

slide-17
SLIDE 17

▶ School grammar (Bescherelle).
▶ Kilani-Schoch and Dressler: different microclasses, some dropped, two macroclasses (dual route).

slide-19
SLIDE 19

▶ Micro-classes
  ▶ Homogeneous: numerous small, similar classes.
  ▶ Inventories vary across accounts.
  ▶ Empirically motivated.
▶ Macro-classes
  ▶ Heterogeneous: fewer large classes with “exceptions”.
  ▶ High variation across accounts.
  ▶ Empirical motivation in question: are macroclasses a descriptive artefact?

slide-22
SLIDE 22

▶ Stem and exponents
  ▶ Captures differences between cells under the assumption of a constant stem.
  ▶ cf. Blevins’s notion of a constructive approach.
▶ Binary alternation patterns
  ▶ Captures the implicative relation between each pair of cells.
  ▶ cf. Blevins’s notion of an abstractive approach.
▶ Both rely on a segmentation of forms:
  ▶ global segmentation over the whole paradigm;
  ▶ local segmentation over pairs of forms.

slide-25
SLIDE 25

▶ Global: on the basis of a whole paradigm.
▶ Local: on each pair of cells.

‘hold’    təniʁ  tjɛ̃   tjɛn   təny
‘finish’  finiʁ  fini  finis  fini
‘hate’    aiʁ    ɛ     ais    ai
‘peel’    pəle   pɛl   pɛl    pəle
‘wash’    lave   lav   lav    lave
‘press’   tase   tas   tas    tase

slide-26
SLIDE 26

Global segmentation:

‘hold’    Xəniʁ    Xjɛ̃     Xjɛn    Xəny
‘finish’  Xʁ       X       Xs      X
‘hate’    aiʁ      ɛ       ais     ai
‘peel’    X1əX2e   X1ɛX2   X1ɛX2   X1əX2e
‘wash’    Xe       X       X       Xe
‘press’   Xe       X       X       Xe

slide-27
SLIDE 27

Local segmentation:

‘hold’    Xəniʁ ⇌ Xjɛ̃       Yəniʁ ⇌ Yjɛn      Ziʁ ⇌ Zy
‘finish’  Xʁ ⇌ X            Yʁ ⇌ Ys           Zʁ ⇌ Z
‘hate’    aiʁ ⇌ ɛ           Yʁ ⇌ Ys           Zʁ ⇌ Z     …
‘peel’    X1əX2e ⇌ X1ɛX2    Y1əY2e ⇌ Y1ɛY2    Z ⇌ Z
‘wash’    Xe ⇌ X            Ye ⇌ Y            Z ⇌ Z
‘press’   Xe ⇌ X            Ye ⇌ Y            Z ⇌ Z
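
The contrast between the two strategies can be illustrated with a deliberately simplified sketch. It assumes that the shared material is a single contiguous prefix, so it does not handle discontinuous stems such as X1əX2e, and it always writes the variable as X. The function names `global_segmentation` and `local_pattern` are hypothetical and are not taken from the authors' code.

```python
from os.path import commonprefix


def global_segmentation(paradigm):
    """Global strategy (simplified): one stem for the WHOLE paradigm.

    The stem is the longest prefix shared by all forms; each exponent is
    what remains of a form once the stem is replaced by the variable X.
    """
    stem = commonprefix(paradigm)
    return stem, tuple("X" + form[len(stem):] for form in paradigm)


def local_pattern(form_a, form_b):
    """Local strategy (simplified): segment ONE pair of cells at a time.

    Only the material shared by these two forms is abstracted away, so the
    variable can cover more than the paradigm-wide stem.
    """
    shared = commonprefix([form_a, form_b])
    return "X" + form_a[len(shared):], "X" + form_b[len(shared):]


finir = ["finiʁ", "fini", "finis", "fini"]
tenir = ["təniʁ", "tjɛ̃", "tjɛn", "təny"]

print(global_segmentation(finir))
print(global_segmentation(tenir))
print(local_pattern(tenir[0], tenir[3]))   # first cell vs last cell of 'hold'
print(local_pattern("aiʁ", "ais"))         # 'hate' patterns like 'finish' locally
```

On the pair of cells used in the last line, 'hate' and 'finish' receive the same local pattern even though their global segmentations differ, which is the kind of generalisation the local strategy is meant to capture.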

 / 

slide-28
SLIDE 28

▶ In general, grouping elements into classes is a clustering problem.
▶ There are many well-known solutions in computer science to address such problems.
▶ All of them require two things:
  ▶ A criterion to evaluate the quality of clusters (classes). → Minimum description length
  ▶ An algorithm to explore the search space of all possible groupings. → Greedy bottom-up algorithm

slide-32
SLIDE 32

▶ Minimum description length (Rissanen): choose the model allowing for the shortest description of the data.
▶ A partition of the set of lexemes is better than another one if it leads to a more economical description of the system (Sagot and Walther; Walther).

\[
\mathrm{DL}(\text{system}) = \text{number of symbols} \times \underbrace{-\sum_{x \in \text{symbols}} P(x) \log_2 P(x)}_{\text{entropy}}
\]
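
As a rough illustration of the formula above, the sketch below computes the description length of a sequence of symbols as its length multiplied by the entropy of the symbols' empirical distribution. Treating the description as a flat list of symbols is an assumption made for the example, and the name `description_length` is hypothetical.

```python
from collections import Counter
from math import log2


def description_length(symbols):
    """DL = number of symbols × entropy (in bits) of their distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    # P(x) is estimated as the relative frequency of x in the description.
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return n * entropy


# Four symbols, two equiprobable types: 1 bit per symbol.
print(description_length(list("aabb")))
```

A description that reuses few symbols often is cheaper than one of the same length that spreads over many symbol types, which is what makes the measure sensitive to shared structure.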

 / 

slide-33
SLIDE 33

▶ We break down the description length into four components:

DL = M + C + P + R

Toy imaginary dataset with three cells A, B and D.

slide-40
SLIDE 40

(a) Begin with a partition into microclasses.
(b) Merge the pair optimising DL to get a new partition.
(c) Repeat until there is only one class.
(d) Run several times, merge variations.

[Figure: dendrogram with leaves labelled ganhar, ficar, jogar, levar, nomear, desempenhar, voar, abandonar, achar, chegar, pagar, passar]

slide-54
SLIDE 54

▶ This allows for an intuitive formal definition of macroclasses.
▶ Macroclasses: the partition that best optimises the description length.
  ▶ As we merge clusters, we first expect the DL to decrease.
  ▶ Macroclasses are reached when DL stops decreasing.
▶ It is an empirical issue whether a system has macroclasses or not. We demonstrate their existence in French and European Portuguese conjugation systems.
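
The stopping criterion can be sketched as follows, assuming the merge procedure has recorded the description length of each successive partition, starting with the microclasses. The function name `macroclass_step` and the input format are hypothetical, and the values in the usage line are invented for illustration.

```python
def macroclass_step(dl_trajectory):
    """Index of the last partition reached before DL stops decreasing.

    dl_trajectory[0] is the DL of the microclass partition; each later
    entry is the DL after one more merge.
    """
    for step in range(1, len(dl_trajectory)):
        if dl_trajectory[step] >= dl_trajectory[step - 1]:
            return step - 1
    # DL decreased at every merge, down to a single class.
    return len(dl_trajectory) - 1


dls = [120.0, 104.5, 98.2, 101.7, 133.0]  # invented values
print(macroclass_step(dls))  # 2
```

Under this reading, a result of 0 would mean that no merge improves on the microclasses, which is one way to make the question of whether a system has macroclasses an empirical one.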

 / 

slide-56
SLIDE 56

▶ Paradigm tables contain phonemically transcribed forms.
▶ European Portuguese: Coimbra pronunciation dictionary (Veiga, Candeias, and Perdigão) (verbal entries).
▶ French: Flexique (Bonami, Caron, and Plancq) (verbal entries).
▶ Comparing local and global segmentation strategies.

slide-57
SLIDE 57

▶ Portuguese, global strategy (stem & exponents): produces scattered classes with no relationship to conventional knowledge of Portuguese verbal ICs.

slide-58
SLIDE 58

▶ Portuguese, local strategy (alternation patterns): finds generalisations that display an interesting relationship with traditional accounts.

slide-60
SLIDE 60

▶ French, global strategy (stem & exponents): produces scattered classes with no relationship to conventional knowledge of French verbal ICs.

slide-61
SLIDE 61

▶ French, local strategy (alternation patterns): finds generalisations that display an interesting relationship with traditional accounts.

slide-63
SLIDE 63

▶ We do find macroclasses.
  ▶ Not a bipartition (regular/irregular or productive/unproductive), contra Kilani-Schoch and Dressler.
  ▶ The algorithm had no knowledge of previous accounts.
▶ We find groupings that were overlooked:
  ▶ French: -yer, -oir
  ▶ French: haïr, finir, -ure, -uire
  ▶ Portuguese: two “irregular” groups.

slide-70
SLIDE 70

Work                Generalisations      Criterion              Algorithm
Brown and Evans     Raw paradigms        Compression distance   CompLearn
Bonami              Affixes              Edit distance          UPGMA
Bonami              Patterns             Hamming distance       UPGMA
Lee and Goldsmith   Sets of characters   DL variant             Greedy bottom-up
This work           Local patterns       DL                     Greedy bottom-up
This work           Global patterns      DL                     Greedy bottom-up

Features of our approach:

▶ Principled notion of Inflectional Realisation.
▶ Using a measure that evaluates the quality of the system allows us to infer macroscopic generalisations.
▶ No parameters to adjust: Occam’s razor is the only criterion.

slide-72
SLIDE 72

▶ Main properties:
  ▶ Based on information-theoretic measures.
  ▶ Relies on automatically inferred generalisations.
  ▶ Aims at cross-linguistic applications.
  ▶ Formal definition of macroclasses and microclasses.
▶ An analysis into macroclasses can be empirically motivated.
▶ Local segmentation captures the structure of inflection systems better than global segmentation.
▶ Supports the relevance of local patterns of alternation in abstractive approaches (Blevins).
▶ Complementary to work on information-theoretic modelling of implicative structure (Ackerman, Blevins, and Malouf; Ackerman and Malouf; Bonami and Beniamine).

slide-73
SLIDE 73

Code available on my webpage: http://www.llf.cnrs.fr/fr/Gens/Beniamine

slide-74
SLIDE 74

R I

Ackerman, Farrell, James P. Blevins, and Robert Malouf (). “Parts and wholes: Patterns of relatedness in complex morphological systems and why they matter”. In: Analogy in Grammar: Form and Acquisition, pp. –.

Ackerman, Farrell and Robert Malouf (). “Morphological organization: The low conditional entropy conjecture”. In: Language ., pp. –.

Blevins, James P. (). “Word-based morphology”. In: Journal of Linguistics  (), pp. –. DOI: 10.1017/S0022226706004191.

Bonami, Olivier (). “La structure fine des paradigmes de flexion”. French. Habilitation à diriger des recherches. U. Paris Diderot.

Bonami, Olivier and Sacha Beniamine (). “Implicative structure and joint predictiveness”. In: ed. by Vito Pirelli, Claudia Marzi, and Marcello Ferro. URL: http://ceur-ws.org/Vol-1347/.

Bonami, Olivier, Gauthier Caron, and Clément Plancq (). “Construction d’un lexique flexionnel phonétisé libre du français”. In: Actes du quatrième Congrès Mondial de Linguistique Française, pp. –.

Brown, Dunstan and Roger Evans (). “Morphological complexity and unsupervised learning: validating Russian inflectional classes using high frequency data”. In: Current Issues in Morphological Theory: (Ir)regularity, analogy and frequency. Ed. by F. Kiefer, M. Ladányi, and P. Siptár. Amsterdam: John Benjamins, pp. –.

Corbett, Greville G. (). “Canonical Inflectional Classes”. In: Selected Proceedings of the th Décembrettes: Morphology in Bordeaux.

Corbett, Greville G. and Norman M. Fraser (). “Network Morphology: a DATR account of Russian nominal inflection”. In: Journal of Linguistics , pp. –.

slide-75
SLIDE 75

References II

Dressler, Wolfgang U. and Anna M. Thornton (). “Italian Nominal Inflection”. In: Wiener Linguistische Gazette -, pp. –.

Kilani-Schoch, Marianne and Wolfgang Dressler (). Morphologie naturelle et flexion du verbe français. Tübingen: Gunter Narr Verlag.

Lee, Jackson and John A. Goldsmith (). “Automatic morphological alignment and clustering”. Presented at the nd American International Morphology Meeting.

Rissanen, J. (). “Universal coding, information, prediction, and estimation”. In: IEEE Tr. on Info. Th. ., pp. –.

Sagot, Benoît and Géraldine Walther (). “Non-canonical inflection: data, formalisation and complexity measures”. In: Systems and Frameworks in Computational Morphology. Ed. by Cerstin Mahlow and Michael Piotrowski. Vol. . Communications in Computer and Information Science. Zurich, Switzerland: Springer, pp. –.

Veiga, Arlindo, Sara Candeias, and Fernando Perdigão (). “Generating a pronunciation dictionary for European Portuguese using a joint-sequence model with embedded stress assignment”. English. In: Journal of the Brazilian Computer Society ., pp. –. DOI: 10.1007/s13173-012-0088-0.

Walther, Géraldine (). “On canonicity in morphology: an empirical, formal and computational approach”. PhD thesis. Université Paris Diderot, École doctorale de sciences du langage, U.F.R. de linguistique.

slide-76
SLIDE 76

S

Both can be used in an abstractive approach:

Forms: /gordo/, /gorda/, /gordos/, /gordas/

Pair                 global segmentation
/gordo/ ~ /gorda/    Xo ⇌ Xa
/gordos/ ~ /gordas/  Xos ⇌ Xas
/gordo/ ~ /gordos/   Xo ⇌ Xos
/gorda/ ~ /gordas/   Xa ⇌ Xas

Spanish adjective GORDO ‘fat’.

slide-77
SLIDE 77

S

Both can be used in an abstractive approach:

Forms: /gordo/, /gorda/, /gordos/, /gordas/

Pair                 global segmentation  local segmentation
/gordo/ ~ /gorda/    Xo ⇌ Xa              Xo ⇌ Xa
/gordos/ ~ /gordas/  Xos ⇌ Xas            XoY ⇌ XaY
/gordo/ ~ /gordos/   Xo ⇌ Xos             X ⇌ Xs
/gorda/ ~ /gordas/   Xa ⇌ Xas             X ⇌ Xs

Spanish adjective GORDO ‘fat’.

 / 

slide-78
SLIDE 78

N D

[Figure: diagram over the Portuguese verbs ganhar, ficar, jogar, levar, nomear, desempenhar, voar, abandonar, achar, chegar, pagar and passar; the label “DL = X” appears twice.]

slide-79
SLIDE 79

F ,  ,  

▶ Local strategy (alternation patterns): finds generalisations that are in line with traditional accounts.

slide-80
SLIDE 80

P ,  ,  

▶ Local strategy (alternation patterns): finds generalisations that are in line with traditional accounts.