SLIDE 1

LREC 2010 1

The Creagest Project

A Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages

  • A. Balvet (Lille 3), B. Garcia (Paris 8)
  • C. Courtin (Paris 5)
  • D. Boutet, C. Cuxac, I. Fusellier-Souza, M-T. L’Huillier, M-A. Sallandre (Paris 8)

SLIDE 2

Outline

1. On Sign Languages
2. Objectives of the Creagest corpus
3. Methodological issues
4. Technical aspects
5. Theoretical/technical perspectives
6. Summary

SLIDE 3

On Sign Languages

  • Visuo-gestural languages
      ➔ No standardized written form
      ➔ Variation
  • Vocal language / SL
      • Some influence from the vocal language (French)
      • But 2 distinct linguistic types

SLIDE 4

On Sign Languages

  • Main typical linguistic features
      • 2 signifying strategies
          • lexical signs = say without showing
          • "Highly Iconic Structures": Transfers = say by showing
      • Multi-parametric and multi-linear structures
          • Parameters: facial expressions + eye gaze + body movement + manual parameters
          • Each parameter is linguistically specialized

SLIDE 5

Objectives of the Creagest corpus project

  • 3 main objectives
      • Representativity
          • + complement existing LSF corpora
      • Interoperability, sustainability
          • comparing SL corpora
          • accessing the digitized archives + transcriptions over long stretches of time (> 50 years)
      • Linguistic description
          • «Semiological model» (Cuxac)
          • Semiogenesis

SLIDE 6

3 sub-corpora

  • Child LSF (ontogenesis)
      • children aged 3 to 11 (72 participants)
  • Dialogues (lexicogenesis)
      • deaf/deaf interactions
  • Natural gesturality (phylogenesis)
      • Natural gestures as a matrix for SL structures
      • explanation task: deaf/deaf, hearing/hearing, mixed dyads

SLIDE 7

Still pictures

  • Child LSF
  • Dialogues

SLIDE 8

Methodological issues

  • ~300 h of digitized corpora, 250 signers
      • breakthrough for LSF
      • comparable with other large-scale projects
          • Auslan, BSL, NGT etc.
  • but crucial methodological options
      • not restricted to non-native speakers
          • < 5% of deaf children have LSF as their first language
      • accounting for HIS (Transfers)
          • ~40% on average
          • never transcribed, generally not glossed or annotated
          • glosses are not felicitous for lexical signs, even less for HIS
      • challenge for SL corpora annotation

SLIDE 9

Methodological issues

  • Deaf interviewers

SLIDE 10

LSF child-acquisition team

  • Deaf interviewers

Deaf investigators from 4 different regions

  • P. Palacios (SW)
  • S. Heouaine (Center)
  • N. Boursin (W)
  • C. Fitzenwald (E)

SLIDE 11

Lexicogenesis team

  • Deaf interviewers

  • B. Blandin (Center-W)
  • L. Couton (E)
  • M-T. L'Huillier (Paris IDF)
  • P. Vivet (S-SW)

SLIDE 12

Technical aspects

SLIDE 13

Technical aspects

  • A web-based collaborative and federative platform for corpus distribution
      • Archiving and search platform
      • Extended querying and search features
          • ELAN companion tools
          • Adaptation of existing large corpora querying tools (e.g. CQP)
  • Observatory for LSF
      • Sign creation
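
As an illustration of what a small ELAN companion tool might do, the sketch below reads one tier from an ELAN annotation file (EAF, an XML format) and measures how much annotated time carries a given label. It is only a sketch under stated assumptions: the tier name "category", the labels "lexical" and "HIS", the helper names and the sample document are hypothetical and are not taken from the Creagest platform.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAF document; real files carry more metadata.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="800"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2100"/>
  </TIME_ORDER>
  <TIER TIER_ID="category">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>lexical</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>HIS</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(eaf_xml, tier_id):
    """Return sorted (start_ms, end_ms, value) tuples for one tier.

    Assumes every referenced time slot has a TIME_VALUE; unaligned
    slots, which real files may contain, are not handled here.
    """
    root = ET.fromstring(eaf_xml)
    # Map time-slot identifiers to millisecond offsets.
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
             for s in root.iter("TIME_SLOT")}
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, value))
    return sorted(rows)


def share_of(rows, label):
    """Fraction of annotated time whose value equals `label`."""
    total = sum(end - start for start, end, _ in rows)
    hit = sum(end - start for start, end, value in rows if value == label)
    return hit / total if total else 0.0


ANNOTATIONS = read_tier(SAMPLE_EAF, "category")
HIS_SHARE = share_of(ANNOTATIONS, "HIS")
```

Measuring duration rather than counting annotations is one possible design choice; a count-based share would only need the lengths of the two lists.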

SLIDE 14

Theoretical/technical Perspectives

  • Interaction between theoretical framework and practical aspects
      • New annotation tools + annotation scheme(s?)
  ➔ Towards a computer-aided corpus-based LSF grammar
      • Using annotations as a corpus
      • Spotting recurrent structures
      • Similarity assessment between emerging/established signs
          ➔ [DESSIN/DESSINER] / [INFOGRAPHIE]
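
One simple way to read "using annotations as a corpus" and "spotting recurrent structures" is to count label n-grams over annotated utterances. The sketch below is an assumption-laden illustration, not the project's method: the labels LEX, POINT and HIS, the sample sequences and the function name are hypothetical.

```python
from collections import Counter


def recurrent_ngrams(sequences, n=2, min_count=2):
    """Count label n-grams across sequences and keep the frequent ones."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width n over each annotation sequence.
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical annotation sequences, one list of labels per utterance.
UTTERANCES = [
    ["LEX", "POINT", "HIS", "LEX"],
    ["POINT", "HIS", "LEX"],
    ["LEX", "LEX", "POINT", "HIS"],
]

RECURRENT = recurrent_ngrams(UTTERANCES, n=2, min_count=2)
```

Raising `n` looks for longer patterns, and raising `min_count` filters out structures that occur only occasionally.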

SLIDE 15

Summary

  • ~300 h, 250 speakers, 3 sub-corpora
  • Crucial methodological choices
      • e.g.: Deaf interviewers, non-native speakers, HIS
  • A technical infrastructure for the observation, description and dissemination of LSF data and analysis

SLIDE 16

Acknowledgments

  • Main funding
      • ANR (Agence Nationale de la Recherche) Corpus
  • Complementary financial support
      • DGLFLF (Délégation Générale à la Langue Française et aux Langues de France): visa #17852, November 2009

SLIDE 17

CREAGEST

Thank you for your attention