The Creagest Project: A Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages


  1. The Creagest Project: A Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages
     A. Balvet (Lille 3), B. Garcia (Paris 8), C. Courtin (Paris 5), D. Boutet, C. Cuxac, I. Fusellier-Souza, M-T. L’Huillier, M-A. Sallandre (Paris 8)
     LREC 2010

  2. Outline
     1. On Sign Languages
     2. Objectives of the Creagest corpus
     3. Methodological issues
     4. Technical aspects
     5. Theoretical/technical perspectives
     6. Summary

  3. On Sign Languages
     - Visuo-gestural languages
       ➔ No standardized written form
       ➔ Variation
     - Vocal language / SL
       - Some influence from the vocal language (French)
       - But 2 distinct linguistic types

  4. On Sign Languages
     - Main typical linguistic features
       - 2 signifying strategies
         - lexical signs = say without showing
         - "Highly Iconic Structures" (HIS): Transfers = say by showing
       - Multi-parametric and multi-linear structures
         - Parameters: facial expressions + eye gaze + body movement + manual parameters
         - Each parameter is linguistically specialized

  5. Objectives of the Creagest corpus project
     - 3 main objectives
       - Representativity
         - + complement existing LSF corpora
       - Interoperability, sustainability
         - comparing SL corpora
         - accessing the digitized archives + transcriptions over long stretches of time (> 50 years)
       - Linguistic description
         - «Semiological model» (Cuxac)
         - Semiogenesis

  6. 3 sub-corpora
     - Child LSF (ontogenesis)
       - children aged 3 to 11 (72 participants)
     - Dialogues (lexicogenesis)
       - deaf/deaf interactions
     - Natural gesturality (phylogenesis)
       - Natural gestures as a matrix for SL structures
       - explanation task: deaf/deaf, hearing/hearing, mixed dyads

  7. Still pictures
     - Child LSF
     - Dialogues

  8. Methodological issues
     - ~300 h of digitized corpora, 250 signers
       - breakthrough for LSF
       - comparable with other large-scale projects: Auslan, BSL, NGT, etc.
     - but crucial methodological options
       - not restricted to native signers
         - < 5% of deaf children have LSF as their first language
       - accounting for HIS (Transfers)
         - ~40% on average
         - never transcribed, generally not glossed or annotated
         - glosses are not felicitous for lexical signs, even less for HIS
         - challenge for SL corpora annotation

  9. Methodological issues
     - Deaf interviewers

  10. LSF child-acquisition team
      - Deaf interviewers: Deaf investigators from 4 different regions
      - Investigators: P. Palacios, S. Heouaine, N. Boursin, C. Fitzenwald
      - Regions: SW, Center, W, E

  11. Lexicogenesis team
      - Deaf interviewers: B. Blandin, L. Couton, P. Vivet, M-T. L'Huillier
      - Regions: Center-W, E, S-SW, Paris IDF

  12. Technical aspects

  13. Technical aspects
      - A web-based collaborative and federative platform for corpus distribution
        - Archiving and search platform
        - Extended querying and search features
          - ELAN companion tools
          - Adaptation of existing large corpora querying tools (e.g. CQP)
        - Observatory for LSF
          - Sign creation
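To make the "extended querying" idea on slide 13 more concrete, here is a minimal sketch of a regular-expression search over one tier of an ELAN annotation file (ELAN stores annotations as XML in `.eaf` files). It is not the project's tooling: the tier name `Signs`, the labels `LEX:` and `HIS:`, and the function `search_tier` are assumptions made up for illustration, and the sample document is a hand-written fragment reduced to the elements the search needs.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, reduced EAF-style fragment. The tier name "Signs" and the
# "LEX:" / "HIS:" labels are hypothetical, not the Creagest annotation scheme.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="800"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="Signs">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>LEX:DESSIN</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>HIS:transfer</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def search_tier(eaf_text, tier_id, pattern):
    """Return (start_ms, end_ms, value) for annotations on one tier matching a regex."""
    root = ET.fromstring(eaf_text.strip())
    # Map time-slot ids to millisecond offsets (unaligned slots are skipped).
    times = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    regex = re.compile(pattern)
    hits = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = ann.findtext("ANNOTATION_VALUE", default="")
            if regex.search(value):
                start = times.get(ann.get("TIME_SLOT_REF1"))
                end = times.get(ann.get("TIME_SLOT_REF2"))
                hits.append((start, end, value))
    return hits
```

With the sample above, `search_tier(SAMPLE_EAF, "Signs", r"^HIS:")` returns `[(800, 1500, "HIS:transfer")]`. A dedicated corpus query engine such as CQP would replace this linear scan on a corpus of several hundred hours.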

  14. Theoretical/technical Perspectives
      - Interaction between theoretical framework and practical aspects
        - New annotation tools + annotation scheme(s?)
          ➔ Towards a computer-aided corpus-based LSF grammar
        - Using annotations as a corpus
          - Spotting recurrent structures
          - Similarity assessment between emerging/established signs
            ➔ [DESSIN/DESSINER] / [INFOGRAPHIE]
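One simple reading of "using annotations as a corpus" and "spotting recurrent structures" on slide 14 is to count label n-grams across annotated sequences. The sketch below assumes each recording has already been reduced to an ordered list of annotation labels. The labels `LEX` and `HIS` and the function `recurrent_ngrams` are illustrative assumptions, not the method described in the presentation.

```python
from collections import Counter


def recurrent_ngrams(sequences, n=2, min_count=2):
    """Count label n-grams over all sequences and keep those seen at least min_count times."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width n over the sequence of labels.
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical label sequences, one list per annotated recording.
sequences = [
    ["LEX", "HIS", "LEX"],
    ["HIS", "LEX", "HIS", "LEX"],
]
frequent = recurrent_ngrams(sequences, n=2, min_count=3)
```

Here `frequent` is `{("HIS", "LEX"): 3}`: the bigram `("LEX", "HIS")` occurs only twice and falls below the threshold. Real annotations are multi-linear (several tiers in parallel), so a faithful version would need to align tiers before counting.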

  15. Summary
      - ~300 h, 250 signers, 3 sub-corpora
      - Crucial methodological choices
        - e.g. Deaf interviewers, non-native signers, HIS
      - A technical infrastructure for the observation, description and dissemination of LSF data and analysis

  16. Acknowledgments
      - Main funding
        - ANR (Agence Nationale de la Recherche) Corpus
      - Complementary financial support
        - DGLFLF (Délégation Générale à la Langue Française et aux Langues de France): visa #17852, November 2009

  17. CREAGEST: Thank you for your attention
