Tutorial on LingSync A Free Tool for Creating and Maintaining a Shared Database For Communities, Linguists and Language Learners Thursday, May 8, 2014 9:00 am - 12:00 pm Joel Dunham (UBC) Elise McClay (McGill) Hisako Noguchi (Concordia)


slide-1
SLIDE 1

Tutorial on LingSync

A Free Tool for Creating and Maintaining a Shared Database for Communities, Linguists and Language Learners
Thursday, May 8, 2014, 9:00 am - 12:00 pm
Joel Dunham (UBC), Elise McClay (McGill), Hisako Noguchi (Concordia)

slide-2
SLIDE 2

Plan of the Tutorial

9:00-9:10    Introduce the LingSync team
9:10-9:40    Why LingSync? (presentation)
9:40-9:50    Learn X / Georgian Together video demo
9:50-10:10   Question period 1
10:10-10:20  Coffee break
10:20-10:35  LingSync Practical Tutorial (presentation)
10:35-10:45  Question period 2
10:45-11:00  Participatory collaborative data entry
11:00-11:30  Question period 3

slide-3
SLIDE 3

Introductions

  • Elise McClay
  • Hisako Noguchi
  • Gretchen McCulloch
  • Louisa Bielig
  • Gina Cook
slide-4
SLIDE 4

Joel Dunham

  • Ph.D. candidate at UBC Linguistics
  • Fieldwork: Blackfoot & Okanagan (& Lillooet)
  • Research: aspectual semantics, noun incorporation
  • Self-taught developer and aspiring computational linguist
  • Dissertation title: The Online Linguistic Database: Software for linguistic fieldwork

  • Recent contributor to LingSync
slide-5
SLIDE 5

Why LingSync?

  • What is LingSync?
  • Why would I want to use it?
slide-6
SLIDE 6

Outline

  • What is linguistic fieldwork?
  • Collaborative approaches to linguistic fieldwork are valuable.
  • There is a dearth of collaborative fieldwork software.
  • LingSync facilitates collaborative fieldwork and does a whole lot more.

slide-7
SLIDE 7

Share/Collaborate

slide-8
SLIDE 8

Fieldwork Features

slide-9
SLIDE 9

Share/Collaborate

Fieldwork Features

slide-10
SLIDE 10

Share/Collaborate

Fieldwork Features

LingSync

slide-11
SLIDE 11

Linguistic Fieldwork

working with speakers to generate artifacts that encode knowledge of a particular language

slide-12
SLIDE 12

Fieldwork Trichotomy

  1. Documentation
  2. Revitalization
  3. Research
slide-13
SLIDE 13

[Diagram labels: revitalization, documentation, fieldwork, research]

slide-14
SLIDE 14

[Diagram labels: revitalization, documentation, fieldwork, research]

slide-15
SLIDE 15

[Diagram labels: revitalization, documentation, fieldwork, research]

slide-16
SLIDE 16

Fieldwork-based Research

  • highly endangered, under-studied, and under-documented languages
  • field methods courses: in their ~5 years of impact, generate a LOT of transcriptions, audio, and low-level analyses
  • language-learning groups (proficiency required for high-level research), text collection

slide-17
SLIDE 17

Research benefiting revitalization

First of all, let us look at how wa7 affects the different event types we introduced in 17.3. Examine the example in (11), together with its English translations:

(11) Wá7lhkan pun.
     * “I am/was finding it.”
     √ “I find it (habitually).”

(Davis 2006, ch. 18, with modification)

slide-18
SLIDE 18

speaker consultant field researcher field researcher field researcher field researcher

slide-19
SLIDE 19

speaker consultant field researcher field researcher field researcher field researcher

slide-20
SLIDE 20

[Diagram labels: revitalizer, documentor, fieldwork, researcher, researcher]

slide-21
SLIDE 21

revitalizer documentor LingSync researcher researcher

slide-22
SLIDE 22

What is LingSync?

slide-23
SLIDE 23

A Free Tool for Creating and Maintaining a Shared Database For Communities, Linguists and Language Learners

slide-24
SLIDE 24

[Diagram: a fieldworker and a LingSync corpus]

slide-25
SLIDE 25

[Diagram: three fieldworkers and three LingSync corpora]

slide-26
SLIDE 26

[Diagram: three fieldworkers and three LingSync corpora]

Labels: read-only

slide-27
SLIDE 27

[Diagram: three fieldworkers and three LingSync corpora]

Labels: read-only, read/write

slide-28
SLIDE 28

[Diagram: three fieldworkers and three LingSync corpora]

Labels: read-only, read/write, Public

slide-29
SLIDE 29

[Diagram: three fieldworkers and three LingSync corpora]

Labels: read-only, read/write, Public, encrypted bits

slide-30
SLIDE 30

[Diagram: three fieldworkers and three LingSync corpora, one marked v1]

Labels: read-only, read/write, Public, encrypted bits, versioning

slide-31
SLIDE 31

[Diagram: three fieldworkers and three LingSync corpora, one marked v1, plus LingSync corpus v2]

Labels: read-only, read/write, Public, encrypted bits, versioning

slide-32
SLIDE 32

[Diagram: three fieldworkers and three LingSync corpora, one marked v1, plus LingSync corpus v2 and LingSync corpus v3]

Labels: read-only, read/write, Public, encrypted bits, versioning

slide-33
SLIDE 33

[Diagram: three fieldworkers and three LingSync corpora, one marked v1, plus LingSync corpus v2]

Labels: read-only, read/write, Public, encrypted bits, versioning

slide-34
SLIDE 34

[Diagram: three fieldworkers and three LingSync corpora, one marked v1, plus LingSync corpus v2 and "my other corpus"]

Labels: read-only, read/write, Public, encrypted bits, versioning

slide-35
SLIDE 35

LingSync corpus web service

slide-36
SLIDE 36

  • LingSync corpus web service
  • LingSync Spreadsheet interface

slide-37
SLIDE 37

LingSync Spreadsheet

slide-38
SLIDE 38

  • LingSync corpus web service
  • LingSync Spreadsheet interface
  • LingSync Prototype interface

slide-39
SLIDE 39

LingSync Prototype

slide-40
SLIDE 40

  • LingSync corpus web service
  • LingSync Spreadsheet interface
  • LingSync Prototype interface
  • Learn X

slide-41
SLIDE 41

  • LingSync corpus web service
  • LingSync Spreadsheet interface
  • LingSync Prototype interface
  • Learn X
  • LingSync audio web service

slide-42
SLIDE 42

  • LingSync corpus web service
  • LingSync Spreadsheet interface
  • LingSync Prototype interface
  • Learn X
  • LingSync audio web service
  • LingSync lexicon web service

slide-43
SLIDE 43

LingSync Architecture

[Diagram components: corpus, Spreadsheet, Prototype, Learn X, audio, lexicon]

slide-44
SLIDE 44
LingSync Architecture

  • many platforms

[Diagram components: corpus, Spreadsheet, Prototype, Learn X, audio, lexicon]

slide-45
SLIDE 45
LingSync Architecture

  • many platforms
  • online/offline

[Diagram components: corpus, Spreadsheet, Prototype, Learn X, audio, lexicon]

slide-46
SLIDE 46
LingSync Architecture

  • many platforms
  • online/offline
  • many purposes

[Diagram components: corpus, Spreadsheet, Prototype, Learn X, audio, lexicon]

slide-47
SLIDE 47
LingSync Architecture

  • many platforms
  • online/offline
  • many purposes
  • extensible

[Diagram components: parser, corpus, Spreadsheet, Prototype, Learn X, audio, lexicon]

slide-48
SLIDE 48

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)

slide-49
SLIDE 49

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)

slide-50
SLIDE 50

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)

slide-51
SLIDE 51

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)
  • versioning (Spreadsheet, Prototype)

slide-52
SLIDE 52

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)
  • versioning (Spreadsheet, Prototype)
  • cross-platform (Spreadsheet, Prototype)

slide-53
SLIDE 53

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)
  • versioning (Spreadsheet, Prototype)
  • cross-platform (Spreadsheet, Prototype)
  • many purposes (Spreadsheet, Prototype, Learn X)

slide-54
SLIDE 54

Share/Collaborate

  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)
  • versioning (Spreadsheet, Prototype)
  • cross-platform (Spreadsheet, Prototype)
  • many purposes (Spreadsheet, Prototype, Learn X)
  • offline (Prototype, Learn X)

slide-55
SLIDE 55

Share/Collaborate

  • activity feed (Prototype)
  • multi-user concurrent modification (Spreadsheet, Prototype)
  • permissions (Spreadsheet, Prototype)
  • encryption (Prototype)
  • versioning (Spreadsheet, Prototype)
  • cross-platform (Spreadsheet, Prototype)
  • many purposes (Spreadsheet, Prototype, Learn X)
  • offline (Prototype, Learn X)

slide-56
SLIDE 56

Fieldwork Features

slide-57
SLIDE 57

Audio/Video

Fieldwork Features: audio/video

  • Record audio directly into the app (Spreadsheet)
  • Play audio/video right next to the utterance (Spreadsheet, Prototype)

slide-58
SLIDE 58

Audio/Video

Fieldwork Features: audio/video

  • Drag-and-drop audio/video files into your data (Prototype)
  • Record audio directly into the app (Spreadsheet)
  • Play audio/video right next to the utterance (Spreadsheet, Prototype)

slide-59
SLIDE 59

Audio/Video

Fieldwork Features: audio/video

  • Link to audio/video on the web, e.g., YouTube (Prototype)
  • Drag-and-drop audio/video files into your data (Prototype)
  • Record audio directly into the app (Spreadsheet)
  • Play audio/video right next to the utterance (Spreadsheet, Prototype)

slide-60
SLIDE 60

Semi-automatic Glosser

Fieldwork Features: audio/video, auto-glosser

  • Predicts morpheme breaks and glosses (Spreadsheet, Prototype)

slide-61
SLIDE 61

Semi-automatic Glosser

Fieldwork Features: audio/video, auto-glosser

  • Starts learning from your data right away
  • Predicts morpheme breaks and glosses (Spreadsheet, Prototype)

slide-62
SLIDE 62

Semi-automatic Glosser

Fieldwork Features: audio/video, auto-glosser

  • Starts learning from your data right away
  • Predicts morpheme breaks and glosses (Spreadsheet, Prototype)
  • Context-sensitive
slide-63
SLIDE 63

Semi-automatic Glosser

Fieldwork Features: audio/video, auto-glosser

  • Starts learning from your data right away
  • Predicts morpheme breaks and glosses (Spreadsheet, Prototype)
  • Context-sensitive
  • Theory non-specific: you define your glosses

slide-64
SLIDE 64

Semi-automatic Glosser

Fieldwork Features: audio/video, auto-glosser

  • Starts learning from your data right away
  • Predicts morpheme breaks and glosses (Spreadsheet, Prototype)
  • Context-sensitive
  • Theory non-specific: you define your glosses
  • Uses Leipzig Glossing Rules
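
As a rough illustration of what "learning from your data" could mean for a glosser, the sketch below counts which gloss each morpheme has received so far and suggests the most frequent one. It is not LingSync's glosser: the function names, the hyphen-segmented input format, and the toy morphemes are all invented for this example, and unlike the feature described on the slide it ignores context.

```python
from collections import Counter, defaultdict


def train_glosser(examples):
    """Count how often each morpheme has been paired with each gloss.

    `examples` is an iterable of (morphemes, glosses) pairs, both
    hyphen-segmented in the style of the Leipzig Glossing Rules.
    """
    counts = defaultdict(Counter)
    for morphemes, glosses in examples:
        for morpheme, gloss in zip(morphemes.split("-"), glosses.split("-")):
            counts[morpheme][gloss] += 1
    return counts


def suggest_gloss(counts, morphemes):
    """Suggest the most frequent known gloss per morpheme, '?' if unseen."""
    suggestions = []
    for morpheme in morphemes.split("-"):
        seen = counts.get(morpheme)
        suggestions.append(seen.most_common(1)[0][0] if seen else "?")
    return "-".join(suggestions)


# Invented toy data (not from any real language).
counts = train_glosser([
    ("ka-lu", "1SG-see"),
    ("ka-mi", "1SG-eat"),
    ("ta-lu", "2SG-see"),
])
print(suggest_gloss(counts, "ta-mi"))
```

Because the glosses are whatever strings appear in the training pairs, a lookup like this stays theory non-specific: the user's own labels are what gets suggested back.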
slide-65
SLIDE 65

Search

Fieldwork Features: audio/video, auto-glosser, search

  • Basic search (Spreadsheet)

slide-66
SLIDE 66

Search

Fieldwork Features: audio/video, auto-glosser, search

  • Basic search (Spreadsheet)
  • Advanced search (Prototype)

slide-67
SLIDE 67

Search

Fieldwork Features: audio/video, auto-glosser, search

  • Save searches as data lists (Prototype)
  • Basic search (Spreadsheet)
  • Advanced search (Prototype)

slide-68
SLIDE 68

Search

Fieldwork Features: audio/video, auto-glosser, search

  • Search over user-defined tags (Prototype, Spreadsheet)
  • Save searches as data lists (Prototype)
  • Basic search (Spreadsheet)
  • Advanced search (Prototype)

slide-69
SLIDE 69

Special Symbols

Fieldwork Features: audio/video, auto-glosser, search, special symbols

  • Unicode throughout (Prototype, Spreadsheet)

slide-70
SLIDE 70

Special Symbols

Fieldwork Features: audio/video, auto-glosser, search, special symbols

  • Unicode throughout (Prototype, Spreadsheet)
  • Drag-and-drop IPA entry (Prototype)

slide-71
SLIDE 71

Mass Editing

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing

Available in: Prototype

  • Write scripts (bots) to change data en masse (JavaScript)
  • E.g., re-analyze a tense morpheme as aspectual

slide-72
SLIDE 72

Mass Editing

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing

Available in: Prototype

  • Write scripts (bots) to change data en masse (JavaScript)
  • E.g., re-analyze a tense morpheme as aspectual
  • Scripts/bots can also archive, run statistics, and create visualizations
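
To make the re-analysis example concrete, here is a minimal sketch of the kind of bulk change such a script might perform: every occurrence of one morpheme glossed as a tense marker is re-glossed as an aspect marker. The slides say LingSync bots are written in JavaScript; this sketch uses Python purely for illustration and does not use LingSync's bot API. The record layout (`morphemes` and `gloss` fields), the morpheme `na`, and the labels `PST` and `PFV` are all hypothetical.

```python
def regloss(records, morpheme, old_gloss, new_gloss):
    """Return copies of `records` with `morpheme` re-glossed.

    Only positions where the morpheme carries `old_gloss` are changed.
    Records whose morpheme and gloss lines do not align are left as-is.
    """
    updated = []
    for record in records:
        morphemes = record["morphemes"].split("-")
        glosses = record["gloss"].split("-")
        if len(morphemes) != len(glosses):
            updated.append(dict(record))  # misaligned: do not guess
            continue
        new_glosses = [
            new_gloss if (m == morpheme and g == old_gloss) else g
            for m, g in zip(morphemes, glosses)
        ]
        updated.append({**record, "gloss": "-".join(new_glosses)})
    return updated


# Invented toy records (hypothetical field names and data).
records = [
    {"id": 1, "morphemes": "ka-na-lu", "gloss": "1SG-PST-see"},
    {"id": 2, "morphemes": "ta-lu", "gloss": "2SG-see"},
    {"id": 3, "morphemes": "na-mi", "gloss": "PST"},  # misaligned on purpose
]
changed = regloss(records, morpheme="na", old_gloss="PST", new_gloss="PFV")
for record in changed:
    print(record["id"], record["gloss"])
```

Returning new records rather than editing in place keeps the original data available for comparison, which matters when a re-analysis later turns out to be wrong.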

slide-73
SLIDE 73

Customizable

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable

  • Commentable (Spreadsheet, Prototype)
slide-74
SLIDE 74

Customizable

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable

  • Commentable (Spreadsheet, Prototype)
  • Alterable data structure (Prototype)

slide-75
SLIDE 75

Customizable

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable

  • Commentable (Spreadsheet, Prototype)
  • Alterable data structure (Prototype)
  • Learns which fields you use most and displays these by default (Prototype)

slide-76
SLIDE 76

Customizable

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable

  • Commentable (Spreadsheet, Prototype)
  • Alterable data structure (Prototype)
  • Learns which fields you use most and displays these by default (Prototype)

slide-77
SLIDE 77
Import

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import

Available in: Prototype

  • Smart import analyzes the structure of your file
  • Non-lossy: preserves whatever information you have; you decide which field it belongs in

slide-78
SLIDE 78
Import

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import

Available in: Prototype

  • Smart import analyzes the structure of your file
  • Non-lossy: preserves whatever information you have; you decide which field it belongs in
  • Works with CSV (FileMaker Pro, Excel) and Praat TextGrid formats

slide-79
SLIDE 79

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)

slide-80
SLIDE 80

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)
  • Export word and morpheme lists with frequencies (Spreadsheet)

slide-81
SLIDE 81

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)
  • Export word and morpheme lists with frequencies (Spreadsheet)
  • Export selected entries as plain text (Spreadsheet)

slide-82
SLIDE 82

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)
  • Export word and morpheme lists with frequencies (Spreadsheet)
  • Export selected entries as plain text (Spreadsheet)
  • Export to CSV, plain text, and JSON (Prototype)

slide-83
SLIDE 83

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)
  • Export word and morpheme lists with frequencies (Spreadsheet)
  • Export selected entries as plain text (Spreadsheet)
  • LaTeX export with IGT-formatted entries (Prototype)
  • Export to CSV, plain text, and JSON (Prototype)

slide-84
SLIDE 84

Export

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

  • Export corpus to CSV or JSON (Spreadsheet)
  • Export word and morpheme lists with frequencies (Spreadsheet)
  • Export selected entries as plain text (Spreadsheet)
  • LaTeX export with IGT-formatted entries (Prototype)
  • Export to CSV, plain text, and JSON (Prototype)
  • Export all text data with associated audio/video data as a single .zip file (Prototype)
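
A morpheme list with frequencies, one of the exports listed above, can be approximated in a few lines. The sketch below is only an illustration under stated assumptions, not LingSync's export code: it assumes morpheme lines are space-separated words with hyphen-separated morphemes, the function names are made up, and the input data is invented.

```python
import csv
import io
from collections import Counter


def morpheme_frequencies(segmented_lines):
    """Count morphemes in lines such as 'ka-lu ta-lu' (hyphen-segmented words)."""
    counts = Counter()
    for line in segmented_lines:
        for word in line.split():
            counts.update(m for m in word.split("-") if m)
    return counts


def to_csv(counts):
    """Render counts as CSV text, most frequent first, ties in alphabetical order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["morpheme", "frequency"])
    for morpheme, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([morpheme, n])
    return buffer.getvalue()


# Invented toy data (not from any real language).
counts = morpheme_frequencies(["ka-lu ta-lu", "ka-mi"])
print(to_csv(counts))
```

Using the `csv` module rather than joining strings by hand means morphemes containing commas or quotes are still written as valid CSV.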

slide-85
SLIDE 85

Share/Collaborate: activity feed, multi-user concurrent modification, permissions, encryption, versioning, cross-platform, many purposes, offline

Fieldwork Features: audio/video, auto-glosser, search, special symbols, mass editing, customizable, import, export

LingSync

slide-86
SLIDE 86

Share/Collaborate, Fieldwork Features, Structure

slide-87
SLIDE 87

Share/Collaborate, Fieldwork Features, Structure

  • Word documents

slide-88
SLIDE 88

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents

slide-89
SLIDE 89

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Spreadsheet Apps (Excel)

slide-90
SLIDE 90

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)

slide-91
SLIDE 91

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
slide-92
SLIDE 92

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
  • Google Spreadsheets

slide-93
SLIDE 93

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
  • Google Spreadsheets
  • ELAN
slide-94
SLIDE 94

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
  • Google Spreadsheets
  • Toolbox
  • ELAN
slide-95
SLIDE 95

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
  • Google Spreadsheets
  • FLEx
  • Toolbox
  • ELAN
slide-96
SLIDE 96

Share/Collaborate, Fieldwork Features, Structure

  • LaTeX
  • Word documents
  • Desktop Database Apps (Access, FileMaker Pro)
  • Desktop Spreadsheet Apps (Excel)
  • Google Documents
  • Google Spreadsheets
  • FLEx
  • Toolbox
  • ELAN
  • LingSync

slide-97
SLIDE 97

By linguists, for linguists

  • We are linguists and fieldworkers: we want the same things you do
  • Open-source, by volunteers & interns
  • Feel free to join us on GitHub!
  • Always free
  • We have experience using computational linguistics tools for low-resource languages
  • Contact us if you have questions or feature requests, or if you'd like to get involved. We are friendly!
slide-98
SLIDE 98

More about LingSync

  • Website
  • Spreadsheet App
  • Prototype App
  • Video Tutorials
  • White Paper for Grant Committees
  • Community Wiki
  • Milestones and Future Modules
  • Open Source Code
  • Privacy Policy
  • Advisory Board email: info@lingsync.org
  • Technical Members email: support@lingsync.org

slide-99
SLIDE 99

LingSync-related Demo Videos

  • Georgian Together
  • Lexicon Browser
  • Audio Web Service
  • Ecoute Alpha Demo
  • Lexicon Browser demo
slide-100
SLIDE 100
Acknowledgements

  • Alan Bale (Concordia)
  • M.E. Cathcart (U Delaware)
  • Gina Cook (iLanguage Lab Ltd)
  • Jessica Coon (McGill)
  • Theresa Deering (Visit Scotland, iLanguage Lab Ltd)
  • Joel Dunham (UBC)
  • Josh Horner (Amilia, iLanguage Lab Ltd)
  • Yuliya Manyakina (McGill)
  • Elise McClay (McGill)
  • Gretchen McCulloch (McGill)
  • Hisako Noguchi (Concordia)
  • Michael Wagner (McGill)
  • Jesse Pollak (Pomona College)
  • Tobin Skinner (Acquisio, iLanguage Lab Ltd)
  • Xianli Sun (Miami University)
  • More...

slide-101
SLIDE 101

Thank you!

slide-102
SLIDE 102

References

  • B. Bickel, B. Comrie, and M. Haspelmath. 2008. The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses. http://grammar.ucsd.edu/courses/lign120/leipziggloss.pdf
  • Lynnika Butler and Heather van Volkinburg. 2007. Review of FieldWorks Language Explorer (FLEx). Language Documentation & Conservation, 1(1):100–106.
  • MaryEllen Cathcart, Gina Cook, Theresa Deering, Yuliya Manyakina, Gretchen McCulloch, and Hisako Noguchi. 2012. LingSync: A free tool for creating and maintaining a shared database for communities, linguists and language learners. In Robert Henderson and Pablo Pablo, editors, Proceedings of FAMLi II: workshop on Corpus Approaches to Mayan Linguistics 2012, pages 247–250.
  • Henry Davis. 2006. A Teacher's Grammar of Upper St’át’imcets. MS, University of British Columbia, Vancouver, BC.
  • Joel Dunham, Gina Cook, and Joshua Horner. 2014. LingSync & the Online Linguistic Database: New models for the collection and management of data for language communities, linguists and language learners. To appear in the Proceedings of ComputEL: The use of computational methods in the study of endangered languages, workshop at the 52nd Annual Meeting of the Association for Computational Linguistics.
  • Gretchen McCulloch. LingSync.org: A revolutionary new app for field linguistics and language communities. (Slides.)
  • Stuart Robinson, Greg Aumann, and Steven Bird. 2007. Managing fieldwork data with ToolBox and the Natural Language Toolkit. Language Documentation & Conservation, 1(1):44–57.
  • Chris Rogers. 2010. Review of FieldWorks Language Explorer (FLEx) 3.0. Language Documentation & Conservation, 4:78–84.

slide-103
SLIDE 103

Plan of the Tutorial

9:00-9:10    Introduce the LingSync team
9:10-9:40    Why LingSync? (presentation)
9:40-9:50    Learn X / Georgian Together video demo
9:50-10:10   Question period 1
10:10-10:20  Coffee break
10:20-10:35  LingSync Practical Tutorial (presentation)
10:35-10:45  Question period 2
10:45-11:00  Participatory collaborative data entry
11:00-11:30  Question period 3

slide-104
SLIDE 104

Georgian Together