Digital Methods in Language Documentation: Andrea Berez-Kroeker, Colleen Fitzgerald



slide-1
SLIDE 1

2019 Linguistic Institute Course 353:

Digital Methods in Language Documentation

Andrea Berez-Kroeker, University of Hawaiʻi at Mānoa
Colleen Fitzgerald, The University of Texas at Arlington

slide-2
SLIDE 2

Welcome!

Andrea Berez-Kroeker: andrea.berez@hawaii.edu
Colleen Fitzgerald: cmfitz@uta.edu

slide-3
SLIDE 3

Course management:

  • ORBUND: Link to syllabus, schedule, grades, announcements
  • Google Drive folder (bit.ly/DigLangDocLSA19): All slides and readings

slide-4
SLIDE 4

Get to know each other: Speed data-ing!

  • Turn to the person next to you…
  • Find out about that person and their work
  • Be ready to introduce your partner to the class!

slide-5
SLIDE 5

Day 1:

  • What is language documentation?
  • Why digital methods?
  • Basics of data management for LangDoc

slide-6
SLIDE 6

What is language documentation?

slide-7
SLIDE 7

We can’t cover everything here!

  • Grant writing
  • Ethics
  • Grammar (Phonetics, Phonology, Morphology, Syntax)
  • Field methods
  • Culture / Anthropology
  • Data management
  • Equipment
  • Data preservation
  • Creating learning materials
  • Language pedagogy
  • Ethnoscience

slide-8
SLIDE 8

For more information...

http://hs.umt.edu/colang/default.php

slide-9
SLIDE 9

A definition of LangDoc

Language documentation is the endeavor to create:

  • a long-lasting
  • multipurpose record of
  • a language in use
  • in many genres.
slide-10
SLIDE 10

A language documentation is a long-lasting, multipurpose record of a language in use in many genres.

slide-11
SLIDE 11

The documentation

  • Transcripts of recordings: Text files (.pdf, .txt, ELAN files, .doc, Google Docs)
  • Digital recordings: Audio & video files (.wav, .mp3, .mp4)
  • Dictionary/lexicon: Database or spreadsheet files (.xls, .ods, Google Sheet, FLEx, Toolbox, Filemaker)
  • Images: Pictures of people, or notebooks, or specimens (.jpg, .tiff, .png)

The purposes documentation serves

  • Books (Grammars, dictionaries, readers)
  • Websites (online dictionaries, etc.)
  • Portable sound collections (music, stories on DVD or mp3)
  • Movies/videos, including with subtitles
  • Scholarly pubs (articles, theses, dissertations)
  • Mobile apps

slide-12
SLIDE 12

Readings folder: What is LangDoc?

Berge, Anna. 2010. Adequacy in documentation. In Grenoble, Lenore A. & N. Louanna Furbee (eds.), Language documentation: Practice and values, 51-65. Amsterdam: John Benjamins.

Himmelmann, Nikolaus P. 1998. Documentary and descriptive linguistics. Linguistics 36: 161-195.

Himmelmann, Nikolaus P. 2006. Language documentation: What is it good for? In Jost Gippert, Nikolaus P. Himmelmann, & Ulrike Mosel (eds.), Essentials of language documentation, 1-30. Berlin: Mouton de Gruyter.

Hinton, Leanne. 2001. Language revitalization: An overview. In Hinton, Leanne & Ken Hale (eds.), The green book of language revitalization in practice, 3-17. Leiden: Brill.

Holton, Gary. 2014. Mediating language documentation. In Nathan, David & Peter K. Austin (eds.), Language documentation and description, vol 12: Special issue on language documentation and archiving, 37-52. London: SOAS.

slide-13
SLIDE 13

Lüpke, Frederike. 2010. Research methods in language documentation. In Austin, Peter K. (ed.), Language documentation and description, vol 7, 55-104. London: SOAS.

McDonnell, Bradley, Andrea L. Berez-Kroeker & Gary Holton (eds.). 2018. Reflections on language documentation 20 years after Himmelmann 1998, Language Documentation & Conservation Special Publication 15. Honolulu: University of Hawai’i Press.

Woodbury, Anthony C. 2003. Defining documentary linguistics. In Austin, Peter K. (ed.), Language documentation and description, vol 1, 35-51. London: SOAS.

Woodbury, Anthony C. 2011. Language documentation. In Austin, Peter K. & Julia Sallabank (eds.), The Cambridge handbook of endangered languages, 159-186. Cambridge: Cambridge University Press.

slide-14
SLIDE 14

Why a class on digital methods in language documentation?

slide-15
SLIDE 15

Because digital data has a few problems with longevity.

slide-16
SLIDE 16

Digital data problems with longevity

Three central problems need to be solved:

  • The media problem
  • The format problem
  • The storage and access problem

slide-17
SLIDE 17

The media problem

The more advanced our technology becomes, the more ephemeral it is:

  • Hard drives: 5 years <
  • CDs/DVDs: 10 years <
  • Cassette tapes: 30 years <
  • Paper: 100-200 years (+) <
  • Stone tablets: ∞
slide-18
SLIDE 18

The media problem

Not only do media degrade…

...devices for reading them become obsolete!

slide-19
SLIDE 19

...requiring data rescuers and archivists to use machines like “Frank”

slide-20
SLIDE 20

The format (or encoding) problem

Proprietary formats are controlled by intellectual property law and are subject to the whims of the developers

  • Cease development or support
  • Charge fees to access data

Example: HyperCard dictionaries (e.g., Gwich’in)

  • Data now ostensibly lost
slide-21
SLIDE 21

The storage & access problem

Data cannot be effectively stored for longevity by individuals, who

  • Lack expertise in data migration to new formats
  • Inevitably lose interest, retire, or die

Only an archive with an institutional commitment to migrating and backing up data is an effective locus of long-term storage

slide-22
SLIDE 22

The storage & access problem

  • Data must be discoverable and (correctly, ethically) accessible.
  • Without proper metadata, we don’t know anything about the data, or even that it exists!
  • Data that isn’t accessible by anyone is useless.

slide-23
SLIDE 23

Basics of data management:

  • File naming & organization
  • Storage and backing up
  • Metadata
  • Archiving

slide-24
SLIDE 24

LangDoc digital workflow overview
slide-25
SLIDE 25

File naming & organization

slide-26
SLIDE 26

Think about file naming & organization early

Naming and organizing your files is a DAY ONE activity.

If you start with a poor system, you will have trouble

  • Locating files
  • Distinguishing between files

You need to plan carefully.

slide-27
SLIDE 27

Think about file naming & organization early

Being organized includes

  • Documenting your plans for naming and organization
  • Keeping a metadata catalog that tracks key information (more on this later today)
  • Sticking to your plan

slide-28
SLIDE 28

Tips for naming and organizing files

https://youtu.be/c-Mcp5ozgx0

slide-29
SLIDE 29

Tips for naming and organizing files

In file naming:

  • Use a unique ID that is not dependent on file structure
  • File names can be semantic or non-semantic
  • No spaces or funny characters
  • Select a convention and stick with it.
  • Check with your archive!

Folder organization is for your convenience, but should not be used for file identification.
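
The naming tips above can be checked mechanically before files pile up. The following is a minimal sketch, not a standard tool: the allowed character set (letters, digits, hyphens, underscores, and a single extension) is an assumed convention, and the second folder path is hypothetical. It flags names containing spaces or unusual characters, and names that are only unique because of the folder they sit in.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed convention: letters, digits, hyphens and underscores, then one extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def check_names(paths):
    """Return (unsafe, duplicated) file names, ignoring the folders they sit in."""
    names = [PurePosixPath(p).name for p in paths]
    unsafe = sorted({n for n in names if not SAFE_NAME.match(n)})
    # Compare case-insensitively, since some file systems do not distinguish case.
    counts = Counter(n.lower() for n in names)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return unsafe, duplicated


paths = [
    "Ahtna/Tazlina_Village/Louisa_Jones/fish_story.wav",
    "Ahtna/Copper_Center/Another_Speaker/fish_story.wav",  # hypothetical second folder
    "Ahtna/aht-LJ-20170729-fishstory.wav",
    "Ahtna/fish story (final).wav",  # spaces and parentheses
]
unsafe, duplicated = check_names(paths)
print(unsafe)      # names that break the assumed character rule
print(duplicated)  # names that repeat once folders are ignored
```

Your archive may allow or forbid different characters, so adjust the pattern after checking with them.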

slide-30
SLIDE 30

“Ahtna” “Tazlina_Village” “Louisa_Jones” “fish_story.wav”

Is this a good folder structure and file naming strategy?

No.

  • Dependent on folder structure for identification.
  • Could be many fish stories, so not unique.
  • Files can get moved.
slide-31
SLIDE 31

“Ahtna” “Tazlina_Village” “Louisa_Jones”

“aht-LJ-20170729-fishstory.wav”

How about this one?

Yes!

  • File name is unique
  • “Semantic” file name with a lot of identifying info for your convenience
  • File structure also only for your convenience
  • May be too long for some tastes (or servers)
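
A semantic name like the one on this slide can be generated rather than typed by hand, which helps you stick to one convention. This is a hedged sketch: the function name `semantic_name` and the field order (language code, speaker initials, date, short label) are assumptions read off the example file name, not a prescribed scheme.

```python
from datetime import date


def semantic_name(lang_code, speaker_initials, recorded_on, label, ext="wav"):
    """Build a name such as aht-LJ-20170729-fishstory.wav (assumed field order)."""
    # Keep only ASCII letters and digits so the label has no spaces or funny characters.
    slug = "".join(ch for ch in label.lower() if ch.isascii() and ch.isalnum())
    return f"{lang_code.lower()}-{speaker_initials.upper()}-{recorded_on:%Y%m%d}-{slug}.{ext}"


print(semantic_name("aht", "LJ", date(2017, 7, 29), "fish story"))
# prints aht-LJ-20170729-fishstory.wav
```

Writing the date as YYYYMMDD means files from the same language and speaker sort chronologically.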

slide-32
SLIDE 32

“Ahtna”

“aht-LJ-20170729-fishstory.wav”

Or this one?

Yes!

  • Embedded file structure not necessary
  • Files will order alphabetically

slide-33
SLIDE 33

“Ahtna”

“ABK-0001.wav”

Or even this one?

Yes!

  • File name is still unique!
  • Non-semantic
  • All catalog info is kept in your metadata catalog
  • FILE NAMES NEVER REPLACE YOUR METADATA!

slide-34
SLIDE 34

Thinking about organizing your files & folders

  • Most common is to organize by session
      ○ Multimedia folders based on a single event

  • Check with your archive
  • Track location of files in your metadata catalog

(Video talks about organizing for deposit, but will also work for your own purposes)

https://youtu.be/ugQrzBHOUws

slide-35
SLIDE 35

Storage and Backup

slide-36
SLIDE 36

Storage and Backup

  • Data must be protected during collection and processing (“in the field and lab”)
  • Protected for integrity, security, and access
  • This is not the same as your plans to keep data safe, secure, and accessible after it leaves your lab, although they may overlap in execution.

slide-37
SLIDE 37

Storage and Backup for data integrity

Your data is vulnerable during collection and processing!

  • Electronic/digital dangers: Broken drives, power surges, viruses
  • Environmental dangers: Water damage, fire damage, insects, mold
  • Human dangers: Theft, loss, overwriting, dropping/crushing

slide-38
SLIDE 38

A good rule to remember

LOCKSS:

Lots Of Copies Keep Stuff Safe!

https://www.lockss.org/

slide-39
SLIDE 39

Storage and Backup for data integrity: LOCKSS

How will you redundantly back up your data?

In the field:

  • Multiple hard drives?
  • Flash cards?
  • Where will they be stored?
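
Making redundant copies only protects integrity if each copy is actually identical to the original. As one possible approach (an illustration, not a tool recommended in these slides), the sketch below copies a file to several destinations and compares SHA-256 checksums. The function names `sha256_of` and `backup_file` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, destinations):
    """Copy source into each destination folder; report whether each copy matches."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy_path = dest_dir / source.name
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps where possible
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```

Calling `backup_file("ABK-0001.wav", ["/path/to/drive_a", "/path/to/drive_b"])` (placeholder paths) returns a dictionary mapping each copy to `True` or `False`. Any `False` means that copy should be remade.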

slide-40
SLIDE 40

One in your cabin... ...one in your car... ...and one at the cultural center.

slide-41
SLIDE 41

Another approach to LOCKSS: The 3-2-1 Principle

  • (At least) 3 copies on
  • (At least) 2 types of storage media* with
  • (At least) 1 off-site

*Different brands of hard drive, or a hard drive and flash storage, or a hard drive and DVDs, or….
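
The 3-2-1 Principle is simple enough to write down as a check on your backup plan. The sketch below is illustrative only: the `Copy` record and its fields are assumptions, and the media types assigned to the cabin, car, and cultural center copies are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str
    medium: str      # e.g. "hard drive", "flash storage", "DVD"
    off_site: bool


def satisfies_3_2_1(copies):
    """At least 3 copies, on at least 2 types of media, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.off_site for c in copies)
    )


plan = [
    Copy("cabin", "hard drive", off_site=False),
    Copy("car", "flash storage", off_site=False),
    Copy("cultural center", "hard drive", off_site=True),
]
print(satisfies_3_2_1(plan))      # prints True
print(satisfies_3_2_1(plan[:2]))  # prints False: only two copies, none off-site
```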

slide-42
SLIDE 42

What about cloud storage?

  • Fine for convenience and sharing with collaborators.
  • Not to be considered primary backup, ever.
  • Considerations: data ownership, cost, security, the provider going out of business (e.g., Wuala).
  • Higher security: SpiderOak, Tresorit.
  • Easier, more common: Google Drive, Dropbox, iCloud.

slide-43
SLIDE 43

Metadata

slide-44
SLIDE 44

Metadata

  • Metadata describe, explain, locate, or otherwise represent your language documentation.
  • Metadata make it easier to find, retrieve, (re-)use, manage, understand, and cite the data.
  • Metadata creation is best done by you at the time of data collection.
  • Don’t put it off!
slide-45
SLIDE 45

Descriptive metadata

Descriptive metadata are used for discovery, identification and retrieval (e.g., names, dates, languages, locations, keywords).

https://youtu.be/ZlEQqBVkTB4

slide-46
SLIDE 46

What info should I collect in my metadata?

  • Who
      ○ Data creator/researcher
      ○ Other research participants like speakers/signers, videographers, transcribers, translators, etc.
  • What
      ○ What language?
      ○ What topic?
  • When
      ○ Date recording was made
  • Where
      ○ Location recording was made

slide-47
SLIDE 47

A simple spreadsheet will work well for now

Filename  Date        Language  Speaker                       Topic                         Location           Recording Device
AHT-0001  2019-04-23  Ahtna     Pete, Markle                  Fishing on the Tazlina river  Copper Center, AK  Zoom H2
AHT-0002  2019-04-24  Ahtna     Maxim, Jeannie                School days                   Copper Center, AK  Zoom H4n
AHT-0003  2019-04-26  Ahtna     Pete, Markle||Maxim, Jeannie  Grammaticality judgments      Copper Center, AK  Zoom H2

slide-48
SLIDE 48

Always check with your archive early

  • Each archive/repository uses a specific metadata schema that you should follow at the time of data creation
  • Contact your archive early to determine what information you should collect
  • I’ve put Kaipuleohone’s metadata spreadsheet in our class Google Drive
slide-49
SLIDE 49

Archiving and Sharing

slide-50
SLIDE 50

A few definitions

Repository

  • Originally: a physical location where documents were stored
  • In the digital age, a software platform with associated online storage for digital materials.

Archive

  • Has somehow catalogued and organized a collection
  • Often has a place where you can interact with these materials, either online or physically.

In LangDoc, “repository” and “archive” are used interchangeably.

slide-51
SLIDE 51

A website is not an archive! Your computer is not an archive!

An archive has an institutional commitment to preserving, cataloging, and migrating your digital files.

Use a real archive! We’re friendly and want to work with you.

slide-52
SLIDE 52

Advice for Archiving Data

  • Archive staff won’t process your data for you
      ○ Build in time for this work!
  • Find out the file and metadata requirements of your intended archive
      ○ Use these as part of your own workflow
  • Ask repository staff early for any documentation or guides they might have available.

slide-53
SLIDE 53

Where to archive your LangDoc

Any member of the Digital Endangered Languages and Musics Archive Network (DELAMAN): http://www.delaman.org/

      ○ The Archive of the Indigenous Languages of Latin America
      ○ The Endangered Language Archive
      ○ The Language Archive
      ○ Kaipuleohone Language Archive (U of Hawaiʻi at Mānoa)
      ○ Alaska Native Language Archive (ANLA)
      ○ American Philosophical Society (APS)
      ○ ...

slide-54
SLIDE 54

What to archive? Open formats:

https://youtu.be/2JCpg6ICr8M

slide-55
SLIDE 55

Ethical sharing

Be as open as possible, as closed as necessary.

slide-56
SLIDE 56

Ethical sharing

Ethically sharing data helps to preserve the health of our field.

Data sharing helps with

  • Reproduction of scholarly work.
  • Advancement of new studies without the need to collect new data.
  • Collaboration on future projects.
  • Cross-disciplinary work.
  • Citations and overall elevation of your scholarly profile.
  • Creating accessible cultural and historical resources for the community.
slide-57
SLIDE 57

Sharing: Questions to Consider

  • Are there ethical or legal restrictions on what you can share?
  • Are any sensitive data anonymized properly?
  • Where (e.g. which repository) will you share your data?
  • Will you need any access restrictions for privacy/ethics, or can your dataset be Open? Check this decision tree.

  • How will you track use of your dataset?
  • Should you use a particular format or arrangement to facilitate a potential reuse project?
slide-58
SLIDE 58

Readings folder: Data management in LangDoc

Berez, Andrea L., Tana Finnesand & Karen Linnell. 2012. C’ek’aedi Hwnax, the Ahtna Regional Linguistic and Ethnographic Archive. Language Documentation & Conservation 6. 256–251.

Berez-Kroeker, Andrea L., Lauren Gawne, Susan Smythe Kung, Barbara F. Kelly, Tyler Heston, Gary Holton, Peter Pulsifer, David I. Beaver, Shobhana Chelliah, Stanley Dubinsky, Richard P. Meier, Nick Thieberger, Keren Rice & Anthony C. Woodbury. 2018. Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56: 1-18.

Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3): 557-582.

Bowern, Claire. 2011. Planning a language documentation project. In Grenoble, Lenore A. & N. Louanna Furbee (eds.). 2010. Language documentation: Practice and values, 459-483. Amsterdam: John Benjamins.
slide-59
SLIDE 59

Conathan, Lisa. 2011. Archiving and language documentation. In Austin, Peter K. & Julia Sallabank (eds.), Cambridge Handbook of Endangered Languages, 235-254. Cambridge: Cambridge University Press.

Garrett 2014 Participant driven

Gawne, Lauren, Barbara F. Kelly, Andrea L. Berez-Kroeker & Tyler Heston. 2017. Putting practice into words: The state of data and methods transparency in grammatical descriptions. Language Documentation & Conservation 11: 157-189.

Gehr, Susan. 2013. Breath of Life: Revitalizing California’s native languages through archives. MA thesis, San Jose State University.

Good, Jeff. 2010. Valuing technology: Finding the linguist’s place in a new technological universe. In Grenoble, Lenore A. & N. Louanna Furbee (eds.), Language documentation: Practice and values, 111-131. Amsterdam: John Benjamins.

slide-60
SLIDE 60

Good, Jeff. 2011. Data and language documentation. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge handbook of endangered languages, 212-234. Cambridge: Cambridge University Press.

Henke, Ryan, & Andrea L. Berez-Kroeker. 2016. A brief history of archiving in language documentation with an annotated bibliography. Language Documentation & Conservation 10: 411-457.

Thieberger, Nicholas & Andrea L. Berez. 2012. Linguistic data management. In Nicholas Thieberger (ed.), Oxford Handbook of Linguistic Fieldwork, 90-118. Oxford: Oxford University Press.

Woodbury, Anthony C. 2014. Archives and audiences: Toward making endangered language documentations people can read, use, understand, and admire. In Nathan, David & Peter K. Austin (eds.), Language Documentation and Description, vol 12: Special Issue on Language Documentation and Archiving, 19-36. London: SOAS.