2019 Linguistic Institute Course 353:
Digital Methods in Language Documentation
Andrea Berez-Kroeker, University of Hawaiʻi at Mānoa
Colleen Fitzgerald, The University of Texas at Arlington
Welcome!
Andrea Berez-Kroeker: andrea.berez@hawaii.edu
Colleen Fitzgerald: cmfitz@uta.edu
ORBUND: Link to syllabus, schedule, grades, announcements
Google Drive folder (bit.ly/DigLangDocLSA19): All slides and readings
Turn to the person next to you… Find out about that person and their work. Be ready to introduce your partner to the class!
○ Grant writing
○ Ethics
○ Grammar (Phonetics, Phonology, Morphology, Syntax)
○ Field methods
○ Culture / Anthropology
○ Data management
○ Equipment
○ Data preservation
○ Creating learning materials
○ Language pedagogy
○ Ethnoscience
The documentation:
○ Transcripts of recordings: Text files (.pdf, .txt, ELAN files, .doc, Google Docs)
○ Digital recordings: Audio & video files (.wav, .mp3, .mp4)
○ Dictionary/lexicon: Database or spreadsheet files (.xls, .ods, Google Sheets, FLEx, Toolbox, FileMaker)
○ Images: Pictures, notebooks, or specimens (.jpg, .tiff, .png)
The purposes documentation serves:
○ Books (grammars, dictionaries, readers)
○ Websites (online dictionaries, etc.)
○ Portable sound collections (music, stories on DVD or mp3)
○ Movies/videos, including with subtitles
○ Scholarly pubs (articles, theses, dissertations)
○ Mobile apps
Berge, Anna. 2010. Adequacy in documentation. In Grenoble, Lenore A. & N. Louanna Furbee (eds.), Language documentation: Practice and values, 51-65. Amsterdam: John Benjamins.
Himmelmann, Nikolaus P. 1998. Documentary and descriptive linguistics. Linguistics 36: 161-195.
Himmelmann, Nikolaus P. 2006. Language documentation: What is it good for? In Jost Gippert, Nikolaus P. Himmelmann & Ulrike Mosel (eds.), Essentials of language documentation, 1-30. Berlin: Mouton de Gruyter.
Hinton, Leanne. 2001. Language revitalization: An overview. In Hinton, Leanne & Ken Hale (eds.), The green book of language revitalization in practice, 3-17. Leiden: Brill.
Holton, Gary. 2014. Mediating language documentation. In Nathan, David & Peter K. Austin (eds.), Language documentation and description, vol 12: Special issue on language documentation and archiving, 37-52. London: SOAS.
Lüpke, Frederike. 2010. Research methods in language documentation. In Austin, Peter K. (ed.), Language documentation and description, vol 7, 55-104. London: SOAS.
McDonnell, Bradley, Andrea L. Berez-Kroeker & Gary Holton (eds.). 2018. Reflections on language documentation 20 years after Himmelmann 1998, Language Documentation & Conservation Special Publication 15. Honolulu: University of Hawaiʻi Press.
Woodbury, Anthony C. 2003. Defining documentary linguistics. In Austin, Peter K. (ed.), Language documentation and description, vol 1, 35-51. London: SOAS.
Woodbury, Anthony C. 2011. Language documentation. In Austin, Peter K. & Julia Sallabank (eds.), The Cambridge handbook of endangered languages, 159-186. Cambridge: Cambridge University Press.
Three central problems need to be solved:
○ The media problem
○ The format problem
○ The storage and access problem
The more advanced our technology becomes, the more ephemeral it is:
Not only do media degrade… ...devices for reading them become obsolete!
Proprietary formats are controlled by intellectual property law and are subject to the whims of the developers
Example: HyperCard dictionaries (e.g., Gwich’in)
Data cannot be effectively stored for longevity by individuals, who
Only an archive with an institutional commitment to migrating and backing up data is an effective locus of long-term storage
Data must be discoverable and (correctly, ethically) accessible. Without proper metadata, we don’t know anything about the data… ...or even that it exists! Data that isn’t accessible by anyone is useless.
Naming and organizing your files is a DAY ONE activity. If you start with a poor system, you will have trouble locating files and distinguishing between files. You need to plan carefully.
Being organized includes:
○ Documenting your plans for naming and organization
○ Keeping a metadata catalog that tracks key information (more on this later today)
○ Sticking to your plan
https://youtu.be/c-Mcp5ozgx0
In file naming:
Folder organization is for your convenience, but should not be used for file identification.
“Ahtna” “Tazlina_Village” “Louisa_Jones” “fish_story.wav”
Is this a good folder structure and file naming strategy?
structure for identification.
stories, so not unique.
“Ahtna” “Tazlina_Village” “Louisa_Jones”
“aht-LJ-20170729-fishstory.wav”
How about this one?
a lot of identifying info for your convenience
your convenience
tastes (or servers)
“Ahtna”
“aht-LJ-20170729-fishstory.wav”
Or this one?
not necessary
alphabetically
“Ahtna”
“ABK-0001.wav”
Or even this one?
your metadata catalog
REPLACE YOUR METADATA!
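The naming examples above can be checked mechanically. The following is a minimal sketch, not a prescribed convention: the two regular expressions are assumptions modeled on the example names "aht-LJ-20170729-fishstory.wav" and "ABK-0001.wav", and the second "fish_story.wav" path is a hypothetical file added to show how a name that relies on its folders for identification can collide.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed patterns, modeled on the two example names in the slides.
# Descriptive: language code, speaker initials, YYYYMMDD date, short title.
DESCRIPTIVE = re.compile(r"^[a-z]{3}-[A-Z]{2,3}-\d{8}-[a-z0-9]+\.[a-z0-9]+$")
# Sequential: a short prefix plus a four-digit running number.
SEQUENTIAL = re.compile(r"^[A-Z]{2,4}-\d{4}\.[a-z0-9]+$")


def follows_convention(filename: str) -> bool:
    """Return True if the bare file name matches either assumed pattern."""
    return bool(DESCRIPTIVE.match(filename) or SEQUENTIAL.match(filename))


def duplicate_names(paths):
    """Return file names that occur more than once once folders are ignored."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


paths = [
    "Ahtna/Tazlina_Village/Louisa_Jones/fish_story.wav",
    "Ahtna/Copper_Center/Markle_Pete/fish_story.wav",  # hypothetical second file
    "Ahtna/aht-LJ-20170729-fishstory.wav",
    "Ahtna/ABK-0001.wav",
]

for path in paths:
    name = PurePosixPath(path).name
    status = "ok" if follows_convention(name) else "does not follow the plan"
    print(f"{name}: {status}")

print("Not unique without folders:", duplicate_names(paths))
```

A check like this only confirms that names are well formed and unique. It says nothing about who is speaking or what the recording contains, which is the job of the metadata catalog.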
○ Multimedia folders based on a single event
(Video talks about organizing for deposit, but will also work for your own purposes)
https://youtu.be/ugQrzBHOUws
Data must be protected during collection and processing (“in the field and lab”).
○ Protected for integrity, security, and access.
○ This is not the same as your plans to keep data safe, secure, and accessible after it leaves your lab, although they may overlap in execution.
Your data is vulnerable during collection and processing!
○ Electronic/digital dangers: Broken drives, power surges, viruses
○ Environmental dangers: Water damage, fire damage, insects, mold
○ Human dangers: Theft, loss, overwriting, dropping/crushing
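One way to guard integrity while copying files in the field is to compare checksums before and after each copy, and to refuse to overwrite an existing file. The sketch below is one possible approach using only the Python standard library; the function names `sha256_of` and `copy_verified` are made up for illustration and are not part of any tool mentioned in the slides.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source, destination_dir):
    """Copy a file into destination_dir and confirm the copy is identical.

    Raises FileExistsError rather than overwriting an existing file, and
    OSError if the checksums of the source and the copy differ.
    """
    destination_dir = Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / Path(source).name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"checksum mismatch for {target}")
    return target
```

Calling `copy_verified("ABK-0001.wav", "/Volumes/backup1")` (a hypothetical drive path) would copy the recording and raise an error if the copy differs from the original or if a file of that name is already there.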
https://www.lockss.org/
How will you redundantly back up your data? In the field: Multiple hard drives? Flash cards? Where will they be stored?
One in your cabin... ...one in your car... ...and one at the cultural center.
(At least) 3 copies
on (at least) 2 types of storage media*
with (at least) 1 off-site
*Different brands of hard drive, or a hard drive and flash storage, or a hard drive and DVDs, or…
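The 3-2-1 rule above is simple enough to check against a written backup plan. This is a small illustrative sketch: the `BackupCopy` class and `check_3_2_1` function are hypothetical names, and the media types and the choice of which location counts as off-site in the example plan are assumptions, not details from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str        # where the copy lives
    media_type: str   # e.g. "hard drive", "flash card", "DVD"
    off_site: bool    # True if stored away from the working location


def check_3_2_1(copies):
    """Return a list of ways the plan falls short of the 3-2-1 rule."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({copy.media_type for copy in copies}) < 2:
        problems.append("fewer than 2 types of storage media")
    if not any(copy.off_site for copy in copies):
        problems.append("no off-site copy")
    return problems


# Example plan based on the cabin / car / cultural center slide.
# Media types and the off-site flag are assumed for illustration.
plan = [
    BackupCopy("cabin", "hard drive", off_site=False),
    BackupCopy("car", "flash card", off_site=False),
    BackupCopy("cultural center", "hard drive", off_site=True),
]

print(check_3_2_1(plan) or "plan satisfies 3-2-1")
```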
Cloud storage:
○ Fine for convenience and sharing with collaborators.
○ Not to be considered primary backup, ever.
○ Considerations: data ownership, cost, security, and the provider going out of business (e.g., Wuala).
○ Higher security: SpiderOak, Tresorit.
○ Easier, more common: Google Drive, Dropbox, iCloud.
represents your language documentation.
understand, and cite the data.
collection.
Descriptive metadata are used for discovery, identification and retrieval (e.g., names, dates, languages, locations, keywords).
https://youtu.be/ZlEQqBVkTB4
○ Data creator/researcher ○ Other research participants like speakers/signers, videographers, transcribers, translators, etc.
○ What language? ○ What topic?
○ Date recording was made
○ Location recording was made
Filename  Date        Language  Speaker                       Topic                         Location           Recording Device
AHT-0001  2019-04-23  Ahtna     Pete, Markle                  Fishing on the Tazlina river  Copper Center, AK  Zoom H2
AHT-0002  2019-04-24  Ahtna     Maxim, Jeannie                School days                   Copper Center, AK  Zoom H4n
AHT-0003  2019-04-26  Ahtna     Pete, Markle||Maxim, Jeannie  Grammaticality judgments      Copper Center, AK  Zoom H2
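A catalog like the one above is easy to keep machine-readable. The sketch below stores the same three rows as CSV and reads them back with the Python standard library. Two things are assumptions rather than facts from the slides: that the catalog is saved as CSV with quoted values, and that "||" separates multiple speakers within one cell. The function name `load_catalog` is made up for illustration.

```python
import csv
import io
from datetime import date

# The three sample rows from the catalog, stored as CSV.
# Values that contain commas (speaker names, locations) are quoted.
CATALOG_CSV = '''Filename,Date,Language,Speaker,Topic,Location,Recording Device
AHT-0001,2019-04-23,Ahtna,"Pete, Markle",Fishing on the Tazlina river,"Copper Center, AK",Zoom H2
AHT-0002,2019-04-24,Ahtna,"Maxim, Jeannie",School days,"Copper Center, AK",Zoom H4n
AHT-0003,2019-04-26,Ahtna,"Pete, Markle||Maxim, Jeannie",Grammaticality judgments,"Copper Center, AK",Zoom H2
'''


def load_catalog(text):
    """Parse catalog text into a list of records, one per file.

    Dates become datetime.date objects and the Speaker cell is split on
    "||" (assumed here to be the multi-value separator). Raises ValueError
    if a Filename appears more than once.
    """
    records = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        if row["Filename"] in seen:
            raise ValueError(f"duplicate Filename in catalog: {row['Filename']}")
        seen.add(row["Filename"])
        row["Date"] = date.fromisoformat(row["Date"])
        row["Speaker"] = row["Speaker"].split("||")
        records.append(row)
    return records


catalog = load_catalog(CATALOG_CSV)
for record in catalog:
    speakers = "; ".join(record["Speaker"])
    print(record["Filename"], record["Date"].isoformat(), speakers)
```

Writing dates as YYYY-MM-DD, as the catalog does, means they sort correctly as plain text and parse without ambiguity.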
at the time of data creation
Repository
Archive
In language documentation, “repository” and “archive” are used interchangeably.
An archive has an institutional commitment to preserving, cataloging, and migrating your digital files. Use a real archive! We’re friendly and want to work with you.
○ build in time for this work!
○ Use these as part of your own workflow
available early.
Any member of the Digital Endangered Languages and Musics Archive Network (DELAMAN): http://www.delaman.org/
○ The Archive of the Indigenous Languages of Latin America
○ The Endangered Language Archive
○ The Language Archive
○ Kaipuleohone Language Archive (U of Hawaiʻi at Mānoa)
○ Alaska Native Language Archive (ANLA)
○ American Philosophical Society (APS)
○ ...
https://youtu.be/2JCpg6ICr8M
Ethically sharing data helps to preserve the health of our field. Data sharing helps with
Check this decision tree.
Berez, Andrea L., Tana Finnesand & Karen Linnell. 2012. C’ek’aedi Hwnax, the Ahtna Regional Linguistic and Ethnographic Archive. Language Documentation & Conservation 6. 256–251.
Berez-Kroeker, Andrea L., Lauren Gawne, Susan Smythe Kung, Barbara F. Kelly, Tyler Heston, Gary Holton, Peter Pulsifer, David I. Beaver, Shobhana Chelliah, Stanley Dubinsky, Richard P. Meier, Nick Thieberger, Keren Rice & Anthony C. Woodbury. 2018. Reproducible research in linguistics: A position statement on data citation and attribution in
Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3): 557-582.
Bowern, Claire. 2011. Planning a language documentation project. In Grenoble, Lenore A. & N. Louanna Furbee (eds.).
Conathan, Lisa. 2011. Archiving and language documentation. In Austin, Peter K. & Julia Sallabank (eds.), Cambridge Handbook of Endangered Languages, 235-254. Cambridge: Cambridge University Press.
Garrett 2014 Participant driven
Gawne, Lauren, Barbara F. Kelly, Andrea L. Berez-Kroeker & Tyler Heston. 2017. Putting practice into words: The state of data and methods transparency in grammatical descriptions. Language Documentation & Conservation 11: 157-189.
Gehr, Susan. 2013. Breath of Life: Revitalizing California’s native languages through archives. MA thesis, San Jose State University.
Good, Jeff. 2010. Valuing technology: Finding the linguist’s place in a new technological universe. In Grenoble, Lenore A. & N. Louanna Furbee (eds.), Language documentation: Practice and values, 111-131. Amsterdam: John Benjamins.
Good, Jeff. 2011. Data and language documentation. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge handbook of endangered languages, 212-234. Cambridge: Cambridge University Press.
Henke, Ryan, & Andrea L. Berez-Kroeker. 2016. A brief history of archiving in language documentation with an annotated
Thieberger, Nicholas & Andrea L. Berez. 2012. Linguistic data management. In Nicholas Thieberger (ed.), Oxford Handbook of Linguistic Fieldwork, 90-118. Oxford: Oxford University Press.
Woodbury, Anthony C. 2014. Archives and audiences: Toward making endangered language documentations people can read, use, understand, and admire. In Nathan, David & Peter K. Austin (eds.), Language Documentation and Description, vol 12: Special Issue on Language Documentation and Archiving, 19-36. London: SOAS.