
Review, Access, and Triage of Mail (RATOM) Jamie Patrick-Burns

Review, Access, and Triage of Mail (RATOM) Jamie Patrick-Burns Digital Archivist, State Archives of North Carolina Christopher (Cal) Lee University of North Carolina at Chapel Hill Best Practices Exchange Columbus, Ohio April 30, 2019


  1. Review, Access, and Triage of Mail (RATOM) Jamie Patrick-Burns Digital Archivist, State Archives of North Carolina Christopher (Cal) Lee University of North Carolina at Chapel Hill Best Practices Exchange Columbus, Ohio April 30, 2019

  2. Motivation – Selection/Appraisal • Despite progress on various technologies to support data management and digital preservation, relatively little progress on software support for the core activities of selection and appraisal • Selection/appraisal decisions are based on various patterns • When patterns can be identified algorithmically, software can assist the process • LAMs frequently want to take actions that reflect contextual relationships • Timeline representations and visualizations can also provide useful, high-level views of materials

  3. Motivation - Email • 48 years of email creation • Hundreds of billions of messages generated every day • Most has little long-term retention value, but some absolutely does • Despite presence of numerous other modalities, email still deeply embedded in activities, serving as massive source of evidence and information • Often found in collections and acquisitions with other types of materials http://hci.stanford.edu/~jheer/projects/enron/v1/

  4. Background – BitCurator (2011-2014) • BitCurator environment allows LAMs to: • acquire data from media • characterize and triage data • expose numerous data points that can inform selection and appraisal decisions, including file types, file sizes, timestamps, original directory structures, potentially sensitive features • Output is generally static • Users have expressed interest in additional ways to iteratively make judgements
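The data points listed on this slide (file types, file sizes, timestamps, original directory structures) can be illustrated with a minimal sketch. This is not how the BitCurator environment gathers them; it is a standard-library approximation, and the function name `characterize` and the output field names are assumptions made for illustration.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def characterize(root_dir):
    """Walk root_dir and record basic appraisal data points for each file.

    Hypothetical helper: the field names are illustrative, not a BitCurator format.
    """
    rows = []
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                # Relative path preserves the original directory structure.
                "path": os.path.relpath(path, root_dir),
                # Type is guessed from the file extension only.
                "type": mimetypes.guess_type(name)[0] or "unknown",
                "size": info.st_size,
                "modified": modified.isoformat(),
            })
    return rows


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "notes.txt"), "w") as handle:
            handle.write("hello")
        for row in characterize(demo_dir):
            print(row)
```

A static listing like this matches the "output is generally static" point above; iterative review would need the rows to be stored and revisited as judgements change.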

  5. http://bitcurator.github.io/

  6. Background – BitCurator Access and BitCurator NLP (2014-2018) • Developed and repurposed software (topic modelling and named entity extraction) that can facilitate appraisal/selection

  7. TOMES and the State Archives of NC • Image: State highway system of NC, 1936, NC State Highway Commission (MC.150.1936na). NC Maps, https://dc.lib.unc.edu/cdm/ref/collection/ncmaps/id/760

  8. What was Transforming Online Mail with Embedded Semantics (TOMES)? • NHPRC-funded grant, 2015-2018 • Appraisal, preservation, and processing challenges of email in state government • Utah State Archives and Kansas Historical Society partners • Building on EMCAPP (EAXS XML) • More information: • https://www.ncdcr.gov/resources/records-management/tomes • https://github.com/StateArchivesOfNorthCarolina/tomes-project

  9. TOMES objectives • Identify email accounts of public officials with records of enduring value (Capstone methodology) • Produce cross-platform .pst to EAXS XML parser • Publish NLP dictionary designed to flag named entities unique to government at the state and local level • Process set of test email accounts
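The idea of a dictionary that flags named entities in message text can be sketched as follows. This is not the TOMES dictionary or tagger; the patterns, the labels such as `GOV.agency`, and the function name `tag_entities` are all assumptions chosen for illustration.

```python
import re

# Hypothetical dictionary mapping a regular expression to an entity label.
# These entries are illustrative only and are not taken from TOMES.
ENTITY_PATTERNS = {
    r"\bDepartment of Transportation\b": "GOV.agency",
    r"\bG\.S\. \d+[A-Z]?-\d+(\.\d+)?\b": "GOV.statute",
}


def tag_entities(text, patterns=ENTITY_PATTERNS):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    hits = []
    for pattern, label in patterns.items():
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Per G.S. 132-1, the Department of Transportation must retain this."
    for hit in tag_entities(sample):
        print(hit)
```

Keeping the dictionary as plain data separate from the matching code means an archivist could extend the entries without touching the tagger.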

  10. Results: Capstone • Methodology for managing/accessing archival email • NARA Bulletin 2013-02 • Email appraised at account level • Figure labels: Archival, Non-permanent

  11. Results: Software 1. TOMES PST Extractor: converts PST to EML 2. TOMES DarcMail: converts EML or MBOX to EAXS 3. TOMES Entities: converts Microsoft Excel files to a valid entity dictionary file 4. TOMES Tagger: converts EAXS to a tagged EAXS file 5. TOMES Packager: creates an AIP structure consisting of source and derivative files as well as AIP basic METS files • Figure labels: PST, EML, EAXS, Tagged EAXS
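The MBOX-to-XML step in this pipeline can be approximated with Python's standard library. The sketch below is not TOMES DarcMail and does not produce EAXS; the element names (`Account`, `Message`, and one child per header) and the function name `mbox_to_xml` are assumptions made purely to show the shape of such a conversion.

```python
import mailbox
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE_MBOX = (
    "From sender@example.org Tue Apr 30 09:00:00 2019\n"
    "From: sender@example.org\n"
    "To: archivist@example.org\n"
    "Date: Tue, 30 Apr 2019 09:00:00 -0400\n"
    "Subject: Test\n"
    "\n"
    "Body\n"
)


def mbox_to_xml(mbox_path):
    """Build a small XML tree of message headers from an MBOX file.

    Hypothetical helper: the element names do NOT follow the EAXS schema.
    """
    root = ET.Element("Account")
    box = mailbox.mbox(mbox_path)
    try:
        for key, message in box.iteritems():
            node = ET.SubElement(root, "Message", id=str(key))
            for header in ("From", "To", "Date", "Subject"):
                # Missing headers become empty elements.
                ET.SubElement(node, header).text = str(message.get(header, ""))
    finally:
        box.close()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        demo_path = os.path.join(demo_dir, "demo.mbox")
        with open(demo_path, "w") as handle:
            handle.write(SAMPLE_MBOX)
        print(ET.tostring(mbox_to_xml(demo_path), encoding="unicode"))
```

A real converter would also need to handle message bodies, attachments, and character encodings, which this sketch leaves out.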

  12. Building on the BitCurator/TOMES foundation • We have XML output with lots of metadata and tags; now what? • Iterative processing • Archivist-assisted review and machine learning • Record/non-record • PII/redaction • Reaching beyond state governments • Integration with other datasets and tools (BitCurator) • Open source iterative access tool to facilitate processing and access to historically significant email accounts • Review and approve tags • Redact sensitive information • Make reviewed emails viewable to researchers

  13. Review, Appraisal and Triage of Mail • Funded by Andrew W. Mellon Foundation (2019-2020) • Developing and repurposing software (including NLP and machine learning) for selection/appraisal in BitCurator environment with hooks and enhancements to TOMES output • Support iterative processing - information discovered at various points in the processing workflow can support further selection, redaction or description actions • Mapping of timestamp, entity, sensitive features and other elements across the tools • Image: Ray Tomlinson, https://upload.wikimedia.org/wikipedia/commons/0/01/Ray_Tomlinson_%28cropped%29.jpg

  14. RATOM Project Team at UNC • Christopher (Cal) Lee, Principal Investigator • Kam Woods, Co-PI and Technical Lead • Antoine de Torcy, Software Developer • Anusha Suresh, Project Manager

  15. RATOM Project Team at State Archives of NC • Camille Tyndall Watson, Co-Principal Investigator • Jamie Patrick-Burns, Investigator • Nitin Arora, Software Developer

  16. RATOM Goals 1. Explore the incorporation of software into an iterative processing approach 2. Create a module that would allow email items approved for release to be reviewed/released 3. Investigate machine learning applications to support automated identification of records and materials that require redaction or closure

  17. http://ratom.web.unc.edu/ Cal Lee University of North Carolina https://ils.unc.edu/callee/ Jamie Patrick-Burns Digital Archivist, State Archives of North Carolina Jamie.patrickburns@ncdcr.gov (919) 814-6905 State Archives Twitter: @NCArchives State Archives Facebook: https://www.facebook.com/State-Archives-of-North-Carolina-119904548024750/

  18. Discussion Questions • What are your most pressing needs related to email? • How are you addressing those needs now? • What would you like software to do?
