
Database Transcription and Digitization: The Robert E. MacLaury Color Categorization Archive



  1. MDP Project #6 Designing Methods for Large-Scale Database Transcription and Digitization: The Robert E. MacLaury Color Categorization Archive

  2. Our Team: Students

  3. Our Team: Mentors

  4. What Did We Do?

  5. The Robert E. MacLaury Mesoamerican Color Survey • Conducted from 1978-1981 • 900 speakers • 116 Languages

  6. The Robert E. MacLaury Mesoamerican Color Survey • Conducted from 1978-1981 • 900 speakers • 116 Languages

  7. Raw Handwritten Datasheet

  8. OPTICAL CHARACTER RECOGNITION APPROACH Yang Jiao and Ram Bhakta

  9. Optical Character Recognition • Automatic identification • Recognize optically processed characters • Convert documents into editable and searchable data

  10. OCR Areas (Eikvil, 1993)

  11. Components of OCR (Eikvil, 1993)

  12. Our Challenge

  13. Our Challenge Pw

  14. Our Challenge Pw

  15. Approach • Text: Pw • Confidence level: 52

  16. Our Results More likely Less likely
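
The OCR slides report a reading ("Pw") with a confidence level of 52 and rank candidate readings from more likely to less likely. The sketch below shows one way such output could be post-processed. It assumes a 0 to 100 confidence scale and a hypothetical acceptance threshold of 80, below which a field is handed to human transcribers. `OcrReading`, `ACCEPT_THRESHOLD`, `rank_readings`, `route`, and the alternative readings "Pv" and "Fw" are all illustrative. This is not the team's pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OcrReading:
    """One candidate reading of a handwritten field (hypothetical structure)."""
    text: str
    confidence: float  # assumed 0-100 scale, like "Confidence level: 52"


# Hypothetical cut-off: readings below this are sent to human transcribers.
ACCEPT_THRESHOLD = 80.0


def rank_readings(readings: list[OcrReading]) -> list[OcrReading]:
    """Order candidate readings from most to least likely by OCR confidence."""
    return sorted(readings, key=lambda r: r.confidence, reverse=True)


def route(readings: list[OcrReading]) -> tuple[str, OcrReading | None]:
    """Accept the best reading if it is confident enough, else flag the field."""
    if not readings:
        return ("needs_human", None)
    best = rank_readings(readings)[0]
    if best.confidence >= ACCEPT_THRESHOLD:
        return ("accepted", best)
    return ("needs_human", best)


if __name__ == "__main__":
    field = [OcrReading("Pw", 52.0), OcrReading("Pv", 31.0), OcrReading("Fw", 17.0)]
    for reading in rank_readings(field):
        print(reading.text, reading.confidence)
    print(route(field)[0])
```

With the slide's example, "Pw" ranks first. At 52 it still falls below the hypothetical threshold, so the field would be routed to crowdsourced transcription rather than accepted automatically.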

  17. CROWDSOURCING APPROACH DESIGN & IMPLEMENTATION Stephanie Chang

  18. Crowdsourcing

  19. Amazon Mechanical Turk

  20. What is Crowdsourcing?

  21. ?

  22. http://crowdsource-mcswebsite.rhcloud.com/

  23. Flow of Information: Raw Data → Transcription → Verification
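
The raw data → transcription → verification flow can be modeled as a small state machine for each datasheet field. The sketch below is a minimal illustration under stated assumptions. `Stage`, `FieldTask`, and the rules are hypothetical: a verifier must differ from the transcriber, and a rejected transcription sends the field back to the queue. The slides do not confirm any of this as the website's actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    RAW = "raw"                  # scanned field, no accepted transcription yet
    TRANSCRIBED = "transcribed"  # one worker has typed what they read
    VERIFIED = "verified"        # a different worker has confirmed it


@dataclass
class FieldTask:
    """One datasheet field moving through the pipeline (hypothetical structure)."""
    image_id: str
    stage: Stage = Stage.RAW
    text: str | None = None
    transcriber: str | None = None
    history: list[str] = field(default_factory=list)

    def transcribe(self, worker: str, text: str) -> None:
        if self.stage is not Stage.RAW:
            raise ValueError(f"{self.image_id}: expected raw, got {self.stage.value}")
        self.text, self.transcriber, self.stage = text, worker, Stage.TRANSCRIBED
        self.history.append(f"{worker} transcribed {text!r}")

    def verify(self, worker: str, approved: bool) -> None:
        if self.stage is not Stage.TRANSCRIBED:
            raise ValueError(f"{self.image_id}: expected transcribed, got {self.stage.value}")
        # Assumption: nobody may verify their own transcription.
        if worker == self.transcriber:
            raise ValueError("verifier must differ from transcriber")
        if approved:
            self.stage = Stage.VERIFIED
            self.history.append(f"{worker} approved")
        else:
            # A rejected transcription is discarded and the field re-enters the queue.
            self.text, self.transcriber, self.stage = None, None, Stage.RAW
            self.history.append(f"{worker} rejected")
```

Keeping a per-field history makes it possible to audit every transcription attempt later. This matters once multiple workers' answers are aggregated.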

  24. http://crowdsource-mcswebsite.rhcloud.com/

  25. Crowdsourcing

  26. Crowdsourcing

  27. CROWDSOURCING APPROACH EMPIRICAL ANALYSES Prutha Deshpande

  28. Surowiecki, 2004

  29. Surowiecki, 2004
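
The wisdom-of-crowds idea (Surowiecki, 2004) is to pool many independent answers. Plurality voting is the simplest pooling rule. It is sketched below only as a baseline, not as the project's method. The function name and the whitespace normalization are illustrative assumptions. Plurality voting weights every worker equally, so a careless worker counts as much as a careful one. That limitation motivates models that estimate each worker's competence, such as CCT.

```python
from collections import Counter


def plurality_vote(transcriptions: list) -> tuple:
    """Return the most common transcription and the share of workers who gave it.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first insertion.
    """
    if not transcriptions:
        raise ValueError("no transcriptions to aggregate")
    # Assumption: only surrounding whitespace is normalized; case is kept,
    # since capitalization may be meaningful on the datasheets.
    normalized = [t.strip() for t in transcriptions]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)
```

For example, `plurality_vote(["Pw", "Pw", "Fw"])` returns `("Pw", 2/3)`.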

  30. Problem of Data Aggregation (Lee et al., 2014)

  31. Problem of Data Aggregation (Lee et al., 2014)

  32. Problem of Data Aggregation (Lee et al., 2014)

  33. Problem of Data Aggregation (Lee et al., 2014)

  34. Our Approach to Data Aggregation Cultural Consensus Theory (CCT) ● Family of computational models ● Informants share cultural knowledge - Ethnographic research application ● Correct answer not known a priori

  35. Cultural Consensus Analyses: Advantages ● Predict informant competency ● Estimate homogeneity of responses ● Estimate “correct” answers
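
The slides say the team used a standard Bayesian CCT model and an alternative CCT model. The sketch below swaps in the older, classical (non-Bayesian) consensus procedure to show the core idea: estimate each informant's competence from agreement between informants, then weight answers by that competence to estimate the "correct" key without knowing it a priori. It makes three simplifying assumptions. Every item is treated as multiple choice with `num_options` options. A plain least-squares rank-one fit stands in for the factor analysis usually used. The homogeneity check, the first-to-second eigenvalue ratio of the agreement matrix, is omitted. `cct_consensus` and `_log_likelihood` are hypothetical names.

```python
from __future__ import annotations

import math


def _log_likelihood(candidate: str, answers: list[str],
                    comp: list[float], num_options: int) -> float:
    """Log-probability of the observed answers if `candidate` is the true answer."""
    total = 0.0
    for d, a in zip(comp, answers):
        # Informant knows the answer with prob d; otherwise guesses uniformly.
        p = d + (1 - d) / num_options if a == candidate else (1 - d) / num_options
        total += math.log(p)
    return total


def cct_consensus(responses: list[list[str]], num_options: int,
                  iterations: int = 200) -> tuple[list[float], list[str]]:
    """Classical consensus analysis sketch.

    responses[i][k] is informant i's answer to item k. Assumes at least two
    informants, at least one item, and every informant answering every item.
    Returns (competencies, estimated answer key).
    """
    n, m = len(responses), len(responses[0])

    # 1. Pairwise agreement corrected for guessing: (L * M_ij - 1) / (L - 1).
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                match = sum(a == b for a, b in zip(responses[i], responses[j])) / m
                corr[i][j] = (num_options * match - 1) / (num_options - 1)

    # 2. Least-squares rank-one fit corr[i][j] ~ D_i * D_j (i != j) by
    #    coordinate descent; D_i is informant i's estimated competence.
    comp = [0.5] * n
    for _ in range(iterations):
        for i in range(n):
            num = sum(corr[i][j] * comp[j] for j in range(n) if j != i)
            den = sum(comp[j] ** 2 for j in range(n) if j != i)
            comp[i] = num / den if den > 0 else 0.0
    # Clamp into (0, 1) so every probability in the likelihood stays positive.
    comp = [min(max(d, 0.01), 0.99) for d in comp]

    # 3. Per item, choose the observed answer with the highest likelihood.
    key = []
    for k in range(m):
        answers = [row[k] for row in responses]
        candidates = list(dict.fromkeys(answers))  # unique, first-seen order
        key.append(max(candidates,
                       key=lambda c: _log_likelihood(c, answers, comp, num_options)))
    return comp, key
```

Consider a toy input: three informants agree on every item, a fourth differs on one item, and a fifth disagrees with the others on most items. The sketch assigns the fifth informant the lowest competence and recovers the shared answer key.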

  36. Does CCT work for our data?

  37. Does CCT work for our data? Pilot Study ● 30 human subject pool participants

  38. Does CCT work for our data? Pilot Study ● 30 human subject pool participants ● Two implementations of CCT - Standard Bayesian CCT model - Alternative CCT model

  39. Does CCT work for our data? Yes! We found CCT to be an appropriate model for aggregating our crowdsourced data.

  40. CCT Statistical Output (Oravecz et al., 2014)

  41. CCT Transcription Solutions

  42. CCT Transcription Solutions → Archive

  43. DATABASE AND TIKI-WIKI Nathan Benjamin and Zhimin Xiang

  44. Why Build a Database? • Accessibility

  45. Why Build a Database? • Accessibility • Why not simply put photocopies of every page from the archive online?

  46. Why Build a Database? A Relational Database: • Define datatype relationships

  47. Why Build a Database? A Relational Database: • Define datatype relationships • Exclude superfluous results

  48. Why Build a Database? A Relational Database: • Define datatype relationships • Exclude superfluous results • Will remain extensible

  49. The New Problem Relational databases are difficult to traverse without advanced knowledge: Both Structurally

  50. The New Problem Relational databases are difficult to traverse without advanced knowledge: Both Structurally And Syntactically SELECT * FROM Experiment INNER JOIN Language ON Experiment.language# = Language.language# WHERE Language.langName = 'Korean' ORDER BY Experiment.ExperimentID;
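
To make the slide's query concrete, the sketch below builds a tiny in-memory SQLite database and runs the same join. The schema is hypothetical and only as large as the query needs. `language#` is renamed `language_id` so the identifier needs no quoting. The `speaker` column and the sample rows are made up. The foreign key shows how a relational schema defines relationships between record types. The `WHERE` clause shows how unrelated rows are excluded.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema modeled on the slide's query.
SCHEMA = """
CREATE TABLE Language (
    language_id INTEGER PRIMARY KEY,
    langName    TEXT NOT NULL UNIQUE
);
CREATE TABLE Experiment (
    ExperimentID INTEGER PRIMARY KEY,
    language_id  INTEGER NOT NULL REFERENCES Language(language_id),
    speaker      TEXT
);
"""

# The slide's join, with explicit columns and the language name as a parameter.
QUERY = """
SELECT Experiment.ExperimentID, Experiment.speaker, Language.langName
FROM Experiment
INNER JOIN Language ON Experiment.language_id = Language.language_id
WHERE Language.langName = ?
ORDER BY Experiment.ExperimentID;
"""


def experiments_for(conn: sqlite3.Connection, lang_name: str) -> list[tuple]:
    """Return (ExperimentID, speaker, langName) rows for one language."""
    return conn.execute(QUERY, (lang_name,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Language VALUES (?, ?)",
                     [(1, "Korean"), (2, "Tzeltal")])
    conn.executemany("INSERT INTO Experiment VALUES (?, ?, ?)",
                     [(10, 1, "S1"), (11, 2, "S2"), (12, 1, "S3")])
    return conn
```

Binding the language name with `?` avoids hand-quoting string literals, and naming columns explicitly keeps the result shape stable if the tables gain columns later. Even so, a query like this shows why non-specialists need a friendlier front end.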

  51. The Solution: We surveyed nine web frameworks and content management systems.

  52. The Solution: We chose Tiki-Wiki: ● CMS (Content Management System) - Version control - User control - File Access control ● Wiki ● Open source

  53. The Solution: We chose Tiki-Wiki: ● CMS (Content Management System) ● Wiki - Searchable - Provides an extensible structure for explaining the project and its associated data - Allows for public web access ● Open Source

  54. The Solution: We chose Tiki-Wiki: ● CMS (Content Management System) ● Wiki ● Open Source - Provides Flexibility - Access to Online Mapping Databases

  55. The Solution: We chose Tiki-Wiki: ● CMS (Content Management System) ● Wiki ● Open Source This allowed us to add features and functionality that would inject momentum into research in this area.

  56. In Conclusion Color Categorization Archive (ColCat)

  57. Thank you! Any questions or suggestions? Coming soon to: ColCat.Calit2.uci.edu • Contact us at: colcat@calit2.uci.edu • Students: N. Benjamin, H. Bhakta, S. Chang, P. Deshpande, Y. Jiao, Z. Xiang • Mentors: S. Gago, PhD; I. Harris, PhD; K. Jameson, PhD; S. Tauber, PhD • Support for the archive project provided by: ● The Multidisciplinary Design Program, 2014-2015 ● The University of California Pacific Rim Research Program, 2010-2015 (K. A. Jameson, PI) ● The National Science Foundation, 2014-2017 (#SMA-1416907, K. A. Jameson, PI)
