Database Transcription and Digitization: The Robert E. MacLaury - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Designing Methods for Large-Scale Database Transcription and Digitization: The Robert E. MacLaury Color Categorization Archive

MDP Project #6

slide-2
SLIDE 2

Our Team: Students

slide-3
SLIDE 3

Our Team: Mentors

slide-4
SLIDE 4

What Did We Do?

slide-5
SLIDE 5

The Robert E. MacLaury Mesoamerican Color Survey

  • Conducted from 1978-1981
  • 900 speakers
  • 116 Languages
slide-6
SLIDE 6

slide-7
SLIDE 7
slide-8
SLIDE 8

Raw Handwritten Datasheet

slide-9
SLIDE 9

OPTICAL CHARACTER RECOGNITION APPROACH

Yang Jiao and Ram Bhakta

slide-10
SLIDE 10

Optical Character Recognition

  • Automatic identification
  • Recognize optically processed characters
  • Convert documents into editable and searchable data

slide-11
SLIDE 11

OCR Areas

Eikvil, 1993

slide-12
SLIDE 12

Components of OCR

Eikvil, 1993

slide-13
SLIDE 13

Our Challenge

slide-14
SLIDE 14

Our Challenge

Pw

slide-15
SLIDE 15


slide-16
SLIDE 16
slide-17
SLIDE 17

Approach

Text: Pw
Confidence level: 52
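
The confidence level shown on this slide suggests one way to use OCR output: accept high-confidence readings and route the rest to human review. The sketch below is only an illustration under stated assumptions. The `OcrResult` type, the cell identifiers, the threshold of 80, and the second sample reading ("Gn", confidence 91) are all hypothetical. Only the "Pw" reading with confidence 52 comes from the slide.

```python
from typing import NamedTuple


class OcrResult(NamedTuple):
    cell_id: str     # hypothetical identifier for one datasheet cell
    text: str        # text the OCR engine recognized
    confidence: int  # engine-reported confidence, assumed to lie in 0-100


def triage(results, threshold=80):
    """Split OCR results into (accepted, needs_review) by confidence.

    The threshold is an assumed value, not one reported by the project.
    """
    accepted, needs_review = [], []
    for result in results:
        if result.confidence >= threshold:
            accepted.append(result)
        else:
            needs_review.append(result)
    return accepted, needs_review


results = [
    OcrResult("sheet1-cell3", "Pw", 52),  # reading shown on the slide
    OcrResult("sheet1-cell4", "Gn", 91),  # made-up sample reading
]
accepted, needs_review = triage(results)
print("accepted:", [r.text for r in accepted])
print("needs review:", [r.text for r in needs_review])
```

Under this assumed threshold, the "Pw" reading from the slide would go to human review rather than being accepted automatically.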

slide-18
SLIDE 18

Our Results

More likely Less likely

slide-19
SLIDE 19

CROWDSOURCING APPROACH

DESIGN & IMPLEMENTATION

Stephanie Chang

slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25

Crowdsourcing

slide-26
SLIDE 26

Amazon Mechanical Turk

slide-27
SLIDE 27

What is Crowdsourcing?

slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35
slide-36
SLIDE 36
slide-37
SLIDE 37

?

slide-38
SLIDE 38

http://crowdsource-mcswebsite.rhcloud.com/

slide-39
SLIDE 39

Flow of Information

Raw Data → Transcription → Verification

slide-40
SLIDE 40

http://crowdsource-mcswebsite.rhcloud.com/

slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43
slide-44
SLIDE 44
slide-45
SLIDE 45

Crowdsourcing

slide-46
SLIDE 46


slide-47
SLIDE 47

CROWDSOURCING APPROACH EMPIRICAL ANALYSES

Prutha Deshpande

slide-48
SLIDE 48

Surowiecki, 2004

slide-49
SLIDE 49


slide-50
SLIDE 50

Problem of Data Aggregation

Lee et al., 2014

slide-51
SLIDE 51


slide-52
SLIDE 52


slide-53
SLIDE 53


slide-54
SLIDE 54

Our Approach to Data Aggregation

Cultural Consensus Theory (CCT)

  • Family of computational models
  • Informants share cultural knowledge
  • Ethnographic research application
  • Correct answer not known a priori
slide-55
SLIDE 55

Cultural Consensus Analyses Advantages

  • Predict informant competency
  • Estimate homogeneity of responses
  • Estimate “correct” answers
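
To make the three advantages above concrete, here is a deliberately simplified stand-in for consensus-based aggregation. It is not the Bayesian CCT model used in the project. It alternates between a competence-weighted plurality vote per item and a competence estimate equal to each informant's rate of agreement with the current consensus. The worker names, cell identifiers, and all labels other than "Pw" are made up for illustration.

```python
from collections import defaultdict


def aggregate(responses, n_iter=10):
    """Simplified consensus aggregation (a stand-in, not the CCT model).

    responses: {worker: {item: answer}}
    Returns (consensus, competence).
    """
    workers = list(responses)
    items = sorted({item for answers in responses.values() for item in answers})
    competence = {w: 1.0 for w in workers}  # start with equal weights
    consensus = {}
    for _ in range(n_iter):
        # Step 1: competence-weighted plurality vote for every item.
        for item in items:
            votes = defaultdict(float)
            for w in workers:
                if item in responses[w]:
                    votes[responses[w][item]] += competence[w]
            # sorted() makes tie-breaking deterministic (alphabetical).
            consensus[item] = max(sorted(votes), key=votes.get)
        # Step 2: competence = share of a worker's answers matching consensus.
        for w in workers:
            answered = responses[w]
            agree = sum(consensus[i] == a for i, a in answered.items())
            competence[w] = agree / len(answered) if answered else 0.0
    return consensus, competence


responses = {
    "w1": {"c1": "Pw", "c2": "Gn", "c3": "Rd"},
    "w2": {"c1": "Pw", "c2": "Gn", "c3": "Rd"},
    "w3": {"c1": "Pu", "c2": "Gn", "c3": "Bl"},
}
consensus, competence = aggregate(responses)
print(consensus)
print(competence)
```

A full CCT analysis additionally models guessing and response bias and reports uncertainty for each estimate, none of which this sketch attempts.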
slide-56
SLIDE 56

Does CCT work for our data?

slide-57
SLIDE 57

Does CCT work for our data? Pilot Study

  • 30 human subject pool participants
slide-58
SLIDE 58

Does CCT work for our data? Pilot Study

  • 30 human subject pool participants
  • Two implementations of CCT:
      • Standard Bayesian CCT model
      • Alternative CCT model
slide-59
SLIDE 59

Does CCT work for our data?

Yes!

We found CCT to be an appropriate model for aggregating our crowdsourced data.

slide-60
SLIDE 60

Oravecz et al., 2014

CCT Statistical Output

slide-61
SLIDE 61

CCT Transcription Solutions

slide-62
SLIDE 62

CCT Transcription Solutions → Archive

slide-63
SLIDE 63

DATABASE AND TIKI-WIKI

Nathan Benjamin and Zhimin Xiang

slide-64
SLIDE 64

Why Build a Database?

  • Accessibility
slide-65
SLIDE 65
slide-66
SLIDE 66
slide-67
SLIDE 67

Why Build a Database?

  • Accessibility
  • Why not simply put photocopies of every page from the archive online?
slide-68
SLIDE 68
slide-69
SLIDE 69
slide-70
SLIDE 70
slide-71
SLIDE 71
slide-72
SLIDE 72
slide-73
SLIDE 73
slide-74
SLIDE 74
slide-75
SLIDE 75
slide-76
SLIDE 76

Why Build a Database?

A Relational Database:

  • Define datatype relationships
slide-77
SLIDE 77

Why Build a Database?

A Relational Database:

  • Define datatype relationships
  • Exclude superfluous results
slide-78
SLIDE 78

Why Build a Database?

A Relational Database:

  • Define datatype relationships
  • Exclude superfluous results
  • Will remain extensible
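
As a rough illustration of "defining datatype relationships", the sketch below declares two tables in an in-memory SQLite database and links them with a foreign key. The table names and the `langName` and `ExperimentID` columns follow the query shown later in the deck. The `language_id` column name, the sample rows, and the choice of SQLite are assumptions made for this example, not the archive's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Language (
    language_id INTEGER PRIMARY KEY,
    langName    TEXT NOT NULL UNIQUE
);
CREATE TABLE Experiment (
    ExperimentID INTEGER PRIMARY KEY,
    language_id  INTEGER NOT NULL REFERENCES Language(language_id)
);
""")

conn.execute("INSERT INTO Language (language_id, langName) VALUES (1, 'Korean')")
conn.execute("INSERT INTO Experiment (ExperimentID, language_id) VALUES (100, 1)")

# An experiment that points at a language that does not exist is rejected.
try:
    conn.execute("INSERT INTO Experiment (ExperimentID, language_id) VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan row rejected:", orphan_rejected)
```

Because the relationship is declared in the schema, the database itself refuses inconsistent rows, which a collection of scanned pages cannot do.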
slide-79
SLIDE 79

The New Problem

Relational databases are difficult to traverse without advanced knowledge:

Both Structurally

slide-80
SLIDE 80

The New Problem

Relational databases are difficult to traverse without advanced knowledge:

Both Structurally

SELECT *
FROM Experiment
INNER JOIN Language ON Experiment.language# = Language.language#
WHERE Language.langName = 'Korean'
ORDER BY Experiment.ExperimentID;

And Syntactically
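
One common way to spare users this syntax is to wrap the join in a small function, so that a caller supplies only a language name. The sketch below is a hypothetical helper, not the project's code. It assumes SQLite, renames the `language#` column to `language_id`, and uses made-up sample rows.

```python
import sqlite3


def experiments_for_language(conn, lang_name):
    """Return the ExperimentIDs for one language, hiding the JOIN from the caller."""
    rows = conn.execute(
        """
        SELECT Experiment.ExperimentID
        FROM Experiment
        INNER JOIN Language ON Experiment.language_id = Language.language_id
        WHERE Language.langName = ?
        ORDER BY Experiment.ExperimentID
        """,
        (lang_name,),  # parameter binding avoids quoting mistakes
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (language_id INTEGER PRIMARY KEY, langName TEXT);
CREATE TABLE Experiment (ExperimentID INTEGER PRIMARY KEY, language_id INTEGER);
INSERT INTO Language VALUES (1, 'Korean'), (2, 'English');
INSERT INTO Experiment VALUES (12, 1), (7, 1), (9, 2);
""")

print(experiments_for_language(conn, "Korean"))  # [7, 12]
```

A web front end plays the same role at a larger scale: the query lives in one place, and visitors interact with a search box instead of SQL.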

slide-81
SLIDE 81

The Solution:

We surveyed nine separate web frameworks and content management systems.

slide-82
SLIDE 82

The Solution:

We chose Tiki-Wiki:

  • CMS (Content Management System)
  • Version control
  • User control
  • File Access control
  • Wiki
  • Open source
slide-83
SLIDE 83

The Solution:

We chose Tiki-Wiki:

  • CMS (Content Management System)
  • Wiki
  • Searchable
  • Provides extensible structure for explanation of project and associated data

  • Allows for public web access
  • Open Source
slide-84
SLIDE 84

The Solution:

We chose Tiki-Wiki:

  • CMS (Content Management System)
  • Wiki
  • Open Source
  • Provides Flexibility
  • Access to Online Mapping Databases
slide-85
SLIDE 85

The Solution:

We chose Tiki-Wiki:

  • CMS (Content Management System)
  • Wiki
  • Open Source

This allowed us to add features and functionality that would inject momentum into research in this area.

slide-86
SLIDE 86

In Conclusion

Color Categorization Archive (ColCat)

slide-87
SLIDE 87
slide-88
SLIDE 88
slide-89
SLIDE 89

Thank you!

Any questions or suggestions?

Coming Soon to: ColCat.Calit2.uci.edu

Contact us at: colcat@calit2.uci.edu

Students:

  • N. Benjamin
  • H. Bhakta
  • S. Chang
  • P. Deshpande
  • Y. Jiao
  • Z. Xiang

Mentors:

  • S. Gago, PhD
  • I. Harris, PhD
  • K. Jameson, PhD
  • S. Tauber, PhD

Support for the archive project provided by:

  • The Multidisciplinary Design Program, 2014-2015
  • The University of California Pacific Rim Research Program, 2010-2015 (K. A. Jameson, PI)
  • The National Science Foundation, 2014-2017 (#SMA-1416907, K. A. Jameson, PI)