Extending an atomistic Fedora-Commons object model to facilitate image segmentation and enhance discovery. David Lacy, david.lacy@villanova.edu, Villanova University. Open Repositories 2013, Prince Edward Island, July 11th, 2013.


slide-1
SLIDE 1

Extending an atomistic Fedora-Commons object model to facilitate image segmentation and enhance discovery

David Lacy david.lacy@villanova.edu Villanova University Open Repositories 2013 Prince Edward Island July 11th, 2013

slide-2
SLIDE 2

digital.library.villanova.edu

  • Our repository has large amounts of scanned/paginated resources
    – Books
    – Manuscripts
    – Newspapers
    – Theses
    – Scrapbooks
    – etc.

slide-3
SLIDE 3

Topics

  • Existing Model, Hierarchy and View
  • Extensions
    – Image Segmentation
    – Page Level Search Results

slide-4
SLIDE 4

Basic Model

Core Collection Data

slide-5
SLIDE 5

Enhanced Model

Folder Resource List Image Folder Document Audio Video Core Collection Data

slide-6
SLIDE 6

Object Hierarchy

rel:isMemberOf

Dime Novel Collection (Folder)
  Bride of the Tomb (Resource)
    Page 1 (Image)
    Page 2 (Image)
    Page 3 (Image)

slide-7
SLIDE 7

Hierarchy with multiple relationships (1)

rel:isMemberOf

Dime Novel Collection (Folder) Series List (Folder) Buffalo Bill (Folder) Fiction (Folder)

slide-8
SLIDE 8

Hierarchy with multiple relationships (2)

rel:isMemberOf

Dime Novel Collection (Folder) Bride of the Tomb (Resource) Page 1 (Image) Page 2 (Image) Page 3 (Image) Page Images (List) Chapters (List) Chapter 1 (List) Chapter 2 (List) Page 33 (Image) Page 34 (Image) Page 35 (Image)

slide-9
SLIDE 9

Basic Object Hierarchy in Solr

  • Objects included in Solr
    – Resource Objects
    – Folder Objects
  • Each Solr Record includes parent record ID(s)
    – Facilitates browsing collections
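
A minimal sketch of how parent record IDs stored on each record can drive collection browsing. The field names (`id`, `parent_ids`, `type`) and the `demo:` PIDs are hypothetical and not taken from the Villanova schema; the in-memory list stands in for a Solr index.

```python
# Illustrative records only; field names and PIDs are assumptions.
records = [
    {"id": "demo:1", "title": "Dime Novel Collection", "type": "Folder", "parent_ids": []},
    {"id": "demo:2", "title": "Bride of the Tomb", "type": "Resource", "parent_ids": ["demo:1"]},
    {"id": "demo:3", "title": "Buffalo Bill", "type": "Folder", "parent_ids": ["demo:1"]},
]


def children_of(parent_id, records):
    """Return the records that list parent_id among their parents."""
    return [r for r in records if parent_id in r["parent_ids"]]


def browse(parent_id, records, depth=0):
    """Return indented titles for everything below parent_id."""
    lines = []
    for child in children_of(parent_id, records):
        lines.append("  " * depth + f"{child['title']} ({child['type']})")
        lines.extend(browse(child["id"], records, depth + 1))
    return lines
```

Because the parent field holds a list, a record that belongs to several collections shows up under each of them when browsing.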

slide-10
SLIDE 10

Browse Hierarchy

slide-11
SLIDE 11

Browse Hierarchy

slide-12
SLIDE 12

Browse Hierarchy Tree

slide-13
SLIDE 13

Search Resources and Folders

slide-14
SLIDE 14

Moving forward... We have a large amount of scanned pages

slide-15
SLIDE 15

That is, we have lots of stuff that looks like this

slide-16
SLIDE 16

We want to expose this

slide-17
SLIDE 17

But I want to work on this instead

slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22

The Plan

  • Define segments of Images and extract to create new objects
  • Create new Article Resources from these new images

slide-23
SLIDE 23

Image Object

  • Composed using Fedora's “Mixed-in” approach, combining the following models:
    – Core Model
    – Data Model
    – Image Model

slide-24
SLIDE 24

Core Model

  • Datastreams
    – THUMBNAIL
    – PARENT-LIST
  • Methods
    – getThumb
    – generateParentList
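
The slides do not describe how generateParentList works. As one possible reading, the sketch below walks rel:isMemberOf links upward to collect every ancestor chain for an object. The function name `generate_parent_list`, the dictionary layout, and the `demo:` PIDs are all hypothetical, and the walk assumes the relationships contain no cycles.

```python
# Hypothetical rel:isMemberOf edges: child PID -> list of parent PIDs.
IS_MEMBER_OF = {
    "demo:page1": ["demo:bride"],
    "demo:bride": ["demo:dime-novels"],
    "demo:dime-novels": [],
}


def generate_parent_list(pid, is_member_of):
    """Return every ancestor chain from pid up to a top-level object.

    Assumes the relationship graph is acyclic.
    """
    parents = is_member_of.get(pid, [])
    if not parents:
        return [[]]  # top-level object: one empty chain
    chains = []
    for parent in parents:
        for chain in generate_parent_list(parent, is_member_of):
            chains.append([parent] + chain)
    return chains
```

An object with two parents yields two chains, which matches the hierarchies with multiple relationships shown earlier.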

slide-25
SLIDE 25

Data Model

  • Datastreams
    – MASTER
    – MASTER-MD
  • Methods
    – generateMetadata

slide-26
SLIDE 26

Image Data Model

  • Datastreams
    – LARGE
    – MEDIUM
    – OCR-DIRTY
  • Methods
    – generateDerivative
    – generateOCR

slide-27
SLIDE 27

Image Object

  • Datastreams
    – THUMBNAIL
    – PARENT-LIST
    – MASTER
    – MASTER-MD
    – MEDIUM
    – LARGE
    – OCR-DIRTY
  • Methods
    – getThumb
    – generateParentList
    – generateMetadata
    – generateDerivative
    – generateOCR
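
The way the Image Object accumulates datastreams and methods from its three models can be illustrated with a short sketch. Plain Python classes stand in for Fedora content models here; the class names and the `combine` helper are hypothetical and are not part of Fedora.

```python
class CoreModel:
    datastreams = ("THUMBNAIL", "PARENT-LIST")
    methods = ("getThumb", "generateParentList")


class DataModel:
    datastreams = ("MASTER", "MASTER-MD")
    methods = ("generateMetadata",)


class ImageModel:
    datastreams = ("LARGE", "MEDIUM", "OCR-DIRTY")
    methods = ("generateDerivative", "generateOCR")


def combine(*models):
    """Merge datastream and method names from several models, skipping duplicates."""
    datastreams, methods = [], []
    for model in models:
        for name in model.datastreams:
            if name not in datastreams:
                datastreams.append(name)
        for name in model.methods:
            if name not in methods:
                methods.append(name)
    return {"datastreams": datastreams, "methods": methods}


image_object = combine(CoreModel, DataModel, ImageModel)
```

Adding a fourth model with one extra datastream and one extra method would extend the object in the same way, without touching the existing three.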

slide-28
SLIDE 28

Segment Image

Extension of Image Object

  • Composed using Fedora's “Mixed-in” approach, combining the following models:
    – Core Model
    – Data Model
    – Image Model
    – Segment Model

slide-29
SLIDE 29

Segment Image Model – Part 1

New elements

  • Datastreams
    – COORDINATES
  • Methods
    – generateSegment

slide-30
SLIDE 30

Segment Object

  • Datastreams
    – THUMBNAIL
    – PARENT-LIST
    – MASTER
    – MASTER-MD
    – MEDIUM
    – LARGE
    – OCR-DIRTY
    – COORDINATES
  • Methods
    – getThumb
    – generateParentList
    – generateMetadata
    – generateDerivative
    – generateOCR
    – generateSegment

slide-31
SLIDE 31

Segment Image Model – Part 2

New relationship – rel:isPartOf

Article Segment 1 (Segment) rel:isPartOf Page 1 (Image)

slide-32
SLIDE 32

Hierarchy of Segmented Images

March 2003 (Resource) Page List (List) Page 1 (Image) Article A (Segment) Article B (Segment) rel:isPartOf

slide-33
SLIDE 33

Segment Image Model – Part 3

Creating a new MASTER datastream

MASTER Article Segment 1 (Segment) MASTER Page 1 (Image) COORDINATES generateSegment rel:isPartOf
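
A minimal sketch of the cropping step, assuming COORDINATES describes a rectangle as x, y, width and height in pixels (the slides do not give the actual datastream format). The page MASTER is modelled as a list of pixel rows so the example needs no imaging library; `generate_segment` is a hypothetical stand-in for generateSegment.

```python
def generate_segment(master, coordinates):
    """Crop a rectangular region out of a page image.

    master is a list of pixel rows; coordinates is a dict with
    x, y, width, height in pixels (an assumed COORDINATES layout).
    """
    x, y = coordinates["x"], coordinates["y"]
    w, h = coordinates["width"], coordinates["height"]
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("coordinates must describe a positive-size region")
    if y + h > len(master) or x + w > len(master[0]):
        raise ValueError("segment extends beyond the page image")
    # Keep rows y..y+h-1 and, within each, columns x..x+w-1.
    return [row[x:x + w] for row in master[y:y + h]]


# A 4-row by 6-column "page" whose pixel values encode row and column.
page_master = [[10 * r + c for c in range(6)] for r in range(4)]
segment_master = generate_segment(
    page_master, {"x": 1, "y": 2, "width": 3, "height": 2}
)
```

The cropped result would become the MASTER datastream of the new Segment object, which points back to the page through rel:isPartOf.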

slide-34
SLIDE 34

Interface for generating COORDS

slide-35
SLIDE 35

Image MASTER Segment MASTER

slide-36
SLIDE 36

Segment Object

  • Datastreams
    – THUMBNAIL
    – PARENT-LIST
    – MASTER
    – MASTER-MD
    – MEDIUM
    – LARGE
    – OCR-DIRTY
    – COORDINATES

slide-37
SLIDE 37

Segments within a Resource

rel:isMemberOf

Taj Mahal Interview (Resource) Segment List (List) Part 1 (Segment) Part 2 (Segment) Part 3 (Segment)

slide-38
SLIDE 38

Complex Object Hierarchy

March 2003 (Folder) Page List (List) Page 1 (Image) Page 2 (Image) Page 3 (Image) Article List (List) Taj Mahal Interview (Resource) Part 1 (Segment) Part 2 (Segment) Segment List (List) rel:isPartOf

slide-39
SLIDE 39

Resource with multiple List Objects

slide-40
SLIDE 40

Article List Expanded

slide-41
SLIDE 41

Pages List Expanded

slide-42
SLIDE 42

Front End / Solr

slide-43
SLIDE 43

Current Solr Result Set

Folders and Resources

Record: PID = Resource Record: PID = Resource Record: PID = Folder Record: PID = Resource

slide-44
SLIDE 44

Front End: Existing Results

slide-45
SLIDE 45

Front End: Existing Results

slide-46
SLIDE 46

This works, but as mentioned before, matching text on page 30 will return the entire Resource rather than the page itself.

slide-47
SLIDE 47

Expose page-specific matches by ingesting data objects too

slide-48
SLIDE 48

Total Objects

  • 18,000+ Resource Objects
  • 600+ Folder Objects
  • 220,000+ Data Objects
slide-49
SLIDE 49

Solr Field Collapsing

  • Group results based on a shared Solr field
    – <parentGroup/>
  • Data Objects
    – <parentGroup/> = Parent Resource
  • Folders and Resources
    – <parentGroup/> = Self
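
The rule above for filling the grouping field can be sketched as follows. The record layout (`id`, `type`, `parent_resource`) and the `demo:` PIDs are hypothetical; only the field name parentGroup comes from the slides.

```python
def parent_group(record):
    """Pick the parentGroup value for a record before it is indexed."""
    if record["type"] == "Data":
        # Data Objects (e.g. page images) group under their parent Resource.
        return record["parent_resource"]
    # Folders and Resources group under themselves.
    return record["id"]


docs = [
    {"id": "demo:res1", "type": "Resource"},
    {"id": "demo:page30", "type": "Data", "parent_resource": "demo:res1"},
    {"id": "demo:folder1", "type": "Folder"},
]
for doc in docs:
    doc["parentGroup"] = parent_group(doc)
```

With this assignment a Resource and all of its pages share one parentGroup value, so they fall into the same group at query time.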

slide-50
SLIDE 50

Collapsed Solr Result Set

Folders, Resources, and Data Objects

Record / Image Record / Image Record / Image Record / Image Group: PID = Resource Group: PID = Resource Group: PID = Resource

  • Display Groups as search Results instead of Records
  • Records within Groups can direct patrons to specific pages within Resources

Record / Resource
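
A sketch of the collapsed result set, assuming Solr's result grouping parameters (`group`, `group.field`, `group.limit`) are used to collapse on parentGroup. The per-group limit of 3, the record layout, and the `demo:` PIDs are illustrative choices, not values from the talk. The `collapse` function mimics the grouping in memory so the effect can be seen without a Solr server.

```python
from urllib.parse import urlencode


def grouping_params(query, per_group=3):
    """Build a query string that asks Solr to group results by parentGroup."""
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": "parentGroup",
        "group.limit": per_group,  # records returned inside each group
    })


def collapse(records):
    """Group record IDs by parentGroup, keeping first-seen group order."""
    groups = {}
    for record in records:
        groups.setdefault(record["parentGroup"], []).append(record["id"])
    return groups


hits = [
    {"id": "demo:page27", "parentGroup": "demo:march-issue"},
    {"id": "demo:interview", "parentGroup": "demo:interview"},
    {"id": "demo:page28", "parentGroup": "demo:march-issue"},
]
grouped = collapse(hits)
```

Each key in `grouped` corresponds to one displayed search result, and the IDs under it are the pages a patron can jump to directly.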

slide-51
SLIDE 51

Advanced Solr Results

slide-52
SLIDE 52

Taj Mahal Interview

slide-53
SLIDE 53

Taj Mahal Interview

slide-54
SLIDE 54

March Issue, page 27

slide-55
SLIDE 55

Lists in Accordion

slide-56
SLIDE 56

Lists in Accordion

slide-57
SLIDE 57

Hangups

  • Null Resource hit on query
  • Multiple collection memberships in Solr

– Cannot sort on a multi-value field

slide-58
SLIDE 58

Acknowledgments

  • Demian Katz, Villanova University
  • Chris Hallberg, Villanova University
  • Eoghan Ó Carragáin, National Library of Ireland