
Extending an atomistic Fedora Commons object model to facilitate image segmentation and enhance discovery. David Lacy (david.lacy@villanova.edu), Villanova University. Open Repositories 2013, Prince Edward Island, July 11th, 2013


  1. Extending an atomistic Fedora Commons object model to facilitate image segmentation and enhance discovery. David Lacy (david.lacy@villanova.edu), Villanova University. Open Repositories 2013, Prince Edward Island, July 11th, 2013

  2. digital.library.villanova.edu
     ● Our repository has large amounts of scanned/paginated resources
       – Books
       – Manuscripts
       – Newspapers
       – Theses
       – Scrapbooks
       – etc.

  3. Topics
     ● Existing Model, Hierarchy and View
     ● Extensions
       – Image Segmentation
       – Page Level Search Results

  4. Basic Model (diagram labels: Collection, Core, Data)

  5. Enhanced Model (diagram labels: Folder, Resource, Collection, List, Core, Image, Data, Document, Audio, Video)

  6. Object Hierarchy (rel:isMemberOf)
     ● Dime Novel Collection (Folder)
       – Bride of the Tomb (Resource)
         – Page 1 (Image)
         – Page 2 (Image)
         – Page 3 (Image)

  7. Hierarchy with multiple relationships (1) (rel:isMemberOf; diagram nodes: Dime Novel Collection (Folder), Series List (Folder), Buffalo Bill (Folder), Fiction (Folder))

  8. Hierarchy with multiple relationships (2) (rel:isMemberOf)
     ● Dime Novel Collection (Folder)
       – Bride of the Tomb (Resource)
         – Page Images (List): Page 1 (Image), Page 2 (Image), Page 3 (Image)
         – Chapters (List): Chapter 1 (List), Chapter 2 (List), which contain Page 33 (Image), Page 34 (Image), Page 35 (Image)

  9. Basic Object Hierarchy in Solr
     ● Objects included in Solr
       – Resource Objects
       – Folder Objects
     ● Each Solr Record includes parent record ID(s)
       – Facilitates browsing collections
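As a rough illustration of how parent record IDs make browsing possible, the sketch below models a few Solr-style records as plain dictionaries and looks up the children of a folder. The field names (`id`, `type`, `parents`), the `demo:` identifiers, and the second membership of "Bride of the Tomb" are assumptions made for the example, not the repository's actual schema or data.

```python
# Hypothetical records: field names and IDs are illustrative only.
records = [
    {"id": "demo:1", "title": "Dime Novel Collection", "type": "Folder", "parents": []},
    {"id": "demo:2", "title": "Fiction", "type": "Folder", "parents": ["demo:1"]},
    # A record may list more than one parent (multiple memberships).
    {"id": "demo:3", "title": "Bride of the Tomb", "type": "Resource",
     "parents": ["demo:1", "demo:2"]},
]


def children_of(parent_id, records):
    """Return the IDs of all records that list parent_id among their parents."""
    return [r["id"] for r in records if parent_id in r["parents"]]


# Browsing a folder is then a lookup on the parent field.
top_level = children_of("demo:1", records)
fiction = children_of("demo:2", records)
```

In a live index the same lookup would be a query against the parent field rather than a list scan.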

  10. Browse Hierarchy

  11. Browse Hierarchy

  12. Browse Hierarchy Tree

  13. Search Resources and Folders

  14. Moving forward... We have a large amount of scanned pages

  15. That is, we have lots of stuff that looks like this

  16. We want to expose this

  17. But I want to work on this instead

  18. The Plan
     ● Define segments of Images and extract them to create new objects
     ● Create new Article Resources from these new images

  19. Image Object
     ● Composed using Fedora's “Mixed-in” approach, combining the following models:
       – Core Model
       – Data Model
       – Image Model

  20. Core Model
     ● Datastreams
       – THUMBNAIL
       – PARENT-LIST
     ● Methods
       – getThumb
       – generateParentList

  21. Data Model
     ● Datastreams
       – MASTER
       – MASTER-MD
     ● Methods
       – generateMetadata

  22. Image Data Model
     ● Datastreams
       – LARGE
       – MEDIUM
       – OCR-DIRTY
     ● Methods
       – generateDerivative
       – generateOCR

  23. Image Object
     ● Datastreams
       – THUMBNAIL
       – PARENT-LIST
       – MASTER
       – MASTER-MD
       – MEDIUM
       – LARGE
       – OCR-DIRTY
     ● Methods
       – getThumb
       – generateParentList
       – generateMetadata
       – generateDerivative
       – generateOCR

  24. Segment Image: Extension of Image Object
     ● Composed using Fedora's “Mixed-in” approach, combining the following:
       – Core Model
       – Data Model
       – Image Model
       – Segment Model

  25. Segment Image Model – Part 1: New elements
     ● Datastreams
       – COORDINATES
     ● Methods
       – generateSegment

  26. Segment Object
     ● Datastreams
       – THUMBNAIL
       – PARENT-LIST
       – MASTER
       – MASTER-MD
       – MEDIUM
       – LARGE
       – OCR-DIRTY
       – COORDINATES
     ● Methods
       – getThumb
       – generateParentList
       – generateMetadata
       – generateDerivative
       – generateOCR
       – generateSegment
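The way the Image and Segment objects accumulate datastreams and methods from their component models can be sketched as a simple merge. This is only an illustration of the composition idea: the datastream and method names come from the slides, but the dictionary layout and the `mix_in` helper are hypothetical and do not represent how Fedora content models are actually declared.

```python
# Each model contributes datastreams and methods (names taken from the slides).
CORE = {"datastreams": ["THUMBNAIL", "PARENT-LIST"],
        "methods": ["getThumb", "generateParentList"]}
DATA = {"datastreams": ["MASTER", "MASTER-MD"],
        "methods": ["generateMetadata"]}
IMAGE = {"datastreams": ["LARGE", "MEDIUM", "OCR-DIRTY"],
         "methods": ["generateDerivative", "generateOCR"]}
SEGMENT = {"datastreams": ["COORDINATES"],
           "methods": ["generateSegment"]}


def mix_in(*models):
    """Hypothetical helper: merge models in order, skipping duplicate names."""
    combined = {"datastreams": [], "methods": []}
    for model in models:
        for key in combined:
            for name in model[key]:
                if name not in combined[key]:
                    combined[key].append(name)
    return combined


image_object = mix_in(CORE, DATA, IMAGE)
# The Segment object is the Image object plus the Segment model's additions.
segment_object = mix_in(CORE, DATA, IMAGE, SEGMENT)
```

Under this sketch, the Segment object differs from the Image object only by the COORDINATES datastream and the generateSegment method, which matches the two slides.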

  27. Segment Image Model – Part 2: New relationship
     ● rel:isPartOf
       – Article Segment 1 (Segment) rel:isPartOf Page 1 (Image)
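A relationship such as rel:isPartOf is typically recorded as RDF in an object's RELS-EXT datastream. The sketch below builds such a fragment with Python's standard library; the `rel` namespace URI is Fedora's relations-external ontology, while the `build_rels_ext` helper and the `demo:` PIDs are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(pid, relationships):
    """Hypothetical helper: serialize (predicate, target PID) pairs as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    for predicate, target_pid in relationships:
        ET.SubElement(
            description, f"{{{REL_NS}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target_pid}"},
        )
    return ET.tostring(root, encoding="unicode")


# A segment that is part of a page image (PIDs are made up).
rels_ext = build_rels_ext("demo:segment1", [("isPartOf", "demo:page1")])
```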

  28. Hierarchy of Segmented Images
     ● March 2003 (Resource)
       – Page List (List)
         – Page 1 (Image)
           – Article A (Segment), rel:isPartOf
           – Article B (Segment), rel:isPartOf

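  29. Segment Image Model – Part 3: Creating a new MASTER datastream
     ● Article Segment 1 (Segment) rel:isPartOf Page 1 (Image)
     ● generateSegment takes the MASTER of Page 1 (Image) and the COORDINATES of Article Segment 1 (Segment) and produces the Segment's own MASTER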
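The generateSegment step can be pictured as a crop of the page's MASTER using the stored coordinates. The slides do not describe the COORDINATES format or the imaging tools used, so the sketch below assumes a rectangle given as `x`, `y`, `width`, `height` and represents the image as a nested list of pixel values rather than a real image file.

```python
def generate_segment(master_pixels, coords):
    """Crop a rectangular region out of a row-major grid of pixels.

    The coordinate keys (x, y, width, height) are an assumed format,
    not the repository's actual COORDINATES datastream.
    """
    x, y = coords["x"], coords["y"]
    w, h = coords["width"], coords["height"]
    rows = len(master_pixels)
    cols = len(master_pixels[0]) if rows else 0
    if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > cols or y + h > rows:
        raise ValueError("segment coordinates fall outside the page image")
    return [row[x:x + w] for row in master_pixels[y:y + h]]


# A tiny 4-row by 5-column stand-in for a page MASTER.
page_master = [[r * 10 + c for c in range(5)] for r in range(4)]
segment_master = generate_segment(
    page_master, {"x": 1, "y": 2, "width": 3, "height": 2}
)
```

The resulting grid would then be stored as the new Segment object's MASTER, from which the usual derivatives and OCR could be generated.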

  30. Interface for generating COORDS

  31. Image MASTER / Segment MASTER

  32. Segment Object
     ● Datastreams
       – THUMBNAIL
       – PARENT-LIST
       – MASTER
       – MASTER-MD
       – MEDIUM
       – LARGE
       – OCR-DIRTY
       – COORDINATES

  33. Segments within a Resource (rel:isMemberOf)
     ● Taj Mahal Interview (Resource)
       – Segment List (List)
         – Part 1 (Segment)
         – Part 2 (Segment)
         – Part 3 (Segment)

  34. Complex Object Hierarchy
     ● March 2003 (Folder)
       – Page List (List): Page 1 (Image), Page 2 (Image), Page 3 (Image)
       – Article List (List)
         – Taj Mahal Interview (Resource)
           – Segment List (List): Part 1 (Segment), Part 2 (Segment)
     ● Segments are linked to page Images with rel:isPartOf

  35. Resource with multiple List Objects

  36. Article List Expanded

  37. Pages List Expanded

  38. Front End / Solr

  39. Current Solr Result Set: Folders and Resources
     ● Record: PID = Resource
     ● Record: PID = Resource
     ● Record: PID = Folder
     ● Record: PID = Resource

  40. Front End: Existing Results

  41. Front End: Existing Results

  42. This works but, as mentioned before, matching text on page 30 returns the entire Resource

  43. Expose page-specific matches by ingesting data objects too

  44. Total Objects
     ● 18,000+ Resource Objects
     ● 600+ Folder Objects
     ● 220,000+ Data Objects

  45. Solr Field Collapsing
     ● Group results based on a shared Solr field
       – <parentGroup/>
     ● Data Objects
       – <parentGroup/> = Parent Resource
     ● Folders and Resources
       – <parentGroup/> = Self
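To make the grouping concrete, here is a minimal sketch of both halves: the query string that asks Solr to group on a field, and a local stand-in for what the grouping does to a hit list. `group`, `group.field`, and `group.limit` are standard Solr result-grouping parameters and `parentGroup` is the field named on the slide; the helper names, the search term, and the `demo:` IDs are assumptions for illustration.

```python
from collections import OrderedDict
from urllib.parse import urlencode


def grouped_query(text, limit=3):
    """Build request parameters that group hits on the parentGroup field."""
    return urlencode({
        "q": text,
        "group": "true",
        "group.field": "parentGroup",
        "group.limit": limit,  # records returned per group
    })


def collapse(hits):
    """Local illustration of grouping: bucket hit IDs by their parentGroup value."""
    groups = OrderedDict()
    for hit in hits:
        groups.setdefault(hit["parentGroup"], []).append(hit["id"])
    return groups


# Made-up hits: a page (Data Object) points at its Resource; Folders and
# Resources point at themselves.
hits = [
    {"id": "demo:page30", "parentGroup": "demo:resource1"},
    {"id": "demo:resource1", "parentGroup": "demo:resource1"},
    {"id": "demo:folder1", "parentGroup": "demo:folder1"},
]
params = grouped_query("tomb")
groups = collapse(hits)
```

Each group then corresponds to one displayed result, and the page-level records inside it can link straight to the matching page.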

  46. Collapsed Solr Result Set: Folders, Resources, and Data Objects
     ● Display Groups as search results instead of Records
     ● Records within Groups can direct patrons to specific pages within Resources
     ● Diagram
       – Group: PID = Resource (Record / Image, Record / Image)
       – Group: PID = Resource (Record / Image, Record / Image)
       – Group: PID = Resource (Record / Resource)

  47. Advanced Solr Results

  48. Taj Mahal Interview

  49. Taj Mahal Interview

  50. March Issue, page 27

  51. Lists in Accordion

  52. Lists in Accordion

  53. Hangups
     ● Null Resource hit on query
     ● Multiple collection memberships in Solr
       – Cannot sort on a multi-value field

  54. Acknowledgments
     ● Demian Katz, Villanova University
     ● Chris Hallberg, Villanova University
     ● Eoghan Ó Carragáin, National Library of Ireland
