
Automating authority work, or, Be your own authority control vendor

Mike Monaco, Coordinator, Cataloging Services. Ohio Valley Group of Technical Services Librarians 2018 Conference. May 14, 2018.


  1. Mike Monaco Coordinator, Cataloging Services May 14, 2018 Automating Authority Work

  2. Automating authority work, or, Be your own authority control vendor
     Ohio Valley Group of Technical Services Librarians 2018 Conference, May 13-15, 2018, Hesburgh Libraries, The University of Notre Dame, South Bend, Indiana
     Mike Monaco, Coordinator, Cataloging Services, The University of Akron, mmonaco@uakron.edu

  3. Who are you?
     ● John Carroll University (2001-2004): Part-time AV cataloger
     ● Akron-Summit County Public Library (2001-2004): Substitute librarian
     ● Cleveland Public Library (2004-2016): Catalog librarian
     ● The University of Akron (2016- ): Coordinator, Cataloging Services

  4. The University of Akron Libraries
     University Libraries:
     ● Bierce Library
     ● Wayne College Library
     ● Science & Technology Library
     Separate units:
     ● Akron Law Library
     ● Archival Services
     ● Center for the History of Psychology

  5. Authority control at UA
     ● 1995 migration and vendor (BNA) supplied one-time authority processing
     ● Local authority work put on hold in expectation of contracting with a vendor… which never happened
     ● Authority work resumed early 2000s
       ○ Full authority control for tangible items only
       ○ Shift to batches of e-resources over time made authority work for batches overwhelming
       ○ 2013: Budget 80:20 electronic:tangible
       ○ 2018: ratio is about 95:5

  6. What this is NOT about
     ● Automated authority control within the ILS
     ● Working with an authority control vendor

  7. What this IS
     ● Grabbing the “low-hanging fruit” for batches of records
     ● When traditional authority work is not practical (the item is not in-hand or headings reports are too vast to address individually)

  8. Wouldn’t it be nice if... The “Headings used for the first time” report could export a list of the headings, and we could batch search OCLC for records?

  9. The tool box
     ● MarcEdit
     ● OCLC Connexion Client
     ● Excel (or other program for sorting textual lists)
     ● pgAdmin (or similar for a SQL query, III/Sierra only)
     ● A rudimentary grasp of Regular Expressions
     ● EditPad (or similar RegEx-compatible text editor: Google Sheets, EmEditor)

  10. The process
     1. Before loading, correct variant headings (with MarcEdit)
     2. After loading, extract headings from report (with SQL query or ILS’s output)
     3. Separate names and subjects (in a spreadsheet or text editor)
     4. Remove extraneous data (with RegEx-capable editor)
     5. Batch search for authority records (in Connexion Client)
     6. Load authority records

  11. Validate Headings MarcEdit can check name and subject fields against LC authorities in the Linked Data Service, and automatically correct headings that match a variant (“Use for”) heading*. *NB: The process is imperfect!

  12. Because this is an extra step, we’ve been comparing record sets from various vendors to determine which ones really benefit.

  13. Selected Vendor Loads (March 2017-March 2018)

      Record source            Records per load   Invalid headings per load   Variants changed per load   Invalid heading: Record ratio   Variant: Record ratio
      Alexander Street Press   293                117                         18                          0.399279                        0.060566
      EBSCO                    76992              75668                       181                         0.982807                        0.002348
      Films on Demand          2509               1019                        175                         0.406314                        0.069911
      Kanopy                   9960               4946                        397                         0.496628                        0.039815
      Proquest EBC             13086              2309                        101                         0.17647                         0.0077
      World Share              31                 7                           0.7                         0.232114                        0.023772

  14. III Sierra

  15. SQL query of Headings used for the first time report https://mmonaco-uakron.tinytake.com/sf/MjUwMDQxMF83NTIyNTY0

  16. Headings used for the first time Hundreds or even thousands of entries after batch loads...

  17. SQL query*

  18. Results...

  19. In Excel... Be sure to import as Unicode (UTF-8) if your ILS is encoding characters as Unicode rather than MARC8!

  20. Sort the terms
      Sorting A-Z arranges the headings by field group tag and MARC tag (a=names, b=other names, d=subject). So:
      ● a100-b730: names used as names
      ● d600-d630: names used as subjects
      ● d650- : subjects
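
The grouping described on this slide can also be scripted instead of sorted by eye. The following is a minimal sketch, not the presenter's method. It assumes each exported line begins with the one-letter field group tag immediately followed by the three-digit MARC tag (for example `a100`); the function names and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: field group tag + MARC tag at the start, e.g. "a100 ..."
LINE_START = re.compile(r"^([a-z])(\d{3})")


def classify(line):
    """Return the bucket a heading line belongs to, following the slide's ranges."""
    match = LINE_START.match(line)
    if not match:
        return "other"
    group, tag = match.group(1), int(match.group(2))
    if group in ("a", "b"):
        return "names"                 # a100-b730: names used as names
    if group == "d" and 600 <= tag <= 630:
        return "names_as_subjects"     # d600-d630: names used as subjects
    if group == "d" and tag >= 650:
        return "subjects"              # d650 and up: subjects
    return "other"


def split_headings(lines):
    """Group heading lines into buckets keyed by classify()."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[classify(line)].append(line)
    return dict(buckets)


# Hypothetical sample lines, for illustration only
sample = [
    "a100 1 |aTwain, Mark,|d1835-1910",
    "b700 1 |aDoe, Jane",
    "d600 10|aShakespeare, William,|d1564-1616",
    "d650  0|aCataloging",
]
groups = split_headings(sample)
```

Each bucket can then be written to its own text file for the clean-up steps that follow.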

  21. Notice You can’t feed this raw data into a batch search in Connexion Client

  22. In EditPad (or other RegEx-enabled editor) Strip out MARC tags, delimiters, punctuation, etc.

  23. Find/replace using RegEx
      (.*\|a)
      (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)
      (\|e.*|\|4.*|\|0.*)
      (\|x.*|\|v.*|\|z.*)
      (\|.|\|$)
      (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near )

  24. Names
      (.*\|a)
      Matches everything up to and including |a

  25. (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)
      Matches the AACR2 abbreviations b., d., fl., and ca.

  26. (\|e.*|\|4.*|\|0.*)
      Matches relator terms and URIs

  27. (\|.|\|$)
      Matches any remaining delimiters and subfield codes

  28. (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near )
      Matches punctuation, operators, and stopwords that foil OCLC searches

  29. Names as subjects
      (\|x.*|\|v.*|\|z.*)
      Matches subdivisions
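
The same find/replace passes could be run in a script rather than an editor. This is a rough sketch under stated assumptions: the patterns are copied from the slides, but the replacement strings are guesses (nothing for the first pattern, a single space for the rest), the final whitespace collapse is an added convenience, and the function name and sample headings are hypothetical.

```python
import re

# (pattern, replacement) pairs in the order given on the slides.
# The replacement strings are assumptions; the slides only show the patterns.
STEPS = [
    (r"(.*\|a)", ""),
    (r"(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)", " "),
    (r"(\|e.*|\|4.*|\|0.*)", " "),
    (r"(\|x.*|\|v.*|\|z.*)", " "),
    (r"(\|.|\|$)", " "),
    (r"(;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by "
     r"|”|’ | ‘| be | that |\.{3}| near )", " "),
]


def clean_heading(line):
    """Apply each find/replace pass in turn, then collapse runs of whitespace."""
    for pattern, replacement in STEPS:
        line = re.sub(pattern, replacement, line)
    return " ".join(line.split())


# Hypothetical report lines, for illustration only
print(clean_heading("a100 1 |aTwain, Mark,|d1835-1910|eauthor"))
print(clean_heading("d600 10|aShakespeare, William,|d1564-1616|xCriticism and interpretation."))
```

Running every pass on every line is harmless for names (the subdivision pattern simply finds nothing), which is one reading of why some passes "can be skipped" for a given group of headings.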

  30. Converting SQL output to batch searchable text file with RegEx https://mmonaco-uakron.tinytake.com/sf/MjU4ODk4OF83Nzg3NTMy

  31. Name headings

  32. (.*\|a)

  33. (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

  34. (\|e.*|\|4.*|\|0.*)

  35. (\|.|\|$)

  36. (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

  37. Names (can be skipped) (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

  38. Names as subjects (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

  39. Topical subjects (can be skipped) (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|\(|\)|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

  40. Sirsi/Dynix Symphony

  41. List unauthorized tags report

  42. Slightly different procedure to clean these up
     1. Open .txt file in editor
     2. Delete header of report
     3. Find/Replace to delete page headers (“Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017”)
     4. Separate name and topical headings
     5. RegEx to remove other data
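
Step 3 can be sketched in a few lines of code. This assumes the page header appears on lines that begin with "Tags With UNAUTHORIZED Headings" or "Produced on " (the exact layout of the Symphony report is not shown in the slides); the function name and sample lines are hypothetical.

```python
import re

# Assumed header layout: any line starting with one of these two phrases
PAGE_HEADER = re.compile(r"^\s*(Tags With UNAUTHORIZED Headings|Produced on )")


def strip_page_headers(lines):
    """Drop repeated page-header lines and keep everything else in order."""
    return [line for line in lines if not PAGE_HEADER.match(line)]


# Hypothetical report excerpt, for illustration only
report = [
    "Tags With UNAUTHORIZED Headings",
    "Produced on Sat Jul 1 17:00:11 2017",
    "100: |aDoe, Jane|?UNAUTHORIZED",
    "Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017",
    "650: |aCataloging|?UNAUTHORIZED",
]
headings = strip_page_headers(report)
```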

  43. So far so good... (.*\|a)

  44. Uh oh... (\|e.*|\|4.*|\|0.*) Misses “|?UNAUTHORIZED” by itself. Only captures it if preceded by |e |4 |0

  45. (\|e.*|\|4.*|\|0.*|\|\?.*) \|\?.* captures “|?” followed by anything
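
A quick way to see the difference between the two patterns is to run both against the same line. The sketch below uses a hypothetical sample heading, and the empty replacement string is an assumption.

```python
import re

OLD = re.compile(r"(\|e.*|\|4.*|\|0.*)")
NEW = re.compile(r"(\|e.*|\|4.*|\|0.*|\|\?.*)")

# Hypothetical line with no |e, |4, or |0 before the "|?UNAUTHORIZED" flag
sample = "Smith, John,|d1950-|?UNAUTHORIZED"

old_result = OLD.sub("", sample)  # flag survives: nothing in OLD matches "|?"
new_result = NEW.sub("", sample)  # flag removed by the added \|\?.* alternative

print(old_result)
print(new_result)
```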

  46. Problems

  47. Spaces

  48. Scraps Portions of “|?UNAUTHORIZED” that wrapped to new line

  49. Asterisks
      The output replaces any diacritics with asterisks

  50. Line breaks Name/Title headings are especially likely to get broken up. Here, the delimiter was even separated from the subfield code “t”

  51. Workaround
     1. FIND: \* REPLACE: [nothing] to delete asterisks
     2. Use EditPad’s “Extras” to delete blank lines, duplicate lines, etc.
     3. Depending on the number of items, you might close up split lines by hand.
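
For anyone without EditPad, the first two workaround steps can be approximated in a short script. This is only a sketch: the function name and sample lines are hypothetical, and closing up split lines (step 3) is still left to manual review.

```python
def tidy(lines):
    """Delete asterisks, then drop blank and duplicate lines, keeping first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        line = line.replace("*", "").strip()
        if not line or line in seen:
            continue
        seen.add(line)
        cleaned.append(line)
    return cleaned


# Hypothetical lines, for illustration only
result = tidy(["Dvor*ak Antonin", "", "Dvor*ak Antonin", "   ", "Smith John"])
```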

  52. Searching in batches

  53. Searching a batch of terms in Connexion Client https://mmonaco-uakron.tinytake.com/sf/MjU4OTAyMF83Nzg3NjE2

  54. Batch searching
      “Use default index” settings:
      ● nw: for names/titles
      ● su: for topics/geographic terms
      Maximum number of matches to download: 1 (Tools>Options>Batch)

  55. Batch searching NOTE: Your local save file has a maximum capacity of 10,000 records, so don’t search more than that many strings!
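
One way to stay under that limit is to split a long list of search strings into smaller batches before searching. A minimal sketch, with a hypothetical function name and placeholder terms:

```python
def split_into_batches(terms, size=10_000):
    """Split a list of search strings into consecutive batches of at most `size`."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]


# Placeholder search strings, for illustration only
terms = [f"heading {n}" for n in range(25_000)]
batches = split_into_batches(terms)
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be saved as its own text file and searched separately.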

  56. Successful name searches (of 1941 entries)

  57. Names, names as subjects, and subjects III requires name headings that are to be used as subjects to be loaded separately from name headings to be used as names! SirsiDynix does not have this issue.

  58. A four month test

      Heading type        Total headings extracted   Hits in batch search   Success rate (ARs found for heading)
      Names               36,244                     21,760                 60 %
      Names as Subjects   3,795                      896                    23.6 %
      Subjects            29,147                     1,516                  5.2 %

      I’m very pleased with hit rate on names!

  59. A four month test

      Heading type        Total headings extracted   Hits in batch search   Success rate (ARs found for heading)
      Names               36,244                     21,760                 60 %
      Names as Subjects   3,795                      896                    23.6 %
      Subjects            29,147                     1,516                  5.2 %

      Main issue: Name/Title headings often not established

  60. A four month test

      Heading type        Total headings extracted   Unique hits in batch search   Success rate (ARs found for heading)
      Names               36,244                     21,760                        60 %
      Names as Subjects   3,795                      896                           23.6 %
      Subjects            29,147                     1,516                         5.2 %

      Main issues:
      ● Sierra treats subdivided subject headings as single headings, inflating report (3.4 should correct this issue!)
      ● Music headings (instruments, Arranged) often valid but not established in an AR
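
The success rates in these tables are hits divided by total headings extracted. A small sketch of that arithmetic, using the figures from the slides (the variable and function names are illustrative):

```python
# (total headings extracted, hits in batch search), taken from the table
results = {
    "Names": (36_244, 21_760),
    "Names as Subjects": (3_795, 896),
    "Subjects": (29_147, 1_516),
}


def success_rate(total, hits):
    """Percentage of extracted headings for which an authority record was found."""
    return round(hits / total * 100, 1)


rates = {kind: success_rate(total, hits) for kind, (total, hits) in results.items()}
for kind, rate in rates.items():
    print(f"{kind}: {rate} %")
```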
