Automating Authority Work

Mike Monaco, Coordinator, Cataloging Services. May 14, 2018.

Automating authority work, or, Be your own authority control vendor
Ohio Valley Group of Technical Services Librarians 2018 Conference, May 13-15, 2018
Hesburgh Libraries, The University of Notre Dame, South Bend, Indiana
Mike Monaco, Coordinator, Cataloging Services, The University of Akron
mmonaco@uakron.edu

Who are you?
● John Carroll University (2001-2004): Part-time AV cataloger
● Akron-Summit County Public Library (2001-2004): Substitute librarian
● Cleveland Public Library (2004-2016): Catalog librarian
● The University of Akron (2016- ): Coordinator, Cataloging Services

The University of Akron Libraries
University Libraries
● Bierce Library
● Science & Technology Library
● Archival Services
(Separate Units)
● Wayne College Library
● Akron Law Library
● Center for the History of Psychology

Authority control at UA
● 1995 migration and vendor (BNA) supplied one-time authority processing
● Local authority work put on hold in expectation of contracting with a vendor… which never happened
● Authority work resumed early 2000s
○ Full authority control for tangible items only
○ Shift to batches of e-resources over time made authority work for batches overwhelming
○ 2013: Budget 80:20 electronic:tangible
○ 2018: ratio is about 95:5

What this is NOT about
● Automated authority control within the ILS
● Working with an authority control vendor

What this IS
● Grabbing the “low-hanging fruit” for batches of records
● For when traditional authority work is not practical (the item is not in hand, or headings reports are too vast to address individually)

Wouldn’t it be nice if... The “Headings used for the first time” report could export a list of the headings, and we could batch search OCLC for records?

The tool box
● MarcEdit
● OCLC Connexion Client
● Excel (or other program for sorting textual lists)
● pgAdmin (or similar for a SQL query, III/Sierra only)
● A rudimentary grasp of Regular Expressions
● EditPad (or similar RegEx-compatible text editor: Google Sheets, EmEditor)

The process
1. Before loading, correct variant headings (with MarcEdit)
2. After loading, extract headings from report (with SQL query or ILS’s output)
3. Separate names and subjects (in a spreadsheet or text editor)
4. Remove extraneous data (with RegEx-capable editor)
5. Batch search for authority records (in Connexion Client)
6. Load authority records

Validate Headings MarcEdit can check name and subject fields against LC authorities in the Linked Data Service, and automatically correct headings that match a variant (“Use for”) heading*. *NB: The process is imperfect!

Because this is an extra step, we’ve been comparing record sets from various vendors to determine which ones really benefit.

Selected Vendor Loads (March 2017-March 2018)
Record source            Records per load   Invalid headings per load   Variants changed per load   Invalid heading:record ratio   Variant:record ratio
Alexander Street Press   293                117                         18                          0.399279                       0.060566
EBSCO                    76992              75668                       181                         0.982807                       0.002348
Films on Demand          2509               1019                        175                         0.406314                       0.069911
Kanopy                   9960               4946                        397                         0.496628                       0.039815
Proquest EBC             13086              2309                        101                         0.17647                        0.0077
World Share              31                 7                           0.7                         0.232114                       0.023772

III Sierra

SQL query of Headings used for the first time report https://mmonaco-uakron.tinytake.com/sf/MjUwMDQxMF83NTIyNTY0

Headings used for the first time Hundreds or even thousands of entries after batch loads...

SQL query*

Results...

In Excel... Be sure to import as Unicode (UTF-8) if your ILS is encoding characters as Unicode rather than MARC8!

Sort the terms
Sorting A-Z arranges the headings by field group tag and MARC tag (a=names, b=other names, d=subject). So:
● a100-b730: names used as names
● d600-d630: names used as subjects
● d650- : subjects
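The same split can be scripted instead of done by hand in a spreadsheet. The following is a minimal sketch, assuming each exported line starts with the field group tag (a, b, or d) followed directly by the three-digit MARC tag; the function names and the sample lines are made up for illustration, and the real export layout may differ.

```python
import re

# Assumed line shape: field group tag (a, b, d) + three-digit MARC tag, then the heading.
TAGGED = re.compile(r"^([abd])(\d{3})")

def classify(line):
    """Return 'names', 'names_as_subjects', 'subjects', or None for one report line."""
    m = TAGGED.match(line)
    if not m:
        return None
    group, tag = m.group(1), int(m.group(2))
    if group in "ab":
        return "names"               # a100-b730: names used as names
    if 600 <= tag <= 630:
        return "names_as_subjects"   # d600-d630: names used as subjects
    if tag >= 650:
        return "subjects"            # d650 and up: subjects
    return None

def split_headings(lines):
    """Sort report lines into the three groups, skipping lines that fit none."""
    buckets = {"names": [], "names_as_subjects": [], "subjects": []}
    for line in lines:
        kind = classify(line)
        if kind:
            buckets[kind].append(line)
    return buckets

# Made-up sample lines, not real report output.
sample = [
    "a100 1 |aSmith, John",
    "b700 1 |aDoe, Jane",
    "d600 10|aTwain, Mark",
    "d650 0|aCataloging",
]
buckets = split_headings(sample)
```

Each group can then be written to its own text file for the clean-up step.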

Notice You can’t feed this raw data into a batch search in Connexion Client

In EditPad (or other RegEx-enabled editor) Strip out MARC tags, delimiters, punctuation, etc.

Find/replace using RegEx
(.*\|a)
(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)
(\|e.*|\|4.*|\|0.*)
(\|x.*|\|v.*|\|z.*)
(\|.|\|$)
(;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near )

Names (.*\|a) Everything before |a

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) AACR2 abbreviations b. d. fl. ca.

(\|e.*|\|4.*|\|0.*) Relator terms, URIs

(\|.|\|$) Any remaining delimiters and subfield codes

(;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’ | ‘| be | that |\.{3}| near ) Punctuation, operators, and stopwords that foil OCLC searches

Names as subjects (\|x.*|\|v.*|\|z.*) Subdivisions
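The find/replace passes described above could also be run as a script. This is a rough sketch, not the presenter's exact method: it assumes every match is replaced with a single space and that runs of spaces are collapsed afterwards (the slides do not state the replacement text). The two `$` alternatives in the punctuation pattern are left out because their intended form is unclear, and the sample lines in the usage note are invented.

```python
import re

# Patterns from the slides, applied in the order shown.
PATTERNS = [
    r".*\|a",                                   # everything before |a
    r"\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.",  # b. d. fl. ca.
    r"\|e.*|\|4.*|\|0.*",                       # relator terms, URIs
    r"\|x.*|\|v.*|\|z.*",                       # subdivisions
    r"\|.|\|$",                                 # remaining delimiters and subfield codes
    r";|:|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by "
    r"|”|’ | ‘| be | that |\.{3}| near ",       # punctuation, operators, stopwords
]

def clean_heading(line):
    """Strip tags, delimiters, and search-breaking words from one heading line."""
    # Assumption: each match becomes one space; whitespace is collapsed at the end.
    for pattern in PATTERNS:
        line = re.sub(pattern, " ", line)
    return " ".join(line.split())
```

For example, `clean_heading("a100 1 |aSmith, John,|d1950-|eauthor")` gives `Smith John 1950-`, and `clean_heading("d600 10|aTwain, Mark,|d1835-1910|xCriticism and interpretation.")` gives `Twain Mark 1835-1910`.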

Converting SQL output to batch searchable text file with RegEx https://mmonaco-uakron.tinytake.com/sf/MjU4ODk4OF83Nzg3NTMy

Name headings

(.*\|a)

(\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.)

(\|e.*|\|4.*|\|0.*)

(\|.|\|$)

(;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

Names (can be skipped) (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

Names as subjects (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

Topical subjects (can be skipped) (.*\|a) (\|db\. ca\. |\|db\. |\|d\. ca\.|\|dd\. |\|dca\. |-ca\. |\|dfl\. ca\. |\|dfl\.) (\|e.*|\|4.*|\|0.*) (\|x.*|\|v.*|\|z.*) (\|.|\|$) (;|:|$|$|\?| and | or |&c\.|&| in | an |,| the | for | on | so | with | to | by |”|’| be | that |\.{3}| near )

Sirsi/Dynix Symphony

List unauthorized tags report

Slightly different procedure to clean these up
1. Open .txt file in editor
2. Delete header of report
3. Find/Replace to delete page headers (“Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017”)
4. Separate name and topical headings
5. RegEx to remove other data
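Step 3 can be sketched in a few lines. This assumes each page header sits on a line of its own and follows the wording quoted above, with only the timestamp varying; the helper name and the sample report text are hypothetical.

```python
import re

# Assumption: a page header is a whole line containing this wording;
# whatever follows "Produced on" (the timestamp) is allowed to vary.
PAGE_HEADER = re.compile(r"^.*Tags With UNAUTHORIZED Headings\s*/\s*Produced on.*$")

def drop_page_headers(text):
    """Return the report text with every page-header line removed."""
    kept = [line for line in text.splitlines() if not PAGE_HEADER.match(line)]
    return "\n".join(kept)

# Made-up report fragment for illustration only.
report = (
    "Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017\n"
    "100 |aSmith, John|?UNAUTHORIZED\n"
    "   Tags With UNAUTHORIZED Headings / Produced on Sat Jul 1 17:00:11 2017\n"
    "650 |aCats|?UNAUTHORIZED"
)
body = drop_page_headers(report)
```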

So far so good... (.*\|a)

Uh oh... (\|e.*|\|4.*|\|0.*) Misses “|?UNAUTHORIZED” by itself. Only captures it if preceded by |e |4 |0

(\|e.*|\|4.*|\|0.*|\|\?.*) \|\?.* captures “|?” followed by anything
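The difference between the two patterns is easy to check on a sample line. A small sketch follows; the heading is made up, and matches are simply deleted.

```python
import re

ORIGINAL = r"\|e.*|\|4.*|\|0.*"          # relator terms and URIs only
EXTENDED = r"\|e.*|\|4.*|\|0.*|\|\?.*"   # also catches "|?" followed by anything

line = "Smith, John|?UNAUTHORIZED"       # made-up sample line

before = re.sub(ORIGINAL, "", line)      # "|?UNAUTHORIZED" survives
after = re.sub(EXTENDED, "", line)       # "|?UNAUTHORIZED" is removed
```

Here `before` is still `Smith, John|?UNAUTHORIZED`, while `after` is `Smith, John`.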

Problems

Spaces

Scraps Portions of “|?UNAUTHORIZED” that wrapped to new line

Asterisks The report output replaces any diacritics with asterisks

Line breaks Name/Title headings are especially likely to get broken up. Here, the delimiter was even separated from the subfield code “t”

Workaround
1. FIND: \* REPLACE: [nothing] to delete asterisks
2. Use EditPad’s “Extras” to delete blank lines, duplicate lines, etc.
3. Depending on the number of items, you might close up split lines by hand.
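For anyone without EditPad, steps 1 and 2 can be approximated in a short script. This is a sketch under assumptions: it also trims leading and trailing spaces (not one of the listed steps), and it does not attempt step 3, rejoining split lines. The function name and sample headings are invented.

```python
def tidy(lines):
    """Delete asterisks, then drop blank and duplicate lines, keeping first-seen order."""
    seen = set()
    result = []
    for line in lines:
        # Step 1: remove asterisks; also trim stray spaces (an added assumption).
        line = line.replace("*", "").strip()
        # Step 2: skip blank lines and lines already kept.
        if line and line not in seen:
            seen.add(line)
            result.append(line)
    return result

# Made-up sample, with asterisks standing where diacritics were lost.
cleaned = tidy(["Dvo*rak, Anton*in", "", "Dvorak, Antonin", "   ", "Bach, Johann Sebastian"])
```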

Searching in batches

Searching a batch of terms in Connexion Client https://mmonaco-uakron.tinytake.com/sf/MjU4OTAyMF83Nzg3NjE2

Batch searching
“Use default index” settings:
● nw: for names/titles
● su: for topics/geographic terms
Maximum number of matches to download: 1 (Tools>Options>Batch)

Batch searching NOTE: Your local save file has a maximum capacity of 10,000 records, so don’t search more than that many strings!
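One way to stay under that limit is to cut a long list of search strings into pieces before searching. A minimal sketch, with a hypothetical helper name and generated placeholder terms:

```python
def chunk(terms, size=10_000):
    """Split a list of search strings into consecutive pieces of at most `size` items."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]

# Placeholder terms, only to show the sizes of the resulting pieces.
pieces = chunk([f"term {n}" for n in range(25_000)])
sizes = [len(p) for p in pieces]
```

With 25,000 terms this yields pieces of 10,000, 10,000, and 5,000, each of which can be saved as its own text file and searched as a separate batch.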

Successful name searches (of 1941 entries)

Names, names as subjects, and subjects III requires name headings that are to be used as subjects to be loaded separately from name headings to be used as names! SirsiDynix does not have this issue.

A four month test
                    Total headings extracted   Hits in batch search   Success rate (ARs found for heading)
Names               36,244                     21,760                 60 %
Names as Subjects   3,795                      896                    23.6 %
Subjects            29,147                     1,516                  5.2 %
I’m very pleased with hit rate on names!
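The success rates are simply hits divided by total headings extracted. A quick check of the arithmetic, using the figures from the table (the variable names are illustrative):

```python
# (total headings extracted, hits in batch search) from the four month test
results = {
    "Names": (36_244, 21_760),
    "Names as Subjects": (3_795, 896),
    "Subjects": (29_147, 1_516),
}

# Success rate as a percentage, rounded to one decimal place.
rates = {kind: round(100 * hits / total, 1) for kind, (total, hits) in results.items()}
```

This reproduces the reported 60 %, 23.6 %, and 5.2 %.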

A four month test (same figures as on the previous slide)
Main issue: Name/Title headings often not established

A four month test (same figures as on the previous slides; hits are unique hits in batch search)
Main issues:
● Sierra treats subdivided subject headings as single headings, inflating report (3.4 should correct this issue!)
● Music headings (instruments, Arranged) often valid but not established in an AR
