College Center for Library Automation College Center for Library - - PowerPoint PPT Presentation

college center for library automation college center for
SMART_READER_LITE
LIVE PREVIEW

College Center for Library Automation College Center for Library - - PowerPoint PPT Presentation

College Center for Library Automation College Center for Library Automation Tallahassee, FL Tallahassee, FL Susan B. Campbell Susan B. Campbell (scampbell@cclaflorida.org scampbell@cclaflorida.org) ) ( Jim McGill Jim McGill


slide-1
SLIDE 1

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

College Center for Library Automation College Center for Library Automation Tallahassee, FL Tallahassee, FL

  • Susan B. Campbell

Susan B. Campbell ( (scampbell@cclaflorida.org scampbell@cclaflorida.org) )

  • Jim McGill

Jim McGill ( (jmcgill@cclaflorida.org jmcgill@cclaflorida.org) )

slide-2
SLIDE 2

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

automating retrieval and reporting of automating retrieval and reporting of database usage statistics for a consortium database usage statistics for a consortium

  • CCLA provides and maintains the Library Information Network

CCLA provides and maintains the Library Information Network for 28 Community Colleges (LINCC) for Florida's 65+ for 28 Community Colleges (LINCC) for Florida's 65+ community college libraries. community college libraries.

  • db statistics we

db statistics we’ ’re collecting and reporting re collecting and reporting

  • 19 vendors

19 vendors

  • ver 200 databases
  • ver 200 databases
  • monthly reports by database, campus, statewide

monthly reports by database, campus, statewide

  • n demand
  • n demand
  • customers for monthly reports

customers for monthly reports

  • 28 community colleges in Florida

28 community colleges in Florida

  • internal reports

internal reports

slide-3
SLIDE 3

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

automating retrieval and reporting of automating retrieval and reporting of database usage statistics for a consortium database usage statistics for a consortium

  • problem

problem

  • what we were doing and why it doesn

what we were doing and why it doesn’ ’t work t work

  • solution

solution

  • the pieces, the parts and how they fit together

the pieces, the parts and how they fit together

  • future

future

  • what we

what we’ ’ve learned and our expectations ve learned and our expectations

  • problem

problem

  • what we were doing and why it doesn

what we were doing and why it doesn’ ’t work t work

slide-4
SLIDE 4

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

the problem the problem

  • excel excess

excel excess

slide-5
SLIDE 5

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

the problem the problem

  • vendor variety

vendor variety

repeat 28 times or more for each vendor

(and sometimes each database)

slide-6
SLIDE 6

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-7
SLIDE 7

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-8
SLIDE 8

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

automating retrieval and reporting of automating retrieval and reporting of database usage statistics for a consortium database usage statistics for a consortium

  • problem

problem

  • what we were doing and why it doesn

what we were doing and why it doesn’ ’t work t work

  • solution

solution

  • the pieces, the parts and how they fit together

the pieces, the parts and how they fit together

  • future

future

  • what we

what we’ ’ve learned and our expectations ve learned and our expectations

slide-9
SLIDE 9

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

the solution the solution

  • automating

automating

  • maintenance utilities

maintenance utilities

  • handling retrieved data

handling retrieved data

  • reporting in multiple formats

reporting in multiple formats

  • retrieval of vendor data

retrieval of vendor data

slide-10
SLIDE 10

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

intranet web interface intranet web interface

slide-11
SLIDE 11

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-12
SLIDE 12

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-13
SLIDE 13

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

V e n d

  • r

n

  • t

r e s p

  • n

d i n g

slide-14
SLIDE 14

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-15
SLIDE 15

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-16
SLIDE 16

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-17
SLIDE 17

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-18
SLIDE 18

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-19
SLIDE 19

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-20
SLIDE 20

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-21
SLIDE 21

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

reporting reporting

slide-22
SLIDE 22

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-23
SLIDE 23

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

creating retrieval scripts creating retrieval scripts “ “nuts and bolts nuts and bolts” ”

slide-24
SLIDE 24

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-25
SLIDE 25

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

(Manual edits for testing & first cleanup – remove everything that isn’t in table. This is iterative and run from the command prompt until satisfactory file is returned.)

slide-26
SLIDE 26

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

This is a manual process to create the Perl script that will accept variables and create GetWebPage_VENDOR.pl

step 1. capture HTTP headers

Process Trace File (ParseHTTPTrace.pl) Generic Web Page retrieval (GetWebPage_VENDOR.pl)

slide-27
SLIDE 27

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

step 2. modify Perl script to accept command line variables

to reformat standard YYYYMM format to two separate variables: MM and YYYY for URL $Period= $ARGV[0]; $ScopeCustID= $ARGV[1]; $UserName= $ARGV[2]; $Password= $ARGV[3]; # $ScopeCustID= "bcc"; # $Period= "200701"; $yr= substr($Period,0,4); $mon= substr($Period,4,2); if ($mon < 10) { $mon= ~ s/0//gi;} ; YYYYMM - our DB format vendor specific scope customer ID remarks - unremarked for testing

Automated Web Page Retrieval (GetWebPage_VENDOR.pl)

slide-28
SLIDE 28

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

Step 3. modify script with command line variables and parse runtime variables ... iodFromMonth= ' . $mon . '&timePeriodFromYear= ' . $yr . '&timeP ...

$content0= $resp5-> content; $pos= index($content0,"VIEWSTATE")+ 13; $pos2= substr($content0,$pos,5000); $pos3= index($pos2,"value")+ 7; $pos4= index($pos2,"\/> "); $VIEWSTATE= substr($pos2,$pos3,$pos4-$pos3-2); $VIEWSTATE= ~ s/\//\%2F/gi; $VIEWSTATE= ~ s/\+ /\%2B/gi; $VIEWSTATE= ~ s/\= /\%3D/gi; $pos= index($content0,"EVENTVALIDATION")+ 13; $pos2= substr($content0,$pos,2000); $pos3= index($pos2,"value")+ 7; $pos4= index($pos2,"\/> "); $EVENTVALIDATION= substr($pos2,$pos3,$pos4-$pos3-2); $EVENTVALIDATION= ~ s/\//\%2F/gi; $EVENTVALIDATION= ~ s/\+ /\%2B/gi; $EVENTVALIDATION= ~ s/\= /\%3D/gi;

SECURITY CODES some codes are session based & must be parsed

  • ut to pass to

subsequent pages

Automated Web Page Retrieval (GetWebPage_VENDOR.pl)

slide-29
SLIDE 29

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

step 4. create page parser (part 1)

Parse Web Page Information (ProcessVENDOR.pl)

creating ProcessVendor.pl script include file with needed subroutines $col= $ARGV[0]; $vendor= “vendorname"; $VDBSuffix= “VENDOR"; $jumpin= "< b> Site:"; $jumpout= "Grand Total"; require ("../VDBProcs.pl"); anonymized (for this presentation) vendor name college name – when needed points to begin and stop processing file

slide-30
SLIDE 30

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

After processing, each table row is on one line with all carriage returns, linefeeds, and tabs

  • removed. Blank lines and page feeds are not
  • utput, code outside jump* is ignored. Period,

college name and other variables are passed from the database by the VDBProc.pl file. Validation is run on SQL log file to look for error messages and write to log. Entries are made for no data, change from previously retrieved period value or other potential problems.

Step 4. create page parser (part 2)

Parse Web Page Information (ProcessVENDOR.pl)

procedures called from common include file

slide-31
SLIDE 31

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

automated process

Automated Web Page Retrieval (GetWebPage_VENDOR.pl) Web Page Code (GetWebPage_VENDOR.html) SQL Server EXPRESS Parse Web Page Information (ProcessVENDOR.pl) Parameters Statistics ProcessVENDOR.sql Web Interface Queue

slide-32
SLIDE 32

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

handling retrieved data handling retrieved data

delete from VDBStatistics where vendor= ‘VENDOR' and college= 'VALENCIA COMM COLLEGE' and datasource= ‘SOME VENDOR DATABASE' and datatype= 'Sessions' and subdatatype= '0' and period= '200802' insert into VDBStatistics ( sourcefile, vendor, college, period, datatype , subdatatype, datasource, quantity ) values ('ProcessVENDOR.sql',‘VENDOR','VALENCIA COMM COLLEGE‘ ,'200802','Sessions','0',SOME VENDOR DATABASE','4348')

ProcessVENDOR.sql

slide-33
SLIDE 33

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

handling retrieved data handling retrieved data

  • where/how we store what we retrieve

where/how we store what we retrieve

slide-34
SLIDE 34

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

daily backup of database via windows scheduler daily backup of database via windows scheduler

* SQL Server Express does not support SQL Agent

handling retrieved data handling retrieved data

slide-35
SLIDE 35

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

tools tools

slide-36
SLIDE 36

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

software used software used

  • retrieval of data

retrieval of data – – free free

  • Internet Explorer

Internet Explorer

  • Perl

Perl

  • LWP library (Library for the WWW for Perl)

LWP library (Library for the WWW for Perl)

  • ieHTTP

ieHTTP Headers Headers

  • ParseHTTPTrace.pl

ParseHTTPTrace.pl

  • SQLExpress

SQLExpress and manager and manager

  • Intranet Site (IIS, .asp,

Intranet Site (IIS, .asp, vbscript vbscript, java) , java)

  • reporting

reporting – – some cost some cost

  • EZView

EZView (low cost) (low cost)

  • Crystal Reports (had it)

Crystal Reports (had it)

slide-37
SLIDE 37

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

structure structure

  • environment

environment

  • each vendor has its own working directory

each vendor has its own working directory

  • each vendor has several files in this directory

each vendor has several files in this directory

  • batch file (called from SQL Server)

batch file (called from SQL Server)

  • Perl script (gets web page)

Perl script (gets web page)

  • Perl script (makes

Perl script (makes sql sql to load data) to load data)

  • log files (troubleshoot)

log files (troubleshoot)

slide-38
SLIDE 38

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

  • activePerl

activePerl 5.8.6 build 811 to download 5.8.6 build 811 to download webpages webpages

  • run from command prompt in development and testing

run from command prompt in development and testing

  • ieHTTPHeaders

ieHTTPHeaders -

  • an add

an add-

  • on for IE that displays HTTP
  • n for IE that displays HTTP

Headers Headers

  • http://www.blunck.se/iehttpheaders/iehttpheaders.html

http://www.blunck.se/iehttpheaders/iehttpheaders.html

  • nce trace file is captured with
  • nce trace file is captured with ieHTTPHeaders

ieHTTPHeaders add add-

  • on,
  • n,

use use ParseHTTPTrace.pl ParseHTTPTrace.pl to create to create GetWebPage_VENDOR.pl GetWebPage_VENDOR.pl file. file.

  • http://

http:// www.codeproject.com/KB/perl/webautomaton.aspx www.codeproject.com/KB/perl/webautomaton.aspx

retrieval of vendor data retrieval of vendor data

slide-39
SLIDE 39

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

automating retrieval and reporting of automating retrieval and reporting of database usage statistics for a consortium database usage statistics for a consortium

  • problem

problem

  • what we were doing and why it doesn

what we were doing and why it doesn’ ’t work t work

  • solution

solution

  • the pieces, the parts and how they fit together

the pieces, the parts and how they fit together

  • future

future

  • what we

what we’ ’ve learned and our expectations ve learned and our expectations

slide-40
SLIDE 40

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

what have we learned? what have we learned?

  • large change in service requires staffing and support

large change in service requires staffing and support

  • project name should be closely related to the service

project name should be closely related to the service

  • administration understanding of needs

administration understanding of needs

  • assignment of priorities

assignment of priorities

  • proof

proof-

  • of
  • f-
  • concept

concept

  • need for ongoing support

need for ongoing support – –vendor changes, local needs vendor changes, local needs

  • moving from proof

moving from proof-

  • of
  • f-
  • concept is NOT trivial

concept is NOT trivial

  • data checking/revisions/data checking/revisions

data checking/revisions/data checking/revisions

  • handoff from development to maintenance

handoff from development to maintenance

slide-41
SLIDE 41

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

expectations expectations

  • future use

future use

  • until SUSHI is widespread OR

until SUSHI is widespread OR

  • until data collection and reporting in ERM products is

until data collection and reporting in ERM products is mature OR mature OR

  • until existing automated systems have reasonable

until existing automated systems have reasonable consortial pricing consortial pricing

  • future plans

future plans

  • customer/college interface

customer/college interface

  • hope

hope… …

slide-42
SLIDE 42

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

slide-43
SLIDE 43

March 20, 2008 March 20, 2008 Electronic Resources and Libraries Electronic Resources and Libraries

Thank you

College Center for Library Automation 1753 W. Paul Dirac Drive Tallahassee, Florida 32310 Susan Campbell scampbell@cclaflorida.org Jim McGill jmcgill@cclaflorida.org