Centralization and Regionalization at the National Agricultural - - PowerPoint PPT Presentation

centralization and regionalization at the national
SMART_READER_LITE
LIVE PREVIEW

Centralization and Regionalization at the National Agricultural - - PowerPoint PPT Presentation

Centralization and Regionalization at the National Agricultural Statistics Service Roger Schou National Agricultural Statistics Service IBUC XV Washington, DC, USA Centralized Surveys IBUC XIII 1 survey Mink (April 2010)


slide-1
SLIDE 1

Centralization and Regionalization at the National Agricultural Statistics Service

Roger Schou National Agricultural Statistics Service IBUC XV Washington, DC, USA

slide-2
SLIDE 2

Centralized Surveys

  • IBUC XIII

– 1 survey – Mink (April 2010)

  • IBUC XIV

– Approximately 25 surveys

  • Currently

– Approximately 55 surveys

  • Weekly, bi-weekly, monthly, semi-annual, annual,

and pentennial

slide-3
SLIDE 3

Blaise Data Storage at NASS

  • MySQL Database

– All surveys in one set of tables

  • Generic In-Depth Storage

– Allows for one ETL (Extract, Transfer, Load) program to copy necessary data from transactional MySQL database to the analyticl Redbrick database – Eight Blaise tables

  • A few additional NASS-specific tables
slide-4
SLIDE 4

Citrix Server (DEP & Manipula executables) Application/Data Server Blaise Services Running Instruments & BOI files Blaise Tables MySQL ETL (Edit,Transfer,Load) WIP RedBrick Analysis & Summary Applications Transactional Activity Analytical Activity

System Flow

slide-5
SLIDE 5

Generic In-Depth Tables

  • BLAISE_DICTIONARY
  • BLAISE_ID
  • BLAISE_CASE
  • BLAISE_FORM
  • BLAISE_KEY
  • BLAISE_DATA
  • BLAISE_REMARK
  • BLAISE_OPEN
slide-6
SLIDE 6

BLAISE_DICTIONARY

  • Catalog of all the surveys

– Each instance of a survey has a unique DMKEY (data model key)

  • Each time a data model changes, a new DMKEY is

assigned

  • Can be as minimal as a data model name
  • We use the survey’s folder name as the data

model name, so we programmatically change it for weekly and monthly surveys

  • External sections are created on the fly and

referenced via INCLUDES in the data model code

slide-7
SLIDE 7

BLAISE_ID

  • Contains all block names and field names

– All metadata about the block/field

  • Including FieldTag, DescriptionText, & ArrayIndex
  • NASS Cameleon scripts that created item

code related files have been replaced by VB.NET code utilizing the Blaise API

  • Table is not used much by the Blaise

system, but it’s a critical table for NASS

slide-8
SLIDE 8

Item Codes and Varnames

  • NASS developed “hash notation” for item

codes and varnames for repeated blocks

– Example 1 in the paper illustrates hash notation (page 310)

  • Arrays with item code and varname coded

within the block.

– Item code mapping file gets a special coding pattern to insure the correct element is used – Varname has Table_Row number appended

slide-9
SLIDE 9

BLAISE_CASE

  • Contains the unique JOINKEY and

PRIMARY KEY for each record in the survey

slide-10
SLIDE 10

BLAISE_KEY

  • Contains all of the Primary and Secondary

Keys as defined in an instrument

  • BeginStamp is part of the key for this table

which is needed if versioning is active

slide-11
SLIDE 11

BLAISE_FORM

  • Contains the status information for each

form

– Form status – Error count – Remark count – Don’t Know count – Refusal count

slide-12
SLIDE 12

BLAISE_DATA

  • Contains the data for the records in the

survey

slide-13
SLIDE 13

BLAISE_REMARK

  • Contains the remarks left on fields for a

survey

slide-14
SLIDE 14

BLAISE_OPEN

  • Contains the answers to any OPEN type

fields

slide-15
SLIDE 15

NASS CASIC Tables

  • CASIC_SURVEYINFO
  • CASIC_FAT
  • CASIC_MANAGEMENT
  • CASIC_EVENT_LOG
slide-16
SLIDE 16

CASIC_SURVEYINFO

  • Survey-level information

– Instrument Name – Folder Name – BOI File Name – Assorted indicators – Some Start and End Dates

  • Menu system makes extensive use of this

table

slide-17
SLIDE 17

CASIC_FAT

  • Used to control access
  • Identifies:

– State to whom the record belongs – Data Collection Center (DCC) Assigned – Estimation Center (EC) Assigned – Region to which the state belongs – Region to which the DCC belongs – Region to which the EC belongs

slide-18
SLIDE 18

CASIC_MANAGEMENT

  • Update to the paper:

– Key of the CASIC_MANAGEMENT table is now: DMKEY, JOINKEY, BEGINSTAMP

  • Contains several fields that appear in

nearly all of our instrument

  • Table is indexed on these fields to

increase the performance of instruments

  • Allows record filters to be more efficient
slide-19
SLIDE 19

CASIC_MANAGEMENT

  • June Area instrument is exception to

NASS “standard” fields

  • Similar fields, but different blocks

– Table was not populated by our system – Performance ground to a halt

  • Complicated instrument
  • No indexes utilized
  • An extremely large external file
slide-20
SLIDE 20

CASIC_EVENT_LOG

  • Used as a debugging tool
  • Tracks the activity on the CASIC Menu
  • Every button click registers a “BEGIN”
  • When process finishes, registers an “END”
  • We can detect what processes are

running, who is running them, and how long they are running

slide-21
SLIDE 21

Centralized Blaise Concept

  • All data in one central location
  • All surveys look the same in the database

– Only one ETL needed to copy data

  • Records logically separated using record

filters

  • Dynamic menu reacts to user’s location

and role

slide-22
SLIDE 22

NASS Infrastructure

Workstation Blaise Data Server MySQL

LAN

LAN = Local Area Network WAN = Wide Area Network

LAN Eastern Citrix

Workstation Blaise Data Server MySQL

LAN WAN Western Citrix

(1 ms latency) (1 ms latency) (30 ms latency) (1 ms latency)

slide-23
SLIDE 23

Future NASS Infrastructure

Workstation Blaise Data Server MySQL

LAN

LAN = Local Area Network

LAN Eastern & Western Citrix

(1 ms latency) (1 ms latency)

Citrix Server, Blaise Data Server, and MySQL Database all located in Kansas City

slide-24
SLIDE 24

Hybrid Surveys

  • Temporary Workaround

– For high profile, quick turnaround surveys where 1 missed night of calling is not acceptable

  • CATI data collection is done in a

decentralized (local) Blaise data set, then sent to the Central MySQL Blaise data set via Manipula

– Remaining processes done centrally

  • Currently being phased out
slide-25
SLIDE 25

REGIONALIZATION

  • Moved from 46 state field offices to 12

regional offices

– Nearly complete

  • New set of record filters now group by

region

  • CASIC System works regionally whether

you are sitting in the Regional Field Office

  • r in one of the states within a region
slide-26
SLIDE 26

Other Centralization

  • Other NASS systems undergoing

centralization efforts

– Survey Management System – Tracking and Control – Enumerator Skills Database

  • Centralizing commonly used tables

– County lookup/validation

  • Inter-database communication

– Blaise API

slide-27
SLIDE 27

Conclusion

  • Streamlining Processes

– Centralization – Regionalization – Colocation of Servers

  • Do more with less
  • Future of Blaise at NASS

– Only CATI software – One of two edit systems (after sunsetting legacy system)

slide-28
SLIDE 28

Questions