SLIDE 1

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.”

Assessing the Impact of a New Imputation Methodology for the Agricultural Resource Management Survey

Darcy Miller, National Agricultural Statistics Service

SLIDE 2

Summer Conference Preview/Review 2014 July 22nd 2014

National Agricultural Statistics Service (NASS)

  • “The National Agricultural Statistics Service provides timely, accurate, and useful statistics in service to U.S. Agriculture.”

SLIDE 3

Agricultural Resource Management Survey (ARMS)

ARMS is the USDA’s primary survey for the annual collection of data from farm operators

  • Household
    – demographic attributes, labor allocation and debt
  • Farm
    – ownership, management structure, cost and returns, assets and debt
  • Production Practices
    – tillage, fertilizer, and pesticides

SLIDE 4

Background

  • Research effort started in June 2009
  • Cooperative agreement between NASS and the National Institute of Statistical Sciences (NISS)
  • Agreement formed in response to a panel review by the Committee on National Statistics (CNSTAT)

SLIDE 5

Recommendation from CNSTAT

Recommendation 6.7: NASS and ERS should consider approaches for imputation of missing data that would be appropriate when analyzing the data using multivariate models. Methods for accounting for the variability due to using imputed values should be investigated. Such methods would depend on the imputation approach adopted.

SLIDE 6

Current Imputation Methodology

  • Uses conditional mean imputation
  • Form Groups of Operations believed to be similar (Region, Farm Size, Farm Type)
  • Impute the mean item value of the group for operations in the group with missing values for that item
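
The grouping and mean-fill steps above can be sketched in a few lines. This is a minimal illustration, not the production NASS routine: the field names (`region`, `size`, `type`, `fuel_expense`) and all values are hypothetical, and missing items are assumed to be stored as `None`.

```python
from collections import defaultdict


def impute_group_means(records, group_keys, item):
    """Fill missing `item` values with the mean of reported values in the same group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec[item] is not None:
            key = tuple(rec[k] for k in group_keys)
            sums[key] += rec[item]
            counts[key] += 1

    completed = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the input records are left untouched
        key = tuple(rec[k] for k in group_keys)
        # A group with no reported values is left missing.
        if new_rec[item] is None and counts[key] > 0:
            new_rec[item] = sums[key] / counts[key]
        completed.append(new_rec)
    return completed


# Hypothetical records: names and values are illustrative only.
farms = [
    {"region": "Midwest", "size": "large", "type": "crop", "fuel_expense": 100.0},
    {"region": "Midwest", "size": "large", "type": "crop", "fuel_expense": 300.0},
    {"region": "Midwest", "size": "large", "type": "crop", "fuel_expense": None},
    {"region": "South", "size": "small", "type": "livestock", "fuel_expense": 40.0},
    {"region": "South", "size": "small", "type": "livestock", "fuel_expense": None},
]
completed = impute_group_means(farms, ["region", "size", "type"], "fuel_expense")
```

Every operation in a group receives the same imputed value, so the method uses no information beyond the grouping variables.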

SLIDE 7

New Imputation Methodology

  • Uses multiple variables in imputation
  • Data are transformed and a regression-based technique is used
  • Various criteria are used to select the covariates
  • Parameter estimates for the sequence of linear models and imputations are obtained using Markov chain Monte Carlo
  • Referred to as Iterative Sequential Regression (ISR)
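
The general idea of cycling through regressions can be sketched as follows. This is a heavily simplified stand-in, not the ISR program itself: it handles only two variables, assumes a log transformation, and re-estimates each regression by least squares on every pass instead of drawing parameters from a posterior distribution as a full Markov chain Monte Carlo scheme would. The function names and data are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(sse / max(n - 2, 1))


def sequential_regression_impute(data, n_iter=200, seed=1):
    """data: rows [v0, v1] of positive values or None. Returns a completed copy."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in data]
    # Assumed transformation: work on the log scale.
    logs = [[None if v is None else math.log(v) for v in row] for row in data]
    # Starting values: the observed mean of each variable.
    for j in (0, 1):
        observed = [row[j] for row in logs if row[j] is not None]
        start = sum(observed) / len(observed)
        for row in logs:
            if row[j] is None:
                row[j] = start
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j  # the other variable acts as the covariate
            # Fit on rows where variable j was reported, using current covariate values.
            fit_rows = [row for row, m in zip(logs, missing) if not m[j]]
            a, b, sd = fit_line([r[k] for r in fit_rows], [r[j] for r in fit_rows])
            # Replace each missing value with a prediction plus random noise.
            for row, m in zip(logs, missing):
                if m[j]:
                    row[j] = a + b * row[k] + rng.gauss(0.0, sd)
    # Back-transform imputed cells; reported cells are returned unchanged.
    return [
        [math.exp(lv) if m else v for v, lv, m in zip(row, log_row, miss_row)]
        for row, log_row, miss_row in zip(data, logs, missing)
    ]


# Hypothetical data: two expense items per operation, None marks a missing item.
farms = [
    [120.0, 30.0], [250.0, 55.0], [400.0, 98.0], [90.0, None],
    [None, 70.0], [610.0, 140.0], [330.0, None], [150.0, 41.0],
]
completed = sequential_regression_impute(farms, n_iter=200, seed=1)
```

Unlike a group mean, each imputed value here depends on the operation's own reported covariate and carries random variation, which is what allows the uncertainty due to imputation to be studied.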

SLIDE 8

Operational Testing

  • R for Operational Use
  • Generalization & User Interface
  • Integrity of Data File
  • Transformations
  • Convergence
  • Impact to Workload
  • Impact to Indications

SLIDE 9

R for Operational Use

  • R was approved for operational use by the end of the research project.
  • Server Issues
    – Loading
    – Moving Data Across Platforms

SLIDE 10

Generalization & User Interface

  • Parameter Files
    – Calculated Variables, Variable Groups, Variable Types, Questionnaire Versions, Transformations, Percents, Income Bins, Notification Email, Seed & Iterations & Imputations
  • SAS Programs
    – Convert & Move Data and Run Program
    – Move Data and Convert Data

SLIDE 11

Integrity of Data File

  • Moving data across platforms and software
    – Character Values
    – Rounded Values
  • Correct Cells and Reasonable Values
  • Zeros
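
Checks of this kind can be automated by comparing each cell before and after a transfer. The sketch below is only an illustration of the idea: the field names, the values, and the three issue labels are hypothetical, and missing cells are assumed to be stored as `None`.

```python
def check_transfer(source, transferred, tolerance=1e-9):
    """Compare cells before and after a transfer; returns (row, field, issue) tuples."""
    issues = []
    for row_number, (before, after) in enumerate(zip(source, transferred)):
        for field, old in before.items():
            new = after.get(field)
            if isinstance(new, str):
                # A numeric cell arrived as text.
                issues.append((row_number, field, "character value"))
            elif (old is None) != (new is None):
                # A zero became missing, or a missing cell was filled in.
                issues.append((row_number, field, "missing status changed"))
            elif old is not None and abs(old - new) > tolerance:
                # Rounding or any other numeric change.
                issues.append((row_number, field, "value changed"))
    return issues


# Hypothetical cells: names and values are illustrative only.
source = [
    {"fertilizer": 1234.567, "fuel": 0.0},
    {"fertilizer": 88.0, "fuel": 15.25},
    {"fertilizer": None, "fuel": 0.0},
]
transferred = [
    {"fertilizer": 1235.0, "fuel": None},
    {"fertilizer": "88.0", "fuel": 15.25},
    {"fertilizer": None, "fuel": 0.0},
]
issues = check_transfer(source, transferred)
```

Treating a change between zero and missing as its own issue matters for imputation, because a true zero must not be imputed and a missing value must not be read as zero.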

SLIDE 12

Efficacy of Transformations 2008-2012

  • Achieving Normality (Univariate)
  • Across all years, 2008 to 2012, the transformations selected produce a reasonable fit for nearly every variable.
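
One rough way to see whether a transformation moves a variable toward normality is to compare sample skewness before and after. The slides do not say which transformations or fit criteria were used, so the log transform and the skewness measure below are assumptions, and the expense values are made up.

```python
import math


def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed expense values, typical of economic survey items.
expenses = [12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 80.0, 150.0, 400.0, 1200.0]
raw_skew = skewness(expenses)
log_skew = skewness([math.log(v) for v in expenses])
```

For these values the skewness drops from above 2 on the raw scale to below 1 on the log scale. In practice a check like this would be paired with a normal quantile plot or a formal test for each variable.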

SLIDE 13

Markov Chain Monte Carlo Convergence Diagnostics 2008-2012

  • Looking across the years 2008 to 2012, convergence seems to be demonstrated by the 100th iteration for most imputed variables, and by the 200th iteration for most of the remainder.
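
The slides do not name the diagnostic that was used. One common choice, shown here purely as an assumed example, is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance across several chains run from different starting points. The simulated chains are hypothetical.

```python
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one quantity."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance of the chain means, scaled by chain length.
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Average within-chain sample variance.
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Three chains sampling around the same level: the factor should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(3)]
# Three chains stuck at different levels: the factor should be well above 1.
stuck = [[rng.gauss(level, 1.0) for _ in range(500)] for level in (0.0, 5.0, 10.0)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

A value below about 1.1 is a widely used rule of thumb for acceptable convergence. Computing the factor on successive windows of iterations would show the iteration by which each imputed variable settles.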

SLIDE 14

Analysis of 2011/2012 Data

  • Evaluated change in workload by analyzing the critical error counts
  • Examined the 18 key variables after the summary
  • Looked at results for 2011, 2012, and 2012 “collapsed” (covariates summed together)

SLIDE 15

Workload Evaluation

  • Analyzed Critical Error Count Differences and Percent Differences for the following scenarios:
    – 2011 ISR vs. 2011 Mean
    – 2012 ISR vs. 2012 Mean
    – 2012 ISR Collapsed vs. 2012 Mean
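
The two measures can be written out explicitly. The sketch assumes that the mean-imputation count is the baseline for the percent difference, which the slides do not state, and the counts are invented placeholders, not survey results.

```python
def count_change(isr_count, mean_count):
    """Difference and percent difference in critical error counts, relative to mean imputation."""
    difference = isr_count - mean_count
    percent_difference = 100.0 * difference / mean_count
    return difference, percent_difference


# Placeholder counts (ISR, mean imputation) for illustration only.
scenarios = {
    "2011 ISR vs. 2011 Mean": (1150, 1000),
    "2012 ISR vs. 2012 Mean": (1260, 1200),
    "2012 ISR Collapsed vs. 2012 Mean": (1320, 1200),
}
results = {name: count_change(*counts) for name, counts in scenarios.items()}
```

A positive difference means the ISR run produced more critical errors for analysts to review than the mean-imputation run.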

SLIDE 16

US Level Results

SLIDE 17

Workload Assessment Conclusions

  • Indications that the new ISR method will somewhat increase workload
  • Indications that collapsing the variables included in the model will somewhat increase workload compared to the full variable model
  • Indications that adding a couple of edits to the ISR program will not significantly reduce the workload

SLIDE 18

Impact to Estimates

  • NASS publishes 18 estimates from data collected on ARMS III
  • 3 estimates include some imputed data
  • No post edit run after imputation

SLIDE 19

Impact to Indications

  • Agricultural Chemicals Expenditures
  • Farm Improvements and Construction
  • Farm Services*
  • Farm Supplies and Repairs
  • Feed Expenditures
  • Fertilizer, Lime and Soil Conditioner Expenditures
  • Fuels Expenditures
  • Interest
  • Labor Expenditures
  • Livestock, Poultry, and Related Expenses
  • Miscellaneous Capital Expenses
  • Other Farm Machinery Expenditures
  • Rent
  • Seeds and Plants
  • Taxes*
  • Total Expenditures*
  • Tractor and Self-Propelled Farm Machinery Expenditures
  • Trucks and Autos Expenditures

* Variable contains imputed values

SLIDE 20

Calibration Interaction

  • Components that make up GVSALES are imputed.
    – e.g., P543 (landlord share of government payments)
  • As GVSALES changes, ECONCLS changes.
  • One of our calibration targets is ECONCLS
    – Movement between ECONCLS required us to re-calibrate.
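
The interaction can be illustrated by classifying each operation before and after imputation and listing those that change class. The sketch assumes ECONCLS is a class assigned from GVSALES by fixed boundaries. The boundaries, class numbering, and sales values below are hypothetical, since the slides do not give the actual definition.

```python
from bisect import bisect_right

# Hypothetical class boundaries in dollars; not the actual ECONCLS definition.
CLASS_BOUNDS = [10_000, 100_000, 250_000, 500_000, 1_000_000]


def sales_class(gvsales):
    """Class 1 is below the first boundary; each boundary crossed adds one."""
    return bisect_right(CLASS_BOUNDS, gvsales) + 1


def class_movers(before, after):
    """Indexes of operations whose class differs between the two imputation runs."""
    return [
        i for i, (b, a) in enumerate(zip(before, after))
        if sales_class(b) != sales_class(a)
    ]


# Hypothetical GVSALES values under mean imputation and under ISR.
gvsales_mean = [95_000, 240_000, 600_000]
gvsales_isr = [104_000, 245_000, 610_000]
movers = class_movers(gvsales_mean, gvsales_isr)
```

Only the first operation crosses a boundary here, but any such move changes the class counts that calibration must match, so the weights need to be recomputed.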

SLIDE 21

SLIDE 22

SLIDE 23

SLIDE 24

Other and Future

  • Checks for ill-conditioned matrices
  • Stress Test and Document I/O Functions
  • Tuning Other Parameters of the Program
  • Speed
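
As one possible form of the first check, the sketch below flags a pair of nearly collinear covariates. It relies on the fact that the 2x2 correlation matrix of two variables with correlation r has eigenvalues 1 + r and 1 - r, so its condition number is (1 + |r|) / (1 - |r|). This covers only the two-covariate case; a general check would compute the condition number of the full design matrix with a linear algebra library. The function names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pair_condition_number(xs, ys):
    """Condition number of the 2x2 correlation matrix of two covariates."""
    r = abs(pearson(xs, ys))
    return math.inf if r >= 1.0 else (1.0 + r) / (1.0 - r)


# Hypothetical covariates: the second pair is almost perfectly collinear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
moderate = pair_condition_number(x, [2.0, 1.0, 4.0, 3.0, 5.0])
near_collinear = pair_condition_number(x, [2.0, 4.0, 6.0, 8.0, 10.001])
```

A very large condition number signals that regression coefficients for the pair are unstable, which is a reason to drop or combine covariates before fitting.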

SLIDE 25

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.”

Thank You!