“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.”
Assessing the Impact of a New Imputation Methodology for the Agricultural Resource Management Survey
Darcy Miller
National Agricultural Statistics Service
Summer Conference Preview/Review 2014 July 22nd 2014
National Agricultural Statistics Service (NASS)
- “The National Agricultural Statistics Service provides timely, accurate, and useful statistics in service to U.S. Agriculture.”
Agricultural Resource Management Survey (ARMS)
ARMS is the USDA’s primary survey for the annual collection of data from farm operators
- Household
– demographic attributes, labor allocation and debt
- Farm
– ownership, management structure, cost and returns, assets and debt
- Production Practices
– tillage, fertilizer, and pesticides
Background
- Research effort started in June 2009
- Cooperative agreement between NASS and the National Institute of Statistical Sciences (NISS)
- Agreement formed in response to a panel review by the Committee on National Statistics (CNSTAT)
Recommendation from CNSTAT
Recommendation 6.7: NASS and ERS should consider approaches for imputation of missing data that would be appropriate when analyzing the data using multivariate models. Methods for accounting for the variability due to using imputed values should be investigated. Such methods would depend on the imputation approach adopted.
Current Imputation Methodology
- Uses conditional mean imputation
- Form groups of operations believed to be similar (Region, Farm Size, Farm Type)
- Impute the mean item value of the group for operations in the group with missing values for that item
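The grouping-and-mean steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`region`, `size`, `type`, `fuel_expense`), the function `impute_group_means`, and all values are hypothetical, and the NASS production system is not shown here.

```python
from collections import defaultdict
from statistics import mean

def impute_group_means(records, group_keys, item):
    """Fill missing `item` values (None) with the mean reported in the same group."""
    reported = defaultdict(list)
    for rec in records:
        if rec[item] is not None:
            reported[tuple(rec[k] for k in group_keys)].append(rec[item])
    group_mean = {group: mean(vals) for group, vals in reported.items()}

    completed = []
    for rec in records:
        new = dict(rec)  # copy so the input records are left untouched
        if new[item] is None:
            # Stays None when no operation in the group reported the item.
            new[item] = group_mean.get(tuple(rec[k] for k in group_keys))
        completed.append(new)
    return completed

# Hypothetical records; every value here is made up for illustration.
farms = [
    {"region": "Midwest", "size": "large", "type": "grain", "fuel_expense": 1200.0},
    {"region": "Midwest", "size": "large", "type": "grain", "fuel_expense": 1800.0},
    {"region": "Midwest", "size": "large", "type": "grain", "fuel_expense": None},
    {"region": "South", "size": "small", "type": "cattle", "fuel_expense": 400.0},
    {"region": "South", "size": "small", "type": "cattle", "fuel_expense": None},
]
completed = impute_group_means(farms, ("region", "size", "type"), "fuel_expense")
```

Every operation with a missing item in a group receives the same value, so relationships between that item and other reported items are not used.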
New Imputation Methodology
- Uses multiple variables in imputation
- Data are transformed and a regression-based technique is used
- Various criteria are used to select the covariates
- Parameter estimates for the sequence of linear models and imputations are obtained using Markov chain Monte Carlo
- Referred to as Iterative Sequential Regression (ISR)
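To give a feel for what "a sequence of linear models" means, here is a deliberately simplified Python sketch of cycling regression imputations over two variables. It is not the ISR algorithm itself. It omits the transformation and covariate-selection steps, and it uses point estimates of the regression coefficients with random residual draws instead of sampling the parameters by Markov chain Monte Carlo. The variable names (`acres`, `expense`), the function names, and the data are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(sse / max(n - 2, 1))

def sequential_regression_impute(data, n_iter=20, seed=0):
    """Impute None entries in a two-variable dataset by cycling regressions."""
    rng = random.Random(seed)
    names = list(data)
    missing = {v: {i for i, x in enumerate(data[v]) if x is None} for v in names}

    # Starting values: fill each gap with the observed mean of that variable.
    filled = {}
    for v in names:
        observed = [x for x in data[v] if x is not None]
        start = sum(observed) / len(observed)
        filled[v] = [start if x is None else x for x in data[v]]

    for _ in range(n_iter):
        for target in names:
            predictor = next(v for v in names if v != target)
            # Fit on rows where the target was reported, using current predictor values.
            rows = [i for i in range(len(filled[target])) if i not in missing[target]]
            a, b, sd = fit_line([filled[predictor][i] for i in rows],
                                [filled[target][i] for i in rows])
            # Redraw each missing target value: prediction plus random residual noise.
            for i in sorted(missing[target]):
                filled[target][i] = a + b * filled[predictor][i] + rng.gauss(0.0, sd)
    return filled

# Hypothetical data; None marks item nonresponse.
survey = {
    "acres":   [100.0, 150.0, None, 300.0, 400.0, 250.0],
    "expense": [10.0, 16.0, 21.0, None, 41.0, None],
}
imputed = sequential_regression_impute(survey, n_iter=20, seed=1)
```

Because each imputed value depends on the other variable's current values, the cycle is repeated for a number of iterations. Fixing the seed makes a run reproducible.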
Operational Testing
- R for Operational Use
- Generalization & User Interface
- Integrity of Data File
- Transformations
- Convergence
- Impact to Workload
- Impact to Indications
R for Operational Use
- R was approved for operational use by the end of the research project.
- Server Issues
– Loading
– Moving Data Across Platforms
Generalization & User Interface
- Parameter Files
– Calculated Variables, Variable Groups, Variable Types, Questionnaire Versions, Transformations, Percents, Income Bins, Notification Email, Seed & Iterations & Imputations
- SAS Programs
– Convert & Move Data and Run Program
– Move Data and Convert Data
Integrity of Data File
- Moving data across platforms and software
– Character Values
– Rounded Values
- Correct Cells and Reasonable Values
- Zeros
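A cell-by-cell comparison is one way to catch the kinds of problems listed above after data move between platforms. The Python sketch below is an assumption about how such a check could look, not the procedure NASS used. The function `compare_transfers`, the field names, and the records are hypothetical.

```python
def compare_transfers(before, after, tol=0.0):
    """Return (row, field, issue) tuples for cells that changed in transit."""
    issues = []
    for i, (b_row, a_row) in enumerate(zip(before, after)):
        for field, b in b_row.items():
            a = a_row.get(field)
            if b is None and a is None:
                continue
            if b is None or a is None:
                # Covers a zero turning into a missing value and the reverse.
                issues.append((i, field, "missing status changed"))
            elif isinstance(a, str) and not isinstance(b, str):
                issues.append((i, field, "numeric became character"))
            elif isinstance(a, (int, float)) and isinstance(b, (int, float)):
                if abs(a - b) > tol:
                    issues.append((i, field, "value changed (possible rounding)"))
    return issues

# Hypothetical records before and after a transfer.
before = [
    {"id": 1, "rent": 1234.567, "taxes": 0.0},
    {"id": 2, "rent": None, "taxes": 250.0},
]
after = [
    {"id": "1", "rent": 1234.57, "taxes": None},
    {"id": 2, "rent": 0.0, "taxes": 250.0},
]
problems = compare_transfers(before, after)
```

Zeros get their own case because a true zero and a missing value mean different things to an imputation routine.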
Efficacy of Transformations 2008-2012
- Achieving Normality (Univariate)
- Across all years, 2008 to 2012, the transformations selected produce a reasonable fit for nearly every variable.
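The slides do not say which transformations or which normality checks were used. As one simple univariate check, the sketch below compares sample skewness before and after a log transform. The log transform and the constructed data are assumptions chosen only for illustration.

```python
import math

def skewness(values):
    """Moment-based sample skewness; zero for a perfectly symmetric sample."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed values, built as exp() of a symmetric grid.
raw = [math.exp(z / 2) for z in range(-6, 7)]
logged = [math.log(v) for v in raw]

skew_before = skewness(raw)     # clearly positive: long right tail
skew_after = skewness(logged)   # essentially zero: symmetric after the transform
```

Skewness alone does not establish normality. In practice it would sit alongside plots or formal tests.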
Markov Chain Monte Carlo Convergence Diagnostics 2008-2012
- Looking across the years 2008 to 2012, convergence seems to be demonstrated by the 100th iteration for most imputed variables, and by the 200th iteration for most of the remainder.
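The slides do not name the diagnostics used. One common choice for this kind of question is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variability. The Python sketch below applies it to simulated chains. The chain simulator, its starting values, and the iteration windows are all hypothetical.

```python
import random
from statistics import mean, variance

def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat for equal-length chains of one scalar quantity."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

def simulate_chain(start, n, rng):
    """Hypothetical chain that drifts from `start` toward a common distribution."""
    x, out = start, []
    for _ in range(n):
        x = 0.9 * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

rng = random.Random(0)
chains = [simulate_chain(s, 200, rng) for s in (-50.0, 0.0, 50.0)]

r_early = potential_scale_reduction([c[:20] for c in chains])    # start-up iterations
r_late = potential_scale_reduction([c[100:] for c in chains])    # iterations 101-200
```

Values near 1 suggest the chains have mixed. The early window is dominated by the different starting values, so its factor is much larger than the late window's.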
Analysis of 2011/2012 Data
- Evaluated change in workload by analyzing the critical error counts
- Examined the 18 key variables after the summary
- Looked at results for 2011, 2012, and 2012 “collapsed” (covariates summed together)
Workload Evaluation
- Analyzed Critical Error Count Differences and Percent Differences for the following scenarios:
– 2011 ISR vs. 2011 Mean
– 2012 ISR vs. 2012 Mean
– 2012 ISR Collapsed vs. 2012 Mean
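The two measures can be computed as below. This sketch assumes the percent difference is taken relative to the count under the current mean imputation, which the slides do not state. The counts are placeholders, not ARMS results.

```python
def workload_change(isr_count, mean_count):
    """Difference and percent difference in critical error counts (ISR minus mean)."""
    diff = isr_count - mean_count
    # Assumption: the mean-imputation count is the base for the percentage.
    pct = 100.0 * diff / mean_count if mean_count else float("nan")
    return diff, pct

# Placeholder counts for illustration only; these are not survey results.
scenarios = {
    "2011 ISR vs. 2011 Mean": (110, 100),
    "2012 ISR vs. 2012 Mean": (210, 200),
    "2012 ISR Collapsed vs. 2012 Mean": (230, 200),
}
changes = {name: workload_change(isr, base) for name, (isr, base) in scenarios.items()}
```

A positive difference means more critical errors to review under ISR than under mean imputation.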
US Level Results
Workload Assessment Conclusions
- Indications that the new ISR method will somewhat increase workload
- Indications that collapsing the variables included in the model will somewhat increase workload compared to the full variable model
- Indications that adding a couple of edits to the ISR program will not significantly reduce the workload
Impact to Estimates
- NASS publishes 18 estimates from data collected on ARMS III
- 3 estimates include some imputed data
- No post edit run after imputation
Impact to Indications
- Agricultural Chemicals Expenditures
- Farm Improvements and Construction
- Farm Services*
- Farm Supplies and Repairs
- Feed Expenditures
- Fertilizer, Lime and Soil Conditioner Expenditures
- Fuels Expenditures
- Interest
- Labor Expenditures
- Livestock, Poultry, and Related Expenses
- Miscellaneous Capital Expenses
- Other Farm Machinery Expenditures
- Rent
- Seeds and Plants
- Taxes*
- Total Expenditures*
- Tractor and Self-Propelled Farm Machinery Expenditures
- Trucks and Autos Expenditures
* Variable contains imputed values
Calibration Interaction
- Components that make up GVSALES are imputed.
– e.g., P543 (landlord share of government payments)
- As GVSALES changes, ECONCLS changes.
- One of our calibration targets is ECONCLS
– Movement of operations between ECONCLS categories required us to re-calibrate.
Other and Future
- Checks for ill-conditioned matrices
- Stress Test and Document I/O Functions
- Tuning Other Parameters of the Program
- Speed