Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey

  1. Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey
     Michael E. Bellow
     Heather Ridolfo
     National Agricultural Statistics Service, United States Department of Agriculture
     DC-AAPOR/WSS Summer Conference, Aug. 3, 2015, Washington, DC

  2. Outline
     • Background
     • Methods
     • Results
       – Descriptive Graphics
       – Logistic Regression
     • Summary and Implications

  3. Research Question
     What factors were most influential on the large discrepancies in reported acreage operated between the 2012 JAS and COA?

  4. Background
     • 2007 Classification Error Survey (CES)
       – Misclassification (farms classified as non-farms and vice versa)
       – Substantial acreage discrepancies between the Census of Agriculture (COA) and June Agricultural Survey (JAS) for land-related variables (e.g., total acres operated)
     • Re-interviews conducted on 147 operations found that acreage discrepancies were due to:
       – Actual changes in acreage over the period between JAS and COA
       – Reporting errors
       – Change in respondents
     • 2012: large acreage discrepancies found again

  5. Definition of Total Acres Operated
     Total Acres Operated = (Acres owned) + (Acres rented/leased from others) - (Acres rented/leased to others)
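The definition above is simple arithmetic on three reported quantities. A minimal sketch (the function name and the example operation are hypothetical, for illustration only):

```python
def total_acres_operated(owned: float, rented_from_others: float, rented_to_others: float) -> float:
    """Total acres operated, following the slide's definition."""
    # Land rented/leased in adds to the operation; land rented/leased out is subtracted.
    return owned + rented_from_others - rented_to_others


# Hypothetical operation: owns 320 acres, rents in 160, rents out 40.
print(total_acres_operated(320, 160, 40))  # 440
```

Because three components enter the total, a misreport in any one of them (for example, omitting land rented out) changes total acres operated even when the other two answers agree across surveys.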

  6. June Agricultural Survey (JAS)
     • Area-frame-based sample survey conducted annually in June
     • Sampling unit is the segment (generally 1 square mile), divided into tracts
     • Data collected on U.S. crops, livestock, grain storage capacity, and type and size of farm for tracts within sampled segments
     • Two-week data collection period (first half of the month)
     • Face-to-face interviewing

  7. Census of Agriculture (COA)
     • Complete enumeration of U.S. farms and ranches conducted every 5 years
     • Data collected on land use and ownership, operator characteristics, income, expenditures and farming practices for the previous year
     • Multiple frame (area and list)
     • Primarily mail survey

  8. Combined JAS/COA Data Set
     • JAS records matched to corresponding records in two COA datasets (unedited and edited)
     • Total number of matched records = 25,983
     • Some COA records were linked to multiple JAS records, each reporting data for the entire operation
     • Some JAS records were linked to multiple COA records (mainly 'split' operations)
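The slides do not describe the matching key. Assuming each match is available as a (JAS id, COA id) pair, the one-to-many links described above can be found by counting how often each identifier appears. The pair list and identifiers below are hypothetical:

```python
from collections import Counter

# Hypothetical matched pairs (jas_id, coa_id); real NASS identifiers are not shown in the slides.
pairs = [("J1", "C1"), ("J2", "C1"), ("J3", "C2"), ("J4", "C3"), ("J4", "C4")]


def multiply_linked(pairs):
    """Return (COA ids linked to >1 JAS record, JAS ids linked to >1 COA record)."""
    coa_counts = Counter(coa for _, coa in pairs)
    jas_counts = Counter(jas for jas, _ in pairs)
    coa_multi = sorted(c for c, n in coa_counts.items() if n > 1)
    jas_multi = sorted(j for j, n in jas_counts.items() if n > 1)
    return coa_multi, jas_multi


print(multiply_linked(pairs))  # (['C1'], ['J4'])
```

Here C1 plays the role of a COA record reported by two JAS respondents for the same operation, and J4 the role of a JAS record matched to a 'split' operation with two COA records.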

  9. Adjusted Percent Difference (APD)
     APD = 100*(COA-JAS)/(COA+100)   if COA > JAS
         = 100*(JAS-COA)/(JAS+100)   otherwise
     Example:
       COA    JAS    %Diff    APD
       7      5      29       1.9
       700    500    29       25
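The APD formula translates directly into code. The %Diff column in the example is consistent with the ordinary percent difference relative to the larger report, 100*|COA-JAS|/max(COA, JAS); that reading is our inference from the numbers, not a definition given on the slide. The sketch reproduces both example rows and shows how the +100 in the denominator damps the measure for small operations:

```python
def apd(coa: float, jas: float) -> float:
    """Adjusted percent difference between COA and JAS reported acreage."""
    if coa > jas:
        return 100 * (coa - jas) / (coa + 100)
    return 100 * (jas - coa) / (jas + 100)


def pct_diff(coa: float, jas: float) -> float:
    """Ordinary percent difference relative to the larger report (assumed reading of %Diff).

    Undefined when both reports are zero."""
    return 100 * abs(coa - jas) / max(coa, jas)


for coa, jas in [(7, 5), (700, 500)]:
    print(coa, jas, round(pct_diff(coa, jas)), round(apd(coa, jas), 1))
# 7 5 29 1.9
# 700 500 29 25.0
```

Both rows have the same 2-to-... ratio and hence the same %Diff of 29, but only the 700 vs 500 acre pair reaches an APD of 25; a 2-acre disagreement on a 7-acre operation is not treated as a serious discrepancy.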

  10. Exploratory Data Analysis
     • Records for which the APD of total acres operated is 25 or higher are defined as discrepant
     • 23% of operations (nationwide) identified as discrepant
     • Dependent variable in the logistic regression is a binary indicator of acreage discrepancy (1 if discrepant, 0 otherwise)
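Given an APD value per record, the binary dependent variable and the share of discrepant records follow from the threshold of 25. A minimal sketch; the APD values in the sample are made up for illustration. As a cross-check on the 23% figure, 5,958 of the 25,983 matched records (22.9%) are discrepant against the edited COA on the data-editing slide.

```python
DISCREPANCY_THRESHOLD = 25.0  # APD of total acres operated at or above this is "discrepant"


def is_discrepant(apd_value: float, threshold: float = DISCREPANCY_THRESHOLD) -> int:
    """Binary dependent variable for the logistic regression: 1 if discrepant, else 0."""
    return 1 if apd_value >= threshold else 0


def discrepant_share(apd_values) -> float:
    """Percentage of records flagged as discrepant."""
    flags = [is_discrepant(v) for v in apd_values]
    return 100 * sum(flags) / len(flags)


# Hypothetical APD values, for illustration only.
sample = [1.9, 25.0, 12.3, 40.8]
print([is_discrepant(v) for v in sample])  # [0, 1, 0, 1]
print(discrepant_share(sample))  # 50.0
```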

  11. Explanatory Variables
     • Farm type (crop vs livestock)
     • Land rented from others (acres)
     • Land rented to others (acres)
     • Number of operators
     • Operator tenure (years operating farm)
     • Average drought level during JAS (county level)
     • Mode of COA data collection (face-to-face, CATI, etc.)
     • Time between JAS and COA (days)
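The regression results report farm type and collection mode as binary indicators (Livestock Farm, Mode = Phone/CATI, Mode = EDR (Web), Mode = FTF/CAPI), which implies that some other mode serves as the baseline. The slides do not name it; since the COA is primarily a mail survey, this sketch assumes mail is the baseline. All field names are hypothetical:

```python
# Hypothetical field names; the slides list the variables but not how they were coded.
MODES_WITH_INDICATORS = ["Phone/CATI", "EDR (Web)", "FTF/CAPI"]  # baseline mode assumed to be mail


def encode_record(rec):
    """Map one hypothetical matched record to regression covariates."""
    row = {
        "livestock_farm": 1 if rec["farm_type"] == "livestock" else 0,  # crop farm = 0
        "acres_rented_from_others": rec["rented_from"],
        "acres_rented_to_others": rec["rented_to"],
        "n_operators": rec["n_operators"],
        "operator_tenure_years": rec["tenure_years"],
        "avg_drought_level": rec["avg_drought"],
        "days_jas_to_coa": rec["days_between"],
    }
    # One 0/1 indicator per non-baseline mode; a baseline-mode record gets all zeros.
    for mode in MODES_WITH_INDICATORS:
        row[f"mode={mode}"] = 1 if rec["coa_mode"] == mode else 0
    return row


example = {
    "farm_type": "livestock", "rented_from": 160, "rented_to": 0, "n_operators": 2,
    "tenure_years": 4, "avg_drought": 2.3, "coa_mode": "Phone/CATI", "days_between": 240,
}
example_row = encode_record(example)
print(example_row["livestock_farm"], example_row["mode=Phone/CATI"])  # 1 1
```

With this coding, each mode odds ratio compares that mode with the baseline mode, not with all other modes combined.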

  12. Drought Intensity Data Set
     • Obtained from the University of Nebraska's National Drought Mitigation Center (NDMC)
     • Drought Monitor Classification Scheme (DMCS)
       – six levels of drought ranging from 'none' to 'exceptional'
       – recorded weekly at county level from May 29 to June 25, 2012
       – data sets give the percent of each county's area classified to each drought level
       – overall county-level average drought level computed from the data
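The slides do not give the exact averaging formula. One natural reading, sketched here under stated assumptions, codes the six levels as integers 0 (none) to 5 (exceptional), takes an area-weighted mean for each county-week from the categorical percentages (assumed to sum to 100), and averages those weekly values over the survey window. The coding and the example county are hypothetical:

```python
from typing import Sequence

# Assumed coding: 0 = none, 1 = D0 (abnormally dry), ..., 5 = D4 (exceptional).
N_LEVELS = 6


def weekly_level(pct_by_level: Sequence[float]) -> float:
    """Area-weighted mean drought level for one county-week.

    pct_by_level[k] is the percent of the county's area in level k (assumed to sum to 100)."""
    if len(pct_by_level) != N_LEVELS:
        raise ValueError("expected six drought levels")
    return sum(k * p for k, p in enumerate(pct_by_level)) / 100.0


def county_average(weeks: Sequence[Sequence[float]]) -> float:
    """Average the weekly area-weighted levels over the survey window."""
    return sum(weekly_level(w) for w in weeks) / len(weeks)


# Hypothetical county with two weeks of data.
weeks = [
    [50, 50, 0, 0, 0, 0],  # half none, half D0 -> 0.5
    [0, 50, 50, 0, 0, 0],  # half D0, half D1   -> 1.5
]
print(county_average(weeks))  # 1.0
```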

  13. Effect of Data Editing
     Data Set                      JAS/Unedited COA    JAS/Edited COA
     No. Records                   25,983              25,983
     Discrepant Records            6,601 (25.4%)       5,958 (22.9%)
     Discrepant Records Edited     1,351 (20.5%)
     Discrepancies Resolved        745 (55%)
     Non-Discrepancies Broken      102 (11%)
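Reading the bottom three rows as follows (our interpretation of the table): 1,351 of the 6,601 discrepant records were edited, 745 of those were resolved by editing, and 102 previously non-discrepant records became discrepant. Under this reading the counts reconcile exactly and the stated percentages can be reproduced. The base of the 11% is not stated, so it is not recomputed here.

```python
N = 25_983             # matched JAS/COA records
unedited_disc = 6_601  # discrepant against unedited COA
edited_disc = 5_958    # discrepant against edited COA
edited = 1_351         # discrepant records whose COA data were edited
resolved = 745         # of those, no longer discrepant after editing
broken = 102           # non-discrepant records that became discrepant after editing

# Net change: discrepancies removed by editing, plus new ones created by editing.
assert unedited_disc - resolved + broken == edited_disc

print(round(100 * unedited_disc / N, 1))       # 25.4
print(round(100 * edited_disc / N, 1))         # 22.9
print(round(100 * edited / unedited_disc, 1))  # 20.5
print(round(100 * resolved / edited))          # 55
```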

  14. Preliminary Findings (From Exploratory Data Analysis)
     More Discrepancy If:      Independent Variable:
     Livestock farm            Farm type (crop or livestock)
     Multiple operators        Number of operators
     Newer operators           Operator tenure
     Higher drought level      Drought level during JAS
     Phone/CATI                Mode of COA data collection
     Longer time               Time between JAS and COA

  15. Logistic Regression
     • Goal: model the probability of discrepancy as a function of independent (explanatory) variables
     • Wald Chi-Square Statistic: used to test whether the regression parameter estimate for a given independent variable is significantly different from zero
     • Odds Ratio: measures the strength of association between the dependent variable and a given independent variable
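In a logistic model the log odds of discrepancy are linear in the explanatory variables, so for a coefficient b1 the odds ratio for a one-unit increase in that variable is exp(b1), whatever the starting value. A minimal sketch with a single explanatory variable; the coefficients b0 and b1 are hypothetical, not estimates from the study:

```python
import math


def prob_discrepant(x: float, b0: float, b1: float) -> float:
    """Logistic model: P(discrepant | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p: float) -> float:
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients, not estimates from the study.
b0, b1 = -1.5, 0.2

# Odds at x = 3 divided by odds at x = 2 equals exp(b1): the odds ratio.
ratio = odds(prob_discrepant(3.0, b0, b1)) / odds(prob_discrepant(2.0, b0, b1))
print(round(ratio, 6), round(math.exp(b1), 6))  # 1.221403 1.221403
```

An odds ratio above 1 means higher values of the variable (or, for a binary variable, membership in the flagged group) go with higher odds of a discrepancy; a ratio of 1 means no association.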

  16. Results of Logistic Regression
     Independent Variable    Wald Chi-Square    P-Value    Odds Ratio    95% Confidence Interval
     Livestock Farm*         20.1               <.0001     1.148         [1.081-1.219]
     No. Operators           2.6                0.11       1.027         [0.994-1.06]
     Operator Tenure         2.19               0.14       1.002         [1.0-1.004]
     Avg. Drought Level      86.1               <.0001     1.137         [1.107-1.169]
     Mode = Phone/CATI*      6.9                0.009      1.198         [1.047-1.371]
     Mode = EDR (Web)*       0.31               0.58       0.973         [0.883-1.072]
     Mode = FTF/CAPI*        0.26               0.61       0.963         [0.832-1.114]
     Days (JAS to COA)       12.0               0.0005     1.001         [1.001-1.002]
     * binary variable
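As a consistency check on the table, and assuming the 95% intervals are the usual Wald intervals exp(β ± 1.96·SE), the standard error can be backed out of each interval and the Wald statistic recomputed as (β/SE)², with β = ln(odds ratio). For one degree of freedom the p-value is erfc(sqrt(χ²/2)). Because the odds ratios and limits are rounded to three decimals, the recovered statistics only approximate the reported ones.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float):
    """Approximate (Wald chi-square, p-value) from a reported odds ratio and 95% CI.

    Assumes the CI was built as exp(beta +/- 1.96*SE)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    chi_sq = (beta / se) ** 2
    p_value = math.erfc(math.sqrt(chi_sq / 2))  # upper tail of chi-square with 1 df
    return chi_sq, p_value


for name, or_, lo, hi in [
    ("Livestock Farm", 1.148, 1.081, 1.219),
    ("Avg. Drought Level", 1.137, 1.107, 1.169),
    ("Mode = Phone/CATI", 1.198, 1.047, 1.371),
]:
    chi_sq, p = wald_from_or_ci(or_, lo, hi)
    print(f"{name}: chi-square ~ {chi_sq:.1f}, p ~ {p:.4f}")
```

The recovered statistics land close to the reported 20.1, 86.1 and 6.9, which suggests the intervals and Wald tests in the table are mutually consistent.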

  17. Summary and Future Work
     • Six explanatory variables found to be significant in the logistic regression based on the Wald chi-square test
     • Of those variables, livestock farm, average drought level and phone/CATI showed the most influence on which farms have discrepancies and which do not
     • Next phase of research effort
       – explore explanatory variables further
       – probe largest outliers
       – investigate odd patterns (e.g., 60+ records with COA total land = 1, JAS total land > 100)
       – data mining techniques (classification trees, clustering)?
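Odd patterns such as the one noted above can be isolated with a simple filter before probing them individually. A minimal sketch; the record layout and values are hypothetical:

```python
# Hypothetical record layout: (record_id, coa_total_land, jas_total_land).
records = [
    ("r1", 1, 240),
    ("r2", 1, 80),
    ("r3", 320, 310),
    ("r4", 1, 1_000),
]


def odd_pattern(rows, coa_value=1, jas_min=100):
    """Ids of records where COA total land equals coa_value but JAS total land exceeds jas_min."""
    return [rid for rid, coa, jas in rows if coa == coa_value and jas > jas_min]


print(odd_pattern(records))  # ['r1', 'r4']
```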

  18. Acknowledgments
     • Denise Abreu
     • Mark Apodaca
     • Mark Gorsak
     • Noemi Guindin
     • Thomas Jacob
     • Andrea Lamas
     • Jaki McCarthy

  19. Questions/Comments?
     Michael E. Bellow, USDA/NASS Sampling and Estimation Research Section
     Mike.Bellow@nass.usda.gov
     Heather Ridolfo, USDA/NASS Survey Methodology and Technology Section
     Heather.Ridolfo@nass.usda.gov