Accessing Statistics Canada Data and Resources

SLIDE 1

Accessing Statistics Canada Data and Resources

Hugh McCague Valerie Preston Walter Giesbrecht Sara Tumpane

SLIDE 2

Outline

  • Survey Terminology
  • Research Data Centre (RDC)
  • RDC versus Public Use Microdata Files (PUMF)
  • Accessing the RDC
  • Statistics Canada Surveys and Data
  • Statistical Software
  • Research Opportunities
  • Statistical Consulting Service
  • Resources
SLIDE 3

Some Survey Terminology

  • Population
  • Elements
  • Sample: Simple Random Sample, Probability Sample
  • Response Rate
  • Weights: Simple Weights
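
To make the response rate and simple weights above concrete, here is a minimal sketch in Python. It assumes a simple random sample in which every respondent gets the same weight (population size divided by number of respondents); the function names and the toy numbers are hypothetical and are not taken from any Statistics Canada file.

```python
def response_rate(n_respondents, n_eligible_sampled):
    """Share of eligible sampled units that actually responded."""
    return n_respondents / n_eligible_sampled


def simple_weight(population_size, n_respondents):
    """Equal weight per respondent: the number of population
    elements each respondent stands for (assumes simple random sampling)."""
    return population_size / n_respondents


def weighted_total(values, weights):
    """Estimated population total: sum of value times weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimated population mean: weighted total over the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical toy survey: population of 100, 5 people sampled, 4 responded.
doctor_visits = [2, 0, 1, 3]
rate = response_rate(n_respondents=4, n_eligible_sampled=5)
weight = simple_weight(population_size=100, n_respondents=4)
weights = [weight] * len(doctor_visits)

estimated_total = weighted_total(doctor_visits, weights)
estimated_mean = weighted_mean(doctor_visits, weights)
print(rate, weight, estimated_total, estimated_mean)
```

With equal weights the weighted mean equals the ordinary sample mean; the weights matter for totals, and for means once the weights differ across respondents.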
SLIDE 4

Some Survey Terminology

  • Demographics
  • Strata
  • Clusters (primary sampling units, PSUs)
  • Complex Sample
  • Complex Weights, Bootstrap and Jackknife Replicate Weights
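
As an illustration of how bootstrap replicate weights are typically used, the sketch below recomputes an estimate once per replicate weight set and averages the squared deviations from the full-sample estimate. This is a common form of the bootstrap variance estimator, written here under the assumption that no extra adjustment factor is required; the survey's own documentation should be checked for the exact formula. The data and weights are invented toy values, and real files supply many more replicate sets.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values using one set of survey weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, full_weights, replicate_weights):
    """Average squared deviation of the replicate estimates
    from the full-sample estimate (assumed estimator, see lead-in)."""
    theta = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    squared_deviations = [(r - theta) ** 2 for r in replicate_estimates]
    return sum(squared_deviations) / len(replicate_estimates)


# Hypothetical toy data: four respondents, four replicate weight sets.
values = [1.0, 2.0, 3.0, 4.0]
full_weights = [10, 10, 10, 10]
replicate_weights = [
    [20, 0, 10, 10],
    [0, 20, 10, 10],
    [10, 10, 20, 0],
    [10, 10, 0, 20],
]

estimate = weighted_mean(values, full_weights)
variance = bootstrap_variance(values, full_weights, replicate_weights)
standard_error = math.sqrt(variance)
print(estimate, variance, standard_error)
```

The same pattern applies to any statistic: swap `weighted_mean` for a regression coefficient or a proportion and the variance step stays unchanged.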

SLIDE 5

Some Survey Terminology

  • Cross-sectional data
  • Longitudinal data: periods, waves, cycles, trajectory, life course
  • Attrition: attrition rate
  • Helpful reference: Ornstein, Michael. A Companion to Survey Research. London; Thousand Oaks, CA: SAGE, 2013.
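
The attrition rate mentioned above can be sketched as the share of first-wave respondents who are missing from a later wave. The definition used here (cumulative loss relative to wave 1) is one common convention and is an assumption, as are the respondent IDs; a given survey may define attrition differently.

```python
def attrition_rates(waves):
    """Cumulative attrition per wave: the fraction of wave-1
    respondents who did not respond in that wave.

    `waves` is a list of sets of respondent IDs, first wave first.
    """
    baseline = waves[0]
    return [len(baseline - wave) / len(baseline) for wave in waves]


# Hypothetical panel: 10 respondents at wave 1, 8 at wave 2, 6 at wave 3.
waves = [
    set(range(1, 11)),
    set(range(1, 9)),
    set(range(1, 7)),
]

rates = attrition_rates(waves)
print(rates)
```

Because each wave is compared with the baseline, a respondent who skips one wave and returns in the next counts as lost only for the wave that was missed.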

SLIDE 6

Research Data Centre (RDC)

  • Access to Statistics Canada data and statistical software
  • Microdata & administrative data
  • For York students and faculty, access is free
  • A “secure” environment
  • Researchers are “deemed employees” of Statistics Canada
  • Must work in RDC
  • CRDCN Network
SLIDE 7

The CRDCN Network

SLIDE 8

York RDC

  • 282 York Lanes
  • Staffed by:
  • Analyst Sara Tumpane (yorkrdc2@yorku.ca)
  • Assistant Theresa Kim (yorkrdc3@yorku.ca)
  • 8 workstations
  • Open 3-3.5 days/week
  • http://www.isr.yorku.ca/rdc/

SLIDE 9

Before you apply to the RDC…

  • Consider your options
  • Is what you need available in a more readily accessible source (either a PUMF or an aggregate file)?

SLIDE 10

RDC or PUMF?

Confidential Microdata in Research Data Centres

Characteristics:

  • Contains most of the original information collected during the survey
  • Continuous variables are accessible
  • Longitudinal identifiers provided
  • Contains bootstrap weights used for calculating exact variance

Access is appropriate when:

  • Sensitive variables not provided in PUMF
  • A PUMF does not exist
  • Longitudinal data is necessary
  • Analytical work is complex in nature

Public Use Microdata Files accessed online

Characteristics:

  • Manipulated by aggregating, capping, or deleting variables that could be “identifiers”; survey respondents cannot be identified
  • Many continuous variables transformed into categorical variables
  • Longitudinal identifiers stripped

Access is appropriate when:

  • Immediate data access is required
  • Analysis is for a course paper or equivalent
  • Data exploration
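
To show what the capping and categorizing applied to PUMF variables can look like in practice, here is a small sketch in Python. The ceiling, cut points, labels, and income values are all hypothetical; the actual rules Statistics Canada applies to any given file are not stated in these slides.

```python
import bisect


def cap(value, ceiling):
    """Top-code a value: anything above the ceiling is reported as the ceiling."""
    return min(value, ceiling)


def to_category(value, cut_points, labels):
    """Map a continuous value to a category label.

    `cut_points` must be sorted; each cut point is the inclusive
    lower bound of the next category, so len(labels) == len(cut_points) + 1.
    """
    return labels[bisect.bisect_right(cut_points, value)]


# Hypothetical master-file incomes and hypothetical disclosure rules.
incomes = [18_000, 42_500, 97_000, 250_000]
ceiling = 150_000
cut_points = [20_000, 50_000, 100_000]
labels = ["<20k", "20k-<50k", "50k-<100k", "100k+"]

capped = [cap(x, ceiling) for x in incomes]
categories = [to_category(x, cut_points, labels) for x in incomes]
print(capped)
print(categories)
```

Either transformation discards detail: after capping, the highest incomes are indistinguishable, and after categorizing, only the income band survives, which is why analyses needing exact values call for the master file.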
SLIDE 11

CCHS 2012 Example 1

PUMF

  • 1381 variables

Sources of personal income (PUMF):

  • Employment income
  • EI/Worker's compensation
  • Senior benefits
  • Other

Master File

  • 1815 variables

Sources of personal income (Master File):

  • wages and salaries
  • income from self-employment
  • dividends and interest
  • employment insurance
  • worker's compensation
  • CPP or QPP
  • job related retirement pensions
  • RRSP/RRIF
  • OAS and GIS
  • social assistance/welfare
  • child tax benefits
  • child support
  • alimony
  • other
  • none
SLIDE 12

CCHS 2012 Example 2

PUMF

Geography (PUMF):

  • Province of residence of respondent - (G)
  • Health Region - (G)
  • B.C. Health Authority (BCHA) - (D)

Master File

Geography (Master File):

  • Province of residence of respondent
  • Postal code - (D)
  • Health region of residence of respondent - (D)
  • Sub-health region (Québec only) - (D)
  • Nova Scotia district health authority
  • British Columbia local health authority - (D)
  • Regional health authority (RHA) - Alberta - (D)
  • British Columbia health authority - (D)
  • Local health integrated networks - Ontario - (D)
  • 2006 census dissemination area
  • Federal electoral district - (D)
  • Census subdivision - (D)
  • Census division - (D)
  • Statistical area classification type - (D)
  • 2006 Census metropolitan area (CMA)
  • Health region peer group
  • Urban and rural areas
  • Urban and rural areas - 2 levels - (D)
  • Subzones for Alberta
  • Manitoba health authority - (D)
SLIDE 13

Accessing PUMFs & master file metadata

  • Statistics Canada Nesstar data portal
  • metadata only, for PUMFs and master files
  • http://www62.statcan.ca/webview/
  • YUL: Data & Statistics library guide
  • http://researchguides.library.yorku.ca/data
  • <odesi> (OCUL)
  • http://www.library.yorku.ca/e/resolver/id/1165738
SLIDE 14

http://www.andertoons.com/data/cartoon/6543/things-good-stuff-ok-i-reiterate-request-for-specific-data

SLIDE 15

How to apply to an RDC and available datasets

  • RDC Application Pages
  • SSHRC Website
  • Data available in the RDCs
SLIDE 16

Accessing the RDC

Action (timeline): notes

  • Apply through the SSHRC website (1-2 hours): provide a list of academic contributions and a 5-10 page project proposal
  • Evaluation of the proposal (2-4 weeks): approval is based on relevance of methods and data, and demonstrated need for microdata
  • Security screening process (1-3 weeks for approval)
  • Sign Microdata Research Contract (1-3 weeks for approval)

SLIDE 17

Project Proposal

  • The project proposal is a maximum of ten pages and includes the following elements:

  • Title of the Project
  • Rationale and objectives of the study
  • Proposed data analysis and software requirements
  • Data requirements
  • Expected project start and end dates
  • Expected products
  • References
SLIDE 18

Data at the RDC

  • Canadian Community Health Survey (CCHS): 2001-2014
  • Health status, health care utilization, and health determinants
  • Annual Component (starting in 2001, N ~ 130,000)
  • Mental Health (2002, 2012) N ~ 37,000
  • Nutrition (2004) N ~ 35,000
  • Healthy Aging (2008-2009) N ~ 52,000 (sample 45+)
  • Canadian Health Measures Survey (CHMS): 2011, 2012, 2013
  • Survey and administrative data
  • Hate Crime Data (Pilot): 2010-2012
  • Characteristics of hate-motivated criminal incidents, victims, and accused persons

SLIDE 19

Data (continued)

  • General Social Survey (GSS): 1985-2014
  • Health (1985, 1991)
  • Time Use (1986, 1992, 1998, 2005, 2010)
  • Victimization (1988, 1993, 1999, 2004, 2009, 2014)
  • Education, Work and Retirement (1989, 1994)
  • Family (1990, 1995, 2001, 2006, 2011)
  • Caregiving and Care Receiving (1996, 2002, 2007, 2012)
  • Access to and Use of Information Technology (2000)
  • Social Networks/Social Identity (2003, 2008, 2013)
  • Giving, Volunteering and Participating (2013)
  • National Longitudinal Survey of Children and Youth (NLSCY): 8 cycles
  • Development and well-being: birth - early adulthood
  • Follow-ups every two years to age 25
SLIDE 20

Data by Themes

  • Health and Health Care
  • National Population Health Survey (NPHS)
  • Participation and Activity Limitation Survey (PALS)
  • Canadian Tobacco, Alcohol and Drugs Survey (CTADS)
  • Occupations and Organizations
  • Workplace and Employee Survey (WES)
  • Survey of Labour and Income Dynamics (SLID)
  • Census
  • Education
  • Youth in Transition Survey (YITS)
  • National Graduates Survey (NGS)
  • Race and Ethnicity
  • Aboriginal Peoples Survey (APS)
  • Longitudinal Survey of Immigrants to Canada (LSIC)
  • Ethnic Diversity Survey (EDS)
SLIDE 21

Pilot Data

  • Canadian Cancer Registry (CCR)
  • Vital Statistics
  • Uniform Crime Reporting
  • Homicide Survey
  • Hate Crime Data
  • Ministry of Community and Social Services (MCSS)
  • Citizenship and Immigration Canada (CIC)
SLIDE 22

Which Statistical Software to use at the York RDC? Features to Consider

  • SPSS 23
  • SAS 9.4
  • Stata 13
  • R 3.0.3

Statistical Software Resources: Institute for Digital Research and Education (idre), UCLA

http://www.ats.ucla.edu/stat/

SLIDE 23

An Example of a Psychology Research Project at the York RDC

  • Ames, M. E., Rawana J. S., Gentile P., and Morgan A. S. “The protective role of optimism and self-esteem on depressive symptom pathways among Canadian Aboriginal youth.” Journal of Youth and Adolescence 44.1 (2013): 142-154.
  • National Longitudinal Survey of Children and Youth
  • Complex Sample Design, Post-Stratification
  • Longitudinal Linear Mixed Models with Mediation

SLIDE 24

A Few of Many Quantitative Methods Research Opportunities

  • Extending methods to Complex Sample Designs
  • Proper methods for the Structural Equation Modeling of Complex Survey Data are strongly needed (Bollen et al., 2013)
  • R package lavaan.survey has started to address this issue (Oberski, 2014)
  • Item Response Theory with Complex Survey Data needs much more development (Cyr and Davies, 2005)

SLIDE 25

Statistical Consulting Service (SCS)

  • Statistical consulting is provided by a group of York faculty and graduate students with staff at the Institute for Social Research (ISR).

  • Usually, no fee for York faculty and student researchers
  • Online appointment scheduler
SLIDE 26

http://truthfacts.com/truthfacts/2014/04/09

SLIDE 27

Statistical Consulting Service (SCS)

  • ISR/SCS Short Courses and Spring Seminar Series on data analysis, qualitative research methods, survey methods, and related software

  • More details: http://www.isr.yorku.ca/centres/scs/
SLIDE 28

Contact Information and Resources

  • http://www.isr.yorku.ca/qmforum