Analyzing the household: adjusting and smoothing

Iván Mejía-Guevara (imejia@demog.berkeley.edu)
Postdoctoral Scholar, CEDA, University of California, Berkeley
East-West Center Summer Seminar on Population, June 10, 2010

Outline
1. Smoothing
2. Friedman’s Super Smoother (supsmu)
3. Variance estimation for age profiles
4. Age profile confidence intervals

1. Smoothing

The per capita age profiles are noisy, particularly at ages with relatively few observations, and, except as noted below, should be smoothed. The NTA Manual gives the following guidelines:
◮ The per capita education profile should not be smoothed.

1. Smoothing: education age profile (Mexico 2004)
[Figure: cfe, SNA 1993. x-axis: age (0 to 90); y-axis: Mexican pesos.]

1. Smoothing... ◮ Basic components should be smoothed, but not aggregations. For example, earnings and unincorporated income profiles should be smoothed, but the sum of the two should not be smoothed.

1. Smoothing: earnings (Mexico 2004)
[Figure: yl, SNA 1993. x-axis: age (0 to 90); y-axis: Mexican pesos.]

1. Smoothing: unincorporated income (Mexico 2004)
[Figure: yls, SNA 1993. x-axis: age (0 to 90); y-axis: Mexican pesos.]

1. Smoothing: labor income (Mexico 2004)
[Figure: yl, SNA 1993. Series: yl, yle, ylf, yls. x-axis: age (0 to 90); y-axis: Mexican pesos.]

1. Smoothing...
◮ The objective is to reduce sampling variance without eliminating what may be “real” features of the data. For example, public health spending may increase dramatically when individuals reach an age threshold, e.g., 65. This kind of feature should not be smoothed away.

1. Smoothing...
◮ Because health consumption by newborns is unusually high, we tend not to smooth health consumption at age 0. This can be done by adding the estimated unsmoothed health consumption of newborns to the smoothed age profile of private health consumption for the other ages.

1. Smoothing: private health consumption (Mexico 2004)
[Figure: cfh, SNA 1993. x-axis: age (0 to 90); y-axis: Mexican pesos.]

1. Smoothing...
◮ Only adults (usually ages 15 and older) receive income, pay income taxes, and make familial transfer outflows. Thus, when smoothing these age profiles, we begin at the youngest adult age and exclude the younger age groups, who do not earn income.
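The guideline on adult-only profiles can be sketched in Python as follows. This is a minimal illustration, not the NTA procedure itself: the function names, the cutoff `min_age=15`, the sample values, and the simple centered moving average (standing in for a real smoother such as supsmu) are all assumptions made for the example.

```python
def moving_average(values, half_width=1):
    """Centered moving average; the window is truncated at both ends.

    Stand-in smoother for illustration only (not Friedman's supsmu).
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def smooth_from_age(profile, min_age=15, smoother=moving_average):
    """Smooth only ages >= min_age; younger ages are returned unchanged."""
    return profile[:min_age] + smoother(profile[min_age:])


# Hypothetical per capita profile indexed by age 0..19 (zero below age 15).
raw = [0.0] * 15 + [100.0, 400.0, 300.0, 700.0, 600.0]
smoothed = smooth_from_age(raw)
print(smoothed[15:])
```

Starting the smoother at the cutoff keeps the structural zeros at younger ages from pulling down the first adult values, and leaves those zeros exactly zero.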

2. Friedman’s Super Smoother (supsmu)

There are two steps to smoothing the per capita profile:
1. Create a spreadsheet that contains the unsmoothed age profile and the number of observations for each age.
2. Use Friedman’s SuperSmoother (the supsmu function in R) to smooth the per capita profile, incorporating the number of observations.

The following R code uses supsmu. Suppose "thyl.csv" is the file name (a delimited text file exported from Excel), yl is the unsmoothed variable, and sample is the number of observations at each age. If the file is tab-delimited rather than comma-separated, pass sep = "\t" to read.csv.

    # Read in the data; the working object is nta
    nta <- read.csv("thyl.csv", header = TRUE)
    # Smooth yl over age, weighting by the number of observations; the result is test
    test <- supsmu(nta$age, nta$yl, nta$sample)
    # Write out the smoothed profile as "smoothed yl.csv"
    write.csv(test, "smoothed yl.csv")

2. supsmu: R code
◮ supsmu(x, y, wt, span = "cv", periodic = FALSE, bass = 0)
Arguments:
  x: x values for smoothing
  y: y values for smoothing
  wt: case weights, by default all equal
  span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
  periodic: if TRUE, the x values are assumed to be in [0, 1] and of period 1
  bass: controls the smoothness of the fitted curve; values of up to 10 indicate increasing smoothness
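To see why passing the number of observations as case weights matters, here is a small Python sketch of a weighted running mean. It is deliberately much simpler than supsmu (which fits running lines and selects the span by cross-validation) and is not a reimplementation of it; the function name, the `half_width` parameter, and the sample numbers are assumptions for illustration.

```python
def weighted_running_mean(y, wt, half_width=1):
    """Weighted mean of y over a centered window, using wt as case weights."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        total_w = sum(wt[lo:hi])
        if total_w == 0:
            # No observations anywhere in the window: keep the raw value.
            out.append(y[i])
        else:
            weighted_sum = sum(w * v for w, v in zip(wt[lo:hi], y[lo:hi]))
            out.append(weighted_sum / total_w)
    return out


# Hypothetical profile: the third age has only 3 observations and an outlying mean.
y = [10.0, 12.0, 50.0, 14.0, 16.0]
counts = [200, 180, 3, 190, 210]

weighted = weighted_running_mean(y, counts)
unweighted = weighted_running_mean(y, [1] * len(y))
print(weighted[2], unweighted[2])
```

With the counts as weights, the thinly sampled age contributes little to its own smoothed value, whereas the unweighted version lets the noisy point dominate its neighborhood.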

2. Alternative to supsmu...
An alternative smoothing method is “lowess”. It has been found unreliable because it does not incorporate sample weights, and we recommend that it not be used. If you are more comfortable with Stata than with R and would prefer lowess smoothing, see the NTA Manual for details.

3. Variance estimation for age profiles

◮ Age profile estimation in NTA:

$$\bar{y}_a = \frac{y_a}{w_a} = \frac{\sum_{i=1}^{n_a} w_{ia}\, y_{ia}}{\sum_{i=1}^{n_a} w_{ia}} \qquad (1)$$

where $\bar{y}_a$ is the mean value of variable $y$ (e.g. education) for individuals aged $a$, $w_{ia}$ is the sampling weight of individual $i$ aged $a$, and $n_a$ is the sample size of age group $a$.
◮ Survey design:
a) Simple Random Sampling (SRS)
b) Complex design survey (CDS): stratified multi-stage cluster
* Survey variables in CDS: 1) strata, 2) primary sampling units, 3) weights
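As a concrete reading of equation (1), the following Python sketch computes the weighted mean of a variable within each age group. The record layout `(age, weight, value)`, the function name, and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def age_profile(records):
    """Weighted mean of y by age.

    records: iterable of (age, weight, value) tuples.
    Returns {age: sum(w * y) / sum(w)}, skipping ages with zero total weight.
    """
    numerator = defaultdict(float)    # sum of w_ia * y_ia per age
    denominator = defaultdict(float)  # sum of w_ia per age
    for age, w, y in records:
        numerator[age] += w * y
        denominator[age] += w
    return {a: numerator[a] / denominator[a]
            for a in sorted(denominator) if denominator[a] > 0}


# Hypothetical survey records: (age, sampling weight, value of y).
records = [
    (30, 2.0, 100.0),
    (30, 1.0, 400.0),
    (31, 1.0, 100.0),
    (31, 3.0, 300.0),
]
profile = age_profile(records)
print(profile)
```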

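3. Variance estimation for age profiles
◮ Variance estimation for Simple Random Samples (SRS):

$$\mathrm{Var}\left(\frac{y}{w}\right) = \frac{s^2}{n}$$

◮ Variance estimation for CDS:

$$\mathrm{Var}\left(\frac{y}{w}\right) \neq \frac{\mathrm{Var}(y)}{\mathrm{Var}(w)}$$

* Taylor series linearization method (TSL): define $r = y / w$; then

$$\mathrm{var}(\bar{y}_a) = \frac{1}{w^2}\left[\mathrm{var}(y) + r^2\,\mathrm{var}(w) - 2\,r\,\mathrm{cov}(y, w)\right] \qquad (2)$$

where

$$\mathrm{var}(y) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} y_{h\alpha}^2 - \frac{y_h^2}{n_h}\right]$$

$$\mathrm{var}(w) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} w_{h\alpha}^2 - \frac{w_h^2}{n_h}\right]$$

$$\mathrm{cov}(y, w) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1}\left[\sum_{\alpha=1}^{n_h} y_{h\alpha}\, w_{h\alpha} - \frac{y_h\, w_h}{n_h}\right]$$

and $H$ is the number of strata, $n_h$ is the number of individuals in stratum $h$, and $y_h$, $w_h$ denote the stratum totals of $y_{h\alpha}$ and $w_{h\alpha}$.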
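Equation (2) can be traced step by step in a short Python sketch. The assumptions here are mine rather than the slides': each unit $\alpha$ in stratum $h$ contributes $y_{h\alpha}$ = weight × value and $w_{h\alpha}$ = weight, the input is a dictionary of strata holding `(weight, value)` pairs for one age group, and the function name is hypothetical. Restricting the input to one age group is a simplification of what survey software does for subpopulations.

```python
def tsl_variance(strata):
    """Taylor-linearized variance of the ratio r = y / w, as in equation (2).

    strata: {stratum_id: [(weight, value), ...]} for one age group.
    """
    y_tot = w_tot = 0.0
    var_y = var_w = cov_yw = 0.0
    for units in strata.values():
        n_h = len(units)
        if n_h < 2:
            raise ValueError("each stratum needs at least two units")
        ys = [w * v for w, v in units]   # y_{h,alpha}: weighted values
        ws = [w for w, _ in units]       # w_{h,alpha}: weights
        y_h, w_h = sum(ys), sum(ws)      # stratum totals
        factor = n_h / (n_h - 1)
        var_y += factor * (sum(a * a for a in ys) - y_h ** 2 / n_h)
        var_w += factor * (sum(b * b for b in ws) - w_h ** 2 / n_h)
        cov_yw += factor * (sum(a * b for a, b in zip(ys, ws)) - y_h * w_h / n_h)
        y_tot += y_h
        w_tot += w_h
    r = y_tot / w_tot
    return (var_y + r ** 2 * var_w - 2 * r * cov_yw) / w_tot ** 2


# Sanity check: one stratum with equal weights should reduce to s^2 / n.
values = [1.0, 2.0, 3.0, 4.0]
variance = tsl_variance({"h1": [(1.0, v) for v in values]})
print(variance)
```

With a single stratum and equal weights, var(w) and cov(y, w) vanish and the result equals the SRS formula $s^2/n$, which gives a quick way to check an implementation.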

3. Stata code for variance estimation
◮ SRS:

    mean yl [pw=factor], over(age)

where:
  yl: NTA variable, i.e. labor income
  factor: sampling weight
  age: 'age' survey variable
◮ CDS:

    svyset psu [pw=factor], strata(stratum)
    svy: mean yl, over(age)

where:
  psu: primary sampling unit survey variable
  stratum: strata survey variable

3. Stata output (yle)

Over    Mean       Std. Err.   [95% Conf. Interval]
0       0          0           .          .
1       0          0           .          .
2       0          0           .          .
3       0          0           .          .
...
30      7133.63    256.329     6631.23    7636.03
31      8576.72    419.072     7755.34    9398.09
32      7959.72    347.977     7277.69    8641.75
33      9022.32    395.903     8246.35    9798.28
34      8751.68    374.232     8018.19    9485.17
35      8395.42    421.098     7570.07    9220.77
...
86      490.310    463.267     -417.69    1398.31
87      9.375      9.375       -8.9999    27.7499
...

4. Confidence intervals

4. Stata output (yle)

Over    Mean       Std. Err.   [95% Conf. Interval]
...
30      7133.63    256.329     6631.23    7636.03
31      8576.72    419.072     7755.34    9398.09

Mean: $\bar{y}_a$
Std. Err.: $se(\bar{y}_a)$
Conf. Interval: $\bar{y}_a \pm t_{df} \cdot se(\bar{y}_a)$
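The interval formula can be checked against the age-30 row with a few lines of Python. One assumption to flag: the sketch uses the standard normal quantile in place of $t_{df}$, which is only a reasonable approximation when the design degrees of freedom are large; the function name is also hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean, se, level=0.95):
    """Return (lower, upper) as mean +/- z * se.

    Uses the normal quantile as a large-df approximation to t_df.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


# Age 30 from the Stata output: mean 7133.63, standard error 256.329.
lower, upper = confidence_interval(7133.63, 256.329)
print(round(lower, 2), round(upper, 2))
```

For this row the approximation agrees with the reported bounds to the displayed precision, which suggests the degrees of freedom in this survey are large.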

4. Example: YL: earnings (yle)
[Figure: 95% confidence interval for yle. Series: yle, cds-l, cds-u, srs-l, srs-u. x-axis: age (0 to 90); y-axis: Mexican pesos.]

4. Coefficient of variation: $cv(\bar{y}_a) = se(\bar{y}_a) / \bar{y}_a$
[Figure: cv of yle by age. x-axis: age (0 to 90); y-axis: percent (0% to 100%).]

4. Example: YL: entrepreneurial income (yls)
[Figure: 95% confidence interval for yls. Series: yls, cds-l, cds-u, srs-l, srs-u. x-axis: age (0 to 90); y-axis: Mexican pesos.]

4. YL: imputed self-employed income (ylss)
[Figure: 95% confidence interval for ylss. Series: ylss, cds-l, cds-u, srs-l, srs-u. x-axis: age (0 to 90); y-axis: Mexican pesos.]

4. YL: coefficient of variation (yls)
[Figure: cv of yls by age. x-axis: age (0 to 90); y-axis: percent (0% to 100%).]

4. Confidence intervals for smoothed profiles: supsmu
◮ Data $(x_1, y_1), \ldots, (x_n, y_n)$:

$$y_i = s(x_i) + r_i, \quad i = 1, \ldots, n \qquad (3)$$

◮ Smoothed value at point $x_i$:

$$s(x_i) = \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} y_j$$

◮ Expected squared error at point $x_i$, under $E(r_i) = 0$, $\mathrm{Var}(r_i) = \sigma^2$:

$$e^2(x_i \mid J) = \left[f(x_i) - \frac{1}{J} \sum_{j=i-J/2}^{i+J/2} f(x_j)\right]^2 + \frac{1}{J}\,\sigma^2 \qquad (4)$$

where $f$ is the underlying (unobserved) function being estimated.

4. supsmu: NTA framework
◮ Pairs $(a, \bar{y}_a)$:

$$\bar{y}_a = s(\bar{y}_a) + r_a, \quad a = 0, \ldots, \omega \qquad (5)$$

◮ Smoothed value at age $a$:

$$s(\bar{y}_a) = \frac{1}{J} \sum_{k=a-J/2}^{a+J/2} \bar{y}_k$$

◮ Expected squared error at age $a$, under $E(r_a) = 0$, $\mathrm{Var}(r_a) = \sigma_a^2 = \mathrm{Var}_{cds}(\bar{y}_a)$:

$$e^2(a \mid J) = \left[\bar{y}_a - \frac{1}{J} \sum_{k=a-J/2}^{a+J/2} \bar{y}_k\right]^2 + \frac{1}{J^2} \sum_{k=a-J/2}^{a+J/2} \mathrm{Var}_{cds}(\bar{y}_k) \qquad (6)$$
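A rough Python sketch of equation (6) follows. Two choices are assumptions of this sketch rather than something the slides specify: the window is truncated at the youngest and oldest ages, and the number of ages actually in the window is used in place of $J$ as the divisor. The function name and sample numbers are hypothetical.

```python
def expected_squared_error(means, variances, a, J):
    """Squared-bias term plus variance term for a running mean centered at age a.

    means[k]     : estimated mean at age k
    variances[k] : design-based variance estimate of that mean
    J            : window width; the window covers ages a - J//2 .. a + J//2
    """
    half = J // 2
    lo = max(0, a - half)
    hi = min(len(means), a + half + 1)
    ages = range(lo, hi)
    n = len(ages)  # used in place of J (assumption of this sketch)
    smoothed = sum(means[k] for k in ages) / n
    bias_sq = (means[a] - smoothed) ** 2
    var_term = sum(variances[k] for k in ages) / n ** 2
    return bias_sq + var_term


# Hypothetical linear profile with unit variances: the bias term vanishes at interior ages.
means = [10.0, 20.0, 30.0, 40.0, 50.0]
variances = [1.0, 1.0, 1.0, 1.0, 1.0]
error_mid = expected_squared_error(means, variances, a=2, J=2)
error_edge = expected_squared_error(means, variances, a=0, J=2)
print(error_mid, error_edge)
```

At the edge the truncated window is one-sided, so the bias term is no longer zero even for a linear profile; this is one reason boundary ages deserve separate attention.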

4. Example-supsmu: remittances (span=0.05)
[Figure: 95% confidence interval for rem. Series: rem, cds-l, cds-u, ci-l (span=0.05), ci-u (span=0.05). x-axis: age (0 to 90); y-axis: Mexican pesos.]
