Lecture 6 Discrete Time Series
9/21/2018
Discrete Time Series
Stationary Processes

A stochastic process (i.e. a time series) is considered to be strictly stationary if the properties of the process are not changed by a shift in origin. In the time series context this means that the joint distribution of $\{y_{t_1}, \ldots, y_{t_n}\}$ must be identical to the distribution of $\{y_{t_1+k}, \ldots, y_{t_n+k}\}$ for any value of $n$ and $k$.
Weak Stationarity

Strict stationarity is unnecessarily strong / restrictive for many applications, so instead we often opt for weak stationarity, which requires the following:

1. The process has finite variance: $E(y_t^2) < \infty$ for all $t$
2. The mean of the process is constant: $E(y_t) = \mu$ for all $t$
3. The second moment only depends on the lag: $Cov(y_t, y_s) = Cov(y_{t+k}, y_{s+k})$ for all $t$, $s$, $k$

When we say stationary in class we will almost always mean weakly stationary.
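As a quick illustrative sketch (not from the original slides; the object names and values below are assumptions), white noise satisfies all three conditions, and the first two can be checked empirically in R:

# Minimal sketch (assumed example): white noise w_t ~ N(0, 1) is weakly stationary --
# its mean, variance, and autocovariances do not depend on t.
set.seed(1)
w = rnorm(1000)
first = w[1:500]; second = w[501:1000]
c(mean(first), mean(second))        # both near 0 (constant mean)
c(var(first), var(second))          # both near 1 (finite, constant variance)
c(cov(first[-500], first[-1]),      # lag-1 autocovariance, near 0 in both halves
  cov(second[-500], second[-1]))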
Autocorrelation

For a stationary time series, where $E(y_t) = \mu$ and $Var(y_t) = \sigma^2$ for all $t$, we define the autocorrelation at lag $k$ as

$$\rho_k = Cor(y_t, y_{t+k}) = \frac{Cov(y_t, y_{t+k})}{\sqrt{Var(y_t)\,Var(y_{t+k})}} = \frac{E\big((y_t - \mu)(y_{t+k} - \mu)\big)}{\sigma^2}$$

this is also sometimes written in terms of the autocovariance function ($\gamma_k$) as

$$\gamma_k = \gamma(t, t+k) = Cov(y_t, y_{t+k}) = \gamma(k)$$

$$\rho_k = \frac{\gamma(t, t+k)}{\sqrt{\gamma(t, t)\,\gamma(t+k, t+k)}} = \frac{\gamma(k)}{\gamma(0)}$$
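As a small sketch (assumed example, not from the slides), the sample version of $\rho_k$ can be computed directly and compared against R's acf():

# Sketch: sample autocorrelation at lag k versus acf()
set.seed(1)
y = arima.sim(model = list(ar = 0.7), n = 500)   # a stationary AR(1) series
k = 3
y_bar = mean(y)
rho_k = sum((y[1:(500 - k)] - y_bar) * (y[(1 + k):500] - y_bar)) /
        sum((y - y_bar)^2)                        # gamma_hat(k) / gamma_hat(0)
rho_k
acf(y, plot = FALSE)$acf[k + 1]                   # lag 0 is stored at index 1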
Covariance Structure

Based on our definition of a (weakly) stationary process, it implies a covariance of the following structure,

$$\Sigma = \begin{pmatrix}
\gamma(0) & \gamma(1) & \gamma(2) & \gamma(3) & \cdots & \gamma(n) \\
\gamma(1) & \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-1) \\
\gamma(2) & \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-2) \\
\gamma(3) & \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-3) \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma(n) & \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(0)
\end{pmatrix}$$
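This Toeplitz structure is easy to build in R. The sketch below is an assumed example (not from the slides) using the AR(1) autocovariance $\gamma(k) = \phi^k / (1 - \phi^2)$ for unit innovation variance:

# Sketch: construct the implied covariance matrix from an autocovariance function
phi = 0.7
n = 5
gamma = phi^(0:(n - 1)) / (1 - phi^2)   # gamma(0), gamma(1), ..., gamma(n-1)
Sigma = toeplitz(gamma)                 # entry (i, j) is gamma(|i - j|)
Sigma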
Example - Random walk

Let $y_t = y_{t-1} + w_t$ with $y_0 = 0$ and $w_t \sim \mathcal{N}(0, 1)$.

[Figure: simulated random walk, $y$ plotted against $t$ for $t = 1, \ldots, 1000$]
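A minimal R sketch of this simulation (assumed, not the original slide code; the data frame name rw matches the object whose ACF is shown on the next slide):

# Sketch: simulate y_t = y_{t-1} + w_t with y_0 = 0, w_t ~ N(0, 1)
set.seed(1)
n  = 1000
rw = data.frame(t = 1:n, y = cumsum(rnorm(n)))   # cumulative sum of the w_t
plot(rw$t, rw$y, type = "l", xlab = "t", ylab = "y", main = "Random walk")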
ACF + PACF

[Figure: time series plot of rw$y with its sample ACF and PACF out to lag 50; the ACF decays very slowly from 1 while the PACF has a single large spike at lag 1]
Stationary?

Is $y_t$ stationary?
Partial Autocorrelation - pACF

Given these types of patterns in the autocorrelation we often want to examine the relationship between $y_t$ and $y_{t+k}$ with the (linear) dependence of $y_t$ on $y_{t+1}$ through $y_{t+k-1}$ removed.

This is done through the calculation of a partial autocorrelation ($\alpha(k)$), which is defined as follows:

$$\alpha(0) = 1$$
$$\alpha(1) = \rho(1) = Cor(y_t, y_{t+1})$$
$$\vdots$$
$$\alpha(k) = Cor\big(y_t - P_{t,k}(y_t),\; y_{t+k} - P_{t,k}(y_{t+k})\big)$$

where $P_{t,k}(y)$ is the projection of $y$ onto the space spanned by $y_{t+1}, \ldots, y_{t+k-1}$.
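A small R sketch (assumed example; the series and lag chosen below are arbitrary) illustrating this definition: regress out the intermediate lags, correlate the residuals, and compare with pacf():

# Sketch: alpha(k) via "regress out the intermediate lags", compared to pacf()
set.seed(1)
y = arima.sim(model = list(ar = c(0.5, 0.3)), n = 2000)
k = 3
X  = embed(as.numeric(y), k + 1)[, (k + 1):1]   # columns: y_t, y_{t+1}, ..., y_{t+k}
r1 = resid(lm(X[, 1]     ~ X[, 2:k]))           # y_t     minus its projection
r2 = resid(lm(X[, k + 1] ~ X[, 2:k]))           # y_{t+k} minus its projection
cor(r1, r2)                                     # sample version of alpha(k)
pacf(y, plot = FALSE)$acf[k]                    # essentially the same value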
Example - Random walk with drift

Let $y_t = \delta + y_{t-1} + w_t$ with $y_0 = 0$ and $w_t \sim \mathcal{N}(0, 1)$.

[Figure: simulated random walk with drift ("Random walk with trend"), $y$ plotted against $t$ for $t = 1, \ldots, 1000$; the series trends upward to roughly 80]
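A matching R sketch (assumed; the slide does not give a drift value, so the one below is purely illustrative and chosen to roughly match the plotted scale; rwt matches the object on the next slide):

# Sketch: random walk with drift, y_t = delta + y_{t-1} + w_t
set.seed(1)
n     = 1000
delta = 0.08                                    # illustrative drift value (assumption)
rwt   = data.frame(t = 1:n, y = cumsum(delta + rnorm(n)))
plot(rwt$t, rwt$y, type = "l", xlab = "t", ylab = "y",
     main = "Random walk with drift")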
ACF + PACF

[Figure: time series plot of rwt$y with its sample ACF and PACF out to lag 50; as with the plain random walk, the ACF decays very slowly from 1]
Stationary?

Is $y_t$ stationary?
Example - Moving Average

Let $w_t \sim \mathcal{N}(0, 1)$ and $y_t = w_{t-1} + w_t$.

[Figure: simulated moving average series, $y$ plotted against $t$ for $t = 1, \ldots, 100$]
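A minimal R sketch of this simulation (assumed, not the original slide code; ma matches the object plotted on the next slide):

# Sketch: MA(1)-type series y_t = w_{t-1} + w_t
set.seed(1)
n  = 100
w  = rnorm(n + 1)                                # w_0, w_1, ..., w_n
ma = data.frame(t = 1:n, y = w[1:n] + w[2:(n + 1)])
plot(ma$t, ma$y, type = "l", xlab = "t", ylab = "y", main = "Moving Average")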
ACF + PACF

[Figure: time series plot of ma$y with its sample ACF and PACF out to lag 50; only the lag-1 autocorrelation stands out]
Stationary?

Is $y_t$ stationary?
Autoregressive

Let $w_t \sim \mathcal{N}(0, 1)$ and $y_t = y_{t-1} - 0.9\, y_{t-2} + w_t$ with $y_t = 0$ for $t < 1$.

[Figure: simulated autoregressive series, $y$ plotted against $t$ for $t = 1, \ldots, 500$]
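A minimal R sketch of this recursion (assumed, not the original slide code; ar matches the object on the next slide):

# Sketch: y_t = y_{t-1} - 0.9 y_{t-2} + w_t, with y_t = 0 for t < 1
set.seed(1)
n = 500
w = rnorm(n)
y = numeric(n)
for (t in 1:n) {
  y_lag1 = if (t > 1) y[t - 1] else 0
  y_lag2 = if (t > 2) y[t - 2] else 0
  y[t]   = y_lag1 - 0.9 * y_lag2 + w[t]
}
ar = data.frame(t = 1:n, y = y)
plot(ar$t, ar$y, type = "l", xlab = "t", ylab = "y", main = "Autoregressive")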
ACF + PACF

[Figure: time series plot of ar$y with its sample ACF and PACF out to lag 50; the ACF oscillates between roughly ±0.5 and the PACF is dominated by the first two lags]
Example - Australian Wine Sales

Australian total wine sales by wine makers in bottles <= 1 litre. Jan 1980 - Aug 1994.

aus_wine = readRDS("../data/aus_wine.rds")
aus_wine

## # A tibble: 176 x 2
##     date sales
##    <dbl> <dbl>
##  1 1980  15136
##  2 1980. 16733
##  3 1980. 20016
##  4 1980. 17708
##  5 1980. 18019
##  6 1980. 19227
##  7 1980. 22893
##  8 1981. 23739
##  9 1981. 21133
## 10 1981. 22591
## # ... with 166 more rows
Time series

[Figure: aus_wine sales (roughly 15,000 to 40,000) plotted against date, 1980-1995; an increasing trend with a repeating within-year pattern]
Basic Model Fit

[Figure: sales vs. date with fitted linear and quadratic trend curves overlaid]
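A sketch of the two trend fits (assumed, since the slide only shows the fitted curves); the residual column names are chosen to match the next slides:

# Sketch: linear and quadratic trend models for the wine sales series
l_lin  = lm(sales ~ date, data = aus_wine)
l_quad = lm(sales ~ date + I(date^2), data = aus_wine)

d = aus_wine
d$lin_resid  = residuals(l_lin)    # used on the "Residuals" slide
d$quad_resid = residuals(l_quad)   # used on the autocorrelation slides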
Residuals

[Figure: lin_resid and quad_resid plotted against date; both sets of residuals range from roughly -10,000 to 15,000]
Autocorrelation Plot

[Figure: d$quad_resid time series with its sample ACF and PACF out to lag 36; strong positive autocorrelation remains, most notably at the seasonal lags]
[Figure: scatterplots of quad_resid against its lagged values (lag_value), lag 1 through lag 12]
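A sketch of how such a lag-scatterplot panel could be produced; this is an assumed base-R version, not the original slide code:

# Sketch: quad_resid against lags 1 through 12 of itself
op = par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))
n  = nrow(d)
for (k in 1:12) {
  plot(d$quad_resid[1:(n - k)], d$quad_resid[(1 + k):n],
       xlab = "lag_value", ylab = "quad_resid", main = paste0("lag", k))
}
par(op)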
Autoregressive errors

## Call:
## lm(formula = quad_resid ~ lag_12, data = d_ar)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12286.5  -1380.5     73.4   1505.2   7188.1
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  83.65080  201.58416   0.415    0.679
## lag_12        0.89024    0.04045  22.006   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2581 on 162 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7478
## F-statistic: 484.3 on 1 and 162 DF,  p-value: < 2.2e-16
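A sketch of code that would produce a fit like this (assumed, since only the summary is shown): regress the quadratic-fit residuals on their own value 12 months earlier.

# Sketch: lag-12 autoregression on the quadratic-trend residuals
d_ar = d
d_ar$lag_12 = c(rep(NA, 12), d$quad_resid[1:(nrow(d) - 12)])  # 12-month lag
l_ar = lm(quad_resid ~ lag_12, data = d_ar)
summary(l_ar)   # the first 12 rows are NA, hence "12 observations deleted"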
Residual residuals

[Figure: residuals of the lag-12 regression (resid) plotted against date, 1980-1995]
Residual residuals - acf

[Figure: l_ar$residuals time series with its sample ACF and PACF out to lag 36; the remaining autocorrelations are small, roughly within ±0.2]
[Figure: scatterplots of resid against its lagged values (lag_value), lag 1 through lag 12]
Writing down the model?

So, is our EDA suggesting that we fit the following model?

$$\text{sales}(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + w_t$$

where

$$w_t = \delta\, w_{t-12} + \epsilon_t$$

the model we actually fit is,

$$\text{sales}(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3\, \text{sales}(t-12) + \epsilon_t$$
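A sketch (assumed, not from the slides) of fitting that single-equation version directly, using a numeric time index and a 12-month lag of sales; the object names below are hypothetical:

# Sketch: trend + lagged-sales model in a single lm() call
aus = aus_wine
aus$t      = seq_len(nrow(aus))
aus$lag_12 = c(rep(NA, 12), aus$sales[1:(nrow(aus) - 12)])
l_full = lm(sales ~ t + I(t^2) + lag_12, data = aus)
summary(l_full)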