Primer on time series: Joshua Loftus, July 17, 2015


SLIDE 1

Primer on time series

Joshua Loftus July 17, 2015

SLIDE 2

Outline

◮ Motivating examples
◮ A spoonful of theory
◮ Further reading

SLIDE 3

ts(): Creating a time series object

Google Trends: search popularity of “game of thrones”

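Read the data and subset to the relevant rows (the .csv file from Google Trends is a bit messy):

    setwd("~/Dropbox/work/teaching/consulting/timeseries")
    # Skip the first 4 lines of the file, which come before the data
    data <- read.csv("GoT.csv", skip = 4, stringsAsFactors = FALSE)
    # Keep only rows 1 to 211
    data <- data[1:211, ]
    # Make sure the popularity column is numeric
    data[, 2] <- as.numeric(data[, 2])

The data is given by week. Seasons happen once per year, so one period is 52 observations:

    d <- ts(data[, 2], frequency = 52)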
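The GoT.csv file is not included with the slides, so here is a minimal sketch of the same ts() call on synthetic weekly data. The simulated series and the names weeks, x and d_sim are assumptions made for illustration only; they are not the Google Trends data.

```r
# Synthetic stand-in for 211 weekly observations (NOT the real GoT.csv data)
set.seed(42)
weeks <- 211
x <- 50 + 20 * sin(2 * pi * seq_len(weeks) / 52) + rnorm(weeks, sd = 3)

# frequency = 52 tells R that one period (one year) contains 52 observations
d_sim <- ts(x, frequency = 52)

frequency(d_sim)    # observations per period
start(d_sim)        # (period, position within period) of the first observation
end(d_sim)          # (period, position within period) of the last observation
head(cycle(d_sim))  # position of each observation within its period
```

With 211 weekly observations and 52 per period, the series covers a little over four full periods, which is why the time axis in the plots runs from 1 to about 5.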

SLIDE 4

stl(): Seasonal decomposition by Loess

    fit <- stl(log(d), s.window = "period")
    plot(fit)

[Figure: STL decomposition of log(d), with panels data, seasonal, trend, and remainder plotted against time.]
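To see what stl() returns, here is a small sketch on co2, a monthly series that ships with base R; it is used only because the GoT data is not available here. The names fit_demo, components and recon are hypothetical.

```r
# co2 is a monthly ts from the datasets package; it stands in for d
fit_demo <- stl(log(co2), s.window = "periodic")

# The fitted components are stored as columns of a multivariate ts
components <- fit_demo$time.series
colnames(components)

# The decomposition is additive on the log scale:
# seasonal + trend + remainder reproduces the input series
recon <- rowSums(components)
max(abs(recon - as.numeric(log(co2))))

# With a periodic seasonal window the seasonal pattern repeats every year
seasonal <- as.numeric(components[, "seasonal"])
all.equal(seasonal[1:12], seasonal[13:24])
```

Because the fit is done on the log scale, the components combine multiplicatively on the original scale.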

SLIDE 5

library(forecast): Predicting the future

    plot(forecast(fit))

[Figure: Forecasts from STL + ETS(A,N,N).]
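The label "STL + ETS(A,N,N)" can be read as: remove the seasonal component, forecast the seasonally adjusted series with simple exponential smoothing (additive error, no trend, no seasonality), then add the last period of the seasonal component back. The sketch below imitates that idea with base R only, on the built-in co2 series. It is an approximation under stated assumptions, not the forecast package's implementation: HoltWinters() estimates the smoothing parameter differently from ets(), and the names adjusted, ses, level_fc and fc are hypothetical.

```r
# Decompose a built-in monthly series (stand-in for the GoT data)
fit_demo <- stl(log(co2), s.window = "periodic")
seasonal <- fit_demo$time.series[, "seasonal"]

# Seasonally adjusted series = trend + remainder
adjusted <- log(co2) - seasonal

# Simple exponential smoothing: no trend (beta) and no seasonal (gamma) term
ses <- HoltWinters(adjusted, beta = FALSE, gamma = FALSE)

h <- 12                                # forecast horizon in months
level_fc <- predict(ses, n.ahead = h)  # flat forecast of the adjusted series

# Re-seasonalize with the most recent full period of the seasonal component
last_season <- as.numeric(tail(seasonal, frequency(co2)))
fc <- as.numeric(level_fc) + rep_len(last_season, h)

# Back-transform from the log scale
exp(fc)
```

The non-seasonal forecast is a constant, so all of the shape in the forecast comes from the repeated seasonal pattern.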

SLIDE 6

Discontinuity and “causal” inference

◮ Time series observed before and after an intervention
◮ If behavior changes dramatically, maybe it was because of the intervention
◮ Important to rule out other things happening at that time
◮ Example next slide: search popularity of “Star Wars” before and after Disney purchase announced

SLIDE 7

library(CausalImpact): developed at Google

    impact <- CausalImpact(as.numeric(data[,2]), pre.period, post.period)
    plot(impact)

[Figure: CausalImpact plot with panels original, pointwise, and cumulative.]
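The three panels (original, pointwise, cumulative) can be mimicked with a deliberately simple base R sketch: fit a model on the pre-intervention period, extrapolate it as a counterfactual, and compare it with what was observed. This is not what CausalImpact does internally (the package fits a Bayesian structural time series model); the linear trend, the simulated data, and the values of pre.period and post.period below are all assumptions for illustration.

```r
set.seed(1)
n <- 100
pre.period  <- c(1, 70)    # hypothetical: first and last index before the intervention
post.period <- c(71, 100)  # hypothetical: first and last index after the intervention

# Simulated series: linear trend plus noise, with a jump of 5 after the intervention
time <- seq_len(n)
y <- 20 + 0.1 * time + rnorm(n)
post <- post.period[1]:post.period[2]
pre  <- pre.period[1]:pre.period[2]
y[post] <- y[post] + 5

# Fit on the pre-period only, then predict what would have happened without the intervention
df <- data.frame(y = y, time = time)
model <- lm(y ~ time, data = df[pre, ])
counterfactual <- predict(model, newdata = df[post, ])

# Pointwise effect: observed minus counterfactual; cumulative effect: running sum
pointwise  <- y[post] - counterfactual
cumulative <- cumsum(pointwise)

mean(pointwise)   # should be near the simulated jump of 5
tail(cumulative, 1)
```

The estimate is only as good as the counterfactual model, which is why ruling out other changes at the time of the intervention matters.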

SLIDE 8

Stochastic processes

◮ $\{X_t\}_{t \ge t_0}$
◮ Collection of random variables indexed by time $t$, in practice discrete
◮ Most methods require stationarity: $(X_{t_1}, \dots, X_{t_k})$ has the same distribution as $(X_{t_1+h}, \dots, X_{t_k+h})$ for every shift $h$
◮ Transform data by taking logs or differences to get stationarity
◮ Many classes of models...
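As a small illustration of the log-and-difference transform, the sketch below simulates a geometric random walk, which is not stationary, and recovers its i.i.d. shocks by differencing the logs. The simulated data and the names shocks, x and log_returns are assumptions for illustration.

```r
set.seed(7)
n <- 200

# i.i.d. shocks: a stationary sequence
shocks <- rnorm(n, mean = 0.01, sd = 0.05)

# Geometric random walk: the level drifts over time, so x is not stationary
x <- 100 * exp(cumsum(shocks))

# log turns products into sums, diff removes the accumulated level
log_returns <- diff(log(x))

# The transformed series is exactly the original shocks (minus the first one)
all.equal(log_returns, shocks[-1])
```

Real data will not come out this cleanly, but the same two steps are a common first attempt before fitting a stationary model.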

SLIDE 9

Moving averages and autoregression

◮ MA(q) moving average: $X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$
◮ Random shock affects future values of X directly
◮ AR(p) autoregression: $X_t = c + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t$
◮ Random shock affects future values of X only through past values of X
◮ ARMA(p,q) autoregressive moving-average
◮ ARIMA...
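The difference between the two mechanisms shows up when a single unit shock is fed through each model. The sketch below does this for MA(1) and AR(1) with $\mu = c = 0$; the coefficient values and variable names are assumptions chosen for illustration.

```r
n <- 6
eps <- c(1, rep(0, n - 1))  # one unit shock at t = 1, no shocks afterwards

theta <- 0.5  # hypothetical MA(1) coefficient
phi   <- 0.5  # hypothetical AR(1) coefficient

ma <- numeric(n)
ar <- numeric(n)
for (t in seq_len(n)) {
  prev_eps <- if (t > 1) eps[t - 1] else 0
  prev_x   <- if (t > 1) ar[t - 1] else 0
  ma[t] <- eps[t] + theta * prev_eps  # X_t = eps_t + theta * eps_{t-1}
  ar[t] <- phi * prev_x + eps[t]      # X_t = phi * X_{t-1} + eps_t
}

ma  # the shock is gone after q = 1 further step
ar  # the shock decays geometrically as phi^k but never vanishes exactly
```

In the MA case the shock enters future values directly and only for q steps; in the AR case it persists because each value feeds into the next.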

SLIDE 10

Error terms

◮ ARCH (autoregressive conditional heteroskedasticity): variance of present error depends on observed past errors
◮ GARCH (generalized ARCH): also depends on variance of past errors
◮ e.g. ARIMA/GARCH together quite general (5 parameters)
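A minimal simulation sketch of a GARCH(1,1) error process, written in base R, may make the two bullets concrete: the conditional variance depends on the previous squared error (the ARCH part) and on the previous conditional variance (the GARCH part). The parameter values omega, alpha and beta are hypothetical, chosen so that alpha + beta < 1.

```r
set.seed(123)
n <- 500
omega <- 0.1  # hypothetical baseline variance term
alpha <- 0.2  # hypothetical weight on the previous squared error (ARCH part)
beta  <- 0.7  # hypothetical weight on the previous variance (GARCH part)

sigma2 <- numeric(n)  # conditional variances
e      <- numeric(n)  # error terms

# Start at the unconditional variance omega / (1 - alpha - beta)
sigma2[1] <- omega / (1 - alpha - beta)
e[1] <- sqrt(sigma2[1]) * rnorm(1)

for (t in 2:n) {
  sigma2[t] <- omega + alpha * e[t - 1]^2 + beta * sigma2[t - 1]
  e[t] <- sqrt(sigma2[t]) * rnorm(1)
}

summary(sigma2)  # the variance changes over time instead of staying constant
```

Setting beta to 0 in this sketch gives an ARCH(1) process, where only the past error matters.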

SLIDE 11

Further introductory reading

Very short reference
http://www.statmethods.net/advstats/timeseries.html

Short, easy tutorial, start in chapter 2
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf

Another similar tutorial (I prefer the one above)
https://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html

A Bayesian approach like “interrupted time series” (developed at Google)
http://www.r-bloggers.com/causalimpact-a-new-open-source-package-for-estimating-causal-

Hidden Markov models (application in genetics)
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter10.html