Forecasting Data Streams: Next Generation Flow Field Forecasting – PowerPoint PPT Presentation




SLIDE 1

Interface 2015 (June 10 – 13)

Kyle Caudle South Dakota School of Mines & Technology (SDSMT) kyle.caudle@sdsmt.edu

Joint work with Michael Frey (Bucknell University) and Patrick Fleming (SDSMT)

Forecasting Data Streams:

Next Generation Flow Field Forecasting

Research supported by the Naval Postgraduate School Assistance Grant N00244-15-1-0052

SLIDE 2

Outline

[1] Background
[2] Flow Field Forecasting Overview
[3] Strengths of Flow Field Forecasting
[4] Comparison Study with Traditional Methods
[5] Bivariate Forecasting
[6] Autonomous History Selection
[7] Other Forecasting Outputs
[8] Concluding Remarks

SLIDE 3

Background

  • Spring 2011 – The original concept arose from a need to predict network performance characteristics on the Energy Sciences Network (DoE). Design requirements:

– Long sequence of observations with observation times
– Predict future observations autonomously, with no human guidance
– Accept non-uniformly spaced observations
– Provide error estimates
– Fast/computationally efficient
– Able to exploit parallel data

SLIDE 4

Background (continued)

  • December 2011 – Poster session: "Introducing Flow Field Forecasting," 10th Annual International Conference on Machine Learning and Applications (ICMLA), Honolulu, HI
  • June 2012 – Introduced a method for continuously updating the forecast, 32nd Annual International Symposium on Forecasting (ISF), Boston, MA
  • August 2012 – Contributed session on forecasting, JSM 2012, San Diego, CA
  • May 2013 – "Flow Field Forecasting for Univariate Time Series," published in Statistical Analysis and Data Mining (SADM)
  • March 2014 – R package accepted and placed on the Comprehensive R Archive Network (CRAN); the package is called "flowfield"
  • January 2015 – Awarded a research assistance grant from the Naval Postgraduate School to research the next generation flow field software

SLIDE 5

FF Forecasting in 3 Easy Steps

  • Methodology

– A framework that makes associations between historical process levels and subsequent changes
– Extracts the "flow" from one level to the next

  • 3-Step Framework

1. Extract data histories (levels and subsequent changes)
2. Interpolate between observed levels in histories
3. Use the interpolator to predict the process forward, step by step, to the desired forecast horizon

Principle of FFF: past associations between history and change are predictive of changes associated with current histories/future changes

SLIDE 6

Step 1: Extract Histories

– Use penalized spline regression to build a skeleton of historical process levels and changes
– Extract relevant histories based on the application

[Figure: data stream (time series) → extract noise → PSR skeleton]
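Step 1 can be sketched as follows. This is an illustrative stand-in, not the deck's implementation: a generic smoothing spline replaces the penalized spline regression with Wand's asymptotic smoothing, and the signal, noise level, and smoothing parameter are all made up.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)

# Hypothetical noisy data stream: a smooth signal plus Gaussian noise
t = np.linspace(0.0, 10.0, 200)
z = np.sin(t) + rng.normal(scale=0.3, size=t.size)

# Smoothing spline as a stand-in for penalized spline regression;
# s (target residual sum of squares) controls the roughness penalty.
skeleton = UnivariateSpline(t, z, k=3, s=t.size * 0.3**2)

levels = skeleton(t)            # de-noised process levels (the "skeleton")
noise_sd = np.std(z - levels)   # estimate of the process noise

print(round(float(noise_sd), 2))
```

The skeleton gives both data reduction (a few spline coefficients instead of the raw stream) and a process-noise estimate, matching the strengths claimed later in the deck.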

SLIDE 7

History Extraction

Past histories h1 and h2 and associated changes d1 and d2 (Examples 1 and 2).

Principle of FFF: past associations between history and change are predictive of changes associated with current histories/future changes

SLIDE 8

Step 2: Interpolate the Flow Field

The current history may include values that have not been observed in the past. We use GPR to interpolate from observed values to unobserved values.
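A minimal sketch of this interpolation step, assuming scikit-learn's Gaussian process regression and made-up (level, change) pairs. The RBF kernel is the squared-exponential covariance, with `length_scale` standing in for the characteristic length Δ; the kernel is held fixed rather than ML-fitted to keep the analogy clean.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical historical (level, change) pairs extracted in step 1
levels = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
changes = np.array([0.10, 0.05, 0.00, -0.05, -0.10])

# Fix the kernel (optimizer=None) so length_scale plays the role of Δ
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                               alpha=1e-4, optimizer=None)
gpr.fit(levels, changes)

# Interpolate the expected change at a level never observed before
near, sd = gpr.predict(np.array([[0.75]]), return_std=True)

# Far from all observed histories, the zero-mean prior takes over and
# the predicted change collapses to ~0 (the conservative behavior)
far = gpr.predict(np.array([[50.0]]))
print(round(float(near[0]), 3), round(float(far[0]), 3))
```

The `far` prediction illustrates the "conservatively predicts no change" property claimed on the strengths slide.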

SLIDE 9

Step 3: Iteratively Build to the Future

[Figure legend: the slope, level, knot, and GPR-interpolated value at each forecast step]
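The iterative build-out can be sketched with a toy change interpolator (hypothetical; in the real method the interpolator comes from the step 2 GPR):

```python
import numpy as np

def forecast(interp_change, z0, steps):
    """Roll the process forward: at each step, look up the expected
    change at the current level and add it on (a sketch of step 3,
    assuming a change-per-step interpolator)."""
    path = [z0]
    for _ in range(steps):
        path.append(path[-1] + interp_change(path[-1]))
    return np.array(path)

# Toy interpolator: a mean-reverting flow toward level 1.0
path = forecast(lambda z: 0.2 * (1.0 - z), z0=0.0, steps=20)
print(round(float(path[-1]), 3))
```

Because each step only queries the interpolator at the current level, the horizon can be extended arbitrarily without refitting, which is what makes step 3 cheap.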

SLIDE 10

Strengths of FFF

  • The step I data skeleton achieves data reduction and standardization (and estimates the process noise)
  • Runs autonomously: no interactive supervision by a skilled analyst
  • Conservative: in situations where there is no information in the history space that corresponds to the current situation, it conservatively predicts no change
  • Computationally efficient: handles large data streams with limited computational resources

– Penalized spline regression is computationally efficient. To further increase its efficiency, we replace the standard numerical search for the optimal smoothing parameter with an asymptotic approximation [Wand, 1999]
– The step II Gaussian process regression and the step III extrapolation mechanism are also computationally efficient

SLIDE 11

Comparison Study

  • We compare FFF with Box-Jenkins ARIMA, exponential smoothing, and artificial neural networks
  • For ARIMA and exponential smoothing we use the R package "forecast" [Hyndman and Khandakar]
  • For artificial neural networks we use the R package "tsDyn" [A. Di Narzo, F. Di Narzo, J.L. Aznarte, and M. Stigler]

SLIDE 12

Simulated Time Series

  • Simulated data using a baseline data model of the form Z_j = S(t_j) + ζ_j, where ζ_j is Gaussian noise
  • N = 1500 uniformly spaced observation times t_i ∈ {1, 2, . . . , 1550} and σ = 0.4
  • For the systematically determined component S(t), we used realizations of a zero-mean, unit-variance stationary Gaussian process with squared-exponential covariance Cov(S(u), S(u′)) = l(u − u′) = exp(−(u − u′)² / (2Δ²))
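One realization of this baseline model can be sketched as below (a smaller N than the deck's 1500 keeps the covariance matrix manageable; the jitter term is a standard numerical fix, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

n, delta, sigma = 300, 50.0, 0.4
t = np.arange(n, dtype=float)

# Squared-exponential covariance: Cov(S(u), S(u')) = exp(-(u-u')^2 / (2Δ^2))
d = t[:, None] - t[None, :]
K = np.exp(-d**2 / (2.0 * delta**2))

# Z_j = S(t_j) + ζ_j: one GP realization plus Gaussian noise
# (tiny diagonal jitter keeps the covariance numerically PSD)
S = rng.multivariate_normal(np.zeros(n), K + 1e-8 * np.eye(n))
Z = S + rng.normal(scale=sigma, size=n)

print(Z.shape)
```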

SLIDE 13

Comparison 1

  • For our first comparison, we generated 1000 time series realizations (3 pictured)
  • Each time series was 1550 observations (mean zero, σ = 0.4)
  • 1500 observations were used to build the model and 50 observations were used for testing
  • Mean forecast error was computed for each method
  • This model expresses short-term 'noise' and longer-term, non-Markovian dynamics
  • Models such as this might plausibly be encountered in real data sets
  • Characteristic length, Δ = 50
SLIDE 14

Comparison 1: Results

  • FF was very competitive with the other traditional methods
  • The artificial NN was marginally worse and took 4 times longer

SLIDE 15

Comparison 2

  • For our second comparison, we generated 1000 time series realizations (3 pictured)
  • Variant data model with a recurring distinctive history
  • The characteristic length is Δ = 500 in the time interval [500, 600] and then again beginning at time 1490; elsewhere, Δ = 50

SLIDE 16

Comparison 2: Results

  • Short-range forecasts are competitive
  • At long range, FF wins decisively

SLIDE 17

Comparison 3

  • Irregularly spaced intervals
  • Most traditional forecasting methods rely on time series data collected at regular intervals
  • FF forecasting is not handicapped by this restriction
  • Demonstration 3 compares FF forecasting to itself

SLIDE 18

Demonstration 3

  • We compute 2 time series from the baseline model used in demonstration 1
  • The first time series uses uniformly spaced observations
  • The second series uses non-uniformly spaced observation times; times are drawn from a Poisson process, yielding time spacings between observations that are exponentially distributed
SLIDE 19

Demonstration 3: Results

  • This demonstration highlights a unique capability of flow field forecasting: it accepts non-uniformly spaced time series
  • Flow field forecasting can do this with almost no loss of forecast accuracy

SLIDE 20

Next Generation Software Goals

  • Move from a univariate data stream to multivariate

– For bivariate forecasting we compute 2 separate PSRs
– Next we forecast both a change in the x-direction and a change in the y-direction

  • Autonomous selection of history structure

SLIDE 21

Closest Point Approach (CPA)

  • Recall the FFF guiding principle: past associations between history and change are predictive of changes associated with current histories/future changes
  • For CPA we need to find which prior history matches the current history most closely
  • Speed bumps

– Sampling rate vs. data stream change rate(s)
– Number of lags to include in the history structure
– Appropriate distance measure in a high-dimensional space
– Characteristic length for the GPR interpolator (if used)

SLIDE 22

CPA Algorithm

  • Suppose there are p candidate predictor values for the history (e.g., x_t, y_t, x_{t−1}, y_{t−1}, Δx(t), Δy(t), ...)
  • The p candidate predictors give us 2^p − 1 possible history structures (the non-empty subsets of the power set)
  • Create a distance table by computing the distance between the current point and all historical points for a given history structure
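Enumerating the 2^p − 1 candidate history structures is a short loop over the non-empty subsets; the predictor names below are hypothetical:

```python
from itertools import chain, combinations

# Hypothetical candidate predictors for a bivariate history
predictors = ["x_t", "y_t", "x_t1", "y_t1"]

def history_structures(names):
    """All non-empty subsets of the candidate predictors:
    the 2^p - 1 history structures that CPA scores."""
    return list(chain.from_iterable(
        combinations(names, r) for r in range(1, len(names) + 1)))

structures = history_structures(predictors)
print(len(structures))  # 2^4 - 1 = 15
```

The exponential growth in p is why the deck later worries about sampling rate and the number of lags to include.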

SLIDE 23

CPA Algorithm (continued)

  • Create the following distance table
  • Entry (i, k) is the distance ‖D − Q_i^(k)‖ from historical point i to the current point (C) under history structure H_k

        H_1   H_2   ...   H_k   ...   H_{2^p−1}
  P_1
  P_2
   :
  P_i               ‖D − Q_i^(k)‖
   :

SLIDE 24

CPA Algorithm (continued)

  • For each column in the table, determine the minimum distance value: Q_k* = argmin_i ‖D − Q_i^(k)‖
  • Standardize this value by subtracting the column mean and dividing by the column standard deviation:
R_k = (‖D − Q_k*‖ − mean_i ‖D − Q_i^(k)‖) / sd_i ‖D − Q_i^(k)‖
  • Determine the minimum of R_k over all history structures
  • The minimizing R_k gives us the closest point as well as the history structure that produced it
  • Use the closest point to forecast the next (x, y)
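The table construction and column-wise standardization above can be sketched with a random table (hypothetical data; in practice entry (i, k) would be a distance between the current and a historical history):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical distance table: entry (i, k) is the distance from
# historical point i to the current point under history structure k.
D = rng.uniform(1.0, 10.0, size=(50, 7))

# Minimum distance within each structure, standardized by the column
# mean and standard deviation so structures of different dimension
# become comparable.
col_min = D.min(axis=0)
R = (col_min - D.mean(axis=0)) / D.std(axis=0)

best_structure = int(np.argmin(R))                  # winning history structure
best_point = int(np.argmin(D[:, best_structure]))   # its closest point
print(best_structure, best_point)
```

Standardizing per column is the whole trick: raw distances from spaces of different dimension are not directly comparable, but their column z-scores are.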

SLIDE 25

Additive Penalty

  • The CPA algorithm is statistically equivalent to adding a penalty to the distance when comparing history structures of different dimensions
  • Suppose we compare a history structure of dimension l to one of dimension k. Let
E_l = ‖D − Q_l*‖ / sd_i ‖D − Q_i^(l)‖ and E_k = ‖D − Q_k*‖ / sd_i ‖D − Q_i^(k)‖
  • Check whether E_k + Π_kl < E_l, where
Π_kl = mean_i ‖D − Q_i^(l)‖ / sd_i ‖D − Q_i^(l)‖ − mean_i ‖D − Q_i^(k)‖ / sd_i ‖D − Q_i^(k)‖
  • This inequality is exactly R_k < R_l: the difference of mean-to-standard-deviation ratios acts as an additive penalty on the scaled minimum distances

SLIDE 26

CPA Demonstrations

  • We forecast a periodic data stream using the parametric model
x(t) = t + 0.5 cos(3t) + N(0, τ²)
y(t) = t + 3 sin(t) + N(0, τ²)
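Simulating this demonstration stream is straightforward (the noise scale τ is not given on the slide, so the value below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 0.1  # arbitrary: the slide does not specify τ

# The deck's periodic bivariate demonstration model
t = np.linspace(0.0, 10.0, 500)
x = t + 0.5 * np.cos(3.0 * t) + rng.normal(scale=tau, size=t.size)
y = t + 3.0 * np.sin(t) + rng.normal(scale=tau, size=t.size)

print(x.shape, y.shape)
```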

SLIDE 27

Mean Flow Certainty Approach (MFCA)

  • The MFC expresses, through the variance, an estimate of how accurately the forecast path is reflected in the history space
  • The MFC is a value between 0 and 1; the closer it is to 1, the more accurately the history space matches the forecast path
  • The MFC is analogous to R² in linear regression

SLIDE 28

MFCA Algorithm

  • Create a large set of all potential predictors, as was done with CPA
  • Hold out the last 5 data stream values as a test set
  • Perform GPR on all possible subsets of these predictors using all but the last 5 data stream values

SLIDE 29

MFCA Algorithm (continued)

  • Calculate the mean prediction error (MPE) on the held-out values and the average mean flow certainty (MFC)
  • Calculate the prediction strength: PS = MFC × exp(−MPE)
  • Choose the history structure (i.e., subset of predictors) that gives the value of PS closest to 1
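The selection rule can be sketched directly; the candidate structures and their (MFC, MPE) scores below are made up:

```python
import math

def prediction_strength(mfc, mpe):
    """PS = MFC * exp(-MPE). Since MFC <= 1 and exp(-MPE) <= 1,
    PS <= 1, so maximizing PS is the same as getting closest to 1."""
    return mfc * math.exp(-mpe)

# Hypothetical candidate history structures with their (MFC, MPE) scores
candidates = {
    ("x_t",):       (0.70, 0.40),
    ("x_t", "y_t"): (0.90, 0.10),
    ("y_t",):       (0.95, 0.80),
}

best = max(candidates, key=lambda h: prediction_strength(*candidates[h]))
print(best)
```

Note how PS penalizes both failure modes: a structure with high certainty but large held-out error (third row) loses to one that balances the two.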

SLIDE 30

Issues/Concerns

  • CPA works great if the algorithm picks the correct point

– Occasionally, due to additional factors (e.g., sampling rate, data stream changes), the incorrect point is chosen
– An incorrectly chosen "closest" point results in a poor forecast

  • MFCA requires the correct choice of a characteristic length (Δ); the correct choice of Δ balances the bias-variance tradeoff
  • Both algorithms require selecting the appropriate history depth (i.e., number of lags)

SLIDE 31

Hybrid Approach

  • It is our belief that the correct algorithm will most likely be a combination of the two methods
  • We would pick some subset of closest points, potentially 5, using CPA, and then perform a localized GPR on only those 5 points, using MFCA to determine the winner
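A sketch of the proposed hybrid (hypothetical distances and scores; the localized-GPR/MFCA score is stubbed out with random numbers, since the deck leaves it as future work):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical distances from each historical point to the current point
dist = rng.uniform(0.0, 5.0, size=100)
# Stand-in for a localized MFCA-style score per historical point
scores = rng.uniform(0.0, 1.0, size=100)

# Hybrid sketch: CPA shortlists the 5 closest points, then the
# MFCA-style score picks the winner among only those 5.
shortlist = np.argsort(dist)[:5]
winner = int(shortlist[np.argmax(scores[shortlist])])
print(winner)
```

Restricting the expensive MFCA scoring to a CPA shortlist is what keeps the combination computationally tractable.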

SLIDE 32

Future Work

  • Investigate the hybrid approach thoroughly
  • Look into R-trees as a way to organize the history structure searches
  • Look into an innovative way to calculate the characteristic length
  • Given a data stream, can we determine a priori whether our method will provide a reasonable forecast? This may be accomplished by looking for a clustering of histories
  • Investigate the effect of data sampling rate and the appropriate number of lags in our potential set of history predictors

SLIDE 33

Concluding Remarks

  • A novel, computationally efficient method for forecasting a bivariate time series
  • Results are generalizable to multivariate data streams
  • Created a new proximity measure for comparing spaces of different dimensions
  • Results could be used to improve univariate forecasting methods
  • Instead of predicting slope, we could predict acceleration or potential energy

SLIDE 34

“Those who have knowledge, don't predict. Those who predict, don't have knowledge.”
– Lao Tzu, 6th-century BC Chinese poet

Questions?

SLIDE 35

Backup Slides

SLIDE 36

Different Forecasting Methods (Flow FF)

  • Flow field forecasting works by estimating the "flow field" or slope field. Essentially, we use GPR to predict (i.e., interpolate) the forward slope and use this to predict the next location
  • A conservative feature of GPR is that, when interpolating the slope, if there is no information in the past that is "close" to the most recent history, it conservatively predicts no change, or zero slope

SLIDE 37

Different Forecasting Methods (Force FF)

  • When forecasting a bivariate data stream, predicting zero change in the slope may not accurately reflect the physics of the situation
  • When forecasting in 2 dimensions, the conservative prediction might be no change in velocity
  • Force ∝ acceleration (assuming constant mass)
  • Using GPR to predict no change in acceleration results in constant velocity

SLIDE 38

Potential Energy Forecasting

  • Use force field forecasting to create an estimated "force field" (Ĝ_y, Ĝ_z)
  • A force field (G_y, G_z) that has an associated potential energy W(y, z) is said to be conservative
  • From (Ĝ_y, Ĝ_z) we create an estimate Ŵ(y, z) of the potential energy
  • Using the estimated potential energy, we calculate consistent estimates (G̃_y, G̃_z) of the force field components

SLIDE 39

Potential Energy Forecasting (continued)

  • G̃_y(y, z) = −ΔŴ(y, z)/Δy and G̃_z(y, z) = −ΔŴ(y, z)/Δz
  • We can then check for conservatism by looking at the distances ‖Ĝ_y(y, z) − G̃_y(y, z)‖ and ‖Ĝ_z(y, z) − G̃_z(y, z)‖
  • We estimate the next y and z increments on our path by Δy = (ẏ_d + G̃_y(y_d, z_d)Δu)Δu and Δz = (ż_d + G̃_z(y_d, z_d)Δu)Δu
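The finite-difference recovery of the force field from an estimated potential can be sketched with `numpy.gradient` on a made-up quadratic potential, for which the exact field is (−y, −z):

```python
import numpy as np

# Hypothetical potential-energy surface W(y, z) = (y^2 + z^2) / 2 on a grid
y = np.linspace(-2.0, 2.0, 81)
z = np.linspace(-2.0, 2.0, 81)
Y, Z = np.meshgrid(y, z, indexing="ij")
W = 0.5 * (Y**2 + Z**2)

# Force-field components from the potential by finite differences:
# G_y = -ΔW/Δy (axis 0), G_z = -ΔW/Δz (axis 1)
Gy, Gz = np.gradient(-W, y, z)

# At the center of the grid (y = z = 0) the exact field vanishes
print(float(Gy[40, 40]), float(Gz[40, 40]))
```

For a quadratic potential the interior central differences reproduce the exact field, so the consistency check between the directly estimated and potential-derived components would pass trivially here.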