Forecasting Data Streams: Next Generation Flow Field Forecasting – PowerPoint PPT Presentation




SLIDE 1

Interface 2015 (June 10 – 13)

Kyle Caudle South Dakota School of Mines & Technology (SDSMT) kyle.caudle@sdsmt.edu

Joint work with Michael Frey (Bucknell University) and Patrick Fleming (SDSMT)

Forecasting Data Streams:

Next Generation Flow Field Forecasting

Research supported by the Naval Postgraduate School Assistance Grant N00244-15-1-0052

SLIDE 2

Outline

[1] Background
[2] Flow Field Forecasting Overview
[3] Strengths of Flow Field Forecasting
[4] Comparison Study with Traditional Methods
[5] Bivariate Forecasting
[6] Autonomous History Selection
[7] Other Forecasting Outputs
[8] Concluding Remarks

SLIDE 3

Background

  • Spring 2011 – The original concept arose from a need to predict network performance characteristics on the Energy Sciences Network (DoE). Design requirements:

– Long sequence of observations with observation times
– Predict future observations autonomously, with no human guidance
– Accept non-uniformly spaced observations
– Provide error estimates
– Fast/computationally efficient
– Able to exploit parallel data

SLIDE 4

Background (continued)

  • December 2011 – Poster session: "Introducing Flow Field Forecasting," 10th Annual International Conference on Machine Learning and Applications (ICMLA), Honolulu, HI
  • June 2012 – Introduced a method for continuously updating the forecast, 32nd Annual International Symposium on Forecasting (ISF), Boston, MA
  • August 2012 – Contributed session on forecasting, JSM 2012, San Diego, CA
  • May 2013 – "Flow Field Forecasting for Univariate Time Series," published in Statistical Analysis and Data Mining (SADM)
  • March 2014 – R package accepted and placed on the Comprehensive R Archive Network (CRAN); the package is called "flowfield"
  • January 2015 – Awarded a research assistance grant from the Naval Postgraduate School to research the next generation flow field software

SLIDE 5

FF Forecasting in 3 Easy Steps

  • Methodology

– A framework that makes associations between historical process levels and subsequent changes
– Extracts the "flow" from one level to the next

  • 3-Step Framework

1. Extract data histories (levels and subsequent changes)
2. Interpolate between observed levels in histories
3. Use the interpolator to predict the process forward, step by step, to the desired forecast horizon

Principle of FFF: past associations between history and change are predictive of changes associated with current histories/future changes

SLIDE 6

Step 1: Extract Histories

– Use penalized spline regression to build a skeleton of historical process levels and changes
– Extract relevant histories based on the application

[Figure: data stream (time series) → extract noise → PSR skeleton]
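Step 1 can be sketched as follows. This is an illustrative stand-in, not the deck's implementation: a generic smoothing spline replaces the penalized spline regression with Wand's asymptotic smoothing, and the signal, noise level, and smoothing parameter are all made up.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)

# Hypothetical noisy data stream: a smooth signal plus Gaussian noise
t = np.linspace(0.0, 10.0, 200)
z = np.sin(t) + rng.normal(scale=0.3, size=t.size)

# Smoothing spline as a stand-in for penalized spline regression;
# s (target residual sum of squares) controls the roughness penalty.
skeleton = UnivariateSpline(t, z, k=3, s=t.size * 0.3**2)

levels = skeleton(t)            # de-noised process levels (the "skeleton")
noise_sd = np.std(z - levels)   # estimate of the process noise

print(round(float(noise_sd), 2))
```

The skeleton gives both data reduction (a few spline coefficients instead of the raw stream) and a process-noise estimate, matching the strengths claimed later in the deck.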

SLIDE 7

History Extraction

Past histories h1 and h2 and associated changes d1 and d2 (Examples 1 and 2).

Principle of FFF: past associations between history and change are predictive of changes associated with current histories/future changes

SLIDE 8

Step 2: Interpolate the Flow Field

The current history may include values that have not been observed in the past. We use GPR to interpolate from observed values to unobserved values.
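A minimal sketch of this interpolation step, assuming scikit-learn's Gaussian process regression and made-up (level, change) pairs. The RBF kernel is the squared-exponential covariance, with `length_scale` standing in for the characteristic length Δ; the kernel is held fixed rather than ML-fitted to keep the analogy clean.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical historical (level, change) pairs extracted in step 1
levels = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
changes = np.array([0.10, 0.05, 0.00, -0.05, -0.10])

# Fix the kernel (optimizer=None) so length_scale plays the role of Δ
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                               alpha=1e-4, optimizer=None)
gpr.fit(levels, changes)

# Interpolate the expected change at a level never observed before
near, sd = gpr.predict(np.array([[0.75]]), return_std=True)

# Far from all observed histories, the zero-mean prior takes over and
# the predicted change collapses to ~0 (the conservative behavior)
far = gpr.predict(np.array([[50.0]]))
print(round(float(near[0]), 3), round(float(far[0]), 3))
```

The `far` prediction illustrates the "conservatively predicts no change" property claimed on the strengths slide.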

SLIDE 9

Step 3: Iteratively Build to the Future

[Figure legend: the slope, level, knot, and GPR-interpolated value at each forecast step]
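The iterative build-out can be sketched with a toy change interpolator (hypothetical; in the real method the interpolator comes from the step 2 GPR):

```python
import numpy as np

def forecast(interp_change, z0, steps):
    """Roll the process forward: at each step, look up the expected
    change at the current level and add it on (a sketch of step 3,
    assuming a change-per-step interpolator)."""
    path = [z0]
    for _ in range(steps):
        path.append(path[-1] + interp_change(path[-1]))
    return np.array(path)

# Toy interpolator: a mean-reverting flow toward level 1.0
path = forecast(lambda z: 0.2 * (1.0 - z), z0=0.0, steps=20)
print(round(float(path[-1]), 3))
```

Because each step only queries the interpolator at the current level, the horizon can be extended arbitrarily without refitting, which is what makes step 3 cheap.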

SLIDE 10

Strengths of FFF

  • The step I data skeleton achieves data reduction and standardization (and estimates the process noise)
  • Runs autonomously: no interactive supervision by a skilled analyst
  • Conservative: in situations where there is no information in the history space that corresponds to the current situation, it conservatively predicts no change
  • Computationally efficient: handles large data streams with limited computational resources

– Penalized spline regression is computationally efficient. To further increase its efficiency, we replace the standard numerical search for the optimal smoothing parameter with an asymptotic approximation [Wand, 1999]
– The step II Gaussian process regression and the step III extrapolation mechanism are also computationally efficient

SLIDE 11

Comparison Study

  • We compare FFF with Box-Jenkins ARIMA, exponential smoothing, and artificial neural networks
  • For ARIMA and exponential smoothing we use the R package "forecast" [Hyndman and Khandakar]
  • For artificial neural networks we use the R package "tsDyn" [A. Di Narzo, F. Di Narzo, J.L. Aznarte, and M. Stigler]

SLIDE 12

Simulated Time Series

  • Simulated data using a baseline data model of the form Z_j = S(t_j) + ζ_j, where ζ_j is Gaussian noise
  • N = 1500 uniformly spaced observation times t_i ∈ {1, 2, . . . , 1550} and σ = 0.4
  • For the systematically determined component S(t), we used realizations of a zero-mean, unit-variance stationary Gaussian process with squared-exponential covariance Cov(S(u), S(u′)) = l(u − u′) = exp(−(u − u′)² / (2Δ²))
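One realization of this baseline model can be sketched as below (a smaller N than the deck's 1500 keeps the covariance matrix manageable; the jitter term is a standard numerical fix, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

n, delta, sigma = 300, 50.0, 0.4
t = np.arange(n, dtype=float)

# Squared-exponential covariance: Cov(S(u), S(u')) = exp(-(u-u')^2 / (2Δ^2))
d = t[:, None] - t[None, :]
K = np.exp(-d**2 / (2.0 * delta**2))

# Z_j = S(t_j) + ζ_j: one GP realization plus Gaussian noise
# (tiny diagonal jitter keeps the covariance numerically PSD)
S = rng.multivariate_normal(np.zeros(n), K + 1e-8 * np.eye(n))
Z = S + rng.normal(scale=sigma, size=n)

print(Z.shape)
```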

SLIDE 13

Comparison 1

  • For our first comparison, we generated 1000 time series realizations (3 pictured)
  • Each time series was 1550 observations (mean zero, σ = 0.4)
  • 1500 observations were used to build the model and 50 observations were used for testing
  • Mean forecast error was computed for each method
  • This model expresses short-term 'noise' and longer-term, non-Markovian dynamics
  • Models such as this might plausibly be encountered in real data sets
  • Characteristic length, Δ = 50
SLIDE 14

Comparison 1: Results

  • FF was very competitive with the other traditional methods
  • The artificial NN was marginally worse and took 4 times longer

SLIDE 15

Comparison 2

  • For our second comparison, we generated 1000 time series realizations (3 pictured)
  • Variant data model with a recurring distinctive history
  • The characteristic length is Δ = 500 in the time interval [500, 600] and then again beginning at time 1490; elsewhere, Δ = 50

SLIDE 16

Comparison 2: Results

  • Short-range forecasts are competitive
  • At long range, FF wins decisively

SLIDE 17

Comparison 3

  • Irregularly spaced intervals
  • Most traditional forecasting methods rely on time series data collected at regular intervals
  • FF forecasting is not handicapped by this restriction
  • Demonstration 3 compares FF forecasting to itself

SLIDE 18

Demonstration 3

  • We compute 2 time series from the baseline model used in demonstration 1
  • The first time series uses uniformly spaced observations
  • The second series uses non-uniformly spaced observation times; times are drawn from a Poisson process, yielding time spacings between observations that are exponentially distributed
SLIDE 19

Demonstration 3: Results

  • This demonstration highlights a unique capability of flow field forecasting: it accepts non-uniformly spaced time series
  • Flow field forecasting can do this with almost no loss of forecast accuracy

SLIDE 20

Next Generation Software Goals

  • Move from a univariate data stream to multivariate

– For bivariate forecasting we compute 2 separate PSRs
– Next we forecast both a change in the x-direction and a change in the y-direction

  • Autonomous selection of history structure

SLIDE 21

Closest Point Approach (CPA)

  • Recall the FFF guiding principle: past associations between history and change are predictive of changes associated with current histories/future changes
  • For CPA we need to find which prior history matches the current history most closely
  • Speed bumps

– Sampling rate vs. data stream change rate(s)
– Number of lags to include in the history structure
– Appropriate distance measure in a high-dimensional space
– Characteristic length for the GPR interpolator (if used)

SLIDE 22

CPA Algorithm

  • Suppose there are p candidate predictor values for the history (e.g., x_t, y_t, x_{t−1}, y_{t−1}, Δx(t), Δy(t), ...)
  • The p candidate predictors give us 2^p − 1 possible history structures (the non-empty subsets of the power set)
  • Create a distance table by computing the distance between the current point and all historical points for a given history structure
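Enumerating the 2^p − 1 candidate history structures is a short loop over the non-empty subsets; the predictor names below are hypothetical:

```python
from itertools import chain, combinations

# Hypothetical candidate predictors for a bivariate history
predictors = ["x_t", "y_t", "x_t1", "y_t1"]

def history_structures(names):
    """All non-empty subsets of the candidate predictors:
    the 2^p - 1 history structures that CPA scores."""
    return list(chain.from_iterable(
        combinations(names, r) for r in range(1, len(names) + 1)))

structures = history_structures(predictors)
print(len(structures))  # 2^4 - 1 = 15
```

The exponential growth in p is why the deck later worries about sampling rate and the number of lags to include.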

SLIDE 23

CPA Algorithm (continued)

  • Create the following distance table
  • Entry (i, k) is the distance ‖D − Q_i^(k)‖ from historical point i to the current point (C) under history structure H_k

        H_1   H_2   ...   H_k   ...   H_{2^p−1}
  P_1
  P_2
   :
  P_i               ‖D − Q_i^(k)‖
   :

SLIDE 24

CPA Algorithm (continued)

  • For each column in the table, determine the minimum distance value: Q_k* = argmin_i ‖D − Q_i^(k)‖
  • Standardize this value by subtracting the column mean and dividing by the column standard deviation:
R_k = (‖D − Q_k*‖ − mean_i ‖D − Q_i^(k)‖) / sd_i ‖D − Q_i^(k)‖
  • Determine the minimum of R_k over all history structures
  • The minimizing R_k gives us the closest point as well as the history structure that produced it
  • Use the closest point to forecast the next (x, y)
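The table construction and column-wise standardization above can be sketched with a random table (hypothetical data; in practice entry (i, k) would be a distance between the current and a historical history):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical distance table: entry (i, k) is the distance from
# historical point i to the current point under history structure k.
D = rng.uniform(1.0, 10.0, size=(50, 7))

# Minimum distance within each structure, standardized by the column
# mean and standard deviation so structures of different dimension
# become comparable.
col_min = D.min(axis=0)
R = (col_min - D.mean(axis=0)) / D.std(axis=0)

best_structure = int(np.argmin(R))                  # winning history structure
best_point = int(np.argmin(D[:, best_structure]))   # its closest point
print(best_structure, best_point)
```

Standardizing per column is the whole trick: raw distances from spaces of different dimension are not directly comparable, but their column z-scores are.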

SLIDE 25

Additive Penalty

  • The CPA algorithm is statistically equivalent to adding a penalty to the distance when comparing history structures of different dimensions
  • Suppose we compare a history structure of dimension l to one of dimension k. Let
E_l = ‖D − Q_l*‖ / sd_i ‖D − Q_i^(l)‖ and E_k = ‖D − Q_k*‖ / sd_i ‖D − Q_i^(k)‖
  • Check whether E_k + Π_kl < E_l, where
Π_kl = mean_i ‖D − Q_i^(l)‖ / sd_i ‖D − Q_i^(l)‖ − mean_i ‖D − Q_i^(k)‖ / sd_i ‖D − Q_i^(k)‖
  • This inequality is exactly R_k < R_l: the difference of mean-to-standard-deviation ratios acts as an additive penalty on the scaled minimum distances

SLIDE 26

CPA Demonstrations

  • We forecast a periodic data stream using the parametric model
x(t) = t + 0.5 cos(3t) + N(0, τ²)
y(t) = t + 3 sin(t) + N(0, τ²)
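Simulating this demonstration stream is straightforward (the noise scale τ is not given on the slide, so the value below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 0.1  # arbitrary: the slide does not specify τ

# The deck's periodic bivariate demonstration model
t = np.linspace(0.0, 10.0, 500)
x = t + 0.5 * np.cos(3.0 * t) + rng.normal(scale=tau, size=t.size)
y = t + 3.0 * np.sin(t) + rng.normal(scale=tau, size=t.size)

print(x.shape, y.shape)
```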

SLIDE 27

Mean Flow Certainty Approach (MFCA)

  • The MFC expresses, through the variance, an estimate of how accurately the forecast path is reflected in the history space
  • The MFC is a value between 0 and 1; the closer it is to 1, the more accurately the history space matches the forecast path
  • The MFC is analogous to R² in linear regression

SLIDE 28

MFCA Algorithm

  • Create a large set of all potential predictors, as was done with CPA
  • Hold out the last 5 data stream values as a test set
  • Perform GPR on all possible subsets of these predictors using all but the last 5 data stream values

SLIDE 29

MFCA Algorithm (continued)

  • Calculate the mean prediction error (MPE) on the held-out values and the average mean flow certainty (MFC)
  • Calculate the prediction strength: PS = MFC × exp(−MPE)
  • Choose the history structure (i.e., subset of predictors) that gives the value of PS closest to 1
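The selection rule can be sketched directly; the candidate structures and their (MFC, MPE) scores below are made up:

```python
import math

def prediction_strength(mfc, mpe):
    """PS = MFC * exp(-MPE). Since MFC <= 1 and exp(-MPE) <= 1,
    PS <= 1, so maximizing PS is the same as getting closest to 1."""
    return mfc * math.exp(-mpe)

# Hypothetical candidate history structures with their (MFC, MPE) scores
candidates = {
    ("x_t",):       (0.70, 0.40),
    ("x_t", "y_t"): (0.90, 0.10),
    ("y_t",):       (0.95, 0.80),
}

best = max(candidates, key=lambda h: prediction_strength(*candidates[h]))
print(best)
```

Note how PS penalizes both failure modes: a structure with high certainty but large held-out error (third row) loses to one that balances the two.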

SLIDE 30

Issues/Concerns

  • CPA works great if the algorithm picks the correct point

– Occasionally, due to additional factors (e.g., sampling rate, data stream changes), the incorrect point is chosen
– An incorrectly chosen "closest" point results in a poor forecast

  • MFCA requires the correct choice of a characteristic length (Δ); the correct choice of Δ balances the bias-variance tradeoff
  • Both algorithms require selecting the appropriate history depth (i.e., number of lags)

SLIDE 31

Hybrid Approach

  • It is our belief that the correct algorithm will most likely be a combination of the two methods
  • We would pick some subset of closest points, potentially 5, using CPA, and then perform a localized GPR on only those 5 points, using MFCA to determine the winner
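A sketch of the proposed hybrid (hypothetical distances and scores; the localized-GPR/MFCA score is stubbed out with random numbers, since the deck leaves it as future work):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical distances from each historical point to the current point
dist = rng.uniform(0.0, 5.0, size=100)
# Stand-in for a localized MFCA-style score per historical point
scores = rng.uniform(0.0, 1.0, size=100)

# Hybrid sketch: CPA shortlists the 5 closest points, then the
# MFCA-style score picks the winner among only those 5.
shortlist = np.argsort(dist)[:5]
winner = int(shortlist[np.argmax(scores[shortlist])])
print(winner)
```

Restricting the expensive MFCA scoring to a CPA shortlist is what keeps the combination computationally tractable.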

SLIDE 32

Future Work

  • Investigate the hybrid approach thoroughly
  • Look into R-trees as a way to organize the history structure searches
  • Look into an innovative way to calculate the characteristic length
  • Given a data stream, can we determine a priori whether our method will provide a reasonable forecast? This may be accomplished by looking for a clustering of histories
  • Investigate the effect of data sampling rate and the appropriate number of lags in our potential set of history predictors

SLIDE 33

Concluding Remarks

  • A novel, computationally efficient method for forecasting a bivariate time series
  • Results are generalizable to multivariate data streams
  • Created a new proximity measure for comparing spaces of different dimensions
  • Results could be used to improve univariate forecasting methods
  • Instead of predicting slope, we could predict acceleration or potential energy

SLIDE 34

“Those who have knowledge, don't predict. Those who predict, don't have knowledge.”
– Lao Tzu, 6th-century BC Chinese poet

Questions?

SLIDE 35

Backup Slides

SLIDE 36

Different Forecasting Methods (Flow FF)

  • Flow field forecasting works by estimating the "flow field" or slope field. Essentially, we use GPR to predict (i.e., interpolate) the forward slope and use this to predict the next location
  • A conservative feature of GPR is that, when interpolating the slope, if there is no information in the past that is "close" to the most recent history, it conservatively predicts no change, or zero slope

SLIDE 37

Different Forecasting Methods (Force FF)

  • When forecasting a bivariate data stream, predicting zero change in the slope may not accurately reflect the physics of the situation
  • When forecasting in 2 dimensions, the conservative prediction might be no change in velocity
  • Force ∝ acceleration (assuming constant mass)
  • Using GPR to predict no change in acceleration results in constant velocity

SLIDE 38

Potential Energy Forecasting

  • Use force field forecasting to create an estimated "force field" (Ĝ_y, Ĝ_z)
  • A force field (G_y, G_z) that has an associated potential energy W(y, z) is said to be conservative
  • From (Ĝ_y, Ĝ_z) we create an estimate Ŵ(y, z) of the potential energy
  • Using the estimated potential energy, we calculate consistent estimates (G̃_y, G̃_z) of the force field components

SLIDE 39

Potential Energy Forecasting (continued)

  • G̃_y(y, z) = −ΔŴ(y, z)/Δy and G̃_z(y, z) = −ΔŴ(y, z)/Δz
  • We can then check for conservatism by looking at the distances ‖Ĝ_y(y, z) − G̃_y(y, z)‖ and ‖Ĝ_z(y, z) − G̃_z(y, z)‖
  • We estimate the next y and z increments on our path by Δy = (ẏ_d + G̃_y(y_d, z_d)Δu)Δu and Δz = (ż_d + G̃_z(y_d, z_d)Δu)Δu
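The finite-difference recovery of the force field from an estimated potential can be sketched with `numpy.gradient` on a made-up quadratic potential, for which the exact field is (−y, −z):

```python
import numpy as np

# Hypothetical potential-energy surface W(y, z) = (y^2 + z^2) / 2 on a grid
y = np.linspace(-2.0, 2.0, 81)
z = np.linspace(-2.0, 2.0, 81)
Y, Z = np.meshgrid(y, z, indexing="ij")
W = 0.5 * (Y**2 + Z**2)

# Force-field components from the potential by finite differences:
# G_y = -ΔW/Δy (axis 0), G_z = -ΔW/Δz (axis 1)
Gy, Gz = np.gradient(-W, y, z)

# At the center of the grid (y = z = 0) the exact field vanishes
print(float(Gy[40, 40]), float(Gz[40, 40]))
```

For a quadratic potential the interior central differences reproduce the exact field, so the consistency check between the directly estimated and potential-derived components would pass trivially here.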