Benchmarking (State-of-the-Art) Univariate Time Series Classifiers

Patrick Schäfer and Ulf Leser
Humboldt-Universität zu Berlin, Wissensmanagement in der Bioinformatik
patrick.schaefer@hu-berlin.de
BTW 2017, 08.03.2017

✤ Time series (TS) result from recording data over time.
✤ TS are increasingly popular because automatic sensors produce a growing flood of large, high-resolution TS.
✤ Application areas: motion sensors, personalized medicine (ECG/EEG signals), machine surveillance, spectrograms, astronomy (starlight curves), and image outlines (contours of objects).

✤ The UCR time series archive contains 85 benchmark datasets used in TS research.
✤ The datasets cover a whole range of applications, grouped into: synthetic, motion sensors, sensor readings, and image outlines.
✤ Overall, there are 50,000 train and 100,000 test TS, or 55 million values.
✤ A single dataset contains at most thousands of TS, each with at most thousands of measured values.

✤ At the same time, real-time systems emerge: billions of measurements for thousands of sensors.
✤ Long-term human intracranial EEG recordings: the total file size is >50GB with 240000x16x6000 measurements (6000 samples, 16 electrodes).
✤ Smart Plugs: "4055 Millions of measurements for 2125 plugs distributed across 40 houses."
✤ Real-Time Location System: "The total filesize is 2.6 GB and it contains a total of 49,576,080 position events."

✤ Time series classification (TSC) aims at assigning a class label to an unlabeled query TS based on a model trained from labeled samples.
✤ Most basic: 1-nearest neighbor classifiers.
✤ We look into the four groups of TS classifiers: whole series, shapelets, bag-of-patterns, and ensembles.
[Figure: a model is used to find the label of a query.]

Whole Series
✤ Based on a distance measure defined on the whole TS data and 1-NN classification.
✤ Elastic distance measures compensate for small differences, such as warping along the time axis, that the Euclidean distance penalizes.
✤ Baseline, simple model; cannot skip irrelevant subsections; linear to quadratic complexity in TS length.
✤ Representatives: 1-NN Dynamic Time Warping (DTW) and 1-NN Euclidean distance (ED).
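The whole-series approach above can be sketched in a few lines. This is a minimal illustration, not the benchmarked implementation: the function names (`euclidean`, `dtw`, `predict_1nn`) and the toy series are assumptions made for this example, and the DTW shown is the plain unconstrained variant without a warping window.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series (linear in TS length)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Unconstrained DTW distance with squared point costs (quadratic in TS length)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def predict_1nn(train, query, distance):
    """Return the label of the training series closest to the query."""
    best_series, best_label = min(train, key=lambda pair: distance(pair[0], query))
    return best_label

# Toy data (made up for illustration): one series with a peak, one flat series.
train = [([0, 0, 1, 2, 1, 0, 0, 0], "peak"), ([0, 0, 0, 0, 0, 0, 0, 0], "flat")]
query = [0, 0, 0, 0, 1, 2, 1, 0]  # the same peak, shifted by two time steps

print(predict_1nn(train, query, euclidean))
print(predict_1nn(train, query, dtw))
```

On this toy query the shifted peak is closer to the flat series under ED, while DTW warps the time axis and matches it to the peak series, which is the effect the elastic measures are meant to capture.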

Shapelets
✤ Shapelets are TS subsequences that are maximally representative of a class label.
✤ A TS is labeled based on the similarity to a shapelet.
✤ Interpretable, high computational complexity (cubic to bi-quadratic in TS length).
✤ Representatives: Shapelet Transform (ST), Learning Shapelets (LS), Fast Shapelets (FS).
[Figure: example shapelets for the classes caffeine and chlorogenic acid.]
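To make "similarity to a shapelet" concrete, the sketch below computes the smallest distance between a shapelet and any window of a series, then labels the series with a threshold. It is a simplification under stated assumptions: the names `shapelet_distance` and `classify`, the threshold, and the labels are hypothetical, windows are not z-normalized, and the shapelet is given rather than learned as in ST, LS, or FS.

```python
import math

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any same-length window."""
    w = len(shapelet)
    return min(
        math.sqrt(sum((series[i + k] - shapelet[k]) ** 2 for k in range(w)))
        for i in range(len(series) - w + 1)
    )

def classify(series, shapelet, threshold, near_label, far_label):
    """Label a series by whether it contains a close match of the shapelet."""
    if shapelet_distance(series, shapelet) <= threshold:
        return near_label
    return far_label

shapelet = [1, 2, 1]              # hypothetical shapelet, not learned from data
with_peak = [0, 0, 1, 2, 1, 0]    # contains the shapelet exactly
without_peak = [0, 0, 0, 0, 0, 0]

print(classify(with_peak, shapelet, 1.0, "peak", "flat"))
print(classify(without_peak, shapelet, 1.0, "peak", "flat"))
```

Scanning every window for every candidate shapelet is what drives the high complexity noted above; this sketch only shows the final matching step.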

Bag-of-Patterns / Bag-of-Features
✤ TS are distinguished by the frequency of occurrence of features generated over substructures of the TS.
✤ A bag-of-patterns (histogram) of feature counts is used as input to classification.
✤ Fast (linear complexity), noise reducing, but the order of substructures gets lost.
✤ Representatives: Bag-of-SFA-Symbols (BOSS), Bag-of-Patterns (BoP), Time Series Bag of Features (TSBF).
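The histogram idea can be illustrated as follows. This is a rough sketch and not BOSS, BoP, or TSBF: the fixed value bins, the three-letter alphabet, and the function names `to_word` and `bag_of_patterns` are assumptions chosen to keep the example short.

```python
from collections import Counter

def to_word(window, breakpoints=(-0.5, 0.5), alphabet="abc"):
    """Map each value to a letter by the bin it falls into (hypothetical fixed bins)."""
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in window)

def bag_of_patterns(series, window=3):
    """Slide a window over the series and count how often each word occurs."""
    return Counter(
        to_word(series[i:i + window]) for i in range(len(series) - window + 1)
    )

series = [0, 1, 0, 1, 0]
bag = bag_of_patterns(series)
print(bag)
```

The resulting counts record how often each word occurs but not where, which is why the order of substructures gets lost. Two bags can then be compared with any histogram distance.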

Ensembles
✤ Ensembles combine different core classifiers (e.g., shapelets, bag-of-patterns, whole series) into a single classifier using bagging or majority voting.
✤ High accuracy by combining different representations, but high computational complexity (quadratic to bi-quadratic in TS length).
✤ Representatives: Elastic Ensemble (EE PROP), Collective of Transformation Ensembles (COTE).
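Majority voting over core classifiers can be sketched as below. The three stand-in classifiers are hypothetical functions invented for this example, and the vote is unweighted; the sketch does not reproduce the voting scheme of EE or COTE.

```python
from collections import Counter

def majority_vote(classifiers, query):
    """Ask every core classifier for a label and return the most frequent one."""
    votes = Counter(clf(query) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for a whole-series, a shapelet, and a bag-of-patterns model.
def by_maximum(ts):
    return "peak" if max(ts) >= 2 else "flat"

def by_range(ts):
    return "peak" if max(ts) - min(ts) > 1 else "flat"

def by_mean(ts):
    return "peak" if sum(ts) / len(ts) > 1 else "flat"

query = [0, 0, 1, 2, 1, 0]
print(majority_vote([by_maximum, by_range, by_mean], query))
```

Every core classifier has to be evaluated for each query, so the ensemble is at least as slow as its slowest member, which matches the complexity noted above.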

UCR datasets: Accuracy vs Single Query Prediction Time
[Figure: scatter plot of average accuracy (60% to 90%) against single query prediction time in milliseconds (log scale, 1 to 10,000) for ST, BOSS VS, LS, COTE, BOSS, DTW, EE (PROP), TSBF, BOP, DTW CV, SAX VSM, and FS. Regions: "accurate and fast", "accurate but slower", "less accurate and slower".]
✤ The slowest classifier took 4 s per query, the fastest 2 ms.
✤ Methods are either scalable but offer only inferior accuracy, or they achieve state-of-the-art accuracy but do not scale to larger dataset sizes.

✤ Prediction times of the state of the art.
✤ Using the StarLightCurves dataset with 1000 train and 8236 test TS of length 1024.
✤ Video runs at 10x playback speed.
✤ The slowest classifier took 100 hours. The fastest took 20 ms.
[Video: percentages shown on the slide: 87.5%, 90%, 90.4%, 92.6%, 94.7%, 97.8%, 97.9%, 97.9%.]

Average Ranks on 85 UCR datasets
[Figure: critical difference (CD) diagram, rank axis from 12 to 1.]
COTE 3.09
ST 4.34
BOSS 4.78
EE (PROP) 5.52
LS 5.66
BOSS VS 6.14
TSBF 6.15
1-NN DTW CV 7.62
SAXVSM 8.05
BoP 8.39
1-NN DTW 8.65
FastShapelets 9.62
✤ The most accurate TSCs are Ensembles, Shapelets and Bag-of-Patterns: COTE, ST, BOSS and EE.
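An average rank of this kind is typically obtained by ranking the classifiers on each dataset by accuracy and averaging the ranks over all datasets. The sketch below shows that computation under stated assumptions: the dataset names, classifier names, and accuracies are made up for illustration (they are not the benchmark results), ties share the mean rank, and every classifier is assumed to have a score on every dataset.

```python
def ranks(scores):
    """Rank classifiers on one dataset: 1 = highest accuracy, ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over all classifiers tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2
        for name in ordered[i:j + 1]:
            result[name] = mean_rank
        i = j + 1
    return result

def average_ranks(accuracy_by_dataset):
    """Average each classifier's per-dataset rank over all datasets."""
    totals = {}
    for scores in accuracy_by_dataset.values():
        for name, r in ranks(scores).items():
            totals[name] = totals.get(name, 0.0) + r
    return {name: total / len(accuracy_by_dataset) for name, total in totals.items()}

# Hypothetical accuracies for three classifiers on three datasets.
accuracy_by_dataset = {
    "d1": {"A": 0.9, "B": 0.8, "C": 0.7},
    "d2": {"A": 0.8, "B": 0.8, "C": 0.6},
    "d3": {"A": 0.7, "B": 0.9, "C": 0.8},
}
avg = average_ranks(accuracy_by_dataset)
print(avg)
```

A lower average rank means the classifier was more often among the most accurate across datasets, independent of how large the accuracy gaps were.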

Conclusion
✤ Methods are either scalable but offer only inferior accuracy, or they achieve state-of-the-art accuracy but do not scale to larger dataset sizes.
✤ Bag-of-Patterns approaches are faster than Shapelets, Ensembles, or Whole Series measures.
✤ Overall, COTE, ST, and BOSS show the highest classification accuracy at the cost of increased runtimes.
✤ FS, SAX VSM, BOP, and BOSS VS show the lowest runtimes at the cost of limited accuracy.
