BTW 2017, 08.03.2017
Benchmarking (State-of-the-Art) Univariate Time Series Classifiers
Patrick Schäfer and Ulf Leser Humboldt-Universität zu Berlin, Wissensmanagement in der Bioinformatik
patrick.schaefer@hu-berlin.de
✤ Time series (TS) result from
✤ Increasingly popular due to the
✤ Application areas: motion sensors,
✤ UCR time series archive
✤ Datasets from a whole range of
✤ Overall, there are 50,000 train
✤ At most thousands of TS with
Smart Plugs: "4055 Millions of measurements for 2125 plugs distributed across 40 houses."
Real-Time Location System: "The total filesize is 2.6 GB and it contains a total
Long-term human intracranial EEG recordings: The total file size is >50 GB with 240000x16x6000 measurements (6000 samples, 16 electrodes).
✤ At the same time real-
✤ Time series classification (TSC) aims at
✤ Most basic: 1-nearest neighbor classifiers.
✤ We look into the four groups of TS
Figure: Query → Model → find label
✤ Based on a distance measure defined on the
whole TS data and 1-NN classification.
✤ Elastic distance measures compensate for small
differences like warping in the time axis.
✤ Baseline: a simple model that cannot skip irrelevant
subsections; linear to quadratic complexity in TS length.
✤ Representatives: 1-NN Dynamic Time Warping
(DTW) and 1-NN Euclidean distance (ED).
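The whole-series approach above can be illustrated with a minimal sketch, assuming an unconstrained DTW with squared point-wise costs and no warping window. The function names and the toy data (`train`, `query`) are hypothetical and not taken from the benchmark. The example shows a case where a time-shifted bump is matched by DTW but not by ED.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Unconstrained DTW distance (assumption: squared point-wise cost, no window)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def predict_1nn(train, query, distance):
    """Return the label of the training series closest to the query."""
    _, nearest_label = min(train, key=lambda pair: distance(pair[0], query))
    return nearest_label

# Hypothetical toy data: the query contains the same bump, shifted in time.
train = [
    ([0, 1, 2, 1, 0, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0, 0], "flat"),
]
query = [0, 0, 0, 0, 1, 2, 1, 0]

label_ed = predict_1nn(train, query, euclidean)
label_dtw = predict_1nn(train, query, dtw)
```

In this toy case ED compares point by point and finds the flat series closer, while DTW warps the time axis and aligns the two bumps at zero cost. The double loop in `dtw` is the quadratic end of the complexity range named above.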
Figure: Euclidean Distance vs. DTW
✤ Shapelets are TS subsequences that are
maximally representative of a class label.
✤ A TS is labeled based on the similarity to
a shapelet.
✤ Interpretable, high computational
complexity (cubic to bi-quadratic in TS length).
✤ Representatives: Shapelet Transform (ST),
Learning Shapelets (LS), Fast Shapelets (FS).
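Labeling by similarity to a shapelet can be sketched as follows, assuming the similarity is the smallest Euclidean distance between the shapelet and any equally long window of the series. The names `shapelet_distance` and `classify`, the threshold, and the toy data are hypothetical. Shapelet methods often also z-normalize each window, which is omitted here, and the sketch does not cover how ST, LS, or FS discover shapelets.

```python
import math

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    w = len(shapelet)
    return min(
        math.sqrt(sum((series[i + k] - shapelet[k]) ** 2 for k in range(w)))
        for i in range(len(series) - w + 1)
    )

def classify(series, shapelet, threshold, near_label, far_label):
    """Assign near_label if the series contains a close match to the shapelet."""
    if shapelet_distance(series, shapelet) <= threshold:
        return near_label
    return far_label

# Hypothetical toy data: a spike of height 3 acts as the shapelet.
shapelet = [0, 3, 0]
series_a = [0, 0, 0, 3, 0, 0]   # contains the spike
series_b = [0, 1, 0, 1, 0, 1]   # contains only small peaks

label_a = classify(series_a, shapelet, 1.0, "spike", "no_spike")
label_b = classify(series_b, shapelet, 1.0, "spike", "no_spike")
```

Scoring one candidate against one series already costs a sliding-window scan. Searching over many candidate subsequences of many lengths multiplies that cost, which is consistent with the high complexity noted above.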
Figure labels: caffeine, chlorogenic acid
✤ TS are distinguished by the frequency of
✤ A bag-of-patterns (histogram) of feature
✤ Fast (linear complexity), noise reducing,
✤ Representatives: Bag-of-SFA-Symbols
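The histogram idea can be sketched with a toy example. The discretization used here (one letter per step: up, down, flat) is an assumption chosen for brevity. It is neither SAX nor SFA, and all function names and data are hypothetical.

```python
from collections import Counter

def window_to_word(window):
    """Toy discretization (assumption): u = up, d = down, f = flat per step."""
    return "".join(
        "u" if b > a else "d" if b < a else "f"
        for a, b in zip(window, window[1:])
    )

def bag_of_patterns(series, window_size):
    """Histogram of the words produced by all sliding windows of the series."""
    return Counter(
        window_to_word(series[i:i + window_size])
        for i in range(len(series) - window_size + 1)
    )

def histogram_distance(h1, h2):
    """Squared Euclidean distance between two word histograms."""
    return sum((h1[w] - h2[w]) ** 2 for w in set(h1) | set(h2))

# Hypothetical toy data.
zigzag = [0, 1, 0, 1, 0, 1]
shifted = [1, 0, 1, 0, 1, 0]
flat = [0, 0, 0, 0, 0, 0]

bag_zigzag = bag_of_patterns(zigzag, 3)
dist_shifted = histogram_distance(bag_zigzag, bag_of_patterns(shifted, 3))
dist_flat = histogram_distance(bag_zigzag, bag_of_patterns(flat, 3))
```

In this toy case the shifted zigzag produces the same histogram as the original, because only the word counts are compared and not the positions at which the words occur.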
✤ Ensembles combine different core classifiers (i.e.,
✤ High accuracy by combining different representations
✤ Representatives: Elastic Ensemble (EE PROP),
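One simple way to combine core classifiers is a weighted vote, sketched below. This is a generic illustration and not the definition of EE (PROP) or any other ensemble named in the talk. The classifier names, labels, and weights are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Each classifier adds its weight to the label it predicted; the top label wins."""
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    return max(scores, key=scores.get)

# Hypothetical predictions of three core classifiers for one query.
predictions = {"elastic": "A", "shapelet": "B", "bag_of_patterns": "B"}

# Hypothetical weights, e.g. an accuracy estimate per classifier.
weights = {"elastic": 0.9, "shapelet": 0.5, "bag_of_patterns": 0.3}
equal_weights = {name: 1.0 for name in predictions}

label_weighted = weighted_vote(predictions, weights)
label_majority = weighted_vote(predictions, equal_weights)
```

With equal weights the vote reduces to a plain majority. With unequal weights a single strong classifier can outvote two weaker ones, as in this toy example. Every core classifier has to be evaluated for each query, so prediction time grows with the number of members.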
✤ Slowest (fastest) classifier took 4s (2ms).
✤ Methods are either scalable but offer only inferior accuracy, or they
Figure: UCR datasets: Accuracy vs Single Query Prediction Time. x-axis: single query predict time in milliseconds (ticks at 1, 10, 100, 1,000, 10,000). y-axis: average accuracy (ticks at 60%, 70%, 80%, 90%; annotation: 83%). Classifiers plotted: DTW, DTW CV, FS, ST, BOSS, BOSS VS, SAX VSM, LS, TSBF, BOP, EE (PROP), COTE. Regions: accurate and fast; accurate but slower; less accurate and slower.
✤ Prediction times of state of
✤ Using StarLightCurves
✤ Video runs at 10x playback
✤ Slowest classifier took 100
94.7% 97.9% 90% 87.5% 97.8% 90.4% 92.6% 97.9%
Figure: average ranks (axis from 12 to 1): 3.09 COTE, 4.34 ST, 4.78 BOSS, 5.52 EE (PROP), 5.66 LS, 6.14 BOSS VS; further ranks without labels: 6.15, 7.62, 8.05, 8.39, 8.65, 9.62.
✤ Most accurate TSCs are Ensembles, Shapelets and Bag-of-Patterns:
✤ Methods are either scalable but offer only inferior
✤ Bag-of-Patterns approaches are faster than Shapelets,
✤ Overall, COTE, ST and BOSS show the highest
✤ FS, SAX VSM, BOP, BOSS VS show the lowest runtimes