Biased Resampling Strategies for Imbalanced Spatio-temporal - PowerPoint PPT Presentation

Biased Resampling Strategies for Imbalanced Spatio-temporal Forecasting
Mariana Oliveira¹⋅², Nuno Moniz¹⋅², Luís Torgo¹⋅²⋅³ and Vítor Santos Costa¹⋅²
5-8 October 2019

Spatio-temporal Data
[Figure: remote monitoring equipment (Source: NDSU)]
[Figure: PM 2.5 pollution levels (time series)]
[Figure: air quality measurement station network (Source: Zheng et al., 2013)]
2/28

Imbalanced Numeric Forecasting
[Figure panels: imbalanced domain; relevance function]
3/28

Our Contribution
Motivation
• Random resampling approaches are often used to tackle this problem
• However, our data is not i.i.d.: there are spatial and temporal dependencies
Research Questions
• Will introducing a sampling bias that takes into account spatio-temporal dependencies improve performance?
• Should we weight the dimensions differently?
4/28

Biased Resampling 5/28

Proposed Resampling Strategies
Spatio-temporal Random Under-sampling (STRUS)
• Keep all extreme cases
• Keep only u % of normal cases, 0 < u < 100 (with sampling bias)
Spatio-temporal Random Over-sampling (STROS)
• Keep all (normal and extreme) cases
• Add o % replicas of extreme cases, o > 0 (with sampling bias)
6/28
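A minimal Python sketch of the two strategies, assuming every case already carries a positive sampling weight. The function names `strus` and `stros`, their arguments, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def strus(is_extreme, weights, u, rng):
    """Keep all extreme cases plus u% of the normal cases (0 < u < 100).

    Normal cases are drawn without replacement, with probability
    proportional to their (assumed positive) sampling weight.
    """
    is_extreme = np.asarray(is_extreme, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    extreme_idx = np.flatnonzero(is_extreme)
    normal_idx = np.flatnonzero(~is_extreme)
    n_keep = int(round(len(normal_idx) * u / 100))
    p = weights[normal_idx] / weights[normal_idx].sum()
    kept = rng.choice(normal_idx, size=n_keep, replace=False, p=p)
    return np.sort(np.concatenate([extreme_idx, kept]))

def stros(is_extreme, weights, o, rng):
    """Keep every case and add o% replicas of the extreme cases (o > 0).

    Replicas are drawn with replacement, with probability proportional
    to the (assumed positive) sampling weight of each extreme case.
    """
    is_extreme = np.asarray(is_extreme, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    extreme_idx = np.flatnonzero(is_extreme)
    all_idx = np.arange(len(is_extreme))
    n_add = int(round(len(extreme_idx) * o / 100))
    p = weights[extreme_idx] / weights[extreme_idx].sum()
    replicas = rng.choice(extreme_idx, size=n_add, replace=True, p=p)
    return np.sort(np.concatenate([all_idx, replicas]))

# Toy data (made up): 8 normal cases, 2 extreme cases, uniform weights.
rng = np.random.default_rng(0)
is_extreme = np.array([False] * 8 + [True] * 2)
weights = np.ones(10)
under = strus(is_extreme, weights, u=50, rng=rng)   # row indices to keep
over = stros(is_extreme, weights, o=100, rng=rng)   # row indices, with replicas
```

Both functions return row indices into the training set, so the same sketch works for any tabular representation. With uniform weights they reduce to plain random under- and over-sampling.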

Spatio-temporal Sampling Bias
Which cases should have higher probability of being selected during resampling?
• Temporal weight: more recent observations
At each time-step:
• Spatial weight (extreme cases): isolated rare cases
• Spatial weight (normal cases): cases far away from rare cases
7/28

Spatio-temporal Sampling Bias
What if spatial and temporal dimensions have different impacts?
Add weighting parameter α
8/28
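One way the sampling bias could be computed is sketched below in Python. Everything here is an assumption made for illustration: the slides do not give the weight formulas, so the min-max temporal weight, the nearest-extreme-case distance used as spatial weight, the convex combination `alpha * w_time + (1 - alpha) * w_space`, and all function names are hypothetical stand-ins.

```python
import numpy as np

def temporal_weight(t):
    """Assumed form: min-max scaled time, so recent cases weigh more."""
    t = np.asarray(t, dtype=float)
    span = t.max() - t.min()
    return np.ones_like(t) if span == 0 else (t - t.min()) / span

def spatial_weight(t, xy, is_extreme):
    """Assumed form: distance to the nearest *other* extreme case
    observed at the same time-step, scaled to [0, 1].

    Extreme cases that are isolated and normal cases that are far from
    extreme cases both receive high weights.
    """
    t = np.asarray(t)
    xy = np.asarray(xy, dtype=float)
    is_extreme = np.asarray(is_extreme, dtype=bool)
    dist = np.full(len(t), np.nan)
    for step in np.unique(t):
        idx = np.flatnonzero(t == step)
        ext = idx[is_extreme[idx]]
        for i in idx:
            others = ext[ext != i]
            if len(others) > 0:
                dist[i] = np.linalg.norm(xy[others] - xy[i], axis=1).min()
    finite = dist[~np.isnan(dist)]
    max_d = finite.max() if len(finite) and finite.max() > 0 else 1.0
    # Cases with no other extreme case at their time-step count as
    # maximally isolated.
    dist = np.where(np.isnan(dist), max_d, dist)
    return dist / max_d

def combined_weight(w_time, w_space, alpha):
    """Assumed combination: alpha = 1 is purely temporal, 0 purely spatial."""
    return alpha * w_time + (1 - alpha) * w_space

# Toy data (made up): 3 locations on a line, 2 time-steps.
t = np.array([0, 0, 0, 1, 1, 1])
xy = np.array([[0, 0], [1, 0], [3, 0]] * 2)
is_extreme = np.array([True, False, False, False, False, False])
w_t = temporal_weight(t)
w_s = spatial_weight(t, xy, is_extreme)
w = combined_weight(w_t, w_s, alpha=0.5)
```

Under this assumed parametrization, the α grid from 0 to 1 moves the bias from purely spatial to purely temporal. Note that the oldest time-step gets a temporal weight of zero here, which matters if the weights are later used for sampling without replacement.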

Experiments 9/28

Datasets
Data source   ID   # time IDs   # loc IDs   % available   % extreme
MESA          10   280          20          100           7.3
NCDC          20   105          72          100           6.0
              30                                          6.3
TCE           31   330          26          100           3.8
              32                                          2.4
Rural         40   4k           70          ~49           7.5
              50                                          3.5
Beijing Air   51   11k          36          ~40           5.5
              52                                          8.6
              53                                          3.8
10/28

Learning Process
PRE-PROCESSING
• Feature engineering: calculate spatio-temporal indicators
• Resampling: None, RUS, ROS, STRUS, STROS
Model
• MARS
• Random Forest
• RPART
11/28

Experimental Evaluation
• Performance estimation procedure
• Evaluation metrics
12/28

Evaluation Metrics • Utility-based precision and recall for numeric prediction: 13/28
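The results are ranked by an F1 score built from these two metrics. Assuming it is the usual harmonic mean, and writing $prec^u$ and $rec^u$ for utility-based precision and recall (notation chosen here, not taken from the slides), it would read:

```latex
F_1^u = \frac{2 \cdot prec^u \cdot rec^u}{prec^u + rec^u}
```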

Performance Estimation Procedure
• Prequential temporal block evaluation
[Figure: training and test blocks along the time axis]
14/28
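A rough Python sketch of what a prequential evaluation over temporal blocks can look like. It assumes a growing training window and equally sized blocks of time IDs; the slides do not state the number of blocks or the window type, and the name `prequential_blocks` is hypothetical.

```python
import numpy as np

def prequential_blocks(time_ids, n_blocks):
    """Yield (train_idx, test_idx) pairs over contiguous temporal blocks.

    Assumption: at step i the model is trained on blocks 1..i and
    tested on block i+1, so test data is always newer than train data.
    """
    time_ids = np.asarray(time_ids)
    blocks = np.array_split(np.unique(time_ids), n_blocks)  # sorted time IDs
    for i in range(1, n_blocks):
        train_t = np.concatenate(blocks[:i])
        test_t = blocks[i]
        train_idx = np.flatnonzero(np.isin(time_ids, train_t))
        test_idx = np.flatnonzero(np.isin(time_ids, test_t))
        yield train_idx, test_idx

# Toy data (made up): 6 time-steps, 2 locations observed at each.
time_ids = np.repeat(np.arange(6), 2)
folds = list(prequential_blocks(time_ids, n_blocks=3))
```

Splitting on time IDs rather than on rows keeps all locations of a given time-step on the same side of the split, so no test observation is contemporaneous with a training observation.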

Parametrization
• Internal tuning
• Fixed a priori
• Optimal a posteriori
15/28

Internal Tuning
INTERNAL ESTIMATION PROCEDURE
For each training set: temporal-block CV
[Figure: temporal blocks along the time axis]
PARAMETER GRID SEARCH
Parameter   Values
u           0.2; 0.4; 0.6; 0.8; 0.95
o           0.5; 1; 2; 3; 4
α           0; 0.25; 0.5; 0.75; 1
17/28
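The grid search can be sketched in Python as follows. The grid values come from the slide; the helper names `candidates` and `best_params`, and the idea of passing in a scoring callback (for example, the mean score over the internal temporal-block CV folds), are assumptions made for illustration.

```python
from itertools import product

# Parameter values as listed on the slide.
GRID = {
    "u": [0.2, 0.4, 0.6, 0.8, 0.95],
    "o": [0.5, 1, 2, 3, 4],
    "alpha": [0, 0.25, 0.5, 0.75, 1],
}

def candidates(strategy):
    """All (rate, alpha) combinations for STRUS (rate u) or STROS (rate o)."""
    rate = "u" if strategy == "STRUS" else "o"
    return [{rate: r, "alpha": a} for r, a in product(GRID[rate], GRID["alpha"])]

def best_params(strategy, score_fn):
    """Return the candidate with the highest score.

    `score_fn` is a hypothetical callback that maps a parameter dict to
    a number where higher is better.
    """
    return max(candidates(strategy), key=score_fn)

# Toy scorer (made up) that prefers u close to 0.6 and alpha close to 0.
toy_best = best_params("STRUS", lambda p: -abs(p["u"] - 0.6) - p["alpha"])
```

Each biased strategy has 25 candidate settings under this grid, so the internal search multiplies training cost accordingly.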

Fixed a priori
FIXED PARAMETERS
For all training sets: fixed parameters at middle of the grid (u = 0.6, o = 2, α = 0.5).
Parameter   Values
u           0.2; 0.4; 0.6; 0.8; 0.95
o           0.5; 1; 2; 3; 4
α           0; 0.25; 0.5; 0.75; 1
19/28

Optimal a posteriori
EXTERNAL ESTIMATION PROCEDURE
For each data set: choose parameters with best results on the external (prequential) procedure.
PARAMETER GRID SEARCH
Parameter   Values
u           0.2; 0.4; 0.6; 0.8; 0.95
o           0.5; 1; 2; 3; 4
α           0; 0.25; 0.5; 0.75; 1
21/28

Results 22/28

Average Rank of F₁ᵘ
Parametrization        None   ROS    STROS   RUS    STRUS
Internal tuning        4.60   3.07   2.37    2.67   2.30
Fixed a priori         4.53   2.77   2.73    2.57   2.40
Optimal a posteriori   5.00   3.07   2.27    2.93   1.73
23/28

Parameter Sensitivity Analysis
[Figure panels: two parameters; dimension weighting; plots labelled RUS/ROS]
24/28

Precision and Recall Trade-off
[Figure panels: precision; recall; plots labelled RUS/ROS]
25/28

Conclusion 26/28

Conclusion
• Including spatio-temporal bias when resampling improves performance
• The contributions of each dimension should be weighted:
  • When over-sampling: favour temporal weight and prioritize more recent observations
  • When under-sampling: favour spatial weight and prioritize isolated rare cases and normal cases that are spatially distant from extreme cases
• Future work:
  • Study the impact of data characteristics on performance
  • Consider local instead of global definitions of extreme values
27/28

Thank you! Code available at https://github.com/mrfoliveira/STResampling-DSAA2019 28/28
