
Biased Resampling Strategies for Imbalanced Spatio-temporal Forecasting


  1. Biased Resampling Strategies for Imbalanced Spatio-temporal Forecasting
     Mariana Oliveira¹⋅², Nuno Moniz¹⋅², Luís Torgo¹⋅²⋅³ and Vítor Santos Costa¹⋅²
     5-8 October 2019

  2. Spatio-temporal Data
     • Remote monitoring equipment (Source: NDSU)
     • PM 2.5 pollution levels (time series)
     • Air quality measurement station network (Source: Zheng et al., 2013)

  3. Imbalanced Numeric Forecasting
     (Figure panels: imbalanced domain; relevance function)

  4. Our Contribution
     Motivation
     • Random resampling approaches are often used to tackle this problem
     • However, our data is not i.i.d.: there are spatial and temporal dependencies
     Research Questions
     • Will introducing a sampling bias that takes into account spatio-temporal dependencies improve performance?
     • Should we weight the dimensions differently?

  5. Biased Resampling

  6. Proposed Resampling Strategies
     Spatio-temporal Random Under-sampling (STRUS)
     • Keep all extreme cases
     • Keep only u % of normal cases, 0 < u < 100 (with sampling bias)
     Spatio-temporal Random Over-sampling (STROS)
     • Keep all (normal and extreme) cases
     • Add o % replicas of extreme cases, o > 0 (with sampling bias)
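
The two strategies can be sketched roughly as follows. This is a minimal illustration, not the authors' code (their implementation is linked on the last slide). The function names `strus` and `stros`, the `weights` argument and the use of NumPy are assumptions, and `u` and `o` are taken as fractions (as in the parameter grids later in the deck) rather than percentages. Weights are assumed to be strictly positive.

```python
import numpy as np

def strus(is_extreme, weights, u, rng):
    """Biased under-sampling sketch: keep every extreme case and a
    fraction u of the normal cases, drawn without replacement with
    probability proportional to `weights` (hypothetical argument)."""
    is_extreme = np.asarray(is_extreme, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    extreme_idx = np.flatnonzero(is_extreme)
    normal_idx = np.flatnonzero(~is_extreme)
    n_keep = int(round(u * normal_idx.size))
    p = weights[normal_idx] / weights[normal_idx].sum()
    kept = rng.choice(normal_idx, size=n_keep, replace=False, p=p)
    return np.sort(np.concatenate([extreme_idx, kept]))

def stros(is_extreme, weights, o, rng):
    """Biased over-sampling sketch: keep every case and add o times the
    number of extreme cases as replicas, drawn with replacement with
    probability proportional to `weights` (hypothetical argument)."""
    is_extreme = np.asarray(is_extreme, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    extreme_idx = np.flatnonzero(is_extreme)
    n_add = int(round(o * extreme_idx.size))
    p = weights[extreme_idx] / weights[extreme_idx].sum()
    replicas = rng.choice(extreme_idx, size=n_add, replace=True, p=p)
    all_idx = np.arange(is_extreme.size)
    return np.sort(np.concatenate([all_idx, replicas]))
```

Both functions return row indices into the training set, so the resampled data can be obtained by plain indexing. With uniform weights they reduce to ordinary RUS and ROS.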

  7. Spatio-temporal Sampling Bias
     Which cases should have higher probability of being selected during resampling?
     • Temporal weight: more recent observations
     • Spatial weight (extreme cases), at each time-step: isolated rare cases
     • Spatial weight (normal cases), at each time-step: cases far away from rare cases

  8. Spatio-temporal Sampling Bias
     What if spatial and temporal dimensions have different impacts?
     Add weighting parameter α
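
The slides do not give the weight formulas, so the following is only one plausible sketch under stated assumptions: the temporal weight is min-max scaled recency, the spatial weight is the scaled distance to the nearest other rare case at the same time-step, and α blends the two as a convex combination with α = 1 meaning purely temporal. All function names are hypothetical, and the direction of α is an assumption.

```python
import numpy as np

def temporal_weight(t):
    """Assumed form: min-max scaled time, so newer cases weigh more."""
    t = np.asarray(t, dtype=float)
    span = t.max() - t.min()
    if span == 0:
        return np.ones_like(t)
    return (t - t.min()) / span

def spatial_weight(t, coords, is_extreme):
    """Assumed form: at each time-step, distance to the nearest *other*
    extreme case, scaled by the largest such distance at that step.
    Cases with no other extreme case at their time-step keep weight 1."""
    t = np.asarray(t)
    coords = np.asarray(coords, dtype=float)
    is_extreme = np.asarray(is_extreme, dtype=bool)
    w = np.ones(len(t))
    for step in np.unique(t):
        idx = np.flatnonzero(t == step)
        ext = idx[is_extreme[idx]]
        if ext.size == 0:
            continue
        diff = coords[idx][:, None, :] - coords[ext][None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        # An extreme case must not count as its own nearest rare case.
        dist[idx[:, None] == ext[None, :]] = np.inf
        nearest = dist.min(axis=1)
        finite = np.isfinite(nearest)
        scale = nearest[finite].max() if finite.any() else 0.0
        if scale > 0:
            w[idx[finite]] = nearest[finite] / scale
    return w

def combined_weight(w_time, w_space, alpha):
    """Assumed blend: alpha = 1 is purely temporal, alpha = 0 purely spatial."""
    return alpha * np.asarray(w_time) + (1.0 - alpha) * np.asarray(w_space)
```

Under this sketch the oldest time-step receives a temporal weight of exactly zero, so a small floor may be needed before the weights are used as sampling probabilities.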

  9. Experiments

  10. Datasets
      Data source   ID   # time IDs   # loc IDs   % available   % extreme
      MESA          10   280          20          100           7.3
      NCDC          20   105          72          100           6.0
      TCE           30   330          26          100           6.3
                    31                                          3.8
                    32                                          2.4
      Rural         40   4k           70          ~49           7.5
      Beijing Air   50   11k          36          ~40           3.5
                    51                                          5.5
                    52                                          8.6
                    53                                          3.8

  11. Learning Process
      Pre-processing:
      • Feature engineering: calculate spatio-temporal indicators
      • Resampling: None, RUS, ROS, STRUS, STROS
      Model:
      • MARS
      • Random Forest
      • RPART

  12. Experimental Evaluation
      • Performance estimation procedure
      • Evaluation metrics

  13. Evaluation Metrics
      • Utility-based precision and recall for numeric prediction

  14. Performance Estimation Procedure
      • Prequential temporal block evaluation
      (Figure: blocks labelled train and test; axes x, y and time)
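
A rough sketch of a prequential split over temporal blocks is given below. It assumes a growing training window (every earlier block is used for training and the next block for testing); the slide does not say whether the window grows or slides, and the function name `prequential_blocks` is hypothetical.

```python
import numpy as np

def prequential_blocks(time_ids, n_blocks):
    """Yield (train_times, test_times) pairs over contiguous time blocks.
    Assumption: growing window, i.e. blocks 0..k-1 train, block k tests."""
    unique_times = np.sort(np.unique(time_ids))
    blocks = np.array_split(unique_times, n_blocks)
    for k in range(1, n_blocks):
        train_times = np.concatenate(blocks[:k])
        test_times = blocks[k]
        yield train_times, test_times

# Usage: every location observed at a given time-step stays in one block,
# so the test data always lies strictly after the training data.
for train_times, test_times in prequential_blocks(np.arange(10), 5):
    assert train_times.max() < test_times.min()
```

Splitting on time identifiers rather than on rows keeps all locations of a time-step on the same side of the split.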

  15. Parametrization
      • Internal tuning
      • Fixed a priori
      • Optimal a posteriori

  16. Parametrization
      • Internal tuning
      • Fixed a priori
      • Optimal a posteriori

  17. Internal Tuning
      Internal estimation procedure
      For each training set: temporal-block CV
      (Figure axes: x, y and time)
      Parameter grid search
      Parameter   Values
      u           0.2; 0.4; 0.6; 0.8; 0.95
      o           0.5; 1; 2; 3; 4
      α           0; 0.25; 0.5; 0.75; 1
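
The grid search over a resampling rate and α can be sketched as below. The grids are copied from the slide; `grid_search`, `folds` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever routine resamples a training fold, fits a model and returns a validation score where higher is better.

```python
from itertools import product

GRID_U = [0.2, 0.4, 0.6, 0.8, 0.95]   # under-sampling rates from the slide
GRID_O = [0.5, 1, 2, 3, 4]            # over-sampling rates from the slide
GRID_ALPHA = [0, 0.25, 0.5, 0.75, 1]  # dimension weighting from the slide

def grid_search(rates, alphas, folds, score_fn):
    """Return the (rate, alpha) pair with the best mean score over folds.
    `folds` is a list of (train, validation) pairs and `score_fn` is a
    hypothetical callable score_fn(rate, alpha, train, validation)."""
    best_params, best_score = None, float("-inf")
    for rate, alpha in product(rates, alphas):
        scores = [score_fn(rate, alpha, tr, va) for tr, va in folds]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = (rate, alpha), mean_score
    return best_params, best_score
```

For under-sampling the call would pass `GRID_U`, and for over-sampling `GRID_O`, each paired with `GRID_ALPHA`, giving 25 candidate settings per strategy.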

  18. Parametrization
      • Internal tuning
      • Fixed a priori
      • Optimal a posteriori

  19. Fixed a priori
      Fixed parameters, for all training sets
      Parameter   Values
      u           0.2; 0.4; 0.6; 0.8; 0.95
      o           0.5; 1; 2; 3; 4
      α           0; 0.25; 0.5; 0.75; 1
      Parameters fixed at the middle of the grid: u = 0.6, o = 2, α = 0.5.

  20. Parametrization
      • Internal tuning
      • Fixed a priori
      • Optimal a posteriori

  21. Optimal a posteriori
      External estimation procedure
      Parameter grid search, for each data set
      Parameter   Values
      u           0.2; 0.4; 0.6; 0.8; 0.95
      o           0.5; 1; 2; 3; 4
      α           0; 0.25; 0.5; 0.75; 1
      Choose parameters with best results on the external (prequential) procedure.

  22. Results

  23. Average Rank of F1^u
      Parametrization        None   ROS    STROS   RUS    STRUS
      Internal tuning        4.60   3.07   2.37    2.67   2.30
      Fixed a priori         4.53   2.77   2.73    2.57   2.40
      Optimal a posteriori   5.00   3.07   2.27    2.93   1.73
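
Average ranks of this kind are typically obtained by ranking the methods within each experiment (rank 1 for the best score) and averaging the ranks per method. A minimal sketch follows; the function name `average_ranks` is hypothetical, the scores in the usage example are made up for illustration and are not taken from the presentation, and sharing the mean rank among ties is an assumption.

```python
import numpy as np

def average_ranks(scores):
    """scores: array of shape (n_experiments, n_methods), higher is better.
    Returns the mean rank of each method, with rank 1 as the best.
    Assumption: tied methods share the mean of the ranks they span."""
    scores = np.asarray(scores, dtype=float)
    # For entry [row, j, k]: compare the score of method k with method j.
    greater = (scores[:, None, :] > scores[:, :, None]).sum(axis=2)
    equal = (scores[:, None, :] == scores[:, :, None]).sum(axis=2)
    ranks = greater + (equal + 1) / 2.0
    return ranks.mean(axis=0)

# Illustrative, made-up scores for three methods over two experiments.
toy_scores = [[0.1, 0.4, 0.3],
              [0.2, 0.2, 0.5]]
print(average_ranks(toy_scores))
```

With five methods the ranks in each experiment sum to 15, which each row of the table above also does up to rounding.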

  24. Parameter Sensitivity Analysis
      (Figure panels: two parameters; dimension weighting. Plot labels: RUS/ROS)

  25. Precision and Recall Trade-off
      (Figure panels: precision; recall. Plot labels: RUS/ROS)

  26. Conclusion

  27. Conclusion
      • Including spatio-temporal bias when resampling improves performance
      • The contributions of each dimension should be weighted:
        • When over-sampling: favour temporal weight and prioritize more recent observations
        • When under-sampling: favour spatial weight and prioritize isolated rare cases and normal cases that are spatially distant from extreme cases
      • Future work:
        • Study the impact of data characteristics on performance
        • Consider local instead of global definitions of extreme values

  28. Thank you!
      Code available at https://github.com/mrfoliveira/STResampling-DSAA2019
