
Probability Density Function Estimation Based Over-Sampling for Imbalanced Two-Class Problems



1. Probability Density Function Estimation Based Over-Sampling for Imbalanced Two-Class Problems
Ming Gao (a), Xia Hong (a), Sheng Chen (b, c), Chris J. Harris (b)
(a) School of Systems Engineering, University of Reading, Reading RG6 6AY, UK; ming.gao@pgr.reading.ac.uk, x.hong@reading.ac.uk
(b) Electronics and Computer Science, Faculty of Physical and Applied Sciences, University of Southampton, Southampton SO17 1BJ, UK; sqc@ecs.soton.ac.uk, cjh@ecs.soton.ac.uk
(c) Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
IEEE World Congress on Computational Intelligence, Brisbane, Australia, June 10-15, 2012

2. Outline
1 Introduction
  Motivations and Solutions
2 PDF Estimation Based Over-sampling
  Kernel Density Estimation
  Over-sampling Procedure
  Tunable RBF Classifier Construction
3 Experiments
  Experimental Setup
  Experimental Results
4 Conclusions
  Concluding Remarks

3. Outline (section transition: Introduction, Motivations and Solutions)

4. Background
Highly imbalanced two-class classification problems occur widely in life-threatening or safety-critical applications.
Techniques for imbalanced problems can be divided into:
1 Imbalanced learning algorithms: internally modify existing algorithms, without artificially altering the original imbalanced data
2 Resampling methods: externally operate on the original imbalanced data set to re-balance the data for a conventional classifier
Resampling methods can in turn be categorised into:
1 Under-sampling: tends to be ideal when the degree of imbalance is not very severe
2 Over-sampling: becomes necessary if the degree of imbalance is high

5. Our Approach
The ideal over-sampling would draw synthetic data according to the same probability distribution that produced the observed positive-class data samples.
Our probability density function (PDF) estimation based over-sampling:
1 Construct a Parzen window (PW), or kernel density, estimate from the observed positive-class data samples
2 Generate synthetic data samples according to the estimated positive-class PDF
3 Apply our tunable radial basis function (RBF) classifier, built on the leave-one-out (LOO) misclassification rate, to the rebalanced data
The ready-made PW estimator has low complexity in this application, since the minority class is by nature small.
Particle swarm optimisation aided orthogonal forward regression (OFR) for constructing the RBF classifier based on the LOO error rate is a state-of-the-art method.

6. Outline (section transition: PDF Estimation Based Over-sampling, Kernel Density Estimation)

7. Problem Statement
Imbalanced two-class data set $D_N = \{x_k, y_k\}_{k=1}^{N}$, with $D_{N_+} = \{x_i, y_i = +1\}_{i=1}^{N_+}$, $D_{N_-} = \{x_l, y_l = -1\}_{l=1}^{N_-}$ and $D_N = D_{N_+} \cup D_{N_-}$
1 $y_k \in \{\pm 1\}$: class label for feature vector $x_k \in \mathbb{R}^m$
2 $x_k$ are i.i.d. drawn from an unknown underlying PDF $p(x)$
3 $N = N_+ + N_-$, and $N_+ \ll N_-$
The kernel density estimator $\hat{p}(x)$ for $p(x)$ is constructed from the positive-class samples $D_{N_+} = \{x_i, y_i = +1\}_{i=1}^{N_+}$:
$$\hat{p}(x) = \frac{(\det S)^{-1/2}}{N_+} \sum_{i=1}^{N_+} \Phi_\sigma\big(S^{-1/2}(x - x_i)\big)$$
1 Kernel: $\Phi_\sigma\big(S^{-1/2}(x - x_i)\big) = \frac{\sigma^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2}\sigma^{-2}(x - x_i)^T S^{-1}(x - x_i)}$
2 $S$: covariance matrix of the positive class
3 $\sigma$: smoothing parameter
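
A minimal NumPy sketch of this estimator, under my own naming (X_pos for the $N_+ \times m$ matrix of positive-class samples; S and sigma as defined on the next slide):

```python
import numpy as np

def parzen_pdf(x, X_pos, S, sigma):
    """Parzen window estimate p_hat(x) of the positive-class PDF.

    Gaussian kernel with positive-class covariance S and smoothing
    parameter sigma, as in the displayed formula.
    """
    n_plus, m = X_pos.shape
    S_inv = np.linalg.inv(S)
    # constant factor (det S)^{-1/2} * sigma^{-m} / (2 pi)^{m/2}
    norm = np.linalg.det(S) ** -0.5 * sigma ** -m / (2 * np.pi) ** (m / 2)
    diffs = x - X_pos                                     # rows are x - x_i
    quad = np.einsum('ij,jk,ik->i', diffs, S_inv, diffs)  # (x-x_i)^T S^{-1} (x-x_i)
    return norm * np.mean(np.exp(-0.5 * quad / sigma ** 2))
```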

8. Kernel Parameter Estimate
The unbiased estimate of the positive-class covariance matrix is
$$S = \frac{1}{N_+ - 1} \sum_{i=1}^{N_+} (x_i - \bar{x})(x_i - \bar{x})^T$$
with the positive-class mean vector $\bar{x} = \frac{1}{N_+} \sum_{i=1}^{N_+} x_i$
The smoothing parameter is found by grid search to minimise the score function
$$M(\sigma) = N_+^{-2} \sum_i \sum_j \Phi^*_\sigma\big(S^{-1/2}(x_j - x_i)\big) + 2 N_+^{-1}\, \Phi_\sigma(0)$$
with
$$\Phi^*_\sigma\big(S^{-1/2}(x_j - x_i)\big) \approx \Phi^{(2)}_\sigma\big(S^{-1/2}(x_j - x_i)\big) - 2\,\Phi_\sigma\big(S^{-1/2}(x_j - x_i)\big)$$
$$\Phi^{(2)}_\sigma\big(S^{-1/2}(x_j - x_i)\big) = \frac{(\sqrt{2}\,\sigma)^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2}(\sqrt{2}\,\sigma)^{-2}(x_j - x_i)^T S^{-1}(x_j - x_i)}$$
$M(\sigma)$ is based on the mean integrated square error measure
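
A hedged sketch of the grid search; the helper names lssq_score and best_sigma are illustrative rather than from the paper, and the double sum over pairs follows the formula above verbatim (including the $i = j$ terms):

```python
import numpy as np

def lssq_score(sigma, X_pos, S):
    """Score M(sigma): sum of Phi* over all pairs plus 2/N+ * Phi_sigma(0)."""
    n_plus, m = X_pos.shape
    S_inv = np.linalg.inv(S)
    diffs = X_pos[:, None, :] - X_pos[None, :, :]            # all pairs x_j - x_i
    quad = np.einsum('ijk,kl,ijl->ij', diffs, S_inv, diffs)  # pairwise quadratic forms

    def gauss(s):  # kernel value at scale s for every pair
        return s ** -m / (2 * np.pi) ** (m / 2) * np.exp(-0.5 * quad / s ** 2)

    phi_star = gauss(np.sqrt(2.0) * sigma) - 2.0 * gauss(sigma)  # Phi^(2) - 2 Phi
    phi_zero = sigma ** -m / (2 * np.pi) ** (m / 2)              # Phi_sigma(0)
    return phi_star.sum() / n_plus ** 2 + 2.0 * phi_zero / n_plus

def best_sigma(X_pos, S, grid=np.linspace(0.1, 3.0, 30)):
    """Grid search for the smoothing parameter minimising M(sigma)."""
    return min(grid, key=lambda s: lssq_score(s, X_pos, S))
```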

9. Outline (section transition: Over-sampling Procedure)

10. Draw Synthetic Samples
Over-sample the positive class by drawing synthetic data samples according to the PDF estimate $\hat{p}(x)$
Procedure for generating a synthetic sample:
1) Based on a discrete uniform distribution, randomly draw a data sample $x_o$ from the positive-class data set $D_{N_+}$
2) Generate a synthetic data sample $x_n$ from a Gaussian distribution with mean $x_o$ and covariance matrix $\sigma^2 S$:
$$x_n = x_o + \sigma\, R \cdot \mathrm{randn}()$$
$R$: upper triangular matrix from the Cholesky decomposition of $S$
$\mathrm{randn}()$: pseudorandom vector drawn from a zero-mean normal distribution with covariance matrix $I_m$
Repeat the procedure $r \cdot N_+$ times, given the over-sampling rate $r$
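
A sketch of the full procedure under the earlier naming; note that NumPy's np.linalg.cholesky returns the lower-triangular factor L with S = L L^T (the slide uses MATLAB's upper-triangular convention), so the noise is formed with L accordingly:

```python
import numpy as np

def oversample(X_pos, S, sigma, rate, rng=None):
    """Draw round(rate * N+) synthetic positives from the estimated PDF."""
    rng = np.random.default_rng() if rng is None else rng
    n_plus, m = X_pos.shape
    n_new = int(round(rate * n_plus))
    L = np.linalg.cholesky(S)                      # lower triangular, S = L @ L.T
    idx = rng.integers(0, n_plus, size=n_new)      # step 1: pick x_o uniformly
    noise = rng.standard_normal((n_new, m)) @ L.T  # step 2: N(0, S) perturbation
    return X_pos[idx] + sigma * noise              # x_n = x_o + sigma * (L @ z)
```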

11. Example (PDF estimate)
(a) Imbalanced data set, with × denoting positive-class instances and ◦ negative-class instances; $N_+ = 10$ positive-class samples with mean $[2\ 2]^T$ and covariance $I_2$; $N_- = 100$ negative-class samples with mean $[0\ 0]^T$ and covariance $I_2$
(b) Constructed PDF kernel of each positive-class instance; optimal smoothing parameter $\sigma = 1.25$ and covariance matrix $S \approx I_2$
(c) Estimated density distribution of the positive class
[Figure: panels (a) imbalanced data set, (b) positive-class kernels, (c) estimated positive-class density]
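
For illustration, the toy set can be regenerated as below, reusing best_sigma from the earlier sketch; the fitted sigma depends on the random draw, so it will only roughly match the sigma = 1.25 reported for the slide's realisation:

```python
import numpy as np

rng = np.random.default_rng(1)
# 10 positives around [2, 2] and 100 negatives around [0, 0], unit covariance
X_pos = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=10)
X_neg = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=100)

S = np.cov(X_pos, rowvar=False)  # unbiased positive-class covariance estimate
sigma = best_sigma(X_pos, S)     # grid search from the previous sketch
```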

12. Example (over-sampling)
Over-sampling rate: $r = 100\%$; ideal decision boundary: $x + y - 2 = 0$
(a) Proposed PDF estimation based over-sampling: the over-sampled positive-class data set expands along the direction of the ideal decision boundary
(b) Synthetic minority over-sampling technique (SMOTE): the over-sampled data set is confined to the region defined by the original positive-class instances
[Figure: panels (a) PDF estimation based over-sampling, (b) SMOTE]
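
For contrast, a minimal sketch of SMOTE's interpolation step (names are illustrative): each synthetic point lies on a chord between a positive sample and one of its k nearest positive neighbours, which is why the over-sampled set stays inside the region spanned by the original positives, unlike the PDF-based draw in panel (a).

```python
import numpy as np

def smote_sample(X_pos, k=5, rng=None):
    """One SMOTE synthetic point by linear interpolation between neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(len(X_pos))
    dists = np.linalg.norm(X_pos - X_pos[i], axis=1)
    neighbours = np.argsort(dists)[1:k + 1]        # k nearest positives, skip self
    j = rng.choice(neighbours)
    lam = rng.uniform()                            # interpolation factor in [0, 1]
    return X_pos[i] + lam * (X_pos[j] - X_pos[i])  # stays on the chord x_i -> x_j
```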

13. Outline (section transition: Tunable RBF Classifier Construction)

14. Tunable RBF Classifier
Construct a radial basis function classifier from the over-sampled training data, still denoted as $D_N = \{x_k, y_k\}_{k=1}^{N}$:
$$\hat{y}_k^{(M)} = \sum_{i=1}^{M} w_i\, g_i(x_k) = \mathbf{g}_M^T(k)\, \mathbf{w}_M \quad \text{and} \quad \tilde{y}_k^{(M)} = \mathrm{sgn}\big(\hat{y}_k^{(M)}\big)$$
1 $M$: number of tunable kernels; $\tilde{y}_k^{(M)}$: estimated class label
2 Gaussian kernel adopted: $g_i(x) = e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}$
3 $\mu_i \in \mathbb{R}^m$: $i$th RBF kernel centre vector
4 $\Sigma_i = \mathrm{diag}\{\sigma_{i,1}^2, \sigma_{i,2}^2, \cdots, \sigma_{i,m}^2\}$: $i$th covariance matrix
Regression model on the training data $D_N$:
$$\mathbf{y} = \mathbf{G}_M \mathbf{w}_M + \boldsymbol{\varepsilon}^{(M)}$$
1 Error vector $\boldsymbol{\varepsilon}^{(M)} = \big[\varepsilon_1^{(M)} \cdots \varepsilon_N^{(M)}\big]^T$ with $\varepsilon_k^{(M)} = y_k - \hat{y}_k^{(M)}$
2 $\mathbf{G}_M = [\mathbf{g}_1\ \mathbf{g}_2 \cdots \mathbf{g}_M]$: $N \times M$ regression matrix
3 $\mathbf{w}_M = [w_1 \cdots w_M]^T$: classifier's weight vector
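
A sketch of the classifier's decision rule with the diagonal-covariance kernel above (array names and shapes are my assumptions; learning the centres, widths and weights by PSO-aided OFR is beyond this sketch):

```python
import numpy as np

def rbf_predict(X, centres, widths, w):
    """Class labels sgn(y_hat) of the tunable RBF classifier.

    X: (N, m) inputs; centres: (M, m) kernel centres mu_i;
    widths: (M, m) diagonal variances of Sigma_i; w: (M,) weights.
    """
    diffs = X[:, None, :] - centres[None, :, :]            # (N, M, m)
    quad = np.sum(diffs ** 2 / widths[None, :, :], axis=2)
    G = np.exp(-quad)             # N x M regression matrix of g_i(x_k)
    y_hat = G @ w                 # weighted kernel sum
    return np.sign(y_hat), G
```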

15. Orthogonal Decomposition
Orthogonal decomposition of the regression matrix: $\mathbf{G}_M = \mathbf{P}_M \mathbf{A}_M$ with
$$\mathbf{A}_M = \begin{bmatrix} 1 & a_{1,2} & \cdots & a_{1,M} \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{M-1,M} \\ 0 & \cdots & 0 & 1 \end{bmatrix}$$
and $\mathbf{P}_M = [\mathbf{p}_1 \cdots \mathbf{p}_M]$ having orthogonal columns: $\mathbf{p}_i^T \mathbf{p}_j = 0$ for $i \neq j$
Equivalent regression model:
$$\mathbf{y} = \mathbf{G}_M \mathbf{w}_M + \boldsymbol{\varepsilon}^{(M)} \;\Leftrightarrow\; \mathbf{y} = \mathbf{P}_M \boldsymbol{\theta}_M + \boldsymbol{\varepsilon}^{(M)}$$
where $\boldsymbol{\theta}_M = [\theta_1 \cdots \theta_M]^T$ satisfies $\boldsymbol{\theta}_M = \mathbf{A}_M \mathbf{w}_M$
After the $n$th stage of orthogonal forward selection, $\mathbf{G}_n = [\mathbf{g}_1 \cdots \mathbf{g}_n]$ is built, with corresponding $\mathbf{P}_n = [\mathbf{p}_1 \cdots \mathbf{p}_n]$ and $\mathbf{A}_n$; the $k$th row of $\mathbf{P}_n$ is denoted $\mathbf{p}^T(k) = [p_1(k) \cdots p_n(k)]$
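
The factorisation can be computed with modified Gram-Schmidt; a small sketch (function name is illustrative) producing orthogonal columns in P and a unit-diagonal upper-triangular A with G = P A:

```python
import numpy as np

def orthogonal_decompose(G):
    """Modified Gram-Schmidt: G = P @ A, P has mutually orthogonal columns."""
    N, M = G.shape
    P = G.astype(float)           # astype copies, so G is left untouched
    A = np.eye(M)
    for j in range(M):
        for i in range(j):
            A[i, j] = P[:, i] @ P[:, j] / (P[:, i] @ P[:, i])
            P[:, j] -= A[i, j] * P[:, i]
    return P, A

# sanity check on a random regression matrix
G = np.random.default_rng(0).standard_normal((8, 4))
P, A = orthogonal_decompose(G)
assert np.allclose(G, P @ A)
assert np.allclose(P.T @ P, np.diag(np.diag(P.T @ P)))  # off-diagonals vanish
```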
