
SLIDE 1

Probability Density Function Estimation Based Over-Sampling for Imbalanced Two-Class Problems

Ming Gao (a), Xia Hong (a), Sheng Chen (b,c), Chris J. Harris (b)

(a) School of Systems Engineering, University of Reading, Reading RG6 6AY, UK
    ming.gao@pgr.reading.ac.uk, x.hong@reading.ac.uk

(b) Electronics and Computer Science, Faculty of Physical and Applied Sciences, University of Southampton, Southampton SO17 1BJ, UK
    sqc@ecs.soton.ac.uk, cjh@ecs.soton.ac.uk

(c) Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia

IEEE World Congress on Computational Intelligence, Brisbane, Australia, June 10-15, 2012

SLIDE 2

Outline

1. Introduction: Motivations and Solutions
2. PDF Estimation Based Over-sampling: Kernel Density Estimation; Over-sampling Procedure; Tunable RBF Classifier Construction
3. Experiments: Experimental Setup; Experimental Results
4. Conclusions: Concluding Remarks


SLIDE 4

Background

Highly imbalanced two-class classification problems occur widely in life-threatening or safety-critical applications.

Techniques for imbalanced problems can be divided into:
1. Imbalanced learning algorithms: internally modify existing algorithms, without artificially altering the original imbalanced data
2. Resampling methods: externally operate on the original imbalanced data set to re-balance the data for a conventional classifier

Resampling methods can be categorised into:
1. Under-sampling, which tends to be ideal when the imbalance degree is not very severe
2. Over-sampling, which becomes necessary if the imbalance degree is high

SLIDE 5

Our Approach

What would be ideal over-sampling: Draw synthetic data according to same probability distribution which produces observed positive-class data samples Our probability density function estimation based over-sampling

1

Construct Parzen window or kernel density estimation from

  • bserved positive-class data samples

2

Generate synthetic data samples according to estimated positive-class probability density function

3

Apply our tunable radial basis function classifier based on leave-one-out misclassification rate to rebalanced data Ready-made PW estimator is low complexity in this application, as minority-class by nature is small size Particle swarm optimisation aided OFR for constructing RBF classifier based on LOO error rate is a state-of-the-art


SLIDE 7

Problem Statement

Imbalanced two-class data set $D_N = \{x_k, y_k\}_{k=1}^{N}$:

$$D_N = D_{N_+} \cup D_{N_-} = \{x_i, y_i = +1\}_{i=1}^{N_+} \cup \{x_l, y_l = -1\}_{l=1}^{N_-}$$

1. $y_k \in \{\pm 1\}$: class label for feature vector $x_k \in \mathbb{R}^m$
2. The $x_k$ are i.i.d. draws from an unknown underlying PDF
3. $N = N_+ + N_-$, and $N_+ \ll N_-$

The kernel density estimate $\hat{p}(x)$ of $p(x)$ is constructed from the positive-class samples $D_{N_+} = \{x_i, y_i = +1\}_{i=1}^{N_+}$:

$$\hat{p}(x) = \frac{(\det S)^{-1/2}}{N_+} \sum_{i=1}^{N_+} \Phi_\sigma\big(S^{-1/2}(x - x_i)\big)$$

1. Kernel: $\Phi_\sigma\big(S^{-1/2}(x - x_i)\big) = \frac{\sigma^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2\sigma^2}(x - x_i)^T S^{-1}(x - x_i)}$
2. $S$: covariance matrix of the positive class
3. $\sigma$: smoothing parameter
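For concreteness, here is a minimal NumPy sketch of evaluating this estimator. The function name and array layout are our own; the covariance $S$ is estimated from the samples exactly as defined on the next slide.

```python
import numpy as np

def parzen_pdf(X_pos, sigma, x):
    """Evaluate the positive-class Parzen-window estimate p_hat(x).

    X_pos : (N_plus, m) array of observed positive-class samples
    sigma : smoothing parameter
    x     : (m,) query point
    """
    N_plus, m = X_pos.shape
    S = np.cov(X_pos, rowvar=False)                      # unbiased positive-class covariance
    S_inv = np.linalg.inv(S)
    diff = X_pos - x                                     # rows are x_i - x (sign is immaterial)
    quad = np.einsum('ij,jk,ik->i', diff, S_inv, diff)   # (x - x_i)^T S^{-1} (x - x_i)
    kernels = sigma**(-m) / (2 * np.pi)**(m / 2) * np.exp(-0.5 * quad / sigma**2)
    return np.linalg.det(S)**(-0.5) / N_plus * kernels.sum()
```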

SLIDE 8

Kernel Parameter Estimate

The unbiased estimate of the positive-class covariance matrix is

$$S = \frac{1}{N_+ - 1} \sum_{i=1}^{N_+} (x_i - \bar{x})(x_i - \bar{x})^T$$

with the positive-class mean vector $\bar{x} = \frac{1}{N_+} \sum_{i=1}^{N_+} x_i$.

The smoothing parameter is found by grid search to minimise the score function

$$M(\sigma) = N_+^{-2} \sum_i \sum_j \Phi^*_\sigma\big(S^{-1/2}(x_j - x_i)\big) + 2 N_+^{-1}\, \Phi_\sigma(0)$$

with

$$\Phi^*_\sigma\big(S^{-1/2}(x_j - x_i)\big) \approx \Phi^{(2)}_\sigma\big(S^{-1/2}(x_j - x_i)\big) - 2\,\Phi_\sigma\big(S^{-1/2}(x_j - x_i)\big)$$

$$\Phi^{(2)}_\sigma\big(S^{-1/2}(x_j - x_i)\big) = \frac{(\sqrt{2}\,\sigma)^{-m}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2(\sqrt{2}\sigma)^2}(x_j - x_i)^T S^{-1}(x_j - x_i)}$$

$M(\sigma)$ is based on the mean integrated square error (MISE) measure.
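A hedged NumPy sketch of this grid search follows; the names are ours, and the double sum includes the i = j diagonal terms exactly as the score is written above.

```python
import numpy as np

def select_sigma(X_pos, sigma_grid):
    """Grid-search the smoothing parameter minimising the MISE-based score M(sigma)."""
    N_plus, m = X_pos.shape
    S_inv = np.linalg.inv(np.cov(X_pos, rowvar=False))
    D = X_pos[:, None, :] - X_pos[None, :, :]            # pairwise differences x_j - x_i
    quad = np.einsum('ijk,kl,ijl->ij', D, S_inv, D)      # (x_j - x_i)^T S^{-1} (x_j - x_i)

    def phi(q, s):                                       # Gaussian kernel on quadratic form q
        return s**(-m) / (2 * np.pi)**(m / 2) * np.exp(-0.5 * q / s**2)

    scores = []
    for s in sigma_grid:
        phi_star = phi(quad, np.sqrt(2) * s) - 2 * phi(quad, s)   # Phi* = Phi^(2) - 2 Phi
        scores.append(phi_star.sum() / N_plus**2 + 2 * phi(0.0, s) / N_plus)
    return sigma_grid[int(np.argmin(scores))]
```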


SLIDE 10

Draw Synthetic Samples

Over-sample the positive class by drawing synthetic data samples according to the PDF estimate $\hat{p}(x)$.

Procedure for generating a synthetic sample:
1. Based on the discrete uniform distribution, randomly draw a data sample $x_o$ from the positive-class data set $D_{N_+}$
2. Generate a synthetic data sample $x_n$ from the Gaussian distribution with mean $x_o$ and covariance matrix $\sigma^2 S$:
$$x_n = x_o + \sigma R \cdot \mathrm{randn}()$$
   - $R$: upper triangular Cholesky factor of $S$
   - $\mathrm{randn}()$: pseudorandom vector drawn from the zero-mean normal distribution with covariance matrix $I_m$

Repeat the procedure $r \cdot N_+$ times, given the over-sampling rate $r$ (see the sketch below).
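The procedure maps directly onto a few lines of NumPy. This is a sketch under our own naming; we use the lower-triangular Cholesky factor L (the transpose of the slide's R), so the noise covariance is sigma^2 L L^T = sigma^2 S.

```python
import numpy as np

def pdfos_oversample(X_pos, sigma, rate, rng=None):
    """Draw round(rate * N_plus) synthetic samples from the estimated positive-class PDF."""
    rng = np.random.default_rng() if rng is None else rng
    N_plus, m = X_pos.shape
    S = np.cov(X_pos, rowvar=False)                 # positive-class covariance estimate
    L = np.linalg.cholesky(S)                       # S = L L^T (L is the slide's R transposed)
    n_new = int(round(rate * N_plus))
    idx = rng.integers(0, N_plus, size=n_new)       # step 1: uniform draw of x_o
    Z = rng.standard_normal((n_new, m))             # step 2: zero-mean, unit-covariance noise
    return X_pos[idx] + sigma * Z @ L.T             # x_n = x_o + sigma * L z

# Calling this with rate=1.0 corresponds to r = 100%, i.e. doubling the positive class.
```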

SLIDE 11

Example (PDF estimate)

[Figure: three panels]
(a) Imbalanced data set, with x denoting positive-class instances and o negative-class instances: $N_+ = 10$ positive-class samples with mean $[2\ 2]^T$ and covariance $I_2$; $N_- = 100$ negative-class samples with mean $[0\ 0]^T$ and covariance $I_2$
(b) Constructed PDF kernel for each positive-class instance: optimal smoothing parameter $\sigma = 1.25$ and covariance matrix $S \approx I_2$
(c) Estimated density of the positive class

SLIDE 12

Example (over-sampling)

Over-sampling rate $r = 100\%$; ideal decision boundary $x + y - 2 = 0$.

[Figure: two panels]
(a) Proposed PDF-estimate based over-sampling: the over-sampled positive-class data set expands along the direction of the ideal decision boundary
(b) Synthetic minority over-sampling technique (SMOTE): the over-sampled data set is confined to the region defined by the original positive-class instances


SLIDE 14

Tunable RBF Classifier

Construct a radial basis function classifier from the over-sampled training data, still denoted as $D_N = \{x_k, y_k\}_{k=1}^{N}$:

$$\hat{y}_k^{(M)} = \sum_{i=1}^{M} w_i\, g_i(x_k) = g_M^T(k)\, w_M, \qquad \tilde{y}_k^{(M)} = \mathrm{sgn}\big(\hat{y}_k^{(M)}\big)$$

1. $M$: number of tunable kernels; $\tilde{y}_k^{(M)}$: estimated class label
2. Gaussian kernel adopted: $g_i(x) = e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}$
3. $\mu_i \in \mathbb{R}^m$: $i$th RBF kernel centre vector
4. $\Sigma_i = \mathrm{diag}\{\sigma_{i,1}^2, \sigma_{i,2}^2, \cdots, \sigma_{i,m}^2\}$: $i$th covariance matrix

Regression model on the training data $D_N$:

$$y = G_M w_M + \varepsilon^{(M)}$$

1. $\varepsilon^{(M)} = \big[\varepsilon_1^{(M)} \cdots \varepsilon_N^{(M)}\big]^T$ with errors $\varepsilon_k^{(M)} = y_k - \hat{y}_k^{(M)}$
2. $G_M = [g_1\ g_2 \cdots g_M]$: $N \times M$ regression matrix
3. $w_M = [w_1 \cdots w_M]^T$: classifier's weight vector
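As an illustration, a small NumPy sketch of forming the regression matrix G_M and the classifier outputs for given centres, widths and weights (all argument names are ours, and the inputs are placeholders):

```python
import numpy as np

def rbf_classifier(X, centres, widths, w):
    """Regression matrix G and outputs of the tunable RBF classifier.

    centres : (M, m) kernel centre vectors mu_i
    widths  : (M, m) per-dimension variances (diagonals of the Sigma_i)
    w       : (M,) weight vector
    """
    diff = X[:, None, :] - centres[None, :, :]                 # (N, M, m) differences x_k - mu_i
    G = np.exp(-np.sum(diff**2 / widths[None, :, :], axis=2))  # g_i(x_k) with diagonal Sigma_i
    y_soft = G @ w                                             # y_hat_k^(M)
    return G, np.sign(y_soft)                                  # y_tilde_k^(M) = sgn(y_hat_k^(M))
```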

SLIDE 15

Orthogonal Decomposition

Orthogonal decomposition of the regression matrix: $G_M = P_M A_M$ with

$$A_M = \begin{bmatrix} 1 & a_{1,2} & \cdots & a_{1,M} \\ & 1 & \ddots & \vdots \\ & & \ddots & a_{M-1,M} \\ & & & 1 \end{bmatrix}$$

and $P_M = [p_1 \cdots p_M]$ having orthogonal columns: $p_i^T p_j = 0$ for $i \neq j$.

Equivalent regression model:

$$y = G_M w_M + \varepsilon^{(M)} \iff y = P_M \theta_M + \varepsilon^{(M)}$$

where $\theta_M = [\theta_1 \cdots \theta_M]^T$ satisfies $\theta_M = A_M w_M$.

After the $n$th stage of orthogonal forward selection, $G_n = [g_1 \cdots g_n]$ is built, with corresponding $P_n = [p_1 \cdots p_n]$ and $A_n$. The $k$th row of $P_n$ is denoted $p^T(k) = [p_1(k) \cdots p_n(k)]$.
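A minimal batch Gram-Schmidt sketch of this factorisation (our own illustration, not the paper's code; in the OFS setting the columns arrive one at a time, so only the inner projection loop runs at each stage):

```python
import numpy as np

def orthogonal_decomposition(G):
    """Factor G = P A with A unit upper triangular and P having orthogonal columns."""
    N, M = G.shape
    P = G.astype(float).copy()
    A = np.eye(M)
    for j in range(M):
        for i in range(j):                               # project out p_1, ..., p_{j-1}
            A[i, j] = (P[:, i] @ P[:, j]) / (P[:, i] @ P[:, i])
            P[:, j] = P[:, j] - A[i, j] * P[:, i]
    return P, A
```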
SLIDE 16

OFS-LOO

Leave-one-out (LOO) misclassification rate:

$$J_{LOO}^{(n)} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{Id}\big(s_k^{(n,-k)}\big)$$

Indicator function: $\mathrm{Id}(s) = 1$ if $s \le 0$ and $\mathrm{Id}(s) = 0$ if $s > 0$.

LOO signed decision variable:

$$s_k^{(n,-k)} = y_k\, \hat{y}_k^{(n,-k)} = \frac{\psi_k^{(n)}}{\eta_k^{(n)}}$$

with the recursions

$$\psi_k^{(n)} = \psi_k^{(n-1)} + y_k \theta_n p_n(k) - \frac{p_n^2(k)}{p_n^T p_n + \lambda}, \qquad \eta_k^{(n)} = \eta_k^{(n-1)} - \frac{p_n^2(k)}{p_n^T p_n + \lambda}$$

Determine the $n$th RBF centre vector and covariance matrix:

$$\{\mu_n, \Sigma_n\}_{\mathrm{opt}} = \arg\min_{\mu, \Sigma} J_{LOO}^{(n)}(\mu, \Sigma)$$

1. Particle swarm optimisation (PSO) solves this optimisation
2. The OFS procedure automatically terminates at size $M$ when $J_{LOO}^{(M+1)} \ge J_{LOO}^{(M)}$
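A sketch of evaluating J_LOO^(n) with these recursions. The initialisations psi_k^(0) = 0 and eta_k^(0) = 1 are our assumption (the slide does not state them), and lam is the regularisation parameter lambda:

```python
import numpy as np

def loo_rates(y, P, theta, lam=1e-6):
    """LOO misclassification rate J_LOO^(n) after each OFS stage, via the recursions above."""
    N, n_stages = P.shape
    psi = np.zeros(N)                        # assumed psi_k^(0) = 0
    eta = np.ones(N)                         # assumed eta_k^(0) = 1
    J = []
    for n in range(n_stages):
        pn = P[:, n]
        kappa = pn**2 / (pn @ pn + lam)      # p_n^2(k) / (p_n^T p_n + lambda)
        psi += y * theta[n] * pn - kappa
        eta -= kappa
        J.append(np.mean(psi / eta <= 0.0))  # Id(s) = 1 when s <= 0
    return J                                 # terminate once J[n+1] >= J[n]
```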


SLIDE 18

Data Sets

Data set            | m  | N+  | N-   | ID    | n-fold CV | σ
--------------------|----|-----|------|-------|-----------|------------
Pima Diabetes       | 7  | 268 | 500  | 1.87  | 10        | 0.47 ± 0.03
Haberman's survival | 2  | 81  | 225  | 2.78  | 3         | 0.52 ± 0.03
Glass(6)            | 8  | 29  | 185  | 6.38  | 3         | 0.42 ± 0.06
ADI                 | 8  | 90  | 700  | 7.78  | 8         | 0.56 ± 0.07
Satimage(4)         | 35 | 626 | 5809 | 9.28  | 10        | 0.90 ± 0.00
Yeast(5)            | 7  | 44  | 1440 | 32.73 | 3         | 0.10 ± 0.00

1. Glass, Satimage and Yeast are turned into two-class problems, using the class whose label is given in brackets as the positive class and all other classes together as the negative class
2. Imbalance degree: $ID = N_-/N_+$
3. Each dimension of the feature vector $x_k = [x_{k,1} \cdots x_{k,m}]^T$ is normalised (see the sketch after this list) as $$\bar{x}_{k,i} = \frac{x_{k,i} - x_{\min,i}}{x_{\max,i} - x_{\min,i}}, \quad 1 \le k \le N,\ 1 \le i \le m$$ with $x_{\min,i} = \min_{1 \le k \le N} x_{k,i}$ and $x_{\max,i} = \max_{1 \le k \le N} x_{k,i}$
4. The mean and standard deviation of the smoothing parameter $\sigma$, determined by the PW estimator for the positive class and averaged over the n-fold CV, are listed in the last column
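The normalisation in note 3 is plain min-max scaling; a one-function sketch:

```python
import numpy as np

def minmax_normalise(X):
    """Scale each feature dimension of X (N x m) to [0, 1] as in note 3."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)   # assumes no constant feature column
```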

SLIDE 19

Benchmark Algorithms

1. PDFOS+PSO-OFS: the proposed PDF estimation based over-sampling with the PSO-OFS based tunable RBF classifier
2. SMOTE+PSO-OFS: SMOTE based over-sampling with the same PSO-OFS based tunable RBF classifier
   - M. Gao, X. Hong, S. Chen, and C. J. Harris, "A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems," Neurocomputing, 74(17), 3456-3466, 2011
3. LOO-AUC+OFS: OFS based on the LOO-AUC criterion for an RBF classifier with a weighted least squares cost function
   - X. Hong, S. Chen, and C. J. Harris, "A kernel-based two-class classifier for imbalanced data sets," IEEE Trans. Neural Networks, 18(1), 28-41, 2007
4. κ-means+WLSE: κ-means clustering for the RBF centres and the same weighted least squares cost function for the RBF weights

Algorithms 1 and 2 are tuned by the over-sampling rate r; algorithms 3 and 4 by the weighting ρ.

SLIDE 20

Performance Metrics

1. AUC: area under the receiver operating characteristic (ROC) curve
2. G-mean: $\text{G-mean} = \sqrt{TP\% \times (1 - FP\%)}$
   - True positive rate: $TP\% = \frac{TP}{TP + FN}$
   - False positive rate: $FP\% = \frac{FP}{FP + TN}$
   - Precision: $\mathrm{Pr} = \frac{TP}{TP + FP}$
3. F-measure: $\text{F-measure} = \frac{2 \times \mathrm{Pr} \times TP\%}{\mathrm{Pr} + TP\%}$ (computed, with the G-mean, in the sketch below)
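A short sketch of computing the G-mean and F-measure from hard labels in {-1, +1}; the AUC additionally needs the soft classifier outputs, so it is omitted here:

```python
import numpy as np

def gmean_fmeasure(y_true, y_pred):
    """G-mean and F-measure for two-class labels in {-1, +1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    tp_rate = tp / (tp + fn)                       # TP%
    fp_rate = fp / (fp + tn)                       # FP%
    precision = tp / (tp + fp)                     # Pr
    g_mean = np.sqrt(tp_rate * (1.0 - fp_rate))
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return g_mean, f_measure
```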


SLIDE 22

ROC Curves

Mean curves of (FP rate, TP rate) pairs, averaged over the n-fold CV, obtained for different over-sampling rates r of SMOTE+PSO-OFS and PDFOS+PSO-OFS, or different weightings ρ of LOO-AUC+OFS and κ-means+WLSE.

[Figure: six ROC panels] (a) Pima Indians diabetes, (b) Haberman's survival, (c) Glass, (d) ADI, (e) Satimage, (f) Yeast

SLIDE 23

AUC Metric

Comparison of mean and standard deviation of AUCs:

Data set            | LOO-AUC+OFS | κ-means+WLSE | SMOTE+PSO-OFS | PDFOS+PSO-OFS
--------------------|-------------|--------------|---------------|---------------
Pima Diabetes       | 0.77 ± 0.06 | 0.80 ± 0.06  | 0.82 ± 0.06   | 0.84 ± 0.06
Haberman's survival | 0.68 ± 0.06 | 0.62 ± 0.06  | 0.71 ± 0.06   | 0.74 ± 0.06
Glass(6)            | 0.94 ± 0.05 | 0.93 ± 0.06  | 0.92 ± 0.06   | 0.97 ± 0.04
ADI                 | 0.82 ± 0.03 | 0.82 ± 0.03  | 0.82 ± 0.03   | 0.83 ± 0.03
Satimage(4)         | 0.88 ± 0.03 | 0.88 ± 0.03  | 0.91 ± 0.03   | 0.91 ± 0.03
Yeast(5)            | 0.93 ± 0.04 | 0.98 ± 0.02  | 0.97 ± 0.03   | 0.98 ± 0.02

SLIDE 24

G-Means

G-mean metrics with respect to the over-sampling rate r of SMOTE+PSO-OFS and PDFOS+PSO-OFS, or the weighting ρ of LOO-AUC+OFS and κ-means+WLSE, averaged over the n-fold CV.

[Figure: six panels] (a) Pima Indians diabetes, (b) Haberman's survival, (c) Glass, (d) ADI, (e) Satimage, (f) Yeast

SLIDE 25

Best G-Means

Comparison of mean and standard deviation of the best G-means; the ρ or r achieving the best value is given in parentheses:

Data set            | LOO-AUC+OFS (ρ)          | κ-means+WLSE (ρ)   | SMOTE+PSO-OFS (r)          | PDFOS+PSO-OFS (r)
--------------------|--------------------------|--------------------|----------------------------|---------------------------
Pima Diabetes       | 0.74 ± 0.04 (2.0)        | 0.75 ± 0.06 (2.5)  | 0.76 ± 0.05 (100%)         | 0.78 ± 0.05 (100%)
Haberman's survival | 0.67 ± 0.05 (3.0)        | 0.57 ± 0.07 (4.0)  | 0.69 ± 0.08 (200%)         | 0.69 ± 0.02 (400%)
Glass(6)            | 0.93 ± 0.03 (3.0, 6.0)   | 0.95 ± 0.02 (8.0)  | 0.95 ± 0.06 (600%)         | 0.97 ± 0.04 (600%)
ADI                 | 0.76 ± 0.01 (15.0)       | 0.77 ± 0.02 (10.0) | 0.76 ± 0.02 (1000%, 1500%) | 0.77 ± 0.01 (800%, 1000%)
Satimage(4)         | 0.85 ± 0.03 (8.0)        | 0.84 ± 0.02 (10.0) | 0.86 ± 0.01 (1000%)        | 0.86 ± 0.02 (600%)
Yeast(5)            | 0.92 ± 0.09 (27.0, 30.0) | 0.97 ± 0.01 (18.0) | 0.98 ± 0.00 (2700%)        | 0.98 ± 0.01 (900%)

SLIDE 26

F-Measures

F-measure metrics with respect to the over-sampling rate r of SMOTE+PSO-OFS and PDFOS+PSO-OFS, or the weighting ρ of LOO-AUC+OFS and κ-means+WLSE, averaged over the n-fold CV.

[Figure: six panels] (a) Pima Indians diabetes, (b) Haberman's survival, (c) Glass, (d) ADI, (e) Satimage, (f) Yeast

SLIDE 27

Best F-Measures

Comparison of mean and standard deviation of the best F-measures; the ρ or r achieving the best value is given in parentheses:

Data set            | LOO-AUC+OFS (ρ)         | κ-means+WLSE (ρ)        | SMOTE+PSO-OFS (r)  | PDFOS+PSO-OFS (r)
--------------------|-------------------------|-------------------------|--------------------|--------------------------
Pima Diabetes       | 0.67 ± 0.05 (2.0)       | 0.68 ± 0.06 (2.5)       | 0.70 ± 0.04 (100%) | 0.71 ± 0.06 (100%)
Haberman's survival | 0.52 ± 0.06 (3.0)       | 0.44 ± 0.11 (4.0)       | 0.55 ± 0.09 (200%) | 0.54 ± 0.03 (200%, 400%)
Glass(6)            | 0.87 ± 0.03 (3.0)       | 0.89 ± 0.02 (8.0)       | 0.92 ± 0.07 (900%) | 0.95 ± 0.01 (100%, 200%)
ADI                 | 0.42 ± 0.01 (10.0)      | 0.42 ± 0.02 (5.0, 10.0) | 0.43 ± 0.02 (300%) | 0.45 ± 0.03 (300%)
Satimage(4)         | 0.58 ± 0.03 (3.0)       | 0.55 ± 0.05 (2.0)       | 0.58 ± 0.06 (200%) | 0.57 ± 0.05 (200%)
Yeast(5)            | 0.59 ± 0.08 (9.0, 12.0) | 0.61 ± 0.03 (3.0)       | 0.59 ± 0.03 (600%) | 0.63 ± 0.10 (600%)


SLIDE 29

Summary

Our over-sampling method re-balances the skewed class distribution according to the statistical information in the original observed data:
1. A Parzen window density estimate is built from the observed positive-class data samples
2. Synthetic samples are drawn according to the estimated PDF to re-balance the data

A tunable RBF classifier is then constructed on the rebalanced data set using the efficient PSO aided OFS procedure, which is state-of-the-art for balanced classification problems.

Experimental results demonstrate that our approach offers a very competitive technique, comparing favourably with many existing state-of-the-art methods for dealing with highly imbalanced problems.