Bayesian Kernel Methods for Non-Gaussian Distributions
Cameron MacKenzie and Theodore Trafalis
School of Industrial Engineering, University of Oklahoma
INFORMS Annual Meeting, November 9, 2010
2 MacKenzie and Trafalis
Current Bayesian Kernel methods
- Combine Bayesian probability with Support Vector Machines (SVM)
- n data points, m attributes
- X is n x m matrix
- y is n x 1 vector of 0’s and 1’s
- θ(X) is a function of X used to predict y
$$P(\theta \mid X, y) = \frac{P(\theta \mid X)\, P(y \mid X, \theta)}{P(y \mid X)}$$
Posterior: P(θ | X, y); prior: P(θ | X); likelihood: P(y | X, θ)
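To make the roles of the three terms concrete, here is a minimal sketch that applies Bayes' rule to a single unknown probability θ placed on a coarse grid, rather than to a function θ(X) as on the slide. The grid, the uniform prior, the Bernoulli observations, and the name `grid_posterior` are illustrative assumptions. The point is only that the posterior is prior × likelihood, divided by the evidence P(y).

```python
def grid_posterior(grid, prior, ys):
    """Normalized posterior weights P(theta | y) over a grid of theta values.

    Hypothetical illustration: theta is one probability, ys are 0/1 outcomes.
    """
    unnormalized = []
    for theta, p in zip(grid, prior):
        likelihood = 1.0
        for y in ys:
            likelihood *= theta if y == 1 else (1.0 - theta)
        unnormalized.append(p * likelihood)  # prior x likelihood
    evidence = sum(unnormalized)  # P(y), the normalizing constant
    return [u / evidence for u in unnormalized]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.2] * 5  # uniform prior over the grid
post = grid_posterior(grid, prior, [1, 1, 0])
print([round(p, 3) for p in post])
```

With two ones and one zero observed, the posterior mass shifts toward θ = 0.7.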
Support Vector Machines and the idea of kernel methods
[Figure: the map Φ takes points from the input space to the feature space]
$$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$$
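The kernel identity can be checked numerically. This sketch uses the homogeneous quadratic kernel K(x, z) = (x · z)² on 2-D inputs, whose explicit feature map is Φ(x) = (x₁², √2 x₁x₂, x₂²). The kernel choice and the function names are assumptions for illustration only; the deck itself later uses a Gaussian kernel, whose feature space is infinite-dimensional.

```python
import math

def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in input space without forming phi."""
    return inner(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z), inner(phi(x), phi(z)))
```

Both values agree up to floating-point rounding, which is the point of the kernel trick: the classifier only needs K, never Φ itself.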
Gaussian distributions
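$$P(\theta \mid X, y) \propto P(\theta \mid X)\, P(y \mid X, \theta)$$
Posterior ∝ prior × likelihood
- Logistic likelihood:
$$P(y_i = 1 \mid \theta(x_i)) = \frac{\exp(\theta(x_i))}{1 + \exp(\theta(x_i))}$$
- Normal prior: θ(X) = (θ(x_1), ..., θ(x_n)) is multivariate normal with mean E[θ(X)] and covariance
$$\operatorname{cov}(\theta(x_1), \ldots, \theta(x_n)) = K,$$
the n × n kernel matrix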
Refs: Schölkopf and Smola, 2002; Bishop and Tipping, 2003
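The Gaussian model can be sketched by evaluating the unnormalized log posterior, log P(θ | X) + log P(y | X, θ), for a candidate vector θ = (θ(x_1), ..., θ(x_n)). This is a sketch under stated assumptions, not the inference procedure of the cited references. The slide does not give the prior mean, so a zero mean is assumed here. The Gaussian (RBF) kernel, the diagonal jitter, and all function names are likewise assumptions. It uses only the standard library, with a small Cholesky factorization for the normal log-density.

```python
import math

def gaussian_kernel_matrix(xs, sigma):
    """n x n matrix with entries exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed kernel)."""
    n = len(xs)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def log_normal_prior(theta, K, jitter=1e-8):
    """log N(theta; 0, K). The zero mean is an assumption of this sketch."""
    n = len(theta)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    A = [[K[i][j] + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(A)
    # Forward substitution solves L z = theta, so z.z = theta^T A^{-1} theta.
    z = []
    for i in range(n):
        z.append((theta[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))

def log_sigmoid(t):
    """Numerically stable log(exp(t) / (1 + exp(t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def log_logistic_likelihood(theta, ys):
    """Sum over i of log P(y_i | theta(x_i)) under the logistic likelihood."""
    # P(y = 0 | t) = 1 - sigmoid(t) = sigmoid(-t)
    return sum(log_sigmoid(t) if y == 1 else log_sigmoid(-t) for t, y in zip(theta, ys))

def log_unnormalized_posterior(theta, K, ys):
    """log prior + log likelihood; differs from the log posterior by log P(y | X)."""
    return log_normal_prior(theta, K) + log_logistic_likelihood(theta, ys)

xs = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
ys = [0, 0, 1]
K = gaussian_kernel_matrix(xs, sigma=1.0)
print(log_unnormalized_posterior([-1.0, -1.0, 1.0], K, ys))
```

Because the logistic likelihood is not conjugate to the normal prior, the posterior has no closed form. Methods in this family rely on approximations such as the Laplace approximation around the maximizer of this function.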
What’s new
- Beta distributions as priors
- Adaptation of beta-binomial updating formula
- Comparison of beta kernel classifiers with existing SVM classifiers
- Online learning
Beta distribution
$$\theta \sim \mathrm{Beta}(\alpha, \beta)$$
$$E[\theta] = \frac{\alpha}{\alpha + \beta}$$
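The mean formula can be written as a small helper, here alongside the standard beta variance αβ / ((α + β)²(α + β + 1)). The variance and both function names are additions for illustration and do not come from the slide. For instance, Beta(0.7, 9.3), one of the priors in the online-learning tables, has mean 0.07.

```python
def beta_mean(alpha, beta):
    """E[theta] for theta ~ Beta(alpha, beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Var[theta] = alpha * beta / ((alpha + beta)^2 (alpha + beta + 1))."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))

print(beta_mean(0.7, 9.3), beta_variance(0.7, 9.3))
print(beta_mean(1.0, 1.0), beta_variance(1.0, 1.0))
```

Larger α + β with the same ratio keeps the mean fixed but shrinks the variance, so α + β acts like a prior sample size.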
Shape of beta density functions
[Figure: beta densities over θ ∈ [0, 1] for Beta(1,1), Beta(3,3), Beta(5,1), Beta(10,10), Beta(2,6), and Beta(15,6)]
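The six panels can be reproduced numerically from the beta density f(θ) = θ^(α−1)(1 − θ)^(β−1) / B(α, β). This minimal sketch computes B(α, β) through `math.lgamma` to avoid overflow. The evaluation grid and the name `beta_pdf` are illustrative choices.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x in (0, 1), using log-gamma for stability."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x) - log_b)

# The six panels from the slide, evaluated on a coarse grid of theta values.
panels = [(1, 1), (3, 3), (5, 1), (10, 10), (2, 6), (15, 6)]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for a, b in panels:
    print(f"Beta({a},{b}):", [round(beta_pdf(x, a, b), 3) for x in grid])
```

Beta(1,1) is flat, and equal parameters give a symmetric density that narrows as they grow. Beta(5,1) piles mass near 1, while Beta(2,6) piles it near 0.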
Beta-binomial conjugate
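- Prior: $$\theta \sim \mathrm{Beta}(\alpha, \beta)$$
- Likelihood: $$Y \sim \mathrm{Binomial}(n, \theta)$$
- Posterior: $$\theta \mid Y = y \sim \mathrm{Beta}(\alpha + y,\; \beta + n - y)$$
where y is the number of ones, n is the number of trials, and n − y is the number of zeros.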
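The conjugate update amounts to adding counts: ones go to α and zeros go to β. A minimal sketch follows, with the function name as an illustrative assumption. Because the update is additive, processing the data in two batches gives the same posterior as processing it all at once.

```python
def beta_binomial_update(alpha, beta, ones, trials):
    """Posterior Beta parameters after `ones` successes in `trials` Bernoulli trials."""
    if not 0 <= ones <= trials:
        raise ValueError("need 0 <= ones <= trials")
    zeros = trials - ones
    return alpha + ones, beta + zeros

a, b = beta_binomial_update(1.0, 1.0, ones=7, trials=10)
print(a, b, a / (a + b))  # Beta(8, 4), posterior mean 8/12
```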
Applying beta-binomial to data mining
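- Prior:
$$\theta(x_i) \sim \mathrm{Beta}(\alpha_i, \beta_i)$$
- Posterior:
$$\theta(x_i) \mid y \sim \mathrm{Beta}\Big(\alpha_i + \sum_{j:\, y_j = 1} K(x_i, x_j),\; \beta_i + \sum_{j:\, y_j = 0} K(x_i, x_j)\Big)$$
- Gaussian kernel:
$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
The slide also marks two quantities in these formulas: the number of zeros in the training set, and a parameter to be tuned.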
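The update above replaces integer counts with kernel similarities. Each training point contributes a fraction of a "one" or a "zero" to the query point, depending on how close it is. Below is a minimal sketch of the resulting classifier. It predicts 1 when the posterior mean E[θ(x) | y] reaches a threshold. The threshold of 0.5, the multiplier `zero_weight` on the zeros' contributions, and all function names are hypothetical. The slides mention weights for imbalanced data, but their exact weighting is not recoverable here.

```python
import math

def gaussian_kernel(xi, xj, sigma):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))

def kernel_beta_posterior(x, alpha, beta, X_train, y_train, sigma, zero_weight=1.0):
    """Beta parameters for theta(x) after the kernel-weighted update.

    Each training point adds K(x, x_j) to alpha if y_j = 1 and
    zero_weight * K(x, x_j) to beta if y_j = 0. zero_weight is a hypothetical
    knob for imbalanced data, not a parameter taken from the slides.
    """
    a, b = alpha, beta
    for xj, yj in zip(X_train, y_train):
        k = gaussian_kernel(x, xj, sigma)
        if yj == 1:
            a += k
        else:
            b += zero_weight * k
    return a, b

def predict(x, alpha, beta, X_train, y_train, sigma, zero_weight=1.0, threshold=0.5):
    """Return (label, E[theta(x) | y]); label is 1 when the posterior mean reaches threshold."""
    a, b = kernel_beta_posterior(x, alpha, beta, X_train, y_train, sigma, zero_weight)
    mean = a / (a + b)
    return (1 if mean >= threshold else 0), mean

X_train = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
y_train = [0, 0, 1, 1]
print(predict((3.0, 3.1), 1.0, 1.0, X_train, y_train, sigma=1.0))
print(predict((0.1, 0.0), 1.0, 1.0, X_train, y_train, sigma=1.0))
```

No optimization is involved, only one pass of kernel evaluations per query. This is what makes the approach fast relative to training an SVM. In this sketch σ, the prior, and the threshold would be chosen on the tuning set.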
Data sets
Data set | Number of attributes | Number of data points | Ones | Zeros | Training set | Tuning set | Testing set
Parkinson | 22 | 195 | 147 | 48 | 98 | 58 | 39
Tornado | 83 | 10,816 | 721 | 10,095 | 541 | 271 | 541
Colon Cancer | 2,000 | 62 | 22 | 40 | 31 | 19 | 12
Spam | 57 | 4,601 | 1,813 | 2,788 | 460 | 230 | 460
Transfusion | 4 | 748 | 178 | 570 | 150 | 74 | 524
Each training, tuning, and testing set is randomly sampled 100 times.
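The class balance of each data set follows from the table: the percentage of ones is Ones / Number of data points. The sampling protocol can be sketched as 100 draws of disjoint index sets. The counts below are copied from the table. The helper `random_split` is hypothetical, and it uses plain (unstratified) sampling because the slides do not say whether sampling was stratified.

```python
import random

# Counts copied from the data-set table: (data points, ones, train, tune, test)
DATA_SETS = {
    "Parkinson": (195, 147, 98, 58, 39),
    "Tornado": (10816, 721, 541, 271, 541),
    "Colon Cancer": (62, 22, 31, 19, 12),
    "Spam": (4601, 1813, 460, 230, 460),
    "Transfusion": (748, 178, 150, 74, 524),
}

def percent_ones(name):
    """Percentage of ones (positive class) in the full data set."""
    total, ones = DATA_SETS[name][:2]
    return 100.0 * ones / total

def random_split(n_points, n_train, n_tune, n_test, rng):
    """Draw disjoint train/tune/test index lists (hypothetical, unstratified)."""
    if n_train + n_tune + n_test > n_points:
        raise ValueError("split sizes exceed number of points")
    idx = rng.sample(range(n_points), n_train + n_tune + n_test)
    return idx[:n_train], idx[n_train:n_train + n_tune], idx[n_train + n_tune:]

rng = random.Random(0)
for name, (total, ones, tr, tu, te) in DATA_SETS.items():
    splits = [random_split(total, tr, tu, te, rng) for _ in range(100)]  # 100 repetitions
    print(f"{name}: {percent_ones(name):.0f}% ones, {len(splits)} random splits")
```

The computed percentages round to the values reported on the testing slide (75%, 7%, 35%, 39%, 24%).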
Testing on data sets
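Data set | Percentage of ones in data set | Rate (%) | Beta prior | Weighted SVM | Regular SVM
Parkinson | 75% | TP rate | 86 | 91 | 98
Parkinson | 75% | TN rate | 95 | 76 | 75
Tornado | 7% | TP rate | 80 | 87 | 59
Tornado | 7% | TN rate | 97 | 91 | 99
Colon Cancer | 35% | TP rate | 87 | 78 | 77
Colon Cancer | 35% | TN rate | 85 | 93 | 95
Spam | 39% | TP rate | 85 | 85 | 85
Spam | 39% | TN rate | 85 | 93 | 95
Transfusion | 24% | TP rate | 71 | 69 | 24
Transfusion | 24% | TN rate | 61 | 64 | 94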
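The two rates in the table are the standard ones: TP rate = TP / (TP + FN), the share of actual ones classified as one, and TN rate = TN / (TN + FP), the share of actual zeros classified as zero. A minimal sketch follows, with `rates` as an illustrative name. Reporting both rates matters on imbalanced data. For example, regular SVM on Transfusion reaches a 94% TN rate but only a 24% TP rate.

```python
def rates(y_true, y_pred):
    """Return (TP rate, TN rate) in percent for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tp_rate = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    tn_rate = 100.0 * tn / (tn + fp) if tn + fp else float("nan")
    return tp_rate, tn_rate

# A classifier that always predicts 0 on a 7%-ones sample: 93% accurate, yet useless for the ones.
y_true = [1] * 7 + [0] * 93
print(rates(y_true, [0] * 100))
```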
Online learning
Updated probabilities for one data point from the tornado data. Each trial uses 100 data points to update the prior.

Data point with y = 1
Trial | Weighted likelihood (α, β, E[θ]) | Weighted likelihood (α, β, E[θ]) | Unweighted likelihood (α, β, E[θ])
Prior | 1, 1, 0.5 | 0.7, 9.3, 0.07 | 0.7, 9.3, 0.07
1 | 1.01, 1.00, 0.50 | 0.71, 9.30, 0.07 | 0.71, 9.30, 0.07
2 | 1.01, 1.00, 0.50 | 0.71, 9.30, 0.07 | 0.71, 9.30, 0.07
3 | 1.10, 1.00, 0.52 | 0.80, 9.30, 0.08 | 0.81, 9.30, 0.08
5 | 1.16, 1.00, 0.54 | 0.86, 9.30, 0.08 | 0.88, 9.38, 0.09
10 | 1.49, 1.01, 0.60 | 1.19, 9.31, 0.11 | 1.22, 9.41, 0.11

Data point with y = 0
Trial | Weighted likelihood (α, β, E[θ]) | Weighted likelihood (α, β, E[θ]) | Unweighted likelihood (α, β, E[θ])
Prior | 1, 1, 0.5 | 0.7, 9.3, 0.070 | 0.7, 9.3, 0.07
1 | 1.00, 1.13, 0.47 | 0.70, 9.43, 0.069 | 0.70, 16.03, 0.04
2 | 1.02, 1.42, 0.42 | 0.72, 9.72, 0.069 | 0.72, 21.82, 0.03
3 | 1.02, 1.93, 0.35 | 0.72, 10.23, 0.066 | 0.72, 27.47, 0.03
5 | 1.08, 2.41, 0.31 | 0.78, 10.71, 0.068 | 0.78, 38.13, 0.02
10 | 1.24, 3.95, 0.24 | 0.94, 12.25, 0.071 | 0.95, 66.24, 0.01
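Online learning falls out of the additive update: after each trial, the current Beta(α, β) for a point becomes the prior for the next batch. The sketch below tracks one point over ten synthetic trials of 100 points each, starting from the Beta(0.7, 9.3) prior in the tables. The data are random stand-ins, not the tornado data, and `zero_weight` and the function names are hypothetical. It also checks that ten sequential trials reach the same posterior as a single pass over all 1,000 points.

```python
import math
import random

def gaussian_kernel(xi, xj, sigma):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))

def online_update(alpha, beta, x, batch, sigma, zero_weight=1.0):
    """One trial: the current Beta(alpha, beta) for point x serves as the prior.

    zero_weight is a hypothetical down-weighting of zeros (1.0 = unweighted).
    """
    for xj, yj in batch:
        k = gaussian_kernel(x, xj, sigma)
        if yj == 1:
            alpha += k
        else:
            beta += zero_weight * k
    return alpha, beta

# Synthetic stand-in data: 10 trials of 100 points each, about 7% ones.
rng = random.Random(1)
batches = [[((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), int(rng.random() < 0.07))
            for _ in range(100)] for _ in range(10)]

x = (0.0, 0.0)          # the single data point being tracked
alpha, beta = 0.7, 9.3  # prior Beta(0.7, 9.3), mean 0.07
for trial, batch in enumerate(batches, start=1):
    alpha, beta = online_update(alpha, beta, x, batch, sigma=1.0)
    print(trial, round(alpha, 2), round(beta, 2), round(alpha / (alpha + beta), 3))

# The update only adds kernel weights, so one pass over all 1,000 points
# reaches the same posterior as the ten sequential trials.
all_points = [p for batch in batches for p in batch]
alpha_once, beta_once = online_update(0.7, 9.3, x, all_points, sigma=1.0)
```

Only α and β need to be stored per point between trials, which keeps the memory cost of online updating small.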
Conclusions
- Adapting the beta-binomial updating rule to a kernel-based classifier can create a fast and accurate data mining algorithm
- User can set prior and weights to reflect imbalanced data sets
- Results are comparable to weighted SVM
- Online learning combines previous and current information