Bayesian Kernel Methods for Non-Gaussian Distributions




SLIDE 1

Bayesian Kernel Methods for Non-Gaussian Distributions

Cameron MacKenzie and Theodore Trafalis
School of Industrial Engineering, University of Oklahoma
INFORMS Annual Meeting, November 9, 2010

SLIDE 2

2 MacKenzie and Trafalis

Current Bayesian kernel methods

  • Combine Bayesian probability with Support Vector Machines (SVM)
  • n data points, m attributes
  • X is an n x m matrix
  • y is an n x 1 vector of 0’s and 1’s
  • q(X) is a function of X used to predict y

$P(q(X) \mid y) = \dfrac{P(y \mid q(X)) \, P(q(X))}{P(y)}$

Posterior: P(q(X) | y). Likelihood: P(y | q(X)). Prior: P(q(X)).

SLIDE 3

Support Vector Machines and idea of kernel methods

[Figure: the map Φ takes points from the input space to the feature space]

$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$
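The identity between a kernel and an inner product in feature space can be checked numerically. The sketch below is only an illustration: it assumes a degree-2 polynomial kernel in two dimensions, which the slides do not mention, and the names `phi`, `dot`, and `poly2_kernel` are hypothetical.

```python
def phi(x):
    """Explicit feature map (an assumed example) for K(x, z) = (x . z)^2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Kernel evaluated in the input space, without building phi(x) or phi(z)."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly2_kernel(x, z)       # computed in the input space
rhs = dot(phi(x), phi(z))      # computed in the feature space
print(lhs, rhs)                # the two values agree up to rounding
```

The point of the kernel idea is that `poly2_kernel` never constructs the feature vectors, yet returns the same number as the inner product of `phi(x)` and `phi(z)`.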

SLIDE 4

Gaussian distributions

$P(q(X) \mid y) \propto P(y \mid q(X)) \, P(q(X))$

Posterior: P(q(X) | y). Likelihood: P(y | q(X)). Prior: P(q(X)).

Logistic likelihood:

$P(y_i = 1 \mid q(x_i)) = \dfrac{\exp(q(x_i))}{1 + \exp(q(x_i))}$

Normal prior, specified through the mean $E[q(X)]$ and the covariance

$\mathrm{cov}[q(x_1), \ldots, q(x_n)] = K$

where K is the n x n kernel matrix.

Refs: Schölkopf and Smola, 2002; Bishop and Tipping, 2003
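The two ingredients on this slide, the logistic likelihood and an n x n kernel matrix used as the prior covariance, can be sketched as follows. The slide does not say which kernel builds K, so the Gaussian (RBF) kernel and the names `logistic_likelihood`, `rbf_kernel_matrix`, and `sigma` are assumptions made for illustration.

```python
import math


def logistic_likelihood(q_value):
    """P(y_i = 1 | q(x_i)) = exp(q) / (1 + exp(q)), written in the equivalent form 1 / (1 + exp(-q))."""
    return 1.0 / (1.0 + math.exp(-q_value))


def rbf_kernel_matrix(X, sigma=1.0):
    """n x n matrix with entries exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed kernel)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # three toy points, not from the talk
K = rbf_kernel_matrix(X, sigma=1.0)
for row in K:
    print([round(v, 3) for v in row])
print(logistic_likelihood(0.0), round(logistic_likelihood(2.0), 3))
```

The matrix is symmetric with ones on the diagonal, which is what a covariance built from this kernel should look like.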

SLIDE 5

What’s new

  • Beta distributions as priors
  • Adaptation of beta-binomial updating formula
  • Comparison of beta kernel classifiers with existing SVM classifiers
  • Online learning
slide-6
SLIDE 6

Beta distribution

$q \sim \mathrm{Beta}(\alpha, \beta)$

$E[q] = \dfrac{\alpha}{\alpha + \beta}$

SLIDE 7

Shape of beta density functions

[Figure: six panels of beta density functions plotted against q on [0, 1]: Beta(1,1), Beta(3,3), Beta(5,1), Beta(10,10), Beta(2,6), Beta(15,6)]
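The shapes in the six panels can be regenerated from the standard beta density. The sketch below evaluates that density on a coarse grid for the same six parameter pairs; the function names `beta_pdf` and `beta_mean` are hypothetical, and the grid is an arbitrary choice.

```python
import math


def beta_pdf(q, alpha, beta):
    """Standard beta density at q for shape parameters alpha and beta."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * q ** (alpha - 1) * (1.0 - q) ** (beta - 1)


def beta_mean(alpha, beta):
    """E[q] = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


panels = [(1, 1), (3, 3), (5, 1), (10, 10), (2, 6), (15, 6)]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for a, b in panels:
    values = [round(beta_pdf(q, a, b), 2) for q in grid]
    print(f"Beta({a},{b}) mean={beta_mean(a, b):.2f} density on grid={values}")
```

Beta(1,1) is flat, symmetric pairs such as Beta(3,3) and Beta(10,10) peak at 0.5, and unequal pairs shift the mass toward 0 or 1.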

SLIDE 8

Beta-binomial conjugate

  • Prior: $q \sim \mathrm{Beta}(\alpha, \beta)$
  • Likelihood: $Y \sim \mathrm{Binomial}(n, q)$
  • Posterior: $q \mid Y = y \sim \mathrm{Beta}(\alpha + y, \ \beta + n - y)$

Here y is the number of ones, n is the number of trials, and n - y is the number of zeros.
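As a worked instance of the conjugate update above, the sketch below adds the number of ones to the first parameter and the number of zeros to the second. The function name `beta_binomial_update` and the counts (7 ones in 10 trials) are made up for illustration.

```python
def beta_binomial_update(alpha, beta, y, n):
    """Posterior Beta parameters after observing y ones in n binomial trials."""
    return alpha + y, beta + (n - y)


prior = (1.0, 1.0)                         # Beta(1, 1), a flat prior
posterior = beta_binomial_update(*prior, y=7, n=10)
post_mean = posterior[0] / (posterior[0] + posterior[1])
print(posterior, round(post_mean, 3))
```

With a flat prior the posterior mean moves from 0.5 toward the observed fraction of ones, and it moves further as n grows.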

SLIDE 9

Applying beta-binomial to data mining

  • Prior: $q(x_i) \sim \mathrm{Beta}(\alpha_i, \beta_i)$
  • Posterior: $q(x_i) \mid y$ is again a beta distribution. Its first parameter is $\alpha_i$ plus a weighted sum of $K(x_i, x_j)$ over the training points with $y_j = 1$; its second parameter is $\beta_i$ plus the corresponding weighted sum over the remaining training points.

$K(x_i, x_j) = \exp\left( -\dfrac{\lVert x_i - x_j \rVert_2^2}{2\sigma^2} \right)$

Annotations: number of zeros in training set; parameter to be tuned.
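A minimal sketch of a kernel-weighted beta update in the spirit of this slide follows. It is not the authors' exact formula: the class weights `w_one` and `w_zero` are hypothetical placeholders (defaulting to 1, i.e. unweighted), `sigma` is assumed to be the kernel width, and the toy data are invented.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def beta_kernel_posterior(x_new, X_train, y_train, alpha, beta,
                          sigma=1.0, w_one=1.0, w_zero=1.0):
    """Add kernel sums over the ones to alpha and over the zeros to beta.

    w_one and w_zero are assumed weights, not taken from the slides.
    """
    k_ones = sum(rbf_kernel(x_new, x, sigma)
                 for x, y in zip(X_train, y_train) if y == 1)
    k_zeros = sum(rbf_kernel(x_new, x, sigma)
                  for x, y in zip(X_train, y_train) if y == 0)
    return alpha + w_one * k_ones, beta + w_zero * k_zeros


X_train = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]   # toy data
y_train = [1, 1, 0, 0]
a_post, b_post = beta_kernel_posterior((0.0, 0.1), X_train, y_train,
                                       alpha=1.0, beta=1.0, sigma=1.0)
print(round(a_post, 3), round(b_post, 3), round(a_post / (a_post + b_post), 3))
```

Each training point acts like a fractional binomial trial: nearby points contribute almost a full count, distant points contribute almost nothing, so the query point near the two ones ends up with a posterior mean above 0.5.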

SLIDE 10

Data sets

Data set       Number of attributes   Number of data points   Ones    Zeros    Training set   Tuning set   Testing set
Parkinson      22                     195                     147     48       98             58           39
Tornado        83                     10,816                  721     10,095   541            271          541
Colon Cancer   2,000                  62                      22      40       31             19           12
Spam           57                     4,601                   1,813   2,788    460            230          460
Transfusion    4                      748                     178     570      150            74           524

Each training, tuning, and testing set is randomly sampled 100 times.
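The repeated sampling can be sketched as below, using the Parkinson set sizes from the table (98 training, 58 tuning, 39 testing out of 195 points). Two things are assumptions: that the three sets are disjoint and drawn without replacement, and the seed value; the function name `random_split` is hypothetical.

```python
import random


def random_split(n_points, n_train, n_tune, n_test, rng):
    """Draw disjoint training, tuning, and testing index sets without replacement."""
    chosen = rng.sample(range(n_points), n_train + n_tune + n_test)
    train = chosen[:n_train]
    tune = chosen[n_train:n_train + n_tune]
    test = chosen[n_train + n_tune:]
    return train, tune, test


rng = random.Random(2010)   # arbitrary seed so the sketch is repeatable
splits = [random_split(195, 98, 58, 39, rng) for _ in range(100)]
train, tune, test = splits[0]
print(len(splits), len(train), len(tune), len(test))
```

Reported rates would then be averaged over the 100 splits, so that no single lucky or unlucky partition drives the comparison.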

SLIDE 11

Testing on data sets

Data set       Percentage of ones in data set   Rate (%)   Beta prior   Weighted SVM   Regular SVM
Parkinson      75%                              TP rate    86           91             98
                                                TN rate    95           76             75
Tornado        7%                               TP rate    80           87             59
                                                TN rate    97           91             99
Colon Cancer   35%                              TP rate    87           78             77
                                                TN rate    85           93             95
Spam           39%                              TP rate    85           85             85
                                                TN rate    85           93             95
Transfusion    24%                              TP rate    71           69             24
                                                TN rate    61           64             94
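For reference, the two rates in the table can be computed as below, assuming the usual definitions: TP rate is the fraction of true ones predicted as one, and TN rate is the fraction of true zeros predicted as zero. The slides do not spell these out, and the labels and predictions here are invented.

```python
def tp_tn_rates(y_true, y_pred):
    """Return (TP rate, TN rate) for binary labels coded as 0 and 1."""
    preds_for_ones = [p for t, p in zip(y_true, y_pred) if t == 1]
    preds_for_zeros = [p for t, p in zip(y_true, y_pred) if t == 0]
    tp_rate = sum(1 for p in preds_for_ones if p == 1) / len(preds_for_ones)
    tn_rate = sum(1 for p in preds_for_zeros if p == 0) / len(preds_for_zeros)
    return tp_rate, tn_rate


y_true = [1, 1, 1, 0, 0]   # toy labels
y_pred = [1, 1, 0, 0, 1]   # toy predictions
tp_rate, tn_rate = tp_tn_rates(y_true, y_pred)
print(round(100 * tp_rate), round(100 * tn_rate))
```

Reporting both rates matters for imbalanced sets such as Tornado (7% ones), where a classifier can reach a high TN rate while missing most of the ones.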

SLIDE 12

Online learning

Updated probabilities for one data point from tornado data: y = 0 and y = 1

         Weighted likelihood      Weighted likelihood      Unweighted likelihood
Trial    α      β      E[q]       α      β      E[q]       α      β      E[q]
Prior    1      1      0.5        0.7    9.3    0.07       0.7    9.3    0.07
1        1.01   1.00   0.50       0.71   9.30   0.07       0.71   9.30   0.07
2        1.01   1.00   0.50       0.71   9.30   0.07       0.71   9.30   0.07
3        1.10   1.00   0.52       0.80   9.30   0.08       0.81   9.30   0.08
5        1.16   1.00   0.54       0.86   9.30   0.08       0.88   9.38   0.09
10       1.49   1.01   0.60       1.19   9.31   0.11       1.22   9.41   0.11

         Weighted likelihood      Weighted likelihood      Unweighted likelihood
Trial    α      β      E[q]       α      β      E[q]       α      β      E[q]
Prior    1      1      0.5        0.7    9.3    0.070      0.7    9.3    0.07
1        1.00   1.13   0.47       0.70   9.43   0.069      0.70   16.03  0.04
2        1.02   1.42   0.42       0.72   9.72   0.069      0.72   21.82  0.03
3        1.02   1.93   0.35       0.72   10.23  0.066      0.72   27.47  0.03
5        1.08   2.41   0.31       0.78   10.71  0.068      0.78   38.13  0.02
10       1.24   3.95   0.24       0.94   12.25  0.071      0.95   66.24  0.01

Each trial uses 100 data points to update the prior.
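The online scheme, in which each trial's posterior becomes the next trial's prior, can be sketched as follows. This is an unweighted illustration on synthetic data, not a reproduction of the tornado results: the data generator, the query point, `sigma`, and the seed are all assumptions, and only the prior (0.7, 9.3) and the batch size of 100 come from the slide.

```python
import math
import random


def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def update(alpha, beta, x_new, batch, sigma):
    """One online step: add the batch's kernel sums to the current parameters."""
    for x, y in batch:
        k = rbf_kernel(x_new, x, sigma)
        if y == 1:
            alpha += k
        else:
            beta += k
    return alpha, beta


rng = random.Random(0)   # arbitrary seed so the sketch is repeatable


def make_batch(size):
    """Synthetic stand-in for one trial: ones near (1, 1), zeros near (-1, -1)."""
    batch = []
    for _ in range(size):
        y = 1 if rng.random() < 0.07 else 0
        centre = 1.0 if y == 1 else -1.0
        batch.append(((rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)), y))
    return batch


x_new = (1.0, 1.0)             # hypothetical query point
alpha, beta = 0.7, 9.3         # prior from the slide, E[q] = 0.07
history = [(alpha, beta)]
for trial in range(10):
    alpha, beta = update(alpha, beta, x_new, make_batch(100), sigma=0.5)
    history.append((alpha, beta))

for trial, (a, b) in enumerate(history):
    print(trial, round(a, 2), round(b, 2), round(a / (a + b), 3))
```

Because the parameters only accumulate kernel sums, no earlier batch has to be stored or revisited; the pair (alpha, beta) carries all previous information forward.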

SLIDE 13

Conclusions

  • Adapting the beta-binomial updating rule to a kernel-based classifier can create a fast and accurate data mining algorithm
  • User can set prior and weights to reflect imbalanced data sets
  • Results are comparable to weighted SVM
  • Online learning combines previous and current information

SLIDE 14

Questions

cmackenzie@ou.edu