
Bayesian Kernel Methods for Non-Gaussian Distributions
Cameron MacKenzie and Theodore Trafalis
School of Industrial Engineering, University of Oklahoma
INFORMS Annual Meeting, November 9, 2010

Current Bayesian kernel methods
• Combine Bayesian probability with Support Vector Machines (SVM)
• n data points, m attributes
• X is an n × m matrix
• y is an n × 1 vector of 0s and 1s
• $\theta(X)$ is a function of X used to predict y
Posterior = Likelihood × Prior / P(y):
$P(\theta(X) \mid y) = \dfrac{P(y \mid \theta(X))\, P(\theta(X))}{P(y)}$
MacKenzie and Trafalis 2

Support Vector Machines and idea of kernel methods
[Figure: a map $\Phi$ from the input space to the feature space.]
$K(x_1, x_2) = \langle \Phi(x_1), \Phi(x_2) \rangle$
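
The kernel identity on this slide can be checked numerically for a kernel whose feature map is known in closed form. The sketch below is an illustration only: the degree-2 polynomial kernel and the names `phi`, `dot`, and `poly_kernel` are assumptions chosen for the example and do not come from the slides.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever building phi."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly_kernel(x, z)         # evaluated in the input space
feature_value = dot(phi(x), phi(z))      # evaluated in the feature space
print(kernel_value, feature_value)
```

Both numbers agree up to rounding, which is the point of a kernel method: the inner product in the feature space is obtained from a cheap computation in the input space.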

Gaussian distributions
Refs: Schölkopf and Smola, 2002; Bishop and Tipping, 2003
Posterior ∝ Likelihood × Prior:
$P(\theta(X) \mid y) \propto P(y \mid \theta(X))\, P(\theta(X))$
Logistic likelihood:
$P(y_i = 1 \mid \theta(x_i)) = \dfrac{\exp(\theta(x_i))}{1 + \exp(\theta(x_i))}$
Normal prior:
$E[\theta(X)] = 0$
$\operatorname{cov}(\theta(x_1), \ldots, \theta(x_n)) = K$, where $K$ is the n × n kernel matrix
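
As a small illustration of the logistic likelihood above, the following sketch evaluates $P(y_i = 1 \mid \theta(x_i))$ and the resulting Bernoulli log-likelihood for a vector of latent values. The function names `logistic` and `log_likelihood` are hypothetical, and the sketch covers only the likelihood term, not the normal prior or the posterior computation.

```python
import math

def logistic(theta):
    """exp(theta) / (1 + exp(theta)), written to avoid overflow for large |theta|."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1.0 + e)

def log_likelihood(thetas, ys):
    """Sum of log P(y_i | theta(x_i)) for labels y_i in {0, 1}."""
    total = 0.0
    for theta, y in zip(thetas, ys):
        p = logistic(theta)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

print(logistic(0.0))                          # a latent value of 0 gives probability 0.5
print(log_likelihood([2.0, -1.0], [1, 0]))    # both labels agree with the sign of theta
```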

What’s new
• Beta distributions as priors
• Adaptation of beta-binomial updating formula
• Comparison of beta kernel classifiers with existing SVM classifiers
• Online learning

Beta distribution
$\theta \sim \mathrm{Beta}(\alpha, \beta)$
$E[\theta] = \dfrac{\alpha}{\alpha + \beta}$

Shape of beta density functions
[Figure: six panels showing the densities of Beta(1,1), Beta(3,3), Beta(10,10), Beta(5,1), Beta(2,6), and Beta(15,6), each plotted against θ on [0, 1].]
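
Since the figure itself is not reproduced here, the shapes can be regenerated from the standard beta density formula. This is a minimal sketch using only the Python standard library; the helper name `beta_pdf` and the evaluation grid are assumptions made for the example.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1, via log-gamma for stability."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# The six parameter pairs named in the figure's panel titles.
panels = [(1, 1), (3, 3), (10, 10), (5, 1), (2, 6), (15, 6)]
grid = [i / 10 for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9

for a, b in panels:
    row = " ".join(f"{beta_pdf(x, a, b):5.2f}" for x in grid)
    print(f"Beta({a},{b}): {row}")
```

Beta(1,1) is flat, the symmetric cases peak at 0.5 and sharpen as the parameters grow, and the asymmetric cases lean toward 0 or 1.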

Beta-binomial conjugate
• Prior: $\theta \sim \mathrm{Beta}(\alpha, \beta)$
• Likelihood: $Y \sim \mathrm{Binomial}(n, \theta)$, where $n$ is the number of trials
• Posterior: $\theta \mid Y = y \sim \mathrm{Beta}(\alpha + y,\ \beta + n - y)$, where $y$ is the number of ones and $n - y$ is the number of zeros
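
The conjugate update above amounts to two additions. The sketch below works one example through; the function names and the numbers (a Beta(1, 1) prior and 7 ones in 10 trials) are made up for illustration.

```python
def beta_binomial_update(alpha, beta, y, n):
    """Posterior Beta parameters after observing y ones in n binomial trials."""
    if not 0 <= y <= n:
        raise ValueError("y must be between 0 and n")
    return alpha + y, beta + (n - y)

def beta_mean(alpha, beta):
    """E[theta] for theta ~ Beta(alpha, beta)."""
    return alpha / (alpha + beta)

prior = (1.0, 1.0)                              # uniform prior on theta
posterior = beta_binomial_update(*prior, y=7, n=10)
print(posterior, beta_mean(*posterior))         # Beta(8, 4), mean 2/3
```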

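Applying beta-binomial to data mining
• Prior: $\theta(x_i) \sim \mathrm{Beta}(\alpha_i, \beta_i)$
• Posterior: $\theta(x_i) \mid y \sim \mathrm{Beta}\Big(\alpha_i + \sum_{j:\, y_j = 1} K(x_i, x_j),\ \ \beta_i + w \sum_{j:\, y_j = 0} K(x_i, x_j)\Big)$
  Here $w$ stands for a weight on the zero-labeled sum that the slide annotates with "Number of zeros in training set"; its exact form is not recoverable from the extracted text.
• Kernel: $K(x_j, x_i) = \exp\!\left(-\dfrac{\lVert x_j - x_i \rVert^2}{2\sigma^2}\right)$, where $\sigma$ is a parameter to be tuned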
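
One way to read this slide is that each training point contributes a fractional "observation" to the query point's Beta parameters, sized by its kernel similarity. The sketch below implements that reading under stated assumptions: a Gaussian kernel with width `sigma`, and a hypothetical `zero_weight` argument standing in for the weight on zero-labeled points, whose exact form is not given here. With `zero_weight=1.0` the update is unweighted. The toy data and the 0.5 threshold are also assumptions.

```python
import math

def gaussian_kernel(x, z, sigma):
    """exp(-||x - z||^2 / (2 sigma^2)); sigma is the tunable width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def beta_kernel_posterior(x, X_train, y_train, alpha=1.0, beta=1.0,
                          sigma=1.0, zero_weight=1.0):
    """Kernel-weighted beta-binomial style update for a single query point x."""
    for xj, yj in zip(X_train, y_train):
        k = gaussian_kernel(x, xj, sigma)
        if yj == 1:
            alpha += k                  # nearby ones raise alpha
        else:
            beta += zero_weight * k     # nearby zeros raise beta (hypothetical weight)
    return alpha, beta

def predict(x, X_train, y_train, threshold=0.5, **kwargs):
    """Classify by comparing the posterior mean of theta(x) with a threshold."""
    a, b = beta_kernel_posterior(x, X_train, y_train, **kwargs)
    return 1 if a / (a + b) > threshold else 0

# Toy data: two zeros near the origin, two ones near (5, 5).
X_train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y_train = [0, 0, 1, 1]
print(predict((0.0, 0.5), X_train, y_train))   # close to the zeros
print(predict((5.0, 5.5), X_train, y_train))   # close to the ones
```

No optimization problem is solved: a prediction is a single pass over the training set.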

Data sets

Data set       Attributes   Data points   Ones    Zeros    Training set   Tuning set   Testing set
Parkinson      22           195           147     48       98             58           39
Tornado        83           10,816        721     10,095   541            271          541
Colon Cancer   2,000        62            22      40       31             19           12
Spam           57           4,601         1,813   2,788    460            230          460
Transfusion    4            748           178     570      150            74           524

Each training, tuning, and testing set is randomly sampled 100 times.

Testing on data sets

Data set       Ones in data set   Rate      Beta prior   Weighted SVM   Regular SVM
Parkinson      75%                TP rate   86           91             98
                                  TN rate   95           76             75
Tornado        7%                 TP rate   80           87             59
                                  TN rate   97           91             99
Colon Cancer   35%                TP rate   87           78             77
                                  TN rate   85           93             95
Spam           39%                TP rate   85           85             85
                                  TN rate   85           93             95
Transfusion    24%                TP rate   71           69             24
                                  TN rate   61           64             94
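
The slide does not define its TP and TN rates, so the sketch below assumes the usual definitions: the TP rate is the fraction of actual ones predicted as one, and the TN rate is the fraction of actual zeros predicted as zero. The function name and the labels in the example are made up.

```python
def tp_tn_rates(y_true, y_pred):
    """Return (TP rate, TN rate) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one actual one and one actual zero")
    return tp / positives, tn / negatives

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
tp_rate, tn_rate = tp_tn_rates(y_true, y_pred)
print(f"TP rate {tp_rate:.0%}, TN rate {tn_rate:.0%}")
```

Reporting both rates matters for imbalanced sets such as Tornado (7% ones), where a classifier can score a high TN rate while missing many of the ones.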

Online learning
Updated probabilities for one data point from tornado data. Each trial uses 100 data points to update the prior.

y = 0
         Weighted likelihood      Weighted likelihood      Unweighted likelihood
Trial    α      β      E[θ]       α      β      E[θ]       α      β      E[θ]
Prior    1      1      0.5        0.7    9.3    0.070      0.7    9.3    0.07
1        1.00   1.13   0.47       0.70   9.43   0.069      0.70   16.03  0.04
2        1.02   1.42   0.42       0.72   9.72   0.069      0.72   21.82  0.03
3        1.02   1.93   0.35       0.72   10.23  0.066      0.72   27.47  0.03
5        1.08   2.41   0.31       0.78   10.71  0.068      0.78   38.13  0.02
10       1.24   3.95   0.24       0.94   12.25  0.071      0.95   66.24  0.01

y = 1
         Weighted likelihood      Weighted likelihood      Unweighted likelihood
Trial    α      β      E[θ]       α      β      E[θ]       α      β      E[θ]
Prior    1      1      0.5        0.7    9.3    0.07       0.7    9.3    0.07
1        1.01   1.00   0.50       0.71   9.30   0.07       0.71   9.30   0.07
2        1.01   1.00   0.50       0.71   9.30   0.07       0.71   9.30   0.07
3        1.10   1.00   0.52       0.80   9.30   0.08       0.81   9.30   0.08
5        1.16   1.00   0.54       0.86   9.30   0.08       0.88   9.38   0.09
10       1.49   1.01   0.60       1.19   9.31   0.11       1.22   9.41   0.11
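
The online scheme can be sketched as repeated application of a kernel-weighted update, with each trial's posterior serving as the next trial's prior. The code below is an illustration under assumptions: a Gaussian kernel, a hypothetical `zero_weight` argument in place of the weighting whose form is not given here, and tiny made-up batches instead of 100 tornado points per trial. Only the prior Beta(0.7, 9.3) and the spot-checked table values are taken from the slide.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def update(alpha, beta, x, batch_X, batch_y, sigma=1.0, zero_weight=1.0):
    """One trial: fold a batch of labeled points into the Beta parameters for x."""
    for xj, yj in zip(batch_X, batch_y):
        k = gaussian_kernel(x, xj, sigma)
        if yj == 1:
            alpha += k
        else:
            beta += zero_weight * k     # hypothetical weight on zero-labeled points
    return alpha, beta

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

x = (0.0, 0.0)                          # the single point being tracked
batches = [                             # made-up batches, one per trial
    ([(0.1, 0.0), (3.0, 3.0)], [0, 1]),
    ([(0.0, 0.2), (0.3, 0.1)], [0, 0]),
]

alpha, beta = 0.7, 9.3                  # prior from the slide, E[theta] = 0.07
history = [(alpha, beta)]
for batch_X, batch_y in batches:
    alpha, beta = update(alpha, beta, x, batch_X, batch_y)
    history.append((alpha, beta))       # posterior becomes the next prior

# Because the update is additive, one pass over all data gives the same result.
all_X = [p for bx, _ in batches for p in bx]
all_y = [lab for _, by in batches for lab in by]
all_at_once = update(0.7, 9.3, x, all_X, all_y)

for trial, (a, b) in enumerate(history):
    print(trial, round(a, 2), round(b, 2), round(beta_mean(a, b), 3))
```

The same `beta_mean` reproduces the E[θ] columns of the table from its α and β columns; for example, α = 1.24 and β = 3.95 give about 0.24.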

Conclusions
• Adapting the beta-binomial updating rule to a kernel-based classifier can create a fast and accurate data mining algorithm
• User can set prior and weights to reflect imbalanced data sets
• Results are comparable to weighted SVM
• Online learning combines previous and current information

Questions
cmackenzie@ou.edu
