Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint


1. Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint
   Xia Hong¹, Sheng Chen², Chris J. Harris²
   ¹ School of Systems Engineering, University of Reading, Reading RG6 6AY, UK. E-mail: x.hong@reading.ac.uk
   ² School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. E-mails: {sqc,cjh}@ecs.soton.ac.uk
   International Joint Conference on Neural Networks 2010

2. Outline
   1. Motivations
      - Existing Regularisation Approaches
      - Our Contributions
   2. Proposed Sparse Kernel Density Estimator
      - Problem Formulation
      - Approximate Zero-Norm Regularisation
      - D-Optimality Based Subset Selection
   3. Numerical Examples
      - Experimental Set Up
      - Experimental Results
   4. Conclusions

3. Regularisation Methods
   Two-norm of the weight vector:
   - Combines naturally with a quadratic main cost function, giving a computationally efficient implementation
   - Only drives many weights to small, near-zero values
   One-norm of the weight vector:
   - Can drive many weights exactly to zero, and hence should achieve sparser results than the two-norm based method
   - Harder to minimise, with a higher-complexity implementation
   Zero-norm of the weight vector:
   - Ultimate model sparsity and generalisation performance
   - Intractable to implement; even with approximation, very difficult to minimise and very costly
   Two-norm and one-norm based regularisations have both been combined with the OLS algorithm, with the former approach providing highly efficient sparse kernel modelling

4. Our Contributions
   - We incorporate an effective approximate zero-norm regularisation into sparse kernel density estimation
   - The approximate zero-norm merges naturally into the underlying constrained nonnegative quadratic programming (NNQP)
   - Various SVM algorithms can readily be applied to obtain the sparse kernel density (SKD) estimate efficiently
   - Proposed sparse kernel density estimator:
     1. use D-optimality OLS subset selection to select a small number of significant kernels, in terms of kernel eigenvalues
     2. then solve the final SKD estimate from the associated subset constrained NNQP

5. Kernel Density Estimation
   Given a finite data set $D_N = \{x_k\}_{k=1}^N$ drawn from an unknown density $p(x)$, where $x_k \in \mathbb{R}^m$, infer $p(x)$ based on $D_N$ using the kernel density estimate
   $$\hat{p}(x; \beta_N, \rho) = \sum_{k=1}^{N} \beta_k K_\rho(x, x_k)$$
   $$\text{s.t.}\quad \beta_k \ge 0,\ 1 \le k \le N, \quad \beta_N^T 1_N = 1$$
   Here $\beta_N = [\beta_1\,\beta_2 \cdots \beta_N]^T$ is the kernel weight vector, $1_N$ the $N$-dimensional vector of ones, and $K_\rho(\bullet,\bullet)$ the chosen kernel function with kernel width $\rho$.
   Unsupervised density estimation $\Rightarrow$ "supervised" regression using the Parzen window estimate as the "desired response"
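
To make the estimator concrete, here is a minimal Python/NumPy sketch assuming a Gaussian kernel of width $\rho$ (the slides do not fix the kernel choice; the data and width below are illustrative). The Parzen window estimate used as the "desired response" later is the special case $\beta_k = 1/N$:

```python
import numpy as np

def gaussian_kernel(x, xk, rho):
    """Gaussian kernel K_rho(x, x_k) with width rho, normalised over R^m."""
    m = x.shape[0]
    diff = x - xk
    return np.exp(-diff @ diff / (2.0 * rho**2)) / (2.0 * np.pi * rho**2) ** (m / 2.0)

def kde(x, X, beta, rho):
    """Kernel density estimate p_hat(x; beta_N, rho) = sum_k beta_k K_rho(x, x_k)."""
    return sum(b * gaussian_kernel(x, xk, rho) for b, xk in zip(beta, X))

# The Parzen window estimate is the special case beta_k = 1/N for all k
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))            # data set D_N: N = 200 points in R^2
beta_parzen = np.full(len(X), 1.0 / len(X))  # uniform weights on the simplex
print(kde(np.zeros(2), X, beta_parzen, rho=0.5))
```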

6. Regression Formulation
   For $x_k \in D_N$, denote $\hat{y}_k = \hat{p}(x_k; \beta_N, \rho)$, let $y_k$ be the Parzen window estimate at $x_k$, and let $\varepsilon_k = y_k - \hat{y}_k$ $\Rightarrow$ regression formulation
   $$y_k = \hat{y}_k + \varepsilon_k = \phi_N^T(k)\,\beta_N + \varepsilon_k$$
   or, over $D_N$, $y = \Phi_N \beta_N + \varepsilon$
   Associated constrained nonnegative quadratic programming:
   $$\min_{\beta_N} \left\{ \frac{1}{2}\beta_N^T B_N \beta_N - v_N^T \beta_N \right\}$$
   $$\text{s.t.}\quad \beta_N^T 1_N = 1 \ \text{ and } \ \beta_i \ge 0,\ 1 \le i \le N$$
   where $B_N = \Phi_N^T \Phi_N$ is the design matrix and $v_N = \Phi_N^T y$
   Note: this is not using the kernel density estimate to fit the Parzen window estimate!
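
A sketch of how these regression quantities can be assembled, reusing the Gaussian-kernel assumption from the previous snippet; `Phi`, `B` and `v` follow the slide's $\Phi_N$, $B_N$ and $v_N$:

```python
import numpy as np

def design_quantities(X, rho):
    """Build the regression quantities of the slide: Phi_N with
    Phi[k, i] = K_rho(x_k, x_i), B_N = Phi^T Phi and v_N = Phi^T y,
    where y is the Parzen window estimate evaluated at the data points."""
    N, m = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    Phi = np.exp(-d2 / (2.0 * rho**2)) / (2.0 * np.pi * rho**2) ** (m / 2.0)
    y = Phi.mean(axis=1)   # Parzen window estimate: uniform weights 1/N
    B = Phi.T @ Phi        # design matrix B_N
    v = Phi.T @ y          # v_N
    return Phi, B, v
```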

7. Zero-Norm Constraint
   Given $\alpha > 0$, an approximation to the zero norm $\|\beta_N\|_0$ is
   $$\|\beta_N\|_0 \approx \sum_{i=1}^{N} \left(1 - e^{-\alpha|\beta_i|}\right)$$
   Combining this zero-norm constraint with the constrained NNQP:
   $$\min_{\beta_N} \left\{ \frac{1}{2}\beta_N^T B_N \beta_N - v_N^T \beta_N + \lambda \sum_{i=1}^{N} \left(1 - e^{-\alpha|\beta_i|}\right) \right\}$$
   $$\text{s.t.}\quad \beta_N^T 1_N = 1 \ \text{ and } \ \beta_i \ge 0,\ 1 \le i \le N$$
   with $\lambda > 0$ a small "regularisation" parameter
   With a 2nd-order Taylor series expansion for $e^{-\alpha|\beta_i|}$:
   $$e^{-\alpha|\beta_i|} \approx 1 - \alpha|\beta_i| + \frac{\alpha^2 \beta_i^2}{2} \;\Rightarrow\; \sum_{i=1}^{N} \left(1 - e^{-\alpha|\beta_i|}\right) \approx \alpha \sum_{i=1}^{N} |\beta_i| - \frac{\alpha^2}{2} \sum_{i=1}^{N} \beta_i^2$$
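
A quick numerical check of the zero-norm approximation (the test vector and $\alpha$ values are illustrative): as $\alpha$ grows, the sum approaches the number of nonzero weights:

```python
import numpy as np

def approx_zero_norm(beta, alpha):
    """Approximate ||beta||_0 by sum_i (1 - exp(-alpha * |beta_i|))."""
    return np.sum(1.0 - np.exp(-alpha * np.abs(beta)))

beta = np.array([0.7, 0.3, 0.0, 0.0, 0.0])       # true zero norm is 2
for alpha in (1.0, 10.0, 100.0):
    print(alpha, approx_zero_norm(beta, alpha))  # approaches 2 as alpha grows
```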

8. Constrained NNQP
   Hence the "new" constrained NNQP:
   $$\min_{\beta_N} \left\{ \frac{1}{2}\beta_N^T A_N \beta_N - v_N^T \beta_N \right\}$$
   $$\text{s.t.}\quad \beta_N^T 1_N = 1 \ \text{ and } \ \beta_i \ge 0,\ 1 \le i \le N$$
   where $A_N = B_N - \delta I_N$ and $\delta = \lambda\alpha^2$ is a predetermined small parameter
   Remark: under the convexity constraint on $\beta_N$ (the weights are nonnegative and sum to one, so the one-norm term $\alpha\sum_i|\beta_i|$ is constant), minimisation of the approximate zero norm $\Leftrightarrow$ maximisation of the two norm $\beta_N^T I_N \beta_N$
   The design matrix $B_N$ should be positive definite, and $\delta$ must be bounded by the smallest eigenvalue of $B_N$ so that $A_N$ is also positive definite
   It is common for $B_N$ of a large data set to be ill-conditioned, so the approach is most effective when applied after some model subset selection preprocessing
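
A small sketch of the $\delta$ bound: $A_N$ stays positive definite only while $\delta$ is below the smallest eigenvalue of $B_N$:

```python
import numpy as np

def regularised_design(B, delta):
    """Form A_N = B_N - delta * I_N, checking that delta stays below the
    smallest eigenvalue of B_N so that A_N remains positive definite."""
    sigma_min = np.linalg.eigvalsh(B)[0]  # eigvalsh returns eigenvalues in ascending order
    if delta >= sigma_min:
        raise ValueError("delta too large: A_N would lose positive definiteness")
    return B - delta * np.eye(B.shape[0])
```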

9. D-Optimality Design
   The least squares estimate $\hat{\beta}_N = B_N^{-1}\Phi_N^T y$ is unbiased, and the covariance matrix of the estimate satisfies $\mathrm{Cov}\big[\hat{\beta}_N\big] \propto B_N^{-1}$
   Estimation accuracy depends on the condition number
   $$C = \frac{\max\{\sigma_i,\, 1 \le i \le N\}}{\min\{\sigma_i,\, 1 \le i \le N\}}$$
   where $\sigma_i$, $1 \le i \le N$, are the eigenvalues of $B_N$
   D-optimality design maximises the determinant of the design matrix: the selected subset model $\Phi_{N_s}$ maximises
   $$\det\left(\Phi_{N_s}^T \Phi_{N_s}\right) = \det B_{N_s}$$
   This prevents oversized, ill-posed models and high estimate variances
   The "unsupervised" D-optimality design is particularly suitable for determining the structure of a kernel density estimate
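
The selection criterion can be illustrated with a naive greedy search that maximises the subset determinant at each step; this is only a sketch of the criterion, not the OFR implementation the paper uses:

```python
import numpy as np

def d_optimal_subset(B, Ns):
    """Greedily grow an index set S so that det(B[S, S]) is maximised at
    each step. Illustrates the D-optimality criterion only; the paper
    implements the selection far more efficiently via orthogonal
    forward regression."""
    selected = []
    candidates = set(range(B.shape[0]))
    for _ in range(Ns):
        best, best_det = None, -np.inf
        for j in candidates:
            S = selected + [j]
            d = np.linalg.det(B[np.ix_(S, S)])
            if d > best_det:
                best, best_det = j, d
        selected.append(best)
        candidates.remove(best)
    return selected
```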

10. OFR Aided Algorithm
   Orthogonal forward regression (OFR) selects the subset model $\Phi_{N_s}$ of $N_s$ significant kernels based on the D-optimality criterion; the complexity of this preprocessing is no more than $O(N^2)$
   This preprocessing results in the subset constrained NNQP
   $$\min_{\beta_{N_s}} \left\{ \frac{1}{2}\beta_{N_s}^T A_{N_s} \beta_{N_s} - v_{N_s}^T \beta_{N_s} \right\}$$
   $$\text{s.t.}\quad \beta_{N_s}^T 1_{N_s} = 1 \ \text{ and } \ \beta_i \ge 0,\ 1 \le i \le N_s$$
   with $v_{N_s} = \Phi_{N_s}^T y$, $A_{N_s} = B_{N_s} - \delta I_{N_s}$, $B_{N_s} = \Phi_{N_s}^T \Phi_{N_s}$ and $\delta < w_{N_s}^T w_{N_s}$
   Various SVM algorithms can be used to solve this problem; as $N_s$ is very small and $A_{N_s}$ is well-conditioned, we use the simple multiplicative nonnegative quadratic programming (MNQP) algorithm (a sketch follows below), whose complexity is negligible compared with the $O(N^2)$ of the D-optimality based OFR preprocessing
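
Below is a sketch of a multiplicative NNQP solver for the subset problem, in the style of Sha, Saul and Lee's multiplicative updates with a Lagrange multiplier enforcing the unit-sum constraint; the paper's exact update may differ in detail:

```python
import numpy as np

def mnqp(A, v, n_iter=200):
    """Multiplicative NNQP for min 1/2 b^T A b - v^T b subject to b >= 0
    and sum(b) = 1, in the style of Sha, Saul and Lee's multiplicative
    updates. A sketch: assumes A is positive definite with positive entries."""
    Ns = len(v)
    beta = np.full(Ns, 1.0 / Ns)        # feasible start on the simplex
    for _ in range(n_iter):
        c = beta / (A @ beta)           # c_i = beta_i / (A beta)_i > 0
        h = (1.0 - c @ v) / c.sum()     # multiplier chosen so sum(beta) stays 1
        beta = c * (v + h)              # multiplicative update
        beta = np.maximum(beta, 1e-12)  # numerical guard on nonnegativity
        beta /= beta.sum()              # re-normalise against round-off
    return beta
```

Combined with the earlier snippets, the full pipeline would be: build `Phi`, `B`, `v` from the data, pick the $N_s$ kernels with `d_optimal_subset`, form $A_{N_s}$ with `regularised_design`, and run `mnqp` on the subset quantities to obtain the sparse weight vector.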
