Non-parametric Methods
Selim Aksoy
Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr
CS 551, Fall 2012
CS 551, Fall 2012 © 2012, Selim Aksoy (Bilkent University) 1 / 25
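◮ Density estimation with parametric models assumes that the forms of the underlying density functions are known.
◮ However, common parametric forms do not always fit the densities actually encountered in practice.
◮ In addition, most of the classical parametric densities are unimodal, whereas many practical problems involve multimodal densities.
◮ Non-parametric methods can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.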
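◮ Suppose that n samples $x_1, \ldots, x_n$ are drawn i.i.d. according to the distribution $p(x)$.
◮ The probability P that a vector x will fall in a region R is $P = \int_R p(x')\,dx'$.
◮ The probability that k of the n will fall in R is given by the binomial law $P_k = \binom{n}{k} P^k (1 - P)^{n-k}$.
◮ The expected value of k is $E[k] = nP$ and the MLE for P is $\hat{P} = k/n$.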
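The binomial model can be checked numerically. The sketch below is only an illustration: the values of `n`, `P`, and `k_obs` are hypothetical, and the grid search is a crude stand-in for the analytic maximization. It verifies that the mean of the binomial law is nP and that the likelihood of an observed count k peaks at P = k/n.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n i.i.d. samples fall in a region of probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, P = 20, 0.3  # hypothetical sample count and region probability

# Expected number of samples in the region: sum_k k * P_k, which should equal n * P.
pmf = [binomial_pmf(k, n, P) for k in range(n + 1)]
expected_k = sum(k * pk for k, pk in enumerate(pmf))

# Maximum-likelihood estimate of P for an observed count, found by grid search.
k_obs = 6  # hypothetical observed count
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=lambda p: binomial_pmf(k_obs, n, p))

print(expected_k, p_mle)
```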
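◮ If we assume that p(x) is continuous and R is small enough that p(x) does not vary significantly within it, then $\int_R p(x')\,dx' \approx p(x)\,V$, where x is a point in R and V is the volume of R.
◮ Then, the density estimate becomes $p(x) \approx \frac{k/n}{V}$.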
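A minimal sketch of the estimate p(x) ≈ (k/n)/V in one dimension, assuming the region R is an interval centered at the query point. The function name, the sample values, and the interval half-width are all hypothetical.

```python
def region_density_estimate(samples, center, half_width):
    """Estimate p(center) as (k/n)/V using the interval [center - half_width, center + half_width]."""
    n = len(samples)
    volume = 2.0 * half_width  # length of the 1-D region R
    k = sum(1 for x in samples if abs(x - center) <= half_width)
    return (k / n) / volume

# Hypothetical 1-D samples; four of the eight lie within 0.1 of the point 0.5.
samples = [0.1, 0.42, 0.45, 0.5, 0.55, 0.9, 1.3, 1.7]
estimate = region_density_estimate(samples, center=0.5, half_width=0.1)
print(estimate)  # (4/8) / 0.2
```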
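◮ Let n be the number of samples used, $R_n$ be the region used with n samples, $V_n$ be the volume of $R_n$, $k_n$ be the number of samples falling in $R_n$, and $p_n(x) = \frac{k_n/n}{V_n}$ be the estimate for p(x).
◮ If $p_n(x)$ is to converge to p(x), three conditions are required:
$\lim_{n\to\infty} V_n = 0$, $\lim_{n\to\infty} k_n = \infty$, $\lim_{n\to\infty} k_n/n = 0$.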
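◮ A very simple method is to partition the space into a number of equally-sized cells (bins) and compute a histogram.
◮ The estimate of the density at a point x becomes $p(x) = \frac{k}{nV}$, where n is the total number of samples, k is the number of samples in the cell that includes x, and V is the volume of that cell.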
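The histogram estimate p(x) = k/(nV) can be sketched in one dimension as follows. This assumes equal-width bins on a fixed range `[low, high)`; the function name, range, bin count, and data are hypothetical choices for illustration.

```python
import math

def histogram_density(samples, x, low, high, num_bins):
    """Histogram estimate p(x) = k / (n V) with equal-width bins on [low, high)."""
    n = len(samples)
    width = (high - low) / num_bins  # V, the "volume" of one 1-D cell

    def bin_index(value):
        return math.floor((value - low) / width)

    target = bin_index(x)
    # k = number of samples that share a cell with x
    k = sum(1 for s in samples if low <= s < high and bin_index(s) == target)
    return k / (n * width)

samples = [0.05, 0.15, 0.22, 0.27, 0.31, 0.68, 0.74, 0.93]  # hypothetical data
p_first_bin = histogram_density(samples, 0.1, 0.0, 1.0, 4)   # 3 of 8 samples in [0, 0.25)
p_second_bin = histogram_density(samples, 0.3, 0.0, 1.0, 4)  # 2 of 8 samples in [0.25, 0.5)
print(p_first_bin, p_second_bin)
```

Because every sample lands in exactly one cell, the estimate integrates to one over the binned range.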
◮ Although the histogram method is very easy to implement, it is usually not practical in high-dimensional spaces because of the number of cells.
◮ Many observations are required to prevent the estimate from being zero over a large region.
◮ Modifications for overcoming these difficulties:
  ◮ Data-adaptive histograms,
  ◮ Independence assumption (naive Bayes),
  ◮ Dependence trees.
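To make the high-dimensional difficulty concrete: with m bins along each of d axes, a histogram has m^d cells. The short sketch below uses hypothetical values of m, d, and n to show how quickly the cells outnumber the samples, so that most cells stay empty.

```python
def num_cells(bins_per_dim: int, dims: int) -> int:
    """Number of cells in a histogram with the same number of bins along every axis."""
    return bins_per_dim ** dims

n = 10_000  # hypothetical number of training samples
for d in (1, 2, 3, 5, 10):
    cells = num_cells(10, d)
    # At most n cells can be non-empty, so at least this fraction of cells has estimate zero.
    empty_fraction = max(0.0, 1.0 - n / cells)
    print(f"d = {d:2d}: {cells} cells, at least {empty_fraction:.4f} of them empty")
```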
◮ Other methods for obtaining the regions for estimation:
  ◮ Shrink regions as some function of n, such as $V_n = 1/\sqrt{n}$. This is the Parzen window estimation.
  ◮ Specify $k_n$ as some function of n, such as $k_n = \sqrt{n}$. This is the k-nearest neighbor estimation.
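As a rough, heuristic check (not a proof), both choices are consistent with the three convergence conditions $V_n \to 0$, $k_n \to \infty$, and $k_n/n \to 0$. The sketch assumes $p(x) > 0$ is continuous at x and uses the approximation $k_n \approx n\,p(x)\,V_n$ for large n.

```latex
\begin{align*}
\text{Fixed volume schedule: }\quad
  & V_n = \frac{1}{\sqrt{n}} \to 0, \qquad
    k_n \approx n\,p(x)\,V_n = p(x)\sqrt{n} \to \infty, \qquad
    \frac{k_n}{n} \approx \frac{p(x)}{\sqrt{n}} \to 0, \\
\text{Fixed count schedule: }\quad
  & k_n = \sqrt{n} \to \infty, \qquad
    \frac{k_n}{n} = \frac{1}{\sqrt{n}} \to 0, \qquad
    V_n \approx \frac{k_n}{n\,p(x)} = \frac{1}{p(x)\sqrt{n}} \to 0.
\end{align*}
```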
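◮ Suppose that ϕ is a d-dimensional window function that satisfies the properties of a density function, i.e., $\phi(u) \geq 0$ and $\int \phi(u)\,du = 1$.
◮ A density estimate can be obtained as $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \phi\!\left(\frac{x - x_i}{h_n}\right)$, where $h_n$ is the window width and $V_n = h_n^d$.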
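A minimal sketch of the Parzen window estimate, p_n(x) = (1/n) Σ (1/V_n) ϕ((x − x_i)/h_n) with V_n = h_n^d. The two window functions (a unit hypercube and a standard Gaussian) are common choices assumed here for illustration; the function names and data are hypothetical.

```python
import math

def hypercube_window(u):
    """Unit hypercube window: 1 inside the cube of side 1 centered at the origin, else 0."""
    return 1.0 if all(abs(c) <= 0.5 for c in u) else 0.0

def gaussian_window(u):
    """Standard normal density evaluated at the d-dimensional vector u."""
    d = len(u)
    squared_norm = sum(c * c for c in u)
    return math.exp(-0.5 * squared_norm) / (2.0 * math.pi) ** (d / 2.0)

def parzen_estimate(x, samples, h, window=gaussian_window):
    """Parzen estimate at x: average of window((x - x_i) / h) / h**d over all samples."""
    d = len(x)
    volume = h ** d  # V_n = h_n^d
    total = sum(window([(a - b) / h for a, b in zip(x, s)]) for s in samples)
    return total / (len(samples) * volume)

samples = [[0.0], [0.2], [1.0]]  # hypothetical 1-D samples, stored as vectors
p_cube = parzen_estimate([0.1], samples, h=0.5, window=hypercube_window)
p_gauss = parzen_estimate([0.1], samples, h=0.5)
print(p_cube, p_gauss)
```

With the hypercube window the estimate reduces to counting samples inside a cube of side h centered at x, which ties it back to the (k/n)/V form.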
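◮ The density estimate can also be written as $p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$, where $\delta_n(x) = \frac{1}{V_n}\, \phi\!\left(\frac{x}{h_n}\right)$.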
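◮ If $h_n$ is very large, $p_n(x)$ is the superposition of n broad functions, and is a smooth "out-of-focus" estimate of p(x).
◮ If $h_n$ is very small, $p_n(x)$ is the superposition of n sharp pulses centered at the samples, and is a "noisy" estimate of p(x).
◮ As $h_n$ approaches zero, $\delta_n(x - x_i)$ approaches a Dirac delta function centered at $x_i$, and $p_n(x)$ is a superposition of delta functions.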
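The effect of the window width can be seen by evaluating the same estimate with several values of h. This sketch assumes a 1-D Gaussian window and hypothetical data, and uses the gap between the largest and smallest estimate on a grid as a crude measure of how spiky the estimate is.

```python
import math

def parzen_1d(x, samples, h):
    """1-D Parzen estimate with a Gaussian window of width h."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

samples = [1.0, 1.2, 2.9, 3.1, 3.3]   # hypothetical data with two clusters
grid = [i * 0.05 for i in range(91)]  # evaluation points on [0, 4.5]

spread = {}
for h in (0.05, 0.5, 5.0):
    values = [parzen_1d(x, samples, h) for x in grid]
    spread[h] = max(values) - min(values)
    print(f"h = {h}: min = {min(values):.4f}, max = {max(values):.4f}")
```

A very small h gives tall, narrow peaks at the samples and near-zero values elsewhere, while a very large h gives a nearly flat estimate that hides the two clusters.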
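◮ Densities estimated using Parzen windows can be used with the Bayesian decision rule for classification.
◮ The training error can be made arbitrarily low by making the window width sufficiently small.
◮ However, the goal is to classify novel patterns, so the window width cannot be made too small.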
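A small sketch of a Parzen-window classifier, assuming 1-D features, a Gaussian window, and priors estimated from class frequencies. The data set and the helper names are hypothetical. It illustrates that a very small window width drives the training error to zero, while a wider window does not.

```python
import math

def parzen_1d(x, samples, h):
    """1-D Parzen estimate of a class-conditional density with a Gaussian window."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def classify(x, class_samples, h):
    """Bayesian decision rule: pick the class maximizing prior * estimated density."""
    n_total = sum(len(s) for s in class_samples.values())
    scores = {
        label: (len(s) / n_total) * parzen_1d(x, s, h)
        for label, s in class_samples.items()
    }
    return max(scores, key=scores.get)

def training_error(class_samples, h):
    """Fraction of training samples misclassified when classified by the same model."""
    pairs = [(x, label) for label, xs in class_samples.items() for x in xs]
    wrong = sum(1 for x, label in pairs if classify(x, class_samples, h) != label)
    return wrong / len(pairs)

# Hypothetical overlapping classes: the w2 sample at 0.7 sits among w1 samples.
training = {"w1": [0.0, 0.4, 1.0], "w2": [0.7, 2.0, 2.5]}
print(training_error(training, h=0.01), training_error(training, h=1.0))
```

The zero training error at h = 0.01 says nothing about novel patterns: away from the training points the estimated densities are essentially zero for every class.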
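◮ A potential remedy for the problem of the unknown "best" window function is to let the estimation volume be a function of the training data, rather than some arbitrary function of the overall number of samples.
◮ To estimate p(x) from n samples, we can center a volume about x and let it grow until it captures $k_n$ samples, where $k_n$ is some function of n.
◮ These samples are called the k-nearest neighbors of x.
◮ If the density is high near x, the volume will be relatively small. If the density is low, the volume will grow large.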
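A minimal 1-D sketch of k-nearest-neighbor density estimation, using p_n(x) = (k/n)/V with V taken as the length of the smallest interval centered at x that contains k samples. The function name and data are hypothetical, and the sketch does not guard against the k-th nearest sample coinciding with x (which would make V zero).

```python
def knn_density_1d(x, samples, k):
    """k-NN density estimate in 1-D: (k/n) / V, with V = 2 * distance to the k-th nearest sample."""
    n = len(samples)
    distances = sorted(abs(x - s) for s in samples)
    radius = distances[k - 1]  # distance to the k-th nearest neighbor
    volume = 2.0 * radius      # interval [x - radius, x + radius]
    return (k / n) / volume

samples = [0.0, 0.1, 0.2, 0.3, 2.0, 4.0]  # hypothetical data: dense near 0, sparse elsewhere
p_dense = knn_density_1d(0.15, samples, k=2)   # small volume needed to capture 2 samples
p_sparse = knn_density_1d(3.0, samples, k=2)   # volume must grow to reach 2 samples
print(p_dense, p_sparse)
```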
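◮ Posterior probabilities can be estimated from a set of n labeled samples and can be used with the Bayesian decision rule for classification.
◮ Suppose that a volume V around x includes k samples, $k_i$ of which are labeled as belonging to class $w_i$.
◮ An estimate for the joint probability $p(x, w_i)$ becomes $p_n(x, w_i) = \frac{k_i/n}{V}$, and gives an estimate for the posterior probability $P_n(w_i \mid x) = \frac{p_n(x, w_i)}{\sum_{j=1}^{c} p_n(x, w_j)} = \frac{k_i}{k}$.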
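The posterior estimate k_i/k leads directly to the k-nearest-neighbor classification rule: assign x to the class with the most representatives among its k nearest samples. A minimal 1-D sketch, with hypothetical function names and data:

```python
from collections import Counter

def knn_posteriors(x, labeled_samples, k):
    """Estimate P(w_i | x) = k_i / k from the k nearest labeled samples (1-D features)."""
    nearest = sorted(labeled_samples, key=lambda pair: abs(pair[0] - x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}

def knn_classify(x, labeled_samples, k):
    """Bayesian decision rule with the k-NN posterior estimates: choose the largest k_i / k."""
    posteriors = knn_posteriors(x, labeled_samples, k)
    return max(posteriors, key=posteriors.get)

# Hypothetical labeled samples as (feature, class) pairs.
data = [(0.0, "w1"), (0.3, "w1"), (0.5, "w2"), (1.0, "w1"),
        (1.8, "w2"), (2.1, "w2"), (2.4, "w2")]

print(knn_posteriors(0.4, data, k=3))  # two w1 and one w2 among the 3 nearest
print(knn_classify(2.0, data, k=3))
```

Classes with no representative among the k neighbors are simply absent from the returned dictionary, which corresponds to an estimated posterior of zero.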
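◮ Advantages:
  ◮ No assumptions are needed about the distributions ahead of time (generality).
  ◮ With enough samples, convergence to an arbitrarily complicated target density can be obtained.
◮ Disadvantages:
  ◮ The number of samples needed may be very large (the number grows exponentially with the dimensionality of the feature space).
  ◮ There may be severe requirements for computation time and storage.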