Non-Parametric Models
Review of last class: Decision Tree Learning
- dealing with the overfitting problem: pruning
- ensemble learning
- boosting
Agenda
- Nearest neighbor models
- Finding nearest neighbors with kd trees
- Locality-sensitive hashing
- Nonparametric regression
Non-Parametric Models
- doesn’t mean that the model lacks parameters
- the parameters are just not known or fixed in advance
- makes no assumptions about the form of the probability distributions
- instead, the structure is determined from the data
Comparison of Models
Parametric
- data summarized by a fixed set of parameters
- once learned, the original data can be discarded
- good when the data set is relatively small – avoids overfitting
- best when correct parameters are chosen!
Non-Parametric
- data summarized by an unknown (or non-fixed) set of parameters
- must keep the original data to make predictions or to update the model
- may be slower, but generally more accurate
Instance-Based Learning
Decision Trees
- examples (training set) described by:
  - input: the values of attributes
  - output: the classification (yes/no)
- can represent any Boolean function
Another NPM approach: Nearest neighbor (k-NN) models
- given a query x_q
- answer the query by finding the k examples nearest to x_q
- classification: take a plurality vote (majority, for binary classification) of the neighbors
- regression: take the mean or median of the neighbor values (see the sketch below)
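As a rough illustration of both modes, here is a minimal brute-force k-NN predictor; the function name, the `mode` switch, and the choice of Euclidean distance are assumptions for this sketch, not the lecture's code:

```python
import numpy as np

def knn_predict(X_train, y_train, x_q, k=5, mode="classify"):
    """Brute-force k-NN: Euclidean distance from the query to every example."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    idx = np.argsort(dists)[:k]                    # the k nearest examples
    if mode == "classify":
        labels, counts = np.unique(y_train[idx], return_counts=True)
        return labels[np.argmax(counts)]           # plurality vote
    return y_train[idx].mean()                     # regression: mean of neighbors
```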
Example: Earthquake or Bomb?
Modeling the data with k-NN
(figure: k-NN decision boundaries on the earthquake/bomb data for k = 1 and k = 5)
Measuring “nearest”
- Minkowski distance calculated over each attribute (or dimension) i:

  $L_p(x_j, x_q) = \left( \sum_i |x_{j,i} - x_{q,i}|^p \right)^{1/p}$

- p = 2: Euclidean distance – typically used if dimensions measure similar properties (e.g., width, height, depth)
- p = 1: Manhattan distance – if dimensions measure dissimilar properties (e.g., age, weight, gender)
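A direct transcription of the L_p formula (the helper name is ours):

```python
import numpy as np

def minkowski(x_j, x_q, p=2):
    """L_p (Minkowski) distance; p=2 is Euclidean, p=1 is Manhattan."""
    diff = np.abs(np.asarray(x_j, float) - np.asarray(x_q, float))
    return float(np.sum(diff ** p) ** (1.0 / p))

print(minkowski([0, 0], [3, 4], p=2))   # 5.0 (Euclidean)
print(minkowski([0, 0], [3, 4], p=1))   # 7.0 (Manhattan)
```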
Recall a problem we faced before
- the shape of the data looks very different depending on the scale
- e.g., height vs. weight, with height in mm or km
- similarly, with k-NN, if we change the scale, we’ll end up with different neighbors
Simple solution
- simple solution is to normalize each dimension:

  $x'_{j,i} = (x_{j,i} - \mu_i) / \sigma_i$
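In code, the per-dimension z-scoring looks like this (a sketch; assumes the examples are rows of a NumPy array):

```python
import numpy as np

def normalize(X):
    """Rescale each dimension to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```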
Example: Density estimation
(figure: a 128-point sample, its mixture-of-Gaussians (MoG) representation, and the smallest circles enclosing 10 neighbours)
Density Estimation using k-NN
- # of neighbours impacts quality of estimation
(figure: k-NN density estimates for k = 3, 10, and 40, compared to the ground truth)
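The slides don't spell out the estimator, but the enclosing circles suggest the standard k-NN density estimate p(x) ≈ k / (N · V), where V is the volume (here, area) of the smallest ball around the query that encloses its k nearest neighbours. A 2-D sketch under that assumption:

```python
import numpy as np

def knn_density(X, x_q, k=10):
    """k-NN density estimate in 2-D: k / (N * area of the smallest circle
    around x_q enclosing its k nearest neighbours)."""
    dists = np.sort(np.linalg.norm(X - x_q, axis=1))
    r = dists[k - 1]                       # radius of the enclosing circle
    return k / (len(X) * np.pi * r**2)
```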
Curse of dimensionality
- we want to find the k = 10 nearest neighbors among N = 1,000,000 points of an n-dimensional space
- sounds easy, right?
- volume of the neighborhood (as a fraction of a unit hypercube) is k/N
- average side length l of the neighborhood is $(k/N)^{1/n}$

  n  : l
  1  : .00001
  2  : .003
  3  : .02
  10 : .3
  20 : .56
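A quick check of the table values:

```python
k, N = 10, 1_000_000
for n in (1, 2, 3, 10, 20):
    l = (k / N) ** (1 / n)   # side length enclosing a fraction k/N of the volume
    print(f"n = {n:2d}   l = {l:.5f}")
```

Already at n = 10, each neighborhood spans roughly a third of the space in every dimension, so "nearest" neighbors are not very near at all.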
k-dimensional (kd) trees
- balanced binary tree with an arbitrary # of dimensions
- data structure that allows efficient lookup of nearest neighbors (when # of examples >> k)
- recursively divides the data into left and right branches based on the value of dimension i
k-dimensional (kd) trees
- the query value might be on the left half of a divide but have some of its k nearest neighbors on the right half
- decide whether to inspect the right half based on the distance of the best match found from the dividing hyperplane (see the sketch below)
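A compact sketch of both ideas: build by splitting on the median of one dimension at a time, then search the near side first and inspect the far side only when the dividing hyperplane is closer than the best match so far. The dict-based nodes and the 1-NN restriction are simplifications, not the lecture's implementation:

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Recursively split on the median of one dimension, cycling dimensions."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left":  build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """1-NN search with hyperplane pruning."""
    if node is None:
        return best
    if best is None or (np.linalg.norm(node["point"] - query)
                        < np.linalg.norm(best - query)):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    if abs(diff) < np.linalg.norm(best - query):  # hyperplane may hide a closer point
        best = nearest(far, query, best)
    return best
```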
Locality-Sensitive Hashing (LSH)
- uses a combination of n random projections, built from subsets of the bit-string representation of each value
- each value is stored in the hash bucket associated with each of its n projections
Locality-Sensitive Hashing (LSH)
- on search, the sets of points from all hash buckets corresponding to the query are combined
- then measure the distance from the query value to each of the returned values (see the sketch below)
- real-world example:
  - data set of 13 million samples of 512 dimensions
  - LSH only needs to examine a few thousand images
  - a 1000-fold improvement over kd trees!
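A minimal bit-sampling LSH index along the lines described: each of n projections keeps a random subset of bit positions, each point is stored in one bucket per projection, and a query unions its buckets before measuring exact distances. The class and parameter names are hypothetical:

```python
import numpy as np

class BitSamplingLSH:
    """Bit-sampling LSH: points that agree on a projection's sampled
    bit positions land in the same bucket of that projection's table."""
    def __init__(self, n_bits, n_projections=10, bits_per_projection=8, seed=0):
        rng = np.random.default_rng(seed)
        self.masks = [rng.choice(n_bits, bits_per_projection, replace=False)
                      for _ in range(n_projections)]
        self.tables = [dict() for _ in range(n_projections)]
        self.points = []

    def _keys(self, bits):
        return [tuple(bits[m]) for m in self.masks]

    def insert(self, bits):
        idx = len(self.points)
        self.points.append(bits)
        for table, key in zip(self.tables, self._keys(bits)):
            table.setdefault(key, []).append(idx)

    def query(self, bits, k=1):
        # union of all buckets the query hashes to, then exact Hamming distance
        candidates = set()
        for table, key in zip(self.tables, self._keys(bits)):
            candidates.update(table.get(key, []))
        return sorted(candidates,
                      key=lambda i: np.sum(self.points[i] != bits))[:k]
```

For data like the example above, one would build the index with `BitSamplingLSH(n_bits=512)` and insert each sample's 512-bit code.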
Nonparametric Regression Models
- let’s see how different NPM strategies fare on a regression problem
(figures: piecewise linear regression, the 3-NN average, and linear regression through the 3 nearest neighbors)
Local weighting of data with a kernel
(figure: a quadratic kernel of width k = 10, and the resulting locally weighted regression fit)
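A sketch of locally weighted regression at a single query point: weight each training example by the kernel, then solve a weighted least-squares line and evaluate it at the query. The quadratic kernel form max(0, 1 − (2|d|/k)²) is the one used in the AIMA example this figure follows, assumed here; note the solve fails if every weight is zero (the query is outside the kernel's reach):

```python
import numpy as np

def quadratic_kernel(d, width=10.0):
    """1 at d = 0, falling to 0 at |d| = width/2."""
    return np.maximum(0.0, 1.0 - (2.0 * np.abs(d) / width) ** 2)

def locally_weighted_fit(x_train, y_train, x_q, width=10.0):
    """Fit a weighted least-squares line around the query, evaluate it there."""
    w = quadratic_kernel(x_train - x_q, width)
    A = np.column_stack([np.ones_like(x_train), x_train])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_train)
    return beta[0] + beta[1] * x_q
```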
Comparison
(figure, side by side: connect-the-dots, 3-NN average, 3-NN linear regression, and locally weighted regression with a quadratic kernel of width k = 10)
Next class
- Statistical learning methods, Ch. 20