Introduction to Machine Learning: Random Forests: Proximities

Introduction to Machine Learning
Random Forests: Proximities
compstat-lmu.github.io/lecture_i2ml

RANDOM FOREST PROXIMITIES

- One of the most useful tools in random forests.
- A measure of similarity ("closeness" or "nearness") of observations, derived from the random forest, can be calculated for each pair of observations.
- Definition: the proximity between two observations $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ is the number of trees in the random forest in which the two observations are placed in the same terminal node, divided by the number of trees in the forest.
- The proximity of observations $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ is written as $\operatorname{prox}\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right)$.
- The proximities form an intrinsic similarity measure between pairs of observations.
- The proximities of all observations form a symmetric $n \times n$ matrix.

© Introduction to Machine Learning – 1 / 6
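Written out as a formula, the definition reads as follows, where $T$ is the number of trees and $L_t(\mathbf{x})$ denotes the terminal node of tree $t$ that $\mathbf{x}$ falls into (both symbols are notation introduced here, not taken from the slides):

```latex
\operatorname{prox}\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right)
  = \frac{1}{T} \sum_{t=1}^{T}
    \mathbb{1}\left[ L_t\left(\mathbf{x}^{(i)}\right) = L_t\left(\mathbf{x}^{(j)}\right) \right]
```

Since each summand is 0 or 1, every proximity lies in $[0, 1]$, and the diagonal of the matrix is $\operatorname{prox}\left(\mathbf{x}^{(i)}, \mathbf{x}^{(i)}\right) = 1$.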

RANDOM FOREST PROXIMITIES

Algorithm:
- Once a random forest has been trained, all of the training data is passed down each tree (both in-bag and out-of-bag observations).
- Every time two observations $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ end up in the same terminal node of a tree, their proximity is increased by one.
- Once all data has been passed through all trees and the proximities have been counted, the proximities are normalized by dividing them by the number of trees.
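The counting procedure above can be sketched in Python as follows. The sketch assumes the terminal-node id of every observation in every tree has already been extracted into a list of lists `leaves` (a hypothetical input format); with scikit-learn, `RandomForestClassifier.apply(X)` returns an array of exactly this shape, `(n_samples, n_estimators)`. The function name `proximity_matrix` is illustrative.

```python
from typing import Dict, List, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Random forest proximities from terminal-node assignments.

    leaves[i][t] is the id of the terminal node that observation i reaches
    in tree t (assumed input format).
    """
    n = len(leaves)
    n_trees = len(leaves[0]) if n else 0
    prox = [[0.0] * n for _ in range(n)]
    for t in range(n_trees):
        # Group observations by the terminal node they land in for tree t.
        nodes: Dict[int, List[int]] = {}
        for i in range(n):
            nodes.setdefault(leaves[i][t], []).append(i)
        # Every pair sharing a terminal node gets +1 (including i with itself).
        for members in nodes.values():
            for i in members:
                for j in members:
                    prox[i][j] += 1.0
    # Normalize the counts by the number of trees.
    if n_trees:
        for i in range(n):
            for j in range(n):
                prox[i][j] /= n_trees
    return prox
```

For example, with two trees and `leaves = [[0, 3], [0, 4], [1, 4]]`, observations 0 and 1 share a node only in the first tree and observations 1 and 2 only in the second, so both pairs get proximity 0.5, while observations 0 and 2 never meet and get 0. Grouping by node keeps the per-tree work proportional to the squared node sizes, but the result is still a dense $n \times n$ matrix, so memory grows quadratically in $n$.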

USING RANDOM FOREST PROXIMITIES

Imputing missing data:
1. Replace the missing values of a given variable with the median of its non-missing values.
2. Fit a random forest on the completed data and compute the proximities.
3. Replace the missing values in observation $\mathbf{x}^{(i)}$ by a weighted average of the non-missing values, with weights proportional to the proximity between $\mathbf{x}^{(i)}$ and the observations with non-missing values.
Steps 2 and 3 are then iterated a few times.

Locating outliers:
- An outlier is an observation whose proximities to all other observations are small.
- A measure of outlyingness can be computed for each observation in the training sample.
- If the measure is unusually large, the observation should be inspected carefully.
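A minimal Python sketch of imputation steps 1 and 3 for one numeric variable, plus a simple outlyingness score. Assumptions: missing values are encoded as `None`, the proximity matrix is a list of lists, and step 2 (refitting the forest on the completed data and recomputing the proximities) is left to whatever forest implementation is used. The function names are hypothetical, and the outlier score only follows the idea of Breiman and Cutler's measure (the inverse of the summed squared proximities to the other observations of the same class); their additional per-class normalization is omitted.

```python
import statistics
from typing import List, Optional, Sequence

Column = Sequence[Optional[float]]  # None marks a missing value
Matrix = Sequence[Sequence[float]]


def median_fill(column: Column) -> List[float]:
    """Step 1: replace missing values by the median of the observed values."""
    med = statistics.median([v for v in column if v is not None])
    return [float(med) if v is None else float(v) for v in column]


def proximity_fill(column: Column, prox: Matrix) -> List[float]:
    """Step 3: proximity-weighted average of the observed values.

    `column` is the ORIGINAL column (None = missing); `prox` is assumed to
    come from a forest fitted on the current completed data (step 2).
    """
    observed = [k for k, v in enumerate(column) if v is not None]
    fallback = float(statistics.median([column[k] for k in observed]))
    filled = []
    for i, v in enumerate(column):
        if v is not None:
            filled.append(float(v))
            continue
        weights = [prox[i][k] for k in observed]
        total = sum(weights)
        if total > 0:
            filled.append(sum(w * column[k] for w, k in zip(weights, observed)) / total)
        else:
            # Hypothetical fallback: no proximity to any observed value.
            filled.append(fallback)
    return filled


def raw_outlyingness(prox: Matrix, labels: Sequence[int]) -> List[float]:
    """Inverse of the summed squared proximities to the other observations
    of the same class; large values flag candidate outliers."""
    scores = []
    for i, yi in enumerate(labels):
        s = sum(prox[i][k] ** 2 for k, yk in enumerate(labels) if k != i and yk == yi)
        scores.append(float("inf") if s == 0 else 1.0 / s)
    return scores
```

A full imputation loop would fit the forest on the output of `median_fill`, compute proximities, call `proximity_fill` on the original column, refit on the result, and repeat a few times.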

USING RANDOM FOREST PROXIMITIES

Identifying mislabeled data:
- Instances in the training data set are sometimes labeled ambiguously or incorrectly, especially in manually created data sets.
- Proximities can help find them: such instances often show up as outliers in terms of their proximity values.

Visualizing the forest:
- The values $1 - \operatorname{prox}\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right)$ can be thought of as distances in a high-dimensional space.
- They can be projected onto a low-dimensional space using metric multidimensional scaling (MDS).
- Metric MDS uses the eigenvectors of a modified (double-centered) version of the proximity matrix to obtain the scaling coordinates.
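The projection step can be sketched with NumPy as textbook classical (metric) MDS applied to the distances $1 - \operatorname{prox}$: double-center the squared distances and scale the top eigenvectors. The function name `proximity_mds` is hypothetical, and implementations differ in exactly which modified matrix they decompose, so coordinates may not match a given package's output (they are also only defined up to sign).

```python
import numpy as np


def proximity_mds(prox, k: int = 2) -> np.ndarray:
    """Classical MDS of the distances 1 - prox into k dimensions."""
    prox = np.asarray(prox, dtype=float)
    n = prox.shape[0]
    d2 = (1.0 - prox) ** 2                       # squared distances
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered inner products
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]             # indices of the k largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # drop negative eigenvalues
    return vecs[:, top] * scale                  # n x k scaling coordinates
```

With two groups whose members always share terminal nodes with each other and never with the other group, the first coordinate places each group at a single point, one unit apart, and the second coordinate is zero.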

USING RANDOM FOREST PROXIMITIES

[Figure: image from G. Louppe (2014), Understanding Random Forests, arXiv:1407.7502.]

USING RANDOM FOREST PROXIMITIES

- The figure depicts the proximity matrix learned for a 10-class handwritten digit classification task.
- The proximity-matrix distances are projected onto the plane using multidimensional scaling.
- Samples from the same class form identifiable clusters, which suggests that they share similar structure.
- The projection also shows for which classes errors are made; e.g., digits 1 and 8 have high within-class variance and overlap with other classes.
