SLIDE 1

Random Forests vs. Deep Learning

Christian Wolf, Université de Lyon, INSA-Lyon, LIRIS UMR CNRS 5205. November 26th, 2015

SLIDE 2

RF vs. DL

[Figure: a random forest, trees 1 … T, each mapping an input (I, x) to a leaf distribution P_t(c | I, x)]

[Figure: a multi-path deep convolutional network (paths V, M, A; conv layers ConvC/ConvD/ConvA, max pooling, shared hidden layers HLS) acting as a learned feature extractor]

Goal: prediction (classification, regression)

SLIDE 3

Deep networks

  • Many layers, many parameters … and all of them are used at test time, for each single sample!
  • Feature learning is integrated into classification
  • End-to-end training, using the gradient of the loss function

[Figure: the same multi-path deep network, with per-layer details in the table below]

| Layer | Filter size / n.o. units | N.o. parameters | Pooling |
|---|---|---|---|
| Paths V1, V2 | | | |
| Input D1, D2 | 72×72×5 | | 2×2×1 |
| ConvD1 | 25×5×5×3 | 1 900 | 2×2×3 |
| ConvD2 | 25×5×5 | 650 | 1×1 |
| Input C1, C2 | 72×72×5 | | 2×2×1 |
| ConvC1 | 25×5×5×3 | 1 900 | 2×2×3 |
| ConvC2 | 25×5×5 | 650 | 1×1 |
| HLV1 | 900 | 3 240 900 | |
| HLV2 | 450 | 405 450 | |
| Path M | | | |
| Input M | 183 | | |
| HLM1 | 700 | 128 800 | |
| HLM2 | 700 | 490 700 | |
| HLM3 | 350 | 245 350 | |
| Path A | | | |
| Input A | 40×9 | | 1×1 |
| ConvA1 | 25×5×5 | 650 | 1×1 |
| HLA1 | 700 | 3 150 000 | |
| HLA2 | 350 | 245 350 | |
| Shared layers | | | |
| HLS1 | 1600 | 3 681 600 | |
| HLS2 | 84 | 134 484 | |
| Output layer | 21 | 1 785 | |

  • 12.4M parameters per scale × 3 scales = 37.2M parameters total!
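As a sanity check on the table, a fully-connected layer with n_in inputs and n_out units has (n_in + 1) · n_out parameters (weights plus biases). A minimal sketch, using two rows of the table:

```python
# A fully-connected layer with n_in inputs and n_out units stores
# (n_in + 1) * n_out parameters (weights + biases).
def dense_params(n_in: int, n_out: int) -> int:
    return (n_in + 1) * n_out

# HLS2 (84 units) sits on top of HLS1 (1600 units):
assert dense_params(1600, 84) == 134_484  # matches the table
# The 21-unit output layer sits on top of HLS2 (84 units):
assert dense_params(84, 21) == 1_785      # matches the table
```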
SLIDE 4

Random Forests

  • Many levels, many parameters … but only log2(N) of them are used at test time! (see the sketch below)
  • Training is done layer-wise, not end-to-end: no gradient on the objective function
  • No/limited feature learning

[Figure: a random forest, trees 1 … T mapping (I, x) to P_t(c | I, x)]
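To make the test-time complexity concrete, here is a minimal sketch (a toy complete binary tree with random splits, not one of the models above): the tree stores 2^20 − 1 split parameters, but a single prediction evaluates only the 20 splits on its root-to-leaf path.

```python
import numpy as np

DEPTH = 20
N_SPLITS = 2**DEPTH - 1                    # ~1M split nodes stored in the tree
rng = np.random.default_rng(0)
feature = rng.integers(0, 64, N_SPLITS)    # feature index tested at each split
thresh = rng.random(N_SPLITS)              # threshold at each split

def predict_path(x):
    """Route x from the root; only DEPTH of the ~1M parameters are read."""
    node, visited = 0, 0
    while node < N_SPLITS:                 # internal nodes come first
        visited += 1
        go_right = x[feature[node]] >= thresh[node]
        node = 2 * node + (2 if go_right else 1)   # heap-style child indices
    return node - N_SPLITS, visited        # leaf index, n.o. splits evaluated

leaf, visited = predict_path(rng.random(64))
assert visited == DEPTH                    # 20 comparisons, not 2**20 - 1
```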

SLIDE 5

RF vs. DL: applications (1)

[Shotton et al., CVPR 2011] (Microsoft Research) [Neverova, Wolf, Nebout, Taylor, under review, arXiv 2015]

Full-body pose with a random forest: 3 trees, depth 20, >10M parameters [Shotton et al.]
Hand pose with a deep network: semi-/weakly-supervised training, 8 layers, ~5M parameters [Neverova et al.]

SLIDE 6

RF vs. DL: applications (2)

[Kontschieder et al., CVPR 2014] (Microsoft Research) [Fourure, Emonet, Fromont, Muselet, Tremeau, Wolf, under review]

Scene parsing with structured random forests [Kontschieder et al.]
Scene parsing with deep networks: 5 layers, ~2M parameters [Fourure et al.]

SLIDE 7

Types of random forests

  • Classical random forests
  • Structured random forests
  • Neural random forests
  • Deep convolutional random forests

[Figure: one thumbnail per variant: a classical tree routing (I, x) to P_t(c | I, x), a soft tree with decision nodes d_n and leaf distributions π_n, and a neural split function f(0) : X → R3]

SLIDE 8

Example for classical RF

Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake (Microsoft Research Cambridge & Xbox Incubation) [Shotton et al., CVPR 2011] (Best Paper!)

SLIDE 9

Depth images → 3D joint locations

depth image → body parts → 3D joint proposals

[Shotton et al., CVPR 2011]

SLIDE 10

synthetic (train & test)

real (test)

31 body parts (labels)

[Shotton et al., CVPR 2011]

SLIDE 11

Classification with random forests

[Figure: trees 1 … T, each routing the input (I, x) to a leaf distribution P_t(c | I, x)]

Each split node thresholds one of the features. Each leaf node contains a class distribution P_t(c | I, x). Class distributions are averaged over the trees:

P(c | I, x) = (1/T) · Σ_{t=1}^{T} P_t(c | I, x)

[Shotton et al., CVPR 2011]
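A minimal sketch of this averaging step, with hypothetical leaf posteriors from T = 3 trees over C = 4 body-part classes:

```python
import numpy as np

# Hypothetical leaf distributions P_t(c | I, x) for one pixel (I, x),
# one row per tree; each row sums to 1.
per_tree = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.5, 0.3, 0.1, 0.1],
                     [0.6, 0.2, 0.1, 0.1]])

# P(c | I, x) = (1/T) * sum_t P_t(c | I, x)
forest_posterior = per_tree.mean(axis=0)
assert np.isclose(forest_posterior.sum(), 1.0)
print(forest_posterior.argmax())  # predicted body-part label for this pixel
```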

SLIDE 12

Learning & Entropy

A good split function minimizes entropy in the label distributions.

Example: Q = (0.2, 0.4, 0.4) splits into Ql(θ) = (0.33, 0.66, 0.01) and Qr(θ) = (0.33, 0.01, 0.66).
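A minimal sketch of the entropy computation for these distributions; weighting the two children 50/50 is an assumption (in general they are weighted by |Ql|/|Q| and |Qr|/|Q|):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

Q  = [0.2, 0.4, 0.4]     # parent label distribution
Ql = [0.33, 0.66, 0.01]  # left child after the split
Qr = [0.33, 0.01, 0.66]  # right child after the split

print(entropy(Q))                             # ~1.52 bits
print(0.5 * entropy(Ql) + 0.5 * entropy(Qr))  # ~0.99 bits: the split helps
```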

SLIDE 13

Random forests: learning algorithm

  • 1. Randomly propose a set of splitting candidates φ = (θ, τ) (feature parameters θ and thresholds τ).
  • 2. Partition the set of examples Q = {(I, x)} into left and right subsets by each φ:
    Ql(φ) = { (I, x) | f_θ(I, x) < τ }    (3)
    Qr(φ) = Q \ Ql(φ)    (4)
  • 3. Compute the φ giving the largest gain in information:
    φ* = argmax_φ G(φ)    (5)
    G(φ) = H(Q) − Σ_{s∈{l,r}} (|Qs(φ)| / |Q|) · H(Qs(φ))    (6)
    where the Shannon entropy H(Q) is computed on the normalized histogram of body-part labels l_I(x) for all (I, x) ∈ Q.
  • 4. If the largest gain G(φ*) is sufficient, and the depth in the tree is below a maximum, then recurse for the left and right subsets Ql(φ*) and Qr(φ*).


Training:

  • 3 trees
  • depth 20
  • 1,000,000 images
  • 2000 candidate features
  • 50 thresholds per feature

1 day on a 1000-core cluster

[Shotton et al., CVPR 2011]
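A minimal sketch of steps 1-3 for one node, assuming scalar feature responses f_θ(I, x) and integer labels (toy data, not the paper's depth features):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(labels, n_classes):
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_split(responses, labels, n_classes, n_thresholds=50):
    """responses[i] = f_theta(I_i, x_i) for one candidate feature theta."""
    H_Q = entropy(labels, n_classes)
    best_gain, best_tau = -np.inf, None
    for tau in rng.choice(responses, size=n_thresholds, replace=False):
        left = responses < tau                        # Q_l(phi), Eq. (3)
        if left.all() or not left.any():
            continue
        H_split = (left.mean() * entropy(labels[left], n_classes)
                   + (1 - left.mean()) * entropy(labels[~left], n_classes))
        gain = H_Q - H_split                          # G(phi), Eq. (6)
        if gain > best_gain:
            best_gain, best_tau = gain, tau           # phi*, Eq. (5)
    return best_gain, best_tau

responses = rng.random(1000)
labels = ((responses > 0.6) ^ (rng.random(1000) < 0.1)).astype(int)  # noisy
print(best_split(responses, labels, n_classes=2))     # tau should be near 0.6
```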

SLIDE 14

Examples

Figure 5. Example inferences. Synthetic (top row); real (middle); failure modes (bottom)

[Shotton et al., CVPR 2011]

SLIDE 15

Dependence of results on hyper-parameters

[Shotton et al., CVPR 2011]

SLIDE 16

Types of random forests

  • Classical random forests
  • Structured random forests
  • Neural random forests
  • Deep convolutional random forests

[Figure: thumbnails of the four variants, as on SLIDE 7]

SLIDE 17

Structured random forests

In the classical version, the decision (= leaf) nodes contain predictions for a single pixel (a label or a posterior distribution). In the structured version, a decision node is assigned a rectangular patch of predictions.

[Figure 1: training data example with a label patch p around pixel x and offsets (u, v), as used in the proposed approach]

[Kontschieder et al., ICCV 2011]

SLIDE 18

Structured version : integration

Integration over multiple pixels by vote:

[Kontschieder et al., ICCV 2011]
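A minimal sketch of this integration, under simplifying assumptions (a single tree, a P×P label patch predicted at every pixel, majority vote per output pixel):

```python
import numpy as np

H, W, P, C = 64, 64, 11, 8      # image size, patch side, number of labels
r = P // 2
rng = np.random.default_rng(0)

# Hypothetical structured predictions: the leaf reached by pixel (u, v)
# predicts a P x P block of labels centered on (u, v).
patch_pred = rng.integers(0, C, size=(H, W, P, P))

votes = np.zeros((H, W, C), dtype=int)
for u in range(H):
    for v in range(W):
        for du in range(-r, r + 1):          # scatter the patch as votes
            for dv in range(-r, r + 1):
                uu, vv = u + du, v + dv
                if 0 <= uu < H and 0 <= vv < W:
                    votes[uu, vv, patch_pred[u, v, du + r, dv + r]] += 1

labeling = votes.argmax(axis=-1)             # majority label per pixel
```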

SLIDE 19

Types of random forests

  • Classical random forests
  • Structured random forests
  • Neural random forests
  • Deep convolutional random forests

[Figure: thumbnails of the four variants, as on SLIDE 7]

SLIDE 20

[Rota Bulo and Kontschieder, CVPR 2014]

Neural Decision Forests for Semantic Image Labelling

Samuel Rota Bulò
Fondazione Bruno Kessler, Trento, Italy
rotabulo@fbk.eu

Peter Kontschieder
Microsoft Research, Cambridge, UK
pekontsc@microsoft.com

SLIDE 21

Neural split functions

Classical random forest with neural split functions

[Figure: a small MLP used as split function; the "+1" units are biases]

f(0) : X → R3 (feature extraction)
f(1) : R3 → R4, with W(1) ∈ R4×4 (3 inputs + bias)
f(2) : R4 → R, with W(2) ∈ R5×1 (4 inputs + bias), producing the split response f(x)

[Rota Bulo and Kontschieder, CVPR 2014]
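A minimal sketch of one such split node with the shapes from the figure; the tanh activation and the zero decision threshold are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 4))   # (3 features + bias) -> 4 hidden units
W2 = rng.standard_normal((5, 1))   # (4 hidden + bias)   -> 1 split response

def neural_split(phi):
    """phi = f(0)(I, x) in R^3; returns True to route the sample left."""
    h = np.tanh(np.append(phi, 1.0) @ W1)     # f(1): R^3 -> R^4
    f = (np.append(h, 1.0) @ W2).item()       # f(2): R^4 -> R
    return f < 0.0

print(neural_split(rng.standard_normal(3)))
```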

SLIDE 22

Learning the neural split function (1)

Probabilistic loss function:

Q(Θ) = max_π P[y | X, π, Θ],  P[y | X, π, Θ] = Π_{s=1}^{n} P[y_s | x_s, π, Θ]

(samples are independent), where π = (π(L), π(R)) are the latent label distributions routed to the left and right child nodes, y are the labels, x_s the node's input samples, and Θ the network parameters.

P[y_s | x_s, π, Θ] = Σ_{d∈{L,R}} P[y_s | ψ_s = d, π] · P[ψ_s = d | x_s, Θ] = Σ_{d∈{L,R}} π(d)_{y_s} · f_d(x_s | Θ)

[Rota Bulo and Kontschieder, CVPR 2014]

SLIDE 23

Learning the neural split function (2)

The learning procedure alternates between two steps of optimizing Q(Θ) = max_π P[y | X, π, Θ]:

1. Update the child distributions π
2. Update the network parameters Θ (backprop)

[Rota Bulo and Kontschieder, CVPR 2014]
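A minimal sketch of this alternation for a single split node, under toy assumptions: a linear scoring function with f_L(x | Θ) = σ(w·x) and f_R = 1 − f_L, π updated in closed form from the soft routing, and Θ = w updated by gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, C = 500, 5, 3
X = rng.standard_normal((n, dim))
y = np.digitize(X[:, 0], [-0.5, 0.5])     # 3 classes tied to the 1st feature

w = rng.standard_normal(dim) * 0.1        # split-node parameters (Theta)
pi = np.full((2, C), 1.0 / C)             # pi[0]: left leaf, pi[1]: right leaf

for _ in range(200):
    dL = 1.0 / (1.0 + np.exp(-X @ w))     # P[sample routed left | x, Theta]
    f = np.stack([dL, 1.0 - dL])          # f_d(x | Theta), shape (2, n)
    # Step 1: leaf label distributions from soft routing counts.
    for d in range(2):
        counts = np.bincount(y, weights=f[d], minlength=C)
        pi[d] = counts / counts.sum()
    # Step 2: gradient-ascent step on sum_s log P[y_s | x_s, pi, Theta].
    lik = pi[0, y] * f[0] + pi[1, y] * f[1]          # per-sample likelihood
    dlik_ddL = pi[0, y] - pi[1, y]                   # d lik / d dL
    w += 0.05 * ((dlik_ddL / lik * dL * (1 - dL)) @ X) / n

print("mean log-likelihood:", np.log(lik).mean())    # should have increased
```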

SLIDE 24

Results on semantic labelling

| Method | ETRIMS8 Global | ETRIMS8 Class-Avg | ETRIMS8 Jaccard | CAMVID Global | CAMVID Class-Avg | CAMVID Jaccard |
|---|---|---|---|---|---|---|
| RF Baseline | 64.5 ±1.6 | 59.6 ±1.7 | 40.3 ±1.1 | 64.0 | 41.6 | 27.2 |
| NDF_P | 69.8 ±1.8 | 64.3 ±2.2 | 45.0 ±1.9 | 67.4 | 46.5 | 30.8 |
| NDF_MLP | 68.9 ±2.0 | 62.4 ±2.3 | 44.2 ±2.1 | 67.1 | 44.4 | 30.1 |
| NDF_MLPC | 69.7 ±1.7 | 62.5 ±2.1 | 44.7 ±1.9 | 67.4 | 44.2 | 30.2 |
| NDF_MLPC-ℓ1 | 71.7 ±2.0 (+7.2) | 65.3 ±2.3 (+5.7) | 46.9 ±2.0 (+6.6) | 69.0 (+5.0) | 46.8 (+5.2) | 31.7 (+4.5) |
| RF Baseline | 72.2 ±1.9 | 68.0 ±0.8 | 47.5 ±1.0 | 68.5 | 50.3 | 32.4 |
| NDF_MLPC-ℓ1 | 80.8 ±0.7 (+8.6) | 74.6 ±0.7 (+6.6) | 56.9 ±1.2 (+9.4) | 82.1 (+13.6) | 56.1 (+5.8) | 43.3 (+10.9) |

Literature baselines, as reported on the slide: Best RF in [13]: 76.1 / 72.3; Best in [14]: 75.1 / 72.4; Best RF in [19]: 38.3; Best RF in [20]: 72.5 / 51.4 / 36.4; Best in [8]: 69.1 / 53.0; Best in [35]: 73.7 / 36.3 / 29.6.

Figure 1. Example input RGB image and learned representations of the rMLP taken from a hidden layer, visualized using heat-maps.

[Rota Bulo and Kontschieder, CVPR 2014]

SLIDE 25

Types of random forests

  • Classical random forests
  • Structured random forests
  • Neural random forests
  • Deep convolutional random forests

[Figure: thumbnails of the four variants, as on SLIDE 7]

SLIDE 26

Deep Neural Decision Forests

Peter Kontschieder 1, Madalina Fiterau ∗,2, Antonio Criminisi 1, Samuel Rota Bulò 1,3
Microsoft Research 1 (Cambridge, UK), Carnegie Mellon University 2 (Pittsburgh, PA), Fondazione Bruno Kessler 3 (Trento, Italy)

[Kontschieder et al., ICCV 2015]

One model to rule them all …

SLIDE 27

Goals

  • Combine neural networks and random forests
  • Advantage of NN: representation learning
  • Advantage of RF: divide and conquer
  • Differentiable loss function, allowing gradient backprop
  • "Backpropagation trees"

[Kontschieder et al., ICCV 2015]

SLIDE 28

Notation

Each prediction node ℓ ∈ L holds a probability distribution π_ℓ over Y.
Each decision node n ∈ N is a decision function d_n(·; Θ) : X → [0, 1], parametrized by Θ, which is responsible for routing samples x ∈ X through the tree.

[Kontschieder et al., ICCV 2015]

SLIDE 29

Stochastic decision functions

Decisions in split nodes are Bernoulli random variables!

The decision for sample x at node n is a Bernoulli random variable with mean d_n(x; Θ). When a sample ends in a leaf node, the tree prediction is given by the leaf distribution. Final prediction for sample x:

P_T[y | x, Θ, π] = Σ_{ℓ∈L} π_{ℓy} · μ_ℓ(x | Θ)

where π = (π_ℓ)_{ℓ∈L} and π_{ℓy} denotes the probability of a sample reaching leaf ℓ to take on class y, while μ_ℓ(x | Θ) is regarded as the routing function providing the probability that sample x will reach leaf ℓ. Clearly, Σ_ℓ μ_ℓ(x | Θ) = 1 for all x ∈ X.

[Kontschieder et al., ICCV 2015]

SLIDE 30

Stochastic decision functions

The routing function can be calculated with a single pass through the tree.

Figure 1. Each node n ∈ N of the tree performs routing decisions via a function d_n(·) (we omit the parametrization Θ). The black path shows an exemplary routing of a sample x along the tree to reach leaf ℓ4, which has probability μ_ℓ4 = d1(x) · d̄2(x) · d̄5(x), with d̄(x) = 1 − d(x).

[Kontschieder et al., ICCV 2015]

SLIDE 31

Illustration

[Figure: an FC layer of a deep CNN with parameters Θ provides the functions f_n; each output unit drives one decision node d_n of the trees]

d_n(x; Θ) = σ(f_n(x; Θ))    (3)

where σ(x) = (1 + e^{−x})^{−1} is the sigmoid function, and f_n(·; Θ) : X → R is a real-valued function depending on the sample and the parametrization Θ.

[Kontschieder et al., ICCV 2015]
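A minimal sketch tying the last three slides together for a depth-2 tree; the values of f_n are hypothetical stand-ins for the FC-layer outputs of the CNN:

```python
import numpy as np

def sigma(t):                      # d_n(x; Theta) = sigma(f_n(x; Theta))
    return 1.0 / (1.0 + np.exp(-t))

f = {1: 0.8, 2: -1.2, 3: 0.3}      # hypothetical f_n(x; Theta) for d1, d2, d3
d = {n: sigma(v) for n, v in f.items()}

# Routing probabilities mu_l: product of d (go left) and 1 - d (go right)
# along each root-to-leaf path of the depth-2 tree.
mu = np.array([
    d[1] * d[2],                   # leaf 1: left, left
    d[1] * (1 - d[2]),             # leaf 2: left, right
    (1 - d[1]) * d[3],             # leaf 3: right, left
    (1 - d[1]) * (1 - d[3]),       # leaf 4: right, right
])
assert np.isclose(mu.sum(), 1.0)   # sum_l mu_l(x | Theta) = 1

pi = np.array([[0.8, 0.1, 0.1],    # leaf distributions pi_l over 3 classes
               [0.2, 0.7, 0.1],
               [0.1, 0.2, 0.7],
               [0.3, 0.3, 0.4]])

# P_T[y | x, Theta, pi] = sum_l pi_{l y} * mu_l(x | Theta)
P = mu @ pi
assert np.isclose(P.sum(), 1.0)
```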

SLIDE 32

The deep network used: GoogleNet

[Kontschieder et al., ICCV 2015]

SLIDE 33

Results on ImageNet

| Model | # Models | # Crops | Top5-Error |
|---|---|---|---|
| GoogLeNet [36] | 1 | 1 | 10.07% |
| GoogLeNet [36] | 1 | 10 | 9.15% |
| GoogLeNet [36] | 1 | 144 | 7.89% |
| GoogLeNet [36] | 7 | 1 | 8.09% |
| GoogLeNet [36] | 7 | 10 | 7.62% |
| GoogLeNet [36] | 7 | 144 | 6.67% |
| GoogLeNet* | 1 | 1 | 10.02% |
| dNDF.NET | 1 | 1 | 7.84% |
| dNDF.NET | 1 | 10 | 7.08% |
| dNDF.NET | 7 | 1 | 6.38% |

Table 2. Top5-Errors obtained on ImageNet validation data, comparing our dNDF.NET to GoogLeNet(*).

[Kontschieder et al., ICCV 2015]

SLIDE 34

Leaf entropy during Training

[Plot: average leaf entropy (bits) over 1000 training epochs]

[Kontschieder et al., ICCV 2015]
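The plotted quantity can be read as the mean Shannon entropy of the leaf distributions π_ℓ. A minimal sketch of the measurement itself, on hypothetical leaf distributions:

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical leaf distributions over 1000 ImageNet classes: a uniform
# leaf would give log2(1000) ~ 10 bits, a confident leaf close to 0 bits.
rng = np.random.default_rng(0)
leaves = rng.dirichlet(np.full(1000, 0.05), size=64)   # fairly peaked leaves
print(np.mean([entropy_bits(p) for p in leaves]))      # average leaf entropy
```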

SLIDE 35

Results on ImageNet

[Plots: ImageNet Top5-Error (%) over training epochs for dNDF0, dNDF1, dNDF2 (validation), dNDF.NET (validation), and dNDF.NET (training)]

Figure 5. Top5-Error plots for the individual dNDFx used in dNDF.NET as well as their joint ensemble errors. Left: plot over all 1000 training epochs. Right: zoomed version of the left plot, showing Top5-Errors from 0-12% between training epochs 500-1000.

[Kontschieder et al., ICCV 2015]

SLIDE 36

Conclusion

  • Deep networks are still the number-one model in terms of prediction performance
  • Random forests still have excellent computational complexity during testing
  • Combining both families is not an easy compromise
  • Possibility of having classical "crisp" trees under a convolutional space displacement layer: speed!!
