
Random Forests vs. Deep Learning, Christian Wolf, Université de Lyon - PowerPoint PPT Presentation



  1. Random Forests vs. Deep Learning. Christian Wolf, Université de Lyon, INSA-Lyon, LIRIS UMR CNRS 5205. November 26th, 2015.

  2. RF vs. DL. Goal: prediction (classification, regression). [Figure: a multi-path deep network (convolutional paths ConvD/ConvC with max pooling, hidden layers HLV/HLM/HLA, a ConvA path over frequency spectrograms, shared hidden layers HLS feeding the output) shown next to a random forest that routes an input (I, x) through trees 1 … T, each returning a class posterior P_t(c).]

  3. Deep networks: many layers, many parameters ... and all of them are used at test time, for each single sample! Feature learning is integrated into the classification. End-to-end training, using the gradient of the loss function.
  Layer           Filter size / no. units   No. parameters   Pooling
  Paths V1, V2:
    Input D1, D2    72 × 72 × 5               -                2 × 2 × 1
    ConvD1          25 × 5 × 5 × 3            1,900            2 × 2 × 3
    ConvD2          25 × 5 × 5                650              1 × 1
    Input C1, C2    72 × 72 × 5               -                2 × 2 × 1
    ConvC1          25 × 5 × 5 × 3            1,900            2 × 2 × 3
    ConvC2          25 × 5 × 5                650              1 × 1
    HLV1            900                       3,240,900        -
    HLV2            450                       405,450          -
  Path M:
    Input M         183                       -                -
    HLM1            700                       128,800          -
    HLM2            700                       490,700          -
    HLM3            350                       245,350          -
  Path A:
    Input A         40 × 9                    -                1 × 1
    ConvA1          25 × 5 × 5                650              1 × 1
    HLA1            700                       3,150,000        -
    HLA2            350                       245,350          -
  Shared layers:
    HLS1            1,600                     3,681,600        -
    HLS2            84                        134,484          -
    Output layer    21                        1,785            -
  12.4M parameters per scale = 37.2M parameters total!
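  As a sanity check on the parameter counts in the table above, here is a minimal Python sketch, assuming one bias per convolutional filter and per fully connected unit (the helper names are illustrative, not from the slides):

```python
def conv_params(n_filters, kh, kw, in_channels=1):
    """Weights plus one bias per filter for a 2D convolutional layer."""
    return n_filters * (kh * kw * in_channels) + n_filters

def dense_params(n_in, n_out):
    """Fully connected layer: weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

# Reproduce a few entries of the table above.
print(conv_params(25, 5, 5, 3))   # ConvD1 / ConvC1 -> 1900
print(conv_params(25, 5, 5, 1))   # ConvD2 / ConvC2 / ConvA1 -> 650
print(dense_params(183, 700))     # HLM1 -> 128800
print(dense_params(700, 700))     # HLM2 -> 490700
print(dense_params(700, 350))     # HLM3 / HLA2 -> 245350
print(dense_params(1600, 84))     # HLS2 -> 134484
print(dense_params(84, 21))       # output layer -> 1785
```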

  4. Random Forests: many levels, many parameters ... but only log2(N) of them are used for testing! Training is done layer by layer, not end-to-end: no gradient on the objective function. No/limited feature learning. [Figure: a single tree routing an input (I, x) to a leaf posterior P_t(c).]
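  To make the "only log2(N) are used" point concrete, here is a minimal sketch of how one test sample traverses a single tree: a tree of depth D stores on the order of 2^D split parameters, but routing one sample evaluates only D of them. The node class and data layout are hypothetical, not taken from any of the cited papers:

```python
class Node:
    """A binary split node: tests one feature against a threshold,
    or stores a class posterior if it is a leaf."""
    def __init__(self, feature=None, threshold=None, left=None, right=None,
                 leaf_distribution=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.leaf_distribution = leaf_distribution

def predict(node, x):
    """Route one sample to a leaf; only depth-many split nodes are touched."""
    visited = 0
    while node.leaf_distribution is None:
        visited += 1
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.leaf_distribution, visited  # visited ~ log2(number of leaves)
```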

  5. RF vs. DL: applications (1). Full body pose with a random forest: 3 trees, depth 20, >10M parameters [Shotton et al. (Microsoft Research), CVPR 2011]. Hand pose with a deep network: semi/weakly supervised training, 8 layers, ~5M parameters [Neverova, Wolf, Nebout, Taylor, under review, arXiv 2015].

  6. RF vs. DL: applications (2). Scene parsing with structured random forests [Kontschieder et al. (Microsoft Research), CVPR 2014]. Scene parsing with deep networks (5 layers, ~2M parameters) [Fourure, Emonet, Fromont, Muselet, Tremeau, Wolf, under review].

  7. Types of random forests: classical random forests, structured random forests, neural random forests, deep convolutional random forests. [Figure: one thumbnail illustration per variant.]

  8. Example for classical RF: "Real-Time Human Pose Recognition in Parts from Single Depth Images". Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. Microsoft Research Cambridge & Xbox Incubation. [Shotton et al., CVPR 2011] (Best Paper!)

  9. Depth images -> 3D joint locations. [Pipeline figure: depth image -> body parts -> 3D joint proposals.] [Shotton et al., CVPR 2011]

  10. Training data: synthetic images (train & test) and real images (test), labelled with 31 body parts. [Shotton et al., CVPR 2011]

  11. Classification with random forests. Each split node thresholds one of the features; each leaf node contains a class distribution P_t(c | I, x). Class distributions are averaged over the trees:
  P(c | I, x) = (1/T) Σ_{t=1}^{T} P_t(c | I, x).
  [Shotton et al., CVPR 2011]
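  A minimal sketch of this prediction rule, assuming each tree object already exposes a method returning its leaf class posterior for a sample (the method name and data layout are illustrative):

```python
import numpy as np

def forest_posterior(trees, I, x):
    """P(c | I, x) = (1/T) * sum_t P_t(c | I, x): average the per-tree
    leaf class distributions reached for the query (I, x)."""
    per_tree = [tree.posterior(I, x) for tree in trees]  # each a length-C vector
    return np.mean(per_tree, axis=0)
```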

  12. Learning & entropy. A good split function minimizes the entropy of the label distributions. Example from the slide: the parent set Q has class distribution (0.2, 0.4, 0.4); the split produces Q_l(θ) with distribution (0.33, 0.66, 0.01) and Q_r(θ) with distribution (0.33, 0.01, 0.66).
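  A worked version of these numbers. The slide does not give the sizes of the two child sets, so an even 50/50 weighting between the children is an assumption made purely for illustration:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

Q  = [0.2, 0.4, 0.4]     # parent label distribution
Ql = [0.33, 0.66, 0.01]  # left child,  Q_l(theta)
Qr = [0.33, 0.01, 0.66]  # right child, Q_r(theta)

H_parent   = entropy(Q)                              # ~1.52 bits
H_children = 0.5 * entropy(Ql) + 0.5 * entropy(Qr)   # ~0.99 bits (assumed 50/50 split)
gain = H_parent - H_children                         # ~0.53 bits: the split reduces entropy
print(H_parent, H_children, gain)
```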

  13. Random forests: learning algorithm.
  1. Randomly propose a set of splitting candidates φ = (θ, τ) (feature parameters θ and thresholds τ).
  2. Partition the set of examples Q = {(I, x)} into left and right subsets by each φ:
     Q_l(φ) = {(I, x) | f_θ(I, x) < τ},   Q_r(φ) = Q \ Q_l(φ).
  3. Compute the φ giving the largest gain in information:
     φ* = argmax_φ G(φ),
     G(φ) = H(Q) − Σ_{s ∈ {l, r}} (|Q_s(φ)| / |Q|) H(Q_s(φ)),
     where the Shannon entropy H(Q) is computed on the normalized histogram of body part labels l_I(x) for all (I, x) ∈ Q.
  4. If the largest gain G(φ*) is sufficient, and the depth in the tree is below a maximum, then recurse for the left and right subsets Q_l(φ*) and Q_r(φ*).
  Training: 3 trees, depth 20, 1,000,000 images, 2,000 candidate features, 50 thresholds per feature; 1 day on a 1000-core cluster.
  [Shotton et al., CVPR 2011]
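  A compact sketch of steps 1 to 3 (candidate generation, partitioning, gain computation). The feature function f_theta and the random parameter theta are placeholders, not the depth-difference features of Shotton et al.:

```python
import math
import random
from collections import Counter

def entropy_of_labels(labels):
    """Shannon entropy of the normalized label histogram."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, labels, f_theta, n_candidates=2000, n_thresholds=50):
    """Greedy node training: try random (theta, tau) candidates and keep
    the one with the largest information gain G(phi)."""
    H_Q = entropy_of_labels(labels)
    best = (None, -float("inf"))
    for _ in range(n_candidates):
        theta = random.random()                       # placeholder feature parameters
        responses = [f_theta(s, theta) for s in samples]
        for _ in range(n_thresholds):
            tau = random.choice(responses)
            left  = [y for r, y in zip(responses, labels) if r < tau]
            right = [y for r, y in zip(responses, labels) if r >= tau]
            if not left or not right:
                continue
            gain = H_Q - (len(left) / len(labels)) * entropy_of_labels(left) \
                       - (len(right) / len(labels)) * entropy_of_labels(right)
            if gain > best[1]:
                best = ((theta, tau), gain)
    return best   # ((theta*, tau*), G(phi*)); recurse if the gain is sufficient
```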

  14. Examples. [Figure 5: example inferences; synthetic (top row), real (middle), failure cases (bottom).] [Shotton et al., CVPR 2011]

  15. Dependence of the results on the hyper-parameters [Shotton et al., CVPR 2011]

  16. Types of random forests: classical random forests, structured random forests, neural random forests, deep convolutional random forests. [Figure: one thumbnail illustration per variant.]

  17. Structured random forests. In the classical version, decision (= leaf) nodes contain predictions for a single pixel (a label or a posterior distribution). In the structured version, a decision node is assigned a rectangular patch of predictions. [Figure 1: training data example; a label patch p around the pixel at position (u, v) in image x.] [Kontschieder et al., ICCV 2011]

  18. Structured version: integration. Predictions are integrated over multiple pixels by voting: each pixel accumulates the votes of all predicted patches that overlap it. [Kontschieder et al., ICCV 2011]
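  A rough sketch of this kind of patch voting, assuming each prediction is an s × s patch of class labels centred on a query pixel and the final label is the per-pixel majority vote. This illustrates the idea only; it is not necessarily the exact integration scheme of Kontschieder et al.:

```python
import numpy as np

def integrate_by_vote(patch_predictions, height, width, n_classes, s=11):
    """patch_predictions: list of (u, v, patch) where patch is an s x s array
    of class labels predicted for the window centred at pixel (u, v).
    Every pixel accumulates one vote per overlapping patch."""
    votes = np.zeros((height, width, n_classes), dtype=np.int64)
    half = s // 2
    for u, v, patch in patch_predictions:
        for du in range(-half, half + 1):
            for dv in range(-half, half + 1):
                y, x = u + du, v + dv
                if 0 <= y < height and 0 <= x < width:
                    votes[y, x, patch[du + half, dv + half]] += 1
    return votes.argmax(axis=-1)   # final label map: majority vote per pixel
```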

  19. Types of random forests: classical random forests, structured random forests, neural random forests, deep convolutional random forests. [Figure: one thumbnail illustration per variant.]

  20. "Neural Decision Forests for Semantic Image Labelling". Samuel Rota Bulò (Fondazione Bruno Kessler, Trento, Italy, rotabulo@fbk.eu) and Peter Kontschieder (Microsoft Research, Cambridge, UK, pekontsc@microsoft.com). [Rota Bulo and Kontschieder, CVPR 2014]

  21. Neural split functions: a classical random forest with neural split functions. [Figure: each split node is a small multi-layer perceptron. A feature extractor f^(0): X → R^3 feeds a hidden layer f^(1): R^3 → R^4 with weights W^(1) ∈ R^{4×4}, followed by f^(2): R^4 → R with weights W^(2) ∈ R^{5×1}, producing the routing score f(x); the extra row/column in each weight matrix corresponds to the +1 bias unit.] [Rota Bulo and Kontschieder, CVPR 2014]
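  A minimal sketch of a split node realised as the small MLP in the figure, with the 3 → 4 → 1 dimensions from the slide. The activation functions and the 0.5 routing threshold are assumptions for illustration, not details given on the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NeuralSplit:
    """Split function f(x): R^3 -> R implemented as a 3-4-1 MLP.
    The sample goes left if f(x) < 0.5, right otherwise (assumed convention)."""
    def __init__(self, rng=np.random.default_rng(0)):
        self.W1 = rng.normal(size=(4, 3)); self.b1 = np.zeros(4)   # f^(1)
        self.W2 = rng.normal(size=(1, 4)); self.b2 = np.zeros(1)   # f^(2)

    def __call__(self, x):
        h = np.tanh(self.W1 @ x + self.b1)        # hidden layer
        return sigmoid(self.W2 @ h + self.b2)[0]  # routing score in (0, 1)

split = NeuralSplit()
x = np.array([0.2, -1.0, 0.5])        # features f^(0)(I, x) for one sample
goes_left = split(x) < 0.5
```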

  22. Learning the neural split function (1). Probabilistic loss function:
  Q(Θ) = max_π P[y | X, π, Θ],
  where y are the labels, X the node input samples, Θ the network parameters, and π = (π^(L), π^(R)) the latent distributions of the labels routed to the left and right child nodes.
  Samples are independent:
  P[y | X, π, Θ] = Π_{s=1}^{n} P[y_s | x_s, π, Θ],
  P[y_s | x_s, π, Θ] = Σ_{d ∈ {L, R}} P[y_s | ψ_s = d, π] P[ψ_s = d | x_s, Θ] = Σ_{d ∈ {L, R}} π^(d)_{y_s} f_d(x_s | Θ).
  [Rota Bulo and Kontschieder, CVPR 2014]
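  The last two equations in code form: the per-sample likelihood mixes the left/right child label distributions with the routing probability produced by the split network. A minimal sketch in which `f_L` (the probability of routing a sample left) and `pi` (a 2 × C array of child label distributions) are hypothetical inputs:

```python
import numpy as np

def sample_likelihood(y_s, f_L, pi):
    """P[y_s | x_s, pi, Theta] = sum over d in {L, R} of pi^(d)_{y_s} * f_d(x_s | Theta);
    f_L is P[psi_s = L | x_s, Theta], and pi[0], pi[1] are the label
    distributions of the left and right children."""
    return pi[0][y_s] * f_L + pi[1][y_s] * (1.0 - f_L)

def node_likelihood(labels, routing, pi):
    """P[y | X, pi, Theta]: product over the independent samples."""
    return float(np.prod([sample_likelihood(y, f, pi) for y, f in zip(labels, routing)]))
```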

  23. Learning the neural split function (2). The learning procedure alternates between two steps for Q(Θ) = max_π P[y | X, π, Θ]:
  1. Update the child label distributions π.
  2. Update the network parameters Θ (backprop).
  [Rota Bulo and Kontschieder, CVPR 2014]
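  A rough sketch of the alternation under simplifying assumptions: π is re-estimated from the current soft routing as responsibility-weighted label counts, and Θ is then updated by a backprop step on the likelihood. The `gradient_step` method is hypothetical and stands in for the actual backprop update:

```python
import numpy as np

def update_child_distributions(labels, routing, n_classes):
    """Step 1: re-estimate pi^(L), pi^(R) from the current routing
    probabilities (soft counts of each label in each child)."""
    pi = np.full((2, n_classes), 1e-6)            # small floor avoids zeros
    for y, f_L in zip(labels, routing):
        pi[0, y] += f_L
        pi[1, y] += 1.0 - f_L
    return pi / pi.sum(axis=1, keepdims=True)     # normalize per child

def alternate(labels, split_net, X, n_classes, n_iters=10):
    """Alternate step 1 (update pi) and step 2 (update Theta by backprop
    on the likelihood; the gradient step itself is left abstract here)."""
    for _ in range(n_iters):
        routing = [split_net(x) for x in X]       # P[psi = L | x, Theta]
        pi = update_child_distributions(labels, routing, n_classes)
        split_net.gradient_step(labels, X, pi)    # hypothetical backprop update
    return pi
```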

  24. Results on semantic labelling. [Figure 1: example input RGB image and learned representations of the rMLP taken from a hidden layer, visualized using heat-maps; diagram of the input layer f^(0) with normalization of φ(x), x ∈ X.]
                          eTRIMS 8                                            CamVid
  Method                  Global             Class-Avg          Jaccard            Global         Class-Avg     Jaccard
  RF Baseline             64.5 ± 1.6         59.6 ± 1.7         40.3 ± 1.1         64.0           41.6          27.2
  NDF P                   69.8 ± 1.8         64.3 ± 2.2         45.0 ± 1.9         67.4           46.5          30.8
  NDF MLP                 68.9 ± 2.0         62.4 ± 2.3         44.2 ± 2.1         67.1           44.4          30.1
  NDF MLPC                69.7 ± 1.7         62.5 ± 2.1         44.7 ± 1.9         67.4           44.2          30.2
  NDF MLPC-ℓ1             71.7 ± 2.0 (+7.2)  65.3 ± 2.3 (+5.7)  46.9 ± 2.0 (+6.6)  69.0 (+5.0)    46.8 (+5.2)   31.7 (+4.5)
  RF Baseline             72.2 ± 1.9         68.0 ± 0.8         47.5 ± 1.0         68.5           50.3          32.4
  NDF MLPC-ℓ1             80.8 ± 0.7 (+8.6)  74.6 ± 0.7 (+6.6)  56.9 ± 1.2 (+9.4)  82.1 (+13.6)   56.1 (+5.8)   43.3 (+10.9)
  Best RF in [13]         76.1               72.3               -                  -              -             -
  Best in [14]            75.1               72.4               -                  -              -             -
  Best RF in [19]         -                  -                  -                  -              -             38.3
  Best RF in [20]         -                  -                  -                  72.5           51.4          36.4
  Best in [8]             -                  -                  -                  69.1           53.0          -
  Best in [35]            -                  -                  -                  73.7           36.3          29.6
  [Rota Bulo and Kontschieder, CVPR 2014]

  25. Types of random forests: classical random forests, structured random forests, neural random forests, deep convolutional random forests. [Figure: one thumbnail illustration per variant.]

  26. One model to rule them all … "Deep Neural Decision Forests". Peter Kontschieder (1), Madalina Fiterau* (2), Antonio Criminisi (1), Samuel Rota Bulò (1, 3). (1) Microsoft Research, Cambridge, UK; (2) Carnegie Mellon University, Pittsburgh, PA; (3) Fondazione Bruno Kessler, Trento, Italy. [Kontschieder et al., ICCV 2015]

  27. Goals: combine neural networks and random forests. Advantage of NNs: representation learning. Advantage of RFs: divide and conquer. A differentiable loss function, allowing gradient backpropagation: "backpropagation trees". [Kontschieder et al., ICCV 2015]
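  One way to see why such a tree becomes differentiable: if every split node outputs a probability of going left (for example a sigmoid of a learned function of x), the probability of reaching any leaf is a product of those probabilities, and the prediction is a leaf-weighted mixture of leaf distributions, so it has a gradient with respect to the split parameters. A minimal sketch for a full binary tree of depth 2; the layout and names are illustrative, not the paper's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, W, b, leaf_dists):
    """Soft decision tree of depth 2: split nodes 0..2 (root, its left and
    right children), leaves 0..3. Each leaf keeps a class distribution; its
    weight is the product of routing probabilities along its path, so the
    output is differentiable in (W, b)."""
    d = sigmoid(W @ x + b)                  # d[i] = P(go left at split i)
    leaf_probs = np.array([
        d[0] * d[1],                        # left, left
        d[0] * (1 - d[1]),                  # left, right
        (1 - d[0]) * d[2],                  # right, left
        (1 - d[0]) * (1 - d[2]),            # right, right
    ])
    return leaf_probs @ leaf_dists          # weighted mixture of leaf distributions

# Tiny usage example with made-up parameters: 3 features, 4 leaves, 2 classes.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 3)), np.zeros(3)
leaf_dists = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
print(soft_tree_predict(rng.normal(size=3), W, b, leaf_dists))
```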
