Random Forests vs. Deep Learning
Christian Wolf Université de Lyon, INSA-Lyon LIRIS UMR CNRS 5205 November 26th, 2015
[Diagram: a random forest of trees 1 ... T, each mapping an input (I, x) to a per-tree class distribution P_t(c|I, x)]
[Diagram: multi-path deep network with convolutional paths ConvD, ConvC, ConvA, max pooling, hidden layers HLV1/HLV2, HLM1-HLM3, HLA1/HLA2, and a shared hidden layer HLS]
A learned feature extractor for each single sample!
Layer        | Filter size / n.o. units | N.o. parameters | Pooling
Paths V1, V2:
Input D1,D2  | 72×72×5                  | –               | –
ConvD1       | 25×5×5×3                 | 1 900           | 2×2×3
ConvD2       | 25×5×5                   | 650             | 1×1
Input C1,C2  | 72×72×5                  | –               | –
ConvC1       | 25×5×5×3                 | 1 900           | 2×2×3
ConvC2       | 25×5×5                   | 650             | 1×1
HLV1         | 900                      | 3 240 900       | –
HLV2         | 450                      | 405 450         | –
Path M:
Input M      | 183                      | –               | –
HLM1         | 700                      | 128 800         | –
HLM2         | 700                      | 490 700         | –
HLM3         | 350                      | 245 350         | –
Path A:
Input A      | 40×9                     | –               | –
ConvA1       | 25×5×5                   | 650             | 1×1
HLA1         | 700                      | 3 150 000       | –
HLA2         | 350                      | 245 350         | –
Shared:
HLS1         | 1600                     | 3 681 600       | –
HLS2         | 84                       | 134 484         | –
Output       | 21                       | 1 785           | –
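The fully-connected parameter counts in the table follow the usual in × out + bias rule; the layer pairings below are read off the table (the HLS2/output naming is an assumption inferred from the arithmetic), and a small sketch checks a few rows:

```python
# Fully-connected layer parameter count: weights (in * out) plus one bias per output unit.
def fc_params(n_in, n_out):
    return n_in * n_out + n_out

# (n_in, n_out, expected) triples read off the table above.
checks = [
    (900, 450, 405_450),     # HLV1 -> HLV2
    (183, 700, 128_800),     # Input M -> HLM1
    (700, 700, 490_700),     # HLM1 -> HLM2
    (700, 350, 245_350),     # HLM2 -> HLM3
    (1600, 84, 134_484),     # HLS1 -> HLS2
    (84, 21, 1_785),         # HLS2 -> 21-class output
]
for n_in, n_out, expected in checks:
    assert fc_params(n_in, n_out) == expected
```

The last two rows are what pin down the reconstruction: 1600×84+84 = 134 484 and 84×21+21 = 1 785 match the table exactly.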
[Shotton et al., CVPR 2011] (Microsoft Research): full body pose with a random forest; 3 trees, depth 20, >10M parameters.
[Neverova, Wolf, Nebout, Taylor, under review, arXiv 2015]: hand pose with a deep network; semi/weakly supervised training, 8 layers, ~5M parameters.
[Kontschieder et al., CVPR 2014] (Microsoft Research): scene parsing with structured random forests.
[Fourure, Emonet, Fromont, Muselet, Tremeau, Wolf, under review]: scene parsing with deep networks (5 layers, ~2M parameters).
Real-Time Human Pose Recognition in Parts from Single Depth Images
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. Microsoft Research Cambridge & Xbox Incubation. [Shotton et al., CVPR 2011] (Best Paper!)
depth image → body parts → 3D joint proposals
[Shotton et al., CVPR 2011]
Each split node thresholds one of the features.
Each leaf node contains a class distribution P_t(c|I, x).
Class distributions are averaged over the trees:

P(c|I, x) = (1/T) Σ_{t=1}^{T} P_t(c|I, x)
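The averaging step can be sketched as follows; the toy "trees" returning fixed 3-class distributions are hypothetical stand-ins for trained trees:

```python
import numpy as np

def predict_forest(trees, sample):
    """Average the per-tree class distributions: P(c|I,x) = (1/T) sum_t P_t(c|I,x)."""
    dists = np.stack([t(sample) for t in trees])
    return dists.mean(axis=0)

# Three toy trees, each abstracted as a function sample -> class distribution.
trees = [
    lambda s: np.array([0.8, 0.1, 0.1]),
    lambda s: np.array([0.2, 0.6, 0.2]),
    lambda s: np.array([0.5, 0.4, 0.1]),
]
p = predict_forest(trees, sample=None)
# The average of valid distributions is again a valid distribution (sums to 1).
```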
[Shotton et al., CVPR 2011]
To train a split node on a set of examples Q:

1. Randomly propose a set of splitting candidates φ = (θ, τ) (feature parameters θ and thresholds τ).
2. Partition Q into left and right subsets by each φ:
   Q_l(φ) = { (I, x) | f_θ(I, x) < τ }
   Q_r(φ) = Q \ Q_l(φ)
3. Compute the φ giving the largest gain in information:
   φ* = argmax_φ G(φ)
   G(φ) = H(Q) − Σ_{s∈{l,r}} (|Q_s(φ)| / |Q|) H(Q_s(φ))
   where the Shannon entropy H(Q) is computed on the normalized histogram of body part labels l_I(x) for all (I, x) ∈ Q.
4. If the largest gain G(φ*) is sufficient, and the depth of the tree is below a maximum, then recurse for left and right subsets Q_l(φ*) and Q_r(φ*).
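A minimal sketch of this greedy split selection, assuming precomputed scalar features and candidate (feature index, threshold) pairs in place of the paper's depth-comparison features:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Q) of the normalized label histogram, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(features, labels, candidates):
    """phi* = argmax_phi G(phi) over candidate (feature_index, threshold) pairs,
    with G(phi) = H(Q) - sum_{s in {l,r}} |Q_s|/|Q| * H(Q_s)."""
    h_q = entropy(labels)
    best, best_gain = None, -np.inf
    for feat, tau in candidates:
        left = features[:, feat] < tau        # Q_l(phi)
        right = ~left                         # Q_r(phi) = Q \ Q_l(phi)
        if left.all() or right.all():
            continue                          # degenerate split, skip
        gain = h_q - (left.mean() * entropy(labels[left])
                      + right.mean() * entropy(labels[right]))
        if gain > best_gain:
            best_gain, best = gain, (feat, tau)
    return best, best_gain

# Toy data: feature 0 separates the two classes perfectly at tau = 0.5.
X = np.array([[0.1, 3.0], [0.2, 1.0], [0.9, 2.0], [0.8, 0.5]])
y = np.array([0, 0, 1, 1])
phi, gain = best_split(X, y, candidates=[(0, 0.5), (1, 1.5)])
# phi == (0, 0.5); gain == 1.0 bit (entropy drops from 1 to 0 on both sides)
```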
Training:
[Shotton et al., CVPR 2011]
Figure 5. Example inferences. Synthetic (top row); real (middle); failure modes (bottom).
[Shotton et al., CVPR 2011]
[Kontschieder et al., ICCV 2011]
[Rota Bulo and Kontschieder, CVPR 2014]
[Diagram: a small MLP as split function, mapping a sample x ∈ X to a routing score f(x) ∈ R via f(0): X → R^3, f(1): R^3 → R^4, f(2): R^4 → R, with weight matrices W(1) ∈ R^{4×4} and W(2) ∈ R^{5×1} (the extra input dimension in each is a +1 bias unit)]
[Rota Bulo and Kontschieder, CVPR 2014]
Q(Θ) = max_π P[y|X, π, Θ]

Samples are independent:

P[y|X, π, Θ] = Π_{s=1}^{n} P[y_s | x_s, π, Θ]

(y: labels; X: node input samples; Θ: network parameters; π = (π(L), π(R)): latent distributions of labels routed to the left and right child nodes)

P[y_s | x_s, π, Θ] = Σ_{d∈{L,R}} P[y_s | ψ_s = d, π] P[ψ_s = d | x_s, Θ] = Σ_{d∈{L,R}} π(d)_{y_s} f_d(x_s | Θ)
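The per-sample likelihood at a single split node can be sketched as below; the linear-sigmoid split f_L = σ(w·x) and the names w, pi_L, pi_R are illustrative stand-ins for the network output and the latent child distributions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_likelihood(x, y, w, pi_L, pi_R):
    """P[y|x, pi, Theta] = sum_{d in {L,R}} pi(d)_y * f_d(x|Theta),
    with f_L = sigmoid(w.x) and f_R = 1 - f_L (a linear split standing in
    for the network's routing output)."""
    f_L = sigmoid(w @ x)
    f_R = 1.0 - f_L
    return pi_L[y] * f_L + pi_R[y] * f_R

# Toy example: a 2-d sample, 3 classes.
x = np.array([1.0, -2.0])
w = np.array([0.5, 0.3])            # hypothetical split parameters
pi_L = np.array([0.7, 0.2, 0.1])    # latent label distribution, left child
pi_R = np.array([0.1, 0.3, 0.6])    # latent label distribution, right child
p = node_likelihood(x, y=0, w=w, pi_L=pi_L, pi_R=pi_R)
# Summing over all y gives f_L + f_R = 1: the mixture is a valid distribution.
```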
[Rota Bulo and Kontschieder, CVPR 2014]
Q(Θ) = max_π P[y|X, π, Θ]

Alternating optimization:
1. Update child distributions π
2. Update network parameters Θ (backprop)
[Rota Bulo and Kontschieder, CVPR 2014]
Method      | ETRIMS8 Global | Class-Avg | Jaccard | CAMVID Global | Class-Avg | Jaccard
RF Baseline | 64.5 ±1.6 | 59.6 ±1.7 | 40.3 ±1.1 | 64.0 | 41.6 | 27.2
NDFP        | 69.8 ±1.8 | 64.3 ±2.2 | 45.0 ±1.9 | 67.4 | 46.5 | 30.8
NDFMLP      | 68.9 ±2.0 | 62.4 ±2.3 | 44.2 ±2.1 | 67.1 | 44.4 | 30.1
NDFMLPC     | 69.7 ±1.7 | 62.5 ±2.1 | 44.7 ±1.9 | 67.4 | 44.2 | 30.2
NDFMLPC-ℓ1  | 71.7 ±2.0 (+7.2) | 65.3 ±2.3 (+5.7) | 46.9 ±2.0 (+6.6) | 69.0 (+5.0) | 46.8 (+5.2) | 31.7 (+4.5)
RF Baseline | 72.2 ±1.9 | 68.0 ±0.8 | 47.5 ±1.0 | 68.5 | 50.3 | 32.4
NDFMLPC-ℓ1  | 80.8 ±0.7 (+8.6) | 74.6 ±0.7 (+6.6) | 56.9 ±1.2 (+9.4) | 82.1 (+13.6) | 56.1 (+5.8) | 43.3 (+10.9)
Best RF in [13] | 76.1 | 72.3 | – | – | – | –
Best RF in [20] | 75.1 | 72.4 | – | – | 51.4 | 36.4
Best in [8]     | – | – | – | 53.0 | 36.3 | 29.6
Figure 1. Example input RGB image and learned representations of our rMLP taken from a hidden layer, visualized using heat-maps.
[Rota Bulo and Kontschieder, CVPR 2014]
Deep Neural Decision Forests
Peter Kontschieder¹, Madalina Fiterau², Antonio Criminisi¹, Samuel Rota Bulò³
¹Microsoft Research, Cambridge, UK  ²Carnegie Mellon University, Pittsburgh, PA  ³Fondazione Bruno Kessler, Trento, Italy
[Kontschieder et al., ICCV 2015]
[Kontschieder et al., ICCV 2015]
[Kontschieder et al., ICCV 2015]
P_T[y|x, Θ, π] = Σ_{ℓ∈L} π_{ℓy} μ_ℓ(x|Θ)

where π = (π_ℓ)_{ℓ∈L} and π_{ℓy} denotes the probability of a sample reaching leaf ℓ to take on class y, while μ_ℓ(x|Θ) is regarded as the routing function providing the probability that sample x will reach leaf ℓ. Clearly, Σ_ℓ μ_ℓ(x|Θ) = 1 for all x ∈ X.
[Kontschieder et al., ICCV 2015]
Figure 1. Each node n ∈ N of the tree performs routing decisions via function d_n(·) (we omit the parametrization Θ). The black path shows an exemplary routing of a sample x along a tree to reach leaf ℓ4, which has probability μ_ℓ4 = d1(x) d̄2(x) d̄5(x).
[Kontschieder et al., ICCV 2015]
[Diagram: two decision trees whose split functions f1 … f14 are provided by the FC layer of a deep CNN with parameters Θ; split nodes d1 … d14 route samples to leaves holding distributions π1 … π16]
d_n(x; Θ) = σ(f_n(x; Θ))

where σ(x) = (1 + e^(−x))^(−1) is the sigmoid function, and f_n(·; Θ): X → R is a real-valued function depending on the sample and the parametrization Θ.
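Soft routing can be sketched for a depth-2 tree as below; the activations f and the leaf distributions are made-up numbers standing in for CNN outputs and learned π (this sketch takes d_n as the probability of going left):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_probs(f):
    """Routing probabilities mu_l for a depth-2 tree with split nodes d1, d2, d3.
    d_n(x; Theta) = sigmoid(f_n(x; Theta)); f holds the three real-valued split
    activations. Leaves are ordered left to right."""
    d1, d2, d3 = sigmoid(f)
    return np.array([
        d1 * d2,                # leaf 1: left at d1, left at d2
        d1 * (1 - d2),          # leaf 2: left at d1, right at d2
        (1 - d1) * d3,          # leaf 3: right at d1, left at d3
        (1 - d1) * (1 - d3),    # leaf 4: right at d1, right at d3
    ])

f = np.array([0.2, -1.0, 3.0])  # hypothetical split-node activations
mu = leaf_probs(f)              # sums to 1 for any activations

# Prediction mixes the leaf class distributions by the routing probabilities:
pi = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]])
p_y = mu @ pi   # P[y|x] = sum_l pi_{l,y} * mu_l(x)
```

Because each d_n and its complement sum to 1, the μ_ℓ sum to 1 over the leaves, matching the constraint Σ_ℓ μ_ℓ(x|Θ) = 1 stated above.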
[Kontschieder et al., ICCV 2015]
[Kontschieder et al., ICCV 2015]
Network        | # Models | # Crops | Top5-Error
GoogLeNet [36] | 1 | 1   | 10.07%
GoogLeNet [36] | 1 | 10  | 9.15%
GoogLeNet [36] | 1 | 144 | 7.89%
GoogLeNet [36] | 7 | 1   | 8.09%
GoogLeNet [36] | 7 | 10  | 7.62%
GoogLeNet [36] | 7 | 144 | 6.67%
GoogLeNet*     | 1 | 1   | 10.02%
dNDF.NET       | 1 | 1   | 7.84%
dNDF.NET       | 1 | 10  | 7.08%
dNDF.NET       | 7 | 1   | 6.38%

Table 2. Top5-Errors obtained on ImageNet validation data, comparing our dNDF.NET to GoogLeNet(*).
[Kontschieder et al., ICCV 2015]
[Plot: Average Leaf Entropy during Training; average leaf entropy in bits (y-axis) over training epochs 100-1000 (x-axis)]
[Kontschieder et al., ICCV 2015]
[Plots: ImageNet Top5-Error (%) vs. training epochs, for dNDF0, dNDF1, dNDF2 and dNDF.NET on validation data, and dNDF.NET on training data]
Figure 5. Top5-Error plots for individual dNDFx used in dNDF.NET as well as their joint ensemble errors. Left: Plot over all 1000 training epochs. Right: Zoomed version of left plot, showing Top5-Errors from 0–12% between training epochs 500- 1000.
[Kontschieder et al., ICCV 2015]