Deep TEN: Texture Encoding Network
Hang Zhang, Jia Xue, Kristin Dana
Highlight and Overview
- Introduced Encoding-Net, a new architecture of CNNs
- Achieved state-of-the-art results on texture recognition: MINC-2500, FMD, GTOS, KTH, 4D-Light
Classic texture recognition pipeline:
- Feature extraction: filter bank responses or SIFT
- Dictionary learning
- Encoding: bag-of-words, VQ, or VLAD
- Classifier (e.g., SVM)
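The classic pipeline can be sketched end to end. Below is a minimal bag-of-words (VQ) encoder, assuming descriptors have already been extracted; the descriptor dimension, dictionary size, and random data are illustrative, not from the paper:

```python
import numpy as np

def bow_encode(descriptors, dictionary):
    """Hard-assign each descriptor to its nearest codeword (VQ)
    and return an L1-normalized histogram of assignments."""
    # Pairwise squared distances between descriptors and codewords: (N, K)
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)  # index of nearest codeword per descriptor
    hist = np.bincount(assign, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()    # normalized bag-of-words vector

# Toy example: 100 random 8-D descriptors, a 4-word dictionary
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 8))
D = rng.normal(size=(4, 8))
h = bow_encode(Y, D)
print(h.shape)  # (4,)
```

In the classic pipeline the dictionary would come from k-means on training descriptors, and the histogram would feed an SVM.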
Off-the-shelf approaches: the encoders are fixed once built and do not benefit from the labeled data.
- BoWs: SIFT / filter bank responses → dictionary → histogram encoding → SVM
- FV-CNN: pre-trained CNN features → dictionary → Fisher vector encoding → SVM
Deep-TEN (end-to-end): convolutional layers → Encoding Layer (residual encoding with a learned dictionary) → FC layer.
Unlike the off-the-shelf BoWs and FV-CNN pipelines, the dictionary and the encoding are learned jointly with the network.
Given a set of visual descriptors Y = {y_1, …, y_N} and a learned codebook D = {d_1, …, d_K}:
Fisher vector: soft-weighted aggregation of first- and second-order residual statistics, w.r.t. the Gaussian mean and standard deviation:
H_{μ_k} = Σ_{i=1}^{N} b_ik (y_i − d_k)
H_{σ_k} = Σ_{i=1}^{N} b_ik ((y_i − d_k)² − 1)
VLAD: hard-assigned aggregation of residuals:
V_k = Σ_{i: NN(y_i) = d_k} (y_i − d_k)
Encoding-Layer: Input → Dictionary → Residuals → Assign → Aggregate
Residuals: r_ik = y_i − d_k
Assigning (soft weights with learned smoothing factors s_k):
a_ik = exp(−s_k ‖r_ik‖²) / Σ_{j=1}^{K} exp(−s_j ‖r_ij‖²)
Aggregating:
e_k = Σ_{i=1}^{N} a_ik r_ik
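The Encoding Layer's forward pass (residuals → soft assignment → aggregation) fits in a few lines of numpy. This is a sketch of the forward computation only; in the paper the dictionary D and smoothing factors s are learned jointly by back-propagation, and the sizes here are toy values:

```python
import numpy as np

def encoding_layer(Y, D, s):
    """Residual encoding: r_ik = y_i - d_k,
    a_ik = softmax over k of (-s_k * ||r_ik||^2),
    e_k = sum_i a_ik * r_ik."""
    R = Y[:, None, :] - D[None, :, :]            # residuals, shape (N, K, Dim)
    logits = -s[None, :] * (R ** 2).sum(-1)      # (N, K) scaled neg. sq. norms
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)            # soft-assignment weights a_ik
    return (A[:, :, None] * R).sum(axis=0)       # aggregated encoding, (K, Dim)

rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 8))   # N = 50 input descriptors
D = rng.normal(size=(4, 8))    # K = 4 codewords
s = np.ones(4)                 # smoothing factors
E = encoding_layer(Y, D, s)
print(E.shape)  # (4, 8)
```

Note the output size (K × Dim) is independent of N, which is what lets the network accept arbitrary input image sizes.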
The Encoding Layer is differentiable: the loss gradients propagate to the input descriptors, the codewords d_k, and the smoothing factors s_k, so the whole pipeline trains end-to-end by back-propagation.
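End-to-end training only needs the layer to be smooth in its parameters. A central finite-difference check on a toy scalar loss (sum of the encoding) illustrates this; the forward pass below restates the layer's equations, and all sizes and data are illustrative:

```python
import numpy as np

def encode(Y, D, s):
    """Forward pass of residual encoding (soft-assign, then aggregate)."""
    R = Y[:, None, :] - D[None, :, :]
    logits = -s[None, :] * (R ** 2).sum(-1)
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return (A[:, :, None] * R).sum(axis=0)

rng = np.random.default_rng(3)
Y = rng.normal(size=(20, 4))
D = rng.normal(size=(3, 4))
s = np.ones(3)

# Toy scalar loss, and a finite-difference derivative w.r.t. one codeword entry
loss = lambda D: encode(Y, D, s).sum()
eps = 1e-5
Dp, Dm = D.copy(), D.copy()
Dp[0, 0] += eps
Dm[0, 0] -= eps
grad = (loss(Dp) - loss(Dm)) / (2 * eps)  # well-defined: the layer is smooth
print(np.isfinite(grad))  # True
```

In practice an autograd framework computes these gradients for every codeword and smoothing factor at once.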
For a descriptor y_i close to codeword d_l, the residual r_il = y_i − d_l ≈ 0; the contribution of a far codeword d_k (k ≠ l) is also close to zero, since
a_ik = exp(−s_k ‖r_ik‖²) / Σ_{j=1}^{K} exp(−s_j ‖r_ij‖²) ≈ 0
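This suppression effect is easy to check numerically: a descriptor sitting on one codeword gets soft-assignment weight ≈ 1 there and ≈ 0 everywhere else (toy 2-D numbers, using the soft-assignment formula):

```python
import numpy as np

# One descriptor exactly at codeword d_0; the other codewords are far away
y = np.array([0.0, 0.0])
D = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
s = np.ones(3)  # smoothing factors

r = y - D                            # residuals r_k = y - d_k, shape (3, 2)
logits = -s * (r ** 2).sum(axis=1)   # [0, -100, -100]
a = np.exp(logits - logits.max())
a /= a.sum()                         # soft-assignment weights
print(a.round(6))                    # [1. 0. 0.]: far codewords are suppressed
```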
Joint training across datasets and image sizes: CIFAR-10 (32×32) and STL-10 (96×96) share the convolutional layers, with separate encoding layers E1 and E2.
(For reference, the state of the art on CIFAR-10 is 95.4%, using a 1,001-layer ResNet; He et al., ECCV 2016.)
Conclusion:
- Integrated dictionary learning and residual encoding into a single CNN layer
- Applied it to texture recognition and achieved state-of-the-art results
- Allows arbitrary input image sizes
- Makes the learned features easier to transfer