Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang¹, Yijie Guo¹, Yixin Jin¹, Yijun Luo¹, Zhiyuan He¹, Honglak Lee¹,²
¹ University of Michigan, Ann Arbor; ² Google Brain
Our paper: Unsupervised Discovery of Object Landmarks as Structural Representations
latent representations and explicit structures
Can we train a deep neural network to get image representations of explicit structures without supervision?
[Figure: overview. Labels: Image representation; Unsupervised landmark discovery; Latent features; Image reconstruction; Training signal]
[Figure: model pipeline. Labels: Input image; Landmark coordinates; Latent features; Reconstructed image]
Related work:
James Thewlis, Hakan Bilen, and Andrea Vedaldi, “Unsupervised learning of object landmarks by factorized spatial embeddings,” In ICCV, 2017.
define a valid landmark detector
[Figure: landmark detector. Labels, in order: Input image; Encoder-decoder with skip-links; Channel-wise softmax (foreground and background channels); Heatmap to coordinate; Landmark coordinates]
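The two detector steps named on this slide, channel-wise softmax and heatmap to coordinate, can be illustrated with a minimal NumPy sketch. The function names, the toy 8x8 score volume, and the use of a plain weighted mean for the coordinate are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def channel_softmax(scores):
    """Softmax across the channel axis of a (K+1, H, W) score volume."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

def heatmap_to_coordinate(heatmap):
    """Weighted mean (x, y) of one (H, W) heatmap, treated as a spatial distribution."""
    h, w = heatmap.shape
    p = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

# Toy input (hypothetical): 2 landmark channels + 1 background channel.
scores = np.zeros((3, 8, 8))
scores[2] = 10.0          # background dominates everywhere ...
scores[0, 2, 5] = 20.0    # ... except where landmark 0 peaks (x=5, y=2)
scores[1, 6, 1] = 20.0    # ... and where landmark 1 peaks (x=1, y=6)

probs = channel_softmax(scores)
coords = [heatmap_to_coordinate(probs[k]) for k in range(2)]
```

Because both steps are differentiable, the coordinates can receive gradients from whatever loss is placed on them.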
Ours:
A foreground heatmap, with an isotropic Gaussian approximation
$\mathcal{N}\big((x, y),\ \operatorname{diag}(\sigma, \sigma)\big)$
where $(x, y)$ is the landmark coordinate
the landmark coordinates can be arbitrary latent features.
$(x_1, y_1), (x_2, y_2), \ldots, (x_K, y_K)$
Can be arbitrary, without physical meaning
Original heatmap Gaussian heatmap
For a detector, the output heatmap should concentrate in a local region.
variance to be small.
Earlier stage Later stage
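One way to quantify how concentrated a heatmap is: treat it as a spatial distribution and measure its variance. The sketch below does this in NumPy. The penalty used here (sum of the x and y variances) is a hypothetical stand-in, since the slide does not give the exact form of the concentration term.

```python
import numpy as np

def heatmap_variance(heatmap):
    """Return (var_x, var_y) of a (H, W) heatmap treated as a spatial distribution."""
    h, w = heatmap.shape
    p = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    mx, my = (p * xs).sum(), (p * ys).sum()
    var_x = (p * (xs - mx) ** 2).sum()
    var_y = (p * (ys - my) ** 2).sum()
    return float(var_x), float(var_y)

def concentration_penalty(heatmap):
    """Hypothetical penalty: total spatial variance (smaller means more concentrated)."""
    var_x, var_y = heatmap_variance(heatmap)
    return var_x + var_y

# Two toy heatmaps centred at (8, 8) on a 16x16 grid.
grid_y, grid_x = np.mgrid[0:16, 0:16]

def blob(sigma):
    return np.exp(-((grid_x - 8) ** 2 + (grid_y - 8) ** 2) / (2.0 * sigma ** 2))

diffuse = concentration_penalty(blob(4.0))  # spread-out heatmap
peaked = concentration_penalty(blob(1.0))   # concentrated heatmap
```

Minimizing such a penalty pushes the heatmap toward a single local peak.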
$$L_{\mathrm{sep}} = \sum_{k \neq k'}^{1,\ldots,K} \exp\left( -\frac{\left\| (x_{k'}, y_{k'}) - (x_k, y_k) \right\|_2^2}{2\sigma_{\mathrm{sep}}^2} \right)$$
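The separation term sums a Gaussian-shaped penalty over every ordered pair of distinct landmarks, so nearby landmarks are penalized and distant ones contribute almost nothing. A minimal NumPy sketch follows; the function name, the normalized coordinates, and the value of `sigma_sep` are illustrative assumptions.

```python
import numpy as np

def separation_loss(coords, sigma_sep):
    """Sum over k != k' of exp(-||c_k' - c_k||^2 / (2 * sigma_sep^2)).

    coords: (K, 2) array-like of landmark (x, y) coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (K, K, 2) pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)               # (K, K) squared distances
    penalty = np.exp(-sq_dist / (2.0 * sigma_sep ** 2))
    off_diagonal = ~np.eye(len(coords), dtype=bool)  # drop the k == k' terms
    return float(penalty[off_diagonal].sum())

# Hypothetical coordinates, normalized to [0, 1].
close = separation_loss([(0.50, 0.50), (0.52, 0.50)], sigma_sep=0.1)
apart = separation_loss([(0.10, 0.10), (0.90, 0.90)], sigma_sep=0.1)
```

Here `close` is near its maximum of 2 (two ordered pairs), while `apart` is effectively zero.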
transformation g.
$$L_{\mathrm{eqv}} = \sum_{k=1}^{K} \left\| g(x'_k, y'_k) - (x_k, y_k) \right\|_2^2 \qquad \text{(Thewlis et al., 2017)}$$
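The equivariance term compares landmarks detected on two versions of an image after mapping one set through the known transformation g. The sketch below assumes the primed coordinates come from the transformed image and that g maps them back to the original image's frame. The translation used as g and all coordinates are hypothetical.

```python
import numpy as np

def equivariance_loss(coords, coords_prime, g):
    """Sum over k of ||g(x'_k, y'_k) - (x_k, y_k)||^2.

    coords:       (K, 2) landmarks on the original image
    coords_prime: (K, 2) landmarks on the transformed image (assumption)
    g:            callable mapping a (K, 2) array to a (K, 2) array
    """
    coords = np.asarray(coords, dtype=float)
    mapped = g(np.asarray(coords_prime, dtype=float))
    return float(((mapped - coords) ** 2).sum())

def translate(points):
    """Toy g: a pure translation by (+2, -1)."""
    return points + np.array([2.0, -1.0])

original = [(5.0, 5.0), (3.0, 7.0)]
consistent = [(3.0, 6.0), (1.0, 8.0)]   # exactly g^{-1} of the originals
off_by_one = [(3.0, 6.0), (1.0, 9.0)]   # second landmark is 1 pixel off in y

loss_consistent = equivariance_loss(original, consistent, translate)
loss_off = equivariance_loss(original, off_by_one, translate)
```

A detector that moves its landmarks exactly with the image gets zero loss. Any drift shows up as a squared distance.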
differentiable pooling masks
[Figure: latent feature extraction. A feature map of size H × W × # channels undergoes weighted global average pooling using each heatmap (original heatmap and Gaussian heatmap shown), giving a latent feature of length # channels attached to each landmark]
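Weighted global average pooling using each heatmap can be written as one tensor contraction: each landmark's heatmap is normalized to sum to one and used to average the feature map. The NumPy sketch below uses a (C, H, W) layout, hypothetical function names, and toy data.

```python
import numpy as np

def landmark_features(feature_map, heatmaps):
    """Pool a (C, H, W) feature map with K heatmaps of shape (K, H, W).

    Returns a (K, C) array: one latent feature vector per landmark.
    """
    weights = heatmaps / heatmaps.sum(axis=(1, 2), keepdims=True)
    return np.einsum("khw,chw->kc", weights, feature_map)

# Toy feature map: channel 0 stores the x coordinate, channel 1 is constant 3.
ys, xs = np.mgrid[0:4, 0:4]
feature_map = np.stack([xs.astype(float), np.full((4, 4), 3.0)])

heatmaps = np.zeros((2, 4, 4))
heatmaps[0, 1, 2] = 1.0   # landmark 0: all mass at x=2, y=1
heatmaps[1] = 1.0         # landmark 1: uniform mass (degenerate case)

feats = landmark_features(feature_map, heatmaps)
```

With a sharply peaked heatmap the pooled feature is essentially the feature vector at the landmark location. A uniform heatmap reduces to ordinary global average pooling.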
feature encoding
[Figure: landmark coordinate encoding. Labels: Gaussian heatmap with fixed variance (one per landmark); Uniform heatmap for BG; Normalization across channels; # channels]
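The encoding named on this slide can be sketched as follows: draw one fixed-variance Gaussian per landmark coordinate, append a uniform background channel, and normalize across channels. The function name, the background level `bg_value`, and the value of `sigma` are assumptions for illustration.

```python
import numpy as np

def render_heatmaps(coords, height, width, sigma, bg_value=1.0):
    """Render K landmark coordinates as a (K+1, H, W) stack of heatmaps.

    Channels 0..K-1 are Gaussians with fixed variance sigma^2; the last
    channel is a uniform background map (bg_value is an assumed constant).
    The stack is normalized so every pixel sums to 1 across channels.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [
        np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        for x, y in coords
    ]
    maps.append(np.full((height, width), bg_value))
    stack = np.stack(maps)
    return stack / stack.sum(axis=0, keepdims=True)

# Two hypothetical landmarks on a 16x16 grid.
heat = render_heatmaps([(2, 2), (12, 12)], height=16, width=16, sigma=1.5)
```

Far from every landmark, the background channel takes nearly all the mass. At a landmark's own location, its channel and the background share it.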
[Figure: latent feature decoding. Labels: Latent features; Pixel-wise duplication]
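Pixel-wise duplication can be read as copying each landmark's latent feature vector to every pixel of a spatial grid. The sketch below also weights each copy by that landmark's heatmap, so the feature is only active near the landmark. That weighting step is an assumption, since the slide names only the duplication.

```python
import numpy as np

def duplicate_features(features, heatmaps):
    """Tile (K, C) latent features over space and weight them by (K, H, W) heatmaps.

    Returns a (K, C, H, W) array. The heatmap weighting is an assumed design
    choice; the duplication itself is just the broadcast on the first line.
    """
    target_shape = features.shape + heatmaps.shape[1:]            # (K, C, H, W)
    tiled = np.broadcast_to(features[:, :, None, None], target_shape)
    return tiled * heatmaps[:, None, :, :]

# Toy data: 2 landmarks, 2 feature channels, a 3x3 grid.
features = np.array([[1.0, 2.0], [3.0, 4.0]])
heatmaps = np.zeros((2, 3, 3))
heatmaps[0, 0, 0] = 1.0
heatmaps[1, 2, 2] = 0.5

maps = duplicate_features(features, heatmaps)
```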
[Figure: qualitative comparison of discovered landmarks, Ours vs. Thewlis et al. (2017)]
Errors of Thewlis et al. (incorrect landmarks shown against the expected correct location):
Forehead landmark to the left
Lower-lip landmark to the right
Mouth-corner landmark on the forehead
Right-eyebrow landmark on the left side
Forehead landmark to the left
human-annotated landmarks without finetuning the neural network.
Linear regression
Discovered landmarks Human-annotated landmarks
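The mapping from discovered landmarks to human-annotated landmarks is a linear regression, which can be fit with ordinary least squares while the detector stays frozen. The sketch below uses synthetic data only. The array shapes, the bias column, and the helper names are assumptions, and nothing here reproduces the reported numbers.

```python
import numpy as np

def add_bias(discovered):
    """Append a constant column so the regressor can learn an offset (assumption)."""
    return np.hstack([discovered, np.ones((len(discovered), 1))])

def fit_landmark_regressor(discovered, annotated):
    """Least-squares fit from flattened discovered landmarks (N, 2K)
    to flattened annotated landmarks (N, 2M). Returns weights (2K+1, 2M)."""
    weights, *_ = np.linalg.lstsq(add_bias(discovered), annotated, rcond=None)
    return weights

def predict_landmarks(discovered, weights):
    return add_bias(discovered) @ weights

# Synthetic data: 50 images, K=3 discovered and M=2 annotated landmarks.
rng = np.random.default_rng(0)
discovered = rng.uniform(size=(50, 6))
true_weights = rng.normal(size=(7, 4))           # hypothetical ground-truth map
annotated = add_bias(discovered) @ true_weights  # noise-free targets

weights = fit_landmark_regressor(discovered, annotated)
predicted = predict_landmarks(discovered, weights)
```

Only `weights` is learned from labels, so the number of labeled samples needed depends on the regressor's size rather than the network's.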
[Figure: bar chart of localization error for supervised methods, Thewlis et al., and ours. Bar values as extracted: 7.95, 9.73, 5.39, 6.67, 5.83, 3.46, 3.15]
[Figure: localization error vs. number of labeled samples. Legend: TCDCN (error: 7.95); MTCNN (error: 5.39); Ours (10 landmarks); Ours (30 landmarks)]
[Figure: bar chart of localization error, Thewlis et al. vs. ours. Groups: Human (16 discovered, 32 target); Cat (20 discovered, 7 target); Cat (10 discovered, 7 target); Car (24 discovered, 5 target); Car (10 discovered, 5 target). Bar values as extracted: 4.14, 14.84, 15.35, 5.80, 5.87, 7.51, 26.94, 26.76, 11.11, 11.42]
Arched Eyebrows, Bags Under Eyes, Big Lips, Big Nose, Double Chin, High Cheekbones, Male, Mouth Slightly Open, Narrow Eyes, Oval Face, Pointy Nose, Receding Hairline, Smiling

Method                        Feature dimension   Accuracy
Ours (discovered landmarks)   60                  83.2
FaceNet (top-layer)           128                 80.0
FaceNet (conv-layer)          1792                82.4

[FaceNet] Florian Schroff, Dmitry Kalenichenko, and James Philbin, “FaceNet: A unified embedding for face recognition and clustering,” In CVPR, 2015.
[Figure: image manipulation by editing landmarks. Rows: Original images; Landmarks (discovered landmarks and manipulated landmarks); Manipulated images]
manipulating all 16 landmarks
with explicit structures
Project page (Code & results):
http://ytzhang.net/projects/lmdis-rep