Inference Networks, Graph Convolutional Networks
Greg Mori
School of Computing Science Simon Fraser University
Outline

Scene image annotation with label hierarchies: a label graph ranges from coarse scene types (outdoor, indoor) through mid-level categories (man-made, natural, leisure, sports field, man-made elements) down to fine-grained labels (cabins, houses, trench, pitcher mound, batter's box, playground, barn, field, bat, baseball, grass, person, building).
Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016
The same label hierarchy, with edges marked as positive or negative correlations between labels.
System overview: a visual architecture (CNN) produces an initial activation; an inference machine refines it using a knowledge graph into refined probabilities; gradients from the loss function are back-propagated through the whole pipeline, so the system is trainable end-to-end. Metadata and partial labels can also be fed into the inference machine.
SINN prediction with partial human labels: observed labels (e.g. "outdoor", "man-made") are converted into activations by a reverse sigmoid and combined with the CNN's visual activation; information propagation produces output activations and the final prediction (sports field, batter's box, baseball, bat, people, field).
Top-down inference (within each layer of the label graph):

$$a^i_t = V_{t-1,t} \cdot a^i_{t-1} + H_t \cdot x^i_t + b_t$$

where $a^i_t$ is the activation at the current concept layer $t$, $a^i_{t-1}$ is the activation at the previous concept layer, the vertical weight $V_{t-1,t}$ propagates information across concept layers, and the horizontal weight $H_t$ propagates information within a concept layer. The initial visual activation is produced from the CNN:

$$x^i_t = W_t \cdot \mathrm{CNN}(I_i) + b_t$$
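The top-down update above can be sketched in NumPy. Layer sizes and random weights here are purely illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical sizes: 3 labels in concept layer t-1, 4 labels in layer t.
rng = np.random.default_rng(0)

V = rng.normal(size=(4, 3))   # vertical weights V_{t-1,t}: across concept layers
H = rng.normal(size=(4, 4))   # horizontal weights H_t: within concept layer t
b = np.zeros(4)               # bias b_t

a_prev = rng.normal(size=3)   # a^i_{t-1}: activation at the previous concept layer
x_t = rng.normal(size=4)      # x^i_t: initial visual activation for layer t

# a^i_t = V_{t-1,t} . a^i_{t-1} + H_t . x^i_t + b_t
a_t = V @ a_prev + H @ x_t + b
```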
Bidirectional inference:

$$\overrightarrow{a}^i_t = \overrightarrow{V}_{t-1,t} \cdot \overrightarrow{a}^i_{t-1} + \overrightarrow{H}_t \cdot x^i_t + \overrightarrow{b}_t,$$
$$\overleftarrow{a}^i_t = \overleftarrow{V}_{t+1,t} \cdot \overleftarrow{a}^i_{t+1} + \overleftarrow{H}_t \cdot x^i_t + \overleftarrow{b}_t,$$
$$a^i_t = \overrightarrow{U}_t \cdot \overrightarrow{a}^i_t + \overleftarrow{U}_t \cdot \overleftarrow{a}^i_t + b_t$$
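A minimal NumPy sketch of one bidirectional update for a middle concept layer, again with illustrative sizes and random weights:

```python
import numpy as np

# Assumed sizes: 3 labels in layer t-1, 4 in layer t, 5 in layer t+1.
rng = np.random.default_rng(1)
n_prev, n_t, n_next = 3, 4, 5

V_fwd = rng.normal(size=(n_t, n_prev))  # forward vertical weights (from t-1)
V_bwd = rng.normal(size=(n_t, n_next))  # backward vertical weights (from t+1)
H_fwd = rng.normal(size=(n_t, n_t))     # forward horizontal weights
H_bwd = rng.normal(size=(n_t, n_t))     # backward horizontal weights
U_fwd = rng.normal(size=(n_t, n_t))     # combination weights, forward pass
U_bwd = rng.normal(size=(n_t, n_t))     # combination weights, backward pass
b_fwd, b_bwd, b = np.zeros(n_t), np.zeros(n_t), np.zeros(n_t)

a_fwd_prev = rng.normal(size=n_prev)    # forward activation from layer t-1
a_bwd_next = rng.normal(size=n_next)    # backward activation from layer t+1
x_t = rng.normal(size=n_t)              # visual activation for layer t

a_fwd = V_fwd @ a_fwd_prev + H_fwd @ x_t + b_fwd   # top-down pass
a_bwd = V_bwd @ a_bwd_next + H_bwd @ x_t + b_bwd   # bottom-up pass
a_t = U_fwd @ a_fwd + U_bwd @ a_bwd + b            # combine both directions
```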
Example label graph: classes (zebra, leopard, cat, hound) and attributes (fast, striped, spotted, domestic), with prior knowledge about label correlations encoded as positive and negative correlation edges.
The ReLU nonlinearity, $\gamma(x) = \mathrm{ReLU}(x)$, is essential to keep the positive and negative contributions separated.
Structured inference, with each weight matrix split into a positive and a negative component:

$$\overrightarrow{a}^i_t = \gamma(\overrightarrow{V}^+_{t-1,t} \cdot \overrightarrow{a}^i_{t-1}) + \gamma(\overrightarrow{H}^+_t \cdot x^i_t) - \gamma(\overrightarrow{V}^-_{t-1,t} \cdot \overrightarrow{a}^i_{t-1}) - \gamma(\overrightarrow{H}^-_t \cdot x^i_t) + \overrightarrow{b}_t,$$
$$\overleftarrow{a}^i_t = \gamma(\overleftarrow{V}^+_{t+1,t} \cdot \overleftarrow{a}^i_{t+1}) + \gamma(\overleftarrow{H}^+_t \cdot x^i_t) - \gamma(\overleftarrow{V}^-_{t+1,t} \cdot \overleftarrow{a}^i_{t+1}) - \gamma(\overleftarrow{H}^-_t \cdot x^i_t) + \overleftarrow{b}_t,$$
$$a^i_t = \overrightarrow{U}_t \cdot \overrightarrow{a}^i_t + \overleftarrow{U}_t \cdot \overleftarrow{a}^i_t + b_t$$

The positive components contribute positive evidence; the negative components subtract it.
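The forward structured-inference update can be sketched as follows; sizes and weights are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)  # gamma(x) = ReLU(x)

# Assumed sizes: 3 labels in layer t-1, 4 labels in layer t.
rng = np.random.default_rng(2)
n_prev, n_t = 3, 4

V_pos = rng.uniform(0, 1, size=(n_t, n_prev))  # positive-correlation weights
V_neg = rng.uniform(0, 1, size=(n_t, n_prev))  # negative-correlation weights
H_pos = rng.uniform(0, 1, size=(n_t, n_t))
H_neg = rng.uniform(0, 1, size=(n_t, n_t))
b = np.zeros(n_t)

a_prev = rng.normal(size=n_prev)   # activation from the previous concept layer
x_t = rng.normal(size=n_t)         # visual activation for layer t

# Positive components add evidence, negative components subtract it;
# the ReLU keeps each component's contribution sign-consistent.
a_t = (relu(V_pos @ a_prev) + relu(H_pos @ x_t)
       - relu(V_neg @ a_prev) - relu(H_neg @ x_t) + b)
```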
SINN prediction: the CNN's visual activation is refined by information propagation into output activations and a prediction (man-made, sports field, batter's box, bat, people, water).
SINN prediction with partial human labels: observed labels (e.g. "outdoor", "man-made") are injected as activations via a reverse sigmoid, the inverse of the sigmoid $y = \sigma(x) = \frac{1}{1 + \exp(-x)}$:

$$a(y) = \log\frac{g(y)}{1 - g(y)}, \qquad g(y) = \begin{cases} y + \epsilon, & \text{if } y = 0, \\ y - \epsilon, & \text{if } y = 1, \end{cases}$$

where a small $\epsilon$ (0.005) keeps the logarithm numerically stable.
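The reverse sigmoid is simple to implement; a minimal sketch using the slide's $\epsilon = 0.005$:

```python
import math

EPS = 0.005  # epsilon from the slide, for numerical stability

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reverse_sigmoid(y):
    """Map an observed binary label y in {0, 1} to an activation a(y)."""
    g = y + EPS if y == 0 else y - EPS   # clip away from the sigmoid's asymptotes
    return math.log(g / (1.0 - g))       # logit: inverse of y = 1 / (1 + exp(-x))

# Feeding the activation back through the sigmoid recovers the clipped label:
# sigmoid(reverse_sigmoid(1)) ~ 0.995, sigmoid(reverse_sigmoid(0)) ~ 0.005.
```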
Datasets

Animals with Attributes [Lampert et al. 2009]. Labels: 28 taxonomy terms, 50 animal classes, 85 attributes. The label graph is constructed from WordNet as in [Hwang et al. 2012], by combining the class-attribute graph with the taxonomy graph. Task: predict the entire label set.

NUS-WIDE [Chua et al. 2009]. Labels: 698 image groups, 81 concepts, 1000 tags. The graph is built from WordNet using semantic similarity and from image metadata. Task: predict the 81 concepts while observing tags and image groups.

SUN 397 [Xiao et al. 2012]. Labels: 3 coarse, 16 general, and 397 fine-grained scene categories. Task 1: predict the entire label set. Task 2: predict the fine-grained scene given the coarse scene category.
[Chart: results on Animals with Attributes for the 28 taxonomy terms, 50 animal classes, and 85 attributes, comparing CNN + Logistic, CNN + BINN, and CNN + SINN.]

[Chart: results on SUN 397 for the 3 coarse, 16 general, and 397 fine-grained scene categories, comparing CNN + Logistic, CNN + BINN, and CNN + SINN.]
Qualitative results (correct predictions marked in blue, incorrect in red):
- Ground truth: railroad. CNN + Logistic: statue, buildings, person. Our predictions: railroad, person, sky.
- Ground truth: animal, grass, water, dog. CNN + Logistic: grass, person, animal. Our predictions: water, animal, dog.
- Ground truth: rainbow, clouds, sky. CNN + Logistic: clouds, water, sky. Our predictions: rainbow, clouds, sky.
- Ground truth: food, water. CNN + Logistic: food, plants, flower. Our predictions: food, plants, water.
[Chart: NUS-WIDE mAP per class (best 69.24) and mAP per image, comparing 1k tags + groups + CNN + SINN, 1k tags + CNN + SINN, 1k tags + groups + CNN + Logistic, 1k tags + groups + Logistic, CNN + Logistic, 1k tags + Logistic, 5k tags + tag neighbors [Johnson et al. 2015], and 5k tags + Logistic [Johnson et al. 2015].]
Qualitative results with partial labels (correct predictions marked in blue, incorrect in red):
- Ground truth: abbey. CNN + Logistic: campus. Our predictions: abbey.
- Ground truth: library/outdoor. CNN + Logistic: building facade. Our predictions: library/outdoor.
- Ground truth: picnic area. CNN + Logistic: patio. Our predictions: picnic area.
- Ground truth: dentists office. Observed label: indoor. Our predictions: dentists.
[Chart: SUN 397 multiclass accuracy and mAP per class, comparing Image Features + SVM [Xiao et al. 2012], CNN + Logistic, CNN + BINN, CNN + SINN, CNN + Partial Labels + Logistic, and CNN + Partial Labels + SINN.]
Hierarchical label inference for video classification: input frames pass through CNNs for feature extraction, features are aggregated by averaging, and a hierarchical label inference module produces coarse-grained (e.g. animal) and fine-grained (e.g. cat, bread, adorable) classifications.
Nauata, Smith, Mori, Hierarchical Label Inference for Video Classification CVPR Workshops 2017
Results (mAP / gAP):

Method                                      YouTube-8M v1   YouTube-8M v2
LSTM [Abu-El-Haija et al.]                  26.6 / N/A      —
Logistic regression [Abu-El-Haija et al.]   28.1 / N/A      —
CNN features                                27.98 / 60.34   36.84 / 70.31
BINN                                        31.18 / 64.74   40.19 / 76.33
Key ingredients:
1. Message passing in a recurrent neural network (RNN)
2. Gating function between nodes
3. Structure learning of the graphical model
[Figure: an RNN unrolled over time, with input, state, and output at step t feeding step t+1.]

Message passing for recognizing group activities: each person's action is a node with its own CNN and prediction layer; nodes exchange messages during the message-passing process, with the same message classifier shared across edges that carry the same semantic meaning, and the refined node states make the output prediction (e.g. waiting, waiting, walking).

[Figure: a gate between Content(t-1) and Content(t) controls how much of an incoming message updates a node's content.]
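A minimal sketch of such a gating function. The sigmoid gate over the concatenated content and message, the state size, and the random weights are all illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed node state dimension.
rng = np.random.default_rng(3)
d = 8

w_gate = rng.normal(size=2 * d)         # gate weights over [content, message]
content_prev = rng.normal(size=d)       # Content(t-1)
message = rng.normal(size=d)            # incoming message from a neighboring node

# The gate is a scalar in (0, 1) deciding how much of the message passes.
gate = sigmoid(w_gate @ np.concatenate([content_prev, message]))
content_t = (1.0 - gate) * content_prev + gate * message   # Content(t)
```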
[Figure: Structure Inference Machine. Scene and person nodes each have a CNN; messages pass between nodes and through prediction layers. Example queries: "Walking?" over observed actions (waiting, waiting, walking) and "Waiting?" over observed actions (waiting, waiting, waiting, walking).]
Choi et al., VSWS 2009
Choi et al., CVPR 2011
Deng et al., BMVC 2015
Collective Activity Dataset
Iterations      1        2        3
Tied            73.86%   74.02%   74.02%
Gated Tied      80.12%   80.90%   81.22%
Gated Untied    80.12%   81.06%   81.22%

Collective Activity Extended Dataset
Iterations      1        2        3
Tied            84.45%   87.97%   87.97%
Gated Tied      89.51%   90.14%   90.14%
Gated Untied    89.51%   90.14%   90.23%

Nursing Home Dataset
Iterations      1        2        3
Tied            83.68%   84.91%   84.91%
Gated Tied      84.46%   85.32%   85.32%
Gated Untied    84.46%   85.50%   85.50%
Graph Convolutional Networks

Neural network architectures designed to run over graphs. "Convolutions" are defined over adjacent nodes in the graph, with filters shared over all nodes. Variants differ in the form of the function, the normalization, the layers, and the adjacency used.
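As one concrete instance of the family (the commonly used Kipf & Welling formulation, one of the several variants the slide alludes to), a single graph-convolution layer can be sketched as follows; the toy graph and random features are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    The filter W is shared over all nodes; each node aggregates the features
    of its neighbors (and itself, via the self-loop)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph: 4 nodes on a path, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(4)
H = rng.normal(size=(4, 3))   # node feature matrix
W = rng.normal(size=(3, 2))   # shared filter weights
H_next = gcn_layer(A, H, W)   # new node features, shape (4, 2)
```

Stacking such layers lets information propagate over longer paths in the graph, one hop per layer.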
Fingerprints, NIPS 2015
Networks, ICLR 2017
CVPR 2016
NIPS 2017
Recognition and Retrieval, ECCV 2018