Inference Networks, Graph Convolutional Networks. Greg Mori, School of Computing Science, Simon Fraser University. PowerPoint PPT Presentation.


slide-1
SLIDE 1

Inference Networks, Graph Convolutional Networks

Greg Mori

School of Computing Science Simon Fraser University

slide-2
SLIDE 2

Outline

  • Message passing with deep structured networks
  • Deng et al. BMVC 2015, CVPR 2016
  • Image annotation with label hierarchies
  • Hu et al. CVPR 2016
[Figures: left, a group activity scene with per-person labels Waiting, Walking and queries Waiting?, Walking?; right, a scene label hierarchy: outdoor/indoor, man-made/natural, construction, sports field, leisure, house, cabins, farms, man-made elements, with fine labels such as batter box, pitcher mound, playground, trench, barn, arena, hockey, bat, baseball, grass, building, field, floor, people.]
slide-3
SLIDE 3

Image Classification

[Figure: label hierarchy for an example image, from coarse labels (Indoor, Outdoor) through mid-level labels (man-made, natural, leisure, sports field, man-made elements, cabins, houses) to fine-grained labels (trench, pitcher mound, batter's box, playground, barn, field, bat, baseball, grass, person, building).]

  • A natural image can be categorized with labels at different concept layers

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-4
SLIDE 4

Label Correlation Helps

[Figure: the same label hierarchy (Indoor/Outdoor, man-made/natural, leisure, sports field, man-made elements, cabins, houses, and fine labels), with edges marked as positive or negative correlations between labels.]

  • Such categorization at different concept layers can be modeled with label graphs
  • It is natural and straightforward to leverage label correlation

slide-5
SLIDE 5

Goal: A generic label relation model

  • Infer the entire label space from visual input
  • Infer missing labels given a few fixed provided labels

[Figure: system overview. A visual architecture (CNN) produces an initial activation; an inference machine propagates information over a knowledge graph and outputs refined probabilities; gradients from the loss function are back-propagated, making the whole pipeline an end-to-end trainable system. Metadata and partial labels can also be fed in.]

[Figure: SINN prediction with partial human labels. The observed labels (Outdoor, Man-made) pass through a reverse sigmoid to produce activations, which are propagated together with the CNN's visual activation to predict the remaining labels: sports field, batter box, baseball, bat, people, field.]

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-6
SLIDE 6

Top-down Inference Neural Network

Visual Architecture

Top-down inference

  • Refine activations for each label
  • Pass messages top-down and within each layer of the label graph

a^i_t = V_{t−1,t} · a^i_{t−1} + H_t · x^i_t + b_t

where a^i_t is the activation at the current concept layer, a^i_{t−1} the activation at the previous concept layer, V_{t−1,t} a vertical weight that propagates information across concept layers, and H_t a horizontal weight that propagates information within a concept layer.

x^i_t = W_t · CNN(I_i) + b_t

produces the initial visual activation from the CNN.

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016
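As a concrete illustration, the top-down update above can be sketched in plain numpy. The function name, shapes, and toy weights below are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def topdown_step(a_prev, x_t, V, H, b):
    """One top-down inference step: a_t = V_{t-1,t} @ a_{t-1} + H_t @ x_t + b_t.

    a_prev : activations at the previous (coarser) concept layer
    x_t    : initial visual activation for the current layer
    V      : vertical weights (previous layer -> current layer)
    H      : horizontal weights (within the current layer)
    b      : bias for the current layer
    """
    return V @ a_prev + H @ x_t + b

# Toy example: 2 labels at the previous layer, 3 at the current one.
rng = np.random.default_rng(0)
a_prev = rng.standard_normal(2)
x_t = rng.standard_normal(3)
V = rng.standard_normal((3, 2))
H = rng.standard_normal((3, 3))
b = np.zeros(3)
a_t = topdown_step(a_prev, x_t, V, H, b)
print(a_t.shape)  # (3,)
```

In practice x_t would come from a CNN feature extractor followed by a linear layer, as in the slide's second equation.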

slide-7
SLIDE 7

Bidirectional Inference Neural Network (BINN)

Bidirectional inference

→a^i_t = →V_{t−1,t} · →a^i_{t−1} + →H_t · x^i_t + →b_t
←a^i_t = ←V_{t+1,t} · ←a^i_{t+1} + ←H_t · x^i_t + ←b_t
a^i_t = →U_t · →a^i_t + ←U_t · ←a^i_t + b_t

  • Bidirectional inference makes information propagate across the entire label structure
  • Inference runs in each direction independently; the results are then blended

slide-8
SLIDE 8

Structured Inference Neural Network (SINN)

[Figure: knowledge graph linking classes (Zebra, Leopard, Cat, Hound) to attributes (Fast, Striped, Spotted, Domestic), with edges marked as positive or negative correlations.]

  • BINN is hard to train
  • Regularize connections with prior knowledge about label correlations
  • Decompose connections into a positive-correlation component and a negative-correlation component

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-9
SLIDE 9

Structured Inference Neural Network (SINN)

  • Evolve the BINN formulation with regularization on the connections

Structured inference:

→a^i_t = γ(→V^+_{t−1,t} · →a^i_{t−1}) + γ(→H^+_t · x^i_t) − γ(→V^−_{t−1,t} · →a^i_{t−1}) − γ(→H^−_t · x^i_t) + →b_t

←a^i_t = γ(←V^+_{t+1,t} · ←a^i_{t+1}) + γ(←H^+_t · x^i_t) − γ(←V^−_{t+1,t} · ←a^i_{t+1}) − γ(←H^−_t · x^i_t) + ←b_t

a^i_t = →U_t · →a^i_t + ←U_t · ←a^i_t + b_t

γ(x) = ReLU(x)

The ReLU neuron is essential to keep the positive and negative contributions separate: superscript + marks positive-correlation weight components and superscript − marks negative-correlation components.
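A minimal numpy sketch of this structured update may make the sign structure concrete. The function names, shapes, and toy weights are assumptions for illustration, not the authors' code:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sinn_direction(a_in, x_t, V_pos, V_neg, H_pos, H_neg, b):
    """One directional SINN update.

    Positive-correlation weights add evidence and negative-correlation
    weights subtract it; the ReLU keeps each component non-negative
    before the signed combination, so the two roles cannot mix.
    """
    return (relu(V_pos @ a_in) + relu(H_pos @ x_t)
            - relu(V_neg @ a_in) - relu(H_neg @ x_t) + b)

def sinn_blend(a_fwd, a_bwd, U_fwd, U_bwd, b):
    """Blend the two directional activations: a_t = U_fwd @ a_fwd + U_bwd @ a_bwd + b."""
    return U_fwd @ a_fwd + U_bwd @ a_bwd + b

# Toy example: 2 labels at the adjacent layer, 3 at the current layer.
rng = np.random.default_rng(0)

def rand(*shape):
    return rng.standard_normal(shape)

a_prev, x_t = rand(2), rand(3)
a_fwd = sinn_direction(a_prev, x_t, rand(3, 2), rand(3, 2), rand(3, 3), rand(3, 3), np.zeros(3))
a_bwd = sinn_direction(a_prev, x_t, rand(3, 2), rand(3, 2), rand(3, 3), rand(3, 3), np.zeros(3))
a_t = sinn_blend(a_fwd, a_bwd, rand(3, 3), rand(3, 3), np.zeros(3))
print(a_t.shape)  # (3,)
```

With scalar weights the signed structure is easy to check by hand: positive components raise the activation, negative ones lower it.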

slide-10
SLIDE 10

Prediction from Purely Visual Input

SINN Prediction

[Figure: the CNN produces a visual activation; information propagation yields the output activation and the predictions outdoor, man-made, sports field, batter box, bat, people, water.]

  • A visual architecture (e.g. a Convolutional Neural Network) produces the visual activation
  • SINN propagates information bidirectionally and produces a refined output activation

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-11
SLIDE 11

Prediction with Partially Observed Labels

SINN Prediction with Partial Human Labels

[Figure: the observed labels Outdoor and Man-made pass through a reverse sigmoid; their activations are propagated together with the CNN's visual activation to predict the remaining labels: sports field, batter box, baseball, bat, people, field.]

  • A reverse-sigmoid (logit) neuron produces activations from partial labels
  • SINN adapts both the visual activation and the partial-label activations to infer the remaining labels

slide-12
SLIDE 12

Reverse sigmoid (logit): produce an activation from a label

a(y) = log( g(y) / (1 − g(y)) ),   g(y) = y + ε if y = 0,  y − ε if y = 1

  • Reverse the sigmoid function y = σ(x) = 1 / (1 + exp(−x)) to produce a sigmoid input
  • Use a small ε (0.005) to maintain numerical stability
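The reverse sigmoid is short enough to write out directly; a sketch in numpy (function names are illustrative):

```python
import numpy as np

EPS = 0.005  # small epsilon for numerical stability, as on the slide

def reverse_sigmoid(y, eps=EPS):
    """Map a binary label y in {0, 1} to a pre-sigmoid activation (logit).

    g(y) nudges the label away from exactly 0 or 1 so the log stays finite.
    """
    g = np.where(y == 0, y + eps, y - eps)
    return np.log(g / (1.0 - g))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Feeding the activation back through the sigmoid recovers the nudged label:
a = reverse_sigmoid(np.array([0.0, 1.0]))
print(sigmoid(a))  # approximately [0.005, 0.995]
```

Because the logit is the exact inverse of the sigmoid, these activations can be injected wherever the network expects pre-sigmoid values.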

slide-13
SLIDE 13

Image Datasets

  • Evaluate with two types of experiments on three datasets

Animals with Attributes

[Lampert et al. 2009]

Labels: 28 taxonomy terms, 50 animal classes, 85 attributes

  • Taxonomy terms are constructed from WordNet as in [Hwang et al. 2012]
  • Knowledge graph constructed by combining the class-attribute graph with the taxonomy graph

Task: predict the entire label set

NUS-WIDE

[Chua et al. 2009]

Labels: 698 image groups, 81 concepts, 1000 tags

Task: predict the 81 concepts while observing tags / image groups

  • Knowledge graph produced from WordNet using semantic similarity
  • 698 image groups constructed from image metadata

SUN 397

[Xiao et al. 2012]

Labels: 3 coarse, 16 general, 397 fine-grained

Task 1: predict the entire label set
Task 2: predict the fine-grained scene given the coarse scene category

  • Knowledge graph provided by the dataset

slide-14
SLIDE 14

Ex1: Inference from visual input

[Charts: mAP per class on Animals with Attributes (28 taxonomy terms, 50 animal classes, 85 attributes; y-axis 75 to 100) and SUN 397 (3 coarse, 16 general, 397 fine-grained scene categories; y-axis 50 to 100), comparing CNN + Logistics, CNN + BINN, and CNN + SINN.]

  • Produce predictions over the entire label space
  • Evaluate on each concept layer (measured by mAP per class)
  • Consistent improvement over baselines on different concept layers

slide-15
SLIDE 15

Ex2: Inference from partial labels (NUS-WIDE)

  • Produce predictions given partial labels: 1k tags and 698 image groups

Correct predictions are marked in blue while incorrect are marked in red

[Figure: qualitative examples.
  Ground truth: railroad. CNN + Logistic: statue, buildings, person. Our predictions: railroad, person, sky.
  Ground truth: animal, grass, water, dog. CNN + Logistic: grass, person, animal. Our predictions: water, animal, dog.
  Ground truth: rainbow, clouds, sky. CNN + Logistic: clouds, water, sky. Our predictions: rainbow, clouds, sky.
  Ground truth: food, water. CNN + Logistic: food, plants, flower. Our predictions: food, plants, water.]

slide-16
SLIDE 16

Ex2: Inference from partial labels (NUS-WIDE)

[Charts: mAP per class (best: 69.24) and mAP per image on NUS-WIDE, comparing 1k tags + groups + CNN + SINN; 1k tags + CNN + SINN; 1k tags + groups + CNN + Logistics; 1k tags + groups + Logistics; CNN + Logistics; 1k tags + Logistics; 5k tags + tag neighbors [Johnson et al. 2015]; 5k tags + Logistics [Johnson et al. 2015].]

  • Evaluate on the standard 81 ground-truth classes of NUS-WIDE
  • Outperforms all baselines by a large margin

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-17
SLIDE 17

Correct predictions are marked in blue while incorrect are marked in red

  • Produce predictions given coarse-level labels (3 coarse categories)

Ex2: Inference with partial labels (SUN397)

[Figure: qualitative examples.
  Observed label: outdoor/man-made. CNN + Logistic: campus. Our prediction: abbey. Ground truth: abbey.
  Observed label: outdoor/man-made. CNN + Logistic: building facade. Our prediction: library/outdoor. Ground truth: library/outdoor.
  Observed labels: outdoor/natural, outdoor/man-made. CNN + Logistic: patio. Our prediction: picnic area. Ground truth: picnic area.
  Observed label: indoor. CNN + Logistic: operating room. Our prediction: dentists office. Ground truth: dentists office.]

slide-18
SLIDE 18

Ex2: Inference with partial labels (SUN397)

[Chart: multiclass accuracy and mAP per class on SUN 397 (y-axis 40 to 70), comparing image features + SVM [Xiao et al. 2012], CNN + Logistics, CNN + BINN, CNN + SINN, CNN + Partial Labels + Logistics, and CNN + Partial Labels + SINN.]

  • Evaluate on the 397 fine-grained scene categories
  • Significantly improved performance

Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

slide-19
SLIDE 19

Video Dataset: YouTube-8M

  • YouTube-8M V1 / V2
  • 8 million / 7 million videos
  • ~500K hours of video
  • 4800 possible labels
  • 1.8 / 3.4 labels per video on average
  • Inception V3 frame features
  • Neural-network audio features

[Figure: pipeline. Input frames → CNN feature extraction → feature aggregation → coarse-grained and fine-grained classification via hierarchical label inference; example labels: Animal, Cat, Bread, Adorable.]

Nauata, Smith, Mori, Hierarchical Label Inference for Video Classification CVPR Workshops 2017

slide-20
SLIDE 20

Results

Method                                      YouTube-8M v1 (mAP / gAP)   YouTube-8M v2 (mAP / gAP)
LSTM [Abu-El-Haija et al.]                  26.6 / N/A                  not reported
Logistic regression [Abu-El-Haija et al.]   28.1 / N/A                  not reported
CNN features                                27.98 / 60.34               36.84 / 70.31
BINN                                        31.18 / 64.74               40.19 / 76.33

slide-21
SLIDE 21

Summary

  • Inference in structured label space
  • Relations within and across levels of a label space
  • Model positive and negative correlations between

labels in end-to-end trainable model

slide-22
SLIDE 22

Outline

  • Message passing with deep structured networks
  • Deng et al. BMVC 2015, CVPR 2016
  • Image annotation with label hierarchies
  • Hu et al. CVPR 2016
[Figures: left, a group activity scene with per-person labels Waiting, Walking and queries Waiting?, Walking?; right, a scene label hierarchy: outdoor/indoor, man-made/natural, construction, sports field, leisure, house, cabins, farms, man-made elements, with fine labels such as batter box, pitcher mound, playground, trench, barn, arena, hockey, bat, baseball, grass, building, field, floor, people.]
slide-23
SLIDE 23

Overview

  • Group Activity Recognition: structures
slide-24
SLIDE 24

Overview

  • Flexible structure to adaptively capture discriminative dependencies [Lan et al. 10, Amer et al. 14]

slide-25
SLIDE 25

Overview: Probabilistic Graphical Models P(fall|image)

slide-26
SLIDE 26

Belief Propagation

P(fall|image)

  • Message contents depend on the graphical model parameters
  • Which messages are passed depends on the graphical model structure
slide-27
SLIDE 27

Overview

  • Combine Graphical Models and Deep Learning, enable message passing

CNN

slide-28
SLIDE 28

Overview

  • Combine Graphical Models and Deep Learning, enable message passing and structure learning

slide-29
SLIDE 29

Method

1. Message passing in a Recurrent Neural Network (RNN)
2. Gating functions between nodes
3. Structure learning of the graphical model

slide-30
SLIDE 30

Recurrent Networks

[Figure: a recurrent network unrolled over time; at each step, the input and the previous state produce the new state and an output.]

slide-31
SLIDE 31

Method: Message Passing in RNN

  • Another view of belief propagation: an Inference Machine that classifies messages [Ross et al. CVPR'11]

[Figure: message → classification → prediction.]

slide-32
SLIDE 32

Method: Message Passing in RNN

[Figure: unrolled over steps t and t+1, a CNN feeds a prediction layer at each step; nodes for each person's action make the output prediction.]

slide-33
SLIDE 33

Method: Message Passing in RNN

  • Represent potential functions in the message passing process
  • The same potential function corresponds to the same message classifier
  • Weights are shared across all instances with the same semantic meaning
  • The same holds for the prediction layer

slide-34
SLIDE 34

Method: Gating Function between Instances

  • We introduce instance-level gates:

[Figure: instance-level gates on edges between person nodes (waiting, waiting, walking).]

slide-35
SLIDE 35

Method: Gating Function

  • C.f. Long Short-Term Memory (LSTM), Gated Recurrent Unit

[Figure: a gate modulates the content passed from time t−1 to time t.]
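The gating idea can be sketched in a few lines of numpy. The gate here is computed from features of the connected nodes and scales the passed message elementwise; the function names, feature vector, and weight shapes are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_message(message, edge_features, W_g, b_g):
    """Gate a message on one edge of the graph.

    The gate takes values in (0, 1), like LSTM/GRU gates; a gate near 0
    effectively prunes the edge, which is how structure learning emerges
    from learned gate parameters.
    """
    g = sigmoid(W_g @ edge_features + b_g)  # per-dimension gate in (0, 1)
    return g * message                      # elementwise modulation

# Toy example: a 2-d message gated by a 3-d edge feature vector.
m = np.ones(2)
f = np.zeros(3)
W_g = np.zeros((2, 3))
b_g = np.array([0.0, -100.0])  # second dimension is effectively shut off
print(gated_message(m, f, W_g, b_g))
```

A large negative gate bias drives that dimension of the message toward zero, mimicking a pruned connection.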

slide-36
SLIDE 36

Method: Graphical Model Structure Learning

  • Use instance gates in inference machines:

[Figure: unrolled inference machine over steps t and t+1; CNNs feed prediction layers, with gated messages passed between nodes.]

slide-37
SLIDE 37
Method: General Purpose Inference Machine

  • Untying weights between recurrent units

[Figure: the same unrolled pipeline over steps t and t+1, with weights untied across the recurrent units.]

slide-38
SLIDE 38

Method: Review of Pipeline

[Figure: a scene node and person nodes, each fed by its own CNN.]

slide-39
SLIDE 39

Method: Review of Pipeline

[Figure: structure inference machines applied over time; per-person action nodes (Waiting, Walking) and queries (Waiting?, Walking?) feed the scene-level prediction.]

slide-40
SLIDE 40
  • Contains 44 video clips
  • Action classes: others, crossing, waiting, queuing, walking, talking
  • Scene classes: crossing, waiting, queuing, walking, talking

Experiment: Collective Activity Dataset

Choi et al., VSWS 2009

slide-41
SLIDE 41
  • Contains 72 video clips
  • Action classes: others, crossing, waiting, queuing, talking, dancing, jogging
  • Scene classes: crossing, waiting, queuing, talking, dancing, jogging

Experiment: Collective Activity Dataset Extended

Choi et al., CVPR 2011

slide-42
SLIDE 42

Experiment: Nursing Home Dataset

  • Contains 80 videos captured from a nursing home
  • Action classes: walking, standing, sitting, bending, squatting and falling
  • Scene classes: fall and non-fall

Deng et al., BMVC 2015

slide-43
SLIDE 43

Quantitative Results

Collective Activity Dataset
Iterations      1         2         3
Tied            73.86%    74.02%    74.02%
Gated Tied      80.12%    80.9%     81.22%
Gated Untied    80.12%    81.06%    81.22%

Collective Activity Extended Dataset
Iterations      1         2         3
Tied            84.45%    87.97%    87.97%
Gated Tied      89.51%    90.14%    90.14%
Gated Untied    89.51%    90.14%    90.23%

Nursing Home Dataset
Iterations      1         2         3
Tied            83.68%    84.91%    84.91%
Gated Tied      84.46%    85.32%    85.32%
Gated Untied    84.46%    85.50%    85.50%

slide-44
SLIDE 44
Qualitative Results

  • Gates between person-level nodes and the group activity node:

slide-45
SLIDE 45

Summary

  • Recurrent network for inference in a graphical model
  • Nodes in graphical model represented by units in

recurrent network

  • Inference by repeated message passing
  • Structure learning by gating functions
slide-46
SLIDE 46

Graph Convolutional Networks

  • These are both examples of a broad class of methods known as Graph Convolutional Networks
  • Neural network architectures designed to run over graphs
  • “Convolutions” are defined over adjacent nodes in the graph, with filters shared over all nodes
  • Variants differ in the form of the function, normalization, layers, and adjacency
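One widely used instance of this class is the propagation rule of Kipf and Welling (2017), H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), which can be sketched in numpy. The function name and toy graph below are illustrative assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer in the style of Kipf & Welling (2017).

    A : (n, n) adjacency matrix
    H : (n, d) node features
    W : (d, d') filter shared over all nodes in the graph
    Other GCN variants change the normalization or the adjacency.
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^{-1/2}
    # Symmetric normalization, shared linear filter, then ReLU.
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy example: a 3-node path graph 0-1-2 with 2-d features, 4 output channels.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.arange(6, dtype=float).reshape(3, 2)
W = np.ones((2, 4))
print(gcn_layer(A, H, W).shape)  # (3, 4)
```

Stacking such layers lets information propagate over multi-hop neighborhoods, analogous to repeated message passing in the models above.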

slide-47
SLIDE 47

Reading list

  • Duvenaud et al. Convolutional Networks on Graphs for Learning Molecular Fingerprints, NIPS 2015
  • Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017
  • Jain et al. Structural-RNN: Deep Learning on Spatio-Temporal Graphs, CVPR 2016
  • Santoro et al. A simple neural network module for relational reasoning, NIPS 2017
  • Ibrahim and Mori. Hierarchical Relational Networks for Group Activity Recognition and Retrieval, ECCV 2018