SLIDE 1

Transformation Equivariance vs. Invariance: Unsupervised Learning of Visual Representations

Guo-Jun Qi guojunq@gmail.com Laboratory for MAchine Perception and LEarning (MAPLE) & Futurewei Technologies (Huawei Research USA)

SLIDE 2

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work
  • Unifying the Transformation Equivariance and Invariance

SLIDE 3

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 4

Recipe for the Success of CNNs

[Figure: the CNN pipeline maps visual structures to semantic concepts (horse, grass, tree, ...): convolution layers yield a translation-equivariant representation, followed by a fully connected classifier.]

CNN = Translation-Equivariant Representation + Fully-Connected Classifier

SLIDE 5

Transformation Equivariant Representations

  • Beyond translations: equivariant feature maps under transformations

[Figure: samples under various transformations and their corresponding equivariant representations.]

SLIDE 6

Generalize CNNs beyond translations

[Figure: a representation network followed by an FC classifier (horse, grass, tree, ...): transformation equivariance captures spatial structure, transformation invariance captures semantics.]

Transformation Equivariant Representations + Transformation Invariant Classifiers

SLIDE 7

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 8

Transformation Equivariance

  • Definition of transformation equivariance
  • F -- the representation of a sample
  • t -- a transformation on samples
  • H_t -- the transformation on representations corresponding to t
  • Transformation invariance is the special case of transformation equivariance in which H_t is the identity.

F(t(x)) = H_t[F(x)]

SLIDE 9

Steerability property

  • Steerability: the representation of a transformed sample t(x) can be computed directly from the representation F(x) of the original sample, with no access to x
  • H_t is a function of the transformation t alone, independent of the sample

F(t(x)) = H_t[F(x)]

SLIDE 10

Our Goals

  • Handle general transformations: not necessarily limited to discrete or spatial ones (e.g., recoloring, contrast changes)
  • Allow nonlinear transformations H_t between the representations of transformed and original images
  • Capture complex visual structures from transformed images

F(t(x)) = H_t[F(x)]

SLIDE 11

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 12

A Big Picture: Stack of AET

[Figure: a family tree relating CNNs, Group Equivariant CNNs, Capsule Nets, and Autoencoders (AED) to the proposed stack of Autoencoding Transformations for learning TER: deterministic AET, probabilistic AVT, and deterministic/probabilistic SAT.]

  • AET: AutoEncoding Transformations
  • AVT: Autoencoding Variational Transformations
  • SAT: (Semi-)Supervised Autoencoding Transformations
  • AET learns a general representation that can be applied everywhere.

SLIDE 13

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 14

Take a Glance: Autoencoding Transformations rather than Data

[Figure: AutoEncoding Data (AED) reconstructs x from E(x); AutoEncoding Transformations (AET) reconstructs the transformation t from E(x) and E(t(x)).]

Zhang et al., AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data, in CVPR 2019.

SLIDE 15

How does AET work?

  • Generative process:
  • an input image x ~ p(x)
  • a random transformation t ~ p(t)
  • the transformed image t(x)
  • A representation encoder
  • E: x ⟼ E(x), t(x) ⟼ E(t(x))
  • A transformation decoder
  • D: (E(x), E(t(x))) ⟼ t̂, an estimate of t

[Figure: the encoder E is applied to both x and t(x); the decoder D estimates the transformation from the pair of representations E(x) and E(t(x)).]

SLIDE 16

Decoding Transformations

  • A Siamese network encodes each image individually
  • E.g., visual structures and spatial relations among objects
  • The transformation is decoded by comparing the representations before and after the transformation (see the sketch below)

[Figure: a shared encoder E is applied to x and t(x); the decoder D compares E(x) and E(t(x)) to recover t.]
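A minimal PyTorch-style sketch of one AET training step built on these ideas (the tiny encoder/decoder and the 2×3 affine parameterization are illustrative assumptions, not the paper's actual networks):

```python
import torch
import torch.nn as nn

# Hedged sketch of one AET step: a shared (Siamese) encoder E, a transformation
# decoder D on the concatenated features, and an MSE loss on affine parameters.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())                                   # E(.)
decoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 6))     # D(E(x), E(t(x)))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(8, 3, 32, 32)                                          # a batch of images
theta = torch.eye(2, 3).repeat(8, 1, 1) + 0.1 * torch.randn(8, 2, 3)   # sampled affine t
grid = nn.functional.affine_grid(theta, x.size(), align_corners=False)
tx = nn.functional.grid_sample(x, grid, align_corners=False)           # t(x)

z_x, z_tx = encoder(x), encoder(tx)                                    # Siamese encoding
theta_hat = decoder(torch.cat([z_x, z_tx], dim=1)).view(-1, 2, 3)
loss = 0.5 * ((theta_hat - theta) ** 2).sum(dim=(1, 2)).mean()         # AET loss on parameters
opt.zero_grad(); loss.backward(); opt.step()
```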

SLIDE 17

AET loss for training

  • Parameterized transformations: 𝒯 = { t_θ | θ ~ Θ }
    E.g., affine or projective transformations: ℓ(t_θ, t_θ̂) = (1/2) ‖M(θ) − M(θ̂)‖²₂, where M(θ) is the parameter matrix of t_θ
  • GAN-induced transformations: transformed image G(x, z)
    ℓ(t_z, t_ẑ) = (1/2) ‖z − ẑ‖²₂
  • Non-parametric transformations
    ℓ(t, t̂) = (1/2) 𝔼_{x~X} dist(t(x), t̂(x))

SLIDE 18

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 19

Revisit: Steerability of TER

  • Obtain the representation z of a transformed sample t(x) from t and z̃, without accessing x
  • Maximize the mutual information between z and (z̃, t)

F(t(x)) = H_t[F(x)]

[Figure: x is transformed by t into t(x); z̃ and z denote their respective representations, and z is steerable from (z̃, t) through H_t.]

SLIDE 20

An Information-Theoretical Insight

  • Train a TER model with parameters θ by maximizing  max_θ I_θ(z; z̃, t)
  • By the chain rule of mutual information,
    I_θ(z; z̃, t) = I_θ(z; z̃, t, x) − I_θ(z; x | z̃, t) ≤ I_θ(z; z̃, t, x)
  • I_θ(z; z̃, t) attains its maximum value I_θ(z; z̃, t, x) (the upper bound) when I_θ(z; x | z̃, t) = 0
  • Nonlinearity of the transformation H_t in representation space

Steerability: given (z̃, t), x contains no more information about z.

[Figure: x is transformed by t into t(x); z̃ and z denote their representations.]

SLIDE 21

AVT: Autoencoding Variational Transformations

  • The mutual information cannot be maximized directly
  • The posterior q_θ(t | z, x) is intractable to evaluate
  • Derive a lower bound by introducing a transformation decoder p_φ(t | z̃, z)
  • Unsupervised loss for learning the AVT:

I_θ(z; z̃, t) ≥ H(t | z̃) + 𝔼_{q_θ(t, z̃, z)} log p_φ(t | z̃, z)

max_{θ,φ} 𝔼_{q_θ(t, z̃, z)} log p_φ(t | z̃, z)

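One way to see where this lower bound comes from (a standard variational argument not spelled out on the slide; as above, z denotes the representation of t(x) and z̃ that of x):

```latex
\begin{aligned}
I_\theta(z; \tilde z, t)
  &\ge I_\theta(z; t \mid \tilde z)
   = H(t \mid \tilde z) - H(t \mid \tilde z, z) \\
  &= H(t \mid \tilde z) + \mathbb{E}_{q_\theta(t, \tilde z, z)} \log q_\theta(t \mid \tilde z, z) \\
  &\ge H(t \mid \tilde z) + \mathbb{E}_{q_\theta(t, \tilde z, z)} \log p_\phi(t \mid \tilde z, z)
\end{aligned}
```

The last step uses the non-negativity of the KL divergence between the posterior q_θ(t | z̃, z) and the decoder p_φ(t | z̃, z); the training objective above keeps only the expectation term.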

Qi, Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations, preprint.

SLIDE 22

AVT: Autoencoding Variational Transformations

  • Generative process
  • given an image x sampled from p(x)
  • sample a transformation t from p(t)
  • apply t to x, resulting in t(x)
  • sample a representation z of t(x) from q_θ(z | x, t)
  • z̃ is sampled by setting t to the identity, i.e., from q_θ(z̃ | x, id)
  • decode the transformation t from p_φ(t | z̃, z)

[Figure: AVT: the probabilistic encoder q_θ produces z̃ from x and z from t(x); the decoder p_φ(t | z̃, z) recovers the transformation.]
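A minimal PyTorch-style sketch of this generative process (a probabilistic encoder with the reparameterization trick and a unit-variance Gaussian transformation decoder are illustrative assumptions; the paper's actual architectures differ):

```python
import torch
import torch.nn as nn

# Hedged sketch of the AVT idea: a probabilistic encoder realized with the
# reparameterization trick, plus a Gaussian transformation decoder.
class ProbEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mu, self.logvar = nn.Linear(64, dim), nn.Linear(64, dim)

    def forward(self, img):
        h = self.backbone(img)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                    # reparameterization trick
        return mu + torch.exp(0.5 * logvar) * eps     # one sample from q(. | img)

encoder = ProbEncoder()
decoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 6))

x, tx = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # stand-ins for x and t(x)
theta = torch.randn(8, 6)                                      # stand-in parameters of t
z_orig = encoder(x)        # representation of x (t set to the identity)
z_trans = encoder(tx)      # representation of t(x)
theta_hat = decoder(torch.cat([z_orig, z_trans], dim=1))
# With a unit-variance Gaussian decoder, maximizing log p(t | z_orig, z_trans)
# amounts, up to a constant, to minimizing the squared parameter error.
loss = 0.5 * ((theta_hat - theta) ** 2).sum(dim=1).mean()
```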

SLIDE 23

Deterministic vs. Probabilistic Approaches

  • AVT can handle uncertainty in
  • representing images probabilistically: q_θ(z | x, t)
  • decoding transformations with a probabilistic decoder: p_φ(t | z̃, z)

[Figure: probabilistic AVT (encoder q_θ, decoder p_φ) side by side with deterministic AET (encoder E with z = E(t(x)) and z̃ = E(x), decoder D).]

SLIDE 24

Experiments for AET and AVT

  • Unsupervised Representation Learning
  • AET and AVT
  • Evaluation Protocols
  • Test learned representations on downstream tasks
  • Datasets
  • CIFAR10
  • ImageNet
  • Places

[Figure: the stack of Autoencoding Transformations for learning TER (deterministic AET/SAT and probabilistic AVT/SAT).]

SLIDE 25

Transformations in Experiments

  • Affine transformations: random rotation in [-180°, 180°], random translation of up to ±0.2 × height/width, random scaling in [0.7, 1.3] (see the sampling sketch below)
  • Projective transformations: random scaling in [0.8, 1.2], random rotation from {0°, 90°, 180°, 270°}, each corner stretched by up to ±0.125 × height/width
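A small NumPy sketch of drawing one such random affine transformation with the ranges above (the function name and the 2×3 matrix layout are illustrative assumptions):

```python
import numpy as np

# Hedged sketch: sample a random affine transformation with the listed ranges.
def sample_affine(height, width, rng):
    angle = rng.uniform(-np.pi, np.pi)          # rotation in [-180, 180] degrees
    tx = rng.uniform(-0.2, 0.2) * width         # translation up to +-0.2 * W
    ty = rng.uniform(-0.2, 0.2) * height        # translation up to +-0.2 * H
    s = rng.uniform(0.7, 1.3)                   # isotropic scale in [0.7, 1.3]
    c, si = np.cos(angle), np.sin(angle)
    return np.array([[s * c, -s * si, tx],
                     [s * si,  s * c, ty]])     # 2x3 affine matrix M(theta)

M_true = sample_affine(32, 32, np.random.default_rng(0))   # regression target for AET
```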

SLIDE 26

CIFAR10 and ImageNet

Error rates (%) on CIFAR10:

  Method                              Error rate
  Supervised NIN (lower bound)        7.20
  Random Init. + conv (upper bound)   27.50
  Roto-Scat + SVM                     17.7
  ExemplarCNN                         15.7
  DCGAN                               17.2
  Scattering                          15.3
  RotNet + FC                         10.94
  RotNet + conv                       8.84
  (Ours) AET-affine + FC              9.77
  (Ours) AET-affine + conv            8.05
  (Ours) AET-project + FC             9.41
  (Ours) AET-project + conv           7.82
  (Ours) AVT-project + FC             8.96
  (Ours) AVT-project + conv           7.75

Top-1 accuracy (%) on ImageNet:

  Method                          Conv4    Conv5
  ImageNet Labels (upper bound)   59.7     59.7
  Random (lower bound)            27.1     12.0
  Tracking                        38.8     29.8
  Context                         45.6     30.4
  Colorization                    40.7     35.2
  Jigsaw Puzzles                  45.3     34.6
  BiGAN                           41.9     32.2
  NAT                             –        36.0
  DeepCluster                     –        44.0
  RotNet                          50.0     43.8
  (Ours) AET-project              53.2     47.0
  (Ours) AVT-project              54.2     48.4

Projective transformations perform slightly better than affine ones.

SLIDE 27

Model-Free KNN on Representations

  • Evaluate the learned representations directly with a KNN classifier, without training any downstream model (a sketch of this protocol follows the table)
  • Error rates (%) on CIFAR10:

  Method            K=3      K=5      K=10     K=15     K=20
  RotNet baseline   25.67    25.01    24.97    25.85    26.00
  AET-affine        24.88    23.29    23.07    23.34    23.94
  AET-project       23.29    22.40    22.39    23.32    23.75
  AVT-project       22.46    21.62    23.70    22.16    21.51
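A hedged sketch of this protocol (the random arrays below are placeholders for the frozen encoder's features on the CIFAR10 train/test splits):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Model-free evaluation: classify frozen features with K nearest neighbors.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5000, 64))        # placeholder encoder features
train_labels = rng.integers(0, 10, size=5000)
test_feats = rng.normal(size=(1000, 64))
test_labels = rng.integers(0, 10, size=1000)

for k in (3, 5, 10, 15, 20):
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_feats, train_labels)
    error = 1.0 - knn.score(test_feats, test_labels)
    print(f"K={k}: error rate {100 * error:.2f}%")
```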

SLIDE 28

PASCAL VOC: Object Detection

  • A milestone: the unsupervised pretraining performs even better than its fully supervised counterpart

SLIDE 29

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 30

Supervised Autoencoding Transformations

  • Beyond Translation Equivariance and CNNs
  • Transformation Equivariant Representations + Transformation Invariant Classifiers

[Figure: a transformation-equivariant representation feeding a transformation-invariant classifier (predicted label: "Dog").]

SLIDE 31

SAT: Semi-Supervised Autoencoding Transformation

  • Add a label decoder p_φ(y | z) to approximate the posterior q_θ(y | x)

[Figure: SAT: the encoder q_θ produces z̃ from x and z from t(x); a transformation decoder p_φ(t | z̃, z) and a label decoder p_φ(y | z) (predicting horse, grass, tree, ...) are trained on top of the shared representations.]

Qi, Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations, in preprint.

SLIDE 32

Training SAT

  • Seek to maximize the mutual information between (y, z) and (z̃, t):  max_θ I_θ(y, z; z̃, t)
  • Assumption: (z̃, t) contains sufficient information to decode both the label y and the representation z of the transformed image
  • Intractable due to the difficulty of evaluating the posterior q_θ(y, t | z̃, z)

SLIDE 33
  • Introducing a label decoder and a transformation decoder gives the variational bound

I_θ(y, z; z̃, t) ≥ 𝔼_{q_θ(y, z̃, z)} log p_φ(y | z̃, z) + 𝔼_{q_θ(t, z̃, z)} log p_φ(t | z̃, z)

  • Jointly maximize over the encoder θ and the decoders φ:

max_{θ,φ}  𝔼_{q_θ(y, z̃, z)} log p_φ(y | z̃, z) + 𝔼_{q_θ(t, z̃, z)} log p_φ(t | z̃, z)
(the first term is the label decoder, the second the transformation decoder)

SLIDE 34

A Simple Case: Deterministic SAT

  • Replace the probabilistic encoder and decoders with deterministic ones
  • i.e., a supervised AET

[Figure: deterministic SAT: the encoder F_θ produces z̃ = F_θ(x) and z = F_θ(t(x)); a label decoder D_φ(z) predicts the class (horse, grass, tree, ...) and a transformation decoder E_φ(z̃, z) predicts t.]

max_{θ,φ}  𝔼_{(x,y)} ℓ_CrossEntropy(y, D_φ(z)) + 𝔼_{(t,x)} ℓ_AET(t, E_φ(z̃, z))

SLIDE 35

Experiments on SAT

  • (Semi-)supervised learning: SAT
  • Evaluation Protocols
  • Follow semi-supervised protocols with varying number of labels
  • Datasets
  • CIFAR10
  • SVHN

[Figure: the stack of Autoencoding Transformations for learning TER (deterministic and probabilistic SAT highlighted).]

SLIDE 36

Error Rate on CIFAR10

Methods               1000 labels     2000 labels     4000 labels     All labels
GAN                   –               –               18.63 ± 2.32    –
Π model               –               –               12.36 ± 0.31    5.56 ± 0.10
Temporal Ensembling   –               –               12.16 ± 0.31    5.60 ± 0.10
VAT                   –               –               10.55           –
Supervised-only       46.43 ± 1.21    33.94 ± 0.73    20.66 ± 0.57    5.81 ± 0.15
Π model               27.36 ± 1.20    18.02 ± 0.60    13.20 ± 0.27    6.06 ± 0.11
Mean Teacher          21.55 ± 1.48    15.73 ± 0.31    12.31 ± 0.28    5.94 ± 0.15
SAT                   14.89 ± 0.38    11.71 ± 0.29    9.85 ± 0.11     4.91 ± 0.13

SLIDE 37

Error Rate on SVHN

Methods               250 labels      500 labels      1000 labels     All labels
GAN                   –               –               18.63 ± 2.32    –
Π model               –               –               12.36 ± 0.31    5.56 ± 0.10
Temporal Ensembling   –               –               12.16 ± 0.31    5.60 ± 0.10
VAT                   –               –               10.55           –
Supervised-only       46.43 ± 1.21    33.94 ± 0.73    20.66 ± 0.57    5.81 ± 0.15
Π model               27.36 ± 1.20    18.02 ± 0.60    13.20 ± 0.27    6.06 ± 0.11
Mean Teacher          21.55 ± 1.48    15.73 ± 0.31    12.31 ± 0.28    5.94 ± 0.15
SAT                   14.89 ± 0.38    11.71 ± 0.29    9.85 ± 0.11     4.91 ± 0.13

SLIDE 38

Contents

  • TER: Transformation Equivariant Representations
  • Definition, Steerability
  • AET: AutoEncoding Transformations
  • Deterministic approach: AET (AutoEncoding Transformations)
  • Probabilistic approach: AVT (Autoencoding Variational Transformations)
  • SAT: (Semi-)supervised Autoencoding Transformations
  • Conclusions and Future Work

SLIDE 39

Future Work

  • Evolve representation learning jointly with various high-level tasks
  • More powerful capture of (non-)linear equivariant and invariant visual structures under transformations

[Figure: representation learning as the foundation (the stack of Autoencoding Transformations for TER) supporting high-level tasks: classification, detection, segmentation, face recognition, pose estimation, super-resolution.]

SLIDE 40

Future Work: Unifying Transformation Equivariance vs. Invariance

[Figure: embeddings of transformed images are pushed apart ("equivariance").]

Transformation equivariance discerns the differences between transformed images, and is thus more advantageous for modeling spatial structures in object detection and semantic segmentation.

SLIDE 41

Future Work: Unifying Transformation Equivariance vs. Invariance (cont'd)

[Figure: embeddings of transformed images are pulled together ("invariance").]

Transformation invariance (e.g., in contrastive learning) pulls together the embeddings of transformed images while discriminating between different instances, and is thus more suitable for image classification tasks.

SLIDE 42

Future Work: Unifying Transformation Equivariance vs. Invariance (cont'd)

  • A naive approach (yet to be done): combine
  • the AET loss for transformation equivariance
  • a contrastive loss for transformation invariance (see the sketch below)
  • Open questions
  • How can transformation equivariance and invariance be reconciled in learning feature embeddings?
  • Will the combined approach excel on both spatially sensitive tasks (object detection, semantic segmentation) and image classification tasks?
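A hedged sketch of this naive combination, an AET regression term for equivariance plus an InfoNCE-style contrastive term for invariance (modules, temperature, and weighting are illustrative assumptions, not a published recipe):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch: AET loss (equivariance) + InfoNCE contrastive loss (invariance).
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
aet_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 6))

def combined_loss(x, tx, theta, tau=0.2, lam=1.0):
    z_orig, z_trans = encoder(x), encoder(tx)
    # Equivariance: regress the transformation parameters from the feature pair.
    aet = ((aet_head(torch.cat([z_orig, z_trans], dim=1)) - theta) ** 2).sum(dim=1).mean()
    # Invariance: InfoNCE pulls each (x, t(x)) pair together and pushes
    # different instances in the batch apart.
    z_orig, z_trans = F.normalize(z_orig, dim=1), F.normalize(z_trans, dim=1)
    logits = z_orig @ z_trans.t() / tau            # pairwise similarities
    targets = torch.arange(z_orig.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets) + lam * aet

x, tx, theta = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32), torch.randn(8, 6)
loss = combined_loss(x, tx, theta)
```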

Guo-Jun Qi and Jiebo Luo, Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods, arXiv:1903.11260

SLIDE 43

Conclusions

  • Transformation Equivariant Representations (TER)
  • play a key role in the success of CNNs
  • generalize to arbitrary transformations: continuous, discrete, and beyond spatial
  • nonlinear, steerable representations of transformations
  • Unsupervised learning: AutoEncoding Transformations (AET and AVT)
  • (Semi-)supervised learning: SAT, generalizing CNNs beyond translations
  • Transformation-equivariant representations + transformation-invariant classifier
  • with invariance contained in equivariance as a special case

SLIDE 44

References

Preprint

  • Qi, Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations, preprint. (long version)

Conference Publications

  • Zhang et al., AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data, in CVPR 2019. (AET)
  • Qi et al., AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations, in ICCV 2019. (AVT)

SLIDE 45

Thank you! Q&A

SLIDE 46

Recent Progress

Unsupervised Learning of Transformation Equivariant Representations via Auto-Encoding Node-wise Transformations

SLIDE 47

Auto-Encoding Node-wise Transformations

  • Apply transformations to individual nodes of a graph
  • Apply transformations to sampled nodes
  • Study different parts of a graph each time
  • Reduce the transformation parameters
  • Global vs. local transformations
  • Isotropic vs. anisotropic transformations

SLIDE 48

Auto-Encoding Node-wise Transformations

  • Decode the transformations at each node
  • Learn the representation of individual nodes at both local and global scales
  • Global structures are learned by sampling nodes from different parts of the graph
  • Local structures are covered by integrating information through edge convolutions on nearby nodes (see the sketch below)
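A tiny NumPy sketch of the data side of this idea, sampling nodes of a point cloud and applying node-wise translations whose parameters become the decoding targets (sizes and ranges are illustrative assumptions):

```python
import numpy as np

# Hedged sketch: build targets for auto-encoding node-wise transformations.
rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))                   # 3D point cloud (graph nodes)

sampled = rng.choice(1024, size=256, replace=False)   # study one part of the graph
offsets = np.zeros_like(points)
offsets[sampled] = rng.uniform(-0.1, 0.1, size=(256, 3))   # node-wise translations
transformed = points + offsets                        # transformed point cloud

# The model described above would encode (points, transformed) node by node
# and decode the per-node transformation parameters below.
targets = offsets[sampled]
```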

SLIDE 49

Results

  • Datasets: 3D point-cloud data
  • ModelNet40 (graph classification): 12,311 meshed CAD models, 40 categories
  • ShapeNet part (graph segmentation): 16,881 3D point clouds, 16 categories
  • Two tasks: graph classification and graph segmentation

SLIDE 50

Results

  • Classification

  • Segmentation

SLIDE 51

Results

  • Compared with both supervised and unsupervised graph segmentation methods, our approach is highly competitive.

SLIDE 52

AETv2: AutoEncoding Transformations by Minimizing Geodesic Distances in Lie Groups of Transformations


Recent Progress

SLIDE 53

Mean-Squared Estimation vs. Geodesic Distance Minimization

  • (Matrix representations of) transformations lie on a curved manifold, a Lie group, rather than in a flat Euclidean space
  • MSE minimizes the Euclidean distance between the predicted and ground-truth transformations
  • It does not capture how one transformation continuously evolves into another within the Lie group
  • Geodesic distance minimization (GDM) measures a "real" distance between transformations

SLIDE 54

Geodesic distance between Transformations

  • Given two transformations t and t̂ (in matrix form), the geodesic distance between them is

ℓ(t, t̂) = (1/2) ‖Log_I(t⁻¹ t̂)‖²_F

where Log_I is the Riemannian logarithm at the identity I, not the matrix logarithm.

  • The Riemannian exponential at I maps a point of the tangent space onto the Lie group
  • The Riemannian logarithm at I maps a group element back to the tangent space

SLIDE 55

Homography Transformations

  • For many spatial transformations, there is no closed-form Riemannian logarithm
  • Homography transformations achieved the SOTA accuracy in AETv1
  • Solution
  • use a subgroup of transformations with a tractable Riemannian logarithm (which reduces to the matrix logarithm)
  • project the transformations onto the subgroup
  • minimize the resulting geodesic distance in the subgroup (together with the projection loss)

SLIDE 56

Projection into SO(3) subgroup

  • The Riemannian logarithm reduces to the matrix logarithm in SO(3)
  • Project t⁻¹ t̂ onto SO(3):  R = Π_SO(3)(t⁻¹ t̂)
  • By Rodrigues' rotation formula, α = arccos[(tr(R) − 1) / 2] is the rotation angle around a unit 3D axis in the direction of log(R)
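A small NumPy sketch of this projection and angle computation (the SVD-based nearest-rotation projection is one standard choice and an assumption here, not necessarily the paper's exact procedure):

```python
import numpy as np

# Hedged sketch: project a matrix onto SO(3) and read off the rotation angle.
def project_to_so3(A):
    U, _, Vt = np.linalg.svd(A)
    R = U @ Vt
    if np.linalg.det(R) < 0:        # enforce det(R) = +1 (a proper rotation)
        U[:, -1] *= -1
        R = U @ Vt
    return R

def rotation_angle(R):
    # Rodrigues: alpha = arccos((tr(R) - 1) / 2), clipped for numerical safety.
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

A = np.eye(3) + 0.1 * np.random.default_rng(0).normal(size=(3, 3))   # e.g. a t^-1 t_hat stand-in
R = project_to_so3(A)                      # Pi_SO(3)(A)
alpha = rotation_angle(R)                  # geodesic (rotation-angle) term
residual = np.linalg.norm(A - R, 'fro')    # projection residual ||A - R||_F
```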

SLIDE 57

Objective in SO(3) subgroup

  • Given two transformations t and t̂ (in matrix form),

ℓ(t, t̂) = α + λ ‖Δ_Π‖²_F

where α is the rotation angle of the SO(3) projection, λ is a weighting coefficient, and Δ_Π = t⁻¹ t̂ − Π_SO(3)(t⁻¹ t̂) is the projection residual.

SLIDE 58

ImageNet Results

  • Better than AETv1 (MSE)

Laboratory for MAchine Perception and LEarning (MAPLE) 58