Self-supervised learning in computer vision: Ishan Misra, Facebook AI (PowerPoint presentation)

SLIDE 1

Self-supervised learning in computer vision

Ishan Misra, Facebook AI Research

With slides from Andrew Zisserman and Carl Doersch

SLIDE 2

Success story of supervision: Pre-training

[Diagram: images from ImageNet (pre-training; labels such as husky, terrier, tench, ...) are used to train a ConvNet that learns a representation]

  • Features from networks pre-trained on ImageNet can be used for a variety of different downstream tasks

SLIDE 3

Success story of supervision: Recipe for good solutions

  • Pre-train on a large supervised dataset
    – Collect a dataset of "supervised" images
    – Train a ConvNet

SLIDE 4

The promise of "alternative" supervision

  • Getting "real" labels is difficult and expensive
    – ImageNet, with 14M images, took 22 human years to label
  • Obtain labels using a "semi-automatic" process
    – Hashtags
    – GPS locations
    – Using the data itself: "self"-supervised

SLIDE 5

Can we get labels for all data?

SLIDE 6

Can we get labels for all data?

[Chart: number of images labelled with bounding boxes, on a linear axis from 0 to 1E+06. Stats from Pawan Kumar at Oxford.]

Example image annotated with: dog, chair, pizza, donut

SLIDE 7

Can we get labels for all data?

[Chart: bounding-box vs. image-level labels, on a linear axis from 0 to 1.4E+07]

Example image annotated with: dog, chair, pizza, donut

SLIDE 8

Can we get labels for all data?

[Chart: bounding-box labels vs. image-level labels vs. Internet photos, on a linear axis from 0 to 1.4E+12]

Source: forbes.com, https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/

SLIDE 9

Can we get labels for all data?

[Chart: bounding-box labels, image-level labels and Internet photos, on a log-scale axis from 1E+00 to 1E+12]

SLIDE 10

Can we get labels for all data?

[Chart: bounding-box labels, image-level labels, Internet photos and the real world, on a log-scale axis from 1E+00 to 1E+12]

ImageNet (14 million images) needed 22 human years to label

SLIDE 11

Can we get labels for all data?

  • What about complex concepts?
  • Video?
  • Labelling cannot scale to the size of the data we generate

SLIDE 12

Rare concepts?

10% of the classes account for 93% of the data

Slide credit: Rob Fergus

SLIDE 13

Different domains?

ImageNet pre-training may not transfer well to very different domains

SLIDE 14

What is "self" supervision?

  • Obtain "labels" from the data itself by using a "semi-automatic" process
  • Predict part of the data from other parts

[Diagram: observed data, hidden data, hidden property of the data]

SLIDE 15

What is "self" supervision?

Image: Virginia de Sa, 1994, Learning classification with unlabeled data

SLIDE 16

Word2vec

  • Fill in the blanks

word2vec: Mikolov et al. Image by Julian Gilyadov
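Word2vec's CBOW variant shows how "fill in the blanks" becomes a prediction problem without human labels. The model sees the surrounding words and must predict the missing centre word. The sketch below only builds those (context, target) training pairs and omits the model itself. The helper name `cbow_pairs` and the window size are illustrative assumptions.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, blanked-out centre word) pairs from raw text."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context = up to `window` words on each side, excluding the target itself.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


for context, target in cbow_pairs("the quick brown fox jumps".split(), window=1):
    print(context, "->", target)
```

The "label" for each pair comes from the text itself, which is what makes the task self-supervised.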

SLIDE 17

Success of self-supervised learning in NLP

  • Filling in the blanks is a powerful signal for learning representations
  • Sentence/word representations: BERT (Devlin et al., 2018)

SLIDE 18

Why self-supervision?

  • Helps us learn using observations and interactions
  • Does not require exhaustive annotation of concepts
  • Leverages multiple modalities or structure in the domain

SLIDE 19

In the context of Computer Vision

SLIDE 20

Pretext task

  • A self-supervised task used for learning representations
  • Often not the "real" task we care about (such as image classification)

[Diagram: observed data, hidden data, hidden property of the data]

Pretext task: Doersch et al., 2015, Unsupervised visual representation learning by context prediction

SLIDE 21

Pretext task

  • Using images
  • Using video
  • Using video and sound

[Diagram: observed data, hidden data, hidden property of the data]

SLIDE 22

Pretext task

  • Using images
  • Using video
  • Using video and sound

SLIDE 23

Relative position of patches

Doersch et al., 2015, Unsupervised visual representation learning by context prediction
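The relative-position task samples a patch and one of its eight neighbours. The network must say which of the 8 positions the neighbour came from. The sketch below generates one such training example. Several details are assumptions: the toy image format (a 2-D list of pixels), the patch size and the gap value. The paper also adds random jitter to the patch positions, which is omitted here.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: a 2-D list of pixel values

# The 8 neighbour positions around the centre patch, in a fixed order.
# The index into this list is the class label the network must predict.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    return [row[left:left + size] for row in img[top:top + size]]


def relative_position_example(img: Image, size: int, gap: int,
                              rng: random.Random) -> Tuple[Image, Image, int]:
    """Return (centre patch, neighbour patch, label in 0..7)."""
    stride = size + gap  # the gap makes trivial edge-matching shortcuts harder
    h, w = len(img), len(img[0])
    # Place the centre patch so that all 8 neighbours fit inside the image.
    top = rng.randint(stride, h - stride - size)
    left = rng.randint(stride, w - stride - size)
    label = rng.randrange(len(OFFSETS))
    dy, dx = OFFSETS[label]
    centre = crop(img, top, left, size)
    neighbour = crop(img, top + dy * stride, left + dx * stride, size)
    return centre, neighbour, label


img = [[r * 12 + c for c in range(12)] for r in range(12)]
centre, neighbour, label = relative_position_example(img, size=3, gap=1, rng=random.Random(0))
```

The image must be at least `2 * (size + gap) + size` pixels on each side for a valid centre position to exist.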

SLIDE 24

Relative position: nearest neighbors in feature space

Doersch et al., 2015, Unsupervised visual representation learning by context prediction

SLIDE 25

Predicting rotations

Gidaris et al., 2018, Unsupervised representation learning by predicting image rotations

Rotations: 0°, 90°, 180°, 270°
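The rotation pretext task needs no annotation. Each image is rotated by 0°, 90°, 180° or 270°, and the network predicts which rotation was applied (a 4-way label). The sketch below emits all four rotated copies of one image together with their labels. Putting all four copies in the same batch is an assumed batching choice, and images are toy 2-D lists.

```python
from typing import List, Tuple

Image = List[List[int]]


def rot90(img: Image) -> Image:
    """Rotate a 2-D image by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order gives a CCW rotation.
    return [list(row) for row in zip(*img)][::-1]


def rotation_examples(img: Image) -> List[Tuple[Image, int]]:
    """Return the four rotated copies of `img` with labels 0..3 (0, 90, 180, 270 degrees)."""
    examples = []
    current = img
    for label in range(4):
        examples.append((current, label))
        current = rot90(current)
    return examples


for rotated, label in rotation_examples([[1, 2], [3, 4]]):
    print(label, rotated)
```

The direction of rotation does not matter for the task, as long as it is used consistently.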

SLIDE 26

Colorization

Zhang, Isola and Efros, 2016, Colorful image colorization

SLIDE 27

Fill in the blanks

Pathak et al., 2016, Context encoders: Feature learning by inpainting
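For images, "fill in the blanks" can be set up by hiding a region and asking a network to reconstruct it from the surrounding context. The sketch below masks a central square and computes a mean-squared reconstruction loss over the hole. The square-hole mask, the fill value and the helper names are assumptions made for illustration. Context encoders also combine the reconstruction loss with an adversarial loss, which is omitted here.

```python
from typing import List, Tuple

Image = List[List[float]]


def mask_centre(img: Image, hole: int, fill: float = 0.0) -> Tuple[Image, Image]:
    """Blank out a central hole x hole square.

    Returns (masked input, target region). A network trained on this pair
    must "fill in the blank" using only the surrounding context.
    """
    h, w = len(img), len(img[0])
    top, left = (h - hole) // 2, (w - hole) // 2
    target = [row[left:left + hole] for row in img[top:top + hole]]
    masked = [list(row) for row in img]  # copy so the original is untouched
    for r in range(top, top + hole):
        for c in range(left, left + hole):
            masked[r][c] = fill
    return masked, target


def l2_reconstruction_loss(pred: Image, target: Image) -> float:
    """Mean squared error between predicted and true pixels of the hole."""
    n = sum(len(row) for row in target)
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n
```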

SLIDE 28

Self-supervision in computer vision

  • Using images
  • Using video
  • Using video and sound

SLIDE 29

Video

  • Video is a "sequence" of frames
  • How do we get "self-supervision"?
    – Predict the order of frames
    – Fill in the blanks
    – Track objects and predict their position

SLIDE 30

Shuffle & Learn

Misra et al., 2016, Shuffle and Learn: Unsupervised learning using temporal order verification

SLIDE 31

Shuffle & Learn

SLIDE 32

Shuffle & Learn
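Shuffle & Learn turns temporal order into a binary label. A tuple of three frames is either in the correct order (positive) or has its middle frame replaced by one from outside the interval (negative). The sketch below samples frame indices for such tuples. The helper name and the 50/50 class balance are assumptions. The paper also samples frames from high-motion windows, which is omitted here.

```python
import random
from typing import List, Tuple


def shuffle_and_learn_tuple(num_frames: int, rng: random.Random) -> Tuple[List[int], int]:
    """Sample a 3-frame tuple of frame indices and a binary label.

    With five sorted frames a < b < c < d < e:
      label 1: (b, c, d), the frames are in the correct temporal order
      label 0: (b, a, d) or (b, e, d), the middle frame breaks the order
    """
    a, b, c, d, e = sorted(rng.sample(range(num_frames), 5))
    if rng.random() < 0.5:
        return [b, c, d], 1
    middle = rng.choice([a, e])
    return [b, middle, d], 0


frames, label = shuffle_and_learn_tuple(num_frames=50, rng=random.Random(0))
```

A network that solves this verification task has to reason about how poses and objects change over time, without any manual labels.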

SLIDE 33

SLIDE 34

Shuffle & Learn

Fine-tune on human keypoint estimation

SLIDE 35

Shuffle & Learn

Fine-tune on human keypoint estimation

Initialization (AlexNet)               FLIC keypoints AUC   MPII keypoints AUC
ImageNet supervised                    51.3                 47.2
Shuffle and Learn (self-supervised)    49.6                 47.6

SLIDE 36

Odd-one-out networks

Fernando et al., 2017, Self-supervised video representation learning with odd-one-out networks
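Odd-one-out learning asks the network to find the single element of a set whose frames are out of order, while all other elements keep their natural temporal order. The sketch below builds one such question from frame indices of a single video. The set sizes and the helper name are illustrative assumptions.

```python
import random
from typing import List, Tuple


def odd_one_out_question(num_frames: int, num_sets: int, set_len: int,
                         rng: random.Random) -> Tuple[List[List[int]], int]:
    """Build one odd-one-out question from a video with `num_frames` frames.

    Each element is a list of frame indices (set_len >= 2). All elements keep
    the natural temporal order except one, whose frames are shuffled. The
    label is the position of that odd element.
    """
    sets = [sorted(rng.sample(range(num_frames), set_len)) for _ in range(num_sets)]
    odd = rng.randrange(num_sets)
    shuffled = list(sets[odd])
    while shuffled == sets[odd]:  # make sure the order really changed
        rng.shuffle(shuffled)
    sets[odd] = shuffled
    return sets, odd


question, answer = odd_one_out_question(30, num_sets=6, set_len=4, rng=random.Random(0))
```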

SLIDE 37

Self-supervision in computer vision

  • Using images
  • Using video
  • Using video and sound

SLIDE 38

Audio-visual co-supervision

Arandjelović and Zisserman, 2017, “Objects that Sound”
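Audio and video give supervision for free. A video frame and an audio clip taken from the same video should "correspond", while a frame paired with audio from a different video should not. The sketch below samples such binary-labelled pairs using video IDs only. The 50/50 class balance and the helper name are assumptions. In practice the frame and audio would also be taken from aligned times, and a network would then be trained to predict the label.

```python
import random
from typing import List, Tuple


def correspondence_pairs(videos: List[str], rng: random.Random,
                         n: int) -> List[Tuple[str, str, int]]:
    """Sample (video used for the frame, video used for the audio, label)."""
    pairs = []
    for _ in range(n):
        v = rng.choice(videos)
        if rng.random() < 0.5:
            pairs.append((v, v, 1))  # frame and audio from the same video: they correspond
        else:
            other = rng.choice([u for u in videos if u != v])
            pairs.append((v, other, 0))  # audio from a different video: no correspondence
    return pairs


pairs = correspondence_pairs(["vid_a", "vid_b", "vid_c"], random.Random(1), n=10)
```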

SLIDE 39

Objects that Sound

SLIDE 40

Objects that Sound

SLIDE 41

Objects that Sound

What can be learnt?

  • Good representations
    – Visual features
    – Audio features
  • Intra- and cross-modal retrieval
    – Aligned audio and visual embeddings
  • “What is making the sound?”
    – Learn to localize objects that sound

SLIDE 42

Objects that Sound

SLIDE 43

Understanding what the "pretext" task learns

SLIDE 44

Are they complementary?

Initialization (ResNet101)                       ImageNet top-5 accuracy   VOC07 detection mAP
Relative Position                                59.2                      66.8
Colorization                                     62.5                      65.5
Relative Position + Colorization (multi-task)    66.6                      68.8

Doersch & Zisserman, 2017, Multi-task self-supervised visual learning

SLIDE 45

Information predicted: varies across tasks

[Diagram: pretext tasks arranged on a scale from less to more information predicted]

SLIDE 46

[Figure: a taxonomy of self-supervised methods: pretext tasks, generative methods, and contrastive/clustering methods (related vs. unrelated samples)]

  • Standard pretext learning: apply transform t to image I to get I^t; a ConvNet representation of I^t predicts a property of t
  • Pretext-invariant representation learning: ConvNet representations of I and I^t are encouraged to be similar
  • Predict more information: AutoEncoder, VAE, GAN, BiGAN

SLIDE 47

Scaling self-supervised learning

Jigsaw puzzles (Noroozi & Favaro, 2016)

Goyal et al., 2019, Scaling and benchmarking self-supervised visual representation learning
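In the Jigsaw pretext task the image is cut into a 3x3 grid of tiles. The tiles are shuffled with a permutation drawn from a fixed set, and the network predicts the index of that permutation. The sketch below builds one example. One detail is a simplification: the permutation set here is just k distinct random permutations, whereas Noroozi & Favaro choose permutations that are far apart in Hamming distance. The set size and helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]
Tile = List[List[int]]


def make_tiles(img: Image, grid: int = 3) -> List[Tile]:
    """Cut the image into grid x grid tiles, listed row by row."""
    h, w = len(img) // grid, len(img[0]) // grid
    return [[row[c * w:(c + 1) * w] for row in img[r * h:(r + 1) * h]]
            for r in range(grid) for c in range(grid)]


def permutation_set(k: int, rng: random.Random, n: int = 9) -> List[Tuple[int, ...]]:
    """A fixed set of k distinct tile permutations; the index is the class label."""
    perms = set()
    while len(perms) < k:
        p = list(range(n))
        rng.shuffle(p)
        perms.add(tuple(p))
    return sorted(perms)  # a fixed, reproducible order for the labels


def jigsaw_example(img: Image, perms: Sequence[Tuple[int, ...]],
                   rng: random.Random) -> Tuple[List[Tile], int]:
    """Shuffle the tiles with a randomly chosen permutation from the set."""
    tiles = make_tiles(img)
    label = rng.randrange(len(perms))
    return [tiles[i] for i in perms[label]], label


perms = permutation_set(10, random.Random(0))
img = [[r * 6 + c for c in range(6)] for r in range(6)]
shuffled, label = jigsaw_example(img, perms, random.Random(1))
```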

SLIDE 48

Evaluating the representation

[Diagram: a pre-trained ConvNet is used to extract "fixed" features]

SLIDE 49

Evaluating the representation

  • Train a linear SVM on fixed feature representations
  • Use the VOC07 image classification task

SLIDE 50

Increasing amount of information predicted

Linear classifier on VOC07 (mAP = mean Average Precision; higher is better)
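mAP averages a per-class Average Precision (AP) over all classes. AP summarizes how well the positives for one class are ranked by the classifier's scores. The sketch below computes the non-interpolated AP: the mean of the precision at the rank of each positive. Note that the official VOC2007 protocol uses 11-point interpolated AP, so its numbers can differ slightly from this simplified version.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class_scores: List[List[float]],
                           per_class_labels: List[List[int]]) -> float:
    """Average the per-class APs (one binary classifier per class)."""
    aps = [average_precision(s, l) for s, l in zip(per_class_scores, per_class_labels)]
    return sum(aps) / len(aps)


# Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```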

SLIDE 51

Surface normal estimation

  • Predict surface normals on NYU-v2
  • Same optimization parameters for all methods (including supervised)
  • PSPNet architecture
  • Train last few layers only (res5 onwards)

[Figure: input image and output surface normals; image from the NYU dataset]

SLIDE 52

Surface normal estimation

Initialization         Median error (lower is better)   % correct within 11.25° (higher is better)
ImageNet supervised    17.1                             36.1
Jigsaw Flickr 100M     13.1                             44.6

Jigsaw pre-training on Flickr 100M outperforms ImageNet supervised pre-training.
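Both surface-normal metrics in the table come from the per-pixel angle between the predicted and ground-truth normal vectors. The sketch below computes the median angular error and the percentage of pixels within 11.25°. It treats pixels as a flat list of 3-D vectors. Using a strict "<" at the threshold is an assumption, as is the helper naming.

```python
import math
import statistics
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle between two normal vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp against float round-off
    return math.degrees(math.acos(cos))


def normal_metrics(pred: List[Vec3], gt: List[Vec3],
                   threshold: float = 11.25) -> Tuple[float, float]:
    """Return (median angular error, % of pixels with error below threshold)."""
    errors = [angle_deg(p, g) for p, g in zip(pred, gt)]
    median = statistics.median(errors)
    within = 100.0 * sum(e < threshold for e in errors) / len(errors)
    return median, within
```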

SLIDE 53

What is missing from "pretext" tasks? Or, more generally, from "proxy" tasks?

SLIDE 54

Pretext tasks

Jigsaw puzzles (Noroozi & Favaro, 2016)

Rotation (Gidaris et al., 2018)

SLIDE 55

The hope of generalization

  • We really hope that the pre-training task and the transfer task are "aligned"

[Diagram: self-supervised pre-training, then transfer tasks]

SLIDE 56

The hope of generalization

  • We really hope that the pre-training task and the transfer task are "aligned"

[Diagram: weak or self-supervised pre-training (e.g. an image with hashtags #sun #nofilter #fun #tree #aruba), then transfer tasks]

Why should solving Jigsaw puzzles teach about "semantics"? Why should performing a non-semantic task produce good features?

SLIDE 57

The hope of generalization ... ?

[Diagram: a ConvNet is pre-trained with Jigsaw on the pre-training data (weak or self-supervised); for transfer, linear classifiers are trained on its "fixed" features]

SLIDE 58

Higher layers do not generalize ...

[Chart: linear classifier on VOC07 trained on Jigsaw features from layers conv1 to res5; mAP = mean Average Precision (higher is better)]

SLIDE 59

Pretext-Invariant Representation Learning (PIRL)

Ishan Misra, Laurens van der Maaten

SLIDE 60

Pretext task

[Diagram: transform t is applied to image I to give I^t; a ConvNet on I^t predicts a property of t]

  • The features must contain information about transform t
  • Result: less semantic features

SLIDE 61

Underlying principle for pretext tasks

[Diagram: standard pretext learning. Transform t is applied to image I to give I^t; a ConvNet maps I^t to a representation that predicts a property of t]

  • Apply a known image transform t
  • Construct a task to predict t from the transformed image I^t
  • Final-layer representations must carry information about t
  • Representations "covary" with t

SLIDE 62

How important has invariance been?

  • Hand-crafted features like SIFT and HOG
    – SIFT: Scale Invariant Feature Transform
  • Supervised systems are trained to be invariant to "data augmentation"

SLIDE 63

Pretext-Invariant Representation Learning (PIRL)

  • Be invariant to t

[Diagram: standard pretext learning (a ConvNet on I^t predicts a property of t) vs. pretext-invariant representation learning (ConvNet representations of I and I^t are encouraged to be similar)]

SLIDE 64

Pretext-Invariant Representation Learning (PIRL)

  • Be invariant to t
  • Representation contains no information about t

[Diagram: standard pretext learning (a ConvNet on I^t predicts a property of t) vs. pretext-invariant representation learning (ConvNet representations of I and I^t are encouraged to be similar)]

SLIDE 65

PIRL

[Diagram: transform t is applied to image I to give I^t; I and I^t each pass through a ConvNet, and the two representations are encouraged to be similar]

  • Representations from I and I^t should be similar
  • t = pretext transforms (Jigsaw, Rotation, combinations, etc.)
  • Use a contrastive loss to enforce similarity of features: $L_{\text{contrastive}}(v_I, v_{I^t})$
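The slide leaves $L_{\text{contrastive}}(v_I, v_{I^t})$ abstract. PIRL itself uses a noise-contrastive estimation (NCE) loss. The sketch below swaps in the closely related softmax ("InfoNCE-style") form: $-\log \frac{\exp(s(v_I, v_{I^t})/\tau)}{\exp(s(v_I, v_{I^t})/\tau) + \sum_n \exp(s(v_I, v_n)/\tau)}$. Here the $v_n$ are features of other, unrelated images. Cosine similarity for $s$, the temperature $\tau = 0.07$ and the helper names are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(v_i: Sequence[float], v_it: Sequence[float],
                     negatives: List[Sequence[float]], tau: float = 0.07) -> float:
    """InfoNCE-style loss: pull v_i toward v_it, push it away from the negatives."""
    logits = [cosine(v_i, v_it) / tau] + [cosine(v_i, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]  # = -log(softmax probability of the positive)


good = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])  # I and I^t agree
bad = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])   # I^t looks like a negative
```

The loss is small when the representation of I^t is closer to that of I than any negative is, which is the invariance the slide asks for.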

SLIDE 66

Contrastive learning

Groups of related and unrelated images

slide-67
SLIDE 67

Contrastive Learning

Groups of related and unrelated images pass through a shared network (Siamese net) to produce image features (embeddings)

slide-68
SLIDE 68

Contrastive Learning

Related and unrelated images pass through a shared network (Siamese net) to produce image features (embeddings)

Loss Function

Embeddings from related images should be closer than embeddings from unrelated images

Hadsell et al., 2005, DrLIM

Figure: d(image, related image) < d(image, unrelated image)
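The DrLIM idea can be written as a pairwise margin loss: pull related pairs together, and push unrelated pairs apart until they are at least a margin away. Below is a minimal pure-Python sketch in the form usually given for Hadsell et al. (2005); the margin value and the function names are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drlim_loss(z1, z2, related, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    related=True : penalize any distance (pull the pair together).
    related=False: penalize only distances below the margin (push apart).
    """
    d = euclidean(z1, z2)
    if related:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


anchor = [0.0, 0.0]
print(drlim_loss(anchor, [0.1, 0.0], related=True))   # related and close: small loss
print(drlim_loss(anchor, [0.5, 0.0], related=False))  # unrelated but inside the margin
print(drlim_loss(anchor, [2.0, 0.0], related=False))  # unrelated and far away: zero loss
```

Unrelated pairs beyond the margin contribute nothing, so the loss only spends effort on negatives that are still too close.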

slide-69
SLIDE 69

Contrastive Loss Function

Loss Function

Embeddings from related images should be closer than embeddings from unrelated images

Hadsell et al., 2005, DrLIM

Figure: d(image, positive) < d(image, negative), where positives are related images and negatives are unrelated images

Good negatives are very important in contrastive learning
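One widely used way to score a positive against several negatives at once is an InfoNCE-style softmax loss (as used in CPC); it is swapped in here for illustration rather than the pairwise margin form of DrLIM. The sketch below uses made-up 2-D embeddings and an assumed temperature of 0.1 to show why good (hard) negatives matter.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log of the softmax score of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


anchor, positive = [1.0, 0.0], [0.9, 0.1]
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]  # already far from the anchor
hard_negatives = [[0.8, 0.3], [0.9, -0.2]]   # close to the anchor

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, hard_negatives)
print(f"easy negatives: {loss_easy:.4f}, hard negatives: {loss_hard:.4f}")
```

With easy negatives the loss is already near zero, so they give almost no training signal; hard negatives keep the loss, and its gradient, large.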

slide-70
SLIDE 70

Contrastive learning -- what does it do?

Figure: a positive sample and negative samples in the feature space

slide-71
SLIDE 71

How does this relate to "pretext" tasks?

slide-72
SLIDE 72


PIRL - How it works

Figure: PIRL architecture. An image I and its transformed version I^t pass through the same ConvNet (shared parameters θ, res5 features), giving v_I and v_{I^t}; heads produce f(v_I) and g(v_{I^t}). A memory bank M stores a representation m_I for each image. Both f(v_I) and g(v_{I^t}) should be similar to m_I and dissimilar to m_{I'} for other images I' (unrelated, negatives).
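The loss in this figure can be sketched as follows, assuming the formulation from the PIRL paper: a convex combination λ·L(m_I, g(v_{I^t})) + (1 - λ)·L(m_I, f(v_I)), where m_I is an exponential moving average of f(v_I) kept in the memory bank and negatives m_{I'} are drawn from the same bank. To stay short and dependency-free, the sketch uses an InfoNCE-style softmax in place of the paper's NCE estimator; the class name, λ = 0.5, the momentum, the temperature and the number of negatives are illustrative assumptions.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrast(query, positive_key, negative_keys, tau):
    """InfoNCE-style stand-in for the NCE term: -log p(positive | query)."""
    q = normalize(query)
    logits = [dot(q, positive_key) / tau] + [dot(q, k) / tau for k in negative_keys]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]


class MemoryBank:
    """Hypothetical memory bank holding one unit-norm vector m_I per image."""

    def __init__(self, num_images, dim, momentum=0.5, seed=0):
        rng = random.Random(seed)
        self.momentum = momentum
        self.m = [normalize([rng.gauss(0, 1) for _ in range(dim)])
                  for _ in range(num_images)]

    def update(self, idx, f_vI):
        """Exponential moving average of f(v_I), renormalized to unit length."""
        f = normalize(f_vI)
        mixed = [self.momentum * old + (1 - self.momentum) * new
                 for old, new in zip(self.m[idx], f)]
        self.m[idx] = normalize(mixed)

    def sample_negatives(self, idx, k, rng):
        """Memory entries m_I' of k other images I' != I."""
        others = [j for j in range(len(self.m)) if j != idx]
        return [self.m[j] for j in rng.sample(others, k)]


def pirl_loss(f_vI, g_vIt, bank, idx, rng, num_neg=4, lam=0.5, tau=0.07):
    m_I = bank.m[idx]
    negatives = bank.sample_negatives(idx, num_neg, rng)
    return (lam * contrast(g_vIt, m_I, negatives, tau)
            + (1 - lam) * contrast(f_vI, m_I, negatives, tau))


bank = MemoryBank(num_images=8, dim=4)
target = [1.0, 0.0, 0.0, 0.0]
for _ in range(20):  # repeated EMA updates pull m_I toward f(v_I)
    bank.update(3, target)

aligned = pirl_loss(target, target, bank, 3, random.Random(1))
opposite = pirl_loss([-1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0], bank, 3, random.Random(1))
print(aligned, opposite)
```

Comparing both heads against the slowly updated m_I (rather than against each other) is what lets the memory bank supply many negatives without recomputing their features every step.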

slide-73
SLIDE 73


Better self-supervised learning objective

Accuracy on ImageNet-1K

slide-74
SLIDE 74


Object Detection

  • Outperforms ImageNet supervised pre-trained networks
  • Full fine-tuning, no bells & whistles
  • No extra data, and no changes to model architecture or fine-tuning schedule

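Initialization      | VOC07+12 APall | VOC07+12 AP50 | VOC07+12 AP75 | VOC07 APall | VOC07 AP50 | VOC07 AP75
ImageNet Supervised | 52.6           | 81.1          | 57.4          | 43.8        | 74.5       | 45.9
PIRL                | 54.0           | 80.7          | 59.7          | 44.7        | 73.4       | 47.0

PIRL gains: +1.4 APall and +2.3 AP75 on VOC07+12; +1.1 AP75 on VOC07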

slide-75
SLIDE 75


Linear Classification

  • Linear classifiers trained on fixed (frozen) features, evaluated on ImageNet-1K

CPCv2
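The linear evaluation protocol freezes the pre-trained network and trains only a linear classifier on its features. A minimal pure-Python sketch: the toy 2-D `features` below stand in for frozen ConvNet features (an assumption for illustration), and the learning rate and epoch count are arbitrary.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression via SGD; the features themselves never change."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax([sum(w_d * x_d for w_d, x_d in zip(W[c], x)) + b[c]
                         for c in range(num_classes)])
            for c in range(num_classes):
                grad = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                for d in range(dim):
                    W[c][d] -= lr * grad * x[d]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x):
    scores = [sum(w_d * x_d for w_d, x_d in zip(w, x)) + bc for w, bc in zip(W, b)]
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy "frozen" features for three classes.
features = [[1.0, 0.1], [0.9, -0.1], [0.1, 1.0], [-0.1, 0.9], [-1.0, -0.9], [-0.8, -1.1]]
labels = [0, 0, 1, 1, 2, 2]
W, b = train_linear_probe(features, labels, num_classes=3)
accuracy = sum(predict(W, b, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"linear probe accuracy: {accuracy:.2f}")
```

Because only W and b are trained, the probe's accuracy measures how linearly separable the frozen representation already is.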

slide-76
SLIDE 76


Easily Multi-task

Method \ Transfer dataset | ImageNet-1M | VOC07 | Places205 | iNaturalist
Jigsaw                    | 46.0        | 66.1  | 41.4      | 22.1
Rotation                  | 48.9        | 63.9  | 47.6      | 23
PIRL (Rot)                | 60.2        | 77.1  | 47.6      | 31.2
PIRL (Jigsaw + Rot)       | 63.1        | 80.3  | 49.7      | 33.6

slide-77
SLIDE 77

The rise of contrastive learning

slide-78
SLIDE 78

Contrastive Learning

  • How to define what images are "related" and "unrelated"?

Related and Unrelated Images

slide-79
SLIDE 79

Frames of a video: Hadsell et al., 2005, DrLIM; van der Oord et al., 2018, CPC

Video & Audio: AVID - Morgado et al., ECCV 2020; GDT - Patrick et al., 2020

slide-80
SLIDE 80


Tracking Objects

Wang & Gupta, 2015, Unsupervised Learning of Visual Representations using Videos

slide-81
SLIDE 81

Contrastive Predictive Coding: van der Oord et al., 2018; Hénaff et al., 2019

Related (Positives) Unrelated (Negative)

Nearby patches vs. distant patches of an Image

slide-82
SLIDE 82

Wu et al., 2018, Instance Discrimination; He et al., 2019, MoCo; Misra & van der Maaten, 2019, PIRL; Chen et al., 2020, SimCLR; and lots more ...

Related (Positives) Unrelated (Negative)

Patches of an image vs. patches of other images
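When the related pair is two augmented views of the same image and every other image in the batch is a negative, the loss can be computed entirely within a batch. A minimal pure-Python sketch of an NT-Xent-style loss (the form popularized by SimCLR), assuming the batch is laid out so that z[i] and z[i + N] are the two views of image i; the temperature is an illustrative choice. Other methods listed here (e.g. MoCo, PIRL) draw negatives from a queue or memory bank instead.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def nt_xent(z, temperature=0.5):
    """Average contrastive loss over 2N views; z[i] and z[i + N] are positives."""
    two_n = len(z)
    n = two_n // 2
    zn = [normalize(v) for v in z]
    sim = [[sum(a * b for a, b in zip(zn[i], zn[j])) / temperature
            for j in range(two_n)] for i in range(two_n)]
    total = 0.0
    for i in range(two_n):
        pos = (i + n) % two_n
        others = [sim[i][j] for j in range(two_n) if j != i]  # drop self-similarity
        m = max(others)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_sum_exp - sim[i][pos]
    return total / two_n


# Two images, two views each: rows 0/2 are views of image A, rows 1/3 of image B.
matched = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
swapped = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]  # views paired wrongly
print(nt_xent(matched), nt_xent(swapped))
```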

slide-83
SLIDE 83

Is "contrastive" really important?

slide-84
SLIDE 84

Contrastive learning -- what does it do?

Figure: a positive sample and negative samples in the feature space

slide-85
SLIDE 85

Contrastive learning -- what does it do?

Figure: a positive sample and negative samples in the feature space

slide-86
SLIDE 86

Contrastive learning -- what does it do?

Creates groups in the feature space

slide-87
SLIDE 87

Contrastive learning -- what does it do?

Creates groups in the feature space. So does clustering?!

slide-88
SLIDE 88

Swapping Assignments between Views

(SwAV)

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

slide-89
SLIDE 89


Grouping

Figure: dataset samples and prototypes

See also: SeLa (Asano et al., 2019)

Similarity of dataset sample & prototypes (which cluster does a sample belong to?)

slide-90
SLIDE 90


Grouping

Figure: the codes assign each dataset sample to the prototypes
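SwAV computes the codes from sample-prototype similarities with a few Sinkhorn-Knopp iterations, which spread the assignments of a batch evenly over the prototypes so that not every sample lands in the same cluster. A minimal pure-Python sketch, assuming a batch of B samples and K prototypes whose scores are dot products of normalized features and prototypes; ε = 0.05 and 3 iterations follow the values reported for SwAV, but treat them as assumptions here.

```python
import math


def sinkhorn_codes(scores, epsilon=0.05, iters=3):
    """scores[b][k] = similarity of sample b to prototype k; returns codes q[b][k]."""
    B, K = len(scores), len(scores[0])
    q = [[math.exp(s / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(iters):
        # Each prototype receives the same total mass 1/K ...
        col = [sum(q[b][k] for b in range(B)) for k in range(K)]
        q = [[q[b][k] / (col[k] * K) for k in range(K)] for b in range(B)]
        # ... and each sample distributes the same total mass 1/B.
        row = [sum(r) for r in q]
        q = [[v / (row[b] * B) for v in q[b]] for b in range(B)]
    return [[v * B for v in r] for r in q]  # rescale so each sample's code sums to 1


scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
codes = sinkhorn_codes(scores)
print([max(range(2), key=lambda k: c[k]) for c in codes])
```

The small ε keeps the codes close to hard assignments, while the alternating column and row normalizations enforce the equal-partition constraint.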

slide-91
SLIDE 91

Figure: two views of an image, each encoded by fθ and assigned a code (Code 1, Code 2) using the prototypes

slide-92
SLIDE 92

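Figure: two views encoded by fθ, with codes (Code 1, Code 2) computed from the prototypes; each view predicts the code of the other view (Predict)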

slide-93
SLIDE 93

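Figure: two views encoded by fθ, with codes (Code 1, Code 2) computed from the prototypes; gradients are backpropagated through both branches (Backprop)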

Not contrastive!
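The swapped prediction can be sketched as a cross-entropy in which each view's softmax over prototypes must predict the other view's code. No negative pairs appear anywhere, which is why the slide calls it non-contrastive. In this minimal pure-Python sketch the prototypes and codes are given as toy inputs (in SwAV the codes come from the Sinkhorn step and act as fixed targets), and the temperature of 0.1 is an assumption.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict_codes(z, prototypes, temperature=0.1):
    """Softmax over dot products between a feature z and each prototype."""
    return softmax([sum(a * b for a, b in zip(z, c)) / temperature for c in prototypes])


def cross_entropy(target, probs):
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def swapped_prediction_loss(z1, z2, q1, q2, prototypes, temperature=0.1):
    """View 1 predicts code q2 and view 2 predicts code q1."""
    p1 = predict_codes(z1, prototypes, temperature)
    p2 = predict_codes(z2, prototypes, temperature)
    return cross_entropy(q2, p1) + cross_entropy(q1, p2)


prototypes = [[1.0, 0.0], [0.0, 1.0]]
z1, z2 = [1.0, 0.0], [0.95, 0.05]
consistent = swapped_prediction_loss(z1, z2, [1.0, 0.0], [1.0, 0.0], prototypes)
conflicting = swapped_prediction_loss(z1, z2, [1.0, 0.0], [0.0, 1.0], prototypes)
print(consistent, conflicting)
```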

slide-94
SLIDE 94


Key Results

Linear classifier on fixed features: ImageNet, Places, iNaturalist. Detection: VOC07+12, COCO.

Method                | ImageNet    | Places | iNaturalist | VOC07+12 | COCO
Supervised            | 76.5        | 53.2   | 46.7        | 81.3     | 40.8
Prior self-supervised | 71.1 (-5.4) | 52.1   | 38.9        | 82.5     | 42.0
SwAV                  | 75.3 (-1.2) | 56.7   | 48.6        | 82.6     | 42.1

slide-95
SLIDE 95


Practical advantages of SwAV

  • Trains on 4-8 GPUs
  • Faster convergence than prior work (SimCLR, MoCo-v2)
  • Smaller compute requirements
  • 2x faster than MoCo-v2 on 8 GPUs
  • 72% after 100h vs. 71% after 200h
  • Better results

Code & models: https://github.com/facebookresearch/swav (PyTorch Lightning implementation on the way)

1 % Student friendly

slide-96
SLIDE 96

Combining clustering with contrastive learning

slide-97
SLIDE 97

Audio-Visual Instance Discrimination with Cross-Modal Agreement

(AVID + CMA)

Pedro Morgado, Nuno Vasconcelos, Ishan Misra

https://github.com/facebookresearch/AVID-CMA

slide-98
SLIDE 98


Contrastive (Audio-Visual Instance Discrimination)

Positives: audio & video of the same sample
Negatives: relate each video/audio to other videos/audio using negatives

Figure: d(video, its own audio) < d(video, audio of another sample)
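Cross-modal instance discrimination can be sketched with an in-batch contrastive loss: a clip's video embedding must pick out its own audio embedding among the audio of other clips, and vice versa. This is a simplified stand-in; the in-batch negatives, the symmetric averaging and the temperature are assumptions for illustration, and the AVID paper's exact formulation (for example, how negatives are stored and sampled) may differ.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def one_direction(queries, keys, temperature):
    """Mean of -log p(key_i | query_i) against all keys in the batch."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def cross_modal_loss(video, audio, temperature=0.07):
    """video[i] and audio[i] come from the same clip i (the positive pair)."""
    v = [normalize(x) for x in video]
    a = [normalize(x) for x in audio]
    return 0.5 * (one_direction(v, a, temperature) + one_direction(a, v, temperature))


video = [[1.0, 0.0], [0.0, 1.0]]
audio_matched = [[0.9, 0.1], [0.1, 0.9]]
audio_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # each clip's audio swapped with the other's
print(cross_modal_loss(video, audio_matched), cross_modal_loss(video, audio_shuffled))
```

Only cross-modal pairs are compared here, so the video encoder has to learn features that predict what a clip sounds like, and vice versa.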

slide-99
SLIDE 99


Grouping using Audio-visual Agreements (CMA)

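Figure: other videos arranged by video similarity ($v_i^\top v_j$) and audio similarity ($a_i^\top a_j$) to a reference video, split into a positive set and a negative set (labels: Reference, Positives, Visual Negatives, Audio Negatives).

Positives: videos that are similar in both audio & video features
Negatives: videos outside the positive set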
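One simple way to turn "similar in both audio and video features" into a positive set is to rank the other clips by the smaller of their two similarities to the reference and keep the top k. This is a hedged sketch: the min-based ranking, the cosine similarity and k are assumptions, and the AVID + CMA paper's exact selection rule may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cma_positive_set(ref, video, audio, k):
    """Indices of the k clips whose video AND audio both agree most with clip `ref`."""
    agreement = {
        j: min(cosine(video[ref], video[j]), cosine(audio[ref], audio[j]))
        for j in range(len(video))
        if j != ref
    }
    return sorted(agreement, key=agreement.get, reverse=True)[:k]


video = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
# Clip 1 agrees in both modalities; clip 2 only looks similar; clip 3 only sounds similar.
print(cma_positive_set(0, video, audio, k=1))
```

Taking the minimum means a clip that is close in only one modality (like clips 2 and 3) cannot enter the positive set.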

slide-100
SLIDE 100


Grouping using Audio-visual Agreements (CMA)

Figure: three examples, each showing a reference video with its positives, visual negatives and audio negatives (positive set vs. negative set), arranged by video similarity ($v_i^\top v_j$) and audio similarity ($a_i^\top a_j$). Clip labels include Moving Train, Dancing, Playing Violin, Exercising, Fire Truck Station, Playing Guitar, Moving Boat, Fishing with background music, and Playing Accordion.

slide-101
SLIDE 101
slide-102
SLIDE 102

Pretext tasks | Generative | Contrastive/Clustering

  • Pretext tasks (standard pretext learning): apply a pretext image transform t to an image I to get I^t, compute a ConvNet representation of I^t, and predict properties of t from it
  • Contrastive/Clustering: separate related from unrelated images; e.g. Pretext-Invariant Representation Learning (PIRL) computes ConvNet representations of I and I^t and encourages them to be similar
  • Generative: predict more information (AutoEncoder, VAE, GAN, BiGAN)