Self-supervised learning in computer vision
Ishan Misra, Facebook AI Research
With slides from Andrew Zisserman, Carl Doersch
Success story of supervision: Pre-training
- Features from networks pre-trained on ImageNet can be used for a variety of different downstream tasks
[Figure: images from ImageNet (pre-train) → ConvNet learns a representation; example labels: husky, terrier, tench, ...]
Success story of supervision: Recipe for good solutions
- Pre-train on a large supervised dataset:
  - Collect a dataset of "supervised" images
  - Train a ConvNet
The promise of "alternative" supervision
- Getting "real" labels is difficult and expensive
  - ImageNet, with 14M images, took 22 human years to label
- Obtain labels using a "semi-automatic" process
  - Hashtags
  - GPS locations
  - Using the data itself: "self"-supervised
Can we get labels for all data?
[Figure, built up over several slides: number of images with bounding-box labels, with image-level labels, and Internet photos, first on linear axes and then on a log scale from 1E+00 to 1E+12, with a final "Real World" label; example labels: dog, chair, pizza, donut]
Stats from Pawan Kumar at Oxford. Internet photo statistics from forbes.com: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/
- ImageNet (14 million images) needed 22 human years to label
Can we get labels for all data?
- What about complex concepts?
- Video?
- Labelling cannot scale to the size of the data we generate
Rare concepts?
- 10% of the classes account for 93% of the data
Slide credit: Rob Fergus
Different domains?
- ImageNet pre-training may not work
What is "self" supervision?
- Obtain "labels" from the data itself by using a "semi-automatic" process
- Predict part of the data from other parts
[Figure: observed data, hidden data, hidden property of the data]
What is "self" supervision?
Image: Virginia de Sa, 1994, Learning classification with unlabeled data
Word2vec
- Fill in the blanks
word2vec: Mikolov et al. Image by Julian Gilyadov
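"Fill in the blanks" can be made concrete with word2vec's continuous bag-of-words (CBOW) variant, which hides a word and predicts it from its neighbours. The sketch below only builds the (context, blank) training pairs. It assumes whitespace-tokenized text, and the helper name `cbow_pairs` is hypothetical. The embedding model and its training loop are left out.

```python
from typing import List, Tuple

# Illustrative sketch: `cbow_pairs` is a hypothetical helper, not word2vec's API.


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Turn a sentence into (context words, hidden word) training pairs.

    Each word in turn is treated as the blank; the words up to `window`
    positions to its left and right are the observed context.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word sentence has nothing to predict from
            pairs.append((context, target))
    return pairs
```

For example, `cbow_pairs("the dog sat".split(), window=1)` gives `[(['dog'], 'the'), (['the', 'sat'], 'dog'), (['dog'], 'sat')]`.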
Success of self-supervised learning in NLP
- Filling in the blanks is a powerful signal for learning representations
- Sentence/word representations: BERT (Devlin et al., 2018)
Why self-supervision?
- Helps us learn using observations and interactions
- Does not require exhaustive annotation of concepts
- Leverages multiple modalities or structure in the domain
In the context of computer vision
Pretext task
- Self-supervised task used for learning representations
- Often not the "real" task (like image classification) we care about
[Figure: observed data, hidden data, hidden property of the data]
Pretext task: Doersch et al., 2015, Unsupervised visual representation learning by context prediction
Pretext task
- Using images
- Using video
- Using video and sound
[Figure: observed data, hidden data, hidden property of the data]
Relative position of patches
Doersch et al., 2015, Unsupervised visual representation learning by context prediction
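The relative-position task can be sketched as a sampler that returns three things: a centre patch, one of its eight neighbours, and the neighbour's position as an 8-way label. This is a simplified illustration, not Doersch et al.'s code. The grid layout, the function names and the nested-list image format are assumptions. The paper's gaps and random jitter between patches are also omitted; these exist to stop the network from solving the task by matching low-level cues at patch boundaries.

```python
import random
from typing import List, Tuple

# Illustrative sketch; all names here are hypothetical.
Image = List[List[int]]  # grey-scale image stored as rows of pixel values

# Offsets of the 8 neighbours of the centre cell in a 3x3 grid, in reading
# order; the position in this list is the class label (0..7).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size patch whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def relative_position_sample(img: Image, patch: int,
                             rng: random.Random) -> Tuple[Image, Image, int]:
    """Return (centre patch, neighbour patch, label of the neighbour's position).

    Uses the 3x3 grid of patch-sized cells anchored at the image's top-left
    corner, with no gaps or jitter between cells.
    """
    if len(img) < 3 * patch or len(img[0]) < 3 * patch:
        raise ValueError("image too small for a 3x3 grid of patches")
    label = rng.randrange(len(OFFSETS))
    dr, dc = OFFSETS[label]
    centre = crop(img, patch, patch, patch)
    neighbour = crop(img, (1 + dr) * patch, (1 + dc) * patch, patch)
    return centre, neighbour, label
```

A network trained on these samples sees the two patches and outputs one of 8 classes.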
Relative position: nearest neighbors in features
Doersch et al., 2015, Unsupervised visual representation learning by context prediction
Predicting rotations
Gidaris et al., 2018, Predicting Image Rotations
[Figure: an image rotated by 0°, 90°, 180° and 270°]
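Here is a minimal sketch of how the rotation task creates labels, assuming images stored as nested lists. Each image yields four training examples, and the label is simply which of the four rotations was applied. The function names are illustrative. Gidaris et al. train a ConvNet on this 4-way classification.

```python
from typing import List, Tuple

# Illustrative sketch; function names are hypothetical.
Image = List[List[int]]


def rot90(img: Image) -> Image:
    """Rotate a 2-D image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_batch(img: Image) -> List[Tuple[Image, int]]:
    """Return the image rotated by 0, 90, 180 and 270 degrees with labels 0..3."""
    batch, current = [], img
    for label in range(4):
        batch.append((current, label))
        current = rot90(current)
    return batch
```

No annotation is needed: the label is determined by the transform we applied ourselves.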
Colorization
Zhang et al., 2016, Colorful image colorization
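Colorization hides the colour of an image and asks the network to predict it from the grey-scale version. Zhang et al. work in the CIE Lab colour space and predict a distribution over quantized ab values. The sketch below deliberately substitutes a simpler luma/colour-difference split (BT.601 luma weights, unscaled B - Y and R - Y) to show how one image becomes an (input, target) pair. The function names are hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch: NOT the Lab/quantized-ab scheme of the paper.
RGB = Tuple[float, float, float]


def split_luma_chroma(pixels: List[RGB]) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Split RGB pixels into a grey-scale input and a colour target."""
    inputs, targets = [], []
    for r, g, b in pixels:
        y = 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 luma: all the network sees
        inputs.append(y)
        targets.append((b - y, r - y))         # colour differences the network must predict
    return inputs, targets


def recombine(y: float, target: Tuple[float, float]) -> RGB:
    """Invert the split, showing that luma plus the colour target recover the pixel."""
    b = target[0] + y
    r = target[1] + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

Because `recombine` inverts the split, the target holds exactly the colour information that the grey-scale input lacks.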
Fill in the blanks
Pathak et al., 2016, Context encoders: feature learning by inpainting
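In the image version of "fill in the blanks", a region is removed and the network must reconstruct it. Pathak et al. combine a reconstruction loss with an adversarial loss. The sketch below shows only two parts: cutting a central square hole, and an L2 loss computed over the hole. It assumes nested-list images, and the function names are hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch; function names are hypothetical.
Image = List[List[float]]


def mask_centre(img: Image, hole: int, fill: float = 0.0) -> Tuple[Image, Image]:
    """Return (masked input, hidden target) for a square hole of side `hole`
    cut from the centre of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - hole) // 2, (w - hole) // 2
    target = [row[left:left + hole] for row in img[top:top + hole]]
    masked = [list(row) for row in img]  # copy, so the original stays intact
    for r in range(top, top + hole):
        for c in range(left, left + hole):
            masked[r][c] = fill
    return masked, target


def hole_l2(pred: Image, target: Image) -> float:
    """Mean squared error over the hidden region only."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)
```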
Self-supervision in computer vision
- Using images
- Using video
- Using video and sound
Video
- Video is a "sequence" of frames
- How to get "self-supervision"?
  - Predict order of frames
  - Fill in the blanks
  - Track objects and predict their position
Shuffle & Learn
Misra et al., 2016, Shuffle and Learn
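Shuffle & Learn turns frame order into a binary label: is this tuple of frames in the correct temporal order or not? Below is a simplified sampler that refers to frames by index. It draws three distinct frames and either keeps them in order (positive) or swaps a neighbouring pair (negative). It does not reproduce the paper's exact sampling scheme. This sketch treats only the forward order as positive, and the name `order_sample` is hypothetical.

```python
import random
from typing import List, Tuple

# Illustrative sketch; `order_sample` is a hypothetical helper.


def order_sample(num_frames: int, rng: random.Random) -> Tuple[List[int], int]:
    """Return (three frame indices, label).

    label 1: the frames are in their original temporal order;
    label 0: two neighbouring frames have been swapped, so the order is wrong.
    """
    a, b, c = sorted(rng.sample(range(num_frames), 3))
    if rng.random() < 0.5:
        return [a, b, c], 1
    return ([b, a, c] if rng.random() < 0.5 else [a, c, b]), 0
```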
Shuffle & Learn: fine-tune on human keypoint estimation
Initialization (AlexNet)              FLIC keypoints AUC   MPII keypoints AUC
ImageNet supervised                   51.3                 47.2
Shuffle and Learn (self-supervised)   49.6                 47.6
Odd-one-out networks
Fernando et al., 2017, Odd-one-out networks
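In odd-one-out learning, the network sees several clips, all but one in temporal order, and must point at the odd one. This sketch covers only the sampling side, with clips represented as lists of frame indices. The names and parameters (`clip_len`, `num_clips`) are assumptions, not Fernando et al.'s settings.

```python
import random
from typing import List, Tuple

# Illustrative sketch; names and parameters are hypothetical.


def odd_one_out_sample(video_len: int, clip_len: int, num_clips: int,
                       rng: random.Random) -> Tuple[List[List[int]], int]:
    """Return (clips of frame indices, index of the one out-of-order clip)."""
    if clip_len < 2 or video_len < clip_len:
        raise ValueError("need clip_len >= 2 and video_len >= clip_len")
    clips = []
    for _ in range(num_clips):
        start = rng.randrange(video_len - clip_len + 1)
        clips.append(list(range(start, start + clip_len)))  # in temporal order
    odd = rng.randrange(num_clips)
    shuffled = clips[odd][:]
    while shuffled == clips[odd]:  # make sure the odd clip really is out of order
        rng.shuffle(shuffled)
    clips[odd] = shuffled
    return clips, odd
```

The label is the index `odd`, so the task is a `num_clips`-way classification.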
Self-supervision in computer vision
- Using images
- Using video
- Using video and sound
Audio-visual co-supervision
Arandjelović and Zisserman, 2017, "Objects that Sound"
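Audio-visual supervision needs no labels because correspondence comes for free. A frame and the audio from the same video belong together, while audio borrowed from another video does not. Below is a sketch of building such positive and negative pairs, with each video represented only by an integer id. That is an assumption for illustration: in practice, each id would index a frame and a short audio clip from that video.

```python
import random
from typing import List, Tuple

# Illustrative sketch; `av_pairs` is a hypothetical helper.


def av_pairs(num_videos: int, rng: random.Random) -> List[Tuple[int, int, int]]:
    """Return (video id of the frame, video id of the audio, label) triples.

    label 1: frame and audio come from the same video (they correspond);
    label 0: the audio is taken from a different, randomly chosen video.
    """
    if num_videos < 2:
        raise ValueError("need at least two videos to build negatives")
    pairs = []
    for v in range(num_videos):
        pairs.append((v, v, 1))
        other = rng.randrange(num_videos - 1)
        if other >= v:
            other += 1  # skip v itself so the negative is a real mismatch
        pairs.append((v, other, 0))
    return pairs
```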
Objects that Sound: what can be learnt?
- Good representations
  - Visual features
  - Audio features
- Intra- and cross-modal retrieval
  - Aligned audio and visual embeddings
- "What is making the sound?"
  - Learn to localize objects that sound
Understanding what the "pretext" task learns
Are they complementary?
Initialization (ResNet-101)                      ImageNet top-5 accuracy   VOC07 detection mAP
Relative Position                                59.2                      66.8
Colorization                                     62.5                      65.5
Relative Position + Colorization (multi-task)    66.6                      68.8
Doersch & Zisserman, 2017, Multi-task self-supervised visual learning
Information predicted: varies across tasks
[Figure: tasks placed along an axis from less to more information predicted]
[Figure: families of self-supervised methods compared by how much information they predict:
- Pretext tasks, e.g. standard pretext learning: apply a transform t to image I and predict a property of t from I^t
- Contrastive/clustering methods, which compare related and unrelated images, e.g. Pretext Invariant Representation Learning: encourage the ConvNet representations of I and I^t to be similar
- Generative methods, which predict more information: AutoEncoder, VAE, GAN, BiGAN]
Scaling self-supervised learning
- Jigsaw puzzles (Noroozi & Favaro, 2016)
Goyal et al., 2019, Scaling and benchmarking self-supervised visual representation learning
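The Jigsaw task cuts an image into a 3x3 grid of tiles, shuffles them with one permutation from a fixed set, and asks the network to classify which permutation was used. This sketch makes two assumptions: images are nested lists with sides divisible by 3, and the permutation set is drawn at random. Noroozi & Favaro instead select permutations that are far apart in Hamming distance. The size `k` of the set is the number of classes. Goyal et al. study how scaling this kind of problem complexity affects the learned features. All names are hypothetical.

```python
import random
from typing import List, Tuple

# Illustrative sketch; names are hypothetical.
Image = List[List[int]]
Perm = Tuple[int, ...]


def tiles_3x3(img: Image) -> List[Image]:
    """Cut an image whose sides are divisible by 3 into 9 tiles in reading order."""
    th, tw = len(img) // 3, len(img[0]) // 3
    return [[row[c * tw:(c + 1) * tw] for row in img[r * th:(r + 1) * th]]
            for r in range(3) for c in range(3)]


def make_permutation_set(k: int, seed: int = 0) -> List[Perm]:
    """Pick k distinct permutations of the 9 tiles uniformly at random."""
    rng = random.Random(seed)
    perms = set()
    while len(perms) < k:
        p = list(range(9))
        rng.shuffle(p)
        perms.add(tuple(p))
    return sorted(perms)


def jigsaw_sample(img: Image, perms: List[Perm],
                  rng: random.Random) -> Tuple[List[Image], int]:
    """Shuffle the tiles with a random permutation from `perms`; return (tiles, label)."""
    label = rng.randrange(len(perms))
    tiles = tiles_3x3(img)
    return [tiles[i] for i in perms[label]], label
```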
Evaluating the representation
[Figure: pre-trained ConvNet; extract "fixed" features]
Evaluating the representation
- Train a linear SVM on fixed feature representations
- Use the VOC07 image classification task
Increasing amount of information predicted
[Figure: linear classifier on VOC07; mAP = mean Average Precision (higher is better)]
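In the linear-SVM evaluation, each class's classifier scores every test image. Average precision (AP) summarizes each class's ranking, and mAP is the mean over classes. Below is a minimal sketch of the non-interpolated AP: the mean of the precision values at the rank of each positive. The official VOC2007 evaluation used an 11-point interpolated AP, which can give slightly different numbers. SVM training is omitted, and the function names are hypothetical.

```python
from typing import List, Sequence

# Illustrative sketch; not the VOC devkit's 11-point interpolated AP.


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_cols: List[Sequence[float]],
                           label_cols: List[Sequence[int]]) -> float:
    """Average the per-class APs (one score column and one label column per class)."""
    aps = [average_precision(s, l) for s, l in zip(score_cols, label_cols)]
    return sum(aps) / len(aps)
```

For one class with scores `[0.9, 0.8, 0.7]` and labels `[1, 0, 1]`, the positives sit at ranks 1 and 3. The precisions there are 1 and 2/3, so AP = 5/6.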
Surface normal estimation
- Predict surface normals on NYU-v2
- Same optimization parameters for all methods (including supervised)
- PSPNet architecture
- Train last few layers only (res5 onwards)
[Figure: input image and output normals; image from the NYU dataset]
Surface normal estimation
Initialization          Median error (degrees, lower is better)   % correct within 11.25° (higher is better)
ImageNet supervised     17.1                                      36.1
Jigsaw Flickr 100M      13.1                                      44.6
- Outperforms ImageNet supervised
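Both numbers in the surface-normal table come from the per-pixel angle between the predicted and ground-truth normals. This minimal sketch assumes normals are given as 3-vectors; they need not be unit length, since the code normalizes through the dot product. The function names are hypothetical, and a real evaluation would also skip pixels without valid ground truth.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

# Illustrative sketch; function names are hypothetical.
Vec = Sequence[float]


def angular_error_deg(pred: Vec, gt: Vec) -> float:
    """Angle in degrees between a predicted and a ground-truth normal."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norms = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    cos = max(-1.0, min(1.0, dot / norms))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))


def normal_metrics(preds: List[Vec], gts: List[Vec],
                   threshold: float = 11.25) -> Tuple[float, float]:
    """Return (median angular error, % of pixels with error <= threshold)."""
    errors = [angular_error_deg(p, g) for p, g in zip(preds, gts)]
    within = 100.0 * sum(e <= threshold for e in errors) / len(errors)
    return median(errors), within
```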
What is missing from "pretext" tasks (or, in general, "proxy" tasks)?
Pretext tasks
- Jigsaw puzzles (Noroozi et al., 2016)
- Rotation (Gidaris et al., 2018)
The hope of generalization
- We really hope that the pre-training task and the transfer task are "aligned"
[Figure: pre-training (weak or self-supervised, e.g. hashtags such as #sun #nofilter #fun #tree #aruba) → transfer tasks]
The hope of generalization ... ?
- Why should solving Jigsaw puzzles teach about "semantics"? Why should performing a non-semantic task produce good features?
[Figure: pre-training (weak or self-supervised): Jigsaw on the pre-train data trains a ConvNet; transfer: linear classifiers on "fixed" features from that ConvNet]
Higher layers do not generalize ...
[Figure: Jigsaw features, linear classifier on VOC07 at each layer from conv1 to res5; mAP = mean Average Precision (higher is better)]
Pretext-Invariant Representation Learning (PIRL)
Ishan Misra, Laurens van der Maaten
Pretext task
[Figure: image I → transform t → I^t → ConvNet → predict property of transform t]
- Representations contain information about transform t
- Less semantic features
Underlying principle for pretext tasks
[Figure: standard pretext learning: a pretext image transform t maps I to I^t, and the ConvNet representation of I^t is used to predict a property of t]
- Apply a known image transform t
- Construct a task to predict t from the transformed image (I^t)
- Final-layer representations must carry information about t
- Representations "covary" with t
How important has invariance been?
- Hand-crafted features like SIFT and HOG
  - SIFT: Scale Invariant Feature Transform
- Supervised systems are trained to be invariant to "data augmentation"
Pretext-Invariant Representation Learning (PIRL)
- Be invariant to t
- Representation contains no information about t
[Figure: standard pretext learning (the ConvNet representation of I^t predicts a property of t) vs. Pretext Invariant Representation Learning (the ConvNet representations of I and I^t are encouraged to be similar)]
PIRL: Pretext Invariant Representation Learning
[Figure: a pretext image transform t maps I to I^t; both pass through the ConvNet and their representations are encouraged to be similar]
- Representations from I and I^t should be similar
- t = pretext transforms (Jigsaw, rotation, combinations, etc.)
- Use a contrastive loss to enforce similarity of features: $L_{\text{contrastive}}(v_I, v_{I^t})$
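One common way to write such a contrastive loss is the InfoNCE (softmax) form: the pair (v_I, v_{I^t}) should score higher than pairs of v_I with representations of other images. This is a substitution for illustration. The PIRL paper uses a noise-contrastive estimation (NCE) formulation with negatives drawn from a memory bank, which differs in detail. The cosine similarity, the temperature `tau = 0.07` and the function names are assumptions.

```python
import math
from typing import List, Sequence

# Illustrative InfoNCE sketch; not PIRL's exact NCE loss. Names are hypothetical.


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(v_i: Sequence[float], v_it: Sequence[float],
             negatives: List[Sequence[float]], tau: float = 0.07) -> float:
    """-log( exp(s+/tau) / (exp(s+/tau) + sum_n exp(s_n/tau)) )."""
    logits = [cosine(v_i, v_it) / tau] + [cosine(v_i, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when v_{I^t} is the nearest neighbour of v_I among all candidates and grows when a negative is closer.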
Contrastive Learning
[Figure: groups of related and unrelated images → shared network (Siamese net) → image features (embeddings)]
Contrastive loss function
- Embeddings from related images should be closer than embeddings from unrelated images: $d(\text{related images}) < d(\text{unrelated images})$
- Positive = related, negative = unrelated
Hadsell et al., 2005, DrLIM
- Good negatives are very important in contrastive learning
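Hadsell et al.'s DrLIM loss expresses "related closer than unrelated" with a margin. Related pairs are pulled together, while unrelated pairs are pushed apart only until they are at least a margin m apart. Below is a sketch with Euclidean distance. Here `similar=True` marks a related pair, whereas the paper encodes this with a binary label Y. The default `margin=1.0` is an arbitrary choice.

```python
import math
from typing import Sequence

# Illustrative sketch of the DrLIM margin loss; names are hypothetical.


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drlim_loss(a: Sequence[float], b: Sequence[float],
               similar: bool, margin: float = 1.0) -> float:
    d = euclidean(a, b)
    if similar:
        return 0.5 * d ** 2                   # pull related embeddings together
    return 0.5 * max(0.0, margin - d) ** 2    # push unrelated ones at least `margin` apart
```

Unrelated pairs that are already farther apart than the margin contribute zero loss, which is why choosing informative ("good") negatives matters.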
Contrastive learning: what does it do?
[Figure: a positive sample and negative samples]
How does this relate to "pretext" tasks?
PIRL - How it works
[Figure: PIRL training. An image I and its transformed version I^t pass through the same ConvNet (parameters θ, up to res5), giving features v_I and v_{I^t}. The projections f(v_I) and g(v_{I^t}) should be similar. Each is compared with the memory-bank entry m_I of image I (similar) and with entries m_{I'} of other images stored in the memory bank M (dissimilar, i.e. unrelated negatives).]
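The objective in the PIRL figure can be sketched as below. This is a hedged, simplified sketch: the PIRL paper uses an NCE loss with a memory bank, and here a softmax (InfoNCE-style) form is swapped in for brevity. The names `pirl_style_loss` and `update_memory`, the weight `lam=0.5`, the temperature `tau` and the moving-average `momentum` are all illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(query, positive, negatives, tau=0.07):
    """-log of the softmax probability of `positive` among {positive} + negatives."""
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def pirl_style_loss(f_vI, g_vIt, m_I, memory_negatives, lam=0.5, tau=0.07):
    """Hypothetical PIRL-style objective: both g(v_{I^t}) (transformed image)
    and f(v_I) (original image) should match the memory-bank entry m_I
    and not the entries m_{I'} of other images."""
    return (lam * info_nce(g_vIt, m_I, memory_negatives, tau)
            + (1 - lam) * info_nce(f_vI, m_I, memory_negatives, tau))


def update_memory(m_I, f_vI, momentum=0.5):
    """Moving-average update of the memory-bank entry for image I (assumed rule)."""
    return normalize([momentum * m + (1 - momentum) * f for m, f in zip(m_I, f_vI)])


m_I = [1.0, 0.0]
memory_negatives = [[0.0, 1.0], [-1.0, 0.0]]  # m_{I'} for two other images
aligned = pirl_style_loss([1.0, 0.1], [0.9, 0.0], m_I, memory_negatives)
misaligned = pirl_style_loss([0.0, 1.0], [-1.0, 0.1], m_I, memory_negatives)
print(aligned < misaligned)  # True
```

Because the negatives come from the memory bank rather than the current batch, many negatives can be used without a large batch.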
Better self-supervised learning objective
[Figure: accuracy on ImageNet-1K]
Object Detection
- Outperforms ImageNet supervised pre-trained networks
- Full fine-tuning, no bells & whistles
- No extra data and no changes to the model architecture or fine-tuning schedule
Initialization         VOC07+12 (APall / AP50 / AP75)   VOC07 (APall / AP50 / AP75)
ImageNet Supervised    52.6 / 81.1 / 57.4               43.8 / 74.5 / 45.9
PIRL                   54.0 / 80.7 / 59.7               44.7 / 73.4 / 47.0
PIRL gains: +1.4 APall and +2.3 AP75 on VOC07+12; +1.1 AP75 on VOC07
Linear Classification
- Linear classifiers on fixed features. Evaluate on ImageNet-1K
[Figure: ImageNet-1K linear classification accuracy; CPCv2 is among the compared methods]
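The linear-evaluation protocol can be sketched as follows: the pre-trained network is frozen, its features are extracted once, and only a linear softmax classifier is trained on top. This is a minimal pure-Python sketch; the toy "features", learning rate and epoch count are assumptions, and real evaluations train on ImageNet-1K features with a proper optimizer.

```python
import math
import random


def train_linear_probe(features, labels, n_classes, lr=0.5, epochs=200, seed=0):
    """Fit a softmax (multinomial logistic) classifier on frozen features.
    Only W and b are learned; the features themselves are never updated."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(n_classes)]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                grad = exps[c] / z - (1.0 if c == y else 0.0)
                b[c] -= lr * grad
                for d in range(dim):
                    W[c][d] -= lr * grad * x[d]
    return W, b


def predict(W, b, x):
    """Return the class with the highest linear score."""
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy frozen features for two classes (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W, b = train_linear_probe(features, labels, n_classes=2)
print([predict(W, b, x) for x in features])  # [0, 0, 1, 1]
```

Keeping the backbone fixed means the reported accuracy measures how linearly separable the learned representation already is.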
Easily Multi-task
Method               Transfer dataset: ImageNet-1M   VOC07   Places205   iNaturalist
Jigsaw                                 46.0          66.1    41.4        22.1
Rotation                               48.9          63.9    47.6        23
PIRL (Rot)                             60.2          77.1    47.6        31.2
PIRL (Jigsaw + Rot)                    63.1          80.3    49.7        33.6
The rise of contrastive learning
Contrastive Learning
- How to define what images are "related" and "unrelated"?
Related and Unrelated Images
Frames of a video
Hadsell et al., 2005, DrLIM; van der Oord et al., 2018, CPC
Video & Audio
AVID: Morgado et al., ECCV 2020; GDT: Patrick et al., 2020
Tracking Objects
Wang & Gupta, 2015, Unsupervised Learning of Visual Representations using Videos
van der Oord et al., 2018; Henaff et al., 2019, Contrastive Predictive Coding
Related (Positives) vs. Unrelated (Negatives)
Nearby patches vs. distant patches of an image
Wu et al., 2018, Instance Discrimination; He et al., 2019, MoCo; Misra & van der Maaten, 2019, PIRL; Chen et al., 2020, SimCLR; and many more
Related (Positives) vs. Unrelated (Negatives)
Patches of an image vs. patches of other images
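For instance-discrimination methods, positives and negatives can be read directly off a batch. This is a minimal sketch in the style of SimCLR's NT-Xent loss, assuming each image contributes two augmented views laid out as [all first views, then all second views]. MoCo and PIRL instead draw most negatives from a queue or memory bank. The temperature `tau=0.5` and the function names are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def positive_index(i, n_images):
    """With views laid out as [img0_v1, ..., imgN-1_v1, img0_v2, ...],
    the positive of view i is the other view of the same image."""
    return (i + n_images) % (2 * n_images)


def nt_xent(views, tau=0.5):
    """Batch contrastive loss: for each view, the other view of the same
    image is the positive and every view of another image is a negative."""
    two_n = len(views)
    n_images = two_n // 2
    total = 0.0
    for i in range(two_n):
        logits = {j: cosine(views[i], views[j]) / tau for j in range(two_n) if j != i}
        m = max(logits.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[positive_index(i, n_images)]
    return total / two_n


# Two images, two views each: the first batch has consistent views, the second does not.
consistent = nt_xent([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
inconsistent = nt_xent([[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]])
print(consistent < inconsistent)  # True
```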
Is "contrastive" really important?
Contrastive learning -- what does it do?
[Figure: a positive sample and negative samples in feature space]
Creates groups in the feature space. So does clustering?!
Swapping Assignments between Views
(SwAV)
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
Grouping
Dataset Prototypes
See also - SeLa by Asano et al., 2019
Similarity of dataset sample & prototypes (which cluster does a sample belong to?)
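The question "which cluster does a sample belong to?" can be sketched as a softmax over the similarities between a sample's feature and a set of prototype vectors. This is a minimal sketch, assuming cosine similarity and a hypothetical temperature of 0.1; the prototype values are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def prototype_scores(z, prototypes):
    """Cosine similarity between a sample feature z and each prototype."""
    z = normalize(z)
    return [sum(a * b for a, b in zip(z, normalize(c))) for c in prototypes]


def soft_assignment(z, prototypes, temperature=0.1):
    """Softmax over prototype similarities: a soft cluster membership for z."""
    s = [x / temperature for x in prototype_scores(z, prototypes)]
    m = max(s)
    e = [math.exp(x - m) for x in s]
    total = sum(e)
    return [x / total for x in e]


prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
p = soft_assignment([0.9, 0.2], prototypes)
print(max(range(len(p)), key=p.__getitem__))  # 0
```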
[Figure: two views of an image are encoded by fθ, and each feature is compared with the prototypes to produce a code (Code 1, Code 2). Each view's code is predicted from the other view's features, and gradients are backpropagated through both branches.]
Not contrastive!
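The swapped prediction can be sketched as below. This is a minimal sketch, assuming unit-norm features and prototypes. It follows the SwAV recipe of computing codes with a few Sinkhorn-Knopp iterations, which spread the batch evenly over the prototypes and rule out the trivial solution where every sample gets the same code, then predicting each view's code from the other view. The values `eps=0.05`, `n_iters=3` and `temperature=0.1` are assumed defaults, and the codes are treated as fixed targets (SwAV does not backpropagate through them).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Turn a B x K matrix of sample/prototype scores into soft codes whose
    prototype usage is spread evenly over the batch (equipartition)."""
    B, K = len(scores), len(scores[0])
    m = max(max(row) for row in scores)
    Q = [[math.exp((s - m) / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        for k in range(K):  # each prototype (column) gets total mass 1/K
            col = sum(Q[b][k] for b in range(B))
            for b in range(B):
                Q[b][k] /= col * K
        for b in range(B):  # each sample (row) gets total mass 1/B
            row = sum(Q[b])
            Q[b] = [q / (row * B) for q in Q[b]]
    return [[q * B for q in row] for row in Q]  # each code sums to 1


def swapped_loss(z1, z2, prototypes, temperature=0.1):
    """z1, z2: features of two views of the same B images."""
    s1 = [[dot(z, c) for c in prototypes] for z in z1]
    s2 = [[dot(z, c) for c in prototypes] for z in z2]
    q1, q2 = sinkhorn(s1), sinkhorn(s2)  # codes, used as fixed targets
    loss = 0.0
    for b in range(len(z1)):
        p1 = softmax([s / temperature for s in s1[b]])
        p2 = softmax([s / temperature for s in s2[b]])
        loss -= sum(q * math.log(p) for q, p in zip(q2[b], p1))  # predict view 2's code
        loss -= sum(q * math.log(p) for q, p in zip(q1[b], p2))  # predict view 1's code
    return loss / (2 * len(z1))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
views = [[1.0, 0.0], [0.0, 1.0]]
aligned = swapped_loss(views, [[1.0, 0.0], [0.0, 1.0]], prototypes)
swapped = swapped_loss(views, [[0.0, 1.0], [1.0, 0.0]], prototypes)
print(aligned < swapped)  # True
```

No negative pairs appear anywhere in this loss; the equipartition constraint in `sinkhorn` is what prevents collapse.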
Key Results
                       Linear classifier (fixed features)    Detection
Method                 ImageNet      Places   iNaturalist    VOC07+12   COCO
Supervised             76.5          53.2     46.7           81.3       40.8
Prior self-supervised  71.1 (-5.4)   52.1     38.9           82.5       42.0
SwAV                   75.3 (-1.2)   56.7     48.6           82.6       42.1
Practical advantages of SwAV
- Trains on 4-8 GPUs
- Faster convergence than prior work (SimCLR, MoCo-v2)
- Smaller compute requirements
- 2x faster than MoCo-v2 on 8 GPUs
- 72% after 100h vs. 71% after 200h
- Better results
Code & models: https://github.com/facebookresearch/swav
PyTorch Lightning implementation on the way
Student friendly
Combining clustering with contrastive learning
Audio Visual Instance Discrimination with Cross Modal Agreement
(AVID + CMA)
Pedro Morgado, Nuno Vasconcelos, Ishan Misra
https://github.com/facebookresearch/AVID-CMA
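Positives: audio & video of the same sample
Negatives: audio/video of other samples, used to relate each clip to other videos
$d(v_i, a_i) < d(v_i, a_j)$ for $j \neq i$, where $v_i$ and $a_i$ are the video and audio features of sample $i$
Contrastive (Audio-Visual Instance Discrimination)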
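Cross-modal instance discrimination can be sketched as a symmetric contrastive loss in which each clip's own audio is the positive for its video (and vice versa), and the audio/video of other clips are negatives. This is a hedged simplification: AVID itself uses a memory bank with an NCE-style loss, whereas this sketch uses in-batch negatives and a softmax form. The name `cross_modal_loss` and `tau=0.07` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cross_modal_loss(video, audio, tau=0.07):
    """video[i] and audio[i] come from the same clip (positive pair);
    audio[j] and video[j] for j != i act as negatives."""
    n = len(video)
    total = 0.0
    for i in range(n):
        v2a = [cosine(video[i], audio[j]) / tau for j in range(n)]
        a2v = [cosine(audio[i], video[j]) / tau for j in range(n)]
        total += logsumexp(v2a) - v2a[i]  # video -> audio direction
        total += logsumexp(a2v) - a2v[i]  # audio -> video direction
    return total / (2 * n)


video = [[1.0, 0.0], [0.0, 1.0]]
matched_audio = [[0.9, 0.1], [0.1, 0.9]]
shuffled_audio = [[0.1, 0.9], [0.9, 0.1]]
print(cross_modal_loss(video, matched_audio) < cross_modal_loss(video, shuffled_audio))  # True
```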
Grouping using Audio-visual Agreements (CMA)
Positive set and negative set, selected using video similarity ($v_i^\top v_j$) and audio similarity ($a_i^\top a_j$)
[Figure: a reference clip with its positives, visual negatives and audio negatives]
Positives: videos that are similar to the reference in both audio and video features
$d(\text{reference}, \text{positive}) < d(\text{reference}, \text{negative})$
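Selecting a positive set by audio-visual agreement can be sketched as below. The exact selection rule here (a clip is a positive if it is among the k nearest neighbours of the reference in both the video space and the audio space) is a hypothetical choice for illustration, as are the names `cma_positive_set` and `top_k` and the toy features.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k largest values in an {index: score} dict."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def cma_positive_set(i, video, audio, k=2):
    """Hypothetical agreement rule: clip j is a positive for reference clip i
    if it is among the k nearest neighbours of i in BOTH the video space
    (v_i^T v_j) and the audio space (a_i^T a_j)."""
    others = [j for j in range(len(video)) if j != i]
    v_sim = {j: dot(video[i], video[j]) for j in others}
    a_sim = {j: dot(audio[i], audio[j]) for j in others}
    return top_k(v_sim, k) & top_k(a_sim, k)


video = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.85, 0.15]]
print(cma_positive_set(0, video, audio))  # {1}
```

Clip 2 is close to the reference only in video and clip 3 only in audio, so neither is kept; only clip 1, which agrees in both modalities, enters the positive set.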
Grouping using Audio-visual Agreements (CMA)
Positive set vs. negative set, selected using video similarity ($v_i^\top v_j$) and audio similarity ($a_i^\top a_j$)
[Figure: three example reference clips (Examples 1-3), each with its positives, visual negatives and audio negatives. Clip labels include Moving Train, Dancing, Playing Violin, Exercising, Fire Truck Station, Playing Guitar, Moving Boat, Fishing with background music and Playing Accordion.]
Pretext tasks, Generative, Contrastive/Clustering (Related vs. Unrelated)
[Figure: Standard pretext learning vs. Pretext-Invariant Representation Learning (PIRL). Standard: an image I is transformed by t into I^t, a ConvNet computes a representation of I^t, and the representation is used to predict a property of the transform t. PIRL: I and I^t each pass through the ConvNet, and their representations are encouraged to be similar.]