Self-ensembling for visual domain adaptation - Geoff French - PowerPoint PPT Presentation



SLIDE 1

Self-ensembling for visual domain adaptation

Geoff French – g.french@uea.ac.uk Colour Lab (Finlayson Lab) University of East Anglia, Norwich, UK

Image montages from http://www.image-net.org

SLIDE 2

Thanks to:

  • My supervisory team: Prof. G. Finlayson, Dr. M. Mackiewicz

  • Competition organisers and all participants

SLIDE 3

Described in more detail in our ICLR 2018 submission “Self-Ensembling for Visual Domain Adaptation” https://arxiv.org/abs/1706.05208 (v2)

SLIDE 4

Model

SLIDE 5

Self-ensembling was developed for semi-supervised learning in [Laine17] and further developed in [Tarvainen17] (the mean teacher model).

SLIDE 6

Mean-teacher model: a standard classifier DNN.

[Diagram: mean-teacher model. An input is stochastically augmented and passed through the student network and the teacher network. The student's prediction is compared to the label with cross-entropy; the student's and teacher's predicted probability vectors are compared with a squared difference; the two terms are combined in a weighted sum to give the loss.]

SLIDE 7

Mean-teacher model: the weights of the teacher network are an exponential moving average of the student network's weights.

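A minimal PyTorch-style sketch of this exponential-moving-average update (the function name and the smoothing value 0.99 are illustrative assumptions, not taken from the released code):

```python
import torch
import torch.nn as nn

def update_teacher(student: nn.Module, teacher: nn.Module, alpha: float = 0.99) -> None:
    """EMA update: teacher <- alpha * teacher + (1 - alpha) * student."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

# Usage sketch: the teacher starts as a copy of the student and is updated
# after every optimiser step taken on the student.
student = nn.Linear(10, 3)
teacher = nn.Linear(10, 3)
teacher.load_state_dict(student.state_dict())
update_teacher(student, teacher)
```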

SLIDE 8

Source domain sample: Traditional supervised cross-entropy loss (with data augmentation)


SLIDE 9

Target domain sample: one sample


SLIDE 10

Target domain sample: augment twice, differently each time (Gaussian noise, translation, flip)


SLIDE 11

Target domain sample: one path through the student network, a second through the teacher (different dropout)


SLIDE 12

Target domain sample: Result: two predicted probability vectors


SLIDE 13

Target domain sample: the self-ensembling loss trains the network to make them the same (squared difference)

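Putting the above together, a minimal PyTorch-style sketch of the loss (all names are ours, and the unsupervised loss weight is illustrative):

```python
import torch
import torch.nn.functional as F

def training_loss(student, teacher, x_src, y_src, x_tgt_a, x_tgt_b, unsup_weight=3.0):
    # Source sample: standard supervised cross-entropy on the student's logits.
    sup_loss = F.cross_entropy(student(x_src), y_src)

    # Target sample: two differently augmented views, one through the student
    # and one through the teacher (no gradient flows through the teacher).
    p_student = F.softmax(student(x_tgt_a), dim=1)
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x_tgt_b), dim=1)

    # Self-ensembling loss: squared difference between the two probability vectors.
    cons_loss = ((p_student - p_teacher) ** 2).sum(dim=1).mean()

    # Weighted sum of supervised and self-ensembling terms.
    return sup_loss + unsup_weight * cons_loss
```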

SLIDE 14

Self-ensembling performs label propagation over unsupervised samples

SLIDE 15

The model so far may handle simple domain adaptation tasks…

SLIDE 16

Our adaptations for domain adaptation

SLIDE 17

Separate source and target batches: per training iteration, process the source and target mini-batches separately. Each gets its own batch-norm statistics, a bit like AdaBN [Li16].
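Concretely this just means two forward passes per iteration instead of one concatenated batch; a small illustrative sketch (the network and tensor shapes are arbitrary):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
x_src = torch.randn(16, 3, 32, 32)   # source mini-batch
x_tgt = torch.randn(16, 3, 32, 32)   # target mini-batch

# Separate forward passes: each domain's batch-norm statistics are computed
# from its own mini-batch, rather than from a mixed source+target batch.
feats_src = net(x_src)
feats_tgt = net(x_tgt)
```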

SLIDE 18

Confidence thresholding: if the confidence of the teacher's prediction for a sample is below 96.8%, mask the self-ensembling loss for that sample to 0.
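A hedged sketch of this masking, assuming p_student and p_teacher are batches of softmax probability vectors (the function name is ours; the threshold is the 96.8% quoted above):

```python
import torch

def masked_consistency(p_student: torch.Tensor, p_teacher: torch.Tensor,
                       threshold: float = 0.968) -> torch.Tensor:
    """Per-sample squared-difference loss, zeroed where the teacher is unconfident."""
    conf, _ = p_teacher.max(dim=1)        # teacher's max class probability per sample
    mask = (conf > threshold).float()     # 1 where confident, 0 otherwise
    per_sample = ((p_student - p_teacher) ** 2).sum(dim=1)
    return (per_sample * mask).mean()
```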

SLIDE 19

More data augmentation. VisDA model: random crops, rotation, scale, h-flip; intensity/brightness scaling, colour offset, colour rotation, desaturation.
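For illustration only, a torchvision pipeline in this spirit might look roughly like the following; the parameter values are placeholders, not the settings actually used for the VisDA model:

```python
from torchvision import transforms

visda_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random crop + scale
    transforms.RandomRotation(10),                         # small random rotation
    transforms.RandomHorizontalFlip(),                     # h-flip
    transforms.ColorJitter(brightness=0.2,                 # intensity/brightness scaling
                           saturation=0.2,                 # desaturation
                           hue=0.05),                      # colour offset / rotation (approx.)
    transforms.ToTensor(),
])
```
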
SLIDE 20

More data augmentation. Our small image benchmarks use affine augmentation with the random matrix

$$\begin{pmatrix} 1 + \mathcal{N}(0,\,0.1) & \mathcal{N}(0,\,0.1) \\ \mathcal{N}(0,\,0.1) & 1 + \mathcal{N}(0,\,0.1) \end{pmatrix}$$
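A small NumPy sketch of sampling such a matrix (the function name is ours):

```python
import numpy as np

def random_affine_matrix(std: float = 0.1) -> np.ndarray:
    """Identity matrix plus independent N(0, std) perturbations to each entry."""
    return np.eye(2) + np.random.normal(0.0, std, size=(2, 2))
```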

SLIDE 21

Class balancing: a binary cross-entropy loss between the target domain predictions (averaged over the sample dimension) and a uniform probability vector.
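A minimal sketch of this loss, assuming p_student holds the student's softmax predictions for a target mini-batch (the function name is ours):

```python
import torch
import torch.nn.functional as F

def class_balance_loss(p_student: torch.Tensor) -> torch.Tensor:
    """BCE between the mean target-domain prediction and a uniform distribution."""
    n_classes = p_student.shape[1]
    mean_pred = p_student.mean(dim=0)                   # average over the sample dimension
    uniform = torch.full_like(mean_pred, 1.0 / n_classes)
    return F.binary_cross_entropy(mean_pred, uniform)   # element-wise BCE, averaged
```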

SLIDE 22

Class balancing: otherwise, with unbalanced datasets one class is reinforced more than the others; the classifier ends up separating the source domain from the target and assigning all target domain samples to the most populous class.

SLIDE 23

Works with randomly initialised nets, e.g. for small image benchmarks.

SLIDE 24

Works with pre-trained nets, e.g. the ResNet-152 we used for VisDA.

SLIDE 25

VisDA-17 Results

SLIDE 26

Images from VisDA-17

[Image montages from VisDA-17: training set (labeled), validation set (unlabeled)]

SLIDE 27

Model: fine-tuned ResNet-152. Remove the classification layer (after global pooling) and replace it with two fully-connected layers.
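A torchvision-based sketch of this architecture; the hidden width and activation are assumptions, not the exact configuration used:

```python
import torch.nn as nn
from torchvision import models

def build_visda_model(n_classes: int = 12) -> nn.Module:
    net = models.resnet152(pretrained=True)   # ImageNet-pretrained ResNet-152
    n_features = net.fc.in_features           # features after global average pooling
    net.fc = nn.Sequential(                   # replace the classifier with two FC layers
        nn.Linear(n_features, 512),
        nn.ReLU(inplace=True),
        nn.Linear(512, n_classes),
    )
    return net
```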

SLIDE 28

Notes: test set augmentation. Predictions were computed by augmenting each test sample 16× and averaging the predictions. This gained 1-2% MCA on the validation set.
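A hedged sketch of this test-time augmentation (names are hypothetical; augment stands for the stochastic augmentation function):

```python
import torch

def predict_with_tta(model, x, augment, n: int = 16) -> torch.Tensor:
    """Average softmax predictions over n independently augmented copies of x."""
    with torch.no_grad():
        preds = [torch.softmax(model(augment(x)), dim=1) for _ in range(n)]
    return torch.stack(preds).mean(dim=0)
```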

SLIDE 29

Notes: 5-network ensemble. Predictions of 5 independent training runs were averaged. This gained ~0.5% MCA on the test set.

SLIDE 30

VisDA-17

VALIDATION      Acc     TEST            Acc
Resnet-50       82.8    Resnet-50       -
Resnet-152      -       Resnet-152      -

SLIDE 31

VisDA-17

VALIDATION      Acc     TEST            Acc
Resnet-50       82.8    Resnet-50       -
Resnet-152      85.3*   Resnet-152      -

* Not on leaderboard

SLIDE 32

VisDA-17

VALIDATION      Acc     TEST            Acc
Resnet-50       82.8    Resnet-50       ~80
Resnet-152      85.3*   Resnet-152      -

* Not on leaderboard

SLIDE 33

VisDA-17

VALIDATION      Acc     TEST            Acc
Resnet-50       82.8    Resnet-50       ~80
Resnet-152      85.3*   Resnet-152      92.8

* Not on leaderboard

SLIDE 34

VisDA-17

VALIDATION      Acc     TEST            Acc
Resnet-50       82.8    Resnet-50       ~80
Resnet-152      85.3*   Resnet-152      92.8

        Plane  Bike   Bus    Car    Horse  Knife  MCycle Person Plant  Skbrd  Train  Truck  MEAN
Val     96.3   87.9   84.7   55.7   95.9   95.2   88.6   77.4   93.3   92.8   87.5   38.2   82.8
Test    96.9   92.4   92.0   97.2   95.2   98.8   86.3   75.3   97.7   93.3   94.5   93.3   92.8

* Not on leaderboard

SLIDE 35

Validation set: lots of confusion between car and truck; much less so on the test set.

SLIDE 36

Small image results

SLIDE 37

MNIST ⟷ USPS

Model                  USPS → MNIST    MNIST → USPS
Sup. on SRC            91.97           96.25
SBADA-GAN [Russo17]    97.60           95.04
OURS                   99.54           98.26
Sup. on TGT            99.62           97.83

[Example images: MNIST, USPS]

SLIDE 38

Syn-digits → SVHN

Model              Syn-digits → SVHN
Sup. on SRC        86.96
ATT [Saito17]      93.1
OURS               96.00
Sup. on TGT        95.55

[Example images: Syn-digits, SVHN]

SLIDE 39

Syn-signs → GTSRB

Model              Syn-signs → GTSRB
Sup. on SRC        96.72
ATT [Saito17]      96.2
OURS               98.32
Sup. on TGT        98.54

[Example images: Syn-signs, GTSRB]

SLIDE 40

SVHN (greyscale*) → MNIST

Model              SVHN → MNIST
Sup. on SRC        73.00
ATT [Saito17]      76.14
OURS               99.22
Sup. on TGT        99.66

[Example images: SVHN (grey), MNIST]
* [Ghiffary16]

SLIDE 41

MNIST → SVHN (greyscale)

Model                  MNIST → SVHN
Sup. on SRC            28.78
SBADA-GAN [Russo17]    61.08
OURS                   41.98
Sup. on TGT            96.68

[Example images: MNIST, SVHN (grey)]

SLIDE 42

MNIST → SVHN (greyscale)

Model                  MNIST → SVHN
Sup. on SRC (aug)      64.82
SBADA-GAN [Russo17]    61.08
OURS                   96.6
Sup. on TGT (aug)      97.3

[Example images: MNIST (aug), SVHN (grey)]

SLIDE 43

Conclusions

SLIDE 44

Our approach has produced good results: it won VisDA.

SLIDE 45

Promising avenue for domain adaptation: two components…

SLIDE 46

STEP 1: align the source and target distributions. Pre-trained net, data augmentation, … Prior work in the field (e.g. CORAL, AdaBN) does this!

SLIDE 47

STEP 2: refine the correspondence. Self-ensembling is well suited to this.

SLIDE 48

THANK YOU!

SLIDE 49

References

SLIDE 50

[Ghiffary16] Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. "Deep reconstruction-classification networks for unsupervised domain adaptation." ECCV 2016.

SLIDE 51

[Laine17] Samuli Laine and Timo Aila. "Temporal Ensembling for Semi-Supervised Learning." ICLR 2017.

SLIDE 52

[Li16] Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, and Xiaodi Hou. "Revisiting batch normalization for practical domain adaptation." 2016.

SLIDE 53

[Saito17] Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. "Asymmetric Tri-training for Unsupervised Domain Adaptation." 2017.

SLIDE 54

[Russo17] Paolo Russo, Fabio Maria Carlucci, Tatiana Tommasi, and Barbara Caputo. "From source to target and back: symmetric bi-directional adaptive GAN." 2017.

SLIDE 55

[Tarvainen17] Antti Tarvainen and Harri Valpola. "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results." 2017.