Adversarial Continual Learning (PowerPoint PPT Presentation)

SLIDE 1

Adversarial Continual Learning

  • Sayna Ebrahimi, UC Berkeley
  • Trevor Darrell, UC Berkeley
  • Marcus Rohrbach, Facebook AI Research
  • Roberto Calandra, Facebook AI Research
  • Franziska Meier, Facebook AI Research

SLIDE 2

What is Continual Learning?

Definition: learning a sequence of tasks and performing well on all of them

[Figure: tasks arriving in sequence: Task 1 → Task 2 → Task 3]

Objectives:

  • No forgetting of previously learned tasks
  • Data arrives as a stream; revisiting past data is not allowed or is limited
  • High knowledge transferability across tasks
  • Efficiency and scalability
  • No or limited task information at test time
  • etc.
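
The protocol above can be sketched with a toy model. Everything here (the tasks, the perceptron, the numbers) is made up for illustration; it only shows the mechanics of training on tasks one at a time without revisiting, then evaluating on all tasks seen so far.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(shift):
    """A toy binary task: 2-D Gaussian inputs around `shift`, label = sign of
    the first coordinate relative to the task's own center."""
    X = rng.normal(size=(200, 2)) + shift
    y = (X[:, 0] - shift[0] > 0).astype(int)
    return X, y

# Three tasks arrive one after another; old data is never revisited.
tasks = [make_task(np.array([0.0, 0.0])),
         make_task(np.array([3.0, 3.0])),
         make_task(np.array([-3.0, 1.0]))]

n = len(tasks)
R = np.zeros((n, n))          # R[i, t] = accuracy on task i after training task t

w, b = np.zeros(2), 0.0       # a single shared linear model (perceptron)
for t, (Xt, yt) in enumerate(tasks):
    for _ in range(20):       # train on the current task only
        for x, label in zip(Xt, yt):
            pred = int(x @ w + b > 0)
            w += 0.1 * (label - pred) * x
            b += 0.1 * (label - pred)
    for i in range(t + 1):    # then evaluate on every task seen so far
        Xi, yi = tasks[i]
        R[i, t] = np.mean((Xi @ w + b > 0).astype(int) == yi)

print(np.round(R, 2))         # drops below the diagonal reveal forgetting
```

The lower-triangular matrix R produced here is exactly the object the evaluation metrics later in the deck are computed from.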

SLIDES 3-6

Approaches in Continual Learning

Regularization-based: LWF (Li & Hoiem, 2016), LFL (Jung et al., 2016), EWC (Kirkpatrick et al., 2016), SI (Zenke et al., 2017), VCL (Nguyen et al., 2017), MAS (Aljundi et al., 2018), UCB (Ebrahimi et al., 2020)

Memory-based: Experience replay (Robins, 1995), GEM (Lopez-Paz & Ranzato, 2017), A-GEM (Chaudhry et al., 2019), iCaRL (Rebuffi et al., 2016), DGR (Shin et al., 2017), FearNet (Kemker et al., 2017)

Structure-based: PNN (Rusu et al., 2016), DEN (Yoon et al., 2018), PC (Schwarz et al., 2018), PackNet (Mallya & Lazebnik, 2018), Piggyback (Mallya et al., 2018), HAT (Serrà et al., 2018)
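
To make the regularization-based family concrete, here is a toy EWC-style penalty: after finishing task A, keep the old parameters and a per-parameter importance estimate (the diagonal of the Fisher information in EWC), and anchor important parameters near their old values while training task B. All numbers below are made up for illustration, not taken from any paper.

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC-style quadratic anchor: lam/2 * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_A = np.array([1.0, -2.0, 0.5])   # parameters after task A
fisher = np.array([10.0, 0.1, 5.0])    # importance: params 0 and 2 matter for task A
theta = np.array([1.1, 0.0, 0.5])      # candidate parameters while training task B

# Moving the unimportant param 1 is cheap; moving param 0 is penalized heavily.
print(ewc_penalty(theta, theta_A, fisher, lam=1.0))
```

This penalty is added to the new task's loss, which is what distinguishes the regularization family from methods that store data (memory) or grow/mask the network (structure).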

SLIDE 7

Where does our approach (ACL) stand?

ACL sits at the intersection of the Memory and Structure families.

SLIDE 8

Intuition

Tasks in a sequence:

  • have task-invariant (shared) knowledge in common
  • require task-specific (private) features to master them

How can we factorize task-invariant from task-specific features?

[Diagram: a Shared Knowledge (task-invariant) block alongside Private blocks for Task 1, Task 2, and Task 3]

SLIDES 9-14

Our Approach: Adversarial Continual Learning

[Architecture diagram: input images and task labels (x, t) feed a Shared (task-invariant) module producing zS and per-task Private modules (Task 1, Task 2, Task 3) producing zP. A Discriminator trained with the adversarial loss ℒadv tries to predict the task label from zS, pushing the shared features to be task-invariant; the loss ℒdiff keeps zS and zP factorized; zS and zP together predict the target label.]
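
A minimal numpy sketch of the two losses in the diagram. The linear discriminator, feature dimensions, and random features here are illustrative stand-ins, not the paper's architecture: ℒadv is the cross-entropy a discriminator pays for predicting the task from zS (the shared module is updated adversarially to maximize it), and ℒdiff penalizes overlap between the shared and private factors.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, T = 8, 16, 3                  # batch size, feature dim, number of tasks (toy values)

zS = rng.normal(size=(B, D))        # shared (task-invariant) features
zP = rng.normal(size=(B, D))        # private (task-specific) features
task = rng.integers(0, T, size=B)   # task label for each example

# Adversarial term: a toy linear discriminator with a softmax predicts the
# task label from zS. The discriminator minimizes this cross-entropy while
# the shared module is trained to maximize it, so zS ends up carrying as
# little task information as possible.
W_disc = rng.normal(scale=0.1, size=(D, T))
logits = zS @ W_disc
logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
L_adv = -np.log(probs[np.arange(B), task]).mean()

# Difference term: penalize correlation between shared and private features,
# here written as the squared Frobenius norm of zS^T zP, driving the two
# representations to stay factorized.
L_diff = np.sum((zS.T @ zP) ** 2)

print(float(L_adv), float(L_diff))
```

In a full implementation the maximization for the shared module is typically realized with a gradient-reversal step or an alternating min-max update; this sketch only evaluates the two terms.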

SLIDE 15

Avoiding Forgetting in ACL

Experience Replay

[Diagram: the Private modules (Task 1, Task 2, Task 3) are stored per task, so task-specific features cannot be overwritten; a small replay buffer of input images and task labels is replayed to the Shared (task-invariant) module and the Discriminator (ℒadv), while ℒdiff keeps zS and zP factorized and the combined features predict the target label.]
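
The per-task storage plus a small replay buffer can be sketched as follows. The capacity, the reservoir-style replacement, and the sampling policy are illustrative choices, not the paper's exact settings:

```python
import random
from collections import defaultdict

class ReplayBuffer:
    """Keep up to `capacity_per_task` examples per task (reservoir sampling
    within each task), and mix a few replayed samples into new batches."""

    def __init__(self, capacity_per_task=5):
        self.capacity = capacity_per_task
        self.store = defaultdict(list)    # task id -> list of (x, y)
        self.seen = defaultdict(int)      # task id -> examples seen so far

    def add(self, task_id, x, y):
        self.seen[task_id] += 1
        bucket = self.store[task_id]
        if len(bucket) < self.capacity:
            bucket.append((x, y))
        else:
            j = random.randrange(self.seen[task_id])   # reservoir replacement
            if j < self.capacity:
                bucket[j] = (x, y)

    def sample(self, k):
        pool = [(t, xy) for t, b in self.store.items() for xy in b]
        return random.sample(pool, min(k, len(pool)))

random.seed(0)
buf = ReplayBuffer(capacity_per_task=2)
for t in range(3):                        # three tasks, ten examples each
    for i in range(10):
        buf.add(t, f"x{t}_{i}", t)

replayed = buf.sample(4)                  # old-task samples for the next batch
print(len(replayed))
```

Because each task keeps only a handful of examples, the buffer stays small while the shared module keeps seeing data from earlier tasks.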

SLIDE 16

Experiments

Datasets:

  • miniImageNet (20 tasks)
  • CIFAR100 (20 tasks)
  • Split MNIST (5 tasks)
  • Permuted MNIST (10/20/30/40 tasks)
  • Sequence of 5 datasets (SVHN, CIFAR10, MNIST, FashionMNIST, NotMNIST)

Evaluation metrics:

  • Average Accuracy: ACC = (1/n) Σ_{i=1}^{n} R_{i,n}
  • Backward Transfer: BWT = (1/(n-1)) Σ_{i=1}^{n-1} (R_{i,n} - R_{i,i})
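
Both metrics come directly from the accuracy matrix R, where R[i, t] is test accuracy on task i after training through task t. A sketch, with a hypothetical 3-task matrix as input:

```python
import numpy as np

def acc_bwt(R):
    """ACC = mean accuracy over all tasks after the final task; BWT = mean
    change between a task's final accuracy R[i, n-1] and its accuracy right
    after it was learned, R[i, i], averaged over the first n-1 tasks."""
    ACC = R[:, -1].mean()
    BWT = (R[:-1, -1] - np.diag(R)[:-1]).mean()
    return ACC, BWT

# Hypothetical run: each task is learned well and then partially forgotten.
R = np.array([[0.90, 0.80, 0.70],
              [0.00, 0.92, 0.85],
              [0.00, 0.00, 0.88]])

ACC, BWT = acc_bwt(R)
print(round(ACC, 4), round(BWT, 4))   # negative BWT indicates forgetting
```

A BWT of 0 means no forgetting at all, which is what the ACL results below report.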

SLIDES 17-19

Results on 20-Split miniImageNet

Method              ACC (%)   BWT (%)   Arch. memory (MB)   Replay buffer (MB)
ACL (Ours)          62.07      0.00     113.1                 8.50
HAT                 59.45     -0.04     123.6                  -
PNN                 58.96      0.00     588.0                  -
ER-RES              57.32    -11.34       -                 110.10
A-GEM               52.43    -15.23       -                 110.10
Ordinary Finetune   28.76    -64.23       -                    -

SLIDE 20

Results on Sequence of 5-Datasets

(SVHN, CIFAR10, MNIST, FashionMNIST, NotMNIST)

# Classes: 50 | Training samples: 212,785 | Test samples: 48,365

SLIDES 21-22

Results on Sequence of 5-Datasets
(SVHN, CIFAR10, MNIST, FashionMNIST, NotMNIST)

Method              ACC (%)   BWT (%)   Arch. memory (MB)
ACL (Ours)          78.55     -0.01     16.5
UCB                 76.34     -1.34     32.8
Ordinary Finetune   27.32    -42.12     16.5

SLIDES 23-26

Ablation Study on 20-Split miniImageNet

Discriminator   ℒdiff   Replay buffer   ACC (%)   BWT (%)
X               X       -               52.07     -0.01
X               -       X               57.66     -3.71
-               X       X               60.28      0.00
X               X       X               62.07      0.00

SLIDES 27-33

Visualizing Adversarial Learning Effect (20-Split miniImageNet)

[Embedding visualizations of the shared and private features, with and without the discriminator, shown for task 20 and for tasks 1-10 (classes C1-C5, tasks T1-T10). With the discriminator, the private features form 10 well-separated task clusters; without it, the features collapse into 9 entangled clusters.]

SLIDE 34

Conclusion

  • ACL is primarily an architecture-based method, but can benefit from experience replay if need be
  • It uses adversarial learning and an orthogonality constraint to disentangle task-specific and task-invariant features
  • It achieves near-zero forgetting and state-of-the-art accuracy on image classification benchmarks

SLIDE 35

Paper: https://arxiv.org/pdf/2003.09553.pdf
Code: https://github.com/facebookresearch/Adversarial-Continual-Learning

  • Sayna Ebrahimi, UC Berkeley
  • Trevor Darrell, UC Berkeley
  • Marcus Rohrbach, Facebook AI Research
  • Roberto Calandra, Facebook AI Research
  • Franziska Meier, Facebook AI Research

Questions: sayna@berkeley.edu