
A Small Step to Remember: Study of Single Model vs Dynamic Model
Liguang Zhou
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen
Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS)


1. A Small Step to Remember: Study of Single Model vs Dynamic Model
Liguang Zhou, School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS)
IROS 2019, LL Object Recognition
November 4, 2019

2. Overview
Introduction
Elastic Weight Consolidation (EWC) - Single Model
Learning without Forgetting (LwF) - Dynamic Model
Experiments
Conclusion

3. Introduction: Competition Details

4. Introduction
In robotics, the incremental learning of various objects is an essential problem for robot perception. When many tasks have to be trained in sequence, DNNs suffer from the catastrophic forgetting problem. One way to avoid it is multi-task training, in which all tasks are trained concurrently; this solution can be regarded as the upper bound for the lifelong learning problem. However, in practice, retraining the DNNs from scratch every time a new task arrives is inefficient and wastes a lot of computing resources.


6. Introduction
Therefore, alternative methods for this lifelong learning problem have been proposed, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), generative methods, and so on. EWC is a single-model approach that uses the Fisher Information Matrix, which is related to the second derivative (curvature) of the log-likelihood, to preserve the important parameters of previous tasks during training. LwF is a dynamic-model approach that preserves the memory of previous tasks by expanding the network and introducing a knowledge distillation loss.

7. Single Model: Elastic Weight Consolidation (EWC)
Figure 1: The learning sequence is from task A to task B.
We assume that some parameters in a DNN are less useful while others are more valuable. In plain sequential training, every parameter is treated equally. In EWC, we use the diagonal components of the Fisher Information Matrix to identify how important each parameter is to task A and weight the penalty on each parameter accordingly.

8. Single Model: L2 Case
To avoid forgetting the knowledge learned on task A, one simple trick is to minimize the distance between θ and θ*_A, which can be regarded as an L2 penalty:

\theta^{*} = \operatorname{argmin}_{\theta} \; L_B(\theta) + \frac{1}{2}\,\alpha\,\lVert \theta - \theta^{*}_{A} \rVert^{2} \quad (1)

In the L2 case, every parameter is treated equally, which is not a wise choice because the sensitivity of the loss to each parameter varies a lot. The assumption behind EWC is that the importance of individual parameters differs greatly. Hence, the diagonal components of the Fisher Information Matrix are used to weight the importance of each parameter.
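As a concrete illustration of Eq. (1), here is a minimal PyTorch-style sketch of the L2 penalty; model, old_params, task_b_loss, and alpha are illustrative names and not part of the original slides. The task-A solution is snapshotted once, and every parameter is pulled back toward it with the same weight alpha:

import torch

def l2_penalty(model, old_params, alpha=1.0):
    # (1/2) * alpha * ||theta - theta*_A||^2, with every parameter weighted equally
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + ((p - old_params[name]) ** 2).sum()
    return 0.5 * alpha * penalty

# Snapshot taken right after task A finishes:
# old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# Total loss while training on task B:
# loss = task_b_loss + l2_penalty(model, old_params, alpha=1.0)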

9. Single Model: Close Look at EWC
Bayes' rule:

\log p(\theta \mid \mathcal{D}) = \log p(\mathcal{D} \mid \theta) + \log p(\theta) - \log p(\mathcal{D}) \quad (2)

Assume the data is split into two parts, one defining task A (D_A) and the other defining task B (D_B). We obtain:

\log p(\theta \mid \mathcal{D}) = \log p(\mathcal{D}_B \mid \theta) + \log p(\theta \mid \mathcal{D}_A) - \log p(\mathcal{D}_B) \quad (3)

Fisher Information Matrix. The objective becomes

\theta^{*} = \operatorname{argmin}_{\theta} \; L_B(\theta) + \frac{1}{2}\,\alpha \sum_i F_{\theta^{*}_{A}, i}\,\bigl(\theta_i - \theta^{*}_{A,i}\bigr)^{2}

with the Fisher Information Matrix estimated from the task-A data as

F_{\theta^{*}_{A}} = \frac{1}{N} \sum_{i=1}^{N} \nabla_{\theta} \log p\bigl(x_{A,i} \mid \theta^{*}_{A}\bigr)\, \nabla_{\theta} \log p\bigl(x_{A,i} \mid \theta^{*}_{A}\bigr)^{T}

Loss function, where L_B is the loss for task B only and λ indicates how important the old task is:

L(\theta) = L_B(\theta) + \sum_i \frac{\lambda}{2}\, F_i\,\bigl(\theta_i - \theta^{*}_{A,i}\bigr)^{2} \quad (4)
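The following rough PyTorch-style sketch shows how the diagonal Fisher above could be estimated empirically and plugged into the penalty of Eq. (4). It is a sketch under stated assumptions (the loader is assumed to yield one sample per batch so that squared gradients are per-sample, and the lambda value is illustrative), not the implementation used in the experiments:

import torch
import torch.nn.functional as F

def estimate_diag_fisher(model, loader_a, device="cpu"):
    # Empirical diagonal Fisher: average of squared gradients of the log-likelihood on task A.
    # Assumes loader_a yields batches of size 1 so each squared gradient is a per-sample term.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    n_samples = 0
    for x, y in loader_a:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        F.nll_loss(log_probs, y, reduction="sum").backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_samples += x.size(0)
    return {n: f / n_samples for n, f in fisher.items()}

def ewc_penalty(model, fisher, theta_a, lam=100.0):
    # (lambda / 2) * sum_i F_i * (theta_i - theta*_{A,i})^2, as in Eq. (4).
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - theta_a[n]) ** 2).sum()
    return 0.5 * lam * penalty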

10. Dynamic Model: Learning without Forgetting (LwF)
θ_s: a set of shared parameters for the CNN (e.g., five convolutional layers and two fully connected layers for the AlexNet [3] architecture)
θ_o: task-specific parameters for previously learned tasks (e.g., the output layer for ImageNet [4] classification and the corresponding weights)
θ_n: randomly initialized task-specific parameters for new tasks
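To make the parameter split concrete, here is a rough PyTorch/torchvision sketch of a two-head network: a shared trunk (θ_s), the preserved old-task head (θ_o), and a freshly initialized new-task head (θ_n). The layer grouping and class counts are illustrative assumptions:

import torch.nn as nn
from torchvision import models

class LwFNet(nn.Module):
    def __init__(self, num_old_classes, num_new_classes):
        super().__init__()
        backbone = models.alexnet(weights=None)
        self.shared_conv = backbone.features                                         # theta_s: shared conv layers
        self.pool = backbone.avgpool
        self.shared_fc = nn.Sequential(*list(backbone.classifier.children())[:-1])   # theta_s: shared FC layers
        self.old_head = nn.Linear(4096, num_old_classes)                             # theta_o: kept for previous tasks
        self.new_head = nn.Linear(4096, num_new_classes)                             # theta_n: randomly initialized

    def forward(self, x):
        h = self.pool(self.shared_conv(x)).flatten(1)
        h = self.shared_fc(h)
        return self.old_head(h), self.new_head(h)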

11. Dynamic Model

12. Dynamic Model: Close Look at LwF
Figure 2: The details of the algorithm.
R: regularization term to avoid overfitting.

13. Dynamic Model: Loss Functions
New-task loss:

L_{\mathrm{new}}(y_n, \hat{y}_n) = -\, y_n \cdot \log \hat{y}_n \quad (5)

where y_n is the one-hot ground-truth label vector and \hat{y}_n is the softmax output of the network.

Knowledge distillation loss:

L_{\mathrm{old}}(y_o, \hat{y}_o) = - H\bigl(y'_o, \hat{y}'_o\bigr) = - \sum_{i=1}^{l} y'^{(i)}_o \log \hat{y}'^{(i)}_o

where l is the number of labels, y^{(i)}_o is the ground-truth/recorded probability, and \hat{y}^{(i)}_o is the current/predicted probability. The temperature-scaled probabilities are

y'^{(i)}_o = \frac{\bigl(y^{(i)}_o\bigr)^{1/T}}{\sum_j \bigl(y^{(j)}_o\bigr)^{1/T}}, \qquad \hat{y}'^{(i)}_o = \frac{\bigl(\hat{y}^{(i)}_o\bigr)^{1/T}}{\sum_j \bigl(\hat{y}^{(j)}_o\bigr)^{1/T}}

where T is the distillation temperature.
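A minimal PyTorch-style sketch of combining the two losses above; T and the weighting lam are hyperparameters assumed for illustration, and recorded_old_logits stands for the old network's responses on the new data recorded before training. Raising softmax probabilities to the power 1/T and renormalizing is equivalent to applying softmax to logits divided by T, which is what the code does:

import torch
import torch.nn.functional as F

def lwf_loss(new_logits, old_logits, recorded_old_logits, targets, T=2.0, lam=1.0):
    # L_new: cross-entropy with the one-hot ground truth of the new task (Eq. 5)
    l_new = F.cross_entropy(new_logits, targets)
    # L_old: knowledge distillation between temperature-scaled recorded and current old-task outputs
    y_old = F.softmax(recorded_old_logits / T, dim=1)    # y'_o, recorded responses of the old network
    log_y_hat = F.log_softmax(old_logits / T, dim=1)     # log of the current predictions (hat y'_o)
    l_old = -(y_old * log_y_hat).sum(dim=1).mean()
    return l_new + lam * l_old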

14. Experiment Results: Settings and Results
Settings: ResNet-101 is used as the base model. The tasks are first trained sequentially on the training set. The total number of training epochs over the whole dataset is about 12 * 2 (roughly two epochs for each task).
Figure 3: Training with different methods and configurations; the x-axis shows the task name and the average accuracy, while the y-axis shows the accuracy.
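For reference, a minimal sketch of the plain sequential fine-tuning baseline described in the settings (assumed PyTorch/torchvision; the single shared classification head, optimizer settings, and list of per-task dataloaders are illustrative assumptions, not details given in the slides):

import torch
from torch import nn, optim
from torchvision import models

def train_sequentially(task_loaders, num_classes, epochs_per_task=2, device="cuda"):
    model = models.resnet101(weights=None)                 # pretrained weights could be loaded here
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model = model.to(device)
    opt = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for loader in task_loaders:                            # tasks arrive one after another
        for _ in range(epochs_per_task):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()
                opt.step()
    return model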

15. Experiment Results: Conclusion
We first trained the tasks sequentially and obtained a 93.33% average accuracy on the validation set across tasks 1 to 12. However, during the training process the accuracy tested on the validation set of the current task is nearly 100%, which means the model is suffering from catastrophic forgetting under sequential training. EWC was then employed in the training process; however, the result got worse.
Sequential training suffers from the catastrophic forgetting problem.
Fewer training epochs outperform more training epochs.
EWC training gives a worse result, possibly because the estimate of the Fisher Information Matrix is biased.
In the future, we will focus on dynamic graphs for better preserving the memory of previous tasks.
