Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence


  1. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Arslan Chaudhry et al. Presented by Miloš Prágr, Pattern Recognition and Computer Vision Reading Group, Faculty of Electrical Engineering, Czech Technical University in Prague, January 14, 2020. M. Prágr 1 / 36

  2. Outline • Incremental Learning • Elastic Weight Consolidation • Path Integral • Riemannian Walk

  3. Incremental Learning • Online learning approaches use training samples one by one, without knowing their number in advance, to optimise an internal cost function. • Incremental learning refers to online learning strategies that work with limited memory resources. Gepperth and Hammer, Incremental learning algorithms and applications, ESANN 2016

  4. Challenges of Incremental Learning 1. Online model parameter adaptation 2. Concept drift 3. Stability-plasticity dilemma 4. Adaptive model complexity and meta-parameters 5. Efficient memory models 6. Model benchmarking Gepperth and Hammer, Incremental learning algorithms and applications, ESANN 2016

  5. Online Model Parameter Adaptation • The model is updated sample by sample: M_t ← update(M_{t−1}, (x_t, y_t)). Fritzke, A Growing Neural Gas Network Learns Topologies, NIPS 1994 (figure: medium.com/starschema-blog)
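The update rule above can be sketched as a streaming loop. This is a minimal illustration, not code from the presentation: the model (online least-squares via one SGD step per sample) and the `update` signature are assumptions chosen for brevity.

```python
# Minimal sketch of M_t <- update(M_{t-1}, (x_t, y_t)):
# an online least-squares model y ~ w*x + b, updated by one
# gradient step per incoming sample. Names are illustrative.

def update(model, sample, lr=0.1):
    """One SGD step on a single (x, y) pair for the loss 0.5 * err^2."""
    w, b = model
    x, y = sample
    err = (w * x + b) - y          # prediction error on this sample
    return (w - lr * err * x,      # gradient of 0.5*err^2 w.r.t. w
            b - lr * err)          # ... and w.r.t. b

model = (0.0, 0.0)                 # M_0
# A stream whose samples arrive one by one (true relation: y = 2x + 1).
stream = [(x, 2.0 * x + 1.0) for x in [0.5, 1.0, 1.5, 2.0] * 200]
for sample in stream:
    model = update(model, sample)  # M_t <- update(M_{t-1}, (x_t, y_t))
```

The key property of the online setting is visible in the loop: each sample is consumed once and only the current model `M_{t-1}` is kept in memory.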

  6. Concept Drift • The distribution underlying the data changes during learning. Webb et al., 2016

  7. Concept Drift • The distribution underlying the data changes during learning. • Covariate shift: the input distribution p(x) changes. Moreno-Torres et al., 2012

  8. Concept Drift • The distribution underlying the data changes during learning. • Concept shift: the conditional distribution p(y|x) changes. Moreno-Torres et al., 2012

  9. Stability-plasticity Dilemma • Quick updates cause old information to be forgotten equally quickly. • Gradual forgetting is a natural component of both artificial and natural systems. • Catastrophic forgetting: previously learned information is completely disrupted or erased. French, Catastrophic forgetting in connectionist networks, Trends in Cognitive Sciences 1999

  10. Adaptive Model Complexity and Meta-parameters • It is impossible to estimate the required model complexity in advance. • The minimal complexity is driven up by concept drift. • The maximal complexity is bounded by the available resources.

  11. Efficient Memory Models

  12. Model Benchmarking • Two evaluation settings: 1. Incremental vs non-incremental methods 2. Incremental vs incremental methods

  13. Motivation: Deployment of Incremental Learning [pipeline diagram: exteroception and proprioception → terrain descriptors → incrementally learned traversal cost model (Gaussian process committee fused by a robust Bayesian committee machine) → traversal cost and confidence maps over a 2.5D map → frontier selection, goal selection, and path planning] Online Incremental Learning of the Terrain Traversal Cost in Autonomous Exploration, RSS 2019

  14. Forgetting and Intransigence • Forgetting: catastrophically forgetting knowledge of previous tasks. • Intransigence: inability to update the knowledge to learn the new task. Chaudhry et al., Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence, ECCV 2018

  15. Forgetting and Intransigence Measures: Preliminaries • General setup: a stream of tasks, each corresponding to a set of labels. • The dataset for the k-th task is D_k = {(x_i^k, y_i^k)}_{i=1}^{n_k}, where k is the task identifier, x_i^k ∈ X are the inputs, and y_i^k ∈ Y the ground-truth labels. • Single-head evaluation: the task identity k is unknown at test time. • Multi-head evaluation: the task identity k is given at test time.


  17. Average Accuracy • Let a_{k,j} (j ≤ k) be the accuracy on the test set of the j-th task after training incrementally up to task k. • The average accuracy at task k is defined as A_k = (1/k) Σ_{j=1}^{k} a_{k,j}.
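The definition above can be computed directly from an accuracy matrix a_{k,j}. The numbers below are made up for illustration; only the formula is from the slides.

```python
# Illustrative accuracy matrix: a[(k, j)] is the accuracy on the test
# set of task j after training incrementally up to task k (j <= k).
a = {
    (1, 1): 0.90,
    (2, 1): 0.80, (2, 2): 0.85,
    (3, 1): 0.70, (3, 2): 0.75, (3, 3): 0.88,
}

def average_accuracy(a, k):
    """A_k = (1/k) * sum_{j=1..k} a_{k,j}."""
    return sum(a[(k, j)] for j in range(1, k + 1)) / k
```

For example, A_3 averages the current accuracies on all three tasks seen so far, 0.70, 0.75, and 0.88.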

  18. Forgetting Measure • Forgetting of the j-th task after training up to task k is f_j^k = max_{l ∈ {1,…,k−1}} a_{l,j} − a_{k,j}, defined for j < k. • The average forgetting at the k-th task is defined as F_k = (1/(k−1)) Σ_{j=1}^{k−1} f_j^k. • Backward transfer: the influence that learning task k has on the performance on a task j < k; f_j^k < 0 implies positive backward transfer, i.e. the performance on a previous task was improved by learning additional tasks.
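These two formulas can be sketched on the same style of accuracy matrix as before (the numbers are made up; note that a_{l,j} only exists for l ≥ j, so the max effectively runs over l ∈ {j,…,k−1}).

```python
# Illustrative accuracy matrix: a[(k, j)] is the accuracy on task j
# after training incrementally up to task k (j <= k).
a = {
    (1, 1): 0.90,
    (2, 1): 0.80, (2, 2): 0.85,
    (3, 1): 0.70, (3, 2): 0.75, (3, 3): 0.88,
}

def forgetting(a, k, j):
    """f_j^k = max_{l in 1..k-1} a_{l,j} - a_{k,j}, defined for j < k.

    a_{l,j} only exists for l >= j, so the max runs over l in j..k-1.
    """
    return max(a[(l, j)] for l in range(j, k)) - a[(k, j)]

def average_forgetting(a, k):
    """F_k = (1/(k-1)) * sum_{j=1..k-1} f_j^k."""
    return sum(forgetting(a, k, j) for j in range(1, k)) / (k - 1)
```

In this example task 1 was best known right after it was learned (a_{1,1} = 0.90) and has dropped to 0.70 by task 3, so f_1^3 = 0.20; a negative value would instead indicate positive backward transfer.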

  19. Intransigence Measure • The reference model accuracy a*_k is obtained by a model trained on the whole dataset ∪_{l=1}^{k} D_l. • Intransigence at the k-th task is defined as I_k = a*_k − a_{k,k}. • I_k < 0 implies positive forward transfer: learning incrementally up to task k positively influences the model's knowledge about it.
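The measure itself is a single difference; a toy check with made-up accuracies:

```python
def intransigence(a_star_k, a_kk):
    """I_k = a*_k - a_{k,k}."""
    return a_star_k - a_kk

# Made-up numbers: the jointly trained reference model reaches 0.92 on
# task 3, the incrementally trained model only 0.88, so I_3 > 0 (the
# incremental learner is intransigent); I_3 < 0 would indicate
# positive forward transfer.
I_3 = intransigence(0.92, 0.88)
```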

  20. Outline • Incremental Learning • Elastic Weight Consolidation • Path Integral • Riemannian Walk

  21. Elastic Weight Consolidation • Motivation: continual learning in the neocortex relies on task-specific synaptic consolidation, where knowledge is encoded by rendering a proportion of synapses less plastic. • Remember old tasks by selectively slowing down learning on the weights important for those tasks. • Aim for fast learning rates on parameters unconstrained by the previous tasks and slow rates for the crucial parameters. Kirkpatrick et al., Overcoming catastrophic forgetting in neural networks, PNAS 2017

  22. Elastic Weight Consolidation • Remember old tasks by selectively slowing down learning on the weights important for those tasks. • Given dataset D, select the configuration θ* = argmax_θ p(θ|D). • Bayes' rule gives the log posterior as log p(θ|D) = log p(D|θ) + log p(θ) − log p(D), where log p(D|θ) is the negative loss function −L(θ).

  23. Elastic Weight Consolidation • Splitting the data into tasks A and B gives log p(θ|D) = log p(D_B|θ) + log p(θ|D_A) − log p(D_B), where log p(D_B|θ) is the negative loss function for task B, −L_B(θ), and p(θ|D_A) is the intractable posterior of task A. • The posterior is approximated as a Gaussian distribution N(θ*_A, (diag(F))^{−1}) [MacKay, A practical Bayesian framework for backpropagation networks, Neural Computation 1992], where the precision diag(F) is the diagonal of the Fisher information matrix F, defined as [F]_{ij} = E_{(x,y)∼D}[(∂/∂θ_i log p_θ(y|x)) (∂/∂θ_j log p_θ(y|x))].
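The diagonal of F can be estimated empirically as the mean squared score over the data. The sketch below is not the authors' code: it assumes, for concreteness, a logistic-regression model p_θ(y=1|x) = σ(θ·x), whose score is ∂/∂θ log p_θ(y|x) = (y − σ(θ·x)) x.

```python
import numpy as np

def fisher_diagonal(theta, X, Y):
    """Empirical diag(F): mean of the squared per-sample score,
    E[(d/d theta_i log p_theta(y|x))^2], for a logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))   # p(y=1|x) = sigmoid(theta . x)
    scores = (Y - p)[:, None] * X          # per-sample score vectors
    return np.mean(scores ** 2, axis=0)    # one nonnegative entry per parameter

# Synthetic data drawn from the model itself (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta = np.array([1.0, -1.0, 0.5])
Y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ theta))).astype(float)
diag_F = fisher_diagonal(theta, X, Y)      # length-3 vector, all entries >= 0
```

Being an average of squares, each diagonal entry is nonnegative, consistent with F being positive semidefinite.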

  24. Elastic Weight Consolidation • The Fisher information measures the sensitivity of the function f(x|θ) to changes of θ. • The posterior is approximated as a Gaussian distribution N(θ*_A, (diag(F))^{−1}), where the precision diag(F) is the diagonal of the Fisher information matrix [F]_{ij} = E_{(x,y)∼D}[(∂/∂θ_i log p_θ(y|x)) (∂/∂θ_j log p_θ(y|x))]. • Near a minimum, the Fisher matrix is equivalent to the second derivative of the loss, and it is always positive semidefinite. Pascanu and Bengio, Revisiting natural gradient for deep networks, 2013 • The loss function to be minimized is L(θ) = L_B(θ) + Σ_i (λ/2) F_ii (θ_i − θ*_{A,i})².
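The combined objective can be sketched in a few lines. This is an illustration of the quadratic EWC penalty on a toy two-parameter problem, not the paper's implementation; the task-B loss, the Fisher values, and all names are made up.

```python
import numpy as np

def ewc_loss(theta, loss_B, fisher_diag, theta_star_A, lam=1.0):
    """L(theta) = L_B(theta) + sum_i (lam/2) * F_ii * (theta_i - theta*_{A,i})^2."""
    penalty = 0.5 * lam * np.sum(fisher_diag * (theta - theta_star_A) ** 2)
    return loss_B(theta) + penalty

theta_star_A = np.array([1.0, -2.0])        # parameters learned on task A
fisher_diag = np.array([4.0, 0.1])          # parameter 0 matters far more for task A
loss_B = lambda th: float(np.sum(th ** 2))  # toy task-B loss, minimized at the origin

# Moving the high-Fisher parameter away from theta*_A is penalized much
# more than moving the low-Fisher one the same distance:
l_move_important = ewc_loss(np.array([0.0, -2.0]), loss_B, fisher_diag, theta_star_A)
l_move_unimportant = ewc_loss(np.array([1.0, 0.0]), loss_B, fisher_diag, theta_star_A)
```

This is exactly the "elastic" behaviour of the slide: parameters crucial for task A (large F_ii) are anchored near θ*_A, while unconstrained parameters (small F_ii) remain free to fit task B.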
