SLIDE 1

Memory Models for Incremental Learning Architectures

Viktor Losing, Heiko Wersing and Barbara Hammer

SLIDE 2

Outline

➢ Motivation
➢ Case study: Personalized Maneuver Prediction at Intersections
➢ Handling of Heterogeneous Concept Drift

SLIDE 3

Motivation

➢ Personalization

− adaptation to user habits / environments

➢ Lifelong-learning

SLIDE 4

Challenges - Personalized online learning

➢ Learning from few data

SLIDE 5

Challenges - Personalized online learning

➢ Learning from few data
➢ Sequential data with predefined order

SLIDE 6

Challenges - Personalized online learning

➢ Learning from few data
➢ Sequential data with predefined order
➢ Concept drift

SLIDE 7

Challenges - Personalized online learning

➢ Learning from few data
➢ Sequential data with predefined order
➢ Concept drift
➢ Cooperation between average and personalized model

SLIDE 8

Change is everywhere

➢ Coping with “arbitrary” changes

SLIDE 9

Change of taste / interest

SLIDE 10

Seasonal changes

SLIDE 11

Change of context

SLIDE 12

Rialto task: Change of lighting conditions

SLIDE 13

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

SLIDE 14

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

SLIDE 15

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

SLIDE 16

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

SLIDE 17

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

SLIDE 18

Setting

➢ Supervised stream classification

− Predict, for an incoming stream of features x1, …, xj with xi ∈ ℝⁿ, the corresponding labels y1, …, yj with yi ∈ {1, …, c}

➢ On-line learning scheme

− After each tuple (xi, yi), generate a new model hi to predict the next incoming example

Preconditions for application:

− Labels obtainable in retrospect
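
This on-line scheme is commonly evaluated with the interleaved test-then-train (prequential) protocol. The sketch below is a minimal illustration, not the authors' code; `stream` is any iterable of (x, y) pairs and `model` is any classifier exposing `predict` and `partial_fit` (both names are assumptions made here).

```python
def prequential_evaluation(stream, model):
    """Interleaved test-then-train: predict each example before learning it."""
    n_errors, n_seen = 0, 0
    for x_i, y_i in stream:
        y_pred = model.predict(x_i)        # test with the current model h_{i-1}
        n_errors += int(y_pred != y_i)
        model.partial_fit(x_i, y_i)        # train: incorporate (x_i, y_i) into h_i
        n_seen += 1
    return n_errors / max(n_seen, 1)       # prequential (test-then-train) error
```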

SLIDE 19

Definition

➢ Concept drift occurs when the joint distribution changes

∃𝑢0, 𝑢1: 𝑄𝑢0(𝑌, 𝑍) ≠ 𝑄𝑢1(𝑌, 𝑍)

SLIDE 20

Definition

∃𝑢0, 𝑢1: 𝑄𝑢0(𝑌, 𝑍) ≠ 𝑄𝑢1(𝑌, 𝑍)

➢ Concept drift occurs when the joint distribution changes

Image source : Gama et al. “A survey on concept drift adaptation”, ACM Computing Surveys 2014


SLIDE 21

Definition

➢ Concept drift occurs when the joint distribution changes

Image source : Gama et al. “A survey on concept drift adaptation”, ACM Computing Surveys 2014

[Diagram: between 𝑢0 and 𝑢1 the posterior 𝑄(𝑌|𝑍) changes: real drift]

∃𝑢0, 𝑢1: 𝑄𝑢0(𝑌, 𝑍) ≠ 𝑄𝑢1(𝑌, 𝑍)

SLIDE 22

Definition

∃𝑢0, 𝑢1: 𝑄𝑢0(𝑌, 𝑍) ≠ 𝑄𝑢1(𝑌, 𝑍)

➢ Concept drift occurs when the joint distribution changes

Image source : Gama et al. “A survey on concept drift adaptation”, ACM Computing Surveys 2014

[Diagram: between 𝑢0 and 𝑢1, real drift: the posterior 𝑄(𝑌|𝑍) changes; virtual drift: only the feature distribution 𝑄(𝑍) changes]

SLIDE 23

Definition

∃𝑢0, 𝑢1: 𝑄𝑢0(𝑌, 𝑍) ≠ 𝑄𝑢1(𝑌, 𝑍)

➢ Concept drift occurs when the joint distribution changes

Image source : Gama et al. “A survey on concept drift adaptation”, ACM Computing Surveys 2014

[Diagram: between 𝑢0 and 𝑢1, real drift: the posterior 𝑄(𝑌|𝑍) changes; virtual drift: only the feature distribution 𝑄(𝑍) changes]
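
Written out, the distinction in the diagram follows from factorizing the joint distribution into posterior and feature distribution. The block below is a reconstruction in the spirit of the cited survey, not copied from the slides.

```latex
% Joint distribution over labels Y and features Z at time u,
% factorized into posterior and feature distribution:
\[ Q_u(Y, Z) = Q_u(Y \mid Z)\, Q_u(Z) \]

% Concept drift: the joint distribution differs between two points in time:
\[ \exists\, u_0, u_1 : \; Q_{u_0}(Y, Z) \neq Q_{u_1}(Y, Z) \]

% Real drift:    Q_u(Y | Z) changes (the decision boundary itself moves).
% Virtual drift: Q_u(Z) changes while Q_u(Y | Z) stays the same.
```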

SLIDE 24

Related work

➢ Dynamic sliding window techniques

− PAW: Bifet et al., “Efficient Data Stream Classification Via Probabilistic Adaptive Windows”, ACM 2013

➢ Ensemble methods with various weighting schemes

− LVGB: Bifet et al., “Leveraging Bagging for Evolving Data Streams”, ECML-PKDD 2010
− Learn++.NSE: Elwell et al., “Incremental Learning in Non-Stationary Environments”, IEEE-TNN 2011
− DACC: Jaber et al., “Online Learning: Searching for the Best Forgetting Strategy Under Concept Drift”, ICONIP 2013

SLIDE 25

Related work

➢ Dynamic sliding window techniques

− PAW: Bifet et al., “Efficient Data Stream Classification Via Probabilistic Adaptive Windows”, ACM 2013

➢ Ensemble methods with various weighting schemes

− LVGB: Bifet et al., “Leveraging Bagging for Evolving Data Streams”, ECML-PKDD 2010
− Learn++.NSE: Elwell et al., “Incremental Learning in Non-Stationary Environments”, IEEE-TNN 2011
− DACC: Jaber et al., “Online Learning: Searching for the Best Forgetting Strategy Under Concept Drift”, ICONIP 2013

➢ Drawbacks:

− Target only specific drift types
− Require hyperparameters to be set according to the expected drift
− Discard former knowledge that may still be valuable

SLIDE 26

Drawbacks

SLIDE 27

Drawbacks – Usual result

SLIDE 28

Drawbacks – Desired behavior

SLIDE 29

Drawbacks – Desired behavior

SLIDE 30

Drawbacks – Desired behavior

SLIDE 31

Self Adaptive Memory (SAM)

SLIDE 32

Self Adaptive Memory (SAM)

[Architecture diagram: Short-Term Memory and Long-Term Memory, each classified with a kNN model]
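
As a rough outline of this two-memory architecture, the following sketch keeps a recent window (STM) and a preserved set (LTM), both queried with kNN. It is an illustration under assumptions made here (class name, fixed STM capacity, Euclidean distance), not the authors' implementation.

```python
from collections import deque
import numpy as np

class SelfAdaptiveMemorySketch:
    """Two memories, each queried with kNN:
    - STM: a sliding window holding the current concept,
    - LTM: compressed, STM-consistent knowledge from former concepts."""

    def __init__(self, k=5, max_stm=5000):
        self.k = k
        self.stm = deque(maxlen=max_stm)   # recent (x, y) pairs
        self.ltm = []                      # preserved older (x, y) pairs

    def _knn_vote(self, memory, x):
        # Plain majority vote among the k nearest neighbours (Euclidean).
        X = np.array([xi for xi, _ in memory])
        y = np.array([yi for _, yi in memory])
        order = np.argsort(np.linalg.norm(X - x, axis=1))[: self.k]
        labels, counts = np.unique(y[order], return_counts=True)
        return labels[np.argmax(counts)]

    def add(self, x, y):
        # New examples always enter the STM; maintenance of the LTM
        # (cleaning, compression) is sketched on the following slides.
        self.stm.append((np.asarray(x, dtype=float), y))

    def predict(self, x):
        # Query the STM by default; later slides select between STM, LTM
        # and their union based on their recent errors.
        x = np.asarray(x, dtype=float)
        return self._knn_vote(list(self.stm), x) if self.stm else None
```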

SLIDE 33

Self Adaptive Memory (SAM)

SLIDE 34

Moving squares dataset

SLIDE 35

STM size adaptation

SLIDE 36

STM size adaptation

[Plot annotation: error 27.12 %]

SLIDE 37

STM size adaptation

[Plot annotation: error 27.12 %, 13.12 %]

SLIDE 38

STM size adaptation

[Plot annotation: error 27.12 %, 13.12 %, 7.12 %]

SLIDE 39

STM size adaptation

[Plot annotation: error 27.12 %, 13.12 %, 7.12 %, 0.0 %]

SLIDE 40

STM size adaptation

[Plot annotation: error 27.12 %, 13.12 %, 7.12 %, 0.0 %]
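
One way to realize the error-driven size adaptation shown here is to evaluate a few candidate window sizes with the interleaved test-then-train error and keep the best one. The candidate set (repeated halving down to a minimum size) and the plain kNN evaluation below are assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def knn_interleaved_error(window, k=5):
    """Interleaved test-then-train kNN error within one window of (x, y) pairs."""
    errors = 0
    for i in range(1, len(window)):
        X = np.array([x for x, _ in window[:i]])
        y = np.array([label for _, label in window[:i]])
        dist = np.linalg.norm(X - window[i][0], axis=1)
        nearest = y[np.argsort(dist)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        errors += int(labels[np.argmax(counts)] != window[i][1])
    return errors / max(len(window) - 1, 1)

def adapt_stm_size(stm, min_size=50, k=5):
    """Keep the STM suffix whose size gives the lowest interleaved error.
    Candidate sizes are obtained by repeated halving (an assumption here)."""
    best, best_err = stm, knn_interleaved_error(stm, k)
    size = len(stm) // 2
    while size >= min_size:
        candidate = stm[-size:]
        err = knn_interleaved_error(candidate, k)
        if err < best_err:
            best, best_err = candidate, err
        size //= 2
    return best
```

Recomputing the error from scratch for every candidate is quadratic in the window size; a practical implementation would maintain these errors incrementally.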

SLIDE 41

Distance-based cleaning

SLIDE 42

Distance-based cleaning

[Diagram: the cleaning step yields STM-consistent data]

SLIDE 43

Distance-based cleaning

[Diagram: the STM and the data to clean]

SLIDE 44

Distance-based cleaning

[Diagram: the STM and the data to clean]

SLIDE 45

Distance-based cleaning

[Diagram: the STM and the data to clean]

SLIDE 46

Distance-based cleaning

[Diagram: the STM and the data to clean]

SLIDE 47

Distance-based cleaning

[Diagram: the STM and the data to clean]

SLIDE 48

Distance-based cleaning

[Diagram: the STM and the data to clean]
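
One plausible reading of the cleaning step illustrated above: samples whose label contradicts the STM within its local neighbourhood are discarded from the set to be cleaned. The concrete rule below (the distance to the k-th same-labelled STM neighbour defines the cleaning radius) is an assumption of this sketch.

```python
import numpy as np

def clean_against_stm(data_to_clean, stm, k=5):
    """Drop samples from `data_to_clean` that contradict the STM locally.

    For every STM sample (x, y), the distance to its k-th nearest STM
    neighbour with the same label defines a local radius; differently
    labelled samples of `data_to_clean` inside that radius are removed."""
    kept = list(data_to_clean)
    for x, y in stm:
        same = [xs for xs, ys in stm if ys == y and xs is not x]
        if not same:
            continue
        dist = np.sort(np.linalg.norm(np.array(same) - x, axis=1))
        radius = dist[min(k, len(dist)) - 1]   # k-th same-label neighbour distance
        kept = [(xc, yc) for xc, yc in kept
                if yc == y or np.linalg.norm(xc - x) > radius]
    return kept
```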

SLIDE 49

Adaptive compression

[Diagram: discarded data passes a cleaning step; only STM-consistent data enters the Long-Term Memory]

SLIDE 50

Adaptive compression

[Diagram: cleaning and class-wise clustering produce the STM-consistent, compressed data kept in the Long-Term Memory]
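
The class-wise clustering in the diagram can be sketched with k-means: each class in the LTM is replaced by cluster centroids, so the memory shrinks while the class-conditional structure is preserved. Using scikit-learn's KMeans and a fixed compression factor of one half are choices made here, not specifics from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_ltm(ltm, compression=0.5):
    """Compress the LTM by class-wise clustering: each class is replaced by
    k-means centroids, shrinking the memory by roughly `compression`."""
    compressed = []
    for label in sorted({y for _, y in ltm}):
        X = np.array([x for x, y in ltm if y == label])
        n_clusters = max(1, int(len(X) * compression))
        centers = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit(X).cluster_centers_
        compressed.extend((c, label) for c in centers)
    return compressed
```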

SLIDE 51

Prediction

SLIDE 52

Prediction

SLIDE 53

Prediction

SLIDE 54

Prediction

SLIDE 55

Moving squares by SAM

SLIDE 56

Results: Error rates / ranks

SLIDE 57

SAM achieves best results

SLIDE 58

SAM is robust

SLIDE 59

Reasons for robustness

➢ Adaptation guided through error minimization

− Dynamic size of the STM
− Model selection for prediction (see the sketch after this list)
− Reduction of hyperparameters

➢ Consistency between STM and LTM
➢ LTM acts as a safety net
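
A minimal sketch of what "model selection for prediction" could look like: keep a recent error estimate for each sub-model (for example STM, LTM, and their union) and answer each query with the currently best one. The exponential error discounting and the fixed set of sub-models are assumptions of this sketch, not the authors' exact weighting.

```python
class ModelSelector:
    """Track a discounted error per sub-model and predict with the best one."""

    def __init__(self, predictors, discount=0.99):
        # predictors: dict name -> callable(x) returning a label,
        # e.g. {"STM": stm_knn, "LTM": ltm_knn, "COMBINED": combined_knn}
        self.predictors = predictors
        self.discount = discount
        self.errors = {name: 0.0 for name in predictors}

    def predict(self, x):
        best = min(self.errors, key=self.errors.get)   # lowest recent error
        return self.predictors[best](x)

    def update(self, x, y_true):
        # Discount old errors and add each sub-model's mistake on (x, y_true).
        for name, predict in self.predictors.items():
            self.errors[name] = (self.discount * self.errors[name]
                                 + int(predict(x) != y_true))
```

In such a scheme the LTM is only consulted when it currently outperforms the STM, which matches the "safety net" role described above.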

SLIDE 60

Q & A