Convolutional Prototype Ensemble Robust Stream Classification & Novel Class Detection (PowerPoint PPT Presentation)


SLIDE 1

UT DALLAS

Erik Jonsson School of Engineering & Computer Science

FEARLESS engineering

Convolutional Prototype Ensemble Robust Stream Classification & Novel Class Detection

Zhuoyi Wang*, Hemeng Tao*, Swarup Changra*, Latifur Khan*

* The University of Texas at Dallas, Richardson, TX, USA

This material is based upon work supported by

SLIDE 2

Agenda


❑ High Dimensional Data Stream Mining and Challenges
❑ Shortcomings of Current Solutions
❑ The Proposed Approach
  – Novel Class Detection
  – Classification
  – Incremental Learning
  – Performance Analysis & Improvement
❑ Experiments
❑ Discussion

SLIDE 3

High Dimensional Stream Mining

➢ High Dimensional Data Stream:

– Continuous flow of high dimensional instances.
– Common in real-life image recognition and text applications.

➢ Challenge:

➢ New classes may emerge during the stream.
➢ Limited amount of labeled data.
➢ Limited time for the execution of learning methods.


[Figures: a scene stream in an autonomous system; a flow of news summaries in a social network]

SLIDE 4

Evolving new class (Novel Class)

[Figures: a novel class in the traditional low dimensional space of the IRIS dataset, and in the high dimensional space of a real-world image dataset (FASHION-MNIST)]

Previous work (low dimensional): ECSMiner[1], SAND[2], ECHO[3]. Previous work (high dimensional): ODIN[4], Open-Set[5].

[1] Al-Khateeb, T., Masud, M. M., Khan, L., Aggarwal, C., Han, J., & Thuraisingham, B. "Stream classification with recurring and novel class detection using class-based ensemble." In ICDM 2012.
[2] Haque, Ahsanul, Latifur Khan, and Michael Baron. "SAND: Semi-supervised adaptive novel class detection and classification over data stream." In AAAI 2016.
[3] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data." In ICDE 2016.
[4] Liang, Shiyu, Yixuan Li, and R. Srikant. "Enhancing the reliability of out-of-distribution image detection in neural networks." In ICLR 2017.
[5] Bendale, Abhijit, and Terrance E. Boult. "Towards open set deep networks." In CVPR 2016.

SLIDE 5

Limitation of Time and Space

[Diagram: an ensemble of network models (M1, M2, ..., Ma, Mb, Mc), each trained on a class-set chunk Di; newly arriving stream instances are predicted by the ensemble, and a novel class chunk triggers training of a new model]

➢ Generating instances from novel/unseen class sets.

➢ Incrementally training the classifier ensemble on newly emerged class sets.

Previous work:

[1] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.
[2] Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." In CVPR 2017.

Note: Di contains instances from novel class set i


SLIDE 6

Shortcomings of Current Solutions


❖Shortcomings:

– Novel Class Detection: Traditional approaches such as SAND[1] and ECHO[2] are typically suitable only for low dimensional feature spaces, where novel class instances lie farther away from the clusters containing known class examples. More recent Deep Neural Network (DNN) based methods such as [3] and [4] use a DNN with a softmax output and a filtering threshold. However, the softmax function tends to assign new incoming samples to a known class with high confidence[5], so relying on the softmax output alone to reject novel classes is not sufficient.

[1] Haque, Ahsanul, Latifur Khan, and Michael Baron. "SAND: Semi-supervised adaptive novel class detection and classification over data stream." In AAAI 2016.
[2] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data." In ICDE 2016.
[3] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.
[4] Liang, Shiyu, Yixuan Li, and R. Srikant. "Enhancing the reliability of out-of-distribution image detection in neural networks." In ICLR 2017.
[5] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. "Intriguing properties of neural networks." In ICLR 2014.

SLIDE 7

Shortcomings of Current Solutions


Incremental ensemble of different DNNs or layers

❖Shortcomings:

– Incremental Learning: Current methods must also apply incremental learning to adapt to changes along a high dimensional stream over a long period of time. Typical DNN solutions mainly apply network ensembles [1] or fine-tuning. Shortcoming: growing either the DNN structure parameters or the layer embeddings during a continuous or lifelong learning scenario is both time and space consuming.

[1] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.

SLIDE 8

Motivation


We can model the data of each existing class as a Gaussian mixture component, so a novel class can be regarded as a distribution different from the existing ones, although it may bear some resemblance to the existing classes.

[Figure: existing class distributions vs. a potential novel class distribution]

Novel Class Detection can be addressed by filtering out anomalously large distances between distributions; Incremental Learning can be handled by adding new distributions and updating existing ones.

SLIDE 9

Proposed Approach: Prototype Ensemble Learning


✓ Novel Class Detection:

– Similar instances of a class form different prototypes under that class; outlier examples potentially form a new prototype associated with a novel class, which is easier to detect.

✓ Stream Classification:

– An ensemble of prototypes is trained as a classifier on different sections of the stream instances and used for classification.

✓ Incremental Learning:

– Create new prototypes according to novel class instances continuously during the stream process, then update the existing prototypes so they adapt to changes along the stream.

➢ A prototype is a class-characteristic distribution: if each class is regarded as a Gaussian mixture distribution, the prototypes act as the means of that class's Gaussian components.

[Figure: class "Car" represented by Prototype1, Prototype2, Prototype3, Prototype4]

SLIDE 10

Overview: CPE


SLIDE 11

Prototype Establish


We employ a Deep Neural Network architecture with convolutional layers. For a given input X, the output of the network is denoted by f(X; θ), where f is the feature representation and θ denotes the corresponding network parameters. For every class i, we select a small set of instances Di from D and form the exemplar set Ɛi. Then we form the initial prototypes:

Here, each prototype is denoted by 𝑞𝑗𝑘, where 𝑗 indicates a class label index in 𝑍 and 𝑘 is the prototype index. We denote the set of prototypes for class 𝑧𝑗 ∈ 𝑍 by 𝑄𝑗, so 𝑞𝑗𝑘 ∈ 𝑄𝑗.
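The prototype establishment step can be sketched as follows. The slide does not show the exact construction, so this minimal sketch assumes the per-class prototypes are placed as k-means centroids of the exemplar-set embeddings f(x; θ):

```python
import numpy as np

def init_prototypes(features, labels, k):
    """Form initial prototypes per class from exemplar-set features.

    features: (N, d) array of DNN embeddings f(x; theta) of exemplars
    labels:   (N,) integer class labels
    k:        prototypes per class (k-means centroids; the exact
              construction is an assumption, not shown on the slide)
    """
    prototypes = {}
    for c in np.unique(labels):
        feats = features[labels == c]
        rng = np.random.default_rng(0)
        # Seed centers with random exemplars, then run a few Lloyd iterations
        centers = feats[rng.choice(len(feats), size=min(k, len(feats)), replace=False)]
        for _ in range(10):
            assign = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for j in range(len(centers)):
                if np.any(assign == j):
                    centers[j] = feats[assign == j].mean(axis=0)
        prototypes[int(c)] = centers
    return prototypes
```

The dictionary returned maps each class label 𝑧𝑗 to its prototype set 𝑄𝑗 as a (k, d) array.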

SLIDE 12

Prototype Ensemble Loss


We focus on improving local separation between prototypes.

SLIDE 13

Overall loss function for Training


Similar to softmax/cross-entropy, the probability that x belongs to prototype 𝑞𝑗𝑘 is defined as

p(x ∈ 𝑞𝑗𝑘 | x) = exp(−‖f(x; θ) − 𝑞𝑗𝑘‖²) / Σ_{m=1..C} Σ_{n=1..K} exp(−‖f(x; θ) − 𝑞𝑚𝑛‖²),

where C is the size of the class set Y and K is the maximum number of prototypes per class. Therefore, the probability of assigning class label j to x is given by

p(y = j | x) = Σ_{k=1..K} p(x ∈ 𝑞𝑗𝑘 | x).
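A minimal sketch of this distance-based softmax, assuming squared Euclidean distance in the embedding space (the slide does not specify the distance measure):

```python
import numpy as np

def prototype_probs(f_x, prototypes):
    """Distance-based softmax over all prototypes, then class marginals.

    f_x:        (d,) embedding of instance x
    prototypes: dict class -> (k_c, d) array of prototypes
    Returns (per-prototype probabilities, per-class probabilities).
    """
    # Squared Euclidean distance from f_x to every prototype of every class
    dists = {c: ((P - f_x) ** 2).sum(axis=1) for c, P in prototypes.items()}
    all_d = np.concatenate(list(dists.values()))
    # Numerically stable softmax over the negative distances
    m = (-all_d).max()
    denom = np.exp(-all_d - m).sum()
    proto_p = {c: np.exp(-d - m) / denom for c, d in dists.items()}
    # p(y = c | x) = sum over that class's prototypes
    class_p = {c: p.sum() for c, p in proto_p.items()}
    return proto_p, class_p
```

Class probabilities sum to one, so classification is simply the argmax over `class_p`.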

SLIDE 14

Overall loss function for Training

➢ Overall objective function


Maximizing the probability of x being associated with a prototype in P can be regarded as a cross-entropy loss over prototypes; the second term acts as a regularizer of the loss function.
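The two roles named above can be sketched as follows; the exact regularizer and its weight `lam` are assumptions (the slide only names the roles), and squared Euclidean distance is assumed:

```python
import numpy as np

def cpe_loss(f_x, y, prototypes, lam=0.01):
    """Sketch of the overall objective for one instance.

    Cross-entropy term: -log p(y | x) under the distance-based softmax.
    Regularizer (assumed form): distance of f_x to its nearest
    prototype of the true class, pulling embeddings toward prototypes.
    """
    dists = {c: ((P - f_x) ** 2).sum(axis=1) for c, P in prototypes.items()}
    all_d = np.concatenate(list(dists.values()))
    log_denom = np.logaddexp.reduce(-all_d)    # log sum over all prototypes
    log_num = np.logaddexp.reduce(-dists[y])   # log sum over true-class prototypes
    ce = log_denom - log_num                   # -log p(y | x), always >= 0
    pl = dists[y].min()                        # prototype-pull regularizer
    return ce + lam * pl
```

In training this per-instance loss would be averaged over a mini-batch and back-propagated through both θ and the prototypes.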

SLIDE 15

Novel Class Detection

[Diagram: an incoming instance X is tested against the outlier thresholds of the ensemble of prototypes 𝑃𝑖1, 𝑃𝑖2, ..., 𝑃𝑖𝑘 (𝑄𝑗) for class i (Step 1); only if every prototype flags X as an outlier (logical AND is True) is X a potential novel class instance, otherwise X comes from class i (Step 2). A per-prototype threshold determines accept or reject.]


Pass X through the DNN to obtain its representation, then calculate and compare its distances to the prototypes. If the distance of X to its nearest prototype is larger than the corresponding threshold, we determine it to be a novel class instance.
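The rejection rule above can be sketched as follows; how the per-prototype thresholds are computed is not shown here, so they are taken as given, and squared Euclidean distance is assumed:

```python
import numpy as np

def is_novel(f_x, prototypes, thresholds):
    """Flag f_x as a potential novel class instance.

    f_x:        (d,) embedding of the incoming instance
    prototypes: dict class -> (k_c, d) array of prototypes
    thresholds: dict class -> (k_c,) per-prototype thresholds (assumed given)
    """
    best = None  # (distance, threshold) of the nearest prototype overall
    for c, P in prototypes.items():
        d = ((P - f_x) ** 2).sum(axis=1)
        j = int(d.argmin())
        if best is None or d[j] < best[0]:
            best = (d[j], thresholds[c][j])
    dist, thr = best
    # Novel only if even the nearest prototype rejects the instance
    return dist > thr
```

Flagged instances would go into the novel class candidate buffer rather than being assigned a known label.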

SLIDE 16

Incremental Learning


SLIDE 17


Prototype Based Incremental Learning

Then: apply back-propagation to update the parameters θ of the network model.

Establish New Prototype

[Timeline: Period 1 → Period 2 → Period 3]

SLIDE 18


Prototype Based Incremental Learning

Then: apply back-propagation to update the parameters θ of the network model. [Timeline: Period 1 → Period 2 → Period 3]

Update Existing Prototype
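The two prototype operations (establish new / update existing) can be sketched as below. The moving-average rate `lr` is an assumption, and in CPE this step is coupled with back-propagation on θ, which the sketch omits:

```python
import numpy as np

def update_prototypes(f_x, y, prototypes, lr=0.1):
    """One incremental step on the prototype set.

    If y is a previously unseen class, establish a new prototype at f_x;
    otherwise move the nearest prototype of class y toward f_x with an
    exponential moving average (rate lr is an assumed choice).
    """
    if y not in prototypes:
        prototypes[y] = f_x[None].copy()            # establish new prototype
        return prototypes
    P = prototypes[y]
    j = int(((P - f_x) ** 2).sum(axis=1).argmin())  # nearest prototype of class y
    P[j] = (1 - lr) * P[j] + lr * f_x               # update existing prototype
    return prototypes
```

Applied per labeled (or confirmed novel) stream instance, this keeps the prototype set adapting across periods without growing the network itself.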

SLIDE 19

Complexity


Time Complexity: Let t_C be the size of the novel class candidate buffer, let the cost of computing the gradient of one example be a constant D, let t_mini be the mini-batch size, n_e the number of epochs, and Z′ the number of classes in the stream. The time complexity of CPE is O(D · n_e · t_mini + t_C² · Z′).

Space Complexity: Constant, since the space used by the exemplars, prototypes, buffer, and network is constant.

SLIDE 20

Experiment


Name of Data Set    Number of Instances    Number of Features
FASHION-MNIST       70,000                 784
SVHN                100,000                3,072
CIFAR-10            60,000                 3,072
LSUN                80,000                 4,096
CINIC               106,110                4,096
New-York-News       66,000                 300

CPE setup:

  • 1. DenseNet as the DNN structure.
  • 2. M = 2000 exemplars, K = 10 (maximum number of prototypes per class).
SLIDE 21

Learned Representation and Classification


The upper plot visualizes the raw input images, the lower left plot shows the output of the softmax layer (used by HG-CNN etc.), and the lower right plot shows the output of CPE. Compared with HG-CNN or the original input, our method separates instances from novel classes well from the clusters containing known class instances.

We also show the effect of querying/requesting fewer potential novel class instances from the buffer in CPE: CPE does not request as many true labels as previous methods.

SLIDE 22

Performance: Novel Class Detection


Sample size (%)    soft-max         CPE
50                 93.82 ± 0.10     93.87 ± 0.12
30                 91.27 ± 0.15     92.69 ± 0.16
10                 88.36 ± 0.30     90.85 ± 0.18
5                  81.52 ± 0.52     85.77 ± 0.34

Novel Class detection performance over data streams. Here - denotes failure of concept evolution detection. Test accuracy (%) under different percentages of training samples.

SLIDE 23

Performance: Updating Time Analysis


The update time of CPE is more robust and efficient than that of traditional methods. When higher dimensional examples are used, the execution time of traditional methods grows rapidly; the update time of SENC-MaS or ECSMiner is significantly higher than ours. Using a smaller number of exemplars reduces the update time drastically.

Number of exemplars    Update time
1000                   58.66 ± 0.19
1500                   94.01 ± 0.24
2000                   115.45 ± 0.41
2500                   142.18 ± 0.55

SLIDE 24

Parameter Sensitivity


The results differ across margin values: if the margin value is too small, classification accuracy drops drastically because different classes end up relatively close to each other. This also affects novelty detection.

SLIDE 25

Conclusion


➢ CPE:

– Is a Deep Neural Network based multi-task learning framework that handles both novel class detection and incremental learning.
– Learns multiple prototypes per class in the representation feature space, making it flexible across tasks.
– Outperforms state-of-the-art approaches in both classification accuracy and novel class detection.

SLIDE 26


Thanks and Q&A