

SLIDE 1

1/32

Stress Classification: A Deep Stacked Autoencoder Approach

Yusuf Gandhi Putra

Faculty of Information Technology Universitas Advent Indonesia

2017

SLIDE 2

2/32

Table of Contents

1. Introduction
2. Literature Review
3. Dataset
4. Experiment
5. Conclusions

SLIDE 3

3/32

Introduction

SLIDE 4

4/32

Introduction

Stress is an inseparable part of modern people’s lives. From personal problems to pressure in the workplace, people encounter stress stimuli in challenging circumstances within their daily activities. Stress is considered harmful to personal well-being and may negatively impact both physical and mental health. It is known as a factor that can lead to unfavourable states, even terminal illnesses (Sharma et al., 2013). Understanding stress better has become crucial, as it may help prevent its negative effects on people’s well-being. Several studies have reported using facial data to detect stress. Capturing these types of data uses non-intrusive methods, e.g. recording facial patterns with video or thermal cameras. Despite

SLIDE 5

5/32

Introduction

the unconventional way of collecting data for stress identification, these studies show promising results. One study successfully identified stress using thermal spectrums (TS), exploiting changes in skin temperature (Yuen et al., 2009). Furthermore, a number of studies use visual spectrums (VS) to model emotion (Dhall et al., 2011) and depression (Joshi et al., 2012). In a more recent study taking this non-obtrusive approach, Sharma et al. attempted to classify stress based on facial patterns using a combination of VS and TS (Sharma et al., 2013) from the ANU StressDB. Building on Sharma’s work, we attempt to classify stress from thermal images using a deep neural network approach.

SLIDE 6

6/32

Literature Review

SLIDE 7

7/32

Neural networks

Artificial Neural Networks (ANNs) are a computational model inspired by how human brains process information (Rojas, 2013). This model utilizes neurons, the basic building blocks of ANNs. These neurons are highly interconnected to simulate the human brain’s mechanisms for solving problems. McCulloch and Pitts’s work in 1943 is considered the first attempt at building neural networks and laid the foundation of the field (Abraham, 2002). Their model of neurons, known as McCulloch–Pitts (MCP) neurons, was built on several assumptions about how actual neurons work. One of these assumptions was that the activity of a neuron is an “all-or-none” process (McCulloch and Pitts, 1943), which is represented in modern neural network models by the use of activation functions.

SLIDE 8

8/32

Neural networks

A neuron, the basic building block of neural networks, is a computational unit that takes x_1, x_2, . . . , x_n as inputs. This computational unit sums all the inputs multiplied by their respective weights w_1, w_2, . . . , w_n, adds a bias b, and passes the result through an activation function f to produce its output:

h_{W,b}(x) = f(W^T x) = f(Σ_{i=1}^{n} W_i x_i + b).

Figure 1: Neuron architecture
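As a concrete illustration, the weighted-sum-plus-activation computation above can be sketched in plain Python (a hypothetical sketch; the presentation itself includes no code, and the tanh activation here is just one common choice of f):

```python
import math

def neuron(x, w, b, f=math.tanh):
    # h_{W,b}(x) = f(sum_{i=1}^{n} w_i * x_i + b):
    # the weighted sum of the inputs plus a bias, passed
    # through the activation function f.
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Example: three inputs with arbitrary weights and bias.
h = neuron([1.0, 2.0, 3.0], [0.1, -0.2, 0.3], 0.5)  # f(0.6 + 0.5) = tanh(1.1)
```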

SLIDE 9

9/32

Autoencoders

Autoencoders, sometimes referred to as autoassociative neural networks, autoassociators, or Diabolo networks (Scholz, 2012; Hinton and Salakhutdinov, 2006; Bengio, 2009), are neural networks trained to replicate their inputs at the output layer. An autoencoder attempts to produce h_{W,b}(x) ≈ x, i.e., it tries to approximate the identity function. Although approximating the identity function seems trivial, constraining the number of hidden neurons forces the autoassociative network to compress the input and, as a result of this compression, produce a lower-dimensional representation of the inputs (Ng, 2011).
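A minimal forward pass of such an autoencoder can be sketched in plain Python (a hypothetical sketch with made-up layer sizes and a sigmoid activation; the point is only that the bottleneck has fewer units than the input, while the decoder output has the same dimensionality as the input it tries to reproduce):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b):
    # Fully connected layer: each row of W and entry of b
    # produces one output neuron.
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def autoencoder(x, W_enc, b_enc, W_dec, b_dec):
    # The encoder compresses x into fewer hidden units; the
    # decoder expands back, trying to achieve h_{W,b}(x) ≈ x.
    h = layer(x, W_enc, b_enc)      # e.g. 4 inputs -> 2 hidden units
    return layer(h, W_dec, b_dec)   # 2 hidden units -> 4 outputs

def reconstruction_error(x, x_hat):
    # Mean squared error between input and reconstruction,
    # the quantity that training would minimize.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
```

Training (backpropagating the reconstruction error into the weights) is omitted here; only the compress-then-reconstruct structure is shown.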

SLIDE 10

10/32

Autoencoders

Figure 2: Architecture of autoencoders

SLIDE 11

11/32

Stress Classification

Stress has become an inseparable part of modern people’s lives. It has become important to understand it better, as stress may cause undesirable conditions, e.g. physical illnesses. Thus, stress identification has been explored in numerous studies, including within the Computer Science field. Various machine learning techniques have been employed to this end, ranging from decision trees to Support Vector Machines (SVMs). The measures used as stress inputs vary as well, e.g. brain activity and skin response. These measures, however, often require obtrusive methods, i.e. wearing sensors on certain body parts. Sharma et al. (2013) introduced a non-intrusive stress recognition model using thermal (TS) and visual (VS) spectrums. The TS captures blood flow under the surface of the human facial

SLIDE 12

12/32

Stress Classification

skin and has been reported as a successful stress measure (Yuen et al., 2009). The best stress recognition rate from Sharma et al. was 72%, achieved by combining histograms of dynamic thermal patterns (HDTP) on TS with local binary patterns on three orthogonal planes (LBP-TOP) on VS.

SLIDE 13

13/32

Dataset

SLIDE 14

14/32

Dataset

The data set is described as follows: there are 31 participants in total. The data set consists of normal RGB and thermal data, referred to as the visual (VS) and thermal (TS) spectrums respectively. Each participant watched a sequence of stressful and non-stressful film clips while being recorded.

Figure 3: The experimental setup (Sharma et al., 2013)

SLIDE 15

15/32

Example data

Figure 4: An example frame from the ANU StressDB dataset

SLIDE 16

16/32

Data Preprocessing

A script from previous research is used to preprocess the thermal videos. The script extracts the video frames and stores them as still images in 20 separate segments per video. As there are 31 subjects in the data collection, this yields 620 different segments. The number of frames in each segment varies, and the size of each frame is 640 × 480 pixels, a result of the thermal video’s 4:3 aspect ratio. The following algorithm is used to obtain the middle frame as one approach to frame summarization (Sze, Lam, and Qiu, 2005).
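The middle-frame selection can be sketched as follows (a hypothetical reconstruction; the original preprocessing script is not shown in the presentation, and the function name and list-of-frames representation are our assumptions):

```python
def middle_frame(frames):
    # Represent a segment by its middle frame, one approach to
    # frame summarization (Sze, Lam, and Qiu, 2005).
    if not frames:
        raise ValueError("empty segment")
    return frames[len(frames) // 2]
```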

SLIDE 17

17/32

Data Preprocessing

SLIDE 18

18/32

Further Data Preprocessing

As the aim of the research is to identify the state of stress based on the thermal data of participants’ facial regions, we cropped the facial region to a size of 200 × 200 pixels.

Figure 5: An example image in the previous and further-preprocessed datasets.
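The cropping step can be sketched as a slice over a pixel grid (a hypothetical sketch; the presentation does not specify how the corner of the facial region was located, so the function name and the (top, left) parameters are our assumptions):

```python
def crop_face(frame, top, left, size=200):
    # frame: a 2D grid (list of pixel rows); (top, left) is the
    # assumed upper-left corner of the facial region. Returns a
    # size-by-size sub-image.
    return [row[left:left + size] for row in frame[top:top + size]]
```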

SLIDE 19

19/32

Further Data Preprocessing

By reducing the images, the input size to the network decreased considerably: each 200 × 200 RGB image yields 200 × 200 × 3 = 120,000 input neurons, only 13.02% of the original 640 × 480 × 3 = 921,600.

SLIDE 20

20/32

Experiment

SLIDE 21

21/32

Experiment - Introduction

Autoencoders are able to extract lower-level features of images, as their behaviour is similar to compression (Ng, 2011). Thus, it is possible to recognize features solely by using autoencoders, with a final softmax layer for classification. In this experiment, we use stacked autoencoders on the ANU StressDB to classify stress.

SLIDE 22

22/32

Experiment - Architecture Overview

As the images are in RGB mode, we separated the channels and trained each channel with its own autoencoder, i.e. each autoencoder classifies one color channel.

SLIDE 23

23/32

Experiment - Architecture Overview

At the end of each autoencoder there is a softmax layer whose task is the final classification. We decide the final classification result by taking the majority vote of the three channels:

C(X) = mode{a_R(X_R), a_G(X_G), a_B(X_B)} (1)

where C(X) is the final classification function; a_R, a_G, a_B are the autoencoders for the red, green, and blue channels respectively; X is the image input; and X_R, X_G, X_B are its red, green, and blue sub-channels.
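The majority vote over the three per-channel predictions is a one-liner (a sketch; the function name and the 0/1 label encoding are our assumptions):

```python
from statistics import mode

def classify(pred_r, pred_g, pred_b):
    # C(X) = mode{a_R(X_R), a_G(X_G), a_B(X_B)}: each per-channel
    # classifier votes 0 (no stress) or 1 (stress), and the
    # majority wins. With three voters and two classes a
    # majority always exists.
    return mode([pred_r, pred_g, pred_b])
```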

SLIDE 24

24/32

Experiment - Architecture Overview

SLIDE 25

25/32

Experiment - Network Configuration

We came up with two different architectures. In the first, we compress the pixels to 4,000 neurons in the first autoencoder layer and 400 in the second; the last layer is the softmax layer that performs the final classification. In the second topology, we use the first architecture as the basis and add two more autoencoder layers: the third layer consists of 40 neurons and the fourth of 10 neurons. The architectures are as follows: First stacked autoencoder: 200-by-200 pixel images → compressed to 4000 neurons → compressed to 400 neurons → softmax classification layer (2 classes)

SLIDE 26

26/32

Experiment - Network Configuration

Second deep stacked autoencoder: 200-by-200 pixel images → compressed to 4000 neurons → compressed to 400 neurons → compressed to 40 neurons → compressed to 10 neurons → softmax classification layer (2 classes)
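The two topologies above can be summarized by their per-layer widths (a sketch; the variable names and the ratio helper are ours, not part of the presentation — each autoencoder sees one 200 × 200 channel, i.e. 40,000 inputs):

```python
# Layer widths of the two topologies (final softmax layer excluded).
STACKED = [200 * 200, 4000, 400]               # first architecture
DEEP_STACKED = [200 * 200, 4000, 400, 40, 10]  # second architecture

def compression_ratios(sizes):
    # Width of each layer relative to the layer feeding into it,
    # showing how aggressively each stage compresses.
    return [round(b / a, 4) for a, b in zip(sizes, sizes[1:])]
```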

SLIDE 27

27/32

Results of the Stacked Autoencoders

SLIDE 28

28/32

Conclusions

SLIDE 29

29/32

Conclusions & Future Works

Conclusion: Autoencoders are able to classify stress with decent performance. The autoencoder with more layers is less accurate, which might be due to overfitting.

Future work: Choosing appropriate hyper-parameters is challenging for the stress classification task; a number of different configurations are yet to be tested. Employing the full TS video could increase stress classification accuracy.

SLIDE 30

30/32

References

Abraham, Tara H (2002). “McCulloch–Pitts Neural Networks”. In: Journal of the History of the Behavioral Sciences 38.1, pp. 3–25.

Bengio, Yoshua (2009). “Learning deep architectures for AI”. In: Foundations and Trends in Machine Learning 2.1, pp. 1–127.

Dhall, Abhinav et al. (2011). “Emotion recognition using PHOG and LPQ features”. In: Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on. IEEE, pp. 878–883.

Hinton, Geoffrey E and Ruslan R Salakhutdinov (2006). “Reducing the dimensionality of data with neural networks”. In: Science 313.5786, pp. 504–507.

Joshi, Jyoti et al. (2012). “Neural-net classification for spatio-temporal descriptor based depression analysis”. In: Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, pp. 2634–2638.
SLIDE 31

31/32

References

McCulloch, Warren S and Walter Pitts (1943). “A logical calculus of the ideas immanent in nervous activity”. In: The Bulletin of Mathematical Biophysics 5.4, pp. 115–133.

Ng, Andrew (2011). “Sparse autoencoder”. In: CS294A Lecture Notes 72, pp. 1–19.

Rojas, Raúl (2013). Neural Networks: A Systematic Introduction. Springer Science & Business Media.

Scholz, Matthias (2012). Nonlinear PCA. URL: http://www.nlpca.org/ (visited on 08/09/2016).

Sharma, Nandita et al. (2013). “Modeling stress using thermal facial patterns: A spatio-temporal approach”. In: Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, pp. 387–392.

SLIDE 32

32/32

References

Sze, Kin-Wai, Kin-Man Lam, and Guoping Qiu (2005). “A new key frame representation for video segment retrieval”. In: IEEE Transactions on Circuits and Systems for Video Technology 15.9, pp. 1148–1155. ISSN: 1051-8215. DOI: 10.1109/TCSVT.2005.852623.

Yuen, P. et al. (2009). “Emotional & physical stress detection and classification using thermal imaging technique”. In: Crime Detection and Prevention (ICDP 2009), 3rd International Conference on, pp. 1–6. DOI: 10.1049/ic.2009.0241.