


slide-1
SLIDE 1

PHYSIOLOGICAL DATA ANALYSIS

ALCOHOL DRINKING PREDICTION USING STATISTICAL AND DEEP LEARNING METHODS Master’s Thesis Defense

Can Li

Advisor: Dr. Yi Shang

slide-2
SLIDE 2

Contents

  • Introduction
  • Related Work
  • Experiment Data
  • Data Analysis Methods
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-3
SLIDE 3

Contents

  • Introduction
  • Problem Definition
  • Motivation and Contribution
  • Related Work
  • Experiment Data
  • Data Analysis Methods
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-4
SLIDE 4

Introduction

Alcohol craving study based on real physiological data

  • 1. Data were collected with a mobile ambulatory assessment system
  • 2. The sensor used is the Basis watch
  • 3. The goal of this study is to predict whether a person had been drinking, using a machine learning pipeline

slide-5
SLIDE 5

Problem Definition

Input: One-dimensional skin temperature, heart rate, and GSR (galvanic skin response) signals

Method: Data analysis pipeline

  • 1. Data labeling
  • 2. Data cleaning
  • 3. Feature extraction
  • 4. Classification

Output: {0, 1}, 0 is non-drinking and 1 is drinking


slide-6
SLIDE 6

Motivation and Contributions

Motivation:

  • 1. Previous work predicted drinking for each individual record, so the results contain overlapping information. Prediction based on drinking episodes is more reasonable.
  • 2. To try deep learning on drinking episode prediction

Contributions:

  • 1. Proposed a drinking episode pipeline and a deep learning pipeline
  • 2. Extracted new features
  • 3. Found that heart rate is the most significant feature in drinking prediction
  • 4. Achieved 88.89% accuracy for drinking episode prediction

slide-7
SLIDE 7

Contents

  • Introduction
  • Related Work
  • Experiment Data
  • Data Analysis Methods
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-8
SLIDE 8

Related Work

Hossain, Syed Monowar, et al. "Identifying drug (cocaine) intake events from acute physiological response in the presence of free-living physical activity." Proceedings of the 13th International Symposium on Information Processing in Sensor Networks. IEEE Press, 2014.

  • This paper identifies the recovery time after cocaine intake, which gave me the idea of predicting drinking episodes

slide-9
SLIDE 9

Related Work (cont’d)

Wergeles, Nickolas M. “AMD: Analysis of Mood Dysregulation A Machine Learning Approach” 2016.

  • 1. His work predicts mood dysregulation from physiological data; my research is about drinking prediction.
  • 2. His prediction is based on each 5-second record; mine is based on both 1-minute records and 30-minute data blocks.
  • 3. His paper introduced a data cleaning method; I used a similar one.

slide-10
SLIDE 10

Related Work (cont’d)

Zhang, Chen. “Wearable Sensing Analysis – Identifying alcohol Drinking From Daily Physiological Data” 2016.

  • 1. His work predicts alcohol drinking from physiological data collected with SEM and Hexoskin sensors; my data is from the Basis watch.
  • 2. His sample rate is one record every 5 seconds; mine is one record every minute.
  • 3. He extracted statistical features from a 1-minute window; I extracted different statistical features and deep learning features from 30-minute data blocks.

slide-11
SLIDE 11

Contents

  • Introduction
  • Related Work
  • Experiment Data
  • 1. Data Overview
  • 2. Data Visualization
  • 3. Data Statistics
  • Data Analysis Methods
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-12
SLIDE 12
  • 1. Data Overview
  • Number of Users: 29
  • Survey Data

1) Initial Drinking
2) Drinking Follow-ups

  • Raw data (Sensor Data)
  • Sample rate: 1 minute
  • Features

1) Skin Temperature
2) Heart Rate
3) GSR (galvanic skin response)

[Figures: Survey Data Example; Sensor Data Example]

slide-13
SLIDE 13
  • 2. Data Visualization


slide-14
SLIDE 14
  • 3. Data Statistics


[Figure 1. Days for Raw Data. Axes: UserID, Number of Days; series: TotalDays, DaysWithRawData]

[Figure 2. Total Number of Records For Raw Data, per UserID]

[Figure 3. Drinking Records, per patient]

slide-15
SLIDE 15

Contents

  • Introduction
  • Related Work
  • Experiment Data
  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-16
SLIDE 16

Data Analysis Methods Overview

Method 1: Drinking record prediction Pipeline

  • 1. Data combination and labeling
  • 2. Data cleaning:

1) Gaps and insufficient data removal
2) Smoothing and outlier removal

  • 3. Classification

[Pipeline diagram stages: Data Combination, Labeling, Data Cleaning, Generate 30-minute Data Blocks, Feature Extraction, Classification]

slide-17
SLIDE 17

Data Analysis Methods Overview

Method 2: Drinking episode prediction statistical pipeline

  • 1. Data Combination and Labeling
  • 2. Generate 30-minute data blocks
  • 3. Extract statistical features from 30-minute data blocks
  • 4. Principal component analysis
  • 5. Classification


slide-18
SLIDE 18

Data Analysis Methods Overview

Method 3: Drinking episode prediction deep learning pipeline

  • 1. Data Combination and Labeling
  • 2. Generate 30-minute data blocks
  • 3. Convert 30-minute data blocks into spectrogram
  • 4. Extract deep learning features from spectrogram
  • 5. Classification


slide-19
SLIDE 19

Contents

  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 1. Data Combination and Labeling
  • 2. Data Cleaning
  • 3. Classification
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-20
SLIDE 20
  • 1. Data Combination and Labeling
  • 1. Combine raw sensor data with survey data
  • 2. Find initial drinking reports and drinking follow-ups that occur less than 2 hours after the previous drinking behavior
  • 3. Label data points that fall into [ID - 30 minutes, Last DF + 2 hours] as drinking

ID: Initial drinking
DF1: Drinking follow-up 1
DF2: Drinking follow-up 2
DF3: Drinking follow-up 3
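
The labeling rule can be sketched in a few lines of Python. This is only an illustration under assumptions, not the thesis code: the function names are hypothetical, and every survey report (initial drinking or follow-up) is passed in as a plain timestamp, with a report counted as a follow-up when it comes less than 2 hours after the previous one.

```python
from datetime import datetime, timedelta

FOLLOW_UP_GAP = timedelta(hours=2)   # max gap between consecutive drinking reports
LEAD_IN = timedelta(minutes=30)      # window opens 30 minutes before the initial drink
TAIL = timedelta(hours=2)            # window closes 2 hours after the last follow-up


def drinking_windows(report_times):
    """Chain drinking reports into [ID - 30 min, last DF + 2 h] windows."""
    windows = []
    start = last = None
    for t in sorted(report_times):
        if last is not None and t - last < FOLLOW_UP_GAP:
            last = t                 # follow-up within 2 hours: extend the episode
        else:
            if last is not None:
                windows.append((start - LEAD_IN, last + TAIL))
            start = last = t         # treated as a new initial drinking report
    if last is not None:
        windows.append((start - LEAD_IN, last + TAIL))
    return windows


def label_records(record_times, report_times):
    """Return 1 (drinking) or 0 (non-drinking) for every sensor timestamp."""
    windows = drinking_windows(report_times)
    return [int(any(lo <= t <= hi for lo, hi in windows)) for t in record_times]
```

For example, reports at 20:00, 21:00 and 22:30 on the same evening form one episode, so every sensor record from 19:30 to 00:30 is labeled 1.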

slide-21
SLIDE 21
  • 2. Data Cleaning Step 1: Gaps and Insufficient Data Removal

1) Gaps: no data within a 10-minute window
2) Insufficient data: fewer than 5 data points within a 10-minute window

[Figures: Example for Gaps; Example for Insufficient Data]
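
A minimal sketch of this step, assuming the 10-minute windows are non-overlapping and aligned to the start of the recording (the slides do not say how the windows are placed). The function names and the minute-offset input format are hypothetical.

```python
from collections import Counter

WINDOW_MINUTES = 10
MIN_POINTS = 5


def classify_windows(minutes):
    """Map each 10-minute window index to 'gap', 'insufficient', or 'ok'.

    `minutes` holds the non-negative minute offsets (from the start of the
    recording) at which a sensor reading exists.
    """
    if not minutes:
        return {}
    counts = Counter(m // WINDOW_MINUTES for m in minutes)
    last_window = max(minutes) // WINDOW_MINUTES
    status = {}
    for w in range(last_window + 1):
        n = counts.get(w, 0)
        if n == 0:
            status[w] = "gap"            # no data at all in the window
        elif n < MIN_POINTS:
            status[w] = "insufficient"   # fewer than 5 data points
        else:
            status[w] = "ok"
    return status


def keep_clean(minutes):
    """Drop readings that fall in windows with fewer than 5 points."""
    status = classify_windows(minutes)
    return [m for m in minutes if status[m // WINDOW_MINUTES] == "ok"]
```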

slide-22
SLIDE 22
  • 2. Data Cleaning Step 2: Smoothing and Outlier Removal

Use LOWESS to smooth the data and remove outliers:

1) Window size: 1% of the data
2) Outliers: points more than two standard deviations away from the fitted curve
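
The idea can be illustrated with a small, self-contained LOWESS sketch. It is a simplified single-pass version (local linear fits with tricube weights, no robustness iterations) written for readability rather than speed, and it assumes that "two standard deviations" refers to the standard deviation of the residuals. The thesis presumably used an existing LOWESS implementation; the function names here are hypothetical.

```python
import math
from statistics import pstdev


def lowess(xs, ys, frac=0.01):
    """Local linear fits with tricube weights (single pass, no robustness iterations)."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))       # points used in each local fit
    fitted = []
    for x0 in xs:
        nearest = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:k]
        h = max(abs(xs[j] - x0) for j in nearest) or 1.0
        w = [(1 - (abs(xs[j] - x0) / h) ** 3) ** 3 for j in nearest]
        sw = sum(w)
        mx = sum(wi * xs[j] for wi, j in zip(w, nearest)) / sw
        my = sum(wi * ys[j] for wi, j in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[j] - mx) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (xs[j] - mx) * (ys[j] - my) for wi, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to a weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted


def remove_outliers(xs, ys, frac=0.01, n_std=2.0):
    """Keep points whose residual is within n_std standard deviations of the fit."""
    fitted = lowess(xs, ys, frac)
    residuals = [y - f for y, f in zip(ys, fitted)]
    limit = n_std * pstdev(residuals)
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) <= limit]
```

With `frac=0.01` the neighbourhood only becomes wider than three points once a recording has more than 300 samples, so short test signals need a larger fraction to show any smoothing.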

slide-23
SLIDE 23

Classification

Four Classifiers:

1) Naïve Bayes
2) Bayes Network
3) Logistic Regression
4) J48 Decision Tree

slide-24
SLIDE 24

Contents

  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 1. Data Combination and Labeling
  • 2. Generate 30-minute data blocks
  • 3. Extract statistical features from 30-minute data blocks
  • 4. Principal component analysis
  • 5. Classification
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-25
SLIDE 25
  • 2. Generate 30-Minute Data Blocks

Input: Labeled one-dimensional signal

Requirement:
1) There is no missing value in the 30-minute window
2) All the data points in the 30-minute window have the same label

Output:
1) Positive data block: all 30 data points are drinking
2) Negative data block: all 30 data points are non-drinking
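
A rough sketch of block generation under these requirements. It assumes one record per minute stored as a `(minute, value, label)` tuple and cuts non-overlapping blocks; the slides do not state whether blocks may overlap, so that choice, like the function name, is an assumption.

```python
BLOCK_LEN = 30  # one reading per minute, so 30 readings cover 30 minutes


def make_blocks(records):
    """Cut (minute, value, label) records into uniform 30-minute blocks.

    Returns (positive_blocks, negative_blocks); each block is a list of
    30 values from consecutive minutes that all carry the same label.
    """
    positive, negative = [], []
    run = []                                    # current streak of usable records
    for minute, value, label in sorted(records):
        broken = run and (minute != run[-1][0] + 1 or label != run[-1][2])
        if broken:
            run = []                            # missing minute or label change
        run.append((minute, value, label))
        if len(run) == BLOCK_LEN:
            values = [v for _, v, _ in run]
            (positive if label == 1 else negative).append(values)
            run = []                            # blocks do not overlap in this sketch
    return positive, negative
```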

slide-26
SLIDE 26
  • 3. Statistical Feature Extraction

Statistical Features:

  • Mean:
  • Standard Deviation:
  • Skewness:
  • Slope: The slope of a linear regression fitted on the data block
  • Coefficient of Variation: Std/Mean (measures spread relative to the mean)
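
The five features can be computed for one block as sketched below. The formulas on the original slide were not preserved, so the exact definitions are assumptions: population (divide-by-n) variance, the Fisher-Pearson skewness coefficient, and a least-squares slope against the sample index. The function name is hypothetical.

```python
import math


def block_features(values):
    """Mean, standard deviation, skewness, slope, and coefficient of variation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n          # population variance
    std = math.sqrt(var)
    # Population (Fisher-Pearson) skewness; defined as 0 for a constant block.
    skew = sum((v - mean) ** 3 for v in values) / n / std ** 3 if std else 0.0
    # Least-squares slope of value against sample index 0..n-1.
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - mean) for t, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    cv = std / mean if mean else float("nan")               # Std / Mean
    return {"mean": mean, "std": std, "skewness": skew, "slope": slope, "cv": cv}
```

Applied to a heart rate block that rises steadily by 0.5 beats per minute each minute, the slope feature comes out as 0.5 and the skewness as 0.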

slide-27
SLIDE 27
  • 4. Principal Component Analysis

Rule: Contribution larger than 0.1 percent

Result: 8 principal components were chosen
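
One way to read the selection rule is as a threshold on each component's share of the total variance. The sketch below assumes that reading and takes the eigenvalues of the feature covariance matrix as given; the function name is hypothetical.

```python
def select_components(eigenvalues, min_ratio=0.001):
    """Return (index, variance_ratio) pairs for components above the threshold.

    `eigenvalues` are the PCA eigenvalues; a component is kept when its share
    of the total variance is larger than min_ratio (0.001 = 0.1 percent).
    """
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    ranked = sorted(enumerate(ratios), key=lambda item: item[1], reverse=True)
    return [(i, r) for i, r in ranked if r > min_ratio]
```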

slide-28
SLIDE 28

Contents

  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • 1. Data Combination and Labeling
  • 2. Generate 30-minute data blocks
  • 3. Convert 30-minute data block into Spectrogram
  • 4. Generate Cifar 10 Features from Spectrogram
  • 5. Classification
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-29
SLIDE 29
  • 3. Convert 30-minute data block into Spectrogram
  • Window size: 5
  • Overlap: window size – 1
  • Sample rate: 1 minute
  • Normalized
  • Color
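
With these settings, a 30-sample block yields 26 frames of 3 frequency bins each. The sketch below shows that computation with a plain short-time DFT. It makes several assumptions the slide does not confirm: no taper is applied to each frame, power (squared magnitude) is used, and "Normalized" is taken to mean min-max scaling to [0, 1]. The color mapping step is left out.

```python
import cmath
import math

WINDOW = 5              # samples per frame (5 minutes at one sample per minute)
HOP = 1                 # overlap = WINDOW - 1, so frames advance one sample


def spectrogram(signal):
    """Power spectrogram as a list of frames, each with WINDOW // 2 + 1 bins."""
    frames = []
    for start in range(0, len(signal) - WINDOW + 1, HOP):
        frame = signal[start:start + WINDOW]
        bins = []
        for k in range(WINDOW // 2 + 1):
            # k-th DFT coefficient of the frame (rectangular window).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / WINDOW)
                        for n, x in enumerate(frame))
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames


def normalize(frames):
    """Min-max scale every cell to [0, 1]."""
    flat = [p for frame in frames for p in frame]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(p - lo) / span for p in frame] for frame in frames]
```

The three bins correspond to 0, 1/5 and 2/5 cycles per minute, which is all the frequency resolution a 5-sample window can give.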


slide-30
SLIDE 30
  • 4. Generate Cifar 10 Features from Spectrogram

Use a pre-trained model to classify the spectrogram, generating 10 probabilities, one for each Cifar 10 category

[Figures: Spectrogram; Cifar 10 Features]
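
The last step of that idea can be sketched as follows. The pre-trained network itself is not shown and is not specified in the slides; the sketch assumes it ends in ten raw class scores that a softmax turns into probabilities, and those ten probabilities are then used as the feature vector for the drinking classifier. `cifar10_features` is a hypothetical helper name.

```python
import math

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def softmax(scores):
    """Numerically stable softmax: ten scores in, ten probabilities out."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cifar10_features(scores):
    """Use the class probabilities themselves as a 10-dimensional feature vector."""
    if len(scores) != len(CIFAR10_CLASSES):
        raise ValueError("expected one score per CIFAR-10 class")
    return softmax(scores)
```

The category names carry no meaning for a spectrogram; only the pattern of the ten probabilities is used downstream.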

slide-31
SLIDE 31

Contents

  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • Experiment Results and Comparison
  • 1. Result for Drinking Record Pipeline
  • 2. Result for Drinking Episode Statistical Pipeline
  • 3. Results for Drinking Episode Deep Learning Pipeline
  • 4. Statistical Pipeline VS Deep Learning Pipeline
  • Conclusion and Future Work


slide-32
SLIDE 32

Training and Testing Dataset

  • 1. Dataset for method 1: drinking record prediction pipeline

1) All users
2) Data type: 1-minute record
3) 14856 drinking and 14856 non-drinking
4) 66% for training, 34% for testing

  • 2. Dataset for method 2 and method 3: drinking episode prediction

1) Three users: 2867, 3641, 5055
2) Data type: 30-minute data blocks
3) User 2867: 101 drinking and 101 non-drinking
4) User 3641: 36 drinking and 36 non-drinking
5) User 5055: 26 drinking and 26 non-drinking
6) 66% for training, 34% for testing
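
A balanced dataset with a 66% / 34% split could be built as sketched below. How the equal class counts were actually obtained is not stated in the slides; random undersampling of the larger class is an assumption, as are the function name and the fixed seed.

```python
import random


def balanced_split(positive, negative, train_frac=0.66, seed=0):
    """Undersample the larger class, shuffle, and split 66% / 34%."""
    rng = random.Random(seed)
    n = min(len(positive), len(negative))
    data = ([(x, 1) for x in rng.sample(positive, n)] +
            [(x, 0) for x in rng.sample(negative, n)])
    rng.shuffle(data)
    cut = round(train_frac * len(data))
    return data[:cut], data[cut:]
```

The shuffle does not stratify, so the class ratio inside the training and testing parts can drift slightly from 50/50, which matters most for the smaller per-user datasets.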

slide-33
SLIDE 33
  • 1. Result for Drinking Record Pipeline


slide-34
SLIDE 34
  • 2. Result for Drinking Episode Statistical Pipeline


slide-35
SLIDE 35
  • 3. Results for Drinking Episode Deep Learning Pipeline


slide-36
SLIDE 36
  • 4. Statistical Pipeline VS Deep Learning Pipeline


slide-37
SLIDE 37

Contents

  • Data Analysis Methods
  • 1. Data Analysis Methods Overview
  • 2. Method 1: Drinking Record Prediction Pipeline
  • 3. Method 2: Drinking Episode Prediction Statistical Pipeline
  • 4. Method 3: Drinking Episode Prediction Deep Learning Pipeline
  • Experiment Results and Comparison
  • Conclusion and Future Work


slide-38
SLIDE 38

Conclusion and Future Work

  • Conclusion
  • Raw data gives better results than cleaned data on drinking record prediction
  • Within-user results are much better than cross-user results
  • The statistical pipeline gives better results than the deep learning pipeline
  • Heart rate is the most significant feature in drinking episode prediction
  • Future work
  • Take alcohol amount into account for labeling
  • Try more deep learning models
  • Apply the methods in this thesis to other, larger datasets

slide-39
SLIDE 39

Thank You!

Questions?