Data mining for Obstructive Sleep Apnea Detection
18 October 2017, Konstantinos Nikolaidis
Introduction: What is Obstructive Sleep Apnea?
- Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder characterized by recurrent episodes of partial or complete collapse of the upper airway during sleep.
- Estimates of disease prevalence are in the range of 3% to 7%.
- However, it is estimated that 70-80% of OSA cases remain undiagnosed.
- Factors that increase vulnerability to the disorder include age, male sex, obesity, family history, menopause, and certain health behaviors such as cigarette smoking and alcohol use.
Introduction: What is Obstructive Sleep Apnea?
Figure 1 [1]
Diagnosis:
- There is a variety of tools used to diagnose sleep apnea, ranging from the gold-standard polysomnography to questionnaires used to screen patients at higher risk.
- Polysomnography is usually performed during hospitalization in sleep laboratories, using polysomnographic instruments for multiparametric tests. The diagnosis includes:
– sensors on the nose
– sensors on the head (for monitoring ocular movement and brain activity (EEG))
– elastic belts on the chest and abdomen (for measuring respiration)
– a sensor on the finger for oxygen saturation
– also ECG (electrocardiograph) and EMG (electromyograph).
Diagnosis: Polysomnography
Figure 2 [2]
- The overall process of polysomnography diagnosis is resource-demanding and also intrusive for the patient.
Diagnosis:
- In recent years, many research teams have aimed at creating new hardware/software for easier and more patient-friendly OSA diagnosis. Some ideas include:
– Mobile applications that use sensors, with automated diagnosis based on data mining techniques.
– Portable devices created for this purpose.
– More exotic methods, such as smart T-shirts used as sensors, built-in phone microphones that detect OSA via snoring, or even detection during wakefulness.
Classification problem
- From the above it is clear that OSA detection can be
described as a classification problem.
- Looking at OSA event detection, we see that we have two probable classes:
– Periods with apnea events
– Periods with normal breathing
- Supervised learning problem: we need annotations!
Classification problem: General
- Under this assumption, we have studied how different data mining techniques compare for the classification of OSA on different datasets.
- These datasets (found on PhysioNet.org) include different patients who were monitored using a variety of sensors, and whose breathing periods were classified by experts.
- So the datasets provide the annotations we need.
Classification problem: Sensors
- We studied how different sensor combinations affect the
result of the classification.
- We focused on the less cumbersome sensors, and especially on the sensors related to respiration (nose, abdomen, chest, and SpO2).
- We did not apply feature extraction before classification; we trained our classifiers on the raw data for the different sensor combinations and the different datasets.
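As an illustration of this setup, here is a minimal Python sketch of training on raw data for every sensor combination. The random arrays stand in for the real recordings (they are assumptions, not the PhysioNet data), and scikit-learn's KNN is used as one example classifier:

```python
from itertools import combinations
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical raw signals: one row per breathing period, 50 samples per sensor.
rng = np.random.default_rng(0)
sensors = {
    "nose":    rng.normal(size=(200, 50)),
    "chest":   rng.normal(size=(200, 50)),
    "abdomen": rng.normal(size=(200, 50)),
    "spo2":    rng.normal(size=(200, 50)),
}
y = rng.integers(0, 2, size=200)  # 1 = apnea event, 0 = normal breathing

# Train one classifier per sensor combination on the concatenated raw data.
scores = {}
for r in range(1, len(sensors) + 1):
    for combo in combinations(sensors, r):
        X = np.hstack([sensors[s] for s in combo])
        scores[combo] = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
```

With four sensors this evaluates all 15 non-empty combinations; on real data the per-combination scores would reveal which sensors carry the most signal.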
Classification problem: Data mining
- The data mining techniques we used were:
– K-nearest neighbor (KNN) algorithm.
– Neural network.
– Decision tree.
– Support Vector Machine.
Classification problem: Data mining: K Nearest Neighbor
- Define proximity between instances, find the neighbors of a new instance, and assign the majority class.
- Case-based reasoning: useful when attributes are more complicated than real-valued ones.
- Cons:
– Slow during application
– No feature selection
– Notion of proximity is vague.
- Pros:
+ Fast training
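The steps above (proximity, neighbors, majority vote) can be sketched in a few lines of Python; the 2-D points and labels are a toy illustration, not the actual OSA data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Assign the majority class among the k nearest training instances."""
    # Proximity: Euclidean distance from x_new to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k neighbors
    votes = Counter(y_train[nearest])        # count their class labels
    return votes.most_common(1)[0][0]        # majority class

# Toy example: two clusters for "normal" (0) and "apnea" (1).
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0])))  # → 1
```

Note the "slow during application" con is visible here: every prediction recomputes distances to the whole training set, while training is just storing the data.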
Classification problem: Data mining: K Nearest Neighbor
Figure 3 [3]
Classification problem: Data mining: Support Vector Machine.
- We want to find the optimal separating hyperplane, which maximizes the margin between the 2 classes.
- A hyperplane can be defined by the equation w·x + b = 0, where w, b are parameters.
- We assume that xi are our input vectors and yi are the corresponding labels. We examine the linearly separable case for 2 classes, so yi = +1 or yi = -1.
Classification problem: Data mining: Support Vector Machine.
Figure 4 [4]
- We select 2 parallel hyperplanes, w·x + b = 1 and w·x + b = -1, as the hyperplanes for the 2 classes.
- The region between these hyperplanes is called the margin, and we want to maximize this region.
- It can be proven that the total distance between these hyperplanes is 2/||w||.
- So, to maximize the margin, we have to minimize ||w||.
Classification problem: Data mining: Support Vector Machine.
- We have to maximize the margin by minimizing ||w||, while also satisfying our class separation constraints:
– w·xi + b >= 1 for every xi with yi = +1
– w·xi + b <= -1 for every xi with yi = -1
- So our problem can be defined as: minimize ||w|| subject to yi(w·xi + b) >= 1 for each i in our dataset.
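As a sketch of this optimization, the constrained problem can be handed directly to a generic solver. The four toy points and their ±1 labels below are assumptions for illustration, and scipy's SLSQP is used in place of a dedicated QP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, labels yi in {+1, -1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Variables v = [w1, w2, b]. Minimize 1/2 ||w||^2
# subject to yi * (w . xi + b) >= 1 for every i.
objective = lambda v: 0.5 * np.dot(v[:2], v[:2])
constraints = [{"type": "ineq",
                "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1}
               for i in range(len(y))]
res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # for this data, analytically w ≈ [0.8, 0.4], b ≈ -1.4
```

The resulting margin 2/||w|| equals √5 here, the distance between the two closest points of opposite classes.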
Classification problem: Data mining: Support Vector Machine.
- This is a quadratic programming problem, which is equivalent to solving the following (the primal Lagrangian of our problem):

min Lp(w, b, a) = (1/2)||w||^2 - Σ_{i=1}^{l} ai [ yi (w·xi + b) - 1 ]

with each ai >= 0, where l is the total number of training points. After setting the partial derivatives with respect to our parameters w and b equal to zero (local minimum), we get:

w = Σ_{i=1}^{l} ai yi xi and Σ_{i=1}^{l} ai yi = 0.

b can then be easily derived.
Classification problem: Data mining: Support Vector Machine.
- We can also express our problem in the dual Lagrangian form by substituting our parameters into the Lagrangian equation. The two forms are equivalent. We get the following:

max LD(a) = Σ_{i=1}^{l} ai - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} ai aj yi yj (xi·xj)

s.t. ai >= 0 and Σ_{i=1}^{l} ai yi = 0.

Now we can find the ai from this maximization problem by setting the partial derivatives equal to zero.
Classification problem: Data mining: Support Vector Machine.
- We express our problem in this form because it is independent of w and b, and we can solve it by computing only the inner products xi·xj.
- This is very useful for non-linearly separable cases, where we want to map our data to higher dimensions in order to make them linearly separable by using kernels.
- Also, most of the ai (Lagrange multipliers) will be zero. The ones that are not zero correspond to the support vectors.
Classification problem: Data mining: Support Vector Machine.
- Pros:
+ Reaches a global optimum
+ Not many parameters
+ Good for small datasets
- Cons:
– Choice of kernel
– Relatively slow training
– Does not scale well with increasing data
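A hedged illustration of the ideas above, using scikit-learn's SVC on the same kind of toy ±1-labeled data; a large C approximates the hard-margin case described here:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, labels +1 / -1 as in the derivation above.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C ≈ hard-margin SVM
clf.fit(X, y)

w = clf.coef_[0]        # the weight vector w
b = clf.intercept_[0]   # the bias b
# Only the support vectors have nonzero Lagrange multipliers ai:
print(clf.support_vectors_)
print(2 / np.linalg.norm(w))  # margin width 2/||w||
```

For this data only the two closest opposite-class points end up as support vectors; the other ai vanish, exactly as the slide states.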
Classification problem: Data mining: Neural Networks.
- Input nodes are connected to output nodes by a set of hidden nodes and edges.
- Inputs describe DB instances.
- Outputs are the categories we want to recognize.
- Hidden nodes assign weights to each edge, so they represent the weight of the relationships between the input and the output over a large set of training data.
[Figure: input nodes, hidden layer nodes, and output nodes; example output categories: Car, House, Sports, Music, Comic]
Classification problem: Data mining: Neural Networks.
- Initializing:
– Normalize the data.
– Initialize the weights (set them to zero or to uniformly distributed random values in (-1,1)).
- Training phase:
– Forward phase, where the input vector xi is inserted and we get the output.
– Backpropagation: based on the error function, propagate the error to the previous layers and update the weights using gradient descent.
- Mining/testing phase, where we test our model on unknown data.

E(n) = (1/2) Σ_{i=1}^{Classes} ( yi(n) - di(n) )^2
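The forward phase, the squared-error function, and the gradient-descent weight updates can be sketched as follows. This is a toy Python example on made-up XOR data, with (-1,1)-initialized weights and sigmoid activations as one possible choice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs xi
d = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs di

W1 = rng.uniform(-1, 1, (2, 4))   # input -> hidden weights
W2 = rng.uniform(-1, 1, (4, 1))   # hidden -> output weights
lr = 0.5
errors = []

for epoch in range(5000):
    # Forward phase: insert the inputs and compute the outputs.
    h = sigmoid(X @ W1)
    y = sigmoid(h @ W2)
    # Error function E = 1/2 * sum_i (yi - di)^2
    errors.append(0.5 * np.sum((y - d) ** 2))
    # Backpropagation: propagate the error backwards and update the
    # weights by gradient descent.
    delta_out = (y - d) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ delta_out)
    W1 -= lr * (X.T @ delta_hid)

print(errors[0], errors[-1])  # the error should shrink as training proceeds
```

The nonlinear (sigmoid) activation is what lets this net learn XOR at all; with linear units only, the decision boundary would stay linear, as noted two slides below.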
Classification problem: Data mining: Neural Networks
Basic NN unit:
[Figure: inputs x1, x2, x3 with weights w1, w2, w3 feed a single unit whose output is y = f(Σ_i wi xi), e.g. with sigmoid activation f(x) = 1/(1 + e^(-x))]
A more typical NN:
[Figure: multi-layer network with hidden nodes and output nodes]
Classification problem: Data mining: Neural Networks
- Useful for learning complex data, like handwriting, speech, and image recognition. In order to have curved decision boundaries, however, we must use a nonlinear activation function.
[Figure: decision boundaries of linear regression, a classification tree, and a neural network]
Classification problem: Data mining: Neural Networks
- Pros:
– Can learn more complicated boundaries
– Can handle a large number of features
– Fast application
- Cons:
– Slow training time
– Hard to interpret
– Hard to implement: trial and error for choosing the number of nodes
Classification problem: Data mining: Decision Trees
- A tree where internal nodes are simple decision rules on one or more attributes, and leaf nodes are predicted class labels.
[Figure: example tree with rules such as Salary < 1M, Prof = teacher, and an age threshold, leading to Good/Bad leaves]
Classification problem: Data mining: Decision Trees
- Widely used learning method.
- Easy to interpret: can be represented as if-then-else rules.
- Approximates functions by piecewise constant regions.
- Does not require any prior knowledge of the data distribution; works well on noisy data.
- Has been applied to:
– classify medical patients based on the disease,
– classify equipment malfunctions by cause,
– classify loan applicants by likelihood of payment.
Classification problem: Data mining: Decision Tree
- Training a decision tree with the C4.5 algorithm:
– We define S as the set of training instances. There are 3 base cases:
1) All the examples in the training set belong to the same class (return a leaf with that class label).
2) The training set is empty (return a leaf called failure).
3) The attribute list is empty (return a leaf labeled with the most frequent class, or the disjunction of all the classes).
– Step 1) Check the base cases.
– Step 2) Find the attribute with the highest information gain (a_best).
– Step 3) Partition S into S1, S2, S3, ... based on the values of a_best.
– Step 4) Recurse on the above steps for every subset of S.
Classification problem: Data mining: Decision Tree
- Pros:
– Reasonable training time
– Fast application
– Easy to interpret
– Easy to implement
– Can handle a large number of features.
- Cons:
– Cannot handle complicated feature relationships
– Simple decision boundaries
– Problems with lots of missing data
Classification problem: Results
- After the data mining, we evaluated the performance of the different algorithms for the different sensor combinations across all the datasets.
- For the dataset with the best data quality, using a subset of the original sensors (in particular nose and chest respiration and O2), we can achieve really good results with the simplest classifier (KNN), as well as with the other algorithms.
- However, for the datasets with lower data quality, our best accuracy is significantly lower for all possible sensor combinations and all the techniques we tried.
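A sketch of this kind of evaluation, comparing scikit-learn versions of the four techniques with cross-validation; the random matrix and synthetic labels are a hypothetical stand-in for one dataset, not the PhysioNet recordings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for one dataset's raw sensor data (no feature extraction).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic apnea/normal labels

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
}
results = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

On the real datasets the same loop would be repeated per sensor combination, which is how the accuracy differences between high- and low-quality datasets show up.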
Classification problem: Evaluation
- In order to further evaluate our results, we reimplemented the experiment in another programming language (MATLAB → R).
- Despite relatively different parameter tuning, we obtained similar results.
- We have also implemented experiments that include feature extraction, for better evaluation of the results.
Future plans:
- From these experiments, it is obvious that data quality plays a very important role in the classification procedure.
- We need to further understand the parameters that affect the quality of a dataset, especially regarding time-series biosignals.
- We aim to apply the knowledge we gain in order to recognize, and if possible correct, potentially pattern-like low-quality traits in these types of datasets.
Figure References:
- [1] http://www.jarrettsvilledental.com/services/sleep-apnea/
- [2] http://www.sleep-apneaguide.com/polysomnogram.html
- [3] https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- [4] https://www.quora.com/What-are-the-general-easy-to-