Data mining for Obstructive Sleep Apnea Detection - 18 October 2017



SLIDE 1

18 October 2017 Konstantinos Nikolaidis

Data mining for Obstructive Sleep Apnea Detection

SLIDE 2

Introduction: What is Obstructive Sleep Apnea?

  • Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder characterized by recurrent episodes of partial or complete collapse of the upper airway during sleep.

  • Estimates of disease prevalence are in the range of 3% to 7%.

  • However, an estimated 70-80% of OSA cases remain undiagnosed.

  • Factors that increase vulnerability to the disorder include age, male sex, obesity, family history, menopause, and health behaviors such as cigarette smoking and alcohol use.

SLIDE 3

Introduction: What is Obstructive Sleep Apnea?

Figure 1. [1]

SLIDE 4

Diagnosis:

  • A variety of tools is used to diagnose sleep apnea, ranging from the gold-standard polysomnography to questionnaires used for screening patients at higher risk.

  • Polysomnography is usually performed during hospitalization in a sleep laboratory, using polysomnographic instruments for multiparametric tests. The diagnosis includes:

  – sensors on the nose
  – sensors on the head, for monitoring ocular movement and brain activity (EEG)
  – elastic belts on the chest and abdomen, for measuring respiration
  – a finger sensor for oxygen saturation
  – also ECG (electrocardiograph) and EMG (electromyograph).

SLIDE 5

Diagnosis: Polysomnography

Figure 2. [2]

  • The overall polysomnography procedure is resource-demanding and intrusive for the patient.

SLIDE 6

Diagnosis:

  • In recent years, many research teams have aimed at creating new hardware and software for easier, more patient-friendly OSA diagnosis. Some ideas include:

  Mobile applications that use sensors, with automated diagnosis based on data mining techniques.

  Portable devices created for this purpose.

  More exotic methods, such as smart T-shirts used as sensors, built-in phone microphones that detect OSA via snoring, or even detection during wakefulness.

SLIDE 7

Classification problem

  • From the above, it is clear that OSA detection can be described as a classification problem.

  • Looking at OSA event detection, we have two possible classes:

  Periods with apnea events

  Periods with normal breathing

  • Supervised learning problem: we need annotations!
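This two-class setup can be made concrete by segmenting a recording into fixed-length windows and attaching the expert annotation to each window. A minimal Python sketch (the window width, 0/1 label encoding, and majority-vote rule are illustrative assumptions, not from the slides):

```python
# Split a signal into fixed-width windows and pair each window with a
# label: 1 for an apnea period, 0 for normal breathing.

def label_windows(signal, annotations, width):
    """annotations[i] is the per-sample expert label (0 or 1)."""
    pairs = []
    for start in range(0, len(signal) - width + 1, width):
        window = signal[start:start + width]
        votes = annotations[start:start + width]
        label = 1 if sum(votes) * 2 > len(votes) else 0  # majority vote
        pairs.append((window, label))
    return pairs

sig = [0.1, 0.2, 0.0, 0.9, 0.8, 0.7]
ann = [0, 0, 0, 1, 1, 1]
print(label_windows(sig, ann, 3))
# two windows: the first labeled 0 (normal), the second labeled 1 (apnea)
```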
SLIDE 8

Classification problem: General

  • Under this assumption, we have studied how different data mining techniques compare for the classification of OSA on different datasets.

  • These datasets (found on Physionet.org) include different patients who were monitored with a variety of sensors, and whose breathing periods were classified by experts.

  • So the datasets provide the annotations we need.
SLIDE 9

Classification problem: Sensors

  • We studied how different sensor combinations affect the classification result.

  • We focused on the less cumbersome sensors, especially those related to respiration (nose, abdomen, chest and SpO2).

  • We did not apply feature extraction before classification; we trained our classifiers on the raw data, for the different sensor combinations and the different datasets.
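Enumerating the sensor combinations to try can be sketched with `itertools.combinations`; the sensor names below are shorthand for the respiration-related channels mentioned above:

```python
from itertools import combinations

# Shorthand names for the respiration-related channels from the slides.
SENSORS = ["nose", "chest", "abdomen", "spo2"]

def sensor_combinations(sensors):
    """Yield every non-empty subset of the sensor list."""
    for k in range(1, len(sensors) + 1):
        yield from combinations(sensors, k)

combos = list(sensor_combinations(SENSORS))
print(len(combos))  # 15 non-empty subsets of 4 sensors
```

Each subset would then select the corresponding raw-data channels used to train a classifier.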

SLIDE 10

Classification problem: Data mining

  • The data mining techniques we used were:

  KNN algorithm.

  Neural network.

  Decision tree.

  Support Vector Machine.

SLIDE 11

Classification problem: Data mining: K Nearest Neighbor

  • Define proximity between instances, find the neighbors of a new instance, and assign the majority class.

  • Case-based reasoning: useful when attributes are more complicated than real-valued.

  • Cons:

  Slow during application

  No feature selection

  Notion of proximity is vague

  • Pros:

  + Fast training
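The neighbor-and-majority-vote idea fits in a few lines. A minimal sketch with Euclidean proximity (the value of k and the toy points are illustrative):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (vector, label) pairs. Return the majority class
    among the k nearest training points under Euclidean distance."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "normal"), ((0, 1), "normal"),
         ((5, 5), "apnea"), ((5, 6), "apnea")]
print(knn_predict(train, (4.5, 5.2), k=3))  # "apnea"
```

Note the pros/cons above in miniature: "training" is just storing the list, while every prediction sorts the whole training set.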

SLIDE 12

Classification problem: Data mining: K Nearest Neighbor

Figure 3. [3]

SLIDE 13

Classification problem: Data mining: Support Vector Machine.

  • We want to find the optimal separating hyperplane, the one that maximizes the margin between the 2 classes.

  • A hyperplane can be defined by the equation w·x + b = 0, where w, b are parameters.

  • We assume xi are our input vectors and yi the corresponding labels. We examine the linearly separable case for 2 classes, so yi = +1 or yi = -1.

SLIDE 14

Classification problem: Data mining: Support Vector Machine.

Figure 4. [4]

  • We select 2 parallel hyperplanes, w·x + b = 1 and w·x + b = -1, as the boundary hyperplanes for the 2 classes.

  • The region between these hyperplanes is called the margin, and we want to maximize it.

  • It can be proven that the total distance between these hyperplanes is 2/||w||.

  • So to maximize the margin, we have to minimize ||w||.

SLIDE 15

Classification problem: Data mining: Support Vector Machine.

  • We maximize the margin by minimizing ||w||, but we also have to satisfy our class separation constraints:

  w·xi + b >= 1 for every xi with yi = 1

  w·xi + b <= -1 for every xi with yi = -1

  • So our problem can be stated as: minimize ||w|| subject to yi·(w·xi + b) >= 1 for each i in our dataset.
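A tiny worked instance of these constraints (the two points and their solution are an illustrative example): for x1 = (2, 0) with y1 = +1 and x2 = (-2, 0) with y2 = -1, the minimizer is w = (0.5, 0), b = 0; both constraints hold with equality, and the margin 2/||w|| = 4 is exactly the distance between the two points:

```python
import math

# Toy linearly separable set: one point per class.
points = [((2.0, 0.0), +1), ((-2.0, 0.0), -1)]
w, b = (0.5, 0.0), 0.0  # optimal hard-margin solution for this set

def constraint(x, y, w, b):
    """y * (w.x + b), which must be >= 1 for every training point."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

for x, y in points:
    assert constraint(x, y, w, b) >= 1  # both hold, with equality here

margin = 2 / math.hypot(*w)
print(margin)  # 4.0
```

Any longer w satisfying the constraints would give a smaller margin, which is why minimizing ||w|| is the objective.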

SLIDE 16

Classification problem: Data mining: Support Vector Machine.

  • This is a quadratic programming problem, which is equivalent to minimizing the primal Lagrangian:

  Lp(w, b, a) = (1/2)||w||^2 - Σ_{i=1..l} ai·[ yi·(w·xi + b) - 1 ]

  where each ai >= 0 and l is the total number of training points.

  • Setting the partial derivatives with respect to w and b to zero (local minimum), we get:

  w = Σ_{i=1..l} ai·yi·xi   and   Σ_{i=1..l} ai·yi = 0

  b can then be easily derived.

SLIDE 17

Classification problem: Data mining: Support Vector Machine.

  • We can also express our problem in the dual Lagrangian form by substituting these parameters into the Lagrangian equation. The two forms are equivalent. We get:

  max LD(a) = Σ_{i=1..l} ai - (1/2) Σ_{i=1..l} Σ_{j=1..l} ai·aj·yi·yj·(xi·xj)

  s.t. ai >= 0 and Σ_{i=1..l} ai·yi = 0

  • Now we can find the ai from this maximization problem, again by setting partial derivatives to zero.

SLIDE 18

Classification problem: Data mining: Support Vector Machine.

  • We express the problem in this form because it is independent of w and b, and can be solved by computing only the inner products xi·xj.

  • This is very useful for non-linearly separable cases, where we want to map our data to a higher dimension to make it linearly separable, using kernels.

  • Also, most of the ai (Lagrange multipliers) will be zero. The ones that are not zero correspond to the support vectors.
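The kernel idea can be checked numerically: for the polynomial kernel K(x, z) = (x·z)², the explicit feature map for 2-D inputs is φ(x) = (x1², √2·x1·x2, x2²), and K equals the inner product of the mapped vectors without ever computing φ. (The quadratic kernel here is an illustrative choice, not one named in the slides.)

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """K(x, z) = (x.z)^2, computed directly in the input space."""
    return dot(x, z) ** 2

def feature_map(x):
    """Explicit map phi with phi(x).phi(z) == K(x, z) for 2-D x."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                       # kernel in input space
rhs = dot(feature_map(x), feature_map(z))     # inner product in feature space
print(lhs, rhs)  # both equal 16.0 (up to floating-point error)
```

In the dual, every xi·xj is simply replaced by K(xi, xj), so the higher-dimensional space is never materialized.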

SLIDE 19

Classification problem: Data mining: Support Vector Machine.

  • Pros:

  + Reaches a global optimum
  + Not many parameters
  + Good for small datasets

  • Cons:

  Choice of kernel
  Relatively slow training
  Does not scale well as the data grows

SLIDE 20

Classification problem: Data mining: Neural Networks.

  • Input nodes are connected to output nodes through a set of hidden nodes and edges.

  • Inputs describe DB instances.

  • Outputs are the categories we want to recognize.

  • Hidden nodes assign weights to each edge, so that the weights represent the strength of the relationships between the input and the output over a large set of training data.

[Figure: input nodes, hidden layer nodes and output nodes; example output categories: Car, House, Sports, Music, Comic]

SLIDE 21

Classification problem: Data mining: Neural Networks.

  • Initializing:

  Normalize the data

  Initialize the weights (set them to zero, or to uniformly distributed random values in (-1,1))

  • Training phase:

  Forward phase, where the input vector xi is fed in and we get the output.

  Backpropagation: based on the error function, propagate the error to the previous layers and update the weights using gradient descent.

  • Mining/Testing phase, where we test our model on unknown data.

  The error function is:

  E(n) = (1/2) Σ_{i=1..Classes} ( yi(n) - di(n) )^2
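The forward/backpropagation loop can be sketched for a single sigmoid unit trained by gradient descent on the squared error (the toy OR task and learning rate are illustrative; a full network repeats the same chain-rule update layer by layer):

```python
import math, random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Toy task: learn logical OR with one sigmoid unit.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [random.uniform(-1, 1) for _ in range(2)]  # weights init in (-1, 1)
b = 0.0
lr = 0.5  # illustrative learning rate

def loss():
    """E = sum over samples of (1/2)(y - d)^2."""
    return sum(0.5 * (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - d) ** 2
               for x, d in data)

before = loss()
for _ in range(2000):
    for x, d in data:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # forward phase
        grad = (y - d) * y * (1 - y)   # dE/ds via the chain rule
        for i in range(2):
            w[i] -= lr * grad * x[i]   # gradient-descent weight update
        b -= lr * grad
after = loss()
print(before > after)  # True: training reduced the squared error
```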

SLIDE 22

Classification problem: Data mining: Neural Networks

Basic NN unit: inputs x1, x2, x3 with weights w1, w2, w3 feed a single node whose output is y(n) = 1 / (1 + e^(-Σ_i wi·xi(n))) (sigmoid activation).

[Figure: a basic NN unit, and a more typical NN with hidden and output nodes]

SLIDE 23

Classification problem: Data mining: Neural Networks

  • Useful for learning complex data such as handwriting, speech and image recognition. To obtain curved boundaries, however, we must use a nonlinear activation function.

[Figure: decision boundaries of linear regression, a classification tree and a neural network]

SLIDE 24

Classification problem: Data mining: Neural Networks

  • Pros:

  Can learn more complicated boundaries

  Can handle a large number of features

  Fast application

  • Cons:

  Slow training time

  Hard to interpret

  Hard to implement: trial and error for choosing the number of nodes

SLIDE 25

Classification problem: Data mining: Decision Trees

  • A tree where internal nodes are simple decision rules on one or more attributes, and leaf nodes are predicted class labels.

[Figure: example decision tree with rules such as Salary < 1M, Prof = teacher and Age < 3, leading to Good/Bad leaves]

SLIDE 26

Classification problem: Data mining: Decision Trees

  • Widely used learning method
  • Easy to interpret: can be represented as if-then-else rules
  • Approximates the target function by piecewise constant regions
  • Does not require any prior knowledge of the data distribution; works well on noisy data.

  • Has been applied to:

  – classifying medical patients by disease,
  – classifying equipment malfunctions by cause,
  – classifying loan applicants by likelihood of payment.

SLIDE 27

Classification problem: Data mining: Decision Tree

  • Training a decision tree:
  • C4.5 algorithm:

  We define S as the set of training instances.

  3 base cases: 1) All the examples in the training set belong to the same class (return a leaf with that class label). 2) The training set is empty (return a leaf called failure). 3) The attribute list is empty (return a leaf labeled with the most frequent class, or the disjunction of all the classes).

  Step 1) Check the base cases.

  Step 2) Find the attribute with the highest information gain (a_best).

  Step 3) Partition S into S1, S2, S3, ... based on the values of a_best.

  Step 4) Recurse on the steps above for every subset of S.
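Step 2, choosing a_best by information gain, can be sketched with the usual entropy formula (the toy dataset below stands in for S and is illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum p * log2(p) over the class frequencies in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, a) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Toy S: attribute 0 separates the classes perfectly, attribute 1 not at all.
rows = [("yes", "a"), ("yes", "b"), ("no", "a"), ("no", "b")]
labels = ["apnea", "apnea", "normal", "normal"]
print(information_gain(rows, labels, 0))  # 1.0 -> chosen as a_best
print(information_gain(rows, labels, 1))  # 0.0
```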

SLIDE 28

Classification problem: Data mining: Decision Tree

  • Pros:

  Reasonable training time

  Fast application

  Easy to interpret

  Easy to implement

  Can handle a large number of features.

  • Cons:

  Cannot handle complicated relationships between features

  Simple decision boundaries

  Problems with lots of missing data

SLIDE 29

Classification problem: Results

  • After the data mining, we evaluated the performance of the different algorithms for the different sensor combinations across all the datasets.

  • For the dataset with the best data quality, using a subset of the original sensors (in particular nose and chest respiration and O2), we can achieve very good results with the simplest classifier (KNN), and also with the other algorithms.

  • However, for the datasets with lower data quality, our best accuracy is significantly lower for all possible sensor combinations and all the techniques we tried.

SLIDE 30

Classification problem: Evaluation

  • To further evaluate our results, we reimplemented the experiment in another programming language (MATLAB→R).

  • Despite somewhat different parameter tuning, we obtained similar results.

  • We have also run experiments that include feature extraction, for better evaluation of the results.

SLIDE 31

Future plans:

  • From these experiments, it is clear that data quality plays a very important role in the classification procedure.

  • We need to further understand the parameters that affect the quality of a dataset, especially for time-series biosignals.

  • We aim to apply the knowledge we gain to recognize, and if possible correct, pattern-like low-quality traits in these types of datasets.

SLIDE 32

Figure References:

  • [1] http://www.jarrettsvilledental.com/services/sleep-apnea/
  • [2] http://www.sleep-apneaguide.com/polysomnogram.html
  • [3] https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
  • [4] https://www.quora.com/What-are-the-general-easy-to-understand-differences-between-support-vector-machines-and-artificial-neural-networks-Why-do-you-need-them-both