Data mining for Obstructive Sleep Apnea Detection
18 October 2017, Konstantinos Nikolaidis
Introduction: What is Obstructive Sleep Apnea?
- Obstructive Sleep Apnea (OSA) is a relatively common sleep disorder characterized by recurrent episodes of partial or complete collapse of the upper airway during sleep.
- Estimates of disease prevalence are in the range of 3% to 7%.
- However, it is estimated that 70-80% of OSA cases remain undiagnosed.
- Factors that increase vulnerability to the disorder include age, male sex, obesity, family history, menopause, and certain health behaviors such as cigarette smoking and alcohol use.
Introduction: What is Obstructive Sleep Apnea?
Figure 1 [1]
Diagnosis:
- There is a variety of tools used to diagnose sleep apnea, ranging from the gold-standard polysomnography to questionnaires used to screen patients at higher risk.
- Polysomnography is usually performed during hospitalization in sleep laboratories, using polysomnographic instruments for multiparametric tests. The diagnosis includes:
– sensors on the nose
– sensors on the head (for monitoring ocular movement and brain activity (EEG))
– elastic belts on the chest and abdomen (for measuring respiration)
– a sensor on the finger for oxygen saturation
– also ECG (electrocardiograph) and EMG (electromyograph).
Diagnosis: Polysomnography
Figure 2 [2]
- The overall process of polysomnography diagnosis is resource-demanding and also intrusive for the patient.
Diagnosis:
- In recent years, many research teams have aimed at creating new hardware/software for easier and more patient-friendly OSA diagnosis. Some ideas include:
– Mobile applications that use sensors, with automated diagnosis based on data mining techniques.
– Portable devices created for this purpose.
– More exotic methods, such as smart T-shirts used as sensors, built-in phone microphones that detect OSA via snoring, or even detection during wakefulness.
Classification problem
- From the above it is clear that OSA detection can be
described as a classification problem.
- Looking at OSA event detection, we see that we have two probable classes:
– Periods with apnea events
– Periods with normal breathing
- Supervised learning problem: we need annotations!
Classification problem: General
- Under this assumption, we have studied how different data mining techniques compare for the classification of OSA on different datasets.
- These datasets (found on PhysioNet.org) include different patients who were monitored using a variety of sensors, and whose breathing periods were classified by experts.
- So the datasets provide the annotations we need.
Classification problem: Sensors
- We studied how different sensor combinations affect the
result of the classification.
- We focused on the less cumbersome sensors, and especially on the sensors related to respiration (nose, abdomen, chest, and SpO2).
- We did not apply feature extraction before classification; we trained our classifiers on the raw data for the different sensor combinations and the different datasets.
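As an illustration of this setup, here is a minimal Python sketch of training on raw data for every sensor combination. The random arrays stand in for the real recordings (they are assumptions, not the PhysioNet data), and scikit-learn's KNN is used as one example classifier:

```python
from itertools import combinations
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical raw signals: one row per breathing period, 50 samples per sensor.
rng = np.random.default_rng(0)
sensors = {
    "nose":    rng.normal(size=(200, 50)),
    "chest":   rng.normal(size=(200, 50)),
    "abdomen": rng.normal(size=(200, 50)),
    "spo2":    rng.normal(size=(200, 50)),
}
y = rng.integers(0, 2, size=200)  # 1 = apnea event, 0 = normal breathing

# Train one classifier per sensor combination on the concatenated raw data.
scores = {}
for r in range(1, len(sensors) + 1):
    for combo in combinations(sensors, r):
        X = np.hstack([sensors[s] for s in combo])
        scores[combo] = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
```

With four sensors this evaluates all 15 non-empty combinations; on real data the per-combination scores would reveal which sensors carry the most signal.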
Classification problem: Data mining
- The data mining techniques we used were:
– K-nearest neighbor (KNN) algorithm.
– Neural network.
– Decision tree.
– Support Vector Machine.
Classification problem: Data mining: K Nearest Neighbor
- Define proximity between instances, find the neighbors of a new instance, and assign the majority class.
- Case-based reasoning: useful when attributes are more complicated than real-valued ones.
- Cons:
– Slow during application
– No feature selection
– Notion of proximity is vague.
- Pros:
+ Fast training
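The steps above (proximity, neighbors, majority vote) can be sketched in a few lines of Python; the 2-D points and labels are a toy illustration, not the actual OSA data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Assign the majority class among the k nearest training instances."""
    # Proximity: Euclidean distance from x_new to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k neighbors
    votes = Counter(y_train[nearest])        # count their class labels
    return votes.most_common(1)[0][0]        # majority class

# Toy example: two clusters for "normal" (0) and "apnea" (1).
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0])))  # → 1
```

Note the "slow during application" con is visible here: every prediction recomputes distances to the whole training set, while training is just storing the data.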
Classification problem: Data mining: K Nearest Neighbor
Figure 3 [3]
Classification problem: Data mining: Support Vector Machine.
- We want to find the optimal separating hyperplane, which maximizes the margin between the 2 classes.
- A hyperplane can be defined by the equation w·x + b = 0, where w, b are parameters.
- We assume that xi are our input vectors and yi are the corresponding labels. We examine the linearly separable case for 2 classes, so yi = +1 or yi = -1.
Classification problem: Data mining: Support Vector Machine.
Figure 4 [4]
- We select 2 parallel hyperplanes, w·x + b = 1 and w·x + b = -1, as the hyperplanes for the 2 classes.
- The region between these hyperplanes is called the margin, and we want to maximize this region.
- It can be proven that the total distance between these hyperplanes is 2/||w||.
- So, to maximize the margin, we have to minimize ||w||.
Classification problem: Data mining: Support Vector Machine.
- We have to maximize the margin by minimizing ||w||, while also satisfying our class separation constraints:
– w·xi + b >= 1 for every xi with yi = +1
– w·xi + b <= -1 for every xi with yi = -1
- So our problem can be defined as: minimize ||w|| subject to yi(w·xi + b) >= 1 for each i in our dataset.
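As a sketch of this optimization, the constrained problem can be handed directly to a generic solver. The four toy points and their ±1 labels below are assumptions for illustration, and scipy's SLSQP is used in place of a dedicated QP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, labels yi in {+1, -1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Variables v = [w1, w2, b]. Minimize 1/2 ||w||^2
# subject to yi * (w . xi + b) >= 1 for every i.
objective = lambda v: 0.5 * np.dot(v[:2], v[:2])
constraints = [{"type": "ineq",
                "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1}
               for i in range(len(y))]
res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # for this data, analytically w ≈ [0.8, 0.4], b ≈ -1.4
```

The resulting margin 2/||w|| equals √5 here, the distance between the two closest points of opposite classes.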
Classification problem: Data mining: Support Vector Machine.
- This is a quadratic programming problem, which is equivalent to solving the following (the primal Lagrangian of our problem):

min Lp(w, b, a) = (1/2)||w||^2 - Σ_{i=1}^{l} ai [ yi (w·xi + b) - 1 ]

with each ai >= 0, where l is the total number of training points. After setting the partial derivatives with respect to our parameters w and b equal to zero (local minimum), we get:

w = Σ_{i=1}^{l} ai yi xi and Σ_{i=1}^{l} ai yi = 0.

b can then be easily derived.
Classification problem: Data mining: Support Vector Machine.
- We can also express our problem in the dual Lagrangian form by substituting our parameters into the Lagrangian equation. The two forms are equivalent. We get the following:

max LD(a) = Σ_{i=1}^{l} ai - (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} ai aj yi yj (xi·xj)

s.t. ai >= 0 and Σ_{i=1}^{l} ai yi = 0.

Now we can find the ai from this maximization problem by setting the partial derivatives equal to zero.
Classification problem: Data mining: Support Vector Machine.
- We express our problem in this form because it is independent of w and b, and we can solve it by computing only the inner products xi·xj.
- This is very useful for non-linearly separable cases, where we want to map our data to higher dimensions in order to make them linearly separable by using kernels.
- Also, most of the ai (Lagrange multipliers) will be zero. The ones that are not zero correspond to the support vectors.
Classification problem: Data mining: Support Vector Machine.
- Pros:
+ Reaches a global optimum
+ Not many parameters
+ Good for small datasets
- Cons:
– Choice of kernel
– Relatively slow training
– Does not scale well with increasing data
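A hedged illustration of the ideas above, using scikit-learn's SVC on the same kind of toy ±1-labeled data; a large C approximates the hard-margin case described here:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, labels +1 / -1 as in the derivation above.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C ≈ hard-margin SVM
clf.fit(X, y)

w = clf.coef_[0]        # the weight vector w
b = clf.intercept_[0]   # the bias b
# Only the support vectors have nonzero Lagrange multipliers ai:
print(clf.support_vectors_)
print(2 / np.linalg.norm(w))  # margin width 2/||w||
```

For this data only the two closest opposite-class points end up as support vectors; the other ai vanish, exactly as the slide states.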
Classification problem: Data mining: Neural Networks.
- Input nodes are connected to output nodes by a set of hidden nodes and edges.
- Inputs describe DB instances.
- Outputs are the categories we want to recognize.
- Hidden nodes assign weights to each edge, so they represent the weight of the relationships between the input and the output over a large set of training data.
[Figure: input nodes, hidden layer nodes, and output nodes; example output categories: Car, House, Sports, Music, Comic]
Classification problem: Data mining: Neural Networks.
- Initializing:
– Normalize the data.
– Initialize the weights (set them to zero or to uniformly distributed random values in (-1,1)).
- Training phase:
– Forward phase, where the input vector xi is inserted and we get the output.
– Backpropagation: based on the error function, propagate the error to the previous layers and update the weights using gradient descent.
- Mining/testing phase, where we test our model on unknown data.

E(n) = (1/2) Σ_{i=1}^{Classes} ( yi(n) - di(n) )^2
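The forward phase, the squared-error function, and the gradient-descent weight updates can be sketched as follows. This is a toy Python example on made-up XOR data, with (-1,1)-initialized weights and sigmoid activations as one possible choice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs xi
d = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs di

W1 = rng.uniform(-1, 1, (2, 4))   # input -> hidden weights
W2 = rng.uniform(-1, 1, (4, 1))   # hidden -> output weights
lr = 0.5
errors = []

for epoch in range(5000):
    # Forward phase: insert the inputs and compute the outputs.
    h = sigmoid(X @ W1)
    y = sigmoid(h @ W2)
    # Error function E = 1/2 * sum_i (yi - di)^2
    errors.append(0.5 * np.sum((y - d) ** 2))
    # Backpropagation: propagate the error backwards and update the
    # weights by gradient descent.
    delta_out = (y - d) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ delta_out)
    W1 -= lr * (X.T @ delta_hid)

print(errors[0], errors[-1])  # the error should shrink as training proceeds
```

The nonlinear (sigmoid) activation is what lets this net learn XOR at all; with linear units only, the decision boundary would stay linear, as noted two slides below.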
Classification problem: Data mining: Neural Networks
Basic NN unit:
[Figure: inputs x1, x2, x3 with weights w1, w2, w3 feed a single unit whose output is y = f(Σ_i wi xi), e.g. with sigmoid activation f(x) = 1/(1 + e^(-x))]
A more typical NN:
[Figure: multi-layer network with hidden nodes and output nodes]
Classification problem: Data mining: Neural Networks
- Useful for learning complex data, like handwriting, speech, and image recognition. In order to have curved decision boundaries, however, we must use a nonlinear activation function.
[Figure: decision boundaries of linear regression, a classification tree, and a neural network]
Classification problem: Data mining: Neural Networks
- Pros:
– Can learn more complicated boundaries
– Can handle a large number of features
– Fast application
- Cons:
– Slow training time
– Hard to interpret
– Hard to implement: trial and error for choosing the number of nodes
Classification problem: Data mining: Decision Trees
- A tree where internal nodes are simple decision rules on one or more attributes, and leaf nodes are predicted class labels.
[Figure: example tree with rules such as Salary < 1M, Prof = teacher, and an age threshold, leading to Good/Bad leaves]
Classification problem: Data mining: Decision Trees
- Widely used learning method.
- Easy to interpret: can be represented as if-then-else rules.
- Approximates functions by piecewise constant regions.
- Does not require any prior knowledge of the data distribution; works well on noisy data.
- Has been applied to:
– classify medical patients based on the disease,
– classify equipment malfunctions by cause,
– classify loan applicants by likelihood of payment.
Classification problem: Data mining: Decision Tree
- Training a decision tree with the C4.5 algorithm:
– We define S as the set of training instances. There are 3 base cases:
1) All the examples in the training set belong to the same class (return a leaf with that class label).
2) The training set is empty (return a leaf called failure).
3) The attribute list is empty (return a leaf labeled with the most frequent class, or the disjunction of all the classes).
– Step 1) Check the base cases.
– Step 2) Find the attribute with the highest information gain (a_best).
– Step 3) Partition S into S1, S2, S3, ... based on the values of a_best.
– Step 4) Recurse on the above steps for every subset of S.
Classification problem: Data mining: Decision Tree
- Pros:
– Reasonable training time
– Fast application
– Easy to interpret
– Easy to implement
– Can handle a large number of features.
- Cons:
– Cannot handle complicated feature relationships
– Simple decision boundaries
– Problems with lots of missing data
Classification problem: Results
- After the data mining, we evaluated the performance of the different algorithms for the different sensor combinations across all the datasets.
- For the dataset with the best data quality, using a subset of the original sensors (in particular nose and chest respiration and O2), we can achieve really good results with the simplest classifier (KNN), as well as with the other algorithms.
- However, for the datasets with lower data quality, our best accuracy is significantly lower for all possible sensor combinations and all the techniques we tried.
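A sketch of this kind of evaluation, comparing scikit-learn versions of the four techniques with cross-validation; the random matrix and synthetic labels are a hypothetical stand-in for one dataset, not the PhysioNet recordings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for one dataset's raw sensor data (no feature extraction).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic apnea/normal labels

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000),
    "Decision tree": DecisionTreeClassifier(),
}
results = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

On the real datasets the same loop would be repeated per sensor combination, which is how the accuracy differences between high- and low-quality datasets show up.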
Classification problem: Evaluation
- In order to further evaluate our results, we reimplemented the experiment in another programming language (MATLAB → R).
- Despite relatively different parameter tuning, we obtained similar results.
- We have also implemented experiments that include feature extraction, for better evaluation of the results.
Future plans:
- From these experiments, it is obvious that data quality plays a very important role in the classification procedure.
- We need to further understand the parameters that affect the quality of a dataset, especially regarding time-series biosignals.
- We aim to apply the knowledge we gain in order to recognize, and if possible correct, potentially pattern-like low-quality traits in these types of datasets.
Figure References:
- [1] http://www.jarrettsvilledental.com/services/sleep-apnea/
- [2] http://www.sleep-apneaguide.com/polysomnogram.html
- [3] https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- [4] https://www.quora.com/What-are-the-general-easy-to-