  1. Evaluation of Objective Features for Classification of Clinical Depression in Speech by Genetic Programming
     Juan Torres (1), Ashraf Saad (2), Elliot Moore (1)
     (1) School of Electrical and Computer Engineering, Georgia Institute of Technology, Savannah, GA 31407, USA
     (2) Computer Science Department, School of Computing, Armstrong Atlantic State University, Savannah, GA 31419, USA
     juan.torres@gatech.edu, emoore@gtsav.gatech.edu, ashraf@cs.armstrong.edu

  2. Clinical Depression Classification
     - Goal: detect clinical depression by analyzing a patient's speech.
     - Binary decision classification problem.
     - The dataset contains a large number of features, so feature selection is necessary for:
       - designing a robust classifier;
       - identifying a small set of useful features, which may in turn provide physiological insight.

  3. Speech Database
     - 15 patients (6 male, 9 female)
     - 18 control subjects (9 male, 9 female)
     - Corpus: a 65-sentence short story
     - Observation groupings:
       - G1: 13 observations/speaker (5 sentences each)
       - G2: 5 observations/speaker (13 sentences each)
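Purely as an illustration (the helper name and the use of averaging are mine; the paper instead computes observation-level statistics over each grouping, as described on slide 6), the two groupings could be formed like this:

```python
import numpy as np

def group_observations(sentence_features, sentences_per_obs):
    """Group consecutive sentence-level feature rows into observations.

    sentence_features: array of shape (65, n_features) for one speaker.
    sentences_per_obs: 5 -> 13 observations (G1); 13 -> 5 observations (G2).
    Averaging is used here only as a stand-in for the paper's
    observation-level statistics.
    """
    n_obs = len(sentence_features) // sentences_per_obs
    trimmed = np.asarray(sentence_features)[:n_obs * sentences_per_obs]
    return trimmed.reshape(n_obs, sentences_per_obs, -1).mean(axis=1)

# Example: 65 sentences x 10 raw features for one speaker
feats = np.random.rand(65, 10)
g1 = group_observations(feats, 5)    # shape (13, 10)
g2 = group_observations(feats, 13)   # shape (5, 10)
```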

  4. Speech Features
     - Prosodics
     - Vocal tract resonant frequencies (formants)
     - Glottal waveform
     - Teager FM

  5. Speech Features (cont.)
     - Raw features are extracted frame by frame (25-30 ms) and grouped into 10 categories:
       Pitch (PCH), Energy Median Statistics (EMS), Energy Deviation Statistics (EDS),
       Speaking Rate (SPR), Glottal Timing (GLT), Glottal Ratios (GLR), Glottal Spectrum (GLS),
       Formant Locations (FMT), Formant Bandwidths (FBW), Teager FM (TFM)
     - EDS = STD(DFS(E_v)), EMS = MED(DFS(E_v))

  6. Statistics
     - Sentence-level statistics were computed for each raw feature → Direct Feature Statistics (DFS).
     - The same set of statistics was applied to the DFSs over each entire observation → Observation-Level Statistics.

     Statistic                   Equation
     Average (AVG)               (1/N) * Sum{x_i}
     Median (MED)                50th percentile
     Standard Deviation (STD)    sqrt( (1/(N-1)) * Sum{(x_i - mean(x))^2} )
     Minimum (MIN)               5th percentile
     Maximum (MAX)               95th percentile
     Range (RNG)                 MAX - MIN
     Dynamic Range (DRNG)        log10(MAX) - log10(MIN)
     Interquartile Range (IQR)   75th percentile - 25th percentile
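A minimal sketch (assuming a 1-D array of frame-level values for one sentence; the function name is hypothetical) of the eight statistics in the table:

```python
import numpy as np

def sentence_statistics(x):
    """Compute the eight sentence-level statistics for one raw feature's
    frame-level values within a sentence."""
    x = np.asarray(x, dtype=float)
    mn = np.percentile(x, 5)    # MIN is defined as the 5th percentile
    mx = np.percentile(x, 95)   # MAX is defined as the 95th percentile
    return {
        "AVG":  x.mean(),
        "MED":  np.median(x),
        "STD":  x.std(ddof=1),                      # unbiased (N-1) estimate
        "MIN":  mn,
        "MAX":  mx,
        "RNG":  mx - mn,
        "DRNG": np.log10(mx) - np.log10(mn),        # assumes positive values, e.g. pitch in Hz
        "IQR":  np.percentile(x, 75) - np.percentile(x, 25),
    }

# Example: pitch values (Hz) for the frames of one sentence
stats = sentence_statistics(np.random.uniform(80, 250, size=120))
```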

  7. Final Feature Sets
     - Result: 2000+ distinct features.
     - Statistical significance tests (ANOVA) were used to initially prune the feature set.
     - Final size: 298-1246 features → a large feature selection problem.

     Experiment   Observations   Features (OFS)
     MG1          195            724
     MG2          75             298
     FG1          234            1246
     FG2          90             857
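A minimal sketch of ANOVA-based pruning (the significance threshold of 0.05 is an assumption; the slide does not state the value used):

```python
import numpy as np
from scipy.stats import f_oneway

def anova_prune(X, y, alpha=0.05):
    """Keep only features whose class-conditional means differ significantly.

    X: (n_observations, n_features) feature matrix
    y: binary labels (0 = control, 1 = patient)
    """
    keep = []
    for j in range(X.shape[1]):
        _, p = f_oneway(X[y == 0, j], X[y == 1, j])  # one-way ANOVA across the two classes
        if p < alpha:
            keep.append(j)
    return keep

# Example with synthetic data
X = np.random.rand(195, 2000)
y = np.random.randint(0, 2, size=195)
selected = anova_prune(X, y)
```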

  8. Feature Selection
     - Goal: select a (small) group of features that maximizes classifier performance.
     - Approaches (see the sketch below):
       - Filter: optimize a computationally inexpensive fitness function.
       - Wrapper: fitness function = classification performance.
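A minimal sketch contrasting the two approaches (the ANOVA F-statistic filter and the naive-Bayes wrapper shown here are illustrative choices, not the paper's):

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def filter_score(X, y, subset):
    """Filter: a cheap statistic, no classifier in the loop (mean ANOVA F-value here)."""
    return np.mean([f_oneway(X[y == 0, j], X[y == 1, j])[0] for j in subset])

def wrapper_score(X, y, subset):
    """Wrapper: fitness is the cross-validated accuracy of an actual classifier."""
    return cross_val_score(GaussianNB(), X[:, subset], y, cv=5).mean()
```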

  9. Genetic Programming for Classification and FS (GPFS)
     - Estimates the optimal feature set and classifier simultaneously → "online approach" (Muni, Pal, Das 2006).
     - Advantages:
       - Evolutionary search explores a (potentially) large portion of the feature space.
       - The resulting classifier is a simple algebraic expression (easy to read and interpret).
       - Stochastic: multiple runs yield different solutions, so given a large number of runs, the frequency with which a feature is selected can be regarded as an approximate fitness measure.

  10. Genetic Programming
      - The classifier consists of expression trees; a binary decision needs only a single tree T.
      - Class is assigned by the algebraic sign of the evaluation: T > 0 → class 1, T < 0 → class 2.
      - Internal nodes: { +, -, *, / (protected) }
      - External nodes: { features, rnd_dbl(0-10) }
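A minimal sketch (the node representation and names are my own, not the paper's) of how such a single-tree classifier can be represented and evaluated; the value returned by protected division on a zero denominator is an arbitrary choice here:

```python
import operator

def protected_div(a, b):
    """Protected division: avoid crashing on divide-by-zero (1.0 is an arbitrary fallback)."""
    return a / b if abs(b) > 1e-12 else 1.0

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': protected_div}

def evaluate(node, x):
    """Evaluate an expression tree on feature vector x.

    A node is either ('feat', index), ('const', value), or (op, left, right).
    """
    kind = node[0]
    if kind == 'feat':
        return x[node[1]]
    if kind == 'const':
        return node[1]
    return OPS[kind](evaluate(node[1], x), evaluate(node[2], x))

def classify(tree, x):
    """Sign of the tree output decides the class: > 0 -> class 1, otherwise class 2."""
    return 1 if evaluate(tree, x) > 0 else 2

# Example tree: (x[3] * x[7]) - 2.5
tree = ('-', ('*', ('feat', 3), ('feat', 7)), ('const', 2.5))
print(classify(tree, [0.0] * 10))
```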

  11. Genetic Programming (cont.)
      - A large population of classifier trees is evolved over several generations.
      - Population initialization: random trees (height 2-6), ramped half-and-half method.
      - Fitness function = classification performance.
      - Evolutionary operators:
        - Reproduction (fitness-proportional selection)
        - Mutation (random selection)
        - Crossover (tournament selection)
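A skeleton of such an evolutionary loop, purely as an illustration (operator rates follow the parameter table on slide 14; `crossover` and `mutate` are left as caller-supplied tree operators):

```python
import random

def tournament_select(population, scores, k):
    """Pick the fittest of k randomly drawn individuals."""
    idx = random.sample(range(len(population)), k)
    return population[max(idx, key=lambda i: scores[i])]

def fitness_proportional_select(population, scores):
    """Roulette-wheel selection proportional to (non-negative) fitness."""
    return random.choices(population, weights=scores, k=1)[0]

def evolve(population, fitness_fn, crossover, mutate, generations,
           p_crossover=0.80, p_reproduction=0.05, tournament_size=10):
    """Generic GP loop: build each generation by crossover, reproduction, or mutation.

    crossover(p1, p2) returns two offspring trees; mutate(t) returns one tree.
    """
    for _ in range(generations):
        scores = [fitness_fn(t) for t in population]
        new_pop = []
        while len(new_pop) < len(population):
            r = random.random()
            if r < p_crossover:
                p1 = tournament_select(population, scores, tournament_size)
                p2 = tournament_select(population, scores, tournament_size)
                new_pop.extend(crossover(p1, p2))
            elif r < p_crossover + p_reproduction:
                new_pop.append(fitness_proportional_select(population, scores))
            else:
                new_pop.append(mutate(random.choice(population)))
        population = new_pop[:len(population)]
    return population
```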

  12. Evolutionary Rules for Simultaneous Feature Selection
      - Initial tree generation: the probability of selecting a feature set decreases linearly with feature set size (see the sketch below).
      - Fitness: biased toward trees that use few features.
      - Crossover:
        - Homogeneous: only between parents with the same feature set.
        - Heterogeneous: biased toward selecting parents with similar feature sets.
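A minimal sketch of the size-biased initialization and size-penalized fitness; the specific weighting and penalty formulas here are placeholders (the paper follows Muni, Pal, and Das 2006):

```python
import random

def sample_feature_subset(n_features, max_size):
    """Draw a feature subset whose size is chosen with probability
    decreasing linearly with subset size."""
    sizes = list(range(1, max_size + 1))
    weights = [max_size + 1 - s for s in sizes]          # linear decrease with size
    size = random.choices(sizes, weights=weights, k=1)[0]
    return random.sample(range(n_features), size)

def size_penalized_fitness(accuracy, n_used, n_total, bias=0.1):
    """Bias fitness toward trees that use few features (penalty weight is illustrative)."""
    return accuracy - bias * (n_used / n_total)
```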

  13. Dynamic Parameters
      - The fitness bias toward smaller subsets decreases with generations.
      - The probability of heterogeneous crossover decreases with generations.
      - Motivation: explore the feature space during the first few generations, then gradually concentrate on improving classification performance with the current feature sets.
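One possible realization of such a schedule, assuming a simple linear decay purely for illustration:

```python
def decayed(value_start, value_end, generation, total_generations):
    """Linearly interpolate a parameter from its initial to its final value."""
    frac = generation / max(total_generations - 1, 1)
    return value_start + (value_end - value_start) * frac

# e.g., the size-bias weight falls from 0.2 to 0.0 and the heterogeneous-crossover
# probability falls from 0.5 to 0.1 over 30 generations (values are placeholders)
bias  = [decayed(0.2, 0.0, g, 30) for g in range(30)]
p_het = [decayed(0.5, 0.1, g, 30) for g in range(30)]
```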

  14. GP Parameters

      Parameter                                             Value
      Crossover probability                                 0.80
      Reproduction probability                              0.05
      Mutation probability                                  0.15
      Prob. of selecting int./ext. node during crossover    0.8 / 0.2
      Prob. of selecting int./ext. node during mutation     0.7 / 0.3
      Tournament size                                       10
      Number of generations                                 30 for G1 / 20 for G2
      Initial height of trees                               2-6
      Maximum allowed nodes of a tree                       350
      Maximum height of a tree                              12
      Population size                                       3000 for G1 / 2000 for G2

  15. GP Results
      Classification performance, averaged over 10 runs of leave-one-out cross-validation:

                                  Male            Female
      Metric                      G1      G2      G1      G2      Mean
      Classification Accuracy     71.2    71.3    84.9    82.2    77.4
      Sensitivity                 80.9    74.7    85.4    82.7    80.9
      Specificity                 64.8    69.1    84.4    81.8    75.0
      Feature Set Size            18.5    15.3    16.1    14.2    16.0

  16. Feature Selection Histograms
      [Figure-only slide: histograms of how frequently each feature was selected across GP runs.]

  17. "Best" Features -- Males

      Male - G1                   Male - G2
      GLT: Max((CP)MIN)           GLT: Max((CP)MIN)
      GLT: DRng((CP)IQR)          PCH: Med(A1)
      GLS: Med((gSt1000)MAX)      EDS: Avg(MED)
      GLT: Std((OP)IQR)           GLT: IQR((CP)IQR)
      GLR: Rng((rCPO)IQR)         GLR: Min((rOPO)IQR)
      GLS: Avg((gSt1000)MAX)      EDS: Avg(AVG)
      EDS: Avg(AVG)               GLR: Med((rCPOP)MIN)
      EDS: Avg(MED)               GLT: Std((CP)MIN)
      EDS: Med(MED)               GLR: Max((rCPOP)MIN)
      GLT: Med((CP)MIN)           GLS: Avg((gSt1000)MAX)

  18. "Best" Features -- Females

      Female - G1                 Female - G2
      EMS: Med(MR)                EMS: IQR(AVG_1)
      EMS: Med(STD_1)             EMS: Med(STD_1)
      EMS: Max(MR)                PCH: IQR(IQR)
      EMS: Med(RNG)               EMS: Med(STD)
      EMS: Max(STD_1)             EMS: Max(MR)
      EMS: Med(AVG)               TFM: Avg(MAX(IQR))
      EMS: Max(MAX)               EMS: Med(MR)
      EMS: Avg(STD_1)             FBW: Med((bwF3)IQR)
      EMS: Avg(MED)               EMS: Med(MAX)
      EMS: Avg(AVG)               EMS: Med(RNG)

  19. GP Results (cont.)
      - The GP results were not as good as hoped for. However, the fact that certain features were selected in the final solutions more frequently than others can be regarded as a measure of their usefulness.
      - To test this hypothesis, we train Bayesian classifiers using the 16 features most frequently selected by GP.
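A minimal sketch (the data layout is assumed, not taken from the paper) of ranking features by how often they appear in the final GP solutions:

```python
from collections import Counter

def top_features(final_solutions, k=16):
    """Count how often each feature index appears across the final trees of
    all GP runs and return the k most frequently selected ones."""
    counts = Counter(f for solution in final_solutions for f in set(solution))
    return [feat for feat, _ in counts.most_common(k)]

# Example: each run's final solution is the set of feature indices its tree uses
runs = [{3, 17, 42}, {3, 42, 101}, {17, 42}]
print(top_features(runs, k=2))   # e.g., [42, 3]
```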

  20. Naive Bayesian Classification
      - Assign the class C_j with the highest probability given the observation (feature vector) X.
      - The posterior can be estimated using Bayes' rule:

        p(C_j | X) = p(X | C_j) P(C_j) / p(X)

      - Under the naive assumption, the class-conditional distributions can be expressed as:

        p(X | C_j) = prod_i p(x_i | C_j)
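A minimal sketch of this decision rule in log form; the per-feature density estimators are supplied externally (see the estimation methods on the next slide):

```python
import numpy as np

def naive_bayes_predict(x, priors, feature_pdfs):
    """Pick the class with the highest posterior under the naive assumption.

    priors:       {class_label: P(C_j)}
    feature_pdfs: {class_label: [pdf_0, pdf_1, ...]}, one 1-D density per feature
    """
    log_post = {}
    for c, prior in priors.items():
        # log P(C_j) + sum_i log p(x_i | C_j); p(X) is a common factor and can be ignored
        log_post[c] = np.log(prior) + sum(np.log(pdf(xi) + 1e-300)
                                          for pdf, xi in zip(feature_pdfs[c], x))
    return max(log_post, key=log_post.get)
```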

  21. PDF Estimation Methods
      - Uniform Bins: a histogram with N uniformly spaced intervals (bins) is computed for each feature and each class using the training data. The optimum value of N was found by exhaustive search.
      - Optimal Threshold: similar to uniform bins with N = 2, but the cutoff threshold between the two bins is chosen separately for each feature.
      - (Naive) Gaussian Assumption: the PDF of each feature and each class is modeled as a 1-D Gaussian density function whose mean and variance are taken as the sample mean and (unbiased) variance of the training data.
      - Gaussian Mixtures: each likelihood function p(X | C_j) is modeled as a weighted sum of multivariate Gaussian densities. The expectation-maximization (EM) algorithm is used to estimate the means, covariance matrices, and weights. We use diagonal covariance matrices and limit the number of mixtures to 3 for the G1 experiments and 2 for the G2 experiments in order to reduce the number of parameters to be estimated.
      - Multivariate Gaussian: each (class-conditional) likelihood function is modeled as a single multivariate Gaussian PDF with a full covariance matrix. Like the GMM, this method does not follow the naive assumption.
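A minimal sketch of two of these per-feature estimators (uniform bins and the 1-D Gaussian); the GMM and full-covariance variants could be built analogously, e.g. with sklearn.mixture.GaussianMixture:

```python
import numpy as np

def uniform_bin_pdf(train_values, n_bins):
    """Histogram-based density estimate with N uniformly spaced bins."""
    hist, edges = np.histogram(train_values, bins=n_bins, density=True)
    def pdf(x):
        i = np.clip(np.searchsorted(edges, x, side='right') - 1, 0, n_bins - 1)
        return hist[i]
    return pdf

def gaussian_pdf(train_values):
    """1-D Gaussian fit using the sample mean and unbiased sample variance."""
    mu, var = np.mean(train_values), np.var(train_values, ddof=1)
    return lambda x: np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Example: one density per feature per class, plugged into naive_bayes_predict above
train = np.random.randn(100)
p = gaussian_pdf(train)
print(p(0.0))
```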

  22. Results

      Male - G1                                   Female - G1
      Method              Acc   Sen   Spec        Method              Acc   Sen   Spec
      Unif Bin (N = 8)    86.7  83.3  88.9        Unif Bin (N = 9)    88.0  85.5  90.6
      Opt Thresh          82.6  82.1  82.9        Opt Thresh          78.6  65.8  91.5
      Gaussian            87.2  88.5  86.3        Gaussian            87.2  91.5  82.9
      GMM                 88.7  87.2  89.7        GMM                 87.6  88.0  87.7
      MVG                 84.1  83.3  84.6        MVG                 85.5  83.8  87.2

      Male - G2                                   Female - G2
      Method              Acc   Sen   Spec        Method              Acc   Sen   Spec
      Unif Bin (N = 2)    90.7  93.3  88.9        Unif Bin (N = 5)    93.3  93.3  93.3
      Opt Thresh          73.3  50.0  88.9        Opt Thresh          86.7  75.6  97.8
      Gaussian            89.3  93.3  86.7        Gaussian            91.1  95.6  86.7
      GMM                 90.7  90.0  91.1        GMM                 88.0  83.3  91.1
      MVG                 86.7  80.0  91.1        MVG                 92.2  86.7  97.8

      Average Improvement: 18.5% (Males), 7.1% (Females)
