Automatic Classification of Fricatives Using t-SNE


  1. Automatic Classification of Fricatives Using t-SNE. Yizhar Lavner (1) and Alex Frid (1,2). (1) Department of Computer Science, Tel-Hai College, Israel; (2) Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel. XXII Annual Pacific Voice Conference, Krakow, Poland, April 2014.

  2. Phoneme analysis • The fricatives were analyzed, and various features (in the time and spectral domains) were computed.

  3. Supervised Learning using t-SNE • t-distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten & Hinton, 2008) • A non-linear method for dimensionality reduction. • t-SNE aims to preserve the local neighborhood structure of a set of data points in a high-dimensional space while mapping them into 2 or 3 dimensions. • Global structure, such as clusters, can also be preserved.

  4. t-SNE – cont.
     • High-dimensional space: distances between data points are converted into pairwise conditional probabilities (similarities, affinities) according to a Gaussian pdf centered at $x_i$:

     $$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

     • Setting: $p_{i|i} = 0$ and $p_{ij} = \dfrac{p_{j|i} + p_{i|j}}{2n}$

     [Figure: Gaussian pdf centered at $x_i$ over the $(x_1, x_2)$ plane, with a neighboring point $x_j$ marked.]
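
The conversion from distances to affinities on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a single fixed `sigma` is used for every point (t-SNE normally tunes each $\sigma_i$ to match a chosen perplexity), and the four 2-D `points` are made-up toy data, not fricative features.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_probabilities(points, sigma=1.0):
    """Return a matrix with entry [i][j] = p_{j|i} (Gaussian centred at x_i)."""
    n = len(points)
    cond = []
    for i in range(n):
        # Unnormalised Gaussian affinities; p_{i|i} is set to 0.
        weights = [
            0.0 if k == i
            else math.exp(-sq_dist(points[i], points[k]) / (2.0 * sigma ** 2))
            for k in range(n)
        ]
        total = sum(weights)
        cond.append([w / total for w in weights])
    return cond

def joint_probabilities(points, sigma=1.0):
    """Symmetrise: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    cond = conditional_probabilities(points, sigma)
    n = len(points)
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

# Toy data (assumption): two tight pairs of points, far from each other.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
P = joint_probabilities(points)
```

With this toy input, nearby points receive a much larger `P[i][j]` than distant ones, and all entries of `P` sum to 1.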

  5. t-SNE – cont.
     • Low-dimensional space: distances between data points are converted into pairwise joint probabilities using a Student-t distribution (1 df):

     $$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

     [Figure: surface plot of the Student-t pdf.]

     • The heavy-tailed distribution gives better optimization and aims at solving the crowding problem.
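
The low-dimensional affinities can be sketched the same way. The embedding `Y` below is hypothetical toy data, and the final two lines simply compare the Student-t kernel with a unit-variance Gaussian kernel at the same distance to illustrate what "heavy-tailed" means here.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def student_t_affinities(embedding):
    """Return q_ij = (1 + |y_i - y_j|^2)^-1, normalised over all pairs k != l."""
    n = len(embedding)
    kernel = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

# Toy embedding (assumption): two close map points and one distant point.
Y = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
Q = student_t_affinities(Y)

# Heavy tail: at distance 10 the Student-t kernel is still about 0.01,
# while a unit-variance Gaussian kernel is vanishingly small.
student_t_kernel = 1.0 / (1.0 + 10.0 ** 2)
gaussian_kernel = math.exp(-(10.0 ** 2) / 2.0)
```

Because moderately distant pairs keep a non-negligible similarity under the Student-t kernel, they can be placed further apart in the map, which is the intuition behind relieving crowding.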

  6. t-SNE – cont.
     • The embedding map points (low-dimensional space) are found by minimizing the cost function (Kullback-Leibler divergence):

     $$C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

     • The gradient of KL:

     $$\frac{\partial C}{\partial y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$

     • Optimization: gradient descent with a momentum term.
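
A minimal sketch of the cost, its gradient, and a momentum update follows. The matrix `P`, the starting embedding `Y0`, and the `learning_rate`, `momentum` and `steps` values are all illustrative assumptions, not the settings used in the talk; practical t-SNE implementations add further refinements that are omitted here.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def q_matrix(Y):
    """Student-t joint probabilities q_ij of the map points Y."""
    n = len(Y)
    kernel = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
               for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

def kl_cost(P, Y):
    """C = sum_ij p_ij * log(p_ij / q_ij), skipping zero entries of P."""
    Q = q_matrix(Y)
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0.0)

def gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j) / (1 + |y_i - y_j|^2)."""
    Q = q_matrix(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 4.0 * (P[i][j] - Q[i][j]) / (1.0 + sq_dist(Y[i], Y[j]))
                for c in range(dim):
                    grad[i][c] += coeff * (Y[i][c] - Y[j][c])
    return grad

def descend(P, Y, steps=300, learning_rate=0.1, momentum=0.5):
    """Plain gradient descent with a momentum (velocity) term."""
    Y = [list(y) for y in Y]
    velocity = [[0.0] * len(Y[0]) for _ in Y]
    for _ in range(steps):
        grad = gradient(P, Y)
        for i in range(len(Y)):
            for c in range(len(Y[0])):
                velocity[i][c] = momentum * velocity[i][c] - learning_rate * grad[i][c]
                Y[i][c] += velocity[i][c]
    return Y

# Hypothetical symmetric P (sums to 1): points 0-1 and 2-3 are close pairs.
P = [[0.0, 0.2, 0.025, 0.025],
     [0.2, 0.0, 0.025, 0.025],
     [0.025, 0.025, 0.0, 0.2],
     [0.025, 0.025, 0.2, 0.0]]
Y0 = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [1.5, 0.2]]
cost_before = kl_cost(P, Y0)
Y_final = descend(P, Y0)
cost_after = kl_cost(P, Y_final)
```

Comparing `cost_before` with `cost_after` shows the divergence shrinking as the two high-affinity pairs are pulled together and the pairs themselves are pushed apart.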

  7. t-SNE • [3D images / movie] Classes shown: /s/, /ʃ/, /f/, /θ/.

  8. Classification using t-SNE • Pipeline: speech frames → feature vectors (d=24) → t-SNE (parameters: d=(2,3,…), perplexity) → mapped vectors (d=3) → kNN / majority vote (parameter: k) → /s/, /ʃ/, /f/, /θ/ • 25,000 feature vectors (each from one 8 ms frame). • Parameter selection based on preliminary experiments (perplexity = 5-10, k = 7-9, d = 3). • Cross-validation: 100 runs, 80% train, 20% test.
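
The last stage of this pipeline, kNN on the mapped vectors with repeated 80/20 splits, might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the 3-D points are synthetic Gaussian clusters rather than t-SNE output, the class names `s`, `sh`, `f`, `th` stand in for the four fricatives, and the slides do not say how test frames were mapped into the embedding, so that step is left out.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def knn_predict(train_x, train_y, query, k=7):
    """Label of `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: sq_dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_validate(x, y, runs=100, train_fraction=0.8, k=7, seed=0):
    """Mean correct rate over `runs` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    rates = []
    for _ in range(runs):
        rng.shuffle(indices)
        cut = int(train_fraction * len(indices))
        train, test = indices[:cut], indices[cut:]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_x, train_y, x[i], k) == y[i]
                      for i in test)
        rates.append(correct / len(test))
    return sum(rates) / len(rates)

# Synthetic stand-in for mapped 3-D vectors (assumption, not real data).
rng = random.Random(1)
centers = {"s": (0.0, 0.0, 0.0), "sh": (5.0, 0.0, 0.0),
           "f": (0.0, 5.0, 0.0), "th": (0.0, 0.0, 5.0)}
x, y = [], []
for label, center in centers.items():
    for _ in range(25):
        x.append(tuple(c + rng.gauss(0.0, 0.5) for c in center))
        y.append(label)

mean_rate = cross_validate(x, y)
```

The toy clusters are deliberately well separated, so `mean_rate` says nothing about real fricative data; the sketch only shows the shape of the evaluation loop.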

  9. Results
     • Before dimension reduction (kNN with d=24): frame correct rate = 76.8%.
     • After mapping into 3-D using t-SNE, with kNN: frame correct rate = 73.6%.

     | Detected as \ Fricative | /s/   | /ʃ/   | /f/   | /θ/   |
     |-------------------------|-------|-------|-------|-------|
     | /s/                     | 83.8% | 5.3%  | 11.0% | 8.2%  |
     | /ʃ/                     | 2.6%  | 85.7% | 1.8%  | 12.8% |
     | /f/                     | 10.5% | 3.0%  | 69.8% | 35.6% |
     | /θ/                     | 3.1%  | 6.0%  | 17.4% | 43.4% |
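
As a quick arithmetic check on the confusion table of this slide, the sketch below reads each column as one actual fricative and each row as the detected class, an orientation inferred from the fact that the columns sum to 100%. The value 69.8 is read from the "69. 8%" entry, and the labels `s`, `sh`, `f`, `th` are shorthand for the four fricatives.

```python
labels = ["s", "sh", "f", "th"]

# Rows: detected class; columns: actual fricative (percent of frames).
detected_as = [
    [83.8, 5.3, 11.0, 8.2],
    [2.6, 85.7, 1.8, 12.8],
    [10.5, 3.0, 69.8, 35.6],
    [3.1, 6.0, 17.4, 43.4],
]

# Each actual fricative should be fully accounted for.
column_sums = [sum(row[c] for row in detected_as) for c in range(4)]

# Diagonal entries are the per-class correct rates.
per_class_correct = {labels[c]: detected_as[c][c] for c in range(4)}
unweighted_mean = sum(per_class_correct.values()) / len(labels)

# Largest off-diagonal entry: (percent, actual fricative, detected as).
worst = max((detected_as[r][c], labels[c], labels[r])
            for r in range(4) for c in range(4) if r != c)
```

The unweighted mean of the diagonal is about 70.7%, slightly below the reported frame correct rate of 73.6%; that gap would be consistent with the classes contributing unequal numbers of frames, which the slides do not state. The largest single confusion is /θ/ detected as /f/.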

  10. Some More Results
     • Before dimension reduction (SVM with d=12): frame correct rate = 86.9% (with majority vote).
     • After mapping into 3-D using t-SNE, with SVM: frame correct rate = 89.4%. (Use of majority vote raises the results by 3%.)

     | Fricative \ Detected as | /s/   | /ʃ/   | /f/   | /θ/   |
     |-------------------------|-------|-------|-------|-------|
     | /s/                     | 88.5% | 2.2%  | 6.5%  | 2.8%  |
     | /ʃ/                     | 3.0%  | 90.9% | 2.3%  | 3.8%  |
     | /f/                     | 4.4%  | 0.9%  | 85.8% | 8.9%  |
     | /θ/                     | 0.5%  | 0.0%  | 7.0%  | 92.5% |

     Frid-Lavner (IWSSIP2014)
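
The slides do not spell out how the majority vote is applied. One plausible reading, assumed here purely for illustration, is that the frame-level decisions within a single fricative segment are pooled and the most frequent label is taken for the whole segment. The `segments` below are invented examples, not data from the study.

```python
from collections import Counter

def majority_vote(frame_labels):
    """Most frequent label among the frame-level decisions of one segment."""
    return Counter(frame_labels).most_common(1)[0][0]

# Hypothetical segments: (true fricative, per-frame classifier output).
segments = [
    ("f", ["f", "th", "f", "f", "th", "f", "f"]),
    ("th", ["th", "th", "f", "th", "th"]),
    ("s", ["s", "s", "s", "f", "s", "s"]),
]

# Frame-level correct rate before voting.
frame_correct = sum(label == true for true, frames in segments for label in frames)
frame_total = sum(len(frames) for _, frames in segments)

# Segment-level correct count after voting.
segment_correct = sum(majority_vote(frames) == true for true, frames in segments)
```

In this toy case 14 of 18 frames are correct, yet all three segments are labeled correctly after voting, which illustrates how pooling can absorb isolated frame errors.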
