SLIDE 1

Automatic Classification of Fricatives Using t-SNE

Yizhar Lavner¹ and Alex Frid¹,²

¹Department of Computer Science, Tel-Hai College, Israel

²Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel

XXII Annual Pacific Voice Conference, Krakow, Poland, April 2014

SLIDE 2

Phoneme analysis

  • The fricatives were analyzed and various features (in the time and spectral domains) were computed.
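The slide does not list the 24 features actually used, so as a minimal sketch, here are three common stand-ins (frame energy, zero-crossing rate, spectral centroid); the 16 kHz sampling rate is also an assumption:

```python
import numpy as np

def frame_features(frame, fs=16000):
    """Compute a few illustrative time- and spectral-domain features
    for one speech frame (stand-ins for the 24 features in the talk)."""
    frame = np.asarray(frame, dtype=float)
    # Time domain: frame energy and zero-crossing rate
    energy = np.sum(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    # Spectral domain: centroid of the magnitude spectrum
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return np.array([energy, zcr, centroid])

# Example: one noise-like 8 ms frame at 16 kHz (128 samples)
rng = np.random.default_rng(0)
feats = frame_features(rng.standard_normal(128))
```

Fricatives are noise-like, so features of this kind (energy distribution, zero crossings, spectral shape) are plausible frame descriptors.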

SLIDE 3

Supervised Learning using t-SNE

  • t-distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten & Hinton, 2008).
  • A non-linear method for dimensionality reduction.
  • t-SNE aims at preserving the local neighborhood structure of a set of data points in a high-dimensional space while converting it into 2- or 3-dimensional data.
  • Global structure such as clusters can also be preserved.

SLIDE 4

t-SNE – cont.

  • High dimensional space:

Converting distances between data points into pairwise conditional probabilities (similarities, affinities) according to a Gaussian pdf centered at x_i:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}

Setting:

p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

[Figure: Gaussian pdf centered at x_i, illustrating the similarity of neighboring points x_j.]
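The two equations above can be sketched directly in NumPy (a minimal illustration: sigma is fixed here, whereas in t-SNE each sigma_i is tuned to match a target perplexity):

```python
import numpy as np

def p_conditional(X, sigma):
    """Gaussian conditional probabilities p_{j|i} over the rows of X,
    with one bandwidth sigma_i per point."""
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    P = np.exp(-D / (2 * sigma[:, None] ** 2))
    np.fill_diagonal(P, 0.0)                 # p_{i|i} = 0 by convention
    return P / P.sum(axis=1, keepdims=True)  # each row sums to 1

def p_joint(X, sigma):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    P = p_conditional(X, sigma)
    n = X.shape[0]
    return (P + P.T) / (2 * n)

# Example on random 24-dimensional feature vectors
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 24))
P = p_joint(X, np.ones(10))
```

The symmetrization makes P a proper joint distribution (it sums to 1 over all pairs), which the low-dimensional map is then fitted against.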

SLIDE 5

t-SNE – cont.

  • Low dimensional space:

Converting distances between data points into pairwise joint probabilities using a Student-t distribution (1 degree of freedom):

q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}

  • Better optimization; aims at solving the crowding problem (heavy-tailed distribution).
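The Student-t similarities in the low-dimensional map can be sketched the same way (a minimal NumPy illustration):

```python
import numpy as np

def q_joint(Y):
    """Student-t (1 degree of freedom) joint probabilities q_ij over
    the low-dimensional map points Y."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + D)        # heavy-tailed kernel, no bandwidth needed
    np.fill_diagonal(inv, 0.0)   # q_ii = 0 by convention
    return inv / inv.sum()       # normalize over all pairs

# Example: ten map points in d=3
rng = np.random.default_rng(2)
Q = q_joint(rng.standard_normal((10, 3)))
```

Because the kernel's tail decays polynomially rather than exponentially, moderately distant points keep non-negligible similarity, which is what relieves the crowding problem.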

SLIDE 6

t-SNE – cont.

  • Embedding map points (low-dimensional space) → minimization of the cost function (Kullback-Leibler divergence):

C = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

  • The gradient of the KL divergence:

\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij}) (y_i - y_j) (1 + \|y_i - y_j\|^2)^{-1}

  • Optimization: gradient descent with a momentum term.
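One update of this gradient descent with momentum can be sketched as follows (a minimal illustration; the learning rate eta and momentum alpha are placeholder values, not the ones used in the talk):

```python
import numpy as np

def tsne_step(Y, Y_prev, P, eta=100.0, alpha=0.5):
    """One gradient-descent-with-momentum update of the map points Y,
    minimizing C = KL(P || Q)."""
    # Recompute the Student-t similarities Q for the current map
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + D)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()
    # dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^-1
    PQ = (P - Q) * inv
    grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y
    return Y - eta * grad + alpha * (Y - Y_prev)

# Sanity check: if P already equals Q, the gradient vanishes, so with
# Y_prev = Y the step leaves the map unchanged
rng = np.random.default_rng(3)
Y0 = rng.standard_normal((6, 3))
D0 = np.sum((Y0[:, None, :] - Y0[None, :, :]) ** 2, axis=-1)
inv0 = 1.0 / (1.0 + D0)
np.fill_diagonal(inv0, 0.0)
P0 = inv0 / inv0.sum()
Y1 = tsne_step(Y0, Y0, P0)
```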

SLIDE 7

t-SNE

  • [3-D t-SNE embeddings (images / movie) of the fricative frames, colored by phoneme: /s/, /∫/, /f/, /θ/]

SLIDE 8

Classification using t-SNE

  • 25,000 feature vectors (each from one frame of 8 msec.)
  • Parameter selection based on preliminary experiments (perplexity = 5-10, k = 7-9, d = 3).

  • Cross validation – 100 runs, 80% train, 20% test.

Speech frames → Feature vectors (d=24) → t-SNE (perplexity, d=2,3,…) → Mapped vectors (d=3) → kNN (k) / Majority vote → /s/ /∫/ /f/ /θ/
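The pipeline above can be sketched with scikit-learn on synthetic stand-in data (an illustration only: t-SNE has no out-of-sample transform, so all frames are embedded once before the train/test split; the cluster means, perplexity, and k shown here are placeholder values within the slide's stated ranges):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for 24-dimensional frame features, one cluster
# per fricative class
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=3 * c, size=(50, 24)) for c in range(4)])
y = np.repeat(["s", "sh", "f", "th"], 50)

# Map all frames to d=3 with t-SNE, then split and classify with kNN
Y3 = TSNE(n_components=3, perplexity=8, random_state=0).fit_transform(X)
Y_tr, Y_te, y_tr, y_te = train_test_split(Y3, y, test_size=0.2, random_state=0)
clf = KNeighborsClassifier(n_neighbors=8).fit(Y_tr, y_tr)
acc = clf.score(Y_te, y_te)
```

On real data the 80%/20% split would be repeated over the 100 cross-validation runs mentioned above.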

SLIDE 9

Results

  • Before dimension reduction (kNN with d=24): frames correct rate = 76.8%.
  • After mapping into 3-D using t-SNE, with kNN: frames correct rate = 73.6%.

Fricative    Detected as:
             /s/       /∫/       /f/       /θ/
/s/          83.8%     2.6%      10.5%     3.1%
/∫/          5.3%      85.7%     3.0%      6.0%
/f/          11.0%     1.8%      69.8%     17.4%
/θ/          8.2%      12.8%     35.6%     43.4%

SLIDE 10

Some More Results

  • Before dimension reduction (SVM with d=12): frames correct rate = 86.9% (with majority vote).
  • After mapping into 3-D using t-SNE, with SVM: frames correct rate = 89.4% (majority vote raises the results by 3%).

Fricative    Detected as:
             /s/       /∫/       /f/       /θ/
/s/          88.5%     2.2%      6.5%      2.8%
/∫/          3.0%      90.9%     2.3%      3.8%
/f/          4.4%      0.9%      85.8%     8.9%
/θ/          0.5%      0.0%      7.0%      92.5%
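The majority vote over per-frame decisions can be sketched as (a minimal illustration; how frames are grouped into one fricative token is assumed):

```python
from collections import Counter

def majority_vote(frame_labels):
    """Collapse the per-frame classifier decisions for one fricative
    token into a single label by majority vote."""
    return Counter(frame_labels).most_common(1)[0][0]

# e.g. seven 8 ms frames of one token, mostly classified as /s/
label = majority_vote(["s", "s", "f", "s", "th", "s", "s"])  # -> "s"
```

Pooling several frames per token smooths out individual misclassified frames, which is consistent with the roughly 3% gain reported above.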

Frid-Lavner (IWSSIP2014)