Entropic Affinities: Properties and Efficient Numerical Computation



  1. Entropic Affinities: Properties and Efficient Numerical Computation. Max Vladymyrov and Miguel Carreira-Perpiñán, Electrical Engineering and Computer Science, University of California, Merced (http://eecs.ucmerced.edu). June 18, 2013.

  2. Summary
• Entropic affinities define the affinities so that each point has an effective number of neighbors equal to K.
• First introduced in: G. E. Hinton & S. Roweis, "Stochastic Neighbor Embedding", NIPS 2002.
• They are not in widespread use, even though they work well in a range of problems.
• We study some properties of entropic affinities and give fast algorithms to compute them.

  3. Affinity matrix
Defines a measure of similarity between points in a dataset. Used in:
• Dimensionality reduction: Stochastic Neighbor Embedding, t-SNE, Elastic Embedding, Laplacian Eigenmaps.
• Clustering: mean-shift, spectral clustering.
• Semi-supervised learning.
• and others.
[Figure: a data set and its affinity matrix.]
The performance of these algorithms depends crucially on the affinity construction, which is governed by the bandwidth σ. Common practice is to set σ:
• to a constant, or
• by a rule of thumb (e.g. the distance to the 7th nearest neighbor, Zelnik & Perona, 05).
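
The rule-of-thumb choice is easy to state concretely. Below is a minimal sketch, not the presentation's code, of a Gaussian affinity matrix with per-point bandwidths set to the distance to the 7th nearest neighbor; the symmetric form W_ij = exp(-d_ij^2/(σ_i σ_j)) follows Zelnik & Perona's self-tuning affinities, and the function name and dense pairwise distances are illustrative choices.

```python
import numpy as np

def rule_of_thumb_affinities(X, k=7):
    """Gaussian affinity matrix with per-point bandwidths set by the rule of
    thumb: sigma_i = distance from x_i to its k-th nearest neighbor, and
    W_ij = exp(-d_ij^2 / (sigma_i * sigma_j)) (Zelnik & Perona style)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    d = np.sqrt(sq)
    # k-th nearest neighbor distance (index 0 of each sorted row is the point itself)
    sigma = np.sort(d, axis=1)[:, k]
    W = np.exp(-sq / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)                               # no self-affinities
    return W
```

Entropic affinities replace this fixed-k heuristic by choosing each bandwidth so that the point has a prescribed effective number of neighbors K, as defined on the following slides.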

  4. Motivation: choice of σ
COIL-20: rotations of objects every 5°; the input are 128 × 128 greyscale images.
[Figure: affinity matrices obtained with the rule of thumb (distance to the 7th nearest neighbor, Zelnik & Perona, 05), a constant σ, and entropic affinities.]

  5. Motivation: choice of σ
COIL-20: rotations of objects every 5°; the input are 128 × 128 greyscale images.
Dimensionality reduction with the Elastic Embedding algorithm.
[Figure: embeddings obtained with the rule of thumb (distance to the 7th nearest neighbor, Zelnik & Perona, 05), a constant σ, and entropic affinities.]

  6. Search for a good σ
A good σ should:
• be set separately for every data point;
• take into account the whole distribution of distances.
[Figure: per-point bandwidths σ_n around points x_n, shown for two data sets.]

  7. Entropic affinities
In the entropic affinities, σ is set individually for each point so that the point has a distribution over its neighbors with a fixed perplexity K (Hinton & Roweis, 2002).
• For a dataset x_1, ..., x_N ∈ R^D, consider the distribution of the neighbors of a point x ∈ R^D:
  $p_n(x;\sigma) = \frac{K\big(\|(x - x_n)/\sigma\|^2\big)}{\sum_{k=1}^{N} K\big(\|(x - x_k)/\sigma\|^2\big)}$
  (K(·) denotes the kernel, e.g. Gaussian), i.e. the posterior distribution of a kernel density estimate at x.
• The entropy of this distribution is $H(x,\sigma) = -\sum_{n=1}^{N} p_n(x,\sigma)\,\log p_n(x,\sigma)$.
• The bandwidth σ (or the precision β = 1/σ²) is set from the perplexity K by solving $H(x,\beta) = \log K$.
• A perplexity of K in a distribution p over N neighbors gives the same surprise as if we were to choose among K equiprobable neighbors.
• We define the entropic affinities as the probabilities p = (p_1, ..., p_N) of each point x with respect to its β. These affinities define a random-walk matrix.
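
To make the definition concrete, here is a minimal sketch that computes the neighbor distribution, its entropy and its perplexity for one point. It assumes a Gaussian kernel, so that p_n ∝ exp(-β d_n²) up to a constant rescaling of β; the function name and the numerical-stability shift are illustrative choices, not the presentation's code.

```python
import numpy as np

def neighbor_distribution(sq_dists, beta):
    """Neighbor probabilities p_n, entropy H and perplexity exp(H) for one
    point, given its squared distances to the other points and the precision
    beta = 1/sigma^2 (Gaussian kernel assumed; a constant factor in the
    exponent only rescales beta)."""
    logits = -beta * sq_dists
    logits -= logits.max()                              # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    H = -np.sum(p * np.log(np.maximum(p, 1e-300)))      # Shannon entropy
    return p, H, np.exp(H)                              # perplexity = exp(entropy)
```

Finding the β for which the returned perplexity equals K is exactly the root-finding problem H(x, β) = log K treated on the next slides.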

  8. Entropic affinities: example. [Figure.]

  9. Entropic affinities: properties
For each point we must solve
$H(x_n, \beta_n) \equiv -\sum_{k=1}^{N} p_k(x_n, \beta_n)\,\log p_k(x_n, \beta_n) = \log K.$
• This is a root-finding problem, or a 1D inversion problem: $\beta_n = H_{x_n}^{-1}(\log K)$.
• It has to be solved for every x_n ∈ {x_1, ..., x_N}.
• We can prove that:
  ‣ the root-finding problem is well defined for a Gaussian kernel and has a unique root β_n > 0 for any K ∈ (0, N);
  ‣ the inverse is a uniquely defined, continuously differentiable function for all x_n ∈ R^D and K ∈ (0, N).
[Figure: H(x, β) as a function of log β for K = 30; it decreases monotonically and crosses log K at the unique root β*.]
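
The uniqueness claim rests on H(x, β) decreasing monotonically in β, which is easy to check numerically. The snippet below is an illustrative check, reusing the neighbor_distribution helper from the earlier sketch; the random distances and grid of β values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sq_dists = np.sort(rng.random(200))     # squared distances of one point to 200 others
betas = np.logspace(-2, 3, 400)         # sweep beta over several decades
H = np.array([neighbor_distribution(sq_dists, b)[1] for b in betas])
assert np.all(np.diff(H) < 0)           # entropy decreases monotonically in beta
# Hence H(x, beta) = log K has exactly one root for any admissible perplexity K
# (i.e. any log K within the range of H).
```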

  10. Entropic affinities: bounds
Bounds $[\beta_L, \beta_U]$ on the root, for every $x_n \in \mathbb{R}^D$ and $K \in (0, N)$:
$\beta_L = \max\!\left(\frac{N \log(N/K)}{(N-1)\,\Delta_N^2},\ \sqrt{\frac{\log(N/K)}{d_N^4 - d_1^4}}\right), \qquad \beta_U = \frac{1}{\Delta_2^2}\,\log\!\left(\frac{p_1}{1-p_1}\,(N-1)\right),$
where $\Delta_2^2 = d_2^2 - d_1^2$ and $\Delta_N^2 = d_N^2 - d_1^2$ (with $d_1 \le d_2 \le \dots \le d_N$ the point's distances), and $p_1$ is the unique solution of the equation
$2(1-p_1)\,\log\frac{N}{2(1-p_1)} = \log\min\!\big(\sqrt{2N},\, K\big).$
The bounds are computed in O(1) for each point.
[Figure: H(x, β) vs. log β, with the root β* bracketed by β_L and β_U.]

  11. Entropic affinities: computation
For every x_n ∈ {x_1, ..., x_N}, solve $H(x_n, \beta_n) = \log K$:
1. Initialize β_n as close to the root as possible.
2. Compute the root β_n.
[Figure: the data points x_1, x_2, ..., x_N.]

  12. 1. Computation of β_n: the root-finding
Method      Type               Convergence order   Derivatives   Number of O(N) evaluations
Bisection   derivative-free    linear              0             1
Brent       derivative-free    linear              0             1
Ridder      derivative-free    quadratic           0             2
Newton      derivative-based   quadratic           1             2
Halley      derivative-based   cubic               2             3
Euler       derivative-based   cubic               2             3
• The cost of each objective-function evaluation, and of each derivative, is O(N).
• The derivative-free methods above generally converge globally; they work by iteratively shrinking an interval that brackets the root.
• Derivative-based methods have a higher convergence order, but may diverge.
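
As a concrete instance of the derivative-free route, the sketch below hands the problem to Brent's method via SciPy. The wide generic bracket and the reuse of the neighbor_distribution helper from the earlier sketch are assumptions for illustration; the per-point bounds of the previous slides would give a much tighter bracket.

```python
import numpy as np
from scipy.optimize import brentq

def beta_brent(sq_dists, K, bracket=(1e-10, 1e10)):
    """Solve H(beta) = log K for one point with Brent's method.
    Assumes log K lies strictly between H at the two bracket endpoints,
    so that f changes sign over the bracket."""
    f = lambda beta: neighbor_distribution(sq_dists, beta)[1] - np.log(K)
    return brentq(f, *bracket)
```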

  13. Robustified root-finding algorithm
• We embed the derivative-based method in a bisection loop to obtain global convergence.
• We run the following algorithm for each x_n ∈ {x_1, ..., x_N}:

  Input: initial β, perplexity K, distances d²_1, ..., d²_N, bounds B.
  while true do
    for k = 1 to maxit do
      compute β using a derivative-based method
      if tolerance achieved, return β
      if β ∉ B, exit the for loop
      update B
    end for
    compute β using bisection iterations
    update B
  end while

[Figure: H(β) vs. log β, with the iterates converging to the root where H(β) = log K.]
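
The following is a minimal Python sketch of this safeguarded scheme under stated assumptions: a Gaussian kernel (p_n ∝ exp(-β d_n²)), for which dH/dβ = -β Var_p(d²), and a simplified control flow that takes a single bisection step whenever the Newton step leaves the current bracket, rather than a separate inner loop. It illustrates the idea; it is not the authors' implementation.

```python
import numpy as np

def find_beta(sq_dists, K, beta0=1.0, bracket=(1e-10, 1e10),
              tol=1e-10, max_iter=100):
    """Solve H(beta) = log K for one point: Newton steps on
    f(beta) = H(beta) - log K, safeguarded by a shrinking bisection bracket
    (Gaussian kernel assumed; bracket and tolerances are illustrative)."""
    logK = np.log(K)
    lo, hi = bracket
    beta = beta0
    for _ in range(max_iter):
        # neighbor distribution and entropy at the current beta
        logits = -beta * sq_dists
        logits -= logits.max()
        p = np.exp(logits)
        p /= p.sum()
        H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(H - logK) < tol:
            return beta
        # H decreases in beta, so keep the bracket consistent with the sign of f
        if H > logK:
            lo = beta          # beta too small: root lies to the right
        else:
            hi = beta          # beta too large: root lies to the left
        # Newton step; for the Gaussian kernel dH/dbeta = -beta * Var_p(d^2)
        m = np.dot(p, sq_dists)
        dH = -beta * (np.dot(p, sq_dists ** 2) - m ** 2)
        newton = beta - (H - logK) / dH if dH != 0.0 else np.inf
        # accept the Newton step only if it stays inside the bracket,
        # otherwise fall back to a bisection step
        beta = newton if lo < newton < hi else 0.5 * (lo + hi)
    return beta
```

In the presentation's setting, the bracket would be the per-point bounds [β_L, β_U] of slide 10, and beta0 would come from the initialization strategies of slide 18.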

  14–17. Robustified root-finding algorithm (illustration). These four slides step through the algorithm of slide 13 on an example plot of H(β) vs. log β: when a Newton step lands outside the current bracket B ("Bisection: step is outside the brackets"), the algorithm falls back to a bisection step that shrinks B; the subsequent normal Newton steps then converge to the root.

  18. 2. Initialization of β_n
1. Simple initialization:
   • the midpoint of the bounds,
   • the distance to the kth nearest neighbor.
   These are typically far from the root and require more iterations.
2. Initialize each new β_n from the solution of its predecessor:
   • sequential order;
   • tree order.
   We need to find orders that are correlated with the behavior of β (see the sketch below).
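
A minimal sketch of the warm-start idea, reusing the find_beta routine from the earlier sketch; the default sequential order and the initial guess for the first point are assumptions for illustration.

```python
import numpy as np

def all_betas_warm_start(sq_dists_matrix, K, order=None):
    """Compute beta_n for every point, initializing each root-finder from the
    solution of the previous point in the given order (sequential order by
    default). Uses find_beta from the sketch above."""
    N = sq_dists_matrix.shape[0]
    order = np.arange(N) if order is None else order
    betas = np.empty(N)
    beta = 1.0                                   # simple guess for the first point
    for n in order:
        d2 = np.delete(sq_dists_matrix[n], n)    # squared distances to the other points
        beta = find_beta(d2, K, beta0=beta)      # warm start from the predecessor
        betas[n] = beta
    return betas
```

The point of choosing a good order is that consecutive points should have similar β_n, so the warm start is already close to the root and few iterations are needed.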

