Scaling Methods to Obtain Doubly Stochastic Matrices


  1. Scaling Methods to Obtain Doubly Stochastic Matrices. Krishna Acharya, Nacim Oijid. December 10, 2019.

  2. Overview: 1. Definition of the problem 2. Applications 3. A natural algorithm: Sinkhorn and Knopp 4. Conditions on A 5. Types of convergence analysis 6. Knight 2008 7. Livne and Golub 8. Brief description of Knight and Ruiz 9. Newton's method 10. Splitting method 11. Some key results from KR 2013 12. Numerical comparison

  3. Balancing a matrix: Given a nonnegative square matrix A, find diagonal matrices X and Y such that XAY is doubly stochastic.

  4. Preconditioning linear systems: We are interested in z, the solution of Az = b. Let A1 = XAY be doubly stochastic; we can then solve A1 z1 = Xb and recover z = Y z1. The scaled system A1 z1 = Xb may be more numerically stable, which is why this scaling helps. Note also that A and A1 have the same sparsity pattern.
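  A minimal MATLAB sketch of the recovery step, with made-up diagonal scalings X and Y (illustrative only; in practice X and Y come from a balancing routine):

      A = [4 1; 2 3];
      b = [1; 2];
      X = diag([1/5, 1/5]);     % hypothetical diagonal scaling
      Y = diag([1, 1]);         % hypothetical diagonal scaling
      A1 = X*A*Y;               % scaled matrix, same sparsity as A
      z1 = A1 \ (X*b);          % solve A1*z1 = X*b
      z  = Y*z1;                % recover z, the solution of A*z = b
      disp(norm(A*z - b))       % ~ 0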

  5. Back to balancing: Find positive vectors r and c such that D(r) A D(c) is doubly stochastic. Equivalently, find positive vectors r and c such that D(r) A D(c) e = e and D(c) A^T D(r) e = e, that is, r = D(Ac)^{-1} e and c = D(A^T r)^{-1} e.

  6. Iterative method: The fixed point above leads to an iterative method, where c_{k+1} = D(A^T r_k)^{-1} e and r_{k+1} = D(A c_{k+1})^{-1} e. In MATLAB: c_{k+1} = 1./(A' * r_k), r_{k+1} = 1./(A * c_{k+1}). Starting from r_0 = e corresponds to the Sinkhorn and Knopp algorithm (1967). Let's see a run on
      A = [ 3 1 0
            1 2 0
            2 0 1 ]
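  A minimal MATLAB sketch of that run (the 3x3 matrix is our reading of the slide's garbled layout; note this A has support but not total support, since A(3,1) lies on no nonzero diagonal, so the iteration need not converge — foreshadowing the next slides):

      A = [3 1 0; 1 2 0; 2 0 1];
      r = ones(3,1);                 % r_0 = e
      for k = 1:100
          c = 1 ./ (A' * r);         % c_{k+1} = D(A^T r_k)^{-1} e
          r = 1 ./ (A * c);          % r_{k+1} = D(A c_{k+1})^{-1} e
      end
      P = diag(r) * A * diag(c);     % doubly stochastic iff the method converged
      disp([sum(P,1); sum(P,2)'])    % compare row and column sums to 1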

  7. Important questions: Do the c_k and r_k converge to c and r? For which nonnegative A is this possible? Is the convergence linear, quadratic, ...?

  8. Some relevant definitions: For A ∈ R^{n×n}, a collection of n elements of A is called a diagonal of A, provided no two of the elements belong to the same row or column of A. A nonzero diagonal of A is a diagonal not containing any 0's. If G is the bipartite graph whose adjacency matrix is A, then nonzero diagonals correspond to the perfect matchings of G.

  9. Matrices for which balancing is possible: A matrix has support if it has at least one nonzero diagonal. A (0-1) matrix A has total support provided each of its 1's belongs to a nonzero diagonal. That is, if A_ij ≠ 0, we can find a permutation B of the rows of A which puts A_ij on the leading diagonal and for which |b_kk| > 0 for all k.

  10. SK theorem: Theorem. If A ∈ R^{n×n} is nonnegative, then a necessary and sufficient condition that there exists a doubly stochastic matrix P of the form DAE, where D and E are diagonal matrices with positive main diagonals, is that A has total support. If P exists, then it is unique. D and E are also unique up to a scalar multiple iff A is fully indecomposable. Theorem. A necessary and sufficient condition that the SK algorithm converges is that A has total support.

  11. Convergence analysis: There are many papers on the convergence of r_k and c_k. The main idea in these papers is to show that on each iteration a special function (e.g., entropy, log barrier, permanent) decreases; for a matrix that can be balanced, a lower bound is known, and it corresponds to the doubly stochastic case. Lines of work include: convex optimization for much faster scaling (Cohen), log-barrier methods (Kalantari and Khachiyan 1996), entropy minimization, connections to the permanent of a matrix, and topological methods.

  12. Linear convergence for symmetric A, Knight 2008: Considers the case where A is symmetric. The SK algorithm can be applied; we look for a diagonal matrix D such that DAD is doubly stochastic. The symmetric analogue of the SK fixed point is x = D(Ax)^{-1} e, with iterative step x_k = D(A x_{k-1})^{-1} e.

  13. Rate of convergence analysis, Knight 2008: Theorem. If A is fully indecomposable then the SK algorithm will converge linearly to vectors r* and c*, such that D(r*) A D(c*) = P, where P is doubly stochastic. Furthermore, there exists K ∈ Z such that for k ≥ K,
      || [r_{k+1}; c_{k+1}] − [r*; c*] || ≤ σ_2^2 || [r_k; c_k] − [r*; c*] ||,
  where σ_2 is the second singular value of P.
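  A hedged MATLAB experiment (ours, not from the paper) to observe the linear rate: an entrywise-positive matrix is fully indecomposable, so the theorem applies, and the error ratio should approach σ_2^2.

      n = 8;
      A = rand(n) + 0.1;                     % entrywise positive
      r = ones(n,1);
      for k = 1:500                          % long run to get reference r*, c*
          c = 1 ./ (A' * r);  r = 1 ./ (A * c);
      end
      rs = r;  cs = c;
      P = diag(rs) * A * diag(cs);
      s = svd(P);                            % s(1) = 1 since P is doubly stochastic
      r = ones(n,1);  e_prev = Inf;
      for k = 1:15
          c = 1 ./ (A' * r);  r = 1 ./ (A * c);
          e = norm([r; c] - [rs; cs]);
          fprintf('k=%2d  ratio=%.4f  sigma2^2=%.4f\n', k, e/e_prev, s(2)^2);
          e_prev = e;
      end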

  14. What LG 2004 achieve: They tackle the problem of bi-normalization and provide an iterative algorithm, BIN, for scaling all rows and columns of a real symmetric matrix to unit 2-norm.

  15. Important notions, LG 2004: Ã = DAD, B_ij = A_ij^2. We want
      Σ_j Ã_ij^2 = Σ_j d_i^2 A_ij^2 d_j^2 = c   ∀ i ∈ [n].
  We are looking for a positive solution x̃ to B(x) x = b e, where b is a positive real and B(x) = D(Bx). Equivalently (used in the GS method), we are looking for a positive solution to
      (I_n − (1/n) e e^T) B(x) x = b e − (1/n) e e^T b e.

  16. Existence and uniqueness: A matrix is defined to be scalable if B(x) x = b e has a positive solution. What we are trying to solve is an n × n system of quadratic equations in x_1, ..., x_n (with positive x); a solution does not always exist. Theorem. If a matrix is scalable, then either (a) it is not diluted, or (b) n = 2 and
      A = [ 0 a
            a 0 ].
  A is diluted if ∃ i ∈ [n] such that A_ii ≠ 0 and A_kj = 0 for all k ≠ i and j ≠ i.

  17. BIN Algorithm:
      Algorithm 1: BIN
      Input: A, an n × n real symmetric matrix; d^0 ∈ R^n; TOL (tolerance)
      Output: Ã, d ∈ R^n
      x = ((d^0_1)^2, ..., (d^0_n)^2)^T;
      B = A ∘ A;
      Do update rule;
      while s(x) > TOL · β̄(x) and sweeps < MaxSweeps do
          for i ← 1 to n do
              Solve equation i for x_i, keeping all other x_j fixed at current values;
          end
          Do update rule;
      end
      Update rule: d_i = β̄^{-1/2} x_i ∀ i ∈ [n]; Ã = DAD;
      s(x) = ( (1/n) Σ_k (x_k β_k − β̄)^2 )^{1/2}
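  A hedged MATLAB sketch of one reading of the inner Gauss-Seidel step (ours, with b = 1). Equation i of B(x) x = b e is B_ii x_i^2 + (Σ_{j≠i} B_ij x_j) x_i − b = 0, solved for its positive root with the other x_j frozen; the stopping test below uses the residual of B(x) x = b e rather than the slide's s(x):

      function [Atil, d] = bin_sketch(A, maxSweeps, tol)
          n = size(A,1);
          B = A .* A;                    % B_ij = A_ij^2
          x = ones(n,1);                 % x_i = d_i^2, i.e. d^0 = e
          b = 1;
          for sweep = 1:maxSweeps
              for i = 1:n
                  s = B(i,:)*x - B(i,i)*x(i);    % sum over j ~= i of B_ij x_j
                  if B(i,i) > 0                  % quadratic: take positive root
                      x(i) = (-s + sqrt(s^2 + 4*B(i,i)*b)) / (2*B(i,i));
                  else                           % A_ii = 0: equation is linear
                      x(i) = b / s;
                  end
              end
              if norm(x .* (B*x) - b) < tol      % residual of B(x)x = b*e
                  break
              end
          end
          d = sqrt(x);
          Atil = diag(d) * A * diag(d);          % rows/columns near unit 2-norm
      end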

  18.-19. Notations: Let D be the operator that transforms a vector into the diagonal matrix of its elements:
      D : x = (x_1, x_2, ..., x_n)^T ↦ D(x) = diag(x_1, x_2, ..., x_n),
  i.e., D(x) carries x_1, ..., x_n on its main diagonal and zeros elsewhere. Let e represent the all-ones vector (1, ..., 1)^T.

  20.-23. Quick recall on the iterative method: We are looking for two vectors r and c such that D(r) A D(c) is doubly stochastic. That means D(r) A c = e and D(c) A^T r = e, so we have the following:
      c = D(A^T r)^{-1} e   (1)
      r = D(A c)^{-1} e     (2)
  Iterative method: c_{k+1} = D(A^T r_k)^{-1} e, r_{k+1} = D(A c_{k+1})^{-1} e. If A is symmetric, then by uniqueness of r and c we have r = c, so the unique solution x* satisfies
      f(x*) = D(x*) A x* − e = 0   (3)
  The iterative step would be x_{k+1} = D(A x_k)^{-1} e, but we do not run these iterations, as that would just be SK. Instead we go back to equation (3) and solve it with Newton's method.

  24. Rows/columns scaling:
      Algorithm 2: Rows/columns scaling
      Input: an m × n matrix A
      A^(0) ← A; D^(0) ← I_m; E^(0) ← I_n;
      for k ← 0 to convergence do
          R ← D( sqrt(||r_i^(k)||) );     (r_i^(k): i-th row of A^(k))
          C ← D( sqrt(||c_j^(k)||) );     (c_j^(k): j-th column of A^(k))
          A^(k+1) ← R^{-1} A^(k) C^{-1};
          D^(k+1) ← D^(k) R^{-1};
          E^(k+1) ← E^(k) C^{-1};
      end
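  A hedged MATLAB sketch of this loop (the slide does not name the norm; the infinity norm is the common choice, and we assume no zero rows or columns):

      function [Ak, D, E] = ruiz_sketch(A, iters)
          [m, n] = size(A);
          Ak = A;  D = eye(m);  E = eye(n);
          for k = 1:iters
              R = diag(sqrt(max(abs(Ak), [], 2)));   % sqrt of row norms
              C = diag(sqrt(max(abs(Ak), [], 1)));   % sqrt of column norms
              Ak = R \ Ak / C;                       % A^(k+1) = R^{-1} A^(k) C^{-1}
              D  = D / R;                            % D^(k+1) = D^(k) R^{-1}
              E  = E / C;                            % E^(k+1) = E^(k) C^{-1}
          end
      end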

  25. Newton's method: A reminder of Newton's method (presented on the blackboard).

  26.-28. Newton's method: Let J(x) be the Jacobian of f. We have
      J(x) = ∂/∂x ( D(x) A x − e ) = D(x) A + D(Ax).
  Newton's step gives
      x_{k+1} − x_k = −J(x_k)^{-1} f(x_k)   (4)
      x_{k+1} = x_k − ( D(x_k) A + D(A x_k) )^{-1} ( D(x_k) A x_k − e )   (5)
  We can rearrange this equation:
      ( A + D(x_k)^{-1} D(A x_k) ) x_{k+1} = A x_k + D(x_k)^{-1} e   (6)
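  A minimal MATLAB sketch of iteration (6) with exact linear solves (ours; in practice Knight and Ruiz solve these systems inexactly with conjugate gradients):

      function x = newton_balance(A, iters)
          n = size(A,1);
          x = ones(n,1);
          for k = 1:iters
              Ax = A * x;
              M  = A + diag(Ax ./ x);    % A + D(x_k)^{-1} D(A x_k)
              x  = M \ (Ax + 1 ./ x);    % solve (6) for x_{k+1}
          end
      end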
