
Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation

  1. ICML 2014, Beijing
     Scalable Semidefinite Relaxation for Maximum A Posteriori Estimation
     Qixing Huang, Yuxin Chen, and Leonidas Guibas
     Stanford University

  2. Maximum A Posteriori (MAP) Inference
     • Markov Random Field (MRF)
       – $w_i$: potential function for vertex $i$
       – $W_{ij}$: potential function for edge $(i,j)$

  3. Maximum A Posteriori (MAP) Inference
     • Markov Random Field (MRF)
       – $w_i$: potential function for vertex $i$
       – $W_{ij}$: potential function for edge $(i,j)$
     • Maximum A Posteriori (MAP) Inference
       – Find the mode, i.e., the assignment with the lowest energy / potential

  4. A Large Number of Applications ...
     • Computer Vision Applications
       – Image Segmentation
       – Geometric Surface Labeling
       – Photo Montage
       – Scene Decomposition
       – Object Detection
       – Color Segmentation
       – ...
     • Protein Folding
     • Metric Labeling
     • Error-Correcting Codes
     • ...
     (OpenGM benchmark)

  5. Problem Setup
     • Model
       – $n$ vertices $(x_1, \cdots, x_n)$
       – $m$ different states: $x_i \in \{1, \cdots, m\}$
     • Goal:
       maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} w_i(x_i) + \sum_{(i,j) \in \mathcal{G}} W_{ij}(x_i, x_j)$   (the negative energy function)
       s.t. $x_i \in \{1, \cdots, m\}$
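As a concrete reading of this objective, here is a minimal sketch (the 3-vertex, 2-state instance, the random potentials, and the 0-indexed states are all illustrative, not from the talk) that evaluates $f$ for a labeling and brute-forces the MAP assignment:

```python
import numpy as np
from itertools import product

# Illustrative MRF: n = 3 vertices, m = 2 states, edges G = {(0,1), (1,2)}.
n, m = 3, 2
rng = np.random.default_rng(0)
w = [rng.standard_normal(m) for _ in range(n)]        # vertex potentials w_i
G = [(0, 1), (1, 2)]
W = {e: rng.standard_normal((m, m)) for e in G}       # edge potentials W_ij

def f(x):
    """Negative energy of labeling x, with x[i] in {0, ..., m-1}."""
    unary = sum(w[i][x[i]] for i in range(n))
    pairwise = sum(W[i, j][x[i], x[j]] for (i, j) in G)
    return unary + pairwise

# MAP inference = maximize f; exhaustive search over all m**n labelings
# is viable only at toy sizes, which is what motivates the relaxation.
x_map = max(product(range(m), repeat=n), key=f)
```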

  6. Matrix Representation
     • Representation of each $x_i$
       – $m$ possible states: $x_i \in \{e_1, e_2, \cdots, e_m\}$ (indicator vectors)

  7. Matrix Representation
     • Representation of each $x_i$
       – $m$ possible states: $x_i \in \{e_1, e_2, \cdots, e_m\}$
     • Representation of Potentials
       – potential on vertices: $w_i \in \mathbb{R}^m$
       – potential on edges: $W_{ij} \in \mathbb{R}^{m \times m}$

  8. Matrix Representation
     • Representation of each $x_i$
       – $m$ possible states: $x_i \in \{e_1, e_2, \cdots, e_m\}$
     • Representation of Potentials
       – potential on vertices: $w_i \in \mathbb{R}^m$
       – potential on edges: $W_{ij} \in \mathbb{R}^{m \times m}$
     • Equivalent Integer Program:
       maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, x_i x_j^\top \rangle$
       s.t. $x_i \in \{e_1, \cdots, e_m\}$
       – Non-convex!
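To make the equivalence tangible: with indicator vectors, the inner products simply look up table entries. A self-contained sanity check (the sizes, states, and potentials below are made up):

```python
import numpy as np

m = 4
rng = np.random.default_rng(1)
w_i = rng.standard_normal(m)            # vertex potential, w_i in R^m
W_ij = rng.standard_normal((m, m))      # edge potential, W_ij in R^{m x m}

def e(k):
    """Standard basis (indicator) vector e_k, 0-indexed."""
    v = np.zeros(m)
    v[k] = 1.0
    return v

a, b = 2, 0                             # example states for vertices i and j
x_i, x_j = e(a), e(b)

# <w_i, x_i> selects the entry w_i(a); <W_ij, x_i x_j^T> selects W_ij(a, b).
assert np.isclose(w_i @ x_i, w_i[a])
assert np.isclose(np.sum(W_ij * np.outer(x_i, x_j)), W_ij[a, b])
```

The non-convexity lives entirely in the combinatorial constraint $x_i \in \{e_1, \cdots, e_m\}$, which the next slides lift and then relax.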

  9. Matrix Representation
     maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, x_i x_j^\top \rangle$
     s.t. $x_i \in \{e_1, \cdots, e_m\}$

  10. Matrix Representation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, x_i x_j^\top \rangle$
      s.t. $x_i \in \{e_1, \cdots, e_m\}$
      • Auxiliary Variable
        $X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{12}^\top & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1n}^\top & X_{2n}^\top & \cdots & X_{nn} \end{bmatrix}$
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $X_{ij} = x_i x_j^\top$, $X_{ii} = x_i x_i^\top = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$

  11. Matrix Representation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, x_i x_j^\top \rangle$
      s.t. $x_i \in \{e_1, \cdots, e_m\}$
      • Auxiliary Variables
        $X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{12}^\top & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1n}^\top & X_{2n}^\top & \cdots & X_{nn} \end{bmatrix}$ and $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $X = xx^\top$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$

  12. Convex Relaxation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $X = xx^\top$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$

  13. Convex Relaxation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $X = xx^\top$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$
      • Semidefinite Relaxation
        maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
        s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$
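Why the PSD constraint is the natural surrogate for $X = xx^\top$: by the Schur complement (a standard fact, spelled out here for completeness),

$\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0 \iff X - x x^\top \succeq 0,$

so every lifted point with $X = xx^\top$ remains feasible while the constraint set becomes convex, and the equality $X = xx^\top$ is recovered precisely when the block matrix has rank one.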

  14. Convex Relaxation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$

  15. Convex Relaxation
      maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
      s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$, $x_i \in \{e_1, \cdots, e_m\}$
      • Relax the constraints $x_i \in \{e_1, \cdots, e_m\}$:
        maximize $f(x_1, \cdots, x_n) := \sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
        s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
             $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$

  16. Our Semidefinite Formulation
      • Final Semidefinite Program (SDR):
        maximize $\sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
        s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
             $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • The target solution $x x^\top = X$ is low-rank and sparse!
      • $O(nm^2)$ linear equality constraints
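For concreteness, here is a compact sketch of SDR in CVXPY (assuming CVXPY with an SDP-capable solver such as SCS is installed; the toy instance and the variable layout are mine, not from the talk). A generic SDP solver only handles small instances; scaling beyond that is exactly what the remaining slides address:

```python
import cvxpy as cp
import numpy as np

# Toy instance: n vertices, m states, edge set G (all illustrative).
n, m = 3, 2
rng = np.random.default_rng(0)
w = [rng.standard_normal(m) for _ in range(n)]
G = [(0, 1), (1, 2)]
W = {e: rng.standard_normal((m, m)) for e in G}

# Z = [[1, x^T], [x, X]] is one (nm+1) x (nm+1) symmetric PSD variable.
N = n * m
Z = cp.Variable((N + 1, N + 1), symmetric=True)
blk = lambda i: slice(1 + i * m, 1 + (i + 1) * m)
x = [Z[blk(i), 0] for i in range(n)]                    # the x_i blocks
X = {(i, j): Z[blk(i), blk(j)] for (i, j) in G}         # the X_ij blocks

obj = sum(w[i] @ x[i] for i in range(n)) \
    + sum(cp.sum(cp.multiply(W[e], X[e])) for e in G)
cons = [Z >> 0, Z[0, 0] == 1]
cons += [Z[blk(i), blk(i)] == cp.diag(x[i]) for i in range(n)]  # X_ii = diag(x_i)
cons += [cp.sum(x[i]) == 1 for i in range(n)]                   # 1^T x_i = 1
cons += [x[i] >= 0 for i in range(n)]
cons += [X[e] >= 0 for e in G]                                  # X_ij >= 0 on edges

cp.Problem(cp.Maximize(obj), cons).solve()
```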

  17. Superiority to Linear Programming Relaxation
      • LP Relaxation:
        $X_{ij} \mathbf{1} = x_i$ $(1 \le i, j \le n)$, $X_{ii} = \mathrm{diag}(x_i)$,
        $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • Semidefinite Relaxation (SDR):
        $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
        $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • Shall we also enforce the marginalization constraints $X_{ij} \mathbf{1} = x_i$, $1 \le i, j \le n$, in SDR? ($\Theta(n^2 m)$ constraints)

  18. Superiority to Linear Programming Relaxation
      • LP Relaxation:
        $X_{ij} \mathbf{1} = x_i$ $(1 \le i, j \le n)$, $X_{ii} = \mathrm{diag}(x_i)$,
        $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • Semidefinite Relaxation (SDR):
        $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
        $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • Shall we also enforce the marginalization constraints $X_{ij} \mathbf{1} = x_i$, $1 \le i, j \le n$, in SDR? ($\Theta(n^2 m)$ constraints)
      • Answer: No!
        Proposition: Any feasible solution to SDR necessarily satisfies $X_{ij} \mathbf{1} = x_i$.
        $O(nm^2)$ vs. $O(n^2 m + nm^2)$ linear equality constraints!
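The slide states the proposition without proof; one way to see it (my sketch, not necessarily the paper's argument) uses a principal submatrix of the PSD constraint. For any $i \ne j$, feasibility of SDR implies

$M = \begin{bmatrix} 1 & x_i^\top & x_j^\top \\ x_i & \mathrm{diag}(x_i) & X_{ij} \\ x_j & X_{ij}^\top & \mathrm{diag}(x_j) \end{bmatrix} \succeq 0 .$

Taking $u = (1, -\mathbf{1}^\top, \mathbf{0}^\top)^\top$ and using $\mathbf{1}^\top x_i = 1$ together with $\mathbf{1}^\top \mathrm{diag}(x_i)\,\mathbf{1} = \mathbf{1}^\top x_i$ gives $u^\top M u = 1 - 2 + 1 = 0$, which for a PSD matrix forces $M u = 0$. The last block row of $M u = 0$ reads $x_j - X_{ij}^\top \mathbf{1} = 0$, i.e. $X_{ij}^\top \mathbf{1} = x_j$; swapping the roles of $i$ and $j$ yields $X_{ij} \mathbf{1} = x_i$. So the $\Theta(n^2 m)$ marginalization constraints come for free.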

  19. ADMM
      • Alternating Direction Method of Multipliers
        – Fast convergence in the first several tens of iterations
      • Semidefinite Relaxation (SDR):
        max $\sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
        s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
             $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$

  20. ADMM
      • Alternating Direction Method of Multipliers
        – Fast convergence in the first several tens of iterations
      • Semidefinite Relaxation (SDR):
        max $\sum_{i=1}^{n} \langle w_i, x_i \rangle + \sum_{(i,j) \in \mathcal{G}} \langle W_{ij}, X_{ij} \rangle$
        s.t. $\begin{bmatrix} 1 & x^\top \\ x & X \end{bmatrix} \succeq 0$, $X_{ii} = \mathrm{diag}(x_i)$,
             $\mathbf{1}^\top x_i = 1$, $x_i \ge 0$, $X_{ij} \ge 0$, $\forall (i,j) \in \mathcal{G}$
      • Generic Formulation:
        max $\langle C, X \rangle$
        s.t. $\mathcal{A}(X) = b$, $\mathcal{B}(X) \ge 0$, $X \succeq 0$
      • $\mathcal{A}$, $\mathcal{B}$, $C$ are all highly sparse!

  21. Scalability?
      • Generic Formulation (with dual variables):
        max $\langle C, X \rangle$
        s.t. $\mathcal{A}(X) = b$   (dual variable $y$)
             $\mathcal{B}(X) \ge 0$   (dual variable $z \ge 0$)
             $X \succeq 0$   (dual variable $S \succeq 0$)
      • $\mathcal{A}$, $\mathcal{B}$, $C$ are all sparse!
        – All operations are fast except ...

  22. Scalability?
      • Generic Formulation (with dual variables):
        max $\langle C, X \rangle$
        s.t. $\mathcal{A}(X) = b$   (dual variable $y$)
             $\mathcal{B}(X) \ge 0$   (dual variable $z \ge 0$)
             $X \succeq 0$   (dual variable $S \succeq 0$)
      • $\mathcal{A}$, $\mathcal{B}$, $C$ are all sparse!
        – All operations are fast except ...
          $X^{(t)} = \left[ X^{(t-1)} - \frac{C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)})}{\mu} \right]_{\succeq 0}$   (projection onto the PSD cone)
      • Eigendecomposition of dense matrices is expensive!
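The bottleneck step in plain NumPy, to make the cost visible (a minimal dense sketch; a full eigendecomposition of an $N \times N$ matrix is $O(N^3)$, with $N = nm$ here):

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone:
    keep the eigenvectors, clip negative eigenvalues at zero."""
    M = (M + M.T) / 2.0                       # guard against numerical asymmetry
    lam, V = np.linalg.eigh(M)                # full eigendecomposition: O(N^3)
    return (V * np.maximum(lam, 0.0)) @ V.T   # V diag(max(lam, 0)) V^T
```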

  23. Accelerated ADMM (SDPAD-LR)
      $X^{(t)} = \left[ X^{(t-1)} - \frac{C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)})}{\mu} \right]_{\succeq 0}$   (projection onto the PSD cone)
      • Recall: the ground truth obeys $\mathrm{rank}(X) = 1$
        – Enforce / exploit the low-rank structure!

  24. Accelerated ADMM (SDPAD-LR)
      $X^{(t)} = \left[ X^{(t-1)} - \frac{C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)})}{\mu} \right]_{\succeq 0}$   (projection onto the PSD cone)
      • Recall: the ground truth obeys $\mathrm{rank}(X) = 1$
        – Enforce / exploit the low-rank structure!
      • Our Strategy:
        – Only keep a rank-$r$ approximation $X^{(t)} \approx Y^{(t)} Y^{(t)\top}$, from the top eigenpairs of
          $\underbrace{Y^{(t-1)} Y^{(t-1)\top}}_{\text{low rank}} - \underbrace{\frac{C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)})}{\mu}}_{\text{sparse}}$

  25. Accelerated ADMM (SDPAD-LR)
      • Our Strategy:
        – Only keep a rank-$r$ approximation $X^{(t)} \approx Y^{(t)} Y^{(t)\top}$, from the top eigenpairs of
          $\underbrace{Y^{(t-1)} Y^{(t-1)\top}}_{\text{low rank}} - \underbrace{\frac{C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)})}{\mu}}_{\text{sparse}}$
      • Numerically fast
        – e.g., via the Lanczos process: $O(nmr^2 + m^2 |\mathcal{G}|)$
      • Empirically, $r \approx 8$
      [Photo: Cornelius Lanczos]
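A sketch of this strategy using SciPy's Lanczos-based eigsh as a stand-in for the paper's implementation (the function and variable names are mine; S plays the role of the sparse term $(C + \mathcal{A}^*(y^{(t)}) - \mathcal{B}^*(z^{(t)}))/\mu$). The key point is that only matrix-vector products are needed, so the dense $N \times N$ argument is never formed:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import LinearOperator, eigsh

def lowrank_psd_step(Y_prev, S, r=8):
    """Rank-r PSD approximation of A = Y_prev @ Y_prev.T - S,
    via Lanczos on the implicit operator v -> Y (Y^T v) - S v."""
    N = Y_prev.shape[0]
    A = LinearOperator((N, N),
                       matvec=lambda v: Y_prev @ (Y_prev.T @ v) - S @ v)
    lam, V = eigsh(A, k=r, which='LA')          # top-r eigenpairs (Lanczos)
    return V * np.sqrt(np.maximum(lam, 0.0))    # Y with Y Y^T = V diag(lam+) V^T

# Illustrative sizes: N = nm; S is symmetric and sparse.
N, r = 500, 8
Y_prev = np.random.default_rng(0).standard_normal((N, r))
S = sparse_random(N, N, density=0.01, random_state=0)
S = (S + S.T) / 2.0                             # symmetrize the sparse term
Y = lowrank_psd_step(Y_prev, S, r)
```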

  26. Benchmark Data Sets
      • Benchmark
        – OPENGM2
        – PIC
        – ORIENT

  27. Benchmark Data Sets
      • Benchmark
        – OPENGM2
        – PIC
        – ORIENT

      Category       Graphs    n        m      # instances   Avg. time
      PIC-Object     full      60       11-21  37            5m32s
      PIC-Folding    mixed     2K       2-503  21            21m42s
      PIC-Align      dense     30-400   20-93  19            37m63s
      GM-Label       sparse    1K       7      324           6m32s
      GM-Char        sparse    5K-18K   2      100           1h13m
      GM-Montage     grid      100K     5,7    3             9h32m
      GM-Matching    dense     19       19     4             2m21s
      ORIENT         sparse    1K       16     10            10m21s

      All problems can be solved within reasonable time!

  28. Empirical Convergence: Example
      • Benchmark: Geometric Surface Labeling (gm275)
        – matrix size: 5201; # constraints: 218,791
        – stopping criterion: duality gap $< 10^{-3}$
