The Kikuchi Hierarchy and Tensor PCA


Alex Wein (Courant Institute, NYU). Joint work with Ahmed El Alaoui (Stanford) and Cris Moore (Santa Fe Institute).


Algorithms for Tensor PCA

Local algorithms: keep track of a "guess" $v \in \mathbb{R}^n$ and locally maximize the log-likelihood $L(v) = \langle Y, v^{\otimes p} \rangle$ (a small sketch of one such method follows after this slide's bullets):
- Gradient descent [Ben Arous-Gheissari-Jagannath '18]
- Tensor power iteration [Richard-Montanari '14]
- Langevin dynamics [Ben Arous-Gheissari-Jagannath '18]
- Approximate message passing (AMP) [Richard-Montanari '14]
These only succeed when $\lambda \gg n^{-1/2}$.
- Recall: the MLE works for $\lambda \sim n^{(1-p)/2}$.
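To make the "local algorithm" picture concrete, here is a minimal sketch (not taken from the talk) of tensor power iteration for $p = 4$, one of the methods listed above. The model normalization (unit-norm spike, i.i.d. $N(0,1)$ noise), the parameter values, and the helper names `make_spiked_tensor` / `tensor_power_iteration` are illustrative assumptions, not the paper's.

```python
# Minimal sketch of tensor power iteration for p = 4 (illustrative assumptions:
# unit-norm planted spike, i.i.d. N(0,1) noise; names are made up here).
import numpy as np

def make_spiked_tensor(n, lam, rng):
    """Return Y = lam * x^{tensor 4} + Gaussian noise, plus the planted unit spike x."""
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    noise = rng.standard_normal((n, n, n, n))
    return lam * np.einsum('i,j,k,l->ijkl', x, x, x, x) + noise, x

def tensor_power_iteration(Y, iters=30, rng=None):
    """Local ascent on L(v) = <Y, v^{tensor 4}>: repeat v <- Y(v, v, v, .) / norm."""
    rng = rng or np.random.default_rng()
    v = rng.standard_normal(Y.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijkl,j,k,l->i', Y, v, v, v)
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(0)
Y, x = make_spiked_tensor(n=15, lam=5000.0, rng=rng)  # deliberately strong signal
v = tensor_power_iteration(Y, rng=rng)
print("overlap |<v, x>| =", abs(v @ x))
# The overlap is close to 1 only because lam is large: from a random start,
# local methods need a much stronger signal than SoS/spectral methods do,
# which is the suboptimality the slide refers to.
```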

Algorithms for Tensor PCA

Sum-of-squares (SoS) and spectral methods:
- SoS semidefinite program [Hopkins-Shi-Steurer '15]
- Spectral SoS [Hopkins-Shi-Steurer '15, Hopkins-Schramm-Shi-Steurer '15]
- Tensor unfolding [Richard-Montanari '14, Hopkins-Shi-Steurer '15]
These are poly-time and succeed when $\lambda \gg n^{-p/4}$.
SoS lower bounds suggest there is no poly-time algorithm when $\lambda \ll n^{-p/4}$ [Hopkins-Shi-Steurer '15, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17].
[Diagram: the $\lambda$ axis with thresholds $n^{(1-p)/2} < n^{-p/4} < n^{-1/2}$, marking the regimes "impossible", "hard (MLE works)", "SoS", and "local".]
Local algorithms (gradient descent, AMP, ...) are suboptimal when $p \ge 3$.

Subexponential-Time Algorithms

Subexponential time means $2^{n^\delta}$ for $\delta \in (0, 1)$.
Tensor PCA has a smooth tradeoff between runtime and statistical power: for $\delta \in (0, 1)$, there is a $2^{n^\delta}$-time algorithm for $\lambda \sim n^{-p/4 + \delta(1/2 - p/4)}$ [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16].
This interpolates between SoS and the MLE (the endpoint arithmetic is checked below):
- $\delta = 0$: poly-time algorithm for $\lambda \sim n^{-p/4}$
- $\delta = 1$: $2^n$-time algorithm for $\lambda \sim n^{(1-p)/2}$
[Diagram: the same $\lambda$-axis picture as on the previous slide.]
In contrast, some problems have a sharp threshold:
- E.g., $\lambda > 1$ is nearly-linear time, while $\lambda < 1$ needs time $2^n$.
For "soft" thresholds (like tensor PCA), BP/AMP can't be optimal.
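Spelling out the interpolation claim (a quick arithmetic check, not additional content from the talk): substituting the endpoint values of $\delta$ into the exponent $-p/4 + \delta(1/2 - p/4)$ gives

\begin{align*}
\delta = 0 &: \quad -\tfrac{p}{4} + 0 \cdot \bigl(\tfrac{1}{2} - \tfrac{p}{4}\bigr) = -\tfrac{p}{4}, && \text{so } \lambda \sim n^{-p/4} \ \text{(the poly-time / SoS threshold)},\\
\delta = 1 &: \quad -\tfrac{p}{4} + \tfrac{1}{2} - \tfrac{p}{4} = \tfrac{1-p}{2}, && \text{so } \lambda \sim n^{(1-p)/2} \ \text{(the MLE threshold)}.
\end{align*}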

Aside: Low-Degree Likelihood Ratio

Recall: there is a $2^{n^\delta}$-time algorithm for $\lambda \sim n^{-p/4 + \delta(1/2 - p/4)}$.
Evidence that this tradeoff is optimal: the low-degree likelihood ratio.
- A relatively simple calculation that predicts the computational complexity of high-dimensional inference problems.
- Arose from the study of SoS lower bounds and pseudo-calibration [Barak-Hopkins-Kelner-Kothari-Moitra-Potechin '16, Hopkins-Steurer '17, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17, Hopkins PhD thesis '18].
- Idea: look for a low-degree polynomial (of $Y$) that distinguishes $\mathbb{P}$ (spiked tensor) from $\mathbb{Q}$ (pure noise):
\[
\max_{f:\ \deg f \le D} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}}
\;\stackrel{?}{=}\;
\begin{cases} O(1) & \Rightarrow \text{``hard''}\\ \omega(1) & \Rightarrow \text{``easy''}\end{cases}
\]
- Take degree-$D$ polynomials as a proxy for $n^{\tilde{\Theta}(D)}$-time algorithms.
For more, see the survey: Kunisky-W.-Bandeira, "Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio", arXiv:1907.11636.
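One detail not spelled out on the slide (standard in the cited survey, stated here as background rather than as part of the talk): the maximization above has a closed form. Writing $L = \frac{d\mathbb{P}}{d\mathbb{Q}}$ for the likelihood ratio and $L^{\le D}$ for its orthogonal projection onto polynomials of degree at most $D$ in $L^2(\mathbb{Q})$,

\[
\max_{f:\ \deg f \le D} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}}
\;=\; \bigl\| L^{\le D} \bigr\|_{L^2(\mathbb{Q})}
\;=\; \sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}\bigl[ L^{\le D}(Y)^2 \bigr]},
\]

with the optimum attained at $f = L^{\le D}$, so the "easy vs. hard" prediction reduces to estimating a single norm.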

Our Contributions

- We give a hierarchy of increasingly powerful BP/AMP-type algorithms: level $\ell$ requires $n^{O(\ell)}$ time. This is analogous to the SoS hierarchy.
- We prove that these algorithms match the performance of SoS, both at the poly-time threshold and along the subexponential-time tradeoff.
- This refines and "redeems" the statistical physics approach to algorithm design.
- Our algorithms and analysis are simpler than in prior work.
- This talk: even-order tensors only.
- Similar results hold for refuting random XOR formulas.

Motivating the Algorithm: Belief Propagation / AMP

General setup: unknown signal $x \in \{\pm 1\}^n$, observed data $Y$.
We want to understand the posterior $\Pr[x \mid Y]$.
Find the distribution $\mu$ over $\{\pm 1\}^n$ minimizing the free energy $\mathcal{F}(\mu) = \mathcal{E}(\mu) - \mathcal{S}(\mu)$:
- "Energy" and "entropy" terms.
- The unique minimizer is $\Pr[x \mid Y]$ (see the identity below).
Problem: describing $\mu$ requires exponentially many parameters.
BP/AMP: keep track only of the marginals $m_i = \mathbb{E}[x_i]$ and minimize a proxy, the Bethe free energy $\mathcal{B}(m)$.
- Locally minimize $\mathcal{B}(m)$ via an iterative update.
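The claim that the posterior is the unique minimizer is the Gibbs variational principle; a standard way to see it (stated generically, since the exact tensor-PCA Hamiltonian is not reproduced on this slide) is the following. If $\Pr[x \mid Y] \propto e^{-H(x)}$ and we set $\mathcal{E}(\mu) = \mathbb{E}_{x \sim \mu}[H(x)]$ with $\mathcal{S}(\mu)$ the Shannon entropy, then

\[
\mathcal{F}(\mu) \;=\; \mathbb{E}_{x \sim \mu}[H(x)] - \mathcal{S}(\mu)
\;=\; \mathrm{KL}\bigl(\mu \,\big\|\, \Pr[\cdot \mid Y]\bigr) - \log Z,
\qquad Z = \sum_{x \in \{\pm 1\}^n} e^{-H(x)},
\]

so $\mathcal{F}$ is minimized exactly when the KL term vanishes, i.e. at $\mu = \Pr[\cdot \mid Y]$.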

Generalized BP and Kikuchi Free Energy

Recall: BP/AMP keeps track of the marginals $m_i = \mathbb{E}[x_i]$ and minimizes the Bethe free energy $\mathcal{B}(m)$.
Natural higher-order variant:
- Keep track of $m_i = \mathbb{E}[x_i]$, $m_{ij} = \mathbb{E}[x_i x_j]$, ..., up to degree $\ell$.
- Minimize the Kikuchi free energy $\mathcal{K}_\ell(m)$ [Kikuchi '51].
There are various ways to locally minimize the Kikuchi free energy:
- Gradient descent
- Generalized belief propagation (GBP) [Yedidia-Freeman-Weiss '03]
- We will use a spectral method based on the Kikuchi Hessian.

The Kikuchi Hessian

Bethe Hessian approach [Saade-Krzakala-Zdeborová '14]:
- Recall: we want to minimize $\mathcal{B}(m)$ with respect to $m = \{m_i\}$.
- There is a trivial "uninformative" stationary point $m^*$ where $\nabla \mathcal{B}(m^*) = 0$.
- Bethe Hessian matrix: $H_{ij} = \dfrac{\partial^2 \mathcal{B}}{\partial m_i\, \partial m_j}\Big|_{m = m^*}$.
- Algorithm: compute the bottom eigenvector of $H$.
- Why: it is the best direction of local improvement.
- This gives a spectral method whose performance is essentially as good as BP for community detection.
Our approach: the Kikuchi Hessian.
- Compute the bottom eigenvector of the Hessian of $\mathcal{K}(m)$ with respect to the moments $m = \{m_i, m_{ij}, \ldots\}$.

The Algorithm

Definition (Symmetric Difference Matrix). Input: an order-$p$ tensor $Y = (Y_U)_{|U| = p}$ (with $p$ even) and an integer $\ell$ in the range $p/2 \le \ell \le n - p/2$. Define the $\binom{n}{\ell} \times \binom{n}{\ell}$ matrix, indexed by $\ell$-subsets of $[n]$,
\[
M_{S,T} = \begin{cases} Y_{S \triangle T} & \text{if } |S \triangle T| = p,\\ 0 & \text{otherwise.}\end{cases}
\]
- This is (approximately) a submatrix of the Kikuchi Hessian.
- Algorithm: compute the leading eigenvalue/eigenvector of $M$ (a small code sketch follows below).
- Runtime: $n^{O(\ell)}$.
- The case $\ell = p/2$ is "tensor unfolding," which is poly-time and succeeds up to the SoS threshold.
- Taking $\ell = n^\delta$ gives an algorithm with runtime $n^{O(\ell)} = 2^{n^{\delta + o(1)}}$.
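As referenced above, here is a minimal dense sketch of the construction for small $n$ (illustrative only: it materializes the full $\binom{n}{\ell} \times \binom{n}{\ell}$ matrix, and it assumes the tensor is supplied as a dictionary mapping sorted $p$-subsets to entries; the function and variable names are made up here, not taken from the paper).

```python
# Minimal sketch: build M_{S,T} = Y_{S symdiff T} * 1{|S symdiff T| = p} over
# ell-subsets of [n], then take its leading eigenpair.  Dense and only viable
# for toy sizes; the talk's n^{O(ell)} runtime refers to the same scaling.
from itertools import combinations
import numpy as np

def symmetric_difference_matrix(Y, n, p, ell):
    subsets = list(combinations(range(n), ell))
    M = np.zeros((len(subsets), len(subsets)))
    for i, S in enumerate(subsets):
        for j, T in enumerate(subsets):
            U = tuple(sorted(set(S) ^ set(T)))  # symmetric difference of S and T
            if len(U) == p:
                M[i, j] = Y[U]
    return M

# Toy example: pure-noise order-4 tensor on n = 8 variables, Kikuchi level ell = 3.
rng = np.random.default_rng(0)
n, p, ell = 8, 4, 3
Y = {U: rng.standard_normal() for U in combinations(range(n), p)}
M = symmetric_difference_matrix(Y, n, p, ell)
eigvals, eigvecs = np.linalg.eigh(M)       # M is symmetric, so eigh applies
print("leading eigenvalue:", eigvals[-1])  # thresholding this is the detection test
```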

Intuition for the Symmetric Difference Matrix

Recall: $M_{S,T} = \mathbb{1}_{|S \triangle T| = p}\, Y_{S \triangle T}$ where $|S| = |T| = \ell$.
Compute the top eigenvector via power iteration: $v \leftarrow Mv$.
- $v \in \mathbb{R}^{\binom{n}{\ell}}$, where $v_S$ is an estimate of $x_S := \prod_{i \in S} x_i$.
Expanding $v \leftarrow Mv$:
\[
v_S \;\leftarrow\; \sum_{T:\ |S \triangle T| = p} Y_{S \triangle T}\, v_T .
\]
- Recall: $Y_{S \triangle T}$ is a noisy measurement of $x_{S \triangle T}$.
- So $Y_{S \triangle T}\, v_T$ is $T$'s opinion about $x_S$.
This is a message-passing algorithm among sets of size $\ell$ (a short consistency check follows below).
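To see why this "aggregation of opinions" is consistent, here is a short counting argument (a sketch, for $x \in \{\pm 1\}^n$ and up to normalization conventions for symmetric tensors; not reproduced from the slides). Take the vector $v$ with entries $v_S = x_S = \prod_{i \in S} x_i$. Since $x_i^2 = 1$, we have $x_S\, x_T = x_{S \triangle T}$, and each $p$-set $U$ arises as $S \triangle T$ in exactly $\binom{p}{p/2}\binom{n-p}{\ell - p/2}$ ways (choose which half of $U$ lies in $S$, then choose the common part $S \cap T$ outside $U$). Hence

\[
v^\top M v
\;=\; \sum_{\substack{|S| = |T| = \ell \\ |S \triangle T| = p}} Y_{S \triangle T}\, x_S\, x_T
\;=\; \binom{p}{p/2} \binom{n-p}{\ell - p/2} \sum_{|U| = p} Y_U\, x_U ,
\]

so up to an explicit combinatorial factor the quadratic form of $M$ at $v$ evaluates $\langle Y, x^{\otimes p} \rangle$: every pair of $\ell$-sets is voting on the same underlying signal.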

Analysis

Simplest statistical task: detection.
- Distinguish between $\lambda = \bar\lambda$ (spiked tensor) and $\lambda = 0$ (noise).
Algorithm: given $Y$, build the matrix $M_{S,T} = \mathbb{1}_{|S \triangle T| = p}\, Y_{S \triangle T}$ and threshold its maximum eigenvalue.
Key step: bound the spectral norm $\|M\|$ when $Y$ has i.i.d. $N(0,1)$ entries.

Theorem (Matrix Chernoff Bound [Oliveira '10, Tropp '10]). Let $M = \sum_i z_i A_i$, where $z_i \sim N(0,1)$ independently and $\{A_i\}$ is a finite sequence of fixed symmetric $d \times d$ matrices. Then, for all $t \ge 0$,
\[
\Pr(\|M\| \ge t) \;\le\; 2d\, e^{-t^2 / 2\sigma^2}, \qquad \text{where } \sigma^2 = \Bigl\| \sum_i A_i^2 \Bigr\| .
\]
In our case, $\sum_i A_i^2$ is a multiple of the identity (see the computation below).
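A sketch of that computation (the paper's exact statement and constants are not reproduced here): write $M = \sum_{|U| = p} Y_U A_U$ with $(A_U)_{S,T} = \mathbb{1}\{S \triangle T = U\}$, so the $z_i$ of the theorem are the i.i.d. Gaussian entries $Y_U$. For a fixed $\ell$-set $S$, the only candidate for $T$ with $S \triangle T = U$ is $T = S \triangle U$, which is again an $\ell$-set iff $|S \cap U| = p/2$; and no single $T$ can satisfy $S \triangle T = U = S' \triangle T$ for $S \ne S'$. Therefore $\sum_U A_U^2$ is diagonal with constant diagonal:

\[
\sum_{|U| = p} A_U^2 \;=\; \binom{\ell}{p/2} \binom{n - \ell}{p/2}\, I,
\qquad \sigma^2 = \binom{\ell}{p/2} \binom{n - \ell}{p/2}, \qquad d = \binom{n}{\ell},
\]

and the tail bound then gives, with high probability, $\|M\| \lesssim \sigma \sqrt{\log d} \le \sqrt{\binom{\ell}{p/2} \binom{n-\ell}{p/2}\, \ell \log n}$ (up to constant factors).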

Comparison to Prior Work

SoS approach: given a noise tensor $Y$, we want to certify (prove) an upper bound on the tensor injective norm
\[
\|Y\|_{\mathrm{inj}} := \max_{\|x\| = 1} \bigl| \langle Y, x^{\otimes p} \rangle \bigr| .
\]
Spectral certification: find an $n^\ell \times n^\ell$ matrix $M$ such that
\[
(x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) = \langle Y, x^{\otimes p} \rangle^{2\ell/p},
\quad\text{and so}\quad \|Y\|_{\mathrm{inj}} \le \|M\|^{p/2\ell} .
\]
- Each entry of $M$ is a degree-$2\ell/p$ polynomial in $Y$.
- Analysis: trace moment method (complicated) [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16].
Our method: instead find $M$ (the symmetric difference matrix) such that
\[
(x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) = \langle Y, x^{\otimes p} \rangle\, \|x\|^{2\ell - p},
\quad\text{and so}\quad \|Y\|_{\mathrm{inj}} \le \|M\| .
\]
- Each entry of $M$ is a degree-1 polynomial in $Y$. (The certification step itself is spelled out below.)
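Spelling out the one-line certification step implicit in both displays above (assuming the stated quadratic-form identities, with $2\ell/p$ an integer as in the constructions described): for any unit vector $x$ we have $\|x^{\otimes \ell}\| = \|x\|^\ell = 1$, so

\[
\|Y\|_{\mathrm{inj}}^{2\ell/p}
\;=\; \max_{\|x\| = 1} \bigl| \langle Y, x^{\otimes p} \rangle \bigr|^{2\ell/p}
\;=\; \max_{\|x\| = 1} \bigl| (x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) \bigr|
\;\le\; \|M\| ,
\]

and for the symmetric difference matrix the same step applied to $\langle Y, x^{\otimes p} \rangle\, \|x\|^{2\ell - p}$ gives $\|Y\|_{\mathrm{inj}} \le \|M\|$ directly, so the whole certification reduces to the spectral-norm bound from the previous slide.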
