The Kikuchi Hierarchy and Tensor PCA


Alex Wein (Courant Institute, NYU). Joint work with Ahmed El Alaoui (Stanford) and Cris Moore (Santa Fe Institute).


Algorithms for Tensor PCA

Local algorithms: keep track of a "guess" $v \in \mathbb{R}^n$ and locally maximize the log-likelihood $L(v) = \langle Y, v^{\otimes p} \rangle$ (a small sketch of one such method follows after this slide's bullets):
- Gradient descent [Ben Arous-Gheissari-Jagannath '18]
- Tensor power iteration [Richard-Montanari '14]
- Langevin dynamics [Ben Arous-Gheissari-Jagannath '18]
- Approximate message passing (AMP) [Richard-Montanari '14]
These only succeed when $\lambda \gg n^{-1/2}$.
- Recall: the MLE works for $\lambda \sim n^{(1-p)/2}$.
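To make the "local algorithm" picture concrete, here is a minimal sketch (not taken from the talk) of tensor power iteration for $p = 4$, one of the methods listed above. The model normalization (unit-norm spike, i.i.d. $N(0,1)$ noise), the parameter values, and the helper names `make_spiked_tensor` / `tensor_power_iteration` are illustrative assumptions, not the paper's.

```python
# Minimal sketch of tensor power iteration for p = 4 (illustrative assumptions:
# unit-norm planted spike, i.i.d. N(0,1) noise; names are made up here).
import numpy as np

def make_spiked_tensor(n, lam, rng):
    """Return Y = lam * x^{tensor 4} + Gaussian noise, plus the planted unit spike x."""
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    noise = rng.standard_normal((n, n, n, n))
    return lam * np.einsum('i,j,k,l->ijkl', x, x, x, x) + noise, x

def tensor_power_iteration(Y, iters=30, rng=None):
    """Local ascent on L(v) = <Y, v^{tensor 4}>: repeat v <- Y(v, v, v, .) / norm."""
    rng = rng or np.random.default_rng()
    v = rng.standard_normal(Y.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijkl,j,k,l->i', Y, v, v, v)
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(0)
Y, x = make_spiked_tensor(n=15, lam=5000.0, rng=rng)  # deliberately strong signal
v = tensor_power_iteration(Y, rng=rng)
print("overlap |<v, x>| =", abs(v @ x))
# The overlap is close to 1 only because lam is large: from a random start,
# local methods need a much stronger signal than SoS/spectral methods do,
# which is the suboptimality the slide refers to.
```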

Algorithms for Tensor PCA

Sum-of-squares (SoS) and spectral methods:
- SoS semidefinite program [Hopkins-Shi-Steurer '15]
- Spectral SoS [Hopkins-Shi-Steurer '15, Hopkins-Schramm-Shi-Steurer '15]
- Tensor unfolding [Richard-Montanari '14, Hopkins-Shi-Steurer '15]
These are poly-time and succeed when $\lambda \gg n^{-p/4}$.
SoS lower bounds suggest there is no poly-time algorithm when $\lambda \ll n^{-p/4}$ [Hopkins-Shi-Steurer '15, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17].
[Diagram: the $\lambda$ axis with thresholds $n^{(1-p)/2} < n^{-p/4} < n^{-1/2}$, marking the regimes "impossible", "hard (MLE works)", "SoS", and "local".]
Local algorithms (gradient descent, AMP, ...) are suboptimal when $p \ge 3$.

Subexponential-Time Algorithms

Subexponential time means $2^{n^\delta}$ for $\delta \in (0, 1)$.
Tensor PCA has a smooth tradeoff between runtime and statistical power: for $\delta \in (0, 1)$, there is a $2^{n^\delta}$-time algorithm for $\lambda \sim n^{-p/4 + \delta(1/2 - p/4)}$ [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16].
This interpolates between SoS and the MLE (the endpoint arithmetic is checked below):
- $\delta = 0$: poly-time algorithm for $\lambda \sim n^{-p/4}$
- $\delta = 1$: $2^n$-time algorithm for $\lambda \sim n^{(1-p)/2}$
[Diagram: the same $\lambda$-axis picture as on the previous slide.]
In contrast, some problems have a sharp threshold:
- E.g., $\lambda > 1$ is nearly-linear time, while $\lambda < 1$ needs time $2^n$.
For "soft" thresholds (like tensor PCA), BP/AMP can't be optimal.
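Spelling out the interpolation claim (a quick arithmetic check, not additional content from the talk): substituting the endpoint values of $\delta$ into the exponent $-p/4 + \delta(1/2 - p/4)$ gives

\begin{align*}
\delta = 0 &: \quad -\tfrac{p}{4} + 0 \cdot \bigl(\tfrac{1}{2} - \tfrac{p}{4}\bigr) = -\tfrac{p}{4}, && \text{so } \lambda \sim n^{-p/4} \ \text{(the poly-time / SoS threshold)},\\
\delta = 1 &: \quad -\tfrac{p}{4} + \tfrac{1}{2} - \tfrac{p}{4} = \tfrac{1-p}{2}, && \text{so } \lambda \sim n^{(1-p)/2} \ \text{(the MLE threshold)}.
\end{align*}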

Aside: Low-Degree Likelihood Ratio

Recall: there is a $2^{n^\delta}$-time algorithm for $\lambda \sim n^{-p/4 + \delta(1/2 - p/4)}$.
Evidence that this tradeoff is optimal: the low-degree likelihood ratio.
- A relatively simple calculation that predicts the computational complexity of high-dimensional inference problems.
- Arose from the study of SoS lower bounds and pseudo-calibration [Barak-Hopkins-Kelner-Kothari-Moitra-Potechin '16, Hopkins-Steurer '17, Hopkins-Kothari-Potechin-Raghavendra-Schramm-Steurer '17, Hopkins PhD thesis '18].
- Idea: look for a low-degree polynomial (of $Y$) that distinguishes $\mathbb{P}$ (spiked tensor) from $\mathbb{Q}$ (pure noise):
\[
\max_{f:\ \deg f \le D} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}}
\;\stackrel{?}{=}\;
\begin{cases} O(1) & \Rightarrow \text{``hard''}\\ \omega(1) & \Rightarrow \text{``easy''}\end{cases}
\]
- Take degree-$D$ polynomials as a proxy for $n^{\tilde{\Theta}(D)}$-time algorithms.
For more, see the survey: Kunisky-W.-Bandeira, "Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio", arXiv:1907.11636.
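One detail not spelled out on the slide (standard in the cited survey, stated here as background rather than as part of the talk): the maximization above has a closed form. Writing $L = \frac{d\mathbb{P}}{d\mathbb{Q}}$ for the likelihood ratio and $L^{\le D}$ for its orthogonal projection onto polynomials of degree at most $D$ in $L^2(\mathbb{Q})$,

\[
\max_{f:\ \deg f \le D} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}}
\;=\; \bigl\| L^{\le D} \bigr\|_{L^2(\mathbb{Q})}
\;=\; \sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}\bigl[ L^{\le D}(Y)^2 \bigr]},
\]

with the optimum attained at $f = L^{\le D}$, so the "easy vs. hard" prediction reduces to estimating a single norm.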

Our Contributions

- We give a hierarchy of increasingly powerful BP/AMP-type algorithms: level $\ell$ requires $n^{O(\ell)}$ time. This is analogous to the SoS hierarchy.
- We prove that these algorithms match the performance of SoS, both at the poly-time threshold and along the subexponential-time tradeoff.
- This refines and "redeems" the statistical physics approach to algorithm design.
- Our algorithms and analysis are simpler than in prior work.
- This talk: even-order tensors only.
- Similar results hold for refuting random XOR formulas.

Motivating the Algorithm: Belief Propagation / AMP

General setup: unknown signal $x \in \{\pm 1\}^n$, observed data $Y$.
We want to understand the posterior $\Pr[x \mid Y]$.
Find the distribution $\mu$ over $\{\pm 1\}^n$ minimizing the free energy $\mathcal{F}(\mu) = \mathcal{E}(\mu) - \mathcal{S}(\mu)$:
- "Energy" and "entropy" terms.
- The unique minimizer is $\Pr[x \mid Y]$ (see the identity below).
Problem: describing $\mu$ requires exponentially many parameters.
BP/AMP: keep track only of the marginals $m_i = \mathbb{E}[x_i]$ and minimize a proxy, the Bethe free energy $\mathcal{B}(m)$.
- Locally minimize $\mathcal{B}(m)$ via an iterative update.
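The claim that the posterior is the unique minimizer is the Gibbs variational principle; a standard way to see it (stated generically, since the exact tensor-PCA Hamiltonian is not reproduced on this slide) is the following. If $\Pr[x \mid Y] \propto e^{-H(x)}$ and we set $\mathcal{E}(\mu) = \mathbb{E}_{x \sim \mu}[H(x)]$ with $\mathcal{S}(\mu)$ the Shannon entropy, then

\[
\mathcal{F}(\mu) \;=\; \mathbb{E}_{x \sim \mu}[H(x)] - \mathcal{S}(\mu)
\;=\; \mathrm{KL}\bigl(\mu \,\big\|\, \Pr[\cdot \mid Y]\bigr) - \log Z,
\qquad Z = \sum_{x \in \{\pm 1\}^n} e^{-H(x)},
\]

so $\mathcal{F}$ is minimized exactly when the KL term vanishes, i.e. at $\mu = \Pr[\cdot \mid Y]$.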

Generalized BP and Kikuchi Free Energy

Recall: BP/AMP keeps track of the marginals $m_i = \mathbb{E}[x_i]$ and minimizes the Bethe free energy $\mathcal{B}(m)$.
Natural higher-order variant:
- Keep track of $m_i = \mathbb{E}[x_i]$, $m_{ij} = \mathbb{E}[x_i x_j]$, ..., up to degree $\ell$.
- Minimize the Kikuchi free energy $\mathcal{K}_\ell(m)$ [Kikuchi '51].
There are various ways to locally minimize the Kikuchi free energy:
- Gradient descent
- Generalized belief propagation (GBP) [Yedidia-Freeman-Weiss '03]
- We will use a spectral method based on the Kikuchi Hessian.

The Kikuchi Hessian

Bethe Hessian approach [Saade-Krzakala-Zdeborová '14]:
- Recall: we want to minimize $\mathcal{B}(m)$ with respect to $m = \{m_i\}$.
- There is a trivial "uninformative" stationary point $m^*$ where $\nabla \mathcal{B}(m^*) = 0$.
- Bethe Hessian matrix: $H_{ij} = \dfrac{\partial^2 \mathcal{B}}{\partial m_i\, \partial m_j}\Big|_{m = m^*}$.
- Algorithm: compute the bottom eigenvector of $H$.
- Why: it is the best direction of local improvement.
- This gives a spectral method whose performance is essentially as good as BP for community detection.
Our approach: the Kikuchi Hessian.
- Compute the bottom eigenvector of the Hessian of $\mathcal{K}(m)$ with respect to the moments $m = \{m_i, m_{ij}, \ldots\}$.

The Algorithm

Definition (Symmetric Difference Matrix). Input: an order-$p$ tensor $Y = (Y_U)_{|U| = p}$ (with $p$ even) and an integer $\ell$ in the range $p/2 \le \ell \le n - p/2$. Define the $\binom{n}{\ell} \times \binom{n}{\ell}$ matrix, indexed by $\ell$-subsets of $[n]$,
\[
M_{S,T} = \begin{cases} Y_{S \triangle T} & \text{if } |S \triangle T| = p,\\ 0 & \text{otherwise.}\end{cases}
\]
- This is (approximately) a submatrix of the Kikuchi Hessian.
- Algorithm: compute the leading eigenvalue/eigenvector of $M$ (a small code sketch follows below).
- Runtime: $n^{O(\ell)}$.
- The case $\ell = p/2$ is "tensor unfolding," which is poly-time and succeeds up to the SoS threshold.
- Taking $\ell = n^\delta$ gives an algorithm with runtime $n^{O(\ell)} = 2^{n^{\delta + o(1)}}$.
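As referenced above, here is a minimal dense sketch of the construction for small $n$ (illustrative only: it materializes the full $\binom{n}{\ell} \times \binom{n}{\ell}$ matrix, and it assumes the tensor is supplied as a dictionary mapping sorted $p$-subsets to entries; the function and variable names are made up here, not taken from the paper).

```python
# Minimal sketch: build M_{S,T} = Y_{S symdiff T} * 1{|S symdiff T| = p} over
# ell-subsets of [n], then take its leading eigenpair.  Dense and only viable
# for toy sizes; the talk's n^{O(ell)} runtime refers to the same scaling.
from itertools import combinations
import numpy as np

def symmetric_difference_matrix(Y, n, p, ell):
    subsets = list(combinations(range(n), ell))
    M = np.zeros((len(subsets), len(subsets)))
    for i, S in enumerate(subsets):
        for j, T in enumerate(subsets):
            U = tuple(sorted(set(S) ^ set(T)))  # symmetric difference of S and T
            if len(U) == p:
                M[i, j] = Y[U]
    return M

# Toy example: pure-noise order-4 tensor on n = 8 variables, Kikuchi level ell = 3.
rng = np.random.default_rng(0)
n, p, ell = 8, 4, 3
Y = {U: rng.standard_normal() for U in combinations(range(n), p)}
M = symmetric_difference_matrix(Y, n, p, ell)
eigvals, eigvecs = np.linalg.eigh(M)       # M is symmetric, so eigh applies
print("leading eigenvalue:", eigvals[-1])  # thresholding this is the detection test
```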

Intuition for the Symmetric Difference Matrix

Recall: $M_{S,T} = \mathbb{1}_{|S \triangle T| = p}\, Y_{S \triangle T}$ where $|S| = |T| = \ell$.
Compute the top eigenvector via power iteration: $v \leftarrow Mv$.
- $v \in \mathbb{R}^{\binom{n}{\ell}}$, where $v_S$ is an estimate of $x_S := \prod_{i \in S} x_i$.
Expanding $v \leftarrow Mv$:
\[
v_S \;\leftarrow\; \sum_{T:\ |S \triangle T| = p} Y_{S \triangle T}\, v_T .
\]
- Recall: $Y_{S \triangle T}$ is a noisy measurement of $x_{S \triangle T}$.
- So $Y_{S \triangle T}\, v_T$ is $T$'s opinion about $x_S$.
This is a message-passing algorithm among sets of size $\ell$ (a short consistency check follows below).
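To see why this "aggregation of opinions" is consistent, here is a short counting argument (a sketch, for $x \in \{\pm 1\}^n$ and up to normalization conventions for symmetric tensors; not reproduced from the slides). Take the vector $v$ with entries $v_S = x_S = \prod_{i \in S} x_i$. Since $x_i^2 = 1$, we have $x_S\, x_T = x_{S \triangle T}$, and each $p$-set $U$ arises as $S \triangle T$ in exactly $\binom{p}{p/2}\binom{n-p}{\ell - p/2}$ ways (choose which half of $U$ lies in $S$, then choose the common part $S \cap T$ outside $U$). Hence

\[
v^\top M v
\;=\; \sum_{\substack{|S| = |T| = \ell \\ |S \triangle T| = p}} Y_{S \triangle T}\, x_S\, x_T
\;=\; \binom{p}{p/2} \binom{n-p}{\ell - p/2} \sum_{|U| = p} Y_U\, x_U ,
\]

so up to an explicit combinatorial factor the quadratic form of $M$ at $v$ evaluates $\langle Y, x^{\otimes p} \rangle$: every pair of $\ell$-sets is voting on the same underlying signal.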

Analysis

Simplest statistical task: detection.
- Distinguish between $\lambda = \bar\lambda$ (spiked tensor) and $\lambda = 0$ (noise).
Algorithm: given $Y$, build the matrix $M_{S,T} = \mathbb{1}_{|S \triangle T| = p}\, Y_{S \triangle T}$ and threshold its maximum eigenvalue.
Key step: bound the spectral norm $\|M\|$ when $Y$ has i.i.d. $N(0,1)$ entries.

Theorem (Matrix Chernoff Bound [Oliveira '10, Tropp '10]). Let $M = \sum_i z_i A_i$, where $z_i \sim N(0,1)$ independently and $\{A_i\}$ is a finite sequence of fixed symmetric $d \times d$ matrices. Then, for all $t \ge 0$,
\[
\Pr(\|M\| \ge t) \;\le\; 2d\, e^{-t^2 / 2\sigma^2}, \qquad \text{where } \sigma^2 = \Bigl\| \sum_i A_i^2 \Bigr\| .
\]
In our case, $\sum_i A_i^2$ is a multiple of the identity (see the computation below).
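A sketch of that computation (the paper's exact statement and constants are not reproduced here): write $M = \sum_{|U| = p} Y_U A_U$ with $(A_U)_{S,T} = \mathbb{1}\{S \triangle T = U\}$, so the $z_i$ of the theorem are the i.i.d. Gaussian entries $Y_U$. For a fixed $\ell$-set $S$, the only candidate for $T$ with $S \triangle T = U$ is $T = S \triangle U$, which is again an $\ell$-set iff $|S \cap U| = p/2$; and no single $T$ can satisfy $S \triangle T = U = S' \triangle T$ for $S \ne S'$. Therefore $\sum_U A_U^2$ is diagonal with constant diagonal:

\[
\sum_{|U| = p} A_U^2 \;=\; \binom{\ell}{p/2} \binom{n - \ell}{p/2}\, I,
\qquad \sigma^2 = \binom{\ell}{p/2} \binom{n - \ell}{p/2}, \qquad d = \binom{n}{\ell},
\]

and the tail bound then gives, with high probability, $\|M\| \lesssim \sigma \sqrt{\log d} \le \sqrt{\binom{\ell}{p/2} \binom{n-\ell}{p/2}\, \ell \log n}$ (up to constant factors).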

Comparison to Prior Work

SoS approach: given a noise tensor $Y$, we want to certify (prove) an upper bound on the tensor injective norm
\[
\|Y\|_{\mathrm{inj}} := \max_{\|x\| = 1} \bigl| \langle Y, x^{\otimes p} \rangle \bigr| .
\]
Spectral certification: find an $n^\ell \times n^\ell$ matrix $M$ such that
\[
(x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) = \langle Y, x^{\otimes p} \rangle^{2\ell/p},
\quad\text{and so}\quad \|Y\|_{\mathrm{inj}} \le \|M\|^{p/2\ell} .
\]
- Each entry of $M$ is a degree-$2\ell/p$ polynomial in $Y$.
- Analysis: trace moment method (complicated) [Raghavendra-Rao-Schramm '16, Bhattiprolu-Guruswami-Lee '16].
Our method: instead find $M$ (the symmetric difference matrix) such that
\[
(x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) = \langle Y, x^{\otimes p} \rangle\, \|x\|^{2\ell - p},
\quad\text{and so}\quad \|Y\|_{\mathrm{inj}} \le \|M\| .
\]
- Each entry of $M$ is a degree-1 polynomial in $Y$. (The certification step itself is spelled out below.)
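Spelling out the one-line certification step implicit in both displays above (assuming the stated quadratic-form identities, with $2\ell/p$ an integer as in the constructions described): for any unit vector $x$ we have $\|x^{\otimes \ell}\| = \|x\|^\ell = 1$, so

\[
\|Y\|_{\mathrm{inj}}^{2\ell/p}
\;=\; \max_{\|x\| = 1} \bigl| \langle Y, x^{\otimes p} \rangle \bigr|^{2\ell/p}
\;=\; \max_{\|x\| = 1} \bigl| (x^{\otimes \ell})^\top M\, (x^{\otimes \ell}) \bigr|
\;\le\; \|M\| ,
\]

and for the symmetric difference matrix the same step applied to $\langle Y, x^{\otimes p} \rangle\, \|x\|^{2\ell - p}$ gives $\|Y\|_{\mathrm{inj}} \le \|M\|$ directly, so the whole certification reduces to the spectral-norm bound from the previous slide.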
