Learning without correspondence
Daniel Hsu
Computer Science Department & Data Science Institute, Columbia University

Introduction

Example #1: unlinked data sources. Two separate data sources about the same entities; the record linkage is unknown.


Least squares problem

Given (x_i)_{i=1}^n from R^d and (y_i)_{i=1}^n from R, minimize

    F(β, π) := ∑_{i=1}^n (x_i^⊤ β − y_{π(i)})².

• Least squares with known correspondence: O(nd²) time.
• Naïve brute-force search over permutations: Ω(|S_n|) = Ω(n!).
• d = 1: O(n log n)-time algorithm (observed by Pananjady, Wainwright, & Courtade, 2016).
• d = Ω(n): (strongly) NP-hard to decide whether min F = 0, via reduction from 3-PARTITION (H., Shi, & Sun, 2017).
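
For concreteness, here is a minimal Python/NumPy sketch (function names are ours) of the objective F and the naïve brute-force baseline that searches over all n! permutations:

    import numpy as np
    from itertools import permutations

    def objective(X, y, beta, pi):
        """F(beta, pi) = sum_i (<x_i, beta> - y_{pi(i)})^2."""
        return np.sum((X @ beta - y[np.asarray(pi)]) ** 2)

    def brute_force(X, y):
        """Naive Omega(n!) search: for each permutation, solve ordinary
        least squares with the correspondence fixed. Feasible only for tiny n."""
        best = (np.inf, None, None)
        for pi in permutations(range(len(y))):
            beta, *_ = np.linalg.lstsq(X, y[np.asarray(pi)], rcond=None)
            cost = objective(X, y, beta, pi)
            if cost < best[0]:
                best = (cost, beta, pi)
        return best  # (min cost, beta_hat, pi_hat)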

Least squares problem (d = 1)

Given (x_i)_{i=1}^n and (y_i)_{i=1}^n from R, minimize

    F(β, π) := ∑_{i=1}^n (x_i β − y_{π(i)})².

Example: with (x_1, y_1) = (3, 2), (x_2, y_2) = (4, 1), ..., (x_n, y_n) = (6, 7), the cost under the identity permutation π(i) = i begins with the terms (3β − 2)² + (4β − 1)² + ··· + (6β − 7)². If β > 0, then the cost can be improved with π(1) = 2 and π(2) = 1:

    25β² − 20β + 5 + ··· > 25β² − 22β + 5 + ··· .

Algorithm for the least squares problem (d = 1) [PWC'16]

1. "Guess" the sign of the optimal β. (Only two possibilities.)
2. Assuming WLOG that x_1 β ≤ x_2 β ≤ ··· ≤ x_n β, find the optimal π, namely the one with y_{π(1)} ≤ y_{π(2)} ≤ ··· ≤ y_{π(n)} (via sorting).
3. Solve the classical least squares problem

       min_{β ∈ R} ∑_{i=1}^n (x_i β − y_{π(i)})²

   to get the optimal β.

Overall running time: O(n log n). What about d > 1?
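
A minimal Python/NumPy sketch of this procedure (names and details are ours, following the three steps above):

    import numpy as np

    def shuffled_least_squares_1d(x, y):
        """O(n log n) sketch of the d = 1 algorithm: guess sign(beta),
        sort to find the matching permutation, then solve the 1-d
        least squares problem with the correspondence fixed."""
        y_sorted = np.sort(y)
        best = (np.inf, None, None)
        for sign in (+1.0, -1.0):
            # If beta has this sign, x_i * beta is ordered like sign * x_i,
            # so the optimal pi pairs sorted(sign * x) with sorted(y).
            order = np.argsort(sign * x)
            y_matched = np.empty_like(y_sorted)
            y_matched[order] = y_sorted            # y_matched[i] = y_{pi(i)}
            beta = (x @ y_matched) / (x @ x)       # classical 1-d least squares
            cost = np.sum((x * beta - y_matched) ** 2)
            if cost < best[0]:
                best = (cost, beta, y_matched)
        return best  # (cost, beta_hat, y reindexed by pi_hat)

    # Example on small synthetic data:
    x = np.array([3.0, 4.0, 6.0])
    y = np.array([2.0, 1.0, 7.0])
    cost, beta_hat, y_perm = shuffled_least_squares_1d(x, y)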

Alternating minimization

Pick an initial β̂ ∈ R^d (e.g., randomly). Loop until convergence:

    π̂ ← arg min_{π ∈ S_n} ∑_{i=1}^n (x_i^⊤ β̂ − y_{π(i)})²,
    β̂ ← arg min_{β ∈ R^d} ∑_{i=1}^n (x_i^⊤ β − y_{π̂(i)})².

• Each loop iteration is efficiently computable.
• But it can get stuck in local minima, so try many initial β̂ ∈ R^d.
  (Open: How many restarts? How many iterations?)

(Image credit for the local-minima illustration: Wolfram|Alpha.)
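
A minimal sketch of this heuristic in Python/NumPy (the restart and iteration counts are arbitrary choices of ours):

    import numpy as np

    def alternating_minimization(X, y, n_restarts=20, n_iters=50, seed=0):
        """Alternate between the optimal permutation for the current beta
        (a sorting/matching step) and the optimal beta for the current
        permutation (ordinary least squares), with random restarts."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        best_cost, best_beta = np.inf, None
        for _ in range(n_restarts):
            beta = rng.standard_normal(d)
            for _ in range(n_iters):
                # pi-step: match sorted predictions with sorted responses.
                y_matched = np.empty(n)
                y_matched[np.argsort(X @ beta)] = np.sort(y)
                # beta-step: ordinary least squares with correspondence fixed.
                beta, *_ = np.linalg.lstsq(X, y_matched, rcond=None)
            cost = np.sum((X @ beta - y_matched) ** 2)
            if cost < best_cost:
                best_cost, best_beta = cost, beta
        return best_beta, best_cost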

Approximation result

Theorem (H., Shi, & Sun, 2017). There is an algorithm that, given any inputs (x_i)_{i=1}^n, (y_i)_{i=1}^n, and ϵ ∈ (0, 1), returns a (1 + ϵ)-approximate solution to the least squares problem in time (n/ϵ)^{O(d)} + poly(n, d).

Recall: the brute-force solution needs Ω(n!) time. (No other previous algorithm with an approximation guarantee.)

Statistical recovery of β∗: algorithms and lower bounds

Motivation

When does the best-fit model shed light on the "truth" (π∗ and β∗)?

Approach: study the question in the context of a statistical model for the data.
1. Understand information-theoretic limits on recovering the truth.
2. Natural "average-case" setting for algorithms.

Statistical model

    y_i = x_{π∗(i)}^⊤ β∗ + ε_i,   i = 1, ..., n.

Assume (x_i)_{i=1}^n iid from P and (ε_i)_{i=1}^n iid from N(0, σ²).

Recoverability of β∗ depends on the signal-to-noise ratio: SNR := ‖β∗‖² / σ².

Classical setting (where π∗ is known): just need SNR ≳ d/n to approximately recover β∗.
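
A small Python/NumPy sketch of sampling from this model (the function name is ours, and P is taken to be N(0, I_d) for concreteness):

    import numpy as np

    def sample_shuffled_regression(n, d, sigma, rng=None):
        """Draw data from y_i = <x_{pi*(i)}, beta*> + eps_i with a hidden
        permutation pi*; here P = N(0, I_d) and beta* is drawn at random."""
        rng = np.random.default_rng(rng)
        X = rng.standard_normal((n, d))           # x_i iid from P
        beta_star = rng.standard_normal(d)
        pi_star = rng.permutation(n)              # hidden correspondence
        y = X[pi_star] @ beta_star + sigma * rng.standard_normal(n)
        snr = np.sum(beta_star ** 2) / sigma ** 2
        return X, y, beta_star, pi_star, snr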

High-level intuition

Suppose β∗ is either e_1 = (1, 0, 0, ..., 0) or e_2 = (0, 1, 0, ..., 0), and again y_i = x_{π∗(i)}^⊤ β∗ + ε_i.

π∗ known: distinguishability of e_1 and e_2 can improve with n.

π∗ unknown: distinguishability is less clear, since

    ⟪y_i⟫_{i=1}^n = ⟪x_{i,1}⟫_{i=1}^n + N(0, σ²)   if β∗ = e_1,
    ⟪y_i⟫_{i=1}^n = ⟪x_{i,2}⟫_{i=1}^n + N(0, σ²)   if β∗ = e_2.

(⟪·⟫ denotes an unordered multi-set.)

Effect of noise

[Figure: histograms of the multi-sets ⟪x_{i,1}⟫_{i=1}^n and ⟪x_{i,2}⟫_{i=1}^n without noise (P = N(0, I_d)), and of "??? + N(0, σ²)" with noise.]

Lower bound on SNR

Theorem (H., Shi, & Sun, 2017). For P = N(0, I_d), no estimator β̂ can guarantee

    E[ ‖β̂ − β∗‖² ] ≤ ‖β∗‖² / 3

unless SNR ≥ C · d / log log(n).

"Known correspondence" setting: SNR ≳ d/n suffices.

Another theorem: for P = Uniform([−1, 1]^d), one must have SNR ≥ 1/9, even as n → ∞.

Previous works

High SNR regime (Unnikrishnan, Haghighatshoar, & Vetterli, 2015; Pananjady, Wainwright, & Courtade, 2016): if SNR ≫ poly(n), then one can recover π∗ (and β∗, approximately) using maximum likelihood estimation, i.e., least squares.

Related (d = 1): the broken random sample (DeGroot and Goel, 1980). Estimate the sign of the correlation between x_i and y_i; this gives an estimator for sign(β∗) that is correct w.p. 1 − Õ(SNR^{−1/4}).

Does high SNR also permit efficient algorithms? (Recall: our approximate MLE algorithm has running time n^{O(d)}.)

Average-case recovery with very high SNR

Noise-free setting (SNR = ∞)

    y_0 = x_0^⊤ β∗;   y_i = x_{π∗(i)}^⊤ β∗,   i = 1, ..., n.

Assume (x_i)_{i=0}^n iid from N(0, I_d). Also assume π∗(0) = 0.

If n + 1 ≥ d, then recovery of π∗ gives exact recovery of β∗ (a.s.).

We'll assume n + 1 ≥ d + 1 (i.e., n ≥ d).

Claim: n ≥ d suffices to recover π∗ with high probability.

Result on exact recovery

Theorem (H., Shi, & Sun, 2017). In the noise-free setting, there is a poly(n, d)-time⋆ algorithm that returns π∗ and β∗ with high probability.

⋆ Assuming the problem is appropriately discretized.

Main idea: hidden subset

Measurements: y_0 = x_0^⊤ β∗; y_i = x_{π∗(i)}^⊤ β∗ for i = 1, ..., n.

For simplicity: assume n = d and x_i = e_i for i = 1, ..., d, so

    ⟪y_1, ..., y_d⟫ = ⟪β∗_1, ..., β∗_d⟫.

We also know:

    y_0 = x_0^⊤ β∗ = ∑_{j=1}^d x_{0,j} β∗_j.

Reduction to Subset Sum

    y_0 = x_0^⊤ β∗ = ∑_{j=1}^d x_{0,j} β∗_j = ∑_{i=1}^d ∑_{j=1}^d x_{0,j} y_i · 1{π∗(i) = j}.

• d² "source" numbers c_{i,j} := x_{0,j} y_i, "target" sum y_0. The subset {c_{i,j} : π∗(i) = j} adds up to y_0.

This is the Subset Sum problem.
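
A small Python/NumPy sketch of this construction for the simplified case x_i = e_i, n = d (function and variable names are ours):

    import numpy as np

    def subset_sum_instance(x0, y):
        """Source numbers c[i, j] = x0[j] * y[i]; the 'target' is y0.
        The subset {c[i, j] : pi*(i) = j} sums exactly to the target."""
        return np.outer(y, x0)

    # Illustration with a planted permutation:
    rng = np.random.default_rng(0)
    d = 5
    beta_star = rng.standard_normal(d)
    x0 = rng.standard_normal(d)
    pi_star = rng.permutation(d)
    y = beta_star[pi_star]                  # y_i = beta*_{pi*(i)} since x_i = e_i
    y0 = x0 @ beta_star                     # target sum
    c = subset_sum_instance(x0, y)
    mask = np.zeros((d, d), dtype=bool)
    mask[np.arange(d), pi_star] = True      # entries with j = pi*(i)
    assert np.isclose(c[mask].sum(), y0)    # the hidden subset hits the target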

NP-completeness of Subset Sum (a.k.a. "Knapsack") (Karp, 1972)

Easiness of Subset Sum

• But Subset Sum is only "weakly" NP-hard (an efficient algorithm exists for unary-encoded inputs).
• Lagarias & Odlyzko (1983): solving certain random instances can be reduced to solving the Approximate Shortest Vector Problem in lattices.
• Lenstra, Lenstra, & Lovász (1982): an efficient algorithm to solve Approximate SVP.
• Our algorithm is based on a similar reduction but requires a somewhat different analysis.
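
For context, the standard pseudo-polynomial dynamic program behind the "weakly NP-hard" remark runs in time polynomial in the numeric value of the target, hence polynomial for unary-encoded inputs; this sketch is ours:

    def subset_sum_dp(sources, target):
        """Decide whether some subset of the non-negative integers `sources`
        sums exactly to `target`, in O(len(sources) * target) time."""
        reachable = [True] + [False] * target
        for c in sources:
            # Sweep downward so each source number is used at most once.
            for s in range(target, c - 1, -1):
                if reachable[s - c]:
                    reachable[s] = True
        return reachable[target]

    assert subset_sum_dp([3, 34, 4, 12, 5, 2], 9)        # 4 + 5 = 9
    assert not subset_sum_dp([3, 34, 4, 12, 5, 2], 30)   # no subset works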

Reducing Subset Sum to the shortest vector problem

Lagarias & Odlyzko (1983): random instances of Subset Sum are efficiently solvable when the N source numbers c_1, ..., c_N are chosen independently and u.a.r. from a sufficiently wide interval of Z.

Main idea: (w.h.p.) every incorrect subset will "miss" the target sum T by a noticeable amount.

Reduction: construct a lattice basis in R^{N+1} such that
• the correct subset of basis vectors gives a short lattice vector v⋆;
• any other lattice vector not proportional to v⋆ is more than 2^{N/2}-times longer.

    [ b_0  b_1  ···  b_N ] := [  0      I_N
                                 MT   −Mc_1 ··· −Mc_N ]

for sufficiently large M > 0.
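
A minimal sketch of constructing this basis in Python/NumPy (the function name is ours); one would then run a lattice basis reduction algorithm such as LLL on these columns and read the subset off the short vector it finds:

    import numpy as np

    def lagarias_odlyzko_basis(sources, target, M):
        """Columns b_0, ..., b_N of the lattice basis in R^{N+1}:
        b_0 = (0, ..., 0, M*T) and b_j = (e_j, -M*c_j) for j = 1..N.
        For the correct subset S, b_0 + sum_{j in S} b_j equals
        (indicator of S, 0), a short lattice vector."""
        N = len(sources)
        B = np.zeros((N + 1, N + 1))
        B[:N, 1:] = np.eye(N)                  # top block: standard basis vectors
        B[N, 0] = M * target                   # bottom row, first column: M*T
        B[N, 1:] = -M * np.asarray(sources)    # bottom row: -M*c_1, ..., -M*c_N
        return B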

Our random Subset Sum instance

Catch: our source numbers c_{i,j} = y_i x_j^⊤ x_0 are not independent, and not uniformly distributed on some wide interval of Z.

• Instead, they have some joint density derived from N(0, 1).
• To show that the Lagarias & Odlyzko reduction still works, use Gaussian anti-concentration for quadratic and quartic forms.

Key lemma: (w.h.p.) for every Z ∈ Z^{d×d} that is not an integer multiple of the permutation matrix corresponding to π∗,

    | y_0 − ∑_{i,j} Z_{i,j} · c_{i,j} | ≥ ‖β∗‖₂ / 2^{poly(d)}.

Some remarks

• In general, x_1, ..., x_n are not e_1, ..., e_d, but a similar reduction works via the Moore–Penrose pseudoinverse.
• The algorithm strongly exploits the assumption of noise-free measurements. It is unlikely to tolerate much noise.

Open problem: a robust, efficient algorithm in the high SNR setting.

Correspondence retrieval

Correspondence retrieval problem

Goal: recover k unknown "signals" β∗_1, ..., β∗_k ∈ R^d.

Measurements: (x_i, Y_i) for i = 1, ..., n, where
• (x_i) iid from N(0, I_d);
• Y_i = ⟪x_i^⊤ β∗_1 + ε_{i,1}, ..., x_i^⊤ β∗_k + ε_{i,k}⟫ as an unordered multi-set;
• (ε_{i,j}) iid from N(0, σ²).

Correspondence across measurements is lost.

[Figure: a single x_i measured against β∗_1, β∗_2, β∗_3.]
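
A small Python/NumPy sketch of sampling from this measurement model (the function name is ours); each Y_i is shuffled so that only the multi-set of its entries is observed:

    import numpy as np

    def sample_correspondence_retrieval(n, d, k, sigma, rng=None):
        """Generate (x_i, Y_i) with Y_i = {<x_i, beta*_j> + eps_{i,j}}_{j=1..k}
        as an unordered collection (the within-row order carries no meaning)."""
        rng = np.random.default_rng(rng)
        B = rng.standard_normal((k, d))               # rows are the signals beta*_j
        X = rng.standard_normal((n, d))               # x_i iid from N(0, I_d)
        Y = X @ B.T + sigma * rng.standard_normal((n, k))
        for row in Y:
            rng.shuffle(row)                          # discard the correspondence
        return X, Y, B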

Special cases

• k = 1: the classical linear regression model.
• k = 2 and β∗_1 = −β∗_2: (a real variant of) phase retrieval. Note that ⟪x_i^⊤ β∗, −x_i^⊤ β∗⟫ carries the same information as |x_i^⊤ β∗|. Existing methods require n > 2d.

Algorithmic results (Andoni, H., Shi, & Sun, 2017)

• Noise-free setting (i.e., σ = 0): an algorithm based on reduction to Subset Sum that requires n ≥ d + 1, which is optimal.
• General setting: a method-of-moments algorithm, i.e., one based on forming averages over the data.

Questions: SNR limits? Sub-optimality of "method-of-moments"?
