Median Matrix Completion: from Embarrassment to Optimality Xiaojun - PowerPoint PPT Presentation

Median Matrix Completion: from Embarrassment to Optimality Xiaojun Mao School of Data Science Fudan University, China June 15, 2020 Joint work with Dr. Weidong Liu (Shanghai Jiao Tong University, China) and Dr. Raymond K. W. Wong (Texas A&M University, U.S.A.) Xiaojun Mao (FDU) MedianMC June 15, 2020 1 / 21

Introduction 1 Estimations 2 Theoretical Guarantee 3 Experiments 4 Xiaojun Mao (FDU) MedianMC June 15, 2020 2 / 21

Our Goal and Contributions Robust Matrix Comepletion (MC), allows heavy tails. Develop a robust and scalable estimator for median MC in large-scale problems. A fast and simple initial estimation via embarrassingly parallel computing. A refinement stage based on pseudo data. Theoretically, we show that this refinement stage can improve the convergence rate of the sub-optimal initial estimator to near-optimal order, as good as the computationally expensive median MC estimator. Xiaojun Mao (FDU) MedianMC June 15, 2020 3 / 21

Background: The Netflix Problem = Y n 1 ≈ 480 K , n 2 ≈ 18 K . On average each viewer rated about 200 movies. Only 1 . 2% entries were observed. Goal: recover the true rating matrix A ⋆ . Xiaojun Mao (FDU) MedianMC June 15, 2020 4 / 21

Robust Matrix Completion Low-rank-plus-sparse structure: A ⋆ + S + E . Median matrix completion: based on the absolute deviation loss. Under absolute deviation loss and the Huber loss, the convergence rates of Elsener and Geer (2018) match with Koltchinskii et al. (2011). Alquier et al. (2019) derives the minimax rates of convergence with any Lipschitz loss functions (absolute deviation loss). Xiaojun Mao (FDU) MedianMC June 15, 2020 5 / 21

Trace Regression Model N independent pairs ( X k , Y k ), � � X T Y k = tr + ǫ k , k = 1 , . . . , N . (1) k A ⋆ The elements of ǫ = ( ǫ 1 , . . . , ǫ N ) are N i.i.d. random noise variables independent of the design matrices. The design matrices X k : X = { e j ( n 1 ) e k ( n 2 ) T : j = 1 , . . . , n 1 ; k = 1 , . . . , n 2 } , Xiaojun Mao (FDU) MedianMC June 15, 2020 7 / 21

Regularized Least Absolute Deviation Estimator A ⋆ = ( A ⋆, ij ) n 1 , n 2 i , j =1 ∈ R n 1 × n 2 , P ( ǫ ≤ 0) = 0 . 5: A ⋆, ij is the median of Y | X . B ( a , n , m ) = { A ∈ R n × m : � A � ∞ ≤ a } and A ⋆ ∈ B ( a , n , m ). We use the absolute deviation loss: �� Y − tr ( X T A ) A ⋆ = arg min . E � A ∈B ( a , n 1 , n 2 ) To encourage a low-rank solution, N � � � 1 � � � � Y k − tr ( X T � + λ ′ N � A � ∗ . A LADMC = arg min k A ) N A ∈B ( a , n 1 , n 2 ) k =1 Common computational strategies based on proximal gradient method inapplicable (Sum of two non-differentiable terms). Alquier et al. (2019) use ADMM, when the sample size and the matrix dimensions are large, slow and not scalable in practice. Xiaojun Mao (FDU) MedianMC June 15, 2020 8 / 21

Distributed Initial Estimator Figure: An example of dividing a matrix into sub-matrices. � � � 1 � � Y k − tr ( X T � + λ N l , l � A l � ∗ . A LADMC , l = arg min l , k A l ) N l A l ∈B ( a , m 1 , m 2 ) k ∈ Ω l Xiaojun Mao (FDU) MedianMC June 15, 2020 9 / 21

The Idea of Refinement L ( A ; { Y , X } ) = | Y − tr ( X T A ) | . The Newton-Raphson iteration: � � vec( A 1 ) = vec( � A 0 ) − H ( � A 0 ) − 1 E ( Y , X ) l ( � A 0 ; { Y , X } ) , where � A 0 is an initial estimator; l ( A ; { Y , X } ) is the sub-gradient and H ( A ) is the Hessian matrix. When � A 0 is close to the minimizer A ⋆ , vec( A 1 ) ≈ vec( � A 0 ) − [2 f (0) diag ( Π )] − 1 E ( Y , X ) [ l ( � A 0 ; { Y , X } )] � � � � � � − 1 Y ≤ tr ( X T � vec( � A 0 ) − [ f (0)] − 1 = E ( Y , X ) A 0 ) I 1 n 1 n 2 2 � Y 0 � vec( X ) ˜ = { E ( Y , X ) [vec( X )vec( X ) T ] } − 1 E ( Y , X ) where Π = ( π 1 , 1 , . . . , π n 1 , n 2 ) T , π st = Pr( X k = e s ( n 1 ) e T t ( n 2 )), and the theoretical pseudo data � � � � − 1 Y o = tr ( X T � Y ≤ tr ( X T � � A 0 ) − [ f (0)] − 1 A 0 ) . I 2 Xiaojun Mao (FDU) MedianMC June 15, 2020 10 / 21

The First Iteration Refinement Details Y o − tr ( X T A ) } 2 . vec( A 1 ) ≈ arg min A E ( Y , X ) { � Choice of the initial estimator: � A 0 satisfies certain rate Condition. K ( x ): kernel function; h > 0: the bandwidth. � � N k � � Y k − tr ( X T f (0) = 1 A 0 ) � K . Nh h k =1 Let � Y = ( � Y k ), denote � � � � − 1 Y k = tr ( X T � k � A 0 ) − [ � f (0)] − 1 Y k ≤ tr ( X T k � A 0 ) . I 2 By using � Y , one natural estimator is given by � � 2 + λ N � A � ∗ . N � 1 � � Y k − tr ( X T A = arg min k A ) N A ∈B ( a , n 1 , n 2 ) k =1 Xiaojun Mao (FDU) MedianMC June 15, 2020 11 / 21

The t -th Iteration Refinement Details Let h t → 0 is the bandwidth for the t -th iteration, � � N k � A ( t − 1) ) � Y k − tr ( X T 1 f ( t ) (0) = � . K Nh t h t k =1 Similarly, for each 1 ≤ k ≤ N , define � − 1 � � � � � − 1 f ( t ) (0) Y ( t ) � = tr ( X T k � A ( t − 1) ) − � Y k ≤ tr ( X T k � A ( t − 1) ) . I k 2 We propose the following estimator N � � 2 + λ N , t � A � ∗ . � 1 A ( t ) = Y ( t ) � � − tr ( X T arg min k A ) k N A ∈B ( a , n 1 , n 2 ) k =1 Xiaojun Mao (FDU) MedianMC June 15, 2020 12 / 21

Notations n + = n 1 + n 2 , n max = max { n 1 , n 2 } and n min = min { n 1 , n 2 } . Denote r ⋆ = rank( A ⋆ ). In additional to some regular conditions, the initial estimator � A 0 satisfies ( n 1 n 2 ) − 1 / 2 � � A 0 − A ⋆ � F = O P (( n 1 n 2 ) − 1 / 2 a N ), where the initial rate ( n 1 n 2 ) − 1 / 2 a N = o (1). Denote the initial rate a N , 0 = a N and define that � � √ r ⋆ a N , 0 � 2 t r ⋆ ( n 1 n 2 ) n max log( n + ) + n min a N , t = . √ r ⋆ N n min Xiaojun Mao (FDU) MedianMC June 15, 2020 14 / 21

Convergence Results of Repeated Refinement Estimator Theorem (Repeated refinement) Suppose that certain regular conditions hold and A ⋆ ∈ B ( a , n 1 , n 2 ) . By choosing h t and λ N , t to be certain orders, we have � � � �� 2 �� A ( t ) − A ⋆ a 4 log( n + ) n max log( n + ) N , t − 1 F = O P max , r ⋆ + . n 2 n 1 n 2 N N min ( n 1 n 2 ) � � log( r 2 ⋆ n 2 max log( n + )) − log( n min N ) t ≥ log / log(2) , for some c 0 > 0 , c 0 log( r ⋆ a 2 N , 0 ) − 2 c 0 log( n min ) A ( t ) becomes r ⋆ n max N − 1 log( n + ) which is The convergence rate of � the near-optimal rate r ⋆ n max N − 1 upto a logarithmic factor. Under certain condition, t is of constant order. Xiaojun Mao (FDU) MedianMC June 15, 2020 15 / 21

Synthetic Data Generation A ⋆ = UV T , where the entries of U ∈ R n 1 × r and V ∈ R n 2 × r were all drawn from N (0 , 1) independently. Set r = 3, chose n 1 = n 2 : 400, repeat 500 times. The missing rate was 0 . 2, we adopted the uniform missing mechanism. Four noise distributions: S1 Normal: ǫ ∼ N (0 , 1). S2 Cauchy: ǫ ∼ Cauchy(0 , 1). S3 Exponential: ǫ ∼ exp(1). S4 t-distribution with degree of freedom 1: ǫ ∼ t 1 . Cauchy distribution is a very heavy-tailed distribution and its first moment (expectation) does not exist. Xiaojun Mao (FDU) MedianMC June 15, 2020 17 / 21

Comparison Methods (a) BLADMC: Blocked Least Absolute Deviation Matrix Completion � A LADMC , 0 . Number of row subsets l 1 = 2, number of column subsets l 2 = 2. (b) ACL: Least Absolute Deviation Matrix Completion with nuclear norm penalty based on the computationally expensive ADMM algorithm proposed by Alquier et al. (2019). c) MHT: The squared loss estimator with nuclear norm penalty proposed by Mazumder et al. (2010). Xiaojun Mao (FDU) MedianMC June 15, 2020 18 / 21

Simulation Results for Noise Distribution S1 and S2 Table: The average RMSEs, MAEs, estimated ranks and their standard errors (in parentheses) of DLADMC, BLADMC, ACL and MHT. (T) DLADMC BLADMC RMSE 0.5920 (0.0091) 0.7660 (0.0086) MAE 0.4273 (0.0063) 0.5615 (0.006) S1(4) rank 52.90 (2.51) 400 (0.00) RMSE 0.9395 (0.0544) 1.7421 (0.3767) MAE 0.6735 (0.0339) 1.2061 (0.1570) S2(5) rank 36.49 (7.94) 272.25 (111.84) (T) ACL MHT RMSE 0.5518 (0.0081) 0.4607 (0.0070) MAE 0.4031 (0.0056) 0.3375 (0.0047) S1(4) rank 400 (0.00) 36.89 (1.79) RMSE 1.8236 (1.1486) 106.3660 (918.5790) MAE 1.2434 (0.5828) 1.4666 (2.2963) S2(5) rank 277.08 (170.99) 1.25 (0.50) Xiaojun Mao (FDU) MedianMC June 15, 2020 19 / 21

Median Matrix Completion: from Embarrassment to Optimality Xiaojun - PowerPoint PPT Presentation

Median Matrix Completion: from Embarrassment to Optimality Xiaojun Mao School of Data Science Fudan University, China June 15, 2020 Joint work with Dr. Weidong Liu (Shanghai Jiao Tong University, China) and Dr. Raymond K. W. Wong (Texas

the nerves sensory radial median ulnar median median sensory median median ulnar radial

I - -75 Median Cable Barrier 75 Median Cable Barrier 75 Median Cable Barrier I 75 Median Cable

Linear-time Median Def: Median of elements A=a 1 , a 2 , , a n is the (n/2)-th smallest element

Spartanburg Nation Median Value of a $115,900 $184,700 Home Median Gross Rent $705 $950

V0E 27 Nov 2018 US Infant Mortality Rate: National Embarrassment? V0E 2018 CTC-3 1 V0E 2018

African American Strategy Equitable Access to Homeownership Presentation April 16, 2018

Median Finding Test Cases What's Next 1. Median finding, part 2 2. Why we write test cases 3.

Optimality Conditions Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Optimality

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

Lecture 15: Exact Tensor Completion Joint Work with David Steurer Lecture Outline Part I:

Business Statistics CONTENTS Hypotheses on the median The sign test The Wilcoxon signed ranks

Singularity Degree of PSD Matrix Completion Shin-ichi Tanigawa CWI and Kyoto July 29, 2016 1 /

ELD Completion Module Advice for students on completion of Modules A, B & C Why?

The Parameterized Complexity of Matrix Completion Robert Ganian Joint work with: Eduard Eiben

Stability, Networks: Stability, Networks: Control, and Optimality Control, and Optimality

PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan

Making Sense of Jesus Miracles The Importance of Context Gods Omnipotence Can God make a

Monodromy dependence of Painlev tau functions Oleg Lisovyy Institut Denis-Poisson, Universit

More refined representations Control dependence graph Problem: control-flow edges in CFG

Introduction Computer Science & Engineering 423/823 I Similar to SSSP, but find shortest paths

Sparse Kernel Machines - RVM Henrik I. Christensen Robotics & Intelligent Machines @ GT

Matrix-Multiply FSMD Start Din WEn WEn Cnt=0 WAddr WAddr Start=0 S1 Sum=0 RegFile

Search for systems of linked symmetric 2 (36 , 15 , 6) designs Matan Ziv-Av Ben-Gurion

Median Matrix Completion: from Embarrassment to Optimality Xiaojun - PowerPoint PPT Presentation

Median Matrix Completion: from Embarrassment to Optimality Xiaojun Mao School of Data Science Fudan University, China June 15, 2020 Joint work with Dr. Weidong Liu (Shanghai Jiao Tong University, China) and Dr. Raymond K. W. Wong (Texas

the nerves sensory radial median ulnar median median sensory median median ulnar radial

I - -75 Median Cable Barrier 75 Median Cable Barrier 75 Median Cable Barrier I 75 Median Cable

Linear-time Median Def: Median of elements A=a 1 , a 2 , , a n is the (n/2)-th smallest element

Spartanburg Nation Median Value of a $115,900 $184,700 Home Median Gross Rent $705 $950

V0E 27 Nov 2018 US Infant Mortality Rate: National Embarrassment? V0E 2018 CTC-3 1 V0E 2018

African American Strategy Equitable Access to Homeownership Presentation April 16, 2018

Median Finding Test Cases What's Next 1. Median finding, part 2 2. Why we write test cases 3.

Optimality Conditions Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Optimality

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

Lecture 15: Exact Tensor Completion Joint Work with David Steurer Lecture Outline Part I:

Business Statistics CONTENTS Hypotheses on the median The sign test The Wilcoxon signed ranks

Singularity Degree of PSD Matrix Completion Shin-ichi Tanigawa CWI and Kyoto July 29, 2016 1 /

ELD Completion Module Advice for students on completion of Modules A, B &amp; C Why?

The Parameterized Complexity of Matrix Completion Robert Ganian Joint work with: Eduard Eiben

Stability, Networks: Stability, Networks: Control, and Optimality Control, and Optimality

PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan

Making Sense of Jesus Miracles The Importance of Context Gods Omnipotence Can God make a

Monodromy dependence of Painlev tau functions Oleg Lisovyy Institut Denis-Poisson, Universit

More refined representations Control dependence graph Problem: control-flow edges in CFG

Introduction Computer Science &amp; Engineering 423/823 I Similar to SSSP, but find shortest paths

Sparse Kernel Machines - RVM Henrik I. Christensen Robotics &amp; Intelligent Machines @ GT

Matrix-Multiply FSMD Start Din WEn WEn Cnt=0 WAddr WAddr Start=0 S1 Sum=0 RegFile

Search for systems of linked symmetric 2 (36 , 15 , 6) designs Matan Ziv-Av Ben-Gurion

ELD Completion Module Advice for students on completion of Modules A, B & C Why?

Introduction Computer Science & Engineering 423/823 I Similar to SSSP, but find shortest paths

Sparse Kernel Machines - RVM Henrik I. Christensen Robotics & Intelligent Machines @ GT