Data Mining and Matrices
05 Semi-Discrete Decomposition
Rainer Gemulla, Pauli Miettinen
May 16, 2013
Outline
1. Hunting the Bump
2. Semi-Discrete Decomposition
3. The Algorithm
4. Applications
   ◮ SDD alone
   ◮ SVD + SDD
5. Wrap-Up
An example dataset
[Figure: the data]
[Figure: the data after permuting rows and columns]
[Figure: the data in a 3D view]
Can we find the bumps in the picture automatically (from the unpermuted data)?
What is a bump?

Example:
  A = [ 3 1 3       I = {1, 3}, J = {1, 3}
        2 3 1       x = (1, 0, 1)^T, y = (1, 0, 1)^T
        3 2 3 ]
  A ◦ xy^T = [ 3 0 3
               0 0 0
               3 0 3 ]

A submatrix of a matrix A ∈ R^{m×n} contains some rows of A and some columns of those rows
◮ Let I ⊆ {1, 2, ..., m} have the row indices and J ⊆ {1, 2, ..., n} have the column indices of the submatrix
◮ If x ∈ {0, 1}^m has x_i = 1 iff i ∈ I and y ∈ {0, 1}^n has y_j = 1 iff j ∈ J, then xy^T ∈ {0, 1}^{m×n} has (xy^T)_{ij} = 1 iff a_{ij} is in the submatrix
◮ A ◦ xy^T has the values of the submatrix and zeros elsewhere
  ⋆ (A ◦ B)_{ij} = a_{ij} b_{ij} is the Hadamard (element-wise) matrix product
The submatrix is uniform if all (or most) of its values are (approximately) the same
◮ An exactly uniform submatrix with value δ can be written as δxy^T: a bump
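As a quick sanity check of these definitions, here is a minimal NumPy sketch (using the matrix and index sets from the example above) that builds the characteristic vectors and verifies that A ◦ xy^T keeps exactly the submatrix values:

```python
import numpy as np

A = np.array([[3, 1, 3],
              [2, 3, 1],
              [3, 2, 3]])
I, J = [0, 2], [0, 2]          # row/column indices of the submatrix (0-based)

x = np.zeros(3); x[I] = 1      # characteristic vector of the rows
y = np.zeros(3); y[J] = 1      # characteristic vector of the columns

bump_mask = np.outer(x, y)     # xy^T: 1 exactly on the submatrix positions
print(A * bump_mask)           # Hadamard product A ◦ xy^T
# [[3. 0. 3.]
#  [0. 0. 0.]
#  [3. 0. 3.]]
```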
The next bump and negative values

Assume we know how to find the largest bump of a matrix. To find another bump, we can subtract the found bump from the matrix and find the largest bump of the residual matrix
◮ But after the subtraction we might have negative values in the matrix
We can generalize the uniform submatrices to require uniformity only in magnitude
◮ Allow the characteristic vectors x and y to take values from {−1, 0, 1}
◮ If x = (−1, 0, −1)^T and y = (1, 0, −1)^T, then
  δxy^T = [ −δ 0 δ
             0 0 0
            −δ 0 δ ]
This allows us to define bumps in matrices with negative values
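A quick NumPy check of the ternary example above (δ = 2 is an arbitrary value chosen for illustration):

```python
import numpy as np

delta = 2.0
x = np.array([-1, 0, -1])        # ternary characteristic vector of the rows
y = np.array([1, 0, -1])         # ternary characteristic vector of the columns
print(delta * np.outer(x, y))    # every non-zero entry has magnitude delta
# [[-2.  0.  2.]
#  [ 0.  0.  0.]
#  [-2.  0.  2.]]
```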
Outline: 2. Semi-Discrete Decomposition
The definition

Semi-Discrete Decomposition. Given a matrix A ∈ R^{m×n}, the semi-discrete decomposition (SDD) of A of dimension k is
  A ≈ X_k D_k Y_k^T ,
where X_k ∈ {−1, 0, 1}^{m×k}, Y_k ∈ {−1, 0, 1}^{n×k}, and D_k ∈ R_+^{k×k} is a diagonal matrix
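To make the shapes concrete, here is a small hedged sketch (random ternary factors, purely illustrative) that assembles an approximation of this form and checks that it equals a sum of k rank-1 bumps:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 3

Xk = rng.integers(-1, 2, size=(m, k))        # entries in {-1, 0, 1}
Yk = rng.integers(-1, 2, size=(n, k))        # entries in {-1, 0, 1}
Dk = np.diag(rng.uniform(0.5, 2.0, size=k))  # positive diagonal

A_approx = Xk @ Dk @ Yk.T                    # the SDD-style approximation
print(A_approx.shape)                        # (6, 5)

# equivalently, a sum of k rank-1 "bumps" d_i * x_i y_i^T
A_sum = sum(Dk[i, i] * np.outer(Xk[:, i], Yk[:, i]) for i in range(k))
print(np.allclose(A_approx, A_sum))          # True
```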
Example
[Figure: the data next to the first SVD component σ_1 u_1 v_1^T]
[Figure: the data next to the second SVD component σ_2 u_2 v_2^T] The SVD cannot find the bumps
[Figure: the data next to the first SDD bump d_1 x_1 y_1^T]
[Figure: the data next to the second SDD bump d_2 x_2 y_2^T]
[Figure: the data next to the third SDD bump d_3 x_3 y_3^T]
[Figure: the data next to the fourth SDD bump d_4 x_4 y_4^T]
[Figure: the data next to the fifth SDD bump d_5 x_5 y_5^T]
[Figure: the data next to the 5-dimensional SDD approximation X_5 D_5 Y_5^T]
Properties of SDD

The columns of X_k and Y_k do not need to be linearly independent
◮ The same column can even be repeated multiple times
The dimension k might need to be large for an accurate approximation (compared to SVD)
◮ k = min{m, n} is not necessarily enough for an exact SDD
  ⋆ k = mn is always enough
◮ The first factors don't necessarily explain much about the matrix
SDD factors are local
◮ They only affect a certain submatrix, typically not every element
◮ SVD factors typically change every value
Storing a k-dimensional SDD takes less space than storing the rank-k truncated SVD
◮ X_k and Y_k are ternary and often sparse
For every rank-1 layer of an SDD, all non-zero values in the layer have the same magnitude (d_ii for layer i)
Interpretation

The factor interpretation is not very useful, as the factors are not independent
◮ A later factor can change just a subset of values already changed by an earlier factor
The SDD can be interpreted as a form of bi-clustering
◮ Every layer (bump) defines a group of rows and columns with homogeneous values in the residual matrix
The component interpretation is natural for SDD
◮ The SDD is a sum of local bumps
◮ SDD doesn't model global phenomena (e.g., noise) well
Outline: 3. The Algorithm
The outline of the algorithm

1. Input: matrix A ∈ R^{m×n}, non-negative integer k
2. Output: k-dimensional SDD of A, i.e. matrices X_k ∈ {−1, 0, 1}^{m×k}, Y_k ∈ {−1, 0, 1}^{n×k}, and diagonal D_k ∈ R_+^{k×k}
3. R_1 ← A
4. for i = 1, ..., k
   1. Select y_i ∈ {−1, 0, 1}^n
   2. while not converged
      1. Compute x_i ∈ {−1, 0, 1}^m given y_i and R_i
      2. Compute y_i given x_i and R_i
   3. end while
   4. Set d_i to the average of R_i ◦ x_i y_i^T over the non-zero locations of x_i y_i^T
   5. Set x_i as the i-th column of X_i, y_i as the i-th column of Y_i, and d_i as the i-th diagonal value of D_i
   6. R_{i+1} ← R_i − d_i x_i y_i^T
5. end for
6. return X_k, Y_k, and D_k
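Below is a minimal, hedged NumPy sketch of this algorithm. The function names best_ternary and sdd, the fixed iteration count instead of a convergence test, and the early-exit guard are my own simplifications; the reference implementation is Kolda & O'Leary's Algorithm 805 (see the suggested reading). It uses the MAX-style initialization and the closed-form update for x given y that the next slide derives:

```python
import numpy as np

def best_ternary(s):
    """Given s (= R y or R^T x), return the ternary vector that maximizes
    (x^T s)^2 / ||x||_2^2: take the signs of the J largest |s_i|, keep the best J."""
    order = np.argsort(-np.abs(s))
    x = np.zeros_like(s)
    best_x, best_val, acc = x.copy(), -np.inf, 0.0
    for J, idx in enumerate(order, start=1):
        acc += abs(s[idx])
        x[idx] = np.sign(s[idx])
        if acc**2 / J > best_val:
            best_val, best_x = acc**2 / J, x.copy()
    return best_x

def sdd(A, k, n_iter=20):
    """Sketch of the SDD algorithm above, with a MAX-style start
    (the initial y marks the column holding the largest squared entry of R)."""
    m, n = A.shape
    X, Y, d = np.zeros((m, k)), np.zeros((n, k)), np.zeros(k)
    R = A.astype(float).copy()
    for i in range(k):
        y = np.zeros(n)
        y[np.unravel_index(np.argmax(R**2), R.shape)[1]] = 1.0
        for _ in range(n_iter):                 # alternate instead of testing convergence
            x = best_ternary(R @ y)
            y = best_ternary(R.T @ x)
        nx, ny = np.count_nonzero(x), np.count_nonzero(y)
        if nx == 0 or ny == 0:                  # nothing left to explain: stop early
            break
        d[i] = x @ R @ y / (nx * ny)            # average of R ◦ x y^T over the bump
        X[:, i], Y[:, i] = x, y
        R -= d[i] * np.outer(x, y)              # peel the bump off the residual
    return X, np.diag(d), Y

# tiny demo: one planted 3x3 bump of 3s is recovered as the first layer
A = np.zeros((6, 6)); A[1:4, 2:5] = 3.0
X, D, Y = sdd(A, k=2)
print(np.round(X @ D @ Y.T, 2))
```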
Finding the bump

Problem: Given R ∈ R^{m×n} and y ∈ {−1, 0, 1}^n, find x ∈ {−1, 0, 1}^m such that ‖R − dxy^T‖²_F is minimized
◮ We set d ← x^T R y / (‖x‖²₂ ‖y‖²₂), the average of R ◦ xy^T over the non-zero locations of xy^T
◮ We want to minimize the residual norm
Set s ← Ry. Task: find x that maximizes F(x, y) = (x^T s)² / ‖x‖²₂
◮ Maximizing F equals minimizing the residual norm after d is set as above
◮ Can be solved optimally by trying the 2^m different binary patterns and setting the signs appropriately
Solution: Order the values s_i so that |s_{i_1}| ≥ |s_{i_2}| ≥ ... ≥ |s_{i_m}|, set x_{i_j} ← sign(s_{i_j}) for the first J values, and set 0 elsewhere
◮ J is the number of non-zeros in x
  ⋆ Because we don't know J, we have to try every possibility and select the best
◮ The values s_i contain the row sums of R from those columns that are selected by y, with signs set accordingly
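As a small illustration (my own check, not from the slides), the sorting rule can be compared against exhaustive search over all ternary x for a tiny m:

```python
import itertools
import numpy as np

def F(x, s):
    """F(x, y) = (x^T s)^2 / ||x||_2^2 for a fixed s = R y (0 if x = 0)."""
    norm = float(x @ x)
    return (x @ s) ** 2 / norm if norm > 0 else 0.0

def x_by_sorting(s):
    """The rule from the slide: signs of the J largest |s_i|, best J kept."""
    order = np.argsort(-np.abs(s))
    x, candidates = np.zeros_like(s), []
    for idx in order:
        x[idx] = np.sign(s[idx])
        candidates.append(x.copy())
    return max(candidates, key=lambda c: F(c, s))

s = np.random.default_rng(1).normal(size=5)       # s = R y for some R and y
brute = max((np.array(x, float) for x in itertools.product([-1, 0, 1], repeat=5)),
            key=lambda x: F(x, s))
print(F(x_by_sorting(s), s), F(brute, s))         # the two objective values coincide
```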
Selecting the initial vector y

There are many ways to select the initial vector:
MAX: set y_j = 1 for the column j that contains the largest squared value of R, and the rest to zero
◮ Intuition: the very largest squared value is probably in the best bump
CYC: set y_j = 1 for j = (k mod n) + 1
◮ Cycle through the columns
THR: select a unit vector y that satisfies ‖Ry‖²₂ ≥ ‖R‖²_F / n
◮ The selected column must have a squared sum that is above the average squared column sum
◮ The selection can be random, or the columns can be tried one by one
  ⋆ CYC and THR can be mixed
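A small sketch of the three strategies (the function names and 0-based indexing are mine):

```python
import numpy as np

def init_max(R):
    """MAX: unit vector of the column holding the largest squared entry of R."""
    y = np.zeros(R.shape[1])
    y[np.unravel_index(np.argmax(R**2), R.shape)[1]] = 1.0
    return y

def init_cyc(R, i):
    """CYC: cycle through the columns as the layer counter i grows."""
    y = np.zeros(R.shape[1])
    y[i % R.shape[1]] = 1.0
    return y

def init_thr(R, rng=np.random.default_rng()):
    """THR: pick a column whose squared sum is at least the average ||R||_F^2 / n."""
    col_sq = np.sum(R**2, axis=0)
    good = np.flatnonzero(col_sq >= np.sum(col_sq) / R.shape[1])
    y = np.zeros(R.shape[1])
    y[rng.choice(good)] = 1.0
    return y

R = np.arange(12.0).reshape(3, 4)
print(init_max(R), init_cyc(R, i=1), init_thr(R))
```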
Example result
[Figure: the data next to its 5-dimensional SDD approximation]
[Figure: the data next to the residual X_5 D_5 Y_5^T − A]
Normalization

Normalization can have a profound effect on SDD
Zero-centering the columns will change the type of bumps found
◮ The bumps in the original data have the largest-magnitude values
◮ The bumps in the zero-centered data have the most extreme values
Normalizing the variance will make the matrix values more uniform and thus changes the bumps
Squaring the values will promote smaller bumps of exceptionally high values
Square-rooting the values will promote larger bumps of smaller magnitude
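For reference, a hedged sketch of these preprocessing variants (the squaring and square-rooting variants assume non-negative data, as in the running example; the function names are mine):

```python
import numpy as np

def zero_center_columns(A):
    """Bumps in zero-centered data consist of the most extreme values."""
    return A - A.mean(axis=0)

def unit_variance_columns(A):
    """Unit-variance columns give more uniform values, changing the bumps."""
    C = A - A.mean(axis=0)
    return C / C.std(axis=0)

def square_values(A):
    """Promotes smaller bumps of exceptionally high values (non-negative data assumed)."""
    return A ** 2

def sqrt_values(A):
    """Promotes larger bumps of smaller magnitude (non-negative data assumed)."""
    return np.sqrt(A)
```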
Normalization example: zero-centered data
[Figure: the zero-centered data next to its first, second, and third bumps (in the third-bump plot, red means 0) and its 5-dimensional SDD]
Normalization example: square root of the data
[Figure: the data after an element-wise square root, next to its first, second, and third bumps and its 5-dimensional SDD]
Normalization example: squared data
[Figure: the squared data next to its first, second, and third bumps and its 5-dimensional SDD]
Outline: 4. Applications
Outline: 4. Applications ◮ SDD alone
Clustering

SDD performs a type of bi-clustering of the matrix
◮ Every bump dxy^T gives a cluster of rows {i : x_i ≠ 0} and a cluster of columns {j : y_j ≠ 0}, together with a 'centroid' d
◮ This is not a partition clustering: the clusters can overlap, and not every row or column has to belong to some cluster
We can impose an ordering of the bumps based on the values of d_i
◮ The algorithm usually returns the bumps in that order
This ordering can be used to obtain a hierarchical clustering (see the sketch below)
◮ The first column of X clusters the rows of A into three sets (−1, 0, 1), and likewise the first column of Y clusters the columns of A
◮ The second column of X splits the previous clusters again into three sets
  ⋆ Some of these sets can be empty
◮ And so on and so forth
The distance between two objects in the hierarchical clustering is not the usual dendrogram depth, but depends on whether we use the 0 or ±1 branches
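A minimal sketch (my own illustration) of reading the row clustering off the columns of X: each row gets a tuple of labels from {−1, 0, 1}, refined column by column:

```python
import numpy as np

def hierarchical_row_labels(X, depth):
    """Label row i by the signs of its first `depth` entries in X.
    Rows sharing a longer common prefix sit closer in the hierarchy."""
    return [tuple(int(v) for v in row[:depth]) for row in X]

# toy X with two SDD layers for five rows
X = np.array([[ 1,  0],
              [ 1,  1],
              [ 0, -1],
              [-1,  0],
              [ 1,  1]])
print(hierarchical_row_labels(X, depth=2))
# [(1, 0), (1, 1), (0, -1), (-1, 0), (1, 1)]  -> rows 1 and 4 share a leaf
```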
Other applications

Image compression (O'Leary & Peleg, 1983)
◮ A grayscale image is compressed using its SDD
◮ This was the original application; modern image compression techniques are better
Latent topic models (Kolda & O'Leary, 1998)
◮ Used similarly to how SVD is used to compute LSA
◮ Compute the SDD of the term-document matrix
SDD of correlation matrices
◮ A bump in the correlation matrix AA^T corresponds to rows of A with similar values
◮ A bump in A^T A corresponds to columns with similar values
Outline: 4. Applications ◮ SVD + SDD
General approaches

The most common way to combine SVD and SDD is to first use SVD to denoise the data and then compute the SDD of the cleaned data
◮ cleaned data = truncated SVD (A_k)
◮ SVD is good at finding global structure, SDD at finding local structure
Another option is to first compute A_k with SVD and then apply SDD to A_k A_k^T (or A_k^T A_k)
◮ SDD then finds the objects with similar values
The results can be visualized using the first 2-3 columns of U from the SVD and the first layers of the hierarchical clustering from the SDD
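A short sketch of the first combination (the planted-bump data and noise level are invented for illustration; the final step assumes an sdd routine such as the sketch given after the algorithm outline):

```python
import numpy as np

def truncated_svd(A, r):
    """Rank-r truncated SVD A_r = U_r S_r V_r^T, used here as a denoising step."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

rng = np.random.default_rng(0)
A = np.zeros((50, 40)); A[5:15, 10:20] = 3.0      # one planted bump
A_noisy = A + 0.5 * rng.normal(size=A.shape)      # plus global noise

A_clean = truncated_svd(A_noisy, r=5)             # SVD removes most of the global noise
# X, D, Y = sdd(A_clean, k=5)                     # then SDD hunts the local bumps
# or, to group similar rows: run sdd on A_clean @ A_clean.T
```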
Classifying galaxies
[Figure (Skillicorn, Color Figures 7 and 8): plot of the SVD of the galaxy data on axes U1, U2, U3, overlaid with the SDD classification]
Finding minerals
[Figure (Skillicorn, Figure 6.7 and Color Figure 11): positions from the SVD on axes U1, U2, U3, with colour and shape labelling from the SDD]
Clustering information from the SDD is added: the first bump defines the colour, the second the marker. Colour corresponds to the depth of the sample.
Outline: 5. Wrap-Up
Lessons learned

SDD finds the local areas of values with uniform magnitude → easier interpretation, an 'orthogonal' view to SVD
Finding the SDD is hard and requires a heuristic
Together, SVD and SDD provide a strong analysis toolset
Suggested reading

Skillicorn, Chapters 5 & 6
Tamara G. Kolda & Dianne P. O'Leary, 1998. A Semidiscrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval. ACM Trans. Inf. Syst. 16(4), pp. 322–346. DOI: 10.1145/291128.291131
Tamara G. Kolda & Dianne P. O'Leary, 2000. Algorithm 805: Computation and Uses of the Semidiscrete Matrix Decomposition. ACM Trans. Math. Software 26(3), pp. 415–435. DOI: 10.1145/358407.358424