

SLIDE 1

Large Scale Matrix Analysis and Inference

Wouter M. Koolen, Manfred Warmuth, Reza Bosagh Zadeh, Gunnar Carlsson, Michael Mahoney. NIPS, Dec 9, 2013.


SLIDE 2

Introductory musing — What is a matrix?

A matrix A = (a_{i,j}) can be viewed as:

1. A vector of n² parameters
2. A covariance
3. A generalized probability distribution
4. ...

SLIDE 3
1. A vector of n² parameters

When you regularize with the squared Frobenius norm:

min_W ||W||²_F + Σn loss(tr(W Xn))

SLIDE 4
1. A vector of n² parameters

When you regularize with the squared Frobenius norm:

min_W ||W||²_F + Σn loss(tr(W Xn))

this is equivalent to:

min_vec(W) ||vec(W)||²₂ + Σn loss(vec(W) · vec(Xn))

No structure: n² independent variables.
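The deck states this equivalence without code; here is a minimal numpy sketch of the two identities it rests on, ||W||²_F = ||vec(W)||²₂ and tr(W Xn) = vec(W) · vec(Xn). The sizes and the symmetric instance X = xx⊤ are assumptions for illustration (for a general non-symmetric X, tr(W X) equals vec(W) · vec(X⊤)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
x = rng.standard_normal(n)
X = np.outer(x, x)  # a symmetric instance X = x x^T

# Squared Frobenius norm of W equals the squared 2-norm of vec(W)
print(np.isclose(np.linalg.norm(W, "fro") ** 2, W.ravel() @ W.ravel()))

# tr(W X) equals vec(W) . vec(X) because X is symmetric
print(np.isclose(np.trace(W @ X), W.ravel() @ X.ravel()))
```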

SLIDE 5
2. A covariance

View the symmetric positive definite matrix C as the covariance matrix of some random feature vector c ∈ Rn, i.e.

C = E[(c − E(c))(c − E(c))⊤]

n features plus their pairwise interactions.
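A quick numerical illustration of the definition; the data-generating matrix A and the sample sizes are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 3, 100_000
A = rng.standard_normal((n, n))
c = rng.standard_normal((samples, n)) @ A.T   # rows are draws of the feature vector c

centered = c - c.mean(axis=0)
C = centered.T @ centered / samples           # C = E[(c - Ec)(c - Ec)^T], empirically

print(np.allclose(C, C.T))                    # symmetric
print(np.all(np.linalg.eigvalsh(C) >= -1e-10))  # positive semidefinite (up to rounding)
```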

SLIDE 6

Symmetric matrices as ellipses

Ellipse = {Cu : ||u||₂ = 1}

Dotted lines connect a point u on the unit ball with the point Cu on the ellipse.

SLIDE 7

Symmetric matrices as ellipses

Eigenvectors form the axes; eigenvalues are the lengths.
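A small numpy sketch of this picture, for an assumed 2×2 example: each eigenvector v is mapped by C to λv, so the ellipse {Cu : ||u||₂ = 1} has the eigenvectors as axes with the eigenvalues as their lengths:

```python
import numpy as np

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # an example symmetric matrix

evals, evecs = np.linalg.eigh(C)  # eigenvalues 1 and 3, orthonormal eigenvectors

# C maps each eigenvector v to lambda * v: axis direction and length of the ellipse
for lam, v in zip(evals, evecs.T):
    print(lam, np.allclose(C @ v, lam * v))
```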

SLIDE 8

Dyads

uu⊤, where u is a unit vector. One eigenvalue is one, all others are zero: a rank-one projection matrix.

SLIDE 9

Directional variance along direction u

V(c⊤u) = u⊤Cu = tr(C uu⊤) ≥ 0

The outer figure eight is the direction u scaled by the variance u⊤Cu.
PCA: find the direction of largest variance.
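A minimal numpy check of both claims: the two expressions for the directional variance agree, and the top eigenvector attains the largest variance. The covariance C below is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
C = M @ M.T                           # an example covariance matrix
u = rng.standard_normal(3)
u /= np.linalg.norm(u)                # an arbitrary unit direction

# The two expressions for the directional variance agree
print(np.isclose(u @ C @ u, np.trace(C @ np.outer(u, u))))

# PCA: the top eigenvector of C attains the largest variance
evals, evecs = np.linalg.eigh(C)
top = evecs[:, -1]
print(np.isclose(top @ C @ top, evals[-1]))
print(u @ C @ u <= evals[-1] + 1e-12)  # no other direction does better
```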

SLIDE 10

3-dimensional variance plots

tr(C uu⊤) is a generalized probability when tr(C) = 1.

SLIDE 11
3. Generalized probability distributions

Probability vector: ω = (.2, .1, .6, .1)⊤ = Σi ωi ei, with mixture coefficients ωi and pure events ei.

Density matrix: W = Σi ωi wiwi⊤, with mixture coefficients ωi and pure density matrices wiwi⊤.

SLIDE 12
3. Generalized probability distributions

Probability vector: ω = (.2, .1, .6, .1)⊤ = Σi ωi ei, with mixture coefficients ωi and pure events ei.

Density matrix: W = Σi ωi wiwi⊤, with mixture coefficients ωi and pure density matrices wiwi⊤.

Matrices as generalized distributions.

SLIDE 13
3. Generalized probability distributions

Probability vector: ω = (.2, .1, .6, .1)⊤ = Σi ωi ei, with mixture coefficients ωi and pure events ei.

Density matrix: W = Σi ωi wiwi⊤, with mixture coefficients ωi and pure density matrices wiwi⊤.

Matrices as generalized distributions.

Many mixtures lead to the same density matrix. There always exists a decomposition into n eigendyads. A density matrix is a symmetric positive matrix of trace one.
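A numpy sketch of these facts, with made-up mixture sizes: a mixture of k pure dyads gives a trace-one symmetric positive matrix, and its eigendecomposition recovers a (generally different) mixture of n orthogonal eigendyads:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 6
omega = rng.random(k)
omega /= omega.sum()                               # mixture coefficients
w = rng.standard_normal((k, n))
w /= np.linalg.norm(w, axis=1, keepdims=True)      # unit vectors -> pure dyads

W = sum(o * np.outer(v, v) for o, v in zip(omega, w))
print(np.isclose(np.trace(W), 1.0))                # trace one
print(np.all(np.linalg.eigvalsh(W) >= -1e-12))     # symmetric positive (semi)definite

# The eigendecomposition gives another mixture, into n orthogonal eigendyads
lam, V = np.linalg.eigh(W)
W2 = sum(l * np.outer(v, v) for l, v in zip(lam, V.T))
print(np.allclose(W, W2))
```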

SLIDE 14

It’s like a probability!

Total variance along an orthogonal set of directions is 1:

u1⊤ W u1 + u2⊤ W u2 = 1

a + b + c = 1

SLIDE 15

Uniform density?

The uniform density is (1/n)I. All dyads have generalized probability 1/n:

tr((1/n)I uu⊤) = (1/n) tr(uu⊤) = 1/n

Generalized probabilities of n orthogonal dyads sum to 1.
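The one-line check in numpy, with an arbitrary unit direction:

```python
import numpy as np

n = 5
u = np.random.default_rng(4).standard_normal(n)
u /= np.linalg.norm(u)                 # any unit direction

U = np.eye(n) / n                      # the uniform density matrix
print(np.isclose(np.trace(U @ np.outer(u, u)), 1 / n))  # tr((1/n)I uu^T) = 1/n
```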

SLIDE 16

Conventional Bayes Rule

P(Mi|y) = P(Mi) P(y|Mi) / P(y)

4 updates with the same data likelihood. The update maintains uncertainty information about the maximum likelihood: a soft max.


SLIDE 20

Bayes Rule for density matrices

D(M|y) = exp(log D(M) + log D(y|M)) / tr(exp(log D(M) + log D(y|M)))

1 update with the data likelihood matrix D(y|M). The update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.

SLIDES 21-25

Bayes Rule for density matrices: the same update repeated 2, 3, 4, 10, and 20 times with the same data likelihood matrix D(y|M). Each update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.
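The deck shows this as an animation; a minimal numpy sketch of the update makes the soft-max behaviour concrete. The likelihood matrix L below is a made-up positive definite example; for symmetric positive definite matrices the matrix log and exp can be taken through the eigendecomposition. Repeated updates with the same L drive the top eigenvalue of the posterior toward 1, concentrating on the maximum-eigenvalue direction:

```python
import numpy as np

def mlog(S):
    """Matrix log of a symmetric positive definite matrix via eigendecomposition."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(lam)) @ V.T

def mexp(S):
    """Matrix exp of a symmetric matrix via eigendecomposition."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(lam)) @ V.T

def bayes_update(D_prior, D_lik):
    """D(M|y) = exp(log D(M) + log D(y|M)) / tr(exp(log D(M) + log D(y|M)))."""
    A = mexp(mlog(D_prior) + mlog(D_lik))
    return A / np.trace(A)

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n))
L = M @ M.T + 0.1 * np.eye(n)   # a made-up positive definite likelihood matrix

D = np.eye(n) / n               # uniform prior density matrix
for t in range(1, 21):
    D = bayes_update(D, L)
    if t in (1, 4, 10, 20):
        # the top eigenvalue of the posterior tends to 1: a soft max
        print(t, np.linalg.eigvalsh(D)[-1])
```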

SLIDE 26

Bayes’ rules

Bayes rule, vector case: P(Mi|y) = P(Mi)·P(y|Mi) / Σj P(Mj)·P(y|Mj)

Bayes rule, matrix case: D(M|y) = D(M)⊙D(y|M) / tr(D(M)⊙D(y|M))

where A⊙B := exp(log A + log B)

SLIDE 27

Bayes’ rules

Bayes rule, vector case: P(Mi|y) = P(Mi)·P(y|Mi) / Σj P(Mj)·P(y|Mj)

Bayes rule, matrix case: D(M|y) = D(M)⊙D(y|M) / tr(D(M)⊙D(y|M))

where A⊙B := exp(log A + log B)

Regularizer: entropy (vector case), quantum entropy (matrix case).

SLIDE 28

Vector case as special case of matrix case

Vectors as diagonal matrices. All matrices share the same eigensystem. The fancy ⊙ becomes ·. Often the vector case is the hardest problem, i.e. bounds for the vector case "lift" to the matrix case.

SLIDE 29

Vector case as special case of matrix case

Vectors as diagonal matrices. All matrices share the same eigensystem. The fancy ⊙ becomes ·. Often the vector case is the hardest problem, i.e. bounds for the vector case "lift" to the matrix case. This phenomenon has been dubbed the "free matrix lunch".

Size of matrix = size of vector = n.

SLIDE 30

PCA setup

Data vectors: C = Σn xnxn⊤

max over unit u of u⊤Cu (not convex in u)
= max over dyads uu⊤ of tr(C uu⊤) (linear in uu⊤)

Corresponding vector problem: max over ei of c⊤ei (linear in ei)

The vector problem is the matrix problem when everything happens in the same eigensystem (see the sketch after this list).

Uncertainty over units: probability vector
Uncertainty over dyads: density matrix
Uncertainty over k-sets of units: capped probability vector
Uncertainty over rank-k projection matrices: capped density matrix
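A minimal numpy sketch of the "same eigensystem" claim, using an assumed diagonal example: when C is diagonal, the matrix problem max over dyads of tr(C uu⊤) and the vector problem max over pure events of c⊤ei have the same value:

```python
import numpy as np

c = np.array([0.2, 0.7, 0.1])
C = np.diag(c)                      # diagonal C: everything in one eigensystem

# Vector problem: max over pure events e_i of c . e_i = largest coordinate
print(c.max())                      # 0.7

# Matrix problem: max over dyads uu^T of tr(C uu^T) = largest eigenvalue
print(np.linalg.eigvalsh(C)[-1])    # 0.7, attained at the corresponding dyad
```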

SLIDE 31

For PCA

Solve the vector problem first. Do all bounds. Lift to the matrix case: essentially replace · by ⊙. Regret bounds stay the same: the Free Matrix Lunch.

SLIDE 32

Questions

When can you "lift" the vector case to the matrix case? When is there a free matrix lunch? Lifting matrices to tensors? Efficient algorithms for large matrices?

Approximations of ⊙. Avoiding eigenvalue decomposition by sampling. ...