SLIDE 1

Data Mining and Matrices

04 – Matrix Completion
Rainer Gemulla, Pauli Miettinen
May 02, 2013

SLIDE 2

Recommender systems

Problem

◮ Set of users
◮ Set of items (movies, books, jokes, products, stories, ...)
◮ Feedback (ratings, purchases, click-through, tags, ...)
◮ Sometimes: metadata (user profiles, item properties, ...)

Goal: Predict preferences of users for items
Ultimate goal: Create item recommendations for each user

Example:

            Avatar   The Matrix   Up
  Alice        ?          4        2
  Bob          3          2        ?
  Charlie      5          ?        3
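As a concrete representation (not from the slides), the running example can be stored as a NumPy array with NaN marking unrevealed entries; later sketches reuse this convention:

```python
import numpy as np

# The running example; np.nan marks entries that are not revealed.
D = np.array([[np.nan, 4, 2],    # Alice
              [3, 2, np.nan],    # Bob
              [5, np.nan, 3]])   # Charlie

omega = np.argwhere(~np.isnan(D))   # index set of revealed entries
print(omega)                        # [[0 1] [0 2] [1 0] [1 1] [2 0] [2 2]]
```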

SLIDE 3

Outline

1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

SLIDE 4

Collaborative filtering

Key idea: Make use of past user behavior

◮ No domain knowledge required
◮ No expensive data collection needed
◮ Allows discovery of complex and unexpected patterns
◮ Widely adopted: Amazon, TiVo, Netflix, Microsoft
◮ Key techniques: neighborhood models, latent factor models

            Avatar   The Matrix   Up
  Alice        ?          4        2
  Bob          3          2        ?
  Charlie      5          ?        3

Leverage past behavior of other users and/or on other items.

SLIDE 5

A simple baseline

m users, n items, m × n rating matrix D
Revealed entries Ω = { (i, j) | rating Dij is revealed }, N = |Ω|
Baseline predictor: bij = µ + bi + bj

◮ µ = (1/N) Σ_{(i,j)∈Ω} Dij is the overall average rating
◮ bi is a user bias (user’s tendency to rate low/high)
◮ bj is an item bias (item’s tendency to be rated low/high)

Least squares estimates: argmin_{b∗} Σ_{(i,j)∈Ω} (Dij − µ − bi − bj)²

                   Avatar (1.01)   The Matrix (0.34)   Up (−1.32)
  Alice (0.32)        ? (4.5)          4 (3.8)           2 (2.1)
  Bob (−1.34)         3 (2.8)          2 (2.2)           ? (0.5)
  Charlie (0.99)      5 (5.2)          ? (4.5)           3 (2.8)

Item biases bj are shown in the header row, user biases bi next to the user names, and baseline predictions in parentheses next to each rating.

m = 3, n = 3, Ω = { (1, 2), (1, 3), (2, 1), . . . }, N = 6, µ = 3.17
Example: b32 = 3.17 + 0.99 + 0.34 = 4.5

Baseline does not account for personal tastes.
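To make the estimates concrete, a minimal sketch (not from the slides) that fits the biases by least squares with NumPy. The system is rank deficient (a constant can be shifted between user and item biases), so lstsq returns the minimum-norm solution; individual biases may therefore differ from the slide’s rounded values even though predictions roughly agree:

```python
import numpy as np

D = np.array([[np.nan, 4, 2],
              [3, 2, np.nan],
              [5, np.nan, 3]], dtype=float)

obs = np.argwhere(~np.isnan(D))   # revealed entries (the index set Omega)
mu = np.nanmean(D)                # overall average rating, here 3.17

m, n = D.shape
# One least-squares equation per revealed entry: D_ij - mu = b_i + b_j
A = np.zeros((len(obs), m + n))
y = np.empty(len(obs))
for k, (i, j) in enumerate(obs):
    A[k, i] = 1.0        # column for user bias b_i
    A[k, m + j] = 1.0    # column for item bias b_j
    y[k] = D[i, j] - mu

b, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimum-norm least squares
b_user, b_item = b[:m], b[m:]

# Baseline prediction for Charlie (user 2) on The Matrix (item 1): ~4.5
print(mu + b_user[2] + b_item[1])
```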

SLIDE 6

When does a user like an item?

Neighborhood models (kNN): When the user likes similar items

◮ Find the top-k most similar items the user has rated
◮ Combine the ratings of these items (e.g., average)
◮ Requires a similarity measure (e.g., the Pearson correlation coefficient); a sketch follows below
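A minimal item-based kNN sketch (a hypothetical helper, not from the slides): Pearson similarity computed over co-rating users, then an unweighted average of the user’s ratings on the top-k most similar items. The tiny 3 × 3 running example has too few co-ratings for Pearson, so this is meant for larger data:

```python
import numpy as np

def knn_predict(D, u, j, k=2):
    """Predict user u's rating of item j from the k most similar items
    u has rated (Pearson similarity over co-rating users)."""
    n = D.shape[1]
    sims = np.full(n, -np.inf)
    for j2 in range(n):
        if j2 == j or np.isnan(D[u, j2]):
            continue                        # only consider items u has rated
        both = ~np.isnan(D[:, j]) & ~np.isnan(D[:, j2])
        if both.sum() < 2:
            continue                        # Pearson needs >= 2 co-ratings
        a, b = D[both, j], D[both, j2]
        denom = a.std() * b.std()
        sims[j2] = 0.0 if denom == 0 else ((a - a.mean()) * (b - b.mean())).mean() / denom
    top = np.argsort(sims)[::-1][:k]
    top = top[np.isfinite(sims[top])]       # drop items without a similarity
    return np.nan if len(top) == 0 else D[u, top].mean()
```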

[Figure: an item unrated by Bob is similar to an item Bob rated 4 → predict 4.]

Latent factor models (LFM): When similar users like similar items

◮ More holistic approach
◮ Users and items are placed in the same “latent factor space”
◮ The positions of a user and an item are related to preference (via dot products)

SLIDE 7

Intuition behind latent factor models (1)

∈  ∈  ∈ 

κ

  • λ

κ λ Geared toward males Serious Escapist The Princess Diaries Braveheart Lethal Weapon Independence Day Ocean’s 11 Sense and Sensibility Gus Dave Geared toward females Amadeus The Lion King Dumb and Dumber The Color Purple

Koren et al., 2009.

SLIDE 8

Intuition behind latent factor models (2)

Does user u like item v?

Quality: measured via direction from origin (cos ∠(u, v))

◮ Same direction → attraction: cos ∠(u, v) ≈ 1
◮ Opposite direction → repulsion: cos ∠(u, v) ≈ −1
◮ Orthogonal direction → oblivious: cos ∠(u, v) ≈ 0

Strength: measured via distance from origin (‖u‖, ‖v‖)

◮ Far from origin → strong relationship: ‖u‖‖v‖ large
◮ Close to origin → weak relationship: ‖u‖‖v‖ small

Overall preference: measured via the dot product

    u · v = ‖u‖ ‖v‖ cos ∠(u, v)

◮ Same direction, far out → strong attraction: u · v large positive
◮ Opposite direction, far out → strong repulsion: u · v large negative
◮ Orthogonal direction, any distance → oblivious: u · v ≈ 0
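A few lines of NumPy (with made-up vectors) to check the identity u · v = ‖u‖ ‖v‖ cos ∠(u, v):

```python
import numpy as np

u = np.array([1.2, -0.5])   # a user in a 2-d latent space (made-up values)
v = np.array([0.9, -0.4])   # an item

cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))   # quality: direction
pref = u @ v                                            # overall preference
assert np.isclose(pref, np.linalg.norm(u) * np.linalg.norm(v) * cos)
print(cos, pref)   # cos near 1 and a positive dot product: attraction
```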

But how to select dimensions and where to place items and users? Key idea: Pick dimensions that explain the known data well.

SLIDE 9

SVD and missing values

[Figure, four panels: the input data; its rank-10 truncated SVD; 10% of the input data; the rank-10 truncated SVD of that sample.]


SVD treats missing entries as 0.
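A sketch of the experiment above (assuming NumPy): fill the missing entries with 0 and truncate the SVD. The zero filling is exactly what biases the reconstruction toward 0 wherever data is missing:

```python
import numpy as np

def truncated_svd_complete(D, r=10):
    """Zero-fill the missing entries, then return the rank-r truncated SVD
    as the 'completion' (a sketch of the slide's experiment)."""
    X = np.where(np.isnan(D), 0.0, D)      # missing entries become 0
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]    # rank-r reconstruction
```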

SLIDE 10

Latent factor models and missing values

[Figure, four panels: the input data; its rank-10 LFM fit; 10% of the input data; the rank-10 LFM fit of that sample.]


LFMs “ignore” missing entries.

SLIDE 11

Latent factor models (simple form)

Given rank r, find an m × r matrix L and an r × n matrix R such that Dij ≈ [LR]ij for (i, j) ∈ Ω

Least squares formulation:

    min_{L,R} Σ_{(i,j)∈Ω} (Dij − [LR]ij)²

Example (r = 1):

                     Avatar (2.24)   The Matrix (1.92)   Up (1.18)   ← R
  Alice (1.98)          ? (4.4)          4 (3.8)           2 (2.3)
  Bob (1.21)            3 (2.7)          2 (2.3)           ? (1.4)
  Charlie (2.30)        5 (5.2)          ? (4.4)           3 (2.7)
       ↑ L

Item factors R∗j are shown in the header row, user factors Li∗ next to the user names, and predictions [LR]ij in parentheses next to each rating.

[Figure: D ≈ LR; entry Dij is modeled by the dot product of row Li∗ and column R∗j.]
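To check the example numerically, a small sketch that evaluates [LR]ij and the least-squares objective over the revealed entries only:

```python
import numpy as np

D = np.array([[np.nan, 4, 2],
              [3, 2, np.nan],
              [5, np.nan, 3]])
L = np.array([[1.98], [1.21], [2.30]])   # m x r user factors from the slide (r = 1)
R = np.array([[2.24, 1.92, 1.18]])       # r x n item factors from the slide

X = L @ R                                # predictions [LR]_ij
mask = ~np.isnan(D)                      # revealed entries Omega
loss = np.sum((D[mask] - X[mask]) ** 2)  # least-squares objective
print(np.round(X, 1))                    # matches the parenthesized values
print(loss)
```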

SLIDE 12

Example: Netflix prize data

(≈ 500k users, ≈ 17k movies, ≈ 100M ratings)

[Figure: movies plotted by their first two factor vectors (both axes roughly −1.5 to 1.5). Titles shown include Freddy Got Fingered, Freddy vs. Jason, Half Baked, Road Trip, The Sound of Music, Sophie’s Choice, Moonstruck, Maid in Manhattan, The Way We Were, Runaway Bride, Coyote Ugly, The Royal Tenenbaums, Punch-Drunk Love, I Heart Huckabees, Armageddon, Citizen Kane, The Waltons: Season 1, Stepmom, Julien Donkey-Boy, Sister Act, The Fast and the Furious, The Wizard of Oz, Kill Bill: Vol. 1, Scarface, Natural Born Killers, Annie Hall, Belle de Jour, Lost in Translation, The Longest Yard, Being John Malkovich, and Catwoman.]

Koren et al., 2009.

SLIDE 13

Latent factor models (summation form)

The least squares formulation is prone to overfitting. More general summation form:

    L = Σ_{(i,j)∈Ω} lij(Li∗, R∗j) + R(L, R)

◮ L is the global loss
◮ Li∗ and R∗j are user and item parameters, resp.
◮ lij is a local loss, e.g., lij = (Dij − [LR]ij)²
◮ R is a regularization term, e.g., R = λ(‖L‖²_F + ‖R‖²_F)

Loss function can be more sophisticated

◮ Improved predictors (e.g., include user and item bias)
◮ Additional feedback data (e.g., time, implicit feedback)
◮ Regularization terms (e.g., weighted depending on amount of feedback)
◮ Available metadata (e.g., demographics, genre of a movie)
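For a single revealed entry, the regularized local loss from above might look as follows in NumPy (a sketch; splitting the λ term across entries is one common convention, not necessarily the slides’):

```python
import numpy as np

def local_loss(Li, Rj, Dij, lam=0.05):
    """Local loss l_ij for one revealed entry: squared error plus an
    L2 (Frobenius) penalty on the participating factors. lam is made up."""
    err = Dij - Li @ Rj
    return err ** 2 + lam * (Li @ Li + Rj @ Rj)

# Example with the rank-1 factors from the previous slide (Alice, The Matrix):
print(local_loss(np.array([1.98]), np.array([1.92]), 4.0))
```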


SLIDE 14

Example: Netflix prize data

Root mean square error of predictions

[Figure: RMSE (≈ 0.875 to ≈ 0.91) vs. millions of parameters (10 to 100,000, log scale) for five model variants: plain, with biases, with implicit feedback, with temporal dynamics (v.1), and with temporal dynamics (v.2). The labels along the curves (40, 60, 90, 128, 180, ...) give the number of factors.]

Koren et al., 2009.

SLIDE 15

Outline

1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

SLIDE 16

The matrix completion problem

Complete these matrices!

  1 1 1 1 1        1 1 1 1 1
  1 1 1 1 1        1 ? ? ? ?
  1 1 ? 1 1        1 ? ? ? ?
  1 1 1 1 1        1 ? ? ? ?
  1 1 1 1 1        1 ? ? ? ?

Matrix completion is impossible without additional assumptions!
Let’s assume that the underlying full matrix is “simple” (here: rank 1).

  1 1 1 1 1        1 1 1 1 1
  1 1 1 1 1        1 1 1 1 1
  1 1 1 1 1        1 1 1 1 1
  1 1 1 1 1        1 1 1 1 1
  1 1 1 1 1        1 1 1 1 1

When/how can we recover a low-rank matrix from a sample of its entries?

SLIDE 17

Rank minimization

Definition (rank minimization problem)

Given an n × n data matrix D and an index set Ω of revealed entries, the rank minimization problem is

    minimize    rank(X)
    subject to  Xij = Dij for (i, j) ∈ Ω
                X ∈ R^{n×n}

◮ Seeks the “simplest explanation” that fits the data
◮ If the solution is unique and the samples are sufficient, recovers D (i.e., X = D)
◮ NP-hard: the time complexity of existing rank minimization algorithms is doubly exponential in n (and they are also slow in practice)

SLIDE 18

Nuclear norm minimization

Rank: rank(D) = |{ k : σk(D) > 0, 1 ≤ k ≤ n }| = Σ_{k=1}^{n} 1[σk(D) > 0]
Nuclear norm: ‖D‖∗ = Σ_{k=1}^{n} σk(D)

Definition (nuclear norm minimization)

Given an n × n data matrix D and an index set Ω of revealed entries, the nuclear norm minimization problem is

    minimize    ‖X‖∗
    subject to  Xij = Dij for (i, j) ∈ Ω
                X ∈ R^{n×n}

◮ A heuristic for rank minimization
◮ The nuclear norm is a convex function (thus a local optimum is a global optimum)
◮ Can be optimized (more) efficiently via semidefinite programming
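As a sketch of how one might solve this in practice (assuming the cvxpy package, which provides the nuclear norm as the normNuc atom; the instance below is made up, and recovery is only expected when the sample is large enough and D is incoherent, as discussed on the following slides):

```python
import numpy as np
import cvxpy as cp

# Tiny rank-1 instance with a handful of revealed entries.
D = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0, 2.0])
omega = [(0, 0), (0, 3), (1, 1), (2, 0), (2, 2), (3, 1), (3, 3)]

n = D.shape[0]
X = cp.Variable((n, n))
constraints = [X[i, j] == D[i, j] for (i, j) in omega]
cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()

print(np.round(X.value, 2))   # hopefully close to the rank-1 matrix D
```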

SLIDE 19

Why nuclear norm minimization?

[Figure 1: unit ball of the nuclear norm for symmetric 2 × 2 matrices. The red line depicts a random one-dimensional affine space. Such a subspace will generically intersect a sufficiently large nuclear norm ball at a rank-one matrix.]

◮ Consider the SVD of D = UΣV^T
◮ The unit nuclear norm ball is the set of convex combinations (with weights σk) of rank-1 matrices U∗k V∗k^T of unit Frobenius norm
◮ Extreme points have low rank (in the figure: rank-1 matrices of unit Frobenius norm)
◮ Nuclear norm minimization: inflate the unit ball as little as possible until it meets the constraint set { X : Xij = Dij }
◮ The solution lies at an extreme point of the inflated ball and therefore has (hopefully) low rank

Candès and Recht, 2012.

SLIDE 20

Relationship to LFMs

Recall the regularized LFM (L is m × r, R is r × n):

    min_{L,R} Σ_{(i,j)∈Ω} (Dij − [LR]ij)² + λ(‖L‖²_F + ‖R‖²_F)

View as a matrix completion problem by enforcing Dij = [LR]ij:

    minimize    (1/2)(‖L‖²_F + ‖R‖²_F)
    subject to  Xij = Dij for (i, j) ∈ Ω
                X = LR

One can show: for r chosen larger than the rank of the nuclear norm optimum, this problem is equivalent to nuclear norm minimization.

For some intuition, suppose X = UΣV^T at the optimal L and R. Comparing against the feasible choice L = UΣ^{1/2}, R = Σ^{1/2}V^T:

    (1/2)(‖L‖²_F + ‖R‖²_F) ≤ (1/2)(‖UΣ^{1/2}‖²_F + ‖Σ^{1/2}V^T‖²_F)
                           = (1/2) Σ_i Σ_{k=1}^{r} (U²_ik σk + V²_ik σk)
                           = Σ_{k=1}^{r} σk = ‖X‖∗

(the last step uses that the columns of U and V have unit norm).

SLIDE 21

When can we hope to recover D? (1)

Assume D is the 5 × 5 all-ones matrix (rank 1).

  1 1 1 1 1        1 ? ? 1 ?
  1 ? ? ? ?        ? ? 1 ? ?
  1 ? ? ? ?        ? 1 ? ? 1
  1 ? ? ? ?        1 ? 1 ? ?
  1 ? ? ? ?        ? 1 1 ? ?

      Ok                Ok

  1 1 1 1 ?        1 ? ? ? ?
  1 1 ? ? ?        ? 1 ? ? ?
  1 ? ? ? ?        ? ? 1 ? ?
  1 ? ? 1 ?        ? ? ? 1 ?
  1 ? ? ? ?        ? ? ? ? 1

    Not unique          Not unique
  (column missed)    (insufficient samples)

Sampling strategy and sample size matter.

SLIDE 22

When can we hope to recover D? (2)

Consider the following rank-1 matrices and assume few revealed entries.

  1 1 1 1 1        20 20 22 20 20
  1 1 1 1 1        20 20 22 20 20
  1 1 1 1 1        22 22 24 22 22
  1 1 1 1 1        20 20 22 20 20
  1 1 1 1 1        20 20 22 20 20

  Ok (“incoherent”)     Ok (“incoherent”)

  1 1 1 1 1        1 0 0 0 0
  0 0 0 0 0        0 0 0 0 0
  0 0 0 0 0        0 0 0 0 0
  0 0 0 0 0        0 0 0 0 0
  0 0 0 0 0        0 0 0 0 0

  Bad (“coherent”)         Bad (“coherent”)
  → first row required     → (1, 1)-entry required

Properties of D matter.

SLIDE 23

When can we hope to recover D? (3)

The exact conditions under which matrix completion “works” are an active research area:

◮ Which sampling schemes? (e.g., random, with/without replacement, active)
◮ Which sample sizes?
◮ Which matrices? (e.g., “incoherent” matrices)
◮ Which noise models? (e.g., independent, normally distributed noise)

Theorem (Candès and Recht, 2009)

Let D = UΣV^T with r = rank(D). If D is incoherent in that max_{ij} U²_ij ≤ µB/n and max_{ij} V²_ij ≤ µB/n for some µB = O(1), and if r ≤ µB⁻¹ n^{1/5}, then O(n^{6/5} r log n) random samples drawn without replacement suffice to recover D exactly with high probability.

Candès and Recht, 2009.

SLIDE 24

Outline

1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

SLIDE 25

Overview

Latent factor models in practice:

◮ Millions of users and items
◮ Billions of ratings
◮ Sometimes quite complex models

Many algorithms have been applied to large-scale problems:

◮ Gradient descent and quasi-Newton methods
◮ Coordinate-wise gradient descent
◮ Stochastic gradient descent
◮ Alternating least squares

SLIDE 26

Continuous gradient descent

Find a minimum θ∗ of the function L:

◮ Pick a starting point θ0
◮ Compute the gradient L′(θ0)
◮ Walk downhill

Differential equation

    ∂θ(t)/∂t = −L′(θ(t))

with boundary condition θ(0) = θ0. Under certain conditions, θ(t) → θ∗.

[Figure: contour plot of L with the continuous descent path; inset: distance θ(t) − θ∗ shrinking over time t.]

SLIDE 27

Discrete gradient descent

Find a minimum θ∗ of the function L:

◮ Pick a starting point θ0
◮ Compute the gradient L′(θ0)
◮ Jump downhill

Difference equation (see the sketch below):

    θn+1 = θn − εn L′(θn)

Under certain conditions, this approximates continuous gradient descent: the interpolated process θn(t), obtained from θn by taking “steps of total size t”, satisfies the ODE as n → ∞.
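A minimal sketch of the difference equation, with a made-up quadratic objective and a fixed step size (the slides use a sequence εn):

```python
import numpy as np

def gradient_descent(grad, theta0, eps=0.1, steps=100):
    """Discrete gradient descent: theta_{n+1} = theta_n - eps * L'(theta_n)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eps * grad(theta)
    return theta

# Example: L(x, y) = x^2 + 2 y^2 with gradient (2x, 4y); optimum at (0, 0).
print(gradient_descent(lambda t: np.array([2 * t[0], 4 * t[1]]), [1.0, 1.0]))
```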

[Figure: contour plot of L with the discrete descent path (a step function); inset: distance θ(t) − θ∗ over time t.]

SLIDE 28

Gradient descent for LFMs

Set θ = (L, R) and write

    L(θ) = Σ_{(i,j)∈Ω} Lij(Li∗, R∗j)

    ∇_{Li∗}L(θ) = Σ_{j : (i,j)∈Ω} ∇_{Li∗}Lij(Li∗, R∗j)

GD epoch (see the sketch below):

1. Compute the gradient:
   ⋆ Initialize zero matrices L∇ and R∇
   ⋆ For each entry (i, j) ∈ Ω, update the gradients:
     L∇i∗ ← L∇i∗ + ∇_{Li∗}Lij(Li∗, R∗j)
     R∇∗j ← R∇∗j + ∇_{R∗j}Lij(Li∗, R∗j)
2. Update the parameters: L ← L − εn L∇, R ← R − εn R∇

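One such epoch for the unregularized squared loss, as a NumPy sketch (the step size is made up; the gradient formulas are derived on the next slide):

```python
import numpy as np

def gd_epoch(D, L, R, eps=0.001):
    """One gradient-descent epoch for the unregularized LFM loss
    sum over revealed (i, j) of (D_ij - L_i* R_*j)^2."""
    Lg = np.zeros_like(L)                   # gradient accumulator for L
    Rg = np.zeros_like(R)                   # gradient accumulator for R
    for i, j in np.argwhere(~np.isnan(D)):  # loop over Omega
        err = D[i, j] - L[i] @ R[:, j]
        Lg[i] += -2 * err * R[:, j]         # gradient w.r.t. row L_i*
        Rg[:, j] += -2 * err * L[i]         # gradient w.r.t. column R_*j
    return L - eps * Lg, R - eps * Rg
```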

SLIDE 29

Computing the gradient (example)

Simplest form (unregularized):

    Lij(Li∗, R∗j) = (Dij − Li∗R∗j)²

Gradient computation:

    ∇_{Li′k}Lij(Li∗, R∗j) = 0                          if i′ ≠ i
                          = −2 Rkj (Dij − Li∗R∗j)      if i′ = i

The local gradient of entry (i, j) ∈ Ω is nonzero only on row Li∗ and column R∗j.

SLIDE 30

Stochastic gradient descent

Find a minimum θ∗ of the function L:

◮ Pick a starting point θ0
◮ Compute an approximate gradient L̂′(θ0)
◮ Jump “approximately” downhill

Stochastic difference equation

    θn+1 = θn − εn L̂′(θn)

Under certain conditions, this asymptotically approximates (continuous) gradient descent.

[Figure: contour plot of L with a noisy SGD path; inset: distance θ(t) − θ∗ over time t.]

SLIDE 31

Stochastic gradient descent for LFMs

Set θ = (L, R) and use

    L(θ) = Σ_{(i,j)∈Ω} Lij(Li∗, R∗j)
    L′(θ) = Σ_{(i,j)∈Ω} L′ij(Li∗, R∗j)
    L̂′(θ, z) = N L′izjz(Liz∗, R∗jz),

where N = |Ω| and z = (iz, jz) ∈ Ω is a revealed entry drawn at random.

SGD epoch (see the sketch below):

1. Pick a random entry z ∈ Ω
2. Compute the approximate gradient L̂′(θ, z)
3. Update the parameters: θn+1 = θn − εn L̂′(θn, z)
4. Repeat N times


SGD step affects only current row and column.
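A NumPy sketch of one SGD epoch for the unregularized squared loss (step size and seed are made up; the factor N in L̂′ is absorbed into the step size, as is common in practice):

```python
import numpy as np

def sgd_epoch(D, L, R, eps=0.01, rng=np.random.default_rng(0)):
    """One SGD epoch: N single-entry steps. Each step touches only the
    current row L_i* and column R_*j."""
    omega = np.argwhere(~np.isnan(D))
    for _ in range(len(omega)):
        i, j = omega[rng.integers(len(omega))]   # pick a random revealed entry
        err = D[i, j] - L[i] @ R[:, j]
        Li_old = L[i].copy()                     # keep old row for R's update
        L[i] += eps * 2 * err * R[:, j]
        R[:, j] += eps * 2 * err * Li_old
    return L, R
```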

SLIDE 32

SGD in practice

The step size sequence { εn } needs to be chosen carefully:

◮ Pick the initial step size based on a sample (of some rows and columns)
◮ Reduce the step size gradually
◮ Bold driver heuristic (see the sketch below): after every epoch,
  ◮ increase the step size slightly (by, say, 5%) when the loss decreased
  ◮ decrease the step size sharply (by, say, 50%) when the loss increased
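A minimal sketch of the heuristic as a hypothetical helper (the 5%/50% factors are the slide’s suggestions):

```python
def bold_driver(eps, loss, prev_loss):
    """Adapt the SGD step size after an epoch: grow it slightly while the
    loss keeps decreasing, shrink it sharply as soon as the loss goes up."""
    return eps * 1.05 if loss < prev_loss else eps * 0.5
```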

[Figure: mean loss per epoch on the Netflix data (unregularized) for LBFGS, SGD, and ALS.]

SLIDE 33

Outline

1. Collaborative Filtering
2. Matrix Completion
3. Algorithms
4. Summary

SLIDE 34

Lessons learned

Collaborative filtering methods learn from past user behavior
Latent factor models are the best-performing single approach to collaborative filtering

◮ But often combined with other methods

Users and items are represented in a common latent factor space

◮ Holistic matrix-factorization approach
◮ Similar users/items are placed at similar positions
◮ Low-rank assumption = few “factors” influence user preferences

Close relationship to the matrix completion problem

◮ Reconstruct a partially observed low-rank matrix

SGD is a simple and practical algorithm to fit LFMs in summation form

SLIDE 35

Suggested reading

• Y. Koren, R. Bell, C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), p. 30–37, 2009. http://research.yahoo.com/pub/2859
• E. Candès, B. Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6), p. 111–119, 2012. http://doi.acm.org/10.1145/2184319.2184343
• And references in the above articles
