Rank aggregation via nuclear norm minimization, David F. Gleich


Rank aggregation via nuclear norm minimization David F. Gleich Purdue University @dgleich Lek-Heng Lim University of Chicago KDD2011 San Diego, CA Lek funded by NSF CAREER award (DMS-1057064); David funded by DOE John von Neumann and NSERC


SLIDE 1

Rank aggregation via nuclear norm minimization

David F. Gleich Purdue University @dgleich Lek-Heng Lim University of Chicago KDD2011 San Diego, CA

David F. Gleich (Purdue) KDD 2011 1/20

Lek funded by NSF CAREER award (DMS-1057064); David funded by DOE John von Neumann and NSERC

SLIDE 2

Which is a better list of good DVDs?

Standard rank aggregation (the mean rating):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Lost: Season 1
Battlestar Galactica: Season 1
Fullmetal Alchemist
Trailer Park Boys: Season 4
Trailer Park Boys: Season 3
Tenchi Muyo!
Shawshank Redemption

Nuclear norm based rank aggregation (not matrix completion on the Netflix rating matrix):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Star Wars V: Empire Strikes Back
Raiders of the Lost Ark
Star Wars IV: A New Hope
Shawshank Redemption
Star Wars VI: Return of the Jedi
Lord of the Rings 3: Bonus DVD
The Godfather

SLIDE 3

Rank Aggregation

Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.

Voting: Find the winning candidate
Program committees: Find the best papers given reviews
Dining: Find the best restaurant in San Diego (subject to a budget?)

SLIDE 4

Ranking is really hard

All rank aggregations involve some measure of compromise (Ken Arrow).
A good ranking is the “average” ranking under a permutation distance (John Kemeny).
It is NP-hard to compute Kemeny’s ranking (Dwork, Kumar, Naor, Sivakumar).
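To make the "average ranking under a permutation distance" idea concrete, here is a minimal sketch that finds a Kemeny ranking by brute force. It assumes the permutation distance is the Kendall tau distance (the number of item pairs ordered differently); the function names and the three example votes are hypothetical. The search visits every ordering, so it is only usable for a handful of items, which is consistent with the hardness result above.

```python
from itertools import combinations, permutations


def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_ranking(votes):
    """Brute-force search over all orderings; feasible only for tiny inputs."""
    items = sorted(votes[0])
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau_distance(cand, v) for v in votes),
    )


# Hypothetical votes: three complete rankings of the same three items.
votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
best = kemeny_ranking(votes)
```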

SLIDE 5

Given a hard problem, what do you do? Numerically relax! It’ll probably be easier.

Image: Embody chair, John Cantrell (flickr)

SLIDE 6

Suppose we had scores

Let $s_i$ be the score of the $i$th movie/song/paper/team to rank.
Suppose we can compare the $i$th to the $j$th: $Y_{ij} = s_i - s_j$.
Then $Y = s e^T - e s^T$ is skew-symmetric, rank 2.
Also works for $s_i / s_j$ with an extra log.

Kemeny and Snell, Mathematical Models in Social Sciences (1978)

Numerical ranking is intimately intertwined with skew-symmetric matrices
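The link between scores and skew-symmetric matrices can be checked numerically. This is a small sketch assuming the comparison is the score difference, written here as `Y[i, j] = s[i] - s[j]`; the names `s`, `e`, and `Y` and the score values are illustrative choices.

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])      # hypothetical item scores
e = np.ones_like(s)

# Pairwise comparison matrix: Y[i, j] = s[i] - s[j]
Y = np.outer(s, e) - np.outer(e, s)

is_skew = np.allclose(Y, -Y.T)          # Y equals minus its transpose
rank_Y = np.linalg.matrix_rank(Y)       # 2 whenever the scores are not all equal
```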

SLIDE 7

Using ratings as comparisons

Ratings induce various skew-symmetric matrices.

David (1988), The Method of Paired Comparisons

Two constructions: Arithmetic Mean, Log-odds
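As an illustration of how ratings can induce a skew-symmetric matrix, the sketch below builds an arithmetic-mean comparison. It assumes this means averaging the rating difference over the users who rated both items; the exact formula is not given in the text, so treat that as an assumption. The function name and the tiny ratings array are hypothetical, and the log-odds variant is not shown.

```python
import numpy as np


def pairwise_from_ratings(R):
    """R: users x items array with np.nan where a user did not rate an item.

    Returns (Y, counts): the mean rating difference for each item pair and
    the number of users who rated both items of the pair.
    """
    rated = ~np.isnan(R)
    R0 = np.where(rated, R, 0.0)
    M = rated.astype(float)
    counts = M.T @ M                     # counts[i, j] = users who rated both i and j
    # Sum over co-rating users of R[u, i] - R[u, j]
    diff_sum = R0.T @ M - M.T @ R0
    with np.errstate(invalid="ignore", divide="ignore"):
        Y = np.where(counts > 0, diff_sum / counts, 0.0)
    return Y, counts


# Hypothetical data: 3 users, 3 items, each user rated two items.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 2.0, 1.0]])
Y, counts = pairwise_from_ratings(R)
```

The `counts` array is what a later trust threshold on the number of comparisons would be applied to.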

SLIDE 8

Extracting the scores

Given $Y$ with all entries, then $s = \frac{1}{n} Y e$ is the Borda count, the least-squares solution to $Y \approx s e^T - e s^T$.
How many $Y_{ij}$ do we have? Most.
Do we trust all $Y_{ij}$? Not really.

Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled

[Figure: Movie Pairs versus Number of Comparisons, logarithmic axes]
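A short sketch of score extraction when every pairwise entry is available. It assumes the Borda count is the row mean of the comparison matrix, `s = Y e / n`, with `Y[i, j] = s[i] - s[j]`; the variable names and score values are illustrative. Scores built from differences are only determined up to an additive constant, so the check compares against centered scores.

```python
import numpy as np

s_true = np.array([3.0, 1.0, 4.0, 1.5])          # hypothetical scores
n = s_true.size
e = np.ones(n)
Y = np.outer(s_true, e) - np.outer(e, s_true)    # complete comparison matrix

s_borda = Y @ e / n                              # row means of Y
recovers = np.allclose(s_borda, s_true - s_true.mean())
ranking = np.argsort(-s_borda)                   # best item first
```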

SLIDE 9

Only partial info? Complete it!

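Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores.
Goal: Find the simplest skew-symmetric matrix that matches the data.

Noiseless: minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $X_{ij} = Y_{ij}$ for $(i, j) \in \Omega$
Noisy: minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $\sum_{(i,j) \in \Omega} (X_{ij} - Y_{ij})^2 \le \sigma$

Both of these are NP-hard too.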

SLIDE 10

Solution: Go Nuclear

From a French nuclear test in 1970, image from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/

SLIDE 11

The nuclear norm

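For vectors:
minimize $\|x\|_0$ subject to $Ax = b$ is NP-hard, while
minimize $\|x\|_1$ subject to $Ax = b$ is convex and gives the same answer “under appropriate circumstances”.

For matrices: Let $A = U \Sigma V^T$ be the SVD.
$\operatorname{rank}(A)$ is the number of nonzero singular values; the nuclear norm $\|A\|_* = \sum_i \sigma_i$ is the best convex under-estimator of rank on the unit ball.

The analog of the 1-norm (or $\ell_1$) for matrices.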
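The relation between rank and the nuclear norm can be seen directly from the singular values. This is a minimal sketch on a hypothetical 3 x 3 skew-symmetric matrix: the rank counts the nonzero singular values, while the nuclear norm sums them. The `1e-10` threshold for "nonzero" is an arbitrary choice for this example.

```python
import numpy as np

# Hypothetical skew-symmetric matrix
A = np.array([[0.0, 2.0, -1.0],
              [-2.0, 0.0, 3.0],
              [1.0, -3.0, 0.0]])

sigma = np.linalg.svd(A, compute_uv=False)
rank = int(np.sum(sigma > 1e-10))        # number of nonzero singular values
nuclear = sigma.sum()                    # sum of singular values

# NumPy exposes the same quantity as the "nuc" matrix norm.
same = np.isclose(nuclear, np.linalg.norm(A, "nuc"))
```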

SLIDE 12

Only partial info? Complete it!

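Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores.
Goal: Find the simplest skew-symmetric matrix that matches the data.

NP-hard: minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $X_{ij} = Y_{ij}$ for $(i, j) \in \Omega$
Convex heuristic: minimize $\|X\|_*$ subject to the same constraints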

SLIDE 13

Solving the nuclear norm problem

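Use a LASSO formulation. Jain et al. propose SVP for this problem without the skew-symmetric constraint.

Here $\mathcal{A}(X)$ extracts the entries of $X$ indexed by $\Omega$, $b$ holds the known values, and $\mathcal{A}^*$ is the adjoint of $\mathcal{A}$.

  1. Initialize $X^{(0)} = 0$, $t = 0$
  2. REPEAT
  3.   $U \Sigma V^T$ = rank-$k$ SVD of $X^{(t)} - \eta \mathcal{A}^*(\mathcal{A}(X^{(t)}) - b)$
  4.   $X^{(t+1)} = U \Sigma V^T$
  5.   $t = t + 1$
  6. UNTIL $\|\mathcal{A}(X^{(t)}) - b\|_2 \le \varepsilon$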

Jain et al. NIPS 2010
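A minimal sketch of a singular value projection loop for matrix completion, under stated assumptions: the objective is half the squared error on the known entries, each step takes a gradient step and truncates to rank `k` with a dense SVD, and the step size `eta`, tolerance `tol`, and iteration cap are illustrative defaults rather than values from the talk. The function name `svp` and its signature are hypothetical. A practical implementation would use a sparse or truncated SVD instead of `np.linalg.svd`.

```python
import numpy as np


def svp(rows, cols, vals, n, k=2, eta=1.0, tol=1e-8, max_iter=500):
    """Singular value projection sketch.

    (rows, cols, vals) list the known entries of an n x n matrix.
    """
    X = np.zeros((n, n))
    for _ in range(max_iter):
        residual = X[rows, cols] - vals
        if np.linalg.norm(residual) <= tol:
            break
        G = np.zeros((n, n))
        G[rows, cols] = residual             # gradient of 0.5 * ||A(X) - b||^2
        U, S, Vt = np.linalg.svd(X - eta * G)
        X = (U[:, :k] * S[:k]) @ Vt[:k]      # project onto rank-k matrices
    return X


# Hypothetical test matrix built from score differences.
s = np.array([3.0, 1.0, 4.0, 1.5])
n = s.size
Y = np.subtract.outer(s, s)                  # Y[i, j] = s[i] - s[j]

# All entries known: one projection step reproduces Y.
rows, cols = np.nonzero(np.ones((n, n)))
X_full = svp(rows, cols, Y[rows, cols], n)

# One comparison pair hidden: the loop attempts to fill it in.
mask = np.ones((n, n), dtype=bool)
mask[0, 3] = mask[3, 0] = False
rows_p, cols_p = np.nonzero(mask)
X_partial = svp(rows_p, cols_p, Y[rows_p, cols_p], n)
```

How well the partial case recovers the hidden entries depends on the sampling pattern and the step size, so no particular accuracy is claimed here.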

SLIDE 14

Skew-symmetric SVDs

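Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $i\lambda_1, -i\lambda_1, i\lambda_2, -i\lambda_2, \ldots, i\lambda_j, -i\lambda_j$, where $\lambda_i > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of $A$ is given by $A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j) V^T$ for $U$ and $V$ given in the proof.

Proof: Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block.

This means that SVP will give us the skew-symmetric constraint “for free”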
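The pairing of singular values can be checked numerically. The sketch below uses a random 6 x 6 skew-symmetric matrix (the size and seed are arbitrary choices) and verifies three things: singular values come in equal pairs, they match the magnitudes of the purely imaginary eigenvalues, and truncating the SVD after a complete pair leaves a skew-symmetric matrix, which is the reason a rank-2 projection preserves the constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B - B.T                                   # random skew-symmetric matrix

sigma = np.linalg.svd(A, compute_uv=False)    # sorted in descending order
eig_abs = np.sort(np.abs(np.linalg.eigvals(A).imag))[::-1]

pairs_match = np.allclose(sigma[0::2], sigma[1::2])
sigma_is_abs_eig = np.allclose(sigma, eig_abs)

# Keep only the leading pair of singular values.
U, S, Vt = np.linalg.svd(A)
A2 = (U[:, :2] * S[:2]) @ Vt[:2]
trunc_is_skew = np.allclose(A2, -A2.T)
```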

SLIDE 15

Exact recovery results

David Gross showed how to recover Hermitian matrices, i.e., the conditions under which recovery is exact.
Note that $iY$ is Hermitian. Thus our new result!

Gross arXiv 2010.

SLIDE 16

Recovery Discussion and Experiments

Confession: If $Y = s e^T - e s^T$, then just look at differences from a connected set.
Constants? Not very good.
Intuition for the truth.

SLIDE 17

The Ranking Algorithm

  0. INPUT $R$ (ratings data) and $c$ (for trust on comparisons)
  1. Compute $Y$ from $R$
  2. Discard entries with fewer than $c$ comparisons
  3. Set $\Omega$, $b$ to be indices and values of what’s left
  4. $U, S, V$ = SVP($\Omega$, $b$, 2)
  5. OUTPUT $s = \frac{1}{n} U S V^T e$
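The steps above can be sketched end to end. This is an illustration under several assumptions, not the authors' implementation: the comparison matrix uses the arithmetic mean of rating differences over co-rating users, the completion step is a plain dense-SVD singular value projection with illustrative `eta`, `tol`, and `max_iter` values, and scores are taken as row means of the completed matrix. The function name `rank_items` is hypothetical, and `c` is assumed to be at least 1.

```python
import numpy as np


def rank_items(R, c=1, k=2, eta=1.0, tol=1e-8, max_iter=500):
    """R: users x items ratings with np.nan for missing entries; c >= 1."""
    rated = ~np.isnan(R)
    R0 = np.where(rated, R, 0.0)
    M = rated.astype(float)
    n = R.shape[1]

    # Step 1: arithmetic-mean pairwise comparisons (assumed construction)
    counts = M.T @ M
    diff_sum = R0.T @ M - M.T @ R0

    # Steps 2-3: keep only pairs with at least c comparisons
    rows, cols = np.nonzero(counts >= c)
    b = diff_sum[rows, cols] / counts[rows, cols]

    # Step 4: rank-k singular value projection on the retained entries
    X = np.zeros((n, n))
    for _ in range(max_iter):
        residual = X[rows, cols] - b
        if np.linalg.norm(residual) <= tol:
            break
        G = np.zeros((n, n))
        G[rows, cols] = residual
        U, S, Vt = np.linalg.svd(X - eta * G)
        X = (U[:, :k] * S[:k]) @ Vt[:k]

    # Step 5: scores are the row means of the completed matrix
    s = X.sum(axis=1) / n
    return s, np.argsort(-s)


# Hypothetical noiseless data: 5 users all give the same ratings to 4 items.
R = np.tile(np.array([3.0, 1.0, 4.0, 1.5]), (5, 1))
scores, order = rank_items(R, c=1)
```

In this noiseless, fully rated example the output scores are the centered ratings, so the ordering matches the ratings themselves; with sparse or noisy ratings the result depends on `c` and on how well the completion step converges.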

SLIDE 18

Synthetic Results

Construct an Item Response Theory model. Vary the number of ratings per user and a noise/error level.

[Figure panels: Our Algorithm! and The Average Rating]

SLIDE 19

Conclusions and Future Work

“aggregate, then complete”

Rank aggregation with the nuclear norm is
  • principled
  • easy to compute

The results are much better than simple approaches.

  • 1. Compare against others
  • 2. Noisy recovery! More realistic sampling.
  • 3. Skew-symmetric Lanczos based SVD?

SLIDE 20

Google nuclear ranking gleich