Seriation & Ranking: Spectral Approach
Fajwel Fogel, CNRS & ENS, Paris. with Alexandre d’Aspremont, Francis Bach, Rodolphe Jenatton, & Milan Vojnovic CNRS, INRIA, ENS Paris & MSR Cambridge
1
The seriation problem
⌅ Pairwise similarity information $S_{ij}$ on n variables.
⌅ Suppose the data has a serial structure, i.e. there is an order π such that $S_{\pi(i)\pi(j)}$ decreases with $|i - j|$ (R-matrix).
2
⌅ Genomes are cloned multiple times and randomly cut into shorter reads
⌅ Reorder the reads to recover the genome.
3
⌅ Combinatorial solution [FJBA, 2013; Laurent and Seminaroti, 2014]
⌅ 2-SUM: assign similar items to nearby positions in reordering, i.e. find $\min_{\pi} \sum_{i,j=1}^{n} S_{\pi(i)\pi(j)} \, (i - j)^2$
⌅ The 2-SUM problem is NP-Complete for generic matrices S [George and
4
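As a rough illustration of the 2-SUM objective, the sketch below evaluates the cost of an ordering and finds the best one by exhaustive search on a toy example. The toy similarity (linear decay with distance), the function names and the hidden permutation are assumptions made for illustration, and brute force is only feasible for very small n.

```python
from itertools import permutations

def two_sum(S, order):
    """2-SUM cost when item order[i] is placed at position i."""
    n = len(order)
    return sum(S[order[i]][order[j]] * (i - j) ** 2
               for i in range(n) for j in range(n))

def brute_force_2sum(S):
    """Exhaustive search over all n! orderings (tiny n only)."""
    return min(permutations(range(len(S))), key=lambda p: two_sum(S, p))

# Toy R-matrix on 5 items: similarity decays with distance in the true order.
n = 5
R = [[n - abs(i - j) for j in range(n)] for i in range(n)]

# Hide the order: hidden[a] is the true position of item a.
hidden = [2, 0, 4, 1, 3]
S = [[R[hidden[a]][hidden[b]] for b in range(n)] for a in range(n)]

best = brute_force_2sum(S)
# True positions of the items, read in the recovered order.
recovered_positions = [hidden[item] for item in best]
```

On this example the minimizer lists the items in their true order or in the reverse order, since 2-SUM cannot distinguish an ordering from its mirror image.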
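⌅ Spectral relaxation: the Fiedler vector $f = \arg\min_{\mathbf{1}^T x = 0,\ \|x\|_2 = 1} x^T L_S x$, where $L_S = \mathrm{diag}(S\mathbf{1}) - S$ is the Laplacian of S.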
5
⌅ Exact for R-matrices.
⌅ Quite robust to noise. Arguments similar to perturbation results in spectral
⌅ Scales very well, especially when similarity matrix is sparse (as in DNA
6
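A minimal sketch of the spectral ordering, assuming a dense symmetric similarity matrix and NumPy: form the Laplacian, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and sort its entries. The function name and the toy R-matrix are illustrative assumptions. For large sparse matrices a sparse eigensolver would be the natural replacement for the dense `eigh` call.

```python
import numpy as np

def spectral_order(S):
    """Order items by sorting the Fiedler vector of L_S = diag(S 1) - S."""
    S = np.asarray(S, dtype=float)
    L = np.diag(S.sum(axis=1)) - S
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 belongs to the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    return np.argsort(vecs[:, 1])

# Toy R-matrix: similarity decays linearly with distance in the true order.
n = 30
idx = np.arange(n)
R = n - np.abs(idx[:, None] - idx[None, :])

# Shuffle rows and columns: item a of S has true position perm[a].
rng = np.random.default_rng(0)
perm = rng.permutation(n)
S = R[np.ix_(perm, perm)]

order = spectral_order(S)
recovered = perm[order]   # true positions, read in the recovered order
```

The eigenvector is only defined up to sign, so `recovered` is either increasing or decreasing; the direction has to be fixed with side information.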
7
⌅ sports competitions (e.g. chess, football...)
⌅ crowdsourcing services (e.g. TopCoder...)
⌅ online computer games...
8
⌅ ranking by score (e.g. #wins - #losses) [Huber, 1963; Wauthier et al., 2013]
⌅ ranking by “skills” under a probabilistic model [Bradley and Terry, 1952;
⌅ ranking according to principal eigenvector of a transition matrix [Page et al.,
⌅ ...
⌅ missing comparisons
⌅ non-transitive comparisons (i.e. a < b and b < c but a > c).
9
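For reference, the simplest baseline listed above, ranking by score (#wins - #losses), can be sketched in a few lines. The encoding of the comparison matrix (1 for a win, -1 for a loss, 0 for a missing comparison) and the tie-breaking rule are assumptions made for this example.

```python
def score_ranking(C):
    """Rank items by point score: sum_j C[i][j] = #wins - #losses."""
    n = len(C)
    scores = [sum(C[i][j] for j in range(n) if j != i) for i in range(n)]
    # Best first; ties are broken by item index (an arbitrary choice).
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    return order, scores

# Four items; C[i][j] = 1 if i beat j, -1 if i lost to j, 0 if not compared.
C = [
    [ 0,  1,  1,  0],   # item 0 beat items 1 and 2; 0 vs 3 is missing
    [-1,  0,  1,  1],
    [-1, -1,  0,  1],
    [ 0, -1, -1,  0],
]
order, scores = score_ranking(C)
```

A missing comparison contributes nothing to either score, which is one reason this baseline can be fragile when many comparisons are unobserved.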
10
⌅ Input: a matrix of pairwise comparisons C where $C_{i,j} \in [-1, 1]$, e.g. for a
⌅ Idea: count matching comparisons of i and j against other items k
11
⌅ Construct a similarity matrix S: $S_{i,j} = \sum_{k:\ i,j \text{ compared with } k} \sigma(C_{i,k}, C_{j,k})$
⌅ Example: when $\sigma(a, b) = 1 + ab$, $S = n\mathbf{1}\mathbf{1}^T + CC^T$.
⌅ Is it the right way to solve the ranking problem, in the presence of corrupted
12
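The example with σ(a, b) = 1 + ab can be checked numerically. The sketch below compares the explicit sum over k with the closed form n 1 1^T + C C^T on a noiseless full tournament. The convention C[i, i] = 1, the encoding of C and the function names are assumptions made for illustration; with missing comparisons encoded as 0, the closed form still adds 1 for every k.

```python
import numpy as np

def sigma(a, b):
    return 1.0 + a * b

def similarity_by_sum(C):
    """S[i, j] = sum over k of sigma(C[i, k], C[j, k])."""
    n = C.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = sum(sigma(C[i, k], C[j, k]) for k in range(n))
    return S

def similarity_closed_form(C):
    """Same matrix written as n * 1 1^T + C C^T."""
    n = C.shape[0]
    return n * np.ones((n, n)) + C @ C.T

# Noiseless full tournament: item i is ranked above item j whenever i < j.
n = 6
rank = np.arange(n)
C = np.where(rank[:, None] <= rank[None, :], 1.0, -1.0)
S = similarity_closed_form(C)
```

In this noiseless case S[i, j] = 2 (n - |i - j|), which decreases away from the diagonal, so the similarity is an R-matrix and seriation applies.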
⌅ A very simple procedure:
⌅ Might be improved by designing new similarities.
13
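The individual steps of the procedure are not listed in this text. A plausible sketch, assuming it chains the pieces from the previous slides (similarity from comparisons, Laplacian, Fiedler vector, sort), is given below. The sign-fixing rule and the function name `serial_rank` as written here are assumptions, not necessarily the authors' implementation.

```python
import numpy as np

def serial_rank(C):
    """Rank items (best first) from a comparison matrix C with entries in [-1, 1]."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    S = n * np.ones((n, n)) + C @ C.T          # similarity with sigma(a, b) = 1 + ab
    L = np.diag(S.sum(axis=1)) - S             # Laplacian of S
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])             # sort the Fiedler vector

    def agreement(o):
        # Sum of C[i, j] over pairs where i is placed before j, minus the rest.
        pos = np.empty(n, dtype=int)
        pos[o] = np.arange(n)
        return np.sum(C * np.sign(pos[None, :] - pos[:, None]))

    # The Fiedler vector is defined up to sign: keep the direction that
    # agrees with more of the observed comparisons.
    if agreement(order[::-1]) > agreement(order):
        order = order[::-1]
    return order

# Noiseless full tournament on 20 items with a hidden ranking.
n = 20
rng = np.random.default_rng(0)
rank = rng.permutation(n)                      # rank[i] = true position of item i
C = np.where(rank[:, None] <= rank[None, :], 1.0, -1.0)
order = serial_rank(C)
```

With all comparisons observed and consistent, `rank[order]` comes out as 0, 1, ..., n-1; with missing or corrupted comparisons recovery is generally only approximate.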
⌅ In applications, the design of the similarity can have a major impact.
⌅ For ranking, depending on the nature of your data (cardinal or ordinal data,
⌅ For DNA assembly, you would like to have a similarity robust to sequencing
⌅ Ongoing work...
14
⌅ Robustness to missing/corrupted comparisons
⌅ Exact recovery regime
⌅ Approximate recovery regime: competitive with other approaches for partial
15
[Figure: two panels with axes labeled rank and item.]
16
⌅ Derive asymptotic analytical expression of Fiedler vector in noise free
⌅ Use perturbation results (i.e. Davis-Kahan) in order to bound the
⌅ Get theoretical guarantees for SerialRank in settings with only few
17
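To make the perturbation argument concrete, the sketch below compares the Fiedler vectors of a clean and a noisy Laplacian and checks them against a commonly used variant of the Davis-Kahan sin θ bound (constant 2, eigengap of the unperturbed matrix, due to Yu, Wang and Samworth). The toy similarity, the noise level and the variable names are assumptions made for illustration.

```python
import numpy as np

def laplacian(S):
    return np.diag(S.sum(axis=1)) - S

# Clean similarity (toy R-matrix) and a symmetric noisy version of it.
n = 40
idx = np.arange(n)
S = (n - np.abs(idx[:, None] - idx[None, :])).astype(float)
rng = np.random.default_rng(1)
E = rng.normal(scale=0.5, size=(n, n))
S_noisy = S + (E + E.T) / 2

L, L_noisy = laplacian(S), laplacian(S_noisy)
vals, vecs = np.linalg.eigh(L)
_, vecs_noisy = np.linalg.eigh(L_noisy)
f, f_noisy = vecs[:, 1], vecs_noisy[:, 1]      # Fiedler vectors

# Eigengap around the Fiedler value of the clean Laplacian.
gap = min(vals[1] - vals[0], vals[2] - vals[1])

# Sine of the angle between the two (unit-norm) Fiedler vectors.
cos_theta = min(1.0, abs(f @ f_noisy))
sin_theta = np.sqrt(1.0 - cos_theta ** 2)

# Bound: sin(theta) <= 2 * ||L_noisy - L||_2 / gap.
bound = 2.0 * np.linalg.norm(L_noisy - L, 2) / gap
```

The bound is only informative when the spectral norm of the Laplacian perturbation is small compared with the eigengap.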
⌅ Use results on the convergence of Laplacian operators to provide a
⌅ Following the same analysis as in [Von Luxburg ’08] we can prove that
⌅ Moreover, we can characterize the eigenfunctions of the limit Laplacian
n ) by a differential equation, which gives an asymptotic
18
⌅ Taking the same notations as in [Von Luxburg ’08] we have here
⌅ We deduce that the range of d is [0.5, 0.75]. Interesting eigenvectors
19
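The stated range of d can be checked numerically under two assumptions that are not spelled out in this text: that the noiseless similarity is S_ij = n - |i - j|, and that d is the degree normalised by n². Integrating the kernel 1 - |x - y| over [0, 1] then gives d(x) = 1/2 + x(1 - x), which ranges over [0.5, 0.75].

```python
import numpy as np

n = 1000
idx = np.arange(n)

# Assumed noiseless similarity and its normalised degree.
S = n - np.abs(idx[:, None] - idx[None, :])
d = S.sum(axis=1) / n**2

# Continuous limit of the degree, evaluated at x = i / n.
x = idx / n
d_limit = 0.5 + x * (1 - x)

d_min, d_max = d.min(), d.max()
max_gap = np.max(np.abs(d - d_limit))
```

The smallest degree is attained at the two ends of the order and the largest in the middle; the discrete degree and its limit differ by at most 1/(2n).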
⌅ We can also characterize eigenfunctions f by a differential equation
⌅ The asymptotic expression for the Fiedler vector is a solution to this differential
⌅ Very accurate numerically, even for small values of n.
20
[Figure: Fiedler vector compared with the asymptotic Fiedler vector (two panels).]
21
22
Let $S$ and $\tilde{S}$ be $n \times n$ positive definite matrices and let $L_R = L_{\tilde{S}} - L_S$. Let
S respectively.
23
24
25
⌅ Ranking as a seriation problem, with perturbation results.
⌅ Good performance on some applications, without specific tuning.
⌅ Impact of similarity measures.
⌅ Predictive power of SerialRank.
26
⌅ Links to papers & SerialRank tutorial: www.di.ens.fr/~fogel.
⌅ Support from a European Research Council starting grant (project SIPA) and
27