Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach

Mastane Achab, Anna Korba, Stéphan Clémençon

DA2PL 2018, Poznań, Poland
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Introduction (1/2)

◮ Permutations over n items: ⟦n⟧ = {1, . . . , n}
◮ The number of permutations explodes: #S_n = n!
◮ A distribution P on S_n therefore has n! − 1 parameters
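A quick numerical sketch of this combinatorial explosion (plain Python, for illustration only):

```python
from math import factorial

# #S_n = n! permutations; a generic distribution on S_n has n! - 1 parameters
for n in (3, 5, 10):
    print(n, factorial(n), factorial(n) - 1)
# already at n = 10: 3,628,800 permutations, hence 3,628,799 parameters
```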
Introduction (2/2)

◮ Question: "How can P be summarized?"
◮ Answer: dimensionality reduction
◮ Problem: S_n carries no vector space structure
Outline
Introduction Dimensionality Reduction on Sn Empirical Distortion Minimization Numerical Experiments on a Real-world Dataset
Preliminaries (1/2)

Bucket order C = (C_1, . . . , C_K): an ordered partition of ⟦n⟧
◮ the C_k's are disjoint, non-empty subsets of ⟦n⟧
◮ C_1 ∪ · · · ∪ C_K = ⟦n⟧
◮ K: the "size" of C
◮ (#C_1, . . . , #C_K): the "shape" of C

Partial order: "i is ranked lower than j in C" if there exist k < l such that (i, j) ∈ C_k × C_l.
Preliminaries (2/2)

P_C: the set of all bucket distributions P′ associated with C
◮ P′ is a distribution on S_n
◮ if (i, j) ∈ C_k × C_l with k < l, then p′_{j,i} = 0
◮ where p′_{i,j} = P(Σ′(i) < Σ′(j)) for Σ′ ∼ P′

Any P′ ∈ P_C is described by d_C = ∏_{k≤K} #C_k! − 1 ≤ n! − 1 parameters.
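The parameter count d_C can be computed directly; a minimal sketch, assuming buckets are encoded as Python sets:

```python
from math import factorial, prod

def dim_bucket(C):
    """d_C: the permutations consistent with C number prod_k (#C_k)!, so a
    distribution over them has prod_k (#C_k)! - 1 free parameters."""
    return prod(factorial(len(bucket)) for bucket in C) - 1

C = [{2, 5}, {1}, {3, 4}]   # shape (2, 1, 2) on n = 5 items
print(dim_bucket(C))         # 2! * 1! * 2! - 1 = 3
print(factorial(5) - 1)      # versus n! - 1 = 119 for a generic P on S_5
```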
Background on Consensus Ranking

Consensus ranking (or "ranking aggregation"): summarize permutations σ_1, . . . , σ_N by a consensus/median ranking σ∗ ∈ S_n obtained by solving:

min_{σ∈S_n} ∑_{s=1}^{N} d(σ, σ_s).

If Σ_1, . . . , Σ_N are i.i.d. sampled from P (Korba et al., 2017), solve instead:

min_{σ∈S_n} E_{Σ∼P}[d(Σ, σ)].
Kemeny medians

A particular choice of metric d:

◮ Kendall's τ distance:

d_τ(σ, σ′) = ∑_{i<j} I{(σ(i) − σ(j))(σ′(i) − σ′(j)) < 0}.

◮ Kemeny medians are the solutions of: min_{σ∈S_n} E_{Σ∼P}[d_τ(Σ, σ)].

The Kemeny median σ∗_P is unique if P is strictly stochastically transitive:
◮ p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 ⇒ p_{i,k} ≥ 1/2
◮ p_{i,j} ≠ 1/2 for all i < j
◮ it is then given by the Copeland ranking:

σ∗_P(i) = 1 + ∑_{j≠i} I{p_{i,j} < 1/2}.
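The two formulas above translate into a few lines of Python; a sketch, where the pairwise matrix p is a toy example (not data from the talk):

```python
import numpy as np

def kendall_tau(sigma, sigma_p):
    """Kendall's tau distance: number of discordant pairs (i, j), i < j.
    Permutations are 0-indexed arrays mapping item -> rank."""
    n = len(sigma)
    return sum(
        (sigma[i] - sigma[j]) * (sigma_p[i] - sigma_p[j]) < 0
        for i in range(n) for j in range(i + 1, n)
    )

def copeland_ranking(p):
    """Copeland ranking from pairwise probabilities p[i, j] = P(i beats j):
    sigma*(i) = 1 + #{j != i : p[i, j] < 1/2} (returned 1-indexed)."""
    n = p.shape[0]
    return np.array([1 + sum(p[i, j] < 0.5 for j in range(n) if j != i)
                     for i in range(n)])

# Toy example, n = 3, strictly stochastically transitive pairwise probabilities
p = np.array([[0.5, 0.7, 0.9],
              [0.3, 0.5, 0.6],
              [0.1, 0.4, 0.5]])
print(copeland_ranking(p))  # [1 2 3]: item 0 ranked first
print(kendall_tau((0, 1, 2), (2, 1, 0)))  # 3: all pairs discordant
```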
Bucket orders of size n

Consensus ranking is the extreme case of a bucket order C of size n:
◮ C = ({σ∗⁻¹(1)}, . . . , {σ∗⁻¹(n)})
◮ P_C = {δ_{σ∗}}, hence dimension d_C = 0

Problem: generalize this to bucket orders of arbitrary size.
A Mass Transportation Approach

◮ Question: "How can the approximation error between the original distribution P and a bucket distribution P′ ∈ P_C be quantified?"
◮ Our answer: the Wasserstein distance W_{d,q}(P, P′).

Definition

W_{d,q}(P, P′) = inf_{Σ∼P, Σ′∼P′} E[d^q(Σ, Σ′)],

where the infimum is taken over all couplings (Σ, Σ′) with marginals P and P′.

◮ Why: it generalizes consensus ranking. Indeed:

W_{d,1}(P, δ_σ) = E_{Σ∼P}[d(Σ, σ)].

◮ Focus here: d = d_τ and q = 1.
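The degenerate case P′ = δ_σ can be checked numerically: with a point mass, the only coupling pairs every draw Σ with σ, so the infimum collapses to a plain expectation. A small sketch with a uniform toy distribution on S_3 (chosen here for illustration):

```python
from itertools import permutations

def kendall_tau(sigma, sigma_p):
    """Number of discordant pairs between two rank vectors."""
    n = len(sigma)
    return sum((sigma[i] - sigma[j]) * (sigma_p[i] - sigma_p[j]) < 0
               for i in range(n) for j in range(i + 1, n))

# P: uniform distribution over S_3 (permutations as rank vectors); P' = delta_sigma
perms = list(permutations(range(3)))
P = {s: 1 / len(perms) for s in perms}
sigma = (0, 1, 2)

# With P' a point mass, every coupling pairs each Sigma ~ P with sigma, so
# W_{d,1}(P, delta_sigma) reduces to E_{Sigma~P}[d_tau(Sigma, sigma)].
w = sum(prob * kendall_tau(s, sigma) for s, prob in P.items())
print(w)  # 1.5: average number of discordant pairs w.r.t. the identity
```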
Distortion measure

A bucket order C represents P well if its distortion Λ_P(C) is small.

Definition

Λ_P(C) = min_{P′∈P_C} W_{d_τ,1}(P, P′)

Explicit expression for Λ_P(C):

Proposition

Λ_P(C) = ∑_{1≤k<l≤K} ∑_{(i,j)∈C_k×C_l} p_{j,i}.
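The proposition translates directly into code; a sketch with a toy pairwise-probability matrix (0-indexed items, values invented for illustration):

```python
import numpy as np

def distortion(C, p):
    """Lambda_P(C): sum over bucket pairs k < l and (i, j) in C_k x C_l of
    p[j, i], i.e. the mass of pairwise preferences contradicting C."""
    K = len(C)
    return sum(p[j, i]
               for k in range(K) for l in range(k + 1, K)
               for i in C[k] for j in C[l])

# Toy pairwise matrix: p[i, j] = P(item i beats item j), n = 3
p = np.array([[0.0, 0.7, 0.9],
              [0.3, 0.0, 0.6],
              [0.1, 0.4, 0.0]])
print(distortion([{0}, {1}, {2}], p))  # 0.3 + 0.1 + 0.4 = 0.8
print(distortion([{0, 1}, {2}], p))    # 0.1 + 0.4 = 0.5
```

Merging items 0 and 1 into one bucket lowers the distortion because the contradicting pair probability p[1, 0] no longer counts.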
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Empirical setting

Training sample: Σ_1, . . . , Σ_N i.i.d. from P.

◮ Empirical pairwise probabilities:

p̂_{i,j} = (1/N) ∑_{s=1}^{N} I{Σ_s(i) < Σ_s(j)}.

◮ Empirical distortion of any bucket order C:

Λ̂_N(C) = ∑_{i≺_C j} p̂_{j,i} = Λ_{P̂_N}(C).
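The empirical pairwise probabilities are simple frequency counts; a sketch with a tiny invented sample (rankings stored as rank vectors):

```python
import numpy as np

def empirical_pairwise(rankings):
    """hat p[i, j]: fraction of sample rankings placing item i before item j.
    `rankings` is an (N, n) array with rankings[s, i] = rank of item i."""
    N, n = rankings.shape
    p_hat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                p_hat[i, j] = np.mean(rankings[:, i] < rankings[:, j])
    return p_hat

# Tiny toy sample: N = 4 rankings of n = 3 items
rankings = np.array([[0, 1, 2],
                     [0, 2, 1],
                     [1, 0, 2],
                     [0, 1, 2]])
p_hat = empirical_pairwise(rankings)
print(p_hat[0, 1])  # 0.75: item 0 precedes item 1 in 3 of the 4 rankings
```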
Rate bound

The empirical distortion minimizer Ĉ_{K,λ} is a solution of:

min_{C∈C_{K,λ}} Λ̂_N(C),

where C_{K,λ} is the set of bucket orders C of size K and shape λ (i.e. #C_k = λ_k for all 1 ≤ k ≤ K).

Theorem

For all δ ∈ (0, 1), with probability at least 1 − δ:

Λ_P(Ĉ_{K,λ}) − inf_{C∈C_{K,λ}} Λ_P(C) ≤ β(n, λ) × √(log(1/δ)/N).
The Strong Stochastic Transitive Case

Assume that P is strongly (and strictly) stochastically transitive, i.e.:

p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 ⇒ p_{i,k} ≥ max(p_{i,j}, p_{j,k}).

Theorem

(i) Λ_P(C) has a unique minimizer over C_{K,λ}; denote it C∗_{K,λ}.
(ii) C∗_{K,λ} is the unique bucket order in C_{K,λ} that agrees with the Kemeny median.

Consequence: an agglomerative algorithm.
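Part (ii) suggests a direct construction, sketched below under the stated transitivity assumption (this cuts the Copeland order into blocks; it is an illustration, not the talk's agglomerative procedure itself):

```python
import numpy as np

def optimal_buckets(p, shape):
    """Under strong/strict stochastic transitivity, the optimal bucket order of
    a given shape cuts the Kemeny (here: Copeland) order into consecutive
    blocks of sizes shape[0], shape[1], ..."""
    n = p.shape[0]
    # Copeland order: items sorted by number of pairwise wins, best first
    order = sorted(range(n),
                   key=lambda i: sum(p[i, j] > 0.5 for j in range(n) if j != i),
                   reverse=True)
    buckets, pos = [], 0
    for size in shape:
        buckets.append(set(order[pos:pos + size]))
        pos += size
    return buckets

# Toy pairwise matrix (0-indexed items), strictly stochastically transitive
p = np.array([[0.0, 0.7, 0.9],
              [0.3, 0.0, 0.6],
              [0.1, 0.4, 0.0]])
print(optimal_buckets(p, (1, 2)))  # [{0}, {1, 2}]
```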
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Experiments
Sushi dataset (Kamishima, 2003):
◮ n = 10 sushi dishes
◮ N = 5000 full rankings