Dimensionality Reduction and (Bucket) Ranking: a Mass - - PowerPoint PPT Presentation

SLIDE 1

Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach

Mastane Achab, Anna Korba, Stephan Clémençon
DA2PL'2018, Poznan, Poland

SLIDE 2

Outline

◮ Introduction
◮ Dimensionality Reduction on $\mathfrak{S}_n$
◮ Empirical Distortion Minimization
◮ Numerical Experiments on a Real-world Dataset

SLIDE 6

Introduction (1/2)

◮ Permutations over $n$ items $[\![n]\!] = \{1, \dots, n\}$
◮ Number of permutations explodes: $\#\mathfrak{S}_n = n!$
◮ Distribution $P$ on $\mathfrak{S}_n$: $n! - 1$ parameters

SLIDE 9

Introduction (2/2)

◮ Question: "How to summarize $P$?"
◮ Answer: dimensionality reduction
◮ Problem: no vector space structure for permutations

SLIDE 10

Outline

◮ Introduction
◮ Dimensionality Reduction on $\mathfrak{S}_n$
◮ Empirical Distortion Minimization
◮ Numerical Experiments on a Real-world Dataset

SLIDE 14

Preliminaries (1/2)

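Bucket order $\mathcal{C} = (\mathcal{C}_1, \dots, \mathcal{C}_K)$: ordered partition of $[\![n]\!]$

◮ the $\mathcal{C}_k$'s are disjoint non-empty subsets of $[\![n]\!]$
◮ $\bigcup_{k=1}^{K} \mathcal{C}_k = [\![n]\!]$
◮ $K$: "size" of $\mathcal{C}$
◮ $(\#\mathcal{C}_1, \dots, \#\mathcal{C}_K)$: "shape" of $\mathcal{C}$

Partial order: "$i$ is ranked lower than $j$ in $\mathcal{C}$", written $i \prec_{\mathcal{C}} j$, if $\exists\, k < l$ s.t. $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$.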
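A bucket order can be encoded directly as an ordered list of sets. The sketch below is a minimal illustration under our own conventions (items labeled 0, …, n−1 rather than 1, …, n; the helper names `check_bucket_order`, `size_and_shape` and `ranked_lower` are hypothetical): it checks the partition conditions, returns the size and shape, and tests the partial order. Two items in the same bucket are incomparable, so `ranked_lower` is false in both directions for them.

```python
def check_bucket_order(buckets, n):
    """True if `buckets` (a list of sets) is an ordered partition of {0, ..., n-1}."""
    items = [i for bucket in buckets for i in bucket]
    return (all(len(bucket) > 0 for bucket in buckets)   # non-empty buckets
            and len(items) == len(set(items))              # pairwise disjoint
            and set(items) == set(range(n)))               # union is {0, ..., n-1}


def size_and_shape(buckets):
    """Size K and shape (#C_1, ..., #C_K) of a bucket order."""
    return len(buckets), tuple(len(bucket) for bucket in buckets)


def ranked_lower(buckets, i, j):
    """True if i is ranked lower than j: i in C_k and j in C_l with k < l.
    Items sharing a bucket are incomparable (False in both directions)."""
    position = {item: k for k, bucket in enumerate(buckets) for item in bucket}
    return position[i] < position[j]


# Usage on a bucket order of {0, 1, 2, 3} with shape (1, 2, 1)
C = [{1}, {0, 3}, {2}]
print(check_bucket_order(C, 4))  # True
print(size_and_shape(C))         # (3, (1, 2, 1))
print(ranked_lower(C, 1, 3))     # True
print(ranked_lower(C, 0, 3))     # False
```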

SLIDE 17

Preliminaries (2/2)

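$\mathbf{P}_{\mathcal{C}}$: set of all bucket distributions $P'$ associated to $\mathcal{C}$

◮ $P'$ distribution on $\mathfrak{S}_n$
◮ if $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$ ($k < l$), then $p'_{j,i} = 0$
◮ $p'_{i,j} = \mathbb{P}(\Sigma'(i) < \Sigma'(j))$ for $\Sigma' \sim P'$

$P' \in \mathbf{P}_{\mathcal{C}}$ is described by $d_{\mathcal{C}} = \prod_{k \le K} (\#\mathcal{C}_k)! - 1 \le n! - 1$ parameters.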
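Why $d_{\mathcal{C}} = \prod_k (\#\mathcal{C}_k)! - 1$: a bucket distribution can only charge permutations that place every item of $\mathcal{C}_k$ before every item of $\mathcal{C}_l$ for $k < l$, while items may be shuffled freely within each bucket, which leaves $\prod_k (\#\mathcal{C}_k)!$ permutations. A small sketch of the count (the function names are hypothetical):

```python
from math import factorial, prod


def bucket_dimension(shape):
    """d_C = prod_k (#C_k)! - 1 for a bucket order of shape (#C_1, ..., #C_K)."""
    return prod(factorial(size) for size in shape) - 1


def full_dimension(n):
    """Number of free parameters of an arbitrary distribution on S_n."""
    return factorial(n) - 1


print(bucket_dimension((3, 3, 4)))  # 863
print(full_dimension(10))           # 3628799
print(bucket_dimension((1,) * 10))  # 0: size-n bucket order, a single Dirac mass
```

For instance, with $n = 10$ a shape $(3, 3, 4)$ gives $3!\,3!\,4! - 1 = 863$ parameters, against $10! - 1 = 3\,628\,799$ for an unrestricted $P$.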

SLIDE 19

Background on Consensus Ranking

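Consensus ranking (or "ranking aggregation"): summarize permutations $\sigma_1, \dots, \sigma_N$ by a consensus/median ranking $\sigma^* \in \mathfrak{S}_n$ by solving:

$$\min_{\sigma \in \mathfrak{S}_n} \sum_{s=1}^{N} d(\sigma, \sigma_s).$$

If $\Sigma_1, \dots, \Sigma_N$ are i.i.d. sampled from $P$ (Korba et al., 2017), solve:

$$\min_{\sigma \in \mathfrak{S}_n} \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)].$$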
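For small $n$ the empirical consensus problem can be solved by exhaustive search over $\mathfrak{S}_n$. The sketch below is only an illustration, assuming rankings are stored as tuples of ranks (`sigma[i]` is the rank of item `i`) and using Kendall's $\tau$ (introduced with the Kemeny medians) as a default choice of $d$; the function names are hypothetical. It costs $O(n! \cdot N \cdot n^2)$, and computing a Kemeny consensus is NP-hard in general, so this does not scale.

```python
from itertools import permutations


def kendall_tau(sigma, tau):
    """Number of discordant pairs between two rank tuples."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0)


def consensus_ranking(rankings, distance=kendall_tau):
    """Brute-force argmin over all n! rank tuples of sum_s distance(sigma, sigma_s).
    Ties are broken by the enumeration order of itertools.permutations."""
    n = len(rankings[0])
    candidates = permutations(range(1, n + 1))
    return min(candidates,
               key=lambda sigma: sum(distance(sigma, s) for s in rankings))


rankings = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
print(consensus_ranking(rankings))  # (1, 2, 3)
```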

SLIDE 24

Kemeny medians

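Particular choice for metric $d$:

◮ Kendall's $\tau$ distance

$$d_\tau(\sigma, \sigma') = \sum_{i<j} \mathbb{I}\{(\sigma(i) - \sigma(j))(\sigma'(i) - \sigma'(j)) < 0\}.$$

◮ Kemeny medians are solutions of: $\min_{\sigma \in \mathfrak{S}_n} \mathbb{E}_{\Sigma \sim P}[d_\tau(\Sigma, \sigma)]$.

Unique Kemeny median $\sigma^*_P$ if $P$ is strictly stochastically transitive:

◮ $p_{i,j} \ge 1/2$ and $p_{j,k} \ge 1/2 \Rightarrow p_{i,k} \ge 1/2$
◮ $p_{i,j} \ne 1/2$ for all $i < j$
◮ given by the Copeland ranking

$$\sigma^*_P(i) = 1 + \sum_{j \ne i} \mathbb{I}\{p_{i,j} < 1/2\}.$$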
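The Copeland formula only needs the pairwise matrix $p_{i,j}$. The following sketch computes $p_{i,j}$, the Copeland ranking and, for comparison, a brute-force Kemeny median. Its conventions are assumptions of ours: items are labeled 0, …, n−1, $P$ is a dictionary from rank tuples to probabilities, and all helper names are hypothetical. The toy $P$ is strictly stochastically transitive ($p_{0,1} = 0.7$, $p_{1,2} = 0.8$, $p_{0,2} = 1$), so the two rankings should coincide.

```python
from itertools import permutations


def pairwise_probabilities(P, n):
    """p[i][j] = P(Sigma(i) < Sigma(j)) for P given as {rank tuple: mass}."""
    p = [[0.0] * n for _ in range(n)]
    for sigma, mass in P.items():
        for i in range(n):
            for j in range(n):
                if i != j and sigma[i] < sigma[j]:
                    p[i][j] += mass
    return p


def copeland_ranking(p):
    """sigma*(i) = 1 + #{j != i : p[i][j] < 1/2}."""
    n = len(p)
    return tuple(1 + sum(1 for j in range(n) if j != i and p[i][j] < 0.5)
                 for i in range(n))


def kendall_tau(sigma, tau):
    """Number of discordant pairs between two rank tuples."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0)


def kemeny_median_brute_force(P, n):
    """argmin over S_n of E_{Sigma ~ P}[d_tau(Sigma, sigma)] (small n only)."""
    return min(permutations(range(1, n + 1)),
               key=lambda sigma: sum(m * kendall_tau(s, sigma) for s, m in P.items()))


P = {(1, 2, 3): 0.5, (2, 1, 3): 0.3, (1, 3, 2): 0.2}
p = pairwise_probabilities(P, 3)
print(copeland_ranking(p))             # (1, 2, 3)
print(kemeny_median_brute_force(P, 3))  # (1, 2, 3)
```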

SLIDE 28

Bucket orders of size n

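Consensus ranking: extreme case of a bucket order $\mathcal{C}$ of size $n$.

◮ $\mathcal{C} = (\{\sigma^{*-1}(1)\}, \dots, \{\sigma^{*-1}(n)\})$
◮ $\mathbf{P}_{\mathcal{C}} = \{\delta_{\sigma^*}\}$, hence dimension $d_{\mathcal{C}} = 0$

Problem: generalization to any bucket order.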

SLIDE 32

A Mass Transportation Approach

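◮ Question: "How to quantify the approximation error between the original distribution $P$ and a bucket distribution $P' \in \mathbf{P}_{\mathcal{C}}$?"

◮ Our answer: the Wasserstein distance $W_{d,q}(P, P')$.

Definition

$$W_{d,q}(P, P') = \inf_{\Sigma \sim P,\ \Sigma' \sim P'} \mathbb{E}\left[d^q(\Sigma, \Sigma')\right],$$

where the infimum is taken over all couplings $(\Sigma, \Sigma')$ with marginals $P$ and $P'$.

◮ Why: because it generalizes consensus ranking. Indeed:

$$W_{d,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)].$$

◮ Focus on $d = d_\tau$ and $q = 1$.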
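The identity $W_{d,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)]$ holds because a Dirac mass admits only one coupling:

```latex
\begin{aligned}
\Sigma' \sim \delta_\sigma
  &\implies \Sigma' = \sigma \ \text{a.s., for every coupling } (\Sigma, \Sigma') \text{ of } P \text{ and } \delta_\sigma \\
  &\implies \mathbb{E}\big[d(\Sigma, \Sigma')\big] = \mathbb{E}_{\Sigma \sim P}\big[d(\Sigma, \sigma)\big] \ \text{for every such coupling} \\
  &\implies W_{d,1}(P, \delta_\sigma)
     = \inf_{\Sigma \sim P,\ \Sigma' \sim \delta_\sigma} \mathbb{E}\big[d(\Sigma, \Sigma')\big]
     = \mathbb{E}_{\Sigma \sim P}\big[d(\Sigma, \sigma)\big].
\end{aligned}
```

Minimizing over $\sigma \in \mathfrak{S}_n$ therefore gives back the consensus ranking problem.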

SLIDE 34

Distortion measure

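A bucket order $\mathcal{C}$ represents $P$ well if its distortion $\Lambda_P(\mathcal{C})$ is small.

Definition

$$\Lambda_P(\mathcal{C}) = \min_{P' \in \mathbf{P}_{\mathcal{C}}} W_{d_\tau,1}(P, P')$$

Explicit expression for $\Lambda_P(\mathcal{C})$:

Proposition

$$\Lambda_P(\mathcal{C}) = \sum_{1 \le k < l \le K} \ \sum_{(i,j) \in \mathcal{C}_k \times \mathcal{C}_l} p_{j,i}.$$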
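The proposition turns the distortion into a sum of pairwise probabilities: each term $p_{j,i}$ is the probability that $P$ ranks $j$ before $i$ although $\mathcal{C}$ places $i$ in an earlier bucket. A sketch, assuming a 0-indexed matrix `p` with `p[i][j]` $= p_{i,j}$ (the function name and the toy matrix are ours, for illustration only):

```python
def distortion(p, buckets):
    """Lambda_P(C): sum of p[j][i] over i in C_k, j in C_l, k < l.

    p[j][i] is the probability that j is ranked before i, i.e. that P
    disagrees with the bucket order on the pair (i, j)."""
    return sum(p[j][i]
               for k, earlier in enumerate(buckets)
               for later in buckets[k + 1:]
               for i in earlier
               for j in later)


# Toy pairwise matrix: p[i][j] = P(Sigma(i) < Sigma(j)), p[i][j] + p[j][i] = 1 for i != j
p = [[0.0, 0.7, 1.0],
     [0.3, 0.0, 0.8],
     [0.0, 0.2, 0.0]]
print(distortion(p, [{0}, {1}, {2}]))  # 0.5 (singletons)
print(distortion(p, [{0}, {1, 2}]))    # 0.3
print(distortion(p, [{0, 1, 2}]))      # 0 (a single bucket)
```

With singleton buckets, $\Lambda_P(\mathcal{C})$ is the expected Kendall distance to the corresponding ranking, consistent with $W_{d_\tau,1}(P, \delta_\sigma)$. Merging two buckets drops the cross terms between them, so distortion can only decrease as $\mathcal{C}$ gets coarser, while $d_{\mathcal{C}}$ grows: this is the trade-off between distortion and dimension.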

SLIDE 35

Outline

◮ Introduction
◮ Dimensionality Reduction on $\mathfrak{S}_n$
◮ Empirical Distortion Minimization
◮ Numerical Experiments on a Real-world Dataset

SLIDE 37

Empirical setting

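Training sample: $\Sigma_1, \dots, \Sigma_N$ i.i.d. from $P$.

◮ Empirical pairwise probabilities:

$$\hat{p}_{i,j} = \frac{1}{N} \sum_{s=1}^{N} \mathbb{I}\{\Sigma_s(i) < \Sigma_s(j)\}.$$

◮ Empirical distortion of any bucket order $\mathcal{C}$:

$$\hat{\Lambda}_N(\mathcal{C}) = \sum_{i \prec_{\mathcal{C}} j} \hat{p}_{j,i} = \Lambda_{\hat{P}_N}(\mathcal{C}),$$

where $\hat{P}_N$ is the empirical distribution of the sample.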
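Both empirical quantities are plug-in versions of the population ones. A minimal sketch, assuming the sample is a list of rank tuples with 0-indexed items (function names hypothetical); counts are accumulated as integers and divided by $N$ once, to avoid float drift:

```python
def empirical_pairwise(sample):
    """hat p[i][j] = (1/N) * #{s : Sigma_s(i) < Sigma_s(j)}."""
    n, N = len(sample[0]), len(sample)
    counts = [[0] * n for _ in range(n)]
    for sigma in sample:
        for i in range(n):
            for j in range(n):
                if i != j and sigma[i] < sigma[j]:
                    counts[i][j] += 1
    return [[c / N for c in row] for row in counts]


def empirical_distortion(sample, buckets):
    """hat Lambda_N(C): sum of hat p[j][i] over all pairs with i ranked lower than j in C."""
    p_hat = empirical_pairwise(sample)
    return sum(p_hat[j][i]
               for k, earlier in enumerate(buckets)
               for later in buckets[k + 1:]
               for i in earlier
               for j in later)


sample = [(1, 2, 3), (1, 2, 3), (2, 1, 3), (1, 3, 2)]
print(empirical_pairwise(sample)[1][0])              # 0.25
print(empirical_distortion(sample, [{0}, {1}, {2}]))  # 0.5
print(empirical_distortion(sample, [{0}, {1, 2}]))    # 0.25
```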

SLIDE 38

Rate bound

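Empirical distortion minimizer $\hat{\mathcal{C}}_{K,\lambda}$ is a solution of:

$$\min_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \hat{\Lambda}_N(\mathcal{C}),$$

where $\mathbf{C}_{K,\lambda}$ is the set of bucket orders $\mathcal{C}$ of size $K$ and shape $\lambda$ (i.e. $\#\mathcal{C}_k = \lambda_k$ for all $1 \le k \le K$).

Theorem

For all $\delta \in (0, 1)$, we have with probability at least $1 - \delta$:

$$\Lambda_P(\hat{\mathcal{C}}_{K,\lambda}) - \inf_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \Lambda_P(\mathcal{C}) \le \beta(n, \lambda) \times \sqrt{\frac{\log(1/\delta)}{N}}.$$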
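For small $n$, the empirical distortion minimizer can be found by enumerating $\mathbf{C}_{K,\lambda}$, which contains $n! / \prod_k \lambda_k!$ bucket orders. The sketch below does exactly that; it is an illustrative brute force under our own conventions (0-indexed items, a precomputed matrix `p_hat`, hypothetical helper names), not necessarily the procedure used by the authors.

```python
from itertools import combinations


def distortion(p, buckets):
    """Sum of p[j][i] over i in C_k, j in C_l, k < l."""
    return sum(p[j][i]
               for k, earlier in enumerate(buckets)
               for later in buckets[k + 1:]
               for i in earlier
               for j in later)


def bucket_orders(items, shape):
    """Yield every bucket order (list of sets) of `items` with the given shape."""
    if not shape:
        yield []
        return
    for first in combinations(sorted(items), shape[0]):
        rest = set(items) - set(first)
        for tail in bucket_orders(rest, shape[1:]):
            yield [set(first)] + tail


def empirical_distortion_minimizer(p_hat, shape):
    """Brute-force argmin of the distortion over all bucket orders of the given shape."""
    n = len(p_hat)
    if sum(shape) != n:
        raise ValueError("shape must sum to the number of items")
    return min(bucket_orders(set(range(n)), tuple(shape)),
               key=lambda buckets: distortion(p_hat, buckets))


p_hat = [[0.0, 0.7, 1.0],
         [0.3, 0.0, 0.8],
         [0.0, 0.2, 0.0]]
print(empirical_distortion_minimizer(p_hat, (1, 2)))  # [{0}, {1, 2}]
print(empirical_distortion_minimizer(p_hat, (2, 1)))  # [{0, 1}, {2}]
```

With $n = 10$ and $\lambda = (3, 3, 4)$ this means $10!/(3!\,3!\,4!) = 4200$ candidates, which is still cheap; the count grows quickly with $n$.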

SLIDE 40

The Strongly Stochastically Transitive Case

Assume that $P$ is strongly (and strictly) stochastically transitive, i.e.: $p_{i,j} \ge 1/2$ and $p_{j,k} \ge 1/2 \Rightarrow p_{i,k} \ge \max(p_{i,j}, p_{j,k})$.

Theorem

(i) $\Lambda_P(\mathcal{C})$ has a unique minimizer over $\mathbf{C}_{K,\lambda}$, denoted $\mathcal{C}^{*(K,\lambda)}$.
(ii) $\mathcal{C}^{*(K,\lambda)}$ is the unique bucket order in $\mathbf{C}_{K,\lambda}$ agreeing with the Kemeny median.
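Part (ii) means the optimum can be read off the Kemeny median: the $\lambda_1$ top-ranked items form $\mathcal{C}_1$, the next $\lambda_2$ form $\mathcal{C}_2$, and so on. A sketch, assuming "agreeing" means $i \prec_{\mathcal{C}} j \Rightarrow \sigma^*(i) < \sigma^*(j)$ and that the median is given as a tuple of ranks over 0-indexed items (the function name is hypothetical):

```python
from itertools import accumulate


def bucket_order_from_median(median, shape):
    """Bucket order of the given shape that agrees with the ranking `median`
    (median[i] is the rank of item i): cut the ranking into consecutive blocks."""
    order = sorted(range(len(median)), key=lambda i: median[i])  # items by increasing rank
    cuts = [0] + list(accumulate(shape))                         # block boundaries
    return [set(order[a:b]) for a, b in zip(cuts, cuts[1:])]


print(bucket_order_from_median((2, 3, 1), (1, 2)))  # [{2}, {0, 1}]
print(bucket_order_from_median((2, 3, 1), (2, 1)))  # [{0, 2}, {1}]
```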

SLIDE 41
SLIDE 42

Consequence: agglomerative algorithm.
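The slides do not detail the agglomerative algorithm. The sketch below is one plausible greedy reading, and its merge rule is our assumption: start from the size-$n$ bucket order given by a (Kemeny or Copeland) median, then repeatedly merge the two adjacent buckets whose merge yields the lowest distortion, recording every intermediate bucket order until $K$ buckets remain. Names are hypothetical; items are 0-indexed.

```python
def distortion(p, buckets):
    """Sum of p[j][i] over i in C_k, j in C_l, k < l."""
    return sum(p[j][i]
               for k, earlier in enumerate(buckets)
               for later in buckets[k + 1:]
               for i in earlier
               for j in later)


def merged(buckets, k):
    """Copy of `buckets` with C_k and C_{k+1} merged into one bucket."""
    return buckets[:k] + [buckets[k] | buckets[k + 1]] + buckets[k + 2:]


def agglomerate(median, p, K):
    """Hypothetical greedy sketch: from singletons ordered by `median`, merge
    adjacent buckets (assumed criterion: lowest resulting distortion) down to K buckets."""
    buckets = [{i} for i in sorted(range(len(median)), key=lambda i: median[i])]
    path = [buckets]
    while len(buckets) > K:
        best = min(range(len(buckets) - 1),
                   key=lambda k: distortion(p, merged(buckets, k)))
        buckets = merged(buckets, best)
        path.append(buckets)
    return path


p = [[0.0, 0.7, 1.0],
     [0.3, 0.0, 0.8],
     [0.0, 0.2, 0.0]]
for bucket_order in agglomerate((1, 2, 3), p, 1):
    print(bucket_order)  # [{0}, {1}, {2}] then [{0, 1}, {2}] then [{0, 1, 2}]
```

Since each merge only removes cross terms, the recorded path has non-increasing distortion and strictly increasing dimension $d_{\mathcal{C}}$, giving one bucket order per size on a distortion/dimension curve.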

Outline

◮ Introduction
◮ Dimensionality Reduction on $\mathfrak{S}_n$
◮ Empirical Distortion Minimization
◮ Numerical Experiments on a Real-world Dataset

SLIDE 43

Experiments

Sushi dataset (Kamishima, 2003):

◮ $n = 10$ sushi dishes
◮ $N = 5000$ full rankings

[Figure: "sushi dataset", distortion vs. dimension (dimension axis from $10^1$ to $10^4$), legend: $K = 3, \dots, 8$]

SLIDE 44

Thank you!