

  1. Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach. Mastane Achab, Anna Korba, Stephan Clémençon. DA2PL'2018, Poznan, Poland.

  2–3. Outline: Introduction; Dimensionality Reduction on $S_n$; Empirical Distortion Minimization; Numerical Experiments on a Real-world Dataset.

  4–6. Introduction (1/2)
  - Permutations over $n$ items $\llbracket n \rrbracket = \{1, \dots, n\}$
  - Number of permutations explodes: $\# S_n = n!$
  - Distribution $P$ on $S_n$: $n! - 1$ parameters

  7–9. Introduction (2/2)
  - Question: "How to summarize $P$?"
  - Answer: dimensionality reduction
  - Problem: no vector space structure for permutations


  11–14. Preliminaries (1/2)
  Bucket order $\mathcal{C} = (\mathcal{C}_1, \dots, \mathcal{C}_K)$: ordered partition of $\llbracket n \rrbracket$
  - the $\mathcal{C}_k$'s are disjoint non-empty subsets of $\llbracket n \rrbracket$
  - $\cup_{k=1}^{K} \mathcal{C}_k = \llbracket n \rrbracket$
  - $K$: "size" of $\mathcal{C}$
  - $(\#\mathcal{C}_1, \dots, \#\mathcal{C}_K)$: "shape" of $\mathcal{C}$
  Partial order: "$i$ is ranked lower than $j$ in $\mathcal{C}$" if $\exists\, k < l$ s.t. $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$.
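To make these definitions concrete, here is a minimal Python sketch (not from the slides; the toy bucket order and function names are illustrative): a bucket order stored as an ordered list of item sets, its shape, and the induced partial order.

```python
# Minimal sketch: a bucket order C = (C_1, ..., C_K) on items {1, ..., n}
# represented as an ordered list of sets (here n = 5, K = 3).
C = [{1, 4}, {2}, {3, 5}]

def shape(C):
    """Shape of the bucket order: (#C_1, ..., #C_K)."""
    return tuple(len(bucket) for bucket in C)

def ranked_lower(C, i, j):
    """True if i is ranked lower than j in C, i.e. i lies in C_k and j in C_l
    with k < l (the partial order induced by the bucket order)."""
    pos = {item: k for k, bucket in enumerate(C) for item in bucket}
    return pos[i] < pos[j]

print(shape(C))               # (2, 1, 2)
print(ranked_lower(C, 1, 3))  # True:  1 is in C_1, 3 is in C_3
print(ranked_lower(C, 3, 2))  # False: 3 is in C_3, 2 is in C_2
```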

  15–17. Preliminaries (2/2)
  $\mathcal{P}_{\mathcal{C}}$: set of all bucket distributions $P'$ associated to $\mathcal{C}$
  - $P'$ is a distribution on $S_n$
  - writing $p'_{i,j} = \mathbb{P}(\Sigma'(i) < \Sigma'(j))$ for $\Sigma' \sim P'$: if $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$ with $k < l$, then $p'_{j,i} = 0$
  - $P' \in \mathcal{P}_{\mathcal{C}}$ is described by $d_{\mathcal{C}} = \prod_{k \le K} \#\mathcal{C}_k! - 1 \le n! - 1$ parameters
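A quick numerical illustration of the dimensionality reduction. This is a sketch under the reading that $d_{\mathcal{C}} = \prod_{k \le K} \#\mathcal{C}_k! - 1$, i.e. the number of permutations consistent with $\mathcal{C}$ minus one; the shape below is an arbitrary example.

```python
from math import factorial

# Compare the number of parameters of a bucket distribution (assumed to be
# prod_k (#C_k)! - 1) with the n! - 1 parameters of a full distribution on S_n.
n = 10
bucket_shape = (3, 3, 4)        # example shape with K = 3 buckets

d_C = 1
for size in bucket_shape:
    d_C *= factorial(size)
d_C -= 1

d_full = factorial(n) - 1
print(d_C, d_full)              # 863 vs 3628799
```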

  18–19. Background on Consensus Ranking
  Consensus ranking (or "ranking aggregation"): summarize permutations $\sigma_1, \dots, \sigma_N$ by a consensus/median ranking $\sigma^* \in S_n$ obtained by solving:
  $$\min_{\sigma \in S_n} \sum_{s=1}^{N} d(\sigma, \sigma_s).$$
  If $\Sigma_1, \dots, \Sigma_N$ are i.i.d. sampled from $P$ (Korba et al., 2017), solve:
  $$\min_{\sigma \in S_n} \mathbb{E}_{\Sigma \sim P}\left[d(\Sigma, \sigma)\right].$$

  20–24. Kemeny medians
  Particular choice for the metric $d$:
  - Kendall's $\tau$ distance: $d_\tau(\sigma, \sigma') = \sum_{i < j} \mathbb{I}\{(\sigma(i) - \sigma(j))(\sigma'(i) - \sigma'(j)) < 0\}$.
  - Kemeny medians are the solutions of $\min_{\sigma \in S_n} \mathbb{E}_{\Sigma \sim P}[d_\tau(\Sigma, \sigma)]$.
  Unique Kemeny median $\sigma^*_P$ if $P$ is strictly stochastically transitive:
  - $p_{i,j} \ge 1/2$ and $p_{j,k} \ge 1/2 \ \Rightarrow\ p_{i,k} \ge 1/2$
  - $p_{i,j} \ne 1/2$ for all $i < j$
  - it is then given by the Copeland ranking $\sigma^*_P(i) = 1 + \sum_{j \ne i} \mathbb{I}\{p_{i,j} < 1/2\}$ (see the sketch below).
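The Python sketch below illustrates the objects on this slide: Kendall's $\tau$ distance, a brute-force Kemeny median (tractable only for very small $n$), and the Copeland ranking computed from a pairwise probability matrix. The function names and toy data are illustrative, not taken from the paper.

```python
import numpy as np
from itertools import permutations, combinations

def kendall_tau(sigma, sigma_p):
    """Kendall's tau distance: number of discordant pairs.
    sigma[i] is the rank given to item i."""
    n = len(sigma)
    return sum((sigma[i] - sigma[j]) * (sigma_p[i] - sigma_p[j]) < 0
               for i, j in combinations(range(n), 2))

def kemeny_median_bruteforce(rankings):
    """Brute-force Kemeny median: argmin over sigma in S_n of
    sum_s d_tau(sigma, sigma_s).  Only feasible for very small n."""
    n = len(rankings[0])
    candidates = list(permutations(range(1, n + 1)))
    costs = [sum(kendall_tau(c, r) for r in rankings) for c in candidates]
    return candidates[int(np.argmin(costs))]

def copeland_ranking(p):
    """Copeland ranking from a pairwise matrix p with p[i, j] = P(Sigma(i) < Sigma(j)):
    sigma*(i) = 1 + #{j != i : p[i, j] < 1/2}."""
    n = p.shape[0]
    return np.array([1 + sum(p[i, j] < 0.5 for j in range(n) if j != i)
                     for i in range(n)])

# Toy usage with n = 3 items; each ranking lists the rank of each item.
rankings = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
print(kemeny_median_bruteforce(rankings))             # (1, 2, 3)
print(copeland_ranking(np.array([[0.0, 0.7, 0.8],
                                 [0.3, 0.0, 0.6],
                                 [0.2, 0.4, 0.0]])))  # [1 2 3]
```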

  25–28. Bucket orders of size n
  Consensus ranking: extreme case of a bucket order $\mathcal{C}$ of size $n$.
  - $\mathcal{C} = \left(\{\sigma^{*-1}(1)\}, \dots, \{\sigma^{*-1}(n)\}\right)$
  - $\mathcal{P}_{\mathcal{C}} = \{\delta_{\sigma^*}\}$, hence dimension $d_{\mathcal{C}} = 0$
  Problem: generalization to any bucket order.

  29–32. A Mass Transportation Approach
  - Question: "How to quantify the approximation error between the original distribution $P$ and a bucket distribution $P' \in \mathcal{P}_{\mathcal{C}}$?"
  - Our answer: the Wasserstein distance $W_{d,q}(P, P')$.
  Definition:
  $$W_{d,q}(P, P') = \inf_{\Sigma \sim P,\ \Sigma' \sim P'} \mathbb{E}\left[d^q(\Sigma, \Sigma')\right]$$
  - Why: because it generalizes consensus ranking. Indeed, $W_{d,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)]$ (see the sketch below).
  - Focus on $d = d_\tau$ and $q = 1$.
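As an illustration of the mass transportation view, for tiny $n$ one can compute $W_{d_\tau,1}$ by solving the transportation linear program directly, e.g. with `scipy.optimize.linprog`. This is a sketch under the assumption that enumerating $S_n$ is acceptable; the last two lines check the identity $W_{d_\tau,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d_\tau(\Sigma, \sigma)]$ on a toy distribution.

```python
import numpy as np
from itertools import permutations, combinations
from scipy.optimize import linprog

def kendall_tau(s, t):
    return sum((s[i] - s[j]) * (t[i] - t[j]) < 0
               for i, j in combinations(range(len(s)), 2))

def wasserstein_kendall(P, Q, n):
    """W_{d_tau,1}(P, Q) for two distributions on S_n given as dicts
    permutation-tuple -> probability.  Solves the transportation LP
    min_gamma sum_{s,t} d_tau(s, t) gamma(s, t) subject to the marginal
    constraints.  Brute force: only for very small n."""
    perms = list(permutations(range(1, n + 1)))
    m = len(perms)
    cost = np.array([kendall_tau(s, t) for s in perms for t in perms])
    A_eq = np.zeros((2 * m, m * m))
    for a in range(m):
        A_eq[a, a * m:(a + 1) * m] = 1.0   # sum over t of gamma(s_a, t) = P(s_a)
        A_eq[m + a, a::m] = 1.0            # sum over s of gamma(s, t_a) = Q(t_a)
    b_eq = np.array([P.get(s, 0.0) for s in perms] +
                    [Q.get(t, 0.0) for t in perms])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Check W_{d_tau,1}(P, delta_sigma) = E_{Sigma ~ P}[d_tau(Sigma, sigma)] for n = 3.
P = {(1, 2, 3): 0.5, (2, 1, 3): 0.3, (3, 2, 1): 0.2}
sigma = (1, 2, 3)
print(wasserstein_kendall(P, {sigma: 1.0}, n=3))                    # 0.9
print(sum(prob * kendall_tau(s, sigma) for s, prob in P.items()))   # 0.9
```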

  33–34. Distortion measure
  A bucket order $\mathcal{C}$ represents $P$ well if the distortion $\Lambda_P(\mathcal{C})$ is small.
  Definition:
  $$\Lambda_P(\mathcal{C}) = \min_{P' \in \mathcal{P}_{\mathcal{C}}} W_{d_\tau, 1}(P, P')$$
  Explicit expression for $\Lambda_P(\mathcal{C})$:
  Proposition:
  $$\Lambda_P(\mathcal{C}) = \sum_{1 \le k < l \le K} \ \sum_{(i,j) \in \mathcal{C}_k \times \mathcal{C}_l} p_{j,i}.$$
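A direct implementation of the proposition (a minimal sketch: items are 0-indexed and the pairwise matrix `p` is a toy example, with `p[i, j]` standing for $p_{i,j} = \mathbb{P}(\Sigma(i) < \Sigma(j))$).

```python
import numpy as np

def distortion(p, C):
    """Distortion Lambda_P(C): sum over bucket pairs k < l and item pairs
    (i, j) in C_k x C_l of p[j, i], i.e. the pairwise probability mass that
    contradicts the bucket order."""
    total = 0.0
    for k in range(len(C)):
        for l in range(k + 1, len(C)):
            for i in C[k]:
                for j in C[l]:
                    total += p[j, i]
    return total

# Toy example with n = 3 items and bucket order C = ({0}, {1, 2}).
p = np.array([[0.0, 0.9, 0.8],
              [0.1, 0.0, 0.6],
              [0.2, 0.4, 0.0]])
print(distortion(p, [{0}, {1, 2}]))   # p[1,0] + p[2,0] = 0.3
```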


  36–37. Empirical setting
  Training sample: $\Sigma_1, \dots, \Sigma_N$ i.i.d. from $P$.
  - Empirical pairwise probabilities: $\widehat{p}_{i,j} = \frac{1}{N} \sum_{s=1}^{N} \mathbb{I}\{\Sigma_s(i) < \Sigma_s(j)\}$.
  - Empirical distortion of any bucket order $\mathcal{C}$: $\widehat{\Lambda}_N(\mathcal{C}) = \sum_{i \prec_{\mathcal{C}} j} \widehat{p}_{j,i} = \Lambda_{\widehat{P}_N}(\mathcal{C})$.
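A minimal sketch of the empirical quantities (toy data, illustrative names): the empirical pairwise probabilities $\widehat{p}_{i,j}$ are computed from a sample of full rankings; plugging the resulting matrix into the distortion function sketched after the previous slide gives $\widehat{\Lambda}_N(\mathcal{C})$.

```python
import numpy as np

def empirical_pairwise(rankings):
    """Empirical pairwise probabilities from a sample of full rankings.
    rankings: array of shape (N, n), where rankings[s, i] is the rank given
    to item i by the s-th observation.
    Returns p_hat with p_hat[i, j] = (1/N) * #{s : Sigma_s(i) < Sigma_s(j)}."""
    rankings = np.asarray(rankings)
    return (rankings[:, :, None] < rankings[:, None, :]).mean(axis=0)

# Toy sample of N = 4 rankings of n = 3 items.
sample = [[1, 2, 3],
          [1, 3, 2],
          [2, 1, 3],
          [1, 2, 3]]
p_hat = empirical_pairwise(sample)
print(p_hat[0, 1])   # 0.75: item 0 is ranked before item 1 in 3 of the 4 rankings
```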

  38. Rate bound
  The empirical distortion minimizer $\widehat{\mathcal{C}}_{K,\lambda}$ is a solution of:
  $$\min_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \widehat{\Lambda}_N(\mathcal{C}),$$
  where $\mathbf{C}_{K,\lambda}$ is the set of bucket orders $\mathcal{C}$ of size $K$ and shape $\lambda$ (i.e. $\#\mathcal{C}_k = \lambda_k$ for all $1 \le k \le K$).
  Theorem: For all $\delta \in (0, 1)$, we have with probability at least $1 - \delta$:
  $$\Lambda_P(\widehat{\mathcal{C}}_{K,\lambda}) - \inf_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \Lambda_P(\mathcal{C}) \le \beta(n, \lambda) \times \sqrt{\frac{\log(1/\delta)}{N}}.$$
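For very small $n$, the empirical distortion minimizer $\widehat{\mathcal{C}}_{K,\lambda}$ can be found by exhaustive search over $\mathbf{C}_{K,\lambda}$. The sketch below (an illustration, not the authors' implementation) enumerates all bucket orders of a given shape and returns the one with the smallest empirical distortion; it re-implements the distortion sum so the snippet is self-contained.

```python
import numpy as np
from itertools import combinations

def bucket_orders_of_shape(items, shape):
    """Enumerate all bucket orders of the given shape over `items`
    (brute force, only feasible for small n)."""
    if not shape:
        yield []
        return
    items = set(items)
    for first in combinations(sorted(items), shape[0]):
        for rest in bucket_orders_of_shape(items - set(first), shape[1:]):
            yield [set(first)] + rest

def empirical_minimizer(p_hat, shape):
    """Brute-force empirical distortion minimizer over C_{K,lambda}:
    argmin over bucket orders of the given shape of Lambda_hat_N(C)."""
    n = p_hat.shape[0]
    def distortion(C):
        return sum(p_hat[j, i]
                   for k in range(len(C)) for l in range(k + 1, len(C))
                   for i in C[k] for j in C[l])
    return min(bucket_orders_of_shape(range(n), shape), key=distortion)

# Toy usage with an (assumed) empirical pairwise matrix for n = 3 items.
p_hat = np.array([[0.0, 0.9, 0.8],
                  [0.1, 0.0, 0.6],
                  [0.2, 0.4, 0.0]])
print(empirical_minimizer(p_hat, shape=(1, 2)))   # [{0}, {1, 2}]
```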

  39–40. The Strong Stochastic Transitive Case
  Assume that $P$ is strongly (and strictly) stochastically transitive, i.e.:
  $$p_{i,j} \ge 1/2 \ \text{and}\ p_{j,k} \ge 1/2 \ \Rightarrow\ p_{i,k} \ge \max(p_{i,j}, p_{j,k}).$$
  Theorem:
  (i) $\Lambda_P(\mathcal{C})$ has a unique minimizer over $\mathbf{C}_{K,\lambda}$, denoted $\mathcal{C}^*_{K,\lambda}$.
  (ii) $\mathcal{C}^*_{K,\lambda}$ is the unique bucket order in $\mathbf{C}_{K,\lambda}$ agreeing with the Kemeny median.

  41. Consequence: an agglomerative algorithm (see the sketch below).
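One possible reading of this consequence, sketched below under the SST assumption (an illustration based on theorem (ii) above, not necessarily the authors' exact agglomerative procedure): sort the items by their Copeland rank, which recovers the Kemeny median, and cut the sorted list into consecutive buckets of the prescribed shape.

```python
import numpy as np

def optimal_buckets_under_sst(p, shape):
    """Under strong/strict stochastic transitivity, the distortion-optimal
    bucket order of a given shape agrees with the Kemeny median: sort items by
    their Copeland rank sigma*(i) = 1 + #{j != i : p[i, j] < 1/2} and cut the
    sorted list into consecutive blocks of sizes shape[0], ..., shape[K-1]."""
    n = p.shape[0]
    copeland = [1 + sum(p[i, j] < 0.5 for j in range(n) if j != i)
                for i in range(n)]
    order = sorted(range(n), key=lambda i: copeland[i])   # best-ranked items first
    buckets, start = [], 0
    for size in shape:
        buckets.append(set(order[start:start + size]))
        start += size
    return buckets

# Toy SST pairwise matrix (items 0, 1, 2, 3 in decreasing strength).
p = np.array([[0.0, 0.8, 0.9, 0.9],
              [0.2, 0.0, 0.7, 0.8],
              [0.1, 0.3, 0.0, 0.6],
              [0.1, 0.2, 0.4, 0.0]])
print(optimal_buckets_under_sst(p, shape=(1, 2, 1)))   # [{0}, {1, 2}, {3}]
```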

  42. Experiments
  Sushi dataset (Kamishima, 2003):
  - $n = 10$ sushi dishes
  - $N = 5000$ full rankings
  [Figure: "sushi dataset"; dimension $d_{\mathcal{C}}$ (log scale, $10^1$ to $10^4$) against distortion (0 to 20), for bucket orders of size $K = 3, \dots, 8$.]

  43. Thank you!
