Intuitive Parameterization of Distance-Based Clustering Techniques
Altobelli de Brito Mantuan amantuan@ic.uff.br Leandro A. F. Fernandes laffernandes@ic.uff.br
Many Faces of Distances - Campinas, Brazil - 2014
Conventional Pipeline
Input Incidence Matrix → Apriori → Mined Rules
Input Incidence Matrix: rows are transactions T1, T2, ..., TN; columns are items A, B, C, D, E, F; a 1 marks an item that occurs in the transaction.
Mined Rules: {A, C, F} → {B}, {A, D} → {F}
Transaction ID  Milk  Bread  Butter  Beer
T1  1 1
T2  1
T3  1
T4  1 1 1
T5  1
Association Rule: {Butter, Bread} → {Milk}
Support = 20%, Confidence = 50%
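The support and confidence figures can be checked with a few lines of Python. This is a minimal sketch: the column positions of the 1s did not survive in the table, so the item placement below is an assumption, chosen only to agree with the number of items per transaction and with the quoted 20% and 50%. The function names are illustrative.

```python
# Hypothetical item placement (assumed, not taken from the slide): it matches
# the per-transaction item counts and the quoted support and confidence.
transactions = {
    "T1": {"Bread", "Butter"},
    "T2": {"Beer"},
    "T3": {"Milk"},
    "T4": {"Milk", "Bread", "Butter"},
    "T5": {"Bread"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"Butter", "Bread", "Milk"}, transactions)
rule_confidence = confidence({"Butter", "Bread"}, {"Milk"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Only T4 contains all three items (1 of 5 transactions), while T1 and T4 both contain the antecedent, which gives the 50% confidence.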
Input Incidence Matrix → Dual Scaling → Clustering & Pruning → Apriori → Mined Rules
Dual Scaling maps the incidence matrix into the Response Style Space.
Nishisato, S. "On quantifying different types of categorical data". Psychometrika, 58(4), pp. 617-629, 1993.
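As a rough illustration of how dual scaling can place transactions and items in one space, the sketch below uses the classical reciprocal-averaging iteration on a small fictitious incidence matrix. It is not the authors' implementation. The matrix, the function name, the normalization, and the restriction to the first non-trivial dimension are all assumptions, and every transaction and item is assumed to be non-empty.

```python
import math


def dual_scaling_first_solution(G, iterations=200):
    """First non-trivial dual scaling solution by reciprocal averaging.

    G is a 0/1 incidence matrix (rows: transactions, columns: items).
    Returns (transaction scores, item weights, eigenvalue estimate).
    """
    n_rows, n_cols = len(G), len(G[0])
    row_tot = [sum(row) for row in G]
    col_tot = [sum(G[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_tot)

    def normalize(y):
        # Remove the trivial (constant) solution, then fix the scale.
        mean = sum(c * v for c, v in zip(col_tot, y)) / total
        y = [v - mean for v in y]
        norm = math.sqrt(sum(c * v * v for c, v in zip(col_tot, y)) / total)
        return [v / norm for v in y]

    def transaction_scores(y):
        # Each transaction sits at the average of the items it contains.
        return [sum(G[i][j] * y[j] for j in range(n_cols)) / row_tot[i]
                for i in range(n_rows)]

    y = normalize([float(j) for j in range(n_cols)])  # arbitrary start
    for _ in range(iterations):
        x = transaction_scores(y)
        # Each item moves to the average of the transactions containing it.
        y = normalize([sum(G[i][j] * x[i] for i in range(n_rows)) / col_tot[j]
                       for j in range(n_cols)])

    x = transaction_scores(y)
    # Rayleigh quotient (y' G' Dr^-1 G y) / (y' Dc y), with Dr and Dc the
    # diagonal matrices of row and column totals.
    numerator = sum(row_tot[i] * x[i] ** 2 for i in range(n_rows))
    denominator = sum(c * v * v for c, v in zip(col_tot, y))
    return x, y, numerator / denominator


# Fictitious data: items A, B, C, D; the last transaction links the two groups.
G = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
x, y, eigenvalue = dual_scaling_first_solution(G)
print("item weights:", [round(v, 3) for v in y])
print("transaction scores:", [round(v, 3) for v in x])
print("eigenvalue estimate:", round(eigenvalue, 4))
```

In this toy matrix, items A and B end up on one side of the axis and C and D on the other, while the transaction {B, C} lands at the origin between the two groups.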
A space where transactions and items are represented as points
[Figure: mapped transactions and mapped items, with the points T, A, and E highlighted]
** Fictitious data
Transaction T is more related to (i.e., prefers) item A than to item E
A context emerges from the existence of groups of items having similar preferences
A set of transactions with similar preferences
[Figure legend: mapped transactions, mapped items]
Elements in the same context are likely to be part of significant itemsets
** Fictitious data
Items: Euclidean distance in response style space
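A minimal sketch of this distance, using fictitious 2-D coordinates for three mapped items. The coordinates and the helper name are assumptions made for illustration, and the real space may have more dimensions.

```python
import math
from itertools import combinations

# Fictitious coordinates of mapped items (assumed values).
items = {"A": (0.0, 0.0), "B": (3.0, 4.0), "E": (6.0, 8.0)}


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of named points."""
    return {
        (p, q): math.dist(points[p], points[q])
        for p, q in combinations(sorted(points), 2)
    }


distances = pairwise_distances(items)
for (p, q), d in distances.items():
    print(f"d({p}, {q}) = {d:.1f}")
```

A distance-based clustering step would then group the items whose pairwise distances are small.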
(…$^{2}$) of each …
$\ldots^{2} = \dfrac{y_j^{\top} G^{\top} E_s^{-1} G \, y_j}{y_j^{\top} E_d \, y_j}$
…$^{l}$ is produced using
$\ldots^{l} = Y \, G^{-1} \, t_j^{l}$, for $Y = y_1, y_2, \cdots, y_n$
Input Incidence Matrix 𝐺 (rows: transactions T1, T2, ..., TN; columns: items A, B, C, D, E, F)
[Figure: mapped samples and mapped items. Item labels: Company: 1, Company: 2, Company: 3, Company: 4; Shift: Morn., Shift: After., Shift: Night; Oil Loss: Yes, Oil Loss: No; Gas Loss: Yes, Gas Loss: No; Daylight Sav.: Yes, Daylight Sav.: No]
The samples characterize symmetric distributions around the original item in response style space
** Fictitious data
Items: complementary Bhattacharyya distance between distributions
($J_Y$: Jacobian matrix of $g$, $\Sigma$: covariance matrix)
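The sketch below shows the two standard ingredients named here, under stated assumptions: first-order covariance propagation through a Jacobian ($J \Sigma J^{\top}$), and the Bhattacharyya distance between two Gaussian distributions. Treating the item distributions as 2-D Gaussians is an assumption, the Jacobian values are hypothetical, and the "complementary" variant is not defined in the surviving text, so only the standard distance and coefficient are computed.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def propagate_covariance(jacobian, cov):
    """First-order propagation: J * cov * J^T."""
    jt = [list(col) for col in zip(*jacobian)]
    return matmul(matmul(jacobian, cov), jt)


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two 2-D Gaussians."""
    avg = [[(cov1[i][j] + cov2[i][j]) / 2 for j in range(2)] for i in range(2)]
    inv = inv2(avg)
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    mahalanobis = sum(diff[i] * inv[i][j] * diff[j]
                      for i in range(2) for j in range(2))
    log_term = 0.5 * math.log(det2(avg) / math.sqrt(det2(cov1) * det2(cov2)))
    return mahalanobis / 8 + log_term


identity = [[1.0, 0.0], [0.0, 1.0]]
hypothetical_jacobian = [[2.0, 0.0], [0.0, 1.0]]  # assumed values
propagated = propagate_covariance(hypothetical_jacobian, identity)

d_same = bhattacharyya_distance((0.0, 0.0), identity, (0.0, 0.0), identity)
d_shift = bhattacharyya_distance((0.0, 0.0), identity, (2.0, 0.0), identity)
coefficient = math.exp(-d_shift)  # Bhattacharyya coefficient (overlap in [0, 1])
print(propagated, d_same, d_shift, round(coefficient, 4))
```

Identical distributions give a distance of 0, and the distance grows as the means separate or the covariances differ.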