Descriptive clustering
Christel VRAIN, Thi-Bich-Hanh DAO
LIFO Université d’Orléans
Workshop on Machine Learning and Explainability
Dao - Vrain (LIFO) Descriptive clustering 08/10/2018 1 / 29
Motivation
◮ features cannot explain the clustering well
◮ data also described by another set of (potentially sparse and noisy)
◮ their consistency with human expectations
◮ their explanation to humans
◮ introduced in the 1980s [Michalski & Stepp, 1983; Fisher, 1985]
◮ presently based on closed patterns (FCA and pattern mining)
◮ based on qualitative properties
◮ does not take into account quantitative attributes, nor distance

◮ based on dissimilarities between objects
◮ appropriate for quantitative data
◮ qualitative properties must be encapsulated in a distance
◮ for representing a partition
◮ for breaking symmetries
◮ user constraints: size, diameter, split, ...
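The quantities named in these constraints can be made concrete with a small sketch. It assumes the usual definitions (diameter: largest dissimilarity inside a cluster; split: smallest dissimilarity between two clusters) and a hypothetical label-vector encoding of the partition; the helper names and toy data are illustrative, not taken from the slides.

```python
from itertools import combinations

def canonical(labels):
    """Renumber clusters by order of first appearance.
    One simple way to break the symmetry between equivalent labelings."""
    mapping = {}
    return [mapping.setdefault(c, len(mapping)) for c in labels]

def diameter(labels, d):
    """Largest dissimilarity between two objects of the same cluster."""
    pairs = combinations(range(len(labels)), 2)
    return max((d[i][j] for i, j in pairs if labels[i] == labels[j]), default=0.0)

def split(labels, d):
    """Smallest dissimilarity between two objects of different clusters."""
    pairs = combinations(range(len(labels)), 2)
    return min((d[i][j] for i, j in pairs if labels[i] != labels[j]),
               default=float("inf"))

def cluster_sizes(labels):
    """Number of objects per cluster label."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    return sizes

# Toy data (assumption): four objects on a line, dissimilarity = absolute difference.
points = [0.0, 1.0, 5.0, 6.0]
d = [[abs(a - b) for b in points] for a in points]
labels = [0, 0, 1, 1]

print(canonical([2, 2, 0, 0]), diameter(labels, d), split(labels, d), cluster_sizes(labels))
```

A user constraint such as "diameter at most 2" or "at least 2 objects per cluster" then amounts to a test on these values; a constraint solver enforces the same conditions during search rather than checking them afterwards.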
◮ Characterization
◮ Generalization
◮ Statistics
◮ Clustering and explanations are in the same representation space
◮ Clustering and explanations are in two different representation
◮ a set O of objects, a set I of Boolean properties
◮ a dissimilarity measure d(o, o′) for any o, o′ in O
◮ a binary database D: $D_{op} = 1$ when o satisfies property p
◮ Constraints of the distance-based model: partition, breaking
◮ Constraints from the conceptual model: an object is in a cluster iff it
$\sum_{p \in I} A[c, p]\,(1 - D_{op}) = 0$
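The conceptual constraint can be read as a coverage test: the sum is zero exactly when object o has every property selected in the description of cluster c. A minimal sketch, assuming A is stored as a list of 0/1 rows (one per cluster) and D as a list of 0/1 rows (one per object); the function names and the toy data are illustrative assumptions.

```python
def satisfies(description, properties):
    """True iff sum_p A[c,p] * (1 - D[o,p]) == 0,
    i.e. the object has every property of the description."""
    return sum(a * (1 - x) for a, x in zip(description, properties)) == 0

def is_consistent(labels, A, D):
    """Check the 'iff': object o is in cluster c exactly when it
    satisfies the description of c."""
    return all(
        (labels[o] == c) == satisfies(A[c], D[o])
        for o in range(len(D))
        for c in range(len(A))
    )

# Toy data (assumption): 3 objects, 3 Boolean properties, 2 clusters.
D = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
A = [
    [1, 0, 1],  # description of cluster 0: properties 0 and 2
    [0, 1, 0],  # description of cluster 1: property 1
]

print(is_consistent([0, 0, 1], A, D))  # labels agree with the descriptions
print(is_consistent([0, 1, 1], A, D))  # object 1 matches cluster 0 but is placed in cluster 1
```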
◮ engine (diesel or not)
◮ drive wheels (4, 2 front, 2 rear)
◮ power (between 48 and 288)
◮ etc.
◮ good/compact in one modality (e.g. SIFT features for images or
◮ useful/descriptive in another modality (e.g. tags)
[Figure: criterion space for the two objectives f and g]
◮ X: n × f matrix of n data instances with f numerical features
◮ D: n × r matrix of the same n instances with r tag indicators
◮ cluster indication matrix Z: n × k boolean matrix
◮ cluster description matrix S: k × r boolean matrix
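To make the two Boolean matrices concrete, here is a small sketch that builds Z from a label vector and reads the descriptions off S. The helper names, the tag names and the requirement that no cluster be empty are assumptions for illustration.

```python
def indicator_matrix(labels, k):
    """Z[i][c] = 1 iff instance i is assigned to cluster c (n x k)."""
    return [[1 if labels[i] == c else 0 for c in range(k)]
            for i in range(len(labels))]

def is_partition(Z):
    """Each instance is in exactly one cluster, and (assumption) no cluster is empty."""
    one_cluster_each = all(sum(row) == 1 for row in Z)
    no_empty_cluster = all(any(row[c] for row in Z) for c in range(len(Z[0])))
    return one_cluster_each and no_empty_cluster

def describe(S, tag_names):
    """Turn each row of S (k x r) into the list of tags describing that cluster."""
    return [[tag_names[t] for t, bit in enumerate(row) if bit] for row in S]

# Toy data (assumption): n = 3 instances, k = 2 clusters, r = 3 hypothetical tags.
Z = indicator_matrix([0, 1, 0], k=2)
S = [
    [1, 0, 1],
    [0, 1, 0],
]
tags = ["a", "b", "c"]

print(Z, is_partition(Z), describe(S, tags))
```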
$f(Z, S) = \sum_{i<j,\; i,j=1}^{n} Z_i Z_j^{T}\, d(X_i, X_j)$
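The objective f(Z, S) sums the dissimilarity d(X_i, X_j) over every pair of instances placed in the same cluster, since the product Z_i Z_j^T equals 1 exactly in that case. A minimal sketch, assuming Euclidean distance for d (the slides do not fix the measure) and list-of-rows matrices; as written, the sum depends only on Z.

```python
import math

def compactness(Z, X):
    """f = sum over i < j of (Z_i . Z_j) * d(X_i, X_j), with d Euclidean (assumption)."""
    n = len(Z)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = sum(zi * zj for zi, zj in zip(Z[i], Z[j]))  # Z_i Z_j^T
            total += same_cluster * math.dist(X[i], X[j])
    return total

# Toy data (assumption): three instances with two numerical features.
X = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0)]
Z_close = [[1, 0], [1, 0], [0, 1]]  # groups the two nearby points
Z_far = [[1, 0], [0, 1], [1, 0]]    # groups two distant points

print(compactness(Z_close, X), compactness(Z_far, X))
```

Minimizing this value favors the first assignment, which is the intended reading of a compactness criterion.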
1
◮ minimize α + β
◮ useful when the tags contain noise
2
◮ the tags of a cluster are shared by all its instances (α = β = 0)
◮ maximize the tag set of each cluster (i.e. the size of the smallest one)
◮ use: tags well populated with little noise
3
◮ each pair of instances in the same cluster must share at least q tags
◮ maximize q
◮ use: tags are sparse
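The second and third variants can be evaluated directly on a given clustering. The sketch below computes, for each cluster, the tags shared by all of its instances, and the value q as the smallest number of tags shared by any two instances of the same cluster. It only scores a fixed partition and does not search for one; the function names and toy data are assumptions.

```python
from itertools import combinations

def descriptions(D, labels):
    """For each cluster, the tag indices carried by every one of its instances."""
    result = {}
    for c in sorted(set(labels)):
        rows = [D[i] for i in range(len(D)) if labels[i] == c]
        result[c] = [t for t in range(len(D[0])) if all(row[t] for row in rows)]
    return result

def smallest_description_size(D, labels):
    """Size of the smallest cluster description (the quantity variant 2 maximizes)."""
    return min(len(tags) for tags in descriptions(D, labels).values())

def min_pairwise_shared(D, labels):
    """q: smallest number of tags shared by two instances of the same cluster."""
    counts = [sum(a & b for a, b in zip(D[i], D[j]))
              for i, j in combinations(range(len(D)), 2)
              if labels[i] == labels[j]]
    return min(counts) if counts else None

# Toy data (assumption): 5 instances, 3 tags; cluster 0 has sparse tags.
D = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
]
labels = [0, 0, 0, 1, 1]

print(descriptions(D, labels), smallest_description_size(D, labels),
      min_pairwise_shared(D, labels))
```

In this toy case no tag is common to all of cluster 0, so the second variant yields an empty description, while every pair in that cluster still shares one tag. This illustrates why the pairwise criterion is suggested for sparse tags.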
◮ Smart data
◮ Sampling ??
◮ Relaxing the search for an optimal solution