Cutting the dendrogram through permutation tests


  1. Cutting the dendrogram through permutation tests. Dario Bruzzese (dbruzzes@unina.it), Department of Preventive Medical Sciences, University of Naples, Italy; Domenico Vistocco (vistocco@unicas.it), Department of Economics, University of Cassino, Italy. Compstat 2010.

  2. La Carte: (1) Motivation; (2) The stairstep-like permutation procedure: Notation, The outline; (3) Some results: Real datasets, Synthetic dataset; (4) ToDo List.

  4. Motivation: automatically determine the optimal cut-off level of a dendrogram; explore partitions different from those allowed by a horizontal cut. Example: the rep1HighNoise dataset (n = 200, p = 20) from Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology, 2003, 4:R34. (Figures: dendrogram of the dataset with a horizontal cut, k = 3, and with an alternative cut, k = 3.)

  9. Notation. Let:
     - n be the number of objects to classify;
     - C^k_L and C^k_R be the two classes merged at level k (k = 1, ..., n-1);
     - h(C^k_L ∪ C^k_R) be the height necessary to merge C^k_L and C^k_R;
     - h(C^k_j) be the height at which C^k_j has been obtained (j ∈ {L, R}).
     (Figure: dendrogram highlighting these classes and heights for k = 1, 2, 3.)
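
As a rough illustration of the notation, the sketch below stores a binary dendrogram so that each merge knows its two classes and the three heights involved. The `Node` class, the `merge` helper and the toy heights are hypothetical, introduced only for this example; level 1 is taken to be the top merge, which is an assumption based on the pseudo code starting from level 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """One class of a binary dendrogram (hypothetical representation)."""
    members: Tuple[int, ...]          # indices of the objects in this class
    height: float = 0.0               # h(.): height at which the class was obtained
    left: Optional["Node"] = None     # C_L, None for a single object
    right: Optional["Node"] = None    # C_R, None for a single object


def merge(left: Node, right: Node, height: float) -> Node:
    """Build the class C_L ∪ C_R obtained by merging two classes at `height`."""
    return Node(left.members + right.members, height, left, right)


# Toy dendrogram on n = 4 objects, hence n - 1 = 3 merges.
a, b, c, d = (Node((i,)) for i in range(4))
ab = merge(a, b, 1.0)
cd = merge(c, d, 1.5)
root = merge(ab, cd, 4.0)    # assumed to be level k = 1 (top merge)

print(root.height)           # h(C^1_L ∪ C^1_R)
print(root.left.height)      # h(C^1_L)
print(root.right.height)     # h(C^1_R)
```

The gap between `root.height` and the heights of its two children is what a horizontal cut ignores: it only looks at the merge heights, not at which branch they belong to.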

  26. The algorithm - Pseudo Code
     Input: a dataset and its related dendrogram
     Output: a partition of the dataset
     initialization:
         aggregationLevelsToVisit ← h(C^1_L ∪ C^1_R)
         permClusters ← [ ]
         i ← 1
     repeat
         if C^i_L ≡ C^i_R then
             add C^i_L ∪ C^i_R to permClusters
         else
             add h(C^i_L) and h(C^i_R) to aggregationLevelsToVisit
             sort aggregationLevelsToVisit in descending order
         end
         remove the first element from aggregationLevelsToVisit
         i ← i + 1
     until aggregationLevelsToVisit is empty
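
The traversal in the pseudo code can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tuple layout of a node, the helpers `leaf` and `merge`, and the treatment of a single object as its own cluster are all hypothetical, and the check `C_L ≡ C_R` is passed in as a callable `equivalent`. In the usage part, `close_enough` is a deliberately crude stand-in for the permutation test, which the slides do not spell out.

```python
import heapq
from typing import Callable, List, Optional, Sequence, Tuple

# A dendrogram node as a plain tuple: (height, members, left, right).
# Single objects have height 0.0 and no children. This layout is an
# assumption made for the sketch, not something taken from the slides.
Node = Tuple[float, Tuple[int, ...], Optional[tuple], Optional[tuple]]


def leaf(i: int) -> Node:
    return (0.0, (i,), None, None)


def merge(left: Node, right: Node, height: float) -> Node:
    return (height, left[1] + right[1], left, right)


def cut_dendrogram(
    root: Node,
    equivalent: Callable[[Sequence[int], Sequence[int]], bool],
) -> List[Tuple[int, ...]]:
    """Visit merges from the highest downwards, as in the pseudo code."""
    perm_clusters: List[Tuple[int, ...]] = []
    # heapq is a min-heap, so heights are negated to pop the highest first;
    # the counter breaks ties so that nodes are never compared directly.
    counter = 0
    to_visit = [(-root[0], counter, root)]
    while to_visit:                            # until the list is empty
        _, _, node = heapq.heappop(to_visit)   # remove the first element
        _, members, left, right = node
        if left is None or right is None:
            perm_clusters.append(members)      # assumption: single object = own cluster
        elif equivalent(left[1], right[1]):
            perm_clusters.append(members)      # H0 retained: keep C_L ∪ C_R
        else:
            for child in (left, right):        # H0 rejected: visit both branches
                counter += 1
                heapq.heappush(to_visit, (-child[0], counter, child))
    return perm_clusters


# Usage on toy 1-D data with a stand-in equivalence check.
values = [0.0, 0.1, 5.0, 5.2, 9.0]
ab = merge(leaf(0), leaf(1), 0.1)
cd = merge(leaf(2), leaf(3), 0.2)
abcd = merge(ab, cd, 5.0)
root = merge(abcd, leaf(4), 9.0)


def close_enough(left: Sequence[int], right: Sequence[int]) -> bool:
    # Placeholder for the permutation test: pooled range below a threshold.
    pooled = [values[i] for i in tuple(left) + tuple(right)]
    return max(pooled) - min(pooled) < 1.0


clusters = cut_dendrogram(root, close_enough)
print(sorted(clusters))
```

A heap replaces the "add, then sort in descending order" step of the pseudo code; the visiting order is the same, but each insertion costs logarithmic rather than full-sort time.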

  31. The algorithm - The outline
     Initialization: i ← 0; aggregationLevelsToVisit: h(C^1_L ∪ C^1_R); permClusters: (empty).
     Iteration i ← 1: aggregationLevelsToVisit: h(C^1_L ∪ C^1_R); permClusters: (empty); clusters to compare: C^1_L and C^1_R; H0: C^1_L ≡ C^1_R → reject.
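
The slides shown here do not state which test statistic is used to decide on H0: C_L ≡ C_R. Purely as an illustration of how such a decision can be reached by permutation, the sketch below runs a generic two-sample permutation test on the absolute difference of means of univariate data. The function name, the statistic, the number of permutations and the toy samples are all assumptions, not the authors' procedure.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    left: Sequence[float],
    right: Sequence[float],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Two-sample permutation test on |mean(left) - mean(right)|."""
    rng = random.Random(seed)
    observed = abs(mean(left) - mean(right))
    pooled = list(left) + list(right)
    n_left = len(left)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # random relabelling of the pooled objects
        stat = abs(mean(pooled[:n_left]) - mean(pooled[n_left:]))
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Two well-separated toy classes: H0 would be rejected at the usual levels.
p_far = permutation_p_value([0.0, 0.1, 0.2, 0.3, 0.1], [5.0, 5.1, 5.2, 4.9, 5.3])
# Two identical toy classes: every relabelling is at least as extreme.
p_same = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(p_far < 0.05, p_same)
```

A small p-value corresponds to the "reject" outcome on the slide, in which case the two classes are kept apart and their own merges are examined next.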
