Visualizing hierarchies

Visualizing hierarchies: Unsupervised Learning in Python - PowerPoint PPT Presentation

Visualizing hierarchies. Unsupervised Learning in Python. Benjamin Wilson, Director of Research at lateral.io. Visualizations communicate insight: "t-SNE" creates a 2D map of a dataset (later) ...


  1. Visualizing hierarchies (Unsupervised Learning in Python). Benjamin Wilson, Director of Research at lateral.io

  2. Visualizations communicate insight
     - "t-SNE": creates a 2D map of a dataset (later)
     - "Hierarchical clustering" (this video)

  3. A hierarchy of groups
     - Groups of living things can form a hierarchy
     - Clusters are contained in one another

  4. Eurovision scoring dataset
     - Countries gave scores to songs performed at the Eurovision 2016
     - 2D array of scores
     - Rows are countries, columns are songs
     - Footnote 1: http://www.eurovision.tv/page/results

  5. Hierarchical clustering of voting countries

  6. Hierarchical clustering
     - Every country begins in a separate cluster
     - At each step, the two closest clusters are merged
     - Continue until all countries are in a single cluster
     - This is "agglomerative" hierarchical clustering
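
The merge procedure on slide 6 can be sketched in a few lines. This is a deliberately naive illustration, not how SciPy implements it: the function names `complete_distance` and `agglomerate` and the four toy points are assumptions made for this example, and the cluster distance used is the "complete" one that the deck defines later.

```python
import numpy as np

def complete_distance(a, b):
    """Largest pairwise Euclidean distance between rows of a and rows of b."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).max()

def agglomerate(samples):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [[i] for i in range(len(samples))]  # every sample starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest cluster distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_distance(samples[clusters[i]], samples[clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical toy data: two pairs of nearby points.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5]])
merges = agglomerate(toy)
for left, right, dist in merges:
    print(left, right, round(float(dist), 3))
```

With four samples there are three merges; the merge distances in the log are the heights a dendrogram would draw.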

  7-8. The dendrogram of a hierarchical clustering
     - Read from the bottom up
     - Vertical lines represent clusters

  9-15. Dendrograms, step-by-step (seven consecutive slides with this title and no other text)

  16. Hierarchical clustering with SciPy
      - Given samples (the array of scores), and country_names

          import matplotlib.pyplot as plt
          from scipy.cluster.hierarchy import linkage, dendrogram

          mergings = linkage(samples, method='complete')
          dendrogram(mergings,
                     labels=country_names,
                     leaf_rotation=90,
                     leaf_font_size=6)
          plt.show()
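
The slide code assumes `samples` and `country_names` already exist. A minimal self-contained sketch follows, using invented stand-in data (five made-up "countries" and three "songs", not the real Eurovision scores). It passes `no_plot=True` so that `dendrogram` only returns the layout and needs no display.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical stand-in data: 5 "countries" scoring 3 "songs".
country_names = ["A", "B", "C", "D", "E"]
samples = np.array([
    [12.0, 0.0, 1.0],
    [10.0, 1.0, 0.0],
    [0.0, 12.0, 8.0],
    [1.0, 10.0, 7.0],
    [6.0, 6.0, 6.0],
])

mergings = linkage(samples, method="complete")

# no_plot=True skips drawing and returns the layout, including the leaf order.
layout = dendrogram(mergings, labels=country_names, no_plot=True)
print(layout["ivl"])  # leaf labels in left-to-right order
```

Dropping `no_plot=True` and calling `plt.show()` afterwards, as on the slide, would draw the same tree.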

  17. Let's practice!

  18. Cluster labels in hierarchical clustering. Benjamin Wilson, Director of Research at lateral.io

  19. Cluster labels in hierarchical clustering
      - Not only a visualization tool!
      - Cluster labels at any intermediate stage can be recovered
      - For use in e.g. cross-tabulations

  20. Intermediate clusterings & height on dendrogram
      - E.g. at height 15:
        - Bulgaria, Cyprus, Greece are one cluster
        - Russia and Moldova are another
        - Armenia in a cluster on its own

  21-22. Dendrograms show cluster distances
      - Height on dendrogram = distance between merging clusters
      - E.g. clusters with only Cyprus and Greece had distance approx. 6
      - This new cluster had distance approx. 12 from the cluster with only Bulgaria
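
The heights drawn on a dendrogram are stored in the array that `linkage` returns. As a small sketch with three invented one-dimensional samples (chosen so that the merge distances echo the 6 and 12 quoted on the slide, but otherwise unrelated to the Eurovision data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Three hypothetical 1-D samples at positions 0, 6 and 12.
samples = np.array([[0.0], [6.0], [12.0]])
mergings = linkage(samples, method="complete")

# Each row is one merge:
# [cluster index, cluster index, merge distance, size of the new cluster].
heights = mergings[:, 2]
print(heights)
```

The first merge joins two neighbouring samples at distance 6; the second joins that pair to the remaining sample at distance 12, the largest sample-to-sample distance under complete linkage.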

  23. Intermediate clusterings & height on dendrogram
      - Height on dendrogram specifies max. distance between merging clusters
      - Don't merge clusters further apart than this (e.g. 15)

  24. Distance between clusters
      - Defined by a "linkage method"
      - In "complete" linkage: distance between clusters is max. distance between their samples
      - Specified via method parameter, e.g. linkage(samples, method="complete")
      - Different linkage method, different hierarchical clustering!
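
To make the "complete" definition concrete, the sketch below computes the largest sample-to-sample distance between two invented clusters by hand and compares it with the final merge height SciPy reports. The `"single"` method (closest pair) is included only for contrast; the slides do not discuss it.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.hierarchy import linkage

# Two hypothetical clusters of 2-D samples.
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

pairwise = cdist(cluster_a, cluster_b)  # all sample-to-sample distances
complete_dist = pairwise.max()          # "complete": farthest pair
single_dist = pairwise.min()            # "single": closest pair, for contrast

# The last row of the linkage matrix is the merge of the two clusters.
samples = np.vstack([cluster_a, cluster_b])
final_complete = linkage(samples, method="complete")[-1, 2]
final_single = linkage(samples, method="single")[-1, 2]
print(complete_dist, final_complete, single_dist, final_single)
```

The same four samples give a final merge height of 6 under complete linkage and 3 under single linkage, which is why a different linkage method can produce a different hierarchy.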

  25. Extracting cluster labels
      - Use the fcluster() function
      - Returns a NumPy array of cluster labels

  26. Extracting cluster labels using fcluster

          from scipy.cluster.hierarchy import linkage
          mergings = linkage(samples, method='complete')
          from scipy.cluster.hierarchy import fcluster
          labels = fcluster(mergings, 15, criterion='distance')
          print(labels)

          [ 9 8 11 20 2 1 17 14 ... ]
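
A small runnable sketch of the same call, on invented one-dimensional data with two well-separated groups, shows how the height passed to `fcluster` controls the number of clusters. The thresholds 5 and 25 are arbitrary choices for this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D samples forming two tight groups far apart.
samples = np.array([[0.0], [1.0], [2.0], [20.0], [21.0]])
mergings = linkage(samples, method="complete")

# Cut at height 5: clusters further apart than 5 are not merged.
labels_at_5 = fcluster(mergings, 5, criterion="distance")
# Cut above the final merge height: everything ends up in one cluster.
labels_at_25 = fcluster(mergings, 25, criterion="distance")
print(labels_at_5, labels_at_25)
```

At height 5 the first three samples share one label and the last two share another; at height 25 all five samples share a single label.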

  27. Aligning cluster labels with country names
      - Given a list of strings country_names:

          import pandas as pd
          pairs = pd.DataFrame({'labels': labels, 'countries': country_names})
          print(pairs.sort_values('labels'))

             countries  labels
          5    Belarus       1
          40   Ukraine       1
          ...
          36     Spain       5
          8   Bulgaria       6
          19    Greece       6
          10    Cyprus       6
          28   Moldova       7
          ...
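
Slide 19 mentions cross-tabulations as one use for recovered labels. A minimal sketch with `pandas.crosstab` follows; the label values and the `regions` grouping are invented for illustration and do not come from the Eurovision data.

```python
import pandas as pd

# Hypothetical cluster labels and a made-up known grouping for six samples.
labels = [1, 1, 2, 2, 2, 3]
regions = ["east", "east", "south", "south", "east", "north"]

pairs = pd.DataFrame({"labels": labels, "regions": regions})

# Rows are cluster labels, columns are the known groups, cells are counts.
ct = pd.crosstab(pairs["labels"], pairs["regions"])
print(ct)
```

Each row of the table shows how the members of one cluster are spread across the known groups, which makes it easy to see whether clusters line up with an external categorisation.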

  28. Let's practice!

  29. t-SNE for 2-dimensional maps. Benjamin Wilson, Director of Research at lateral.io

  30. t-SNE for 2-dimensional maps
      - t-SNE = "t-distributed stochastic neighbor embedding"
      - Maps samples to 2D space (or 3D)
      - Map approximately preserves nearness of samples
      - Great for inspecting datasets

  31. t-SNE on the iris dataset
      - Iris dataset has 4 measurements, so samples are 4-dimensional
      - t-SNE maps samples to 2D space
      - t-SNE didn't know that there were different species
      - ... yet kept the species mostly separate

  32. Interpreting t-SNE scatter plots
      - "versicolor" and "virginica" harder to distinguish from one another
      - Consistent with k-means inertia plot: could argue for 2 clusters, or for 3

  33. t-SNE in sklearn
      - 2D NumPy array samples

          print(samples)

          [[ 5.   3.3  1.4  0.2]
           [ 5.   3.5  1.3  0.3]
           [ 4.9  2.4  3.3  1. ]
           [ 6.3  2.8  5.1  1.5]
           ...
           [ 4.9  3.1  1.5  0.1]]

      - List species giving the species of each sample as a number (0, 1, or 2)

          print(species)

          [0, 0, 1, 2, ..., 0]

  34. t-SNE in sklearn

          import matplotlib.pyplot as plt
          from sklearn.manifold import TSNE
          model = TSNE(learning_rate=100)
          transformed = model.fit_transform(samples)
          xs = transformed[:,0]
          ys = transformed[:,1]
          plt.scatter(xs, ys, c=species)
          plt.show()
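
The slide code assumes `samples` and `species` are already loaded. One way to obtain them, assuming scikit-learn's bundled copy of the iris data is an acceptable substitute for the arrays shown in the deck, is sketched below. Plotting is left out so the snippet runs without a display.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Assumption: scikit-learn's bundled iris data stands in for the slide's arrays.
iris = load_iris()
samples = iris.data    # shape (150, 4): four measurements per flower
species = iris.target  # species encoded as 0, 1 or 2

model = TSNE(learning_rate=100)
transformed = model.fit_transform(samples)  # shape (150, 2)

xs = transformed[:, 0]
ys = transformed[:, 1]
print(transformed.shape)
```

Passing `xs`, `ys` and `c=species` to `plt.scatter`, as on the slide, then colours each point by species.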

  35. t-SNE has only fit_transform()
      - Has a fit_transform() method
      - Simultaneously fits the model and transforms the data
      - Has no separate fit() or transform() methods
      - Can't extend the map to include new data samples
      - Must start over each time!
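
The practical consequence, that new samples cannot be projected onto an existing map, can be checked directly. This quick sketch only inspects which methods a `TSNE` instance exposes in the installed scikit-learn version.

```python
from sklearn.manifold import TSNE

model = TSNE(learning_rate=100)

# t-SNE can embed the data it is fitted on, but offers no way to
# project unseen samples into an existing map.
has_fit_transform = hasattr(model, "fit_transform")
has_transform = hasattr(model, "transform")
print(has_fit_transform, has_transform)
```

Because there is no `transform()`, adding new samples means calling `fit_transform()` again on the enlarged dataset.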

  36. t-SNE learning rate
      - Choose learning rate for the dataset
      - Wrong choice: points bunch together
      - Try values between 50 and 200
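
Trying a few learning rates is easy to script. The sketch below loops over three values in the range the slide suggests, again using scikit-learn's bundled iris data as an assumed stand-in dataset. It records `kl_divergence_`, the final value of the objective t-SNE minimizes, only as a rough numeric hint; looking at the scatter plot for each rate remains the check the slide describes.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data  # assumed stand-in dataset

divergences = {}
for rate in (50, 100, 200):
    # random_state is fixed so that only the learning rate varies between runs.
    model = TSNE(learning_rate=rate, random_state=0)
    model.fit_transform(samples)
    # kl_divergence_ is the final value of the objective after optimization.
    divergences[rate] = model.kl_divergence_
print(divergences)
```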

  37. Different every time
      - t-SNE features are different every time
      - Piedmont wines, 3 runs, 3 different scatter plots!
      - ... however: the wine varieties (= colors) have the same position relative to one another
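
When a repeatable map is needed, for example in a report, scikit-learn's `TSNE` accepts a `random_state` argument to seed its random number generator. The sketch below is an aside rather than something the slides cover; the seed value 42 and the use of the bundled iris data are arbitrary assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

samples = load_iris().data  # assumed stand-in dataset

# Unseeded runs may place the points differently each time;
# passing the same random_state to both runs seeds them identically.
run_a = TSNE(learning_rate=100, random_state=42).fit_transform(samples)
run_b = TSNE(learning_rate=100, random_state=42).fit_transform(samples)
print(run_a.shape, run_b.shape)
```

Seeding does not make one map more "correct" than another; as the slide notes, it is the relative position of the groups that carries the information.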

  38. Let's practice!
