Dimensionality reduction: feature extraction


  1. Dimensionality reduction: feature extraction
     PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON
     Lisa Stuart, Data Scientist

  2. Unsupervised learning methods
     - Principal component analysis (PCA) --> Lesson 3.1
     - Singular value decomposition (SVD) --> Lesson 3.1
     - Clustering / grouping --> Lesson 3.3
     - Exploratory data mining

  3. Dimensionality reduction != feature selection
     [1] https://slideplayer.com/slide/9699240/
     [2] https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/

  4. Curse of dimensionality
     [1] https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

  5. 1-D search

  6. 2-D search

  7. 3-D search
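
The 1-D, 2-D and 3-D search slides can be illustrated with a small numeric sketch. It is not taken from the deck: the sample size, the grid resolution and the helper name `occupied_fraction` are illustrative assumptions. With a fixed number of observations, the share of grid cells that contain any data shrinks quickly as dimensions are added.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 100   # fixed sample size (illustrative assumption)
bins = 10        # grid cells per axis (illustrative assumption)

def occupied_fraction(n_dims):
    """Fraction of grid cells in the unit cube that hold at least one sample."""
    points = rng.random((n_points, n_dims))        # uniform samples in [0, 1)
    cells = np.floor(points * bins).astype(int)    # bin index per coordinate
    n_occupied = len(np.unique(cells, axis=0))     # distinct occupied cells
    return n_occupied / bins ** n_dims

for d in (1, 2, 3):
    print(d, "dims:", bins ** d, "cells, occupied fraction",
          round(occupied_fraction(d), 3))
```

In three dimensions there are 1000 cells but only 100 samples, so at most 10% of the space can be covered at this resolution.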

  8. Dimensionality reduction methods
     - PCA
     - SVD

  9. PCA
     - Relationship between X and y
     - Calculated by finding principal axes
     - Translates, rotates and scales
     - Lower-dimensional projection of the data
     [1] https://scikit-learn.org/stable/modules/decomposition.html
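
The PCA bullets above can be sketched by hand with NumPy. This is a minimal illustration on synthetic data, assuming the common covariance-eigendecomposition route; it is not the deck's own code, and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 observations, 3 correlated features (assumption)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

Xc = X - X.mean(axis=0)                  # translate: center each feature
cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort principal axes by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_proj = Xc @ eigvecs[:, :2]             # rotate and project onto 2 axes
explained = eigvals / eigvals.sum()      # share of variance per axis
print(X_proj.shape, explained)
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues can be read as "variance explained".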

  10. SVD
      - Linear algebra and vector calculus
      - Decomposes data matrix into three matrices
      - Results in 'singular' values
      - Variance in data approximately equals the sum of squares (SS) of the singular values
      [1] https://galaxydatatech.com/2018/07/15/singular-value-decomposition/
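
A small NumPy sketch of the SVD points above, on synthetic data. It assumes the data matrix is column-centered and that variance uses the n - 1 denominator; under those assumptions the sum of squared singular values divided by n - 1 matches the total variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))     # synthetic: 50 observations, 4 features
Xc = X - X.mean(axis=0)          # center each column

# Thin SVD: Xc = U @ diag(s) @ Vt  (the three matrices)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt

# Total sample variance vs. sum of squares of the singular values
total_variance = Xc.var(axis=0, ddof=1).sum()
ss_singular = (s ** 2).sum() / (len(Xc) - 1)
print(total_variance, ss_singular)
```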

  11. Dimension reduction functions
      Function/method                        Returns
      sklearn.decomposition.PCA              principal component analysis
      sklearn.decomposition.TruncatedSVD     singular value decomposition
      PCA/SVD.fit_transform(X)               fits and transforms data
      PCA/SVD.explained_variance_ratio_      variance explained by PCs
      Other matrix decomposition algorithms
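
The functions in the table above could be used roughly as follows. The feature matrix is synthetic and the choice of two components is an assumption made only for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # synthetic feature matrix (assumption)

# PCA centers the data internally before decomposing it
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# TruncatedSVD works on the matrix as given (no centering)
svd = TruncatedSVD(n_components=2, random_state=0)
X_svd = svd.fit_transform(X)

print(X_pca.shape, pca.explained_variance_ratio_)
print(X_svd.shape, svd.explained_variance_ratio_)
```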

  12. Let's practice!

  13. Dimensionality reduction: visualization techniques

  14. Why dimensionality reduction?
      1. Speed up ML training
      2. Visualization
      3. Improves accuracy

  15. Visualization techniques
      - PCA
      - t-SNE

  16. Visualizing with PCA
      [1] https://districtdatalabs.silvrback.com/principal-component-analysis-with-python

  17. Scree plot
      [1] https://towardsdatascience.com/a-step-by-step-explanation-of-principal-component-analysis-b836fb9c97e2

  18. t-SNE
      - Probabilistic
      - Pairs of data points
      - Low-dimensional embedding
      - Plot embeddings

  19. Visualizing with t-SNE

          # t-SNE with loan data
          import pandas as pd
          import matplotlib.pyplot as plt
          import seaborn as sns
          from sklearn.manifold import TSNE

          loans = pd.read_csv('loans_dataset.csv')

          # Feature matrix
          X = loans.drop('Loan Status', axis=1)

          tsne = TSNE(n_components=2, verbose=1, perplexity=40)
          tsne_results = tsne.fit_transform(X)

          # Store the two embedding dimensions for plotting
          loans['t-SNE-PC-one'] = tsne_results[:, 0]
          loans['t-SNE-PC-two'] = tsne_results[:, 1]

          # t-SNE viz
          plt.figure(figsize=(16, 10))
          sns.scatterplot(
              x="t-SNE-PC-one", y="t-SNE-PC-two",
              hue="Loan Status",
              palette=sns.color_palette(["grey", "blue"]),
              data=loans,
              legend="full",
              alpha=0.3
          )

      [1] https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

  20. Visualizing with t-SNE

  21. PCA vs t-SNE: digits data
      [1] https://towardsdatascience.com/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b

  22. Let's practice!

  23. Clustering analysis: selecting the right clustering algorithm

  24. Clustering algorithms
      - Features >> Observations
      - Model training more challenging
      - Rely on distance calculations
      - Most commonly used unsupervised technique

  25. Practical applications of clustering
      - Customer segmentation
      - Document classification
      - Insurance / transaction fraud detection
      - Image segmentation
      - Anomaly detection
      - Many more...

  26. Distance metrics: Manhattan (taxicab) distance
      [1] https://en.wikipedia.org/wiki/Taxicab_geometry

  27. Distance metrics: Euclidean distance
      [1] http://rosalind.info/glossary/euclidean-distance/
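
The two distance metrics named on these slides can be written in a few lines of NumPy. The function names and the sample points are illustrative, not from the deck.

```python
import numpy as np

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (taxicab distance)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.abs(p - q).sum())

def euclidean_distance(p, q):
    """Straight-line distance: square root of summed squared differences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(((p - q) ** 2).sum()))

p, q = (0, 0), (3, 4)
print(manhattan_distance(p, q))   # 7.0
print(euclidean_distance(p, q))   # 5.0
```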

  28. K-means
      1. Initial centroids
      2. Assign each observation to nearest centroid
      3. Create new centroids
      4. Repeat steps 2 and 3
      [1] http://sherrytowers.com/2013/10/24/k-means-clustering/
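
The four steps above can be sketched directly in NumPy. This is a simplified illustration, assuming random observations as initial centroids and Euclidean distance; it is not the scikit-learn implementation, and the function name `kmeans` and the synthetic blobs are made up for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct observations as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each observation to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centroid = mean of the observations assigned to it
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: repeat steps 2 and 3 until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated synthetic blobs (assumption for the demo)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```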

  29. Hierarchical agglomerative clustering
      [1] https://www.datanovia.com/en/lessons/agglomerative-hierarchical-clustering/

  30. Agglomerative clustering linkage
      - Ward linkage
      - Maximum / complete linkage
      - Average linkage
      - Single linkage
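
The four linkage options listed above map onto the `linkage` parameter of scikit-learn's `AgglomerativeClustering`. The sketch below runs each one on two synthetic blobs; the data and the choice of two clusters are assumptions for the demo.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs (assumption)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(4, 0.3, size=(20, 2))])

results = {}
for method in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=2, linkage=method)
    results[method] = model.fit_predict(X)   # cluster label per observation
    print(method, np.bincount(results[method]))
```

On data this cleanly separated every linkage recovers the same two groups; the choices tend to differ on elongated or noisy clusters.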

  31. Selecting a clustering algorithm
      - Cluster stability assessment
      - K-means and hierarchical clustering (HC) use Euclidean distance
      - Inter- and intra-cluster distances
      "An appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm." - from Elements of Statistical Learning
      [1] https://slideplayer.com/slide/8363774/

  32. Clustering functions
      Function/method                            Returns
      sklearn.cluster.KMeans                     K-Means clustering algorithm
      sklearn.cluster.AgglomerativeClustering    Agglomerative clustering algorithm
      kmeans.inertia_                            SS distances of observations to closest cluster center
      scipy.cluster.hierarchy as sch             Hierarchical clustering for dendrograms
      sch.dendrogram()                           Dendrogram function
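
The `sch` entries in the table could be exercised as below. This sketch uses synthetic data and passes `no_plot=True`, so `sch.dendrogram` returns the tree layout as a dictionary instead of drawing it; dropping that argument draws the figure with matplotlib.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))          # synthetic observations (assumption)

# Linkage matrix: one row per merge (n - 1 merges for n observations)
Z = sch.linkage(X, method="ward")

# Compute the dendrogram structure without plotting
dendro = sch.dendrogram(Z, no_plot=True)
print(dendro["leaves"])               # observation order along the x-axis
```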

  33. Let's practice!

  34. Clustering analysis: choosing the optimal number of clusters

  35. Methods for optimal k
      - Silhouette method
      - Elbow method

  36. Silhouette coefficient
      Composed of 2 scores: the mean distance between each observation and all others
      - in the same cluster
      - in the nearest cluster

  37. Silhouette coefficient values
      Between -1 and 1
      - 1: near others in same cluster, very far from others in other clusters
      - -1: not near others in same cluster, close to others in other clusters
      - 0: denotes overlapping clusters
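
The two scores can be combined by hand for a single observation. The sketch assumes the usual definition s = (b - a) / max(a, b), where a is the mean distance to the observation's own cluster and b the mean distance to the nearest other cluster, and it assumes every cluster has at least two members. The function name and toy data are illustrative.

```python
import numpy as np

def silhouette_sample(X, labels, i):
    """Silhouette coefficient of observation i: (b - a) / max(a, b)."""
    dists = np.linalg.norm(X - X[i], axis=1)   # Euclidean distance to all points
    same = labels == labels[i]
    same[i] = False                            # exclude the observation itself
    a = dists[same].mean()                     # mean distance within own cluster
    b = min(dists[labels == c].mean()          # mean distance to nearest other cluster
            for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

good = np.array([0, 0, 1, 1])   # clusters match the two visible groups
bad = np.array([0, 1, 0, 1])    # clusters cut across the groups

print(silhouette_sample(X, good, 0))   # close to 1
print(silhouette_sample(X, bad, 0))    # negative
```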

  38. Silhouette score
      [1] https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

  39. Elbow method
      [1] https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/

  40. Optimal k selection functions
      Function/method                     Returns
      sklearn.cluster.KMeans              K-Means clustering algorithm
      sklearn.metrics.silhouette_score    score between -1 and 1 as measure of cluster stability
      kmeans.inertia_                     SS distances of observations to closest cluster center
      range(start, stop)                  sequence of values beginning with start, up to but not including stop
      list.append(kmeans.inertia_)        appends inertia value to list
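
Put together, the functions in the table give a loop like the one below. The three synthetic blobs, the candidate range of k and the `n_init` and `random_state` settings are assumptions for the demo, not values from the deck.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
centers = [(0, 0), (6, 0), (3, 6)]            # three synthetic blobs (assumption)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])

inertias, silhouettes = [], []
k_values = range(2, 7)                        # k = 2, 3, 4, 5, 6
for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(kmeans.inertia_)          # elbow method input
    silhouettes.append(silhouette_score(X, kmeans.labels_))

# Silhouette method: pick the k with the highest mean silhouette score
best_k = list(k_values)[int(np.argmax(silhouettes))]
print(best_k)
print([round(v, 1) for v in inertias])
```

For the elbow method, the inertia values would be plotted against k and the bend in the curve read off by eye.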

  41. Let's practice!
