SLIDE 1

Dimensionality reduction: feature extraction

PRACTICING MACHINE LEARNING INTERVIEW QUESTIONS IN PYTHON

Lisa Stuart

Data Scientist

SLIDE 2

Unsupervised learning methods

  • Principal component analysis (PCA) --> Lesson 3.1
  • Singular value decomposition (SVD) --> Lesson 3.1
  • Clustering/grouping --> Lesson 3.3
  • Exploratory data mining

SLIDE 3

Dimensionality reduction != feature selection

https://slideplayer.com/slide/9699240/
https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/

SLIDE 4

Curse of dimensionality

https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

SLIDE 5

1-D search

SLIDE 6

2-D search

SLIDE 7

3-D search

SLIDE 8

Dimensionality reduction methods

  • PCA
  • SVD

SLIDE 9

PCA

PCA:
  • Relationship between X and y
  • Calculated by finding principal axes
  • Translates, rotates and scales
  • Lower-dimensional projection of the data

https://scikit-learn.org/stable/modules/decomposition.html
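The PCA bullets above can be sketched with scikit-learn. This is a minimal illustration: the feature matrix X below is random synthetic data, not the course dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic feature matrix: 100 observations, 5 features (illustrative only)
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 5))

# Find the principal axes and project the data onto the first two
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance explained by each PC
```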

SLIDE 10

SVD

SVD:
  • Linear algebra and vector calculus
  • Decomposes data matrix into three matrices
  • Results in 'singular' values
  • Variance in data approximately equals SS of singular values

https://galaxydatatech.com/2018/07/15/singular-value-decomposition/
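A minimal NumPy sketch of the decomposition described above, on made-up random data. For centered data the total sum of squares equals the SS of the singular values exactly (the Frobenius norm is preserved by the orthogonal factors):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)  # center the data first

# Decompose into three matrices: U, the singular values s, and V^T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The product of the three matrices reconstructs the data
print(np.allclose(Xc, U @ np.diag(s) @ Vt))          # True

# Total sum of squares in the data equals the SS of the singular values
print(np.allclose((Xc ** 2).sum(), (s ** 2).sum()))  # True
```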

SLIDE 11

Dimension reduction functions

Function/method -> returns
sklearn.decomposition.PCA -> principal component analysis
sklearn.decomposition.TruncatedSVD -> singular value decomposition
PCA/SVD.fit_transform(X) -> fits and transforms data
PCA/SVD.explained_variance_ratio_ -> variance explained by PCs

Other matrix decomposition algorithms

SLIDE 12

Let's practice!


SLIDE 13

Dimensionality reduction: visualization techniques


Lisa Stuart

Data Scientist

SLIDE 14

Why dimensionality reduction?

  • 1. Speed up ML training
  • 2. Visualization
  • 3. Improve accuracy
SLIDE 15

Visualization techniques

  • PCA
  • t-SNE

SLIDE 16

Visualizing with PCA

https://districtdatalabs.silvrback.com/principal-component-analysis-with-python

SLIDE 17

Scree plot

https://towardsdatascience.com/a-step-by-step-explanation-of-principal-component-analysis-b836fb9c97e2
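A scree plot like the one shown can be drawn directly from PCA's explained_variance_ratio_. This sketch uses random synthetic data and matplotlib; both are assumptions for illustration, not from the slides:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = rng.normal(size=(200, 8))

pca = PCA().fit(X)  # keep all components
var = pca.explained_variance_ratio_
components = np.arange(1, len(var) + 1)

# Scree plot: variance per component, plus the cumulative curve
plt.plot(components, var, marker="o", label="per component")
plt.plot(components, np.cumsum(var), marker="s", label="cumulative")
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.legend()
plt.savefig("scree_plot.png")
```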

SLIDE 18

t-SNE

  • Probabilistic
  • Pairs of data points
  • Low-dimensional embedding
  • Plot embeddings

SLIDE 19

Visualizing with t-SNE

# t-SNE with loan data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

loans = pd.read_csv('loans_dataset.csv')

# Feature matrix
X = loans.drop('Loan Status', axis=1)

tsne = TSNE(n_components=2, verbose=1, perplexity=40)
tsne_results = tsne.fit_transform(X)
loans['t-SNE-PC-one'] = tsne_results[:, 0]
loans['t-SNE-PC-two'] = tsne_results[:, 1]

# t-SNE viz
plt.figure(figsize=(16, 10))
sns.scatterplot(
    x="t-SNE-PC-one", y="t-SNE-PC-two",
    hue="Loan Status",
    palette=sns.color_palette(["grey", "blue"]),
    data=loans,
    legend="full",
    alpha=0.3
)

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

SLIDE 20

Visualizing with t-SNE

SLIDE 21

PCA vs t-SNE digits data

https://towardsdatascience.com/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b

SLIDE 22

Let's practice!


SLIDE 23

Clustering analysis: selecting the right clustering algorithm


Lisa Stuart

Data Scientist

SLIDE 24

Clustering algorithms

  • Features >> Observations
  • Model training more challenging
  • Rely on distance calculations
  • Most commonly used unsupervised technique

SLIDE 25

Practical applications of clustering

  • Customer segmentation
  • Document classification
  • Insurance/transaction fraud detection
  • Image segmentation
  • Anomaly detection
  • Many more...

SLIDE 26

Distance metrics: Manhattan (taxicab) distance

https://en.wikipedia.org/wiki/Taxicab_geometry

SLIDE 27

Distance metrics: Euclidean distance

http://rosalind.info/glossary/euclidean-distance/
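The two metrics can be checked numerically with SciPy; the points a and b below are made up for illustration:

```python
import numpy as np
from scipy.spatial.distance import cityblock, euclidean

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Manhattan (taxicab): sum of absolute coordinate differences
print(cityblock(a, b))   # |4-1| + |6-2| = 7.0

# Euclidean: straight-line distance
print(euclidean(a, b))   # sqrt(3^2 + 4^2) = 5.0
```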

SLIDE 28

K-means

  • 1. Initial centroids
  • 2. Assign each observation to nearest centroid
  • 3. Create new centroids
  • 4. Repeat steps 2 and 3

http://sherrytowers.com/2013/10/24/k-means-clustering/
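The four steps above are what scikit-learn's KMeans iterates internally until the centroids stop moving. A sketch on two synthetic blobs (the data is invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

# KMeans repeats the assign/update steps (2 and 3) until convergence
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # one centroid near (0, 0), one near (5, 5)
```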

SLIDE 29

Hierarchical agglomerative clustering

https://www.datanovia.com/en/lessons/agglomerative-hierarchical-clustering/

SLIDE 30

Agglomerative clustering linkage

  • Ward linkage
  • Maximum/complete linkage
  • Average linkage
  • Single linkage

SLIDE 31

Selecting a clustering algorithm

  • Cluster stability assessment
  • K-means and HC use Euclidean distance
  • Inter- and intra-cluster distances

"An appropriate dissimilarity measure is far more important in obtaining success with clustering than choice of clustering algorithm." - from Elements of Statistical Learning

https://slideplayer.com/slide/8363774/

SLIDE 32

Clustering functions

Function/method -> returns
sklearn.cluster.KMeans -> K-Means clustering algorithm
sklearn.cluster.AgglomerativeClustering -> Agglomerative clustering algorithm
kmeans.inertia_ -> SS distances of observations to closest cluster center
scipy.cluster.hierarchy as sch -> Hierarchical clustering for dendrograms
sch.dendrogram() -> Dendrogram function
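The scipy functions in the table combine as follows: sch.linkage builds the hierarchical merge tree that sch.dendrogram then draws. A sketch on two small synthetic groups (data invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

# Two small synthetic groups of points
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, size=(10, 2)),
               rng.normal(4, 0.3, size=(10, 2))])

# Ward linkage builds the merge tree; each row records one merge
linkage_matrix = sch.linkage(X, method="ward")
sch.dendrogram(linkage_matrix)
plt.savefig("dendrogram.png")
```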

SLIDE 33

Let's practice!


SLIDE 34

Clustering analysis: choosing the optimal number of clusters


Lisa Stuart

Data Scientist

SLIDE 35

Methods for optimal k

  • Silhouette method
  • Elbow method

SLIDE 36

Silhouette coefficient

Composed of 2 scores; mean distance between each observation and all others:
  • in the same cluster
  • in the nearest cluster

SLIDE 37

Silhouette coefficient values

Between -1 and 1:
  • 1: near others in same cluster, very far from others in other clusters
  • -1: not near others in same cluster, close to others in other clusters
  • 0: denotes overlapping clusters

SLIDE 38

Silhouette score

https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
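sklearn.metrics.silhouette_score computes the mean coefficient across all observations. A sketch on synthetic blobs (data invented for illustration); for well-separated clusters the score is close to 1:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated synthetic blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(5, 0.3, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette coefficient over all observations
score = silhouette_score(X, labels)
print(score)
```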

SLIDE 39

Elbow method

https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/
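The elbow method can be sketched by looping over candidate k values and collecting kmeans.inertia_; the synthetic data below has 3 true clusters, so the drop in inertia flattens after k=3:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with 3 true clusters centered at (0,0), (4,4), (8,8)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.4, size=(40, 2)) for c in (0, 4, 8)])

inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Inertia always falls as k grows; the "elbow" where the drop
# flattens marks the optimal k (k=3 for this data)
print(inertias)
```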

SLIDE 40

Optimal k selection functions

Function/method -> returns
sklearn.cluster.KMeans -> K-Means clustering algorithm
sklearn.metrics.silhouette_score -> score between -1 and 1 as measure of cluster stability
kmeans.inertia_ -> SS distances of observations to closest cluster center
range(start, stop) -> sequence of values beginning with start, up to but not including stop
list.append(kmeans.inertia_) -> appends inertia value to list

SLIDE 41

Let's practice!
