SLIDE 1

The curse of dimensionality

DIMENSIONALITY REDUCTION IN PYTHON

Jeroen Boeye

Machine Learning Engineer, Faktion

SLIDE 2

DIMENSIONALITY REDUCTION IN PYTHON

From observation to pattern

City    Price
Berlin  2
Paris   3


SLIDE 4

From observation to pattern

City    Price
Berlin  2.0
Berlin  3.1
Berlin  4.3
Paris   3.0
Paris   5.2
...     ...

SLIDE 5

Building a city classifier - data split

Separate the feature we want to predict from the ones to train the model on.

y = house_df['City']
X = house_df.drop('City', axis=1)

Perform a 70% train and 30% test data split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

SLIDE 6

Building a city classifier - model fit

Create a Support Vector Machine classifier and fit it to the training data.

from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train, y_train)

SLIDE 7

Building a city classifier - predict

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, svc.predict(X_test)))
0.826
print(accuracy_score(y_train, svc.predict(X_train)))
0.832
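Slides 5 through 7 can be stitched into one runnable sketch. The data below is invented for illustration (the course's full house_df is not shown); only the workflow matches the slides:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Invented stand-in for the course's house_df
house_df = pd.DataFrame({
    'City': ['Berlin', 'Paris'] * 20,
    'Price': [2.0, 3.0, 3.1, 5.2, 4.3, 3.0, 2.5, 4.8] * 5,
})

# Separate the target from the features
y = house_df['City']
X = house_df.drop('City', axis=1)

# 70% train / 30% test split (random_state added for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit a Support Vector Machine classifier and score both sets
svc = SVC()
svc.fit(X_train, y_train)
train_acc = accuracy_score(y_train, svc.predict(X_train))
test_acc = accuracy_score(y_test, svc.predict(X_test))
print(train_acc, test_acc)
```

With a single weak feature the train and test accuracies stay close; a widening gap between them is the overfitting signal the later slides attribute to adding too many features.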

SLIDE 8

Adding features

City    Price
Berlin  2.0
Berlin  3.1
Berlin  4.3
Paris   3.0
Paris   5.2
...     ...

SLIDE 9

Adding features

City    Price  n_floors  n_bathroom  surface_m2
Berlin  2.0    1         1           190
Berlin  3.1    2         1           187
Berlin  4.3    2         2           240
Paris   3.0    2         1           170
Paris   5.2    2         2           290
...     ...    ...       ...         ...

SLIDE 10

Let's practice!

DIMENSIONALITY REDUCTION IN PYTHON

SLIDE 11

Features with missing values or little variance

DIMENSIONALITY REDUCTION IN PYTHON

Jeroen Boeye

Machine Learning Engineer, Faktion

SLIDE 12

Creating a feature selector

print(ansur_df.shape)
(6068, 94)

from sklearn.feature_selection import VarianceThreshold
sel = VarianceThreshold(threshold=1)
sel.fit(ansur_df)
mask = sel.get_support()
print(mask)
array([ True,  True, ..., False,  True])

SLIDE 13

Applying a feature selector

print(ansur_df.shape)
(6068, 94)

reduced_df = ansur_df.loc[:, mask]
print(reduced_df.shape)
(6068, 93)
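One design note, sketched on invented data: indexing with the boolean mask (as above) keeps the DataFrame's column names, whereas calling the selector's transform() method returns a bare NumPy array with the same values:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Tiny invented DataFrame: one informative column, one constant one
df = pd.DataFrame({
    'varied':   [1.0, 5.0, 9.0, 2.0],
    'constant': [3.0, 3.0, 3.0, 3.0],  # zero variance, will be dropped
})

sel = VarianceThreshold()      # default threshold=0 removes constants
sel.fit(df)
mask = sel.get_support()

reduced_df = df.loc[:, mask]   # DataFrame, keeps the 'varied' name
arr = sel.transform(df)        # plain array with identical values

print(reduced_df.columns.tolist())
```

Keeping the names is why the course indexes with the mask instead of using transform() directly.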

SLIDE 14

Variance selector caveats

buttock_df.boxplot()
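The caveat the boxplot points at is that raw variance depends on the unit of measurement. A sketch with invented height values:

```python
import numpy as np

# The same five (invented) heights, in metres and in millimetres
heights_m = np.array([1.70, 1.80, 1.75, 1.65, 1.90])
heights_mm = heights_m * 1000.0

# Identical information, but the variances differ by a factor of 10**6,
# so one VarianceThreshold cutoff cannot treat both columns fairly
var_m = heights_m.var()
var_mm = heights_mm.var()

# Dividing by the mean first makes the variance unit-free
norm_var_m = (heights_m / heights_m.mean()).var()
norm_var_mm = (heights_mm / heights_mm.mean()).var()
print(var_m, var_mm, norm_var_m, norm_var_mm)
```

This is why the next slide fits the selector on ansur_df / ansur_df.mean() rather than on the raw values.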

SLIDE 15

Normalizing the variance

from sklearn.feature_selection import VarianceThreshold
sel = VarianceThreshold(threshold=0.005)
sel.fit(ansur_df / ansur_df.mean())
mask = sel.get_support()
reduced_df = ansur_df.loc[:, mask]
print(reduced_df.shape)
(6068, 45)

SLIDE 16

Missing value selector


SLIDE 18

Identifying missing values

pokemon_df.isna()

SLIDE 19

Counting missing values

pokemon_df.isna().sum()
Name         0
Type 1       0
Type 2     386
Total        0
HP           0
Attack       0
Defense      0
dtype: int64

SLIDE 20

Counting missing values

pokemon_df.isna().sum() / len(pokemon_df)
Name       0.00
Type 1     0.00
Type 2     0.48
Total      0.00
HP         0.00
Attack     0.00
Defense    0.00
dtype: float64

SLIDE 21

Applying a missing value threshold

# Fewer than 30% missing values = True value
mask = pokemon_df.isna().sum() / len(pokemon_df) < 0.3
print(mask)
Name        True
Type 1      True
Type 2     False
Total       True
HP          True
Attack      True
Defense     True
dtype: bool

SLIDE 22

Applying a missing value threshold

reduced_df = pokemon_df.loc[:, mask]
reduced_df.head()

SLIDE 23

Let's practice!

DIMENSIONALITY REDUCTION IN PYTHON

SLIDE 24

Pairwise correlation

DIMENSIONALITY REDUCTION IN PYTHON

Jeroen Boeye

Machine Learning Engineer, Faktion

SLIDE 25

Pairwise correlation

sns.pairplot(ansur, hue="gender")


SLIDE 27

Correlation coefficient
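The coefficient in question is Pearson's r, which measures the strength of a linear relationship on a scale from -1 to +1. A minimal sketch on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0        # y is an exact increasing linear function of x

# np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal
r = np.corrcoef(x, y)[0, 1]
print(r)                 # a perfect linear relation gives r = 1
```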


SLIDE 29

Correlation matrix

weights_df.corr()


SLIDE 33

Visualizing the correlation matrix

cmap = sns.diverging_palette(h_neg=10, h_pos=240, as_cmap=True)
sns.heatmap(weights_df.corr(), center=0, cmap=cmap,
            linewidths=1, annot=True, fmt=".2f")

SLIDE 34

Visualizing the correlation matrix

corr = weights_df.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True]])

SLIDE 35

Visualizing the correlation matrix

sns.heatmap(weights_df.corr(), mask=mask, center=0, cmap=cmap,
            linewidths=1, annot=True, fmt=".2f")

SLIDE 36

Visualizing the correlation matrix

SLIDE 37

Let's practice!

DIMENSIONALITY REDUCTION IN PYTHON

SLIDE 38

Removing highly correlated features

DIMENSIONALITY REDUCTION IN PYTHON

Jeroen Boeye

Machine Learning Engineer, Faktion

SLIDE 39

Highly correlated data

SLIDE 40

Highly correlated features

SLIDE 41

Removing highly correlated features

# Create positive correlation matrix
corr_df = chest_df.corr().abs()

# Create and apply mask
mask = np.triu(np.ones_like(corr_df, dtype=bool))
tri_df = corr_df.mask(mask)
tri_df

SLIDE 42

Removing highly correlated features

# Find columns that meet threshold
to_drop = [c for c in tri_df.columns if any(tri_df[c] > 0.95)]
print(to_drop)
['Suprasternale height', 'Cervicale height']

# Drop those columns
reduced_df = chest_df.drop(to_drop, axis=1)
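The steps above can be wrapped in a small reusable helper. The function name `drop_correlated` and the toy DataFrame are hypothetical, for illustration only:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.95):
    """Drop one column from every pair correlated above the threshold."""
    corr_df = df.corr().abs()                          # positive matrix
    mask = np.triu(np.ones_like(corr_df, dtype=bool))  # upper triangle
    tri_df = corr_df.mask(mask)                        # keep lower half
    to_drop = [c for c in tri_df.columns if any(tri_df[c] > threshold)]
    return df.drop(to_drop, axis=1)

# Toy data (invented): 'b' is nearly 2 * 'a', 'c' is unrelated
toy_df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.1, 3.9, 6.2, 8.0],
    'c': [4.0, 1.0, 3.0, 2.0],
})
reduced = drop_correlated(toy_df)
print(reduced.columns.tolist())
```

Because only the lower triangle survives the mask, exactly one column of each highly correlated pair is dropped, never both.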

SLIDE 43

Feature selection vs. feature extraction

SLIDE 44

Correlation caveats - Anscombe's quartet
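Anscombe's quartet is the classic warning: four datasets with very different shapes but nearly identical summary statistics. Checking the first two datasets (the published values) with NumPy:

```python
import numpy as np

# Shared x values and the y values of Anscombe's datasets I and II
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y_i = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                7.24, 4.26, 10.84, 4.82, 5.68])
y_ii = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                 6.13, 3.10, 9.13, 7.26, 4.74])

# Dataset I is roughly linear, dataset II is a clean curve,
# yet both correlate with x to about the same degree
r_i = np.corrcoef(x, y_i)[0, 1]
r_ii = np.corrcoef(x, y_ii)[0, 1]
print(round(r_i, 3), round(r_ii, 3))
```

The lesson: plot the data before trusting a single correlation number.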

SLIDE 45

Correlation caveats - causation

sns.scatterplot(x="N firetrucks sent to fire",
                y="N wounded by fire",
                data=fire_df)

SLIDE 46

Let's practice!

DIMENSIONALITY REDUCTION IN PYTHON