Regression models: Practicing Statistics Interview Questions in Python

SLIDE 1

Regression models

Practicing Statistics Interview Questions in Python

Conor Dewey

Data Scientist, Squarespace

SLIDE 2

Getting started

Image credit: Wikimedia

SLIDE 3

Assumptions

- Linear relationship
- Errors are normally distributed
- Homoscedasticity
- Independent observations
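Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below uses synthetic data; the data, the seed, and the idea of comparing residual spread between the low-x and high-x halves are illustrative assumptions, not part of the original slides. It is a quick sanity check, not a formal test.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a linear signal plus constant-variance noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=200)

# Fit a straight line by least squares; polyfit returns the highest power first
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# With an intercept in the model, least-squares residuals average to zero
mean_residual = residuals.mean()

# Rough homoscedasticity check: residual spread should be similar for small and large x
order = np.argsort(x)
low_half = residuals[order[:100]]
high_half = residuals[order[100:]]
spread_ratio = high_half.std() / low_half.std()

print(slope, intercept, mean_residual, spread_ratio)
```

A spread ratio far from 1, or a clear pattern in a residual plot, would suggest that the homoscedasticity or linearity assumption deserves a closer look.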

SLIDE 4

Linear regression

Image credit: Wikipedia

SLIDE 5

Linear regression

SLIDE 6

Example: linear regression

from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

SLIDE 7

Example: linear regression

coef = lm.coef_
print(coef)

[0.79086669]
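The slides use `X_train` and `y_train` without showing where they come from. A self-contained version might look like the sketch below; the synthetic data, the true slope of 0.8, and the 75/25 split are assumptions made only so the example runs end to end.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): one feature, true slope 0.8, true intercept 3.0
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 0.8 * X[:, 0] + 3.0 + rng.normal(0, 0.5, size=200)

# Hold out a quarter of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

lm = LinearRegression()
lm.fit(X_train, y_train)

slope = lm.coef_[0]                 # one coefficient per feature
intercept = lm.intercept_
r2_test = lm.score(X_test, y_test)  # R-squared on the held-out rows

print(slope, intercept, r2_test)
```

The fitted slope should land close to the value used to generate the data, which is a handy way to confirm the pipeline is wired correctly before moving to real data.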

SLIDE 8

Logistic regression

Image credit: Wikimedia

SLIDE 9

Logistic regression

SLIDE 10

Example: logistic regression

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(solver='lbfgs')
clf.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, max_iter=100, multi_class='warn',
                   n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
                   tol=0.0001, verbose=0, warm_start=False)

SLIDE 11

Example: logistic regression

coefs = clf.coef_
print(coefs)

[[0.4015177 3.85056451]]

accuracy = clf.score(X_test, y_test)
print(accuracy)

0.8583333333333333
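As with the linear example, the training and test sets are not defined on the slides. The following sketch fills that gap with synthetic data; the two features, their true weights (0.5 and 4.0), and the 70/30 split are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): two features, the second drives the label much more strongly
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
logits = 0.5 * X[:, 0] + 4.0 * X[:, 1]
probabilities = 1 / (1 + np.exp(-logits))  # logistic (sigmoid) function
y = (rng.uniform(size=400) < probabilities).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = LogisticRegression(solver="lbfgs")
clf.fit(X_train, y_train)

coefs = clf.coef_                        # shape (1, n_features) for a binary problem
accuracy = clf.score(X_test, y_test)     # fraction of correct test predictions
proba = clf.predict_proba(X_test[:1])    # class probabilities for one test row

print(coefs, accuracy, proba)
```

The coefficients are on the log-odds scale, so the larger second coefficient means the second feature shifts the predicted probability far more than the first.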

SLIDE 12

Summary

- Review
- Assumptions
- Linear regression
- Logistic regression

SLIDE 13

Let's prepare for the interview!

SLIDE 14

Evaluating models

SLIDE 15

Regression techniques

- R-squared
- Mean absolute error (MAE)
- Mean squared error (MSE)
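For reference, the standard definitions of these three metrics are given below. The notation is an assumption introduced here and does not appear on the slides: $y_i$ is the observed value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the observed values, and $n$ the number of observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```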

SLIDE 16

R-squared

Image credit: Wikimedia

SLIDE 17

MAE vs. MSE

Image credit: Wikimedia

SLIDE 18

MAE vs. MSE

Image credit: 120 Data Science Interview Questions
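A small numeric comparison helps show why the choice between MAE and MSE matters. In this sketch the target values and both sets of predictions are made up for illustration; the helper functions `mae` and `mse` are hypothetical names written out from the usual definitions.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error: average of the squared errors."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_close = np.array([2.0, 6.0, 6.0, 10.0])    # every error has size 1
y_pred_outlier = np.array([2.0, 6.0, 6.0, 19.0])  # the last error has size 10

mae_close, mse_close = mae(y_true, y_pred_close), mse(y_true, y_pred_close)
mae_outlier, mse_outlier = mae(y_true, y_pred_outlier), mse(y_true, y_pred_outlier)

print(mae_close, mse_close)      # 1.0 1.0
print(mae_outlier, mse_outlier)  # 3.25 25.75
```

One large miss roughly triples the MAE but multiplies the MSE by more than 25, because squaring weights large errors heavily. That makes MSE the more outlier-sensitive of the two.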

SLIDE 19

Classification techniques

- Precision
- Recall
- Confusion matrices

SLIDE 20

Precision

SLIDE 21

Recall
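The formulas for these two slides did not survive extraction. The standard definitions are shown below, where TP, FP, and FN (notation assumed here) are the counts of true positives, false positives, and false negatives.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```

Precision answers "of the cases predicted positive, how many really were?", while recall answers "of the cases that really were positive, how many were found?".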

SLIDE 22

Confusion matrix

Image credit: AB Tasty

SLIDE 23

Confusion matrix

Image credit: AB Tasty

SLIDE 24

Confusion matrix

Image credit: AB Tasty
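To make the confusion matrix concrete, the sketch below counts its four cells by hand for a small set of made-up labels and predictions, then derives precision and recall from those counts. The label arrays are illustrative assumptions; the matrix layout (rows for actual class, columns for predicted class) matches the one scikit-learn's `confusion_matrix` uses.

```python
import numpy as np

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted positive, actually positive
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted positive, actually negative
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted negative, actually positive
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted negative, actually negative

# Rows: actual class (0, 1); columns: predicted class (0, 1)
confusion = np.array([[tn, fp],
                      [fn, tp]])

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(confusion)
print(precision, recall)
```

Here the model finds 4 of the 5 actual positives (recall 0.8) but also raises 2 false alarms, so only 4 of its 6 positive predictions are correct (precision about 0.67).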

SLIDE 25

Summary

- R-squared
- Mean absolute error (MAE) vs. mean squared error (MSE)
- Precision and recall

SLIDE 26

Let's prepare for the interview!

SLIDE 27

Missing data and outliers


SLIDE 28

Handling missing data

- Drop the whole row
- Impute missing values

SLIDE 29

Drop the whole row

df.dropna(inplace=True)
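Before dropping rows it is worth checking how much data would be lost. The sketch below uses a small made-up DataFrame (column names and values are assumptions) and returns a new frame instead of modifying `df` in place, which is an alternative to `inplace=True` that keeps the original available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "income": [50000, 62000, np.nan, 58000],
    "city": ["Boston", "Denver", "Austin", None],
})

# Count missing values per column before deciding what to drop
missing_per_column = df.isnull().sum()

# dropna() removes every row that has at least one missing value
complete_rows = df.dropna()

print(missing_per_column)
print(len(df), "rows before,", len(complete_rows), "rows after")
```

Each column is missing only one value, yet three of the four rows disappear. Row-wise dropping can be costly when missing values are spread across different rows.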

SLIDE 30

Impute missing values

- Constant value
- Randomly selected record
- Mean, median, or mode
- Value estimated by another model
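Several of these strategies map directly onto `fillna()`. The following sketch shows constant, mean, median, and mode imputation on a made-up DataFrame; the column names and values are assumptions, and random-record or model-based imputation is left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a skewed numeric column and a categorical column, each with a gap
df = pd.DataFrame({
    "age": [10.0, np.nan, 20.0, 30.0, 100.0],
    "plan": ["basic", "pro", None, "basic", "basic"],
})

constant_fill = df["age"].fillna(0)                  # fixed placeholder value
mean_fill = df["age"].fillna(df["age"].mean())       # mean of the observed values
median_fill = df["age"].fillna(df["age"].median())   # median is less affected by the 100
mode_fill = df["plan"].fillna(df["plan"].mode()[0])  # most frequent category

print(mean_fill[1], median_fill[1], mode_fill[2])  # 40.0 25.0 basic
```

The gap between the mean (40.0) and the median (25.0) comes from the single large value, which is one reason the median is often preferred for skewed columns.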

SLIDE 31

A few useful functions

- isnull()
- dropna()
- fillna()

SLIDE 32

Dealing with outliers

- Standard deviations
- Interquartile range (IQR)

SLIDE 33

Standard deviations

Image credit: Wikimedia
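A minimal sketch of the standard-deviation approach follows. The data are made up, and the cutoff of 3 standard deviations is a common convention assumed here, not a value taken from the slides.

```python
import numpy as np

# Hypothetical data: twenty typical readings plus one extreme value
values = np.concatenate([np.tile([10.0, 11.0, 12.0, 13.0], 5), [95.0]])

# z-score: how many standard deviations each value lies from the mean
z_scores = (values - values.mean()) / values.std()

threshold = 3.0  # assumed cutoff; 2 or 2.5 are also used in practice
outliers = values[np.abs(z_scores) > threshold]

print(outliers)  # [95.]
```

Note that the extreme value itself inflates both the mean and the standard deviation, so in small samples this method can fail to flag outliers that are obvious by eye.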

SLIDE 34

Interquartile range (IQR)

Image credit: Wikimedia
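The IQR rule can be sketched in a few lines. The data are made up, and the 1.5 multiplier is the usual box-plot convention, assumed here since the slide itself shows only an image.

```python
import numpy as np

# Hypothetical data: twenty typical readings plus one extreme value
values = np.concatenate([np.tile([10.0, 11.0, 12.0, 13.0], 5), [95.0]])

# First and third quartiles, and the range between them
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = values[(values < lower_bound) | (values > upper_bound)]

print(q1, q3, outliers)  # 11.0 13.0 [95.]
```

Because quartiles barely move when a single extreme value is added, the IQR rule is more robust than the standard-deviation rule for small or skewed samples.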

SLIDE 35

Summary

- Drop the whole row
- Impute missing values
- Standard deviations
- Interquartile range

SLIDE 36

Let's prepare for the interview!

SLIDE 37

Bias-variance tradeoff

SLIDE 38

Types of error

- Bias error
- Variance error
- Irreducible error
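These three terms correspond to the standard decomposition of expected squared prediction error at a point $x$. The notation is an assumption introduced here: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\!\left[ \big( y - \hat{f}(x) \big)^2 \right]
= \underbrace{\big( \mathbb{E}[\hat{f}(x)] - f(x) \big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\!\left[ \big( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \big)^2 \right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
```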

SLIDE 39

Bias error

Image credit: How to Use Machine Learning to Predict the Quality of Wines

SLIDE 40

Variance error

Image credit: How to Use Machine Learning to Predict the Quality of Wines

SLIDE 41

Bias-variance tradeoff

Image credit: Scott Fortmann
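One way to see the tradeoff is to fit polynomials of increasing degree to the same noisy data and compare training and test error. The sketch below is an illustration under assumed settings (a sine-shaped signal, noise level 0.3, 20 training points, degrees 1, 3, and 12); none of these choices come from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n noisy samples from an assumed true function sin(2*pi*x)."""
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=n)
    return x, y

x_train, y_train = make_data(20)    # small training set, so variance is visible
x_test, y_test = make_data(500)     # larger test set to estimate generalization error

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit of the given degree
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = np.mean((model(x_train) - y_train) ** 2)
    test_mse[degree] = np.mean((model(x_test) - y_test) ** 2)
    print(degree, train_mse[degree], test_mse[degree])
```

Training error can only go down as the degree grows, since each model contains the previous one. The straight line underfits (high bias), so both of its errors stay high. A very flexible fit such as degree 12 typically shows a low training error but a much larger test error (high variance), while a moderate degree tends to balance the two.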

SLIDE 42

Summary

- Types of error
- Bias error
- Variance error
- Bias-variance tradeoff

SLIDE 43

Let's prepare for the interview!

SLIDE 44

Wrapping up

SLIDE 45

Chapter 1: Probability and sampling distributions

- Conditional probabilities
- Central limit theorem
- Probability distributions

SLIDE 46

Chapter 2: Exploratory data analysis

- Descriptive statistics
- Categorical data
- Encoding techniques
- Multivariate relationships

SLIDE 47

Chapter 3: Statistical experiments and significance testing

- Confidence intervals
- Hypothesis testing
- Power analysis
- Multiple comparisons

SLIDE 48

Chapter 4: Regression and classification

- Linear regression
- Logistic regression
- Missing data and outliers
- Bias-variance tradeoff

SLIDE 49

Some advice

- Simulate the interview environment
- Practice explaining big concepts
- Know the business or product well
- Come prepared with ideas

SLIDE 50

Resources

- Data Science Career Resources Repo
- Practical Statistics for Data Scientists
- 120 Data Science Interview Questions
- Advice Applying to Data Science Jobs

SLIDE 51

Good luck and thank you!