Modeling Real Data: Introduction to Linear Modeling in Python


SLIDE 1

Modeling Real Data

INTRODUCTION TO LINEAR MODELING IN PYTHON

Jason Vestuto

Data Scientist

SLIDE 2

Scikit-Learn

from sklearn.linear_model import LinearRegression

# Initialize a general model
model = LinearRegression(fit_intercept=True)

# Load and shape the data
x_raw, y_raw = load_data()
x_data = x_raw.reshape(len(y_raw), 1)
y_data = y_raw.reshape(len(y_raw), 1)

# Fit the model to the data
model_fit = model.fit(x_data, y_data)
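The reshape step on this slide turns each 1D array into a single-column 2D array, the layout scikit-learn expects for its feature input. Below is a minimal NumPy-only sketch of that reshaping. The data values are made up for illustration and stand in for whatever load_data() returns.

```python
import numpy as np

# Hypothetical stand-in for load_data(): two 1D arrays of equal length
x_raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_raw = np.array([1.1, 2.9, 5.2, 7.1, 8.7])

# Reshape each 1D array of length n into a column of shape (n, 1)
x_data = x_raw.reshape(len(y_raw), 1)
y_data = y_raw.reshape(len(y_raw), 1)

# reshape(-1, 1) asks NumPy to infer the row count and gives the same result
x_alt = x_raw.reshape(-1, 1)

print(x_raw.shape, x_data.shape)
```

Because y_data is also 2D here, the fitted intercept and coefficients come back as arrays rather than scalars, which is why the next slide indexes them.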

SLIDE 3

Predictions and Parameters

import numpy as np

# Extract the linear model parameters
intercept = model.intercept_[0]
slope = model.coef_[0, 0]

# Use the model to make predictions
# predict() expects a 2D array of shape (n_samples, n_features)
future_x = np.array([[2100]])
future_y = model.predict(future_x)
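As a cross-check on the fit-then-predict workflow, the same slope, intercept, and prediction can be obtained with plain NumPy least squares. This is a sketch that swaps in np.linalg.lstsq for scikit-learn, and the noise-free data (y = 5 + 2*x) is an assumption chosen so the answer is easy to verify by hand.

```python
import numpy as np

# Hypothetical noise-free data standing in for load_data(): y = 5 + 2*x
x_data = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_data = 5.0 + 2.0 * x_data

# Design matrix with a column of ones, the counterpart of fit_intercept=True
A = np.column_stack([np.ones_like(x_data), x_data])

# Ordinary least squares: solve for [intercept, slope]
(intercept, slope), *_ = np.linalg.lstsq(A, y_data, rcond=None)

# Predict at a new x value using the fitted line
future_x = 10.0
future_y = intercept + slope * future_x

print(intercept, slope, future_y)
```

The prediction is just intercept + slope * future_x, which is what a fitted linear model evaluates for a single feature.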

SLIDE 4

statsmodels

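import pandas as pd
from statsmodels.formula.api import ols

# Load the data into a DataFrame and plot it
x_data, y_data = load_data()
df = pd.DataFrame(dict(times=x_data, distances=y_data))
fig = df.plot('times', 'distances')

# Fit distances as a linear function of times
model_fit = ols(formula="distances ~ times", data=df).fit()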

SLIDE 5

Uncertainty

# Best-fit parameter values
a0 = model_fit.params['Intercept']
a1 = model_fit.params['times']

# Standard errors of those parameters
e0 = model_fit.bse['Intercept']
e1 = model_fit.bse['times']

intercept = a0
slope = a1
uncertainty_in_intercept = e0
uncertainty_in_slope = e1

SLIDE 6

Let's practice!

SLIDE 7

The Limits of Prediction

Jason Vestuto

Data Scientist

SLIDE 8

Interpolation

SLIDE 9

Interpolation

SLIDE 10

Interpolation

SLIDE 11

Interpolation

SLIDE 12

Interpolation

SLIDE 13

Domain of Validity

zoom in: data looks linear
model assumption: a2*x**2 + a3*x**3 + ... = zero
build a linear model: a0 + a1*x
zoom out: your model breaks
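The zoom-in, zoom-out argument can be sketched numerically. The example below assumes a hypothetical process with a small quadratic term (the coefficients are made up for illustration), fits a line on a narrow window where the curve looks linear, and then compares the model error inside that window with the error far outside it.

```python
import numpy as np

# Hypothetical "true" process with a small a2*x**2 term (an assumption for illustration)
def true_process(x):
    return 1.0 + 2.0 * x + 0.05 * x**2

# Zoomed-in window where the curve looks nearly linear
x_fit = np.linspace(0.0, 2.0, 21)
y_fit = true_process(x_fit)

# Fit a linear model a0 + a1*x; np.polyfit returns the highest degree first
a1, a0 = np.polyfit(x_fit, y_fit, deg=1)

def linear_model(x):
    return a0 + a1 * x

# Worst-case error inside the fitted window (interpolation)
error_inside = np.max(np.abs(linear_model(x_fit) - true_process(x_fit)))

# Error far outside the fitted window (extrapolation)
x_far = 50.0
error_outside = abs(linear_model(x_far) - true_process(x_far))

print(f"max error inside window: {error_inside:.3f}")
print(f"error at x = {x_far}: {error_outside:.3f}")
```

Inside the window the neglected quadratic term is small, so the line tracks the data closely. Far outside it, that term dominates and the linear prediction is badly off.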

SLIDE 14

Extrapolating Too Far

SLIDE 15

Let's practice!

SLIDE 16

Goodness-of-Fit

Jason Vestuto

Data Scientist

SLIDE 17

3 Different R's

Building Models: RSS
Evaluating Models: RMSE, R-squared

SLIDE 18

RMSE

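import numpy as np

# Residuals: model predictions minus measured data
residuals = y_model - y_data

# Residual sum of squares
RSS = np.sum(np.square(residuals))

# Mean squared error, written two equivalent ways
mean_squared_residuals = np.sum(np.square(residuals)) / len(residuals)
MSE = np.mean(np.square(residuals))

# Root mean squared error
RMSE = np.sqrt(np.mean(np.square(residuals)))

# Equals the line above only when the residuals have zero mean
RMSE = np.std(residuals)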
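The shortcut RMSE = np.std(residuals) holds only when the residuals average to zero, since np.std subtracts the mean before squaring. A small worked sketch, using made-up data and two hypothetical models, shows one case where the two agree and one where they do not.

```python
import numpy as np

# Hypothetical measured data and two candidate models (values chosen for illustration)
y_data = np.array([1.0, 2.0, 3.0, 4.0])
y_centered = np.array([1.5, 1.5, 3.5, 3.5])  # residuals average to zero
y_offset = y_data + 1.0                      # every prediction is too high by 1

def rmse(y_model, y_data):
    """Root mean squared error of the residuals."""
    residuals = y_model - y_data
    return np.sqrt(np.mean(np.square(residuals)))

rmse_centered = rmse(y_centered, y_data)
std_centered = np.std(y_centered - y_data)

rmse_offset = rmse(y_offset, y_data)
std_offset = np.std(y_offset - y_data)

print(rmse_centered, std_centered)  # these agree: residual mean is zero
print(rmse_offset, std_offset)      # these differ: residual mean is 1
```

A least-squares line with an intercept always has zero-mean residuals on the data it was fit to, so the shortcut is safe there but not for an arbitrary model.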

SLIDE 19

R-Squared in Code

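Deviations:

deviations = np.mean(y_data) - y_data
VAR = np.sum(np.square(deviations))  # total sum of squared deviations from the mean

Residuals:

residuals = y_model - y_data
RSS = np.sum(np.square(residuals))

R-squared:

r_squared = 1 - (RSS / VAR)
r = np.corrcoef(y_data, y_model)[0, 1]  # Pearson correlation; r**2 equals r_squared for a least-squares line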
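To see the pieces work together, here is a short sketch that fits a least-squares line to made-up data (the values are an assumption for illustration), computes R-squared from RSS and VAR, and compares it with the squared Pearson correlation between data and model.

```python
import numpy as np

# Hypothetical data (an assumption for illustration)
x_data = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_data = np.array([1.1, 2.9, 5.2, 7.1, 8.7])

# Least-squares line; np.polyfit returns the slope first, then the intercept
a1, a0 = np.polyfit(x_data, y_data, deg=1)
y_model = a0 + a1 * x_data

# Total squared deviation of the data from its mean
deviations = np.mean(y_data) - y_data
VAR = np.sum(np.square(deviations))

# Squared deviation left over after the model is subtracted
residuals = y_model - y_data
RSS = np.sum(np.square(residuals))

r_squared = 1 - (RSS / VAR)

# Pearson correlation between the data and the model predictions
r = np.corrcoef(y_data, y_model)[0, 1]

print(r_squared, r**2)
```

For a least-squares line with an intercept the two printed numbers match. For a model fit some other way they need not.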

SLIDE 20

R-Squared in Data

SLIDE 21

R-Squared in Data

SLIDE 22

R-Squared in Data

SLIDE 23

R-Squared in Data

SLIDE 24

RMSE vs R-Squared

RMSE: how much variation is residual
R-squared: what fraction of variation is linear
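One practical consequence of this distinction is that RMSE carries the units of the data while R-squared is a unitless fraction. The sketch below uses made-up distance measurements and a hypothetical helper, fit_and_score, to score the same data in kilometers and in meters.

```python
import numpy as np

def fit_and_score(x, y):
    """Fit a least-squares line and return (RMSE, R-squared)."""
    a1, a0 = np.polyfit(x, y, deg=1)
    residuals = (a0 + a1 * x) - y
    rss = np.sum(np.square(residuals))
    var = np.sum(np.square(np.mean(y) - y))
    rmse = np.sqrt(rss / len(y))
    return rmse, 1 - rss / var

# Hypothetical measurements: times in hours, distances in kilometers
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
distances_km = np.array([1.1, 2.9, 5.2, 7.1, 8.7])

rmse_km, r2_km = fit_and_score(times, distances_km)
rmse_m, r2_m = fit_and_score(times, distances_km * 1000.0)

print(f"km: RMSE = {rmse_km:.4f}, R-squared = {r2_km:.4f}")
print(f"m:  RMSE = {rmse_m:.4f}, R-squared = {r2_m:.4f}")
```

Changing units rescales RMSE by the conversion factor and leaves R-squared unchanged, so RMSE answers "how far off, in data units" and R-squared answers "what share of the variation the line accounts for".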

SLIDE 25

Let's practice!

SLIDE 26

Standard Error

Jason Vestuto

Data Scientist

SLIDE 27

Uncertainty in Predictions

Model Predictions and RMSE:
predictions compared to data give residuals
residuals have spread
RMSE measures residual spread
RMSE quantifies prediction goodness

SLIDE 28

Uncertainty in Parameters

Model Parameters and Standard Error:
Parameter value as center
Parameter standard error as spread
Standard Error measures parameter uncertainty

SLIDE 29

Computing Standard Errors

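import pandas as pd
from statsmodels.formula.api import ols

# Fit the model, then read off the best-fit parameters
df = pd.DataFrame(dict(times=x_data, distances=y_data))
model_fit = ols(formula="distances ~ times", data=df).fit()
a1 = model_fit.params['times']
a0 = model_fit.params['Intercept']
slope = a1
intercept = a0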

SLIDE 30

Computing Standard Errors

# bse holds the standard error of each fitted parameter
e0 = model_fit.bse['Intercept']
e1 = model_fit.bse['times']
standard_error_of_intercept = e0
standard_error_of_slope = e1
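For intuition about where these numbers come from, the standard errors of a simple linear fit can be computed by hand from the textbook ordinary-least-squares formulas. The sketch below uses made-up data (an assumption for illustration) and NumPy only. With statsmodels' default covariance settings, the bse values should agree with this calculation.

```python
import numpy as np

# Hypothetical data (an assumption for illustration)
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
distances = np.array([1.1, 2.9, 5.2, 7.1, 8.7])
n = len(times)

# Least-squares slope and intercept
slope, intercept = np.polyfit(times, distances, deg=1)

# Residual variance, with n - 2 degrees of freedom for two fitted parameters
residuals = distances - (intercept + slope * times)
residual_variance = np.sum(np.square(residuals)) / (n - 2)

# Sum of squared deviations of the times about their mean
sxx = np.sum(np.square(times - np.mean(times)))

# Textbook standard errors for a straight-line fit
standard_error_of_slope = np.sqrt(residual_variance / sxx)
standard_error_of_intercept = np.sqrt(
    residual_variance * (1.0 / n + np.mean(times) ** 2 / sxx)
)

print(f"slope = {slope:.3f} +/- {standard_error_of_slope:.3f}")
print(f"intercept = {intercept:.3f} +/- {standard_error_of_intercept:.3f}")
```

Both standard errors shrink when the residual scatter is smaller or when the times are spread over a wider range, which matches the picture of the parameter value as a center and the standard error as a spread.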

SLIDE 31

Let's practice!