Introduction to Regression Analysis: Modeling a Response



SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Introduction to Regression Analysis

Modeling a Response

A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates) x1, x2, ..., xk.

Example: bleaching cotton
Y = measured whiteness of a cotton swatch
x1 = temperature of bleaching bath
x2 = time spent in the bath

1 / 13 Introduction to Regression Analysis Modeling a Response

SLIDE 2


The average value of Y, E(Y), depends on x1, x2, ..., xk, so it is a function of them:

E(Y) = f(x1, x2, ..., xk) = f(x).

We may know the general form of f(x), but it may contain constants β0, β1, ..., βp whose values are unknown. So, more completely,

E(Y) = f(x1, x2, ..., xk; β0, β1, ..., βp) = f(x, β).

This equation is a regression model.

SLIDE 3


In any given measurement, Y will differ from E(Y). The difference

ε = Y − E(Y)

is called the random error, and clearly

E(ε) = E(Y) − E(Y) = 0.

We can then write the regression model as

Y = E(Y) + ε = f(x, β) + ε.
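The decomposition Y = f(x, β) + ε can be illustrated with a small simulation. This is only a sketch: the first-order form of f, the coefficient values, the settings x1 and x2, and the normal error distribution are all hypothetical choices made for illustration, not values from the course.

```python
import random


def mean_response(x1, x2, beta):
    """E(Y) = f(x, beta); a hypothetical first-order form chosen for illustration."""
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x2


rng = random.Random(0)        # fixed seed so the run is repeatable
beta = (20.0, 0.5, 1.5)       # hypothetical parameter values
x1, x2 = 60.0, 20.0           # hypothetical temperature and time settings

ey = mean_response(x1, x2, beta)

# Each measurement is the mean response plus a random error with mean zero.
ys = [ey + rng.gauss(0.0, 2.0) for _ in range(10_000)]

# The errors are the differences Y - E(Y); their average should be close to 0.
errors = [y - ey for y in ys]
avg_error = sum(errors) / len(errors)
print(f"E(Y) = {ey}, average error = {avg_error:.4f}")
```

Individual errors are typically a unit or two in size here, but their average is close to zero, as E(ε) = 0 suggests.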

SLIDE 4


Example: bleaching cotton

Bleaching is a chemical reaction in which colored impurities are oxidized either to colorless products, or to soluble products that are washed out.

If we knew all the reactions, their rates at various temperatures, and the solubility of the products, we could use a process-based model to predict whiteness, E(Y).

In practice, we don’t have all the details, so instead we use an empirical model.

SLIDE 5


The simplest empirical model is a linear function:

E(Y) = β0 + β1x1 + β2x2.

A quadratic model gives a better approximation:

E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2².

If β4 < 0, β5 < 0, and β3² < 4β4β5, this function has a maximum, which gives the optimum combination of temperature and time.
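As a sketch of how the optimum could be located once the coefficients are known: setting both partial derivatives of the quadratic model to zero gives two linear equations in x1 and x2, which can be solved directly. The function name and the coefficient values below are hypothetical, chosen only to illustrate the calculation.

```python
def quadratic_optimum(b1, b2, b3, b4, b5):
    """Return the (x1, x2) maximizing
    b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1**2 + b5*x2**2,
    or None when the conditions for a maximum do not hold.
    The intercept b0 does not affect the location of the maximum.
    """
    det = 4.0 * b4 * b5 - b3 ** 2
    if not (b4 < 0 and b5 < 0 and det > 0):
        return None
    # Zero partial derivatives give the linear system
    #   2*b4*x1 +   b3*x2 = -b1
    #     b3*x1 + 2*b5*x2 = -b2
    # solved here by Cramer's rule.
    x1 = (b3 * b2 - 2.0 * b5 * b1) / det
    x2 = (b3 * b1 - 2.0 * b4 * b2) / det
    return x1, x2


# Hypothetical coefficients that satisfy the maximum conditions.
optimum = quadratic_optimum(b1=4.0, b2=6.0, b3=1.0, b4=-1.0, b5=-2.0)
print(optimum)

# With a large interaction coefficient, b3**2 exceeds 4*b4*b5: no maximum.
print(quadratic_optimum(b1=4.0, b2=6.0, b3=5.0, b4=-1.0, b5=-2.0))
```

In practice the coefficients would be estimates, so the computed optimum is itself only an estimate.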

SLIDE 6


Origin of “Regression”

Francis Galton studied inheritability of physical characteristics such as height.

Consider the deviation of an individual’s height from the gender average. Suppose that the deviation height Y of a son is, on average, linearly related to the average deviation height x of his parents:

E(Y) = β0 + β1x.

SLIDE 7


The intercept β0 measures overall increase in height between generations, which is interesting but not related to inheritability.

If β1 = 1, the son inherits the full characteristic of his parents. If β1 = 0, there is no inheritability.

Galton observed β1 ≈ 2/3, and described this as a regression to the mean. (OED: from Latin regressus, from regredi ’go back, return’, from re- ’back’ + gradi ’to walk’.)
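A short worked calculation may make the interpretation of β1 concrete. The sketch below assumes β0 = 0 purely for simplicity and uses a hypothetical parental deviation of 3 units; only the slope 2/3 comes from the text.

```python
def expected_son_deviation(parent_deviation, beta0=0.0, beta1=2.0 / 3.0):
    """E(Y) = beta0 + beta1 * x for the son's deviation from average height.

    beta0 = 0 is an assumption made here for illustration only.
    """
    return beta0 + beta1 * parent_deviation


# Parents 3 units above average: the son is expected to be above average too,
# but by less than his parents (he "goes back" toward the mean).
tall = expected_son_deviation(3.0)

# Parents 3 units below average: the son is expected to be below average,
# but again closer to the mean than his parents.
short = expected_son_deviation(-3.0)

# With beta1 = 1 the full deviation would be inherited; with beta1 = 0, none.
full = expected_son_deviation(3.0, beta1=1.0)
none = expected_son_deviation(3.0, beta1=0.0)

print(tall, short, full, none)
```

Any slope strictly between 0 and 1 produces the same pattern: expected deviations shrink toward zero from one generation to the next.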

SLIDE 8


See Francis Galton, “Regression towards mediocrity in hereditary stature”. The Journal of the Anthropological Institute of Great Britain and Ireland, Vol 15, pages 246–263. (or Wikipedia!) The term “regression” has since been used for any such analysis, involving one or more variables, and involving linear and nonlinear relationships, mostly having no connection with inheritability.

SLIDE 9


Estimation

In a regression context, we sample from many populations. For example, in bleaching cotton, for each combination of temperature and time, we could test many cotton swatches. Each time, the measured whiteness is drawn from some population. The constants β0, β1, . . . , βp are parameters of that collection of populations.

SLIDE 10


We need to make inferences about the parameters β0, β1, ..., βp, in the form of:
  • point estimates;
  • interval estimates;
  • hypothesis tests.

We shall get point estimates using the method of least squares.

For other inferences, we need to know the distribution of the errors ε, and we shall assume that they are normally distributed.
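The following sketch shows one way least squares point estimates could be computed for the bleaching example, using simulated data. Everything numerical is an assumption for illustration: the first-order model, the "true" coefficients, the temperature and time settings, the 20 swatches per combination, and the error standard deviation. The estimates are obtained by solving the normal equations (X'X)β = X'y with a small hand-written solver; statistical software would normally do this step.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def least_squares(rows, ys):
    """Minimize the sum of squared errors by solving (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


rng = random.Random(1)
true_beta = [20.0, 0.5, 1.5]   # hypothetical beta0, beta1, beta2

# One population per (temperature, time) combination; 20 swatches from each.
rows, ys = [], []
for temp in (40.0, 60.0, 80.0):
    for minutes in (10.0, 20.0, 30.0):
        mean_y = true_beta[0] + true_beta[1] * temp + true_beta[2] * minutes
        for _ in range(20):
            rows.append([1.0, temp, minutes])   # leading 1 is the intercept column
            ys.append(mean_y + rng.gauss(0.0, 2.0))

beta_hat = least_squares(rows, ys)
residuals = [y - sum(b * v for b, v in zip(beta_hat, r)) for r, y in zip(rows, ys)]
print([round(b, 3) for b in beta_hat])
```

The point estimates should land near the hypothetical true values, and the residuals sum to approximately zero, which is a property of any least squares fit that includes an intercept.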

SLIDE 11


Observational and Experimental Data

In some investigations, the independent variables x1, x2, . . . , xk can be controlled; that is, held at desired values. For example, time and temperature in the bleaching problem. The resulting data are called experimental.

SLIDE 12


In other cases, the independent variables cannot be controlled, and their values are simply observed. For example, Galton’s heights of parents and sons. The resulting data are called observational.

Observational data show how the value of the response is associated with values of the independent variables, but generally cannot reveal cause and effect.

George Box: “To find out what happens to a system when you interfere with it, you have to interfere with it (not just passively observe it).”
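A small simulation can illustrate why association in observational data need not reflect cause and effect. This is a constructed example, not one from the course: a hypothetical hidden variable z drives both x and Y, while x itself has no effect on Y at all. All names, coefficients, and sample sizes are assumptions.

```python
import random


def slope(xs, ys):
    """Least squares slope of y on x for a straight-line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2)
n = 4000

# Hidden variable that affects the response; x never enters the response.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Observational data: x is not controlled and happens to track z.
x_obs = [zi + rng.gauss(0.0, 1.0) for zi in z]
y_obs = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]

# Experimental data: x is set by the experimenter, independently of z.
x_exp = [rng.choice((-1.0, 0.0, 1.0)) for _ in range(n)]
y_exp = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]

slope_obs = slope(x_obs, y_obs)
slope_exp = slope(x_exp, y_exp)
print(f"observational slope: {slope_obs:.3f}")
print(f"experimental slope:  {slope_exp:.3f}")
```

The observational slope is clearly nonzero even though changing x would do nothing to Y, while the experimental slope is close to zero because the experimenter's choice of x is unrelated to z.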

SLIDE 13


Random Thoughts About Statistical Models

A model is a simplified representation of reality.

George Box: “All models are wrong, but some are useful.”

John Tukey: “An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem.”

Albert Einstein: “For every complex question there is a simple and wrong solution.”
