VOCABULARY: Scatter plot It is a graph composed of points - PowerPoint PPT Presentation
DAY 49: CAUSATION AND CORRELATION
INTRODUCTION
A linear relationship between two variables is often identified using linear regression, and the strength of that linear relationship is measured by the correlation coefficient. Sometimes, however, the correlation coefficient is strong even though the change in the response (dependent) variable is not actually caused by the change in the independent variable. In such cases, we try to explain why the correlation arises and to identify the variable that is really causing the response. In this lesson, we explain why such cases occur.
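The lesson does not state which formulas it uses. For reference, the usual textbook definitions of the least-squares line and the Pearson correlation coefficient are given below, for $n$ data pairs $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$.

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

\hat{y} = a + b\,x, \qquad b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x}

r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad -1 \le r \le 1, \qquad R^2 = r^2 \ \text{(simple linear fit)}
```

A value of $r$ near $+1$ or $-1$ means the points lie close to a straight line; it says nothing about which variable, if either, causes the other.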
VOCABULARY:
Scatter plot: A graph composed of points representing the relationship between data on two related variables.
Line of best fit: The line in a scatter plot that best represents the plotted points.
Causation: A relationship in which the change in the response variable is caused by the change in the independent variable.
Correlation: A measure of how strongly two variables vary together, either in the same direction (directly) or in opposite directions (inversely).
CORRELATION AND CAUSATION
Correlation describes how two variables vary together, and the correlation coefficient measures the strength of that joint variation. When two variables consistently increase together, the correlation is strong and positive, whether or not one of them affects the other. Likewise, when one variable consistently decreases as the other increases, the correlation is strong and negative, again whether or not one affects the other.
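The correlation coefficient described above can be computed in a few lines. This is a minimal sketch in Python using the standard Pearson formula; the function name `pearson_r` and the sample values are hypothetical, chosen only to show one strong positive and one strong negative correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical values: both variables rise together (strong positive correlation).
rising = [1, 2, 3, 4, 5]
also_rising = [2.1, 3.9, 6.2, 8.0, 9.8]
# Hypothetical values: one falls as the other rises (strong negative correlation).
falling = [10.0, 8.1, 5.9, 4.2, 2.0]

print(round(pearson_r(rising, also_rising), 3))
print(round(pearson_r(rising, falling), 3))
```

Neither result says anything about cause: the function only measures how closely the paired values follow a straight line.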
As long as a change in the independent variable does not cause a significant change in the response variable, the relationship is not causal; that is, there is no causation. Causation occurs only if a change in one variable (the independent variable) leads to a change in another variable (the dependent variable).
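To make the distinction concrete, here is a minimal sketch with entirely hypothetical numbers: a hidden variable `z` drives both `x` and `y`, so `x` and `y` are strongly correlated even though neither causes the other. Changing `x` directly while `z` stays fixed leaves `y` unchanged, which is what the definition of causation rules out. The rules in `make_x`, `make_y` and `make_y_causal` are assumptions made purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical hidden (confounding) variable, plus small fixed "noise" terms.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
noise_y = [-0.5, 0.4, 0.0, 0.3, -0.2, 0.5, -0.3, 0.1]

def make_x(zs):
    # x is computed from z only (hypothetical rule).
    return [2 * v + 1 + e for v, e in zip(zs, noise_x)]

def make_y(zs):
    # y is computed from z only; x never enters the formula (hypothetical rule).
    return [3 * v + 5 + e for v, e in zip(zs, noise_y)]

x = make_x(z)
y = make_y(z)
print(f"r(x, y) = {pearson_r(x, y):.3f}")  # strong positive correlation

# Change x directly while z stays the same.
x_forced = [v + 10 for v in x]
y_after = make_y(z)  # y still depends only on z
print("y changed after forcing x?", y_after != y)  # prints: ... False

# Contrast: if y were computed from x (x causes y), forcing x would change y.
def make_y_causal(xs):
    return [1.5 * v + 3.5 for v in xs]

print("causal y changed?", make_y_causal(x_forced) != make_y_causal(x))
```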
EXAMPLE
A teacher would like to know the relationship between age and proficiency in athletics. To find out, he selects a group of students and measures their proficiency in athletics by the time each takes to complete one lap around a field. Scores (%) were awarded according to the table below.
Finishing time (min)   Score (%)
1.0-1.2                95
1.3-1.5                90
1.6-1.8                85
1.9-2.1                80
2.2-2.4                75
2.5-2.7                70
2.8-3.0                65
3.1-3.3                60
3.4-3.6                55
3.7-3.9                50

Based on the scores above, the following data were collected:

Age (yrs)   10  11  11  11  12  12  13  15  16
Score (%)   60  65  65  65  70  70  75  85  95
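The scoring rule in the first table is a lookup from a finishing-time band to a score, with each 0.3-minute band worth 5 percentage points less than the one before. A minimal sketch, assuming finishing times are recorded to the nearest 0.1 min (the table leaves gaps such as 1.2 to 1.3) and that times outside 1.0-3.9 min receive no score; the names `BANDS` and `score_for_time` are hypothetical.

```python
# (lower bound, upper bound, score %) for each band, copied from the table.
BANDS = [
    (1.0, 1.2, 95), (1.3, 1.5, 90), (1.6, 1.8, 85), (1.9, 2.1, 80),
    (2.2, 2.4, 75), (2.5, 2.7, 70), (2.8, 3.0, 65), (3.1, 3.3, 60),
    (3.4, 3.6, 55), (3.7, 3.9, 50),
]

def score_for_time(minutes):
    """Return the score (%) for a finishing time, or None if it falls outside every band."""
    t = round(minutes, 1)  # assumption: times are recorded to 0.1 min
    for low, high, score in BANDS:
        if low <= t <= high:
            return score
    return None

print(score_for_time(1.1))  # 95
print(score_for_time(2.9))  # 65
print(score_for_time(4.5))  # None
```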
1. Draw a scatter plot.
2. Determine the equation of the line of best fit.
3. Determine the correlation coefficient.
4. Explain your answer above and whether it reflects the real cause of proficiency in athletics.
SOLUTION
Plotting the points in the table above gives the following graph:
[Figure: scatter plot of Score (%) against Age (yrs), with line of best fit y = 5.4232x + 6.2399 and R² = 0.9959]
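The line of best fit and the correlation coefficient asked for above can be checked by computing them directly from the nine (age, score) pairs in the data table. A minimal sketch in plain Python, with no plotting library assumed; `fit_line` is a hypothetical helper. Fitting these nine tabulated points gives a slope of about 5.57, an intercept of about 3.49 and r ≈ 0.994 (R² ≈ 0.988), which is close to, but not identical with, the trendline label on the chart; the chart may have been produced from slightly different data.

```python
import math

ages = [10, 11, 11, 11, 12, 12, 13, 15, 16]    # from the data table
scores = [60, 65, 65, 65, 70, 70, 75, 85, 95]  # score (%) for each student

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r

slope, intercept, r = fit_line(ages, scores)
print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")
```

A value of r this close to 1 only says the points lie near a straight line; it does not, by itself, show that age causes the higher scores.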
From the analysis above, the correlation between the two variables is very strong and positive. This suggests that proficiency in athletics increases with age. However, that conclusion is not justified, since age is not the major determinant of proficiency in athletics; rather, a person's genetic make-up and the length of time they have trained are. Older students may simply have trained for longer. Therefore, age and proficiency in athletics do not have a causal relationship. In this case, correlation does not imply causation.
HOMEWORK
1.
Identify two variables where correlation implies causation
2.
Identify two variables where correlation does not imply causation.
HOMEWORK ANSWERS
1.
These should be two variables where a change in one directly produces a response in the other, for example, the distance covered and the time taken.
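As a small illustration of this answer, assume a hypothetical runner moving at a constant 5 m/s. The distance covered is then determined by the time taken (d = v t), so the two are perfectly correlated and the relationship is also causal: running for longer is what produces the extra distance. The speed, the times and the helper `pearson_r` are all hypothetical.

```python
import math

SPEED = 5.0  # hypothetical constant speed, metres per second

times = [10, 20, 30, 40, 50]            # seconds (hypothetical)
distances = [SPEED * t for t in times]  # metres, from d = v * t

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(distances)                            # [50.0, 100.0, 150.0, 200.0, 250.0]
print(round(pearson_r(times, distances), 6))  # 1.0: a perfect linear relationship
```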
2.