

Bivariate Data

Marc H. Mehlman

marcmehlman@yahoo.com

University of New Haven

Marc Mehlman (University of New Haven) Bivariate Data 1 / 36


Table of Contents

1. Bivariate Data
2. Scatterplots
3. Correlation
4. Two–Way Tables
5. Chapter #2 R Assignment


Bivariate Data


Bivariate data comes from measuring two aspects of the same item/individual. For instance, (70, 178), (72, 192), (74, 184), (68, 181) is a random sample of size four obtained from four male college students. The bivariate data gives the height in inches and the weight in pounds of each of the four students. The third student sampled is 74 inches tall and weighs 184 pounds. Can one variable be used to predict the other? Do tall people tend to weigh more?

Definition: A response (or dependent) variable measures the outcome of a study. The explanatory (or independent) variable is the one that predicts the response variable.


Scatterplots


Student   Beers   BAC
      1       5   0.100
      2       2   0.030
      3       9   0.190
      4       8   0.120
      5       3   0.040
      6       7   0.095
      7       3   0.070
      8       5   0.060
      9       3   0.020
     10       5   0.050
     11       4   0.070
     12       6   0.100
     13       5   0.085
     14       7   0.090
     15       1   0.010
     16       4   0.050

Here we have two quantitative variables recorded for each of 16 students:

  1. how many beers they drank
  2. their resulting blood alcohol content (BAC)

Bivariate data: For each individual studied, we record data on two variables. We then examine whether there is a relationship between these two variables: do changes in one variable tend to be associated with specific changes in the other variable?



A scatterplot is used to display quantitative bivariate data. Each variable makes up one axis. Each individual is a point on the graph.
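A minimal sketch of that idea in Python (the slides themselves use R): each individual from the beers/BAC table above becomes one (x, y) point on the graph.

```python
# Sketch: the beers/BAC table as scatterplot points (Python for illustration;
# the slides use R). Each student contributes one (beers, bac) point.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

points = list(zip(beers, bac))   # one point per individual
print(len(points))               # 16 students, 16 points

# To draw the plot (if matplotlib is installed):
# import matplotlib.pyplot as plt
# plt.scatter(beers, bac); plt.xlabel("Beers"); plt.ylabel("BAC"); plt.show()
```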



> plot(trees$Girth~trees$Height,main="girth vs height")

[Scatterplot “girth vs height”: trees$Girth (vertical axis, 8–20) versus trees$Height (horizontal axis, 65–85).]



How to scale a scatterplot

Same data in all four plots. Both variables should be given a similar amount of space:

  • Plot is roughly square
  • Points should occupy all the plot space (no blank space)


Interpreting scatterplots

After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for:

  • Form: linear, curved, clusters, no pattern
  • Direction: positive, negative, no direction
  • Strength: how closely the points fit the “form”
  • … and clear deviations from that pattern
  • Outliers of the relationship


Form

[Example scatterplots: linear, nonlinear, and no relationship.]


Direction

Positive association: High values of one variable tend to occur together with high values of the other variable.

Negative association: High values of one variable tend to occur together with low values of the other variable.


Strength

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.



Outliers

An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.


Adding categorical variables to scatterplots

Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph.

The graph compares the association between thorax length and longevity of male fruit flies that are allowed to reproduce (green) or not (purple). The pattern is similar in both groups (linear, positive association), but male fruit flies not allowed to reproduce tend to live longer than reproducing male fruit flies of the same size.

Correlation


Definition: Given the bivariate data (x_1, y_1), …, (x_n, y_n), the sample correlation coefficient (sample Pearson product-moment correlation coefficient) is

r = \frac{1}{n-1} \sum_{j=1}^{n} \left( \frac{x_j - \bar{x}}{s_x} \right) \left( \frac{y_j - \bar{y}}{s_y} \right).

The population correlation coefficient is denoted

\rho = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \mu_X}{\sigma_X} \right) \left( \frac{y_j - \mu_Y}{\sigma_Y} \right),

where the above sum is summed over the entire population of size N.

One thinks of r as an estimator of ρ.


One can also use the formula

r = \frac{n \sum_{j=1}^{n} x_j y_j - \left( \sum_{j=1}^{n} x_j \right) \left( \sum_{j=1}^{n} y_j \right)}{\sqrt{n \sum_{j=1}^{n} x_j^2 - \left( \sum_{j=1}^{n} x_j \right)^2} \, \sqrt{n \sum_{j=1}^{n} y_j^2 - \left( \sum_{j=1}^{n} y_j \right)^2}}

R command:
> cor(trees$Girth,trees$Height)
[1] 0.5192801




The correlation coefficient measures the strength of any linear relationship between X and Y.

Properties of Correlation:
  • cor(X, Y) = cor(Y, X).
  • −1 ≤ r ≤ 1, and r is scale invariant.
  • If r is positive, there is a positive linear relationship between the two variables.
  • If r is negative, there is a negative linear relationship between the two variables.
  • The closer |r| is to one, the stronger the linear relationship between the two variables.
  • If |r| = 1 (i.e., r = 1 or −1), all the data points lie on a straight line.




r has no unit

[Scatterplot of standardized values: standardized value of x (unitless) versus standardized value of y (unitless), r = −0.75.]


r is not resistant to outliers

Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers.

Just moving one point away from the linear pattern here weakens the correlation from −0.91 to −0.75 (closer to zero).



Caution: Correlation is not Causation

Definition: When calculating correlation, a lurking variable is a third factor that explains the relationship between the two correlated variables.

Example (Lurking Variables):
  • There is a strong correlation between shoe size and reading skills among elementary school children. The lurking variable is · · ·
  • There is a strong correlation between the number of firefighters at a fire site and the amount of damage. The lurking variable is · · ·

Caution: Beware correlations based on averaged data. While there is a strong correlation between average age and average height among children, the correlation between age and height for individual children is much, much lower.



Definition: Two variables are confounded when their effects on the response variable cannot be distinguished from each other. The confounded variables can be either explanatory or lurking variables (or only work in the presence of each other). The only way to distinguish between two confounded variables is to redesign the experiment.

Example: When I’m stressed, I get muscle cramps. However, when I’m stressed, I also drink lots of coffee and lose sleep. Are the cramps caused by stress, or coffee, or lack of sleep, or some combination of the above?

Example: A classic example of confounding: a study suggests that people who carry matches are more likely to develop lung cancer. Is it the matches, or is there confounding with a lurking variable?




Establishing causation

Establishing causation from an observed association can be done if:
  1. The association is strong.
  2. The association is consistent.
  3. Higher doses are associated with stronger responses.
  4. The alleged cause precedes the effect.
  5. The alleged cause is plausible.

Lung cancer is clearly associated with smoking. What if a genetic mutation (lurking variable) caused people to both get lung cancer and become addicted to smoking? It took years of research and accumulated indirect evidence to reach the conclusion that smoking causes lung cancer.

Two–Way Tables


Given two random variables (the row variable and the column variable) that are categorical, data can be organized into an r × c two–way table. The number of categories for the row variable is r and the number of categories for the column variable is c. The grand total is the total number of bivariate data points considered. For instance, noting an individual’s gender (the column variable) and their perceived chance of getting rich (the row variable), one gets the following 5 × 2 two–way table:

                               Female   Male
Almost no chance                   96     98
Some chance, but probably not     426    286
A 50–50 chance                    696    720
A good chance                     663    758
Almost certain                    486    597

The (i, j)th cell corresponds to a tally of all the individuals who gave the ith answer to the row-variable question and the jth answer to the column-variable question.



Given a two–way table, marginal distributions are just the distributions of the row and column random variables. The adjective “marginal” comes from adding row and column totals to the two–way table to make it easy to calculate the row and column distributions. For instance:

                               Female   Male   Total
Almost no chance                   96     98     194
Some chance, but probably not     426    286     712
A 50–50 chance                    696    720   1,416
A good chance                     663    758   1,421
Almost certain                    486    597   1,083
Total                           2,367  2,459   4,826

allows us to see that the distribution of the row variable is 194/4,826, 712/4,826, 1,416/4,826, 1,421/4,826, 1,083/4,826. The distribution of the column variable is 2,367/4,826, 2,459/4,826.
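The same arithmetic can be sketched in Python (for illustration; the slides use R), dividing each row and column total by the grand total:

```python
# Sketch: marginal distributions from the two-way table above.
counts = {  # answer: (Female, Male)
    "Almost no chance":              (96, 98),
    "Some chance, but probably not": (426, 286),
    "A 50-50 chance":                (696, 720),
    "A good chance":                 (663, 758),
    "Almost certain":                (486, 597),
}

grand = sum(f + m for f, m in counts.values())       # 4826

# Row marginal: each row total over the grand total.
row_marginal = {k: (f + m) / grand for k, (f, m) in counts.items()}

# Column marginal: each column total over the grand total.
col_marginal = {
    "Female": sum(f for f, _ in counts.values()) / grand,   # 2,367/4,826
    "Male":   sum(m for _, m in counts.values()) / grand,   # 2,459/4,826
}
```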



The joint distribution of two categorical random variables is the proportion of the grand total that corresponds to each joint result. For instance, the joint distribution from the previous example is:

                               Female       Male
Almost no chance                96/4,826    98/4,826
Some chance, but probably not  426/4,826   286/4,826
A 50–50 chance                 696/4,826   720/4,826
A good chance                  663/4,826   758/4,826
Almost certain                 486/4,826   597/4,826
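A Python sketch of the joint distribution (for illustration; the slides use R): each cell count is divided by the grand total.

```python
# Sketch: the joint distribution - each cell count over the grand total.
counts = {  # (row answer, column answer): count
    ("Almost no chance", "Female"): 96,
    ("Almost no chance", "Male"): 98,
    ("Some chance, but probably not", "Female"): 426,
    ("Some chance, but probably not", "Male"): 286,
    ("A 50-50 chance", "Female"): 696,
    ("A 50-50 chance", "Male"): 720,
    ("A good chance", "Female"): 663,
    ("A good chance", "Male"): 758,
    ("Almost certain", "Female"): 486,
    ("Almost certain", "Male"): 597,
}

grand = sum(counts.values())                       # 4826
joint = {cell: n / grand for cell, n in counts.items()}
```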


The conditional distribution of one of the random variables, given that the other random variable takes on a particular value, is the proportion of outcomes the first random variable takes on among the cases where the other random variable has that particular value. For instance, given:

                               Female   Male   Total
Almost no chance                   96     98     194
Some chance, but probably not     426    286     712
A 50–50 chance                    696    720   1,416
A good chance                     663    758   1,421
Almost certain                    486    597   1,083
Total                           2,367  2,459   4,826

the conditional distribution of the row variable given that the column variable equals “Male” is 98/2,459, 286/2,459, 720/2,459, 758/2,459, 597/2,459. Similarly, the conditional distribution of the column variable given that the row variable equals “A good chance” is 663/1,421, 758/1,421.
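The first of these conditional distributions, sketched in Python (for illustration; the slides use R): each Male count is divided by the Male column total.

```python
# Sketch: conditional distribution of the row variable given "Male" -
# each Male count over the Male column total (2,459).
male = {
    "Almost no chance": 98,
    "Some chance, but probably not": 286,
    "A 50-50 chance": 720,
    "A good chance": 758,
    "Almost certain": 597,
}

male_total = sum(male.values())                    # 2459
cond_given_male = {k: n / male_total for k, n in male.items()}
```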



Simpson’s Paradox

When studying the relationship between two variables, there may exist a lurking variable that creates a reversal in the direction of the relationship: the direction when the lurking variable is ignored is opposite to the direction when the lurking variable is considered. The lurking variable creates subgroups, and failure to take these subgroups into consideration can lead to misleading conclusions regarding the association between the two variables. An association or comparison that holds for each of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.


Consider the acceptance rates for the following groups of men and women who applied to college. A higher percentage of men were accepted: is there evidence of discrimination?

[Table: overall acceptance rates for men and women.]


Consider the acceptance rates when broken down by type of school.

[Tables: acceptance rates within the Business School and within the Art School.]


Lurking variable: Applications were split between the Business School (240) and the Art School (320). Within each school a higher percentage of women were accepted than men. There is not any discrimination against women! This is an example of Simpson’s Paradox: when the lurking variable (type of school: Business or Art) is ignored, the data seem to suggest discrimination against women. However, when the type of school is considered, the association is reversed and suggests discrimination against men.
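The reversal can be reproduced with a small Python sketch. The counts below are hypothetical, chosen only to match the slide's application totals (240 Business School, 320 Art School); they are not the slide's actual table.

```python
# Sketch: Simpson's paradox in miniature. Hypothetical counts matching the
# slide's totals (Business School 240 applicants, Art School 320 applicants).
apps = {  # (school, gender): (applied, accepted)
    ("Business", "Men"):   (40, 6),     # 15% accepted
    ("Business", "Women"): (200, 40),   # 20% accepted
    ("Art", "Men"):        (200, 160),  # 80% accepted
    ("Art", "Women"):      (120, 102),  # 85% accepted
}

def rate(pairs):
    """Overall acceptance rate for a list of (applied, accepted) pairs."""
    applied = sum(a for a, _ in pairs)
    accepted = sum(c for _, c in pairs)
    return accepted / applied

men_overall = rate([v for (s, g), v in apps.items() if g == "Men"])
women_overall = rate([v for (s, g), v in apps.items() if g == "Women"])

# Women do better within each school, yet men do better overall.
print(men_overall > women_overall)  # True
```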

Chapter #2 R Assignment

1. Create a scatterplot of weight versus quarter mile times for the dataset “mtcars”. Assume the independent variable is the quarter mile times and the dependent variable is the weight.

2. Find the correlation of weight versus quarter mile times for the dataset “mtcars”.
