 
Probability and Statistics for Computer Science
"Correlation is not Causation" but Correlation is so beautiful! (Credit: Wikipedia)
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020
• Please use the "#" sign in your chat message to indicate a formal question or comment.
• Please keep your mic muted for better sound quality on Zoom.
• Please check out the websites in the chat and the code & simulation in the Notebook.
Last time
• Location parameters: mode, mean ($\mu$), median
• Scale parameters: interquartile range (iqr), standard deviation ($\sigma$), variance ($\sigma^2$)
• Data: standardizing, $x \to \hat{x}$
Objectives
• Median, interquartile range, box plot and outliers
• Scatter plots, correlation coefficient, heatmap, 3D bar, time series plots
• Visualizing & summarizing relationships
Median
• To organize the data, we first sort it.
• Then, if the number of items N is odd, median = the middle item's value; if N is even, median = the mean of the two middle items' values.
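The two-case rule above translates directly into code. This is a minimal sketch; the function name `median` is our own choice, and it assumes a non-empty list of numbers.

```python
def median(data):
    """Median as described above: sort, then take the middle item (odd N)
    or the mean of the two middle items (even N)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                      # odd N: the single middle item
    return (xs[mid - 1] + xs[mid]) / 2      # even N: mean of the two middle items

print(median([3, 1, 7, 5, 9]))  # odd N  -> 5
print(median([3, 1, 7, 5]))     # even N -> 4.0
```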
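Properties of Median
• Scaling data scales the median: $\mathrm{median}(\{k \cdot x_i\}) = k \cdot \mathrm{median}(\{x_i\})$
• Translating data translates the median: $\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$
• (Annotation) The median minimizes the sum of absolute deviations: $\mathrm{median} = \arg\min_{\mu} \sum_i |x_i - \mu|$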
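A quick numerical check of these properties, assuming NumPy is available. The sample data, the seed, and the values of `k` and `c` are arbitrary choices for illustration; `k` is negative to show that the scaling rule holds for negative factors too. The last check uses a grid search to confirm the related fact that the median minimizes $\sum_i |x_i - \mu|$.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the check is reproducible
x = rng.normal(size=101)            # hypothetical sample data (odd N)
k, c = -3.0, 10.0                   # arbitrary scale and shift

# Scaling: median({k * x_i}) = k * median({x_i})
print(np.isclose(np.median(k * x), k * np.median(x)))   # True
# Translating: median({x_i + c}) = median({x_i}) + c
print(np.isclose(np.median(x + c), np.median(x) + c))   # True

# The median minimizes the sum of absolute deviations: evaluate the cost on a grid.
candidates = np.linspace(x.min(), x.max(), 2001)
cost = np.abs(x[:, None] - candidates[None, :]).sum(axis=0)
best = candidates[cost.argmin()]
print(abs(best - np.median(x)) <= (x.max() - x.min()) / 1000)  # True
```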
Percentile
• The kth percentile is the value relative to which k% of the data items have smaller or equal values.
• The median is roughly the 50th percentile.
• Example: for the data {1, 2, 3, 4, 5, 6, 7}, what is the 75th percentile? (Annotated answer: 6)
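One way to turn this definition into code is to return the smallest data value such that at least k% of the items are smaller than or equal to it. This is a sketch under that reading; the function name `percentile` is hypothetical, and other percentile conventions exist.

```python
import math

def percentile(data, k):
    """Smallest data value such that at least k% of the items are <= it."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("percentile of empty data is undefined")
    need = max(1, math.ceil(k * n / 100))   # how many items must be <= the answer
    return xs[min(need, n) - 1]

data = [1, 2, 3, 4, 5, 6, 7]
print(percentile(data, 75))  # 6: six of the seven items (about 86%) are <= 6
print(percentile(data, 50))  # 4: the median
```

Note that NumPy's `np.percentile` uses linear interpolation by default and returns 5.5 for the 75th percentile of this data; passing `method="inverted_cdf"` (NumPy 1.22 or later) selects a convention matching the definition used here.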
Interquartile range
• iqr = (75th percentile) − (25th percentile)
• Scaling data scales the interquartile range: $\mathrm{iqr}(\{k \cdot x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$
• Translating data does NOT change the interquartile range: $\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$
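The iqr and its two properties can be checked numerically. This sketch assumes NumPy and its default (linear interpolation) percentile convention; the helper name `iqr` and the data values are illustrative only.

```python
import numpy as np

def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile
    (NumPy's default linear-interpolation percentiles)."""
    q75, q25 = np.percentile(data, [75, 25])
    return q75 - q25

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0])  # hypothetical data
k, c = -2.5, 100.0

print(iqr(x))                                   # 5.25
print(np.isclose(iqr(k * x), abs(k) * iqr(x)))  # True: scaling multiplies iqr by |k|
print(np.isclose(iqr(x + c), iqr(x)))           # True: translation leaves iqr unchanged
```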
Box plots
[Figure: box plots of vehicle deaths by region]
• Boxplots are simpler than histograms
• Good for showing outliers
• Easier to use for comparison
Data from https://www2.stetson.edu/~jrasp/data.htm
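Boxplot details, outliers
• How to define outliers? (the default) A data point is an outlier if it lies more than 1.5 iqr above the 75th percentile or more than 1.5 iqr below the 25th percentile.
• Box: spans the interquartile range (iqr), with a line at the median.
• Whiskers: extend to the most extreme data points within 1.5 iqr of the box.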
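The default outlier rule can be sketched as follows, assuming NumPy's default percentile convention for the quartiles. The function name `boxplot_outliers` and the example data are hypothetical; the parameter name `whis` is borrowed from matplotlib's `boxplot`, whose default whisker factor is also 1.5.

```python
import numpy as np

def boxplot_outliers(data, whis=1.5):
    """Return points more than whis * iqr below the 25th percentile
    or above the 75th percentile."""
    data = np.asarray(data, dtype=float)
    q25, q75 = np.percentile(data, [25, 75])
    iqr = q75 - q25
    low, high = q25 - whis * iqr, q75 + whis * iqr   # the "fences"
    return data[(data < low) | (data > high)]

print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100.]
```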
Q. TRUE or FALSE: the mean is more sensitive to outliers than the median.
A. True  B. False
(Answer: A, True)
Q. TRUE or FALSE: the interquartile range is more sensitive to outliers than the standard deviation.
A. True  B. False
(Answer: B, False)
Sensitivity of summary statistics to outliers
• The mean and standard deviation are very sensitive to outliers.
• The median and interquartile range are not sensitive to outliers.
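A small experiment illustrates the difference: replace the largest value of a sample with one extreme outlier and compare the four summaries. This sketch assumes NumPy; the data and the helper name `summary` are made up for illustration.

```python
import numpy as np

def summary(x):
    """Return (mean, std, median, iqr) of a 1-D sample."""
    q75, q25 = np.percentile(x, [75, 25])
    return np.mean(x), np.std(x), np.median(x), q75 - q25

clean = np.arange(1.0, 10.0)   # 1, 2, ..., 9
dirty = clean.copy()
dirty[-1] = 900.0              # replace the largest value with an outlier

for name, x in [("clean", clean), ("with outlier", dirty)]:
    m, s, med, q = summary(x)
    print(f"{name:>12}: mean={m:7.2f} std={s:7.2f} median={med:.1f} iqr={q:.1f}")
```

The mean jumps from 5 to 104 and the standard deviation grows by roughly two orders of magnitude, while the median (5) and iqr (4) do not change.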
Modes
• Modes are peaks in a histogram.
• If there is more than one mode, we should be curious as to why.
Multiple modes
• We have seen the "iris" data, which appears to have several peaks.
[Figure: histogram of the iris data]
Data: "iris" in R
Example: bimodal distribution
• Modes may indicate multiple populations.
[Figure: red blood cell (erythrocyte) data]
Data: Erythrocyte cells in healthy humans, Piagnerelli, JCP 2007
Tails and Skews
[Figure: histograms illustrating tails, outliers, and skew]
Credit: Prof. Forsyth
Q. How is this skewed?
A. Left  B. Right
(Annotation: mean = 46, median = 47)
Looking at relationships in data
• Finding relationships between features in a data set, or across many data sets, is one of the most important tasks in data analysis.
Relationship between data features
• Example: does the weight of people relate to their height?
• x: HEIGHT, y: WEIGHT
Scatter plot
• Body Fat data set
Scatter plot
• Scatter plot with density
Scatter plot
• Outliers removed & data standardized
Correlation
[Annotation: covariance]
Correlation seen from scatter plots
[Figure: three scatter plots showing zero correlation, positive correlation, and negative correlation]
Credit: Prof. Forsyth
What kind of correlation?
• Lines of code in a database and number of bugs
• Frequency of hand washing and number of germs on your hands
• GPA and hours spent playing video games
• Earnings and happiness
Credit: Prof. David Varodayan
Correlation doesn't mean causation
• Shoe size is correlated with reading skills, but that doesn't mean making feet grow will make a person read faster.
Correlation Coefficient
• Given a data set consisting of items $\{(x_i, y_i)\}$: $(x_1, y_1), \ldots, (x_N, y_N)$
• Standardize the coordinates of each feature:
$\hat{x}_i = \dfrac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \quad \hat{y}_i = \dfrac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$
• Define the correlation coefficient as:
$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$
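Correlation Coefficient
$\hat{x}_i = \dfrac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \quad \hat{y}_i = \dfrac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$
$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$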
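The definition can be computed in a few lines: standardize each feature, multiply elementwise, and average. This is a sketch assuming NumPy; `np.std` uses the 1/N definition by default (`ddof=0`), and the height/weight numbers are invented for illustration. It assumes neither feature is constant, since a zero std would make standardization undefined.

```python
import numpy as np

def standardize(v):
    """Standard coordinates: subtract the mean, divide by the (1/N) std."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def corr(x, y):
    """corr({(x_i, y_i)}) = (1/N) * sum_i xhat_i * yhat_i = mean({xhat_i * yhat_i})."""
    return np.mean(standardize(x) * standardize(y))

# Hypothetical heights (inches) and weights (pounds), for illustration only.
height = np.array([62, 65, 67, 70, 72, 75])
weight = np.array([120, 140, 150, 165, 180, 200])
print(corr(height, weight))
print(np.corrcoef(height, weight)[0, 1])   # NumPy's built-in gives the same value
```

Converting to centimeters and kilograms leaves the result unchanged, which is one way to see that the coefficient is unitless.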
Q: Correlation Coefficient
• Which of the following correctly describe(s) the correlation coefficient?
A. It's unitless
B. It's defined in standard coordinates
C. Both A & B
$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$
A visualization of the correlation coefficient: https://rpsychologist.com/d3/correlation/
In a data set consisting of items $\{(x_i, y_i)\}$: $(x_1, y_1), \ldots, (x_N, y_N)$,
• $\mathrm{corr}(\{(x_i, y_i)\}) > 0$ shows positive correlation
• $\mathrm{corr}(\{(x_i, y_i)\}) < 0$ shows negative correlation
• $\mathrm{corr}(\{(x_i, y_i)\}) = 0$ shows no correlation
The Properties of Correlation Coefficient
• The correlation coefficient is symmetric: $\mathrm{corr}(\{(x_i, y_i)\}) = \mathrm{corr}(\{(y_i, x_i)\})$
• Translating the data does NOT change the correlation coefficient
The Properties of Correlation Coefficient
• Scaling the data may change the sign of the correlation coefficient (for $a, c \neq 0$):
$\mathrm{corr}(\{(a x_i + b, c y_i + d)\}) = \mathrm{sign}(ac)\, \mathrm{corr}(\{(x_i, y_i)\})$
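These properties are easy to confirm numerically. The sketch below assumes NumPy, generates hypothetical correlated data with a fixed seed, and picks `a` and `c` with opposite signs so the sign flip is visible.

```python
import numpy as np

def corr(x, y):
    """Correlation coefficient via standard coordinates (1/N std)."""
    xh = (x - x.mean()) / x.std()
    yh = (y - y.mean()) / y.std()
    return np.mean(xh * yh)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(size=200)        # hypothetical correlated data
r = corr(x, y)

a, b, c, d = -2.0, 1.0, 3.0, -4.0         # a * c < 0, so the sign should flip
print(np.isclose(corr(y, x), r))                                    # True: symmetric
print(np.isclose(corr(x + 5, y - 3), r))                            # True: translation-invariant
print(np.isclose(corr(a * x + b, c * y + d), np.sign(a * c) * r))   # True: sign(ac) * r
```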
The Properties of Correlation Coefficient
• The correlation coefficient is bounded within [−1, 1]
• $\mathrm{corr}(\{(x_i, y_i)\}) = 1$ if and only if $\hat{x}_i = \hat{y}_i$ for all $i$
• $\mathrm{corr}(\{(x_i, y_i)\}) = -1$ if and only if $\hat{x}_i = -\hat{y}_i$ for all $i$
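When y is an exact increasing linear function of x, the standard coordinates coincide and the coefficient reaches 1; a decreasing linear function gives −1. This sketch assumes NumPy, and the data and coefficients are arbitrary.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

def corr(x, y):
    return np.mean(standardize(x) * standardize(y))

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
y_up = 3.0 * x + 2.0        # increasing linear relation: yhat_i == xhat_i
y_down = -0.5 * x + 4.0     # decreasing linear relation: yhat_i == -xhat_i

print(np.allclose(standardize(y_up), standardize(x)), corr(x, y_up))       # True, 1 up to rounding
print(np.allclose(standardize(y_down), -standardize(x)), corr(x, y_down))  # True, -1 up to rounding
```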
Which of the following has a correlation coefficient equal to 1?
[Figure: three scatter plots of Y against X (left, middle, right)]
A. Left and right
B. Left
C. Middle
Concept of the Correlation Coefficient's bound
• The correlation coefficient can be written as
$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \dfrac{\hat{x}_i}{\sqrt{N}} \dfrac{\hat{y}_i}{\sqrt{N}}$
• It's the inner product of the two vectors
$\left(\dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}}\right)$ and $\left(\dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}}\right)$
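Inner product
• Inner product's geometric meaning: $\nu_1 \cdot \nu_2 = |\nu_1|\,|\nu_2| \cos(\theta)$, where $\theta$ is the angle between $\nu_1$ and $\nu_2$
• The lengths of both vectors
$\nu_1 = \left(\dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}}\right), \quad \nu_2 = \left(\dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}}\right)$
are 1, because $|\nu_1|^2 = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i^2 = \mathrm{mean}(\{\hat{x}_i^2\}) = 1$ (standard coordinates have mean 0 and variance 1), and likewise for $\nu_2$.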
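Bound of correlation coefficient
$|\mathrm{corr}(\{(x_i, y_i)\})| = |\nu_1 \cdot \nu_2| = |\cos(\theta)| \le 1$
where $\nu_1 = \left(\dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}}\right)$ and $\nu_2 = \left(\dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}}\right)$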
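The geometric argument can be checked directly: build the two scaled vectors, confirm they have length 1, and compare their inner product and the cosine of their angle with the correlation coefficient. This sketch assumes NumPy and uses hypothetical random data.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = x + rng.normal(size=50)               # hypothetical data

N = len(x)
xh = (x - x.mean()) / x.std()             # standard coordinates (1/N std)
yh = (y - y.mean()) / y.std()
v1 = xh / np.sqrt(N)                      # (xhat_1/sqrt(N), ..., xhat_N/sqrt(N))
v2 = yh / np.sqrt(N)

print(np.linalg.norm(v1), np.linalg.norm(v2))   # both 1 up to rounding
cos_theta = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
r = np.mean(xh * yh)
print(np.isclose(v1 @ v2, r), np.isclose(cos_theta, r))  # True True
```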
The Properties of Correlation Coefficient
• Symmetric
• Translation invariant
• Scaling can only change the sign
• Bounded within [−1, 1]
Using correlation to predict
• Caution! Correlation is NOT causation.
Credit: Tyler Vigen
How do we go about the prediction?
• Outliers removed & data standardized
Using correlation to predict
• Given a correlated data set $\{(x_i, y_i)\}$, we can predict a value $y_0^p$ that goes with a value $x_0$.
• In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$, we can predict a value $\hat{y}_0^p$ that goes with a value $\hat{x}_0$.
Q: Which coordinates will you use for the predictor using correlation?
A. Standard coordinates
B. Original coordinates
C. Either
Linear predictor and its error
• We will assume that our predictor is linear: $\hat{y}^p = a\hat{x} + b$
• We denote the prediction at each $\hat{x}_i$ in the data set as $\hat{y}_i^p = a\hat{x}_i + b$
• The error in the prediction is denoted $u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a\hat{x}_i - b$
⇒ Require the mean of the error to be zero
We require the mean of the error to be zero, so that the error is centered around 0, like the standardized data:
$\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - a\hat{x}_i - b\}) = \mathrm{mean}(\{\hat{y}_i\}) - a\,\mathrm{mean}(\{\hat{x}_i\}) - b = 0 - 0 - b = -b$
Setting $\mathrm{mean}(\{u_i\}) = 0$ gives $b = 0$.
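Require the variance of the error to be minimal
With $b = 0$ we have $\mathrm{mean}(\{u_i\}) = 0$, so
$\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\}) = \mathrm{mean}(\{(\hat{y}_i - a\hat{x}_i)^2\})$
$= \mathrm{mean}(\{\hat{y}_i^2\}) - 2a\,\mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2\,\mathrm{mean}(\{\hat{x}_i^2\}) = 1 - 2ar + a^2$
where $r = \mathrm{corr}(\{(x_i, y_i)\})$ and $\mathrm{mean}(\{\hat{x}_i^2\}) = \mathrm{mean}(\{\hat{y}_i^2\}) = 1$ in standard coordinates. Setting the derivative to zero:
$\dfrac{d\,\mathrm{var}(\{u_i\})}{da} = -2r + 2a = 0 \;\Rightarrow\; a = r$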
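The derivation can be sanity-checked numerically: for many values of a (with b = 0), compute the variance of the errors in standard coordinates, compare it with $1 - 2ar + a^2$, and confirm the minimum sits at $a = r$ with value $1 - r^2$. This sketch assumes NumPy (whose `np.var` uses 1/N by default); the data, seed, and grid are hypothetical choices, and `error_variance` is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(size=500)          # hypothetical data

xh = (x - x.mean()) / x.std()               # standard coordinates
yh = (y - y.mean()) / y.std()
r = np.mean(xh * yh)                        # correlation coefficient

def error_variance(a, b=0.0):
    """var({u_i}) for the predictor yhat^p = a * xhat + b."""
    u = yh - a * xh - b
    return np.var(u)

a_grid = np.linspace(-2, 2, 4001)           # step 0.001
variances = np.array([error_variance(a) for a in a_grid])
print(np.allclose(variances, 1 - 2 * a_grid * r + a_grid ** 2))  # True
print(a_grid[variances.argmin()], r)        # grid minimizer lies next to r
print(np.isclose(error_variance(r), 1 - r ** 2))                 # True
```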
Here is the linear predictor!
$\hat{y}^p = a\hat{x} + b$ with $a = r$ and $b = 0$, i.e. $\hat{y}^p = r\hat{x}$, where $r$ is the correlation coefficient.
Prediction Formula
• In standard coordinates: $\hat{y}_0^p = r\hat{x}_0$, where $r = \mathrm{corr}(\{(x_i, y_i)\})$
• In original coordinates: $\dfrac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r\,\dfrac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
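The prediction formula works in three steps: convert $x_0$ to standard coordinates, multiply by r, and convert back. This is a sketch assuming NumPy; `predict` is a hypothetical helper name, the data are invented, and `np.corrcoef` is used for r because it gives the same value as the standard-coordinate definition.

```python
import numpy as np

def predict(x, y, x0):
    """Predict y at x0 using
    (y0^p - mean(y)) / std(y) = r * (x0 - mean(x)) / std(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    x0_hat = (x0 - x.mean()) / x.std()    # standard coordinate of x0
    y0_hat = r * x0_hat                   # prediction in standard coordinates
    return y.mean() + y0_hat * y.std()    # back to original coordinates

height = [62, 65, 67, 70, 72, 75]         # hypothetical data
weight = [120, 140, 150, 165, 180, 200]
print(predict(height, weight, 68))
```

As a sanity check, the result coincides with the least-squares line from `np.polyfit(height, weight, 1)`, since that line's slope is $r \cdot \mathrm{std}(\{y_i\}) / \mathrm{std}(\{x_i\})$ and it passes through the point of means.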
Recommendations
More recommendations