Probability and Statistics for Computer Science: "Correlation is not Causation" - PowerPoint PPT Presentation

Probability and Statistics for Computer Science: "Correlation is not Causation" but Correlation is so beautiful! Credit: wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020


  1. Probability and Statistics for Computer Science: "Correlation is not Causation" but Correlation is so beautiful! Credit: wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020

  2. • Please use the "#" sign in your chat to indicate a formal question or comment • Please keep your mic muted for the Zoom sound quality • Please check out the websites of the code & simulation notebook in the chat

  3. Last time • Location parameters: mean ($\mu$), median, mode • Scale parameters: standard deviation ($\sigma$), variance ($\sigma^2$), interquartile range (iqr) • Standardizing data

  4. Objectives • Median, interquartile range, box plot and outlier • Scatter plots, correlation coefficient • Heatmap, 3D bar, time series plots • Visualizing & summarizing relationships

  5. Median • To organize the data we first sort it • Then, if the number of items N is odd, median = the middle item's value; if the number of items N is even, median = the mean of the middle 2 items' values
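
The odd/even rule above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course; the function name `median` and the sample lists are assumptions chosen for the example.

```python
def median(values):
    """Median by the sort-then-pick-the-middle rule."""
    ordered = sorted(values)          # organize the data by sorting it first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # N is odd: the middle item's value
        return ordered[mid]
    # N is even: the mean of the two middle items' values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])       # three items, the middle one after sorting
even_case = median([4, 1, 3, 2])   # four items, mean of the middle two
```

Python's standard library offers `statistics.median`, which follows the same rule.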

  6. Properties of Median • Scaling data scales the median: $\mathrm{median}(\{k \cdot x_i\}) = k \cdot \mathrm{median}(\{x_i\})$ • The median minimizes the total absolute deviation: $\mathrm{median} = \arg\min_c \sum_i |x_i - c|$ • Translating data translates the median: $\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$

  7. Percentile • The kth percentile is the value relative to which k% of the data items have smaller or equal values • The median is roughly the 50th percentile • Example: for the data {1, 2, 3, 4, 5, 6, 7, 12}, the 75th percentile is 6
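
One way to turn the percentile definition into code is the nearest-rank convention: return the smallest data value such that at least k% of the items are smaller than or equal to it. This convention, the function name `percentile`, and the sample data are assumptions made for this sketch; many libraries interpolate between neighbouring items instead and may give slightly different numbers.

```python
import math


def percentile(values, k):
    """k-th percentile using the nearest-rank convention (an assumption)."""
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    # Number of items that must be <= the returned value.
    rank = math.ceil(k * n / 100)
    return ordered[rank - 1]


data = [1, 2, 3, 4, 5, 6, 7, 12]   # illustrative data
p75 = percentile(data, 75)         # 6 of the 8 items (75%) are <= this value
p50 = percentile(data, 50)         # roughly the median
```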

  8. Interquartile range • iqr = (75th percentile) - (25th percentile) • Scaling data scales the interquartile range: $\mathrm{iqr}(\{k \cdot x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$ • Translating data does NOT change the interquartile range: $\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$
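
The two properties can be checked numerically. The sketch below assumes the interpolating "inclusive" quartile method of Python's `statistics.quantiles`, which is only one of several percentile conventions; the helper name `iqr`, the data, and the constants `k` and `c` are illustrative choices.

```python
import statistics


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 12]   # illustrative data
k, c = -2, 10                      # a negative scale factor and a shift

scaled = [k * x for x in data]
shifted = [x + c for x in data]

# Scaling multiplies the iqr by |k|; translating leaves it unchanged.
scaling_holds = abs(iqr(scaled) - abs(k) * iqr(data)) < 1e-9
translation_holds = abs(iqr(shifted) - iqr(data)) < 1e-9
```

The negative `k` is deliberate: it shows why the absolute value appears in the scaling property.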

  9. Box plots • Simpler than histogram • Good for outliers • Easier to use for comparison. [Figure: box plots of vehicle death by region (DEATH).] Data from https://www2.stetson.edu/~jrasp/data.htm

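  10. Boxplots details, outliers • How to define outliers? (the default) • Outlier: a data item more than 1.5 iqr beyond the box • Whisker: extends to the farthest data item within 1.5 iqr of the box • Box: spans the interquartile range (iqr), with the median marked inside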
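
The default 1.5 iqr rule can be sketched as follows. The function name `find_outliers`, the `whisker` parameter, and the sample data are assumptions for illustration, and the quartiles use the "inclusive" method of `statistics.quantiles`; plotting libraries may compute quartiles slightly differently.

```python
import statistics


def find_outliers(values, whisker=1.5):
    """Return the items lying more than `whisker` * iqr beyond the box."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1                       # the interquartile range
    low_fence = q1 - whisker * spread
    high_fence = q3 + whisker * spread
    return [v for v in values if v < low_fence or v > high_fence]


flagged = find_outliers([1, 2, 3, 4, 5, 6, 7, 12])   # 12 lies above the upper fence
none_flagged = find_outliers([1, 2, 3, 4, 5])        # nothing beyond the fences
```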

  11. Q. TRUE or FALSE: the mean is more sensitive to outliers than the median. A. True B. False (A is marked on the slide)

  12. Q. TRUE or FALSE: the interquartile range is more sensitive to outliers than std. A. True B. False (B is marked on the slide)

  13. Sensitivity of summary statistics to outliers • mean and standard deviation are very sensitive to outliers • median and interquartile range are not sensitive to outliers
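
A small experiment makes the contrast concrete: replace one item with an extreme value and compare the four summaries before and after. The data sets and the helper name `summary` are invented for this sketch, and the standard deviation is the population version (dividing by N).

```python
import statistics


def summary(values):
    """Location and scale summaries of a data set."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),    # population std (divide by N)
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


clean = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 800]    # same data, one extreme item

clean_stats = summary(clean)
outlier_stats = summary(with_outlier)
# The mean and std move a lot; the median and iqr stay where they were.
```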

  14. Modes • Modes are peaks in a histogram • If there is more than one mode, we should be curious as to why

  15. Multiple modes • We have seen the "iris" data, which looks to have several peaks. Data: "iris" in R

  16. Example: bimodal distribution • Modes may indicate multiple populations. Data: erythrocyte cells (red blood cells) in healthy humans, Piagnerelli, JCP 2007

  17. Tails and Skews (figure annotations: tails, outlier) Credit: Prof. Forsyth

  19. Q. How is this skewed? A. Left B. Right (handwritten annotation: mean = 46, median = 47)

  20. Looking at relationships in data • Finding relationships between features in a data set or many data sets is one of the most important tasks in data analysis

  21. Relationship between data features • Example: does the weight of people relate to their height? • x: HEIGHT, y: WEIGHT

  22. Scatter plot • Body Fat data set

  23. Scatter plot • Scatter plot with density

  24. Scatter plot • Outliers removed & data standardized

  25. Correlation (handwritten note: covariance, ch. 13)

  26. Correlation seen from scatter plots: zero correlation, positive correlation, negative correlation. Credit: Prof. Forsyth

  27. What kind of Correlation? • Lines of code in a database and number of bugs • Frequency of hand washing and number of germs on your hands • GPA and hours spent playing video games • Earnings and happiness. Credit: Prof. David Varodayan

  28. Correlation doesn't mean causation • Shoe size is correlated with reading skills, but that doesn't mean making feet grow will make a person read faster.

  29. Correlation Coefficient • Given a data set $\{(x_i, y_i)\}$ consisting of items $(x_1, y_1), \ldots, (x_N, y_N)$ • Standardize the coordinates of each feature: $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, $\hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$ • Define the correlation coefficient as: $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$

  30. Correlation Coefficient • $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, $\hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$ • $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$
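
The definition translates almost directly into code: standardize each feature, then take the mean of the products. The sketch below assumes the population standard deviation (dividing by N), which is what makes the mean-of-products form work; the function names `standardize` and `corr` and the sample data are illustrative.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - m) / s for v in values]


def corr(xs, ys):
    """Correlation coefficient: mean of products of standardized coordinates."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_hat = standardize(xs)
    y_hat = standardize(ys)
    return statistics.fmean([a * b for a, b in zip(x_hat, y_hat)])


r_line_up = corr([1, 2, 3, 4], [2, 4, 6, 8])     # points on a rising line
r_line_down = corr([1, 2, 3, 4], [8, 6, 4, 2])   # points on a falling line
r_noisy = corr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```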

  31. Q: Correlation Coefficient • Which of the following describe(s) the correlation coefficient correctly? A. It's unitless B. It's defined in standard coordinates C. Both A & B. $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$

  32. A visualization of correlation coefficient: https://rpsychologist.com/d3/correlation/ In a data set $\{(x_i, y_i)\}$ consisting of items $(x_1, y_1), \ldots, (x_N, y_N)$: • $\mathrm{corr}(\{(x_i, y_i)\}) > 0$ shows positive correlation • $\mathrm{corr}(\{(x_i, y_i)\}) < 0$ shows negative correlation • $\mathrm{corr}(\{(x_i, y_i)\}) = 0$ shows no correlation

  33. The Properties of Correlation Coefficient • The correlation coefficient is symmetric: $\mathrm{corr}(\{(x_i, y_i)\}) = \mathrm{corr}(\{(y_i, x_i)\})$ • Translating the data does NOT change the correlation coefficient

  34. The Properties of Correlation Coefficient • Scaling the data may change the sign of the correlation coefficient: $\mathrm{corr}(\{(a x_i + b, c y_i + d)\}) = \mathrm{sign}(ac)\,\mathrm{corr}(\{(x_i, y_i)\})$
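
Symmetry, translation invariance, and the sign rule under scaling can all be checked numerically. The data and the constants `a`, `b`, `c`, `d` below are arbitrary illustrative choices, and `corr` is a small helper written for this sketch using the population standard deviation.

```python
import statistics


def corr(xs, ys):
    """Mean of products of standardized coordinates (population std)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return statistics.fmean(
        [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    )


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = corr(xs, ys)

# Symmetric: swapping the roles of x and y changes nothing.
r_swapped = corr(ys, xs)

# Translating (b and d) has no effect; scaling (a and c) can only flip the sign.
a, b, c, d = -3.0, 7.0, 2.0, -1.0          # a * c < 0, so the sign flips
r_transformed = corr([a * x + b for x in xs], [c * y + d for y in ys])
sign_ac = 1.0 if a * c > 0 else -1.0
```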

  37. The Properties of Correlation Coefficient • The correlation coefficient is bounded within [-1, 1] • $\mathrm{corr}(\{(x_i, y_i)\}) = 1$ if and only if $\hat{x}_i = \hat{y}_i$ • $\mathrm{corr}(\{(x_i, y_i)\}) = -1$ if and only if $\hat{x}_i = -\hat{y}_i$

  38. Which of the following has correlation coefficient equal to 1? (three scatter plots: left, middle, right) A. Left and right B. Left C. Middle

  39. Concept of Correlation Coefficient's bound • The correlation coefficient can be written as $\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \frac{\hat{x}_i}{\sqrt{N}} \frac{\hat{y}_i}{\sqrt{N}}$ • It's the inner product of two vectors $\left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$

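  40. Inner product • Inner product's geometric meaning: $\nu_1 \cdot \nu_2 = |\nu_1|\,|\nu_2| \cos(\theta)$, where $\theta$ is the angle between $\nu_1$ and $\nu_2$ • Lengths of both vectors $\nu_1 = \left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\nu_2 = \left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$ are 1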

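  41. Bound of correlation coefficient • $|\mathrm{corr}(\{(x_i, y_i)\})| = |\nu_1 \cdot \nu_2| = |\cos(\theta)| \le 1$, with $\nu_1 = \left(\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right)$ and $\nu_2 = \left(\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right)$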
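
The geometric argument can be verified on a small example: build the two vectors of standardized coordinates divided by $\sqrt{N}$, confirm that both have length 1, and confirm that their inner product equals the correlation coefficient. The data and helper names are assumptions made for this sketch, which uses the population standard deviation.

```python
import math
import statistics


def unit_vector(values):
    """Standardized coordinates divided by sqrt(N)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)          # population std (divide by N)
    root_n = math.sqrt(len(values))
    return [(v - m) / s / root_n for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
v1 = unit_vector(xs)
v2 = unit_vector(ys)

length_1 = math.sqrt(dot(v1, v1))   # length of the first vector
length_2 = math.sqrt(dot(v2, v2))   # length of the second vector
cos_theta = dot(v1, v2)             # equals corr because both lengths are 1
```

Because both vectors have unit length, the inner product is exactly the cosine of the angle between them, which is why the coefficient cannot leave [-1, 1].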

  42. The Properties of Correlation Coefficient • Symmetric • Translation invariant • Scaling may only change the sign • Bounded within [-1, 1]

  43. Using correlation to predict • Caution! Correlation is NOT Causation. Credit: Tyler Vigen

  44. How do we go about the prediction? • Outliers removed & data standardized

  45. Using correlation to predict • Given a correlated data set $\{(x_i, y_i)\}$, we can predict a value $y_0^p$ that goes with a value $x_0$ • In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$, we can predict a value $\hat{y}_0^p$ that goes with a value $\hat{x}_0$

  46. Q: Which coordinates will you use for the predictor using correlation? A. Standard coordinates B. Original coordinates C. Either

  47. Linear predictor and its error • We will assume that our predictor is linear: $\hat{y}^p = a \hat{x} + b$ • We denote the prediction at each $\hat{x}_i$ in the data set as $\hat{y}_i^p$: $\hat{y}_i^p = a \hat{x}_i + b$ • The error in the prediction is denoted $u_i$: $u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a \hat{x}_i - b$

  48. Require the mean of error to be zero • We would try to make the mean of the error equal to zero so that it is also centered around 0, like the standardized data: $\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - a \hat{x}_i - b\}) = \mathrm{mean}(\{\hat{y}_i\}) - a\,\mathrm{mean}(\{\hat{x}_i\}) - b = -b$, and requiring $-b = 0$ gives $b = 0$

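  49. Require the variance of error to be minimal • Minimize $\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\})$, since $\mathrm{mean}(\{u_i\}) = 0$ • With $b = 0$: $\mathrm{mean}(\{u_i^2\}) = \mathrm{mean}(\{(\hat{y}_i - a \hat{x}_i)^2\}) = \mathrm{mean}(\{\hat{y}_i^2\}) - 2a\,\mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2\,\mathrm{mean}(\{\hat{x}_i^2\}) = 1 - 2ar + a^2$, where $r = \mathrm{corr}(\{(x_i, y_i)\})$ • Setting the derivative with respect to $a$ to zero: $\frac{d\,\mathrm{var}(\{u_i\})}{da} = -2r + 2a = 0$, so $a = r$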

  50. Require the variance of error to be minimal

  51. Here is the linear predictor! • $\hat{y}^p = a \hat{x} + b$ with $a = r$ (the correlation coefficient) and $b = 0$, so $\hat{y}^p = r \hat{x}$
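
As a sanity check on the result, a brute-force search over candidate slopes should land on the correlation coefficient. The sketch below fixes the intercept at 0 and scans slopes on a grid of step 0.01 in standardized coordinates; the data, the grid, and the helper names are all illustrative assumptions.

```python
import statistics


def standardize(values):
    m = statistics.fmean(values)
    s = statistics.pstdev(values)          # population std (divide by N)
    return [(v - m) / s for v in values]


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
x_hat = standardize(xs)
y_hat = standardize(ys)
r = statistics.fmean([a * b for a, b in zip(x_hat, y_hat)])


def error_variance(slope):
    """Variance of the prediction error for a given slope, intercept 0."""
    residuals = [y - slope * x for x, y in zip(x_hat, y_hat)]
    return statistics.pvariance(residuals)


# Try slopes -1.00, -0.99, ..., 1.00 and keep the one with the least error variance.
candidates = [i / 100 for i in range(-100, 101)]
best_slope = min(candidates, key=error_variance)
```

The grid search is only a check on the algebra; in practice the slope is read off directly as the correlation coefficient.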

  52. Prediction Formula • In standard coordinates: $\hat{y}_0^p = r \hat{x}_0$, where $r = \mathrm{corr}(\{(x_i, y_i)\})$ • In original coordinates: $\frac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r\,\frac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
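
The prediction formula can be sketched as a three-step function: move $x_0$ into standard coordinates, multiply by $r$, and move the result back into the original coordinates of y. The function name `predict` and the sample data are assumptions for this example, and the standard deviation is the population version (dividing by N).

```python
import statistics


def predict(x0, xs, ys):
    """Predict the y value that goes with x0 using the correlation coefficient."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.pstdev(xs), statistics.pstdev(ys)
    if std_x == 0 or std_y == 0:
        raise ValueError("both features must vary")

    # Correlation coefficient: mean of products of standardized coordinates.
    r = statistics.fmean(
        [((x - mean_x) / std_x) * ((y - mean_y) / std_y) for x, y in zip(xs, ys)]
    )

    x0_hat = (x0 - mean_x) / std_x     # x0 in standard coordinates
    y0_hat = r * x0_hat                # prediction in standard coordinates
    return mean_y + std_y * y0_hat     # back to original coordinates


# Points on the line y = 2x + 1, so r is 1 and the prediction follows the line.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
y_at_10 = predict(10, xs, ys)
```

When $|r| < 1$ the prediction is pulled toward the mean of y, because the standardized prediction is the standardized input shrunk by the factor $r$.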
