  1. Probability and Statistics for Computer Science “Correlation is not Causation” but Correlation is so beautiful! Credit: Wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020

  2. Last time ✺ Mean ✺ Standard deviation ✺ Variance ✺ Standardizing data

  3. Objectives ✺ Median, interquartile range, box plot and outliers ✺ Scatter plots, correlation coefficient ✺ Visualizing & summarizing relationships: heatmap, 3D bar, time series plots

  4. Median ✺ To organize the data we first sort it ✺ Then if the number of items N is odd, median = middle item's value; if the number of items N is even, median = mean of the middle 2 items' values
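The rule on this slide can be sketched in Python (a minimal sketch; the standard-library `statistics.median` implements the same rule):

```python
def median(items):
    """Median per the slide's rule: sort, then take the middle
    item (odd N) or the mean of the two middle items (even N)."""
    xs = sorted(items)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(median([5, 1, 3]))     # odd N -> 3
print(median([5, 1, 3, 7]))  # even N -> 4.0
```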

  5. Properties of Median ✺ Scaling data scales the median: median({k · x_i}) = k · median({x_i}) ✺ Translating data translates the median: median({x_i + c}) = median({x_i}) + c

  6. Percentile ✺ The kth percentile is the value relative to which k% of the data items have smaller or equal values ✺ The median is roughly the 50th percentile

  7. Interquartile range ✺ iqr = (75th percentile) - (25th percentile) ✺ Scaling data scales the interquartile range: iqr({k · x_i}) = |k| · iqr({x_i}) ✺ Translating data does NOT change the interquartile range: iqr({x_i + c}) = iqr({x_i})
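Both properties can be checked numerically (a minimal sketch with made-up data; `np.percentile` supplies the quartiles):

```python
import numpy as np

def iqr(xs):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return np.percentile(xs, 75) - np.percentile(xs, 25)

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Scaling scales the iqr by |k|; translating leaves it unchanged:
print(np.isclose(iqr(-3 * data), 3 * iqr(data)))  # True
print(np.isclose(iqr(data + 10), iqr(data)))      # True
```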

  8. Box plots (vehicle deaths by region) ✺ Simpler than a histogram ✺ Good for outliers ✺ Easier to use for comparison. Data from https://www2.stetson.edu/~jrasp/data.htm

  9. Boxplot details, outliers ✺ How to define outliers? By default, points more than 1.5 · iqr beyond the box ✺ The box spans the interquartile range (iqr) with the median marked inside; whiskers extend to points within 1.5 · iqr
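The default 1.5 · iqr whisker rule described above can be sketched as follows (made-up data for illustration):

```python
import numpy as np

def boxplot_outliers(xs):
    """Flag points beyond 1.5 * iqr from the box edges
    (the default whisker rule described on the slide)."""
    q1, q3 = np.percentile(xs, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < lo or x > hi]

data = [2, 3, 3, 4, 4, 5, 5, 6, 40]
print(boxplot_outliers(data))  # -> [40]
```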

  10. Sensitivity of summary statistics to outliers ✺ Mean and standard deviation are very sensitive to outliers ✺ Median and interquartile range are not sensitive to outliers
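A quick numerical illustration of this sensitivity, using made-up data:

```python
import numpy as np

data = [10, 11, 12, 13, 14]
spoiled = data + [1000]  # one extreme outlier appended

# The mean moves a lot; the median barely moves.
print(np.mean(data), np.mean(spoiled))      # 12.0 vs ~176.7
print(np.median(data), np.median(spoiled))  # 12.0 vs 12.5
```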

  11. Modes ✺ Modes are peaks in a histogram ✺ If there is more than one mode, we should be curious as to why

  12. Multiple modes ✺ We have seen the “iris” data, which looks to have several peaks. Data: “iris” in R

  13. Example: Bimodal distribution ✺ Modes may indicate multiple populations. Data: Erythrocyte cells in healthy humans, Piagnerelli, JCP 2007

  14. Tails and Skews Credit: Prof. Forsyth

  15. Q. How is this skewed? A. Left B. Right (Median = 47)

  16. Looking at relationships in data ✺ Finding relationships between features in a data set or many data sets is one of the most important tasks in data analysis

  17. Relationship between data features ✺ Example: does the weight of people relate to their height? ✺ x: HEIGHT, y: WEIGHT

  18. Scatter plot ✺ Body Fat data set

  19. Scatter plot ✺ Scatter plot with density

  20. Scatter plot ✺ Outliers removed & data standardized

  21. Correlation seen from scatter plots ✺ Zero correlation, positive correlation, negative correlation. Credit: Prof. Forsyth

  22. What kind of correlation? ✺ Lines of code in a database and number of bugs ✺ Frequency of hand washing and number of germs on your hands ✺ GPA and hours spent playing video games ✺ Earnings and happiness. Credit: Prof. David Varodayan

  23. Correlation doesn’t mean causation ✺ Shoe size is correlated with reading skills, but that doesn’t mean making feet grow will make a person read faster.

  24. Correlation Coefficient ✺ Given a data set consisting of items {(x_i, y_i)}, i.e. (x_1, y_1) ... (x_N, y_N) ✺ Standardize the coordinates of each feature: x̂_i = (x_i − mean({x_i})) / std({x_i}), ŷ_i = (y_i − mean({y_i})) / std({y_i}) ✺ Define the correlation coefficient as: corr({(x_i, y_i)}) = (1/N) Σ_{i=1}^N x̂_i ŷ_i

  25. Correlation Coefficient ✺ x̂_i = (x_i − mean({x_i})) / std({x_i}), ŷ_i = (y_i − mean({y_i})) / std({y_i}) ✺ corr({(x_i, y_i)}) = (1/N) Σ_{i=1}^N x̂_i ŷ_i = mean({x̂_i ŷ_i})
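The definition can be checked numerically (a minimal sketch with made-up data; `np.corrcoef` computes the same quantity because the 1/N vs 1/(N−1) normalization cancels in the ratio):

```python
import numpy as np

def corr(x, y):
    """Correlation coefficient as on the slide: standardize each
    coordinate, then average the products of standard coordinates."""
    xh = (x - x.mean()) / x.std()  # population std, matching 1/N
    yh = (y - y.mean()) / y.std()
    return np.mean(xh * yh)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])
print(np.isclose(corr(x, y), np.corrcoef(x, y)[0, 1]))  # True
```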

  26. Q: Correlation Coefficient ✺ Which of the following describe(s) the correlation coefficient correctly? A. It’s unitless B. It’s defined in standard coordinates C. Both A & B ✺ corr({(x_i, y_i)}) = (1/N) Σ_{i=1}^N x̂_i ŷ_i

  27. A visualization of correlation coefficient https://rpsychologist.com/d3/correlation/ ✺ In a data set consisting of items {(x_i, y_i)}, i.e. (x_1, y_1) ... (x_N, y_N): corr({(x_i, y_i)}) > 0 shows positive correlation, corr({(x_i, y_i)}) < 0 shows negative correlation, corr({(x_i, y_i)}) = 0 shows no correlation

  28. Correlation seen from scatter plots ✺ Zero correlation, positive correlation, negative correlation. Credit: Prof. Forsyth

  29. The Properties of Correlation Coefficient ✺ The correlation coefficient is symmetric: corr({(x_i, y_i)}) = corr({(y_i, x_i)}) ✺ Translating the data does NOT change the correlation coefficient

  30. The Properties of Correlation Coefficient ✺ Scaling the data may change the sign of the correlation coefficient: corr({(a·x_i + b, c·y_i + d)}) = sign(ac) · corr({(x_i, y_i)})
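A numerical check of the scaling property, using synthetic data (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + rng.normal(size=200)  # positively correlated pair

r = np.corrcoef(x, y)[0, 1]
r_scaled = np.corrcoef(-3 * x + 1, 2 * y - 5)[0, 1]
# sign(a * c) = sign(-3 * 2) = -1, so the coefficient flips sign:
print(np.isclose(r_scaled, -r))  # True
```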

  31. The Properties of Correlation Coefficient ✺ The correlation coefficient is bounded within [-1, 1] ✺ corr({(x_i, y_i)}) = 1 if and only if x̂_i = ŷ_i ✺ corr({(x_i, y_i)}) = −1 if and only if x̂_i = −ŷ_i

  32. Concept of Correlation Coefficient’s bound ✺ The correlation coefficient can be written as corr({(x_i, y_i)}) = (1/N) Σ_{i=1}^N x̂_i ŷ_i = Σ_{i=1}^N (x̂_i/√N)(ŷ_i/√N) ✺ It’s the inner product of the two vectors (x̂_1/√N, ..., x̂_N/√N) and (ŷ_1/√N, ..., ŷ_N/√N)

  33. Inner product ✺ Inner product’s geometric meaning: v_1 · v_2 = |v_1| |v_2| cos(θ) ✺ The lengths of both vectors v_1 = (x̂_1/√N, ..., x̂_N/√N) and v_2 = (ŷ_1/√N, ..., ŷ_N/√N) are 1

  34. Bound of correlation coefficient ✺ |corr({(x_i, y_i)})| = |cos(θ)| ≤ 1, where θ is the angle between v_1 = (x̂_1/√N, ..., x̂_N/√N) and v_2 = (ŷ_1/√N, ..., ŷ_N/√N)

  35. The Properties of Correlation Coefficient ✺ Symmetric ✺ Translation invariant ✺ Scaling may change only the sign ✺ Bounded within [-1, 1]

  36. Using correlation to predict ✺ Caution! Correlation is NOT Causation. Credit: Tyler Vigen

  37. How do we go about the prediction? ✺ Outliers removed & data standardized

  38. Using correlation to predict ✺ Given a correlated data set {(x_i, y_i)}, we can predict a value y_0^p that goes with a value x_0 ✺ In standard coordinates {(x̂_i, ŷ_i)}, we can predict a value ŷ_0^p that goes with a value x̂_0

  39. Q: ✺ Which coordinates will you use for the predictor using correlation? A. Standard coordinates B. Original coordinates C. Either

  40. Linear predictor and its error ✺ We will assume that our predictor is linear: ŷ^p = a·x̂ + b ✺ We denote the prediction at each x̂_i in the data set as ŷ_i^p = a·x̂_i + b ✺ The error in the prediction is denoted u_i = ŷ_i − ŷ_i^p = ŷ_i − a·x̂_i − b

  41. Require the mean of error to be zero ✺ We try to make the mean of the error equal to zero, so that the prediction is also centered around 0 like the standardized data: mean({u_i}) = mean({ŷ_i}) − a·mean({x̂_i}) − b = −b, so b = 0

  42. Require the variance of error to be minimal

  43. Require the variance of error to be minimal ✺ With b = 0, the error variance is var({u_i}) = 1 − 2ar + a²; setting its derivative with respect to a to zero gives a = r

  44. Here is the linear predictor! ✺ ŷ^p = r·x̂, where r is the correlation coefficient

  45. Prediction Formula ✺ In standard coordinates: ŷ_0^p = r·x̂_0, where r = corr({(x_i, y_i)}) ✺ In original coordinates: (y_0^p − mean({y_i})) / std({y_i}) = r·(x_0 − mean({x_i})) / std({x_i})
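The original-coordinates formula can be sketched as follows (made-up data; the population `std` matches the slides' 1/N convention):

```python
import numpy as np

def predict(x0, x, y):
    """Correlation-based linear predictor, solved for y0p from
    (y0p - mean(y)) / std(y) = r * (x0 - mean(x)) / std(x)."""
    r = np.corrcoef(x, y)[0, 1]
    x0_hat = (x0 - x.mean()) / x.std()
    return y.mean() + r * x0_hat * y.std()

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
# At x0 = mean(x) the standardized x0 is 0, so the prediction
# falls back to mean(y):
print(predict(3.0, x, y))  # -> 6.0
```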

  46. Root-mean-square (RMS) prediction error ✺ Given var({u_i}) = 1 − 2ar + a² and a = r, var({u_i}) = 1 − r² ✺ RMS error = √(mean({u_i²})) = √(var({u_i})) = √(1 − r²)
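The √(1 − r²) claim can be checked by simulation (synthetic data, a sketch rather than the lecture's demo):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.6 * x + rng.normal(size=10_000)  # correlated with noise

# Standard coordinates and the empirical correlation coefficient:
xh = (x - x.mean()) / x.std()
yh = (y - y.mean()) / y.std()
r = np.mean(xh * yh)

# Residuals of the predictor yh_p = r * xh:
u = yh - r * xh
rms = np.sqrt(np.mean(u ** 2))
print(np.isclose(rms, np.sqrt(1 - r ** 2)))  # True
```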

  47. See the error through simulation https://rpsychologist.com/d3/correlation/

  48. Example: Body Fat data r = 0.513

  49. Example: remove 2 more outliers r = 0.556

  50. Heatmap ✺ Displays a matrix of data via a gradient of color(s) ✺ Summarization of 4 locations’ annual mean temperature by month

  51. 3D bar chart ✺ A transparent 3D bar chart is good for a small number of samples across categories

  52. Relationship between data feature and time ✺ Example: How does Amazon’s stock change over one year? Take out the pair of features x: Day, y: AMZN

  53. Time Series Plot: Stock of Amazon

  54. Scatter plot ✺ Coupled with a heatmap to show a 3rd feature

  55. Assignments ✺ Finish reading Chapter 2 of the textbook ✺ Next time: Probability, a first look

  56. Additional References ✺ Charles M. Grinstead and J. Laurie Snell, “Introduction to Probability” ✺ Morris H. DeGroot and Mark J. Schervish, “Probability and Statistics”

  57. See you next time!
