Probability and Statistics for Computer Science: Correlation is not Causation

Probability and Statistics for Computer Science. "Correlation is not Causation" but Correlation is so beautiful! Credit: wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020


slide-1
SLIDE 1

Probability and Statistics for Computer Science

“Correlation is not Causation” but Correlation is so beautiful!

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 9.1.2020 Credit: wikipedia

slide-2
SLIDE 2

* Please use the "#" sign in your chat to indicate a formal question or comment.

* Please mute your mic to keep the Zoom sound quality.

* Please check out the websites of simulation & Code Notebook in the chat.
slide-3
SLIDE 3

Last time

Location parameters: Mean ($\mu$), Median, Mode

Scale parameters: Standard deviation ($\sigma$), Interquartile range (iqr), Variance ($\sigma^2$)

Standardizing data: $\hat{x}_i = \dfrac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

slide-4
SLIDE 4

Objectives

Median, Interquartile range, box plot and outlier

Scatter plots, Correlation Coefficient

Visualizing & Summarizing relationships

Heatmap, 3D bar, Time series plots

slide-5
SLIDE 5

Median

To organize the data, we first sort it.

If the number of items N is odd: median = middle item's value

If the number of items N is even: median = mean of the middle 2 items' values
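The sort-then-pick-the-middle rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code; the function name `median` and the sample lists are chosen here for the example.

```python
def median(values):
    """Median as described above: sort, then take the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the single middle item.
        return ordered[mid]
    # Even N: mean of the two middle items.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 5])       # sorted: 1, 5, 7 -> 5
even_example = median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4 -> 2.5
```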

slide-6
SLIDE 6

Properties of Median

Scaling data scales the median

median({k · xi}) = k · median({xi})

Translating data translates the median

median({xi + c}) = median({xi}) + c

$\mathrm{median} = \arg\min_{\mu} \sum_i |x_i - \mu|$
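A quick numerical check of these properties, using Python's standard `statistics.median`. The data, the constants `k` and `c`, and the candidate grid are made up for illustration. The last part checks a known fact about the median: it minimizes the sum of absolute deviations.

```python
from statistics import median

data = [3.0, -1.0, 4.0, 1.0, 5.0]   # illustrative data, median is 3.0
k, c = -2.0, 10.0                   # illustrative scale and shift

scaled = median([k * x for x in data])
shifted = median([x + c for x in data])


def total_abs_dev(center, values):
    """Sum of absolute deviations of the values from a candidate center."""
    return sum(abs(x - center) for x in values)


# Brute-force search over a grid of candidate centers from -2.0 to 6.0.
candidates = [i / 10 for i in range(-20, 61)]
best = min(candidates, key=lambda m: total_abs_dev(m, data))
```

With an odd number of items the minimizer is unique, so the grid search lands exactly on the median.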

slide-7
SLIDE 7

Percentile

The kth percentile is the value relative to which k% of the data items have smaller or equal numbers

Median is roughly the 50th percentile

Example: for the data {1, 2, 3, 4, 5, 6, 7, 12}, the 75th percentile = ? Answer: 6
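One way to turn the percentile definition into code is the "nearest-rank" convention: take the smallest data value such that at least k% of the items are less than or equal to it. That convention is an assumption made here; many libraries interpolate between items instead and can give slightly different values.

```python
import math


def percentile(values, k):
    """Nearest-rank percentile: smallest item with at least k% of data <= it."""
    ordered = sorted(values)
    n = len(ordered)
    # Smallest 1-based rank whose cumulative share reaches k percent.
    rank = max(1, math.ceil(k * n / 100))
    return ordered[rank - 1]


data = [1, 2, 3, 4, 5, 6, 7, 12]
p75 = percentile(data, 75)   # 6 of the 8 items are <= 6
p50 = percentile(data, 50)
```

Under this convention `p50` is 4, while the median of the same data is 4.5, which is why the median is only "roughly" the 50th percentile.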

slide-8
SLIDE 8

Interquartile range

iqr = (75th percentile) - (25th percentile)

Scaling data scales the interquartile range

iqr({k · xi}) = |k| · iqr({xi})

Translating data does NOT change the interquartile range

iqr({xi + c}) = iqr({xi})
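The two iqr properties can be checked numerically. The sketch below assumes a nearest-rank percentile (one of several conventions) and made-up data. With other percentile conventions the scaling identity still holds, but intermediate values may differ.

```python
import math


def percentile(values, k):
    """Nearest-rank percentile (an assumed convention for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)


data = [1, 2, 3, 4, 5, 6, 7, 12]
base = iqr(data)                          # 6 - 2 = 4
scaled = iqr([-3 * x for x in data])      # expected |k| * base with k = -3
shifted = iqr([x + 100 for x in data])    # expected base (translation)
```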

slide-9
SLIDE 9

Box plots

Simpler than histogram

Good for outliers

Easier to use for comparison

[Figure: box plots of vehicle death (DEATH) by region. Data from https://www2.stetson.edu/~jrasp/data.htm]

slide-10
SLIDE 10

Boxplots details, outliers

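How to define outliers? (the default)

[Figure: box plot labeled with Whisker, Box, Median, Outlier and Interquartile Range (iqr); points more than 1.5 iqr beyond the box are outliers, whiskers cover data within 1.5 iqr of the box]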
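A minimal sketch of the default 1.5 iqr rule for flagging outliers. It assumes the common convention that the fences sit 1.5 iqr below the 25th percentile and 1.5 iqr above the 75th percentile, and it uses a nearest-rank percentile. The function names and data are illustrative only.

```python
import math


def percentile(values, k):
    """Nearest-rank percentile (an assumed convention for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


def find_outliers(values, whisker=1.5):
    """Return items lying more than `whisker` iqr outside the box."""
    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    spread = q3 - q1
    low = q1 - whisker * spread
    high = q3 + whisker * spread
    return [x for x in values if x < low or x > high]


data = [1, 2, 3, 4, 5, 6, 7, 30]
outliers = find_outliers(data)   # fences are -4 and 12, so only 30 is flagged
```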
slide-11
SLIDE 11
  • Q. TRUE or FALSE

mean is more sensitive to outliers than median

A. True

B. False

slide-12
SLIDE 12
  • Q. TRUE or FALSE

interquartile range is more sensitive to outliers than std.

A. True

B. False

slide-13
SLIDE 13

Sensitivity of summary statistics to outliers

mean and standard deviation are very sensitive to outliers

median and interquartile range are not sensitive to outliers
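To see the difference in sensitivity, the sketch below replaces one item of a small made-up data set with an extreme value and compares how much each summary moves. The data and the nearest-rank percentile helper are assumptions for illustration.

```python
import math
from statistics import fmean, median, pstdev


def percentile(values, k):
    """Nearest-rank percentile (an assumed convention for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


def iqr(values):
    return percentile(values, 75) - percentile(values, 25)


clean = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = clean[:-1] + [800]   # replace the largest item by an outlier

mean_shift = abs(fmean(with_outlier) - fmean(clean))
std_ratio = pstdev(with_outlier) / pstdev(clean)
median_shift = abs(median(with_outlier) - median(clean))
iqr_shift = abs(iqr(with_outlier) - iqr(clean))
```

Here the mean moves by 99 and the standard deviation grows by more than a factor of 100, while the median and iqr do not move at all.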

slide-14
SLIDE 14

Modes

Modes are peaks in a histogram

If there is more than 1 mode, we should be curious as to why

slide-15
SLIDE 15

Multiple modes

We have seen the “iris” data, which looks to have several peaks

Data: “iris” in R

slide-16
SLIDE 16

Example: bimodal distribution

Modes may indicate multiple populations

Data: Erythrocyte cells (red blood cells) in healthy humans, Piagnerelli, JCP 2007
slide-17
SLIDE 17

Tails and Skews

Credit: Prof. Forsyth

[Figure annotations: tails, outlier]
slide-18
SLIDE 18


slide-19
SLIDE 19
  • Q. How is this skewed?

Median = 47, mean = 46

A. Left

B. Right

slide-20
SLIDE 20

Looking at relationships in data

Finding relationships between features in a data set or many data sets is one of the most important tasks in data analysis

slide-21
SLIDE 21

Relationship between data features

Example: does the weight of people relate to their height?

x: HEIGHT, y: WEIGHT

slide-22
SLIDE 22

Scatter plot

Body Fat data set

slide-23
SLIDE 23

Scatter plot

Scatter plot with density
slide-24
SLIDE 24

Scatter plot

Outliers removed & data standardized

slide-25
SLIDE 25

Correlation

[Handwritten annotation: covariance]
slide-26
SLIDE 26

Correlation seen from scatter plots

[Figure panels: Positive correlation, Negative correlation, Zero correlation]

Credit: Prof. Forsyth

slide-27
SLIDE 27

What kind of Correlation?

Lines of code in a database and number of bugs

Frequency of hand washing and number of germs on your hands

GPA and hours spent playing video games

Earnings and happiness

Credit: Prof. David Varodayan

slide-28
SLIDE 28

Correlation doesn’t mean causation

Shoe size is correlated to reading skills,

but it doesn’t mean making feet grow will make one person read faster.

slide-29
SLIDE 29

Correlation Coefficient

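Given a data set $\{(x_i, y_i)\}$ consisting of items $(x_1, y_1), \ldots, (x_N, y_N)$,

Standardize the coordinates of each feature:

$\hat{x}_i = \dfrac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

$\hat{y}_i = \dfrac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$

Define the correlation coefficient as:

$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$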
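The definition translates directly into code: standardize each feature, then average the products. The sketch below assumes the standard deviation uses the 1/N (population) form, which is what makes the 1/N average of products land in [-1, 1]. The helper names and the sample data are illustrative.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Subtract the mean and divide by the population (1/N) std."""
    m = fmean(values)
    s = pstdev(values)   # raises ZeroDivisionError below if all values are equal
    return [(v - m) / s for v in values]


def corr(xs, ys):
    """Correlation coefficient: mean of products in standard coordinates."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_hat = standardize(xs)
    y_hat = standardize(ys)
    return fmean([a * b for a, b in zip(x_hat, y_hat)])


xs = [1, 2, 3, 4, 5]
r_up = corr(xs, [2, 4, 6, 8, 10])     # y increases linearly with x
r_down = corr(xs, [10, 8, 6, 4, 2])   # y decreases linearly with x
```

For perfectly linear data, `r_up` is 1 and `r_down` is -1, up to floating-point rounding.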

slide-30
SLIDE 30

Correlation Coefficient

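$\hat{x}_i = \dfrac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

$\hat{y}_i = \dfrac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$

$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$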

slide-31
SLIDE 31

Q: Correlation Coefficient

Which of the following describe(s) correlation coefficient correctly?

  • A. It’s unitless
  • B. It’s defined in standard coordinates
  • C. Both A & B

$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$

slide-32
SLIDE 32

A visualization of correlation coefficient

https://rpsychologist.com/d3/correlation/

In a data set $\{(x_i, y_i)\}$ consisting of items $(x_1, y_1), \ldots, (x_N, y_N)$:

$\mathrm{corr}(\{(x_i, y_i)\}) > 0$ shows positive correlation

$\mathrm{corr}(\{(x_i, y_i)\}) < 0$ shows negative correlation

$\mathrm{corr}(\{(x_i, y_i)\}) = 0$ shows no correlation

slide-33
SLIDE 33

The Properties of Correlation Coefficient

The correlation coefficient is symmetric

corr({(xi, yi)}) = corr({(yi, xi)})

Translating the data does NOT change the correlation coefficient

slide-34
SLIDE 34

The Properties of Correlation Coefficient

Scaling the data may change the sign of the correlation coefficient

corr({(a xi + b, c yi + d)}) = sign(a c) corr({(xi, yi)})
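The symmetry, translation and scaling properties can be checked numerically. The sketch below uses a made-up data set and arbitrary constants, and it defines `corr` with the population (1/N) standard deviation, which is an assumption consistent with the 1/N average in the definition.

```python
from statistics import fmean, pstdev


def corr(xs, ys):
    """Mean of products of standardized coordinates (1/N std assumed)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = corr(xs, ys)
r_swapped = corr(ys, xs)                                      # symmetry
r_shifted = corr([x + 7 for x in xs], [y - 3 for y in ys])    # translation
# a = -3 and c = 2, so sign(a c) = -1 and the sign should flip.
r_flipped = corr([-3 * x + 1 for x in xs], [2 * y + 5 for y in ys])
```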

slide-35
SLIDE 35
slide-36
SLIDE 36
slide-37
SLIDE 37

The Properties of Correlation Coefficient

The correlation coefficient is bounded within [-1, 1]

$\mathrm{corr}(\{(x_i, y_i)\}) = 1$ if and only if $\hat{x}_i = \hat{y}_i$

$\mathrm{corr}(\{(x_i, y_i)\}) = -1$ if and only if $\hat{x}_i = -\hat{y}_i$

slide-38
SLIDE 38

Which of the following has correlation coefficient equal to 1?

  • A. Left and right
  • B. Left
  • C. Middle

slide-39
SLIDE 39

Concept of Correlation Coefficient’s bound

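The correlation coefficient can be written as

$\mathrm{corr}(\{(x_i, y_i)\}) = \dfrac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \dfrac{\hat{x}_i}{\sqrt{N}} \dfrac{\hat{y}_i}{\sqrt{N}}$

It’s the inner product of two vectors

$\left( \dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}} \right)$ and $\left( \dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}} \right)$

where the inner product of two vectors is $\mathbf{u}^{\top} \mathbf{v} = \sum_{i=1}^{N} u_i v_i$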
slide-40
SLIDE 40

Inner product

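Inner product’s geometric meaning: $|\nu_1| \, |\nu_2| \cos(\theta)$, where $\theta$ is the angle between $\nu_1$ and $\nu_2$

$\nu_1 = \left( \dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}} \right)$, $\nu_2 = \left( \dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}} \right)$

Lengths of both vectors are 1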
slide-41
SLIDE 41

Bound of correlation coefficient

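$\nu_1 = \left( \dfrac{\hat{x}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{x}_N}{\sqrt{N}} \right)$, $\nu_2 = \left( \dfrac{\hat{y}_1}{\sqrt{N}}, \ldots, \dfrac{\hat{y}_N}{\sqrt{N}} \right)$, with angle $\theta$ between them

$|\mathrm{corr}(\{(x_i, y_i)\})| = |\cos(\theta)| \le 1$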
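The geometric argument can be checked numerically: dividing the standardized coordinates by the square root of N gives vectors of length 1, so their inner product is the cosine of the angle between them. The data and helper name below are made up, and the population (1/N) standard deviation is assumed.

```python
import math
from statistics import fmean, pstdev


def unit_vector(values):
    """Standardize (1/N std assumed), then divide each entry by sqrt(N)."""
    m = fmean(values)
    s = pstdev(values)
    root_n = math.sqrt(len(values))
    return [(v - m) / (s * root_n) for v in values]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

v1 = unit_vector(xs)
v2 = unit_vector(ys)

length1 = math.sqrt(sum(a * a for a in v1))
length2 = math.sqrt(sum(b * b for b in v2))
# Inner product of two unit vectors: the cosine of the angle between them,
# which equals the correlation coefficient of the original data.
cos_theta = sum(a * b for a, b in zip(v1, v2))
```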
slide-42
SLIDE 42

The Properties of Correlation Coefficient

Symmetric

Translation invariant

Scaling may only change the sign

Bounded within [-1, 1]

slide-43
SLIDE 43

Using correlation to predict

Caution! Correlation is NOT Causation

Credit: Tyler Vigen

slide-44
SLIDE 44

How do we go about the prediction?

Outliers removed & data standardized

slide-45
SLIDE 45

Using correlation to predict

Given a correlated data set $\{(x_i, y_i)\}$, we can predict a value $y_0^p$ that goes with a value $x_0$

In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$, we can predict a value $\hat{y}_0^p$ that goes with a value $\hat{x}_0$
slide-46
SLIDE 46

Q:

Which coordinates will you use for the predictor using correlation?

  • A. Standard coordinates
  • B. Original coordinates
  • C. Either

slide-47
SLIDE 47

Linear predictor and its error

We will assume that our predictor is linear:

$\hat{y}^p = a \hat{x} + b$

We denote the prediction at each $\hat{x}_i$ in the data set as $\hat{y}_i^p$:

$\hat{y}_i^p = a \hat{x}_i + b$

The error in the prediction is denoted $u_i$:

$u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a \hat{x}_i - b$

slide-48
SLIDE 48

Require the mean of error to be zero

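We would try to make the mean of the error equal to zero, so that it is also centered around 0 like the standardized data:

$\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - \hat{y}_i^p\}) = \mathrm{mean}(\{\hat{y}_i - a \hat{x}_i - b\})$

$= \mathrm{mean}(\{\hat{y}_i\}) - a \, \mathrm{mean}(\{\hat{x}_i\}) - b$

$= -b$

$= 0$

$b = 0$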

slide-49
SLIDE 49

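Require the variance of error to be minimal

Minimize the variance of the error, using $\mathrm{mean}(\{u_i\}) = 0$ and $b = 0$:

$\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\})$

$= \mathrm{mean}(\{(\hat{y}_i - a \hat{x}_i)^2\})$

$= \mathrm{mean}(\{\hat{y}_i^2 - 2 a \hat{x}_i \hat{y}_i + a^2 \hat{x}_i^2\})$

$= \mathrm{mean}(\{\hat{y}_i^2\}) - 2a \, \mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2 \, \mathrm{mean}(\{\hat{x}_i^2\})$

$= 1 - 2ar + a^2$

Here $\mathrm{mean}(\{\hat{y}_i^2\})$ and $\mathrm{mean}(\{\hat{x}_i^2\})$ are the variances of the standardized data, which equal 1, and $\mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) = r$.

Setting the derivative to zero:

$\dfrac{d \, \mathrm{var}(\{u_i\})}{da} = -2r + 2a = 0$, so $a = r$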
slide-50
SLIDE 50

Require the variance of error to be minimal

slide-51
SLIDE 51

Here is the linear predictor!

$\hat{y}^p = r \hat{x}$, where $r$ is the correlation coefficient

$\hat{y}^p = a \hat{x} + b$ with $a = r$ and $b = 0$
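As a sanity check on the result a = r, the sketch below scans a grid of candidate slopes in standard coordinates and picks the one with the smallest error variance. The data set, the grid resolution and the helper names are assumptions made for illustration, and the population (1/N) variance is used throughout.

```python
from statistics import fmean, pstdev, pvariance


def standardize(values):
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
x_hat, y_hat = standardize(xs), standardize(ys)

# Correlation coefficient: mean of products in standard coordinates.
r = fmean([a * b for a, b in zip(x_hat, y_hat)])


def error_variance(slope):
    """Variance of u_i = y_hat_i - slope * x_hat_i (intercept b = 0)."""
    return pvariance([y - slope * x for x, y in zip(x_hat, y_hat)])


# Brute-force search over slopes from -1 to 1 in steps of 0.001.
candidates = [i / 1000 for i in range(-1000, 1001)]
best_slope = min(candidates, key=error_variance)
```

The grid minimizer agrees with r to within the grid step, and `error_variance(a)` matches 1 - 2ar + a² for any slope a.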

slide-52
SLIDE 52

Prediction Formula

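In standard coordinates:

$\hat{y}_0^p = r \hat{x}_0$, where $r = \mathrm{corr}(\{(x_i, y_i)\})$

In original coordinates:

$\dfrac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r \, \dfrac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$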
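The prediction formula in original coordinates can be sketched as a small function: standardize x0, multiply by r, then map back to the scale of y. The function name `predict` and the sample data are illustrative, and the population (1/N) standard deviation is assumed.

```python
from statistics import fmean, pstdev


def predict(xs, ys, x0):
    """Predict y for x0 using the correlation coefficient of (xs, ys)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    # Correlation coefficient: mean of products in standard coordinates.
    r = fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])
    x0_hat = (x0 - mx) / sx    # standard coordinate of x0
    y0_hat = r * x0_hat        # prediction in standard coordinates
    return my + sy * y0_hat    # back to original coordinates


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                 # exactly y = 2x + 1, so r = 1
at_ten = predict(xs, ys, 10.0)
at_mean = predict(xs, ys, fmean(xs))      # predicting at mean(x) gives mean(y)
```

For this perfectly linear data the prediction at x0 = 10 recovers 2 · 10 + 1 = 21; with weaker correlation the prediction is pulled toward mean({yi}).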

slide-53
SLIDE 53

Root-mean-square (RMS) prediction error

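Given $\mathrm{var}(\{u_i\}) = 1 - 2ar + a^2$ and $a = r$:

$\mathrm{var}(\{u_i\}) = 1 - r^2$

$\text{RMS error} = \sqrt{\mathrm{mean}(\{u_i^2\})} = \sqrt{\mathrm{var}(\{u_i\})} = \sqrt{1 - r^2}$

If $r = 1$, then $\mathrm{var}(\{u_i\}) = 0$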
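The RMS error formula can be verified on a small made-up data set: compute the errors of the predictor in standard coordinates and compare their root mean square with the closed form. The data and names are illustrative, and the population (1/N) standard deviation is assumed.

```python
import math
from statistics import fmean, pstdev


def standardize(values):
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
x_hat, y_hat = standardize(xs), standardize(ys)

r = fmean([a * b for a, b in zip(x_hat, y_hat)])

# Errors of the linear predictor y_hat^p = r * x_hat in standard coordinates.
errors = [y - r * x for x, y in zip(x_hat, y_hat)]
rms_error = math.sqrt(fmean([u * u for u in errors]))
closed_form = math.sqrt(1 - r * r)
```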

slide-54
SLIDE 54

See the error through simulation

https://rpsychologist.com/d3/correlation/

slide-55
SLIDE 55

Example: Body Fat data

r = 0.513

slide-56
SLIDE 56

Example: remove 2 more outliers

r = 0.556

slide-57
SLIDE 57

Heatmap

Summarization of 4 locations’ annual mean temperature by month

Display matrix of data via gradient of color(s)

slide-58
SLIDE 58

3D bar chart

Transparent

3D bar chart is good for small # of samples across categories

slide-59
SLIDE 59

Relationship between data feature and time

Example: How does Amazon’s stock change over 1 year?

Take out the pair of features x: Day, y: AMZN

slide-60
SLIDE 60

Time Series Plot: Stock of Amazon

slide-61
SLIDE 61

Scatter plot

Coupled with

heatmap to show a 3rd feature

slide-62
SLIDE 62

Assignments

Finish reading Chapter 2 of the

textbook

Next time: Probability, a first look

slide-63
SLIDE 63

Additional References

Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability"

Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

slide-64
SLIDE 64

See you next time

See You!