Probability and Statistics for Computer Science - PowerPoint PPT Presentation



slide-1
SLIDE 1

Probability and Statistics for Computer Science

"The statement that 'The average US family has 2.6 children' invites mockery" - Prof. Forsyth reminds us about critical thinking

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 1.23.2020
Credit: wikipedia
slide-2
SLIDE 2

Last lecture

Course material and survey
Meet staff team
Overview of CS361
Lecture 1 - Data Visualization & Summary (I)

slide-3
SLIDE 3

Lecture videos and ClassTranscribe

Lecture will be videotaped and accessible at https://mediaspace.illinois.edu/

ClassTranscribe provides transcripts for lecture videos: https://classtranscribe.illinois.edu/home

To be connected
slide-4
SLIDE 4

We learned

Visualizing with
    Tables, bar charts, histograms

Summarizing with location parameter
    Mean

slide-5
SLIDE 5

Visualizing Data with Histogram (III)

Conditional histogram

Data: Combined Score (HWs, Prj and Exams) grouped by students with full participation or not full in CS361 fall 2019

[Figure: conditional histogram of Total_HWPRJExam (x-axis, 400 to 1000) against count (y-axis), split by Participation]
Mean (aqua) = 890
Mean (red) = 760
slide-6
SLIDE 6

Today: More Summary (descriptive statistics & Data Visualization)

Mean
Standard deviation
Variance
Standardizing data
Median, interquartile range, box plots and outliers

Visualizing & Summarizing relationships
Heatmap
3D bar
Time series plots
Scatter plots
Correlation coefficient

slide-7
SLIDE 7

Summarizing 1D continuous data

For a data set {x}, or annotated as {x_i}, we summarize with:

Location parameters
    Mean
    Median
    Mode

Scale parameters
    Standard deviation and variance
    Interquartile range

slide-8
SLIDE 8

Summarizing 1D continuous data

Mean

$$\mathrm{mean}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Geometrically, the mean is the centroid of the data: if you balance the data set at that point, you find the center of balance.
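As a minimal sketch of the definition above, the mean can be computed directly from the sum and the item count. The function name `mean` and the sample values are illustrative assumptions, not part of the lecture material.

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence: (1/N) * sum of the items."""
    if not xs:
        raise ValueError("mean of an empty data set is undefined")
    return sum(xs) / len(xs)


# Hypothetical data set, chosen only for illustration.
scores = [2.0, 4.0, 4.0, 6.0, 9.0]
print(mean(scores))  # 5.0
```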

slide-9
SLIDE 9

Properties of the mean

Scaling data scales the mean
Translating the data translates the mean

$$\mathrm{mean}(\{k \cdot x_i\}) = k \cdot \mathrm{mean}(\{x_i\})$$

$$\mathrm{mean}(\{x_i + c\}) = \mathrm{mean}(\{x_i\}) + c$$
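The two properties can be checked numerically. The sketch below uses made-up data and arbitrary values of k and c (all assumptions for illustration), and compares both sides of each identity up to floating-point tolerance.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


data = [1.0, 3.0, 5.0, 11.0]  # hypothetical values
k, c = -2.5, 7.0              # arbitrary scale factor and shift

scaled = [k * x for x in data]
shifted = [x + c for x in data]

# Scaling the data scales the mean; translating the data translates the mean.
assert math.isclose(mean(scaled), k * mean(data))
assert math.isclose(mean(shifted), mean(data) + c)
```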

slide-10
SLIDE 10

Less obvious properties of the mean

The signed distances from the mean sum to 0

$$\sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\})) = 0$$

Among all real values µ, the mean minimizes the sum of the squared distances to the data

$$\operatorname*{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$$
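Both properties can be illustrated numerically before proving them. The sketch below is only a sanity check under assumed data: it sums the signed distances, then does a coarse grid search over candidate values of µ (the grid range and step are arbitrary choices) to see where the sum of squared distances is smallest.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sum_squared_distance(xs, mu):
    """Sum of squared distances from the items to a candidate value mu."""
    return sum((x - mu) ** 2 for x in xs)


data = [2.0, 3.0, 7.0, 8.0]  # hypothetical values
m = mean(data)

# Property 1: signed distances from the mean cancel out.
signed_sum = sum(x - m for x in data)

# Property 2: search candidates 0.00, 0.01, ..., 10.00 for the minimizer.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=lambda mu: sum_squared_distance(data, mu))

print(signed_sum, best, m)
```

A grid search only suggests the result on one data set; the calculus argument is what establishes it in general.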

slide-11
SLIDE 11

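Prove: $\sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\})) = 0$

$$\mathrm{LHS} = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} \mathrm{mean}(\{x_i\}) = \sum_{i=1}^{N} x_i - N \cdot \mathrm{mean}(\{x_i\}) = \sum_{i=1}^{N} x_i - N \cdot \frac{1}{N} \sum_{i=1}^{N} x_i = 0$$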

slide-12
SLIDE 12

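Prove: $\operatorname*{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$

$$\frac{d}{d\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \sum_{i=1}^{N} 2 (x_i - \mu) \cdot (-1) = -2 \sum_{i=1}^{N} (x_i - \mu) = 0$$

$$\sum_{i=1}^{N} x_i - N \mu = 0 \quad \Rightarrow \quad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \mathrm{mean}(\{x_i\})$$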

slide-13
SLIDE 13

Qs:

What is the answer for mean(mean({x_i}))?

A. mean({x_i})    B. unsure    C. 0

Recall: in which application in Lecture 1 were the means of experiments compared?

slide-14
SLIDE 14

Standard Deviation (σ)

The standard deviation

$$\mathrm{std}(\{x_i\}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2} = \sqrt{\mathrm{mean}(\{(x_i - \mathrm{mean}(\{x_i\}))^2\})}$$
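The second form of the formula, the square root of the mean of squared deviations, translates almost directly into code. This is a minimal sketch with made-up data; the function names are illustrative. Note that it uses the 1/N convention from the slide, which matches Python's `statistics.pstdev` rather than `statistics.stdev` (which divides by N - 1).

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Standard deviation with the 1/N convention: sqrt(mean of squared deviations)."""
    m = mean(xs)
    return math.sqrt(mean([(x - m) ** 2 for x in xs]))


# Hypothetical data set, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data))  # 2.0
```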
slide-15
SLIDE 15


slide-16
SLIDE 16

Can a standard deviation of a dataset be -1?

A. YES    B. NO

slide-17
SLIDE 17

Properties of the standard deviation

Scaling data scales the standard deviation
Translating the data does NOT change the standard deviation

$$\mathrm{std}(\{k \cdot x_i\}) = |k| \cdot \mathrm{std}(\{x_i\})$$

$$\mathrm{std}(\{x_i + c\}) = \mathrm{std}(\{x_i\})$$
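A quick numerical check of both properties, as a sketch with assumed data and arbitrary k and c. It relies on `statistics.pstdev`, which uses the same 1/N convention as the slides. A negative k is chosen on purpose to show why the absolute value is needed.

```python
import math
import statistics

data = [1.0, 2.0, 4.0, 9.0]  # hypothetical values
k, c = -3.0, 10.0            # arbitrary scale factor (negative) and shift

base = statistics.pstdev(data)
scaled = statistics.pstdev([k * x for x in data])
shifted = statistics.pstdev([x + c for x in data])

assert math.isclose(scaled, abs(k) * base)  # scales by |k|, not by k
assert math.isclose(shifted, base)          # unchanged by translation
```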

slide-18
SLIDE 18

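Standard deviation: Chebyshev's inequality (1st look)

At most $N/k^2$ items are k standard deviations (σ) or more away from the mean

Rough justification: Assume mean = 0

[Figure: $N - N/k^2$ items at 0, $0.5N/k^2$ items at $-k\sigma$, and $0.5N/k^2$ items at $k\sigma$]

$$\mathrm{std} = \sqrt{\frac{1}{N} \left[ \left(N - \frac{N}{k^2}\right) 0^2 + \frac{N}{k^2} (k\sigma)^2 \right]} = \sigma$$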
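The bound can be checked empirically on any data set. The sketch below is an illustration under assumptions: the data are synthetic draws from an exponential distribution (chosen only because it is skewed), and the values of k are arbitrary. It counts the items at least k standard deviations from the mean and compares that count with N/k².

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
data = [random.expovariate(1.0) for _ in range(10_000)]  # synthetic, skewed data

n = len(data)
m = statistics.fmean(data)
s = statistics.pstdev(data)  # 1/N convention, as on the slides

for k in (1.5, 2, 3, 4):
    far = sum(1 for x in data if abs(x - m) >= k * s)
    bound = n / k**2
    print(f"k={k}: {far} items at least k std away; bound N/k^2 = {bound:.0f}")
    assert far <= bound
```

The observed counts are typically far below the bound, since Chebyshev's inequality makes no assumption about the shape of the data.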

slide-19
SLIDE 19

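Variance (σ²)

Variance = (standard deviation)²
Scaling and translating similar to standard deviation

$$\mathrm{var}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2$$

$$\mathrm{var}(\{k \cdot x_i\}) = k^2 \cdot \mathrm{var}(\{x_i\})$$

$$\mathrm{var}(\{x_i + c\}) = \mathrm{var}(\{x_i\})$$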
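For completeness, here is a short derivation sketch of the two variance properties. It is not taken from the slides; it uses only the definition of variance and the scaling and translation properties of the mean stated earlier.

```latex
\begin{align*}
\mathrm{var}(\{k \cdot x_i\})
  &= \frac{1}{N} \sum_{i=1}^{N} \bigl(k x_i - \mathrm{mean}(\{k \cdot x_i\})\bigr)^2
   = \frac{1}{N} \sum_{i=1}^{N} \bigl(k x_i - k \cdot \mathrm{mean}(\{x_i\})\bigr)^2 \\
  &= k^2 \cdot \frac{1}{N} \sum_{i=1}^{N} \bigl(x_i - \mathrm{mean}(\{x_i\})\bigr)^2
   = k^2 \cdot \mathrm{var}(\{x_i\}) \\[6pt]
\mathrm{var}(\{x_i + c\})
  &= \frac{1}{N} \sum_{i=1}^{N} \bigl(x_i + c - \mathrm{mean}(\{x_i + c\})\bigr)^2
   = \frac{1}{N} \sum_{i=1}^{N} \bigl(x_i + c - \mathrm{mean}(\{x_i\}) - c\bigr)^2 \\
  &= \mathrm{var}(\{x_i\})
\end{align*}
```

Taking square roots gives the corresponding standard deviation properties, with $\sqrt{k^2} = |k|$.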

slide-20
SLIDE 20

Q: Standard deviation

What is the value of std(mean({x_i}))?

A. 0    B. 1    C. unsure

slide-21
SLIDE 21

Standard Coordinates/normalized data

The mean tells where the data set is and the standard deviation tells how spread out it is. If we are interested only in comparing the shape, we could define:

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$

We say $\{\hat{x}_i\}$ is in standard coordinates

slide-22
SLIDE 22

Q: Mean of standard coordinates

µ of $\{\hat{x}_i\}$ is:

A. 1    B. 0    C. unsure

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$

slide-23
SLIDE 23

Q: Standard deviation (σ) of standard coordinates

σ of $\{\hat{x}_i\}$ is:

A. 1    B. 0    C. unsure

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$

slide-24
SLIDE 24

Q: Variance of standard coordinates

Variance of $\{\hat{x}_i\}$ is:

A. 1    B. 0    C. unsure

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$

slide-25
SLIDE 25

Q: Estimate the range of data in standard coordinates

Estimate as close as possible: 90% of the data is within:

A. [-10, 10]
B. [-100, 100]
C. [-1, 1]
D. [-4, 4]
E. others

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$

slide-26
SLIDE 26

Standard Coordinates/normalized data to μ=0, σ=1, σ²=1

Data in standard coordinates always has
    mean = 0; standard deviation = 1; variance = 1.

Such data is unit-less, so plots based on it are sometimes more comparable

We see such normalization very often in statistics
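The claim that standardized data has mean 0 and standard deviation 1 can be confirmed with a small sketch. The function name `standardize` and the sample values are assumptions for illustration; the 1/N convention (`statistics.pstdev`) is used to match the slides.

```python
import math
import statistics


def standardize(xs):
    """Map each item to (x - mean) / std, using the 1/N standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    if s == 0:
        raise ValueError("standard coordinates are undefined when std is 0")
    return [(x - m) / s for x in xs]


heights_cm = [150.0, 160.0, 165.0, 172.0, 183.0]  # hypothetical values
z = standardize(heights_cm)

assert math.isclose(statistics.fmean(z), 0.0, abs_tol=1e-12)
assert math.isclose(statistics.pstdev(z), 1.0)
assert math.isclose(statistics.pvariance(z), 1.0)
```

The guard against a zero standard deviation covers the case where every item is identical, for which the definition divides by zero.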

slide-27
SLIDE 27

Median

To organize the data we first sort it
Then if the number of items N is odd
    median = middle item's value
if the number of items N is even
    median = mean of middle 2 items' values
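The procedure above can be sketched as follows. The function name and sample values are illustrative; Python's standard library offers `statistics.median` with the same behavior.

```python
def median(xs):
    """Median by sorting: middle item for odd N, mean of the two middle items for even N."""
    if not xs:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```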

slide-28
SLIDE 28

Properties of Median

Scaling data scales the median
Translating data translates the median

$$\mathrm{median}(\{k \cdot x_i\}) = k \cdot \mathrm{median}(\{x_i\})$$

$$\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$$

slide-29
SLIDE 29

Percentile

The kth percentile is the value relative to which k% of the data items have smaller or equal numbers

Median is the 50th percentile
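One way to turn this definition into code is the nearest-rank rule: take the smallest data item such that at least k% of the items are less than or equal to it. This is a sketch of one convention among several, and the choice is an assumption rather than something the slides specify. In particular, for an even number of items it returns the lower of the two middle items at k = 50, which differs from the "mean of the middle two" rule used for the median.

```python
import math


def percentile(xs, k):
    """Nearest-rank k-th percentile: smallest item with at least k% of items <= it."""
    if not xs:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    ordered = sorted(xs)
    rank = math.ceil(k * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


data = [15, 20, 35, 40, 50]   # hypothetical values
print(percentile(data, 50))   # 35
print(percentile(data, 100))  # 50
```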

slide-30
SLIDE 30

Q: Scaling effect on percentiles

Scaling data scales the percentile

A. True    B. False

slide-31
SLIDE 31

Q: Translating effect on percentiles

Translating data does NOT change the percentile

A. True    B. False

slide-32
SLIDE 32

Interquartile range

iqr = (75th percentile) - (25th percentile)

Scaling data scales the interquartile range
Translating data does NOT change the interquartile range

$$\mathrm{iqr}(\{k \cdot x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$$

$$\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$$
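A small sketch of the interquartile range and its two properties. Quartile conventions vary between tools; this version assumes Python's `statistics.quantiles` with its default method, which is a choice made here and not something stated in the slides. The data and the values of k and c are made up.

```python
import math
import statistics


def iqr(xs):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 12.0]  # hypothetical values
k, c = -2.0, 100.0                            # arbitrary scale factor and shift

print(iqr(data))  # 5.0

assert math.isclose(iqr([k * x for x in data]), abs(k) * iqr(data))
assert math.isclose(iqr([x + c for x in data]), iqr(data))
```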

slide-33
SLIDE 33

Summarizing 1D continuous data

Location parameters
    Mean
    Median
    Mode

Scale parameters
    Standard deviation and variance
    Interquartile range

slide-34
SLIDE 34

Box plots

Boxplots
    Simpler than histogram
    Good for outliers
    Easier to use for comparison

[Figure: box plots of vehicle death by region; axis label DEATH]

Data from https://www2.stetson.edu/~jrasp/data.htm
slide-35
SLIDE 35

Boxplots details, outliers

How to define outliers? (the default)

[Figure: annotated box plot. Labels: Whisker, Box, Median, Outlier, Interquartile Range (iqr), > 1.5 iqr, < 1.5 iqr, 75%]
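The default outlier rule can be sketched in code. This assumes the common box plot convention that a point is an outlier when it lies more than 1.5 iqr below the 25th percentile or above the 75th percentile; the slide shows the 1.5 iqr threshold, while the choice of reference edges and of quartile method (`statistics.quantiles` defaults) are assumptions. The data values are invented and are not the Stetson data.

```python
import statistics


def boxplot_outliers(xs, whisker=1.5):
    """Return items more than `whisker` * iqr outside the box [Q1, Q3]."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    spread = q3 - q1
    low = q1 - whisker * spread
    high = q3 + whisker * spread
    return [x for x in xs if x < low or x > high]


deaths = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 40.0]  # hypothetical values
print(boxplot_outliers(deaths))  # [40.0]
```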
slide-36
SLIDE 36

Sensitivity of summary statistics to outliers

mean and standard deviation are very sensitive to outliers

median and interquartile range are not sensitive to outliers
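A small experiment makes the contrast concrete. The sketch below replaces the largest item of a made-up data set with a gross outlier and prints each statistic before and after. The data, the outlier value, and the use of `statistics.quantiles` defaults for the quartiles are all assumptions for illustration.

```python
import statistics


def iqr(xs):
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


clean = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]  # hypothetical values
with_outlier = clean[:-1] + [1000.0]                # largest item replaced by an outlier

summaries = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "median": statistics.median,
    "iqr": iqr,
}

for name, stat in summaries.items():
    print(f"{name:>6}: clean = {stat(clean):8.2f}   with outlier = {stat(with_outlier):8.2f}")
```

On this data the median and iqr are unchanged by the outlier, while the mean and standard deviation move by more than an order of magnitude.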

slide-37
SLIDE 37

Group Discussion

slide-38
SLIDE 38

Modes

Modes are peaks in a histogram
If there are more than 1 mode, we should be curious as to why

slide-39
SLIDE 39

Multiple modes

We have seen the "iris" data, which looks to have several peaks

Data: "iris"

slide-40
SLIDE 40

Example Bi-modes distribution

Modes may indicate multiple populations

Data: Erythrocyte cells in healthy humans
Piagnerelli, JCP 2007
slide-41
SLIDE 41

Tails and Skews

Credit: Prof. Forsyth
slide-42
SLIDE 42

Assignments

HW1, due on 1/30 Thurs.
Reading Chapter 2 of the textbook
Next time: Looking for relationship in data; correlation coefficient

slide-43
SLIDE 43

Additional References

Peter Dalgaard, "Introductory Statistics with R"

Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability"

Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

slide-44
SLIDE 44

See you next time

See You!