Probability and Statistics for Computer Science - PowerPoint PPT Presentation



SLIDE 1

Probability and Statistics for Computer Science

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." - H. G. Wells

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 03.12.2020
Credit: wikipedia

SLIDE 2

Midterm 1

* Grading is done.
* Grades will be published today.
* Points will be curved given it's relatively harder than last semester's.
* You're welcome to come to office hour today to discuss it.

SLIDE 3

Midterm 1

[Histogram of Midterm 1 scores (score on the x-axis, 60-140; frequency on the y-axis, 10-30), annotated with this semester's mean, standard deviation = 25.413, and median = 124, alongside last semester's figures.]
slide-4
SLIDE 4

Last time

* Review of variance, sample mean
* Sum and difference between variables of normal distributions
* Hypothesis test of equality of two sample means
* Chi-square test

SLIDE 5

Contents

* Review of statistical inference
* Inferring probability model from data
* Maximum likelihood estimate
* Confidence interval for MLE
* Bayesian inference

SLIDE 6

Categories of statistical inference

Statistical inference includes:
* Drawing conclusions from samples
* Assessing the significance of evidence for a hypothesis
* Inferring the parameters of a probabilistic model from data

SLIDE 7

Contents

* Review of statistical inference
* Inferring probability model from data
* Maximum likelihood estimate
* Confidence interval for MLE
* Bayesian inference

SLIDE 8

Motivation: binomial example

* Suppose we have a coin with unknown probability of coming up heads
* We toss it N times and observe k heads
* We know that this data comes from a binomial distribution
* What is your best estimate of the probability of coming up heads?

Credit: David Varodayan

SLIDE 9

Motivation: geometric example

* Suppose we have a die with unknown probability of coming up six
* We roll it and it comes up six for the first time on the kth roll
* We know that this data comes from a geometric distribution
* What is your best estimate of the probability of coming up six?

Credit: David Varodayan

SLIDE 10

Motivation: Poisson example

* Suppose we have data on the number of babies born each hour in a large hospital
* We can assume the data comes from a Poisson distribution
* What is your best estimate of the intensity λ?

hour:         1    2    ...  N
# of babies:  k1   k2   ...  kN

Credit: David Varodayan

SLIDE 11

The parameter estimation problem

* Suppose we have a dataset that we know comes from a distribution (i.e. binomial, geometric, Poisson, etc.)
* What is the best estimate of the parameters (θ or θs) of the distribution?

Examples:
* For binomial and geometric distributions, θ = p (probability of success)
* For Poisson and exponential distributions, θ = λ (intensity)
* For normal distributions, θ could be μ or σ².

SLIDE 12

Maximum likelihood estimation (MLE)

* We write the probability of seeing the data D given parameter θ as L(θ) = P(D|θ)
* The likelihood function L(θ) is not a probability distribution
* The maximum likelihood estimate (MLE) of θ is:

  θ̂ = arg max_θ L(θ)

SLIDE 13

Why is L(θ) not a probability distribution?

A. It doesn't give the probability of all the possible θ values.
B. Don't know whether the sum or integral of L(θ) for all possible θ values is one or not.
C. Both.

(Board note: θ is not a random variable.)

SLIDE 14

Why is L(θ) not a probability distribution?

A. It doesn't give the probability of all the possible θ values.
B. Don't know whether the sum or integral of L(θ) for all possible θ values is one or not.
C. Both.

SLIDE 15

Likelihood function: binomial example

* Suppose we have a coin with unknown probability of coming up heads
* We toss it N times and observe k heads
* We know that this data comes from a binomial distribution
* What is the likelihood function L(θ) = P(D|θ)?

SLIDE 16

Likelihood function: binomial example

* Suppose we have a coin with unknown probability of coming up heads
* We toss it N times and observe k heads
* We know that this data comes from a binomial distribution
* What is the likelihood function L(θ) = P(D|θ)?

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

(replace p with θ)

SLIDE 17

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

In order to find θ̂ = arg max_θ L(θ), we set:

dL(θ)/dθ = 0

SLIDE 18

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)
slide-19
SLIDE 19

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

SLIDE 20

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

SLIDE 21

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

k − kθ = Nθ − kθ

SLIDE 22

MLE derivation: binomial example

L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

k − kθ = Nθ − kθ

θ̂ = k/N    (the MLE of p)
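The closed form θ̂ = k/N is easy to verify numerically. A minimal sketch in Python (the grid search and the example values N = 10, k = 3 are illustrative, not from the slides):

```python
from math import comb

def binomial_likelihood(theta, N, k):
    # L(theta) = (N choose k) * theta^k * (1 - theta)^(N - k)
    return comb(N, k) * theta**k * (1 - theta)**(N - k)

def binomial_mle_grid(N, k, steps=10000):
    # Brute-force the maximizer of L over an evenly spaced grid on [0, 1].
    thetas = [i / steps for i in range(steps + 1)]
    return max(thetas, key=lambda t: binomial_likelihood(t, N, k))

# The grid maximum agrees with the closed-form MLE k/N
print(binomial_mle_grid(10, 3))  # 0.3
```

The grid search is only a sanity check; the derivative argument on the preceding slides gives the exact answer.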

SLIDE 23

Likelihood function: geometric example

* Suppose we have a die with unknown probability of coming up six
* We roll it and it comes up six for the first time on the kth roll
* We know that this data comes from a geometric distribution
* What is the likelihood function L(θ) = P(D|θ)? Assume θ is p.

SLIDE 24

MLE derivation: geometric example

L(θ) = (1 − θ)^(k−1) θ
slide-25
SLIDE 25

MLE derivation: geometric example

L(θ) = (1 − θ)^(k−1) θ

dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

SLIDE 26

MLE derivation: geometric example

L(θ) = (1 − θ)^(k−1) θ

dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

(1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

SLIDE 27

MLE derivation: geometric example

L(θ) = (1 − θ)^(k−1) θ

dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

(1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

1 − θ = kθ − θ

SLIDE 28

MLE derivation: geometric example

L(θ) = (1 − θ)^(k−1) θ

dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

(1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

1 − θ = kθ − θ

θ̂ = 1/k    (the MLE of p)
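As with the binomial case, θ̂ = 1/k can be sanity-checked by brute force. A sketch in Python (the observation k = 6, a six first appearing on the sixth roll, is an illustrative choice):

```python
def geometric_likelihood(theta, k):
    # L(theta) = (1 - theta)^(k - 1) * theta: first success on the k-th trial
    return (1 - theta)**(k - 1) * theta

def geometric_mle_grid(k, steps=10000):
    # Brute-force the maximizer of L over an evenly spaced grid on [0, 1].
    thetas = [i / steps for i in range(steps + 1)]
    return max(thetas, key=lambda t: geometric_likelihood(t, k))

# First six on roll 6: the grid maximum is close to the closed form 1/6
print(geometric_mle_grid(6))
```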

SLIDE 29

MLE with data from IID trials

* If the dataset D = {x} comes from IID trials
* Each xi is one observed result from an IID trial

L(θ) = P(D|θ) = ∏_{xi ∈ D} P(xi|θ)
slide-30
SLIDE 30

Q: MLE with data from IID trials

* If the dataset D = {x} comes from IID trials
* Why is the above function defined by the product?

L(θ) = P(D|θ) = ∏_{xi ∈ D} P(xi|θ)

A. IID samples are independent
B. Each trial has identical probability function
C. Both.

SLIDE 31

Q: MLE with data from IID trials

* If the dataset D = {x} comes from IID trials
* Why is the above function defined by the product?

L(θ) = P(D|θ) = ∏_{xi ∈ D} P(xi|θ)

A. IID samples are independent
B. Each trial has identical probability function
C. Both.

SLIDE 32

MLE with data from IID trials

* If the dataset D = {x} comes from IID trials
* The likelihood function is hard to differentiate in general, except for the binomial and geometric cases.
* Clever trick: take the (natural) log

L(θ) = P(D|θ) = ∏_{xi ∈ D} P(xi|θ)

SLIDE 33

Log-likelihood function

* Since log is a strictly increasing function:

  θ̂ = arg max_θ L(θ) = arg max_θ log L(θ)

* So we can aim to maximize the log-likelihood function:

  log L(θ) = log P(D|θ) = log ∏_{xi ∈ D} P(xi|θ) = Σ_{xi ∈ D} log P(xi|θ)

* The log-likelihood function is usually much easier to differentiate
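Because log is strictly increasing, maximizing log L picks out exactly the same θ̂ as maximizing L. A small numerical check in Python (the binomial example N = 10, k = 3 is an illustrative choice):

```python
from math import comb, log

N, k = 10, 3

def L(theta):
    # Binomial likelihood
    return comb(N, k) * theta**k * (1 - theta)**(N - k)

def logL(theta):
    # log L expands to log C(N,k) + k log(theta) + (N - k) log(1 - theta)
    return log(comb(N, k)) + k * log(theta) + (N - k) * log(1 - theta)

# Exclude the endpoints 0 and 1 so log is defined everywhere on the grid
thetas = [i / 1000 for i in range(1, 1000)]
assert max(thetas, key=L) == max(thetas, key=logL)  # same argmax
```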
slide-34
SLIDE 34

Log-likelihood function: Poisson example

* Suppose we have data on the number of babies born each hour in a large hospital
* We can assume the data comes from a Poisson distribution with intensity λ
* What is the log likelihood function log L(θ)?

hour:         1    2    ...  N
# of babies:  k1   k2   ...  kN

SLIDE 35

Log-likelihood function: Poisson example

L(θ) = ∏_{i=1}^{N} e^(−θ) θ^(ki) / ki!

log L(θ) = log ( ∏_{i=1}^{N} e^(−θ) θ^(ki) / ki! ) = Σ_{i=1}^{N} log( e^(−θ) θ^(ki) / ki! )

         = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

SLIDE 36

MLE: Poisson example

log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

SLIDE 37

MLE: Poisson example

log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

d/dθ log L(θ) = 0 ⇒ Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0

SLIDE 38

MLE: Poisson example

log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

d/dθ log L(θ) = 0 ⇒ Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0

−N + (Σ_i ki)/θ = 0

SLIDE 39

Poisson

[Handwritten board work: the Poisson likelihood derivation; not legible in the transcript.]
slide-40
SLIDE 40

MLE: Poisson example

log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

d/dθ log L(θ) = 0 ⇒ Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0

−N + (Σ_i ki)/θ = 0

θ̂ = (Σ_i ki)/N    (the MLE of λ)
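The result θ̂ = (Σ_i ki)/N is just the sample mean of the counts. A sketch in Python (the counts below are made-up babies-per-hour data, not from the slides):

```python
from math import log, factorial

def poisson_log_likelihood(theta, counts):
    # Sum over hours of (-theta + k_i log(theta) - log k_i!)
    return sum(-theta + k * log(theta) - log(factorial(k)) for k in counts)

counts = [2, 3, 1, 4, 0, 2]            # hypothetical data
theta_hat = sum(counts) / len(counts)  # closed-form MLE: the sample mean

# The closed form beats nearby candidate values of theta
for other in (theta_hat - 0.2, theta_hat + 0.2):
    assert poisson_log_likelihood(theta_hat, counts) > poisson_log_likelihood(other, counts)
print(theta_hat)  # 2.0
```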

SLIDE 41

MLE for normal distribution

* Suppose we model the dataset D = {x} as normally distributed
* What should be the likelihood function? Is the method of modeling the same as for the Poisson distribution?
  A. Yes    B. No

SLIDE 42

MLE for normal distribution

* Suppose we model the dataset D = {x} as normally distributed
* What should be the likelihood function? Is the method of modeling the same as for the Poisson distribution?
  Yes and No. The idea is similar, but the normal distribution is continuous, so we need to use the probability density instead.

slide-43
SLIDE 43

MLE for normal distribution

* Suppose we model the dataset D = {x} as normally distributed
* The likelihood function of a normal distribution:

L(μ, σ) = ∏_{i=1}^{n} (1/(√(2π) σ)) exp(−(xi − μ)² / (2σ²))

slide-44
SLIDE 44

MLE for normal distribution

* Suppose we model the dataset D = {x} as normally distributed
* There are two parameters to estimate: μ and σ

If we fix σ and set θ = μ:

  θ̂ = (1/N) Σ_{i=1}^{N} xi

If we fix μ and set θ = σ:

  θ̂ = √( (1/N) Σ_{i=1}^{N} (xi − μ)² )
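Both closed forms are easy to compute directly. A sketch in Python (the data points are illustrative); note that the σ estimate divides by N, which is the MLE, not the N − 1 of the unbiased sample variance:

```python
def normal_mle(xs):
    # Closed-form MLE for a normal model: sample mean and
    # the (1/N)-normalized standard deviation.
    n = len(xs)
    mu_hat = sum(xs) / n                               # (1/N) sum x_i
    var_hat = sum((x - mu_hat)**2 for x in xs) / n     # (1/N) sum (x_i - mu)^2
    return mu_hat, var_hat**0.5

mu, sigma = normal_mle([1.0, 2.0, 3.0, 4.0])
# mu is 2.5; sigma**2 is approximately 1.25
```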

slide-45
SLIDE 45

[Board work: E[x] = ∫ x f(x) dx = μ]

slide-46
SLIDE 46

[Board work: setting dL(μ, σ)/dμ = 0 with σ fixed, and dL(μ, σ)/dσ = 0 with μ fixed.]

Correction: in the case of fixing σ, using the log likelihood function, μ̂ doesn't depend on σ. In the case of fixing μ, we assume we know μ, using x̄ as the estimate of μ.

slide-47
SLIDE 47

Drawbacks of MLE

* Maximizing some likelihood or log-likelihood function is mathematically hard
* If there are very few data items, the MLE estimate may be very unreliable
  * If we observe 3 heads in 10 coin tosses, should we accept that p(heads) = 0.3?
  * If we observe 0 heads in 2 coin tosses, should we accept that p(heads) = 0?

slide-48
SLIDE 48

Confidence intervals for MLE estimates

* An MLE parameter estimate θ̂ depends on the data that was observed
* We can construct a confidence interval for θ̂ using the parametric bootstrap:
  * Use the distribution with parameter θ̂ to generate a large number of bootstrap samples
  * From each "synthetic" dataset, re-estimate the parameter using MLE
  * Use the histogram of these re-estimates to construct a confidence interval
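The three steps above can be sketched for the binomial case in Python (N = 100, k = 30, B = 2000 bootstrap samples, and the 95% level are all illustrative choices):

```python
import random

def parametric_bootstrap_ci(N, k, B=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    theta_hat = k / N  # MLE from the observed data
    # Steps 1-2: simulate B synthetic datasets from Binomial(N, theta_hat)
    # and re-estimate theta on each one by its own MLE.
    re_estimates = sorted(
        sum(rng.random() < theta_hat for _ in range(N)) / N for _ in range(B)
    )
    # Step 3: read off empirical quantiles of the re-estimates.
    lo = re_estimates[int(B * alpha / 2)]
    hi = re_estimates[int(B * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = parametric_bootstrap_ci(100, 30)
# The interval brackets the point estimate 0.3
assert lo <= 0.3 <= hi
```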
slide-49
SLIDE 49

Bayesian inference

* In MLE, we maximized the likelihood function L(θ) = P(D|θ)
* In Bayesian inference, we will maximize the posterior P(θ|D), which is the probability of the parameters θ given the observed data D.
* Unlike L(θ), the posterior is a probability distribution
* The value of θ that maximizes P(θ|D) is called the maximum a posteriori (MAP) estimate

slide-50
SLIDE 50

The prior

From Bayes rule:

P(θ|D) = P(D|θ)P(θ) / P(D) ∝ P(D|θ)P(θ)

slide-51
SLIDE 51

The prior

From Bayes rule:

P(θ|D) = P(D|θ)P(θ) / P(D) ∝ P(D|θ)P(θ)

slide-52
SLIDE 52

The prior

From Bayes rule:

P(θ|D) = P(D|θ)P(θ) / P(D) ∝ P(D|θ)P(θ)

* The probability of the data P(D) is a constant, which doesn't matter for differentiation.
* Bayesian inference allows us to include prior beliefs about θ in the prior P(θ), which is useful:
  * When we have reasonable beliefs, such as that a coin cannot have P(heads) = 0
  * When there isn't much data
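For the coin example this works out in closed form: a Beta(a, b) prior on θ combined with the binomial likelihood gives a Beta(a + k, b + N − k) posterior, whose mode is the MAP estimate. A sketch in Python (the Beta(2, 2) prior is an illustrative choice, not from the slides):

```python
def map_estimate(N, k, a=2.0, b=2.0):
    # Mode of the Beta(a + k, b + N - k) posterior; valid when both
    # posterior parameters exceed 1 (true here for a = b = 2).
    return (k + a - 1) / (N + a + b - 2)

# 0 heads in 2 tosses: the MLE says p(heads) = 0, but the MAP
# estimate under a Beta(2, 2) prior gives a more reasonable value
print(map_estimate(2, 0))  # 0.25
```

This addresses the small-sample drawback of MLE noted earlier: the prior keeps the estimate away from the extremes 0 and 1.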

slide-53
SLIDE 53

Q. Why is the posterior a probability distribution?

A. It is defined so that it gives the probability of θ for all possibilities, conditioned on the data
B. The sum or integral of such conditional probability should be 1
C. Both

slide-54
SLIDE 54

Q. Why is the posterior a probability distribution?

A. It is defined so that it gives the probability of θ for all possibilities, conditioned on the data
B. The sum or integral of such conditional probability should be 1
C. Both

slide-55
SLIDE 55

Assignments

* Read Chapter 9 of the textbook
* Next time: Bayesian inference

slide-56
SLIDE 56

Additional References

* Robert V. Hogg, Elliot A. Tanis and Dale L. Zimmerman. "Probability and Statistical Inference"
* Morris H. Degroot and Mark J. Schervish. "Probability and Statistics"

slide-57
SLIDE 57

See you next time