

slide-1
SLIDE 1

* Upon entry, students' speakers are muted for the quality of sound in the Zoom room.

* Please "raise hand" to speak; the audio will be unmuted for you.

* You can use "chat" to write a private note to the instructor.

* Can you see the poll question? Take it if you can. Check Piazza post #360.

* Don't share your screen during this lecture.
slide-2
SLIDE 2

Probability and Statistics for Computer Science

Covariance is coming back in matrix!

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 03.25.2020. Credit: wikipedia

cov(X, Y) = E[(X − E[X])(Y − E[Y])]

slide-3
SLIDE 3

Last time

Review of Maximum Likelihood Estimation (MLE)

Bayesian Inference (MAP)

Check out the discussion videos and the pdf file for MLE

slide-4
SLIDE 4

Content

Review of Bayesian inference

Visualizing high dimensional data & Summarizing data

Refresh of some linear algebra

The covariance matrix

slide-5
SLIDE 5

Bayesian inference for θ is p(θ|D). It is a probability distribution.

The maximum likelihood function L(θ) = P(D|θ) is a probability function but NOT a distribution.

Bayes Rule:

P(θ|D) = P(D|θ) P(θ) / P(D)

slide-6
SLIDE 6

Beta distribution

A distribution is a Beta distribution if it has the following pdf:

P(θ) = K(α, β) θ^(α−1) (1 − θ)^(β−1),  θ ∈ [0, 1]

K(α, β) = Γ(α + β) / (Γ(α)Γ(β))

It is an expressive family of distributions.

Beta(α = 1, β = 1) is uniform.

[Plot: pdf of the Beta distribution for Beta(1,1), Beta(5,5), Beta(50,50), Beta(70,70), Beta(20,50), Beta(0.5,0.5).]
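As a minimal sketch of the pdf on this slide, the formula can be evaluated directly with the standard library's gamma function (the helper name `beta_pdf` is ours, not from the course):

```python
# A sketch of the Beta pdf from the slide, using only the standard library.
# K(a, b) = Gamma(a + b) / (Gamma(a) * Gamma(b)) is the normalizing constant.
from math import gamma

def beta_pdf(theta, a, b):
    """P(theta) = K(a, b) * theta^(a-1) * (1 - theta)^(b-1)."""
    K = gamma(a + b) / (gamma(a) * gamma(b))
    return K * theta ** (a - 1) * (1 - theta) ** (b - 1)

# Beta(1, 1) is the uniform distribution: the pdf is 1 everywhere on [0, 1]
assert abs(beta_pdf(0.3, 1, 1) - 1.0) < 1e-12
```

Changing α and β reshapes the density over [0, 1], which is what makes the family expressive.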

slide-7
SLIDE 7

Beta distribution as the conjugate prior for Binomial likelihood

The likelihood is Binomial(N, k). The Beta distribution is used as the prior:

P(θ) = K(α, β) θ^(α−1) (1 − θ)^(β−1),  θ ∈ [0, 1]  (and P(θ) = 0 otherwise)

P(D|θ) = C(N, k) θ^k (1 − θ)^(N−k)

So the posterior is

P(θ|D) ∝ θ^(α+k−1) (1 − θ)^(β+N−k−1)

which is Beta(α + k, β + N − k):

P(θ|D) = K(α + k, β + N − k) θ^(α+k−1) (1 − θ)^(β+N−k−1)

slide-8
SLIDE 8

The posterior for this example, P(θ|D), is a Beta distribution:

P(θ|D) = K(α + k, β + N − k) θ^(α+k−1) (1 − θ)^(β+N−k−1)

It is continuous, and θ ∈ [0, 1] is the random variable.

The Binomial distribution, in contrast,

P(X = k) = C(N, k) θ^k (1 − θ)^(N−k)

is discrete, and k ≥ 0 is the random variable. That is why P(θ|D) is not Binomial.

slide-9
SLIDE 9

The update of Bayesian posterior

Since the posterior is in the same family as the conjugate prior, the posterior can be used as a new prior if more data is observed.

Suppose we start with a uniform prior (α = 1, β = 1) on the probability θ of heads:

N     k     α     β
-     -     1     1
3     0     1     4
10    7     8     7
30    17    25    20
100   72    97    48
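The update rule behind the table, (α, β) → (α + k, β + N − k), can be sketched in a few lines; the batch list below is just the table's rows replayed in order:

```python
# Conjugate updating: starting from a uniform Beta(1, 1) prior, each batch of
# N flips with k heads updates (alpha, beta) -> (alpha + k, beta + N - k).
alpha, beta = 1, 1                                 # uniform prior
batches = [(3, 0), (10, 7), (30, 17), (100, 72)]   # (N, k) per batch

rows = []
for N, k in batches:
    alpha, beta = alpha + k, beta + N - k
    rows.append((N, k, alpha, beta))

# Reproduces the table on this slide
assert rows == [(3, 0, 1, 4), (10, 7, 8, 7), (30, 17, 25, 20), (100, 72, 97, 48)]
```

Each posterior becomes the prior for the next batch, which is exactly what "the posterior can be used as a new prior" means.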

slide-10
SLIDE 10

Maximize the Bayesian posterior (MAP)

The posterior of the previous example is

P(θ|D) = K(α + k, β + N − k) θ^(α+k−1) (1 − θ)^(β+N−k−1)

Differentiating and setting to 0 gives the MAP estimate

θ̂ = (α − 1 + k) / (α + β − 2 + N)
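The closed-form MAP estimate can be sanity-checked against a brute-force grid maximization of the unnormalized posterior (the grid approach is our illustration, not part of the slides):

```python
# Check the closed-form MAP estimate against a brute-force maximization of
# the unnormalized posterior theta^(a+k-1) * (1-theta)^(b+N-k-1).
def map_estimate(a, b, N, k):
    return (a - 1 + k) / (a + b - 2 + N)

a, b, N, k = 1, 1, 10, 7          # uniform prior, 7 heads in 10 flips
posterior = lambda t: t ** (a + k - 1) * (1 - t) ** (b + N - k - 1)

grid = [i / 10000 for i in range(1, 10000)]
brute = max(grid, key=posterior)

# With a uniform prior the MAP estimate equals the MLE, k / N = 0.7
assert abs(map_estimate(a, b, N, k) - brute) < 1e-3
```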

slide-11
SLIDE 11

Conjugate prior for other likelihood functions

What is the conjugate prior if the likelihood is Bernoulli or geometric? Beta

What is the conjugate prior if the likelihood is Poisson or Exponential? Gamma

What is the conjugate prior if the likelihood is normal with known variance? Normal

slide-12
SLIDE 12

Content

Review of Bayesian inference

Visualizing high dimensional data & Summarizing data

Refresh of some linear algebra

The covariance matrix

slide-13
SLIDE 13

A data set with 7 dimensions

Seed data set from the UCI Machine Learning site:

    areaA   perimeterP  compactness  lengthKernel  widthKernel  asymmetry  lengthGroove  Label
1   15.26   14.84       0.871        5.763         3.312        2.221      5.22          1
2   14.88   14.57       0.8811       5.554         3.333        1.018      4.956         1
3   14.29   14.09       0.905        5.291         3.337        2.699      4.825         1
4   13.84   13.94       0.8955       5.324         3.379        2.259      4.805         1
5   16.14   14.99       0.9034       5.658         3.562        1.355      5.175         1
6   14.38   14.21       0.8951       5.386         3.312        2.462      4.956         1
7   14.69   14.49       0.8799       5.563         3.259        3.586      5.219         1
…

slide-14
SLIDE 14

Matrix format of a dataset in the textbook

Columns 1 … N are the data items; rows are the features (e.g., the first row is areaA).

d = number of features, so the matrix is d × N.

slide-15
SLIDE 15

Scatterplot matrix

Visualizing high dimensional data with a scatter plot matrix.

Limited to a small number of scatter plots.

[Scatterplot matrix of areaA, perimeterP, compactness, lengthKernel, widthKernel, asymmetry, lengthGroove.]

Red: seed type I. Blue: seed type II. Yellow: seed type III. 210 data points, 7 dimensions.

slide-16
SLIDE 16

3D scatter plot

We can also view the data set in 3 dimensions.

But it's still limited in terms of the number of dimensions we can see.

[3D scatter plot of areaA, perimeterP, compactness.]
slide-17
SLIDE 17

Summarizing multidimensional data

Location and spread parameters of a data set.

Notation:

Write {x} for a dataset consisting of N data items.

Each item xi is a d-dimensional vector: a column.

Write the jth component (feature) of xi as xi(j): a row in the textbook.

The matrix for the data set {x} is d by N.

slide-18
SLIDE 18

Mean of a multidimensional data set

We compute the mean of {x} by computing the mean of each component separately and stacking them into a vector.

We write the mean of {x} as

mean({x}) = (Σi xi) / N

mean of jth component = (Σi xi(j)) / N

slide-19
SLIDE 19

Example of mean of a multidimensional data set
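As a minimal sketch of such an example (the 2×5 data set below is our own small illustration, items as columns per the textbook convention):

```python
# Mean of a d x N data set: take the mean of each row (feature) and
# stack the results into a d-vector.
x = [[5, 4, 3, 2, 1],     # feature 1 of 5 items
     [-1, 1, 1, -1, 0]]   # feature 2 of the same 5 items

mean = [sum(row) / len(row) for row in x]
assert mean == [3.0, 0.0]
```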

slide-20
SLIDE 20

Content

Review of Bayesian inference

Visualizing high dimensional data & Summarizing data

Refresh of some linear algebra

The covariance matrix

slide-21
SLIDE 21

Why linear algebra?

We are entering part IV of the course.

The contents will be basic machine learning techniques.

Linear algebra is essential for a lot of machine learning methods!

slide-22
SLIDE 22

Eigenvalues and eigenvectors review

If A is an n×n square matrix, an eigenvalue λ and its corresponding eigenvector ν (of dimension n×1) satisfy Aν = λν.

To solve for λ, we solve the characteristic equation

|A − λI| = 0

Given a value of λ, we solve for ν by solving

(A − λI)ν = 0

Note if ν is an eigenvector, then so is any multiple kν.
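For a 2×2 matrix the characteristic equation is just a quadratic, so the review above can be sketched in plain Python (the helper `eig2x2` is ours, written for this illustration):

```python
# Eigenvalues of a 2x2 matrix [[a, b], [c, d]] via the characteristic
# equation lambda^2 - trace*lambda + det = 0, solved by the quadratic formula.
from math import sqrt

def eig2x2(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# The symmetric matrix used on the next slides: A = [[5, 3], [3, 5]]
lam1, lam2 = eig2x2(5, 3, 3, 5)
assert (lam1, lam2) == (8.0, 2.0)
```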

slide-23
SLIDE 23

Eigenvalues and eigenvectors example

Find the eigenvalues and eigenvectors of

A = [5 3; 3 5]

|A − λI| = 0:

(5 − λ)(5 − λ) − 9 = 0

(λ − 8)(λ − 2) = 0

λ1 = 8, λ2 = 2

What's special about this A? It is symmetric, and positive definite (all λ > 0).

slide-24
SLIDE 24

Eigenvalues and eigenvectors example

Find the eigenvectors of A = [5 3; 3 5] by solving (A − λI)ν = 0.

λ1 = 8: (A − 8I)ν = 0 gives ν ∝ [1, 1]^T, so u1 = (1/√2)[1, 1]^T

λ2 = 2: (A − 2I)ν = 0 gives ν ∝ [1, −1]^T, so u2 = (1/√2)[1, −1]^T

slide-25
SLIDE 25

Eigenvalues and eigenvectors example (2)

Find the eigenvalues and eigenvectors of

A = [1 2; 2 4]

|A − λI| = 0:

(1 − λ)(4 − λ) − 4 = 0

λ(λ − 5) = 0

λ1 = 5, λ2 = 0

What's special about this A? det(A) = Π λi = 0, and all λi ≥ 0: it is positive semi-definite.
slide-26
SLIDE 26

Eigenvalues and eigenvectors example

Find the eigenvectors of

A = [1 2; 2 4]

λ1 = 5: (A − 5I)ν = 0 gives ν ∝ [1, 2]^T, so u1 = (1/√5)[1, 2]^T

λ2 = 0: Aν = 0 gives ν ∝ [2, −1]^T, so u2 = (1/√5)[2, −1]^T
slide-27
SLIDE 27

Diagonalization of a symmetric matrix

If A is an n×n symmetric square matrix, the eigenvalues are real.

If the eigenvalues are also distinct, their eigenvectors are orthogonal.

We can then scale the eigenvectors to unit length, and place them into an orthogonal matrix U = [u1 u2 … un].

We can write the diagonal matrix Λ = U^T A U such that the diagonal entries of Λ are λ1, λ2 … λn in that order.

Why do we do this?
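A quick numeric sketch of Λ = U^T A U, using the example matrix A = [5 3; 3 5] and its unit eigenvectors (the `matmul` helper is ours, kept stdlib-only):

```python
# Diagonalizing A = [[5, 3], [3, 5]]: with U whose columns are the unit
# eigenvectors u1 = (1/sqrt 2)[1, 1], u2 = (1/sqrt 2)[1, -1],
# U^T A U should come out as diag(8, 2).
from math import sqrt

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[5, 3], [3, 5]]
s = 1 / sqrt(2)
U = [[s, s], [s, -s]]                           # columns are u1, u2
Ut = [[U[j][i] for j in range(2)] for i in range(2)]

Lam = matmul(Ut, matmul(A, U))
assert abs(Lam[0][0] - 8) < 1e-9 and abs(Lam[1][1] - 2) < 1e-9
assert abs(Lam[0][1]) < 1e-9 and abs(Lam[1][0]) < 1e-9
```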

slide-28
SLIDE 28

Diagonalization example

For A = [5 3; 3 5]:

λ1 = 8, u1 = (1/√2)[1, 1]^T

λ2 = 2, u2 = (1/√2)[1, −1]^T

With U = [u1 u2],

Λ = U^T A U = [8 0; 0 2]

slide-29
SLIDE 29

Q. Are these two vectors orthogonal?

V1 = [3 6], V2 = [−2 1]

A. Yes
B. No

V1 · V2 = 3 × (−2) + 6 × 1 = 0, so they are orthogonal. (Answer: A)
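The quiz check is a one-line dot product:

```python
# Orthogonality check: two vectors are orthogonal iff their dot product is 0.
v1, v2 = [3, 6], [-2, 1]
dot = sum(a * b for a, b in zip(v1, v2))
assert dot == 0   # 3*(-2) + 6*1 = 0, so orthogonal
```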
slide-30
SLIDE 30

Q. Is this true?

When two zero-mean vectors of data are orthogonal, they are uncorrelated.

A. Yes
B. No

Since mean({x}) = mean({y}) = 0, we have Σi (xi − mean({x}))(yi − mean({y})) = Σi xi yi = 0, so the correlation is 0. (Answer: A)

slide-31
SLIDE 31

Content

Review of Bayesian inference

Visualizing high dimensional data & Summarizing data

Refresh of some linear algebra

The covariance matrix

slide-32
SLIDE 32

Covariance

The covariance of random variables X and Y is

cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Note that

cov(X, X) = E[(X − E[X])²] = var[X]

slide-33
SLIDE 33

Correlation coefficient is normalized covariance

The correlation coefficient is

corr(X, Y) = cov(X, Y) / (σX σY)

When X, Y take on values with equal probability to generate data sets {(x, y)}, the correlation coefficient will be as seen in Chapter 2.

slide-34
SLIDE 34

Covariance seen from scatter plots

Positive covariance. Negative covariance. Zero covariance.

Credit: Prof. Forsyth

slide-35
SLIDE 35

Covariance for a pair of components in a data set

For the jth and kth components of a data set {x}:

cov({x}; j, k) = Σi (xi(j) − mean({x(j)})) (xi(k) − mean({x(k)})) / N

slide-36
SLIDE 36

Covariance of a pair of components

cov({x}; 3, 5)

Data set {x}: a 7×8 matrix (7 features, 8 items).

Take each row (component) of a pair and subtract its row mean, then take the inner product of the two resulting rows and divide by the number of columns.

slide-37
SLIDE 37

Covariance of a pair of components

Data set {x}: 7×8.

How many pairs of rows are there for which we can compute the covariance?

A) 49
B) 64
C) 56

There are 7 rows (features), so 7 × 7 = 49 ordered pairs; when j = k, cov({x}; j, j) is just var({x(j)}). (Answer: A)

slide-38
SLIDE 38

Covariance matrix

Data set {x}: 7×8. Covmat({x}) is 7×7: its (j, k) entry is cov({x}; j, k), e.g. cov({x}; 3, 5).

Note cov({x}; 3, 5) = cov({x}; 5, 3).

slide-39
SLIDE 39

Properties of Covariance matrix

cov({x}; j, j) = var({x(j)})

Covmat({x}) is 7×7.

The diagonal elements of the covariance matrix are just the variances of the jth components.

The off-diagonals are covariances between different components.
slide-40
SLIDE 40

Properties of Covariance matrix

The covariance matrix is symmetric:

cov({x}; j, k) = cov({x}; k, j)

And it's positive semi-definite, that is, all λi ≥ 0.

The covariance matrix is diagonalizable.
slide-41
SLIDE 41

Properties of Covariance matrix

If we define xc as the mean-centered matrix for dataset {x}, then

Covmat({x}) = (xc × xc^T) / N

The covariance matrix is a d×d matrix (here d = 7).

slide-42
SLIDE 42

Example: covariance matrix of a data set

A0 = [5 4 3 2 1; −1 1 1 −1 0]   (rows are X(1), X(2))

What are the dimensions of the covariance matrix of this data?

A) 2 by 2
B) 5 by 5
C) 5 by 2
D) 2 by 5

The covariance matrix is d × d, where d = # of features = 2. (Answer: A)

slide-43
SLIDE 43

Example: covariance matrix of a data set

(I) Mean centering:

A0 = [5 4 3 2 1; −1 1 1 −1 0]

Subtracting the row means (3 and 0) gives

A1 = [2 1 0 −1 −2; −1 1 1 −1 0]

slide-44
SLIDE 44

Example: covariance matrix of a data set

(I) Mean centering: A1 = [2 1 0 −1 −2; −1 1 1 −1 0]

(II) Inner product of each pair of rows, A2 = A1 A1^T:

[1,1] = 10
[2,2] = 4
[1,2] = 0
slide-45
SLIDE 45

Example: covariance matrix of a data set

(I) Mean centering: A1 = [2 1 0 −1 −2; −1 1 1 −1 0]

(II) Inner product of each pair of rows: A2 = A1 A1^T = [10 0; 0 4]

(III) Divide the matrix by N, the number of items:

Covmat({x}) = (1/N) A2 = (1/5) [10 0; 0 4] = [2 0; 0 0.8]
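The three steps above can be sketched in plain Python (with the missing fifth entry of the second row of A0 taken as 0, consistent with the row inner products on the slide):

```python
# The three steps from the slides: (I) mean-center A0 to get A1,
# (II) form A2 = A1 A1^T, (III) divide by N.
A0 = [[5, 4, 3, 2, 1],
      [-1, 1, 1, -1, 0]]
N = len(A0[0])

# (I) mean centering
means = [sum(r) / N for r in A0]
A1 = [[v - m for v in r] for r, m in zip(A0, means)]
assert A1 == [[2.0, 1.0, 0.0, -1.0, -2.0], [-1.0, 1.0, 1.0, -1.0, 0.0]]

# (II) inner products of each pair of rows: A2 = A1 A1^T
A2 = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A1] for r1 in A1]
assert A2 == [[10.0, 0.0], [0.0, 4.0]]

# (III) divide by N
Covmat = [[v / N for v in r] for r in A2]
assert Covmat == [[2.0, 0.0], [0.0, 0.8]]
```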
slide-46
SLIDE 46

What do the data look like when Covmat({x}) is diagonal?

Covmat({x}) = (1/5) [10 0; 0 4] = [2 0; 0 0.8]

A0 = [5 4 3 2 1; −1 1 1 −1 0]: 5 data points in the X(1), X(2) plane.

When the covariance matrix A is already diagonal, Λ = U^T A U holds with U = I: the components are uncorrelated, and the scatter plot shows no linear trend between X(1) and X(2).
slide-47
SLIDE 47

Translation properties of mean and covariance matrix

Translating the data set translates the mean:

mean({x} + c) = mean({x}) + c

Translating the data set leaves the covariance matrix unchanged:

Covmat({x} + c) = Covmat({x})
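Both properties can be checked numerically on a small data set (the `mean_vec` and `covmat` helpers are our own sketches of the definitions from earlier slides, items as columns):

```python
# Numeric check of the translation properties of mean and covariance.
def mean_vec(x):
    return [sum(r) / len(r) for r in x]

def covmat(x):
    d, N = len(x), len(x[0])
    m = mean_vec(x)
    xc = [[v - mi for v in r] for r, mi in zip(x, m)]  # mean-centered rows
    return [[sum(xc[j][i] * xc[k][i] for i in range(N)) / N
             for k in range(d)] for j in range(d)]

x = [[5, 4, 3, 2, 1], [-1, 1, 1, -1, 0]]
c = [10, -3]                                           # translation vector
x_shift = [[v + ci for v in r] for r, ci in zip(x, c)]

assert mean_vec(x_shift) == [m + ci for m, ci in zip(mean_vec(x), c)]
assert covmat(x_shift) == covmat(x)
```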

slide-48
SLIDE 48

Translation properties of mean and covariance matrix

Proof:

Covmat({x + c})
= Σi (xi + c − mean({x + c}))(xi + c − mean({x + c}))^T / N
= Σi (xi + c − mean({x}) − c)(xi + c − mean({x}) − c)^T / N
= Σi (xi − mean({x}))(xi − mean({x}))^T / N
= Covmat({x})

slide-49
SLIDE 49

Linear transformation properties of mean and covariance matrix

Linearly transforming the data set linearly transforms the mean:

mean({Ax}) = A mean({x})

Linearly transforming the data set changes the covariance matrix quadratically:

Covmat({Ax}) = A Covmat({x}) A^T

(Compare the 1-D case: var(cx) = c² var(x).)

slide-50
SLIDE 50

Proof of linear transformation of covariance matrix

Covmat({Ax})
= Σi (Axi − mean({Ax}))(Axi − mean({Ax}))^T / N
= Σi (Axi − A mean({x}))(Axi − A mean({x}))^T / N
= Σi A (xi − mean({x}))(xi − mean({x}))^T A^T / N     (using (AB)^T = B^T A^T)
= A [ Σi (xi − mean({x}))(xi − mean({x}))^T / N ] A^T
= A Covmat({x}) A^T
slide-51
SLIDE 51

Dimension Reduction

Instead of showing more dimensions through visualization, it's a good idea to do dimension reduction in order to see the major features of the data set.

For example, principal component analysis helps find the major components of the data set.

PCA is essentially about finding eigenvectors of the covariance matrix.

slide-52
SLIDE 52

Assignments

Read Chapter 10 of the textbook.

Next time: PCA

slide-53
SLIDE 53

Additional References

✺ Robert V. Hogg, Elliot A. Tanis and Dale L. Zimmerman. "Probability and Statistical Inference"

✺ Morris H. Degroot and Mark J. Schervish. "Probability and Statistics"

slide-54
SLIDE 54

See you next time

See You!