SLIDE 1

Announcements

  • MATLAB Grader homework, emailed Thursday: 1 (of 9) homeworks due 21 April, binary graded; 2 due this week.
  • Jupyter homework?: translate MATLAB to Jupyter; contact TA Harshul (h6gupta@eng.ucsd.edu) or me. I would like this to happen.
  • "GPU" homework: NOAA climate data in Jupyter on datahub.ucsd.edu, 15 April.
  • Projects: any computer language. Podcast might work eventually.

Today:

  • Stanford CNN
  • Gaussian, Bishop 2.3
  • Gaussian Process 6.4
  • Linear regression 3.0-3.2

Wednesday 10 April: Stanford CNN, linear models for regression (Bishop 3), applications of Gaussian processes.


SLIDE 2

Bayes and Softmax (Bishop p. 198)

  • Bayes:
  • Classification of N classes:

$$p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\sum_{y' \in \mathcal{Y}} p(x, y')}$$

$$p(C_n \mid x) = \frac{p(x \mid C_n)\,p(C_n)}{\sum_{k=1}^{N} p(x \mid C_k)\,p(C_k)} = \frac{\exp(a_n)}{\sum_{k=1}^{N} \exp(a_k)}, \qquad a_n = \ln\big(p(x \mid C_n)\,p(C_n)\big)
$$
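Working in log space avoids under/overflow when evaluating this posterior. A minimal NumPy sketch (the log-joint values $a_n$ are made-up numbers for illustration):

```python
import numpy as np

def class_posterior(log_joint):
    """Softmax over a_n = ln(p(x|C_n) p(C_n)).

    Subtracting the max first is the standard trick to avoid
    overflow in exp(); it leaves the posterior unchanged.
    """
    a = log_joint - np.max(log_joint)
    e = np.exp(a)
    return e / e.sum()

# hypothetical log-joint values a_n for three classes
a = np.array([-3.2, -1.0, -2.1])
print(class_posterior(a))  # posteriors p(C_n | x), summing to 1
```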

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 2 - April 6, 2017

Parametric Approach: Linear Classifier


Image: an array of 32×32×3 numbers (3072 numbers total). Parameters, or weights: $W$.

$$f(x, W) = W x + b$$

with $x$: 3072×1, $W$: 10×3072, $b$: 10×1, so $f(x, W)$ is 10×1: 10 numbers giving class scores.

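To make the shape bookkeeping concrete, a NumPy sketch with random values standing in for a real image and learned parameters:

```python
import numpy as np

x = np.random.rand(3072)        # flattened 32x32x3 image, 3072x1
W = np.random.randn(10, 3072)   # one row of weights per class
b = np.random.randn(10)         # one bias per class

scores = W @ x + b              # f(x, W) = Wx + b
print(scores.shape)             # (10,) -- 10 class scores
```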

SLIDE 3

Softmax to Logistic Regression (Bishop p. 198)

$$p(C_1 \mid x) = \frac{p(x \mid C_1)\,p(C_1)}{\sum_{k=1}^{2} p(x \mid C_k)\,p(C_k)} = \frac{\exp(a_1)}{\sum_{k=1}^{2} \exp(a_k)} = \frac{1}{1 + \exp(-a)}, \qquad a = \ln \frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_2)\,p(C_2)}$$

Thus for binary classification we should use logistic regression: the two-class softmax reduces to the logistic sigmoid.
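A quick numerical check of this reduction, with arbitrary made-up log-joints $a_1, a_2$:

```python
import numpy as np

a1, a2 = 0.7, -0.4                         # arbitrary log-joints
softmax_p1 = np.exp(a1) / (np.exp(a1) + np.exp(a2))
a = a1 - a2                                # a = ln[p(x|C1)p(C1) / (p(x|C2)p(C2))]
sigmoid_p1 = 1.0 / (1.0 + np.exp(-a))
print(np.isclose(softmax_p1, sigmoid_p1))  # True
```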

SLIDE 4

Softmax with Gaussian (Bishop p. 198)

Assuming the class-conditional densities are Gaussian with a shared covariance, $p(x \mid C_n) = \mathcal{N}(x \mid \mu_n, \Sigma)$, it can be shown that $a_n$ is linear in $x$:

$$p(C_n \mid x) = \frac{p(x \mid C_n)\,p(C_n)}{\sum_{k=1}^{N} p(x \mid C_k)\,p(C_k)} = \frac{\exp(a_n)}{\sum_{k=1}^{N} \exp(a_k)}, \qquad a_n = \ln\big(p(x \mid C_n)\,p(C_n)\big)$$

$$a_n = w_n^T x + w_{n0}, \qquad w_n = \Sigma^{-1} \mu_n, \qquad w_{n0} = -\tfrac{1}{2} \mu_n^T \Sigma^{-1} \mu_n + \ln p(C_n)$$

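A sketch of these formulas for two invented Gaussian classes with a shared covariance (means, covariance, and priors are made up for illustration):

```python
import numpy as np

mu = np.array([[0.0, 0.0], [2.0, 1.0]])     # class means mu_n (one per row)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # shared covariance
prior = np.array([0.5, 0.5])                # p(C_n)

Sigma_inv = np.linalg.inv(Sigma)
w = mu @ Sigma_inv                          # rows are w_n = Sigma^{-1} mu_n
w0 = -0.5 * np.einsum('ni,ij,nj->n', mu, Sigma_inv, mu) + np.log(prior)

x = np.array([1.0, 0.5])
a = w @ x + w0                              # a_n = w_n^T x + w_n0
p = np.exp(a - a.max())
p /= p.sum()                                # softmax -> p(C_n | x)
print(p)
```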

SLIDE 5

Entropy (Bishop 1.6)

The entropy $H[x] = -\sum_x p(x)\,\ln p(x)$ is an important quantity in

  • coding theory
  • statistical physics
  • machine learning
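A NumPy one-liner for the discrete entropy, applied to two example distributions:

```python
import numpy as np

def entropy(p):
    """H = -sum p ln p, in nats; zero-probability entries contribute 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy(np.array([0.5, 0.5])))  # ln 2 ~ 0.693, maximal for 2 outcomes
print(entropy(np.array([1.0, 0.0])))  # 0.0, a certain outcome carries no information
```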
SLIDE 6

The Kullback-Leibler Divergence

$p$ is the true distribution; $q$ is the approximating distribution:

$$\mathrm{KL}(p \,\|\, q) = -\int p(x) \ln\frac{q(x)}{p(x)}\,dx \;\geq\; 0$$

Note that KL is not a distance measure: it is not symmetric, $\mathrm{KL}(p\|q) \neq \mathrm{KL}(q\|p)$.

SLIDE 7

KL homework

  • Support of $p$ and $q$: sum only over entries where both are $> 0$; don't patch the result with isnan/isinf.
  • After you pass, take your time to clean up. Get close to 50.
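In the spirit of the hint above (restrict the sum to the common support rather than patching NaN/Inf afterwards), a NumPy sketch of the discrete KL divergence; a MATLAB Grader version would be analogous:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p||q) = sum_x p(x) ln(p(x)/q(x)) for discrete p, q."""
    mask = (p > 0) & (q > 0)   # 'only > 0': avoids log(0) and 0*log(0)
    p, q = p[mask], q[mask]
    return np.sum(p * np.log(p / q))

p = np.array([0.4, 0.6, 0.0])
q = np.array([0.5, 0.25, 0.25])
print(kl_divergence(p, q))      # >= 0, and != kl_divergence(q, p)
```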
SLIDE 8

Lecture 3

  • Homework
  • Podcast lecture online
  • Next lectures:
    – I posted a rough plan.
    – It is flexible, though, so please come with suggestions.

SLIDE 9

Bayes for linear model

$$y = A w + n, \qquad n \sim \mathcal{N}(0, \Sigma_n) \;\Rightarrow\; y \sim \mathcal{N}(A w, \Sigma_n), \qquad \text{prior: } w \sim \mathcal{N}(0, \Sigma_w)$$

$$p(w \mid y) \propto p(y \mid w)\,p(w) \sim \mathcal{N}(w_p, \Sigma_p)$$

mean: $w_p = \Sigma_p A^T \Sigma_n^{-1} y$

covariance: $\Sigma_p^{-1} = A^T \Sigma_n^{-1} A + \Sigma_w^{-1}$

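A direct transcription of the posterior mean and covariance above into NumPy; the data, noise, and prior covariances are invented for illustration, and a numerically careful version would use a Cholesky solve rather than explicit inverses:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))               # design matrix
w_true = np.array([1.0, -2.0, 0.5])
y = A @ w_true + 0.1 * rng.normal(size=20) # noisy observations

Sigma_n = 0.1**2 * np.eye(20)              # noise covariance
Sigma_w = 1.0 * np.eye(3)                  # prior covariance on w

Sn_inv = np.linalg.inv(Sigma_n)
Sigma_p = np.linalg.inv(A.T @ Sn_inv @ A + np.linalg.inv(Sigma_w))
w_p = Sigma_p @ A.T @ Sn_inv @ y           # posterior mean

print(w_p)  # close to w_true for this much data
```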

SLIDE 10

Bayes’ Theorem for Gaussian Variables

  • Given
$$p(x) = \mathcal{N}(x \mid \mu, \Lambda^{-1}), \qquad p(y \mid x) = \mathcal{N}(y \mid A x + b, L^{-1})$$
  • we have
$$p(y) = \mathcal{N}(y \mid A\mu + b,\; L^{-1} + A \Lambda^{-1} A^T), \qquad p(x \mid y) = \mathcal{N}\big(x \mid \Sigma\{A^T L (y - b) + \Lambda \mu\},\; \Sigma\big)$$
  • where
$$\Sigma = (\Lambda + A^T L A)^{-1}$$
SLIDE 11

Contribution of the Nth data point, $x_N$

Sequential estimation of the mean (Bishop 2.3.5):

$$\mu_{ML}^{(N)} = \underbrace{\mu_{ML}^{(N-1)}}_{\text{old estimate}} + \underbrace{\frac{1}{N}}_{\text{correction weight}} \underbrace{\big(x_N - \mu_{ML}^{(N-1)}\big)}_{\text{correction given } x_N}$$
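This update is just a running average; a sketch on made-up data:

```python
import numpy as np

data = np.random.default_rng(1).normal(loc=3.0, size=100)

mu = 0.0
for N, x_N in enumerate(data, start=1):
    mu = mu + (x_N - mu) / N   # old estimate + weight * correction

print(mu, data.mean())          # identical up to floating-point error
```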
SLIDE 12

Bayesian Inference for the Gaussian (Bishop 2.3.6)

Assume $\sigma^2$ is known. Given i.i.d. data $\mathbf{x} = \{x_1, \ldots, x_N\}$, the likelihood function for $\mu$ is given by

$$p(\mathbf{x} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2\right)$$

  • This has a Gaussian shape as a function of $\mu$ (but it is not a distribution over $\mu$).

SLIDE 13

Bayesian Inference for the Gaussian (Bishop 2.3.6)

  • Combined with a Gaussian prior over $\mu$, $p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)$,
  • this gives the posterior

$$p(\mu \mid \mathbf{x}) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2)$$

$$\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}$$
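These two formulas in NumPy, with made-up prior and data; note how $\mu_N$ moves from the prior mean toward the sample mean as $N$ grows:

```python
import numpy as np

x = np.random.default_rng(2).normal(loc=0.8, scale=1.0, size=10)
N, sigma2 = len(x), 1.0   # known noise variance sigma^2
mu0, sigma02 = 0.0, 0.1   # prior N(mu_0, sigma_0^2)

mu_ml = x.mean()
mu_N = (sigma2 * mu0 + N * sigma02 * mu_ml) / (N * sigma02 + sigma2)
sigma2_N = 1.0 / (1.0 / sigma02 + N / sigma2)

print(mu_N, sigma2_N)     # posterior mean and variance over mu
```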

SLIDE 14

Bayesian Inference for the Gaussian (3)

  • Example: the posterior over $\mu$ for $N = 0, 1, 2$ and $10$ observed data points (the $N = 0$ curve is the prior).

SLIDE 15

Bayesian Inference for the Gaussian (4)

Sequential estimation: the posterior obtained after observing $N-1$ data points becomes the prior when we observe the Nth data point.

Conjugate prior: posterior and prior are in the same family. The prior is then called a conjugate prior for the likelihood function.

SLIDE 16

Gaussian Process (Bishop 6.4, Murphy ch. 15)

Observed targets are noisy versions of the latent function values:

$$t_n = y_n + \epsilon_n$$

where $y_n = y(x_n)$ is the latent function value at input $x_n$ and $\epsilon_n$ is Gaussian observation noise.


SLIDE 17

Gaussian Process (Murphy ch. 15): Training

The training targets $\mathbf{t}$ and test values $\mathbf{f}_*$ are jointly Gaussian:

$$\begin{pmatrix} \mathbf{t} \\ \mathbf{f}_* \end{pmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{pmatrix} K_{xx} + \sigma^2 I & K_{x*} \\ K_{x*}^T & K_{**} \end{pmatrix}\right)$$

slide-18
SLIDE 18

Gaussian Process (Murphy ch. 15)

A common kernel is the squared exponential (RBF, Gaussian) kernel:

$$\kappa(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$$

The conditional is Gaussian:

$$p(\mathbf{f}_* \mid \mathbf{t}) = \mathcal{N}\big(K_{x*}^T (K_{xx} + \sigma^2 I)^{-1} \mathbf{t},\;\; K_{**} - K_{x*}^T (K_{xx} + \sigma^2 I)^{-1} K_{x*}\big)$$
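Putting the pieces together, a minimal GP regression sketch using the RBF kernel and the Gaussian conditional above; the toy 1-D data and the length-scale, signal, and noise variances are hand-picked for illustration, not learned:

```python
import numpy as np

def rbf(xa, xb, ell=0.5, sf2=1.0):
    """Squared-exponential kernel k(x, x') = sf2 * exp(-(x-x')^2 / (2 ell^2))."""
    d = xa[:, None] - xb[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

# toy training data: t_n = y(x_n) + eps_n
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 20)
t = np.sin(x) + 0.1 * rng.normal(size=x.size)
xs = np.linspace(0, 5, 100)                  # test inputs

sigma2 = 0.1**2
Kxx = rbf(x, x) + sigma2 * np.eye(x.size)    # noisy training covariance
Kxs = rbf(x, xs)
Kss = rbf(xs, xs)

mean = Kxs.T @ np.linalg.solve(Kxx, t)       # posterior mean at xs
cov = Kss - Kxs.T @ np.linalg.solve(Kxx, Kxs)
std = np.sqrt(np.clip(np.diag(cov), 0, None))  # pointwise predictive std
print(mean[:5], std[:5])
```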