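SLIDE 1

PCA Wrap-Up: Projection perspective

$\tilde{x}_i = B z_i$, with $B$ orthonormal again.

Want to minimize the reconstruction error:

$J_M = \frac{1}{N} \sum_{i=1}^{N} \| x_i - \tilde{x}_i \|^2$

What are the optimal coordinates $z_i$ for $x_i$ w.r.t. $B$?

$\frac{\partial J_M}{\partial z_{ji}} = \frac{\partial J_M}{\partial \tilde{x}_i} \frac{\partial \tilde{x}_i}{\partial z_{ji}}$

$\frac{\partial J_M}{\partial \tilde{x}_i} = -\frac{2}{N} (x_i - \tilde{x}_i)^\top$

$\frac{\partial \tilde{x}_i}{\partial z_{ji}} = \frac{\partial}{\partial z_{ji}} \left( \sum_{m=1}^{M} z_{mi} b_m \right) = b_j$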

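SLIDE 2

So

$\frac{\partial J_M}{\partial z_{ji}} = -\frac{2}{N} (x_i - \tilde{x}_i)^\top b_j = -\frac{2}{N} \left( x_i - \sum_{m=1}^{M} z_{mi} b_m \right)^\top b_j$

Since $b_m^\top b_j = 0$ if $m \neq j$ (and $b_j^\top b_j = 1$):

$\frac{\partial J_M}{\partial z_{ji}} = -\frac{2}{N} \left( x_i^\top b_j - z_{ji} b_j^\top b_j \right) = -\frac{2}{N} \left( x_i^\top b_j - z_{ji} \right)$

Set to 0:

$z_{ji} = b_j^\top x_i$

A similar argument for the choice of $B$ (the basis) can be made, yielding again the $M$ largest eigenvectors, i.e. the eigenvectors of the data covariance that belong to the $M$ largest eigenvalues (see reading).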
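The result $z_{ji} = b_j^\top x_i$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic and centered, and the names `reconstruction_error`, `Z_opt`, and `Z_perturbed` are made up for this example. It builds $B$ from the top-$M$ eigenvectors of the covariance, uses the projected coordinates, and compares the error against perturbed coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 200, 5, 2

# Synthetic, centered data (rows are points x_i); purely illustrative.
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))
X = X - X.mean(axis=0)

# Orthonormal basis B (D x M): eigenvectors of the data covariance
# belonging to the M largest eigenvalues.
S = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
B = eigvecs[:, ::-1][:, :M]

def reconstruction_error(X, B, Z):
    """J_M = (1/N) * sum_i ||x_i - B z_i||^2, with the rows of Z as z_i."""
    return np.mean(np.sum((X - Z @ B.T) ** 2, axis=1))

# Optimal coordinates from the derivation: z_ji = b_j^T x_i.
Z_opt = X @ B
J_opt = reconstruction_error(X, B, Z_opt)

# Perturbing the coordinates should only increase the error.
Z_perturbed = Z_opt + 0.1 * rng.normal(size=Z_opt.shape)
J_perturbed = reconstruction_error(X, B, Z_perturbed)

# For this B, the error equals the sum of the discarded eigenvalues.
J_expected = eigvals[: D - M].sum()
```

Comparing `J_opt` with `J_perturbed` and `J_expected` ties the derivative argument to the eigenvalue picture from the reading.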

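SLIDE 3

Stochastic Neighbor Embedding (SNE)

$X$ (very high dim) $\to$ $Y$ (very low dim)

Define a conditional probability that encodes similarity:

$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$

This is in the high-dim space (points $x_i$, $x_j$, $x_k$).

Similarly, in the map $Y$:

$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}$

Ideally $p_{j|i} = q_{j|i}$ $\forall i, j$.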
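The two conditional distributions can be computed with a few lines of NumPy. This is a minimal sketch, not a reference implementation: the function names are hypothetical, the bandwidths $\sigma_i$ are simply passed in (the slides do not say how they are chosen), and the map version reuses the same code with $2\sigma^2 = 1$.

```python
import numpy as np

def squared_distances(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.maximum(d2, 0.0)  # clip tiny negatives from round-off

def conditional_p(X, sigma):
    """P[i, j] = p_{j|i}; sigma holds one bandwidth sigma_i per point."""
    d2 = squared_distances(X)
    logits = -d2 / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def conditional_q(Y):
    """Q[i, j] = q_{j|i} in the map: same form with 2 * sigma^2 = 1."""
    return conditional_p(Y, sigma=np.full(len(Y), np.sqrt(0.5)))

# Illustrative random points; sigma_i = 1 for every i is an assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))
Y = rng.normal(size=(6, 2))
P = conditional_p(X, sigma=np.ones(6))
Q = conditional_q(Y)
```

Each row of `P` and `Q` is a distribution over the other points, so the rows sum to one and the diagonal is zero.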

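SLIDE 4

Formalize this with KL divergence:

$KL(q \| p) = \sum_x q(x) \log \frac{q(x)}{p(x)}$

How different is the distribution $q$ from $p$?

Properties:

1. $KL(q \| p) \geq 0$

2. If $KL(q \| p) = 0$, then $q = p$

3. $KL(q \| p) \neq KL(p \| q)$ in general

Cost function:

$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$

$P_i$: conditional distribution over all other points $j$ given $i$, in $D$ dimensions. $Q_i$: same, but in $M$ dimensions.

To place points, find $y$ to minimize $C$ using gradient descent on $\frac{\partial C}{\partial y_i}$.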
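The three KL properties and the cost function can be tried on small discrete distributions. The sketch below is illustrative only: the vectors `q` and `p` are arbitrary examples, and `kl` and `sne_cost` are hypothetical helper names.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) = sum_x q(x) log(q(x) / p(x)) for discrete distributions."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0  # terms with q(x) = 0 contribute 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Two arbitrary example distributions over three outcomes.
q = np.array([0.7, 0.2, 0.1])
p = np.array([0.4, 0.4, 0.2])

kl_qp = kl(q, p)  # non-negative
kl_pq = kl(p, q)  # generally differs from kl_qp (asymmetry)
kl_qq = kl(q, q)  # zero when the distributions coincide

def sne_cost(P, Q):
    """C = sum_i KL(P_i || Q_i); row i of P and Q holds p_{.|i} and q_{.|i}."""
    return sum(kl(P[i], Q[i]) for i in range(len(P)))
```

Because the first argument carries the weights, `sne_cost` penalizes most heavily the pairs where $p_{j|i}$ is large but $q_{j|i}$ is small.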

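SLIDE 5

Symmetric SNE

Instead of conditionals $p_{j|i}$, $q_{j|i}$, define joint distributions $p_{ij}$, $q_{ij}$:

$q_{ij} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq l} \exp\left(-\|y_k - y_l\|^2\right)}$

with $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$.

For the high-dim space, outliers pose a challenge because the denominator will be large: $p_{ij}$ is small for every $j$, so $y_i$ becomes unimportant to the cost.

Instead:

$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$, so that $\sum_j p_{ij} > \frac{1}{2n}$ $\forall x_i$.

Yields a nicer gradient (this is symmetric SNE):

$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$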
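The symmetric SNE gradient can be checked against finite differences. The sketch below makes several assumptions that are not in the slides: random synthetic points, one shared bandwidth `sigma` for all points, and hypothetical function names. It builds the symmetrized $p_{ij}$, the Gaussian $q_{ij}$, and compares the analytic gradient with a numerical one.

```python
import numpy as np

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def joint_p(X, sigma=1.0):
    """p_ij = (p_{j|i} + p_{i|j}) / (2n), with one shared sigma (an assumption)."""
    W = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    cond = W / W.sum(axis=1, keepdims=True)  # cond[i, j] = p_{j|i}
    return (cond + cond.T) / (2.0 * len(X))

def joint_q(Y):
    """q_ij = exp(-||y_i - y_j||^2) / sum_{k != l} exp(-||y_k - y_l||^2)."""
    W = np.exp(-sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def cost(P, Y):
    """C = sum_{i != j} p_ij log(p_ij / q_ij)."""
    Q = joint_q(Y)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def grad(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j), for all i at once."""
    PQ = P - joint_q(Y)
    return 4.0 * (PQ.sum(axis=1)[:, None] * Y - PQ @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Y = 0.1 * rng.normal(size=(8, 2))
P = joint_p(X)

# Central finite-difference check of the analytic gradient.
G = grad(P, Y)
G_num = np.zeros_like(Y)
eps = 1e-6
for i in range(Y.shape[0]):
    for d in range(Y.shape[1]):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, d] += eps
        Ym[i, d] -= eps
        G_num[i, d] = (cost(P, Yp) - cost(P, Ym)) / (2 * eps)
```

A gradient descent step would then be `Y -= learning_rate * grad(P, Y)`, where `learning_rate` is a tuning choice the slides leave open.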

SLIDE 6

The crowding problem

Not enough space in lower dims.

In t-SNE we model the joint probability $q_{ij}$ using a Student-t distribution, which has heavier tails:

  • moderate distance in $X$ $\to$ big distance in $Y$ is OK

  • does not force moderate distances in $X$ to yield small distances in $Y$
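The effect of the heavier tails can be made concrete by comparing the two map kernels. The slide does not write out the t-SNE formula, so the sketch below assumes the usual choice of a Student-t with one degree of freedom, $q_{ij} \propto (1 + \|y_i - y_j\|^2)^{-1}$; the function names and the similarity level `s` are illustrative.

```python
import numpy as np

def gaussian_kernel(d):
    """Unnormalized SNE map similarity at map distance d."""
    return np.exp(-d ** 2)

def student_t_kernel(d):
    """Unnormalized t-SNE map similarity (Student-t, one degree of freedom)."""
    return 1.0 / (1.0 + d ** 2)

def joint_q_student_t(Y):
    """q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k != l} (1 + ||y_k - y_l||^2)^-1."""
    diff = Y[:, None, :] - Y[None, :, :]
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

# Map distance needed to reach the same (unnormalized) similarity level s.
s = 0.05
d_gauss = np.sqrt(-np.log(s))       # solves exp(-d^2) = s
d_student = np.sqrt(1.0 / s - 1.0)  # solves 1 / (1 + d^2) = s
```

For the same target similarity, the Student-t kernel places the pair farther apart in the map than the Gaussian does, which is the extra room that relieves crowding.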

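SLIDE 7

Autoencoders

Design a network that consumes $X$ and then reconstructs it.

Diagram: $x \in \mathbb{R}^{784}$ $\to$ (encode) $\mathbb{R}^{200}$ $\to$ $\mathbb{R}^{20}$ (bottleneck) $\to$ (decode) $\mathbb{R}^{200}$ $\to$ $\hat{x} \in \mathbb{R}^{784}$

Simplest version:

$\min_{W, V} \sum_i L(x_i, \hat{x}_i)$, with $L(x, \hat{x}) = \|x - \hat{x}\|^2$

where $z = Wx$ and $\hat{x} = Vz$.

Note: This is just linear dim reduction and should look familiar.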

SLIDE 8

Dimensions for $\hat{x} = V W x$: $V$ is $d \times m$, $W$ is $m \times d$, and $x$ is $d \times 1$ (so $z = Wx$ is $m \times 1$).
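The link between the linear autoencoder and PCA can be illustrated with plain gradient descent. This is a sketch under assumptions that are not in the slides: synthetic centered data, a mean squared error loss, and a hand-picked learning rate and step count. It trains $W$ ($m \times d$) and $V$ ($d \times m$) and compares the result with the PCA choice $V = B$, $W = B^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 200, 6, 2

# Synthetic centered data with decaying variance per direction (illustrative).
X = rng.normal(size=(N, d)) * np.array([2.0, 1.5, 1.0, 0.5, 0.3, 0.1])
X = X - X.mean(axis=0)

def loss(W, V):
    """Mean squared reconstruction error of x_hat = V W x."""
    X_hat = X @ W.T @ V.T
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

# Encoder W (m x d) and decoder V (d x m), small random initialization.
W = 0.1 * rng.normal(size=(m, d))
V = 0.1 * rng.normal(size=(d, m))
initial_loss = loss(W, V)

lr = 0.01  # hand-picked step size (an assumption)
for _ in range(2000):
    Z = X @ W.T      # codes z_i = W x_i, shape (N, m)
    R = Z @ V.T - X  # residuals x_hat_i - x_i, shape (N, d)
    grad_V = 2.0 / N * R.T @ Z
    grad_W = 2.0 / N * V.T @ R.T @ X
    V -= lr * grad_V
    W -= lr * grad_W
final_loss = loss(W, V)

# PCA reference: V = B, W = B^T, with B the top-m covariance eigenvectors.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / N)
B = eigvecs[:, ::-1][:, :m]
pca_loss = loss(B.T, B)
```

No linear encoder and decoder pair of this shape can beat `pca_loss` on centered data, so `final_loss` can only approach it from above.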

More next time.