SLIDE 1

Lecture 4: High Dimensionality & Random Projection (Sep 11, 2017)

Recall: data $X = [x_1, \ldots, x_n] \in \mathbb{R}^{p \times n}$; centered data $X_c = XH$, where $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ and $\mathbf{1} = [1, \ldots, 1]^T \in \mathbb{R}^n$.

$X_c \approx U_k \Sigma_k V_k^T$ is the best rank-$k$ approximation of $X_c$, where $U_k \in \mathbb{R}^{p \times k}$ and $V_k \in \mathbb{R}^{n \times k}$ have orthonormal columns and $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ with $\sigma_1 \ge \cdots \ge \sigma_k$.

  • PCA is given by $(U_k, \Sigma_k)$, with projection $Y = U_k^T X_c$: each column of $Y$ gives the new coordinates of one data point. $\Leftrightarrow$ eigenvalue decomposition of the covariance matrix $\frac{1}{n} X_c X_c^T = U_k \Lambda_k U_k^T$.
  • MDS is given by $(\Sigma_k, V_k)$, with data representation $\Sigma_k V_k^T \in \mathbb{R}^{k \times n}$. $\Leftrightarrow$ eigenvalue decomposition of the Gram matrix $\frac{1}{n} X_c^T X_c$; more generally, for a kernel matrix $K$ that is positive semi-definite ("kernel PCA"), decompose $HKH = V \Lambda_K V^T$.
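The SVD relations above can be verified numerically; here is a minimal numpy sketch (sizes are illustrative) checking that the covariance and Gram eigenvalues both recover the singular values, and that the PCA and MDS coordinates coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 5, 40, 2

X = rng.normal(size=(p, n))              # data matrix, one sample per column
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H = I - (1/n) 1 1^T
Xc = X @ H                               # centered data X_c = X H

# SVD route: X_c = U Sigma V^T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA route: eigendecomposition of the covariance (1/n) X_c X_c^T
evals = np.linalg.eigvalsh(Xc @ Xc.T / n)[::-1]   # descending order
assert np.allclose(s ** 2 / n, evals)

# MDS route: eigendecomposition of the Gram matrix (1/n) X_c^T X_c
gvals = np.linalg.eigvalsh(Xc.T @ Xc / n)
assert np.allclose(np.sort(gvals)[::-1][: len(s)], s ** 2 / n)

# PCA coordinates U_k^T X_c equal the MDS coordinates Sigma_k V_k^T
pca_coords = U[:, :k].T @ Xc
mds_coords = np.diag(s[:k]) @ Vt[:k]
assert np.allclose(pca_coords, mds_coords)
```

Both routes yield the same spectrum because the nonzero eigenvalues of $X_c X_c^T$ and $X_c^T X_c$ agree, each being $\sigma_i^2$.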

SLIDE 2

Problem: What about big data / high dimensionality?

  • Big data, $n \gg 1$: down-sample to $n' \ll n$ points; a good approximation can be computed on the restricted subsample.
  • High dimensionality, $p \gg n$: $X_c \in \mathbb{R}^{p \times n}$ is too big, so compute the Gram matrix $K = X_c^T X_c \in \mathbb{R}^{n \times n}$ instead. Is $K$ easy to approximate? Projection!
    e.g. $R = \frac{1}{\sqrt{d}} A \in \mathbb{R}^{d \times p}$, where $A_{ij} \sim N(0,1)$ i.i.d. Then $R X_c \in \mathbb{R}^{d \times n}$ with $d \ll p$, and $K_R = X_c^T R^T R X_c$ is a good approximation of $K$!
    Other choices for $A_{ij}$:
      • $A_{ij} = \pm 1$, each with probability $\frac{1}{2}$;
      • $A_{ij} = +\sqrt{3}$ with probability $\frac{1}{6}$, $0$ with probability $\frac{2}{3}$, $-\sqrt{3}$ with probability $\frac{1}{6}$ (sparse, with many zeros!).
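A small numpy sketch of the Gaussian and sparse projections above, checking that $K_R$ stays close to $K$ (the sizes and the error threshold are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, d = 5000, 50, 500                  # illustrative sizes with d << p

Xc = rng.normal(size=(p, n))             # stand-in for centered data
K = Xc.T @ Xc                            # exact Gram matrix, n x n

# Gaussian projection: R = A / sqrt(d), A_ij ~ N(0, 1), so E[R^T R] = I_p
R = rng.normal(size=(d, p)) / np.sqrt(d)
Kr = (R @ Xc).T @ (R @ Xc)               # K_R = X_c^T R^T R X_c

rel_err = np.linalg.norm(Kr - K) / np.linalg.norm(K)
assert rel_err < 0.6                     # loose sanity bound for this sketch

# sparse variant: +-sqrt(3) with prob. 1/6 each, 0 with prob. 2/3
A = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
               size=(d, p), p=[1 / 6, 2 / 3, 1 / 6])
Kr2 = (A @ Xc).T @ (A @ Xc) / d
assert np.linalg.norm(Kr2 - K) / np.linalg.norm(K) < 0.6
```

The key point is that $\mathbb{E}[R^T R] = I_p$, so $K_R$ is an unbiased estimate of $K$ whose fluctuations shrink as $d$ grows.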

SLIDE 3

Example: Human Genome Diversity Project (HGDP), http://www.cephb.fr/en/hgdp.panel.php

$n = 1064$ persons, $p = 644{,}258$ SNPs. $X \in \mathbb{R}^{p \times n}$, coded as $X_{ij} = 0$: "AA"; $1$: "AC"; $2$: "CC"; $9$: "Missing".

Removing the 21 persons with missing values leaves $X_c \in \mathbb{R}^{644{,}258 \times 1043}$.

Let $R \in \mathbb{R}^{d \times p}$ randomly select $d$ rows (SNPs) of $X_c$, and compute
$\tilde{X} = R X_c = R X H \approx \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^T$
for $d = 5\mathrm{k}, 10\mathrm{k}, 100\mathrm{k}, \ldots$

In all cases, $(\tilde{\Sigma}_{k,d}, \tilde{V}_{k,d})$ give good results! Here: PCA coordinates $\tilde{\Sigma}_{k,d} \tilde{V}_{k,d}^T \in \mathbb{R}^{k \times n}$ with $k = 2$.

Why does it work?
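Why row subsampling preserves the top PCA/MDS coordinates can be illustrated on synthetic data. This is a sketch: the sizes, the planted-structure model, and the 0.8 alignment threshold are our choices, standing in for the HGDP dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, d, k = 2000, 100, 400, 2           # toy stand-ins for p = 644,258, n = 1043

# synthetic data with k strong directions, mimicking population structure
X = 3 * rng.normal(size=(p, k)) @ rng.normal(size=(k, n)) + rng.normal(size=(p, n))
H = np.eye(n) - np.ones((n, n)) / n
Xc = X @ H

# R selects d random rows (SNPs) of X_c
rows = rng.choice(p, size=d, replace=False)
Xt = Xc[rows, :]                         # X~ = R X_c

# right singular vectors from full vs. subsampled data
_, _, Vt_full = np.linalg.svd(Xc, full_matrices=False)
_, _, Vt_sub = np.linalg.svd(Xt, full_matrices=False)

# cosines of principal angles between the two top-k right singular subspaces
cosines = np.linalg.svd(Vt_full[:k] @ Vt_sub[:k].T, compute_uv=False)
assert cosines.min() > 0.8               # the MDS embedding is nearly preserved
```

When the data have a few dominant directions, a random subset of rows already carries most of the signal, so $\tilde{V}_k$ spans nearly the same subspace as $V_k$.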
SLIDE 4

Johnson-Lindenstrauss Lemma

Setting: points $x_i \in \mathbb{R}^p$ with pairwise distances $d_{ij} = \|x_i - x_j\|$, $i, j = 1, \ldots, n$. Look for a transform $f: x_i \mapsto y_i \in \mathbb{R}^d$ with $d = O(\log n / \varepsilon^2)$ s.t., with high probability,
$(1 - \varepsilon)\|x_i - x_j\| \le \|y_i - y_j\| \le (1 + \varepsilon)\|x_i - x_j\|$ for all $i, j$.

Uniform $\varepsilon$-isometry: the relative metric distortion is uniformly bounded by $\varepsilon$, and $f$ is a random projection!

History: 1980s, Johnson and Lindenstrauss (Lipschitz extension of maps). Later simplified proofs by Sanjoy Dasgupta and Anupam Gupta, and by Dimitris Achlioptas. Computer-science applications: data compression, nearest-neighbor search.

Theorem. Given $\varepsilon \in (0,1)$, $n$, and $\alpha > 0$, let
$k \ge (4 + 2\alpha)\left(\frac{\varepsilon^2}{2} - \frac{\varepsilon^3}{3}\right)^{-1} \log n .$
Then for any $n$ points $x_i \in \mathbb{R}^D$ ($i = 1, \ldots, n$) there exists a map $f: \mathbb{R}^D \to \mathbb{R}^k$ s.t. for all $i, j$:
$(1 - \varepsilon)\|x_i - x_j\|^2 \le \|f(x_i) - f(x_j)\|^2 \le (1 + \varepsilon)\|x_i - x_j\|^2 . \quad (*)$
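The bound in the theorem depends only on $n$, $\varepsilon$, and $\alpha$, never on the ambient dimension $D$. A small sketch evaluating it (the function name is ours):

```python
import numpy as np

def jl_dim(n, eps, alpha=1.0):
    """Dimension bound from the theorem: k >= (4 + 2a) (eps^2/2 - eps^3/3)^-1 log n."""
    return int(np.ceil((4 + 2 * alpha) / (eps ** 2 / 2 - eps ** 3 / 3) * np.log(n)))

# the target dimension grows like log(n)/eps^2, independent of D
print(jl_dim(n=1043, eps=0.2))       # a few thousand, even though p = 644,258
print(jl_dim(n=10 ** 6, eps=0.2))    # grows only logarithmically in n
```

For the HGDP numbers this explains the slide: projecting $644{,}258$ SNPs down to a few thousand random dimensions already preserves all pairwise distances up to $\varepsilon$.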

SLIDE 5

(*) holds with probability at least $1 - n^{-\alpha}$, and $f$ can be found in randomized polynomial time (via random projections).

e.g. $f(x) = \frac{1}{\sqrt{k}} R x$, with $R = [r_1, \ldots, r_k]^T \in \mathbb{R}^{k \times D}$, $x \in \mathbb{R}^D$, where the $r_i$'s lie on the unit sphere of dimension $D - 1$; e.g. $r_i = a_i / \|a_i\|$ with $a_i = (a_{i1}, \ldots, a_{iD})$, $a_{ij} \sim N(0,1)$.
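An empirical check of the uniform $\varepsilon$-isometry for this construction, as a sketch with illustrative sizes. One assumption to flag: since our rows are normalized to unit length, $\mathbb{E}[(r_i \cdot x)^2] = \|x\|^2 / D$, so we scale by $\sqrt{D/k}$ to make norms match in expectation:

```python
import numpy as np

rng = np.random.default_rng(3)
D, k, n = 2000, 1000, 40

X = rng.normal(size=(n, D))              # n points in R^D, one per row

# rows r_i = a_i / ||a_i||, a_ij ~ N(0,1): uniform on the (D-1)-sphere
A = rng.normal(size=(k, D))
R = A / np.linalg.norm(A, axis=1, keepdims=True)

# with unit rows, E[(r_i . x)^2] = ||x||^2 / D, hence the sqrt(D/k) scale
Y = X @ (np.sqrt(D / k) * R).T           # projected points in R^k

def pdists(Z):
    """Pairwise Euclidean distances between the rows of Z."""
    G = Z @ Z.T
    sq = np.diag(G)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * G, 0.0))

iu = np.triu_indices(n, 1)
distortion = np.abs(pdists(Y)[iu] / pdists(X)[iu] - 1)
assert distortion.max() < 0.2            # uniform eps-isometry, empirically
```

All $\binom{40}{2}$ pairwise distances survive the drop from $D = 2000$ to $k = 1000$ dimensions with small relative distortion, exactly as the lemma predicts.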