Manifold Matching: Joint Optimization of Fidelity & Commensurability

7th Conference on Multivariate Distributions with Applications Manifold Matching: Joint Optimization of Fidelity & Commensurability Carey E. Priebe Department of Applied Mathematics & Statistics Johns Hopkins University August, 2010


SLIDE 1

7th Conference on Multivariate Distributions with Applications

Manifold Matching: Joint Optimization of Fidelity & Commensurability

Carey E. Priebe Department of Applied Mathematics & Statistics Johns Hopkins University August, 2010 Maresias, Brazil

SLIDE 2

Collaborators

David J. Marchette
Zhiliang Ma
Sancar Adali
&c.

Support: AFOSR, NSSEFF, ONR, HLTCOE, ASEE

SLIDE 6

Problem Formulation

Given $x_{i1} \sim \cdots \sim x_{ik} \sim \cdots \sim x_{iK}$, $i = 1, \ldots, n$

  • n objects are each measured under K different conditions
  • $x_{i1} \sim \cdots \sim x_{ik} \sim \cdots \sim x_{iK}$ denotes K matched feature vectors representing a single object $O_i$
  • $x_{ik} \in \Xi_k$
  • K new measurements $\{y_k\}_{k=1}^{K}$, $y_k \in \Xi_k$

Question

Are $\{y_k\}_{k=1}^{K}$ matched feature vectors representing a single object measured under K conditions?

SLIDE 11

Hypotheses

$$
\begin{array}{lccc}
 & \Xi_1 & \cdots & \Xi_K \\
\text{Object } O_1 & x_{11} & \sim \cdots \sim & x_{1K} \\
\vdots & \vdots & & \vdots \\
\text{Object } O_n & x_{n1} & \sim \cdots \sim & x_{nK}
\end{array}
$$

  • Each space $\Xi_k$ comes with a dissimilarity $\delta_k$, yielding dissimilarity matrices $\Delta_1, \ldots, \Delta_K$
  • Given new measurements $\{y_k\}_{k=1}^{K}$, we can obtain within-condition dissimilarities $\delta_k(y_k, x_{ik})$, $i = 1, \ldots, n$, $k = 1, \ldots, K$
  • Goal (K = 2): determine whether $y_1$ and $y_2$ are a match

$H_0 : y_1 \sim y_2$ versus $H_A : y_1 \nsim y_2$ (we control the probability of missing a true match)
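The quantities named on this slide can be computed in a few lines. The sketch below is illustrative only: it assumes each $\delta_k$ is Euclidean distance (the slides leave $\delta_k$ unspecified), and the function names and random toy data are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(X):
    """n x n matrix Delta_k of dissimilarities between the rows of X.

    Assumption: delta_k is Euclidean distance.
    """
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dissimilarities_to_new_point(y, X):
    """Vector (delta_k(y, x_ik)), i = 1..n, for one new measurement y."""
    return np.sqrt(((X - y) ** 2).sum(axis=1))

rng = np.random.default_rng(0)
n = 5
# Two conditions (K = 2) with different feature spaces: 3 and 4 features.
X = [rng.random((n, 3)), rng.random((n, 4))]
y = [rng.random(3), rng.random(4)]

Delta = [dissimilarity_matrix(Xk) for Xk in X]
delta_y = [dissimilarities_to_new_point(yk, Xk) for yk, Xk in zip(y, X)]
```

Note that only within-condition dissimilarities are computable here; nothing compares a point of $\Xi_1$ with a point of $\Xi_2$ directly.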

SLIDE 12

what are these “conditions” and what does it mean to be “matched”?

  • let condition be language for a text document, and “matched” mean “on the same topic”
  • let condition be modality for a photo, and “matched” mean “of the same person”
      – indoor lighting vs outdoor lighting
      – two cameras of different quality
      – passport photos and airport surveillance photos
  • let condition 1 be wiki text document and condition 2 be wiki hyperlink structure
  • let condition 1 be text document and condition 2 be photo
  • . . . or just a single space with multiple dissimilarities

SLIDE 13

(not matched)

The English is clear enough to lorry drivers — but the Welsh reads “I am not in the office at the moment. Send any work to be translated.”

<http://news.bbc.co.uk/2/hi/uk_news/wales/7702913.stm>

SLIDE 15

Manifold Matching I

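Conditional distributions are induced by maps $\pi_k$ from “object space” $\Xi$

[Diagram: object space $\Xi$ with maps $\pi_1, \ldots, \pi_K$ into the conditional spaces $\Xi_1, \ldots, \Xi_K$; annotated “$\exists\, \varphi$?”]

Conditional spaces $\Xi_k$ are not commensurate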

SLIDE 17

Dirichlet Setting

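Let $S^p$ be the standard p-simplex in $\mathbb{R}^{p+1}$
Let $\Xi_1 = S^p$ and $\Xi_2 = S^p$ (but the fact that the two spaces are the same is unknown to the algorithms ...)
Let $\alpha_i \sim_{iid} \mathrm{Dirichlet}(\mathbf{1})$ represent n “objects” or “topics”
Let $X_{ik} \sim_{iid} \mathrm{Dirichlet}(r\alpha_i + \mathbf{1})$ represent K languages (WCHs)

  • r controls “what it means to be matched” (document variability & translation quality analogy)

[Figure: two panels, $\Xi_1$ and $\Xi_2$, each showing $\alpha_i$ together with its matched point ($X_{i1}$ in $\Xi_1$, $X_{i2}$ in $\Xi_2$), annotated with $1/r$]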
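A minimal sketch of how data in this Dirichlet setting could be simulated. The function name and the default seed are hypothetical; the sampling distributions follow the definitions on the slide.

```python
import numpy as np

def sample_dirichlet_setting(n, p, r, K=2, seed=0):
    """Draw n objects alpha_i and K matched measurements X_ik per object.

    alpha_i ~ Dirichlet(1) on the p-simplex (p + 1 coordinates);
    X_ik | alpha_i ~ Dirichlet(r * alpha_i + 1), independently over k.
    Returns alpha with shape (n, p + 1) and X with shape (K, n, p + 1).
    """
    rng = np.random.default_rng(seed)
    alpha = rng.dirichlet(np.ones(p + 1), size=n)
    X = np.stack([
        np.vstack([rng.dirichlet(r * a + 1.0) for a in alpha])
        for _ in range(K)
    ])
    return alpha, X

# Larger r concentrates X_i1 and X_i2 around alpha_i, so matched pairs
# sit closer together; small r makes "matched" a weaker relationship.
alpha, X_tight = sample_dirichlet_setting(n=100, p=3, r=100)
_, X_loose = sample_dirichlet_setting(n=100, p=3, r=1)
gap_tight = np.linalg.norm(X_tight[0] - X_tight[1], axis=1).mean()
gap_loose = np.linalg.norm(X_loose[0] - X_loose[1], axis=1).mean()
```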

SLIDE 19

Manifold Matching II

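Matched points are used to define maps $\rho_k$ to the same space $\mathcal{X}$ (with distance d)

[Diagram: object space $\Xi$ maps via $\pi_1, \ldots, \pi_K$ into $\Xi_1, \ldots, \Xi_K$, which map via $\rho_1, \ldots, \rho_K$ into the common space $\mathcal{X} = \mathbb{R}^d$]

Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”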

SLIDE 20

canonical correlation

  • Multidimensional scaling yields high-dimensional embeddings: $\Delta_1 \to X'_1$ and $\Delta_2 \to X'_2$
  • Canonical correlation finds $U_1 : X'_1 \to X_1$ and $U_2 : X'_2 \to X_2$ to maximize correlation
  • Out-of-sample embedding: $y_1 \to y'_1$, $y_2 \to y'_2$
  • Both $\tilde{y}_1 = U_1^T y'_1$ and $\tilde{y}_2 = U_2^T y'_2$ are in $\mathbb{R}^d$ with same coordinate system (i.e., they are commensurate)
  • Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”
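The in-sample part of this pipeline can be sketched as follows. Several choices here are assumptions rather than anything stated on the slides: Euclidean distance as $\delta_k$, classical (Torgerson) MDS as the scaling step, and a QR/SVD formulation of canonical correlation. Out-of-sample embedding of $y_1, y_2$ is omitted, and all function names are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(Delta, dim):
    """Classical MDS: top-`dim` eigenpairs of the double-centered squared
    dissimilarities; rows of the result are the embedded points."""
    n = Delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Delta ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def cca(X1, X2, d):
    """Canonical correlation via QR + SVD.

    Returns U1, U2 (columns are canonical directions) and the top-d
    canonical correlations. Assumes both inputs have full column rank.
    """
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    Q1, R1 = np.linalg.qr(X1c)
    Q2, R2 = np.linalg.qr(X2c)
    A, s, Bt = np.linalg.svd(Q1.T @ Q2)
    U1 = np.linalg.solve(R1, A[:, :d])
    U2 = np.linalg.solve(R2, Bt.T[:, :d])
    return U1, U2, s[:d]

# Toy matched data from the Dirichlet setting (n objects, K = 2).
rng = np.random.default_rng(0)
n, p, r, m, d = 100, 3, 100, 3, 2
alpha = rng.dirichlet(np.ones(p + 1), size=n)
X = [np.vstack([rng.dirichlet(r * a + 1.0) for a in alpha]) for _ in range(2)]

# Delta_k -> X'_k (dimension m), then X'_k -> X_k (dimension d).
E1, E2 = [classical_mds(dissimilarity_matrix(Xk), m) for Xk in X]
U1, U2, corr = cca(E1, E2, d)
Z1 = (E1 - E1.mean(axis=0)) @ U1   # row i is U1^T applied to point i
Z2 = (E2 - E2.mean(axis=0)) @ U2
```

The rows of `Z1` and `Z2` live in a shared coordinate system, so the distance between row i of `Z1` and row i of `Z2` is meaningful, which is what commensurability requires.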

SLIDE 21

procrustes ◦ mds

  • Multidimensional scaling yields low-dimensional embeddings: $\Delta_1 \to X_1$ and $\Delta_2 \to X_2$
  • Procrustes$(X_1, X_2)$ yields $Q^* = \arg\min_{Q^T Q = I} \| X_1 - X_2 Q \|_F$
  • Out-of-sample embedding: $y_1 \to \tilde{y}_1$, $y_2 \to y'_2$
  • Both $\tilde{y}_1$ and $\tilde{y}_2 = Q^{*T} y'_2$ are in $\mathbb{R}^d$ with same coordinate system (i.e., they are commensurate)
  • Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”
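The orthogonal Procrustes step has a closed-form solution through the singular value decomposition. The sketch below assumes points are stored as rows of $X_1$ and $X_2$; the function name and the toy rotation are hypothetical.

```python
import numpy as np

def procrustes_rotation(X1, X2):
    """Q* = argmin over orthogonal Q of ||X1 - X2 Q||_F.

    With X2^T X1 = U S V^T, the minimizer is Q* = U V^T.
    """
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    return U @ Vt

# Toy check: X1 is an exactly rotated copy of X2, so Q* recovers the rotation.
rng = np.random.default_rng(0)
X2 = rng.standard_normal((50, 2))
theta = 0.7
Q0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
X1 = X2 @ Q0

Q = procrustes_rotation(X1, X2)

# A new point embedded in the second configuration is moved into the
# first configuration's coordinates with the same Q (row-vector form).
y2_prime = rng.standard_normal(2)
y2_tilde = y2_prime @ Q
```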

SLIDE 22

fidelity & commensurability

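Fidelity is how well the mapping preserves original dissimilarities; our within-condition fidelity error is given by

$$
\epsilon_{f_k} = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( d(\tilde{x}_{ik}, \tilde{x}_{jk}) - \delta_k(x_{ik}, x_{jk}) \right)^2 .
$$

Commensurability is how well the mapping preserves matchedness; our between-condition commensurability error is given by

$$
\epsilon_{c_{k_1 k_2}} = \frac{1}{n} \sum_{1 \le i \le n} \left( d(\tilde{x}_{i k_1}, \tilde{x}_{i k_2}) - \delta_{k_1 k_2}(x_{i k_1}, x_{i k_2}) \right)^2 .
$$

Alas, $\delta_{k_1 k_2}$ does not exist; however, our story seems to suggest that it might be reasonable to let $\delta_{k_1 k_2}(x_{i k_1}, x_{i k_2}) = 0$ for all $i, k_1, k_2$.

NB: There is also between-condition separability error given by

$$
\epsilon_{s_{k_1 k_2}} = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( d(\tilde{x}_{i k_1}, \tilde{x}_{j k_2}) - \delta_{k_1 k_2}(x_{i k_1}, x_{j k_2}) \right)^2 .
$$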
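As a rough illustration, the fidelity and commensurability errors can be evaluated for a given embedding as below. The sketch assumes d is Euclidean distance in the embedding space and uses the convention $\delta_{k_1 k_2}(x_{i k_1}, x_{i k_2}) = 0$ suggested above; function names and toy data are hypothetical.

```python
import numpy as np

def pairwise_distances(A, B):
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fidelity_error(X_emb, Delta):
    """Mean over pairs i < j of (d(x~_ik, x~_jk) - delta_k(x_ik, x_jk))^2."""
    n = Delta.shape[0]
    iu = np.triu_indices(n, k=1)          # the n-choose-2 pairs with i < j
    D = pairwise_distances(X_emb, X_emb)
    return ((D[iu] - Delta[iu]) ** 2).mean()

def commensurability_error(X_emb_1, X_emb_2):
    """Mean over i of d(x~_ik1, x~_ik2)^2, taking delta_{k1 k2} = 0
    for matched pairs."""
    return (np.linalg.norm(X_emb_1 - X_emb_2, axis=1) ** 2).mean()

rng = np.random.default_rng(0)
X = rng.random((6, 2))
Delta = pairwise_distances(X, X)

eps_f = fidelity_error(X, Delta)                       # identity embedding
eps_c_same = commensurability_error(X, X)              # perfectly aligned
eps_c_shift = commensurability_error(X, X + np.array([1.0, 0.0]))
```

In this toy case the identity embedding has zero fidelity error, and shifting one configuration by a unit vector leaves fidelity untouched while raising the commensurability error to 1, which shows that the two criteria measure different things.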

SLIDE 24

Methodological Comparison

  • canonical correlation optimizes commensurability without regard for fidelity
  • procrustes ◦ mds optimizes fidelity without regard for commensurability
  • compare: joint optimization of fidelity & commensurability . . .

SLIDE 25

Omnibus Embedding Approach

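$$
M_{2n \times 2n} = \begin{pmatrix} \Delta_1 & W \\ W^T & \Delta_2 \end{pmatrix},
\qquad \Delta_1, \Delta_2, W \in \mathbb{R}^{n \times n}
$$

Bordered with the out-of-sample dissimilarities for $y_1$ and $y_2$ (each of $u_1, v_1, u_2, v_2$ is $n \times 1$):

$$
\begin{array}{c|cc|cc}
 & & & y_1 & y_2 \\ \hline
 & \Delta_1 & W & u_1 & u_2 \\
 & W^T & \Delta_2 & v_1 & v_2 \\ \hline
y_1 & u_1^T & v_1^T & & \\
y_2 & u_2^T & v_2^T & &
\end{array}
$$

  • Under “matched” assumption, impute dissimilarities $\delta_{12}(x_{i_1 1}, x_{i_2 2})$ (the block W) to obtain an omnibus dissimilarity matrix M
  • Embed M as 2n points in $\mathbb{R}^d$
  • Let $u_{i1} = \delta_1(y_1, x_{i1})$ and $v_{i2} = \delta_2(y_2, x_{i2})$
  • Under $H_0 : y_1 \sim y_2$, impute $v_{i1} = \delta_{12}(y_1, x_{i2})$ and $u_{i2} = \delta_{12}(y_2, x_{i1})$
  • Out-of-sample embedding of $(u_1^T, v_1^T)^T$ and $(u_2^T, v_2^T)^T$ yields $\tilde{y}_1$ and $\tilde{y}_2$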
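A small sketch of the in-sample half of the omnibus approach. Two choices are assumptions, not taken from the slides: the between-condition block is imputed as $W = (\Delta_1 + \Delta_2)/2$, which gives matched pairs dissimilarity 0 on its diagonal, and classical MDS stands in for whatever embedding method the author used. Out-of-sample embedding of $y_1, y_2$ is not shown, and all names are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(Delta, dim):
    n = Delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Delta ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def omnibus_matrix(Delta1, Delta2):
    """2n x 2n matrix M = [[Delta1, W], [W^T, Delta2]].

    Assumption: W is imputed as the average of the two within-condition
    matrices, so W[i, i] = 0 for every matched pair.
    """
    W = 0.5 * (Delta1 + Delta2)
    return np.block([[Delta1, W], [W.T, Delta2]])

# Matched toy data from the Dirichlet setting.
rng = np.random.default_rng(0)
n, p, r, d = 100, 3, 100, 2
alpha = rng.dirichlet(np.ones(p + 1), size=n)
X1, X2 = [np.vstack([rng.dirichlet(r * a + 1.0) for a in alpha])
          for _ in range(2)]

M = omnibus_matrix(dissimilarity_matrix(X1), dissimilarity_matrix(X2))
Z = classical_mds(M, d)          # 2n points in R^d
Z1, Z2 = Z[:n], Z[n:]            # condition-1 and condition-2 embeddings

matched = np.linalg.norm(Z1 - Z2, axis=1)
unmatched = np.linalg.norm(Z1 - np.roll(Z2, 1, axis=0), axis=1)
```

Because both conditions are embedded at once, `Z1` and `Z2` share a coordinate system without a separate alignment step, and matched pairs end up closer on average than mismatched ones.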

SLIDE 26

Simulation Results

[Figure: ROC curves, power (β) against alpha (α), for pom, cca, and jofc; n=100, p=3, d=2, r=100, c=0.1, q=3]

Simulation results indicate that joint optimization of fidelity & commensurability via omnibus embedding approach is (for this case) superior to canonical correlation and procrustes◦mds

SLIDE 27

Spurious Correlation Phenomenon

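Let $\Xi_k = S^{p+q} = S^p \times S^q$; $S^p$ encodes “signal” and $S^q$ encodes “noise”
On $S^p$, let $\alpha_i \sim_{iid} \mathrm{Dirichlet}(\mathbf{1})$ and $X^1_{ik} \sim_{iid} \mathrm{Dirichlet}(r\alpha_i + \mathbf{1})$ (signal, as before)
On $S^q$, let $X^2_{ik} \sim_{iid} \mathrm{Dirichlet}(\mathbf{1})$ (pure noise)
For $c \in [0, 1]$, let $X_{ik} = [(1 - c) X^1_{ik},\; c X^2_{ik}]$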
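The signal-plus-noise construction can be simulated directly. This is a sketch under the definitions on the slide; the function name and seed are hypothetical, and the parameter values in the usage line are the ones reported with the simulation results.

```python
import numpy as np

def sample_signal_noise(n, p, q, r, c, K=2, seed=0):
    """Concatenate a scaled signal block and a scaled noise block.

    Signal: X1_ik | alpha_i ~ Dirichlet(r * alpha_i + 1) on S^p.
    Noise:  X2_ik ~ Dirichlet(1) on S^q, independent of alpha_i.
    Each block is drawn on its own simplex, so every row of the result
    has (p + 1) + (q + 1) coordinates that sum to 1.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.dirichlet(np.ones(p + 1), size=n)
    X = []
    for _ in range(K):
        signal = np.vstack([rng.dirichlet(r * a + 1.0) for a in alpha])
        noise = rng.dirichlet(np.ones(q + 1), size=n)
        X.append(np.hstack([(1 - c) * signal, c * noise]))
    return alpha, np.stack(X)

alpha, X = sample_signal_noise(n=100, p=3, q=3, r=100, c=0.1)
```

The noise coordinates carry no information about matchedness, so any correlation a method finds between the noise blocks of the two conditions is spurious by construction.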

SLIDE 28

Incommensurability Phenomenon I

[Figure: four panels F1, F2, F3, F4, each showing points labeled 1 to 4, with arrows ⇒ ⇓ ⇐ ⇑]

SLIDE 29

Incommensurability Phenomenon II

[Figure (Dirichlet setting): two panels titled “Scale & Polarity” and “Procrustes”; axis labels “Sqrt of Commensurability Error” with ||M1 − M2P*sa|| and ||M1 − M2Q|| respectively]

SLIDE 30

Experimental Data

Wikipedia Documents

  • Wikipedia is a free, multilingual encyclopedia project
  • 13 million articles (2.9 million in the English Wikipedia) have been written collaboratively by volunteers around the world
  • A Wikipedia document has information regarding
      ◮ textual content of the document
      ◮ links in the document to other documents
  • Consider a subset of English and French Wikipedias that are 1-1 correspondent
  • We take the (directed) 2-neighborhood of the document “Algebraic Geometry” in the English Wikipedia, with the associated documents in the French Wikipedia (n = 1382)

SLIDE 31

Experimental Results

[Figure: ROC curves, power (β) against alpha (α), for pom, cca, and jofc]

Experimental results indicate that joint optimization of fidelity & commensurability via omnibus embedding approach is (for this case) superior to canonical correlation and procrustes◦mds

SLIDE 32

Exploitation Task: Classification

SLIDE 33

Integrated Sensing and Processing

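[Diagram: object space $\Xi$ with maps $\pi_1(\theta_1), \ldots, \pi_K(\theta_K)$ into $\Xi_1, \ldots, \Xi_K$; $T_1, \ldots, T_K$ and spaces $\mathbb{R}^{d_1}, \ldots, \mathbb{R}^{d_K}$; maps $\rho_1, \ldots, \rho_K$ into a common $\mathbb{R}^d$; further labels: ISP, $\Delta(\theta)$, MM, W, Fidelity, Commensurability, Separability]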

SLIDE 34

Kronecker Quote

“The wealth of your practical experience with sane and interesting problems will give to mathematics a new direction and a new impetus.”

– Leopold Kronecker to Hermann von Helmholtz –
