SLIDE 1

Estimation of Mixture Subspace Models – Its Algebra, Statistics, and Compressed Sensing

Allen Y. Yang <yang@eecs.berkeley.edu>

Berkeley DSP Seminar, Nov. 30, 2007

SLIDE 2

Motivation

Data from modern applications are often multimodal and multivariate: subsets of the data are samples from different distribution models.

Examples: face recognition, handwritten digits, hyperspectral images, natural image segmentation, linear switching systems, human kinesiology (motion-tracker movie).

SLIDE 3

Next-Generation Heterogeneous Sensor Networks

Figure: (a) habitat surveillance; (b) smart camera sensor; (c) wearable sensors; (d) mobile sensor network.

SLIDES 4-6

Estimation of Mixture Models in Vision and Learning

1. Simultaneous segmentation and estimation of mixture models:
   - How do we determine a class of models and the number of models?
   - How do we stay robust to high noise and outliers?
   - What is the purpose of segmentation for higher-level applications, e.g., motion segmentation, image categorization, object recognition?

2. New paradigms for distributed pattern recognition:
   - Centralized recognition: powerful processors; (virtually) unlimited memory; (virtually) unlimited bandwidth; controlled observations; simple sensor management.
   - Distributed recognition: mobile processors; limited onboard memory; band-limited communications; high percentage of occlusion/outliers; complex sensor networks.

SLIDES 7-8

Outline

We investigate two distinct frameworks:

1. Unsupervised segmentation and estimation via GPCA: segment samples drawn from A = V1 ∪ V2 ∪ ... ∪ VK in R^D, and estimate the subspace models.

2. Supervised recognition via compressed sensing: assume training examples {A1, ..., AK} for K subspaces. Given a test sample y ∈ V1 ∪ V2 ∪ ... ∪ VK, determine its membership label(y) ∈ {1, 2, ..., K} via a global sparse representation.

SLIDES 9-10

Literature Overview

1. Unsupervised segmentation:
   - PCA [Pearson 1901; Eckart-Young 1930; Hotelling 1933; Jolliffe 1986]
   - EM [Dempster 1977; McLachlan 1997]
   - RANSAC [Fischler 1981; Torr 1997; Schindler 2005]

2. Supervised classification:
   - Nearest neighbors
   - Nearest subspaces [Kriegman 2003]

References:
- Generalized Principal Component Analysis. SIAM Review, preprint.
- Image Segmentation using Mixture Subspace Models. CVIU, preprint.
- Classification of Mixture Subspace Models via Compressed Sensing. Submitted to PAMI.
- http://www.eecs.berkeley.edu/~yang/

SLIDES 11-16

Generalized Principal Component Analysis

"If one wishes to shrink it, one must first expand it." - Lao Tzu

1. For a single subspace V ⊂ R^D: dimension d := dim(V) and codimension c := dim(V⊥) = D − d. Running example in R^3 (a plane V1 and a line V2, with axes x1, x2, x3):
   V1: x3 = 0 (so V1⊥ is spanned by one linear form);
   V2: (x1 = 0) & (x2 = 0) (so V2⊥ is spanned by two linear forms).

2. For the subspace arrangement A = V1 ∪ V2 and any z = (x1, x2, x3)^T:
   z ∈ V1 ∪ V2 ⇔ {x3 = 0} | {(x1 = 0) & (x2 = 0)}.

3. By De Morgan's law, this is equivalent to a system of second-degree polynomial equations:
   {x3 = 0} | {(x1 = 0) & (x2 = 0)} ⇔ (x1 x3 = 0) & (x2 x3 = 0).

4. Vanishing polynomials: p1 = x1 x3, p2 = x2 x3.

5. Question: how many linearly independent Kth-degree polynomials are there for K subspaces?
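A minimal numeric check of items 3-4, in Python (an illustration added here, not part of the original slides): samples drawn from the plane V1 and the line V2 satisfy both vanishing polynomials, while generic points do not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from the plane V1 = {x3 = 0} and the line V2 = {x1 = x2 = 0} in R^3.
plane = np.column_stack([rng.standard_normal(50), rng.standard_normal(50), np.zeros(50)])
line = np.column_stack([np.zeros(50), np.zeros(50), rng.standard_normal(50)])
generic = rng.standard_normal((50, 3))  # points in neither subspace (with prob. 1)

def on_arrangement(z, tol=1e-12):
    """z lies on V1 ∪ V2 iff both p1 = x1*x3 and p2 = x2*x3 vanish at z."""
    x1, x2, x3 = z
    return abs(x1 * x3) < tol and abs(x2 * x3) < tol

assert all(on_arrangement(z) for z in plane)
assert all(on_arrangement(z) for z in line)
assert not any(on_arrangement(z) for z in generic)
print("p1 = x1*x3, p2 = x2*x3 vanish exactly on V1 ∪ V2")
```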

SLIDES 17-20

Equivalence Relation

1. The equivalence between a subspace arrangement and its Kth-degree vanishing polynomials:
   - Trivial direction: the products of 1-forms p1 = x1 x3, p2 = x2 x3 uniquely determine V1 ∪ V2.
   - Nontrivial direction: given V1 ∪ V2, p1 = x1 x3 and p2 = x2 x3 are generators for all vanishing polynomials of arbitrary degree.

2. Under a general position condition, the number of such polynomials is a combinatorial invariant [Jessica Sidman 2002; Harm Derksen 2005]:

   h(K) = Σ_{S} (−1)^{|S|} · C(K + D − 1 − c_S, D − 1 − c_S),

   where S ⊆ {1, ..., K} is an index set and C(·, ·) denotes the binomial coefficient.

3. Example: the number of linearly independent 3rd-degree vanishing polynomials for 3 mixture subspaces (the four possible configurations in R^3):

   d1 d2 d3 | h(3)
    2  2  2 |  1
    2  2  1 |  2
    2  1  1 |  4
    1  1  1 |  7

Punch line: the set of complete, linearly independent Kth-degree vanishing polynomials, as a global signature, uniquely determines the mixture of K subspaces.
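The table can be checked numerically: embed exact samples from random subspaces with the degree-3 Veronese map and count the corank of the embedded data matrix. A small Python sketch under the general-position assumption (random subspaces; the helper names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(1)
D, deg = 3, 3  # ambient dimension; embedding degree = number of subspaces K
monomials = list(combinations_with_replacement(range(D), deg))  # 10 cubic monomials

def veronese(x):
    return np.array([np.prod(x[list(m)]) for m in monomials])

def num_vanishing(dims, per_subspace=60):
    """Independent degree-3 vanishing polynomials for random subspaces of the
    given dimensions: the corank of the embedded data matrix L3."""
    pts = []
    for d in dims:
        basis = np.linalg.qr(rng.standard_normal((D, d)))[0]  # random d-dim subspace
        pts += [basis @ rng.standard_normal(d) for _ in range(per_subspace)]
    L3 = np.array([veronese(x) for x in pts])                 # N x 10
    return L3.shape[1] - np.linalg.matrix_rank(L3, tol=1e-8)

for dims in [(2, 2, 2), (2, 2, 1), (2, 1, 1), (1, 1, 1)]:
    print(dims, "->", num_vanishing(dims))  # expect 1, 2, 4, 7 as in the table
```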

SLIDE 21

Estimation of Vanishing Polynomials

1. Vanishing polynomials are estimated via the Veronese embedding. Given N samples x1, ..., xN ∈ R^3, form

   L2 := [ν2(x1), ..., ν2(xN)] ∈ R^{M[3]_2 × N},

   whose rows hold the evaluated monomials
   (x1)^2, (x1 x2), (x1 x3), (x2)^2, (x2 x3), (x3)^2.

2. The (left) null space of L2 is spanned by
   c1 = [0, 0, 1, 0, 0, 0] and c2 = [0, 0, 0, 0, 1, 0],
   which give p1 = c1 ν2(x) = x1 x3 and p2 = c2 ν2(x) = x2 x3.

Figure: the 2nd-degree vanishing polynomials p1 = x1 x3, p2 = x2 x3 on the plane-and-line arrangement in R^3.
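In code, the coefficient vectors fall out of an SVD. A sketch (our variable names; not the GPCA toolbox implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
# Samples from V1 = {x3 = 0} (plane) and V2 = {x1 = x2 = 0} (line).
X = np.vstack([
    np.column_stack([rng.standard_normal(40), rng.standard_normal(40), np.zeros(40)]),
    np.column_stack([np.zeros(40), np.zeros(40), rng.standard_normal(40)]),
])

def nu2(x):
    """Degree-2 Veronese map on R^3, monomials ordered as on the slide."""
    x1, x2, x3 = x
    return np.array([x1*x1, x1*x2, x1*x3, x2*x2, x2*x3, x3*x3])

L2 = np.array([nu2(x) for x in X]).T   # 6 x N data matrix
_, s, Vt = np.linalg.svd(L2.T)         # right singular vectors of L2^T
C = Vt[s < 1e-10]                      # rows spanning the left null space of L2
print(np.round(C, 3))  # spans the same space as [0,0,1,0,0,0] and [0,0,0,0,1,0]
```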

SLIDES 22-23

Calculate Subspace Basis Vectors using Polynomial Derivatives

1. The bases for V1⊥, ..., VK⊥ are recovered from the derivatives of the vanishing polynomials:

   ∇x P = [∇x p1, ∇x p2] = [ x3  0
                              0   x3
                              x1  x2 ].

2. Pick z = [1, 1, 0]^T ∈ V1; then

   ∇x P(z) = [ 0 0
               0 0
               1 1 ],

   whose column space is V1⊥. Pick z = [0, 0, 1]^T ∈ V2; then

   ∇x P(z) = [ 1 0
               0 1
               0 0 ],

   whose column space is V2⊥.

Figure: P(x) := [p1(x), p2(x)] = [x1 x3, x2 x3] on the plane-and-line arrangement in R^3.

Diagram of GPCA:
V1 ∪ V2 ⊂ R^D  →(ν_n)→  R^{M[D]_n}  →(Null(L_n), with Rank(L_n) = M[D]_n − h_I(n))→  p(x) = c^T ν_n(x)  →(∇x)→  V1⊥, V2⊥ ⊂ R^D.
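A short sketch of this step (our helper, hard-coding the two polynomials from the running example): evaluate ∇x P at one sample per subspace and orthonormalize its columns.

```python
import numpy as np

def grad_P(z):
    """Gradients of p1 = x1*x3 and p2 = x2*x3, stacked as columns."""
    x1, x2, x3 = z
    return np.array([[x3, 0.0],
                     [0.0, x3],
                     [x1, x2]])

for z, name in [(np.array([1.0, 1.0, 0.0]), "V1"),
                (np.array([0.0, 0.0, 1.0]), "V2")]:
    U, s, _ = np.linalg.svd(grad_P(z), full_matrices=False)
    basis = U[:, s > 1e-10]       # orthonormal basis of the column space = V⊥ at z
    print(name, "perp basis:")
    print(np.round(basis, 3))     # e3 for V1; {e1, e2} for V2
```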

SLIDE 24

Robust GPCA

GPCA is stable under data noise.

Figure: arrangement (2, 1, 1) in R^3 segmented at 8%, 12%, and 16% noise.
Figure: arrangement (2, 2, 1) in R^3 segmented at 8%, 12%, and 16% noise.

GPCA is not robust to outliers: a single outlier can arbitrarily perturb Null(L_n).

⇒ The breakdown point of GPCA is 0% because the breakdown point of PCA is 0%.
⇒ Seek a robust PCA to estimate Null(L_n), where L_n = [ν_n(x1), ..., ν_n(xN)].
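The 0% breakdown is easy to reproduce numerically. A sketch (our construction, continuing the plane-and-line example): one gross outlier already destroys part of Null(L2).

```python
import numpy as np

rng = np.random.default_rng(3)
nu2 = lambda x: np.array([x[0]*x[0], x[0]*x[1], x[0]*x[2],
                          x[1]*x[1], x[1]*x[2], x[2]*x[2]])

inliers = [np.array([a, b, 0.0]) for a, b in rng.standard_normal((60, 2))]  # plane V1
inliers += [np.array([0.0, 0.0, c]) for c in rng.standard_normal(60)]       # line V2

def null_dim(pts):
    s = np.linalg.svd(np.array([nu2(x) for x in pts]), compute_uv=False)
    return int(np.sum(s < 1e-8))

print("clean data:  dim Null(L2) =", null_dim(inliers))                              # 2
print("one outlier: dim Null(L2) =", null_dim(inliers + [np.array([5., -3., 7.])]))  # 1
```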

SLIDES 25-27

Robust Statistics

Three approaches to eliminating "outliers":

1. Probability-based: reject small-probability samples.
   - Probability plots [Healy 1968; Cox 1968]
   - Principal components [Rao 1964; Gnanadesikan & Kettenring 1972]
   - M-estimators [Huber 1981; Campbell 1980]
   - Multivariate trimming (MVT) [Gnanadesikan & Kettenring 1972]

2. Influence-based: reject samples with large influence on the model parameters.
   - Parameter difference with and without a sample [Hampel et al. 1986; Critchley 1985]

3. Consensus-based: reject samples not consistent with models of high consensus.
   - Hough transform [Ballard 1981; Lowe 1999]
   - RANSAC [Fischler & Bolles 1981; Torr 1997]
   - Least Median Estimate (LME) [Rousseeuw 1984; Stewart 1999]

Reference: Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, preprint.
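To make the consensus idea in item 3 concrete, here is a minimal RANSAC sketch for one linear subspace (a plane through the origin) with 30% gross outliers; the sampling scheme and thresholds are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
# 70% inliers near the plane {x3 = 0}, 30% gross outliers.
inl = np.column_stack([rng.standard_normal((70, 2)), 0.01 * rng.standard_normal((70, 1))])
out = 5.0 * rng.standard_normal((30, 3))
X = np.vstack([inl, out])

best_n, best_support = None, -1
for _ in range(200):                        # minimal sample: 2 points span a plane through 0
    i, j = rng.choice(len(X), size=2, replace=False)
    n = np.cross(X[i], X[j])                # normal of the plane through 0, X[i], X[j]
    if np.linalg.norm(n) < 1e-9:
        continue
    n /= np.linalg.norm(n)
    support = np.sum(np.abs(X @ n) < 0.05)  # consensus: points within distance 0.05
    if support > best_support:
        best_n, best_support = n, support

print("support:", best_support, "of", len(X), "normal ≈", np.round(best_n, 3))  # ≈ ±e3
```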

SLIDE 28

Simulation of Robust GPCA

One plane and two lines: segmented at 12% and 32% outliers.
Two planes and one line: segmented at 12% and 32% outliers.

Figure: outlier elimination via influence functions.

SLIDE 29

Experiment: Motion Segmentation under 3-D Affine Projection

Problem formulation:

1. Object features p1, ..., pN ∈ R^3 are tracked over F frames (parking-lot movie).

2. Denote by mij ∈ R^2 the image of pi in frame j under 3-D affine projection:
   mij = Aj pi + bj ∈ R^2, i = 1, ..., N; j = 1, ..., F.

3. For each pi, stack its track: xi = [mi1; ...; miF] ∈ R^{2F}, i = 1, ..., N.

4. Segment x1, ..., xN into groups that belong to different motions.
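Why subspace methods apply here: stacking the affine projections gives xi = M [pi; 1] with M ∈ R^{2F×4}, so all trajectories of one rigid motion lie in a subspace of dimension at most 4, and multiple motions give a mixture of such subspaces. A quick check on synthetic data (our construction):

```python
import numpy as np

rng = np.random.default_rng(5)
N, F = 20, 15
P = rng.standard_normal((3, N))            # 3-D feature points of one rigid object

tracks = []
for j in range(F):                         # one affine camera (A_j, b_j) per frame
    A_j = rng.standard_normal((2, 3))
    b_j = rng.standard_normal(2)
    tracks.append(A_j @ P + b_j[:, None])  # all image points in frame j
X = np.vstack(tracks)                      # 2F x N; columns are the trajectories xi

# xi = [A_j | b_j]-stack applied to [pi; 1], so rank(X) <= 4 for a single motion.
print("rank of trajectory matrix:", np.linalg.matrix_rank(X))  # 4
```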

SLIDE 30

Figure: motion-segmentation results on the test sequences, comparing RANSAC with the influence-based method.

Reference: Robust statistical estimation and segmentation of multiple subspaces. CVPR Workshop on 25 Years of RANSAC, 2006.

SLIDE 31

GPCA Website: http://perception.csl.uiuc.edu/gpca/

SLIDES 32-33

Summary

Advantages:
- A global algebraic framework; the solution is noniterative.
- If the subspace models are known, it likely outperforms most classical solutions.
- Robust to noise and outliers.

Limitations:
- The subspace models need to be provided.
- The high-dimensional polynomial space limits the applications.
- Unsupervised learning only.

SLIDES 34-35

Image Segmentation based on Texture

1. Texture-based segmentation.
2. Object-based segmentation.
3. Scene-based segmentation.

SLIDES 36-38

Lossy MDL using Mixture Subspace Models

1. If the subspace number and dimensions are not given, segmentation is naturally ambiguous.

2. Given a set of vectors V = (v1, ..., vN) ∈ R^{D×N} and a model A, a lossy coding scheme maps the vectors to binary bits up to a distortion E[||vi − v̂i||^2] ≤ ε^2:

   Lε(V, A): R^{D×N} → Z+,
   A*(ε) = arg min { Lε(V, A) + Overhead(A) }.

3. Lossy coding length for a noisy mixture subspace model: model each subspace Vi as a degenerate Gaussian, so its bit rate is

   R(Vi) = (1/2) log2 det( I + D/(ε^2 Ni) Vi Vi^T ).

   Adding the overhead,

   L(Vi) = (Ni + D) R(Vi) + (D/2) log2(1 + μi^T μi / ε^2) + Ni (−log2(Ni / N)).

   Total coding length: Ls(V1, ..., VK) = Σi L(Vi).
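A direct transcription of the coding-length formulas into Python (a sketch: we assume the rate term uses mean-subtracted data, with the mean coded by the overhead term):

```python
import numpy as np

def coding_length(V, eps, N_total):
    """Bits to code one group V ∈ R^{D x Ni} up to distortion eps^2, per the slide."""
    D, Ni = V.shape
    mu = V.mean(axis=1, keepdims=True)
    W = V - mu                        # assumption: rate term uses zero-mean data
    R = 0.5 * np.log2(np.linalg.det(np.eye(D) + D / (eps**2 * Ni) * (W @ W.T)))
    mean_bits = (D / 2) * np.log2(1 + float(mu.T @ mu) / eps**2)
    membership = Ni * (-np.log2(Ni / N_total))
    return (Ni + D) * R + mean_bits + membership

rng = np.random.default_rng(6)
flat = rng.standard_normal((3, 100)) * np.array([[1.0], [1.0], [0.01]])  # near a plane
full = rng.standard_normal((3, 100))                                     # full 3-D cloud
print(coding_length(flat, 0.1, 100) < coding_length(full, 0.1, 100))     # True
```

As expected, data concentrated near a subspace is cheaper to code than a full-dimensional cloud, which is exactly what drives the model selection A*(ε).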

SLIDES 39-40

Agglomerative Minimization

1. A greedy optimization method (shown as an animation in the talk): repeatedly merge the pair of groups whose merger most decreases the total coding length; a sketch follows below.

2. Compared to GPCA:
   - Formulated in the original data space.
   - Segmentation without explicit subspace models.
   - Disadvantage I: does not recover the subspace models.
   - Disadvantage II: does not reject outliers.

Reference: Unsupervised Segmentation of Natural Images via Lossy Data Compression. CVIU, preprint.
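A self-contained sketch of the greedy scheme, under our reading that merging proceeds pairwise from singletons while the total coding length decreases (the coding-length helper repeats the formula from the previous slide):

```python
import numpy as np
from itertools import combinations

def L(V, eps, N_total):
    # Coding length of one group V ∈ R^{D x Ni} (formula from the previous slide).
    D, Ni = V.shape
    mu = V.mean(axis=1, keepdims=True); W = V - mu
    R = 0.5 * np.log2(np.linalg.det(np.eye(D) + D / (eps**2 * Ni) * (W @ W.T)))
    return (Ni + D) * R + D / 2 * np.log2(1 + float(mu.T @ mu) / eps**2) \
           + Ni * -np.log2(Ni / N_total)

def agglomerate(X, eps=0.1):
    """Greedily merge the pair of groups that most decreases total coding length."""
    N = X.shape[1]
    groups = [[i] for i in range(N)]
    while len(groups) > 1:
        delta, a, b = min(
            (L(X[:, groups[a] + groups[b]], eps, N)
             - L(X[:, groups[a]], eps, N) - L(X[:, groups[b]], eps, N), a, b)
            for a, b in combinations(range(len(groups)), 2))
        if delta >= 0:                   # no merger reduces the total length: stop
            break
        groups[a] += groups.pop(b)
    return groups

rng = np.random.default_rng(7)
X = np.hstack([np.outer([1, 0, 0], rng.standard_normal(30)),   # samples on line 1
               np.outer([0, 1, 1], rng.standard_normal(30))])  # samples on line 2
X += 0.02 * rng.standard_normal(X.shape)
print(len(agglomerate(X)))   # typically 2: the two lines are recovered
```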

SLIDE 41

Image Segmentation via Mixture Subspace Models

Figure: segmentation results on (a) nature, (b) urban, (c) portrait, and (d) water scenes.

SLIDE 42

www.eecs.berkeley.edu/~yang/software/lossy_segmentation/

SLIDES 43-45

Design of Distributed Recognition Systems

Distributed recognition on smart sensors requires:

1. Dimensionality reduction in search of low-dimensional features.
2. Linear or quadratic algorithms for the distributed classifier.

A Unified Framework via Compressed Sensing:
- Classification encoded in a sparse representation.
- Dimensionality reduction with guaranteed performance.
- Efficient solution via ℓ1-minimization that is mostly linear.

Compared with GPCA and the lossy coding scheme:
- Supervised.
- Operates in a lower-dimensional feature space.
- Does not require knowing the mixture subspace dimensions.
- Very light computational complexity, suitable for smart sensors.

SLIDES 46-48

Problem Formulation using Face Recognition

1. Notation
   - Training: for K classes, collect training samples {v1,1, ..., v1,n1}, ..., {vK,1, ..., vK,nK} ⊂ R^D.
   - Test: present a new y ∈ R^D; solve for label(y) ∈ {1, 2, ..., K}.

2. Data are represented in (long) vector form: R^D is the sample space. For face recognition with 3-channel 640 × 480 images, D = 3 · 640 · 480.

3. Assume y belongs to Class i [Belhumeur et al. 1997; Basri & Jacobs 2003]:

   y = αi,1 vi,1 + αi,2 vi,2 + ... + αi,ni vi,ni = Ai αi,

   where Ai = [vi,1, vi,2, ..., vi,ni].

SLIDES 49-51

Classification of Mixture Subspace Models

1. The class index i is the variable we need to solve for. Consider a global representation:

   y = [A1, A2, ..., AK] [α1; α2; ...; αK] = A x0.

   This is an over-determined system: A ∈ R^{D×n}, where D ≫ n = n1 + ... + nK.

2. x0 encodes the membership of y: if y belongs to Subject i, then

   x0 = [0, ..., 0, αi^T, 0, ..., 0]^T ∈ R^n.

   If we can recover the sparse x0, classification is solved!

3. Not so fast!!
   - Directly solving y = A x0 is expensive: D > 7 × 10^4 even for a 320 × 240 grayscale image.
   - x0 is sparse: on average only a fraction 1/K of its entries are nonzero.

SLIDES 52-53

Dimensionality Reduction

1. Construct a linear projection R ∈ R^{d×D}, where d is the feature dimension:

   ỹ := R y = R A x0 = Ã x0.

   Ã ∈ R^{d×n}, but x0 is unchanged.

2. Holistic features: Eigenfaces [Turk 1991]; Fisherfaces [Belhumeur 1997]; Laplacianfaces [He 2005].

3. Partial features.

4. Unconventional features: downsampled faces; Randomfaces.
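Randomfaces in a few lines of numerics (a sketch; the 320 × 240 grayscale dimension is taken from the earlier slide, and the image here is a random stand-in):

```python
import numpy as np

rng = np.random.default_rng(8)
D, d = 320 * 240, 54        # ambient (vectorized image) and feature dimensions

# Randomfaces: rows of R are i.i.d. Gaussian vectors, normalized to unit length.
R = rng.standard_normal((d, D))
R /= np.linalg.norm(R, axis=1, keepdims=True)

y = rng.standard_normal(D)  # stand-in for a vectorized face image
y_tilde = R @ y             # d-dim feature; the sparse x0 in ỹ = Ã x0 is unchanged
print(y_tilde.shape)        # (54,)
```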

SLIDES 54-56

ℓ0-Minimization

1. Underdetermined system ỹ = Ã x0 ∈ R^d with Ã ∈ R^{d×n}: ask for the sparsest solution.

2. ℓ0-minimization:

   (P0)  x0 = arg min_x ||x||_0  s.t.  ỹ = Ã x,

   where ||·||_0 simply counts the number of nonzero terms.

3. ℓ0-ball: optimization over the ℓ0-ball is combinatorial.

SLIDES 57-58

The Usual Suspects in Search of "Improper" Solvers

1. Nearest neighbors:

   (PNN)  xNN = [0, ..., 0, 1_k, 0, ..., 0]^T such that k = arg min_i ||ỹ − ṽi||.

   Not a good choice for applications such as face recognition (with limited samples).

2. ℓ2-minimization:

   (P2)  x2 = arg min_x ||x||_2  s.t.  ỹ = Ã x.

   ℓ2-minimization is convex, but its solution is not sparse.

SLIDES 59-61

ℓ1/ℓ0 Equivalence

1. If x0 is sparse enough, program (P0) is equivalent to

   (P1)  min_x ||x||_1  s.t.  ỹ = Ã x,

   where ||x||_1 = |x1| + |x2| + ... + |xn|.

2. ℓ1-ball: ℓ1-minimization is linear (cf. matching pursuit, basis pursuit), and its solution equals that of ℓ0-minimization.

3. ℓ1/ℓ0 equivalence [Donoho 2002, 2004; Candès et al. 2004; Baraniuk 2006]: given ỹ = Ã x0, there exists an equivalence breakdown point (EBP) ρ(Ã) such that if ||x0||_0 < ρ, then (1) the ℓ1-solution is unique, and (2) x1 = x0.
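Program (P1) reduces to a standard linear program via the split x = u − v with u, v ≥ 0. A sketch with scipy (random Ã; the problem sizes are our illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(9)
d, n, k = 20, 50, 3                 # measurements, unknowns, sparsity of x0
A = rng.standard_normal((d, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0

# min 1^T(u + v)  s.t.  [A, -A][u; v] = y, u, v >= 0   <=>   min ||x||_1 s.t. Ax = y.
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n), method="highs")
x1 = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x1 - x0))  # tiny: x1 equals the sparse x0
```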

SLIDES 62-63

Random Projection (Randomfaces)

Blessing of dimensionality: in a high-dimensional data space R^D, with overwhelming probability, ℓ1/ℓ0 equivalence in ỹ = Ã x holds for a random projection matrix R. The equivalence breakdown point is ρ ≈ 0.49 d!

Properties (are they universal projections?):
1. Domain independent!
2. Data independent!
3. Fast to generate and compute!

Reference: Candès & Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? 2006.

SLIDES 64-65

ℓ1-Minimization Routines

Matching pursuit [Mallat 1993]:
1. Find the vector vi in A most correlated with y: i = arg max_j ⟨y, vj⟩.
2. Remove vi from A, set xi ← ⟨y, vi⟩, and update y ← y − xi vi.
3. Repeat until ||y|| < ε.

Basis pursuit [Chen 1998]:
1. Start with the number of sparse coefficients m = 1.
2. Select m linearly independent vectors Bm in A as a basis: xm = Bm† y.
3. Repeatedly swap one basis vector in Bm with a vector not in Bm whenever the swap improves ||y − Bm xm||.
4. If ||y − Bm xm||_2 < ε, stop; otherwise set m ← m + 1 and repeat Step 2.

Quadratic solvers: for ỹ = Ã x0 + z ∈ R^d with ||z||_2 < ε, solve

   x* = arg min { ||x||_1 + λ ||ỹ − Ã x||_2 }

[LASSO, second-order cone programming]: much more expensive.

Stability versus complexity: on workstations, prefer quadratic routines that explicitly model data noise; on sensor nodes, prefer linear routines (BP) for simplicity.
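A compact matching-pursuit sketch (we let columns be reselected, a common variant of the steps above; columns are assumed unit-norm):

```python
import numpy as np

def matching_pursuit(A, y, eps=1e-6, max_iter=100):
    """Greedy matching pursuit: pick the most correlated column, peel it off."""
    x = np.zeros(A.shape[1])
    r = y.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:
            break
        i = int(np.argmax(np.abs(A.T @ r)))   # most correlated column
        c = A[:, i] @ r                       # projection coefficient
        x[i] += c                             # accumulate (columns may repeat)
        r = r - c * A[:, i]                   # update the residual
    return x

rng = np.random.default_rng(10)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(50); x0[[3, 17, 41]] = [1.0, -2.0, 0.5]
x = matching_pursuit(A, A @ x0)
print("largest entries:", np.argsort(-np.abs(x))[:3])  # typically {3, 17, 41}
```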

SLIDES 66-67

Classification

Given a noise estimate ε, solve (P1') ⇒ x1.

1. Project x1 onto the face subspaces: for each i = 1, ..., K, δi(x1) keeps the coefficients αi associated with Class i and zeroes out all other entries.

2. Define the residual ri = ||ỹ − Ã δi(x1)||_2 for Subject i; then

   id(y) = arg min_{i=1,...,K} ri.

Reference: Feature selection in face recognition: A sparse representation perspective. Submitted to PAMI, 2007.
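Putting the pipeline together, a small end-to-end sketch (our toy data: two classes whose training samples span random 3-dim subspaces; basis pursuit stands in for (P1')):

```python
import numpy as np
from scipy.optimize import linprog

def src_classify(A_blocks, y):
    """Sparse-representation classification sketch: basis pursuit over the
    concatenated dictionary, then pick the class with the smallest residual."""
    A = np.hstack(A_blocks)
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    x1 = res.x[:n] - res.x[n:]
    residuals, start = [], 0
    for Ai in A_blocks:                 # r_i = ||y - A delta_i(x1)||_2
        ni = Ai.shape[1]
        delta = np.zeros(n)
        delta[start:start + ni] = x1[start:start + ni]
        residuals.append(np.linalg.norm(y - A @ delta))
        start += ni
    return int(np.argmin(residuals))

rng = np.random.default_rng(11)
# Two classes; each class's 6 training samples span a random 3-dim subspace of R^10.
bases = [np.linalg.qr(rng.standard_normal((10, 3)))[0] for _ in range(2)]
A_blocks = [B @ rng.standard_normal((3, 6)) for B in bases]
y = A_blocks[1] @ np.array([0.5, 0.0, 0.0, -1.0, 0.0, 0.0])  # test sample from class 1
print(src_classify(A_blocks, y))                             # 1
```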

SLIDE 68

AR Database, 100 Subjects (Illumination and Expression Variance)

Table: Nearest Neighbor.
Dimension      30    54    130   540
Eigen [%]      68.1  74.8  79.3  80.5
Laplacian [%]  73.1  77.1  83.8  89.7
Random [%]     56.7  63.7  71.4  75.0
Down [%]       51.7  60.9  69.2  73.7
Fisher [%]     83.4  86.8  N/A   N/A

Table: Nearest Subspace.
Dimension      30    54    130   540
Eigen [%]      64.1  77.1  82.0  85.1
Laplacian [%]  66.0  77.5  84.3  90.3
Random [%]     59.2  68.2  80.0  83.3
Down [%]       56.2  67.7  77.0  82.1
Fisher [%]     80.3  85.8  N/A   N/A

Table: ℓ1-Minimization.
Dimension      30    54    130   540
Eigen [%]      71.1  80.0  85.7  92.0
Laplacian [%]  73.7  84.7  91.0  94.3
Random [%]     57.8  75.5  87.6  94.7
Down [%]       46.8  67.0  84.6  93.9
Fisher [%]     87.0  92.3  N/A   N/A

SLIDES 69-71

Conclusion

Estimation of Mixture Subspace Models:
1. GPCA: estimation of vanishing polynomials.
2. Lossy coding: minimization of the lossy coding length.
3. Compressed sensing: sparse representation via ℓ1-minimization.

Confluence of algebra and statistics: in the estimation of mixture (subspace) models, algebra makes statistical algorithms well-conditioned, and statistics makes algebraic algorithms robust.

Beware of the wall!!

SLIDES 72-74

Open Problems

1. Algebraic framework:
   - Segmentation of mixtures of linear and quadratic models.
   - Kernel methods to avoid the high-dimensional Veronese map?

2. Lossy coding scheme:
   - Approximation of manifolds using mixture subspace models.

3. Recognition via sparse representation:
   - Distributed pattern recognition: how to systematically improve global recognition when multiple local observations are available (in distributed camera networks, wearable sensor networks)?

SLIDE 75

Further Reading List

Robust Statistics:
- Jolliffe. Principal Component Analysis. 2002.
- Huber. Robust Statistics. 1981.
- Gnanadesikan & Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. 1972.
- Fischler & Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. 1981.

Lossy Coding and Minimum Description Length:
- Cover & Thomas. Elements of Information Theory. 1991.
- Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. 1977.
- Madiman et al. Minimum description length vs. maximum likelihood in lossy data compression. 2004.

Compressed Sensing:
- Donoho. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. 2004.
- Candès. Compressive sampling. 2006.
- Donoho. Neighborly polytopes and sparse solution of underdetermined linear equations. 2004.
- Baraniuk et al. The Johnson-Lindenstrauss lemma meets compressed sensing. 2006.

SLIDE 76

Acknowledgments

Collaborators:
- Berkeley: Shankar Sastry, Ruzena Bajcsy
- UIUC: Yi Ma, Robert Fossum
- UMich: Harm Derksen
- JHU: Rene Vidal
- OSU: Kun Huang
- UT-Dallas: Roozbeh Jafari

Matlab Toolboxes:
- GPCA: http://perception.csl.uiuc.edu/gpca/
- Texture Segmentation: http://www.eecs.berkeley.edu/~yang/software/lossy_segmentation/
- SparseLab: http://sparselab.stanford.edu/
- ℓ1-Magic: http://www.acm.caltech.edu/l1magic/

References:
- Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, preprint, 2007.
- Unsupervised Segmentation of Natural Images via Lossy Data Compression. CVIU, preprint, 2007.
- Feature selection in face recognition: A sparse representation perspective. UC Berkeley Tech Report, 2007.