[PPT] - Polynomial Spectral Decomposition of Conditional Expectation PowerPoint Presentation

SLIDE 1

Polynomial Spectral Decomposition

of Conditional Expectation Operators

Anuran Makur and Lizhong Zheng

EECS Department, Massachusetts Institute of Technology

Allerton Conference 2016

A. Makur & L. Zheng (MIT)

Polynomial Spectral Decomposition Allerton Conference 2016 1 / 25

SLIDE 2

Outline

1. Introduction
   - Motivation: Regression and Maximal Correlation
   - Preliminaries
   - Spectral Characterization of Maximal Correlation
2. Polynomial Decompositions of Compact Operators
3. Illustrations of Polynomial SVDs

SLIDE 6

Motivation: Regression and Maximal Correlation

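Fix a joint distribution $P_{X,Y}$ on $\mathcal{X} \times \mathcal{Y}$.

Regression: [Breiman and Friedman, 1985] Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that minimize the mean squared error:

$$\inf_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}\left[(f(X) - g(Y))^2\right]$$

where we minimize over:

$$\mathcal{F} \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f(X)] = 0,\ \mathbb{E}\left[f^2(X)\right] = 1 \right\}$$

$$\mathcal{G} \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g(Y)] = 0,\ \mathbb{E}\left[g^2(Y)\right] = 1 \right\}$$

Maximal Correlation: [Rényi, 1959] Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that maximize the correlation:

$$\rho(X; Y) \triangleq \sup_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}[f(X) g(Y)]$$

Equivalence: $\mathbb{E}\left[(f(X) - g(Y))^2\right] = 2 - 2\,\mathbb{E}[f(X) g(Y)]$

Maximal correlation is a singular value of an operator!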
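
The equivalence between the two problems follows from expanding the square and applying the unit second-moment constraints that define $\mathcal{F}$ and $\mathcal{G}$. A short derivation, added here for completeness and not taken from the slides:

```latex
\begin{align*}
\mathbb{E}\left[(f(X) - g(Y))^2\right]
  &= \mathbb{E}\left[f^2(X)\right]
     - 2\,\mathbb{E}\left[f(X)\,g(Y)\right]
     + \mathbb{E}\left[g^2(Y)\right] \\
  &= 1 - 2\,\mathbb{E}\left[f(X)\,g(Y)\right] + 1
   = 2 - 2\,\mathbb{E}\left[f(X)\,g(Y)\right]
\end{align*}
```

Minimizing the mean squared error over $\mathcal{F} \times \mathcal{G}$ is therefore the same as maximizing the correlation $\mathbb{E}[f(X) g(Y)]$.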

SLIDE 10

Preliminaries

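Source random variable $X \in \mathcal{X} \subseteq \mathbb{R}$ with probability density $P_X$ on the measure space $(\mathcal{X}, \mathcal{B}(\mathcal{X}), \lambda)$

Output random variable $Y \in \mathcal{Y} \subseteq \mathbb{R}$

Channel conditional probability densities $\{P_{Y|X=x} : x \in \mathcal{X}\}$ on the measure space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}), \mu)$

Marginal probability laws: $\mathbb{P}_X$ and $\mathbb{P}_Y$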

SLIDE 12

Preliminaries

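Hilbert spaces:

$$\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}\left[f^2(X)\right] < +\infty \right\}$$

$$\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}\left[g^2(Y)\right] < +\infty \right\}$$

[Figure: functions $f_1, f_2$ in $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ and $g_1, g_2$ in $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$.]

Correlation as inner products:

$$\langle f_1, f_2 \rangle_{\mathbb{P}_X} \triangleq \mathbb{E}[f_1(X) f_2(X)], \qquad \langle g_1, g_2 \rangle_{\mathbb{P}_Y} \triangleq \mathbb{E}[g_1(Y) g_2(Y)]$$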


SLIDE 13

Preliminaries

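Hilbert spaces:

$$\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}\left[f^2(X)\right] < +\infty \right\}$$

$$\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}\left[g^2(Y)\right] < +\infty \right\}$$

[Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ into $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$, and $C^*$ maps $g \in \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ back into $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$.]

Conditional Expectation Operators:

$$C : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y): \quad (C(f))(y) \triangleq \mathbb{E}[f(X) \mid Y = y]$$

$$C^* : \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \to \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X): \quad (C^*(g))(x) \triangleq \mathbb{E}[g(Y) \mid X = x]$$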

SLIDE 16

Preliminaries

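Proposition (Conditional Expectation Operators)

$C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.

Operator Norm:

$$\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)} \frac{\|C(f)\|_{\mathbb{P}_Y}}{\|f\|_{\mathbb{P}_X}}$$

$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality:

$$\|C(f)\|_{\mathbb{P}_Y}^2 = \mathbb{E}\left[\mathbb{E}[f(X) \mid Y]^2\right] \leq \mathbb{E}\left[\mathbb{E}\left[f^2(X) \mid Y\right]\right] = \|f\|_{\mathbb{P}_X}^2.$$

[Figure: $C$ maps $f$ to $C(f)$ and $1_{\mathcal{X}}$ to $1_{\mathcal{Y}}$, from $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ to $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$.]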


SLIDE 17

Preliminaries

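Proposition (Conditional Expectation Operators)

$C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.

Operator Norm:

$$\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)} \frac{\|C(f)\|_{\mathbb{P}_Y}}{\|f\|_{\mathbb{P}_X}}$$

$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality.

Let $1_{\mathcal{S}} : \mathcal{S} \to \mathbb{R}$ denote the everywhere unity function: $1_{\mathcal{S}}(x) = 1$.

$C(1_{\mathcal{X}}) = 1_{\mathcal{Y}}$ and $\|1_{\mathcal{X}}\|_{\mathbb{P}_X}^2 = \|1_{\mathcal{Y}}\|_{\mathbb{P}_Y}^2 = 1 \ \Rightarrow\ \|C\|_{\mathrm{op}} = 1$.

[Figure: $C$ maps $f$ to $C(f)$ and $1_{\mathcal{X}}$ to $1_{\mathcal{Y}}$, from $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ to $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$.]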

SLIDE 20

Spectral Characterization of Maximal Correlation

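Prop (Spectral Characterization of Maximal Correlation) [Rényi, 1959]

For random variables $X$ and $Y$ as defined earlier:

$$\rho(X; Y) = \sup_{\substack{f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X): \\ \mathbb{E}[f(X)] = 0}} \frac{\|C(f)\|_{\mathbb{P}_Y}}{\|f\|_{\mathbb{P}_X}}$$

where the supremum is achieved by some $f^\star \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ if $C$ is compact.

[Figure: $C$ maps $f^\star$ to $C(f^\star)$ and $C^*/\rho^2$ maps $C(f^\star)$ back to $f^\star$, shown alongside $1_{\mathcal{X}}$ and $1_{\mathcal{Y}}$.]

$C$ has largest singular value $\|C\|_{\mathrm{op}} = 1$: $C(1_{\mathcal{X}}) = 1_{\mathcal{Y}}$, $C^*(1_{\mathcal{Y}}) = 1_{\mathcal{X}}$.

$\rho(X; Y)$ = second largest singular value of $C$, with singular vectors $f^\star \perp 1_{\mathcal{X}}$ and $g^\star = C(f^\star)/\rho(X; Y) \perp 1_{\mathcal{Y}}$ that maximize correlation.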
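
The characterization above can be checked numerically when both alphabets are finite. This is an assumption made only for illustration, since the slides work with densities. In that case $C$ is represented in orthonormal coordinates by the matrix with entries $P_{X,Y}(x, y) / \sqrt{P_X(x) P_Y(y)}$. The names `maximal_correlation` and `B` below are hypothetical and do not come from the talk.

```python
import numpy as np

def maximal_correlation(joint):
    """Second largest singular value of the conditional expectation operator.

    `joint` is a 2-D array with joint[x, y] = P(X = x, Y = y). Finite
    alphabets and strictly positive marginals are assumed in this sketch.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of X
    py = joint.sum(axis=0)  # marginal of Y
    # Matrix of C with respect to orthonormal bases of L2(P_X) and L2(P_Y).
    B = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(B, compute_uv=False)  # sorted descending
    return float(singular_values[1])

# Binary symmetric channel with crossover probability 0.1 and uniform input.
bsc_joint = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
rho = maximal_correlation(bsc_joint)
print(rho)
```

The largest singular value of `B` is always 1 and corresponds to the constant functions $1_{\mathcal{X}}$ and $1_{\mathcal{Y}}$. For this binary symmetric example the second singular value is $1 - 2(0.1) = 0.8$, up to floating-point error.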


SLIDE 21

Outline

1. Introduction
2. Polynomial Decompositions of Compact Operators
   - The Hermite SVD
   - Assumptions and Definitions
   - Polynomial EVD of Compact Self-Adjoint Operators
   - Polynomial SVD of Conditional Expectation Operators
3. Illustrations of Polynomial SVDs

SLIDE 23

The Hermite SVD

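Gaussian Channel: $P_{Y|X=x} = \mathcal{N}(x, \nu)$ with expectation parameter $x \in \mathbb{R}$ and fixed variance $\nu \in (0, \infty)$

$$\forall x, y \in \mathbb{R}, \quad P_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi\nu}} \exp\left(-\frac{(y - x)^2}{2\nu}\right)$$

Gaussian Source: $P_X = \mathcal{N}(0, p)$ with fixed variance $p \in (0, \infty)$

$$\forall x \in \mathbb{R}, \quad P_X(x) = \frac{1}{\sqrt{2\pi p}} \exp\left(-\frac{x^2}{2p}\right)$$

Remark: (AWGN channel) $Y = X + W$ with $X \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$

Gaussian Output Marginal: $P_Y = \mathcal{N}(0, p + \nu)$

$$\forall y \in \mathbb{R}, \quad P_Y(y) = \frac{1}{\sqrt{2\pi(p + \nu)}} \exp\left(-\frac{y^2}{2(p + \nu)}\right)$$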

SLIDE 25

The Hermite SVD

Prop (Hermite SVD) [Abbe & Zheng, 2012], [Makur & Zheng, 2016]

For the Gaussian channel $P_{Y|X}$ and Gaussian source $P_X$, the conditional expectation operator $C : \mathcal{L}^2(\mathbb{R}, \mathbb{P}_X) \to \mathcal{L}^2(\mathbb{R}, \mathbb{P}_Y)$ has SVD:

$$\forall k \in \mathbb{N}, \quad C\left(H_k^{(p)}\right) = \sigma_k H_k^{(p+\nu)}$$

with singular values: $\{\sigma_k \in (0, 1] : k \in \mathbb{N}\}$ where $\sigma_0 = 1$ and $\lim_{k \to \infty} \sigma_k = 0$,

and singular vectors:

$\{H_k^{(p)} \text{ with degree } k : k \in \mathbb{N}\}$: Hermite polynomials that are orthonormal with respect to $\mathbb{P}_X$,

$\{H_k^{(p+\nu)} \text{ with degree } k : k \in \mathbb{N}\}$: Hermite polynomials that are orthonormal with respect to $\mathbb{P}_Y$.

For which joint distributions $P_{X,Y}$ are the singular vectors of $C$ orthonormal polynomials?
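
The Hermite SVD can be sanity-checked numerically. The sketch below makes two assumptions that are not stated on the slides: that $H_k^{(p)}(x) = He_k(x/\sqrt{p})/\sqrt{k!}$, where $He_k$ is the probabilists' Hermite polynomial, and that $\sigma_k = (p/(p+\nu))^{k/2}$, which is the standard value from Mehler's formula. The function names are hypothetical. The conditional law of $X$ given $Y = y$ is Gaussian, so the conditional expectation is computed with Gauss-Hermite quadrature.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as herm_e

def normalized_hermite(k, t):
    """He_k(t) / sqrt(k!), orthonormal with respect to N(0, 1)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return herm_e.hermeval(t, coeffs) / math.sqrt(math.factorial(k))

def cond_expectation_hermite(k, y, p, nu, n_nodes=40):
    """E[H_k^{(p)}(X) | Y = y] for X ~ N(0, p), Y = X + W, W ~ N(0, nu)."""
    z, w = herm_e.hermegauss(n_nodes)        # quadrature for weight exp(-z^2 / 2)
    post_mean = p * y / (p + nu)             # mean of X given Y = y
    post_std = math.sqrt(p * nu / (p + nu))  # standard deviation of X given Y = y
    x = post_mean + post_std * z
    values = normalized_hermite(k, x / math.sqrt(p))
    return float(np.sum(w * values) / math.sqrt(2.0 * math.pi))

p, nu, y = 1.0, 3.0, 1.7
for k in range(5):
    lhs = cond_expectation_hermite(k, y, p, nu)
    sigma_k = (p / (p + nu)) ** (k / 2)      # assumed singular value
    rhs = sigma_k * normalized_hermite(k, y / math.sqrt(p + nu))
    print(k, lhs, rhs)
```

The two printed columns should agree to quadrature precision for each $k$, which is the pointwise statement $C(H_k^{(p)}) = \sigma_k H_k^{(p+\nu)}$ evaluated at one value of $y$.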

SLIDE 28

Assumptions and Definitions

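$\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ and $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ are infinite-dimensional.

$\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ admits a unique countable orthonormal basis of polynomials, $\{p_k : k \in \mathbb{N}\} \subseteq \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$, where $p_k : \mathcal{X} \to \mathbb{R}$ is an orthonormal polynomial with degree $k$.

$\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ admits a unique countable orthonormal basis of polynomials, $\{q_k : k \in \mathbb{N}\} \subseteq \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$, where $q_k : \mathcal{Y} \to \mathbb{R}$ is an orthonormal polynomial with degree $k$.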
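
One standard way to obtain such a basis, when it exists, is Gram-Schmidt orthonormalization of the monomials $1, x, x^2, \dots$ under the inner product $\mathbb{E}[a(X) b(X)]$. The sketch below does this for $X \sim \mathcal{N}(0, 1)$, chosen only as an example, with the inner product evaluated by Gauss-Hermite quadrature. The function names are hypothetical, and polynomials are stored as coefficient arrays with the lowest degree first.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as herm_e
from numpy.polynomial import polynomial as poly

def gaussian_inner_product(a, b, n_nodes=40):
    """E[a(X) b(X)] for X ~ N(0, 1); a and b are coefficient arrays."""
    z, w = herm_e.hermegauss(n_nodes)  # quadrature for weight exp(-z^2 / 2)
    values = poly.polyval(z, a) * poly.polyval(z, b)
    return float(np.sum(w * values) / math.sqrt(2.0 * math.pi))

def orthonormal_polynomials(max_degree, inner=gaussian_inner_product):
    """Gram-Schmidt on monomials; returns [p_0, ..., p_max_degree]."""
    basis = []
    for k in range(max_degree + 1):
        v = np.zeros(max_degree + 1)
        v[k] = 1.0                      # the monomial x^k
        for q in basis:                 # remove components along lower degrees
            v = v - inner(v, q) * q
        basis.append(v / math.sqrt(inner(v, v)))
    return basis

basis = orthonormal_polynomials(3)
print(basis[2])  # coefficients of p_2, proportional to x^2 - 1
```

For the standard Gaussian this recovers the normalized Hermite polynomials, for example $p_2(x) = (x^2 - 1)/\sqrt{2}$. Requiring a positive leading coefficient, as Gram-Schmidt does here, is what makes each $p_k$ unique.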

SLIDE 31

Assumptions and Definitions

Definition (Closure over Polynomials and Degree Preservation)

An operator $T : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ is closed over polynomials if for any polynomial $p \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$, $T(p)$ is also a polynomial. Furthermore, $T$ is degree preserving if $\deg(T(p)) \leq \deg(p)$, and $T$ is strictly degree preserving if $\deg(T(p)) = \deg(p)$.

Gaussian Channel Example: $Y = X + W$ with $X \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$

$$\mathbb{E}[g(Y) \mid X = x] = \frac{1}{\sqrt{2\pi\nu}} \int_{\mathbb{R}} g(y) \exp\left(-\frac{(y - x)^2}{2\nu}\right) d\mu(y)$$

Convolution preserves polynomials!
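
To see why convolution with the Gaussian kernel preserves polynomials and their degrees, it is enough to check monomials. The following short calculation is added for illustration and uses only $Y = X + W$ with $W$ independent of $X$:

```latex
\begin{align*}
\mathbb{E}\left[Y^n \mid X = x\right]
  &= \mathbb{E}\left[(x + W)^n\right]
   = \sum_{j=0}^{n} \binom{n}{j}\, x^{n-j}\, \mathbb{E}\left[W^j\right], \\
\text{e.g.}\quad
\mathbb{E}\left[Y^2 \mid X = x\right]
  &= x^2 + 2x\,\mathbb{E}[W] + \mathbb{E}\left[W^2\right]
   = x^2 + \nu .
\end{align*}
```

The result is a polynomial in $x$ of degree exactly $n$ with leading coefficient 1, so $C^*$ is strictly degree preserving on polynomials in this example.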


SLIDE 32

Polynomial EVD of Compact Self-Adjoint Operators

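Theorem (Condition for Orthonormal Polynomial Eigenbasis) [Makur and Zheng, 2016]

Let $T : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ be a compact self-adjoint operator. $T$ is closed over polynomials and degree preserving if and only if:

$$\forall k \in \mathbb{N}, \quad T(p_k) = \alpha_k p_k$$

where $\{\alpha_k \in \mathbb{R} : k \in \mathbb{N}\}$ are eigenvalues satisfying $\lim_{k \to \infty} \alpha_k = 0$.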


SLIDE 33

Polynomial SVD of Conditional Expectation Operators

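Theorem (Condition for Orthonormal Polynomial Singular Vectors) [Makur and Zheng, 2016]

Suppose $C : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ is compact and $C^* : \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \to \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ is its adjoint operator. $C$ and $C^*$ are closed over polynomials and strictly degree preserving if and only if:

$$\forall k \in \mathbb{N}, \quad C(p_k) = \beta_k q_k$$

where $\{\beta_k \in (0, \infty) : k \in \mathbb{N}\}$ are the singular values such that $\lim_{k \to \infty} \beta_k = 0$.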

SLIDE 37

Polynomial SVD of Conditional Expectation Operators

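Theorem (Condition for Orthonormal Polynomial Singular Vectors) [Makur and Zheng, 2016]

Suppose $C = \mathbb{E}[\cdot \mid Y] : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ is compact and $C^* = \mathbb{E}[\cdot \mid X] : \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \to \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ is its adjoint operator. For every $n \in \mathbb{N}$, $\mathbb{E}[X^n \mid Y]$ is a polynomial in $Y$ with degree $n$ and $\mathbb{E}[Y^n \mid X]$ is a polynomial in $X$ with degree $n$ if and only if:

$$\forall k \in \mathbb{N}, \quad C(p_k) = \beta_k q_k$$

where $\{\beta_k \in (0, 1] : k \in \mathbb{N}\}$ are the singular values such that $\beta_0 = 1$ and $\lim_{k \to \infty} \beta_k = 0$.

Gaussian Example Proof Sketch:

$Y = X + W$ with $X \sim \mathcal{N}(0, p) \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$.

$C$, $C^*$ are defined by convolution kernels which preserve polynomials.

By the above theorem, $C$ has Hermite polynomial singular vectors.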


SLIDE 38

Outline

1. Introduction
2. Polynomial Decompositions of Compact Operators
3. Illustrations of Polynomial SVDs
   - The Laguerre SVD
   - The Jacobi SVD
   - Natural Exponential Families and Conjugate Priors

SLIDE 40

The Laguerre SVD

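Poisson Channel: $P_{Y|X=x} = \mathrm{Poisson}(x)$ with rate parameter $x \in (0, \infty)$

$$\forall x \in (0, \infty),\ \forall y \in \mathbb{N}, \quad P_{Y|X}(y|x) = \frac{x^y e^{-x}}{y!}$$

Gamma Source: $P_X = \mathrm{gamma}(\alpha, \beta)$ with shape parameter $\alpha \in (0, \infty)$ and rate parameter $\beta \in (0, \infty)$

$$\forall x \in (0, \infty), \quad P_X(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$$

Negative Binomial Output Marginal: $P_Y = \text{negative-binomial}\left(p = \frac{1}{\beta + 1}, \alpha\right)$ with success probability parameter $p \in (0, 1)$ and number of failures parameter $\alpha \in (0, \infty)$

$$\forall y \in \mathbb{N}, \quad P_Y(y) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha)\, y!} \left(\frac{1}{\beta + 1}\right)^y \left(\frac{\beta}{\beta + 1}\right)^\alpha$$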


SLIDE 41

The Laguerre SVD

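Proposition (Laguerre SVD) [Makur and Zheng, 2016]

For the Poisson channel $P_{Y|X}$ and gamma source $P_X$, the conditional expectation operator $C : \mathcal{L}^2((0, \infty), \mathbb{P}_X) \to \mathcal{L}^2(\mathbb{N}, \mathbb{P}_Y)$ has SVD:

$$\forall k \in \mathbb{N}, \quad C\left(L_k^{(\alpha, \beta)}\right) = \sigma_k M_k^{\left(\alpha, \frac{1}{\beta + 1}\right)}$$

with singular values: $\{\sigma_k \in (0, 1] : k \in \mathbb{N}\}$ where $\sigma_0 = 1$ and $\lim_{k \to \infty} \sigma_k = 0$,

and singular vectors:

$\{L_k^{(\alpha, \beta)} \text{ with degree } k : k \in \mathbb{N}\}$: generalized Laguerre polynomials that are orthonormal with respect to $\mathbb{P}_X$,

$\{M_k^{\left(\alpha, \frac{1}{\beta + 1}\right)} \text{ with degree } k : k \in \mathbb{N}\}$: Meixner polynomials that are orthonormal with respect to $\mathbb{P}_Y$.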
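
As a quick illustration of the conditional-moment condition in this example, the first conditional moments in both directions can be written down directly. The calculation below is not from the slides. It uses only the Poisson mean and the standard fact that the gamma distribution is conjugate to the Poisson likelihood:

```latex
\begin{align*}
\mathbb{E}[Y \mid X = x] &= x, \\
P_{X|Y}(x \mid y) &\propto x^{\alpha - 1} e^{-\beta x} \cdot x^{y} e^{-x}
   = x^{\alpha + y - 1} e^{-(\beta + 1) x}
   \quad\Rightarrow\quad X \mid Y = y \sim \mathrm{gamma}(\alpha + y,\ \beta + 1), \\
\mathbb{E}[X \mid Y = y] &= \frac{\alpha + y}{\beta + 1}.
\end{align*}
```

Both conditional means are polynomials of degree 1 in the conditioning variable, which is the $n = 1$ case of the condition on $\mathbb{E}[X^n \mid Y]$ and $\mathbb{E}[Y^n \mid X]$.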

SLIDE 43

The Jacobi SVD

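Binomial Channel: $P_{Y|X=x} = \mathrm{binomial}(n, x)$ with number of trials parameter $n \in \mathbb{N} \setminus \{0\}$ and success probability parameter $x \in (0, 1)$

$$\forall x \in (0, 1),\ \forall y \in [n] \triangleq \{0, \dots, n\}, \quad P_{Y|X}(y|x) = \binom{n}{y} x^y (1 - x)^{n - y}$$

Beta Source: $P_X = \mathrm{beta}(\alpha, \beta)$ with shape parameters $\alpha \in (0, \infty)$ and $\beta \in (0, \infty)$

$$\forall x \in (0, 1), \quad P_X(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$$

Beta-Binomial Output Marginal: $P_Y = \text{beta-binomial}(n, \alpha, \beta)$

$$\forall y \in [n], \quad P_Y(y) = \binom{n}{y} \frac{B(\alpha + y, \beta + n - y)}{B(\alpha, \beta)}$$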


SLIDE 44

The Jacobi SVD

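Proposition (Jacobi SVD) [Makur and Zheng, 2016]

For the binomial channel $P_{Y|X}$ and beta source $P_X$, the conditional expectation operator $C : \mathcal{L}^2((0, 1), \mathbb{P}_X) \to \mathcal{L}^2([n], \mathbb{P}_Y)$ has SVD:

$$\forall k \in [n], \quad C\left(J_k^{(\alpha, \beta)}\right) = \sigma_k Q_k^{(\alpha, \beta)}$$

$$\forall k \in \mathbb{N} \setminus [n], \quad C\left(J_k^{(\alpha, \beta)}\right) = 0$$

with singular values: $\{\sigma_k \in (0, 1] : k \in [n]\}$ where $\sigma_0 = 1$,

and singular vectors:

$\{J_k^{(\alpha, \beta)} \text{ with degree } k : k \in \mathbb{N}\}$: Jacobi polynomials that are orthonormal with respect to $\mathbb{P}_X$,

$\{Q_k^{(\alpha, \beta)} \text{ with degree } k : k \in [n]\}$: Hahn polynomials that are orthonormal with respect to $\mathbb{P}_Y$.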
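
The conditional-moment condition behind this example can be spot-checked numerically for the first moment. The sketch below uses hypothetical function names and example parameter values chosen only for illustration. It evaluates the beta-binomial marginal $P_Y$ and the posterior mean $\mathbb{E}[X \mid Y = y] = \int x P_X(x) P_{Y|X}(y|x)\,dx / P_Y(y)$ in closed form through beta functions, then compares the posterior mean with the linear function $(\alpha + y)/(\alpha + \beta + n)$.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(y, n, alpha, beta):
    """P_Y(y) = C(n, y) B(alpha + y, beta + n - y) / B(alpha, beta)."""
    log_ratio = log_beta(alpha + y, beta + n - y) - log_beta(alpha, beta)
    return math.comb(n, y) * math.exp(log_ratio)

def posterior_mean(y, n, alpha, beta):
    """E[X | Y = y]: integral of x P_X(x) P_{Y|X}(y|x) dx, divided by P_Y(y)."""
    log_numerator = log_beta(alpha + y + 1, beta + n - y) - log_beta(alpha, beta)
    numerator = math.comb(n, y) * math.exp(log_numerator)
    return numerator / beta_binomial_pmf(y, n, alpha, beta)

n, alpha, beta = 5, 2.0, 3.0  # example values, not from the slides
for y in range(n + 1):
    linear = (alpha + y) / (alpha + beta + n)
    print(y, posterior_mean(y, n, alpha, beta), linear)
```

The two printed columns should match, showing that $\mathbb{E}[X \mid Y = y]$ is a degree-1 polynomial in $y$. In the other direction, $\mathbb{E}[Y \mid X = x] = nx$ is the binomial mean.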

SLIDE 45

Why are these joint distributions special?

$P_{Y|X}$ is a natural exponential family with quadratic variance function (introduced in [Morris, 1982]):

$$\forall x \in \mathcal{X},\ \forall y \in \mathcal{Y}, \quad P_{Y|X}(y|x) = \exp\left(xy - \alpha(x) + \beta(y)\right)$$

where $P_{Y|X}(y|0) = \exp(\beta(y))$ is the base distribution, $\alpha(x)$ is the log-partition function with $\alpha(0) = 0$, and $\mathrm{VAR}(Y \mid X = x)$ is a quadratic function of $\mathbb{E}[Y \mid X = x]$.

$P_X$ belongs to the corresponding conjugate prior family:

$$\forall x \in \mathcal{X}, \quad P_X(x; y', n) = \exp\left(y' x - n\alpha(x) - \tau(y', n)\right)$$

where $\tau(y', n)$ is the log-partition function.

All moments exist and are finite:

Gaussian likelihood with Gaussian prior,
Poisson likelihood with gamma prior,
binomial likelihood with beta prior.


SLIDE 48

Conclusion

Summary:

1. Regression and maximal correlation ⇒ conditional expectation operators
2. Closure over polynomials and degree preservation ⇔ orthogonal polynomial eigenvectors or singular vectors
3. Check conditional moments are polynomials ⇒ Gaussian-Gaussian, Gamma-Poisson, Beta-Binomial examples
4. Examples have natural exponential family/conjugate prior structure
