Polynomial Spectral Decomposition of Conditional Expectation Operators



slide-1
SLIDE 1

Polynomial Spectral Decomposition of Conditional Expectation Operators

Anuran Makur and Lizhong Zheng

EECS Department, Massachusetts Institute of Technology

Allerton Conference 2016

  • A. Makur & L. Zheng (MIT)

Polynomial Spectral Decomposition Allerton Conference 2016 1 / 25

slide-2
SLIDE 2

Outline

1 Introduction
    Motivation: Regression and Maximal Correlation
    Preliminaries
    Spectral Characterization of Maximal Correlation

2 Polynomial Decompositions of Compact Operators

3 Illustrations of Polynomial SVDs

slide-6
SLIDE 6

Motivation: Regression and Maximal Correlation

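Fix a joint distribution $P_{X,Y}$ on $\mathcal{X} \times \mathcal{Y}$.

Regression: [Breiman and Friedman, 1985] Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that minimize the mean squared error:

$$\inf_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}\left[(f(X) - g(Y))^2\right]$$

where we minimize over:

$$\mathcal{F} \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \;\middle|\; \mathbb{E}[f(X)] = 0,\ \mathbb{E}\left[f^2(X)\right] = 1 \right\}$$

$$\mathcal{G} \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \;\middle|\; \mathbb{E}[g(Y)] = 0,\ \mathbb{E}\left[g^2(Y)\right] = 1 \right\}$$

Maximal Correlation: [Rényi, 1959] Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that maximize the correlation:

$$\rho(X; Y) \triangleq \sup_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}[f(X) g(Y)]$$

Equivalence: $\mathbb{E}\left[(f(X) - g(Y))^2\right] = 2 - 2\,\mathbb{E}[f(X) g(Y)]$

Maximal correlation is a singular value of an operator!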
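The equivalence between the regression and maximal correlation problems follows from expanding the square and applying the unit second-moment constraints in $\mathcal{F}$ and $\mathcal{G}$. A short derivation, written out here for completeness:

```latex
\begin{align*}
\mathbb{E}\left[(f(X) - g(Y))^2\right]
  &= \mathbb{E}\left[f^2(X)\right] - 2\,\mathbb{E}[f(X) g(Y)] + \mathbb{E}\left[g^2(Y)\right] \\
  &= 1 - 2\,\mathbb{E}[f(X) g(Y)] + 1 \\
  &= 2 - 2\,\mathbb{E}[f(X) g(Y)] .
\end{align*}
```

Minimizing the left-hand side over $\mathcal{F} \times \mathcal{G}$ is therefore the same as maximizing $\mathbb{E}[f(X) g(Y)]$ over the same sets.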

slide-10
SLIDE 10

Preliminaries

Source random variable $X \in \mathcal{X} \subseteq \mathbb{R}$ with probability density $P_X$ on the measure space $(\mathcal{X}, \mathcal{B}(\mathcal{X}), \lambda)$

Output random variable $Y \in \mathcal{Y} \subseteq \mathbb{R}$

Channel conditional probability densities $\{P_{Y|X=x} : x \in \mathcal{X}\}$ on the measure space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}), \mu)$.

Marginal probability laws: $P_X$ and $P_Y$

slide-12
SLIDE 12

Preliminaries

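Hilbert spaces:

$$\mathcal{L}^2(\mathcal{X}, P_X) \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \;\middle|\; \mathbb{E}\left[f^2(X)\right] < +\infty \right\}$$

$$\mathcal{L}^2(\mathcal{Y}, P_Y) \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \;\middle|\; \mathbb{E}\left[g^2(Y)\right] < +\infty \right\}$$

[Figure: $f_1, f_2$ in $\mathcal{L}^2(\mathcal{X}, P_X)$ and $g_1, g_2$ in $\mathcal{L}^2(\mathcal{Y}, P_Y)$]

Correlation as inner products:

$$\langle f_1, f_2 \rangle_{P_X} \triangleq \mathbb{E}[f_1(X) f_2(X)]$$

$$\langle g_1, g_2 \rangle_{P_Y} \triangleq \mathbb{E}[g_1(Y) g_2(Y)]$$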

slide-13
SLIDE 13

Preliminaries

Conditional Expectation Operators:

$$C : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{Y}, P_Y): \quad (C(f))(y) \triangleq \mathbb{E}[f(X) \mid Y = y]$$

$$C^* : \mathcal{L}^2(\mathcal{Y}, P_Y) \to \mathcal{L}^2(\mathcal{X}, P_X): \quad (C^*(g))(x) \triangleq \mathbb{E}[g(Y) \mid X = x]$$

[Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, P_X)$ into $\mathcal{L}^2(\mathcal{Y}, P_Y)$, and $C^*$ maps $g \in \mathcal{L}^2(\mathcal{Y}, P_Y)$ into $\mathcal{L}^2(\mathcal{X}, P_X)$]

slide-16
SLIDE 16

Preliminaries

Proposition (Conditional Expectation Operators)

$C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.

Operator Norm:

$$\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, P_X)} \frac{\|C(f)\|_{P_Y}}{\|f\|_{P_X}}$$

$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality:

$$\|C(f)\|_{P_Y}^2 = \mathbb{E}\left[\mathbb{E}[f(X) \mid Y]^2\right] \leq \mathbb{E}\left[\mathbb{E}\left[f^2(X) \mid Y\right]\right] = \|f\|_{P_X}^2 .$$

[Figure: $C$ maps $f$ to $C(f)$ and $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$]

slide-17
SLIDE 17

Preliminaries

$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality. Let $\mathbf{1}_{\mathcal{S}} : \mathcal{S} \to \mathbb{R}$ denote the everywhere unity function: $\mathbf{1}_{\mathcal{S}}(x) = 1$. Then:

$$C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}} \ \text{ and } \ \|\mathbf{1}_{\mathcal{X}}\|_{P_X}^2 = \|\mathbf{1}_{\mathcal{Y}}\|_{P_Y}^2 = 1 \ \Rightarrow \ \|C\|_{\mathrm{op}} = 1 .$$
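The slides state that $C^*$ is the adjoint of $C$ without showing the step. One way to see it, sketched here using only the tower property of conditional expectation, for any $f \in \mathcal{L}^2(\mathcal{X}, P_X)$ and $g \in \mathcal{L}^2(\mathcal{Y}, P_Y)$:

```latex
\begin{align*}
\langle C(f), g \rangle_{P_Y}
  &= \mathbb{E}\big[\mathbb{E}[f(X) \mid Y]\, g(Y)\big]
   = \mathbb{E}\big[\mathbb{E}[f(X) g(Y) \mid Y]\big]
   = \mathbb{E}[f(X) g(Y)] \\
  &= \mathbb{E}\big[\mathbb{E}[f(X) g(Y) \mid X]\big]
   = \mathbb{E}\big[f(X)\, \mathbb{E}[g(Y) \mid X]\big]
   = \langle f, C^*(g) \rangle_{P_X} .
\end{align*}
```

The middle quantity $\mathbb{E}[f(X) g(Y)]$ is exactly the correlation that the regression and maximal correlation problems optimize.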

slide-20
SLIDE 20

Spectral Characterization of Maximal Correlation

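Prop (Spectral Characterization of Maximal Correlation) [Rényi, 1959]

For random variables $X$ and $Y$ as defined earlier:

$$\rho(X; Y) = \sup_{f \in \mathcal{L}^2(\mathcal{X}, P_X):\ \mathbb{E}[f(X)] = 0} \frac{\|C(f)\|_{P_Y}}{\|f\|_{P_X}}$$

where the supremum is achieved by some $f^\star \in \mathcal{L}^2(\mathcal{X}, P_X)$ if $C$ is compact.

[Figure: $\mathcal{L}^2(\mathcal{X}, P_X)$ and $\mathcal{L}^2(\mathcal{Y}, P_Y)$ with $f^\star$, $C(f^\star)$, $\mathbf{1}_{\mathcal{X}}$, $\mathbf{1}_{\mathcal{Y}}$, and the maps $C$ and $C^*/\rho^2$]

$C$ has largest singular value $\|C\|_{\mathrm{op}} = 1$: $C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$, $C^*(\mathbf{1}_{\mathcal{Y}}) = \mathbf{1}_{\mathcal{X}}$.

$\rho(X; Y)$ = second largest singular value of $C$, with singular vectors $f^\star \perp \mathbf{1}_{\mathcal{X}}$ and $g^\star = C(f^\star)/\rho(X; Y) \perp \mathbf{1}_{\mathcal{Y}}$ that maximize correlation.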
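The spectral characterization can be tried out numerically on a finite alphabet, where $C$ becomes a matrix. The sketch below is not from the talk (which works with densities on subsets of $\mathbb{R}$): it assumes a joint pmf given as an array with strictly positive marginals, and the function name `conditional_expectation_singular_values` is hypothetical.

```python
import numpy as np

def conditional_expectation_singular_values(joint):
    """Singular values of C for a finite joint pmf (illustrative assumption).

    joint[i, j] = P(X = i, Y = j); both marginals are assumed strictly positive.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of X
    py = joint.sum(axis=0)  # marginal of Y
    # In the orthonormal bases 1{x=i}/sqrt(px[i]) and 1{y=j}/sqrt(py[j]),
    # C is represented by the matrix with entries P(i, j) / sqrt(px[i] * py[j]).
    matrix = (joint / np.sqrt(np.outer(px, py))).T
    return np.linalg.svd(matrix, compute_uv=False)  # sorted in descending order

# Doubly symmetric binary pair with crossover probability 0.1.
dsbs = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
singular_values = conditional_expectation_singular_values(dsbs)
maximal_correlation = singular_values[1]
```

For this binary example the largest singular value is 1 (the constant functions) and the second is $1 - 2(0.1) = 0.8$, which is the maximal correlation.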

slide-21
SLIDE 21

Outline

1 Introduction

2 Polynomial Decompositions of Compact Operators
    The Hermite SVD
    Assumptions and Definitions
    Polynomial EVD of Compact Self-Adjoint Operators
    Polynomial SVD of Conditional Expectation Operators

3 Illustrations of Polynomial SVDs

slide-23
SLIDE 23

The Hermite SVD

Gaussian Channel: $P_{Y|X=x} = \mathcal{N}(x, \nu)$ with expectation parameter $x \in \mathbb{R}$ and fixed variance $\nu \in (0, \infty)$

$$\forall x, y \in \mathbb{R}, \quad P_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi\nu}} \exp\left(-\frac{(y - x)^2}{2\nu}\right)$$

Gaussian Source: $P_X = \mathcal{N}(0, p)$ with fixed variance $p \in (0, \infty)$

$$\forall x \in \mathbb{R}, \quad P_X(x) = \frac{1}{\sqrt{2\pi p}} \exp\left(-\frac{x^2}{2p}\right)$$

Remark: (AWGN channel) $Y = X + W$ with $X \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$

Gaussian Output Marginal: $P_Y = \mathcal{N}(0, p + \nu)$

$$\forall y \in \mathbb{R}, \quad P_Y(y) = \frac{1}{\sqrt{2\pi(p + \nu)}} \exp\left(-\frac{y^2}{2(p + \nu)}\right)$$

slide-25
SLIDE 25

The Hermite SVD

Prop (Hermite SVD) [Abbe & Zheng, 2012], [Makur & Zheng, 2016]

For the Gaussian channel $P_{Y|X}$ and Gaussian source $P_X$, the conditional expectation operator $C : \mathcal{L}^2(\mathbb{R}, P_X) \to \mathcal{L}^2(\mathbb{R}, P_Y)$ has SVD:

$$\forall k \in \mathbb{N}, \quad C\left(H_k^{(p)}\right) = \sigma_k H_k^{(p+\nu)}$$

with singular values $\{\sigma_k \in (0, 1] : k \in \mathbb{N}\}$, where $\sigma_0 = 1$ and $\lim_{k \to \infty} \sigma_k = 0$, and singular vectors:

$\{H_k^{(p)} \text{ with degree } k : k \in \mathbb{N}\}$ - Hermite polynomials that are orthonormal with respect to $P_X$,

$\{H_k^{(p+\nu)} \text{ with degree } k : k \in \mathbb{N}\}$ - Hermite polynomials that are orthonormal with respect to $P_Y$.

For which joint distributions $P_{X,Y}$ are the singular vectors of $C$ orthonormal polynomials?
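The Hermite SVD can be checked numerically with Gauss-Hermite quadrature. The sketch below makes two assumptions that are not on the slide: $H_k^{(v)}(t)$ is taken to be the probabilists' Hermite polynomial $He_k(t/\sqrt{v})/\sqrt{k!}$, and the singular values are taken to be $\sigma_k = (p/(p+\nu))^{k/2}$, the standard value for jointly Gaussian pairs (Mehler's formula). The function names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as herm_e

def orthonormal_hermite(k, t, variance):
    """Degree-k Hermite polynomial, orthonormal under N(0, variance), at t."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select He_k in the probabilists' Hermite basis
    scaled = np.asarray(t, dtype=float) / math.sqrt(variance)
    return herm_e.hermeval(scaled, coeffs) / math.sqrt(math.factorial(k))

def conditional_expectation_of_hermite(k, y, p, nu, nodes=40):
    """E[H_k^{(p)}(X) | Y = y] for Y = X + W, X ~ N(0, p), W ~ N(0, nu)."""
    # Given Y = y, X is Gaussian with the usual posterior mean and variance.
    post_mean = p * y / (p + nu)
    post_std = math.sqrt(p * nu / (p + nu))
    z, w = herm_e.hermegauss(nodes)  # quadrature for the weight exp(-z^2 / 2)
    values = orthonormal_hermite(k, post_mean + post_std * z, p)
    return float(np.sum(w * values) / math.sqrt(2.0 * math.pi))

p, nu = 2.0, 0.5
rho = math.sqrt(p / (p + nu))  # assumed: sigma_k = rho ** k
max_gap = 0.0
for k in range(5):
    for y in (-1.3, 0.4, 2.0):
        lhs = conditional_expectation_of_hermite(k, y, p, nu)
        rhs = rho ** k * float(orthonormal_hermite(k, y, p + nu))
        max_gap = max(max_gap, abs(lhs - rhs))
```

Because the integrand is a polynomial, the quadrature is exact up to rounding, so `max_gap` should sit at floating-point noise level if the assumed normalization and singular values are right.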

slide-28
SLIDE 28

Assumptions and Definitions

$\mathcal{L}^2(\mathcal{X}, P_X)$ and $\mathcal{L}^2(\mathcal{Y}, P_Y)$ are infinite-dimensional.

$\mathcal{L}^2(\mathcal{X}, P_X)$ admits a unique countable orthonormal basis of polynomials, $\{p_k : k \in \mathbb{N}\} \subseteq \mathcal{L}^2(\mathcal{X}, P_X)$, where $p_k : \mathcal{X} \to \mathbb{R}$ is an orthonormal polynomial with degree $k$.

$\mathcal{L}^2(\mathcal{Y}, P_Y)$ admits a unique countable orthonormal basis of polynomials, $\{q_k : k \in \mathbb{N}\} \subseteq \mathcal{L}^2(\mathcal{Y}, P_Y)$, where $q_k : \mathcal{Y} \to \mathbb{R}$ is an orthonormal polynomial with degree $k$.

slide-31
SLIDE 31

Assumptions and Definitions

Definition (Closure over Polynomials and Degree Preservation)

An operator $T : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{Y}, P_Y)$ is closed over polynomials if for any polynomial $p \in \mathcal{L}^2(\mathcal{X}, P_X)$, $T(p)$ is also a polynomial. Furthermore, $T$ is degree preserving if $\deg(T(p)) \leq \deg(p)$, and $T$ is strictly degree preserving if $\deg(T(p)) = \deg(p)$.

Gaussian Channel Example: $Y = X + W$ with $X \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$

$$\mathbb{E}[g(Y) \mid X = x] = \frac{1}{\sqrt{2\pi\nu}} \int_{\mathbb{R}} g(y) \exp\left(-\frac{(y - x)^2}{2\nu}\right) d\mu(y)$$

Convolution preserves polynomials!
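To make "convolution preserves polynomials" concrete, here is a small worked example for monomials $g(y) = y^n$, using only $Y = x + W$ given $X = x$ and the moments of $W \sim \mathcal{N}(0, \nu)$:

```latex
\begin{align*}
\mathbb{E}\left[Y^2 \mid X = x\right] &= \mathbb{E}\left[(x + W)^2\right] = x^2 + \nu , \\
\mathbb{E}\left[Y^3 \mid X = x\right] &= \mathbb{E}\left[(x + W)^3\right] = x^3 + 3\nu x , \\
\mathbb{E}\left[Y^n \mid X = x\right] &= \sum_{j=0}^{n} \binom{n}{j} x^{n-j}\, \mathbb{E}\left[W^j\right]
  = x^n + (\text{terms of degree} < n) .
\end{align*}
```

The leading coefficient is always 1, so in this example the map $g \mapsto \mathbb{E}[g(Y) \mid X = x]$ is strictly degree preserving.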

slide-32
SLIDE 32

Polynomial EVD of Compact Self-Adjoint Operators

Theorem (Condition for Orthonormal Polynomial Eigenbasis) [Makur and Zheng, 2016]

Let $T : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{X}, P_X)$ be a compact self-adjoint operator. $T$ is closed over polynomials and degree preserving if and only if:

$$\forall k \in \mathbb{N}, \quad T(p_k) = \alpha_k p_k$$

where $\{\alpha_k \in \mathbb{R} : k \in \mathbb{N}\}$ are eigenvalues satisfying $\lim_{k \to \infty} \alpha_k = 0$.

slide-33
SLIDE 33

Polynomial SVD of Conditional Expectation Operators

Theorem (Condition for Orthonormal Polynomial Singular Vectors) [Makur and Zheng, 2016]

Suppose $C : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{Y}, P_Y)$ is compact and $C^* : \mathcal{L}^2(\mathcal{Y}, P_Y) \to \mathcal{L}^2(\mathcal{X}, P_X)$ is its adjoint operator. $C$ and $C^*$ are closed over polynomials and strictly degree preserving if and only if:

$$\forall k \in \mathbb{N}, \quad C(p_k) = \beta_k q_k$$

where $\{\beta_k \in (0, \infty) : k \in \mathbb{N}\}$ are the singular values such that $\lim_{k \to \infty} \beta_k = 0$.

slide-37
SLIDE 37

Polynomial SVD of Conditional Expectation Operators

Theorem (Condition for Orthonormal Polynomial Singular Vectors) [Makur and Zheng, 2016]

Suppose $C = \mathbb{E}[\,\cdot \mid Y] : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{Y}, P_Y)$ is compact and $C^* = \mathbb{E}[\,\cdot \mid X] : \mathcal{L}^2(\mathcal{Y}, P_Y) \to \mathcal{L}^2(\mathcal{X}, P_X)$ is its adjoint operator. For every $n \in \mathbb{N}$, $\mathbb{E}[X^n \mid Y]$ is a polynomial in $Y$ with degree $n$ and $\mathbb{E}[Y^n \mid X]$ is a polynomial in $X$ with degree $n$ if and only if:

$$\forall k \in \mathbb{N}, \quad C(p_k) = \beta_k q_k$$

where $\{\beta_k \in (0, 1] : k \in \mathbb{N}\}$ are the singular values such that $\beta_0 = 1$ and $\lim_{k \to \infty} \beta_k = 0$.

Gaussian Example Proof Sketch: $Y = X + W$ with $X \sim \mathcal{N}(0, p) \perp\!\!\!\perp W \sim \mathcal{N}(0, \nu)$. $C$ and $C^*$ are defined by convolution kernels, which preserve polynomials. By the above theorem, $C$ has Hermite polynomial singular vectors.

slide-38
SLIDE 38

Outline

1 Introduction

2 Polynomial Decompositions of Compact Operators

3 Illustrations of Polynomial SVDs
    The Laguerre SVD
    The Jacobi SVD
    Natural Exponential Families and Conjugate Priors

slide-40
SLIDE 40

The Laguerre SVD

Poisson Channel: $P_{Y|X=x} = \text{Poisson}(x)$ with rate parameter $x \in (0, \infty)$

$$\forall x \in (0, \infty),\ \forall y \in \mathbb{N}, \quad P_{Y|X}(y|x) = \frac{x^y e^{-x}}{y!}$$

Gamma Source: $P_X = \text{gamma}(\alpha, \beta)$ with shape parameter $\alpha \in (0, \infty)$ and rate parameter $\beta \in (0, \infty)$

$$\forall x \in (0, \infty), \quad P_X(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$$

Negative Binomial Output Marginal: $P_Y = \text{negative-binomial}\left(p = \frac{1}{\beta + 1}, \alpha\right)$ with success probability parameter $p \in (0, 1)$ and number of failures parameter $\alpha \in (0, \infty)$

$$\forall y \in \mathbb{N}, \quad P_Y(y) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha)\, y!} \left(\frac{1}{\beta + 1}\right)^{y} \left(\frac{\beta}{\beta + 1}\right)^{\alpha}$$

slide-41
SLIDE 41

The Laguerre SVD

Proposition (Laguerre SVD) [Makur and Zheng, 2016]

For the Poisson channel $P_{Y|X}$ and gamma source $P_X$, the conditional expectation operator $C : \mathcal{L}^2((0, \infty), P_X) \to \mathcal{L}^2(\mathbb{N}, P_Y)$ has SVD:

$$\forall k \in \mathbb{N}, \quad C\left(L_k^{(\alpha, \beta)}\right) = \sigma_k M_k^{\left(\alpha, \frac{1}{\beta + 1}\right)}$$

with singular values $\{\sigma_k \in (0, 1] : k \in \mathbb{N}\}$, where $\sigma_0 = 1$ and $\lim_{k \to \infty} \sigma_k = 0$, and singular vectors:

$\{L_k^{(\alpha, \beta)} \text{ with degree } k : k \in \mathbb{N}\}$ - generalized Laguerre polynomials that are orthonormal with respect to $P_X$,

$\{M_k^{\left(\alpha, \frac{1}{\beta + 1}\right)} \text{ with degree } k : k \in \mathbb{N}\}$ - Meixner polynomials that are orthonormal with respect to $P_Y$.
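The polynomial SVD theorem reduces the Laguerre case to checking that conditional moments are polynomials of the right degree. The sketch below checks one direction numerically: $\mathbb{E}[X^n \mid Y = y]$ as a function of $y$. It assumes an integer shape parameter $\alpha$ so that Gauss-Laguerre quadrature is exact, and it tests "polynomial of degree $n$" through forward differences. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

def posterior_moment(n, y, alpha, beta, nodes=20):
    """E[X^n | Y = y] for X ~ gamma(alpha, beta), Y | X = x ~ Poisson(x).

    The posterior density is proportional to x^(alpha+y-1) * exp(-(beta+1) x).
    Substituting t = (beta + 1) x gives integrals against exp(-t), which
    Gauss-Laguerre quadrature handles exactly when alpha is a positive integer.
    """
    t, w = laggauss(nodes)  # quadrature for the weight exp(-t) on (0, inf)
    power = int(alpha + y - 1)
    numerator = np.sum(w * t ** (power + n))
    denominator = np.sum(w * t ** power)
    return numerator / denominator / (beta + 1.0) ** n

def is_polynomial_of_degree(values, n, tol=1e-6):
    """True if equally spaced samples look like a polynomial of exact degree n."""
    vanishes = np.allclose(np.diff(values, n + 1), 0.0, atol=tol)
    nonzero = not np.allclose(np.diff(values, n), 0.0, atol=tol)
    return vanishes and nonzero

alpha, beta = 2, 1.5  # alpha is an integer by assumption
ys = range(8)
checks = {
    n: is_polynomial_of_degree(
        np.array([posterior_moment(n, y, alpha, beta) for y in ys]), n)
    for n in (1, 2, 3)
}
```

The other direction, $\mathbb{E}[Y^n \mid X = x]$, is a moment of a Poisson distribution with rate $x$ and is not checked here.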

slide-43
SLIDE 43

The Jacobi SVD

Binomial Channel: $P_{Y|X=x} = \text{binomial}(n, x)$ with number of trials parameter $n \in \mathbb{N} \setminus \{0\}$ and success probability parameter $x \in (0, 1)$

$$\forall x \in (0, 1),\ \forall y \in [n] \triangleq \{0, \dots, n\}, \quad P_{Y|X}(y|x) = \binom{n}{y} x^y (1 - x)^{n - y}$$

Beta Source: $P_X = \text{beta}(\alpha, \beta)$ with shape parameters $\alpha \in (0, \infty)$ and $\beta \in (0, \infty)$

$$\forall x \in (0, 1), \quad P_X(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$$

Beta-Binomial Output Marginal: $P_Y = \text{beta-binomial}(n, \alpha, \beta)$

$$\forall y \in [n], \quad P_Y(y) = \binom{n}{y} \frac{B(\alpha + y, \beta + n - y)}{B(\alpha, \beta)}$$

slide-44
SLIDE 44

The Jacobi SVD

Proposition (Jacobi SVD) [Makur and Zheng, 2016]

For the binomial channel $P_{Y|X}$ and beta source $P_X$, the conditional expectation operator $C : \mathcal{L}^2((0, 1), P_X) \to \mathcal{L}^2([n], P_Y)$ has SVD:

$$\forall k \in [n], \quad C\left(J_k^{(\alpha, \beta)}\right) = \sigma_k Q_k^{(\alpha, \beta)}$$

$$\forall k \in \mathbb{N} \setminus [n], \quad C\left(J_k^{(\alpha, \beta)}\right) = 0$$

with singular values $\{\sigma_k \in (0, 1] : k \in [n]\}$, where $\sigma_0 = 1$, and singular vectors:

$\{J_k^{(\alpha, \beta)} \text{ with degree } k : k \in \mathbb{N}\}$ - Jacobi polynomials that are orthonormal with respect to $P_X$,

$\{Q_k^{(\alpha, \beta)} \text{ with degree } k : k \in [n]\}$ - Hahn polynomials that are orthonormal with respect to $P_Y$.
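The beta-binomial case allows an exact check in rational arithmetic. The sketch below assumes integer shape parameters $\alpha$ and $\beta$, so that every beta function is a ratio of factorials, and verifies that $\mathbb{E}[X^m \mid Y = y]$ has vanishing $(m+1)$-th forward differences in $y$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def posterior_moment(m, y, n, alpha, beta):
    """E[X^m | Y = y] for X ~ beta(alpha, beta), Y | X = x ~ binomial(n, x).

    The posterior density is proportional to x^(alpha+y-1) (1-x)^(beta+n-y-1),
    so the moment is a ratio of two beta functions.
    """
    return beta_fn(alpha + y + m, beta + n - y) / beta_fn(alpha + y, beta + n - y)

def forward_difference(values, order):
    """Apply the forward difference operator `order` times to a list."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

n, alpha, beta = 5, 2, 3  # integer shape parameters by assumption
first_moments = [posterior_moment(1, y, n, alpha, beta) for y in range(n + 1)]
third_moments = [posterior_moment(3, y, n, alpha, beta) for y in range(n + 1)]
slope = forward_difference(first_moments, 1)         # constant if linear in y
residual = forward_difference(third_moments, 4)      # all zero if cubic in y
```

With these parameters the first moment is $(\alpha + y)/(\alpha + \beta + n) = (2 + y)/10$, so every entry of `slope` equals $1/10$ exactly.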

slide-45
SLIDE 45

Why are these joint distributions special?

$P_{Y|X}$ is a natural exponential family with quadratic variance function (introduced in [Morris, 1982]):

$$\forall x \in \mathcal{X},\ \forall y \in \mathcal{Y}, \quad P_{Y|X}(y|x) = \exp\left(xy - \alpha(x) + \beta(y)\right)$$

where $P_{Y|X}(y|0) = \exp(\beta(y))$ is the base distribution, $\alpha(x)$ is the log-partition function with $\alpha(0) = 0$, and $\mathrm{VAR}(Y \mid X = x)$ is a quadratic function of $\mathbb{E}[Y \mid X = x]$.

slide-46
SLIDE 46

Why are these joint distributions special?

$P_X$ belongs to the corresponding conjugate prior family:

$$\forall x \in \mathcal{X}, \quad P_X(x; y', n) = \exp\left(y'x - n\alpha(x) - \tau(y', n)\right)$$

where $\tau(y', n)$ is the log-partition function.

slide-47
SLIDE 47

Why are these joint distributions special?

All moments exist and are finite:

Gaussian likelihood with Gaussian prior,
Poisson likelihood with gamma prior,
binomial likelihood with beta prior.
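As a concrete instance of the natural exponential family form $\exp(xy - \alpha(x) + \beta(y))$, the Poisson family can be rewritten with $x$ as the natural parameter. In this sketch the rate is $\lambda = e^{x}$, where $\lambda$ is a symbol introduced here for illustration; the Laguerre SVD slides parametrize the Poisson channel by its rate directly, not by the natural parameter.

```latex
\begin{align*}
P_{Y|X}(y|x) &= \frac{\lambda^{y} e^{-\lambda}}{y!}\bigg|_{\lambda = e^{x}}
  = \exp\big(xy - (e^{x} - 1) + (-1 - \log y!)\big) , \\
\alpha(x) &= e^{x} - 1 , \qquad \alpha(0) = 0 , \qquad
\beta(y) = -1 - \log y! , \\
\mathbb{E}[Y \mid X = x] &= \alpha'(x) = e^{x} , \qquad
\mathrm{VAR}(Y \mid X = x) = \alpha''(x) = e^{x} .
\end{align*}
```

The base distribution $\exp(\beta(y)) = e^{-1}/y!$ is Poisson with rate 1, and the variance equals the mean, which is a (degenerate) quadratic function of the mean.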

slide-48
SLIDE 48

Conclusion

Summary:

1 Regression and maximal correlation ⇒ conditional expectation operators

2 Closure over polynomials and degree preservation ⇔ orthogonal polynomial eigenvectors or singular vectors

3 Check conditional moments are polynomials ⇒ Gaussian-Gaussian, Gamma-Poisson, Beta-Binomial examples

4 Examples have natural exponential family/conjugate prior structure
