SLIDE 1

Super-resolution using Gaussian Process Regression

Final Year Project Interim Report
He He

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University

December 30, 2010

SLIDE 2

Outline

1. Introduction

2. Gaussian Process Regression
   - Multivariate Normal Distribution
   - Gaussian Process Regression
   - Training

3. GPR for Super-resolution
   - Framework
   - Covariance Function

SLIDE 3

Outline

1. Introduction

2. Gaussian Process Regression
   - Multivariate Normal Distribution
   - Gaussian Process Regression
   - Training

3. GPR for Super-resolution
   - Framework
   - Covariance Function

SLIDE 4

The goal of super-resolution (SR) is to estimate a high-resolution (HR) image from one or a set of low-resolution (LR) images. It is widely applied in face recognition, medical imaging, HDTV, etc.

Figure: Face recognition in video.

SLIDE 5

The goal of super-resolution (SR) is to estimate a high-resolution (HR) image from one or a set of low-resolution (LR) images. It is widely applied in face recognition, medical imaging, HDTV, etc.

Figure: Super-resolution in medical imaging.

SLIDE 6

Super-resolution Methods

Interpolation-based methods: fast, but the HR image is usually blurred. E.g., bicubic interpolation, NEDI.

Learning-based methods: hallucinate textures from an HR/LR image pair database.

Reconstruction-based methods: formulate an optimization problem constrained by the LR image with various priors.

SLIDE 7

Outline

1. Introduction

2. Gaussian Process Regression
   - Multivariate Normal Distribution
   - Gaussian Process Regression
   - Training

3. GPR for Super-resolution
   - Framework
   - Covariance Function

SLIDE 8

Multivariate Normal Distribution

Definition

A random vector X = (X1, X2, . . . , Xp) is said to be multivariate normally (MVN) distributed if every linear combination of its components Y = a^T X has a univariate normal distribution. Real-world random variables can often be approximated as following a multivariate normal distribution.

The probability density function of X is

f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)   (1)

where µ is the mean of X and Σ is the covariance matrix.
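
As a quick numerical check of Eq. (1), the density can be evaluated directly; a minimal sketch in Python/NumPy (the values of µ, Σ and x below are made up for illustration):

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Evaluate the multivariate normal density of Eq. (1) at x."""
    p = mu.size
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(cov)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))

# Illustrative values only
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])
x = np.array([0.5, -0.2])
print(mvn_pdf(x, mu, cov))
```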

SLIDE 9

Multivariate Normal Distribution

Example

Bivariate normal distribution with µ = [1, 1]^T and a given 2 × 2 covariance matrix Σ.

SLIDE 10

Multivariate Normal Distribution

Property 1
The joint distribution of two MVN random variables is also an MVN distribution. Given X1 ∼ N(µ1, Σ1), X2 ∼ N(µ2, Σ2), and X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, we have

X \sim N_p(\mu, \Sigma) \quad \text{with} \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.

SLIDE 11

Multivariate Normal Distribution

Property 2
The conditional distributions of the components of an MVN are (multivariate) normal. The distribution of X1, given that X2 = x2, is normal and has

Mean = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)   (2)
Covariance = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}   (3)
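
Eqs. (2) and (3) translate directly into code; a minimal sketch, assuming the joint mean and covariance have already been partitioned into blocks (the block values below are illustrative):

```python
import numpy as np

def mvn_conditional(mu1, mu2, S11, S12, S21, S22, x2):
    """Mean and covariance of X1 given X2 = x2 (Eqs. 2-3)."""
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    return cond_mean, cond_cov

# Illustrative blocks for a 2-D joint distribution
mu1, mu2 = np.array([0.0]), np.array([0.0])
S11, S12 = np.array([[1.0]]), np.array([[0.5]])
S21, S22 = np.array([[0.5]]), np.array([[2.0]])
print(mvn_conditional(mu1, mu2, S11, S12, S21, S22, np.array([1.0])))
```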

SLIDE 12

Gaussian Process

Definition

A Gaussian process (GP) defines a distribution over the function f, where f is a mapping from the input space X to R, such that for any finite subset of X the marginal distribution P(f(x1), f(x2), ..., f(xn)) is a multivariate normal distribution:

f | X \sim N(m(x), K(X, X))   (4)

where

X = \{x_1, x_2, \ldots, x_n\}   (5)
m(x) = E[f(x)]   (6)
k(x_i, x_j) = E[(f(x_i) - m(x_i))(f(x_j) - m(x_j))]   (7)

and K(X, X) denotes the covariance matrix such that K_{ij} = k(x_i, x_j).

SLIDE 13

Gaussian Process

Formally, we write the Gaussian process as

f(x) \sim GP(m(x), k(x_i, x_j))   (8)

Without loss of generality, the mean is usually taken to be zero.
Parameterized by the mean function m(x) and the covariance function k(x_i, x_j).
Inference is performed directly in function space.
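
One way to make Eq. (8) concrete is to draw sample functions from a zero-mean GP prior at a finite set of inputs; a minimal sketch, using a squared-exponential covariance (the same family as Eq. (29) later; the hyperparameter values here are arbitrary):

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared-exponential covariance matrix between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)

X = np.linspace(0, 5, 100)
K = se_kernel(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
# Three sample functions from the prior f | X ~ N(0, K)
samples = np.random.multivariate_normal(np.zeros(len(X)), K, size=3)
print(samples.shape)  # (3, 100)
```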

SLIDE 14

Gaussian Process Regression

Model: f(x) \sim GP(m(x), k(x_i, x_j))   (9)

Given the inputs X_*, the output f_* is f_* \sim N(0, K(X_*, X_*))   (10)

According to the Gaussian prior, the joint distribution of the training outputs f and the test outputs f_* is

\begin{pmatrix} f \\ f_* \end{pmatrix} \sim N\left( 0, \begin{pmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{pmatrix} \right).   (11)

SLIDE 15

Noisy Model

In reality, we do not have access to true function values but rather noisy observations. Assuming independent, identically distributed noise, we have the noisy model

y = f(x) + \varepsilon, \quad \varepsilon \sim N(0, \sigma_n^2)   (12)
f(x) \sim GP(m(x), K(X, X))   (13)
Var(y) = Var(f(x)) + Var(\varepsilon) = K(X, X) + \sigma_n^2 I   (14)

Thus, the joint distribution for prediction is

\begin{pmatrix} y \\ f_* \end{pmatrix} \sim N\left( 0, \begin{pmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{pmatrix} \right)   (15)

SLIDE 16

Prediction

Referring to the previous property of the conditional distribution, we can obtain

f_* \sim N(\bar{f}_*, V(f_*))   (16)
\bar{f}_* = K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} y,   (17)
V(f_*) = K(X_*, X_*) - K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} K(X, X_*).   (18)

y are the training outputs and f_* are the test outputs, which are predicted as the mean \bar{f}_*.
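
Eqs. (17) and (18) can be implemented directly; a minimal sketch, using a Cholesky factorization of K(X, X) + σ_n^2 I for numerical stability (the covariance matrices are assumed to be precomputed by some kernel function):

```python
import numpy as np

def gpr_predict(K, K_s, K_ss, y, sigma_n):
    """Posterior mean (Eq. 17) and covariance (Eq. 18) of the test outputs.

    K    : K(X, X)   train/train covariance
    K_s  : K(X, X*)  train/test covariance
    K_ss : K(X*, X*) test/test covariance
    """
    Ky = K + sigma_n ** 2 * np.eye(K.shape[0])
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # [K(X,X) + sigma_n^2 I]^{-1} y
    mean = K_s.T @ alpha                                   # Eq. (17)
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                                   # Eq. (18)
    return mean, cov
```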

SLIDE 17

Marginal Likelihood

GPR model:

y = f + \varepsilon   (19)
f \sim GP(m(x), K)   (20)
\varepsilon \sim N(0, \sigma_n^2 I)   (21)

y is an n-dimensional vector of observations. Without loss of generality, let m(x) = 0. Thus y | X follows a normal distribution with

E(y | X) = 0   (22)
Var(y | X) = K(X, X) + \sigma_n^2 I   (23)

SLIDE 18

Marginal Likelihood

Let K_y = Var(y | X). Then

p(y | X) = \frac{1}{(2\pi)^{n/2} |K_y|^{1/2}} \exp\left( -\tfrac{1}{2} y^T K_y^{-1} y \right)   (24)

The log marginal likelihood is

L = \log p(y | X) = -\tfrac{n}{2} \log 2\pi - \tfrac{1}{2} \log|K_y| - \tfrac{1}{2} y^T K_y^{-1} y   (25)
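
Eq. (25) in code; a minimal sketch that reuses a Cholesky factor of K_y, so that log|K_y| comes from the diagonal of L:

```python
import numpy as np

def log_marginal_likelihood(K, y, sigma_n):
    """Log marginal likelihood of Eq. (25), with K_y = K + sigma_n^2 I."""
    n = y.size
    Ky = K + sigma_n ** 2 * np.eye(n)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K_y^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                  # = 0.5 * log|K_y|
            - 0.5 * n * np.log(2 * np.pi))
```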

SLIDE 19

Maximum a posteriori

Matrix derivatives:

\frac{\partial}{\partial \theta_i} Y^{-1} = -Y^{-1} \frac{\partial Y}{\partial \theta_i} Y^{-1}   (26)
\frac{\partial}{\partial \theta_i} \log|Y| = \mathrm{tr}\left( Y^{-1} \frac{\partial Y}{\partial \theta_i} \right)   (27)

Gradient ascent:

\frac{\partial L}{\partial \theta_i} = \tfrac{1}{2} y^T K^{-1} \frac{\partial K}{\partial \theta_i} K^{-1} y - \tfrac{1}{2} \mathrm{tr}\left( K^{-1} \frac{\partial K}{\partial \theta_i} \right)   (28)

\partial K / \partial \theta_i is the matrix of element-wise derivatives of K.
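
Eq. (28) as code; a minimal sketch for a single hyperparameter, where K is taken to be the full covariance of y (i.e. K(X, X) + σ_n^2 I) and dK is its element-wise derivative with respect to that hyperparameter (how dK is built depends on the covariance function):

```python
import numpy as np

def lml_gradient(K, dK, y):
    """Gradient of the log marginal likelihood w.r.t. one hyperparameter (Eq. 28)."""
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    return 0.5 * (alpha @ dK @ alpha) - 0.5 * np.trace(K_inv @ dK)
```

Each hyperparameter θ_i can then be updated by gradient ascent, θ_i ← θ_i + η ∂L/∂θ_i, for some step size η.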

SLIDE 20

Outline

1. Introduction

2. Gaussian Process Regression
   - Multivariate Normal Distribution
   - Gaussian Process Regression
   - Training

3. GPR for Super-resolution
   - Framework
   - Covariance Function

SLIDE 21

Graphical Representation

Model: y = f(x) + ε
Squares: observed pixels
Circles: unknown Gaussian field
Inputs (x): neighbors (predictors) of the target pixel
Outputs (y): pixel at the center of each 3 × 3 patch
Thick horizontal line: a set of fully connected nodes
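
A minimal sketch of how such input/output pairs could be assembled from an image, assuming each training example pairs a pixel's 8 neighbors (the inputs x) with the pixel at the center of the 3 × 3 patch (the output y); the exact sampling strategy used in the project may differ:

```python
import numpy as np

def make_training_set(img):
    """Pair each interior pixel (target y) with its 8 surrounding pixels (inputs x)."""
    h, w = img.shape
    X, y = [], []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2].ravel()
            X.append(np.delete(patch, 4))   # the 8 neighbors
            y.append(patch[4])              # the center pixel
    return np.array(X), np.array(y)

X, y = make_training_set(np.random.rand(16, 16))
print(X.shape, y.shape)  # (196, 8) (196,)
```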

SLIDE 22

Workflow

Stage 1: interpolation
Input LR patch

SLIDE 23

Workflow

Stage 1: interpolation
Sample training targets

SLIDE 24

Workflow

Stage 1: interpolation
SR based on bicubic interpolation
Stage 2: deblurring

SLIDE 25

Workflow

Stage 1: interpolation
Stage 2: deblurring
Sample training targets

SLIDE 26

Workflow

Stage 1: interpolation
Stage 2: deblurring
Obtain neighbors from the downsampled patch

SLIDE 27

Workflow

Stage 1: interpolation
Stage 2: deblurring
SR based on the simulated blurring process

SLIDE 28

Covariance Equation

The covariance function defines the similarity between two points (vectors) and indicates the underlying distribution of functions in the GP.

Squared-exponential covariance function:

k(x_i, x_j) = \sigma_f^2 \exp\left( -\frac{(x_i - x_j)^T (x_i - x_j)}{2\ell^2} \right)   (29)

\sigma_f^2 represents the signal variance and \ell defines the characteristic length scale.

Given an image I, the covariance between two pixels I_{i,j} and I_{m,n} is calculated as k(I_{(i,j),N}, I_{(m,n),N}), where N denotes taking the 8 nearest pixels around the pixel. Therefore, the similarity is based on the Euclidean distance between the pixels' neighbors.
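
A minimal sketch of Eq. (29) applied to a pair of pixels through their 8-pixel neighborhoods, as described above (the hyperparameter values are placeholders):

```python
import numpy as np

def neighborhood(img, i, j):
    """The 8 nearest pixels around (i, j), flattened with the center excluded."""
    patch = img[i - 1:i + 2, j - 1:j + 2].ravel()
    return np.delete(patch, 4)

def pixel_covariance(img, p, q, sigma_f=1.0, ell=0.5):
    """Squared-exponential covariance k(I_{(i,j),N}, I_{(m,n),N}) of Eq. (29)."""
    d = neighborhood(img, *p) - neighborhood(img, *q)
    return sigma_f ** 2 * np.exp(-0.5 * (d @ d) / ell ** 2)

img = np.random.rand(10, 10)
print(pixel_covariance(img, (3, 3), (6, 6)))
```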

SLIDE 29

Covariance Equation

Figure: (a) Test point, (b) Training patch, (c) Covariance matrix.

Local similarity: high responses (red regions) from the training patch are concentrated on edges.
Global similarity: high-response regions also include other similar edges within the patch.
Conclusion: pixels whose neighborhood structure is similar to that of the target pixel tend to have higher weights during prediction.

SLIDE 30

Hyperparameter Adaptation

Hyperparameters:
σ_f^2: signal variance
σ_n^2: noise variance
ℓ: characteristic length scale

Figure: (a) Test, (b) Training, (c) ℓ = 0.50, σ_n = 0.01, (d) ℓ = 0.05, σ_n = 0.001, (e) ℓ = 1.65, σ_n = 0.14.

(c): MAP estimation
(d): quickly varying field with low noise
(e): slowly varying field with high noise

SLIDE 31

Hyperparameter Adaptation

Hyperparameters:
σ_f^2: signal variance
σ_n^2: noise variance
ℓ: characteristic length scale

Figure: (a) Test, (b) Training, (c) ℓ = 0.50, σ_n = 0.01, (d) ℓ = 0.05, σ_n = 0.001, (e) ℓ = 1.65, σ_n = 0.14.

(c): MAP estimation
(d): quickly varying field with low noise (high-frequency artifacts)
(e): slowly varying field with high noise

SLIDE 32

Hyperparameter Adaptation

Hyperparameters:
σ_f^2: signal variance
σ_n^2: noise variance
ℓ: characteristic length scale

Figure: (a) Test, (b) Training, (c) ℓ = 0.50, σ_n = 0.01, (d) ℓ = 0.05, σ_n = 0.001, (e) ℓ = 1.65, σ_n = 0.14.

(c): MAP estimation
(d): quickly varying field with low noise (high-frequency artifacts)
(e): slowly varying field with high noise (too smooth)

SLIDE 33

Hyperparameter Adaptation

Log marginal likelihood:

\log p(y | X, \theta) = -\tfrac{1}{2} y^T K_y^{-1} y - \tfrac{1}{2} \log|K_y| - \tfrac{n}{2} \log 2\pi   (30)

Maximum a posteriori (gradient ascent):

\frac{\partial L}{\partial \theta_i} = \tfrac{1}{2} y^T K^{-1} \frac{\partial K}{\partial \theta_i} K^{-1} y - \tfrac{1}{2} \mathrm{tr}\left( K^{-1} \frac{\partial K}{\partial \theta_i} \right)   (31)

\theta denotes the parameter set.

SLIDE 34

Results

Figure: (a) Bicubic (MSSIM = 0.84), (b) GPP (MSSIM = 0.84), (c) Our result (MSSIM = 0.86), (d) Ground truth.

SLIDE 35

Results

Figure: (a) Input, (b) 3× direct magnification, (c) 10× our result, (d) 10× detail synthesis.

SLIDE 36

Results

Figure: (a) GPP, (b) Our result.

SLIDE 37

Results

Figure: (a) Bicubic, (b) Edge statistics, (c) Patch redundancy, (d) Ours.

SLIDE 38

Results

Figure: (a) Bicubic, (b) Edge statistics, (c) Patch redundancy, (d) Ours.

SLIDE 39

Results

Figure: (a) Bicubic, (b) Edge statistics.

SLIDE 40

Results

Figure: (a) Bicubic, (b) Edge statistics.

SLIDE 41

Results

Figure: (a) Bicubic, (b) Edge statistics.
