SLIDE 1

Principle of Communications, Fall 2017

Lecture 03 Optimal Detection under Noise

I-Hsiang Wang

ihwang@ntu.edu.tw
National Taiwan University
2017/9/28

SLIDE 2

[Block diagram: {u_m} → Pulse Shaper → x_b(t) (baseband waveform) → Up Converter → x(t) (passband waveform) → Noisy Channel → y(t) → Down Converter → y_b(t) → Filter + Sampler → {û_m} (discrete sequence)]

Previous lecture: noiseless channel, y(t) = x(t)

⟹ we can guarantee y_b(t) = x_b(t) for all t, and û_m = u_m for all m.

SLIDE 3

[Same block diagram, with the noisy channel now producing Y(t) = x(t) + Z(t), where Z(t) is additive noise]

This lecture: Y(t) = x(t) + Z(t).

Questions to be addressed:
  • How to model the noise? Answer: model Z(t) as a white Gaussian process.
  • What is the equivalent noise after down-conversion, filtering, and sampling? Answer: the discrete-time equivalent channel is V_m = u_m + Z_m.
  • What is the optimal decision rule that minimizes the error probability? Answer: detection, i.e., how to find the best û_m from V_m.

SLIDE 4

Outline

  • Random processes
  • Statistical model of noise
  • Hypothesis testing
  • Optimal detection rules under additive noise
  • Performance analysis


SLIDE 5

Part I. Random Processes

Definition, Gaussian Processes, Stationarity, Power Spectral Density, Filtering

SLIDE 6

Random variables

Probability space (Ω, F, P): Ω is the sample space, F the sigma field, and P the probability measure.

Random variable: X : Ω → ℝ, ω ↦ X(ω).

Distribution: the cumulative distribution function (CDF) F_X(x) ≜ P{X(ω) ≤ x}, x ∈ ℝ.

Roughly speaking, X maps (Ω, F, P) to (ℝ, B, F_X).

SLIDE 7

Random process

A collection of jointly distributed random variables: {X(ω; t) | t ∈ I}

• I uncountable (e.g. I = ℝ) → random waveform {X(t) : t ∈ ℝ}
• I countable (e.g. I = ℤ) → random sequence {X_m : m ∈ ℤ}

The distribution of the process is determined by the joint distribution of {X(ω; t) | t ∈ S} for all finite subsets S ⊆ I:

F_{X(t_1),...,X(t_n)}(x_1, ..., x_n) ≜ P{X(ω; t_1) ≤ x_1, ..., X(ω; t_n) ≤ x_n}

for every positive integer n and finite subset {t_1, t_2, ..., t_n} ⊆ I.

SLIDE 8

Random process

First moment (mean function): µ_X(t) ≜ E[X(t)]

Second moments:
• Auto-covariance function: K_X(s, t) ≜ Cov(X(s), X(t)) = E[(X(s) − µ_X(s))(X(t) − µ_X(t))]
• Auto-correlation function: R_X(s, t) ≜ E[X(s)X(t)]

Fourier transform of a sample path: X(ω; t) ↔ X̆(ω; f) = ∫_{−∞}^{∞} X(ω; t) exp(−j2πft) dt

SLIDE 9

Gaussian random variable

X ∼ N(µ, σ²): mean µ, variance σ².

Gaussian probability density function (PDF):

f_X(x) ≜ ∂/∂x F_X(x) = (1/√(2πσ²)) exp(−|x − µ|²/(2σ²))
SLIDE 10

Jointly Gaussian random variables

Definition (Jointly Gaussian): {Z_1, Z_2, ..., Z_n} are jointly Gaussian (J.G.) ⟺ there exist m ≤ n, W_1, ..., W_m i.i.d. ∼ N(0, 1), a constant matrix a ∈ ℝ^{n×m}, and a constant vector b ∈ ℝⁿ such that

Z ≜ [Z_1, ..., Z_n]ᵀ = aW + b, where W ≜ [W_1, ..., W_m]ᵀ.

Z is called a jointly Gaussian random vector.
SLIDE 11

Jointly Gaussian random variables

Z ∼ N(µ, k):

First moment: µ ≜ E[Z] = [E[Z_1], ..., E[Z_n]]ᵀ

Second moment (covariance matrix): k ≜ E[(Z − µ)(Z − µ)ᵀ], the n×n matrix

⎡ Var[Z_1]       Cov(Z_1, Z_2)  ⋯  Cov(Z_1, Z_n) ⎤
⎢ Cov(Z_2, Z_1)  Var[Z_2]       ⋯  Cov(Z_2, Z_n) ⎥
⎢      ⋮                        ⋱        ⋮       ⎥
⎣ Cov(Z_n, Z_1)                 ⋯  Var[Z_n]      ⎦

SLIDE 12

Jointly Gaussian random variables

Z ∼ N(µ, k)

PDF of a jointly Gaussian random vector:

f_Z(z) = (2π)^{−n/2} det(k)^{−1/2} exp(−½ (z − µ)ᵀ k^{−1} (z − µ))

Important fact: linear combinations of jointly Gaussian random variables are also jointly Gaussian.

SLIDE 13

Gaussian process

Definition (Gaussian Process): {Z(t) | t ∈ I} is a Gaussian process ⟺ for all n ∈ ℕ and {t_1, ..., t_n} ⊆ I, {Z(t_1), ..., Z(t_n)} are jointly Gaussian.

Theorem (Distribution of a Gaussian Process): the distribution of a Gaussian process {Z(t)} is completely determined by its first and second moments, µ_Z(t) and K_Z(s, t).

SLIDE 14

Orthonormal expansion of Gaussian processes

• For the Gaussian processes considered in this course, we assume the process can be expanded over an orthonormal basis with independent Gaussian coefficients:

Z(t) = Σ_{k=1}^∞ Z_k φ_k(t), Z_k ∼ N(0, σ_k²), mutually independent.

• This process is zero-mean, but this is WLOG when modeling the noise.
• The energy of the random waveform is also random: ‖Z(t)‖² = Σ_{k=1}^∞ Z_k².
• The auto-covariance of this Gaussian process is

K_Z(s, t) = Σ_{k=1}^∞ Σ_{m=1}^∞ Cov(Z_k, Z_m) φ_k(s) φ_m(t) = Σ_{k=1}^∞ σ_k² φ_k(s) φ_k(t).

• This model is quite general (see the numerical sketch below).
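A quick numerical sketch of this expansion (my own illustration in Python/NumPy; the sinusoidal basis, the variances σ_k² = 1/k², and all parameter values are assumptions, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 200)        # time grid on [0, 1]
    K = 8                             # truncate the expansion at K terms
    # Orthonormal basis on [0, 1]: phi_k(t) = sqrt(2) sin(k*pi*t)
    phi = np.array([np.sqrt(2) * np.sin(k * np.pi * t) for k in range(1, K + 1)])
    sigma2 = 1.0 / np.arange(1, K + 1) ** 2    # assumed variances sigma_k^2

    # Sample paths Z(t) = sum_k Z_k phi_k(t), Z_k ~ N(0, sigma_k^2) independent
    n_paths = 20000
    Zk = rng.normal(0.0, np.sqrt(sigma2)[:, None], size=(K, n_paths))
    paths = phi.T @ Zk                # each column is one realization of Z(t)

    # Empirical auto-covariance vs. sum_k sigma_k^2 phi_k(s) phi_k(t)
    K_emp = (paths @ paths.T) / n_paths
    K_theory = (phi.T * sigma2) @ phi
    print(np.max(np.abs(K_emp - K_theory)))    # small; shrinks as n_paths grows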

SLIDE 15

Stationary processes

• Statistical properties of the noise tend to behave in a time-invariant manner.
• This motivates us to define stationary processes:

Definition (Stationary): {X(t)} is stationary ⟺ {X(t)} and {X(t + τ)} are identically distributed for any time shift τ. (Hard to check directly.)

Definition (Wide-sense stationary): {X(t)} is wide-sense stationary (WSS) ⟺ the first and second moments are time-shift invariant, that is,
• the mean function is fixed: µ_X(t) = µ_X for all t;
• the auto-covariance is only a function of the time difference: K_X(t + τ, t) = K_X(τ) for all t and τ.

• A Gaussian process is stationary iff it is WSS.

SLIDE 16

Power spectral density

• In the following, we define the power spectral density (PSD) of a zero-mean WSS random process as the Fourier transform of the auto-covariance (auto-correlation) function:

R_X(τ) ↔ S_X(f).

• Several useful properties for a WSS real-valued random process {X(t)}:
  • Auto-correlation and auto-covariance are both even functions: R_X(τ) = R_X(−τ), K_X(τ) = K_X(−τ)
  • Its PSD is real and even: S_X(f) = S_X*(f) = S_X(−f)
  • The PSD is non-negative, and E[|X(t)|²] = ∫_{−∞}^{∞} S_X(f) df.

SLIDE 17

Energy spectral density of a deterministic waveform

For a deterministic signal x(t), its energy spectral density is the energy per unit frequency (hertz) at each frequency.

Operational definition: pass x(t) through an ideal narrow-band filter of bandwidth ∆f around f_0, let y(t; f_0, ∆f) denote the output, and take

E_x(f_0) ≜ lim_{∆f→0} [energy of y(t; f_0, ∆f)] / ∆f = |x̆(f_0)|²,

the squared magnitude of the Fourier transform of x(t).

SLIDE 18

Auto-correlation of a deterministic waveform

For a deterministic signal x(t), its auto-correlation is the inverse Fourier transform of the energy spectral density:

R_x(t) ≜ ∫_{−∞}^{∞} x(τ) x*(τ − t) dτ ↔ E_x(f)

• The auto-correlation is effectively the convolution of x(t) with x*(−t).
• In PAM demodulation, the receive filter is q(t) = p*(−t); the filtering block is called a "correlator" in some of the literature.

SLIDE 19

Power spectral density of a deterministic waveform

For a deterministic signal x(t), its power spectral density is the power per unit frequency (hertz) at each frequency. (Power = Energy / Time.)

Operational definition: truncate x(t) by multiplication with the indicator of a finite time interval,

x_{t_0}(t) ≜ x(t) · 1{t ∈ [−t_0/2, t_0/2]},

and normalize the energy spectral density by the interval length:

S_x(f) ≜ lim_{t_0→∞} (1/t_0) E_{x_{t_0}}(f).

SLIDE 20

PSD of a random process vs. a deterministic waveform

The inverse Fourier transform of the power spectral density is the long-term time average of the auto-correlation:

R̄_x(t) ≜ lim_{t_0→∞} (1/t_0) ∫_{−t_0/2}^{t_0/2} x(τ) x*(τ − t) dτ ↔ S_x(f)

For an important class of random processes, called ergodic processes, the long-term time average equals the statistical average:

R̄_X(t) = E[X(τ) X*(τ − t)] = R_X(t) ⟹ S̄_X(f) = F{R_X(t)} = S_X(f)

This (ergodicity) is the reason why S_X(f) is called the power spectral density!

SLIDE 21

Filtering of random processes

• We are primarily interested in the following properties of a random process after passing through an LTI filter:
  • First and second moments
  • Stationarity
  • Gaussianity
• Stationarity/wide-sense stationarity is preserved under LTI filtering.
• Gaussianity is preserved under LTI filtering.
• For a WSS process, the PSD of the filtered process is the PSD of the original process times the ESD of the LTI filter.

SLIDE 22

First and second moments

X(t) → [ h(t) ] → Y(t) = (X ∗ h)(t)

First moment: µ_Y(t) = (µ_X ∗ h)(t)

Second moment: K_Y(s, t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s − ξ) K_X(ξ, τ) h(t − τ) dξ dτ

SLIDE 23

Stationarity

X(t) → [ h(t) ] → Y(t) = (X ∗ h)(t): WSS input ⟹ WSS output; stationary input ⟹ stationary output.

First moment: µ_Y = µ_X h̆(0)

Second moment: K_Y(τ) = (h ∗ K_X ∗ h_rv)(τ), where h_rv(t) ≜ h(−t)

Power spectral density: S_Y(f) = |h̆(f)|² S_X(f)

SLIDE 24

Two branches of filtered processes

[Diagram: X_1(t) → h_1(t) → Y_1(t) = (X_1 ∗ h_1)(t) and X_2(t) → h_2(t) → Y_2(t) = (X_2 ∗ h_2)(t); jointly WSS inputs yield jointly WSS outputs]

Definition (jointly WSS): {X_1(t)} and {X_2(t)} are jointly WSS if they are both WSS and the cross-covariance K_{X_1,X_2}(t + τ, t) ≜ Cov(X_1(t + τ), X_2(t)) depends on τ only.

SLIDE 25

Two branches of filtered processes

[Same two-branch diagram, with jointly WSS inputs and jointly WSS outputs]

Cross-covariance: K_{Y_1,Y_2}(τ) = (h_1 ∗ K_{X_1,X_2} ∗ h_{2,rv})(τ)

Cross PSD: S_{Y_1,Y_2}(f) ≜ F{K_{Y_1,Y_2}(τ)} = h̆_1(f) S_{X_1,X_2}(f) h̆_2*(f)

SLIDE 26

Projecting random processes

[Diagram: X_1(t) is projected onto g_1(t), and X_2(t) onto g_2(t)]

W_1 = ⟨X_1(t), g_1(t)⟩ ≜ ∫_{−∞}^{∞} X_1(t) g_1*(t) dt,  W_2 = ⟨X_2(t), g_2(t)⟩ ≜ ∫_{−∞}^{∞} X_2(t) g_2*(t) dt

First moment: E[W_i] = ⟨µ_{X_i}(t), g_i(t)⟩

Second moment: Cov(W_1, W_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g_2(t) K_{X_1,X_2}(s, t) g_1*(s) ds dt

SLIDE 27

Projecting random processes

[Same projections, now with {X_1(t)} and {X_2(t)} jointly WSS]

First moment: E[W_i] = µ_X ğ_i*(0)

Second moment: Cov(W_1, W_2) = ⟨(g_2 ∗ K_{X_1,X_2})(t), g_1(t)⟩
SLIDE 28

Gaussianity

Gaussianity is preserved:

• Filtering: Z(t) Gaussian → [ h(t) ] → U(t) = (Z ∗ h)(t) Gaussian.
• Projections: for a Gaussian process {Z(t)}, the projections V_1 = ⟨Z(t), g_1(t)⟩ ≜ ∫_{−∞}^{∞} Z(t) g_1*(t) dt, ..., V_m = ⟨Z(t), g_m(t)⟩ are jointly Gaussian.
SLIDE 29

Part II. Statistical Model of Noise

Additive White Gaussian Noise (AWGN), Equivalent Complex Baseband Noise

SLIDE 30

Additive noise

Channel output ≠ channel input: x(t) → [ Noisy Channel ] → y(t).

Simplest noise model: Y(t) = x(t) + Z(t), where Z(t) is some random process. (P.S. scaling of x(t) is absorbed into the random process.)

Physical properties of the additive noise:
• Noise = aggregation of many additive "sub-noises" (thermal noise, device imperfection, etc.)
• Sub-noises are zero-mean WLOG and statistically independent.
• All sub-noises contribute about the same energy to the total noise.
• Sub-noises are rather stationary in the duration of interest.

SLIDE 31

Gaussian process modeling the random noise

• Recall the Central Limit Theorem: the sum of zero-mean i.i.d. random variables converges to a Gaussian in distribution,

(1/√n) Σ_{i=1}^n X_i → N(0, σ²) in distribution as n → ∞

(see the simulation sketch after this list).
• Other reasons for using Gaussian as the model of the additive noise:
  • Under a variance constraint, Gaussian noise is the "worst-case" noise, in the sense that it is the most "random" and yields the smallest channel capacity.
  • Gaussian is easy to manipulate analytically, which gives great insight into more complex models.
• Hence, we model the physical noise as a zero-mean stationary Gaussian process.
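A small Monte Carlo illustration of this convergence (illustrative only; the uniform sub-noise distribution and the sample sizes are my assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 1000, 10000
    # Zero-mean i.i.d. "sub-noises": uniform on [-1, 1], variance sigma^2 = 1/3
    X = rng.uniform(-1.0, 1.0, size=(trials, n))
    S = X.sum(axis=1) / np.sqrt(n)    # (1/sqrt(n)) * sum_i X_i

    print(S.mean(), S.var())          # ~0 and ~1/3, matching N(0, sigma^2)
    # A Gaussian has kurtosis 3; the normalized sum should be close to it
    print(np.mean((S - S.mean()) ** 4) / S.var() ** 2)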

SLIDE 32

White Gaussian noise

• For the time interval of interest, the noise waveform is rather stationary.
  ⟹ Assume the noise random process is stationary (from t = −∞ to ∞).
• For the frequency band of interest, the noise waveform has roughly constant energy spectral density.
  ⟹ Assume the PSD of the noise random process is constant (from f = −∞ to ∞).
• White Gaussian process: a zero-mean stationary Gaussian process {W(t)} with

K_W(τ) = (N_0/2) δ(τ), S_W(f) = N_0/2

• WGN combined with linear filters can generate any stationary Gaussian process.

SLIDE 33

WGN passed through band-limited filters

• The white Gaussian noise process, combined with LTI filters, can be used to generate (almost) all kinds of stationary zero-mean Gaussian processes.
• For a stationary zero-mean Gaussian process {Z(t)} with PSD S_Z(f):
  • Choose an LTI filter with frequency response h̆(f) = √(2 S_Z(f) / N_0).
  • Recall: S_Z(f) ≥ 0, real-valued, and even. Hence h̆(f) is also real-valued and even!
  • Passing WGN {W(t)} (with PSD N_0/2) through the LTI filter h(t) results in {Z(t)}:

Z(t) = (W ∗ h)(t), S_Z(f) = (N_0/2) |h̆(f)|²

• For most communication systems, there are band-limited filters at the receiver. Hence, the PSD of the filtered noise process is also band-limited. (A simulation sketch follows.)
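A simulation sketch of this construction (assumed parameters throughout; a finite sampling rate stands in for continuous time, and SciPy's firwin/welch supply the filter and the PSD estimate):

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(2)
    fs, N0 = 1e4, 2.0                 # sampling rate (Hz) and noise level, assumed
    n = 2 ** 18
    # Discrete-time surrogate of WGN with PSD N0/2: i.i.d. N(0, (N0/2)*fs)
    w = rng.normal(0.0, np.sqrt(N0 / 2 * fs), size=n)

    # Band-limiting LTI filter (low-pass, cutoff 1 kHz) playing the role of h(t)
    h = signal.firwin(129, 1000, fs=fs)
    z = signal.lfilter(h, 1.0, w)     # Z = W * h

    f, Sz = signal.welch(z, fs=fs, nperseg=4096)   # one-sided PSD estimate
    _, H = signal.freqz(h, worN=f, fs=fs)
    # The one-sided PSD folds negative frequencies: expect 2 * (N0/2) * |h(f)|^2
    band = (f > 0) & (f < 800)
    print(np.mean(Sz[band] / (N0 * np.abs(H[band]) ** 2)))   # ≈ 1 in the passband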
SLIDE 34

WGN projected onto an orthonormal set

[Diagram: WGN {W(t)} with PSD N_0/2 is projected onto an orthonormal set φ_1(t), ..., φ_m(t): W_i = ⟨W(t), φ_i(t)⟩]

Cov(W_i, W_j) = ⟨(φ_j ∗ K_W)(t), φ_i(t)⟩ = ⟨(N_0/2) φ_j(t), φ_i(t)⟩ = (N_0/2) 1{i = j}

⟹ the projections are i.i.d. Gaussians! (Checked numerically below.)
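A discrete-time sanity check of this fact (mine, with an assumed sinusoidal orthonormal set; when the projection integral is discretized with step dt, samples W[n] ∼ N(0, (N_0/2)/dt) stand in for WGN):

    import numpy as np

    rng = np.random.default_rng(3)
    N0, dt = 2.0, 1e-2
    t = np.arange(0.0, 1.0, dt)
    # Orthonormal set on [0, 1): phi_i(t) = sqrt(2) sin(i*pi*t)
    phi = np.array([np.sqrt(2) * np.sin(i * np.pi * t) for i in range(1, 5)])

    trials = 50000
    W = rng.normal(0.0, np.sqrt(N0 / 2 / dt), size=(trials, t.size))
    proj = W @ phi.T * dt             # W_i = <W(t), phi_i(t)>, one row per trial

    print(np.cov(proj.T))             # ≈ (N0/2) * I: i.i.d. N(0, N0/2)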

SLIDE 35

Equivalent noise in QAM demodulation

[Diagram, QAM demodulator: the received x(t) + Z(t) is split into two branches. I-branch: multiply by √2 cos(2πf_c t), filter with q(t), sample at T = 1/(2W) → {u_m^(I) + Z_m^(I)}. Q-branch: multiply by −√2 sin(2πf_c t), filter with q(t), sample at T = 1/(2W) → {u_m^(Q) + Z_m^(Q)}.]

SLIDE 36

[Same demodulator diagram, driven by the noise alone: Z(t) → {Z_m^(I)}, {Z_m^(Q)}]

Recall: QAM demodulation is equivalent to projection onto an orthonormal set.

SLIDE 37

[Noise through the demodulator: Z(t) → {Z_m^(I)}, {Z_m^(Q)}]

Recall: QAM demodulation is equivalent to projection onto an orthonormal set:
• I-branch: project onto ψ_m^(I)(t) = p(t − mT) √2 cos(2πf_c t), m ∈ ℤ
• Q-branch: project onto ψ_m^(Q)(t) = −p(t − mT) √2 sin(2πf_c t), m ∈ ℤ

Hence Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2), ∀ m ∈ ℤ.

SLIDE 38

Circularly symmetric complex Gaussian

• In the equivalent complex baseband model, recall that we use the in-phase component as the real part and the quadrature component as the imaginary part: u_m ≜ u_m^(I) + j u_m^(Q).
• Following the same spirit, we can define the equivalent complex baseband demodulated symbols as V_m ≜ u_m + Z_m, where Z_m ≜ Z_m^(I) + j Z_m^(Q).
• The equivalent complex baseband noise is complex, with i.i.d. Gaussian real and imaginary parts: Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2).
• This is called circularly symmetric Gaussian: Z_m ∼ CN(0, N_0). (A sampling sketch follows below.)
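A short sketch (assumed N_0) of drawing Z ∼ CN(0, N_0) and checking circular symmetry, i.e. that a rotation e^{jα}Z leaves the second-order statistics unchanged:

    import numpy as np

    rng = np.random.default_rng(4)
    N0, n = 2.0, 200000
    # Z = Z_I + j*Z_Q with Z_I, Z_Q i.i.d. N(0, N0/2)  =>  Z ~ CN(0, N0)
    Z = rng.normal(0, np.sqrt(N0 / 2), n) + 1j * rng.normal(0, np.sqrt(N0 / 2), n)

    print(np.mean(np.abs(Z) ** 2))    # E|Z|^2 = N0
    print(np.mean(Z ** 2))            # pseudo-variance E[Z^2] ≈ 0 (circularity)
    Zr = np.exp(1j * 0.7) * Z         # rotate by an arbitrary angle
    print(np.mean(np.abs(Zr) ** 2), np.mean(Zr ** 2))   # unchanged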

SLIDE 39

Summary: the additive noise model

[Block diagram as before; the whole chain from {u_m} to {V_m} collapses into the model below]

Equivalent complex baseband additive noise channel model:

V_m = u_m + Z_m, Z_m i.i.d. ∼ CN(0, N_0), ∀ m.

(A short simulation follows below.)
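To make the model concrete, a hypothetical QPSK transmission through this channel (the constellation, N_0, and symbol count are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(5)
    N0, m = 0.5, 100000
    # Unit-energy QPSK constellation, chosen for illustration
    A = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
    u = A[rng.integers(0, 4, m)]      # transmitted symbols u_m

    Z = (rng.normal(0, np.sqrt(N0 / 2), m)
         + 1j * rng.normal(0, np.sqrt(N0 / 2), m))   # Z_m ~ CN(0, N0)
    V = u + Z                         # V_m = u_m + Z_m

    print(np.var(V - u))              # ≈ N0: empirical noise power
    print(np.mean(np.abs(u) ** 2) / N0)   # empirical SNR = E_s / N0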

SLIDE 40

Part III. Optimal Detection Rules

Basic Hypothesis Testing, Performance Metrics, MAP Detection, Maximum Likelihood Detection, Minimum Distance Rule

SLIDE 41

Detection: problem statement

u = a_θ ∈ A ≜ {a_1, ..., a_M}: the symbol mapper maps the bits to a symbol in the constellation set.

Equivalent complex baseband additive noise channel model: V = u + Z. Detection outputs θ̂ = φ(V), i.e. û = a_θ̂.

The above is a special case of hypothesis testing problems:

[Diagram: a statistical experiment with hidden index θ ∈ Θ generates the observation X ∼ P_θ; decision making outputs Θ̂ = φ(X).]

SLIDE 42

Performance metric: probability of error

• For any given index θ ∈ Θ, we use the probability of error to measure the performance of the decision-making algorithm φ(·):

P_e(φ; θ) ≜ P_θ{φ(X) ≠ θ}

• It is quite obvious that a single decision-making algorithm cannot do simultaneously well for all θ ∈ Θ.
• Hence, there are different kinds of formulations.

SLIDE 43

Bayesian formulation: suppose the index is random and distributed as Θ ∼ π.
Goal: find a decision-making algorithm φ(·) such that the average probability of error is minimized:

P_e(φ) ≜ E_{Θ∼π}[P_e(φ; Θ)], φ_Bayes ≜ argmin_φ P_e(φ)

Minimax formulation: no assumption on the prior distribution of the index.
Goal: find a decision-making algorithm φ(·) such that the worst-case probability of error is minimized:

φ_Minimax ≜ argmin_φ max_θ P_e(φ; θ)

SLIDE 44

Bayesian-optimal testing algorithm

φ_Bayes ≜ argmin_φ P_e(φ), where

P_e(φ) ≜ E_{Θ∼π}[P_e(φ; Θ)] = Σ_{θ∈Θ} π(θ) P_e(φ; θ) = Σ_{θ∈Θ} Σ_{x: φ(x)≠θ} π(θ) P_θ(x) = 1 − Σ_{θ∈Θ} Σ_{x: φ(x)=θ} π(θ) P_θ(x)

SLIDE 45

P_e(φ) = 1 − Σ_{θ∈Θ} Σ_{x: φ(x)=θ} π(θ) P_θ(x) = 1 − Σ_{θ,x} π(θ) P_θ(x) 1{φ(x) = θ}

To maximize the subtracted term, for each observed x we should pick

φ(x) = argmax_{θ∈Θ} π(θ) P_θ(x)

SLIDE 46

MAP rule

Theorem: φ_Bayes ≜ argmin_φ P_e(φ) is given by

φ_Bayes(x) = argmax_{θ∈Θ} {π(θ) P_θ(x)}

(a mapping from x to θ).

Thinking of (Θ, X) as jointly distributed random variables: (Θ, X) ∼ P_{Θ,X} with P_{Θ,X}(θ, x) ≡ π(θ) P_θ(x).

SLIDE 47

Theorem: φ_Bayes(x) = argmax_{θ∈Θ} {P_{Θ,X}(θ, x)}

Notice that for any observed x: P_{Θ,X}(θ, x) ∝ P_{Θ|X}(θ|x).

SLIDE 48

Theorem (maximum a posteriori (MAP) decision rule):

φ_Bayes(x) = argmax_{θ∈Θ} {P_{Θ|X}(θ|x)}

• Posterior probability P_{Θ|X}(θ|x): the probability of the hidden object θ after observing x.
• Prior probability π(θ): the probability of the hidden object θ without observing anything.

(A small numerical example follows below.)
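A sketch of the MAP rule in action (a hypothetical binary test between two Gaussian means with an assumed non-uniform prior; SciPy's norm.logpdf supplies the likelihoods):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    prior = np.array([0.8, 0.2])      # pi(theta), assumed non-uniform
    means, sigma = np.array([-1.0, 1.0]), 1.0   # P_theta = N(means[theta], sigma^2)

    def phi_map(x):
        # phi_Bayes(x) = argmax_theta pi(theta) * P_theta(x); logs for stability
        scores = np.log(prior) + norm.logpdf(x[:, None], means, sigma)
        return scores.argmax(axis=1)

    # Monte Carlo estimate of the average error probability
    theta = rng.choice(2, size=200000, p=prior)
    x = rng.normal(means[theta], sigma)
    print(np.mean(phi_map(x) != theta))            # MAP: the minimal average error
    print(np.mean((x > 0).astype(int) != theta))   # midpoint (ML) test: larger here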

SLIDE 49

Maximum Likelihood (ML)

• Often the prior distribution is uniform, that is, there is no preference on the hidden index.
• In this special case, the MAP rule is equivalent to the maximum likelihood (ML) rule:

φ_Bayes(x) = argmax_{θ∈Θ} {π(θ) P_θ(x)} = argmax_{θ∈Θ} {P_θ(x)}

φ_MAP(x) ≡ φ_ML(x) ≜ argmax_{θ∈Θ} {P_{X|Θ}(x|θ)}

where P_θ(x) ≡ P_{X|Θ}(x|θ) is the likelihood function.
SLIDE 50

Optimal detection in i.i.d. additive Gaussian noise

Detection in QAM demodulation: V = u + Z, where u = a_θ ∈ A ≜ {a_1, ..., a_M} and Z ∼ CN(0, N_0), i.e. Z ≜ Z^(I) + jZ^(Q) with Z^(I), Z^(Q) i.i.d. ∼ N(0, N_0/2). We can view the problem as detection in two-dimensional Euclidean space!

A more general setting, vector detection under i.i.d. Gaussian noise: V = u + Z, where u = a_θ ∈ A ≜ {a_1, ..., a_M} ⊆ ℝⁿ and Z ∼ N(0, σ²Iₙ). In words, the n coordinates of u experience i.i.d. Gaussian noise!

SLIDE 51

Derivation of the ML detection rule

Likelihood (for u = a_θ ∈ A ⊆ ℝⁿ, Z ∼ N(0, σ²Iₙ)):

f_θ(v) = ∏_{i=1}^n (1/√(2πσ²)) exp(−(v_i − a_{θ,i})²/(2σ²))
       = (2πσ²)^{−n/2} exp(−Σ_{i=1}^n (v_i − a_{θ,i})² / (2σ²))
       = (2πσ²)^{−n/2} exp(−‖v − a_θ‖² / (2σ²))

Maximum likelihood is equivalent to minimum Euclidean distance:

φ_ML(v) = argmax_{θ∈{1,...,M}} f_θ(v) = argmin_{θ∈{1,...,M}} ‖v − a_θ‖.

(Implemented in the sketch below.)
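A minimal implementation of this minimum-distance rule (the 16-QAM constellation and noise level are assumptions for illustration; complex symbols are treated as points in ℝ²):

    import numpy as np

    rng = np.random.default_rng(7)
    # Illustrative 16-QAM constellation A = {a_1, ..., a_16}
    pam = np.array([-3.0, -1.0, 1.0, 3.0])
    A = (pam[:, None] + 1j * pam[None, :]).ravel()

    def phi_ml(v):
        # phi_ML(v) = argmin_theta ||v - a_theta||, per received sample
        return np.abs(v[:, None] - A[None, :]).argmin(axis=1)

    sigma, m = 0.5, 200000
    theta = rng.integers(0, A.size, m)             # uniformly drawn symbol indices
    v = A[theta] + rng.normal(0, sigma, m) + 1j * rng.normal(0, sigma, m)
    print(np.mean(phi_ml(v) != theta))             # empirical symbol error rate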

SLIDE 52

Decision region

Definition: for a deterministic test φ : X → Θ, the decision region for hypothesis H_θ is

D_θ(φ) ≜ {x ∈ X | φ(x) = θ}.

[2-PAM: constellation {−d, d} with decision regions D_0 and D_1 split at the midpoint. 4-PAM: constellation {−3d, −d, d, 3d} with 2-bit labels (01, 11, 00, 10) and decision regions D_00, D_01, D_10, D_11.]

SLIDE 53

[Constellation figures: a 16-QAM constellation with 4-bit labels and an 8-PSK constellation with 3-bit labels (000, 001, 011, 111, 101, 100, 110, 010), shown with their decision regions.]

SLIDE 54

Binary detection

• The optimal detection rule for M-ary detection in n-dimensional space under i.i.d. additive Gaussian noise is the minimum distance rule:

φ_MD(v) = argmin_{θ∈Θ} ‖v − a_θ‖.

• This result is quite general for any M and n, as long as the noise is i.i.d. Gaussian across all n coordinates.
• However, this requires computational resources that scale with the dimension n.
• For binary detection problems (M = 2), it turns out that making the decision based on a scalar is sufficient to achieve optimal performance!

SLIDE 55

[Figure: two signal points a_1, a_2 ∈ ℝⁿ; the observation v is projected onto the line through them, and the decision boundary is the perpendicular bisector. Here φ_MD(v) = 2.]

Define the midpoint ā ≜ (a_1 + a_2)/2 and the direction a_{2→1} ≜ (a_1 − a_2)/2. The scalar projection

ṽ ≜ ⟨v − ā, a_{2→1}/‖a_{2→1}‖⟩

is a sufficient statistic for this binary detection problem.
SLIDE 56

[Figure: after projection, the problem lives on the real line ℝ: the signal points map to ã_1 and ã_2, and the observation maps to ṽ.]

ã_1 = ½‖a_1 − a_2‖, ã_2 = −½‖a_1 − a_2‖

Equivalent to binary PAM! (Verified numerically in the sketch below.)
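A sketch verifying the reduction (the signal points and dimensions are arbitrary assumptions): the scalar ṽ reproduces the full n-dimensional minimum-distance decision exactly.

    import numpy as np

    rng = np.random.default_rng(8)
    n, sigma, m = 8, 1.0, 100000
    a1, a2 = rng.normal(size=n), rng.normal(size=n)   # two signal points in R^n
    a_bar, d = (a1 + a2) / 2, (a1 - a2) / 2           # midpoint and a_{2->1}

    theta = rng.integers(0, 2, m)                     # 0 -> send a1, 1 -> send a2
    u = np.where(theta[:, None] == 0, a1, a2)
    v = u + sigma * rng.normal(size=(m, n))

    # Scalar statistic: v_tilde = <v - a_bar, d/||d||>; decide a2 when negative
    v_tilde = (v - a_bar) @ (d / np.linalg.norm(d))
    scalar_dec = (v_tilde < 0).astype(int)

    # Full minimum-distance rule in R^n, for comparison
    full_dec = (np.linalg.norm(v - a1, axis=1)
                > np.linalg.norm(v - a2, axis=1)).astype(int)
    print(np.mean(scalar_dec != full_dec))            # 0.0: decisions coincide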

SLIDE 57

Sufficient statistics

[Two setups compared. Direct: the statistical experiment with hidden θ ∈ Θ generates X ∼ P_θ, and the decision is Θ̂ = φ(X). Reduced: a dimension-reduction step first computes T = f(X), and the decision is Θ̂ = φ̃(T).]

When does the reduced decision Θ̂ = φ̃(T) achieve the same optimal performance as a decision based on X? ⟹ when T is a sufficient statistic!

SLIDE 58

[Same two-stage diagram: X ∼ P_θ → dimension reduction T = f(X) → decision Θ̂ = φ̃(T)]

Definition: T = f(X) is a sufficient statistic if the conditional distribution of X given T does not depend on the underlying parameter θ.
SLIDE 59

[Same two-stage diagram: X ∼ P_θ → T = f(X) → Θ̂ = φ̃(T)]

Theorem (factorization): T = f(X) is a sufficient statistic if and only if the distribution of X can be factorized as P_θ(x) = h(x) g(t; θ).