SLIDE 1

Principle of Communications, Fall 2017

Lecture 03: Optimal Detection under Noise

I-Hsiang Wang
ihwang@ntu.edu.tw, National Taiwan University
2017/9/28

SLIDE 2

[Block diagram: discrete sequence {u_m} → Pulse Shaper → baseband waveform x_b(t) → Up Converter → passband waveform x(t) → Noisy Channel → y(t) → Down Converter → y_b(t) → Filter + Sampler → discrete sequence {û_m}]

Previous lecture: y(t) = x(t)

⟹ We can guarantee y_b(t) = x_b(t) for all t, and û_m = u_m for all m.

SLIDE 3

[Same block diagram, now with an additive-noise channel]

This lecture: Y(t) = x(t) + Z(t), where Z(t) is additive noise.

Questions to be addressed, and the answers developed in this lecture:
  • How to model the noise? → Model Z(t) as a white Gaussian process.
  • What is the equivalent noise after down-conversion, filtering, and sampling? → The discrete-time equivalent channel V_m = u_m + Z_m.
  • How to find the best û_m from V_m? → The optimal decision rule that minimizes error probability (detection).

SLIDE 4

Outline

  • Random processes
  • Statistical model of noise
  • Hypothesis testing
  • Optimal detection rules under additive noise
  • Performance analysis

SLIDE 5

Part I. Random Processes

Definition, Gaussian Processes, Stationarity, Power Spectral Density, Filtering

SLIDE 6

Random variables

Probability space (Ω, F, P): sample space Ω, sigma field F, probability measure P.

Random variable: X : Ω → ℝ, ω ↦ X(ω).

Distribution (cumulative distribution function, CDF): F_X(x) ≜ P{X(ω) ≤ x}, x ∈ ℝ.

Roughly speaking, X maps (Ω, F, P) to (ℝ, B, F_X).

SLIDE 7

Random process

A random process is a collection of jointly distributed random variables: {X(ω; t) | t ∈ I}.

  • I uncountable (e.g., I = ℝ) → random waveform {X(t) : t ∈ ℝ}
  • I countable (e.g., I = ℤ) → random sequence {X_m : m ∈ ℤ}

Its distribution is determined by the joint distribution of {X(ω; t) | t ∈ S} for all finite subsets S ⊆ I:

F_{X(t_1),...,X(t_n)}(x_1, ..., x_n) ≜ P{X(ω; t_1) ≤ x_1, ..., X(ω; t_n) ≤ x_n}

for all positive integers n and all finite subsets {t_1, t_2, ..., t_n} ⊆ I.

SLIDE 8

Random process: moments

First moment: μ_X(t) ≜ E[X(t)]

Fourier transform of a sample path: X(ω; t) ←F→ X̆(ω; f) = ∫_{−∞}^{∞} X(ω; t) exp(−j2πft) dt

Second moments:
  • Auto-covariance function: K_X(s, t) ≜ Cov(X(s), X(t)) = E[(X(s) − μ_X(s))(X(t) − μ_X(t))]
  • Auto-correlation function: R_X(s, t) ≜ E[X(s)X(t)]

SLIDE 9

Gaussian random variable

X ∼ N(μ, σ²), with mean μ and variance σ².

Gaussian probability density function (PDF):

f_X(x) ≜ (∂/∂x) F_X(x) = (1/√(2πσ²)) exp(−|x − μ|²/(2σ²))

SLIDE 10

Jointly Gaussian random variables

Definition (Jointly Gaussian): {Z_1, Z_2, ..., Z_n} are jointly Gaussian (J.G.) ⟺ there exist m ≤ n, i.i.d. W_1, ..., W_m ∼ N(0, 1), a constant matrix a ∈ ℝ^{n×m}, and a constant vector b ∈ ℝ^n such that

Z ≜ [Z_1, ..., Z_n]^T = aW + b, where W ≜ [W_1, ..., W_m]^T.

Z is called a jointly Gaussian random vector.

SLIDE 11

Jointly Gaussian random variables: moments

Z ∼ N(μ, K)

First moment: μ ≜ E[Z] = [E[Z_1], ..., E[Z_n]]^T

Second moment (covariance matrix): K ≜ E[(Z − μ)(Z − μ)^T], with entries K_ij = Cov(Z_i, Z_j), so the diagonal carries Var[Z_1], ..., Var[Z_n].

SLIDE 12

Jointly Gaussian random variables: PDF

For Z ∼ N(μ, K) with invertible K, the PDF of the jointly Gaussian random vector is

f_Z(z) = (2π)^{−n/2} det(K)^{−1/2} exp(−(1/2)(z − μ)^T K^{−1} (z − μ))

Important fact: linear combinations of jointly Gaussian random variables are also jointly Gaussian. A numerical check follows.
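
A quick numerical check of this fact (a sketch of mine; the matrix a, the vector b, and the sample size are arbitrary choices): draw Z = aW + b from i.i.d. standard normals and compare the empirical moments with μ = b and K = aaᵀ.

```python
# Sample a jointly Gaussian vector Z = a W + b and verify its moments.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                          # number of independent draws
a = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [1.0, -1.0]])          # constant matrix a in R^{3x2}
b = np.array([1.0, -2.0, 0.0])       # constant vector b in R^3

W = rng.standard_normal((2, N))      # i.i.d. W_k ~ N(0, 1)
Z = a @ W + b[:, None]               # each column is one draw of Z

print("empirical mean  :", Z.mean(axis=1))   # ~ b
print("empirical cov   :\n", np.cov(Z))      # ~ a a^T
print("theoretical cov :\n", a @ a.T)
```
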

SLIDE 13

Gaussian process

Definition (Gaussian Process): {Z(t) | t ∈ I} is a Gaussian process ⟺ for all n ∈ ℕ and {t_1, ..., t_n} ⊆ I, the variables {Z(t_1), ..., Z(t_n)} are jointly Gaussian.

Theorem (Distribution of a Gaussian Process): The distribution of a Gaussian process {Z(t)} is completely determined by its first and second moments μ_Z(t) and K_Z(s, t).

SLIDE 14

Orthonormal expansion of Gaussian processes

For the Gaussian processes considered in this course, we assume the process can be expanded over an orthonormal basis with independent Gaussian coefficients:

Z(t) = Σ_{k=1}^{∞} Z_k φ_k(t), Z_k ∼ N(0, σ_k²), mutually independent.

  • This process is zero-mean, but this is WLOG when modeling the noise.
  • The energy of the random waveform is also random: ∥Z(t)∥² = Σ_{k=1}^{∞} Z_k².
  • The auto-covariance of this Gaussian process is

    K_Z(s, t) = Cov(Σ_k Z_k φ_k(s), Σ_m Z_m φ_m(t)) = Σ_{k=1}^{∞} σ_k² φ_k(s) φ_k(t).

This model is quite general. A small simulation below illustrates it.
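
A small simulation of the expansion (my construction: the truncated sine basis on [0, 1] and σ_k = 1/k are illustrative choices, not from the lecture), checking the empirical covariance against the series formula above.

```python
# Simulate Z(t) = sum_k Z_k phi_k(t) and check K_Z(s, t) against the series.
import numpy as np

rng = np.random.default_rng(1)
K, trials = 50, 20_000
t = np.linspace(0, 1, 201)
phi = np.sqrt(2) * np.sin(np.pi * np.outer(np.arange(1, K + 1), t))  # orthonormal on [0, 1]
sigma = 1.0 / np.arange(1, K + 1)                                     # sigma_k = 1/k

Zk = sigma[:, None] * rng.standard_normal((K, trials))                # Z_k ~ N(0, sigma_k^2)
Z = phi.T @ Zk                                                        # columns: sample paths

s_idx, t_idx = 50, 120
emp = np.cov(Z[s_idx], Z[t_idx])[0, 1]
theory = np.sum(sigma**2 * phi[:, s_idx] * phi[:, t_idx])
print(f"empirical K_Z(s,t) = {emp:.4f}, series formula = {theory:.4f}")
```
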

SLIDE 15

Stationary processes

  • Statistical properties of the noise tend to behave in a time-invariant manner.
  • This motivates us to define stationary processes: {X(t)} is stationary ⟺ {X(t)} and {X(t − τ)} are identically distributed for any time shift τ. (Hard to check directly.)

Definition (Wide-sense stationary): {X(t)} is wide-sense stationary (WSS) if the first and second moments are time-shift invariant, that is,
  • the mean function is fixed: μ_X(t) = μ_X for all t;
  • the auto-covariance is only a function of the time difference: K_X(t + τ, t) = K_X(τ) for all t and τ.

A Gaussian process is stationary iff it is WSS.

SLIDE 16

Power spectral density

We define the power spectral density (PSD) of a zero-mean WSS random process as the Fourier transform of its auto-covariance (equivalently auto-correlation, since the mean is zero) function:

R_X(τ) ←F→ S_X(f)

Several useful properties for a WSS real-valued random process {X(t)}:
  • Auto-correlation and auto-covariance are both even functions: R_X(τ) = R_X(−τ), K_X(τ) = K_X(−τ)
  • Its PSD is real and even: S_X(f) = S_X*(f) = S_X(−f)
  • The PSD is non-negative, and E[|X(t)|²] = ∫_{−∞}^{∞} S_X(f) df

SLIDE 17

Energy spectral density of a deterministic waveform

For a deterministic signal x(t), its energy spectral density E_x(f) is the energy per unit frequency (hertz) at each frequency.

Operational definition: pass x(t) through a narrow-band filter of bandwidth Δf around f_0 and let

E_x(f_0) ≜ lim_{Δf→0} (energy of the output y(t; f_0, Δf)) / Δf = |x̆(f_0)|²

i.e., the squared magnitude of the frequency response (Fourier transform) at f_0.

SLIDE 18

Auto-correlation of a deterministic waveform

For a deterministic signal x(t), its auto-correlation is the inverse Fourier transform of the energy spectral density:

R_x(t) ≜ ∫_{−∞}^{∞} x(τ) x*(τ − t) dτ ←F→ E_x(f)

The auto-correlation is effectively the convolution of x(t) with x*(−t).

In PAM demodulation, the receive filter is q(t) = p*(−t); this filtering block is called a "correlator" in some of the literature.

SLIDE 19

Power spectral density of a deterministic waveform

For a deterministic signal x(t), its power spectral density is the power per unit frequency (hertz) at each frequency (power = energy/time).

Operational definition: truncate x(t) to a finite interval by multiplying with an indicator,

x_{t_0}(t) ≜ x(t) · 1{t ∈ [−t_0/2, t_0/2]},

and take the normalized limit of the energy spectral density:

S_x(f) ≜ lim_{t_0→∞} (1/t_0) E_{x_{t_0}}(f)

SLIDE 20

PSD of a random process vs. a deterministic waveform

The inverse Fourier transform of the power spectral density is the long-term time average of the auto-correlation:

R̄_x(t) ≜ lim_{t_0→∞} (1/t_0) ∫_{−t_0/2}^{t_0/2} x(τ) x*(τ − t) dτ ←F→ S̄_x(f)

For an important class of random processes called ergodic processes, the long-term time average equals the statistical average:

R̄_X(t) = E[X(τ)X*(τ − t)] = R_X(t) ⟹ S̄_X(f) = F{R_X(t)} = S_X(f)

This ergodicity is the reason why S_X(f) is called the power spectral density.
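
A discrete-time illustration (a stand-in of my own: a short moving average of i.i.d. Gaussians, which is WSS and ergodic): the time-averaged auto-correlation of one long sample path approaches the ensemble auto-correlation.

```python
# Time average of one path vs. statistical auto-correlation of an MA process.
import numpy as np

rng = np.random.default_rng(2)
T = 1_000_000
h = np.ones(4) / 2.0                              # MA filter: R_X(t) = 0 for |t| >= 4
x = np.convolve(rng.standard_normal(T), h, mode="same")

for lag in range(5):
    time_avg = np.mean(x[lag:] * x[:T - lag])     # (1/T) sum_tau x[tau] x[tau - lag]
    theory = np.sum(h[lag:] * h[:len(h) - lag])   # R_X(lag) for an MA filter
    print(f"lag {lag}: time average = {time_avg:+.4f}, ensemble R_X = {theory:+.4f}")
```
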

SLIDE 21

Filtering of random processes

We are primarily interested in the following properties of a random process after passing through an LTI filter:
  • First and second moments
  • Stationarity
  • Gaussianity

Key facts:
  • Stationarity/wide-sense stationarity is preserved under LTI filtering.
  • Gaussianity is preserved under LTI filtering.
  • For a WSS process, the PSD of the filtered process is the PSD of the original process times the ESD of the LTI filter.

SLIDE 22

First and second moments under LTI filtering

X(t) → [h(t)] → Y(t) = (X ∗ h)(t)

First moment: μ_Y(t) = (μ_X ∗ h)(t)

Second moment: K_Y(s, t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s − ξ) K_X(ξ, τ) h(t − τ) dξ dτ

SLIDE 23

Stationarity under LTI filtering

X(t) → [h(t)] → Y(t) = (X ∗ h)(t). A WSS input gives a WSS output, and a stationary input gives a stationary output.

First moment: μ_Y = μ_X h̆(0)

Second moment: K_Y(τ) = (h ∗ K_X ∗ h_rv)(τ), where h_rv(t) ≜ h(−t) is the time-reversed filter

Power spectral density: S_Y(f) = |h̆(f)|² S_X(f)

SLIDE 24

Two branches of filtered processes

X_1(t) → [h_1(t)] → Y_1(t) = (X_1 ∗ h_1)(t)
X_2(t) → [h_2(t)] → Y_2(t) = (X_2 ∗ h_2)(t)

Definition (jointly WSS): {X_1(t)} and {X_2(t)} are jointly WSS if they are both WSS and the cross-covariance K_{X_1,X_2}(t + τ, t) ≜ Cov(X_1(t + τ), X_2(t)) depends on τ only.

SLIDE 25

Two branches of filtered processes (continued)

If {X_1(t)} and {X_2(t)} are jointly WSS, then {Y_1(t)} and {Y_2(t)} are jointly WSS as well.

Cross-covariance: K_{Y_1,Y_2}(τ) = (h_1 ∗ K_{X_1,X_2} ∗ h_{2,rv})(τ)

Cross PSD: S_{Y_1,Y_2}(f) ≜ F{K_{Y_1,Y_2}(τ)} = h̆_1(f) S_{X_1,X_2}(f) h̆_2*(f)

SLIDE 26

Projecting random processes

W_1 = ⟨X_1(t), g_1(t)⟩ ≜ ∫_{−∞}^{∞} X_1(t) g_1*(t) dt
W_2 = ⟨X_2(t), g_2(t)⟩ ≜ ∫_{−∞}^{∞} X_2(t) g_2*(t) dt

First moment: E[W_i] = ⟨μ_{X_i}(t), g_i(t)⟩

Second moment: Cov(W_1, W_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g_2(t) K_{X_1,X_2}(s, t) g_1*(s) ds dt

SLIDE 27

Projecting random processes: jointly WSS case

With the same projections W_1, W_2 as on the previous slide, if {X_1(t)} and {X_2(t)} are jointly WSS:

First moment: E[W_i] = μ_X ğ_i*(0)

Second moment: Cov(W_1, W_2) = ⟨(g_2 ∗ K_{X_1,X_2})(t), g_1(t)⟩

SLIDE 28

Gaussianity

Gaussianity is preserved by LTI filtering: if {Z(t)} is a Gaussian process, then U(t) = (Z ∗ h)(t) is also a Gaussian process.

Projections of a Gaussian process are jointly Gaussian: for {Z(t)} Gaussian and functions g_1(t), ..., g_m(t),

V_i = ⟨Z(t), g_i(t)⟩ ≜ ∫_{−∞}^{∞} Z(t) g_i*(t) dt, i = 1, ..., m,

are jointly Gaussian random variables.

SLIDE 29

Part II. Statistical Model of Noise

Additive White Gaussian Noise (AWGN), Equivalent Complex Baseband Noise

SLIDE 30

Additive noise

Channel output ≠ channel input: the noisy channel maps x(t) to y(t).

Simplest noise model: Y(t) = x(t) + Z(t), where {Z(t)} is some random process. (Any scaling of x(t) is absorbed into the random process.)

Physical properties of the additive noise:
  • Noise = aggregation of many additive "sub-noises" (thermal noise, device imperfection, etc.)
  • Sub-noises are zero-mean (WLOG) and statistically independent
  • All sub-noises contribute about the same energy to the total noise
  • Sub-noises are rather stationary over the duration of interest

SLIDE 31

Gaussian process modeling the random noise

Recall the central limit theorem: the normalized sum of zero-mean i.i.d. random variables converges to a Gaussian in distribution,

(1/√n) Σ_{i=1}^{n} X_i →d N(0, σ²) as n → ∞.

Other reasons for using Gaussian as the model of the additive noise:
  • Under a variance constraint, Gaussian noise is the "worst-case" noise, in the sense that it is the most "random" and yields the smallest channel capacity.
  • Gaussian is easy to manipulate analytically, which gives great insight into more complex models.

Hence, we model the physical noise as a zero-mean stationary Gaussian process.

SLIDE 32

White Gaussian noise

  • For the time interval of interest, the noise waveform is rather stationary.
    ⟹ Assume the noise random process is stationary (from t = −∞ to ∞).
  • For the frequency band of interest, the noise waveform has roughly constant energy spectral density.
    ⟹ Assume the PSD of the noise random process is constant (from f = −∞ to ∞).
  • White Gaussian process: a zero-mean stationary Gaussian process {W(t)} with

    K_W(τ) = (N_0/2) δ(τ), S_W(f) = N_0/2.

  • WGN combined with linear filters can generate any stationary Gaussian process.

SLIDE 33

WGN passed through band-limited filters

  • A white Gaussian noise process, combined with LTI filters, can be used to generate (almost) all kinds of stationary zero-mean Gaussian processes.
  • For a stationary zero-mean Gaussian process {Z(t)} with PSD S_Z(f):
    • Choose an LTI filter with frequency response h̆(f) = √((2/N_0) S_Z(f)).
    • Recall: S_Z(f) ≥ 0, real-valued, and even. Hence h̆(f), and therefore h(t), is also real-valued and even.
    • Passing WGN {W(t)} with PSD N_0/2 through this filter yields Z(t) = (W ∗ h)(t), whose PSD is (N_0/2)|h̆(f)|² = S_Z(f).
  • For most communication systems, there are band-limited filters at the receiver. Hence, the PSD of the filtered noise process is also band-limited. A discrete-time sketch of this recipe follows.
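
A sketch of this recipe in discrete time (the sampling rate, bandwidth, and filter length are my choices; SciPy's firwin/welch do the filter design and PSD estimation): shape white noise of PSD N_0/2 with a low-pass filter and verify the output PSD.

```python
# Shape white Gaussian noise to a band-limited PSD and verify with Welch.
import numpy as np
from scipy.signal import firwin, lfilter, welch

rng = np.random.default_rng(3)
fs, N0, B = 1000.0, 2.0, 100.0                 # sample rate (Hz), PSD level, bandwidth (Hz)
w = rng.standard_normal(500_000) * np.sqrt(N0 / 2 * fs)  # discrete WGN, two-sided PSD N0/2

h = firwin(401, B, fs=fs)                      # low-pass: gain ~ 1 for |f| <= B, ~ 0 outside
z = lfilter(h, 1.0, w)                         # Z = W * h: band-limited Gaussian process

f, Pzz = welch(z, fs=fs, nperseg=4096)         # one-sided estimate: 2 x (N0/2) = N0 in band
print("in-band PSD  ~", Pzz[(f > 5) & (f < 0.8 * B)].mean(), "(target N0 =", N0, ")")
print("out-of-band  ~", Pzz[f > 1.5 * B].mean())
```
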
SLIDE 34

WGN projected onto an orthonormal set

Project WGN {W(t)} with PSD N_0/2 onto an orthonormal set {φ_1(t), ..., φ_m(t)}:

W_i = ⟨W(t), φ_i(t)⟩, i = 1, ..., m.

The projections are jointly Gaussian, with

Cov(W_i, W_j) = ⟨(φ_j ∗ K_W)(t), φ_i(t)⟩ = ⟨(N_0/2) φ_j(t), φ_i(t)⟩ = (N_0/2) · 1{i = j}

⟹ the W_i are i.i.d. Gaussians!
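
A finite-dimensional analogue (my construction): projecting a white Gaussian vector with covariance (N_0/2)I onto orthonormal directions gives uncorrelated, hence independent, N(0, N_0/2) coefficients.

```python
# Project a white Gaussian vector onto an orthonormal set of directions.
import numpy as np

rng = np.random.default_rng(4)
n, m, N0, trials = 64, 4, 2.0, 100_000
G, _ = np.linalg.qr(rng.standard_normal((n, m)))          # columns g_1..g_m, orthonormal

W = rng.standard_normal((n, trials)) * np.sqrt(N0 / 2)    # white noise, Cov = (N0/2) I_n
V = G.T @ W                                               # V_i = <W, g_i>

print("Cov(V) ~ (N0/2) I:\n", np.round(np.cov(V), 3))
```
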

SLIDE 35

Equivalent noise in QAM demodulation

[Diagram: the received waveform x(t) + Z(t) is multiplied by √2 cos(2πf_c t) in the in-phase branch and by −√2 sin(2πf_c t) in the quadrature branch; each branch passes through the filter q(t) and is sampled at rate T = 1/(2W), producing {u_m^(I) + Z_m^(I)} and {u_m^(Q) + Z_m^(Q)}.]

SLIDE 36

[Same QAM demodulator applied to the noise alone: Z(t) enters the two branches, producing {Z_m^(I)} and {Z_m^(Q)}.]

Recall: QAM demodulation is equivalent to projection onto an orthonormal set.

SLIDE 37

Recall that QAM demodulation is equivalent to projection onto an orthonormal set:

  • the in-phase branch projects onto ψ_m^(I)(t) ≜ p(t − mT) √2 cos(2πf_c t), m ∈ ℤ;
  • the quadrature branch projects onto ψ_m^(Q)(t) ≜ −p(t − mT) √2 sin(2πf_c t), m ∈ ℤ.

Hence, by Slide 34,

Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2) for all m ∈ ℤ.

SLIDE 38

Circularly symmetric complex Gaussian

  • In the equivalent complex baseband model, recall that we use the in-phase component as the real part and the quadrature component as the imaginary part: u_m ≜ u_m^(I) + j u_m^(Q).
  • Following the same spirit, we define the equivalent complex baseband demodulated symbols as V_m ≜ u_m + Z_m, with Z_m ≜ Z_m^(I) + j Z_m^(Q).
  • The equivalent complex baseband noise is complex, with i.i.d. Gaussian real and imaginary parts: Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2).
  • Such a Z_m is called circularly symmetric Gaussian: Z_m ∼ CN(0, N_0).
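
A small helper (the function name crandn and all parameters are mine, for illustration) that draws Z ∼ CN(0, N_0) exactly this way and checks E[|Z|²] = N_0.

```python
# Draw circularly symmetric complex Gaussian samples Z ~ CN(0, N0).
import numpy as np

def crandn(size, N0, rng):
    """Z = Z_I + j Z_Q with Z_I, Z_Q i.i.d. ~ N(0, N0/2)."""
    return np.sqrt(N0 / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

rng = np.random.default_rng(5)
Z = crandn(100_000, N0=2.0, rng=rng)
print("E[Z]     ~", Z.mean())                    # ~ 0
print("E[|Z|^2] ~", np.mean(np.abs(Z) ** 2))     # ~ N0 = 2.0
```
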

SLIDE 39

Summary: the additive noise model

[Block diagram as in Slide 2, with the noisy channel in the middle.]

Equivalent complex baseband additive noise channel model:

V_m = u_m + Z_m, Z_m i.i.d. ∼ CN(0, N_0) for all m.

SLIDE 40

Part III. Optimal Detection Rules

Basic Hypothesis Testing, Performance Metrics, MAP Detection, Maximum Likelihood Detection, Minimum Distance Rule

SLIDE 41

Detection: problem statement

Equivalent complex baseband additive noise channel: V = u + Z, where the symbol mapper maps the bits to a symbol in the constellation set, u = a_θ ∈ A ≜ {a_1, ..., a_M}. Detection computes Θ̂ = φ(V) and outputs û = a_Θ̂.

The above is a special case of hypothesis testing:

Statistical Experiment: θ ∈ Θ generates X ∼ P_θ → Decision Making: Θ̂ = φ(X).

SLIDE 42

Performance metric: probability of error

  • For any given index θ ∈ Θ, we use the probability of error to measure the performance of the decision-making algorithm φ(·):

    Pe(φ; θ) ≜ P_θ{φ(X) ≠ θ}

  • It is quite obvious that a single decision-making algorithm cannot do simultaneously well for all θ ∈ Θ.
  • Hence, there are different kinds of formulations.

SLIDE 43

Bayesian and minimax formulations

Bayesian formulation: suppose the index is random and distributed as Θ ∼ π. Goal: find a decision-making algorithm φ(·) such that the average probability of error is minimized:

Pe(φ) ≜ E_{Θ∼π}[Pe(φ; Θ)], φ_Bayes ≜ arg min_φ Pe(φ)

Minimax formulation: no assumption on the prior distribution of the index. Goal: find a decision-making algorithm φ(·) such that the worst-case probability of error is minimized:

φ_Minimax ≜ arg min_φ max_θ Pe(φ; θ)
SLIDE 44

Bayesian-optimal testing algorithm

φ_Bayes ≜ arg min_φ Pe(φ), where

Pe(φ) ≜ E_{Θ∼π}[Pe(φ; Θ)] = Σ_{θ∈Θ} π(θ) Pe(φ; θ) = Σ_{θ∈Θ} Σ_{x: φ(x)≠θ} π(θ) P_θ(x) = 1 − Σ_{θ∈Θ} Σ_{x: φ(x)=θ} π(θ) P_θ(x)

SLIDE 45

Pe(φ) = 1 − Σ_{θ∈Θ} Σ_{x: φ(x)=θ} π(θ) P_θ(x) = 1 − Σ_{θ,x} π(θ) P_θ(x) 1{φ(x) = θ}

To maximize the subtracted term, for each observed x we should pick

φ(x) = arg max_{θ∈Θ} π(θ) P_θ(x)

SLIDE 46

MAP rule

Theorem: φ_Bayes ≜ arg min_φ Pe(φ) is achieved by

φ_Bayes(x) = arg max_{θ∈Θ} {π(θ) P_θ(x)}

(a mapping from x to θ).

Thinking of (Θ, X) as jointly distributed random variables: (Θ, X) ∼ P_{Θ,X}, with P_{Θ,X}(θ, x) ≡ π(θ) P_θ(x).

SLIDE 47

Theorem: φ_Bayes(x) = arg max_{θ∈Θ} {P_{Θ,X}(θ, x)}

Notice that for any observed x, P_{Θ,X}(θ, x) ∝ P_{Θ|X}(θ|x), since the normalization P_X(x) does not depend on θ.

SLIDE 48

Theorem (maximum a posteriori (MAP) decision rule):

φ_Bayes(x) = arg max_{θ∈Θ} P_{Θ|X}(θ|x)

  • Posterior probability P_{Θ|X}(θ|x): the probability of the hidden object θ after observing x.
  • Prior probability π(θ): the probability of the hidden object θ without observing anything.

A generic sketch of this rule in code follows.
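
A generic sketch of the MAP rule for a finite index set (the function name and the toy binary-Gaussian example are my own; likelihoods need only be specified up to a θ-independent factor).

```python
# Generic MAP rule for a finite index set: argmax_theta pi(theta) P_theta(x).
import numpy as np

def map_rule(x, priors, likelihood):
    """priors: sequence of pi(theta); likelihood(theta, x): P_theta(x) up to a constant."""
    scores = [p * likelihood(th, x) for th, p in enumerate(priors)]
    return int(np.argmax(scores))

# Toy binary example: X | theta ~ N(a_theta, sigma^2) with a = (-1, +1), unequal priors.
a, sigma = np.array([-1.0, 1.0]), 0.8
lik = lambda th, x: np.exp(-(x - a[th]) ** 2 / (2 * sigma ** 2))
print(map_rule(0.1, priors=[0.9, 0.1], likelihood=lik))  # -> 0: the strong prior wins
```
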

SLIDE 49

Maximum likelihood (ML)

  • Often the prior distribution is uniform, that is, there is no preference on the hidden index.
  • In this special case, the MAP rule is equivalent to the maximum likelihood (ML) rule:

    φ_Bayes(x) = arg max_{θ∈Θ} {π(θ) P_θ(x)} = arg max_{θ∈Θ} {P_θ(x)}

    With the likelihood function P_θ(x) ≡ P_{X|Θ}(x|θ),

    φ_MAP(x) ≡ φ_ML(x) ≜ arg max_{θ∈Θ} P_{X|Θ}(x|θ)
SLIDE 50

Optimal detection in i.i.d. additive Gaussian noise

Detection in QAM demodulation: V = u + Z, u = a_θ ∈ A ≜ {a_1, ..., a_M}, Z ∼ CN(0, N_0), i.e., Z ≜ Z^(I) + jZ^(Q) with Z^(I), Z^(Q) i.i.d. ∼ N(0, N_0/2). We can view the problem as detection in two-dimensional Euclidean space!

A more general setting, vector detection under i.i.d. Gaussian noise:

V = u + Z, u = a_θ ∈ A ≜ {a_1, ..., a_M} ⊆ ℝⁿ, Z ∼ N(0, σ²Iₙ).

In words, the n coordinates of u experience i.i.d. Gaussian noise.

SLIDE 51

Derivation of the ML detection rule

Setting: V = u + Z, u = a_θ ∈ A ≜ {a_1, ..., a_M} ⊆ ℝⁿ, Z ∼ N(0, σ²Iₙ).

Likelihood:

f_θ(v) = Π_{i=1}^{n} (1/√(2πσ²)) exp(−(v_i − a_{θ,i})²/(2σ²))
       = (2πσ²)^{−n/2} exp(−Σ_{i=1}^{n} (v_i − a_{θ,i})² / (2σ²))
       = (2πσ²)^{−n/2} exp(−∥v − a_θ∥² / (2σ²))

Maximum likelihood is equivalent to minimum Euclidean distance:

φ_ML(v) = arg max_{θ∈{1,...,M}} f_θ(v) = arg min_{θ∈{1,...,M}} ∥v − a_θ∥
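
A vectorized sketch of this minimum-distance rule (the helper name ml_detect and the 4-point constellation are illustrative choices of mine).

```python
# Vectorized minimum-distance (= ML under i.i.d. Gaussian noise) detection.
import numpy as np

def ml_detect(V, A):
    """V: (batch, n) observations; A: (M, n) constellation. Returns argmin indices."""
    d2 = ((V[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)   # squared distances ||v - a_theta||^2
    return d2.argmin(axis=1)

A = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # a 2-D, 4-point example
rng = np.random.default_rng(6)
idx = rng.integers(0, 4, size=10)
V = A[idx] + 0.3 * rng.standard_normal((10, 2))
print("sent    :", idx)
print("detected:", ml_detect(V, A))
```
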

SLIDE 52

Decision region

Definition: For a deterministic test φ : X → Θ, the decision region for hypothesis H_θ is

D_θ(φ) ≜ {x ∈ X | φ(x) = θ}.

2-PAM: constellation {−d, d}, with decision regions D_0 and D_1 split at 0.

4-PAM: constellation {−3d, −d, d, 3d}, with decision regions D_00, D_01, D_11, D_10 split at −2d, 0, 2d; the bit labels are assigned so that 00 and 10 sit at the two edge points and 01 and 11 at the two interior points (consistent with the error analysis in Part IV).

SLIDE 53

[Constellation diagrams: 16-QAM, a 4×4 grid with 4-bit labels 0000 through 1111, and 8-PSK, eight points on a circle with 3-bit labels 000 through 111.]

SLIDE 54

Binary detection

  • The optimal detection rule for M-ary detection in n-dimensional space under i.i.d. additive Gaussian noise is the minimum distance rule:

    φ_MD(v) = arg min_{θ∈Θ} ∥v − a_θ∥

  • This result is quite general for any M and n, as long as the noise is i.i.d. Gaussian across all n coordinates.
  • However, it requires computational resources that scale with the dimension n.
  • For binary detection problems (M = 2), it turns out that making the decision based on a scalar is sufficient to achieve optimal performance!

SLIDE 55

For binary detection between a_1, a_2 ∈ ℝⁿ, the minimum-distance rule depends on v only through the scalar

ṽ ≜ ⟨v − ā, a_{2→1}/∥a_{2→1}∥⟩, where ā ≜ (a_1 + a_2)/2 and a_{2→1} ≜ (a_1 − a_2)/2,

i.e., the projection of v, centered at the midpoint ā, onto the direction from a_2 to a_1. In particular, φ_MD(v) = 2 exactly when ṽ < 0. The scalar ṽ is a sufficient statistic.

SLIDE 56

After this projection, the problem lives on ℝ, with the two hypotheses at

ã_1 = +½∥a_1 − a_2∥, ã_2 = −½∥a_1 − a_2∥.

Equivalent to binary PAM!
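
A sketch of this scalar reduction (the helper name and the vectors are mine): project v, centered at the midpoint ā, onto the unit vector along a_1 − a_2; the sign of ṽ reproduces the minimum-distance decision.

```python
# Scalar sufficient statistic for binary detection in R^n.
import numpy as np

def binary_stat(v, a1, a2):
    mid = (a1 + a2) / 2                        # the midpoint (a-bar)
    u = (a1 - a2) / np.linalg.norm(a1 - a2)    # unit vector in the a_{2->1} direction
    return (v - mid) @ u                       # v-tilde > 0 <=> v is closer to a1

rng = np.random.default_rng(7)
a1, a2 = np.array([1.0, 2.0, 0.0]), np.array([-1.0, 0.0, 1.0])
v = a1 + 0.5 * rng.standard_normal(3)          # noisy observation of a1
t = binary_stat(v, a1, a2)
print("v-tilde =", t, "-> decide", "a1" if t > 0 else "a2")
```
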

SLIDE 57

Sufficient statistics

Original problem: Statistical Experiment (θ ∈ Θ, X ∼ P_θ) → X → Decision Making: Θ̂ = φ(X).

With dimension reduction: Statistical Experiment → X → Dimension Reduction: T = f(X) → Decision Making: Θ̂ = φ̃(T).

If deciding from T can always match the optimal decision from X, then T is a sufficient statistic!

SLIDE 58

Definition: T = f(X) is a sufficient statistic if the conditional distribution of X given T does not depend on the underlying parameter θ.
SLIDE 59

Theorem (factorization): T = f(X) is a sufficient statistic if and only if the distribution of X can be factorized as P_θ(x) = h(x) g(t; θ), where t = f(x).
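
To connect the theorem with the binary example of Slides 55-56, here is a short worked factorization (my derivation, under the Gaussian model X = a_θ + Z, Z ∼ N(0, σ²Iₙ), writing a_1 = ā + a_{2→1} and a_2 = ā − a_{2→1}):

```latex
% Worked factorization for binary detection in R^n (upper sign for theta = 1, lower for theta = 2).
f_\theta(x)
 = \underbrace{(2\pi\sigma^2)^{-\frac{n}{2}}
   \exp\!\left(-\frac{\|x-\bar a\|^2 + \|a_{2\to1}\|^2}{2\sigma^2}\right)}_{h(x)}
   \cdot
   \underbrace{\exp\!\left(\pm\frac{\langle x-\bar a,\, a_{2\to1}\rangle}{\sigma^2}\right)}_{g(t;\,\theta),\ t = \langle x-\bar a,\, a_{2\to1}\rangle}
```

The first factor depends on x alone; the second depends on x only through t = ⟨x − ā, a_{2→1}⟩, the (unnormalized) ṽ of Slide 55. By the factorization theorem, t is a sufficient statistic.
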
SLIDE 60

Part IV. Performance Analysis

Probability of Error, Signal-to-Noise Ratio, Asymptotic Performance, Union Bound

SLIDE 61

Binary PAM performance

Constellation: a_θ = −d under hypothesis H_0 (θ = 0) and a_θ = +d under hypothesis H_1 (θ = 1); decision regions D_0 = (−∞, 0) and D_1 = (0, ∞).

Under H_0: V = −d + Z, so

Pe(φ_ML; 0) = P_0{V ∈ D_1} = P{−d + Z ≥ 0} = P{N(0, 1) ≥ d/√(N_0/2)} = Q(d/√(N_0/2))

Under H_1: V = d + Z, so

Pe(φ_ML; 1) = P_1{V ∈ D_0} = P{d + Z < 0} = P{N(0, 1) < −d/√(N_0/2)} = Q(d/√(N_0/2))

Hence Pe(φ_ML) = Pe(φ_ML; 0) = Pe(φ_ML; 1) = Q(d/√(N_0/2)).
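
A Monte Carlo check of this formula (d, N_0, and the sample size are arbitrary choices of mine).

```python
# Monte Carlo estimate of the binary PAM error probability vs. Q(d/sqrt(N0/2)).
import numpy as np
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2))          # Q(x) = P{N(0,1) > x}

rng = np.random.default_rng(8)
d, N0, n = 1.0, 0.5, 2_000_000
theta = rng.integers(0, 2, n)                  # 0 -> -d, 1 -> +d, equally likely
V = np.where(theta == 1, d, -d) + rng.standard_normal(n) * sqrt(N0 / 2)
Pe = np.mean((V > 0) != (theta == 1))          # minimum-distance rule: decide by sign
print(f"simulated Pe = {Pe:.5f}, Q(d/sqrt(N0/2)) = {Q(d / sqrt(N0 / 2)):.5f}")
```
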

SLIDE 62

Q function

Tail of the standard Gaussian distribution:

Q(x) ≜ P{N(0, 1) > x} = ∫_x^∞ (1/√(2π)) exp(−t²/2) dt = P{N(0, 1) < −x}

Properties:
  • Q(x) is a decreasing function
  • Q(0) = 1/2
  • Q(∞) = 0, Q(−∞) = 1
  • Q(x) + Q(−x) = 1

SLIDE 63

Bounds and asymptotes of the Q function

  • Q(x) ≤ ½ exp(−x²/2) for all x ≥ 0
  • Q(x) ≤ (1/(x√(2π))) exp(−x²/2) for all x ≥ 0
  • Q(x) ≥ (1 − 1/x²) (1/(x√(2π))) exp(−x²/2) for all x ≥ 0
  • lim_{x→∞} ln Q(x) / (−x²/2) = 1, i.e., Q(x) ≐ exp(−x²/2)

In words, the asymptotic behavior of Q(x) is exponential decay with rate x²/2.
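
A quick numerical comparison of Q(x) against these three bounds (the grid of x values is mine).

```python
# Compare Q(x) with the two upper bounds and the lower bound above.
import numpy as np
from math import erfc, sqrt, pi

Q = lambda x: 0.5 * erfc(x / sqrt(2))

for x in [1.0, 2.0, 4.0, 6.0]:
    g = np.exp(-x ** 2 / 2)
    up1 = 0.5 * g                              # (1/2) e^{-x^2/2}
    up2 = g / (x * sqrt(2 * pi))               # e^{-x^2/2} / (x sqrt(2 pi))
    lo = (1 - 1 / x ** 2) * g / (x * sqrt(2 * pi))
    print(f"x={x}: {lo:.3e} <= Q(x)={Q(x):.3e} <= min({up1:.3e}, {up2:.3e})")
```
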

SLIDE 64

Binary vector detection performance

Combining the scalar reduction (Slides 55-56) with the binary PAM analysis (Slide 61), with equivalent amplitudes ±½∥a_1 − a_2∥ and noise variance N_0/2:

Pe(φ_ML) = Pe(φ_ML; 1) = Pe(φ_ML; 2) = Q(∥a_1 − a_2∥ / (2√(N_0/2)))

SLIDE 65

Signal-to-noise ratio (SNR)

  • The signal-to-noise ratio tells you how good the additive-noise channel is:

    SNR ≜ (average symbol energy) / (total noise variance)

  • For binary PAM (passband):
    • Both in-phase and quadrature noises need to be counted ⟹ total noise variance = N_0.
    • Average symbol energy = d².
    • Hence, SNR = d²/N_0.
  • SNR characterizes the optimal detection performance: for passband binary PAM,

    Pe(φ_ML) = Q(d/√(N_0/2)) = Q(√(2 SNR)) ≐ exp(−SNR)

SLIDE 66

General PAM performance

4-PAM: constellation {−3d, −d, d, 3d} with labels 00, 01, 11, 10 and decision regions D_00, D_01, D_11, D_10, whose boundaries lie at distance d from the points.

Under θ = 00 (an edge point, with a single neighboring decision boundary at distance d):

Pe(φ_ML; 00) = Q(d/√(N_0/2))

SLIDE 67

Under θ = 01 (an interior point, with a decision boundary at distance d on each side):

Pe(φ_ML; 01) = 2Q(d/√(N_0/2))

SLIDE 68

Similarly, under θ = 11 (the other interior point):

Pe(φ_ML; 11) = 2Q(d/√(N_0/2))

SLIDE 69

And under θ = 10 (the other edge point):

Pe(φ_ML; 10) = Q(d/√(N_0/2))

SLIDE 70

Averaging over the four equally likely symbols:

Pe(φ_ML) = ((1 + 2 + 2 + 1)/4) Q(d/√(N_0/2)) = (3/2) Q(d/√(N_0/2))

The average symbol energy is (9d² + d² + d² + 9d²)/4 = 5d², and the total noise variance is N_0, so SNR = 5d²/N_0 and

Pe(φ_ML) = (3/2) Q(√((2/5) SNR)) ≐ exp(−SNR/5)
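
A Monte Carlo check of the 4-PAM result (the parameters are mine; the detector is the minimum-distance rule from Part III).

```python
# Monte Carlo check of Pe(ML) = (3/2) Q(d/sqrt(N0/2)) for 4-PAM.
import numpy as np
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2))

rng = np.random.default_rng(9)
d, N0, n = 1.0, 0.8, 1_000_000
A = np.array([-3 * d, -d, d, 3 * d])
theta = rng.integers(0, 4, n)                              # uniform symbols
V = A[theta] + rng.standard_normal(n) * sqrt(N0 / 2)
det = np.abs(V[:, None] - A[None, :]).argmin(axis=1)       # minimum-distance rule
print(f"simulated Pe = {np.mean(det != theta):.5f}, "
      f"(3/2) Q(d/sqrt(N0/2)) = {1.5 * Q(d / sqrt(N0 / 2)):.5f}")
```
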

SLIDE 71

QAM performance

  • Performance analysis of optimal detection in QAM is similar to that of PAM, since
    • the two noises in the in-phase and quadrature parts are independent, and
    • a QAM constellation is a direct product of two PAM constellations.
  • The only twist is that the decision regions are more complicated than those of PAM.
  • Below, we use 4-QAM and 16-QAM as two examples.
SLIDE 72

4-QAM (4-PSK) performance

Write Q ≡ Q(d/√(N_0/2)). Consider θ = 10, placed at (d, −d) say, with decision region D_10 = {v_1 > 0, v_2 < 0}.

Probability of success:

P_10{V_1 > 0, V_2 < 0} = P_10{V_1 > 0} P_10{V_2 < 0} (by independence of the two noises)
                       = P{N(0, N_0/2) > −d} P{N(0, N_0/2) < d} = (1 − Q)²

Probability of error:

Pe(φ_ML; 10) = 1 − (1 − Q)² = 2Q − Q²

By symmetry, Pe(φ_ML) = Pe(φ_ML; 10) = 2Q − Q², and with SNR = 2d²/N_0 we have Q = Q(√SNR).

SLIDE 73

High-SNR asymptote

Exactly, Pe(φ_ML) = 2Q(√SNR) − Q(√SNR)² ≐ exp(−SNR/2). Simple bounds suffice for the high-SNR asymptote: write the complement of D_10 as the union of the two half-planes V_1 ≜ {v_1 < 0} and V_2 ≜ {v_2 > 0}, so that

Pe(φ_ML) = P_10{V ∈ V_1 ∪ V_2}

⟹ max(P_10{V ∈ V_1}, P_10{V ∈ V_2}) ≤ Pe(φ_ML) ≤ P_10{V ∈ V_1} + P_10{V ∈ V_2}

P_10{V ∈ V_1} = P{N(0, N_0/2) < −d} = Q(√SNR), and likewise P_10{V ∈ V_2} = Q(√SNR)

⟹ Q(√SNR) ≤ Pe(φ_ML) ≤ 2Q(√SNR)

SLIDE 74

Repetition 4-PAM vs. 4-QAM

  • 4-QAM: Pe ≐ exp(−SNR/2)
  • Repetition 4-PAM (the same 4-PAM symbol sent on both the in-phase and quadrature branches): Pe ≐ exp(−SNR/5)

Repetition does not improve performance under fixed SNR. To achieve the same performance, 4-PAM requires 1.5× more power (2.5× the power) than 4-QAM! QAM exploits the total degrees of freedom better than PAM under a fixed power constraint.

SLIDE 75

16-QAM performance

Exact performance can be computed, but it is simpler to use pairwise probabilities of error to find bounds on the performance:

Pe(φ_ML; θ) = P_θ{∪_{i≠θ} {V ∈ D_i}}

⟹ max_{i≠θ} P_θ{V ∈ D_i} ≤ Pe(φ_ML; θ) ≤ Σ_{i≠θ} P_θ{V ∈ D_i}

The pairwise error probability is simple to compute because it equals that of binary detection:

P_θ{V ∈ D_i} = Q(∥a_i − a_θ∥ / (2√(N_0/2))) ≤ Q(d_min/√(2N_0))

⟹ Q(d_min/√(2N_0)) ≤ Pe(φ_ML; θ) ≤ (M − 1) Q(d_min/√(2N_0)) for all θ

⟹ Pe(φ_ML) ≐ exp(−d_min²/(4N_0))
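
A sketch comparing the simulated 16-QAM symbol error probability with these bounds (the grid spacing 2d and all parameters are my choices).

```python
# Simulated 16-QAM symbol error rate vs. the pairwise lower and union upper bounds.
import numpy as np
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2))

rng = np.random.default_rng(10)
d, N0, n = 1.0, 1.0, 200_000
pts = np.array([-3 * d, -d, d, 3 * d])
A = np.array([(x, y) for x in pts for y in pts])           # 16-QAM grid, d_min = 2d
theta = rng.integers(0, 16, n)
V = A[theta] + rng.standard_normal((n, 2)) * sqrt(N0 / 2)
d2 = ((V[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)    # minimum-distance detection
Pe = np.mean(d2.argmin(axis=1) != theta)

q = Q(2 * d / sqrt(2 * N0))                                # Q(d_min / sqrt(2 N0))
print(f"simulated Pe = {Pe:.4f}; bounds: {q:.4f} <= Pe <= {15 * q:.4f}")
```
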