Principle of Communications, Fall 2017
Lecture 03 Optimal Detection under Noise
I-Hsiang Wang
ihwang@ntu.edu.tw National Taiwan University 2017/9/28
[Block diagram: discrete sequence {u_m} → Pulse Shaper → baseband waveform x_b(t) → Up Converter → passband waveform x(t) → Noisy Channel → y(t) → Down Converter → baseband waveform y_b(t) → Filter + Sampler → discrete sequence {û_m}]

If the channel is noiseless, y(t) = x(t) ⟹ we can guarantee y_b(t) = x_b(t) for all t and û_m = u_m for all m.
With an additive-noise channel, the received passband waveform is Y(t) = x(t) + Z(t).

[Same block diagram, with the additive noise Z(t) entering in the channel between up-conversion and down-conversion]

Questions for this lecture:
- How to model the noise? → model Z(t) as a white Gaussian process.
- What is the equivalent noise after down-conversion, filtering, and sampling? → discrete-time equivalent: V_m = u_m + Z_m.
- Detection: what is the optimal decision rule (the one minimizing error probability) for finding the best û_m from V_m?
Probability space (Ω, F, P): sample space Ω, sigma field F, probability measure P.
Index set I of a random process {X(t) : t ∈ I}:
- I uncountable (I = R) → random waveform {X(t) : t ∈ R}
- I countable (I = Z) → random sequence {X_m : m ∈ Z}
Fourier transform of a sample path: X(ω; t) ⟷ X̆(ω; f) = ∫_{−∞}^{∞} X(ω; t) exp(−j2πft) dt
Auto-covariance function: K_X(s, t) ≜ Cov(X(s), X(t)) = E[(X(s) − μ_X(s))(X(t) − μ_X(t))]
Auto-correlation function: R_X(s, t) ≜ E[X(s)X(t)]
Gaussian random variable X ∼ N(μ, σ²), with mean μ and variance σ²:
f_X(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
Definition (jointly Gaussian random vector): Z ≜ [Z_1 … Z_n]^T is jointly Gaussian if
Z = aW + b,  W ≜ [W_1 … W_m]^T with W_i i.i.d. ∼ N(0, 1),
where a ∈ R^{n×m} is a constant matrix and b ∈ R^n is a constant vector.
First moment:  μ ≜ E[Z] = [E[Z_1] … E[Z_n]]^T
Second moment: K ≜ E[(Z − μ)(Z − μ)^T], with entries K_{ij} = Cov(Z_i, Z_j) (so K_{ii} = Var[Z_i])
Probability density of a (non-degenerate) jointly Gaussian vector Z ∼ N(μ, K):
f_Z(z) = 1/((2π)^{n/2} √det(K)) · exp(−(1/2)(z − μ)^T K^{−1} (z − μ))
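(Not from the slides.) A small numerical sketch of the two slides above: build Z = aW + b from i.i.d. N(0, 1) entries, check that the sample mean and covariance match b and a aᵀ, and evaluate the density formula; the particular a, b, and test point are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, m, N = 2, 3, 200_000                      # dimensions and number of samples (arbitrary)
a = np.array([[1.0, 0.5, 0.0],
              [0.0, 2.0, -1.0]])             # constant matrix a in R^{n x m}
b = np.array([1.0, -2.0])                    # constant vector b in R^n

W = rng.standard_normal((m, N))              # W_i i.i.d. ~ N(0, 1)
Z = a @ W + b[:, None]                       # each column is one realization of Z = aW + b

print("sample mean      :", Z.mean(axis=1))  # ≈ b
print("sample covariance:\n", np.cov(Z))     # ≈ a a^T
print("a a^T            :\n", a @ a.T)

# density formula vs. scipy's multivariate normal at an arbitrary point z
mu, K = b, a @ a.T
z = np.array([1.5, -1.0])
quad = (z - mu) @ np.linalg.solve(K, z - mu)
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(K)))
print(f_manual, multivariate_normal(mu, K).pdf(z))   # the two values agree
```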
Gaussian process via an orthonormal expansion:
Z(t) = Σ_{k=1}^{∞} Z_k φ_k(t),  Z_k ∼ N(0, σ_k²), mutually independent, {φ_k(t)} orthonormal
‖Z(t)‖² = Σ_{k=1}^{∞} Z_k²
K_Z(s, t) = Cov( Σ_k Z_k φ_k(s), Σ_m Z_m φ_m(t) ) = Σ_{k=1}^{∞} σ_k² φ_k(s) φ_k(t)
This class of processes is quite general.
{X(t)} is stationary ⟺ (X(t_1), …, X(t_k)) and (X(t_1 + τ), …, X(t_k + τ)) are identically distributed for any time shift τ and any t_1, …, t_k.
{X(t)} is wide-sense stationary (WSS) ⟺ the first and second moments are time-shift invariant, that is,
μ_X(t) = μ_X for all t, and K_X(t + τ, t) = K_X(τ) for all t and τ.
For a WSS process, define the power spectral density (PSD) as the Fourier transform of the auto-correlation: R_X(τ) ⟷ S_X(f).
- R_X(τ) = R_X(−τ), K_X(τ) = K_X(−τ)
- S_X(f) = S_X*(f) = S_X(−f)
- E[X(t)²] = R_X(0) = ∫_{−∞}^{∞} S_X(f) df
Interpretation: pass the signal through an ideal bandpass filter of bandwidth Δf around f_0, giving output y(t; f_0, Δf). As Δf → 0,
(energy of output y(t; f_0, Δf)) / Δf → |x̆(f_0)|²   ← the square of the freq. response at f_0
Energy spectral density of a deterministic signal x(t):
R_x(t) ≜ ∫_{−∞}^{∞} x(τ) x*(τ − t) dτ ⟷ E_x(f)
Auto-correlation is effectively the convolution of x(t) with x*(−t).
Power spectral density of a deterministic signal: truncate x(t) to [−t_0/2, t_0/2] to get x_{t_0}(t); then
S_x(f) ≜ lim_{t_0→∞} (1/t_0) E_{x_{t_0}}(f)
Time-averaged auto-correlation:
R̄_x(t) ≜ lim_{t_0→∞} (1/t_0) ∫_{−t_0/2}^{t_0/2} x(τ) x*(τ − t) dτ ⟷ S̄_x(f)
For an important class of random processes called ergodic processes, the long-term time average is equal to the statistical average:
R̄_X(t) = E[X(τ)X*(τ − t)] = R_X(t) ⟹ S̄_X(f) = F{R_X(t)} = S_X(f)   ← the reason why it is called PSD! (ergodicity)
Passing a random process through an LTI filter: X(t) → h(t) → Y(t) = (X ∗ h)(t)
First moment:  μ_Y(t) = (μ_X ∗ h)(t)
Second moment: K_Y(s, t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s − ξ) K_X(ξ, τ) h(t − τ) dξ dτ
If {X(t)} is WSS (resp. stationary), the filter output {Y(t) = (X ∗ h)(t)} is WSS (resp. stationary), with
First moment:  μ_Y = μ_X h̆(0)
Second moment: K_Y(τ) = (h ∗ K_X ∗ h_rv)(τ),  where h_rv(t) ≜ h*(−t) is the (conjugate) time-reversed filter
[Two parallel branches: X_1(t) → h_1(t) → Y_1(t) = (X_1 ∗ h_1)(t) and X_2(t) → h_2(t) → Y_2(t) = (X_2 ∗ h_2)(t)]
Definition (jointly WSS): {X_1(t)} and {X_2(t)} are jointly WSS if they are both WSS and the cross-covariance
K_{X_1,X_2}(t + τ, t) ≜ Cov(X_1(t + τ), X_2(t)) depends on τ only.
If {X_1(t)} and {X_2(t)} are jointly WSS, then {Y_1(t)} and {Y_2(t)} are jointly WSS, with
Cross-covariance: K_{Y_1,Y_2}(τ) = (h_1 ∗ K_{X_1,X_2} ∗ h_{2,rv})(τ)
Cross PSD: S_{Y_1,Y_2}(f) ≜ F{K_{Y_1,Y_2}(τ)} = h̆_1(f) S_{X_1,X_2}(f) h̆_2*(f)
Projections of two random processes:
W_1 ≜ ⟨X_1(t), g_1(t)⟩ = ∫_{−∞}^{∞} X_1(t) g_1*(t) dt,  W_2 ≜ ⟨X_2(t), g_2(t)⟩ = ∫_{−∞}^{∞} X_2(t) g_2*(t) dt
Cov(W_1, W_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g_2(t) K_{X_1,X_2}(s, t) g_1*(s) ds dt
Same projections; if {X_1(t)} and {X_2(t)} are jointly WSS, the double integral collapses to a convolution evaluated at 0:
Cov(W_1, W_2) = (g_{1,rv} ∗ K_{X_1,X_2} ∗ g_2)(0),  where g_{i,rv}(t) ≜ g_i*(−t)
Gaussian processes are preserved under linear operations:
- Filtering: {Z(t)} Gaussian → U(t) = (Z ∗ h)(t) is a Gaussian process.
- Projections: V_1 = ⟨Z(t), g_1(t)⟩ = ∫_{−∞}^{∞} Z(t) g_1*(t) dt, …, V_m = ⟨Z(t), g_m(t)⟩ are jointly Gaussian.
Noisy channel: x(t) → y(t), with the noise modeled as some random process. Which random process?
- Noise = aggregation of many additive “sub-noises” (thermal noise, device imperfection, etc.)
- Sub-noises are zero-mean (WLOG) and statistically independent
- All sub-noises contribute about the same energy to the total noise
- Sub-noises are rather stationary in the duration of interest
Why Gaussian? Among noise models with the same variance, the Gaussian is the most “random” and yields the smallest channel capacity, and it avoids unnecessarily complex models. The central limit theorem also supports it: for i.i.d. zero-mean X_i with variance σ²,
(1/√n) Σ_{i=1}^{n} X_i →d N(0, σ²) as n → ∞
White Gaussian noise (WGN) W(t) with PSD N_0/2:
K_W(τ) = (N_0/2) δ(τ)  ⟷  S_W(f) = N_0/2
Generating a Gaussian process with a prescribed PSD S_Z(f) by filtering WGN: choose
h̆(f) = √( (2/N_0) S_Z(f) )
Since S_Z(f) ≥ 0, such an h̆(f) exists ⟹ obtain h(t), and set
Z(t) = (W ∗ h)(t),  W(t): WGN with PSD N_0/2
⟹ S_Z(f) = (N_0/2) |h̆(f)|²
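(Not from the slides.) A discrete-time sketch of this construction, under the usual approximation that i.i.d. samples of variance (N_0/2)·f_s stand in for WGN of PSD N_0/2 up to f_s/2; the windowed-sinc lowpass below is an arbitrary stand-in for h(t). The averaged periodogram of the filter output should sit on (N_0/2)|h̆(f)|².

```python
import numpy as np

rng = np.random.default_rng(1)
fs, N0, N = 1000.0, 2.0, 2_000_000            # sampling rate, noise level, #samples (arbitrary)

# i.i.d. samples of variance (N0/2)*fs approximate WGN with two-sided PSD N0/2 on |f| < fs/2
w = rng.normal(scale=np.sqrt(N0 / 2 * fs), size=N)

# windowed-sinc lowpass: taps h[k] ~ h(k/fs)/fs, so their DFT approximates the response h~(f)
taps, fc = 101, 100.0
k = np.arange(taps) - (taps - 1) / 2
h = (2 * fc / fs) * np.sinc(2 * fc * k / fs) * np.hanning(taps)

z = np.convolve(w, h, mode="same")            # Z = (W * h), discrete approximation

# averaged periodogram = two-sided PSD estimate at nonnegative frequencies
nfft = 1024
blocks = z[: (len(z) // nfft) * nfft].reshape(-1, nfft)
S_est = np.mean(np.abs(np.fft.rfft(blocks, axis=1)) ** 2, axis=0) / (fs * nfft)
f = np.fft.rfftfreq(nfft, d=1 / fs)
S_theory = (N0 / 2) * np.abs(np.fft.rfft(h, nfft)) ** 2

print(np.mean(S_est[f < 0.8 * fc]), "should be close to N0/2 =", N0 / 2)   # flat passband level
print(np.max(np.abs(S_est - S_theory)))                                     # small deviation overall
```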
Projecting WGN onto orthonormal functions {φ_i(t)} yields i.i.d. Gaussians:
W_i ≜ ⟨W(t), φ_i(t)⟩,  i = 1, …, m
Cov(W_i, W_j) = ⟨(φ_j ∗ K_W)(t), φ_i(t)⟩ = ⟨(N_0/2) φ_j(t), φ_i(t)⟩ = (N_0/2) 1{i = j}
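(Not from the slides.) A quick numerical sketch of this fact: approximate the projections ⟨W, φ_i⟩ by Riemann sums on [0, T] for two orthonormal functions (a cosine and a sine, an arbitrary choice) and check that the results are ≈ uncorrelated with variance ≈ N_0/2.

```python
import numpy as np

rng = np.random.default_rng(2)
N0, T, dt, trials = 2.0, 1.0, 1e-2, 50_000
t = np.arange(0, T, dt)

# orthonormal on [0, T]: the integral of phi_i * phi_j equals 1{i = j}
phi1 = np.sqrt(2 / T) * np.cos(2 * np.pi * t / T)
phi2 = np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)

# WGN with PSD N0/2 approximated by i.i.d. samples of variance (N0/2)/dt on the grid
W = rng.normal(scale=np.sqrt(N0 / 2 / dt), size=(trials, len(t)))
W1 = (W * phi1).sum(axis=1) * dt        # <W, phi1> by Riemann sum
W2 = (W * phi2).sum(axis=1) * dt        # <W, phi2>

print(np.var(W1), np.var(W2))           # both close to N0/2 = 1.0
print(np.corrcoef(W1, W2)[0, 1])        # close to 0: uncorrelated (hence independent) projections
```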
Down-conversion of the received passband waveform:
[Block diagram: x(t) + Z(t) is multiplied by √2 cos(2πf_c t) and by −√2 sin(2πf_c t); each branch is filtered by q(t) and sampled with period T = 1/(2W), yielding {u_m^(I) + Z_m^(I)} and {u_m^(Q) + Z_m^(Q)}]
The same front end applied to the noise alone:
[Block diagram: Z(t) → multiply by √2 cos(2πf_c t) and by −√2 sin(2πf_c t) → filter q(t) → sample at T = 1/(2W) → {Z_m^(I)} and {Z_m^(Q)}]
Equivalently, the sampled noise terms are projections of Z(t):
Z_m^(I) = ⟨Z(t), ψ_m^(I)(t)⟩,  Z_m^(Q) = ⟨Z(t), ψ_m^(Q)(t)⟩,  m ∈ Z
ψ_m^(I)(t) ≜ p(t − mT) √2 cos(2πf_c t),  ψ_m^(Q)(t) ≜ −p(t − mT) √2 sin(2πf_c t)
Since these waveforms form an orthonormal set, the result on projections of WGN gives
Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2), ∀ m ∈ Z
Complex discrete-time model: with u_m ≜ u_m^(I) + j u_m^(Q),
V_m ≜ u_m + Z_m,  Z_m ≜ Z_m^(I) + j Z_m^(Q),  Z_m^(I), Z_m^(Q) i.i.d. ∼ N(0, N_0/2)
⟹ Z_m ∼ CN(0, N_0)
[Block diagram recap: discrete sequence → Pulse Shaper → baseband waveform → Up Converter → passband waveform → Noisy Channel → Down Converter → Filter + Sampler → discrete sequence]
Discrete-time equivalent channel:
V_m = u_m + Z_m,  Z_m i.i.d. ∼ CN(0, N_0), ∀ m
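(Not from the slides.) A minimal sketch of this discrete-time equivalent: draw Z_m ∼ CN(0, N_0) as independent N(0, N_0/2) real and imaginary parts and pass symbols through V_m = u_m + Z_m; the QPSK-like constellation is just an example.

```python
import numpy as np

rng = np.random.default_rng(3)
N0, M = 1.0, 100_000

# Z_m ~ CN(0, N0): real and imaginary parts i.i.d. N(0, N0/2)
Z = rng.normal(scale=np.sqrt(N0 / 2), size=M) + 1j * rng.normal(scale=np.sqrt(N0 / 2), size=M)
print(np.var(Z.real), np.var(Z.imag), np.mean(np.abs(Z) ** 2))   # about N0/2, N0/2, N0

# discrete-time equivalent channel V_m = u_m + Z_m, here with QPSK-like symbols
A = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])   # example constellation
u = rng.choice(A, size=M)
V = u + Z
```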
Equivalent complex baseband additive noise channel model (one symbol time):
V = u + Z
The symbol mapper maps the bits to a symbol in the constellation set u ∈ A ≜ {a_1, …, a_M}; detection produces Θ̂ = φ(V), an estimate of the index θ of the transmitted symbol.

General framework:
[Statistical experiment: θ ∈ Θ, observation X ∼ P_θ] → [Decision making: Θ̂ = φ(X)]
For a decision rule φ(·), define the probability of error
P_e(φ; θ) ≜ P_θ{φ(X) ≠ θ},  θ ∈ Θ
Minimax formulation: φ_Minimax ≜ arg min_φ max_θ P_e(φ; θ)
Bayesian formulation: suppose the index is random and distributed as Θ ∼ π (the prior). Goal: find a decision rule φ(·) such that the average probability of error is minimized:
P_e(φ) ≜ E_{Θ∼π}[P_e(φ; Θ)],  φ_Bayes ≜ arg min_φ P_e(φ)
Minimax formulation: no assumption on the prior distribution of the index. Goal: find a decision rule φ(·) such that the worst-case probability of error is minimized.
Deriving the Bayes-optimal rule:
P_e(φ) = E_{Θ∼π}[P_e(φ; Θ)] = Σ_θ π(θ) P_e(φ; θ) = 1 − Σ_x Σ_θ π(θ) P_θ(x) 1{φ(x) = θ} = 1 − Σ_x π(φ(x)) P_{φ(x)}(x)
To maximize the subtracted term, for each observation x we should pick
φ(x) = arg max_{θ∈Θ} π(θ) P_θ(x)
Bayes-optimal rule (a mapping from x to θ):
φ_Bayes(x) = arg max_{θ∈Θ} {π(θ) P_θ(x)}
           = arg max_{θ∈Θ} {P_{Θ,X}(θ, x)}
           = arg max_{θ∈Θ} {P_{Θ|X}(θ | x)}   ⟹ the maximum a posteriori (MAP) rule
When the prior π is uniform over Θ, the MAP rule reduces to the maximum likelihood (ML) rule:
φ_Bayes(x) = arg max_{θ∈Θ} {π(θ) P_θ(x)} = arg max_{θ∈Θ} {P_θ(x)}
where P_θ(x) ≡ P_{X|Θ}(x|θ) is the likelihood, i.e., φ_MAP(x) ≡ φ_ML(x) ≜ arg max_{θ∈Θ} P_θ(x).
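(Not from the slides.) A generic sketch of the two rules for a finite Θ: score each hypothesis by π(θ)P_θ(x) (MAP) or by P_θ(x) alone (ML) and take the arg max. The Gaussian likelihoods and the numbers below are arbitrary illustrations.

```python
import numpy as np

def map_rule(x, priors, likelihoods):
    """MAP: argmax_theta pi(theta) * P_theta(x)."""
    return int(np.argmax([p * lik(x) for p, lik in zip(priors, likelihoods)]))

def ml_rule(x, likelihoods):
    """ML: argmax_theta P_theta(x), i.e., MAP under a uniform prior."""
    return int(np.argmax([lik(x) for lik in likelihoods]))

# example: two Gaussian hypotheses, P_0 = N(-1, 1) and P_1 = N(+1, 1)
gauss = lambda mu: (lambda x: np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi))
likelihoods = [gauss(-1.0), gauss(+1.0)]

print(ml_rule(0.2, likelihoods))                  # 1: the observation is closer to +1
print(map_rule(0.2, [0.9, 0.1], likelihoods))     # 0: a strong prior on theta = 0 flips the decision
```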
Back to our detection problem:
V = u + Z,  u = a_θ ∈ A ≜ {a_1, …, a_M},  Z ∼ CN(0, N_0)
Z ≜ Z^(I) + jZ^(Q), with Z^(I), Z^(Q) i.i.d. ∼ N(0, N_0/2) ⟹ we can view the problem as detection in two-dimensional Euclidean space!
More generally:
V = u + Z,  u = a_θ ∈ A ≜ {a_1, …, a_M} ⊆ R^n,  Z ∼ N(0, σ²I_n)
In words, the n coordinates of u experience i.i.d. Gaussian noise. The detector outputs Θ̂ = φ(V) and û = a_Θ̂.
Likelihood of v under hypothesis θ:
f_θ(v) = Π_{i=1}^{n} (1/√(2πσ²)) exp(−(v_i − a_{θ,i})²/(2σ²))
       = (2πσ²)^{−n/2} exp(−Σ_{i=1}^{n} (v_i − a_{θ,i})²/(2σ²))
       = (2πσ²)^{−n/2} exp(−‖v − a_θ‖²/(2σ²))
⟹ φ_ML(v) = arg max_{θ∈{1,…,M}} f_θ(v) = arg min_{θ∈{1,…,M}} ‖v − a_θ‖
In Gaussian noise, ML detection is minimum-distance detection.
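(Not from the slides.) A sketch of the resulting detector: compute ‖v − a_θ‖ for every constellation point and return the minimizer. The 4-point constellation and noise level below are arbitrary.

```python
import numpy as np

def ml_min_distance(v, constellation):
    """ML detection in N(0, sigma^2 I) noise = pick the closest constellation point."""
    d2 = np.sum((constellation - v) ** 2, axis=1)   # squared distances ||v - a_theta||^2
    return int(np.argmin(d2))

# example: 4-QAM viewed as points in R^2
A = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]], dtype=float)

rng = np.random.default_rng(4)
sigma, N = 0.5, 100_000
theta = rng.integers(len(A), size=N)
V = A[theta] + rng.normal(scale=sigma, size=(N, 2))
theta_hat = np.array([ml_min_distance(v, A) for v in V])
print("symbol error rate:", np.mean(theta_hat != theta))
```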
For a deterministic test φ : X → Θ, the decision region for hypothesis H_θ is
D_θ(φ) ≜ {x ∈ X | φ(x) = θ}
Examples of constellations (with their bit labels) and decision regions:
- 2-PAM: points {−d, +d}; decision regions D_0, D_1
- 4-PAM: points {−3d, −d, +d, +3d}, labels 00 and 10 at the outer points, 01 and 11 at the inner points; decision regions D_00, D_01, D_10, D_11
- 16-QAM: 4 × 4 grid with 4-bit labels
- 8-PSK: 8 points on a circle with 3-bit labels (000, 001, 011, 111, 101, 100, 110, 010)
Minimum-distance detector: φ_MD(v) ≜ arg min_{θ∈Θ} ‖v − a_θ‖
Binary case (signal points a_1, a_2): the decision depends on v only through its projection onto the direction a_{2→1}:
ṽ ≜ ⟨v − a, a_{2→1}⟩ / ‖a_{2→1}‖,  where a ≜ (a_1 + a_2)/2 and a_{2→1} ≜ (a_1 − a_2)/2
[Figure: a point v lying on the a_2 side of the perpendicular bisector, so φ_MD(v) = 2]
After the projection, binary detection in R^n reduces to a scalar problem: the two signal points map to
ã_1 = +(1/2)‖a_1 − a_2‖,  ã_2 = −(1/2)‖a_1 − a_2‖
and the detector compares ṽ against the threshold 0.
Decision making with and without dimension reduction:
[Statistical experiment: θ ∈ Θ, X ∼ P_θ] → [Decision making: Θ̂ = φ(X)]
[Statistical experiment: θ ∈ Θ, X ∼ P_θ] → [Dimension reduction: T = f(X)] → [Decision making: Θ̂ = φ̃(T)]
When can we restrict the detector to a reduced observation T = f(X) without loss of optimality?
Definition: T = f(X) is a sufficient statistic if the conditional distribution of X given T does not depend on θ.
Equivalently (Fisher–Neyman factorization): T = f(X) is a sufficient statistic if and only if the distribution factorizes as P_θ(x) = g_θ(f(x)) h(x).
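(Not from the slides.) A small sketch tying this back to the binary Gaussian case above: the scalar projection T = ⟨v − a, a_1 − a_2⟩ keeps everything the minimum-distance detector needs, so deciding from T alone reproduces the decisions made from the full observation. The signal points and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
a1, a2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])   # two arbitrary signal points in R^2
a = (a1 + a2) / 2
sigma, N = 0.8, 100_000

theta = rng.integers(2, size=N)                        # 0 -> a1, 1 -> a2
V = np.stack([a1, a2])[theta] + rng.normal(scale=sigma, size=(N, 2))

# decision from the full observation: minimum distance
full = (np.linalg.norm(V - a2, axis=1) < np.linalg.norm(V - a1, axis=1)).astype(int)

# decision from the reduced observation T = <v - a, a1 - a2>: decide 0 iff T > 0
T = (V - a) @ (a1 - a2)
reduced = (T <= 0).astype(int)

print(np.mean(full == reduced))     # 1.0: identical decisions (up to zero-probability ties)
```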
2-PAM error probability (signal points ±d; decision regions D_0 = {v < 0}, D_1 = {v ≥ 0}):
Under hypothesis H_0 (θ = 0): a_θ = −d, V = −d + Z.  Under hypothesis H_1 (θ = 1): a_θ = +d, V = +d + Z.
P_e(φ_ML; 0) = P_0{V ∈ D_1} = P{−d + Z ≥ 0} = P{Z ≥ d} = Q(d/√(N_0/2))
P_e(φ_ML; 1) = P_1{V ∈ D_0} = P{+d + Z < 0} = P{Z < −d} = Q(d/√(N_0/2))
⟹ P_e(φ_ML) = P_e(φ_ML; 0) = P_e(φ_ML; 1) = Q(d/√(N_0/2))
The Q function: Q(x) ≜ P{N(0, 1) > x} = ∫_x^∞ (1/√(2π)) exp(−t²/2) dt = P{N(0, 1) < −x}
- Q(x) is a decreasing function
- Q(0) = 1/2, Q(∞) = 0, Q(−∞) = 1
- Q(x) + Q(−x) = 1
Bounds and asymptotics of Q(x):
- Q(x) ≤ (1/2) exp(−x²/2), ∀ x ≥ 0
- Q(x) ≤ (1/(x√(2π))) exp(−x²/2), ∀ x > 0
- Q(x) ≥ (1 − 1/x²) (1/(x√(2π))) exp(−x²/2), ∀ x > 0
- lim_{x→∞} ln Q(x) / (−x²/2) = 1 ⟺ Q(x) ≐ exp(−x²/2)
In words, the asymptotic behavior of Q(x) is exponentially decaying with rate x²/2.
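(Not from the slides.) A quick numerical check of these bounds, with Q computed from the complementary error function.

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Q(x) = P{N(0,1) > x} = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / np.sqrt(2))

x = np.linspace(0.5, 6.0, 12)
upper1 = 0.5 * np.exp(-x**2 / 2)
upper2 = np.exp(-x**2 / 2) / (x * np.sqrt(2 * np.pi))
lower = (1 - 1 / x**2) * np.exp(-x**2 / 2) / (x * np.sqrt(2 * np.pi))

print(np.all(Q(x) <= upper1), np.all(Q(x) <= upper2), np.all(Q(x) >= lower))   # True True True
print(np.log(Q(x)) / (-x**2 / 2))   # approaches 1 as x grows: decay rate x^2/2
```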
General binary detection in N(0, σ²I_n) noise: by the projection argument above,
P_e(φ_ML) = P_e(φ_ML; 1) = P_e(φ_ML; 2) = Q(‖a_1 − a_2‖/(2σ))
Define SNR ≜ (average symbol energy) / (total noise variance). For 2-PAM, SNR = d²/N_0, so
P_e(φ_ML) = Q(d/√(N_0/2)) = Q(√(2·SNR)) ≐ exp(−SNR)
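(Not from the slides.) A Monte Carlo sanity check of the 2-PAM expression: simulate V = ±d + Z with Z ∼ N(0, N_0/2), detect with the threshold at 0, and compare with Q(√(2·SNR)). The values of d and N_0 are arbitrary.

```python
import numpy as np
from scipy.special import erfc

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))

rng = np.random.default_rng(6)
d, N0, N = 1.0, 0.5, 2_000_000
snr = d**2 / N0

theta = rng.integers(2, size=N)                    # 0 -> -d, 1 -> +d
u = np.where(theta == 1, d, -d)
V = u + rng.normal(scale=np.sqrt(N0 / 2), size=N)
theta_hat = (V >= 0).astype(int)                   # ML decision: threshold at 0

print("simulated     :", np.mean(theta_hat != theta))
print("Q(sqrt(2 SNR)):", Q(np.sqrt(2 * snr)))      # the two numbers should match closely
```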
4-PAM error analysis (points {−3d, −d, +d, +3d}, labels 00 and 10 at the outer points, 01 and 11 at the inner points; ML thresholds at −2d, 0, +2d):
- θ = 00 (outer point): only one neighboring region to fall into ⟹ P_e(φ_ML; 00) = Q(d/√(N_0/2))
- θ = 01 (inner point): two neighboring regions ⟹ P_e(φ_ML; 01) = 2Q(d/√(N_0/2))
- θ = 11 (inner point): P_e(φ_ML; 11) = 2Q(d/√(N_0/2))
- θ = 10 (outer point): P_e(φ_ML; 10) = Q(d/√(N_0/2))
⟹ P_e(φ_ML) = ((1 + 2 + 2 + 1)/4) Q(d/√(N_0/2)) = (3/2) Q(d/√(N_0/2))
Average symbol energy = ((3d)² + d² + d² + (3d)²)/4 = 5d² ⟹ SNR = 5d²/N_0, so
P_e(φ_ML) = (3/2) Q(√((2/5)·SNR)) ≐ exp(−SNR/5)
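(Not from the slides.) The same kind of Monte Carlo check for 4-PAM: nearest-point detection on {±d, ±3d} versus (3/2)·Q(√(2·SNR/5)). Parameters are arbitrary.

```python
import numpy as np
from scipy.special import erfc

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))

rng = np.random.default_rng(7)
d, N0, N = 1.0, 1.0, 2_000_000
A = np.array([-3 * d, -d, d, 3 * d])
snr = np.mean(A**2) / N0                              # 5 d^2 / N0

theta = rng.integers(4, size=N)
V = A[theta] + rng.normal(scale=np.sqrt(N0 / 2), size=N)
theta_hat = np.argmin(np.abs(V[:, None] - A[None, :]), axis=1)   # nearest point = ML

print("simulated            :", np.mean(theta_hat != theta))
print("(3/2) Q(sqrt(2SNR/5)):", 1.5 * Q(np.sqrt(2 * snr / 5)))
```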
4-QAM (points (±d, ±d); the decision regions are the four quadrants). Write Q ≡ Q(d/√(N_0/2)).
Condition on θ = 10, i.e., the point (+d, −d): a correct decision requires V_1 > 0 and V_2 < 0, and by the independence of the two noise components,
P_10{V_1 > 0, V_2 < 0} = P_10{V_1 > 0} P_10{V_2 < 0} = P{N(0, N_0/2) > −d} P{N(0, N_0/2) < d} = (1 − Q)²   (probability of success)
⟹ P_e(φ_ML; 10) = 1 − (1 − Q)² = 2Q − Q²   (probability of error)
By symmetry, P_e(φ_ML) = P_e(φ_ML; 10) = 2Q − Q².
With average symbol energy 2d² and total noise variance N_0, SNR = 2d²/N_0 and Q = Q(√SNR), so
P_e(φ_ML) = 2Q(√SNR) − Q(√SNR)² ≐ exp(−SNR/2)
Union-bound view: the error event is V ∈ V_1 ∪ V_2 with V_1 = {V_1 < 0}, V_2 = {V_2 > 0}. Simple bounds suffice for the high-SNR asymptote:
max(P_10{V ∈ V_1}, P_10{V ∈ V_2}) ≤ P_e(φ_ML) ≤ P_10{V ∈ V_1} + P_10{V ∈ V_2}
P_10{V ∈ V_1} = P{N(0, N_0/2) < −d} = Q,  P_10{V ∈ V_2} = P{N(0, N_0/2) > d} = Q
⟹ Q(√SNR) ≤ P_e(φ_ML) ≤ 2Q(√SNR)
4-QAM vs. 4-PAM (both carry 2 bits per symbol):
- 4-QAM: P_e ≐ exp(−SNR/2)
- 4-PAM: P_e ≐ exp(−SNR/5)
Repetition does not improve performance under fixed SNR. To achieve the same performance, 4-PAM requires 1.5× more power (2.5× the power) than 4-QAM! QAM exploits the total degrees of freedom better than PAM under a fixed power constraint.
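(Not from the slides.) A short numerical check of that power gap: for a target error probability, solve for the SNR each scheme needs using the expressions above and compare.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import brentq

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))
pe_qam = lambda snr: 2 * Q(np.sqrt(snr)) - Q(np.sqrt(snr)) ** 2   # 4-QAM
pe_pam = lambda snr: 1.5 * Q(np.sqrt(2 * snr / 5))                # 4-PAM

target = 1e-5
snr_qam = brentq(lambda s: pe_qam(s) - target, 1, 100)
snr_pam = brentq(lambda s: pe_pam(s) - target, 1, 200)

# ratio is about 2.4 at Pe = 1e-5, approaching 2.5 (about 4 dB) as the target Pe shrinks
print(snr_pam / snr_qam, 10 * np.log10(snr_pam / snr_qam), "dB")
```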
16-QAM: the exact performance can be computed, but it is simpler to use pairwise probabilities of error to bound the performance!
Let E_{θ→i} ≜ {‖V − a_i‖ ≤ ‖V − a_θ‖} be the pairwise error event. Then
max_{i≠θ} P_θ{E_{θ→i}} ≤ P_e(φ_ML; θ) ≤ Σ_{i≠θ} P_θ{E_{θ→i}}   (union bound)
The pairwise error probability is simple to compute because it equals that of binary detection:
P_θ{E_{θ→i}} = Q(‖a_i − a_θ‖ / (2√(N_0/2))) = Q(‖a_i − a_θ‖ / √(2N_0))
With d_min ≜ min_{i≠j} ‖a_i − a_j‖ (and noting that every 16-QAM point has a nearest neighbor at distance d_min),
Q(d_min/√(2N_0)) ≤ P_e(φ_ML) ≤ (M − 1) Q(d_min/√(2N_0))
⟹ P_e(φ_ML) ≐ exp(−d_min²/(4N_0)) = exp(−min_{i≠j} ‖a_i − a_j‖²/(4N_0))
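(Not from the slides.) A sketch of this union bound for 16-QAM, checked against Monte Carlo minimum-distance detection; the spacing and N_0 are arbitrary.

```python
import numpy as np
from scipy.special import erfc

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))

pts = np.array([(x, y) for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)], dtype=float)  # 16-QAM
N0, M = 0.5, len(pts)

# union bound on the average error probability: mean over theta of sum_{i != theta} Q(||a_i - a_theta|| / sqrt(2 N0))
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
pairwise = Q(dists / np.sqrt(2 * N0))
union_bound = np.mean(np.sum(pairwise, axis=1) - np.diag(pairwise))   # drop the i = theta terms
d_min = np.min(dists[dists > 0])
lower = Q(d_min / np.sqrt(2 * N0))      # every 16-QAM point has a neighbor at d_min

# Monte Carlo ML (minimum-distance) detection
rng = np.random.default_rng(8)
N = 200_000
theta = rng.integers(M, size=N)
V = pts[theta] + rng.normal(scale=np.sqrt(N0 / 2), size=(N, 2))
theta_hat = np.argmin(np.linalg.norm(V[:, None, :] - pts[None, :, :], axis=2), axis=1)
pe_sim = np.mean(theta_hat != theta)

print(f"lower {lower:.4f} <= simulated {pe_sim:.4f} <= union bound {union_bound:.4f}")
```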