Principle of Communications, Fall 2017
Lecture 03 Optimal Detection under Noise
I-Hsiang Wang
ihwang@ntu.edu.tw National Taiwan University 2017/9/28
[Figure: system block diagram] discrete sequence {um} → Pulse Shaper → baseband waveform xb(t) → Up Converter → passband waveform x(t) → Noisy Channel → y(t) → Down Converter → yb(t) → Filter + Sampler → {ûm}.

If the channel were noiseless, y(t) = x(t), we could guarantee yb(t) = xb(t) for all t and ûm = um for all m. With noise, the received waveform becomes
Y(t) = x(t) + Z(t).
The noise is additive. Three questions drive this lecture:
- How to model the noise? We will model Z(t) as a white Gaussian process.
- What is the equivalent noise after down-conversion, filtering, and sampling? We will arrive at the discrete-time equivalent Vm = um + Zm.
- What is the optimal decision rule that minimizes the error probability? That is, how to find the best ûm from Vm (detection).
Review of probability: a probability space (Ω, F, P) consists of a sample space Ω, a sigma field F of events, and a probability measure P.
A random process {X(t) : t ∈ I} is a collection of random variables indexed by a set I:
- I is countable (e.g., I = Z) → random sequence {Xm : m ∈ Z}
- I is uncountable (e.g., I = R) → random waveform {X(t) : t ∈ R}
Each sample path has a Fourier transform
X(ω; t) ←F→ X̆(ω; f) = ∫_{−∞}^{∞} X(ω; t) exp(−j2πft) dt.
Two second-order descriptions of a random process:
KX(s, t) ≜ Cov(X(s), X(t)) = E[(X(s) − µX(s))(X(t) − µX(t))]  (auto-covariance function)
RX(s, t) ≜ E[X(s)X(t)]  (auto-correlation function)
A Gaussian random variable X ∼ N(µ, σ²), with mean µ and variance σ², has pdf
fX(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).
Definition (jointly Gaussian random vector): Z ≜ [Z1 … Zn]ᵀ is jointly Gaussian if
Z = AW + b, where W ≜ [W1 … Wm]ᵀ, Wi i.i.d. ∼ N(0, 1),
A ∈ R^{n×m} is a constant matrix, and b ∈ Rⁿ is a constant vector.

First moment (mean vector): µ ≜ E[Z] = [E[Z1] … E[Zn]]ᵀ.
Second moment (covariance matrix):
K ≜ E[(Z − µ)(Z − µ)ᵀ] =
[ Var[Z1]       Cov(Z1, Z2)   ⋯   Cov(Z1, Zn)
  Cov(Z2, Z1)   Var[Z2]       ⋯   Cov(Z2, Zn)
  ⋮             ⋮             ⋱   ⋮
  Cov(Zn, Z1)   Cov(Zn, Z2)   ⋯   Var[Zn] ]
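As a quick numerical check, here is a minimal sketch (the matrix A, vector b, and sample count are arbitrary choices, not from the lecture) that generates Z = AW + b and verifies that the sample mean and covariance approach b and AAᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000                       # number of samples (arbitrary)
A = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [1.0, 1.0]])        # constant matrix A in R^{3x2} (assumed)
b = np.array([1.0, -1.0, 0.0])    # constant vector b in R^3 (assumed)

W = rng.standard_normal((N, 2))   # rows are i.i.d. W ~ N(0, I_2)
Z = W @ A.T + b                   # Z = A W + b: jointly Gaussian by definition

print("sample mean (should be ~ b):", Z.mean(axis=0).round(3))
print("sample covariance (should be ~ A A^T):")
print(np.cov(Z.T).round(3))
print("A A^T:")
print((A @ A.T).round(3))
```

Any affine map of i.i.d. standard normals produces a valid jointly Gaussian vector, which is why the covariance is AAᵀ here.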
If K is invertible, the joint pdf of Z is
fZ(z) = 1/((2π)^{n/2} √det(K)) exp(−(1/2)(z − µ)ᵀ K⁻¹ (z − µ)).
A Gaussian random process can be written as an orthonormal expansion
Z(t) = Σ_{k=1}^{∞} Zk φk(t), Zk ∼ N(0, σk²), {Zk} mutually independent,
where {φk(t)} is an orthonormal set. Its energy is
‖Z(t)‖² = Σ_{k=1}^{∞} Zk²,
and its auto-covariance is
KZ(s, t) = Cov(Σ_k Zk φk(s), Σ_m Zm φm(t)) = Σ_{k=1}^{∞} σk² φk(s) φk(t).
This representation is quite general.
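A small simulation can confirm the covariance formula; this is a sketch under assumed choices (sine basis φk(t) = √2 sin(kπt) on [0, 1], variances σk² = 1/k², truncation to K = 20 terms):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 20, 50_000                 # truncation order, number of realizations
t = np.linspace(0, 1, 101)
k = np.arange(1, K + 1)
phi = np.sqrt(2) * np.sin(np.outer(k, np.pi * t))  # orthonormal sines on [0, 1]
sigma = 1.0 / k                   # sigma_k, so Var[Z_k] = 1/k^2 (assumed)

Zk = rng.standard_normal((N, K)) * sigma           # independent Z_k ~ N(0, sigma_k^2)
Z = Zk @ phi                      # realizations of Z(t) = sum_k Z_k phi_k(t)

K_emp = Z.T @ Z / N               # empirical K_Z(s, t) (the process is zero mean)
K_theory = (phi * sigma[:, None] ** 2).T @ phi     # sum_k sigma_k^2 phi_k(s) phi_k(t)
print("max |K_emp - K_theory| =", np.abs(K_emp - K_theory).max())
```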
{X(t)} is stationary if (X(t1 + τ), …, X(tn + τ)) and (X(t1), …, X(tn)) are identically distributed for any time shift τ and any finite set of times t1, …, tn.
{X(t)} is wide-sense stationary (WSS) if the first and second moments are time-shift invariant, that is,
µX(t) = µX(t + τ) for all t, τ, and KX(s, t) = KX(s + τ, t + τ) for all s, t, and τ.
For a WSS process, write RX(τ) for the auto-correlation at lag τ; its Fourier transform is the power spectral density (PSD):
RX(τ) ←F→ SX(f).
Symmetry: RX(τ) = RX(−τ), KX(τ) = KX(−τ), and SX(f) = SX*(f) = SX(−f).
Power: E[|X(t)|²] = RX(0) = ∫_{−∞}^{∞} SX(f) df.
Why "spectral density"? Pass x(t) through an ideal bandpass filter of width ∆f around f0 and call the output y(t; f0, ∆f). Then, as ∆f → 0,
(energy of output y(t; f0, ∆f)) / ∆f → |x̆(f0)|²,
the squared magnitude of the spectrum at f0.
For a deterministic finite-energy signal, the auto-correlation function is a Fourier pair with the energy spectral density:
Rx(t) ≜ ∫_{−∞}^{∞} x(τ) x*(τ − t) dτ ←F→ Ex(f) ≜ |x̆(f)|².
Auto-correlation is effectively the convolution of x(t) with x*(−t).
For power-type signals, truncate to the window [−t0/2, t0/2], i.e., xt0(t) ≜ x(t) for |t| ≤ t0/2 and 0 otherwise, and define the power spectral density
Sx(f) ≜ lim_{t0→∞} (1/t0) Ext0(f).
Equivalently, define the time-average auto-correlation
R̄x(t) ≜ lim_{t0→∞} (1/t0) ∫_{−t0/2}^{t0/2} x(τ) x*(τ − t) dτ ←F→ S̄x(f).
For an important class of random processes called ergodic processes, the long-term time average is equal to the statistical average:
R̄X(t) = E[X(τ)X*(τ − t)] = RX(t) ⇒ S̄X(f) = F{RX(t)} = SX(f),
which is the reason why SX(f) is called the PSD. This identification relies on ergodicity.
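The ergodicity claim is easy to probe numerically. Below is a discrete-time sketch (the MA(1) process Xm = Wm + 0.5·Wm−1 is an illustrative choice, not from the lecture): the time-average auto-correlation from a single long realization should match the ensemble values RX(0) = 1.25, RX(±1) = 0.5, and RX(τ) = 0 for |τ| ≥ 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                     # one long realization (length is arbitrary)
W = rng.standard_normal(n + 1)    # white Gaussian driving noise, unit variance
X = W[1:] + 0.5 * W[:-1]          # MA(1): X_m = W_m + 0.5 W_{m-1} (WSS, ergodic)

def time_avg_autocorr(x, lag):
    """(1/n) sum_m x[m] x[m - lag]: the time-average auto-correlation at this lag."""
    return np.dot(x[lag:], x[:len(x) - lag]) / len(x)

for lag in range(4):              # ensemble values: 1.25, 0.50, 0, 0
    print(f"lag {lag}: time average = {time_avg_autocorr(X, lag):+.4f}")
```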
Passing X(t) through an LTI filter h(t) gives Y(t) = (X ∗ h)(t), with
First moment: µY(t) = (µX ∗ h)(t).
Second moment: KY(s, t) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s − ξ) KX(ξ, τ) h(t − τ) dξ dτ.
If X(t) is WSS, so is Y(t) = (X ∗ h)(t):
First moment: µY = µX h̆(0).
Second moment: KY(τ) = (h ∗ KX ∗ hrv)(τ), where hrv(t) ≜ h(−t).
WSS input ⇒ WSS output; stationary input ⇒ stationary output. In the frequency domain, SY(f) = |h̆(f)|² SX(f).
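A discrete-time sanity check of SY(f) = |h̆(f)|² SX(f); this is a sketch with assumed choices (a 5-tap filter, white unit-variance input, and an averaged periodogram as the PSD estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.25, 0.5, 1.0, 0.5, 0.25])   # an arbitrary LTI filter (assumed)
nfft, trials = 256, 2000

S_est = np.zeros(nfft)
for _ in range(trials):
    x = rng.standard_normal(4096)            # white WSS input: S_X(f) = 1
    y = np.convolve(x, h, mode="valid")      # Y = (X * h)
    S_est += np.abs(np.fft.fft(y[:nfft])) ** 2 / nfft   # one periodogram
S_est /= trials                              # averaged periodogram ~ S_Y(f)

S_theory = np.abs(np.fft.fft(h, nfft)) ** 2  # |h(f)|^2 S_X(f) with S_X = 1
print("max |S_est - S_theory| / max S_theory =",
      np.max(np.abs(S_est - S_theory)) / S_theory.max())
```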
Next, consider two processes and two filters: X1(t) → h1(t) → Y1(t) = (X1 ∗ h1)(t) and X2(t) → h2(t) → Y2(t) = (X2 ∗ h2)(t).
Definition (jointly WSS): {X1(t)} and {X2(t)} are jointly WSS if they are both WSS and the cross-covariance KX1,X2(t + τ, t) ≜ Cov(X1(t + τ), X2(t)) depends on τ only.
If {X1(t)} and {X2(t)} are jointly WSS, then {Y1(t)} and {Y2(t)} are jointly WSS, with
Cross-covariance: KY1,Y2(τ) = (h1 ∗ KX1,X2 ∗ h2,rv)(τ).
Cross PSD: SY1,Y2(f) ≜ F{KY1,Y2(τ)} = h̆1(f) SX1,X2(f) h̆2*(f).
Project X1(t) and X2(t) onto waveforms g1(t) and g2(t):
W1 ≜ ⟨X1(t), g1(t)⟩ = ∫_{−∞}^{∞} X1(t) g1*(t) dt, W2 ≜ ⟨X2(t), g2(t)⟩ = ∫_{−∞}^{∞} X2(t) g2*(t) dt.
Then
Cov(W1, W2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g2(t) KX1,X2(s, t) g1*(s) ds dt.
If {X1(t)} and {X2(t)} are jointly WSS, then KX1,X2(s, t) depends only on the lag τ = s − t, and the double integral collapses:
Cov(W1, W2) = ∫_{−∞}^{∞} KX1,X2(τ) (∫_{−∞}^{∞} g1*(t + τ) g2(t) dt) dτ,
i.e., KX1,X2 integrated against the deterministic correlation of g1 and g2, evaluated at lag 0.
Gaussianity is preserved by these operations:
- Filtering: if Z(t) is a Gaussian process, then U(t) = (Z ∗ h)(t) is also a Gaussian process.
- Projections: V1 = ⟨Z(t), g1(t)⟩ ≜ ∫_{−∞}^{∞} Z(t) g1*(t) dt, …, Vm = ⟨Z(t), gm(t)⟩ are jointly Gaussian.
Back to the noisy channel x(t) → y(t): the noise is some random process, but which one? Physically:
- Noise is the aggregation of many additive "sub-noises" (thermal noise, device imperfection, etc.).
- Sub-noises are zero-mean (WLOG) and statistically independent.
- All sub-noises contribute about the same energy to the total noise.
- Sub-noises are rather stationary in the duration of interest.
These assumptions lead to a Gaussian model via the central limit theorem: for i.i.d. zero-mean Xi with variance σ²,
(1/√n) Σ_{i=1}^{n} Xi →d N(0, σ²) as n → ∞.
The Gaussian model is also a robust choice: among noise models of the same power, Gaussian noise is the most "random" and yields the smallest channel capacity, so we adopt it rather than more complex models.
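The CLT step can be checked numerically; a sketch with uniform sub-noises as an arbitrary non-Gaussian choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000
# zero-mean uniform sub-noises on [-1/2, 1/2]: variance sigma^2 = 1/12
X = rng.uniform(-0.5, 0.5, size=(trials, n))
S = X.sum(axis=1) / np.sqrt(n)               # (1/sqrt(n)) sum_i X_i

sigma = np.sqrt(1 / 12)
print("empirical mean     :", S.mean().round(4))            # ~ 0
print("empirical variance :", S.var().round(4))             # ~ 1/12 ~ 0.0833
print("empirical P(S > 2 sigma):", (S > 2 * sigma).mean())  # ~ Q(2) ~ 0.0228
```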
White Gaussian noise (WGN) W(t): a zero-mean Gaussian process with
KW(τ) = (N0/2) δ(τ) ←F→ SW(f) = N0/2.
Colored Gaussian noise: to obtain a Gaussian process Z(t) with a prescribed PSD SZ(f) ≥ 0, pick a filter h(t) with
h̆(f) = √((2/N0) SZ(f)),
and set Z(t) = (W ∗ h)(t), where W(t) is WGN with PSD N0/2. Then
SZ(f) = (N0/2) |h̆(f)|²,
as desired.
Projections of WGN onto orthonormal waveforms {φi(t)} give i.i.d. Gaussians:
Wi ≜ ⟨W(t), φi(t)⟩, i = 1, …, m,
Cov(Wi, Wj) = ⟨(φj ∗ KW)(t), φi(t)⟩ = ⟨(N0/2) φj(t), φi(t)⟩ = (N0/2) 1{i = j}.
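A numerical approximation of this property; a sketch with assumed discretization (grid step ∆t, per-sample variance (N0/2)/∆t to mimic KW = (N0/2)δ, and a sine basis):

```python
import numpy as np

rng = np.random.default_rng(0)
N0, dt = 2.0, 0.01                 # so N0/2 = 1; grid step (assumed)
t = np.arange(0, 1, dt)
k = np.arange(1, 5)
phi = np.sqrt(2) * np.sin(np.outer(k, np.pi * t))  # 4 orthonormal sines on [0, 1]

trials = 20_000
# discretized WGN: i.i.d. N(0, (N0/2)/dt) samples approximate K_W = (N0/2) delta
W = rng.standard_normal((trials, t.size)) * np.sqrt(N0 / 2 / dt)
V = W @ phi.T * dt                 # projections W_i = <W(t), phi_i(t)>

print("empirical covariance of (W_1, ..., W_4), should be ~ (N0/2) I:")
print(np.cov(V.T).round(3))
```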
Now apply this to the receiver front end. [Figure: down-conversion with noise] The received waveform x(t) + Z(t) is multiplied by √2 cos(2πfct) on the I branch and by −√2 sin(2πfct) on the Q branch; each branch is passed through the filter q(t) and sampled with period T = 1/(2W), producing
{u(I)m + Z(I)m} and {u(Q)m + Z(Q)m}.
[Figure: the same front end applied to the noise alone] Z(t) produces the discrete noise sequences {Z(I)m} and {Z(Q)m}.
The discrete noise samples are projections of Z(t) onto the orthonormal waveforms
ψ(I)m(t) ≜ p(t − mT) √2 cos(2πfct), ψ(Q)m(t) ≜ −p(t − mT) √2 sin(2πfct), m ∈ Z,
i.e., Z(I)m = ⟨Z(t), ψ(I)m(t)⟩ and Z(Q)m = ⟨Z(t), ψ(Q)m(t)⟩. By the WGN projection property,
Z(I)m, Z(Q)m i.i.d. ∼ N(0, N0/2), ∀ m ∈ Z.
Combine I and Q into complex symbols: with um ≜ u(I)m + j u(Q)m,
Vm ≜ um + Zm, Zm ≜ Z(I)m + j Z(Q)m,
where Z(I)m, Z(Q)m i.i.d. ∼ N(0, N0/2). Equivalently, Zm ∼ CN(0, N0), a circularly symmetric complex Gaussian.
Putting it together [Figure: Pulse Shaper → Up Converter → Noisy Channel → Down Converter → Filter + Sampler], the whole chain from the discrete sequence {um} to the received samples reduces to the discrete-time equivalent channel
Vm = um + Zm, Zm i.i.d. ∼ CN(0, N0), ∀ m.
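The equivalent model is now a one-line simulation; a sketch assuming QPSK symbols and an arbitrary N0:

```python
import numpy as np

rng = np.random.default_rng(0)
N0, M = 0.5, 100_000               # noise level and number of symbols (assumed)
A = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # unit-energy QPSK
u = A[rng.integers(0, 4, size=M)]  # transmitted symbols u_m

# Z_m = Z_m^(I) + j Z_m^(Q) with i.i.d. N(0, N0/2) components => Z_m ~ CN(0, N0)
Z = np.sqrt(N0 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
V = u + Z                          # the discrete-time equivalent channel

print("E[|Z_m|^2] (should be ~ N0):", np.mean(np.abs(Z) ** 2).round(4))
```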
Equivalent complex baseband additive noise channel model (per symbol):
V = u + Z.
Detection: the symbol mapper maps the bits to a symbol in the constellation set u ∈ A ≜ {a1, …, aM}; writing u = aθ, detection amounts to computing Θ̂ = φ(V).
This is an instance of a general statistical decision problem [Figure]:
Statistical Experiment: an index θ ∈ Θ selects a distribution Pθ, and we observe X ∼ Pθ.
Decision Making: a decision rule φ produces Θ̂ = φ(X).
Error probability: given θ ∈ Θ and a decision rule φ(·), define
Pe(φ; θ) ≜ Pθ{φ(X) ≠ θ}, θ ∈ Θ.
Minimax formulation: φMinimax ≜ arg min_φ max_θ Pe(φ; θ).
Bayesian formulation: suppose the index is random and distributed as Θ ∼ π. Goal: find a decision rule φ(·) such that the average probability of error is minimized:
Pe(φ) ≜ E_{Θ∼π}[Pe(φ; Θ)], φBayes ≜ arg min_φ Pe(φ).
Minimax formulation: no assumption on the prior distribution of the index. Goal: find a decision rule φ(·) such that the worst-case probability of error is minimized.
Computing the Bayes error:
Pe(φ) = E_{Θ∼π}[Pe(φ; Θ)] = Σ_{θ∈Θ} π(θ) Pe(φ; θ) = Σ_{θ∈Θ} π(θ) Σ_{x : φ(x)≠θ} Pθ(x)
      = 1 − Σ_x Σ_{θ∈Θ} π(θ) Pθ(x) 1{φ(x) = θ}.
For each x, exactly one θ satisfies φ(x) = θ; to maximize the subtracted term in
Pe(φ) = 1 − Σ_x Σ_{θ∈Θ} π(θ) Pθ(x) 1{φ(x) = θ},
we should pick
φ(x) = arg max_{θ∈Θ} π(θ) Pθ(x).
Hence the Bayes-optimal rule is
φBayes(x) = arg max_{θ∈Θ} {π(θ) Pθ(x)} = arg max_{θ∈Θ} {PΘ,X(θ, x)}.
This is a mapping from x to θ: pick the index maximizing the joint probability of (θ, x).
Since PΘ,X(θ, x) = PΘ|X(θ | x) PX(x) and PX(x) does not depend on θ, the same rule can be written as
φBayes(x) = arg max_{θ∈Θ} {PΘ|X(θ | x)},
i.e., pick the θ with the largest a posteriori probability given the observation x. This is the maximum a posteriori (MAP) rule, φMAP.
If the prior is uniform over Θ, the factor π(θ) is constant and drops out:
φBayes(x) = arg max_{θ∈Θ} {π(θ) Pθ(x)} = arg max_{θ∈Θ} {Pθ(x)}.
Writing the likelihood as Pθ(x) ≡ PX|Θ(x | θ), this is the maximum likelihood (ML) rule: φMAP(x) ≡ φML(x) ≜ arg max_{θ∈Θ} Pθ(x).
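A toy discrete example makes the two rules concrete; the prior and likelihood table below are made up for illustration:

```python
import numpy as np

pi = np.array([0.7, 0.3])              # prior over Theta = {0, 1} (made up)
P = np.array([[0.4, 0.4, 0.2],         # P_0(x) for x in {0, 1, 2} (made up)
              [0.1, 0.5, 0.4]])        # P_1(x)

joint = pi[:, None] * P                # pi(theta) P_theta(x)
phi_map = joint.argmax(axis=0)         # MAP: arg max_theta pi(theta) P_theta(x)
phi_ml = P.argmax(axis=0)              # ML : arg max_theta P_theta(x)

def avg_error(phi):
    # Pe(phi) = 1 - sum_x pi(phi(x)) P_{phi(x)}(x)
    return 1 - sum(joint[phi[x], x] for x in range(P.shape[1]))

print("MAP rule:", phi_map, " Pe =", round(avg_error(phi_map), 3))  # 0.30
print("ML  rule:", phi_ml, "  Pe =", round(avg_error(phi_ml), 3))   # 0.45
```

With a non-uniform prior the two rules differ, and MAP achieves the smaller average error, as the derivation above guarantees.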
Back to detection over our channel: V = u + Z, where the symbol u = aθ ∈ A ≜ {a1, …, aM} and
Z ∼ CN(0, N0), i.e., Z ≜ Z(I) + jZ(Q) with Z(I), Z(Q) i.i.d. ∼ N(0, N0/2).
The detector computes Θ̂ = φ(V) and outputs û = aΘ̂. Since the real and imaginary parts of the noise are i.i.d. real Gaussians, we can view the problem as detection in two-dimensional Euclidean space.
More generally, let V = u + Z with u = aθ ∈ A ≜ {a1, …, aM} ⊆ Rⁿ and Z ∼ N(0, σ²In). In words, the n coordinates of u experience i.i.d. Gaussian noise.
Under this model the likelihood of hypothesis θ is
fθ(v) = Π_{i=1}^{n} (1/√(2πσ²)) exp(−(vi − aθ,i)²/(2σ²))
      = (2πσ²)^{−n/2} exp(−Σ_{i=1}^{n} (vi − aθ,i)² / (2σ²))
      = (2πσ²)^{−n/2} exp(−‖v − aθ‖² / (2σ²)).
Therefore
φML(v) = arg max_{θ∈{1,…,M}} fθ(v) = arg min_{θ∈{1,…,M}} ‖v − aθ‖:
in AWGN, ML detection is minimum-distance detection, and the decision regions (D0, D1, … in the figure) are the sets of observations closest to each constellation point.
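A direct simulation of φML as minimum distance; a sketch assuming a 4-point constellation in R², σ = 0.5, and a per-axis Q-function benchmark:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])  # constellation
sigma, trials = 0.5, 200_000

theta = rng.integers(0, len(A), size=trials)                 # true indices
V = A[theta] + sigma * rng.standard_normal((trials, 2))      # V = a_theta + Z

# phi_ML(v) = arg min_theta ||v - a_theta|| (minimum-distance rule)
d2 = ((V[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
theta_hat = d2.argmin(axis=1)

Q = lambda x: 0.5 * erfc(x / sqrt(2))
print("empirical SER:", np.mean(theta_hat != theta))
print("theory 1-(1-Q(1/sigma))^2:", 1 - (1 - Q(1 / sigma)) ** 2)   # ~ 0.045
```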
For a deterministic test φ : X → Θ, the decision region for hypothesis Hθ is
Dθ(φ) ≜ {x ∈ X | φ(x) = θ}.
Examples [Figures], with minimum-distance regions:
- 2-PAM: points −d, d labeled with one bit; the regions are half-lines split at 0.
- 4-PAM: points ±d, ±3d with Gray labels 01, 11, 00, 10 as in the figure; regions D00, D01, D10, D11 are intervals split at the midpoints.
- 16-QAM: a 4×4 grid with 4-bit Gray labels (0110, 1110, 0010, 1010, …, 0100, 1100, 0000, 1000); the regions are squares.
- 8-PSK: points on a circle with Gray labels 000, 001, 011, 111, 101, 100, 110, 010; the regions are angular sectors.
Minimum-distance (MD) rule: φMD(v) = arg min_{θ∈Θ} ‖v − aθ‖. For M = 2, the decision can be reduced to a scalar statistic ṽ, as follows.
In the figure, an observation v with φMD(v) = 2 lies closer to a2 than to a1. Define the midpoint and half-difference of the two points,
ā ≜ (a1 + a2)/2, a2→1 ≜ (a1 − a2)/2,
and project the centered observation onto the unit vector â2→1 ≜ a2→1 / ‖a2→1‖:
ṽ ≜ ⟨v − ā, â2→1⟩.
The MD decision depends on v only through the scalar ṽ.
In the ṽ coordinate, the two constellation points map to
ã1 = (1/2)‖a1 − a2‖, ã2 = −(1/2)‖a1 − a2‖,
so binary minimum-distance detection reduces to a one-dimensional threshold test: decide θ = 1 if ṽ > 0 and θ = 2 if ṽ < 0.
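The scalar reduction can be verified numerically; a sketch with two arbitrary points in R³: thresholding ṽ at 0 gives exactly the same decisions as full minimum distance.

```python
import numpy as np

rng = np.random.default_rng(0)
a1 = np.array([1.0, 2.0, 0.0])     # two arbitrary constellation points in R^3
a2 = np.array([-1.0, 0.0, 1.0])
sigma, trials = 1.0, 100_000

abar = (a1 + a2) / 2               # midpoint a_bar
unit = (a1 - a2) / np.linalg.norm(a1 - a2)   # unit vector along a_{2->1}

theta = rng.integers(1, 3, size=trials)      # true hypothesis in {1, 2}
u = np.where((theta == 1)[:, None], a1, a2)
v = u + sigma * rng.standard_normal((trials, 3))

# full minimum-distance decisions
md = np.where(np.linalg.norm(v - a1, axis=1) < np.linalg.norm(v - a2, axis=1), 1, 2)
# scalar decisions from the projection v_tilde = <v - a_bar, a_hat_{2->1}>
scalar = np.where((v - abar) @ unit > 0, 1, 2)

print("decisions agree:", np.array_equal(md, scalar))
print("error rate:", np.mean(md != theta))
```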
Dimension reduction [Figures]: the original problem
Statistical Experiment (θ ∈ Θ, X ∼ Pθ) → Decision Making (Θ̂ = φ(X))
can be augmented with a dimension-reduction step:
Statistical Experiment (θ ∈ Θ, X ∼ Pθ) → Dimension Reduction (T = f(X)) → Decision Making (Θ̂ = φ̃(T)).
When can we base the decision on T = f(X) instead of X without losing optimality?
Definition: T = f(X) is a sufficient statistic (for θ) if the conditional distribution of X given T does not depend on θ.
Equivalently (the Fisher–Neyman factorization theorem): T = f(X) is a sufficient statistic if and only if the distribution factorizes as
Pθ(x) = gθ(f(x)) h(x)
for some functions gθ(·) and h(·).
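A closing illustration of sufficiency; a sketch assuming Gaussian mean testing (Pθ = N(µθ, σ²) over n i.i.d. samples, with made-up values): a detector using only T = (1/n) Σi Xi makes exactly the same decisions as the ML detector using all of X.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])          # the two hypothesized means (made up)
sigma, n, trials = 2.0, 8, 100_000

theta = rng.integers(0, 2, size=trials)
X = mu[theta][:, None] + sigma * rng.standard_normal((trials, n))

# ML on the full observation X: arg min_theta sum_i (x_i - mu_theta)^2
full = ((X[:, :, None] - mu) ** 2).sum(axis=1).argmin(axis=1)
# detector using only the sufficient statistic T = sample mean
T = X.mean(axis=1)
reduced = np.abs(T[:, None] - mu).argmin(axis=1)

print("decisions agree:", np.array_equal(full, reduced))
print("error rate:", np.mean(full != theta))
```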