Elements of a Nonstochastic Information Theory
Girish Nair
Dept. Electrical & Electronic Engineering, University of Melbourne
LCCC Workshop on Information and Control in Networks, Lund, Sweden, 17 October 2012
In communications, unknown quantities/signals are usually modelled as random variables (rv’s) & random processes, for good reasons:
Large numbers of users/events average out to well-defined distributions & random models – e.g. Gaussian thermal electronic noise, binary symmetric channels, Rayleigh fading, etc.
Each individual phone call/email/download may not be critically important, so the system designer need only seek good performance in an average or expected sense – e.g. bit error rate, signal-to-noise ratio, outage probability.
Recall Shannon's ordinary capacity C, defined operationally: it is the supremum of block-coding rates (1/(t+1)) log₂ |F| over codebooks F of length t+1 whose probability of decoding error ε can be made → 0 as t → ∞; by subadditivity, the limit over t in this definition exists.
In 1956, Shannon also introduced the stricter notion of zero error capacity C₀, that permits a probability of decoding error = 0 exactly. I.e.

  C₀ := sup_{t≥0} sup_{F} (1/(t+1)) log₂ |F| = lim_{t→∞} sup_{F} (1/(t+1)) log₂ |F|,

where F = a finite set of input words of length t+1, & the inner supremums are over all F s.t. ∀ x(0:t) ∈ F, the corresponding channel output word Y(0:t) can be mapped to an estimate X̂(0:t) with Pr(X̂(0:t) ≠ x(0:t)) = 0. Clearly, C₀ is (usually strictly) smaller than C.
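To make this concrete, here is a small worked sketch of mine (not from the talk): a one-shot zero-error codebook F is an independent set of the channel's confusability graph, and Shannon's pentagon ("noisy typewriter") channel is the classic case where C₀ exceeds the one-shot rate.

```python
from itertools import combinations
from math import log2

# Shannon's pentagon channel: input i can be received as i or i+1 (mod 5),
# so inputs i and i+1 are confusable at the receiver.
outputs = {i: {i, (i + 1) % 5} for i in range(5)}

def confusable(a, b):
    # Two inputs are confusable if some output can arise from both.
    return a != b and bool(outputs[a] & outputs[b])

# A one-shot zero-error codebook is an independent set of the
# confusability graph; brute-force the largest one.
best = max(
    (s for r in range(1, 6) for s in combinations(range(5), r)
     if all(not confusable(a, b) for a, b in combinations(s, 2))),
    key=len)
print(best, log2(len(best)))  # e.g. (0, 2) -> 1.0 bit per channel use
# Shannon exhibited 5 zero-error words of length 2, so C0 >= (1/2)log2(5)
# ≈ 1.16 bits; Lovász (1979) proved this value is exact.
```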
An uncertain variable (uv) X is a mapping from some sample space Ω to a space X. The sample ω may represent exogenous inputs entering a system, & X may represent an output/state variable.

Marginal range: [[X]] := {X(ω) : ω ∈ Ω}.
Joint range: [[X,Y]] := {(X(ω), Y(ω)) : ω ∈ Ω}.
Conditional range: [[X|y]] := {X(ω) : Y(ω) = y, ω ∈ Ω}, y ∈ [[Y]].
As in prob. theory, the argument ω will often be omitted.

In the absence of statistical structure, the joint range completely characterises the relationship between uv's X & Y. Since [[X,Y]] = ∪_{y∈[[Y]]} [[X|y]] × {y}, the joint range can be determined from the conditional & marginal ranges, similar to the relationship between joint, conditional & marginal probability distributions.
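A minimal sketch of these definitions (my illustration; the uv's X and Y below are hypothetical): on a finite sample space the three ranges are plain set comprehensions, and the joint range is recoverable from the conditional & marginal ranges exactly as stated above.

```python
# Uncertain variables as plain functions on a finite sample space Omega.
Omega = range(6)
X = lambda w: w % 3          # hypothetical uv X: Omega -> {0, 1, 2}
Y = lambda w: w // 3         # hypothetical uv Y: Omega -> {0, 1}

rX = {X(w) for w in Omega}                      # marginal range [[X]]
rY = {Y(w) for w in Omega}                      # marginal range [[Y]]
rXY = {(X(w), Y(w)) for w in Omega}             # joint range [[X,Y]]
cond = {y: {X(w) for w in Omega if Y(w) == y}   # conditional ranges [[X|y]]
        for y in rY}

# The joint range is the union of conditional ranges paired with each y.
assert rXY == {(x, y) for y in rY for x in cond[y]}
# Here [[X,Y]] = [[X]] x [[Y]], i.e. these particular X and Y are unrelated.
assert rXY == {(x, y) for x in rX for y in rY}
```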
(Figure: shaded joint ranges [[X,Y]] over axes x, y: a) X, Y related; b) X, Y unrelated.)
Candidate nonstochastic information indices:

I₀[X;Y] := inf_{y∈[[Y]]} log₂ (|[[X]]| / |[[X|y]]|), X discrete-valued,
I₀[X;Y] := inf_{y∈[[Y]]} log₂ (μ[[X]] / μ[[X|y]]), X continuous-valued (μ = Lebesgue measure);

T[X;Y] := H₀[X] + H₀[Y] − H₀[X,Y], X, Y finite-valued (H₀ = Hartley entropy);

Tₙ[X;Y] := something complex, (X,Y) cont.-valued w. convex range.

Each gives a different treatment of continuous- & discrete-valued variables:
Klir's information has natural properties, but is ...
Shingin & Ohta's information: inherently ...
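For instance, the discrete form of I₀ can be evaluated directly; the finite joint range below is a hypothetical example of mine, not from the slides.

```python
from math import log2

# A hypothetical finite joint range [[X,Y]].
joint = {(0, 0), (1, 0), (0, 1), (2, 1)}
rX = {x for x, _ in joint}
rY = {y for _, y in joint}

# I0[X;Y] = inf over y of log2( |[[X]]| / |[[X|y]]| )   (discrete case)
I0 = min(log2(len(rX) / len({x for x, yy in joint if yy == y})) for y in rY)
print(I0)  # log2(3/2) ≈ 0.585: the worst-case multiplicative shrinkage
           # of the X-range after observing Y.
```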
Call (x,y), (x',y') ∈ [[X,Y]] taxicab connected if there is a sequence of points (x₁,y₁), …, (xₙ,yₙ) ∈ [[X,Y]] with (x₁,y₁) = (x,y) and (xₙ,yₙ) = (x',y'), in which consecutive points agree in the x- or the y-coordinate.

(Figure, [[X,Y]] = shaded area: one pair (x,y), (x',y') is taxicab connected but disconnected in the usual sense; another pair is taxicab disconnected but connected in the usual sense.)
Suppose X & Y are separately observed by two agents. Let the agents have functions f & g respectively s.t. Z := f(X) = g(Y), so that both can compute Z from their own observation. The values of Z induce a partition of the joint range [[X,Y]]. Taxicab partition T[X;Y] := the [[X,Y]]-partition induced by the most refined such Z; its blocks are the taxicab-connected components of [[X,Y]].

(Figure, [[X,Y]] = shaded area, partitioned into regions labelled z = 0 and z = 1.)

|T[X;Y]| = max. # distinct values that can always be agreed on from separate observations of X & Y. Accordingly, define maximin information I*[X;Y] := log₂ |T[X;Y]|.
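A sketch of how T[X;Y], and hence I*, could be computed for a finite joint range (my construction, not the speaker's algorithm): flood-fill the joint range along shared coordinates and count the components.

```python
from math import log2

def taxicab_partition(joint):
    """Partition a finite joint range [[X,Y]] into taxicab-connected
    components: points are linked if they share an x- or y-coordinate."""
    remaining, blocks = set(joint), []
    while remaining:
        block = {remaining.pop()}
        grew = True
        while grew:  # flood fill: absorb points sharing a coordinate
            new = {p for p in remaining
                   if any(p[0] == q[0] or p[1] == q[1] for q in block)}
            grew = bool(new)
            block |= new
            remaining -= new
        blocks.append(block)
    return blocks

# Hypothetical joint range with two taxicab-connected components.
joint = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 2)}
T = taxicab_partition(joint)
print(len(T), log2(len(T)))  # 2 blocks -> I*[X;Y] = 1 bit: the two agents
# can always agree on which block occurred, and on nothing finer.
```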
For t ≥ 0, write X(0:t) := (X(0), …, X(t)) for the channel input sequence and Y(0:t) for the corresponding output sequence. Theorem: the zero-error capacity coincides with the highest maximin information rate across the channel:

  C₀ = lim_{t→∞} sup_{X(0:t)} (1/(t+1)) I*[X(0:t); Y(0:t)].
The idea of a common (random) variable Z comes from cryptography [Wolf & Wullschleger, ITW2004], where the maximal common rv of X & Y is defined via the discrete bipartite graph describing (x,y) pairs having joint prob. > 0. The taxicab partition extends this idea to continuous and mixed pairs of variables, not representable by discrete graphs.

C₀ was shown by Wolf & Wullschleger to coincide with the maximum Shannon entropy rate over all common rv's Z. However, this is still a probabilistic characterisation.
Setup: plant X(t+1) = A X(t), Y(t) = G X(t), with X(0) a uv. Coder: maps the measurements Y(0:t) to channel inputs S(t) ∈ S. Erroneous channel: maps S(t) to outputs Q(t). No channel feedback.
Estimator: produces X̂(t) from the channel outputs Q(0:t). Given the parameters l, ρ > 0, the estimator achieves:

I) exponential uniformly bounded estimation errors if, for any uv X(0) s.t. ||X(0)|| ≤ l < ∞, limsup_{t→∞} sup_{ω∈Ω} ρ^{−t} ||X(t) − X̂(t)|| < ∞;

II) exponential uniform convergence if, for any uv X(0) s.t. ||X(0)|| ≤ l, sup_{ω∈Ω} ρ^{−t} ||X(t) − X̂(t)|| → 0 as t → ∞.
Assumptions:
DF1: (G, A) restricted to the invariant subspace of A governed by the eigenvalues with |λ| ≥ ρ is observable;
DF2: A has one or more eigenvalues with |λ| > ρ;
DF3: the channel does not depend on the initial plant state, i.e. the output sequence Q(0:t) is conditionally unrelated to X(0) given the channel input sequence S(0:t).
Theorem: If exponential uniformly bounded estimation errors are achieved for some l > 0, then

  C₀ ≥ H_ρ := Σ_{|λᵢ| ≥ ρ} log₂ (|λᵢ| / ρ),   (*)

where the λᵢ are the eigenvalues of A. Conversely, if (*) holds strictly, then for any l an estimator that achieves exponential uniform convergence can be constructed.
Proof, first part: maximin information theory; second part: constructive.
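As a numerical sketch of condition (*) (my example; the plant matrix below is hypothetical):

```python
import numpy as np

def H_rho(A, rho):
    """Intrinsic entropy rate H_rho = sum over |eig| >= rho of
    log2(|eig| / rho), as in condition (*)."""
    eigs = np.linalg.eigvals(A)
    return sum(np.log2(abs(l) / rho) for l in eigs if abs(l) >= rho)

A = np.array([[2.0, 1.0],
              [0.0, 0.5]])   # hypothetical plant, eigenvalues 2 and 0.5
print(H_rho(A, rho=1.0))     # log2(2/1) = 1 bit/sample
# Exponential (rate rho) uniformly bounded errors require C0 >= 1 bit/sample;
# C0 > 1 strictly suffices for exponential uniform convergence, by the
# converse part of the theorem.
```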
Assumptions:
D0: X(t+1) = A X(t) + V(t), Y(t) = G X(t) + W(t), with (G, A) detectable;
D1: A has one or more eigenvalues with |λ| > 1;
D2: the realisations v, w of the disturbances V & W are uniformly bounded in l∞;
D3: the null signals v = 0, w = 0 are valid disturbance realisations;
D4: X(0), V & W are mutually unrelated;
D5: the channel does not depend on the plant states and disturbances, i.e. Q(0:t) is conditionally unrelated to (X(0), V(0:t−1), W(0:t)) given the channel input S(0:t).
Theorem: If uniformly bounded estimation errors are achieved for some l > 0, then

  C₀ ≥ H := Σ_{|λᵢ| ≥ 1} log₂ |λᵢ|.   (**)

Conversely, if (**) holds strictly, then for any l an estimator that achieves uniformly bounded errors can be constructed.
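Note that (**) is just (*) with ρ = 1, so the H_rho sketch above computes H as H_rho(A, 1.0). As a worked instance (my numbers): if A has eigenvalues 2, 1.5 and 0.5, then H = log₂ 2 + log₂ 1.5 ≈ 1.58 bits/sample, so uniformly bounded estimation errors require a channel with C₀ ≥ 1.58 bits/sample.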
In a stochastic setting (i.e. random channel and X(0)) with no
plant noise, it is known that almost-sure asymptotic convergence is possible iff ordinary capacity C > H (Matveev & Savkin 2007). The criterion here is stricter because a law of large numbers cannot be used to average out decoding errors.
If bounded, nonstochastic disturbances are present, Matveev & Savkin showed that a.s. uniformly bounded errors are possible iff C₀ > H; their proof used no information theory.
Formulated a framework for modelling unknown variables without
assuming the existence of distributions
Defined nonprobabilistic analogues of independence & Markovness
Proposed maximin information as a nonstochastic index of the most refined knowledge that can be agreed on from separate observations
Showed that zero-error capacity coincides with the highest maximin
info rate possible across the channel
Used maximin info theory to derive tight conditions for uniform state
estimation of LTI plants
Theorem (GN, to appear in CDC'12): The zero-error feedback capacity C₀F of a stationary memoryless uncertain channel can be expressed in terms of directed maximin information:

  C₀F = lim_{t→∞} sup_{X(0:t), Y(0:t)} (1/(t+1)) Σ_{k=0}^{t} I*[X(k); Y(k) | Y(0:k−1)],

where I*[X;Y|Z] := min_{z∈[[Z]]} log₂ |T[X;Y|z]| is conditional maximin information.
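To unpack the conditional quantity, here is a self-contained sketch (my construction, with a hypothetical finite model): I*[X;Y|Z] restricts the joint range to each z, takes the taxicab partition there, and keeps the worst case over z.

```python
from math import log2

def taxicab_blocks(points):
    # Taxicab-connected components of a finite set of (x, y) points.
    remaining, blocks = set(points), []
    while remaining:
        block = {remaining.pop()}
        grew = True
        while grew:  # flood fill along shared x- or y-coordinates
            new = {p for p in remaining
                   if any(p[0] == q[0] or p[1] == q[1] for q in block)}
            grew = bool(new)
            block |= new
            remaining -= new
        blocks.append(block)
    return blocks

# Hypothetical joint range of (X, Y, Z) triples.
triples = {(0, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}

# I*[X;Y|Z] = min over z in [[Z]] of log2 |T[X;Y|z]|.
I_star_cond = min(
    log2(len(taxicab_blocks({(x, y) for x, y, zz in triples if zz == z})))
    for z in {z for _, _, z in triples})
print(I_star_cond)  # here z = 0 gives 2 blocks (1 bit) but z = 1 links
                    # everything (0 bits), so I*[X;Y|Z] = 0 bits.
```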
References
G. N. Nair, "A nonstochastic information theory for communication and state estimation," http://arxiv.org/abs/1112.3471. (Provisionally accepted by IEEE Trans. Auto. Contr.; short version in Proc. IEEE CDC, 2012.)
S. Wolf and J. Wullschleger, "Zero-error information and applications in cryptography," Proc. IEEE Information Theory Workshop, San Antonio, USA, 2004.
A. S. Matveev and A. V. Savkin, "Shannon zero error capacity in the problems of state estimation and stabilization via noisy communication channels," Int. Jour. Contr., 2007.
A. S. Matveev and A. V. Savkin, "An analogue of Shannon information theory for detection and stabilization via noisy discrete communication channels," SIAM J. Contr. Optim., 2007.