

SLIDE 1

Elements of a Nonstochastic Information Theory

Girish Nair
Dept. Electrical & Electronic Engineering, University of Melbourne

LCCC Workshop on Information and Control in Networks
Lund, Sweden, 17 October 2012

SLIDE 2

Random Variables in Communications

In communications, unknown quantities/signals are usually modelled as random variables (rv's) & random processes, for good reasons:

  • Physical laws governing electronic/photonic circuit noise give rise to well-defined distributions & random models, e.g. Gaussian thermal electronic noise, binary symmetric channels, Rayleigh fading, etc.

  • Telecommunication systems are usually designed to be used many times, & each individual phone call/email/download may not be critically important... The system designer need only seek good performance in an average or expected sense, e.g. bit error rate, signal-to-noise ratio, outage probability.


SLIDE 3

Nonrandom Variables in Control

In contrast, unknowns in control are often treated as nonstochastic variables or signals:

  • Dominant disturbances are not necessarily electronic/photonic circuit noise, & may not follow well-defined probability distributions.

  • Safety- & mission-criticality: performance guarantees are needed every time the plant is used, not just on average.

SLIDE 4

Networked Control

Networked control combines both communications and control theories!

How may nonstochastic analogues of key probabilistic concepts, such as independence, Markovness and information, be usefully defined?

SLIDE 5

Another Motivation: Channel Capacity

The ordinary capacity C of a channel is defined as the highest block-code bit-rate that permits an arbitrarily small probability of decoding error.

I.e.:

$$C := \lim_{\varepsilon \to 0}\, \limsup_{t \to \infty}\, \sup_{\mathcal{F}_t} \frac{\log_2 |\mathcal{F}_t|}{t+1} \overset{\text{(subadditivity)}}{=} \lim_{\varepsilon \to 0}\, \lim_{t \to \infty}\, \sup_{\mathcal{F}_t} \frac{\log_2 |\mathcal{F}_t|}{t+1},$$

where $\mathcal{F}_t$ := a finite set of input words of length $t+1$, & the inner supremums are over all $\mathcal{F}_t$ s.t. $\forall x(0:t) \in \mathcal{F}_t$, the corresponding random channel output word $Y(0:t)$ can be mapped to an estimate $\hat{X}(0:t)$ with $\Pr[\hat{X}(0:t) \neq x(0:t)] \leq \varepsilon$.

SLIDE 6

Information Capacity

Shannon's Channel Coding Theorem essentially gives an information-theoretic characterization of C for stationary memoryless stochastic channels:

$$C = \limsup_{t \to \infty}\, \sup \frac{\mathrm{I}[X(0:t);\, Y(0:t)]}{t+1} = \sup_{t \geq 0}\, \sup \frac{\mathrm{I}[X(0:t);\, Y(0:t)]}{t+1} = \sup \mathrm{I}[X(0);\, Y(0)],$$

where I[·;·] := Shannon's mutual information functional, and the inner supremums are over all random input sequences X(0:t).
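For instance (an illustrative computation, not from the slides), for a binary symmetric channel with crossover probability p, the single-letter supremum above evaluates to the familiar 1 − h(p):

```python
from math import log2

def h(p):
    """Binary (Shannon) entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """C = sup I[X(0); Y(0)] = 1 - h(p) for a binary symmetric channel."""
    return 1.0 - h(p)

print(bsc_capacity(0.1))  # ~0.531 bits/use, achieved by uniform inputs
# For 0 < p < 1 the same channel's zero-error capacity C0 is 0:
# every pair of inputs can produce the same output.
```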

SLIDE 7

Zero-Error Capacity

In 1956, Shannon also introduced the stricter notion of zero-error capacity C0, the highest block-coded bit-rate that permits a probability of decoding error exactly equal to 0. I.e.:

$$C_0 := \limsup_{t \to \infty}\, \sup_{\mathcal{F}_t} \frac{\log_2 |\mathcal{F}_t|}{t+1} = \lim_{t \to \infty}\, \sup_{\mathcal{F}_t} \frac{\log_2 |\mathcal{F}_t|}{t+1},$$

where $\mathcal{F}_t$ = a finite set of input words of length $t+1$, & the inner supremums are over all $\mathcal{F}_t$ s.t. $\forall x(0:t) \in \mathcal{F}_t$, the corresponding channel output word $Y(0:t)$ can be mapped to an estimate $\hat{X}(0:t)$ with $\Pr[\hat{X}(0:t) \neq x(0:t)] = 0$.

Clearly, C0 is (usually strictly) smaller than C.

SLIDE 8

C0 as an “Information” Capacity?

Fact: C0 does not depend on the nonzero transition probabilities of the channel, and can be defined without any probability theory, in terms of the input-output graph that describes permitted channel transitions.

Q: Can we express C0 as the maximum rate of some nonstochastic information functional?

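To make the graph-based nature of C0 concrete, here is a small Python sketch (illustrative code, not from the slides; the channel and function names are invented). It computes the single-use lower bound on C0, namely log2 of the size of the largest set of pairwise non-confusable inputs, for Shannon's "pentagon" channel:

```python
from itertools import combinations
from math import log2

# Permitted transitions of Shannon's pentagon channel:
# input i may be received as i or i+1 (mod 5).
transitions = {i: {i, (i + 1) % 5} for i in range(5)}

def confusable(a, b):
    """Inputs a, b are confusable if some output can arise from both."""
    return bool(transitions[a] & transitions[b])

def max_nonconfusable(inputs):
    """Brute-force the largest set of pairwise non-confusable inputs."""
    for r in range(len(inputs), 0, -1):
        for subset in combinations(inputs, r):
            if all(not confusable(a, b) for a, b in combinations(subset, 2)):
                return set(subset)
    return set()

F = max_nonconfusable(list(transitions))
print(F, log2(len(F)))  # e.g. {0, 2} -> 1.0 bit: a single-use lower bound on C0
```

(Coding over longer blocks can beat the single-use bound; for the pentagon it is known that C0 = ½ log2 5 ≈ 1.16 bits/use.)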

SLIDE 9

Outline

  • (Motivation)
  • Uncertain Variables
  • Taxicab Partitions & Maximin Information
  • C0 via Maximin Information
  • Uniform LTI State Estimation over Erroneous Channels
  • Conclusion, Extensions & Future Work

SLIDE 10

The Uncertain Variable Framework

  • Similar to probability theory, let an uncertain variable (uv) be a mapping X from some sample space Ω to a space 𝒳.

  • E.g., each ω ∈ Ω may represent a particular combination of disturbances & inputs entering a system, & X may represent an output/state variable.

  • For any particular ω, the value x = X(ω) is realised.

[Diagram: a point ω in the sample space Ω is mapped by X to a realisation x in 𝒳.]

  • Unlike prob. theory, no σ-algebra or measure is assumed on Ω.
SLIDE 11

Ranges

As in prob. theory, the ω argument will often be omitted:

  • Marginal range: [[X]] := {X(ω) : ω ∈ Ω} ⊆ 𝒳.
  • Joint range: [[X, Y]] := {(X(ω), Y(ω)) : ω ∈ Ω} ⊆ 𝒳 × 𝒴.
  • Conditional range: [[X|y]] := {X(ω) : Y(ω) = y, ω ∈ Ω}.

In the absence of statistical structure, the joint range completely characterises the relationship between the uv's X & Y.

As [[X, Y]] = ∪_{y ∈ [[Y]]} [[X|y]] × {y}, the joint range can be determined from the conditional & marginal ranges, similar to the relationship between joint, conditional & marginal probability distributions.
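On a finite sample space these ranges can be computed directly as sets. A minimal sketch (the sample space and the uv's X, Y below are invented for illustration):

```python
# Minimal sketch of uv ranges on a finite sample space.
OMEGA = range(8)                # sample space
X = lambda w: w % 4             # uv X: Omega -> {0,1,2,3}
Y = lambda w: w // 4            # uv Y: Omega -> {0,1}

marginal_X = {X(w) for w in OMEGA}            # [[X]]
joint_XY = {(X(w), Y(w)) for w in OMEGA}      # [[X,Y]]

def conditional(y):
    """Conditional range [[X|y]] = {X(w) : Y(w) = y}."""
    return {X(w) for w in OMEGA if Y(w) == y}

# The joint range is the union of [[X|y]] x {y} over y in [[Y]]:
reconstructed = {(x, y) for y in {Y(w) for w in OMEGA} for x in conditional(y)}
assert reconstructed == joint_XY
```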

SLIDE 12

Unrelatedness

  • X, Y are called unrelated if [[X, Y]] = [[X]] × [[Y]], or equivalently if [[X|y]] = [[X]], ∀y ∈ [[Y]].

  • Parallels the definition of mutual independence for rv's.

  • X, Y are called related if [[X, Y]] ⊆ [[X]] × [[Y]] without equality.
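Continuing the illustrative finite-sample-space sketch above, unrelatedness is just a set comparison:

```python
# Illustrative check of unrelatedness: [[X,Y]] == [[X]] x [[Y]].
from itertools import product

OMEGA = range(8)
X = lambda w: w % 4
Y = lambda w: w // 4

joint = {(X(w), Y(w)) for w in OMEGA}
rectangle = set(product({X(w) for w in OMEGA}, {Y(w) for w in OMEGA}))
print("unrelated" if joint == rectangle else "related")   # -> unrelated

Z = lambda w: (X(w) + Y(w)) % 4        # a uv functionally tied to X & Y
joint_XZ = {(X(w), Z(w)) for w in OMEGA}
rect_XZ = set(product({X(w) for w in OMEGA}, {Z(w) for w in OMEGA}))
print("unrelated" if joint_XZ == rect_XZ else "related")  # -> related
```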

SLIDE 13

[Figure: joint ranges [[X, Y]] drawn in the (x, y) plane, with conditional ranges [[Y|x']] and [[X|y']] marked. a) X, Y related: the joint range is a proper subset of [[X]] × [[Y]]. b) X, Y unrelated: the joint range is the full rectangle [[X]] × [[Y]].]
SLIDE 14

Nonstochastic Entropy

The a priori uncertainty associated with a uv X is captured by the Hartley entropy

$$H_0[X] := \log_2 |[[X]]| \in [0, \infty].$$

  • Continuous-valued uv's yield H₀[X] = ∞. For uv's with Lebesgue-measurable range in ℝⁿ, the 0-th order Rényi differential entropy

$$h_0[X] := \log_2 \mu([[X]])$$

(with μ = Lebesgue measure) is more useful.
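A one-line illustration on the invented finite example from before:

```python
from math import log2

OMEGA = range(8)
X = lambda w: w % 4

def hartley_entropy(U, omega=OMEGA):
    """H0[U] = log2 |[[U]]| for a uv U on a finite sample space."""
    return log2(len({U(w) for w in omega}))

print(hartley_entropy(X))  # -> 2.0 bits: X takes 4 distinct values
```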

SLIDE 15

Nonstochastic Information – Previous Definitions

  • H. Shingin & Y. Ohta, NecSys09 (expressed in the uv framework here):

$$\mathrm{I}[X; Y] := \begin{cases} \inf_{y \in [[Y]]} \log_2 \dfrac{|[[X]]|}{|[[X|y]]|}, & X \text{ discrete-valued}, \\[6pt] \inf_{y \in [[Y]]} \log_2 \dfrac{\mu([[X]])}{\mu([[X|y]])}, & X \text{ continuous-valued}. \end{cases}$$

  • G. Klir, 2006:

$$\mathrm{T}[X; Y] := \begin{cases} H_0[X] + H_0[Y] - H_0[X, Y], & X, Y \text{ finite-valued}, \\ \text{something more complex}, & (X, Y) \text{ cont.-valued w. convex range in } \mathbb{R}^n. \end{cases}$$

SLIDE 16

Comments on Previous Definitions

  • Each gives different treatments of continuous- & discrete-valued variables.

  • Klir's information has natural properties, but is purely axiomatic: no demonstrated relevance to problems in communications or control.

  • Shingin & Ohta's information: inherently asymmetric, but shown to be useful for studying control over errorless digital channels.

SLIDE 17

Taxicab Connectivity

  • A pair of points (x, y), (x', y') ∈ [[X, Y]] is called taxicab connected, denoted (x, y) ↔ (x', y'), if there is a finite sequence (x_i, y_i), i = 1, …, n, in [[X, Y]]:

    i) beginning from (x_1, y_1) = (x, y),
    ii) ending in (x_n, y_n) = (x', y'),
    iii) with each point in the sequence differing in at most one coordinate from its predecessor.

  • If z = f(x) = g(y) for some functions f & g (a common variable; see the interpretation below), then every point in such a sequence must yield the same value z as its predecessor, since it shares either its x- or its y-coordinate. By induction, (x, y) & (x', y') yield the same z-value.

SLIDE 18

Taxicab Connectedness Examples

([[X, Y]] = shaded area)

[Figure: three joint ranges in the (x, y) plane. In the first two, (x, y) ↔ (x', y') even though each set is disconnected in the usual sense. In the third, (x, y) and (x', y') are not taxicab connected, despite the set being connected in the usual sense.]

SLIDE 19

Taxicab Partition and Nonstochastic Information

Thm: There is a unique partition T(X; Y) of [[X, Y]] in which:

  a) every pair of points in the same partition set is taxicab connected, but
  b) no pair of points in different partition sets is taxicab connected.

It can be established that T(X; Y) defines the most refined shared data Z that can be unambiguously determined from X or Y alone.

Define maximin information: I*[X; Y] := log₂ |T(X; Y)|.
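For finite joint ranges, T(X; Y) can be computed as the connected components of a graph that links points sharing an x- or y-coordinate. A small sketch (illustrative, not the paper's code):

```python
from math import log2

def taxicab_partition(joint_range):
    """Split a finite joint range [[X,Y]] into taxicab-connected components."""
    points = list(joint_range)
    parent = list(range(len(points)))    # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link points that agree in the x- or the y-coordinate.
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points[:i]):
            if x1 == x2 or y1 == y2:
                parent[find(i)] = find(j)

    blocks = {}
    for i, p in enumerate(points):
        blocks.setdefault(find(i), set()).add(p)
    return list(blocks.values())

def maximin_info(joint_range):
    """I*[X;Y] = log2 |T(X;Y)|."""
    return log2(len(taxicab_partition(joint_range)))

# Two taxicab-connected blocks -> I* = 1 bit.
jr = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 2)}
print(taxicab_partition(jr), maximin_info(jr))
```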

SLIDE 20

Interpretation as a Common/Shared Variable

Suppose X & Y are separately observed by two agents. Let the agents have functions f & g respectively s.t. f(X) = g(Y) =: Z.

  • The agents can unambiguously agree on the value of the common variable Z.

  • The more distinct values Z can take, the more refined is this shared knowledge.

  • The values of Z induce a partition of the joint range [[X, Y]]. Taxicab partition = the [[X, Y]]-partition induced by the most refined common variable Z.
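A tiny illustration (invented example; the block labels are arbitrary): on a joint range with two taxicab-connected blocks, each agent can compute the block containing its own observation, so Z = f(X) = g(Y) is agreed without any communication.

```python
# Joint range with two taxicab-connected blocks: {"low", "high"}.
joint = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 2)}

f = {0: "low", 1: "low", 2: "high", 3: "high"}   # agent observing x
g = {0: "low", 1: "low", 2: "high"}              # agent observing y

# Z := f(X) = g(Y) holds at every point of the joint range.
assert all(f[x] == g[y] for (x, y) in joint)
```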

SLIDE 21

Examples

([[X, Y]] = shaded area)

[Figure: two joint ranges in the (x, y) plane, with regions labelled z = 0 and z = 1. Left: |T(X; Y)| = 2, i.e. a maximum of 2 distinct values can always be agreed on from separate observations of X & Y. Right: |T(X; Y)| = 1, i.e. only 1 value can always be agreed on from separate observations of X & Y.]

SLIDE 22

Some Key Properties of I*

  • Symmetry: I*[X; Y] = I*[Y; X].

  • More Data Can't Hurt: I*[X; Y] ≤ I*[X; Y, W].

  • "Data Processing": If W ↔ X ↔ Y is a Markov uncertainty chain, then I*[W; Y] ≤ I*[W; X].

SLIDE 23

Uncertain Signals & Stationary Memoryless Channels

Def: An uncertain signal X is a mapping from Ω to the space 𝒳^∞ of discrete-time signals x : ℤ₊ → 𝒳.

Def: A stationary memoryless uncertain channel consists of a set-valued transition function T : 𝒳 → 2^𝒴, and the family of all uncertain input-output signal pairs (X, Y) s.t.

$$[[\,Y(k) \mid x(0:k),\, y(0:k-1)\,]] = [[\,Y(k) \mid x(k)\,]] = \mathrm{T}(x(k)), \quad \forall (x, y) \in [[X, Y]],\ k \geq 0.$$

SLIDE 24

Channel Coding Theorem for Zero-Error Communication

Thm: The zero-error capacity C0 of a stationary memoryless uncertain channel coincides with the highest average rate of maximin information possible across it, i.e.

$$C_0 = \sup_{X}\, \limsup_{t \to \infty} \frac{\mathrm{I}^*[X(0:t);\, Y(0:t)]}{t+1} = \lim_{t \to \infty}\, \sup_{X} \frac{\mathrm{I}^*[X(0:t);\, Y(0:t)]}{t+1},$$

where the supremums are over the uncertain input signals X.

Note: C0 is defined operationally, as the largest rate over all block codes that permit unambiguous recovery of the input sequence. This result gives an intrinsic characterization.
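The single-use case can be checked by brute force: maximising I*[X; Y] over input ranges of the pentagon channel from earlier recovers its 1-bit single-use zero-error bound. A sketch under the same illustrative setup:

```python
from itertools import combinations
from math import log2

# Pentagon channel: input i may be received as i or i+1 (mod 5).
T = {i: {i, (i + 1) % 5} for i in range(5)}

def n_taxicab_blocks(F):
    """# of taxicab-connected blocks of [[X,Y]] = {(x,y) : x in F, y in T[x]}."""
    pts = [(x, y) for x in F for y in T[x]]
    parent = {p: p for p in pts}
    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p
    for i, (x1, y1) in enumerate(pts):
        for x2, y2 in pts[:i]:
            if x1 == x2 or y1 == y2:
                parent[find((x1, y1))] = find((x2, y2))
    return len({find(p) for p in pts})

# Maximise I*[X;Y] over all nonempty single-use input ranges F:
best = max(n_taxicab_blocks(F)
           for r in range(1, 6) for F in combinations(range(5), r))
print(best, log2(best))  # -> 2 blocks, i.e. I* = 1.0 bit: the single-use bound on C0
```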

SLIDE 25

Remarks

  • The idea of a common (random) variable Z comes from cryptography [Wolf & Wullschleger, ITW 2004]. There, Z is formally defined by the connected components of the discrete bipartite graph describing (x, y) pairs having joint prob. > 0.

  • Taxicab connectedness generalises this to continuous-valued and mixed pairs of variables, not representable by discrete graphs.

  • C0 was shown by Wolf & Wullschleger to coincide with the maximum Shannon entropy rate over all common rv's Z. However, this is still a probabilistic characterisation.

  • Maximin information coincides with the Hartley entropy of the maximal common rv Z.

SLIDE 26

State Estimation of Disturbance-Free LTI Systems

Plant: X(t+1) = AX(t), Y(t) = GX(t), with X(0) a uv.

Coder: the measurements Y(0:t) are mapped into channel input symbols Q(t); these pass through an erroneous channel whose outputs S(t) feed an estimator producing X̂(t+1) from S(0:t). No channel feedback.

Given the parameters l, ρ > 0, the objectives are:

  I) exponentially uniformly bounded estimation errors: for any uv X(0) s.t. ||X(0)|| ≤ l, limsup_{t→∞} sup_{ω∈Ω} ρ⁻ᵗ ||X(t) − X̂(t)|| < ∞;

  II) exponential uniform convergence: for any uv X(0) s.t. ||X(0)|| ≤ l, sup_{ω∈Ω} ρ⁻ᵗ ||X(t) − X̂(t)|| → 0 as t → ∞.

SLIDE 27

Assumptions

  • DF1: (G, A_ρ) is observable, where A_ρ := A restricted to the invariant subspace governed by the |eigenvalues| ≥ ρ.

  • DF2: A has one or more |eigenvalues| > ρ.

  • DF3: The channel does not depend on the plant initial state, i.e. the output sequence S(0:t) is conditionally unrelated to X(0), given the channel input sequence Q(0:t): X(0) ↔ Q(0:t) ↔ S(0:t).

SLIDE 28

Criterion without Disturbances

If exponentially uniformly bounded estimation errors are achieved for some coder & estimator, then

$$C_0 \geq \sum_{|\lambda_i| \geq \rho} \log_2 \frac{|\lambda_i|}{\rho} =: H_\rho. \qquad (*)$$

Conversely, if (*) holds strictly, then for any l, a coder & estimator that achieves exponential uniform convergence can be constructed.

Proof of first part: maximin information theory. Proof of second part: constructive.
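To make (*) easy to check numerically, here is a small helper (illustrative; numpy assumed, function name invented) that evaluates the threshold H_ρ from A and ρ:

```python
import numpy as np

def required_rate(A, rho=1.0):
    """H_rho = sum of log2(|lambda_i| / rho) over eigenvalues with |lambda_i| >= rho.

    Setting rho = 1 gives sum(log2 |lambda_i|) over the unstable eigenvalues,
    the threshold H reappearing in the criterion with plant disturbances."""
    eigs = np.linalg.eigvals(np.asarray(A, dtype=float))
    return sum(np.log2(abs(lam) / rho) for lam in eigs if abs(lam) >= rho)

A = [[2.0, 1.0],
     [0.0, 0.5]]
print(required_rate(A, rho=1.0))  # -> 1.0 bit/sample (only the eigenvalue 2 counts)
```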

SLIDE 29

LTI State Estimation With Plant Disturbances

Assumptions:

  • D0: X(t+1) = AX(t) + V(t), Y(t) = GX(t) + W(t), and (G, A) is detectable.

  • D1: A has one or more |eigenvalues| > 1.

  • D2: Realisations of V & W are uniformly bounded in norm.

  • D3: The null signals v = w = 0 are valid disturbance realisations.

  • D4: X(0), V & W are mutually unrelated.

  • D5: The channel does not depend on the plant states and disturbances, i.e. the channel output S(0:t) is conditionally unrelated with (X(0), V(0:t−1), W(0:t)), given the channel input Q(0:t): (X(0), V(0:t−1), W(0:t)) ↔ Q(0:t) ↔ S(0:t).

SLIDE 30

Criterion with Disturbances

If uniformly bounded estimation errors are achieved for some coder & estimator, then

$$C_0 \geq \sum_{|\lambda_i| \geq 1} \log_2 |\lambda_i| =: H. \qquad (**)$$

Conversely, if (**) holds strictly, then for any l, a coder & estimator that achieves uniformly bounded estimation errors can be constructed.

SLIDE 31

Remarks

  • In a stochastic setting (i.e. random channel and X(0)) with no plant noise, it is known that almost-sure asymptotic convergence is possible iff the ordinary capacity C > H (Matveev & Savkin 2007). The criterion here is stricter, because a law of large numbers cannot be used to average out decoding errors.

  • If bounded, nonstochastic disturbances are present, they showed that a.s. uniformly bounded errors are possible iff C0 > H. Their proof used no information theory.

SLIDE 32

Conclusion

  • Formulated a framework for modelling unknown variables without assuming the existence of distributions.

  • Defined nonprobabilistic analogues of independence & Markovness.

  • Proposed maximin information as a nonstochastic index of the most refined knowledge that can be agreed on from separate observations of two variables.

  • Showed that zero-error capacity coincides with the highest maximin info rate possible across the channel.

  • Used maximin info theory to derive tight conditions for uniform state estimation of LTI plants.

SLIDE 33

Future Work

  • Channels with input or memory constraints
  • Network maximin information theory
  • Systems with feedback – preliminary results to appear in CDC 2012

SLIDE 34

Extension: Zero-Error Feedback Capacity

Theorem (GN, to appear in CDC '12): The operational zero-error feedback capacity C_{0F} of a stationary memoryless uncertain channel can be expressed in terms of directed maximin information:

$$C_{0F} = \lim_{t \to \infty}\, \sup_{X(0:t),\, Y(0:t)} \frac{1}{t+1} \sum_{k=0}^{t} \mathrm{I}^*[X(k);\, Y(k) \mid Y(0:k-1)],$$

where $\mathrm{I}^*[X; Y \mid Z] := \min_{z \in [[Z]]} \log_2 |\mathrm{T}(X; Y \mid z)|$ is conditional maximin information.
SLIDE 35

Thank You!

References

  • GN, "A nonstochastic information theory for communication and state estimation", http://arxiv.org/abs/1112.3471. (Provisionally accepted by IEEE Trans. Auto. Contr.; short version in Proc. 9th IEEE Int. Conf. Control & Automation, Santiago, Chile, Dec. 2011.)
  • GN, "A nonstochastic information theory for feedback", to appear in Proc. IEEE CDC, Dec. 2012.
  • G. Klir, Uncertainty and Information: Foundations of Generalized Information Theory, Wiley, 2006, ch. 2.
  • H. Shingin and Y. Ohta, "Disturbance rejection with information constraints: Performance limitations of a scalar system for bounded and Gaussian disturbances", Automatica, 2012.
  • S. Wolf and J. Wullschleger, "Zero-error information and applications in cryptography", Info. Theory Workshop, San Antonio, USA, 2004.
  • C.E. Shannon, "The zero-error capacity of a noisy channel", IRE Trans. Info. Theory, vol. 2, 1956.
  • S. Tatikonda and S. Mitter, "Control under communication constraints", IEEE Trans. Auto. Contr., 2004.
  • A.S. Matveev and A.V. Savkin, "Shannon zero error capacity in the problems of state estimation and stabilization via noisy communication channels", Int. Jour. Contr., 2007.
  • A.S. Matveev and A.V. Savkin, "An analogue of Shannon information theory for detection and stabilization via noisy discrete communication channels", SIAM J. Contr. Optim., 2007.
  • J. Massey, "Causality, feedback and directed information", in Int. Symp. Inf. Theory App., 1990.