SLIDE 1 What is Information?∗
Department of Computer Science Purdue University
April 30, 2008
INRIA 2008
∗Participants of Information Beyond Shannon, Orlando, 2005, and J. Konorski, Gdansk, Poland.
SLIDE 2 Outline
- 1. Standing on the Shoulders of Giants . . .
- 2. What is Information?
- 3. Shannon Information
- Beyond Shannon
- Temporal and Darwin Channels
- 4. Physics of Information
- Shannon vs Boltzmann
- Maxwell’s Demon, Szilard’s Engine, and Landauer’s Principle
- 5. Ubiquitous Information (Biology, Chemistry, Physics)
- 6. Today’s Challenges
- 7. Science of Information
SLIDE 3 Standing on the Shoulders of Giants . . .
C.F. von Weizsäcker: “Information is only that which produces information” (relativity). “Information is only that which is understood” (rationality). “Information has no absolute meaning.”
“. . . Information is as much a property of your own knowledge as anything in the message. . . . Information is not simply a physical property of a message: it is a property of the message and your knowledge about it.”
“It from Bit”. (Information is physical.)
“These semantic aspects of communication are irrelevant . . .”
SLIDE 4 Structural and Biological Information
- F. Brooks, jr. (JACM, 50, 2003, “Three Great Challenges for . . . CS ”):
“Shannon and Weaver performed an inestimable service by giving us a definition of Information and a metric for Information as communicated from place to place. We have no theory however that gives us a metric for the Information embodied in structure . . . this is the most fundamental gap in the theoretical underpinning of Information and computer science. . . . A young information theory scholar willing to spend years on a deeply fundamental problem need look no further.”
“The differentiable characteristic of the living systems is Information. Information assures the controlled reproduction of all constituents, thereby ensuring conservation of viability . . . . Information theory, pioneered by Claude Shannon, cannot answer this question . . . in principle, the answer was formulated 130 years ago by Charles Darwin.”
SLIDE 5
What is then Information?
Information has the flavor of:
- relativity (depends on the activity undertaken),
- rationality (depends on the recipient’s knowledge),
- timeliness (temporal structure),
- space (spatial structure).
Informally speaking: a piece of data carries information if it can impact a recipient’s ability to achieve the objective of some activity within a given context.
Using the event-driven paradigm, we may formally define:
Definition 1. The amount of information (in a faultless scenario) info(E) carried by the event E in the context C, as measured for a system with the rules of conduct R, is
infoR,C(E) = cost[objectiveR(C(E)), objectiveR(C(E) + E)],
where the cost (weight, distance) is taken according to the ordering of points in the space of objectives.
SLIDE 6 Example: Decimal Representation
Example 1: In a decimal representation of π, the objective is to learn the number π, and P is to compute successive digits approximating π. Imagine we are drawing circles of circumferences 3, 3.1, 3.14, 3.141, etc., and measuring the respective diameters, i.e., .9549, .9868, .9995, .9998, which approach the ideal 1.
Information is the difference between successive deviations from the ideal 1. For example:
- event ”3” carries (1 − 0) − (1 − .9549) = .9549,
- event ”1” carries (1 − .9549) − (1 − .9868) = .0319,
- event ”4” carries (1 − .9868) − (1 − .9995) = .0127, etc.
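The digit-by-digit computation above can be sketched in a few lines (the function name is ours, not from the slide): each new approximation of the circumference yields a diameter, and the information carried by the newest digit is the drop in the deviation from the ideal diameter 1.

```python
import math

def digit_info(approximations):
    """Information carried by each successive approximation of pi,
    measured as the decrease in the diameter's deviation from the ideal 1."""
    infos = []
    prev_deviation = 1.0  # before any digit, the deviation is 1 - 0 = 1
    for approx in approximations:
        deviation = 1 - approx / math.pi  # diameter = circumference / pi
        infos.append(prev_deviation - deviation)
        prev_deviation = deviation
    return infos

print(digit_info([3, 3.1, 3.14]))
```

The three values printed match the slide: roughly .9549, .0319, and .0127.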
SLIDE 7 Example: Distributed Information
- 1. Example 2: In an N-threshold secret sharing scheme, N subkeys of the
decryption key roam among A × A stations.
- 2. By protocol P, a station has access only if it sees all N subkeys, i.e.,
it is within a distance D of all subkeys.
[Figure: an A × A grid of stations; ⋆ marks subkey positions and x marks stations within distance D of the subkeys.]
- 3. Assume that the larger N,
the more valuable the secrets. We define the amount of information as info= N × {# of stations having access} .
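This measure can be sketched directly; the grid size, the Chebyshev distance, and the function name below are illustrative assumptions, not part of the scheme itself.

```python
def distributed_info(A, subkeys, D):
    """info = N * (# of stations within distance D of every subkey).
    subkeys: list of (row, col) positions of the N roaming subkeys.
    Distance is taken as Chebyshev (grid) distance for illustration."""
    N = len(subkeys)
    having_access = 0
    for r in range(A):
        for c in range(A):
            if all(max(abs(r - sr), abs(c - sc)) <= D for sr, sc in subkeys):
                having_access += 1
    return N * having_access

# One subkey at the center of a 5x5 grid, D = 1: a 3x3 block of 9 stations
print(distributed_info(5, [(2, 2)], 1))
```

With two adjacent subkeys the accessible region shrinks but N grows, so the two effects trade off exactly as the definition intends.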
SLIDE 8 Outline Update
- 1. Standing on the Shoulders of Giants . . .
- 2. What is Information?
- 3. Shannon Information
- 4. Physics of Information
- 5. Ubiquitous Information
- 6. Today’s Challenges
SLIDE 9 Shannon Information . . .
In 1948 C. Shannon created a powerful and beautiful theory of information that served as the backbone to a now classical paradigm of digital communication. In our setting, Shannon defined:
Ignorance: statistical ignorance of the recipient; uncertainty: statistical uncertainty of the recipient.
Cost: # of binary decisions to describe E; = − log P(E), P(E) being the probability of E.
Context: the semantics of data is irrelevant . . .
Self-information of Ei: info(Ei) = − log P(Ei).
Average information: H(P) = − Σ_i P(Ei) log P(Ei).
Entropy of X = {E1, . . .}: H(X) = − Σ_i P(Ei) log P(Ei).
Mutual Information: I(X; Y) = H(Y) − H(Y|X) (faulty channel).
Shannon’s statistical information tells us how much a recipient of data can reduce their statistical uncertainty by observing the data. Shannon’s information is not absolute information, since P(Ei) (prior knowledge) is a subjective property of the recipient.
SLIDE 10 Shortest Description, Complexity
Example: X can take eight values with probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64).
Assign to them the following code: 0, 10, 110, 1110, 111100, 111101, 111110, 111111.
The entropy of X is H(X) = 2 bits. The shortest description (on average) is 2 bits.
In general, if X is a (random) sequence with entropy H(X) and average code length L(X), then H(X) ≤ L(X) ≤ H(X) + 1.
Complexity vs Description vs Entropy: the more complex X is, the longer its description, and the bigger the entropy.
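Both quantities in this example can be checked directly; a minimal sketch (the codeword lengths are read off the code listed above):

```python
import math

# Probabilities from the slide and the lengths of the assigned codewords
# 0, 10, 110, 1110, 111100, 111101, 111110, 111111.
probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
lengths = [1, 2, 3, 4, 6, 6, 6, 6]

entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))

print(entropy, avg_len)  # both equal 2.0 bits
```

Here the code meets the entropy exactly because every probability is a power of two; in general only H(X) ≤ L(X) ≤ H(X) + 1 is guaranteed.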
SLIDE 11
Three Jewels of Shannon
Theorem 1. [Shannon 1948; Lossless Data Compression]. compression bit rate ≥ source entropy H(X).
(There exists a codebook of size 2^{nR} of codes of length n with R > H(X) and probability of error smaller than any ε > 0.)
Theorem 2. [Shannon 1948; Channel Coding] In Shannon’s words: It is possible to send information at the capacity through the channel with as small a frequency of errors as desired by proper (long) encoding. This statement is not true for any rate greater than the capacity.
(The maximum codebook size N(n, ε) for codelength n and error probability ε is asymptotically equal to: N(n, ε) ∼ 2^{nC}.)
Theorem 3. [Shannon 1948; Lossy Data Compression]. For distortion level D: lossy bit rate ≥ rate distortion function R(D).
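Theorem 3 can be illustrated with the classical closed form for a Bernoulli(p) source under Hamming distortion, R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p) (a standard result, not stated on the slide); a minimal sketch:

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_bernoulli(p, D):
    """R(D) = H(p) - H(D) for a Bernoulli(p) source, Hamming distortion."""
    if D >= min(p, 1 - p):
        return 0.0  # beyond this distortion, zero rate suffices
    return Hb(p) - Hb(D)

print(rate_distortion_bernoulli(0.5, 0.1))
```

As expected, R(0) recovers the lossless rate H(p), and R(D) falls to zero once the tolerated distortion reaches min(p, 1 − p).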
SLIDE 12 Rissanen’s MDL Principle
1. Objective(P, C) may include the cost of the very recognition and interpretation of C. 2. In 1978 Rissanen introduced the Minimum Description Length (MDL) principle (Occam’s Razor) postulating that the best hypothesis is the one with the shortest description.
- 3. Universal data compression is used to realize MDL.
- 4. Normalized maximum likelihood (NML) code: Let Mk = {Qθ : θ ∈ Θ} and let θ̂ minimize − log Qθ(x). The minimax regret is
r*_n(M) = min_Q max_x [ log Q_θ̂(x)/Q(x) ] = log Σ_x Q_θ̂(x) = log Σ_x sup_θ Qθ(x).
Rissanen proved for memoryless and Markov sources:
r*_n(Mk) = (k/2) ln(n/(2π)) + ln ∫_Θ √|I(θ)| dθ + o(1),
where I(θ) is the Fisher information.
- 5. Why restrict the analysis to prefix codes? What is the fundamental lower bound? For one-to-one codes (cf. W.S., ISIT, 2005): redundancy = −(1/2) log n + O(1)?
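For the Bernoulli model (k = 1) the NML sum can be computed exactly and compared against Rissanen’s asymptotic formula, using the known value ∫₀¹ √|I(θ)| dθ = π for the Fisher information I(θ) = 1/(θ(1 − θ)). A sketch (function names are ours):

```python
import math

def nml_regret(n):
    """Exact minimax (NML) regret for the Bernoulli model, in nats:
    log of sum over x of sup_theta Q_theta(x), grouped by # of ones k."""
    total = sum(math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                for k in range(n + 1))
    return math.log(total)

def rissanen_asymptotic(n, k=1):
    """(k/2) ln(n / 2 pi) + ln integral sqrt(|I|); the integral is pi here."""
    return (k / 2) * math.log(n / (2 * math.pi)) + math.log(math.pi)

n = 1000
print(nml_regret(n), rissanen_asymptotic(n))
```

The two values agree to a few hundredths at n = 1000, and the gap (the o(1) term) shrinks as n grows.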
SLIDE 13
Beyond Shannon
Participants of the 2005 Information Beyond Shannon workshop realized:
- Delay: In networks, the delay incurred is an issue not yet addressed in information theory (e.g., complete information arriving late may be useless).
- Space: In networks the spatially distributed components raise fundamental issues of limitations in information exchange, since the available resources must be shared, allocated and re-used. Information is exchanged in space and time for decision making; thus timeliness of information delivery, along with reliability and complexity, constitutes the basic objective.
- Structure: We still lack measures and meters to define and appraise the amount of information embodied in structure and organization.
- Semantics: In many scientific contexts, one is interested in signals without knowing precisely what these signals represent. What is semantic information and how do we characterize it? How much more semantic information is there compared with its syntactic information?
- Limited Computational Resources: In many scenarios, information is limited by the available computational resources (e.g., a cell phone, a living cell).
- Physics of Information: Information is physical (J. Wheeler).
SLIDE 14 Some Things to Think About . . .
Here is a short list of “toy problems” to think about:
- Temporal Capacity (e.g., assign transmission time to each symbol or a
block of symbols).
- Spatial Capacity (e.g., destination may be in different locations).
- Darwin Channel that models the flow of genetic information (e.g.,
a combination of a deletion/insertion channel and a constrained channel).
- Distributed Information (information here/local and there/distributed is
not the same?)
- Speed of Information (how fast can information spread?)
- Entropy of a structure (e.g., graph entropy)
- Representation-invariant measure of information (Shannon, 1953).
SLIDE 15 Temporal Capacity
- 1. Binary symmetric channel (BSC): each bit incurs a delay.
- 2. Delay T has known probability distribution: F (t) = P (T < t).
If a bit arrives after a given deadline τ, it is dropped.
- 3. The longer it takes to send a bit, the lower the probability of success,
which we denote by Φ(ε, t) for t < τ (e.g., Φ(ε, t) = (1 − ε)^t).
- 4. ∫_0^τ Φ(ε, t) dF(t): probability of a successful transmission. The channel becomes
P(y|x) = α := 1 − F(τ), if y = erasure;
P(y|x) = P(x|x), if y = x;
P(y|x) = 1 − α − P(x|x), if y ≠ x.
- 5. Define α = 1 − F(τ) and ρ := P(x|x)/(1 − α).
Note C(τ) := H(Y) − H(Y|X), where H(Y|X) = H(α) + (1 − α)H(ρ) and H(Y) = H(α) + (1 − α)H(pρ + (1 − p)(1 − ρ)). Then:
[Plot: C(τ) as a function of the deadline τ.]
C(τ) = [1 − P(T > τ)][1 − H(ρ)].
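This capacity can be evaluated numerically. The sketch below assumes, for illustration only, an exponential delay F(t) = 1 − e^(−λt) and the success probability Φ(ε, t) = (1 − ε)^t suggested above; the function names are ours.

```python
import math

def H(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def temporal_capacity(tau, eps, lam):
    """C(tau) = (1 - alpha)(1 - H(rho)) for a BSC with deadline tau,
    exponential delay F(t) = 1 - exp(-lam t), and Phi(eps, t) = (1-eps)**t."""
    alpha = math.exp(-lam * tau)  # P(T > tau): bit dropped (erasure)
    a = lam - math.log(1 - eps)
    # P(x|x) = closed form of the integral of Phi(eps, t) dF(t) over [0, tau]
    p_correct = lam / a * (1 - math.exp(-a * tau))
    rho = p_correct / (1 - alpha)
    return (1 - alpha) * (1 - H(rho))

print(temporal_capacity(2.0, 0.1, 1.0))
```

With ε = 0 the noise vanishes and C(τ) reduces to 1 − P(T > τ), the probability of meeting the deadline, as the formula predicts.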
SLIDE 16 Darwin Channel
To capture sources of variation and natural selection, one is tempted to introduce the so-called Darwin channel that models the flow of genetic
information. (A special case of it is the noisy constrained channel.)
[Diagram: X ∈ S → noisy channel (mutation) → selection function (surviving) → Y ∈ S; symbols outside S are dropped.]
SLIDE 17 Outline Update
- 1. Standing on the Shoulders of Giants . . .
- 2. What is Information?
- 3. Shannon Information
- 4. Physics of Information
- Shannon vs Boltzmann
- Maxwell’s Demon, Szilard’s Engine, and Landauer’s Principle
- 5. Ubiquitous Information
- 6. Today’s Challenges
- R. Feynman (Lectures on Computation):
. . . information is proportional to the free energy required to reset a “tape” (message) to a fixed state . . .
SLIDE 18 Clausius and Boltzmann Entropies
- R. Clausius in 1850 defined entropy as dS = dQ/T, where Q is heat and T temperature.
Boltzmann in 1877 defined the statistical entropy S as S = k log W, where W is the number of microstates of the molecules, and k is Boltzmann’s constant (to get the correct units). How do we interpret Boltzmann’s entropy? Boltzmann wanted to find out: how are molecules distributed? How are the Clausius, Shannon and Boltzmann entropies related (cf. Brillouin, Jaynes, Tribus)?
SLIDE 19 Boltzmann → Shannon
Divide space into m cells each containing Nk molecules with energy Ek. How many configurations, W , are there?
[Diagram: m cells; cell k contains Nk molecules, each with energy Ek.]
W = N! / (N1! N2! · · · Nm!), subject to N = Σ_{i=1}^m Ni and E = Σ_{i=1}^m Ni Ei.
Boltzmann asked: which distribution is the most likely to occur? Boltzmann’s answer: the most probable distribution is the one that occurs in the greatest number of ways! Solving the constrained optimization problem, we find
log W ≈ −N Σ_{i=1}^m (Ni/N) log(Ni/N) = N H(P), for Pi = Ni/N = e^{−βEi}/Z = e^{−βEi} / Σ_i e^{−βEi},
where β = 1/(kT) is the Lagrange multiplier and Z = Σ_i e^{−βEi} is the partition function.
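Boltzmann’s claim, that the exponential distribution maximizes the number of configurations (equivalently H(P)) under the constraints, can be spot-checked numerically: any perturbation that preserves both normalization and mean energy must lower the entropy. The three-level system and β value below are illustrative assumptions.

```python
import math

def boltzmann(energies, beta):
    """Boltzmann distribution P_i = exp(-beta E_i) / Z."""
    Z = sum(math.exp(-beta * E) for E in energies)
    return [math.exp(-beta * E) / Z for E in energies]

def entropy(P):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in P if p > 0)

energies = [0.0, 1.0, 2.0]
P = boltzmann(energies, beta=1.0)

# The direction (1, -2, 1) preserves normalization (components sum to 0)
# and mean energy (0*1 + 1*(-2) + 2*1 = 0), so Q stays feasible.
for delta in (1e-3, 1e-2):
    Q = [P[0] + delta, P[1] - 2 * delta, P[2] + delta]
    assert entropy(Q) < entropy(P)  # any feasible move lowers the entropy

print(P, entropy(P))
```

The strict decrease reflects the strict concavity of H: the Boltzmann distribution is the unique constrained maximizer.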
SLIDE 20 Shannon → Clausius
We start from
S = k log W = −k Σ_i Pi log Pi, where Pi = Ni/N = e^{−βEi}/Z, β = 1/(kT).
But
dS = −k Σ_i (dPi log Pi + dPi) = −k Σ_i dPi log Pi, since d Σ_i Pi = 0.
Observe that log Pi = −βEi − log Z, hence
dS = kβ Σ_i Ei dPi = dQ/T,
where dQ = Σ_i Ei dPi represents the heat transferred from the outside.
SLIDE 21
Maxwell’s Demon
Second Law of Thermodynamics: the total entropy of any thermodynamically isolated system tends to increase over time, that is, ∆S ≥ 0.
Information theory analogue (T. Cover), for a Markov source Xn:
(i) with uniform stationary distribution: H(Xn+1) ≥ H(Xn);
(ii) for a stationary Markov source: H(Xn+1|X1) ≥ H(Xn|X1).
Is the Second Law of Thermodynamics violated by Maxwell’s Demon?
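Cover’s statement (i) is easy to check numerically: with a doubly stochastic transition matrix the stationary distribution is uniform, and the entropy H(Xn) never decreases. The matrix and starting distribution below are arbitrary illustrative choices.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def step(p, P):
    """One Markov transition: p' = p P for a row-stochastic matrix P."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Doubly stochastic (rows and columns sum to 1): uniform is stationary.
P = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
p = [0.9, 0.05, 0.05]  # far-from-uniform start

hs = []
for _ in range(10):
    hs.append(entropy(p))
    p = step(p, P)
print(hs)  # non-decreasing, approaching log2(3)
```

Dropping double stochasticity breaks the analogy: a chain with a non-uniform stationary distribution can lose entropy as it converges.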
SLIDE 22 Szilard’s Engine
[Figure: Szilard’s engine cycle, panels (a)–(d); the extracted work satisfies Q = W.]
Szilard’s Engine: (Acquiring) Information ⇒ Energy. Energy: E = NkT ln(VF/VI). In Szilard’s engine there is one molecule and VF/VI = 2, hence E = kT ln 2. 1 bit of information = kT ln 2 (joules) of energy.
SLIDE 23
Landauer’s Principle: Limits of Computations
Landauer’s Principle (1961): Rolf Landauer argued in 1961 that ”any logically irreversible manipulation of information (e.g., erasure of a bit or the merging of two computations) is also physically irreversible and must be accompanied by a corresponding entropy increase . . .”. Information erasure ≡ the amount of energy to reset a system to zero (asymmetry of resetting). Information is there only if we randomize the molecule. By Szilard: the energy equals T∆S = kT log(VF/VI) = kT log 2. By Boltzmann: the energy equals T∆S = kT log W = kT log 2. von Neumann-Landauer Bound: irreversible computation costs ≥ kT ln 2 (joules) per bit.
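The bound is tiny but concrete: at room temperature, erasing one bit costs at least about 2.9 × 10^−21 joules. A minimal sketch (the function name is ours):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def landauer_bound(T):
    """Minimum energy in joules to erase one bit at temperature T kelvin."""
    return k_B * T * math.log(2)

print(landauer_bound(300))  # ~2.87e-21 J at room temperature
```

For scale, a billion bit-erasures per second at this limit would dissipate only a few attowatts; real hardware sits many orders of magnitude above the bound.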
SLIDE 24 Maxwell’s Demon Explained
- R. Feynman (Lectures on Computation):
. . . information is proportional to the free energy required to reset a “tape” (message) to a fixed state . . .
How much work (fuel) can be extracted from a tape? Let I be the information and N the number of bits (molecules): work = (N − I) kT log 2. Maxwell’s demon explained: C.H. Bennett observed that to determine what side of the gate a molecule must be on, the demon must store information about the state of the molecule. Eventually(!) the demon will run out of information storage space and must begin to erase the information, and by Landauer’s principle this will increase the entropy of the system.
SLIDE 25 Bennett’s Argument
Figure 1: Bennett’s argument: erasure of information, not measurement, is the source of entropy generation.
SLIDE 26 Outline Update
- 1. Standing on the Shoulders of Giants . . .
- 2. What is Information?
- 3. Shannon Information
- 4. Physics of Information
- 5. Ubiquitous Information (Biology, Chemistry, Economics, Physics)
- 6. Today’s Challenges
SLIDE 27 Ubiquitous Information (Biology)
Life is a delicate interplay of energy, entropy, and information; essential functions of living beings correspond to the generation, consumption, processing, preservation, and duplication of information.
- How information is generated and transferred through underlying
mechanisms of variation and selection (Darwin channel).
- How information in biomolecules (sequences and structures) relates to
the organization of the cell.
- Whether there are error correcting mechanisms (codes) in biomolecules.
- How organisms survive and thrive in noisy environments.
SLIDE 28
Ubiquitous Information (Chemistry)
In chemistry, information may be manifested in shapes and structures. Amorphous solids: no long-range order. Crystalline solids: long-range atomic order. By how much are amorphous solids more complex than crystalline solids? Structural Information. Distinctiveness of nodes: {a}, {b, c}, {d, e}. Automorphism group and orbits.
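The orbit partition {a}, {b, c}, {d, e} can be computed by brute force for a small graph. The five-node tree below is our own illustrative example chosen to produce exactly these orbits; the slide does not specify the graph.

```python
from itertools import permutations

# Hypothetical 5-node tree: a joined to b and c; pendant d on b, pendant e on c.
nodes = ["a", "b", "c", "d", "e"]
edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")]}

def automorphisms(nodes, edges):
    """All node permutations that map the edge set onto itself."""
    autos = []
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        if {frozenset({mapping[u], mapping[v]}) for u, v in edges} == edges:
            autos.append(mapping)
    return autos

def orbits(nodes, autos):
    """Orbit of v = the set of all images of v under the automorphism group."""
    return {v: frozenset(a[v] for a in autos) for v in nodes}

print(orbits(nodes, automorphisms(nodes, edges)))
```

Here the group has order 2 (identity and the swap b↔c, d↔e), and the orbits are precisely {a}, {b, c}, {d, e}: fewer orbits means more symmetry and, intuitively, less structural information.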
SLIDE 29 Quantum Information
The laws of nature . . . in quantum theory . . . deal with . . . our knowledge of the elementary particles. The concept of objective reality . . . evaporates into . . . our knowledge of its behavior.
. . . any attempt to measure [that] property destroys (at least partially) the influence of earlier knowledge of the system.
. . . the laws of physics are limited by the range of information processing available. . . . reality and information are two sides of the same coin, that is, they are in a deep sense indistinguishable.
SLIDE 30 Limited Information Resources in Quantum
Quantum physics is a theory of information for systems with limited information resources (Brukner & Zeilinger, 2006). Brukner and Zeilinger (2001, 2006) postulate:
- 1. The information content of a quantum system is finite.
- . . . randomness is a direct consequence of the fact that not enough
information is available to pre-define the outcomes of all possibilities.
- Complementarity: the information available suffices to define the
outcomes of mutually complementary measurements.
- Entanglement is a consequence of finite information available to
characterize only joint observations (Zeilinger, Nature, 2005).
- 2. The most elementary system represents the truth value of one
proposition.
- 3. The most elementary system carries 1 bit of information.
- 4. N elementary systems carry N bits of information.
SLIDE 31
Shannon Postulates for Entropy
SLIDE 32 Brukner-Zeilinger Experiment
In quantum mechanics events do not necessarily commute, thus H(A, B) ≠ H(B, A) (!) and potentially H(B) < H(B|A) (!). Brukner & Zeilinger suggest defining the total information in the quantum setting as
I(p1, . . . , pn) = Σ_{i=1}^n (pi − 1/n)² = Σ_i pi² − 1/n = 2^{−H2(p)} − 1/n,
so that the sum of the individual measures over mutually complementary measurements is invariant under unitary transformations.
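The chain of equalities above is pure algebra and can be verified directly; H2 is the Rényi entropy of order 2, and the function names are ours.

```python
import math

def total_information(p):
    """Brukner-Zeilinger measure: I(p) = sum_i (p_i - 1/n)^2."""
    n = len(p)
    return sum((pi - 1 / n) ** 2 for pi in p)

def via_renyi(p):
    """Equivalent form 2**(-H2(p)) - 1/n, with H2 the Renyi 2-entropy."""
    n = len(p)
    H2 = -math.log2(sum(pi ** 2 for pi in p))
    return 2 ** (-H2) - 1 / n

p = [0.5, 0.25, 0.125, 0.125]
print(total_information(p), via_renyi(p))  # equal: 0.09375
```

The measure vanishes exactly on the uniform distribution (complete ignorance) and peaks at 1 − 1/n on a point mass, matching its intended reading as “total information”.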
SLIDE 33 Law of Information?
The flow of information about an object into its surroundings is called decoherence (it increases entanglement with the environment) [H. Zeh, W. Zurek]. Decoherence occurs very, very, very fast, in 10^−10 to 10^−20 seconds. The essential difference between the microscopic (quantum) world and the macroscopic world is decoherence. Entropy and decoherence are related, but while entropy operates on a time scale of microseconds, decoherence works a billion times faster. A new law of Information(?): Information can be neither created nor destroyed, yet the stored information of any “isolated system” tends to dissipate.
SLIDE 34 Today’s Challenges
- We still lack measures and meters to define and appraise the amount of
structure and organization embodied in artifacts and natural objects.
- Information accumulates at a rate faster than it can be sifted through,
so that the bottleneck, traditionally represented by the medium, is drifting towards the receiving end of the channel.
- Timeliness, space and control are important dimensions of Information.
Time and space varying situations are rarely studied in Shannon Information Theory.
- In a growing number of situations,
the overhead in accessing Information makes the information itself practically unattainable or obsolete.
- Microscopic systems do not seem to obey Shannon’s postulates of
Information. In the quantum world and on the level of living cells, traditional Information often fails to accurately describe reality.
- What is the impact of rational/noncooperative behavior on information?
What is the relation between the value of information and information itself?
SLIDE 35 Science of Information
I N F O R M A T I O N
- Information embodied in structures (chemistry, physics)
- Value of information (economics)
- Information transfer in the life sciences
- Temporal & spatial information (wireless, brain)
- Information communication (information theory)
- Information & knowledge (meaning, shapes, Kolmogorov)
SLIDE 36 Institute for Science of Information
At Purdue we initiated the Institute for Science of Information, integrating research and teaching activities aimed at investigating the role of information from various viewpoints: from the fundamental theoretical underpinnings of information to the science and engineering of novel information substrates, biological pathways, communication networks, economics, and complex social systems. The specific means and goals for the Center are:
- continue the Prestige Science Lecture Series on Information to collectively
ponder short and long term goals;
- study dynamic information theory that extends information theory to
time–space–varying situations;
- advance information algorithmics that develop new algorithms and
data structures for the application of information;
- encourage and facilitate interdisciplinary collaborations;
- provide scholarships and fellowships for the best students, and support
the development of new interdisciplinary courses.