SLIDE 1
Towards an information-theoretic model
A canonical measure of dependence for a special mixture distribution
Lachlan J. Gunn¹, François Chapeau-Blondeau², Andrew Allison¹, Derek Abbott¹
¹School of Electrical and Electronic Engineering, The University of Adelaide
²Laboratoire Angevin de Recherche en Ingénierie des Systèmes (LARIS), University of Angers
SLIDE 2
Mixture processes
▶ Many systems can be split into independent multiplexed
subsystems.
[Diagram: a switch S selects between the input processes X₁ and X₂ to form the output Y.]
▶ Such mixture processes arise in a wide range of applications.
SLIDE 3 Applications of mixture processes
▶ Radar—sea clutter can be modelled with a KK-distribution
p_{KK}(x) = (1 - k)\,p_K(x; \nu_1, \sigma_1) + k\,p_K(x; \nu_2, \sigma_2)

[Photos: a rolling sea, (1 − k) of the time (Graham Horn), and an occasional spike in the return (Malene Thyssen).]
Yunhan Dong, DSTO-RR-0316 “Distribution of X-Band High Resolution and High Grazing Angle Sea Clutter” (2006)
SLIDE 4 Applications of mixture processes
▶ Speech—random signals corresponding to phonemes are joined to form the sound of words, e.g. "bat" = /b/ /æ/ /t/.
[Diagram: the vocal tract: lips, teeth, tongue (tip and blade), alveolar ridge, palate, velum, nasal cavity, oral cavity, voice box.]
[Images: Oren Peles; Tavin, Wikimedia Commons; doi:10.3389/fnhum.2013.00749]
Gales, Young, "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 2007.
SLIDE 5
Applications of mixture processes
▶ Natural language processing—word frequency distributions can
be modelled by a mixture of Poisson distributions.
[Plot: mentions of Javert over the course of Les Misérables ("Javertiness"), with annotated regions: the table of contents, Javert's introduction, Valjean's monologue, Javert testifies, and the Battle of Waterloo.]
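▶ A toy Python sketch of such a model (not from the talk; the rates and mixing weight below are arbitrary): word counts per block of text follow a low-rate Poisson most of the time, and a high-rate Poisson in occasional bursty sections.

import numpy as np

rng = np.random.default_rng(0)

def poisson_mixture(n, k, lam_low, lam_high):
    # Each block is "bursty" with probability k; counts are Poisson
    # with a high rate in bursty blocks and a low rate otherwise.
    bursty = rng.random(n) < k
    return rng.poisson(np.where(bursty, lam_high, lam_low))

counts = poisson_mixture(200, k=0.1, lam_low=0.5, lam_high=8.0)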
SLIDE 6
The Allison Mixture
▶ Which input process is visible at the output?
▶ One option is to choose independently each time:

p_Y(y) = k\,p_0(y) + (1 - k)\,p_1(y)

⟹ Independent inputs yield independent outputs.
▶ This is too restrictive: in practice, successive choices of sample are often dependent.
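▶ A minimal sketch of this memoryless mixture, assuming Gaussian inputs (the means and mixing weight below are arbitrary); the lag-one sample autocovariance comes out near zero, as claimed:

import numpy as np

rng = np.random.default_rng(0)

def iid_mixture(n, k, mu0, mu1, sigma=1.0):
    # Choose input 0 with probability k, independently at each step.
    s = rng.random(n) < k
    x0 = rng.normal(mu0, sigma, n)
    x1 = rng.normal(mu1, sigma, n)
    return np.where(s, x0, x1)

y = iid_mixture(100_000, k=0.3, mu0=0.0, mu1=3.0)
print(np.cov(y[:-1], y[1:])[0, 1])   # ~0: independent outputs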
SLIDE 7
The Allison Mixture
▶ Instead, let’s—sometimes—switch from one input to the other.
[Diagram: a two-state Markov chain over the inputs; switch from X₁ to X₂ with probability α₁ and from X₂ to X₁ with probability α₂, otherwise stay (probabilities 1 − α₁ and 1 − α₂).]
▶ This forms a Markov chain.
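▶ A sketch of the Allison mixture under the same illustrative Gaussian-input assumption; the active input now evolves as a two-state Markov chain instead of being redrawn independently:

import numpy as np

rng = np.random.default_rng(0)

def allison_mixture(n, a1, a2, mu1, mu2, sigma=1.0):
    # State 0 = input X1, state 1 = input X2.
    # Switch 0 -> 1 with probability a1, 1 -> 0 with probability a2.
    s = np.empty(n, dtype=int)
    s[0] = 0
    u = rng.random(n)
    for i in range(1, n):
        s[i] = s[i - 1] ^ (u[i] < (a1 if s[i - 1] == 0 else a2))
    x = np.stack([rng.normal(mu1, sigma, n), rng.normal(mu2, sigma, n)])
    return x[s, np.arange(n)], s

y, s = allison_mixture(100_000, a1=0.1, a2=0.25, mu1=0.0, mu2=3.0)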
SLIDE 8
Independence of the Allison mixture
▶ This makes output samples dependent, even if the inputs are not.
[Diagram: the switch S selects between X₁ and X₂ to form the output Y, as on slide 2.]
SLIDE 9
Autocovariance
▶ It is known that the lag-one autocovariance is given by

R_{YY}[1] = \frac{\alpha_1 \alpha_2}{(\alpha_1 + \alpha_2)^2} (1 - \alpha_1 - \alpha_2)(\mu_1 - \mu_2)^2,

where μ₁ and μ₂ are the input means.
▶ If X₁ and X₂ are N(μᵢ, σ²) then Y is just a noisy version of S.
Gunn, Allison, Abbott, "Allison mixtures: where random digits obey thermodynamic principles", International Journal of Modern Physics: Conference Series 33 (2014).
SLIDE 10
Uncorrelatedness
▶ This gives us the conditions for a correlated output:

α₁ ≠ 0, α₂ ≠ 0, α₁ + α₂ ≠ 1, μ₁ ≠ μ₂.

▶ If any of these is violated, consecutive samples are uncorrelated.
SLIDE 11
Moving beyond a single step
▶ We previously demonstrated only the appearance of correlation
in the Allison mixture. Let’s fill in the remaining details!
▶ The state-transition matrix of the Markov chain S is

P = \begin{bmatrix} 1 - \alpha_1 & \alpha_2 \\ \alpha_1 & 1 - \alpha_2 \end{bmatrix}.
▶ Taking every k-th sample of an Allison mixture yields another
Allison mixture—the choice of input is still Markovian.
Gunn, Chapeau-Blondeau, Allison, Abbott, Unsolved Problems of Noise (2015).
SLIDE 12
Moving beyond a single step
▶ By taking P^k and reading off the minor diagonal, we find the k-step transition probabilities

\alpha_1[k] = \frac{\alpha_1}{\alpha_1 + \alpha_2}\left[1 - (1 - \alpha_1 - \alpha_2)^k\right], \qquad
\alpha_2[k] = \frac{\alpha_2}{\alpha_1 + \alpha_2}\left[1 - (1 - \alpha_1 - \alpha_2)^k\right].

▶ The leading coefficients are the stationary probabilities π₂ and π₁ respectively; for k = 1 these expressions reduce to α₁ and α₂, as they must.
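▶ A quick numerical check (parameters arbitrary): matrix powers of the column-stochastic P from slide 11 reproduce the geometric factor 1 − (1 − α₁ − α₂)^k, scaled by the stationary probabilities:

import numpy as np

a1, a2 = 0.1, 0.25
P = np.array([[1 - a1, a2],
              [a1, 1 - a2]])          # column-stochastic, as on slide 11
lam = 1 - a1 - a2

for k in (1, 2, 5, 10):
    Pk = np.linalg.matrix_power(P, k)
    a1k = a1 / (a1 + a2) * (1 - lam**k)   # k-step switch probability, state 1 -> 2
    a2k = a2 / (a1 + a2) * (1 - lam**k)   # k-step switch probability, state 2 -> 1
    assert np.allclose([Pk[1, 0], Pk[0, 1]], [a1k, a2k])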
SLIDE 13
Multi-step autocovariance
▶ We substitute these back into the autocovariance formula, yielding

R_{YY}[k] = \frac{\alpha_1 \alpha_2}{(\alpha_1 + \alpha_2)^2} (\mu_1 - \mu_2)^2 (1 - \alpha_1 - \alpha_2)^k = R_{YY}[1]\,(1 - \alpha_1 - \alpha_2)^{k-1}.
▶ The autocovariance thus decays exponentially with time.
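▶ A Monte Carlo sanity check of this decay (illustrative parameters; the simulation repeats the sampler sketched on slide 7): the empirical lag-k autocovariance tracks the closed form.

import numpy as np

rng = np.random.default_rng(1)
a1, a2, mu1, mu2 = 0.1, 0.25, 0.0, 3.0
n = 500_000

# Simulate the switching chain and the mixture output.
s = np.empty(n, dtype=int)
s[0] = 0
u = rng.random(n)
for i in range(1, n):
    s[i] = s[i - 1] ^ (u[i] < (a1 if s[i - 1] == 0 else a2))
y = np.where(s == 0, rng.normal(mu1, 1.0, n), rng.normal(mu2, 1.0, n))

lam = 1 - a1 - a2
C = a1 * a2 / (a1 + a2) ** 2 * (mu1 - mu2) ** 2   # prefactor of the closed form
for k in (1, 2, 3, 5):
    emp = np.mean((y[:-k] - y.mean()) * (y[k:] - y.mean()))
    print(k, round(emp, 3), round(C * lam**k, 3))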
SLIDE 14
Autoinformation
▶ Autoinformation provides a more-easily-computed alternative to
entropy rate.
▶ Information theory lets us capture dependence that does not
induce correlation.
SLIDE 15
Autoinformation
Definition (Autoinformation)
We define the autoinformation

I_{XX}[n, k] = I(X[n]; X[n - k]),

which simplifies, for a stationary process, to

I_{XX}[k] = 2H(X[n]) - H(X[n], X[n - k]).
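▶ A plug-in estimator following this definition, sketched for a discrete-valued sequence of nonnegative integers (histogram estimates of this kind are biased for short sequences):

import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def autoinformation(x, k):
    # I_XX[k] = 2 H(X[n]) - H(X[n], X[n-k]) for a stationary,
    # discrete-valued sequence x of nonnegative integers.
    x = np.asarray(x)
    m = x.max() + 1
    joint = np.zeros((m, m))
    np.add.at(joint, (x[:-k], x[k:]), 1)   # histogram of lag-k pairs
    joint /= joint.sum()
    marg = np.bincount(x, minlength=m) / len(x)
    return 2 * entropy_bits(marg) - entropy_bits(joint.ravel())

x = np.random.default_rng(0).integers(0, 2, 10_000)
print(autoinformation(x, 1))   # ~0 for an i.i.d. sequence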
SLIDE 16
A path towards Allison mixture autoinformation
▶ Can we take the same approach as with autocovariance?
[Flowchart: S[n] feeds two parallel paths. Autocovariance path: calculate R_SS[k], then transform according to the input processes via (μ₁ − μ₂)². Autoinformation path: calculate I_SS[k], then transform according to the input processes via ???]
SLIDE 17
Sampling process autoinformation
▶ We compute the autoinformation from the stationary and transition probabilities using I(X; Y) = H(X) − H(X|Y):

I_{SS}[1] = \frac{\alpha_2 (1 - \alpha_1) \log_2 \frac{1 - \alpha_1}{\alpha_2} + \alpha_1 (1 - \alpha_2) \log_2 \frac{1 - \alpha_2}{\alpha_1}}{\alpha_1 + \alpha_2} + \log_2(\alpha_1 + \alpha_2). \quad (1)
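▶ Equation (1) can be cross-checked against the direct computation I(S[n]; S[n−1]) = H(S) − H(S[n] | S[n−1]); a sketch with arbitrary α values:

import numpy as np

def H2(p):
    # Binary entropy in bits.
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

a1, a2 = 0.3, 0.6
s = a1 + a2
pi1, pi2 = a2 / s, a1 / s      # stationary probabilities of the chain

# Direct: I = H(S) - H(S[n] | S[n-1]).
direct = H2(pi1) - (pi1 * H2(a1) + pi2 * H2(a2))

# Closed form, eq. (1).
closed = (a2 * (1 - a1) * np.log2((1 - a1) / a2)
          + a1 * (1 - a2) * np.log2((1 - a2) / a1)) / s + np.log2(s)

assert np.isclose(direct, closed)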
SLIDE 18
Sampling process autoinformation
[Surface plot: autoinformation (bits) of the sampling process as a function of α₁ and α₂, each ranging from 0 to 1.]
SLIDE 19
Sampling process autoinformation
[Plot: autoinformation (bits) of the sampling process versus lag (1 to 20), on a logarithmic scale from 10⁻⁵ to 10⁰.]
SLIDE 20 Allison mixture autoinformation
▶ How do we apply this to the Allison mixture?
▶ Binary-valued outputs: X₁[k], X₂[k] ∈ {0, 1}.
▶ Use Bayes' law to find the probability of each state:

P[S[k] = s \mid Y[k]] = \frac{P[Y[k] \mid S[k] = s]\,\pi_s}{\sum_q P[Y[k] \mid S[k] = q]\,\pi_q}

▶ We now know enough to find the transition probabilities for Y:

Y[k] \to S[k] \to S[k+1] \to Y[k+1]
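▶ For binary inputs the posterior is a two-line computation; a sketch, writing pᵢ = P[Xᵢ[k] = 1] (all values below are illustrative):

import numpy as np

def state_posterior(y, p, pi):
    # P[S = s | Y = y]: likelihood P[Y = y | S = s] times prior pi_s,
    # normalised over the two states.
    like = np.where(y == 1, p, 1 - p)
    post = like * pi
    return post / post.sum()

p = np.array([0.2, 0.9])     # P[X_i = 1] for each input (illustrative)
pi = np.array([0.7, 0.3])    # stationary state probabilities (illustrative)
print(state_posterior(1, p, pi))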
SLIDE 21 Allison mixture autoinformation
▶ It turns out that the previous autoinformation formula works here, with transition probabilities

\alpha_0' = \frac{\alpha_1 (1 - p_0)\left[p_0 (1 - \alpha_0) + p_1 \alpha_0\right] + \alpha_0 (1 - p_1)\left[p_0 \alpha_1 + p_1 (1 - \alpha_1)\right]}{\alpha_0 (1 - p_1) + \alpha_1 (1 - p_0)}

\alpha_1' = \frac{\alpha_1 p_0\left[(1 - p_0)(1 - \alpha_0) + (1 - p_1) \alpha_0\right] + \alpha_0 p_1\left[(1 - p_0) \alpha_1 + (1 - p_1)(1 - \alpha_1)\right]}{\alpha_0 p_1 + \alpha_1 p_0}.
▶ A formula of this complexity that only works for binary processes
is not the end of the road.
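▶ These expressions can be checked by simulating a binary Allison mixture and measuring the empirical switching rates of Y (illustrative parameters):

import numpy as np

rng = np.random.default_rng(2)
a0, a1 = 0.2, 0.35           # switching probabilities of S
p0, p1 = 0.1, 0.8            # P[X_i = 1] for each binary input
n = 500_000

s = np.empty(n, dtype=int)
s[0] = 0
u = rng.random(n)
for i in range(1, n):
    s[i] = s[i - 1] ^ (u[i] < (a0 if s[i - 1] == 0 else a1))
y = np.where(s == 0, rng.random(n) < p0, rng.random(n) < p1).astype(int)

# Closed-form effective transition probabilities from this slide.
a0p = (a1*(1-p0)*(p0*(1-a0) + p1*a0) + a0*(1-p1)*(p0*a1 + p1*(1-a1))) \
      / (a0*(1-p1) + a1*(1-p0))
a1p = (a1*p0*((1-p0)*(1-a0) + (1-p1)*a0) + a0*p1*((1-p0)*a1 + (1-p1)*(1-a1))) \
      / (a0*p1 + a1*p0)

emp0 = y[1:][y[:-1] == 0].mean()        # P[Y[k+1] = 1 | Y[k] = 0]
emp1 = 1 - y[1:][y[:-1] == 1].mean()    # P[Y[k+1] = 0 | Y[k] = 1]
print(round(emp0, 3), round(a0p, 3), round(emp1, 3), round(a1p, 3))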
SLIDE 22 Open problems
▶ Can a similar technique be applied to more general input
processes?
▶ Continuous distributions are important.
▶ Could this system be useful for studying transfer entropy?
▶ Transfer entropy is the “information transfer” between two
systems.
▶ Previous studies have revolved around chaotic systems.