SLIDE 1
Categorical Probability: Results and Challenges
Tobias Fritz
May 2019
What this talk is (not)
Categorical probability is like finding the sea route to India: Many possible routes to be explored without a coherent overall map. We may end
SLIDE 2
SLIDE 3
A (not so) random sample of contributors
Bill Lawvere
?
Michèle Giry Prakash Panangaden Bart Jacobs
Paolo Perrone Sharwin Rezagholi David Spivak
SLIDE 4
Motivation
⊲ Category theory has been hugely successful in algebraic geometry, algebraic topology, and theoretical computer science.
⊲ Contemporary research in these fields can hardly even be conceived of without categorical machinery.
⊲ Can and should we expect similar success in other areas?
⊲ A case in point: probability theory!
SLIDE 5
Motivation
A structural treatment can help us achieve:
⊲ Improved conceptual clarity.
⊲ Greater generality due to higher abstraction.
⊲ Therefore applicability in a range of contexts instead of only one.
SLIDE 6
For example, let Sh(IR) be the category of sheaves on the poset of compact intervals in R.
Conjecture (with David Spivak)
A probability space internal to Sh(IR) is the same thing as an external stochastic process. Suitably structural results on probability would therefore immediately give results on stochastic processes.
SLIDE 7
But first, what is probability theory?
⊲ The study of randomness.
⊲ Fundamental insight: probability is volume! ⇒ Measure theory.
⊲ Central themes:
⊲ Random variables and their distributions.
⊲ Theorems involving infinitely many variables.
⊲ Conditioning and Bayes’ rule.
SLIDE 8
An example statement:
Central limit theorem
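Let $(X_n)_{n\in\mathbb{N}}$ be i.i.d. random variables with $E[X_n] = \mu$ and $V[X_n] = \sigma^2$. Then
\[ \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right) \xrightarrow{\; n \to \infty \;} N(0, \sigma^2) \]
converges in distribution.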
(Wikipedia, Cflm001)
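The statement can be checked numerically in the simplest case. The following is a minimal sketch, assuming fair coin flips with values in {0, 1} (so the mean is 1/2 and the standard deviation is 1/2); the helper names `binomial_cdf` and `normal_cdf` are illustrative choices, not part of the talk. It compares the exact distribution function of the rescaled sample mean with that of the limiting normal distribution at one point.

```python
from math import comb, erf, sqrt

def binomial_cdf(n, k):
    """P(S_n <= k) for the number of heads S_n in n fair coin flips."""
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

def normal_cdf(t, sigma):
    """Distribution function of a centred normal with standard deviation sigma."""
    return 0.5 * (1 + erf(t / (sigma * sqrt(2))))

n, t = 400, 0.5
# sqrt(n) * (S_n / n - 1/2) <= t   iff   S_n <= n/2 + t * sqrt(n)
exact = binomial_cdf(n, int(n / 2 + t * sqrt(n)))
limit = normal_cdf(t, sigma=0.5)

print(exact, limit)
```

The two values differ only by a small error, which is what convergence in distribution predicts for large n.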
SLIDE 9
Structures in categorical probability
Probability monad:
⊲ probability measures
⊲ pushforward of measures
⊲ point measures δx
⊲ averaging of measures
Eilenberg–Moore category:
⊲ integration
⊲ stochastic dominance
⊲ martingales
Kleisli category:
⊲ stochastic maps
⊲ (conditional) independence
⊲ statistics
SLIDE 10
⊲ A probability monad lives on a category of sets or spaces.
⊲ Most basic: the convex combinations monad on Set, where
\[ DX := \left\{ \sum_i c_i \delta_{x_i} \;\middle|\; c_i \ge 0,\ \sum_i c_i = 1 \right\} \]
is the set of finitely supported probability measures on X.
⊲ p ∈ DX is a “random element” of X. For example, a fair coin is $\tfrac{1}{2}\delta_{\mathrm{heads}} + \tfrac{1}{2}\delta_{\mathrm{tails}} \in D(\{\mathrm{heads}, \mathrm{tails}\})$.
⊲ Functoriality: Df : DX → DY takes pushforward measures: applying a function to a random element of X produces a random element of Y.
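As a concrete illustration of the functor D, here is a minimal sketch in which a finitely supported distribution is represented as a Python dict from outcomes to exact weights. This representation and the name `pushforward` are assumptions made for the example, not notation from the talk.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(f, p):
    """Apply Df to p: weights of outcomes with the same image under f are added."""
    q = defaultdict(Fraction)
    for x, c in p.items():
        q[f(x)] += c
    return dict(q)

# The fair coin, as an element of D({heads, tails}).
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# An injective relabelling keeps the weights.
bits = pushforward(lambda x: 1 if x == "heads" else 0, coin)

# A constant function collapses the coin to a point mass.
constant = pushforward(lambda x: "done", coin)

print(bits, constant)
```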
SLIDE 11
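⊲ The unit X → DX assigns to every x ∈ X the point mass δx at x.
⊲ The multiplication DDX → DX computes the expected distribution,
\[ \sum_i c_i \Big( \sum_j d_{ij} \delta_{x_{ij}} \Big) \longmapsto \sum_{i,j} c_i d_{ij} \delta_{x_{ij}} . \]
⊲ Algebras E : DA → A are “convex spaces” in which every p ∈ DA has a designated barycenter or expectation value E[p] ∈ A.
[Figure: two points x and y together with the barycenter $E[\tfrac{1}{2}\delta_x + \tfrac{1}{2}\delta_y]$.]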
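The unit, the multiplication and the algebra structure on the real numbers can be sketched as follows. Distributions are dicts from outcomes to exact weights; since dicts are not hashable, a distribution of distributions is represented here as a list of (weight, inner distribution) pairs. Both representations and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def unit(x):
    """The point mass at x."""
    return {x: Fraction(1)}

def multiply(pp):
    """Flatten a list of (c_i, inner distribution) pairs: x_ij gets weight c_i * d_ij."""
    out = defaultdict(Fraction)
    for c, inner in pp:
        for x, d in inner.items():
            out[x] += c * d
    return dict(out)

def expectation(p):
    """Algebra map E : D(R) -> R, the barycenter of a distribution on numbers."""
    return sum(c * x for x, c in p.items())

fair = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
always_heads = unit("heads")

# Choose the fair coin with probability 1/3 and the trick coin with probability 2/3.
mixture = multiply([(Fraction(1, 3), fair), (Fraction(2, 3), always_heads)])

midpoint = expectation({0: Fraction(1, 2), 1: Fraction(1, 2)})
print(mixture, midpoint)
```

The check `multiply([(1, p)]) == p` is one of the unit laws of the monad, and `expectation(unit(x)) == x` is the unit law of the algebra.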
SLIDE 12
Integration: the Eilenberg–Moore side
⊲ Let A be an Eilenberg–Moore algebra, e.g. A = R.
⊲ Then for p ∈ DX and a random variable f : X → A,
\[ \int_X f \, dp := E[(Df)(p)]. \]
⊲ For g : Y → X and q ∈ DY, the change of variables formula
\[ \int_Y (f \circ g) \, dq = \int_X f \, d\big((Dg)(q)\big) \]
then holds by functoriality, D(f ◦ g) = D(f) ◦ D(g).
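The definition of the integral and the change of variables formula can be tested on a small example. This is a minimal sketch for A = R with finitely supported distributions stored as dicts; the names `pushforward`, `expectation` and `integrate`, and the particular q, g and f, are assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(f, p):
    """Apply Df to the finitely supported distribution p."""
    q = defaultdict(Fraction)
    for x, c in p.items():
        q[f(x)] += c
    return dict(q)

def expectation(p):
    """Algebra map E : D(R) -> R."""
    return sum(c * x for x, c in p.items())

def integrate(f, p):
    """The integral of f against p, defined as E[(Df)(p)]."""
    return expectation(pushforward(f, p))

q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}  # q in DY
g = {"a": 1, "b": 2, "c": 2}.get                                     # g : Y -> X
f = lambda x: x * x                                                  # f : X -> R

lhs = integrate(lambda y: f(g(y)), q)   # integral over Y of f after g
rhs = integrate(f, pushforward(g, q))   # integral over X against the pushforward
print(lhs, rhs)
```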
SLIDE 13
Measure theory without measure theory
Basic idea
A probability measure on X is an idealized version of a finite sample: elements (x1, . . . , xn) of X representing the uniform distribution $\frac{1}{n} \sum_i \delta_{x_i}$.
All constructions and proofs with probability measures should be reducible to constructions and proofs with finite samples.
We construct a probability monad which implements this idea and makes it precise.
Let CMet be the category where
⊲ objects (X, dX) are complete metric spaces,
⊲ morphisms f : (X, dX) → (Y, dY) are short maps, dY(f(x), f(x′)) ≤ dX(x, x′).
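The passage from a finite sample to the distribution it represents can be sketched directly. The function name `empirical` is an assumption for this example. The checks illustrate that permuting a sample, or repeating every entry equally often, leaves the represented measure unchanged; this kind of reindexing is presumably what functions with uniform fibres account for.

```python
from collections import Counter
from fractions import Fraction

def empirical(sample):
    """The uniform distribution (1/n) * sum_i delta_{x_i} of a finite sample."""
    n = len(sample)
    return {x: Fraction(k, n) for x, k in Counter(sample).items()}

sample = (0, 1, 1)
permuted = (1, 0, 1)
repeated = sample * 2  # every entry now occurs twice as often

print(empirical(sample))
```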
SLIDE 14
⊲ For S ∈ FinSet, we have the power functor CMet → CMet, $X \mapsto X^S$.
⊲ We have isomorphisms $X^1 \cong X$ and $X^{S \times T} \cong (X^S)^T$.
⊲ These make the power functors into a graded monad on CMet, which is a lax monoidal functor FinUnif → [CMet, CMet].
⊲ Here, FinUnif ⊆ FinSet is the subcategory of nonempty sets and functions with uniform fibres.
SLIDE 15
Theorem (with Paolo Perrone, arXiv:1712.05363)
There is a left Kan extension
[Diagram: a triangle formed by the functor FinUnif → [CMet, CMet], the functor ! : FinUnif → 1, and P : 1 → [CMet, CMet].]
in the 2-category of symmetric monoidal categories and lax monoidal functors, where P is a probability monad such that
PX = {Radon measures on X with finite first moment}.
This reduces (parts of) measure and probability to combinatorics!
SLIDE 16
Categories of stochastic maps: the Kleisli side
Let C be a symmetric strict monoidal category where each object carries a distinguished commutative comonoid:
[String diagrams: the equations of a commutative comonoid.]
We think of this structure as providing copy and delete operations.
SLIDE 17
Definition
C is a category with comonoids if these comonoids are compatible with the monoidal structure, and deletion is natural,
[String diagram: f followed by deletion equals deletion.]
This makes C into a semicartesian monoidal category: we have natural maps X ⊗ Y → X, X ⊗ Y → Y which are abstract versions of marginalization, when composed with p : I → X ⊗ Y.
SLIDE 18
Example
Let FinStoch be the category of finite sets, where morphisms f : X → Y are stochastic matrices
\[ (f_{xy})_{x \in X,\, y \in Y}, \qquad f_{xy} \ge 0, \qquad \sum_y f_{xy} = 1. \]
⊲ $f_{xy}$ is the probability that the output is y given the input x.
⊲ We also write f(y|x).
⊲ Composition of morphisms is given by the Chapman–Kolmogorov equation,
\[ (g \circ f)(z|x) := \sum_y g(z|y) \, f(y|x). \]
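Composition in FinStoch can be sketched with nested dicts, where `f[x][y]` stores f(y|x) and zero entries may be omitted. This encoding, the names `compose` and `is_stochastic`, and the two-state weather example are assumptions made for illustration.

```python
from fractions import Fraction as F

def compose(g, f):
    """Chapman-Kolmogorov: (g after f)(z|x) = sum over y of g(z|y) * f(y|x)."""
    out = {}
    for x, row in f.items():
        out[x] = {}
        for y, f_yx in row.items():
            for z, g_zy in g[y].items():
                out[x][z] = out[x].get(z, 0) + g_zy * f_yx
    return out

def is_stochastic(f):
    """Every row has nonnegative entries summing to 1."""
    return all(
        all(v >= 0 for v in row.values()) and sum(row.values()) == 1
        for row in f.values()
    )

# One day of weather, as a morphism {sun, rain} -> {sun, rain}.
step = {
    "sun": {"sun": F(3, 4), "rain": F(1, 4)},
    "rain": {"sun": F(1, 2), "rain": F(1, 2)},
}
two_steps = compose(step, step)
print(two_steps)
```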
SLIDE 19
⊲ The monoidal structure is (g ⊗ f)(y, z|w, x) := g(y|w) f(z|x), with canonical symmetry isomorphism.
⊲ The copying operation is just copying,
\[ \delta(x_1, x_2 | x) = \begin{cases} 1 & \text{if } x_1 = x_2 = x, \\ 0 & \text{otherwise.} \end{cases} \]
⊲ With this, FinStoch is a category with comonoids.
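The monoidal product and the copy maps of FinStoch can be sketched in the same nested-dict style, with `f[x][y]` standing for f(y|x) and zero entries omitted. The encoding, the names `tensor` and `copy`, and the example morphisms are assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import product

def tensor(g, f):
    """(g tensor f)(y, z | w, x) = g(y|w) * f(z|x)."""
    return {
        (w, x): {
            (y, z): g_y * f_z
            for (y, g_y), (z, f_z) in product(g[w].items(), f[x].items())
        }
        for w, x in product(g, f)
    }

def copy(X):
    """Copy map X -> X tensor X; only the nonzero entries are stored."""
    return {x: {(x, x): F(1)} for x in X}

# A fair coin with trivial input "*", and a noisy bit channel.
coin = {"*": {"H": F(1, 2), "T": F(1, 2)}}
noisy = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(1, 10), 1: F(9, 10)}}

both = tensor(coin, noisy)
print(both[("*", 0)])
```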
SLIDE 20
Deterministic morphisms
Definition
A morphism f : X → Y is deterministic if the comonoids are natural with respect to f,
[String diagram: copying after f equals f ⊗ f after copying.]
⊲ The deterministic morphisms form a cartesian monoidal subcategory.
⊲ In FinStoch, the deterministic morphisms are the stochastic matrices with entries in {0, 1}, i.e. the actual functions. They form a copy of FinSet.
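In FinStoch the determinism condition can be checked entry by entry. The sketch below assumes the condition reads "copying the output of f equals applying f to both copies of the input" and uses nested dicts with `f[x][y]` for f(y|x); the function name and examples are illustrative.

```python
from fractions import Fraction as F

def is_deterministic(f):
    """Compare (copy after f) with ((f tensor f) after copy), entry by entry."""
    for x, row in f.items():
        for y1, a in row.items():
            for y2, b in row.items():
                lhs = a if y1 == y2 else 0  # (copy after f)(y1, y2 | x)
                rhs = a * b                 # ((f tensor f) after copy)(y1, y2 | x)
                if lhs != rhs:
                    return False
    return True

# An actual function {0, 1} -> {a, b}, written as a 0/1 stochastic matrix.
function = {0: {"a": F(1), "b": F(0)}, 1: {"a": F(0), "b": F(1)}}

# A fair coin with trivial input: genuinely random, hence not deterministic.
coin = {"*": {"H": F(1, 2), "T": F(1, 2)}}

print(is_deterministic(function), is_deterministic(coin))
```

The two sides agree exactly when every entry satisfies a = a * a and distinct outputs never both have positive weight, which forces all entries into {0, 1}.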
SLIDE 21
Conditional independence
Categories with comonoids support several notions of conditional independence, including:
Definition
A morphism f : A → X ⊗ Y displays the conditional independence X ⊥ Y || A if there are g : A → X and h : A → Y such that
[String diagram: f equals copying A followed by g ⊗ h.]
One can derive the usual properties of conditional independence purely formally.
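In FinStoch, and assuming the defining equation reads f(x, y|a) = g(x|a) h(y|a), any such g and h must be the two marginals of f, so the condition can be tested by comparing f with the product of its marginals. The names and examples below are assumptions for illustration; `f[a][(x, y)]` stores f(x, y|a).

```python
from fractions import Fraction as F

def marginals(f):
    """Marginalize f : A -> X tensor Y to g : A -> X and h : A -> Y."""
    g, h = {}, {}
    for a, row in f.items():
        g[a], h[a] = {}, {}
        for (x, y), v in row.items():
            g[a][x] = g[a].get(x, 0) + v
            h[a][y] = h[a].get(y, 0) + v
    return g, h

def displays_independence(f):
    """Check f(x, y|a) == g(x|a) * h(y|a) for all a, x, y."""
    g, h = marginals(f)
    return all(
        row.get((x, y), 0) == g[a][x] * h[a][y]
        for a, row in f.items() for x in g[a] for y in h[a]
    )

# Two independent noisy readings of the same input bit a.
noisy = {0: {0: F(3, 4), 1: F(1, 4)}, 1: {0: F(1, 4), 1: F(3, 4)}}
readings = {
    a: {(x, y): noisy[a][x] * noisy[a][y] for x in (0, 1) for y in (0, 1)}
    for a in (0, 1)
}

# Two perfectly correlated fair bits with trivial input.
correlated = {"*": {(0, 0): F(1, 2), (1, 1): F(1, 2)}}

print(displays_independence(readings), displays_independence(correlated))
```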
SLIDE 22
Almost surely
Definition
Given p : Θ → X, morphisms f, g : X → Y are equal p-almost surely if
[String diagram: an equation between a composite of p with f and the corresponding composite of p with g.]
⊲ Other concepts relativize similarly to “almost surely” concepts.
Proposition
If gf = id, then g is f -almost surely deterministic.
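For FinStoch, almost sure equality can be unwound concretely. The sketch below rests on an assumption about how the definition reads: f and g are equal p-almost surely when f(y|x) = g(y|x) for every x that p produces with positive probability for some parameter value. The function name and examples are illustrative; `p[t][x]` stores p(x|t).

```python
from fractions import Fraction as F

def equal_almost_surely(p, f, g):
    """Compare the rows of f and g on the points that p can actually produce."""
    support = {x for row in p.values() for x, v in row.items() if v > 0}
    return all(
        f[x].get(y, 0) == g[x].get(y, 0)
        for x in support
        for y in set(f[x]) | set(g[x])
    )

identity = {0: {0: F(1)}, 1: {1: F(1)}}
constant_zero = {0: {0: F(1)}, 1: {0: F(1)}}

# p concentrated at 0: the two maps only differ where p never lands.
point = {"theta": {0: F(1), 1: F(0)}}
# p uniform: the difference at 1 is now visible.
uniform = {"theta": {0: F(1, 2), 1: F(1, 2)}}

print(equal_almost_surely(point, identity, constant_zero))
print(equal_almost_surely(uniform, identity, constant_zero))
```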
SLIDE 23
Sufficient statistics
Definition
⊲ A statistical model is a morphism p : Θ → X.
⊲ A statistic for p is a deterministic split epimorphism s : X → T.
⊲ A statistic is sufficient if there is a splitting α : T → X such that
[String diagram: an equation between two morphisms Θ → X ⊗ T, one built from p, s and α, the other from p and s.]
SLIDE 24
Axiom
Suppose that gf = id. Then
[String diagram: an equation with g on one side and two copies of f on the other.]
⊲ This holds in FinStoch.
⊲ Now there is a completely formal version of a classical result of statistics:
Fisher–Neyman factorization theorem (preliminary)
If the axiom holds, a statistic s : X → T is sufficient for p : Θ → X if and only if there is a splitting α : T → X with αsp = p.
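The criterion αsp = p can be checked on a textbook example in FinStoch. The sketch below assumes a model of two independent flips of a coin with bias θ, the statistic s = number of heads, and the splitting α that spreads each count uniformly over its fibre; all names and the choice of parameter values are assumptions for illustration.

```python
from fractions import Fraction as F

def compose(g, f):
    """(g after f)(z|x) = sum over y of g(z|y) * f(y|x)."""
    out = {}
    for x, row in f.items():
        out[x] = {}
        for y, f_yx in row.items():
            for z, g_zy in g[y].items():
                out[x][z] = out[x].get(z, 0) + g_zy * f_yx
    return out

flips = [(a, b) for a in (0, 1) for b in (0, 1)]

# Model p : Theta -> X, two independent flips with P(1) = theta.
p = {
    theta: {(a, b): theta ** (a + b) * (1 - theta) ** (2 - a - b) for a, b in flips}
    for theta in (F(1, 4), F(1, 2))
}

# Statistic s : X -> T, the number of heads (deterministic).
s = {(a, b): {a + b: F(1)} for a, b in flips}

# Splitting alpha : T -> X, uniform on each fibre of s.
alpha = {0: {(0, 0): F(1)}, 1: {(0, 1): F(1, 2), (1, 0): F(1, 2)}, 2: {(1, 1): F(1)}}

# A deterministic section of s that does not reproduce the model.
bad_alpha = {0: {(0, 0): F(1)}, 1: {(0, 1): F(1)}, 2: {(1, 1): F(1)}}

sufficient = compose(alpha, compose(s, p)) == p
bad = compose(bad_alpha, compose(s, p)) == p
print(sufficient, bad)
```

Both `alpha` and `bad_alpha` split s, but only the first satisfies the factorization, matching the classical fact that the number of heads is sufficient because the conditional distribution of the flips given the count does not depend on θ.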
SLIDE 25
Other preliminary results
Let p : Θ → X be a statistical model. We have abstract versions of other classical theorems of statistics:
Basu’s theorem
A complete sufficient statistic for p is independent of any ancillary statistic.
Bahadur’s theorem
If a minimal sufficient statistic exists, then a complete sufficient statistic is minimal sufficient.
SLIDE 26
A challenge: zero-one laws
Kolmogorov’s and Hewitt–Savage’s zero-one law
Let
⊲ (Xn)n∈N be a sequence of random variables,
⊲ A an event which is a function of the (Xn), and
⊲ independent of (Xn)n∈F for any finite F ⊆ N (Kolmogorov), or
⊲ invariant under finite permutation of the (Xn) (Hewitt–Savage).
Then p(A) ∈ {0, 1}.
⊲ A categorical reformulation and proof in a suitable class of categories with colimits may now be within reach.
SLIDE 27
A challenge: concentration of measure
Concentration of measure is the phenomenon that
⊲ if A is a set with p(A) ≥ 1/2 in a metric probability space,
⊲ then the ε-neighbourhood $A_\varepsilon$ satisfies $p(A_\varepsilon) \approx 1$.
Theorem (Lévy)
On the n-sphere $S^n$,
\[ p(A_\varepsilon) \ge 1 - \sqrt{\tfrac{\pi}{8}} \, e^{-\varepsilon^2 n / 2} \approx 1. \]
Law of large numbers
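Let $(X_n)_{n\in\mathbb{N}}$ be an i.i.d. sequence with $E[X_n] = \mu$. Then for every $\varepsilon > 0$,
\[ \lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| > \varepsilon \right) = 0. \]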
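For fair coin flips the deviation probability in the law of large numbers can be computed exactly, which makes the concentration visible without simulation. This is a minimal sketch; the function name `deviation_probability` and the choice ε = 1/10 are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def deviation_probability(n, eps):
    """Exact P(|S_n / n - 1/2| > eps) for n fair coin flips with values in {0, 1}."""
    bad = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - Fraction(1, 2)) > eps
    )
    return Fraction(bad, 2 ** n)

eps = Fraction(1, 10)
probs = [deviation_probability(n, eps) for n in (10, 100, 1000)]

for n, q in zip((10, 100, 1000), probs):
    print(n, float(q))
```

The probabilities shrink rapidly as n grows, in line with both the law of large numbers and the exponential bounds typical of concentration of measure.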
SLIDE 28