

SLIDE 1

Categorical Probability: Results and Challenges

Tobias Fritz May 2019

SLIDE 2

What this talk is (not)

Categorical probability is like finding the sea route to India:
⊲ Many possible routes to be explored without a coherent overall map.
⊲ We may end up discovering something totally different than India!

SLIDE 3

A (not so) random sample of contributors

Bill Lawvere

?

Michèle Giry
Prakash Panangaden
Bart Jacobs
Paolo Perrone
Sharwin Rezagholi
David Spivak

SLIDE 4

Motivation

⊲ Category theory has been hugely successful in algebraic geometry, algebraic topology, and theoretical computer science.
⊲ Contemporary research in these fields can hardly even be conceived of without categorical machinery.
⊲ Can and should we expect similar success in other areas?
⊲ A case in point: probability theory!

SLIDE 5

Motivation

A structural treatment can help us achieve:
⊲ Improved conceptual clarity.
⊲ Greater generality due to higher abstraction.
⊲ Therefore applicability in a range of contexts instead of only one.

SLIDE 6

For example, let Sh(IR) be the category of sheaves on the poset of compact intervals in R.

Conjecture (with David Spivak)

A probability space internal to Sh(IR) is the same thing as an external stochastic process. Suitably structural results on probability would therefore immediately give results on stochastic processes.

SLIDE 7

But first, what is probability theory?
⊲ The study of randomness.
⊲ Fundamental insight: probability is volume! ⇒ Measure theory.
⊲ Central themes:
  ⊲ Random variables and their distributions.
  ⊲ Theorems involving infinitely many variables.
  ⊲ Conditioning and Bayes’ rule.

SLIDE 8

An example statement:

Central limit theorem

Let (Xn)n∈N be i.i.d. random variables with E[Xn] = µ and V[Xn] = σ. Then

\[
  \sqrt{n}\left( \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right)
  \;\xrightarrow{\;n \to \infty\;}\; N(0, \sigma)
\]

in distribution.

(Wikipedia, Cflm001)

SLIDE 9

Structures in categorical probability

Probability monad:
⊲ probability measures
⊲ pushforward of measures
⊲ point measures δx
⊲ averaging of measures

Eilenberg–Moore category:
⊲ integration
⊲ stochastic dominance
⊲ martingales

Kleisli category:
⊲ stochastic maps
⊲ (conditional) independence
⊲ statistics

SLIDE 10

⊲ A probability monad lives on a category of sets or spaces.
⊲ Most basic: the convex combinations monad on Set, where

\[
  DX := \left\{ \sum_i c_i \delta_{x_i} \;\middle|\; c_i \ge 0,\ \sum_i c_i = 1 \right\}
\]

is the set of finitely supported probability measures on X.
⊲ p ∈ DX is a “random element” of X. For example, a fair coin:

\[
  \tfrac{1}{2}\delta_{\mathrm{heads}} + \tfrac{1}{2}\delta_{\mathrm{tails}} \in D(\{\mathrm{heads}, \mathrm{tails}\})
\]

⊲ Functoriality: Df : DX → DY takes pushforward measures, so applying a function to a random element of X produces a random element of Y.
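To make the functor D concrete, here is a minimal Python sketch. It assumes a finitely supported distribution is stored as a dictionary from outcomes to exact rational weights; the names `pushforward`, `die` and `parity` are illustrative and do not come from the talk.

```python
from fractions import Fraction

def pushforward(f, p):
    """Df: push the finitely supported distribution p (dict outcome -> weight) along f."""
    q = {}
    for x, c in p.items():
        y = f(x)
        # Weights of outcomes with the same image add up.
        q[y] = q.get(y, Fraction(0)) + c
    return q

# A fair die as an element of D({1, ..., 6}).
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Applying the parity function to this random element of {1, ..., 6}
# gives a random element of {"even", "odd"}.
parity = pushforward(lambda k: "even" if k % 2 == 0 else "odd", die)
```

Exact fractions are used instead of floats so that equalities between distributions can be checked without rounding issues.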

SLIDE 11

⊲ The unit X → DX assigns to every x ∈ X the point mass δx at x.
⊲ The multiplication DDX → DX computes the expected distribution,

\[
  \sum_i c_i \left( \sum_j d_{ij} \delta_{x_{ij}} \right) \;\longmapsto\; \sum_{i,j} c_i d_{ij} \delta_{x_{ij}}
\]

⊲ Algebras E : DA → A are “convex spaces” in which every p ∈ DA has a designated barycenter or expectation value E[p] ∈ A.

[Figure: two points x and y together with the barycenter E(½δx + ½δy).]
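The unit, the multiplication and an algebra map can be sketched in a few lines of Python. This is only an illustration under assumptions: distributions are dictionaries with rational weights, an element of DDX is represented as a list of (weight, distribution) pairs because dictionaries are not hashable, and the rationals stand in for the convex space A. All function names are hypothetical.

```python
from fractions import Fraction

def unit(x):
    """Unit X -> DX: the point mass at x."""
    return {x: Fraction(1)}

def multiply(pp):
    """Multiplication DDX -> DX.

    pp is a list of pairs (c_i, p_i), where each p_i is a dict x_ij -> d_ij.
    The result assigns to each point the total weight sum of c_i * d_ij.
    """
    out = {}
    for c, p in pp:
        for x, d in p.items():
            out[x] = out.get(x, Fraction(0)) + c * d
    return out

def expectation(p):
    """Algebra map E : DA -> A for A the rationals: the barycenter of p."""
    return sum(c * x for x, c in p.items())

fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {1: Fraction(1, 4), 2: Fraction(3, 4)}

# A random choice between the two distributions, flattened to one distribution.
mixture = multiply([(Fraction(1, 3), fair), (Fraction(2, 3), biased)])

# Algebra law: averaging first and then taking the barycenter agrees with
# taking barycenters first and then averaging them.
lhs = expectation(mixture)
rhs = Fraction(1, 3) * expectation(fair) + Fraction(2, 3) * expectation(biased)
```

The last two lines check one instance of the Eilenberg–Moore algebra law; they do not prove it in general.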

SLIDE 12

Integration: the Eilenberg–Moore side

⊲ Let A be an Eilenberg–Moore algebra, e.g. A = R.
⊲ Then for p ∈ DX and a random variable f : X → A,

\[
  \int f \, dp := E[(Df)(p)].
\]

⊲ For g : Y → X and q ∈ DY, the change of variables formula

\[
  \int (f \circ g) \, dq = \int f \, d(Dg)(q)
\]

then holds by functoriality, D(f ◦ g) = D(f) ◦ D(g).
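The definition of the integral and the change of variables formula can be checked on a small example. The sketch below assumes finitely supported distributions stored as dictionaries with rational weights and takes A to be the rationals; `integrate`, `q`, `g` and `f` are illustrative names, not part of the talk.

```python
from fractions import Fraction

def pushforward(f, p):
    """Df applied to p (dict outcome -> weight)."""
    out = {}
    for x, c in p.items():
        out[f(x)] = out.get(f(x), Fraction(0)) + c
    return out

def expectation(p):
    """Algebra map E : DA -> A for A the rationals."""
    return sum(c * a for a, c in p.items())

def integrate(f, p):
    """Integral of f against p, defined as E[(Df)(p)]."""
    return expectation(pushforward(f, p))

# A distribution q on Y = {"a", "b", "c"}, a map g : Y -> X and f : X -> A.
q = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
g = {"a": 1, "b": 2, "c": 1}.get
f = lambda x: x * x

lhs = integrate(lambda y: f(g(y)), q)   # integral of f∘g against q
rhs = integrate(f, pushforward(g, q))   # integral of f against (Dg)(q)
```

Both sides evaluate to the same rational number, as the functoriality argument predicts.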

SLIDE 13

Measure theory without measure theory

Basic idea

A probability measure on X is an idealized version of a finite sample: elements (x1, . . . , xn) of X representing the uniform distribution

\[
  \frac{1}{n} \sum_i \delta_{x_i}.
\]

All constructions and proofs with probability measures should be reducible to constructions and proofs with finite samples. We construct a probability monad which implements this idea and makes it precise.

Let CMet be the category where
⊲ objects (X, dX) are complete metric spaces,
⊲ morphisms f : (X, dX) → (Y, dY) are short maps, dY(f(x), f(x′)) ≤ dX(x, x′).
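The finite-sample idea can be illustrated directly: a tuple of points determines its empirical distribution, and different samples may represent the same distribution. The following Python sketch is only an illustration; the function name `empirical` and the example sample are assumptions.

```python
from collections import Counter
from fractions import Fraction

def empirical(sample):
    """Uniform distribution (1/n) * sum of point masses at the sample points."""
    n = len(sample)
    return {x: Fraction(k, n) for x, k in Counter(sample).items()}

sample = ("a", "b", "a")
p = empirical(sample)

# Repeating every sample point the same number of times, or permuting the
# sample, changes the tuple but not the distribution it represents.
doubled = tuple(x for x in sample for _ in range(2))
reversed_sample = sample[::-1]
```

Here `p` gives weight 2/3 to "a" and 1/3 to "b", and the doubled and reversed samples represent the same measure.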

SLIDE 14

⊲ For S ∈ FinSet, we have the power functor CMet → CMet, X ↦ X^S.
⊲ We have isomorphisms X^1 ≅ X and X^(S×T) ≅ (X^S)^T.
⊲ These make the power functors into a graded monad on CMet, which is a lax monoidal functor FinUnif → [CMet, CMet].
⊲ Here, FinUnif ⊆ FinSet is the subcategory of nonempty sets and functions with uniform fibres.

SLIDE 15

Theorem (with Paolo Perrone, arXiv:1712.05363)

There is a left Kan extension

[Diagram: the functor FinUnif → [CMet, CMet], the unique functor ! : FinUnif → 1, and P : 1 → [CMet, CMet].]

in the 2-category of symmetric monoidal categories and lax monoidal functors, where P is a probability monad such that PX = {Radon measures on X with finite first moment}.

This reduces (parts of) measure and probability to combinatorics!

SLIDE 16

Categories of stochastic maps: the Kleisli side

Let C be a symmetric strict monoidal category where each object carries a distinguished commutative comonoid:

[String diagrams: the coassociativity, counit and commutativity equations of the comonoid.]

We think of this structure as providing copy and delete operations.

SLIDE 17

Definition

C is a category with comonoids if these comonoids are compatible with the monoidal structure, and deletion is natural:

[String diagram: f followed by deletion equals deletion.]

This makes C into a semicartesian monoidal category: we have natural maps X ⊗ Y → X and X ⊗ Y → Y, which are abstract versions of marginalization when composed with p : I → X ⊗ Y.

SLIDE 18

Example

Let FinStoch be the category of finite sets, where morphisms f : X → Y are stochastic matrices (fxy)x∈X, y∈Y,

\[
  f_{xy} \ge 0, \qquad \sum_{y} f_{xy} = 1.
\]

⊲ fxy is the probability that the output is y given the input x.
⊲ We also write f(y|x).
⊲ Composition of morphisms is given by the Chapman–Kolmogorov equation,

\[
  (g \circ f)(z|x) := \sum_{y} g(z|y) \, f(y|x).
\]
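Composition in FinStoch is matrix multiplication, which a short Python sketch makes explicit. It assumes a stochastic matrix f : X → Y is stored as a nested dictionary with `f[x][y]` equal to f(y|x), zero entries omitted; the two-state weather chain is a made-up example.

```python
from fractions import Fraction

def is_stochastic(f):
    """Entries are nonnegative and every row sums to 1."""
    return all(
        all(v >= 0 for v in row.values()) and sum(row.values()) == 1
        for row in f.values()
    )

def compose(g, f):
    """Chapman–Kolmogorov: (g∘f)(z|x) = sum over y of g(z|y) f(y|x)."""
    h = {}
    for x, row in f.items():
        out = {}
        for y, fy in row.items():
            for z, gz in g[y].items():
                out[z] = out.get(z, Fraction(0)) + gz * fy
        h[x] = out
    return h

# Hypothetical one-day weather transition probabilities.
step = {
    "sun": {"sun": Fraction(3, 4), "rain": Fraction(1, 4)},
    "rain": {"sun": Fraction(1, 2), "rain": Fraction(1, 2)},
}

# Two-day transition probabilities are obtained by composing step with itself.
two_step = compose(step, step)
```

The composite is again a stochastic matrix, which is what makes FinStoch a category.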

SLIDE 19

⊲ The monoidal structure is (g ⊗ f)(y, z|w, x) := g(y|w) f(z|x), with canonical symmetry isomorphism.
⊲ The copying operation is just copying,

\[
  \delta(x_1, x_2 | x) =
  \begin{cases}
    1 & \text{if } x_1 = x_2 = x, \\
    0 & \text{otherwise.}
  \end{cases}
\]

⊲ With this, FinStoch is a category with comonoids.
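The tensor product, the copy map and marginalization can be tried out on a small example. The Python sketch below assumes stochastic matrices stored as nested dictionaries (`f[x][y]` is f(y|x), zero entries omitted) and represents the monoidal unit I by the one-element set `{()}`; the projection `proj_first` plays the role of the map X ⊗ Y → X. All names and numbers are illustrative.

```python
from fractions import Fraction

ONE = Fraction(1)

def compose(g, f):
    """(g∘f)(z|x) = sum over y of g(z|y) f(y|x)."""
    h = {}
    for x, row in f.items():
        out = {}
        for y, fy in row.items():
            for z, gz in g[y].items():
                out[z] = out.get(z, Fraction(0)) + gz * fy
        h[x] = out
    return h

def tensor(g, f):
    """(g ⊗ f)(y, z | w, x) = g(y|w) f(z|x)."""
    return {
        (w, x): {(y, z): gy * fz for y, gy in g[w].items() for z, fz in f[x].items()}
        for w in g for x in f
    }

def identity(X):
    return {x: {x: ONE} for x in X}

def copy(X):
    """Copy map X -> X ⊗ X."""
    return {x: {(x, x): ONE} for x in X}

def proj_first(X, Y):
    """Marginalization X ⊗ Y -> X: keep the first component, delete the second."""
    return {(x, y): {x: ONE} for x in X for y in Y}

X, Y = [0, 1], ["a", "b"]
p = {(): {0: Fraction(1, 3), 1: Fraction(2, 3)}}            # a state I -> X
f = {0: {"a": Fraction(1, 2), "b": Fraction(1, 2)},
     1: {"a": Fraction(1, 4), "b": Fraction(3, 4)}}         # a channel X -> Y

# Joint state on X ⊗ Y: copy the input and send one copy through f.
joint = compose(tensor(identity(X), f), compose(copy(X), p))

# Marginalizing the joint state recovers the original state p.
marginal_X = compose(proj_first(X, Y), joint)
```

The second marginal of `joint` would similarly recover the composite of f with p.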

SLIDE 20

Deterministic morphisms

Definition

A morphism f : X → Y is deterministic if the comonoids are natural with respect to f:

[String diagram: applying f and then copying equals copying and then applying f to each copy.]

⊲ The deterministic morphisms form a cartesian monoidal subcategory.
⊲ In FinStoch, the deterministic morphisms are the stochastic matrices with entries in {0, 1}, i.e. the actual functions. They form a copy of FinSet.
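In FinStoch the determinism condition can be tested mechanically by comparing the two composites. The sketch below assumes the nested-dictionary representation of stochastic matrices (`f[x][y]` is f(y|x), zero entries omitted); the helper names and the two example channels are hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)

def compose(g, f):
    """(g∘f)(z|x) = sum over y of g(z|y) f(y|x)."""
    h = {}
    for x, row in f.items():
        out = {}
        for y, fy in row.items():
            for z, gz in g[y].items():
                out[z] = out.get(z, Fraction(0)) + gz * fy
        h[x] = out
    return h

def tensor(g, f):
    """(g ⊗ f)(y, z | w, x) = g(y|w) f(z|x)."""
    return {
        (w, x): {(y, z): gy * fz for y, gy in g[w].items() for z, fz in f[x].items()}
        for w in g for x in f
    }

def copy(X):
    return {x: {(x, x): ONE} for x in X}

def is_deterministic(f, Y):
    """Check copy_Y ∘ f == (f ⊗ f) ∘ copy_X for f : X -> Y."""
    X = list(f)
    return compose(copy(Y), f) == compose(tensor(f, f), copy(X))

Y = ["u", "v"]
function_like = {"a": {"u": ONE}, "b": {"v": ONE}}
noisy = {"a": {"u": Fraction(1, 2), "v": Fraction(1, 2)}, "b": {"u": ONE}}
```

For `noisy`, copying the output yields perfectly correlated copies, while running the channel twice on copied input yields independent outputs, so the two sides differ.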

SLIDE 21

Conditional independence

Categories with comonoids support several notions of conditional independence, including:

Definition

A morphism f : A → X ⊗ Y displays the conditional independence X ⊥ Y || A if there are g : A → X and h : A → Y such that

[String diagram: f equals copying A and then applying g to one copy and h to the other.]

One can derive the usual properties of conditional independence purely formally.
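In FinStoch this definition says f(x, y|a) = g(x|a) h(y|a). Marginalizing both sides shows that g and h, if they exist, must be the two marginals of f, so the condition can be tested as in the following Python sketch. The representation (nested dictionaries with rational entries) and the example models are assumptions made for illustration.

```python
from fractions import Fraction

def marginals(f):
    """Marginals g : A -> X and h : A -> Y of f : A -> X ⊗ Y."""
    g, h = {}, {}
    for a, row in f.items():
        g[a], h[a] = {}, {}
        for (x, y), pr in row.items():
            g[a][x] = g[a].get(x, Fraction(0)) + pr
            h[a][y] = h[a].get(y, Fraction(0)) + pr
    return g, h

def displays_conditional_independence(f):
    """Check f(x, y | a) == g(x | a) * h(y | a) with g, h the marginals of f."""
    g, h = marginals(f)
    return all(
        f[a].get((x, y), Fraction(0)) == g[a][x] * h[a][y]
        for a in f for x in g[a] for y in h[a]
    )

def two_flips(bias):
    """Two independent flips of a coin showing 1 with probability `bias`."""
    flip = {1: bias, 0: 1 - bias}
    return {(x, y): flip[x] * flip[y] for x in flip for y in flip}

# Given the coin type, the two flips are independent.
independent = {"fair": two_flips(Fraction(1, 2)), "biased": two_flips(Fraction(3, 4))}

# Two perfectly correlated outcomes are not conditionally independent.
correlated = {"fair": {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}}
```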

SLIDE 22

Almost surely

Definition

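Given p : Θ → X, morphisms f, g : X → Y are equal p-almost surely if

[String diagram: p followed by copying X and applying f to one copy equals the same diagram with g in place of f.]

⊲ Other concepts relativize similarly to almost surely concepts.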

Proposition

If gf = id, then g is f -almost surely deterministic.
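In FinStoch, p-almost sure equality of f and g amounts to p(x|θ) f(y|x) = p(x|θ) g(y|x) for all θ, x and y, so f and g only need to agree on inputs that p can actually produce. The Python sketch below illustrates this under the assumption that stochastic matrices are nested dictionaries with rational entries; the function name and the example data are hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)

def equal_almost_surely(p, f, g):
    """Check p(x|θ) f(y|x) == p(x|θ) g(y|x) for all θ, x, y."""
    for row in p.values():
        for x, px in row.items():
            outputs = set(f[x]) | set(g[x])
            for y in outputs:
                if px * f[x].get(y, Fraction(0)) != px * g[x].get(y, Fraction(0)):
                    return False
    return True

# f and g agree on inputs 0 and 1 but differ on input 2.
f = {0: {"a": ONE}, 1: {"b": ONE}, 2: {"a": ONE}}
g = {0: {"a": ONE}, 1: {"b": ONE}, 2: {"b": ONE}}

# p never produces the input 2, whereas p_full does.
p = {"theta": {0: Fraction(1, 2), 1: Fraction(1, 2)}}
p_full = {"theta": {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}}
```

With these definitions f and g are equal p-almost surely, but not p_full-almost surely and not equal as morphisms.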

SLIDE 23

Sufficient statistics

Definition

⊲ A statistical model is a morphism p : Θ → X.
⊲ A statistic for p is a deterministic split epimorphism s : X → T.
⊲ A statistic is sufficient if there is a splitting α : T → X such that

[String diagram equation in p, s and α, with input Θ and outputs X and T on both sides.]

SLIDE 24

Axiom

Suppose that gf = id. Then

[String diagram equation in g and f.]

⊲ This holds in FinStoch.
⊲ Now there is a completely formal version of a classical result of statistics:

Fisher–Neyman factorization theorem (preliminary)

If the axiom holds, a statistic s : X → T is sufficient for p : Θ → X if and only if there is a splitting α : T → X with αsp = p.
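The criterion αsp = p can be checked by hand in FinStoch for a textbook example that is not taken from the talk: two flips of a coin with unknown bias θ, with the number of heads as the statistic. The Python sketch below assumes the nested-dictionary representation of stochastic matrices, and the three parameter values are arbitrary.

```python
from fractions import Fraction
from itertools import product

def compose(g, f):
    """(g∘f)(z|x) = sum over y of g(z|y) f(y|x)."""
    h = {}
    for x, row in f.items():
        out = {}
        for y, fy in row.items():
            for z, gz in g[y].items():
                out[z] = out.get(z, Fraction(0)) + gz * fy
        h[x] = out
    return h

thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]   # assumed parameter values
X = list(product([0, 1], repeat=2))                          # outcomes of two flips

# Statistical model p : Θ -> X, two independent flips with bias θ.
p = {th: {x: th ** sum(x) * (1 - th) ** (2 - sum(x)) for x in X} for th in thetas}

# Statistic s : X -> T, the number of heads (a deterministic map).
s = {x: {sum(x): Fraction(1)} for x in X}

# Candidate splitting α : T -> X, uniform on the sequences with the given count.
alpha = {}
for t in range(3):
    fibre = [x for x in X if sum(x) == t]
    alpha[t] = {x: Fraction(1, len(fibre)) for x in fibre}

is_splitting = compose(s, alpha) == {t: {t: Fraction(1)} for t in range(3)}
reconstructs_model = compose(alpha, compose(s, p)) == p
```

Both checks succeed, matching the classical fact that the number of heads is sufficient for the bias: once the count is known, the order of the flips carries no further information about θ.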

SLIDE 25

Other preliminary results

Let p : Θ → X be a statistical model. We have abstract versions of other classical theorems of statistics:

Basu’s theorem

A complete sufficient statistic for p is independent of any ancillary statistic.

Bahadur’s theorem

If a minimal sufficient statistic exists, then a complete sufficient statistic is minimal sufficient.

SLIDE 26

A challenge: zero-one laws

Kolmogorov’s and Hewitt–Savage’s zero-one law

Let
⊲ (Xn)n∈N be a sequence of random variables,
⊲ A an event which is a function of the (Xn), and
  ⊲ independent of (Xn)n∈F for any finite F ⊆ N (Kolmogorov), or
  ⊲ invariant under finite permutation of the (Xn) (Hewitt–Savage).

Then p(A) ∈ {0, 1}.

⊲ A categorical reformulation and proof in a suitable class of categories with colimits may now be within reach.

SLIDE 27

A challenge: concentration of measure

Concentration of measure is the phenomenon that
⊲ if A is a set with p(A) ≥ 1/2 in a metric probability space,
⊲ then the ε-neighbourhood Aε satisfies p(Aε) ≈ 1.

Theorem (Lévy)

On the n-sphere S^n,

\[
  p(A_\varepsilon) \ge 1 - \sqrt{\frac{\pi}{8}} \, e^{-\varepsilon^2 n / 2} \approx 1.
\]

Law of large numbers

Let (Xn)n∈N be an i.i.d. sequence with E[Xn] = µ. Then

\[
  \lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| > \varepsilon \right) = 0.
\]

SLIDE 28

Summary

⊲ Categorical probability is currently like finding the sea route to India: several approaches with unclear relation.
⊲ This talk has sketched a biased sample of approaches.
⊲ It seems useful to distinguish:
  ⊲ Eilenberg–Moore category ⇒ integration and its properties.
  ⊲ Kleisli category ⇒ conditional independence, statistics.
⊲ A clearer overall picture may emerge once we have further concrete results.
⊲ The biggest challenge is to recover the specific analytical theorems of probability, such as the central limit theorem.