Categorical Probability and Statistics

Peter McCullagh
Department of Statistics, University of Chicago
June 5 2020

Outline
- Speaker background
- Remarks on Saunders MacLane
- Categorical notions in statistics
- Sampling and sub-sampling
- Simple random sampling
- Spectral sampling
- Linear representations for injective maps
- Sub-representations of Inj
- Sub-representations of $\mathrm{Inj}^2, \mathrm{Inj}^3, \ldots$
- Factorial subspaces

Speaker background: where is this speaker coming from?
- Randomness, repetitive structures, stochastic processes
- Samples and sub-samples; selection
- Simple random samples and sub-samples
- Sample values; symmetric functions; cumulants, k-statistics and polykays
- Inheritance under simple random sampling
- Spectral samples; spectral k-statistics, free cumulants
- Experimental design and structured samples; factorial design
- Linear models and factorial subspaces
- Symmetry and group representations
- Marginality and category representations
- Kolmogorov consistency
- Projective systems and infinite exchangeability

Recollections of Saunders MacLane, 1909–2005
- Semi-regular at the Quad-Club lunch
- Frequently joined the Stats table
- Very strong views on myriad topics
- Views freely expressed
- Occasionally mentioned category theory
- Had no interest in prob or stats
- Had no interest in applications of math
- Would undoubtedly regard this talk as trivial
- Saunders was a curmudgeon, usually friendly
- Saunders was an extrovert
- He loved debate, argument, controversy
- I learned about categories from Burt Totaro
- Also representation theory for categories
- Burt is the opposite of Saunders


Samples and sub-samples
- Universe: a set $\mathcal{U}$ of observational units, a.k.a. the population: the items (humans/mice/rats/drosophila/...) being studied
- Sample: the subset $U \subset \mathcal{U}$ actually chosen ($\#U < \infty$)
- Process: to each $u \in \mathcal{U}$ there corresponds a value $Y_u$
- Observation: to each $u \in U$ there corresponds an observation $Y_u$
- e.g., $Y_u \in \{0, 1\}$ (Covid-19 status), or $Y_u \in \mathbb{R}$ (height or weight or temp), or $Y_u \in \mathbb{R}^2$ (systolic, diastolic)
- Goal of statistics: given $Y \colon U \to \mathbb{R}$ observed on the sample, what can we say about $Y_u$ for extra-sample $u \in \mathcal{U} \setminus U$? (stochastic process)

Exchangeability and symmetric functions
- Equivalent samples: $\varphi \colon U' \to U$ (bijection)
- $n = \#U$ (sample size): all samples of the same size are equivalent (same distribution)
- Observation $Y \colon U \to \mathbb{R}$; $Y \in \mathbb{R}^U \cong \mathbb{R}^n$
- Symmetric function $h \colon \mathbb{R}^n \to \mathbb{R}$ as a statistical summary: $h(y_1, \ldots, y_n) = h(y_{\sigma(1)}, \ldots, y_{\sigma(n)})$
- Examples:
  $h(y) = y_{\cdot} = y_1 + \cdots + y_n$
  $h(y) = \bar y_n = (y_1 + \cdots + y_n)/n$
  $h(y) = \sum (y_i - \bar y_n)^2 / n$
  $h(y) = s_n^2 = \sum (y_i - \bar y_n)^2 / (n - 1)$
- The statistical problem with symmetric functions: the equivalence classes are isolated; there is nothing to connect samples of size 5 with samples of size 6
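The symmetry condition on the slide can be checked by brute force for a small sample. The following is a minimal sketch, not part of the talk: the function names (`total`, `mean`, `var_n`, `var_n1`, `is_symmetric`) and the data values are illustrative assumptions.

```python
from itertools import permutations

def total(y):
    return sum(y)

def mean(y):
    return sum(y) / len(y)

def var_n(y):
    # Sum of squared deviations with divisor n
    m = mean(y)
    return sum((v - m) ** 2 for v in y) / len(y)

def var_n1(y):
    # Sum of squared deviations with divisor n - 1 (s_n^2 on the slide)
    m = mean(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)

def is_symmetric(h, y, tol=1e-12):
    # h is symmetric on y if every reordering of y gives the same value
    # (up to floating-point rounding, hence the tolerance).
    base = h(y)
    return all(abs(h(list(p)) - base) <= tol for p in permutations(y))

y = [6.2, 4.8, 5.1, 3.2]  # illustrative values (assumption)
results = {h.__name__: is_symmetric(h, y) for h in (total, mean, var_n, var_n1)}

def first(v):
    # Not a symmetric function: it depends on the ordering
    return v[0]

print(results, is_symmetric(first, y))
```

Each check only compares orderings of one fixed sample size, which is the slide's point: symmetry alone says nothing about how $h$ on $\mathbb{R}^5$ relates to $h$ on $\mathbb{R}^6$.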

Simple random sampling
- A s.r.s. of size $n$ taken from 'population' $[N] = \{1, \ldots, N\}$
- (conventional) All subsets of size $n$ have equal probability
- (for today) each $\varphi \colon [n] \to [N]$ is 1–1, with probability $1/N^{\downarrow n}$
- $N^{\downarrow n} = N(N-1)\cdots(N-n+1) = \#\mathrm{Hom}([n], [N])$
- s.r.s. observation $y\varphi$ by composition: $[n] \xrightarrow{\varphi} [N] \xrightarrow{y} \mathbb{R}$
- Example: $N = 4$; $n = 3$; $y = (6.2, 4.8, 5.1, 3.2)$

$$
y\varphi \overset{\Delta}{=}
\begin{cases}
(6.2, 4.8, 5.1)\ [3!] & \text{w.p. } 1/4^{\downarrow 3}; \\
(6.2, 4.8, 3.2)\ [3!] & \text{w.p. } 1/4^{\downarrow 3}; \\
(6.2, 5.1, 3.2)\ [3!] & \text{w.p. } 1/4^{\downarrow 3}; \\
(4.8, 5.1, 3.2)\ [3!] & \text{w.p. } 1/4^{\downarrow 3}.
\end{cases}
$$
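The counting in this example can be reproduced by enumerating the injective maps directly. This is a small sketch under two stated assumptions: indices are 0-based (the slide uses $[N] = \{1, \ldots, N\}$), and the slide's "[3!]" is read as the $3!$ orderings of each 3-subset. The helper name `falling_factorial` is hypothetical.

```python
from collections import Counter
from itertools import permutations

def falling_factorial(N, n):
    # N(N-1)...(N-n+1), the number of injective maps [n] -> [N]
    out = 1
    for i in range(n):
        out *= N - i
    return out

N, n = 4, 3
y = (6.2, 4.8, 5.1, 3.2)

# Each tuple phi lists the images phi(0), ..., phi(n-1); distinct entries
# make phi injective, so this enumerates Hom([n], [N]) in Inj.
maps = list(permutations(range(N), n))

# The sample observed under phi is the composition y∘phi.
samples = [tuple(y[i] for i in phi) for phi in maps]

# Group ordered samples by their underlying set of values.
orderings = Counter(frozenset(s) for s in samples)

print(len(maps), falling_factorial(N, n))
for subset, count in orderings.items():
    print(sorted(subset, reverse=True), count, "orderings, each w.p.", f"1/{len(maps)}")
```

With all $N^{\downarrow n}$ maps equally likely, each of the four subsets carries total probability $3!/24 = 1/4$.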

Exchangeability and inheritance on the average
Illustration: $N = 4$; $n = 3$; $y = (6.2, 4.8, 5.1, 3.2)$

$$\bar y_N = k_{N,1}(y) = \sum y_i / N = 4.825$$
$$k_{N,2}(y) = \sum (y_i - \bar y_N)^2 / (N - 1) = 4.6075/3 \approx 1.5358$$
$$k_{N,3}(y) = \frac{N \sum (y_i - \bar y_N)^3}{(N - 1)(N - 2)} = -1.11375$$

$$k_{n,1}(y\varphi) \overset{\Delta}{=} \{5.367,\ 4.733,\ 4.833,\ 4.367\} \quad \text{w.p. 1/4 each}$$

$$\operatorname{ave}_\varphi k_{n,1}(y\varphi) = 4.825$$
$$\operatorname{ave}_\varphi k_{n,2}(y\varphi) \approx 1.5358$$
$$\operatorname{ave}_\varphi k_{n,3}(y\varphi) = -1.11375$$
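The inheritance property in this illustration can be verified numerically. The sketch below uses the three $k$-statistic formulas as written on the slide; the function names are assumptions. Because each statistic is symmetric, averaging over the four 3-subsets gives the same result as averaging over all 24 injective maps. With these formulas, $k_{N,2}(y)$ evaluates to $4.6075/3 \approx 1.5358$, since 4.6075 is the sum of squared deviations before dividing by $N - 1$.

```python
from itertools import combinations

def k1(y):
    return sum(y) / len(y)

def k2(y):
    n, m = len(y), k1(y)
    return sum((v - m) ** 2 for v in y) / (n - 1)

def k3(y):
    n, m = len(y), k1(y)
    return n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2))

def srs_average(stat, y, n):
    # Average of a symmetric statistic over all size-n subsets of y;
    # equal to the average over all injective maps [n] -> [N].
    subsets = list(combinations(y, n))
    return sum(stat(s) for s in subsets) / len(subsets)

y = (6.2, 4.8, 5.1, 3.2)
n = 3

sample_means = [round(k1(s), 3) for s in combinations(y, n)]
print("sample means:", sample_means)
for name, stat in (("k1", k1), ("k2", k2), ("k3", k3)):
    print(name, "population:", stat(y), "s.r.s. average:", srs_average(stat, y, n))
```

The population value and the s.r.s. average agree for each of the three statistics, up to floating-point rounding.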

Natural statistics with respect to S.R.S.
- A natural statistic $T$ of degree $d$ is a sequence of functions $T_n \colon \mathbb{R}^n \to \mathbb{R}$, defined for every $n \ge d \ge 0$
- For every $y \in \mathbb{R}^N$ and s.r.s. $\varphi \colon [n] \to [N]$:

$$\operatorname{Ave}_{\varphi \in \mathrm{Hom}([n], [N])} T_n(y\varphi) = T_N(y)$$

- In general, called U-statistics
- Polynomial functions: k-statistics and polykays
- Relation between symmetric functions on different spaces
- k-statistics (Fisher 1929); inheritance (Tukey 1950s)
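The defining identity separates natural statistics from symmetric functions that merely look similar. The sketch below, using exact rational arithmetic, tests three candidates on one data vector: $k_2$ (divisor $n - 1$), the plug-in variance (divisor $n$), and the polykay $k_{(1^2)} = (S_1^2 - S_2)/(n(n-1))$ written in power sums $S_r = \sum y_i^r$. The function names and the helper `is_natural_on` are hypothetical, and passing on one vector is a sanity check, not a proof.

```python
from fractions import Fraction
from itertools import combinations

def k2(y):
    n = len(y)
    m = sum(y) / n
    return sum((v - m) ** 2 for v in y) / (n - 1)

def plug_in_var(y):
    n = len(y)
    m = sum(y) / n
    return sum((v - m) ** 2 for v in y) / n

def polykay_11(y):
    # k_(1^2) = (S1^2 - S2) / (n(n-1)), the average of y_i * y_j over i != j
    n = len(y)
    s1, s2 = sum(y), sum(v * v for v in y)
    return (s1 * s1 - s2) / (n * (n - 1))

def is_natural_on(stat, y, n):
    # Exact check of Ave_phi T_n(y∘phi) == T_N(y) on this particular y.
    # Subsets suffice because stat is symmetric.
    subsets = list(combinations(y, n))
    return sum(stat(s) for s in subsets) / len(subsets) == stat(y)

y = [Fraction(s) for s in ("6.2", "4.8", "5.1", "3.2")]

for stat in (k2, polykay_11, plug_in_var):
    print(stat.__name__, [is_natural_on(stat, y, n) for n in (2, 3)])
```

The plug-in variance fails because its sub-sample average is $(n-1)/n$ times $k_{N,2}$ rather than $(N-1)/N$ times $k_{N,2}$; the divisor $n - 1$ is exactly what makes the sequence consistent across sample sizes.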

Statistical theory for spectral sampling
- Objects $Y$ are $n \times n$ matrices (symmetric or Hermitian)
- Functions $T_n(Y)$ are class functions: $T_n(UYU^*) = T_n(Y)$
- Statistics: $Y$ is a random $N \times N$ Hermitian matrix
- $Y$ is freely randomized if, for each unitary $U$, $Y \sim UYU^*$
- If $H \perp\!\!\!\perp Y$ is a random Haar-distributed matrix of order $N$, then $HYH^*$ is a freely randomized version of $Y$
- $(HYH^*)_{n \times n}$ is the leading $n \times n$ sub-matrix; then $(HYH^*)_{n \times n}$ is also freely randomized
- $\Lambda(Y) = \{\lambda_1, \ldots, \lambda_N\}$; $\Lambda\bigl((HYH^*)_{n \times n}\bigr)$ is a spectral sub-sample

Natural statistics for spectral samples
- A natural statistic $T$ of degree $d$ is a sequence of class functions $T_n \colon \mathcal{H}_n \to \mathbb{R}$, defined for every $n \ge d$
- For every $Y \in \mathcal{H}_N$:

$$\operatorname{Ave}_{H \in \mathrm{Haar}_N} T_n\bigl((HYH^*)_{n \times n}\bigr) = T_N(Y)$$

- Simplest examples:

$$k^\dagger_{(1)}(Y) = n^{-1} \operatorname{tr}(Y) = k_{(1)}(\lambda)$$
$$k^\dagger_{(2)}(Y) = \frac{1}{n^2 - 1} \sum (\lambda_i - \bar\lambda)^2 = \frac{k_{(2)}(\lambda)}{n + 1}$$
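The spectral inheritance identity can be probed by Monte Carlo for $k^\dagger_{(2)}$, written in power sums as $(nS_2 - S_1^2)/(n(n^2-1))$ with $S_r = \operatorname{tr}(Y^r)$. The sketch below is a rough simulation under stated assumptions, not the authors' method: $Y$ is taken diagonal (enough for class functions, since Haar randomization absorbs the eigenvector basis), the first $n$ rows of a Haar unitary are generated by Gram-Schmidt on complex Gaussian vectors, and the function names, seed, eigenvalues, and replication count are all illustrative choices.

```python
import math
import random

def haar_frame(n, N, rng):
    # First n rows of a Haar-distributed N x N unitary, obtained by
    # Gram-Schmidt on n independent standard complex Gaussian vectors.
    rows = []
    for _ in range(n):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
        for u in rows:
            c = sum(uk.conjugate() * vk for uk, vk in zip(u, v))
            v = [vk - c * uk for uk, vk in zip(u, v)]
        norm = math.sqrt(sum(abs(vk) ** 2 for vk in v))
        rows.append([vk / norm for vk in v])
    return rows

def leading_block(rows, lam):
    # (H Y H*)_{n x n} for Y = diag(lam): B[i][j] = sum_k a_ik * lam_k * conj(a_jk)
    n = len(rows)
    return [[sum(a * l * b.conjugate() for a, l, b in zip(rows[i], lam, rows[j]))
             for j in range(n)] for i in range(n)]

def k_dagger_2(n, s1, s2):
    # k†_(2) in power sums: (n*S2 - S1^2) / (n (n^2 - 1))
    return (n * s2 - s1 ** 2) / (n * (n * n - 1))

def k_dagger_2_of_matrix(B):
    n = len(B)
    s1 = sum(B[i][i].real for i in range(n))                         # tr(B)
    s2 = sum(abs(B[i][j]) ** 2 for i in range(n) for j in range(n))  # tr(B^2), B Hermitian
    return k_dagger_2(n, s1, s2)

rng = random.Random(2020)          # fixed seed, arbitrary choice
lam = [6.2, 4.8, 5.1, 3.2]         # eigenvalues of Y (illustrative)
N, n, reps = len(lam), 2, 20000

target = k_dagger_2(N, sum(lam), sum(l * l for l in lam))
estimate = sum(
    k_dagger_2_of_matrix(leading_block(haar_frame(n, N, rng), lam))
    for _ in range(reps)
) / reps

print("T_N(Y):", target)
print("Monte Carlo average of T_n over Haar:", estimate)
```

The two printed numbers should agree up to simulation error, which shrinks like $1/\sqrt{\text{reps}}$.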

Examples of natural spectral statistics (Di Nardo et al. 2013)

$$k^\dagger_{(2)} = \frac{nS_2 - S_1^2}{n(n^2 - 1)} = \frac{1}{n^2 - 1} \sum (\lambda_i - \bar\lambda)^2 = \frac{k_{(2)}}{n + 1}$$

$$k^\dagger_{(1^2)} = \frac{nS_1^2 - S_2}{n(n^2 - 1)} = k_{(1^2)} + \frac{k_{(2)}}{n + 1}$$

$$k^\dagger_{(3)} = 2\,\frac{2S_1^3 - 3nS_1S_2 + n^2S_3}{n(n^2 - 1)(n^2 - 4)} = \frac{2\,k_{(3)}}{(n + 1)(n + 2)}$$

$$k^\dagger_{(4)} = 6\,\frac{S_4(n^3 + n) - 4S_1S_3(n^2 + 1) + S_2^2(3 - 2n^2) + 10nS_1^2S_2 - 5S_1^4}{n(n^2 - 1)(n^2 - 4)(n^2 - 9)} = 6\,\frac{k_{(4)} + k_{(2^2)}}{(n + 1)(n + 2)(n + 3)}$$

$$k^\dagger_{(2^2)} = \frac{k_{(4)} + (n^2 + 6n + 6)\,k_{(2^2)}/n}{(n + 1)(n + 2)(n + 3)}$$
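The equalities between the power-sum forms and the classical $k$-statistics and polykays can be checked exactly with rational arithmetic. The sketch below assumes $S_r = \sum \lambda_i^r$ and uses the standard power-sum expressions for $k_{(2)}$, $k_{(1^2)}$, $k_{(3)}$, $k_{(4)}$ and $k_{(2^2)}$, which are not stated on the slide and are supplied here as assumptions. The eigenvalues, including the fifth value, are hypothetical. The statistic $k^\dagger_{(2^2)}$ is left out.

```python
from fractions import Fraction

def power_sums(lam):
    # S1..S4 with S_r = sum of lam_i ** r
    return tuple(sum(l ** r for l in lam) for r in (1, 2, 3, 4))

def classical(lam):
    # Standard k-statistics and polykays in power sums (assumed formulas).
    n = len(lam)
    S1, S2, S3, S4 = power_sums(lam)
    d2 = n * (n - 1)
    d3 = d2 * (n - 2)
    d4 = d3 * (n - 3)
    return {
        "k2": (n * S2 - S1**2) / d2,
        "k11": (S1**2 - S2) / d2,
        "k3": (2 * S1**3 - 3 * n * S1 * S2 + n**2 * S3) / d3,
        "k4": (-6 * S1**4 + 12 * n * S1**2 * S2 - 3 * n * (n - 1) * S2**2
               - 4 * n * (n + 1) * S1 * S3 + n**2 * (n + 1) * S4) / d4,
        "k22": (S1**4 - 2 * n * S1**2 * S2 + (n**2 - 3 * n + 3) * S2**2
                + 4 * (n - 1) * S1 * S3 - n * (n - 1) * S4) / d4,
    }

def spectral(lam):
    # Power-sum forms of the spectral statistics as given on the slide.
    n = len(lam)
    S1, S2, S3, S4 = power_sums(lam)
    e2 = n * (n**2 - 1)
    e3 = e2 * (n**2 - 4)
    e4 = e3 * (n**2 - 9)
    return {
        "kd2": (n * S2 - S1**2) / e2,
        "kd11": (n * S1**2 - S2) / e2,
        "kd3": 2 * (2 * S1**3 - 3 * n * S1 * S2 + n**2 * S3) / e3,
        "kd4": 6 * (S4 * (n**3 + n) - 4 * S1 * S3 * (n**2 + 1)
                    + S2**2 * (3 - 2 * n**2) + 10 * n * S1**2 * S2 - 5 * S1**4) / e4,
    }

lam = [Fraction(s) for s in ("6.2", "4.8", "5.1", "3.2", "7.0")]  # hypothetical spectrum
n = len(lam)
k, kd = classical(lam), spectral(lam)

checks = {
    "k†(2) = k(2)/(n+1)": kd["kd2"] == k["k2"] / (n + 1),
    "k†(1²) = k(1²) + k(2)/(n+1)": kd["kd11"] == k["k11"] + k["k2"] / (n + 1),
    "k†(3) = 2k(3)/((n+1)(n+2))": kd["kd3"] == 2 * k["k3"] / ((n + 1) * (n + 2)),
    "k†(4) = 6(k(4)+k(2²))/((n+1)(n+2)(n+3))":
        kd["kd4"] == 6 * (k["k4"] + k["k22"]) / ((n + 1) * (n + 2) * (n + 3)),
}
for label, ok in checks.items():
    print(label, ok)
```

The sample size must exceed 3 so that the factors $n^2 - 4$ and $n^2 - 9$ in the denominators are nonzero.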

Limiting behaviour as $n \to \infty$

Theorem (Di Nardo, McC and Senato (2013)). The normalized limit of $k^\dagger_{(r)}(Y)$ as $n \to \infty$ is the $r$th free cumulant. The normalized limit of $k^\dagger_{(r,s)}$ is the product of two free cumulants.

Categorical interpretation: random embeddings

Simple random samples : Spectral random samples
Inj: $[n] \xrightarrow{\varphi} [N]$ : Euclidean isometries $\mathbb{R}^n \xrightarrow{L} \mathbb{R}^N$
SRS: $[n] \rightsquigarrow [N]$ : Haar: $\mathbb{R}^n \rightsquigarrow \mathbb{R}^N$
pullback by composition : pullback by conjugation

$\#\mathrm{Inj}(n, N) = N^{\downarrow n}$; $\#\mathrm{SRS}(n, N) = 1$ for $n \le N$

Natural statistic is a natural transformation on functors


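The category of injective maps (Inj)
- Objects(Inj): finite sets $\Omega, \Omega', \ldots$
- Arrows(Inj): 1–1 maps (injective maps $\varphi \colon \Omega' \to \Omega$)
- Inj includes symmetric group(s): $[n] \xrightarrow{\varphi} [n]$
- $\#\mathrm{Hom}([m], [n]) = n^{\downarrow m}$ for $m \le n$; $0$ for $m > n$
- Representation of Inj: homomorphism $\mathrm{Inj} \to \mathrm{Lin}(\mathrm{Vect})$

$$
\begin{array}{ccc}
\mathrm{Inj} & \mathrm{Lin} & \mathrm{Lin} \\
\Omega & \mathbb{R}^{\Omega} & \mathbb{R}^{\Omega \times \Omega} \\
\big\uparrow \varphi & \big\downarrow \varphi^* & \big\downarrow \varphi^* \\
\Omega' & \mathbb{R}^{\Omega'} & \mathbb{R}^{\Omega' \times \Omega'}
\end{array}
$$

- $\Omega' \xrightarrow{\varphi} \Omega \xrightarrow{x} \mathbb{R}$; $\varphi^* \colon x \mapsto x\varphi \in \mathbb{R}^{\Omega'}$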
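The pullback representation can be made concrete for the finite sets $[m] = \{0, \ldots, m-1\}$. The sketch below is an illustration under assumptions: injective maps are stored as tuples of images with 0-based indices, and the function names (`hom`, `compose`, `pullback_vec`, `pullback_mat`) are hypothetical. It checks the Hom-set counts and the contravariant composition rule $(\varphi\psi)^* = \psi^*\varphi^*$ on both $\mathbb{R}^{\Omega}$ and $\mathbb{R}^{\Omega \times \Omega}$.

```python
from itertools import permutations

def hom(m, n):
    # All injective maps [m] -> [n]; phi[i] is the image of i (0-based).
    return list(permutations(range(n), m))

def compose(phi, psi):
    # phi∘psi: apply psi first, then phi.
    return tuple(phi[j] for j in psi)

def pullback_vec(phi, x):
    # phi^* x = x∘phi, sending a vector indexed by Omega to one indexed by Omega'.
    return tuple(x[i] for i in phi)

def pullback_mat(phi, A):
    # phi^* A has entries A[phi(i)][phi(j)]: select rows and columns together.
    return tuple(tuple(A[i][j] for j in phi) for i in phi)

x = (6.2, 4.8, 5.1, 3.2)                                      # a vector in R^[4]
A = tuple(tuple(10 * i + j for j in range(4)) for i in range(4))  # a matrix in R^([4] x [4])

# Contravariance: pulling back along phi∘psi equals pulling back along phi, then psi.
functorial = all(
    pullback_vec(compose(phi, psi), x) == pullback_vec(psi, pullback_vec(phi, x))
    and pullback_mat(compose(phi, psi), A) == pullback_mat(psi, pullback_mat(phi, A))
    for phi in hom(3, 4)
    for psi in hom(2, 3)
)

print(len(hom(2, 4)), len(hom(3, 3)), len(hom(5, 4)), functorial)
```

The automorphisms `hom(n, n)` are the permutations of $[n]$, so restricting the representation to a single object recovers the usual permutation action of the symmetric group.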
