
Inference in dynamical systems and the geometry of learning group actions
Geometry and Topology of Data
Sayan Mukherjee
Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University


  1. Discrete Hodge theory

Define the cochain complex
$$0 \longrightarrow \Omega^0(\Gamma) \overset{d}{\underset{\delta}{\rightleftarrows}} \Omega^1(\Gamma) \longrightarrow 0,$$
where
$$(df)_{ij} = f_i - f_j \quad \forall f \in \Omega^0(\Gamma), \qquad (\delta\omega)_i = \frac{1}{\deg(i)} \sum_{j \sim i} \omega_{ij} \quad \forall \omega \in \Omega^1(\Gamma).$$
Then
$$(L_0 f)_i := (\delta d f)_i = \frac{1}{\deg(i)} \sum_{j \sim i} (f_i - f_j) \quad \forall i \in V,\ \forall f \in \Omega^0(\Gamma).$$
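As a concrete illustration of the operator $L_0$ above, here is a minimal sketch (toy path graph, not from the slides) of the normalized 0-Laplacian $(L_0 f)_i = \frac{1}{\deg(i)}\sum_{j\sim i}(f_i - f_j)$:

```python
import numpy as np

# Toy sketch (assumed example): L0 on the path graph 0-1-2.
edges = [(0, 1), (1, 2)]
n = 3
deg = np.zeros(n)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

def L0(f):
    # (L0 f)_i = (1/deg(i)) * sum over neighbors j of (f_i - f_j)
    out = np.zeros_like(f, dtype=float)
    for i, j in edges:
        out[i] += (f[i] - f[j]) / deg[i]
        out[j] += (f[j] - f[i]) / deg[j]
    return out

print(L0(np.array([1.0, 2.0, 4.0])))
print(L0(np.ones(3)))   # constants lie in the kernel -> [0. 0. 0.]
```

As expected, constant cochains are harmonic: they span $\ker L_0$ on a connected graph.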

  2–3. Twisted De Rham–Hodge theory

$$(L_0 f)_i = \frac{1}{\deg(i)} \sum_{j \sim i} (f_i - f_j) \quad \forall f: V \to K,$$
$$(L_1 f)_i = \frac{1}{\deg(i)} \sum_{j \sim i} (f_i - \rho_{ij} f_j) \quad \forall f: V \to F.$$

Naively: set
$$(d_\rho f)_{ij} = f_i - \rho_{ij} f_j \quad \forall f \in C^0(\Gamma; F), \qquad (\delta_\rho \omega)_i = \frac{1}{\deg(i)} \sum_{j \sim i} \omega_{ij} \quad \forall \omega \in C^1(\Gamma; F),$$
then $L_1 = \delta_\rho d_\rho$. There is a problem.

  4–5. Issue: $d_\rho$ does not map into $C^1(\Gamma; F)$ (no skew-symmetry):
$$f_j - \rho_{ji} f_i = -\rho_{ji}(f_i - \rho_{ij} f_j) \neq -(f_i - \rho_{ij} f_j).$$

Fix: Interpret $f_i - \rho_{ij} f_j$ as the "local expression" of $(d_\rho f)_{ij}$ in a local trivialization over $\mathcal{U} = \{U_i \mid 1 \le i \le |V|\}$ of the associated $F$-bundle of $B_\rho$, denoted $B_\rho[F]$, such that the extra $\rho_{ji}$ factor encodes a bundle transformation from $U_i$ to $U_j$.

  6–7. Twisted De Rham–Hodge theory

• Combinatorial Hodge theory:
$$0 \longrightarrow \Omega^0(\Gamma) \overset{d}{\underset{\delta}{\rightleftarrows}} \Omega^1(\Gamma) \longrightarrow 0.$$
• Twisted combinatorial Hodge theory:
$$0 \longrightarrow C^0(\Gamma; F) \overset{d_\rho}{\underset{\delta_\rho}{\rightleftarrows}} \Omega^1(\Gamma; B_\rho[F]) \longrightarrow 0.$$

Theorem (Gao, Brodzki, M (2016)). Define $\Delta^{(0)}_\rho := \delta_\rho d_\rho$ and $\Delta^{(1)}_\rho := d_\rho \delta_\rho$; then the following Hodge-type decomposition holds:
$$C^0(\Gamma; F) = \ker \Delta^{(0)}_\rho \oplus \operatorname{im} \delta_\rho = \ker d_\rho \oplus \operatorname{im} \delta_\rho,$$
$$\Omega^1(\Gamma; B_\rho[F]) = \operatorname{im} d_\rho \oplus \ker \Delta^{(1)}_\rho = \operatorname{im} d_\rho \oplus \ker \delta_\rho.$$
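An illustrative numerical sketch (assumed toy setup, not the talk's code): for $d = 1$ and group $\{\pm 1\}$, an unnormalized variant of the twisted Laplacian $\Delta^{(0)}_\rho$ reduces to the signed-graph Laplacian $L = D - A_\rho$ with $A_\rho[i,j] = \rho_{ij}$. A connection with trivial holonomy has $0$ in its spectrum (a nontrivial kernel, i.e. a global section), while a frustrated one does not:

```python
import numpy as np

def signed_laplacian(n, signs):
    # Cycle graph on n vertices; signs[k] = rho on edge (k, k+1 mod n).
    A = np.zeros((n, n))
    for k in range(n):
        i, j = k, (k + 1) % n
        A[i, j] = A[j, i] = signs[k]
    D = np.diag(np.abs(A).sum(axis=1))
    return D - A

balanced = signed_laplacian(3, [1, 1, 1])     # trivial holonomy around the cycle
frustrated = signed_laplacian(3, [1, 1, -1])  # holonomy -1 around the cycle

print(np.linalg.eigvalsh(balanced).min())     # ~0: kernel is nontrivial
print(np.linalg.eigvalsh(frustrated).min())   # strictly positive
```

This is consistent with the decomposition above: $\ker \Delta^{(0)}_\rho$ is nontrivial exactly when a global section exists.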

  8–9. Recovering the GCL

An $O(d)$-valued edge potential $\xi \in C^1(\Gamma; O(d))$ is synchronizable if and only if there exists $g \in C^0(\Gamma; O(d))$ such that $\xi_{ij} = g_i g_j^{-1}$ for all $i \sim j$.

Define the frustration
$$\nu(\Gamma) = \inf_{g \in C^0(\Gamma; O(d))} \frac{1}{2d\,\mathrm{vol}(\Gamma)} \sum_{i,j \in V} w_{ij} \left\| g_i g_j^{-1} - \rho_{ij} \right\|_F^2 = \inf_{\xi \in C^1_{\mathrm{sync}}(\Gamma; O(d))} \frac{1}{2d\,\mathrm{vol}(\Gamma)} \sum_{i,j \in V} w_{ij} \left\| \xi_{ij} - \rho_{ij} \right\|_F^2,$$
where we define
$$C^1_{\mathrm{sync}}(\Gamma; O(d)) := \left\{ \xi \in C^1(\Gamma; O(d)) \;\middle|\; \xi \text{ synchronizable} \right\} = \left\{ \xi \in C^1(\Gamma; O(d)) \;\middle|\; \mathrm{Hol}_\xi(\Gamma) \text{ is trivial} \right\}.$$
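A small sketch of the synchronizability criterion (assumed toy setup on a single 3-cycle): an edge potential of the form $\rho_{ij} = g_i g_j^{-1}$ always has trivial holonomy, while flipping the sign of one edge destroys synchronizability:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_O2():
    # Random 2x2 orthogonal matrix via QR of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
    return q

g = [rand_O2() for _ in range(3)]
# rho_ij = g_i g_j^{-1} = g_i g_j^T, so rho is synchronizable by construction.
rho = {(i, j): g[i] @ g[j].T for i in range(3) for j in range(3) if i != j}

hol = rho[(0, 1)] @ rho[(1, 2)] @ rho[(2, 0)]
print(np.allclose(hol, np.eye(2)))       # True: trivial holonomy

# Flipping one edge's sign makes the holonomy -I != I:
hol_bad = rho[(0, 1)] @ rho[(1, 2)] @ (-rho[(2, 0)])
print(np.allclose(hol_bad, np.eye(2)))   # False
```

In the synchronizable case the frustration $\nu(\Gamma)$ is attained (and equals zero) at $\xi = \rho$ itself.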

  10. Find cochains in $C^0(\Gamma; \mathbb{R}^d) \cong \left[\mathbb{R}^d\right]^{|V|}$ "closest to" a global frame of $B_\rho$:
$$\eta(f) = \frac{\tfrac{1}{2}\sum_{i,j \in V} w_{ij} \left\| f_i - \rho_{ij} f_j \right\|^2}{\sum_{i \in V} d_i \left\| f_i \right\|^2} = \frac{\left\langle f, \Delta^{(0)}_\rho f \right\rangle}{\|f\|^2} = \frac{1}{2\,\mathrm{vol}(\Gamma)} [f]^\top L_1 [f] \quad \forall f \in C^0\!\left(\Gamma; S^{d-1}\right),\ \|f\| \neq 0.$$

  11. Learning group actions

  12. Learning group actions

Problem (Learning group actions). Given a group $G$ acting on a set $X$, simultaneously learn the actions of $G$ on $X$ and a partition of $X$ into disjoint subsets $X_1, \cdots, X_K$ such that each action is cycle-consistent on each $X_i$ ($1 \le i \le K$).

  13. Learning group actions

  14. Learning group actions by synchronization

Problem (Learning group actions by synchronization). Let $\mathcal{X}_K$ denote the set of all partitions of $\Gamma$ into $K$ nonempty connected subgraphs ($K \le n$), and define
$$\nu(S_i) = \inf_{f \in C^0(\Gamma; G)} \sum_{j,k \in S_i} w_{jk}\, \mathrm{Cost}_G(f_j, \rho_{jk} f_k), \qquad \mathrm{vol}(S_i) = \sum_{j \in S_i} d_j.$$
Solve the optimization problem
$$\min_{\{S_1, \cdots, S_K\} \in \mathcal{X}_K} \ \max_{1 \le i \le K} \ \frac{\nu(S_i)}{\mathrm{vol}(S_i)} \tag{1}$$
and output an optimal partition $\{S_1, \cdots, S_K\}$ together with the minimizing $f \in C^0(\Gamma; G)$.

  15–19. Algorithm: SynCut

Input: $\Gamma = (V, E, w)$, $\rho \in C^1(\Gamma; G)$, number of partitions $K$
Output: Partitions $\{S_1, \cdots, S_K\}$

1. Solve the synchronization problem over $\Gamma$ for $\rho$, obtaining $f \in C^0(\Gamma; G)$
2. Compute $d_{ij} = \exp\!\left(-w_{ij} \left\| f_i - \rho_{ij} f_j \right\|\right)$ on all edges $(i, j) \in E$
3. Run spectral clustering on the weighted graph $(V, E, d)$ to get $\{S_1, \cdots, S_K\}$
4. Solve the synchronization problem within each partition $S_j$, then "glue up" the local solutions to obtain $f^* \in C^0(\Gamma; G)$
5. Set $f \leftarrow f^*$ and repeat from Step 2
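Step 2 of SynCut can be sketched in the scalar case $d = 1$, $G = \{\pm 1\}$ (assumed toy data, not the talk's code): edges consistent with the current solution $f$ keep weight $\exp(0) = 1$, while inconsistent edges are exponentially downweighted, so the spectral-clustering step tends to cut them:

```python
import math

w = 1.0
f = {0: 1, 1: 1, 2: -1, 3: -1}            # current synchronization solution
rho = {(0, 1): 1, (2, 3): 1, (1, 2): 1}   # edge (1, 2) is inconsistent with f

# d_ij = exp(-w_ij * |f_i - rho_ij * f_j|)
d = {(i, j): math.exp(-w * abs(f[i] - r * f[j])) for (i, j), r in rho.items()}

print(d[(0, 1)], d[(2, 3)])   # consistent edges keep weight 1.0
print(d[(1, 2)])              # inconsistent edge: exp(-2), heavily downweighted
```

Iterating Steps 2–5 alternates between refining the partition and re-solving synchronization locally.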

  20. Dietary habits of primates

  21. Geometric morphometrics — second mandibular molar of a Philippine flying lemur (Cynocephalus volans)

  22–24. Geometric morphometrics

• Manually place $k$ landmarks $p_1, p_2, \cdots, p_k$
• Use the spatial coordinates of the landmarks as features: $p_j = (x_j, y_j, z_j)$, $j = 1, \cdots, k$
• Represent a shape as a point in $\mathbb{R}^{3 \times k}$

(Figure: second mandibular molar of a Philippine flying lemur)
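The landmark representation above supports a classical alignment step. The sketch below (toy random landmarks, not real molar data) represents a shape as a $3 \times k$ matrix and aligns two shapes with the orthogonal Procrustes solution via the SVD — the discrete, landmark-based analogue of the continuous distances on the following slides:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
S1 = rng.normal(size=(3, k))                 # shape 1: 3D coordinates of k landmarks
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
S2 = R_true @ S1                             # shape 2: an orthogonally transformed copy

# Orthogonal Procrustes: argmin_R ||R S1 - S2||_F over orthogonal R
U, _, Vt = np.linalg.svd(S2 @ S1.T)
R_hat = U @ Vt

print(np.linalg.norm(R_hat @ S1 - S2))       # ~0: alignment recovered exactly
```

The residual norm after alignment is the (landmark) Procrustes distance between the two shapes.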

  25. Shape distances: automated landmarks

• $d_{cW}(S_1, S_2)$: Conformal Wasserstein Distance (CWD)
• $d_{cP}(S_1, S_2)$: Continuous Procrustes Distance (CPD)
• $d_{cKP}(S_1, S_2)$: Continuous Kantorovich–Procrustes Distance (CKPD)

  26–28. Continuous Procrustes distance

Define $\mathcal{A}(S, S')$ as the set of area-preserving diffeomorphisms $a: S \to S'$ such that for any measurable subset $\Omega \subset S$,
$$\int_\Omega dA_S = \int_{a(\Omega)} dA_{S'}.$$
The continuous Procrustes distance is
$$d_P(S, S') = \inf_{a \in \mathcal{A}(S, S')} \ \min_{R \in \text{rigid motions}} \int_S |R(x) - a(x)|^2 \, dA_S.$$
Near-optimal $a$ are "almost" conformal, so the search above simplifies to a search over nearly conformal maps.

  29. The actions: $G$

$$d_{cP}(S_i, S_j) = \inf_{C \in \mathcal{A}(S_i, S_j)} \ \inf_{R \in E(3)} \left( \int_{S_i} \left\| R(x) - C(x) \right\|^2 \, d\mathrm{vol}_{S_i}(x) \right)^{1/2},$$
yielding the distance $d_{ij}$ together with a correspondence map $f_{ij}: S_i \to S_j$.

  30. 50 molars from 5 primate genera

  31. 5 primate genera

• Spider monkey
• Howler monkey
• Squirrel monkey
• Black-handed spider monkey
• Titi monkey

  32. Folivorous, frugivorous, and insectivorous

  33. Open questions

(1) Use Hodge structure to design synchronization algorithms
(2) Synchronization beyond the regime of linear algebraic groups
(3) Statistical complexity of learning group actions
(4) Random walks on fibre bundles
(5) Provable algorithms
(6) Synchronization on simplicial complexes
(7) Bayesian models

  34. Learning dynamical systems

  35. Microbial ecology

(Figure: posterior counts, with 95% credible intervals, of bacterial families over time in four vessels of an artificial gut, alongside balance-value time series.)

Justin Silverman and Lawrence David

  36–37. General framework

We consider mathematical models of the form:
• $\mathcal{X}$ is the "phase space";
• $X_t$ is the "true state" of the bioreactor (your stomach) at time $t$;
• $\mathcal{Y}$ is the "observation space";
• $Y_t$ is our observation at time $t$.

We only have access to the observations $\{Y_{t_k}\}_{k=0}^n$.

  38–40. General questions

Given access to the observations $\{Y_{t_k}\}_{k=0}^n$, we might want to ask:
• what is the "true state" of the bioreactor at time $t$? (filtering)
• what are we likely to observe at time $t_{n+1}$? (prediction)
• what are the rules governing the evolution of the system? (model selection / parameter estimation)

We'll focus on the last type of question.

  41–43. Basic assumptions

How are the variables $\{X_{t_k}\}_{k=0}^n$ and $\{Y_{t_k}\}_{k=0}^n$ related? We'll assume the process $(X_t, Y_t)_t$ has:
• stationarity: the rules governing both the state space and our observations don't change over time.
• the Markov property: given the microbial population today, the microbial population tomorrow is independent of the population yesterday.
• conditionally independent observations: given the state of the population today, today's observation is independent of any other variables.

Such systems are called "hidden Markov models" (HMMs).
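The three assumptions above can be sketched with a minimal two-state HMM simulation (assumed toy parameters): the transition rule is time-invariant (stationarity), each state depends only on the previous one (Markov property), and each observation depends only on the current state:

```python
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1],       # Markov transition matrix for X_t (fixed in time)
              [0.2, 0.8]])
emit = np.array([[0.8, 0.2],    # emission probabilities P(Y_t = y | X_t = x)
                 [0.3, 0.7]])

x, xs, ys = 0, [], []
for _ in range(1000):
    x = rng.choice(2, p=P[x])       # X_{t+1} depends only on X_t
    y = rng.choice(2, p=emit[x])    # Y_t depends only on X_t
    xs.append(x)
    ys.append(y)

print(np.mean(xs), np.mean(ys))     # empirical state / observation frequencies
```

The empirical frequency of state 1 approaches the chain's stationary distribution ($1/3$ for this $P$) as the trajectory grows.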

  44. HMMs

  45. Microbial ecology (figure repeated from slide 35)

  46. Dynamic linear models

$$x_{t+1} = A_{t+1} x_t, \qquad y_t = B_t x_t + v_t.$$
Here: $y_t$ is an observation in $\mathbb{R}^p$; $x_t$ is a hidden state in $\mathbb{R}^q$; $A_t$ is a $q \times q$ state transition matrix; $B_t$ is a $p \times q$ observation matrix; $v_t$ is a zero-mean noise vector in $\mathbb{R}^p$.
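A minimal simulation of the dynamic linear model above (assumed toy matrices; the state evolution is kept noiseless for simplicity): a hidden state in $\mathbb{R}^2$ rotates slowly, observed in $\mathbb{R}^1$ through $B$ with additive zero-mean noise:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],   # q x q state transition (rotation)
              [np.sin(theta),  np.cos(theta)]])
B = np.array([[1.0, 0.0]])                       # p x q observation matrix

x = np.array([1.0, 0.0])
ys = []
for _ in range(100):
    x = A @ x                                # x_{t+1} = A x_t
    y = B @ x + 0.01 * rng.normal(size=1)    # y_t = B x_t + v_t
    ys.append(y[0])

print(max(abs(np.array(ys))))  # bounded near 1: the rotation preserves the norm
```

The observations trace a noisy sinusoid; recovering $x_t$ from $\{y_t\}$ is exactly the filtering problem of the earlier slides.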

  47–49. Stochastic versus deterministic systems

Should the process $(X_t)_t$ be stochastic or deterministic?
• If the conditional distribution of $X_{t_{k+1}}$ given $X_{t_k}$ has positive variance, then we'll say the process $(X_t)_t$ is stochastic.
• Otherwise, we'll say the process $(X_t)_t$ is deterministic.

In ecology both types of systems are commonly used.

  50–52. Deterministic dynamics

Deterministic dynamics: for each $\theta$, there is a map $T_\theta: \mathcal{X} \to \mathcal{X}$ such that
$$X_{t+1} = T_\theta(X_t).$$
The Markov transition kernel is degenerate:
$$Q_\theta(x, y) = \delta_{T_\theta(x)}(y).$$
Such systems do not satisfy the strong stochastic mixing conditions used in previous work on HMMs.

  53–57. Setting for deterministic dynamics

Suppose that for each $\theta$ in $\Theta$ (the parameter space), we have a system $(\mathcal{X}, \mathscr{X}, T_\theta, \mu_\theta)$, where
• $\mathcal{X}$ is a complete separable metric space with Borel $\sigma$-algebra $\mathscr{X}$,
• $T_\theta: \mathcal{X} \to \mathcal{X}$ is a measurable map,
• $\mu_\theta$ is a probability measure on $(\mathcal{X}, \mathscr{X})$, and is $T_\theta$-invariant if
$$\mu_\theta(T_\theta^{-1} A) = \mu_\theta(A) \quad \forall A \in \mathscr{X},$$
• the measure-preserving system $(\mathcal{X}, \mathscr{X}, T_\theta, \mu_\theta)$ is ergodic if $T_\theta^{-1} A = A$ implies $\mu_\theta(A) \in \{0, 1\}$.

We abbreviate the family of systems $(\mathcal{X}, \mathscr{X}, T_\theta, \mu_\theta)_{\theta \in \Theta} \equiv (T_\theta, \mu_\theta)_{\theta \in \Theta}$.
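A concrete instance of such a family (an assumed standard example, not from the slides): the irrational rotation $T_\theta(x) = (x + \theta) \bmod 1$ on $[0, 1)$ preserves Lebesgue measure and is ergodic for irrational $\theta$, so by Birkhoff's theorem the time average of an observable along any orbit converges to its space average:

```python
import math

theta = math.sqrt(2) - 1     # irrational rotation number
x, total, n = 0.0, 0.0, 100000
for _ in range(n):
    total += x               # observable f(x) = x; space average = 1/2
    x = (x + theta) % 1.0    # deterministic, measure-preserving dynamics

print(total / n)             # -> approximately 0.5
```

The same time-average/space-average identity underlies parameter estimation from a single observed trajectory.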

  58–59. Mixing

Stochastic mixing: let $(X_t)_t$ be a stochastic process, and consider the function
$$\alpha(s) = \sup \left\{ |P(A \cap B) - P(A) P(B)| \;:\; A \in \mathscr{X}_{-\infty}^t,\ B \in \mathscr{X}_{t+s}^\infty \right\};$$
the process is strongly mixing if $\alpha(s) \to 0$ as $s \to \infty$.

Dynamical mixing: $T$ is strongly mixing if for all $A, B \in \mathscr{X}$,
$$\lim_{n \to \infty} \mu(T^{-n} A \cap B) = \mu(A)\,\mu(B).$$
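Dynamical mixing can be checked exactly for a standard example (an assumed illustration, not from the slides): for the doubling map $T(x) = 2x \bmod 1$ with $A = B = [0, 1/2)$, the preimage $T^{-n}A$ is a union of $2^n$ dyadic intervals, and $\mu(T^{-n}A \cap B) = \mu(A)\mu(B) = 1/4$ for every $n \ge 1$ — the limit in the definition is attained immediately:

```python
from fractions import Fraction

def preimage_measure_in_B(n):
    # T^{-n}[0, 1/2) = union over k of [k/2^n, k/2^n + 1/2^(n+1));
    # intersect each piece with B = [0, 1/2) and sum the lengths exactly.
    half = Fraction(1, 2)
    length = Fraction(1, 2 ** (n + 1))
    total = Fraction(0)
    for k in range(2 ** n):
        left = Fraction(k, 2 ** n)
        if left < half:
            total += min(left + length, half) - left
    return total

for n in range(1, 5):
    print(preimage_measure_in_B(n))   # -> 1/4 each time
```

Exact rational arithmetic (via `fractions.Fraction`) avoids the floating-point pitfalls of iterating the doubling map numerically.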
