SLIDE 1
Symplectic/Contact Geometry Related to Bayesian Statistics
by Atsuhide Mori (Osaka Dental University, Japan)
SLIDE 2
Geometric setting (1) The mind and statistics
- ℳ: manifold that stands for (a part of) the mind of an agent.
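- Each point of ℳ presents a probability density function on $\mathbb{R}^n$.
- Fix the state $V \in \mathcal{V}$ of ℳ, where $\mathcal{V} = \{\text{volume forms on } \mathcal{M} \text{ with finite total volume}\}$.
Then any statistic $f\colon \mathcal{M} \to \mathbb{R}^m$ obeys a probability distribution.
[Figure: $f\colon \mathcal{M} \to \mathbb{R}^m$ sends $y$ to $f(y)$; each $y \in \mathcal{M}$ presents a density on $\mathbb{R}^n\ (\ni z)$; $V/d\mathrm{Vol}$ is the density with which $y$, hence $f(y)$, occurs.]
This probability distribution is the FULL estimation of the statistic $f$. (A value and a confidence interval with an error bar are not enough!)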
The proportion in ℳ by volume defines the probability.
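As a minimal numerical sketch of this sentence, assume (hypothetically) that $\mathcal{M}$ is identified with the real line with coordinate $y$, that the state $V$ is an unnormalized Gaussian, and that the statistic is $f(y) = y^2$. Then $P(f \le c)$ is the volume of $\{y : f(y) \le c\}$ divided by the total volume. The function names are illustrative, not from the slides.

```python
import math

# Hypothetical setting: M is identified with the real line (coordinate y),
# and dVol is the Lebesgue measure dy.

def state_density(y: float) -> float:
    """Density V/dVol of an unnormalized Gaussian state V (finite total volume)."""
    return 3.0 * math.exp(-0.5 * (y - 1.0) ** 2)

def statistic(y: float) -> float:
    """A hypothetical statistic f: M -> R."""
    return y * y

def probability_f_at_most(c: float, lo: float = -10.0, hi: float = 10.0,
                          n: int = 20000) -> float:
    """P(f <= c) as a proportion of volume, using the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    inside = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        w = state_density(y) * h  # volume of the small cell around y
        total += w
        if statistic(y) <= c:
            inside += w
    return inside / total
```

For this $V$, the normalized state makes $y$ distributed as $N(1, 1)$, so `probability_f_at_most(1.0)` should be close to $\Phi(0) - \Phi(-2) \approx 0.477$. Evaluating it over many $c$ gives the full distribution of $f$, not just a point value.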
SLIDE 3
Geometric setting (2) Bayesian updating
- Given a point $z \in \mathbb{R}^n$, the agent updates $V \in \mathcal{V}$ as follows:
Each point $y$ of the mind $\mathcal{M}$ presents a probability density $\rho_y$ on $\mathbb{R}^n$. Change it to the likelihood $\rho(z)(y) = \rho_y(z)$, and define the "updating map" $\varphi\colon \mathbb{R}^n \times \mathcal{V} \ni (z, V) \mapsto \rho(z)V \in \mathcal{V}$.
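- Example. $\mathcal{M} = \{N(\mu, \Sigma) \mid \mu \in \mathbb{R}^n\}$ (for a fixed $\Sigma \in \mathcal{P}_n$), $f = \mu\colon \mathcal{M} \to \mathbb{R}^n$. This is the estimation of the mean $\mu$ of a normal distribution with a fixed covariance $\Sigma$ in the space $\mathcal{P}_n$ of positive definite real symmetric matrices. Then the updating map is $\varphi(z, V) = N(z, \Sigma)V \in \mathcal{V}$, namely,
$V \mapsto N(z_1, \Sigma)V \mapsto N(z_2, \Sigma)N(z_1, \Sigma)V \mapsto \cdots$.
(The symmetry $N(\mu, \Sigma)(z) = N(z, \Sigma)(\mu)$ implies $\rho(z) = N(z, \Sigma)$.)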
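A minimal sketch of the updating map in this example, assuming $n = 1$, a hypothetical fixed variance $\Sigma = 0.5$, and a grid discretization of $\mathcal{M}$ along the $\mu$-line. It applies $V \mapsto \rho(z)V$ pointwise and compares the normalized result with the standard conjugate-normal formula (precisions add). `update`, `moments`, and `closed_form` are illustrative names.

```python
import math

SIGMA2 = 0.5   # hypothetical fixed variance Sigma (n = 1)
H = 0.001      # grid step on M, identified with the mu-line
GRID = [-10.0 + H * (i + 0.5) for i in range(20000)]  # cell midpoints

def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update(V: list[float], z: float) -> list[float]:
    """phi(z, V) = rho(z) V, where rho(z)(mu) = N(mu, Sigma)(z)."""
    return [v * normal_pdf(z, mu, SIGMA2) for v, mu in zip(V, GRID)]

def moments(V: list[float]) -> tuple[float, float]:
    """Mean and variance of mu under the normalized form V / (integral of V)."""
    mass = sum(V) * H
    m = sum(v * mu for v, mu in zip(V, GRID)) * H / mass
    var = sum(v * (mu - m) ** 2 for v, mu in zip(V, GRID)) * H / mass
    return m, var

def closed_form(m0: float, a0: float, zs: list[float]) -> tuple[float, float]:
    """Conjugate update of a N(m0, a0) prior on mu after observing zs."""
    precision = 1.0 / a0 + len(zs) / SIGMA2
    return (m0 / a0 + sum(zs) / SIGMA2) / precision, 1.0 / precision
```

Starting from $V = 2\,N(0, 4)\,d\mathrm{Vol}$ and observing $z_1 = 1$, $z_2 = 2$, both routes should give mean $6/4.25 \approx 1.412$ and variance $1/4.25 \approx 0.235$. The total mass of the updated $V$ is in general not 1; normalization happens only when the moments are taken.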
SLIDE 4
Geometric setting (3) Conjugate prior
- Our subject is the updating map $\varphi\colon \mathbb{R}^n \times \mathcal{V} \ni (z, V) \mapsto \rho(z)V \in \mathcal{V}$.
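- A conjugate prior is a proper subset $\tilde{\mathcal{U}} \subset \mathcal{V}$ with $\varphi(\mathbb{R}^n \times \tilde{\mathcal{U}}) \subset \tilde{\mathcal{U}}$. Putting $\mathcal{U} = \{V \in \tilde{\mathcal{U}} \mid \int_{\mathcal{M}} V = 1\}$, we have $\tilde{\mathcal{U}} = \{kV \mid V \in \mathcal{U},\ k > 0\}$.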
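- Example. $\mathcal{M} = \{N(\mu, \Sigma) \mid \mu \in \mathbb{R}^n\}$ (for a fixed $\Sigma \in \mathcal{P}_n$), $f = \mu\colon \mathcal{M} \to \mathbb{R}^n$. Put
$\mathcal{U} = \{N(m, A)\,d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R})\}$. Then $\tilde{\mathcal{U}}$ is a conjugate prior.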
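- Suppose that the conjugate prior $\tilde{\mathcal{U}}$ is a manifold. We fix a “distance” $\tilde{D}\colon \tilde{\mathcal{U}} \times \tilde{\mathcal{U}} \to \mathbb{R}$, which satisfies none of the axioms of a distance, as the relative entropy
$\tilde{D}(V_1, V_2) = \int_{\mathbb{R}^n} V_1 \ln \frac{V_1}{V_2}$.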
- The restriction $\tilde{D}|_{\mathcal{U} \times \mathcal{U}} = D$ is non-negative (the KL-divergence).
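For $n = 1$ the relative entropy of scaled normal forms has a closed form, $\tilde{D}(k_1 N_1, k_2 N_2) = k_1\bigl(D(N_1, N_2) + \ln(k_1/k_2)\bigr)$, obtained by splitting the logarithm. The sketch below assumes the convention $\tilde{D}(V_1, V_2) = \int V_1 \ln(V_1/V_2)$ and parametrizes normals by variance; it checks the closed form against a midpoint-rule integral and shows that $\tilde{D}$ can be negative off $\mathcal{U}$, while $D$ on $\mathcal{U}$ is not. Function names are illustrative.

```python
import math

def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def kl_normal(m1: float, v1: float, m2: float, v2: float) -> float:
    """D(N(m1, v1), N(m2, v2)) = integral of N1 ln(N1 / N2) for 1-D normals."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def rel_entropy_scaled(k1, m1, v1, k2, m2, v2) -> float:
    """tilde D(k1 N1, k2 N2) = k1 * (D(N1, N2) + ln(k1 / k2)) for k1, k2 > 0."""
    return k1 * (kl_normal(m1, v1, m2, v2) + math.log(k1 / k2))

def rel_entropy_numeric(k1, m1, v1, k2, m2, v2,
                        lo=-20.0, hi=20.0, n=40000) -> float:
    """Midpoint-rule value of the integral of V1 ln(V1 / V2) with Vi = ki Ni."""
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        p = k1 * normal_pdf(x, m1, v1)
        q = k2 * normal_pdf(x, m2, v2)
        acc += p * math.log(p / q)
    return acc * h
```

For example, `rel_entropy_scaled(0.5, 0.0, 1.0, 1.0, 0.0, 1.0)` equals $0.5 \ln 0.5 < 0$, although the two normalized factors coincide.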
SLIDE 5
Geometric setting (4) Bayesian Information Geom.
- Each $y \in \mathcal{M}$ presents a volume form $\rho_y\,d\mathrm{Vol}$ on $\mathbb{R}^n \ni z$.
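- Given points $z_1, z_2, \ldots \in \mathbb{R}^n$, one updates the prior $P \in \tilde{\mathcal{U}}$ as $P \mapsto \rho(z_1)P \mapsto \rho(z_2)\rho(z_1)P \mapsto \cdots$, where $\rho(z)(y) = \rho_y(z)$. Normalizing the density turns this into a moving point on $\mathcal{U}$.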
- Generalized IG = the Fisher metric $g$ & the $\alpha$-connections, which are determined by $g$ and the cubic tensor $T$, where
$g$: the quadratic term of $\tilde{D}(P, P + dP) + \tilde{D}(P + dP, P)$,
$T$: the cubic term of $3\tilde{D}(P, P + dP) - 3\tilde{D}(P + dP, P)$.
The usual IG looks at the restrictions to the hypersurface $\mathcal{U}$. Bayesian IG is the geometric study of the updating maps in $\tilde{\mathcal{U}}$ and $\mathcal{U}$.
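On $\mathcal{U}$ with $n = 1$, writing a normal density as $N(m, s^2)$, the quadratic term of $D(P, P + dP) + D(P + dP, P)$ should reproduce the well-known Fisher metric of the normal family, $(dm^2 + 2\,ds^2)/s^2$ in the coordinates $P = (m, s)$. A quick numerical sketch of that statement, restricted to normalized forms; the names are illustrative:

```python
import math

def kl_normal_ms(m1: float, s1: float, m2: float, s2: float) -> float:
    """KL divergence D(N(m1, s1^2), N(m2, s2^2)) in the (m, s) coordinates."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def symmetrized(m: float, s: float, dm: float, ds: float) -> float:
    """D(P, P + dP) + D(P + dP, P); its quadratic term should be g(dP, dP)."""
    return kl_normal_ms(m, s, m + dm, s + ds) + kl_normal_ms(m + dm, s + ds, m, s)

def fisher_quadratic(s: float, dm: float, ds: float) -> float:
    """Fisher metric of the normal family in (m, s): (dm^2 + 2 ds^2) / s^2."""
    return (dm ** 2 + 2.0 * ds ** 2) / s ** 2
```

The ratio `symmetrized(m, s, eps*a, eps*b) / fisher_quadratic(s, eps*a, eps*b)` should tend to 1 as `eps` shrinks; the remaining deviation is of order `eps`, coming from the cubic and higher terms.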
SLIDE 6
Example of Bayesian IG (1) Two operations
- The point $\mu = y$ presents $\exp\bigl(-\tfrac{1}{2}(z - \mu)^{\mathsf{T}}\Sigma^{-1}(z - \mu)\bigr)\,d\mathrm{Vol}$ on $\mathbb{R}^n\ (\ni z)$.
- We have $\mathcal{U} = \{N(m, A)\,d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R})\}$ and $\tilde{\mathcal{U}}$ as before.
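- If $z$ repeats ($r$ times), the agent updates, e.g., $V = \exp\bigl(-\tfrac{1}{2v}\|\mu\|^2\bigr)\,d\mathrm{Vol} \in \tilde{\mathcal{U}}$ into
$\rho(z)^r V = \exp\bigl(-\tfrac{1}{2v}\|\mu\|^2 - \tfrac{r}{2}(\mu - z)^{\mathsf{T}}\Sigma^{-1}(\mu - z)\bigr)\,d\mathrm{Vol}$.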
- Two operations on $\mathcal{U} = \{N(m, A)\,d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R})\}$: “∗” from the convolution $N(m, A) * N(m', A')$, which presents $z + z'$, and “⋅” from the normalized pointwise product $k\,N(m, A) \cdot N(m', A')$. The above updating roughly corresponds to the iteration of “⋅” on $\mathcal{U}$.
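The two operations, written for $n = 1$ in the parameters $(m, A)$ (mean, variance), together with a check that $r$ iterations of “⋅” with $N(z, \Sigma)$ reproduce the normalized update of the prior $\exp(-\mu^2/(2v))$ after $z$ is observed $r$ times. Exact `Fraction` arithmetic makes the comparison an equality; the function names and the repetition count $r$ are our own notation.

```python
from fractions import Fraction

def conv(p, q):
    """'*': N(m, A) * N(m', A') = N(m + m', A + A'), the law of z + z'."""
    (m, a), (m2, a2) = p, q
    return (m + m2, a + a2)

def dot(p, q):
    """'.': normalized pointwise product of the densities N(m, A) and N(m', A')."""
    (m, a), (m2, a2) = p, q
    return ((m * a2 + a * m2) / (a + a2), a * a2 / (a + a2))

def repeated_update(v, z, sigma, r):
    """Normalize exp(-mu^2/(2v)) * rho(z)^r in closed form: precisions add."""
    precision = 1 / v + r / sigma
    return ((r * z / sigma) / precision, 1 / precision)

def iterate_dot(v, z, sigma, r):
    """Apply '.' with N(z, sigma) r times to the normalized prior N(0, v)."""
    p = (Fraction(0), v)
    for _ in range(r):
        p = dot(p, (z, sigma))
    return p
```

With $v = 4$, $z = 3/2$, $\Sigma = 1/2$ and $r = 5$, both functions return mean $60/41$ and variance $4/41$. The agreement holds for every $r$ because one “⋅” step adds $1/\Sigma$ to the precision.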
SLIDE 7
Example of Bayesian IG (2) Symmetry of D
- Assume $n = 1$ (temporarily). Write $P = (m, s) \in \mathcal{U}$, where $A = s^2$.
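$N(m, A) * N(m', A') = N(m + m', A + A') \Rightarrow P * P' = \bigl(m + m', \sqrt{s^2 + s'^2}\bigr)$
$N(m, A) \cdot N(m', A') = N\Bigl(\frac{mA' + Am'}{A + A'}, \frac{AA'}{A + A'}\Bigr) \Rightarrow P \cdot P' = \Bigl(\frac{ms'^2 + m's^2}{s^2 + s'^2}, \frac{ss'}{\sqrt{s^2 + s'^2}}\Bigr)$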
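- The correspondence $G = \bigl\{((m, s), (M, S)) \bigm| \frac{m}{s} + \frac{M}{S} = 0,\ sS = 1\bigr\}$ defines a diffeomorphism of $\mathcal{U}$ which interchanges “∗” and “⋅”, i.e., $(p, P), (p', P') \in G \subset \mathcal{U} \times \mathcal{U} \Rightarrow (p * p', P \cdot P'), (p \cdot p', P * P') \in G$.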
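- Take the “stereogram” $f(p, P) = D(p, p')$ of $D$, where $p'$ is determined by $(p', P) \in G$.
Then $f\colon \mathcal{U} \times \mathcal{U} \to \mathbb{R}$ is preserved under the transformations $((m, s), (M, S)) \mapsto ((e^{t}m, e^{t}s), (e^{-t}M, e^{-t}S))$ $(t \in \mathbb{R})$. Perhaps this is the first symmetry of the KL-divergence $D$ to be found.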
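A numerical sketch for $n = 1$ in the coordinates $P = (m, s)$. Solving $m/s + M/S = 0$ and $sS = 1$ gives the partner $(M, S) = (-m/s^2, 1/s)$; this map is an involution, so the $p'$ with $(p', P) \in G$ is `partner(P)`. The checks confirm that `partner` interchanges “∗” and “⋅” and that the stereogram is unchanged under the scaling by $t$. Names are illustrative.

```python
import math

def star(p, q):
    """P * P' in (m, s) coordinates: (m + m', sqrt(s^2 + s'^2))."""
    (m, s), (m2, s2) = p, q
    return (m + m2, math.hypot(s, s2))

def dot(p, q):
    """P . P' in (m, s) coordinates."""
    (m, s), (m2, s2) = p, q
    d = s * s + s2 * s2
    return ((m * s2 * s2 + m2 * s * s) / d, s * s2 / math.sqrt(d))

def partner(p):
    """The unique (M, S) with m/s + M/S = 0 and sS = 1, so (p, partner(p)) is in G."""
    m, s = p
    return (-m / s ** 2, 1.0 / s)

def kl(p, q):
    """KL divergence D(N(m1, s1^2), N(m2, s2^2))."""
    (m1, s1), (m2, s2) = p, q
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def stereogram(p, P):
    """f(p, P) = D(p, p') with (p', P) in G, i.e. p' = partner(P)."""
    return kl(p, partner(P))
```

The invariance reduces to the scale invariance of $D$: scaling $P$ by $e^{-t}$ scales $p' = $ `partner(P)` by $e^{t}$, the same factor as $p$.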
SLIDE 8
Example of Bayesian IG (3) Symplectic geometry
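- The space $\mathcal{U} \times \mathcal{U}$ carries the positive & negative symplectic structures
$d\lambda_\pm = \frac{dm \wedge ds}{s^2} \pm \frac{dM \wedge dS}{S^2}$
and their primitives $\lambda_\pm = \frac{dm}{s} \pm \frac{dM}{S}$.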
- Restricting the primitives $\lambda_\pm$ to the hypersurface $N = \{sS = 1\}\ (\supset G)$, we obtain a bi-contact structure, i.e., a transverse pair of positive & negative contact structures. Then $\lambda_\pm$ are their natural extensions.
- In general, a contact form $\eta$ and a function $h$ on a manifold $M$ determine
the contact Hamiltonian vector field $X$ via $\eta(X) = h$ and $\eta \wedge \mathcal{L}_X\eta = 0$. $X$ is the push-forward of the Hamiltonian vector field of $e^{t}h$ on the product $\mathbb{R}(\ni t) \times M$ with respect to the symplectic form $d(e^{t}\eta)$.
- $s = e^{-t-u}$, $S = e^{-t+u} \Rightarrow \lambda_\pm = e^{t}(e^{u}\,dm \pm e^{-u}\,dM)$, $N = \{t = 0\}$.
- Unless $h = h(m, s)$, there is no bi-contact Hamiltonian vector field.
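A worked check of the bi-contact claim (our own computation, not taken from the slides), using the substitution $s = e^{-t-u}$, $S = e^{-t+u}$, under which $(m, M, u)$ are coordinates on $N = \{t = 0\}$:

```latex
\begin{aligned}
\alpha_\pm &:= \lambda_\pm|_{N} = e^{u}\,dm \pm e^{-u}\,dM,\\
d\alpha_\pm &= e^{u}\,du\wedge dm \mp e^{-u}\,du\wedge dM,\\
\alpha_\pm\wedge d\alpha_\pm &= \mp\,dm\wedge du\wedge dM \pm dM\wedge du\wedge dm
  = \mp 2\,dm\wedge du\wedge dM \neq 0.
\end{aligned}
```

The two 3-forms have opposite signs everywhere, so $\ker\alpha_+$ and $\ker\alpha_-$ are contact structures inducing opposite orientations, and they are transverse because $\alpha_+$ and $\alpha_-$ are pointwise linearly independent. Moreover $\lambda_\pm = e^{t}\alpha_\pm$, which matches the symplectization form $d(e^{t}\eta)$ of the general construction with $\eta = \alpha_\pm$.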
SLIDE 9
Example of Bayesian IG (4) The Bayesian flow
- The correspondence $G \subset \mathcal{U} \times \mathcal{U}$ is Lagrangian with respect to $d\lambda_-$.
- There is a bi-contact Hamiltonian flow preserving the correspondence
$G \subset N \subset \mathcal{U} \times \mathcal{U}$. It is the one for the function $h = k\,\frac{m}{s}$ $(k \in \mathbb{R})$.
- The restriction of the flow to the correspondence $G$ can be presented
by a flow on the second factor. This flow interpolates the iteration
of the “⋅” product in logarithmic time. Thus we call it the Bayesian flow.
- The diffeomorphism of $\mathcal{U}$ defined by $G \subset \mathcal{U} \times \mathcal{U}$ sends any e-geodesic
to an e-geodesic (as an image). In particular, the iteration of the “∗” product is a discretization of an e-geodesic, which the diffeomorphism sends to a flow line of the above Bayesian flow.
- This has an application concerning the smoothness of a smoothing.
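A finite-difference sketch of the Lagrangian claim, assuming $n = 1$: parametrizing $G$ by its first factor, $(m, s) \mapsto (m, s, -m/s^2, 1/s)$, both pieces $dm \wedge ds/s^2$ and $dM \wedge dS/S^2$ pull back to $dm \wedge ds/s^2$. Hence the $dm \wedge ds$ coefficient of the pullback of $d\lambda_-$ should vanish, while that of $d\lambda_+$ should be $2/s^2$. The helper names are ours.

```python
def embed(m: float, s: float) -> tuple[float, float, float, float]:
    """Parametrize G by its first factor: (m, s) -> (m, s, M, S) = (m, s, -m/s^2, 1/s)."""
    return (m, s, -m / s ** 2, 1.0 / s)

def pullback_coefficient(sign: int, m: float, s: float, h: float = 1e-5) -> float:
    """Coefficient of dm^ds in the pullback to G of
    d(lambda_sign) = dm^ds / s^2 + sign * dM^dS / S^2 (sign = +1 or -1)."""
    # Partial derivatives of the embedding, by central differences.
    e_m = [(a - b) / (2 * h) for a, b in zip(embed(m + h, s), embed(m - h, s))]
    e_s = [(a - b) / (2 * h) for a, b in zip(embed(m, s + h), embed(m, s - h))]

    def wedge(i: int, j: int) -> float:
        # (dx_i ^ dx_j)(d/dm, d/ds) for the coordinates x = (m, s, M, S)
        return e_m[i] * e_s[j] - e_m[j] * e_s[i]

    S = embed(m, s)[3]
    return wedge(0, 1) / s ** 2 + sign * wedge(2, 3) / S ** 2
```

Since $G$ is 2-dimensional in the 4-dimensional $\mathcal{U} \times \mathcal{U}$, a vanishing pullback of $d\lambda_-$ is exactly the Lagrangian condition.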
SLIDE 10
Example of Bayesian IG (5) Multivariate Case
- Take the extended Cholesky decomposition of the covariance $A$.
- This defines the fiber-bundle projection (and therefore the foliation by
fibers) of the space of normal distributions to the unitriangular group.
- Then the fibers (i.e., the leaves) have special properties:
  - They are affine (thus flat) with respect to the e-connection.
  - They are closed under the two operations “∗” and “⋅”.
  - The product of any two leaves carries a pair of symplectic forms, the Lagrangian correspondence, the bi-contact hypersurface, and the Bayesian bi-contact Hamiltonian flow.
- The Bayesian approach could explain the extra dimensions in physics.
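A small exact check of the closure under the two operations for $n = 2$, assuming that the extended Cholesky decomposition means $A = L D L^{\mathsf T}$ with $L$ unit lower triangular and $D$ positive diagonal (our reading; the slide does not define it), so that a fiber fixes $L$. The covariance of “∗” is $A + A'$ and that of “⋅” is $(A^{-1} + A'^{-1})^{-1}$; both should keep the same $L$. Function names are illustrative.

```python
from fractions import Fraction

def ldl(A):
    """A = L D L^T for a 2x2 SPD matrix; returns (l, d1, d2) with L = [[1, 0], [l, 1]]."""
    (a, b), (_, c) = A
    return (b / a, a, c - b * b / a)

def from_ldl(l, d1, d2):
    """Rebuild L D L^T from the unitriangular entry l and the diagonal of D."""
    return ((d1, l * d1), (l * d1, l * l * d1 + d2))

def add(A, B):
    """Covariance of the convolution '*': A + A'."""
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(A, B))

def inv(A):
    """Inverse of a 2x2 symmetric matrix."""
    (a, b), (_, c) = A
    det = a * c - b * b
    return ((c / det, -b / det), (-b / det, a / det))

def dot_cov(A, B):
    """Covariance of the normalized product '.': (A^-1 + B^-1)^-1."""
    return inv(add(inv(A), inv(B)))
```

Algebraically, $A + A' = L(D + D')L^{\mathsf T}$ and $(A^{-1} + A'^{-1})^{-1} = L(D^{-1} + D'^{-1})^{-1}L^{\mathsf T}$, which is why the factor $L$ survives both operations and only the diagonal part changes.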