SLIDE 1

Symplectic/Contact Geometry Related to Bayesian Statistics

by Atsuhide Mori (Osaka Dental University, Japan)

SLIDE 2

Geometric setting (1) The mind and statistics

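  • $\mathcal{M}$: manifold that stands for (a part of) the mind of an agent.
  • Each point of $\mathcal{M}$ presents a probability density function on $\mathbb{R}^n$.
  • Fix the state $V \in \mathcal{V} = \{\text{volume forms with finite total on } \mathcal{M}\}$ of $\mathcal{M}$.

Then any statistic $f: \mathcal{M} \to \mathbb{R}^m$ obeys a probability distribution.

[Figure: the statistic $f: \mathcal{M} \to \mathbb{R}^m$ sends $y$ to $f(y)$; each $y \in \mathcal{M}$ presents a density on $\mathbb{R}^n (\ni z)$; $V / d\mathrm{Vol}$ is the density on $\mathcal{M}$ that gives the probability.]

This probability distribution is the FULL estimation of the statistic $f$. (A value and a confidence interval with error bar are not enough!)

The proportion in $\mathcal{M}$ by volume defines the probability.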

SLIDE 3

Geometric setting (2) Bayesian updating

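  • Given a point $z \in \mathbb{R}^n$, the agent updates $V \in \mathcal{V}$ in the following way:

Each point $y$ of the mind $\mathcal{M}$ presents a probability density $\rho_y$ on $\mathbb{R}^n$. Change it to the likelihood $\rho(z)(y) = \rho_y(z)$, and define the map $\varphi: \mathbb{R}^n \times \mathcal{V} \ni (z, V) \mapsto \rho(z)V \in \mathcal{V}$: “updating map”.

  • Example. $\mathcal{M} = \{ N(\mu, \Sigma) \mid \mu \in \mathbb{R}^n \}$ $(\Sigma \in \mathcal{P}_n)$, $f = \mu: \mathcal{M} \to \mathbb{R}^n$. This is the estimation of the mean $\mu$ of a normal distribution with a fixed covariance $\Sigma$ in the space $\mathcal{P}_n$ of positive definite real symmetric matrices. Then the updating map is $\varphi(z, V) = N(z, \Sigma)V \in \mathcal{V}$, namely,

$V \mapsto N(z_1, \Sigma)V \mapsto N(z_2, \Sigma)N(z_1, \Sigma)V \mapsto \cdots$.

(The symmetry $N(\mu, \Sigma)(z) = N(z, \Sigma)(\mu)$ implies $\rho(z) = N(z, \Sigma)$.)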
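
The updating map can be tried out numerically in one dimension. The sketch below is only an illustration under stated assumptions: the mind is discretized on a hypothetical grid of candidate means, the state is a list of unnormalized weights, and the prior, variance, and data values are made up. It multiplies the state pointwise by the likelihood for each observation and compares the resulting mean with the standard closed form for a normal prior.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the 1D normal N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update(grid, weights, z, sigma2):
    """Updating map: multiply the state pointwise by the likelihood of z.

    weights[i] is the (unnormalized) density of the state at mu = grid[i].
    By the symmetry N(mu, sigma2)(z) = N(z, sigma2)(mu), the likelihood
    as a function of mu is the density of N(z, sigma2).
    """
    return [w * normal_pdf(mu, z, sigma2) for mu, w in zip(grid, weights)]

def mean_of(grid, weights):
    """Expectation of the statistic f = mu under the normalized state."""
    total = sum(weights)
    return sum(mu * w for mu, w in zip(grid, weights)) / total

# Hypothetical numbers, chosen only for illustration.
sigma2 = 1.0                       # fixed variance of the model
prior_mean, prior_var = 0.0, 4.0   # initial state N(0, 4)
data = [1.2, 0.7, 1.5]             # observed points z_1, z_2, z_3

grid = [-10.0 + 0.01 * i for i in range(2001)]   # candidate means in [-10, 10]
weights = [normal_pdf(mu, prior_mean, prior_var) for mu in grid]
for z in data:
    weights = update(grid, weights, z, sigma2)

# Closed form for a normal prior: precisions add, means are precision-weighted.
post_prec = 1.0 / prior_var + len(data) / sigma2
post_mean = (prior_mean / prior_var + sum(data) / sigma2) / post_prec

grid_mean = mean_of(grid, weights)
print(grid_mean, post_mean)
```

The grid version works for any starting state, not only a normal one; the closed form is available here only because the starting state was chosen normal.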

SLIDE 4

Geometric setting (3) Conjugate prior

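  • Our subject is the updating map $\varphi: \mathbb{R}^n \times \mathcal{V} \ni (z, V) \mapsto \rho(z)V \in \mathcal{V}$.
  • A conjugate prior is a proper subset $\tilde{\mathcal{U}} \subset \mathcal{V}$ with $\varphi(\mathbb{R}^n \times \tilde{\mathcal{U}}) \subset \tilde{\mathcal{U}}$. Putting $\mathcal{U} = \{ V \in \tilde{\mathcal{U}} \mid \int_{\mathcal{M}} V = 1 \}$, we have $\tilde{\mathcal{U}} = \{ kV \mid V \in \mathcal{U},\ k > 0 \}$.
  • Example. $\mathcal{M} = \{ N(\mu, \Sigma) \mid \mu \in \mathbb{R}^n \}$ $(\Sigma \in \mathcal{P}_n)$, $f = \mu: \mathcal{M} \to \mathbb{R}^n$. Put $\mathcal{U} = \{ N(m, A)\, d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R}) \}$ $\Rightarrow$ $\tilde{\mathcal{U}}$: conjugate prior.
  • Suppose that the conjugate prior $\tilde{\mathcal{U}}$ is a manifold. We fix a “distance” $\tilde{D}: \tilde{\mathcal{U}} \times \tilde{\mathcal{U}} \to \mathbb{R}$, which satisfies none of the axioms of distance, as

$\tilde{D}(V_1, V_2) = \int_{\mathbb{R}^n} V_1 \ln \frac{V_1}{V_2}$ (the relative entropy).

  • The restriction $\tilde{D}|_{\mathcal{U} \times \mathcal{U}} = D$ is non-negative (KL-divergence).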
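
The contrast between the relative entropy on unnormalized forms and its restriction to normalized ones can be checked numerically. The following is a minimal sketch, assuming the convention that the relative entropy of V1 against V2 is the integral of V1 ln(V1/V2), restricted to one dimension, and using made-up parameters. It compares a midpoint-rule integral with the standard closed form for two normal densities, then scales one argument by a hypothetical factor 0.5 to show that the unnormalized quantity can be negative.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of the 1D normal with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def rel_entropy(v1, v2, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of the integral of v1 * ln(v1 / v2)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        a, b = v1(x), v2(x)
        if a > 0.0 and b > 0.0:   # skip points where a density underflows to 0
            total += a * math.log(a / b) * h
    return total

def kl_normal(m1, s1, m2, s2):
    """Closed-form KL divergence of N(m1, s1^2) from N(m2, s2^2)."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

# Hypothetical densities: p = N(0, 1), q = N(1, 2^2).
p = lambda x: normal_pdf(x, 0.0, 1.0)
q = lambda x: normal_pdf(x, 1.0, 2.0)

d_pq = rel_entropy(p, q)
d_qp = rel_entropy(q, p)
closed_pq = kl_normal(0.0, 1.0, 1.0, 2.0)
closed_qp = kl_normal(1.0, 2.0, 0.0, 1.0)

# On unnormalized forms the same integral can be negative:
# with V1 = 0.5 * p and V2 = p it equals 0.5 * ln(0.5).
d_scaled = rel_entropy(lambda x: 0.5 * p(x), p)

print(d_pq, closed_pq)   # both non-negative, and they agree
print(d_qp, closed_qp)   # differs from d_pq: no symmetry
print(d_scaled)          # negative
```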

SLIDE 5

Geometric setting (4) Bayesian Information Geom.

  • Each $y \in \mathcal{M}$ presents a volume form $\rho_y\, d\mathrm{Vol}$ on $\mathbb{R}^n (\ni z)$.
  • Given points $z_1, z_2, \ldots \in \mathbb{R}^n$, one updates the prior $P \in \tilde{\mathcal{U}}$ as $P \mapsto \rho(z_1)P \mapsto \rho(z_2)\rho(z_1)P \mapsto \cdots$, where $\rho(z)(y) = \rho_y(z)$. This corresponds to a point move on $\mathcal{U}$ by normalizing the density.
  • Generalized IG = the Fisher metric $g$ & $\alpha$-connections $\nabla - \alpha g^{*}T$

$g$: the quadratic term of $\tilde{D}(P, P + dP) + \tilde{D}(P + dP, P)$
$T$: the cubic term of $3\tilde{D}(P, P + dP) - 3\tilde{D}(P + dP, P)$

The usual IG looks at the restrictions to the hypersurface $\mathcal{U}$. Bayesian IG is the geometric study on the updating maps in $\tilde{\mathcal{U}}$ and $\mathcal{U}$.
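
For normalized one-dimensional normals the statement about the quadratic term can be checked by a finite-difference experiment. The sketch below is an illustration only: it works on the normalized family with coordinates (mean, standard deviation), uses the closed-form KL divergence, and picks a hypothetical base point and displacement. It compares the symmetrized divergence divided by the squared step with the well-known Fisher metric of the normal family, (dm^2 + 2 ds^2) / s^2.

```python
import math

def kl(p, q):
    """Closed-form KL divergence for 1D normals p = (m, s), q = (m2, s2)."""
    (m, s), (m2, s2) = p, q
    return math.log(s2 / s) + (s * s + (m - m2) ** 2) / (2.0 * s2 * s2) - 0.5

def symmetrized_quadratic(p, dp, eps=1e-4):
    """(D(p, p + eps*dp) + D(p + eps*dp, p)) / eps^2 for a small step eps."""
    (m, s), (dm, ds) = p, dp
    q = (m + eps * dm, s + eps * ds)
    return (kl(p, q) + kl(q, p)) / eps ** 2

def fisher(p, dp):
    """Fisher metric of the 1D normal family in (mean, sd) coordinates."""
    (m, s), (dm, ds) = p, dp
    return (dm ** 2 + 2.0 * ds ** 2) / s ** 2

# Hypothetical base point and displacement.
base = (0.3, 1.5)
step = (0.4, -0.2)

approx = symmetrized_quadratic(base, step)
exact = fisher(base, step)
print(approx, exact)
```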

SLIDE 6

Example of Bayesian IG (1) Two operations

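  • $\mu = y$ presents $\exp\left(-\frac{1}{2}(z - \mu)^{\mathrm{T}} \Sigma^{-1} (z - \mu)\right) d\mathrm{Vol}$ on $\mathbb{R}^n (\ni z)$.
  • We have $\mathcal{U} = \{ N(m, A)\, d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R}) \}$ and $\tilde{\mathcal{U}}$.
  • If $z$ repeats, the agent updates e.g. $V = \exp\left(-\frac{1}{2v}|\mu|^2\right) d\mathrm{Vol} \in \tilde{\mathcal{U}}$ into $\rho(z)^n V = \exp\left(-\frac{1}{2v}|\mu|^2 - \frac{n}{2}(\mu - z)^{\mathrm{T}} \Sigma^{-1} (\mu - z)\right) d\mathrm{Vol}$.
  • Two operations on $\mathcal{U} = \{ N(m, A)\, d\mathrm{Vol} \mid m \in \mathbb{R}^n,\ A \in \mathcal{P}(n, \mathbb{R}) \}$: “∗” from the convolution $N(m, A) * N(m', A')$ presenting $z + z'$, “⋅” from the normalized pointwise product $k\, N(m, A) \cdot N(m', A')$. The above updating roughly corresponds to the iteration of “⋅” on $\mathcal{U}$.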
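
The two operations and the repeated update can be sketched in one dimension. The code below is an illustration under assumptions: normals are stored as hypothetical (mean, variance) pairs, the function names are made up, and the numbers are arbitrary. It iterates the normalized pointwise product with the likelihood of a repeated observation and checks the result against the closed form obtained by completing the square in the exponent of the repeated update.

```python
def star(p, q):
    """Convolution: N(m, A) * N(m', A') = N(m + m', A + A')."""
    (m, a), (m2, a2) = p, q
    return (m + m2, a + a2)

def dot(p, q):
    """Normalized pointwise product of two 1D normal densities."""
    (m, a), (m2, a2) = p, q
    return ((m * a2 + a * m2) / (a + a2), a * a2 / (a + a2))

# Hypothetical numbers: prior exp(-mu^2 / (2 v)), model variance sigma2,
# and the same observation z repeated `repeats` times.
v, sigma2, z, repeats = 4.0, 1.0, 1.5, 5

state = (0.0, v)
for _ in range(repeats):
    state = dot(state, (z, sigma2))   # one update by the likelihood N(z, sigma2)

# Completing the square in -mu^2/(2v) - repeats*(mu - z)^2/(2*sigma2):
post_var = 1.0 / (1.0 / v + repeats / sigma2)
post_mean = post_var * repeats * z / sigma2

print(state, (post_mean, post_var))
print(star((1.0, 2.0), (0.5, 3.0)))   # distribution of a sum of independent normals
```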

SLIDE 7

Example of Bayesian IG (2) Symmetry of D

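  • Assume $n = 1$ (temporarily). Write $P = (m, s) \in \mathcal{U}$, where $A = s^2$.

$N(m, A) * N(m', A') = N(m + m', A + A')$ $\Rightarrow$ $P * P' = \left(m + m', \sqrt{s^2 + s'^2}\right)$

$N(m, A) \cdot N(m', A') = N\left(\frac{mA' + Am'}{A + A'}, \frac{AA'}{A + A'}\right)$ $\Rightarrow$ $P \cdot P' = \left(\frac{m s'^2 + m' s^2}{s^2 + s'^2}, \frac{s s'}{\sqrt{s^2 + s'^2}}\right)$

  • The correspondence $F = \left\{ ((m, s), (M, S)) \mid \frac{m}{s} + \frac{M}{S} = 0,\ sS = 1 \right\}$ defines a diffeomorphism of $\mathcal{U}$ which interchanges “∗” and “⋅”, i.e., $(p, P), (p', P') \in F \subset \mathcal{U} \times \mathcal{U}$ $\Rightarrow$ $(p * p', P \cdot P'), (p \cdot p', P * P') \in F$.

  • Take the “stereogram” $f(p, P) = D(p, p')$ of $D$ under $(p', P) \in F$.

Then $f: \mathcal{U} \times \mathcal{U} \to \mathbb{R}$ is preserved under the transformations $((m, s), (M, S)) \mapsto ((e^t m, e^t s), (e^{-t} M, e^{-t} S))$ $(t \in \mathbb{R})$. Perhaps this is the first symmetry of the KL-divergence $D$ to be found.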
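
Both claims on this slide can be spot-checked numerically. The sketch below is a check under assumptions, not a proof: normals are stored as hypothetical (mean, standard deviation) pairs, the correspondence is solved explicitly as (m, s) ↦ (−m/s², 1/s), the KL divergence uses its closed form for one-dimensional normals, and the sample points are arbitrary.

```python
import math

def star(p, q):
    """Convolution in (mean, sd) coordinates."""
    (m, s), (m2, s2) = p, q
    return (m + m2, math.hypot(s, s2))

def dot(p, q):
    """Normalized pointwise product in (mean, sd) coordinates."""
    (m, s), (m2, s2) = p, q
    d = s * s + s2 * s2
    return ((m * s2 * s2 + m2 * s * s) / d, s * s2 / math.sqrt(d))

def F(p):
    """Image (M, S) of p = (m, s) under m/s + M/S = 0 and s*S = 1."""
    m, s = p
    return (-m / (s * s), 1.0 / s)

def kl(p, q):
    """Closed-form KL divergence for 1D normals p = (m, s), q = (m2, s2)."""
    (m, s), (m2, s2) = p, q
    return math.log(s2 / s) + (s * s + (m - m2) ** 2) / (2.0 * s2 * s2) - 0.5

def stereogram(p, P):
    """f(p, P) = D(p, p') with (p', P) in the correspondence.

    F is its own inverse, so p' = F(P).
    """
    return kl(p, F(P))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(a, b))

# Hypothetical sample points.
p, q = (0.7, 1.2), (-0.4, 0.5)
swap_1 = close(F(star(p, q)), dot(F(p), F(q)))   # star goes to dot
swap_2 = close(F(dot(p, q)), star(F(p), F(q)))   # dot goes to star

# Scaling symmetry of the stereogram, for a hypothetical parameter t.
t = 0.9
P = (1.3, 0.8)
before = stereogram(p, P)
after = stereogram((math.exp(t) * p[0], math.exp(t) * p[1]),
                   (math.exp(-t) * P[0], math.exp(-t) * P[1]))
print(swap_1, swap_2, before, after)
```

The invariance of the stereogram reduces to the fact that the KL divergence between two normals is unchanged when both are rescaled by the same factor, because the inverse correspondence turns the scaling by $e^{-t}$ on the second factor into a scaling by $e^{t}$.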

SLIDE 8

Example of Bayesian IG (3) Symplectic geometry

  • The space $\mathcal{U} \times \mathcal{U}$ carries the positive & negative symplectic structures $d\lambda_\pm = \frac{dm \wedge ds}{s^2} \pm \frac{dM \wedge dS}{S^2}$ and their primitives $\lambda_\pm = \frac{dm}{s} \pm \frac{dM}{S}$.
  • Restricting the primitives $\lambda_\pm$ to the hypersurface $N = \{ sS = 1 \} \supset F$, we obtain a bi-contact structure, i.e., a transverse pair of positive & negative contact structures. Then $\lambda_\pm$ are their natural extensions.
  • In general, a contact form $\eta$ & a function $h$ on a manifold $M$ determine the contact Hamiltonian vector field $X$ via $\eta(X) = h$ & $\eta \wedge \mathcal{L}_X \eta = 0$. $X$ is the push-forward of the Hamiltonian vector field of $e^t h$ on the product $\mathbb{R} (\ni t) \times M$ with respect to the symplectic form $d(e^t \eta)$.
  • $s = e^{-t-u}$, $S = e^{-t+u}$ $\Rightarrow$ $\lambda_\pm = e^t \left( e^u dm \pm e^{-u} dM \right)$, $N = \{ t = 0 \}$.
  • Unless $h = h(m, s)$, there is no bi-contact Hamiltonian vector field.
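
The formulas on this slide can be verified by a short computation. The derivation below is a sketch: the names $\alpha_\pm$ for the restricted forms and the choice of $dm \wedge dM \wedge du$ as the reference orientation on the hypersurface are assumptions made here for illustration.

```latex
\begin{align*}
% The primitives differentiate to the stated symplectic forms.
d\!\left(\frac{dm}{s}\right) &= -\frac{ds \wedge dm}{s^{2}} = \frac{dm \wedge ds}{s^{2}},
&
d\lambda_\pm &= \frac{dm \wedge ds}{s^{2}} \pm \frac{dM \wedge dS}{S^{2}}. \\
% Coordinate change s = e^{-t-u}, S = e^{-t+u}.
\frac{dm}{s} &= e^{t+u}\,dm, \quad \frac{dM}{S} = e^{t-u}\,dM,
&
\lambda_\pm &= e^{t}\left(e^{u}\,dm \pm e^{-u}\,dM\right), \quad sS = e^{-2t}. \\
% Restriction to the hypersurface t = 0 with coordinates (m, M, u).
\alpha_\pm &:= \lambda_\pm|_{t=0} = e^{u}\,dm \pm e^{-u}\,dM,
&
d\alpha_\pm &= e^{u}\,du \wedge dm \mp e^{-u}\,du \wedge dM. \\
% Contact condition with opposite signs.
\alpha_\pm \wedge d\alpha_\pm &= \pm 2\, dm \wedge dM \wedge du.
\end{align*}
```

Since $\alpha_+ \wedge d\alpha_+$ and $\alpha_- \wedge d\alpha_-$ are nowhere zero and have opposite signs, and $\alpha_\pm$ are pointwise linearly independent, the two kernels form a transverse pair of a positive and a negative contact structure.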
SLIDE 9

Example of Bayesian IG (4) The Bayesian flow

  • The correspondence $F \subset \mathcal{U} \times \mathcal{U}$ is Lagrangian with respect to $d\lambda_-$.
  • There is a bi-contact Hamiltonian flow preserving the correspondence $F \subset N \subset \mathcal{U} \times \mathcal{U}$. It is the one for the function $h = k \frac{m}{s}$ $(k \in \mathbb{R})$.
  • The restriction of the flow to the correspondence $F$ can be presented by a flow on the second factor. Then the flow interpolates the iteration of the “⋅” product in a logarithmic time. Thus we call it the Bayesian flow.
  • The diffeomorphism of $\mathcal{U}$ defined by $F \subset \mathcal{U} \times \mathcal{U}$ sends any e-geodesic to an e-geodesic (as an image). In particular, the iteration of the “∗” product is a discretization of an e-geodesic, which the diffeomorphism sends to a flow line of the above Bayesian flow.

  • This has an application concerning the smoothness of a smoothing.
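
The relation between the two iterations can be illustrated for the simplest case, a normal combined with itself repeatedly. The sketch below is an illustration under assumptions: it uses hypothetical (mean, standard deviation) pairs, solves the correspondence explicitly as (m, s) ↦ (−m/s², 1/s), and looks only at the discrete iterates. It does not compute the Hamiltonian flow itself. It checks that the image of the n-fold “∗” iterate equals the n-fold “⋅” iterate of the image, that the mean stays fixed, and that the standard deviation shrinks like $1/\sqrt{n}$, so its logarithm is linear in $\ln n$.

```python
import math

def star(p, q):
    """Convolution in (mean, sd) coordinates."""
    (m, s), (m2, s2) = p, q
    return (m + m2, math.hypot(s, s2))

def dot(p, q):
    """Normalized pointwise product in (mean, sd) coordinates."""
    (m, s), (m2, s2) = p, q
    d = s * s + s2 * s2
    return ((m * s2 * s2 + m2 * s * s) / d, s * s2 / math.sqrt(d))

def F(p):
    """Image (M, S) of p = (m, s) under m/s + M/S = 0 and s*S = 1."""
    m, s = p
    return (-m / (s * s), 1.0 / s)

# Hypothetical starting point.
p = (0.8, 1.3)
P = F(p)

star_iter, dot_iter = p, P
rows = []
for n in range(2, 9):
    star_iter = star(star_iter, p)    # n-fold star iterate of p
    dot_iter = dot(dot_iter, P)       # n-fold dot iterate of P
    rows.append((n, F(star_iter), dot_iter))

for n, image, product in rows:
    # image and product agree; mean is constant; sd * sqrt(n) is constant
    print(n, image, product, product[1] * math.sqrt(n))
```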
SLIDE 10

Example of Bayesian IG (5) Multivariate Case

  • Take the extended Cholesky decomposition of the covariance $A$.
  • This defines the fiber-bundle projection (and therefore the foliation by fibers) of the space of normal distributions to the unitriangular group.
  • Then the fibers (i.e., the leaves) have special properties:
      • They are affine (thus flat) with respect to the e-connection.
      • They are closed under the two operations “∗” and “⋅”.
      • The product of any two leaves carries a pair of symplectic forms, the Lagrangian correspondence, the bi-contact hypersurface, and the Bayesian bi-contact Hamiltonian flow.

  • The Bayesian approach could explain the extra dimensions in physics.