SLIDE 1

A Wrapped Normal Distribution on Hyperbolic Space for Gradient Based Learning

ICML’19, Jun 12th, 2019

Yoshihiro Nagano (1), Shoichiro Yamaguchi (2), Yasuhiro Fujita (2), Masanori Koyama (2)

(1) Department of Complexity Science, The University of Tokyo, Japan
(2) Preferred Networks, Inc., Japan

Code: github.com/pfnet-research/hyperbolic_wrapped_distribution
Poster: 6:30-9:00 PM @Pacific Ballroom #7

SLIDE 2

Motivation

[Figure: tree with nodes Mammal, Primate, Human, Monkey, Rodent]

[Silver+2016]

SLIDE 3

Motivation

[Figure: tree with nodes Mammal, Primate, Human, Monkey, Rodent]

[Silver+2016]

Hierarchical Datasets Hyperbolic Space

[Image: wikipedia.org]

SLIDE 4

Motivation

[Figure: tree with nodes Mammal, Primate, Human, Monkey, Rodent]

[Silver+2016]

Hierarchical Datasets Hyperbolic Space

Volume increases exponentially with its radius
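The exponential growth of volume with radius can be made concrete in two dimensions. The sketch below compares the area of a Euclidean disc with the area of a disc of the same geodesic radius in the hyperbolic plane, assuming curvature -1 (the slide does not state a curvature). The function names are illustrative and do not come from the talk.

```python
import math


def euclidean_disc_area(r: float) -> float:
    """Area of a disc of radius r in the Euclidean plane: pi * r^2."""
    return math.pi * r ** 2


def hyperbolic_disc_area(r: float) -> float:
    """Area of a disc of geodesic radius r in the hyperbolic plane.

    Assumes constant curvature -1, where the area is 2*pi*(cosh(r) - 1).
    """
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


if __name__ == "__main__":
    # Polynomial growth (Euclidean) versus roughly e^r growth (hyperbolic).
    for r in (1.0, 2.0, 5.0, 10.0):
        print(
            f"r={r:5.1f}  "
            f"euclidean={euclidean_disc_area(r):12.1f}  "
            f"hyperbolic={hyperbolic_disc_area(r):12.1f}"
        )
```

For large r the hyperbolic area grows by about a factor of e per unit of radius, which is the same kind of growth as the number of nodes in a tree with a fixed branching factor.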

SLIDE 5

Motivation

[Figure: tree with nodes Mammal, Primate, Human, Monkey, Rodent]

[Silver+2016]

Hierarchical Datasets Hyperbolic Space

[Nickel+2017]

SLIDE 6

Motivation

[Figure: tree with nodes Mammal, Primate, Human, Monkey, Rodent]

[Silver+2016]

Hierarchical Datasets Hyperbolic Space

[Nickel+2017]

How can we extend these works to probabilistic inference?

SLIDE 7

Difficulty: Probability Distribution on Curved Space

VAEs with Riemannian distributions [Ovinnikov2019; Mathieu+2019]

  • Limited to a Gaussian with scalar variance
  • Need rejection sampling

⇒ Construct the distribution through its sampling procedure, for both a flexible density and straightforward sampling

SLIDE 8

Construction of Hyperbolic Wrapped Distribution

Lorentz model: define a probability distribution on the locally flat tangent space, then project its random variable onto hyperbolic space with parallel transport and the exponential map. The log-density can be obtained analytically by calculating the volumetric change.
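The construction above (sample in the tangent space, parallel transport, exponential map) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes curvature -1, the origin mu0 = (1, 0, ..., 0) of the hyperboloid, and an isotropic Gaussian with scalar standard deviation `sigma` for brevity. All function names are hypothetical; the released code at the repository listed on slide 1 may be organised differently.

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])


def parallel_transport_from_origin(v, mu):
    """Move v from the tangent space at mu0 = (1, 0, ..., 0) to the one at mu."""
    mu0 = np.zeros_like(mu)
    mu0[0] = 1.0
    alpha = -lorentz_inner(mu0, mu)
    coef = lorentz_inner(mu - alpha * mu0, v) / (alpha + 1.0)
    return v + coef * (mu0 + mu)


def exp_map(mu, u):
    """Exponential map at mu applied to the tangent vector u."""
    r = np.sqrt(max(lorentz_inner(u, u), 0.0))  # Lorentzian norm of u
    if r < 1e-12:
        return mu.copy()
    return np.cosh(r) * mu + np.sinh(r) * u / r


def sample_wrapped_normal(mu, sigma, rng):
    """Draw one sample on the hyperboloid around mu (assumed isotropic sigma)."""
    n = mu.shape[0] - 1
    # 1) Sample in R^n and embed it in the tangent space at the origin.
    v = np.concatenate([[0.0], rng.normal(0.0, sigma, size=n)])
    # 2) Transport the vector to the tangent space at mu.
    u = parallel_transport_from_origin(v, mu)
    # 3) Project onto hyperbolic space with the exponential map.
    return exp_map(mu, u)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([np.cosh(1.0), np.sinh(1.0), 0.0])  # a point on H^2
    z = sample_wrapped_normal(mu, 0.5, rng)
    print(z, lorentz_inner(z, z))  # the inner product should be close to -1
```

Every step is differentiable in `mu` and `sigma`, and no rejection step is involved, which is what makes this kind of construction convenient for gradient-based learning.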
SLIDE 11

Properties of Hyperbolic Wrapped Distribution

[Equations: Density; Projection; tangent space ≃ ℝⁿ]
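For orientation, the density of a distribution built by parallel transport followed by the exponential map can be written with the change-of-variables formula. The block below is a sketch under assumptions: curvature -1, a Gaussian N(0, Σ) on the tangent space at the origin μ₀, and the symbols v, u, z, which are chosen here for illustration and are not recovered from the slide. Parallel transport preserves the Lorentzian norm and contributes a unit determinant, so only the exponential map changes volume.

```latex
\begin{aligned}
v &\sim \mathcal{N}(0, \Sigma), \qquad v \in T_{\mu_0}\mathbb{H}^n \simeq \mathbb{R}^n, \\
u &= \mathrm{PT}_{\mu_0 \to \mu}(v), \qquad z = \exp_{\mu}(u), \\
\log p(z) &= \log p(v) - \log \det\!\left(\frac{\partial z}{\partial v}\right)
           = \log p(v) - (n-1)\,\log\!\left(\frac{\sinh \lVert u \rVert_{\mathcal{L}}}{\lVert u \rVert_{\mathcal{L}}}\right).
\end{aligned}
```

Because both maps are invertible, v can be recovered from any z, so the log-density can be evaluated at arbitrary points and not only at samples.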

SLIDE 12

Numerical Evaluations

Variational Autoencoder

[Figure: (a) A tree representation of the training dataset; (b) Normal VAE (β = 1.0); (c) Hyperbolic VAE]

The Hyperbolic VAE learned not only the true hierarchical structure but also noisy unseen data, without any explicit knowledge of the tree.

Word embedding

Our model outperformed its Euclidean counterpart on the WordNet nouns dataset, with higher MAP at every n in the table.

      Euclid                   Hyperbolic
n     MAP          Rank        MAP          Rank
5     0.296±.006   25.09±.80   0.506±.017   20.55±1.34
10    0.778±.007   4.70±.05    0.795±.007   5.07±.12
20    0.894±.002   2.23±.03    0.897±.005   2.54±.20
50    0.942±.003   1.51±.04    0.975±.001   1.19±.01
100   0.953±.002   1.34±.02    0.978±.002   1.15±.01

SLIDE 13

Conclusion

  • Proposed a projection-based probability distribution on hyperbolic space that is easy to use with gradient-based learning.
  • Constructed the wrapped normal distribution on the Lorentz model by projecting a random variable from the locally flat tangent space.
  • Numerically evaluated the performance of our model on various datasets, including MNIST, Atari 2600 Breakout, and WordNet.
