SLIDE 1

The continuous categorical: a novel simplex-valued exponential family

Elliott Gordon-Rodríguez, Gabriel Loaiza-Ganem, John P. Cunningham

https://arxiv.org/abs/2002.08563

ICML 2020

SLIDE 2

Motivation: compositional data

Definition (simplex): $\mathbb{S}^K := \{x \in \mathbb{R}_+^K : \sum_{i=1}^{K} x_i = 1\}$
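As a small illustration of the definition, the sketch below rescales raw non-negative parts, such as vote counts, onto $\mathbb{S}^K$ (the usual "closure" of compositional data) and checks membership up to a floating-point tolerance. The helper names `closure` and `in_simplex`, the tolerance, and the counts are hypothetical, chosen only for illustration.

```python
import numpy as np


def closure(parts):
    """Rescale non-negative parts (e.g. counts) so they sum to 1, landing in S^K."""
    parts = np.asarray(parts, dtype=float)
    if np.any(parts < 0):
        raise ValueError("parts must be non-negative")
    total = parts.sum()
    if total <= 0:
        raise ValueError("at least one part must be positive")
    return parts / total


def in_simplex(x, tol=1e-9):
    """True if x lies in S^K = {x in R^K_+ : sum_i x_i = 1}, up to a float tolerance."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and abs(x.sum() - 1.0) <= tol)


# Hypothetical vote counts for four parties in one constituency.
shares = closure([12000, 8000, 0, 4000])
print(shares, in_simplex(shares))  # a zero share is still a valid point of S^K
```

Note that exact zeros are legitimate points of $\mathbb{S}^K$, which is what makes them a practical concern for the likelihoods discussed next.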

Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family.

SLIDE 3

Motivation: compositional data

Examples:
◮ Geology
◮ Chemistry
◮ Microbiology
◮ Genetics
◮ Economics
◮ Politics
◮ Machine learning

SLIDE 7

Shortcomings of the Dirichlet

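Definition: $x \sim \mathrm{Dirichlet}(\alpha)$ if $x \in \mathbb{S}^K$ with density
$$p(x; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}. \tag{1}$$

◮ Extrema. $\log p(x; \alpha) \to \pm\infty$ as $x_j \to 0$ (to $+\infty$ if $\alpha_j < 1$, to $-\infty$ if $\alpha_j > 1$). ∴ the log-likelihood is undefined in the presence of zeros.
◮ Bias. Rewrite the density in canonical form
$$p(x; \alpha) = h(x) \exp\Big(\sum_{i=1}^{K} \alpha_i \log x_i - A(\alpha)\Big).$$
The sufficient statistics are $\log x_i$, so by the theory of exponential families the MLE is unbiased for $\mathbb{E}[\log x_j]$. ∴ the MLE is biased for the mean $\mu_j = \mathbb{E}[x_j]$.
◮ Flexibility. If $x_0 \in \mathbb{S}^K$ is a single (interior) data point, then $\log p(x_0; \alpha) \to \infty$ as $\alpha \to \infty$ along $\alpha = k x_0$, $k \to \infty$. ∴ the Dirichlet log-likelihood is ill-behaved under flexible predictive models (e.g. GLMs, neural networks).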
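The Extrema and Flexibility points can be checked numerically from equation (1). This is a minimal sketch using only Python's standard library; `dirichlet_logpdf` is a hypothetical helper, and the parameter values are arbitrary.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log of equation (1): -log B(alpha) + sum_i (alpha_i - 1) * log(x_i)."""
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return -log_beta + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


# Extrema: with alpha_1 < 1, log p grows without bound as x_1 -> 0.
alpha = [0.5, 2.0, 2.0]
for eps in (1e-2, 1e-4, 1e-8):
    x = [eps, (1 - eps) / 2, (1 - eps) / 2]
    print(f"x_1 = {eps:.0e}: log p = {dirichlet_logpdf(x, alpha):.2f}")
# At x_1 = 0 exactly, math.log raises ValueError: the log-likelihood is undefined.

# Flexibility: one interior data point x0, with alpha = k * x0 and k growing.
x0 = [0.2, 0.3, 0.5]
for k in (10, 1e3, 1e6):
    print(f"k = {k:.0e}: log p(x0; k x0) = {dirichlet_logpdf(x0, [k * v for v in x0]):.2f}")
```

The growth along $\alpha = k x_0$ is slow (roughly logarithmic in $k$) but unbounded, which is enough for a flexible model to chase it.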

SLIDE 11

Solution: a new exponential family

Definition: $x \in \mathbb{S}^K$ follows a continuous categorical (CC) distribution with parameter $\lambda \in \mathbb{S}^K$ if
$$x \sim \mathcal{CC}(\lambda) \iff p(x; \lambda) \propto \prod_{i=1}^{K} \lambda_i^{x_i}.$$

◮ Extrema. $\log p(x; \lambda)$ is finite at the extrema of the simplex. ∴ the log-likelihood is well-defined in the presence of zeros.
◮ Bias. Rewrite the CC density in canonical form
$$p(x; \lambda) \propto \exp\Big(\sum_{i=1}^{K} \log(\lambda_i) \cdot x_i\Big).$$
The sufficient statistic is $x$ itself, so ∴ by the theory of exponential families, the MLE is unbiased for the mean $\mu_j = \mathbb{E}[x_j]$.
◮ Flexibility. The CC density is convex in $x$. ∴ it cannot represent interior modes, cannot concentrate mass on interior points, and its log-likelihood does not diverge.
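A minimal NumPy sketch of the CC log-density up to its normalizing constant, assuming all $\lambda_i > 0$. Because $\sum_i x_i \log \lambda_i$ is linear in $x$, it stays finite at the vertices, and over the simplex it is maximized at the vertex with the largest $\lambda_i$, so the density has no interior mode. The helper name `cc_log_kernel` and the values of $\lambda$ are hypothetical.

```python
import numpy as np


def cc_log_kernel(x, lam):
    """Unnormalized CC log-density sum_i x_i * log(lambda_i); needs every lambda_i > 0."""
    return float(np.dot(np.asarray(x, dtype=float), np.log(np.asarray(lam, dtype=float))))


def density(x):
    """Unnormalized CC density exp(sum_i x_i * log(lambda_i)) for the lam below."""
    return np.exp(cc_log_kernel(x, lam))


lam = np.array([0.6, 0.3, 0.1])

# Extrema: at a vertex the zero coordinates contribute 0 * log(lambda_i) = 0.
vertex = np.array([0.0, 0.0, 1.0])
print(cc_log_kernel(vertex, lam))  # finite, equal to log(0.1)

# Flexibility: the kernel is linear in x, so its maximum over S^K sits at the
# vertex of the largest lambda_i; no sampled interior point should beat it.
best = cc_log_kernel(np.eye(3)[np.argmax(lam)], lam)
points = np.random.default_rng(0).dirichlet(np.ones(3), size=1000)
print(max(cc_log_kernel(p, lam) for p in points) <= best)

# The density exp(kernel) is convex in x: check one midpoint inequality.
a, b = np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.1, 0.8])
print(density(0.5 * (a + b)) <= 0.5 * (density(a) + density(b)))
```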

SLIDE 12

Solution: a new exponential family

Where did this come from?
◮ A probabilistic cross-entropy loss for compositional data: $-\log p(x; \lambda) = -\sum_{i=1}^{K} x_i \log \lambda_i - \log C(\lambda)$.
◮ A multivariate generalization of the continuous Bernoulli distribution (Loaiza-Ganem & Cunningham, NeurIPS 2019): $x \sim \mathcal{CB}(\lambda) \iff p(x \mid \lambda) \propto \lambda^{x}(1 - \lambda)^{1-x}$ for $x \in [0, 1]$, identified with $\mathbb{S}^2$ via $x \mapsto (x, 1 - x)$.
◮ A continuous relaxation of the categorical distribution.
◮ The Dirichlet density with the roles of parameter and argument switched.
◮ Independent exponential random variables restricted to the simplex.
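Two of these connections, and a scale invariance that explains why $\lambda$ can be normalized to lie in $\mathbb{S}^K$, follow in one line each from the unnormalized density. A short derivation, written as a sketch (the rates $r_i$ and the constant $c$ are notation introduced here):

```latex
% K = 2, writing x = (t, 1 - t) and lambda = (lambda, 1 - lambda):
\prod_{i=1}^{2} \lambda_i^{x_i} = \lambda^{t} (1 - \lambda)^{1 - t},
% the continuous Bernoulli kernel in t \in [0, 1].

% Independent exponentials with rates r_i > 0 have joint density
\prod_{i=1}^{K} r_i e^{-r_i x_i} \propto \exp\Big(-\sum_{i=1}^{K} r_i x_i\Big)
  = \prod_{i=1}^{K} \big(e^{-r_i}\big)^{x_i},
% so restricting to \mathbb{S}^K and renormalizing gives \mathcal{CC}(\lambda),
% with \lambda_i \propto e^{-r_i}.

% Scale invariance: \sum_i x_i = 1 on \mathbb{S}^K, so for any c > 0
\prod_{i=1}^{K} (c \lambda_i)^{x_i} = c \prod_{i=1}^{K} \lambda_i^{x_i},
% a constant factor absorbed by the normalizer; hence \lambda can be taken in \mathbb{S}^K.
```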

SLIDE 14

Normalizing constant

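Theorem: Write $C(\lambda)$ for the normalizing constant of the $\mathcal{CC}(\lambda)$ distribution, i.e.
$$\int_{\mathbb{S}^K} C(\lambda) \prod_{i=1}^{K} \lambda_i^{x_i} \, d\mu(x) = 1. \tag{2}$$
Then
$$C(\lambda) = \left( (-1)^{K+1} \sum_{k=1}^{K} \frac{\lambda_k}{\prod_{i \neq k} \log \frac{\lambda_i}{\lambda_k}} \right)^{-1},$$
where $\mu$ is the Lebesgue measure on the first $K - 1$ coordinates of $\mathbb{S}^K$ and the $\lambda_i$ are pairwise distinct.

Remark:
◮ Closed form in terms of elementary functions only.
◮ Moments, the MGF, and more can be computed directly from $C(\cdot)$.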
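The closed form is straightforward to evaluate. The NumPy sketch below works with $\eta_i = \log \lambda_i$, assumes the $\lambda_i$ are pairwise distinct (the terms blow up and cancel badly as two of them approach each other), and illustrates the Remark through the standard exponential-family identities $\mathbb{E}[x_j] = \partial_{\eta_j} \log \int e^{\eta^\top x} \, d\mu$ and $\mathbb{E}[e^{t^\top x}] = C(\lambda) / C(\lambda \odot e^{t})$. Function names, the finite-difference step, and the test values are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def cc_log_integral(eta):
    """log of int_{S^K} exp(eta . x) dmu(x), via the theorem's closed form.

    eta_i = log(lambda_i) must be pairwise distinct; the formula is 0/0 at ties
    and loses precision when two entries are close.
    """
    eta = np.asarray(eta, dtype=float)
    K = eta.size
    total = 0.0
    for k in range(K):
        log_ratios = np.delete(eta, k) - eta[k]  # log(lambda_i / lambda_k), i != k
        total += np.exp(eta[k]) / np.prod(log_ratios)
    return float(np.log((-1.0) ** (K + 1) * total))


def cc_log_normalizer(lam):
    """log C(lambda), the log of the normalizing constant in equation (2)."""
    return -cc_log_integral(np.log(lam))


def cc_mean(lam, h=1e-5):
    """E[x] as the gradient of the log-integral in eta (central finite differences)."""
    eta = np.log(np.asarray(lam, dtype=float))
    grad = np.empty_like(eta)
    for j in range(eta.size):
        step = np.zeros_like(eta)
        step[j] = h
        grad[j] = (cc_log_integral(eta + step) - cc_log_integral(eta - step)) / (2 * h)
    return grad


def cc_mgf(lam, t):
    """E[exp(t . x)] = C(lambda) / C(lambda * exp(t))."""
    lam = np.asarray(lam, dtype=float)
    return float(np.exp(cc_log_normalizer(lam) - cc_log_normalizer(lam * np.exp(t))))


lam = np.array([0.5, 0.3, 0.2])
print(cc_log_normalizer(lam), cc_mean(lam), cc_mgf(lam, np.zeros(3)))
```

For $K = 2$ the integral reduces to $(\lambda_1 - \lambda_2) / \log(\lambda_1 / \lambda_2)$, a convenient sanity check; a Monte Carlo average over uniform points on the simplex (which has volume $1/(K-1)!$ under $\mu$) gives another.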

SLIDE 18

Related distributions

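◮ Beta: $x^{\alpha-1}(1-x)^{\beta-1}$. Continuous Bernoulli (CB): $\lambda^{x}(1-\lambda)^{1-x}$.
◮ Dirichlet: $\prod_{i=1}^{K} x_i^{\alpha_i-1}$. Continuous categorical (CC): $\prod_{i=1}^{K} \lambda_i^{x_i}$.
◮ Generalize to the simplex: Beta → Dirichlet, CB → CC.
◮ Switch the roles of parameter and argument: Beta ↔ CB, Dirichlet ↔ CC.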

SLIDE 20

Related distributions

| | Unstable, biased, flexible | Stable, unbiased, inflexible |
|---|---|---|
| [0,1]-valued (image data) | Beta | CB |
| Simplex-valued (compositional data) | Dirichlet | CC |

SLIDE 21

Application: UK 2019 general election

Constituency-level predictors → regression function (linear or MLP) → voting outcomes
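In a regression setting like this one, the regression function outputs unconstrained scores that must be mapped into $\mathbb{S}^K$. A minimal sketch, assuming a softmax link from logits to $\lambda$ (an assumption; the slide only specifies a linear or MLP regression function), computes the CC negative log-likelihood as the cross-entropy plus the log of the normalizing integral. The weights, predictors, and vote shares are made up, and the helper names are hypothetical.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log(softmax(z))."""
    z = np.asarray(z, dtype=float) - np.max(z)
    return z - np.log(np.sum(np.exp(z)))


def cc_log_integral(eta):
    """log of the CC normalizing integral for eta_i = log(lambda_i) (pairwise distinct)."""
    K = eta.size
    total = sum(np.exp(eta[k]) / np.prod(np.delete(eta, k) - eta[k]) for k in range(K))
    return float(np.log((-1.0) ** (K + 1) * total))


def cc_nll(y, logits):
    """-log p(y; lambda) with lambda = softmax(logits) (assumed link).

    Cross-entropy -sum_i y_i log(lambda_i), plus the log-integral (= -log C(lambda)).
    """
    eta = log_softmax(logits)
    return float(-np.dot(y, eta) + cc_log_integral(eta))


# Hypothetical linear regression function for one constituency: logits = W @ z + b.
rng = np.random.default_rng(0)
n_predictors, n_parties = 5, 4
W = rng.normal(size=(n_parties, n_predictors))
b = np.zeros(n_parties)
z = rng.normal(size=n_predictors)       # made-up constituency-level predictors
y = np.array([0.55, 0.30, 0.15, 0.0])   # made-up vote shares; the zero is allowed
print(cc_nll(y, W @ z + b))
```

Swapping the linear map for an MLP changes only how the logits are produced; the loss is unchanged. Because the closed form divides by differences of log-probabilities, near-tied logits are the main numerical risk.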

SLIDE 22

Election data: results

[Figure: test error (RMSE and MAE) against training epoch, up to 400 epochs; legend: Dir linear, Dir MLP, CC linear, CC MLP.]

SLIDE 25

Election data: optimizers

[Figure: log-likelihood (LogLik) against training epoch, up to 400 epochs; legend: Dir linear, Dir MLP, CC linear, CC MLP; optimizers: Adam, RMSprop, Adadelta.]

SLIDE 29

Model compression (knowledge distillation)

The student network learns from the (soft) outputs of a teacher model via a (soft) cross-entropy loss → replace this loss with the (negative) CC log-likelihood.
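A minimal sketch of the swap for a single example, assuming the usual temperature-softened teacher targets $\mathrm{softmax}(z_{\text{teacher}}/T)$ and, as a simplification, an un-tempered student; all names and logits are hypothetical. The CC loss differs from the soft cross-entropy only by the term $-\log C(\lambda)$, which depends on the student's output alone.

```python
import numpy as np


def log_softmax(z, T=1.0):
    """Numerically stable log(softmax(z / T))."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def cc_log_integral(eta):
    """log of the CC normalizing integral for eta_i = log(lambda_i) (pairwise distinct)."""
    K = eta.size
    total = sum(np.exp(eta[k]) / np.prod(np.delete(eta, k) - eta[k]) for k in range(K))
    return float(np.log((-1.0) ** (K + 1) * total))


def distillation_losses(teacher_logits, student_logits, T):
    """Soft cross-entropy and CC negative log-likelihood for one example.

    Assumption: only the teacher targets are softened with temperature T.
    """
    targets = np.exp(log_softmax(teacher_logits, T))  # soft targets, a point of S^K
    log_lam = log_softmax(student_logits)             # student's log(lambda)
    soft_xe = float(-np.dot(targets, log_lam))
    cc_nll = soft_xe + cc_log_integral(log_lam)       # = soft XE - log C(lambda)
    return soft_xe, cc_nll


teacher = np.array([3.0, 1.0, 0.2, -1.0])  # hypothetical 4-class logits
student = np.array([2.0, 0.5, 0.0, -0.5])
for T in (1.0, 5.0, 10.0):
    print(T, distillation_losses(teacher, student, T))
```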

Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.

SLIDE 30

Model compression: results on MNIST

[Figure: RMSE against training epoch, up to 400 epochs; legend: Dir, soft XE, CC, hard XE; temperatures T = 1, 5, 10.]

SLIDE 31

Conclusion

◮ Novel exponential family of distributions.
◮ Attractive mathematical properties.
◮ Outperforms the Dirichlet in regression models of compositional outcomes.
