

SLIDE 1

ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables

Mingzhang Yin*, Yuguang Yue*, Mingyuan Zhou

The University of Texas at Austin Department of Statistics and Data Sciences IROM Department, McCombs School of Business

International Conference on Machine Learning Long Beach, CA, June 13, 2019

(UT-Austin Statistics) ARSM June 2019 1 / 7

SLIDE 2

Categorical latent variable optimization

Goal: maximize an expectation with respect to categorical variables

E(φ) = ∫ f(z) qφ(z) dz = E_{z∼qφ(z)}[f(z)]

Notation:

1. f(z) is the reward function for categorical z.

2. z = (z_1, …, z_K) ∈ {1, 2, …, C}^K is a K-dimensional C-way multivariate categorical vector.

3. qφ(z) = ∏_{k=1}^K Categorical(z_k; σ(φ_k)) is the categorical distribution whose parameters φ ∈ R^{K×C} need to be optimized.

Challenge: it is difficult to estimate ∇φ E(φ), especially for large K and C.
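The setup above can be sketched numerically. This is a minimal illustration assuming NumPy; the values of K and C match the later VAE experiment, but the reward f and the random logits are arbitrary choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
K, C = 20, 10                      # K-dimensional, C-way categorical

def softmax(phi):                  # sigma(phi_k): logits -> probabilities
    e = np.exp(phi - phi.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def f(z):                          # toy reward on z in {0, ..., C-1}^K
    return z.mean()

phi = rng.normal(size=(K, C))      # parameters to optimize, phi in R^{K x C}
probs = softmax(phi)

# Monte Carlo estimate of E(phi) = E_{z ~ q_phi}[f(z)]
samples = np.array([
    [rng.choice(C, p=probs[k]) for k in range(K)]
    for _ in range(1000)
])
E_hat = np.mean([f(z) for z in samples])
```

Estimating E(φ) this way is easy; the difficulty the slide points to is estimating its gradient with respect to φ, since the sampling step is not differentiable.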


SLIDE 3

Derivation of ARSM

Augment: the categorical variable z ∼ Cat(σ(φ)) can be equivalently generated as

z = arg min_{i∈{1,…,C}} π_i e^{−φ_i}, π ∼ Dir(1_C).

Thus E(φ) = E_{z∼qφ(z)}[f(z)] = E_{π∼Dir(1_C)}[f(arg min_i π_i e^{−φ_i})].

REINFORCE: ∇φ E(φ) = E_{π∼Dir(1_C)}[f(arg min_i π_i e^{−φ_i})(1 − Cπ)].

Swap: swapping the ith and jth elements of π does not change the expectation; this property provides self-controlled variance reduction (without any tuning parameters).

Merge: sharing random numbers between differently expressed but equivalent expectations leads to

∇_{φ_c} E(φ) = E_{π∼Dir(1_C)}[g_ARSM(π)_c],

g_ARSM(π)_c := (1/C) Σ_{j=1}^C [ f(z^{c⇌j}) − (1/C) Σ_{m=1}^C f(z^{m⇌j}) ] (1 − C π_j),

where z^{c⇌j} := arg min_i π^{c⇌j}_i e^{−φ_i} and π^{c⇌j} denotes π with its cth and jth elements swapped.
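For a single categorical variable (K = 1), the Augment-REINFORCE-Swap-Merge steps can be sketched as follows; the reward f, the value of C, and the random logits here are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
C = 5
phi = rng.normal(size=C)                 # logits of Cat(sigma(phi))

def f(z):                                # hypothetical reward, z in {0, ..., C-1}
    return 0.5 + (z + 1) / (2 * C)

def arsm_grad(phi, f, rng):
    """One-sample ARSM estimate of the gradient of E(phi) (K = 1 case)."""
    C = len(phi)
    pi = rng.dirichlet(np.ones(C))       # Augment: pi ~ Dir(1_C)
    F = np.empty((C, C))                 # F[c, j] = f(z^{c<->j})
    for c in range(C):
        for j in range(C):
            p = pi.copy()
            p[c], p[j] = p[j], p[c]      # Swap: exchange pi_c and pi_j
            F[c, j] = f(np.argmin(p * np.exp(-phi)))
    fbar = F.mean(axis=0)                # Merge: (1/C) sum_m f(z^{m<->j})
    return ((F - fbar) * (1 - C * pi)).sum(axis=1) / C

# Averaging over independent pi samples approximates the true gradient
g = np.mean([arsm_grad(phi, f, rng) for _ in range(2000)], axis=0)
```

Note that the centered terms F[c, j] − fbar_j sum to zero over c, so every single-sample estimate is already zero-sum across the C categories; this self-centering is where the tuning-free variance reduction comes from.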


SLIDE 4

An illustration example

Optimize φ ∈ R^C to maximize E_{z∼Cat(σ(φ))}[f(z)], where f(z) := 0.5 + z/(CR).

[Figure panels: Reward, Gradient, Probability, and Grad_var plotted against Iteration, one column per estimator: True, REINFORCE, Gumbel, RELAX, AR, ARS, ARSM.]

Figure: The optimal solution is σ(φ) = (0, …, 1). The reward is computed analytically as E_{z∼Cat(σ(φ))}[f(z)], whose maximum is 0.533.
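The stated maximum can be checked directly. In this sketch, R = 30 is inferred from the reported maximum 0.5 + 1/R ≈ 0.533, and C = 10 is an illustrative choice; neither constant is stated on the slide:

```python
import numpy as np

C, R = 10, 30

def softmax(phi):
    e = np.exp(phi - phi.max())
    return e / e.sum()

def expected_reward(phi):
    z = np.arange(1, C + 1)              # categories 1..C
    f = 0.5 + z / (C * R)                # toy reward
    return softmax(phi) @ f              # E_{z~Cat(softmax(phi))}[f(z)]

# At the optimum sigma(phi) -> (0, ..., 0, 1), the reward approaches f(C)
phi_opt = np.zeros(C)
phi_opt[-1] = 50.0
print(round(expected_reward(phi_opt), 3))   # -> 0.533
```

At the optimum the expectation reduces to f(C) = 0.5 + 1/R ≈ 0.533, matching the caption, while a uniform σ(φ) gives a strictly smaller reward.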


SLIDE 5

VAEs with one or two categorical hidden layers (20-dimensional 10-way categorical)

[Figure: −ELBO against Iterations (×1000), comparing REINFORCE, AR, RELAX, Gumbel-S., ARS, ARSM, Gumbel-S._2layer, and ARSM_2layer.]

Figure: Plots of negative ELBOs (nats) on binarized MNIST against training iterations. The solid and dashed lines correspond to training and testing, respectively.


SLIDE 6

Reinforcement Learning (a sequence of categorical actions)

Figure: Moving-average reward and log-variance of the gradient estimators. In each plot, the solid lines are the medians of ten independent runs; the opaque bars span the 10th and 90th percentiles.


SLIDE 7

Thank you!

Welcome to our poster at Pacific Ballroom #85
