

SLIDE 1

Adaptive Antithetic Sampling for Variance Reduction

Hongyu Ren*, Shengjia Zhao*, Stefano Ermon

*equal contribution

SLIDE 2

Goal

Estimation of $\mu = \mathbb{E}_{p(x)}[f(x)]$ is ubiquitous in machine learning problems.

Reinforcement Learning: $\mathbb{E}_{p(\tau)}\left[\sum_t r(s_t, a_t)\right]$
Variational Autoencoder: $\mathbb{E}_{p(x)}\,\mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right]$
Generative Adversarial Nets: $\mathbb{E}_{p(x)}[\log D(x)] + \mathbb{E}_{p(z)}[\log(1 - D(G(z)))]$

[Figure: agent-environment loop (state, action, reward); VAE networks $q(z|x)$, $p(x|z)$; GAN generator $G(z)$ and real/fake discriminator $D$]

SLIDE 3

Goal

Estimation of $\mu = \mathbb{E}_{p(x)}[f(x)]$ is ubiquitous in machine learning problems.

Monte Carlo Estimation: $\mu \approx \frac{1}{2}\left(f(x_1) + f(x_2)\right)$, with $x_1, x_2 \sim p(x)$ i.i.d.

MC is unbiased: $\mathbb{E}\left[\frac{1}{2}\left(f(x_1) + f(x_2)\right)\right] = \mu$

High variance: the estimate can be far off with a small sample size.
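As an illustration (not from the slides), here is a minimal numpy sketch of the two-sample i.i.d. Monte Carlo estimator, assuming for concreteness that $p(x)$ is a standard normal and $f(x) = x^3$; repeating the estimate many times shows it is correct on average but individual estimates scatter widely.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 3              # example integrand (an assumption, not from the slides)
sample_p = rng.standard_normal    # p(x): standard normal (an assumption)

def mc_estimate(n_samples=2):
    """Plain i.i.d. Monte Carlo estimate of E_p[f(x)]."""
    x = sample_p(n_samples)
    return f(x).mean()

# Repeat the 2-sample estimate many times: the mean is ~0 (unbiased),
# but the spread is large, so a single estimate can be far off.
estimates = np.array([mc_estimate(2) for _ in range(10_000)])
print("mean of estimates:", estimates.mean())   # close to E_p[x^3] = 0
print("std of estimates: ", estimates.std())    # roughly sqrt(15/2) ~ 2.7
```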

SLIDE 4

Goal

Estimation of $\mu = \mathbb{E}_{p(x)}[f(x)]$ is ubiquitous in machine learning problems.

Monte Carlo Estimation: $\mu \approx \frac{1}{2}\left(f(x_1) + f(x_2)\right)$, with $x_1, x_2 \sim p(x)$ i.i.d.

Trivial solution: use more samples! Better solution: a better sampling strategy than i.i.d.

SLIDE 5

Antithetic Sampling

Don't sample i.i.d.: $x_1, x_2 \sim p(x_1)\,p(x_2)$. Instead, sample from a correlated distribution: $(x_1, x_2) \sim q(x_1, x_2)$.

Unbiased if $q(x_1) = p(x_1)$ and $q(x_2) = p(x_2)$.

Goal: minimize $\mathrm{Var}_{q(x_1, x_2)}\left[\frac{f(x_1) + f(x_2)}{2}\right]$
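Why matching the marginals is enough for unbiasedness, spelled out with the definitions above (linearity of expectation):

$\mathbb{E}_{q(x_1, x_2)}\left[\frac{f(x_1) + f(x_2)}{2}\right] = \frac{1}{2}\,\mathbb{E}_{q(x_1)}[f(x_1)] + \frac{1}{2}\,\mathbb{E}_{q(x_2)}[f(x_2)] = \frac{1}{2}\,\mathbb{E}_{p(x)}[f(x)] + \frac{1}{2}\,\mathbb{E}_{p(x)}[f(x)] = \mu$

So any joint with the correct marginals is unbiased, no matter how strongly $x_1$ and $x_2$ are correlated; the correlation only affects the variance.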

SLIDE 6

Example: Negative Sampling

π‘Ÿ 𝑦1, 𝑦2 defined by 1.Sample 𝑦1 ∼ π‘ž(𝑦). 2.Pick 𝑦2 = βˆ’π‘¦1.

[Figure: joint distribution of $(x_1, x_2)$ under negative sampling; the marginals of $x_1$ and $x_2$ both equal $p(x)$]
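A minimal sketch of this pair sampler, assuming $p(x)$ is a standard normal (symmetric about 0, so negating a sample leaves its distribution unchanged); the empirical check mirrors the figure: both marginals still match $p$.

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_pair(n_pairs):
    """Negative (antithetic) sampling: x1 ~ p(x), x2 = -x1.
    Assumes p(x) is symmetric about 0 (here: standard normal),
    so the marginal of x2 equals p as well."""
    x1 = rng.standard_normal(n_pairs)
    x2 = -x1
    return x1, x2

x1, x2 = negative_pair(100_000)
# Both marginals should look like N(0, 1): mean ~ 0, std ~ 1.
print(f"x1: mean {x1.mean():+.3f}, std {x1.std():.3f}")
print(f"x2: mean {x2.mean():+.3f}, std {x2.std():.3f}")
```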
SLIDE 7

Example: Negative Sampling

Best Case Example: $f(x) = x^3$

$q(x_1, x_2)$ defined by: 1. Sample $x_1 \sim p(x)$. 2. Pick $x_2 = -x_1$.

$\frac{f(x_1) + f(x_2)}{2} = 0$ for every sample, which matches $\mathbb{E}_{p(x)}[f(x)] = 0$ (since $f$ is odd and $p$ is symmetric), so

$\mathrm{Var}_{q(x_1, x_2)}\left[\frac{f(x_1) + f(x_2)}{2}\right] = 0$: no error with a sample size of 2!

SLIDE 8

Example: Negative Sampling

π‘Ÿ 𝑦1, 𝑦2 defined by 1.Sample 𝑦1 ∼ π‘ž(𝑦). 2.Pick 𝑦2 = βˆ’π‘¦1.

𝑔 𝑦1 = 𝑔(𝑦2), 𝑦2 redundant Varπ‘Ÿ(𝑦1,𝑦2)

𝑔 𝑦1 +𝑔(𝑦2) 2

doubles!

Worst Case Example

𝑔 = 𝑦2
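A small sketch comparing the two cases empirically (again assuming $p(x)$ is a standard normal): for the odd function $f(x) = x^3$ the pair average is exactly 0 every time, while for the even function $f(x) = x^2$ the antithetic pair has about twice the variance of two i.i.d. samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x1 = rng.standard_normal(n)
x2_anti = -x1                      # negative sampling: x2 = -x1
x2_iid = rng.standard_normal(n)    # baseline: a fresh i.i.d. sample

for name, f in [("f(x) = x^3 (best case) ", lambda x: x ** 3),
                ("f(x) = x^2 (worst case)", lambda x: x ** 2)]:
    var_anti = ((f(x1) + f(x2_anti)) / 2).var()
    var_iid = ((f(x1) + f(x2_iid)) / 2).var()
    print(f"{name}: antithetic var = {var_anti:.2f}, i.i.d. var = {var_iid:.2f}")
# Expected: ~0.00 vs ~7.50 for x^3, and ~2.00 vs ~1.00 (doubled) for x^2.
```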

SLIDE 9

General Result

Question: is there an antithetic distribution that always works better than i.i.d.? Yes: sampling without replacement is always a tiny bit better. No Free Lunch (Theorem 1): no antithetic distribution works better than sampling without replacement for every function $f$.
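One way to see the "tiny bit better" claim empirically (a sketch, not the paper's Theorem 1 argument): for a discrete uniform $p(x)$, drawing two values without replacement keeps both marginals equal to $p$ but makes the draws slightly negatively correlated, so the variance of the pair average shrinks by a factor $(K-2)/(K-1)$ for $K$ support points.

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.arange(10.0)     # support of a uniform discrete p (illustrative assumption)
f = lambda x: x ** 2         # any test function

def pair_average_variance(with_replacement, n_trials=100_000):
    """Empirical variance of (f(x1) + f(x2)) / 2 under the chosen pair sampler.
    Without replacement, both marginals are still uniform over `values`."""
    pairs = np.array([rng.choice(values, size=2, replace=with_replacement)
                      for _ in range(n_trials)])
    return f(pairs).mean(axis=1).var()

print("i.i.d. (with replacement):", pair_average_variance(True))
print("without replacement:      ", pair_average_variance(False))
# Without replacement is smaller by roughly (K-2)/(K-1) = 8/9 with K = 10 values.
```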

SLIDE 10

Valid Distribution Set

$\mathcal{P}_{\text{unbiased}}$: set of distributions $q(x_1, x_2)$ that satisfy $q(x_1) = p(x_1)$, $q(x_2) = p(x_2)$

[Figure: example joint distributions over $(x_1, x_2)$, all with the required marginals]

SLIDE 11

Variance of example functions

$f_1(x) = x^3$

[Figure: joint distributions in $\mathcal{P}_{\text{unbiased}}$ ranked from high variance to low variance for $f_1$; pick the low-variance distribution]

$\mathcal{P}_{\text{unbiased}}$: set of distributions $q(x_1, x_2)$ that satisfy $q(x_1) = p(x_1)$, $q(x_2) = p(x_2)$

SLIDE 12

Variance of example functions

$f_2(x) = e^x + 2x\sin(x)$

[Figure: joint distributions in $\mathcal{P}_{\text{unbiased}}$ ranked from high variance to low variance for $f_2$; pick the low-variance distribution]

$\mathcal{P}_{\text{unbiased}}$: set of distributions $q(x_1, x_2)$ that satisfy $q(x_1) = p(x_1)$, $q(x_2) = p(x_2)$

SLIDE 13

Pick Good Distribution for a Class of Functions

[Figure: joint distributions in $\mathcal{P}_{\text{unbiased}}$ with high variance on average for $\mathcal{F}$ vs. low variance on average for $\mathcal{F}$]

$\mathcal{F} = \{f_1, f_2, \dots\}$

$\mathcal{P}_{\text{unbiased}}$: set of distributions $q(x_1, x_2)$ that satisfy $q(x_1) = p(x_1)$, $q(x_2) = p(x_2)$

SLIDE 14

Pick Good Distribution for a class of functions

Training: pick a good $q$ for several functions. Generalization: low variance for similar functions.

[Figure: joint distributions in $\mathcal{P}_{\text{unbiased}}$ with high variance on average vs. low variance on average]

$\mathcal{P}_{\text{unbiased}}$: set of distributions $q(x_1, x_2)$ that satisfy $q(x_1) = p(x_1)$, $q(x_2) = p(x_2)$

SLIDE 15

Training Objective

$\min_{q}\ \mathbb{E}_{f \sim \mathcal{F}}\left[\mathrm{Var}_{q(x_1, x_2)}\left[\frac{f(x_1) + f(x_2)}{2}\right]\right] \quad \text{s.t.}\quad q(x_1, x_2) \in \mathcal{P}_{\text{unbiased}}$
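To make the objective concrete, here is a rough numerical sketch (not the paper's algorithm): it evaluates $\mathbb{E}_{f \sim \mathcal{F}}\left[\mathrm{Var}_q\left[\frac{f(x_1)+f(x_2)}{2}\right]\right]$ by Monte Carlo for a toy one-parameter family inside $\mathcal{P}_{\text{unbiased}}$, bivariate standard normal pairs with correlation `rho` (an illustrative assumption), and picks the `rho` with the lowest averaged variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "training set" of functions F (the slides' examples, reused here as an assumption).
F = [lambda x: x ** 3, lambda x: np.exp(x) + 2 * x * np.sin(x)]

def correlated_normal_pair(rho, n):
    """Bivariate standard normal with correlation rho: both marginals are N(0, 1)
    for every rho in (-1, 1), so the joint stays inside P_unbiased."""
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return x1, x2

def objective(rho, n=200_000):
    """Monte Carlo estimate of E_{f ~ F}[ Var_q[(f(x1) + f(x2)) / 2] ]."""
    x1, x2 = correlated_normal_pair(rho, n)
    return float(np.mean([((f(x1) + f(x2)) / 2).var() for f in F]))

rhos = [-0.9, -0.5, 0.0, 0.5, 0.9]
for rho in rhos:
    print(f"rho = {rho:+.1f}: objective = {objective(rho):.3f}")
print("best rho:", min(rhos, key=objective))
```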

SLIDE 16

Practical Training Algorithm

We design

  • 1. A parameterization of $\mathcal{P}_{\text{unbiased}}$ via copulas (a minimal sketch follows below).
  • 2. A surrogate objective to optimize the variance.
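A minimal sketch of the copula idea, under strong simplifying assumptions (a single-parameter Gaussian copula and a marginal $p$ specified by its inverse CDF; the paper's parameterization is more expressive): correlate two standard normals, map them through the normal CDF to correlated uniforms, then apply the inverse CDF of $p$, so each coordinate has marginal exactly $p$ for any correlation parameter.

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(0)

def copula_pair(rho, p_ppf, n):
    """Gaussian-copula pair sampler: both marginals equal p (given by its inverse
    CDF p_ppf) for any rho in (-1, 1), while rho controls the dependence."""
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    u1, u2 = norm.cdf(z1), norm.cdf(z2)    # correlated Uniform(0, 1) variables
    return p_ppf(u1), p_ppf(u2)            # correlated samples with marginal p

# Example: p(x) = Exponential(1); the marginals are preserved for any rho.
x1, x2 = copula_pair(rho=-0.8, p_ppf=expon.ppf, n=100_000)
print("x1 mean/std:", x1.mean(), x1.std())            # both ~1.0 for Exp(1)
print("x2 mean/std:", x2.mean(), x2.std())
print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])     # negative dependence
```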
SLIDE 17

Wasserstein GAN w/ gradient penalty

[Figure: variance of gradient, Inception Score, and Inception Score variance plotted against batch size and wall-clock time per iteration]

Gulrajani, Ishaan, et al. "Improved Training of Wasserstein GANs." Advances in Neural Information Processing Systems. 2017.

SLIDE 18

Importance Weighted Autoencoder

Burda, Yuri, Roger Grosse, and Ruslan Salakhutdinov. "Importance weighted autoencoders." arXiv preprint arXiv:1509.00519 (2015).

[Figure: log-likelihood improvement (higher is better) and probability of improvement, our method vs. i.i.d. sampling and our method vs. negative sampling]

SLIDE 19

Conclusion

  • Define a general family of (parameterized) unbiased antithetic distributions.
  • Propose an optimization framework to learn the antithetic distribution based on the task at hand.
  • Sampling from the resulting joint distribution reduces variance at negligible computational cost.

Welcome to our poster session for further discussions! Thursday 6:30-9pm @ Pacific Ballroom #205