SLIDE 1 Bayesian Batch Active Learning as Sparse Subset Approximation
Research Talk October 2019
Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato
SLIDE 2 Introduction
- Acquiring labels for supervised learning can be costly and time-consuming
- In such settings, active learning (AL) enables data-efficient model training by intelligently selecting points for which labels should be requested
SLIDE 3
Introduction
Diagram: pool-based active learning (AL). The model selects queries from the unlabeled pool set, an oracle provides labels, the labeled training set grows, and the model is retrained.
SLIDE 4 Sequential AL loop
Diagram: sequential AL loop. Train model → select data point → query single data point → update model.
SLIDE 5 Batch AL approaches:
- scale to large datasets and models
- enable parallel data acquisition
- (ideally) trade off diversity and representativeness
Diagram: sequential AL loop. Train model → select data point → query single data point → update model.
How to construct such a batch?
SLIDE 6 Bayesian Batch Active Learning
Bayesian approach: Choose the set of points that maximally reduces uncertainty over the parameter posterior
- NP-hard, but greedy approximations exist: MaxEnt, BALD
- Naïve batch strategy: Select b best points according to acquisition function
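As a concrete illustration of the naïve strategy (a minimal sketch of our own, not the authors' code; a MaxEnt-style score is assumed), one simply scores every pool point independently and keeps the top b:

```python
import numpy as np

def naive_maxent_batch(pool_probs: np.ndarray, b: int) -> np.ndarray:
    """Naive batch strategy: pick the b pool points with the highest acquisition score.

    pool_probs: (N, K) predictive class probabilities for the N pool points.
    The score here is the predictive entropy (MaxEnt); BALD would use mutual information.
    Note: points are scored independently, so near-duplicates can all end up in the batch.
    """
    entropy = -np.sum(pool_probs * np.log(pool_probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:b]
```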
SLIDE 7 Bayesian Batch Active Learning
Bayesian approach: Choose the set of points that maximally reduces uncertainty over the parameter posterior
- NP-hard, but greedy approximations exist: MaxEnt, BALD
- Naïve batch strategy: Select b best points according to acquisition function
Figure: points queried by MaxEnt and BALD. Budget is wasted on selecting nearby points.
SLIDE 8 Bayesian Batch Active Learning
Bayesian approach: Choose the set of points that maximally reduces uncertainty over the parameter posterior
- NP-hard, but greedy approximations exist: MaxEnt, BALD
- Naïve batch strategy: Select b best points according to acquisition function
Figure: points queried by MaxEnt and BALD. Budget is wasted on selecting nearby points.
Idea: Re-cast batch construction as optimizing a sparse subset approximation to the complete data posterior
SLIDE 9 Bayesian Batch Active Learning
Bayesian approach: Choose the set of points that maximally reduces uncertainty over the parameter posterior
- NP-hard, but greedy approximations exist: MaxEnt, BALD
- Naïve batch strategy: Select b best points according to acquisition function
Figure: points queried by MaxEnt, BALD, and our method.
Idea: Re-cast batch construction as optimizing a sparse subset approximation to the complete data posterior
SLIDE 10 Related Work: Bayesian coresets
Idea: Re-cast batch construction as optimizing a sparse subset approximation to the complete data posterior
We take inspiration from Bayesian coresets:
- Coreset: Summarize data by a sparse, weighted subset
- Bayesian coreset: Approximate the posterior by a sparse, weighted subset
SLIDE 11 Related Work: Bayesian coresets
Idea: Re-cast batch construction as optimizing a sparse subset approximation to the complete data posterior
We take inspiration from Bayesian coresets:
- Coreset: Summarize data by a sparse, weighted subset
- Bayesian coreset: Approximate the posterior by a sparse, weighted subset
- Batch AL with Bayesian coresets: Batch = Bayesian coreset
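In symbols (a rough sketch with assumed notation, following the Bayesian-coresets literature: \(\mathcal{L}_n\) is the \(n\)-th log-likelihood term and \(w\) the sparse weights), a Bayesian coreset approximates
\[
\log p(\mathcal{D} \mid \theta) = \sum_{n=1}^{N} \mathcal{L}_n(\theta) \;\approx\; \sum_{n=1}^{N} w_n\, \mathcal{L}_n(\theta), \qquad w_n \ge 0,\; \|w\|_0 \ll N,
\]
so that the posterior formed from the weighted subset approximates the full-data posterior.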
SLIDE 12
Batch Construction as Sparse Subset Approximation
Choose the batch such that the updated posterior best approximates the complete data posterior
SLIDE 13
Batch Construction as Sparse Subset Approximation
Choose the batch such that the updated posterior best approximates the complete data posterior
Problem: we don't know the labels of the points in the pool set before querying them
SLIDE 14
Batch Construction as Sparse Subset Approximation
Choose the batch such that the updated posterior best approximates the complete data posterior
Problem: we don't know the labels of the points in the pool set before querying them
Take expectation w.r.t. the current predictive posterior distribution:
SLIDE 15
Batch Construction as Sparse Subset Approximation
Choose the batch such that the updated posterior best approximates the complete data posterior
Problem: we don't know the labels of the points in the pool set before querying them
Take expectation w.r.t. the current predictive posterior distribution:
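A plausible reconstruction of the quantity being approximated (notation assumed, not taken verbatim from the slides): each unknown-label term is replaced by its expectation under the current predictive posterior,
\[
\mathcal{L}_n(\theta) := \mathbb{E}_{p(y_n \mid x_n, \mathcal{D}_0)}\big[\log p(y_n \mid x_n, \theta)\big], \qquad \mathcal{L}(\theta) := \sum_{n \in \text{pool}} \mathcal{L}_n(\theta),
\]
and the batch is the sparse, binary-weighted sum \(\sum_n w_n \mathcal{L}_n\) (at most \(b\) nonzero weights) that best approximates \(\mathcal{L}\).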
SLIDE 16
Batch Construction as Sparse Subset Approximation Hilbert coresets
SLIDE 17
Batch Construction as Sparse Subset Approximation Hilbert coresets
SLIDE 18 Batch Construction as Sparse Subset Approximation Hilbert coresets
- Considers the directionality of the residual error → adaptively constructs the batch while accounting for similarity between data points (induced by the norm)
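In this view (again with the assumed notation above), each \(\mathcal{L}_n\) is treated as a vector in a Hilbert space with some inner product \(\langle\cdot,\cdot\rangle\), and batch construction roughly amounts to
\[
\min_{w \in \{0,1\}^{N}} \Big\| \mathcal{L} - \sum_{n=1}^{N} w_n\, \mathcal{L}_n \Big\|^2 \quad \text{s.t.} \quad \sum_{n=1}^{N} w_n \le b,
\]
where the norm induced by the chosen inner product encodes similarity between data points.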
SLIDE 19
Batch Construction as Sparse Subset Approximation Frank-Wolfe optimization
SLIDE 20 Batch Construction as Sparse Subset Approximation Frank-Wolfe optimization
SLIDE 21 Batch Construction as Sparse Subset Approximation Frank-Wolfe optimization
- 1. Relax constraints
- 2. Apply Frank-Wolfe algorithm
- Geometrically motivated convex optimization algorithm
- Iteratively selects vector most aligned with residual error
- Corresponds to adding at most one data point to batch in every iteration
SLIDE 22 Batch Construction as Sparse Subset Approximation Frank-Wolfe optimization
- 1. Relax constraints
- 2. Apply Frank-Wolfe algorithm
- Geometrically motivated convex optimization algorithm
- Iteratively selects vector most aligned with residual error
- Corresponds to adding at most one data point to batch in every iteration
- 3. Project continuous weights back to feasible space (i.e. binarize them)
SLIDE 23 Batch Construction as Sparse Subset Approximation Frank-Wolfe optimization
- 1. Relax constraints
- 2. Apply Frank-Wolfe algorithm
- Geometrically motivated convex optimization algorithm
- Iteratively selects vector most aligned with residual error
- Corresponds to adding at most one data point to batch in every iteration
- 3. Project continuous weights back to feasible space (i.e. binarize them)
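A minimal sketch of steps 1–3 (our own illustration, not the authors' code), assuming each expected log-likelihood term is available as an explicit vector L_n (e.g. via the random projections discussed below), so that inner products are plain dot products:

```python
import numpy as np

def frank_wolfe_batch(L: np.ndarray, b: int) -> np.ndarray:
    """Greedy Frank-Wolfe construction of an active-learning batch.

    L: (N, J) array; row n is a vector representation of the n-th expected
       log-likelihood term, so inner products are plain dot products.
    b: batch size (one Frank-Wolfe iteration adds at most one new point).
    Returns the indices of the selected batch (weights binarized, step 3).
    """
    sigma = np.linalg.norm(L, axis=1) + 1e-12      # per-point norms
    sigma_total = sigma.sum()
    target = L.sum(axis=0)                         # vector for the full pool ("complete data")
    w = np.zeros(L.shape[0])                       # step 1: relaxed, continuous weights

    for _ in range(b):                             # step 2: Frank-Wolfe iterations
        residual = target - L.T @ w
        scores = (L @ residual) / sigma            # alignment of each point with the residual
        n_star = int(np.argmax(scores))
        vertex = np.zeros_like(w)
        vertex[n_star] = sigma_total / sigma[n_star]
        direction = L.T @ (vertex - w)
        gamma = float(direction @ residual) / float(direction @ direction + 1e-12)
        gamma = np.clip(gamma, 0.0, 1.0)           # exact line search, clipped to [0, 1]
        w = (1.0 - gamma) * w + gamma * vertex

    return np.flatnonzero(w > 0)                   # step 3: binarize the weights
```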
Which norm is appropriate?
SLIDE 24 Batch Construction as Sparse Subset Approximation Choice of Inner Products
Norm is induced by inner product, e.g.
- 1. Weighted Fisher inner product
  + Leads to simple, interpretable expressions for linear models
  − Requires taking gradients w.r.t. parameters
  − Scales quadratically with pool set size
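A plausible form of this inner product (our reconstruction, following the Hilbert-coresets literature; \(\hat{\pi}\) denotes the current approximate posterior):
\[
\langle \mathcal{L}_n, \mathcal{L}_m \rangle_{\hat{\pi},F} := \mathbb{E}_{\hat{\pi}}\big[\nabla_\theta \mathcal{L}_n(\theta)^{\top} \nabla_\theta \mathcal{L}_m(\theta)\big].
\]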
SLIDE 25 Batch Construction as Sparse Subset Approximation Choice of Inner Products
Norm is induced by inner product, e.g.
- 1. Weighted Fisher inner product
  + Leads to simple, interpretable expressions for linear models
  − Requires taking gradients w.r.t. parameters
  − Scales quadratically with pool set size
Example: Linear regression
SLIDE 26 Batch Construction as Sparse Subset Approximation Choice of Inner Products
Norm is induced by inner product, e.g.
- 1. Weighted Fisher inner product
  + Leads to simple, interpretable expressions for linear models
  − Requires taking gradients w.r.t. parameters
  − Scales quadratically with pool set size
Example: Linear regression
- Connections to BALD, leverage scores and influence functions
- Probit regression also yields interpretable closed-form solution
SLIDE 27 Batch Construction as Sparse Subset Approximation Choice of Inner Products
Norm is induced by inner product, e.g.
- 1. Weighted Fisher inner product
  + Leads to simple, interpretable expressions for linear models
  − Requires taking gradients w.r.t. parameters
  − Scales quadratically with pool set size
- 2. Weighted Euclidean inner product
  + Only requires tractable likelihood computations
  + Scalable to large pool set sizes (linearly) and to complex, non-linear models through random projections
  − No gradient information utilized
SLIDE 28 Batch Construction as Sparse Subset Approximation Choice of Inner Products
Norm is induced by inner product, e.g.
- 1. Weighted Fisher inner product
  + Leads to simple, interpretable expressions for linear models
  − Requires taking gradients w.r.t. parameters
  − Scales quadratically with pool set size
- 2. Weighted Euclidean inner product
  + Only requires tractable likelihood computations
  + Scalable to large pool set sizes (linearly) and to complex, non-linear models through random projections
  − No gradient information utilized
J-dimensional random projection in Euclidean space
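A plausible form of the Euclidean variant and its random projection (our reconstruction; \(\theta_1,\dots,\theta_J\) are samples from the current approximate posterior \(\hat{\pi}\)):
\[
\langle \mathcal{L}_n, \mathcal{L}_m \rangle_{\hat{\pi},2} := \mathbb{E}_{\hat{\pi}}\big[\mathcal{L}_n(\theta)\,\mathcal{L}_m(\theta)\big], \qquad \hat{\mathcal{L}}_n := \tfrac{1}{\sqrt{J}}\big(\mathcal{L}_n(\theta_1), \ldots, \mathcal{L}_n(\theta_J)\big)^{\top},
\]
so that \(\hat{\mathcal{L}}_n^{\top}\hat{\mathcal{L}}_m\) is an unbiased Monte Carlo estimate of the inner product; these \(J\)-dimensional vectors are exactly what the Frank-Wolfe sketch above can operate on.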
SLIDE 29
Experimental Setup
(i) Does our approach avoid correlated queries? (closed form)
(ii) Is our method competitive in the small-data regime? (closed form)
(iii) Does our method scale to large datasets and models? (projections)
SLIDE 30 Experimental Setup
(i) Does our approach avoid correlated queries? (closed form)
(ii) Is our method competitive in the small-data regime? (closed form)
(iii) Does our method scale to large datasets and models? (projections)
Model: Neural Linear, i.e. a deterministic feature extractor (e.g. ConvNet) followed by a stochastic fully connected layer
Inference: exact (regression), mean-field VI (classification)
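For concreteness, a minimal sketch of the regression case (our own illustration, with assumed variable names and hyperparameters): with the deterministic extractor frozen, Bayesian inference over the last layer reduces to conjugate linear regression on the extracted features.

```python
import numpy as np

def neural_linear_posterior(Z: np.ndarray, y: np.ndarray,
                            noise_var: float = 0.1, prior_var: float = 1.0):
    """Exact posterior over last-layer weights given fixed features Z = g(X).

    Z: (N, D) features from the deterministic extractor (e.g. a ConvNet body).
    y: (N,) regression targets.
    Returns the posterior mean (D,) and covariance (D, D) of the linear weights.
    """
    D = Z.shape[1]
    precision = Z.T @ Z / noise_var + np.eye(D) / prior_var  # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ Z.T @ y / noise_var
    return mean, cov
```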
SLIDE 31
Experiments: Probit Regression Does our approach avoid correlated queries?
Figure: queries selected by BALD vs. ACS-FW.
SLIDE 32
Experiments: Probit Regression Does our approach avoid correlated queries?
Figure: BALD vs. ACS-FW. No change.
SLIDE 33
Experiments: Probit Regression Does our approach avoid correlated queries?
Figure: BALD vs. ACS-FW. No change; rotates in data space.
SLIDE 34
Experiments: Probit Regression Does our approach avoid correlated queries?
Figure: BALD vs. ACS-FW. And again...
SLIDE 35
Experiments: Probit Regression Does our approach avoid correlated queries?
Figure: BALD vs. ACS-FW. ACS-FW queries a diverse batch of points.
SLIDE 36
Experiments: Regression Is our method competitive in the small-data regime?
SLIDE 37
Experiments: Regression Is our method competitive in the small-data regime?
Competitive on small data, even more beneficial for larger N
SLIDE 38
Experiments: Regression Does our method scale to large datasets and models?
SLIDE 39
Experiments: Classification Does our method scale to large datasets and models?
Enables efficient AL at scale, without any sacrifice in performance
SLIDE 40 Conclusion
Introduced novel Bayesian batch AL approach
- Based on sparse subset approximations
- Produces diverse batches, enabling efficient AL at scale
- Yields interpretable closed-form solutions
- Generalizes to arbitrary models using random projections
Future Work
- Leverage Frank-Wolfe weights in a more principled way
- Investigate interactions with other approximate inference methods
- Apply to continual learning