LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations (PowerPoint PPT Presentation)


SLIDE 1

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Brian Trippe, Jonathan Huggins, Raj Agrawal, and Tamara Broderick

SLIDE 2

Genomic Study (motivating example)

  • Goal: Understand the relationship between genomic variation and disease outcome
  • N = 20,000 samples, D = 500,000 SNPs

[Figure: cases (diseased) vs. controls (healthy); image credit: https://www.ebi.ac.uk/training/]

SLIDE 3

Generalized Linear Models (GLMs)

  • Interpretability
  • E.g. Logistic/Poisson/Negative Binomial Regression
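The slides do not include code; as a minimal sketch of the GLM idea (logistic regression, one of the examples above), the response depends on the data only through the linear predictor x_i^T β:

```python
import numpy as np

def logistic_log_likelihood(beta, X, y):
    """GLM log-likelihood for logistic regression.

    Each response depends on the data only through the linear predictor
    z_i = x_i^T beta, which is what makes the model interpretable:
    beta_j is the effect of feature j on the log-odds.
    Uses the stable identity log sigmoid(s*z) = -log(1 + exp(-s*z)).
    """
    z = X @ beta            # linear predictors, shape (N,)
    s = 2.0 * y - 1.0       # map labels {0, 1} -> {-1, +1}
    return -np.sum(np.logaddexp(0.0, -s * z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
beta = np.array([1.0, -0.5])
y = (rng.uniform(size=100) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
ll = logistic_log_likelihood(beta, X, y)
```

Swapping `np.logaddexp` for other inverse links (log for Poisson, etc.) gives the other GLM examples named above.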

SLIDE 4

Bayesian Modeling & Inference

  • Coherent uncertainty quantification

Problem: Super-linear scaling with D
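The slides don't spell out where the super-linear cost comes from; as a sketch (conjugate Bayesian linear regression, my choice for illustration), the exact posterior covariance requires forming and inverting a D x D matrix:

```python
import numpy as np

def exact_gaussian_posterior(X, y, noise_var=1.0, prior_var=1.0):
    """Exact posterior for y = X beta + eps with a N(0, prior_var I) prior.

    Forming X^T X costs O(N D^2) and the inverse costs O(D^3) time with
    O(D^2) memory -- super-linear in D, hopeless at D = 500,000 SNPs.
    """
    D = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(D) / prior_var  # D x D
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(size=50)
mean, cov = exact_gaussian_posterior(X, y)
```

Non-conjugate GLMs are worse still: each iteration of MCMC or variational inference pays comparable linear-algebra costs.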


SLIDE 6

We present LR-GLM, a method with linear scaling in D and theoretical guarantees on quality

SLIDE 7

How does it work?

SLIDE 8

Cartoon Example

  • Logistic regression with two correlated features


SLIDE 10

Uncertainty in Effect Sizes

SLIDE 11

[Figure: posterior uncertainty in effect sizes; directions labeled "Lots of Information" and "Little Information"]
SLIDE 12

The LR-GLM Approximation

We ignore the least informative directions:

  p(y_i | x_i^T β) ≈ p(y_i | x_i^T U U^T β)
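A runnable sketch of the approximation (my own; it assumes, as in the paper, that U collects the top-M right singular vectors of the design matrix X): after precomputing X @ U once, every likelihood evaluation sees β only through the M-dimensional projection U^T β.

```python
import numpy as np

def lrglm_logistic_log_likelihood(beta, X, y, M):
    """Approximate log-likelihood with x_i^T beta replaced by x_i^T U U^T beta.

    U (D x M) holds the top-M right singular vectors of X, so the dropped
    directions are exactly the least informative ones. X @ U can be cached,
    making each evaluation O(N M) in beta instead of O(N D).
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:M].T                       # D x M, top-M right singular vectors
    z = (X @ U) @ (U.T @ beta)         # approximate linear predictors
    s = 2.0 * y - 1.0
    return -np.sum(np.logaddexp(0.0, -s * z))

rng = np.random.default_rng(2)
N, D = 200, 5
X = rng.normal(size=(N, D))
beta = rng.normal(size=D)
y = (rng.uniform(size=N) < 0.5).astype(float)

exact = -np.sum(np.logaddexp(0.0, -(2.0 * y - 1.0) * (X @ beta)))
approx_full = lrglm_logistic_log_likelihood(beta, X, y, M=D)  # no truncation
approx_lr = lrglm_logistic_log_likelihood(beta, X, y, M=2)
```

With M = D nothing is dropped (U U^T = I) and the approximation coincides with the exact likelihood; smaller M trades accuracy for speed.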
SLIDE 14

Approximation Quality

  • Exact when the data are low rank
  • We prove: the approximation is close when the data are approximately low rank
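The first bullet is easy to verify numerically (a sketch under the same assumption that U spans the top-M right singular subspace of X): when rank(X) ≤ M, every row of X already lies in span(U), so the projection changes nothing and the approximate likelihood equals the exact one.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, M = 100, 8, 3
# Construct X with rank exactly M: every row lies in an M-dim subspace.
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
U = Vt[:M].T                           # D x M, top-M right singular vectors

# Projection onto span(U) leaves X unchanged, so for every beta the
# approximate linear predictor X U U^T beta equals the exact one X beta.
beta = rng.normal(size=D)
z_exact = X @ beta
z_approx = X @ (U @ (U.T @ beta))
```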
SLIDE 16

Does it Work?

SLIDE 17

Evaluate by comparing exact means and uncertainties (slow) against our approximation (fast)
SLIDE 18

[Figure: Post. Mean Estimation and Post. Uncertainty Estimation; scatter plots of exact vs. approximate posterior means and uncertainties]
SLIDE 19

We rigorously show…

  • Rank of approximation defines a computational-statistical trade-off
  • The approximation is conservative (overestimates uncertainty)
  • For high-dimensional, correlated data, LR-GLM closely approximates the exact posterior up to 5X faster!
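The conservativeness claim can be checked numerically in the conjugate Gaussian case (a sketch, not the paper's general proof): with U the top-M right singular vectors of X, the approximate posterior precision is dominated by the exact one, so every approximate marginal variance is at least as large as the exact one.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, M = 60, 10, 4
X = rng.normal(size=(N, D))
noise_var = prior_var = 1.0

_, _, Vt = np.linalg.svd(X, full_matrices=False)
U = Vt[:M].T                           # D x M, top-M right singular vectors

# Exact vs. LR-GLM posterior precisions for Gaussian linear regression
# (LR-GLM replaces the likelihood term X^T X with U U^T X^T X U U^T).
P_exact = X.T @ X / noise_var + np.eye(D) / prior_var
XU = X @ U
P_approx = U @ (XU.T @ XU) @ U.T / noise_var + np.eye(D) / prior_var

var_exact = np.diag(np.linalg.inv(P_exact))
var_approx = np.diag(np.linalg.inv(P_approx))
```

Because U spans the top singular directions, P_exact - P_approx is positive semi-definite, so inverting flips the order and the approximate covariance dominates the exact one: uncertainty is overestimated, never understated.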

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Brian L. Trippe, Jonathan H. Huggins, Raj Agrawal, and Tamara Broderick
Paper: proceedings.mlr.press/v97/trippe19a
Poster: Pacific Ballroom #214