LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations (PowerPoint PPT Presentation)


SLIDE 1

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Brian Trippe, Jonathan Huggins, Raj Agrawal, and Tamara Broderick

SLIDE 2

Genomic Study (motivating example)

  • Goal: Understand the relationship between genomic variation and disease outcome
  • N = 20,000 samples, D = 500,000 SNPs

[Figure: cases (diseased) vs. controls (healthy); image credit: https://www.ebi.ac.uk/training/]

SLIDE 3

Generalized Linear Models (GLMs)

  • Interpretability
  • E.g. Logistic/Poisson/Negative Binomial Regression
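The slides do not include code; as a minimal sketch of the GLM idea (logistic regression, one of the examples above), the response depends on the data only through the linear predictor x_i^T β:

```python
import numpy as np

def logistic_log_likelihood(beta, X, y):
    """GLM log-likelihood for logistic regression.

    Each response depends on the data only through the linear predictor
    z_i = x_i^T beta, which is what makes the model interpretable:
    beta_j is the effect of feature j on the log-odds.
    Uses the stable identity log sigmoid(s*z) = -log(1 + exp(-s*z)).
    """
    z = X @ beta            # linear predictors, shape (N,)
    s = 2.0 * y - 1.0       # map labels {0, 1} -> {-1, +1}
    return -np.sum(np.logaddexp(0.0, -s * z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
beta = np.array([1.0, -0.5])
y = (rng.uniform(size=100) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
ll = logistic_log_likelihood(beta, X, y)
```

Swapping `np.logaddexp` for other inverse links (log for Poisson, etc.) gives the other GLM examples named above.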

SLIDE 4

Bayesian Modeling & Inference

  • Coherent uncertainty quantification

Problem: Super-linear scaling with D
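The slides don't spell out where the super-linear cost comes from; as a sketch (conjugate Bayesian linear regression, my choice for illustration), the exact posterior covariance requires forming and inverting a D x D matrix:

```python
import numpy as np

def exact_gaussian_posterior(X, y, noise_var=1.0, prior_var=1.0):
    """Exact posterior for y = X beta + eps with a N(0, prior_var I) prior.

    Forming X^T X costs O(N D^2) and the inverse costs O(D^3) time with
    O(D^2) memory -- super-linear in D, hopeless at D = 500,000 SNPs.
    """
    D = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(D) / prior_var  # D x D
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(size=50)
mean, cov = exact_gaussian_posterior(X, y)
```

Non-conjugate GLMs are worse still: each iteration of MCMC or variational inference pays comparable linear-algebra costs.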


SLIDE 6

We present LR-GLM, a method with linear scaling in D and theoretical guarantees on quality

SLIDE 7

How does it work?

SLIDE 8

Cartoon Example

  • Logistic regression with two correlated features


SLIDE 10

Uncertainty in Effect Sizes

SLIDE 11

[Figure: posterior uncertainty in effect sizes; directions labeled "Lots of Information" and "Little Information"]
SLIDE 12

The LR-GLM Approximation

We ignore the least informative directions:

  p(y_i | x_i^T β) ≈ p(y_i | x_i^T U U^T β)
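A runnable sketch of the approximation (my own; it assumes, as in the paper, that U collects the top-M right singular vectors of the design matrix X): after precomputing X @ U once, every likelihood evaluation sees β only through the M-dimensional projection U^T β.

```python
import numpy as np

def lrglm_logistic_log_likelihood(beta, X, y, M):
    """Approximate log-likelihood with x_i^T beta replaced by x_i^T U U^T beta.

    U (D x M) holds the top-M right singular vectors of X, so the dropped
    directions are exactly the least informative ones. X @ U can be cached,
    making each evaluation O(N M) in beta instead of O(N D).
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:M].T                       # D x M, top-M right singular vectors
    z = (X @ U) @ (U.T @ beta)         # approximate linear predictors
    s = 2.0 * y - 1.0
    return -np.sum(np.logaddexp(0.0, -s * z))

rng = np.random.default_rng(2)
N, D = 200, 5
X = rng.normal(size=(N, D))
beta = rng.normal(size=D)
y = (rng.uniform(size=N) < 0.5).astype(float)

exact = -np.sum(np.logaddexp(0.0, -(2.0 * y - 1.0) * (X @ beta)))
approx_full = lrglm_logistic_log_likelihood(beta, X, y, M=D)  # no truncation
approx_lr = lrglm_logistic_log_likelihood(beta, X, y, M=2)
```

With M = D nothing is dropped (U U^T = I) and the approximation coincides with the exact likelihood; smaller M trades accuracy for speed.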
SLIDE 14

Approximation Quality

  • Exact when the data are low rank
  • We prove: the approximation is close when the data are approximately low rank
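The first bullet is easy to verify numerically (a sketch under the same assumption that U spans the top-M right singular subspace of X): when rank(X) ≤ M, every row of X already lies in span(U), so the projection changes nothing and the approximate likelihood equals the exact one.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, M = 100, 8, 3
# Construct X with rank exactly M: every row lies in an M-dim subspace.
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
U = Vt[:M].T                           # D x M, top-M right singular vectors

# Projection onto span(U) leaves X unchanged, so for every beta the
# approximate linear predictor X U U^T beta equals the exact one X beta.
beta = rng.normal(size=D)
z_exact = X @ beta
z_approx = X @ (U @ (U.T @ beta))
```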
SLIDE 16

Does it Work?

SLIDE 17

Evaluate by comparing exact means and uncertainties (slow) against our approximation (fast)
SLIDE 18

[Figure: Post. Mean Estimation and Post. Uncertainty Estimation; scatter plots of exact vs. approximate posterior means and uncertainties]
SLIDE 19

We rigorously show…

  • Rank of approximation defines a computational-statistical trade-off
  • The approximation is conservative (overestimates uncertainty)
  • For high-dimensional, correlated data, LR-GLM closely approximates the exact posterior up to 5X faster!
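The conservativeness claim can be checked numerically in the conjugate Gaussian case (a sketch, not the paper's general proof): with U the top-M right singular vectors of X, the approximate posterior precision is dominated by the exact one, so every approximate marginal variance is at least as large as the exact one.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, M = 60, 10, 4
X = rng.normal(size=(N, D))
noise_var = prior_var = 1.0

_, _, Vt = np.linalg.svd(X, full_matrices=False)
U = Vt[:M].T                           # D x M, top-M right singular vectors

# Exact vs. LR-GLM posterior precisions for Gaussian linear regression
# (LR-GLM replaces the likelihood term X^T X with U U^T X^T X U U^T).
P_exact = X.T @ X / noise_var + np.eye(D) / prior_var
XU = X @ U
P_approx = U @ (XU.T @ XU) @ U.T / noise_var + np.eye(D) / prior_var

var_exact = np.diag(np.linalg.inv(P_exact))
var_approx = np.diag(np.linalg.inv(P_approx))
```

Because U spans the top singular directions, P_exact - P_approx is positive semi-definite, so inverting flips the order and the approximate covariance dominates the exact one: uncertainty is overestimated, never understated.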

LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Brian L. Trippe, Jonathan H. Huggins, Raj Agrawal, and Tamara Broderick
Paper: proceedings.mlr.press/v97/trippe19a
Poster: Pacific Ballroom #214