Structured Association II: 02-715 Advanced Topics in Computational Genomics

Oct 14, 2022


SLIDE 1

Structured Association II

02-715 Advanced Topics in Computational Genomics

SLIDE 2

Regression with Regularization

Group lasso (Yuan and Lin, 2006)

– Parameter estimation: pathwise coordinate descent (Friedman et al., 2007)

L1/L2 norm:

$$\|\beta_j\|_{L_1/L_2} = \sqrt{\sum_k \beta_{jk}^2}$$
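As a small illustration of the L1/L2 norm, the sketch below sums the Euclidean norm of each group of coefficients and applies block soft-thresholding, the standard proximal step for non-overlapping groups. The names `l1_l2_norm`, `group_soft_threshold`, and the toy values are hypothetical, and this step is not claimed to be the pathwise coordinate descent update cited on the slide.

```python
import numpy as np

def l1_l2_norm(beta, groups):
    """Sum over groups j of sqrt(sum_k beta_jk^2)."""
    return sum(np.linalg.norm(beta[g]) for g in groups)

def group_soft_threshold(beta, groups, t):
    """Proximal operator of t * l1_l2_norm for non-overlapping groups."""
    out = np.zeros_like(beta, dtype=float)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > t:
            # Shrink the whole group toward zero by the same factor.
            out[g] = (1.0 - t / norm) * beta[g]
        # Otherwise the whole group is set to zero (left as zeros).
    return out

# Toy example (assumed values): one strong group, one weak group.
beta = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
penalty = l1_l2_norm(beta, groups)
shrunk = group_soft_threshold(beta, groups, t=1.0)
```

With these values the weak group is removed as a block while the strong group is only shrunk, which is the group-level sparsity the penalty is designed to produce.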

SLIDE 3

Regression with Regularization (Group Lasso Penalty)

[Figure panels: Lasso penalty; Group lasso penalty; L2 penalty]

SLIDE 4

Lasso (Tibshirani, 1996)

[Figure labels: Inputs; Outputs; Regression Coefficients]

SLIDE 5

L1/L2-regularized Multi-task Regression

(Obozinski et al., 2008)

[Figure labels: Inputs; Outputs; Regression Coefficients]
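A minimal sketch of the L1/L2 penalty in the multi-task setting, assuming the coefficients are stored in a matrix `B` with one row per input and one column per output (this layout and the name `multitask_l1_l2` are assumptions for illustration). Each row forms one group, so an input is kept or dropped for all outputs together.

```python
import numpy as np

def multitask_l1_l2(B):
    """Sum over inputs (rows) of the L2 norm taken across outputs (columns)."""
    return float(np.sum(np.linalg.norm(B, axis=1)))

# Toy coefficient matrix (assumed values): 3 inputs, 3 outputs.
B = np.array([[0.0, 0.0, 0.0],
              [1.0, 2.0, 2.0],
              [0.0, 0.0, 0.0]])

penalty = multitask_l1_l2(B)  # only the second input contributes
selected_inputs = np.flatnonzero(np.linalg.norm(B, axis=1) > 0)
```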

SLIDE 6

Hierarchical Selection with Nested Groups

(Zhao, Rocha, and Yu, 2009)

Prior knowledge on group hierarchies

[Figure panels: One-sided tree; Almost-complete tree; Regular tree]

SLIDE 7

Grouped Variable Selection in Structured Association

Multi-population lasso (Puniyani et al., 2010)

– Groups are given as groups of individuals for different populations

Tree-guided group lasso (Kim & Xing, 2010)

– Groups are defined hierarchically according to a hierarchical clustering tree

SLIDE 8

Population Structure in GWAS

Pooled analysis of multiple populations:

– lasso for all populations

Separate analysis of multiple populations:

– lasso for each population

SLIDE 9

Population Structure in GWAS

Multi-population group lasso

SLIDE 10

Analysis of Lactose Intolerance Dataset

[Figure panels: Lasso (pooled); Lasso (separate); EIGENSTRAT; Single-SNP test; Multi-population lasso]

SLIDE 11

Tree-Guided Group Lasso

Tree-guided group lasso penalty

Key idea: use overlapping groups in group lasso

[Figure: tree over the outputs with internal-node heights h1 and h2; labels: Inputs, Outputs, Regression Coefficients]

SLIDE 12

Example: Learning Genetic Associations

[Figure: Inputs (SNPs), shown as the sequence TCGACGTTTTACTGTACAATT; Outputs (Genes); Regression coefficients]

SLIDE 13

Tree-Guided Group-Lasso Penalty

Hierarchical clustering tree over the outputs (tasks) as prior knowledge

– Tree structure: clustering at multiple granularities
– Heights of internal nodes: strength of clustering

Group-lasso-like penalty with overlapping groups

– One group at each node of the tree

[Figure: tree with internal-node heights h1 and h2]
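The "one group at each node" idea can be sketched as follows: each node's group is the set of outputs (leaves) in its subtree, so a parent's group always contains its children's groups. The tree, the node names, and the helper `leaves_under` are hypothetical and chosen only for illustration.

```python
def leaves_under(tree, node):
    """Return the leaves (outputs) in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(tree, child))
    return leaves

# Hypothetical tree over three outputs:
# root -> (v1, gene3), v1 -> (gene1, gene2)
tree = {"root": ["v1", "gene3"], "v1": ["gene1", "gene2"]}
nodes = ["root", "v1", "gene1", "gene2", "gene3"]

# One group per node; groups of ancestors and descendants overlap.
groups = {v: leaves_under(tree, v) for v in nodes}
```

Here `gene1` belongs to three groups (its own leaf, `v1`, and `root`), which is the overlap the penalty has to account for.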

SLIDE 14

Tree-Guided Group Lasso

In a simple case of two outputs

– Low height: tight correlation, joint selection
– Large height: weak correlation, separate selection

[Figure: two trees over two outputs, each with root height h; axis label: Inputs]

SLIDE 15

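Tree-Guided Group Lasso

In a simple case of two outputs

Select the child nodes jointly or separately?

– L1 penalty: lasso penalty, separate selection
– L2 penalty: group lasso, joint selection
– Tree-guided group lasso: elastic-net-like penalty, with the balance set by the height h

[Figure: two-output tree with root height h; axis label: Inputs]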
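A minimal sketch of the two-output case, assuming the height h is normalized to [0, 1] and used directly as the weight on separate selection, with 1 - h on joint selection. That weighting follows the convention I recall from Kim & Xing (2010) and should be read as an assumption; the function name `two_output_penalty` is hypothetical.

```python
import numpy as np

def two_output_penalty(b1, b2, h):
    """Blend of the L1 (separate) and L2 (joint) penalties for two outputs."""
    separate = abs(b1) + abs(b2)   # lasso term: each output selected on its own
    joint = np.hypot(b1, b2)       # group lasso term: both outputs selected together
    return h * separate + (1.0 - h) * joint

tight = two_output_penalty(3.0, 4.0, h=0.0)   # low height: pure group lasso
weak = two_output_penalty(3.0, 4.0, h=1.0)    # large height: pure lasso
mixed = two_output_penalty(3.0, 4.0, h=0.5)   # in between
```

At h = 0 the penalty reduces to the group lasso (joint selection) and at h = 1 to the lasso (separate selection), matching the low-height and large-height cases.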

SLIDE 16

Tree-Guided Group Lasso

For a general tree

Select the child nodes jointly or separately?

[Figure: tree with internal-node heights h1 and h2; tree-guided group lasso penalty with terms labeled Joint selection and Separate selection]

SLIDE 17

Tree-Guided Group Lasso

For a general tree

Select the child nodes jointly or separately?

[Figure: tree with internal-node heights h1 and h2; tree-guided group lasso penalty with terms labeled Joint selection and Separate selection]

Note that the groups overlap!

SLIDE 18

Overlapping Groups in Tree-guided Group Lasso

Balanced penalization

SLIDE 19

Overlapping Groups

Previously

– Arbitrarily overlapping groups (Jenatton, Audibert, and Bach, 2009)
– Overlapping groups over tree-structured inputs (Zhao, Rocha, and Yu, 2009)

Unbalanced penalization

SLIDE 20

Tree-Guided Group-Lasso Penalty

Penalty function

where
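A minimal sketch of the penalty for one input, assuming the recursive definition I recall from Kim & Xing (2010): at an internal node v, W(v) = s_v * (sum of W over the children) + g_v * (L2 norm of the coefficients in v's group), with s_v = 1 - g_v, and at a leaf W is the absolute value of that output's coefficient. The tree, the g values, and all function names are hypothetical.

```python
import numpy as np

def leaves_under(tree, node):
    """Leaves (outputs) in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves_under(tree, c)]

def tree_penalty(beta, tree, col, g, node="root"):
    """Recursive penalty W(node) for one input's coefficients beta."""
    children = tree.get(node, [])
    if not children:
        return abs(beta[col[node]])
    separate = sum(tree_penalty(beta, tree, col, g, c) for c in children)
    joint = np.linalg.norm(beta[[col[l] for l in leaves_under(tree, node)]])
    return (1.0 - g[node]) * separate + g[node] * joint

def node_weights(tree, g, node="root", scale=1.0):
    """Weight on each node's group once the recursion is unrolled."""
    children = tree.get(node, [])
    if not children:
        return {node: scale}
    weights = {node: scale * g[node]}
    for c in children:
        weights.update(node_weights(tree, g, c, scale * (1.0 - g[node])))
    return weights

# Hypothetical tree and weights (assumed values).
tree = {"root": ["v1", "gene3"], "v1": ["gene1", "gene2"]}
col = {"gene1": 0, "gene2": 1, "gene3": 2}
g = {"root": 0.2, "v1": 0.7}
beta = np.array([3.0, 4.0, 0.0])

value = tree_penalty(beta, tree, col, g)
weights = node_weights(tree, g)
# Total weight over all groups that contain each output.
totals = {leaf: sum(w for v, w in weights.items() if leaf in leaves_under(tree, v))
          for leaf in col}
```

Under this definition the weights of all groups containing a given output sum to one, which is one way to read "balanced penalization": no output is penalized more heavily just because it sits deeper in the tree.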

SLIDE 21

Unit Contour Surfaces for Various Penalty Functions

[Figure panels: Lasso; L1/L2; Tree (g1=0.5, g2=0.5); Tree (g1=0.2, g2=0.7); Tree (g1=0.7, g2=0.2)]

SLIDE 22

Estimating Parameters

Second-order cone program

– Many publicly available software packages for solving convex optimization problems can be used

Also, variational formulation

SLIDE 23

Proximal Gradient Descent

Original Problem:

Approximation Problem:

Gradient of the Approximation:

SLIDE 24

Geometric Interpretation

Smooth approximation

[Figure panels: Uppermost Line Nonsmooth; Uppermost Line Smooth]
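One standard way to obtain such a smooth approximation is Nesterov-style smoothing of the L2 norm: write the norm as a maximum of linear functions, max over ||a|| <= 1 of a·x, and subtract (mu/2)||a||^2 inside the maximum. Whether the lecture used exactly this construction is an assumption; the name `smoothed_l2` and the value of `mu` are hypothetical.

```python
import numpy as np

def smoothed_l2(x, mu):
    """Smoothed L2 norm and its gradient.

    Value: max over ||a|| <= 1 of a.x - (mu / 2) * ||a||^2.
    Gradient: the maximizing a, i.e. x / mu projected onto the unit ball.
    """
    norm = np.linalg.norm(x)
    if norm <= mu:
        alpha = x / mu                    # inside the ball: quadratic region
        value = norm ** 2 / (2.0 * mu)
    else:
        alpha = x / norm                  # on the boundary: linear region
        value = norm - mu / 2.0
    return value, alpha

x = np.array([3.0, 4.0])
value, grad = smoothed_l2(x, mu=0.1)
value_at_zero, grad_at_zero = smoothed_l2(np.zeros(2), mu=0.1)
```

The smoothed value lies within mu/2 of the true norm, and the gradient is defined everywhere, including at zero where the original norm has a kink.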

SLIDE 25

Illustration with Simulated Data

[Figure panels: True regression coefficients; Lasso; L1/L2-regularized multi-task regression; Tree-guided group lasso. Axes: Inputs (SNPs), Outputs (Genes). Color scale: No association to High association]

SLIDE 26

Simulation Study: ROC Curves

Results averaged over 50 simulated datasets

SLIDE 27

Simulation Study: Prediction Errors

Results averaged over 50 simulated datasets

SLIDE 28

Experiments

Yeast Dataset

– Inputs: 21 genetic variations in chromosome 3 of yeast
– Outputs: gene expression measurements for 3684 genes
– Samples: 114 yeast strains

Goal: learn input features that perturb the output gene expression levels

SLIDE 29

Yeast eQTL Analysis

[Figure panels: Hierarchical clustering tree for genes (outputs); Lasso; L1/L2-regularized multi-task regression; Tree-guided group lasso. Axes: Inputs (SNPs), Outputs (Genes). Color scale: No association to High association]
