Structured Association II
02-715 Advanced Topics in Computational Genomics
Regression with Regularization
- Group lasso (Yuan and Lin, 2006)
– Parameter estimation: pathwise coordinate descent (Friedman et al., 2007)
L1/L2 norm:
$\|\beta_j\|_{L_1/L_2} = \sqrt{\sum_k \beta_{jk}^2}$
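As a small illustration of the L1/L2 norm, the sketch below sums the Euclidean norm of each group of coefficients. It assumes the full penalty is the sum of the per-group norms over all groups j; the function name and the toy coefficients are hypothetical, and the regularization weight is left out.

```python
import math

def l1_l2_norm(groups):
    """Sum over groups j of the Euclidean (L2) norm of that group's coefficients."""
    return sum(math.sqrt(sum(b * b for b in group)) for group in groups)

# Hypothetical coefficients: three groups of different sizes.
beta = [[3.0, 4.0], [0.0, 0.0, 0.0], [1.0]]
penalty = l1_l2_norm(beta)  # group norms are 5, 0, and 1
```

The penalty acts like an L1 norm across groups (so whole groups can be set to zero) and like an L2 norm within each group (so members of a kept group are not individually sparsified).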
Regression with Regularization (Group Lasso Penalty)
Panels: lasso penalty, group lasso penalty, L2 penalty
Lasso (Tibshirani, 1996)
Inputs Outputs
Regression Coefficients
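A minimal sketch of how a lasso estimate can be computed by cyclic coordinate descent with soft-thresholding, for a single value of the regularization parameter. The objective scaling (1/2n), the function names, and the toy data are assumptions made for illustration. A pathwise version would repeat this over a decreasing sequence of `lam` values, starting each run from the previous solution.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b, since b starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(col[i] * residual[i] for i in range(n)) / n + z * b[j]
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                b[j] = new_bj
    return b

# Hypothetical toy data: input 0 has a strong effect, input 1 a negligible one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
b = lasso_coordinate_descent(X, y, lam=0.1)
```

On this toy design the weak input is set exactly to zero while the strong one is kept with a shrunken coefficient, which is the selection behavior the lasso is used for.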
L1/L2-regularized Multi-task Regression
(Obozinski et al., 2008) Regression Coefficients
Inputs Outputs
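To illustrate why the L1/L2 penalty selects an input jointly for all outputs, the sketch below applies row-wise group shrinkage (the proximal operator of the penalty) to a coefficient matrix. It assumes rows index inputs and columns index outputs; the function name and numbers are hypothetical, and this is one standard way to handle the penalty, not necessarily the solver used in the cited work.

```python
import math

def row_group_shrink(B, t):
    """Proximal operator of t * sum_j ||B[j]||_2: shrink each row's norm by t."""
    out = []
    for row in B:
        norm = math.sqrt(sum(v * v for v in row))
        # Rows whose norm is at most t are zeroed for every output at once.
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in row])
    return out

# Hypothetical coefficients: two inputs (rows) and two outputs (columns).
B = [[3.0, 4.0], [0.3, 0.4]]
shrunk = row_group_shrink(B, 1.0)
```

The first input keeps nonzero coefficients for both outputs, while the second is removed for both outputs together. No input ends up selected for only one of the outputs.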
Hierarchical Selection with Nested Groups
(Zhao, Rocha, and Yu, 2009)
Prior knowledge on group hierarchies: one-sided tree, almost-complete tree, regular tree
Grouped Variable Selection in Structured Association
- Multi-population lasso (Puniyani et al., 2010)
– Groups are given as groups of individuals for different populations
- Tree-guided group lasso (Kim & Xing, 2010)
– Groups are defined hierarchically according to a hierarchical clustering tree
Pooled analysis of multiple populations: lasso for all populations
Separate analysis of multiple populations: lasso for each population
Population Structure in GWAS
Multi-population group lasso
Population Structure in GWAS
Analysis of Lactose Intolerance Dataset
Methods compared: lasso (pooled), lasso (separate), EIGENSTRAT, single-SNP test, multi-population lasso
Tree-Guided Group Lasso
Figure: inputs, outputs, and regression coefficients, with a tree whose internal nodes have heights h1 and h2
Tree-guided group lasso penalty
Key idea: use overlapping groups in group lasso
Example: Learning Genetic Associations
Figure: inputs (SNPs, e.g. TCGACGTTTTACTGTACAATT), regression coefficients, outputs (genes)
Tree-Guided Group-Lasso Penalty
- Hierarchical clustering tree over the outputs (tasks) as prior knowledge
– Tree structure: clustering at multiple granularities
– Heights of internal nodes: strength of clustering
- Group-lasso-like penalty with overlapping groups
– Each group at each node of the tree
Figure: tree with internal-node heights h1 and h2
Tree-Guided Group Lasso
- Low height h: tight correlation, joint selection
- Large height h: weak correlation, separate selection
- In a simple case of two outputs
– L1 penalty: lasso penalty, separate selection
– L2 penalty: group lasso, joint selection
– Height h in between: elastic net
Select the child nodes jointly or separately?
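The two-output case can be sketched as a weighted mix of the two penalties. This assumes the height h is normalized to [0, 1], with weight h on the separate-selection (L1) term and 1 - h on the joint-selection (L2) term, in line with "large height, separate selection". The function name is hypothetical and the regularization parameter is omitted.

```python
import math

def two_output_penalty(b1, b2, h):
    """Mix of separate (L1) and joint (L2) selection for one input and two outputs."""
    separate = abs(b1) + abs(b2)          # lasso term: outputs selected independently
    joint = math.sqrt(b1 * b1 + b2 * b2)  # group lasso term: outputs selected together
    return h * separate + (1.0 - h) * joint

lasso_like = two_output_penalty(3.0, 4.0, 1.0)   # h = 1: pure L1 penalty
group_like = two_output_penalty(3.0, 4.0, 0.0)   # h = 0: pure L2 penalty
in_between = two_output_penalty(3.0, 4.0, 0.5)   # equal mix of the two
```

Under this assumption, two tightly clustered outputs (small h) are penalized mostly as a group, while weakly related outputs (large h) are penalized mostly one by one.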
Tree-Guided Group Lasso
- In a simple case of two outputs
– Tree-guided group lasso penalty: one term for joint selection, one term for separate selection
– Select the child nodes jointly or separately?
Tree-Guided Group Lasso
- For a general tree
– At each internal node (heights h1, h2): select the child nodes jointly or separately?
– Tree-guided group lasso penalty: joint-selection and separate-selection terms at every internal node
Note that the groups overlap!
Overlapping Groups
- Overlapping groups in tree-guided group lasso: balanced penalization
- Previously: unbalanced penalization
– Arbitrarily overlapping groups (Jenatton, Audibert, and Bach, 2009)
– Overlapping groups over tree-structured inputs (Zhao, Rocha, and Yu, 2009)
Tree-Guided Group-Lasso Penalty
- Penalty function
where
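A minimal sketch of how a tree-guided penalty with balanced weights can be computed, assuming the weighting scheme commonly attributed to Kim & Xing (2010). Each internal node v has s_v + g_v = 1. An internal node's group gets weight g_v times the product of s_m over its ancestors m, and a leaf gets the product of s_m over its ancestors. Setting s_v = h_v and g_v = 1 - h_v for heights normalized to [0, 1] is an assumption, as are the tree encoding, the function names, and the example tree. The regularization parameter is omitted.

```python
import math

def tree_groups(node, ancestor_weight=1.0):
    """Return (leaves, groups) where groups is a list of (leaf_indices, weight).

    A leaf is an int (output index); an internal node is (height, [children]).
    """
    if isinstance(node, int):
        return [node], [([node], ancestor_weight)]
    height, children = node
    s, g = height, 1.0 - height  # assumed: s_v = h_v (separate), g_v = 1 - h_v (joint)
    leaves, groups = [], []
    for child in children:
        child_leaves, child_groups = tree_groups(child, ancestor_weight * s)
        leaves.extend(child_leaves)
        groups.extend(child_groups)
    groups.append((leaves, ancestor_weight * g))  # group of all outputs under this node
    return leaves, groups

def tree_penalty(B, tree):
    """Sum over inputs (rows of B) and tree nodes of weight * L2 norm of the node's group."""
    _, groups = tree_groups(tree)
    total = 0.0
    for row in B:
        for members, weight in groups:
            total += weight * math.sqrt(sum(row[k] ** 2 for k in members))
    return total

# Hypothetical tree over three outputs: outputs 0 and 1 merge at height 0.2,
# then join output 2 at height 0.7.
tree = (0.7, [(0.2, [0, 1]), 2])
_, groups = tree_groups(tree)
weight_per_output = [sum(w for members, w in groups if k in members) for k in range(3)]
one_input_penalty = tree_penalty([[1.0, 0.0, 0.0]], tree)
```

With these weights, the total weight over all groups containing a given output is 1 for every output, even though the groups overlap. This is one way to read "balanced penalization".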
Unit Contour Surfaces for Various Penalty Functions
Panels: lasso; L1/L2; tree (g1=0.5, g2=0.5); tree (g1=0.2, g2=0.7); tree (g1=0.7, g2=0.2)
Estimating Parameters
- Second-order cone program
– Many publicly available software packages for solving convex optimization problems can be used
- Also, variational formulation
Proximal Gradient Descent
- Original problem
- Approximation problem
- Gradient of the approximation
Geometric Interpretation
- Smooth approximation
Panels: uppermost line, nonsmooth; uppermost line, smooth
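One common way to obtain such a smooth approximation is Nesterov-style smoothing of a group norm. The sketch below illustrates the idea under that assumption and is not necessarily the exact approximation used in the lecture. The L2 norm is written as a maximum of linear functions of x, and subtracting a small quadratic term in the auxiliary variable makes the maximum smooth. The function name and the smoothing parameter `mu` are hypothetical.

```python
import math

def smoothed_l2_norm(x, mu):
    """Smooth surrogate of ||x||_2: max over ||a|| <= 1 of a.x - (mu/2)||a||^2.

    Returns the surrogate value and its gradient, which is the maximizing a.
    Requires mu > 0.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= mu:
        a = [v / mu for v in x]      # maximizer lies inside the unit ball
        value = norm * norm / (2.0 * mu)
    else:
        a = [v / norm for v in x]    # maximizer sits on the boundary of the unit ball
        value = norm - mu / 2.0
    return value, a

value, grad = smoothed_l2_norm([3.0, 4.0], 0.1)
value_at_zero, grad_at_zero = smoothed_l2_norm([0.0, 0.0], 0.1)
```

The surrogate never exceeds the true norm and is at most mu/2 below it. Unlike the norm itself, it has a well-defined gradient at zero, which allows gradient-based updates on the whole objective.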
Illustration with Simulated Data
Panels: true regression coefficients; lasso; L1/L2-regularized multi-task regression; tree-guided group lasso
Axes: inputs (SNPs), outputs (genes)
Scale: from no association to high association
Simulation Study: ROC Curves
- Results averaged over 50 simulated datasets
Simulation Study: Prediction Errors
- Results averaged over 50 simulated datasets
Experiments
- Yeast dataset
– Inputs: 21 genetic variations in chromosome 3 of yeast
– Outputs: gene expression measurements for 3684 genes
– Samples: 114 yeast strains
- Goal: learn input features that perturb the output gene expression levels
Yeast eQTL Analysis
Panels: hierarchical clustering tree for genes (outputs); lasso; L1/L2-regularized multi-task regression; tree-guided group lasso
Axes: inputs (SNPs), outputs (genes)
Scale: from no association to high association