Structured Association II: 02-715 Advanced Topics in Computational Genomics

SLIDE 1

Structured Association II

02-715 Advanced Topics in Computational Genomics

SLIDE 2

Regression with Regularization

  • Group lasso (Yuan and Lin, 2006)

– Parameter estimation: pathwise coordinate descent (Friedman et al., 2007)

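L1/L2 penalty:

$$\|\boldsymbol{\beta}\|_{L_1/L_2} = \sum_j \|\boldsymbol{\beta}_j\|_2 = \sum_j \sqrt{\sum_k \beta_{jk}^2}$$

where group $j$ collects the coefficients $\beta_{jk}$.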
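To make the L1/L2 penalty concrete, here is a minimal NumPy sketch, assuming the coefficients are stored as a matrix `B` whose row `j` holds group j's coefficients $\beta_{jk}$. The helper names `l1_l2_penalty` and `group_soft_threshold` are illustrative, not from the slides. The second function is the proximal operator of $\lambda\|\cdot\|_2$ for one group, the kind of block update used inside coordinate-descent solvers for the group lasso: it sets an entire group to zero at once when the group's norm is at most $\lambda$.

```python
import numpy as np

def l1_l2_penalty(B: np.ndarray) -> float:
    """L1/L2 penalty: sum over groups (rows j of B) of the L2 norm of beta_j."""
    return float(np.sum(np.linalg.norm(B, axis=1)))

def group_soft_threshold(z: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||.||_2 for one group.

    Shrinks z toward zero by lam in L2 length, and returns exactly zero
    (the whole group deselected) when ||z||_2 <= lam.
    """
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm) * z

B = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l1_l2_penalty(B))                                  # 5 + 0 + 1 = 6.0
print(group_soft_threshold(np.array([3.0, 4.0]), 1.0))   # shrunk, direction kept
print(group_soft_threshold(np.array([0.3, 0.4]), 1.0))   # whole group set to zero
```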

SLIDE 3

Regression with Regularization (Group Lasso Penalty)

[Figure panels: lasso penalty; group lasso penalty; L2 penalty]
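A small numeric comparison of the three penalties, assuming the coefficients form a matrix whose rows are the groups and that the "L2 penalty" panel refers to the squared (ridge) form. The helper names are illustrative:

```python
import numpy as np

def lasso_penalty(B: np.ndarray) -> float:
    """Lasso (L1): sum of absolute values of all coefficients."""
    return float(np.abs(B).sum())

def group_lasso_penalty(B: np.ndarray) -> float:
    """Group lasso (L1/L2): sum over rows (groups) of each row's L2 norm."""
    return float(np.linalg.norm(B, axis=1).sum())

def l2_penalty(B: np.ndarray) -> float:
    """Squared L2 (ridge-style): sum of squared coefficients."""
    return float((B ** 2).sum())

row_sparse = np.array([[1.0, 1.0], [0.0, 0.0]])  # both nonzeros in one group
scattered = np.array([[1.0, 0.0], [0.0, 1.0]])   # same nonzeros split over two groups

for name, B in [("row-sparse", row_sparse), ("scattered", scattered)]:
    print(f"{name}: lasso={lasso_penalty(B):.3f}, "
          f"group={group_lasso_penalty(B):.3f}, l2={l2_penalty(B):.3f}")
```

The lasso and squared L2 penalties give both matrices the same value (2), while the group lasso penalty is about 1.414 for the row-sparse matrix versus 2 for the scattered one. This is why it tends to switch whole groups on or off together.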

SLIDE 4

Lasso (Tibshirani, 1996)

[Figure: inputs, outputs, and regression coefficients]
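As a sketch of how the lasso is fit by coordinate descent, the solver family cited for parameter estimation earlier, here is a plain cyclic version for a single output. It minimizes $\frac{1}{2}\|y - Xb\|_2^2 + \lambda\|b\|_1$ by updating one coefficient at a time with soft-thresholding. The pathwise method of Friedman et al. (2007) additionally warm-starts over a decreasing sequence of $\lambda$ values, which is omitted here. Function names are illustrative.

```python
import numpy as np

def soft_threshold(z: float, lam: float) -> float:
    """Scalar soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))

def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_sweeps: int = 100) -> np.ndarray:
    """Cyclic coordinate descent for 0.5 * ||y - X b||_2^2 + lam * ||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)       # x_j^T x_j for each column
    r = y.astype(float).copy()          # residual y - X b (b starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue                # an all-zero column stays at zero
            r += X[:, j] * b[j]         # partial residual without feature j
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # add the updated feature back
    return b

X = np.eye(3)
y = np.array([3.0, 0.5, -2.0])
print(lasso_cd(X, y, lam=1.0))          # orthonormal X: soft-threshold of X^T y
```

With an orthonormal design, one sweep already gives the exact answer (here 2, 0, and -1). In general the sweeps repeat until the coefficients stop changing.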

SLIDE 5

L1/L2-regularized Multi-task Regression

(Obozinski et al., 2008)

[Figure: inputs, outputs, and regression coefficients]
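A minimal sketch of L1/L2-regularized multi-task regression, assuming squared-error loss $\frac{1}{2}\|Y - XB\|_F^2 + \lambda \sum_j \|\boldsymbol{\beta}_j\|_2$, where row $j$ of $B$ holds input $j$'s coefficients across all outputs. This is a generic block coordinate descent, not necessarily the algorithm used by Obozinski et al. (2008). Because the quadratic term is isotropic in each row, the exact row update is a group soft-threshold.

```python
import numpy as np

def multitask_group_cd(X: np.ndarray, Y: np.ndarray, lam: float, n_sweeps: int = 100) -> np.ndarray:
    """Block coordinate descent for 0.5 * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.

    Row j of B holds input j's coefficients for all K outputs, so an input is
    either kept (nonzero row) or dropped (zero row) for every output at once.
    """
    p, K = X.shape[1], Y.shape[1]
    B = np.zeros((p, K))
    col_sq = (X ** 2).sum(axis=0)                # x_j^T x_j
    R = Y.astype(float).copy()                   # residual Y - X B (B starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            R += np.outer(X[:, j], B[j])         # remove input j's contribution
            z = X[:, j] @ R                      # length-K correlation with residual
            norm = np.linalg.norm(z)
            # exact block minimizer: group soft-thresholding, rescaled by x_j^T x_j
            B[j] = 0.0 if norm <= lam else (1.0 - lam / norm) * z / col_sq[j]
            R -= np.outer(X[:, j], B[j])         # add the updated row back
    return B

X = np.eye(3)
Y = np.array([[3.0, 4.0],
              [0.3, 0.4],
              [0.0, 0.0]])
print(multitask_group_cd(X, Y, lam=1.0))   # row 0 shrunk, rows 1 and 2 zeroed
```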

SLIDE 6

Hierarchical Selection with Nested Groups

(Zhao, Rocha, and Yu, 2009)

Prior knowledge on group hierarchies

[Figure panels: one-sided tree; almost-complete tree; regular tree]

SLIDE 7

Grouped Variable Selection in Structured Association

  • Multi-population lasso (Puniyani et al., 2010)

– Groups are given as groups of individuals for different populations

  • Tree-guided group lasso (Kim & Xing, 2010)

– Groups are defined hierarchically according to a hierarchical clustering tree

SLIDE 8

Population Structure in GWAS

  • Pooled analysis of multiple populations: lasso for all populations
  • Separate analysis of multiple populations: lasso for each population

SLIDE 9

Population Structure in GWAS

Multi-population group lasso

SLIDE 10

Analysis of Lactose Intolerance Dataset

[Figure panels: lasso (pooled); lasso (separate); EIGENSTRAT; single-SNP test; multi-population lasso]

SLIDE 11

Tree-Guided Group Lasso

[Figure: tree over the outputs with internal-node heights h1 and h2; inputs, outputs, and regression coefficients]

Tree-guided group lasso penalty

  • Key idea: use overlapping groups in group lasso

SLIDE 12

Example: Learning Genetic Associations

[Figure: DNA sequence TCGACGTTTTACTGTACAATT; inputs (SNPs), outputs (genes), and regression coefficients]

SLIDE 13

Tree-Guided Group-Lasso Penalty

  • Hierarchical clustering tree over the outputs (tasks) as prior knowledge

– Tree structure: clustering at multiple granularities
– Heights of internal nodes: strength of clustering

  • Group-lasso-like penalty with overlapping groups

– One group at each node of the tree

[Figure: tree over the outputs with internal-node heights h1 and h2]
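Each node's group is the set of outputs (leaves) beneath it. An output therefore appears in the group of every node on its path to the root, which is where the overlap comes from. A minimal Python sketch, using a hypothetical three-output tree stored as a child-list dictionary (`node_groups` is an illustrative name, not from the slides):

```python
def node_groups(children, root):
    """Map every node of the tree to the set of leaves (outputs) beneath it.

    `children` maps an internal node to its child nodes; any node absent from
    `children` is a leaf, i.e. a single output.
    """
    groups = {}

    def collect(v):
        if v not in children:               # leaf: a group of one output
            groups[v] = frozenset([v])
        else:                               # internal node: union of children's groups
            groups[v] = frozenset().union(*(collect(c) for c in children[v]))
        return groups[v]

    collect(root)
    return groups

# Hypothetical tree: outputs 0 and 1 cluster first (node "v1"),
# then join output 2 at the root.
tree = {"root": ["v1", 2], "v1": [0, 1]}
for node, g in node_groups(tree, "root").items():
    print(node, sorted(g))
```

Output 0, for instance, belongs to the groups {0}, {0, 1}, and {0, 1, 2}.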

SLIDE 14

Tree-Guided Group Lasso

  • In a simple case of two outputs

– Low height: tight correlation, joint selection
– Large height: weak correlation, separate selection

[Figure: two-output trees with low and large height h, over the same inputs]

SLIDE 15

Tree-Guided Group Lasso

  • In a simple case of two outputs: select the child nodes jointly or separately?

– L1 penalty (lasso penalty): separate selection
– L2 penalty (group lasso): joint selection
– Tree-guided group lasso: combines the two, like an elastic net, with the balance set by the height h

[Figure: two-output tree of height h over the inputs]
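For the two-output case, one way to write the resulting per-input penalty, consistent with the labels on this slide and assuming the height $h$ is normalized to $[0, 1]$, puts weight $h$ on the lasso (separate) part and $1 - h$ on the L2 (joint) part:

```latex
\lambda \sum_j \Big[\, h \big( |\beta_{j1}| + |\beta_{j2}| \big) + (1 - h) \sqrt{\beta_{j1}^2 + \beta_{j2}^2} \,\Big]
```

At $h = 1$ only the lasso term remains (separate selection); at $h = 0$ only the group lasso term remains (joint selection). Intermediate heights mix the two, in the spirit of the elastic net.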

SLIDE 16

Tree-Guided Group Lasso

  • For a general tree: select the child nodes jointly or separately?

[Figure: tree with internal-node heights h1 and h2, contrasting joint selection and separate selection under the tree-guided group lasso]

SLIDE 17

Tree-Guided Group Lasso

  • For a general tree: select the child nodes jointly or separately?

[Figure: tree with internal-node heights h1 and h2, contrasting joint selection and separate selection under the tree-guided group lasso]

Note that the groups overlap!

SLIDE 18

Overlapping Groups in Tree-guided Group Lasso

Balanced penalization
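Balanced penalization can be checked numerically. The sketch below assumes the recursive weighting of Kim & Xing (2010) with $s_v = h_v$ (separate selection) and $g_v = 1 - h_v$ (joint selection), and takes "balanced" to mean that the weights of all groups containing a given output sum to one. Expanding the recursion gives each node's group a fixed weight, so no output is penalized more than another just because it sits deeper in the tree. The tree and heights are hypothetical.

```python
def node_weights(children, height, root):
    """Weight each node's term receives after expanding the recursive penalty.

    Assumes s_v = h_v (separate selection) and g_v = 1 - h_v (joint selection).
    An internal node's L2 group gets g_v times the product of s_u over its
    ancestors u; a leaf's L1 term gets the product of s_u over all its ancestors.
    """
    weights = {}

    def visit(v, s_above):
        if v not in children:                       # leaf (single output)
            weights[v] = s_above
            return
        weights[v] = (1.0 - height[v]) * s_above    # g_v * product of ancestors' s
        for c in children[v]:
            visit(c, s_above * height[v])           # pass s_v = h_v down

    visit(root, 1.0)
    return weights

def leaf_totals(children, weights, root):
    """Total weight over all groups that contain each leaf (its path to the root)."""
    totals = {}

    def visit(v, acc):
        acc += weights[v]
        if v not in children:
            totals[v] = acc
        else:
            for c in children[v]:
                visit(c, acc)

    visit(root, 0.0)
    return totals

tree = {"root": ["v1", 2], "v1": [0, 1]}   # hypothetical three-output tree
heights = {"root": 0.8, "v1": 0.3}         # hypothetical heights, normalized to [0, 1]
w = node_weights(tree, heights, "root")
print(leaf_totals(tree, w, "root"))        # every output's total is 1 (up to rounding)
```

The totals equal one because $s_v + g_v = 1$ at every node, so the sum telescopes along each root-to-leaf path.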

SLIDE 19

Overlapping Groups

  • Previous work

– Arbitrarily overlapping groups (Jenatton, Audibert, and Bach, 2009)
– Overlapping groups over tree-structured inputs (Zhao, Rocha, and Yu, 2008)

Unbalanced penalization

SLIDE 20

Tree-Guided Group-Lasso Penalty

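  • Penalty function: a group-lasso-like weighted sum of L2 norms over the overlapping node groups, with weights determined by the node heights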
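The penalty formula itself did not survive extraction. A hedged reconstruction follows the recursive form of Kim & Xing (2010) as best understood. It uses $s_v = h_v$, $g_v = 1 - h_v$, node heights normalized to $[0, 1]$, and $G_v$ as the set of outputs under node $v$:

```latex
\Omega(\mathbf{B}) = \lambda \sum_{j} W_j(v_{\text{root}}),
\qquad
W_j(v) =
\begin{cases}
s_v \displaystyle\sum_{c \in \mathrm{Children}(v)} W_j(c) + g_v \sqrt{\sum_{k \in G_v} \beta_{jk}^2}, & v \text{ internal},\\[1ex]
|\beta_{jk}|, & v \text{ a leaf for output } k.
\end{cases}
```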

SLIDE 21

Unit Contour Surfaces for Various Penalty Functions

[Figure panels: lasso; L1/L2; tree with g1 = 0.5, g2 = 0.5; tree with g1 = 0.2, g2 = 0.7; tree with g1 = 0.7, g2 = 0.2]
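To see what the g1, g2 settings change, the sketch below evaluates a tree-guided penalty for one input's coefficients on three outputs. It assumes g1 and g2 are the joint-selection weights $g_v$ of two internal nodes in a hypothetical tree: outputs 0 and 1 (0-indexed in the code) merge first, then join output 2 at the root, with $s_v = 1 - g_v$. Because the penalty is positively homogeneous, `beta / tree_penalty(beta)` lies on the unit contour, so the reach along a direction shows how cheap that sparsity pattern is.

```python
import numpy as np

def tree_penalty(beta, g1, g2):
    """Tree-guided penalty for one input's coefficients on three outputs.

    Hypothetical tree: outputs 0 and 1 merge at node v1 (joint weight g1), and
    v1 merges with output 2 at the root (joint weight g2); s = 1 - g at each node.
    """
    b = np.abs(np.asarray(beta, dtype=float))
    w_v1 = (1 - g1) * (b[0] + b[1]) + g1 * np.linalg.norm(b[:2])
    w_root = (1 - g2) * (w_v1 + b[2]) + g2 * np.linalg.norm(b)
    return float(w_root)

# Scaling a direction by 1 / tree_penalty(direction) lands on the unit contour;
# a longer reach means the direction is penalized less.
shared_01 = np.array([1.0, 1.0, 0.0])   # outputs 0 and 1 (same subtree) active
shared_02 = np.array([1.0, 0.0, 1.0])   # outputs 0 and 2 (different subtrees) active
for g1, g2 in [(0.5, 0.5), (0.2, 0.7), (0.7, 0.2)]:
    reach_01 = 1 / tree_penalty(shared_01, g1, g2)
    reach_02 = 1 / tree_penalty(shared_02, g1, g2)
    print(f"g1={g1}, g2={g2}: reach 0&1 = {reach_01:.3f}, reach 0&2 = {reach_02:.3f}")
```

In each setting, the pair of outputs in the same subtree reaches farther, meaning it is penalized less, than the pair split across subtrees. A single nonzero coefficient is always penalized by exactly its absolute value.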

SLIDE 22

Estimating Parameters

  • Second-order cone program

– Many publicly available software packages for solving convex optimization problems can be used

  • Also, a variational formulation
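The variational idea can be illustrated with a standard identity. Which formulation the slide means is an assumption here. For positive numbers $a_v$, such as the weighted group norms, $(\sum_v a_v)^2 = \min_{d > 0,\ \sum_v d_v = 1} \sum_v a_v^2 / d_v$, attained at $d_v = a_v / \sum_u a_u$. Replacing the squared penalty by the right-hand side turns each group norm into a weighted quadratic. One can then alternate between a ridge-like update of the coefficients with $d$ fixed and the closed-form update of $d$. The check below uses illustrative numbers.

```python
import numpy as np

def variational_bound(a, d):
    """sum_v a_v^2 / d_v for positive weights d that sum to one."""
    a, d = np.asarray(a, dtype=float), np.asarray(d, dtype=float)
    return float(np.sum(a ** 2 / d))

a = np.array([1.0, 2.0, 3.0])        # stand-ins for weighted group norms
d_opt = a / a.sum()                  # closed-form minimizer on the simplex
print(a.sum() ** 2, variational_bound(a, d_opt))   # equal (36.0), up to rounding

rng = np.random.default_rng(0)
for d in rng.dirichlet(np.ones(3), size=3):      # other points on the simplex
    print(variational_bound(a, d) >= a.sum() ** 2)
```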
SLIDE 23

Proximal Gradient Descent

[Equations: original problem; approximation problem; gradient of the approximation]
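This sketch assumes the approximation is a Nesterov-style smoothing of each group norm, as the "uppermost line" picture on the next slide suggests. Write $\|w\|_2 = \max_{\|\alpha\|_2 \le 1} \alpha^\top w$ and subtract $\frac{\mu}{2}\|\alpha\|_2^2$ inside the max. The result is differentiable, its gradient is the maximizing $\alpha$, and it underestimates the norm by at most $\mu/2$. The smoothed penalty can then be handled with ordinary (accelerated) gradient steps. `smoothed_norm`, `smoothed_penalty`, the groups, and the weights are hypothetical.

```python
import numpy as np

def smoothed_norm(w, mu):
    """max over ||alpha||_2 <= 1 of (alpha . w - mu/2 * ||alpha||^2), and its gradient.

    The maximizer alpha* projects w / mu onto the unit ball; it is also the
    gradient of the smoothed norm, which is Lipschitz with constant 1 / mu.
    """
    w = np.asarray(w, dtype=float)
    alpha = w / max(mu, np.linalg.norm(w))
    value = alpha @ w - 0.5 * mu * (alpha @ alpha)
    return float(value), alpha

def smoothed_penalty(beta, groups, weights, mu):
    """Value and gradient of sum_v weights[v] * smoothed_norm(beta[groups[v]])."""
    total, grad = 0.0, np.zeros(len(beta))
    for v, idx in groups.items():
        val, alpha = smoothed_norm(beta[idx], mu)
        total += weights[v] * val
        grad[idx] += weights[v] * alpha     # overlapping groups add up
    return total, grad

# Hypothetical overlapping groups over three outputs, with illustrative weights.
groups = {"leaf0": [0], "leaf1": [1], "leaf2": [2], "v1": [0, 1], "root": [0, 1, 2]}
weights = {"leaf0": 0.24, "leaf1": 0.24, "leaf2": 0.8, "v1": 0.56, "root": 0.2}
beta = np.array([0.5, -1.0, 0.2])
value, grad = smoothed_penalty(beta, groups, weights, mu=0.1)
print(value, grad)
```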

SLIDE 24

Geometric Interpretation

  • Smooth approximation

[Figure panels: uppermost line, nonsmooth; uppermost line, smooth]
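In one dimension the picture can be reproduced numerically. $|x|$ is the uppermost of the lines $\alpha x$ with slopes $\alpha \in [-1, 1]$, which has a kink at zero. Subtracting $\frac{\mu}{2}\alpha^2$ from each line before taking the uppermost value rounds off the kink into the Huber function. This grid-based check is illustrative only; `mu` and the grids are arbitrary choices.

```python
import numpy as np

mu = 0.5
alphas = np.linspace(-1.0, 1.0, 2001)      # slopes of the candidate lines
xs = np.linspace(-2.0, 2.0, 9)

# Nonsmooth: |x| is the uppermost of the lines alpha * x
upper_nonsmooth = np.max(np.outer(xs, alphas), axis=1)
# Smooth: subtract mu/2 * alpha^2 from each line before taking the uppermost value
upper_smooth = np.max(np.outer(xs, alphas) - 0.5 * mu * alphas ** 2, axis=1)

# Closed form of the smooth version (Huber function)
huber = np.where(np.abs(xs) <= mu, xs ** 2 / (2 * mu), np.abs(xs) - mu / 2)
for x, a, b, c in zip(xs, upper_nonsmooth, upper_smooth, huber):
    print(f"x={x:+.2f}  |x|={a:.4f}  smooth={b:.4f}  huber={c:.4f}")
```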

SLIDE 25

Illustration with Simulated Data

[Figure panels: true regression coefficients; lasso; L1/L2-regularized multi-task regression; tree-guided group lasso. Axes: inputs (SNPs) by outputs (genes); color scale from no association to high association]

SLIDE 26

Simulation Study: ROC Curves

  • Results averaged over 50 simulated datasets
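The slides do not specify how the ROC curves were computed. One common approach, assumed here, treats recovery of the nonzero entries of the true coefficient matrix as a binary classification problem scored by the estimated absolute coefficients, and sweeps the threshold. `support_roc` and the toy matrices are hypothetical.

```python
import numpy as np

def support_roc(B_true: np.ndarray, B_hat: np.ndarray):
    """ROC points for recovering the nonzero pattern of B_true by thresholding |B_hat|.

    Each distinct |B_hat| value is used as a threshold, from largest to smallest;
    entries at or above the threshold are called associated.
    Assumes B_true has at least one zero and one nonzero entry.
    """
    truth = (B_true != 0).ravel()
    score = np.abs(B_hat).ravel()
    n_pos, n_neg = truth.sum(), (~truth).sum()
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(score)[::-1]:
        called = score >= t
        tpr.append((called & truth).sum() / n_pos)
        fpr.append((called & ~truth).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

B_true = np.array([[1.0, 0.0], [0.0, 2.0]])   # toy "true" associations
B_hat = np.array([[0.9, 0.1], [0.0, 0.5]])    # toy estimates
fpr, tpr = support_roc(B_true, B_hat)
print(fpr, tpr)
```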
SLIDE 27

Simulation Study: Prediction Errors

  • Results averaged over 50 simulated datasets
SLIDE 28

Experiments

  • Yeast dataset

– Inputs: 21 genetic variations in chromosome 3 of yeast
– Outputs: gene expression measurements for 3684 genes
– Samples: 114 yeast strains

  • Goal: learn input features that perturb the output gene expression levels

SLIDE 29

Yeast eQTL Analysis

[Figure panels: hierarchical clustering tree for genes (outputs); lasso; L1/L2-regularized multi-task regression; tree-guided group lasso. Axes: inputs (SNPs) by outputs (genes); color scale from no association to high association]