Large scale greedy feature-selection for multi-target learning

SLIDE 1

Large scale greedy feature-selection for multi-target learning

Antti Airola, Tapio Pahikkala et al. ECML 2015 BigTargets Workshop

Antti Airola, Tapio Pahikkala et al. Large scale greedy feature-selection for multi-target learning

SLIDE 2

Overview

Joint work with many authors
University of Turku: Antti Airola, Pekka Naula, Tapio Pahikkala, Tapio Salakoski (multi-target greedy RLS)

SLIDE 3

Overview

Large scale feature selection for multi-target learning
Task: select a minimal set of common features allowing accurate predictions over the target tasks
Greedy RLS: greedy regularized least-squares
Linear time in #inputs, #features, #outputs, #selected
Highlights from experiments:
  Broad-DREAM Gene Essentiality Prediction Challenge
  Outperforms multi-task Lasso for small feature budgets
  Also scales to full Genome Wide Association Studies: thousands of samples, hundreds of thousands of features (recent PhD thesis: Sebastian Okser)

SLIDE 4

Motivation

Why feature selection?

1 Accuracy: regularizing effect, avoiding overfitting leads to better generalization
2 Interpretability: obtain a small set of features understandable by a human expert
3 Budget constraints: obtaining features costs time and money

SLIDE 5

Model sparsity

W1, W2: features x targets coefficient matrices
W1 (entries shown: 1, 3, 2, −1, 3, 1, 2, 2): 8 features needed for prediction
W2 (entries shown: 2, 3, −1, 2, 3, 1, 4, 1): 2 features needed for prediction
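The count of "features needed" can be read off a coefficient matrix as the number of rows that contain at least one nonzero entry. The sketch below illustrates this with two small hypothetical matrices (they are not the slide's W1 and W2); the helper name `features_needed` is an assumption made for illustration.

```python
import numpy as np

def features_needed(W):
    """Count rows (features) of W that have at least one nonzero coefficient."""
    return int(np.count_nonzero(np.any(W != 0, axis=1)))

# Hypothetical 4 features x 2 targets matrices with four nonzero coefficients each.
W_scattered = np.array([[1, 0], [0, 3], [2, 0], [0, -1]])   # nonzeros spread over all rows
W_shared = np.array([[1, 3], [0, 0], [2, -1], [0, 0]])      # nonzeros packed into two rows

print(features_needed(W_scattered))  # 4
print(features_needed(W_shared))     # 2
```

Both matrices have the same number of nonzero coefficients, but only the second lets every target be predicted from two measured features.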

SLIDE 6

Learning task

Least-squares formulation
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to } C(W)$$
Notation
X: data matrix
Y: output matrix
W: model coefficients
$\|\cdot\|_F$: Frobenius norm
C(·): constraint (regularizer)

SLIDE 7

Multi-task Lasso (baseline)

Multi-task Lasso (Zhang, 2006)
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to } \sum_{i=1}^{d} \max_j |W_{i,j}| \le r$$
L1,∞ norm enforces sparsity in the number of features
r > 0: regularization parameter
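To see why the L1,∞ constraint favors shared features, the following sketch computes the norm as the sum over rows of the largest absolute coefficient in each row. The matrices and the function name `l1_inf_norm` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def l1_inf_norm(W):
    """Sum over features (rows) of the largest absolute coefficient in that row."""
    return float(np.sum(np.max(np.abs(W), axis=1)))

# Hypothetical matrices holding the same four coefficient values.
W_scattered = np.array([[1.0, 0.0], [0.0, 3.0], [2.0, 0.0], [0.0, -1.0]])
W_shared = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, -1.0], [0.0, 0.0]])

print(l1_inf_norm(W_scattered))  # 7.0
print(l1_inf_norm(W_shared))     # 5.0
```

Packing the same coefficients into fewer rows lowers the norm, so a budget r on this norm is spent more efficiently when targets reuse the same features.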

SLIDE 8

Greedy RLS

Greedy RLS (proposed)
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to } \|W\|_F^2 < r \ \text{ and } \ |\{i \mid \exists j,\ W_{i,j} \neq 0\}| \le k$$
r > 0: regularization parameter
k > 0: constraint on the number of features
Heuristics needed to search over the power set of features
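A quick count suggests why exhaustive search over feature subsets is impractical and a heuristic is needed. The sketch below compares the number of size-k subsets with the number of candidate evaluations made by a forward (greedy) search. The values d = 294 (the feature count of the Scene data) and k = 10 are chosen only for illustration, and both function names are hypothetical.

```python
import math

def exhaustive_subsets(d, k):
    """Number of distinct feature subsets of size exactly k out of d features."""
    return math.comb(d, k)

def greedy_candidates(d, k):
    """Candidates scored by forward selection: d + (d - 1) + ... + (d - k + 1)."""
    return sum(d - s for s in range(k))

d, k = 294, 10
print(exhaustive_subsets(d, k))   # a number with more than 17 digits
print(greedy_candidates(d, k))    # 2895
```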

SLIDE 9

Greedy RLS

Greedy regularized least-squares (Greedy RLS)
Starting from an empty feature set, at each step add the feature that reduces the leave-one-out cross-validation error the most
Stop once k features have been selected

SLIDE 10

Greedy RLS

Algorithm 1 Multi-target greedy RLS

S ← ∅                                          ⊲ selected features common for all tasks
while |S| < k do                               ⊲ select k features
    e ← ∞
    b ← 0
    for i ∈ {1, . . . , d} \ S do              ⊲ test all features
        e_avg ← 0
        for j ∈ {1, . . . , t} do
            e_{i,j} ← L(X_{:,S∪{i}}, Y_{:,j})  ⊲ LOO for task j
            e_avg ← e_avg + e_{i,j}/t
        if e_avg < e then
            e ← e_avg
            b ← i
    S ← S ∪ {b}                                ⊲ feature with lowest LOO-error
W ← A(X_{:,S}, Y)                              ⊲ train final models
return W, S
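Algorithm 1 can be transcribed almost line by line as a wrapper around a black-box solver. The sketch below is one such transcription under stated assumptions: the learner A is taken to be ridge regression with a regularization parameter `lam`, and L is the mean squared leave-one-out error computed by explicit refitting. The function names are hypothetical, and this is deliberately the slow version.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Black-box solver (assumed): ridge regression coefficients."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def loo_error(X, y, lam):
    """Mean squared leave-one-out error, refitting once per held-out example."""
    n = X.shape[0]
    total = 0.0
    for j in range(n):
        keep = np.arange(n) != j
        w = ridge_fit(X[keep], y[keep], lam)
        total += (y[j] - X[j] @ w) ** 2
    return total / n

def naive_greedy_selection(X, Y, k, lam=1.0):
    """Forward selection of k features shared by all targets (columns of Y)."""
    d, t = X.shape[1], Y.shape[1]
    S = []
    while len(S) < k:
        e, b = np.inf, -1
        for i in range(d):
            if i in S:
                continue
            # Average LOO error over targets when feature i joins the selected set.
            e_avg = sum(loo_error(X[:, S + [i]], Y[:, j], lam) for j in range(t)) / t
            if e_avg < e:
                e, b = e_avg, i
        S.append(b)
    return ridge_fit(X[:, S], Y, lam), S
```

Every candidate feature triggers one refit per target and per held-out example, which is why this form is only practical for small problems.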

SLIDE 11

Greedy RLS could be implemented as a general wrapper code calling a black-box solver
#selected × #features × #targets × #CV-rounds calls for naive implementation!
Matrix algebraic optimization for feature addition, leave-one-out... (for all targets simultaneously)
Linear time algorithm (#inputs, #features, #outputs, #selected)

P. Naula, A. Airola, T. Salakoski and T. Pahikkala. Multi-label learning under feature extraction budgets. Pattern Recognition Letters, 2014.
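One plausible reading of the "matrix algebraic optimization for feature addition" is a rank-one (Sherman-Morrison) update of the inverse G = (X_S X_S^T + λI)^{-1} when a feature column is appended. The sketch below checks that identity numerically on random data. The variable names and sizes are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 6, 0.5
X_S = rng.standard_normal((n, 2))   # columns of the features already selected
x = rng.standard_normal(n)          # candidate feature column

# Inverse for the current feature set.
G = np.linalg.inv(X_S @ X_S.T + lam * np.eye(n))

# Rank-one update: no new matrix inversion is needed to add the feature.
Gx = G @ x
G_updated = G - np.outer(Gx, Gx) / (1.0 + x @ Gx)

# Reference: recompute the inverse from scratch with the feature appended.
X_new = np.column_stack([X_S, x])
G_direct = np.linalg.inv(X_new @ X_new.T + lam * np.eye(n))

print(np.allclose(G_updated, G_direct))  # True
```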

SLIDE 12

Greedy RLS

Algorithm 2 Multi-target greedy RLS

A ← λ^{-1} Y
g ← λ^{-1} 1
C ← λ^{-1} X
S ← ∅
while |S| < k do
    e ← ∞
    b ← 0
    for i ∈ {1, . . . , d} \ S do
        u ← C_{:,i} (1 + (X_{:,i})^T C_{:,i})^{-1}
        e_i ← 0
        Ã ← A − u((X_{:,i})^T A)
        for h ∈ {1, . . . , t} do
            for j ∈ {1, . . . , n} do
                g̃_j ← g_j − u_j C_{j,i}
                e_i ← e_i + (g̃_j)^{-2} (Ã_{j,h})^2
        if e_i < e then
            e ← e_i
            b ← i
    u ← C_{:,b} (1 + (X_{:,b})^T C_{:,b})^{-1}
    A ← A − u((X_{:,b})^T A)
    for j ∈ {1, . . . , n} do
        g_j ← g_j − u_j C_{j,b}
    C ← C − u((X_{:,b})^T C)
    S ← S ∪ {b}
W ← (X_{:,S})^T A
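Algorithm 2 maps fairly directly onto array operations. The following NumPy sketch is an assumed transcription, not the RLScore implementation: X is taken to be an n × d array with examples as rows, Y an n × t array, `lam` stands for λ, and the function name `greedy_rls` is hypothetical. The inner loops over h and j are replaced by vectorized expressions.

```python
import numpy as np

def greedy_rls(X, Y, k, lam=1.0):
    """Select k features shared by all targets using rank-one updates."""
    n, d = X.shape
    A = Y / lam                    # dual coefficients G @ Y, shape n x t
    g = np.full(n, 1.0 / lam)      # diagonal of G = (X_S X_S^T + lam I)^-1
    C = X / lam                    # cache G @ X, shape n x d
    S = []
    for _ in range(k):
        best_err, best = np.inf, -1
        for i in range(d):
            if i in S:
                continue
            u = C[:, i] / (1.0 + X[:, i] @ C[:, i])
            A_try = A - np.outer(u, X[:, i] @ A)
            g_try = g - u * C[:, i]
            # Leave-one-out residual of example j on target h is A_try[j, h] / g_try[j].
            err = np.sum((A_try / g_try[:, None]) ** 2)
            if err < best_err:
                best_err, best = err, i
        # Commit the best feature by updating all cached quantities.
        u = C[:, best] / (1.0 + X[:, best] @ C[:, best])
        A = A - np.outer(u, X[:, best] @ A)
        g = g - u * C[:, best]
        C = C - np.outer(u, X[:, best] @ C)
        S.append(best)
    W = X[:, S].T @ A              # primal coefficients, shape k x t
    return W, S
```

Scoring one candidate costs on the order of n × t operations, so the total work grows with the product k × d × n × t, in line with the linear-time claim on the slides. The returned W should coincide with ridge regression fitted on the selected columns only.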

SLIDE 13

Benchmarking greedy RLS and multi-task Lasso

Table: Mulan datasets (Tsoumakas et al. 2011).

Data set     Domain    Labels   Features   Instances
Scene        image     6        294        2407
Yeast        biology   14       103        2417
Emotions     music     6        72         593
Mediamill*   text      9        120        41583
Delicious    text      983      500        16105
Tmc2007      text      22       49060      28596

SLIDE 14

Greedy RLS vs. Lasso

[Figure: M.avg.AUC (y-axis, 0.5 to 1.0) against number of features for MT-Lasso and ML-gRLS. Panels: Scene data (x-axis ticks 50 to 250), Yeast data (x-axis ticks 20 to 100).]

SLIDE 15

Greedy RLS vs. Lasso

[Figure: M.avg.AUC (y-axis, 0.5 to 1.0) against number of features for MT-Lasso and ML-gRLS. Panels: Emotions data (x-axis ticks 10 to 70), Mediamill data (x-axis ticks 20 to 100).]

SLIDE 16

Greedy RLS vs. Lasso

[Figure: M.avg.AUC (y-axis, 0.5 to 1.0) against number of features for MT-Lasso and ML-gRLS. Panels: Delicious data (x-axis ticks 20 to 80), Tmc2007 data (x-axis ticks 10 to 40).]

SLIDE 17

Conclusion

Greedy RLS: linear time algorithm for (multi-target) feature selection
Selects joint features for the target tasks
Competitive when the number of features to be selected is small
Applications on Genome-Wide Association Studies
RLScore open source implementation at https://github.com/aatapa/RLScore
