Large scale greedy feature-selection for multi-target learning - PowerPoint PPT Presentation

Large scale greedy feature-selection for multi-target learning
Antti Airola, Tapio Pahikkala et al.
ECML 2015 BigTargets Workshop

Overview
Joint work with many authors
- University of Turku: Antti Airola, Pekka Naula, Tapio Pahikkala, Tapio Salakoski (multi-target greedy RLS)

Overview
Large scale feature selection for multi-target learning
- Task: select a minimal set of common features that allows accurate predictions across all target tasks
- Greedy RLS: greedy regularized least-squares
- Running time linear in #inputs, #features, #outputs and #selected
Highlights from experiments
- Broad-DREAM Gene Essentiality Prediction Challenge
- Outperforms multi-task Lasso for small feature budgets
- Also scales to full genome-wide association studies: thousands of samples, hundreds of thousands of features (recent PhD thesis: Sebastian Okser)

Motivation
Why feature selection?
1. Accuracy: regularizing effect; avoiding overfitting leads to better generalization
2. Interpretability: obtain a small set of features understandable by a human expert
3. Budget constraints: obtaining features costs time and money

Model sparsity     1 0 0 0 0 0 0 0     3 0 0 0 2 3 − 1 2         0 2 0 0 0 0 0 0         0 − 1 0 0 3 1 4 1     W 1 = , W 2 =     0 0 0 3 0 0 0 0         0 0 0 1 0 0 0 0         0 0 2 0 0 0 0 0 0 0 2 0 0 0 0 0 features x targets coefficient matrices W 1 8 features needed for prediction W 2 2 features needed for prediction Antti Airola, Tapio Pahikkala et al. Large scale greedy feature-selection for multi-target learning

Learning task
Least-squares formulation:
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to} \quad C(W)$$
Notation:
- $X$: data matrix ($n \times d$: $n$ inputs, $d$ features)
- $Y$: output matrix ($n \times t$: $t$ targets)
- $W$: model coefficients
- $\|\cdot\|_F$: Frobenius norm
- $C(\cdot)$: constraint (regularizer)

Multi-task Lasso (baseline)
Multi-task Lasso (Zhang, 2006):
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to} \quad \sum_{i=1}^{d} \max_{j} |W_{i,j}| \le r$$
- The $L_{1,\infty}$ norm enforces sparsity in the number of features: each feature (row of $W$) is charged for its largest coefficient, so features tend to be dropped for all targets at once
- $r > 0$: regularization parameter
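The $L_{1,\infty}$ penalty can be evaluated directly. A minimal NumPy sketch (the helper name `l1_inf_norm` is hypothetical) that only computes the penalty, it does not solve the multi-task Lasso, applied to the two matrices of the "Model sparsity" slide: an entrywise L1 penalty would slightly prefer W1 (15 vs. 17), whereas the $L_{1,\infty}$ norm strongly prefers W2 (7 vs. 15), because each feature is paid for only once.

```python
import numpy as np

def l1_inf_norm(W):
    """Sum over features (rows) of the largest absolute coefficient across targets."""
    return np.abs(W).max(axis=1).sum()

# Matrices from the "Model sparsity" slide (8 features x 4 targets).
W1 = np.zeros((8, 4))
W1[np.arange(8), [0, 0, 1, 1, 3, 3, 2, 2]] = [1, 3, 2, -1, 3, 1, 2, 2]
W2 = np.zeros((8, 4))
W2[1] = [2, 3, -1, 2]
W2[3] = [3, 1, 4, 1]

print(l1_inf_norm(W1), l1_inf_norm(W2))      # 15.0 7.0
print(np.abs(W1).sum(), np.abs(W2).sum())    # 15.0 17.0  (entrywise L1 for comparison)
```

Note that scikit-learn's `MultiTaskLasso` uses a mixed L1/L2 penalty rather than the $L_{1,\infty}$ norm shown here, so it is a related but not identical baseline.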

Greedy RLS
Greedy RLS (proposed):
$$\arg\min_{W \in \mathbb{R}^{d \times t}} \|XW - Y\|_F^2 \quad \text{subject to} \quad \|W\|_F^2 < r \ \text{ and } \ |\{ i \mid \exists j : W_{i,j} \neq 0 \}| \le k$$
- $r > 0$: regularization parameter
- $k > 0$: maximum number of selected features
- Exact optimization would mean searching over the power set of features, so heuristics are needed
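Once the feature set S is fixed, what remains is ordinary ridge regression (RLS) on the columns in S. The sketch below assumes the usual penalized form with a parameter `lam` (λ) in place of the constraint radius r; by Lagrangian duality an active constraint corresponds to some λ ≥ 0. `ridge_on_support` is a hypothetical name, not part of any library.

```python
import numpy as np

def ridge_on_support(X, Y, S, lam):
    """Ridge fit using only the feature columns in S; rows of W outside S stay zero."""
    d = X.shape[1]
    XS = X[:, S]
    W = np.zeros((d, Y.shape[1]))
    # Primal solution (X_S^T X_S + lam I)^{-1} X_S^T Y for the selected rows.
    W[S] = np.linalg.solve(XS.T @ XS + lam * np.eye(len(S)), XS.T @ Y)
    return W

# Usage: 3 targets driven by features 2 and 7 of a 10-feature data set.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = X[:, [2, 7]] @ rng.standard_normal((2, 3))
W = ridge_on_support(X, Y, [2, 7], lam=1.0)
print(np.flatnonzero(np.any(W != 0, axis=1)))  # [2 7]
```

The same weights can also be written in dual form, $X_S^\top (X_S X_S^\top + \lambda I)^{-1} Y$, which is the form the fast algorithm later in the deck returns.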

Greedy RLS
Greedy regularized least-squares (Greedy RLS):
- Start from the empty feature set; at each step, add the feature whose inclusion reduces the leave-one-out cross-validation error the most
- Stop once k features have been selected

Greedy RLS
Algorithm 1 Multi-target greedy RLS
    S ← ∅                                     ▷ selected features, common to all tasks
    while |S| < k do                          ▷ select k features
        e ← ∞
        b ← 0
        for i ∈ {1, ..., d} \ S do            ▷ test all features
            e_avg ← 0
            for j ∈ {1, ..., t} do
                e_{i,j} ← L(X_{:,S∪{i}}, Y_{:,j})   ▷ LOO error for task j
                e_avg ← e_avg + e_{i,j} / t
            if e_avg < e then
                e ← e_avg
                b ← i
        S ← S ∪ {b}                           ▷ feature with lowest LOO error
    W ← A(X_{:,S}, Y)                         ▷ train final models
    return W, S
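Algorithm 1 translates almost line by line into a naive wrapper. This sketch assumes ridge regression with a fixed `lam` both as the training algorithm A and as the learner inside the LOO error L; every LOO error is computed by n explicit refits, which is the expensive call pattern the next slide warns about. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Primal RLS (ridge) solution (X^T X + lam I)^{-1} X^T Y; Y may be 1-D or 2-D."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def loo_error(X, y, lam):
    """L(X, y): mean squared leave-one-out error for one target, via n explicit refits."""
    n = X.shape[0]
    err = 0.0
    for j in range(n):
        keep = np.arange(n) != j
        w = ridge_fit(X[keep], y[keep], lam)
        err += (y[j] - X[j] @ w) ** 2
    return err / n

def greedy_rls_naive(X, Y, k, lam=1.0):
    """Algorithm 1: choose k features shared by all t targets by average LOO error."""
    d, t = X.shape[1], Y.shape[1]
    S = []                                   # selected features, common to all tasks
    while len(S) < k:
        e, b = np.inf, None
        for i in range(d):                   # test all features not yet in S
            if i in S:
                continue
            cols = S + [i]
            e_avg = sum(loo_error(X[:, cols], Y[:, j], lam) for j in range(t)) / t
            if e_avg < e:
                e, b = e_avg, i
        S.append(b)                          # feature with lowest LOO error
    W = ridge_fit(X[:, S], Y, lam)           # A(X_{:,S}, Y): train final models
    return W, S

# Usage: 3 targets generated from features 1 and 4 plus a little noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 6))
B = np.array([[2.0, -1.0, 1.5], [1.0, 2.0, -2.0]])
Y = X[:, [1, 4]] @ B + 0.01 * rng.standard_normal((60, 3))
W, S = greedy_rls_naive(X, Y, k=2)
print(sorted(S))  # [1, 4]
```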

Greedy RLS
- Could be implemented as generic wrapper code calling a black-box solver
- A naive implementation needs #selected × #features × #targets × #CV-rounds solver calls!
- Matrix-algebraic shortcuts for feature addition and leave-one-out cross-validation, computed for all targets simultaneously
- Result: running time linear in #inputs, #features, #outputs and #selected
P. Naula, A. Airola, T. Salakoski and T. Pahikkala. Multi-label learning under feature extraction budgets. Pattern Recognition Letters, 2014.
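One key shortcut is that RLS leave-one-out residuals have a closed form. With $G = (XX^\top + \lambda I)^{-1}$, the hat matrix satisfies $I - H = \lambda G$, so the LOO residual of example j for target h is $(GY)_{j,h} / G_{jj}$, for all targets at once. The sketch below checks this identity against brute-force refitting. It forms G explicitly, which costs O(n³), so it illustrates the identity only, not the linear-time bookkeeping. Function names are hypothetical.

```python
import numpy as np

def loo_residuals_fast(X, Y, lam):
    """All LOO residuals y_j - f_{-j}(x_j), every target at once: (G Y)_{j,h} / G_{jj}."""
    G = np.linalg.inv(X @ X.T + lam * np.eye(X.shape[0]))
    return (G @ Y) / np.diag(G)[:, None]

def loo_residuals_naive(X, Y, lam):
    """Reference: refit ridge n times, each time holding out one example."""
    n, d = X.shape
    R = np.empty_like(Y, dtype=float)
    for j in range(n):
        keep = np.arange(n) != j
        W = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(d), X[keep].T @ Y[keep])
        R[j] = Y[j] - X[j] @ W
    return R

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
Y = rng.standard_normal((30, 3))
print(np.allclose(loo_residuals_fast(X, Y, 0.5), loo_residuals_naive(X, Y, 0.5)))  # True
```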

Greedy RLS
Algorithm 2 Multi-target greedy RLS
    A ← λ^{-1} Y
    g ← λ^{-1} 1
    C ← λ^{-1} X
    S ← ∅
    while |S| < k do
        e ← ∞
        b ← 0
        for i ∈ {1, ..., d} \ S do
            u ← C_{:,i} (1 + (X_{:,i})^T C_{:,i})^{-1}
            e_i ← 0
            Ã ← A − u ((X_{:,i})^T A)
            for h ∈ {1, ..., t} do
                for j ∈ {1, ..., n} do
                    g̃_j ← g_j − u_j C_{j,i}
                    e_i ← e_i + (g̃_j)^{-2} (Ã_{j,h})^2
            if e_i < e then
                e ← e_i
                b ← i
        u ← C_{:,b} (1 + (X_{:,b})^T C_{:,b})^{-1}
        A ← A − u ((X_{:,b})^T A)
        for j ∈ {1, ..., n} do
            g_j ← g_j − u_j C_{j,b}
        C ← C − u ((X_{:,b})^T C)
        S ← S ∪ {b}
    W ← (X_{:,S})^T A
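A NumPy transcription of Algorithm 2 follows; it is a sketch, not the RLScore implementation. The pseudocode maintains, for $G = (X_S X_S^\top + \lambda I)^{-1}$, the quantities A = GY, g = diag(G) and C = GX. Adding feature i is a rank-one (Sherman-Morrison) update of G, which is where u comes from, and the LOO residuals are A_{j,h} / g_j. Each candidate costs O(nt) and each accepted feature an O(nd) update of C, so the total is O(kdnt). The name `greedy_rls` is hypothetical.

```python
import numpy as np

def greedy_rls(X, Y, k, lam=1.0):
    """Multi-target greedy RLS (Algorithm 2). X: n x d, Y: n x t. Returns (W, S)."""
    n, d = X.shape
    A = Y / lam                       # A = G Y with G = lam^{-1} I for the empty set
    g = np.full(n, 1.0 / lam)         # g = diag(G)
    C = X / lam                       # C = G X
    S = []
    while len(S) < k:
        e, b = np.inf, -1
        for i in range(d):
            if i in S:
                continue
            u = C[:, i] / (1.0 + X[:, i] @ C[:, i])
            A_t = A - np.outer(u, X[:, i] @ A)      # tentative A after adding feature i
            g_t = g - u * C[:, i]                    # tentative diag(G)
            e_i = np.sum((A_t / g_t[:, None]) ** 2)  # summed squared LOO residuals, all targets
            if e_i < e:
                e, b = e_i, i
        u = C[:, b] / (1.0 + X[:, b] @ C[:, b])
        A = A - np.outer(u, X[:, b] @ A)
        g = g - u * C[:, b]
        C = C - np.outer(u, X[:, b] @ C)
        S.append(b)
    W = X[:, S].T @ A                 # |S| x t primal coefficients
    return W, S

# Usage: same synthetic setup as a naive wrapper would use (features 1 and 4 matter).
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 6))
B = np.array([[2.0, -1.0, 1.5], [1.0, 2.0, -2.0]])
Y = X[:, [1, 4]] @ B + 0.01 * rng.standard_normal((60, 3))
W, S = greedy_rls(X, Y, k=2)
print(sorted(S))  # [1, 4]
```

Summing rather than averaging the LOO errors over targets, as Algorithm 1 does, does not change which feature wins, since the two differ only by the constant factor t.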

Benchmarking greedy RLS and multi-task Lasso
Table: Mulan datasets (Tsoumakas et al. 2011).

Dataset     Domain  Labels  Features  Instances
Scene       image        6       294       2407
Yeast       biology     14       103       2417
Emotions    music        6        72        593
Mediamill*  text         9       120      41583
Delicious   text       983       500      16105
Tmc2007     text        22     49060      28596

Greedy RLS vs. Lasso
[Figure: M.avg.AUC vs. number of features for MT-Lasso and ML-gRLS; panels: Scene data, Yeast data]

Greedy RLS vs. Lasso
[Figure: M.avg.AUC vs. number of features for MT-Lasso and ML-gRLS; panels: Emotions data, Mediamill data]

Greedy RLS vs. Lasso
[Figure: M.avg.AUC vs. number of features for MT-Lasso and ML-gRLS; panels: Delicious data, Tmc2007 data]
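The plots report M.avg.AUC. Assuming this denotes the macro-averaged AUC (the unweighted mean of per-label AUCs; the slides do not define it explicitly), it could be computed as in the sketch below. The pairwise comparison is quadratic in the number of examples and is meant for illustration only; labels with only positives or only negatives have an undefined AUC and would need special handling. Function names are hypothetical.

```python
import numpy as np

def auc(y_true, score):
    """AUC of one label: probability a random positive outranks a random negative (ties = 1/2)."""
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def macro_auc(Y_true, scores):
    """Unweighted mean of per-label AUCs; labels are the columns of Y_true and scores."""
    return np.mean([auc(Y_true[:, h], scores[:, h]) for h in range(Y_true.shape[1])])

Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
scores = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.1], [0.3, 0.3]])
print(macro_auc(Y_true, scores))  # 0.75  (label 0: AUC 1.0, label 1: AUC 0.5)
```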

Conclusion
- Greedy RLS: linear-time algorithm for (multi-target) feature selection
- Selects features jointly for all target tasks
- Competitive when the number of features to be selected is small
- Applications in genome-wide association studies
- Open source implementation in RLScore: https://github.com/aatapa/RLScore
