A two-step method to incorporate task features for large output spaces

SLIDE 1

A two-step method for large output spaces Michiel Stock twitter: @michielstock Motivation

Introductory example Relational learning Other applications

Pairwise learning methods

Kronecker kernel ridge regression Two-step kernel ridge regression

Computational aspects

Cross-validation Exact online learning

Take home messages KERMIT

A two-step method to incorporate task features for large output spaces

Michiel Stock (1), Tapio Pahikkala (2), Antti Airola (2), Bernard De Baets (1) & Willem Waegeman (1)

(1) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University

(2) Department of Computer Science, University of Turku

NIPS: extreme classification workshop December 12, 2015

SLIDE 2

What will we read next?

[Figure: partially filled ratings matrix (known values: 5, 4, 1, 4, 4, 2, 1, 4, 3) for readers Alice, Bob, Cedric and Daphne]

SLIDE 3

What will we read next?

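[Figure: partially filled ratings matrix (known values: 5, 4, 1, 4, 4, 2, 1, 4, 3) for readers Alice, Bob, Cedric and Daphne, with side information: a social graph and a binary genre matrix (1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1)]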

SLIDE 4

What will we read next?

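[Figure: ratings matrix for readers Alice, Bob, Cedric and Daphne with the missing entries filled in by predicted values (5 2.3 4 3.1 1 4.5 1.3 4 3.9 4 3.8 0.8 2 5.2 1 4.5 4 2.5 3 3.6), with side information: a social graph and a binary genre matrix (1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1)]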

SLIDE 5

What will we read next?

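[Figure: filled-in ratings matrix for readers Alice, Bob, Cedric and Daphne (5 2.3 4 3.1 1 4.5 1.3 4 3.9 4 3.8 0.8 2 5.2 1 4.5 4 2.5 3 3.6), with side information: a social graph and a binary genre matrix (1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1); an additional item with genre features 1 1 0 1 receives predicted values 4.8 1.1 3.7 2.3]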

SLIDE 6

What will we read next?

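[Figure: filled-in ratings matrix for readers Alice, Bob, Cedric and Daphne (5 2.3 4 3.1 1 4.5 1.3 4 3.9 4 3.8 0.8 2 5.2 1 4.5 4 2.5 3 3.6), with side information: a social graph and a binary genre matrix (1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1); an additional item with genre features 1 1 0 1 receives predicted values 4.8 1.1 3.7 2.3; an additional reader, Eric, receives predicted values 2.3 4.0 1.7 4.8 2.9]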

SLIDE 7

What will we read next?

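[Figure: filled-in ratings matrix for readers Alice, Bob, Cedric and Daphne (5 2.3 4 3.1 1 4.5 1.3 4 3.9 4 3.8 0.8 2 5.2 1 4.5 4 2.5 3 3.6), with side information: a social graph and a binary genre matrix (1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1); an additional item with genre features 1 1 0 1 receives predicted values 4.8 1.1 3.7 2.3; an additional reader, Eric, receives predicted values 2.3 4.0 1.7 4.8 2.9; the additional item and Eric together receive the predicted value 2.4]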

SLIDE 8

Learning relations

[Figure: grid crossing in-sample and out-of-sample instances with in-sample and out-of-sample tasks; its regions are labelled Training, Setting A, Setting B, Setting C and Setting D]

SLIDE 9

Other cool applications: drug design

Predicting interaction between proteins and small compounds

SLIDE 10

Other cool applications: social network analysis

Predicting links between people

SLIDE 11

Other cool applications: food pairing

Finding ingredients that pair well

SLIDE 12

Learning with pairwise feature representations

Features books: Φ
Features readers: Ψ

d: instance (e.g. book)
φ(d): instance features (e.g. genre)
t: task (e.g. reader)
ψ(t): task features (e.g. social network)

SLIDE 13

Learning with pairwise feature representations

Features books: Φ
Features readers: Ψ
Pairwise features: Φ ⊗ Ψ

d: instance (e.g. book)
φ(d): instance features (e.g. genre)
t: task (e.g. reader)
ψ(t): task features (e.g. social network)

SLIDE 14

Learning with pairwise feature representations

Features books: Φ
Features readers: Ψ
Pairwise features: Φ ⊗ Ψ

d: instance (e.g. book)
φ(d): instance features (e.g. genre)
t: task (e.g. reader)
ψ(t): task features (e.g. social network)

Pairwise prediction function: f(d, t) = wᵀ(φ(d) ⊗ ψ(t))
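
The pairwise prediction function can be checked numerically. The sketch below is only an illustration: the dimensions, the random data and the variable names are assumptions, not taken from the slides. It evaluates the prediction once with an explicit Kronecker product of the two feature vectors and once as a bilinear form in which the weight vector w is reshaped into a matrix W, and compares the two values.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 3                      # hypothetical numbers of instance and task features
phi_d = rng.normal(size=p)       # phi(d): features of one instance (e.g. a book)
psi_t = rng.normal(size=q)       # psi(t): features of one task (e.g. a reader)
w = rng.normal(size=p * q)       # one weight per (instance feature, task feature) pair

# Explicit pairwise representation: Kronecker product of the two feature vectors.
f_kron = w @ np.kron(phi_d, psi_t)

# Same prediction without forming the product: reshape w into a p x q matrix W.
W = w.reshape(p, q)
f_bilinear = phi_d @ W @ psi_t

print(np.isclose(f_kron, f_bilinear))
```

The bilinear form never materialises the p·q-dimensional pairwise feature vector, which is what makes the matrix formulation attractive when there are many instances and tasks.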

SLIDE 15

Learning relations in two steps

[Figure: in-sample and out-of-sample instances against in-sample and out-of-sample tasks, with an instance KRR step, virtual instances, and a task KRR step]

1. Build a ridge regression model to generalize to new instances.
2. Build a ridge regression model to generalize to new tasks.

SLIDE 16

The two-step ridge regression

Prediction function: f(d, t) = φ(d)ᵀ W ψ(t)

Parameters can be found by solving: Φᵀ Y Ψ = (ΦᵀΦ + λ_d I) W (ΨᵀΨ + λ_t I)

SLIDE 17

The two-step ridge regression

Prediction function: f(d, t) = φ(d)ᵀ W ψ(t)

Parameters can be found by solving: Φᵀ Y Ψ = (ΦᵀΦ + λ_d I) W (ΨᵀΨ + λ_t I)

Two hyperparameters: λ_d and λ_t!
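
A minimal sketch of how the parameter equation can be solved, assuming explicit (non-kernel) feature matrices. The function name `two_step_ridge`, the matrix sizes and the random data are assumptions for illustration. Φ holds one row per instance, Ψ one row per task, and Y one row per instance and one column per task. The sketch also checks that the closed-form W coincides with two ordinary ridge regressions applied one after the other, first over instances and then over tasks.

```python
import numpy as np

def two_step_ridge(Phi, Psi, Y, lam_d, lam_t):
    """Solve Phi^T Y Psi = (Phi^T Phi + lam_d I) W (Psi^T Psi + lam_t I) for W."""
    A_d = Phi.T @ Phi + lam_d * np.eye(Phi.shape[1])
    A_t = Psi.T @ Psi + lam_t * np.eye(Psi.shape[1])
    left = np.linalg.solve(A_d, Phi.T @ Y @ Psi)   # A_d^{-1} Phi^T Y Psi
    return np.linalg.solve(A_t, left.T).T          # right-multiply by A_t^{-1} (A_t is symmetric)

rng = np.random.default_rng(1)
n, m, p, q = 30, 20, 5, 4        # hypothetical sizes: instances, tasks, and their feature counts
Phi = rng.normal(size=(n, p))
Psi = rng.normal(size=(m, q))
Y = rng.normal(size=(n, m))
lam_d, lam_t = 0.5, 2.0

W = two_step_ridge(Phi, Psi, Y, lam_d, lam_t)

# Step 1: one ridge model per task, regressing the labels on the instance features (p x m).
V = np.linalg.solve(Phi.T @ Phi + lam_d * np.eye(p), Phi.T @ Y)
# Step 2: ridge regression of those per-task weights on the task features (gives p x q).
W_steps = np.linalg.solve(Psi.T @ Psi + lam_t * np.eye(q), Psi.T @ V.T).T

f = Phi[0] @ W @ Psi[0]          # prediction f(d, t) for the first instance and the first task
print(np.allclose(W, W_steps))
```

Each regularization parameter appears in exactly one of the two linear systems, so λ_d and λ_t can be varied independently of each other.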

SLIDE 18

Four ways of cross validation

[Figure: one panel per setting (A, B, C, D), each marking entries as train, test or discarded]

SLIDE 19

Four ways of cross validation

[Figure: one panel per setting (A, B, C, D), each marking entries as train, test or discarded]

Analytic shortcuts can be derived to perform LOOCV for each setting!
Tuning λ_d and λ_t is essentially free!
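
The slides do not spell out the shortcuts. As one illustration, the sketch below applies the standard leave-one-out identity for ridge regression to the instance step, covering only the case where an instance is left out together with all of its labels. It assumes explicit feature matrices; the function names, sizes and random data are hypothetical, and this is not claimed to be the authors' derivation. The shortcut is compared against brute-force retraining.

```python
import numpy as np

def hat_matrix(X, lam):
    """Ridge hat matrix X (X^T X + lam I)^{-1} X^T."""
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)

def loo_instances_shortcut(Phi, Psi, Y, lam_d, lam_t):
    """Held-out predictions for every instance row, without retraining."""
    H_d, H_t = hat_matrix(Phi, lam_d), hat_matrix(Psi, lam_t)
    F = H_d @ Y @ H_t                      # in-sample predictions of the two-step model
    h = np.diag(H_d)[:, None]              # leverage of each instance
    return (F - h * (Y @ H_t)) / (1.0 - h)

def loo_instances_brute_force(Phi, Psi, Y, lam_d, lam_t):
    """Same quantity by refitting the model once per instance (for checking only)."""
    n, p = Phi.shape
    A_t = Psi.T @ Psi + lam_t * np.eye(Psi.shape[1])
    out = np.empty_like(Y)
    for i in range(n):
        keep = np.arange(n) != i
        A_d = Phi[keep].T @ Phi[keep] + lam_d * np.eye(p)
        W = np.linalg.solve(A_d, Phi[keep].T @ Y[keep] @ Psi) @ np.linalg.inv(A_t)
        out[i] = Phi[i] @ W @ Psi.T        # predictions for the held-out instance on all tasks
    return out

rng = np.random.default_rng(2)
Phi = rng.normal(size=(25, 5))             # hypothetical instance features
Psi = rng.normal(size=(15, 4))             # hypothetical task features
Y = rng.normal(size=(25, 15))              # hypothetical label matrix

fast = loo_instances_shortcut(Phi, Psi, Y, lam_d=1.0, lam_t=0.5)
slow = loo_instances_brute_force(Phi, Psi, Y, lam_d=1.0, lam_t=0.5)
print(np.allclose(fast, slow))
```

The shortcut needs the two hat matrices only once, whereas the brute-force version solves a new linear system for every held-out instance.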

SLIDE 20

Effect of regularization for the four settings

Data: protein-ligand interactions.
Evaluation by AUC (lighter = better performance).
Clear differences between the four settings and between values of λ_d and λ_t!

SLIDE 21

Learning with mini-batches

[Figure: matrix of tasks by instances that grows in blocks: initial training data, new training instances, even more training instances, and new training tasks]

SLIDE 22

Learning with mini-batches

[Figure: matrix of tasks by instances that grows in blocks: initial training data, new training instances, even more training instances, and new training tasks]

Exact updating of the parameters when new training instances and/or tasks become available:
scalable for “Big Data” applications
updating the model in a dynamic environment
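
The slides do not give the update formulas. One simple way to obtain exact updates, sketched below under the assumption of explicit features and a fixed set of tasks, is to accumulate the matrices ΦᵀΦ and ΦᵀYΨ over mini-batches of instances and re-solve the small linear systems on demand. The class name `TwoStepRidgeStream`, its methods and the random data are hypothetical, and adding new tasks is not covered by this sketch.

```python
import numpy as np

class TwoStepRidgeStream:
    """Two-step ridge regression for a fixed task set, fed one mini-batch of instances at a time."""

    def __init__(self, Psi, lam_d, lam_t, n_instance_features):
        q = Psi.shape[1]
        self.Psi = Psi
        self.A_t = Psi.T @ Psi + lam_t * np.eye(q)
        self.A_d = lam_d * np.eye(n_instance_features)   # accumulates Phi^T Phi + lam_d I
        self.B = np.zeros((n_instance_features, q))      # accumulates Phi^T Y Psi

    def partial_fit(self, Phi_batch, Y_batch):
        """Add a mini-batch of instances (rows of Phi) with their labels for all tasks."""
        self.A_d += Phi_batch.T @ Phi_batch
        self.B += Phi_batch.T @ Y_batch @ self.Psi
        return self

    def weights(self):
        """Current W, identical (up to rounding) to refitting on all data seen so far."""
        left = np.linalg.solve(self.A_d, self.B)
        return np.linalg.solve(self.A_t, left.T).T

rng = np.random.default_rng(3)
n, m, p, q = 60, 12, 6, 4                 # hypothetical sizes
Phi = rng.normal(size=(n, p))
Psi = rng.normal(size=(m, q))
Y = rng.normal(size=(n, m))
lam_d, lam_t = 1.0, 0.1

model = TwoStepRidgeStream(Psi, lam_d, lam_t, n_instance_features=p)
for start in range(0, n, 20):             # three mini-batches of 20 instances each
    model.partial_fit(Phi[start:start + 20], Y[start:start + 20])
W_stream = model.weights()

# Reference: fit once on all the data.
A_d = Phi.T @ Phi + lam_d * np.eye(p)
A_t = Psi.T @ Psi + lam_t * np.eye(q)
W_batch = np.linalg.solve(A_d, Phi.T @ Y @ Psi) @ np.linalg.inv(A_t)
print(np.allclose(W_stream, W_batch))
```

The stored state has size p × p and p × q regardless of how many instances have been seen, so old mini-batches do not need to be kept in memory.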

SLIDE 23

Exact online learning for hierarchical text classification

Hierarchical text classification (> 12,000 labels): from 5,000 to 350,000 instances in steps of 1,000 instances.

SLIDE 24

Why two-step ridge regression?

Zero-shot learning, transfer learning, multi-task learning... in one line of code

SLIDE 25

Why two-step ridge regression?

Zero-shot learning, transfer learning, multi-task learning... in one line of code
Theoretically well founded

SLIDE 26

Why two-step ridge regression?

Zero-shot learning, transfer learning, multi-task learning... in one line of code
Theoretically well founded
Allows for nifty computational tricks:
  ‘free’ tuning for the hyperparameters
  ‘free’ LOOCV for all four settings!
  closed-form solution for updating with mini-batches

SLIDE 27

A two-step method to incorporate task features for large output spaces

Michiel Stock (1), Tapio Pahikkala (2), Antti Airola (2), Bernard De Baets (1) & Willem Waegeman (1)

(1) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University

(2) Department of Computer Science, University of Turku

NIPS: extreme classification workshop December 12, 2015