SLIDE 1

The Multilabel Naive Credal Classifier

Alessandro Antonucci and Giorgio Corani

{alessandro,giorgio}@idsia.ch

Istituto “Dalle Molle” di Studi sull’Intelligenza Artificiale - Lugano (Switzerland) http://ipg.idsia.ch

ISIPTA ’15, Pescara, July 21st, 2015

SLIDE 2

IPG ⊂ IDSIA ⊂ USI ∪ SUPSI ⊂ LUGANO

SLIDE 5

IPG ⊂ IDSIA ⊂ USI ∪ SUPSI ⊂ LUGANO

University of Applied Sciences and Arts of Southern Switzerland (supsi.ch)
Università della Svizzera Italiana (usi.ch)

SLIDE 7

Chronology (Acknowledgements)

ISIPTA '01: credal version of the naive Bayes classifier, by Marco (Zaffalon)
ISIPTA '11: MAP algorithms for imprecise HMMs, by Jasper (De Bock) & Gert (de Cooman)
IJCAI-13: Bayesian nets as multilabel classifiers, by Denis (Mauá) & us
NIPS 14: MAP in generic credal nets, by Jasper & Cassio (de Campos) & me
ISIPTA '15: a credal classifier based on MAP tasks in credal nets, by us

SLIDE 12

Single- vs. multi-label classification

A (fictitious) classifier to detect eye color
Possible classes C := {brown, green, blue}
Heterochromia iridum: two (or more) colors
Possible values in 2^C: a multilabel task!
Trivial approaches:
- standard classification over the power set: exponential in the number of labels!
- each label as a separate Boolean variable, with a (standard) classifier for each label: ignores relations among classes!
Graphical models (GMs) to depict relations among class labels (and features)
Classification as (standard) inference in GMs
SINGLE-LABEL: C = green | MULTI-LABEL: C = {blue, brown}

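The power-set blow-up mentioned above can be made concrete with a short sketch (illustrative only; the label names come from the slide's eye-color example):

```python
from itertools import chain, combinations

def powerset(labels):
    """All label subsets: the class space of the power-set approach."""
    return [set(s) for s in chain.from_iterable(
        combinations(labels, r) for r in range(len(labels) + 1))]

colors = ["brown", "green", "blue"]
print(len(powerset(colors)))  # 2^3 = 8 candidate "classes"
```

With n labels the power-set approach needs 2^n classes, while training one Boolean classifier per label stays linear in n but ignores the dependencies between labels; this is the gap the graphical-model formulation targets.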
SLIDE 13

Credal classifiers are not (yet) multilabel classifiers

Class variable C and (discrete) features F, a test instance f̃

Standard (single-label) classifiers are maps F → C:
learn P(C, F) from data and return c* := arg max_{c ∈ C} P(c, f̃)

Multi-label classifiers: F → 2^C
C = (C1, ..., Cn) as an array of Boolean vars, one for each label;
learn P(C, F) and solve the MAP task c* := arg max_{c ∈ {0,1}^n} P(c, f̃)

Credal (single-label) classifiers: F → 2^C
learn a credal set K(C, F) and return all c'' ∈ C s.t.
∄c' : P(c', f̃) > P(c'', f̃) ∀P(C, F) ∈ K(C, F)

Multilabel credal classifier (MCC): F → 2^(2^C)
learn a credal set K(C, F) and return all sequences c'' s.t.
∄c' : P(c', f̃) > P(c'', f̃) ∀P(C, F) ∈ K(C, F)

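The credal decision rule on this slide (return every class that no other class dominates under all P ∈ K(C, F)) can be sketched for a credal set represented by a finite list of joint distributions, e.g. its extreme points. `non_dominated` and its dict-based representation are illustrative assumptions, not the authors' implementation:

```python
def non_dominated(classes, joints, f):
    """Keep c'' unless some c' dominates it, i.e. P(c', f) > P(c'', f)
    for EVERY distribution P in `joints` (a finite stand-in for the
    credal set K(C, F)); each P maps a pair (c, f) to a probability."""
    out = []
    for c2 in classes:
        dominated = any(
            all(P[(c1, f)] > P[(c2, f)] for P in joints)
            for c1 in classes if c1 != c2)
        if not dominated:
            out.append(c2)
    return out

# two joint distributions standing in for K(C, F):
joints = [{('a', 'f'): 0.5, ('b', 'f'): 0.2, ('c', 'f'): 0.3},
          {('a', 'f'): 0.4, ('b', 'f'): 0.3, ('c', 'f'): 0.3}]
print(non_dominated(['a', 'b', 'c'], joints, 'f'))  # ['a']
```

With a single distribution in `joints` the rule collapses to the usual arg-max, which is the sense in which credal classifiers generalize precise ones.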
SLIDE 14

Compact Representation of the Output

The output of a MCC might be exponentially large.
Jasper & Gert's idea to fix this with imprecise HMMs (Viterbi): decide, for each variable and state, whether or not there is at least one optimal sequence such that the variable is in that state.
With MCCs, for each class label, we can decide whether:
- the label is active for all the optimal sequences
- the label is inactive for all the optimal sequences
- there are optimal sequences with the label active, and others with the label inactive

Optimization task:
min_{c'' : c''_l = 0/1} max_{c'} inf_{P(C,F) ∈ K(C,F)} P(c', f̃) / P(c'', f̃) ≤ 1

O(2^treewidth) for separately specified credal nets (e.g., local IDM); more complex with non-separate specifications

SLIDE 15

[Graph: C with children F1, F2, ..., Fm] NBC

SLIDE 16

[Graph: C with children F1, F2, ..., Fm] NCC = NBC + IDM

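The IDM quantification that turns the NBC into the NCC replaces each count-based probability estimate with an interval. A minimal sketch, assuming the standard IDM bounds with equivalent sample size s (the function name is illustrative):

```python
def idm_interval(count, total, s=1.0):
    """Imprecise Dirichlet Model interval for a probability estimated
    from `count` occurrences in `total` observations:
    [n / (N + s), (n + s) / (N + s)], s = equivalent sample size."""
    return count / (total + s), (count + s) / (total + s)

# e.g. class c observed in 30 of 100 training instances, s = 1:
lo, hi = idm_interval(30, 100)  # (0.297..., 0.306...)
```

The interval always contains the relative frequency n/N, and its width s/(N + s) shrinks as data accumulates, which is how the credal classifier's indeterminacy vanishes on well-supported classes.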
SLIDE 17

[Graph: classes C1, C2, ..., Cn and features F1, F2, ..., Fm]

Multi-label? Naive topology over the classes.
Structural learning to bound the number of parents of the features and to select the super-class C1.

SLIDE 18

MNBC

Features replicated: tree topology

[Graph: classes C1, C2, ..., Cn; each class Cj has its own copy F^j_1, F^j_2, ..., F^j_m of the features]

SLIDE 19

MNBC

Features replicated: tree topology

[Graph: classes C1, C2, ..., Cn with replicated features F^j_1, F^j_2, ..., F^j_m]

+ IDM = MNCC

SLIDE 20

During the poster session I can:
- explain some details about the learning of the structure
- explain the feature replication trick (this makes inference simpler)
- explain the non-separate IDM-based quantification of the model
- explain the details of the (convex) optimization
- ...

SLIDE 21

MNCC: the algorithm

Input: test instance f (+ dataset D)
Output initialized: for each class C1, C2, ..., Cn, neither "active" nor "inactive" is marked

for l = 1, ..., n do
  for cl = 0, 1 do
    if min_{c'' : c''_l = cl} max_{c'} inf_t P_t(c', f) / P_t(c'', f) ≤ 1 then
      Output(l, cl) = 1
    end if
  end for
end for

A linear representation of an (exponential) number of maximal sequences.

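The loop above can be sketched in Python, treating the inner min/max/inf optimization as a black-box oracle `ratio_bound` (a hypothetical helper, since the slide does not fix an interface for it):

```python
def mncc_labels(n_labels, ratio_bound, f):
    """Per-label summary of the (possibly exponential) set of maximal
    label sequences.  `ratio_bound(l, cl, f)` stands for
        min_{c'': c''_l = cl}  max_{c'}  inf_t  P_t(c', f) / P_t(c'', f);
    a value <= 1 means some maximal sequence has C_l = cl."""
    output = {l: set() for l in range(n_labels)}
    for l in range(n_labels):
        for cl in (0, 1):
            if ratio_bound(l, cl, f) <= 1:
                output[l].add(cl)
    return output  # {1} = active, {0} = inactive, {0, 1} = indeterminate

# illustrative dummy oracle: label 0 forced active, label 1 undecided
demo = mncc_labels(2, lambda l, cl, f: 0.5 if (l, cl) != (0, 0) else 2.0, None)
# demo == {0: {1}, 1: {0, 1}}
```

The loop performs only 2n oracle calls, which is the "linear representation" the slide refers to: the per-label marks summarize every maximal sequence without enumerating them.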
SLIDE 22

Testing MNCC

Preliminary tests on real-world datasets:

Data set     Classes  Features   Instances
Emotions        6      44/72        593
Scene           6     224/294      2407
E-mobility     10      14/18       4226
Slashdot       22     496/1079     3782

Performance described by:
- % of instances s.t. all maximal seqs are in the same state
- accuracy of the precise model when MNCC is determinate
- accuracy of the precise model when MNCC is indeterminate

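The three performance figures listed above can be computed per label from the credal outputs and the precise model's guesses; a sketch with illustrative names and data layout:

```python
def label_metrics(preds, truths):
    """Slide-22 metrics for one class label.
    `preds`: per instance, a pair (credal_states, precise_guess) where
    credal_states is the set of states kept by MNCC ({0}, {1} or {0, 1})
    and precise_guess is the 0/1 prediction of the precise model."""
    det = [(g, t) for (s, g), t in zip(preds, truths) if len(s) == 1]
    ind = [(g, t) for (s, g), t in zip(preds, truths) if len(s) == 2]
    acc = lambda pairs: sum(g == t for g, t in pairs) / len(pairs) if pairs else None
    return {"determinacy": len(det) / len(truths),
            "acc_determinate": acc(det),
            "acc_indeterminate": acc(ind)}
```

The usual sanity check for a credal classifier is that `acc_determinate` exceeds `acc_indeterminate`: the model suspends judgment exactly on the instances where the precise model is unreliable.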
SLIDE 23

[Plot: per-class results on Emotions, classes C1-C6]

SLIDE 24

[Plot: per-class results on Scene, classes C1-C6]

SLIDE 25

[Plot: per-class results on E-mobility, classes C1-C10]

SLIDE 26

[Plot: per-class results on Slashdot, classes C1-C22]

SLIDE 27

Conclusions, Outlooks and Acks

Among the first tools for robust multilabel classification.
Still lots of things to do:
- extension to the multidimensional/hierarchical case
- extension to continuous variables (features)
- extension to continuous classes (multi-target interval-valued regression)
- more complex topologies (ETAN, de Campos, 2014)
- variational approach to feature replication
- not only 0/1 losses (imprecise losses?)