Multi-Label Learning with Highly Incomplete Data via Collaborative Embedding
Yufei Han1, Guolei Sun2, Yun Shen1, Xiangliang Zhang2 1. Symantec Research Labs 2. King Abdullah University of Science and Technology
Outline
– Introduction
[Figure: multi-label examples, f(x) = apple, f(x) = banana, f(x) = orange]
Keywords: multi-label classification, collaborative embedding, incomplete features
Existing approaches:
– Binary relevance: construct a classifier for each label independently; does not consider label dependency.
– Label powerset: convert into a multi-class classification over label subsets. With labels A and B, the classes are {}, {A}, {B}, {A,B}; in general 2^n classes, e.g., 40 labels give 2^40 = 1,099,511,627,776 classes.
– Classifier chains: learn L binary classifiers by chaining the training problems; each classifier only captures the dependency of y_i on y_1, …, y_{i-1}.
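The label-powerset blow-up quoted above is easy to verify directly; a minimal sketch (the function name is my own, not from the slides):

```python
# Label powerset converts multi-label learning into one multi-class problem:
# every subset of the label set becomes its own class, so n labels yield 2**n classes.
def powerset_classes(n_labels: int) -> int:
    """Number of classes after label-powerset conversion."""
    return 2 ** n_labels

print(powerset_classes(2))    # labels {A, B} -> {}, {A}, {B}, {A, B} -> 4
print(powerset_classes(40))   # 1099511627776: infeasible as a multi-class problem
```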
[Figure: feature and label matrices indexed by machine days; incomplete signature counts as features, incomplete labels; "?" marks missing entries]
Training: train a prediction model for a given product.
[Diagram: Feature Matrix → Classification Model → Label Matrix]
Challenges: corrupted / incomplete data (feature matrix); weak supervision constraint (label matrix).
Methods            | Feature Values | Labels                | Transductive/Inductive
BiasMC (ICML'15)   | Complete       | Positive (Weak)       | Both
WELL (AAAI'10)     | Complete       | Positive (Weak)       | Transductive
LEML (ICML'14)     | Complete       | Positive and Negative | Inductive
CoEmbed (AAAI'17)  | Complete       | Positive and Negative | Transductive
MC-1 (NIPS'10)     | Missing        | Positive and Negative | Transductive
DirtyIMC (NIPS'15) | Noisy          | Positive and Negative | Both
Our study          | Missing        | Positive (Weak)       | Both
Collaborative embedding model:
– Incomplete feature matrix X (signature counts): completed by low-rank LSE-based matrix factorization, X ≈ U V^T, which defines the shared embedding space.
– Partially observed label matrix Y (security event class): cost-sensitive logistic matrix factorization with a low-rank classifier W H^T, trained with a logit loss plus regularizer R(W).
– Shared embedding links the two: Ŷ = φ((U V^T)(W H^T)), where φ is the logistic link, so labels are predicted from the completed features.
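A minimal numpy sketch of the two coupled factorizations; the matrix names follow the slide, but the dimensions and random values are illustrative assumptions, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M, k = 6, 4, 3, 2            # samples, features, labels, embedding rank

# Feature side: complete X via a low-rank factorization X ~ U V^T.
U = rng.normal(size=(N, k))        # shared sample embedding
V = rng.normal(size=(D, k))
X_hat = U @ V.T                    # completed feature matrix (N x D)

# Label side: a low-rank classifier W H^T (D x M) applied to the
# completed features, squashed by the logistic link phi.
W = rng.normal(size=(D, k))
H = rng.normal(size=(M, k))
Y_hat = 1.0 / (1.0 + np.exp(-(X_hat @ (W @ H.T))))   # phi(U V^T W H^T), N x M

print(Y_hat.shape)                 # (6, 3)
```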
Cost-sensitive logistic matrix factorization objective:

min_{W,H}  Σ_{i,j} c_{i,j} · log(1 + exp((1 − 2Y_{i,j}) · X_{i,:}(W H^T)_{:,j})) + λ(‖W‖² + ‖H‖²)

where the cost c_{i,j} distinguishes observed and positively labeled entries from unobserved, thus unlabeled, entries.
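The objective can be evaluated directly; a sketch in which the cost weights c_pos / c_unobs and the regularization strength lam are illustrative placeholders, not values from the paper:

```python
import numpy as np

def cost_sensitive_logistic_loss(X, Y, W, H, pos_mask,
                                 c_pos=5.0, c_unobs=1.0, lam=0.1):
    """Cost-weighted logistic loss over all label entries, plus L2 regularization.

    The per-entry margin follows the slide: (1 - 2*Y_ij) * (X (W H^T))_ij.
    pos_mask marks observed, positively labeled entries; every other entry is
    treated as unobserved/unlabeled and weighted differently.
    """
    margins = (1.0 - 2.0 * Y) * (X @ (W @ H.T))   # N x M margins
    losses = np.logaddexp(0.0, margins)           # log(1 + exp(m)), numerically stable
    costs = np.where(pos_mask, c_pos, c_unobs)
    return (costs * losses).sum() + lam * ((W**2).sum() + (H**2).sum())

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))
Y = (rng.random((5, 3)) < 0.3).astype(float)
loss = cost_sensitive_logistic_loss(X, Y, rng.normal(size=(4, 2)),
                                    rng.normal(size=(3, 2)), Y.astype(bool))
print(loss > 0)   # True: a sum of positive loss terms plus a positive regularizer
```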
Key properties:
– Feature completion
– Label completion
– Tolerance to residual error
– Functional feature extraction
– M, D: the number of labels and the dimensionality of feature vectors
– N: the number of training samples
– t: the upper bound of the spectral norm of H
– : the maximum L2-norm of the row vectors in X
Flexible for both transductive and inductive settings
Non-linear case: approximate a kernel feature map with random features (Ali Rahimi and Ben Recht, Random Features for Large-Scale Kernel Machines, NIPS 2007).
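A minimal sketch of random Fourier features for the RBF kernel, following Rahimi & Recht; the function name and parameter defaults are my own for illustration:

```python
import numpy as np

def random_fourier_features(X, n_components=500, gamma=1.0, seed=0):
    """Map X (n x d) to z(X) so that z(x) @ z(y) ~= exp(-gamma * ||x - y||^2).

    For the RBF kernel, the random frequencies are drawn from N(0, 2*gamma*I).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Wf = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ Wf + b)

# A linear model trained on z(X) instead of X then yields a non-linear
# decision function in the original feature space.
```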
Baselines:
– BiasMC (transductive) and BiasMC-I (inductive), via PU learning
– LEML (cost-sensitive binomial loss), needs + and − labels
– LEML (least-squares loss)
– WELL, weak labels
– CoEmbed, needs + and − labels
– MC-1, needs + and − labels
– DirtyIMC, needs + and − labels
Experiments:
– Feature settings: with missing or noisy feature values; with complete feature values
– Datasets: real-world IoT device event detection data; public benchmark data
– Evaluation: transductive mode test; inductive mode test