

  1. Multi-Label Learning with Highly Incomplete Data via Collaborative Embedding
Yufei Han¹, Guolei Sun², Yun Shen¹, Xiangliang Zhang²
1. Symantec Research Labs  2. King Abdullah University of Science and Technology

  2. Outline • Introduction and Problem Definition • Our Methods • Experimental Results

  3. Multi-Label Classification in Cyber Security
• Multi-class classification assigns exactly one class: f(x) = c1 or c2 or c3, e.g. f(x) = apple, f(x) = banana, or f(x) = orange
• Multi-label classification assigns a set of classes: f(x) = {c1 and c2 and c3}

  4. Existing popular solutions
• Binary relevance
– Constructs a classifier for each label independently
– Does not consider label dependency
• Label power-set
– Converts the task into multi-class classification over label subsets
– Labels A, B give classes {}, {A}, {B}, {A,B}
– 2^n classes: 40 labels already yield 2^40 = 1,099,511,627,776 classes
• Classifier chains
– Learn L binary classifiers by formatting the j-th training problem as (x_i, y_1, ..., y_{j−1}) → y_j ∈ {0, 1}
– Only capture the dependency of y_j on y_1, ..., y_{j−1}
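The classifier-chain strategy above can be sketched in a few lines. This is a toy illustration, not the paper's method: plain logistic regression trained by gradient descent stands in for the per-label base classifiers, and the helper names (`fit_logreg`, `fit_classifier_chain`, `predict_chain`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.5, epochs=300):
    """Plain logistic regression trained by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        g = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * g
    return w

def fit_classifier_chain(X, Y):
    """Learn one binary classifier per label; classifier j also sees y_1..y_{j-1}."""
    chain, Xa = [], X
    for j in range(Y.shape[1]):
        chain.append(fit_logreg(Xa, Y[:, j]))
        Xa = np.hstack([Xa, Y[:, [j]]])      # append true label j for training
    return chain

def predict_chain(chain, X):
    """At test time, feed each classifier's prediction to the next one."""
    Xa, preds = X, []
    for w in chain:
        p = (sigmoid(Xa @ w) > 0.5).astype(float)
        preds.append(p)
        Xa = np.hstack([Xa, p[:, None]])
    return np.stack(preds, axis=1)
```

Training feeds the *true* earlier labels to classifier j, while prediction feeds the *predicted* ones, which is exactly why only the dependency of y_j on its predecessors is captured.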

  5. Use Case of Multi-Label Classification
• Train a prediction model for a given product
• Features: incomplete signature counts, one row per machine, one column per day
• Labels: incomplete, with many unknown ('?') entries
(Slide figure: a machines × days signature-count matrix and a label matrix, both containing missing '?' entries.)

  6. Our Problem: A Tale of Two Cities
• Multi-label learning with incomplete feature values and weak labels
– Training data X ∈ R^{N×D} (N instances with D features) is partially observed: Ω_{i,j} = 1 if X_{i,j} is observed, otherwise Ω_{i,j} = 0
– Label assignment Y ∈ {0, 1}^{N×M} (M is the label dimension) is a positive-unlabeled matrix:
• Y_{i,j} = 1 indicates that instance X_{i,:} is positively labeled with the j-th label
• Y_{i,j} = 0 indicates the entry is unobserved
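The notation above can be made concrete with a toy data set. The observation rates below (70% of features observed, 50% of true positives revealed) are illustrative choices, not numbers from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 6, 4, 3

# Feature matrix with a binary observation mask Omega.
X_full = rng.standard_normal((N, D))
Omega = (rng.random((N, D)) < 0.7).astype(int)   # 1 where X_ij is observed
X = np.where(Omega == 1, X_full, 0.0)            # unobserved entries are unknown

# Positive-unlabeled label matrix: only some true positives are revealed.
Y_true = (rng.random((N, M)) < 0.4).astype(int)
reveal = rng.random((N, M)) < 0.5
Y = Y_true * reveal      # Y_ij = 0 means "unlabeled", NOT "negative"
```

The key point encoded here: a zero in Y is ambiguous (unobserved), whereas a zero in Omega marks a genuinely missing feature value.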

  7. Our Problem: A Tale of Two Cities
• Feature matrix: corrupted / incomplete data
– Limited coverage of sensors
– Privacy control
– Failure of sensors
– Partial responses
• Label matrix: weak supervision
– Semi-supervised information
– Positive-unlabeled / partially observed supervision
– Weak pairwise / triple-wise constraints
• Both feed into the classification model

  8. Existing Approaches
Method              Feature Values   Labels                  Transductive/Inductive
BiasMC (ICML'15)    Complete         Positive (weak)         Both
WELL (AAAI'10)      Complete         Positive (weak)         Transductive
LEML (ICML'14)      Complete         Positive and negative   Inductive
CoEmbed (AAAI'17)   Complete         Positive and negative   Transductive
MC-1 (NIPS'10)      Missing          Positive and negative   Transductive
DirtyIMC (NIPS'15)  Noisy            Positive and negative   Both
Our study           Missing          Positive (weak)         Both

  9. Outline • Introduction and Problem Definition • Our Methods • Experimental Results

  10. Collaborative Embedding: A Transfer Learning Approach
• Incomplete feature matrix (signature counts of security events): X ≈ UV^T, completed by low-rank LSE-based matrix factorization
• Partially observed label matrix (security event classes): Y ≈ φ(WH^T), reconstructed by cost-sensitive logistic matrix factorization with regularizer R(W)
• The two factorizations are coupled through a shared embedding space


  12. Feature Matrix Completion
• Low-rank completion of the partially observed feature matrix:
U*, V* = argmin_{U,V} ||Ω_X ∘ (X − UV^T)||²
– U: projected features of the data instances
– V: spanning basis defining the projection subspace
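The masked low-rank objective above can be minimized with plain gradient descent on U and V. This is a minimal sketch of the objective, not the paper's solver; the step size, rank, and regularization weight are illustrative:

```python
import numpy as np

def complete_lowrank(X, Omega, k=5, lr=0.01, lam=0.1, epochs=2000, seed=0):
    """Gradient descent on ||Omega * (X - U V^T)||_F^2 + lam(||U||^2 + ||V||^2).

    Omega is a 0/1 mask; unobserved entries of X contribute nothing to the loss.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    U = 0.1 * rng.standard_normal((N, k))
    V = 0.1 * rng.standard_normal((D, k))
    for _ in range(epochs):
        R = Omega * (U @ V.T - X)          # residual on observed entries only
        U, V = U - lr * (R @ V + lam * U), V - lr * (R.T @ U + lam * V)
    return U, V
```

Because the residual is masked by Omega before the gradient is formed, missing entries never pull the factors; the reconstruction U @ V.T then fills them in.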


  14. Label Matrix Reconstruction
• Cost-sensitive logistic matrix factorization on the positive-unlabeled class-assignment matrix:
W*, H* = argmin_{W,H} Σ_{i,j} Γ_{i,j} log(1 + exp((1 − 2Y_{i,j}) X_{i,:}(WH^T)_{:,j})) + λ(||W||² + ||H||²)
– Γ_{i,j} = α for observed and positively labeled entries (Y_{i,j} = 1)
– Γ_{i,j} = 1 − α for unobserved, thus unlabeled entries (Y_{i,j} = 0)
• Reconstructed labels: Ŷ = I(X(WH^T))
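The cost-sensitive PU weighting can be sketched on a simplified variant of the objective that factorizes Y directly as σ(WH^T), dropping the feature term for brevity. This is a toy sketch under that simplification, not the paper's implementation; α, the rank, and the step size are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pu_logistic_mf(Y, alpha=0.9, k=5, lr=0.1, lam=0.01, epochs=1000, seed=0):
    """Cost-sensitive logistic MF on a positive-unlabeled label matrix Y.

    Observed positives get weight alpha; unlabeled entries get 1 - alpha,
    so known positives dominate while zeros are treated as weak evidence.
    """
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    W = 0.1 * rng.standard_normal((N, k))
    H = 0.1 * rng.standard_normal((M, k))
    Gamma = np.where(Y == 1, alpha, 1 - alpha)   # cost-sensitive weights
    S = 1 - 2 * Y                                # -1 on positives, +1 on unlabeled
    for _ in range(epochs):
        G = Gamma * S * sigmoid(S * (W @ H.T))   # d loss / d (W H^T)
        W, H = W - lr * (G @ H + lam * W), H - lr * (G.T @ W + lam * H)
    return sigmoid(W @ H.T)                      # score for each (instance, label)
```

Because unlabeled entries are only weakly pushed toward zero, a hidden positive that shares low-rank structure with observed positives can still receive a high score.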

  15. ColEmbed: Collaborative Embedding
• Collaborative embedding as a solution to learning with incomplete features and weak labels:
– Feature completion
– Label completion
– Functional feature extraction
– Tolerance to residual error

  16. Upper Bound of Reconstruction Error
• Provable reconstruction of the missing label entries
– M, D: the number of labels and the dimensionality of the feature vectors
– N: the number of training samples
– t: the upper bound of the spectral norm of H
– the maximum L2-norm of the row vectors in X is also bounded
• The label reconstruction error is of the order of 1/(NM(1 − ))

  17. ColEmbed-L
• Linear collaborative embedding: f(X̂) = X̂S^T
• Flexible for both transductive and inductive settings

  18. ColEmbed-NL
• Non-linear embedding: a linear combination of random feature expansions
• Ali Rahimi and Ben Recht, Random Features for Large-Scale Kernel Machines, NIPS 2007
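The random feature expansion cited above (Rahimi & Recht, 2007) maps inputs to a random cosine basis whose inner products approximate an RBF kernel, so a linear model on the expanded features behaves like a kernel method. A minimal sketch, with the feature count D and bandwidth sigma as illustrative parameters:

```python
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, seed=0):
    """Map rows of X to D random features whose inner products approximate
    the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], D)) / sigma   # freqs ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2.0 * np.pi, D)               # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

The approximation error shrinks roughly as 1/sqrt(D), so the expansion lets a non-linear embedding be trained with the same machinery as the linear ColEmbed-L model.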


  20. Training Process
• Stochastic gradient descent: scales to large matrix factorization problems
• The non-linear case applies the same updates on the random-feature expansion of the inputs
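The scalability claim rests on entry-wise stochastic updates: each step touches one observed entry and two factor rows, so the cost per step is independent of the matrix size. A toy sketch of one SGD epoch for the masked least-squares factorization (step size and regularization are illustrative, and `sgd_mf_epoch` is a hypothetical helper name):

```python
import numpy as np

def sgd_mf_epoch(X, observed, U, V, lr=0.02, lam=0.1, seed=0):
    """One epoch of entry-wise SGD for min ||Omega * (X - U V^T)||^2.

    `observed` lists the (i, j) pairs with Omega_ij = 1; each update reads
    and writes only row U[i] and row V[j].
    """
    rng = np.random.default_rng(seed)
    for t in rng.permutation(len(observed)):
        i, j = observed[t]
        e = X[i, j] - U[i] @ V[j]                       # per-entry residual
        U[i], V[j] = (U[i] + lr * (e * V[j] - lam * U[i]),
                      V[j] + lr * (e * U[i] - lam * V[j]))
    return U, V
```

Shuffling the observed entries each epoch is the standard trick to avoid systematic bias from a fixed visiting order.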

  21. Outline • Introduction and Problem Definition • Our Methods • Experimental Results

  22. Empirical Study
• The empirical study aims to answer the following questions:
– Is it really helpful to reconstruct features and labels simultaneously?
– Do transductive and inductive classification present consistently high precision?
– Does the proposed method classify better than the state-of-the-art approaches?
– Does the proposed method scale well?

  23. Methods to Compare
• Baselines requiring complete feature values:
– BiasMC (transductive) and BiasMC-I (inductive), based on PU learning
– LEML (cost-sensitive binomial loss), needs + and − labels
– LEML (least-squares loss)
– WELL, weak labels
– CoEmbed, needs + and − labels
• Baselines handling missing or noisy feature values:
– MC-1, needs + and − labels
– DirtyIMC, needs + and − labels
• For methods requiring complete features, the incomplete feature matrix is first completed with the convex low-rank matrix completion approach, denoted MC-Convex

  24. Evaluation Data Sets
• Public benchmark data sets
• Real-world IoT device event detection data

  25. Feature Reconstruction
• Lower errors in estimating the missing feature values, compared with the baseline method

  26. Transductive Classification Accuracy • Higher classification accuracy than baseline methods

  27. Inductive Classification Accuracy • Higher classification accuracy than baseline methods

  28. On Real-world Security Data
• Consistently better performance than baseline methods on classifying real-world security data, in both transductive and inductive test modes

  29. Efficiency Evaluation
• Run time (in seconds) grows linearly with the number of instances

  30. Takeaway
• Collaboratively reconstructing missing feature values and learning missing labels benefits both tasks.
• The proposed method applies to both transductive and inductive classification settings.
• The proposed method outperforms the state-of-the-art approaches.

  31. Future Work • Learning with incomplete data streams • Deep Neural Nets as a more powerful functional mapping between features and labels • Structured feature / label missing patterns • Further extension to multi-task learning
