Semi-Supervised Learning
Jia-Bin Huang, Virginia Tech, Spring 2019, ECE-5424G / CS-5824


  1. Semi-Supervised Learning Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative • HW 4 due April 10

  3. Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization

  4. Problem motivation

  | Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action) |
  | Love at last         | 5         | 5       | 0         | 0        | 0.9             | 0              |
  | Romance forever      | 5         | ?       | ?         | 0        | 1.0             | 0.01           |
  | Cute puppies of love | ?         | 4       | 0         | ?        | 0.99            | 0              |
  | Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1             | 1.0            |
  | Swords vs. karate    | 0         | 0       | 5         | ?        | 0               | 0.9            |

  5. Problem motivation

  | Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action) |
  | Love at last         | 5         | 5       | 0         | 0        | ?               | ?              |
  | Romance forever      | 5         | ?       | ?         | 0        | ?               | ?              |
  | Cute puppies of love | ?         | 4       | 0         | ?        | ?               | ?              |
  | Nonstop car chases   | 0         | 0       | 5         | 4        | ?               | ?              |
  | Swords vs. karate    | 0         | 0       | 5         | ?        | ?               | ?              |

  Given the user parameter vectors
  $\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}, \; \theta^{(2)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}, \; \theta^{(3)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}, \; \theta^{(4)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}$,
  infer the movie features $x^{(1)} = \begin{bmatrix} ? \\ ? \\ ? \end{bmatrix}$ so that, e.g., $(\theta^{(1)})^\top x^{(1)} \approx 5$.

  6. Optimization algorithm
  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(i)}$:
  $\min_{x^{(i)}} \; \frac{1}{2} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$:
  $\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

  7. Collaborative filtering
  • Given $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$ (and movie ratings), can estimate $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$
  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, can estimate $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$

  8. Collaborative filtering optimization objective
  • Given $x^{(1)}, \cdots, x^{(n_m)}$, estimate $\theta^{(1)}, \cdots, \theta^{(n_u)}$:
  $\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \; \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$
  • Given $\theta^{(1)}, \cdots, \theta^{(n_u)}$, estimate $x^{(1)}, \cdots, x^{(n_m)}$:
  $\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$

  9. Collaborative filtering optimization objective
  • Given $x^{(1)}, \cdots, x^{(n_m)}$, estimate $\theta^{(1)}, \cdots, \theta^{(n_u)}$:
  $\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \; \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$
  • Given $\theta^{(1)}, \cdots, \theta^{(n_u)}$, estimate $x^{(1)}, \cdots, x^{(n_m)}$:
  $\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
  • Minimize over $x^{(1)}, \cdots, x^{(n_m)}$ and $\theta^{(1)}, \cdots, \theta^{(n_u)}$ simultaneously:
  $J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

  10. Collaborative filtering optimization objective
  $J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$
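  To make the objective concrete, here is a minimal NumPy sketch of the joint cost $J$ (the array names X, Theta, Y, R and the function name cofi_cost are illustrative, not from the slides):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint collaborative-filtering cost J.

    X     : (n_m, n) matrix whose rows are movie feature vectors x^(i)
    Theta : (n_u, n) matrix whose rows are user parameter vectors theta^(j)
    Y     : (n_m, n_u) ratings matrix
    R     : (n_m, n_u) indicator matrix, R[i, j] = 1 iff user j rated movie i
    lam   : regularization strength lambda
    """
    E = (X @ Theta.T - Y) * R  # prediction errors, kept only on rated entries
    return (0.5 * np.sum(E ** 2)
            + 0.5 * lam * np.sum(X ** 2)
            + 0.5 * lam * np.sum(Theta ** 2))
```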

  11. Collaborative filtering algorithm
  • Initialize $x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}$ to small random values
  • Minimize $J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm). For every $i = 1, \cdots, n_m$, $j = 1, \cdots, n_u$, $k = 1, \cdots, n$:
  $x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$
  $\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$
  • For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^\top x$
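  A vectorized sketch of this loop, reusing cofi_cost's conventions; the learning rate alpha, iteration count, and initialization scale are illustrative choices, not values from the slides:

```python
import numpy as np

def cofi_gradient_descent(Y, R, n=2, lam=0.1, alpha=0.005, iters=2000, seed=0):
    """Minimize J by simultaneous gradient descent on X and Theta."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    X = 0.1 * rng.standard_normal((n_m, n))        # small random initialization
    Theta = 0.1 * rng.standard_normal((n_u, n))
    for _ in range(iters):
        E = (X @ Theta.T - Y) * R                  # errors on rated entries only
        X_grad = E @ Theta + lam * X               # dJ/dX
        Theta_grad = E.T @ X + lam * Theta         # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta
```

  The predicted star rating for user $j$ on movie $i$ is then `X[i] @ Theta[j]`, i.e., $\theta^\top x$.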

  12. Collaborative filtering

  | Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
  | Love at last         | 5         | 5       | 0         | 0        |
  | Romance forever      | 5         | ?       | ?         | 0        |
  | Cute puppies of love | ?         | 4       | 0         | ?        |
  | Nonstop car chases   | 0         | 0       | 5         | 4        |
  | Swords vs. karate    | 0         | 0       | 5         | ?        |

  13. Collaborative filtering
  • Predicted ratings: $Y = X \Theta^\top$ (low-rank matrix factorization), where
  $X = \begin{bmatrix} (x^{(1)})^\top \\ (x^{(2)})^\top \\ \vdots \\ (x^{(n_m)})^\top \end{bmatrix}, \quad \Theta = \begin{bmatrix} (\theta^{(1)})^\top \\ (\theta^{(2)})^\top \\ \vdots \\ (\theta^{(n_u)})^\top \end{bmatrix}$
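  In code the matrix form is one line; a tiny sketch reusing the X and Theta returned by the gradient-descent sketch above:

```python
# Rows of X are movie features x^(i); rows of Theta are user parameters theta^(j).
Y_pred = X @ Theta.T        # Y_pred[i, j] = (theta^(j))^T x^(i); rank at most n
print(Y_pred[0, 0])         # e.g., predicted stars for movie 1 from user 1
```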

  14. Finding related movies/products
  • For each product $i$, we learn a feature vector $x^{(i)} \in \mathbb{R}^n$ ($x_1$: romance, $x_2$: action, $x_3$: comedy, ...)
  • How do we find movies $j$ related to movie $i$? Small $\| x^{(i)} - x^{(j)} \|$ means movies $i$ and $j$ are "similar" (see the sketch below)
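  A minimal NumPy sketch of this nearest-neighbor search (the function name related_movies is illustrative):

```python
import numpy as np

def related_movies(X, i, k=5):
    """Indices of the k movies whose feature vectors are closest to movie i."""
    d = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every movie j
    d[i] = np.inf                          # exclude movie i itself
    return np.argsort(d)[:k]
```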

  15. Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization

  16. Users who have not rated any movies

  | Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5) |
  | Love at last         | 5         | 5       | 0         | 0        | ?       |
  | Romance forever      | 5         | ?       | ?         | 0        | ?       |
  | Cute puppies of love | ?         | 4       | 0         | ?        | ?       |
  | Nonstop car chases   | 0         | 0       | 5         | 4        | ?       |
  | Swords vs. karate    | 0         | 0       | 5         | ?        | ?       |

  $J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

  Eve has rated nothing, so only the regularization term depends on $\theta^{(5)}$, and minimizing $J$ gives $\theta^{(5)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

  17. Users who have not rated any movies

  | Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5) |
  | Love at last         | 5         | 5       | 0         | 0        | 0       |
  | Romance forever      | 5         | ?       | ?         | 0        | 0       |
  | Cute puppies of love | ?         | 4       | 0         | ?        | 0       |
  | Nonstop car chases   | 0         | 0       | 5         | 4        | 0       |
  | Swords vs. karate    | 0         | 0       | 5         | ?        | 0       |

  With $\theta^{(5)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, every rating predicted for Eve is $(\theta^{(5)})^\top x^{(i)} = 0$, which is not informative.

  18. Mean normalization
  • Subtract each movie's mean rating $\mu_i$ from its ratings, then learn $\theta^{(j)}, x^{(i)}$ on the normalized ratings
  • For user $j$, on movie $i$, predict: $(\theta^{(j)})^\top x^{(i)} + \mu_i$
  • User 5 (Eve): $\theta^{(5)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, so the prediction is $(\theta^{(5)})^\top x^{(i)} + \mu_i = \mu_i$, the movie's average rating
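  A short NumPy sketch of the normalization step (mean_normalize is an illustrative name; means are taken over rated entries only, which the slide implies but does not spell out):

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over its rated entries."""
    counts = np.maximum(R.sum(axis=1), 1)     # avoid dividing by zero
    mu = (Y * R).sum(axis=1) / counts         # mu[i] = mean rating of movie i
    Y_norm = (Y - mu[:, None]) * R            # normalize rated entries only
    return Y_norm, mu

# Learn X, Theta on Y_norm, then predict for user j on movie i with
#   X[i] @ Theta[j] + mu[i]
# so a user with theta = 0 (no ratings) is predicted each movie's mean.
```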

  19. Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization

  20. Review: Supervised Learning • K nearest neighbor • Linear Regression • Naïve Bayes • Logistic Regression • Support Vector Machines • Neural Networks

  21. Review: Unsupervised Learning • Clustering, K-means • Expectation maximization • Dimensionality reduction • Anomaly detection • Recommender systems

  22. Advanced Topics • Semi-supervised learning • Probabilistic graphical models • Generative models • Sequence prediction models • Deep reinforcement learning

  23. Semi-supervised Learning • Motivation • Problem formulation • Consistency regularization • Entropy-based method • Pseudo-labeling

  24. Semi-supervised Learning • Motivation • Problem formulation • Consistency regularization • Entropy-based method • Pseudo-labeling

  25. Classic Paradigm Insufficient Nowadays • Modern applications: massive amounts of raw data • Only a tiny fraction can be annotated by human experts • Examples: protein sequences, billions of webpages, images

  26. Semi-supervised Learning

  27. Active Learning

  28. Semi-supervised Learning • Motivation • Problem formulation • Consistency regularization • Entropy-based method • Pseudo-labeling

  29. Semi-supervised Learning Problem Formulation
  • Labeled data: $S_l = \{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m_l)}, y^{(m_l)}) \}$
  • Unlabeled data: $S_u = \{ x^{(1)}, x^{(2)}, \cdots, x^{(m_u)} \}$
  • Goal: Learn a hypothesis $h_\theta$ (e.g., a classifier) that has small error

  30. Combining labeled and unlabeled data - classical methods • Transductive SVM [Joachims '99] • Co-training [Blum and Mitchell '98] • Graph-based methods [Blum and Chawla '01] [Zhu, Ghahramani, Lafferty '03]

  31. Transductive SVM • The separator goes through low-density regions of the space (large margin)

  32. Transductive SVM
  SVM inputs: $(x_l^{(j)}, y_l^{(j)})$
  $\min_{\theta} \; \frac{1}{2} \sum_{k=1}^{n} \theta_k^2 \quad \text{s.t.} \; y_l^{(j)} \, \theta^\top x_l^{(j)} \geq 1$
  Transductive SVM inputs: $(x_l^{(j)}, y_l^{(j)})$ and $x_u^{(j)}$
  $\min_{\theta} \; \frac{1}{2} \sum_{k=1}^{n} \theta_k^2 \quad \text{s.t.} \; y_l^{(j)} \, \theta^\top x_l^{(j)} \geq 1, \quad \hat{y}_u^{(j)} \, \theta^\top x_u^{(j)} \geq 1, \quad \hat{y}_u^{(j)} \in \{-1, 1\}$

  33. Transductive SVMs • First maximize the margin over the labeled points • Use this separator to assign initial labels to the unlabeled points • Try flipping labels of unlabeled points to see if doing so can increase the margin (a simplified sketch follows)
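  A minimal sketch of this loop using scikit-learn's LinearSVC as the margin maximizer; this is a simplified self-training-style variant in the spirit of the slide, not Joachims' full TSVM, and the names simple_tsvm, X_l, y_l, X_u are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

def simple_tsvm(X_l, y_l, X_u, iters=10):
    """Assign labels to unlabeled points, then revise them until stable."""
    svm = LinearSVC(C=1.0).fit(X_l, y_l)        # margin over labeled points only
    y_u = svm.predict(X_u)                      # initial labels for unlabeled points
    for _ in range(iters):
        # Retrain on everything; flip any unlabeled label the new separator
        # disagrees with, which can only loosen the binding constraints.
        X_all = np.vstack([X_l, X_u])
        y_all = np.concatenate([y_l, y_u])
        svm = LinearSVC(C=1.0).fit(X_all, y_all)
        y_new = svm.predict(X_u)
        if np.array_equal(y_new, y_u):          # no flips left: stop
            break
        y_u = y_new
    return svm, y_u
```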

  34. Deep Semi-supervised Learning
