Semi-Supervised Learning, Jia-Bin Huang, Virginia Tech, Spring 2019



SLIDE 1

Semi-Supervised Learning

Jia-Bin Huang Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

Administrative

  • HW 4 due April 10
SLIDE 3

Recommender Systems

  • Motivation
  • Problem formulation
  • Content-based recommendations
  • Collaborative filtering
  • Mean normalization
SLIDE 4

Problem motivation

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action) |
|----------------------|-----------|---------|-----------|----------|-----------------|----------------|
| Love at last         | 5         | 5       | 0         | 0        | 0.9             | 0              |
| Romance forever      | 5         | ?       | ?         | 0        | 1.0             | 0.01           |
| Cute puppies of love | ?         | 4       | 0         | ?        | 0.99            | 0              |
| Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1             | 1.0            |
| Swords vs. karate    | 0         | 0       | 5         | ?        | 0               | 0.9            |
SLIDE 5

Problem motivation

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | $x_1$ (romance) | $x_2$ (action) |
|----------------------|-----------|---------|-----------|----------|-----------------|----------------|
| Love at last         | 5         | 5       | 0         | 0        | ?               | ?              |
| Romance forever      | 5         | ?       | ?         | 0        | ?               | ?              |
| Cute puppies of love | ?         | 4       | 0         | ?        | ?               | ?              |
| Nonstop car chases   | 0         | 0       | 5         | 4        | ?               | ?              |
| Swords vs. karate    | 0         | 0       | 5         | ?        | ?               | ?              |

Given $\theta^{(1)} = [0; 5; 0]$, $\theta^{(2)} = [0; 5; 0]$, $\theta^{(3)} = [0; 0; 5]$, $\theta^{(4)} = [0; 0; 5]$, infer $x^{(1)} = \,?$

SLIDE 6

Optimization algorithm

  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(i)}$:

$$\min_{x^{(i)}} \; \frac{1}{2} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$:

$$\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

SLIDE 7

Collaborative filtering

  • Given $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$ (and movie ratings), can estimate $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$
  • Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, can estimate $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$

SLIDE 8

Collaborative filtering optimization objective

  • Given $x^{(1)}, \cdots, x^{(n_m)}$, estimate $\theta^{(1)}, \cdots, \theta^{(n_u)}$:

$$\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \; \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

  • Given $\theta^{(1)}, \cdots, \theta^{(n_u)}$, estimate $x^{(1)}, \cdots, x^{(n_m)}$:

$$\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

SLIDE 9

Collaborative filtering optimization objective

  • Given $x^{(1)}, \cdots, x^{(n_m)}$, estimate $\theta^{(1)}, \cdots, \theta^{(n_u)}$:

$$\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \; \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

  • Given $\theta^{(1)}, \cdots, \theta^{(n_u)}$, estimate $x^{(1)}, \cdots, x^{(n_m)}$:

$$\min_{x^{(1)}, \cdots, x^{(n_m)}} \; \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

  • Minimize $x^{(1)}, \cdots, x^{(n_m)}$ and $\theta^{(1)}, \cdots, \theta^{(n_u)}$ simultaneously:

$$J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

SLIDE 10

Collaborative filtering optimization objective

$$J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

SLIDE 11

Collaborative filtering algorithm

  • Initialize $x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}$ to small random values
  • Minimize $J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm). For every $j = 1, \cdots, n_u$ and $i = 1, \cdots, n_m$:

$$x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$$

$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$$

  • For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^\top x$
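The two updates above vectorize over all movies and users at once. A minimal NumPy sketch (the function name and hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def collab_filter(Y, R, n_features=2, lam=0.001, alpha=0.01, iters=8000, seed=0):
    """Joint gradient descent on movie features X and user parameters Theta.

    Y: (n_m, n_u) ratings matrix; R: (n_m, n_u) 0/1 mask of observed ratings.
    Returns X (n_m, n) and Theta (n_u, n) minimizing the regularized
    squared error over the observed entries.
    """
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    X = 0.1 * rng.standard_normal((n_m, n_features))
    Theta = 0.1 * rng.standard_normal((n_u, n_features))
    for _ in range(iters):
        E = (X @ Theta.T - Y) * R           # residuals on rated entries only
        X_grad = E @ Theta + lam * X        # dJ/dX
        Theta_grad = E.T @ X + lam * Theta  # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta
```

Predicted ratings for every (movie, user) pair are then simply `X @ Theta.T`.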

SLIDE 12

Collaborative filtering

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
|----------------------|-----------|---------|-----------|----------|
| Love at last         | 5         | 5       | 0         | 0        |
| Romance forever      | 5         | ?       | ?         | 0        |
| Cute puppies of love | ?         | 4       | 0         | ?        |
| Nonstop car chases   | 0         | 0       | 5         | 4        |
| Swords vs. karate    | 0         | 0       | 5         | ?        |

SLIDE 13

Collaborative filtering

  • Predicted ratings:

$$X = \begin{bmatrix} - \, (x^{(1)})^\top - \\ - \, (x^{(2)})^\top - \\ \vdots \\ - \, (x^{(n_m)})^\top - \end{bmatrix} \qquad \Theta = \begin{bmatrix} - \, (\theta^{(1)})^\top - \\ - \, (\theta^{(2)})^\top - \\ \vdots \\ - \, (\theta^{(n_u)})^\top - \end{bmatrix} \qquad Y = X \Theta^\top$$

Low-rank matrix factorization

SLIDE 14

Finding related movies/products

  • For each product $i$, we learn a feature vector $x^{(i)} \in \mathbb{R}^n$ ($x_1$: romance, $x_2$: action, $x_3$: comedy, ...)
  • How to find movies $j$ related to movie $i$? A small $\| x^{(i)} - x^{(j)} \|$ means movies $i$ and $j$ are "similar"
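Once features are learned, related movies are just nearest neighbors in feature space. A tiny NumPy helper (the function name and the feature values in the usage note are illustrative):

```python
import numpy as np

def most_related(X, i, top_k=2):
    """Indices of the top_k movies whose learned feature vectors are
    closest (Euclidean distance) to movie i's vector."""
    d = np.linalg.norm(X - X[i], axis=1)  # distance to every movie
    d[i] = np.inf                         # exclude the movie itself
    return np.argsort(d)[:top_k]
```

For example, with rows like `[0.9, 0]` (romance) and `[0.1, 1.0]` (action), the nearest neighbor of a romance movie is another romance movie.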

SLIDE 15

Recommender Systems

  • Motivation
  • Problem formulation
  • Content-based recommendations
  • Collaborative filtering
  • Mean normalization
SLIDE 16

Users who have not rated any movies

$$\frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

$\theta^{(5)} = 0$

| Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5) |
|----------------------|-----------|---------|-----------|----------|---------|
| Love at last         | 5         | 5       | 0         | 0        | ?       |
| Romance forever      | 5         | ?       | ?         | 0        | ?       |
| Cute puppies of love | ?         | 4       | 0         | ?        | ?       |
| Nonstop car chases   | 0         | 0       | 5         | 4        | ?       |
| Swords vs. karate    | 0         | 0       | 5         | ?        | ?       |

SLIDE 17

Users who have not rated any movies

(Same objective and ratings table as Slide 16.) For Eve, no terms with $r(i,5) = 1$ appear in the squared-error sum, so minimizing the objective drives $\theta^{(5)} = 0$ and every predicted rating $(\theta^{(5)})^\top x^{(i)}$ is 0.

SLIDE 18

Mean normalization

  • For user $j$, on movie $i$ predict: $(\theta^{(j)})^\top x^{(i)} + \mu_i$, where $\mu_i$ is movie $i$'s mean rating
  • User 5 (Eve): $\theta^{(5)} = 0$, so the prediction is $(\theta^{(5)})^\top x^{(i)} + \mu_i = \mu_i$
  • Learn $\theta^{(j)}$, $x^{(i)}$ on the mean-normalized ratings
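Mean normalization only needs the per-movie average over rated entries. A minimal sketch (names illustrative):

```python
import numpy as np

def mean_normalize(Y, R):
    """Per-movie mean over rated entries, plus the mean-subtracted ratings.

    Y: (n_m, n_u) ratings; R: 0/1 mask of observed ratings.
    A user with theta = 0 then gets predicted rating mu[i] for movie i.
    """
    mu = (Y * R).sum(axis=1) / np.maximum(R.sum(axis=1), 1)  # movie means
    Y_norm = (Y - mu[:, None]) * R   # subtract the mean only where rated
    return Y_norm, mu
```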

SLIDE 19

Recommender Systems

  • Motivation
  • Problem formulation
  • Content-based recommendations
  • Collaborative filtering
  • Mean normalization
SLIDE 20

Review: Supervised Learning

  • K nearest neighbor
  • Linear Regression
  • Naïve Bayes
  • Logistic Regression
  • Support Vector Machines
  • Neural Networks
SLIDE 21

Review: Unsupervised Learning

  • Clustering, K-Means
  • Expectation maximization
  • Dimensionality reduction
  • Anomaly detection
  • Recommendation system
SLIDE 22

Advanced Topics

  • Semi-supervised learning
  • Probabilistic graphical models
  • Generative models
  • Sequence prediction models
  • Deep reinforcement learning
SLIDE 23

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
  • Pseudo-labeling
SLIDE 25

Classic Paradigm Insufficient Nowadays

  • Modern applications: massive amounts of raw data
  • Only a tiny fraction can be annotated by human experts

Examples: protein sequences, billions of webpages, images

SLIDE 26

Semi-supervised Learning

SLIDE 27

Active Learning

SLIDE 28

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
  • Pseudo-labeling
SLIDE 29

Semi-supervised Learning Problem Formulation

  • Labeled data: $S_l = \{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m_l)}, y^{(m_l)}) \}$
  • Unlabeled data: $S_u = \{ x^{(1)}, x^{(2)}, \cdots, x^{(m_u)} \}$
  • Goal: Learn a hypothesis $h_\theta$ (e.g., a classifier) that has small error
SLIDE 30

Combining labeled and unlabeled data

  • Classical methods:
  • Transductive SVM [Joachims '99]
  • Co-training [Blum and Mitchell '98]
  • Graph-based methods [Blum and Chawla '01] [Zhu, Ghahramani, Lafferty '03]

SLIDE 31

Transductive SVM

  • The separator goes through low-density regions of the space (large margin)

SLIDE 32

SVM

Inputs: $x_l^{(i)}, y_l^{(i)}$

$$\min_{\theta} \; \frac{1}{2} \sum_{k=1}^{n} \theta_k^2 \quad \text{s.t.} \quad y_l^{(i)} \, \theta^\top x_l^{(i)} \geq 1$$

Transductive SVM

Inputs: $x_l^{(i)}, y_l^{(i)}, x_u^{(i)}$

$$\min_{\theta, \hat{y}_u} \; \frac{1}{2} \sum_{k=1}^{n} \theta_k^2 \quad \text{s.t.} \quad y_l^{(i)} \, \theta^\top x_l^{(i)} \geq 1, \qquad \hat{y}_u^{(i)} \, \theta^\top x_u^{(i)} \geq 1, \qquad \hat{y}_u^{(i)} \in \{ -1, 1 \}$$

SLIDE 33

Transductive SVMs

  • First maximize the margin over the labeled points
  • Use this separator to give initial labels to the unlabeled points
  • Try flipping labels of unlabeled points to see if doing so can increase the margin
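The three steps above can be sketched end to end. This toy version (all names hypothetical) replaces the hard-margin SVM solver with subgradient descent on a soft-margin hinge objective, then greedily flips unlabeled labels whenever a flip lowers the joint objective:

```python
import numpy as np

def hinge_obj(theta, X, y, lam=0.01):
    # Soft-margin objective: mean hinge loss + L2 penalty on theta.
    margins = y * (X @ theta)
    return np.maximum(0.0, 1.0 - margins).mean() + lam * (theta @ theta)

def train_hinge(X, y, lam=0.01, lr=0.1, iters=500):
    # Linear classifier via subgradient descent on hinge_obj.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        active = y * (X @ theta) < 1  # points inside the margin
        grad = -(y[active, None] * X[active]).sum(axis=0) / len(y) + 2 * lam * theta
        theta -= lr * grad
    return theta

def transductive_svm(Xl, yl, Xu, rounds=5):
    # 1) Maximize the margin over the labeled points only.
    theta = train_hinge(Xl, yl)
    # 2) Initial labels for the unlabeled points from this separator.
    yu = np.sign(Xu @ theta)
    yu[yu == 0] = 1.0
    X = np.vstack([Xl, Xu])
    for _ in range(rounds):
        theta = train_hinge(X, np.concatenate([yl, yu]))
        # 3) Flip an unlabeled label whenever that lowers the objective.
        improved = False
        for i in range(len(yu)):
            trial = yu.copy()
            trial[i] = -trial[i]
            if (hinge_obj(theta, X, np.concatenate([yl, trial]))
                    < hinge_obj(theta, X, np.concatenate([yl, yu]))):
                yu, improved = trial, True
        if not improved:
            break
    return theta, yu
```

Real transductive SVMs also balance the class ratio on the unlabeled set and anneal the weight of the unlabeled hinge terms; both are omitted here for brevity.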

SLIDE 34

Deep Semi-supervised Learning

SLIDE 35

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
  • Pseudo-labeling
SLIDE 36

Stochastic Perturbations / Π-Model

  • Realistic perturbations $x \rightarrow \hat{x}$ of data points $x \in D_{UL}$ should not significantly change the output of $h_\theta(x)$
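This consistency idea turns directly into a loss on unlabeled data: perturb each input twice and penalize disagreement between the two predictions. A minimal sketch using Gaussian input noise (names illustrative; real Π-model implementations perturb with dropout and data augmentation):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(predict, X_unlabeled, noise_scale=0.1, seed=0):
    """Pi-model style loss: mean squared difference between predictions
    on two stochastically perturbed copies of each unlabeled input."""
    rng = np.random.default_rng(seed)
    X1 = X_unlabeled + noise_scale * rng.standard_normal(X_unlabeled.shape)
    X2 = X_unlabeled + noise_scale * rng.standard_normal(X_unlabeled.shape)
    return np.mean((predict(X1) - predict(X2)) ** 2)
```

This term needs no labels, so it can be added to the supervised loss with some weight and averaged over the whole unlabeled set.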

SLIDE 37

Temporal Ensembling

SLIDE 38

Mean Teacher

SLIDE 39

Virtual Adversarial Training

SLIDE 40

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
  • Pseudo-labeling
SLIDE 41

EntMin (Entropy Minimization)

  • Encourages more confident predictions on unlabeled data by penalizing the entropy of the model's predicted class distribution
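Concretely, the extra loss term is the average entropy of the predicted class distribution on unlabeled examples (a sketch; the helper name is illustrative):

```python
import numpy as np

def entropy_loss(probs, eps=1e-12):
    """Average prediction entropy; adding this to the training loss on
    unlabeled data pushes predictions toward low-entropy (confident) ones."""
    return float(-np.mean(np.sum(probs * np.log(probs + eps), axis=1)))
```

A uniform prediction has maximal entropy, a one-hot prediction has (near) zero, so gradient descent on this term sharpens the model's outputs.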
SLIDE 42

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
  • Pseudo-labeling
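Pseudo-labeling itself needs little more than a confidence threshold: treat sufficiently confident predictions on unlabeled data as if they were ground-truth labels and add them to the training set. A minimal sketch (names and threshold illustrative):

```python
import numpy as np

def pseudo_label(probs, threshold=0.95):
    """Select unlabeled examples whose top predicted probability clears the
    threshold; return their indices and the hard (argmax) labels to train on."""
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)
```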
SLIDE 43

Comparison

SLIDE 44

Varying number of labels

SLIDE 45

Class mismatch between labeled and unlabeled datasets hurts performance

SLIDE 46

Lessons

  • Standardized architecture + equal budget for tuning hyperparameters
  • Unlabeled data from a different class distribution is not that useful
  • Most methods don't work well in the very low labeled-data regime
  • Transferring pre-trained ImageNet features produces a lower error rate
  • Conclusions are based on small datasets, though