SLIDE 1

Supervising Unsupervised Learning

Vikas K. Garg & Adam Kalai


SLIDE 2

Clustering problem in isolation:

[Figure: toy 1-D datasets — temperature readings (1˚, 2˚, 14˚, 15˚, 34˚, 65˚) and sound levels (0 dB, 3 dB)]

Clustering repository: How many clusters?


SLIDE 3

Contributions

- Introduce a principled framework to evaluate unsupervised settings
- Show how to transfer knowledge across heterogeneous datasets
  - different sizes, dimensions, representations, domains...
- Design provably efficient algorithms
  - select the clustering algorithm and the number of clusters, determine the threshold in single-linkage clustering, remove outliers, recycle problems
- Make good meta-clustering possible
  - introduce the meta-scale-invariance property; show how to circumvent Kleinberg’s impossibility result
- Automate deep feature learning across very small datasets
  - encode diverse small data effectively into big data; perform non-trivial zero-shot learning


SLIDE 4

General approach

- Define a meta-distribution µ over all problems in the universe.
- Each training sample is a dataset drawn i.i.d. from µ.
- Learn a mapping from an intrinsic measure to an extrinsic measure.
- The intrinsic measure avoids labels and abstracts away heterogeneity.
- Each test problem is drawn from µ, but its labels are hidden.
- Compute the intrinsic measure on the test problem and predict its extrinsic quality.
- Encode the covariance of small datasets for deep zero-shot learning.

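The recipe above can be sketched in a few lines. This is an illustrative toy, not the paper's exact construction: the synthetic `make_problem` generator, k-means as the clustering routine, silhouette as the intrinsic measure, and a linear regressor as the learned mapping are all assumptions for concreteness.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)

def make_problem(n_clusters, dim, n=60):
    """Draw one labeled clustering problem (one 'sample' from µ)."""
    centers = rng.normal(scale=5.0, size=(n_clusters, dim))
    y = rng.integers(n_clusters, size=n)
    X = centers[y] + rng.normal(size=(n, dim))
    return X, y

# Training repository: datasets of different sizes and dimensions.
intrinsic, extrinsic = [], []
for k, d in [(2, 2), (3, 5), (4, 3), (5, 4)]:
    X, y = make_problem(k, d)
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    intrinsic.append([silhouette_score(X, pred)])   # label-free measure
    extrinsic.append(adjusted_rand_score(y, pred))  # uses the hidden labels

mapping = LinearRegression().fit(intrinsic, extrinsic)

# Test problem drawn from µ: labels are hidden, so we can only compute
# the intrinsic measure and predict the extrinsic quality from it.
X_test, _ = make_problem(3, 6)
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_test)
predicted_quality = mapping.predict([[silhouette_score(X_test, pred)]])[0]
```

The key point the sketch preserves: the intrinsic measure never touches labels, so it transfers to test problems whose labels are hidden.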

SLIDE 5

Number of clusters

Summary:
- Run the k-means algorithm with different k on each training dataset.
- Use the Silhouette Index (SI) as the intrinsic measure.
- Use the Adjusted Rand Index (ARI) as the extrinsic measure.

[Figure: "Selecting the number of clusters" — average ARI (≈0.1–0.12) vs. number of training datasets (40–300); curves: Silhouette baseline vs. Ours]
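A minimal toy version of this summary, assuming a linear regressor from (SI, k) to ARI; the `blobs` generator and the candidate range k ∈ {2, …, 5} are invented for illustration and are not from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(1)

def blobs(k, n=90, d=2):
    """Toy labeled clustering problem with k Gaussian blobs."""
    y = rng.integers(k, size=n)
    return rng.normal(scale=4.0, size=(k, d))[y] + rng.normal(size=(n, d)), y

feats, targets = [], []
for true_k in (2, 3, 4):              # labeled training repository
    X, y = blobs(true_k)
    for k in range(2, 6):             # candidate numbers of clusters
        pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        feats.append([silhouette_score(X, pred), k])    # intrinsic (SI)
        targets.append(adjusted_rand_score(y, pred))    # extrinsic (ARI)
reg = LinearRegression().fit(feats, targets)

def predicted_ari(X, k):
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return reg.predict([[silhouette_score(X, pred), k]])[0]

# Test problem with hidden labels: pick k maximizing the predicted ARI.
X_test, _ = blobs(3)
best_k = max(range(2, 6), key=lambda k: predicted_ari(X_test, k))
```

Selecting k by predicted ARI rather than raw silhouette is what lets the training repository correct the intrinsic measure's biases.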

SLIDE 6

Clustering algorithm (assume fixed k for simplicity)

Summary:
- Run different algorithms to obtain k clusters and compute SI.
- Form a feature vector from SI and dataset-specific features (e.g., max and min singular values, size, dimensionality).
- Use the Adjusted Rand Index (ARI) as the extrinsic measure.

[Figure: "Performance of different algorithms" — ARI vs. number of training datasets (40–220); curves: Ours, KMeans, KMeans-N, Ward, Ward-N, Average, Average-N, Complete, Complete-N, Spectral, Spectral-N]
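A hedged sketch of the same idea with only two candidate algorithms (k-means and Ward linkage) and a ridge regressor; the feature set mirrors the slide's examples (SI, extreme singular values, size, dimensionality), but the generator and everything else is illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(2)
K = 3  # fixed number of clusters, as assumed on the slide

ALGOS = {
    "kmeans": lambda X: KMeans(n_clusters=K, n_init=10,
                               random_state=0).fit_predict(X),
    "ward": lambda X: AgglomerativeClustering(n_clusters=K).fit_predict(X),
}

def features(X, pred):
    """SI plus simple dataset-specific features."""
    s = np.linalg.svd(X, compute_uv=False)
    return [silhouette_score(X, pred), s.max(), s.min(), len(X), X.shape[1]]

def problem(d, n=80):
    y = rng.integers(K, size=n)
    return rng.normal(scale=4.0, size=(K, d))[y] + rng.normal(size=(n, d)), y

feats, targets = [], []
for d in (2, 3, 4, 5):                      # labeled training repository
    X, y = problem(d)
    for run in ALGOS.values():
        pred = run(X)
        feats.append(features(X, pred))
        targets.append(adjusted_rand_score(y, pred))
reg = Ridge().fit(feats, targets)

# Labels hidden at test time: choose the algorithm with highest predicted ARI.
X_test, _ = problem(3)
chosen = max(ALGOS, key=lambda name:
             reg.predict([features(X_test, ALGOS[name](X_test))])[0])
```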

SLIDE 7

Fraction of outliers

Summary:
- Remove points with large norms, cluster the remaining points, and compute SI.
- Put the removed points back into clusters, and compute ARI.
- Find the candidate fraction that performs best on the test set.
- Extensions are possible that customize the fraction for each test set.

[Figure: "Performance with outlier removal" — average ARI vs. number of training datasets (50–300); curves: candidate outlier fractions 0%–5%]
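The first two steps above might be sketched like this, under simplifying assumptions (2-D data, k = 2, a single candidate fraction); putting the removed points back is done here by nearest-center assignment, which is one plausible reading of the slide.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(95, 2)),
               rng.normal(scale=15.0, size=(5, 2))])  # ~5% heavy outliers

def cluster_without_outliers(X, frac, k=2):
    """Drop the fraction `frac` of points with the largest norms,
    cluster the rest, then assign every point to its nearest center."""
    norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(norms)[: int(round(len(X) * (1 - frac)))]
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[keep])
    # Put every point, including the removed ones, back into a cluster.
    return km.predict(X)

labels = cluster_without_outliers(X, frac=0.05)
```

In the full pipeline, this function would be run once per candidate fraction, with SI on the retained points as the intrinsic measure.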

SLIDE 8

Deep learning a binary similarity function

Summary:
- Sample pairs of examples from each small dataset.
- For each pair, also include covariance features specific to its dataset.
- Label a pair 1 if it comes from the same cluster, 0 otherwise.
- Train a deep net classifier on all the pairs together.
- Predict whether a test pair comes from the same cluster or not.

[Figure: "Average binary similarity prediction accuracy" (≈0.5–0.8) on the internal test (IT) and external test (ET) sets; bars: Ours vs. Majority baseline]
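A rough stand-in for this pipeline, with scikit-learn's `MLPClassifier` playing the role of the deep net and raw covariance entries as the dataset-specific features; the generator and all sizes are invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)

def small_dataset(d=2, n=30, k=2):
    """One small labeled dataset with k clusters."""
    y = rng.integers(k, size=n)
    return rng.normal(scale=6.0, size=(k, d))[y] + rng.normal(size=(n, d)), y

pairs, labels = [], []
for _ in range(20):                        # many small labeled datasets
    X, y = small_dataset()
    cov = np.cov(X, rowvar=False).ravel()  # dataset-specific covariance
    for _ in range(30):                    # sample pairs within the dataset
        i, j = rng.integers(len(X), size=2)
        pairs.append(np.concatenate([X[i], X[j], cov]))
        labels.append(int(y[i] == y[j]))   # 1 iff same cluster

# A small MLP as a stand-in for the deep net.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                    random_state=0).fit(pairs, labels)

# Zero-shot use on a dataset never seen in training.
X_new, _ = small_dataset()
cov = np.cov(X_new, rowvar=False).ravel()
same = clf.predict([np.concatenate([X_new[0], X_new[1], cov])])[0]
```

Appending each dataset's covariance is what lets one classifier serve many heterogeneous small datasets, including unseen ones.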

SLIDE 9

See you...

Tue Dec 4th 05:00 – 07:00 PM Room 210 & 230 AB Poster #164
