Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning


SLIDE 1

Introduction Methodology Results

Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning

Weishi Shi and Qi Yu

B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology

Jun 2019

Weishi Shi and Qi Yu Multi-label Active Learning

SLIDE 2

Multi-Label Active Learning

Multi-label classification (ML-C) aims to learn a model that automatically assigns a set of relevant labels to a data instance. Multi-label problems naturally arise in many applications, including various image classification and video/audio recognition tasks. Data labeling for model training becomes more labor intensive because each label in a potentially large label space must be checked, which makes active learning especially important.

Key challenges for multi-label active learning:
  • The sampling measure is hard to design due to label correlations.
  • Rare labels are much harder to detect.
  • Computational cost increases quickly with the number of labels.



SLIDE 4

CS-BPCA Label Transformation

We propose a principled two-level label transformation strategy (Compressed Sensing (CS) + Bayesian Principal Component Analysis (BPCA)) that enables multi-label active learning to be performed in an optimally compressed target space.

CS-BPCA: Two-level Label Transformation

  Original label space (Y) --CS--> Compressed space (R) --BPCA--> Target space (U)

An MOGP operates on the target space U: data samples are compressed for sampling in the forward direction and recovered for prediction in the reverse direction.

Key properties of the transformed label space:
  • Optimally compressed: the optimal compression rate is determined automatically.
  • Orthogonal: label correlation is fully decoupled.
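The two-level transformation can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a random Gaussian projection as the CS encoder and ordinary PCA in place of Bayesian PCA (BPCA would infer the number of retained components automatically); all dimensions and matrices below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-label matrix Y: n instances, d sparse binary labels.
n, d = 200, 50
Y = (rng.random((n, d)) < 0.05).astype(float)

# Level 1 -- Compressed Sensing: project the sparse label vectors
# with a random Gaussian sensing matrix (a standard CS encoder).
m = 20                                # compressed dimension (assumed)
Phi = rng.normal(size=(m, d)) / np.sqrt(m)
R = Y @ Phi.T                         # compressed space R, shape (n, m)

# Level 2 -- PCA on the compressed codes (stand-in for Bayesian PCA).
R_centered = R - R.mean(axis=0)
_, S, Vt = np.linalg.svd(R_centered, full_matrices=False)
k = int(np.sum(S > 1e-8 * S[0]))      # crude effective-rank cut-off
U = R_centered @ Vt[:k].T             # continuous target space U

# Columns of U are orthogonal scores, so label correlation is decoupled.
C = U.T @ U
assert np.allclose(C - np.diag(np.diag(C)), 0.0, atol=1e-6)
```

The key property the sketch demonstrates is the last assertion: after the second-level rotation, the target dimensions are mutually orthogonal, so a sampling function can treat them independently.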


SLIDE 5

Multi-output GP (MOGP) based Data Sampling

Two key benefits:
  • Outputs the predictive entropy, which provides an informative measure for uncertainty-based data sampling.
  • Uses a flexible covariance function to precisely capture the covariance structure of the input data.

A flexible kernel function:

  k(xᵢ, xⱼ) = θ₀ exp{−(θ₁/2)‖xᵢ − xⱼ‖²} + θ₂ xᵢᵀxⱼ + θ₃

Applying the MOGP to the optimally compressed target space works because the space is:
  • Continuous: consistent with the MOGP assumption;
  • Compact: efficient computation;
  • Weighted: precise sampling;
  • Orthogonal: label correlation is decoupled.
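The kernel above combines a squared-exponential term, a linear term, and a bias. A direct sketch of it (the θ values below are illustrative, not from the paper):

```python
import numpy as np

def mogp_kernel(xi, xj, theta):
    """k(xi, xj) = t0 * exp(-(t1/2) * ||xi - xj||^2) + t2 * xi^T xj + t3
    (squared-exponential + linear + bias terms)."""
    t0, t1, t2, t3 = theta
    sq = float(np.sum((xi - xj) ** 2))
    return t0 * np.exp(-0.5 * t1 * sq) + t2 * float(xi @ xj) + t3

def kernel_matrix(X, theta):
    """Gram matrix K with K[a, b] = k(X[a], X[b])."""
    m = X.shape[0]
    K = np.empty((m, m))
    for a in range(m):
        for b in range(m):
            K[a, b] = mogp_kernel(X[a], X[b], theta)
    return K

X = np.random.default_rng(1).normal(size=(5, 3))
theta = (1.0, 0.5, 0.1, 0.01)         # hypothetical hyper-parameters
K = kernel_matrix(X, theta)
assert np.allclose(K, K.T)            # a valid covariance is symmetric
```

The squared-exponential part captures smooth local similarity, the linear part captures global trends, and θ₃ acts as a constant offset; together they give the flexibility the slide refers to.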


SLIDE 6

Gradient-free Hyper-parameter Optimization

High computational cost of gradient-based methods:
  • Computing the gradient of the likelihood over each hyper-parameter until convergence (via p iterations): O(|θ|·p·m³). This must be run multiple times because the likelihood is non-convex.
  • Constructing the covariance matrix of the input data: O(m²·n).
  • Overall complexity: O(|θ|(p·m³ + m²·n)).

Fast kernel re-estimation for covariance matrix construction: we separate out two blocks of computation that are invariant to θ and only partially update the kernel matrix, reducing covariance matrix construction from O(m²·n) to O(m²).
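The re-estimation trick follows directly from the kernel's form: the pairwise squared distances and inner products do not depend on θ, so they can be cached once (O(m²·n)) and every subsequent kernel rebuild is a scalar recombination (O(m²)). A sketch, with toy sizes and illustrative θ values:

```python
import numpy as np

# One-time, theta-invariant blocks: O(m^2 * n) each.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 8))           # m = 50 points, n = 8 features

sq = np.sum(X ** 2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # ||xi - xj||^2
G = X @ X.T                                        # xi^T xj

def rebuild_kernel(theta):
    """Re-estimate K for a new theta in O(m^2): the distance block D2
    and Gram block G never change, only the scalar mixture does."""
    t0, t1, t2, t3 = theta
    return t0 * np.exp(-0.5 * t1 * D2) + t2 * G + t3

K1 = rebuild_kernel((1.0, 0.5, 0.1, 0.01))
K2 = rebuild_kernel((2.0, 0.1, 0.0, 0.1))  # cheap: no pass over X
assert K1.shape == (50, 50) and np.allclose(K1, K1.T)
```

Because hyper-parameter search evaluates the likelihood at many candidate θ values, moving the O(m²·n) work out of the inner loop is what makes the direct search methods on the next slide affordable.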


SLIDE 7

Gradient-free Hyper-parameter Optimization

Bayesian Optimization (B-OPT):
  • Uses expected improvement as a cheap surrogate of the likelihood to choose a candidate θ from the grid search space.
  • Requires a predefined grid search space.

Simplex Optimization (S-OPT):
  • Explores the search space by evolving (i.e., expanding, reflecting, and contracting) a simplex.
  • Explores the search space automatically.

Overall complexity reduction: O(|θ|(p·m³ + m²·n)) → O(q·m³ + m²), where q ≪ p.
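The simplex search described for S-OPT is the Nelder-Mead scheme. A minimal sketch of gradient-free hyper-parameter optimization on a single-output GP marginal likelihood, using SciPy's Nelder-Mead implementation; the toy data, the log-parametrization, and the three-parameter kernel are assumptions for the example, not the paper's setup:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))                  # m = 30 toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

sq = np.sum(X ** 2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # theta-invariant block

def neg_log_marginal(log_theta):
    """Negative GP log marginal likelihood (up to an additive constant);
    optimizing log-parameters keeps every theta positive."""
    t0, t1, noise = np.exp(log_theta)
    K = t0 * np.exp(-0.5 * t1 * D2) + (noise + 1e-8) * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * float(y @ alpha) + float(np.log(np.diag(L)).sum())

# Gradient-free simplex search: each step reflects, expands, or
# contracts a simplex of candidate thetas; no likelihood gradients.
res = minimize(neg_log_marginal, x0=np.zeros(3), method="Nelder-Mead")
assert res.fun <= neg_log_marginal(np.zeros(3))
```

Each simplex step costs one likelihood evaluation (one O(m³) Cholesky) rather than |θ| gradient computations, which is where the q ≪ p saving comes from.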


SLIDE 8

Benchmark Datasets and Compared Models

Summary of Datasets

Dataset    Domain       Instances  Features  Labels  Label Card.  Label Sparsity
Delicious  web          8172       500       157     5.56         0.03
BookMark   publication  38548      2150      136     3.45         0.02
WebAPI     software     9166       5659      90      2.50         0.02
Corel5K    images       5000       499       132     3.25         0.02
Bibtex     text         7013       1836      127     2.4          0.02

Competitive active learning models for multi-label classification:
  • Type I models: perform active learning in a compressed label space (CS-MIML, CS-BR, CS-RR).
  • Type II models: perform active learning in the original label space (MMC, Adaptive).


SLIDE 9

Comparison Results

[Figure: Comparison Result I. F-score vs. number of active iterations (50 to 500) on the WebAPI, Delicious, Bookmark, Corel5K, and Bibtex datasets, comparing CS-BPCA-GP against the Type I models CS-BR, CS-RR, and CS-MIML.]

[Figure: Comparison Result II. F-score vs. number of active iterations (50 to 500) on the reduced WebAPI, Delicious, Bookmark, Corel5K, and Bibtex datasets, comparing CS-BPCA-GP against the Type II models MMC and Adaptive.]


SLIDE 10

Rare Label Prediction Comparison

[Figure: Rare Label Prediction Comparison. Per-label recall, with labels ordered by increasing label frequency, on the Bookmark, Delicious, Corel5K, Bibtex, and Web-API datasets, comparing Adaptive and CS-BPCA-GP.]

The proposed model is effective at detecting rare labels by leveraging label correlation.


SLIDE 11

CPU Time of Hyper-parameter Optimization

Dataset    GA     B-OPT  S-OPT
Delicious  1.83   0.17   0.20
BookMark   15.0   0.80   0.79
WebAPI     10.10  0.54   0.55
Corel5K    0.58   0.08   0.08
Bibtex     8.71   0.48   0.51

The proposed direct search methods learn the kernel parameters 10 to 15 times faster than the gradient-based methods.


SLIDE 12

Conclusions

  • Propose a two-level CS-BPCA process to generate an optimally compressed, weighted, orthogonal, and continuous target space to support multi-label data sampling.
  • Propose an MOGP-based sampling function that accurately captures the covariance structure of the input data.
  • Propose gradient-free hyper-parameter optimization to enable fast online active learning.
  • Apply to real-world multi-label datasets from diverse domains to evaluate the effectiveness of the proposed model.

Poster ID: 261
