SLIDE 1

Comment-based Multi-View Clustering of Web 2.0 Items

Xiangnan He, Min-Yen Kan, Peichu Xie, Xiao Chen
Presenter: Xiangnan He

Supervised by Prof. Min-Yen Kan Web IR/NLP Group (WING) National University of Singapore

Presented at WWW’2014 main conference; April 11, 2014, Seoul, South Korea

SLIDE 2

User Generated Content: A driving force of Web 2.0


Daily growth of UGC:

  • Twitter: 500+ million tweets
  • Flickr: 1+ million images
  • YouTube: 360,000+ hours of videos

Challenges:

  • Information overload
  • Dynamic, temporally evolving Web
  • Rich but noisy UGC
SLIDE 3

Comment-based Multi-View Clustering

Why clustering?

Clustering benefits:

– Automatically organizing web resources for content providers.
– Diversifying search results in web search.
– Improving text/image/video retrieval.
– Assisting tag generation for web resources.


SLIDE 4

Comment-based Multi-View Clustering

Why user comments?

  • Comments are rich sources of information:

– Textual comments.
– Commenting users.
– Commenting timestamps.

  • Example:


Figure: YouTube video comments

Comments are a suitable data source for the categorization of web resources!

SLIDE 6


Previous work – Comment-based clustering

  • Filippova and Hall [1]: YouTube video classification.

– Showed that although textual comments are quite noisy, they provide a useful and complementary signal for categorization.

  • Hsu et al. [2]: Clustering YouTube videos.

– Focused on de-noising the textual comments before using them for clustering.

  • Li et al. [3]: Blog clustering.

– Found that incorporating textual comments improves clustering over using just content (i.e., blog title and body).

  • Kuzar and Navrat [4]: Blog clustering.

– Incorporated the identities of commenting users to improve the content-based clustering.


[1] K. Filippova and K. B. Hall. Improved video categorization from text metadata and user comments. In SIGIR, 2011.
[2] C.-F. Hsu, J. Caverlee, and E. Khabiri. Hierarchical comments-based clustering. In SAC, 2011.
[3] B. Li, S. Xu, and J. Zhang. Enhancing clustering blog documents by utilizing author/reader comments. In ACM-SE, 2007.
[4] T. Kuzar and P. Navrat. Slovak blog clustering enhanced by mining the web comments. In WI-IAT, 2011.

SLIDE 9


Inspiration from Previous Work

Both textual comments and the identities of the commenting users contain useful signals for categorization, but no comprehensive study of comment-based clustering has been done to date. We aim to close this gap in this work.


SLIDE 10


Problem Formulation


Three views of each item:

– Item intrinsic features (description)
– Textual comments
– Commenting users

How to combine three heterogeneous views for better clustering?

SLIDE 11

Experimental evidence

Table 1. Clustering accuracy (%) on the Last.fm and Yelp datasets

Method                    | Last.fm               | Yelp
                          | Des.   Com.   Usr.    | Des.   Com.   Usr.
K-means (single view)     | 23.5   30.1   34.5    | 25.2   56.3   25.0
K-means (combined view)   | 40.1 (+5.6%)*         | 58.2 (+1.9%)

Observations:
1. On a single dataset, different views yield differing clustering quality.
2. For different datasets, the utility of views varies.
3. Simply concatenating the feature space only leads to modest improvement.
4. Same trends result when using other clustering algorithms (e.g., NMF).

SLIDE 12

Clustering: NMF (Non-negative Matrix Factorization)

Adapted from Carmen Vaca et al. (WWW 2014)

V ≈ W × H, where V is the m×n item-feature matrix (rows: items, columns: features), W is m×k, and H is k×n.

[Figure: factorization of the item-feature matrix V into W and H]

SLIDE 13

Clustering: NMF (Non-negative Matrix Factorization)


Each entry W_ik indicates the degree to which item i belongs to cluster k.
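To make the clustering step concrete, here is a minimal Python sketch (illustrative only; the matrix sizes, parameters, and use of scikit-learn are assumptions, not the setup used in the paper). It factorizes an item-feature matrix V and assigns each item to the cluster with the largest entry in its row of W:

    import numpy as np
    from sklearn.decomposition import NMF

    V = np.random.rand(100, 500)     # hypothetical m x n item-feature matrix (e.g., tf-idf weights)
    k = 5                            # number of clusters

    model = NMF(n_components=k, init="nndsvd", max_iter=300, random_state=0)
    W = model.fit_transform(V)       # m x k: soft cluster memberships
    H = model.components_            # k x n: per-cluster feature weights

    labels = W.argmax(axis=1)        # hard assignment: item i -> cluster with largest W[i, :]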

SLIDE 14

Multi-View Clustering (MVC)

  • Hypothesis:

– Different views should admit the same (or similar) underlying clustering.

  • How to implement this hypothesis under NMF?


One factorization per view:
V1 ≈ W1 × H1
V2 ≈ W2 × H2
V3 ≈ W3 × H3

SLIDE 15

Existing Solution 1 – Collective NMF (Akata et al. 2011)

  • Idea:

– Forcing the W matrices of different views to be identical.

  • Drawback:

– Too strict for real applications (theoretically shown to be equivalent to NMF on the combined view).


V1 ≈ W1 × H1
V2 ≈ W2 × H2
V3 ≈ W3 × H3
with W1 = W2 = W3

In 16th Computer Vision Winter Workshop, 2011.

SLIDE 16

Existing Solution 2 – Joint NMF (Liu et al. 2013)

  • Idea:

– Regularizing W matrices towards a common consensus.

  • Drawback:

– The consensus clustering degrades when incorporating low-quality views.


V1 ≈ W1 × H1
V2 ≈ W2 × H2
V3 ≈ W3 × H3
with each Ws regularized towards a common consensus matrix

In Proc. of SDM 2013.

SLIDE 17

Proposed Solution – CoNMF (Co-regularized NMF)

  • Idea:

– Imposing the similarity constraint on each pair of views (pair-wise co-regularization).

  • Advantage:

– Clusterings learnt from each pair of views complement each other.
– Less sensitive to low-quality views.


V1 ≈ W1 × H1
V2 ≈ W2 × H2
V3 ≈ W3 × H3
with a similarity constraint on each pair (Ws, Wt)

SLIDE 18


CoNMF – Loss Function

Pair-wise co-regularization:


NMF part (combination of the NMF objective of each individual view) + co-regularization part (pair-wise similarity constraint).
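The objective itself appears only as an image in the original slides. The formula below is a reconstruction consistent with the description above (one weighted NMF term per view plus a pair-wise co-regularization term over every pair of views); the weight symbols \lambda_s and \lambda_{st} are assumed notation, and the paper gives the exact form:

    J_1 = \sum_{s} \lambda_s \, \| V_s - W_s H_s \|_F^2 \;+\; \sum_{s \neq t} \lambda_{st} \, \| W_s - W_t \|_F^2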

SLIDE 19


Pair-wise CoNMF solution

  • Alternating optimization:

Iterate until convergence:

  • Fixing W, optimizing over H;
  • Fixing H, optimizing over W;
  • Update rules:


NMF part: equivalent to the original NMF solution.

New! Co-regularization part: captures the similarity constraint.
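The update rules are likewise images in the original deck. Applying the standard multiplicative-update derivation (splitting the gradient of the reconstructed objective above into its positive and negative parts) yields rules of the following form; this is a sketch with constants folded into the weights, not necessarily the paper's exact equations:

    W_s \leftarrow W_s \circ \frac{\lambda_s V_s H_s^{\top} + \sum_{t \neq s} \lambda_{st} W_t}{\lambda_s W_s H_s H_s^{\top} + \sum_{t \neq s} \lambda_{st} W_s}, \qquad H_s \leftarrow H_s \circ \frac{W_s^{\top} V_s}{W_s^{\top} W_s H_s}

where \circ and the fractions are taken element-wise.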

SLIDE 20


Normalization Problem

Although the update rules are guaranteed to converge, two issues remain:

1. Comparability problem: the W matrices of different views may not be comparable at the same scale.
2. Scaling problem (c > 1, resulting in trivialized descent of the CoNMF loss function).

SLIDE 21


Normalization Problem

Although the update rules are guaranteed to find local minima, two issues remain:

1. Comparability problem: the W matrices of different views may not be comparable at the same scale.
2. Scaling problem (c > 1, resulting in trivialized descent).

We address these two concerns by incorporating normalization into the optimization process:

– Normalize the W and H matrices in each iteration, prior to the update, where Q is the diagonal matrix for normalizing W (normalization-independent: any norm strategy can apply, such as L1 and L2); see the sketch below.
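A minimal Python sketch of this per-iteration step, under the assumption (not stated explicitly on the slide) that Q holds the L1 or L2 norms of the columns of W and that H is rescaled by Q so the product W H is unchanged:

    import numpy as np

    def normalize_WH(W, H, norm="l2", eps=1e-12):
        # Q is the diagonal matrix of column norms of W (L1 or L2).
        if norm == "l1":
            q = W.sum(axis=0)
        else:
            q = np.sqrt((W ** 2).sum(axis=0))
        q = np.maximum(q, eps)            # guard against all-zero columns
        W = W / q                         # W <- W Q^{-1}
        H = H * q[:, None]                # H <- Q H, so the product W H is preserved
        return W, H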

SLIDE 22


Discussion – Alternative solution

  • Alternative solution – Integrating normalization as a constraint into the objective function (Liu et al. SDM 2013):

– Pros: Convergence is guaranteed.
– Cons:
1) Complex – the optimization solution becomes very difficult.
2) Dependent – the solution is specific to the normalization strategy (i.e., update rules must be derived for each norm strategy).

  • Our solution – Separate optimization and normalization:

– Pros:
1) Simple – a standard and elegant optimization solution is derived.
2) Independent – any normalization strategy can apply.
– Cons: the formal convergence guarantee is lost.


SLIDE 23


K-means based Initialization

  • Due to the non-convexity of the NMF objective function, our solution only finds local minima.

  • Research on NMF has found that proper initialization plays an important role in NMF's clustering performance (Langville et al. KDD 2006).

  • We propose an initialization method based on K-means (see the sketch below):

– Use the cluster membership matrix to initialize W;
– Use the cluster centroid matrix to initialize H;
– Smooth out the 0 entries in the initialized matrices to avoid shrinking the search space.
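A minimal Python sketch of this initialization; the smoothing constant and the exact smoothing scheme are illustrative assumptions, not the paper's values:

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_init(V, k, smooth=0.2, seed=0):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(V)
        m = V.shape[0]
        W = np.zeros((m, k))
        W[np.arange(m), km.labels_] = 1.0    # cluster membership matrix
        H = km.cluster_centers_              # k x n cluster centroid matrix
        # Smooth the zero entries so the multiplicative updates can still change them.
        return W + smooth, np.maximum(H, 0.0) + smooth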


SLIDE 24


Experiments

Datasets

  • 1. Last.fm: 21 music categories, each category has 200 to 800 items. In total, about 9.7K artists, 455K users and 3M comments.

  • 2. Yelp: a subset of the Yelp Challenge Dataset (7 out of 22 categories), each category has 100 to 500 items.


Table 2. Dataset statistics (filtered; # of features per view)

Dataset   Item #   Des.     Com.     Usr.
Last.fm   9,694    14,076   31,172   131,353
Yelp      2,624    1,779    18,067   17,068


SLIDE 25


Experiments

Baseline Methods for Comparison

Single-view clustering methods (running on the combined view):

1. K-means
2. SVD
3. NMF

Multi-view clustering methods:

4. Multi-Multinomial LDA (MMLDA, Ramage et al. WSDM 2009): extends LDA to cluster webpages from content words and Delicious tags.
5. Co-regularized Spectral Clustering (CoSC, Kumar et al. NIPS 2011): extends spectral clustering for multi-view clustering.
6. Multi-view NMF (MultiNMF, Liu et al. SDM 2013): extends NMF for multi-view clustering (consensus-based co-regularization).

For each method, 20 test runs with different random initializations were conducted and the average scores (Accuracy and F1) are reported.


SLIDE 26

Results I

Preprocessing


  • Question: Due to the noise in user-generated comments, how to preprocess the views for better clustering?

Table 3. K-means with different preprocessing settings (Accuracy, %)

Setting          Description      Comment words    Users
0. Random        6.6
1. Original      11.8 (+5.3%)     9.3 (+3.3%)      8.4 (+2.2%)
2. Filtered      15.3 (+4.5%)     9.4 (~)          8.6 (~)
3. L1            15.2 (~)         19.0 (+9.7%)     7.9 (~)
4. L1-whole      14.5 (~)         9.7 (~)          8.5 (~)
5. L2            15.9 (~)         26.9 (+17.5%)    34.5 (+25.9%)
6. L2 (tf)       16.8 (~)         25.9 (~)         34.7 (~)
7. L2 (tf.idf)   23.5 (+7.6%)     30.1 (+3.2%)     34.5 (~)
8. Combined      40.1 (+5.6%)

  • 1. Filtering improves performance and efficiency.
  • 2. L2 is most effective in length normalization for clustering.
  • 3. TF.IDF is most effective for text-based features.
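As an illustration of the best-performing setting above (tf-idf weighting plus L2 length normalization), here is a minimal Python sketch using scikit-learn; the input documents are hypothetical placeholders for the concatenated comments of each item, and the exact filtering thresholds used in the paper are not reproduced here:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import normalize

    docs = ["great live set amazing guitar solos",            # hypothetical comment text, one string per item
            "terrible service but the noodles were good"]

    vectorizer = TfidfVectorizer(stop_words="english")        # tf-idf weighting of comment words
    V_text = vectorizer.fit_transform(docs)                   # one row per item
    V_text = normalize(V_text, norm="l2")                     # L2 length normalization of each row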

SLIDE 27

Results II

Performance Comparison


[Figure: Accuracy (%) on Last.fm and Yelp for K-means, SVD, NMF, MMLDA, MultiNMF, CoSC and CoNMF]

  • Effectiveness of CoNMF: performs best in both datasets.


SLIDE 28

Results IV

Parameter Study

  • CoNMF is stable across a wide range of parameters.
  • Due to the normalization, we suggest setting all regularization parameters to 1 when no prior knowledge informs their setting.

SLIDE 29
Discussion I

Users view utility

  • Question: Which users are more useful for clustering?
  • Conclusion:
  • 1. Active users are more useful for clustering.
  • 2. Filtering out less active users improves performance & efficiency.
  • 3. When the filtering is set too aggressively, performance suffers.

SLIDE 30

Discussion II

Comment-based Tag Generation


Table 5. Leading words of each cluster (drawn from the H matrix of the comment-words view)


SLIDE 31


Conclusion and Future Work

  • Major contribution:

– Systematically studied how to best utilize user comments for clustering Web 2.0 items.

– Both textual comments and commenting users are useful.
– Preprocessing is key for controlling noise.

– Formulated the problem as a multi-view clustering problem and proposed pair-wise CoNMF:

– Pair-wise co-regularization is more effective and robust to noisy views.

  • Future work:

– Can commenting timestamps aid clustering?


SLIDE 32


Thanks! Q&A?


SLIDE 33


Previous work – Multi-View Clustering (MVC)

  • Three ways to combine multiple views for clustering:

– Early Integration:
  • First integrated into a unified view, then input to a standard clustering algorithm.

– Late Integration:
  • Each view is clustered individually, then the results are merged to reach a consensus.

– Intermediate Integration


SLIDE 34


Previous work – Multi-View Clustering (MVC)

  • Three ways to combine multiple views for clustering:

– Early Integration
– Late Integration
– Intermediate Integration:
  • Views are fused during the clustering process.
  • Many classical clustering algorithms have extensions to support such multi-view clustering (MVC), e.g. K-means, Spectral Clustering, LDA.
  • We propose a method to extend NMF (Non-negative Matrix Factorization) for multi-view clustering.


SLIDE 35


Convergence after normalization

  • Without normalization:

– In each iteration, the update rules decrease the objective function J1.
– Naturally converges, but may sink into non-meaningful corner cases.

  • With normalization:

– In each iteration, J1 is changed by the normalization before the update rules are applied.
– The update rules decrease J1 with the normalized W and H (normalized descent).
– Does not naturally converge (may fluctuate in later iterations), but the normalized descent is more meaningful than purely decreasing J1 without normalization.
