Overlapping Clustering Models, and One (class) SVM to Bind Them All

Xueyu Mao
Department of Computer Science, The University of Texas at Austin
Neural Information Processing Systems, December 6, 2018
Joint work with Purnamrita Sarkar and Deepayan Chakrabarti
(Poster: Today 10:45 AM – 12:45 PM @ Room 517 AB #114)
Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti, SVM-cone (Poster: Today 10:45 AM – 12:45 PM @ #114)

Stochastic Blockmodel
$P = \Theta B \Theta^T$, where $\Theta \in \{0,1\}^{n \times K}$ has rows $\theta_i^T$ and $B$ is $K \times K$
Figure labels: $\Theta$: cluster memberships; $B$: community interconnections
Limitations:
◮ Each node belongs to exactly one community
◮ All nodes in the same community have the same expected degree
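A minimal sketch of the blockmodel on this slide, assuming NumPy; the function name `sbm_probability_matrix`, the toy labels, and the values in `B` are illustrative choices, not taken from the talk. It builds $P = \Theta B \Theta^T$ from hard labels and shows the second limitation: nodes in the same community share the same row sum of $P$.

```python
import numpy as np

def sbm_probability_matrix(labels, B):
    """Build one-hot memberships Theta and P = Theta @ B @ Theta.T."""
    labels = np.asarray(labels)
    Theta = np.eye(B.shape[0])[labels]   # n x K, exactly one 1 per row
    return Theta, Theta @ B @ Theta.T

# Hypothetical toy parameters: 6 nodes, K = 2 communities.
labels = [0, 0, 0, 1, 1, 1]
B = np.array([[0.8, 0.1],
              [0.1, 0.5]])
Theta, P = sbm_probability_matrix(labels, B)

# Row sums of P (self-pairs included for simplicity): identical within a community.
expected_degree = P.sum(axis=1)

# Sample a symmetric adjacency matrix without self-loops from P.
rng = np.random.default_rng(0)
upper = np.triu(rng.random(P.shape) < P, k=1)
A = (upper | upper.T).astype(int)
```

Because every row of `Theta` is one-hot, the row sums of `P` take only one value per community, which is the degree homogeneity the slide lists as a limitation.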

Extensions of Stochastic Blockmodel
◮ Mixed membership blockmodels (Airoldi et al. 2008) extend this to allow overlap
    ◮ $\theta_i$ is a distribution over $K$ communities
◮ Degree-corrected blockmodels (Karrer and Newman 2011) extend this to allow heterogeneous degree distributions
    ◮ Each node has a degree parameter $\gamma_i$
◮ There are many other extensions to model the above two properties
    ◮ DCMMSB (Jin et al. 2017)
    ◮ OCCAM (Zhang et al. 2014)
    ◮ SBMO (Kaufmann et al. 2016)

Overlapping clustering model
$P = \Gamma \Theta B \Theta^T \Gamma$, where $\Theta$ is $n \times K$ with rows $\theta_i^T$ and $\Gamma = \mathrm{diag}(\gamma_1, \dots, \gamma_n)$
Figure labels: $\Gamma$: degree parameters; $\Theta$: cluster memberships; $B$: community interconnections
◮ This covers many well-known overlapping clustering models:
    $\|\theta_i\|_1 = 1$: DCMMSB
    $\|\theta_i\|_2 = 1$: OCCAM
    $\theta_i \in \{0, 1\}^K$: SBMO
◮ The LDA topic model (Blei et al. 2003) is also a special case
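The general model can be sketched in a few lines, assuming NumPy; the helper name `overlapping_probability_matrix` and all numeric values are hypothetical. The sketch builds $P = \Gamma \Theta B \Theta^T \Gamma$, rescales the same memberships to the OCCAM-style unit $\ell_2$ constraint, and checks that one-hot memberships with unit degree parameters give back the plain blockmodel.

```python
import numpy as np

def overlapping_probability_matrix(gamma, Theta, B):
    """P = Gamma @ Theta @ B @ Theta.T @ Gamma with Gamma = diag(gamma)."""
    Gamma = np.diag(gamma)
    return Gamma @ Theta @ B @ Theta.T @ Gamma

# Hypothetical toy parameters, K = 2 communities and 4 nodes.
B = np.array([[1.0, 0.2],
              [0.2, 1.0]])
Theta = np.array([[1.0, 0.0],      # pure node, community 0
                  [0.0, 1.0],      # pure node, community 1
                  [0.5, 0.5],      # mixed node
                  [0.3, 0.7]])     # mixed node; rows sum to 1 (DCMMSB-style)
gamma = np.array([0.9, 0.6, 0.8, 0.5])
P = overlapping_probability_matrix(gamma, Theta, B)

# OCCAM-style constraint: same directions, rows rescaled to unit l2 norm.
Theta_occam = Theta / np.linalg.norm(Theta, axis=1, keepdims=True)

# Special case: one-hot rows and gamma = 1 reduce to the stochastic blockmodel.
Theta_hard = np.eye(2)[[0, 0, 1, 1]]
P_sbm = overlapping_probability_matrix(np.ones(4), Theta_hard, B)
```

Entry-wise, `P[i, j]` equals `gamma[i] * gamma[j] * Theta[i] @ B @ Theta[j]`, so the degree parameters scale a node's connection probabilities without changing its membership direction.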

Main idea
Model         Reference               Main idea
OCCAM         Zhang et al. 2014       $k$-median on regularized eigenvectors
SBMO          Kaufmann et al. 2016    Alternating minimization
MMSB          Mao et al. 2017         Finding $K$ corners of a simplex in $\mathbb{R}^K$
DCMMSB        Jin et al. 2017         Finding $K$ corners of a simplex in $\mathbb{R}^{K-1}$
Topic Models  Arora et al. 2013       Finding $K$ corners of a simplex in $\mathbb{R}^V$
All           This work               Finding extreme rays of a convex cone
◮ Let $V \in \mathbb{R}^{n \times K}$ be the top-$K$ eigenvectors of $P$
◮ Rows of $V$ form a cone
Figure: Each point is a row of $V$
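The cone claim can be checked numerically on a population matrix. This is a sketch assuming NumPy, a DCMMSB-style setup with Dirichlet memberships, and one known pure node per community (the first $K$ nodes); all parameter values are hypothetical. Every row of $V$ should then be a nonnegative combination of the pure-node rows, which are the generators (extreme rays) of the cone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3

# Hypothetical population parameters.
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:K] = np.eye(K)                       # nodes 0..K-1 are pure, one per community
gamma = rng.uniform(0.3, 1.0, size=n)
B = 0.1 * np.ones((K, K)) + 0.9 * np.eye(K)

X = gamma[:, None] * Theta                  # Gamma @ Theta
P = X @ B @ X.T                             # rank-K population matrix

# Top-K eigenvectors of P by eigenvalue magnitude.
eigvals, eigvecs = np.linalg.eigh(P)
V = eigvecs[:, np.argsort(np.abs(eigvals))[-K:]]

# Write every row of V in terms of the K pure-node rows: V = M @ V[:K].
M = V @ np.linalg.inv(V[:K])
```

In this setup the coefficients work out to `M[i, k] = gamma[i] * Theta[i, k] / gamma[k]`, so they are nonnegative for every node, and the sign ambiguity of the eigenvectors cancels out.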

Main idea
Figure: Normalize → One-class SVM
◮ SVM-cone:
    ◮ Normalize rows $v_i$ of $V$ to unit $\ell_2$ norm
        ◮ Each node lies on the intersection of the cone and the unit sphere
    ◮ Run a one-class SVM $\Rightarrow$ support vectors are the corners
    ◮ Estimate community memberships by regressing $v_i$ on these corners
◮ This is for the ideal “population” version
◮ Similar ideas provably work for the “empirical” version
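The normalization and regression steps can be sketched on a population matrix, assuming NumPy and hypothetical parameters. One loud assumption: the one-class SVM step is not implemented here. The corners are instead read off from known pure nodes as a stand-in, so this illustrates what the surrounding steps do, not the full SVM-cone algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 300, 3

# Hypothetical population parameters with two pure nodes per community.
Theta = rng.dirichlet(np.ones(K), size=n)
Theta[:2 * K] = np.tile(np.eye(K), (2, 1))  # nodes k and k + K are pure in community k
gamma = rng.uniform(0.3, 1.0, size=n)
B = 0.1 * np.ones((K, K)) + 0.9 * np.eye(K)

X = gamma[:, None] * Theta
P = X @ B @ X.T
eigvals, eigvecs = np.linalg.eigh(P)
V = eigvecs[:, np.argsort(np.abs(eigvals))[-K:]]

# Step 1: normalize rows to unit l2 norm; degree parameters drop out.
Y = V / np.linalg.norm(V, axis=1, keepdims=True)

# Stand-in for the one-class SVM (assumption): take corners from known pure nodes.
corners = Y[:K]

# Step 2: regress each row v_i on the corners, i.e. solve M @ corners = V.
M = np.linalg.lstsq(corners.T, V.T, rcond=None)[0].T

# Per-community scale, read off from the pure nodes (where theta_ik = 1).
scale = M[np.arange(K), np.arange(K)] / gamma[:K]
```

After normalization, pure nodes of the same community land on the same point of the unit sphere regardless of their degree parameters. The regression coefficients equal `gamma[i] * Theta[i, k]` times a per-community factor; turning them into memberships needs a further normalization that these slides do not spell out, so it is omitted.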

Per-node Consistency Guarantees
◮ This one algorithm yields consistency guarantees for
    ◮ community memberships of each node
        ◮ most algorithms show guarantees for the whole matrix
    ◮ for all overlapping clustering models mentioned earlier
◮ Example: Per-node consistency guarantee for DCMMSB (informal)
    If $\theta_i \sim \mathrm{Dirichlet}(\alpha)$, under a broad parameter regime, with high probability,
    $$\max_i \|\hat{\theta}_i - \theta_i\| = \tilde{O}\left(\frac{g}{\sqrt{\rho n}}\right),$$
    where $g$ depends on model parameters.
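To see why a per-node bound is stronger than a whole-matrix bound, the two error measures can be compared on a toy estimate. This sketch assumes NumPy, uses the $\ell_2$ norm for each row (the slide does not say which norm is meant), and assumes the community columns of the estimate are already aligned with the truth; the function names are hypothetical.

```python
import numpy as np

def per_node_error(theta_hat, theta):
    """max_i ||theta_hat_i - theta_i||_2, the quantity bounded per node."""
    return np.max(np.linalg.norm(theta_hat - theta, axis=1))

def average_error(theta_hat, theta):
    """Frobenius error divided by sqrt(n), a whole-matrix measure."""
    return np.linalg.norm(theta_hat - theta) / np.sqrt(theta.shape[0])

# Hypothetical case: 100 nodes, all estimated perfectly except one.
theta = np.tile([0.5, 0.5], (100, 1))
theta_hat = theta.copy()
theta_hat[0] = [1.0, 0.0]

worst = per_node_error(theta_hat, theta)
avg = average_error(theta_hat, theta)
```

Here the whole-matrix measure is ten times smaller than the worst per-node error, so a guarantee on the former says little about any individual node.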

Conclusions
◮ A simple and scalable algorithm
    Eigendecomposition ⇒ Row-normalize ⇒ One-class SVM ⇒ Regression
    ◮ infers community memberships for a broad class of overlapping clustering models
    ◮ with per-node consistency guarantees
◮ Good performance on several large-scale real-world datasets.
Poster: Today 10:45 AM – 12:45 PM @ Room 517 AB #114
