Using Local Neighborhoods to Find Subspace Clusters Emin Aksehirli - PowerPoint PPT Presentation

Using Local Neighborhoods to Find Subspace Clusters Emin Aksehirli with Bart Goethals, Emmanuel Müller and Jilles Vreeken

High Dimensional Data [figure slides]

Problem Setting
• Preserve local neighborhoods
• Combine different views on the data
• Produce explainable results

Transformation [figure: cartification turns the high-dimensional DB into an itemset (transaction) DB; clustering on the high-dimensional DB yields subspace clusters, and FIM on the itemset DB yields frequent patterns]

Cartification [figure slides]
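The slides show cartification only as figures, so the following is a minimal sketch under a stated assumption: that each object contributes, per dimension, one transaction (a "cart") holding the ids of its k nearest objects in that single dimension. The function name `cartify`, the parameter `k`, and the toy data are all hypothetical and are not taken from the presentation.

```python
from typing import FrozenSet, List


def cartify(data: List[List[float]], k: int) -> List[FrozenSet[int]]:
    """Build a transaction database of per-dimension neighborhoods.

    Assumption (not stated on the slides): for every dimension and every
    object, one transaction is emitted that holds the ids of the k objects
    closest to it in that dimension, the object itself included.
    """
    n = len(data)
    n_dims = len(data[0]) if n else 0
    transactions: List[FrozenSet[int]] = []
    for d in range(n_dims):
        column = [row[d] for row in data]
        for i in range(n):
            # Rank all object ids by 1-D distance to object i; ties go to the lower id.
            ranked = sorted(range(n), key=lambda j: (abs(column[j] - column[i]), j))
            transactions.append(frozenset(ranked[:k]))
    return transactions


# Hypothetical toy data: in dimension 0 the objects 0-2 and 3-5 form two
# tight groups; dimension 1 is spread out and carries no grouping.
toy = [
    [1.0, 10.0],
    [1.1, 50.0],
    [1.2, 90.0],
    [9.0, 30.0],
    [9.1, 70.0],
    [9.2, 20.0],
]
carts = cartify(toy, k=3)
```

In this toy case the six carts built from dimension 0 repeat the same two neighborhoods, while the carts from dimension 1 are all different, which is the kind of regularity a pattern miner can pick up.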

Frequent Itemset Mining [figure: cartified DB, frequent itemsets (FIs), original DB]

Cartification
• Frequent Itemset Mining solves our problem.
• It is not scalable.
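To make the two points above concrete, here is a small level-wise (Apriori-style) miner run on a hand-written transaction database. It is a sketch for illustration only: the slides do not say which mining algorithm was used, and `frequent_itemsets`, `min_support`, and the example transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(
    transactions: List[FrozenSet[int]], min_support: int
) -> Dict[FrozenSet[int], int]:
    """Return every itemset contained in at least min_support transactions."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent: Dict[FrozenSet[int], int] = {}
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(level)
        size = len(next(iter(level))) + 1 if level else 0
        # Join step: merge pairs of frequent sets that differ in one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: keep a candidate only if all its subsets are frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical transactions: two neighborhoods recur, three are one-offs.
db = [
    frozenset({0, 1, 2}), frozenset({0, 1, 2}), frozenset({0, 1, 2}),
    frozenset({3, 4, 5}), frozenset({3, 4, 5}), frozenset({3, 4, 5}),
    frozenset({0, 3, 5}), frozenset({1, 3, 4}), frozenset({1, 2, 4}),
]
result = frequent_itemsets(db, min_support=3)
largest = sorted(sorted(s) for s in result if len(s) == 3)
```

Here the largest frequent itemsets are the two recurring object groups. Even for this tiny input the miner also returns every frequent subset of those groups, and that growth in the number of patterns is one plausible reading of the scalability concern raised on the slide.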

Take 2

Uniform vs. Clusters [figure]

Running example [figure slides covering Dim 1 through Dim 4]

Experiments [figure: F1 score (0.0 to 1.0) of Our Method, CartiClus, FIRES, PROCLUS, STATPC and SUBCLU; x-axis values 1, 2, 4, 8, 16, 32, 100, 200]

Experiments [figure: run time in seconds of Our Method, CartiClus, FIRES, PROCLUS, STATPC and SUBCLU on datasets S1500, S2500, S3500, S4500 and S5500]

Experiments [figure: run time in seconds of Our Method, CartiClus, FIRES, PROCLUS, STATPC and SUBCLU on datasets D5, D10, D15, D25, D50 and D75]

Real World – MovieLens
Star Wars: A New Hope (a.k.a. Star Wars) (1977); Star Wars: The Empire Strikes Back (1980); Star Wars: Return of the Jedi (1983); LotR: The Fellowship of the Ring, The (2001); LotR: The Two Towers, The (2002); LotR: The Return of the King, The (2003); Back to the Future (1985); Terminator, The (1984); Terminator 2: Judgment Day (1991); Die Hard (1988); Terminator, The (1984); Terminator 2: Judgment Day (1991); Usual Suspects, The (1995); Pulp Fiction (1994); Silence of the Lambs, The (1991)

Real World – MovieLens
Star Wars: A New Hope (1977); Star Wars: The Empire Strikes Back (1980); Star Wars: Return of the Jedi (1983); LotR: The Fellowship of the Ring, The (2001); LotR: The Two Towers, The (2002); LotR: The Return of the King, The (2003)
Brazil (1985); Dr. Strangelove (1964); Clockwork Orange, A (1971); 2001: A Space Odyssey (1968); Blade Runner (1982); Alien (1979)
Chinatown (1974); Rear Window (1954); North by Northwest (1959); Vertigo (1958); Psycho (1960); Silence of the Lambs, The (1991)
Third Man, The (1949); Citizen Kane (1941); Godfather: Part II, The (1974); Chinatown (1974); Godfather, The (1972); Taxi Driver (1976)

Conclusion
• Preserves neighborhood information
• Combines different similarity measures gracefully
• Finds relevant features and discards noise
• Fast
• Produces explainable results
→ Code and data are available on our website.
Thank you!

Real World – Gene Expression

                 Alon    Nutt
Our method       0.78    0.78
PROCLUS          0.46    0.49
FIRES            0.52    0.55
SUBCLU           0.58    n/a
STATPC           n/a     n/a
CartiClus        n/a     n/a
# of Objects     62      50
# of Dims        2000    1377

More Experiments [figure: F1 score (0 to 1) of Our Method, CartiClus, FIRES, PROCLUS, STATPC and SUBCLU on datasets S1500, S2500, S3500, S4500 and S5500]

More Experiments [figure: F1 score (0 to 1) of Our Method, CartiClus, FIRES, PROCLUS, STATPC and SUBCLU on datasets D05, D10, D15, D25, D50 and D75]

Experiments
• Evaluate:
  - Subspace cluster detection
  - Noise robustness
  - Scalability
• Competitors:
  - Subspace clustering: PROCLUS
  - Clustering: K-Means
  - Dimensionality reduction: PCA and Random Projection
  - Clustering ensemble: CSPA

Results: Quality of the found clusters [figure: F1 score (0 to 1) on data with 10 clusters in 10 dimensions and 200 irrelevant dimensions; legend: E4SC, Our Method, CSPA, Proclus, K-Means, PCA+KM, RP+KM]
• Very effective at finding relevant dimensions.
