

  1. Efficient Distribution Mining and Classification Yasushi Sakurai (NTT Communication Science Labs), Rosalynn Chong (University of British Columbia), Lei Li (Carnegie Mellon University), Christos Faloutsos (Carnegie Mellon University)

  2. Classification for Distribution Data Sets
     - Given n distributions (n multi-dimensional vector sets), with a portion of them labeled and the others unlabeled
     - Classify the unlabeled distributions into the right group
       - Ex.: Distribution #1 and Distribution #2 fall into the same group
     [Figure: Distribution #1 (unknown), Distribution #2 (walking), Distribution #3 (jumping)]

  3. Scenario 1
     - Marketing research for e-commerce
     - Vectors: orders by each customer
       - Time the customer spent browsing
       - Number of pages the customer browsed
       - Number of items the customer bought
       - Sales price
       - Number of visits by each customer
     - Distributions: customers
     - Classification: identify customer groups who share similar traits
     - Find distribution groups for market segmentation, rule discovery, and anomaly detection
       - E.g., "Design an advertisement for each customer category"

  4. Scenario 2
     - User analysis for SNS systems (e.g., a blog hosting service)
     - Vectors: internet habits of each participant
       - Number of blog entries for every topic
       - Length of entries for every topic
       - Number of links in entries for every topic
       - Number of hours spent online
     - Distributions: SNS participants
     - Classification: identify participant groups who have similar internet habits
     - Find distribution groups to facilitate community creation
       - E.g., "Create communities according to users' interests"

  5. Representing Distributions
     - Histograms
       - Easy to update incrementally
       - Used in this work (see the sketch below)
     - Another option: probability density functions
     [Figure: a bucketized histogram built over a grid of cells]
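To make the histogram representation concrete, here is a minimal Python sketch of bucketing a sample set into a fixed grid, with counts updated one point at a time. The grid bounds, bin count, and single dimension are illustrative assumptions; the histograms in the talk are multi-dimensional.

```python
import numpy as np

def build_histogram(points, bins=8, lo=0.0, hi=8.0):
    """Bucket 1-d samples into `bins` equal-width cells; returns raw counts."""
    hist = np.zeros(bins)
    width = (hi - lo) / bins
    for x in points:
        idx = min(max(int((x - lo) / width), 0), bins - 1)  # clamp to the grid
        hist[idx] += 1  # one arriving point -> one incremental bucket update
    return hist

# Example: summarize 10,000 samples from a Gaussian as a single histogram.
hist = build_histogram(np.random.normal(4.0, 1.0, 10_000))
```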

  6. Background
     - Kullback-Leibler divergence
       - Measures the natural distance from one probability distribution P to another arbitrary probability distribution Q:

         d_{KL}(P, Q) = \int p_x \log(p_x / q_x) \, dx

       - One undesirable property: d_{KL}(P, Q) \neq d_{KL}(Q, P)
     - Symmetric KL-divergence (a code sketch of both follows below):

         d_{SKL}(P, Q) = \int p_x \log(p_x / q_x) \, dx + \int q_x \log(q_x / p_x) \, dx
                       = \int (p_x - q_x) \log(p_x / q_x) \, dx
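A minimal sketch of both divergences computed over histogram buckets. Histograms are normalized to probabilities first; the epsilon smoothing for empty buckets is our assumption, not something the slides specify.

```python
import numpy as np

EPS = 1e-10  # guards against empty buckets (an assumption, not from the slides)

def d_kl(p, q):
    """Asymmetric Kullback-Leibler divergence d_KL(P,Q) between histograms."""
    p = p / p.sum() + EPS
    q = q / q.sum() + EPS
    return float(np.sum(p * np.log(p / q)))

def d_skl(p, q):
    """Symmetric form: d_SKL(P,Q) = sum_i (p_i - q_i) * log(p_i / q_i)."""
    p = p / p.sum() + EPS
    q = q / q.sum() + EPS
    return float(np.sum((p - q) * np.log(p / q)))
```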

  7. Proposed Solution
     - Naïve approach
       - Create a histogram for each distribution of data
       - Compute the KL divergence directly from the histogram buckets p_i and q_i
       - Use any data mining method, e.g., classification, clustering, outlier detection (see the sketch below)
     [Figure: Distribution data -> Histograms -> Groups (Group 1, Group 2, Group 3)]
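As one instance of "any data mining method", here is a sketch of nearest-neighbor classification over the raw histograms, reusing the d_skl helper above; the function and argument names are ours. Each query compares against every training histogram, which is exactly the O(mt)-per-query cost that DualWavelet later reduces.

```python
import numpy as np

def classify_nn(query_hist, train_hists, train_labels):
    """Assign the label of the nearest labeled histogram under d_SKL."""
    dists = [d_skl(query_hist, h) for h in train_hists]  # O(m) per comparison
    return train_labels[int(np.argmin(dists))]           # O(mt) per query
```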

  8. Proposed Solution
     - DualWavelet (wavelet-based approach)
       - Create a histogram for each distribution of data
       - Represent each histogram p_i as wp_i and \hat{wp}_i using wavelets
         - wp_i: the wavelet of p_i
         - \hat{wp}_i: the wavelet of log(p_i)
       - Reduce the number of wavelets by selecting the c coefficients with the highest energy (c << m)
       - Compute the KL divergence from the wavelets
       - Use any data mining method, e.g., classification, clustering, outlier detection
     [Figure: Distribution data -> Histograms -> Wavelets (highest c coefficients) -> Groups]
     (A sketch of the representation follows below.)
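A hedged sketch of the dual representation: an orthonormal Haar transform (our choice here; any orthonormal wavelet would do) applied to both the histogram and its logarithm, each truncated to the c highest-energy coefficients.

```python
import numpy as np

def haar(x):
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # detail coefficients
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)              # running averages
    return np.concatenate([x] + details[::-1])

def keep_top_c(w, c):
    """Zero all but the c highest-energy (largest-magnitude) coefficients."""
    sparse = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-c:]
    sparse[keep] = w[keep]
    return sparse

def dual_wavelet(p, c, eps=1e-10):
    """Return (wp, wp_hat): truncated wavelets of p and of log(p)."""
    p = p / p.sum() + eps                 # normalize; eps avoids log(0)
    return keep_top_c(haar(p), c), keep_top_c(haar(np.log(p)), c)
```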

  9. DualWavelet
     - Theorem 1
       - Let wp_i and wq_i be the wavelets of p_i and q_i, and let \hat{wp}_i and \hat{wq}_i be the wavelets of log(p_i) and log(q_i), respectively (m: number of histogram bins; c: number of wavelet coefficients). We have

         d_{SKL}(P, Q) = \sum_{i=1}^{m} (p_i - q_i) \log(p_i / q_i)
                       = \sum_{i=1}^{m} (p_i - q_i)(\log p_i - \log q_i)
                       = \frac{1}{2} \sum_{i=1}^{m} \left[ (wp_i - \hat{wq}_i)^2 + (wq_i - \hat{wp}_i)^2 - (wp_i - \hat{wp}_i)^2 - (wq_i - \hat{wq}_i)^2 \right]

       - Hence the KL divergence can be computed from the wavelets; keeping only the c highest-energy coefficients gives a close approximation.
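The identity in Theorem 1 translates directly into code: an orthonormal wavelet transform preserves inner products (Parseval), so the four squared differences above recover \sum_i (p_i - q_i)(\log p_i - \log q_i). A minimal sketch, using the dual_wavelet helper sketched earlier:

```python
import numpy as np

def skl_from_wavelets(wp, wp_hat, wq, wq_hat):
    """d_SKL from the wavelets of p, log(p), q, log(q) (Theorem 1).

    Exact when all m coefficients are kept; an approximation once only the
    c highest-energy coefficients survive truncation.
    """
    return 0.5 * float(np.sum((wp - wq_hat) ** 2 + (wq - wp_hat) ** 2
                              - (wp - wp_hat) ** 2 - (wq - wq_hat) ** 2))

# Usage with the earlier sketches (histograms p_hist, q_hist):
# wp, wp_hat = dual_wavelet(p_hist, c=8)
# wq, wq_hat = dual_wavelet(q_hist, c=8)
# skl_from_wavelets(wp, wp_hat, wq, wq_hat)   # ~ d_skl(p_hist, q_hist)
```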

  10. Time Complexity
     - Naïve method for nearest neighbor classification: O(mnt) time
       - n: number of input distributions; m: number of grid cells
       - t: number of distributions in the training data set
     - DualWavelet
       - Wavelet transform: O(mn)
       - Classification: O(nt)
       - since c (the number of wavelet coefficients we use) is a small constant

  11. Space Complexity
     - Naïve method for nearest neighbor classification: O(mt) space
       - m: number of grid cells
       - t: number of distributions in the training data set
     - DualWavelet
       - Wavelet transform: O(m)
       - Classification: O(t)
       - since c (the number of wavelet coefficients we use) is a small constant

  12. GEM: Optimal Grid-Size Selection
     - Optimal granularity of a histogram: the optimal number of segments s_opt provides good accuracy plus reasonable computation cost
     - Proposed normalized KL divergence (GEM criterion):

         C_s(P, Q) = d_{SKL}(P, Q) / (H_s(P) + H_s(Q))

       where H_s denotes the histogram entropy at grid size s
     - Choose the s_opt that maximizes the pairwise criterion:

         s_opt(P, Q) = \arg\max_s C_s(P, Q)

     - Obtain s_opt(P, Q) for every sampled pair, then choose the maximum:

         s_opt = \max_{\text{all } (P,Q) \text{ pairs}} s_opt(P, Q)

     (A sketch of this selection loop follows below.)
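A hedged sketch of the selection loop, reusing the build_histogram and d_skl helpers above; the candidate grid sizes and helper names are illustrative assumptions.

```python
import numpy as np

def entropy(hist, eps=1e-10):
    """Shannon entropy H_s of a histogram normalized to probabilities."""
    p = hist / hist.sum() + eps
    return float(-np.sum(p * np.log(p)))

def gem_criterion(hp, hq):
    """C_s(P,Q) = d_SKL(P,Q) / (H_s(P) + H_s(Q))."""
    return d_skl(hp, hq) / (entropy(hp) + entropy(hq))

def optimal_grid_size(pairs, candidates=(4, 8, 16, 32, 64)):
    """Per-pair s_opt = argmax_s C_s(P,Q); overall s_opt = max over pairs."""
    def s_opt(points_p, points_q):
        return max(candidates,
                   key=lambda s: gem_criterion(build_histogram(points_p, bins=s),
                                               build_histogram(points_q, bins=s)))
    return max(s_opt(p, q) for p, q in pairs)
```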

  13. Experiments
     - Gaussians
       - n = 4,000 distributions, each with 10,000 points (dimension d = 3)
       - Mixtures of Gaussians with 1, 2, 2^d, and (2^d + 1) components
       - Same means, but different variances for each class
     - MoCap
       - n = 58 real running, jumping, and walking motions (d = 93)
       - Each dimension corresponds to the x, y, or z position of a body joint
       - Dimensionality reduction using SVD (d = 2)

  14. Classification (MoCap)
     - Confusion matrix for classification (rows: correct; columns: recovered)

       correct \ recovered   J    W    R
       Jumping               3    0    0
       Walking               0   22    1
       Running               0    1   19

  15. Computation Cost (Gaussians)
     - NaïveDense, which uses all histogram buckets
     - NaïveSparse, which uses only selected buckets (the largest values)
     - DualWavelet achieves a dramatic reduction in computation time

  16. Approximation Quality
     - Scatter plot of computation cost vs. approximation quality
     - Trade-off between quality and cost
     - DualWavelet gives significantly lower approximation error for the same computation time

  17. Conclusions
     - Addressed the problem of distribution classification and, more generally, distribution mining
     - Proposed a fast and effective method to solve it
     - Proposed to use wavelets on both the histograms and their logarithms
     - The solution can be applied to large datasets with multi-dimensional distributions
     - Experiments show that DualWavelet is significantly faster than the naïve implementation (up to 400 times)
