Distributed Data Mining for Pervasive and Privacy-Sensitive Applications

Hillol Kargupta
Dept. of Computer Science and Electrical Engineering,
University of Maryland Baltimore County
http://www.cs.umbc.edu/~hillol
hillol@cs.umbc.edu

Roadmap

■ Distributed Data Mining (DDM)
■ Pervasive and privacy-sensitive applications of DDM
■ Dealing with ensembles of data mining models
■ Linear representations for advanced meta-level analysis of models
■ Conclusions

Distributed Data Mining (DDM)

■ Distributed resources
  – Data
  – Computation and communication
  – Users
■ Data mining by properly exploiting the distributed resources

Distributed Resources and DDM

■ Distributed compute nodes connected by a fast communication network
  – Partition data if necessary and distribute the computation
■ Inherently distributed data that may not be collected to a single site or re-partitioned
  – Connected by a limited-bandwidth network
  – Privacy-sensitive data

Pervasive Applications: UMBC Fleet Health Monitoring

■ Vehicle health monitoring systems
■ Collect and analyze vehicle-related information
■ On-board/in situ data analysis
■ Send out interesting patterns
■ Analyze data for the entire fleet
■ UMBC fleet operations management

Continued…

■ Onboard real-time vehicle-mining system over a wireless network

Pervasive Applications: MobiMine

■ MobiMine System: a mobile data stream mining system for monitoring financial data

DDM from NASA EOS Distributed Data Repositories

Mining from Distributed Privacy-Sensitive Data

■ Analyze data without moving the data in its original form.
■ Many DDM algorithms are privacy-friendly since they minimize data communication.

Distributed Data Mining

[Architecture diagram: each local site (Site 1, Site 2) performs local mining followed by analysis & filtering; only the resulting models/patterns and filtered data are sent to a central site, which performs aggregation and analysis of the models/patterns.]
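The pipeline in the figure can be sketched in a few lines of code. This is an illustrative sketch only, not the deck's algorithm: the per-site data, the trivial per-class-centroid "model", and the count-weighted aggregation rule are all assumptions; the point is simply that models, not raw data, travel to the central site.

```python
import numpy as np

def local_mining(X, y):
    """Local mining at one site: fit a tiny per-class-centroid 'model' (illustrative only)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}, len(y)

def central_aggregation(models_and_counts):
    """Central site: combine the local models; the raw data never leaves the sites."""
    classes = set().union(*(m.keys() for m, _ in models_and_counts))
    return {c: sum(m[c] * n for m, n in models_and_counts if c in m)
               / sum(n for m, n in models_and_counts if c in m)
            for c in classes}

# Two sites with private local data (synthetic, for illustration only).
rng = np.random.default_rng(0)
site1 = (rng.normal(0.0, 1.0, (100, 3)), rng.integers(0, 2, 100))
site2 = (rng.normal(1.0, 1.0, (80, 3)),  rng.integers(0, 2, 80))

global_model = central_aggregation([local_mining(*site1), local_mining(*site2)])
print(global_model)   # per-class centroids built from exchanged models only
```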

Ensemble of Classifiers and Clusters

Weighted sum of the base classifiers f1(x), f2(x), f3(x), …, fn(x):

f(x) = ∑i ai fi(x)

ai : weight of the i-th base classifier
fi(x) : output of the i-th base classifier
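A minimal sketch of this weighted-sum combination; the base classifiers and weights below are placeholders rather than trained models:

```python
import numpy as np

# Placeholder base classifiers f_i(x); in practice these would be trained models.
base_classifiers = [
    lambda x: np.sign(x[0]),          # f1
    lambda x: np.sign(x[1] - 0.5),    # f2
    lambda x: np.sign(x[0] + x[1]),   # f3
]
weights = [0.5, 0.3, 0.2]             # a_i, e.g. derived from validation accuracy

def ensemble(x):
    """f(x) = sum_i a_i * f_i(x); the sign of the weighted sum gives the label."""
    return np.sign(sum(a * f(x) for a, f in zip(weights, base_classifiers)))

print(ensemble(np.array([0.7, -0.2])))   # 1.0
```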

Discrete Structures for Data Mining Models

■ Trees, and graphs more generally, are popular choices for data mining models:
  – Decision trees (tree)
  – Neural networks (graph)
  – Graphical models (graph)
  – Clusters (graph, hypergraph)
■ Dealing with ensembles requires an algebraic framework.

Examples

■ Eigen analysis of graphs (a small sketch follows this list):
  – Graphs can be represented using matrices
  – Eigen analysis of the Laplacian of graphs (Chung, 1997)
■ Wavelet, Fourier, or other representations of discrete structures??
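A minimal illustration of Laplacian eigen-analysis for a small, made-up undirected graph (this uses the unnormalized Laplacian; Chung's treatment works with the normalized one):

```python
import numpy as np

# Adjacency matrix of a small undirected graph: edges 0-1, 1-2, 1-3, 2-3 (made up).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # (unnormalized) graph Laplacian

# Eigen analysis of the Laplacian: the spectrum reflects connectivity, and the
# eigenvector of the second-smallest eigenvalue (the Fiedler vector) partitions the graph.
eigvals, eigvecs = np.linalg.eigh(L)
print("eigenvalues:", np.round(eigvals, 3))
print("Fiedler vector:", np.round(eigvecs[:, 1], 3))
```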

Decision Trees as Functions

■ A decision tree can be viewed as a numeric function.

[Figure: the play-tennis decision tree — Outlook at the root (Sunny / Overcast / Rain), the Sunny branch testing Humidity (High → No, Normal → Yes), Overcast → Yes, and the Rain branch testing Wind (Strong → No, Weak → Yes) — shown next to the same tree with its branches and leaves encoded numerically.]
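To make "tree as a function" concrete, here is the play-tennis tree written directly as a function returning 1 for Yes and 0 for No (a sketch; the numeric encoding on the slide may differ):

```python
def play_tennis(outlook, humidity, wind):
    """The decision tree above, written as a numeric function (1 = Yes, 0 = No)."""
    if outlook == "Sunny":
        return 1 if humidity == "Normal" else 0
    if outlook == "Overcast":
        return 1
    # outlook == "Rain"
    return 1 if wind == "Weak" else 0

print(play_tennis("Sunny", "High", "Weak"))     # 0
print(play_tennis("Rain",  "Normal", "Weak"))   # 1
```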

Fourier Representation of a Decision Tree

f(x) = ∑j wj Ψj(x)

wj : Fourier coefficient (FC);  Ψj(x) : Fourier basis function;  j : partition

[Figure: the numerically encoded play-tennis tree from the previous slide, annotated with its Fourier representation.]

Fourier Basis

f(x) = ∑j wj Ψj(x),  where  Ψj(x) = (-1)^(j · x)

Ψj(x) is the j-th Fourier basis function and wj is the corresponding Fourier coefficient:

wj = (1/N) ∑x f(x) Ψj(x),  with j, x ∈ {0, 1}^l  (so N = 2^l)
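These two formulas can be implemented directly by brute force over the 2^l domain points. The sketch below uses a hypothetical binary encoding of three attributes; the stand-in function and the encoding are assumptions for illustration, not the deck's:

```python
from itertools import product

l = 3  # three binary features, e.g. (Outlook == Sunny?, Humidity == High?, Wind == Strong?)

def f(x):
    """A tree-like Boolean function on {0,1}^3 standing in for the decision tree."""
    outlook_sunny, humidity_high, wind_strong = x
    if outlook_sunny:
        return 0 if humidity_high else 1
    return 0 if wind_strong else 1

def psi(j, x):
    """Fourier basis function: Psi_j(x) = (-1)^(j . x)."""
    return (-1) ** sum(jb * xb for jb, xb in zip(j, x))

domain = list(product([0, 1], repeat=l))       # all 2^l points
N = len(domain)

# w_j = (1/N) * sum_x f(x) * Psi_j(x)
spectrum = {j: sum(f(x) * psi(j, x) for x in domain) / N for j in domain}

# Reconstruction check: f(x) = sum_j w_j * Psi_j(x)
assert all(abs(sum(w * psi(j, x) for j, w in spectrum.items()) - f(x)) < 1e-9
           for x in domain)
print(spectrum)
```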

Partitions

A partition j is an l-bit Boolean string.
It can also be viewed as a subset of variables.
Example: partition 101 ⇒ {x1, x3}, the features at the locations indicated by the 1s in the partition.
Order of a partition = the number of 1s in the partition.

Fourier Spectrum of a Decision Tree

■ Very sparse representation; polynomial number of non-zero coefficients. If k is the depth of the tree, then all coefficients involving more than k features are zero.
■ Higher-order coefficients are exponentially smaller than the low-order coefficients (Kushilevitz and Mansour, 1990; Park and Kargupta, 2001).
■ The spectrum can be approximated by the low-order coefficients with significant magnitude (a small sketch follows this list).
■ Further details in [Linial, Mansour, Nisan, 1989], [Park, Ayyagari, Kargupta, 2001], [Kargupta et al., 2001].
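A toy check of the sparsity and approximation claims: a "tree-like" function that only looks at two of its four inputs has all of its energy in coefficients of order ≤ 2, so keeping just those low-order coefficients reproduces it exactly (the function and threshold are made up for the example):

```python
from itertools import product

l = 4
domain = list(product([0, 1], repeat=l))

# A depth-2 "tree-like" toy function: it looks only at x1 and x2.
f = lambda x: 1 if (x[0] == 1 and x[1] == 0) else 0

psi = lambda j, x: (-1) ** sum(a * b for a, b in zip(j, x))
spectrum = {j: sum(f(x) * psi(j, x) for x in domain) / len(domain) for j in domain}

# Keep only coefficients of order <= 2 with non-negligible magnitude.
kept = {j: w for j, w in spectrum.items() if sum(j) <= 2 and abs(w) > 1e-9}

f_hat = lambda x: sum(w * psi(j, x) for j, w in kept.items())
max_err = max(abs(f(x) - f_hat(x)) for x in domain)
print(f"{len(kept)} of {len(spectrum)} coefficients kept, max error = {max_err}")
```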

Exponential Decay of FCs (S&P 500 Index Data)

[Figure: the magnitude of the Fourier coefficients decays exponentially with order; a "sufficient spectrum" capturing 99% of the energy is preserved in the lower-order coefficients, which is the source of the compression.]

Fourier Spectrum and Decision Trees

Decision Tree ⇄ Fourier Spectrum

■ Developed efficient algorithms to
  – Compute the Fourier spectrum of a decision tree (IEEE TKDE, SIAM Data Mining Conf., IEEE Data Mining Conf., ACM SIGKDD Explorations)
  – Compute a tree from the Fourier spectrum (DMKD, SIGMOD 2002)

Aggregation of Multiple Decision Trees

■ Weighted average of decision trees through Fourier analysis

F1(x) = ∑j w1,j Ψj(x)
F2(x) = ∑j w2,j Ψj(x)
F3(x) = ∑j w3,j Ψj(x)

F(x) = a1·F1(x) + a2·F2(x) + a3·F3(x)
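Since every tree expands over the same basis functions, the weighted average can be taken coefficient by coefficient. A minimal sketch; the spectra and weights below are made-up values, not spectra of real trees:

```python
from collections import defaultdict

def aggregate_spectra(spectra, weights):
    """Combine the Fourier spectra of base trees: w_j = sum_i a_i * w_{i,j}."""
    combined = defaultdict(float)
    for spectrum, a in zip(spectra, weights):
        for j, w in spectrum.items():
            combined[j] += a * w
    return dict(combined)

# Toy spectra over 2-bit partitions (illustrative values only).
W1 = {"00": 0.50, "10": 0.25, "01": -0.25}
W2 = {"00": 0.75, "10": -0.25}
W3 = {"00": 0.25, "11": 0.50}

F = aggregate_spectra([W1, W2, W3], weights=[0.5, 0.3, 0.2])
print(F)   # spectrum of F(x) = 0.5*F1(x) + 0.3*F2(x) + 0.2*F3(x)
```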

Visualization of Decision Trees

■ FCs are color-coded according to their magnitude.
■ Brighter spots are more significant coefficients.
■ On clicking, the partition corresponding to the coefficient is displayed.

PCA-Based Visualization of Decision Trees

[Figure: decision trees plotted against the 1st and 2nd principal components.]

Redundancy Reduction: Orthogonal Decision Trees

[Figure: matrix D — one column per base tree (Tree1, Tree2, Tree3, Tree4) and one row per member of the domain, each entry being that tree's ±1 output; an additional column gives the true output of the target function over all domain members.]

PCA-Based Redundancy Reduction

■ Trees may share underlying redundancy.
■ Perform PCA; the eigenvectors tell us how to combine the trees to create a basis set.
■ Problems:
  1) Impractical: D is very large for most applications.
  2) You only get the weights of the base classifiers.
■ Approximating D over the training data (Merz and Pazzani, 1999).

Inner Product of Decision Trees and Fourier Transformation

■ Inner product between trees f1(x) and f2(x):

⟨f1, f2⟩ = ∑x f1(x) f2(x)

■ If w1 and w2 are the corresponding Fourier spectra, then:

⟨f1, f2⟩ = ⟨w1, w2⟩

Inner Product Matrices

[Figure: (a) the inner product matrix computed between the trees; (b) the same matrix computed between their Fourier spectra.]
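A quick numerical check of this identity for two made-up ±1 functions. One normalization assumption: with the earlier convention wj = (1/N) ∑x f(x) Ψj(x), the x-side inner product has to be averaged over the domain, (1/N) ∑x f1(x) f2(x), for the two sides to match exactly:

```python
from itertools import product

l = 3
domain = list(product([0, 1], repeat=l))
N = len(domain)
psi = lambda j, x: (-1) ** sum(a * b for a, b in zip(j, x))

f1 = lambda x: 1 if (x[0] and not x[1]) else -1     # toy "tree" 1
f2 = lambda x: 1 if (x[1] or x[2]) else -1          # toy "tree" 2

def spectrum(f):
    return {j: sum(f(x) * psi(j, x) for x in domain) / N for j in domain}

w1, w2 = spectrum(f1), spectrum(f2)

lhs = sum(f1(x) * f2(x) for x in domain) / N        # <f1, f2>, averaged over x
rhs = sum(w1[j] * w2[j] for j in domain)            # <w1, w2>
print(lhs, rhs)                                     # the two values agree
```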

The Fourier Spectra Matrix

■ Consider W, where Wi,j is the Fourier coefficient of the i-th basis from the spectrum of tree Tj.
■ WᵀW and DᵀD are identical.
■ W is a smaller matrix than D.
■ So we can efficiently compute the eigenvectors using WᵀW.
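A minimal NumPy sketch of this idea. The tree outputs in D are random stand-ins, and with the coefficient convention wj = (1/N) ∑x f(x) Ψj(x) used earlier, WᵀW equals DᵀD up to a 1/N factor; the point is that the small trees-by-trees matrix WᵀW is what gets eigen-decomposed:

```python
import numpy as np
from itertools import product

l, n_trees = 4, 3
domain = np.array(list(product([0, 1], repeat=l)))   # all 2^l points of {0,1}^l
N = len(domain)

# D: one column per base tree, one +/-1 output per domain point (random stand-ins).
rng = np.random.default_rng(0)
D = rng.choice([-1.0, 1.0], size=(N, n_trees))

# W: column j is the Fourier spectrum of tree j, w_j = (1/N) * sum_x f(x) * Psi_j(x).
Psi = (-1.0) ** (domain @ domain.T)                   # Psi[i, k] = (-1)^(j_i . x_k)
W = (Psi @ D) / N

# Orthogonality of the basis gives W^T W = (1/N) * D^T D, so PCA of the ensemble
# only needs the small trees-by-trees matrix W^T W instead of the huge D.
print(np.allclose(W.T @ W, (D.T @ D) / N))            # True
eigvals, eigvecs = np.linalg.eigh(W.T @ W)            # eigenvectors = tree combinations
print(np.round(eigvals, 4))
```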

Conclusions

■ Distributed data mining appears interesting for pervasive and privacy-sensitive applications.
■ We need meta-level techniques to analyze and aggregate the data mining models:
  – Stability of models/ensembles
  – Detecting changes in the model distribution
  – Many other issues…

Advertisement

■ IEEE Transactions on Systems, Man, and Cybernetics, Part B: Special Issue on Distributed and Mobile Data Mining
■ Deadline: January 1, 2003

http://www.cs.umbc.edu/~hillol/DKD/smcb_dmdm.html

Hillol Kargupta

Hillol Kargupta is an Assistant Professor in the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1996. He is also a co-founder of Agnik LLC, a ubiquitous data intelligence company. His research interests include mobile and distributed data mining, computation in gene expression, and genetic algorithms.

Dr. Kargupta won a National Science Foundation (NSF) CAREER award in 2001 for his research on ubiquitous and distributed data mining. His research is also funded by several other grants from NSF and NASA. He has also received support from the TRW Research Foundation, the American Cancer Society, the US Department of Energy, and Caterpillar. He won the 1997 Los Alamos Award for Outstanding Technical Achievement. His dissertation earned him the 1996 Society for Industrial and Applied Mathematics (SIAM) annual best student paper prize. He has published more than fifty peer-reviewed articles in journals, conferences, and books. He is the distributed data mining consultant for DaimlerChrysler.

He is the primary editor of a book entitled "Advances in Distributed and Parallel Knowledge Discovery", AAAI/MIT Press. His other recent activities include hosting the ACM SIGKDD-2000 workshop on Distributed and Parallel Knowledge Discovery (DPKD), the KDD-98 workshop on distributed data mining, and a special issue on DPKD in the Knowledge and Information Systems Journal. He is the co-chair of the IJCAI-2001 Workshop on Wrappers for Performance Enhancement in Knowledge Discovery in Databases. He is on the program/organizing committees of the 2001 & 2002 SIAM Data Mining Conferences and the 2001 ACM SIGKDD Conference, among several others. He is also the co-chair of a workshop on ubiquitous data mining at PKDD-2001. More information about him can be found at http://www.cs.umbc.edu/~hillol.