Distributed Data Mining for Pervasive and Privacy-Sensitive Applications

Hillol Kargupta
Dept. of Computer Science and Electrical Engineering,
University of Maryland Baltimore County
http://www.cs.umbc.edu/~hillol
hillol@cs.umbc.edu

Roadmap

■ Distributed Data Mining (DDM)
■ Pervasive and privacy-sensitive applications of DDM
■ Dealing with ensembles of data mining models
■ Linear representations for advanced meta-level analysis of models
■ Conclusions

Distributed Data Mining (DDM)

■ Distributed resources
  – Data
  – Computation and communication
  – Users
■ Data mining by properly exploiting the distributed resources

Distributed Resources and DDM

■ Distributed compute nodes connected by a fast communication network
  – Partition data if necessary and distribute the computation
■ Inherently distributed data that may not be collected to a single site or re-partitioned
  – Connected by a limited-bandwidth network
  – Privacy-sensitive data

Pervasive Applications: UMBC Fleet Health Monitoring

■ Vehicle health monitoring systems
■ Collect and analyze vehicle-related information
■ On-board/in situ data analysis
■ Send out interesting patterns
■ Analyze data for the entire fleet
■ UMBC fleet operations management

Continued…

■ Onboard real-time vehicle-mining system over a wireless network

Pervasive Applications: MobiMine

■ MobiMine System: a mobile data stream mining system for monitoring financial data

DDM from NASA EOS Distributed Data Repositories

Mining from Distributed Privacy-Sensitive Data

■ Analyze data without moving the data in its original form.
■ Many DDM algorithms are privacy-friendly since they minimize data communication.

Distributed Data Mining

[Architecture diagram: each local site (Site 1, Site 2) performs local mining followed by analysis & filtering; only the resulting models/patterns and filtered data are sent to a central site, which performs aggregation and analysis of the models/patterns.]
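The pipeline in the figure can be sketched in a few lines of code. This is an illustrative sketch only, not the deck's algorithm: the per-site data, the trivial per-class-centroid "model", and the count-weighted aggregation rule are all assumptions; the point is simply that models, not raw data, travel to the central site.

```python
import numpy as np

def local_mining(X, y):
    """Local mining at one site: fit a tiny per-class-centroid 'model' (illustrative only)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}, len(y)

def central_aggregation(models_and_counts):
    """Central site: combine the local models; the raw data never leaves the sites."""
    classes = set().union(*(m.keys() for m, _ in models_and_counts))
    return {c: sum(m[c] * n for m, n in models_and_counts if c in m)
               / sum(n for m, n in models_and_counts if c in m)
            for c in classes}

# Two sites with private local data (synthetic, for illustration only).
rng = np.random.default_rng(0)
site1 = (rng.normal(0.0, 1.0, (100, 3)), rng.integers(0, 2, 100))
site2 = (rng.normal(1.0, 1.0, (80, 3)),  rng.integers(0, 2, 80))

global_model = central_aggregation([local_mining(*site1), local_mining(*site2)])
print(global_model)   # per-class centroids built from exchanged models only
```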

Ensemble of Classifiers and Clusters

Weighted sum of the base classifiers f1(x), f2(x), f3(x), …, fn(x):

f(x) = ∑i ai fi(x)

ai : weight of the i-th base classifier
fi(x) : output of the i-th base classifier
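A minimal sketch of this weighted-sum combination; the base classifiers and weights below are placeholders rather than trained models:

```python
import numpy as np

# Placeholder base classifiers f_i(x); in practice these would be trained models.
base_classifiers = [
    lambda x: np.sign(x[0]),          # f1
    lambda x: np.sign(x[1] - 0.5),    # f2
    lambda x: np.sign(x[0] + x[1]),   # f3
]
weights = [0.5, 0.3, 0.2]             # a_i, e.g. derived from validation accuracy

def ensemble(x):
    """f(x) = sum_i a_i * f_i(x); the sign of the weighted sum gives the label."""
    return np.sign(sum(a * f(x) for a, f in zip(weights, base_classifiers)))

print(ensemble(np.array([0.7, -0.2])))   # 1.0
```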

Discrete Structures for Data Mining Models

■ Trees, and graphs more generally, are popular choices for data mining models:
  – Decision trees (tree)
  – Neural networks (graph)
  – Graphical models (graph)
  – Clusters (graph, hypergraph)
■ Dealing with ensembles requires an algebraic framework.

Examples

■ Eigen analysis of graphs (a small sketch follows this list):
  – Graphs can be represented using matrices
  – Eigen analysis of the Laplacian of graphs (Chung, 1997)
■ Wavelet, Fourier, or other representations of discrete structures??
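A minimal illustration of Laplacian eigen-analysis for a small, made-up undirected graph (this uses the unnormalized Laplacian; Chung's treatment works with the normalized one):

```python
import numpy as np

# Adjacency matrix of a small undirected graph: edges 0-1, 1-2, 1-3, 2-3 (made up).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # (unnormalized) graph Laplacian

# Eigen analysis of the Laplacian: the spectrum reflects connectivity, and the
# eigenvector of the second-smallest eigenvalue (the Fiedler vector) partitions the graph.
eigvals, eigvecs = np.linalg.eigh(L)
print("eigenvalues:", np.round(eigvals, 3))
print("Fiedler vector:", np.round(eigvecs[:, 1], 3))
```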

Decision Trees as Functions

■ A decision tree can be viewed as a numeric function.

[Figure: the play-tennis decision tree — Outlook at the root (Sunny / Overcast / Rain), the Sunny branch testing Humidity (High → No, Normal → Yes), Overcast → Yes, and the Rain branch testing Wind (Strong → No, Weak → Yes) — shown next to the same tree with its branches and leaves encoded numerically.]
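To make "tree as a function" concrete, here is the play-tennis tree written directly as a function returning 1 for Yes and 0 for No (a sketch; the numeric encoding on the slide may differ):

```python
def play_tennis(outlook, humidity, wind):
    """The decision tree above, written as a numeric function (1 = Yes, 0 = No)."""
    if outlook == "Sunny":
        return 1 if humidity == "Normal" else 0
    if outlook == "Overcast":
        return 1
    # outlook == "Rain"
    return 1 if wind == "Weak" else 0

print(play_tennis("Sunny", "High", "Weak"))     # 0
print(play_tennis("Rain",  "Normal", "Weak"))   # 1
```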

Fourier Representation of a Decision Tree

f(x) = ∑j wj Ψj(x)

wj : Fourier coefficient (FC);  Ψj(x) : Fourier basis function;  j : partition

[Figure: the numerically encoded play-tennis tree from the previous slide, annotated with its Fourier representation.]

Fourier Basis

f(x) = ∑j wj Ψj(x),  where  Ψj(x) = (-1)^(j · x)

Ψj(x) is the j-th Fourier basis function and wj is the corresponding Fourier coefficient:

wj = (1/N) ∑x f(x) Ψj(x),  with j, x ∈ {0, 1}^l  (so N = 2^l)
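These two formulas can be implemented directly by brute force over the 2^l domain points. The sketch below uses a hypothetical binary encoding of three attributes; the stand-in function and the encoding are assumptions for illustration, not the deck's:

```python
from itertools import product

l = 3  # three binary features, e.g. (Outlook == Sunny?, Humidity == High?, Wind == Strong?)

def f(x):
    """A tree-like Boolean function on {0,1}^3 standing in for the decision tree."""
    outlook_sunny, humidity_high, wind_strong = x
    if outlook_sunny:
        return 0 if humidity_high else 1
    return 0 if wind_strong else 1

def psi(j, x):
    """Fourier basis function: Psi_j(x) = (-1)^(j . x)."""
    return (-1) ** sum(jb * xb for jb, xb in zip(j, x))

domain = list(product([0, 1], repeat=l))       # all 2^l points
N = len(domain)

# w_j = (1/N) * sum_x f(x) * Psi_j(x)
spectrum = {j: sum(f(x) * psi(j, x) for x in domain) / N for j in domain}

# Reconstruction check: f(x) = sum_j w_j * Psi_j(x)
assert all(abs(sum(w * psi(j, x) for j, w in spectrum.items()) - f(x)) < 1e-9
           for x in domain)
print(spectrum)
```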

Partitions

A partition j is an l-bit Boolean string.
It can also be viewed as a subset of variables.
Example: partition 101 ⇒ {x1, x3}, the features at the locations indicated by the 1s in the partition.
Order of a partition = the number of 1s in the partition.

Fourier Spectrum of a Decision Tree

■ Very sparse representation; polynomial number of non-zero coefficients. If k is the depth of the tree, then all coefficients involving more than k features are zero.
■ Higher-order coefficients are exponentially smaller than the low-order coefficients (Kushilevitz and Mansour, 1990; Park and Kargupta, 2001).
■ The spectrum can be approximated by the low-order coefficients with significant magnitude (a small sketch follows this list).
■ Further details in [Linial, Mansour, Nisan, 1989], [Park, Ayyagari, Kargupta, 2001], [Kargupta et al., 2001].
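A toy check of the sparsity and approximation claims: a "tree-like" function that only looks at two of its four inputs has all of its energy in coefficients of order ≤ 2, so keeping just those low-order coefficients reproduces it exactly (the function and threshold are made up for the example):

```python
from itertools import product

l = 4
domain = list(product([0, 1], repeat=l))

# A depth-2 "tree-like" toy function: it looks only at x1 and x2.
f = lambda x: 1 if (x[0] == 1 and x[1] == 0) else 0

psi = lambda j, x: (-1) ** sum(a * b for a, b in zip(j, x))
spectrum = {j: sum(f(x) * psi(j, x) for x in domain) / len(domain) for j in domain}

# Keep only coefficients of order <= 2 with non-negligible magnitude.
kept = {j: w for j, w in spectrum.items() if sum(j) <= 2 and abs(w) > 1e-9}

f_hat = lambda x: sum(w * psi(j, x) for j, w in kept.items())
max_err = max(abs(f(x) - f_hat(x)) for x in domain)
print(f"{len(kept)} of {len(spectrum)} coefficients kept, max error = {max_err}")
```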

Exponential Decay of FCs (S&P 500 Index Data)

[Figure: the magnitude of the Fourier coefficients decays exponentially with order; a "sufficient spectrum" capturing 99% of the energy is preserved in the lower-order coefficients, which is the source of the compression.]

Fourier Spectrum and Decision Trees

Decision Tree ⇄ Fourier Spectrum

■ Developed efficient algorithms to
  – Compute the Fourier spectrum of a decision tree (IEEE TKDE, SIAM Data Mining Conf., IEEE Data Mining Conf., ACM SIGKDD Explorations)
  – Compute a tree from the Fourier spectrum (DMKD, SIGMOD 2002)

Aggregation of Multiple Decision Trees

■ Weighted average of decision trees through Fourier analysis

F1(x) = ∑j w1,j Ψj(x)
F2(x) = ∑j w2,j Ψj(x)
F3(x) = ∑j w3,j Ψj(x)

F(x) = a1·F1(x) + a2·F2(x) + a3·F3(x)
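Since every tree expands over the same basis functions, the weighted average can be taken coefficient by coefficient. A minimal sketch; the spectra and weights below are made-up values, not spectra of real trees:

```python
from collections import defaultdict

def aggregate_spectra(spectra, weights):
    """Combine the Fourier spectra of base trees: w_j = sum_i a_i * w_{i,j}."""
    combined = defaultdict(float)
    for spectrum, a in zip(spectra, weights):
        for j, w in spectrum.items():
            combined[j] += a * w
    return dict(combined)

# Toy spectra over 2-bit partitions (illustrative values only).
W1 = {"00": 0.50, "10": 0.25, "01": -0.25}
W2 = {"00": 0.75, "10": -0.25}
W3 = {"00": 0.25, "11": 0.50}

F = aggregate_spectra([W1, W2, W3], weights=[0.5, 0.3, 0.2])
print(F)   # spectrum of F(x) = 0.5*F1(x) + 0.3*F2(x) + 0.2*F3(x)
```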

Visualization of Decision Trees

■ FCs are color-coded according to their magnitude.
■ Brighter spots are more significant coefficients.
■ On clicking, the partition corresponding to the coefficient is displayed.

PCA-Based Visualization of Decision Trees

[Figure: decision trees plotted against the 1st and 2nd principal components.]

Redundancy Reduction: Orthogonal Decision Trees

[Figure: matrix D — one column per base tree (Tree1, Tree2, Tree3, Tree4) and one row per member of the domain, each entry being that tree's ±1 output; an additional column gives the true output of the target function over all domain members.]

PCA-Based Redundancy Reduction

■ Trees may share underlying redundancy.
■ Perform PCA; the eigenvectors tell us how to combine the trees to create a basis set.
■ Problems:
  1) Impractical: D is very large for most applications.
  2) You only get the weights of the base classifiers.
■ Approximating D over the training data (Merz and Pazzani, 1999).

Inner Product of Decision Trees and Fourier Transformation

■ Inner product between trees f1(x) and f2(x):

⟨f1, f2⟩ = ∑x f1(x) f2(x)

■ If w1 and w2 are the corresponding Fourier spectra, then:

⟨f1, f2⟩ = ⟨w1, w2⟩

Inner Product Matrices

[Figure: (a) the inner product matrix computed between the trees; (b) the same matrix computed between their Fourier spectra.]
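A quick numerical check of this identity for two made-up ±1 functions. One normalization assumption: with the earlier convention wj = (1/N) ∑x f(x) Ψj(x), the x-side inner product has to be averaged over the domain, (1/N) ∑x f1(x) f2(x), for the two sides to match exactly:

```python
from itertools import product

l = 3
domain = list(product([0, 1], repeat=l))
N = len(domain)
psi = lambda j, x: (-1) ** sum(a * b for a, b in zip(j, x))

f1 = lambda x: 1 if (x[0] and not x[1]) else -1     # toy "tree" 1
f2 = lambda x: 1 if (x[1] or x[2]) else -1          # toy "tree" 2

def spectrum(f):
    return {j: sum(f(x) * psi(j, x) for x in domain) / N for j in domain}

w1, w2 = spectrum(f1), spectrum(f2)

lhs = sum(f1(x) * f2(x) for x in domain) / N        # <f1, f2>, averaged over x
rhs = sum(w1[j] * w2[j] for j in domain)            # <w1, w2>
print(lhs, rhs)                                     # the two values agree
```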

The Fourier Spectra Matrix

■ Consider W, where Wi,j is the Fourier coefficient of the i-th basis from the spectrum of tree Tj.
■ WᵀW and DᵀD are identical.
■ W is a smaller matrix than D.
■ So we can efficiently compute the eigenvectors using WᵀW.
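A minimal NumPy sketch of this idea. The tree outputs in D are random stand-ins, and with the coefficient convention wj = (1/N) ∑x f(x) Ψj(x) used earlier, WᵀW equals DᵀD up to a 1/N factor; the point is that the small trees-by-trees matrix WᵀW is what gets eigen-decomposed:

```python
import numpy as np
from itertools import product

l, n_trees = 4, 3
domain = np.array(list(product([0, 1], repeat=l)))   # all 2^l points of {0,1}^l
N = len(domain)

# D: one column per base tree, one +/-1 output per domain point (random stand-ins).
rng = np.random.default_rng(0)
D = rng.choice([-1.0, 1.0], size=(N, n_trees))

# W: column j is the Fourier spectrum of tree j, w_j = (1/N) * sum_x f(x) * Psi_j(x).
Psi = (-1.0) ** (domain @ domain.T)                   # Psi[i, k] = (-1)^(j_i . x_k)
W = (Psi @ D) / N

# Orthogonality of the basis gives W^T W = (1/N) * D^T D, so PCA of the ensemble
# only needs the small trees-by-trees matrix W^T W instead of the huge D.
print(np.allclose(W.T @ W, (D.T @ D) / N))            # True
eigvals, eigvecs = np.linalg.eigh(W.T @ W)            # eigenvectors = tree combinations
print(np.round(eigvals, 4))
```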

Conclusions

■ Distributed data mining appears interesting for pervasive and privacy-sensitive applications.
■ We need meta-level techniques to analyze and aggregate the data mining models:
  – Stability of models/ensembles
  – Detecting changes in the model distribution
  – Many other issues…

Advertisement

■ IEEE Transactions on Systems, Man, and Cybernetics, Part B: Special Issue on Distributed and Mobile Data Mining
■ Deadline: January 1, 2003

http://www.cs.umbc.edu/~hillol/DKD/smcb_dmdm.html

Hillol Kargupta

Hillol Kargupta is an Assistant Professor in the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1996. He is also a co-founder of Agnik LLC, a ubiquitous data intelligence company. His research interests include mobile and distributed data mining, computation in gene expression, and genetic algorithms.

Dr. Kargupta won a National Science Foundation (NSF) CAREER award in 2001 for his research on ubiquitous and distributed data mining. His research is also funded by several other grants from NSF and NASA. He has also received support from the TRW Research Foundation, the American Cancer Society, the US Department of Energy, and Caterpillar. He won the 1997 Los Alamos Award for Outstanding Technical Achievement. His dissertation earned him the 1996 Society for Industrial and Applied Mathematics (SIAM) annual best student paper prize. He has published more than fifty peer-reviewed articles in journals, conferences, and books. He is the distributed data mining consultant for DaimlerChrysler.

He is the primary editor of a book entitled "Advances in Distributed and Parallel Knowledge Discovery", AAAI/MIT Press. His other recent activities include hosting the ACM SIGKDD-2000 workshop on Distributed and Parallel Knowledge Discovery (DPKD), the KDD-98 workshop on distributed data mining, and a special issue on DPKD in the Knowledge and Information Systems Journal. He is the co-chair of the IJCAI-2001 Workshop on Wrappers for Performance Enhancement in Knowledge Discovery in Databases. He is on the program/organizing committees of the 2001 & 2002 SIAM Data Mining Conferences and the 2001 ACM SIGKDD Conference, among several others. He is also the co-chair of a workshop on ubiquitous data mining at PKDD-2001. More information about him can be found at http://www.cs.umbc.edu/~hillol.