
Distributed Data Mining: Current Pleasures and Emerging Applications
Hillol Kargupta
University of Maryland, Baltimore County and AGNIK
www.cs.umbc.edu/~hillol
Acknowledgements: Wes Griffin, Souptik Datta, Kanishka Bhaduri, Kamalika Das, Ran Wolff, Chris Giannella

Roadmap
• Distributed Data Mining: Why Bother?
• Some Emerging Applications
• Local Algorithms
  • Exact Local Algorithms
  • Approximate Local Algorithms
• Resources

Data Mining and Distributed Data Mining
• Data Mining: Scalable analysis of data by paying careful attention to the resources:
  • computing,
  • communication,
  • storage, and
  • human-computer interaction.
• Distributed data mining (DDM): Mining data using distributed resources.

Data Mining for Distributed and Ubiquitous Environments: Applications
• Mining large databases from distributed sites
  • Grid data mining in Earth Science, Astronomy, Counter-terrorism, Bioinformatics
• Monitoring multiple time-critical data streams
  • Monitoring vehicle data streams in real time
  • Monitoring physiological data streams
• Analyzing data in lightweight sensor networks and mobile devices
  • Limited network bandwidth
  • Limited power supply
• Preserving privacy
  • Security/safety related applications
• Peer-to-peer data mining
  • Large decentralized asynchronous environments

Vehicles: Source of High Volume Data Streams
• Vehicles generate tons of data
• Hundreds of different parameters from different subsystems
• High throughput data streams
• So what?

Why Mine Vehicle Data?
• Fuel consumption analysis
• Fleet analytics
• Vehicle benchmarking
• Predictive health-monitoring
• Driver behavior analytics
High gas prices
Breakdowns cost thousands of dollars
Bad driving costs money: fuel, brake shoes, insurance, lawsuits

From Concept to Commercial Product
• First prototype (circa 2001): PDA-based platform
  • Other choices: cell phones and low-cost, less powerful embedded devices
• Market entry point (circa 2005)
  • Location management companies
  • M2M companies
• Low-cost embedded GPS devices (circa 2007)
  • Resource constrained
  • 3-4K run-time memory
  • 250K footprint
  • Resource sharing with GPS program

Private & Secure Data Mining from Multi-Party Distributed Data
• Compute global patterns without direct access to the multi-party raw distributed data
• Minimize communication cost
• Must come with provably correct guarantees with respect to a given privacy model
• Must be scalable with respect to
  • number of data sites
  • size of the data
• Privacy-preserving data mining
  • Blends in "pattern-preserving" transformations with data analysis
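One way to picture "compute global patterns without direct access to the raw data" is the textbook ring-based secure sum. The sketch below is an illustration only, not necessarily the technique used in the talk: the function name `secure_sum`, the modulus, and the per-site counts are all hypothetical, and it assumes honest-but-curious sites that do not collude. A single process simulates the ring; in a real deployment each loop iteration would run at a different site.

```python
import random


def secure_sum(local_values, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum over non-negative integers.

    Assumption: the true total is smaller than `modulus`.
    """
    rng = rng or random.Random()
    # The initiating site draws a random mask that only it knows.
    mask = rng.randrange(modulus)
    running = mask
    # Each site adds its private value to the masked running total and
    # passes it on; a site only ever sees a uniformly masked number.
    for value in local_values:
        running = (running + value) % modulus
    # The total returns to the initiator, which removes its mask.
    return (running - mask) % modulus


# Hypothetical per-site counts; no site reveals its own count directly.
site_counts = [120, 45, 300]
print(secure_sum(site_counts))  # prints 465
```

Only one number travels per hop, which also keeps the communication cost linear in the number of sites. The guarantee is weak, though: two colluding neighbours can recover the value of the site between them, so stronger privacy models need stronger protocols.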

How PURSUIT Works for the User
• Need to have your own sensor such as SNORT, MINDS
• Download PURSUIT plug-in for the sensor and install
• PURSUIT plug-in offers
  • A stand-alone interface for processing your alerts from the sensor and cross-domain analysis
  • Web account for detailed cross-domain statistics
  • Optional distributed collaboration management module for managing the threats and archiving forensics
PURSUIT Web Site

Peer-to-peer (P2P) Networks
• Relies primarily on the computing resources of the participants in the network rather than a relatively low number of servers.
• P2P networks are typically used for connecting nodes via largely ad hoc connections.
• No central administrator/coordinator
• Peers simultaneously function as both "clients" and "servers"
• Privacy is an important issue in most P2P applications

Where do we find P2P Networks?
• Applications:
  • File-sharing networks: KaZaA, Napster, Gnutella
  • P2P network storage, web caching
  • P2P bio-informatics
  • P2P astronomy
  • P2P information retrieval
  • P2P sensor networks?
  • P2P Mobile Ad-hoc NETwork (MANET)?
• Next generation:
  • P2P search engines, social networking, digital libraries, P2P "YouTube"?

P2P Web Mining
• Web mining in a server-less environment
• Useful browser data
  • Web-browser history
  • Browser cache
  • Click-stream data stored at browser (browsing pattern)
  • Search queries typed in the search engine
  • User profile
  • Bookmarks
• Challenges
  • Indexing, clustering, data analysis in a decentralized asynchronous manner
  • Scalability
  • Privacy
