DATA ANALYTICS USING DEEP LEARNING
Pooja Bhandary
TOWARDS A UNIFIED ARCHITECTURE FOR IN-RDBMS ANALYTICS
GT 8803 // Fall 2018
TODAY'S PAPER
- Towards a Unified Architecture for In-RDBMS Analytics, SIGMOD 2012.
- From the Hazy project at the Department of Computer Science, University of Wisconsin-Madison.
TODAY’S AGENDA
- Motivation
- Problem Overview
- Key Idea
- Technical Details
- Experiments
- Discussion
Motivation
Problem Overview
- Ad hoc development cycle for incorporating new analytical tasks.
- Performance optimization on a per-module basis.
- Limited code reusability.
In-RDBMS Analytics Architecture
High Level Idea
- Devise a unified architecture capable of supporting many data analytics techniques.
- Frame analytical tasks as convex programming problems.
Main Contributions
- Bismarck, a unified architecture for in-RDBMS analytics.
- Identification of factors that impact performance, along with relevant optimizations.
Bismarck
Convex Optimization
Gradient Descent
Incremental Gradient Descent
- $w^{(k+1)} = w^{(k)} - \alpha_k \, \nabla f(w^{(k)}, x_k)$, where $x_k$ is the single data item visited at step $k$ and $\alpha_k$ is the step size.
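One IGD update step can be sketched in a few lines. This is a minimal illustration, assuming a squared loss on a single example; the function name, loss, and step size are illustrative, not from the paper.

```python
def igd_step(w, x, y, alpha):
    """One incremental gradient step: w <- w - alpha * grad f(w, (x, y)),
    using the gradient of the squared loss (w.x - y)^2 on one example."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    g = 2.0 * (pred - y)                       # d/d(pred) of (pred - y)^2
    return [wi - alpha * g * xi for wi, xi in zip(w, x)]
```

Each call touches only one tuple, which is what makes the update a good fit for a row-at-a-time RDBMS execution model.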
Incremental Gradient Descent
- IGD's data-access properties make it amenable to an efficient in-RDBMS implementation.
- IGD approximates the full gradient ∇F using only one term at a time.
Technical Approach
- IGD can be implemented using a classic RDBMS
abstraction called a UDA (user-defined aggregate).
User-Defined Aggregate (UDA)
- Initialize
- Transition
- Finalize
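The three UDA callbacks map naturally onto IGD. Below is a minimal Python sketch of that mapping; the class name, squared loss, and step size are illustrative assumptions, and a real deployment would register these functions through the database's aggregate API rather than run them as plain Python.

```python
class IGDAggregate:
    """UDA-style aggregate: initialize / transition / finalize."""

    def initialize(self, dim, alpha=0.1):
        # Aggregation state: the model vector and the step size.
        self.w = [0.0] * dim
        self.alpha = alpha

    def transition(self, x, y):
        # One gradient step per tuple, as the RDBMS streams rows in.
        pred = sum(wi * xi for wi, xi in zip(self.w, x))
        g = 2.0 * (pred - y)
        self.w = [wi - self.alpha * g * xi for wi, xi in zip(self.w, x)]

    def finalize(self):
        # Return the learned model as the aggregate's result.
        return self.w
```

The key point is that the model itself is the aggregation state, so training reduces to running an aggregate query over the training table.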
Performance Optimizations
- Data Ordering
- Parallelizing Gradient Computations
- Avoiding Shuffling Overhead
Data Ordering
- Data in an RDBMS is often clustered, which can slow convergence.
- Shuffling at every epoch is computationally expensive.
- Solution: shuffle the data once, before the first epoch.
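The shuffle-once strategy amounts to a small change in the training driver: pay for one permutation up front, then reuse that order every epoch. A hypothetical driver loop, not the paper's implementation:

```python
import random

def train_shuffle_once(data, step, w, epochs, seed=0):
    """Permute the data once, then run every epoch over that fixed order."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)              # paid once, before the first epoch
    for _ in range(epochs):
        for x, y in data:          # same de-clustered order every epoch
            w = step(w, x, y)
    return w
```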
Parallelizing Gradient Computations
- Pure UDA (shared-nothing)
Requires a merge function to combine per-worker models; can lead to suboptimal runtime results.
- Shared-memory UDA
Implemented in user space. The model being learned is kept in shared memory and updated concurrently by parallel threads.
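The shared-memory variant can be sketched with threads mutating a single model in place, with no merge step and no locking (lock-free, Hogwild-style updates). This is an illustrative Python sketch under assumed names and a squared loss; Bismarck implements the idea inside the database engine, not in Python.

```python
import threading

def parallel_igd(partitions, w, alpha=0.05):
    """Worker threads take gradient steps on one shared model vector."""
    def worker(rows):
        for x, y in rows:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            g = 2.0 * (pred - y)
            for i, xi in enumerate(x):
                w[i] -= alpha * g * xi   # concurrent, unsynchronized update

    threads = [threading.Thread(target=worker, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

Occasional lost updates from races only cost a few effective steps; for sparse models the threads rarely collide, which is why skipping locks can pay off.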
Avoiding Shuffling Overhead
- Shuffling even once may not be feasible for very large datasets.
- Straightforward reservoir sampling can slow the convergence rate, because it discards data items whose gradient steps could have sped up convergence.
Multiplexed Reservoir Sampling
- Combines reservoir sampling with the concurrent-update model.
- Multiplexes gradient steps over both the reservoir sample and the data that does not enter the reservoir buffer.
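A serial sketch of the idea: build a size-k reservoir while streaming, but instead of discarding non-sampled tuples, take a gradient step on them (or on the tuple they evict). The paper multiplexes this across concurrent threads; the single-threaded form and all names here are illustrative assumptions.

```python
import random

def mrs_pass(stream, k, step, w, seed=0):
    """One streaming pass: classic reservoir sampling, plus a gradient
    step on every tuple that does not end up in the reservoir."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)           # fill the reservoir first
        else:
            j = rng.randrange(i + 1)
            if j < k:
                evicted = reservoir[j]
                reservoir[j] = item          # item joins the reservoir
                w = step(w, *evicted)        # evicted tuple still does work
            else:
                w = step(w, *item)           # non-sampled tuple does work too
    return w, reservoir
```

Later epochs can then iterate over the in-memory reservoir cheaply, while no gradient information from the first pass was thrown away.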
Multiplexed Reservoir Sampling
Evaluation
1) Implement Bismarck over PostgreSQL and two other commercial databases.
2) Compare its performance with the native analytical tools provided by those RDBMSs.
Tasks and Datasets
- 1. Logistic Regression (LR): Forest, DBLife
- 2. Support Vector Machine (SVM): Forest, DBLife
- 3. Low-Rank Matrix Factorization (LMF): MovieLens
- 4. Conditional Random Fields Labeling (CRF): CoNLL
Benchmarking Results
Dataset         | Task | PostgreSQL           | DBMS A               | DBMS B (8 segments)
                |      | BISMARCK | MADlib    | BISMARCK | Native    | BISMARCK | Native
----------------|------|----------|-----------|----------|-----------|----------|---------
Forest (Dense)  | LR   |      8.0 |      43.5 |     40.2 |     489.0 |      3.7 |    17.0
Forest (Dense)  | SVM  |      7.5 |     140.2 |     32.7 |      66.7 |      3.3 |    19.2
DBLife (Sparse) | LR   |      0.8 |       N/A |      9.8 |      20.6 |      2.3 |     N/A
DBLife (Sparse) | SVM  |      1.2 |       N/A |     11.6 |       4.8 |      4.1 |     N/A
MovieLens       | LMF  |     36.0 |   29325.7 |    394.7 |       N/A |     11.9 | 17431.3
Impact of Data Ordering
Scalability Test
Strengths
- 1. Incorporating a new task requires changing only a few lines of code.
- 2. Shorter development cycles.
- 3. Performance optimizations are generic across tasks.
Weaknesses
- The effect of data clustering on the convergence rate is supported only by theoretical inference.
- Applies only to analytical tasks that can be expressed as a convex optimization problem.
Reflections
References