DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

NETWORK DATA MODEL Schema SUPPLIER SU ER PART PA (sno, sname, scity, sstate) (pno, pname, psize) SU SUPPLIES ES SU SUPPLIED ED_BY SUPPLY SU (qty, price) GT 8803 // Fall 2019 39

NETWORK DATA MODEL Schema Complex Queries SU SUPPLIER ER PART PA (sno, sname, scity, sstate) (pno, pname, psize) SU SUPPLIES ES SU SUPPLIED ED_BY SUPPLY SU (qty, price) GT 8803 // Fall 2019 40

NETWORK DATA MODEL Schema Complex Queries SU SUPPLIER ER PA PART (sno, sname, scity, sstate) (pno, pname, psize) Easily Corrupted SU SUPPLIES ES SU SUPPLIED ED_BY SUPPLY SU (qty, price) GT 8803 // Fall 2019 41

NETWORK DATA MODEL • Advantages – Graph structured data models are less restrictive • Limitations – Poorer physical and logical data independence : Cannot freely change physical data storage organization or change logical application schema – Slow loading and recovery: Data is typically stored in one large network. This much larger object had to be bulk-loaded all at once, leading to very long load times. GT 8803 // Fall 2019 42

1970 s – RELATIONAL MODEL • Ted Codd was a mathematician working at IBM Research. – He saw developers spending their time rewriting IMS and Codasyl programs every time the database’s schema or layout changed. • Relational abstraction to avoid this: – Store database in simple data structures. Codd – Access data via high-level declarative language. – Physical storage left up to implementation. GT 8803 // Fall 2019 43

RELATIONAL DATA MODEL Schema SU SUPPLIER ER PA PART (sno, sname, scity, sstate) (pno, pname, psize) SU SUPPLY (sno, pno, qty, price) GT 8803 // Fall 2019 44

48 GT 8803 // Fall 2018

RELATIONAL DATA MODEL • Advantages – Set-a-time languages are good, regardless of the data model, since they offer physical data independence – Logical data independence is easier with a simple data model than with a complex one. – Query optimizers can beat all but the best tuple-at-a- time DBMS application programmers GT 8803 // Fall 2019 49

RELATIONAL DATA MODEL • Early implementations of relational DBMS: – System R – IBM Research – INGRES – U.C. Berkeley – Oracle – Larry Ellison Gray Stonebraker Ellison GT 8803 // Fall 2019 50

RELATIONAL DATA MODEL • The relational model wins. – IBM comes out with DB2 in 1983. – “SEQUEL” becomes the standard (SQL). • Many new “enterprise” DBMSs but Oracle wins marketplace. GT 8803 // Fall 2019 51

1980 s – OBJECT-ORIENTED DATABASES • Avoid “relational-object impedance mismatch” by tightly coupling objects and database. • Few of these original DBMSs from the 1980s still exist today but many of the technologies exist in other forms (JSON, XML) GT 8803 // Fall 2019 52

OBJECT-ORIENTED MODEL Application Code class Student { int id; String name; String email; String phone[]; } GT 8803 // Fall 2019 53

OBJECT-ORIENTED MODEL Application Code Relational Schema class Student { STU STUDENT ENT int id; String name; (id, name, email) String email; String phone[]; } STU STUDENT_ ENT_PHONE NE (sid, phone) GT 8803 // Fall 2019 54

OBJECT-ORIENTED MODEL Application Code Relational Schema class Student { STU STUDENT ENT int id; String name; (id, name, email) String email; String phone[]; } id name email STUDENT_ STU ENT_PHONE NE 1001 M.O.P. ante@up.com (sid, phone) sid phone 1001 444-444-4444 1001 555-555-5555 GT 8803 // Fall 2019 55

OBJECT-ORIENTED MODEL Application Code Relational Schema class Student { STU STUDENT ENT int id; String name; (id, name, email) String email; String phone[]; } id name email STUDENT_ STU ENT_PHONE NE 1001 M.O.P. ante@up.com (sid, phone) sid phone 1001 444-444-4444 1001 555-555-5555 GT 8803 // Fall 2019 56

OBJECT-ORIENTED MODEL Application Code class Student { int id; String name; String email; String phone[]; } GT 8803 // Fall 2019 57

OBJECT-ORIENTED MODEL Application Code Student class Student { int id; { String name; “ id ”: 1001, “ name ”: “M.O.P.”, String email; “ email ”: “ante@up.com”, String phone[]; “ phone ”: [ } “444-444-4444”, “555-555-5555” ] } GT 8803 // Fall 2019 58

OBJECT-ORIENTED MODEL Application Code Complex Queries Student class Student { int id; { String name; “ id ”: 1001, “ name ”: “M.O.P.”, String email; “ email ”: “ante@up.com”, String phone[]; “ phone ”: [ } “444-444-4444”, “555-555-5555” ] } GT 8803 // Fall 2019 59

OBJECT-ORIENTED MODEL Application Code Complex Queries Student class Student { int id; { String name; “ id ”: 1001, “ name ”: “M.O.P.”, String email; “ email ”: “ante@up.com”, String phone[]; “ phone ”: [ } “444-444-4444”, No Standard API “555-555-5555” ] } GT 8803 // Fall 2019 60

1990 s – BORING DAYS • No major advancements in database systems or application workloads. – Microsoft forks Sybase and creates SQL Server. – MySQL is written as a replacement for mSQL. – Postgres gets SQL support. – SQLite started in early 2000. GT 8803 // Fall 2019 61

1990 s – BORING DAYS • Multimedia databases – Feature engineering – Accuracy, robustness, and performance SELECT image_date FROM images WHERE event = ‘Sunrise’ GT 8803 // Fall 2019 62

2000 s – INTERNET BOOM • All the big players were heavyweight and expensive. Open-source databases were missing important features. • Many companies wrote their own custom middleware to scale out database across single-node DBMS instances. GT 8803 // Fall 2019 63

2000 s – DATA WAREHOUSES • Rise of the special purpose OLAP DBMSs. – Distributed / Shared-Nothing – Relational / SQL – Usually closed-source. • Significant performance benefits from using columnar storage organization GT 8803 // Fall 2019 64

2000 s – NoSQL SYSTEMS • Focus on high-availability & high-scalability: – Schemaless (i.e., “Schema Last”) – Non-relational data models (document, key/value, etc.) – No ACID transactions – Custom APIs instead of SQL – Usually open-source GT 8803 // Fall 2019 65

2010 s – NEWSQL SYSTEMS • Provide same performance for OLTP workloads as NoSQL DBMSs without giving up ACID: – Relational / SQL – Distributed – Usually closed-source GT 8803 // Fall 2019 66

2010 s – HYBRID SYSTEMS • H ybrid T ransactional- A nalytical P rocessing • Execute fast OLTP like a NewSQL system while also executing complex OLAP queries like a data warehouse system. – Distributed / Shared-Nothing – Relational / SQL – Mixed open/closed-source. GT 8803 // Fall 2019 67

2010 s – CLOUD SYSTEMS • First database-as-a-service (DBaaS) offerings were "containerized" versions of existing DBMSs. • There are new DBMSs that are designed from scratch explicitly for running in a cloud environment. GT 8803 // Fall 2019 68

2010 s – SPECIALIZED SYSTEMS • Shared-disk DBMSs • Embedded DBMSs • Times Series DBMS • Multi-Model DBMSs • Blockchain DBMSs GT 8803 // Fall 2019 69

2010 s – SPECIALIZED SYSTEMS • Shared-disk DBMSs • Embedded DBMSs • Times Series DBMS • Multi-Model DBMSs • Blockchain DBMSs GT 8803 // Fall 2019 70

SUMMARY • There are many innovations that come from both industry and academia: – Lots of ideas start in academia but few build complete DBMSs to verify them. – IBM was the vanguard during 1970-1980s but now there is no single trendsetter. – Oracle borrows ideas from anybody. • The relational model has won for operational databases. GT 8803 // Fall 2019 71

GOAL: VIDEO ANALYTICS DBMS • Feature Engineering • Robustness • Computational Efficiency • Usability GT 8803 // Fall 2019 72

CHALLENGES: MULTIMEDIA DBMS s • Feature Engineering – The same multi-media data could mean different things to different people. Second, users typically have diverse information needs. – Thus, a single feature may not be sufficient to completely index a given video. – Therefore, it becomes difficult to identify the features that are most appropriate in any given environment. GT 8803 // Fall 2019 73

CHALLENGES: MULTIMEDIA DBMS s • Robustness – Works well on one dataset, but completely breaks on another dataset from the same domain – Example: Two traffic cameras in different cities – Limits the utility of the database system – Need inherent support for coping with data drift GT 8803 // Fall 2019 74

CHALLENGES: COMPUTER VISION PIPELINES • Computational Efficiency – These pipelines are computationally infeasible at scale – Example: State-of-the-art object detection models run at 3 frames per second (fps) (e.g., Mask R-CNN) – It will take 8 decades of GPU time to process 100 cameras over a month of video GT 8803 // Fall 2019 75

CHALLENGES: COMPUTER VISION PIPELINES • Usability – These techniques require complex, imperative programming across many low-level libraries (e.g., Pytorch and OpenCV) – This is an ad-hoc, tedious process that ignores opportunity for cross-operator optimization – Traditional database systems were successful due to their ease of use (i.e., SQL is declarative) GT 8803 // Fall 2019 76

GOAL: VIDEO ANALYTICS DBMS GT 8803 // Fall 2019 77

ADVANCED SQL 78 GT 8803 // Fall 2018

RELATIONAL LANGUAGES • User only needs to specify what answer that they want, not how to compute it. • The DBMS is responsible for efficient evaluation of the query. – Query optimizer: re-orders operations and generates query plan GT 8803 // Fall 2019 79

SQL HISTORY • Originally “SEQUEL” from IBM’s System R prototype. – Structured English Query Language – Adopted by Oracle in the 1970s. • IBM releases DB2 in 1983. • ANSI Standard in 1986. ISO in 1987 – Structured Query Language GT 8803 // Fall 2019 80

SQL HISTORY • Current standard is SQL:2016 – SQL:2016 → JSON, Polymorphic tables – SQL:2011 → Temporal DBs, Pipelined DML – SQL:2008 → TRUNCATE, Fancy ORDER – SQL:2003 → XML, windows, sequences, auto- generated IDs. – SQL:1999 → Regex, triggers, OO • Most DBMSs at least support SQL-92 GT 8803 // Fall 2019 81

RELATIONAL LANGUAGES • Language – Data Manipulation Language (DML) – Data Definition Language (DDL) – Data Control Language (DCL) – View definition – Integrity & Referential Constraints – Transactions • Important: SQL is based on bags (duplicates) not sets (no duplicates). GT 8803 // Fall 2019 82

ADVANCED SQL • Aggregations + Group By • Output Control + Redirection • Nested Queries • Common Table Expressions • Window Functions GT 8803 // Fall 2019 83

EXAMPLE DATABASE student(sid,name,login,gpa) enrolled(sid,cid,grade) sid name login age gpa sid cid grade 53666 Kanye kayne@cs 39 4.0 53666 15-445 C 53688 Bieber jbieber@cs 22 3.9 53688 15-721 A 53655 Tupac shakur@cs 26 3.5 53688 15-826 B 53655 15-445 B course(cid,name) 53666 15-721 C cid name 15-445 Database Systems 15-721 Advanced Database Systems 15-826 Data Mining 15-823 Advanced Topics in Databases 84 GT 8803 // Fall 2018

AGGREGATES • Functions that return a single value from a bag of tuples: – AVG(col) → Return the average col value. – MIN(col) → Return minimum col value. – MAX(col) → Return maximum col value. – SUM(col) → Return sum of values in col. – COUNT(col) → Return # of values for col. GT 8803 // Fall 2019 85

AGGREGATES • Aggregate functions can only be used in the SELECT output list. • Get # of students with a “@cs” login: SELECT COUNT (login) AS cnt FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 86

AGGREGATES • Aggregate functions can only be used in the SELECT output list. • Get # of students with a “@cs” login: SELECT COUNT (login) AS cnt FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 87

AGGREGATES • Aggregate functions can only be used in the SELECT output list. • Get # of students with a “@cs” login: SELECT COUNT (login) AS cnt FROM student WHERE login LIKE '%@cs' SELECT COUNT (*) AS cnt FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 88

AGGREGATES • Aggregate functions can only be used in the SELECT output list. • Get # of students with a “@cs” login: SELECT COUNT (login) AS cnt FROM student WHERE login LIKE '%@cs' SELECT COUNT (*) AS cnt FROM student WHERE login LIKE '%@cs' SELECT COUNT (1) AS cnt FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 89

MULTIPLE AGGREGATES • Get the number of students and their average GPA that have a “@cs” login. SELECT AVG (gpa), COUNT (sid) FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 90

MULTIPLE AGGREGATES • Get the number of students and their average GPA that have a “@cs” login. AVG(gpa) COUNT(sid) SELECT AVG (gpa), COUNT (sid) 3.25 12 FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 91

DISTINCT AGGREGATES • COUNT , SUM , AVG support DISTINCT • Get the number of unique students that have an “@cs” login. SELECT COUNT ( DISTINCT login) FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 92

DISTINCT AGGREGATES • COUNT , SUM , AVG support DISTINCT • Get the number of unique students that have an “@cs” login. COUNT(DISTINCT login) 10 SELECT COUNT ( DISTINCT login) FROM student WHERE login LIKE '%@cs' GT 8803 // Fall 2019 93

AGGREGATES • Output of other columns outside of an aggregate is undefined. • Get the average GPA of students enrolled in each course. SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s WHERE e.sid = s.sid GT 8803 // Fall 2019 94

AGGREGATES • Output of other columns outside of an aggregate is undefined. • Get the average GPA of students enrolled in each course. AVG(s.gpa) e.cid SELECT AVG (s.gpa), e.cid 3.5 ??? FROM enrolled AS e, student AS s WHERE e.sid = s.sid GT 8803 // Fall 2019 95

GROUP BY • Project tuples into subsets and calculate aggregates against SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s each subset. WHERE e.sid = s.sid GROUP BY e.cid GT 8803 // Fall 2019 96

GROUP BY • Project tuples into subsets and calculate aggregates against SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s each subset. WHERE e.sid = s.sid GROUP BY e.cid e.sid s.sid s.gpa e.cid 53435 53435 2.25 15-721 53439 53439 2.70 15-721 56023 56023 2.75 15-826 59439 59439 3.90 15-826 53961 53961 3.50 15-826 58345 58345 1.89 15-445 GT 8803 // Fall 2019 97

GROUP BY • Project tuples into subsets and calculate aggregates against SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s each subset. WHERE e.sid = s.sid GROUP BY e.cid e.sid s.sid s.gpa e.cid 53435 53435 2.25 15-721 53439 53439 2.70 15-721 56023 56023 2.75 15-826 59439 59439 3.90 15-826 53961 53961 3.50 15-826 58345 58345 1.89 15-445 GT 8803 // Fall 2019 98

GROUP BY • Project tuples into subsets and calculate aggregates against SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s each subset. WHERE e.sid = s.sid GROUP BY e.cid e.sid s.sid s.gpa e.cid 53435 53435 2.25 15-721 AVG(s.gpa) e.cid 53439 53439 2.70 15-721 2.46 15-721 56023 56023 2.75 15-826 3.39 15-826 59439 59439 3.90 15-826 1.89 15-445 53961 53961 3.50 15-826 58345 58345 1.89 15-445 GT 8803 // Fall 2019 99

GROUP BY • Project tuples into subsets and calculate aggregates against SELECT AVG (s.gpa), e.cid FROM enrolled AS e, student AS s each subset. WHERE e.sid = s.sid GROUP BY e.cid e.sid s.sid s.gpa e.cid 53435 53435 2.25 15-721 AVG(s.gpa) e.cid 53439 53439 2.70 15-721 2.46 15-721 56023 56023 2.75 15-826 3.39 15-826 59439 59439 3.90 15-826 1.89 15-445 53961 53961 3.50 15-826 58345 58345 1.89 15-445 GT 8803 // Fall 2019 100

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 5 : I N T R O D U C T I O N T O D A T A B A S E S Y S T E M S A N D A D V A N C E D S Q L administrivia Assignment 1 Due on Sep 18

Analytics and Data Summit 2020 Analytics and Data Summit 2020 Analytics and Data Summit 2020

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Undergraduate Business Analytics Minor Spreadsheet Analytics BANA-2081 Business Analytics

Deep Data Analytics for Pricing: Uses, Issues, and Solutions Walter R. Paczkowski, Ph.D. Data

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2018 // VENKATA KISHORE PATCHA Lecture#16 :

Architecture 3.0 Landscape Analytics Jrgen Dllner Hasso-Plattner-Institut Jrgen

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Google Analytics Overview Whats Google Analytics? The Google Analytics

Document Name Solar Analytics - Rooftop PV energy analytics PREPARED BY: Your Name, Your Title

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Data Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Attacks on DNS: Risks of Caching March 21, 2018 The Inside Story of How Facebook Responded to

Ponnurangam Kumaraguru Computation, Organizations and Society Carnegie Mellon University Feb 23

JavaScript: Memory, Passing, and Objects CS 115 Computing for the Socio-Techno Web Brian Brubach

Internet Internet 1962: Paul Baran, packet switching 1969: DARPA, 4-node ARPANET 1974:

Computer Systems Dr. Barry Wittman Not Dr. Barry Whitman Education: PhD and MS in

Logist ics Cour se Web Page 1: Welcome and Overview Mailing List St af f Text book

Chapter 10: Case Studies So what happens in a real operating system? Operating systems in the real

Using OpenBSD Security Features to Find Software Bugs Peter Valchev pvalchev@openbsd.org

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 5 : I N T R O D U C T I O N T O D A T A B A S E S Y S T E M S A N D A D V A N C E D S Q L administrivia Assignment 1 Due on Sep 18

Analytics and Data Summit 2020 Analytics and Data Summit 2020 Analytics and Data Summit 2020

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Undergraduate Business Analytics Minor Spreadsheet Analytics BANA-2081 Business Analytics

Deep Data Analytics for Pricing: Uses, Issues, and Solutions Walter R. Paczkowski, Ph.D. Data

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2018 // VENKATA KISHORE PATCHA Lecture#16 :

Architecture 3.0 Landscape Analytics Jrgen Dllner Hasso-Plattner-Institut Jrgen

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Google Analytics Overview Whats Google Analytics? The Google Analytics

Document Name Solar Analytics - Rooftop PV energy analytics PREPARED BY: Your Name, Your Title

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Data Mining &amp; Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Attacks on DNS: Risks of Caching March 21, 2018 The Inside Story of How Facebook Responded to

Ponnurangam Kumaraguru Computation, Organizations and Society Carnegie Mellon University Feb 23

JavaScript: Memory, Passing, and Objects CS 115 Computing for the Socio-Techno Web Brian Brubach

Internet Internet 1962: Paul Baran, packet switching 1969: DARPA, 4-node ARPANET 1974:

Computer Systems Dr. Barry Wittman Not Dr. Barry Whitman Education: PhD and MS in

Logist ics Cour se Web Page 1: Welcome and Overview Mailing List St af f Text book

Chapter 10: Case Studies So what happens in a real operating system? Operating systems in the real

Using OpenBSD Security Features to Find Software Bugs Peter Valchev pvalchev@openbsd.org

Data Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues