Big Data Analytics Reference Architectures and Case Studies - PowerPoint PPT Presentation

  1. Big Data Analytics Reference Architectures and Case Studies

  2. Relational vs. Non-Relational Architecture
      • Relational: Rational, Predictable, Traditional
      • Non-Relational: Agile, Flexible, Modern

  3. Agenda
      • Big Data Challenges
      • Big Data Reference Architectures
      • Case Studies
      • Tips for Designing Big Data Solutions

  4. Big Data Challenges
      [Figure: data sources arranged from structured to unstructured and by low, medium, or high complexity; dimensions shown: Volume, Variety, Velocity]
      • Media: images, video, audio, etc.
      • Archives: scanned documents, statements, medical records, e-mails, etc.
      • Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
      • Docs: XLS, PDF, CSV, HTML, JSON, etc.
      • Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
      • Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
      • Business Apps: CRM, ERP systems, HR, project management, etc.
      • Public Web: Wikipedia, news, weather, public finance, etc.
      • Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.

  5. Big Data Analytics vs. Traditional Analytics (BI)
      • Focus on
          Big Data Analytics: predictive analytics, data science
          Traditional Analytics (BI): descriptive analytics, diagnostic analytics
      • Data sets
          Big Data Analytics: large-scale data sets, more types of data, raw data, complex data models
          Traditional Analytics (BI): limited data sets, cleansed data, simple models
      • Supports
          Big Data Analytics: correlation (new insight, more accurate answers)
          Traditional Analytics (BI): causation (what happened, and why?)

  6. Big Data Analytics Use Cases
      • Real Time Intelligence (Consumers, Intelligent Agents): Low Latency, Reliability
      • Data Discovery (Data Scientists/Analysts): Volume, Performance
      • Business Reporting (Business Users): Data Quality, Self Service

  7. Big Data Analytics Reference Architectures
      • Architecture Drivers: Volume, Sources, Throughput, Latency, Extensibility, Data Quality, Reliability, Security, Self-Service, Cost
      • Reference Architectures: Extended Relational, Non-Relational, Hybrid

  8. Relational Reference Architecture
      • Data Sources: Structured, Semi-Structured, Unstructured
      • Integration: ETL, Messaging, API/ODBC, Replication
      • Data Storages: Data Warehouses, Data Marts, Operational Data Stores
      • Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
      • Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

  9. Extended Relational Reference Architecture
      • Data Sources: Structured, Semi-Structured, Unstructured
      • Integration: ETL, Messaging, API/ODBC, Replication
      • Data Storages: Data Warehouses, Data Marts, Operational Data Stores
      • Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
      • Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
      Legend: key components affected by Big Data challenges

  10. Non-Relational Reference Architecture
      • Data Sources: Structured, Semi-Structured, Unstructured
      • Integration: ETL, Messaging, API
      • Data Storages: NoSQL Databases, Distributed File Systems
      • Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
      • Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
      Legend: key components introduced with the non-relational movement

  11. Extended Relational vs. Non-Relational Architecture
      Architecture drivers compared for Extended Relational and Non-Relational (the per-driver ratings are not preserved in this transcript):
      • Large data volume
      • Self-service (ad-hoc reporting)
      • Unstructured data processing
      • High data model extensibility
      • High data quality and consistency
      • Extensive security
      • Reliability and fault-tolerance
      • Low latency (near-real time)
      • Low cost
      • Skills availability

  14. Relational vs. Non-Relational Architecture
      • Relational: Rational, Predictable, Traditional
      • Non-Relational: Agile, Flexible, Modern

  15. Big Data Analytics Use Cases: Real Time Intelligence (Consumers, Intelligent Agents); Data Discovery (Data Scientists), driven by Performance and Volume; Business Reporting (Business Users)

  16. Data Discovery: Non-Relational Architecture
      • Data Sources: Structured, Semi-Structured, Unstructured
      • Integration: ETL, Messaging, API
      • Data Storages: NoSQL Databases, Distributed File Systems
      • Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
      • Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

  17. Big Data Analytics Use Cases: Real Time Intelligence (Consumers, Intelligent Agents); Data Discovery (Data Scientists); Business Reporting (Business Users), driven by Data Quality and Self Service

  18. Business Reporting: Hybrid Architecture
      • Data Sources: Structured, Semi-Structured, Unstructured
      • Integration: ETL, Messaging, API
      • Data Storages: Relational DWH/DM, Distributed File Systems
      • Analytics: SQL Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
      • Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
      Legend: Extended Relational components; Non-Relational components

  19. Big Data Analytics Use Cases: Real Time Intelligence (Consumers, Intelligent Agents), driven by Low Latency and Reliability; Data Discovery (Data Scientists); Business Reporting (Business Users)

  20. Lambda Architecture (diagram; source attribution not preserved)
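
      Slide 20 carries only the diagram. As a rough illustration of the idea as it is commonly described (a batch layer that periodically recomputes views over all stored events, a speed layer that covers events the batch run has not yet seen, and a serving layer that merges the two at query time), here is a minimal Python sketch. The names `Event`, `batch_view`, `speed_view`, and `query`, the page-view counting task, and the sample data are all hypothetical and do not come from the presentation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    ts: int      # event time in seconds (hypothetical unit)
    page: str    # page that was viewed


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch over events before the cutoff."""
    return Counter(e.page for e in master if e.ts < cutoff)


def speed_view(recent: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: count only events at or after the last batch cutoff."""
    return Counter(e.page for e in recent if e.ts >= cutoff)


def query(page: str, batch: Counter, speed: Counter) -> int:
    """Serving layer: merge the batch view and the real-time view."""
    return batch[page] + speed[page]


# Hypothetical sample data: the last batch run covered everything before t=10.
events = [
    Event(1, "/home"), Event(2, "/cart"), Event(3, "/home"),
    Event(12, "/home"), Event(13, "/cart"),
]
cutoff = 10
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
home_views = query("/home", batch, speed)
print(home_views)
```

      The design point is that the batch view can always be rebuilt from the raw events, so the speed layer only has to be correct for the short window since the last batch run.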

  21. Case Study #1: Usage & Billing Analysis
      • Business Area: Cloud-based platform for building, deploying, hosting, and managing mobile applications
      • Business Goals:
          Provide a visual environment for building custom mobile applications
          Charge customers based on the platform they are using, the number of consumers' applications, etc.

  22. Architectural Decisions
      Architecture Drivers:
      • Volume (> 10 TB)
      • Sources (Semi-structured - JSON)
      • Throughput (> 10K/sec)
      • Latency (2 min)
      • Extensibility (Custom metrics)
      • Data Quality (Consistency)
      • Reliability (24/7)
      • Security (Multitenancy)
      • Self-Service (Ad-Hoc reports)
      • Cost (the less the better)
      • Constraints (Public Cloud)
      Trade-off (Extended Relational // Non-Relational):
      • Extensibility: - // +
      • Data Quality: + // -
      • Self-Service: + // -
      Decision: Extended Relational Architecture, with extensibility via the Pre-allocated Fields pattern
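
      The slide names the Pre-allocated Fields pattern without describing it. One common reading is that the fact table is created with a fixed number of spare columns, and a mapping table records which custom metric each tenant has assigned to which spare column, so new metrics need no schema change. The sketch below follows that reading and is an assumption, not the authors' implementation: it uses Python's built-in `sqlite3` as a stand-in for the warehouse, and every table, column, function, and tenant name is hypothetical.

```python
import sqlite3

# Spare columns created up front; their number caps custom metrics per tenant.
SPARE_COLUMNS = ["custom_1", "custom_2", "custom_3"]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE usage_fact (
        tenant_id TEXT NOT NULL,
        app_id    TEXT NOT NULL,
        requests  INTEGER NOT NULL,
        custom_1  REAL, custom_2 REAL, custom_3 REAL
    )""")
conn.execute("""
    CREATE TABLE metric_mapping (
        tenant_id   TEXT NOT NULL,
        metric_name TEXT NOT NULL,
        column_name TEXT NOT NULL,
        PRIMARY KEY (tenant_id, metric_name),
        UNIQUE (tenant_id, column_name)
    )""")


def register_metric(tenant_id, metric_name):
    """Assign the tenant's next free spare column to a new custom metric."""
    used = {row[0] for row in conn.execute(
        "SELECT column_name FROM metric_mapping WHERE tenant_id = ?", (tenant_id,))}
    free = [c for c in SPARE_COLUMNS if c not in used]
    if not free:
        raise RuntimeError("no spare columns left for this tenant")
    conn.execute("INSERT INTO metric_mapping VALUES (?, ?, ?)",
                 (tenant_id, metric_name, free[0]))
    return free[0]


def column_for(tenant_id, metric_name):
    """Look up which spare column holds a tenant's custom metric."""
    row = conn.execute(
        "SELECT column_name FROM metric_mapping "
        "WHERE tenant_id = ? AND metric_name = ?",
        (tenant_id, metric_name)).fetchone()
    if row is None:
        raise KeyError(metric_name)
    return row[0]


def record_usage(tenant_id, app_id, requests, custom):
    """Insert one fact row; unmapped spare columns stay NULL."""
    values = {c: None for c in SPARE_COLUMNS}
    for name, value in custom.items():
        values[column_for(tenant_id, name)] = value
    conn.execute("INSERT INTO usage_fact VALUES (?, ?, ?, ?, ?, ?)",
                 (tenant_id, app_id, requests, *[values[c] for c in SPARE_COLUMNS]))


def total_metric(tenant_id, metric_name):
    """Sum a custom metric for a tenant via its mapped column."""
    col = column_for(tenant_id, metric_name)
    if col not in SPARE_COLUMNS:  # only whitelisted names reach the SQL text
        raise ValueError(col)
    return conn.execute(
        f"SELECT SUM({col}) FROM usage_fact WHERE tenant_id = ?",
        (tenant_id,)).fetchone()[0]


register_metric("acme", "push_notifications")
register_metric("acme", "storage_mb")
record_usage("acme", "app-1", 100, {"push_notifications": 5.0})
record_usage("acme", "app-2", 50, {"push_notifications": 7.0, "storage_mb": 20.0})
total_push = total_metric("acme", "push_notifications")
print(total_push)
```

      The trade-off in this reading is that the schema stays fixed and easy to query with ad-hoc reporting tools, at the price of a hard limit on custom metrics per tenant.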

  23. Solution Architecture
      Technologies: Amazon Redshift, Amazon SQS, Amazon S3, Elastic Beanstalk, Jaspersoft BI Professional, Python

  24. Case Study #2: Clickstream for retail website
      • Business Area: Retail. A platform for e-commerce and collecting feedback from customers
      • Business Goals:
          Build an in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
          Provide the ability to understand how end-users are interacting with service content, products, and features on sites
          Do clickstream analysis
          Perform A/B testing

  25. Architectural Decisions
      Architecture Drivers:
      • Volume (45 TB)
      • Sources (Semi-structured - JSON)
      • Throughput (> 20K/sec)
      • Latency (1 hour)
      • Extensibility (Custom tags)
      • Data Quality (Not critical)
      • Reliability (24/7)
      • Security (Multitenancy)
      • Self-Service (Canned reports, Data science)
      • Cost (the less the better)
      • Constraints (Public Cloud)
      Trade-off (Extended Relational // Non-Relational):
      • Volume/Scalability: +/- // +
      • Throughput: + // +
      • Self-Service: + // +/-
      • Extensibility: - // +
      Decision: Non-Relational Architecture, with reporting via the Materialized View pattern
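
      The slide names the Materialized View pattern without detail. A common reading for a non-relational stack is that a scheduled batch job precomputes report-shaped aggregates from the raw events and stores them under a lookup key, so canned reports read the small precomputed view instead of scanning raw data. The sketch below illustrates that reading in plain Python; the event fields, the `(day, page)` key, the function names, and the sample records are hypothetical, and in the case study the raw data and the view would live in the cluster rather than in memory.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw clickstream records (semi-structured, JSON-like).
raw_events = [
    {"ts": "2014-03-01T10:00:00+00:00", "user": "u1", "page": "/home"},
    {"ts": "2014-03-01T10:05:00+00:00", "user": "u2", "page": "/home"},
    {"ts": "2014-03-01T10:07:00+00:00", "user": "u1", "page": "/home"},
    {"ts": "2014-03-01T11:00:00+00:00", "user": "u1", "page": "/cart"},
    {"ts": "2014-03-02T09:00:00+00:00", "user": "u3", "page": "/home"},
]


def build_daily_page_view(events):
    """Batch job: aggregate raw events into one row per (day, page)."""
    views = defaultdict(int)
    visitors = defaultdict(set)
    for e in events:
        day = (datetime.fromisoformat(e["ts"])
               .astimezone(timezone.utc).date().isoformat())
        key = (day, e["page"])
        views[key] += 1
        visitors[key].add(e["user"])
    return {key: {"views": views[key], "unique_users": len(visitors[key])}
            for key in views}


def report(view, day):
    """Canned report: read only the precomputed rows for one day."""
    rows = [(page, row) for (d, page), row in view.items() if d == day]
    return sorted(rows, key=lambda item: -item[1]["views"])


materialized = build_daily_page_view(raw_events)
top_pages = report(materialized, "2014-03-01")
print(top_pages)
```

      Because the view is rebuilt on a schedule, report freshness is bounded by the batch interval, which fits a latency driver measured in minutes or hours rather than seconds.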

  26. Solution Architecture
      Technologies: Amazon S3, Flume, Hadoop/HDFS, MapReduce, HBase, Oozie, Hive
      [Diagram: cluster of Node 1, Node 2, ..., Node N]

  27. Tips for Designing Big Data Solutions
      • Understand data users and sources
      • Discover architecture drivers
      • Select proper reference architecture
      • Do trade-off analysis, address cons
      • Map reference architecture to technology stack
      • Prototype, re-evaluate architecture
      • Estimate implementation efforts
      • Set up devops practices from the very beginning
      • Advance in solution development through "small wins"
      • Be ready for changes; big data technologies are evolving rapidly
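
      The trade-off step in the tips above can be made explicit with a small scoring sketch. The tables below copy the ratings from the trade-off tables on slides 22 and 25; mapping "+", "+/-", and "-" to 1, 0, and -1 and weighting all drivers equally are assumptions made for illustration, since the presentation does not describe a scoring method. The function and variable names are hypothetical.

```python
# Assumed numeric mapping for the slide ratings.
SCORE = {"+": 1, "+/-": 0, "-": -1}


def total_scores(table, weights=None):
    """Sum weighted ratings per architecture; unlisted drivers get weight 1."""
    weights = weights or {}
    totals = {}
    for driver, marks in table.items():
        w = weights.get(driver, 1)
        for arch, mark in marks.items():
            totals[arch] = totals.get(arch, 0) + w * SCORE[mark]
    return totals


def preferred(table, weights=None):
    """Return the architecture with the highest total score."""
    totals = total_scores(table, weights)
    return max(totals, key=totals.get)


# Ratings as given on slide 22 (Usage & Billing Analysis).
case_study_1 = {
    "Extensibility": {"Extended Relational": "-", "Non-Relational": "+"},
    "Data Quality":  {"Extended Relational": "+", "Non-Relational": "-"},
    "Self-Service":  {"Extended Relational": "+", "Non-Relational": "-"},
}

# Ratings as given on slide 25 (Clickstream for retail website).
case_study_2 = {
    "Volume/Scalability": {"Extended Relational": "+/-", "Non-Relational": "+"},
    "Throughput":         {"Extended Relational": "+",   "Non-Relational": "+"},
    "Self-Service":       {"Extended Relational": "+",   "Non-Relational": "+/-"},
    "Extensibility":      {"Extended Relational": "-",   "Non-Relational": "+"},
}

choice_1 = preferred(case_study_1)
choice_2 = preferred(case_study_2)
print(choice_1, choice_2)
```

      With equal weights the totals agree with the decisions stated on the two slides. Passing a `weights` dictionary lets a team emphasize the drivers that matter most for its own case, and the losing column's "-" entries are the cons that still need to be addressed.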

  28. Leading global Product and Application Development partner founded in 1993
      • 3,300+ employees across North America, Ukraine and Western Europe
      • Thousands of successful outsourcing projects!
      • Clients include: [logos]
      • SaaS/Cloud Solutions, Mobility Solutions, UX/UI, BI/Analytics/Big Data, Software Architecture, Security
