adaptive data modeling techniques

Adaptive Data Modeling Techniques: Maximizing the Work Not Done - PDF document

3/9/2016. Agile Data Warehousing / State of the Art for 2016. Adaptive Data Modeling Techniques: Maximizing the Work Not Done. Ralph Hughes, MA, PMP, CSM, ralph.hughes@ceregenics.com. Presenter’s Background. Website: www.Ceregenics.com


  1. 3/9/2016
     Agile Data Warehousing / State of the Art for 2016
     Adaptive Data Modeling Techniques: Maximizing the Work Not Done
     Ralph Hughes, MA, PMP, CSM
     ralph.hughes@ceregenics.com

     Presenter’s Background
     Website: www.Ceregenics.com
     Email: ralph.hughes@ceregenics.com
     www.linkedin.com/in/ralphhughesadw
     Twitter: @ceregenics
     Ralph Hughes
     • 30 years solutions architecture, ETL & BI development
     • MA, PMP, CSM
     • Author of three agile methods books
     • Member of DW/BI advisory boards and best practices panel
     • Frequent keynote speaker & instructor DWBI conferences 2016

  2. Today’s Topics
     Agile = quick & continuous delivery of value to the customer
     Agile EDW achieves this goal through:
     • “Surface Solutions”
     • End-User Hadoop
     • Document data stores
     • Hyper modeling
       • Hyper normalization
       • Hyper generalization
     • Agile value cycle

     Agile EDW Works Fabulously
     • Major Healthcare Clinic (2014)
     • Ceregenics joins an agile team that wasn’t getting traction during Iteration 7
     • Best practices accelerate the project by 3x to 10x, depending upon units of measure
     Chart labels: # of columns loaded per iteration (11x); story points per iteration, subjective (3x); first 6 iterations with a Scrum master only, without AEDW best practices.

  3. (Formerly) Outrageous Statements
     The problem of enterprise data warehousing has been solved:
     • If you’re not using 80/20 specifications, you’re taking 5x longer to get started
     • If you’re coding by hand, you’re wasting 90% of your programming
     • If you’re thinking only RDBMS, you’re building at least 3x more than you need
     • Without automated testing, you’re missing over half of the defects

     Presenters Must Remain Tool-Agnostic
     Diagram labels: Consumers; Vendors

  4. EDW is Like Building 5+ Applications at Once
     Diagram layers: Staging; Integration (warehouse); Presentation (star schema); Semantics (“BI universe”); Application
     Other diagram labels: Agile Solution Architect; User Story; Mini BI universes / frameworks
     • Each architectural layer has different purpose and constraints
     • Why approach them all with the same techniques and tools?
     • Provisional value available long before application layer, so why wait?

  5. Surface Solutions Possible Even with RDBMS
     Diagram layers: Raw Staging; Historical Archive; Integration; Departmental Dimensional; OLAP; BI Apps; End User Access
     Five sub-releases expose progressively more data to end users: 1% of data, 5% of data, 10% of data, 25% of data, then full data. Release labels include AQE, Trends, 360° Vision, Single Version of the Truth, and Dashboards.

  6. Option 3: Agile Big Data
     Schema-on-Read at Facebook
     Diagram labels: Web Servers; Web Log File Writers; Scribers; HDFS; Hive on Hadoop Cluster; RDBMS Server; Parsing instructions (Tables & Views); Fast Cycle HQL; Feedback & Change Requests; Data Warehousing Team; End-User Departments

     Evolving Surface Solutions Using Hadoop
     Diagram labels: Source Systems; Landing Area; Integration Layer; Presentation Layer; Semantic Layer; HDFS; Data Stores; EDW Extract; Sub Releases 1, 2, 3
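
The schema-on-read idea named on this slide can be illustrated with a minimal sketch. It uses plain Python rather than Hive, and the sample log lines, the `read_with_schema` helper, and the two schema versions are all hypothetical names invented for illustration. The point it tries to show is that the raw files are never reloaded: a change request only edits the parsing instructions applied at read time.

```python
import json

# Hypothetical raw web-log lines, stored exactly as they arrived.
RAW_LOG = [
    '{"user": "u1", "url": "/home", "ms": "120"}',
    '{"user": "u2", "url": "/cart", "ms": "340", "ref": "ad-7"}',
    "corrupt line that is not JSON",
]


def read_with_schema(lines, schema):
    """Yield one dict per parseable line, shaped by a {column: caster} schema."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in storage but are not returned
        if not isinstance(record, dict):
            continue
        yield {
            col: cast(record[col]) if col in record else None
            for col, cast in schema.items()
        }


# Version 1 of the "parsing instructions": two columns only.
SCHEMA_V1 = {"user": str, "url": str}
# A later change request adds two columns; the stored lines are untouched.
SCHEMA_V2 = {**SCHEMA_V1, "ms": int, "ref": str}

rows_v1 = list(read_with_schema(RAW_LOG, SCHEMA_V1))
rows_v2 = list(read_with_schema(RAW_LOG, SCHEMA_V2))
```

In a Hive setting the schema would presumably live in table and view definitions instead of a Python dict; that mapping is an assumption here, not something the slide spells out.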

  7. Option 4: Document Database
     Increasingly More Big Data are in XML and JSON Documents
     Document data stores will let us explore and build quick solutions with very little programming
     Diagram labels: Agile Solution Architect; XML & JSON

  8. Powerful Search with Little Programming
     Out-of-the-Box Delivery Tool: requires only changing the query text and the indexing of the documents
     Skills needed:
     • Some HTML
     • A little CSS
     • Some XML or JSON
     • Some XQuery / XPath or JavaScript
     Screenshot: Registry of Patient Registries (Prototype), HCA / NY, with callouts for Google-style query, Weighted elements, Data Facets, Analytical facets, and Stemming
     • Search box: anesthetic implanted atrial edema OR autoimmune; sort by: relevance
     • Data Source facet: Diabetes, Immunology, Cardiology, Anesthesia, Pharmacy
     • Age Bracket facet: < 10, 10-19, 20s, 30-45, 46-60, 60+
     • Result: Ann Goldreimer: MRN 522-060-774: DOB 11/28/14. Atrial defibrillator implant 8/1/14; Medtronic ADU-2193; copper leads; ICD-10-CM I50.9 diagnosis for reimbursement purposes; billable; date of discharge August 13, 2014; require the use of; anesthetics: inhaled agents nitrous oxide, Sevoflurane; transfusion-linked aortal edema; new-stroke ... [more]
     • Result: Kyle Jeffery Watson: MRN 623-293-228: DOB 7/3/1981. 11/21/14 re-implantation service; ventral left atrium; presenting ICD-10-CM I50.21 non-reimbursed; discharged 12/1/2014; general anesthesia: Halothane and Methoxyflurane (inhalation), Diazepam (intravenous); significant-delay; autoimmune hepatitis;... [more]
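
The three search features called out on this slide (stemming, weighted elements, and facets) can be sketched in a few lines. This is a toy illustration, not the document-store product shown in the screenshot: the sample records, the `FIELD_WEIGHTS` table, the crude suffix-stripping `stem` function, and the `search` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical sample documents; "dept" plays the role of a data facet.
DOCS = [
    {"id": 1, "dept": "Cardiology", "title": "Atrial defibrillator implant",
     "notes": "anesthetics: nitrous oxide; edema"},
    {"id": 2, "dept": "Cardiology", "title": "Implanted device follow-up",
     "notes": "general anesthesia"},
    {"id": 3, "dept": "Immunology", "title": "Autoimmune hepatitis",
     "notes": "no implants"},
]

# Weighted elements: a match in the title counts three times a match in notes.
FIELD_WEIGHTS = {"title": 3, "notes": 1}


def stem(word):
    """Very crude stemmer: lowercase, trim punctuation, strip one suffix."""
    word = word.lower().strip(".,;:")
    for suffix in ("ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def search(query, docs, facets=None):
    """Return (ranked document ids, facet counts) for an OR-style query."""
    terms = {stem(t) for t in query.split() if t.upper() != "OR"}
    hits = []
    for doc in docs:
        # Facet filter: keep only documents matching every selected facet.
        if facets and any(doc.get(k) != v for k, v in facets.items()):
            continue
        score = sum(
            weight
            for field, weight in FIELD_WEIGHTS.items()
            for token in doc[field].split()
            if stem(token) in terms
        )
        if score:
            hits.append((score, doc))
    hits.sort(key=lambda pair: -pair[0])  # highest relevance first
    facet_counts = Counter(doc["dept"] for _, doc in hits)
    return [doc["id"] for _, doc in hits], dict(facet_counts)


ids, counts = search("implanted OR anesthetic", DOCS)
filtered_ids, _ = search("implanted OR anesthetic", DOCS, {"dept": "Immunology"})
```

Because of stemming, the query word "implanted" matches "implant" and "implants", and the title weight pushes documents with a title match above those that only match in the notes.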

  9. Step 1: Identify Business Keys
     Step 2: Create M-M Links

  10. Step 3: Add Attributes
      HNF Makes Re-Usable ETL Straightforward
      Diagram labels: BKs; Links; Attribs

  11. Parameter-Driven ETL (“Cookie-cutter ETL”)
      • BKs: Load_BK (target, source, natural key column list)
      • Links: Load_Link (target, source 1, src 1 natural key cols, source 2, src 2 natural key cols)
      • Attribs: Load_Attribs (target, source, exclude column list)
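
A minimal in-memory sketch of the three parameter-driven loaders may make the "cookie-cutter" idea concrete. It is written under stated assumptions rather than taken from the presenter's tooling: tables are plain Python lists of dicts, `load_link` reads both natural keys from a single staging source (the slide's signature takes two sources), and `load_attribs` gains a `key_cols` parameter that the slide does not list. The staging rows are hypothetical.

```python
def load_bk(target, source, key_cols):
    """Append each distinct business key found in source that target lacks."""
    seen = {tuple(row[c] for c in key_cols) for row in target}
    for row in source:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            target.append({c: row[c] for c in key_cols})


def load_link(target, source, key_cols_1, key_cols_2):
    """Record each distinct pairing of two business keys (many-to-many)."""
    # A link row is just a distinct combination of both key column lists.
    load_bk(target, source, list(key_cols_1) + list(key_cols_2))


def load_attribs(target, source, key_cols, exclude_cols=()):
    """Store the key plus every column that is not explicitly excluded."""
    skip = set(exclude_cols)
    for row in source:
        attrs = {c: v for c, v in row.items() if c in key_cols or c not in skip}
        if attrs not in target:
            target.append(attrs)


# Hypothetical staging rows: one row per order line.
orders = [
    {"customer_no": "C1", "product_code": "P1", "customer_name": "Acme", "load_id": 1},
    {"customer_no": "C1", "product_code": "P2", "customer_name": "Acme", "load_id": 1},
    {"customer_no": "C2", "product_code": "P1", "customer_name": "Birch", "load_id": 1},
]

customer_bk, product_bk, customer_product_link, customer_attribs = [], [], [], []

# The same three routines serve every table; only the parameters change.
for _ in range(2):  # the second pass shows that reloading adds no duplicates
    load_bk(customer_bk, orders, ["customer_no"])
    load_bk(product_bk, orders, ["product_code"])
    load_link(customer_product_link, orders, ["customer_no"], ["product_code"])
    load_attribs(customer_attribs, orders, ["customer_no"],
                 exclude_cols=["product_code", "load_id"])
```

The design choice worth noticing is that no loader mentions a specific table or column: adding a new entity to the warehouse means adding a call with new parameters, not writing new ETL code.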

  12. From HNF to HGF (0 of 5)
      Convert to Metadata to Distinguish Instances (1 of 5)

  13. “Fold” the Model to Eliminate Separate Tables (2 of 5)
      Combine Tables with Equivalent Function (3 of 5): The Model; The Data

  14. Allow Re-Classifications of Instances (4 of 5)
      Completely Temporal Data Warehouse (5 of 5): The Model; The Data
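
The slide titles in this sequence (class names moved into metadata, separate tables folded into one, instances re-classified over time) can be loosely illustrated as follows. The slides' actual models are not recoverable from the text, so this is only a sketch under assumptions: `GeneralizedStore`, `Classification`, and the validity-date columns are hypothetical names, and the real HGF design may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Classification:
    """One row of a single, generic table shared by all kinds of instances."""
    business_key: str
    class_name: str            # the class is data (metadata), not a table name
    valid_from: date
    valid_to: Optional[date] = None  # None means the row is still current


class GeneralizedStore:
    def __init__(self):
        self.classes = set()   # metadata: which classes exist
        self.rows = []         # folded model: every instance lives here

    def define_class(self, name):
        self.classes.add(name)

    def classify(self, business_key, class_name, as_of):
        """Assign or re-assign a class, closing any currently open row."""
        if class_name not in self.classes:
            raise ValueError(f"unknown class: {class_name}")
        for row in self.rows:
            if row.business_key == business_key and row.valid_to is None:
                row.valid_to = as_of
        self.rows.append(Classification(business_key, class_name, as_of))

    def class_of(self, business_key, as_of):
        """Return the class that was valid on the given date, if any."""
        for row in self.rows:
            if (row.business_key == business_key
                    and row.valid_from <= as_of
                    and (row.valid_to is None or as_of < row.valid_to)):
                return row.class_name
        return None


store = GeneralizedStore()
store.define_class("Product")
store.define_class("Service")
store.classify("SKU-1", "Product", date(2016, 1, 1))
store.classify("SKU-1", "Service", date(2016, 3, 1))  # re-classification
```

Adding a new kind of entity here is a `define_class` call, not a schema change, and earlier classifications remain queryable by date, which is the temporal property the last slide title refers to.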

  15. Model Objects Map to Meta Data Entities
      Diagram callouts:
      1. “Product Type” class exists
      2. “Product” class exists
      3. Product rolls up to Product Category

      Computer-Assisted ETL Programming

  16. Automation Surrounds Us
      • Computers build our goods…
      • …land our planes…
      • …will soon drive our cars.
      Why are we still building data warehouses by hand?
      Ceregenics proprietary information

      Henry Ford Considers a Tesla
      1947 Ford Coupe
      Where’s the carburetor? ...the transmission? ...the radiator? I can see all kinds of problems with this car. What a hoax!

  17. Why are we still building data warehouses by hand?
      • Don’t understand what’s going on under the hood
      • Don’t want to give up data modeling
      • Don’t want to be the first one to do it
      • Too much invested in 1990s technology
      • Don’t want to eliminate people’s jobs
      • Don’t want to eliminate my department
      • Staff can’t handle learning another tool

      Tools for the Business Opportunity Cycle

  18. Automated Monitoring for Faster Requirements

      Tools for the Business Opportunity Cycle
      Diagram labels, grouped as extracted:
      • 1. Hadoop; 2. Document Data Package
      • 1. Data Virtualization Server; 2. Adaptive Master Data; 3. Enterprise BI Package; 4. Document Data Package
      • 1. Data Warehouse Generator; 2. Citizen Data Scientist Tool
      • Other labels: XQuery Configuration; Data Warehouse Generator; Citizen Data Scientist Tool

  19. Find Your Voice & Help Others to Find Theirs
      (Stephen Covey’s “Eighth Habit”)
      • Call for contributors
      • Write a chapter, sidebar, or a section
      • Focus: theory & case histories that blend
        • Disciplined agile & EDW methods
        • Hadoop, M/R, Spark
        • Textual and triple data stores
        • Empowering citizen data scientist

      Long Design will Still Delay Value
      Traditional Methods: Project Definition; Database Design; Release Review; ETL Coding
      Agile Approach + Productivity Tools:
      • “Surface Solutions”
      • End-User Hadoop
      • Document data stores
      • Hyper normalization
      • Hyper generalization
      • Agile value cycle

  20. 999 18th Street, Suite 3000
      Denver CO 80033
      303.274.9101
      www.Ceregenics.com
      Hyper normalization: //www.youtube.com/watch?v=3QOSOeN8vcY
      Hyper generalization: //www.youtube.com/watch?v=aNtUoVkeq_Q
