Adaptive Data Modeling Techniques: Maximizing the Work Not Done



SLIDE 1

3/9/2016

Adaptive Data Modeling Techniques

Maximizing the Work Not Done

Agile Data Warehousing / State of the Art for 2016

Ralph Hughes, MA, PMP, CSM
ralph.hughes@ceregenics.com

Presenter’s Background

Ralph Hughes

  • 30 years solutions architecture, ETL & BI development
  • MA, PMP, CSM
  • Author of three agile methods books
  • Member of DW/BI advisory boards and best practices panel
  • Frequent keynote speaker & instructor at DW/BI conferences

Website: www.Ceregenics.com
Email: ralph.hughes@ceregenics.com
LinkedIn: www.linkedin.com/in/ralphhughesadw
Twitter: @ceregenics

SLIDE 2

Today’s Topics

Agile = quick & continuous delivery of value to the customer

Agile EDW achieves this goal through:

  • “Surface Solutions”
  • End-User Hadoop
  • Document data stores
  • Hyper modeling
      • Hyper normalization
      • Hyper generalization
  • Agile value cycle

Agile EDW Works Fabulously

[Chart: # of columns loaded per iteration and story points per iteration (subjective). Baseline: first 6 iterations with a Scrum master only, without AEDW best practices. Callouts: 11x and 3x.]

  • Major healthcare clinic (2014)
  • Ceregenics joins an agile team that wasn’t getting traction during Iteration 7
  • Best practices accelerate the project by 3x to 10x, depending upon units of measure

SLIDE 3

(Formerly) Outrageous Statements

The problem of enterprise data warehousing has been solved:

  • If you’re not using 80/20 specifications, you’re taking 5x longer to get started
  • If you’re coding by hand, you’re wasting 90% of your programming
  • If you’re thinking only RDBMS, you’re building at least 3x more than you need
  • Without automated testing, you’re missing over half of the defects

Presenters Must Remain Tool-Agnostic

[Diagram labels: Vendors, Consumers]

SLIDE 4


Agile Solution Architect

EDW is Like Building 5+ Applications at Once

Architectural layers:

  • Staging
  • Integration (warehouse)
  • Presentation (star schema)
  • Semantics (“BI universe”)
  • Application

[Diagram label: User Story]

  • Each architectural layer has different purpose and constraints
  • Why approach them all with the same techniques and tools?
  • Provisional value available long before application layer, so why wait?

Mini BI universes / frameworks

SLIDE 5

Surface Solutions Possible Even with RDBMS

[Diagram: five stages, numbered 1 to 5. Labels in extraction order:]

  • Layers: Raw Staging, Historical Archive, Integration, Dimensional OLAP, Departmental BI Apps
  • Access path: AQE (shown four times)
  • End User: Access, Trends, 360° Vision, Dashboards, Single Version of the Truth
  • Data volume: 1% of data, 5% of data, 10% of data, 25% of data, Full data


SLIDE 6

Option 3: Agile Big Data

[Diagram: Schema-on-Read at Facebook]

  • Components: Web Servers, Web Log Scribers, HDFS File Writers, Hive on Hadoop Cluster, RDBMS Server
  • Actors: Data Warehousing Team, End-User Departments
  • Flows: HQL; Parsing instructions (Tables & Views); Feedback & Change Requests
  • Fast Cycle
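
A minimal Python sketch of the schema-on-read idea, not the Hive/HQL setup the slide refers to. Raw log lines are stored untouched, and a "view" is only a parsing instruction applied when the data is read. The log format, the `VIEWS` registry, and the view names are assumptions made up for illustration.

```python
import re
from typing import Dict, Iterator, List

# Raw web-log lines are kept exactly as written (a stand-in for files on HDFS).
# The line format here is hypothetical.
RAW_LOG_LINES: List[str] = [
    "2016-03-09T10:15:00 GET /products/42 200",
    "2016-03-09T10:15:02 GET /cart 500",
    "2016-03-09T10:15:07 POST /checkout 200",
]

# "Parsing instructions": each named view is a regex applied at read time.
VIEWS = {
    "page_hits": re.compile(
        r"^(?P<ts>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d{3})$"
    ),
}


def read_view(view_name: str, lines: List[str]) -> Iterator[Dict[str, str]]:
    """Apply a view's parsing instruction to the raw lines as they are read."""
    pattern = VIEWS[view_name]
    for line in lines:
        match = pattern.match(line)
        if match:
            yield match.groupdict()


# First question from an end-user department: which paths returned errors?
failed_paths = [
    row["path"] for row in read_view("page_hits", RAW_LOG_LINES)
    if row["status"] == "500"
]

# A change request becomes a new view; nothing is reloaded or re-modeled.
VIEWS["hits_by_method"] = re.compile(r"^\S+ (?P<method>\S+) ")
methods = [row["method"] for row in read_view("hits_by_method", RAW_LOG_LINES)]
```

The point of the sketch is the fast cycle: a change request only edits the parsing instruction, while the stored data stays as it landed.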

Evolving Surface Solutions Using Hadoop

[Diagram labels: Source Systems, Extract, Landing Area, HDFS Data Stores, EDW, Integration Layer, Presentation Layer, Semantic Layer. Sub Releases: 1, 2, 3]

SLIDE 7


Option 4: Document Database

XML & JSON

More and More Big Data Is in XML and JSON Documents

Document data stores let us explore and build quick solutions with very little programming
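
As a rough illustration of why document data needs so little up-front modeling, here is a small Python sketch that queries JSON documents with differing shapes. It uses plain in-memory dictionaries rather than any particular document database; the sample documents and the `get_path` and `find` helpers are hypothetical.

```python
import json
from typing import Any, Dict, List

# Three documents with different shapes; no schema is declared anywhere.
RAW_DOCS = [
    '{"id": 1, "patient": {"age": 34}, "procedure": "implant"}',
    '{"id": 2, "patient": {"age": 61, "allergies": ["latex"]}}',
    '{"id": 3, "procedure": "implant", "device": {"maker": "Acme"}}',
]


def get_path(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'patient.age'; return default if absent."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def find(docs: List[Dict[str, Any]], path: str, value: Any) -> List[Dict[str, Any]]:
    """Return the documents whose value at the given path equals value."""
    return [doc for doc in docs if get_path(doc, path) == value]


docs = [json.loads(raw) for raw in RAW_DOCS]
implant_ids = [doc["id"] for doc in find(docs, "procedure", "implant")]
ages = [get_path(doc, "patient.age") for doc in docs]
```

Missing fields simply come back as `None`, so new document shapes can be explored before anyone commits to a model.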

Agile Solution Architect

SLIDE 8

Powerful Search with Little Programming

Registry of Patient Registries (Prototype)

[Screenshot of a faceted search page. Query: "anesthetic implanted atrial edema OR autoimmune". Facets: Age Bracket (< 10, 10-19, 20s, 30-45, 46-60, 60+); Data Source (Diabetes, Immunology, Cardiology, Anesthesia, Pharmacy). Sort by: relevance. Results: patient records (name, MRN, DOB) with matching passages of clinical text, each truncated with a [more] link.]

Search features:

  • Google-style query
  • Weighted elements
  • Analytical facets
  • Stemming

Skills needed:

  • Some HTML
  • A little CSS
  • Some XML or JSON
  • Some XQuery / XPath or JavaScript

Out-of-the-Box Delivery Tool – Requires only changing the query text and the indexing of the documents
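
To make stemming, relevance, and analytical facets concrete, here is a toy Python sketch. It is not the delivery tool described above, which the slide says is driven by query text and document indexing. The documents, the naive suffix-stripping `stem` function, and the scoring rule are all simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical documents with free text plus two facet fields.
DOCS: List[Dict[str, str]] = [
    {"text": "atrial implant with inhaled anesthetics",
     "source": "Cardiology", "age_bracket": "60+"},
    {"text": "autoimmune hepatitis follow-up",
     "source": "Immunology", "age_bracket": "30-45"},
    {"text": "implanted device, general anesthetic",
     "source": "Anesthesia", "age_bracket": "60+"},
]


def stem(word: str) -> str:
    """Very naive stemmer: lowercase, trim punctuation, strip 'ed' or 's'."""
    w = word.lower().strip(".,;:")
    for suffix in ("ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 4:
            return w[: -len(suffix)]
    return w


def search(query: str, docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Match any query term after stemming; rank by number of matched terms."""
    terms = {stem(t) for t in query.split() if t.upper() != "OR"}
    scored = []
    for doc in docs:
        words = {stem(w) for w in doc["text"].split()}
        score = len(terms & words)
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def facet_counts(hits: List[Dict[str, str]], field: str) -> Counter:
    """Count hits per facet value, as shown in a facet sidebar."""
    return Counter(doc[field] for doc in hits)


hits = search("implants OR anesthetic", DOCS)
by_age = facet_counts(hits, "age_bracket")
by_source = facet_counts(hits, "source")
```

Because both the query and the documents are stemmed, "implants" matches "implant" and "implanted", and the facet counts fall out of the hit list with one line each.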


SLIDE 9

Step 1: Identify Business Keys

Step 2: Create M-M Links

SLIDE 10

Step 3: Add Attributes

HNF Makes Re-Usable ETL Straightforward

BKs, Links, Attribs
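
The three steps can be pictured as three kinds of tables. The sketch below is one possible reading of that pattern, using Python's built-in `sqlite3` module; the slide diagrams did not survive extraction, so every table name, column name, and key choice here is an assumption rather than the presenter's design.

```python
import sqlite3

# Hypothetical hyper-normalized layout: business keys, an M-M link, attributes.
DDL = """
-- Step 1: one table per business key (BK), holding only the natural key.
CREATE TABLE customer_bk (
    customer_sid    INTEGER PRIMARY KEY,
    customer_number TEXT NOT NULL UNIQUE
);
CREATE TABLE product_bk (
    product_sid  INTEGER PRIMARY KEY,
    product_code TEXT NOT NULL UNIQUE
);
-- Step 2: a many-to-many link between two business keys.
CREATE TABLE customer_product_link (
    customer_sid INTEGER NOT NULL REFERENCES customer_bk (customer_sid),
    product_sid  INTEGER NOT NULL REFERENCES product_bk (product_sid),
    PRIMARY KEY (customer_sid, product_sid)
);
-- Step 3: descriptive attributes hang off a business key, kept by load time.
CREATE TABLE customer_attribs (
    customer_sid INTEGER NOT NULL REFERENCES customer_bk (customer_sid),
    loaded_at    TEXT NOT NULL,
    name         TEXT,
    city         TEXT,
    PRIMARY KEY (customer_sid, loaded_at)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Since every table is one of only three shapes, one loader per shape is enough, which is what makes the ETL re-usable.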

SLIDE 11

Parameter-Driven ETL

  • Load_BK (target, source, natural key column list)
  • Load_Link (target, source 1, src 1 natural key cols, source 2, src 2 natural key cols)
  • Load_Attribs (target, source, exclude column list)

BKs, Links, Attribs

“Cookie-cutter ETL”
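
A hedged Python sketch of what three parameter-driven loaders might look like over in-memory rows. The slide gives only the routine names and parameter lists, so the bodies are assumptions. In particular, `load_link` takes a hypothetical `staging` row set carrying both natural keys, and `load_attribs` takes the business-key lookup and key columns as extra parameters that the slide does not list.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def key_of(row: Row, cols: Sequence[str]) -> Key:
    """Build a natural-key tuple from the named columns."""
    return tuple(row[c] for c in cols)


def load_bk(target: Dict[Key, int], source: List[Row],
            natural_key_cols: Sequence[str]) -> None:
    """Insert each unseen natural key once, assigning the next surrogate id."""
    for row in source:
        k = key_of(row, natural_key_cols)
        if k not in target:
            target[k] = len(target) + 1


def load_link(target: Set[Tuple[int, int]], staging: List[Row],
              bk1: Dict[Key, int], bk1_cols: Sequence[str],
              bk2: Dict[Key, int], bk2_cols: Sequence[str]) -> None:
    """Record each pair of surrogate ids seen together in the staging rows."""
    for row in staging:
        target.add((bk1[key_of(row, bk1_cols)], bk2[key_of(row, bk2_cols)]))


def load_attribs(target: Dict[int, Row], source: List[Row],
                 bk: Dict[Key, int], natural_key_cols: Sequence[str],
                 exclude_cols: Sequence[str]) -> None:
    """Copy every column except the keys and the excluded ones."""
    skip = set(natural_key_cols) | set(exclude_cols)
    for row in source:
        sid = bk[key_of(row, natural_key_cols)]
        target[sid] = {c: v for c, v in row.items() if c not in skip}


# Hypothetical staging rows; the same three routines load any subject area.
orders: List[Row] = [
    {"cust_no": "C1", "sku": "P9", "cust_name": "Ann", "batch_id": 7},
    {"cust_no": "C2", "sku": "P9", "cust_name": "Bob", "batch_id": 7},
    {"cust_no": "C1", "sku": "P3", "cust_name": "Ann", "batch_id": 7},
]
customer_bk: Dict[Key, int] = {}
product_bk: Dict[Key, int] = {}
cust_prod_link: Set[Tuple[int, int]] = set()
customer_attribs: Dict[int, Row] = {}

load_bk(customer_bk, orders, ["cust_no"])
load_bk(product_bk, orders, ["sku"])
load_link(cust_prod_link, orders, customer_bk, ["cust_no"], product_bk, ["sku"])
load_attribs(customer_attribs, orders, customer_bk, ["cust_no"], ["sku", "batch_id"])
```

Adding a new subject area then means new parameter values, not new code, which is the sense in which the ETL becomes cookie-cutter.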


SLIDE 12

From HNF to HGF (0 of 5)

Convert to Metadata to Distinguish Instances (1 of 5)

SLIDE 13

“Fold” the Model to Eliminate Separate Tables (2 of 5)

Combine Tables with Equivalent Function (3 of 5)

[Diagram panels: The Model, The Data]

SLIDE 14

Allow Re-Classifications of Instances (4 of 5)

Completely Temporal Data Warehouse (5 of 5)

[Diagram panels: The Model, The Data]
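
The diagrams for these five steps are not recoverable, so the following Python sketch is only one plausible reading of where they end up: a single generic table whose rows carry their class as data (metadata) and a validity period, so that an instance can be re-classified without a schema change. The `Thing` structure, its field names, and the sample classes are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Thing:
    """One row of a hypothetical generalized table: the class is just data."""
    thing_id: int
    class_name: str
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the row is still current


THINGS: List[Thing] = []


def add_thing(thing_id: int, class_name: str, name: str, as_of: date) -> None:
    THINGS.append(Thing(thing_id, class_name, name, as_of))


def reclassify(thing_id: int, new_class: str, as_of: date) -> None:
    """Close the current row and open a new one; history is never overwritten.

    Raises StopIteration if the instance has no current row.
    """
    current = next(
        t for t in THINGS if t.thing_id == thing_id and t.valid_to is None
    )
    current.valid_to = as_of
    THINGS.append(Thing(thing_id, new_class, current.name, as_of))


def class_as_of(thing_id: int, when: date) -> Optional[str]:
    """Answer 'what was this instance classified as on a given date?'"""
    for t in THINGS:
        if (t.thing_id == thing_id and t.valid_from <= when
                and (t.valid_to is None or when < t.valid_to)):
            return t.class_name
    return None


add_thing(1, "Prospect", "Acme Corp", date(2015, 1, 1))
reclassify(1, "Customer", date(2016, 3, 1))
before = class_as_of(1, date(2015, 6, 1))
after = class_as_of(1, date(2016, 3, 9))
```

In this reading, a new class is a new value in `class_name` rather than a new table, and every change is kept as a dated row.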

SLIDE 15

Model Objects Map to Meta Data Entities

1. “Product Type” class exists
2. “Product” class exists
3. Product rolls up to Product Category

Computer-Assisted ETL Programming

SLIDE 16

Automation Surrounds Us

  • Computers build our goods…
  • …land our planes…
  • …will soon drive our cars.

Why are we still building data warehouses by hand?

Ceregenics proprietary information


Henry Ford Considers a Tesla

“Where’s the carburetor? ...the transmission? ...the radiator? I can see all kinds of problems with this car. What a hoax!”

1947 Ford Coupe

SLIDE 17

Why are we still building data warehouses by hand?

  • Don’t understand what’s going on under the hood
  • Don’t want to give up data modeling
  • Don’t want to be the first one to do it
  • Too much invested in 1990s technology
  • Don’t want to eliminate people’s jobs
  • Don’t want to eliminate my department
  • Staff can’t handle learning another tool

Tools for the Business Opportunity Cycle

SLIDE 18

Automated Monitoring for Faster Requirements

Tools for the Business Opportunity Cycle

[Diagram: tool callouts placed around the cycle, listed in extraction order]

  • Citizen Data Scientist Tool
  • 1. Data Virtualization Server; 2. Adaptive Master Data; 3. Enterprise BI Package; 4. Document Data Package
  • 1. Hadoop; 2. Document Data Package
  • XQuery Configuration
  • Data Warehouse Generator
  • 1. Data Warehouse Generator; 2. Citizen Data Scientist Tool

SLIDE 19

Find Your Voice & Help Others to Find Theirs

(Stephen Covey’s “Eighth Habit”)

  • Call for contributors
  • Write a chapter, sidebar, or a section
  • Focus: theory & case histories that blend
      • Disciplined agile & EDW methods
      • Hadoop, M/R, Spark
      • Textual and triple data stores
      • Empowering citizen data scientists

Long Design will Still Delay Value

[Diagram: timelines for Traditional Methods and for Agile Approach + Productivity Tools, with phases labeled Project Definition, Database Design, ETL Coding, Release Review]

  • “Surface Solutions”
  • End-User Hadoop
  • Document data stores
  • Hyper normalization
  • Hyper generalization
  • Agile value cycle

SLIDE 20

999 18th Street, Suite 3000, Denver, CO 80033

303.274.9101

www.Ceregenics.com

Hyper normalization: //www.youtube.com/watch?v=3QOSOeN8vcY

Hyper generalization: //www.youtube.com/watch?v=aNtUoVkeq_Q