overview dw source integration tools and architecture
play

Overview DW Source Integration, Tools, and Architecture DW Front - PowerPoint PPT Presentation

Overview DW Source Integration, Tools, and Architecture DW Front End Tools Source Integration DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course 2 End User Applications


  1. Overview DW Source Integration, Tools, and Architecture • DW Front End Tools • Source Integration • DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course 2 End User Applications (EUA) EUA Concepts • The business impact of the DW! • Templates • Canned reports � Layout/structure + parameters � End user application templates � Compare sales per product in <area> for <period1> and <period2> � Provide answers to common questions • Parameters - chosen at run-time � Can be used as (quality-assured) building blocks for other reports � Come from any level of the given dimension – drill-down • Two extremes � Time (All time, 2002, 2002 4Q, 2002 Dec, 2002 Dec 1) possible � Ad hoc strategic analysis, power users, DIY query tools � Many different � Fixed operational analysis, report consumers, operational reporting • Identify report candidates • EUA fills the gap � Produce a list of candidates � “Tactical” analysis, push-button knowledge workers • Consolidate candidate list � Categorize candidates by data elements Aalborg University 2007 - DWML course 3 Aalborg University 2007 - DWML course 4

  2. What Templates to Choose? Overview • “Analytical Cycle” Steps (repeats) • DW Front End Tools How’s business? – current performance 1) What are the trends? – performance over time 2) • Source Integration What’s unusual? – quick identification of exceptions (+/-) 3) What is driving the exceptions? – find causes for exceptions 4) • DW architecture What if …? – play around with parameters and see effect 5) Make a business decision – small as well as big decisions 6) Implement the decision – feed analysis results into op. systems 7) • Prioritize template list Rank or group templates – implement 15 most important at first � Aalborg University 2007 - DWML course 5 Aalborg University 2007 - DWML course 6 Data Integration Research Projects Views on DW Metadata • Focus on source integration and update propagation • Wrapper: convert source data into a standard format • Information Manifold � Sources: databases, SGML docs, unstructured files,… • Most DW projects: DW architecture as a “stepwise flow” of information from � Relational integration data model source to analyst • TSIMMIS • No conceptual domain model used for � Wrapper/mediator integration Some questions cannot be answered � Semi-structured OEM integration data model � • DWQ project: extended metamodel to • Squirrel capture all relevant aspects � Powerful “integration mediator” • WHIPS � Wrapper/mediator � Relational integration data model Aalborg University 2007 - DWML course 7 Aalborg University 2007 - DWML course 8

  3. Using DW Metadata in the Enterprise Analyst: “Why can’t I answer question X?” • Analyst wants to analyze data Gather data from operational • Possible reasons � departments through OLTP (5) � Certain measures not included in fact table Question travels through (1)-(5) � • Traditional DW (previous slide) � Granularity of facts too coarse (4) only describes step (3)-(4) � Particular dimensions not in DW Cannot solve problems like � � Descriptive attributes missing from dimensions “why can’t I answer quest. X?” (3) • Conceptual relationships between (1) � Meaning of attributes/measures deviate from the enterprise model, operational analyst’s expectation (2) models + DW must be captured � …… Everything is a view on the � enterprise model ! (“local as view”) – unlike previous slide Aalborg University 2007 - DWML course 9 Aalborg University 2007 - DWML course 10 DWQ Metadata Source integration practice • Focus on information integration in databases • Three metadata perspectives (schema and data) must be captured Conceptual (enterprise) • Two main approaches � Logical (data model) � � Constructing integrated enterprise model Physical (data flow) � � Focus on mappings between sources and DW • Framework instantiated by • Tools for DW management conceptual, logical, and physical � Schema integration information models • DW quality heavily depends on � Metadata management DW processes rather than � Based on modeling tools schemas • Tools for data integration • A process meta model is needed � Mapping specification to capture process definitions, and � ETL tools – like last lecture the relationships to DW quality Aalborg University 2007 - DWML course 11 Aalborg University 2007 - DWML course 12

  4. Schema Integration Virtual Data Integration • Only data definition is integrated • Producing one global schema (one-shot or incremental) Data only in sources, queries on views, queries shipped to sources � • Pre-integration Not suited for DW? � Analyzing and annotating source schemata � • Carnot Semantic enrichment of schema, often in richer data model � Individual schemata mapped onto rich GCL ontology (1. order logic) � • Schema comparison Articulation axioms specify mappings, queries mapped to GCL � Determine correlations/conflicts among schema concepts � • SIMS Heterogeneity conflicts – different source data models � Creates common class-based domain model to describe sources � Naming conflicts – homonyms and synonyms � Sources are dynamically chosen and integrated at query time � Semantic conflicts –different abstraction levels � Query reformulation, access planning, optimization, execution � Structural – different constructs � • Information Manifold • Schema conforming Relational world view + information source description + correspondences � Conform/align schemas to make them compatible � Metamodel enriched using description logic/Datalog rules � Typically semi-automatic process � Datalog queries, optimized by choosing ”minimal” sources � • Schema merging and restructuring • TSIMMIS “Superimpose” conformed schemas � Wrappers wrap sources using semi-structured OEM model � Quality: completeness, correctness, minimality, understandability � Mediator performs its own integration – no global integration (global as view) � Aalborg University 2007 - DWML course 13 Aalborg University 2007 - DWML course 14 Materialized Data Integration DWQ Source Integration • Current DW tools cannot fully support DW quality � No support for validation of interschema assertions and other • Views on source data are materialized in integrated DB specified relationships, i.e., the DW design process • Squirrel • Conceptual perspective � Integration mediators incrementally maintain materialized views � Domain model = enterprise model + source models Cooperation of sources required � � Consolidated and reconciled description of important concepts • WHIPS ◆ Not all enterprise data captured (at first, incremental approach) � Relational SPJ + aggregation views specified in view tree � Logic-based formalism allows reasoning over metadata � View manager computes view and handles updates � Intermodel assertions capture interdependencies � Integrator ensures view maintainability • Logical perspective � Global query processor queries sources using wrappers/mediators � Source schemata + DW schema in logical data model (relational) • In combination with virtual integration? � Defined as queries over the corresponding conceptual component • Physical perspective � The actual data stores Aalborg University 2007 - DWML course 15 Aalborg University 2007 - DWML course 16

  5. DWQ Source Integration Architecture DWQ Source Integration Methodology • Source-driven integration � Enterprise and source model construction � Source model integration (into the domain model) � Source and DW schema specification (+ mappings) � Data integration and reconciliation � Quality analysis steps in all phases above • Client-driven integration � New client query considered � Reasoning determines whether query can be answered by materialized views already in DW ◆ Query containment reasoning � If DW not sufficient, materialize new concepts in domain model? � Otherwise, new sources must be added using source-driven integr. Note explicit mappings! Aalborg University 2007 - DWML course 17 Aalborg University 2007 - DWML course 18 Overview Lifecycle Overview • DW Front End Tools Technical Product Technical Product • Source Integration Architecture Selection& Architecture Selection& Design Installation Design Installation • DW architecture Data Staging Business Data Staging Business Dimensional Physical Project Dimensional Physical Maintenance Project Maintenance Design & Requirements Design & Deployment Requirements Deployment Modeling Design Planning Modeling Design and Growth Planning and Growth Development Definition Development Definition End-User End-User End-User End-User Application Application Application Application Specification Development Specification Development Project Management Project Management Aalborg University 2007 - DWML course 19 Aalborg University 2007 - DWML course 20

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend