OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL - - PDF document

olap data mining olap data mining agenda agenda
SMART_READER_LITE
LIVE PREVIEW

OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL - - PDF document

OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos Agenda Agenda SQL Server Features (in short) SQL Server Features


slide-1
SLIDE 1

OLAP & Data Mining OLAP & Data Mining Agenda Agenda

SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos

slide-2
SLIDE 2

Agenda Agenda

SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos

slide-3
SLIDE 3

SQL Server Features SQL Server Features

.NET Inside the database .NET Inside the database Native support of XML & Web Services Native support of XML & Web Services Visual Studio Integration Visual Studio Integration SQL Server Mobile Enhancements SQL Server Mobile Enhancements 64 64-

  • bit support

bit support Notification Services Notification Services Integration Services Integration Services Scalability & Availability Improvements Scalability & Availability Improvements

What .NET Inside the Database means? What .NET Inside the Database means?

Same experience as developing normal Same experience as developing normal applications applications Place part of the business logic inside the database Place part of the business logic inside the database Extend data types with user defined types Extend data types with user defined types Better Performance Better Performance

No connections required No connections required Managed Code is compiled Managed Code is compiled – – TSQL TSQL interpeted interpeted

Write object oriented applications and avoid T Write object oriented applications and avoid T-

  • SQL

SQL Use the rich API of .NET Framework Use the rich API of .NET Framework Expose DB functionality via web services Expose DB functionality via web services

slide-4
SLIDE 4

SQL Server 2005 SQL Server 2005

A Complete Enterprise Data Management and BI Solution

Analysis Services Analysis Services

OLAP & Data Mining OLAP & Data Mining

Data Transformation Data Transformation Services Services

ETL ETL

SQL Server SQL Server

Relational Engine Relational Engine

Reporting Services Reporting Services Management Tools Management Tools Development Tools Development Tools

Agenda Agenda

SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos

slide-5
SLIDE 5

Relational Versus OLAP Reports Relational Versus OLAP Reports

  • OLAP

OLAP

  • Ease of navigation and exploration

Ease of navigation and exploration

  • Detail reporting

Detail reporting

  • Rich analytics

Rich analytics

  • End

End-

  • user oriented

user oriented

  • High performance

High performance

  • Simple management

Simple management

  • Single data store

Single data store

  • Real time data access

Real time data access

  • Flexible schema

Flexible schema Relational Relational Feature Feature

Basic Principles Basic Principles

OLAP databases are OLAP databases are denormalized denormalized databases databases

Normalization is good for updates Normalization is good for updates – – inserts but bad for inserts but bad for performance performance

Process data in advance to provide better response Process data in advance to provide better response times times Process more Process more data data need need for extra disk for extra disk space space takes takes more more time time faster faster response: We should find the balance response: We should find the balance Precalculate Precalculate values that users may ask for values that users may ask for No real No real-

  • time data

time data Use Integration Services (DTS) to build and update the Use Integration Services (DTS) to build and update the datawarehouse datawarehouse

slide-6
SLIDE 6

Concepts Concepts

Fact tables Fact tables Dimension tables Dimension tables Measures Measures Star and Snowflake schemas Star and Snowflake schemas Cube Cube Storage options: ROLAP Storage options: ROLAP-

  • HOLAP

HOLAP-

  • MOLAP

MOLAP

Multidimensional Data

Sales volume as a function of product, Sales volume as a function of product, month, and region month, and region

Product R e g i

  • n

Month

Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day

slide-7
SLIDE 7

A Concept Hierarchy: Dimension (location) A Concept Hierarchy: Dimension (location)

all Europe North_America Mexico Canada Spain Germany Vancouver

  • M. Wind
  • L. Chan

... ... ... ... ... ... all region

  • ffice

country Toronto Frankfurt city

A Sample Data Cube

Total annual sales

  • f TV in U.S.A.

Date Product Country

sum sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum

slide-8
SLIDE 8

Agenda Agenda

SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos

Value of Data Mining Value of Data Mining

5 new algorithms 5 new algorithms Graphical tools/wizards Graphical tools/wizards 12 embeddable viewers 12 embeddable viewers Tightly integrated with AS, DTS, Reporting Tightly integrated with AS, DTS, Reporting Integration with Web/Office apps Integration with Web/Office apps

SQL Server 2005 SQL Server 2005 OLAP Reports (Adhoc) Reports (Static) Data Mining

Business Knowledge

Easy Difficult

Usability Relative Business Value

slide-9
SLIDE 9

Algorithms Algorithms

Data Mining algorithms developed in conjunction Data Mining algorithms developed in conjunction with Microsoft Research address the full range of with Microsoft Research address the full range of data mining applications data mining applications

Classification, Regression, Segmentation, Classification, Regression, Segmentation, Association, Forecasting, Text Analysis, and Association, Forecasting, Text Analysis, and Advanced Data Exploration Advanced Data Exploration

Algorithm complexity is hidden behind a Algorithm complexity is hidden behind a consistent, familiar API consistent, familiar API

Data Mining logic accessed through SQL queries, Data Mining logic accessed through SQL queries, results provided through datasets, results provided through datasets, rowsets rowsets Applications can use SQL Server Data Mining to apply Applications can use SQL Server Data Mining to apply learned rules, or show patterns to the user learned rules, or show patterns to the user

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

slide-10
SLIDE 10

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Typical Business Questions Typical Business Questions

What type of membership card What type of membership card should I offer? should I offer? Which customers will respond to Which customers will respond to my mailing? my mailing? Is this transaction fraudulent? Is this transaction fraudulent? Will I lose this customer? Will I lose this customer? Will this product be defective? Will this product be defective?

Algorithms Algorithms

Preferred Preferred

Decision Trees Decision Trees Na Naï ïve Bayes ve Bayes Neural Nets Neural Nets

Also Can Also Can

Clustering Clustering Sequence Clustering Sequence Clustering Association Rules Association Rules

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Typical Business Questions Typical Business Questions

How much revenue will I get from How much revenue will I get from this customer? this customer? How long will this asset be in How long will this asset be in service? service?

Algorithms Algorithms

Preferred Preferred

Decision Trees Decision Trees Neural Nets Neural Nets

Also Can Also Can

Clustering Clustering Sequence Clustering Sequence Clustering

slide-11
SLIDE 11

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Typical Business Questions Typical Business Questions

Describe my customers Describe my customers How can I differentiate my How can I differentiate my customers? customers? How can I organize my data in a How can I organize my data in a manner that makes sense? manner that makes sense?

Algorithms Algorithms

Preferred Preferred

Clustering Clustering Sequence Clustering Sequence Clustering

Also Can Also Can

Neural Nets Neural Nets

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Typical Business Questions Typical Business Questions

Market Basket Analysis/Cross Market Basket Analysis/Cross Sales Sales What items are bought together? What items are bought together? What products should I What products should I recommend to my customers? recommend to my customers?

Algorithms Algorithms

Preferred Preferred

Association Rules Association Rules Decision Trees (small catalogs) Decision Trees (small catalogs)

Also Can Also Can

Clustering, Sequence Clustering, Clustering, Sequence Clustering, Na Naï ïve Bayes, Neural Nets ve Bayes, Neural Nets

slide-12
SLIDE 12

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Typical Business Questions Typical Business Questions What are projected revenues What are projected revenues for all products? for all products? What are inventory levels next What are inventory levels next month? month?

Algorithms Algorithms

Time Series Time Series

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Analysis of unstructured data Analysis of unstructured data

Key term and key phrase Key term and key phrase extraction extraction Conversion to structured data Conversion to structured data Feed into other algorithms Feed into other algorithms

Classification Classification Segmentation Segmentation Association Association

Typical Business Questions Typical Business Questions

How do I handle call center data? How do I handle call center data? How can I classify mail? How can I classify mail? What can I do with web feedback? What can I do with web feedback?

Implemented inside DTS Implemented inside DTS

slide-13
SLIDE 13

Data Mining Tasks Data Mining Tasks

Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration

Learning more about your data Learning more about your data through visualizations through visualizations Typical Business Questions Typical Business Questions

Why do people churn? Why do people churn? What are the relationships between What are the relationships between products? products? What are the differences between What are the differences between high profit and low profit high profit and low profit customers? customers?

All algorithms All algorithms

Complete Set of Algorithms Complete Set of Algorithms

Decision Trees Decision Trees Clustering Clustering Time Series Time Series Sequence Sequence Clustering Clustering Association Association Na Naï ïve Bayes ve Bayes Neural Net Neural Net

Introduced in SQL Server 2000

slide-14
SLIDE 14

Data Mining Programmability Data Mining Programmability

DMX Query Interface DMX Query Interface

OLEDB, ADO, OLEDB, ADO, ADO.Net ADO.Net, , ADOMD.Net ADOMD.Net, XMLA , XMLA

Dim Dim cmd cmd as as ADOMD.Command ADOMD.Command Dim reader as Dim reader as ADOMD.DataReader ADOMD.DataReader Cmd.Connection Cmd.Connection = = conn conn Set reader = Set reader = Cmd.ExecuteReader( Cmd.ExecuteReader(“ “Select Select Predict(Gender Predict(Gender) )…” …”) )

Data Mining Object Model Data Mining Object Model

Analysis Management Objects (AMO) Analysis Management Objects (AMO) ADOMD.Net ADOMD.Net, Server , Server ADOMD.Net ADOMD.Net Direct access to Mining content Direct access to Mining content CLR User Defined Procedures execute on the server CLR User Defined Procedures execute on the server

Expandability Expandability

Plug Plug-

  • In Algorithms

In Algorithms Plug Plug-

  • In Viewers

In Viewers

Agenda Agenda

SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos