OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL - - PDF document
OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL - - PDF document
OLAP & Data Mining OLAP & Data Mining Agenda Agenda SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos Agenda Agenda SQL Server Features (in short) SQL Server Features
Agenda Agenda
SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos
SQL Server Features SQL Server Features
.NET Inside the database .NET Inside the database Native support of XML & Web Services Native support of XML & Web Services Visual Studio Integration Visual Studio Integration SQL Server Mobile Enhancements SQL Server Mobile Enhancements 64 64-
- bit support
bit support Notification Services Notification Services Integration Services Integration Services Scalability & Availability Improvements Scalability & Availability Improvements
What .NET Inside the Database means? What .NET Inside the Database means?
Same experience as developing normal Same experience as developing normal applications applications Place part of the business logic inside the database Place part of the business logic inside the database Extend data types with user defined types Extend data types with user defined types Better Performance Better Performance
No connections required No connections required Managed Code is compiled Managed Code is compiled – – TSQL TSQL interpeted interpeted
Write object oriented applications and avoid T Write object oriented applications and avoid T-
- SQL
SQL Use the rich API of .NET Framework Use the rich API of .NET Framework Expose DB functionality via web services Expose DB functionality via web services
SQL Server 2005 SQL Server 2005
A Complete Enterprise Data Management and BI Solution
Analysis Services Analysis Services
OLAP & Data Mining OLAP & Data Mining
Data Transformation Data Transformation Services Services
ETL ETL
SQL Server SQL Server
Relational Engine Relational Engine
Reporting Services Reporting Services Management Tools Management Tools Development Tools Development Tools
Agenda Agenda
SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos
Relational Versus OLAP Reports Relational Versus OLAP Reports
- OLAP
OLAP
- Ease of navigation and exploration
Ease of navigation and exploration
- Detail reporting
Detail reporting
- Rich analytics
Rich analytics
- End
End-
- user oriented
user oriented
- High performance
High performance
- Simple management
Simple management
- Single data store
Single data store
- Real time data access
Real time data access
- Flexible schema
Flexible schema Relational Relational Feature Feature
Basic Principles Basic Principles
OLAP databases are OLAP databases are denormalized denormalized databases databases
Normalization is good for updates Normalization is good for updates – – inserts but bad for inserts but bad for performance performance
Process data in advance to provide better response Process data in advance to provide better response times times Process more Process more data data need need for extra disk for extra disk space space takes takes more more time time faster faster response: We should find the balance response: We should find the balance Precalculate Precalculate values that users may ask for values that users may ask for No real No real-
- time data
time data Use Integration Services (DTS) to build and update the Use Integration Services (DTS) to build and update the datawarehouse datawarehouse
Concepts Concepts
Fact tables Fact tables Dimension tables Dimension tables Measures Measures Star and Snowflake schemas Star and Snowflake schemas Cube Cube Storage options: ROLAP Storage options: ROLAP-
- HOLAP
HOLAP-
- MOLAP
MOLAP
Multidimensional Data
Sales volume as a function of product, Sales volume as a function of product, month, and region month, and region
Product R e g i
- n
Month
Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day
A Concept Hierarchy: Dimension (location) A Concept Hierarchy: Dimension (location)
all Europe North_America Mexico Canada Spain Germany Vancouver
- M. Wind
- L. Chan
... ... ... ... ... ... all region
- ffice
country Toronto Frankfurt city
A Sample Data Cube
Total annual sales
- f TV in U.S.A.
Date Product Country
sum sum TV VCR PC 1Qtr 2Qtr 3Qtr 4Qtr U.S.A Canada Mexico sum
Agenda Agenda
SQL Server Features (in short) SQL Server Features (in short) OLAP OLAP Data Mining Data Mining Demos Demos
Value of Data Mining Value of Data Mining
5 new algorithms 5 new algorithms Graphical tools/wizards Graphical tools/wizards 12 embeddable viewers 12 embeddable viewers Tightly integrated with AS, DTS, Reporting Tightly integrated with AS, DTS, Reporting Integration with Web/Office apps Integration with Web/Office apps
SQL Server 2005 SQL Server 2005 OLAP Reports (Adhoc) Reports (Static) Data Mining
Business Knowledge
Easy Difficult
Usability Relative Business Value
Algorithms Algorithms
Data Mining algorithms developed in conjunction Data Mining algorithms developed in conjunction with Microsoft Research address the full range of with Microsoft Research address the full range of data mining applications data mining applications
Classification, Regression, Segmentation, Classification, Regression, Segmentation, Association, Forecasting, Text Analysis, and Association, Forecasting, Text Analysis, and Advanced Data Exploration Advanced Data Exploration
Algorithm complexity is hidden behind a Algorithm complexity is hidden behind a consistent, familiar API consistent, familiar API
Data Mining logic accessed through SQL queries, Data Mining logic accessed through SQL queries, results provided through datasets, results provided through datasets, rowsets rowsets Applications can use SQL Server Data Mining to apply Applications can use SQL Server Data Mining to apply learned rules, or show patterns to the user learned rules, or show patterns to the user
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Typical Business Questions Typical Business Questions
What type of membership card What type of membership card should I offer? should I offer? Which customers will respond to Which customers will respond to my mailing? my mailing? Is this transaction fraudulent? Is this transaction fraudulent? Will I lose this customer? Will I lose this customer? Will this product be defective? Will this product be defective?
Algorithms Algorithms
Preferred Preferred
Decision Trees Decision Trees Na Naï ïve Bayes ve Bayes Neural Nets Neural Nets
Also Can Also Can
Clustering Clustering Sequence Clustering Sequence Clustering Association Rules Association Rules
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Typical Business Questions Typical Business Questions
How much revenue will I get from How much revenue will I get from this customer? this customer? How long will this asset be in How long will this asset be in service? service?
Algorithms Algorithms
Preferred Preferred
Decision Trees Decision Trees Neural Nets Neural Nets
Also Can Also Can
Clustering Clustering Sequence Clustering Sequence Clustering
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Typical Business Questions Typical Business Questions
Describe my customers Describe my customers How can I differentiate my How can I differentiate my customers? customers? How can I organize my data in a How can I organize my data in a manner that makes sense? manner that makes sense?
Algorithms Algorithms
Preferred Preferred
Clustering Clustering Sequence Clustering Sequence Clustering
Also Can Also Can
Neural Nets Neural Nets
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Typical Business Questions Typical Business Questions
Market Basket Analysis/Cross Market Basket Analysis/Cross Sales Sales What items are bought together? What items are bought together? What products should I What products should I recommend to my customers? recommend to my customers?
Algorithms Algorithms
Preferred Preferred
Association Rules Association Rules Decision Trees (small catalogs) Decision Trees (small catalogs)
Also Can Also Can
Clustering, Sequence Clustering, Clustering, Sequence Clustering, Na Naï ïve Bayes, Neural Nets ve Bayes, Neural Nets
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Typical Business Questions Typical Business Questions What are projected revenues What are projected revenues for all products? for all products? What are inventory levels next What are inventory levels next month? month?
Algorithms Algorithms
Time Series Time Series
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Analysis of unstructured data Analysis of unstructured data
Key term and key phrase Key term and key phrase extraction extraction Conversion to structured data Conversion to structured data Feed into other algorithms Feed into other algorithms
Classification Classification Segmentation Segmentation Association Association
Typical Business Questions Typical Business Questions
How do I handle call center data? How do I handle call center data? How can I classify mail? How can I classify mail? What can I do with web feedback? What can I do with web feedback?
Implemented inside DTS Implemented inside DTS
Data Mining Tasks Data Mining Tasks
Classification Classification Regression Regression Segmentation Segmentation Association Association Forecasting Forecasting Text Analysis Text Analysis Advanced Data Advanced Data Exploration Exploration
Learning more about your data Learning more about your data through visualizations through visualizations Typical Business Questions Typical Business Questions
Why do people churn? Why do people churn? What are the relationships between What are the relationships between products? products? What are the differences between What are the differences between high profit and low profit high profit and low profit customers? customers?
All algorithms All algorithms
Complete Set of Algorithms Complete Set of Algorithms
Decision Trees Decision Trees Clustering Clustering Time Series Time Series Sequence Sequence Clustering Clustering Association Association Na Naï ïve Bayes ve Bayes Neural Net Neural Net
Introduced in SQL Server 2000
Data Mining Programmability Data Mining Programmability
DMX Query Interface DMX Query Interface
OLEDB, ADO, OLEDB, ADO, ADO.Net ADO.Net, , ADOMD.Net ADOMD.Net, XMLA , XMLA
Dim Dim cmd cmd as as ADOMD.Command ADOMD.Command Dim reader as Dim reader as ADOMD.DataReader ADOMD.DataReader Cmd.Connection Cmd.Connection = = conn conn Set reader = Set reader = Cmd.ExecuteReader( Cmd.ExecuteReader(“ “Select Select Predict(Gender Predict(Gender) )…” …”) )
Data Mining Object Model Data Mining Object Model
Analysis Management Objects (AMO) Analysis Management Objects (AMO) ADOMD.Net ADOMD.Net, Server , Server ADOMD.Net ADOMD.Net Direct access to Mining content Direct access to Mining content CLR User Defined Procedures execute on the server CLR User Defined Procedures execute on the server
Expandability Expandability
Plug Plug-
- In Algorithms
In Algorithms Plug Plug-
- In Viewers
In Viewers