Global innovative Leadership Module Disclaimer > The information - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Global innovative Leadership Module

Disclaimer: The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.

slide-2
SLIDE 2

Data Mining

slide-3
SLIDE 3

The process of analyzing data to discover hidden patterns and relationships that can help you manage and improve your business.

What is data mining?

slide-4
SLIDE 4

There are two new types of mining:

  • Text mining
  • Web mining.

They increase both the accuracy and depth of the insights uncovered through your data mining efforts.

What types of data are used in data mining?

slide-5
SLIDE 5

Categorical variables:

  • Nominal: categories with no ranking (e.g. gender, race/ethnicity, place of birth, etc.);
  • Ordinal: categories with a ranking (e.g. educational level, income categories, Likert scales (strongly agree, agree, disagree, strongly disagree), etc.).

Continuous variables:

  • Continuous: a zero point and equal distance between values (e.g. age, height, weight, number of hours studying a day, etc.).

Data types
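As a minimal sketch of how the variable types above differ in practice, the Python snippet below shows which summary is meaningful for each type: a mode for nominal data, a median for ordinal data, and a mean for continuous data. The sample values and the `LIKERT_ORDER` list are invented for illustration and are not part of the original slides.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical sample data, invented for illustration only.
birthplaces = ["Rome", "Paris", "Rome", "Berlin", "Rome"]            # nominal
answers = ["agree", "strongly agree", "disagree", "agree", "agree"]  # ordinal
ages = [23, 35, 41, 29, 32]                                          # continuous

# Nominal: categories have no order, so only counts and the mode make sense.
nominal_mode = Counter(birthplaces).most_common(1)[0][0]

# Ordinal: categories are ranked, so we can map them to ranks and take a median.
LIKERT_ORDER = ["strongly disagree", "disagree", "agree", "strongly agree"]
ranks = [LIKERT_ORDER.index(a) for a in answers]
ordinal_median = LIKERT_ORDER[median_low(ranks)]

# Continuous: equal distances between values, so an arithmetic mean is valid.
continuous_mean = mean(ages)

print(nominal_mode, ordinal_median, continuous_mean)
```

Taking a mean of the nominal values would be meaningless, which is why the variable type should be settled before choosing an analysis.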

slide-6
SLIDE 6
  • Increasing revenues from customers
  • Understanding customer segments and preferences
  • Identifying profitable customers and acquiring new ones
  • Improving cross-selling and up-selling
  • Retaining customers and increasing loyalty

What business problems does data mining solve?

slide-7
SLIDE 7
  • Increasing ROI and reducing marketing campaign costs
  • Detecting fraud, waste, and abuse
  • Determining credit risks
  • Increasing Web site profitability
  • Increasing retail store traffic and optimizing layouts for increased sales
  • Monitoring business performance

What business problems does data mining solve?

slide-8
SLIDE 8
  • SPSS data mining products and services ensure timely, reliable results by supporting the CRoss-Industry Standard Process for Data Mining (CRISP-DM). CRISP-DM provides step-by-step guidelines, tasks, and objectives for every stage of the data mining process.

How does the data mining process work?

slide-9
SLIDE 9
  • Business understanding: Achieve a clear understanding of your business challenges
  • Data understanding: Determine what data are available to mine for answers
  • Data preparation: Prepare the data in the appropriate format to answer your business questions
  • Modeling: Design data models to meet your requirements
  • Evaluation: Test your results against the goals of your project
  • Deployment: Make the results of the project available to decision makers

Six phases in CRISP-DM

slide-10
SLIDE 10
  • Make sure project stakeholders know that data mining is not a silver bullet that magically solves business problems.
  • As with any business problem, stakeholders need to find a solvable problem and work on the solution.

Set expectations

slide-11
SLIDE 11
  • Know “who, what, when, where, why, and how” from a business perspective
  • Develop a thorough understanding of the project parameters: the current business situation, the primary business objective of the project, the criteria for success, and who will determine the success of the project.

Business understanding

slide-12
SLIDE 12

Make sure to go over every aspect of the project in advance to ensure you have what you need for success:

  • Personnel (project sponsor, business, and technical experts)
  • Data sources (access to warehouse or operational data)
  • Computing resources (hardware, platforms)
  • Software (data mining and other relevant software)

Assess the situation and inventory resources

slide-13
SLIDE 13
  • List and clarify all of the assumptions you have made about:
      • Data quality (accuracy, availability)
      • External factors (economic issues, competition, technical advances)
      • Internal factors (the business problem)
      • Models (Is it necessary to understand, describe, or explain the models to senior management?)

What assumptions are being made about the project?

slide-14
SLIDE 14
  • Gather all of the data you will need for your project.
  • A web mining tool will add a deeper level of insight to the project.
  • Up to 80 percent of your data may be hidden in text documents. A text mining tool can efficiently search these sources for valuable information.

Make sure the data are available

slide-15
SLIDE 15

Select your data

Decide what data to use for analysis and list the reasons for your decisions. This involves:

  • Performing significance and correlation tests to determine which fields to include
  • Selecting data subsets
  • Using sampling techniques to review small chunks of data for appropriateness

Data preparation
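The field selection and sampling steps above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the values, and the `THRESHOLD` cut-off are invented, and the sketch covers a plain Pearson correlation check only, not a formal significance test.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical target and candidate fields, invented for illustration.
target = [10, 20, 30, 40, 50, 60]
fields = {
    "visits": [1, 2, 3, 4, 5, 6],            # moves with the target
    "shoe_size": [42, 38, 44, 39, 43, 40],   # unrelated to the target
}

THRESHOLD = 0.5  # assumed cut-off; a real project would justify this choice

# Keep only the fields whose absolute correlation with the target is high enough.
selected = [
    name for name, values in fields.items()
    if abs(pearson(values, target)) >= THRESHOLD
]

# Draw a small random sample of records to review for appropriateness.
random.seed(0)
sample = random.sample(list(zip(fields["visits"], target)), k=3)

print(selected, sample)
```

Recording why each field was kept or dropped (here, its correlation with the target) gives you the list of reasons the slide asks for.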

slide-16
SLIDE 16

Integrate Data

  • Joining multiple data tables
  • Summarization/aggregation of data
  • Deriving new variables

Data Preparation Phase
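As a rough sketch of the three integration steps on this slide (joining tables, aggregating, and deriving a new variable), the snippet below uses plain Python dictionaries. The `customers` and `orders` tables and the `avg_order_value` variable are hypothetical examples, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical tables, invented for illustration.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 2, "amount": 25.0},
]

# Summarization/aggregation: total amount and number of orders per customer.
totals = defaultdict(float)
counts = defaultdict(int)
for order in orders:
    totals[order["customer_id"]] += order["amount"]
    counts[order["customer_id"]] += 1

# Join the aggregates onto the customer table and derive a new variable.
integrated = []
for customer in customers:
    cid = customer["customer_id"]
    n = counts[cid]
    integrated.append({
        **customer,
        "total_spent": totals[cid],
        "order_count": n,
        "avg_order_value": totals[cid] / n if n else 0.0,  # derived variable
    })

for row in integrated:
    print(row)
```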

slide-17
SLIDE 17

Select Data

  • Attribute subset selection
  • Rationale for inclusion/exclusion
  • Data sampling
  • Training/validation and test sets

Data Transformation

  • Using functions such as log
  • Factor/Principal Components analysis
  • Normalization/Discretisation/Binarisation

Clean Data

  • Handling missing values/outliers

Data Preparation Phase
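A minimal sketch of several of the preparation steps named on this slide is given below: filling missing values, a log transformation, min-max normalization, binarisation, and a training/validation/test split. The income values, the median fill, the 0.75 binarisation threshold, and the 60/20/20 split are all assumptions chosen for illustration.

```python
import math
import random
from statistics import median

# Hypothetical income field; None marks a missing value.
incomes = [1000.0, None, 10000.0, 100.0, None, 1000.0]

# Clean: replace missing values with the median of the observed values.
fill = median(v for v in incomes if v is not None)
cleaned = [fill if v is None else v for v in incomes]

# Transform: a log function compresses the wide range of values.
logged = [math.log10(v) for v in cleaned]

# Normalize: rescale to the range [0, 1] (min-max normalization).
low, high = min(logged), max(logged)
normalised = [(v - low) / (high - low) for v in logged]

# Binarise: flag records in the top part of the range (threshold is assumed).
binarised = [1 if v >= 0.75 else 0 for v in normalised]

# Split: shuffle record ids, then cut into training/validation/test sets.
record_ids = list(range(10))
random.Random(42).shuffle(record_ids)
train_ids, validation_ids, test_ids = record_ids[:6], record_ids[6:8], record_ids[8:]

print(cleaned, binarised, len(train_ids), len(validation_ids), len(test_ids))
```

Fixing the random seed makes the split reproducible, which helps when the rationale for each preparation decision has to be documented.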

slide-18
SLIDE 18
  • Testing is crucial before and after building a model.
  • To create a model, run your modeling tool on the dataset you have prepared.
  • Create a detailed model report that lists the rules produced, the parameter settings used, the model’s behavior and interpretation, and any conclusions about patterns revealed in the data.

Build your model

slide-19
SLIDE 19

Select the appropriate modeling technique

  • Data pre-processing implications
      • Attribute independence
      • Data types/Normalisation/Distributions
  • Dependent on

Develop a testing regime

  • Sampling
  • Verify samples have similar characteristics and are representative of the population

The Modeling Phase

slide-20
SLIDE 20

Build Model

  • Choose initial parameter settings
  • Study model behaviour
  • Sensitivity analysis

Assess the model

  • Beware of over-fitting
  • Investigate the error distribution
  • Identify segments of the state space where the model is less effective
  • Iteratively adjust parameter settings
  • Document the reasons for these changes

The Modeling Phase
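To make the over-fitting warning on this slide concrete, here is a small sketch under stated assumptions. It compares a model that memorizes the training data (a 1-nearest-neighbour rule) with a simple threshold rule. The data, including two deliberately mislabelled training records, and the threshold of 5 are invented for illustration.

```python
# Hypothetical 1-D data: the true label is 1 when x >= 5.
# Two training records (x=3 and x=8) carry deliberately noisy labels.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
validation = [(2.9, 0), (3.2, 0), (7.9, 1), (8.2, 1), (1.5, 0), (6.5, 1)]

def one_nn(x):
    """Memorizing model: return the label of the closest training record."""
    nearest = min(train, key=lambda record: abs(record[0] - x))
    return nearest[1]

def threshold_rule(x):
    """Simple model: a single cut-off (assumed to be 5)."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    """Fraction of records for which the model predicts the given label."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, model in [("1-NN", one_nn), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, validation))
```

The memorizing model is perfect on the training data but does much worse on the validation data, because it has learned the noise. A large gap between training and validation accuracy is the usual warning sign of over-fitting.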

slide-21
SLIDE 21

Validate Model

  • Human evaluation of results by domain experts
  • Evaluate usefulness of results from business perspective
  • Define control groups
  • Calculate lift curves
  • Expected Return on Investment

Review Process

Determine next steps

  • Potential for deployment
  • Deployment architecture
  • Metrics for success of deployment

The Evaluation Phase
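The "calculate lift curves" step can be illustrated with the short sketch below. Cumulative lift compares the response rate among the top-scored records with the overall response rate. The scores and outcomes are invented, and the function name `cumulative_lift` is a hypothetical helper, not part of any particular tool.

```python
def cumulative_lift(scores, outcomes, n_bins=4):
    """Cumulative lift after each of n_bins equal-size slices of the ranked list."""
    # Rank the actual outcomes by model score, highest score first.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    base_rate = sum(ranked) / len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        top = ranked[: len(ranked) * b // n_bins]
        lifts.append((sum(top) / len(top)) / base_rate)
    return lifts

# Hypothetical model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]

print(cumulative_lift(scores, outcomes))  # [2.0, 1.5, 1.0, 1.0]
```

A lift of 2.0 in the first slice means that contacting only the top quarter of scored records finds responders at twice the overall rate. The lift always returns to 1.0 once every record is included.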

slide-22
SLIDE 22
  • Summarize deployable models or software results
  • Develop and evaluate alternative deployment plans
  • Confirm how the results will be distributed to recipients
  • Determine how to monitor the use of the results and measure the benefits
  • Identify possible problems and pitfalls of deployment

Deployment: Create a deployment plan

slide-23
SLIDE 23
  • To create your final report, first:
      • Identify which reports are needed (slides, management summary, etc.)
      • Analyze how well the data mining goals were met
      • Identify report recipients
      • Outline the structure and content of the report
      • Select which discoveries to include

Deployment: Create a final report

slide-24
SLIDE 24
  • Interview all significant project members about their experiences
  • Interview any end users of your data mining results about their experiences
  • Document and analyze the specific data mining steps that you took

Review the project

slide-25
SLIDE 25
  • Knowledge Deployment is specific to objectives
      • Knowledge Presentation
      • Deployment within Scoring Engines and Integration with the current IT infrastructure
      • XML interfaces to 3rd party tools
      • Generation of a report
      • Monitoring and evaluation of effectiveness
  • Process deployment/production
  • Produce final project report

Document everything along the way

The Deployment Phase

slide-26
SLIDE 26
  • Look for a tool with a proven record of solving the business problems your project addresses.
  • Choose a tool that you know to be useful in solving problems within your industry and that has a successful track record with the types of applications you’re planning.

Selecting a Data Mining Tool

slide-27
SLIDE 27

Classification

  • The process of identifying the group to which an object belongs by examining characteristics of the object. In classification, the groups are defined by an external criterion (contrast with clustering).

Data Mining Tasks
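As a minimal sketch of classification as defined above, the snippet below trains a nearest-centroid classifier on records whose group labels are supplied externally. The features, labels, and function names are hypothetical. Nearest-centroid is just one simple technique chosen for illustration; the slides do not prescribe it.

```python
import math

# Hypothetical labelled records: (annual_spend, visits_per_month) -> given label.
labelled = [
    ((100.0, 1.0), "occasional"),
    ((120.0, 2.0), "occasional"),
    ((900.0, 8.0), "loyal"),
    ((1100.0, 10.0), "loyal"),
]

def train_centroids(records):
    """Average the feature vectors of each externally defined group."""
    sums = {}
    for features, label in records:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: [t / n for t in total] for label, (total, n) in sums.items()}

def classify(features, centroids):
    """Assign a new object to the group with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = train_centroids(labelled)
print(classify((150.0, 2.0), centroids))   # occasional
print(classify((950.0, 7.0), centroids))   # loyal
```

The labels come with the training data, which is what "defined by an external criterion" means here. The model only learns how to assign new objects to those existing groups.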

slide-28
SLIDE 28

Clustering

  • The process of grouping records based on similarity.

Clustering divides a dataset so that records with similar content are in the same group, and groups are as different as possible from each other (contrast with classification).
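In contrast to classification, clustering receives no labels. The sketch below runs a bare-bones k-means procedure on one-dimensional values. The data, the initial centers, and the fixed iteration count are assumptions made for illustration, and k-means is only one of several possible clustering techniques.

```python
def k_means_1d(values, centers, iterations=10):
    """Group values around k centers by repeated assign-and-update steps."""
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabelled values with two visible groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[1.0, 3.0])

print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

The groups emerge from the similarity of the records alone: similar values end up together and the two clusters are far apart.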

slide-29
SLIDE 29

Segmentation

Why Segmentation?

  • Used by e.g. retail and consumer product companies trying to learn about and describe their customers' buying habits, gender, age, income level, etc.
  • A valuable approach in Market Research, and SPSS offers some useful tools to facilitate this commercial process.

slide-30
SLIDE 30
  • Factor Analysis - to find patterns within variables
  • Categories - use if the data do not fit the assumptions for Factor Analysis
  • Cluster Analysis - to find patterns between individuals
  • Two-Step Cluster - to use with both categorical and continuous variables
  • Discriminant Analysis - to look for differences between groups and try to predict a target variable
  • Answer Tree (decision tree) - combinations of data, to predict a target

Which test to use?

slide-31
SLIDE 31

Thanks for your attention