Data Analysis and Knowledge Extraction (Analisi dei dati ed estrazione di conoscenza): Mastering Data Mining




slide-1
SLIDE 1

Data Analysis and Knowledge Extraction (Analisi dei dati ed estrazione di conoscenza)

Mastering Data Mining

Fosca Giannotti, Pisa KDD Lab, ISTI-CNR & Univ. Pisa

http://www-kdd.isti.cnr.it/

Dipartimento di Informatica - Università di Pisa, academic year 2005/2006

slide-2
SLIDE 2

Mastering Data Mining

slide-3
SLIDE 3

The KDD process

[Figure: the KDD process as a pipeline]

Data Sources → Data Consolidation → Consolidated Data (Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models (e.g. p(x)=0.02) → Interpretation and Evaluation → Knowledge

slide-4
SLIDE 4

The KDD Process and the virtuous cycle

[Figure, credited to CogNova Technologies: the KDD process pipeline (Data Sources, Data Consolidation, Warehouse / Consolidated Data, Selection and Preprocessing, Prepared Data, Data Mining, Patterns & Models such as p(x)=0.02, Interpretation and Evaluation, Knowledge) embedded in the virtuous cycle]

The virtuous cycle

  • Identify Problem or Opportunity
  • Act on Knowledge
  • Measure effect of Action

Labels on the cycle: Problem, Knowledge, Strategy, Results

slide-5
SLIDE 5

Business Intelligence

Business Intelligence is a global term for all the processes, techniques and tools that support business decision-making based on information technology. The approaches can range from a simple spreadsheet to a major competitive undertaking. Data mining is an important new component of business undertaking.
slide-6
SLIDE 6

Business intelligence technologies

[Figure: pyramid of technologies. The potential to support business decisions increases from bottom to top.]

Layers, from top to bottom:

  • Making Decisions
  • Data Presentation: Visualization Techniques
  • Data Mining: Information Discovery
  • Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
  • Data Warehouses / Data Marts
  • Data Sources: Paper, Files, Information Providers, Database Systems, OLTP

Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA

slide-7
SLIDE 7

Analogy: Anthony's pyramid

It classifies the activities carried out in an organization and identifies the role of information systems in supporting those activities.

Strategic activities (strategic planning)

  • Choosing the company's objectives
  • Choosing the resources needed to achieve them
  • Defining the company's policies of conduct

Tactical activities (planning and control)

  • Scheduling the available resources
  • Checking that the planned objectives are achieved

Operational activities

  • Running the company's day-to-day activities

slide-8
SLIDE 8

Applications, operations, techniques

slide-9
SLIDE 9

Roles in the KDD process

slide-10
SLIDE 10

A business intelligence environment

slide-11
SLIDE 11

How to develop a Data Mining Project?

slide-12
SLIDE 12

CRISP-DM: The life cycle of a data mining project

KDD Process

slide-13
SLIDE 13

Business understanding

Understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan.

  • Determine the business objectives
  • Determine the data requirements for the business objectives
  • Translate business questions into data mining objectives

slide-14
SLIDE 14

CRISP-DM phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment

Business Understanding: tasks and their outputs

  • Determine Business Objectives: Background; Business Objectives; Business Success Criteria
  • Assess Situation: Inventory of Resources; Requirements, Assumptions and Constraints; Risks and Contingencies; Terminology; Costs & Benefits
  • Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria
  • Produce Project Plan: Project Plan; Assessment of Tools and Techniques

slide-15
SLIDE 15

Data understanding

Data understanding: characterize the data available for modelling. Provide assessment and verification for the data.

slide-16
SLIDE 16

Data Understanding: tasks and their outputs

  • Collect Initial Data: Initial Data Collection Report
  • Describe Data: Data Description Report
  • Explore Data: Data Exploration Report
  • Verify Data Quality: Data Quality Report

slide-17
SLIDE 17

Data Preparation: tasks and their outputs

  • Select Data: Rationale for Inclusion / Exclusion
  • Clean Data: Data Cleaning Report
  • Construct Data: Derived Attributes; Generated Records
  • Integrate Data: Merged Data
  • Format Data: Reformatted Data

Overall output: Resulting Dataset Description

slide-18
SLIDE 18

Modeling:

In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often necessary.
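The calibration step described above can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour classifier and a hold-out validation set, neither of which comes from the slides; the toy data, the function names and the candidate values of k are all hypothetical.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Predict the majority label among the k training rows nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k):
    """Fraction of validation rows classified correctly with parameter k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

def calibrate_k(train, validation, candidates):
    """Try each candidate k and keep the one with the best validation accuracy."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Toy data (hypothetical): two clusters plus one mislabeled point near cluster "a".
train = [
    ((0.0, 0.0), "a"), ((0.5, 0.2), "a"), ((0.2, 0.6), "a"),
    ((0.4, 0.4), "b"),  # noisy record
    ((5.0, 5.0), "b"), ((5.3, 4.8), "b"), ((4.7, 5.2), "b"),
]
validation = [
    ((0.3, 0.3), "a"), ((1.0, 0.8), "a"),
    ((4.9, 5.1), "b"), ((4.2, 4.4), "b"),
]

best_k, scores = calibrate_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

With k=1 the noisy record misleads the classifier, which is why a larger k wins on this toy set. In a real project the same loop would typically be repeated for several techniques, and a technique that needs data in a different form would send the work back to data preparation.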

slide-19
SLIDE 19

Modeling: tasks and their outputs

  • Select Modeling Technique: Modeling Technique; Modeling Assumptions
  • Generate Test Design: Test Design
  • Build Model: Parameter Settings; Models; Model Description
  • Assess Model: Model Assessment; Revised Parameter Settings

slide-20
SLIDE 20

Evaluation

At this stage in the project you have built a model (or models) that appears to have high quality from a data analysis perspective. Evaluate the model and review the steps executed to construct the model to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered.

slide-21
SLIDE 21

Evaluation: tasks and their outputs

  • Evaluate Results: Assessment of Data Mining Results; Approved Models
  • Review Process: Review of Process
  • Determine Next Steps: List of Possible Actions; Decisions

slide-22
SLIDE 22

Deployment:

The knowledge gained will need to be organized and presented in a way that the customer can use it. It often involves applying “live” models within an organization’s decision making processes, for example in real-time personalization of Web pages or repeated scoring of marketing databases.
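As an illustration of repeated scoring, the sketch below applies a fixed, already approved model to each new batch of customer records. Everything in it is an assumption made for the example: the logistic form of the model, the coefficient values, the field names and the 0.5 threshold.

```python
import math

# Hypothetical coefficients of an approved model (assumption, not from the slides).
MODEL = {"intercept": -2.0, "recent_purchases": 0.8, "opened_last_mail": 1.2}

def score(customer):
    """Return the response probability of one customer under the logistic model."""
    z = MODEL["intercept"] + sum(
        weight * customer[field]
        for field, weight in MODEL.items()
        if field != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(customers, threshold=0.5):
    """Return the ids of the customers whose score reaches the threshold."""
    return [c["id"] for c in customers if score(c) >= threshold]

# One batch extracted from a (hypothetical) marketing database.
customers = [
    {"id": 1, "recent_purchases": 3, "opened_last_mail": 1},
    {"id": 2, "recent_purchases": 0, "opened_last_mail": 0},
    {"id": 3, "recent_purchases": 1, "opened_last_mail": 0},
]

targets = score_batch(customers)
print(targets)
```

The model is frozen while `score_batch` runs on every new extract, so the deployment step can be carried out by the customer without rebuilding the model.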

slide-23
SLIDE 23

Deployment:

It can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases it is the customer, not the data analyst, who carries out the deployment steps.

slide-24
SLIDE 24

Deployment: tasks and their outputs

  • Plan Deployment: Deployment Plan
  • Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
  • Produce Final Report: Final Report; Final Presentation
  • Review Project: Experience Documentation

slide-25
SLIDE 25

Example: Automatic Target Marketing

slide-26
SLIDE 26

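Mining Based Decision Support System: Adaptive Architecture

[Figure: architecture diagram split into an on-line side and an off-line side. Components: On-line data, DW / Data Mart, Data preparation, Data mining task, DM models, Knowledge Base, Intelligent Engine, User Interface. An "Update" arrow connects the two sides.]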

slide-27
SLIDE 27

How to bring Data Mining to bear on a company’s business problem

slide-28
SLIDE 28

A photography metaphor

Mastering data mining means learning how to get data to tell a true and useful story, similar to mastering the art of photography (Mastering Data Mining, Berry and Linoff, 2002).

slide-29
SLIDE 29

Using an automatic Polaroid

  • Purchasing scores from outside vendors, for example Nielsen
  • Aggregate information from Istat
  • Purchasing demographic overlays and surveys

slide-30
SLIDE 30

Using a fully automated camera

Purchasing software that embodies DM expertise directed toward a particular application.

Vertical products:

  • Neural nets for credit card fraud detection
  • Churn management
  • Customer Relationship Management (Decisionhouse)

slide-31
SLIDE 31

Hiring a wedding photographer

  • Hiring outside consultants to perform predictive modelling for you on special projects
  • Valuable in the early stages
  • Fails when all the models, data, and insights generated end up in the hands of outsiders
  • The problem is how to use outside expertise
  • “A prophet of another land may have more success in persuading the management of a new approach”
  • Pilot projects with DM Labs

slide-32
SLIDE 32

Building your own dark-room and becoming a skilled photographer

  • Developing in-house expertise
  • A long-term goal
  • People who understand both the data and the business will build better models

slide-33
SLIDE 33

The frontier of Data Mining

slide-34
SLIDE 34

New data and new applications

  • The specific structure of the data to be analyzed (sequences, graphs, streams, text, semi-structured data), typical of emerging application areas such as bioinformatics, biology and the Web.
  • The specific nature of the final application, such as the need to embed mining functionality inside automated processes (Invisible Data Mining).

slide-35
SLIDE 35

Vertical DM and privacy

  • The need to offer the user high-level interaction at every step, so that the knowledge extraction process can be customized and validated against specific domain knowledge.
  • Finally, another interesting issue arises from the need to guarantee the privacy and security of individuals while still extracting aggregate, global information.

slide-36
SLIDE 36

Mining Data Streams:

In many emerging applications data arrives and needs to be processed on a continuous basis, i.e., there is need for mining without the benefit of several passes over a static, persistent snapshot.
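One way to mine without several passes is to keep a small summary that is updated as each item arrives. The sketch below uses the Misra-Gries frequent-items summary as an example of such a one-pass technique; the choice of algorithm and the click-stream data are assumptions made for illustration, not something the slides prescribe.

```python
def misra_gries(stream, k):
    """Summarize a stream in one pass using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of n items
    is guaranteed to be among the surviving keys. The stored counts are
    lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the empty ones.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical click stream; the iterator can only be consumed once.
clicks = iter([
    "home", "cart", "home", "search", "home",
    "home", "cart", "home", "help", "home",
])

summary = misra_gries(clicks, k=3)
print(summary)
```

Memory use depends only on k, not on the length of the stream, which is the property that matters when no persistent snapshot of the data is available.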

slide-37
SLIDE 37

Data Mining in Bioinformatics

High-performance data mining tools will play a crucial role in the analysis of the ever-growing databases of bio-sequences/structures.

slide-38
SLIDE 38

Semi/Un-Structured Mining for the World Wide Web:

The vast amounts of information with little or no structure on the web raise a host of challenging mining problems, such as web resource discovery and topic distillation; web structure/linkage mining; intelligent web searching and crawling; personalization of web content.

slide-39
SLIDE 39

Web Mining: A Fast Expanding Frontier in Data Mining

  • Mine what Web search engines find
  • Automatic classification of Web documents
  • Discovery of authoritative Web pages, Web structures and Web communities
  • Meta-Web warehousing: Web yellow page service
  • Web usage mining

slide-40
SLIDE 40

Web Mining Taxonomy

slide-41
SLIDE 41

OLAP Mining: An Integration of Data Mining and Data Warehousing

  • Coupling of data mining systems, DBMS and data warehouse systems
      • No coupling, loose coupling, semi-tight coupling, tight coupling
  • On-line analytical mining of data
      • Integration of mining and OLAP technologies
  • Interactive mining of multi-level knowledge
      • Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
  • Integration of multiple mining functions
      • Characterized classification, first clustering and then association
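The idea of looking at the same data at different levels of abstraction can be sketched with a tiny in-memory example. The concept hierarchy (city to country), the sales records and the helper name `aggregate` are hypothetical; a real OLAP system would perform these operations inside the warehouse.

```python
from collections import defaultdict

# Hypothetical concept hierarchy used for the roll-up: city -> country.
CITY_TO_COUNTRY = {"Pisa": "Italy", "Rome": "Italy", "Lyon": "France"}

# Hypothetical fact records: (city, product, units sold).
sales = [
    ("Pisa", "wine", 10),
    ("Rome", "wine", 5),
    ("Lyon", "wine", 7),
    ("Pisa", "cheese", 3),
    ("Lyon", "cheese", 8),
]

def aggregate(records, key):
    """Sum the units of all records that share the same grouping key."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[2]
    return dict(totals)

# Base level: one cell per (city, product).
by_city = aggregate(sales, key=lambda r: (r[0], r[1]))

# Roll-up: climb the hierarchy from city to country.
by_country = aggregate(sales, key=lambda r: (CITY_TO_COUNTRY[r[0]], r[1]))

# Slice: fix one value of the country dimension.
italy_slice = {cell: units for cell, units in by_country.items() if cell[0] == "Italy"}

print(by_city)
print(by_country)
print(italy_slice)
```

Drilling down is the reverse move, from `by_country` back to `by_city`. A mining function applied to either table would then find patterns at the corresponding level of abstraction.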