SLIDE 1

Data mining

Machine Intelligence Thomas D. Nielsen September 2008

Data mining September 2008 1 / 37

SLIDE 2

What is Data Mining?

?

Introduction Data mining September 2008 2 / 37

SLIDE 4

What is Data Mining?

!

Introduction Data mining September 2008 2 / 37

SLIDE 5

What is Data Mining?

Data Mining in practice

Introduction Data mining September 2008 3 / 37

SLIDE 6

What is Data Mining?

Data Mining in practice: real-life data is preprocessed before an off-the-shelf algorithm is adapted to it.

Introduction Data mining September 2008 3 / 37

SLIDE 7

What is Data Mining?

Data Mining in practice: real-life data is preprocessed, an off-the-shelf algorithm is adapted to it, and the result is evaluated and iterated.

Introduction Data mining September 2008 3 / 37

SLIDE 8

What is Data Mining?

Data Mining in practice: real-life data is preprocessed (data/domain-specific operations), an off-the-shelf algorithm is adapted to it (general algorithmic methods), and the result is evaluated and iterated.

Introduction Data mining September 2008 3 / 37

SLIDE 9

What is Data Mining?

An overview
  • Supervised Learning: labeled data; classification; predictive modeling.
  • Unsupervised Learning: unlabeled data; clustering; descriptive modeling; rule mining, association analysis.

Introduction Data mining September 2008 4 / 37

SLIDE 10

Classification

A high-level view: input → Classifier → Spam (yes/no)

Classification Data mining September 2008 5 / 37

SLIDE 11

Classification

A high-level view

Attributes SubAllCap (yes/no), TrustSend (yes/no), InvRet (yes/no), Body'adult' (yes/no), Body'zambia' (yes/no) → Classifier → Spam (yes/no)

Classification Data mining September 2008 5 / 37

SLIDE 12

Classification

A high-level view: Cell-1 (1..64), Cell-2 (1..64), Cell-3 (1..64), ..., Cell-324 (1..64) → Classifier → Symbol (A..Z, 0..9)

Classification Data mining September 2008 5 / 37

SLIDE 13

Classification

Labeled Data: rows are Instances (Cases, Examples), columns are Attributes (Features, Predictor Variables), and the last column is the Class variable (Target variable).

  SubAllCap  TrustSend  InvRet  ...  B'zambia'  Spam
  y          n          n       ...  n          y
  n          n          n       ...  n          n
  n          y          n       ...  n          y
  n          n          n       ...  n          n
  ...

  Cell-1  Cell-2  Cell-3  ...  Cell-324  Symbol
  1       1       4       ...  12        B
  1       1       1       ...  3         1
  34      37      43      ...  22        Z
  ...

(In principle, any attribute can become the designated class variable.)

Classification Data mining September 2008 6 / 37

SLIDE 14

Classification

Classification in general
Attributes: variables A1, A2, . . . , An (discrete or continuous).
Class variable: variable C. Always discrete: states(C) = {c1, . . . , cl} (the set of class labels).
A (complete data) classifier is a mapping C : states(A1, . . . , An) → states(C).
A classifier able to handle incomplete data provides mappings C : states(Ai1, . . . , Aik) → states(C) for subsets {Ai1, . . . , Aik} of {A1, . . . , An}.
A classifier partitions the attribute-value space (also: instance space) into subsets labelled with class labels.

Classification Data mining September 2008 7 / 37
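To make the mapping view concrete, here is a minimal Python sketch of a complete-data classifier as an ordinary function from attribute values to a class label; the attribute names are reused from the spam example above, but the rule itself is invented purely for illustration.

```python
def classify(sub_all_cap, trust_send):
    """A mapping from the attribute-value space {yes,no} x {yes,no} to states(C)."""
    # The rule implicitly partitions the instance space into two labelled regions.
    if sub_all_cap == "yes" and trust_send == "no":
        return "spam"
    return "no spam"

print(classify("yes", "no"), classify("no", "yes"))   # spam / no spam
```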

SLIDE 15

Classification

Iris dataset

Measurements of petal width/length and sepal width/length for 150 flowers of 3 different species of Iris.

First reported in: Fisher, R.A., "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7 (1936).

  SL   SW   PL   PW   Species
  5.1  3.5  1.4  0.2  Setosa
  4.9  3.0  1.4  0.2  Setosa
  6.3  2.9  6.0  2.1  Virginica
  6.3  2.5  4.9  1.5  Versicolor
  ...

(SL/SW = sepal length/width, PL/PW = petal length/width; Species is the class variable.)

Classification Data mining September 2008 8 / 37

SLIDE 16

Classification

Labeled data in instance space:

Classification Data mining September 2008 9 / 37

SLIDE 17

Classification

Labeled data in instance space (Setosa, Versicolor, Virginica), with the partition defined by a classifier.

Classification Data mining September 2008 9 / 37

SLIDE 18

Classification

Decision Regions
  • Axis-parallel linear: e.g. decision trees
  • Piecewise linear: e.g. Naive Bayes
  • Nonlinear: e.g. neural networks

Classification Data mining September 2008 10 / 37

SLIDE 19

Classification

Classifiers differ in . . .
  • Model space: types of partitions and their representation.
  • How they compute the class label corresponding to a point in instance space (the actual classification task).
  • How they are learned from data.

Some important types of classifiers:
  • Decision trees
  • Naive Bayes classifier
  • Other probabilistic classifiers (TAN, . . . )
  • Neural networks
  • K-nearest neighbors

Classification Data mining September 2008 11 / 37

SLIDE 20

Decision Trees

Example Attributes: height ∈ [0, 2.5], sex ∈ {m, f}. Class labels: {tall, short}.

[Figure: partition of the instance space and its representation by a decision tree: the root splits on sex, then on height with thresholds 1.8 and 1.7, leading to leaves labeled short and tall.]

Decision trees Data mining September 2008 12 / 37

SLIDE 21

Decision Trees

A decision tree is a tree

  • whose internal nodes are labeled with attributes
  • whose leaves are labeled with class labels
  • edges going out from a node labeled with attribute A are labeled with subsets of states(A),

such that all labels combined form a partition of states(A).

Possible partitions:
  states(A) = R: ]−∞, 2.3[, [2.3, ∞[ or ]−∞, 1.9[, [1.9, 3.5[, [3.5, ∞[
  states(A) = {a, b, c}: {a}, {b}, {c} or {a, b}, {c}

Decision trees Data mining September 2008 13 / 37

SLIDE 22

Decision Trees

Decision tree classification Each point in the instance space is sorted into a leaf by the decision tree. It is classified according to the class label at that leaf.

[Figure: the instance [m, 1.85] is sorted down the tree and lands in a leaf labeled tall, so C([m, 1.85]) = tall.]

Decision trees Data mining September 2008 14 / 37
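A minimal Python sketch of this procedure on the height/sex example, representing the tree as nested tuples; the assignment of the 1.8 and 1.7 thresholds to the m and f branches is my reading of the figure, and all names are illustrative.

```python
tree = ("sex", {
    "m": ("height<1.8", {True: "short", False: "tall"}),
    "f": ("height<1.7", {True: "short", False: "tall"}),
})

def classify(node, instance):
    """Sort the instance into a leaf and return the class label found there."""
    if isinstance(node, str):                      # leaf: a class label
        return node
    attribute, children = node
    if attribute.startswith("height<"):            # numeric test against a threshold
        branch = instance["height"] < float(attribute.split("<")[1])
    else:                                          # discrete test on the attribute value
        branch = instance[attribute]
    return classify(children[branch], instance)

print(classify(tree, {"sex": "m", "height": 1.85}))   # tall
```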

SLIDE 23

Decision Trees

Learning a decision tree In general, we look for a small decision tree with minimal classification error over the data set (a1, c1), (a2, c2), . . . , (an, cn).

[Figure: two decision trees over attributes A and B with leaves labeled c1/c2: a small "good" tree and a larger "bad" tree fitting the same data.]

Note: if the data is noise-free, i.e. there are no instances (ai, ci), (aj, cj) with ai = aj and ci ≠ cj, then there always exists a decision tree with zero classification error.

Decision trees Data mining September 2008 15 / 37

SLIDE 24

Decision Trees

The ID3 algorithm [figure: partially constructed tree with root A (branches t/f), a leaf, and an open node X]

Decision trees Data mining September 2008 16 / 37

SLIDE 25

Decision Trees

The ID3 algorithm [figure: partially constructed tree with open nodes]

Top-down construction of the decision tree. For an "open" node X:
  • Let D(X) be the instances that can reach X.
  • If all instances agree on the class c, then label X with c and make it a leaf.
  • Otherwise, find the best attribute A and partition of states(A), replace X with A, and make an outgoing edge from A for each member of the partition.

Decision trees Data mining September 2008 16 / 37
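A compact Python sketch of this top-down construction for discrete attributes, using one branch per observed attribute value and the expected-entropy score defined on the following slides; all names are illustrative, and no pruning or extra termination heuristics are included.

```python
from collections import Counter
import math

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def id3(data, attributes):
    """data: list of (attribute_dict, class_label) pairs reaching this node."""
    labels = [c for _, c in data]
    if len(set(labels)) == 1 or not attributes:
        # all instances agree on the class (or nothing left to split on):
        # label the node with the (majority) class and make it a leaf
        return Counter(labels).most_common(1)[0][0]

    def expected_entropy(attr):
        groups = {}
        for a, c in data:
            groups.setdefault(a[attr], []).append(c)
        return sum(len(g) / len(data) * entropy(g) for g in groups.values())

    best = min(attributes, key=expected_entropy)      # best attribute = lowest expected entropy
    children = {}
    for value in {a[best] for a, _ in data}:           # one outgoing edge per observed value
        subset = [(a, c) for a, c in data if a[best] == value]
        children[value] = id3(subset, [x for x in attributes if x != best])
    return (best, children)

data = [({"A": "t", "B": "t"}, "c1"), ({"A": "t", "B": "f"}, "c1"),
        ({"A": "f", "B": "t"}, "c2"), ({"A": "f", "B": "f"}, "c2")]
print(id3(data, ["A", "B"]))   # splits on A only: ('A', {'t': 'c1', 'f': 'c2'})
```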

SLIDE 26

Decision Trees

Notes: The exact algorithm is formulated as a recursive procedure. One can modify the algorithm by providing weaker conditions for termination (necessary for noisy data):

  • If <some other termination condition applies>, turn X into a leaf with <most appropriate class label>.

Decision trees Data mining September 2008 17 / 37

SLIDE 27

Decision Trees

Scoring new partitions

[Figure: a partially constructed tree with root B (branches t/f), a leaf c1, and an open node X with the instances D(X) that reach it.]

Decision trees Data mining September 2008 18 / 37

SLIDE 28

Decision Trees

Scoring new partitions

[Figure: the open node X is replaced by a candidate attribute A with partition a1, a2, a3 of states(A), splitting D(X) into D(X1), D(X2), D(X3).]

For each candidate attribute A with partition a1, a2, a3 of states(A):
  • Let p_i(c) be the relative frequency of class label c in D(Xi).
  • Measure of uniformity of the class label distribution in D(Xi) (entropy):

      H_Xi := − Σ_{c ∈ states(C)} p_i(c) · log2 p_i(c)

  • Score of the new partition (negative expected entropy):

      Score(A, a1, a2, a3) := − Σ_{i=1..3} (|D(Xi)| / |D(X)|) · H_Xi

Decision trees Data mining September 2008 18 / 37
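The two quantities above, written as a short Python sketch: the entropy of the class-label distribution in a block D(Xi), and the score of a candidate partition as the negative expected entropy over its blocks.

```python
from collections import Counter
import math

def entropy(labels):
    """H = - sum over c of p(c) * log2 p(c), with p(c) the relative frequency of c."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def score(blocks):
    """blocks: the class labels in D(X_1), ..., D(X_k) for one candidate partition."""
    total = sum(len(b) for b in blocks)
    # negative expected entropy: 0 is best (all blocks pure), more negative is worse
    return -sum(len(b) / total * entropy(b) for b in blocks)

# D(X) split by a three-valued attribute A into D(X_1), D(X_2), D(X_3):
print(score([["c1", "c1", "c1"], ["c1", "c2"], ["c2", "c2", "c2"]]))   # -0.25
```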

SLIDE 29

Decision Trees

Searching for partitions
When trying attribute A, look for the partition of states(A) with the highest score. In practice we can try all choices for A, but cannot try all partitions of states(A). Therefore:

  • For states(A) = R: only consider partitions of the form ]−∞, r[, [r, ∞[, and pick the threshold r with minimal expected entropy. Example:
      A: 1 3 4 6 10 12 17 18 22 25
      C: y y y n n  y  y  y  n  n
  • For states(A) = {a1, . . . , ak}: only consider the partition {a1}, . . . , {ak}.

Decision trees Data mining September 2008 19 / 37
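A sketch of the threshold search for a continuous attribute, applied to the A/C example above: every midpoint between consecutive attribute values is tried as a cut point r, and the one with minimal expected entropy is kept. Names are illustrative.

```python
from collections import Counter
import math

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (r, expected_entropy) for the best split ]-inf, r[ , [r, inf[."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        r = (pairs[i - 1][0] + pairs[i][0]) / 2          # candidate cut point
        left = [c for v, c in pairs if v < r]
        right = [c for v, c in pairs if v >= r]
        exp_ent = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or exp_ent < best[1]:
            best = (r, exp_ent)
    return best

A = [1, 3, 4, 6, 10, 12, 17, 18, 22, 25]
C = ["y", "y", "y", "n", "n", "y", "y", "y", "n", "n"]
print(best_threshold(A, C))   # (20.0, 0.649...): split between 18 and 22
```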

SLIDE 30

Decision Trees

Decision boundaries revisited

Decision trees Data mining September 2008 20 / 37

SLIDE 31

Attributes with many values

The expected entropy measure favors attributes with many values: for example, an attribute Date (with the possible dates as states) will have a very low expected entropy but is unable to generalize!

One approach to avoiding this problem is to select attributes based on the GainRatio:

    GainRatio(D, A) = score(D, A) / H_A
    H_A = − Σ_{a ∈ states(A)} p(a) log2 p(a),

where p(a) is the relative frequency of A = a in D.

Decision trees Data mining September 2008 21 / 37
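A sketch of the gain-ratio idea in Python. One hedge: the numerator below is the usual C4.5 information gain (H(C) minus the expected entropy) rather than the slide's score(D, A); either way, dividing by H_A penalises attributes with very many values, such as Date.

```python
from collections import Counter
import math

def entropy(items):
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())

def gain_ratio(attribute_values, class_labels):
    """attribute_values[i] and class_labels[i] belong to instance i."""
    total = len(class_labels)
    groups = {}
    for a, c in zip(attribute_values, class_labels):
        groups.setdefault(a, []).append(c)
    # information gain = H(C) - expected entropy after the split
    gain = entropy(class_labels) - sum(
        len(g) / total * entropy(g) for g in groups.values())
    h_a = entropy(attribute_values)            # split information H_A
    return gain / h_a if h_a > 0 else 0.0

labels = ["y", "y", "n", "n"]
print(gain_ratio(["d1", "d2", "d3", "d4"], labels))   # many-valued attribute: 1.0 / 2.0 = 0.5
print(gain_ratio(["a", "a", "b", "b"], labels))       # two-valued attribute:  1.0 / 1.0 = 1.0
```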

SLIDE 32

Decision Trees

Overfitting Constructing a classifier with zero classification error on the training data may lead to overfitting of the data: [figure: an overly complex partition of the Iris data (Setosa, Versicolor, Virginica)].

Decision trees Data mining September 2008 22 / 37

SLIDE 33

Decision Trees

  • Complex models will represent properties of the training data very precisely.
  • The training data may contain some peculiar properties that are not representative of the domain.
  • The model will then not perform optimally in classifying future instances.

[Figure: classification error as a function of model size, for training data and for future data.]

Decision trees Data mining September 2008 23 / 37

SLIDE 34

Decision Trees

Pruning To prevent overfitting, extensions of ID3 (C4.5, C5.0) add a pruning step after the tree construction:

  • Data is split into training data and validation data
  • The decision tree is learned using the training data only
  • Pruning: for each internal node X, replace the subtree rooted at X with a leaf labelled with some c ∈ states(C) if this reduces the classification error on the validation data.

Decision trees Data mining September 2008 24 / 37
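A sketch of this reduced-error pruning step on the nested-tuple trees used in the earlier sketches: bottom-up, each subtree is replaced by a single-label leaf whenever that strictly reduces the error on the validation data reaching it. The tree encoding and all names are illustrative.

```python
from collections import Counter

def classify(node, instance):
    """Sort an instance down a nested-tuple tree (attribute, {value: subtree})."""
    while not isinstance(node, str):
        attribute, children = node
        node = children[instance[attribute]]
    return node

def error(node, data):
    """Number of misclassified (attribute_dict, class_label) pairs."""
    return sum(1 for a, c in data if classify(node, a) != c)

def prune(node, validation):
    """Replace a subtree by a leaf c if that reduces the validation error."""
    if isinstance(node, str) or not validation:
        return node
    attribute, children = node
    pruned = (attribute, {
        v: prune(t, [(a, c) for a, c in validation if a[attribute] == v])
        for v, t in children.items()})
    leaf = Counter(c for _, c in validation).most_common(1)[0][0]   # best single label
    return leaf if error(leaf, validation) < error(pruned, validation) else pruned

tree = ("A", {"t": ("B", {"t": "c1", "f": "c2"}), "f": "c2"})
validation = [({"A": "t", "B": "t"}, "c1"), ({"A": "t", "B": "f"}, "c1"),
              ({"A": "f", "B": "t"}, "c2")]
print(prune(tree, validation))   # ('A', {'t': 'c1', 'f': 'c2'}): the B-subtree becomes a leaf
```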

SLIDE 35

Overfitting

Model Tuning with a Test Set

[Figure: the data is split into a training set and a test set; a model is learned on the training data, applied to the test data, and tuned; the final model is then learned from all the data.]

  • Models can be adjusted or tuned (e.g. pruning subtrees, setting model parameters).
  • Tuning can be an iterative process that requires repeated evaluations on the test set.
  • A final model is learned using all the data.
  • Problem: part of the data is "wasted" as a test set.

Decision trees Data mining September 2008 25 / 37

SLIDE 36

Overfitting

Cross Validation
Partition the data into n subsets or folds (typically n = 10). For each setting of the tuning parameter:

  for i = 1 to n:
      learn a model using folds 1, . . . , i − 1, i + 1, . . . , n as training data
      measure performance on fold i
  model performance = average performance on the n test sets

  • Choose the parameter setting with the best performance.
  • Learn the final model with the chosen parameter setting using the whole available data.

Cross validation is also used for the final evaluation of a learned model.

Decision trees Data mining September 2008 26 / 37
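A sketch of n-fold cross validation with placeholder learn/performance functions; any learner can be plugged in, e.g. a decision-tree learner with different pruning settings or k-NN with different values of k.

```python
def cross_validate(data, n_folds, learn, performance):
    """Average performance of `learn` over n_folds train/test splits."""
    folds = [data[i::n_folds] for i in range(n_folds)]          # n roughly equal subsets
    scores = []
    for i in range(n_folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = learn(train)
        scores.append(performance(model, folds[i]))
    return sum(scores) / n_folds

# Usage sketch (learn, performance and settings are placeholders):
# best = max(settings, key=lambda s: cross_validate(data, 10, lambda d: learn(d, s), performance))
# final_model = learn(data, best)
```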

SLIDE 37

Decision Trees

Pros and Cons
  + Easy to interpret
  + Efficient learning methods
  − Greedy "one variable at a time" construction does not utilize possible correlations between attributes
  − Difficulties with missing data (but the ID3 algorithm can be extended to deal with missing data)

Decision trees Data mining September 2008 27 / 37

SLIDE 38

K Nearest Neighbor

Thomas D. Nielsen September 2008

k Nearest Neighbor K Nearest Neighbor September 2008 28 / 37

SLIDE 39

K Nearest Neighbor

Labeled training data in instance space (class labels: red, green, blue)

k Nearest Neighbor K Nearest Neighbor September 2008 29 / 37

SLIDE 40

K Nearest Neighbor

Labeled training data in instance space (class labels: red, green, blue) x A new instance x should be classified.

k Nearest Neighbor K Nearest Neighbor September 2008 29 / 37

SLIDE 41

K Nearest Neighbor

Labeled training data in instance space (class labels: red, green, blue) x A new instance x should be classified. The nearest neighbor is green, hence x is classified as green.

k Nearest Neighbor K Nearest Neighbor September 2008 29 / 37

SLIDE 42

K Nearest Neighbor

Labeled training data in instance space (class labels: red, green, blue) x A new instance x should be classified. The nearest neighbor is green, hence x is classified as green. Two of x’s three nearest neighbors are red, hence x is classified as red.

k Nearest Neighbor K Nearest Neighbor September 2008 29 / 37

SLIDE 43

K Nearest Neighbor: Distance Measures

Distance Measures in Instance Space
Some classification and almost all clustering methods require a distance measure d(a1, a2) between any pair a1 = (a1,1, . . . , a1,k), a2 = (a2,1, . . . , a2,k) of instances. Common distance measures are:

(I) For instances with continuous attributes A1, . . . , Ak:
  • d2(a1, a2) = √( Σ_{j=1..k} (a1,j − a2,j)² )   (Euclidean or L2 distance)
  • d1(a1, a2) = Σ_{j=1..k} |a1,j − a2,j|   (Manhattan or L1 distance)
  • d∞(a1, a2) = max{ |a1,j − a2,j| : j = 1, . . . , k }   (L∞ distance)

(II) For instances with binary attributes A1, . . . , Ak:
  • d(a1, a2) = |{ j : a1,j ≠ a2,j }|   (Hamming or edit distance)

k Nearest Neighbor K Nearest Neighbor September 2008 30 / 37
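The four measures as a small Python sketch, for instances given as equal-length tuples of attribute values; the Hamming distance counts the attributes on which the two instances differ.

```python
import math

def d2(a1, a2):                     # Euclidean / L2
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a1, a2)))

def d1(a1, a2):                     # Manhattan / L1
    return sum(abs(x - y) for x, y in zip(a1, a2))

def d_inf(a1, a2):                  # L-infinity
    return max(abs(x - y) for x, y in zip(a1, a2))

def hamming(a1, a2):                # number of attributes on which a1 and a2 differ
    return sum(1 for x, y in zip(a1, a2) if x != y)

print(d2((0, 0), (3, 4)), d1((0, 0), (3, 4)), d_inf((0, 0), (3, 4)))  # 5.0 7 4
print(hamming((1, 0, 1, 1), (1, 1, 1, 0)))                            # 2
```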

SLIDE 44

K Nearest Neighbor: Distance Measures

(II) For instances with discrete attributes A1, . . . , Ak:

    d(a1, a2) = Σ_{j=1..k} dj(a1,j, a2,j)

where dj is a separately defined distance function for attribute Aj, e.g.

  dj       low  medium  high
  low       0     1      2
  medium    1     0      1
  high      2     1      0

  dj       red  blue  green
  red       0     1     1
  blue      1     0     1
  green     1     1     0

If all attributes have the 0-1 distance (right matrix), then this is the same as the edit distance.

k Nearest Neighbor K Nearest Neighbor September 2008 31 / 37
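A sketch of the general discrete case: one distance function per attribute, summed over the attributes; with the 0-1 function this reduces to the Hamming distance. The table below encodes the low/medium/high example, and all names are illustrative.

```python
ordinal = {("low", "medium"): 1, ("medium", "low"): 1,
           ("medium", "high"): 1, ("high", "medium"): 1,
           ("low", "high"): 2, ("high", "low"): 2}

def zero_one(x, y):
    """The 0-1 distance from the right-hand table."""
    return 0 if x == y else 1

def table_distance(table):
    """Turn a symmetric distance table into a per-attribute distance function dj."""
    return lambda x, y: 0 if x == y else table[(x, y)]

def distance(a1, a2, per_attribute):
    """d(a1, a2) = sum over attributes of dj(a1_j, a2_j)."""
    return sum(dj(x, y) for dj, x, y in zip(per_attribute, a1, a2))

print(distance(("low", "red"), ("high", "red"),
               [table_distance(ordinal), zero_one]))   # 2 + 0 = 2
```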

SLIDE 45

K Nearest Neighbor: Distance Measures

Normalization
Continuous attributes: using the Euclidean distance on continuous attributes may cause one attribute to dominate the distance measure, e.g. Ak = height in inches, Al = income in $. Methods for providing a "common scale" for all attributes:

Min-Max Normalization: replace Ai with

    (Ai − min(Ai)) / (max(Ai) − min(Ai))

(min(Ai), max(Ai) are the min/max values of Ai appearing in the data).

[Figure: original values of two attributes A1, A2 mapped to normalized values in [0, 1].]

k Nearest Neighbor K Nearest Neighbor September 2008 32 / 37
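A sketch of min-max normalization of one attribute column to [0, 1]; the example values for height and income are made up.

```python
def min_max(column):
    """Map each value to (x - min) / (max - min), using the min/max seen in the data."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

heights = [58, 62, 70, 75]                    # e.g. height in inches
incomes = [20_000, 45_000, 120_000, 60_000]   # e.g. income in $
print(min_max(heights))                       # [0.0, 0.235..., 0.705..., 1.0]
print(min_max(incomes))                       # both attributes now live on the same scale
```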

SLIDE 46

K Nearest Neighbor: Distance Measures

Z-score Standardization: replace Ai with

    (Ai − mean(Ai)) / sd(Ai)

where

    mean(Ai) = (1/n) · Σ_{j=1..n} aj,i
    sd(Ai) = √( (1/(n−1)) · Σ_{j=1..n} (aj,i − mean(Ai))² )    (the standard deviation of Ai)

[Figure: original values of two attributes A1, A2 mapped to standardized values.]

k Nearest Neighbor K Nearest Neighbor September 2008 33 / 37
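A sketch of z-score standardization of one attribute column, using the sample standard deviation (the 1/(n−1) form above); the example values are made up.

```python
import math

def z_score(column):
    """Map each value to (x - mean) / standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

print(z_score([20_000, 45_000, 120_000, 60_000]))
# the standardized column has mean 0 and standard deviation 1
```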

SLIDE 47

K Nearest Neighbor Classifier

Model = (Training) Data
Required: distance function on instances. Model = labeled training data (a1, c1), . . . , (aN, cN). Classify new instance anew as follows:

  • Let (aj1, cj1), . . . , (ajK, cjK) be the K training instances whose attributes are closest to anew.
  • Define C(anew) as the class label that occurs most frequently among cj1, . . . , cjK.

k Nearest Neighbor K Nearest Neighbor September 2008 34 / 37
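A sketch of the K-nearest-neighbour classifier itself: the model is just the labelled training data together with a distance function; the tiny two-class data set is made up.

```python
from collections import Counter
import math

def euclidean(a1, a2):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a1, a2)))

def knn_classify(training, new_instance, k, distance=euclidean):
    """training: list of (attribute_tuple, class_label) pairs."""
    neighbours = sorted(training, key=lambda ac: distance(ac[0], new_instance))[:k]
    labels = [c for _, c in neighbours]
    return Counter(labels).most_common(1)[0][0]     # most frequent label among the k

training = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
            ((5.0, 5.0), "green"), ((5.2, 4.8), "green")]
print(knn_classify(training, (1.1, 1.1), k=3))      # red
```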

SLIDE 48

K Nearest Neighbor Classifier

Dependence on K
Decision regions (approximately) for 1-nearest neighbor (left) and 5-nearest neighbor (right). There is a possibility of overfitting for small values of K. Cross-validation can be used to find a suitable value for K.

k Nearest Neighbor K Nearest Neighbor September 2008 35 / 37

SLIDE 49

K Nearest Neighbor Classifier

Weighted voting We can give a higher weight to neighbors close to x than to neighbors far away. Calculate a weight for label c:

    v(c) = Σ_{i=1..k : ci = c} 1 / d(x, ai)

and label x with the class having the highest weight.

k Nearest Neighbor K Nearest Neighbor September 2008 36 / 37
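A sketch of the distance-weighted vote: each of the k nearest neighbours contributes 1/d(x, ai) to its own label. In the made-up example below, plain 3-NN would answer red (two red neighbours against one green), but the single very close green neighbour wins the weighted vote.

```python
from collections import defaultdict
import math

def euclidean(a1, a2):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a1, a2)))

def weighted_knn(training, x, k, distance=euclidean):
    neighbours = sorted(training, key=lambda ac: distance(ac[0], x))[:k]
    votes = defaultdict(float)
    for a, c in neighbours:
        votes[c] += 1.0 / distance(a, x)        # assumes x does not coincide with a neighbour
    return max(votes, key=votes.get)

training = [((0.0, 0.0), "red"), ((0.2, 0.1), "red"), ((1.0, 1.0), "green")]
print(weighted_knn(training, (0.9, 0.9), k=3))   # green
```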

SLIDE 50

K Nearest Neighbor Classifier

Pros and Cons
  + Can represent complex decision boundaries
  + Trivial to "learn"
  − High memory requirement (but one can sometimes just use a subset of the data)
  − Classification time increases with the size of the training data
  − Does not explain the data
  − Dependence on an appropriate distance function

k Nearest Neighbor K Nearest Neighbor September 2008 37 / 37