SLIDE 1

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki¹  Wagner Meira Jr.²

¹ Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

² Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 3: Categorical Attributes

SLIDE 2

Univariate Analysis: Bernoulli Variable

Consider a single categorical attribute $X$ with domain $dom(X) = \{a_1, a_2, \ldots, a_m\}$ comprising $m$ symbolic values. The data $\mathbf{D}$ is an $n \times 1$ symbolic data matrix, $\mathbf{D} = (x_1, x_2, \ldots, x_n)^T$, where each point $x_i \in dom(X)$.

Bernoulli variable: the special case when $m = 2$,

$$X(v) = \begin{cases} 1 & \text{if } v = a_1 \\ 0 & \text{if } v = a_2 \end{cases}$$

i.e., $dom(X) = \{0, 1\}$.

SLIDE 3

Bernoulli Variable: Mean and Variance

The probability mass function (PMF) of $X$ is given as

$$P(X = x) = f(x) = p^x (1 - p)^{1-x}$$

The expected value of $X$ is

$$\mu = E[X] = 1 \cdot p + 0 \cdot (1 - p) = p$$

and the variance of $X$ is

$$\sigma^2 = \text{var}(X) = p(1 - p)$$

Assume that each symbolic point has been mapped to its binary value, so that $\{x_1, x_2, \ldots, x_n\}$ is a random sample drawn from $X$. The sample mean is

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{n_1}{n} = \hat{p}$$

where $n_1$ is the number of points with value 1 in the random sample (equal to the number of occurrences of the symbol $a_1$). The sample variance is

$$\hat{\sigma}^2 = \hat{p}(1 - \hat{p})$$
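As a quick check of these estimators, here is a minimal Python sketch (assuming numpy is available); the binary sample is hypothetical:

```python
import numpy as np

# Hypothetical binary sample: 1 when the symbol is a1, 0 when it is a2
x = np.array([1, 0, 1, 1, 0, 1, 0, 1])

p_hat = x.mean()                # sample mean mu-hat = n1/n = p-hat
var_hat = p_hat * (1 - p_hat)   # sample variance p-hat(1 - p-hat)

print(p_hat, var_hat)           # 0.625 0.234375
```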

SLIDE 4

Binomial Distribution: Number of Occurrences

Given the Bernoulli variable $X$, let $\{x_1, x_2, \ldots, x_n\}$ be a random sample of size $n$. Let $N$ be the random variable denoting the number of occurrences of the symbol $a_1$ (value $X = 1$). $N$ has a binomial distribution, given as

$$f(N = n_1 \mid n, p) = \binom{n}{n_1} p^{n_1} (1 - p)^{n - n_1}$$

$N$ is the sum of the $n$ independent Bernoulli random variables $x_i$ IID with $X$, that is, $N = \sum_{i=1}^{n} x_i$. The mean or expected number of occurrences of $a_1$ is

$$\mu_N = E[N] = E\left[\sum_{i=1}^{n} x_i\right] = \sum_{i=1}^{n} E[x_i] = \sum_{i=1}^{n} p = np$$

The variance of $N$ is

$$\sigma_N^2 = \text{var}(N) = \sum_{i=1}^{n} \text{var}(x_i) = \sum_{i=1}^{n} p(1 - p) = np(1 - p)$$
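A minimal sketch of the binomial model, assuming scipy is available; the values of n and p are illustrative only:

```python
from scipy.stats import binom

n, p = 10, 0.3   # illustrative sample size and P(X = 1)

# Probability of observing the symbol a1 exactly n1 = 4 times in n trials
print(binom.pmf(4, n, p))                 # f(N = 4 | n, p) ~= 0.200

# Mean np and variance np(1 - p) of N
print(binom.mean(n, p), binom.var(n, p))  # 3.0 2.1
```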

SLIDE 5

Multivariate Bernoulli Variable

For the general case when $dom(X) = \{a_1, a_2, \ldots, a_m\}$, we model $X$ as an $m$-dimensional or multivariate Bernoulli random variable $\mathbf{X} = (A_1, A_2, \ldots, A_m)^T$, where each $A_i$ is a Bernoulli variable with parameter $p_i$ denoting the probability of observing symbol $a_i$.

However, $\mathbf{X}$ can assume only one of the symbolic values at any one time. Thus, $\mathbf{X}(v) = \mathbf{e}_i$ if $v = a_i$, where $\mathbf{e}_i$ is the $i$-th standard basis vector in $m$ dimensions. The range of $\mathbf{X}$ consists of $m$ distinct vector values $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_m\}$.

The PMF of $\mathbf{X}$ is

$$P(\mathbf{X} = \mathbf{e}_i) = f(\mathbf{e}_i) = p_i = \prod_{j=1}^{m} p_j^{e_{ij}}$$

with $\sum_{i=1}^{m} p_i = 1$.

SLIDE 6

Multivariate Bernoulli: Mean

The mean or expected value of $\mathbf{X}$ can be obtained as

$$\boldsymbol{\mu} = E[\mathbf{X}] = \sum_{i=1}^{m} \mathbf{e}_i f(\mathbf{e}_i) = \sum_{i=1}^{m} \mathbf{e}_i p_i = \begin{pmatrix} 1 \\ \vdots \\ 0 \end{pmatrix} p_1 + \cdots + \begin{pmatrix} 0 \\ \vdots \\ 1 \end{pmatrix} p_m = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{pmatrix} = \mathbf{p}$$

The sample mean is

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i = \sum_{i=1}^{m} \frac{n_i}{n} \mathbf{e}_i = \begin{pmatrix} n_1/n \\ n_2/n \\ \vdots \\ n_m/n \end{pmatrix} = \begin{pmatrix} \hat{p}_1 \\ \hat{p}_2 \\ \vdots \\ \hat{p}_m \end{pmatrix} = \hat{\mathbf{p}}$$

where $n_i$ is the number of occurrences of the vector value $\mathbf{e}_i$ in the sample, i.e., the number of occurrences of the symbol $a_i$. Furthermore, $\sum_{i=1}^{m} n_i = n$.
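A minimal sketch of the one-hot encoding and the sample mean, assuming numpy; the domain and sample are hypothetical:

```python
import numpy as np

symbols = ["a1", "a2", "a3"]             # dom(X), so m = 3
sample = ["a2", "a1", "a2", "a3", "a2"]  # hypothetical random sample, n = 5

idx = {a: i for i, a in enumerate(symbols)}

# Map each symbolic point to its standard basis vector e_i (one-hot encoding)
X = np.eye(len(symbols))[[idx[v] for v in sample]]  # n x m binary data matrix

mu_hat = X.mean(axis=0)   # sample mean (n1/n, ..., nm/n) = p-hat
print(mu_hat)             # [0.2 0.6 0.2]
```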

SLIDE 7

Multivariate Bernoulli Variable: sepal length

Bins        Domain           Counts
[4.3, 5.2]  Very Short (a1)  n1 = 45
(5.2, 6.1]  Short (a2)       n2 = 50
(6.1, 7.0]  Long (a3)        n3 = 43
(7.0, 7.9]  Very Long (a4)   n4 = 12

We model sepal length as a multivariate Bernoulli variable $\mathbf{X}$:

$$\mathbf{X}(v) = \begin{cases} \mathbf{e}_1 = (1,0,0,0)^T & \text{if } v = a_1 \\ \mathbf{e}_2 = (0,1,0,0)^T & \text{if } v = a_2 \\ \mathbf{e}_3 = (0,0,1,0)^T & \text{if } v = a_3 \\ \mathbf{e}_4 = (0,0,0,1)^T & \text{if } v = a_4 \end{cases}$$

For example, the symbolic point $x_1 = \text{Short} = a_2$ is represented as the vector $(0,1,0,0)^T = \mathbf{e}_2$.

Probability mass function: the total sample size is $n = 150$, and the estimates $\hat{p}_i$ are

$$\hat{p}_1 = 45/150 = 0.3 \qquad \hat{p}_2 = 50/150 = 0.333 \qquad \hat{p}_3 = 43/150 = 0.287 \qquad \hat{p}_4 = 12/150 = 0.08$$

[Figure: empirical PMF $f(\mathbf{e}_i) = \hat{p}_i$ plotted over the four values Very Short, Short, Long, Very Long]

SLIDE 8

Multivariate Bernoulli Variable: Covariance Matrix

We have $\mathbf{X} = (A_1, A_2, \ldots, A_m)^T$, where $A_i$ is the Bernoulli variable corresponding to symbol $a_i$. The variance of each Bernoulli variable $A_i$ is

$$\sigma_i^2 = \text{var}(A_i) = p_i(1 - p_i)$$

The covariance between $A_i$ and $A_j$ is

$$\sigma_{ij} = E[A_i A_j] - E[A_i] \cdot E[A_j] = 0 - p_i p_j = -p_i p_j$$

a negative relationship, since $A_i$ and $A_j$ cannot both be 1 at the same time. The covariance matrix for $\mathbf{X}$ is given as

$$\boldsymbol{\Sigma} = \begin{pmatrix} p_1(1-p_1) & -p_1 p_2 & \cdots & -p_1 p_m \\ -p_1 p_2 & p_2(1-p_2) & \cdots & -p_2 p_m \\ \vdots & \vdots & \ddots & \vdots \\ -p_1 p_m & -p_2 p_m & \cdots & p_m(1-p_m) \end{pmatrix}$$

More compactly, $\boldsymbol{\Sigma} = \text{diag}(\mathbf{p}) - \mathbf{p}\,\mathbf{p}^T$, where $\boldsymbol{\mu} = \mathbf{p} = (p_1, \ldots, p_m)^T$.

SLIDE 9

Categorical, Mapped Binary and Centered Dataset

Modeling $X$ as a multivariate Bernoulli variable is equivalent to replacing each point $x_i$ with $\mathbf{X}(x_i)$, giving a new $n \times m$ binary data matrix:

X          A1  A2     Z1    Z2
x1 Short    0   1    -0.4   0.4
x2 Short    0   1    -0.4   0.4
x3 Long     1   0     0.6  -0.6
x4 Short    0   1    -0.4   0.4
x5 Long     1   0     0.6  -0.6

Here $\mathbf{X}$ is the multivariate Bernoulli variable

$$\mathbf{X}(v) = \begin{cases} \mathbf{e}_1 = (1,0)^T & \text{if } v = \text{Long } (a_1) \\ \mathbf{e}_2 = (0,1)^T & \text{if } v = \text{Short } (a_2) \end{cases}$$

The sample mean and covariance matrix are

$$\hat{\boldsymbol{\mu}} = \hat{\mathbf{p}} = (2/5, 3/5)^T = (0.4, 0.6)^T$$

$$\hat{\boldsymbol{\Sigma}} = \text{diag}(\hat{\mathbf{p}}) - \hat{\mathbf{p}}\hat{\mathbf{p}}^T = \begin{pmatrix} 0.24 & -0.24 \\ -0.24 & 0.24 \end{pmatrix}$$

From the centered data matrix $\mathbf{Z} = (Z_1, Z_2)$, we obtain the same result:

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{5} \mathbf{Z}^T \mathbf{Z} = \begin{pmatrix} 0.24 & -0.24 \\ -0.24 & 0.24 \end{pmatrix}$$
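A sketch, assuming numpy, that reproduces this small example and confirms that both expressions yield the same covariance matrix:

```python
import numpy as np

# Binary data matrix for the five points (columns: A1 = Long, A2 = Short)
X = np.array([[0, 1],   # x1 = Short
              [0, 1],   # x2 = Short
              [1, 0],   # x3 = Long
              [0, 1],   # x4 = Short
              [1, 0]])  # x5 = Long

p_hat = X.mean(axis=0)   # (0.4, 0.6)
Z = X - p_hat            # centered data matrix

print(np.diag(p_hat) - np.outer(p_hat, p_hat))  # diag(p-hat) - p-hat p-hat^T
print(Z.T @ Z / len(X))                         # (1/n) Z^T Z -- the same matrix
```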

SLIDE 10

Multinomial Distribution: Number of Occurrences

Let $\{x_1, x_2, \ldots, x_n\}$ be a random sample from $\mathbf{X}$. Let $N_i$ be the random variable denoting the number of occurrences of symbol $a_i$ in the sample, and let $\mathbf{N} = (N_1, N_2, \ldots, N_m)^T$. $\mathbf{N}$ has a multinomial distribution, given as

$$f\big(\mathbf{N} = (n_1, n_2, \ldots, n_m) \mid \mathbf{p}\big) = \binom{n}{n_1\, n_2\, \ldots\, n_m} \prod_{i=1}^{m} p_i^{n_i}$$

The mean and covariance matrix of $\mathbf{N}$ are

$$\boldsymbol{\mu}_N = E[\mathbf{N}] = n\,E[\mathbf{X}] = n \cdot \boldsymbol{\mu} = n \cdot \mathbf{p} = (np_1, \ldots, np_m)^T$$

$$\boldsymbol{\Sigma}_N = n \cdot \big(\text{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\big) = \begin{pmatrix} np_1(1-p_1) & -np_1 p_2 & \cdots & -np_1 p_m \\ -np_1 p_2 & np_2(1-p_2) & \cdots & -np_2 p_m \\ \vdots & \vdots & \ddots & \vdots \\ -np_1 p_m & -np_2 p_m & \cdots & np_m(1-p_m) \end{pmatrix}$$

The sample mean and covariance matrix for $\mathbf{N}$ are

$$\hat{\boldsymbol{\mu}}_N = n\hat{\mathbf{p}} \qquad \hat{\boldsymbol{\Sigma}}_N = n\big(\text{diag}(\hat{\mathbf{p}}) - \hat{\mathbf{p}}\hat{\mathbf{p}}^T\big)$$
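A sketch using scipy.stats.multinomial (an assumed dependency) with the sepal length estimates:

```python
import numpy as np
from scipy.stats import multinomial

n = 150
p = np.array([0.3, 0.333, 0.287, 0.08])  # sepal length PMF estimates

# Probability of observing exactly the counts (45, 50, 43, 12) in 150 trials
print(multinomial.pmf([45, 50, 43, 12], n, p))

print(n * p)                  # mean vector n p
print(multinomial.cov(n, p))  # covariance matrix n (diag(p) - p p^T)
```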

SLIDE 11

Bivariate Analysis

Assume the data comprises two categorical attributes, $X_1$ and $X_2$, with

$$dom(X_1) = \{a_{11}, a_{12}, \ldots, a_{1m_1}\} \qquad dom(X_2) = \{a_{21}, a_{22}, \ldots, a_{2m_2}\}$$

We model $X_1$ and $X_2$ as multivariate Bernoulli variables $\mathbf{X}_1$ and $\mathbf{X}_2$ with dimensions $m_1$ and $m_2$, respectively. The joint distribution of $\mathbf{X}_1$ and $\mathbf{X}_2$ is modeled as the $(m_1 + m_2)$-dimensional vector variable

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \qquad \mathbf{X}\big((v_1, v_2)^T\big) = \begin{pmatrix} \mathbf{X}_1(v_1) \\ \mathbf{X}_2(v_2) \end{pmatrix} = \begin{pmatrix} \mathbf{e}_{1i} \\ \mathbf{e}_{2j} \end{pmatrix}$$

provided that $v_1 = a_{1i}$ and $v_2 = a_{2j}$.

The joint PMF for $\mathbf{X}$ is given as the $m_1 \times m_2$ matrix

$$\mathbf{P}_{12} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m_2} \\ p_{21} & p_{22} & \cdots & p_{2m_2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m_1 1} & p_{m_1 2} & \cdots & p_{m_1 m_2} \end{pmatrix}$$

SLIDE 12

Bivariate Empirical PMF: sepal length and sepal width

X1: sepal length

Bins        Domain           Counts
[4.3, 5.2]  Very Short (a1)  n1 = 45
(5.2, 6.1]  Short (a2)       n2 = 50
(6.1, 7.0]  Long (a3)        n3 = 43
(7.0, 7.9]  Very Long (a4)   n4 = 12

X2: sepal width

Bins        Domain       Counts
[2.0, 2.8]  Short (a1)   47
(2.8, 3.6]  Medium (a2)  88
(3.6, 4.4]  Long (a3)    15

Observed counts (nij):

                    Short (e21)  Medium (e22)  Long (e23)
Very Short (e11)         7            33            5
Short (e12)             24            18            8
Long (e13)              13            30            0
Very Long (e14)          3             7            2

SLIDE 13

Bivariate Empirical PMF: sepal length and sepal width

Joint probabilities: $\hat{p}_{ij} = n_{ij}/n$

[Figure: empirical joint PMF $f(\mathbf{x})$ shown as a 3D bar plot over $\mathbf{e}_{1i} \times \mathbf{e}_{2j}$]

        e21     e22     e23
e11    0.047   0.22    0.033
e12    0.16    0.12    0.053
e13    0.087   0.2     0
e14    0.02    0.047   0.013

SLIDE 14

Attribute Dependence: Contingency Analysis

The contingency table for $\mathbf{X}_1$ and $\mathbf{X}_2$ is the $m_1 \times m_2$ matrix of observed counts $n_{ij}$:

$$\mathbf{N}_{12} = n \cdot \hat{\mathbf{P}}_{12} = \begin{pmatrix} n_{11} & n_{12} & \cdots & n_{1m_2} \\ n_{21} & n_{22} & \cdots & n_{2m_2} \\ \vdots & \vdots & \ddots & \vdots \\ n_{m_1 1} & n_{m_1 2} & \cdots & n_{m_1 m_2} \end{pmatrix}$$

where $\hat{\mathbf{P}}_{12}$ is the empirical joint PMF for $\mathbf{X}_1$ and $\mathbf{X}_2$. The contingency table is augmented with the row and column marginal counts:

$$\mathbf{N}_1 = n \cdot \hat{\mathbf{p}}_1 = \begin{pmatrix} n^1_1 \\ \vdots \\ n^1_{m_1} \end{pmatrix} \qquad \mathbf{N}_2 = n \cdot \hat{\mathbf{p}}_2 = \begin{pmatrix} n^2_1 \\ \vdots \\ n^2_{m_2} \end{pmatrix}$$

$\mathbf{N}_1$ and $\mathbf{N}_2$ have multinomial distributions with parameters $\mathbf{p}_1 = (p^1_1, \ldots, p^1_{m_1})$ and $\mathbf{p}_2 = (p^2_1, \ldots, p^2_{m_2})$, respectively. $\mathbf{N}_{12}$ also has a multinomial distribution with parameters $\mathbf{P}_{12} = \{p_{ij}\}$, for $1 \le i \le m_1$ and $1 \le j \le m_2$.

SLIDE 15

Contingency Table: sepal length vs. sepal width

Contingency table of sepal length (X1) vs. sepal width (X2); row counts are $n^1_i$, column counts are $n^2_j$:

                   Short (a21)  Medium (a22)  Long (a23)   Row counts
Very Short (a11)        7            33            5        n¹₁ = 45
Short (a12)            24            18            8        n¹₂ = 50
Long (a13)             13            30            0        n¹₃ = 43
Very Long (a14)         3             7            2        n¹₄ = 12
Column counts       n²₁ = 47      n²₂ = 88      n²₃ = 15    n = 150

SLIDE 16

Chi-Squared Test for Independence

Assume $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent. Then their joint PMF is

$$\hat{p}_{ij} = \hat{p}^1_i \cdot \hat{p}^2_j$$

The expected frequency for each pair of values is

$$e_{ij} = n \cdot \hat{p}_{ij} = n \cdot \hat{p}^1_i \cdot \hat{p}^2_j = n \cdot \frac{n^1_i}{n} \cdot \frac{n^2_j}{n} = \frac{n^1_i\, n^2_j}{n}$$

The $\chi^2$ statistic quantifies the difference between observed and expected counts:

$$\chi^2 = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$

The sampling distribution of the $\chi^2$ statistic follows the chi-squared density function

$$f(x \mid q) = \frac{1}{2^{q/2}\,\Gamma(q/2)}\, x^{\frac{q}{2} - 1}\, e^{-\frac{x}{2}}$$

where $q$ is the number of degrees of freedom:

$$q = |dom(X_1)| \cdot |dom(X_2)| - \big(|dom(X_1)| + |dom(X_2)|\big) + 1 = m_1 m_2 - m_1 - m_2 + 1 = (m_1 - 1)(m_2 - 1)$$

SLIDE 17

Chi-Squared Test: sepal length and sepal width

Expected counts (eij):

                   Short (a21)  Medium (a22)  Long (a23)
Very Short (a11)      14.1         26.4          4.5
Short (a12)           15.67        29.33         5.0
Long (a13)            13.47        25.23         4.3
Very Long (a14)        3.76         7.04         1.2

Observed counts (nij):

                   Short (a21)  Medium (a22)  Long (a23)
Very Short (a11)       7            33            5
Short (a12)           24            18            8
Long (a13)            13            30            0
Very Long (a14)        3             7            2

The chi-squared statistic value is $\chi^2 = 21.8$. The number of degrees of freedom is $q = (m_1 - 1)(m_2 - 1) = 3 \cdot 2 = 6$.
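This worked example can be reproduced with scipy.stats.chi2_contingency (an assumed dependency), which returns the statistic, the p-value, the degrees of freedom, and the expected counts in a single call; a sketch:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: sepal length (rows) vs. sepal width (columns)
observed = np.array([[ 7, 33, 5],
                     [24, 18, 8],
                     [13, 30, 0],
                     [ 3,  7, 2]])

# correction=False disables Yates' continuity correction (only used for 2x2 tables)
stat, pvalue, dof, expected = chi2_contingency(observed, correction=False)

print(round(stat, 1), dof)    # 21.8 6
print(np.round(expected, 2))  # matches the expected counts table above
```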

SLIDE 18

Chi-Squared Distribution (q = 6).

The p-value of a statistic θ is defined as the probability of obtaining a value at least as extreme as the observed value. The null hypothesis, that X1 and X2 are independent, is rejected if p-value(z) ≤ α, say α = 0.01. We have p-value(21.8) = 0.0013. Thus, we reject the null hypothesis, and conclude that X1 and X2 are dependent.

[Figure: chi-squared density $f(x \mid 6)$; the $H_0$ rejection region for $\alpha = 0.01$ begins at the critical value 16.8, and the observed statistic 21.8 falls inside it]
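A sketch, assuming scipy, of the p-value and critical value behind this decision:

```python
from scipy.stats import chi2

q = 6                    # degrees of freedom
print(chi2.sf(21.8, q))  # p-value ~= 0.0013, below alpha = 0.01: reject H0
print(chi2.ppf(0.99, q)) # critical value ~= 16.81 bounding the rejection region
```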

SLIDE 19

Multiway Contingency Analysis

Given $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_d)^T$, the chi-squared statistic is given as

$$\chi^2 = \sum_{\mathbf{i}} \frac{(n_{\mathbf{i}} - e_{\mathbf{i}})^2}{e_{\mathbf{i}}} = \sum_{i_1=1}^{m_1} \sum_{i_2=1}^{m_2} \cdots \sum_{i_d=1}^{m_d} \frac{(n_{i_1,i_2,\ldots,i_d} - e_{i_1,i_2,\ldots,i_d})^2}{e_{i_1,i_2,\ldots,i_d}}$$

Under the null hypothesis that the attributes are independent, the expected number of occurrences of the symbol tuple $(a_{1i_1}, a_{2i_2}, \ldots, a_{di_d})$ is given as

$$e_{\mathbf{i}} = n \cdot \hat{p}_{\mathbf{i}} = n \cdot \prod_{j=1}^{d} \hat{p}^j_{i_j} = \frac{n^1_{i_1}\, n^2_{i_2} \cdots n^d_{i_d}}{n^{d-1}}$$

The total number of degrees of freedom for the chi-squared distribution is given as

$$q = \prod_{i=1}^{d} |dom(X_i)| - \sum_{i=1}^{d} |dom(X_i)| + (d - 1) = \Big(\prod_{i=1}^{d} m_i\Big) - \Big(\sum_{i=1}^{d} m_i\Big) + d - 1$$

SLIDE 20

3-Way Contingency Table

X1: sepal length, X2: sepal width, X3: Iris type.

[Figure: 3-way contingency table shown as a 4 × 3 × 3 cube of cell counts, with marginal counts n¹ = (45, 50, 43, 12) for X1, n² = (47, 88, 15) for X2, and n³ = (50, 50, 50) for X3]

SLIDE 21

3-Way Contingency Analysis

Expected counts $e_{ijk}$, identical for each value $a_{3k}$ of X3 since all $n^3_k = 50$:

        a21    a22    a23
a11    4.70   8.80   1.50
a12    5.22   9.78   1.67
a13    4.49   8.41   1.43
a14    1.25   2.35   0.40

The value of the $\chi^2$ statistic is $\chi^2 = 231.06$, and the number of degrees of freedom is $q = 4 \cdot 3 \cdot 3 - (4 + 3 + 3) + 2 = 36 - 10 + 2 = 28$. For a significance level of $\alpha = 0.01$, the critical value of the chi-squared distribution is $z = 48.28$. The observed value $\chi^2 = 231.06$ is much greater than $z$, and is thus extremely unlikely to happen under the null hypothesis. We conclude that the three attributes are not 3-way independent; rather, there is some dependence between them.
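A sketch, assuming numpy, that recovers the expected counts and degrees of freedom for this example from the marginal counts alone:

```python
import numpy as np

n = 150
n1 = np.array([45, 50, 43, 12])  # marginal counts for X1 (sepal length)
n2 = np.array([47, 88, 15])      # marginal counts for X2 (sepal width)
n3 = np.array([50, 50, 50])      # marginal counts for X3 (Iris type)

# Expected counts under 3-way independence: e_ijk = n1_i * n2_j * n3_k / n^(d-1)
expected = np.einsum('i,j,k->ijk', n1, n2, n3) / n**2

print(np.round(expected[:, :, 0], 2))  # one X3 slice; all slices equal since n3_k = 50

# Degrees of freedom: prod(m_i) - sum(m_i) + (d - 1)
m = np.array([4, 3, 3])
print(m.prod() - m.sum() + (len(m) - 1))  # 28
```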

SLIDE 22

Distance and Angle

With the modeling of categorical attributes as multivariate Bernoulli variables, it is possible to compute the distance or the angle between any two points $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$\mathbf{x}_i = \begin{pmatrix} \mathbf{e}_{1i_1} \\ \vdots \\ \mathbf{e}_{di_d} \end{pmatrix} \qquad \mathbf{x}_j = \begin{pmatrix} \mathbf{e}_{1j_1} \\ \vdots \\ \mathbf{e}_{dj_d} \end{pmatrix}$$

The different measures of distance and similarity rely on the number of matching and mismatching values (or symbols) across the $d$ attributes $\mathbf{X}_k$. The number of matching values $s$ is given as

$$s = \mathbf{x}_i^T \mathbf{x}_j = \sum_{k=1}^{d} (\mathbf{e}_{ki_k})^T \mathbf{e}_{kj_k}$$

The number of mismatches is simply $d - s$. Also useful is the norm of each point: $\|\mathbf{x}_i\|^2 = \mathbf{x}_i^T \mathbf{x}_i = d$.

SLIDE 23

Distance and Angle

The Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ is given as

$$\delta(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\| = \sqrt{\mathbf{x}_i^T\mathbf{x}_i - 2\mathbf{x}_i^T\mathbf{x}_j + \mathbf{x}_j^T\mathbf{x}_j} = \sqrt{2(d - s)}$$

The Hamming distance is given as

$$\delta_H(\mathbf{x}_i, \mathbf{x}_j) = d - s$$

Cosine similarity: the cosine of the angle is given as

$$\cos\theta = \frac{\mathbf{x}_i^T \mathbf{x}_j}{\|\mathbf{x}_i\| \cdot \|\mathbf{x}_j\|} = \frac{s}{d}$$

The Jaccard coefficient is given as

$$J(\mathbf{x}_i, \mathbf{x}_j) = \frac{s}{2(d - s) + s} = \frac{s}{2d - s}$$
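A sketch, assuming numpy, of all four measures; the two points and their symbols are hypothetical:

```python
import numpy as np

# Two hypothetical points over d = 3 categorical attributes
xi = ("Short", "Medium", "Red")
xj = ("Short", "Long",   "Red")

d = len(xi)
s = sum(a == b for a, b in zip(xi, xj))  # matching symbols: s = x_i^T x_j

print(np.sqrt(2 * (d - s)))  # Euclidean distance sqrt(2(d - s)) ~= 1.414
print(d - s)                 # Hamming distance: 1
print(s / d)                 # cosine similarity: 0.667
print(s / (2 * d - s))       # Jaccard coefficient: 0.5
```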

SLIDE 24

Discretization

Discretization, also called binning, converts numeric attributes into categorical ones.

Equal-width intervals: partition the range of $X$ into $k$ equal-width intervals. The interval width is simply the range of $X$ divided by $k$:

$$w = \frac{x_{max} - x_{min}}{k}$$

Thus, the $i$th interval boundary is given as $v_i = x_{min} + iw$, for $i = 1, \ldots, k - 1$.

Equal-frequency intervals: we divide the range of $X$ into intervals that contain (approximately) an equal number of points. The intervals are computed from the empirical quantile or inverse cumulative distribution function

$$\hat{F}^{-1}(q) = \min\{x \mid P(X \le x) \ge q\}$$

We require that each interval contain $1/k$ of the probability mass; the interval boundaries are therefore given as $v_i = \hat{F}^{-1}(i/k)$ for $i = 1, \ldots, k - 1$.
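A sketch of both schemes, assuming numpy; the data vector is synthetic rather than the Iris sepal length values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(6.0, 0.8, 150)  # synthetic numeric attribute
k = 4

# Equal-width boundaries: v_i = xmin + i * w
w = (x.max() - x.min()) / k
width_bounds = x.min() + w * np.arange(1, k)

# Equal-frequency boundaries: v_i = F^{-1}(i/k), the empirical quantiles
freq_bounds = np.quantile(x, np.arange(1, k) / k)

# Bin membership and occupancy under the equal-frequency scheme
labels = np.searchsorted(freq_bounds, x)
print(width_bounds)
print(freq_bounds)
print(np.bincount(labels))  # roughly n/k points per bin
```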

SLIDE 25

Equal-Frequency Discretization: sepal length (4 bins)

[Figure: empirical inverse CDF $\hat{F}^{-1}(q)$ for sepal length, over the range [4.3, 7.9]]

Quartile values: $\hat{F}^{-1}(0.25) = 5.1$, $\hat{F}^{-1}(0.5) = 5.8$, $\hat{F}^{-1}(0.75) = 6.4$.

Bin         Width  Count
[4.3, 5.1]  0.8    n1 = 41
(5.1, 5.8]  0.7    n2 = 39
(5.8, 6.4]  0.6    n3 = 35
(6.4, 7.9]  1.5    n4 = 35

SLIDE 26

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki¹  Wagner Meira Jr.²

¹ Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA

² Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 3: Categorical Attributes
