SLIDE 5 7/27/2017 5
Naïve Bayes
- k‐NN and decision trees do not make any
assumptions about the distribution of the underlying data.
- If we assume that the data comes from a
certain underlying distribution, we can treat the data as a statistical sample. This can reduce the influence of outliers on our model.
- A naïve Bayes classifier assumes the
independence of the predictors within each class. This classifier is a good choice for relatively simple problems.
- Function – fitcnb
- Performance
- Fit Time: Normal Dist. ‐ Fast; Kernel Dist. ‐ Slow
- Prediction Time: Normal Dist. ‐ Fast; Kernel Dist. ‐ Slow
- Memory Overhead: Normal Dist. ‐ Small; Kernel Dist. ‐ Moderate to large
– 'Distribution' – Distribution used to calculate probabilities
– 'Width' – Width of the smoothing window (when 'Distribution' is set to 'kernel')
– 'Kernel' – Type of kernel to use (when 'Distribution' is set to 'kernel')
- Special Notes
– Naive Bayes is a good choice when there is a significant amount of missing data.
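The naïve Bayes options above can be sketched as follows. This is a minimal illustration, not part of the original slides: it assumes the Statistics and Machine Learning Toolbox is installed and uses its bundled fisheriris data as a stand-in for the training set. The table name dataTrain and the response variable name 'response' follow the slides' fitcdiscr examples; the predictor names are made up for the example. The code spells the option out as 'DistributionNames', the full documented name of what the slides call 'Distribution'.

```matlab
% Load Fisher's iris data (assumed stand-in for the course data set).
load fisheriris   % provides meas (150x4 double) and species (150x1 cell)
dataTrain = array2table(meas, 'VariableNames', ...
    {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});
dataTrain.response = species;

% Naive Bayes with a normal (Gaussian) distribution for every predictor.
nbNormal = fitcnb(dataTrain, 'response', 'DistributionNames', 'normal');

% Naive Bayes with a kernel density estimate for every predictor.
% 'Kernel' selects the smoother type; 'Width' is left at its default here.
nbKernel = fitcnb(dataTrain, 'response', ...
    'DistributionNames', 'kernel', 'Kernel', 'epanechnikov');

% Fraction of training observations that each model misclassifies.
errNormal = resubLoss(nbNormal, 'LossFun', 'classiferror');
errKernel = resubLoss(nbKernel, 'LossFun', 'classiferror');
fprintf('Normal: %.3f   Kernel: %.3f\n', errNormal, errKernel);
```

Resubstitution error is measured on the training data and tends to be optimistic, so a held-out set or cross-validation gives a fairer comparison between the two distributions.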
Discriminant Analysis
- Similar to naive Bayes, discriminant analysis works by
assuming that the observations in each prediction class can be modeled with a normal probability distribution.
- Unlike naïve Bayes, there is no assumption that the predictors are independent.
- A multivariate normal distribution is fitted to each class.
- Fit Time: Fast; ∝ size of the data
- Prediction Time: Fast; ∝ size of the data
- Memory Overhead: Linear DA ‐ Small; Quadratic DA ‐
Moderate to large; ∝ number of predictors
‐ 'DiscrimType' ‐ Type of boundary used.
‐ 'Delta' ‐ Coefficient threshold for including predictors in a linear boundary. (Default 0.)
‐ 'Gamma' ‐ Regularization to use when estimating the covariance matrix for linear DA.
- Linear discriminant analysis works well for “wide”
data (more predictors than observations).
- Linear Discriminant Analysis
– By default, the classifier assumes that the covariance is the same for every response class. This results in linear boundaries between classes.
– daModel = fitcdiscr(dataTrain,'response');
- Quadratic Discriminant Analysis
– Dropping the equal-covariance assumption lets each class have its own covariance matrix, which results in quadratic boundaries between classes.
– daModel = fitcdiscr(dataTrain,'response','DiscrimType','quadratic');
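The difference between the two discriminant types can be seen in a short sketch. As an assumption for illustration only, it again uses the toolbox's fisheriris data in place of the course data set, with made-up predictor names. The model names ldaModel and qdaModel are hypothetical. The 5-fold cross-validation is an added step, not something the slides prescribe.

```matlab
% Load Fisher's iris data (assumed stand-in for the course data set).
load fisheriris   % provides meas (150x4 double) and species (150x1 cell)
dataTrain = array2table(meas, 'VariableNames', ...
    {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});
dataTrain.response = species;

% Linear DA (default): one covariance matrix shared by all classes.
ldaModel = fitcdiscr(dataTrain, 'response');

% Quadratic DA: a separate covariance matrix for each class.
qdaModel = fitcdiscr(dataTrain, 'response', 'DiscrimType', 'quadratic');

% The Sigma property reflects the assumption: p-by-p for linear DA,
% p-by-p-by-K for quadratic DA (p predictors, K classes).
disp(size(ldaModel.Sigma));
disp(size(qdaModel.Sigma));

% Compare 5-fold cross-validated misclassification rates on the same folds.
rng(0);   % fix the random partition so the comparison is repeatable
cvp = cvpartition(dataTrain.response, 'KFold', 5);
errLinear    = kfoldLoss(crossval(ldaModel, 'CVPartition', cvp));
errQuadratic = kfoldLoss(crossval(qdaModel, 'CVPartition', cvp));
fprintf('Linear: %.3f   Quadratic: %.3f\n', errLinear, errQuadratic);
```

Quadratic DA estimates one covariance matrix per class, which is why its memory overhead grows faster with the number of predictors than linear DA's.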