Revision (Part II)
Ke Chen
COMP24111 Machine Learning
These revision slides summarise all you have learnt from Part II. Along with the non-assessed exercises, also available from the teaching page, they should help you prepare for your exam in January.
– Discriminative vs. generative classifiers
– Bayes' rule used to convert a generative model into a discriminative one; MAP for decision making
– Conditional-independence assumption on input attributes
– Training phase: estimate the conditional probability of each attribute given a class label, and the prior probability of each class label
– Test phase: MAP rule for decision making
– Zero conditional probabilities due to a shortage of training examples
– Applicability to problems violating the naïve Bayes assumption
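The naïve Bayes training and test phases above can be sketched as follows. This is a minimal illustration on a made-up categorical data set (the attribute values and labels are not from the lectures); Laplace (add-one) smoothing is used to avoid the zero-conditional-probability problem.

```python
from collections import Counter, defaultdict

# Toy training set: each row is (attribute tuple, class label).
# Values and labels are illustrative only.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

labels = [y for _, y in data]
classes = sorted(set(labels))
n_attrs = len(data[0][0])

# Training phase: class priors and per-attribute conditional counts.
prior = {c: labels.count(c) / len(labels) for c in classes}
counts = defaultdict(Counter)            # counts[(attr_index, class)][value]
values = [set() for _ in range(n_attrs)]  # observed values per attribute
for x, y in data:
    for i, v in enumerate(x):
        counts[(i, y)][v] += 1
        values[i].add(v)

def cond_prob(i, v, c):
    # Laplace (add-one) smoothing avoids zero conditional probabilities
    # caused by a shortage of training examples.
    return (counts[(i, c)][v] + 1) / (sum(counts[(i, c)].values()) + len(values[i]))

def predict(x):
    # Test phase: MAP rule under the conditional-independence assumption.
    def posterior(c):
        p = prior[c]
        for i, v in enumerate(x):
            p *= cond_prob(i, v, c)
        return p
    return max(classes, key=posterior)

print(predict(("rain", "hot")))  # an unseen combination still gets a non-zero score
```

Without smoothing, any attribute value unseen for a class would zero out the whole posterior product; add-one smoothing is the simplest remedy discussed for this.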
– Discover the “natural” number of clusters
– Properly group objects into “sensible” clusters
– Data types: continuous vs. discrete (binary, ranking, …)
– Data matrix and distance matrix
– Minkowski distance (Manhattan, Euclidean, …) for continuous attributes
– Cosine measure for non-metric data
– Distances for binary attributes: contingency table, symmetric vs. asymmetric
– Partitioning, hierarchical, density-based, spectral, ensemble, …
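The distance measures listed above can be sketched in a few lines. A minimal illustration (function names are my own, not from the course code):

```python
import math

def minkowski(x, y, p):
    # Minkowski distance: p = 1 gives Manhattan, p = 2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine_similarity(x, y):
    # Cosine measure: compares the direction of two vectors, ignoring
    # magnitude; a non-metric similarity rather than a distance.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

x, y = (0.0, 3.0), (4.0, 0.0)
print(minkowski(x, y, 1))  # Manhattan: 7.0
print(minkowski(x, y, 2))  # Euclidean: 5.0
print(cosine_similarity((1.0, 1.0), (2.0, 2.0)))  # same direction, ≈ 1.0
```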
– A typical partitioning clustering approach; an iterative process that minimises the squared distance within each cluster
1) Initialisation: choose K centroids (seed points)
2) Assign each data object to the cluster whose centroid is nearest
3) Re-calculate the mean of each cluster to get an updated centroid
4) Repeat 2) and 3) until no assignment changes
– Efficiency: O(tkn), where t, k ≪ n
– Sensitive to initialisation; converges to a local optimum
– Other weaknesses and limitations
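Steps 1)–4) above can be sketched as follows; a minimal pure-Python version for revision purposes (not the course implementation):

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # 1) Initialisation: choose k data objects as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # 2) Assign each object to the cluster with the nearest centroid
        #    (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[j].append(p)
        # 3) Re-calculate each cluster mean to get an updated centroid
        #    (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        # 4) Repeat 2) and 3) until the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, clusters = kmeans(points, 2, seed=1)
```

Each iteration costs O(kn) distance computations, giving the O(tkn) total over t iterations noted above; a different seed may converge to a different local optimum, which is the sensitivity-to-initialisation weakness.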
– Principle: partition the data set sequentially
– Strategy: divisive (top-down) vs. agglomerative (bottom-up)
– Single-link, complete-link and average-link
1) Convert object attributes to a distance matrix
2) Repeatedly merge the two closest clusters until only one cluster remains
– Dendrogram tree, life-time of clusters, K life-time
– Infer the number of clusters from the maximum K life-time
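The agglomerative procedure with single-link can be sketched as below; a small illustrative version working directly on a distance matrix, recording each merge and its distance (the entries of the dendrogram):

```python
def single_link(dist):
    # dist: symmetric n x n distance matrix (step 1 of the algorithm).
    clusters = [[i] for i in range(len(dist))]
    merges = []
    # Step 2: repeatedly merge the two closest clusters until one remains.
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a] + clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges  # the sequence of merges defines the dendrogram

# Objects 0 and 1 are close; object 2 is far from both.
dist = [[0, 1, 10],
        [1, 0, 9],
        [10, 9, 0]]
merges = single_link(dist)
```

Here the 2-cluster configuration lives from merge distance 1 up to 9, so K = 2 has the maximum K life-time (9 − 1 = 8), which is the rule for inferring the number of clusters.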
– Run k-means multiple times with different initialisations, resulting in different partitions
– Accumulate the “evidence” from all partitions to form a “collective distance” matrix
– Apply the agglomerative algorithm to the “collective distances” and decide K using the maximum K life-time
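The evidence-accumulation step above can be sketched as follows. The partitions here are hard-coded for illustration (standing in for the label vectors of several k-means runs); the co-association value for a pair of objects is the fraction of partitions in which they share a cluster, and the “collective distance” is its complement:

```python
# Three partitions of five objects, e.g. from k-means runs with
# different initialisations (label vectors are illustrative).
partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
n = len(partitions[0])

# Evidence accumulation: co-association = fraction of partitions in
# which objects i and j fall in the same cluster.
co = [[sum(p[i] == p[j] for p in partitions) / len(partitions)
       for j in range(n)] for i in range(n)]

# "Collective distance" matrix: objects that always co-occur get
# distance 0, objects that never co-occur get distance 1.
dist = [[1.0 - co[i][j] for j in range(n)] for i in range(n)]
```

This `dist` matrix is then handed to the agglomerative algorithm, with K chosen by the maximum K life-time, as the slide states.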
– Evaluate the results of clustering in a quantitative and objective fashion
– Performance evaluation, clustering comparison, finding the number of clusters
– Internal indexes
– External indexes
– Weighted clustering ensemble: weighting the evidence accumulation to diminish the effect of trivial partitions
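As one concrete external index, the Rand index compares a clustering against ground-truth labels over all object pairs; a minimal sketch (the Rand index is one common choice, not necessarily the only one covered):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    # External index: fraction of object pairs on which the two
    # labelings agree (same cluster in both, or different in both).
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical partitions up to cluster renaming score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Note the index is invariant to cluster relabelling, which is why clustering comparison is done over pairs rather than by matching labels directly.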