Introduction to Artificial Intelligence

SLIDE 1

Introduction to Artificial Intelligence
Data Mining with Clustering Algorithms

Miłosz Kadziński

Institute of Computing Science
Poznan University of Technology, Poland
www.cs.put.poznan.pl/mkadzinski/iai

Artificial Intelligence Introduction to Artificial Intelligence

SLIDE 2

A Few Words About Me

Miłosz Kadziński

e-mail: milosz.kadzinski@cs.put.poznan.pl

  • please use [IAI] in the e-mail’s subject

ph.: +48 61 665 3022
room: 1.6.6 (Technical Library, BT; 1st floor)
consultation hours: Wed 9:45 – 11:15
slides: www.cs.put.poznan.pl/mkadzinski/iai

2003 – Adam Mickiewicz High School in Poznań (VIII LO)
2008 – M.Sc. in Computer Science
2012 – Ph.D. in Intelligent Decision Support Systems
2017 – Habilitation in Computer-aided Decision Support
Research specialization – Multiple Criteria Decision Analysis
Over 40 international and Polish research awards
Main author and (informal) supervisor of the BSc Program in AI

SLIDE 3

Defining Artificial Intelligence (1)

Activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment

  • Nils J. Nilsson, Cambridge, 2010

A science and a set of computational technologies that are inspired by – but typically operate quite differently from – the way people use their nervous systems and bodies to sense, learn, reason and take action

  • P. Stone et al., Stanford, 2016

SLIDE 4

Defining Artificial Intelligence (2)

Characterizing AI depends on the credit one is willing to give software and hardware for ”functioning appropriately” and ”with foresight”
The differences in scale, speed, degree of autonomy, generality, …

  • electronic calculator (speed, no mistakes)
  • Deep Blue (1997; chess match against Garry Kasparov) (brute force methods, no single use of ”intelligence”)

The frontier of AI is moving far ahead (calculator vs. smartphone)
AI suffers from losing claim to its acquisitions (pattern: new technologies, people getting accustomed to them, stop being considered AI)

SLIDE 5

Artificial Intelligence: Main Application Areas

Intelligence is a complex phenomenon
Frightening, futurist visions of AI dominating films and novels are fictional (superhuman robots)
Abuse of AI technologies must be acknowledged
…, more importantly, AI is changing our lives

AI is improving human wealth, safety, and productivity
Major research universities devote departments to AI studies
Apple, Facebook, Google, IBM, and Microsoft explore AI applications

Main application areas of AI:
  • Transportation
  • Healthcare
  • Education
  • Home/service robots
  • Public safety
  • Entertainment

SLIDE 6

Artificial Intelligence in Transportation and Logistics

AI in transportation
  • Smart cars (GPS; almost 100 sensors responsible for lane changing, self-parking, detecting objects in blind spots, pre-collision systems, …)
  • Self-driving cars: Google, Tesla (automatic perception, planning)
  • On-demand transportation: Uber or Lyft matching drivers/passengers
  • Self-driving delivery vehicles: Amazon drones
  • Carpooling/ridesharing: Zimride and Nuride bring people together for a joint trip
  • Transportation planning (bus/subway schedules, tracking traffic conditions (speed limits, smart pricing, traffic lights), routing trips, predictions about traffic conditions)

SLIDE 7

Artificial Intelligence in Healthcare and Medicine

AI in healthcare
  • Clinical decision support: mine outcomes from millions of patient clinical records to enable more personalized diagnosis and treatment, automated image interpretation
  • Mining social media: infer possible health risks, predict patients at risk
  • Devices/treatments: da Vinci or Computer Motion, millions of surgeries a year; better hearing aids and visual assistive devices
  • Patient monitoring and coaching: LifeGraph (behavioral patterns, introduce behavior modifications, alerts from data, identify groups of “people like me”)

SLIDE 8

Artificial Intelligence in Education and Teaching

AI in education
  • Teaching robots / tutoring systems / online learning: Ozobot teaches children to code and reason; Duolingo provides foreign language training; avatar-based training modules to train military personnel; …
  • Automated generation of questions: tests for thousands rather than tens
  • Coursera and Udacity make use of AI for grading short-answer questions, essay questions and programming assignments
  • Model common student misconceptions, predict which students are at risk of failure, and provide real-time student feedback

SLIDE 9

Artificial Intelligence in Public Safety

AI in public safety
  • Predictive policing applications and crime prevention: predicting when and where crimes are more likely to happen and who may commit them (CompStat; NYPD)
  • Detecting white-collar crimes (e.g., credit card fraud; cybersecurity)
  • Scanning Twitter and other feeds for certain types of events
  • Cameras for surveillance that can detect anomalies pointing to a possible crime

SLIDE 10

Artificial Intelligence in Everyday Life

AI in home robots and everyday devices
  • Vacuum cleaners: Electrolux, iRobot Roomba; obstacle avoidance, self-charging, dealing with full bins, electrical cords and rug tassels, building a complete 3D world model of a house
  • System in Module, System on Chip: low-cost devices able to support onboard AI
  • Interaction with people: speech understanding and image labeling
  • Smartphones: better photos; battery management; facial recognition (FaceID); voice assistants (Bixby, Google Assistant, Alexa, Siri), creating accurate and rich profiles of owners (mobile advertising, target customers, where to build a next store branch)

SLIDE 11

Artificial Intelligence in Entertainment

AI in entertainment
  • Hollywood industry uses AI technologies to bring its fantasies to the screen
  • Software for composing music and recognizing soundtracks
  • Creating stage performances
  • Video games make use of computer vision and AI planning; an alternative existence in a virtual world (Second Life, World of Warcraft)

SLIDE 12

A Brief History of Artificial Intelligence

20th CENTURY
  • Born at a 1956 workshop organized by John McCarthy
  • Mostly academic area of study, but… promised to deliver
  • Theorem proving, logic-based knowledge representation/reasoning
  • Planning (1970s and 1980s), expert and knowledge-based systems
  • Model-based approaches (physics-based approaches in robotics)

21st CENTURY
  • Started to deliver technologies that have a substantial impact on everyday lives
  • Success of the data-driven paradigm
  • Human-aware systems: accounting for the characteristics of users

SLIDE 13

Main Trends in Artificial Intelligence

  • large-scale machine learning (pattern mining from large data)
  • reinforcement learning (experience-driven sequential decision-making)
  • deep learning (neural networks)
  • robotics (training robots to interact with the world)
  • computer vision (machine perception)
  • natural language processing (text processing, speech recognition, machine translation)
  • Internet of things (interconnected devices that share/use information)
  • collaborative systems (autonomous systems that can work with other systems or humans)
  • neuromorphic computing
  • crowdsourcing and human computation
  • algorithmic game theory and computational social choice

Source: P. Stone et al., Artificial Intelligence and Life in 2030. One Hundred Year Study on Artificial Intelligence. Stanford, 2016

SLIDE 14

Introduction to Artificial Intelligence: Our Plan

Introduction to AI (your course)

  • I. Clustering (Data mining): K-means, Hierarchical clustering (TODAY)
  • II. Classification (Natural Language Processing): K-NN, Naïve Bayes
  • III. Classification (Machine Learning): Decision Trees, ID3, C4
  • IV. Evolutionary algorithms (Optimization)
  • V. Multi-criteria choice methods (Decision analysis): ELECTRE I
  • VI. Neural networks: linear and convolutional
  • VII. Search algorithms (A*)
  • VIII. Assessment test (small problems to solve and a few test questions)

SLIDE 15

Clustering in Data Mining

Data mining: process of discovering patterns in data sets; extract information from a data set and transform it into a comprehensible structure for further use

Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters

A cluster is a collection of data objects:
  • similar to one another (hence, can be treated collectively as one group)
  • as a collection, they are sufficiently different from other groups
  • intra- vs. inter-cluster similarity

Clustering can be seen as unsupervised classification (no pre-defined classes)

SLIDE 16

Why Do We Need Clustering?

General aim: help users understand the natural structure in a data set

Clustering
  • Prediction based on groups: cluster and find characteristic patterns for each group (similar access patterns)
  • Data reduction: summarization (preprocessing for regression, classification); compression (image processing)
  • Finding nearest neighbors: localizing search to one or a small number of clusters
  • Outlier detection: outliers are often viewed as those ”away” from any cluster

SLIDE 17

Historic Application of Clustering

John Snow, a London physician, plotted the location of cholera deaths on a map during an outbreak in the 1850s
The locations indicated that cases were clustered around certain intersections where there were polluted wells – thus exposing both the problem and the solution

  • Earthquake studies: observed earthquake epicenters should be clustered along continent faults
  • Criminal investigation: crime detection and prevention

SLIDE 18

Clustered Search Results

Clustering search engines: Grouper, Carrot2, Vivisimo, SnakeT, Yippy
  • Perform clustering and labeling on the results of a search engine
  • Help users to get a quick overview of the search results

[Figure: clustered search results for ”milosz kadzinski” with Carrot2]

SLIDE 19

Clustering Based on Ratings: MovieLens

”MovieLens helps you find movies you will like. Rate movies to build a custom taste profile, then MovieLens recommends other movies for you to watch.”
  • Non-commercial, personalized movie recommendations
  • Groups of users named after animals

SLIDE 20

Popular Clustering Applications

Clustering
  • Clustering genes on microarray data: similar expression patterns imply coexpression of genes
  • Areas of similar land use in an earth observation database
  • Groups of motor insurance policy holders with a high average claim cost
  • City planning: identifying groups of houses according to their type, value and geographical locations
  • Marketing: distinct groups in customer bases (develop target marketing programs)
  • Sales segmentation: what types of customers buy what products
  • Similar brands or products: identify competitors, potential market opportunities and available niches

SLIDE 21

Clustering Task: Basic Steps

  • Feature selection / data preprocessing
      Select info concerning the task of interest
      May need to normalize/standardize data
  • Distance / similarity measure
      Similarity of two feature vectors
  • Clustering criterion
      Cost function or some rule
  • Clustering algorithms
      Choice of algorithm(s)
  • Validation of the results
  • Interpretation of the results with applications

SLIDE 22

Representation of Objects/Items (1)

User-pageview transaction matrix (rows: users; columns: documents / pages)

Duration of a visit / number of page displays:

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U2     2   1   0   1   0
U3     0   3   0   0   2

Has the user visited a page in a given session?

      D1  D2  D3  D4  D5
U1     0   1   1   0   1
U2     1   1   0   1   0
U3     0   1   0   0   1

Need for normalization of data (today’s focus: vectors of numbers)

      D1  D2  D3  D4  D5
U1    80  40  20  60  100
U2     2   1   5   3    3

Min-max normalization of data for objects (each row is rescaled separately): $y = \frac{x - \min}{\max - \min}$

      D1   D2   D3   D4   D5
U1    3/4  1/4  0    2/4  1
U2    1/4  0    1    2/4  2/4

Example (U2, D1): (2 – 1) / (5 – 1) = 1/4
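The per-object normalization above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `min_max_normalize` is hypothetical, exact fractions are used so the results match the slide, and the handling of a constant row (all values equal) is an assumption because the slide does not cover that case.

```python
from fractions import Fraction


def min_max_normalize(values):
    """Rescale numbers to [0, 1] using y = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant vector is mapped to zeros (formula undefined here).
        return [Fraction(0) for _ in values]
    return [Fraction(x - lo, hi - lo) for x in values]


# Rows taken from the slide; each user (object) is normalized on its own.
u1 = [80, 40, 20, 60, 100]
u2 = [2, 1, 5, 3, 3]

for name, row in (("U1", u1), ("U2", u2)):
    print(name, [str(v) for v in min_max_normalize(row)])
```

Fractions are printed in lowest terms, so 2/4 from the slide appears as 1/2.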

SLIDE 23

Representation of Objects/Items (2)

User-pageview transaction matrix (rows: users; columns: documents / pages)

Duration of a visit / number of page displays:

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U2     2   1   0   1   0
U3     0   3   0   0   2

Has the user visited a page in a given session?

      D1  D2  D3  D4  D5
U1     0   1   1   0   1
U2     1   1   0   1   0
U3     0   1   0   0   1

Need for normalization of data (today’s focus: vectors of numbers)

      D1  D2  D3  D4  D5
U1    80  40  20  60  100
U2     2   1   5   3    3
U3    41   2  15  59   90

Min-max normalization of data for features (each column is rescaled separately): $y = \frac{x - \min}{\max - \min}$

      D1   D2    D3   D4     D5
U1    1    1     1    1      1
U2    0    0     0    0      0
U3    1/2  1/39  2/3  56/57  87/97

Example (U3, D1): (41 – 2) / (80 – 2) = 1/2
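For comparison with the per-object case, the per-feature variant can be sketched as follows. The helper name `normalize_features` is hypothetical, and mapping a constant column to zeros is an assumption the slide does not address.

```python
from fractions import Fraction


def normalize_features(rows):
    """Min-max normalize every column (feature) of a list of equal-length rows."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumption: a constant column is mapped to 0.
            Fraction(x - lo, hi - lo) if hi != lo else Fraction(0)
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Rows U1, U2, U3 from the slide.
data = [
    [80, 40, 20, 60, 100],
    [2, 1, 5, 3, 3],
    [41, 2, 15, 59, 90],
]
normalized = normalize_features(data)
print([str(v) for v in normalized[2]])  # row U3
```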

SLIDE 24

Popular Distance Metrics for Clustering

Feature vectors: $X = \langle x_1, x_2, \ldots, x_n \rangle$, $Y = \langle y_1, y_2, \ldots, y_n \rangle$

Euclidean distance: $ED(x, y) = \sqrt{(x_1 - y_1)^2 + \ldots + (x_n - y_n)^2}$
Manhattan distance: $MD(x, y) = |x_1 - y_1| + \ldots + |x_n - y_n|$
Chebyshev distance: $CD(x, y) = \max_{i=1,\ldots,n} |x_i - y_i|$

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U6     2   0   1   1   2

$ED(U_1, U_6) = \sqrt{(0-2)^2 + (3-0)^2 + (2-1)^2 + (0-1)^2 + (2-2)^2} = \sqrt{15} \approx 3.873$
$MD(U_1, U_6) = |0-2| + |3-0| + |2-1| + |0-1| + |2-2| = 7$
$CD(U_1, U_6) = \max\{|0-2|, |3-0|, |2-1|, |0-1|, |2-2|\} = 3$
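The three distances can be reproduced with plain Python. This is an illustrative sketch; the function names are not from the slides, and the vectors for U1 and U6 are read off the worked calculation above.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]

print(round(euclidean(u1, u6), 3), manhattan(u1, u6), chebyshev(u1, u6))
```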

SLIDE 25

Popular Similarity Measures for Clustering

Feature vectors: $X = \langle x_1, x_2, \ldots, x_n \rangle$, $Y = \langle y_1, y_2, \ldots, y_n \rangle$

Simple matching similarity: $SM(x, y) = \sum_j x_j \cdot y_j$
Cosine similarity: $\cos(x, y) = \frac{\sum_j x_j \cdot y_j}{\sqrt{\sum_j x_j^2} \cdot \sqrt{\sum_j y_j^2}} = \frac{\sum_j x_j \cdot y_j}{|x| \cdot |y|}$, where $|x|$ is the vector’s length

      D1  D2  D3  D4  D5  |Ux|
U1     0   3   2   0   2  4.12
U6     2   0   1   1   2  3.16

SM(U1,U6) = 0·2 + 3·0 + 2·1 + 0·1 + 2·2 = 6
cos(U1,U6) = 6 / (4.12·3.16) = 0.46

General transformations (1 = ideal; 0 = anti-ideal):
  • distance(x,y) = 1 - similarity(x,y)
  • distance(x,y) = 1 / similarity(x,y)
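A short sketch of both similarity measures, using the U1 and U6 vectors from the example. The function names are illustrative, and the last lines show the first of the two general transformations (distance = 1 - similarity) applied to the cosine similarity.

```python
import math


def simple_matching(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def length(x):
    """Euclidean length |x| of a vector."""
    return math.sqrt(sum(a * a for a in x))


def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    return simple_matching(x, y) / (length(x) * length(y))


u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]

print(simple_matching(u1, u6))
print(round(length(u1), 2), round(length(u6), 2))
print(round(cosine_similarity(u1, u6), 2))

# One of the general transformations from similarity to distance.
cosine_distance = 1 - cosine_similarity(u1, u6)
```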

SLIDE 26

What is Good Clustering?

A good clustering method will produce high quality clusters
  • high intra-class similarity: cohesive within clusters
  • low inter-class similarity: distinctive between clusters
The quality of a clustering method depends on the similarity measure used, its implementation, and its ability to discover some or all of the hidden patterns

Partitioning approach
  • Constructs various partitions and evaluates them by some criterion
  • K-means, K-medoids, CLARANS
Hierarchical approach
  • Hierarchical decomposition of the set of data (objects) using some criterion
  • Diana, Agnes, BIRCH, CHAMELEON
Density-based approach
  • Based on connectivity and density functions
  • DBSCAN, OPTICS, DenClue
Model-based approach
  • A model is hypothesized for each cluster and the best fit of that model is searched
  • EM, SOM, COBWEB

More: grid-based (STING, CLIQUE), frequent pattern-based (pCluster), user-guided or constraint-based (COD, constrained clustering)

SLIDE 27

Partitioning Approaches

The notion of comparing item similarities can be extended to clusters themselves, by focusing on a representative vector for each cluster
  • cluster representatives can be actual items in the cluster or other “virtual” representatives such as the centroid
  • reduces the number of similarity computations in clustering
  • clusters are revised successively until a stopping condition is satisfied, or until no more changes to clusters can be made

Reallocation-Based Partitioning Methods
  • Start with an initial assignment of items to clusters and then move items from cluster to cluster to obtain an improved partitioning
  • Most common algorithm: k-means

      D1  D2   D3   D4  D5
U2     2   1    0    1   0
U3     0   3    0    0   2
U7     1   0    2    2   0
C      1  4/3  2/3   1  2/3   (centroid)

Example (D1): (2 + 0 + 1) / 3 = 1
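The centroid in this example is the coordinate-wise mean of the cluster members. A minimal sketch, assuming the zero entries of U2, U3 and U7 as they follow from the column sums of the centroid row; the helper name `centroid` is illustrative.

```python
from fractions import Fraction


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [Fraction(sum(column), n) for column in zip(*vectors)]


# Zero entries are inferred from the centroid row on the slide (assumption).
u2 = [2, 1, 0, 1, 0]
u3 = [0, 3, 0, 0, 2]
u7 = [1, 0, 2, 2, 0]

c = centroid([u2, u3, u7])
print([str(v) for v in c])
```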

SLIDE 28

K-Means Clustering Method - Example (1)

Given the number of desired clusters K:
  1. Randomly assign objects to create K nonempty initial partitions (clusters)
  2. Compute the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
  3. Assign each object to the cluster with the nearest centroid (reallocation)
  4. Repeat steps 2 and 3 until the assignment does not change

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U2     2   1   0   1   0
U3     0   3   0   0   2
U4     1   2   0   2   1
U5     0   1   3   0   1
U6     2   0   1   1   2
U7     1   0   2   2   0
U8     3   1   0   0   2

Initial (arbitrary) assignment: C1={U4}, C2={U6}, C3={U7}
Compute the similarity of each item to each cluster (simple matching (dot product) as the similarity measure):

          U1  U2  U3  U4  U5  U6  U7  U8
C1 (U4)    8   6   8  10   3   6   5   7
C2 (U6)    6   5   4   6   5  10   6  10
C3 (U7)    4   4   0   5   6   6   9   3

Allocate each user to the cluster to which it has the highest similarity (the largest value in each column of the above table):
C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}
End of the first iteration
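The first reallocation can be reproduced with a short script. This is a sketch under stated assumptions: the zero entries of the user-document matrix are inferred from the similarity values shown on the slides, the names `USERS`, `dot` and `assign` are illustrative, and ties (which do not occur in this example) would go to the first cluster listed.

```python
# User-document matrix; zero entries inferred from the slide's similarity values.
USERS = {
    "U1": [0, 3, 2, 0, 2],
    "U2": [2, 1, 0, 1, 0],
    "U3": [0, 3, 0, 0, 2],
    "U4": [1, 2, 0, 2, 1],
    "U5": [0, 1, 3, 0, 1],
    "U6": [2, 0, 1, 1, 2],
    "U7": [1, 0, 2, 2, 0],
    "U8": [3, 1, 0, 0, 2],
}


def dot(x, y):
    """Simple matching similarity (dot product)."""
    return sum(a * b for a, b in zip(x, y))


def assign(users, representatives):
    """Put each user into the cluster whose representative is most similar."""
    clusters = {name: [] for name in representatives}
    for user, vec in users.items():
        best = max(representatives, key=lambda name: dot(vec, representatives[name]))
        clusters[best].append(user)
    return clusters


seeds = {"C1": USERS["U4"], "C2": USERS["U6"], "C3": USERS["U7"]}
first_iteration = assign(USERS, seeds)
print(first_iteration)
```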

SLIDE 29

K-Means Clustering Method - Example (2)

We repeat the process for another reallocation, starting from: C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}

Compute new cluster centroids using the original user-document matrix:

      D1   D2   D3   D4   D5
U1    0    3    2    0    2
U2    2    1    0    1    0
U3    0    3    0    0    2
U4    1    2    0    2    1
U5    0    1    3    0    1
U6    2    0    1    1    2
U7    1    0    2    2    0
U8    3    1    0    0    2
C1    3/4  9/4  2/4  3/4  5/4
C2    5/2  1/2  1/2  1/2  4/2
C3    1/2  1/2  5/2  2/2  1/2

Compute a new centroid-user similarity matrix:

      U1     U2   U3    U4   U5   U6    U7    U8
C1    10.25  4.5  9.25  8    5    5.25  3.25  7
C2    6.5    6    5.5   6.5  4    10    4.5   12
C3    7.5    2.5  2.5   4    8.5  5.5   7.5   3

Reallocate the items to clusters with the highest similarity:
C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
End of the second iteration

SLIDE 30

K-Means Clustering Method - Example (3)

We repeat the process for another reallocation, starting from: C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}

Compute new cluster centroids using the original user-document matrix:

      D1   D2   D3   D4   D5
U1    0    3    2    0    2
U2    2    1    0    1    0
U3    0    3    0    0    2
U4    1    2    0    2    1
U5    0    1    3    0    1
U6    2    0    1    1    2
U7    1    0    2    2    0
U8    3    1    0    0    2
C1    1/3  8/3  2/3  2/3  5/3
C2    7/3  2/3  1/3  2/3  4/3
C3    1/2  1/2  5/2  2/2  1/2

Compute a new centroid-user similarity matrix:

      U1     U2   U3     U4    U5    U6    U7    U8
C1    12.67  4    11.33  8.67  6.33  5.33  3     7
C2    5.33   6    4.67   6.33  3     8.33  4.33  10.33
C3    7.5    2.5  2.5    4     8.5   5.5   7.5   3

Reallocate the items to clusters with the highest similarity:
C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
No change to the clusters: terminate the algorithm
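The whole example, from the result of the first iteration to termination, can be sketched as a loop that alternates centroid computation and reallocation. This is an illustration under assumptions rather than the lecturer's implementation: zero entries of the matrix are inferred from the worked values, the dot product is used as the similarity measure as in the example, all function names are hypothetical, and an empty cluster (which does not arise here) would raise a `ZeroDivisionError`.

```python
from fractions import Fraction

# User-document matrix; zero entries inferred from the worked values (assumption).
USERS = {
    "U1": [0, 3, 2, 0, 2], "U2": [2, 1, 0, 1, 0],
    "U3": [0, 3, 0, 0, 2], "U4": [1, 2, 0, 2, 1],
    "U5": [0, 1, 3, 0, 1], "U6": [2, 0, 1, 1, 2],
    "U7": [1, 0, 2, 2, 0], "U8": [3, 1, 0, 0, 2],
}


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def centroid(vectors):
    # Exact fractions keep the comparison between centroids free of rounding.
    return [Fraction(sum(col), len(vectors)) for col in zip(*vectors)]


def assign(users, centroids):
    clusters = {name: [] for name in centroids}
    for user, vec in users.items():
        best = max(centroids, key=lambda name: dot(vec, centroids[name]))
        clusters[best].append(user)
    return clusters


def k_means(users, clusters):
    """Repeat 'compute centroids, reallocate' until the assignment is stable."""
    history = [clusters]
    while True:
        centroids = {name: centroid([users[u] for u in members])
                     for name, members in clusters.items()}
        new_clusters = assign(users, centroids)
        history.append(new_clusters)
        if new_clusters == clusters:
            return history
        clusters = new_clusters


start = {"C1": ["U1", "U2", "U3", "U4"], "C2": ["U6", "U8"], "C3": ["U5", "U7"]}
history = k_means(USERS, start)
for step, clusters in enumerate(history, start=1):
    print("after iteration", step, clusters)
```

The returned history holds the assignment after the first, second and third iteration; the last two are identical, which is the stopping condition.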

SLIDE 31

K-Means Clustering Method

[Figure: four scatter plots (axes X and Y) with three cluster centers, labeled C1, C2, C3 / k1, k2, k3, illustrating the consecutive steps of k-means]
  • Pick initial cluster centers
  • Assign each point to the closest cluster center
  • Move each cluster center to the mean of each cluster
  • Reassign points closest to a different new cluster center

SLIDE 32

K-Means Clustering Method - Summary

Strength
  • Simple, understandable
  • Relatively efficient; complexity dependent on t·k·n, where n – no. of objects, k – no. of clusters, and t – no. of iterations
  • Often terminates at a local optimum

Weakness
  • Applicable only when mean is defined
  • Need to specify k (see: X-means)
  • Results can vary vastly depending on the seeds; restart with different random seeds (increases the chance of finding the global optimum)
  • Unable to handle noisy data or outliers; K-medoids – instead of mean, use medians of each cluster
      mean of 1, 3, 5, 7, 9 is 5
      mean of 1, 3, 5, 7, 1009 is 205
      median of 1, 3, 5, 7, 1009 is 5
      median: not affected by extreme values

Criterion function: $J = \sum_{j=1}^{K} \sum_{x \in C_j} \mathrm{sim}(x, m_j)$, where $m_j$ is the representative (mean) of cluster $C_j$

Variations of k-means differ in:
  • Selection of the initial k means
  • Distance or similarity measures used
  • Strategies to calculate cluster means
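The effect of a single extreme value on the mean and on the median can be checked directly with Python's standard `statistics` module. The variable names are illustrative; the numbers are the ones from the slide.

```python
from statistics import mean, median

clean = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 1009]

# The mean is pulled far away by the single extreme value ...
print(mean(clean), mean(with_outlier))
# ... while the median stays where it was.
print(median(clean), median(with_outlier))
```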

SLIDE 33

Hierarchical Clustering Approaches

Two main types of hierarchical clustering
  • Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters until a stopping criterion is met (e.g., one cluster left)
  • Divisive: start with one, all-inclusive cluster; at each step, split a cluster until a stopping criterion is met (e.g., each cluster contains a point)

Traditional hierarchical algorithms use a similarity or distance matrix
Merge or split one cluster at a time

[Figure: items A, B, C, D, E and the clusters AB, CD, CDE, ABCDE; the agglomerative approach proceeds from Step 0 to Step 4, the divisive approach from Step 4 back to Step 0]

SLIDE 34

Hierarchical Agglomerative Clustering

Basic procedure
  1. Place each of N items into a cluster of its own
  2. Compute all pairwise item-item similarity coefficients
  3. Form a new cluster by combining the most similar pair of current clusters Ci and Cj
  4. Update similarity matrix by deleting rows/columns corresponding to Ci and Cj
  5. Calculate the entries in the row corresponding to the new cluster Ci+j
  6. Repeat step 3 (forming a new cluster) until a stopping criterion is met

Methods for computing similarity between clusters:
  • single-link
  • complete-link
  • group average
  • centroid method

[Figure: nested clusters of points A–F with the order of merges marked]

SLIDE 35

HAC - Distance Between Two Clusters (1)

Single-link distance between clusters Ci and Cj is the minimum distance between any object in Ci and any object in Cj
The distance is defined by the two closest objects (data points):

$\mathrm{dist}(C_i, C_j) = \min_{x,y} \{ \mathrm{dist}(x, y) : x \in C_i, y \in C_j \}$

Single-link similarity between clusters Ci and Cj is the maximum similarity between any object in Ci and any object in Cj
The similarity is defined by the two most similar objects:

$\mathrm{sim}(C_i, C_j) = \max_{x,y} \{ \mathrm{sim}(x, y) : x \in C_i, y \in C_j \}$

It can find arbitrarily shaped clusters, but may cause the undesirable “chain effect” due to noisy points

SLIDE 36

HAC - Example Incorporating Single-Link Similarity

Similarity matrix

      U1    U2    U3    U4    U5
U1    1     0.9   0.1   0.65  0.2
U2    0.9   1     0.7   0.6   0.5
U3    0.1   0.7   1     0.4   0.3
U4    0.65  0.6   0.4   1     0.8
U5    0.2   0.5   0.3   0.8   1

After merging U1 and U2 (similarity 0.9):

       U12   U3    U4    U5
U12    1     0.7   0.65  0.5
U3     0.7   1     0.4   0.3
U4     0.65  0.4   1     0.8
U5     0.5   0.3   0.8   1

sim(U12,U3) = max{sim(U1,U3), sim(U2,U3)} = max{0.1, 0.7} = 0.7

After merging U4 and U5 (similarity 0.8):

       U12   U3    U45
U12    1     0.7   0.65
U3     0.7   1     0.4
U45    0.65  0.4   1

sim(U12,U45) = max{sim(U12,U4), sim(U12,U5)} = max{0.65, 0.5} = 0.65

After merging U12 and U3 (similarity 0.7):

        U123  U45
U123    1     0.65
U45     0.65  1

After merging U123 and U45 (similarity 0.65): a single cluster U12345

Dendrogram: merges at similarity levels 0.9, 0.8, 0.7, 0.65

Possible stopping criteria:
  • number of clusters
  • similarity thresholds (do not combine clusters which are not similar)
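The merge sequence of this example can be reproduced with a compact agglomerative loop. This is a sketch, not an optimized or canonical implementation: it recomputes cluster similarities from the item-level matrix at every step instead of updating a reduced matrix, and the names `SIM`, `item_sim`, `cluster_sim` and `hac` are hypothetical.

```python
from itertools import combinations

# Upper triangle of the item-item similarity matrix from the slide.
SIM = {
    ("U1", "U2"): 0.9, ("U1", "U3"): 0.1, ("U1", "U4"): 0.65, ("U1", "U5"): 0.2,
    ("U2", "U3"): 0.7, ("U2", "U4"): 0.6, ("U2", "U5"): 0.5,
    ("U3", "U4"): 0.4, ("U3", "U5"): 0.3,
    ("U4", "U5"): 0.8,
}


def item_sim(a, b):
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]


def cluster_sim(ci, cj):
    """Single-link: similarity of the two most similar objects across clusters."""
    return max(item_sim(a, b) for a in ci for b in cj)


def hac(items):
    """Merge the most similar pair of clusters until one cluster is left."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        ci, cj = max(combinations(clusters, 2), key=lambda pair: cluster_sim(*pair))
        merged = tuple(sorted(ci + cj))
        merges.append((merged, cluster_sim(ci, cj)))
        clusters = [c for c in clusters if c not in (ci, cj)] + [merged]
    return merges


merges = hac(["U1", "U2", "U3", "U4", "U5"])
for members, level in merges:
    print(level, members)
```

Stopping at a given number of clusters or at a similarity threshold would only change the loop condition.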

SLIDE 37

HAC - Distance Between Two Clusters (2)

Complete-link distance between clusters Ci and Cj is the maximum distance between any object in Ci and any object in Cj
The distance is defined by the two furthest objects (data points):

$\mathrm{dist}(C_i, C_j) = \max_{x,y} \{ \mathrm{dist}(x, y) : x \in C_i, y \in C_j \}$

Complete-link similarity between clusters Ci and Cj is the minimum similarity between any object in Ci and any object in Cj
The similarity is defined by the two least similar objects:

$\mathrm{sim}(C_i, C_j) = \min_{x,y} \{ \mathrm{sim}(x, y) : x \in C_i, y \in C_j \}$

It is sensitive to outliers because they are far away from each other

SLIDE 38

HAC - Example Incorporating Complete-Link Similarity

Similarity matrix

      U1    U2    U3    U4    U5
U1    1     0.9   0.1   0.65  0.2
U2    0.9   1     0.7   0.6   0.5
U3    0.1   0.7   1     0.4   0.3
U4    0.65  0.6   0.4   1     0.8
U5    0.2   0.5   0.3   0.8   1

After merging U1 and U2 (similarity 0.9):

       U12   U3    U4    U5
U12    1     0.1   0.6   0.2
U3     0.1   1     0.4   0.3
U4     0.6   0.4   1     0.8
U5     0.2   0.3   0.8   1

sim(U12,U3) = min{sim(U1,U3), sim(U2,U3)} = min{0.1, 0.7} = 0.1

After merging U4 and U5 (similarity 0.8):

       U12   U3    U45
U12    1     0.1   0.2
U3     0.1   1     0.3
U45    0.2   0.3   1

sim(U12,U45) = min{sim(U12,U4), sim(U12,U5)} = min{0.6, 0.2} = 0.2

After merging U3 and U45 (similarity 0.3):

        U12   U345
U12     1     0.1
U345    0.1   1

After merging U12 and U345 (similarity 0.1): a single cluster U12345

Dendrogram (leaves U1, U2, U3, U4, U5): merges at similarity levels 0.9, 0.8, 0.3, 0.1
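Switching the linkage from the maximum to the minimum pairwise similarity is enough to reproduce the complete-link dendrogram. The sketch below is illustrative only; the names are hypothetical and cluster similarities are recomputed from the item-level matrix rather than from an updated reduced matrix.

```python
from itertools import combinations

# Upper triangle of the item-item similarity matrix from the slide.
SIM = {
    ("U1", "U2"): 0.9, ("U1", "U3"): 0.1, ("U1", "U4"): 0.65, ("U1", "U5"): 0.2,
    ("U2", "U3"): 0.7, ("U2", "U4"): 0.6, ("U2", "U5"): 0.5,
    ("U3", "U4"): 0.4, ("U3", "U5"): 0.3,
    ("U4", "U5"): 0.8,
}


def item_sim(a, b):
    return SIM[(a, b)] if (a, b) in SIM else SIM[(b, a)]


def complete_link_sim(ci, cj):
    """Complete-link: similarity of the two least similar objects across clusters."""
    return min(item_sim(a, b) for a in ci for b in cj)


def hac_complete(items):
    """At each step merge the pair of clusters with the highest complete-link similarity."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        ci, cj = max(combinations(clusters, 2), key=lambda pair: complete_link_sim(*pair))
        merged = tuple(sorted(ci + cj))
        merges.append((merged, complete_link_sim(ci, cj)))
        clusters = [c for c in clusters if c not in (ci, cj)] + [merged]
    return merges


merges = hac_complete(["U1", "U2", "U3", "U4", "U5"])
for members, level in merges:
    print(level, members)
```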

SLIDE 39

HAC - Distance Between Two Clusters (3)

Average-link distance between clusters Ci and Cj is the average distance of all pair-wise distances between the data points in two clusters:

$\mathrm{dist}(C_i, C_j) = \mathrm{average}_{x,y} \{ \mathrm{dist}(x, y) : x \in C_i, y \in C_j \}$

A compromise between:
  • the sensitivity of complete-link clustering to outliers
  • the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects

Centroid method: the distance between two clusters is the distance between their centroids
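The difference between the average-link and the centroid method is easy to see on a toy example. The sketch below uses two hypothetical clusters of 2D points that do not come from the slides, Euclidean distance via `math.dist` (Python 3.8+), and illustrative function names.

```python
import math


def average_link_distance(ci, cj):
    """Mean of all pairwise distances between points of the two clusters."""
    total = sum(math.dist(x, y) for x in ci for y in cj)
    return total / (len(ci) * len(cj))


def centroid_distance(ci, cj):
    """Distance between the mean points of the two clusters."""
    def centroid(points):
        return [sum(col) / len(points) for col in zip(*points)]
    return math.dist(centroid(ci), centroid(cj))


# Hypothetical toy clusters (not from the lecture).
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(4, 0), (4, 2)]

print(round(average_link_distance(cluster_a, cluster_b), 3))
print(centroid_distance(cluster_a, cluster_b))
```

Here the centroids are (0, 1) and (4, 1), so the centroid distance is 4, while the average of the four pairwise distances is slightly larger because of the two diagonal pairs.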

SLIDE 40

Summary (1)

      U1   U2   U3   U4   U5
D1    0.2  0.5  0.7  0.7  0.8
D2    0.7  0.2  0.6  0.3  0.6

      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1

Given the representation of five users and the cosine similarity matrix, use 2-means to group these users into two clusters:
  I) Assume the least similar users are the initial centroids. Which users would be used?
  II) What would be the clustering obtained after the first iteration (groups G1 and G2)? Would it differ in case U3 and U4 were used as the centroids?
  III) Compute the J measure after the first iteration for the above data.
  IV) Compute the new centroids after the first iteration (for the case of starting with U1 and U2 as the initial centroids):

      C1   C2
D1    0.2  ?
D2    0.7  ?

Hint: for our lecture example (similarity of each user to the initial representatives):

          U1  U2  U3  U4  U5  U6  U7  U8
C1 (U4)    8   6   8  10   3   6   5   7
C2 (U6)    6   5   4   6   5  10   6  10
C3 (U7)    4   4   0   5   6   6   9   3

J = (8 + 6 + 8 + 10) + (10 + 10) + (6 + 9) = 67
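The value in the hint can be recomputed from the lecture example: J sums, over all clusters, the similarity of each member to the representative of its cluster. A minimal sketch, assuming the zero entries of the user-document matrix as inferred from the worked values, the dot product as the similarity measure, and hypothetical names `USERS`, `dot` and `j_measure`.

```python
# User-document matrix of the lecture example; zeros inferred (assumption).
USERS = {
    "U1": [0, 3, 2, 0, 2], "U2": [2, 1, 0, 1, 0],
    "U3": [0, 3, 0, 0, 2], "U4": [1, 2, 0, 2, 1],
    "U5": [0, 1, 3, 0, 1], "U6": [2, 0, 1, 1, 2],
    "U7": [1, 0, 2, 2, 0], "U8": [3, 1, 0, 0, 2],
}


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def j_measure(users, clusters):
    """Sum of similarities between each member and its cluster representative."""
    return sum(
        dot(users[member], users[representative])
        for representative, members in clusters.items()
        for member in members
    )


# Clusters after the first iteration, keyed by their initial representative.
first_iteration = {
    "U4": ["U1", "U2", "U3", "U4"],
    "U6": ["U6", "U8"],
    "U7": ["U5", "U7"],
}
J = j_measure(USERS, first_iteration)
print(J)
```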

SLIDE 41

Summary (2)

      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1

Given the cosine similarity matrix for five users, use agglomerative hierarchical clustering (AHC) to group these users:
  I) Which users would be clustered together first (irrespective of how the similarity between groups is defined)?
  II) Compute the similarity matrix after the first iteration while assuming that the similarity between groups is equal to the maximal/minimal/average similarity of the users contained in these clusters.

       U1    U24   U3    U5
U1     1     ?     0.83  0.79
U24    ?     1     ?     ?
U3     0.83  ?     1     0.99
U5     0.79  ?     0.99  1

      sim(U1,U24) = max/min/ave{sim(U1,U2), sim(U1,U4)} = max/min/ave{0.61, 0.63}

  III) Present the process of AHC by means of a dendrogram.
  IV) How many groups would be obtained if the similarity threshold for AHC were set to 0.8?
