Model-theoretic and algebraic approach in machine learning (data - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Model-theoretic and algebraic approach in machine learning (data mining, pattern recognition, etc.)

Shevlyakov A.N.

slide-2
SLIDE 2
  • I. Algebra, logic and clustering
slide-3
SLIDE 3

Clustering

Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).

slide-4
SLIDE 4

Clustering with algebraic objects

Clustering with graphs

slide-5
SLIDE 5

Data representation

Suppose we have n objects which are represented by the vertices of a complete graph. The label of an edge is the distance between the corresponding objects (in a given metric).

[Figure: complete graph on the vertices A, B, C, D, E, F; each of the 15 edges is labeled with a distance.]

slide-6
SLIDE 6

Clustering process

The input of the algorithm is a positive real number R. We remove all edges with labels greater than R. The clusters are the connected components of the resulting graph. For example, for R = 2 we obtain two clusters: {A,B,C,D} and {E,F}.

[Figure: the graph after removing all edges with labels > 2; the remaining edge labels are 2, 2, 2, 1, 1, 1.5.]

slide-7
SLIDE 7

Description of the algorithm

Clearly, if we set R = 1.4, we obtain four clusters: {A,B}, {C,D}, {E}, {F}.

[Figure: the graph after removing all edges with labels > 1.4; the two remaining edges have label 1.]
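The threshold procedure from the two slides above can be sketched in Python as follows. This is only a minimal sketch, not the author's implementation: the name `threshold_clusters` is made up, and the assignment of the 15 edge labels to particular edges is an assumption (the figure is not recoverable from the text), chosen so that R = 2 and R = 1.4 give the clusters stated on the slides.

```python
def threshold_clusters(vertices, dist, R):
    """Clusters = connected components after removing edges with label > R."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v), label in dist.items():
        if label <= R:  # the edge survives the threshold
            parent[find(u)] = find(v)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical edge labels (assumed assignment, see the note above).
VERTICES = "ABCDEF"
DIST = {
    ("A", "B"): 1, ("C", "D"): 1, ("E", "F"): 1.5,
    ("B", "C"): 2, ("A", "D"): 2, ("A", "C"): 2,
    ("A", "E"): 6, ("A", "F"): 4, ("B", "D"): 3,
    ("B", "E"): 4, ("B", "F"): 5, ("C", "E"): 4,
    ("C", "F"): 7, ("D", "E"): 4, ("D", "F"): 4,
}

print(threshold_clusters(VERTICES, DIST, 2))
print(threshold_clusters(VERTICES, DIST, 1.4))
```

A union-find structure is used here only because it keeps the connected-component computation short; a breadth-first search over the surviving edges would work equally well.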

slide-8
SLIDE 8

Hierarchical clustering

slide-9
SLIDE 9

Definition

The goal is to obtain a description of the data as a tree-like structure, so that the objects form a multilevel system of clusters. Such a tree diagram is called a dendrogram.

[Figure: example of a dendrogram.]

slide-10
SLIDE 10

Agglomerative clustering (general idea)

Let us fix a metric (distance function) between the objects.

Initially, each cluster consists of a single element, and the distance between two clusters equals the distance between the corresponding elements. At each iteration of the algorithm we merge the two nearest clusters and recalculate the distances to the new cluster. We repeat such cluster fusions until only one cluster remains.

slide-11
SLIDE 11

Cluster distance

Let us define the distance between clusters S, T as $d(S,T)=\min_{x\in S,\ y\in T} d(x,y)$ (single-linkage clustering).
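As a small illustration of the cluster distance, the sketch below computes the single-linkage distance of two clusters, together with the complete-linkage variant (maximum instead of minimum) that appears later in the talk. The function names and the choice of |x - y| as the object distance are assumptions made for this example.

```python
def single_linkage_distance(S, T, d):
    """d(S,T) = minimum of d(x,y) over x in S, y in T."""
    return min(d(x, y) for x in S for y in T)


def complete_linkage_distance(S, T, d):
    """d(S,T) = maximum of d(x,y) over x in S, y in T."""
    return max(d(x, y) for x in S for y in T)


def d(x, y):
    # Hypothetical object distance: points on the real line.
    return abs(x - y)


S, T = {0, 1}, {4, 6}
print(single_linkage_distance(S, T, d))    # closest pair is (1, 4)
print(complete_linkage_distance(S, T, d))  # farthest pair is (0, 6)
```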

slide-12
SLIDE 12

Example

Objects   A   B   C   D   E
A         -   2   1   5   6
B         2   -   3   7   4
C         1   3   -   4   5
D         5   7   4   -   1
E         6   4   5   1   -

We are given the distance matrix, and we use the single-linkage formula $d(S,T)=\min_{x\in S,\ y\in T} d(x,y)$ for cluster distances.

slide-13
SLIDE 13

1st fusion

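Let us merge the two nearest clusters and recalculate the entries of the matrix. The smallest distance is 1; it is attained by {D}, {E} (and also by {A}, {C}). Here we merge {D} and {E}.

Before the fusion:

Clusters  {A}  {B}  {C}  {D}  {E}
{A}        -    2    1    5    6
{B}        2    -    3    7    4
{C}        1    3    -    4    5
{D}        5    7    4    -    1
{E}        6    4    5    1    -

After the fusion:

Clusters  {A}  {B}  {C}  {D,E}
{A}        -    2    1    5
{B}        2    -    3    4
{C}        1    3    -    4
{D,E}      5    4    4    -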

slide-14
SLIDE 14

2nd fusion

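Now the nearest clusters are {A} and {C} (distance 1), so we merge them.

Before the fusion:

Clusters  {A}  {B}  {C}  {D,E}
{A}        -    2    1    5
{B}        2    -    3    4
{C}        1    3    -    4
{D,E}      5    4    4    -

After the fusion:

Clusters  {A,C}  {B}  {D,E}
{A,C}       -     2     4
{B}         2     -     4
{D,E}       4     4     -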

slide-15
SLIDE 15

3rd fusion

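The nearest clusters are now {A,C} and {B} (distance 2), so we merge them.

Before the fusion:

Clusters  {A,C}  {B}  {D,E}
{A,C}       -     2     4
{B}         2     -     4
{D,E}       4     4     -

After the fusion:

Clusters     {{A,C},B}  {D,E}
{{A,C},B}        -        4
{D,E}            4        -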

slide-16
SLIDE 16

The last fusion

Finally, we merge the clusters {{A,C},B} and {D,E}. The sequence of fusions is shown by the following dendrogram:

[Figure: dendrogram with leaves in the order A, C, B, D, E.]
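The worked example can be replayed with a short Python sketch of single-linkage agglomerative clustering. It is a naive illustration rather than an efficient or authoritative implementation; the function name `single_linkage` and the data layout are assumptions. Note that the distances d(A,C) and d(D,E) are both equal to 1, so the order of the first two fusions depends on how ties are broken: this sketch happens to merge {A,C} first, while the slides merge {D,E} first. The resulting dendrogram is the same.

```python
from itertools import combinations


def single_linkage(labels, dist):
    """Naive agglomerative clustering with d(S,T) = min d(x,y), x in S, y in T.

    labels: object names; dist: dict frozenset({x, y}) -> distance.
    Returns the fusions as a list of (cluster_S, cluster_T, height).
    """
    def d(S, T):
        return min(dist[frozenset((x, y))] for x in S for y in T)

    clusters = [frozenset([x]) for x in labels]
    fusions = []
    while len(clusters) > 1:
        # Pick the pair of current clusters at the smallest distance.
        S, T = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        fusions.append((S, T, d(S, T)))
        clusters = [C for C in clusters if C not in (S, T)] + [S | T]
    return fusions


# Distance matrix from the example slide.
LABELS = ["A", "B", "C", "D", "E"]
M = [
    [0, 2, 1, 5, 6],
    [2, 0, 3, 7, 4],
    [1, 3, 0, 4, 5],
    [5, 7, 4, 0, 1],
    [6, 4, 5, 1, 0],
]
DIST = {frozenset((LABELS[i], LABELS[j])): M[i][j]
        for i, j in combinations(range(len(LABELS)), 2)}

for S, T, height in single_linkage(LABELS, DIST):
    print(sorted(S), "+", sorted(T), "at height", height)
```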

slide-17
SLIDE 17

Outlier detection with clustering

slide-18
SLIDE 18

Outlier detection. Definition

Suppose we are given a set of objects M. The task is to detect all anomalous objects in M.

In Russian, the words for “outlier” and “air pollution” are spelled the same :)

slide-19
SLIDE 19

Examples of outliers

  • 1. (Wikipedia) If you measure the temperature at random points of your room, you usually obtain values in the interval (18 °C, 22 °C). However, the temperature of a heater is more than 70 °C, so the points on the heater surface are outliers.

  • 2. I teach at the mathematics faculty of a university. I asked all my students about their exam marks and studied the resulting data with data mining algorithms. Surprisingly, all excellent students (with marks A+) were detected as outliers :)

slide-20
SLIDE 20

Outlier detection based on metrics

Let us represent objects by points in the space $\mathbb{R}^m$. Idea: an outlier has few near neighbors, whereas a regular object has many.
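One simple way to turn this idea into code is to count, for every point, how many other points lie within a fixed radius, and to flag the points with too few such neighbors. The slide does not fix a concrete rule, so the parameters `radius` and `min_neighbors`, the Euclidean metric, and the function name below are all assumptions of this sketch.

```python
import math


def neighbor_count_outliers(points, radius, min_neighbors):
    """Indices of points with fewer than min_neighbors other points within radius."""
    outliers = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if count < min_neighbors:
            outliers.append(i)
    return outliers


# Four points forming a unit square and one far-away point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(neighbor_count_outliers(POINTS, radius=1.5, min_neighbors=2))
```

The result depends strongly on the two thresholds, which is one motivation for the threshold-free, dendrogram-based definition discussed next in the talk.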

slide-21
SLIDE 21

Let us detect outliers with dendrograms

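We use the following definition of an outlier: an object A is an outlier if it is directly connected to the root of the dendrogram.

[Figure: dendrogram in which the leaf A is attached directly to the root, next to a subtree containing the other objects.]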
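Under this definition, an object is an outlier exactly when one of the two clusters joined by the last fusion is a singleton. The sketch below checks this for single-linkage clustering; the function names are hypothetical, and the naive search for the nearest pair is chosen for brevity, not speed.

```python
from itertools import combinations


def top_level_clusters(points, d):
    """Run single-linkage fusions until two clusters remain (the root's children)."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > 2:
        S, T = min(
            combinations(clusters, 2),
            key=lambda pair: min(d(x, y) for x in pair[0] for y in pair[1]),
        )
        clusters = [C for C in clusters if C not in (S, T)] + [S | T]
    return clusters


def root_outliers(points, d):
    """Objects attached directly to the root of the dendrogram."""
    return sorted(next(iter(C)) for C in top_level_clusters(points, d) if len(C) == 1)


def line_distance(x, y):
    # Hypothetical metric: points on the real line.
    return abs(x - y)


print(root_outliers([0, 1, 2, 10], line_distance))   # 10 is far from the rest
print(root_outliers([0, 1, 10, 11], line_distance))  # two balanced clusters
```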
slide-22
SLIDE 22

A theoretical problem

Can one estimate the probability that a dendrogram contains an outlier? What is the limit of this probability as the number of objects n tends to ∞?

slide-23
SLIDE 23

The algorithm of data generation

How can we generate a random distance matrix? This is not straightforward, because the matrix must satisfy the triangle inequality. However, since we use the formula $d(S,T)=\min_{x\in S,\ y\in T} d(x,y)$, the absolute values of the distances do not matter to us: it is sufficient to define a linear order on the entries of the matrix!

slide-24
SLIDE 24

How do we generate matrices?

Let us fix n (the number of objects). We pick an (n × n)-matrix uniformly at random from the following set: 1) the matrices are symmetric with zero diagonals; 2) the upper triangle consists of the elements (without repetitions) of a linearly ordered set of n(n-1)/2 elements.
slide-25
SLIDE 25

Matrix generation

The number of such matrices is N!, where N = n(n-1)/2. Each matrix defines its own dendrogram. Thus, we may ask: what fraction of such matrices have dendrograms that contain an outlier?
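This fraction can be estimated experimentally. The sketch below is a Monte Carlo simulation under the generation model just described: shuffling the list of pairs is the same as choosing a uniformly random linear order on the upper triangle. It relies on the standard fact that single-linkage fusions coincide with the merges of Kruskal's minimum spanning tree algorithm, so a dendrogram has an outlier exactly when the last merge joins a singleton component. The function names, the number of trials, and the seed are assumptions of this example.

```python
import random
from itertools import combinations


def has_root_outlier(n, rng):
    """Sample one random order on the pairs; report whether the last fusion adds a singleton."""
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)  # pairs listed in increasing order of distance
    parent = list(range(n))
    size = [1] * n

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the pair lies inside one cluster, no fusion
        if components == 2:
            # This is the last fusion: is one of the two clusters a single object?
            return size[ru] == 1 or size[rv] == 1
        parent[ru] = rv
        size[rv] += size[ru]
        components -= 1
    return False  # only reached for n < 2


def outlier_fraction(n, trials, seed=0):
    """Estimated fraction of random matrices whose dendrogram contains an outlier."""
    rng = random.Random(seed)
    return sum(has_root_outlier(n, rng) for _ in range(trials)) / trials


for n in (5, 20, 80):
    print(n, outlier_fraction(n, trials=200))
```

For n = 2 and n = 3 the fraction is trivially 1, because the last fusion always involves a single object; the interesting behavior starts at n = 4.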

slide-26
SLIDE 26

Main result

  • Theorem. As n → ∞, almost all datasets (more precisely, their distance matrices) contain outliers.
slide-27
SLIDE 27

Connections with 0-1 laws

Is it unexpected that almost all datasets (with the given generation algorithm) have outliers?

Perhaps it follows from a 0-1 law for distance matrices? What is the 0-1 law in logic?

slide-28
SLIDE 28

Connections with 0-1 laws

  • Definition. The 0-1 law holds for a class K of algebraic structures if, for any first-order sentence Φ, either Φ or ¬Φ holds for almost all structures from K. Similarly, one can define the 0-1 law for second-order and other types of logic.
slide-29
SLIDE 29

Connections with 0-1 laws

  • Question. Let K be the class of all distance matrices. Can you:

1) represent K as a class of algebraic structures of an appropriate language?

2) prove the 0-1 law for this class?

slide-30
SLIDE 30

Second question

Suppose we use the formula $d(S,T)=\max_{x\in S,\ y\in T} d(x,y)$ (complete-linkage clustering). Does the following theorem hold?

  • Theorem. As n → ∞, almost all datasets (more precisely, their distance matrices) contain outliers.
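The question can at least be explored numerically. The sketch below samples a random linear order on the pairs of objects, runs naive complete-linkage clustering, and reports whether the last fusion involves a single object. It does not answer the question and no outcome is claimed here; the function names, the use of ranks 1..N as distances, and the simulation sizes are assumptions of this example.

```python
import random
from itertools import combinations


def complete_linkage_has_outlier(n, rng):
    """One random matrix (n >= 2): does the complete-linkage dendrogram have an outlier?"""
    pairs = list(combinations(range(n), 2))
    ranks = list(range(1, len(pairs) + 1))
    rng.shuffle(ranks)  # random linear order on the upper triangle
    d = {frozenset(p): r for p, r in zip(pairs, ranks)}  # distances between cluster ids
    sizes = {i: 1 for i in range(n)}  # cluster id -> number of objects
    next_id = n
    while len(sizes) > 2:
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        # Complete linkage: the distance to the merged cluster is the maximum.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((next_id, c))] = max(d[frozenset((a, c))], d[frozenset((b, c))])
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    # The two remaining clusters are the children of the root.
    return 1 in sizes.values()


def complete_linkage_outlier_fraction(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(complete_linkage_has_outlier(n, rng) for _ in range(trials)) / trials


for n in (5, 10, 20):
    print(n, complete_linkage_outlier_fraction(n, trials=100))
```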
slide-31
SLIDE 31
  • II. General approach to data mining problems

slide-32
SLIDE 32

Digit recognition

https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53?gi=bdd39f581d2c

slide-33
SLIDE 33

Problem: sparse data

How can such data be recognized? The classical models of machine learning do not work here!

slide-34
SLIDE 34

Topological approach

We deal with clouds of points (in some metric space). Using the metric, we will construct a simplicial complex of the data.

https://www.math.upenn.edu/~ghrist/preprints/barcodes.pdf

slide-35
SLIDE 35

How to build a complex?

Fix a real number ε > 0. Draw a ball of diameter ε around each data point.

https://www.math.upenn.edu/~ghrist/preprints/barcodes.pdf

slide-36
SLIDE 36

How to build a complex?

  • 1. If two balls intersect, we draw an edge.
  • 2. If n balls have nonempty pairwise intersections, we draw a simplex on the n corresponding points (a simplex of dimension n-1).

https://www.math.upenn.edu/~ghrist/preprints/barcodes.pdf
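The construction can be sketched in a few lines of Python. The sketch assumes Euclidean points and closed balls, so that two balls of diameter `eps` intersect exactly when their centers are at distance at most `eps`; the function name `rips_complex` and the cutoff `max_dim` are assumptions of this example. A set of k points spans a simplex (of dimension k - 1) when all of its pairs are close.

```python
import math
from itertools import combinations


def rips_complex(points, eps, max_dim=2):
    """Simplices (as tuples of point indices) up to dimension max_dim."""
    n = len(points)

    def close(i, j):
        # Balls of diameter eps around points i and j intersect.
        return math.dist(points[i], points[j]) <= eps

    simplices = [(i,) for i in range(n)]  # vertices
    for k in range(2, max_dim + 2):       # k points span a (k-1)-simplex
        for subset in combinations(range(n), k):
            if all(close(i, j) for i, j in combinations(subset, 2)):
                simplices.append(subset)
    return simplices


POINTS = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(rips_complex(POINTS, eps=1.2))  # two edges, no triangle
print(rips_complex(POINTS, eps=1.5))  # three edges and the triangle (0, 1, 2)
```

Enumerating all subsets is exponential in general; the sketch is meant for a handful of points only.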

slide-37
SLIDE 37

Important properties of a data complex

The following features largely describe the initial dataset: 1) the number of holes; 2) the size of the holes; 3) the dependence of (1) and (2) on ε.

slide-38
SLIDE 38

Let us vary ε from 0 to …

https://vk.com/@persistenthomology-persistentnye-gomologii-diagrammy-i-shtrih-kody-dannyh

slide-39
SLIDE 39

Features of dataset

1) Varying ε, we obtain a function h(ε) = {the number of holes of the complex defined by ε}. 2) One can introduce the barcode of holes. The barcode describes the lifetime of each hole.
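For small point clouds the hole-counting function can be computed directly. The sketch below interprets "number of holes" as the first Betti number with coefficients in GF(2), computed from the edges and triangles of the complex as (number of edges) - rank(∂1) - rank(∂2). This interpretation, the Euclidean metric, and the names `gf2_rank` and `count_holes` are assumptions of this example, not the author's method.

```python
import math
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # position of leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit
    return len(pivots)


def count_holes(points, eps):
    """First Betti number over GF(2) of the complex built at scale eps."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    index = {e: k for k, e in enumerate(edges)}
    # Boundary of an edge = its two endpoints (bitmask over vertices).
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle = its three edges (bitmask over edges).
    d2 = [(1 << index[(a, b)]) | (1 << index[(a, c)]) | (1 << index[(b, c)])
          for a, b, c in combinations(range(n), 3)
          if (a, b) in index and (a, c) in index and (b, c) in index]
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Corners of a unit square: a hole appears at eps = 1 and is filled once
# the diagonals (length sqrt(2)) enter the complex.
SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]
for eps in (0.5, 1.0, 1.5):
    print(eps, count_holes(SQUARE, eps))
```

In barcode terms, the single hole of the square corresponds to one bar that starts at ε = 1 and ends at ε = √2.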

slide-40
SLIDE 40

Barcode of holes

https://vk.com/@persistenthomology-persistentnye-gomologii-diagrammy-i-shtrih-kody-dannyh

slide-41
SLIDE 41

Barcode as a scatter plot

https://vk.com/@persistenthomology-persistentnye-gomologii-diagrammy-i-shtrih-kody-dannyh

slide-42
SLIDE 42

Classification of images

Each image defines its own barcode and scatter plot. Let us define a distance function on scatter plots.

slide-43
SLIDE 43

Wasserstein distance

between scatter plots X, Y: [formula not preserved in the transcript], where the matching in the formula ranges over bijections between X and Y.
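One common form of the Wasserstein distance between two finite scatter plots of equal size is the p-th root of the minimum, over all bijections between X and Y, of the sum of the p-th powers of the distances between matched points. Whether this agrees with the slide's formula in every detail is an assumption: the exponent `p`, the max-norm used between points, and the restriction to plots of equal size are choices made for this sketch. The brute-force search over all bijections is feasible only for very small plots.

```python
from itertools import permutations


def wasserstein(X, Y, p=1):
    """Brute-force p-Wasserstein distance between equal-size lists of 2D points."""
    if len(X) != len(Y):
        raise ValueError("this sketch only handles scatter plots of equal size")

    def cost(a, b):
        # p-th power of the max-norm distance between two points.
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p

    best = min(
        sum(cost(x, Y[j]) for x, j in zip(X, perm))
        for perm in permutations(range(len(Y)))  # all bijections X -> Y
    )
    return best ** (1 / p)


X = [(0, 1), (2, 5)]
Y = [(2, 4), (0, 2)]
print(wasserstein(X, Y))
```

Practical implementations replace the enumeration of bijections by an assignment solver and usually also allow points to be matched to the diagonal of the scatter plot.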

slide-44
SLIDE 44

Now we can cluster any data!

Any clustering algorithm can be applied once a distance between images is defined. Thus, the Wasserstein distance allows us to classify point clouds.

slide-45
SLIDE 45

Can we apply either algebra or logic to topological data analysis?

I believe we can consider barcodes from an algebraic or model-theoretic point of view.

Formally, any barcode is a structure over the real numbers:

https://www.math.upenn.edu/~ghrist/preprints/barcodes.pdf

slide-46
SLIDE 46

Relations over intervals

Question: can we use the relations of inclusion and precedence between intervals instead of the real-valued intervals of a barcode?

slide-47
SLIDE 47

Barcode as a relational algebraic structure

One can treat a barcode as an algebraic structure of a relational language L, where the relations interpret the inclusion and precedence of barcode intervals.

With this approach, the following classes of algebraic structures arise: 1) interval graphs; 2) posets; 3) etc…
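To make this concrete, the sketch below extracts three relations from a barcode given as a list of (birth, death) intervals: inclusion, precedence, and the intersection relation that defines the interval graph. The function name, the use of closed intervals, and the sample barcode are assumptions of this example.

```python
def barcode_relations(intervals):
    """Relations on interval indices: inclusion, precedence, and interval-graph edges."""
    idx = range(len(intervals))
    # i is included in j: [b_i, d_i] lies inside [b_j, d_j].
    included = {(i, j) for i in idx for j in idx if i != j
                and intervals[j][0] <= intervals[i][0]
                and intervals[i][1] <= intervals[j][1]}
    # i precedes j: interval i ends before interval j starts.
    precedes = {(i, j) for i in idx for j in idx
                if intervals[i][1] < intervals[j][0]}
    # Interval graph: an edge joins two intervals that intersect.
    edges = {(i, j) for i in idx for j in idx if i < j
             and max(intervals[i][0], intervals[j][0])
             <= min(intervals[i][1], intervals[j][1])}
    return included, precedes, edges


BARCODE = [(0, 5), (1, 2), (3, 4), (6, 7)]  # hypothetical barcode
included, precedes, edges = barcode_relations(BARCODE)
print(sorted(included))
print(sorted(precedes))
print(sorted(edges))
```

Inclusion and precedence are both partial orders on the set of intervals, which is where the posets mentioned above come from.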

slide-48
SLIDE 48

Interval graphs

Can you introduce a “good” metric on such graphs?

https://www.pulsus.com/scholarly-articles/a-sequential-and-parallel-algorithm-for-disjoint-cliques-problem-on-interval-graphs-4975.html