AI and Predictive Analytics in Data-Center Environments

Aug 14, 2022

SLIDE 1

AI and Predictive Analytics in Data-Center Environments

Introduction to Machine Learning

Josep Ll. Berral @BSC

Intel Academic Education Mindshare Initiative for AI

SLIDE 2

Introduction

“Let the machine automate the analysis for you”

SLIDE 3

Introduction

Machine Learning:

1. Algorithms and methods…
2. …to automatically learn/model a system…
3. …from some observations

SLIDE 4

Machine Learning

Example of “Supervised Learning”

1. Obtaining Data and Pre-processing: Samples → Collected Data → “a-posteriori” labelling (“Yep! It’s Versicolor”) → Labelled Data

SLIDE 5

Machine Learning

Example of “Supervised Learning”

1. Obtaining Data and Pre-processing: Samples → Collected Data → “a-posteriori” labelling (“Yep! It’s Versicolor”) → Labelled Data
2. Training a Model: Labelled Data → ML method → Model for Labelling

SLIDE 6

Machine Learning

Example of “Supervised Learning”

1. Obtaining Data and Pre-processing: Samples → Collected Data → “a-posteriori” labelling (“Yep! It’s Versicolor”) → Labelled Data
2. Training a Model: Labelled Data → ML method → Model for Labelling
3. Inferring New Data: New sample → Model for Labelling → Prediction (“It’s Versicolor”)

SLIDE 7

Learning Example

Let’s see that with Real Data

SLIDE 8

Learning Example

“Recognizing iris flowers”
Iris setosa? Iris versicolor? Iris virginica?

SLIDE 9

Learning Example

The “Iris Data-Set”
People measured and classified the flowers
1. x1: sepal length in cm
2. x2: sepal width in cm
3. x3: petal length in cm
4. x4: petal width in cm
5. class: [Iris Setosa, Iris Versicolor, Iris Virginica]

*R.A. Fisher (1936). Source: https://archive.ics.uci.edu/ml/datasets/Iris

SLIDE 10

Learning Example

We have labeled samples:

sepal length   sepal width   petal length   petal width   class
5.1            3.5           1.4            0.2           Setosa
7.0            3.2           4.7            1.4           Versicolor
5.8            2.7           5.1            1.9           Virginica
...            ...           ...            ...           ...

SLIDE 11

Learning Example

Given any iris, we want to know to which class it belongs

sepal length   sepal width   petal length   petal width   class
6.1            3.2           5.0            1.3           ???

SLIDE 12

Learning Example

Find a function:

f(sepal length, sepal width, petal length, petal width) → class

That function can be linear/non-linear/tree/set-of-rules/...
The ML algorithm will produce that Model (here a formula)
The Model can predict new samples
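
As a minimal sketch of what such a function f could look like, the Python snippet below applies a 1-nearest-neighbour rule to the three labelled samples shown earlier. The nearest-neighbour technique, the squared Euclidean distance, and the names `LABELLED`, `squared_distance` and `predict` are assumptions made for illustration only; the slides do not prescribe a method, and three samples are far too few for a reliable model.

```python
# Three labelled samples from the slides:
# (sepal length, sepal width, petal length, petal width) -> class
LABELLED = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two measurement tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(sample, labelled=LABELLED):
    """Return the class of the labelled sample closest to `sample`."""
    _, label = min(labelled, key=lambda item: squared_distance(item[0], sample))
    return label


# The unlabelled iris from the slides
print(predict((6.1, 3.2, 5.0, 1.3)))
```

With only these three samples, the new iris lands closest to the Virginica example. A model trained on the full data set, or a different kind of function, could well decide differently.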

SLIDE 13

Learning Example

Another example

SLIDE 14

Another Example

Algorithm to detect spam e-mails
A Bayes-based (Naïve Bayes) approach
Counting the e-mails that contain the word “diamonds”, among e-mails already classified as spam or not spam:

             spam    ¬spam   total
diamonds      130        5     135
¬diamonds     987      300    1287
total        1117      305    1422

SLIDE 15

Another Example

The NB algorithm concludes from emails that:

P(spam) = 1117/1422 = 0.786
P(¬spam) = 305/1422 = 0.214
P(diamonds) = 135/1422 = 0.095
P(¬diamonds) = 1287/1422 = 0.905
P(diamonds & spam) = 130/1422 = 0.091
P(diamonds & ¬spam) = 5/1422 = 0.0035

These stats will define the model
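
These statistics can be recomputed directly from the four counts in the contingency table. The sketch below is one way to do it in Python; the dictionary layout and the helper name `prob` are illustrative assumptions, not part of the original slides.

```python
# Counts from the contingency table:
# (contains "diamonds", is spam) -> number of e-mails
COUNTS = {
    (True, True): 130,
    (True, False): 5,
    (False, True): 987,
    (False, False): 300,
}

total = sum(COUNTS.values())  # all e-mails in the table


def prob(diamonds=None, spam=None):
    """Fraction of e-mails matching the given conditions (None means "any")."""
    matching = sum(
        n
        for (has_word, is_spam), n in COUNTS.items()
        if (diamonds is None or has_word == diamonds)
        and (spam is None or is_spam == spam)
    )
    return matching / total


print(f"P(spam)             = {prob(spam=True):.3f}")
print(f"P(diamonds)         = {prob(diamonds=True):.3f}")
print(f"P(diamonds & spam)  = {prob(diamonds=True, spam=True):.3f}")
print(f"P(diamonds & ¬spam) = {prob(diamonds=True, spam=False):.4f}")
```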

SLIDE 16

Another Example

The model for “diamonds” becomes:

P(spam | diamonds) = P(diamonds & spam) / P(diamonds)
P(¬spam | diamonds) = P(diamonds & ¬spam) / P(diamonds)

Given a new mail with “diamonds”:

P(spam | diamonds) = 130/135 = 0.963
P(¬spam | diamonds) = 5/135 = 0.037

The mail is classified as “spam”
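
The classification step can be sketched in a few lines of Python, starting from the counts in the contingency table. This covers only the single word “diamonds”; a full Naive Bayes classifier would combine many words, which the slides do not show. All variable names are illustrative assumptions.

```python
TOTAL = 1422  # all e-mails in the contingency table

# Probabilities taken from the counts on the slides
p_diamonds = 135 / TOTAL
p_diamonds_and_spam = 130 / TOTAL
p_diamonds_and_not_spam = 5 / TOTAL

# Conditional probability: P(class | diamonds) = P(diamonds & class) / P(diamonds)
p_spam_given_diamonds = p_diamonds_and_spam / p_diamonds
p_not_spam_given_diamonds = p_diamonds_and_not_spam / p_diamonds

# Pick the more probable class for a new mail containing "diamonds"
label = "spam" if p_spam_given_diamonds > p_not_spam_given_diamonds else "not spam"

print(f"P(spam | diamonds)  = {p_spam_given_diamonds:.3f}")
print(f"P(¬spam | diamonds) = {p_not_spam_given_diamonds:.3f}")
print(f"classified as: {label}")
```

Because both conditional probabilities share the denominator P(diamonds), they sum to 1, which is a quick sanity check on any hand calculation.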

SLIDE 17

ML Capabilities

Models are based on statistical properties of data
ML algorithms “automate” such modeling
Uses:

1. Estimate values (regression)
2. Predict classes or categories (classification)
3. Find similarities from data (clustering)
4. Recommendations
5. Display properties of the modeled system

There are lots of different Algorithms, with different properties

SLIDE 18

Interpretability

Interpreting Models
Know the mechanism behind data!

[Figure: a Model learned from Data, shown as the rule A → B]

SLIDE 19

Interpretability

Models are usually interpretable
Read the model
View the statistical properties of data

[Figure: three Models learned from Data, shown as:]
A → B
C = Ax + B
A > 5 → C; A ≤ 5 → B

SLIDE 20

Interpretability

E.g. Regressions: f(x) = 102 + speed * 10.34 + weight * 5.14
E.g. Decision trees
E.g. Naive Bayes: P(diamonds | spam) = 0.116, P(spam) = 0.786
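
To illustrate why a regression model like the one on this slide is easy to read, the sketch below writes it as a plain Python function. The constant names and the input values are hypothetical; the slides give neither the units of speed and weight nor what f predicts.

```python
# Coefficients read directly off the regression formula on this slide
INTERCEPT = 102.0
COEF_SPEED = 10.34
COEF_WEIGHT = 5.14


def f(speed, weight):
    """Linear model: each coefficient is the effect of one unit of its input."""
    return INTERCEPT + speed * COEF_SPEED + weight * COEF_WEIGHT


# Effect of one extra unit of speed while weight is held fixed
# (the inputs 2.0 and 3.0 are made-up values for illustration)
delta = f(speed=3.0, weight=2.0) - f(speed=2.0, weight=2.0)
print(round(delta, 2))
```

The difference equals the speed coefficient regardless of the weight chosen, which is the sense in which the model can simply be read.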

SLIDE 21

Summary

Time to decide which algorithm to use
Choose those that are expected to fit your problem best
Look for the statistical analysis you would perform
And which question you want to answer
According to
… the data you have
… the problem you are solving
… what you know about the problem
… how you are going to use the model
Some people…
… have a favorite set of methods
… just try a bunch of them, and see which one works best
… select the method according to their constraints