AI and Predictive Analytics in Data-Center Environments - - PowerPoint PPT Presentation

SLIDE 1

AI and Predictive Analytics in Data-Center Environments

Introduction to Machine Learning

Josep Ll. Berral @BSC

Intel Academic Education Mindshare Initiative for AI

SLIDE 2

Introduction

“Let the machine automate the analysis for you”

SLIDE 3

Introduction

  • Machine Learning:

    1. Algorithms and methods…
    2. …to automatically learn/model a system…
    3. …from some observations

SLIDE 4

Machine Learning

[Diagram: Example of “Supervised Learning”. Obtaining data and pre-processing: collected data (samples) are labelled “a-posteriori” (“Yep! It’s Versicolor”), producing labelled data.]

SLIDE 5

Machine Learning

[Diagram: Example of “Supervised Learning”. Obtaining data and pre-processing: collected data (samples) are labelled “a-posteriori” (“Yep! It’s Versicolor”), producing labelled data. Training a model: the labelled data are fed to an ML method, which produces a model for labelling.]

SLIDE 6

Machine Learning

[Diagram: Example of “Supervised Learning”. Obtaining data and pre-processing: collected data (samples) are labelled “a-posteriori” (“Yep! It’s Versicolor”), producing labelled data. Training a model: the labelled data are fed to an ML method, which produces a model for labelling. Inferring new data: a new sample is given to the model for labelling, which outputs a prediction (“It’s Versicolor”).]
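The three stages in the diagram (labelled data, training, inference) can be sketched in a few lines of Python. This is a minimal sketch, assuming nearest-centroid classification as the “ML method” (the slides do not name one). The training rows are the three labelled irises from the Iris table in these slides, and the new sample (5.0, 3.4, 1.5, 0.2) is a hypothetical value chosen for illustration.

```python
from math import dist
from statistics import mean

# Labelled data: the three example rows from the Iris slides
# (sepal length, sepal width, petal length, petal width), in cm.
LABELLED = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]

def train(samples):
    """Training: turn labelled data into a model.
    Assumed method: one centroid (mean feature vector) per class."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_class.items()
    }

def predict(model, sample):
    """Inference: label a new sample with the class of the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], sample))

model = train(LABELLED)
print(predict(model, (5.0, 3.4, 1.5, 0.2)))  # hypothetical new sample → Setosa
```

With a single row per class each centroid is simply that row, so this behaves like a 1-nearest-neighbour classifier; on the full data set each centroid would average the 50 flowers of its class.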

SLIDE 7

Learning Example

Let’s see this with real data

SLIDE 8

Learning Example

  • “Recognizing iris flowers”
  • Iris setosa? Iris versicolor? Iris virginica?
SLIDE 9

Learning Example

  • The “Iris Data-Set”
  • People measured and classified the flowers
  • 1. x1: sepal length in cm
  • 2. x2: sepal width in cm
  • 3. x3: petal length in cm
  • 4. x4: petal width in cm
  • 5. class: [Iris Setosa, Iris Versicolor, Iris Virginica]

*R.A. Fisher (1936). Source: https://archive.ics.uci.edu/ml/datasets/Iris

SLIDE 10

Learning Example

  • We have labeled samples:

sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class
5.1               | 3.5              | 1.4               | 0.2              | Setosa
7.0               | 3.2              | 4.7               | 1.4              | Versicolor
5.8               | 2.7              | 5.1               | 1.9              | Virginica
...               | ...              | ...               | ...              | ...

SLIDE 11

Learning Example

  • Given any iris, we want to know to which class it belongs

sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class
6.1               | 3.2              | 5.0               | 1.3              | ???

SLIDE 12

Learning Example

  • Find a function:

f(sepal length, sepal width, petal length, petal width) → class

  • That function can be linear/non-linear/tree/set-of-rules/...

The ML algorithm will produce that model (here, a formula). The model can then predict new samples.
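As an illustration of a “set-of-rules” (or small tree) model, f can be written as a few nested thresholds. This is a hand-written sketch, not the output of a training run: the thresholds 2.45 cm (petal length) and 1.75 cm (petal width) are assumptions, of the kind a decision tree often picks on the Iris data. A learning algorithm would choose them from the labelled samples.

```python
def f(sepal_length, sepal_width, petal_length, petal_width):
    """A model expressed as a set of rules (equivalently, a small decision tree).
    The thresholds are assumed for illustration, not learned here."""
    if petal_length < 2.45:
        return "Setosa"
    if petal_width < 1.75:
        return "Versicolor"
    return "Virginica"

# The unlabelled iris from the slide (6.1, 3.2, 5.0, 1.3).
print(f(6.1, 3.2, 5.0, 1.3))  # → Versicolor
```

Under these assumed thresholds the three labelled rows from the table are classified correctly; note that these particular rules do not use the sepal measurements at all.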

SLIDE 13

Learning Example

Another example

SLIDE 14

Another Example

  • Algorithm to detect spam e-mails
  • A Bayes-based (Naïve Bayes) approach
  • Counting the word “diamonds” in e-mails already labelled as spam or not spam (¬spam):

           | spam | ¬spam | total
diamonds   |  130 |     5 |   135
¬diamonds  |  987 |   300 |  1287
total      | 1117 |   305 |  1422

SLIDE 15

Another Example

  • From these e-mails, the NB algorithm estimates:

P(spam) = 1117/1422 = 0.786
P(¬spam) = 305/1422 = 0.214
P(diamonds) = 135/1422 = 0.095
P(¬diamonds) = 1287/1422 = 0.905
P(diamonds & spam) = 130/1422 = 0.091
P(diamonds & ¬spam) = 5/1422 = 0.0035

  • These statistics define the model

           | spam | ¬spam | total
diamonds   |  130 |     5 |   135
¬diamonds  |  987 |   300 |  1287
total      | 1117 |   305 |  1422

SLIDE 16

Another Example

  • The model for “diamonds” becomes:

P(spam) = 1117/1422 = 0.786
P(¬spam) = 305/1422 = 0.214
P(diamonds) = 135/1422 = 0.095
P(¬diamonds) = 1287/1422 = 0.905
P(diamonds & spam) = 130/1422 = 0.091
P(diamonds & ¬spam) = 5/1422 = 0.0035

P(spam | diamonds) = P(diamonds & spam) / P(diamonds) = 130/135 ≈ 0.963
P(¬spam | diamonds) = P(diamonds & ¬spam) / P(diamonds) = 5/135 ≈ 0.037

  • Given a new mail containing “diamonds”, P(spam | diamonds) > P(¬spam | diamonds)
  • The mail is classified as “spam”
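The counting and division on this slide can be reproduced directly from the table. This is a minimal sketch, assuming a single binary feature (whether the mail contains “diamonds”); the names COUNTS, joint, posterior and classify are hypothetical. With one feature, Naive Bayes reduces to Bayes' rule; with several words it would multiply per-word likelihoods under the “naive” independence assumption.

```python
# E-mail counts from the slide's table, keyed by (contains "diamonds"?, label).
COUNTS = {
    (True, "spam"): 130,
    (True, "not spam"): 5,
    (False, "spam"): 987,
    (False, "not spam"): 300,
}
LABELS = ("spam", "not spam")
TOTAL = sum(COUNTS.values())  # 1422 e-mails

def joint(has_word, label):
    """P(word presence & label), estimated from the counts."""
    return COUNTS[(has_word, label)] / TOTAL

def marginal_word(has_word):
    """P(word presence), obtained by summing the joint over both labels."""
    return sum(joint(has_word, label) for label in LABELS)

def posterior(label, has_word):
    """P(label | word presence) = P(word presence & label) / P(word presence)."""
    return joint(has_word, label) / marginal_word(has_word)

def classify(has_word):
    """Pick the label with the larger posterior probability."""
    return max(LABELS, key=lambda label: posterior(label, has_word))

print(round(posterior("spam", True), 4))      # 0.963
print(round(posterior("not spam", True), 4))  # 0.037
print(classify(True))                         # spam
```

Computing from the raw counts rather than from already-rounded probabilities avoids accumulating rounding error, and the two posteriors sum to 1.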

SLIDE 17

ML Capabilities

  • Models are based on statistical properties of data
  • ML algorithms “automate” such modeling
  • Uses:

    1. Estimate values (regression)
    2. Predict classes or categories (classification)
    3. Find similarities from data (clustering)
    4. Recommendations
    5. Display properties of the modeled system

  • There are many different algorithms, each with different properties
SLIDE 18

Interpretability

  • Interpreting Models
  • Know the mechanism behind data!

[Diagram: a model (“A → B”) describing the data]

SLIDE 19

Interpretability

  • Many models are directly interpretable
  • Read the model
  • View the statistical properties of data

[Diagram: three models describing data: a rule (“A → B”), a formula (“C = Ax + B”), and a set of rules/tree (“A > 5 → C”, “A ≤ 5 → B”)]

SLIDE 20

Interpretability

  • E.g. Regressions
  • E.g. Decision trees
  • E.g. Naive Bayes

f(x) = 102 + speed * 10.34 + weight * 5.14

P(diamonds | spam) = 130/1117 ≈ 0.116
P(spam) = 0.786
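Reading a regression model amounts to reading its coefficients. The sketch below assumes ordinary least squares (the slide does not say how its model was fitted) and uses hypothetical, noise-free data generated from the slide's formula, so the fit should recover the same coefficients. The speed and weight values (and their units) are made up for illustration, and the normal equations are solved in pure Python to avoid external dependencies.

```python
# Hypothetical noise-free data generated from the slide's model:
# f(x) = 102 + speed * 10.34 + weight * 5.14
def true_f(speed, weight):
    return 102 + speed * 10.34 + weight * 5.14

ROWS = [(s, w) for s in (10, 20, 30, 40) for w in (1, 2, 3)]
X = [[1.0, s, w] for s, w in ROWS]  # leading 1.0 is the intercept column
y = [true_f(s, w) for s, w in ROWS]

def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(X, y):
    """Ordinary least squares via the normal equations (XᵀX)β = Xᵀy."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

intercept, b_speed, b_weight = least_squares(X, y)
print(f"f(x) = {intercept:.2f} + speed * {b_speed:.2f} + weight * {b_weight:.2f}")
# → f(x) = 102.00 + speed * 10.34 + weight * 5.14
```

Each coefficient reads as the change in f(x) per unit increase of its variable with the other held fixed (e.g. +10.34 per unit of speed). With real, noisy data the recovered values would only approximate the underlying ones.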

SLIDE 21

Summary

  • Time to decide which algorithm to use
  • Choose those expected to fit your problem best
  • Consider the statistical analysis you would perform
  • … and which question you want to answer
  • According to
      • … the data you have
      • … the problem you are solving
      • … what you know about the problem
      • … how you are going to use the model
  • Some people…
      • … have a favorite set of methods
      • … just try a bunch of them and see which one works best
      • … select the method according to their constraints