AI and Predictive Analytics in Data-Center Environments: Supervised Learning Methods

SLIDE 1

AI and Predictive Analytics in Data-Center Environments

Supervised Learning Methods

Josep Ll. Berral @BSC

Intel Academic Education Mindshare Initiative for AI

SLIDE 2

Introduction

“If we have data and it is labeled, we can learn their relation and predict future labels”

SLIDE 3

Supervised Learning

  • Supervised Learning
      • Training data is already labeled
      • We want to predict labels for new, unlabeled data

SLIDE 4

Supervised Learning

Training: Set of Examples (<features> + <label>) -> Observe, Label -> Model

Later...: New Set of Examples (<features>) -> Observe -> Model -> Automatic Labeling (<labels>)
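The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the slides: the nearest-centroid rule, the function names fit and predict, and the toy (weight, height) data are all assumptions made for the example.

```python
# Sketch of "observe labeled examples, then label new ones".
# The nearest-centroid rule and the toy data are illustrative assumptions.
from collections import defaultdict
from math import dist


def fit(examples):
    """Learn one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Give new features the label of the closest centroid."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training set: (weight in kg, height in cm) + label
training = [((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"),
            ((20.0, 55.0), "dog"), ((30.0, 60.0), "dog")]
model = fit(training)

# Later: a new set of examples with features only
new_labels = [predict(model, x) for x in [(4.5, 24.0), (25.0, 58.0)]]
```

Here fit plays the role of the "Observe, Label" step and predict plays the role of "Automatic Labeling".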

SLIDE 5

Supervised Learning

  • Labeling data:
      • By hand
      • By known methods
      • By posterior metrics
      • From known data

[Figure: input examples are observed and labeled (“cat”, “dog”, ...) to build the dataset]

SLIDE 9

Typical Flow

  • What is this? Model: “dog”. No! It is a “cat” [Adapt]
  • What is this? Model: “dog”. Right! [Reinforce]
  • What is this? Model: “cat”. Right! [Reinforce]
  • Iterate until the model is accurate
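One classic learner that follows this "guess, get corrected, adapt, iterate" loop is the perceptron. The sketch below uses it purely as an illustration; the slides do not name a specific algorithm, and the numeric encoding (cat = -1, dog = +1) and the toy data are assumptions.

```python
# Error-driven training loop, sketched with a perceptron (an assumed example).
def classify(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def train_perceptron(examples, max_epochs=100):
    """examples: list of (features, label) with label in {-1, +1}."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if classify(weights, bias, x) != y:
                # "No!": adapt the model toward the correct label.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
            # "Right!": the current model is kept as it is.
        if mistakes == 0:
            break  # accurate on every training example, stop iterating
    return weights, bias


# Hypothetical data: -1 stands for "cat", +1 for "dog"
examples = [((1, 1), -1), ((2, 1), -1), ((4, 5), 1), ((5, 4), 1)]
weights, bias = train_perceptron(examples)
```

The loop only ends early when a full pass produces no corrections, which mirrors "iterate until the model is accurate" for data that a straight line can separate.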

SLIDE 12

Good Procedures

  • Keep some data “unseen”, to avoid overfitting / memorizing

  • Training Set -> Algorithm -> Model
  • Validation Set -> Model -> Predict -> Evaluate: is it good?
  • No: Tune the Algorithm and repeat
  • Yes: Test Set -> Model -> Predict -> Final Evaluation
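The procedure above can be sketched as follows. Everything specific here is an assumption for illustration: the 60/20/20 split, the fixed seed, the 1-D toy data, and the tiny nearest-neighbor vote used as a stand-in algorithm whose parameter k is the thing being tuned.

```python
# Train / validation / test workflow with a stand-in algorithm (assumed).
import random


def split(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle and cut the data into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    a = int(len(rows) * train)
    b = a + int(len(rows) * validation)
    return rows[:a], rows[a:b], rows[b:]


def predict(train_rows, x, k):
    """Majority label among the k training values closest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train_rows, eval_rows, k):
    hits = sum(predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


# Hypothetical data: small values are "cat", large values are "dog"
data = [(float(v), "cat" if v < 10 else "dog") for v in range(20)]
train_set, validation_set, test_set = split(data)

# Tune on the validation set only; the test set stays unseen until the end.
best_k = max((1, 3, 5), key=lambda k: accuracy(train_set, validation_set, k))
final_score = accuracy(train_set, test_set, best_k)
```

The test set is touched exactly once, after tuning is finished, so the final score is not influenced by the choices made during tuning.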

SLIDE 13

Supervised Learning

Algorithms & Methods!

SLIDE 14

Algorithms & Methods

  • Classification
      • The outputs are “classes”
      • E.g.: “cat”, “dog”
  • Regression
      • The outputs are “quantities”
      • E.g.: Max Speed predicted from Car Properties

SLIDE 15

Some Methods

  • Regression algorithms
      • Linear & Polynomial Regression, Gaussian Processes, ...

“Attempt to find a function/set-of-functions that fits the example points”

  • The learning process minimizes the regression error
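As a minimal sketch of "minimizing the regression error", the code below fits a straight line by ordinary least squares using the closed-form solution. The choice of a single input feature, the function names, and the engine-power and max-speed numbers are assumptions made for the example.

```python
# Simple linear regression by least squares (closed form, one feature).
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """The regression error that the fit minimizes."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: engine power (hp) and max speed (km/h)
power = [50, 100, 150, 200]
max_speed = [120, 160, 200, 240]
slope, intercept = fit_line(power, max_speed)
error = mean_squared_error(power, max_speed, slope, intercept)
```

Because the toy points lie exactly on a line, the fitted error is zero up to rounding; real data would leave a positive residual error.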
SLIDE 16

Some Methods

  • Trees and Forests
      • Decision Trees, Regression Trees, Random Forests

“Attempt to find a set of recursive partitions that minimizes the classification or regression error”

[Figure: a tree that recursively partitions examples of classes “A”, “B” and “C”]
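A minimal sketch of recursive partitioning, assuming a single numeric feature and the misclassification count as the error to minimize. Practical tree learners usually use other split criteria and many features; the names, the depth limit and the toy data here are illustrative.

```python
# Tiny classification tree on one numeric feature (illustrative sketch).
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows):
    """Return (threshold, errors) of the split with the fewest errors, or None."""
    best = None
    values = sorted({value for value, _ in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in rows if value <= threshold]
        right = [label for value, label in rows if value > threshold]
        errors = (sum(l != majority(left) for l in left)
                  + sum(l != majority(right) for l in right))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Return a label (leaf) or a (threshold, left, right) tuple (node)."""
    labels = [label for _, label in rows]
    split = best_split(rows)
    if len(set(labels)) == 1 or depth == max_depth or split is None:
        return majority(labels)
    threshold = split[0]
    left = [row for row in rows if row[0] <= threshold]
    right = [row for row in rows if row[0] > threshold]
    return (threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, value):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if value <= threshold else right
    return tree


# Hypothetical data with three classes
rows = [(1, "A"), (2, "A"), (4, "B"), (5, "B"), (8, "C"), (9, "C")]
tree = build_tree(rows)
predictions = [predict(tree, v) for v in (1.5, 4.2, 10)]
```

Each call to build_tree splits its share of the data once and then recurses on both halves, which is the "recursive partition" idea from the quote.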

SLIDE 17

Some Methods

  • k-Nearest Neighbors

“Compare new samples with some memorized ones, and classify/predict them like their ‘k’ nearest ones”

[Figure: a new sample “?” to be labeled from its nearest neighbors]
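The idea can be sketched directly: keep the labeled samples in memory, sort them by distance to the new sample, and take a majority vote among the k closest. The Euclidean distance, k = 3 and the toy points are assumptions for the example.

```python
# k-nearest-neighbors classification by majority vote (illustrative sketch).
from collections import Counter
from math import dist


def knn_classify(memory, sample, k=3):
    """memory: list of (features, label); returns the majority label of the k nearest."""
    nearest = sorted(memory, key=lambda item: dist(item[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical memorized samples
memory = [((1, 1), "cat"), ((1, 2), "cat"), ((2, 1), "cat"),
          ((6, 6), "dog"), ((7, 6), "dog"), ((6, 7), "dog")]
labels = [knn_classify(memory, (2, 2)), knn_classify(memory, (6, 5))]
```

There is no separate training step: all the work happens at prediction time, which is why the cost grows with the number of memorized samples.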

SLIDE 18

Some Methods

  • Bayesian Methods
      • Naïve Bayes, Bayesian Networks, ...

“Compute probabilities of classes, events and relations, then apply Bayes' theorem”

P(A|B) = P(A) · P(B|A) / P(B)

P(Class|Example) = P(Class) · P(Example|Class) / P(Example) = P(Class & Example) / P(Example)
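As a worked example of the formula, the sketch below estimates each probability by counting in a small labeled dataset and then applies Bayes' theorem. The single categorical feature ("small" or "large"), the counts and the function name are assumptions; exact fractions are used so the arithmetic is easy to check by hand.

```python
# P(Class|Example) = P(Class) * P(Example|Class) / P(Example), from counts.
from collections import Counter
from fractions import Fraction


def posterior(dataset, example):
    """dataset: list of (example, label). Assumes `example` occurs at least once."""
    total = len(dataset)
    class_counts = Counter(label for _, label in dataset)
    p_example = Fraction(sum(1 for x, _ in dataset if x == example), total)
    result = {}
    for label, count in class_counts.items():
        p_class = Fraction(count, total)
        matches = sum(1 for x, l in dataset if x == example and l == label)
        p_example_given_class = Fraction(matches, count)
        result[label] = p_class * p_example_given_class / p_example
    return result


# Hypothetical dataset: 4 small cats, 1 large cat, 1 small dog, 4 large dogs
dataset = ([("small", "cat")] * 4 + [("large", "cat")]
           + [("small", "dog")] + [("large", "dog")] * 4)
probabilities = posterior(dataset, "small")
```

For "small": P(cat) = 1/2, P(small|cat) = 4/5 and P(small) = 1/2, so P(cat|small) = (1/2 · 4/5) / (1/2) = 4/5, and P(dog|small) = 1/5.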

SLIDE 19

Some Methods

  • Support Vector Machines

“Find the function that best divides classes, with a minimal tolerance for errors”
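A rough sketch of the idea for the linear case, assuming the common soft-margin formulation: minimize 0.5·|w|² plus c times the hinge loss, here with plain stochastic subgradient descent. The parameter c sets the tolerance for errors. The training method, the hyperparameter values and the toy data are assumptions; production SVM solvers use more refined optimizers and often kernels.

```python
# Linear soft-margin SVM trained by stochastic subgradient descent (sketch).
def train_linear_svm(examples, c=1.0, learning_rate=0.01, epochs=2000):
    """examples: list of (features, y) with y in {-1, +1}."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the error term is active.
                w = [wi - learning_rate * (wi - c * y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * c * y
            else:
                # Safely classified: only shrink w, which widens the margin.
                w = [wi * (1 - learning_rate) for wi in w]
    return w, b


def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical data: -1 stands for "cat", +1 for "dog"
examples = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
            ((4, 4), 1), ((5, 4), 1), ((4, 5), 1)]
w, b = train_linear_svm(examples)
predictions = [classify(w, b, x) for x, _ in examples]
```

A larger c penalizes margin violations more heavily, while a smaller c tolerates some errors in exchange for a wider margin.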

SLIDE 20

Summary

  • Supervised Learning:
      • Models learn from labeled data and from “human-driven” tuning
      • Methods for Regression & Classification
  • Lots of Algorithms to be applied
      • Each with its characteristics
      • Strong and weak points
      • Different consumption of resources