AI and Predictive Analytics in Data-Center Environments Data - - PowerPoint PPT Presentation


slide-1
SLIDE 1

AI and Predictive Analytics in Data-Center Environments

Data Science and Engineering

Josep Ll. Berral @BSC

Intel Academic Education Mindshare Initiative for AI

slide-2
SLIDE 2

Introduction

“Before doing experiments, we have to know which question we want to solve”

slide-3
SLIDE 3

Data Science and Engineering

What kind of question are we asking of (or about) our data?

slide-4
SLIDE 4

Descriptive Questions

What does the data look like?

  • Which features?
  • Which ranges?
  • Descriptive (categorical), ordinal or quantitative variables?
  • Do features follow a distribution?
  • ... ?
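
A descriptive pass over a single numeric feature can be sketched as follows. This is a minimal illustration using only the Python standard library; the sample values and the helper name `describe` are hypothetical and not taken from the slides.

```python
import statistics

# Hypothetical sample of one quantitative feature (illustrative values only).
sepal_length = [5.1, 7.0, 5.8, 4.9, 6.3]

def describe(values):
    """Return basic descriptive statistics for one numeric feature."""
    return {
        "count": len(values),
        "min": min(values),               # lower end of the range
        "max": max(values),               # upper end of the range
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(sepal_length)
print(summary)
```

The range, mean and spread answer the "which ranges?" question; checking whether the feature follows a distribution would need a further test that is not shown here.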
slide-5
SLIDE 5

Exploratory Questions

Are there relations?

  • Related or correlated features?
  • Features that result from combining other features?
  • Features derived from others?
  • ... ?
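
One common way to check whether two features are linearly related is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could look; the feature names `a`, `b`, `c` and their values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical features: b is derived from a (b = 2a + 1), c is only weakly related.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2 * v + 1 for v in a]
c = [3.0, 1.0, 4.0, 1.0, 5.0]

r_ab = pearson(a, b)  # close to 1: b is a linear function of a
r_ac = pearson(a, c)  # much weaker relation
print(r_ab, r_ac)
```

A coefficient near 1 or -1 suggests that one feature may be derived from the other; a value near 0 only rules out a linear relation, not every relation.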
slide-6
SLIDE 6

Inductive Questions

Does the data repeat?

  • Repetitive values?
  • Repetitive patterns?
  • Recurring data?
  • ... ?
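
Checking for repetitive values and for an exactly repeating pattern can be sketched as below. The helper names and the sample series are hypothetical; real data is rarely exactly periodic, so this only illustrates the idea.

```python
from collections import Counter

def repeated_values(values):
    """Return the values that occur more than once, with their counts."""
    return {v: n for v, n in Counter(values).items() if n > 1}

def smallest_period(values):
    """Return the smallest lag p with values[i] == values[i + p] for all i, or None."""
    n = len(values)
    for p in range(1, n // 2 + 1):
        if all(values[i] == values[i + p] for i in range(n - p)):
            return p
    return None

# Hypothetical load levels sampled at regular intervals (illustrative values only).
load = [10, 20, 30, 10, 20, 30, 10, 20, 30]

repeats = repeated_values(load)
period = smallest_period(load)
print(repeats, period)
```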
slide-7
SLIDE 7

Predictive Questions

Can we predict the data?

  • Does the data follow a model?
  • Can it be forecast?
  • Can data be classified/regressed?
  • ... ?
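
As a simple instance of "does the data follow a model?", the sketch below fits a straight line by least squares and uses it to forecast the next point. The linear model, the helper name `fit_line` and the sample values are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements: resource usage observed at hours 0..4.
hours = [0, 1, 2, 3, 4]
usage = [10.0, 12.1, 13.9, 16.0, 18.0]

slope, intercept = fit_line(hours, usage)
forecast = slope * 5 + intercept  # predicted usage at hour 5
print(slope, intercept, forecast)
```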
slide-8
SLIDE 8

Causal Questions

What causes that data?

  • Which mechanism is behind the data?
  • ... ?
slide-9
SLIDE 9

Algorithms

  • Statistical analysis can help answer those questions…
  • … but sometimes “algorithms” and “heuristics” are required
slide-10
SLIDE 10

Automatic statistical analysis

  • Complexity of analyzing that data:
  • More than simplistic statistical analysis
  • Decisions to be made when selecting features, formatting data, …

sepal length   sepal width   petal length   petal width   class
5.1            3.5           1.4            0.2           Setosa
7.0            3.2           4.7            1.4           Versicolor
5.8            2.7           5.1            1.9           Virginica
...            ...           ...            ...           ...

[Diagram: Data → Process, annotated with the decisions “Descriptive → Classification?”, “Select this?”, “Normalize?”]
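
The "Normalize?" decision can be made concrete with min-max scaling, one of several possible normalizations. The sketch below applies it to the three sample rows shown on this slide; the choice of min-max scaling and the helper name are assumptions, not something the slides prescribe.

```python
# The three sample rows from the slide: four numeric features plus the class.
rows = [
    (5.1, 3.5, 1.4, 0.2, "Setosa"),
    (7.0, 3.2, 4.7, 1.4, "Versicolor"),
    (5.8, 2.7, 5.1, 1.9, "Virginica"),
]

def min_max_normalize(column):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Select only the numeric features and transpose rows into columns.
columns = list(zip(*[r[:4] for r in rows]))
normalized = [min_max_normalize(col) for col in columns]

for name, col in zip(["sepal length", "sepal width", "petal length", "petal width"], normalized):
    print(name, col)
```

After scaling, every feature spans the same range, so no feature dominates an analysis merely because of its units.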

slide-11
SLIDE 11

Automatic statistical analysis

  • Complexity of analyzing that data:
  • Repetitive or exhaustive analyses to be done

[Diagram: the Process step (P) applied repeatedly across many blocks of data (D)]

slide-12
SLIDE 12

Automatic statistical analysis

  • “Let the machine do the analysis”
  • and select which features to examine
  • and use heuristics to decide which analyses to do
  • and decide/rank when something is relevant
  • and return a reusable data model
  • “Machine Learning” → “Automatic Learning” (or “Modeling”)
slide-13
SLIDE 13

Machine Learning

  • Automatic methods to create models
  • Engineering algorithms to perform Statistical Analyses
  • Designing heuristics to decide which analyses to perform
  • Solving problems using those methods
  • An application of Statistics and Artificial Intelligence
slide-14
SLIDE 14

Methodology

Example

slide-15
SLIDE 15

Examples

  • “Our data seems to show no patterns”
  • No clear or strong correlations
  • Try to find relations in combinations of the data?
  • Try to find relations in transformations of the data?
  • No clear “device” generating that data
  • Try to find groups of similar data?
slide-16
SLIDE 16

Examples

  • “Certain features seem to divide data better than others?”
  • Divide data by those features
  • “Divide and Conquer” strategy on data!
  • Repeat analyses on each group
  • Statistical modeling: check whether the similarities reveal a mechanism
  • Use a heuristic to decide which features to analyze first
  • “Information Gain” techniques to decide where to start and where to go next
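
Information gain is usually computed as the drop in label entropy after dividing the data by a feature. The sketch below assumes Shannon entropy and a simple threshold split; the function names, the threshold values and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature_index, threshold):
    """Entropy reduction obtained by splitting on rows[i][feature_index] <= threshold."""
    left = [lab for row, lab in zip(rows, labels) if row[feature_index] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature_index] > threshold]
    if not left or not right:
        return 0.0  # the split does not divide the data at all
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
rows = [(1.4, 3.5), (1.5, 3.0), (4.7, 3.2), (5.1, 2.7)]
labels = ["A", "A", "B", "B"]

gain_f0 = information_gain(rows, labels, feature_index=0, threshold=3.0)
gain_f1 = information_gain(rows, labels, feature_index=1, threshold=3.1)
print(gain_f0, gain_f1)
```

The feature with the higher gain divides the data better, so it is the one to analyze first.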

slide-17
SLIDE 17

Examples

  • “A subset shows something like patterns”
  • Repeat the analysis on that subset
  • That’s a “Tree-Search” strategy
  • …“Now all subsets can be directly modeled”
  • FIN
  • (That’s actually called the “RecPart-Tree”, or recursive partitioning tree, algorithm)
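
The divide, repeat and stop steps described in these examples can be sketched as a small recursive partitioning routine. This is a minimal sketch that assumes entropy-based information gain as the splitting heuristic and threshold splits on numeric features; it is not the implementation of any particular library, and all function names are hypothetical. The data are the three sample rows from the earlier table.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best split, or None."""
    best = None
    base = entropy(labels)
    for f in range(len(rows[0])):
        # Every distinct value except the largest leaves both sides non-empty.
        for threshold in sorted({row[f] for row in rows})[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, f, threshold)
    return best

def build_tree(rows, labels):
    """Recursively divide the data until each subset can be modeled by one label."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    _, f, threshold = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left]),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right]),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# The three sample rows from the table (numeric features only) and their classes.
rows = [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (5.8, 2.7, 5.1, 1.9)]
labels = ["Setosa", "Versicolor", "Virginica"]

tree = build_tree(rows, labels)
predictions = [predict(tree, r) for r in rows]
print(tree)
print(predictions)
```

Each recursive call is the "repeat the analysis on that subset" step, and the recursion ends when a subset contains a single class, which is the point where "all subsets can be directly modeled".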