AI and Predictive Analytics in Data-Center Environments: Performance & Executing Experiments



SLIDE 1

AI and Predictive Analytics in Data-Center Environments

Performance & Executing Experiments

Josep Ll. Berral @BSC

Intel Academic Education Mindshare Initiative for AI

SLIDE 2

Introduction

“We have to choose where/how to run AI algorithms & experiments”

SLIDE 3

Introduction

  • Algorithms have computing/data requirements
  • Computation Resources
      • CPU, Memory, GPUs, accelerators, storage, ...
  • Time to run
      • Train models, infer new data, …
  • Data
      • What we are modeling and imitating

[Figure: machines to run our algorithms; data to feed our algorithms]

SLIDE 4
  • Algorithms have computing requirements

[Figure: algorithms take up machine resources (CPU, memory, disk) over time]

SLIDE 5
  • Algorithms also have data requirements

[Figure: data reaches the algorithms from disks, from memory, from users, and from the network]

SLIDE 6

Environment

We need a COMPUTING ENVIRONMENT!

SLIDE 7

Environment

  • Local Machines
  • Own computer
  • Workstations at work
  • ...
SLIDE 8

Environment

  • Cluster Machines
  • Data-center at work
  • Data-center at labs
  • Scientific grids
  • ...

[Figure: your machine submits experiments to the data-center / cluster, where they execute]

SLIDE 9

Environment

  • The Cloud
  • Data-Centers from Resource Providers

[Figure: experiments run on data-centers owned by a resource provider]

SLIDE 10

Environment

  • Choosing the environment

Candidate machines:

  • 1 CPU, 16 GB memory
  • 2 CPUs, 128 GB memory
  • 4 CPUs, 256 GB memory
  • 32 CPUs, 2 GB memory
  • 16 CPUs, 4 GB memory
  • 64 CPUs, 128 GB memory

SLIDE 11

Environment

  • Choosing the environment

Candidate machines:

  • 128 CPUs, 1 TB memory
  • 1 CPU, 4 GB memory
  • 2 CPUs, 2 GB memory
  • 4 CPUs, 4 GB memory
  • 16 CPUs, 16 GB memory
  • 16 CPUs, 16 GB memory
  • 64 CPUs, 128 GB memory
  • ∞ CPUs, ∞ GB memory

SLIDE 12

Performance

“Work x Time” “Capacity to Progress”

SLIDE 13

Performance

  • Performance is (usually) linked to Resources

[Figure: resources (CPU, memory, disk) feeding into performance]

SLIDE 14

Performance

  • Performance is (usually) linked to Resources

[Figure: resources (CPU, memory, disk) feeding into performance]

Caveats: up to a point, not always, not necessarily linear, etc.

SLIDE 15

Performance

  • Computing Environment
      • Pool of Resources
  • Algorithms/Apps/Experiments
      • Require resources

[Figure: an experiment's requirements (CPU, memory, disk) matched against a machine with 1 CPU and 32 GB of memory: OK!]
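
The "OK!" check above can be sketched as a comparison between what an experiment requires and what a machine offers. This is only an illustration, not something taken from the slides: the `Resources` type, the `fits` helper, and the requirement of 1 CPU and 8 GB are hypothetical; only the machine with 1 CPU and 32 GB comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A bundle of resources (hypothetical type, for illustration only)."""
    cpus: int
    mem_gb: int


def fits(required: Resources, available: Resources) -> bool:
    """Return True when the machine offers at least what the experiment needs."""
    return required.cpus <= available.cpus and required.mem_gb <= available.mem_gb


machine = Resources(cpus=1, mem_gb=32)     # the machine shown on the slide
experiment = Resources(cpus=1, mem_gb=8)   # hypothetical requirement

print("OK!" if fits(experiment, machine) else "Not enough resources")
```

A real scheduler would also consider disk, accelerators, and what is already running on the machine; the sketch only compares two numbers.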

SLIDE 16

Some Theory

Little’s Law

  • Little’s Law: L = λW
  • “Arrival rate x Dedicated time per input = Average load”
SLIDE 17

Some Theory

Little’s Law

  • Relation between
  • Received load
  • Resources/Time required
  • Average load // Required resources

[Figure: received load mapped onto machine resources (CPU, memory, disk)]

SLIDE 18

Little’s Law: L = λW

  • Example
      • We submit λ = 100 experiments / hour
      • Exps. take an average of W = 0.5 hours [with 1 CPU per exp.]
      • Average number of exps. on our system: L = λW = 100 × 0.5 = 50 exps [avg. 50 CPUs in use]

[Figure: 100 exps / hour, each taking ½ hour on 1 CPU, keep 50 CPUs busy: what we expect (what we need)]
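
The arithmetic of this example can be reproduced in a few lines. The function name `average_load` and the variable names are chosen here for illustration; the numbers are the ones from the slide.

```python
def average_load(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate * time_in_system


lam = 100          # experiments submitted per hour
w = 0.5            # average hours each experiment spends in the system
cpus_per_exp = 1   # each experiment uses one CPU

load = average_load(lam, w)
print(f"Average experiments in the system: {load:g}")
print(f"Average CPUs in use: {load * cpus_per_exp:g}")
```

Note that λ and W must use the same time unit (hours here); mixing "per hour" with minutes gives a meaningless load.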

SLIDE 19

Little’s Law

Demo!

SLIDE 20
  • 100 CPUs

Arrival rate: λ = 100 experiments / hour

  • Exps. take: W = 0.5 hours [1 CPU per exp.]
  • Average exps. in the system: L = 50 exps

[Figure: timeline with marks at 30 and 60; resource limit of 100 CPUs. Successive snapshots: 100 jobs running (100 CPUs busy), then 0 jobs running (0 CPUs busy).]

SLIDE 21
  • 50 CPUs (what Little’s Law indicated)

Arrival rate: λ = 100 experiments / hour

  • Exps. take: W = 0.5 hours [1 CPU per exp.]
  • Average exps. in the system: L = 50 exps

[Figure: timeline with marks at 30 and 60; resource limit of 50 CPUs. Successive snapshots: 50 jobs running (50 CPUs busy) with 50 jobs in the queue, then 50 jobs running (50 CPUs busy) with 0 jobs in the queue.]

SLIDE 22
  • 25 CPUs
  • Less than needed!

Arrival rate: λ = 100 experiments / hour

  • Exps. take: W = 0.5 hours [1 CPU per exp.]
  • Average exps. in the system: L = 50 exps

[Figure: timeline with marks at 30 and 60; resource limit of 25 CPUs. Successive snapshots: 25 jobs running (25 CPUs busy) with 75 jobs in the queue, then 25 jobs running (25 CPUs busy) with 50 jobs in the queue.]

SLIDE 23
  • 25 CPUs
  • Less than needed!

Arrival rate: λ = 100 experiments / hour

  • Exps. take: W = 0.5 hours [1 CPU per exp.]
  • Average exps. in the system: L = 50 exps

[Figure: timeline with marks at 30, 60 and 90; resource limit of 25 CPUs. Successive snapshots: 25 jobs running (25 CPUs busy) with 75 jobs in the queue; 25 jobs running with 50 jobs in the queue; then, after another 100 jobs arrive, 25 jobs running with 125 jobs in the queue.]
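
The three demo scenarios can be approximated with a very small simulation. It is a sketch under assumptions that are one reading of the slides, not the author's demo code: 100 jobs arrive together at the start of every hour, every job takes exactly 30 minutes on 1 CPU, and time advances in 30-minute ticks. The `simulate` function and its parameters are hypothetical names.

```python
def simulate(cpus, ticks, batch=100, arrive_every=2):
    """Return (minute, running, queued) snapshots, one per 30-minute tick.

    Assumes every job occupies 1 CPU for exactly one tick, so all CPUs
    are free again at the start of each tick.
    """
    queue = 0
    history = []
    for tick in range(ticks):
        if tick % arrive_every == 0:   # a new batch arrives every hour
            queue += batch
        running = min(cpus, queue)     # start as many jobs as CPUs allow
        queue -= running
        history.append((tick * 30, running, queue))
    return history


for cpus in (100, 50, 25):
    print(f"{cpus} CPUs:")
    for minute, running, queued in simulate(cpus, ticks=3):
        print(f"  t={minute:>2} min  running={running:>3}  queued={queued:>3}")
```

With 100 or 50 CPUs the queue empties before the next batch arrives; with 25 CPUs it keeps growing, which is the "less than needed" case.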

SLIDE 24

Throughput

  • Throughput: outcome per time unit
  • E.g. experiments finished per hour
  • E.g. data-sets processed per minute
  • E.g. data points trained per second

[Figure: load versus throughput, with the resources limit marked]
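
A rough way to see how the resources limit caps throughput, reusing the numbers from the Little's Law example. This is a steady-state sketch, assuming each experiment uses exactly 1 CPU and ignoring bursts in arrivals; the `throughput` function is a hypothetical helper, not part of the slides.

```python
def throughput(arrival_rate, cpus, hours_per_job):
    """Experiments finished per hour, capped by what the CPUs can process."""
    capacity = cpus / hours_per_job    # jobs the CPUs can finish per hour
    return min(arrival_rate, capacity)


arrival_rate = 100   # experiments per hour
hours_per_job = 0.5  # each experiment takes half an hour on 1 CPU

for cpus in (100, 50, 25):
    done = throughput(arrival_rate, cpus, hours_per_job)
    backlog_growth = arrival_rate - done
    print(f"{cpus:>3} CPUs: throughput {done:g}/hour, "
          f"backlog grows by {backlog_growth:g}/hour")
```

Below the limit, throughput simply follows the load; once the load exceeds the capacity, throughput stays flat and the difference accumulates as waiting work.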

SLIDE 25

Resource Competition

  • Systems do not always have “queues”
  • Processes (applications) compete in the system
      • For using the CPU
      • For getting some memory
      • For accessing the disk and network (I/O)