High Performance Computing with doAzureParallel - PowerPoint PPT Presentation



SLIDE 1

High Performance Computing with

doAzureParallel

Using Azure as your Parallel-Backend for Embarrassingly Parallel work

JS Tan, Microsoft

SLIDE 2

Azure Big Compute

SLIDE 3

  • Commodity, most value for cost
  • Fast processors, higher memory-to-core ratio, SSDs
  • Most memory, Intel Xeon processors
  • HPC/Low Latency VMs for compute intensive workloads
  • GPU enabled VMs for Visualization/Compute
  • Fast processors, lower memory-to-core ratio, SSDs

Azure Infrastructure

SLIDE 4

What is Batch?

[Diagram: an app splits into many individual tasks, which are assigned across many computers/VMs]

SLIDE 5

Scenarios

  • A quant back-testing portfolio strategies
  • A data scientist optimizing a model and tuning parameters
  • A life-science researcher doing genome sequencing
SLIDE 6

What do they have in common?

  • Scale – computationally expensive work; need to scale up in order to get results back quickly
  • Minimal IT Management – the user is the domain specialist, not an IT specialist
  • Elastic compute – temporary need for a lot of capacity
  • Cost effective – low cost strategies are important!

+ They are all probably using R…

SLIDE 7

doAzureParallel is...

An R package that uses Azure as a parallel-backend for popular open source tools – foreach, caret, dplyr, etc.

SLIDE 8

Foreach using doAzureParallel

foreach(i = 1:100) %dopar% {
  myParallelAlgorithm(...)
}

Microsoft Azure
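Before the loop above can run on Azure, a backend has to be registered. A minimal sketch following the doAzureParallel usage pattern – the credential and cluster file names are placeholders, and `myParallelAlgorithm` is the slide's stand-in function:

```r
library(doAzureParallel)

# Authenticate against Azure using a local credentials file (placeholder name)
setCredentials("credentials.json")

# Provision a Batch cluster from a JSON config, then register it
# as the parallel backend for %dopar%
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Each iteration now runs as a task on the Azure Batch cluster,
# and the combined results come back to the local R session
results <- foreach(i = 1:100) %dopar% {
  myParallelAlgorithm(i)
}

# Tear the cluster down when finished to stop incurring VM charges
stopCluster(cluster)
```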

SLIDE 9

doAzureParallel on Azure Batch

Azure Batch is a platform service that provides easy job scheduling and cluster management, allowing applications or algorithms to run in parallel at scale.

  • Capacity on demand; jobs on demand
  • Autoscale (more on this later)
  • Minimal cluster management (node failure, install, etc)
  • Hardware choice – use any VM size
  • Pay by the minute
  • Cost effective – no charge for the service itself; you only pay for the VMs
  • More cost effective – low priority VMs (more on this later)

If you want to run jobs using elastic compute, Batch is a great fit!

SLIDE 10

Scale

  • From 1 to 10,000 VMs for a cluster
  • From 1 to millions of tasks
  • Your selection of hardware:
    • General compute VMs (A-Series / D-Series)
    • Memory / storage optimized (G-Series)
    • Compute Optimized (F-Series)
    • GPU enabled (N-Series)
  • Results from computing the Mandelbrot set when scaling up:

[Chart: runtimes on the local machine vs. 5, 10, and 20 parallel workers]

SLIDE 11

Minimal Cluster Management

  • Abstract away complex Azure/cloud concepts
  • Zero IT-level management
  • Work entirely in RStudio
  • Monitor / debug your jobs directly in RStudio
  • Manage your cluster and multiple jobs directly in RStudio
  • The results of your distributed, large scale work can be returned directly to your R session

SLIDE 12

Minimal code change

  • Minimal code change to use doAzureParallel
  • Easy to use and you can get started in just a few lines of code
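The "minimal code change" point can be illustrated by comparing a local parallel loop with its Azure version – the loop body is untouched and only the backend registration changes. A sketch, where `simulate` is a hypothetical user function and the JSON file names are placeholders:

```r
# Local: parallelize across the cores of one machine (doParallel backend)
library(doParallel)
registerDoParallel(cores = 4)
results <- foreach(i = 1:100) %dopar% simulate(i)

# Azure: same foreach loop; only the registered backend differs
library(doAzureParallel)
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)
results <- foreach(i = 1:100) %dopar% simulate(i)
```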
SLIDE 13

Elastic Compute

  • Compute on-demand
  • Create/delete your cluster as you need
  • Autoscaling pool = maximizing cloud elasticity
  • Long running batch jobs / overnight
  • Daily scheduled work – pre-provision cluster so it's ready for you at the beginning of the day
  • Bursty work
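Cluster shape – VM size, the dedicated/low-priority split, and autoscaling – is described in the JSON config file passed to makeCluster(). A sketch with placeholder values; exact field names may differ across doAzureParallel versions:

```json
{
  "name": "my-elastic-pool",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes":   { "min": 0, "max": 2 },
    "lowPriorityNodes": { "min": 2, "max": 20 },
    "autoscaleFormula": "QUEUE"
  }
}
```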
SLIDE 14

Cost Effective

  • Low-Priority = (extremely) Low Costs
  • Provisioning VMs from Azure’s surplus capacity at up to 80% discount
  • Your Azure cluster can contain both regular (dedicated) VMs and low-priority VMs

[Diagram: a local R session submits work to an Azure Batch pool containing both dedicated VMs and low-priority VMs at up to 80% discount]

SLIDE 15

Cost Effective: More about Low Priority

When should I use it?

  • Long running work that can be broken into smaller pieces and work that doesn't have a strict time limit to complete
  • Experimentation, testing, evaluating models

What you need to know when using it:

  • Possibility that Azure will not allocate your VMs OR that it will take some or all of the capacity back
  • If a node is pre-empted:
    • Azure Batch will replace your node for you
    • Azure Batch will reschedule your work so that your job can successfully complete
SLIDE 16

Low Priority Scenarios

Mixing dedicated and low-priority VMs:

  • Lowest cost
  • Lower cost + guaranteed baseline capacity
  • Lower cost + maintaining capacity w/ autoscale

[Diagram: capacity-over-time charts of an Azure Batch pool for each scenario, showing preempted low-priority capacity]

SLIDE 17

Questions? www.github.com/azure/doazureparallel

https://aka.ms/earl2017

SLIDE 18

What’s new with doAzureParallel?

  • Low priority support
  • Richer Job Management experience
  • Resource Files to preload data
  • Parameter Tuning integration with Caret
  • Simple connector to Azure Blob Storage
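The caret integration works through caret's existing foreach support: once a doAzureParallel backend is registered, caret's resampling loop fans out to the Batch cluster. A sketch, assuming a cluster has already been provisioned as shown earlier (`allowParallel` is caret's standard trainControl switch for using the registered backend):

```r
library(caret)
library(doAzureParallel)

setCredentials("credentials.json")     # placeholder file names
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Each cross-validation fit is dispatched through the registered
# foreach backend, i.e. as a task on the Azure Batch cluster
model <- train(Species ~ ., data = iris,
               method = "rf",
               trControl = trainControl(method = "cv", number = 10,
                                        allowParallel = TRUE))

stopCluster(cluster)
```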
SLIDE 19

R + Azure Batch

So what R workloads work great on Azure Batch?

  • Simulation based work (VaR calculation, back-testing, Monte Carlo simulations, financial modelling)

  • Parameter Tuning / Model Evaluation (grid search, random search, cross validation, etc)
  • Computing against data / ETL jobs / Data-prep jobs

What industries / verticals might be interested in using this?

  • Financial Services
  • Education & Research
  • Sports analytics
SLIDE 20

doAzureParallel (since initial release)

  • Initial release in March
  • Grass roots strategy
  • End-user focused
  • Financial Services targeted / key messaging has been around simulation based work
  • Interest from the field
  • Feedback
SLIDE 21

[Diagram repeated from slide 14: a local R session, Azure Batch, dedicated and low-priority VMs at up to 80% discount]