SLIDE 1

INSTITUTE OF COMPUTING TECHNOLOGY

How to Use HPC AI500

Zihan Jiang, Xingwang Xiong, Tianshu Hao, and Jianfeng Zhan

http://www.benchcouncil.org/HPCAI500/index.html ICT, Chinese Academy of Sciences

ASPLOS 2018, Williamsburg, VA, USA

SLIDE 2

HPC AI500 Bench 19

General Steps to Use HPC AI500

• Current release
  • Version 1.0 at http://www.benchcouncil.org/HPCAI500/index.html
• Reference implementation on BenchHub:
  • http://125.39.136.212:8090/hpc-ai500/
• General steps to run the benchmarks
  • Download the reference implementation from BenchHub.
  • Prepare the dataset and environment according to README.md.
  • Run the scripts (training, evaluation, inference).

SLIDE 3

Download from BenchHub

• http://125.39.136.212:8090/hpc-ai500/
• Component benchmarks:
  • http://125.39.136.212:8090/hpc-ai500/EWA (Extreme Weather Analysis)
• Micro benchmarks:
  • CUDA version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/CUDA_version
  • MKL version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/MKL_version

SLIDE 4

• Component Benchmark
  • Extreme weather analysis
• Micro Benchmark

SLIDE 5

Overview

• Extreme weather poses a great challenge to human society. Understanding the life cycle of extreme weather, and even predicting its future trend, has become a significant scientific goal.
• Achieving this goal requires accurately identifying weather patterns through analysis of massive climate data, in order to gain insight into climate change.

[Figure panels: Extratropical Cyclone, Tropical Cyclone, Atmospheric River, Tropical Depression]

SLIDE 6

Overview

• Deep learning serves as the data analysis tool that automatically identifies extreme weather patterns, replacing if-else rules defined by human experts.

[Figure: original weather images and labeled weather images]

Essentially an object detection task.

SLIDE 7

Dataset

• Dataset intro:
  • https://extremeweatherdataset.github.io/
• Dataset download:
  • The files are large (62 GB each). Obtain them from the following Globus endpoint:
  • https://app.globus.org/file-manager?origin_id=89a33dca-e540-11e9-9bfc-0a19784404f4&origin_path=%2F
  • You will need a Globus endpoint of your own for the transfer.
• Features:
  • 16 channels, high resolution (1152 × 768)

SLIDE 8

Adopted Model

• Faster-RCNN
  • ResNet-50 + FPN
    • See model-desc.log in the EWA repo on BenchHub for details.
    • http://125.39.136.212:8090/hpc-ai500/EWA

SLIDE 9

Running Steps

• Data preprocessing

    # h5 ⟹ JSON file in COCO format
    python hdf5_to_json.py -i ${HDF5_PATH} -o ${ANNO_DIR_PATH} -y ${year}
    # h5 ⟹ 16-channel TIFF images
    python hdf5_to_tif.py -i ${HDF5_PATH} -o ${TIFF_DIR_PATH} -y ${year}
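
For orientation, the COCO annotation layout targeted by the first script looks roughly like the sketch below. The ids, box values, and the helper boxes_for_image are hypothetical and chosen only for illustration; the category abbreviations are the ones listed on the visualization slide, and the actual output of hdf5_to_json.py may differ in detail.

```python
import json

# Hypothetical values throughout; only the overall COCO layout is standard.
coco = {
    "images": [
        {"id": 1, "file_name": "climo_1979_00101.tif", "width": 1152, "height": 768},
    ],
    "categories": [
        {"id": 1, "name": "TD"},
        {"id": 2, "name": "TC"},
        {"id": 3, "name": "EC"},
        {"id": 4, "name": "AR"},
    ],
    "annotations": [
        # COCO boxes are [x_min, y_min, width, height] in pixels.
        {"id": 1, "image_id": 1, "category_id": 2,
         "bbox": [100.0, 200.0, 64.0, 48.0], "area": 64.0 * 48.0, "iscrowd": 0},
    ],
}


def boxes_for_image(dataset, image_id):
    """Return (category name, bbox) pairs for one image."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return [(names[a["category_id"]], a["bbox"])
            for a in dataset["annotations"] if a["image_id"] == image_id]


print(json.dumps(boxes_for_image(coco, 1)))
```

An object detector such as Faster-RCNN consumes exactly this kind of pairing: one image record plus a list of labeled boxes.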

SLIDE 10

Running Steps

• Environment installation

    # build the docker image
    cd docker
    docker build -t climo .
    # start a container from the image; --name lets the next command refer to it
    docker run --gpus all --ipc=host -p 2222:22 -d --name climo climo
    # open a shell inside the running container
    docker exec -it climo bash

SLIDE 11

Running Steps

• Training

    export PYTHONPATH="$(pwd)/src"
    mpirun -np 32 --hostfile "src/hostfile" -bind-to none -map-by slot \
        -x NCCL_DEBUG=INFO \
        -x LD_LIBRARY_PATH \
        -x NCCL_SOCKET_IFNAME=eth0 \
        --allow-run-as-root \
        python src/train.py --logdir /path/to/logdir/ \
        --config MODE_MASK=False MODE_FPN=True \
        DATA.BASEDIR=${DATA_DIR} TRAINER=horovod

SLIDE 12

Running Steps

• Inference

    python src/predict.py --predict /path/to/dataset/1979/climo_1979_00101.tif \
        --load train_log/${dir}/model-247500 \
        --config MODE_MASK=False MODE_FPN=True

SLIDE 13

Running Steps

• Get the time-to-accuracy

    # time_to_accuracy.sh
    export PYTHONPATH="$(pwd)/src"
    LOG_DIR=train_log/
    ACC_THRESHOLD=0.11
    python src/time_to_accuracy.py --logdir ${LOG_DIR} --acc_threshold ${ACC_THRESHOLD}
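
The idea behind the time-to-accuracy metric can be sketched as follows. This is a minimal illustration, not the logic of src/time_to_accuracy.py: the function name, the record format, and the evaluation history are all assumptions made for the example.

```python
def time_to_accuracy(records, acc_threshold):
    """Return the elapsed seconds at which accuracy first reaches the threshold.

    records: iterable of (elapsed_seconds, accuracy) pairs in evaluation order.
    Returns None if the threshold is never reached.
    """
    for elapsed_seconds, accuracy in records:
        if accuracy >= acc_threshold:
            return elapsed_seconds
    return None


# Made-up evaluation history, for illustration only.
history = [(600.0, 0.03), (1200.0, 0.08), (1800.0, 0.12), (2400.0, 0.15)]
print(time_to_accuracy(history, 0.11))  # 1800.0
```

The metric rewards systems that reach the target quality quickly, so both raw throughput and convergence behavior affect the result.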

SLIDE 14

Visualization

• Run the HTTP server

    export PYTHONPATH="$(pwd)/src"
    python http-server.py --load train_log/model-167000 \
        --config MODE_MASK=False MODE_FPN=True

• View the visualization in a browser
  • http://localhost:5000

* The prediction result contains the predicted boxes and their confidence.
* TD, TC, EC, and AR represent Tropical Depression, Tropical Cyclone, Extratropical Cyclone, and Atmospheric River, respectively.

SLIDE 15

Other Metrics

• Obtain other metrics from TensorBoard
  • Training loss
  • mAP (mean Average Precision)

SLIDE 16

The Impact of Batch Size

[Figure: two panels, Ranks: 128 and Ranks: 32]

SLIDE 17

Scaling Evaluation

[Figure: throughput (samples/sec), practical versus ideal; horizontal axis ticks 1, 8, 16, 32]

Only 50% scaling efficiency. Reason: the EWA workload uses Faster-RCNN for object detection. The sizes and numbers of objects differ from image to image, so the amount of computation differs across ranks.
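
Scaling efficiency is commonly computed as measured throughput divided by ideal linear throughput. The sketch below assumes that definition; the function name and the numbers are illustrative and are not read from the chart.

```python
def scaling_efficiency(throughput_n, throughput_1, n_ranks):
    """Measured throughput divided by the ideal (linear) throughput."""
    ideal = throughput_1 * n_ranks
    return throughput_n / ideal


# Illustrative numbers only, not taken from the slide.
single_rank = 5.0    # samples/sec on 1 rank
measured_32 = 80.0   # samples/sec on 32 ranks
print(scaling_efficiency(measured_32, single_rank, 32))  # 0.5
```

With synchronous training, every step waits for the slowest rank, so uneven per-image work shows up directly as a gap between the practical and ideal curves.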

SLIDE 18

• Component Benchmark
  • Extreme weather analysis
• Micro Benchmark

SLIDE 19

Overview

• Objective
  • Evaluate the upper-bound performance of the systems.
• Significant DL operators drawn from the component workloads:
  • Convolution
  • Pooling
  • Fully-connected

SLIDE 20

Example

• CUDA version
  • The MKL version of the implementation is largely similar.
  • See the following link:
    • http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks

SLIDE 21

Running Step

• Environment installation
  • CUDA: 9.0
  • cuDNN: 7.1.4
  • Open MPI: 3.1.2
  • HDF5: 1.10.4

SLIDE 22

Running Step

• Convolution
  • Source code: cudnn_conv.cpp
  • Running script: run_conv.sh
  • Parameters:
    • Input data size: NCHW format
    • Filter size: OIHW format
    • Paddings
    • Strides
    • Dilations
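
How these parameters interact can be illustrated with the standard output-shape arithmetic for a convolution. The sketch below is not taken from cudnn_conv.cpp: the function names and the example sizes are hypothetical, and it assumes the same padding, stride, and dilation on both spatial axes and no grouped convolution.

```python
def conv_output_size(in_size, kernel, padding, stride, dilation):
    """Output length along one spatial axis of a convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


def conv_output_shape(input_nchw, filter_oihw, padding, stride, dilation):
    """Output shape in NCHW for an NCHW input and an OIHW filter."""
    n, c, h, w = input_nchw
    o, i, kh, kw = filter_oihw
    if c != i:
        raise ValueError("input channels must match filter input channels")
    out_h = conv_output_size(h, kh, padding, stride, dilation)
    out_w = conv_output_size(w, kw, padding, stride, dilation)
    return (n, o, out_h, out_w)


# Hypothetical example: one 16-channel 768 x 1152 input, 64 filters of 7 x 7.
print(conv_output_shape((1, 16, 768, 1152), (64, 16, 7, 7),
                        padding=3, stride=2, dilation=1))  # (1, 64, 384, 576)
```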

SLIDE 23

Running Step

• Pooling
  • Source code: cudnn_pooling_forward.cpp
  • Running script: run_pooling.sh
  • Parameters:
    • Input data size: NCHW
    • Filter size
    • Paddings
    • Strides
    • The mode of pooling (0 for max pooling and 1 for average pooling)
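
A small reference version of the two pooling modes, written in plain Python for a single 2-D plane, may help when checking results. It is a sketch only: the function name is hypothetical, padding is assumed to be smaller than the filter size, and the average here ignores padded positions, which may not match the convention used by the benchmark.

```python
def pool2d(plane, kernel, stride, padding=0, mode=0):
    """Pool a 2-D list of numbers; mode 0 = max pooling, mode 1 = average pooling."""
    h, w = len(plane), len(plane[0])
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            # Collect the in-bounds values covered by this window.
            window = []
            for ky in range(kernel):
                for kx in range(kernel):
                    y = oy * stride + ky - padding
                    x = ox * stride + kx - padding
                    if 0 <= y < h and 0 <= x < w:
                        window.append(plane[y][x])
            if mode == 0:
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out


plane = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(pool2d(plane, kernel=2, stride=2, mode=0))  # [[6, 8], [14, 16]]
print(pool2d(plane, kernel=2, stride=2, mode=1))  # [[3.5, 5.5], [11.5, 13.5]]
```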

SLIDE 24

Running Step

• Fully-connected
  • Source code: cudnn_fc_forward.cpp
  • Running script: run_fc.sh
  • Parameters:
    • Input data size: NCHW
    • Output Channel
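
To turn a measured run time into a FLOPS figure, the operation count of the layer is needed. The sketch below uses the usual estimate of two operations (one multiply, one add) per weight per sample and ignores the bias; the function names, the example shape, and the timing are assumptions, not values from cudnn_fc_forward.cpp.

```python
def fc_forward_flops(input_nchw, out_channels):
    """Approximate FLOPs of a fully-connected forward pass (bias ignored).

    Each of the N samples is flattened to C*H*W features; every output
    channel needs one multiply and one add per input feature.
    """
    n, c, h, w = input_nchw
    in_features = c * h * w
    return 2 * n * in_features * out_channels


def achieved_gflops(flops, seconds):
    """Convert an operation count and a run time into GFLOPS."""
    return flops / seconds / 1e9


# Hypothetical shape and timing, for illustration only.
flops = fc_forward_flops((64, 512, 7, 7), 4096)
print(flops, achieved_gflops(flops, 0.002))
```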

SLIDE 25

SPFLOPS

SLIDE 26

Tensorcore

• Deep Learning FLOPS
• Nvidia Volta Architecture
• 64 FMA floating point operations per cycle
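
The per-cycle figure can be turned into a peak throughput estimate. The sketch below reads the 64 FMA operations as a per-Tensor-Core rate and counts each FMA as two floating-point operations. The Tensor Core count and clock are the publicly quoted Tesla V100 figures and are assumptions here, not numbers from the slides; the function name is hypothetical.

```python
def tensor_core_peak_tflops(num_tensor_cores, clock_hz, fma_per_cycle=64):
    """Peak throughput, counting each FMA as two floating-point operations."""
    return num_tensor_cores * fma_per_cycle * 2 * clock_hz / 1e12


# Assumed V100 figures (not from the slides): 640 Tensor Cores, 1.53 GHz boost clock.
print(round(tensor_core_peak_tflops(640, 1.53e9), 1))  # 125.3
```

Measured micro-benchmark results can then be compared against this upper bound to see how much of the peak a given operator reaches.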

SLIDE 27

Deep Learning FLOPS

• The speedup from enabling Tensorcore

SLIDE 28

Tensorcore's limitations
