Institute of Computing Technology
How to Use HPC AI500
Zihan Jiang, Xingwang Xiong, Tianshu Hao, and Jianfeng Zhan
http://www.benchcouncil.org/HPCAI500/index.html ICT, Chinese Academy of Sciences
ASPLOS 2018, Williamsburg, VA, USA
HPC AI500 Bench 19
• Version 1.0 on
• Reference implementation on BenchHub:
  - Download the reference implementation from BenchHub
  - Prepare the dataset and environment according to README.md
  - Run the scripts (training, evaluation, inference)
• http://125.39.136.212:8090/hpc-ai500/EWA (Extreme Weather Analysis)
• CUDA version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/CUDA_version
• MKL version: http://125.39.136.212:8090/hpc-ai500/hpc-ai500-benchmark/tree/master/micro_benchmarks/MKL_version
• Extreme weather analysis
• Extreme weather poses a great challenge to human society.
• Achieving this goal always requires accurately identifying the
[Figure: Extratropical Cyclone, Tropical Cyclone, Atmospheric River, Tropical Depression]
• Using deep learning as the data analysis tool to automatically
[Figure: original weather images and labeled weather images]
Essentially an object detection task.
• Dataset intro: https://extremeweatherdataset.github.io/
• Dataset download:
  - The files are large (62 GB each). Obtain them from the following Globus endpoint:
    https://app.globus.org/file-manager?origin_id=89a33dca-e540-11e9-9bfc-0a19784404f4&origin_path=%2F
  - You will need a Globus endpoint of your own for the transfer.
• Features:
  - 16 channels, high resolution (1152 × 768)
• ResNet-50 + FPN
export PYTHONPATH="$(pwd)/src"
mpirun -np 32 --hostfile "src/hostfile" -bind-to none -map-by slot \
    python src/train.py --logdir /path/to/logdir/ \
    DATA.BASEDIR=${DATA_DIR} TRAINER=horovod
python src/predict.py --predict /path/to/dataset/1979/climo_1979_00101.tif \
# time_to_accuracy.sh
export PYTHONPATH="$(pwd)/src"
LOG_DIR=train_log/
ACC_THRESHOLD=0.11
python src/time_to_accuracy.py --logdir ${LOG_DIR} --acc_threshold ${ACC_THRESHOLD}
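The time-to-accuracy metric reported by the script above can be illustrated with a minimal sketch. This is not the code in src/time_to_accuracy.py, which is not shown here; the function name, the record format (elapsed seconds, accuracy), and the sample values are all hypothetical and chosen only to show the idea of finding the first evaluation that reaches the threshold.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    records: Iterable[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Return the elapsed time of the first evaluation that meets the threshold.

    records: (elapsed_seconds, accuracy) pairs, in any order.
    Returns None if the threshold is never reached.
    """
    for elapsed, accuracy in sorted(records):
        if accuracy >= threshold:
            return elapsed
    return None


# Hypothetical evaluation log: (seconds since training start, mAP).
example_log = [(600.0, 0.02), (1200.0, 0.07), (1800.0, 0.12), (2400.0, 0.15)]

if __name__ == "__main__":
    print(time_to_accuracy(example_log, threshold=0.11))
```

With the hypothetical log above and the threshold 0.11 used in the script, the sketch reports the third evaluation point, since it is the first whose accuracy is at or above the threshold.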
export PYTHONPATH="$(pwd)/src"
python http-server.py --load train_log/model-167000 \
• Visualization in the browser: http://localhost:5000
  - The prediction result contains the predicted boxes and their confidences.
  - TD, TC, EC, and AR stand for Tropical Depression, Tropical Cyclone, Extratropical Cyclone, and Atmospheric River, respectively.
• Training loss
• mAP (mean Average Precision)
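As a rough illustration of the mAP metric, the sketch below computes an all-point interpolated Average Precision for one class and averages it over classes. It assumes each detection has already been matched against the ground truth (for example by an IoU threshold) and flagged as a true or false positive. The benchmark's own evaluation code may use a different AP variant, and all names and sample values here are hypothetical.

```python
from typing import Dict, List, Tuple


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """All-point interpolated AP for a single class.

    detections: (confidence, is_true_positive) pairs.
    num_ground_truth: number of ground-truth boxes of this class.
    """
    if num_ground_truth == 0 or not detections:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Interpolate: precision at each rank becomes the best precision
    # achieved at that rank or any later one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision weighted by each increase in recall.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Average the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


if __name__ == "__main__":
    # Hypothetical detections for one class with two ground-truth boxes.
    tc_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
    print(mean_average_precision({"TC": tc_ap, "AR": 1.0}))
```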
[Figure panels: Ranks: 128; Ranks: 32]
[Figure: Throughput (samples/sec) with x-axis ticks 1, 8, 16, and 32; series: Practical, Ideal]
Scaling efficiency is only 50%. Reason: the EWA workload uses Faster R-CNN for object detection. The sizes and numbers of objects differ from image to image, so the amount of computation differs across ranks, and the faster ranks wait for the slowest one at each synchronization point.
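The scaling-efficiency figure can be read as measured throughput divided by ideal (linear) throughput. The sketch below shows that arithmetic, plus a simple model of how uneven per-rank compute time lowers efficiency when every step waits for the slowest rank. The function names and all numbers are hypothetical; they are not values read from the benchmark's plots.

```python
from typing import Sequence


def scaling_efficiency(
    throughput_single: float, throughput_n: float, n_ranks: int
) -> float:
    """Measured throughput at n_ranks divided by ideal linear throughput."""
    ideal = throughput_single * n_ranks
    return throughput_n / ideal


def synchronous_step_efficiency(per_rank_seconds: Sequence[float]) -> float:
    """Fraction of time ranks spend computing in one synchronous step.

    The step lasts as long as the slowest rank, so efficiency is the
    mean per-rank compute time divided by the maximum.
    """
    mean_time = sum(per_rank_seconds) / len(per_rank_seconds)
    return mean_time / max(per_rank_seconds)


if __name__ == "__main__":
    # Hypothetical: 5 samples/sec on 1 rank, 80 samples/sec on 32 ranks.
    print(scaling_efficiency(5.0, 80.0, 32))
    # Hypothetical per-rank compute times (seconds) for one step.
    print(synchronous_step_efficiency([1.0, 2.0, 3.0]))
```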
• Extreme weather analysis
• Evaluate the upper bound performance of the
  - Convolution
  - Pooling
  - Fully-connected
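To turn a micro benchmark timing into a FLOPS figure, one common approach is to count the floating-point operations of the layer's forward pass and divide by the measured time. The sketch below uses the usual convention of two operations per multiply-accumulate and ignores bias terms. It is an assumption about how such numbers could be derived, not the accounting used by the benchmark's own scripts, and the function names and example timing are hypothetical.

```python
def conv2d_forward_flops(
    batch: int,
    in_channels: int,
    out_channels: int,
    out_height: int,
    out_width: int,
    kernel_height: int,
    kernel_width: int,
) -> int:
    """FLOPs of a dense 2D convolution forward pass (2 per multiply-accumulate)."""
    macs = (
        batch * out_channels * out_height * out_width
        * in_channels * kernel_height * kernel_width
    )
    return 2 * macs


def fc_forward_flops(batch: int, in_features: int, out_features: int) -> int:
    """FLOPs of a fully-connected forward pass (2 per multiply-accumulate)."""
    return 2 * batch * in_features * out_features


def achieved_gflops(flops: float, seconds: float) -> float:
    """Convert an operation count and a measured time into GFLOPS."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical fully-connected layer and a hypothetical timing of 1 ms.
    flops = fc_forward_flops(batch=64, in_features=4096, out_features=4096)
    print(achieved_gflops(flops, seconds=1e-3))
```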
• The MKL version of the implementation is
• See the following link:
• CUDA: 9.0
• cuDNN: 7.1.4
• Open MPI: 3.1.2
• HDF5: 1.10.4
• Source code: cudnn_conv.cpp
• Running script: run_conv.sh
• Parameters:

• Source code: cudnn_pooling_forward.cpp
• Running script: run_pooling.sh
• Parameters:

• Source code: cudnn_fc_forward.cpp
• Running script: run_fc.sh
• Parameters:
Deep Learning FLOPS
• NVIDIA Volta architecture: 64 FMA floating-point operations per cycle
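The 64 FMA per cycle figure can be turned into a theoretical peak with a short calculation. The sketch below assumes the figure refers to a single Volta Tensor Core and uses NVIDIA's published Tesla V100 specifications (640 Tensor Cores, 1530 MHz boost clock), which are not stated on the slide. Each FMA is counted as two floating-point operations. The function name is hypothetical.

```python
def peak_tensor_flops(
    tensor_cores: int, fma_per_cycle: int, clock_hz: float
) -> float:
    """Theoretical peak FLOPS: units x FMA per cycle x 2 ops per FMA x clock."""
    return tensor_cores * fma_per_cycle * 2 * clock_hz


# Assumed Tesla V100 figures (from NVIDIA's published specs, not the slide):
# 640 Tensor Cores, 64 FMA per Tensor Core per cycle, 1.53 GHz boost clock.
V100_PEAK_FLOPS = peak_tensor_flops(640, 64, 1.53e9)

if __name__ == "__main__":
    print(f"{V100_PEAK_FLOPS / 1e12:.1f} TFLOPS")
```

Under these assumptions the result is about 125 TFLOPS, which matches the mixed-precision peak NVIDIA quotes for the V100. Measured micro benchmark throughput can then be expressed as a fraction of this upper bound.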