SLIDE 1

Institute of Computing Technology

BenchCouncil AIBench: A Datacenter AI Benchmark Suite

Wanling Gao, Fei Tang, Jianfeng Zhan

http://www.benchcouncil.org/AIBench/index.html

BenchCouncil

Bench’19, Denver, Colorado, USA

SLIDE 2

AIBench Bench’19

Why Datacenter AI?

• AI is widely employed to augment Internet services
  • e.g., processing images, video, speech, and audio
• There is an urgent need for datacenter AI benchmarks

SLIDE 3

Challenge #1: Isolation

• Workloads and datasets are confidential
  • This isolates academia from industry
• There is no publicly available industry-scale Internet service benchmark

SLIDE 4

Challenge #2: Microservice-Based Architecture

• A collection of loosely coupled services
• Various modules and complex execution paths
• Massive scale and a complex infrastructure hierarchy

An end-to-end benchmark that models the critical paths and primary modules is needed.

[Figure: a request is split into suboperations handled by Components 1 to n, and their results are merged into the response]
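The split-and-merge request flow on this slide can be illustrated with Python's asyncio. This is a minimal sketch, not AIBench code: the component names (`search`, `recommend`, `storage`) and their delays are hypothetical stand-ins for services. It shows why the critical path matters: the merged response is ready only when the slowest suboperation finishes.

```python
import asyncio
import time


async def component(name: str, delay_s: float) -> str:
    # Hypothetical component: stands in for a service that takes delay_s seconds.
    await asyncio.sleep(delay_s)
    return f"{name}:ok"


async def handle_request(delays: dict[str, float]) -> tuple[list[str], float]:
    """Split one request into per-component suboperations, run them concurrently,
    and merge the results. Returns (merged_results, end_to_end_latency_s)."""
    start = time.perf_counter()
    # Fan out: every suboperation is in flight at the same time.
    results = await asyncio.gather(*(component(n, d) for n, d in delays.items()))
    # Merge: gather returns only after the slowest suboperation completes.
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    merged, latency = asyncio.run(
        handle_request({"search": 0.05, "recommend": 0.20, "storage": 0.02})
    )
    print(merged)  # ['search:ok', 'recommend:ok', 'storage:ok']
    print(f"latency ~{latency:.2f}s")  # dominated by the 0.20 s component
```

Adding components in parallel does not lengthen the response time, but making any single component slower does, which is why the slide emphasizes modeling the critical path.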

SLIDE 5

Challenge #3: Diversity of Workloads and Models

Figure source: Bianco, S., Cadene, R., Celona, L., and Napoletano, P. Benchmark analysis of representative deep neural network architectures. IEEE Access, 6:64270–64277, 2018.

SLIDE 6

Challenge #4: Domain-Specific Metrics

• Time-to-accuracy
• State-of-the-art accuracy
• Throughput
• Latency
• Tail latency

SLIDE 7

Challenge #5: More vs. Less

• Workload characterization favors more benchmarks
  • SPEC CPU 2017 (43), PARSEC 3.0 (30), TPC-DS (99) workloads
• Performance ranking (benchmarketing) favors fewer benchmarks
  • TOP500

SLIDE 8

Challenge #6: Inconsistency

• Benchmarking requirements are inconsistent across research stages
  • Earlier stage of architecture research: portability matters; micro or component benchmarks?
  • Later stage of architecture research: reality matters; application benchmarks?

SLIDE 9

Requirements

• Industry-scale
  • Critical paths and primary modules of business AI scenarios
  • AI-related and non-AI-related components
• A modular framework design
  • Collectively, as a whole end-to-end application
  • Individually, as a micro or component benchmark
• Representativeness and coverage
  • Diverse AI problem domains and datasets are needed

SLIDE 10

Outline

• AIBench Overview
• Tasks, Models, Datasets, Metrics
• How to use AIBench
• Preliminary Results
• Conclusion

SLIDE 11

BenchCouncil AIBench

• A Datacenter AI Benchmark Suite
• Contributors: many companies and top universities
  • Alibaba, Microsoft, PayPal, Tencent, etc.

http://www.benchcouncil.org/AIBench/index.html

Wanling Gao, Fei Tang, Lei Wang, Jianfeng Zhan, Chunxin Lan, Chunjie Luo, et al. AIBench: An Industry Standard Internet Service AI Benchmark Suite. Technical Report, 2019. arXiv preprint arXiv:1908.08998.

SLIDE 12

AIBench Overview

• The first end-to-end, industry-standard AI benchmark suite
• Industry-scale Internet services
  • Critical paths and primary modules
  • AI-related and non-AI-related components
• A highly extensible, configurable, and flexible benchmark framework
• 16 prominent AI problem domains
• Multiple loosely coupled modules
  • Individually: micro/component benchmarks
  • Collectively: application benchmarks

SLIDE 13

Sixteen AI Problem Domains

• Text Processing (4)
  • Text-to-text translation, text summarization, learning to rank, recommendation
• Image Processing (8)
  • Image classification, image generation, image-to-text, image-to-image, face embedding, object detection, image compression, spatial transformer
• Audio Processing (1)
  • Speech recognition
• Video Processing (1)
  • Video prediction
• 3D Data Processing (2)
  • 3D face recognition, 3D object reconstruction

SLIDE 14

End-to-End: E-commerce Search

• Query generator: simulates concurrent users and sends query requests
• Online module: personalized search and recommendations
• Offline module: a training stage that generates a learned model
• Data storage module: stores data, e.g., the user database and product database

SLIDE 15

Component Benchmarks (16)

SLIDE 16

BenchCouncil International Competitions

• DC-AI-C1: Image classification
• DC-AI-C8: 3D face recognition
• DC-AI-C10: Recommendation
• Competition papers will be available soon!

SLIDE 17

Image Classification

• Extracts different thematic classes within an image
• A supervised learning problem: define a set of target classes and train a model to recognize them
• Model: ResNet neural network
• Dataset: ImageNet2012 (100 GB+)
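Accuracy targets for ImageNet classification are commonly stated as top-1 or top-5 accuracy. The slide does not say which variant AIBench uses, so treat the following as an assumption-labeled sketch: it computes top-k accuracy from per-class scores in plain Python, with toy scores in the example.

```python
def top_k_accuracy(scores: list[list[float]], labels: list[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # toy class scores for 2 images
    labels = [1, 2]
    print(top_k_accuracy(scores, labels, k=1))  # 0.5
    print(top_k_accuracy(scores, labels, k=3))  # 1.0
```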

SLIDE 18

Image Generation

• Mimics the data distribution and generates image data
• Model: WGAN algorithm
• Dataset: LSUN, about one million labeled images
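The WGAN training objective can be written down without a deep learning framework. A minimal sketch, assuming the critic's raw scores on a real batch and a generated batch are already available (the function names are illustrative): the critic minimizes E[D(fake)] - E[D(real)], the generator minimizes -E[D(fake)], and the original WGAN keeps the critic approximately Lipschitz by clipping its weights to [-c, c], with c = 0.01 in the WGAN paper.

```python
from statistics import fmean


def critic_loss(real_scores: list[float], fake_scores: list[float]) -> float:
    # The critic minimizes E[D(fake)] - E[D(real)]: high scores for real data,
    # low scores for generated data. Its negation serves as a (scaled) estimate
    # of the Wasserstein-1 distance.
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores: list[float]) -> float:
    # The generator minimizes -E[D(fake)], pushing the critic's scores on its samples up.
    return -fmean(fake_scores)


def clip_weights(weights: list[float], c: float = 0.01) -> list[float]:
    # Original WGAN: clip every critic weight into [-c, c] after each critic update.
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    print(critic_loss([1.0, 3.0], [0.0, 1.0]))  # -1.5
    print(generator_loss([0.0, 1.0]))           # -0.5
    print(clip_weights([-0.5, 0.005, 0.2]))     # [-0.01, 0.005, 0.01]
```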

SLIDE 19

Text-to-Text Translation

• Translates text from one language to another
• Model: Transformer
• Dataset: WMT English-German (4.5 MB of training text data)
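The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The plain-Python sketch below computes it for small matrices stored as lists of rows; multi-head projections, masking, and batching are omitted, so it illustrates the arithmetic rather than the benchmark's implementation.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: list[list[float]], k: list[list[float]], v: list[list[float]]
) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V, with matrices given as lists of row vectors."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Scaled similarity of this query to every key.
        logits = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(logits)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Two identical keys -> equal weights -> output is the mean of the value rows.
    print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                                       [[2.0, 0.0], [4.0, 2.0]]))  # [[3.0, 1.0]]
```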

SLIDE 20

Image-to-Image

• Converts an image from one representation of a specific scene to another scene or representation
• Model: CycleGAN algorithm
• Dataset: Cityscapes, from 50+ cities (300 MB)
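CycleGAN trains two generators, G: X → Y and F: Y → X, and adds a cycle-consistency term so that translating an image to the other domain and back recovers the original. A minimal sketch of that term, assuming images are flattened lists of floats and the generators are arbitrary callables (the toy `brighten`/`darken` pair is hypothetical); the adversarial losses CycleGAN combines it with are omitted.

```python
from typing import Callable

Image = list[float]  # a flattened image; illustrative only


def l1(a: Image, b: Image) -> float:
    # Mean absolute pixel difference.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g: Callable[[Image], Image],  # generator G: domain X -> domain Y
    f: Callable[[Image], Image],  # generator F: domain Y -> domain X
    xs: list[Image],
    ys: list[Image],
) -> float:
    """Mean L1 error of F(G(x)) against x plus mean L1 error of G(F(y)) against y."""
    forward = sum(l1(f(g(x)), x) for x in xs) / len(xs)
    backward = sum(l1(g(f(y)), y) for y in ys) / len(ys)
    return forward + backward


def brighten(im: Image) -> Image:  # toy G
    return [p + 1 for p in im]


def darken(im: Image) -> Image:  # toy F, the exact inverse of brighten
    return [p - 1 for p in im]


if __name__ == "__main__":
    xs, ys = [[0.0, 0.5]], [[1.0, 2.0]]
    print(cycle_consistency_loss(brighten, darken, xs, ys))  # 0.0
```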

SLIDE 21

Speech-to-Text

• Recognizes spoken language and transcribes it into text
• Model: DeepSpeech 2
• Dataset: LibriSpeech, 1000+ hours of speech data
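DeepSpeech 2 is trained with the Connectionist Temporal Classification (CTC) loss, so its per-frame character probabilities must be decoded into text. The simplest decoder is greedy (best-path) decoding, sketched below under two assumptions: index 0 is the CTC blank, and `alphabet` is a hypothetical character table. Practical systems typically use beam search with a language model instead.

```python
def ctc_greedy_decode(frame_probs: list[list[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: take the most likely symbol in each frame,
    merge consecutive repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        # Emit a symbol only when it starts a new run and is not the blank,
        # so "a a _ a" decodes to "aa" while "a a a" decodes to "a".
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    alphabet = "_ab"  # index 0 is the blank; "_" is only a placeholder character
    path = [1, 1, 0, 1, 2, 2, 0]  # a a _ a b b _
    frames = [[1.0 if j == i else 0.0 for j in range(len(alphabet))] for i in path]
    print(ctc_greedy_decode(frames, alphabet))  # aab
```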

SLIDE 22

Object Detection

• Detects the objects within an image
• Model: Faster R-CNN algorithm
• Dataset: MSCOCO2014
  • 82,783 training samples, 40,504 validation samples, 40,775 test samples (20 GB+)
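Object detectors such as Faster R-CNN match proposals to ground truth with intersection over union (IoU) and remove duplicate detections with non-maximum suppression (NMS). The sketch shows both building blocks in plain Python, assuming boxes in (x1, y1, x2, y2) corner format; the 0.5 threshold is an illustrative default, not a value taken from AIBench.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes: list[Box], scores: list[float], threshold: float = 0.5) -> list[int]:
    """Greedy non-maximum suppression; returns indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    print(round(iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)), 4))  # 0.1429
    boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 6.0, 6.0)]
    print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 duplicates box 0
```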

SLIDE 23

Image-to-Text

• Automatically generates a description of an image
• Model: Neural Image Caption model
• Dataset: MSCOCO2014

SLIDE 24

Face Embedding

• Transforms a facial image into a vector in an embedding space
• Model: FaceNet algorithm
• Dataset: VGGFace2
  • 36 GB training data, 1.9 GB test data
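FaceNet learns its embedding with a triplet loss: an anchor face should be closer to another image of the same identity (positive) than to an image of a different identity (negative) by at least a margin. A minimal sketch on plain lists, using the margin of 0.2 reported in the FaceNet paper as the default; FaceNet also L2-normalizes embeddings, which is omitted here.

```python
def squared_distance(u: list[float], v: list[float]) -> float:
    # Squared Euclidean distance between two embeddings.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: list[float], positive: list[float], negative: list[float],
                 margin: float = 0.2) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) with squared L2 distances d."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


if __name__ == "__main__":
    a, p = [0.0, 0.0], [0.0, 1.0]
    print(triplet_loss(a, p, [3.0, 0.0]))  # 0.0: negative is already far enough away
    print(triplet_loss(a, p, [1.0, 0.0]))  # 0.2: negative is as close as the positive
```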

SLIDE 25

3D Face Recognition

• Recognizes 3D facial information from an image
• Model: 3D face models
• Dataset: Intellifusion dataset, 77,715 samples from 253 face IDs

SLIDE 26

Video Prediction

• Predicts future video frames by predicting the transformation of previous frames
• Model: motion-focused predictive models
• Dataset: Robot pushing dataset
  • 59,000 samples, 100 GB+

SLIDE 27

Image Compression

• Full-resolution lossy image compression
• Model: recurrent neural networks
• Dataset: ImageNet2012 (100 GB+)

SLIDE 28

Recommendation

• Collaborative filtering-based movie recommendations
• Model: collaborative filtering algorithm
• Dataset: MovieLens
  • 20,000,000 movie ratings
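The slide does not specify which collaborative filtering variant AIBench implements (matrix factorization is also common), so the following is only a user-based neighborhood sketch under that assumption: a user's rating for an unseen movie is predicted as the similarity-weighted average of other users' ratings, using cosine similarity over co-rated movies. The toy ratings stand in for MovieLens data.

```python
import math
from typing import Optional

Ratings = dict[str, dict[str, float]]  # user -> {movie: rating}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity restricted to the movies both users have rated.
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, user: str, movie: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `movie`,
    or None if no other user has rated it."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[movie]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    toy = {
        "alice": {"m1": 5.0, "m2": 3.0},
        "bob": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
        "carol": {"m1": 1.0, "m3": 2.0},
    }
    print(round(predict(toy, "alice", "m3"), 3))  # 3.0
```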

SLIDE 29

3D Object Reconstruction

• Predicts and reconstructs 3D objects
• Model: a convolutional encoder-decoder network
• Dataset: ShapeNet
  • 51,300 different 3D models covering 5 object categories

SLIDE 30

Text Summarization

• Generates a summary of the input text
• Model: sequence-to-sequence model
• Dataset: Gigaword
  • 10,000,000 text documents, four billion words

SLIDE 31

Spatial Transformer

• Performs spatial transformations
• Model: spatial transformer networks
• Dataset: MNIST
  • 60,000 training samples, 10,000 test samples
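A spatial transformer network predicts a 2×3 affine matrix θ, maps each output pixel's normalized [-1, 1] coordinates through θ, and bilinearly samples the input at the resulting locations. The plain-Python sketch below covers only the grid-generation and sampling steps: θ is supplied by hand rather than by a localization network, pixels outside the input read as zero, and corner-aligned coordinates are assumed.

```python
import math


def affine_sample(image: list[list[float]], theta: list[list[float]]) -> list[list[float]]:
    """Map each output pixel through the 2x3 affine matrix `theta` (normalized
    [-1, 1] coordinates) and bilinearly sample the input image there."""
    h, w = len(image), len(image[0])

    def pixel(y: int, x: int) -> float:
        # Zero padding outside the input image.
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Normalized target coordinates of output pixel (i, j).
            xt = -1 + 2 * j / (w - 1) if w > 1 else 0.0
            yt = -1 + 2 * i / (h - 1) if h > 1 else 0.0
            # Grid generator: source coordinates = theta @ [xt, yt, 1].
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Back to fractional pixel coordinates of the input.
            px, py = (xs + 1) * (w - 1) / 2, (ys + 1) * (h - 1) / 2
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            # Bilinear sampler: weighted sum of the four neighboring pixels.
            row.append(
                pixel(y0, x0) * (1 - dx) * (1 - dy)
                + pixel(y0, x0 + 1) * dx * (1 - dy)
                + pixel(y0 + 1, x0) * (1 - dx) * dy
                + pixel(y0 + 1, x0 + 1) * dx * dy
            )
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    flip = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # mirror horizontally
    print(affine_sample(img, flip))  # [[3.0, 2.0, 1.0], [6.0, 5.0, 4.0]]
```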

SLIDE 32

Learning to Rank

• Machine-learned ranking for recommender systems
• Model: ranking distillation
• Dataset: Gowalla
  • Social network data: 196,591 nodes and 950,327 edges

SLIDE 33

Micro Benchmarks (12)

SLIDE 34

AIBench Inference Specification

• Inference system under test
  • Query generator: concurrency, arrival rate, distribution, think time
  • System under test, with monitoring tools and datasets
  • Result outputs: accuracy, latency, tail latency, throughput
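The query generator's knobs (concurrency, arrival rate, arrival distribution, think time) can be illustrated with a small schedule generator. A minimal sketch, assuming exponentially distributed gaps between a user's queries (the inter-arrival law of a Poisson process) plus a fixed think time; `query_schedule` and its parameter names are hypothetical, not options of AIBench's actual generator.

```python
import random


def query_schedule(concurrency: int, rate_per_user: float, think_time_s: float,
                   queries_per_user: int, seed: int = 0) -> list[tuple[float, int]]:
    """Return (send_time_s, user_id) pairs, sorted by send time.

    Each simulated user waits an exponentially distributed gap with mean
    1 / rate_per_user plus a fixed think time before each query.
    """
    rng = random.Random(seed)  # seeded so a benchmark run is reproducible
    events = []
    for user in range(concurrency):
        t = 0.0
        for _ in range(queries_per_user):
            t += rng.expovariate(rate_per_user) + think_time_s
            events.append((t, user))
    events.sort()
    return events


if __name__ == "__main__":
    for send_time, user in query_schedule(concurrency=2, rate_per_user=5.0,
                                          think_time_s=0.1, queries_per_user=3):
        print(f"t={send_time:.3f}s user={user}")
```

A real generator would then dispatch each query at its scheduled time and record the response latency for the result outputs.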

SLIDE 35

Inference Metrics

• Online inference (accuracy-ensured)
  • Latency, tail latency
  • Latency-bounded throughput
• Offline inference (accuracy-ensured)
  • Throughput, energy consumption
• Accuracy-ensured:
  • The deviation from the target accuracy is within 2%
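These inference metrics reduce to simple computations over measured latencies and accuracy. A minimal sketch under stated assumptions: tail latency uses the nearest-rank 99th percentile, "within 2%" is read as a relative deviation from the target accuracy (the slide does not say whether it is relative or absolute), and latency-bounded throughput is taken as the highest measured throughput whose 99th-percentile latency meets a bound. The numbers in the example are toy values, not AIBench results.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def accuracy_ensured(measured: float, target: float, tolerance: float = 0.02) -> bool:
    # Assumption: "within 2%" is read as a relative deviation from the target accuracy.
    return abs(measured - target) / target <= tolerance


def latency_bounded_throughput(runs: list[tuple[float, float]], bound_ms: float) -> float:
    """Highest throughput among load levels whose tail latency stays within the bound.

    `runs` holds (throughput_qps, p99_latency_ms) pairs measured at increasing load.
    Returns 0.0 if no load level meets the bound.
    """
    return max((qps for qps, p99 in runs if p99 <= bound_ms), default=0.0)


if __name__ == "__main__":
    latencies = [float(x) for x in range(1, 101)]  # toy latencies: 1..100 ms
    print(sum(latencies) / len(latencies))         # average latency: 50.5
    print(percentile(latencies, 99))               # tail (p99) latency: 99.0
    print(accuracy_ensured(0.75, 0.76))            # True
    print(latency_bounded_throughput([(100.0, 20.0), (200.0, 45.0), (300.0, 120.0)], 50.0))  # 200.0
```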

SLIDE 36

AIBench Training Specification

• Training system under test
  • System under test, with monitoring tools and datasets
  • Result outputs: accuracy, latency, tail latency, throughput

SLIDE 37

Training Metrics

• Offline training
  • Time-to-accuracy
  • Energy-to-accuracy
  • Throughput
    – Running 1000 epochs
    – Hyperparameter settings should be able to achieve the target accuracy
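Time-to-accuracy can be read off a training log once the target accuracy is fixed. A minimal sketch, assuming a hypothetical log of (elapsed_seconds, accuracy) pairs recorded at each evaluation; energy-to-accuracy follows the same pattern if the log records cumulative energy instead of elapsed time.

```python
from typing import Optional


def time_to_accuracy(log: list[tuple[float, float]], target: float) -> Optional[float]:
    """Elapsed training time (s) at the first evaluation whose accuracy reaches `target`.

    `log` holds (elapsed_seconds, accuracy) pairs in time order, e.g. one per epoch.
    Returns None if the target is never reached within the logged run.
    """
    for elapsed_s, accuracy in log:
        if accuracy >= target:
            return elapsed_s
    return None


def training_throughput(samples_processed: int, elapsed_s: float) -> float:
    # Training throughput in samples per second.
    return samples_processed / elapsed_s


if __name__ == "__main__":
    log = [(60.0, 0.40), (120.0, 0.62), (180.0, 0.75), (240.0, 0.76)]  # toy log
    print(time_to_accuracy(log, 0.75))     # 180.0
    print(time_to_accuracy(log, 0.90))     # None
    print(training_throughput(1000, 4.0))  # 250.0
```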

SLIDE 38

Benchmark Guideline

• Online server
  • Each component can be deployed in a distributed manner on a large cluster

SLIDE 39

Benchmark Guideline

• Offline analytics
  • Single-GPU, multi-GPU, and distributed versions
    – TensorFlow implementation
    – PyTorch implementation
  • Example distributed training setting

SLIDE 40

Outline

• AIBench Overview
• Tasks, Models, Datasets, Metrics
• How to use AIBench
• Preliminary Results
• Conclusion

SLIDE 41

General Steps to Use AIBench

• General steps to run the benchmarks
  • Prepare the AIBench package
  • Prepare the environment for the selected software stack
  • Prepare the corresponding dataset
  • Run the scripts or commands (see the User Manual)
    – Micro benchmarks: run-tensorflow.sh (TensorFlow), run-pthread.sh (Pthreads)
    – Component benchmarks: run_train_time.sh (training stage), run_val_time.sh (inference stage)
    – Application benchmarks: start the online and offline modules (neo4j, Elasticsearch, Recommender, Search-planer)

SLIDE 42

How to Download AIBench

• http://www.benchcouncil.org/AIBench/download.html
• User Manual
  • http://www.benchcouncil.org/AIBench/files/AIBench-User-Manual.pdf

SLIDE 43

Directory Structure

AIBench
  • Micro Benchmark
    – Pthreads: 12 benchmarks
    – TensorFlow: 12 benchmarks
  • Component Benchmark
    – TensorFlow: 16 benchmarks
    – PyTorch: 16 benchmarks
  • Application Benchmark
    – Offline Module: 10 benchmarks
    – Online Module: online benchmarks

SLIDE 44

AIBench Framework

• http://125.39.136.212:8090/AIBench/aibench_framework

SLIDE 45

Application Benchmarks

• http://125.39.136.212:8090/AIBench/aibench_application_benchmark

SLIDE 46

Component Benchmarks

• http://125.39.136.212:8090/AIBench/DC_AIBench_Component

SLIDE 47

Micro Benchmarks

• http://125.39.136.212:8090/AIBench/DC_AIBench_Micro

SLIDE 48

BenchCouncil Testbed

• http://www.benchcouncil.org/testbed.html
• Provides container-based AIBench images
  • Log in and apply for nodes!
• Provides pretrained AI models

SLIDE 49

Image-to-Text (TensorFlow)

• Automatically generates a description of an image
• Model: Neural Image Caption model
• Dataset: MSCOCO2014
• Steps:
  • Apply for nodes on the testbed
  • Run training or inference:

cd DC_AIBench_Component/TensorFlow/Image_to_Text/tf-models/research/im2txt
./run_train_time.sh   # training stage
./run_val_time.sh     # inference stage

SLIDE 50

Image Classification (PyTorch)

• Extracts different thematic classes within an image
• Model: ResNet neural network
• Dataset: ImageNet2012 (100 GB+)
• Steps:
  • Apply for nodes on the testbed
  • Run training or inference:

cd DC_AIBench_Component/PyTorch/Image_classification
./run_train_time.sh   # training stage
./run_val_time.sh     # inference stage

SLIDE 51

Text-to-Text (PyTorch)

• Translates text from one language to another
• Model: Transformer
• Dataset: WMT English-German
• Steps:
  • Apply for nodes on the testbed
  • Run training or inference:

cd DC_AIBench_Component/PyTorch/Text_to_Text
./run_train_time.sh   # training stage
./run_val_time.sh     # inference stage

SLIDE 52

Outline

• AIBench Overview
• Tasks, Models, Datasets, Metrics
• How to use AIBench
• Preliminary Results
• Conclusion

SLIDE 53

Latency of Online Server

• AI components change the critical path significantly
  • 34.29 vs. 49.07 milliseconds average latency
• Model depth and size limit QoS
  • The 99th percentile latency increases from 149.12 to 5335.12 milliseconds as the model size grows from 184 MB to 253 MB
• AI-related components suffer from more cache misses
  • 61 vs. 37 L2 cache misses per kilo-instructions (MPKI)

SLIDE 54

Offline Training

• SM (streaming multiprocessor) efficiency
  • Different models have different execution efficiency
  • Learning_to_rank has the lowest efficiency

SLIDE 55

Runtime Breakdown

• Six categories
• nvprof is used to trace the runtime breakdown and to find the hotspot functions that together account for more than 80% of the running time
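The selection rule on this slide, keeping the top functions that together account for more than 80% of the running time, is easy to express once per-function times are available. A minimal sketch, assuming the times have already been extracted from an nvprof summary into a dictionary (parsing nvprof output is not shown, and the kernel names in the example are hypothetical).

```python
def hotspot_functions(times: dict[str, float], coverage: float = 0.8) -> list[str]:
    """Fewest top-ranked functions whose combined time exceeds `coverage` of the total."""
    total = sum(times.values())
    selected: list[str] = []
    accumulated = 0.0
    for name, t in sorted(times.items(), key=lambda kv: kv[1], reverse=True):
        if accumulated > coverage * total:
            break  # the functions selected so far already exceed the threshold
        selected.append(name)
        accumulated += t
    return selected


if __name__ == "__main__":
    # Hypothetical per-kernel GPU times (ms), e.g. summarized from an nvprof trace.
    kernel_ms = {"gemm_kernel": 50.0, "conv_kernel": 25.0,
                 "relu_kernel": 15.0, "memcpy_HtoD": 10.0}
    print(hotspot_functions(kernel_ms))  # ['gemm_kernel', 'conv_kernel', 'relu_kernel']
```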

SLIDE 56

Hotspot Functions

SLIDE 57

Stall Breakdown

• Top two stalls
  • Memory dependency stalls and execution dependency stalls

SLIDE 58

Outline

• AIBench Overview
• Tasks, Models, Datasets, Metrics
• How to use AIBench
• Preliminary Results
• Conclusion

SLIDE 59

Conclusion

• AIBench website: http://www.benchcouncil.org/AIBench/index.html
• Please refer to the user manual for more details!

SLIDE 60

Publications

• Benchmarking
  • AIBench: An Industry Standard Internet Service AI Benchmark Suite. Technical Report, 2019.
  • AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking. Bench’18.
  • HPC AI500: A Benchmark Suite for HPC AI Systems. Bench’18.
  • Edge AIBench: Towards Comprehensive End-to-end Edge Computing Benchmarking. Bench’18.
  • AIoT Bench: Towards Comprehensive Benchmarking Mobile and Embedded Device Intelligence. Bench’18.
  • Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. PACT’18.
  • BigDataBench: A Big Data Benchmark Suite from Internet Services. HPCA’14.
  • Data Motif-based Proxy Benchmarks for Big Data and AI Workloads. IISWC’18.
  • Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT. PACT’16.
  • CVR: Efficient Vectorization of SpMV on X86 Processors. CGO’18.
  • Characterizing Data Analysis Workloads in Data Centers. IISWC’13 (Best Paper Award).
