BenchCouncil: Present and Future Prof. Dr. Jianfeng Zhan - - PowerPoint PPT Presentation



SLIDE 1

BenchCouncil: Present and Future

  • Prof. Dr. Jianfeng Zhan

BenchCouncil http://www.benchcouncil.org 2019.11.14

SLIDE 2

BenchCouncil

  • International non-profit benchmark organization
  • Executive Committee
  • Prof. D. K. Panda, the Ohio State University
  • Prof. Lizy Kurian John, the University of Texas at Austin
  • Prof. Geoffrey Fox, Indiana University
  • Prof. Vijay Janapa Reddi, Harvard University
  • Prof. Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences (Chair)

SLIDE 3

Young but Fast-growing

  • Founded in 2018
  • 60+ full and associate international members
  • Several top Internet service providers and high performance computing centers

  • http://www.benchcouncil.org/organization.html
SLIDE 4

Influential

  • Three conferences
  • International Symposium on Intelligent Computers
  • 6.27-29, Shenzhen, China
  • 1000+ attendees
  • International Symposium on Benchmarking, Measuring, and Optimization
  • 11.14-16, Denver, US
  • International Symposium on Chips
  • 12.18-20, Beijing, China
  • 40+ high-level policy makers in China will attend this symposium
SLIDE 5

AI Systems and Algorithms Challenges

  • http://www.benchcouncil.org/competitions.html
  • 500K RMB, 2019
  • Using AIBench
  • Four tracks
  • Systems
  • Cambricon
  • RISC-V
  • X86
  • 3D face recognition challenge
  • Competitors from top universities and companies
  • Chinese Academy of Sciences
  • Shanghai Jiaotong University
  • Google
  • Ohio State University
SLIDE 6

Award

  • http://www.benchcouncil.org/html/awards.html
  • BenchCouncil achievement award
  • BenchCouncil Fellow
  • Best paper award
SLIDE 7

Testbed

  • http://www.benchcouncil.org/testbed.html
  • Host the 2019 BenchCouncil International AI System and Algorithm Challenges
  • Provide container-based benchmark images
  • Provide pre-trained AI models
SLIDE 8

Numbers

  • Report big data and AI performance numbers.
  • http://www.benchcouncil.org/numbers.html
SLIDE 9

Organization Evolution

  • Steering Committee
  • Executive Committee
  • Track Steering Committee
  • Big Data
  • Datacenter, HPC, IoT, and Edge AI
  • Track Executive Committee
SLIDE 10

Conference Changes

  • BenchCouncil International Symposium on Intelligent Computers
  • BenchCouncil Intelligent Computing Federated Conferences
  • Intelligent Computers
  • Smart Health
  • Smart Finance and Blockchain Systems
  • Education Technologies
SLIDE 11

Outline

  • Summary of BenchCouncil Work
  • BenchCouncil’s Viewpoints on Benchmarking AI and Other

Emerging Workloads

SLIDE 12

A New Golden Age for Computer Architecture— Domain-specific Co-design

  • Fundamental Changes in Technology
  • Ending of Moore’s Law
  • End of Dennard Scaling
  • ILP limitation and inefficiency
  • Amdahl’s Law
  • Only path left is Domain Specific Architectures
  • (Forrest Gump) Just do a few tasks, but extremely well

John Hennessy and David Patterson, A.M. Turing Award winners

SLIDE 13

Domain-specific Co-design is Totally Not New!

  • The first computer is domain-specific
  • not general-purpose
  • Few specific tasks
  • Indeed use benchmarks
  • Machine language
  • Even without an OS


SLIDE 14

HPC: Domain-specific Co-design Flagship

  • FLOPS
  • Benchmarks
  • HPCC (Linpack)
  • OS
  • Eliminate OS noises
  • Communication
  • RDMA
  • Programming: MPI


SLIDE 15

Co-designing Everything is Brand-new!

  • A big application can afford the co-design cost
  • Google, Alibaba, Facebook, WeChat ……
SLIDE 16

The Landscape of Modern Workloads

  • Big Data
  • Machine learning (AI)
  • Internet services
  • Different application scenarios
  • IoT, Edge, Datacenter, HPC
  • Ideal target for co-design

SLIDE 17

Server-side

HPC takes only 20% of the server-side market share; the rest is Big Data, ML, and Internet Services

SLIDE 18

(Hardware) Bad News

  • Find a workload (from Google), just do it.
  • Architecture conferences become accelerator ones.
  • Engineers have to put more (1000) accelerators in one node.
SLIDE 19

Bad News!

  • Abstractions are abandoned
  • Ad-hoc solutions everywhere!
SLIDE 20

Big Data Landscape

SLIDE 21

AI Chips

  • AI Inference Chips
  • 100+
  • AI Training Chips
  • 10+
SLIDE 22

Fundamental Challenges

  • Lack of simple but elegant abstractions that help achieve both efficiency and generality!
  • Single-purpose designs are a structural obstacle to resource sharing
SLIDE 23
  • Looking back at History!
SLIDE 24

Database - Relational Algebra

  • Relational Algebra
  • Five primitive and fundamental operators: Select, Project, Product, Union, Difference
  • Theoretical foundation of databases
  • Strong expression power: compose complex queries

From E. F. Codd, “A Relational Model of Data for Large Shared Data Banks.” Communications of the ACM, vol. 13, no. 6, 1970.
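The expressive power of the five primitives can be sketched in a few lines. The toy relations and attribute names below are invented for illustration: relations are sets of immutable rows, and a natural join, which is not primitive, falls out of composing Product, Select, and Project.

```python
def row(**kv):
    """A relation tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(kv.items())

def select(rel, pred):    # restrict to rows satisfying a predicate
    return {r for r in rel if pred(dict(r))}

def project(rel, attrs):  # keep only the named attributes
    return {frozenset((k, v) for k, v in r if k in attrs) for r in rel}

def product(r, s):        # Cartesian product (attribute names must differ)
    return {a | b for a in r for b in s}

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

# Hypothetical relations: employees and departments.
emp = {row(name="ada", dept=1), row(name="bob", dept=2)}
dept = {row(id=1, dname="hw"), row(id=2, dname="sw")}

# A natural join composed from the primitives: Product, then Select, then Project.
joined = project(select(product(emp, dept),
                        lambda r: r["dept"] == r["id"]),
                 {"name", "dname"})
```

This is exactly the "compose complex queries" point: the join needed no new operator, only a pipeline of the five primitives.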

SLIDE 25

Numerical Method

  • Phillip Colella proposed seven motifs that would be important for the next decade
  • Simulation in the physical sciences is done using various combinations of the following core algorithms:
  • Structured Grids, Unstructured Grids, Dense linear algebra, Sparse linear algebra, FFT, Particles, Monte Carlo

From P. Colella, “Defining software requirements for scientific computing,” 2004.

SLIDE 26

Parallel Computing

  • Landscape of Parallel Computing Research: 13 dwarfs
  • Berkeley research group
  • Define building blocks for creating libraries & frameworks
  • A pattern of computation and communication
  • The 13 dwarfs: Dense linear algebra, Sparse linear algebra, Spectral methods, N-Body methods, Structured Grids, Unstructured Grids, Monte Carlo, Combinational logic, Graph traversal, Dynamic programming, Backtrack and branch-and-bound, Graphical models, Finite state machines

From K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, et al., “The Landscape of Parallel Computing Research: A View from Berkeley,” Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.

SLIDE 27

Other Challenges

  • Totally isolated
  • Workload churns
  • SaaS
  • Microservice-based architecture
  • ML models updated frequently
  • Open-source components are not the best!
SLIDE 28

Understand Essentials of Workloads

  • The common requirements are specified only algorithmically in a paper-and-pencil approach (NAS parallel benchmarks)
  • Reasonably divorced from individual implementations
SLIDE 29

Complexity of Modern Workloads

  • The common requirements are handled differently or even collaboratively by datacenter, edge, and devices

SLIDE 30

Essentials of Big Data, AI and Internet Services Workloads

  • Treat big data, AI and Internet service workloads as a pipeline of units of computation handling (input or intermediate) data
  • Target: find the main abstractions of time-consuming units of computation (data motifs)
  • The combination of data motifs = complex workloads
  • Similar to Relational Algebra

Wanling Gao, Jianfeng Zhan, Lei Wang, et al. Data Motif: A Lens towards Fully Understanding Big Data and AI Workloads. PACT 2018.

SLIDE 31

Basic Methodology

SLIDE 32

Algorithms with a Broad Spectrum

  • Internet services
  • Data mining / Machine learning
  • Natural language processing / Computer vision (Recognition Sciences)
  • Bioinformatics (Medical Sciences)

SLIDE 33

Our Observations: Eight Data Motifs

Sampling, Transform, Graph, Logic, Set, Statistics, Sort, Matrix

Gao, Wanling, et al. "Data motifs: a lens towards fully understanding big data and AI workloads." Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2018.

SLIDE 34

Expression Power of Eight Data Motifs

  • Using the combination of data motifs to represent a wide variety of big data and AI workloads
  • Combinations with different weights yield diverse big data and AI workloads
  • Coverage of fundamental units of computation
  • Provide a methodology for choosing typical workloads
  • Reduce workload redundancy
  • The eight big data and AI motifs: Matrix, Sampling, Transform, Graph, Logic, Set, Statistics, Sort
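The weighted-combination idea can be made concrete with a small sketch. The motif weight vectors below are hypothetical (real weights come from profiling, not from this deck); cosine similarity over the eight-dimensional motif space then flags workloads that are redundant when choosing a representative suite.

```python
import math

MOTIFS = ["Matrix", "Sampling", "Transform", "Graph",
          "Logic", "Set", "Statistics", "Sort"]

# Hypothetical motif weights (fraction of execution time per motif).
workloads = {
    "kmeans":   [0.7, 0.1, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0],
    "pagerank": [0.3, 0.0, 0.0, 0.6, 0.0, 0.0, 0.1, 0.0],
    "cnn":      [0.8, 0.05, 0.1, 0.0, 0.0, 0.0, 0.05, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two motif weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def redundant_pairs(wl, threshold=0.9):
    """Workload pairs whose motif mixes are so similar that one of
    them adds little to a benchmark suite."""
    names = sorted(wl)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if cosine(wl[a], wl[b]) >= threshold]
```

Here `kmeans` and `cnn` are both matrix-dominated, so they come out as a redundant pair, while graph-dominated `pagerank` survives as a distinct representative.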

SLIDE 35

Data Motifs’ Differences from Kernels

  • Behaviors are affected by the sizes, patterns, types, and sources of different data inputs
  • Reflect not only computation patterns and memory access patterns, but also disk and network I/O patterns

SLIDE 36

Domain-specific Hardware and Software Co-design

  • Ad-hoc solution
  • Case by case
  • Structured solution
  • Tailoring the system and architecture to the characteristics of data motifs
  • New architecture/accelerator design
  • Data motif-based libraries
  • Bottleneck identification and optimization

SLIDE 37

Scalable Benchmark Methodology

  • Traditional: create a benchmark or proxy for every possible workload
  • Ours: data motif-based (scalable)
  • Micro Benchmark --- single data motif
  • Component Benchmark --- data motif combination with different weights
  • Application Benchmark --- end-to-end application
SLIDE 38

Data Motif-based proxy benchmarks

  • A DAG-like combination of data motifs
  • An auto-tuning tool using machine learning model
  • Mimic system and micro-architectural behaviors
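One way to picture a data motif-based proxy benchmark is as a DAG whose nodes are motif kernels, executed in topological order with each node consuming its predecessors' outputs. The sketch below uses invented toy kernels as stand-ins; it is not the actual proxy generator or its machine-learning auto-tuner.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_proxy(dag, kernels, data):
    """dag maps node -> set of predecessor nodes; kernels maps node -> callable.
    Runs each motif once its inputs are ready; returns every node's output."""
    outputs = {}
    for node in TopologicalSorter(dag).static_order():
        preds = dag.get(node, set())
        inputs = [outputs[p] for p in preds] if preds else [data]
        outputs[node] = kernels[node](*inputs)
    return outputs

# Toy stand-ins for real Sampling/Statistics/Sort motif kernels.
kernels = {
    "sampling":   lambda xs: xs[::2],            # take every other element
    "statistics": lambda xs: sum(xs) / len(xs),  # mean of the sample
    "sort":       lambda xs: sorted(xs),
}
# Sampling feeds both the statistics and sort motifs.
dag = {"sampling": set(), "statistics": {"sampling"}, "sort": {"sampling"}}
```

In the real methodology, the motif weights and the DAG shape would be tuned so that the proxy's system and micro-architectural behavior tracks the full workload.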
SLIDE 39

BenchCouncil Benchmarks

  • http://www.benchcouncil.org/benchmarks.html
  • BigDataBench
  • AIBench
  • HPC AI500
  • Edge AIBench
  • AIoT Bench
  • BENCHCPU
SLIDE 40

BigDataBench: a Scalable Big Data Benchmark Suite

BigDataBench 5.0: the Scalable and Comprehensive Big Data Benchmark Suite

  • Real-world datasets & data generation tools: structured (Table), semi-structured, and un-structured (Text, Graph, Matrix, Image, Audio) data
  • 24 big data workloads across 6 workload types: data warehouse, NoSQL, streaming, graph analytics, offline analytics, online service
  • 15 software stacks, including Hadoop, MPI, Shark, Impala, DataMPI, RDMA, and NoSQL stacks
  • Covers 5 application domains: search engine, e-commerce, social network, recognition science, medical science
  • Abstracts 3 levels of benchmarking: micro, component, and application benchmarks

SLIDE 41

Representative Domains in BigDataBench 5.0

  • Internet services (search engine, social network, e-commerce), recognition science, and medical science
  • [Figure: domain weights of 40%, 25%, 15%, 5%, and 15% across the representative domains]
  • The top 20 websites take up 80% of internet services according to page views and daily visitors (http://www.alexa.com/topsites/global; http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf)
  • [Figure: data growth every minute — new videos on YouTube, new photos on Flickr, video feeds from surveillance cameras, hours of music streaming on Pandora, minutes of voice calls on Skype]
  • [Figure: DDBJ/EMBL/GenBank database growth, 1998-2015.3 — entries (million) and nucleotides (billion); http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph]

SLIDE 42

BigDataBench Evolution

  • CloudRank 1.0, DCBench 1.0, BigDataBench 1.0 (2013.7): search engine domain, 6 workloads
  • BigDataBench 2.0 (2013.12): 11 data analytics workloads; mixed data analytics workloads; typical Internet service domains; an architectural perspective; 19 workloads & data generation tools
  • BigDataBench 3.0 (2014.4): multidisciplinary effort; 32 workloads with diverse implementations
  • BigDataBench 3.1 (2014.12) and 3.2 (2015.12): 5 application domains, 14 data sets and 33 workloads; same specifications with diverse implementations; multi-tenancy version; BigDataBench subset and simulator version
  • BigDataBench 4.0 (2018.03): new software stacks (Flink, JStorm, GraphX, GraphLab); new workload types (streaming, graph processing); new datasets and workloads; data motif-based benchmarking methodology; micro, component, and application benchmark specifications; 13 real-world data sets, 47 benchmarks, 7 workload types
  • Moved to BenchCouncil since v5.0

SLIDE 43

BigDataBench Users

  • Industry users
  • Accenture, Broadcom, Samsung, Huawei, IBM
  • China’s first industry-standard big data benchmark suite
  • 600+ published papers using or citing BigDataBench
  • VLDB/SIGMOD, SC, FAST, ASPLOS, ISCA/MICRO/HPCA, etc.

http://www.benchcouncil.org/BigDataBench/index.html

SLIDE 44

AIBench: A Datacenter AI Benchmark Suite

  • The first end-to-end industry-standard AI benchmark suite
  • Models the critical paths and primary modules of industry-scale Internet services
  • A highly extensible, configurable, and flexible benchmark framework
  • 16 prominent AI problem domains
  • Multiple loosely coupled modules
  • Micro/Component/Application benchmarks

http://www.benchcouncil.org/AIBench/index.html

SLIDE 45

Sixteen AI Problem Domains of AIBench

  • Covering text, image, audio, video, and 3D data processing, specified only algorithmically in a paper-and-pencil approach
  • Image classification, Image generation, Text-to-Text translation, Image-to-text, Image-to-image, Speech-to-text, Face embedding, Object detection, Recommendation, Video prediction, Image compression, Text summarization, 3D face recognition, 3D object reconstruction, Spatial transformer, Learning to rank

SLIDE 46

The End-to-end Industry-standard Implementation: E-commerce Search

SLIDE 47

Datacenter AI Benchmark Comparison

  • End-to-end
  • Covering the critical paths of AI-related and non AI-related components
  • For different benchmarking purposes
  • Micro, Component, Application
  • For different stages
  • Training, Inference
  • Cover a wide spectrum of problem domains
  • 16 problem domains
  • Text/Image/Audio/Video processing
  • 15 real-world datasets
SLIDE 49

HPC AI500: Motivations

  • Deep learning has become a promising tool in scientific computing
  • More powerful than if-else rules made by human experts
  • Emerging HPC AI workloads: extreme weather analysis, cosmology, high energy physics
  • The community needs a new yardstick to evaluate future HPC AI systems

slide-50
SLIDE 50

HPC AI500: A Benchmark Suite for HPC AI Systems

  • The first HPC AI Benchmark based on real-world scientific AI applications.
  • Covers the most representative scientific AI applications
  • Real-world scientific datasets.
  • Component benchmarks + micro benchmarks
  • Main metrics: Time-to-accuracy and FLOPS
  • Scalable reference implementations.

Homepage: http://www.benchcouncil.org/HPCAI500/index.html
Open source: http://125.39.136.212:8090/hpc-ai500/EWA
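Time-to-accuracy, one of the suite's main metrics, can be read off a per-epoch training log. The sketch below is a minimal illustration with a hypothetical log, not HPC AI500's reference implementation.

```python
def time_to_accuracy(epoch_log, target):
    """Seconds of training until validation accuracy first reaches `target`.
    epoch_log: list of (cumulative_seconds, accuracy) pairs, one per epoch.
    Returns None if the target accuracy is never reached."""
    for elapsed, acc in epoch_log:
        if acc >= target:
            return elapsed
    return None

# Hypothetical per-epoch log (cumulative seconds, validation accuracy).
log = [(600, 0.52), (1200, 0.71), (1800, 0.74), (2400, 0.76)]
```

Unlike raw FLOPS, this metric only credits work that actually reaches the target quality, which is why the two are reported together.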

SLIDE 51

HPC AI Benchmark Comparison

Benchmark effort | Datasets        | EWA | Cosmology | HEP | Commercial | Standalone | Distributed
HPC AI500        | Scientific data | ✔  | ✔        | ✔  | ❌         | ✔         | ✔
TBD              | Commercial data | ❌ | ❌        | ❌ | ✔          | ✔         | ❌
MLPerf           | Commercial data | ❌ | ❌        | ❌ | ✔          | ✔         | ❌
DAWNBench        | Commercial data | ❌ | ❌        | ❌ | ✔          | ✔         | ❌
Fathom           | Commercial data | ❌ | ❌        | ❌ | ✔          | ✔         | ❌
Deep500          | Commercial data | framework, undefined             | ✔         | ✔

(Problem domains: EWA = extreme weather analysis, HEP = high energy physics.)

SLIDE 52

Summary of HPC AI500

Scenario          | Workloads                             | Involved field | Dataset                    | Data type          | Software stack
Micro Benchmarks  | Convolution, Pooling, Fully-connected | N/A            | matrix                     | 2d, 2d sparse, 3d  | MKL, cuDNN
Image Recognition | ResNet                                | HEP            | Particle collision dataset | 2d sparse          | TensorFlow, PyTorch
Image Recognition | ResNet                                | Cosmology      | N-body dataset             | 3d                 | TensorFlow, PyTorch
Object Detection  | Faster-RCNN                           | EWA            | CAM5 dataset               | 2d                 | TensorFlow, PyTorch
Image Generation  | DCGAN                                 | Cosmology      | N-body dataset             | 3d                 | TensorFlow, PyTorch

http://www.benchcouncil.org/HPCAI500/index.html

SLIDE 53

Edge AIBench: AI Benchmarks for Edge Computing

  • The first comprehensive end-to-end edge AI benchmark framework
  • Covering the whole edge computing stack: cloud server, edge computing layer, and client-side devices
  • Four typical scenarios: Intensive Care Unit (ICU) patient monitor, surveillance camera, smart home, autonomous vehicle
  • Eight application benchmarks
  • Eight component benchmarks
  • Six real-world datasets

http://www.benchcouncil.org/EdgeAIBench/index.html

SLIDE 54

Edge AIBench Summary

AI Application Benchmark                         | End-to-end Application Scenario | Component Benchmarks
Heart Failure Prediction                         | ICU Patient Monitor             | Train, infer, send alarm, generate data
Endpoint Prediction                              | ICU Patient Monitor             | Train, infer, send alarm, generate data
Unexpected Respiratory Decompensation Prediction | ICU Patient Monitor             | Train, infer, send alarm, generate data
Person Re-identification                         | Surveillance Camera             | Decompress data, train, compress data, infer, generate data
Lane Keeping                                     | Autonomous Vehicle              | Train, infer, generate data
Road Sign Recognition                            | Autonomous Vehicle              | Train, infer, generate data
Speech Recognition                               | Smart Home                      | Train, infer, generate data
Face Recognition                                 | Smart Home                      | Train, infer, generate data

SLIDE 55

AI on Things

Smart phones, autonomous driving, smart homes, smart manufacturing: intelligence is becoming a core element of basic productivity and daily consumer goods.

  • Apply intelligent algorithms to mobile and embedded devices, e.g. smart phones, self-driving cars, smart homes, and industrial robots
  • There is a lack of benchmarks to comprehensively evaluate mobile and embedded devices
SLIDE 56

AIoT Bench: AI Benchmarks for IOT

  • Domain diversity
  • Image recognition, speech recognition, and natural language processing
  • Platform diversity
  • Android and Raspberry Pi
  • Framework diversity
  • TensorFlow and Caffe2
  • Testing hierarchy
  • End-to-end application workloads and micro workloads

http://www.benchcouncil.org/AIoTBench/index.html

SLIDE 57

AIoT Bench Workloads

  • Image classification workload
  • End-to-end application workload of the vision domain
  • MobileNet
  • Speech recognition workload
  • End-to-end application workload of the speech domain
  • DeepSpeech 2
  • Transformer translation workload
  • End-to-end application workload of the NLP domain
  • Transformer translation model
  • Micro workloads
  • Convolution, pointwise convolution, depthwise convolution, matrix multiply, pointwise add, ReLU activation, sigmoid activation, max pooling, average pooling
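The depthwise and pointwise convolution micro workloads are the building blocks of MobileNet-style networks, and can be sketched with plain NumPy. The shapes and the 'valid' (no padding, stride 1) choice below are illustrative assumptions, not AIoT Bench's actual kernels.

```python
import numpy as np

def depthwise_conv(x, k):
    """Depthwise convolution: x is (H, W, C), k is (kh, kw, C) with one
    filter per channel; channels never mix ('valid' padding, stride 1)."""
    H, W, C = x.shape
    kh, kw, _ = k.shape
    out = np.empty((H - kh + 1, W - kw + 1, C))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]       # (kh, kw, C) window
            out[i, j, :] = (patch * k).sum(axis=(0, 1))
    return out

def pointwise_conv(x, w):
    """Pointwise (1x1) convolution: w is (C_in, C_out); mixes channels
    only, with no spatial extent."""
    return x @ w

# A depthwise-separable block chains the two: spatial filtering per
# channel, then channel mixing -- far cheaper than a full convolution.
```

Factoring a k×k×C_in×C_out convolution into these two stages is exactly why these show up as separate micro workloads: their compute and memory behaviors differ sharply.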

SLIDE 58

BENCHCPU: Motivations

  • Traditional CPU-Oriented Benchmarks
  • Compute-intensive workloads (Integer or Floating Point computations)
  • Emerging workloads: Big Data, AI, and Internet Services
  • Totally different characteristics from traditional workloads
  • Software stacks have a significant impact on workload behavior
  • Data movement dominates these workloads

We need a new measuring tool for the CPU component.

SLIDE 59

BENCHCPU: A CPU Benchmark Suite for Emerging Workloads

  • Covering emerging application domains
  • Big data applications: 49 representative big data workloads
  • AI applications: 17 prominent AI problem domains
  • Internet services: search engine, social network, and e-commerce

  • Portable across edge, IoT, and datacenter processor architectures

http://www.benchcouncil.org/BenchCPU/index.html

SLIDE 60

Thanks a lot!