BenchCouncil: Present and Future
- Prof. Dr. Jianfeng Zhan
BenchCouncil http://www.benchcouncil.org 2019.11.14
BenchCouncil: an international non-profit benchmark organization
Executive Committee: Prof. D. K. Panda, The Ohio State University
Chinese Academy of Sciences (Chair)
centers
Challenges
Emerging Workloads
- (Forrest Gump) Just do a few tasks, but do them extremely well
- End of Moore’s Law
- End of Dennard scaling
- ILP limitations and inefficiency
- Amdahl’s Law
John Hennessy and David Patterson, A.M. Turing Award winners
HPC takes only a 20% market share; the other workloads: Big Data, ML, Internet services
efficiency and general-purpose!
fundamental operators
database
From E. F. Codd, “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, vol.
Select, Difference, Union, Project, Product
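To make these fundamental operators concrete, here is a minimal Python sketch (an illustration, not code from Codd's paper or from any BenchCouncil suite). It models a relation as a schema plus a set of tuples and implements Select, Project, Union, Difference, and Cartesian Product with set semantics. The `(schema, rows)` representation and all function names are assumptions made for illustration.

```python
from itertools import product as cartesian

# A relation is modeled as a (schema, rows) pair: schema is a tuple of
# attribute names, rows is a frozenset of value tuples (set semantics).

def select(rel, pred):
    """Select: keep rows whose attribute mapping satisfies `pred`."""
    schema, rows = rel
    return schema, frozenset(r for r in rows if pred(dict(zip(schema, r))))

def project(rel, attrs):
    """Project: keep only the named attributes; duplicate rows collapse."""
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(r[i] for i in idx) for r in rows)

def union(a, b):
    """Union of two relations with identical schemas."""
    if a[0] != b[0]:
        raise ValueError("union requires identical schemas")
    return a[0], a[1] | b[1]

def difference(a, b):
    """Rows in `a` that are not in `b` (identical schemas required)."""
    if a[0] != b[0]:
        raise ValueError("difference requires identical schemas")
    return a[0], a[1] - b[1]

def product(a, b):
    """Cartesian product; assumes the two schemas share no attribute names."""
    return a[0] + b[0], frozenset(x + y for x, y in cartesian(a[1], b[1]))

emp = (("name", "dept"), frozenset({("ann", "db"), ("bob", "hpc")}))
print(select(emp, lambda t: t["dept"] == "db"))
# (('name', 'dept'), frozenset({('ann', 'db')}))
```

Every richer relational query (joins, intersections) can be expressed by combining these few operators, which is the sense in which they are "fundamental."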
From P. Colella, “Defining software requirements for scientific computing,” 2004.
7 “Motifs”
Computation in the sciences is carried out using various combinations of the following core algorithms: structured grids, unstructured grids, dense linear algebra, sparse linear algebra, particles (N-body methods), Monte Carlo, FFT.
13 dwarfs
creating libraries & frameworks
communication
From K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, et al., “The landscape of parallel computing research: A view from Berkeley,” Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.
Combinational logic, Dynamic programming, Graphical models, Graph traversal, Finite state machine, Backtrack and branch-and-bound, Structured grids, Unstructured grids, Sparse linear algebra, Dense linear algebra, Monte Carlo, Spectral methods
paper-and-pencil approach (NAS Parallel Benchmarks)
collaboratively by datacenter, edge, and devices.
units of computation handling (input or intermediate) data
computation (data motifs)
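Wanling Gao, Jianfeng Zhan, Lei Wang, et al. Data Motifs: A Lens towards Fully Understanding Big Data and AI Workloads. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (PACT 2018), ACM, 2018.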
- Internet services
- Data mining/Machine learning
- Natural language processing/Computer vision (Recognition Sciences)
- Bioinformatics (Medical Sciences)
Sampling, Transform, Graph, Logic, Set, Statistics, Sort, Matrix
Combinations with different weights
Diverse big data and AI workloads
- Coverage of fundamental units of computation
- Provide the methodology of choosing typical workloads
- Reduce workload redundancy
Big Data and AI Motifs
- Matrix
- Sampling
- Transform
- Graph
- Logic
- Set
- Statistics
- Sort
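The idea that a real workload is a combination of data motifs with different weights can be sketched as follows. This is a hypothetical toy, not BigDataBench code: a word-frequency job is composed of Sampling, Statistics, and Sort motifs, and each motif's "weight" is taken, as an assumption, to be its share of wall-clock time. The function names and the timing-based definition of weight are illustrative only.

```python
import random
import time
from collections import Counter

# Hypothetical motif kernels for illustration; not taken from BigDataBench.
def sampling_motif(records, fraction, seed=0):
    """Sampling motif: keep each record with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

def statistics_motif(words):
    """Statistics motif: frequency count of each word."""
    return Counter(words)

def sort_motif(counts):
    """Sort motif: order (word, count) pairs by descending count, then word."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def run_workload(lines, fraction=1.0):
    """Toy word-frequency workload built as Sampling -> Statistics -> Sort.

    Returns the ranked result plus each motif's share of wall-clock time,
    used here (as an assumption) as that motif's weight in the workload.
    """
    t0 = time.perf_counter()
    sampled = sampling_motif(lines, fraction)
    t1 = time.perf_counter()
    counts = statistics_motif(w for line in sampled for w in line.split())
    t2 = time.perf_counter()
    ranked = sort_motif(counts)
    t3 = time.perf_counter()
    timings = {"Sampling": t1 - t0, "Statistics": t2 - t1, "Sort": t3 - t2}
    total = sum(timings.values()) or 1.0  # guard against a zero timer reading
    weights = {motif: t / total for motif, t in timings.items()}
    return ranked, weights

ranked, weights = run_workload(["a b a", "c a b"])
print(ranked)  # [('a', 3), ('b', 2), ('c', 1)]
```

Changing the input size or the sampling fraction shifts the relative weights, which is one way the same motifs can yield different workload behavior.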
different data inputs
but also disk and network I/O patterns
- Case by case
- Tailoring the system and architecture to characteristics of data
motifs
- New architecture/accelerator design
- Data motif-based libraries
- Bottleneck identification and optimization
workload
different weights
15 software stacks: Shark, Impala, DataMPI, Hadoop, RDMA, NoSQL, MPI, Streaming
Real-world dataset & data generation tools
24 big data workloads, 6 workload types, including data warehouse, NoSQL, graph analytics, offline analytics, and online service
Covers 5 application domains; abstracts 3 levels of benchmarking
BigDataBench 5.0: the Scalable and Comprehensive Big Data Benchmark Suite
Benchmark levels: micro benchmark, component benchmark, application benchmark
Application domains: search engine, e-commerce, social network, recognition science, medical science
Data types: structured, semi-structured, and unstructured (table, text, graph, matrix, image, audio)
Top 20 websites: search engines (40%), social networks (25%), and electronic commerce (15%) together take up 80% of Internet services according to page views and daily visitors; the remaining shares are 5% and 15%. http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf
[Figure: data generated every minute: new videos, new photos, video feeds from surveillance cameras, music streaming on Pandora, voice calls on Skype; data growth spans images, videos, documents, …]
Internet Service
Search engine, Social network, E-commerce
Recognition Science
http://www.alexa.com/topsites/global;0
[Figure: DDBJ/EMBL/GenBank database growth, 1998 to 2015.3; entries (million) and nucleotides (billion)]
http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph
Medical science
BigDataBench 3.2 BigDataBench 3.1 BigDataBench 3.0
CloudRank 1.0 DCBench 1.0 BigDataBench 1.0
BigDataBench 2.0
2013.12 2013.7
Search engine: 6 workloads; 11 data analytics workloads; mixed data analytics workloads; typical Internet service domains; an architectural perspective; 19 workloads & data generation tools
2014.4
Multidisciplinary effort; 32 workloads: diverse implementations; 5 application domains: 14 data sets and 33 workloads; same specifications: diverse implementations; multi-tenancy version; BigDataBench subset and simulator version
2014.12 2015.12
BigDataBench 4.0: new software stacks (Flink, JStorm, GraphX, GraphLab); new workload types (streaming, graph processing); new datasets and workloads; data motif-based benchmarking methodology; micro, component, and application benchmark specifications; 13 real-world data sets, 47 benchmarks, 7 workload types
2018.03
Hosted by BenchCouncil since version 5.0
http://www.benchcouncil.org/BigDataBench/index.html
benchmark suite
flexible benchmark framework
http://www.benchcouncil.org/AIBench/index.html
specified only algorithmically in a paper-and-pencil approach
to-image, Speech-to-text, Face embedding, Object detection, Recommendation, Video prediction, Image compression, Text summarization, 3D face recognition, 3D object reconstruction, Spatial transformer, Learning to rank
related and non AI-related components
domains
computing.
Cosmology, high energy physics, extreme weather analysis. The community needs a new yardstick to evaluate future HPC AI systems.
Homepage: http://www.benchcouncil.org/HPCAI500/index.html
Open source: http://125.39.136.212:8090/hpc-ai500/EWA
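Comparison of benchmark efforts (scientific problem domains: extreme weather analysis, cosmology, high energy physics; implementation: standalone or distributed)

| Benchmark | Datasets | Extreme weather analysis | Cosmology | High energy physics | Commercial | Standalone | Distributed |
|---|---|---|---|---|---|---|---|
| HPC AI500 | Scientific data | ✔ | ✔ | ✔ | ❌ | ✔ | ✔ |
| TBD | Commercial data | ❌ | ❌ | ❌ | ✔ | ✔ | ❌ |
| MLPerf | Commercial data | ❌ | ❌ | ❌ | ✔ | ✔ | ❌ |
| DAWNBench | Commercial data | ❌ | ❌ | ❌ | ✔ | ✔ | ❌ |
| Fathom | Commercial data | ❌ | ❌ | ❌ | ✔ | ✔ | ❌ |
| Deep 500 | Commercial data | Framework, undefined | | | | ✔ | ✔ |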
| Scenario | Workload | Involved field | Dataset | Data type | Software stack |
|---|---|---|---|---|---|
| Micro benchmarks | Convolution, Pooling, Fully-connected | N/A | Matrix | 2D, 2D sparse, 3D | MKL, cuDNN |
| Image recognition | ResNet | HEP | Particle collision dataset | 2D sparse | TensorFlow, PyTorch |
| Image recognition | ResNet | Cosmology | N-body dataset | 3D | TensorFlow, PyTorch |
| Object detection | Faster-RCNN | EWA | CAM5 dataset | 2D | TensorFlow, PyTorch |
| Image generation | DCGAN | Cosmology | N-body dataset | 3D | TensorFlow, PyTorch |
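For reference, the arithmetic behind two of the micro benchmarks above (convolution and fully-connected) can be written as a plain-Python sketch. This is not the HPC AI500 implementation, which relies on MKL or cuDNN; it assumes stride 1, no padding, a single-channel 2D input, and CNN-style convolution (cross-correlation, kernel not flipped). Function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs,
    i.e. cross-correlation: the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            # Multiply-accumulate over the kernel window anchored at (i, j).
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            out[i][j] = acc
    return out

def fully_connected(x, weights, bias):
    """y = W x + b for a single input vector; `weights` has one row per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

print(conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))
# [[-4.0, -4.0], [-4.0, -4.0]]
print(fully_connected([1, 2], [[1, 1], [2, -1]], [0, 1]))
# [3, 1]
```

Both kernels reduce to multiply-accumulate loops over dense data, which is why optimized libraries such as MKL and cuDNN dominate their performance.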
http://www.benchcouncil.org/HPCAI500/index.html
Framework
edge computing layer, and client-side devices.
- Intensive Care Unit (ICU) patient monitor
- Surveillance camera
- Smart home
- Autonomous vehicle
http://www.benchcouncil.org/EdgeAIBench/index.html
| AI application benchmark | End-to-end application scenario | Component benchmarks |
|---|---|---|
| Heart failure prediction | ICU patient monitor | Train, infer, send alarm, generate data |
| Endpoint prediction | ICU patient monitor | Train, infer, send alarm, generate data |
| Unexpected respiratory decompensation prediction | ICU patient monitor | Train, infer, send alarm, generate data |
| Person re-identification | Surveillance camera | Decompress data, train, compress data, infer, generate data |
| Lane keeping | Autonomous vehicle | Train, infer, generate data |
| Road sign recognition | Autonomous vehicle | Train, infer, generate data |
| Speech recognition | Smart home | Train, infer, generate data |
| Face recognition | Smart home | Train, infer, generate data |
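Smartphones, intelligent driving, smart homes, intelligent manufacturing: intelligence has become a core element of basic productivity and everyday consumer goods.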
driving cars, smart home, industrial robot
natural language processing
workloads
AIoT Bench
http://www.benchcouncil.org/AIoTBench/index.html
pointwise add, ReLU activation, sigmoid activation, max pooling, average pooling.
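A minimal pure-Python sketch of these micro-benchmark operators follows. It only shows what each operator computes, not how AIoTBench implements or times them; the function names, the non-overlapping pooling window (stride equal to window size), and the numerically stable sigmoid formulation are assumptions.

```python
import math

def pointwise_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def relu(xs):
    """ReLU activation: max(0, x) per element."""
    return [max(0.0, x) for x in xs]

def _sigmoid(x):
    # Split by sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid(xs):
    """Sigmoid activation 1 / (1 + e^-x) per element."""
    return [_sigmoid(x) for x in xs]

def pool2d(fmap, size, reduce):
    """Non-overlapping size x size pooling (stride == size); trailing rows or
    columns that do not fill a whole window are dropped."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [[reduce([fmap[i * size + u][j * size + v]
                     for u in range(size) for v in range(size)])
             for j in range(cols)]
            for i in range(rows)]

def max_pool(fmap, size=2):
    return pool2d(fmap, size, max)

def avg_pool(fmap, size=2):
    return pool2d(fmap, size, lambda window: sum(window) / len(window))

fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool(fmap))  # [[6, 8], [14, 16]]
print(avg_pool(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```

Each operator performs little arithmetic per element loaded, so on edge and IoT devices these micro benchmarks tend to stress memory traffic rather than compute.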
- Software stacks have a significant impact on workload behavior.
- Data-movement-dominated workloads.
We need new measurement tools for the CPU component.
Portable across edge, IoT, datacenter processor architectures
BigData
Internet Services http://www.benchcouncil.org/BenchCPU/index.html