Understanding Big Data Workloads on Understanding Big Data Workloads - PowerPoint PPT Presentation

Understanding Big Data Workloads on Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan http://prof.ict.ac.cn/BigDataBench INS INSTITUT Professor, ICT, Chinese Academy of Sciences TE E OF CO and University of Chinese Academy of Sciences COMPUTIN MPUTING G T TECHNOLOGY HPBDC 2015 Ohio, USA

Outline Outline � BigDataBench Overview � Workload characterization � Multi-tenancy version � Multi-tenancy version � Processors evaluation BigDataBench HPBDC 2015

What is BigDataBench ? What is BigDataBench ? � An open source big data benchmarking project p g g p j • http://prof.ict.ac.cn/BigDataBench • Search Google using “ BigDataBench ” BigDataBench HPBDC 2015

BigDataBench Detail BigDataBench Detail • Five application domains • Propose benchmark specifications for each domain Methodology • 14 Real world data sets & 3 kinds of big data generators • 14 Real world data sets & 3 kinds of big data generators • 33 Big data workloads with diverse implementation Implementation • BigDataBench subset version g Specific ‐ purpose Version BigDataBench HPBDC 2015

Five Application Domains Five Application Domains DDBJ/EMBL/GenBank database Growth / / Taking up 80% of Nucleotides Entries internet services Internet Service Multimedia Search Engine Social Network according to page g p g 200 180 180 new new new Electronic Commerce El t i C M di St Media Streaming i S Search engine, Social network, E ‐ commerce h i S i l k E views and daily visitors 180 Others 160 VIDEOS on YouTube hours MUSIC streaming PHOTOS on FLICKR every 160 15% 140 5% on) e e y every minute ute on PANDORA every minute on PANDORA every minute minute minute 140 140 40% 40% es (million) ) tides (billio 120 120 15% 100 100 80 Entrie 80 80 Nucleot 25% 60 data growth 60 VIDEO feeds from 40 Bioinformatics f 40 are minutes VOICE calls on 20 20 surveillance cameras 20 ll IMAGES, VIDEOS , doc Skype every minute 0 Top 20 websites 0 uments, … http://www.oldcolony.us/wp ‐ content/uploads/2014/11/whatisbigdata ‐ DKB ‐ v2.pdf http://www.alexa.com/topsites/global;0 p // / p /g ; http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth ‐ e.html#dbgrowth ‐ graph BigDataBench HPBDC 2015

Benchmark specification Benchmark specification � Guidelines for BigDataBench implementation � Data model � workloads Describe data model Model typical application scenarios Extract important workloads BigDataBench HPBDC 2015

BigDataBench Details BigDataBench Details • Five application domains • Benchmark specification for each domain Methodology • 14 Real world data sets & 3 kinds of big data generators • 14 Real world data sets & 3 kinds of big data generators • 33 Big data workloads with diverse implementation Implementation • BigDataBench subset version g Specific ‐ purpose Version BigDataBench HPBDC 2015

BigDataBench Summary g y BDGS(Big Data Generator Suite) for scalable data Wikipedia Entries Amazon Movie Reviews Google Web Graph Facebook Social Network E ‐ commerce Transaction ProfSearch Resumes ImageNet g English broadcasting audio DVD Input Streams p Genome sequence data Image scene Assembly of the human genome MNIST SoGou Data 14 Real ‐ world Data Sets NoSql Impala Impala Social Search Shark E-commerce Engine Network Hadoop RDMA H d RDMA Bioinformatics MPI Multimedia DataMPI Software Stacks Software Stacks 33 Workloads 33 Workloads BigDataBench HPBDC 2015

Big Data Generator Tool Big Data Generator Tool � 3 kinds of big data generators � Preserving original characteristics of real data g g � Text/Graph/Table generator BigDataBench HPBDC 2015

BigDataBench Details BigDataBench Details • Five application domains • Benchmark specification for each domain Methodology • 14 Real world data sets & 3 kinds of big data generators • 14 Real world data sets & 3 kinds of big data generators • 33 Big data workloads with diverse implementations Implementation • BigDataBench subset version g Specific ‐ purpose Version BigDataBench HPBDC 2015

BigDataBench Subset BigDataBench Subset � Motivation � Expensive to run all the benchmarks for system p y and architecture researches • multiplied by different implementations multiplied by different implementations • BigDataBench 3.0 provides about 77 workloads Eliminate the Eliminate the correlation data Identify workload Clustering Subset Subset characteristics from a characteristics from a (K ‐ Means) specific perspective Dimension reduction (PCA) ( ) BigDataBench HPBDC 2015

Why BigDataBench? Why BigDataBench? Specifi Application Workload Work Scalable data Multiple Multite Subs Simulat cation ca o domains do a s Types ypes loads oads sets (from real se s ( o ea impleme p e e nancy a cy ets e s or o data) ntations version BigDataBench Four [1] Y Five 33 8 Y Y Y Y BigBench Y One Three 10 3 N N N N Cloud ‐ Suite N N/A Two 8 3 N N N Y HiBench N N/A Two 10 3 N N N N CALDA Y N/A / One 5 N/A / Y N N N YCSB Y N/A One 6 N/A Y N N N LinkBench Y N/A One 10 N/A Y N N N AMP Y N/A One 4 N/A Y N N N Benchmarks [1] The four workloads types include Offline Analytics, Cloud OLTP, Interactive Analytics and Online Service BigDataBench HPBDC 2015

BigDataBench Users BigDataBench Users � http://prof.ict.ac.cn/BigDataBench/users/ htt // f i t /Bi D t B h/ / � Industry users � Accenture, BROADCOM, SAMSUMG, Huawei, IBM � China’s first industry ‐ standard big data benchmark y g suite � http://prof.ict.ac.cn/BigDataBench/industry ‐ standard ‐ p //p / g / y benchmarks/ � About 20 academia groups published papers using g p p p p g BigDataBench BigDataBench HPBDC 2015

BigDataBench Publications BigDataBench Publications BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE h h k f h � International Symposium On High Performance Computer Architecture (HPCA ‐ 2014). Characterizing data analysis workloads in data centers. 2013 IEEE � International Symposium on Workload Characterization (IISWC 2013) （ Best paper award ） BigOP: generating comprehensive big data workloads as a benchmarking � framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014) Advanced Applications (DASFAA 2014) BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The � Fourth workshop on big data benchmarking (WBDB 2014) Identifying Dwarfs Workloads in Big Data Analytics arXiv preprint Id if i D f W kl d i Bi D A l i Xi i � arXiv:1505.06872 BigDataBench ‐ MT: A Benchmark Tool for Generating Realistic Mixed Data � Center Workloads arXiv preprint arXiv:1504.02205 BigDataBench HPBDC 2015

Outline Outline � BigDataBench Overview � Workload characterization � Multi-tenancy version � Multi-tenancy version � Processors evaluation BigDataBench HPBDC 2015

� Diversified system level BigDataBench behaviors: System Behaviors System Behaviors HPBDC 2015 Pe ercentage Weighte ed I/O time ratio o 100% 20% 20% 40% 60% 80% 0.01 0.01 0% 100 0.1 10 10 1 H-Grep(7) H-Grep(7) S-K Kmeans(1) S-Kmeans(1) S-Pag geRank(1) S-PageRank(1) S H-Wor dCount(1) H-WordCount(1) H The Average Weighted disk I/O time ratio H H-Bayes(1) H-Bayes(1) M-Bayes M-Bayes M M-Kmeans M-Kmeans M-P PageRank M-PageRank H- -Read(10) H-Read(10) H-Diff ference(9) H-Difference(9) I-Select tQuery(9) I-S SelectQuery(9) CPU utilization CPU utilization S-Wor dCount(8) S S-WordCount(8) S- -Project(4) S-Project(4) S-O OrderBy(3) S-OrderBy(3) S-Grep(1) S-Grep(1) M-Grep M-Grep H -TPC-DS-… H-TPC-DS-… … I-O OrderBy(7) I-OrderBy(7) S -TPC-DS-… S-TPC-DS-… … I/O wait ratio I/O wait ratio S -TPC-DS-… S-TPC-DS-… … S-Sort(1) S-Sort(1) M-W WordCount M-WordCount M-Sort M-Sort AVG_S S_BigData AVG_S_BigData A

� Diversified system level BigDataBench behaviors: � High CPU utilization & less I/O time System Behaviors System Behaviors HPBDC 2015 Pe ercentage Weight ed I/O time rat tio 100% 20% 20% 40% 60% 80% 0.01 0.01 0% 100 0.1 10 10 1 H-Grep(7) H-Grep(7) S-K Kmeans(1) S-Kmeans(1) The Average Weighted disk I/O time ratio S-Pag geRank(1) S-PageRank(1) S H-Wor dCount(1) H-WordCount(1) H H-Bayes(1) H H-Bayes(1) M-Bayes M-Bayes M-Kmeans M M-Kmeans M-P PageRank M-PageRank H- -Read(10) H-Read(10) H-Diff ference(9) H-Difference(9) I-Select tQuery(9) I-S SelectQuery(9) CPU utilization CPU utilization S-Wor dCount(8) S S-WordCount(8) S- -Project(4) S-Project(4) S-O OrderBy(3) S-OrderBy(3) S-Grep(1) S-Grep(1) M-Grep M-Grep H -TPC-DS-… H-TPC-DS-… … I-O OrderBy(7) I-OrderBy(7) S -TPC-DS-… S-TPC-DS-… … I/O wait ratio I/O wait ratio S -TPC-DS-… S-TPC-DS-… … S-Sort(1) S-Sort(1) M-W WordCount M-WordCount M-Sort M-Sort AVG_S S_BigData AVG_S_BigData A

Understanding Big Data Workloads on Understanding Big Data Workloads - PowerPoint PPT Presentation

Understanding Big Data Workloads on Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan http://prof.ict.ac.cn/BigDataBench INS INSTITUT Professor, ICT, Chinese Academy of Sciences TE E OF CO and University

Introduction Workloads for Experiments Introduction to workloads CS 239 Workload

Biscuit: A Framework for Near-Data Processing of Big Data Workloads Oct 21, 2016 Duck-Ho Bae

Evaluation of Memory and CPU usage via Cgroups of ATLAS workloads via Cgroups of ATLAS workloads

Machine Learning Anders Holst SICS Big Data Analytics Analysis Big Data Big Value Big Data

When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads Jason Lowe-Power

Big Data Algorithms with Medical Applications Yixin Chen Outline Challenges to big data

CS535 Big Data 1/22/2020 Sangmi Lee Pallickara CS535 Big Data | Computer Science Department

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Anomaly Analysis and Diagnosis for Co-located Datacenter Workloads in the Alibaba Cluster

ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads Kangnyeon Kim Tianzheng

Mostly-Optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores

Zones - Containers Server Consolidation Run multiple workloads on system Improve utilization of

Traffic Footprint Characterization of Workloads using BPF Aditi Ghag aghag@vmware.com VMware

CSE 232A Database System Implementation Arun Kumar Topic 8: Data Systems for ML Workloads

HOW BIG IS BIG DATA FOR AN INSURER LIKE AXA? CHALLENGES & OPPORTUNITIES Paris Big Data

BIG DATA CONFERENCE How to transform data into money using Big Data technologies INTRO THE

Outline Overview Byzantine-Altruistic-Rational (BAR) model System Architecture

Probabilistic models for primes and large gaps William Banks Kevin Ford Terence Tao July, 2019

HH in gluon-gluon fusion Biggest cross section Only loop induced channel Exact NLO

Attack-Resilient Multitree Data Distribution Topologies Sascha Grau 1 Technische Universit at

Algorithm-based checkpoint-recovery for the conjugate gradient method Carlos Pachajoa, Christina

/k Introduction and content 2/37 Error-correcting pair - Generalized Reed-Solomon codes -

Robust texture image representation by scale selective local binary patterns (TIP2016)

Winnipeg by: Sancho Where What Capital of... Capital of... Capital of... Why How is it?

Sambuz

Useful Links

Newsletter

Mail Us

Understanding Big Data Workloads on Understanding Big Data Workloads - PowerPoint PPT Presentation

Understanding Big Data Workloads on Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan http://prof.ict.ac.cn/BigDataBench INS INSTITUT Professor, ICT, Chinese Academy of Sciences TE E OF CO and University

Introduction Workloads for Experiments Introduction to workloads CS 239 Workload

Biscuit: A Framework for Near-Data Processing of Big Data Workloads Oct 21, 2016 Duck-Ho Bae

Evaluation of Memory and CPU usage via Cgroups of ATLAS workloads via Cgroups of ATLAS workloads

Machine Learning Anders Holst SICS Big Data Analytics Analysis Big Data Big Value Big Data

When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads Jason Lowe-Power

Big Data Algorithms with Medical Applications Yixin Chen Outline Challenges to big data

CS535 Big Data 1/22/2020 Sangmi Lee Pallickara CS535 Big Data | Computer Science Department

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Anomaly Analysis and Diagnosis for Co-located Datacenter Workloads in the Alibaba Cluster

ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads Kangnyeon Kim Tianzheng

Mostly-Optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores

Zones - Containers Server Consolidation Run multiple workloads on system Improve utilization of

Traffic Footprint Characterization of Workloads using BPF Aditi Ghag aghag@vmware.com VMware

CSE 232A Database System Implementation Arun Kumar Topic 8: Data Systems for ML Workloads

HOW BIG IS BIG DATA FOR AN INSURER LIKE AXA? CHALLENGES &amp; OPPORTUNITIES Paris Big Data

BIG DATA CONFERENCE How to transform data into money using Big Data technologies INTRO THE

Outline Overview Byzantine-Altruistic-Rational (BAR) model System Architecture

Probabilistic models for primes and large gaps William Banks Kevin Ford Terence Tao July, 2019

HH in gluon-gluon fusion Biggest cross section Only loop induced channel Exact NLO

Attack-Resilient Multitree Data Distribution Topologies Sascha Grau 1 Technische Universit at

Algorithm-based checkpoint-recovery for the conjugate gradient method Carlos Pachajoa, Christina

/k Introduction and content 2/37 Error-correcting pair - Generalized Reed-Solomon codes -

Robust texture image representation by scale selective local binary patterns (TIP2016)

Winnipeg by: Sancho Where What Capital of... Capital of... Capital of... Why How is it?

Sambuz

Useful Links

Newsletter

Mail Us

HOW BIG IS BIG DATA FOR AN INSURER LIKE AXA? CHALLENGES & OPPORTUNITIES Paris Big Data