
Understanding Big Data Workloads on Modern Processors using BigDataBench



  1. Understanding Big Data Workloads on Modern Processors using BigDataBench
     Jianfeng Zhan, Professor, ICT (Institute of Computing Technology), Chinese Academy of Sciences and University of Chinese Academy of Sciences
     http://prof.ict.ac.cn/BigDataBench
     HPBDC 2015, Ohio, USA

  2. Outline
     • BigDataBench Overview
     • Workload characterization
     • Multi-tenancy version
     • Processors evaluation

  3. What is BigDataBench?
     • An open source big data benchmarking project
       - http://prof.ict.ac.cn/BigDataBench
       - Search Google using "BigDataBench"

  4. BigDataBench Detail
     • Methodology
       - Five application domains
       - Propose benchmark specifications for each domain
     • Implementation
       - 14 real-world data sets and 3 kinds of big data generators
       - 33 big data workloads with diverse implementations
     • Specific-purpose version
       - BigDataBench subset version

  5. Five Application Domains
     • Internet services: search engine, social network, e-commerce. Taking up 80% of internet services according to page views and daily visitors.
       [Figure: pie chart of the top 20 websites by category: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others]
     • Multimedia: new videos on YouTube, music streaming on Pandora, and photos on Flickr every minute; video feeds from surveillance cameras; voice calls on Skype every minute. Data growth: images, videos, documents, …
     • Bioinformatics
       [Figure: DDBJ/EMBL/GenBank database growth; axes: nucleotides (billion) and entries (million)]
     Sources:
       - http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf
       - http://www.alexa.com/topsites/global;0
       - http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph

  6. Benchmark specification
     • Guidelines for BigDataBench implementation
       - Data model
       - Workloads
     • Steps shown on the slide: describe data model; model typical application scenarios; extract important workloads

  7. BigDataBench Details
     • Methodology
       - Five application domains
       - Benchmark specification for each domain
     • Implementation
       - 14 real-world data sets and 3 kinds of big data generators
       - 33 big data workloads with diverse implementations
     • Specific-purpose version
       - BigDataBench subset version

  8. BigDataBench Summary
     • 14 real-world data sets: Wikipedia Entries, Amazon Movie Reviews, Google Web Graph, Facebook Social Network, E-commerce Transaction, ProfSearch Resumes, ImageNet, English broadcasting audio, DVD Input Streams, Image scene, Genome sequence data, Assembly of the human genome, SoGou Data, MNIST
     • BDGS (Big Data Generator Suite) for scalable data
     • 33 workloads across Search Engine, Social Network, E-commerce, Multimedia, and Bioinformatics
     • Software stacks: NoSql, Impala, Shark, Hadoop RDMA, MPI, DataMPI

  9. Big Data Generator Tool
     • 3 kinds of big data generators
     • Preserving original characteristics of real data
     • Text/Graph/Table generator
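The idea of scaling data up while preserving the characteristics of a real seed set can be illustrated with a small sketch. The slide does not describe how BDGS works internally, so the code below is only a stand-in: it assumes a simple unigram (word-frequency) model, and the names `fit_unigram_model` and `generate_text` are hypothetical, not part of BDGS.

```python
import random
from collections import Counter


def fit_unigram_model(seed_text):
    """Learn word frequencies from a real seed text (assumed model, not BDGS's)."""
    counts = Counter(seed_text.lower().split())
    words = sorted(counts)                  # fixed order keeps sampling reproducible
    weights = [counts[w] for w in words]
    return words, weights


def generate_text(model, n_words, seed=0):
    """Sample n_words words so the output follows the seed's word distribution."""
    words, weights = model
    rng = random.Random(seed)
    return rng.choices(words, weights=weights, k=n_words)


# Illustrative seed text, invented for this example.
seed_text = "big data benchmark big data workload data generator"
model = fit_unigram_model(seed_text)
synthetic = generate_text(model, n_words=1000, seed=42)
print(len(synthetic), Counter(synthetic).most_common(2))
```

The output can be made as large as needed, while frequent words in the seed stay frequent in the generated text. A graph or table generator would need a different model for the characteristics it has to preserve.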

  10. BigDataBench Details
      • Methodology
        - Five application domains
        - Benchmark specification for each domain
      • Implementation
        - 14 real-world data sets and 3 kinds of big data generators
        - 33 big data workloads with diverse implementations
      • Specific-purpose version
        - BigDataBench subset version

  11. BigDataBench Subset
      • Motivation
        - Expensive to run all the benchmarks for system and architecture research
        - The workload count is multiplied by different implementations
        - BigDataBench 3.0 provides about 77 workloads
      • Approach
        - Identify workload characteristics from a specific perspective
        - Eliminate the correlation data: dimension reduction (PCA)
        - Clustering (K-Means)
        - Subset
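The subsetting pipeline above (characterize, reduce with PCA, cluster with K-Means, pick representatives) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the metric matrix is synthetic, the workload names are placeholders, and the choices of 90% retained variance and "the member closest to each cluster center" as the representative are assumptions made for the example.

```python
import numpy as np


def pca_reduce(X, var_kept=0.9):
    """Z-score each metric, then keep the leading principal components."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T                      # uncorrelated, lower-dimensional scores


def kmeans(P, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers


def pick_subset(names, X, k, var_kept=0.9, seed=0):
    """Return one representative workload per non-empty cluster."""
    P = pca_reduce(X, var_kept)
    labels, centers = kmeans(P, k, seed=seed)
    subset = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        nearest = members[np.linalg.norm(P[members] - centers[j], axis=1).argmin()]
        subset.append(names[nearest])
    return subset


# Synthetic metrics (rows: workloads, columns: characteristics); not measured data.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))
names = [f"workload_{i}" for i in range(len(X))]
print(pick_subset(names, X, k=3))
```

PCA runs before clustering so that strongly correlated metrics do not dominate the distance computation. The number of clusters `k` then sets the size of the resulting subset.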

  12. Why BigDataBench?

      | Benchmark      | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Multitenancy version | Subsets | Simulator version |
      |----------------|---------------|---------------------|----------------|-----------|-------------------------------------|--------------------------|----------------------|---------|-------------------|
      | BigDataBench   | Y             | Five                | Four [1]       | 33        | 8                                   | Y                        | Y                    | Y       | Y                 |
      | BigBench       | Y             | One                 | Three          | 10        | 3                                   | N                        | N                    | N       | N                 |
      | CloudSuite     | N             | N/A                 | Two            | 8         | 3                                   | N                        | N                    | N       | Y                 |
      | HiBench        | N             | N/A                 | Two            | 10        | 3                                   | N                        | N                    | N       | N                 |
      | CALDA          | Y             | N/A                 | One            | 5         | N/A                                 | Y                        | N                    | N       | N                 |
      | YCSB           | Y             | N/A                 | One            | 6         | N/A                                 | Y                        | N                    | N       | N                 |
      | LinkBench      | Y             | N/A                 | One            | 10        | N/A                                 | Y                        | N                    | N       | N                 |
      | AMP Benchmarks | Y             | N/A                 | One            | 4         | N/A                                 | Y                        | N                    | N       | N                 |

      [1] The four workload types are Offline Analytics, Cloud OLTP, Interactive Analytics, and Online Service.

  13. BigDataBench Users
      • http://prof.ict.ac.cn/BigDataBench/users/
      • Industry users: Accenture, Broadcom, Samsung, Huawei, IBM
      • China's first industry-standard big data benchmark suite
        - http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/
      • About 20 academic groups have published papers using BigDataBench

  14. BigDataBench Publications
      • BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014).
      • Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).
      • BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
      • BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).
      • Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872.
      • BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.

  15. Outline
      • BigDataBench Overview
      • Workload characterization
      • Multi-tenancy version
      • Processors evaluation

  16. System Behaviors
      • Diversified system-level behaviors
      [Figure: per-workload CPU utilization and I/O wait ratio (percentage axis, 0% to 100%), shown with the weighted disk I/O time ratio (log axis, 0.01 to 100) and the average weighted disk I/O time ratio. Workloads on the x-axis: H-Grep(7), S-Kmeans(1), S-PageRank(1), H-WordCount(1), H-Bayes(1), M-Bayes, M-Kmeans, M-PageRank, H-Read(10), H-Difference(9), I-SelectQuery(9), S-WordCount(8), S-Project(4), S-OrderBy(3), S-Grep(1), M-Grep, H-TPC-DS-…, I-OrderBy(7), S-TPC-DS-…, S-TPC-DS-…, S-Sort(1), M-WordCount, M-Sort, AVG_S_BigData, …]

  17. System Behaviors
      • Diversified system-level behaviors
        - High CPU utilization and less I/O time
      [Figure: per-workload CPU utilization and I/O wait ratio (percentage axis, 0% to 100%), shown with the weighted disk I/O time ratio (log axis, 0.01 to 100) and the average weighted disk I/O time ratio. Workloads on the x-axis: H-Grep(7), S-Kmeans(1), S-PageRank(1), H-WordCount(1), H-Bayes(1), M-Bayes, M-Kmeans, M-PageRank, H-Read(10), H-Difference(9), I-SelectQuery(9), S-WordCount(8), S-Project(4), S-OrderBy(3), S-Grep(1), M-Grep, H-TPC-DS-…, I-OrderBy(7), S-TPC-DS-…, S-TPC-DS-…, S-Sort(1), M-WordCount, M-Sort, AVG_S_BigData, …]
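The three metrics plotted on these slides can be derived from standard Linux counters. The slides do not say how the numbers were collected, so the sketch below is an assumption: it computes CPU utilization and I/O wait ratio from two samples of the aggregate `cpu` line in `/proc/stat`, and takes the weighted disk I/O time ratio to be the growth of the "weighted time spent doing I/Os" counter in `/proc/diskstats` divided by the elapsed time. The function names and the sample counter values are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 6:
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in parts[1:]]


def cpu_ratios(before, after):
    """CPU utilization and I/O wait ratio between two /proc/stat samples."""
    delta = [a - b for b, a in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Fields: user nice system idle iowait irq softirq steal (guest time is
    # already counted inside user/nice, so only the first eight are summed).
    total = sum(delta[:8])
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle, iowait = delta[3], delta[4]
    return {
        "cpu_utilization": (total - idle - iowait) / total,
        "io_wait_ratio": iowait / total,
    }


def weighted_io_time_ratio(before, after, elapsed_ms):
    """Growth of the weighted I/O time counter (ms) per elapsed millisecond.

    Assumed definition; each argument is one device line from /proc/diskstats:
    major, minor, name, then the statistics fields. The weighted time spent
    doing I/Os is the 11th statistics field, index 13 after splitting.
    """
    w_before = int(before.split()[13])
    w_after = int(after.split()[13])
    return (w_after - w_before) / elapsed_ms


# Invented sample values, for illustration only.
stat_t0 = "cpu 100 0 50 800 50 0 0 0 0 0"
stat_t1 = "cpu 400 0 150 1200 250 0 0 0 0 0"
disk_t0 = "8 0 sda 100 0 800 50 200 0 1600 120 0 150 170"
disk_t1 = "8 0 sda 300 0 2400 150 600 0 4800 360 0 400 470"
print(cpu_ratios(stat_t0, stat_t1))
print(weighted_io_time_ratio(disk_t0, disk_t1, elapsed_ms=1000))
```

Because the weighted counter accumulates time for every in-flight request, the ratio can exceed 1 when several requests are outstanding at once, which is consistent with the log-scale axis reaching 100 in the figure.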
