Understanding Big Data Workloads on Modern Processors using BigDataBench - PowerPoint PPT Presentation

SLIDE 1

Understanding Big Data Workloads on Modern Processors using BigDataBench

Jianfeng Zhan

http://prof.ict.ac.cn/BigDataBench

Professor, Institute of Computing Technology (ICT), Chinese Academy of Sciences and University of Chinese Academy of Sciences

HPBDC 2015, Ohio, USA

SLIDE 2

Outline

  • BigDataBench Overview
  • Workload characterization
  • Multi-tenancy version
  • Processors evaluation

BigDataBench HPBDC 2015

SLIDE 3

What is BigDataBench?

An open source big data benchmarking project

  • http://prof.ict.ac.cn/BigDataBench
  • Search Google using “BigDataBench”

SLIDE 4

BigDataBench Detail

Methodology

  • Five application domains
  • Propose benchmark specifications for each domain
  • 14 Real world data sets & 3 kinds of big data generators

Implementation

  • 14 Real world data sets & 3 kinds of big data generators
  • 33 Big data workloads with diverse implementations
  • BigDataBench subset version
  • Specific-purpose versions

SLIDE 5

Five Application Domains

  • Internet services (Search Engine, Social Network, Electronic Commerce, Media Streaming and Others), taking up 80% of internet services according to page views and daily visitors of the top 20 websites (http://www.alexa.com/topsites/global;0)
  • Multimedia: new videos on YouTube, new photos on Flickr, hours of music streaming on Pandora, video feeds from surveillance cameras, and minutes of voice calls on Skype every minute; images, videos, documents, … (http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf)
  • Bioinformatics: DDBJ/EMBL/GenBank database growth, measured in entries (millions) and nucleotides (billions) (http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph)

[Chart residue: pie chart of internet-service shares (40%, 25%, 15%, 5%) and the DDBJ/EMBL/GenBank database growth curve.]

SLIDE 6

Benchmark specification

Guidelines for BigDataBench implementation:

  • Describe the data model
  • Model typical application scenarios
  • Extract important workloads

SLIDE 7

BigDataBench Details

Methodology

  • Five application domains
  • Benchmark specification for each domain
  • 14 Real world data sets & 3 kinds of big data generators

Implementation

  • 14 Real world data sets & 3 kinds of big data generators
  • 33 Big data workloads with diverse implementation
  • BigDataBench subset version
  • Specific-purpose versions

SLIDE 8

BigDataBench Summary

14 Real-world Data Sets, with BDGS (Big Data Generator Suite) for scalable data:

  • Facebook Social Network, Google Web Graph, Wikipedia Entries, Amazon Movie Reviews, E-commerce Transaction, ProfSearch Resumes, SoGou Data, ImageNet, Image scene, MNIST, DVD Input Streams, English broadcasting audio, Genome sequence data, and Assembly of the human genome

33 Workloads covering the five application domains (Search Engine, Social Network, E-commerce, Multimedia, Bioinformatics) on diverse software stacks:

  • Hadoop, Spark, Shark, Impala, NoSQL, MPI, DataMPI, and Hadoop RDMA

SLIDE 9

Big Data Generator Tool

3 kinds of big data generators (Text/Graph/Table), preserving the original characteristics of real data.

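BDGS derives its generators from real data sets; as a toy illustration of the characteristic-preserving idea only (not BDGS's actual algorithm), a text generator can sample words according to frequencies measured from a small seed corpus:

```python
import random
from collections import Counter

def build_text_generator(seed_corpus, rng=None):
    """Return a function that emits synthetic text whose word-frequency
    distribution follows the seed corpus (a toy stand-in for
    characteristic-preserving text generation)."""
    counts = Counter(seed_corpus.split())
    words = list(counts)
    weights = [counts[w] for w in words]
    rng = rng or random.Random(42)  # fixed seed for reproducibility
    def generate(n_words):
        return " ".join(rng.choices(words, weights=weights, k=n_words))
    return generate

gen = build_text_generator("big data big benchmark data big")
sample = gen(1000)
# 'big' (weight 3/6) should appear far more often than 'benchmark' (1/6)
```

A real generator would additionally preserve higher-order structure (n-grams, topic models), not just unigram frequencies.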
SLIDE 10

BigDataBench Details

Methodology

  • Five application domains
  • Benchmark specification for each domain
  • 14 Real world data sets & 3 kinds of big data generators

Implementation

  • 14 Real world data sets & 3 kinds of big data generators
  • 33 Big data workloads with diverse implementations
  • BigDataBench subset version
  • Specific-purpose versions

SLIDE 11

BigDataBench Subset

Motivation

  • Expensive to run all the benchmarks for system and architecture research
  • Multiplied by different implementations, BigDataBench 3.0 provides about 77 workloads

Subsetting method:

  • Identify workload characteristics from a specific perspective
  • Eliminate the correlation of the data via dimension reduction (PCA)
  • Clustering (K-Means)
  • Subset

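The clustering step of the PCA + K-means pipeline above can be sketched in plain Python; the workload names and the two-dimensional post-PCA coordinates below are invented for illustration, and a real run would start from many micro-architectural metrics and reduce them with PCA first:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns a list of cluster indices, one per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return assign

# Hypothetical post-PCA coordinates for six workloads.
features = {
    "Sort": (0.9, 0.1), "Grep": (0.8, 0.2), "WordCount": (0.85, 0.15),
    "PageRank": (0.1, 0.9), "Kmeans": (0.2, 0.8), "Read": (0.5, 0.5),
}
names = list(features)
labels = kmeans([features[n] for n in names], k=3)
# Pick one representative workload per cluster as the subset.
subset = {lab: name for name, lab in zip(names, labels)}
```

Choosing one member per cluster is what lets a small subset stand in for the full 77 workloads.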
SLIDE 12

Why BigDataBench?

Benchmark      | Specification | Application domains | Workload types | Workloads | Scalable data sets (from real data) | Multiple implementations | Multi-tenancy version | Subsets | Simulator version
BigDataBench   | Y | Five | Four[1] | 33 | 8   | Y | Y | Y | Y
BigBench       | Y | One  | Three   | 10 | 3   | N | N | N | N
CloudSuite     | N | N/A  | Two     | 8  | 3   | N | N | N | Y
HiBench        | N | N/A  | Two     | 10 | 3   | N | N | N | N
CALDA          | Y | N/A  | One     | 5  | N/A | Y | N | N | N
YCSB           | Y | N/A  | One     | 6  | N/A | Y | N | N | N
LinkBench      | Y | N/A  | One     | 10 | N/A | Y | N | N | N
AMP Benchmarks | Y | N/A  | One     | 4  | N/A | Y | N | N | N

[1] The four workload types include Offline Analytics, Cloud OLTP, Interactive Analytics and Online Service.

SLIDE 13

BigDataBench Users

http://prof.ict.ac.cn/BigDataBench/users/

Industry users:

  • Accenture, BROADCOM, SAMSUNG, Huawei, IBM

China’s first industry-standard big data benchmark suite:

  • http://prof.ict.ac.cn/BigDataBench/industry-standard-benchmarks/

About 20 academia groups published papers using BigDataBench.

SLIDE 14

BigDataBench Publications

  • BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA 2014).
  • Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013). (Best paper award)
  • BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
  • BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).
  • Identifying Dwarfs Workloads in Big Data Analytics. arXiv preprint arXiv:1505.06872.
  • BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads. arXiv preprint arXiv:1504.02205.

SLIDE 15

Outline

  • BigDataBench Overview
  • Workload characterization
  • Multi-tenancy version
  • Processors evaluation

SLIDE 16

System Behaviors

Diversified system-level behaviors.

[Chart: CPU utilization and I/O wait ratio (percentage), together with the average weighted disk I/O time ratio, for H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans, M-PageRank, H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, S-Grep, M-Grep, H-TPC-DS, I-OrderBy, S-TPC-DS, S-Sort, M-WordCount, M-Sort and AVG_S_BigData.]

SLIDE 17

System Behaviors

Diversified system-level behaviors:

  • High CPU utilization & less I/O time

[Chart: the same CPU utilization, I/O wait ratio and average weighted disk I/O time ratio plot, highlighting the workloads with high CPU utilization and little I/O time.]

SLIDE 18

System Behaviors

Diversified system-level behaviors:

  • High CPU utilization & less I/O time
  • Relatively low CPU utilization and lots of I/O time

[Chart: the same plot, highlighting the workloads with relatively low CPU utilization and lots of I/O time.]

SLIDE 19

System Behaviors

Diversified system-level behaviors:

  • High CPU utilization & less I/O time
  • Relatively low CPU utilization & lots of I/O time
  • Medium CPU utilization & I/O time

[Chart: the same plot, highlighting the workloads with medium CPU utilization and I/O time.]

SLIDE 20

Workloads Classification

From the perspective of system behaviors: system behaviors vary across different workloads, and the workloads are divided into 3 categories:

Type          | Workloads
CPU Intensive | H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans and M-PageRank
I/O Intensive | H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, M-Grep and S-Grep
Hybrid        | H-TPC-DS-query3, I-OrderBy, S-TPC-DS-query10, S-TPC-DS-query8, S-Sort, M-WordCount and M-Sort

SLIDE 21

Off-Chip Bandwidth

  • Most of the CPU-intensive workloads have higher off-chip bandwidth (3 GB/s); the maximum is 6.2 GB/s.
  • The other workloads have lower off-chip bandwidth (0.6 GB/s).
  • MPI-based workloads need low memory bandwidth.

SLIDE 22

IPC of BigDataBench vs. other benchmarks

[Chart: IPC of each big data workload alongside the averages of TPC-C, CloudSuite, HPCC, PARSEC, SPECfp and SPECint.]

  • The average IPC of the big data workloads is larger than that of CloudSuite, SPECfp and SPECint, similar to PARSEC, and slightly lower than HPCC.
  • The average IPC of BigDataBench is 1.3 times that of CloudSuite.
  • Some workloads have high IPC (M-Kmeans, S-TPC-DS-query8).

SLIDE 23

Instructions Mix of BigDataBench vs. other benchmarks

Big data workloads are data movement dominated computing with more branch operations:

  • 92% percentage in terms of instruction mix (Load + Store + Branch + data movements of INT)

SLIDE 24

Pipeline Stalls

  • The service workloads have more RAT (Register Allocation Table) stalls.
  • The data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls.
  • Notable front end stalls (i.e., instruction fetch stalls)!

Data Service vs. Data analysis

SLIDE 25

Cache Behaviors of BigDataBench

L1I MPKI:

  • Larger than traditional benchmarks, but lower than that of CloudSuite (12 vs. 31)
  • Different among big data workloads: CPU-intensive (8), I/O-intensive (22), and hybrid workloads (9)
  • One order of magnitude differences among diverse implementations: M-WordCount is 2, while H-WordCount is 17

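MPKI (misses per kilo-instructions) is derived from raw performance-counter readings; a minimal sketch, with the counter values invented to reproduce the 17-vs-2 gap quoted above:

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instructions: misses / (instructions / 1000)."""
    return misses * 1000 / instructions

# Invented counter readings for two WordCount implementations.
print(mpki(misses=3_400_000, instructions=200_000_000))  # 17.0 (H-WordCount-like)
print(mpki(misses=400_000, instructions=200_000_000))    # 2.0  (M-WordCount-like)
```

On Linux, the raw counts would typically come from `perf stat` events such as `L1-icache-load-misses` and `instructions`.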
SLIDE 26

Cache Behaviors

L2 Cache:

  • The I/O-intensive workloads undergo more L2 MPKI.

L3 Cache:

  • The average L3 MPKI of the big data workloads is lower than that of all the other workloads.

The underlying software stacks impact data locality:

  • MPI workloads have better data locality and fewer cache misses.

SLIDE 27

TLB Behaviors

[Chart: ITLB and DTLB misses (MPKI) for each workload (CPU-intensive, I/O-intensive and hybrid) and for the averages of TPC-C, CloudSuite, HPCC, PARSEC, SPECfp and SPECint.]

ITLB:

  • I/O-intensive workloads undergo more ITLB MPKI.

DTLB:

  • CPU-intensive workloads have more DTLB MPKI.

SLIDE 28

Our observations from BigDataBench

Unique characteristics:

  • Data movement dominated computing with more branch operations
  • 92% percentage in terms of instruction mix
  • Notable pipeline frontend stalls

Different behaviors among Big Data workloads:

  • Disparity of IPCs and memory access behaviors
  • CloudSuite is a subclass of Big Data

Software stacks impacts:

  • The L1I cache miss rates have one order of magnitude differences among diverse implementations with different software stacks.

SLIDE 29

Correlation Analysis

Compute the correlation coefficients of CPI with other micro-architecture level metrics.

Pearson’s correlation coefficient: the absolute value (from 0 to 1) shows the dependency:

  • The bigger the absolute value, the stronger the correlation.

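As a sketch of this analysis, Pearson's r between a per-workload CPI series and each candidate metric can be computed in plain Python and the metrics ranked by absolute value; the metric names and sample values below are invented:

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-workload samples: CPI plus two candidate metrics.
cpi       = [1.2, 0.8, 1.5, 0.9, 1.1]
l2_mpki   = [12.0, 6.0, 16.0, 7.0, 10.0]  # tracks CPI closely
itlb_mpki = [0.3, 0.5, 0.2, 0.6, 0.4]     # weakly related

# Rank metrics by the absolute value of their coefficient.
metrics = {"L2 MPKI": l2_mpki, "ITLB MPKI": itlb_mpki}
ranked = sorted(metrics, key=lambda m: abs(pearson(cpi, metrics[m])),
                reverse=True)
print(ranked[0])  # the metric correlating most strongly with CPI
```

The sign is deliberately discarded: a strong negative correlation indicates dependency just as much as a strong positive one.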
SLIDE 30

Top five coefficients

[Chart: the top five correlation coefficients (① through ⑤) for each analytics workload: Naive Bayes, Grep, WordCount, Kmeans, FKmeans, PageRank, Sort, Hive, IBCF, HMM and SVM.]

SLIDE 31

Insights

Frontend stall does not have a high correlation coefficient value for most big data analytics workloads:

  • Frontend stall is not the factor that affects the CPI performance most.

L2 cache misses and TLB misses have high correlation coefficient values:

  • The long latency memory accesses (access L3 cache or memory) affect the CPI performance most and should be the optimization point with the highest priority.

SLIDE 32

Outline

  • BigDataBench Overview
  • Workload characterization
  • Multi-tenancy version
  • Processors evaluation

SLIDE 33

Cloud Data Centers

Two classes of popular workloads:

Long-running services

  • Search engines, E-commerce sites

Short-term data analytic jobs

  • Hadoop MapReduce, Spark jobs

SLIDE 34

Problem

Existing benchmarks focus on specific types of workload.

Their scenarios are not realistic: they do not match the typical data center scenario that mixes different percentages of tenants and workloads sharing the same computing infrastructure.

SLIDE 35

Purpose of BigDataBench-MT

Developing realistic benchmarks to reflect such practical scenarios of mixed workloads:

  • Both service and data analytic workloads
  • Dynamic scaling up and down

The tool is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion

SLIDE 36

What can you do with it?

We consider two dimensions of the benchmarking scenarios:

  • From tenants’ perspectives
  • From workloads’ perspectives

SLIDE 37

You can specify the tenants

The number of tenants

  • Scalability Benchmark: How many tenants are able to run in parallel?

The priorities of tenants

  • Fairness Benchmark: How fair is the system, i.e., are the available resources equally available to all tenants? What if tenants have different priorities?

Time line

  • How do the number and priorities of tenants change over time?

SLIDE 38

You can specify the workloads

Data characteristics

  • Data type and source
  • Input/output data volumes and distributions

Computation semantics

  • Source code
  • Big data software stacks

Job arrival patterns

  • Arrival rate
  • Arrival sequence

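BigDataBench-MT replays arrival patterns extracted from real traces; as a toy stand-in for that, a synthetic arrival sequence with a given mean rate can be drawn from exponential inter-arrival times (the rate, duration and seed below are invented):

```python
import random

def poisson_arrivals(rate_per_min: float, duration_min: float, seed: int = 7):
    """Toy job-arrival sequence: exponential inter-arrival times with the
    given mean rate, truncated at the benchmark duration."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)  # next inter-arrival gap
        if t >= duration_min:
            return times
        times.append(t)

arrivals = poisson_arrivals(rate_per_min=2.0, duration_min=60.0)
# roughly 2 jobs/minute over an hour, i.e. on the order of 120 arrivals
```

A trace-driven generator replaces this synthetic process with the submitting times and sequences recorded in the real trace, which is exactly the dynamicity a Poisson model cannot capture.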
SLIDE 39

Two major challenges

Heterogeneity of real workloads

  • Different workload types, e.g. CPU- or I/O-intensive workloads
  • Different software stacks, e.g. Hadoop, Spark, MPI

Workload dynamicity hidden in real-world traces

  • Arrival patterns: request/job submitting time and sequences
  • Job input sizes, e.g. ranging from KB to ZB

SLIDE 40

Existing big data benchmarks

Benchmarks                                              | Actual workloads | Real workload traces | Mixed workloads
AMPLab benchmark, LinkBench, BigBench, YCSB, CloudSuite | Yes              | No                   | No
GridMix, SWIM                                           | No               | Yes                  | No

How to generate real workloads on the basis of real workload traces is still an open question.

SLIDE 41

System Overview

Three modules:

  • Benchmark User Portal: a visual interface
  • Combiner of Workloads and Traces: a matcher of real workloads and traces
  • Multi-tenant Workload Generator: a multi-tenant workload generator

SLIDE 42

Key technique: Combination of real and synthetic data analytic jobs

Goal: combining the arrival patterns extracted from real traces with real workloads.

Problem: workload traces only contain anonymous jobs whose workload types and/or input data are unknown.

SLIDE 43

Solution: the first step

Deriving the workload characteristics of both real and anonymous jobs.

TABLE. Metrics to represent workload characteristics

Metric         | Description
Execution time | Measured in seconds
CPU usage      | Total CPU time per second
Memory usage   | Measured in GB
CPI            | Cycles per instruction
MAI            | The number of memory accesses per instruction

SLIDE 44

Solution: the second step

Matching both types of jobs whose workload characteristics are sufficiently similar.

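A minimal sketch of this matching step, using the five metrics from the previous slide as a characteristic vector and a nearest-neighbor rule; the numbers and per-dimension scales are invented, and BigDataBench-MT itself does the matching via k-means clustering rather than this simple distance:

```python
import math

# Characteristic vectors: (execution time s, CPU usage, memory GB, CPI, MAI).
# All values below are invented for illustration.
real_jobs = {
    "Bayes": (620, 0.85, 3.1, 1.4, 0.012),
    "Sort":  (180, 0.35, 1.2, 2.1, 0.030),
}
anonymous_job = (200, 0.40, 1.0, 2.0, 0.028)  # profiled from a trace

def normalized_distance(a, b, scales):
    """Euclidean distance after dividing each dimension by a typical scale,
    so seconds and GB do not dominate unit-less metrics like CPI."""
    return math.sqrt(sum(((x - y) / s) ** 2
                         for x, y, s in zip(a, b, scales)))

scales = (600, 1.0, 4.0, 2.0, 0.05)
best = min(real_jobs,
           key=lambda name: normalized_distance(real_jobs[name],
                                                anonymous_job, scales))
print(best)  # the anonymous job is replayed as this real workload
```

Once matched, the anonymous job inherits the real workload's code and input data while keeping its own arrival time from the trace.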
SLIDE 45

An Example

An example of matching Hadoop workloads:

  • Mining the Facebook/Google workload trace (exact workload characteristics information)
  • Profiling Hadoop workloads from BigDataBench (collect workload characteristics information)
  • Workload matching using k-means clustering
  • Matching result: replaying the basis workload trace

Matching result:

Job type | Input size (GB) | Starting time (minutes)
Bayes    | 2               | 10
Sort     | 1               | 20
K-means  | 0.5             | 25
Bayes    | 5               | 30
Sort     | 1               | 40

SLIDE 46

System demonstration

Three steps to generate a mix of search service and Hadoop MapReduce jobs. Traces: 24-hour Sogou user query logs and the Google cluster trace.

  • Step 1: Specification of tested machines and workloads
  • Step 2: Selection of benchmarking period and scale
  • Step 3: Generation of mixed workloads

SLIDE 47

Workloads and traces in BigDataBench-MT

Multi-tenancy V1.0 releases:

Workloads             | Software stack                          | Workload trace
Search Server (Nutch) | Nutch Web Search, Apache Tomcat 6.0.26  | Sogou (http://www.sogou.com/labs/dl/q-e.html)
Hadoop                | Hadoop 1.0.2                            | Facebook (https://github.com/SWIMProjectUCB/SWIM/wiki)
Shark                 | Shark 0.8.0                             | Google data center (https://code.google.com/p/googleclusterdata/)

SLIDE 48

Outline

  • BigDataBench Overview
  • Workload characterization
  • Multi-tenancy version
  • Processors evaluation

SLIDE 49

Core Architecture

Multi brawny-core (Xeon E5645, 2.4 GHz):

  • 6 out-of-order cores
  • Dynamic multiple issue (superscalar)
  • Dynamic overclocking
  • Simultaneous multithreading

Many wimpy-core architecture (Tile-Gx36, 1.2 GHz):

  • 36 in-order cores
  • Static multiple issue (VLIW)

SLIDE 50

Experiment methodology

  • Use real hardware instead of simulation
  • Real power consumption measurement instead of modeling

Saturate CPU performance by:

  • Isolating the processor behavior: over-provision the disk I/O subsystem by using a RAM disk
  • Optimizing the benchmarks: tune the software stack parameters and JVM flags for performance

SLIDE 51

Execution time

  • For Hadoop based sort, the performance gap is about 1.08×.
  • For the other workloads, more than 2× gaps exist between Xeon and Tilera.

[Chart: normalized execution time of Xeon and Tilera for each workload.]

From the perspective of execution time, the Xeon processor is better than the Tilera processor all the time.

SLIDE 52

Cycle Counts

  • There are huge cycle count gaps between Xeon and Tilera, ranging from 5.3 to 14.
  • Tilera needs more cycles to complete the same amount of work.

[Chart: normalized cycle counts of Xeon and Tilera for each workload.]

SLIDE 53

Pipeline Efficiency

The theoretical IPC:

  • Xeon: 4 instructions per cycle
  • Tilera: 1 instruction bundle per cycle

Pipeline efficiency:

[Chart: pipeline efficiency of Tilera and Xeon for each workload.]

  • OoO pipelines are more efficient than in-order ones.

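Pipeline efficiency here is the achieved IPC divided by the theoretical peak from the slide; a one-line sketch with invented measured IPCs:

```python
def pipeline_efficiency(measured_ipc: float, theoretical_ipc: float) -> float:
    """Fraction of the theoretical issue width actually used each cycle."""
    return measured_ipc / theoretical_ipc

# Invented measured IPCs; peaks of 4 inst/cycle (Xeon) and
# 1 bundle/cycle (Tile-Gx36) are from the slide.
print(pipeline_efficiency(1.3, 4))  # 0.325 on a Xeon-like core
print(pipeline_efficiency(0.2, 1))  # 0.2 on a Tilera-like core
```

Note the normalization matters: a wide OoO core can show higher efficiency even at a similar raw IPC because it also hides more memory latency.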
SLIDE 54

Power Consumption

  • Tilera is power optimized; Xeon consumes more power.

[Chart: normalized power of Tilera and Xeon for each workload.]

SLIDE 55

Energy Consumption

  • Hadoop based sort consumes less energy on Tilera than on Xeon: Hadoop sort is an extremely I/O-intensive workload.
  • Tilera consumes more energy than Xeon to complete the same amount of work for most big data workloads: the longer execution time offsets the lower power design.

[Chart: normalized energy of Xeon and Tilera for each workload.]

SLIDE 56

Total Cost of Ownership (TCO) Model[*]

  • Three-year depreciation cycle
  • Hardware costs associated with individual components: CPU, Memory, Disk, Board, Power, Cooling

[*] K. Lim et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008

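A toy performance-per-TCO calculation in the spirit of this component-level model; every cost, wattage and throughput figure below is invented, and the cooling term is a crude stand-in for the model's activity-factor treatment rather than the actual formula from Lim et al.:

```python
YEARS = 3  # three-year depreciation cycle, as in the model

def tco(cpu, memory, disk, board, watts,
        price_per_kwh=0.10, cooling_factor=0.75):
    """Hardware cost plus 3-year power cost, with cooling modeled as a
    factor applied on top of the server power cost (a simplification)."""
    hardware = cpu + memory + disk + board
    kwh = watts / 1000 * 24 * 365 * YEARS
    power_cost = kwh * price_per_kwh
    cooling_cost = power_cost * cooling_factor
    return hardware + power_cost + cooling_cost

# Invented component costs (USD) and average power draws (W).
xeon_tco = tco(cpu=1000, memory=400, disk=200, board=300, watts=150)
tile_tco = tco(cpu=600, memory=400, disk=200, board=250, watts=50)

# Performance per TCO: higher throughput per dollar wins.
perf = {"Xeon": 1.0, "Tilera": 0.45}  # invented normalized throughput
ppt = {"Xeon": perf["Xeon"] / xeon_tco,
       "Tilera": perf["Tilera"] / tile_tco}
```

With these made-up numbers the wimpy core's lower power does not compensate for its lower throughput, which mirrors the conclusion two slides ahead for most workloads.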
SLIDE 57

Cost model

The cost data originate from diverse sources:

  • Different vendors
  • Corresponding official websites

Power and cooling: an activity factor of 0.75.


Performance per TCO

  • Hadoop-based Sort has higher performance per TCO on the Tilera.
  • For the other workloads, Xeon outperforms Tilera.

[Chart: normalized performance per TCO of Tilera and of Xeon with Turbo & HT enabled, for each workload.]


Key Takeaways

  • Try using an open-source big data benchmark suite from http://prof.ict.ac.cn/BigDataBench
  • Big Data: data movement dominated computing with more branch operations (92% percentage in terms of instruction mix)
  • Multi-tenancy version: replaying mixed workloads according to publicly available workload traces.
  • Wimpy-core processors only suit a part of big data workloads.

SLIDE 60

BigDataBench HPBDC 2015