

SLIDE 1

CENSUS: Counting Interleaved Workloads on Shared Storage

Si Chen, Jianqiao Liu, Avani Wildani

36th International Conference on Massive Storage Systems and Technology (MSST 2020)


SLIDE 2

How to choose the right storage for a workload?

Cost efficiency: higher throughput, lower latency, lower cost. And what is the best configuration?

  • Sequential write → LSM-tree-based key-value store
  • Fast random read → Flash memory
  • Random write → SSD
  • Lower-speed read and write → HDD
  • …

SLIDE 3

Fair Resource Provisioning for Shared Storage is hard!

Challenge: shared storage is dynamic and interleaved. Smart storage requires capacity prediction and performance management.

We need a deep understanding of the workload!

SLIDE 4

Workload separation for shared storage

[Figure: interleaved access streams (random write, sequential write, random read, sequential read) routed to matching devices: Dev1 SSD, Dev2 log-structured store, Dev3 flash, Dev4 HDD]

SLIDE 5

What exactly shall we separate?

  • Process ID (PID) is a stand-in for non-existent labels.
  • An fworkload is a functionally distinct usage of a storage system, not an application-specific workload.
  • Full isolation is not really shared storage.
  • A single workload can contain several functional usages of storage.

SLIDE 6

Motivation

Existing approaches fail to distinguish interleaved storage fworkloads. Goal: given a block I/O trace, identify the number of fworkloads in a storage system.

Traditional workload characterization:

  • only has a limited set of features (read/write ratio, sequentiality, ...).

The number of concurrent fworkloads is a precursor for separation.

SLIDE 7

Our Approach: Census

Pipeline: feature extraction → classification → number of fworkloads.

  • Time-series analysis (tsfresh). Benefit: hundreds of new feature options.
  • Gradient boosting tree model. Benefit: fast training, interpretable.

LightGBM: leaf-wise tree growth, histogram-based feature binning.
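A minimal sketch of this pipeline's feature-extraction half (the window and feature set are hypothetical; the paper uses the full tsfresh library plus a LightGBM classifier, both omitted here so the example stays self-contained):

```python
import numpy as np

def window_features(addresses):
    """Tiny stand-in for tsfresh: a few summary-statistic and
    dynamics features over one window's block-address series."""
    a = np.asarray(addresses, dtype=float)
    d = np.diff(a)
    return np.array([
        a.mean(),                 # mean address
        a.std(),                  # address spread
        np.abs(d).mean(),         # mean absolute jump (sequentiality proxy)
        np.sqrt(np.dot(d, d)),    # complexity of the address series
    ])

# Toy windows: a sequential stream vs. the same stream with
# random accesses interleaved into it.
rng = np.random.default_rng(0)
seq = np.arange(0, 1000, 8).astype(float)         # sequential LBAs, stride 8
mix = seq.copy()
mix[::4] = rng.integers(0, 10**6, len(mix[::4]))  # interleave random LBAs

f_seq, f_mix = window_features(seq), window_features(mix)
# The interleaved window scores far higher on the complexity feature;
# a gradient-boosted classifier (LightGBM in the paper) is trained on
# vectors like these to predict the fworkload count per window.
print(f_mix[3] > f_seq[3])  # True
```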

SLIDE 8

SLIDE 9

[Figure: inference pipeline predicting the fworkload number]

SLIDE 10

Dataset

  • FIU (Florida International University): nearly three weeks of block I/O traces, covering web-related and home-related domains.
  • MSR (Microsoft Research, Cambridge): one week of block I/O traces from 36 different volumes on 13 enterprise servers.
  • EmoryML (newly collected): 30 days of block I/O traces collected with blktrace from our local server running machine learning workloads.
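Before feature extraction, each trace is cut into time windows; a minimal sketch of that preprocessing (the record format and one-second window length are assumptions, not the paper's exact parameters):

```python
import numpy as np

def split_into_windows(timestamps, lbas, window_s=1.0):
    """Group a block I/O trace into fixed-length time windows.
    Each window's LBA series is what features are extracted from.
    (Hypothetical preprocessing; the window length is an assumption.)"""
    ts = np.asarray(timestamps, dtype=float)
    idx = ((ts - ts[0]) // window_s).astype(int)   # window index per request
    lbas = np.asarray(lbas)
    return [lbas[idx == w] for w in range(idx.max() + 1)]

# Toy trace: 10 requests over ~3 seconds.
ts = [0.1, 0.2, 0.9, 1.1, 1.5, 1.8, 2.2, 2.4, 2.6, 2.9]
lba = [100, 108, 116, 5000, 5008, 124, 132, 9000, 140, 9008]
windows = split_into_windows(ts, lba)
print(len(windows))  # 3
```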

SLIDE 11

Extracted features:

  • Features from summary statistics
  • Additional characteristics of the sample distribution
  • Features derived from observed dynamics

Feature criticality = the count of

SLIDE 12

SLIDE 13

Feature Importance Heatmap

Feature criticality is trace-dependent.

[Figure: feature-importance heatmap across training datasets MSR, EmoryML, and FIU-Home]

SLIDE 14

Sample features: 1) address complexity

It measures the complexity of the address series. A high feature value indicates more random accesses and fewer sequential accesses in the trace, which implies more concurrent workloads during that time window.
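A minimal sketch of such a complexity measure, modeled on tsfresh's cid_ce calculator (the exact feature CENSUS uses comes from tsfresh, so treat this as illustrative):

```python
import numpy as np

def address_complexity(addresses, normalize=True):
    """Complexity estimate of an address series, modeled on tsfresh's
    cid_ce: the square root of the summed squared consecutive
    differences, optionally on the z-normalized series."""
    a = np.asarray(addresses, dtype=float)
    if normalize:
        s = a.std()
        a = (a - a.mean()) / s if s > 0 else a - a.mean()
    d = np.diff(a)
    return float(np.sqrt(np.dot(d, d)))

rng = np.random.default_rng(42)
sequential = np.arange(0, 4096, 8).astype(float)   # one sequential stream
interleaved = sequential.copy()
interleaved[::2] = rng.integers(0, 10**7, len(interleaved[::2]))

# Interleaving random accesses into a sequential stream raises the
# complexity score, the signal described above.
print(address_complexity(sequential) < address_complexity(interleaved))  # True
```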

SLIDE 15

Sample features: 2) address change quantiles

It returns the average absolute consecutive change of the address series between given lower and higher quantiles.

Quantiles divide the data into equally sized groups.
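A simplified version of this feature, modeled on tsfresh's change_quantiles with the mean as the aggregation function (the quantile bounds below are illustrative):

```python
import numpy as np

def address_change_quantiles(addresses, ql=0.25, qh=0.75):
    """Average absolute consecutive change of the address series,
    restricted to the corridor between the ql and qh quantiles
    (a simplified take on tsfresh's change_quantiles)."""
    a = np.asarray(addresses, dtype=float)
    lo, hi = np.quantile(a, ql), np.quantile(a, qh)
    inside = (a >= lo) & (a <= hi)       # corridor membership per sample
    both = inside[:-1] & inside[1:]      # keep changes whose endpoints are inside
    changes = np.abs(np.diff(a))[both]
    return float(changes.mean()) if changes.size else 0.0

# The outlier jump to 500 falls outside the quantile corridor,
# so only the small in-corridor changes are averaged.
addrs = [10, 12, 11, 500, 13, 14, 12, 11]
print(address_change_quantiles(addrs))  # 1.0
```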

SLIDE 16

Model Evaluation

  • Baseline (fairest guess): randomly generate labels based on the fworkload-number distribution in the training set.
  • MAPE (mean absolute percentage error): measures the size of the prediction error.
  • x-accuracy: identifies instances that are approximately correct, counting predictions with error within 1 or 2, respectively, as accurate.
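These metrics are straightforward to sketch (the toy labels below are illustrative, not results from the paper):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error of fworkload-count predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def x_accuracy(y_true, y_pred, x):
    """Fraction of predictions whose absolute error is within x."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) <= x))

def baseline_predict(train_labels, n, seed=0):
    """Fairest-guess baseline: sample labels from the training-set
    distribution of fworkload counts."""
    rng = np.random.default_rng(seed)
    return rng.choice(train_labels, size=n)

y_true = [4, 5, 3, 8]   # true fworkload counts (toy data)
y_pred = [4, 6, 3, 6]   # model predictions (toy data)
print(mape(y_true, y_pred))            # ≈ 11.25
print(x_accuracy(y_true, y_pred, 1))   # 0.75
```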

SLIDE 17

Training method

  • Generalized model: considers multiple domains.
  • ID model: domain-specific.

SLIDE 18

Result of Generalized model

Accuracy score: CENSUS is 23% higher than baseline on average


SLIDE 19

Result of Generalized model

MAPE: CENSUS is 57% better than baseline on average


SLIDE 20

Application: Separating Interleaved fworkloads

The estimate of the number of fworkloads provided by CENSUS decreases the average MSE compared to the fair-guess MSE.

SLIDE 21

Summary

  • CENSUS can identify the number of concurrent fworkloads with as little as 5% error.
  • CENSUS opens the field to insights derivable from formerly overlooked metrics: the LBA carries more effective information than the time interval, and only 30% of the top features are time-related, affecting 1% of the final result.
  • CENSUS improves fworkload separation in a test case.

SLIDE 22

Discussion and Future work

  • Online model: recurrently retrain the model when unknown fworkloads emerge.
  • Find better fworkload labels than PID, e.g. UID or process name.
  • Add more trace attributes for workload characterization, e.g. latency.
  • Try workload separation on a large-scale dataset.

SLIDE 23

Thank you! Questions?

si.chen2@emory.edu https://github.com/meditates/CENSUS