DSDS: DATA STORE DRIVEN APPLICATION SCHEDULING
Frezewd Lemma Tena, Christof Fetzer TU Dresden, Germany
MOTIVATION
➤ Context: reduce end-user perceived latency
  ➤ move computing closer to the end user
  ➤ how to build an edge cloud?
➤ Problem: cost of building and operating an edge cloud
➤ Objective: reduce the TCO of an edge cloud
  ➤ electricity costs
  ➤ cost of hosting and maintaining computing infrastructure
SYSTEM MODEL
➤ Distributed edge cloud
  ➤ connected to heating system
  ➤ each micro-cloud provides compute & storage resources
➤ Cost of computing depends on
  ➤ need for heat / hot water (of building)
  ➤ local electricity cost
    ➤ local solar power
[Figure: micro-cloud with solar panel (© Cloud & Heat)]
OBSERVATION 1: WE NEED TO INCREASE UTILIZATION
➤ Infrastructure permits us to
  ➤ reduce user-perceived latency
➤ To reduce TCO, micro-clouds need to support more application domains:
  ➤ compute-heavy jobs (protein folding, …)
  ➤ store backups
  ➤ store replicas of data
  ➤ data mining jobs (accessing one of the replicas)
  ➤ …
OBSERVATION 2: CUT DOWN POWER COSTS
➤ To reduce electricity costs, we can
  ➤ use lower-cost solar power
  ➤ sell the "waste heat" of the computers
  ➤ let computers hibernate to reduce power consumption
➤ Difficult scheduling problem!
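Two of these levers (solar power and selling waste heat) can be folded into one "effective" electricity cost per job. The slides do not give a formula, so the following is only a minimal sketch under assumed conditions: a linear price model, all consumed electricity turning into heat, and heat being sellable only up to the building's current demand. The `Site` fields and all prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-microcloud parameters (values are illustrative only)."""
    grid_price: float       # EUR per kWh bought from the grid
    solar_price: float      # EUR per kWh of local solar power
    solar_kwh: float        # solar energy currently available
    heat_price: float       # EUR earned per kWh of waste heat sold
    heat_demand_kwh: float  # heat the building can currently absorb


def effective_cost(site: Site, job_kwh: float) -> float:
    """Net electricity cost of a job that consumes job_kwh at this site."""
    solar_used = min(job_kwh, site.solar_kwh)        # use cheap solar first
    grid_used = job_kwh - solar_used                 # buy the rest from the grid
    heat_sold = min(job_kwh, site.heat_demand_kwh)   # assumed: all power becomes heat
    return (solar_used * site.solar_price
            + grid_used * site.grid_price
            - heat_sold * site.heat_price)


# Two made-up sites: one with solar but no heat demand, one the other way round.
sunny = Site(grid_price=0.30, solar_price=0.10, solar_kwh=5.0,
             heat_price=0.05, heat_demand_kwh=0.0)
cold = Site(grid_price=0.30, solar_price=0.10, solar_kwh=0.0,
            heat_price=0.05, heat_demand_kwh=10.0)
print(effective_cost(sunny, 4.0), effective_cost(cold, 4.0))
```

With these made-up numbers the sunny site is cheaper, but the ranking changes with the time of day and the season, which is part of what makes the scheduling problem difficult.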
PROBLEM ADDRESSED
➤ In which microcloud should we run a compute job?
  ➤ e.g., data mining jobs that access stored data
➤ Naive approach:
  ➤ run at the microcloud that has the lowest effective electricity costs
➤ Problem:
  ➤ data too large to move to another microcloud before running the compute job
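The difference between the naive approach and a placement that respects data location can be sketched as follows. This is an illustration only, assuming that a per-microcloud effective cost is already known and that we know which microclouds hold a replica of the job's data; the function names, site names, and cost values are hypothetical.

```python
def naive_placement(costs: dict[str, float]) -> str:
    """Pick the microcloud with the lowest effective electricity cost."""
    return min(costs, key=costs.get)


def data_aware_placement(costs: dict[str, float], replica_sites: set[str]) -> str:
    """Pick the cheapest microcloud among those that already hold a replica."""
    candidates = {site: c for site, c in costs.items() if site in replica_sites}
    if not candidates:
        raise ValueError("no microcloud holds a replica of the job's data")
    return min(candidates, key=candidates.get)


# Made-up effective costs; the job's data lives only in microcloud1 and microcloud3.
costs = {"microcloud1": 0.9, "microcloud2": 0.4, "microcloud3": 0.6}
replica_sites = {"microcloud1", "microcloud3"}

print(naive_placement(costs))                      # cheapest site overall
print(data_aware_placement(costs, replica_sites))  # cheapest site that has the data
```

The naive choice would require moving the data first; the data-aware choice avoids the transfer at the price of a somewhat higher electricity cost.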
NODE ARCHITECTURE (COST-EFFECTIVE PLATFORM)
[Figure: nodes connected via Ethernet; each node is a server for computing & storage with several disks]
➤ Nodes are not energy-proportional
➤ Example: access to one disk requires the server to be in "active state"
REPLICATION OF DATA
➤ Typically, we keep 3 replicas (N = 3): R1, R2, R3
  ➤ R1 lives in microcloud 1, R2 in microcloud 2, R3 in microcloud 3
➤ Write (W): 3, Read (R): 1
  ➤ satisfies R + W > N
➤ For writing: all three disks/servers need to be active
➤ For reading: one disk/server needs to be active
➤ Problem: this might require keeping all servers & disks in "active state"
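The condition R + W > N guarantees that every read quorum shares at least one replica with every write quorum. A small sketch that checks the inequality and, for illustration, confirms it by brute force over all possible quorums (the function names are ours, not from the slides):

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Quorum condition from the slide: R + W > N."""
    return r + w > n


def quorums_always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every read quorum share a replica with every write quorum?"""
    nodes = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(nodes, r)
               for wq in combinations(nodes, w))


# Configuration from the slide: N = 3, R = 1, W = 3.
print(quorums_overlap(3, 1, 3), quorums_always_intersect(3, 1, 3))
# A configuration that violates the condition: N = 3, R = 1, W = 2.
print(quorums_overlap(3, 1, 2), quorums_always_intersect(3, 1, 2))
```

With W = 3 every write touches all replicas, which is why a read from a single active replica is sufficient, and also why writes would keep all three servers awake.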
POWERCASS ARCHITECTURE
[Figure: DHT]
➤ Approach: dormant and sleep peers can go into "hibernation mode"
REPLICATION ACROSS MICRO CLOUDS
[Figure: nodes in microcloud 1, microcloud 2, microcloud 3]
➤ We can always read data from an active node
WRITING TO SWITCHED-OFF NODES
[Figure: a write arrives at the active nodes, which keep hinted handoffs for the switched-off nodes; microcloud 1, microcloud 2, microcloud 3]
➤ Can always write: hinted handoff using the active nodes
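Hinted handoff can be sketched in a few lines. This is a simplified illustration of the general technique, not the PowerCass implementation: the `Node` class and the `write` and `wake_up` functions are hypothetical, hints are kept in memory, and all hints are parked on the first active replica.

```python
class Node:
    """A replica holder that is either active or switched off."""

    def __init__(self, name: str, active: bool = True):
        self.name = name
        self.active = active
        self.data = {}   # key -> value stored on this node
        self.hints = []  # (target_name, key, value) kept for switched-off nodes


def write(replicas: list, key: str, value: str) -> None:
    """Write to all active replicas; park a hint for each switched-off one."""
    active = [n for n in replicas if n.active]
    if not active:
        raise RuntimeError("no active replica available")
    for node in replicas:
        if node.active:
            node.data[key] = value
        else:
            active[0].hints.append((node.name, key, value))


def wake_up(node: Node, replicas: list) -> None:
    """Switch a node on and replay the hints that peers kept for it, in order."""
    node.active = True
    for peer in replicas:
        if peer is node:
            continue
        remaining = []
        for target, key, value in peer.hints:
            if target == node.name:
                node.data[key] = value
            else:
                remaining.append((target, key, value))
        peer.hints = remaining


n1, n2, n3 = Node("n1"), Node("n2", active=False), Node("n3", active=False)
write([n1, n2, n3], "k", "v")   # succeeds although two replicas are off
wake_up(n2, [n1, n2, n3])       # n2 catches up from the hint stored on n1
print(n2.data, len(n1.hints))
```

After the wake-up, the hint for the still switched-off node stays parked on the active node until that node is switched on as well.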
APPLICATION ASSUMPTIONS
➤ We assume that we
  ➤ know what data will be accessed by an application
  ➤ know if a job is "short" or "long" running
[Figure: an application and the App's data]
➤ Where should we execute App?
NODES
➤ daily load pattern
SCHEDULING IDEA: LOW LOAD
➤ all apps run on the active nodes
SCHEDULING IDEA: MEDIUM LOAD
➤ switch on dormant machines to access the "dormant" replica
➤ need an estimate of the running time
SCHEDULING IDEA: HIGH LOAD
➤ switch on sleepy machines in the third microcloud
➤ also run apps on sleepy nodes
SCHEDULING IDEA: HIGH LOAD
➤ run in the microcloud that minimises cost
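The three load levels can be read as progressively widening the set of node classes that may be used: active nodes only, then dormant nodes as well, then sleepy nodes too. A minimal sketch of that reading, assuming hypothetical load thresholds (0.3 and 0.7), made-up costs, and a cost-minimising choice at every load level (the slides state it only for high load):

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    DORMANT = "dormant"
    SLEEPY = "sleepy"


def eligible_roles(load: float, low: float = 0.3, high: float = 0.7) -> set:
    """Node classes that may be used at a given load (thresholds are assumptions)."""
    if load < low:
        return {Role.ACTIVE}
    if load < high:
        return {Role.ACTIVE, Role.DORMANT}
    return {Role.ACTIVE, Role.DORMANT, Role.SLEEPY}


def schedule(load: float, replicas: dict) -> str:
    """Among the microclouds whose replica node is eligible, pick the cheapest."""
    roles = eligible_roles(load)
    candidates = {site: cost for site, (role, cost) in replicas.items() if role in roles}
    if not candidates:
        raise ValueError("no eligible replica at this load level")
    return min(candidates, key=candidates.get)


# Made-up example: role of the replica node and effective cost per microcloud.
replicas = {
    "microcloud1": (Role.ACTIVE, 0.9),
    "microcloud2": (Role.DORMANT, 0.5),
    "microcloud3": (Role.SLEEPY, 0.3),
}
for load in (0.1, 0.5, 0.9):
    print(load, schedule(load, replicas))
```

A fuller version would also use the running-time estimate, since switching on a dormant or sleepy machine only pays off for jobs that run long enough.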
NEXT STEPS: SWITCH ROLES OF NODES
➤ Problem:
  ➤ static classification into active / dormant / sleep is not optimal
➤ Approach:
  ➤ switch "roles" of nodes to reduce cost of computation
➤ Example:
  ➤ swap roles of sleepy and dormant nodes at different sites
EXAMPLE
[Figure: microcloud 1, microcloud 2, microcloud 3; annotation "cost > cost" comparing two sites]
SWITCH ROLE OF NODES: ACTIVE VS DORMANT
[Figure: nodes A and B in microcloud 1, microcloud 2, microcloud 3; annotation "cost > cost"]
PROBLEMS
➤ What if nodes A and B do not store identical content?
  ➤ we might not be able to simply change roles of A and B!
➤ How to address this?
  ➤ keep nodes identical (bad for durability)
  ➤ migrate data locally to a different class of node
  ➤ …
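One way to decide whether A and B can simply swap roles is to compare a digest of their contents before swapping. The slides do not describe such a check, so this is only a sketch of one possible approach: each node's store is modelled as a plain dictionary, and the function names are hypothetical.

```python
import hashlib


def content_digest(store: dict) -> str:
    """Order-independent SHA-256 digest over all (key, value) pairs of a node."""
    h = hashlib.sha256()
    for key in sorted(store):
        k = key.encode()
        v = store[key]
        # Length prefixes keep adjacent keys and values from running together.
        h.update(len(k).to_bytes(4, "big"))
        h.update(k)
        h.update(len(v).to_bytes(8, "big"))
        h.update(v)
    return h.hexdigest()


def can_swap_roles(store_a: dict, store_b: dict) -> bool:
    """Roles can be swapped directly only if both nodes hold identical content."""
    return content_digest(store_a) == content_digest(store_b)


node_a = {"x": b"1", "y": b"2"}
node_b = {"y": b"2", "x": b"1"}   # same content, different insertion order
node_c = {"x": b"1"}              # misses key "y"
print(can_swap_roles(node_a, node_b), can_swap_roles(node_a, node_c))
```

If the digests differ, the missing data would first have to be migrated, as in the second option above.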
CURRENT WORK
➤ Address security concerns (due to limited physical security)
➤ Motivation:
  ➤ we need to keep the data encrypted
  ➤ data mining job needs the encryption key: how to keep this secure?
➤ Approach: Docker-compatible secure framework
  ➤ provide secure computation based on Intel SGX (SCONE, OSDI 2016; SGXBounds, EuroSys 2017)
SUMMARY
➤ We are working on an edge cloud that combines
  ➤ energy efficiency, and
  ➤ low latency (edge cloud)
➤ We want to use this edge cloud to
  ➤ store and process data
➤ Showed: smart scheduling can reduce the cost of computation
➤ Current work:
  ➤ further improve energy efficiency
  ➤ address security issues