slide-1
SLIDE 1

Computing While Charging

Building a Distributed Computing Infrastructure Using Smartphones

Mustafa Y. Arslan - Indrajeet Singh - Shailendra Singh Harsha V. Madhyastha - Karthikeyan Sundaresan - Srikanth V. Krishnamurthy

Thursday, December 13, 12

slide-2
SLIDE 2

Smartphones and Computing

  • Smartphones now ship with higher CPU clock speeds, more CPU cores, and so on.
  • They are real computers in our pockets.
  • Enterprises are also adopting smartphones.
  • Problem: the real computing power of smartphones is yet to be tapped.
  • The battery drains quickly, so phones spend long periods idle on the charger (e.g., at night).

slide-9
SLIDE 9

Idle Phones Put Back to Work

  • Use idle charging periods to execute distributed jobs.
  • An enterprise has n phones: 100s to 1000s of smartphones working in parallel.
  • Why would I bother? I have a datacenter full of PCs.
  • Untapped horsepower: phones will run the same code as your servers.
  • No cost to bootstrap (the phones are already in hand).
  • A phone consumes ~5% of the energy of a PC.
  • No wiring, switching, or cooling needed.
  • This makes the case for smartphones as a supplement to existing computational systems.

slide-10
SLIDE 10

The Desired System

[Figure (animation): a central server sends the job binary and input to the phones; one phone is marked "X unplugged"; the RESULT is returned to the central server.]

slide-22
SLIDE 22

Our Contribution

  • Design and implement CWC.
  • What is not novel: utilizing idle CPUs (e.g., Condor: A Hunter of Idle Workstations [Litzkow, Livny, Mutka]).
  • What is novel: an algorithm to optimally distribute computation across smartphones with non-uniform bandwidths.
  • Non-uniform wireless bandwidth calls for novel schedulers (such as CWC).
  • Unique challenges not previously addressed.

[Figure: battery % vs. time (minutes), comparing the default charging curve with charging while running CPU-intensive jobs.]

slide-23
SLIDE 23

Effect of Bandwidth on Scheduling

  • The server holds a queue of files to be processed by the phones; a file stays queued until the next phone becomes available.
  • The server logs the service time (queueing + transfer + processing) of each file.
  • Phones have identical CPUs but varying bandwidths.

[Figure: server with a queue of files feeding the phones.]

slide-26
SLIDE 26

Effect of Bandwidth on Scheduling

  • The two phones with the lowest bandwidths are removed and the experiment is repeated.

[Figure: server with a queue of files; phones have identical CPUs but varying bandwidths.]

slide-28
SLIDE 28

Too Much Parallelism Hurts

  • Using only phones with high bandwidth can compensate for the reduced number of worker phones.
  • Leveraging the full parallelism is not a straightforward choice!

[Figure: CDF of service time (ms), 4 phones vs. 6 phones.]
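
To make the effect concrete, here is a minimal simulation sketch of the queueing experiment. It is not the measurement code behind the CDF: the per-file costs (1 s on a high-bandwidth phone, 10 s on a low-bandwidth one), the file count, and the name `service_times` are all hypothetical, chosen only to illustrate why adding slow phones can hurt.

```python
import heapq

def service_times(num_files, phone_cost_s):
    """Simulate a server queue where each file goes to the next free phone.

    phone_cost_s[i] is the assumed transfer + processing time (seconds) of one
    file on phone i. All files are queued at t = 0, so a file's service time
    (queueing + transfer + processing) equals its completion time.
    """
    free_at = [(0.0, i) for i in range(len(phone_cost_s))]  # (time free, phone)
    heapq.heapify(free_at)
    times = []
    for _ in range(num_files):
        t, i = heapq.heappop(free_at)        # next phone to become available
        done = t + phone_cost_s[i]
        times.append(done)
        heapq.heappush(free_at, (done, i))
    return times

# Hypothetical numbers: four fast phones, optionally plus two slow ones.
four = service_times(8, [1.0] * 4)
six = service_times(8, [1.0] * 4 + [10.0] * 2)
print(max(four), max(six))   # worst-case service time with 4 vs. 6 phones
```

With these made-up costs, the two slow phones each grab a file immediately and hold it for 10 s, so the tail of the service-time distribution gets worse even though more phones are working.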

slide-33
SLIDE 33

Problem Statement

  • Given a set of jobs and their input files, how do we partition and distribute these files across a set of smartphones to minimize the makespan?
  • Local execution time depends on the CPU clock speed, the job itself, and the input partition size.
  • File transfer time depends on the bandwidth and the input partition size.

[Figure: timeline showing file transfer time and local execution time, with the makespan marked on the time axis.]
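
As a rough sketch of the objective, the makespan can be read as the finish time of the busiest phone. The function below assumes that each phone handles its partitions one after another, with no overlap between transfer and execution; that serialization, the name `makespan_ms`, and the sample numbers are assumptions for illustration only.

```python
def makespan_ms(assignment):
    """Return the makespan of an assignment, in milliseconds.

    assignment maps a phone name to a list of (transfer_ms, exec_ms) pairs,
    one pair per input partition sent to that phone. Assumes the phone
    processes its partitions strictly in sequence.
    """
    return max(
        (sum(transfer + execute for transfer, execute in parts)
         for parts in assignment.values()),
        default=0,
    )

# Hypothetical assignment: phone A gets two partitions, phone B gets one.
plan = {"A": [(100, 400), (50, 200)], "B": [(300, 300)]}
print(makespan_ms(plan))  # phone A is the last to finish
```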

slide-40
SLIDE 40

Predicting Execution Times

  • The bandwidth to each phone is measured periodically, but the local execution time cannot be measured for every job-phone pair.
  • Run each job j on the slowest phone in the system, e.g., one with an S MHz CPU.
  • If the slowest phone takes Ts ms, then another phone with an A MHz CPU should take Ts * S / A ms.
  • Wrong estimates are corrected using execution reports sent to the central server.
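
The scaling rule Ts * S / A can be sketched in a few lines. The slides do not say how execution reports are used to correct wrong estimates, so the `Corrector` class below is purely a hypothetical illustration (it rescales later predictions by the latest observed/predicted ratio); the names and numbers are assumptions.

```python
def predict_ms(ts_ms, slowest_mhz, phone_mhz):
    """Estimate execution time on a phone from the slowest phone's time: Ts * S / A."""
    return ts_ms * slowest_mhz / phone_mhz

class Corrector:
    """Hypothetical correction step; the deck does not specify the mechanism."""

    def __init__(self):
        self.factor = {}  # phone name -> latest observed/predicted ratio

    def report(self, phone, predicted_ms, observed_ms):
        # Record how far off the last prediction was for this phone.
        self.factor[phone] = observed_ms / predicted_ms

    def corrected(self, phone, predicted_ms):
        # Phones without a report keep the raw prediction.
        return predicted_ms * self.factor.get(phone, 1.0)

# Hypothetical numbers: the slowest phone (800 MHz) needs 3000 ms.
estimate = predict_ms(3000, 800, 1200)          # estimate for a 1200 MHz phone
fix = Corrector()
fix.report("phone-2", predicted_ms=estimate, observed_ms=1500)
print(estimate, fix.corrected("phone-2", 4000))
```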

slide-44
SLIDE 44

Minimum Height Bin Packing

  • Given a finite set of items U, a size for each item in U, and a bin capacity C,
  • partition U into disjoint sets U1, U2, ..., Un such that the sum of the item sizes in each Ui is <= C.

[Figure: items packed into Bin 1, Bin 2, and Bin 3, each with capacity C.]

slide-50
SLIDE 50

Minimum Height Bin Packing

  • Each job input is an item, but not a rigid one: it can be partitioned and packed into different bins.
  • Partitioning has a cost: transferring files adds height, depending on the phone's bandwidth.
  • Phones are bins, but they are not identical.
  • Items occupy different heights depending on the bin they are packed in (e.g., items behave like liquids).

[Figure: Bin 1, Bin 2, Bin 3.]
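
One way to picture "different heights in different bins" is a per-phone height function. The sketch below assumes that both transfer time and execution time grow linearly with partition size, which the slides do not state; `Phone`, `height_ms`, and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Phone:
    bandwidth_kb_per_ms: float  # measured link bandwidth
    cpu_mhz: float              # CPU clock speed

def height_ms(part_kb, exec_ms_per_kb_slowest, slowest_mhz, phone):
    """Height (time in ms) that a partition of part_kb occupies on this phone."""
    transfer = part_kb / phone.bandwidth_kb_per_ms
    # Execution cost profiled on the slowest phone, scaled by clock speed.
    execute = part_kb * exec_ms_per_kb_slowest * slowest_mhz / phone.cpu_mhz
    return transfer + execute

# The same 100 KB partition occupies very different heights in two bins.
fast = Phone(bandwidth_kb_per_ms=1.0, cpu_mhz=1600)
slow = Phone(bandwidth_kb_per_ms=0.125, cpu_mhz=800)
print(height_ms(100, 2.0, 800, fast), height_ms(100, 2.0, 800, slow))
```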

slide-61
SLIDE 61

Scheduling Jobs

  • Try to produce few partitions, to reduce the aggregation load at the central server, while minimizing C.

[Figure (animation): inputs from a sorted list are packed into bins of capacity C.]
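
The slides show the packing only as an animation, so the following is a hedged sketch of one plausible reading, not the CWC scheduler itself: binary-search the capacity C, and for each candidate C greedily place inputs (largest first) whole when they fit, splitting them across phones only when they do not. The linear cost model and all names (`cost_per_kb`, `try_pack`, `schedule`) are assumptions.

```python
EPS = 1e-6

def cost_per_kb(exec_ms_per_kb, phones):
    """ms needed per KB on each phone: transfer plus clock-scaled execution."""
    slowest = min(mhz for _, mhz in phones)
    return [1.0 / bw + exec_ms_per_kb * slowest / mhz for bw, mhz in phones]

def try_pack(inputs, phones, capacity):
    """Greedy packing for one candidate capacity (ms); None if it does not fit."""
    used = [0.0] * len(phones)                 # current height of each bin
    plan = {p: [] for p in range(len(phones))}
    for name, size_kb, exec_ms_per_kb in sorted(inputs, key=lambda x: -x[1]):
        cost = cost_per_kb(exec_ms_per_kb, phones)
        whole = [p for p in range(len(phones))
                 if used[p] + size_kb * cost[p] <= capacity + EPS]
        if whole:                              # prefer not to partition
            p = min(whole, key=lambda q: used[q] + size_kb * cost[q])
            used[p] += size_kb * cost[p]
            plan[p].append((name, size_kb))
            continue
        remaining = size_kb                    # otherwise pour into free room
        for p in sorted(range(len(phones)), key=lambda q: cost[q]):
            room_kb = max(0.0, capacity - used[p]) / cost[p]
            take = min(remaining, room_kb)
            if take > EPS:
                used[p] += take * cost[p]
                plan[p].append((name, take))
                remaining -= take
        if remaining > EPS:
            return None
    return plan

def schedule(inputs, phones, rounds=60):
    """Binary-search the smallest capacity for which the greedy packing fits."""
    lo = 0.0
    hi = max(sum(size * cost_per_kb(e, phones)[p] for _, size, e in inputs)
             for p in range(len(phones)))      # everything on one phone
    best = try_pack(inputs, phones, hi)
    for _ in range(rounds):
        mid = (lo + hi) / 2
        plan = try_pack(inputs, phones, mid)
        if plan is None:
            lo = mid
        else:
            hi, best = mid, plan
    return hi, best

# Hypothetical setup: phones are (bandwidth in KB/ms, CPU MHz) pairs,
# inputs are (name, size in KB, exec ms per KB on the slowest phone).
phones = [(1.0, 1000), (1.0, 1000)]
inputs = [("a", 100, 1.0), ("b", 100, 1.0)]
capacity, plan = schedule(inputs, phones)
print(round(capacity), plan)
```

Placing an input whole whenever possible is what keeps the number of partitions low; splitting is the fallback that lets C shrink further.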

slide-64
SLIDE 64

Automating Job Execution

  • We implement the CWC service on Android.
  • It runs in the background, with no human input.
  • It exploits the compatibility between the JVM and Dalvik (a core subset of Java APIs is common to both).
  • It leverages the Java Reflection API to dynamically load classes and execute the methods they define.
  • The same Java code runs on both PCs and smartphones!

[Figure: pipeline .java -> .class -> .dex -> load -> exec, spanning the server (traditional Java) and the phone (Android).]
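
The idea of loading a class by name and invoking one of its methods at run time can be illustrated with a Python analog. This is not the CWC code, which per the slide uses the Java Reflection API on Android; `run_task` is a hypothetical helper, and the standard-library `collections.Counter` merely stands in for a shipped job class.

```python
import importlib

def run_task(module_name, class_name, ctor_args, method_name, *args):
    """Load a class by name at run time, instantiate it, and call one method."""
    module = importlib.import_module(module_name)   # locate the code by name
    cls = getattr(module, class_name)               # look up the class object
    instance = cls(*ctor_args)
    return getattr(instance, method_name)(*args)    # invoke the named method

# Word-count flavored example using a standard-library class as the "job".
result = run_task("collections", "Counter", (["a", "b", "a"],), "most_common", 1)
print(result)
```

The caller needs only strings naming the module, class, and method, which is what lets a worker execute jobs it was not compiled against.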

slide-65
SLIDE 65

Setup

  • 18 Android smartphones running the CWC software.
  • Lightweight central server:
      • Amazon EC2 small instance (< 2 GB RAM)
      • Multi-threaded Java NIO implementation
  • Workload: 3 types of tasks
      1. Count the prime numbers in a file (50 files)
      2. Count the occurrences of a target word in a file (50 files)
      3. Blur pixels in a photo (atomic; 50 photos)

Connectivity: 802.11a/g, EDGE, 3G, 4G
CPU speed: 806 MHz to 1.5 GHz, single and dual core

slide-70
SLIDE 70

Results

  • Makespan is 1120 seconds.
  • 88% of the jobs are not partitioned (i.e., they run on one phone), 9% have 3 partitions, and 3% have 4 partitions.
  • Balanced assignment for phones 4, 12, 13, 14 (and for other phones not shown).
  • Phones 2 and 9 finish earlier than the others (because they are "faster" than predicted).

[Figure: shows a subset of the phones.]

  • How about full parallelism?
      • Each job has |P| partitions (one partition per phone): makespan is 1720 seconds.
      • Each job is assigned to a phone (as a whole) in round-robin (RR) fashion: makespan is 1805 seconds.
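
For a quick sense of scale, the three makespans reported above can be turned into relative reductions. The helper name `reduction_pct` is just for illustration; the inputs are the numbers from this slide.

```python
def reduction_pct(baseline_s, cwc_s):
    """Percentage by which the CWC makespan is shorter than a baseline."""
    return 100.0 * (baseline_s - cwc_s) / baseline_s

CWC_S = 1120           # CWC scheduler
FULL_PARALLEL_S = 1720 # every job split into |P| partitions
ROUND_ROBIN_S = 1805   # whole jobs assigned round-robin

print(f"vs. full parallelism: {reduction_pct(FULL_PARALLEL_S, CWC_S):.1f}% shorter")
print(f"vs. round-robin:      {reduction_pct(ROUND_ROBIN_S, CWC_S):.1f}% shorter")
```

That works out to roughly a third less time than either alternative.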

slide-71
SLIDE 71

THANK YOU!

QUESTIONS?
