Swarm: Transparently distributed computation in the cloud (Ian Clarke)



SLIDE 1

Swarm

Ian Clarke

ian@uprizer.com

“Transparently distributed computation in the cloud”

Sunday, September 13, 2009
SLIDE 2

Swarm

Ian Clarke

ian.clarke@gmail.com

“Transparently Distributed Computation in the cloud”

SLIDE 3

About me

  • Degree in AI and Comp Sci from Edinburgh University, Scotland (1995-1999)
  • Designer and co-ordinator of Freenet, the first decentralized P2P architecture (1999-present)
  • Designed P2P video streaming system that later became part of “Joost” (2003-2004)

  • Founder and Chief Scientist of Revver (2004-2006)
  • CEO of Uprizer Labs (2007-present)
SLIDE 4

The Problem

SLIDE 6

Building a web-app?

You want your development process to be:

  • Cheap and fast to implement
  • Scalable in the event of success

Pick One!

SLIDE 7

Examples

SLIDE 10
  • 22-hour outage after IPO in 1999
  • Estimated cost: over $2M
SLIDE 12
  • Periodic outages since it started, most recently August ’09
  • Forced fundamental rearchitecture
  • Aside: Started with Ruby on Rails, now using Scala

SLIDE 14

How is this solved today?

SLIDE 15

Database Architecture

[Diagram: one MySQL database and three WebNode + Cache pairs]

SLIDE 16

Replicate databases

[Diagram: three MySQL databases and three WebNode + Cache pairs]

SLIDE 17

Map Reduce

  • Certain problems may be broken into “map” and “reduce” operations
  • Interesting because the data stays still, the computation moves
  • Good at things like distributed sort, distributed grep, etc.
  • Not general-purpose
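
A minimal sketch of the map/reduce idea, in Python rather than anything shown in the talk. The node names, sample lines, and function names are made up for illustration. The point is only that the map step runs against each node's local data, and only the small per-node results travel to the reduce step.

```python
# Hypothetical cluster: each node holds its own lines and never ships them elsewhere.
NODES = {
    "node1": ["swarm moves code", "data stays put"],
    "node2": ["map then reduce", "swarm scales"],
}


def map_phase(lines, pattern):
    """Runs where the data lives: keep only the lines containing the pattern."""
    return [line for line in lines if pattern in line]


def reduce_phase(partials):
    """Runs once, on the small per-node results: merge and sort them."""
    merged = []
    for partial in partials:
        merged.extend(partial)
    return sorted(merged)


def distributed_grep(nodes, pattern):
    # In a real system each map_phase call would execute on a different machine.
    partials = [map_phase(lines, pattern) for lines in nodes.values()]
    return reduce_phase(partials)


matches = distributed_grep(NODES, "swarm")
print(matches)
```

This shape fits grep and sort well, but a program that is not naturally a map followed by a reduce has to be contorted to fit it, which is the "not general-purpose" limitation.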
SLIDE 18

Our Proposal: Swarm

SLIDE 19

But first... Some background

SLIDE 20

Scala

SLIDE 23

Scala

  • Compiles to Java bytecode, so it's fast and widely supported
  • Supports closures and type inference, so it solves most of Java’s problems
  • The upcoming Scala 2.8 supports “portable continuations”

SLIDE 24

Continuations

SLIDE 25

What do continuations do?

  • Store the state of a computer program
  • Like saving your position in a video game
  • Resume execution at some point in the future
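
As a rough analogy only, a Python generator shows the "save your position, resume later" behaviour. This is not Scala's continuation support: a generator lives inside one process and cannot be copied to another machine. The function name and values are hypothetical.

```python
def running_total():
    """Pauses after every step; its local state (total) survives each pause."""
    total = 0
    for step in (1, 2, 3):
        total += step
        yield total  # "save point": execution stops here until resumed


saved = running_total()

first = next(saved)   # runs up to the first save point
# ... the caller is free to do unrelated work here ...
second = next(saved)  # resumes exactly where it left off, with total intact

print(first, second)
```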

SLIDE 30

Scala 2.8’s continuations support

  • “Delimited”
  • Portable
  • Implemented through a code transformation
  • Complicated!
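
To give a feel for what "a code transformation" can mean here, the sketch below rewrites a tiny function by hand into continuation-passing style, where "the rest of the program" becomes an explicit function argument. This is an illustration in Python with made-up function names, not the output of the Scala compiler, which performs its transformation automatically.

```python
def direct(x):
    """Ordinary direct-style code: the 'rest of the program' is implicit."""
    y = x + 1
    z = y * 2
    return z


# The same steps with the continuation k passed explicitly.
def add_one_cps(x, k):
    return k(x + 1)


def double_cps(y, k):
    return k(y * 2)


def program_cps(x, k):
    return add_one_cps(x, lambda y: double_cps(y, k))


def program_paused(x, k):
    """Stops after the first step and hands back the remaining work as a value."""
    return add_one_cps(x, lambda y: (y, lambda: double_cps(y, k)))


def identity(value):
    return value


intermediate, rest = program_paused(3, identity)
print(direct(3), program_cps(3, identity), intermediate, rest())
```

Once the remaining work is an ordinary value (`rest` above), it can be stored, resumed later, or in principle handed to another machine.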
SLIDE 31

The Solution

SLIDE 32

What if we could distribute data and computation across multiple computers such that the programmer need not think about it?

SLIDE 36

But how?

  • Move the computation, not the data
  • Handle this transparently within the framework
  • Arrange the data to minimize movement of the computation

SLIDE 37

How does it work?

Program:

  1. print a
  2. print b
  3. print c

a b c
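
The three-line program above can be simulated in a few lines of Python. This is a toy model under stated assumptions, not Swarm's implementation: the placement of a, b, and c on two nodes, the stored values, and the idea of representing the continuation as a small JSON-serialisable dictionary are all invented for illustration.

```python
import json

# Hypothetical layout: which node owns which variable, and what each node stores.
PLACEMENT = {"a": "node1", "b": "node2", "c": "node1"}
STORE = {
    "node1": {"a": "apple", "c": "cherry"},
    "node2": {"b": "banana"},
}
PROGRAM = ["a", "b", "c"]  # "print a; print b; print c"


def run(program, start_node):
    node = start_node
    # The continuation: where we are in the program, plus results so far.
    state = {"pc": 0, "output": []}
    hops = []
    while state["pc"] < len(program):
        key = program[state["pc"]]
        owner = PLACEMENT[key]
        if owner != node:
            # The data stays put; the serialised continuation travels instead.
            wire = json.dumps(state)
            hops.append((node, owner))
            node = owner
            state = json.loads(wire)
        state["output"].append(STORE[node][key])
        state["pc"] += 1
    return state["output"], hops


output, hops = run(PROGRAM, "node1")
print(output, hops)
```

With this placement the program hops to node2 for b and back to node1 for c, so two migrations are needed to print three values.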

SLIDE 41

Arranging data with graph clustering
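
The slide text does not describe the clustering algorithm itself, so the sketch below only shows why placement matters: it counts how often a computation would have to migrate for a given access sequence under two hypothetical placements. The access pattern, item names, and node numbers are assumptions.

```python
def count_migrations(accesses, placement):
    """Count consecutive accesses whose data items live on different nodes."""
    hops = 0
    for previous, current in zip(accesses, accesses[1:]):
        if placement[previous] != placement[current]:
            hops += 1
    return hops


# Hypothetical access pattern: a and b are used together, as are c and d.
ACCESSES = ["a", "b", "a", "b", "c", "d", "c", "d"]

SCATTERED = {"a": 1, "b": 2, "c": 1, "d": 2}  # related items split across nodes
CLUSTERED = {"a": 1, "b": 1, "c": 2, "d": 2}  # related items kept together

print(count_migrations(ACCESSES, SCATTERED), count_migrations(ACCESSES, CLUSTERED))
```

A clustering step would aim to find placements like the second one, where items that are accessed together share a node.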

SLIDE 42

Forcing Swarm to migrate the continuation

SLIDE 44

Forced remote variable

SLIDE 46

What next?

  • Just a simple prototype
  • Many interesting sub-problems
  • Open source
  • Need your help!
SLIDE 47

Storage

  • How do we arrange the data for optimal efficiency?

  • What about concurrency?
  • Software transactional memory
  • Replication and redundancy
  • Garbage collection
SLIDE 48

A “universal” codebase

  • Swarm requires that every node has the same binary
  • We could use the JVM’s classloader mechanism to retrieve binaries as needed from a global namespace
  • Will need to address issues of versioning and security

SLIDE 49

“Swarm aware” libraries

  • Need “Swarm aware” collection classes like Map, List, and Set
  • Develop a storage system with capabilities similar to a relational database
  • The creation of a web framework around Swarm (similar to “Rails” or “LiftWeb”)

SLIDE 50

Swarm tools

  • Continuations plugin imposes restrictions on the code that can be migrated
      • “foreach”
      • Serializable
  • A Scala compiler plugin that understood these limitations would be very useful
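
The "Serializable" restriction has a loose parallel in Python, shown here as an analogy rather than a description of the Scala plugin: state that must travel between machines has to be serialisable, and some convenient ways of capturing state are not. The helper names are hypothetical.

```python
import pickle


def make_generator():
    yield 1
    yield 2


def can_serialise(obj):
    """Return True if pickle can turn obj into bytes, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


paused = make_generator()
next(paused)  # the generator now holds captured execution state

# The same progress written out as plain data.
plain_state = {"pc": 1, "output": [1]}

print(can_serialise(paused), can_serialise(plain_state))
```

A tool that flagged non-migratable state at compile time, rather than at the moment of migration, is the kind of help the last bullet asks for.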

SLIDE 51

Interested in helping?

http://code.google.com/p/swarm-dpl/

ian@uprizer.com
