SLIDE 1 Swarm
Ian Clarke
ian@uprizer.com
“Transparently distributed computation in the cloud”
Sunday, September 13, 2009
SLIDE 2 Swarm
Ian Clarke
ian.clarke@gmail.com
“Transparently Distributed Computation in the cloud”
SLIDE 3 About me
- Degree in AI and Comp Sci from Edinburgh University, Scotland (1995-1999)
- Designer and co-ordinator of Freenet, the first decentralized P2P architecture (1999-present)
- Designed P2P video streaming system that later became part of “Joost” (2003-2004)
- Founder and Chief Scientist of Revver (2004-2006)
- CEO of Uprizer Labs (2007-present)
SLIDE 4
The Problem
SLIDE 6 Building a web-app?
You want your development process to be:
- Cheap and fast to implement
- Scalable in the event of success
Pick One!
SLIDE 7
Examples
SLIDE 8
SLIDE 9
SLIDE 10
- 22 hour outage after IPO in 1999
- Estimated cost: Over $2M
SLIDE 12
- Periodic outages since it started, most recently August ’09
- Forced fundamental rearchitecture
- Aside: Started with Ruby on Rails, now using Scala
- 22 hour outage after IPO in 1999
- Estimated cost: Over $2M
SLIDE 13
SLIDE 14
How is this solved today?
SLIDE 15 Database Architecture
[Diagram: MySQL database; three WebNode/Cache pairs]
SLIDE 16 Replicate databases
[Diagram: three MySQL databases; three WebNode/Cache pairs]
SLIDE 17 Map Reduce
- Certain problems may be broken into “map” and “reduce” operations
- Interesting because the data stays still, the computation moves
- Good at things like distributed sort, distributed grep, etc.
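The “map” and “reduce” split can be sketched with ordinary Scala collections on a single machine. This is only an illustration of the shape of the computation: the `WordCount` object and its function names are made up for this example and do not come from any MapReduce framework. In a real system the map step would run on the node that holds each chunk, and only the small intermediate pairs would travel.

```scala
object WordCount {
  // "map": turn one chunk of text into (word, 1) pairs
  def mapPhase(chunk: String): Seq[(String, Int)] =
    chunk.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty).map(w => (w, 1))

  // "reduce": sum the counts emitted for each distinct word
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

  // Single-machine stand-in for the whole job: map every chunk, then reduce
  def count(chunks: Seq[String]): Map[String, Int] =
    reducePhase(chunks.flatMap(mapPhase))
}
```

For example, `WordCount.count(Seq("a b a", "b c"))` counts words across both chunks as if each chunk lived on a different node.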
SLIDE 18
Our Proposal: Swarm
SLIDE 19
But first... Some background
SLIDE 20
Scala
SLIDE 23 Scala
- Compiles to Java bytecode, so it’s fast and widely supported
- Supports closures and type inference, so it solves most of Java’s problems
- The upcoming Scala 2.8 supports “portable continuations”
SLIDE 24
Continuations
SLIDE 25 What do continuations do?
- Capture the state of a computer program
- Like saving your position in a video game
- Resume it at some point in the future
SLIDE 29 Scala 2.8’s continuations support
- “Delimited”
- Portable
- Implemented through a code transformation
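A rough illustration of the idea, written in plain Scala and deliberately not using the Scala 2.8 continuations plugin. It shows by hand what a code transformation might produce: the program is split at a pause point, and “the rest of the program” becomes a value that can be held and resumed later. The names `Step`, `Done`, `Paused`, `program` and `runToEnd` are hypothetical. A portable version would also need the paused state to be serializable so it could be sent to another machine, which this sketch does not attempt.

```scala
object ContinuationSketch {
  // A computation has either finished, or paused holding "the rest of the program"
  sealed trait Step[+A]
  final case class Done[A](value: A) extends Step[A]
  final case class Paused[A](label: String, resume: () => Step[A]) extends Step[A]

  // Hand-written stand-in for transformed code: runs up to the pause point,
  // then returns the remainder as a value instead of executing it
  def program(log: StringBuilder): Step[Int] = {
    log.append("start;")
    Paused("checkpoint", () => {
      log.append("resumed;")
      Done(42)
    })
  }

  // Keep resuming until the computation finishes
  @annotation.tailrec
  def runToEnd[A](step: Step[A]): A = step match {
    case Done(v)           => v
    case Paused(_, resume) => runToEnd(resume())
  }
}
```

Calling `program` only runs the code before the pause; nothing after it happens until `runToEnd` (or any other holder of the `Paused` value) decides to resume.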
SLIDE 31
The Solution
SLIDE 32
What if we could distribute data and computation across multiple computers such that the programmer need not think about it?
SLIDE 36 But how?
- Move the computation, not the data
- Handle this transparently within the framework
- Arrange the data to minimize movement of the computation
SLIDE 37 How does it work?
Program:
1. print a
2. print b
3. print c
[Diagram: data items a, b, c]
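A toy simulation of the three-line program, assuming (this is a guess at what the diagram shows) that a, b and c may live on different nodes. Whenever the next variable is not on the current node, the computation moves to the node that owns it. `MigrationSketch`, `run` and the placement maps are hypothetical names for this sketch, not Swarm’s API.

```scala
object MigrationSketch {
  // Runs "print x" for each variable in order, starting on startNode.
  // placement says which node stores each variable (an assumed layout).
  // Returns the printed lines and how many times the computation had to move.
  def run(program: Seq[String],
          placement: Map[String, Int],
          startNode: Int): (Vector[String], Int) = {
    val start = (startNode, Vector.empty[String], 0)
    val (_, printed, migrations) = program.foldLeft(start) {
      case ((node, lines, count), variable) =>
        val owner = placement(variable)
        // Move the computation to the data, never the data to the computation
        val hop = if (owner == node) 0 else 1
        (owner, lines :+ s"node $owner: print $variable", count + hop)
    }
    (printed, migrations)
  }
}
```

With a, b and c spread over nodes 1, 2 and 3 the program migrates twice; with all three on node 1 it never moves, which is why arranging the data well matters.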
SLIDE 41 Arranging data with graph clustering
SLIDE 42 Forcing Swarm to migrate the continuation
SLIDE 43
SLIDE 44
Forced remote variable
SLIDE 45
SLIDE 46 What next?
- Just a simple prototype
- Many interesting sub-problems
- Open source
- Need your help!
SLIDE 47 Storage
- How do we arrange the data for optimal efficiency?
- What about concurrency?
- Software transactional memory
- Replication and redundancy
- Garbage collection
SLIDE 48 A “universal” codebase
- Swarm requires that every node has the same binary
- We could use the JVM’s classloader mechanism to retrieve binaries as needed from a global namespace
- Will need to address issues of versioning and security
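One way the classloader idea might look, as a sketch only. `RemoteClassLoader` and its `fetch` parameter are hypothetical: `fetch` stands in for whatever lookup the global namespace would provide, and versioning and security are not addressed. The only real JVM calls used are `ClassLoader.findClass` and `ClassLoader.defineClass`.

```scala
// Falls back to a user-supplied lookup when the parent loader cannot find a class.
// fetch is an assumed hook: class name => bytecode, if the namespace has it.
class RemoteClassLoader(parent: ClassLoader,
                        fetch: String => Option[Array[Byte]])
    extends ClassLoader(parent) {

  // Called by loadClass only after the parent loader has failed to find the class
  override protected def findClass(name: String): Class[_] =
    fetch(name) match {
      case Some(bytes) => defineClass(name, bytes, 0, bytes.length)
      case None        => throw new ClassNotFoundException(name)
    }
}
```

Classes already available locally are still served by the parent loader, so only code missing from a node would be fetched.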
SLIDE 49 “Swarm aware” libraries
- Need “Swarm aware” collection classes like Map, List, and Set
- Develop a storage system with capabilities similar to a relational database
- Create a web framework around Swarm (similar to “Rails” or “LiftWeb”)
SLIDE 50 Swarm tools
- Continuations plugin imposes restrictions on the code that can be migrated
- “foreach”
- Serializable
- A Scala compiler plugin that understood these limitations would be very useful
SLIDE 51 Interested in helping?
http://code.google.com/p/swarm-dpl/
ian@uprizer.com