Designing for Scalability Patrick Linskey pcl@apache.org Patrick - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Designing for Scalability

Patrick Linskey pcl@apache.org

slide-2
SLIDE 2

Patrick Linskey

  • Apache OpenJPA Committer
  • JPA 1, 2 EG Member
  • EJB3, EJB3.1 EG Member

slide-3
SLIDE 3

Agenda

Define and discuss scalability

  • Vertical
  • Horizontal

Examine ways to make software scale

  • Code / Algorithms
  • Asynchronous Libraries
  • Other Languages
slide-4
SLIDE 4

Scalability

Ability to increase the total number of operations performed in a unit of time

Vertical Scalability:

  • “Make the machine bigger”

Horizontal Scalability

  • “Add more machines”
slide-5
SLIDE 5

Bottlenecks

Limit the scalability of a system

  • Intrinsic bottlenecks
  • Artificial bottlenecks

slide-6
SLIDE 6

Example Problem Domain

Financial fund management

Multiple in-house engineering needs

  • Trade Execution
  • Trade Settlement
  • Strategy Definition
  • Strategy Simulation
  • Portfolio Risk Analysis
slide-7
SLIDE 7

Vertical Scalability

Translated into Java:

Scaling Within a Machine

slide-8
SLIDE 8

Vertical Scale Factors In Your Control

Improve code efficiency

  • Memory
  • CPU

Optimize I/O between physical tiers

  • Web 2.0: beware!

Make code scale across multiple cores / CPUs

slide-9
SLIDE 9

Code Optimization Possibilities

Performance and scalability are linked

Scalability: more operations per time unit

[Figure: timelines comparing “Architectural” and “Quick and dirty” optimizations]

slide-10
SLIDE 10

“Scale” Vertically via Code Optimization

Reduce copying, looping, etc.

  • “Write good code”

SQL statement batching

  • PreparedStatement.addBatch()
  • ORM frameworks

Transaction batching

  • Especially powerful in XA environments
  • JMS message batching
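A minimal sketch of the SQL statement batching bullet, using PreparedStatement.addBatch() and executeBatch() from JDBC. The Trade class, the trades table, and its column names are hypothetical and exist only for illustration; whether a batch really travels in a single network round trip depends on the JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class TradeBatchWriter {

    /** Hypothetical row type; the slides do not define a schema. */
    static final class Trade {
        final String symbol;
        final long quantity;

        Trade(String symbol, long quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    /**
     * Inserts all trades, handing them to the driver in groups of batchSize.
     * Returns the number of executeBatch() calls that were made.
     */
    static int insertAll(Connection conn, List<Trade> trades, int batchSize)
            throws SQLException {
        // Assumed table and column names, for illustration only.
        String sql = "INSERT INTO trades (symbol, quantity) VALUES (?, ?)";
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Trade t : trades) {
                ps.setString(1, t.symbol);
                ps.setLong(2, t.quantity);
                ps.addBatch();               // queue the row in the statement
                if (++pending == batchSize) {
                    ps.executeBatch();       // flush one full batch
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {               // flush the final partial batch
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

With five trades and a batch size of two, the statement is executed three times instead of five; wrapping the whole call in one transaction combines this with the transaction batching bullet.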
slide-11
SLIDE 11

Synchronization

synchronized is for synchronous execution

  • “Execute this block of code in its entirety before others that share this lock”

Modern computers handle high* concurrency

  • synchronized is often a bottleneck
  • Avoid contended synchronization at runtime where possible
  • Uncontended synchronization is cheap
slide-12
SLIDE 12

Write-Once Shared Memory

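import java.util.Set;

// Slow: every caller takes the lock, even after types has been loaded.
// (loadTypeData() is not shown on the slide.)
class SlowTradeManager {
    private Set types;

    public synchronized Set getTradeTypes() {
        if (types == null)
            types = loadTypeData();
        return types;
    }
}

// Fast: no lock on the read path. loadTypeData() might be called more than
// once if several threads race on the first call, so it must be safe to repeat.
// volatile (added here) makes the loaded Set safely visible to other threads.
class FastTradeManager {
    private volatile Set types;

    public Set getTradeTypes() {
        if (types == null)
            types = loadTypeData();
        return types;
    }
}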

slide-13
SLIDE 13

Fund Risk Balancing

Problem

  • Multiple traders act on the same security

Solution

  • Maintain fund-global position data
  • Mutable shared state!
slide-14
SLIDE 14

Multi-machine solution (circa 1998)

[Figure: timeline diagram]

slide-15
SLIDE 15

Multi-core / CPU synchronization

[Figure: timeline with repeated sync points]

slide-16
SLIDE 16

Mutable Shared Memory

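import java.util.concurrent.atomic.AtomicLong;

class AggregateFundPosition {
    // The JDK has no AtomicDouble, so the double is stored as its raw long bits.
    private final AtomicLong totalExposure =
        new AtomicLong(Double.doubleToLongBits(0.0));

    public double incrementBy(double amount) {
        while (true) {
            long oldBits = totalExposure.get();
            double next = Double.longBitsToDouble(oldBits) + amount;
            // Retry if another thread changed the value since we read it.
            if (totalExposure.compareAndSet(oldBits, Double.doubleToLongBits(next)))
                return next;
        }
    }
}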
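The compare-and-set retry loop can be exercised under contention with a sketch like the following. ExposureCounter and runDemo are hypothetical names, and the exposure is kept in whole cents as a long (an assumption made here so that the final total can be checked exactly).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ExposureCounter {
    // Total exposure in whole cents (hypothetical unit, chosen for exactness).
    private final AtomicLong totalCents = new AtomicLong();

    /** Adds cents without taking a lock; retries when another thread wins the race. */
    long incrementBy(long cents) {
        while (true) {
            long old = totalCents.get();
            long next = old + cents;
            if (totalCents.compareAndSet(old, next)) {
                return next;
            }
        }
    }

    long get() {
        return totalCents.get();
    }

    /** Runs threads * incrementsPerThread single-cent updates and returns the total. */
    static long runDemo(int threads, int incrementsPerThread) {
        ExposureCounter counter = new ExposureCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementBy(1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

No update is lost even though no thread ever blocks: a failed compareAndSet simply re-reads the current value and tries again.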

slide-17
SLIDE 17

Synchronization-free shared state

[Figure: timeline with repeated CAS operations]

slide-18
SLIDE 18

Horizontal Scalability

Translated into Java:

Scaling Across Machines

slide-19
SLIDE 19

Horizontal Scaling: Add More Servers

  • All doing the same thing
  • Partitioned by infrastructure layer
  • Partitioned by application role
  • Partitioned along data graph boundaries

slide-20
SLIDE 20

Build a Farm

[Diagram: a load balancer and seven identical OS + App servers]

slide-21
SLIDE 21

Slow Down

[Diagram: one OS + App server alongside separate OS + Web and OS + EJB servers; timings shown: 237ms, 983ms]

slide-22
SLIDE 22

Divide and Conquer

Old as `time` itself

  • mail, news, telnet all on different servers

You use partitioning every day

  • Telephone call routing
  • ATM card transactions
  • Stock markets
  • Elevator banks
slide-23
SLIDE 23

Break Up Stateful Services

Worldwide Trade Execution, Clearing, Position Analysis

[Diagram: four OS + Apps servers]

slide-24
SLIDE 24

Partition Along Application Boundaries

  • Trade Clearing
  • Trade Execution
  • Position Analysis

[Diagram: six OS + Apps servers]

slide-25
SLIDE 25

Partition along data set “fault lines”

  • US
  • Europe
  • Asia

[Diagram: six OS + Apps servers]
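One way to picture partitioning along data set fault lines is a router that sends each request to the server group owning that region's data. This is only a sketch: RegionRouter and the host names are hypothetical, and a real deployment would also need a plan for operations that span regions.

```java
import java.util.EnumMap;
import java.util.Map;

final class RegionRouter {
    enum Region { US, EUROPE, ASIA }

    private final Map<Region, String> serverByRegion = new EnumMap<>(Region.class);

    RegionRouter() {
        // Hypothetical host names, one server group per data partition.
        serverByRegion.put(Region.US, "trades-us.example.internal");
        serverByRegion.put(Region.EUROPE, "trades-eu.example.internal");
        serverByRegion.put(Region.ASIA, "trades-asia.example.internal");
    }

    /** Returns the server group that owns all data for the given region. */
    String serverFor(Region region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("no partition for " + region);
        }
        return server;
    }
}
```

Because each region's data lives on exactly one group, servers in different regions share no mutable state and can be scaled independently.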

slide-26
SLIDE 26

Asynchrony in Java

Java is a mostly synchronous environment

Business algorithms often aren’t

Take advantage of this where possible

  • JMS message queues
  • java.util.concurrent.ExecutorService
  • commonj.work.WorkManager
  • Scheduled jobs
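As a minimal sketch of the java.util.concurrent.ExecutorService option, the following hands a slow calculation to a worker thread and collects the result later through a Future. AsyncRiskReport and analyzeRisk are hypothetical stand-ins for a real business algorithm.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncRiskReport {

    /** Stand-in for an expensive analysis: sums the absolute position sizes. */
    static double analyzeRisk(double[] positions) {
        double total = 0;
        for (double p : positions) {
            total += Math.abs(p);
        }
        return total;
    }

    /** Submits the analysis to a worker thread and waits for its result. */
    static double runAsync(double[] positions) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Double> task = () -> analyzeRisk(positions);
            Future<Double> pending = pool.submit(task);
            // The calling thread is free to do other work here.
            return pending.get(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running application the executor would normally be created once and shared, rather than created and shut down per call as in this sketch.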
slide-27
SLIDE 27

Async Tasks and Resource Utilization

Good JMS servers / ExecutorServices / WorkManagers do resource tuning and optimization

  • Limit threads allocated to async processing
  • Configure priority of async vs. sync (e.g., HTTP request) work

[Figure: utilization chart, axis ticks 25 to 100, covering Trade Execution and Strategy Definition, Strategy Analysis, and Trade Settlement; annotations: “async tasks throttled”, “async task backlog handled”]
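A rough sketch of both tuning knobs with a plain ExecutorService: a fixed-size pool caps the threads given to async work, and a ThreadFactory lowers their priority relative to request threads. ThrottledAsyncPool and the 5 ms sleep are illustrative assumptions, and thread priority is only a hint to the scheduler.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledAsyncPool {

    /** Runs taskCount short tasks and reports the highest concurrency observed. */
    static int maxObservedConcurrency(int poolSize, int taskCount) {
        // Low-priority daemon threads so async work yields to request threads.
        ThreadFactory lowPriority = r -> {
            Thread t = new Thread(r, "async-worker");
            t.setPriority(Thread.MIN_PRIORITY);
            t.setDaemon(true);
            return t;
        };
        // Fixed-size pool: extra tasks wait in the queue (the "backlog").
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, lowPriority);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

However many tasks are queued, the observed concurrency never exceeds the pool size; the backlog drains once capacity frees up.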

slide-28
SLIDE 28

Adapt Requirements to Concurrency

Identify slow-running / expensive parts of the user experience

Work with requirements team to replace these with asynchronous processes

  • Website usage statistics generated nightly instead of on-demand
  • Dynamic PDF delivery via email instead of embedded web content
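The usage statistics example might look like the sketch below: requests only record a hit and read the last finished report, while a scheduled job regenerates the report in the background. UsageStatsCache and its report format are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class UsageStatsCache {
    private final AtomicLong hits = new AtomicLong();
    // Last finished report; volatile so request threads see each new version.
    private volatile String report = "no report yet";

    /** Cheap operation performed on every request. */
    void recordHit() {
        hits.incrementAndGet();
    }

    /** Stand-in for the expensive report generation. */
    void regenerate() {
        report = "hits=" + hits.get();
    }

    /** Requests never wait for generation; they read the last finished report. */
    String latestReport() {
        return report;
    }

    /** Schedules regeneration at a fixed period, e.g. 24 hours for a nightly job. */
    ScheduledExecutorService scheduleEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::regenerate, period, period, unit);
        return scheduler;
    }
}
```

The trade-off is staleness: users see statistics as of the last run, which is exactly the requirement change the slide suggests negotiating.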

slide-29
SLIDE 29

Starting from Scratch

slide-30
SLIDE 30

Choose Your Toolset

Java makes synchronization easy

  • ... but synchronization != scalability

Other languages avoid shared state

  • Rely on message-passing instead
slide-31
SLIDE 31

Erlang: Functional, Asynchronous, Mature

Designed for concurrency in the language

  • Parallel execution
  • Intrinsic hot-redeploy
  • State can only be assigned once

Communication happens via message-passing between actors

  • No threads, no shared state!
  • JMS-like behavior; language-native syntax
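The message-passing style can be approximated in Java, shown here only as an analogy and not as Erlang code. In this sketch a single worker thread owns the state and everything else talks to it through a mailbox; PositionActor and its message classes are hypothetical, and unlike an Erlang process this still uses an operating-system thread underneath.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

final class PositionActor {
    private interface Message {}

    private static final class Add implements Message {
        final long amount;
        Add(long amount) { this.amount = amount; }
    }

    private static final class Query implements Message {
        final CompletableFuture<Long> reply = new CompletableFuture<>();
    }

    private static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();

    PositionActor() {
        Thread worker = new Thread(this::run, "position-actor");
        worker.setDaemon(true);
        worker.start();
    }

    void add(long amount) { mailbox.add(new Add(amount)); }

    /** Sends a query message and waits for the actor's reply. */
    long query() {
        Query q = new Query();
        mailbox.add(q);
        return q.reply.join();
    }

    void stop() { mailbox.add(new Stop()); }

    private void run() {
        long total = 0; // touched only by this thread, so no locking is needed
        try {
            while (true) {
                Message m = mailbox.take(); // messages are handled one at a time
                if (m instanceof Add) {
                    total += ((Add) m).amount;
                } else if (m instanceof Query) {
                    ((Query) m).reply.complete(total);
                } else {
                    return; // Stop
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the mailbox is first-in, first-out, a query sent after two adds from the same thread sees both of them.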
slide-32
SLIDE 32

Scala: Functional Programming for the JVM

Java-integrated

  • Designed by Java stalwart Martin Odersky

JVM-optimized

Supports Erlang-style concurrency

slide-33
SLIDE 33

Compute Grids

Federate your data around a cluster

Decompose your algorithm into serializable work items

Let the compute grid send your work items to the data
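The three steps above can be simulated in a single process. This sketch is not the API of any real grid product: MiniGrid, WorkItem, and the node names are hypothetical, and the "nodes" are just map entries. The point it illustrates is that the work item is Serializable, so a real grid could ship it to the machine that holds each partition instead of moving the data.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MiniGrid {

    /** A unit of work that could be serialized and sent to the node holding the data. */
    interface WorkItem<R> extends Serializable {
        R apply(List<Double> localData);
    }

    // Federated data: each (simulated) node owns one partition.
    private final Map<String, List<Double>> partitions = new LinkedHashMap<>();

    void addPartition(String node, List<Double> data) {
        partitions.put(node, data);
    }

    /** Runs the work item against every partition and collects one result per node. */
    <R> Map<String, R> executeEverywhere(WorkItem<R> item) {
        Map<String, R> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> e : partitions.entrySet()) {
            // In a real grid this call would execute remotely, next to the data.
            results.put(e.getKey(), item.apply(e.getValue()));
        }
        return results;
    }
}
```

A caller would register partitions, submit a work item such as "sum the local positions", and then combine the per-node results itself.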

slide-34
SLIDE 34

Decision Factors

What are your application requirements?

  • How many concurrent operations?
  • How big of a workload?
  • What sorts of SLAs?

Tolerance of deployment complexity?

  • How about your operations, QA teams?
slide-35
SLIDE 35

Recap

Concepts

  • Scalability
  • Bottlenecks
  • Synchronization
  • Asynchrony vs. concurrency
  • Compare-and-set
  • Application Partitioning
  • Synchronous tasks vs. asynchronous tasks

Technology

  • java.util.concurrent
  • j.u.concurrent.atomic
  • Operation batching
  • Transactions
  • SQL
  • JMS; Executor; WorkManager
  • Scala and Erlang
  • Hibernate Shards
  • OpenJPA Slice
slide-36
SLIDE 36

Questions

Patrick Linskey pcl@apache.org