The Gearman Cookbook, OSCON 2010, Eric Day, http://oddments.org/

SLIDE 1

The Gearman Cookbook

OSCON 2010 Eric Day http://oddments.org/ Senior Software Engineer @ Rackspace

SLIDE 2

OSCON 2010 The Gearman Cookbook 2

Thanks for being here!

SLIDE 3

Ask questions!

Grab a mic for long questions.

SLIDE 4

Use the source...

Source: 00

SLIDE 5

What is Gearman?

SLIDE 6

It is not German.

(well, not entirely at least)

SLIDE 7

A protocol with multiple implementations.
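The wire protocol is what lets clients, workers, and job servers written in different languages interoperate. As a minimal Python sketch (not taken from the talk's sources), here is how a request packet is framed, assuming the binary layout from the Gearman protocol specification: a 4-byte magic (`\0REQ` or `\0RES`), a 4-byte big-endian packet type, a 4-byte big-endian payload size, then NUL-separated arguments. Only a few type codes are shown, and the helper names `encode_packet` and `decode_packet` are hypothetical, not part of any Gearman library.

```python
import struct

# A few packet type codes from the Gearman protocol specification.
CAN_DO = 1          # worker -> server: "I can run this function"
SUBMIT_JOB = 7      # client -> server: foreground job
JOB_CREATED = 8     # server -> client: job handle
SUBMIT_JOB_BG = 18  # client -> server: background job

REQ_MAGIC = b"\0REQ"
RES_MAGIC = b"\0RES"
HEADER = struct.Struct(">II")  # big-endian packet type and payload size


def encode_packet(magic: bytes, ptype: int, *args: bytes) -> bytes:
    """Frame a packet: magic, type, payload size, then NUL-separated arguments."""
    payload = b"\0".join(args)
    return magic + HEADER.pack(ptype, len(payload)) + payload


def decode_packet(packet: bytes, nargs: int) -> tuple[bytes, int, list[bytes]]:
    """Split a packet into (magic, type, args).

    Only the first nargs - 1 NULs are treated as separators, because the
    last argument (for example a job's workload) may itself contain NUL bytes.
    """
    magic = packet[:4]
    ptype, size = HEADER.unpack_from(packet, 4)
    payload = packet[12:12 + size]
    return magic, ptype, payload.split(b"\0", nargs - 1)


# SUBMIT_JOB carries three arguments: function name, unique ID, workload.
packet = encode_packet(REQ_MAGIC, SUBMIT_JOB, b"reverse", b"", b"hello")
print(decode_packet(packet, 3))
```

Splitting with a limit keeps the workload opaque, which is why any language that can pack two big-endian integers can speak the protocol.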

SLIDE 8

A message queue.

SLIDE 9

A job coordinator.

SLIDE 10

GEARMAN is an anagram of MANAGER.

SLIDE 11

“A massively distributed, massively fault tolerant fork mechanism.”

  • Joe Stump, SimpleGeo

SLIDE 12

A building block for distributed architectures.

SLIDE 13

Features

  • Open Source
  • Simple & Fast
  • Multi-language
  • Flexible application design
  • Embeddable
  • No single point of failure

SLIDE 14

How does Gearman work?

SLIDE 15

SLIDE 16

SLIDE 17

While large-scale architectures work well, you can start off simple.

Source: 01

SLIDE 18

Foreground

(synchronous)

Background

(asynchronous)

Source: 02
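The difference between the two modes can be modeled in-process. The sketch below is not a Gearman client: it assumes a hypothetical `ToyJobServer` that uses a thread pool in place of real workers. A foreground call blocks until a worker returns the result; a background call returns only a job handle (the `H:toy:N` format merely imitates the look of gearmand's handles), and the result never goes back to the client.

```python
import itertools
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class ToyJobServer:
    """Hypothetical in-process stand-in for a job server plus its workers."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._functions: dict[str, Callable[[str], str]] = {}
        self._handles = itertools.count(1)
        self._background: dict[str, Future] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        """Equivalent of a worker announcing that it can run `name`."""
        self._functions[name] = fn

    def do(self, name: str, workload: str) -> str:
        """Foreground (synchronous): block until a worker returns the result."""
        return self._pool.submit(self._functions[name], workload).result()

    def do_background(self, name: str, workload: str) -> str:
        """Background (asynchronous): hand back a job handle immediately."""
        handle = f"H:toy:{next(self._handles)}"
        self._background[handle] = self._pool.submit(self._functions[name], workload)
        return handle

    def is_running(self, handle: str) -> bool:
        """Clients can poll a background job's status, but never get its result."""
        return not self._background[handle].done()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


server = ToyJobServer()
server.register("reverse", lambda text: text[::-1])
print(server.do("reverse", "hello"))           # blocks, then prints the result
print(server.do_background("reverse", "abc"))  # returns a handle right away
server.shutdown()
```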

SLIDE 19

Questions?

SLIDE 20

Let's get cooking!

SLIDE 21

Required Ingredients:

SLIDE 22

Job Server

  • Perl server (Gearman::Server on CPAN)
    • The original implementation
    • Actively maintained by folks at Six Apart
  • C server (https://launchpad.net/gearmand)
    • Rewritten for performance and threading
    • Adds new features such as persistent queues
    • Uses a different port (IANA-assigned 4730)
    • Now moving to C++

SLIDE 23

Client API

  • Available for most common languages
  • Command-line tool
  • User-defined functions in SQL databases
    • MySQL
    • PostgreSQL
    • Drizzle

SLIDE 24

Worker API

  • Available for most common languages
  • Usually in the same packages as the client API
  • Command-line tool

SLIDE 25

Optional Ingredients

  • Databases
  • Shared or distributed file systems
  • Other network protocols
    • HTTP
    • E-mail
  • Domain-specific libraries
    • Image manipulation
    • Full-text indexing

SLIDE 26

Recipes

  • Scatter/Gather
  • Map/Reduce
  • Asynchronous Queues
  • Pipeline Processing

SLIDE 27

Scatter/Gather

  • Perform a number of tasks concurrently
  • Great way to speed up web applications
  • Tasks don't need to be related
  • Allocate dedicated resources for different tasks
  • Push logic down to where data exists

SLIDE 28

Scatter/Gather

Diagram: the client fans out five tasks at once: DB Query, DB Query, Image Resize, Location Search, and Full-text Search.

SLIDE 29

Scatter/Gather

  • Start simple with a single task
  • Multiple tasks
  • Concurrent tasks

Source: 03

SLIDE 30

Scatter/Gather

  • Concurrent tasks with different workers
  • All tasks finish in the time of the longest-running one
  • Must have enough workers available

Source: 04
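A minimal sketch of the concurrent scatter/gather step, assuming Python's `ThreadPoolExecutor` as a stand-in for a pool of Gearman workers and hypothetical task functions that only sleep. With enough workers, total time tracks the slowest task; with a single worker, the same tasks run back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical tasks; the sleeps stand in for real DB, resize, and search work.
def db_query(user_id: int) -> dict:
    time.sleep(0.2)
    return {"user": user_id}


def image_resize(path: str) -> str:
    time.sleep(0.3)
    return path + ".thumb"


def full_text_search(term: str) -> list:
    time.sleep(0.1)
    return [term]


def scatter_gather(tasks, max_workers: int) -> list:
    """Scatter (fn, arg) pairs to workers, then gather results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [future.result() for future in futures]


tasks = [(db_query, 42), (image_resize, "cat.jpg"), (full_text_search, "gearman")]
for workers in (3, 1):
    start = time.monotonic()
    results = scatter_gather(tasks, workers)
    print(workers, "worker(s):", round(time.monotonic() - start, 1), "s")
```

The one-worker run is the "must have enough workers available" caveat in miniature: concurrency only helps when there is a free worker for each task.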

SLIDE 31

Note on Resize Worker

SLIDE 32

Web Applications

  • Reduce page load time with concurrency
  • Don't tie up web server resources
  • Improve time to first byte
    • Start non-blocking requests
    • Send the first part of the response
    • Block when you need one of the results
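Those three steps map onto a streaming response. In this sketch the page fragments and `slow_sidebar` are hypothetical, and a thread pool stands in for Gearman workers: the slow job is started first, the header is sent immediately, and the handler blocks only when it reaches the part of the page that needs the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def slow_sidebar() -> str:
    """Hypothetical slow job, e.g. a related-posts query run by a worker."""
    time.sleep(0.2)
    return "<aside>related posts</aside>"


def render_page(pool: ThreadPoolExecutor) -> Iterator[str]:
    sidebar = pool.submit(slow_sidebar)             # 1. start a non-blocking request
    yield "<html><body><header>My Blog</header>"  # 2. send the first part right away
    yield sidebar.result()                          # 3. block only when the result is needed
    yield "</body></html>"


with ThreadPoolExecutor(max_workers=4) as pool:
    for chunk in render_page(pool):
        print(chunk)
```

Because the generator is lazy, the job is submitted on the first `next()` call, just before the header is yielded.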

SLIDE 33

Questions?

SLIDE 34

Map/Reduce

  • Similar to scatter/gather, but split up one task
  • Push logic to where data exists (map)
  • Report aggregates or other summary (reduce)
  • Can be multi-tier
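A minimal word-count sketch of the pattern, assuming each partition of lines lives on a different node. Here a thread pool plays the role of the map workers and a plain function does the reduce; in a real deployment each `map_count` call would be a Gearman job sent to a worker running next to that partition. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(lines: list[str]) -> Counter:
    """Map: count words in one partition, where the data lives."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials: list[Counter]) -> Counter:
    """Reduce: merge the partial counts into one summary."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def map_reduce(partitions: list[list[str]]) -> Counter:
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_count, partitions))
    return reduce_counts(partials)


partitions = [["gearman is a job server", "a job queue"], ["job done"]]
print(map_reduce(partitions).most_common(2))
```

A multi-tier version would have a map worker split its own partition again and call `map_reduce` on the pieces before returning.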

SLIDE 35

Map/Reduce

Diagram: the client submits task T, which is split into subtasks T0, T1, T2, and T3.

SLIDE 36

Map/Reduce

Diagram: the client submits task T, which is split into subtasks T0, T1, T2, and T3; subtask T0 is split again into T00, T01, and T02.

SLIDE 37

Log Service

  • Push all log entries to the log_collect queue
    • tail -f access_log | gearman -n -f log_collect
  • Load spreads naturally between workers when busy
  • Can shut down workers to help balance
  • Worker for each operation per log server
  • Push operations to where data resides
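A sketch of what one `log_collect` worker callback might do with each line it receives, assuming Apache-style Common Log Format input. The regular expression, the counters, and the function body are hypothetical; the slide only specifies the queue name and the `tail -f access_log | gearman -n -f log_collect` producer.

```python
import re
from collections import Counter

# Matches the request line and status code of a Common Log Format entry.
CLF_REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b')

status_counts: Counter = Counter()
path_counts: Counter = Counter()


def log_collect(workload: str) -> bool:
    """Hypothetical worker callback: aggregate one log line; False if it doesn't parse."""
    match = CLF_REQUEST.search(workload)
    if match is None:
        return False
    status_counts[match.group("status")] += 1
    path_counts[match.group("path")] += 1
    return True


line = '127.0.0.1 - - [10/Jul/2010:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
log_collect(line)
print(status_counts, path_counts)
```

Following the slide's one-worker-per-operation layout, the status and path counters would each live in their own worker on the log server that holds the data.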

SLIDE 38

Log Service

Source: 05

SLIDE 39

Questions?

SLIDE 40

Asynchronous Queues

  • They help you scale
  • Not everything needs immediate processing
    • Sending e-mail, tweets, …
    • Log entries and other notifications
    • Data insertion and indexing
  • Allows for batch operations
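Batching falls out of queueing: a worker that wakes up to find several jobs waiting can handle them together, for example with one multi-row INSERT instead of many single-row ones. A minimal sketch using Python's `queue.Queue` in place of a Gearman queue; `drain_batch` is a hypothetical helper.

```python
import queue


def drain_batch(jobs: queue.Queue, max_items: int, timeout: float) -> list:
    """Block for the first job, then take whatever else is already waiting, up to max_items."""
    batch = [jobs.get(timeout=timeout)]
    while len(batch) < max_items:
        try:
            batch.append(jobs.get_nowait())
        except queue.Empty:
            break
    return batch


jobs: queue.Queue = queue.Queue()
for i in range(5):
    jobs.put(f"row{i}")  # producers return immediately

print(drain_batch(jobs, 3, 1.0))  # first batch of three
print(drain_batch(jobs, 3, 1.0))  # remaining two
```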

SLIDE 41

Delayed E-Mail

  • Replace:

    # Send email right now
    mail($to_address, $subject, $body, $headers);

  • With:

    # Put email in queue to send
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->doBackground('send_email', serialize($email_options));

Source: 06
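The other half of this recipe is a worker registered for `send_email`. One caveat: PHP's `serialize()` format is PHP-specific, so if the worker is written in another language, encoding the options as JSON (`json_encode` on the PHP side) is simpler. A minimal Python sketch under that assumption; the option keys and the injected `send` function are hypothetical.

```python
import json
from typing import Callable


def send_email_worker(workload: bytes, send: Callable[[str, str, str, str], None]) -> None:
    """Hypothetical `send_email` job callback.

    Decodes the queued options and does the slow work (talking to the mail
    server) outside the web request. For a background job nothing is
    returned to the client, so the function returns None.
    """
    options = json.loads(workload)
    send(options["to"], options["subject"], options["body"], options.get("headers", ""))


outbox = []
payload = json.dumps({"to": "ann@example.com", "subject": "Hi", "body": "Queued!"}).encode()
send_email_worker(payload, lambda *message: outbox.append(message))
print(outbox)
```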

SLIDE 42

Database Updates

  • Also useful as a database trigger
  • Start background jobs on database changes
  • Requires the Gearman MySQL UDF package

    CREATE TRIGGER tweet_blog BEFORE INSERT ON blog_entries
      FOR EACH ROW
      SET @ret = gman_do_background('send_tweet', CONCAT(NEW.title, " - ", NEW.url));

SLIDE 43

Questions?

SLIDE 44

Pipeline Processing

  • Some tasks need a series of transformations
  • Chain workers to send data for the next step

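Diagram: the client submits task T to a worker for Operation 1; each worker passes its result on to the worker for the next operation (Operation 2, then Operation 3), and the last one produces the output.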
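A minimal sketch of chained workers, assuming one thread per operation and a `queue.Queue` between neighbouring stages in place of a job server: each stage takes a job, applies its operation, and submits the result for the next step. The three operations and the `None` shutdown marker are hypothetical choices for the example.

```python
import queue
import threading
from typing import Callable


def start_stage(op: Callable[[str], str], inbox: queue.Queue,
                outbox: queue.Queue) -> threading.Thread:
    """One worker: take a job, apply its operation, pass the result to the next stage."""
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is None:  # shutdown marker: forward it and stop
                outbox.put(None)
                return
            outbox.put(op(item))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def build_pipeline(ops):
    """Chain len(ops) workers; return (input queue, output queue)."""
    queues = [queue.Queue() for _ in range(len(ops) + 1)]
    for i, op in enumerate(ops):
        start_stage(op, queues[i], queues[i + 1])
    return queues[0], queues[-1]


# Operation 1: trim, Operation 2: lowercase, Operation 3: slugify
inbox, output = build_pipeline([str.strip, str.lower, lambda s: s.replace(" ", "-")])
for title in ["  Hello World ", " Gearman Cookbook"]:
    inbox.put(title)
inbox.put(None)

while (result := output.get(timeout=2)) is not None:
    print(result)
```

With one worker per stage, jobs leave in the order they entered; adding more workers to a stage would increase throughput but could reorder results.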

SLIDE 45

Search Engine

  • Insert URLs, track duplicates
  • Fetch contents of URLs
  • Store URLs with title and body
  • Search stored URLs

SLIDE 46

Search Engine

Diagram: workers for Insert, Fetch, and Store/Search; clients submit Insert and Search jobs.

Source: 07
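The four steps can be sketched in a single process, assuming a hypothetical `fetch` callable in place of real HTTP and a dictionary in place of a database. In a Gearman deployment each step could run as its own worker, chained as in the pipeline recipe; this class only shows the data flow.

```python
from typing import Callable, Tuple

Fetcher = Callable[[str], Tuple[str, str]]  # url -> (title, body)


class TinySearchEngine:
    """Hypothetical single-process version of the Insert -> Fetch -> Store/Search flow."""

    def __init__(self, fetch: Fetcher) -> None:
        self.fetch = fetch
        self.seen: set[str] = set()
        self.docs: dict[str, Tuple[str, str]] = {}

    def insert(self, url: str) -> bool:
        """Insert: skip duplicates, otherwise fetch and store the page."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.store(url, *self.fetch(url))
        return True

    def store(self, url: str, title: str, body: str) -> None:
        self.docs[url] = (title, body)

    def search(self, term: str) -> list[str]:
        """Search: case-insensitive substring match on title or body."""
        term = term.lower()
        return sorted(url for url, (title, body) in self.docs.items()
                      if term in title.lower() or term in body.lower())


pages = {"http://example.com/a": ("Gearman notes", "job server and queue"),
         "http://example.com/b": ("Recipes", "map reduce pipelines")}
engine = TinySearchEngine(lambda url: pages[url])
engine.insert("http://example.com/a")
engine.insert("http://example.com/a")  # duplicate, not fetched again
engine.insert("http://example.com/b")
print(engine.search("job"))
```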

SLIDE 47

Questions?

SLIDE 48

Persistent Queues

  • By default, jobs are stored only in memory
  • Various contributions from the community
    • MySQL/Drizzle
    • PostgreSQL
    • SQLite
    • Tokyo Cabinet
    • memcached (not always “persistent”)

SLIDE 49

Persistent Queues

  • Use at your own risk, test in your environment!
  • Configure the back end to meet your performance and durability needs

Source: 08
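To see why a persistent back end matters, here is a minimal sketch of the idea using Python's built-in `sqlite3`: a job is committed to disk before it is acknowledged and deleted only after a worker finishes it, so pending jobs survive a restart. The table layout and the `SQLiteJobQueue` class are hypothetical and are not gearmand's actual SQLite schema.

```python
import os
import sqlite3
import tempfile


class SQLiteJobQueue:
    """Hypothetical durable queue: jobs live in SQLite until marked done."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS jobs ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " function TEXT NOT NULL,"
                " data BLOB NOT NULL)"
            )

    def add(self, function: str, data: bytes) -> int:
        with self.db:  # commit before acknowledging the job to the client
            cursor = self.db.execute(
                "INSERT INTO jobs (function, data) VALUES (?, ?)", (function, data))
        return cursor.lastrowid

    def pending(self) -> list:
        return self.db.execute("SELECT id, function, data FROM jobs ORDER BY id").fetchall()

    def done(self, job_id: int) -> None:
        with self.db:  # delete only after a worker has finished the job
            self.db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))

    def close(self) -> None:
        self.db.close()


path = os.path.join(tempfile.mkdtemp(), "jobs.db")
q = SQLiteJobQueue(path)
job_id = q.add("send_email", b"payload")
q.close()              # simulate a job server restart
q = SQLiteJobQueue(path)
print(q.pending())     # the job is still there
q.done(job_id)
q.close()
```

Committing on every job is the durable but slow end of the trade-off the slide mentions; batching commits would be faster at the risk of losing the most recent jobs.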

SLIDE 50

Timeouts

  • By default, operations block forever
  • Clients may want a timeout on foreground jobs
  • Workers may need to periodically run code other than the job callback

Source: 09
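Both cases can be sketched with standard-library primitives, assuming a thread pool and an in-memory queue rather than a real Gearman client or worker: the client waits on a foreground result for a bounded time, and the worker polls its job source with a timeout so it can run housekeeping between jobs. The function names are hypothetical.

```python
import queue
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional


def wait_with_timeout(job: Future, seconds: float) -> Optional[object]:
    """Client side: stop waiting for a foreground result after `seconds`."""
    try:
        return job.result(timeout=seconds)
    except FutureTimeout:
        return None


def worker_loop(jobs: queue.Queue, handle: Callable[[object], None],
                housekeeping: Callable[[], None], poll_seconds: float,
                stop: threading.Event) -> None:
    """Worker side: wake up at least every poll_seconds, even when idle."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=poll_seconds)
        except queue.Empty:
            housekeeping()  # e.g. flush buffers, report health
            continue
        handle(job)


def slow_job() -> str:
    time.sleep(0.5)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    print(wait_with_timeout(pool.submit(slow_job), 0.1))  # None: gave up after 0.1 s

jobs: queue.Queue = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=worker_loop, args=(jobs, print, lambda: None, 0.05, stop))
worker.start()
jobs.put("job-1")
time.sleep(0.2)
stop.set()
worker.join()
```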

SLIDE 51

gearmand --help

  • --job-retries - Prevent poisonous jobs
  • --worker-wakeup - Don't wake up all workers for every job
  • --threads - Run multiple I/O threads (C only)
  • --protocol - Load pluggable protocols (C only)
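The idea behind `--job-retries` can be sketched as a retry counter: if a job keeps killing the workers that take it, the server stops handing it out after a fixed number of attempts instead of crashing workers forever. In this hypothetical sketch an exception stands in for a worker dying mid-job; it is not how gearmand tracks retries internally.

```python
from typing import Callable, Tuple


def run_with_retries(job: str, worker: Callable[[str], str],
                     max_retries: int) -> Tuple[str, object, int]:
    """Hand `job` to `worker` up to max_retries times; return (status, result, attempts)."""
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        try:
            return ("complete", worker(job), attempts)
        except Exception:  # stands in for a worker crashing while holding the job
            continue
    return ("dropped", None, attempts)


def poisonous(job: str) -> str:
    raise RuntimeError(f"worker crashed on {job!r}")


print(run_with_retries("bad-input", poisonous, max_retries=3))
```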

SLIDE 52

New Distributed Applications

  • Think of scalable cloud architectures
  • Not just LAMP on a virtual machine
  • Elastic servers and services (workers)
  • New data models
  • Use eventual consistency whenever possible
  • Blogs, wikis, and other web apps powered by eventual consistency and queues, not a single logical database

SLIDE 53

Get involved!

  • http://gearman.org/
    • Mailing list, documentation, related projects
  • #gearman on irc.freenode.net
  • Contact me at: http://oddments.org/
  • Stickers!