Developing Erlang At Yahoo (Nick Gerakines and Mark Zweifel) - PowerPoint PPT Presentation

slide-1
SLIDE 1

Developing Erlang At Yahoo

Nick Gerakines and Mark Zweifel with contributions by Chris Goffinet

slide-2
SLIDE 2

Old Friends

Lisp, Scheme, Erlang, etc

slide-3
SLIDE 3

Lots of “Official” Languages

  • C/C++
  • Java
  • PHP
  • Perl
slide-4
SLIDE 4

Unofficial Languages

  • Ruby
  • Objective-C
  • ... Erlang!
slide-5
SLIDE 5

Not the first to go down this path.

slide-6
SLIDE 6

Delicious

slide-7
SLIDE 7

2.0 Launch is Huge

  • Out on July 31st -- Over a year in the making
  • Complete rewrite, front to back
slide-8
SLIDE 8

Uses Erlang!

slide-9
SLIDE 9
  • Mostly C++ (is OO, I know)
  • Ties into several subsystems to delegate large tasks: spam, search, algo, etc.
  • Several subsystems built in Erlang
slide-10
SLIDE 10

Use Case #1 Data Migrations

slide-11
SLIDE 11
  • Rewrites are hard.
  • More than just a row-to-row data copy.
slide-12
SLIDE 12

Not just one.

2.0 involved simultaneous front-end and back-end development.

There were several migrations of the entire system done over the course of development.

slide-13
SLIDE 13
First Attempt

  • Written in Perl
  • Multiple threading modules used
  • No throttling or scaling of work in real time
  • Hard to debug
  • Start/Stop was a nightmare

slide-14
SLIDE 14

Second Attempt

  • Rewritten into Erlang services
  • Crazy-fast
  • System was introspective and self-monitoring
  • Dynamic scaling/throttling
  • Live migration status updates
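
The throttling and live status ideas on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the authors' Erlang code; the class name `ThrottledMigrator`, its `delay` knob, and the `status()` fields are all hypothetical names chosen for the example.

```python
import queue
import threading
import time


class ThrottledMigrator:
    """Worker pool whose pacing can be changed while it is running."""

    def __init__(self, handler, workers=4, delay=0.0):
        self.handler = handler
        self.delay = delay  # seconds each worker pauses after an item; adjustable live
        self.done = 0
        self.failed = 0
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        self._queue.put(item)

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                self.handler(item)
                with self._lock:
                    self.done += 1
            except Exception:
                # A failing item is counted, not fatal: the worker keeps going.
                with self._lock:
                    self.failed += 1
            finally:
                self._queue.task_done()
            time.sleep(self.delay)  # re-read on every item, so changes apply at once

    def status(self):
        """Snapshot of progress, safe to call while the migration runs."""
        with self._lock:
            return {"done": self.done, "failed": self.failed,
                    "pending": self._queue.qsize()}

    def wait(self):
        self._queue.join()
```

An operator could set `migrator.delay = 0.5` from another thread to slow the migration down without stopping it, and poll `migrator.status()` for progress.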
slide-15
SLIDE 15

Compute, Store & Write

  • Created large snapshots of the entire d1 system for processing
  • Phase 1 -- Compute diffs and store
  • Fragmented Mnesia stores of ~50 gigs apiece, up to 6 “cells”
  • Phase 2 -- Write data into d2 system
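
The two phases can be illustrated with a small sketch. It assumes snapshots are dictionaries keyed by record id and that the diff is computed between an earlier and a later snapshot; the slides do not say what the diffs were taken against, so `compute_diff`, `apply_diff`, and the diff layout are hypothetical.

```python
def compute_diff(previous, current):
    """Phase 1: compare two snapshots and return only what changed."""
    upsert = {key: row for key, row in current.items()
              if key not in previous or previous[key] != row}
    delete = [key for key in previous if key not in current]
    return {"upsert": upsert, "delete": delete}


def apply_diff(target, diff):
    """Phase 2: write a stored diff into the target store (a dict here)."""
    for key, row in diff["upsert"].items():
        target[key] = row
    for key in diff["delete"]:
        target.pop(key, None)  # tolerate keys that are already gone
    return target
```

Separating the phases means the expensive comparison can run, be stored, and be inspected before anything is written to the new system.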
slide-16
SLIDE 16

Concurrency saved migrations

slide-17
SLIDE 17

Mnesia Erlang/OTP

Yeah, that’s it.

slide-18
SLIDE 18

Ports!

  • Several systems required interfaces to Perl scripts or C/C++ libraries
  • Leveraged a data auditing tool written in Perl
  • Could recycle non-Erlang code to really maximize efficiency
  • Included Yahoo!-specific functions, string/language encoding and detection
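
An Erlang port talks to an external program over its standard input and output. When the port is opened with the `{packet, 2}` option, every message is preceded by a 2-byte big-endian length. The sketch below shows the external side of that framing in Python; the slides do not say which port options were used, and `serve` and its `handler` argument are hypothetical names.

```python
import struct


def read_packet(stream):
    """Read one length-prefixed message, or return None at end of input."""
    header = stream.read(2)
    if len(header) < 2:
        return None
    (length,) = struct.unpack(">H", header)  # 2-byte big-endian length
    payload = stream.read(length)
    if len(payload) < length:
        return None  # truncated message: treat as end of input
    return payload


def write_packet(stream, payload):
    """Write one message with its 2-byte length header (max 65535 bytes)."""
    stream.write(struct.pack(">H", len(payload)) + payload)
    stream.flush()


def serve(inp, out, handler):
    """Answer each request packet with handler(request) until input ends."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handler(request))
```

In a real port program the streams would be `sys.stdin.buffer` and `sys.stdout.buffer`; in-memory streams work the same way for testing.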

slide-19
SLIDE 19

Use Case #2 Algorithmics

slide-20
SLIDE 20

Before

  • Perl on top of cron jobs
  • Perl can be difficult to manage
  • Jobs can be very database intensive
slide-21
SLIDE 21

After

  • Rewritten into a number of small, independent systems
  • Systems can be tweaked while live and running in production
  • No cron, all running in real time
  • Self-monitoring recursive operations
slide-22
SLIDE 22

Mnesia Erlang/OTP

Sound familiar?

slide-23
SLIDE 23

Concurrency

  • Could leverage 600-700% of the CPU
  • Algorithms were made friendly to parallel processing
  • Introspection facilities let us scale load up and down to run at peak throughput
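
Making an algorithm "friendly to parallel processing" usually means splitting the input into independent chunks, computing a partial result per chunk, and merging the partials. The sketch below shows that shape with a hypothetical tag-counting job. It uses Python threads only to keep the example portable, so it demonstrates the structure rather than a multi-core speedup; the function names and the workload are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_tags(bookmarks):
    """Sequential version: each bookmark is a list of tags."""
    counts = Counter()
    for tags in bookmarks:
        counts.update(tags)
    return counts


def count_tags_parallel(bookmarks, workers=4):
    """Same result, computed as independent partial counts that are merged."""
    chunks = [bookmarks[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_tags, chunks):
            total.update(partial)  # Counter.update adds counts together
    return total
```

Because no chunk depends on another, the number of workers can be raised or lowered without changing the result.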

slide-24
SLIDE 24

Use Case #3 Spam Demographics

slide-25
SLIDE 25

Before

  • Was a collection of several (3-6) Perl scripts
  • Was very ad-hoc
  • Worked pretty well
slide-26
SLIDE 26

After

  • Rewritten into a very small Erlang module
  • Systems can be tweaked while live and running in production

slide-27
SLIDE 27

Use Case #4 Rolling Migrations

slide-28
SLIDE 28

There was no before

This entire system was written in Erlang from scratch to bring the entire d2 system up to date to the hour.

slide-29
SLIDE 29

Architecture

  • d1 Reader loop -- Monitors changes in the d1 system
  • d1 Processing loop -- Would act on the changes and prepare them for d2 input
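
The two loops can be pictured as a producer and a consumer joined by a queue. The sketch below is a simplified Python illustration in which the reader walks a finite list of changes instead of monitoring a live system; `reader_loop`, `processing_loop`, `run_pipeline`, and the `prepare` callback are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel telling the processing loop to finish


def reader_loop(source_changes, changes):
    """Push every observed d1 change onto the queue, then signal the end."""
    for change in source_changes:
        changes.put(change)
    changes.put(STOP)


def processing_loop(changes, prepare, write):
    """Take changes off the queue, convert them, and hand them to the writer."""
    while True:
        change = changes.get()
        if change is STOP:
            break
        write(prepare(change))


def run_pipeline(source_changes, prepare):
    """Run both loops concurrently and return what would be written to d2."""
    changes = queue.Queue(maxsize=100)  # bounded, so a slow processor slows the reader
    written = []
    reader = threading.Thread(target=reader_loop, args=(source_changes, changes))
    processor = threading.Thread(target=processing_loop,
                                 args=(changes, prepare, written.append))
    reader.start()
    processor.start()
    reader.join()
    processor.join()
    return written
```

The bounded queue is a design choice made for this example: it keeps the reader from getting arbitrarily far ahead of the processing step.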

slide-30
SLIDE 30

Delicious Complications

slide-31
SLIDE 31

There’s more!

slide-32
SLIDE 32

“If we knew what we were doing, it wouldn't be called research, would it?”

  - Albert Einstein
slide-33
SLIDE 33
  • Erlang is foreign.
  • Engineers are usually stubborn.
  • It’s very easy to get distracted with lots of design meetings for new technologies.
  • Tension was already high; adding a new language into the mix added uncertainty.

slide-34
SLIDE 34

MyBlogLog

slide-35
SLIDE 35

Use Case #5 Distributed Hash Table

slide-36
SLIDE 36
  • Huge memory store for simple data structures

  • Needed to be fault tolerant
  • Data source must be multi-master
  • Thrift interface
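
One common way to get fault tolerance in a distributed hash table is to place each key on more than one node using a consistent hash ring. The slides do not describe how this system assigned keys, so the sketch below is a generic illustration; `HashRing`, `nodes_for`, and the `vnodes` and `replicas` parameters are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Stable integer hash of a string (MD5 used for distribution, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps each key to an ordered list of distinct nodes that should hold it."""

    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node appears at several points so load spreads more evenly.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def nodes_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        replicas = min(replicas, self._node_count)
        index = bisect.bisect(self._points, _hash(key))
        owners = []
        while len(owners) < replicas:
            node = self._ring[index % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            index += 1
        return owners
```

With two or more owners per key, a read or write can still be served by another owner when one node is down.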
slide-37
SLIDE 37

Erlang/OTP Mnesia + Tokyo Cabinet Memcached

slide-38
SLIDE 38

Use Case #6 Auto-Tagging Engine

slide-39
SLIDE 39
  • Extends algorithmic functionality to the DHT
  • In staging environments, processes over a million tags a day at 50% capacity.

slide-40
SLIDE 40

Bumps along the way

slide-41
SLIDE 41

Using Erlang At Yahoo

slide-42
SLIDE 42

Strengths

  • Extremely good at fault-tolerant distributed applications.
  • Ideal for messaging, communications and logging.
  • Distributed algorithms
  • Long-running jobs with heavy monitoring requirements.

  • Agile development process
  • Web services
slide-43
SLIDE 43

Weaknesses

  • There are documentation gaps.
  • Hasn’t achieved critical mass yet.
  • The community is thin.
slide-44
SLIDE 44

What We Did

  • Internal packages and builds for multiple platforms.
  • Created a simple build process based on a single Erlang install path.

  • Standardized start/stop processes.
slide-45
SLIDE 45

Proving your case

  • Ignore the nay-sayers.
  • Spend a small amount of time prototyping and creating a proof of concept, then test it immediately.

  • Use every resource available to you.
slide-46
SLIDE 46

Thanks

Nick Gerakines <gerakine@yahoo-inc.com>
Mark Zweifel <markez@yahoo-inc.com>
Chris Goffinet <cgoffinet@yahoo-inc.com>