INTRODUCTION TO BIGCOUCH - robert newson, couchdb conf berlin (PowerPoint presentation)



SLIDE 1

INTRODUCTION TO BIGCOUCH

robert newson

couchdb conf berlin january 2013

Friday, 25 January 13

SLIDE 2

INTRODUCTIONS

robert newson

  • CouchDB PMC member, IRC menace
  • BigCouch: putting the “C” back in CouchDB
  • Contact: rnewson@apache.org, rnewson in #cloudant or #couchdb, @rnewson

SLIDE 3

WHAT WE TALK ABOUT WHEN WE TALK ABOUT SCALING

  • Horizontal scaling: more servers create more capacity.
  • Transparent to the application: adding more capacity should not affect the business logic of the application.
  • No single point of failure.

http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/

[Image caption: “Pseudo Scalars”]

SLIDE 4

BIGCOUCH = COUCH + SCALING

  • Horizontal scalability: easily add storage capacity by adding more servers; computing power (views, compaction, etc.) scales with more servers.
  • No single point of failure (SPOF): any node can handle any request; with quorum, individual nodes can come and go.
  • Transparent to the application: all clustering operations take place “behind the curtain”; ‘looks’ like a single-server instance of Couch, just with more awesome (asterisks and caveats discussed later).

SLIDE 5

GRAPHICAL REPRESENTATION

[Diagram: a load balancer routes PUT http://rnewson.cloudant.com/dbname/blah?w=2 into a 24-node ring (Node 1: A B C D, Node 2: B C D E, Node 3: C D E F, Node 4: D E F G, ... Node 24: X Y Z A); hash(blah) = E picks the shard; N=3, W=2, R=2]

  • Clustering in a ring (a la Dynamo)
  • Any node can handle a request
  • O(1) lookup
  • Quorum system (N, R, W)
  • Views distributed like documents
  • Distributed Erlang
  • Masterless
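The ring lookup these bullets describe can be sketched in a few lines of Python. This is a toy model with hypothetical node names; BigCouch’s actual hash function and shard placement differ in detail:

```python
import hashlib
from bisect import bisect

# Toy consistent-hash ring: each node is placed on a circle by hashing its
# name; a document's key hashes to a point on the circle, and the next N
# nodes clockwise hold its copies.
def build_ring(nodes):
    return sorted((int(hashlib.md5(n.encode()).hexdigest(), 16), n) for n in nodes)

def replicas(ring, key, n=3):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    points = [p for p, _ in ring]
    i = bisect(points, h) % len(ring)  # first node at or after the key's point
    return [ring[(i + k) % len(ring)][1] for k in range(n)]

ring = build_ring(f"node{i}" for i in range(1, 25))  # a 24-node ring
print(replicas(ring, "blah", n=3))  # three consecutive nodes own doc "blah"
```

Because the lookup only consults a precomputed ring, any node can resolve it locally, which is what lets every node coordinate any request.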


SLIDE 6

BUILDING YOUR FIRST CLUSTER

  • Shopping list (dependencies)

  • Erlang (R13B03+)
  • ICU
  • Spidermonkey
  • LibCurl
  • OpenSSL
  • make
  • Python

Build (on OS X, via Homebrew):

brew install erlang icu4c spidermonkey
brew ln icu4c
git clone https://github.com/cloudant/bigcouch.git
cd bigcouch
./configure
make dev

SLIDE 7

BUILDING YOUR FIRST CLUSTER

Start the three dev nodes:

rel/dev1/bin/bigcouch
rel/dev2/bin/bigcouch
rel/dev3/bin/bigcouch

Join the cluster:

curl localhost:15986/nodes/dev2@127.0.0.1 -X PUT -d '{}'
curl localhost:15986/nodes/dev3@127.0.0.1 -X PUT -d '{}'

... and verify:

curl http://localhost:15984/_membership

SLIDE 8

QUORUM: IT’S YOUR FRIEND

  • BigCouch clusters are governed by 4 parameters:

Q: number of shards per DB
N: number of redundant copies of each document
R: read quorum constant
W: write quorum constant

(NB: also consider the number of nodes in a cluster)

For the next few examples, consider a 5-node cluster.

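A quick way to see how N, R, and W interact: whenever R + W > N, every read quorum overlaps every write quorum, so a read must touch at least one copy of the latest write. A minimal sketch:

```python
# Sketch of the quorum-overlap rule. With N copies, a write touching W of
# them and a read touching R of them must share at least one copy whenever
# R + W > N (pigeonhole argument).
def quorums_overlap(n, r, w):
    assert 1 <= r <= n and 1 <= w <= n
    return r + w > n

print(quorums_overlap(3, 2, 2))  # True: the common N=3, R=2, W=2 setup
print(quorums_overlap(3, 1, 1))  # False: fast, but a read can miss a write
```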


SLIDE 9

Q

  • Q: the number of shards over which a DB will be spread

Consistent hashing space divided into Q pieces
Specified at DB creation time
Possible for more than one shard to live on a node
Documents deterministically mapped to a shard
More shards = faster view builds
Fewer shards = better memory management

[Diagram: the 5-node cluster with Q=1 vs Q=4]
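Deterministic document-to-shard mapping can be pictured as splitting the 32-bit hash space into Q equal ranges. A toy sketch, not BigCouch’s actual hash function or range boundaries:

```python
import zlib

# Toy doc-id -> shard mapping: hash the id, then see which of Q equal
# ranges of the 32-bit hash space the value falls into.
def shard_for(doc_id, q):
    h = zlib.crc32(doc_id.encode()) & 0xFFFFFFFF
    return h * q // 2**32  # an integer in [0, q)

# The same id always maps to the same shard, so no lookup table is needed.
print(shard_for("blah", 4), shard_for("blah", 4))
```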


SLIDE 10

N

  • N: the number of redundant copies of each document

Choose N>1 for a fault-tolerant cluster
Default specified at DB creation
Each shard is copied N times
Recommend N>2

[Diagram: N=3 on the 5-node cluster]


SLIDE 11

W

  • W: the number of document copies that must be saved before a document is “written”

W must be less than or equal to N
W=1, maximise throughput
W=N, maximise consistency
Allow for a “202 Accepted” response
Can be specified at write time

[Diagram: W=2 on the 5-node cluster; two acks yield a ‘201 ok’ response]
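The coordinator’s response logic can be sketched like this (illustrative only; status handling in real BigCouch has more cases):

```python
# After fanning a write out to the N replicas, the coordinating node counts
# acknowledgements: W acks means the quorum was met (201 Created); fewer,
# but at least one, means the write landed somewhere and internal
# replication will catch the remaining copies up later (202 Accepted).
def write_status(acks, w=2):
    if acks >= w:
        return 201  # quorum met
    if acks >= 1:
        return 202  # accepted, quorum not met
    return 500      # nothing durable

print(write_status(2), write_status(1))  # 201 202
```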


SLIDE 12

R

  • R: the number of identical document copies that must be read before a read request is ok

R must be less than or equal to N
R=1, minimise latency
R=N, maximise consistency
Can be specified at query time

[Diagram: R=2 on the 5-node cluster]
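Read quorum handling can be sketched the same way: the coordinator collects replies and answers once R identical revisions have been seen. A simplification; the real code also has to deal with mismatched revisions and repair:

```python
from collections import Counter

# Answer a read as soon as R replicas agree on the same revision.
def read_result(replies, r=2):
    seen = Counter()
    for rev in replies:      # replies in arrival order
        seen[rev] += 1
        if seen[rev] >= r:
            return rev       # R matching copies: answer now
    return None              # quorum never reached

print(read_result(["1-abc", "1-abc", "1-xyz"], r=2))  # 1-abc
```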


SLIDE 13

Views

  • So far, so good, but what about secondary indexes?

Views are built locally on each node, for each DB shard
Merge sort at query time using exactly one copy of each shard
Run a final re-reduce on each row if the view has a reduce

  • _changes feed works similarly, but has no global ordering

Sequence numbers converted to JSON to encode more information
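The query-time merge described above can be sketched with a standard merge sort plus a final re-reduce. Toy data, with `_sum` standing in for the view’s reduce function:

```python
import heapq

# Each shard returns its view rows already sorted by key; the coordinator
# merge-sorts the per-shard streams, then reruns the reduce step
# (rereduce) across rows that share a key.
shard1 = [("apple", 1), ("cherry", 2)]
shard2 = [("banana", 3), ("cherry", 4)]

merged = list(heapq.merge(shard1, shard2))  # one globally sorted stream
reduced = {}
for key, value in merged:
    reduced[key] = reduced.get(key, 0) + value  # rereduce with _sum

print(reduced)  # {'apple': 1, 'banana': 3, 'cherry': 6}
```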


SLIDE 14

API AND CAVEATS

  • Clustered API

By default listens on port 5984
All single-doc operations and most view operations

  • What’s different?

update_seq value is now opaque JSON
rereduce=true always called on reduce views
No temporary views
No all_or_nothing: true

  • ‘Backdoor’ access

Able to reach a single node (i.e. at the shard level)
By default listens on port 5986
Allows you to trigger local view updates, compactions, etc.

SLIDE 15

Hacker Portion

The BigCouch Stack

[Stack diagram, top to bottom: CHTTPD; Fabric; Rexi / Mem3; Embedded CouchDB (Mochiweb, Spidermonkey, etc.)]

SLIDE 16

chttpd / fabric

  • chttpd

Cut-n-paste of couch_httpd, but using fabric for all data access

  • fabric

OTP library application (no processes)
Responsible for clustered versions of CouchDB core API calls
Quorum logic, view merging, etc.
Provides a clean Erlang interface to BigCouch

SLIDE 17

Mem3

  • Maintains the shard mapping for each clustered database in a node-local CouchDB database
  • Changes in the node registration and shard mapping databases are automatically replicated to all cluster nodes
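The shard map Mem3 maintains can be pictured as ranges of the hash space mapped to the nodes holding each shard’s copies. A hypothetical Q=2, N=3 layout reusing the dev node names from slide 7; this is not Mem3’s actual document schema:

```python
# Hypothetical shard map for one database: each hash range -> the nodes
# holding that shard's N copies. Mem3 keeps this in a node-local CouchDB
# database and replicates changes to every node in the cluster.
shard_map = {
    (0x00000000, 0x7FFFFFFF): ["dev1@127.0.0.1", "dev2@127.0.0.1", "dev3@127.0.0.1"],
    (0x80000000, 0xFFFFFFFF): ["dev2@127.0.0.1", "dev3@127.0.0.1", "dev1@127.0.0.1"],
}

def nodes_for_hash(h):
    for (lo, hi), nodes in shard_map.items():
        if lo <= h <= hi:
            return nodes
    raise KeyError(hex(h))

print(nodes_for_hash(0x12345678))  # the N=3 nodes for the first range
```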


SLIDE 18

Rexi

  • BigCouch makes a large number of parallel RPCs
  • Erlang’s RPC library is not designed for heavy parallelism

Promiscuous spawning of processes
Responses directed back through a single process on the remote node
Requests block until the remote ‘rex’ process is monitored

  • Rexi removes some of the safeguards in exchange for lower latencies

No middlemen on the local node
Remote process responds directly to the client
Remote process monitoring occurs out-of-band

SLIDE 19

FUTURE

SLIDE 20

BIGCOUCH HAS NO FUTURE

SLIDE 21

THE FUTURE IS COUCHDB

SLIDE 22

WE’RE MERGING

SLIDE 23

THE MERGE

  • Release BigCouch 0.5.0
  • Release Apache CouchDB 1.3.0
  • Merge them
  • Release Apache CouchDB 2.0.0 (couchdb strikes back)

SLIDE 24

SUMMARY

  • BigCouch: putting the ‘C’ back in CouchDB
  • Consistent hashing for database sharding (a la Dynamo)
  • True horizontal scalability with CouchDB
  • Download now and get started

https://github.com/cloudant/bigcouch.git