Why I chose mongodb for guardian.co.uk Mat Wall Lead Software - - PowerPoint PPT Presentation

SLIDE 1

Why I chose mongodb for guardian.co.uk

Mat Wall Lead Software Architect, guardian.co.uk

SLIDE 2

“It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.”

SLIDE 3

Early Period, circa ’95
The “Lash It Together” era

SLIDE 4

Early Period (’95, the “Lash It Together” era)
  • Perl, CGI, Apache
  • Experimental
  • Manual processes
  • Bespoke software
  • RDBMS, scripts & static files

SLIDE 5

Mid Period, circa ’00
The “Vendor CMS” era

SLIDE 6

Mid Period: 2000s (The “Vendor CMS era”)
  • Vignette / AOLserver
  • TCL, Apache, Oracle
  • Platform for online publishing
  • Initially scales well, with accelerated delivery of features

SLIDE 7

Mid Period: 2000s (The “Vendor CMS era”)
  • Surprise! The vendor’s CMS doesn’t do what we want!
  • Mish-mash in templates: HTML, JavaScript, TCL, SQL, PL/SQL
  • No model in the app tier, only in the RDBMS schema created in Oracle Designer

SLIDE 8

Mid Period: 2000s (The “Vendor CMS era”)

SLIDE 9

Mid Period: 2000s (The “Vendor CMS era”)

SLIDE 10

Mid Period: 2000s (The “Vendor CMS era”)
  • After a few years, very difficult to extend
  • Database schema becomes fixed due to dependencies in templates

SLIDE 11

Mid Period: 2000s (The “Vendor CMS era”) If you can’t change the system:

SLIDE 12

Modern Period, circa ’05-09
The “J2EE Monolithic” era

SLIDE 13
SLIDE 14

I bring you NEWS!!!

[Diagram: three web servers, three app servers, the CMS and data feeds, all on a single Oracle database]

SLIDE 15

I bring you NEWS!!!

[Same diagram as slide 14]

Modern Java app:
  • Spring / Hibernate
  • DDD / TDD
  • Strong model in Java
  • Database abstracted away with ORM

SLIDE 16

Problems

SLIDE 17

  • Each release involves a schema upgrade
  • Schema upgrade = downtime for journalists

SLIDE 18

Complexity still increasing:
  • 300+ tables
  • 10,000 lines of Hibernate XML config
  • 1,000 domain objects mapped to the database
  • 70,000 lines of domain object code
  • Very tight binding to the database

SLIDE 19

ORM not really masking complexity:
  • Database has a strong influence on the domain model: many domain objects are made more complex by mapping RDBMS joins
  • Complex Hibernate features used: interceptors, proxies
  • Complex caching strategy
  • Lots of optimisations
And: we still hand-code complex queries in SQL!

SLIDE 20

  • Load becoming an issue
  • RDBMS difficult to scale

SLIDE 21

Partial NoSQL, circa ’09-10
The “Sticking Plaster” era

SLIDE 22

  • Introduce yet more caching to patch up load problems
  • Decouple applications from the database by building APIs
  • Power APIs using alternative, more scalable technologies
  • APIs used to scale out database reads
  • Writes still go to the RDBMS
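One common way to add caching while writes still go to the RDBMS is the cache-aside pattern. The sketch below is illustrative only: plain dicts stand in for Memcached and the RDBMS, and the class name `ContentService` is hypothetical rather than anything from the Guardian's codebase.

```python
from typing import Dict, Optional


class ContentService:
    """Cache-aside reads; writes go to the primary store and invalidate the cache."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}  # stand-in for Memcached
        self.rdbms: Dict[str, str] = {}  # stand-in for the RDBMS (source of truth)
        self.db_reads = 0                # reads that had to reach the database

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:            # cache hit: database untouched
            return self.cache[key]
        self.db_reads += 1               # cache miss: fall back to the database
        value = self.rdbms.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for later reads
        return value

    def write(self, key: str, value: str) -> None:
        self.rdbms[key] = value          # writes still go to the RDBMS
        self.cache.pop(key, None)        # drop any stale cached copy


svc = ContentService()
svc.write("article/1", "first draft")
svc.read("article/1")
svc.read("article/1")
print(svc.db_reads)  # 1: the second read was served from the cache
```

Invalidating on write, rather than updating the cache in place, keeps the RDBMS as the single source of truth; the cost is one extra database read after each change.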

SLIDE 23

[Architecture diagram. Core: app server, web servers, CMS, Memcached (20GB), Solr and the RDBMS. Cloud (EC2): five Solr/API nodes serving the API. A message queue (M/Q) is also shown.]

SLIDE 24

Mutualised news!

Content API
  • Read API delivered using Apache Solr
  • Hosted in EC2
  • Document-oriented search engine
  • Loose schema: records, fields, facets
  • Scales well for read operations

SLIDE 25

  • Introduction of Memcached
  • Related content from Solr

SLIDE 26

Mutualised news!

We’ve solved our load problem (for now), but increased our complexity.

SLIDE 27

Mutualised news!

We now have 3 models!
  • RDBMS tables
  • Java objects
  • JSON API

SLIDE 28

Mutualised news!

SLIDE 29

Mutualised news!

SLIDE 30

Mutualised news!

SLIDE 31

Mutualised news!

  • JSON API is very simple
  • Multiple domain concepts expressed in a single document
  • Can be designed in a forward-extensible way

What if the JSON API was our primary model?
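A forward-extensible JSON model usually means readers take the keys they understand and ignore the rest, so producers can add fields without breaking older clients. A minimal sketch, assuming a hypothetical content document with `id`, `headline` and embedded `tags` (the field names are illustrative assumptions, not the real API's):

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    id: str
    type: str


@dataclass
class Content:
    id: str
    headline: str = ""
    tags: List[Tag] = field(default_factory=list)


def parse_content(raw: str) -> Content:
    """Build a Content from JSON, reading only the keys this client knows.

    Unknown keys are ignored and optional keys fall back to defaults, so a
    producer can add fields without breaking older consumers.
    """
    doc = json.loads(raw)
    return Content(
        id=doc["id"],
        headline=doc.get("headline", ""),
        tags=[Tag(t["id"], t["type"]) for t in doc.get("tags", [])],
    )


# One document carrying several domain concepts (content plus its tags),
# including a field ("wordcount") this reader has never heard of.
newer = ('{"id": "world/2010/example", "headline": "Example", "wordcount": 800,'
         ' "tags": [{"id": "world/europe", "type": "keyword"}]}')
print(parse_content(newer))
```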

SLIDE 32

Full NoSQL, in development
The “It’s the future!” era

SLIDE 33

The first project: Identity
  • Current login/registration system still in TCL/PL-SQL
  • 3M+ users in a relational database
  • Very complex schema + PL-SQL
  • New system required

Can we migrate from Oracle to NoSQL?

SLIDE 34

Database selection
  • Simple key store. Too simple?
  • Huge scalability. Do we need it? Schema design difficult.
  • Simple to use, can execute queries similar to an RDBMS

SLIDE 35

MongoDB
  • Document-oriented database
  • Stores parsed JSON documents
  • Can express complex queries
  • Can be flexible about consistency
  • Malleable schema: can easily change at runtime
  • Can work at both large & small scales

SLIDE 36

MongoDB concepts

RDBMS        MongoDB
Table        Collection
Row          JSON document
Index        Index
Join         Embedding & linking
Partition    Shard
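To make the "Join → Embedding & linking" row concrete, here is a sketch using plain Python dicts as stand-ins for documents; the collections and field names are hypothetical, and `_id` is MongoDB's primary-key field. Embedding keeps related data inside one document; linking stores an id, so the application does the lookup a SQL join would have done.

```python
from typing import Dict, List, Optional

# Embedding: related data lives inside the parent document, so one lookup
# returns the user and all their addresses (no join needed).
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "addresses": [
        {"kind": "home", "city": "London"},
        {"kind": "work", "city": "Manchester"},
    ],
}

# Linking: documents in separate collections refer to each other by id.
users: Dict[str, Dict] = {"u1": {"_id": "u1", "name": "Alice"}}
comments: List[Dict] = [
    {"_id": "c1", "user_id": "u1", "text": "First!"},
    {"_id": "c2", "user_id": "u2", "text": "Hello"},
]


def with_author(comments: List[Dict], users: Dict[str, Dict]) -> List[Dict]:
    """Application-side 'join': look up each linked user id ourselves."""
    result = []
    for c in comments:
        author: Optional[Dict] = users.get(c["user_id"])  # no foreign keys: may be absent
        result.append({**c, "author": author["name"] if author else None})
    return result


print([a["city"] for a in user_embedded["addresses"]])
print([c["author"] for c in with_author(comments, users)])
```

The dangling `u2` reference shows the trade-off the talk later mentions: without joins or foreign keys, the application must handle missing links itself.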

SLIDE 37

Flexible Schema

SLIDE 38

Flexible Schema

SLIDE 39

Flexible Schema
  • Can easily represent different classes of tag as documents
  • Both documents can be inserted into the same collection
  • Far simpler than the equivalent Hibernate mapped-subclass configuration
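A minimal sketch of two classes of tag sharing one collection, with lists of dicts standing in for MongoDB. The "keyword" and "contributor" classes and all field names are hypothetical. Each document carries a `type` discriminator and only the fields its class needs, where a relational mapping would typically need a discriminator column plus nullable or joined subclass columns.

```python
from typing import Any, Dict, List

# One collection, two hypothetical classes of tag. Each document has a
# "type" discriminator and only the fields its class actually needs.
tags: List[Dict[str, Any]] = [
    {"_id": "politics/labour", "type": "keyword",
     "name": "Labour", "section": "politics"},
    {"_id": "profile/jane-doe", "type": "contributor",
     "name": "Jane Doe", "bio": "Political correspondent"},
]


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Documents whose fields equal every criterion. A document that lacks
    a field simply does not match; no nullable columns are involved."""
    return [d for d in collection
            if all(k in d and d[k] == v for k, v in criteria.items())]


print([t["name"] for t in find(tags, type="contributor")])
```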

SLIDE 40

Flexible Schema
Simple to query:

SLIDE 41

Flexible Schema
Simple to query:
  • Query operators: $ne, $nin, $all, $exists, $gt, $lt, $gte ...
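To show what these operators mean, here is a simplified in-memory approximation of their semantics for top-level fields. It is a teaching sketch, not MongoDB's matcher: it handles scalar fields (plus array fields for `$all`) and ignores nested paths and MongoDB's type-ordering rules. As in MongoDB, `$ne` and `$nin` also match documents that lack the field, while the comparison operators require it to be present.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks a field absent from the document


def _op(value: Any, op: str, arg: Any) -> bool:
    present = value is not _MISSING
    if op == "$exists":
        return present == bool(arg)
    if op == "$ne":                       # also matches documents without the field
        return not present or value != arg
    if op == "$nin":                      # likewise matches missing fields
        return not present or value not in arg
    if op == "$all":                      # array field containing every listed value
        return present and isinstance(value, list) and all(a in value for a in arg)
    if op in ("$gt", "$lt", "$gte"):      # comparisons need the field to exist
        if not present:
            return False
        try:
            if op == "$gt":
                return value > arg
            if op == "$lt":
                return value < arg
            return value >= arg
        except TypeError:                 # e.g. str vs int: treat as no match
            return False
    raise ValueError(f"operator not supported in this sketch: {op}")


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    for name, cond in query.items():
        value = doc.get(name, _MISSING)
        if isinstance(cond, dict):        # {field: {"$op": arg, ...}}: all must hold
            if not all(_op(value, op, arg) for op, arg in cond.items()):
                return False
        elif value != cond:               # {field: value}: plain equality
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [d for d in collection if matches(d, query)]


people = [
    {"name": "a", "age": 30, "tags": ["x", "y"]},
    {"name": "b", "age": 20},
    {"name": "c"},
]
print([p["name"] for p in find(people, {"age": {"$gte": 20, "$lt": 30}})])
```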

SLIDE 42

Modifying the schema

SLIDE 43

Modifying the schema

SLIDE 44

Modifying the schema

SLIDE 45

Schema upgrades
  • Schema can be upgraded simply by upgrading the application version
  • Application must deal with differing document versions
  • Can become complex over time

SLIDE 46

Schema upgrades
This can be mitigated by:
  • Adding a “version” key to each document
  • Updating the version each time the application modifies a document
  • Using the MapReduce capability to forcibly migrate documents from older versions, if required
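The version-key approach can be sketched as a chain of per-version upgrade functions: the application runs `upgrade` on any document it loads (and saves the result on its next write), while a bulk job forces the stragglers. The two schema changes below are hypothetical, and a plain loop stands in for the MapReduce job the slide mentions.

```python
from typing import Callable, Dict, List

CURRENT_VERSION = 3


def _v1_to_v2(doc: Dict) -> Dict:
    # Hypothetical change: split a single "name" into first/last names.
    first, _, last = doc.pop("name", "").partition(" ")
    doc.update(first_name=first, last_name=last)
    return doc


def _v2_to_v3(doc: Dict) -> Dict:
    # Hypothetical change: add a "statuses" map with a default flag.
    doc.setdefault("statuses", {"verified": False})
    return doc


# Maps a version N to the function that upgrades N -> N + 1.
UPGRADES: Dict[int, Callable[[Dict], Dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(doc: Dict) -> Dict:
    """Step a document up to CURRENT_VERSION. Documents written before the
    version key existed are treated as version 1."""
    version = doc.get("version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version += 1
    doc["version"] = version
    return doc


def migrate_all(collection: List[Dict]) -> int:
    """Forced migration of every outdated document (the slide suggests
    MapReduce for this; a plain loop stands in here). Returns the count."""
    changed = 0
    for i, doc in enumerate(collection):
        if doc.get("version", 1) < CURRENT_VERSION:
            collection[i] = upgrade(dict(doc))
            changed += 1
    return changed


users = [{"name": "Ada Lovelace"}, {"version": 3, "first_name": "Alan",
         "last_name": "Turing", "statuses": {"verified": True}}]
print(migrate_all(users), users[0])
```

Upgrading one version at a time keeps each step small and testable; the price, as the slide warns, is a chain that grows with every release.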

SLIDE 47

MongoDB architecture: single node

[Diagram: a single mongod]

  • Durability only possible from the upcoming 1.8 release (until then, the database is only fsynced from buffer to disk every minute)

SLIDE 48

MongoDB architecture: replica set

[Diagram: a master mongod with several replica mongods]

  • Can choose to read from and write to the master for full consistency
  • Can choose to run reads on slaves to scale reads

SLIDE 49

MongoDB architecture: replica set

[Diagram: a master mongod with several replica mongods]

  • Can choose to read from and write to the master for full consistency
  • Can choose to accept dirty reads from slaves to scale reads
  • Durability achieved (before 1.8) via replication
  • Reads can be scaled out onto replicas (eventual consistency)
  • All writes go to the master
  • If the master fails, a new master is nominated by election
  • DB drivers handle most of the cluster complexity
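A toy model of the behaviour above: all writes go to the master, which records them in an ordered log (MongoDB calls this the oplog), and replicas apply the log later, so a replica read can be stale. Real elections involve a majority vote and member priorities; this sketch simply promotes the most up-to-date live node. `Node`, `ReplicaSet` and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of oplog entries this node has applied
    up: bool = True


class ReplicaSet:
    """Toy model: the master records every write in an ordered log; replicas
    apply it later, so reads from them may be stale."""

    def __init__(self, names: List[str]) -> None:
        self.nodes = [Node(n) for n in names]
        self.master = self.nodes[0]
        self.oplog: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.oplog.append((key, value))   # all writes go to the master
        self.master.data[key] = value
        self.master.applied = len(self.oplog)

    def replicate(self, node: Node, entries: int = 1) -> None:
        """Apply up to `entries` pending oplog entries to a replica."""
        for key, value in self.oplog[node.applied:node.applied + entries]:
            node.data[key] = value
            node.applied += 1

    def read(self, key: str, node: Optional[Node] = None) -> Optional[str]:
        """Read from the master by default (consistent) or from a given
        replica (eventually consistent, possibly stale)."""
        target = node if node is not None else self.master
        return target.data.get(key)

    def elect(self) -> Node:
        """Simplified election: the most up-to-date live node wins. Writes it
        never received are dropped from the log (they are rolled back)."""
        live = [n for n in self.nodes if n.up]
        self.master = max(live, key=lambda n: n.applied)
        del self.oplog[self.master.applied:]
        return self.master


rs = ReplicaSet(["a", "b", "c"])
rs.write("headline", "v1")
replica = rs.nodes[1]
print(rs.read("headline"), rs.read("headline", replica))  # v1 None
```

The final line shows the "dirty read" trade-off: the replica has not yet applied the write, so it returns nothing until `replicate` catches it up.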

SLIDE 50

MongoDB architecture: sharding

[Diagram: a mongos aggregator in front of four shards, each a master with replicas. Reads from a master are consistent; reads from a replica may be inconsistent.]

SLIDE 51

MongoDB architecture: sharding

[Same diagram as slide 50]

  • Writes scaled by sharding
  • Shards populated by ranges
  • mongos queries the appropriate shard(s)
  • Shards automatically balanced
  • Developers (essentially) unaware of shards
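MongoDB splits a collection into chunks by shard-key range, and a balancer moves chunks between shards. The sketch below covers only the routing step that mongos performs, using a fixed, hypothetical set of split points over string shard keys; balancing is not modelled, and `RangeRouter` is an invented name.

```python
import bisect
from typing import List


class RangeRouter:
    """Toy mongos: the shard-key space is cut into contiguous ranges, one per
    shard. `split_points` are the inclusive lower bounds of every range after
    the first, in sorted order."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, key: str) -> str:
        # Point query: exactly one shard owns the range containing the key.
        return self.shards[bisect.bisect_right(self.split_points, key)]

    def shards_for_range(self, low: str, high: str) -> List[str]:
        # Range query on [low, high]: only shards whose ranges overlap it.
        first = bisect.bisect_right(self.split_points, low)
        last = bisect.bisect_right(self.split_points, high)
        return self.shards[first:last + 1]


router = RangeRouter(["g", "n", "t"], ["shard0", "shard1", "shard2", "shard3"])
print(router.shard_for("mike"), router.shards_for_range("h", "o"))  # shard1 ['shard1', 'shard2']
```

Because routing depends only on the shard key, application code can issue the same query whether the data lives on one shard or many, which is why developers stay "essentially" unaware of sharding.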

SLIDE 52

MongoDB durability
  • Relies (before 1.8) on replication for durability
  • 1.8 features optional journaling & redo logs
  • Database users need to be cluster-aware; each write can specify:
    • No error checking / write confirmation
    • Write confirmed on master
    • Write replicated to N slave servers
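MongoDB calls these per-operation options write concerns. The toy model below only illustrates what each level lets the caller know: replication to reachable slaves is treated as instantaneous, unreachable slaves never get the write, and the count of N covers slaves only, following the slide. `Cluster` and the `WriteConcern` enum here are hypothetical stand-ins, not a driver API.

```python
from enum import Enum
from typing import Dict, List, Optional


class WriteConcern(Enum):
    NONE = 0        # no error checking / write confirmation
    MASTER = 1      # write confirmed on master
    REPLICATED = 2  # write replicated to N slave servers


class Cluster:
    def __init__(self, slave_up: List[bool]) -> None:
        self.master: Dict[str, str] = {}
        self.slaves: List[Dict[str, str]] = [{} for _ in slave_up]
        self.slave_up = slave_up  # simulated reachability of each slave

    def write(self, key: str, value: str, concern: WriteConcern,
              n: int = 1) -> Optional[bool]:
        self.master[key] = value
        copies = 0
        for slave, up in zip(self.slaves, self.slave_up):
            if up:  # unreachable slaves never receive the write
                slave[key] = value
                copies += 1
        if concern is WriteConcern.NONE:
            return None            # caller gets no confirmation either way
        if concern is WriteConcern.MASTER:
            return True            # the master applied the write above
        return copies >= n         # confirmed only if enough slaves have it


cluster = Cluster(slave_up=[True, False, True])
for concern, n in [(WriteConcern.NONE, 1), (WriteConcern.MASTER, 1),
                   (WriteConcern.REPLICATED, 2), (WriteConcern.REPLICATED, 3)]:
    print(concern.name, n, cluster.write("k", "v", concern, n))
```

Stronger levels give stronger guarantees at the cost of waiting longer (and, in a real cluster, of failing or timing out when too few servers are reachable).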

SLIDE 53

Old Identity system: hundreds of tables & stored procedures

New Identity model:
[Diagram: a User with Fields (text values), Dates (date/time values) and Statuses (boolean values); a List element is also shown]
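Reading the diagram as a user made of three open-ended maps (an interpretation, since only the labels survive), a domain object in the spirit of the following slides might look like this sketch. The class, method and key names are hypothetical; `_id` is MongoDB's primary-key field.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict


@dataclass
class User:
    """Hypothetical Identity user: three open-ended maps, so a new attribute
    is a new key rather than a new column or table."""
    id: str
    fields: Dict[str, str] = field(default_factory=dict)         # text values
    dates: Dict[str, datetime] = field(default_factory=dict)     # date/time values
    statuses: Dict[str, bool] = field(default_factory=dict)      # boolean flags

    def to_document(self) -> Dict[str, Any]:
        # One document holds the whole user.
        return {"_id": self.id, "fields": dict(self.fields),
                "dates": dict(self.dates), "statuses": dict(self.statuses)}

    @staticmethod
    def from_document(doc: Dict[str, Any]) -> "User":
        # Missing maps default to empty, so older documents still load.
        return User(doc["_id"], dict(doc.get("fields", {})),
                    dict(doc.get("dates", {})), dict(doc.get("statuses", {})))


u = User("u1", fields={"username": "example_user"},
         dates={"registered": datetime(2010, 5, 1)}, statuses={"verified": True})
print(User.from_document(u.to_document()) == u)  # True
```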

SLIDE 54
SLIDE 55

Very simple domain objects
  • Simple, flexible objects
  • No Hibernate session

SLIDE 56

Very simple domain objects
  • Flexible schema embraced in domain object design

SLIDE 57

Very simple domain objects
  • Using the Casbah Scala driver = significant reduction in LOC vs the SQL implementation

SLIDE 58

Build an API that can support both backends

[Diagram: the Registration app and guardian.co.uk call an API backed by both Oracle and MongoDB]

SLIDE 59

Build an API that can support both backends

[Diagram: the Registration app and guardian.co.uk call an API backed by both Oracle and MongoDB]

This bit is hard!

SLIDE 60

Migrate using the API & decommission

[Diagram: the Registration app and guardian.co.uk call the API, now backed by MongoDB only]
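The slides don't say how the dual-backend API was built. One plausible shape, sketched here under stated assumptions, is a storage interface with a migrating implementation: reads fall back from the new store to the old one and copy what they find, writes go to the new store, and a batch job moves the remainder so the old backend can be decommissioned. `UserStore`, `InMemoryStore`, `MigratingStore` and `migrate_remaining` are hypothetical names, and in-memory dicts stand in for both Oracle and MongoDB.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional


class UserStore(ABC):
    """Storage interface behind the API; clients never see the backend."""

    @abstractmethod
    def get(self, user_id: str) -> Optional[Dict]: ...

    @abstractmethod
    def save(self, user: Dict) -> None: ...

    @abstractmethod
    def all_ids(self) -> Iterator[str]: ...


class InMemoryStore(UserStore):
    """Stand-in for either backend (Oracle or MongoDB) in this sketch."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict] = {}

    def get(self, user_id: str) -> Optional[Dict]:
        return self.rows.get(user_id)

    def save(self, user: Dict) -> None:
        self.rows[user["id"]] = dict(user)

    def all_ids(self) -> Iterator[str]:
        return iter(list(self.rows))


class MigratingStore(UserStore):
    """Reads prefer the new store and fall back to the old one, copying the
    user across; writes go only to the new store."""

    def __init__(self, old: UserStore, new: UserStore) -> None:
        self.old, self.new = old, new

    def get(self, user_id: str) -> Optional[Dict]:
        user = self.new.get(user_id)
        if user is None:
            user = self.old.get(user_id)
            if user is not None:
                self.new.save(user)  # migrate on first read
        return user

    def save(self, user: Dict) -> None:
        self.new.save(user)

    def all_ids(self) -> Iterator[str]:
        return iter(set(self.old.all_ids()) | set(self.new.all_ids()))


def migrate_remaining(store: MigratingStore) -> int:
    """Copy users not yet in the new store; once this returns 0, the old
    backend is no longer needed."""
    moved = 0
    for user_id in list(store.old.all_ids()):
        if store.new.get(user_id) is None:
            store.new.save(store.old.get(user_id))
            moved += 1
    return moved
```

This deliberately leaves out the hard part the slides allude to: keeping both backends correct while the old TCL/PL-SQL system may still be writing to Oracle during the migration.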

SLIDE 61

Add new stuff!

[Diagram: the Registration app and guardian.co.uk call the API, backed by MongoDB, with Solr? and Redis? as possible additions]

SLIDE 62

MongoDB
  • Simple, flexible schema with querying & indexing similar to an RDBMS
  • Great at small or large scale
  • Easy for developers to get going
  • Commercial support available (10gen)
  • One day may power all of guardian.co.uk
  • No transactions / joins: developers must cater for this
  • Produces a net reduction in lines of code / complexity

SLIDE 63

Shameless plug
We’re hiring: http://www.careersatgnl.co.uk