with RethinkDB Ilya Verbitskiy Ilya Verbitskiy Distributed - - PowerPoint PPT Presentation

SLIDE 1

Agile web-development with RethinkDB

Ilya Verbitskiy

SLIDE 2

Ilya Verbitskiy

  • Distributed systems, application security, fintech
  • ilya@verbitskiy.co
  • @ilich_x86
  • https://github.com/ilich
SLIDE 3

Demo

https://github.com/ilich/rethinkdb-101

SLIDE 4

What is RethinkDB?

  • Open-source database for building realtime web applications.
  • NoSQL database that stores schemaless JSON documents.
  • Distributed database that is easy to scale.
  • High availability database with automatic failover and robust fault tolerance.
  • The second most popular database on GitHub.
SLIDE 5

The Good

  • Changefeeds.
  • Map-reduce.
  • Geospatial queries.
  • Collaborative web and mobile apps.
  • Streaming analytics apps.
  • Multiplayer games.
  • Realtime marketplaces.
  • Connected devices.
SLIDE 6

The Bad

  • RethinkDB is not a good choice if you need full ACID support or strong schema enforcement.
  • If you are doing deep, computationally intensive analytics, you are better off using a system like Hadoop.
  • In some cases RethinkDB trades off write availability in favor of data consistency.

SLIDE 7

RethinkDB vs. …

  • MongoDB
  • Firebase
SLIDE 8

Can I use my programming language?

  • JavaScript/Node.js
  • Python
  • Ruby
  • Java
  • C#/.NET
  • C++
  • Go
  • PHP
  • … and even more on https://rethinkdb.com/docs/install-drivers/

SLIDE 9

RethinkDB Structure

  • Database → Table → Document
  • A document is a schemaless JSON document.
SLIDE 10

Introduction to ReQL

  • ReQL is the RethinkDB query language.
  • ReQL key principles:
  • ReQL embeds into your programming language.
  • All ReQL queries are chainable.
  • All queries execute on the server.
  • Good starting point if you already know SQL:

https://www.rethinkdb.com/docs/sql-to-reql/javascript/

SLIDE 11

Understanding ReQL

  • The client driver translates ReQL queries into the RethinkDB protocol and sends them to the server for execution.
  • An anonymous function must return a valid ReQL expression.
  • In JavaScript you should use the lt and gt commands instead of the < and > operators.
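The points above can be illustrated with a toy query builder. This is a minimal sketch, not the real RethinkDB driver: the Term class, the table helper and the serialized shape are all invented for illustration, and the real wire protocol looks different. It shows why commands are chainable, why the anonymous function must return a query term, and why gt works where > does not.

```javascript
// Toy model of a ReQL-style query builder. This is NOT the real RethinkDB
// driver, and the serialized shape below is invented for illustration.
class Term {
  constructor(op, args) {
    this.op = op;
    this.args = args;
  }

  // Each command returns a new Term wrapping the previous one, which is
  // what makes queries chainable.
  filter(predicate) {
    // The anonymous function runs once, on the client, with a placeholder
    // row. Its return value (a Term) is what would be sent to the server.
    const body = predicate(new Term('row', []));
    if (!(body instanceof Term)) {
      throw new TypeError('predicate must return a query term, got ' + typeof body);
    }
    return new Term('filter', [this, body]);
  }

  field(name) { return new Term('field', [this, name]); }
  gt(value) { return new Term('gt', [this, value]); }
  lt(value) { return new Term('lt', [this, value]); }

  // Serialize the expression tree; a real driver would send its own
  // protocol message instead of this made-up JSON.
  toJSON() {
    return { op: this.op, args: this.args };
  }
}

const table = (name) => new Term('table', [name]);

// Works: gt() builds a term that describes the comparison.
const query = table('users').filter((user) => user.field('age').gt(30));
const wire = JSON.stringify(query);

// Fails in this toy model: `>` is evaluated immediately by JavaScript and
// yields a plain boolean, so no comparison is left to send to the server.
let operatorError = null;
try {
  table('users').filter((user) => user.field('age') > 30);
} catch (err) {
  operatorError = err;
}
```

The toy model throws when the predicate returns a non-term to make the mistake visible. How a real driver reacts to a plain boolean is not shown here.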
SLIDE 12

Supported data types

  • Number
  • String (UTF-8)
  • Boolean
  • Null
  • Object
  • Array (by default, up to 100,000 elements)
  • Dates and times
  • Binary objects
  • Geometry objects and geospatial queries (indexes, GeoJSON support)
SLIDE 13

Data modeling in RethinkDB

  • Embedded arrays
  • Similar to MongoDB
  • Queries are simpler.
  • The data is often colocated on disk. If you have a dataset that doesn’t fit into RAM, data is loaded from disk faster.
  • Any update to the main document atomically updates both the main data and the linked data.
  • Up to 100,000 elements by default.
  • Deleting, adding or updating a document requires loading the entire array, modifying it, and writing the entire document back to disk.
  • Because of the previous limitation, it’s best to keep the size of the array to no more than a few hundred documents.

SLIDE 14

Data modeling in RethinkDB

  • Multiple tables
  • Similar to SQL
  • Operations on a parent document don’t require loading the data for every child document for a given parent into memory.
  • There is no limitation on the number of child documents, so this approach is more suitable for large amounts of data.
  • The queries linking the data tend to be more complicated.
  • With this approach you cannot atomically update both the parent data and the child data.
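The two modeling options can be compared side by side with plain JavaScript objects. This is an illustrative sketch only: the authors/posts example, the field names and the helper functions are hypothetical, and the code works on in-memory data rather than talking to a server.

```javascript
// Illustrative documents only; table and field names are hypothetical.

// Option 1: embedded array. One document holds the author and all posts.
const authorEmbedded = {
  id: 1,
  name: 'Ada',
  posts: [
    { title: 'First post' },
    { title: 'Second post' },
  ],
};

// Adding a post produces a new version of the whole document, which is
// why large embedded arrays become expensive to modify.
function addPostEmbedded(author, post) {
  return { ...author, posts: [...author.posts, post] };
}

// Option 2: two tables linked by a key, similar to SQL.
const authors = [{ id: 1, name: 'Ada' }];
const posts = [
  { id: 10, authorId: 1, title: 'First post' },
  { id: 11, authorId: 1, title: 'Second post' },
];

// Reading the combined view now needs a join-like step.
function postsWithAuthor(authorRows, postRows) {
  const byId = new Map(authorRows.map((a) => [a.id, a]));
  return postRows
    .filter((p) => byId.has(p.authorId))
    .map((p) => ({ title: p.title, author: byId.get(p.authorId).name }));
}

const updated = addPostEmbedded(authorEmbedded, { title: 'Third post' });
const joined = postsWithAuthor(authors, posts);
```

On a real server the join step would be written as a ReQL query (the driver offers join commands such as eqJoin) rather than done in application code.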

SLIDE 15

Changefeeds

  • Changefeeds allow clients to receive changes on a table.
  • The changes command returns a cursor that receives updates.
  • Each update includes the new and old value of the modified record.
  • Changefeeds cannot guarantee delivery, since they are unidirectional with no acknowledgement returned from clients.

SLIDE 16

// Node.js
r.table('users').changes().run(conn, function(err, cursor) {
  // Use cursor to process changes
});
…
// Sample change delivered by the cursor
{
  old_val: null,
  new_val: { "city": "MINNEAPOLIS", "state": "MN", "_id": "55311" }
}
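Since each notification carries the old and new value, a handler can tell inserts, updates and deletes apart by checking which side is null. The sketch below assumes that convention. classifyChange and watchUsers are hypothetical helper names, and r (the driver module) and conn (an open connection) are expected to be supplied by the caller.

```javascript
// Classify a changefeed notification by which side of the change is present.
function classifyChange(change) {
  const hasOld = change.old_val != null;
  const hasNew = change.new_val != null;
  if (hasOld && hasNew) return 'update';
  if (hasNew) return 'insert';
  if (hasOld) return 'delete';
  return 'unknown';
}

// Wire the classifier to a changefeed on the users table.
// `r` is the driver module and `conn` an open connection, both passed in.
function watchUsers(r, conn, onChange) {
  return r.table('users').changes().run(conn, (err, cursor) => {
    if (err) throw err;
    cursor.each((rowErr, change) => {
      if (rowErr) throw rowErr;
      onChange(classifyChange(change), change);
    });
  });
}
```

With the sample above (old_val is null and new_val holds the document), classifyChange reports an insert.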

SLIDE 17

Commands supporting changefeeds

  • filter
  • getAll
  • map
  • pluck
  • between
  • union
  • min
  • max
  • orderBy.limit
SLIDE 18

Sharding and replication

  • RethinkDB is designed for clustering and easy scalability.
  • To add a new server to the cluster, just launch it with the --join parameter.
  • Configure sharding and replication per table.
  • Any feature that works with a single database will work in a sharded cluster.

SLIDE 19

Sharding and replication

  • There is a hard limit of 64 shards.
  • All sharding is currently done based on the table’s primary key only.
  • RethinkDB uses system statistics for the table to find the optimal set of split points to break up the table evenly.
  • Sharding and replication are configured through table configurations:
  • Number of shards
  • Number of replicas
  • Replicas can be associated with servers using server tags.
  • Tags are assigned to a server using the --server-tag parameter.
  • Use the rebalance command to rebalance the shards of a table.
  • Use the reconfigure command to set up a table’s sharding and replication.
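The reconfigure and rebalance commands can be wrapped in small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, r is the driver module and conn an open connection (both passed in by the caller), and the tag names in the usage note are made up. The 64-shard check mirrors the hard limit stated on this slide.

```javascript
// `r` is the RethinkDB driver module (for example require('rethinkdb')) and
// `conn` an open connection; both are passed in by the caller.

// Split the table into `shards` shards, each stored on `replicas` servers.
function setSharding(r, conn, tableName, shards, replicas) {
  if (shards < 1 || shards > 64) {
    throw new RangeError('shards must be between 1 and 64');
  }
  return r.table(tableName).reconfigure({ shards, replicas }).run(conn);
}

// Place replicas by server tag. The tags must match the ones assigned to
// servers with --server-tag; the primary replica is taken from one tag.
function setTaggedReplicas(r, conn, tableName, shards, replicasByTag, primaryReplicaTag) {
  return r.table(tableName)
    .reconfigure({ shards, replicas: replicasByTag, primaryReplicaTag })
    .run(conn);
}

// Redistribute documents evenly across the table's existing shards.
function rebalanceTable(r, conn, tableName) {
  return r.table(tableName).rebalance().run(conn);
}
```

For example, setTaggedReplicas(r, conn, 'users', 2, { us_east: 2, us_west: 1 }, 'us_east') would ask for two shards with three replicas each, two on servers tagged us_east and one on servers tagged us_west.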
SLIDE 20

RethinkDB Security

  • There is little chance of an injection attack against RethinkDB because ReQL embeds into your programming language.
  • Make sure that you use the latest database drivers!
  • Be careful with the .match() function. It may allow a regular expression injection attack.
  • Do not use r.js('…') to execute JavaScript code on the server. It is vulnerable to JavaScript injection attacks.
  • Use TLS encryption.
  • By default, the admin account does not have a password. Always run your primary server with the --initial-password parameter.
  • You cannot set a password on the administrator web interface. Make sure it is behind a firewall or bound to localhost (--bind-http parameter).
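One way to reduce the regular expression injection risk with .match() is to escape user input before building the pattern. The sketch below is one possible approach, not an official recommendation: escapeRegExp and findUsersByPrefix are hypothetical helpers, the users table and name field are made up, and the escaped character set is the one commonly used for JavaScript patterns, assumed here to be adequate for the server's regex syntax as well.

```javascript
// Escape characters that have special meaning in regular expressions so
// that user input is matched literally.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Find users whose name starts with the given input, treating the input
// as literal text. `r` (driver module) and `conn` (open connection) are
// supplied by the caller.
function findUsersByPrefix(r, conn, userInput) {
  const pattern = '^' + escapeRegExp(userInput);
  return r.table('users')
    .filter((user) => user('name').match(pattern))
    .run(conn);
}
```

Without the escaping step, an input such as .* would match every name instead of only names that literally begin with those two characters.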

SLIDE 21

Additional Resources

  • RethinkDB: https://www.rethinkdb.com/
  • RethinkDB installation: https://www.rethinkdb.com/docs/install/
  • Thirty-second quickstart: https://www.rethinkdb.com/docs/quickstart/
  • Ten-minute guide: https://www.rethinkdb.com/docs/guide/javascript/
  • Cookbook: https://www.rethinkdb.com/docs/cookbook/javascript/
  • Cheat sheet: https://www.rethinkdb.com/docs/sql-to-reql/javascript/
SLIDE 22

Questions?