Writing Datomic in Clojure Rich Hickey Datomic, Clojure Overview - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Writing Datomic in Clojure

Rich Hickey Datomic, Clojure

slide-2
SLIDE 2

Overview

  • What is Datomic?
  • Architecture
  • Implementation - Clojure Applied
  • Summary
slide-3
SLIDE 3

What is Datomic?

  • A new kind of database
  • Bringing data power into the application
  • A sound model of information, with time
  • Enabled by architectural advances
slide-4
SLIDE 4

Why Datomic?

  • Architecture
  • Data Model
slide-5
SLIDE 5

Architectures

[Diagram: Client-Server. One Server and eight Apps, labeled Queries, Transactions, Consistency, Storage.]

slide-6
SLIDE 6

Architectures

[Diagram: Clustered Client-Server. Two Servers and eight Apps, labeled Queries, Transactions, Consistency, Storage.]

slide-7
SLIDE 7

Architectures

[Diagram: Sharded Client-Server. Three Servers and eight Apps, labeled Queries, Transactions, Consistency, Storage.]

slide-8
SLIDE 8

Architectures

[Diagram: Sharded Client-Server. Three Servers and eight Apps; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-9
SLIDE 9

Architectures

[Diagram: K/V Store. Three Servers and eight Apps; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-10
SLIDE 10

Architectures

[Diagram: K/V Store. Three Servers and eight Apps; the labels Queries, Transactions, Consistency, Storage appear three times.]

slide-11
SLIDE 11

Architectures

[Diagram: K/V Store. A Distributed Storage Service and eight Apps; the labels Queries, Transactions, Consistency, Storage appear three times.]

slide-12
SLIDE 12

Datomic Architecture

[Diagram: eight Apps around a Distributed Storage Service.]

slide-13
SLIDE 13

Datomic Architecture

[Diagram: eight Apps around a Distributed Storage Service, labeled Queries, Transactions, Consistency, Storage.]

slide-14
SLIDE 14

Datomic Architecture

[Diagram: eight Apps around a Distributed Storage Service, labeled Queries, Transactions, Consistency, Storage.]

slide-15
SLIDE 15

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor, labeled Queries, Transactions, Consistency, Storage.]

slide-16
SLIDE 16

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-17
SLIDE 17

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-18
SLIDE 18

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-19
SLIDE 19

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-20
SLIDE 20

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor; the labels Queries, Transactions, Consistency, Storage appear twice.]

slide-21
SLIDE 21

The Database, Deconstructed

[Diagram: Traditional DB vs. Datomic.
 Traditional DB labels: App Process, App, Strings (DDL + DML), Result Sets, Server, Indexing, Transactions, Query, I/O, cache, Storage.
 Datomic labels: App Process, App, Peer Lib, Query, Cache, Live Index, Transactor, Indexing, Transactions, Storage Service, Data Segments.]

slide-22
SLIDE 22

Designed for the Cloud

  • Ephemeral instances, unreliable disks
  • Redundancy in storage service
  • Leverages reliable storage services
  • e.g. DynamoDB
slide-23
SLIDE 23

Elastic Scaling

  • More peers, more power
  • Fewer peers, less power, lower cost
  • Demand-driven
  • No configuration
slide-24
SLIDE 24

Get Your Own Brain

  • Query, communication and memory engine
  • Goes into your app, making it a peer
  • The db is effectively local
  • Ad hoc, long running queries - ok
slide-25
SLIDE 25

Logic

  • Declarative search and business logic
  • The query language is Datalog
  • Simple rules and data patterns
  • Joins are implicit, meaning is evident
  • db and non-db sources
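
To illustrate what "joins are implicit" can mean, here is a minimal Python sketch of Datalog-style pattern matching over a list of facts. It is not Datomic's query engine or API; the `match` and `query` helpers, the `?var` convention, and the sample facts are all assumptions made for this example. A variable that appears in two patterns must bind to the same value, which is what produces the join.

```python
def match(pattern, fact, binding):
    """Try to extend `binding` so that `pattern` matches `fact`; None on failure."""
    b = dict(binding)
    for p, x in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if b.setdefault(p, x) != x:
                return None
        elif p != x:
            # Constant: must match exactly.
            return None
    return b


def query(find, where, facts):
    """Naive evaluation: refine the set of bindings one pattern at a time."""
    bindings = [{}]
    for pattern in where:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(pattern, f, b)) is not None]
    return {tuple(b[v] for v in find) for b in bindings}


# Hypothetical facts as (entity, attribute, value) triples.
facts = [
    (1, "name", "Ada"),
    (2, "name", "Grace"),
    (1, "manager", 2),
]

# "Who is managed by whom?" The shared ?e and ?m variables are the joins.
result = query(
    find=["?ename", "?mname"],
    where=[("?e", "manager", "?m"),
           ("?e", "name", "?ename"),
           ("?m", "name", "?mname")],
    facts=facts,
)
```

Because `facts` is an ordinary list, the same `query` function works over data that never came from a database, which is one way to read the "db and non-db sources" bullet.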
slide-26
SLIDE 26

Perception

  • Obtain a queue of transactions
  • not just your own
  • Query transactions for filtering/triggering
slide-27
SLIDE 27

Consistency

  • ACID transactions add new facts
  • Database presented to app as a value
  • Data in storage service is immutable
slide-28
SLIDE 28

Programmability

  • Transactions/Rules/Queries/Results are data
  • Extensible types, predicates, etc
  • Queries can invoke your code
slide-29
SLIDE 29

A Database of Facts

  • A single storage construct, the datom
  • Entity/Attribute/Value/Transaction
  • Attribute definition is the only 'schema'
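
As a rough illustration of the datom as a single storage construct, the sketch below models one in Python as a named tuple. The field names, the `added` flag, and the sample attributes are assumptions for this example, not Datomic's actual representation.

```python
from collections import namedtuple

# Hypothetical datom shape: entity, attribute, value, transaction.
# The `added` flag (assert vs. retract) is an assumption of this sketch.
Datom = namedtuple("Datom", ["e", "a", "v", "tx", "added"])

facts = [
    Datom(1, "person/name", "Ada", 100, True),
    Datom(1, "person/email", "ada@example.com", 100, True),
    Datom(2, "person/name", "Grace", 101, True),
]


def attributes_of(datoms, entity):
    """Collect the attribute -> value pairs asserted for one entity."""
    return {d.a: d.v for d in datoms if d.e == entity and d.added}


ada = attributes_of(facts, 1)
```

Entity 2 has no email and nothing needs to say so: there are no rows or columns to leave empty, only facts that exist or do not.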
slide-30
SLIDE 30

Adaptability

  • Sparse, irregular, hierarchical data
  • Single and multi-valued attributes
  • No structural rigidity
slide-31
SLIDE 31

Time Built-in

  • Every datom retains its transaction
  • Transactions are totally ordered
  • Transactions are first-class entities
  • Get the db as-of, or since, a point in time
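
Because every datom carries its transaction and transactions are totally ordered, as-of and since views can be thought of as filters on the transaction component. The Python sketch below shows that idea under simplifying assumptions: transactions are plain integers, and `as_of`, `since`, and `current_values` are hypothetical helpers, not Datomic functions.

```python
from collections import namedtuple

# Hypothetical datom shape; `added` distinguishes assert from retract.
Datom = namedtuple("Datom", ["e", "a", "v", "tx", "added"])

log = [
    Datom(1, "account/balance", 100, 1, True),
    Datom(1, "account/balance", 100, 2, False),  # retraction in tx 2
    Datom(1, "account/balance", 80, 2, True),
    Datom(1, "account/owner", "Ada", 3, True),
]


def as_of(datoms, t):
    """Datoms from transactions up to and including t."""
    return [d for d in datoms if d.tx <= t]


def since(datoms, t):
    """Datoms from transactions strictly after t."""
    return [d for d in datoms if d.tx > t]


def current_values(datoms):
    """Replay asserts and retracts in transaction order."""
    live = set()
    # Within one transaction, apply retracts (False) before asserts (True).
    for d in sorted(datoms, key=lambda d: (d.tx, d.added)):
        fact = (d.e, d.a, d.v)
        if d.added:
            live.add(fact)
        else:
            live.discard(fact)
    return live
```

Nothing is overwritten in `log`; the earlier balance remains available by asking for the view as of transaction 1.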
slide-32
SLIDE 32

Implementation

slide-33
SLIDE 33

Architecture

[Diagram labels: App Process (App, Peer Lib, Query, Cache, Live Index, Comm); Transactor AMI (Indexing, Transactions); SSD, SSD; Storage Service (DynamoDB) with Data Segments; redundant segment storage.]

slide-34
SLIDE 34

State

  • Immutable, expanding value
  • Must be organized to support query
  • Sorted set of facts
  • Maintaining sort live in storage - bad
  • BigTable - mem + storage merge
  • occasional merge into storage
  • persistent trees
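
The "mem + storage merge" idea can be sketched in a few lines of Python: recent facts live in a small sorted in-memory index, reads see a merged view of memory and storage, and an occasional merge folds memory into a new stored index. This is a simplified illustration with plain sorted lists standing in for persistent trees; `merged_view` and `reindex` are hypothetical names.

```python
import heapq


def merged_view(storage_index, memory_index):
    """Lazily merge two sorted sequences of datoms into one sorted stream."""
    return heapq.merge(storage_index, memory_index)


def reindex(storage_index, memory_index):
    """Occasional merge: build a new stored index and an empty memory index.

    The inputs are left untouched, so readers holding the old pair
    keep seeing a consistent value.
    """
    return list(merged_view(storage_index, memory_index)), []


# Datoms as (e, a, v, tx) tuples, each list already sorted.
storage = [(1, "name", "Ada", 100), (2, "name", "Grace", 101)]
memory = [(1, "email", "ada@example.com", 102)]

view = list(merged_view(storage, memory))
new_storage, new_memory = reindex(storage, memory)
```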
slide-35
SLIDE 35

Memory Index

  • New persistent sorted set
  • Large internal nodes
  • Pluggable comparators
  • 2 sorts always maintained
  • EAVT, AEVT
  • plus AVET, VAET
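
One way to picture "pluggable comparators" is that each index is the same set of datoms sorted by a different key. The Python sketch below builds EAVT and AEVT orders from key functions and does a prefix lookup with binary search. It uses plain sorted lists rather than a persistent sorted set with large internal nodes; `SORTS`, `build_index`, and `seek` are names invented for this example.

```python
from bisect import bisect_left
from collections import namedtuple

Datom = namedtuple("Datom", ["e", "a", "v", "tx"])

# Each sort order is a different key function over the same datoms.
SORTS = {
    "EAVT": lambda d: (d.e, d.a, d.v, d.tx),
    "AEVT": lambda d: (d.a, d.e, d.v, d.tx),
}


def build_index(datoms, order):
    """Return the datoms sorted according to the named order."""
    return sorted(datoms, key=SORTS[order])


def seek(index, order, prefix):
    """Return datoms whose leading key components equal `prefix`."""
    keys = [SORTS[order](d) for d in index]
    start = bisect_left(keys, prefix)
    out = []
    for d, k in zip(index[start:], keys[start:]):
        if k[:len(prefix)] != prefix:
            break
        out.append(d)
    return out


datoms = [
    Datom(2, "person/name", "Grace", 101),
    Datom(1, "person/name", "Ada", 100),
    Datom(1, "person/email", "ada@example.com", 100),
]

eavt = build_index(datoms, "EAVT")
aevt = build_index(datoms, "AEVT")
```

EAVT answers "everything about entity 1" with `seek(eavt, "EAVT", (1,))`, while AEVT answers "every entity with a name" with `seek(aevt, "AEVT", ("person/name",))`.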
slide-36
SLIDE 36

Storage

  • Log of tx asserts/retracts (in tree)
  • Various covering indexes (trees)
  • Storage requirements
  • Data segment values (K->V)
  • atoms (consistent read)
  • pods (conditional put)
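
The storage requirements listed above are small enough to sketch as an in-memory Python class. This is one possible reading of the slide, not Datomic's storage protocol: immutable segments are written once under a key, and a small set of mutable roots is updated with a conditional put. The class and method names are hypothetical.

```python
class MemoryStore:
    """In-memory stand-in for a storage service (illustrative only)."""

    def __init__(self):
        self._segments = {}  # key -> immutable segment value
        self._roots = {}     # name -> current root value

    def put_segment(self, key, value):
        """Write a segment; an existing key may not change its value."""
        if key in self._segments and self._segments[key] != value:
            raise ValueError("segments are immutable: " + repr(key))
        self._segments[key] = value

    def get_segment(self, key):
        """Read a segment by key."""
        return self._segments[key]

    def read_root(self, name):
        """Read the current value of a mutable root, or None."""
        return self._roots.get(name)

    def conditional_put(self, name, expected, new):
        """Set the root only if it still holds `expected`; report success."""
        if self._roots.get(name) != expected:
            return False
        self._roots[name] = new
        return True


store = MemoryStore()
store.put_segment("seg-1", ("datom-a", "datom-b"))
first = store.conditional_put("db-root", None, "seg-1")
stale = store.conditional_put("db-root", None, "seg-2")  # loses the race
```

With an interface this small, backing it with a different store means reimplementing only a handful of functions.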
slide-37
SLIDE 37

Index Storage

[Diagram labels: Index Root (T 42); indexes EAVT, AEVT, AVET, VAET, Lucene; key->dir; dirs; segs; Sorted Datoms; Storage Service.]

slide-38
SLIDE 38

What’s in a DB Value?

[Diagram labels: db atom; db value; t, sinceT, asOfT, history; index (EAVT, AEVT, VAET, Lucene); live, live Lucene; being-indexed; keys, ids, key->id, id->key; Memory index (live window); Storage-backed index; Roots; Hierarchical Cache; Storage.]

slide-39
SLIDE 39

Process

  • Assert/retract can’t express transformation
  • Transaction function: (f db & args) -> tx-data
  • tx-data: assert|retract|(tx-fn args...)
  • Expand/splice until all assert/retracts
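
The expand/splice step can be sketched as a short recursive function. In this Python illustration, tx-data is a list of tuples whose first element is either "assert", "retract", or the name of a transaction function. The `expand` helper, the dictionary used as a stand-in database, and the `transfer`, `withdraw`, and `deposit` functions are assumptions for the example.

```python
def expand(db, tx_data, tx_fns):
    """Expand tx-data until only assert/retract operations remain."""
    out = []
    for op in tx_data:
        kind = op[0]
        if kind in ("assert", "retract"):
            out.append(op)
        else:
            # Call the transaction function with the db value and its args,
            # then splice its (possibly nested) result in place.
            fn = tx_fns[kind]
            out.extend(expand(db, fn(db, *op[1:]), tx_fns))
    return out


def withdraw(db, acct, amount):
    old = db[(acct, "balance")]
    return [("retract", acct, "balance", old),
            ("assert", acct, "balance", old - amount)]


def deposit(db, acct, amount):
    old = db[(acct, "balance")]
    return [("retract", acct, "balance", old),
            ("assert", acct, "balance", old + amount)]


def transfer(db, src, dst, amount):
    # A transaction function may itself return calls to other functions.
    return [("withdraw", src, amount), ("deposit", dst, amount)]


tx_fns = {"withdraw": withdraw, "deposit": deposit, "transfer": transfer}
db = {("a", "balance"): 100, ("b", "balance"): 20}

expanded = expand(db, [("transfer", "a", "b", 30)], tx_fns)
```

The functions read the database value they are given, which is what plain assert/retract cannot do: the new balance depends on the old one.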
slide-40
SLIDE 40

Process Expansion

[Diagram: rows of tx-data items (+ entries interleaved with foo, baz, bar, ...) shown across successive expansion steps.]

slide-41
SLIDE 41

Transactor

  • Accepts transactions
  • Expands, applies, logs, broadcasts
  • Periodic indexing, in background
  • Indexing creates garbage
  • Storage GC
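
A toy version of the accept, log, and broadcast steps might look like the Python sketch below. It leaves out expansion, indexing, and storage entirely, and it uses in-process queues where the real system uses a messaging layer. The `Transactor` class here, its methods, and the report shape are hypothetical.

```python
import queue


class Transactor:
    """Single writer: stamps, logs, and broadcasts each transaction."""

    def __init__(self):
        self.log = []          # ordered record of every transaction
        self.next_t = 1        # transactions are totally ordered
        self.subscribers = []  # one queue per connected peer

    def subscribe(self):
        """Register a peer and return the queue it will receive reports on."""
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def transact(self, ops):
        """Stamp each operation with the transaction number, log, broadcast."""
        t = self.next_t
        self.next_t += 1
        report = {"t": t, "tx_data": [op + (t,) for op in ops]}
        self.log.append(report)
        for q in self.subscribers:
            q.put(report)
        return report


transactor = Transactor()
peer_a = transactor.subscribe()
peer_b = transactor.subscribe()
report = transactor.transact([("assert", 1, "name", "Ada")])
```

Every subscribed peer receives every report, including transactions submitted by other peers.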
slide-42
SLIDE 42

Transactor Implementation

  • HornetQ for transaction communication
  • Extensive internal pipelining - j.u.c. queues
  • Async message decompression
  • transaction expansion/application
  • encoding for, communication with storage
  • Java interop to storage APIs
slide-43
SLIDE 43

Indexing

  • Extensive use of laziness
  • Parallel processing
  • Parallel I/O
  • Async, rejoins via queue
slide-44
SLIDE 44

Declarative Programming

  • Embedded Datalog
  • Takes data sources and rule sets as args
  • Extended to work with scalars/collections
  • Expression clauses call your code
slide-45
SLIDE 45

Datalog Implementation

  • Data driven, in and out
  • Query/Subquery Recursive (QSQR)
  • Dynamic, set oriented
  • DB joins leverage indexes
  • Expressions use Clojure compiler
  • caching of transforms at all stages
slide-46
SLIDE 46

Over Here

  • Peers directly access storage service
  • Have own query engine
  • Have live mem index and merging
  • Two-tier cache
  • Segments (on/off heap)
  • Datoms w/object values (on heap)
slide-47
SLIDE 47

Peer Implementation

  • HornetQ for transaction communication
  • Google Guava caches
  • Java APIs for storage
  • Entities are like multimaps
  • key -> value(s)
  • reverse attrs
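
The "entities are like multimaps" bullet can be illustrated with a small Python wrapper over a list of facts: looking up an attribute returns a set of values, and a reverse lookup finds the entities that point at this one. The `Entity` class and its method names are assumptions for this sketch, not the peer library's API.

```python
class Entity:
    """Read-only, multimap-like view of one entity (illustrative only)."""

    def __init__(self, facts, eid):
        self._facts = facts  # list of (entity, attribute, value) triples
        self.eid = eid

    def get(self, attr):
        """All values of `attr` for this entity (key -> value(s))."""
        return {v for e, a, v in self._facts if e == self.eid and a == attr}

    def reverse(self, attr):
        """All entities whose `attr` value is this entity."""
        return {e for e, a, v in self._facts if a == attr and v == self.eid}


facts = [
    (1, "person/name", "Ada"),
    (1, "person/likes", "math"),
    (1, "person/likes", "engines"),
    (2, "person/name", "Grace"),
    (2, "person/mentor", 1),
    (3, "person/mentor", 1),
]

ada = Entity(facts, 1)
likes = ada.get("person/likes")
mentees = ada.reverse("person/mentor")
```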
slide-48
SLIDE 48

Consistency and Scale

  • Process/writes go through transactor
  • traditional server scaling/availability
  • Immutability supports consistent reads
  • without transactions
  • scale reads turning knobs on storage
  • Query scales with peers
  • dynamic e.g. auto-scaling
slide-49
SLIDE 49

Testing

  • test.generative was born here
  • Functional tests
  • Simulation-based testing
slide-50
SLIDE 50

Simplicity is Agility

  • Key protocols extremely small (< 7 fns)
  • Memory, embedded SQL, remote SQL, Infinispan, DynamoDB
  • Move from our own dynamo cluster to DynamoDB:
  • 2 weeks
  • Support PostgreSQL, Infinispan
  • 1 day each
slide-51
SLIDE 51

Leverage

  ✓ Read/print data
  ✓ Embedded language
  ✓ Runtime compilation
  ✓ Extend standard interfaces/protocols
  ✓ Interop
  ✓ State model - extended

slide-52
SLIDE 52

Summary

  • Clojure was made for this kind of app
  • Fast enough at all levels
  • Most key subsystems < 1000 lines
  • A ton of concurrency, no sweat
  • Leverage interop - HornetQ, Guava, etc.
  • Startup time could be better
  • Datomic is Simple