Writing Datomic in Clojure
Rich Hickey
Overview
- What is Datomic?
- Architecture
- Implementation - Clojure Applied
- Summary
What is Datomic?
- A database
- A sound model of information, with time
- Provides database as a value to applications
- Brings declarative programming to applications
- Focus on reducing complexity
Why Datomic?
- Architecture
- Data Model
Architectures
[Diagram series: apps and servers in each arrangement, annotated with four responsibilities: Queries, Transactions, Consistency, Storage]
- Client-Server
- Clustered Client-Server
- Sharded Client-Server
- K/V Store (apps over a distributed storage service)
Datomic Architecture
[Diagram: apps around a distributed storage service, plus a Transactor, annotated with Queries, Transactions, Consistency, Storage]
The Database, Deconstructed
[Diagram, two columns]
Traditional DB: the app process sends strings (DDL + DML) and receives result sets; the server handles indexing, transactions, query, and I/O, with a cache over storage.
Datomic: the app process embeds the Peer Lib (query, cache, live index); the Transactor handles indexing and transactions; the Storage Service holds data segments.
Designed for the Cloud
- Ephemeral instances, unreliable disks
- Redundancy in storage service
- Leverages reliable storage services
- e.g. DynamoDB, Riak
Elastic Scaling
- More peers, more power
- Fewer peers, less power, lower cost
- Demand-driven
- No configuration
Get Your Own Brain
- Query, communication and memory engine
- Goes into your app, making it a peer
- The db is effectively local
- Ad hoc, long running queries - ok
Logic
- Declarative search and business logic
- The query language is Datalog
- Simple rules and data patterns
- Joins are implicit, meaning is evident
- db and non-db sources
Perception
- Obtain a queue of transactions
- not just your own
- Query transactions for filtering/triggering
Consistency
- ACID transactions add new facts
- Database presented to app as a value
- Data in storage service is immutable
Programmability
- Transactions/Rules/Queries/Results are data
- Extensible types, predicates, etc
- Queries can invoke your code
A Database of Facts
- A single storage construct, the datom
- Entity/Attribute/Value/Transaction
- Attribute definition is the only 'schema'
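A minimal sketch of the datom idea, assuming datoms are modeled as plain 4-element vectors. The attribute name `:person/likes`, the transaction numbers, and the helper `values-of` are hypothetical and only illustrate the Entity/Attribute/Value/Transaction shape; this is not Datomic's internal representation.

```clojure
;; Hypothetical sketch: a datom as a plain tuple [entity attribute value tx].
(def datoms
  [[:fred  :person/likes "Pizza"     1000]
   [:fred  :person/likes "Pasta"     1001]
   [:sally :person/likes "Ice cream" 1001]])

(defn values-of
  "All values asserted for entity e and attribute a."
  [datoms e a]
  (set (for [[e' a' v _tx] datoms
             :when (and (= e e') (= a a'))]
         v)))

(values-of datoms :fred :person/likes)
```

Because every fact has the same shape, a multi-valued attribute is just several datoms with the same entity and attribute.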
Adaptability
- Sparse, irregular, hierarchical data
- Single and multi-valued attributes
- No structural rigidity
Time Built-in
- Every datom retains its transaction
- Transactions are totally ordered
- Transactions are first-class entities
- Get the db as-of, or since, a point in time
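The as-of and since views can be sketched over a plain log, assuming each datom is a vector `[e a v tx added?]` and transaction numbers increase over time. The functions `as-of`, `since`, and `current-facts` below are hypothetical stand-ins written for this sketch, not Datomic's API.

```clojure
;; Hypothetical sketch: a log of datoms [e a v tx added?], ordered by tx.
(def log
  [[:fred  :likes "Pizza"     1 true]
   [:fred  :likes "Pizza"     2 false]   ; retraction in tx 2
   [:fred  :likes "Sushi"     2 true]
   [:sally :likes "Ice cream" 3 true]])

(defn as-of
  "Datoms from transactions up to and including t."
  [log t]
  (filter (fn [[_ _ _ tx _]] (<= tx t)) log))

(defn since
  "Datoms from transactions after t."
  [log t]
  (filter (fn [[_ _ _ tx _]] (> tx t)) log))

(defn current-facts
  "Fold asserts and retracts, in log order, into a set of [e a v] facts."
  [datoms]
  (reduce (fn [facts [e a v _ added?]]
            (if added?
              (conj facts [e a v])
              (disj facts [e a v])))
          #{}
          datoms))

(current-facts (as-of log 1))
(current-facts (as-of log 2))
```

Since every datom keeps its transaction, a past view is only a filter over the same immutable data; nothing has to be copied or restored.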
Clojure Time Model
[Diagram: v1 -F-> v2 -F-> v3 -F-> v4]
- Process events (pure functions)
- Observers/perception/memory
- States (immutable values)
- Identity (succession of states)
Datomic Time Model
[Same diagram, relabeled with: DB Connection, Transactions, DB Values, Queries]
- Process events (pure functions)
- Observers/perception/memory
- States (immutable values)
- Identity (succession of states)
Implementation
Datomic Architecture
[Diagram: app server processes, each embedding the Peer Lib (query, cache, live index, comm); a Transactor (indexing, transactions) with a standby; a Storage Service with redundant segment storage for data segments; an optional memcached cluster]
State
- Immutable, expanding value
- Must be organized to support query
Index
- Sorted set of facts
- Maintaining sort live in storage - bad
- BigTable - mem + storage merge
- occasional merge into storage
- persistent trees
Memory Index
- New persistent sorted set
- Large internal nodes
- Pluggable comparators
- 2 sorts always maintained
- EAVT, AEVT
- plus AVET, VAET
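The idea of one sorted set per sort order, each with its own comparator, can be sketched with Clojure's built-in `sorted-set-by`. This is only an illustration: the slides describe a new persistent sorted set with large internal nodes, which is not shown here, and `index-comparator` is a hypothetical helper.

```clojure
;; Hypothetical sketch using clojure.core/sorted-set-by.
;; Datoms are [e a v t] vectors whose components are mutually comparable.
(defn index-comparator
  "Builds a comparator that orders datoms by the given component positions."
  [positions]
  (fn [x y]
    (compare (mapv x positions) (mapv y positions))))

(def eavt-cmp (index-comparator [0 1 2 3]))   ; entity first
(def aevt-cmp (index-comparator [1 0 2 3]))   ; attribute first

(def datoms
  [[2 :name  "Sally"     100]
   [1 :name  "Fred"      100]
   [2 :likes "Ice cream" 101]])

(def eavt (into (sorted-set-by eavt-cmp) datoms))
(def aevt (into (sorted-set-by aevt-cmp) datoms))

(first eavt)   ; lowest entity id comes first
(first aevt)   ; lowest attribute comes first
```

The same datoms live in both sets; only the comparator differs, which is what makes the sort order pluggable.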
Storage
- Log of tx asserts/retracts (in tree)
- Various covering indexes (trees)
- Storage requirements
- Data segment values (K->V)
- atoms (consistent read)
- pods (conditional put)
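The storage requirements listed above are small enough to sketch as a protocol with an in-memory implementation. The protocol name `Storage`, its functions `get-val`, `put-val`, and `cas-val`, and the record `MemStorage` are all invented for this sketch; they are assumptions, not Datomic's internal storage protocol.

```clojure
;; Hypothetical sketch: names are invented, not Datomic's storage protocol.
(defprotocol Storage
  (get-val [this k] "Read the value stored under key k.")
  (put-val [this k v] "Store a segment value under key k.")
  (cas-val [this k old-v new-v]
    "Conditional put: set k to new-v only if it currently holds old-v.
    Returns true on success, false otherwise."))

(defrecord MemStorage [state]   ; state is an atom holding a map
  Storage
  (get-val [_ k] (get @state k))
  (put-val [_ k v] (swap! state assoc k v) v)
  (cas-val [_ k old-v new-v]
    (loop []
      (let [m @state]
        (cond
          (not= old-v (get m k)) false
          (compare-and-set! state m (assoc m k new-v)) true
          :else (recur))))))   ; another key changed concurrently, retry

(def store (->MemStorage (atom {})))

(put-val store "seg-1" [[1 :name "Fred" 100]])
(cas-val store "index-root" nil "seg-1")   ; no root yet, so this succeeds
(cas-val store "index-root" nil "seg-2")   ; root is already set, so this fails
```

Immutable segment values only need plain get and put; the conditional put is reserved for the few mutable pointers, such as an index root.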
Index in Storage
[Diagram: an index ref (identity) points to an index root (value, at T 42) that holds the EAVT, AEVT, AVET, VAET and Lucene indexes; each index is a tree of dirs (key->dir) over segs of sorted datoms]
What’s in a DB Value?
[Diagram: the db atom (identity) refers to a db value (value). The db value carries t, nextT, asOfT and sinceT, a memory index (live window) and a storage-backed index, each with EAVT, AEVT, VAET, Lucene and history components; storage is reached through a hierarchical cache of roots]
DB Values
- Time travel and more
- db.asOf - past, db.since - windowed
- db.with(tx) - speculative
- db.filter(pred) - slice
- dbs are arguments to query, not implicit
- mock with datom-shaped data:
[[:fred :likes "Pizza"] [:sally :likes "Ice cream"]]
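To illustrate why data sources are explicit arguments, here is a tiny one-clause pattern matcher that runs against the mock data above. It is a hypothetical sketch and not Datomic's Datalog engine: the functions `variable?`, `match-tuple`, and `match` are invented, and only a single pattern with `?`-prefixed variables is supported.

```clojure
;; Hypothetical sketch: a single-pattern matcher, not Datomic's query engine.
(defn variable?
  "True for symbols such as ?who."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-tuple
  "Returns a map of variable bindings if tuple matches pattern, else nil."
  [pattern tuple]
  (reduce (fn [bindings [p v]]
            (cond
              (not (variable? p))    (if (= p v) bindings (reduced nil))
              (contains? bindings p) (if (= (bindings p) v) bindings (reduced nil))
              :else                  (assoc bindings p v)))
          {}
          (map vector pattern tuple)))

(defn match
  "All binding maps for pattern against tuples; the data source is an argument."
  [pattern tuples]
  (keep #(match-tuple pattern %) tuples))

(def data [[:fred :likes "Pizza"] [:sally :likes "Ice cream"]])

(match '[?who :likes "Pizza"] data)
```

Because `match` takes its tuples as a parameter, the same pattern can run against a real database value, a past view, or a hand-written vector in a test.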
Process
- Assert/retract can’t express transformation
- Transaction function:
(f db & args) -> tx-data
- tx-data: assert|retract|(tx-fn args...)
- Expand/splice until all assert/retracts
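The expand/splice step can be sketched as a recursive walk over tx-data. The op shapes (`[:assert e a v]`, `[:retract e a v]`, `[:fn name & args]`), the `:inc` function, and the nested-map `db` are assumptions made for this sketch; they are not Datomic's actual tx-data format.

```clojure
;; Hypothetical sketch: op shapes and the :inc function are invented.
;; A transaction function has the shape (f db & args) -> tx-data.
(def tx-fns
  {:inc (fn [db e a amount]
          (let [old (get-in db [e a] 0)]
            [[:retract e a old]
             [:assert e a (+ old amount)]]))})

(defn expand
  "Replace transaction function calls with the tx-data they return,
  repeating until only asserts and retracts remain."
  [db tx-data]
  (mapcat (fn [[op & more :as item]]
            (if (= op :fn)
              (let [[fname & args] more]
                (expand db (apply (tx-fns fname) db args)))
              [item]))
          tx-data))

;; db modeled as entity -> attribute -> value, for illustration only.
(def db {:account-1 {:balance 100}})

(expand db [[:assert :account-2 :balance 0]
            [:fn :inc :account-1 :balance 25]])
```

The function reads the current value from `db` and returns plain data, so a transformation such as an increment ends up as an ordinary retract plus assert.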
Process Expansion
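[Diagram: tx-data made of asserts (+), retracts (-) and transaction function calls (foo, bar, baz) is expanded step by step until only asserts and retracts remain]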
Transactor
- Accepts transactions
- Expands, applies, logs, broadcasts
- Periodic indexing, in background
- Indexing creates garbage
- Storage GC
Transactor Implementation
- HornetQ for transaction communication
- Extensive internal pipelining - j.u.c. queues
- Async message decompression
- transaction expansion/application
- encoding for, communication with storage
- Java interop to storage APIs
Indexing
- Extensive use of laziness
- Parallel processing
- Parallel I/O
- Async, rejoins via queue
Declarative Programming
- Embedded Datalog
- Takes data sources and rule sets as args
- Extended to work with scalars/collections
- Expression clauses call your code
Datalog Implementation
- Data driven, in and out
- Query/Subquery Recursive (QSQR)
- Dynamic, set oriented
- DB joins leverage indexes
- Expressions use Clojure compiler
- caching of transforms at all stages
Over Here
- Peers directly access storage service
- Have own query engine
- Have live mem index and merging
- Two-tier cache
- Segments (on/off heap)
- Datoms w/object values (on heap)
Peer Implementation
- HornetQ for transaction communication
- Google Guava caches
- Java APIs for storage
- Entities are like multimaps
- key -> value(s)
- reverse attrs
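The multimap view of an entity, including reverse attributes, can be sketched over plain `[e a v]` tuples. The functions `entity` and `reverse-attr` are hypothetical and are not Datomic's entity API.

```clojure
;; Hypothetical sketch over [e a v] tuples; not Datomic's entity API.
(def datoms
  [[:fred  :likes  "Pizza"]
   [:fred  :likes  "Pasta"]
   [:fred  :friend :sally]
   [:ethel :friend :sally]])

(defn entity
  "Multimap view of e: attribute -> set of values."
  [datoms e]
  (reduce (fn [m [e' a v]]
            (if (= e e')
              (update m a (fnil conj #{}) v)
              m))
          {}
          datoms))

(defn reverse-attr
  "Entities that point at e through attribute a."
  [datoms e a]
  (set (for [[e' a' v] datoms
             :when (and (= a a') (= v e))]
         e')))

(entity datoms :fred)
(reverse-attr datoms :sally :friend)
```

Forward navigation groups datoms by entity, while a reverse attribute asks the same question from the value side.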
Testing
- test.generative was born here
- Functional tests
- Simulation-based testing
What’s not in Clojure?
- Fressian (serialization library)
- Java API: stubs only
- Clojure API written in terms of Java API
Simplicity is Agility
- Key protocols extremely small (< 7 fns)
- Memory, embedded SQL, remote SQL, Infinispan, DynamoDB
- Move from our own dynamo cluster to DynamoDB:
- 2 weeks
- Support PostgreSQL, Infinispan
- 1 day each
Leverage
✓ Read/print data
✓ Embedded language
✓ Runtime compilation
✓ Extend standard interfaces/protocols
✓ Interop
✓ State model - extended
Summary
- Clojure was made for this kind of app
- Fast enough at all levels
- Most key subsystems < 1000 lines
- A ton of concurrency, no sweat
- Leverage interop - HornetQ, Guava etc
- Clojure startup time could be better
- Datomic is Simple