slide-1
SLIDE 1

Writing Datomic in Clojure

Rich Hickey

slide-2
SLIDE 2

Overview

  • What is Datomic?
  • Architecture
  • Implementation - Clojure Applied
  • Summary
slide-3
SLIDE 3

What is Datomic?

  • A database
  • A sound model of information, with time
  • Provides database as a value to applications
  • Bring declarative programming to applications

  • Focus on reducing complexity
slide-4
SLIDE 4

Why Datomic?

  • Architecture
  • Data Model
slide-5
SLIDE 5

Architectures

[Diagram: Client-Server. One Server and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-6
SLIDE 6

Architectures

[Diagram: Clustered Client-Server. Two Servers and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-7
SLIDE 7

Architectures

[Diagram: Sharded Client-Server. Three Servers and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-8
SLIDE 8

Architectures

[Diagram: Sharded Client-Server. Three Servers and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-9
SLIDE 9

Architectures

[Diagram: K/V Store. Three Servers and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-10
SLIDE 10

Architectures

[Diagram: K/V Store. Three Servers and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-11
SLIDE 11

Architectures

[Diagram: K/V Store. A Distributed Storage Service and eight Apps. Labels: Queries, Transactions, Consistency, Storage.]

slide-12
SLIDE 12

Datomic Architecture

[Diagram: eight Apps and a Distributed Storage Service.]

slide-13
SLIDE 13

Datomic Architecture

[Diagram: eight Apps and a Distributed Storage Service. Labels: Queries, Transactions, Consistency, Storage.]

slide-14
SLIDE 14

Datomic Architecture

[Diagram: eight Apps and a Distributed Storage Service. Labels: Queries, Transactions, Consistency, Storage.]

slide-15
SLIDE 15

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-16
SLIDE 16

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-17
SLIDE 17

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-18
SLIDE 18

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-19
SLIDE 19

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-20
SLIDE 20

Datomic Architecture

[Diagram: eight Apps, a Distributed Storage Service, and a Transactor. Labels: Queries, Transactions, Consistency, Storage.]

slide-21
SLIDE 21

The Database, Deconstructed

[Diagram: Traditional DB vs. Datomic.
Traditional DB labels: App Process (App); Strings, DDL + DML; Result Sets; Server (Indexing, Transactions, Query, I/O, cache); Storage.
Datomic labels: App Process (App, Peer Lib with Query, Cache, Live Index, Data); Transactor (Indexing, Transactions); Storage Service (Data Segments).]

slide-22
SLIDE 22

Designed for the Cloud

  • Ephemeral instances, unreliable disks
  • Redundancy in storage service
  • Leverages reliable storage services
  • e.g. DynamoDB, Riak
slide-23
SLIDE 23

Elastic Scaling

  • More peers, more power
  • Fewer peers, less power, lower cost
  • Demand-driven
  • No configuration
slide-24
SLIDE 24

Get Your Own Brain

  • Query, communication and memory engine
  • Goes into your app, making it a peer
  • The db is effectively local
  • Ad hoc, long running queries - ok
slide-25
SLIDE 25

Logic

  • Declarative search and business logic
  • The query language is Datalog
  • Simple rules and data patterns
  • Joins are implicit, meaning is evident
  • db and non-db sources
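
To make "joins are implicit" concrete, here is a minimal sketch of pattern matching over datom-shaped data in plain Clojure. It is a naive nested-loop matcher written for illustration, not Datomic's query engine; the function names (`variable?`, `match-clause`, `q`) and the sample facts are assumptions.

```clojure
(defn variable?
  "Symbols starting with ? are treated as query variables."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-clause
  "Tries to match one [e a v] pattern against one fact under the bindings
   found so far. Returns extended bindings, or nil if they conflict."
  [bindings pattern fact]
  (reduce (fn [b [p f]]
            (cond
              (nil? b) nil
              (variable? p) (if (contains? b p)
                              (when (= (b p) f) b)
                              (assoc b p f))
              (= p f) b
              :else nil))
          bindings
          (map vector pattern fact)))

(defn q
  "Returns the set of values bound to find-var over all ways of satisfying
   every clause. A variable shared between clauses acts as an implicit join."
  [find-var clauses facts]
  (let [results (reduce (fn [binding-sets clause]
                          (for [b binding-sets
                                fact facts
                                :let [b' (match-clause b clause fact)]
                                :when b']
                            b'))
                        [{}]
                        clauses)]
    (set (map #(get % find-var) results))))

;; Made-up facts for the example.
(def facts
  [[:fred :likes "Pizza"]
   [:sally :likes "Ice cream"]
   [:ethel :likes "Pizza"]
   [:fred :age 42]
   [:ethel :age 31]])

;; How old are the people who like pizza? ?who joins the two clauses.
(q '?age '[[?who :likes "Pizza"] [?who :age ?age]] facts)
```

The query and its result are ordinary data, and the facts are passed in as an argument rather than looked up implicitly.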
slide-26
SLIDE 26

Perception

  • Obtain a queue of transactions
  • not just your own
  • Query transactions for filtering/triggering
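
One way to picture the transaction queue is a `java.util.concurrent` queue that every peer can drain and filter. The sketch below is an assumption-laden stand-in: `broadcast!`, `drain`, the `:peer` key, and the report shape are hypothetical, not Datomic's API.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

;; Hypothetical queue of transaction reports from all peers.
(def tx-queue (LinkedBlockingQueue.))

(defn broadcast!
  "Stand-in for the transactor announcing a committed transaction."
  [report]
  (.put tx-queue report))

(defn drain
  "Takes every report currently on the queue without blocking."
  [queue]
  (loop [acc []]
    (if-let [report (.poll queue)]
      (recur (conj acc report))
      acc)))

;; Reports arrive from several peers, not just our own.
(broadcast! {:peer :a :tx-data [[:fred :likes "Pizza"]]})
(broadcast! {:peer :b :tx-data [[:sally :likes "Ice cream"]]})
(broadcast! {:peer :b :tx-data [[:ethel :dislikes "Pizza"]]})

(defn mentions-pizza?
  "A trigger condition expressed as a predicate over the transaction data."
  [report]
  (some (fn [[_ _ v]] (= v "Pizza")) (:tx-data report)))

(def triggered (filter mentions-pizza? (drain tx-queue)))
```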
slide-27
SLIDE 27

Consistency

  • ACID transactions add new facts
  • Database presented to app as a value
  • Data in storage service is immutable
slide-28
SLIDE 28

Programmability

  • Transactions/Rules/Queries/Results are data
  • Extensible types, predicates, etc.
  • Queries can invoke your code
slide-29
SLIDE 29

A Database of Facts

  • A single storage construct, the datom
  • Entity/Attribute/Value/Transaction
  • Attribute definition is the only 'schema'
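
As a rough illustration of the datom as the single storage construct, the sketch below models each fact as a plain Clojure map with entity, attribute, value, and transaction fields. The `:added` flag, the entity ids, and the attribute names are assumptions made for the example, not Datomic's actual representation.

```clojure
;; Hypothetical sketch: a datom as a plain map (not Datomic's internal type).
(defn datom
  "Builds a fact: entity e has attribute a with value v, as of transaction tx.
   added? is true for an assertion, false for a retraction (an assumption)."
  [e a v tx added?]
  {:e e :a a :v v :tx tx :added added?})

;; Made-up facts about two entities.
(def facts
  [(datom 1 :person/name "Fred" 100 true)
   (datom 1 :person/likes "Pizza" 100 true)
   (datom 2 :person/name "Sally" 101 true)
   (datom 2 :person/likes "Ice cream" 101 true)])

(defn attrs-of
  "All attributes asserted for entity e; no table or row shape is needed."
  [facts e]
  (set (map :a (filter #(= e (:e %)) facts))))

(attrs-of facts 1)
```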
slide-30
SLIDE 30

Adaptability

  • Sparse, irregular, hierarchical data
  • Single and multi-valued attributes
  • No structural rigidity
slide-31
SLIDE 31

Time Built-in

  • Every datom retains its transaction
  • Transactions are totally ordered
  • Transactions are first-class entities
  • Get the db as-of, or since, a point in time
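
Because every datom keeps its transaction, views by time reduce to filtering on that field. The following is a minimal sketch under assumed conventions: datoms are `[e a v tx added?]` vectors, transaction numbers are made up, and `as-of`, `since`, and `current-facts` are illustrative helpers rather than Datomic's functions.

```clojure
;; Hypothetical datoms as [e a v tx added?] vectors; tx numbers are made up.
(def log
  [[1 :likes "Pizza"     100 true]
   [2 :likes "Ice cream" 101 true]
   [1 :likes "Pizza"     102 false]   ; retracted in a later transaction
   [1 :likes "Pasta"     102 true]])

(defn as-of
  "Datoms from transactions up to and including t."
  [datoms t]
  (filter (fn [[_ _ _ tx _]] (<= tx t)) datoms))

(defn since
  "Datoms from transactions strictly after t."
  [datoms t]
  (filter (fn [[_ _ _ tx _]] (> tx t)) datoms))

(defn current-facts
  "Replays assertions and retractions in transaction order and returns the
   set of [e a v] facts that still hold."
  [datoms]
  (reduce (fn [facts [e a v _ added?]]
            (if added?
              (conj facts [e a v])
              (disj facts [e a v])))
          #{}
          (sort-by #(nth % 3) datoms)))

(current-facts (as-of log 101))
(current-facts log)
```

The past view still reports that entity 1 likes "Pizza", even though the latest view has replaced it with "Pasta".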
slide-32
SLIDE 32

Clojure Time Model

[Diagram: v1 -> F -> v2 -> F -> v3 -> F -> v4]

  • Process events (pure functions)
  • Observers/perception/memory
  • States (immutable values)
  • Identity (succession of states)

slide-33
SLIDE 33

Datomic Time Model

[Diagram: v1 -> F -> v2 -> F -> v3 -> F -> v4]

  • Process events (pure functions)
  • Observers/perception/memory
  • States (immutable values)
  • Identity (succession of states)

Datomic labels on the diagram: DB Connection, Transactions, DB Values, Queries
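
The time model can be approximated with a Clojure atom: the atom plays the identity, each value it holds is an immutable state, and a pure function produces the next state. This is a sketch by analogy only; `conn`, `transact`, and the map shape are assumptions, and the pairing of atom with DB connection is my reading of the diagram.

```clojure
;; Hypothetical stand-in: the "connection" is an atom holding the current value.
(def conn (atom {:t 0 :facts #{}}))

(defn transact
  "Pure function from one database value to the next: adds facts, bumps t."
  [db new-facts]
  (-> db
      (update :t inc)
      (update :facts into new-facts)))

;; Each swap! yields a new immutable value; earlier values never change.
(def v1 (swap! conn transact [[:fred :likes "Pizza"]]))
(def v2 (swap! conn transact [[:sally :likes "Ice cream"]]))

;; An observer holding v1 still sees exactly one fact.
(count (:facts v1))
```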

slide-34
SLIDE 34

Implementation

slide-35
SLIDE 35

Datomic Architecture

[Diagram. Labels: App Server Process (App, Peer Lib with Query, Cache, Live Index, Comm), shown for several peers; Transactor (Transactions, Indexing) with a standby; Storage Service (segment storage, redundant segment storage, Data Segments); memcached cluster (optional).]

slide-36
SLIDE 36

State

  • Immutable, expanding value
  • Must be organized to support query
slide-37
SLIDE 37

Index

  • Sorted set of facts
  • Maintaining sort live in storage - bad
  • BigTable - mem + storage merge
  • occasional merge into storage
  • persistent trees
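
The "mem + storage merge" idea can be sketched with Clojure's built-in sorted sets standing in for both tiers. This is only an analogy: the real stored index is a tree of segments, and the names `merged-view` and `merge-into-storage` are hypothetical.

```clojure
;; Datom-shaped [e a v] tuples; vectors of equal length compare element-wise.
(def stored (sorted-set [1 :likes "Pizza"] [2 :likes "Ice cream"]))
(def live   (sorted-set [1 :name "Fred"] [3 :likes "Pasta"]))

(defn merged-view
  "What a reader sees: the stored index merged with the in-memory index."
  [stored live]
  (into stored live))

(defn merge-into-storage
  "Occasional indexing step: builds a new stored index and empties the live
   one. The sets are persistent, so the old stored value is left intact."
  [{:keys [stored live]}]
  {:stored (into stored live)
   :live   (empty live)})

(def before {:stored stored :live live})
(def after  (merge-into-storage before))
```

Readers see the same sorted facts before and after the merge; only where they live changes.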
slide-38
SLIDE 38

Memory Index

  • New persistent sorted set
  • Large internal nodes
  • Pluggable comparators
  • 2 sorts always maintained
  • EAVT, AEVT
  • plus AVET, VAET
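
A small sketch of "pluggable comparators": the same datoms held in two sorted sets that differ only in sort order. It uses Clojure's standard `sorted-set-by` rather than the new persistent sorted set the slide refers to, and `order-by` is a hypothetical helper.

```clojure
;; Datoms as [e a v t] vectors; the sample values are made up.
(def datoms
  [[2 :name  "Sally"     101]
   [1 :likes "Pizza"     100]
   [1 :name  "Fred"      100]
   [2 :likes "Ice cream" 101]])

(defn order-by
  "Builds a comparator that compares datoms by the given positions, in order."
  [& positions]
  (fn [x y]
    (compare (mapv #(nth x %) positions)
             (mapv #(nth y %) positions))))

;; Same facts, two sort orders: entity-first and attribute-first.
(def eavt (into (sorted-set-by (order-by 0 1 2 3)) datoms))
(def aevt (into (sorted-set-by (order-by 1 0 2 3)) datoms))

;; Range scan on AEVT: everything from attribute :name onward.
(subseq aevt >= [0 :name "" 0])
```

EAVT keeps an entity's facts together, while AEVT keeps all values of one attribute together, which is what makes the range scan above cheap.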
slide-39
SLIDE 39

Storage

  • Log of tx asserts/retracts (in tree)
  • Various covering indexes (trees)
  • Storage requirements
  • Data segment values (K->V)
  • atoms (consistent read)
  • pods (conditional put)
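
The storage requirements listed above are small enough to express as a short protocol. The sketch below is an assumption about what such a protocol could look like, with an in-memory implementation; the protocol name, method names, and `MemStorage` are all hypothetical.

```clojure
;; Hypothetical protocol; names and shape are assumptions, not Datomic's own.
(defprotocol Storage
  (get-segment [s k] "Reads an immutable data segment by key.")
  (put-segment [s k v] "Writes an immutable data segment and returns its key.")
  (read-root [s k] "Consistent read of a mutable root pointer.")
  (cas-root [s k old-v new-v]
    "Conditional put: succeeds only if the root still equals old-v."))

(defrecord MemStorage [segments roots]
  Storage
  (get-segment [_ k] (get @segments k))
  (put-segment [_ k v] (swap! segments assoc k v) k)
  (read-root [_ k] (get @roots k))
  (cas-root [_ k old-v new-v]
    ;; Retry only when an unrelated key changed; fail if this key moved.
    (loop []
      (let [m @roots]
        (cond
          (not= old-v (get m k)) false
          (compare-and-set! roots m (assoc m k new-v)) true
          :else (recur))))))

(defn mem-storage [] (->MemStorage (atom {}) (atom {})))

(def store (mem-storage))
(put-segment store "seg-1" [[1 :likes "Pizza" 100]])
(def first-put  (cas-root store :index-root nil "seg-1"))  ; expectation holds
(def second-put (cas-root store :index-root nil "seg-2"))  ; stale expectation
```

Only the root pointer is ever updated in place; segments are written once and then read as values.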
slide-40
SLIDE 40

Index in Storage

[Diagram. Labels: Index ref; Index Root of key->dir; T 42; EAVT, AEVT, AVET, VAET, Lucene; dirs; segs; Sorted Datoms; Identity; Value.]

slide-41
SLIDE 41

What’s in a DB Value?

[Diagram. Labels: db atom; db value; t; nextT; asOfT; sinceT; history; index (EAVT, AEVT, VAET, Lucene); live index; live Lucene; Storage; Hierarchical Cache; Roots; Memory index (live window); Storage-backed index; Identity; Value.]

slide-42
SLIDE 42

DB Values

  • Time travel and more
  • db.asOf - past, db.since - windowed
  • db.with(tx) - speculative
  • db.filter(pred) - slice
  • dbs are arguments to query, not implicit
  • mock with datom-shaped data:

[[:fred :likes "Pizza"] [:sally :likes "Ice cream"]]
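
A minimal sketch of what "speculative" and "slice" mean when the database is just a value, using the datom-shaped mock data from the slide. The helpers `with-tx`, `filter-db`, and `who-likes`, and the `[:add ...]` / `[:retract ...]` tx-data shape, are assumptions for illustration.

```clojure
;; A mock "database value": a set of datom-shaped [e a v] tuples.
(def db #{[:fred :likes "Pizza"] [:sally :likes "Ice cream"]})

(defn with-tx
  "Speculative transaction: returns a new value, leaving db untouched.
   tx-data items look like [:add e a v] or [:retract e a v] (an assumption)."
  [db tx-data]
  (reduce (fn [acc [op & fact]]
            (case op
              :add     (conj acc (vec fact))
              :retract (disj acc (vec fact))))
          db
          tx-data))

(defn filter-db
  "A slice of the database: only the facts that satisfy pred."
  [db pred]
  (set (filter pred db)))

(defn who-likes
  "The database is an explicit argument, so any value can be queried."
  [db thing]
  (set (for [[e a v] db :when (and (= a :likes) (= v thing))] e)))

;; Ask "what if" without committing anything.
(def what-if (with-tx db [[:add :ethel :likes "Pizza"]
                          [:retract :fred :likes "Pizza"]]))

(who-likes db "Pizza")
(who-likes what-if "Pizza")
```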

slide-43
SLIDE 43

Process

  • Assert/retract can’t express transformation
  • Transaction function:

(f db & args) -> tx-data

  • tx-data: assert|retract|(tx-fn args...)
  • Expand/splice until all assert/retracts
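
The expand/splice step can be sketched as a recursive walk over tx-data. This is a simplified illustration: the database is modeled as a nested map, and the `:transfer` and `:withdraw` functions and the `tx-fns` registry are hypothetical.

```clojure
;; Hypothetical registry of transaction functions: (f db & args) -> tx-data.
(def tx-fns
  {:transfer (fn [db from to amount]
               [[:withdraw from amount]
                [:add to :balance (+ (get-in db [to :balance] 0) amount)]])
   :withdraw (fn [db from amount]
               [[:add from :balance (- (get-in db [from :balance] 0) amount)]])})

(defn expand
  "Replaces each transaction-function call with the tx-data it returns,
   splicing results in place, until only :add/:retract items remain."
  [db tx-data]
  (mapcat (fn [[op & args :as item]]
            (if (#{:add :retract} op)
              [item]
              (expand db (apply (tx-fns op) db args))))
          tx-data))

;; Simplified database value: entity -> attribute map (an assumption).
(def db {:alice {:balance 100} :bob {:balance 20}})

(def expanded (vec (expand db [[:transfer :alice :bob 30]])))
```

Here `:transfer` expands into a `:withdraw` call plus an assertion, and the `:withdraw` call expands again, so the final tx-data contains only assertions.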
slide-44
SLIDE 44

Process Expansion

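[Diagram: tx-data shown as a row of asserts (+), retracts, and transaction function calls (foo, bar, baz), expanded in successive rows until only asserts and retracts remain.]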

slide-45
SLIDE 45

Transactor

  • Accepts transactions
  • Expands, applies, logs, broadcasts
  • Periodic indexing, in background
  • Indexing creates garbage
  • Storage GC
slide-46
SLIDE 46

Transactor Implementation

  • HornetQ for transaction communication
  • Extensive internal pipelining - j.u.c. queues
  • Async message decompression
  • transaction expansion/application
  • encoding for, communication with storage
  • Java interop to storage APIs
slide-47
SLIDE 47

Indexing

  • Extensive use of laziness
  • Parallel processing
  • Parallel I/O
  • Async, rejoins via queue
slide-48
SLIDE 48

Declarative Programming

  • Embedded Datalog
  • Takes data sources and rule sets as args
  • Extended to work with scalars/collections
  • Expression clauses call your code
slide-49
SLIDE 49

Datalog Implementation

  • Data driven, in and out
  • Query/Subquery Recursive (QSQR)
  • Dynamic, set oriented
  • DB joins leverage indexes
  • Expressions use Clojure compiler
  • caching of transforms at all stages
slide-50
SLIDE 50

Over Here

  • Peers directly access storage service
  • Have own query engine
  • Have live mem index and merging
  • Two-tier cache
  • Segments (on/off heap)
  • Datoms w/object values (on heap)
slide-51
SLIDE 51

Peer Implementation

  • HornetQ for transaction communication
  • Google Guava caches
  • Java APIs for storage
  • Entities are like multimaps
  • key -> value(s)
  • reverse attrs
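
"Entities are like multimaps" can be illustrated by folding datom-shaped facts into a map from attribute to a set of values, with a reverse lookup for attributes that point at the entity. The helper names and sample facts below are assumptions, not the peer library's API.

```clojure
;; Datom-shaped [e a v] facts; names are made up for the example.
(def facts
  [[:fred  :name   "Fred"]
   [:fred  :likes  "Pizza"]
   [:fred  :likes  "Pasta"]
   [:sally :name   "Sally"]
   [:sally :friend :fred]])

(defn entity
  "Forward view: attribute -> set of values for entity e."
  [facts e]
  (reduce (fn [m [e' a v]]
            (if (= e e')
              (update m a (fnil conj #{}) v)
              m))
          {}
          facts))

(defn reverse-attr
  "Reverse view: entities whose attribute a points at e."
  [facts a e]
  (set (for [[e' a' v] facts :when (and (= a a') (= v e))] e')))

(entity facts :fred)
(reverse-attr facts :friend :fred)
```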
slide-52
SLIDE 52

Testing

  • test.generative was born here
  • Functional tests
  • Simulation-based testing
slide-53
SLIDE 53

What’s not in Clojure?

  • Fressian (serialization library)
  • Java API
  • stubs only
  • Clojure API written in terms of Java API
slide-54
SLIDE 54

Simplicity is Agility

  • Key protocols extremely small (< 7 fns)
  • Memory, embedded SQL, remote SQL, Infinispan, DynamoDB
  • Move from our own dynamo cluster to DynamoDB:
  • 2 weeks
  • Support PostgreSQL, Infinispan
  • 1 day each
slide-55
SLIDE 55

Leverage

  ✓ Read/print data
  ✓ Embedded language
  ✓ Runtime compilation
  ✓ Extend standard interfaces/protocols
  ✓ Interop
  ✓ State model - extended

slide-56
SLIDE 56

Summary

  • Clojure was made for this kind of app
  • Fast enough at all levels
  • Most key subsystems < 1000 lines
  • A ton of concurrency, no sweat
  • Leverage interop - HornetQ, Guava, etc.
  • Clojure startup time could be better
  • Datomic is Simple
slide-57
SLIDE 57

Thanks for Listening!