The Database as a Value, Rich Hickey


slide-1
SLIDE 1

The Database as a Value

Rich Hickey

slide-2
SLIDE 2

What is Datomic?

  • A functional database
  • A sound model of information, with time
  • Provides database as a value to applications
  • Bring declarative programming to applications
  • Focus on reducing complexity
slide-3
SLIDE 3

DB Complexity

  • Stateful, inherently
  • Same query, different results
  • no basis
  • Over there
  • ‘Update’ poorly defined
  • Places
slide-4
SLIDE 4

Manifestations

  • Wrong programs
  • Scaling problems
  • Round-trip fears
  • Fear of overloading server
  • Coupling, e.g. questions with reporting
slide-5
SLIDE 5

Coming to Terms

Value

  • An immutable magnitude, quantity, number... or immutable composite thereof

Identity

  • A putative entity we associate with a series of causally related values (states) over time

State

  • Value of an identity at a moment in time

Time

  • Relative before/after ordering of causal values
slide-6
SLIDE 6

Epochal Time Model

[Diagram: v1 -> F -> v2 -> F -> v3 -> F -> v4]

  • Process events (pure functions)
  • States (immutable values)
  • Identity (succession of states)
  • Observers/perception/memory
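
A minimal Java sketch of the epochal time model, under stated assumptions: the class names `EpochalSketch`, `Account`, and `Identity` are hypothetical and appear nowhere in the talk. It models an identity as a reference to a succession of immutable states, with a pure function producing each next state.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical names throughout; an illustration, not Datomic code.
final class EpochalSketch {
    /** A state: an immutable value. */
    static final class Account {
        final String owner;
        final long balance;
        Account(String owner, long balance) {
            this.owner = owner;
            this.balance = balance;
        }
        /** A pure function F: returns a new value and never mutates this one. */
        Account deposit(long amount) {
            return new Account(owner, balance + amount);
        }
    }

    /** An identity: a reference that points at a succession of states. */
    static final class Identity<T> {
        private final AtomicReference<T> state;
        Identity(T initial) {
            this.state = new AtomicReference<>(initial);
        }
        /** Perception: observers receive an immutable value. */
        T deref() {
            return state.get();
        }
        /** Process: atomically apply a pure function to reach the next state. */
        T transition(UnaryOperator<T> f) {
            return state.updateAndGet(f);
        }
    }

    public static void main(String[] args) {
        Identity<Account> acct = new Identity<>(new Account("fred", 100));
        Account v1 = acct.deref();
        Account v2 = acct.transition(a -> a.deposit(50));
        // v1 is still the old state; v2 is the new one.
        System.out.println(v1.balance + " then " + v2.balance);
    }
}
```

An observer holding `v1` keeps seeing the same value after the transition, which is the property the later slides rely on for database values.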

slide-7
SLIDE 7

Implementing Values

  • Persistent data structures
  • Trees
  • Structural sharing
slide-8
SLIDE 8

Structural Sharing

[Diagram: two tree roots labeled Past and Next]
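
Structural sharing can be illustrated with a path-copying binary search tree. This is a sketch with hypothetical names (`SharingSketch`, `Node`); it is unbalanced and far simpler than a production persistent tree, but it shows how a "next" root reuses the untouched subtrees of the "past" root.

```java
// Hypothetical illustration of path copying; not Datomic's tree implementation.
final class SharingSketch {
    /** Immutable binary search tree node. */
    static final class Node {
        final int key;
        final Node left;
        final Node right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns a new root; only nodes on the path to the new key are copied. */
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key) return new Node(n.key, insert(n.left, key), n.right);
        if (key > n.key) return new Node(n.key, n.left, insert(n.right, key));
        return n; // key already present: reuse the whole subtree
    }

    static boolean contains(Node n, int key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node past = insert(insert(insert(null, 50), 30), 70);
        Node next = insert(past, 60);
        // The left subtree is shared by reference; the past root is unchanged.
        System.out.println("shared left subtree: " + (next.left == past.left));
        System.out.println("past contains 60: " + contains(past, 60));
    }
}
```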

slide-9
SLIDE 9

Place Model

The Database Place

[Diagram labels: F F F; Process events (pure functions); Observers/perception/memory; Identity (succession of states); DB Connection; Transactions; Queries]

slide-10
SLIDE 10

Epochal Time Model

[Diagram: v1 -> F -> v2 -> F -> v3 -> F -> v4]

  • Process events (pure functions)
  • States (immutable values)
  • Identity (succession of states)
  • Observers/perception/memory
  • DB Connection, Transactions, DB Values, Queries

slide-11
SLIDE 11

2 Notions of DB

slide-12
SLIDE 12

2 Notions of DB

  • Database system
      • facilitates the process of creating, sharing, growing db values
      • a machine
      • has identity
slide-13
SLIDE 13

2 Notions of DB

  • Database system
      • facilitates the process of creating, sharing, growing db values
      • a machine
      • has identity
  • Database values
      • the things with which we compute
slide-14
SLIDE 14

DB as Process

DB Process Novelty Computation Request fn(?) Result

slide-15
SLIDE 15

DB as Process

DB Process Novelty Computation Request fn(?) Result

What’s allowed? Reproducible results? How to use more than one db?

slide-16
SLIDE 16

Functional DB Process

DB Process Novelty DB Values

slide-17
SLIDE 17

Functional DB Process

DB Process Novelty DB Values

Where’s computation?

slide-18
SLIDE 18

Functional DB Process

DB Process Novelty DB Values

Where’s computation? Separate from process!

slide-19
SLIDE 19

Functional DB Computation

slide-20
SLIDE 20

Functional DB Computation

DB Value fn(db) Result

slide-21
SLIDE 21

Functional DB Computation

DB Value fn(db) Result DB Value fn(db, db)

slide-22
SLIDE 22

Functional DB Computation

DB Value fn(db) Result DB Value DB Value fn(db, db)
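
The fn(db) and fn(db, db) shapes can be sketched as ordinary pure functions over immutable collections. Everything here is an assumption for illustration: a database value is modeled as a set of "entity attribute value" strings, and `DbFunctionsSketch`, `countAttr`, and `novelty` are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: database values modeled as sets of "e a v" strings.
final class DbFunctionsSketch {
    /** fn(db): a pure function of a single database value. */
    static long countAttr(Set<String> db, String attr) {
        return db.stream().filter(d -> d.contains(" " + attr + " ")).count();
    }

    /** fn(db, db): facts present in 'later' but not in 'earlier'. */
    static Set<String> novelty(Set<String> earlier, Set<String> later) {
        Set<String> out = new TreeSet<>(later);
        out.removeAll(earlier);
        return Collections.unmodifiableSet(out);
    }

    public static void main(String[] args) {
        Set<String> db1 = new TreeSet<>();
        db1.add("fred :likes Pizza");
        Set<String> db2 = new TreeSet<>(db1);
        db2.add("sally :likes Ice cream");
        System.out.println(countAttr(db2, ":likes"));
        System.out.println(novelty(db1, db2));
    }
}
```

Because both arguments are plain values, the same function can compare two points in time of one database or two unrelated databases.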

slide-23
SLIDE 23

Value Propositions

  • Just data
  • language-independent
  • aggregate, compose
  • Persistent data structures
  • alias freedom
  • efficient incremental ‘change’
slide-24
SLIDE 24

One Structure, Many Functions

  • Datalog queries
  • Other query langs
  • Direct index access
  • seek + scan
  • Entity navigation
slide-25
SLIDE 25

Speculation

  • What-if scenarios
  • Just drop to backtrack
  • Datomic’s “with”: dbval tx-data -> dbval
  • Try before you buy/transact
  • Tree propagation
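
A toy sketch of the "dbval tx-data -> dbval" shape, assuming a database value is just an immutable list of datoms. The names `SpeculationSketch`, `Datom`, and `Db` are hypothetical, and this is not the Datomic API; it also copies the list instead of sharing structure, purely to stay short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of speculative transactions; not the Datomic API.
final class SpeculationSketch {
    static final class Datom {
        final String e, a, v;
        Datom(String e, String a, String v) {
            this.e = e;
            this.a = a;
            this.v = v;
        }
    }

    /** An immutable database value. */
    static final class Db {
        final List<Datom> datoms;
        Db(List<Datom> datoms) {
            this.datoms = Collections.unmodifiableList(new ArrayList<>(datoms));
        }
        /** dbval tx-data -> dbval: returns a new value; this one is untouched. */
        Db with(List<Datom> txData) {
            List<Datom> next = new ArrayList<>(datoms);
            next.addAll(txData);
            return new Db(next);
        }
    }

    public static void main(String[] args) {
        List<Datom> initial = new ArrayList<>();
        initial.add(new Datom(":fred", ":likes", "Pizza"));
        Db base = new Db(initial);

        List<Datom> tx = new ArrayList<>();
        tx.add(new Datom(":sally", ":likes", "Ice cream"));
        Db whatIf = base.with(tx);

        // Dropping 'whatIf' is all it takes to backtrack; 'base' never changed.
        System.out.println(base.datoms.size() + " vs " + whatIf.datoms.size());
    }
}
```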
slide-26
SLIDE 26

Time Travel

  • Accretive values contain all history
  • Query as-of and/or since a point in time
  • Query across time
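
As-of and since views can be sketched as filters over an accretive log. The names below (`TimeTravelSketch`, `Datom`, `Db`, `value`) are hypothetical, and the sketch assumes that as-of includes the given point while since excludes it; it is a model of the idea, not the Datomic API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model: views over an append-only log of assertions/retractions.
final class TimeTravelSketch {
    static final class Datom {
        final String e, a, v;
        final long t;         // transaction point
        final boolean added;  // true = assertion, false = retraction
        Datom(String e, String a, String v, long t, boolean added) {
            this.e = e; this.a = a; this.v = v; this.t = t; this.added = added;
        }
    }

    static final class Db {
        private final List<Datom> log; // shared, never modified
        private final long asOfT;      // inclusive upper bound (assumption)
        private final long sinceT;     // exclusive lower bound (assumption)

        private Db(List<Datom> log, long asOfT, long sinceT) {
            this.log = log; this.asOfT = asOfT; this.sinceT = sinceT;
        }
        static Db of(List<Datom> log) {
            return new Db(Collections.unmodifiableList(new ArrayList<>(log)),
                          Long.MAX_VALUE, Long.MIN_VALUE);
        }
        Db asOf(long t) { return new Db(log, t, sinceT); }
        Db since(long t) { return new Db(log, asOfT, t); }

        /** Datoms visible in this view, in log order. */
        List<Datom> datoms() {
            List<Datom> out = new ArrayList<>();
            for (Datom d : log) {
                if (d.t > sinceT && d.t <= asOfT) out.add(d);
            }
            return out;
        }

        /** Current value of a single-valued attribute in this view, or null. */
        String value(String e, String a) {
            String current = null;
            for (Datom d : datoms()) {
                if (!d.e.equals(e) || !d.a.equals(a)) continue;
                if (d.added) current = d.v;
                else if (d.v.equals(current)) current = null;
            }
            return current;
        }
    }

    public static void main(String[] args) {
        List<Datom> log = new ArrayList<>();
        log.add(new Datom(":fred", ":likes", "Pizza", 1000, true));
        log.add(new Datom(":fred", ":likes", "Pizza", 1001, false));
        log.add(new Datom(":fred", ":likes", "Sushi", 1001, true));
        Db db = Db.of(log);
        System.out.println(db.value(":fred", ":likes"));
        System.out.println(db.asOf(1000).value(":fred", ":likes"));
    }
}
```

All views share the same underlying log, so creating one costs almost nothing and no view can disturb another.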
slide-27
SLIDE 27

Testing

  • Flowing connections around, ugh
      • ambient connection pool no different
  • Reproducibility
  • Values can easily be fabricated/generated
slide-28
SLIDE 28

Stable Bases

  • Same query, same results
  • db permalinks!
  • communicable, recoverable
  • Multiple conversations about same value

// Peer
Database db = connection.db().asOf(1000);
Peer.q(aQuery, db);

// Client
GET /data/mem/test/1000/datoms?index=aevt

(The basis in both examples is 1000.)

slide-29
SLIDE 29

Datomic Datalog

q(query, db1, db2, otherInputs ...);

{:find [?customer ?product]
 :where [[?customer :shipAddress ?addr]
         [?addr :zip ?zip]
         [?product :product/weight ?weight]
         [?product :product/price ?price]
         [(Shipping/estimate ?zip ?weight) ?shipCost]
         [(<= ?price ?shipCost)]]}

  • dbs are arguments to query, not implicit
slide-30
SLIDE 30

DB Values

  • Time travel and more
  • db.asOf - past, db.since - windowed
  • db.with(tx) - speculative
  • db.filter(pred) - slice
  • mock with datom-shaped data:

[[:fred :likes "Pizza"] [:sally :likes "Ice cream"]]

slide-31
SLIDE 31

Implementation

slide-32
SLIDE 32

Traditional Database

[Diagram labels: Server (cache, Indexing, Transactions, Query, I/O); App Process (App, ORM?, Caching policy?); Strings (DDL + DML); Result Sets; Serialized ???; Disk]

slide-33
SLIDE 33

The Choices

  • Coordination
      • how much, and where?
      • process requires it
      • perception shouldn’t
  • Immutability
      • sine qua non
slide-34
SLIDE 34

Approach

  • Move to information model
  • Split process and perception
  • Immutable basis in storage
  • Novelty in memory
slide-35
SLIDE 35

Information

  • Inform
      • ‘to convey knowledge via facts’
      • ‘give shape to (the mind)’
  • Information
      • the facts
slide-36
SLIDE 36

Facts

  • Fact - ‘an event or thing known to have happened or existed’
  • From: factum - ‘something done’
  • Must include time
  • Remove structure (a la RDF)
  • Atomic Datom
  • Entity/Attribute/Value/Transaction

slide-37
SLIDE 37

Database State

  • The database as an expanding value
  • An accretion of facts
  • The past doesn’t change - immutable
  • Process requires new space
  • Fundamental move away from places
slide-38
SLIDE 38

Accretion

  • Root per transaction doesn’t work
  • Latest values include past as well
  • The past is sub-range
  • Important for information model
slide-39
SLIDE 39

Datomic Architecture

[Diagram labels: App Server Process (App; Peer Lib with Query, Cache, Live Index, Comm), shown several times; Transactor (Indexing, Transactions) with standby; Storage Service (segment storage, redundant segment storage, Data Segments); memcached cluster (optional)]

slide-40
SLIDE 40

Indexing

  • Maintaining sort live in storage - bad
  • BigTable et al:
      • Accumulate novelty in memory
      • Current view: mem + storage merge
      • Occasionally integrate mem into storage (releases memory)
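
The accumulate / merge / integrate cycle can be sketched with two sorted sets. `LiveIndexSketch` and its methods are hypothetical names, keys are plain strings, and the merge is materialized eagerly for brevity where a real index would merge lazily; treat it as an illustration of the pattern only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of "novelty in memory, immutable basis in storage".
final class LiveIndexSketch {
    // Stand-in for the storage-backed index: replaced wholesale, never edited.
    private NavigableSet<String> storage =
        Collections.unmodifiableNavigableSet(new TreeSet<String>());
    // Recent novelty, kept sorted in memory.
    private final NavigableSet<String> novelty = new TreeSet<>();

    /** Accumulate novelty in memory. */
    void transact(String key) {
        novelty.add(key);
    }

    /** Current view: merge of memory and storage, in sorted order. */
    List<String> currentView() {
        TreeSet<String> merged = new TreeSet<>(storage);
        merged.addAll(novelty);
        return new ArrayList<>(merged);
    }

    /** Occasionally integrate memory into a new storage index, releasing memory. */
    void integrate() {
        TreeSet<String> merged = new TreeSet<>(storage);
        merged.addAll(novelty);
        storage = Collections.unmodifiableNavigableSet(merged);
        novelty.clear();
    }

    int noveltySize() { return novelty.size(); }
    int storageSize() { return storage.size(); }

    public static void main(String[] args) {
        LiveIndexSketch idx = new LiveIndexSketch();
        idx.transact("b");
        idx.transact("a");
        System.out.println(idx.currentView());
        idx.integrate();
        System.out.println(idx.noveltySize() + " in memory, " + idx.storageSize() + " in storage");
    }
}
```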

slide-41
SLIDE 41

Transactions and Indexing

[Diagram labels: Transactions, Log, Novelty, Live Index, Index Merging, Index, Data Segments, Storage]

slide-42
SLIDE 42

Perception

[Diagram labels: Live Index, Novelty, Storage, Index, Data Segments]

slide-43
SLIDE 43

Process

  • Reified
  • Primitive representation of novelty
  • Assertions and retractions of facts
  • Minimal
  • Other transformations expand into those
slide-44
SLIDE 44

Process

  • Assert/retract can’t express transformation
  • Transaction function: (f db & args) -> tx-data
  • tx-data: assert|retract|(tx-fn args...)
  • Expand/splice until all assert/retracts
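
The expand/splice step can be sketched as a small recursive function. All names here (`TxExpansionSketch`, `Primitive`, `Call`, `TxFn`, `DEPOSIT`) are hypothetical, and the database value is reduced to a map from "entity attribute" to value; the point is only that a transaction function reads a db value and returns more tx-data, which is expanded until nothing but assertions and retractions remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of transaction-function expansion; not Datomic code.
final class TxExpansionSketch {
    /** tx-data item: a primitive assert/retract or a transaction-function call. */
    interface TxItem {}

    static final class Primitive implements TxItem {
        final boolean added; // true = assert, false = retract
        final String e, a, v;
        Primitive(boolean added, String e, String a, String v) {
            this.added = added; this.e = e; this.a = a; this.v = v;
        }
    }

    /** (f db & args) -> tx-data */
    interface TxFn {
        List<TxItem> apply(Map<String, String> db, List<String> args);
    }

    static final class Call implements TxItem {
        final TxFn fn;
        final List<String> args;
        Call(TxFn fn, List<String> args) { this.fn = fn; this.args = args; }
    }

    /** Expand and splice until only asserts and retracts remain. */
    static List<Primitive> expand(Map<String, String> db, List<TxItem> txData) {
        List<Primitive> out = new ArrayList<>();
        for (TxItem item : txData) {
            if (item instanceof Call) {
                Call c = (Call) item;
                out.addAll(expand(db, c.fn.apply(db, c.args)));
            } else {
                out.add((Primitive) item);
            }
        }
        return out;
    }

    /** Example tx fn: reads the current balance, retracts it, asserts the sum. */
    static final TxFn DEPOSIT = (db, args) -> {
        String e = args.get(0);
        long amount = Long.parseLong(args.get(1));
        String old = db.get(e + " :balance");
        String next = String.valueOf(Long.parseLong(old) + amount);
        return Arrays.<TxItem>asList(
            new Primitive(false, e, ":balance", old),
            new Primitive(true, e, ":balance", next));
    };
}
```

A transformation such as "add 50 to the balance" cannot be written as a bare assertion because it depends on the current value, which is why the function receives the db value as its first argument.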
slide-45
SLIDE 45

Process Expansion

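[Diagram: tx-data mixing assertions (+), retractions (-), and transaction functions (foo, baz, bar, ...) expands into a sequence containing only assertions and retractions]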

slide-46
SLIDE 46

Memory Index

  • Persistent sorted set
  • Large internal nodes
  • Pluggable comparators
  • 2 sorts always maintained
      • EAVT, AEVT
  • plus AVET, VAET
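
The idea of one datom shape kept under several sort orders can be sketched with comparators. The names `MemoryIndexSketch`, `Datom`, `EAVT`, `AEVT`, and `index` are illustrative; `java.util.TreeSet` is a mutable stand-in and not a persistent set, and values are compared as plain strings, so only the pluggable-comparator aspect is shown.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch: the same datoms sorted by two pluggable comparators.
final class MemoryIndexSketch {
    static final class Datom {
        final long e;
        final String a;
        final String v;
        final long t;
        Datom(long e, String a, String v, long t) {
            this.e = e; this.a = a; this.v = v; this.t = t;
        }
    }

    /** Entity, attribute, value, transaction. */
    static final Comparator<Datom> EAVT =
        Comparator.comparingLong((Datom d) -> d.e)
            .thenComparing((Datom d) -> d.a)
            .thenComparing((Datom d) -> d.v)
            .thenComparingLong((Datom d) -> d.t);

    /** Attribute, entity, value, transaction. */
    static final Comparator<Datom> AEVT =
        Comparator.comparing((Datom d) -> d.a)
            .thenComparingLong((Datom d) -> d.e)
            .thenComparing((Datom d) -> d.v)
            .thenComparingLong((Datom d) -> d.t);

    /** Builds a sorted set over the given datoms using the chosen order. */
    static TreeSet<Datom> index(Comparator<Datom> order, Datom... datoms) {
        TreeSet<Datom> set = new TreeSet<>(order);
        for (Datom d : datoms) set.add(d);
        return set;
    }

    public static void main(String[] args) {
        Datom d1 = new Datom(1, ":name", "fred", 100);
        Datom d2 = new Datom(2, ":age", "30", 101);
        Datom d3 = new Datom(1, ":age", "41", 102);
        TreeSet<Datom> eavt = index(EAVT, d1, d2, d3);
        TreeSet<Datom> aevt = index(AEVT, d1, d2, d3);
        // EAVT groups by entity; AEVT groups by attribute.
        System.out.println(eavt.first().e + " " + eavt.first().a);
        System.out.println(aevt.last().e + " " + aevt.last().a);
    }
}
```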

slide-47
SLIDE 47

Storage

  • Log of tx asserts/retracts (in tree)
  • Various covering indexes (trees)
  • Storage service/server requirements
      • Data segment values (K->V)
      • atoms (consistent read)
      • pods (conditional put)
slide-48
SLIDE 48

Index in Storage

[Diagram labels: Index ref, Index Root, Identity, Value, T 42, EAVT, AEVT, AVET, VAET, Lucene, dirs, segs, Sorted Datoms, f key->dir]

slide-49
SLIDE 49

What’s in a DB Value?

[Diagram labels: db atom, db value, Identity, Value, t, nextT, asOfT, sinceT, history, index, live, EAVT, AEVT, VAET, Lucene, Roots, Memory index (live window), Storage-backed index, Hierarchical Cache, Storage]

slide-50
SLIDE 50

Functional DB Benefits

  • Epochal state
  • Coordination only for process
  • Transactions well defined
  • Functional accretion
  • Freedom to relocate/scale storage, query
  • Extensive caching
  • Process events
slide-51
SLIDE 51

Thanks for Listening!