The Database as a Value, Rich Hickey



slide-1
SLIDE 1

The Database as a Value

Rich Hickey

slide-2
SLIDE 2

Complexity

  • Out of the Tar Pit, Moseley and Marks (2006)

  • Complexity caused by state and control
  • Close the loop - process
slide-3
SLIDE 3

DB Complexity

  • Stateful, inextricably
  • Same query, different results
  • no basis
  • Over there
  • ‘Update’ poorly defined
  • Places
slide-4
SLIDE 4

Basis

  • Calculation and decision making may involve multiple components and may visit a component more than once

  • Broken by simultaneous change
slide-5
SLIDE 5

Update

  • What does update mean?
  • Does the new replace the old?
  • What new thing replaces what old thing?
  • Visibility?
slide-6
SLIDE 6

Manifestations

  • Wrong programs
  • Scaling problems
  • Round-trip fears
  • Fear of overloading server
  • Coupling, e.g. questions with reporting
slide-7
SLIDE 7

The Choices

  • Coordination
  • how much, and where?
  • process requires it
  • perception shouldn’t
  • Immutability
  • sine qua non
slide-8
SLIDE 8

Coming to Terms

Value

  • An immutable magnitude, quantity, number... or immutable composite thereof

Identity

  • A putative entity we associate with a series of causally related values (states) over time

State

  • Value of an identity at a moment in time

Time

  • Relative before/after ordering of causal values
slide-9
SLIDE 9

Epochal Time Model

Diagram: v1 → F → v2 → F → v3 → F → v4

  • Process events (pure functions)
  • Observers/perception/memory
  • States (immutable values)
  • Identity (succession of states)
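
A minimal sketch of the epochal time model, assuming Python tuples stand in for immutable values. The `Identity` class, its method names, and the account example are hypothetical and do not come from the talk; they only illustrate that process produces new values while observers keep whatever value they already hold.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Identity:
    """A succession of immutable states; only the notion of 'latest' moves."""
    states: List[tuple] = field(default_factory=list)

    def current(self) -> tuple:
        # Perception: an observer reads the latest value, no coordination needed.
        return self.states[-1]

    def advance(self, f: Callable[[tuple], tuple]) -> tuple:
        # Process: a pure function F maps the current value to the next one.
        nxt = f(self.current())
        self.states.append(nxt)
        return nxt


account = Identity(states=[("balance", 100)])    # v1
seen = account.current()                         # an observer holds on to v1
account.advance(lambda s: (s[0], s[1] - 30))     # v2 = F(v1)
account.advance(lambda s: (s[0], s[1] + 5))      # v3 = F(v2)
```

The observer's `seen` value is unaffected by the later steps, which is the property the "Basis" slide asks for.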

slide-10
SLIDE 10

Implementing Values

  • Persistent data structures
  • Trees
  • Structural sharing
slide-11
SLIDE 11

Structural Sharing

Diagram labels: Past, Next
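
Structural sharing can be illustrated with path copying in a persistent binary search tree. This is a sketch under assumptions: the `Node` type and `insert` function are hypothetical, and a plain unbalanced tree is used only because it is short, not because it is what any particular database uses.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Return a new tree containing key; the input tree is never modified."""
    if node is None:
        return Node(key)
    if key < node.key:
        # Copy this node, reuse the untouched right subtree.
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        # Copy this node, reuse the untouched left subtree.
        return node._replace(right=insert(node.right, key))
    return node  # key already present: reuse the whole subtree


past = None
for k in (50, 30, 70, 20, 40):
    past = insert(past, k)

# Only the nodes on the path 50 -> 70 are copied; the rest is shared.
next_ = insert(past, 80)
```

Both `past` and `next_` remain usable values, and `next_.left` is the very same object as `past.left`.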

slide-12
SLIDE 12

Process events (pure functions)

Observers/perception/memory Identity (succession of states)

Place Model

DB Connection Transactions Queries

The Database Place

F F F

slide-13
SLIDE 13

v1 F v2 F v3 F v4

Process events (pure functions)

Observers/perception/memory

States (immutable values)

Identity (succession of states)

Epochal Time Model

DB Connection Transactions DB Values Queries

slide-14
SLIDE 14

Database State

  • The database as an expanding value
  • An accretion of facts
  • The past doesn’t change - immutable
  • Process requires new space
  • Fundamental move away from places
slide-15
SLIDE 15

Accretion

  • Root per transaction doesn’t work
  • Crossing processes and time
  • Can’t convey/find/maintain roots
  • Can’t do global GC
  • Instead, latest values include past as well
  • The past is sub-range
  • Important for information model
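
A rough sketch of accretion, assuming the database value is modeled as a growing tuple of facts. The `transact` helper and the `(entity, attribute, value, tx)` layout are illustrative assumptions, not an actual storage format.

```python
from typing import List, Tuple

Fact = Tuple[int, str, str, int]  # (entity, attribute, value, tx); hypothetical layout


def transact(db: Tuple[Fact, ...], new_facts: List[Fact]) -> Tuple[Fact, ...]:
    """Return a larger database value; the old value is left untouched."""
    return db + tuple(new_facts)


db1 = transact((), [(1, "name", "Ada", 1)])
db2 = transact(db1, [(2, "name", "Grace", 2)])

# The latest value includes the past: the earlier value is a sub-range of it.
past = db2[: len(db1)]
```

Because the latest value contains every earlier one, there is no need to hand out or track a separate root per transaction.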
slide-16
SLIDE 16

Facts

  • Remove structure
  • a la RDF
  • Atomic
  • Datom
  • Entity/Attribute/Value/Transaction
  • Must include time
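
One way to picture a datom is as a small immutable record. The sketch below assumes a Python `NamedTuple`; the field names, the sample attributes, and the `added` flag (used here to distinguish assertions from retractions) are assumptions made for illustration.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    e: int              # entity id
    a: str              # attribute
    v: Any              # value
    tx: int             # transaction, which carries the time component
    added: bool = True  # assumption: True = assertion, False = retraction


facts = [
    Datom(1, "person/name", "Ada", tx=100),
    Datom(1, "person/email", "ada@example.com", tx=100),
    # A later transaction retracts one fact and asserts another.
    Datom(1, "person/email", "ada@example.com", tx=101, added=False),
    Datom(1, "person/email", "ada@new.example", tx=101),
]
```

Nothing is overwritten: the change of email is itself recorded as two new facts.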

slide-17
SLIDE 17

Process

  • Reified
  • Primitive representation of novelty
  • Assertions and retractions of facts
  • Minimal
  • Other transformations expand into those
slide-18
SLIDE 18

Implementation

slide-19
SLIDE 19

State

  • Must be organized to support query
  • Sorted set of facts
  • Maintaining sort live in storage - bad
  • BigTable - mem + storage merge
  • occasional merge into storage
  • persistent trees
slide-20
SLIDE 20

Accumulate + Merge

Diagram labels: Storage Indexes, DB Engine, Memory Indexes, Index Merge, Log, Transaction Processing
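
The accumulate-and-merge idea can be sketched as a small sorted in-memory index layered over an immutable sorted storage index. The class and method names below are hypothetical, and a real system would merge trees of segments rather than rebuild a flat tuple.

```python
import bisect
import heapq
from typing import List, Tuple


class AccumulateMergeIndex:
    """Sketch: novelty accumulates in memory and is occasionally merged into storage."""

    def __init__(self) -> None:
        self.storage: Tuple[tuple, ...] = ()  # immutable and sorted, replaced only on merge
        self.memory: List[tuple] = []         # sorted novelty since the last merge

    def add(self, fact: tuple) -> None:
        # Cheap: keeps only the small in-memory index sorted.
        bisect.insort(self.memory, fact)

    def scan(self) -> List[tuple]:
        # Queries see the merge of both sorted levels.
        return list(heapq.merge(self.storage, self.memory))

    def merge_into_storage(self) -> None:
        # Occasional background step: build a new storage index, reset memory.
        self.storage = tuple(self.scan())
        self.memory = []


idx = AccumulateMergeIndex()
idx.add((2, "name", "Grace", 1))
idx.add((1, "name", "Ada", 1))
before = idx.scan()
idx.merge_into_storage()
idx.add((3, "name", "Edsger", 2))
```

Writes never re-sort storage in place; the sorted storage index is only ever replaced wholesale by a merge.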

slide-21
SLIDE 21

Datomic Architecture

Diagram labels:

  • App Process: App, Peer Lib (Query, Cache, Live Index, Comm)
  • Transactor: Indexing, Transactions
  • Storage Service: Segment storage, Redundant segment storage
  • Data Segments

slide-22
SLIDE 22

Memory Index

  • Persistent sorted set
  • Large internal nodes
  • Pluggable comparators
  • 2 sorts always maintained
  • EAVT, AEVT
  • plus AVET, VAET
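
The sort orders can be read as the same set of datoms under different comparators. The sketch below assumes datoms are `(e, a, v, t)` tuples and uses made-up data. VAET is left out only because this toy data mixes string and integer values, which Python cannot compare directly.

```python
from operator import itemgetter

# Hypothetical datoms as (e, a, v, t) tuples.
datoms = [
    (2, "name", "Grace", 10),
    (1, "name", "Ada", 10),
    (1, "age", 45, 11),
    (2, "age", 36, 11),
]

# Each index is the same data sorted under a different (pluggable) key order.
ORDERS = {
    "EAVT": itemgetter(0, 1, 2, 3),  # all facts about an entity are adjacent
    "AEVT": itemgetter(1, 0, 2, 3),  # all values of an attribute are adjacent
    "AVET": itemgetter(1, 2, 0, 3),  # lookup by attribute and value
}

indexes = {name: sorted(datoms, key=key) for name, key in ORDERS.items()}
```

With this layout, "everything about entity 1" is a contiguous range in EAVT, and "who has age 36" is a contiguous range in AVET.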

slide-23
SLIDE 23

Storage

  • Log of tx asserts/retracts (in tree)
  • Various covering indexes (trees)
  • Storage requirements
  • Data segment values (K->V)
  • atoms (consistent read)
  • pods (conditional put)
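
The storage requirements listed above are small: key-to-value segments, a consistent read, and a conditional put. The sketch below is an in-memory stand-in under assumptions; `KVStore`, `put_if`, and the key names are hypothetical and do not describe any real storage service's API.

```python
import threading
from typing import Any, Dict, Optional


class KVStore:
    """Hypothetical in-memory stand-in for a storage service."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        # Consistent read: always reflects the last successful write.
        with self._lock:
            return self._data.get(key)

    def put_if(self, key: str, expected: Any, new: Any) -> bool:
        """Conditional put: write only if the current value equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


store = KVStore()
store.put_if("seg-1", None, b"datoms...")    # a segment is written once, never changed
ok = store.put_if("root", None, "seg-1")     # first writer wins
stale = store.put_if("root", None, "seg-2")  # rejected: root is no longer None
```

Since segments are immutable, only the few mutable pointers need the conditional put.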
slide-24
SLIDE 24

What’s in a DB Value?

Diagram labels: db atom, db value, t, nextT, asOfT, sinceT, history, EAVT, AEVT, VAET, live Lucene index, Lucene index, Roots, Memory index (live window), Storage-backed index, Hierarchical Cache, Storage

Legend: Identity, Value

slide-25
SLIDE 25

Index Storage

Sorted Datoms Index Root

  • f key->dir

Diagram labels: T 42; EAVT, AEVT, AVET, VAET, Lucene; Storage Service; dirs; segs

slide-26
SLIDE 26

Process

  • Assert/retract can’t express transformation
  • Transaction function:

(f db & args) -> tx-data

  • tx-data: assert|retract|(tx-fn args...)
  • Expand/splice until all assert/retracts
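
The expand/splice step can be sketched as a recursive function. Everything here is an assumption made for illustration: the tuple encoding of tx-data, the `TX_FNS` registry, the `inc` and `transfer` functions, and the dict standing in for a database value.

```python
from typing import Callable, Dict, List, Tuple

# tx-data items, encoded hypothetically as tuples:
#   ("assert", e, a, v) | ("retract", e, a, v) | ("call", fn_name, *args)
TxData = List[tuple]
TX_FNS: Dict[str, Callable[..., TxData]] = {}


def expand(db: Dict[Tuple[int, str], int], tx_data: TxData) -> TxData:
    """Splice in transaction-function results until only asserts/retracts remain."""
    out: TxData = []
    for item in tx_data:
        if item[0] == "call":
            _, name, *args = item
            # (f db & args) -> tx-data, which may itself contain further calls
            out.extend(expand(db, TX_FNS[name](db, *args)))
        else:
            out.append(item)
    return out


def inc(db, e, a, amount):
    old = db[(e, a)]
    return [("retract", e, a, old), ("assert", e, a, old + amount)]


def transfer(db, src, dst, amount):
    return [("call", "inc", src, "balance", -amount),
            ("call", "inc", dst, "balance", amount)]


TX_FNS.update(inc=inc, transfer=transfer)
db = {(1, "balance"): 100, (2, "balance"): 20}
result = expand(db, [("call", "transfer", 1, 2, 30)])
```

Here `transfer` expands into two `inc` calls, which in turn expand into plain retract and assert items.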
slide-27
SLIDE 27

Process Expansion

Diagram: rows of tx-data items (+) interleaved with transaction-function calls (foo, baz, bar, ...), expanded row by row until only + items remain

slide-28
SLIDE 28

Transactor

  • Accepts transactions
  • Expands, applies, logs, broadcasts
  • Periodic indexing, in background
  • Indexing creates garbage
  • Storage GC
slide-29
SLIDE 29

Peers

  • Peers directly access storage service
  • Have own query engine
  • Have live mem index and merging
  • Two-tier cache
  • Segments (on/off heap)
  • Datoms w/object values (on heap)
slide-30
SLIDE 30

DB Simplicity

  • Epochal state
  • Coordination only for process
  • Same query, same results
  • stable bases
  • Transactions well defined
  • Functional accretion
slide-31
SLIDE 31

Other Benefits

  • Communicable, recoverable basis
  • Freedom to relocate/scale storage, query
  • Time travel - db.asOf, db.since, db.asIf
  • Queries comparing times
  • Process events
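
The time-travel operations can be approximated as plain functions over a database value. This sketch assumes facts are `(e, a, v, t)` tuples; the function names mirror the slide's `db.asOf`, `db.since`, and `db.asIf`, but their exact semantics here (inclusive versus exclusive bounds, for instance) are assumptions.

```python
from typing import Iterable, Tuple

Fact = Tuple[int, str, str, int]  # (e, a, v, t); hypothetical layout
Db = Tuple[Fact, ...]


def as_of(db: Db, t: int) -> Db:
    """The database as it was at time t."""
    return tuple(f for f in db if f[3] <= t)


def since(db: Db, t: int) -> Db:
    """Only what was added after time t."""
    return tuple(f for f in db if f[3] > t)


def as_if(db: Db, hypothetical: Iterable[Fact]) -> Db:
    """A speculative value with extra facts; nothing is stored or shared."""
    return db + tuple(hypothetical)


db: Db = (
    (1, "name", "Ada", 1),
    (2, "name", "Grace", 2),
    (3, "name", "Edsger", 3),
)

what_if = as_if(db, [(4, "name", "Barbara", 4)])

# A query comparing two times: which facts appeared between t=1 and t=3?
added = set(as_of(db, 3)) - set(as_of(db, 1))
```

All three operations return ordinary values, so the same query code runs unchanged against the present, the past, or a hypothetical future.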
slide-32
SLIDE 32

The Database as a Value

  • Dramatically less complex
  • More powerful
  • More scalable
  • Better information model
slide-33
SLIDE 33

Thanks for Listening!