functional thinking
applying the philosophy of functional programming to system design & architecture Jed Wesley-Smith @jedws
functional programming has many benefits: better program reasonability, composition, refactorability and performance
yet, the dominant models & paradigms for software architecture and building software systems today remain rooted in mutation and side-effects
many of the ideas and principles of functional programming have been applied to solve design problems including security, concurrency, auditing and robustness
it is possible and desirable to apply them to all of the systems we build, and gain practical advantage from doing so
the fundamental absurdity at the heart of programming
is a low-level execution plan for specific hardware, not a fundamentally important strategy for how we should write programs
it only makes any sense as an execution strategy when associated with the most common Von Neumann style hardware architectures
it makes less sense at any higher level of abstraction, but we learn fairly early on that this is how programming works!
it makes even less sense as a way we should design software in the large
the biggest problem is that we can only know one value for x: what it is
what it was
the value of x is ephemeral, we forget what it was!
we all know global mutable variables are to be avoided
unfortunately, many of our common storage systems use exactly the same paradigm
UPDATE person SET phone_no = '+61 2 9876 5432' WHERE person_id = 123;
same goes for writing to a file
same goes for most REST interfaces…
pure functional programming, in the large, rejects this approach to programming!
f : A -> B
relates one value from its domain: A
to exactly one value from its codomain: B
always the same (or equivalent) value
and nothing else!
this is also known as a pure function, because programming defines impure ones too
immutable: values cannot change
easily shareable: without concern for concurrent modification
referentially transparent: expressions can be replaced with their computed value
the state of a thing in an instant in time is a value
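The properties above can be sketched in Python. This is only an illustration: the `Person` record, the `with_phone` helper and the first phone number are hypothetical and not from the talk. A frozen dataclass stands in for an immutable value, and an "update" is a pure function that returns a new value while the old one remains available.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Person:
    """Hypothetical immutable value: the state of a person at one instant."""
    person_id: int
    phone_no: str


def with_phone(person: Person, phone_no: str) -> Person:
    """Pure function: returns a new value and leaves its input untouched."""
    return replace(person, phone_no=phone_no)


before = Person(123, "+61 2 1234 5678")
after = with_phone(before, "+61 2 9876 5432")

# Both values still exist; nothing was overwritten.
# An attempt at in-place mutation is rejected by the frozen dataclass.
try:
    before.phone_no = "changed"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because `with_phone` depends only on its arguments, any call to it can be replaced by its result, which is what referential transparency asks for.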
“we invented mutable values, we must uninvent them”
– Rich Hickey, Simple Made Easy
what we think of as the things around us (you, me, the plants and animals, rivers and mountains) are identities
identities are things we name
we are used to thinking of the world in terms of identities, they are the objects in our world
“since the time of Plato and Aristotle, philosophers have posited true reality as timeless, based on permanent substances, while processes are denied or subordinated to timeless substances. if Socrates changes, becoming sick, Socrates is still the same, and change (his sickness) only glides over his substance: change is accidental, whereas the substance is essential.”
http://en.wikipedia.org/wiki/Process_philosophy
“no one ever steps in the same river twice, for it's not the same river and it's not the same person”
– Heraclitus
requesting the current time is not a function, it always gives a different answer!
as we are functional programmers, we recognise now is a side-effect
we usually model side-effects as explicit things, commonly via a type such as IO
now :: IO Time
(java) public IO<Time> now()
IO is a value describing how to perform a side-effect, which we can run later
now is a pure function as it returns a value
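A minimal sketch of the same idea in Python, assuming a hand-rolled `IO` class (hypothetical, not a library type): calling `now()` only builds a description of "read the clock", and the effect happens when `run()` is called.

```python
import time
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class IO(Generic[A]):
    """A value describing a side-effect; nothing happens until run() is called."""
    unsafe_run: Callable[[], A]

    def map(self, f: Callable[[A], B]) -> "IO[B]":
        # Builds a new description; still no effect is performed.
        return IO(lambda: f(self.unsafe_run()))

    def run(self) -> A:
        return self.unsafe_run()


def now() -> IO[float]:
    """Pure: always returns an equivalent description of 'read the clock'."""
    return IO(time.time)


description = now().map(lambda t: t >= 0)  # no clock read yet
result = description.run()                 # the effect happens here
```

The design choice is that the impure step is pushed to a single, explicit `run()` at the edge of the program, while everything before it composes as ordinary values.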
we can store entire versions, or we can store deltas, or patches*
they are equivalent
being in possession of any two allows us to traverse time
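One way to read "any two" is: of an old version, a new version and the patch between them, any two determine the third. The sketch below assumes versions are flat dictionaries and a patch records both the old and new value per key; the names `diff`, `apply_patch` and `revert` are hypothetical, and this is not the patch theory from the linked post.

```python
from typing import Any, Dict, Tuple

MISSING = object()  # marks "key absent" on either side of a change
Version = Dict[str, Any]
Patch = Dict[str, Tuple[Any, Any]]  # key -> (old value, new value)


def diff(old: Version, new: Version) -> Patch:
    """Two versions give the patch between them."""
    keys = old.keys() | new.keys()
    return {
        k: (old.get(k, MISSING), new.get(k, MISSING))
        for k in keys
        if old.get(k, MISSING) != new.get(k, MISSING)
    }


def apply_patch(old: Version, patch: Patch) -> Version:
    """An old version plus a patch gives the new version (forward in time)."""
    new = dict(old)
    for k, (_, after) in patch.items():
        if after is MISSING:
            new.pop(k, None)
        else:
            new[k] = after
    return new


def revert(new: Version, patch: Patch) -> Version:
    """A new version plus a patch gives the old version (backward in time)."""
    inverse = {k: (after, before) for k, (before, after) in patch.items()}
    return apply_patch(new, inverse)
```

Keeping the old value inside the patch is what makes it invertible; a patch that only stored new values could move forward in time but not back.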
* http://liamoc.net/posts/2015-11-10-patch-theory.html
it wasn’t that long ago that computation was expensive, disk storage was expensive, DRAM was expensive, but coordination with latches/locks was cheap
now, all these have changed using cheap computation (with many-core), cheap commodity disks, and cheap DRAM and SSD
coordination with latches/locks gets harder because latch latency loses lots of instruction opportunities
now much more expensive
increasingly, applications are distributed, often globally, however we still use paradigms invented in the old world with old-world assumptions
the new world is increasingly distributed
distribution brings enormous problems, including increased latency and unreliability
conventional consensus techniques (locks and transactions) impose intolerable constraints:
https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html
“the unavoidable price of reliability is simplicity.”
– C. A. R. Hoare
the first functional architecture, codified in 1494 by Luca Pacioli
central to bookkeeping, then and now, are its three books: the memorandum, the journal and the ledger
the memorandum records all transactions as they happen
the journal records the detailed transcription of these transactions, involving debiting and crediting specific accounts
the ledger is generated by posting the journal entries to the individual accounts
balance is fundamental and is key to the correctness protocol
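The journal-to-ledger relationship can be sketched as a fold. The account names, amounts and the sign convention (debits positive, credits negative, in cents) are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JournalEntry:
    """One transaction: debits are positive amounts, credits negative (cents)."""
    description: str
    postings: Tuple[Tuple[str, int], ...]  # (account, signed amount)

    def __post_init__(self) -> None:
        # The correctness protocol: every entry must balance to zero.
        if sum(amount for _, amount in self.postings) != 0:
            raise ValueError(f"unbalanced entry: {self.description}")


def post(journal: Iterable[JournalEntry]) -> Dict[str, int]:
    """Derive the ledger (per-account balances) by folding over the journal."""
    ledger: Dict[str, int] = defaultdict(int)
    for entry in journal:
        for account, amount in entry.postings:
            ledger[account] += amount
    return dict(ledger)


journal = [
    JournalEntry("owner invests cash", (("cash", 10_000), ("equity", -10_000))),
    JournalEntry("buy stock for cash", (("inventory", 4_000), ("cash", -4_000))),
]
ledger = post(journal)
```

The journal is append-only and the ledger is purely derived, so the ledger can be thrown away and regenerated at any time; because every entry balances, the ledger balances must sum to zero as well.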
event sourcing is a name given to the practice of storing a journal, or stream, of changes
the changes can be deltas or full versions, depending on efficiency and other factors
it is possible to reconstruct the state of entities in the stream at any point in time
it serves as a complete history or audit log of changes
a single event stream usually serves as a unit of consistency, or shard, and may have one or more entities contained within it
given a source of events and a function to fold (or reduce) over them we can produce a “current” version of a value, or construct a view at any previous time
given the ability to save event values, we can continuously feed new events into our fold function, using the old persisted value as the seed and produce our new “mutated” value, which we can also persist
this persistence strategy is now decoupled from the source of the data (the events)
we can have multiple views of our data, each of which can be tailored for specific purposes, such as query-optimised storage
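A minimal sketch of folding over an event stream, assuming a hypothetical `PhoneChanged` event with a logical timestamp `at`; the phone numbers other than the one used earlier in the deck are made up.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class PhoneChanged:
    """Hypothetical event: a person's phone number changed at a logical time."""
    at: int
    person_id: int
    phone_no: str


State = Dict[int, str]  # person_id -> phone number


def step(state: State, event: PhoneChanged) -> State:
    """Fold function: returns a new state; the old one is never modified."""
    return {**state, event.person_id: event.phone_no}


def view(events: Iterable[PhoneChanged], seed: Optional[State] = None,
         until: Optional[int] = None) -> State:
    """Fold events (optionally only those up to `until`) starting from `seed`."""
    selected = (e for e in events if until is None or e.at <= until)
    return reduce(step, selected, seed or {})


events = [
    PhoneChanged(1, 123, "+61 2 1111 1111"),
    PhoneChanged(2, 456, "+61 2 2222 2222"),
    PhoneChanged(3, 123, "+61 2 9876 5432"),
]
current = view(events)
as_of_2 = view(events, until=2)
# Incremental update: reuse the persisted view as the seed for new events only.
later = view([PhoneChanged(4, 456, "+61 2 3333 3333")], seed=current)
```

A different `step` function over the same events would give a different view, for example one keyed by phone number for reverse lookup, without touching the stored events.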
we can store events that patch, or “mutate”, the previous value
additionally, we can store derived facts, such as the complete value after a number of updates, sometimes known as an epoch
event sourcing fits very well with Command Query Responsibility Segregation (CQRS) as an architectural practice, allowing separate deployment of specialised query services, and tailored load-balancing strategies for different access patterns
an event stream is easily distributable, with various strategies available for consensus on write, depending on requirements
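As a rough illustration of patch events and epochs living in the same stream, the sketch below assumes two hypothetical event types, `Patch` and `Epoch`. Replay starts from the most recent epoch, so older events need not be read to obtain the current value.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Patch:
    """Hypothetical delta event: set some fields of the entity."""
    changes: Dict[str, str]


@dataclass(frozen=True)
class Epoch:
    """Hypothetical derived fact: the complete value at this point in the stream."""
    value: Dict[str, str]


Event = Union[Patch, Epoch]


def replay(stream: List[Event]) -> Dict[str, str]:
    """Rebuild the current value, starting from the most recent epoch if any."""
    start = 0
    state: Dict[str, str] = {}
    for i in range(len(stream) - 1, -1, -1):
        if isinstance(stream[i], Epoch):
            state, start = dict(stream[i].value), i + 1
            break
    for event in stream[start:]:
        if isinstance(event, Patch):
            state = {**state, **event.changes}
    return state


def with_epoch(stream: List[Event]) -> List[Event]:
    """Append an epoch holding the current value; earlier events stay untouched."""
    return stream + [Epoch(replay(stream))]
```

The epoch is redundant information by construction: it can always be recomputed from the patches before it, which is why it is safe to add or drop as an optimisation.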
files are stored at an address computed from their content: a content hash
names are associated with a hash
retrieval looks up the current hash for a name, then accesses the content stored at that address
update adds new content, then a new (name, hash) pair
caches only cache content at a hash, not at a name, avoiding concurrency issues
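The scheme above can be sketched as an in-memory store. The `ContentStore` class and its method names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Hypothetical in-memory content-addressed store with a separate name index."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # hash -> content; entries never change
        self._names: Dict[str, str] = {}    # name -> current hash; the only mutable part

    def put(self, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)  # same content, same address
        self._names[name] = digest               # record the new (name, hash) pair
        return digest

    def resolve(self, name: str) -> str:
        """Look up the current hash for a name."""
        return self._names[name]

    def get_by_hash(self, digest: str) -> bytes:
        """Content at a hash never changes, so this result is safe to cache forever."""
        return self._blobs[digest]

    def get(self, name: str) -> bytes:
        return self.get_by_hash(self.resolve(name))
```

Only `resolve` can ever return a different answer over time; a cache keyed by hash can never serve stale content, which is the concurrency benefit the slide describes.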
non-linear development, branching/merging
distributed development, changes must be shareable between repositories that are not necessarily connected
cryptographic authentication of history, the ability to uniquely identify the complete development history of any change to the resources in a repository
content is stored as a directed acyclic graph (DAG) of content, where some of the content is the repository file content, and some is meta-data, including file-trees and commits
all content is stored under a secure hash of that content
file trees store lists of file and directory names and links to their content in the form of other trees (for sub-directories) or file blobs
commits are stored using a hash of the contained meta-data, including tree hash, author, date, parent commits & additional optional data such as signatures verifying authenticity
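A simplified sketch of such a hash-linked object graph follows. The blob hashing matches how git names blobs (SHA-1 over a `blob <size>` header, a NUL byte and the content), but the tree and commit encodings here are JSON stand-ins chosen for readability, not git's real formats. The author name and dates are made up.

```python
import hashlib
import json
from typing import Dict, List

store: Dict[str, bytes] = {}  # hash -> object bytes; append-only


def put(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of '<kind> <length>', a NUL byte, and the body."""
    data = f"{kind} {len(body)}".encode() + b"\x00" + body
    digest = hashlib.sha1(data).hexdigest()
    store[digest] = data
    return digest


def blob(content: bytes) -> str:
    return put("blob", content)


def tree(entries: Dict[str, str]) -> str:
    # Simplified: JSON of name -> child hash, not git's binary tree encoding.
    return put("tree", json.dumps(entries, sort_keys=True).encode())


def commit(tree_hash: str, parents: List[str], author: str, date: str) -> str:
    # Simplified: JSON metadata, not git's textual commit format.
    meta = {"tree": tree_hash, "parents": parents, "author": author, "date": date}
    return put("commit", json.dumps(meta, sort_keys=True).encode())


readme = blob(b"hello\n")
root = tree({"README": readme})
c1 = commit(root, [], "alice", "2016-01-01")
c2 = commit(tree({"README": blob(b"hello again\n")}), [c1], "alice", "2016-01-02")
```

Because each commit hash covers its tree hash and its parents' hashes, changing any file or any ancestor changes every descendant hash, which is what makes the history tamper-evident.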
updates add new objects, stored in full or as deltas inside a pack
all old versions are reconstructable
the same content produces the same hash, equivalent updates commute
the data structure is (mostly) immutable
mutable pointer to the head of a branch
presents a mutable file-system “view” of an immutable structure
a commit includes author, date and parent commits (via their hash), providing a cryptographically secure signature of content and history
all objects, commits and content alike, are immutable shareable values, enabling simple distribution between multiple repositories
unreferenced data is easily garbage-collectible via a simple tree-walk from the content roots (the branch heads)
full-text indexing and search needs to maintain a stable searchable “view” of an index in the face of concurrent updates
an index is a collection of Documents
a document is a collection of Fields and has an ID
an index is updated by deleting and re-adding documents
searching is done via a Searcher: for its lifetime, a searcher will see the state of the index as it was when it was opened
an index is made of Segment files
segments contain documents
deleting a document adds the document ID to a per-segment “.del” file, i.e. it doesn’t modify the segment file directly
when no searchers reference a segment with many deleted documents, it may be merged with others into a new segment containing the remaining documents
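The segment and searcher behaviour described above can be modelled with immutable values. This is a toy model only: `Segment` and `IndexState` are hypothetical names, not the API of any real search library, and merging is left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """Immutable segment: documents are never edited once written."""
    docs: Tuple[Tuple[str, str], ...]  # (document ID, text)


@dataclass(frozen=True)
class IndexState:
    """One point-in-time view: the segments plus per-segment deleted IDs."""
    segments: Tuple[Segment, ...] = ()
    deleted: Tuple[FrozenSet[str], ...] = ()

    def update(self, docs: Dict[str, str]) -> "IndexState":
        """Delete-then-re-add: mark old copies deleted, append a new segment."""
        ids = frozenset(docs)
        marked = tuple(
            dels | (ids & {doc_id for doc_id, _ in seg.docs})
            for seg, dels in zip(self.segments, self.deleted)
        )
        new_segment = Segment(tuple(docs.items()))
        return IndexState(self.segments + (new_segment,), marked + (frozenset(),))

    def get(self, doc_id: str) -> Optional[str]:
        for seg, dels in zip(self.segments, self.deleted):
            for found_id, text in seg.docs:
                if found_id == doc_id and found_id not in dels:
                    return text
        return None


v1 = IndexState().update({"3": "old text", "4": "other"})
searcher1 = v1                     # "opened" before the update
v2 = v1.update({"3": "new text"})  # re-add document 3 in a new segment
searcher2 = v2                     # "opened" after the update
```

Here a searcher is nothing more than a reference to one `IndexState`; since that value never changes, `searcher1` keeps seeing the old document 3 while `searcher2` sees the new one, and both share the first segment unchanged.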
[figure sequence: segment 1 (documents 0 to 9) and segment 2 (documents 10 to 19) with searcher1; segment 3 is then added, containing documents 20 to 22 and re-added documents 3, 8 and 11; a second searcher, searcher2, appears alongside searcher1]
immutable everything, including servers:
idempotent updates
ReactiveJava/RX (JavaScript) programming model
source: Pat Helland http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
avoid mutation of your core data, don’t use an eraser!
values replace (or occlude) previous values
store changes
apply changes to construct a mutable temporal view
apply these ideas to your entire system architecture
profit!