Sharing Data, Hiding Complexity with RDF and Clojure (Malcolm Sparks)

SLIDE 1

Sharing Data, Hiding Complexity with RDF and Clojure

Malcolm Sparks, James Henderson
TechMesh 2012

SLIDE 2

Sharing Data, Hiding Complexity Malcolm Sparks James Henderson Data Loses Fidelity As You Move It RDF and the RAP The Power of Graphs

Data: Setting the scene

Financial, regulatory pressures in banking are presenting IT with new challenges.

SLIDE 6

Data: Setting the scene

Financial, regulatory pressures in banking are presenting IT with new challenges. How can we process data :-

◮ Faster (as always)
◮ At greater volumes (risk sensitivities - HistSim)
◮ More precisely (finer granularity - Dodd-Frank initial margin)

SLIDE 7

Conway’s Law

“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”

Mel Conway (1968)

SLIDE 8

Conway in action

[Figure: An example communication structure. Diagram labels as extracted: Processes, Storage, Bob, Alice, Trade Database, Risk Database, Extraction, Pricing, Publication, Loading, Custom, NFS, CORBA/SQL, RMI, XML, Oracle Coherence, Java, SFTP]

Q: Do Bob and Alice get the same view of the data?

SLIDE 9

What does it all mean?

CommonReference Narrative NumericId LegalEntity ManagementArea GLRegion LocalBankingTrading AggregationLevel SettlementEngineSource ISIN CUSIP PrivatePlacement OtherSecurityId FinancialProductType ReportingCurve InternalExternalFlag BuySell Fas133Flag SpvId MtnId LinkageReference LinkageReference2 NodeRef ReasonCode YNFlag

Observation: Data doesn’t travel well

SLIDE 10

What went wrong?

[Figure: the example communication structure from "Conway in action", repeated]

Conclusion: We need our data models to span our systems.

SLIDE 18

Does modelling scale?

As we scale up :-

◮ Complexity increases
◮ Deviations increase
◮ Concepts become more abstract
◮ Understanding decreases

As data escapes its domain model :-

◮ It loses assumptions (meaning)
◮ Local identifiers fail to become global ones.

SLIDE 20

Implications

◮ Our primary tool for flexible data processing (data modelling) does not scale!
◮ We are fire-fighting this problem with a code hosepipe!

SLIDE 21

Introduction to RDF

[Figure: "The cat" and "the mat" as nodes, joined by an edge labelled "sat on"]

<http://animals.com/ginger>
    <http://actions.com/satOn> <http://tech-mesh-hotel.co.uk/mat> ;
    rdf:type <http://animals.com/Cat> ;
    .

<http://actions.com/satOn>
    rdfs:label "sat on" ;
    rdfs:comment "The action of sitting on a given object." ;
    .
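
The same statements can be held as plain data and queried by pattern. A minimal sketch in Clojure, assuming each triple is a [subject predicate object] vector; this layout and the `objects-of` helper are illustrative assumptions, not something shown in the talk, and the `rdf:`/`rdfs:` names are left as prefixed strings.

```clojure
;; Each triple is [subject predicate object]. This representation is an
;; assumption made for illustration only.
(def triples
  #{["http://animals.com/ginger" "http://actions.com/satOn" "http://tech-mesh-hotel.co.uk/mat"]
    ["http://animals.com/ginger" "rdf:type" "http://animals.com/Cat"]
    ["http://actions.com/satOn" "rdfs:label" "sat on"]})

(defn objects-of
  "Returns the set of objects stated for the given subject and predicate."
  [triples subject predicate]
  (set (for [[s p o] triples
             :when (and (= s subject) (= p predicate))]
         o)))

;; What did ginger sit on?
(objects-of triples "http://animals.com/ginger" "http://actions.com/satOn")
```

Because the predicate is itself a URI, it can be described with further triples (its label, its comment) in the same set.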

SLIDE 26

RDF features

◮ URIs - globally unique identification and transparency
◮ Namespaces (for federated governance)
◮ Storage and transfer neutrality
◮ Metadata
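
To make the namespace point concrete: a prefixed name such as rdfs:label is shorthand for a full URI, and each prefix can be owned by a different team. A small sketch, assuming a prefix table held as a Clojure map; the `expand` helper is hypothetical, while the two namespace URIs are the standard W3C ones for RDF and RDF Schema.

```clojure
(require '[clojure.string :as str])

;; Prefix -> namespace URI. Only the two W3C namespaces are listed here.
(def prefixes
  {"rdf"  "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   "rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(defn expand
  "Expands a prefixed name such as \"rdfs:label\" to a full URI.
   Names whose prefix is not in the table are returned unchanged."
  [prefixes qname]
  (let [[prefix local] (str/split qname #":" 2)]
    (if-let [base (and local (get prefixes prefix))]
      (str base local)
      qname)))

(expand prefixes "rdfs:label")
```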

SLIDE 27

Observations

RDF has the power to break silos
RDF is a non-disruptive way of applying federated data governance

SLIDE 31

The RAP

◮ Exposing the data in the ‘ODS’
◮ The tables in the ODS are built from a meta-model
◮ We get the meta-model in XML format

SLIDE 34

So why do we transform it to RDF?

◮ Disambiguation of terms
◮ Merging different data sources
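
Merging is cheap once both sources use the same identifiers: two sets of triples combine with a plain set union. A sketch under that assumption; the two source names, their contents and the `about` helper are made up for illustration.

```clojure
(require '[clojure.set :as set])

;; Two hypothetical sources describing the same host with different facts.
(def cmdb-triples
  #{["<hosts/p21>" :cmdb/hostName "p21.db.com"]})

(def deployment-triples
  #{["<hosts/p21>" :instance "<instances/PROD1>"]})

;; Merging is set union: no schema mapping step is needed because both
;; sources already agree on the subject identifier.
(def merged (set/union cmdb-triples deployment-triples))

(defn about
  "Returns every triple whose subject is `subject`."
  [triples subject]
  (set (filter (fn [[s _ _]] (= s subject)) triples)))

(about merged "<hosts/p21>")
```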

SLIDE 35

Exposing a data dictionary

SLIDE 37

Forming a View

A View in graph form:

[Figure: Author (attribute: Surname) joined to Book (attribute: Title) via the Book-Author join]

SLIDE 38

Forming a View

RDF storage of a view:

views:demo-view <select> [
    <node> rap:Author ;
    <selectedAttribute> [ <attribute> rap:Author-Name ; ] ;
    <childNode> [
        <node> rap:Book ;
        <relationship> palace:Book-Author ;
        <selectedAttribute> [ <attribute> rap:Book-Title ; ] ;
    ] ;
] .

SLIDE 39

Delivering the View

Generate the SQL
Receive the SQL results
Convert to intermediate data structure
Output formats: JSON, XML, RDF, CSV
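
The last two steps can be sketched in a few lines. This is a minimal illustration, not the RAP implementation: the row shape, the column names and both helper functions are assumptions, and CSV stands in for the four output formats.

```clojure
(require '[clojure.string :as str])

;; Hypothetical flat rows, as a SQL join of Author and Book might return them.
(def rows
  [{:author-name "Orwell" :book-title "1984"}
   {:author-name "Orwell" :book-title "Animal Farm"}
   {:author-name "Huxley" :book-title "Brave New World"}])

(defn ->intermediate
  "Groups flat rows into one entry per author, keeping first-seen order."
  [rows]
  (vec (for [author (distinct (map :author-name rows))]
         {:author author
          :books  (vec (for [r rows :when (= author (:author-name r))]
                         (:book-title r)))})))

(defn ->csv
  "Renders the intermediate structure as CSV text, one line per author/book pair.
   Values are assumed to contain no commas or quotes, so nothing is escaped."
  [authors]
  (str/join "\n"
            (cons "author,title"
                  (for [{:keys [author books]} authors
                        book books]
                    (str author "," book)))))

(->csv (->intermediate rows))
```

Keeping a format-neutral structure in the middle means each additional output format is one more rendering function over the same data.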

SLIDE 40

The Power of Graphs

SLIDE 41

Storing config in RDF

<hosts/p21> <instance> <instances/PROD1> .
<hosts/u11> <instance> <instances/UAT2> .
<hosts/d30> <instance> <instances/SIT1> .

SLIDE 42

Storing config in RDF

<hosts/p21> cmdb:hostName "p21.db.com" .
<instances/PROD1> <service> <services/PROD1/http> .
<services/PROD1/http> cmdb:port 8080 .
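
Config stored this way can be joined across subjects to answer a question such as "where does PROD1 serve HTTP?". A sketch, assuming the triples above are loaded as [subject predicate object] vectors with keywords for predicates; `index-triples` and `http-address` are hypothetical helpers.

```clojure
;; A subset of the config triples from the slides, in an assumed in-memory form.
(def config
  [["<hosts/p21>" :instance "<instances/PROD1>"]
   ["<hosts/p21>" :cmdb/hostName "p21.db.com"]
   ["<instances/PROD1>" :service "<services/PROD1/http>"]
   ["<services/PROD1/http>" :cmdb/port 8080]])

(defn index-triples
  "Builds a nested map: subject -> predicate -> set of objects."
  [triples]
  (reduce (fn [idx [s p o]]
            (update-in idx [s p] (fnil conj #{}) o))
          {}
          triples))

(def idx (index-triples config))

(defn http-address
  "Walks host -> instance -> service and returns \"hostname:port\" strings."
  [idx host]
  (for [hostname (get-in idx [host :cmdb/hostName])
        instance (get-in idx [host :instance])
        service  (get-in idx [instance :service])
        port     (get-in idx [service :cmdb/port])]
    (str hostname ":" port)))

(http-address idx "<hosts/p21>")
```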

SLIDE 46

Why RDF for config?

◮ Disambiguation
◮ Meta-data
◮ Querying the config
◮ Multiple config sources

SLIDE 47

Querying the config

Graph-Zip:

https://github.com/james-henderson/graph-zip

SLIDE 48

Querying the config

An example config graph:

<hosts/p21>
    :cmdb/hostname  "p21.db.com"
    :instance       <instances/PROD1>
        :service  <services/PROD1/http>
            :cmdb/port  8080
            :rdf/type   <HttpService>
        :service  <services/PROD1/swank>
            :cmdb/port    4005
            :cmdb/bindTo  127.0.0.1
            :rdf/type     <SwankService>
        :jvm      <jvms/PROD1>
            :jvmMaxHeap  2048m

SLIDE 49

Querying the config

Importing the graph:

(def config-map
  (build-in-memory-graph
    [{:subject "<hosts/p21>" :property :cmdb/hostname :object "p21.db.com"}
     {:subject "<hosts/p21>" :property :instance :object "<instances/PROD1>"}
     {:subject "<instances/PROD1>" :property :service :object "<services/PROD1/swank>"}
     {:subject "<services/PROD1/swank>" :property :cmdb/port :object "4005"}
     {:subject "<services/PROD1/swank>" :property :cmdb/bindTo :object "127.0.0.1"}
     {:subject "<services/PROD1/swank>" :property :rdf/type :object "<SwankService>"}
     {:subject "<instances/PROD1>" :property :service :object "<services/PROD1/http>"}
     {:subject "<services/PROD1/http>" :property :cmdb/port :object "8080"}
     {:subject "<services/PROD1/http>" :property :rdf/type :object "<HttpService>"}
     {:subject "<instances/PROD1>" :property :jvm :object "<jvms/PROD1>"}
     {:subject "<jvms/PROD1>" :property :jvmMaxHeap :object "2048m"}]))

SLIDE 50

Querying the config

Building the zipper:

(def prod-host-zipper (graph-zip config-map "<hosts/p21>"))

SLIDE 51

Querying the config

Navigating the graph:

(zip-> prod-host-zipper :instance :service node)
;; -> ("<services/PROD1/swank>" "<services/PROD1/http>")

SLIDE 52

Querying the config

Finding out the Swank port:

(zip1-> prod-host-zipper
        :instance
        :service
        [(prop= :rdf/type "<SwankService>")]
        :cmdb/port
        node)
;; -> "4005"
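
To show what this style of navigation does underneath, here is a sketch in plain Clojure that follows a path of properties over a list of triples. It is not graph-zip's implementation and uses none of its API; `step`, `has-prop?` and `follow` are hypothetical names, and the triple-vector layout is an assumption.

```clojure
;; Part of the example config graph, as [subject property object] vectors.
(def graph
  [["<hosts/p21>" :instance "<instances/PROD1>"]
   ["<instances/PROD1>" :service "<services/PROD1/swank>"]
   ["<instances/PROD1>" :service "<services/PROD1/http>"]
   ["<services/PROD1/swank>" :rdf/type "<SwankService>"]
   ["<services/PROD1/swank>" :cmdb/port "4005"]
   ["<services/PROD1/http>" :rdf/type "<HttpService>"]
   ["<services/PROD1/http>" :cmdb/port "8080"]])

(defn step
  "Returns the objects reachable from any of `nodes` via `property`."
  [graph nodes property]
  (for [n nodes
        [s p o] graph
        :when (and (= s n) (= p property))]
    o))

(defn has-prop?
  "True when `node` has `property` with the given `value`."
  [graph node property value]
  (boolean (some #{[node property value]} graph)))

(defn follow
  "Follows `path` from `start`. A keyword moves along that property;
   a [property value] vector keeps only nodes having that property value."
  [graph start path]
  (reduce (fn [nodes element]
            (if (vector? element)
              (let [[property value] element]
                (filter #(has-prop? graph % property value) nodes))
              (step graph nodes element)))
          [start]
          path))

;; All services of the host's instances.
(follow graph "<hosts/p21>" [:instance :service])

;; The Swank port: filter services by type, then read the port.
(first (follow graph "<hosts/p21>"
               [:instance :service [:rdf/type "<SwankService>"] :cmdb/port]))
```

Each path element narrows or moves the current set of nodes, which is why the query reads left to right like a route through the graph.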

SLIDE 53

Querying the config

Graph-Zip:

https://github.com/james-henderson/graph-zip

Neo-Zip:

https://github.com/james-henderson/neo-zip

SLIDE 55

Combining Zippers with RAP Views

In XML:

<hosts>
  <host>
    <hostname>p21.db.com</hostname>
    <instances>
      <instance>
        <label>PROD1</label>
        <services>
          <service>
            <type>swank</type>
            <port>4005</port>
            <bindsTo>127.0.0.1</bindsTo>
          </service>
          <service>
            <type>http</type>
            <port>8080</port>
          </service>
        </services>
      </instance>
    </instances>
  </host>
  ...
</hosts>

SLIDE 56

Combining Zippers with RAP Views

Using traditional zip syntax:

;; xml->, xml1-> and text come from clojure.data.zip.xml
(for [host (xml-> doc :host)]
  {:hostname  (xml1-> host :hostname text)
   :instances (for [instance (xml-> host :instances :instance)]
                {:label    (xml1-> instance :label text)
                 :services (for [service (xml-> instance :services :service)]
                             {:type    (xml1-> service :type text)
                              :port    (xml1-> service :port text)
                              :bindsTo (xml1-> service :bindsTo text)})})})
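
The query above takes `doc` as given. One way to obtain such a zipper, sketched with only the clojure.xml and clojure.zip namespaces that ship with Clojure; the shortened `sample` document is an assumption for illustration, and navigation here uses raw clojure.zip steps rather than the xml-> helpers.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '(java.io ByteArrayInputStream))

;; A cut-down version of the hosts document, inlined for the example.
(def sample
  "<hosts><host><hostname>p21.db.com</hostname></host></hosts>")

;; Parse the XML into nested maps, then wrap the root in a zipper.
(def doc
  (zip/xml-zip
    (xml/parse (ByteArrayInputStream. (.getBytes sample "UTF-8")))))

;; Raw zipper navigation: hosts -> host -> hostname -> text content.
(-> doc zip/down zip/down zip/down zip/node)
```

The xml-> style replaces these positional `zip/down` steps with tag names, which is what keeps the nested query readable.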

SLIDE 57

Combining Zippers with RAP Views

Using a datalog syntax:

(let [db (d/db conn)]
  (for [[host hostname] (d/q '{:find [?h ?name]
                               :where [[?h :type :host]
                                       [?h :hostname ?name]]}
                             db)]
    {:hostname hostname
     :instances
     (for [[instance label] (d/q '{:find [?inst ?label]
                                   :in [$ ?host]
                                   :where [[?host :instance ?inst]
                                           [?inst :label ?label]]}
                                 db host)]
       {:label label
        :services
        (for [[service type port binds-to] (d/q '{:find [?service ?type ?port ?binds-to]
                                                  :in [$ ?instance]
                                                  :where [[?instance :service ?service]
                                                          [?service :port ?port]
                                                          [?service :type ?type]
                                                          [?service :binds-to ?binds-to]]}
                                                db instance)]
          {:type type :port port :binds-to binds-to})})}))

SLIDE 59

Combining Zippers with RAP Views

An enhanced zipper query:

(xml-> doc :host
       {:hostname  ^:1 [:hostname text]
        :instances [:instances :instance
                    {:instance-name ^:1 [:label text]
                     :services [:services :service
                                {:type      ^:1 [:type text]
                                 :port      ^:1 [:port text]
                                 :interface ^:1 [:bindsTo text]}]}]})

Resulting data structure:

[{:hostname "p21.db.com"
  :instances [{:instance-name "PROD1"
               :services [{:type "swank" :port "4005" :interface "127.0.0.1"}
                          {:type "http" :port "8080"}]}]}
 {:hostname "u11.db.com"
  :instances [{:instance-name "UAT1" :services ...}
              {:instance-name "UAT2" :services ...}]}
 {:hostname "d30.db.com"
  :instances [{:instance-name "SIT1" :services ...}]}]

SLIDE 61

James Henderson
@jhenderson 89
https://github.com/james-henderson

Malcolm Sparks
@malcolmsparks
https://github.com/malcolmsparks