The Codex: Building a Graph of History (PowerPoint presentation)



slide-1
SLIDE 1

The Codex

BUILDING A GRAPH OF HISTORY

slide-2
SLIDE 2

What is Codex?

  • Text-as-a-Graph, with the aim of achieving deep integration of text and data
  • Manages standoff property documents linked to graph entities
  • Graph meta-model defined by composition from statements and meta-relations
  • An easily extensible system where new annotations and new entities can be integrated
  • Text annotations are high resolution and multidimensional

slide-3
SLIDE 3

Technology

  • Neo4jClient API
  • Neo4jClient-Vector extensions
  • SPEEDY

slide-4
SLIDE 4

Core entities

  • Texts/Standoff Properties: annotations
  • Agents: entities built up through composition
  • Statements: events, traits (“aspect-oriented ontology”)
  • Meta-Relations: dynamic, bi-directional, hierarchical (higher-order)
  • Properties: agent data-points (latitude, longitude, height, weight, etc.)
  • Time: fuzzy dates (degrees of precision)
  • Concepts: shared vocabulary (the glue); hierarchical labels

slide-5
SLIDE 5

Text as a graph

The graph representation of the text can itself be stored in a graph database such as Neo4j. With standoff properties, the results of graph queries (written in the Cypher language) map directly onto the parts of the text in question. Because properties can overlap, a text can be graphed along many axes: entities, events, concepts, commentaries, or even ASTs generated by NLP tools.
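
As a rough illustration of how a query result maps back onto the text, the sketch below pairs a hypothetical Cypher query with a helper that slices the plain text using the returned offsets. The node labels, property names, and the `$textGuid` parameter are assumptions for illustration, not the Codex schema; the rows are hard-coded stand-ins for what a database would return, and offsets are assumed to be end-exclusive.

```python
# Hypothetical query: labels, property names and the parameter are assumptions.
# Only the relationship name echoes one mentioned elsewhere in these slides.
QUERY = """
MATCH (t:Text {guid: $textGuid})-[:text_has_standoff_property]->(sp:StandoffProperty)
WHERE sp.type = 'place'
RETURN sp.startIndex AS start, sp.endIndex AS end, sp.type AS type
"""


def spans_from_rows(text, rows):
    """Map each result row back to the substring it annotates (end is exclusive)."""
    return [(row["type"], text[row["start"]:row["end"]]) for row in rows]


text = "The Arno was in flood at Florence."

# Stand-in for rows returned by the query above.
rows = [
    {"start": 4, "end": 8, "type": "place"},
    {"start": 25, "end": 33, "type": "place"},
]

for kind, covered in spans_from_rows(text, rows):
    print(kind, repr(covered))
```

Because each row carries only offsets and a type, the same text can be queried along any number of axes without being modified.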

slide-6
SLIDE 6

Text as a Graph

The goal

  • To fully annotate a text you need to be able to represent overlapping ranges

The problem

  • In HTML/XML a range is represented by a node or an element
  • However, HTML/XML can only represent nodes in a tree structure, whereas annotations cross nodes and are better mapped to graphs, not trees

  • XML elements are document-based rather than connected in a database
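
The overlap problem can be checked with a small sketch using Python's standard XML parser. The element names (`i`, `place`) and the sample sentences are made up for illustration: properly nested ranges parse, while two ranges that cross each other are rejected as not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The 'place' range sits entirely inside the 'i' range: a valid tree.
nested = "<p><i>The Arno <place>at Florence</place></i></p>"

# The 'i' range ends while the 'place' range is still open: the ranges cross.
overlapping = "<p><i>The Arno <place>at</i> Florence</place></p>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```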

The solution

  • Separate the text and annotations, don’t embed properties in the text like HTML/XML does
  • A property is a data-structure that represents a text range (e.g., 0 -> 10) and a type (e.g., ‘italics’, ‘place’, ‘person’)
  • Properties that are not embedded in text are called ‘standoff properties’
  • Standoff property nodes act as connective tissue between the plain text and the graph meta-model
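
A minimal sketch of the idea, assuming a property is just a start offset, an end offset, and a type. The class and field names are illustrative rather than the Codex data model, and the end offset is treated as exclusive. The text stays plain, and the three ranges overlap freely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffProperty:
    start: int  # offset of the first annotated character
    end: int    # offset just past the last annotated character (exclusive)
    type: str   # e.g. 'italics', 'place', 'person'


def covered_text(text, prop):
    """Return the slice of the plain text that a property annotates."""
    return text[prop.start:prop.end]


text = "Luca Landucci saw the Arno in flood"

properties = [
    StandoffProperty(0, 13, "person"),
    StandoffProperty(5, 26, "italics"),  # crosses both other ranges
    StandoffProperty(22, 26, "place"),
]

for prop in properties:
    print(prop.type, repr(covered_text(text, prop)))
```

The italics range starts inside the person range and ends with the place range, which is exactly the case a single XML tree cannot express.
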
slide-7
SLIDE 7

XML overlap

slide-8
SLIDE 8

XML overlap

slide-9
SLIDE 9

XML overlap

slide-10
SLIDE 10

Standoff properties

Standoff properties:

  1. Don’t suffer from the XML overlap problem
  2. Are stored externally from the text stream, which is left in a plain format
  3. Support annotations inside words, of single characters, and between characters
  4. Can be grouped into annotation layers (or strata) that can be exported or imported
  5. Support multidimensional queries across overlapping annotations and layers
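
The points above can be sketched with a few in-memory annotations. Everything here is an assumption for illustration (the `Annotation` type, the layer names, and the convention that `start == end` marks a point between two characters); it is not how Codex stores or queries its layers.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int
    end: int    # exclusive; start == end marks a point between characters
    type: str
    layer: str


def at_offset(annotations, offset, layer=None):
    """Annotations covering the character at `offset`, optionally limited to one layer."""
    return [
        a for a in annotations
        if a.start <= offset < a.end and (layer is None or a.layer == layer)
    ]


def points_between(annotations):
    """Zero-length annotations, i.e. those anchored between two characters."""
    return [a for a in annotations if a.start == a.end]


text = "Savonarola preached"

annotations = [
    Annotation(0, 10, "person", "entities"),
    Annotation(0, 1, "initial", "typography"),    # a single character
    Annotation(4, 10, "italics", "typography"),   # starts inside a word
    Annotation(10, 10, "page-break", "layout"),   # between characters
]

print([a.type for a in at_offset(annotations, 5)])
print([a.type for a in at_offset(annotations, 5, layer="typography")])
print([a.type for a in points_between(annotations)])
```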

slide-11
SLIDE 11

Standoff property graph

slide-12
SLIDE 12

Meta-relations

Meta-relations are:

  • User-created
  • Bi-directional
      • No need to choose between “parent_of” or “child_of” in the model
      • One query can bring back both parties
      • Each part is a noun rather than a verb (“parent” vs “parent_of”)
  • Linked to a relationship graph for higher-order relationships
      • Family: parent/child; sibling; married; son-in-law/mother-in-law
      • Friendship: friend; close friend; girlfriend/boyfriend; correspondent
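
One way to picture a bi-directional relation is as a node with two named parts, each labelled with a noun. The sketch below is an in-memory stand-in under that assumption; the `MetaRelation` class, its fields, and the agent names are hypothetical and not taken from Codex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaRelation:
    kind: str     # higher-order grouping, e.g. "Family" or "Friendship"
    parts: tuple  # two (role, agent) pairs; roles are nouns, not verbs


def relations_of(agent, relations):
    """For each relation involving `agent`, return (own role, other role, other agent)."""
    found = []
    for rel in relations:
        for i, (role, name) in enumerate(rel.parts):
            if name == agent:
                other_role, other = rel.parts[1 - i]
                found.append((role, other_role, other))
    return found


relations = [
    MetaRelation("Family", (("parent", "Alice"), ("child", "Bob"))),
    MetaRelation("Friendship", (("friend", "Alice"), ("friend", "Carol"))),
]

# The same stored relation answers the question from either side.
print(relations_of("Bob", relations))
print(relations_of("Alice", relations))
```

Since both parties hang off one relation, there is no separate “parent_of” and “child_of” edge to keep in sync.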

slide-13
SLIDE 13

Meta-relations

  • IsDominant: true/false
slide-14
SLIDE 14

Statements

A statement is a quasi-grammatical complex used to represent simple events or predicates, roughly resembling an RDF triple or a Wikidata snak.

  • Composed (optionally) of one or more Agents, a Concept, and a Time node
  • Construction is quasi-grammatical, as parts are related mainly with prepositions (SUBJECT, OBJECT, WITH, AT, ON, UNDER, NEAR, etc.)
  • E.g., subject: The Arno; action: was in flood; at: Florence; according-to: Luca Landucci; on: 1498/11/24
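
The flood example can be written down as a small data structure. This is only a sketch: the `Statement` class and its field names are assumptions, and in Codex the parts are graph nodes rather than strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Statement:
    concept: str                                                 # the action or trait
    agents: List[Tuple[str, str]] = field(default_factory=list)  # (role, agent) pairs
    time: Optional[str] = None                                   # kept as text here

    def agents_in_role(self, role):
        """Return the agents attached to this statement under the given role."""
        return [agent for r, agent in self.agents if r == role]


flood = Statement(
    concept="was in flood",
    agents=[
        ("SUBJECT", "The Arno"),
        ("AT", "Florence"),
        ("ACCORDING-TO", "Luca Landucci"),
    ],
    time="1498/11/24",
)

print(flood.agents_in_role("SUBJECT"))
print(flood.agents_in_role("ACCORDING-TO"))
```

Keeping the source of the claim as just another role is what lets a statement record who made it, which a bare triple does not.
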
slide-15
SLIDE 15

Statements

  • AgentRole: SUBJECT; OBJECT; AT; WITH; ACCORDING-TO; etc
slide-16
SLIDE 16

Aspect-oriented ontology

The statement complex can also be used to express ontological claims. While the claim that the Renaissance preacher “Girolamo Savonarola is a man” can be represented with an “is a” relationship, what about other aspects of Savonarola? An “is a” relation tells us nothing about who made the claim or when it was made. It also conditions us towards class-type classifications which conform to ontologies. We can use trait statements to capture aspects …

slide-17
SLIDE 17

Neo4jClient

  • Switches seamlessly between REST and BOLT interfaces
  • Cypher expression builder (keywords mostly)
  • Safe parameterisation of values
  • Powerful deserialiser turns JSON results into complex objects
slide-18
SLIDE 18

Neo4jClient-Vector

  • Adds extension methods to Neo4jClient
  • Adds Vector<> class to generate Cypher paths
  • Node labels
  • Relationship types and directions
  • Reversible relationships (subset_of_concept <-> children_of_concept)
  • Overloadable relationships (text_has_standoff_property -> text_has_sentence_standoff_property)
  • Removes the noise to put the focus on the path patterns
  • (lex)-(head), (t)-(asp)-(a)
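
To give a feel for generating path patterns from relationship definitions, here is a small Python sketch. It is not the Neo4jClient-Vector API (which is a C# library); the `Relationship` class, its methods, and the `Text` and `StandoffProperty` labels are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str  # label of the start node
    type: str    # relationship type
    target: str  # label of the end node

    def pattern(self, src_var, dst_var):
        """Cypher path pattern read from the source node's side."""
        return f"({src_var}:{self.source})-[:{self.type}]->({dst_var}:{self.target})"

    def reversed_pattern(self, src_var, dst_var):
        """The same relationship read from the target node's side."""
        return f"({dst_var}:{self.target})<-[:{self.type}]-({src_var}:{self.source})"


text_has_property = Relationship("Text", "text_has_standoff_property", "StandoffProperty")

print(text_has_property.pattern("t", "sp"))
print(text_has_property.reversed_pattern("t", "sp"))
```

Declaring labels, types, and directions once keeps them out of each query, so only the short variable names remain visible in the path.
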
slide-19
SLIDE 19

Neo4jClient-Vector

If(condition, func) makes it simpler to branch Cypher expressions based on form input …
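
The branching idea can be sketched in Python with a toy fluent builder. This is not Neo4jClient or Neo4jClient-Vector code: the `CypherBuilder` class, its `if_` method, and the `Agent` label and properties are hypothetical, chosen only to show how a clause is added when a form field is filled in and skipped otherwise.

```python
class CypherBuilder:
    """Toy fluent builder that collects Cypher clauses and query parameters."""

    def __init__(self):
        self.clauses = []
        self.params = {}
        self._has_where = False

    def match(self, pattern):
        self.clauses.append(f"MATCH {pattern}")
        return self

    def where(self, condition, **params):
        # The first condition opens the WHERE clause; later ones are ANDed on.
        keyword = "AND" if self._has_where else "WHERE"
        self.clauses.append(f"{keyword} {condition}")
        self.params.update(params)
        self._has_where = True
        return self

    def if_(self, condition, func):
        # Apply func to the builder only when the condition holds.
        return func(self) if condition else self

    def returns(self, expression):
        self.clauses.append(f"RETURN {expression}")
        return self

    def build(self):
        return "\n".join(self.clauses), self.params


def search_agents(name=None, kind=None):
    """Build a query from optional form inputs; empty inputs add no clause."""
    return (
        CypherBuilder()
        .match("(a:Agent)")
        .if_(name is not None, lambda q: q.where("a.name CONTAINS $name", name=name))
        .if_(kind is not None, lambda q: q.where("a.kind = $kind", kind=kind))
        .returns("a")
        .build()
    )


query, params = search_agents(name="Landucci")
print(query)
print(params)
```

Values travel in the parameter dictionary rather than in the query string, so the branching does not compromise parameterisation.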

slide-20
SLIDE 20

Pagination

Extending the projection class from Search<T> enables the results to be paginated. An ORDER BY expression builder simplifies conditional ordering.
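
As a rough sketch of what paginated, conditionally ordered queries involve, the helper below appends `ORDER BY`, `SKIP`, and `LIMIT` to a Cypher query. It is an illustration in Python, not the `Search<T>` class from the slides; the function name, the sort-key whitelist, and the property names are assumptions. The whitelist is there because an `ORDER BY` expression is part of the query text and cannot be passed as a parameter.

```python
# Hypothetical mapping from form values to Cypher expressions.
SORT_FIELDS = {
    "name": "a.name",
    "date": "s.date",
}


def paginate(query, sort_key, page, page_size=20, descending=False):
    """Append ORDER BY / SKIP / LIMIT to a query and return (text, parameters)."""
    if sort_key not in SORT_FIELDS:
        raise ValueError(f"unknown sort key: {sort_key!r}")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    order = SORT_FIELDS[sort_key] + (" DESC" if descending else "")
    text = f"{query}\nORDER BY {order}\nSKIP $skip\nLIMIT $limit"
    params = {"skip": (page - 1) * page_size, "limit": page_size}
    return text, params


text, params = paginate("MATCH (a:Agent) RETURN a", "name", page=3, page_size=10)
print(text)
print(params)
```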

slide-21
SLIDE 21

Contact

If you have any questions, or would be interested in trialling Codex as an alpha-user, I can be contacted as below. Iian Neill

  • Email: iian.d.neill@gmail.com
  • Twitter: codexeditor
slide-22
SLIDE 22

The Hunger Games

  • Q1. What is a standoff property?
  • Q2. What is a meta-relation?
  • Q3. What is a Codex statement?

https://docs.google.com/forms/d/e/1FAIpQLSeiY5YBj2ir_Jmi4f6GaIbyAzQcAZjYLsAJ4r-Wbr5zScJAww/viewform?usp=sf_link