The Codex
BUILDING A GRAPH OF HISTORY
What is Codex?
- Text-as-a-Graph with the aim to achieve the deep integration of text and data
- Manages standoff property documents linked to graph entities
- Graph meta-model defined by composition from statements and meta-relations
- An easily extensible system where new annotations and new entities can be integrated
- Text annotations are high resolution and multidimensional
[Diagram: Neo4jClient API; Neo4jClient-Vector extensions; SPEEDY]
- Texts/Standoff Properties: annotations
- Agents: entities built up through composition
- Statements: events, traits (“aspect-oriented ontology”)
- Meta-Relations: dynamic, bi-directional, hierarchical (higher-order)
- Properties: agent data-points (latitude, longitude, height, weight, etc.)
- Time: fuzzy dates (degrees of precision)
- Concepts: shared vocabulary (the glue); hierarchical labels
Text as a graph
The graph representation of the text can itself be stored in a graph database such as Neo4j. With standoff properties, graph queries (written in the Cypher language) map directly onto the parts of the text in question. Because properties can overlap, a text can be graphed along many axes: entities, events, concepts, commentaries, or even ASTs generated by NLP tools.
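The mapping from graph entities back to stretches of text can be illustrated with a small in-memory sketch. This is not the Codex implementation and uses no database: the class `StandoffProperty`, the helper functions, the sample sentence, and the entity ids are all hypothetical, chosen only to show how an annotation that stores character offsets plus a link to a graph entity lets a lookup land on exact text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffProperty:
    start: int    # index of the first character covered
    end: int      # index of the last character covered (inclusive)
    kind: str     # annotation type (hypothetical labels)
    target: str   # id of the graph entity this span points to


# Illustrative sentence, not taken from any source text.
TEXT = "Savonarola preached in Florence while the Arno was in flood."


def annotate(text, phrase, kind, target):
    """Build a property covering the first occurrence of phrase."""
    start = text.index(phrase)
    return StandoffProperty(start, start + len(phrase) - 1, kind, target)


# The text stays plain; every annotation lives outside it.
PROPERTIES = [
    annotate(TEXT, "Savonarola", "agent", "agent:savonarola"),
    annotate(TEXT, "Florence", "agent", "agent:florence"),
    annotate(TEXT, "the Arno", "agent", "agent:arno"),
    annotate(TEXT, "the Arno was in flood", "statement", "statement:arno-flood"),
]


def spans_for(target):
    """Entity -> text: every stretch of text linked to one graph entity."""
    return [TEXT[p.start:p.end + 1] for p in PROPERTIES if p.target == target]


def targets_at(index):
    """Text -> entities: every graph entity whose span covers a character."""
    return sorted(p.target for p in PROPERTIES if p.start <= index <= p.end)


print(spans_for("agent:florence"))
print(targets_at(TEXT.index("Arno")))
```

In a real deployment the same two lookups would presumably be graph queries; the point here is only that overlapping spans ("the Arno" inside the flood statement) coexist without conflict.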
The goal
The problem
to graphs, not trees
The solution
Standoff properties:
1. Don’t suffer from the XML overlap problem
2. Are stored externally from the text stream, which is left in a plain format
3. Support annotations inside words, of single characters, and between characters
4. Can be exported or imported as annotation layers (or strata)
5. Support multidimensional queries across overlapping annotations and layers
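The points above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the SPEEDY or Codex data format: the `Annotation` class, the layer names, and the half-open offset convention are all hypothetical. It shows two spans that cross without nesting (which inline XML cannot express directly), a single-character annotation, a zero-width annotation sitting between two characters, and a per-layer export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first character covered
    end: int     # offset just past the last character (exclusive)
    kind: str
    layer: str   # hypothetical layer (stratum) name

    def is_zero_width(self):
        # A zero-width annotation marks a position between two characters.
        return self.start == self.end

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end

    def contains(self, other):
        return self.start <= other.start and other.end <= self.end


TEXT = "The Arno was in flood"   # plain text stream, no inline markup

LINE = Annotation(0, 12, "line", "layout")             # "The Arno was"
EVENT = Annotation(4, 21, "event", "history")          # "Arno was in flood"
INITIAL = Annotation(4, 5, "initial", "palaeography")  # the single letter "A"
GAP = Annotation(6, 6, "insertion-point", "palaeography")  # between "r" and "n"

ANNOTATIONS = [LINE, EVENT, INITIAL, GAP]


def text_of(annotation):
    return TEXT[annotation.start:annotation.end]


def crosses(a, b):
    """True when two spans overlap but neither contains the other."""
    return a.overlaps(b) and not a.contains(b) and not b.contains(a)


def export_layer(name):
    """Return one layer's annotations; the text itself is untouched."""
    return [a for a in ANNOTATIONS if a.layer == name]


print(crosses(LINE, EVENT))           # the crossing case inline tags cannot nest
print(repr(text_of(INITIAL)), GAP.is_zero_width())
print(export_layer("palaeography"))
```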
Meta-relations are:
- User-created
- Bi-directional
  - No need to choose between “parent_of” or “child_of” in the model
  - One query can bring back both parties
  - Each part is a noun rather than a verb (“parent” vs “parent_of”)
- Linked to a relationship graph for higher-order relationships
  - Family: parent/child; sibling; married; son-in-law/mother-in-law
  - Friendship: friend; close friend; girlfriend/boyfriend; correspondent
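One way to picture a bi-directional meta-relation is as a record that names both parties by role noun, so that neither direction is privileged. The sketch below is an assumption about the shape of the idea, not the Codex schema: `MetaRelation`, its fields, and `relations_of` are hypothetical names, and the Medici family members serve only as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaRelation:
    group: str        # higher-order grouping, e.g. "Family"
    left_role: str    # role noun of the first party, e.g. "parent"
    left_agent: str
    right_role: str   # role noun of the second party, e.g. "child"
    right_agent: str

    def involves(self, agent):
        return agent in (self.left_agent, self.right_agent)

    def view_from(self, agent):
        """Return (own role, other role, other agent) as seen by agent."""
        if agent == self.left_agent:
            return (self.left_role, self.right_role, self.right_agent)
        return (self.right_role, self.left_role, self.left_agent)


RELATIONS = [
    MetaRelation("Family", "parent", "Lorenzo de' Medici",
                 "child", "Piero de' Medici"),
    MetaRelation("Family", "sibling", "Piero de' Medici",
                 "sibling", "Giovanni de' Medici"),
]


def relations_of(agent, group=None):
    """A single lookup returns both parties, whichever side agent is on."""
    return [
        r.view_from(agent)
        for r in RELATIONS
        if r.involves(agent) and (group is None or r.group == group)
    ]


print(relations_of("Piero de' Medici"))
print(relations_of("Lorenzo de' Medici", group="Family"))
```

Because each side is stored as a noun, the same record answers both "who is Piero's parent?" and "who is Lorenzo's child?" without a second, inverse edge.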
A statement is a quasi-grammatical complex used to represent simple events or predicates, roughly resembling an RDF triple or a Wikidata snak.
- Composed (optionally) of one or more Agents, a Concept, and a Time node
- Construction is quasi-grammatical, as parts are related mainly with prepositions (SUBJECT, OBJECT, WITH, AT, ON, UNDER, NEAR, etc.)
- E.g., subject: The Arno; action: was in flood; at: Florence; according-to: Luca Landucci
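The Arno example can be written out as a small data structure. This is a sketch under assumptions rather than the Codex model: the `Statement` class, its `attach` and `render` methods, and the choice to store parts in a dictionary keyed by preposition are hypothetical. The source example gives no date, so the optional time is left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Statement:
    concept: str                    # the action or trait, e.g. "was in flood"
    time: Optional[str] = None      # optional (possibly fuzzy) date
    parts: Dict[str, str] = field(default_factory=dict)  # preposition -> agent

    def attach(self, preposition, agent):
        """Relate an agent to the statement through a preposition."""
        self.parts[preposition.upper()] = agent
        return self   # allow chaining

    def render(self):
        """Flatten the statement into a readable one-line form."""
        pieces = []
        if "SUBJECT" in self.parts:
            pieces.append(f"subject: {self.parts['SUBJECT']}")
        pieces.append(f"action: {self.concept}")
        pieces.extend(
            f"{preposition.lower()}: {agent}"
            for preposition, agent in self.parts.items()
            if preposition != "SUBJECT"
        )
        if self.time is not None:
            pieces.append(f"time: {self.time}")
        return " | ".join(pieces)


flood = (
    Statement(concept="was in flood")
    .attach("subject", "The Arno")
    .attach("at", "Florence")
    .attach("according-to", "Luca Landucci")
)

print(flood.render())
```

Keeping the source ("according-to") as just another part of the statement is what lets a claim carry its own attribution.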
The statement complex can also be used to express ontological claims. While the claim that the Renaissance preacher “Girolamo Savonarola is a man” can be represented with an “is a” relationship, what about other aspects of Savonarola? An “is a” relation tells us nothing about who made the claim or when it was made. It also conditions us towards class-type classifications which conform to ontologies. We can use trait statements to capture aspects …
If(condition, func) makes it simpler to branch Cypher expressions based on form input …
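The idea of a conditional step in a fluent query builder can be shown with a Python analogue. This is not the Codex or Neo4jClient API: the `CypherBuilder` class and its methods are hypothetical, `if_` stands in for the `If(condition, func)` mentioned above (since `if` is reserved in Python), and the `Agent` label and property names are invented for illustration. The builder only assembles query text and parameters; it does not connect to a database.

```python
class CypherBuilder:
    """Tiny fluent builder that assembles Cypher text and parameters."""

    def __init__(self):
        self.clauses = []
        self.params = {}
        self._has_where = False

    def match(self, pattern):
        self.clauses.append(f"MATCH {pattern}")
        return self

    def where(self, predicate, **params):
        # The first predicate opens WHERE; later ones are joined with AND.
        keyword = "AND" if self._has_where else "WHERE"
        self.clauses.append(f"{keyword} {predicate}")
        self.params.update(params)
        self._has_where = True
        return self

    def if_(self, condition, func):
        # Apply func only when condition holds, keeping the chain unbroken.
        return func(self) if condition else self

    def returns(self, expression):
        self.clauses.append(f"RETURN {expression}")
        return self

    def text(self):
        return "\n".join(self.clauses)


def search_agents(form):
    """Build a query from form input; empty fields add no clause."""
    name = form.get("name")
    agent_type = form.get("agent_type")
    return (
        CypherBuilder()
        .match("(a:Agent)")
        .if_(bool(name),
             lambda q: q.where("a.name CONTAINS $name", name=name))
        .if_(bool(agent_type),
             lambda q: q.where("a.agent_type = $agent_type",
                               agent_type=agent_type))
        .returns("a")
    )


query = search_agents({"name": "Savonarola"})
print(query.text())
print(query.params)
```

The benefit is that optional filters stay inside one chained expression instead of being scattered across separate `if` blocks around the query.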
Extending the projection class from Search<T> enables the results to be paginated. An ORDER BY expression builder simplifies conditional ordering.
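A rough Python analogue of the pagination and ordering idea follows. It is a sketch only: the `Search` base class here merely imitates the role of `Search<T>`, and `AgentSearch`, `ORDERABLE`, `order_by`, and `paginate` are hypothetical names, as are the property names. Restricting ordering to a whitelist of expressions is an assumption made for safety in the sketch, since an ORDER BY expression cannot be passed as a query parameter.

```python
from dataclasses import dataclass


@dataclass
class Search:
    """Shared paging and ordering fields for any search form."""
    page: int = 1
    page_size: int = 20
    order: str = ""
    descending: bool = False

    def skip(self):
        # Pages are 1-based; anything below 1 is treated as the first page.
        return (max(self.page, 1) - 1) * self.page_size


@dataclass
class AgentSearch(Search):
    """A concrete search that inherits paging from Search."""
    name: str = ""


# Whitelist mapping form values to Cypher expressions (hypothetical names).
ORDERABLE = {"name": "a.name", "added": "a.date_added"}


def order_by(search, allowed, default):
    """Build an ORDER BY clause, falling back to default for unknown keys."""
    expression = allowed.get(search.order, default)
    direction = "DESC" if search.descending else "ASC"
    return f"ORDER BY {expression} {direction}"


def paginate(search):
    """Return the paging clause together with its parameters."""
    return "SKIP $skip LIMIT $limit", {
        "skip": search.skip(),
        "limit": search.page_size,
    }


search = AgentSearch(page=3, page_size=10, order="name", descending=True)
print(order_by(search, ORDERABLE, default="a.name"))
print(paginate(search))
```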
If you have any questions, or would be interested in trialling Codex as an alpha-user, I can be contacted as below.
Iian Neill
https://docs.google.com/forms/d/e/1FAIpQLSeiY5YBj2ir_Jmi4f6GaIbyAzQcAZjYLsAJ4r-Wbr5zScJAww/viewform?usp=sf_link