
The Codex: Building a Graph of History (PowerPoint presentation)



  1. The Codex BUILDING A GRAPH OF HISTORY

  2. What is Codex?
     - Text-as-a-Graph, with the aim of achieving deep integration of text and data
     - Manages standoff property documents linked to graph entities
     - Graph meta-model defined by composition from statements and meta-relations
     - An easily extensible system into which new annotations and new entities can be integrated
     - Text annotations are high-resolution and multidimensional

  3. Technology
     - Neo4jClient API
     - SPEEDY
     - Neo4jClient-Vector extensions

  4. Core entities
     - Texts/Standoff Properties: annotations
     - Agents: entities built up through composition
     - Statements: events, traits (“aspect-oriented ontology”)
     - Meta-Relations: dynamic, bi-directional, hierarchical (higher-order)
     - Properties: agent data-points (latitude, longitude, height, weight, etc.)
     - Time: fuzzy dates (degrees of precision)
     - Concepts: shared vocabulary (the glue); hierarchical labels
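
The Time entity is described only as fuzzy dates with degrees of precision. The Python sketch below assumes, hypothetically, that a fuzzy date is a year with an optional month and day, that precision is implied by the most specific known part, and that two fuzzy dates are compared through the range of days each could denote. `FuzzyDate`, `Precision`, and `may_coincide` are illustrative names, not the Codex schema, and calendar conversion (for example Julian to Gregorian) is ignored.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional, Tuple


class Precision(Enum):
    YEAR = "year"
    MONTH = "month"
    DAY = "day"


@dataclass(frozen=True)
class FuzzyDate:
    """Hypothetical fuzzy date: month and day may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        if self.day is not None and self.month is None:
            raise ValueError("a day requires a month")

    @property
    def precision(self) -> Precision:
        # Precision is implied by the most specific part that is known.
        if self.day is not None:
            return Precision.DAY
        if self.month is not None:
            return Precision.MONTH
        return Precision.YEAR

    def bounds(self) -> Tuple[date, date]:
        """Earliest and latest calendar days this fuzzy date could denote."""
        if self.month is None:
            return date(self.year, 1, 1), date(self.year, 12, 31)
        if self.day is None:
            last_day = calendar.monthrange(self.year, self.month)[1]
            return date(self.year, self.month, 1), date(self.year, self.month, last_day)
        exact = date(self.year, self.month, self.day)
        return exact, exact

    def may_coincide(self, other: "FuzzyDate") -> bool:
        """True if both fuzzy dates could refer to the same day."""
        start_a, end_a = self.bounds()
        start_b, end_b = other.bounds()
        return start_a <= end_b and start_b <= end_a


landucci_entry = FuzzyDate(1498, 11, 24)
november_1498 = FuzzyDate(1498, 11)
```

A day-precise date and a month-precise date can then be checked for compatibility without pretending the coarser one is exact.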

  5. Text as a graph
     The graph representation of a text can itself be stored in a graph database such as Neo4j. Through standoff properties, graph queries written in Cypher map directly onto the relevant parts of the text. Because properties can overlap, a text can be graphed along many axes at once: entities, events, concepts, commentaries, or even ASTs generated by NLP tools.
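
The mapping from a graph query back to the text can be sketched in Python, with a plain filter standing in for a Cypher query. It assumes each standoff property stores start and end offsets plus the id of the entity it annotates; `Entity`, `annotate`, `spans_for_label`, the node labels, and the sample sentence (adapted from the Arno example on the Statements slide) are hypothetical. The event property overlaps the place property on “Arno” without either getting in the way.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    id: str
    label: str  # graph node label, e.g. "Person" or "Place" (hypothetical)
    name: str


@dataclass(frozen=True)
class StandoffProperty:
    start: int      # inclusive character offset into the plain text
    end: int        # exclusive character offset
    type: str       # annotation type
    entity_id: str  # stands in for a property -> entity edge in the graph


def annotate(text: str, phrase: str, type_: str, entity_id: str) -> StandoffProperty:
    """Create a property covering the first occurrence of phrase."""
    start = text.index(phrase)
    return StandoffProperty(start, start + len(phrase), type_, entity_id)


def spans_for_label(text: str, props: List[StandoffProperty],
                    entities: Dict[str, Entity], label: str) -> List[str]:
    """Toy stand-in for a graph query: select entities by label, then follow
    their standoff properties back to the exact parts of the text."""
    wanted = {e.id for e in entities.values() if e.label == label}
    return [text[p.start:p.end] for p in props if p.entity_id in wanted]


text = "Luca Landucci saw the Arno in flood at Florence."
entities = {
    "e1": Entity("e1", "Person", "Luca Landucci"),
    "e2": Entity("e2", "Place", "Arno"),
    "e3": Entity("e3", "Place", "Florence"),
    "e4": Entity("e4", "Event", "Arno flood"),
}
props = [
    annotate(text, "Luca Landucci", "person", "e1"),
    annotate(text, "Arno", "place", "e2"),
    annotate(text, "Florence", "place", "e3"),
    annotate(text, "the Arno in flood", "event", "e4"),  # overlaps "Arno"
]
```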

  6. Text as a Graph
     The goal
     - To fully annotate a text, you need to be able to represent overlapping ranges
     The problem
     - In HTML/XML a range is represented by a node or an element
     - However, HTML/XML can only represent nodes in a tree structure, whereas annotations cross nodes and are better mapped to graphs than to trees
     - XML elements are document-based rather than connected in a database
     The solution
     - Separate the text from the annotations; don’t embed properties in the text as HTML/XML does
     - A property is a data structure that represents a text range (e.g., 0 -> 10) and a type (e.g., ‘italics’, ‘place’, ‘person’)
     - Properties that are not embedded in the text are called ‘standoff properties’
     - Standoff property nodes act as connective tissue between the plain text and the graph meta-model
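
A minimal sketch of the solution, assuming a property is just a start offset (inclusive), an end offset (exclusive), and a type. Because the annotations live outside the plain text, two ranges that partially overlap are unremarkable; the hypothetical `segments` helper cuts the text at every boundary and reports which types cover each piece, which is one way a display layer might consume them.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class StandoffProperty:
    start: int  # inclusive offset, e.g. 0
    end: int    # exclusive offset, e.g. 10
    type: str   # e.g. "italics", "place", "person"


def prop(text: str, phrase: str, type_: str) -> StandoffProperty:
    """Hypothetical helper: a property over the first occurrence of phrase."""
    start = text.index(phrase)
    return StandoffProperty(start, start + len(phrase), type_)


def segments(text: str, props: List[StandoffProperty]) -> List[Tuple[str, Set[str]]]:
    """Cut the plain text at every property boundary and report which property
    types cover each piece. Overlapping ranges need no special handling
    because the text itself is never modified."""
    cuts = sorted({0, len(text)} | {p.start for p in props} | {p.end for p in props})
    result = []
    for a, b in zip(cuts, cuts[1:]):
        active = {p.type for p in props if p.start <= a and b <= p.end}
        result.append((text[a:b], active))
    return result


text = "Girolamo Savonarola preached"
props = [
    prop(text, "Girolamo Savonarola", "person"),
    prop(text, "Savonarola preached", "italics"),  # partially overlaps "person"
]
```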

  7. XML overlap

  8. XML overlap

  9. XML overlap
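
The diagrams on the XML overlap slides did not survive extraction, but the underlying constraint can be stated in code: XML elements must nest, so two annotations that partially overlap (they intersect, yet neither contains the other) cannot both be elements of a single tree without splitting one of them. This Python sketch, with hypothetical helper names, detects such crossing pairs.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start inclusive, end exclusive)


def crossing_pairs(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return pairs of ranges that partially overlap. Each such pair cannot be
    written as properly nested XML elements."""
    found = []
    for i, (s1, e1) in enumerate(ranges):
        for s2, e2 in ranges[i + 1:]:
            intersects = s1 < e2 and s2 < e1
            nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
            if intersects and not nested:
                found.append(((s1, e1), (s2, e2)))
    return found


def is_tree_representable(ranges: List[Range]) -> bool:
    """True if every pair of ranges is either disjoint or nested."""
    return not crossing_pairs(ranges)
```

Adjacent ranges such as (0, 5) and (5, 10) do not intersect, so they pose no problem for XML; only genuine partial overlap does.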

  10. Standoff properties
      1. Do not suffer from the XML overlap problem
      2. Are stored externally from the text stream, which is left in a plain format
      3. Support annotations inside words, of single characters, and between characters
      4. Annotation layers (or strata) can be exported or imported
      5. Support multidimensional queries across overlapping annotations and layers
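
Points 3 to 5 can be illustrated with a small sketch under stated assumptions: layers are modelled as named lists of properties, a single character inside a word is a one-character range, a zero-width range (start equal to end) marks a point between characters, and one query combines types from different layers. JSON export and import is an assumed format chosen only for illustration; all names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StandoffProperty:
    start: int  # inclusive offset
    end: int    # exclusive offset; start == end marks a point between characters
    type: str


Layers = Dict[str, List[StandoffProperty]]


def overlaps(a: StandoffProperty, b: StandoffProperty) -> bool:
    return a.start < b.end and b.start < a.end


def query(layers: Layers, type_a: str, type_b: str) -> List[StandoffProperty]:
    """Properties of type_a, in any layer, that overlap a type_b property in any layer."""
    everything = [p for props in layers.values() for p in props]
    b_props = [p for p in everything if p.type == type_b]
    return [a for a in everything
            if a.type == type_a and any(overlaps(a, b) for b in b_props)]


def export_layer(props: List[StandoffProperty]) -> str:
    return json.dumps([asdict(p) for p in props])


def import_layer(data: str) -> List[StandoffProperty]:
    return [StandoffProperty(**item) for item in json.loads(data)]


TEXT = "Luca Landucci saw the Arno in flood at Florence."
layers: Layers = {
    "entities": [StandoffProperty(0, 13, "person")],     # "Luca Landucci"
    "formatting": [StandoffProperty(5, 13, "italics")],  # "Landucci"
    "editorial": [
        StandoffProperty(2, 3, "correction"),  # single character inside a word: "c"
        StandoffProperty(4, 4, "insertion"),   # zero-width point after "Luca"
    ],
}
```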

  11. Standoff property graph

  12. Meta-relations
      Meta-relations are:
      - User-created relations
      - Bi-directional
        - No need to choose between “parent_of” and “child_of” in the model
        - One query can bring back both parties
      - Made of parts that are nouns rather than verbs (“parent” vs “parent_of”)
      - Linked to a relationship graph for higher-order relationships
        - Family: parent/child; sibling; married; son-in-law/mother-in-law
        - Friendship: friend; close friend; girlfriend/boyfriend; correspondent
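
A minimal model of the meta-relation idea, assuming a relation stores both agents together with the noun naming each one's role, and a hypothetical relationship graph (here a child-to-parent map built from the slide's examples) supplies the higher-order categories. Querying from either side returns the counterpart, so separate `parent_of` and `child_of` edges are unnecessary. All names are illustrative rather than the Codex data model, and the agents "A", "B", and "C" are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical relationship graph: each relation type points to its broader
# type (None at the top). Assumed to be acyclic.
BROADER: Dict[str, Optional[str]] = {
    "family": None,
    "parent/child": "family",
    "sibling": "family",
    "married": "family",
    "son-in-law/mother-in-law": "family",
    "friendship": None,
    "friend": "friendship",
    "close friend": "friendship",
    "girlfriend/boyfriend": "friendship",
    "correspondent": "friendship",
}


@dataclass(frozen=True)
class MetaRelation:
    relation_type: str
    agent_a: str
    role_a: str  # a noun such as "parent", not a verb such as "parent_of"
    agent_b: str
    role_b: str  # e.g. "child"


def is_kind_of(relation_type: str, category: str) -> bool:
    """Walk up the relationship graph looking for category."""
    current: Optional[str] = relation_type
    while current is not None:
        if current == category:
            return True
        current = BROADER.get(current)
    return False


def relations_of(agent: str, relations: List[MetaRelation],
                 category: str) -> List[Tuple[str, str]]:
    """One query, either direction: (other agent, other agent's role)."""
    found = []
    for r in relations:
        if not is_kind_of(r.relation_type, category):
            continue
        if r.agent_a == agent:
            found.append((r.agent_b, r.role_b))
        elif r.agent_b == agent:
            found.append((r.agent_a, r.role_a))
    return found


relations = [
    MetaRelation("parent/child", "A", "parent", "B", "child"),
    MetaRelation("correspondent", "B", "correspondent", "C", "correspondent"),
]
```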

  13. Meta-relations
      - IsDominant: true/false

  14. Statements
      A statement is a quasi-grammatical complex used to represent simple events or predicates, roughly resembling an RDF triple or a Wikidata snak.
      - Composed (optionally) of one or more Agents, a Concept, and a Time node
      - Construction is quasi-grammatical, as parts are related mainly through prepositions (SUBJECT, OBJECT, WITH, AT, ON, UNDER, NEAR, etc.)
      - E.g., Subject: The Arno; action: was in flood; at: Florence; according-to: Luca Landucci; on: 1498/11/24

  15. Statements
      - AgentRole: SUBJECT; OBJECT; AT; WITH; ACCORDING-TO; etc.
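
The statement complex can be sketched as an optional concept, a list of agent roles, and an optional time. This Python version is an illustration under assumptions, not the Codex schema: roles are plain strings taken from the AgentRole list, the time is kept as text for brevity, and `describe` is a hypothetical helper that renders the Arno example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class AgentRole:
    role: str   # e.g. SUBJECT, OBJECT, AT, WITH, ACCORDING-TO
    agent: str


@dataclass
class Statement:
    concept: Optional[str] = None   # shared-vocabulary term for the action
    agents: List[AgentRole] = field(default_factory=list)
    time: Optional[str] = None      # a fuzzy date, kept as text here

    def agents_in(self, role: str) -> List[str]:
        return [a.agent for a in self.agents if a.role == role]

    def describe(self) -> str:
        """Render subject, action, the remaining roles, then time."""
        parts = [f"Subject: {s}" for s in self.agents_in("SUBJECT")]
        if self.concept:
            parts.append(f"action: {self.concept}")
        parts += [f"{a.role.lower()}: {a.agent}" for a in self.agents if a.role != "SUBJECT"]
        if self.time:
            parts.append(f"on: {self.time}")
        return "; ".join(parts)


arno_flood = Statement(
    concept="was in flood",
    agents=[
        AgentRole("SUBJECT", "The Arno"),
        AgentRole("AT", "Florence"),
        AgentRole("ACCORDING-TO", "Luca Landucci"),
    ],
    time="1498/11/24",
)
```

Because every part is optional, a statement with only a subject and a concept is equally valid.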

  16. Aspect-oriented ontology
      The statement complex can also be used to express ontological claims. While the claim that the Renaissance preacher “Girolamo Savonarola is a man” can be represented with an “is a” relationship, what about other aspects of Savonarola? An “is a” relation tells us nothing about who made the claim or when it was made. It also conditions us towards class-type classifications which conform to ontologies. We can use trait statements to capture aspects …
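
To contrast the two approaches, here is a sketch in which a bare “is a” triple sits next to hypothetical trait statements that also record a source. “Source X” and “Source Y” are placeholders, not real attributions, and `TraitStatement` and `aspects_of` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A bare "is a" edge: the triple has no slot for who asserted it, or when.
is_a_edge: Tuple[str, str, str] = ("Girolamo Savonarola", "is_a", "man")


@dataclass(frozen=True)
class TraitStatement:
    subject: str                 # the agent the aspect belongs to
    trait: str                   # concept from the shared vocabulary
    according_to: Optional[str]  # who made the claim
    time: Optional[str] = None   # when it was claimed or held (fuzzy date text)


def aspects_of(subject: str,
               statements: List[TraitStatement]) -> Dict[str, List[Optional[str]]]:
    """Map each trait of the subject to the sources asserting it."""
    result: Dict[str, List[Optional[str]]] = {}
    for s in statements:
        if s.subject == subject:
            result.setdefault(s.trait, []).append(s.according_to)
    return result


statements = [
    TraitStatement("Girolamo Savonarola", "man", "Source X"),
    TraitStatement("Girolamo Savonarola", "preacher", "Source X"),
    TraitStatement("Girolamo Savonarola", "preacher", "Source Y"),
]
```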

  17. Neo4jClient
      - Switches seamlessly between REST and Bolt interfaces
      - Cypher expression builder (mostly keywords)
      - Safe parameterisation of values
      - Powerful deserialiser turns JSON results into complex objects

  18. Neo4jClient-Vector
      - Adds extension methods to Neo4jClient
      - Adds a Vector<> class to generate Cypher paths
        - Node labels
        - Relationship types and directions
        - Reversible relationships (subset_of_concept <-> children_of_concept)
        - Overloadable relationships (text_has_standoff_property -> text_has_sentence_standoff_property)
      - Removes the noise to put the focus on the path patterns
        - (lex)-(head), (t)-(asp)-(a)
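
Neo4jClient-Vector is a .NET extension, and its API is not shown on the slide. The Python sketch below only imitates the idea of rendering Cypher path patterns from node labels and typed, directed relationships. It assumes, as one interpretation of the slide, that a reversible name such as `children_of_concept` is an alias for traversing the stored `subset_of_concept` relationship backwards; the labels `Concept`, `Text`, and `StandoffProperty` are hypothetical, and overloadable relationships are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Assumed alias table: each reversible name maps to the relationship type that
# is actually stored, which the alias traverses in the opposite direction.
REVERSE_ALIASES: Dict[str, str] = {"children_of_concept": "subset_of_concept"}


@dataclass(frozen=True)
class Node:
    var: str
    label: Optional[str] = None

    def render(self) -> str:
        return f"({self.var}:{self.label})" if self.label else f"({self.var})"


@dataclass(frozen=True)
class Rel:
    name: str

    def render(self) -> str:
        stored = REVERSE_ALIASES.get(self.name)
        if stored is not None:
            return f"<-[:{stored}]-"  # alias: follow the stored type backwards
        return f"-[:{self.name}]->"


def path(*items: Union[Node, Rel]) -> str:
    """Render alternating Node/Rel items as one Cypher path pattern."""
    for i, item in enumerate(items):
        expected = Node if i % 2 == 0 else Rel
        if not isinstance(item, expected):
            raise TypeError(f"item {i} should be a {expected.__name__}")
    return "".join(item.render() for item in items)
```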

  19. Neo4jClient-Vector
      If(condition, func) makes it simpler to branch Cypher expressions based on form input …
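
Branching in the style of If(condition, func) can be imitated in Python with a fluent builder whose `if_` method applies a function only when a condition holds. The sketch also mirrors the safe parameterisation point from the Neo4jClient slide: form values go into a parameter map, and the query text only refers to them as `$name`-style placeholders. `Query`, `search_agents`, the `Agent` label, and the `name` and `kind` properties are hypothetical; this is not the Neo4jClient or Neo4jClient-Vector API.

```python
from typing import Any, Callable, Dict, List, Optional


class Query:
    """Hypothetical fluent Cypher builder (not Neo4jClient's API)."""

    def __init__(self, match: str) -> None:
        self.match = match
        self.conditions: List[str] = []
        self.params: Dict[str, Any] = {}

    def where(self, condition: str, **params: Any) -> "Query":
        # Values go into the parameter map; the query text only names them.
        self.conditions.append(condition)
        self.params.update(params)
        return self

    def if_(self, condition: bool, func: Callable[["Query"], "Query"]) -> "Query":
        # Apply func only when condition holds, so the chain never has to break.
        return func(self) if condition else self

    def text(self, returns: str) -> str:
        lines = [f"MATCH {self.match}"]
        if self.conditions:
            lines.append("WHERE " + " AND ".join(self.conditions))
        lines.append(f"RETURN {returns}")
        return "\n".join(lines)


def search_agents(name: Optional[str], kind: Optional[str]) -> Query:
    """Build a query from optional form inputs; empty inputs are skipped."""
    return (
        Query("(a:Agent)")
        .if_(bool(name), lambda q: q.where("a.name CONTAINS $name", name=name))
        .if_(bool(kind), lambda q: q.where("a.kind = $kind", kind=kind))
    )
```

A value containing a quote, such as "d'Este", ends up only in `params`, never in the query text.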

  20. Pagination
      Extending the projection class from Search<T> enables the results to be paginated. An ORDER BY expression builder simplifies conditional ordering.
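
A hedged sketch of pagination and conditional ordering in Python: `SearchForm` is a hypothetical stand-in for the Search<T> projection class, pages are assumed to be 1-based, and the sort choice is looked up in a fixed table so that arbitrary user input never reaches the query text. The offset and page size are passed to Cypher's `SKIP` and `LIMIT` as parameters.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Fixed table of sortable fields (hypothetical property names).
SORT_KEYS: Dict[str, str] = {"name": "a.name", "created": "a.created"}


@dataclass
class SearchForm:
    """Hypothetical stand-in for a paginated search projection."""
    page: int = 1          # 1-based page number
    page_size: int = 20
    order: str = "name"
    descending: bool = False


def order_by(form: SearchForm) -> str:
    # Unknown sort keys fall back to the default instead of reaching the query.
    expr = SORT_KEYS.get(form.order, SORT_KEYS["name"])
    return f"ORDER BY {expr} DESC" if form.descending else f"ORDER BY {expr}"


def paginate(form: SearchForm) -> Tuple[str, Dict[str, int]]:
    """Return the tail of a Cypher query plus its SKIP/LIMIT parameters."""
    if form.page < 1 or form.page_size < 1:
        raise ValueError("page and page_size must be positive")
    clause = f"{order_by(form)}\nSKIP $skip\nLIMIT $limit"
    return clause, {"skip": (form.page - 1) * form.page_size, "limit": form.page_size}


def page_count(total: int, page_size: int) -> int:
    return (total + page_size - 1) // page_size  # ceiling division
```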

  21. Contact
      If you have any questions, or would be interested in trialling Codex as an alpha-user, I can be contacted as below.
      Iian Neill
      - Email: iian.d.neill@gmail.com
      - Twitter: codexeditor

  22. The Hunger Games
      Q1. What is a standoff property?
      Q2. What is a meta-relation?
      Q3. What is a Codex statement?
      https://docs.google.com/forms/d/e/1FAIpQLSeiY5YBj2ir_Jmi4f6GaIbyAzQcAZjYLsAJ4r-Wbr5zScJAww/viewform?usp=sf_link
