Graph Analysis of Candidate GQL Features



SLIDE 1

Graph Analysis of Candidate GQL Features

Graph Query Language Project
Existing Languages Working Group
Thomas Frisendal, thomasf@tf-informatik.dk, @VizDataModeler
2019-02-26

SLIDE 2

The “Existing Languages Working Group”

  • In preparation for the commencement of planning for GQL, interested parties -- drawn from industry (Neo4j, Oracle, Redis Labs and TigerGraph), the community (a noted data modelling expert and published technical author), and academia (the University of Talca in Chile) -- formed an informal working group called the “Existing Languages Working Group”.

  • We have worked incrementally on systematically identifying, surveying, analysing and comparing graph query language features, drawn from the following existing query languages:
      • Cypher
      • PGQL
      • GSQL
      • SQL PGQ [ Framework:2020 , Foundation:2020 , SQL/PGQ IWD , ERF-035
      • G-CORE
  • We hope to compile a catalogue of:
      • the groups of features
      • the extent to which (if at all) these are supported in each language
      • exemplar syntax
      • supplementary artifacts to aid in the understanding of the underlying semantics
      • grammar constructs
      • any additional details of interest.
  • The idea is to use this landscape of existing query languages to inform the design and development of GQL, supporting a well-informed work plan and a more robust outcome; i.e. it should help us to have clear and meaningful discussions on scope and priorities, and facilitate clear and unambiguous design choices. Moreover, it will help us to identify areas of consolidation, innovation and opportunities for language interoperation in GQL (for example, with SPARQL).

SLIDE 3

Combatting Complexity: The ELWG Graph Database

  • Establishing an analytical graph database for all 5 languages across all 212 features
  • Down to the keyword level for each feature of each language across 5 descriptive (text / syntax) dimensions
  • Now in its 3rd edition
  • Methodology:
      • Consolidate all sheets into one
      • Generate MERGE commands for the features tree and the 5 languages (by way of Excel formulas)
      • Some manual intervention (remove CRs and change “;” to “§”)
      • Load into Neo4j
      • Connect all components
      • Build tags for Descriptors, GrammarTags and SyntaxTags
      • Build a Keyword tag tree based on all 3 of the above
      • Do some reporting (this PPT and some Excel sheets)
  • Will be made available to phase 2 and to the GQL design work (for analysis)
  • Ambition: a pragmatic, analytical support tool, not a normative source
  • Errare humanum est – report errors and omissions, please (a few known issues already)
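
The MERGE-generation and clean-up steps of the methodology above could be sketched roughly as follows. This is a minimal Python illustration, not the Excel formulas actually used. The node labels FeatureArea, FeatureGroup and Feature come from the statistics slide, but the `name` property, the relationship types HAS_GROUP and HAS_FEATURE, and the sample row are assumptions made purely for illustration.

```python
def cypher_string(value: str) -> str:
    """Quote a value as a Cypher string literal after the manual clean-up step."""
    # Remove carriage returns / line feeds and swap ';' for '§', as described
    # in the methodology (presumably so that ';' cannot end a statement early).
    cleaned = value.replace("\r", " ").replace("\n", " ").replace(";", "§").strip()
    # Escape backslashes and single quotes for a single-quoted Cypher literal.
    escaped = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def merge_statements(rows):
    """Yield one Cypher statement per (area, group, feature) row of the sheet."""
    for area, group, feature in rows:
        yield (
            f"MERGE (a:FeatureArea {{name: {cypher_string(area)}}}) "
            f"MERGE (g:FeatureGroup {{name: {cypher_string(group)}}}) "
            f"MERGE (f:Feature {{name: {cypher_string(feature)}}}) "
            "MERGE (a)-[:HAS_GROUP]->(g) "   # hypothetical relationship type
            "MERGE (g)-[:HAS_FEATURE]->(f)"  # hypothetical relationship type
        )


if __name__ == "__main__":
    # Hypothetical row; the area and group names are invented for the example.
    sample_rows = [("Writing", "Creating elements", "Create a node")]
    for statement in merge_statements(sample_rows):
        print(statement)
```

Because MERGE only creates what does not already exist, the same area or group can appear on many rows without producing duplicate nodes.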
SLIDE 4

Current Meta Model

SLIDE 5

Statistics

Node type            Count  Min rels  Max rels
Feature                212         6        14
FeatureArea              6         1        17
FeatureGroup            30         2        27
InclDoc                  5        80       549
InclLang              1306         4         4
Language                 5       208       311
GCOREFeature           212         2        18
GSQLFeature            212         2        30
OpenCypherFeature      212         2        29
PGQLFeature            212         1        25
SQLFeature             212         2        29
DescriptorTag          401         1        22
GrammarTag             299         1       424
KeywordTag             659         1       247
SyntaxTag              214         1       247
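
Statistics of this shape (node count plus minimum and maximum number of relationships per node type) can be derived from any labelled edge list. The sketch below is a minimal Python illustration on toy data. It assumes that "rels" counts relationships in both directions, which the slide does not state, and the node identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def degree_stats(node_labels, edges):
    """Return {label: (count, min_rels, max_rels)}.

    node_labels: dict mapping node id -> node type (label)
    edges: iterable of (node_id, node_id) pairs; direction is ignored here,
           which is an assumption about how the slide counts relationships.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    per_label = defaultdict(list)
    for node, label in node_labels.items():
        per_label[label].append(degree[node])  # 0 for isolated nodes

    return {label: (len(d), min(d), max(d)) for label, d in per_label.items()}


if __name__ == "__main__":
    # Toy graph with hypothetical ids, far smaller than the ELWG database.
    nodes = {"f1": "Feature", "f2": "Feature", "g1": "FeatureGroup", "l1": "Language"}
    rels = [("g1", "f1"), ("g1", "f2"), ("l1", "f1")]
    for label, (count, lo, hi) in sorted(degree_stats(nodes, rels).items()):
        print(f"{label:<14}{count:>6}{lo:>6}{hi:>6}")
```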

SLIDE 6

The Features Tree

SLIDE 7

Comparison of Planned or Implemented Features

GCORE GSQL OpenCypher PGQL SQL

SLIDE 8

Implementation Status: Planned or Implemented (status other than ’X’)

GCORE: 72, GSQL: 152, Cypher: 168, PGQL: 113, SQL: 140

SLIDE 9

Implementation Status Not Supported (’X’)

GCORE: 118, GSQL: 54, Cypher: 43, PGQL: 99, SQL: 71

SLIDE 10

The Descriptor Tags

SLIDE 11

The Grammar Tags

Function Invocation (Cypher) Not Defined (SQL)

SLIDE 12

The Syntax Graph

SLIDE 13

Part of the Syntax Graph

SLIDE 14

Zooming in on a “Word” in the Syntax Graph

SLIDE 15

Even More Tags in the Keyword Graph

Essentially the Syntax Tags enhanced with keywords extracted from the Descriptor and Grammar Tags
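
As a rough illustration of that enrichment, the keyword set for a feature could be formed as the union of its syntax tags and the words found in its descriptor and grammar tags. The Python sketch below is only a guess at the idea. The tokenisation rule (alphabetic tokens, upper-cased) and the sample tags are assumptions, not the procedure actually used to build the ELWG keyword tree.

```python
import re


def extract_keywords(text):
    """Split a tag into upper-cased alphabetic tokens (hypothetical rule)."""
    return {token.upper() for token in re.findall(r"[A-Za-z_]+", text)}


def keyword_tags(syntax_tags, descriptor_tags, grammar_tags):
    """Syntax tags enhanced with keywords from descriptor and grammar tags."""
    keywords = {tag.upper() for tag in syntax_tags}
    for tag in list(descriptor_tags) + list(grammar_tags):
        keywords |= extract_keywords(tag)
    return keywords


if __name__ == "__main__":
    # Sample tags invented for the example, except "Function Invocation",
    # which appears on the Grammar Tags slide.
    print(sorted(keyword_tags({"MATCH", "WHERE"},
                              ["pattern matching"],
                              ["Function Invocation"])))
```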

SLIDE 16

Collected Keywords per Feature and Language

SLIDE 17

Using a Graph Algorithm to Measure Similarity of Expression (Jaccard)

Feature Name                                      AvgSim
And                                               1.00
Comparing values (equality)                       1.00
Equality                                          1.00
Greater than                                      1.00
Greater than or equal to                          1.00
Inequality                                        1.00
Less than                                         1.00
Less than or equal to                             1.00
Negation                                          1.00
Or                                                1.00
Type coercions (i.e. implicit type conversions)   1.00
approximate 32-bit binary decimal number          1.00
approximate 64-bit binary decimal number          1.00
Edge directions: l-to-r                           0.87
Specifying a conditional value                    0.87
date                                              0.83
local time                                        0.83
Check if a property exists on a node or an edge   0.80
Edge directions: r-to-l                           0.79
Edge pattern with disjunction of labels           0.79
MATCH with more than one node/edge/path pattern
  (i.e. allowing for 'star'-shaped patterns
  etc.). Essentially this can also be used to
  obtain a cross product                          0.75
Edge pattern with direction                       0.75
Subtraction                                       0.74
Edge directions: any direction                    0.73

Feature Name                                                AvgSim
Dynamic property access (accessing a property of a node
  or edge by using a dynamically-computed string value
  as the key; e.g. allowing for the key to be passed in
  as a parameter)                                           -
Escaping characters                                         -
Flattening a list (transform a list into a series of
  rows; transpose)                                          -
Get all the elements of a list/collection/array
  excluding the first element                               -
Get all the labels for a node                               -
Get the identifier of a node or edge                        -
Node pattern with label negation                            -
interval                                                    -
multidimensional array                                      -
Obtain the current date/time                                0.06
Get all the nodes in a path                                 0.07
List/collection/array concatenation                         0.07
Get all the edges in a path                                 0.08
Determine whether or not a value is a member of a multiset  0.08
Input graph specification                                   0.08
List equality                                               0.08
Create an edge                                              0.09
Get the edge label as a string                              0.09
Subtraction operator for temporal types and durations       0.11
Create a node                                               0.11
Get the first element in a list/collection/array            0.11
Replace                                                     0.11
Checking if a pattern exists                                0.12
Amalgamate multiple values into a single list               0.13

[Bar chart: AvgSim per feature (axis from 0 to 1.20), with features ordered from the highest similarity (“And”) down to the lowest.]
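
The slide names Jaccard similarity but does not say how AvgSim is aggregated. One plausible reading, sketched below in Python, is the mean of the pairwise Jaccard similarities between the keyword sets that each language uses for a feature. That aggregation, the handling of empty sets, and the sample keyword sets are all assumptions for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here; avoids dividing by zero
    return len(a & b) / len(a | b)


def average_similarity(keywords_by_language):
    """Mean Jaccard similarity over all unordered pairs of languages."""
    pairs = list(combinations(keywords_by_language.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical keyword sets: every language expresses the feature the same way.
    same = {"GCORE": {"AND"}, "GSQL": {"AND"}, "OpenCypher": {"AND"},
            "PGQL": {"AND"}, "SQL": {"AND"}}
    print(average_similarity(same))  # 1.0

    # Hypothetical keyword sets with only partial overlap.
    mixed = {"A": {"MATCH", "WHERE"}, "B": {"MATCH"}, "C": {"SELECT"}}
    print(round(average_similarity(mixed), 2))
```

Under this reading, a score of 1.00 means all languages use exactly the same keywords for a feature, and a score near 0 means they share almost none.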

SLIDE 18

10 Data Extracts in Excel (ELWG_reports_20190228.zip)

  • CandidateFeatures_20190228
  • DescriptorTags_20190228
  • FeaturesNotSupported_20190228
  • FeatureSyntaxSimilarity_20190228
  • GrammarTags_20190228
  • KeywordTagsAcrossLanguages_20190228
  • KeyWordTagsCollections_20190228
  • SyntaxSummary_20190228
  • SyntaxTags_20190228
  • SyntaxXref_20190228
SLIDE 19

Contact information:
Thomas Frisendal (Copenhagen, Denmark)
thomasf@tf-informatik.dk
@VizDataModeler
linkedin.com/in/thomas-frisendal-19a56a