SPOWL: Spark-based OWL 2 Reasoning Materialisation
Yu Liu and Peter McBrien
Department of Computing, Imperial College London
Y.Liu & P.McBrien, BeyondMR17

Table of Contents
◮ Introduction
◮ SPOWL Overview
◮ SPOWL Features
◮ Evaluation
◮ Summary

Reasoning materialisation for OWL 2 ontologies
◮ LUBM T-Box:
    Student ⊑ Person (1)
    Student ⊑ ∃takesCourse.Course (2)
◮ LUBM A-Box:
    Student(John) (3)    Person(Lewis) (5)
    Student(Tom) (4)     Person(Mary) (6)
◮ Reasoning materialisation:
    Student := {John, Tom}; Person := {Lewis, Mary, John, Tom}
    takesCourse := {(John, ?C1), (Tom, ?C2)}; Course := {?C1, ?C2}
◮ Querying the ontology:
    ◮ Not only explicit but also implicit facts will be returned.

Reasoning materialisation for OWL 2 ontologies
Materialising reasoning results:
    Student := {John, Tom}
    Person := {Lewis, Mary, John, Tom}
    takesCourse := {(John, ?C1), (Tom, ?C2)}
    Course := {?C1, ?C2}
◮ Queries read the materialised results directly.
◮ Query processing is faster, but more storage space is required.
◮ Maintaining the materialisation is difficult.
◮ Ideal case: queries are much more frequent than updates.
◮ Example systems: SPOWL, Oracle's RDF Store, WebPIE, etc.

Rule evaluation for reasoning materialisation
◮ Rule format: if ⟨antecedent⟩ then ⟨consequent⟩
    Example: if C ⊑ D, C(x) then D(x)
    ⟹ if Student ⊑ Person, Student(x) then Person(x)
◮ Well-known rulesets:
    ◮ RDFS entailment rules.
    ◮ OWL ter Horst rules.
    ◮ OWL 2 RL/RDF rules.
◮ Limitations:
    ◮ No use of tableaux reasoners (e.g. Pellet and HermiT).
    ◮ Reasoning relies on which set of entailment rules is chosen.
    ◮ Inefficient rule matching process.
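
The rule format above can be illustrated with a minimal sketch. It is not taken from the slides: plain Python sets stand in for the A-Box, and the function name and data layout are assumptions made for this example. It applies the rule "if C ⊑ D, C(x) then D(x)" until no new facts appear.

```python
# Illustrative sketch only: plain Python sets stand in for the A-Box.
# apply_subclass_rule and the (class, individual) layout are hypothetical names.

def apply_subclass_rule(tbox, abox):
    """Apply 'if C ⊑ D, C(x) then D(x)' until no new facts appear."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            # Every fact is scanned for every axiom: a naive matching step.
            for cls, individual in list(facts):
                if cls == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts


# T-Box axiom (1) and A-Box facts (3) to (6) from the LUBM example.
tbox = {("Student", "Person")}
abox = {
    ("Student", "John"),
    ("Student", "Tom"),
    ("Person", "Lewis"),
    ("Person", "Mary"),
}

materialised = apply_subclass_rule(tbox, abox)
persons = sorted(ind for cls, ind in materialised if cls == "Person")
print(persons)
```

The inner loop scans every fact for every axiom, which gives a feel for why generic rule matching can be costly on large A-Boxes.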

SPOWL architecture
◮ T-Box is small enough for tableaux reasoners.
◮ The number of queries is much larger than the number of updates.
Figure: SPOWL architecture. Labels: OWL documents, T-Box, classified T-Box, A-Box 1 ... A-Box n, distributed data storage (e.g. HDFS). Steps: ① Spark programme generation, ② initial load, ③ programme execution.

SPOWL overview
1. Classes & properties are mapped to Spark RDDs:
    C ⇝ C_rdd(id)
    P ⇝ P_rdd(domain, range)
2. T-Box axioms are mapped to entailment rules R_axiom:
    C ⊑ D ⇝ R_{C ⊑ D} ::= if C_rdd(x) then D_rdd(x)
3. R_axiom are further implemented as Spark programmes P_axiom:
    R_{C ⊑ D} ⇝ P_{C ⊑ D} ::= D_rdd = D_rdd.union(C_rdd)
4. P_axiom are iteratively executed to build up the RDDs.
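
The four steps above can be sketched as follows. This is a rough illustration rather than SPOWL's generated code: Python sets stand in for RDDs, and compile_subclass_axiom, materialise and the individual "Ann" are hypothetical names introduced for the example.

```python
# Sketch under stated assumptions: sets stand in for Spark RDDs, and each
# subclass axiom is "compiled" into a small programme (a Python closure).

def compile_subclass_axiom(sub, sup):
    """Return a programme for sub ⊑ sup: sup_rdd = sup_rdd.union(sub_rdd)."""
    def programme(rdds):
        before = len(rdds.setdefault(sup, set()))
        rdds[sup] = rdds[sup] | rdds.get(sub, set())
        return len(rdds[sup]) > before  # True if new facts were derived
    return programme


def materialise(programmes, rdds):
    """Run all programmes repeatedly until none of them adds a fact."""
    changed = True
    while changed:
        # The list forces every programme to run in each round.
        changed = any([p(rdds) for p in programmes])
    return rdds


# Step 1: one "RDD" per class ("Ann" is a made-up individual).
rdds = {
    "GraduateStudent": {"Ann"},
    "Student": {"John", "Tom"},
    "Person": {"Lewis", "Mary"},
}

# Steps 2 and 3: axioms become programmes.
programmes = [
    compile_subclass_axiom("Student", "Person"),
    compile_subclass_axiom("GraduateStudent", "Student"),
]

# Step 4: iterate until a fixpoint is reached.
result = materialise(programmes, rdds)
print(sorted(result["Person"]))
```

Because the Student ⊑ Person programme runs before GraduateStudent ⊑ Student here, a second round is needed before "Ann" reaches Person, which is why the programmes are executed iteratively.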

SPOWL uses a tableaux reasoner
◮ More complete T-Box reasoning:
    e.g. classifying C ⊑ D ⊔ E and C ⊓ D ⊑ ⊥ gives us C ⊑ E
◮ Entailment rules are specific to the A-Box data:
    ◮ No need to evaluate rules that are irrelevant to the ontological data.

SPOWL partitions reasoning materialisation
◮ Data of each class or property is stored separately in HDFS:
    C ⇝ hdfs://${C_PATH}/
    P ⇝ hdfs://${P_PATH}/
◮ A variant of the vertical partitioning model.
◮ Only the partitions storing the relevant data need to be accessed.
    e.g. Student_rdd = sc.textFile("hdfs://${Student_PATH}/")
◮ Otherwise, the whole ontology would have to be read and the relevant fragment filtered out.
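
The partitioning idea can be sketched in a few lines. This is only an in-memory illustration: a dictionary keyed by class or property name stands in for the per-class and per-property HDFS directories, and partition_facts, "Course0" and the tuple layouts are assumptions made for the example.

```python
from collections import defaultdict

# Sketch only: a dict of sets stands in for one storage location per class
# or property. All names and the sample data are hypothetical.

def partition_facts(class_facts, property_facts):
    """Group A-Box facts so each class and property has its own partition."""
    partitions = defaultdict(set)
    for cls, individual in class_facts:
        partitions[cls].add(individual)
    for prop, subject, obj in property_facts:
        partitions[prop].add((subject, obj))
    return dict(partitions)


class_facts = [
    ("Student", "John"),
    ("Student", "Tom"),
    ("Person", "Lewis"),
    ("Person", "Mary"),
]
property_facts = [("takesCourse", "John", "Course0")]

partitions = partition_facts(class_facts, property_facts)

# A rule over Student touches only the Student partition.
student_rdd = partitions["Student"]
print(sorted(student_rdd))
```

With this layout, a rule that mentions only Student never has to scan the Person or takesCourse data.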

SPOWL handles axioms beyond OWL 2 RL
◮ SomeValuesFrom forms a superclass expression (i.e. C ⊑ ∃P.D)
    e.g. Student ⊑ ∃takesCourse.Course (2)
◮ Non-deterministic reasoning (OWL 2 RL Interpretation I):
    I ⊨ C ⊑ ∃P.D iff C^I ⊆ {x | ∃y : ⟨x, y⟩ ∈ P^I and y ∈ D^I}
◮ Entailment rule R_{C ⊑ ∃P.D}:
    if C_rdd(x), ¬P_rdd(x, y) then P_rdd(x, null)
◮ Spark programme P_{C ⊑ ∃P.D}:
    P_rdd = P_rdd.union(C_rdd.subtract(P_rdd.map(lambda (x, y): x)).map(lambda x: (x, null)))
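
A small runnable sketch of the rule above, assuming Python sets in place of RDDs and None in place of null. The function name and the individual "Course0" are hypothetical; the three steps mirror the map, subtract and union calls of the Spark programme.

```python
# Sketch only: sets stand in for RDDs and None plays the role of `null`.

def apply_some_values_from(c_rdd, p_rdd):
    """Give every member of C that has no P-successor a placeholder one."""
    subjects_with_p = {x for x, _ in p_rdd}       # P_rdd.map(... : x)
    missing = c_rdd - subjects_with_p             # C_rdd.subtract(...)
    placeholders = {(x, None) for x in missing}   # .map(x : (x, null))
    return p_rdd | placeholders                   # P_rdd.union(...)


student = {"John", "Tom"}
takes_course = {("John", "Course0")}  # "Course0" is a made-up individual

result = apply_some_values_from(student, takes_course)
print(sorted(result, key=lambda pair: pair[0]))
```

Applying the function a second time changes nothing, since every student then already has a takesCourse entry.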

The advantage of using Spark (1)
Spark caches RDDs in distributed memory as much as possible, which:
◮ reduces the need to write/read intermediate results to/from disk.
◮ reduces I/O overhead.
◮ suits iterative computation (e.g. computing transitive closure).

Data caching in distributed memory
Iterative computation:
◮ TransitiveProperty P (P ◦ P ⊑ P).
    subOrganisationOf ◦ subOrganisationOf ⊑ subOrganisationOf (7)
◮ Entailment rule R_{P ◦ P ⊑ P}:
    if P_rdd(x, y), P_rdd(y, z) then P_rdd(x, z)
◮ Spark programme P_{P ◦ P ⊑ P}:
    while True do
        P_tmp = P_rdd.map(lambda (x_p, y_p): (y_p, x_p))
                     .join(P_rdd)
                     .map(lambda (y_k, (x_p, z_p)): (x_p, z_p))
        P_tmp.cache()
        if P_tmp.isEmpty() then break
        P_rdd = P_rdd.union(P_tmp)
    end
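
The iterative programme above can be tried out locally with a sketch like the following, assuming Python sets in place of RDDs. One detail is an assumption of this sketch and not shown on the slide: pairs that are already known are subtracted before the emptiness test, so that the loop is guaranteed to stop. The organisation names are made up.

```python
# Sketch only: sets stand in for RDDs; names and data are hypothetical.

def transitive_closure(p_rdd):
    """Repeat the join (x, y), (y, z) -> (x, z) until nothing new appears."""
    closure = set(p_rdd)
    while True:
        # Index pairs by their first element to join on the shared y.
        successors = {}
        for y, z in closure:
            successors.setdefault(y, set()).add(z)
        derived = {(x, z) for x, y in closure for z in successors.get(y, ())}
        # Assumption of this sketch: keep only pairs not seen before.
        new_pairs = derived - closure
        if not new_pairs:
            break
        closure |= new_pairs
    return closure


sub_organisation_of = {
    ("Group1", "Dept1"),
    ("Dept1", "Univ1"),
}

closure = transitive_closure(sub_organisation_of)
print(sorted(closure))
```

Each round reuses the result of the previous one, which is the access pattern that benefits from keeping intermediate RDDs cached in memory.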

Data caching in distributed memory
◮ GraduateStudent_rdd will be used three times:
    job a: R_{GraduateStudent ⊑ Person} → Person_rdd
    job b: R_{GraduateStudent ⊑ ∃takesCourse.GraduateCourse} → takesCourse_rdd
    job c: R_{GraduateStudent ⊑ Student} → Student_rdd
Figure: Caching GraduateStudent_rdd for Repeated Usage

The advantage of using Spark (2)
◮ More flexible job scheduling as compared to Hadoop.
Figure: Job Scheduling between Hadoop (left) and Spark (right)

DAG for parallelising reasoning
Consider Person ⊓ ∃takesCourse.Course ⊑ Student:
◮ R_{Person ⊓ ∃takesCourse.Course ⊑ Student}:
    if Person_rdd(x), takesCourse_rdd(x, y), Course_rdd(y) then Student_rdd(x)
◮ P_{Person ⊓ ∃takesCourse.Course ⊑ Student}:
    Student_tmp1 = takesCourse_rdd.map(lambda (x_t, y_t): (y_t, x_t))
                       .join(Course_rdd.map(lambda y_c: (y_c, y_c)))
                       .map(lambda (y_k, (x_t, y_c)): x_t)
    Student_tmp2 = Student_tmp1.intersection(Person_rdd)
    Student_rdd = Student_rdd.union(Student_tmp2)
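
The three assignments above can be mirrored with Python sets, as a sketch rather than the generated Spark code. The function name and all individuals other than the class and property names are hypothetical.

```python
# Sketch only: sets stand in for the four RDDs involved in the rule.

def apply_intersection_rule(person, takes_course, course, student):
    """Person ⊓ ∃takesCourse.Course ⊑ Student over set-based stand-ins."""
    # Student_tmp1: subjects x with takesCourse(x, y) and Course(y)
    tmp1 = {x for x, y in takes_course if y in course}
    # Student_tmp2: keep only those subjects that are also persons
    tmp2 = tmp1 & person
    # Student_rdd = Student_rdd.union(Student_tmp2)
    return student | tmp2


person = {"John", "Mary", "Lewis"}
takes_course = {
    ("Mary", "Course0"),    # a Person taking a Course: becomes a Student
    ("Lewis", "Seminar0"),  # Seminar0 is not known to be a Course
    ("Rex", "Course0"),     # Rex is not known to be a Person
}
course = {"Course0"}
student = {"John"}

student = apply_intersection_rule(person, takes_course, course, student)
print(sorted(student))
```

The rule reads three inputs (Person, takesCourse and Course), so it can only produce complete results once those inputs have been built up.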

DAG for parallelising reasoning
    job a: R_{Student ⊑ Person}, R_{GraduateStudent ⊑ Person} → Person_rdd
    job b: R_{Student ⊑ ∃takesCourse.Course} → takesCourse_rdd
    job c: R_{GraduateCourse ⊑ Course} → Course_rdd
    job d: R_{Person ⊓ ∃takesCourse.Course ⊑ Student} → Student_rdd
Figure: DAG Scheduling for R_{Person ⊓ ∃takesCourse.Course ⊑ Student}

Optimising programme execution order
Executing job a, job b and job c before job d is the best order.
    job a: R_{Student ⊑ Person}, R_{GraduateStudent ⊑ Person} → Person_rdd
    job b: R_{Student ⊑ ∃takesCourse.Course} → takesCourse_rdd
    job c: R_{GraduateCourse ⊑ Course} → Course_rdd
    job d: R_{Person ⊓ ∃takesCourse.Course ⊑ Student} → Student_rdd
Figure: DAG Scheduling for R_{Person ⊓ ∃takesCourse.Course ⊑ Student}
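
The ordering stated above can be derived mechanically from the job dependencies. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the dependency table is an assumed reading of the figure, in which job d consumes the RDDs produced by jobs a, b and c, and schedule_in_waves is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Assumed reading of the figure: job d reads Person_rdd, takesCourse_rdd and
# Course_rdd, which are written by jobs a, b and c respectively.
dependencies = {
    "job_a": set(),
    "job_b": set(),
    "job_c": set(),
    "job_d": {"job_a", "job_b", "job_c"},
}


def schedule_in_waves(deps):
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = schedule_in_waves(dependencies)
print(waves)
```

Jobs a, b and c land in the first wave and job d in the second, matching the order described on the slide.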

Ordering Spark Programmes
Consider P1 ⊑ P2, P2 ◦ P2 ⊑ P2 and P2 ⊑ P3:
Figure: Acyclic property hierarchy
What if we add the axiom P3 ≡ P1⁻?
Figure: Cyclic property hierarchy
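
The difference between the two hierarchies can be checked with a short sketch based on Python's graphlib module (Python 3.9 or later). The dependency tables are an assumed encoding of the axioms, where each property maps to the properties its facts are derived from, and the self-dependency from P2 ◦ P2 ⊑ P2 is left out on the assumption that it is handled by iterating within that programme. The sketch only detects the cycle; how SPOWL schedules such programmes is not stated on this slide.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed encoding: property -> properties its facts are derived from.


def execution_order(dependencies):
    """Return a one-pass execution order, or None if the graph is cyclic."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# P1 ⊑ P2 and P2 ⊑ P3
acyclic = {"P1": set(), "P2": {"P1"}, "P3": {"P2"}}

# Adding P3 ≡ P1⁻: P3 is also derived from P1, and P1 from P3.
cyclic = {"P1": {"P3"}, "P2": {"P1"}, "P3": {"P2", "P1"}}

acyclic_order = execution_order(acyclic)
cyclic_order = execution_order(cyclic)
print(acyclic_order, cyclic_order)
```

For the acyclic hierarchy a single pass in the returned order suffices, whereas the cyclic one has no such order.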

Evaluating SPOWL for reasoning materialisation
◮ Evaluation environment
    ◮ A cluster of 9 machines running in a private cloud environment.
    ◮ Each node has a 2.5 GHz CPU with 4 cores and 16 GB of memory.
◮ Benchmarking dataset: LUBM
    ◮ LUBM-2000: about 270 million A-Box facts and 44 GB in size.
◮ Comparison system: WebPIE
    ◮ Using MapReduce as the computation framework.
    ◮ Not using tableaux reasoners.
    ◮ Not partitioning reasoning materialisation.
    ◮ Compressing data before reasoning materialisation.

Performance of reasoning materialisation
◮ Reasoning materialisation by SPOWL

SPOWL          LUBM-400   LUBM-800   LUBM-1200   LUBM-1600   LUBM-2000
Initial Load   9m08s      20m30s     27m50s      41m20s      54m10s
Reasoning      10m19s     16m28s     33m20s      38m58s      58m08s
Total Time     19m27s     36m58s     1h01m10s    1h20m18s    1h52m18s

Figure: time (hh:mm:ss) for LUBM-400 to LUBM-2000; legend: Initial Load, Type Inference.
