


  1. SPOWL: Spark-based OWL 2 Reasoning Materialisation
     Yu Liu and Peter McBrien, Department of Computing, Imperial College London
     Y.Liu & P.McBrien, BeyondMR17

  2. Table of Contents: Introduction; SPOWL Overview; SPOWL Features; Evaluation; Summary

  3. Table of Contents, current section: Introduction

  4. Reasoning materialisation for OWL 2 ontologies
     ◮ LUBM T-Box: Student ⊑ Person (1); Student ⊑ ∃takesCourse.Course (2)
     ◮ LUBM A-Box: Student(John) (3); Student(Tom) (4); Person(Lewis) (5); Person(Mary) (6)
     ◮ Reasoning materialisation: Student := {John, Tom}; Person := {Lewis, Mary, John, Tom}; takesCourse := {(John, ?C1), (Tom, ?C2)}; Course := {?C1, ?C2}
     ◮ Querying the ontology returns not only explicit but also implicit facts.
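The materialisation on this slide can be reproduced with a few lines of plain Python. This is a minimal sketch, not SPOWL code: Python sets stand in for the per-class and per-property data, the function name is hypothetical, and labelled nulls are named ?C1, ?C2, … only to match the slide.

```python
from itertools import count

# T-Box from the slide: (1) Student ⊑ Person, (2) Student ⊑ ∃takesCourse.Course
# A-Box from the slide: facts (3)-(6)
classes = {
    "Student": {"John", "Tom"},
    "Person": {"Lewis", "Mary"},
    "Course": set(),
}
properties = {"takesCourse": set()}

fresh = count(1)  # numbering for labelled nulls ?C1, ?C2, ...

def materialise():
    """Single pass; enough for this two-axiom T-Box."""
    # Axiom (1): every Student is a Person.
    classes["Person"] |= classes["Student"]
    # Axiom (2): every Student takes some Course, so invent a labelled null
    # for each Student without a takesCourse fact and make it a Course.
    has_course = {x for (x, _) in properties["takesCourse"]}
    for s in sorted(classes["Student"] - has_course):
        c = f"?C{next(fresh)}"
        properties["takesCourse"].add((s, c))
        classes["Course"].add(c)

materialise()
print(sorted(classes["Person"]))          # ['John', 'Lewis', 'Mary', 'Tom']
print(sorted(properties["takesCourse"]))  # [('John', '?C1'), ('Tom', '?C2')]
```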

  5. Reasoning materialisation for OWL 2 ontologies
     Materialising reasoning results: Student := {John, Tom}; Person := {Lewis, Mary, John, Tom}; takesCourse := {(John, ?C1), (Tom, ?C2)}; Course := {?C1, ?C2}
     ◮ Queries read the materialised results directly.
     ◮ Query processing is faster, but more storage space is required.
     ◮ Maintaining the materialisation under updates is difficult.
     ◮ Ideal case: queries are much more frequent than updates.
     ◮ Example systems: SPOWL, Oracle’s RDF Store, WebPIE, etc.

  6. Rule evaluation for reasoning materialisation
     ◮ Rule format: if ⟨antecedent⟩ then ⟨consequent⟩. Example: if C ⊑ D, C(x) then D(x) ⇒ if Student ⊑ Person, Student(x) then Person(x)
     ◮ Well-known rulesets:
        ◮ RDFS entailment rules.
        ◮ OWL ter Horst rules.
        ◮ OWL 2 RL/RDF rules.
     ◮ Limitations:
        ◮ No use of tableaux reasoners (e.g. Pellet and HermiT).
        ◮ The inferences obtained depend on which set of entailment rules is chosen.
        ◮ The rule matching process is inefficient.
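To make the point about rule matching concrete, the sketch below evaluates one generic rule, cax-sco from the OWL 2 RL/RDF ruleset (the triple-level form of "if C ⊑ D, C(x) then D(x)"), over a single set of triples until a fixpoint. It is an illustrative assumption about how a generic rule engine behaves, not the implementation of any system named on these slides; the triples and function names are hypothetical.

```python
# Triples (subject, predicate, object); T-Box and A-Box live in one store.
triples = {
    ("Student", "rdfs:subClassOf", "Person"),
    ("GraduateStudent", "rdfs:subClassOf", "Student"),
    ("John", "rdf:type", "GraduateStudent"),
}

def apply_cax_sco(kb):
    """One round of cax-sco: T(?c1, subClassOf, ?c2), T(?x, type, ?c1) -> T(?x, type, ?c2)."""
    sub = [(s, o) for (s, p, o) in kb if p == "rdfs:subClassOf"]
    typ = [(s, o) for (s, p, o) in kb if p == "rdf:type"]
    # Generic matching: every type fact is compared with every subClassOf fact.
    return {(x, "rdf:type", c2) for (c1, c2) in sub for (x, c) in typ if c == c1}

def materialise(kb):
    kb = set(kb)
    while True:
        new = apply_cax_sco(kb) - kb
        if not new:  # fixpoint: the rule derives nothing new
            return kb
        kb |= new

closure = materialise(triples)
print(sorted(o for (s, p, o) in closure if s == "John" and p == "rdf:type"))
# ['GraduateStudent', 'Person', 'Student']
```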

  7. Table of Contents, current section: SPOWL Overview

  8. SPOWL architecture
     ◮ The T-Box is small enough for tableaux reasoners.
     ◮ The number of queries is much larger than the number of updates.
     Figure: SPOWL architecture, with components T-Box documents, classified OWL T-Box, ① Spark programme generation, ② initial load of A-Box 1 … A-Box n into distributed data storage (e.g. HDFS), and ③ programme execution.

  9. SPOWL overview
     1. Classes and properties are mapped to Spark RDDs: C ⇝ C_rdd(id); P ⇝ P_rdd(domain, range).
     2. T-Box axioms are mapped to entailment rules R_axiom: C ⊑ D ⇝ R_{C⊑D} ::= if C_rdd(x) then D_rdd(x).
     3. Each R_axiom is further implemented as a Spark programme P_axiom: R_{C⊑D} ⇝ P_{C⊑D} ::= D_rdd = D_rdd.union(C_rdd).
     4. The P_axiom programmes are executed iteratively to build up the RDDs.
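Steps 1 to 4 can be mimicked for subclass axioms in plain Python. In this sketch, which assumes a hypothetical three-class example and uses sets in place of Spark RDDs, each axiom C ⊑ D is compiled into its own small programme computing D_rdd = D_rdd.union(C_rdd), and all programmes are rerun until none of them derives a new fact.

```python
# Step 1 (simulated): each class C becomes a set standing in for C_rdd(id).
rdds = {
    "GraduateStudent": {"Ann"},
    "Student": {"John"},
    "Person": set(),
}

# Step 2: hypothetical T-Box axioms C ⊑ D.
subclass_axioms = [("Student", "Person"), ("GraduateStudent", "Student")]

# Step 3: compile each axiom into a programme D_rdd = D_rdd.union(C_rdd).
def compile_subclass(c, d):
    def programme():
        before = len(rdds[d])
        rdds[d] = rdds[d] | rdds[c]
        return len(rdds[d]) != before  # True if new facts were derived
    return programme

programmes = [compile_subclass(c, d) for (c, d) in subclass_axioms]

# Step 4: execute the programmes iteratively until a round derives nothing.
rounds = 0
while True:
    rounds += 1
    # A list (not a generator) so every programme runs in every round.
    changed = [p() for p in programmes]
    if not any(changed):
        break

print(sorted(rdds["Person"]), rounds)  # ['Ann', 'John'] 3
```

With the two axioms listed in the opposite order, the same fixpoint is reached in two rounds instead of three, which hints at why programme execution order matters.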

  10. Table of Contents, current section: SPOWL Features

  11. SPOWL uses a tableaux reasoner
     ◮ More complete T-Box reasoning: e.g. from C ⊑ D ⊔ E and C ⊓ D ⊑ ⊥, classification gives C ⊑ E.
     ◮ Entailment rules are specific to the A-Box data:
        ◮ There is no need to evaluate rules that are irrelevant to the ontological data.
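The classification example can be checked by brute force. Because all three concepts are atomic and the axioms use only ⊔, ⊓ and ⊥, the entailment can be decided one individual at a time, so a truth table over membership in C, D and E suffices. This is only a check of the example, not how a tableaux reasoner such as Pellet or HermiT works.

```python
from itertools import product

def entails():
    """True if C ⊑ D ⊔ E and C ⊓ D ⊑ ⊥ together entail C ⊑ E."""
    # c, d, e: whether one arbitrary individual belongs to C, D, E.
    for c, d, e in product([False, True], repeat=3):
        axiom1 = (not c) or d or e  # C ⊑ D ⊔ E
        axiom2 = not (c and d)      # C ⊓ D ⊑ ⊥
        if axiom1 and axiom2 and c and not e:
            return False            # counter-model: in C but not in E
    return True

print(entails())  # True
```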

  12. SPOWL partitions reasoning materialisation
     ◮ Data of each class or property is stored separately in HDFS: C ⇝ hdfs://${C_PATH}/; P ⇝ hdfs://${P_PATH}/
     ◮ A variant of the vertical partitioning model.
     ◮ Only the partitions storing the relevant data need to be accessed, e.g. Student_rdd = sc.textFile("hdfs://${Student_PATH}/")
     ◮ Otherwise, the whole ontology would have to be read and the relevant fragment filtered out of it.
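The following sketch imitates the per-class and per-property layout on the local filesystem instead of HDFS. The directory names, the part-00000 file name and the tab-separated format are assumptions for illustration; the point is that reading the Student data touches only the Student directory.

```python
import os
import tempfile

# Hypothetical A-Box facts: (class, individual) and (property, subject, object).
class_facts = [("Student", "John"), ("Student", "Tom"), ("Person", "Mary")]
property_facts = [("takesCourse", "John", "Course1")]

def write_partitions(root):
    """Store each class/property in its own directory, one per ${C_PATH}."""
    for name, *args in class_facts + property_facts:
        part_dir = os.path.join(root, name)
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000"), "a") as f:
            f.write("\t".join(args) + "\n")

def read_partition(root, name):
    """Read one partition only, in the spirit of sc.textFile on one path."""
    part_dir = os.path.join(root, name)
    rows = []
    for fname in sorted(os.listdir(part_dir)):
        with open(os.path.join(part_dir, fname)) as f:
            rows += [line.rstrip("\n").split("\t") for line in f]
    return rows

with tempfile.TemporaryDirectory() as root:
    write_partitions(root)
    students = read_partition(root, "Student")  # Person and takesCourse are never opened
    print(students)  # [['John'], ['Tom']]
```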

  13. SPOWL handles axioms beyond OWL 2 RL
     ◮ SomeValuesFrom forms a superclass expression (i.e. C ⊑ ∃P.D), e.g. Student ⊑ ∃takesCourse.Course (2)
     ◮ Non-deterministic reasoning (not allowed in OWL 2 RL). For an interpretation I: $I \models C \sqsubseteq \exists P.D$ iff $C^I \subseteq \{x \mid \exists y : \langle x, y \rangle \in P^I \text{ and } y \in D^I\}$
     ◮ Entailment rule R_{C⊑∃P.D}: if C_rdd(x), ¬P_rdd(x, y) then P_rdd(x, null)
     ◮ Spark programme P_{C⊑∃P.D}: P_rdd = P_rdd.union(C_rdd.subtract(P_rdd.map(lambda (x, y): x)).map(lambda x: (x, None)))
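The programme on this slide uses Python 2 style tuple-unpacking lambdas, which Python 3 no longer accepts. The sketch below reproduces the same union/subtract/map logic on plain Python sets, with None in the role of null; the data is hypothetical. The commented line is an untested Python 3 PySpark rendering using RDD.map, RDD.subtract and RDD.union.

```python
# Sets of tuples stand in for RDDs; None plays the role of null.
student_rdd = {"John", "Tom", "Ann"}
takes_course_rdd = {("Ann", "Maths")}

# Untested PySpark (Python 3) form, with p_rdd and c_rdd being RDDs:
# p_rdd.union(c_rdd.subtract(p_rdd.map(lambda xy: xy[0])).map(lambda x: (x, None)))
def some_values_from(c_rdd, p_rdd):
    # P_rdd.map(lambda (x, y): x): subjects that already have a P-successor
    subjects = {x for (x, _) in p_rdd}
    # C_rdd.subtract(...).map(lambda x: (x, null)): members of C lacking one
    missing = {(x, None) for x in c_rdd - subjects}
    return p_rdd | missing  # P_rdd.union(...)

takes_course_rdd = some_values_from(student_rdd, takes_course_rdd)
print(sorted(takes_course_rdd, key=lambda t: t[0]))
# [('Ann', 'Maths'), ('John', None), ('Tom', None)]
```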

  14. The advantage of using Spark (1)
     Spark caches RDDs in distributed memory as much as possible, which:
     ◮ reduces the need to write/read intermediate results to/from disk;
     ◮ reduces I/O overhead;
     ◮ suits iterative computation (e.g. computing a transitive closure).

  15. Data caching in distributed memory
     Iterative computation:
     ◮ TransitiveProperty P (P ∘ P ⊑ P), e.g. subOrganisationOf ∘ subOrganisationOf ⊑ subOrganisationOf (7)
     ◮ Entailment rule R_{P∘P⊑P}: if P_rdd(x, y), P_rdd(y, z) then P_rdd(x, z)
     ◮ Spark programme P_{P∘P⊑P}:
        while True do
            P_tmp = P_rdd.map(lambda (x_p, y_p): (y_p, x_p)).join(P_rdd).map(lambda (y_k, (x_p, z_p)): (x_p, z_p)).subtract(P_rdd)
            if P_tmp.isEmpty() then break
            P_rdd = P_rdd.union(P_tmp)
        end

  16. Data caching in distributed memory
     Iterative computation:
     ◮ TransitiveProperty P (P ∘ P ⊑ P), e.g. subOrganisationOf ∘ subOrganisationOf ⊑ subOrganisationOf (7)
     ◮ Entailment rule R_{P∘P⊑P}: if P_rdd(x, y), P_rdd(y, z) then P_rdd(x, z)
     ◮ Spark programme P_{P∘P⊑P}:
        while True do
            P_tmp = P_rdd.map(lambda (x_p, y_p): (y_p, x_p)).join(P_rdd).map(lambda (y_k, (x_p, z_p)): (x_p, z_p)).subtract(P_rdd)
            P_tmp.cache()
            if P_tmp.isEmpty() then break
            P_rdd = P_rdd.union(P_tmp)
        end
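The loop can be simulated in plain Python to show when it stops. In this sketch, sets replace RDDs, a dictionary keyed on the join variable plays the role of join, and the organisation names are hypothetical. Only pairs not already in P_rdd count as new; without that step P_tmp would stay non-empty as soon as one two-step chain exists, and the loop would never break.

```python
def transitive_closure(p_rdd):
    """Iterate P ∘ P ⊑ P until no new pairs appear (sets stand in for RDDs)."""
    p_rdd = set(p_rdd)
    while True:
        # Index pairs (y, z) by y, so (x, y) can be joined with (y, z) on key y.
        by_subject = {}
        for (y, z) in p_rdd:
            by_subject.setdefault(y, set()).add(z)
        derived = {(x, z) for (x, y) in p_rdd for z in by_subject.get(y, ())}
        p_tmp = derived - p_rdd  # keep only genuinely new facts
        if not p_tmp:            # fixpoint: nothing new was derived
            return p_rdd
        p_rdd |= p_tmp           # P_rdd = P_rdd.union(P_tmp)

sub_org = {("dept1", "univ1"), ("group1", "dept1"), ("lab1", "group1")}
print(sorted(transitive_closure(sub_org)))
```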

  17. Data caching in distributed memory
     ◮ GraduateStudent_rdd is used by three jobs:
        ◮ job a: R_{GraduateStudent⊑Person}, producing Person_rdd
        ◮ job b: R_{GraduateStudent⊑∃takesCourse.GraduateCourse}, producing takesCourse_rdd
        ◮ job c: R_{GraduateStudent⊑Student}, producing Student_rdd
     Figure: Caching GraduateStudent_rdd for Repeated Usage
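Why caching pays off here can be illustrated without Spark. LazyDataset below is a hypothetical stand-in that, like an uncached RDD, re-evaluates its lineage each time an action needs it; after cache() it keeps the first result. The three jobs are simplified versions of jobs a, b and c.

```python
class LazyDataset:
    """Hypothetical RDD stand-in: recomputed on every use unless cached."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self._use_cache = False
        self.computations = 0  # how often the lineage was evaluated

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.computations += 1
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result

def load_graduate_students():
    # Stands in for reading the GraduateStudent partition from storage.
    return {"Ann", "Bob"}

def run_jobs(grad):
    person = set(grad.collect())                 # job a: GraduateStudent ⊑ Person
    takes = {(x, None) for x in grad.collect()}  # job b (simplified): ⊑ ∃takesCourse.GraduateCourse
    student = set(grad.collect())                # job c: GraduateStudent ⊑ Student
    return person, takes, student

uncached = LazyDataset(load_graduate_students)
run_jobs(uncached)
cached = LazyDataset(load_graduate_students).cache()
run_jobs(cached)
print(uncached.computations, cached.computations)  # 3 1
```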

  18. The advantage of using Spark (2)
     More flexible job scheduling than Hadoop:
     Figure: Job Scheduling in Hadoop (left) and Spark (right)

  19. DAG for parallelising reasoning
     Consider Person ⊓ ∃takesCourse.Course ⊑ Student:
     ◮ R_{Person⊓∃takesCourse.Course⊑Student}: if Person_rdd(x), takesCourse_rdd(x, y), Course_rdd(y) then Student_rdd(x)
     ◮ P_{Person⊓∃takesCourse.Course⊑Student}:
        Student_tmp1 = takesCourse_rdd.map(lambda (x_t, y_t): (y_t, x_t)).join(Course_rdd.map(lambda y_c: (y_c, y_c))).map(lambda (y_k, (x_t, y_c)): x_t)
        Student_tmp2 = Student_tmp1.intersection(Person_rdd)
        Student_rdd = Student_rdd.union(Student_tmp2)
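The same three steps on plain Python sets, as a minimal sketch with hypothetical data: the equi-join on the course identifier becomes a membership test, followed by the intersection with Person and the union into Student.

```python
# Hypothetical contents; sets stand in for RDDs.
person_rdd = {"John", "Mary", "Lewis"}
course_rdd = {"c1", "c2"}
takes_course_rdd = {("John", "c1"), ("Mary", "x9"), ("Tom", "c2")}
student_rdd = set()

# Student_tmp1: everyone taking something that is a Course
# (the join on the course id y reduces to a membership test here).
student_tmp1 = {x for (x, y) in takes_course_rdd if y in course_rdd}
# Student_tmp2: of those, keep only members of Person (Tom is dropped).
student_tmp2 = student_tmp1 & person_rdd
# Student_rdd = Student_rdd.union(Student_tmp2)
student_rdd |= student_tmp2
print(sorted(student_rdd))  # ['John']
```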

  20. DAG for parallelising reasoning
     ◮ job a: R_{Student⊑Person} and R_{GraduateStudent⊑Person}, producing Person_rdd
     ◮ job b: R_{Student⊑∃takesCourse.Course}, producing takesCourse_rdd
     ◮ job c: R_{GraduateCourse⊑Course}, producing Course_rdd
     ◮ job d: R_{Person⊓∃takesCourse.Course⊑Student}, producing Student_rdd
     Figure: DAG Scheduling for R_{Person⊓∃takesCourse.Course⊑Student}

  21. Optimising programme execution order
     Executing job a, job b and job c before job d is the best order: job d reads Person_rdd, takesCourse_rdd and Course_rdd, which jobs a, b and c produce, so running job d first would miss their new facts and require it to be executed again.
     Figure: DAG Scheduling for R_{Person⊓∃takesCourse.Course⊑Student}
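A small simulation illustrates the effect of order. The four functions below are simplified, hypothetical versions of jobs a to d acting on sets instead of RDDs; run() repeats full rounds until nothing changes. With the initial data chosen here, putting job d last reaches the fixpoint in two rounds, whereas running it first needs a third.

```python
def make_state():
    # Hypothetical initial data; sets stand in for RDDs.
    return {
        "Student": {"s1"}, "GraduateStudent": set(), "Person": {"p1"},
        "takesCourse": {("p1", "gc1")}, "GraduateCourse": {"gc1"}, "Course": set(),
    }

def job_a(s):  # R_{Student ⊑ Person}, R_{GraduateStudent ⊑ Person}
    return "Person", s["Person"] | s["Student"] | s["GraduateStudent"]

def job_b(s):  # R_{Student ⊑ ∃takesCourse.Course}, None as null
    has_course = {x for (x, _) in s["takesCourse"]}
    return "takesCourse", s["takesCourse"] | {(x, None) for x in s["Student"] - has_course}

def job_c(s):  # R_{GraduateCourse ⊑ Course}
    return "Course", s["Course"] | s["GraduateCourse"]

def job_d(s):  # R_{Person ⊓ ∃takesCourse.Course ⊑ Student}
    new = {x for (x, y) in s["takesCourse"] if y in s["Course"] and x in s["Person"]}
    return "Student", s["Student"] | new

def run(order):
    """Execute the jobs in `order`, round after round, until a round changes nothing."""
    s, rounds = make_state(), 0
    while True:
        rounds += 1
        changed = False
        for job in order:
            name, result = job(s)
            if result != s[name]:
                s[name], changed = result, True
        if not changed:
            return s, rounds

state1, r1 = run([job_a, job_b, job_c, job_d])  # job d after a, b and c
state2, r2 = run([job_d, job_a, job_b, job_c])  # job d first
print(r1, r2)  # 2 3
```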

  22. Ordering Spark Programmes
     Consider P1 ⊑ P2, P2 ∘ P2 ⊑ P2 and P2 ⊑ P3:
     Figure: Acyclic property hierarchy
     What if we add the axiom P3 ≡ P1⁻ (P3 is the inverse of P1)?
     Figure: Cyclic property hierarchy
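One way to see the difference between the two hierarchies is to treat "the programme for X reads Y_rdd" as a dependency edge and try to sort the graph topologically with Python's graphlib. The first hierarchy sorts; adding P3 ≡ P1⁻ makes P1 and P3 depend on each other, so no single order exists. A plausible, assumed strategy for such a cycle is to rerun the programmes involved until a fixpoint, as fixpoint() does below; the slides do not say which strategy SPOWL uses.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# "X: {Y}" means X's programme reads Y_rdd (hypothetical encoding).
# P2 ∘ P2 ⊑ P2 only reads P2 itself; it is left out because a self-loop
# would count as a cycle, and it is handled inside P2's own iteration.
acyclic = {"P2": {"P1"}, "P3": {"P2"}}  # P1 ⊑ P2, P2 ⊑ P3
acyclic_order = list(TopologicalSorter(acyclic).static_order())
print(acyclic_order)  # ['P1', 'P2', 'P3']

# Adding P3 ≡ P1⁻: P1⁻ ⊑ P3 (P3 reads P1) and P3⁻ ⊑ P1 (P1 reads P3).
cyclic = {"P2": {"P1"}, "P3": {"P2", "P1"}, "P1": {"P3"}}
try:
    list(TopologicalSorter(cyclic).static_order())  # iterating triggers the check
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True

def fixpoint(p1_facts):
    """Rerun all programmes of the cyclic hierarchy until no set grows."""
    p1, p2, p3 = set(p1_facts), set(), set()
    rounds = 0
    while True:
        rounds += 1
        before = (len(p1), len(p2), len(p3))
        p2 |= p1                                                      # P1 ⊑ P2
        p2 |= {(x, z) for (x, y) in p2 for (y2, z) in p2 if y == y2}  # one P2 ∘ P2 ⊑ P2 step
        p3 |= p2                                                      # P2 ⊑ P3
        p3 |= {(y, x) for (x, y) in p1}                               # P1⁻ ⊑ P3
        p1 |= {(y, x) for (x, y) in p3}                               # P3⁻ ⊑ P1
        # Sets only grow, so unchanged sizes mean nothing new was derived.
        if (len(p1), len(p2), len(p3)) == before:
            return p1, p2, p3, rounds

p1, p2, p3, rounds = fixpoint({("a", "b")})
print(sorted(p1), rounds)
```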

  23. Table of Contents, current section: Evaluation

  24. Evaluating SPOWL's reasoning materialisation
     ◮ Evaluation environment:
        ◮ A cluster of 9 machines in a private cloud environment.
        ◮ Each node: 2.5 GHz CPU with 4 cores, and 16 GB of memory.
     ◮ Benchmarking dataset: LUBM
        ◮ LUBM-2000: about 270 million A-Box facts, 44 GB in size.
     ◮ Comparison system: WebPIE
        ◮ Uses MapReduce as the computation framework.
        ◮ Does not use tableaux reasoners.
        ◮ Does not partition the reasoning materialisation.
        ◮ Compresses data before reasoning materialisation.

  25. Performance of reasoning materialisation
     Reasoning materialisation by SPOWL:
     Dataset      Initial Load   Reasoning   Total Time
     LUBM-400     9m08s          10m19s      19m27s
     LUBM-800     20m30s         16m28s      36m58s
     LUBM-1200    27m50s         33m20s      1h01m10s
     LUBM-1600    41m20s         38m58s      1h20m18s
     LUBM-2000    54m10s         58m08s      1h52m18s
     Figure: Time (hh:mm:ss) per LUBM dataset, split into Initial Load and Type Inference
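The table can be checked for internal consistency, and turned into a rough end-to-end rate, with a short script. The parsing helper is hypothetical; the only inputs are the timings above and the figure of about 270 million A-Box facts for LUBM-2000 from slide 24, so the resulting rate of roughly 40,000 facts per second (load plus reasoning) is an approximation.

```python
import re

def seconds(t):
    """Parse durations such as '9m08s' or '1h01m10s' into seconds."""
    h, m, s = re.fullmatch(r"(?:(\d+)h)?(\d+)m(\d+)s", t).groups()
    return int(h or 0) * 3600 + int(m) * 60 + int(s)

# (initial load, reasoning, total) as reported on the slide
results = {
    "LUBM-400": ("9m08s", "10m19s", "19m27s"),
    "LUBM-800": ("20m30s", "16m28s", "36m58s"),
    "LUBM-1200": ("27m50s", "33m20s", "1h01m10s"),
    "LUBM-1600": ("41m20s", "38m58s", "1h20m18s"),
    "LUBM-2000": ("54m10s", "58m08s", "1h52m18s"),
}

# Each total should equal initial load + reasoning.
for name, (load, reason, total) in results.items():
    assert seconds(load) + seconds(reason) == seconds(total), name

# LUBM-2000 has about 270 million A-Box facts (slide 24).
total_2000 = seconds(results["LUBM-2000"][2])
print(total_2000, round(270e6 / total_2000))  # 6738 40071
```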
