Querying Semantic Big Data and Its Applications
Boris Motik
University of Oxford
November 16, 2015

Table of Contents

1. Big Data Applications of Semantic Formalisms
2. RDFox: Parallel Materialisation-Based Datalog Reasoner
3. Answering Queries in OWL 2 EL
4. Answering Queries in OWL 2 DL
5. Research Directions

Big Data Applications of Semantic Formalisms

Application: Search in Tourism (SkyScanner)

- Goal: search for hotels/flights/trips using natural language
- Need to represent large amounts of heterogeneous data
- Query for accommodation should include hotels, B&Bs, ...

Application: Context-Aware Mobile Services (Samsung)

- Use sensors (WiFi, GPS, ...) to identify the context
  - E.g., ‘at home’, ‘in a shop’, ‘with a friend’, ...
- Adapt behaviour depending on the context
  - ‘If with a friend who has a birthday, remind to congratulate’
- Declaratively describe contexts and adaptations
  - E.g., ‘If can see home WiFi, then context is “at home”’
- Interpret all rules in real time using reasoning
- Main benefit: declarative, rather than procedural

Data Analysis in Healthcare (Kaiser Permanente)

- HEDIS (Healthcare Effectiveness Data and Information Set) is a performance measure specification issued by NCQA (National Committee for Quality Assurance)
  - E.g., all diabetic patients must have annual eye exams
- Meeting HEDIS standards is a requirement for government-funded healthcare (Medicare)
- Checking/reporting is difficult and costly
  - Complex specifications & annual revisions
  - Disparate data sources
  - Ad hoc schemas including implicit information
⇒ Our solution: specify reporting rules declaratively (in datalog)
  - Easier creation, debugging, and maintenance

Information Integration in Gas & Oil (Statoil)

- Geologists & geophysicists use data from previous operations in nearby locations to develop stratigraphic models of unexplored areas
  - TBs of relational data
  - Diverse schemata
  - Spread over 1,000s of tables and multiple databases
- Data Access
  - 900 geologists & geophysicists
  - 30–70% of time spent on data gathering
  - Four-day turnaround for new queries
- Data Exploitation
  - Better use of experts' time
  - Data analysis is the ‘most important factor’ for drilling success

Common Problem: Query Answering

OWL 2 DL: Language for Ontology Modelling
- Each ontology can be normalised to disjunctive existential rules:
  $\forall \vec{x}\, \forall \vec{z}.\, [\varphi(\vec{x},\vec{z}) \rightarrow \exists \vec{y}_1.\psi_1(\vec{x},\vec{y}_1) \vee \dots \vee \exists \vec{y}_n.\psi_n(\vec{x},\vec{y}_n)]$
- $\varphi$ and $\psi_i$ are conjunctions of atoms
- Predicates are unary (i.e., concepts), binary (i.e., roles), or $\approx$
- Various structural restrictions ensure decidability

Conjunctive Query Answering
- Conjunctive queries: $Q(\vec{x}) \equiv \exists \vec{y}.\varphi(\vec{x},\vec{y})$
- Query answering: find all ground $\tau$ such that $O \models Q(\vec{x})\tau$

OWL 2 DL Fragments
- OWL 2 RL: finite domain ⇒ datalog query answering
- OWL 2 EL: polynomial subsumption (i.e., checking $O \models \forall x.[A(x) \rightarrow B(x)]$)
- OWL 2 QL: data complexity of query answering in $\mathrm{AC}^0$
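A small worked instance may help make the definitions concrete. Everything in it is an assumption made for illustration: the predicates Hotel, BnB, Accommodation and locatedIn and the constants h1 and b1 are hypothetical names chosen to echo the tourism application, and none of them come from the talk.

```latex
% Hypothetical ontology O, written in the rule form given above
% (each rule has a single disjunct; the third has an existential head).
\begin{align*}
  &\forall x.\,[\mathit{Hotel}(x) \rightarrow \mathit{Accommodation}(x)] \\
  &\forall x.\,[\mathit{BnB}(x) \rightarrow \mathit{Accommodation}(x)] \\
  &\forall x.\,[\mathit{Accommodation}(x) \rightarrow \exists y.\,\mathit{locatedIn}(x,y)]
\end{align*}
% Hypothetical facts, also part of O:
%   Hotel(h1), BnB(b1)
% Conjunctive query:
\[
  Q(x) \equiv \exists y.\,[\mathit{Accommodation}(x) \wedge \mathit{locatedIn}(x,y)]
\]
% Answers: the ground substitutions tau with O |= Q(x) tau are
\[
  \tau_1 = \{x \mapsto h1\}, \qquad \tau_2 = \{x \mapsto b1\}
\]
```

Neither answer is stated explicitly in the facts: both follow only through the rules, and the witness for y is never named. Query answering under an ontology therefore differs from plain database lookup.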

RDFox: Parallel Materialisation-Based Datalog Reasoner

Goals of RDFox

- Develop techniques for materialisation of datalog programs on RDF data in main-memory, multicore systems
- Implemented in the RDFox system: http://www.cs.ox.ac.uk/isg/tools/RDFox/
- Current trends in databases and knowledge-based systems:
  - Price of RAM keeps falling
    - 128 GB is routine, systems with 1 TB are emerging
  - In-memory databases: SAP’s HANA, Oracle’s TimesTen, YarcData’s Urika
  - Materialisation is computationally intensive ⇒ natural to parallelise
    - Mid-range laptops have 4 cores, servers with 16 cores are routine

References:
- B. Motik, Y. Nenov, R. Piro, I. Horrocks, D. Olteanu: Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. AAAI 2014
- B. Motik, Y. Nenov, R. Piro, I. Horrocks: Handling owl:sameAs via Rewriting. AAAI 2015

Existing Approaches to Parallel Materialisation

- Interquery parallelism: run independent rules in parallel
  - Degree of parallelism limited by the number of independent rules
  ⇒ Does not distribute workload to cores evenly
- Intraquery parallelism
  - Partition rule instantiations to N threads
  - E.g., constrain the body of rules evaluated by thread i to (x mod N = i)
  ⇒ Static partitioning may not distribute workload well due to data skew
  ⇒ Dynamic partitioning may incur an overhead due to load balancing
- Parallelise join computation
  - Hash-partition data into blocks, compute the join for each block independently
  ⇒ Hash tables are constantly recomputed
  - Sort-merge join requires constant data reordering
- Goal: distribute workload to threads evenly and with minimum overhead
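The data-skew problem with static partitioning can be seen in a few lines. The sketch below is illustrative only: it assumes that resources are encoded as integer IDs, that each binding of x stands for one unit of work, and that `partition_load` is a hypothetical helper, not part of any system mentioned in the talk.

```python
from collections import Counter


def partition_load(x_bindings, n_threads):
    """Count the rule instantiations each thread receives when thread i
    handles exactly the bindings with x mod n_threads == i."""
    load = Counter()
    for x in x_bindings:
        load[x % n_threads] += 1
    return [load[i] for i in range(n_threads)]


# Uniform data: every resource ID occurs once as the binding of x.
uniform = list(range(1000))

# Skewed data (assumed): resource 4 is a "hub" that accounts for 900 bindings.
skewed = [4] * 900 + list(range(100))

uniform_load = partition_load(uniform, 4)
skewed_load = partition_load(skewed, 4)
print(uniform_load)  # [250, 250, 250, 250]
print(skewed_load)   # [925, 25, 25, 25]
```

With the skewed input, one thread ends up with almost all of the work while the other three sit idle, which is the uneven distribution the slide warns about.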

Interleaving Querying with Updates

- Efficient query evaluation requires indexes
  - Crucial for elimination of duplicate triples ⇒ ensures termination
  - Usually sorted (and clustered) to allow for merge joins
  - Hash indexes can also be used
- Individual (i.e., not bulk) index updates are inefficient
- Materialisation interleaves...
  - ...querying (during evaluation of rule bodies)
  - ...updates (during updates of derived facts)
⇒ Data storage should support indexes and efficient parallel updates
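Why duplicate elimination ensures termination can be shown on cyclic data. This is a minimal single-threaded sketch under stated assumptions: `FactStore` and `propagate` are hypothetical names, the rule A(x) ∧ R(x, y) → A(y) is hard-coded, and a Python set and dict stand in for the indexes. It does not reflect how RDFox stores data.

```python
from collections import deque


class FactStore:
    """Toy fact store: a hash set for duplicate checks plus a hash index on R."""

    def __init__(self):
        self._facts = set()       # every fact, used for duplicate elimination
        self._r_by_subject = {}   # hash index: x -> [y, ...] for each R(x, y)

    def add(self, fact):
        """Insert a fact; return True only if it was not already present."""
        if fact in self._facts:
            return False
        self._facts.add(fact)
        if fact[0] == "R":
            self._r_by_subject.setdefault(fact[1], []).append(fact[2])
        return True

    def r_successors(self, x):
        """Index lookup: all y such that R(x, y) is stored."""
        return list(self._r_by_subject.get(x, []))

    def facts(self):
        return set(self._facts)


def propagate(store, start):
    """Apply A(x) ∧ R(x, y) → A(y) starting from A(start).
    Index lookups (queries) are interleaved with inserts (updates)."""
    queue = deque()
    if store.add(("A", start)):
        queue.append(start)
    steps = 0
    while queue:
        x = queue.popleft()
        for y in store.r_successors(x):   # query during rule-body evaluation
            steps += 1
            if store.add(("A", y)):       # update; a duplicate is dropped here
                queue.append(y)
    return steps


store = FactStore()
for edge in [("R", "a", "b"), ("R", "b", "a")]:   # cyclic data
    store.add(edge)
steps = propagate(store, "a")
print(steps, sorted(store.facts()))
```

On the cycle a → b → a the second derivation of A(a) is rejected by `add`, so nothing new is queued and the loop stops. Without that check the same two facts would be rederived forever.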

Solution Part I: Algorithm

Facts, in table order: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g)
Rule: $A(x) \wedge R(x,y) \rightarrow A(y)$

For each fact:
1. Match the fact to all body atoms to obtain subqueries
2. Evaluate subqueries w.r.t. all previous facts
3. Add results to the table

Current subquery: (none)

Stepping through the facts in table order (⇒ marks the current fact):
- ⇒ R(a,b): current subquery A(a)
- ⇒ R(a,c): current subquery A(a)
- ⇒ R(b,d): current subquery A(b)
- ⇒ R(b,e): current subquery A(b)
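The three steps can be sketched as a single-threaded Python loop. This is a simplified illustration, not the RDFox implementation: `match`, `evaluate` and `materialise` are hypothetical helpers, variables are assumed to be strings starting with "?", and "all previous facts" is read as the table prefix up to and including the current fact.

```python
def match(atom, fact, subst):
    """Extend subst so that atom matches fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom[1]) != len(fact[1]):
        return None
    out = dict(subst)
    for term, value in zip(atom[1], fact[1]):
        if term.startswith("?"):                 # variable
            if out.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant mismatch
            return None
    return out


def evaluate(atoms, facts, subst):
    """Yield every substitution that matches all atoms against facts."""
    if not atoms:
        yield subst
        return
    for fact in facts:
        extended = match(atoms[0], fact, subst)
        if extended is not None:
            yield from evaluate(atoms[1:], facts, extended)


def materialise(facts, rules):
    """Process the table fact by fact, appending newly derived facts at the end."""
    table = list(dict.fromkeys(facts))           # ordered and duplicate-free
    seen = set(table)
    i = 0
    while i < len(table):
        fact = table[i]
        visible = table[: i + 1]                 # facts up to the current one
        for body, head in rules:
            for k, atom in enumerate(body):
                start = match(atom, fact, {})    # step 1: obtain a subquery
                if start is None:
                    continue
                rest = body[:k] + body[k + 1:]
                for subst in evaluate(rest, visible, start):   # step 2
                    derived = (head[0], tuple(subst.get(t, t) for t in head[1]))
                    if derived not in seen:      # step 3: add results to the table
                        seen.add(derived)
                        table.append(derived)
        i += 1
    return table


facts = [("R", ("a", "b")), ("R", ("a", "c")), ("R", ("b", "d")), ("R", ("b", "e")),
         ("A", ("a",)), ("R", ("c", "f")), ("R", ("c", "g"))]
rules = [([("A", ("?x",)), ("R", ("?x", "?y"))], ("A", ("?y",)))]

table = materialise(facts, rules)
derived_a = [args[0] for pred, args in table if pred == "A"]
print(derived_a)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

Because each subquery sees only facts at or before the current position, a pair of facts that fires the rule is joined when the later of the two is processed. For example, R(c,f) derives nothing when it is processed, since A(c) is not yet in the table; the join happens later, when A(c) itself becomes the current fact.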
