Search Computing November 7, 2011 Stefano Ceri ... and the SeCo - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Search Computing

November 7, 2011 Stefano Ceri ... and the SeCo Project Team

Adnan Abid, Mamoun Abu Helu, Davide Barbieri, Daniele Braga, Marco Brambilla, Alessandro Bozzon, Alessandro Campi, Davide Chicco, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Giorgio Ghisalberghi, Michael Grossniklaus, Davide Martinenghi, Marco Masseroli, Maristella Matera, Chiara Pasini, Silvia Quarteroni, Marco Tagliasacchi, Luca Tettamanti, Salvatore Vadacca, Serge Zagorac
slide-2
SLIDE 2
  • Prof. Stefano Ceri
Database Management

Motivating Examples – What Search Engines can’t do

  • “Where can I find a theater close to Union Square, San Francisco, showing a recent thriller movie, close to a good steak house?”

slide-3
SLIDE 3

Search For a Solution Using All Keywords

slide-4
SLIDE 4

Split the task, and search for theaters first

slide-5
SLIDE 5

But there’s no thriller!

  • Try another theater: Found! (The Next Three Days) close enough to Union Square...

slide-6
SLIDE 6

Independent search for steak house

slide-7
SLIDE 7

Done! Close enough! (data integration and ranking in the user’s brain)

slide-8
SLIDE 8

VISION

slide-9
SLIDE 9

The Search Computing Project

  • ERC-funded project
  • 5 years – Started in 2009, now at month 36
  • Build theories, methods and tools to support search-oriented multi-dimensional queries

– Given a multi-domain query
– Build global solutions by integrating data produced by search services
– Rank global solutions according to a global rank function and output results in rank order
– Support user-friendly interfaces for query definition and result browsing, which allow adding search domains while the search process proceeds and possibly changing the relative weight of each ranking
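The idea of building global solutions and ranking them with a global rank function can be illustrated with a small sketch. It is not the SeCo implementation: the function name `rank_combinations`, the weighted-sum rank function, and the sample data are all assumptions made for illustration.

```python
from itertools import product

def rank_combinations(result_lists, weights, k):
    """Combine one item per domain and order combinations by a weighted sum.

    result_lists: dict mapping a domain name to a list of {"name", "score"} items
    weights: dict mapping a domain name to its weight in the global rank (assumed form)
    """
    domains = list(result_lists)
    combos = []
    for items in product(*(result_lists[d] for d in domains)):
        score = sum(weights[d] * item["score"] for d, item in zip(domains, items))
        combos.append((score, tuple(item["name"] for item in items)))
    # Highest global score first.
    combos.sort(key=lambda c: c[0], reverse=True)
    return combos[:k]

# Made-up search results for two domains.
results = {
    "theatre": [{"name": "T1", "score": 0.9}, {"name": "T2", "score": 0.6}],
    "restaurant": [{"name": "R1", "score": 0.5}, {"name": "R2", "score": 0.8}],
}
top = rank_combinations(results, {"theatre": 0.5, "restaurant": 0.5}, k=2)
print(top[0][1])  # best combination under equal weights
```

Calling the function again with different weights corresponds to the user changing the relative weight of each ranking; a real engine would avoid enumerating the full cross product.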

slide-10
SLIDE 10

Search Computing = Search Service Composition

  • Searching the Web of Data requires demand-driven service composition
  • Composition abstractions should emphasize few elements: service invocations, fundamental operations, precedences, global constraints on execution
  • Data composition should be search-driven – producing few top results very fast

[Figure: pipe vs. parallel composition over four services: Trulia.com (real estate), Walkscore.com (walkability), Metro.net (public transit), LocalCensus.com (demographics). Annotations: “GOOD, 30 results, 10 calls” and “GOOD, 30 results, 5 seconds, 50 calls”]
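The difference between a pipe and a parallel composition of two services can be sketched as follows. This is a toy model under stated assumptions: the stub services, their data, and the way calls are counted are hypothetical and do not reproduce the figures quoted on the slide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stub data standing in for remote search services.
HOMES = [{"id": "h1", "zip": "90012"}, {"id": "h2", "zip": "90015"}, {"id": "h3", "zip": "90012"}]
WALK = {"90012": 88, "90015": 71}

def search_homes():
    return list(HOMES)

def walkability_for(zip_code):
    # Service that needs an input value (usable downstream in a pipe).
    return WALK[zip_code]

def walkability_all():
    # Service that can be called without input (usable in parallel).
    return dict(WALK)

def pipe_join():
    """Pipe: each result of the first service drives one call to the second."""
    calls = 1
    out = []
    for home in search_homes():
        out.append((home["id"], walkability_for(home["zip"])))
        calls += 1
    return out, calls

def parallel_join():
    """Parallel: both services are invoked independently, then joined locally."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        homes_future = pool.submit(search_homes)
        walk_future = pool.submit(walkability_all)
        homes, walk = homes_future.result(), walk_future.result()
    return [(h["id"], walk[h["zip"]]) for h in homes], 2

print(pipe_join())
print(parallel_join())
```

Both strategies return the same pairs here; they differ in how many service calls are issued and in whether the second service can start before the first one has answered.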
slide-11
SLIDE 11

Modular software view of search applications

  • New generation software for building focused search applications
  • Covering the functionalities of vertical search systems (e.g. “expedia”, “amazon”) on more focused application domains (e.g. localized real estate or leisure planning, sector-specific job market offers, support of biomed research, ...)
  • Should be easy-to-build, easy-to-query, easy-to-maintain, easy-to-scale...

slide-12
SLIDE 12

TECHNOLOGICAL FRAMEWORK

slide-13
SLIDE 13

Search Computing architecture: overall view

[Figure: architecture diagram showing the main query flow and <Uses> relations. Components (most with a cache): Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine (operators OP 1, OP 2, ..., OP N), Result Transformation, Domain Framework, Domain Repository, WS-Framework, Service Repository, WS World. Data labels: High-Level Query, Sub-queries, Concrete Query Plan, Low-level queries, Merged Results, Final User Results]

Example:
  • High-level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”
  • Sub-query 1: “Where can I attend a DB scientific conference?”
  • Sub-query 2: “place close to a beautiful beach?”
  • Sub-query 3: “place reachable with cheap flight?”
  • Low-level query 1: ConfSearch(“DB”,placeX,dateY)
  • Low-level query 2: TourSearch(“Beach”,PlaceX)
  • Low-level query 3: Flight(“cost<200”,PlaceX,DateY)
  • Query plan: service invocations and operator execution
  • Presented results: ESWC-Crete-Olympic, CAISE-Hammamet-Alitalia, TOOLS-Malaga-EasyJet
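The conference example can be traced with a minimal sketch in which three stub functions stand in for the low-level queries and the shared place acts as the join attribute. The function names, the stub data behind them, and the omission of dates and of the cost constraint are simplifying assumptions; only the three result strings come from the slide.

```python
# Stub services standing in for ConfSearch, TourSearch and Flight (hypothetical data).
def conf_search(topic):
    data = {"DB": [("ESWC", "Crete"), ("CAISE", "Hammamet"), ("TOOLS", "Malaga")]}
    return data.get(topic, [])

def tour_search(feature, place):
    beaches = {"Crete", "Hammamet", "Malaga"}
    return feature == "Beach" and place in beaches

def flight(place):
    carriers = {"Crete": "Olympic", "Hammamet": "Alitalia", "Malaga": "EasyJet"}
    return carriers.get(place)

def answer(topic):
    """Run the three sub-queries in sequence and merge results on the place."""
    merged = []
    for conf, place in conf_search(topic):
        if not tour_search("Beach", place):
            continue  # place fails the beach sub-query
        carrier = flight(place)
        if carrier is not None:
            merged.append(f"{conf}-{place}-{carrier}")
    return merged

print(answer("DB"))
```

The sequential order used here is only one possible plan; choosing the invocation order and the join strategy is the job of the query planner.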
slide-14
SLIDE 14

Search Computing architecture: incremental prototyping

Prototype 1: Core behaviour of the system
  • Query engine
  • Domain repository
  • Service repository
  • Result presentation
Prototype 2: Vertical solutions
  • ER Domain description
  • Query planner
  • Application design tools
Prototype 3: Ontology-driven search
  • Ontological query interpretation
  • Ontological description & annotation of services
Prototype 4: NL or keyword queries

[Figure: the overall architecture diagram, here also showing an Admin Interface]
slide-15
SLIDE 15

LIQUID QUERY INTERFACE

slide-16
SLIDE 16

Liquid query definition

It consists of subsetting and parametrizing the resource graph...

[Figure: resource graph with entities Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News, Piece, ShoppingCenter, ...; highlighted subset: Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel. Legend: inputs, outputs; GR = global ranking]

slide-17
SLIDE 17

Liquid query definition

... And then characterizing the user interaction

[Figure: selected resource graph with entities Concert, Photo, Metro Station, Restaurant, News, Exhibition, Artist, Hotel; interaction label: Expand]

Plus:

  • Parametrization of global ranking
  • Data visualization options
  • ... and so on
slide-18
SLIDE 18

Exploration of the Service Space

Entity Selection

slide-19
SLIDE 19

Exploration of the Service Space

Entity Selection

Service Selection

slide-20
SLIDE 20

Exploration of the Service Space

Entity Selection Service Selection

Query !!

slide-21
SLIDE 21

Result Presentation

Tabular Representation

Ranking Bar Local

Order Filter Projection
slide-22
SLIDE 22

Result Presentation (Map)

slide-23
SLIDE 23

Exploration options from a given state


Related Entities

slide-24
SLIDE 24

Result Presentation (Atom View)


Association

Real Estate Service Doctor Service
slide-25
SLIDE 25

Result Visualization – Combinations on Maps

slide-26
SLIDE 26

SERVICE REGISTRATION

slide-27
SLIDE 27

Rationale of Service Registration

  • Providing a “Semantic Resource Framework” (SRF) where concepts of the real world are mapped to entities and interconnected by relationships
  • Along the idea of the “web of objects” instead of the “web of pages”

[Figure: resource graph with entities Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News, Piece, ShoppingCenter, ...]

slide-28
SLIDE 28

Behind the scenes...

slide-29
SLIDE 29

Service Framework in SeCo

  • A Service Description Framework (SDF) coupled with a Semantic Annotation Framework

Service Description Framework:
  • Service Mart (conceptual representation): group services by core entities
  • Access Pattern (logical representation): i/o fields and transitions as domain entities/relationships
  • Service Interface (physical representation): as shipped by data provider

Semantic Annotation Framework:
  • Domain Diagram: entities/relationships “mentioned” in SDF
  • Reference KB: capturing of service semantics via Knowledge Base lookup
slide-30
SLIDE 30

Semantic Framework: Domain Diagram and Access Patterns

[Figure: domain diagram. Domain concepts: Actor, Theatre, Movie, Film_Director, Prize, Current Position, Current Date. Access patterns: TheaterByFLD, TheatrebyMovie, ActorByTitle, PrizeByDirector, MovieByTitle]
slide-31
SLIDE 31

SECO ENGINE

slide-32
SLIDE 32

The Query Processor in the Big Picture

slide-33
SLIDE 33

The Query Processor in the Big Picture

Query Processor Workbench testing tool
slide-34
SLIDE 34

The Query Processor in the Big Picture

[Figure: Query Processor with Workbench testing tool. Labels: SeCoQL, Logical Level, Physical Level (Panta Rhei); example plan nodes: IN, Restaurant, Movie, Theatre, OUT]
slide-35
SLIDE 35

SeCoQL

An old drama movie showing tonight in a theatre close to a good restaurant

NightOut(Piccadilly, London, UK)

slide-36
SLIDE 36

First step: from conjunctive queries to logical plans

Generation of a logical plan

[Figure: logical plan with nodes IN, Restaurant, Movie, Theatre, OUT]
slide-37
SLIDE 37

Second step: from logical to physical query plans

Then the planner generates a physical, executable query plan, expressed in Panta Rhei

[Figure: physical plan with nodes IN, Movie, Theatre, Restaurant, OUT and annotations (1,10,R) and (1,1,T)]
slide-38
SLIDE 38

Workbench

[Figure: Workbench views: Query Lifecycle, Panta Rhei Plan, Query Execution Report]

slide-39
SLIDE 39

THEORY

slide-40
SLIDE 40

Proximity Rank Join (Martinenghi-Tagliasacchi, VLDB 2010)

[Figure: join of Hotels (Stars) with Restaurants (Rating)]

Too many results! Order by
  • individual scores
  • proximity from query
  • inter-object proximity
and return the top-K results
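The three ordering criteria can be combined in a brute-force sketch. It is not the algorithm of the cited paper, which avoids scanning all pairs: the aggregation formula, the weights `w_q` and `w_p`, and the sample data are assumptions chosen only to make the criteria concrete.

```python
import heapq
from itertools import product
from math import dist

def proximity_top_k(hotels, restaurants, query, k, w_q=1.0, w_p=1.0):
    """Score every hotel/restaurant pair and keep the k best (brute force).

    Assumed score: individual scores, minus the distance of each object
    from the query point, minus the distance between the two objects.
    """
    def score(h, r):
        return (h["score"] + r["score"]
                - w_q * (dist(h["pos"], query) + dist(r["pos"], query))
                - w_p * dist(h["pos"], r["pos"]))

    combos = ((score(h, r), h["name"], r["name"]) for h, r in product(hotels, restaurants))
    return heapq.nlargest(k, combos)

# Made-up data: H2 and R2 have the best individual scores but lie far from the query.
hotels = [{"name": "H1", "score": 4, "pos": (0, 0)}, {"name": "H2", "score": 5, "pos": (3, 4)}]
restaurants = [{"name": "R1", "score": 3, "pos": (0, 1)}, {"name": "R2", "score": 5, "pos": (6, 8)}]
best = proximity_top_k(hotels, restaurants, query=(0, 0), k=1)
print(best)
```

In this data the pair with the highest individual scores is not the winner, because both proximity terms penalise it.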
slide-41
SLIDE 41

Ranking with uncertain scoring functions (Soliman, Ilyas, Martinenghi, Tagliasacchi, SIGMOD 2011)

ID   Rating  Stars
τ1   2       6
τ2   7       5
τ3   4       7
τ4   5       2

(wR, wH)     Resulting order
(0.1, 0.9)   τ3 τ1 τ2 τ4
(0.7, 0.3)   τ2 τ3 τ4 τ1
(0.4, 0.6)   τ2 τ3 τ1 τ4

  • How to select the right weights?

SELECT *
FROM Restaurants R, Hotels H
WHERE R.Location = H.Location
ORDER BY wR · R.Rating + wH · H.Stars
LIMIT K
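The three orderings on this slide can be checked with a few lines of code. The sketch below uses the Rating and Stars values from the slide and expresses the weights in tenths as integers (an assumption made here to avoid floating-point rounding); the names `TUPLES` and `ranking` are illustrative.

```python
# (Rating, Stars) for the four joined tuples shown on the slide.
TUPLES = {"t1": (2, 6), "t2": (7, 5), "t3": (4, 7), "t4": (5, 2)}

def ranking(w_r, w_h):
    """Order tuple ids by w_r * Rating + w_h * Stars, highest score first.

    Weights are given in tenths (for example 1 and 9 for 0.1 and 0.9),
    so all scores are exact integers.
    """
    scored = {tid: w_r * rating + w_h * stars for tid, (rating, stars) in TUPLES.items()}
    # sorted() is stable, so tied tuples keep their listing order.
    return sorted(scored, key=lambda tid: scored[tid], reverse=True)

for weights in [(1, 9), (7, 3), (4, 6)]:
    print(weights, ranking(*weights))
```

With weights (0.4, 0.6) the tuples τ2 and τ3 obtain the same score, so their relative order depends on tie-breaking; small changes in the weights flip the ranking, which is the uncertainty the slide points at.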
slide-42
SLIDE 42

CURRENT STANDING

slide-43
SLIDE 43

Results after 36 months

  • Concepts
– Service marts, join methods, Panta Rhei, liquid query
  • Research results
– LNCS: Search Computing Challenges and Directions (2010)
– LNCS: New Trends in Search Computing (2011)
– Publications (incl. VLDB, WWW, SIGMOD, TODS)
– US Patent filed (top-k method, random & sequential services)
  • Site: www.search-computing.eu
  • Blog: http://blog.search-computing.com/
  • Temporary research positions (3 PhD, 5 post-MS, 3 post-doc)
slide-44
SLIDE 44

Accesses to Web Site & Blog (2010)

Visits: 20% USA, 18% Italy, 6% UK, 4% India, 4% Canada
slide-45
SLIDE 45

Accesses to Web Site & Blog (2011)

Visits: 27% USA, 10% India, 8% Italy, 4% Germany – UK – France
slide-46
SLIDE 46

Events in 2011

  • Functionality Demo at WWW 2011 (Hyderabad)
  • Engine Demo at ACM SIGMOD (Athens)
slide-47
SLIDE 47

Events in 2011-12

  • Workshops:
– ExploreWeb (Brambilla, Fraternali, Schwabe) http://exploreweb.search-computing.org/
– DBRank Workshop (Chakrabarti, Martinenghi)
– Very Large Data Search (VLDS) (Brambilla, Casati, Ceri) http://vlds2011.search-computing.net/
– Ordering and Reasoning (OrdRing) (Bozzon, Della Valle, Horrocks) http://ordring2011.search-computing.org/
– DataView 2011 (Bozzon, Comai, Norrie) http://dataview.como.polimi.it/2011/
  • Third LNCS Book?
  • Planned VLDB Journal special issue on “Web search over structured information and crowds” (tentative title; Brambilla/Ceri/Halevy, Sept. 2012)
slide-48
SLIDE 48

Future Research Directions

  • NLP or keyword-based queries
– focus on subquery partitioning & mapping to domains
  • Social dimension
– crowd-searching using social platforms
  • Verticals
– joint work with Diadem (Gottlob) on London real-estate
– regional platforms for “quality of life”
slide-49
SLIDE 49

And finally....

Questions?