BANKS: Browsing and Keyword Search in Relational Databases



SLIDE 1

BANKS: Browsing and Keyword Search in Relational Databases

B. Aditya, Gaurav Bhalotia, Soumen Chakrabarti, Arvind Hulgeri, Charuta Nakhe, Parag, S. Sudarshan
IIT Bombay
http://www.cse.iitb.ac.in/banks/

SLIDE 2

Aug 2002 VLDB 2002 DEMO 2

Motivation

§ Web search engines are very successful
  • Simple and intuitive keyword query interface

§ Database querying using keywords is desirable
  • Query languages, e.g., SQL/QBE, are not appropriate for casual users
  • Form interfaces are cumbersome and give limited views

§ Examples of keyword queries on databases
  • E-store database: “camcorder panasonic”
  • Book store: “sudarshan databases”

§ Differences from IR/Web Search
  • Normalization splits related data across multiple tuples
  • Answer to a query is a set of (closely) connected tuples that match all given keywords

SLIDE 3


Basic Model

§ Database: modeled as a graph
  • Nodes = tuples
  • Edges = references between tuples
    – Foreign key, inclusion dependencies, etc.
    – Edges are directed

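[Figure: example data graph with paper tuples “MultiQuery Optimization” and “BANKS: Keyword search…”, author tuples S. Sudarshan, Prasan Roy and Charuta, and “writes” tuples linking authors to papers through “author” and “paper” edges]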
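A minimal sketch of the basic model in Python. It assumes a hypothetical three-table schema (`author`, `paper`, `writes`) and uses hypothetical helper names `build_graph` and `keyword_nodes`. Each tuple becomes a node, and each foreign-key reference becomes a directed edge from the referencing tuple to the referenced one. Keyword matching here is a plain substring test, not the system's actual indexing.

```python
from collections import defaultdict

# Hypothetical toy schema in the spirit of the DBLP demo:
#   author(aid, name), paper(pid, title), writes(wid, aid, pid)
# with foreign keys writes.aid -> author and writes.pid -> paper.
AUTHORS = {"a1": "S. Sudarshan", "a2": "Prasan Roy"}
PAPERS = {"p1": "MultiQuery Optimization"}
WRITES = [("w1", "a1", "p1"), ("w2", "a2", "p1")]


def build_graph(authors, papers, writes):
    """Model the database as a graph: one node per tuple, one directed
    edge per foreign-key reference (referencing tuple -> referenced tuple)."""
    nodes = {("author", aid): name for aid, name in authors.items()}
    nodes.update({("paper", pid): title for pid, title in papers.items()})
    edges = defaultdict(list)
    for wid, aid, pid in writes:
        w = ("writes", wid)
        nodes[w] = ""  # relationship tuple: no searchable text in this toy example
        edges[w].append(("author", aid))  # foreign key writes.aid
        edges[w].append(("paper", pid))   # foreign key writes.pid
    return nodes, dict(edges)


def keyword_nodes(nodes, keyword):
    """Tuples whose text contains the keyword (case-insensitive substring match)."""
    kw = keyword.lower()
    return {n for n, text in nodes.items() if kw in text.lower()}


nodes, edges = build_graph(AUTHORS, PAPERS, WRITES)
print(edges[("writes", "w1")])
print(keyword_nodes(nodes, "roy"))
```

The edges point from the relationship tuple (`writes`) to the entities it references, which is why answer trees will later need backward edges to connect two authors through a shared paper.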

SLIDE 4


Answer Model

§ Rooted, directed tree connecting keyword nodes
  • May include internal nodes that contain no keywords
  • Root node has special significance
    – May be restricted to relations representing entities
    – Avoid relations representing relationships, e.g., “writes”

§ An example: “sudarshan roy”

§ Multiple answers may exist
  • Ranked by proximity + prestige

[Figure: answer tree for “sudarshan roy”: paper “MultiQuery Optimization” connected through two “writes” tuples to authors S. Sudarshan and Prasan Roy]
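The answer-model conditions can be checked on a candidate tree as shown below. This is an illustrative sketch. The function `is_answer_tree`, the toy data, and the choice of `author` and `paper` as the entity relations are assumptions, not part of BANKS.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: node ids are (relation, key) pairs.
TEXT = {
    ("author", "a1"): "S. Sudarshan",
    ("author", "a2"): "Prasan Roy",
    ("paper", "p1"): "MultiQuery Optimization",
    ("writes", "w1"): "",
    ("writes", "w2"): "",
}
# Candidate answer for "sudarshan roy", rooted at the paper (edges are parent -> child).
TREE = [
    (("paper", "p1"), ("writes", "w1")),
    (("writes", "w1"), ("author", "a1")),
    (("paper", "p1"), ("writes", "w2")),
    (("writes", "w2"), ("author", "a2")),
]


def is_answer_tree(edges, root, keywords, text, entity_relations):
    """Check the answer-model conditions: rooted directed tree, root in an
    entity relation, every keyword matched by some node. Internal nodes are
    allowed to contain no keywords."""
    nodes, parents, children = {root}, Counter(), defaultdict(list)
    for parent, child in edges:
        nodes.update((parent, child))
        parents[child] += 1
        children[parent].append(child)
    # Root has no parent; every other node has exactly one.
    if parents[root] != 0 or any(parents[n] != 1 for n in nodes - {root}):
        return False
    # Every node must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    if seen != nodes:
        return False
    # Optional restriction: the root must come from a relation representing entities.
    if root[0] not in entity_relations:
        return False
    return all(any(kw.lower() in text.get(n, "").lower() for n in nodes) for kw in keywords)


print(is_answer_tree(TREE, ("paper", "p1"), ["sudarshan", "roy"], TEXT, {"author", "paper"}))
```

The same tree, re-rooted at a `writes` tuple, fails the entity-root check. This matches the slide's advice to avoid relations that represent relationships.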

SLIDE 5


Relevance Calculation

§ Proximity
  • Forward edges: foreign key → primary key
  • Weight of forward edge is based on schema
    – E.g., “cites” link weight greater than “writes” link weight
  • May need backward edges to form answer tree
    – Weight of backward edge: $w(u \to v) \propto \mathrm{indegree}(u)$

§ Node prestige based on indegree

§ Answer tree relevance
  • Edge score $E = 1 / \sum_{e} w(e)$, summed over the edges of the answer tree
  • Node score $N = \sum_{v} w(v)$, summed over the root and leaf nodes
    – Ignore weights of internal nodes
  • Normalize and combine using weighting factor $\lambda$
    – Additive: $(1-\lambda)E + \lambda N$; multiplicative: $E N^{\lambda}$
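The following sketch implements the relevance formulas above under several assumptions:

- A backward edge's weight is the forward edge's weight times the indegree of the backward edge's source.
- Prestige is indegree scaled by the maximum indegree.
- The node score sums root and leaf prestige and then divides by the number of those nodes.
- λ defaults to 0.2.

All function names, node names and constants are hypothetical. The slides fix only the shape of the formulas.

```python
from collections import Counter

# Hypothetical toy graph: forward edges follow foreign key -> primary key.
FORWARD = {
    "w1": [("a_sud", 1.0), ("mqo", 1.0)],  # writes -> author, writes -> paper
    "w2": [("a_roy", 1.0), ("mqo", 1.0)],
}


def indegrees(forward):
    """Number of forward edges pointing at each node."""
    counts = Counter()
    for outs in forward.values():
        for v, _ in outs:
            counts[v] += 1
    return counts


def with_backward_edges(forward, indeg):
    """Add a backward edge v -> u for every forward edge u -> v, with weight
    proportional to indegree(v), the backward edge's source. Assumption: the
    proportionality constant is the forward edge's own weight."""
    graph = {u: list(outs) for u, outs in forward.items()}
    for u, outs in forward.items():
        for v, w in outs:
            graph.setdefault(v, []).append((u, w * indeg[v]))
    return graph


def prestige(indeg):
    """Node prestige from indegree, scaled to [0, 1] (assumed normalization)."""
    top = max(indeg.values(), default=0)
    return lambda node: indeg[node] / top if top else 0.0


def tree_relevance(edge_weights, root, leaves, node_prestige, lam=0.2, mode="additive"):
    """E = 1 / sum(edge weights); N sums prestige over root and leaves only
    (internal nodes ignored), then divides by their count so N stays in [0, 1]
    (an assumed normalization). E and N are combined with weighting factor lam."""
    total = sum(edge_weights)
    e_score = 1.0 / total if total > 0 else 1.0  # single-node tree has no edges
    counted = {root, *leaves}
    n_score = sum(node_prestige(n) for n in counted) / len(counted)
    if mode == "additive":
        return (1 - lam) * e_score + lam * n_score
    if mode == "multiplicative":
        return e_score * n_score ** lam
    raise ValueError(f"unknown mode: {mode}")


indeg = indegrees(FORWARD)
graph = with_backward_edges(FORWARD, indeg)
# Answer tree for "sudarshan roy" rooted at the paper node "mqo".
tree = [("mqo", "w1"), ("w1", "a_sud"), ("mqo", "w2"), ("w2", "a_roy")]
weights = [dict(graph[u])[v] for u, v in tree]
score = tree_relevance(weights, "mqo", ["a_sud", "a_roy"], prestige(indeg))
print(weights, score)
```

The backward edges out of the paper are twice as heavy as the forward ones because two `writes` tuples reference it. Highly referenced nodes are therefore cheaper to reach along forward edges than to pass through backwards.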

SLIDE 6


Answer Trees

§ Anecdotal results
  • “Mohan”: C. Mohan at the top based on prestige (# of papers)
  • “Transaction”: Jim Gray’s classic paper and textbook at the top based on prestige (# of citations)
  • “Sunita Seltzer”: no common papers, but both have papers with Stonebraker; the system finds this connection

§ Backward expanding search algorithm
  • Start at leaf nodes, each containing a query keyword
  • Run concurrent single-source shortest path algorithms, one from each such node, traversing edges backwards
  • The confluence of backward paths identifies answer tree roots
  • Answer trees may not be generated in relevance order
    – Insert answers into a small buffer (heap) as they are generated
    – Output the highest-ranked answer from the buffer when the buffer is full
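Below is a compact Python sketch of backward expanding search. It runs one Dijkstra iterator per keyword node over reversed edges, interleaving all iterators through a single priority queue. A node reached by every keyword is a candidate root, and answers pass through a small heap buffer before output.

This sketch makes two assumptions. The function name and its parameters (`buffer_size`, `max_answers`, `allowed_roots`) are hypothetical. An answer's cost is simply the sum of its root-to-leaf path lengths rather than the full relevance score.

```python
import heapq
import itertools
from collections import defaultdict


def backward_expanding_search(edges, keyword_nodes, buffer_size=3, max_answers=10,
                              allowed_roots=None):
    """Sketch of backward expanding search.

    edges: dict u -> list of (v, weight) for directed edges u -> v
           (forward and backward edges of the data graph).
    keyword_nodes: list of sets; keyword_nodes[i] holds the nodes matching keyword i.
    Yields (cost, root, paths): paths maps keyword index -> node list from the
    root down to a keyword node. Cost is the sum of root-to-leaf path lengths,
    a simplification (a shared edge is counted once per path that uses it).
    """
    reverse = defaultdict(list)  # walk edges backwards: v -> u for each u -> v
    for u, outs in edges.items():
        for v, w in outs:
            reverse[v].append((u, w))

    # One Dijkstra iterator per keyword node: (keyword index, dist, parent).
    iterators, frontier = [], []
    for k, sources in enumerate(keyword_nodes):
        for s in sources:
            iterators.append((k, {s: 0.0}, {s: None}))
            heapq.heappush(frontier, (0.0, len(iterators) - 1, s))

    settled = set()
    reached = defaultdict(lambda: defaultdict(list))  # node -> keyword -> iterator ids
    buffer, tie, emitted = [], itertools.count(), 0

    def path_from(it_id, node):
        parent = iterators[it_id][2]
        path = [node]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    while frontier and emitted < max_answers:
        d, it_id, v = heapq.heappop(frontier)
        if (it_id, v) in settled:
            continue
        settled.add((it_id, v))
        k, dist, parent = iterators[it_id]
        reached[v][k].append(it_id)
        # Confluence: v is a root once paths from every keyword have reached it.
        if len(reached[v]) == len(keyword_nodes) and (allowed_roots is None or v in allowed_roots):
            choices = [[it_id] if j == k else reached[v][j] for j in range(len(keyword_nodes))]
            for combo in itertools.product(*choices):
                cost = sum(iterators[i][1][v] for i in combo)
                paths = {iterators[i][0]: path_from(i, v) for i in combo}
                heapq.heappush(buffer, (cost, next(tie), v, paths))
            # Answers arrive out of order; emit the cheapest once the buffer fills.
            while len(buffer) >= buffer_size and emitted < max_answers:
                cost, _, root, paths = heapq.heappop(buffer)
                emitted += 1
                yield cost, root, paths
        for u, w in reverse[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                parent[u] = v
                heapq.heappush(frontier, (nd, it_id, u))

    while buffer and emitted < max_answers:  # flush what is left in the buffer
        cost, _, root, paths = heapq.heappop(buffer)
        emitted += 1
        yield cost, root, paths


EDGES = {
    "w1": [("a_sud", 1.0), ("mqo", 1.0)],  # forward: writes -> author, writes -> paper
    "w2": [("a_roy", 1.0), ("mqo", 1.0)],
    "mqo": [("w1", 2.0), ("w2", 2.0)],     # backward edges, weight ~ indegree(mqo) = 2
    "a_sud": [("w1", 1.0)],
    "a_roy": [("w2", 1.0)],
}
for cost, root, paths in backward_expanding_search(EDGES, [{"a_sud"}, {"a_roy"}]):
    print(cost, root, paths)
```

On this toy graph, the paper node `mqo` is found as a root with cost 6. Without a root restriction, every other node (including the `writes` tuples) also becomes a root. Passing `allowed_roots={"mqo"}` keeps only the paper-rooted answer.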

SLIDE 7


The BANKS System

§ Available on the web, with (part of) DBLP data
  • http://www.cse.iitb.ac.in/banks/

§ No programming needed for customization
  • Minimal preprocessing to create indices and give weights to links

§ Provides keyword search coupled with extensive browsing features
  • Schema browsing + data browsing
  • Hyperlinks are automatically added to all displayed results
  • Browsing data by grouping and creating crosstabs
  • Graphical display of data: bar charts, pie charts, etc.

[Figure: BANKS architecture: the user connects over HTTP to a web server with servlets, which accesses the database via JDBC]
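The preprocessing step (creating indices and giving weights to links) might be sketched as follows. The inverted keyword index, the helper names and the numeric `LINK_WEIGHTS` table are assumptions for illustration. Only the ordering (“cites” heavier than “writes”) comes from the slides.

```python
import re
from collections import defaultdict

# Hypothetical schema-level link weights, assigned once during preprocessing.
# Only the ordering ("cites" heavier than "writes") comes from the slides.
LINK_WEIGHTS = {
    ("writes", "author"): 1.0,
    ("writes", "paper"): 1.0,
    ("cites", "paper"): 2.0,
}


def forward_weight(from_rel, to_rel, default=1.0):
    """Weight of a foreign-key edge, looked up by (referencing, referenced) relation."""
    return LINK_WEIGHTS.get((from_rel, to_rel), default)


def build_keyword_index(tuples):
    """Inverted index: lower-cased token -> set of tuple ids containing it."""
    index = defaultdict(set)
    for node, text in tuples.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(node)
    return index


def lookup(index, query):
    """Keyword nodes for each query term, e.g. to seed a backward search."""
    return [index.get(term, set()) for term in query.lower().split()]


index = build_keyword_index({
    ("author", "a1"): "S. Sudarshan",
    ("paper", "p1"): "MultiQuery Optimization",
})
print(lookup(index, "sudarshan optimization"))
```

Because weights are kept per pair of relations rather than per tuple, customizing them means editing a small table, which fits the claim that no programming is needed.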