Result Clustering for Keyword Search on Graphs - Madhulika Mohanty



slide-1
SLIDE 1

Result Clustering for Keyword Search on Graphs

Madhulika Mohanty

Supervisor: Dr Maya Ramanath

slide-2
SLIDE 2
  • Common data formats across the Web
  • Easily interpretable by machines → “Web of data”

slide-3
SLIDE 3

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

LINKED DATA

  • Collection of knowledge bases.
  • All the knowledge bases are interlinked.
  • Represented as RDF.
  • RDF: Resource Description Framework
  • Data model to represent structured data
  • Triples: <subject> <predicate> <object>
  • Example:

<Tom_Hanks> <ActedIn> <Cast_Away>
<Tom_Hanks> <ActedIn> <Forrest_Gump>

Figure: edge labeled ActedIn from Tom Hanks to Cast Away.
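A minimal sketch of how the two example triples could be held in memory and queried. The function names `build_index` and `objects_of` are illustrative assumptions, not part of RDF or of any library.

```python
from collections import defaultdict

# The two example triples from the slide, stored as (subject, predicate, object).
triples = [
    ("Tom_Hanks", "ActedIn", "Cast_Away"),
    ("Tom_Hanks", "ActedIn", "Forrest_Gump"),
]

def build_index(triples):
    """Index triples by subject so outgoing edges can be looked up directly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index

def objects_of(index, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]

index = build_index(triples)
print(objects_of(index, "Tom_Hanks", "ActedIn"))
```

Each triple is one labeled edge, so a set of triples is already a graph: subjects and objects are nodes, predicates are edge labels.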

slide-4
SLIDE 4

Sample YAGO graph1

1 http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/

slide-5
SLIDE 5

Querying graphs

  • SPARQL queries – structured queries
    – Structured results
    – e.g. graph databases like Neo4j
  • Natural Language queries → SPARQL → Structured results
  • Relationship queries – unstructured text

slide-6
SLIDE 6

Relationship queries

  • Unstructured text, like Google.
  • Answers are relationships among queried entities.
  • More popularly known as “Keyword Search”.
  • Why Keyword Search?

    – Make graphs query-able by casual users.
    – Find interesting relationships – even surprise discoveries.

slide-7
SLIDE 7

Jeff Weiner Mark Zuckerberg

slide-8
SLIDE 8

Jeff Weiner Mark Zuckerberg

I bet you know this..

slide-9
SLIDE 9

Jeff Weiner Mark Zuckerberg

Now that's interesting!!

slide-10
SLIDE 10

Figure nodes: Bill Gates; Nobel Prize winner Edwin G. Krebs; Mausam; 14th Dalai Lama.

Another interesting one..

slide-11
SLIDE 11

Figure nodes: Bill Gates; Nobel Prize winner Edwin G. Krebs; Mausam; 14th Dalai Lama. Edge labels: Honorary Doctorate, Honorary Doctorate, Faculty, Doctorate.

Another interesting one..

slide-12
SLIDE 12

Movie dataset graph

Figure nodes: Tom Hanks, Robin Wright, Daniel Craig, Rooney Mara, Forrest Gump, Cast Away, Larry Crowne, The Girl with the Dragon Tattoo, Casino Royale, Actor, Movie, 1994, 2000, 2006, 2011. Edge labels: ActedIn, IsA, InYear.

slide-13
SLIDE 13

Searching for 'Hanks Wright'

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

slide-14
SLIDE 14

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

slide-15
SLIDE 15

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

slide-16
SLIDE 16

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

slide-17
SLIDE 17

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

  • Results are trees.
slide-18
SLIDE 18

Figure: movie dataset graph (same nodes and edge labels as on Slide 12).

  • Results are trees.
  • Every pair of keyword nodes must be connected within the tree.
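The two conditions above can be illustrated with a small sketch. It is not the search algorithm of any particular system: it runs a breadth-first search from each keyword node, treats every node reached by all searches as a candidate root, and keeps the union of shortest paths when all of its leaves are keyword nodes. The edge list is an assumption loosely based on the labels of the movie graph, and the function names are hypothetical. The union of shortest paths is a tree for this data but is not guaranteed to be one in general.

```python
from collections import deque

# Assumed edges, loosely based on the labels in the movie dataset graph.
EDGES = [
    ("Tom Hanks", "ActedIn", "Forrest Gump"),
    ("Robin Wright", "ActedIn", "Forrest Gump"),
    ("Tom Hanks", "ActedIn", "Cast Away"),
    ("Tom Hanks", "IsA", "Actor"),
    ("Robin Wright", "IsA", "Actor"),
    ("Forrest Gump", "IsA", "Movie"),
    ("Cast Away", "IsA", "Movie"),
]

def neighbours(edges):
    """Undirected adjacency map; edge labels are ignored for connectivity."""
    adj = {}
    for subject, _, obj in edges:
        adj.setdefault(subject, set()).add(obj)
        adj.setdefault(obj, set()).add(subject)
    return adj

def bfs_parents(adj, start):
    """Breadth-first search recording each reached node's parent."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return parents

def path_to_start(parents, node):
    """Follow parent links from `node` back to the BFS start node."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path

def answer_trees(edges, keyword_nodes):
    """Distinct candidate answer trees (sets of undirected edges), smallest first."""
    adj = neighbours(edges)
    searches = [bfs_parents(adj, k) for k in keyword_nodes]
    found = set()
    for root in adj:
        if not all(root in parents for parents in searches):
            continue  # root is not connected to every keyword node
        tree = set()
        for parents in searches:
            path = path_to_start(parents, root)
            tree.update(frozenset(pair) for pair in zip(path, path[1:]))
        degree = {}
        for edge in tree:
            for node in edge:
                degree[node] = degree.get(node, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1}
        # Keep only candidates whose leaves are all keyword nodes.
        if tree and leaves <= set(keyword_nodes):
            found.add(frozenset(tree))
    return sorted(found, key=lambda t: (len(t), sorted(map(sorted, t))))

trees = answer_trees(EDGES, ["Tom Hanks", "Robin Wright"])
for tree in trees:
    print(len(tree), sorted(map(sorted, tree)))
```

For the query 'Hanks Wright', the smallest candidates connect the two actors through a shared movie or through the shared type Actor, which already hints at why results of different kinds show up side by side.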

slide-19
SLIDE 19

Keyword search in graph-structured data

Given a set of query keywords Q = {k1, k2, ..., kn} and a graph G = (V, E), find the top-K minimal answer trees A1, A2, ..., AK, ordered by their relevance score.
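The "top-K ordered by relevance" step can be sketched as follows. The scoring function is a stand-in assumption (smaller trees score higher), since the slides do not define the relevance score, and the node names `n1` to `n5` are placeholders.

```python
import heapq

def relevance(tree_edges):
    """Hypothetical score: fewer edges means a tighter, more relevant answer."""
    return 1.0 / (1 + len(tree_edges))

def top_k(trees, k, score=relevance):
    """Return the k highest-scoring answer trees, best first."""
    return heapq.nlargest(k, trees, key=score)

# Placeholder candidate trees, each a list of edges between placeholder nodes.
candidates = [
    [("n1", "n2"), ("n2", "n3"), ("n3", "n4")],
    [("n1", "n2")],
    [("n1", "n5"), ("n5", "n4")],
]

for tree in top_k(candidates, 2):
    print(round(relevance(tree), 3), tree)
```

Any other scoring function can be passed through the `score` parameter without changing the selection step.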

Query

slide-20
SLIDE 20

Query

Research Areas

slide-21
SLIDE 21

Query

Research Areas

Efficiency

slide-22
SLIDE 22

Query

Research Areas

Efficiency

  • Ranking of results
  • Quality of results
slide-23
SLIDE 23

Query

Research Areas

Efficiency

  • Ranking of results
  • Quality of results

User experience

slide-24
SLIDE 24

Query

Research Areas

Efficiency

  • Ranking of results
  • Quality of results

User experience

slide-25
SLIDE 25

Searching for 'Rekha Bachchan'

slide-26
SLIDE 26

18 such results

Searching for 'Rekha Bachchan'

slide-27
SLIDE 27

18 such results

Searching for 'Rekha Bachchan'

Different contexts

slide-28
SLIDE 28

User experience

  • All kinds of results shown.
  • Multiple results of same type, e.g. Amitabh and Rekha were co-actors in multiple movies.
    – Most of them ranked high.
    – User is forced to scroll through all before finding new answers.
  • Results with different contexts.
    – User might completely miss some information.

slide-29
SLIDE 29

User experience

  • All kinds of results shown.
  • Multiple results of same type, e.g. Amitabh and Rekha were co-actors in multiple movies.
    – Most of them ranked high.
    – User is forced to scroll through all before finding new answers.
  • Results with different contexts.
    – User might completely miss some information.
  • One way to deal with it – Clustering similar results.

slide-30
SLIDE 30

Result clustering

  • Cluster similar results together.
  • Rank the clusters.
  • Show one representative per cluster (Highest Ranked Tree).
    – User may click it and see all results.
  • Advantages:
    – Can be used with any existing Keyword Search algorithm.
    – Provides user with a bird's eye view over the results.
    – Easy to analyze interesting patterns.
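A minimal sketch of the presentation step described above, assuming the results already carry a relevance score and a cluster label from some earlier step. Ranking clusters by the score of their best member is an assumption made here for illustration; the slides do not say how clusters are ranked. The tree ids and cluster labels are placeholders.

```python
from collections import defaultdict

def summarize_clusters(results):
    """Group scored results by cluster and rank the clusters.

    `results` holds (tree_id, score, cluster_id) tuples. Each cluster is
    represented by its highest-scoring tree, and clusters are ordered by
    that representative's score (an assumed ranking rule).
    """
    clusters = defaultdict(list)
    for tree_id, score, cluster_id in results:
        clusters[cluster_id].append((score, tree_id))

    summary = []
    for cluster_id, members in clusters.items():
        members.sort(reverse=True)  # best score first
        best_score, representative = members[0]
        member_ids = [tree_id for _, tree_id in members]
        summary.append((best_score, cluster_id, representative, member_ids))

    summary.sort(key=lambda row: row[0], reverse=True)
    return [(cid, rep, ids) for _, cid, rep, ids in summary]

# Placeholder results: four trees in two clusters.
results = [
    ("t1", 0.90, "A"),
    ("t2", 0.80, "A"),
    ("t3", 0.70, "B"),
    ("t4", 0.85, "A"),
]

for cluster_id, representative, members in summarize_clusters(results):
    print(cluster_id, "->", representative, f"({len(members)} results)")
```

Because the function only consumes scores and cluster labels, it can sit behind any existing keyword search algorithm, as the first advantage states.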

slide-31
SLIDE 31

Result clustering (contd.)

Isomorphism based

  • Cluster isomorphic trees together.
  • Two trees need to have exact same structure to be clustered together.
  • Ends up generating too many clusters.

Tree Edit distance based

  • Clustering based on tree-edit distance with a similarity threshold of 0.9
  • Cannot differentiate different contexts like the “Amitabh Bachchan” and “Bol Bachchan” case.

Language Model (LM) based

  • Agglomerative Complete Link Clustering
  • Each tree represented as a LM.
  • JS Divergence as similarity measure.
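The LM-based approach can be sketched under some stated assumptions: each tree is reduced to a unigram distribution over its node and edge labels (the slides do not say how the language model is built), Jensen-Shannon divergence with base-2 logarithms is used as the distance, and merging stops at a hypothetical threshold of 0.65. That threshold is chosen for this toy data only and is unrelated to the 0.9 used for tree-edit distance.

```python
import math
from collections import Counter
from itertools import combinations

def language_model(tokens):
    """Unigram distribution over a tree's labels (assumed LM construction)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def kl(p, q):
    """KL divergence in bits; q must be non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, between 0 (identical) and 1 (disjoint)."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def complete_link(models, threshold):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(len(models))]

    def dist(a, b):
        return max(js_divergence(models[i], models[j]) for i in a for j in b)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if dist(clusters[x], clusters[y]) > threshold:
            break  # even the closest pair of clusters is too far apart
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters

# Toy trees written as label sequences, using labels from the movie graph.
tree_labels = [
    ["Tom Hanks", "ActedIn", "Forrest Gump", "ActedIn", "Robin Wright"],
    ["Tom Hanks", "IsA", "Actor", "IsA", "Robin Wright"],
    ["Daniel Craig", "ActedIn", "Casino Royale", "InYear", "2006"],
]
models = [language_model(labels) for labels in tree_labels]
print(complete_link(models, threshold=0.65))
```

Complete link merges two clusters only when every pair of their members is close, which keeps a cluster from drifting across contexts through a chain of pairwise-similar trees.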

slide-32
SLIDE 32

Clustering Quality measure: User evaluation

  • Dataset: IMDB
  • User evaluations over 20 manually selected queries.
    – Varying from 2 to 6 keywords each.
  • Users were not aware of the underlying technique.
  • Asked to rate on a scale of 1-5:
    – How similar are the trees within a cluster?
    – How dissimilar are the trees in different clusters?

slide-33
SLIDE 33

Thank you