Keyword Weight Propagation for Indexing Structured Web Content

Jong Wook Kim and K. Selcuk Candan, Comp. Sci. and Eng. Dept., Arizona State University, {jong, candan}@asu.edu. WebKDD 2006 Workshop on Knowledge Discovery on the Web, Aug. 20, 2006


SLIDE 1

WebKDD 2006 Workshop on Knowledge Discovery on the Web, Aug. 20, 2006, Philadelphia, PA, USA

Jong Wook Kim and K. Selcuk Candan

Comp. Sci. and Eng. Dept., Arizona State University

{jong, candan}@asu.edu

Keyword Weight Propagation for Indexing Structured Web Content

SLIDE 2

Table of Contents

  • Motivation
  • Approach
  • Related Work
  • Relative Content of Entries
  • Keyword Propagation
      – Keyword Propagation between a Pair of Entries
      – Keyword Propagation across a Complex Structure
  • Experiment
  • Conclusion and Future Work

SLIDE 3

Motivation

Many web sites and portals organize content in a navigation hierarchy

SLIDE 5

Motivation

Many web sites and portals organize content in a navigation hierarchy

A navigation hierarchy

  • Is effective when browsing to find specific content
  • Encodes semantic relationships between the data contents (generalization/specialization)

SLIDE 6

Motivation

The Yahoo CS hierarchy

The keywords of the intermediate nodes may describe the nodes' content in the hierarchy only ambiguously

SLIDE 7

Motivation

In a navigational hierarchy, keyword searches are usually directed

  • to the root of the hierarchy, which risks undesirable topic drift, or
  • to the leaves, which may not be enough to satisfy the query

It is important for individual nodes to be properly indexed

SLIDE 9

Approach

Keyword and keyword weight propagation

  • Enrich the individual nodes with the contents of the neighboring nodes
  • How to decide what to propagate and how much?
  • The original semantic structure (generalization/specialization) should be preserved

Challenge

  • How to represent the semantic structure (i.e., generalization/specialization) between nodes?
  • How to determine the degree of keyword inheritance?

SLIDE 10

Approach

Contributions of the Paper

  • Develop a method for discovering and quantifying the generalization/specialization relationship between entries in a navigation hierarchy
  • Develop a keyword propagation algorithm using this relationship

SLIDE 12

Related Work

Score and Keyword Frequency Propagation

  • Propagate the relevance score [Shakery and Zhai, TREC’03]
  • Propagate the term frequency value [Savoy et al., JASIS’97] [Song et al., TREC’04]
  • Propagate the relevance score and the term frequency value [Qin et al., SIGIR’05]

SLIDE 14

Relative Content of Entries

In a navigation hierarchy,

  • A specialized entry corresponds to a more constrained concept
      – As one moves down in a hierarchy, the nodes get more specialized
  • A general entry is less constrained
      – As one moves up in a hierarchy, the nodes get more generalized

SLIDE 15

Relative Content of Entries

Intuition

Given two entries, A and B (A is an ancestor of B), assume

  – A has three keywords (k1, k2, k3), and
  – B has two keywords (k2, k3)

“Entry A is more general than B”

  • A is less constrained than B by its keywords
  • Each entry is interpreted as the disjunction of its keywords
  • If B is interpreted as k2 ∨ k3, then A should be interpreted as k1 ∨ k2 ∨ k3, which is less constrained than k2 ∨ k3

SLIDE 16

Relative Content of Entries

In the extended Boolean model [Salton 83],

OR-ness

  • O = ¬(k1 ∨ k2)
  • Measured as the distance from O
  • An entry further away from O better matches k1 ∨ k2
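
The OR-ness idea can be illustrated numerically. The sketch below assumes the p-norm form of the extended Boolean model with p = 2 and keyword weights in [0, 1]; the function name `or_similarity` and the sample weights are hypothetical and do not come from the slides.

```python
import math


def or_similarity(weights, p=2.0):
    """OR-ness of an entry: normalized p-norm distance from the origin O.

    `weights` holds the entry's weights for the keywords in the disjunction.
    Assumes weights lie in [0, 1], so the result also lies in [0, 1].
    """
    if not weights:
        raise ValueError("at least one keyword weight is required")
    total = sum(w ** p for w in weights)
    return (total / len(weights)) ** (1.0 / p)


# Two hypothetical entries in the (k1, k2) keyword space.
near_origin = or_similarity([0.1, 0.2])
far_from_origin = or_similarity([0.8, 0.9])

# The entry further away from O is the better match for k1 OR k2.
print(near_origin < far_from_origin)
```

An entry with all weights at 0 sits on O and scores 0, while an entry with all weights at 1 scores 1.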

SLIDE 17

Relative Content of Entries

Given two entries, A and B (A is an ancestor of B), assume

  – A has three keywords (k1, k2, k3), and
  – B has two keywords (k2, k3)

How much do entries A and B represent a disjunct?

  • $|A - O| = |A|$, $|B - O| = |B|$

If A is more general than B, then

  • $|A - O| > |B - O|$

SLIDE 18

Relative Content of Entries

Relative Content

  • Measure whether the additional keywords ($A_U$) make A more general or less general than $B_C$

$$R_{AB} = \frac{|A|}{|B_C|} = \frac{|A_C + A_U|}{|B_C|}$$

[Figure: visual representation of the keyword contents]
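
A small sketch of the relative content computation follows. It rests on several assumptions that the slides do not state explicitly: entries are keyword-to-weight dictionaries, |·| is the Euclidean norm, A_C and B_C are the parts of A and B over the shared keywords, and A_U is the part of A over keywords that only A has. The names `norm` and `relative_content` are hypothetical.

```python
import math


def norm(vec):
    """Euclidean length of a keyword -> weight dictionary."""
    return math.sqrt(sum(w * w for w in vec.values()))


def relative_content(a, b):
    """R_AB = |A_C + A_U| / |B_C| under the assumptions named above."""
    shared = a.keys() & b.keys()
    a_c = {k: w for k, w in a.items() if k in shared}      # A over shared keywords
    a_u = {k: w for k, w in a.items() if k not in shared}  # A's additional keywords
    b_c = {k: w for k, w in b.items() if k in shared}      # B over shared keywords
    if norm(b_c) == 0.0:
        raise ValueError("B has no weight on the shared keywords")
    # A_C and A_U cover disjoint keywords, so their sum is just A again.
    return norm({**a_c, **a_u}) / norm(b_c)


# The running example: A has (k1, k2, k3), B has (k2, k3), unit weights.
A = {"k1": 1.0, "k2": 1.0, "k3": 1.0}
B = {"k2": 1.0, "k3": 1.0}
r_ab = relative_content(A, B)
print(r_ab > 1.0)  # the additional keyword k1 makes A more general than B
```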

SLIDE 20

Keyword Propagation between a pair of entries

The purpose of keyword propagation

  • Enrich the entries in a navigational hierarchy
  • The original semantic properties (i.e., relative generality) should be preserved

Propagation Degree, α

  • Governs how much keyword weight two neighboring entries should exchange

SLIDE 21

Keyword Propagation between a pair of entries

Propagation Degree, α

Given two entries, A and B,

  • ai : weight associated with keyword ki ∈ KA
  • bi : weight associated with keyword ki ∈ KB
  • A’ and B’ : enriched entries after keyword propagation

For all ki ∈ KA’

  • If ki ∈ (KA − KB), then a’i = ai
  • If ki ∈ (KA ∩ KB), then a’i = ai + αbi
  • If ki ∈ (KB − KA), then a’i = αbi

For all ki ∈ KB’

  • If ki ∈ (KA − KB), then b’i = αai
  • If ki ∈ (KA ∩ KB), then b’i = bi + αai
  • If ki ∈ (KB − KA), then b’i = bi
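
The pairwise rules can be sketched in a few lines. This is a minimal illustration, assuming entries are keyword-to-weight dictionaries and that a missing keyword has weight 0, which collapses the three cases into one expression; the function name `propagate_pair` and the sample weights are hypothetical.

```python
def propagate_pair(a, b, alpha):
    """Exchange keyword weights between two neighboring entries.

    Treating an absent keyword as weight 0 reproduces all three cases:
    a'_i = a_i + alpha * b_i and b'_i = b_i + alpha * a_i.
    """
    keys = sorted(a.keys() | b.keys())  # common keyword space KA ∪ KB
    a_new = {k: a.get(k, 0.0) + alpha * b.get(k, 0.0) for k in keys}
    b_new = {k: b.get(k, 0.0) + alpha * a.get(k, 0.0) for k in keys}
    return a_new, b_new


# Hypothetical weights: A has (k1, k2, k3), B has (k2, k3).
A = {"k1": 0.5, "k2": 0.4, "k3": 0.3}
B = {"k2": 0.6, "k3": 0.2}
A_new, B_new = propagate_pair(A, B, alpha=0.5)

print(A_new)  # k1 is unchanged in A'; k2 and k3 gain alpha times B's weight
print(B_new)  # B' gains k1 with weight alpha * a1
```

After propagation both dictionaries have the same keys, so the enriched entries live in one common keyword space.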

SLIDE 22

Keyword Propagation between a pair of entries

Propagation Degree, α

A’ and B’ are located in a common keyword space

  • KC = KA’ = KB’ = KA ∪ KB

After keyword propagation, relative content should be preserved

  • $R_{A'B'} = R_{AB}$

$$R_{A'B'} = \frac{|A'|}{|B'|} = \frac{|A|}{|B_C|} = R_{AB}$$

SLIDE 24

Keyword Propagation across a Complex Structure

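Let H(N,E) be a navigation hierarchy,

  • N : the set of nodes
  • E : the set of edges

Propagation Adjacency Matrix, M

  • If there is an edge eij ∈ E, then both (i,j) and (j,i) of M are equal to αij (the pairwise propagation degree)
  • Otherwise, both (i,j) and (j,i) of M are equal to 0

Example (three nodes, edges e12 and e23):

$$M = \begin{pmatrix} 0 & \alpha_{12} & 0 \\ \alpha_{12} & 0 & \alpha_{23} \\ 0 & \alpha_{23} & 0 \end{pmatrix}$$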
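
The definition of M translates directly into code. The sketch below assumes 0-based node indices and a dictionary from edges to pairwise propagation degrees; the function name `build_propagation_matrix` and the numeric degrees are hypothetical.

```python
def build_propagation_matrix(num_nodes, alphas):
    """Build the symmetric propagation adjacency matrix M.

    `alphas` maps an edge (i, j) to its pairwise propagation degree.
    Entries without an edge stay 0.
    """
    m = [[0.0] * num_nodes for _ in range(num_nodes)]
    for (i, j), alpha in alphas.items():
        if i == j:
            raise ValueError("self-loops are not part of the hierarchy")
        m[i][j] = alpha  # entry (i, j)
        m[j][i] = alpha  # entry (j, i) gets the same degree
    return m


# Three nodes in a chain: edge (0, 1) and edge (1, 2).
M = build_propagation_matrix(3, {(0, 1): 0.3, (1, 2): 0.2})
for row in M:
    print(row)
```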

SLIDE 25

Keyword Propagation across a Complex Structure

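Keyword Propagation Process

Given a hierarchy, H(N,E)

  • T : term-node matrix
  • M : propagation adjacency matrix

Term Propagation Matrix

  • P = TM
  • Entry (term, node) of P is the weight of the term that the node inherits from its neighbors in M

Example (three terms, three nodes, edges e12 and e23):

$$P = TM = \begin{pmatrix} K_1 & 0 & 0 \\ K_2 & K_2 & 0 \\ 0 & K_3 & K_3 \end{pmatrix} \begin{pmatrix} 0 & \alpha_{12} & 0 \\ \alpha_{12} & 0 & \alpha_{23} \\ 0 & \alpha_{23} & 0 \end{pmatrix} = \begin{pmatrix} 0 & \alpha_{12} K_1 & 0 \\ \alpha_{12} K_2 & \alpha_{12} K_2 & \alpha_{23} K_2 \\ \alpha_{12} K_3 & \alpha_{23} K_3 & \alpha_{23} K_3 \end{pmatrix}$$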
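
A numeric version of P = TM may help to check the product by hand. The sketch assumes a term-node matrix with rows as terms and columns as nodes, three nodes in a chain, unit keyword weights, and made-up propagation degrees 0.3 and 0.2; the helper `matmul` is a plain-Python stand-in for a matrix library.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    cols = len(y[0])
    return [
        [sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for row in x
    ]


# Rows are terms (k1, k2, k3); columns are nodes (1, 2, 3).
T = [
    [1.0, 0.0, 0.0],  # k1 appears only in node 1
    [1.0, 1.0, 0.0],  # k2 appears in nodes 1 and 2
    [0.0, 1.0, 1.0],  # k3 appears in nodes 2 and 3
]
# Chain 1 - 2 - 3 with assumed degrees alpha12 = 0.3 and alpha23 = 0.2.
M = [
    [0.0, 0.3, 0.0],
    [0.3, 0.0, 0.2],
    [0.0, 0.2, 0.0],
]

P = matmul(T, M)
for row in P:
    print(row)
```

Row 1 of P shows that k1 reaches only node 2 in a single step, because node 2 is the only neighbor of node 1.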

SLIDE 26

Keyword Propagation across a Complex Structure

After keyword propagation

$$T' = T + P = T + TM = T(I + M) = T M_I$$

  • T’ : new enriched term-node matrix
  • $M_I = I + M$ : all diagonal values are 1 and all non-diagonal entries are the same as in the propagation adjacency matrix M

SLIDE 27

Keyword Propagation across a Complex Structure

Keyword Propagation Process

SLIDE 28

Keyword Propagation across a Complex Structure

Keyword Propagation Process

$$T_{final} = T M_{I_1} M_{I_2} \cdots M_{I_d}$$

  • $M_{I_d}$ : propagation adjacency matrix computed for the dth iteration
  • d : the greatest number of edges between any two nodes

Example: d = 2
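
The iterated product can be sketched as a loop over per-iteration matrices. The slides state that the propagation adjacency matrix is computed for each iteration but do not show how, so this sketch simply takes the list of matrices as input and, for illustration only, reuses one made-up matrix for both iterations. The names `matmul`, `add_identity`, and `propagate` are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(row[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for row in x
    ]


def add_identity(m):
    """Return M_I = I + M."""
    return [
        [m[i][j] + (1.0 if i == j else 0.0) for j in range(len(m))]
        for i in range(len(m))
    ]


def propagate(t, matrices):
    """Apply T <- T (I + M) once per matrix in `matrices`."""
    result = t
    for m in matrices:
        result = matmul(result, add_identity(m))
    return result


T = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
M = [[0.0, 0.3, 0.0], [0.3, 0.0, 0.2], [0.0, 0.2, 0.0]]

T_one = propagate(T, [M])       # one iteration
T_final = propagate(T, [M, M])  # d = 2 for a three-node chain (assumed same M twice)

print(T_one[0][2])    # k1 has not yet reached node 3
print(T_final[0][2])  # after d = 2 iterations it has
```

With d equal to the longest path length, every keyword gets the chance to reach every node, which is why the three-node chain needs two iterations.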

SLIDE 30

Experiment

Experiment Setup

Data

  • Yahoo Hierarchy: Computer Science, Mathematics, and Movie directories

Ground truth and Query

  • 10 sample keyword queries
  • User study (8 users)

SLIDE 31

Experiment

Experiment Setup

Query processing

  • N (No Keyword Propagation)
  • KP (Keyword Propagation)
  • Dt and Dn
      – No keyword propagation, but context extracted from the whole tree or from neighbors
  • KP+Dt and KP+Dn
      – Keyword propagation, and context extracted from the whole tree or from neighbors

Evaluation measure

  • P@10
  • MRR (mean reciprocal rank of the first relevant document)
  • Paired t-test
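
The two ranking measures can be sketched as follows, using their standard definitions. The function names, node identifiers, and relevance sets are hypothetical and are not taken from the experiments.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Two made-up queries over the same ranked list of twelve nodes.
ranking = [f"n{i}" for i in range(1, 13)]
relevant_q1 = {"n2", "n5", "n11"}
relevant_q2 = {"n4"}

p10 = precision_at_k(ranking, relevant_q1)
mrr = mean_reciprocal_rank([(ranking, relevant_q1), (ranking, relevant_q2)])
print(p10, mrr)
```

Here n11 falls outside the top ten, so only two of the three relevant nodes count toward P@10.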

SLIDE 32

Keyword Propagation/ No Propagation

[Figure panels: P@10; Average MRR]

SLIDE 33

Keyword Propagation/ No Propagation

P-values for the t-Test

SLIDE 34

Keyword Propagation/ No Propagation

SLIDE 35

Keyword Propagation/ Alternative Context Extraction

  • Differentiated: P@10
  • Differentiated: t-test relative to No Keyword Propagation

SLIDE 36

Effect of the Structural Distance

SLIDE 37

Statistical Validation of the Ground Truth

ANOVA test

  • A statistical test to observe the agreement between the assessors
  • We identified two users whose judgments were significantly different from those of the other 6 users
  • When these two users were excluded, the user judgments were in agreement

SLIDE 38

Statistical Validation of the Ground Truth

  • Differentiated: P@10
  • Differentiated: t-test relative to No Keyword Propagation

SLIDE 40

Conclusion and Future Work

Conclusion

  • Present a technique to identify a semantic relationship
  • Introduce a relative-content-preserving keyword propagation technique

Future Work

  • Incorporate other types of semantic cues
      – Structure-based method
      – Information-based method

SLIDE 41

Question