


  1. Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML
     Toshiyuki Shimizu (Kyoto University)
     Norimasa Terada (Nagoya University)
     Masatoshi Yoshikawa (Kyoto University)
     ICADL 2006, 29th November

     Outline
     • Background
     • Summary of our Contribution
     • Kikori-KS
     • User Interfaces
     • Implementation of Keyword Search on Relational Databases
     • Ranking Model
     • Experiments
     • Conclusions, Future Works

     Background (1/2)
     • Large number of documents in digital libraries are now structured in XML
       • Growing demand for XML Information Retrieval (XML-IR) Systems
     • We can identify meaningful document fragments by encoding documents in XML
       • ex) Sections, subsections and paragraphs in scholarly articles
       • Browsing only document fragments relevant to a certain topic
     • Keyword search on XML documents
       • Simple, intuitively understandable, yet useful form of queries, especially for unskilled end-users
       • We do not need to understand XML query languages and XML schema

     Background (2/2)
     • For the keyword “database”
     [Figure: XML tree of an example article, distinguishing element nodes from text values. article has the children transaction (“database”), title (“XML Index”) and body; body has two section elements, each with a title and a p: “Introduction” / “…XML database” and “XML Labeling” / “Query processing…”.]

     Summary of our Contribution
     • We have developed Kikori-KS
       • A prototype system for XML-IR
       • Under Kikori Project
       • Accepts Keyword Set as a query
     • User-friendly interface
       • FetchHighlight interface
     • Storage schema on RDB
       • The database schema is carefully designed
       • Acceptable search time

  2. Overview of Kikori-KS
     [Figure: system architecture, with the labels End User, Set of Keywords, Search Results, User Interface Module, SQL Translation Module, Ranked relevant elements, RDB, Storage Module, and XML Documents.]

     User Interfaces
     • Search results of XML-IR are document fragments, which may be nested
     • INEX 2005 project* defined three strategies for element retrieval
     [Figure: result-list layouts. Thorough: E1, E2, E3, ... Focussed: E1, E2, E3, ... (Ei does not overlap with Ej). FetchBrowse: D1 (E11, E12, E13, ...), D2 (E21, E22, ...). FetchHighlight: D1 (E11, E111, E112, E12, ...), D2 (E21, ...).]
     • Three strategies of INEX are not necessarily intended to be used in designing user interfaces
     * http://inex.is.informatik.uni-duisburg.de/2005/

     Retrieval Strategy of INEX (1/3)
     • Thorough
       • Relevant elements are retrieved in descending order of their scores
     Result list: E1-3 (0.6), E1-8 (0.5), E1-10 (0.4), E2-10 (0.4), ...
     Element scores in the example (Ed-n is element n of document Dd):
       n     1    2    3    4    5    6    7    8    9    10
       D1   0.2   0   0.6  0.1   0    0    0   0.5  0.3  0.4
       D2   0.1   0    0   0.2   0    0    0   0.3   0   0.4

     Retrieval Strategy of INEX (2/3)
     • Focussed
       • The system retrieves only focussed elements (i.e. non-overlapping elements)
       • Ranked in relevance order
     Result list: E1-3 (0.6), E1-8 (0.5), E2-10 (0.4), ...
     [Figure: the same element scores for D1 and D2 as on the previous slide.]
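The Thorough and Focussed result lists can be reproduced from the example scores with a few lines of Python. This is only a sketch under stated assumptions: the slides do not say how focussed elements are selected, so the greedy rule below (take elements in score order, skip any that is an ancestor or descendant of one already taken) is an illustration, and the tree shape in `PARENT` is inferred from the layout of the figure. All function and variable names are hypothetical.

```python
# Element scores from the slides' example: (document, element) -> score.
SCORES = {
    (1, 1): 0.2, (1, 2): 0.0, (1, 3): 0.6, (1, 4): 0.1, (1, 5): 0.0,
    (1, 6): 0.0, (1, 7): 0.0, (1, 8): 0.5, (1, 9): 0.3, (1, 10): 0.4,
    (2, 1): 0.1, (2, 2): 0.0, (2, 3): 0.0, (2, 4): 0.2, (2, 5): 0.0,
    (2, 6): 0.0, (2, 7): 0.0, (2, 8): 0.3, (2, 9): 0.0, (2, 10): 0.4,
}

# ASSUMPTION: both documents share this tree shape (element -> parent),
# read off the figure; the slides do not state it explicitly.
PARENT = {1: None, 2: 1, 3: 1, 4: 1, 5: 4, 6: 5, 7: 5, 8: 4, 9: 8, 10: 8}


def ancestors(elem):
    """Return the set of all ancestors of an element."""
    found = set()
    parent = PARENT[elem]
    while parent is not None:
        found.add(parent)
        parent = PARENT[parent]
    return found


def overlaps(a, b):
    """Two elements overlap if they are in the same document and one contains the other."""
    (doc_a, elem_a), (doc_b, elem_b) = a, b
    if doc_a != doc_b:
        return False
    return elem_a == elem_b or elem_a in ancestors(elem_b) or elem_b in ancestors(elem_a)


def thorough(scores):
    """All relevant elements (score > 0) in descending score order."""
    hits = [(key, score) for key, score in scores.items() if score > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def focussed(scores):
    """Greedy non-overlapping selection (an assumed rule, not taken from the slides)."""
    picked = []
    for key, score in thorough(scores):
        if not any(overlaps(key, other) for other, _ in picked):
            picked.append((key, score))
    return picked


if __name__ == "__main__":
    for name, ranking in (("Thorough", thorough(SCORES)), ("Focussed", focussed(SCORES))):
        print(name, [f"E{d}-{e} ({s})" for (d, e), s in ranking])
```

Ties are broken by document and element number, which is also an assumption; the slides only show that E1-10 precedes E2-10 in the Thorough list.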

  3. Retrieval Strategy of INEX (3/3)
     • FetchBrowse
       • Fetching Phase
         • The system first identifies relevant documents and ranks them in relevance order
       • Browsing Phase
         • Within a fetched document, the system identifies relevant elements and ranks them in relevance order
     Result list: D1 (0.2): E1-3 (0.6), E1-8 (0.5), E1-10 (0.4), ... D2 (0.1): E2-10 (0.4), ...
     Document scores: D1 0.2, D2 0.1
     Element scores in the example (Ed-n is element n of document Dd):
       n     1    2    3    4    5    6    7    8    9    10
       D1   0.2   0   0.6  0.1   0    0    0   0.5  0.3  0.4
       D2   0.1   0    0   0.2   0    0    0   0.3   0   0.4

     FetchHighlight
     • Displaying search result elements aggregated by XML documents is effective
       • FetchBrowse is of that style
     • Displaying search result elements in their document order is useful
     • FetchHighlight
       • XML documents are first sorted in their relevance order
       • Relevant elements within the XML document are displayed in document order
       • Elements are indented in accordance with their depth in the XML tree
     [Figure: FetchHighlight layout. D1: E11, E111, E112, E12, ... D2: E21, ...]

     FetchHighlight Interface
     [Screenshot with the annotations: Outline elements are displayed. Elements with high score are displayed by using a larger font. Aggregated by document. Document order.]

     Browsing Document Fragment
     [Screenshot with the annotations: Selected document fragment is highlighted. Search words are highlighted.]

     Features of the FetchHighlight Interface
     • Focussed elements are easily identified
     • Users can also recognize the parts of a document where many highly relevant elements are clustered
     • Outline elements
       • Displayed even if the score is 0
       • The elements with particular structural information
         • ex) sections and subsections
       • Useful for browsing
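The difference between the FetchBrowse and FetchHighlight orderings can be sketched in Python using the example scores. This is a minimal illustration, not the system's implementation: the tree shape in `PARENT` is an assumption read off the figure, element numbers are assumed to follow document order, outline elements with score 0 are left out for brevity, and all names are hypothetical.

```python
# Document scores and non-zero element scores from the slides' example.
DOC_SCORES = {1: 0.2, 2: 0.1}
ELEM_SCORES = {
    1: {1: 0.2, 3: 0.6, 4: 0.1, 8: 0.5, 9: 0.3, 10: 0.4},
    2: {1: 0.1, 4: 0.2, 8: 0.3, 10: 0.4},
}

# ASSUMPTION: tree shape (element -> parent) inferred from the figure.
PARENT = {1: None, 2: 1, 3: 1, 4: 1, 5: 4, 6: 5, 7: 5, 8: 4, 9: 8, 10: 8}


def depth(elem):
    """Number of edges between an element and the document root."""
    steps = 0
    while PARENT[elem] is not None:
        elem = PARENT[elem]
        steps += 1
    return steps


def ranked_documents():
    """Documents in descending order of document score (the fetching phase)."""
    return sorted(DOC_SCORES, key=lambda doc: (-DOC_SCORES[doc], doc))


def fetch_browse():
    """Per document, relevant elements in descending score order."""
    return [
        (doc, sorted(ELEM_SCORES[doc].items(), key=lambda item: (-item[1], item[0])))
        for doc in ranked_documents()
    ]


def fetch_highlight():
    """Per document, relevant elements in document order, indented by tree depth."""
    lines = []
    for doc in ranked_documents():
        lines.append(f"D{doc}")
        for elem in sorted(ELEM_SCORES[doc]):  # assumed: element numbers follow document order
            indent = "  " * (depth(elem) + 1)
            lines.append(f"{indent}E{doc}-{elem} ({ELEM_SCORES[doc][elem]})")
    return lines


if __name__ == "__main__":
    print(fetch_browse())
    print("\n".join(fetch_highlight()))
```

Both functions rank documents the same way; they differ only in how the elements inside each document are ordered and laid out.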

  4. Storing XML documents into RDB
     • A huge number of document fragments have to be handled efficiently
       • ex) There are 16,080,830 document fragments (elements) against 16,819 documents in the INEX 1.9 collection used in our experiments
     • Storage schema based on XRel
       • Independent of the logical structure of XML documents
     • Conceptual Database Design
       • Element (docID, elemID, pathID, start, end, label)
       • Path (pathID, pathexp)
       • Term (term, docID, elemID, tfipf)
     * label: short text representing the element
     * tfipf: term weight in the element

     Conceptual Database Design
     Path
       pathID  pathexp
       1       /article
       2       /article/transaction
       3       /article/title
       :       :
     Element
       docID  elemID  pathID  start  end  label
       1      1       1       1      236  XML Index
       1      2       2       10     44   database
       1      3       3       45     68   XML Index
       :      :       :       :      :    :
     Term
       term      docID  elemID  tfipf
       database  1      1       0.3
       database  1      2       0.1
       :         :      :       :
       XML       1      1       0.3
       XML       1      3       0.4
       :         :      :       :
     [Figure: the example article tree annotated with elemIDs: article (1); transaction (2, “database”), title (3, “XML Index”), body (4); section (5) with title (6, “Introduction”) and p (7, “We explain XML database”); section (8) with title (9, “XML Labeling”) and p (10, “Query processing”).]

     Schema Refinement (1/4)
     • Materialized view
       • Join Element table, Path table, and Term table
       • Partitioning the Term table with each term
         • Term_xyz (docID, elemID, tfipf, start, end, label, pathexp)
     • Selecting outline elements and constructing an Outline table in advance
       • The system designer has to predefine outline elements
       • Outline (docID, elemID, start, end, label, pathexp)

     Schema Refinement (2/4)
     • Materialized view
       • Join Element table, Path table, and Term table
     Join of Term, Element and Path
       term      docID  elemID  tfipf  start  end  label      pathexp
       database  1      1       0.3    1      236  XML Index  /article
       database  1      2       0.1    10     44   database   /article/transaction
       :         :      :       :      :      :    :          :
       XML       1      1       0.3    1      236  XML Index  /article
       XML       1      3       0.4    45     68   XML Index  /article/title
       :         :      :       :      :      :    :          :

     Schema Refinement (3/4)
     • Partitioning the table by terms
       • Term_xyz (docID, elemID, tfipf, start, end, label, pathexp)
     Term_database
       docID  elemID  tfipf  start  end  label      pathexp
       1      1       0.3    1      236  XML Index  /article
       1      2       0.1    10     44   database   /article/transaction
       :      :       :      :      :    :          :
     Term_XML
       docID  elemID  tfipf  start  end  label      pathexp
       1      1       0.3    1      236  XML Index  /article
       1      3       0.4    45     68   XML Index  /article/title
       :      :       :      :      :    :          :

     Schema Refinement (4/4)
     • Selecting outline elements and constructing an Outline table in advance
       • The system designer predefines outline elements
       • Outline (docID, elemID, start, end, label, pathexp)
     Outline
       docID  elemID  start  end  label         pathexp
       1      5       75     143  Introduction  /article/body/section
       1      8       144    219  XML Labeling  /article/body/section
     [Figure: the example article tree annotated with elemIDs, as on the Conceptual Database Design slide.]
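The conceptual schema and the per-term partitioning can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden illustration, not the schema actually deployed: the column types and keys are guesses, SQLite stands in for the unspecified RDB, and because SQLite has no materialized views the joined table is built as an ordinary table filled by `INSERT ... SELECT`. The sample rows are the ones shown on the slides; the function names are hypothetical.

```python
import sqlite3

# Conceptual design from the slides; column types and keys are assumptions.
# "end" is quoted because END is an SQL keyword.
SCHEMA = """
CREATE TABLE Path    (pathID INTEGER PRIMARY KEY, pathexp TEXT);
CREATE TABLE Element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                      start INTEGER, "end" INTEGER, label TEXT,
                      PRIMARY KEY (docID, elemID));
CREATE TABLE Term    (term TEXT, docID INTEGER, elemID INTEGER, tfipf REAL);
"""


def load_sample(conn):
    """Create the three conceptual tables and insert the rows shown on the slides."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Path VALUES (?, ?)",
                     [(1, "/article"), (2, "/article/transaction"), (3, "/article/title")])
    conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 236, "XML Index"),
                      (1, 2, 2, 10, 44, "database"),
                      (1, 3, 3, 45, 68, "XML Index")])
    conn.executemany("INSERT INTO Term VALUES (?, ?, ?, ?)",
                     [("database", 1, 1, 0.3), ("database", 1, 2, 0.1),
                      ("XML", 1, 1, 0.3), ("XML", 1, 3, 0.4)])


def term_table_name(term):
    """Build the Term_xyz table name, rejecting terms that are unsafe as identifiers."""
    if not term.isalnum():
        raise ValueError("term must be alphanumeric to be used in a table name")
    return f"Term_{term}"


def materialize_term_table(conn, term):
    """Pre-join Term, Element and Path for one term into its own table."""
    table = term_table_name(term)
    conn.execute(f'CREATE TABLE {table} (docID INTEGER, elemID INTEGER, tfipf REAL, '
                 f'start INTEGER, "end" INTEGER, label TEXT, pathexp TEXT)')
    conn.execute(f"""
        INSERT INTO {table}
        SELECT t.docID, t.elemID, t.tfipf, e.start, e."end", e.label, p.pathexp
        FROM Term t
        JOIN Element e ON e.docID = t.docID AND e.elemID = t.elemID
        JOIN Path p ON p.pathID = e.pathID
        WHERE t.term = ?
        ORDER BY t.docID, t.elemID""", (term,))
    return table


def search(conn, term):
    """Single-keyword lookup against the partitioned table, highest weight first."""
    table = term_table_name(term)
    return conn.execute(f"SELECT docID, elemID, tfipf, label, pathexp FROM {table} "
                        f"ORDER BY tfipf DESC, docID, elemID").fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_sample(connection)
    materialize_term_table(connection, "database")
    for row in search(connection, "database"):
        print(row)
```

With the partitioned table in place, a single-keyword lookup touches one table and needs no join at query time, which is the point of the refinement described on the slides.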
