Toward a Multi-tier Index for Information Retrieval System
Madhav Ram FJ37459
IR systems are mainly developed to help manage the huge body of literature that has been produced
IR systems provide users with easy access to information, and their main functions are representation, storage,
decreases drastically as the information stored in the system increases
Previous work in this field follows mainly two directions.
The first is sequential processing: one processor at a time is used to construct the inverted file index used for information retrieval.
The second is parallel processing, which uses multiple processors to construct the inverted file index.
An inverted index is a sorted list of keywords, with each keyword having links to the documents containing that keyword
documents linkage is called the posting file
Fig: Implementation of inverted index using sorted array.
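The sorted-array implementation named in the figure can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the function names `build_inverted_index` and `lookup`, the whitespace tokenizer, and the integer document ids are all assumptions made for the example. Keywords are kept in a sorted array, and a parallel list plays the role of the posting file.

```python
from bisect import bisect_left


def build_inverted_index(docs):
    """Build a sorted keyword array plus a parallel posting list.

    docs maps a document id to its text (hypothetical input format).
    """
    table = {}
    for doc_id, text in docs.items():
        # set() so each document is listed once per keyword
        for word in set(text.lower().split()):
            table.setdefault(word, []).append(doc_id)
    keywords = sorted(table)                          # sorted keyword array
    postings = [sorted(table[k]) for k in keywords]   # posting file
    return keywords, postings


def lookup(keywords, postings, word):
    """Binary-search the keyword array and return the linked document ids."""
    word = word.lower()
    i = bisect_left(keywords, word)
    if i < len(keywords) and keywords[i] == word:
        return postings[i]
    return []
```

Because the keyword array is sorted, a lookup costs a binary search, but inserting a new keyword means shifting array entries, which is one reason updates become expensive as the index grows.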
implemented using the block addressing idea to speed up the construction of the inverted file is developed in
block addressing is the shrinking of the inverted file size to become only 5%
text size
into smaller buckets that fit into main memory
Fig: Partial Indexing Technique merging the partial indexes in a binary fashion
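A rough sketch of the partial indexing idea in the figure: the collection is split into buckets small enough to index in memory, each bucket gets its own partial index, and the partial indexes are then merged pairwise, round by round. The names `build_partial_index`, `merge_two`, `merge_binary` and `partial_indexing`, and the `(doc_id, text)` input format, are assumptions made for illustration; a real system would write the partial indexes to disk rather than hold them in dictionaries.

```python
def build_partial_index(bucket):
    """Index one bucket of (doc_id, text) pairs entirely in memory."""
    index = {}
    for doc_id, text in bucket:
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return index


def merge_two(a, b):
    """Merge two partial indexes into one, keeping posting lists sorted."""
    merged = {}
    for word in a.keys() | b.keys():
        merged[word] = sorted(set(a.get(word, [])) | set(b.get(word, [])))
    return merged


def merge_binary(partials):
    """Merge partial indexes pairwise until a single index remains."""
    if not partials:
        return {}
    while len(partials) > 1:
        next_round = []
        for i in range(0, len(partials), 2):
            if i + 1 < len(partials):
                next_round.append(merge_two(partials[i], partials[i + 1]))
            else:
                next_round.append(partials[i])  # odd one out moves up a level
        partials = next_round
    return partials[0]


def partial_indexing(docs, bucket_size):
    """Split docs into buckets, index each one, then merge in binary fashion."""
    buckets = [docs[i:i + bucket_size] for i in range(0, len(docs), bucket_size)]
    return merge_binary([build_partial_index(b) for b in buckets])
```

With n partial indexes the merge takes about log2(n) rounds, and every round rereads all postings, which suggests why the merge and remerge cost grows with the size of the inverted file.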
tackled using two approaches: the Local Index approach and the Global Index approach
considering only the documents which are stored respectively
produce a single inverted list index which is identical to sequential
very large text collections. The three algorithms are Local Buffer and Local List algorithm (LL Algo.); Local Buffer and Remote List algorithm (LR Algo.); Remote Buffer and Remote List Algorithm (RR Algo.)
special purpose hardware and the second one is to use the Multi-Tier index algorithm
algorithms
which represents data as indexed data
because it is expensive
searching and the updating time of the inverted file index
consists of two associated files: the first file is the dictionary and the second file is called the postings
process for any query and easily updating
identify the first letter in the query and, in the second tier, determine the file name to perform the search
index for updated files and remerge
words to text document
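The two-tier lookup described above (first letter of the query in the first tier, file name in the second tier) might be sketched as below. This is only one possible reading of the slides, not the presented implementation: the class name `MultiTierIndex`, its method names, and the use of nested in-memory dictionaries in place of on-disk index files are all assumptions for illustration.

```python
class MultiTierIndex:
    """Hypothetical two-tier index: first letter -> word -> file names."""

    def __init__(self):
        # Tier 1 is keyed by first letter; tier 2 maps words to file names.
        self.tiers = {}

    def add_document(self, name, text):
        for word in set(text.lower().split()):
            bucket = self.tiers.setdefault(word[0], {})
            bucket.setdefault(word, set()).add(name)

    def remove_document(self, name):
        for bucket in self.tiers.values():
            for files in bucket.values():
                files.discard(name)

    def update_document(self, name, text):
        """Re-index one changed file without rebuilding the whole index."""
        self.remove_document(name)
        self.add_document(name, text)

    def search(self, word):
        word = word.lower()
        if not word:
            return []
        bucket = self.tiers.get(word[0], {})   # tier 1: first letter
        return sorted(bucket.get(word, ()))    # tier 2: file names


idx = MultiTierIndex()
idx.add_document("a.txt", "index retrieval")
idx.add_document("b.txt", "inverted index")
print(idx.search("index"))
```

In this sketch a query only ever touches the bucket for its first letter, so the rest of the index is never scanned during a search.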
MHz with 64 MB RAM; the second is a 2.8 GHz Dell server with 1 GB RAM
size of inverted file
Figure a: Updating time by 1 KB file size using Partial and Multi-Tier inverted file (PII 333)
Figure b: Updating time by 1 MB file size using Partial and Multi-Tier inverted file (PII 333)
Figure c: Updating time by 2 MB file size using Partial and Multi-Tier inverted file (PII 333)
Figure d: Updating time by 1 KB file size using Partial and Multi-Tier inverted file (2.8 GHz)
Figure e: Updating time by 1 MB file size using Partial and Multi-Tier inverted file (2.8 GHz)
Figure f: Updating time by 2 MB file size using Partial and Multi-Tier inverted file (2.8 GHz)
Each chart plots updating time against inverted file index size (1K, 512K, 2M, 8M) for the Partial and Multi-Tier techniques.
partial index technique
partial index
small file size with predictable performance