eyeShot Multimedia Search Engine - PowerPoint PPT Presentation



SLIDE 1

eyeShot Multimedia Search Engine

“Extracting text patterns from the WWW to characterize Multimedia Resources”

A project by Demetris Zeinalipour & Theodoros Folias

Online Demo: http://www.cs.ucr.edu/~csyiazti/eyeshot/

SLIDE 2

The Problem

  • Many multimedia resources on the WWW are not indexed efficiently by the various WWW search engines. It is estimated that one third of the web consists of images alone.
  • Many proprietary solutions exist:
      1. Specific image engines (WebSeer, GoogleImage, WebSeek, etc.)
      2. Specific streaming-audio search engines (SpeechBot searches an archive of 6,500 hours of online radio-show transcripts).
      3. Specific MP3 engines (SingingFish.com has developed what it claims to be the largest index of MP3 files).
  • These solutions are content-specific. Searching inside a multimedia file relies on content-processing techniques, e.g. image processing or audio-to-text conversion.
  • But the number of formats targeted at the Web is growing exponentially. Will we need to design a proprietary search engine for each particular type of resource? Scalability?

SLIDE 3

The Solution

  • We argue that the WWW was designed for text. All other resources are supplementary.
  • This means that someone who publishes a multimedia resource will of course first have an HTML page from which that resource is linked.
  • Our approach: analyze the text-based pages that link to a particular multimedia resource and use them to characterize it.
  • This solution scales as the number of file formats increases, since it makes no use of resource-specific details.
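
As a rough illustration of the format-independent idea above, the following Python sketch finds multimedia resources purely from the links in an HTML page, without opening the resources themselves. It is not the eyeShot implementation: the names `MEDIA_EXTENSIONS`, `ResourceLinkFinder`, and `find_resources`, and the choice of extensions, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical set of extensions treated as multimedia (not taken from the slides).
MEDIA_EXTENSIONS = {".jpg", ".gif", ".png", ".mp3", ".wav", ".mpg", ".avi", ".mov"}


class ResourceLinkFinder(HTMLParser):
    """Collects absolute URLs of linked or embedded multimedia resources."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # <a href=...> links to a resource; <img src=...> embeds one.
        if tag == "a":
            target = attributes.get("href")
        elif tag == "img":
            target = attributes.get("src")
        else:
            target = None
        if not target:
            return
        url = urljoin(self.base_url, target)
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        # Only the file extension is inspected; the resource is never parsed.
        if extension in MEDIA_EXTENSIONS:
            self.resources.append(url)


def find_resources(html, base_url):
    """Return the multimedia URLs referenced by one HTML page, in document order."""
    finder = ResourceLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.resources
```

Supporting a new format in this sketch only means adding an extension to the set, which is the scalability argument the slide makes.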

SLIDE 4

Design & Implementation Details

The Title: <title>NBA … Jordan</title>

Captions and bold text: <h2><b>Michael Jordan</b></h2>

Surrounding Text: {#23, Brooklyn, North, Carolina, Washington, Wizards, College…}

The resource is characterized by the features above. Its overall rank is determined by the importance of the page that hosts it.
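
The three text sources named on this slide (title, captions and bold text, surrounding text) can be separated with a small HTML parser. The Python sketch below is only one possible reading of the slide: `PageCharacterizer`, `characterize`, and the tag set `EMPHASIS_TAGS` are hypothetical names, and page-importance ranking is not covered.

```python
from html.parser import HTMLParser
import re

# Hypothetical choice of tags counted as "captions and bold text".
EMPHASIS_TAGS = {"h1", "h2", "h3", "b", "strong"}


class PageCharacterizer(HTMLParser):
    """Splits the words of a page into title, emphasized, and surrounding text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags currently open
        self.title = []
        self.emphasized = []
        self.surrounding = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags such as <img>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        words = re.findall(r"[A-Za-z0-9#]+", data)
        if not words:
            return
        if "title" in self.open_tags:
            self.title.extend(words)
        elif EMPHASIS_TAGS.intersection(self.open_tags):
            self.emphasized.extend(words)
        else:
            self.surrounding.extend(words)


def characterize(html):
    """Return the words found in each of the three text sources of a page."""
    parser = PageCharacterizer()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "emphasized": parser.emphasized,
        "surrounding": parser.surrounding,
    }
```

A real system would likely restrict the surrounding text to a window around the link to the resource and drop stop words; both steps are omitted here for brevity.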

SLIDE 5

Architecture

[Diagram: eyeShot Multimedia Search Engine architecture]

  • Components: eyeShot Index Server, eyeShot Page Processor, eyeShot offline Lexicon Generator, eyeShot Web UI, WebRACE High Performance Crawler, Web Clients, WWW, Video Tape
  • Queues, caches, and workers: Coordinator, crawler URL-queue, request queues, URL fetchers, object cache, cache index, HTML filtering processors
  • Stores and indexes: meta-info store, object store, object index, object validation store, object validation index, lexicon store, lexicon index, seed.txt
  • Labelled operations: crawl (URL, depth), crawl (URL, depth-1, owners), store (URL), filter (URL), getState (URL), read (metainfo), validate (URL), add (URL, {keywords}), add (keyword, {urls}), load Index
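
The two "add" operations labelled in the diagram, one keyed by URL with a set of keywords and one keyed by keyword with a set of URLs, suggest an inverted index built offline from the per-resource keyword sets. The Python sketch below shows that idea under stated assumptions: `LexiconIndex`, `build_lexicon`, and `lookup` are hypothetical names, and the AND semantics for multi-word queries is a guess, not something the slides specify.

```python
from collections import defaultdict


class LexiconIndex:
    """Maps each keyword to the set of resource URLs it characterizes."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, keyword, urls):
        # Mirrors the diagram's add (keyword, {urls}) label.
        self._postings[keyword.lower()].update(urls)

    def lookup(self, query):
        """Return URLs matching every query term (AND semantics, an assumption)."""
        terms = query.lower().split()
        if not terms:
            return set()
        matches = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*matches)


def build_lexicon(object_index):
    """Invert a {url: {keywords}} object index into a keyword-to-URLs lexicon."""
    lexicon = LexiconIndex()
    for url, keywords in object_index.items():
        for keyword in keywords:
            lexicon.add(keyword, {url})
    return lexicon
```

Because a lookup is a few dictionary reads and a set intersection, an in-memory structure of this kind answers queries quickly, although nothing here reproduces the timings reported in the demo.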

SLIDE 6

Online Demonstration
Target Web: www.cs.ucr.edu

http://www.cs.ucr.edu/~csyiazti/eyeshot/

SLIDE 7

Online Demonstration
Target Web: www.cs.ucr.edu

Lookup time = 7 milliseconds

http://www.cs.ucr.edu/~csyiazti/eyeshot/