Crawler.NET
A component-based distributed framework for web traversal
Levente Hunyadi (BME AAIT)
March 23, 2007
Motivation: The Web: a source
Outline: Introduction (Motivation, Objectives, Architecture); Component framework; Crawler application; Conclusions
Component framework: Building blocks; Components; Providers; Connectors
[Diagram: component class hierarchy. Classes shown: GenericComponent, GenericProducer, GenericConsumer, SimpleFilter, ComplexFilter, AsynchronousComplexFilter, SynchronousOutputFilter, SemiSynchronousComplexFilter, SynchronousComplexFilter.]
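The class names above suggest a producer / filter / consumer pipeline. The slides do not show any interfaces, so the following is only a minimal sketch of that general pattern in Python: the class names are borrowed from the diagram, while every method name (`produce`, `process`, `consume`) and the helper `run_pipeline` are hypothetical and not taken from Crawler.NET.

```python
from typing import Callable, Iterable, Iterator, List


class GenericProducer:
    """Emits items into a pipeline (assumed behaviour, not the framework's API)."""

    def __init__(self, items: Iterable[str]) -> None:
        self._items = list(items)

    def produce(self) -> Iterator[str]:
        yield from self._items


class SimpleFilter:
    """Maps each input item to zero or more output items."""

    def __init__(self, fn: Callable[[str], Iterable[str]]) -> None:
        self._fn = fn

    def process(self, stream: Iterable[str]) -> Iterator[str]:
        for item in stream:
            yield from self._fn(item)


class GenericConsumer:
    """Collects whatever reaches the end of the pipeline."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def consume(self, stream: Iterable[str]) -> None:
        for item in stream:
            self.received.append(item)


def run_pipeline(producer: GenericProducer,
                 filters: List[SimpleFilter],
                 consumer: GenericConsumer) -> List[str]:
    """Hypothetical connector: chains producer -> filters -> consumer."""
    stream: Iterable[str] = producer.produce()
    for f in filters:
        stream = f.process(stream)
    consumer.consume(stream)
    return consumer.received
```

For example, a filter that keeps only `http://` URLs can be placed between a producer of candidate URLs and a consumer that stores them. The synchronous and asynchronous filter variants named in the diagram are not modelled here.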
Crawler application: Architecture; Server components; Client components; Traversal; Load balancing; Parsing; URL distributor component
[Diagram: crawler application architecture. Nodes: Server, Client 1, Traversal component, Downloader, Parser, URL distributor, local url queue. Data-flow labels: host, #new items; document url, referrer url; url, HTTP header, document content; base url, links; url, length, start/stop time, HTTP status code; external url; internal url; url belonging to Client 1; next url; finished.]
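The labels "base url, links", "internal url" and "external url" in the diagram above hint at two steps: resolving extracted links against the document's base URL, and deciding whether a URL stays in the client's local queue or is handed back to the server. The sketch below illustrates one possible reading. The function names and the hash-of-host ownership rule are assumptions made for illustration; the slides do not say how URLs are assigned to clients.

```python
import zlib
from typing import List, Tuple
from urllib.parse import urljoin, urlsplit


def resolve_links(base_url: str, links: List[str]) -> List[str]:
    """Turn relative links found in a document into absolute URLs."""
    return [urljoin(base_url, link) for link in links]


def owner_of(url: str, client_count: int) -> int:
    """Assumed rule: a host is owned by the client whose index equals
    a stable hash of the host name modulo the number of clients."""
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % client_count


def route(urls: List[str], this_client: int,
          client_count: int) -> Tuple[List[str], List[str]]:
    """Split URLs into (internal, external) from this client's point of view.

    Internal URLs would go to the local url queue; external URLs would be
    sent to the server for redistribution (assumed reading of the diagram).
    """
    internal: List[str] = []
    external: List[str] = []
    for url in urls:
        if owner_of(url, client_count) == this_client:
            internal.append(url)
        else:
            external.append(url)
    return internal, external
```

Hashing on the host rather than the full URL keeps all pages of one site on the same client, which makes per-host politeness limits easier to enforce. That is a design choice of this sketch, not a documented property of Crawler.NET.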
[Diagram: Nodes: URL distributor, Traversal component, Downloader, url queue. Data-flow labels: host, #new items; url, referrer url; finished.]
[Diagram: load balancing. Nodes: URL distributor, Traversal component, Downloader, url queue, Load balancer. Data-flow labels: host, #new items; url, referrer url; statistics; available host.]
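The diagram above shows a load balancer that receives statistics and answers with an available host, but the selection policy is not stated. As a rough illustration only, the sketch below assumes a simple policy: hand out the host with the most reported new items that is not already assigned. The class `LoadBalancer` and its methods `report`, `next_host` and `release` are hypothetical names, not part of Crawler.NET.

```python
from typing import Dict, Optional, Set


class LoadBalancer:
    """Tracks per-host backlog and hands out hosts one at a time (assumed policy)."""

    def __init__(self) -> None:
        self._pending: Dict[str, int] = {}  # host -> number of new items reported
        self._assigned: Set[str] = set()    # hosts currently being crawled

    def report(self, host: str, new_items: int) -> None:
        """Record statistics of the form (host, #new items)."""
        self._pending[host] = self._pending.get(host, 0) + new_items

    def next_host(self) -> Optional[str]:
        """Return the unassigned host with the largest backlog, or None."""
        candidates = [h for h, n in self._pending.items()
                      if n > 0 and h not in self._assigned]
        if not candidates:
            return None
        # Largest backlog first; ties are broken alphabetically for determinism.
        candidates.sort(key=lambda h: (-self._pending[h], h))
        host = candidates[0]
        self._assigned.add(host)
        return host

    def release(self, host: str) -> None:
        """Mark a host as available again once a client is done with it."""
        self._assigned.discard(host)
```

Assigning each host to at most one crawler at a time is a common way to avoid overloading a single web server. Whether Crawler.NET does this is not confirmed by the slides.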
Conclusions: Future work; Summary