  1. CAPTO, January 2010

  2. The problem to solve
     • The amount of information published on the Internet is no longer manageable; as a consequence, Internet searches are imprecise.
     • Because of the overwhelming amount of information and the inherent nature of the Internet (a polling protocol, where clients must repeatedly request pages to discover changes), manual retrieval is an exhausting activity.
     • Relevant information is only a fraction of what is available.
     • All these problems, which lead to a loss of information (and hence of power), also apply to the information a company creates internally.

  3. The Goal
     To have a way to retrieve information that is:
     • On time => available when needed
     • Precise => noise reduction
     • Useful => structured and harmonized
     • Complete => extracted from any media

  4. The solution
     Capto is a complete solution for creating information by acquiring and indexing media from multiple sources.

  5. Characteristics
     • Focus on relevant information;
     • A single portal to retrieve all the information you need;
     • Users can subscribe to ‘information channels’ and are notified when new pertinent information is created;
     • A complete information management workflow.
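
The slides do not describe how channel subscriptions work. As a minimal sketch, assuming a channel is simply a named set of keywords and a newly indexed document matches a channel when it contains all of them, notification could look like this. `Channel`, `ChannelRegistry`, `subscribe` and `publish` are hypothetical names, not Capto's API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical callback signature: (channel_name, doc_id) -> None
Callback = Callable[[str, str], None]


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a real system would also stem and drop stop words."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class Channel:
    # Assumption: a channel is defined by keywords that must ALL appear in a document.
    name: str
    keywords: frozenset[str]
    subscribers: list[Callback] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        return self.keywords <= tokenize(text)


class ChannelRegistry:
    def __init__(self) -> None:
        self.channels: dict[str, Channel] = {}

    def subscribe(self, name: str, keywords: list[str], callback: Callback) -> None:
        # The channel is created on first subscription; later subscribers reuse its keywords.
        if name not in self.channels:
            self.channels[name] = Channel(name, frozenset(k.lower() for k in keywords))
        self.channels[name].subscribers.append(callback)

    def publish(self, doc_id: str, text: str) -> list[str]:
        """Call when a new document is indexed; returns the names of the channels notified."""
        notified = []
        for channel in self.channels.values():
            if channel.matches(text):
                for callback in channel.subscribers:
                    callback(channel.name, doc_id)
                notified.append(channel.name)
        return notified


if __name__ == "__main__":
    registry = ChannelRegistry()
    registry.subscribe("energy-rules", ["energy", "regulation"],
                       lambda ch, doc: print(f"[{ch}] new document: {doc}"))
    registry.publish("doc-1", "New regulation on energy production")  # notifies the subscriber
    registry.publish("doc-2", "Stock market news")                    # no match, nothing printed
```

Matching on indexing (push) rather than having users re-run searches (pull) is what turns a search engine into a notification service; a production system would more likely match channels as stored queries against the index.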

  6. Technical Characteristics
     • Enhanced crawling capabilities (authentication, JavaScript processing, Web 2.0);
     • Distributed and scalable acquisition from Internet sources;
     • Enhanced text indexing (stemming, BM25 ranking, probabilistic search, …);
     • A highly configurable CMS portal (JSR-168-compatible portlets that can be registered in any legacy CMS);
     • Can scale up to millions of indexed documents.
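
BM25 is a standard probabilistic ranking function. The sketch below scores an in-memory collection with the common Okapi BM25 formula, using the usual defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) also used by Lucene. It illustrates the technique only and is not Capto's implementation; stemming is omitted (tokens are just lowercased words), and `BM25Index` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; no stemming in this sketch.
    return re.findall(r"\w+", text.lower())


class BM25Index:
    """Okapi BM25 over an in-memory collection (illustrative only)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.doc_len = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_len.values()) / self.n_docs if self.n_docs else 0.0
        # Document frequency: in how many documents each term occurs.
        self.df = Counter(term for counts in self.tf.values() for term in counts)

    def idf(self, term: str) -> float:
        # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
        n = self.df.get(term, 0)
        return math.log(1.0 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, doc_id: str, query: str) -> float:
        counts, dl = self.tf[doc_id], self.doc_len[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = counts.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and document-length normalisation (b).
            denom = f + self.k1 * (1.0 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1.0) / denom
        return total

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        scored = [(doc_id, self.score(doc_id, query)) for doc_id in self.tf]
        hits = [(doc_id, s) for doc_id, s in scored if s > 0.0]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:top_k]


if __name__ == "__main__":
    index = BM25Index({
        "a": "environmental impact of energy production",
        "b": "stock market news",
        "c": "energy energy regulation",
    })
    print(index.search("energy"))  # "c" ranks above "a"; "b" is not returned
```

Document "c" wins because it mentions the term twice in a short text, while the b parameter penalises the longer document "a"; k1 keeps repeated occurrences from inflating the score without bound.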

  7. Application domains
     • Data monitoring: finance, stock markets, …
     • Information monitoring and analysis (document repositories, news, web press, news feeds, blogs, e-mail, …)
     • Brand analysis (brand monitoring, sentiment analysis, …)
     • Massive text indexing and retrieval
     • … and, by and large, any domain where retrieving and analysing information creates new (and more useful) information.

  8. The architecture
     [Architecture diagram: domain-dependent and domain-independent components; sources include the WWW and external file systems, DBMSs, …]
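
The architecture diagram on this slide did not survive extraction; only its labels (domain dependent, domain independent, www, external file system, DBMS) remain. One hypothetical reading is a domain-independent acquisition layer that pulls documents from heterogeneous sources, followed by a domain-dependent relevance filter. The sketch below illustrates that split under this assumption: `Source`, `FileSystemSource`, `InMemorySource` and `acquire` are invented names, `InMemorySource` merely stands in for a web crawler or DBMS connector, and a single-machine thread pool stands in for truly distributed acquisition.

```python
from __future__ import annotations

import pathlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Document:
    source: str
    doc_id: str
    text: str


class Source(Protocol):
    """Hypothetical domain-independent acquisition interface."""

    name: str

    def fetch(self) -> Iterable[Document]: ...


class FileSystemSource:
    """Acquires every *.txt file below a root directory."""

    def __init__(self, name: str, root: str) -> None:
        self.name = name
        self.root = pathlib.Path(root)

    def fetch(self) -> Iterable[Document]:
        for path in sorted(self.root.rglob("*.txt")):
            rel = path.relative_to(self.root).as_posix()
            yield Document(self.name, rel, path.read_text(encoding="utf-8"))


class InMemorySource:
    """Placeholder for a web crawler or DBMS connector in this sketch."""

    def __init__(self, name: str, records: dict[str, str]) -> None:
        self.name = name
        self.records = records

    def fetch(self) -> Iterable[Document]:
        for doc_id, text in self.records.items():
            yield Document(self.name, doc_id, text)


def acquire(sources: list[Source], is_relevant: Callable[[Document], bool],
            max_workers: int = 4) -> list[Document]:
    """Fetch all sources concurrently (domain-independent step),
    then keep only documents accepted by the domain-dependent filter."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(lambda src: list(src.fetch()), sources))
    return [doc for batch in batches for doc in batch if is_relevant(doc)]


if __name__ == "__main__":
    sources = [InMemorySource("web", {"u1": "new energy decree", "u2": "sports results"})]
    for doc in acquire(sources, lambda d: "energy" in d.text):
        print(doc)
```

Keeping the relevance test as a plain callable is what lets the same acquisition code serve any domain: only `is_relevant` changes between, say, environmental regulation and brand monitoring.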

  9. PA case history: Edison
     The problem: monitoring Italian laws and regulations on the environmental impact of energy production.
     The solution:
     • Automatic acquisition from several national, regional, federal and local web portals;
     • A complete validation workflow;
     • Information precision: below 50% before (manual acquisition), about 100% after.
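
Assuming "information precision" is meant in the usual information-retrieval sense, it is the share of acquired documents that are actually relevant:

```latex
\text{precision} = \frac{\lvert \{\text{relevant documents}\} \cap \{\text{retrieved documents}\} \rvert}{\lvert \{\text{retrieved documents}\} \rvert}
```

For instance, with purely hypothetical figures, a manual process that collects 200 documents of which 90 are relevant has a precision of 90/200 = 45%, which falls in the "below 50%" range reported for manual acquisition. Precision says nothing about relevant documents that were missed; that is measured separately by recall.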

  10. Other products on the market
     Text indexing and ranking:
     • Apache Lucene (http://lucene.apache.org)
     • ClusterClick (www.clusterclick.com)
     • Amberfish (http://www.etymon.com/tr.html)
     • Terrier (http://ir.dcs.gla.ac.uk/terrier/)
     Document management:
     • OpenText (www.opentext.com)
     • SearchExpress (www.searchexpress.com)
     • IndexData (www.indexdata.com)
     • AutonomyVirage (www.virage.com)
     Internet information retrieval:
     • HtDig (www.htdig.org)

  11. Conclusions
     • Can be used to monitor Internet sources and acquire multimedia from them;
     • Can be used to index and retrieve textual information from any archived media;
     • Can be used to shorten the time-to-information;
     • Can be used to provide more precise information (and to map the information you already have);
     • Easy to adopt (low software adoption cost);
     • Domain-agnostic and multilingual.
