
Displaying the level of contentiousness of Wikipedia pages via a coloring scheme



  1. Displaying the level of contentiousness of Wikipedia pages via a coloring scheme
     http://www.wikitruthiness.com
     Katherine Baker, Aaron Miller, David Koenig, Cullen Walsh

  2. Aspirations v. Reality
     Goals: Total article content contention, determined from reversions, edit wars, and other indicators at a paragraph/sentence level.
     Final Results: Determine recent article contention on a sentence → word level by assigning scores based on content insertion, deletion, and modification.
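
The "Final Results" idea of scoring words by insertion, deletion, and modification can be illustrated with a small sketch. This is not the project's algorithm: the weights, the function name `churn_scores`, and the choice to key scores by word text rather than by position are all assumptions made for illustration, using only Python's standard `difflib`.

```python
import difflib
from collections import Counter

# Hypothetical per-operation weights; the slides do not give actual values.
WEIGHTS = {"insert": 1.0, "delete": 1.0, "replace": 2.0}

def churn_scores(revisions):
    """Score each word by how often it was inserted, deleted, or modified
    between consecutive revisions (list ordered oldest first)."""
    scores = Counter()
    for old, new in zip(revisions, revisions[1:]):
        a, b = old.split(), new.split()
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            # Every word touched on either side of the change gets the weight.
            for word in a[i1:i2] + b[j1:j2]:
                scores[word.lower()] += WEIGHTS[tag]
    return scores

# A tiny simulated edit war over a single word.
revisions = [
    "The policy was praised by critics",
    "The policy was condemned by critics",
    "The policy was praised by critics",
]
scores = churn_scores(revisions)
```

In this toy history only "praised" and "condemned" accumulate a score, so they are the words a coloring scheme would highlight.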

  3. Technical Overview
     [Flowchart. Node labels: Home; Search; Search Results; Choose Result; Have Cached Result? (Yes / No); Fetch Wikipedia Content; Compute Version Diffs; Analyze Diff Graph; Mark Up Content w/ Analysis Results; Refresh Cache; Display Result]
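
The cache branch of the flowchart can be sketched as a check-then-compute function. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, a plain dict stands in for the real cache, and the exact order of the flowchart steps is inferred from the node labels rather than stated in the slides.

```python
def get_marked_up_article(title, cache, fetch_revisions, analyze):
    """Return the analysis for `title`, computing it only on a cache miss."""
    if title in cache:                    # "Have Cached Result?" -> Yes
        return cache[title]               # Display Result
    revisions = fetch_revisions(title)    # Fetch Wikipedia Content
    result = analyze(revisions)           # Compute diffs, analyze, mark up
    cache[title] = result                 # Refresh Cache
    return result

# Stand-ins for the real scraper and analysis stages (hypothetical).
calls = []

def fake_fetch(title):
    calls.append(title)
    return ["revision one", "revision two"]

def fake_analyze(revisions):
    return {"revision_count": len(revisions)}

cache = {}
first = get_marked_up_article("Example", cache, fake_fetch, fake_analyze)
second = get_marked_up_article("Example", cache, fake_fetch, fake_analyze)
```

The second call is served from the cache, so the fetch stand-in runs only once.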

  4. Front End Details
     ● Search Utilizing Google Search API: initiate processing (Ruby on Rails)
     ● Wikipedia Scraper: fetch data for processing (Ruby on Rails)
     ● Render Output w/ MediaWiki API: display the results (Ruby on Rails)
     Work by Cullen Walsh

  5. Back End Details
     ● Difference Analysis: version differences graph (Python)
     ● Contention Identification: linear scaling (KDE approx.) (Python)
     Work by David Koenig
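
The slides do not say what "linear scaling (KDE approx.)" means in detail. One possible reading, offered purely as an assumption, is that each edit's score is spread over neighbouring positions with a linearly decaying weight, i.e. a triangular kernel standing in for a full kernel density estimate. The function name `smooth_linear` and the `radius` parameter are hypothetical.

```python
def smooth_linear(scores, radius=2):
    """Spread each position's score to its neighbours with a weight that
    falls off linearly with distance (a triangular kernel)."""
    n = len(scores)
    out = [0.0] * n
    for i, score in enumerate(scores):
        if score == 0:
            continue
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        for j in range(lo, hi):
            # Weight is 1 at the edit itself and reaches 0 just past `radius`.
            weight = 1.0 - abs(i - j) / (radius + 1)
            out[j] += score * weight
    return out

raw = [0, 0, 3, 0, 0, 0]          # one contentious word at index 2
smoothed = smooth_linear(raw, radius=2)
```

With this reading, words next to a heavily edited word receive a fainter color than the word itself, and words beyond the radius stay uncolored.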

  6. Middleware Details
     ● AWS
       ● S3 – Caching Results & Wikipedia Data
       ● EC2 – small instance for front end; high CPU instance for analysis
     ● MySQL
       ● Queuing requests, storing Wikipedia article versions (30 most recent)
     Work by David Koenig and Cullen Walsh

  7. Demonstration http://www.wikitruthiness.com/

  8. Experimental Methodology
     ● Compare against related work: WikiTrust
     ● WikiTrust highlights untrustworthy words in a Wikipedia article based on many parameters
     ● Compute precision and recall against WikiTrust
     ● True Positives = # of blocks that contain > 0 WikiTrust-highlighted words
     ● False Positives = # of blocks that do not contain any WikiTrust-highlighted words
     ● False Negatives = # of WikiTrust-highlighted words that are not within our blocks
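
The three counts above can be turned into precision and recall as follows. The slide does not spell out the formulas, so this sketch assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The data representation (blocks as word-index ranges, highlights as a set of word indices) and the function name `evaluate` are assumptions for illustration.

```python
def evaluate(blocks, highlighted):
    """blocks: list of (start, end) word-index ranges, end exclusive.
    highlighted: set of word indices flagged by the reference tool."""
    covered = set()
    tp = fp = 0
    for start, end in blocks:
        words = set(range(start, end))
        covered |= words
        if words & highlighted:
            tp += 1        # block contains at least one highlighted word
        else:
            fp += 1        # block contains no highlighted words
    fn = len(highlighted - covered)   # highlighted words outside every block
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: three colored blocks, four reference highlights.
blocks = [(0, 5), (10, 15), (20, 25)]
highlighted = {2, 3, 12, 30}
precision, recall = evaluate(blocks, highlighted)
```

Here two blocks contain highlighted words, one does not, and one highlighted word (index 30) falls outside every block. Note that, following the slide, true and false positives are counted in blocks while false negatives are counted in words.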

  9. Experimental Results
     Results of evaluating 33 articles:

     |         | Precision | Recall |
     |---------|-----------|--------|
     | Worst   | 10.84%    | 52.43% |
     | Average | 20.25%    | 68.93% |
     | Best    | 38.82%    | 79.37% |

     Work by Katherine Baker and Aaron Miller

  10. Challenges
     ● Getting the algorithm and coloring to work
     ● Obtaining cache coherency across memcached, S3, and MySQL
     ● Comparing data formats of WikiTrust and WikiTruthiness outputs
     ● Retrieving articles from Wikipedia in a timely fashion

  11. What We Learned
     ● Mixing technologies and getting them to interface is difficult
     ● Choosing your development language is important (e.g. Python is not always best)
     ● Limited version history to the 30 most recent revisions for speed; in production, would use more revisions
     ● Good evaluation requires significant time and effort, esp. when crawling and processing-intensive algorithms are involved

  12. Questions
     Email: {ajmiller,kbaker4,koenig,ckwalsh}@cs.washington.edu
