Scaling Java Based Recommendation Engine for Leading Q&A site - PowerPoint PPT Presentation


  1. Scaling Java Based Recommendation Engine for Leading Q&A site Anurag Gupta Senior Architect, Yahoo Answers anuragg@yahoo-inc.com

  2. Agenda • Yahoo Answers – A Snapshot • Why Open Question Recommender? • Technology – Requirements – Architecture – Design – Flow Diagrams – Science Models – Infrastructure – Results • Future

  3. Yahoo! Answers – A Snapshot • Total reach: 9 languages, 21 markets • Over 200M users and 2B page views each month • One of the top 40 sites globally in terms of traffic • 5th most popular social website • Over 300 million questions and 1 billion answers • Mobile growth 2X YoY, 80M users, 400M PVs • Source: Yahoo internal data

  4. Why Open Question Recommender? • Difficult for answerers to find relevant questions • Need to increase the number of answers per question • Increase signal-to-noise ratio • Route questions to “right” answerers

  5. Technology Requirements • Use answerer's interests for recommendations • Bucket (A/B test) aware • Avoid questions already answered by the answerer • A fraction of recommended open questions should be diverse • Time from submission to global availability under 1s • 90th percentile serving latency under 100ms • Diagnostics
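Three of the requirements above (use the answerer's interests, skip questions the answerer has already answered, keep a fraction of recommendations diverse) can be illustrated with a small Java sketch. The class name `RecommendationFilter`, its method names, and the idea of reserving a fixed share of slots for diverse questions are assumptions made for illustration; the slides do not describe how the production system does this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: merges interest-based and diverse candidates. */
final class RecommendationFilter {

    /**
     * Builds a list of at most `limit` question ids.
     * Roughly `diverseFraction` of the slots are reserved for diverse
     * candidates; questions the user has already answered are skipped.
     */
    static List<String> select(List<String> byInterest,
                               List<String> diverse,
                               Set<String> alreadyAnswered,
                               int limit,
                               double diverseFraction) {
        int diverseSlots = (int) Math.floor(limit * diverseFraction);
        int interestSlots = limit - diverseSlots;

        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> picked = new LinkedHashSet<>();
        fill(picked, byInterest, alreadyAnswered, interestSlots);
        fill(picked, diverse, alreadyAnswered, limit);
        // Back-fill with interest-based candidates if diverse ones ran out.
        fill(picked, byInterest, alreadyAnswered, limit);
        return new ArrayList<>(picked);
    }

    /** Adds unanswered candidates in order until `picked` reaches `target` size. */
    private static void fill(Set<String> picked, List<String> candidates,
                             Set<String> alreadyAnswered, int target) {
        for (String id : candidates) {
            if (picked.size() >= target) {
                return;
            }
            if (!alreadyAnswered.contains(id)) {
                picked.add(id);
            }
        }
    }
}
```

With `limit = 4` and `diverseFraction = 0.25`, three slots go to interest-based questions and one to a diverse question; an already-answered id is never returned.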

  6. Architecture [Diagram. Components shown: Users, Spam, Abuse, Security, Junk Detector, Front End, Caching, Yapache / Maple, Karma, Applications, API / YQL, Middle Tier, Tools, Java, Tomcat, Recommender Systems, Customer User Care, Question Database, Recommender, Editorial, SocDir, Related Questions, Analytics, Storage, Tooldev, AutoCat, Oracle, NoSQL, Search, Hadoop Grid]

  7. Open Question Recommender Design [Diagram. Components shown: Front-End, Middle-tier, Open Question Recommender Engine, Data Store R/W APIs, Search APIs, Mapping Algorithms, Machine Topics Model, ActiveMQ, Cache, Oracle, NoSQL. Mappings shown: User → list of machine topics, Question → list of machine topics, Machine Topic → list of questions]
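The three mappings named in the design (user to machine topics, question to machine topics, machine topic to questions) suggest a lookup along the following lines. This is a minimal sketch: in-memory maps stand in for the NoSQL store, and `TopicIndex` and its methods are hypothetical names, not the actual Yahoo Answers API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the topic mappings (assumed layout, not the real schema). */
final class TopicIndex {
    private final Map<String, List<String>> userToTopics = new HashMap<>();
    private final Map<String, List<String>> questionToTopics = new HashMap<>();
    private final Map<String, List<String>> topicToQuestions = new HashMap<>();

    /** Records the machine topics inferred for a user. */
    void putUser(String userId, List<String> topics) {
        userToTopics.put(userId, List.copyOf(topics));
    }

    /** Indexes a new open question under each of its machine topics. */
    void putQuestion(String questionId, List<String> topics) {
        questionToTopics.put(questionId, List.copyOf(topics));
        for (String topic : topics) {
            topicToQuestions.computeIfAbsent(topic, t -> new ArrayList<>()).add(questionId);
        }
    }

    /** Machine topics of a question, or an empty list if the question is unknown. */
    List<String> topicsOf(String questionId) {
        return questionToTopics.getOrDefault(questionId, List.of());
    }

    /** Candidate questions for a user: the union over the user's topics, in topic order. */
    List<String> candidatesFor(String userId) {
        Set<String> result = new LinkedHashSet<>();
        for (String topic : userToTopics.getOrDefault(userId, List.of())) {
            result.addAll(topicToQuestions.getOrDefault(topic, List.of()));
        }
        return new ArrayList<>(result);
    }
}
```

Keeping the inverted topic-to-questions map up to date at write time means serving a recommendation needs only two lookups per topic, which fits a tight serving-latency budget.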

  8. Information Flow in Open Question Recommender

  9. Sequence Flows

  10. Enable Users to Discover Relevant Content • Science model to infer user interest – Objective is to surface most relevant open questions for answerers – Model per top level Answers category – Clusters based on selectivity of words – Questions / users mapped to clusters – Relevant questions surfaced based on answerer’s affinity to cluster • Success Metric – Average number of answers per question
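Surfacing questions by an answerer's affinity to clusters could be sketched as below. The slides do not state the scoring function, so the dot product of a user's cluster affinities with a question's cluster weights is an assumption, as are the class name `AffinityRanker` and the example weights.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical ranker: orders questions by user-to-cluster affinity. */
final class AffinityRanker {

    /** Dot product of two sparse cluster-weight vectors. */
    static double score(Map<String, Double> userAffinity,
                        Map<String, Double> questionClusters) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : questionClusters.entrySet()) {
            sum += e.getValue() * userAffinity.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    /** Returns question ids sorted by descending score. */
    static List<String> rank(Map<String, Double> userAffinity,
                             Map<String, Map<String, Double>> questions) {
        List<String> ids = new ArrayList<>(questions.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> score(userAffinity, questions.get(id))).reversed());
        return ids;
    }
}
```

For a user with affinities `{cooking: 0.8, cars: 0.2}`, a question weighted entirely toward `cooking` ranks above one split evenly between the two clusters, which in turn ranks above a pure `cars` question.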

  11. Science Models • LDA (Latent Dirichlet Allocation) • TF-IDF (Term Frequency, Inverse Document Frequency) • Answers Category • Diversity • Decay
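Two of the listed models have compact textbook forms, sketched here in Java. The TF-IDF variant (term count divided by document length, times the natural log of corpus size over document frequency) is one common definition among several, and the exponential half-life form of decay is an assumption; the slides do not say which variants were used. `ScienceModelSketch` is a hypothetical name.

```java
import java.util.List;

/** Illustrative formulas only; not the models used in production. */
final class ScienceModelSketch {

    /** tf-idf with tf = count / document length and idf = ln(N / df). */
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        long count = doc.stream().filter(term::equals).count();
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        if (count == 0 || docsWithTerm == 0) {
            return 0.0;
        }
        double tf = (double) count / doc.size();
        double idf = Math.log((double) corpus.size() / docsWithTerm);
        return tf * idf;
    }

    /** Exponential decay: the score halves every `halfLifeHours` (assumed form). */
    static double decayed(double score, double ageHours, double halfLifeHours) {
        return score * Math.pow(0.5, ageHours / halfLifeHours);
    }
}
```

A term that appears in every document gets an idf of zero, so only selective words contribute to a score; decay then lowers the score of older open questions so that fresh ones surface first.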

  12. User Representation Model

  13. Infrastructure • Bucketing / Analytics infrastructure – Evaluate if differences in science algorithms, UI are statistically significant • ActiveMQ – Publish / subscribe system that decouples serving from peripheral systems
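One standard way to check whether a click-through-rate difference between two buckets is statistically significant is a two-proportion z-test, sketched below. The slides do not say which test the bucketing infrastructure applies, so the choice of test, the 5% threshold, and the name `BucketSignificance` are all assumptions.

```java
/** Hypothetical significance check for comparing two A/B buckets. */
final class BucketSignificance {

    /** z statistic for the difference between two click-through rates (pooled variance). */
    static double zScore(long clicksA, long viewsA, long clicksB, long viewsB) {
        double pA = (double) clicksA / viewsA;
        double pB = (double) clicksB / viewsB;
        double pooled = (double) (clicksA + clicksB) / (viewsA + viewsB);
        double stdErr = Math.sqrt(pooled * (1.0 - pooled) * (1.0 / viewsA + 1.0 / viewsB));
        return (pB - pA) / stdErr;
    }

    /** True when |z| exceeds 1.96, i.e. p < 0.05 for a two-sided test. */
    static boolean significantAt5Percent(long clicksA, long viewsA,
                                         long clicksB, long viewsB) {
        return Math.abs(zScore(clicksA, viewsA, clicksB, viewsB)) > 1.96;
    }
}
```

With made-up counts of 100 clicks in 1,000 views for the control bucket and 150 in 1,000 for the test bucket, the z statistic is about 3.4 and the difference counts as significant; 100 versus 105 clicks on the same traffic does not.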

  14. Results • Global CTR increased by ~50% • US CTR increased by ~3X • 4% growth in answers / user • 90th percentile serving latencies below 100ms

  15. Sample Verbatim Feedback from Users • “this is a good idea i like it.” • “This is a cool idea.” • “Good idea.” • “Just now I saw this feature enabled for me. I liked it. Actually, when I came back from office, I will go to unanswered question or popular questions. Because I can’t check all the questions. This recommended feature is pretty nice.” • “I find this feature quite handy. Also this brings up qusetions which I would not have known ever existed for me to answer.”

  16. Future • Re-architect front-end for mobile – HTML5 / JS / CSS – Driven off APIs • Re-architect back-end – Scale across multiple colos – Reduce latency by caching content in colos closer to user • Discovery
