Scaling Java Based Recommendation Engine for Leading Q&A site



SLIDE 1

Scaling Java Based Recommendation Engine for Leading Q&A site

Anurag Gupta Senior Architect, Yahoo Answers anuragg@yahoo-inc.com

SLIDE 2

Agenda

  • Yahoo Answers – A Snapshot
  • Why Open Question Recommender?
  • Technology

– Requirements – Architecture – Design – Flow Diagrams – Science Models – Infrastructure – Results

  • Future
SLIDE 3

Yahoo! Answers – A Snapshot

  • Total reach: 9 languages, 21 markets
  • Over 200M users and 2B page views each month
  • One of the top 40 sites globally in terms of traffic
  • 5th most popular social website
  • Over 300 million questions and 1 billion answers
  • Mobile growth 2X YOY, 80M users, 400M PVs

Source: Yahoo Internal Data
SLIDE 4

Why Open Question Recommender?

  • Difficult for answerers to find relevant questions
  • Need to increase number of answers / question
  • Increase signal to noise ratio
  • Route questions to “right” answerers
SLIDE 5

Technology Requirements

  • Use answerer's interests for recommendations
  • Bucket (A/B Test) aware
  • Avoid questions already answered by answerer
  • A fraction of the recommended open questions is diverse
  • A submitted question becomes globally available in less than 1s
  • 90th percentile serving latency less than 100ms
  • Diagnostics
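
Two of the requirements above (bucket awareness and skipping questions the answerer has already answered) can be sketched in a few lines of Java. This is a minimal illustration, not the production code: the class `RequirementFilters`, its method names, and the hash-based bucket assignment are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the names and the hashing scheme are illustrative only.
class RequirementFilters {

    // Deterministically map a user to one of bucketCount A/B buckets,
    // so the same user always sees the same experiment variant.
    static int bucketFor(String userId, int bucketCount) {
        return Math.floorMod(userId.hashCode(), bucketCount);
    }

    // Drop candidates the answerer has already answered, preserving order.
    static List<String> removeAnswered(List<String> candidates, Set<String> answeredByUser) {
        List<String> result = new ArrayList<>();
        for (String questionId : candidates) {
            if (!answeredByUser.contains(questionId)) {
                result.add(questionId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> answered = new HashSet<>(Arrays.asList("q2"));
        System.out.println(removeAnswered(Arrays.asList("q1", "q2", "q3"), answered));
        System.out.println(bucketFor("user-42", 10));
    }
}
```

Hashing the user id keeps bucket assignment stateless, which is one simple way to make every serving host agree on a user's bucket.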
SLIDE 6

Architecture

[Architecture diagram. Recoverable labels: Front End (Yapache / Maple); Middle Tier (Java, Tomcat); Caching; API / YQL; Security; Spam; Abuse; Junk Detector; Recommender Systems; Question Recommender; Related Questions; AutoCat; User Database; SocDir; Customer Care; Editorial; Tools; Karma; Users; Applications; Hadoop Grid; Storage; Search; Oracle; NoSQL; Analytics; Tooldev.]

SLIDE 7

Open Question Recommender Design

[Design diagram. Recoverable labels: Front-End; Middle-tier; ActiveMQ; Open Question Recommender Engine; Mapping Algorithms; Machine Topics Model; Data Store; NoSQL R/W APIs; Search APIs; Oracle; NoSQL; Cache.]

Mappings shown in the diagram:

  • User -> list of machine topics
  • Question -> list of machine topics
  • Machine Topic -> list of questions
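
The three mappings in the design (user to machine topics, question to machine topics, machine topic to questions) suggest an inverted-index style lookup. The sketch below keeps them in in-memory maps purely for illustration; the class `TopicIndex` and its methods are hypothetical, and the real engine is backed by the NoSQL store and cache named in the diagram.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the engine's data store.
class TopicIndex {
    private final Map<String, List<String>> userTopics = new HashMap<>();
    private final Map<String, List<String>> questionTopics = new HashMap<>();
    private final Map<String, Set<String>> topicQuestions = new HashMap<>();

    // User -> list of machine topics.
    void putUser(String userId, List<String> topics) {
        userTopics.put(userId, topics);
    }

    // Question -> list of machine topics, plus the inverted
    // Machine Topic -> list of questions mapping.
    void putQuestion(String questionId, List<String> topics) {
        questionTopics.put(questionId, topics);
        for (String topic : topics) {
            topicQuestions.computeIfAbsent(topic, k -> new LinkedHashSet<>()).add(questionId);
        }
    }

    // Candidate questions: union of the questions filed under each of the
    // user's topics, de-duplicated, in topic order then insertion order.
    List<String> candidatesFor(String userId) {
        Set<String> seen = new LinkedHashSet<>();
        for (String topic : userTopics.getOrDefault(userId, Collections.emptyList())) {
            seen.addAll(topicQuestions.getOrDefault(topic, Collections.emptySet()));
        }
        return new ArrayList<>(seen);
    }
}
```

Serving then needs only two key lookups per request (user to topics, topic to questions), a shape that fits a key-value store.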

SLIDE 8

Information Flow in Open Question Recommender

SLIDE 9

Sequence Flows

SLIDE 10

Enable Users to Discover Relevant Content

  • Science model to infer user interest

– Objective is to surface most relevant open questions for answerers – Model per top level Answers category – Clusters based on selectivity of words – Questions / users mapped to clusters – Relevant questions surfaced based on answerer’s affinity to cluster

  • Success Metric

– Average number of answers per question
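
One way to read "relevant questions surfaced based on answerer's affinity to cluster" is as a weighted overlap between a user's cluster affinities and a question's cluster weights. The sketch below assumes exactly that; the dot-product score, the class `AffinityRanker`, and the tie-break by question id are illustrative choices, not details from the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ranker: scores a question by the user's affinity to its clusters.
class AffinityRanker {

    // Sum over the question's clusters of (user affinity * question weight).
    static double score(Map<String, Double> userAffinity, Map<String, Double> questionClusters) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : questionClusters.entrySet()) {
            total += userAffinity.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }

    // Return up to `limit` question ids, highest score first (ties broken by id).
    static List<String> rank(Map<String, Double> userAffinity,
                             Map<String, Map<String, Double>> questions,
                             int limit) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> q : questions.entrySet()) {
            scores.put(q.getKey(), score(userAffinity, q.getValue()));
        }
        List<String> ids = new ArrayList<>(questions.keySet());
        ids.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(limit, ids.size())));
    }
}
```

For example, a user with affinities {cooking: 0.8, cars: 0.2} would see a pure cooking question ranked above a mixed cooking/cars question, which in turn ranks above a pure cars question.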

SLIDE 11

Science Models

  • LDA (Latent Dirichlet Allocation)
  • TFIDF (Term Frequency Inverse Document Frequency)
  • Answers Category
  • Diversity
  • Decay
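
Two of the listed signals are simple enough to sketch directly. The block below uses the textbook TF-IDF weighting, tf(t, d) * ln(N / df(t)), and an exponential half-life decay. The slides do not say which TF-IDF variant or decay curve was used, so both formulas, the class `ScienceModelSketch`, and the half-life parameter are assumptions. LDA and diversity are not shown.

```java
import java.util.List;

// Hypothetical helpers; formulas are common textbook forms, not the talk's.
class ScienceModelSketch {

    // TF-IDF of `term` in `doc`: (occurrences / doc length) * ln(N / df),
    // where N is the corpus size and df the number of documents containing the term.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        long count = doc.stream().filter(term::equals).count();
        if (count == 0) {
            return 0.0;
        }
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        if (df == 0) {
            return 0.0; // term absent from the corpus; avoid division by zero
        }
        double tf = (double) count / doc.size();
        double idf = Math.log((double) corpus.size() / df);
        return tf * idf;
    }

    // Exponential decay: the score halves every `halfLifeHours` hours of age.
    static double decayed(double score, double ageHours, double halfLifeHours) {
        return score * Math.pow(0.5, ageHours / halfLifeHours);
    }
}
```

With this weighting, a word that appears in few questions scores higher than one that appears in many, which matches the "selectivity of words" idea on the previous slide.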
SLIDE 12

User Representation Model

SLIDE 13

Infrastructure

  • Bucketing / Analytics infrastructure

– Evaluate if differences in science algorithms, UI are statistically significant

  • ActiveMQ

– Publish / subscribe system that decouples serving from peripheral systems
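
The slides do not say which statistical test the bucketing infrastructure applies. A common choice for comparing click-through rates between two buckets is the two-proportion z-test, sketched below; the class `BucketSignificance`, the 95% threshold, and the sample counts in the usage note are all assumptions.

```java
// Hypothetical significance check for two A/B buckets.
class BucketSignificance {

    // Two-proportion z-score with a pooled standard error:
    // z = (pB - pA) / sqrt(p * (1 - p) * (1/nA + 1/nB)), p = pooled click rate.
    static double zScore(long clicksA, long viewsA, long clicksB, long viewsB) {
        double pA = (double) clicksA / viewsA;
        double pB = (double) clicksB / viewsB;
        double pooled = (double) (clicksA + clicksB) / (viewsA + viewsB);
        double standardError = Math.sqrt(pooled * (1.0 - pooled) * (1.0 / viewsA + 1.0 / viewsB));
        return (pB - pA) / standardError;
    }

    // Two-sided test at the 95% level (|z| > 1.96).
    static boolean significantAt95(double z) {
        return Math.abs(z) > 1.96;
    }
}
```

As a made-up example, 100 clicks on 10,000 views in the control bucket against 150 clicks on 10,000 views in the test bucket gives a z-score a little above 3, so that difference would count as significant at the 95% level.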

SLIDE 14

Results

  • Global CTR increased by ~50%
  • US CTR increased by ~3X
  • 4% growth in answers / user
  • 90th percentile serving latencies below 100ms
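
For readers unfamiliar with the latency metric, a 90th percentile can be computed from raw samples with the nearest-rank method, as sketched below. The class `LatencyPercentile` is hypothetical, and the talk does not describe how latencies were actually measured or aggregated.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over latency samples.
class LatencyPercentile {

    // Returns the smallest sample such that at least p percent of all
    // samples are less than or equal to it (nearest-rank definition).
    static long percentile(long[] samplesMs, int p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p < 1 || p > 100) {
            throw new IllegalArgumentException("p must be in 1..100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

A 90th percentile below 100ms means that at least 90% of requests completed in under 100ms, while the slowest 10% may have taken longer.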
SLIDE 15
SLIDE 16

Sample Verbatim Feedback from Users

  • “this is a good idea i like it.”
  • “This is a cool idea.”
  • “Good idea.”
  • “Just now I saw this feature enabled for me. I liked it. Actually, when I came back from office, I will go to unanswered question or popular questions. Because I can’t check all the questions. This recommended feature is pretty nice.”
  • “I find this feature quite handy. Also this brings up qusetions which I would not have known ever existed for me to answer.”

SLIDE 17

Future

  • Re-architect front-end for mobile

– HTML5 / JS / CSS – Driven off APIs

  • Re-architect back-end

– Scale across multiple colos – Reduce latency by caching content in colos closer to user

  • Discovery
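
The slide only states the goal of caching content in colos closer to the user. As one possible shape for such a per-colo cache, the sketch below builds a small LRU cache on `java.util.LinkedHashMap`; the class `ColoCache`, the LRU eviction policy, and the fixed capacity are assumptions, not part of the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-colo cache with least-recently-used eviction.
class ColoCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    ColoCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

This class is not thread-safe; a serving tier would need to wrap it (for example with `Collections.synchronizedMap`) or use a purpose-built concurrent cache.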