Deal Personalization Systems @ Groupon, Ameya Kanitkar

Deal Personalization Systems @ Groupon
Ameya Kanitkar
ameya@groupon.com

Relevance & Personalization Systems @ Groupon

What are Groupon Deals?

Our Relevance Scenario
How do we surface relevant deals to users?
• Deals are perishable (deals expire or sell out)
• No direct user intent (unlike traditional search advertising)
• Relatively limited user information
• Deals are highly local
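Because deals are perishable and local, any ranking has to start from a candidate set that excludes expired or sold-out deals and deals far from the user. The sketch below is hypothetical: the `Deal` fields, the `eligible_deals` helper and the 25 km default radius are illustrative assumptions, not Groupon's actual filter. It uses the standard haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Deal:
    # Hypothetical deal record; field names are assumptions for illustration.
    deal_id: str
    expires_at: datetime  # timezone-aware expiry time
    units_left: int       # 0 means sold out
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def eligible_deals(deals, user_lat, user_lon, now, max_km=25.0):
    """Drop expired or sold-out deals, then keep only deals near the user."""
    return [
        d for d in deals
        if d.expires_at > now
        and d.units_left > 0
        and haversine_km(user_lat, user_lon, d.lat, d.lon) <= max_km
    ]
```

Only the survivors of this cheap filter would then be scored, which keeps the expensive relevance model off deals that can no longer be bought.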

Two Sides to the Relevance Problem
• Algorithmic issues: how to find relevant deals for individual users, given a set of optimization criteria
• Scaling issues: how to handle relevance for all users across multiple delivery platforms

Developing Deal Ranking Algorithms
• Exploring data: understanding signals, finding patterns
• Building models/heuristics: employ both classical machine learning techniques and heuristic adjustments to estimate user purchasing behavior
• Conducting experiments: try out ideas on real users and evaluate their effect
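One way to read "classical machine learning plus heuristic adjustments" is a model that estimates purchase probability, followed by multiplicative business rules, with experiments run on deterministic user buckets. The sketch below assumes a logistic-regression style model; the weights, feature names, adjustment factors and `experiment_bucket` helper are all hypothetical, not the presenter's actual model.

```python
import hashlib
import math

# Hypothetical weights; a real model would be learned from purchase logs.
WEIGHTS = {"category_affinity": 2.0, "past_purchases_in_category": 0.8, "distance_km": -0.05}
BIAS = -3.0


def purchase_probability(features):
    """Logistic-regression style estimate of P(purchase | user, deal)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def adjusted_score(features, already_emailed, hours_to_expiry):
    """Apply heuristic adjustments on top of the model estimate."""
    score = purchase_probability(features)
    if already_emailed:
        score *= 0.5  # hypothetical: damp deals the user has already been sent
    if hours_to_expiry < 24:
        score *= 1.2  # hypothetical: nudge deals that are about to expire
    return score


def rank(candidates):
    """candidates: list of (deal_id, features, already_emailed, hours_to_expiry)."""
    scored = [(adjusted_score(f, e, h), d) for d, f, e, h in candidates]
    return [d for _, d in sorted(scored, reverse=True)]


def experiment_bucket(user_id, experiment, treatment_pct=10):
    """Deterministically assign a user to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"
```

Hashing the experiment name together with the user ID keeps each user's assignment stable across requests while giving independent splits for different experiments.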

Data Infrastructure
[Chart: Growing Deals and Growing Users, 2011-2013; labels shown: 20+, 400+, 2000+]
• 100 million+ subscribers
• We need to store data such as user click history, email records and service logs. This amounts to billions of data points and terabytes of data.

Deal Personalization Infrastructure Use Cases
• Deliver personalized emails (offline system): personalize billions of emails for hundreds of millions of users
• Deliver a personalized website & mobile experience (online system): personalize one of the most popular e-commerce mobile & web apps for hundreds of millions of users & page views

Earlier System
[Diagram: Email Offline Personalization (Map/Reduce); Online Deal Personalization API; MySQL Store; Data Pipeline (user logs, email records, user history, etc.)]

Earlier System
• Scaling MySQL for data such as user click history and email records was painful unless we sharded the data
• We needed to maintain two separate data pipelines for essentially the same data

Ideal System
• A common data store that serves data to both online and offline systems
• A data store that scales to hundreds of millions of records
• A data store that plays well with our existing Hadoop-based systems
• A data store that supports get()/put() access patterns based on a key (User ID)
[Diagram: Email Offline Personalization (Map/Reduce); Online Deal Personalization API; Ideal Data Store; Data Pipeline]
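The requirements boil down to one key-addressed interface shared by the offline and online paths. The sketch below states that access pattern as a Python `Protocol`; `UserKeyedStore`, `InMemoryStore`, the `recs:email` column and both job functions are hypothetical stand-ins, not HBase's client API.

```python
from typing import Dict, Optional, Protocol


class UserKeyedStore(Protocol):
    """Access pattern the ideal store must support: point get/put by User ID."""

    def put(self, user_id: str, column: str, value: bytes) -> None: ...

    def get(self, user_id: str) -> Optional[Dict[str, bytes]]: ...


class InMemoryStore:
    """Toy stand-in used only to illustrate the access pattern."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, user_id: str, column: str, value: bytes) -> None:
        self._rows.setdefault(user_id, {})[column] = value

    def get(self, user_id: str) -> Optional[Dict[str, bytes]]:
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None


def offline_email_job(store: UserKeyedStore, recs: Dict[str, bytes]) -> None:
    """Offline side: a batch job writes per-user recommendations."""
    for user_id, payload in recs.items():
        store.put(user_id, "recs:email", payload)


def online_api_lookup(store: UserKeyedStore, user_id: str) -> Optional[bytes]:
    """Online side: the personalization API reads the same row by key."""
    row = store.get(user_id)
    return row.get("recs:email") if row else None
```

Both sides depend only on the interface, so one pipeline can feed a single store that serves batch email generation and live API lookups alike.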

Why HBase?
• Open-source distributed map data store modeled after Google's Bigtable
• Distributed data store: store data on a 1-700 node cluster. Linear scaling: add capacity by adding more machines.
• Very light schema. Each row may have any number of columns, and columns need not be defined upfront. (Something like: Row1 -> Map<byte[], byte[]>)

Why HBase?
• Consistent database. Highly available. Automatically shards/scales. Can scale to billions of rows and multi-terabyte data sizes
• Writes: 1-10 ms; reads: 20-50 ms
• Tight out-of-the-box integration with Hadoop and MapReduce

HBase Table
Row     Cf:<qual>   Cf:<qual>    ...   Cf:<qual>
row1    Cf1:qual1   Cf1:qual2
row11   Cf1:qual2   Cf1:qual22   Cf1:qual3
row2    Cf2:qual1
rowN
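The table lists row11 between row1 and row2 because HBase keeps rows sorted by the raw bytes of the row key, and each row is a sparse map from family:qualifier to value. The toy class below mimics that ordering and a start/stop range scan; it is a simplified in-memory model (no cell timestamps or versions), and `ToyTable` is a hypothetical name rather than an HBase class.

```python
import bisect
from typing import Dict, List, Tuple


class ToyTable:
    """Rows kept sorted by raw row-key bytes, as HBase orders them.

    Each row maps b"family:qualifier" -> value; rows may have any columns.
    """

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._rows: Dict[bytes, Dict[bytes, bytes]] = {}

    def put(self, row: bytes, family: bytes, qualifier: bytes, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._keys, row)  # keep keys in byte order
            self._rows[row] = {}
        self._rows[row][family + b":" + qualifier] = value

    def get(self, row: bytes) -> Dict[bytes, bytes]:
        return dict(self._rows.get(row, {}))

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, Dict[bytes, bytes]]]:
        """Rows with start <= key < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]
```

Scanning from b"row1" up to b"row2" returns row1 and row11, which is why row-key design matters: keys that share a prefix end up next to each other.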

Architecture Options
[Diagram: Data Pipeline; Email Offline Personalization (Map/Reduce); Online Personalization; a single HBase System]

Architecture Options
Pros
• Simple design
• Consolidated system that serves both online and offline personalization

Architecture Options
Cons
• We now have the same uptime SLA on both the offline and online systems
• Maintaining the online latency SLA during bulk writes and bulk reads is hard
And here is why…

Architecture
• We can now maintain different SLAs on the online and offline systems
• We can tune each HBase cluster differently for its online or offline workload
[Diagram: Data Pipeline; Email Relevance (Map/Reduce) on the HBase Offline System; replication from the HBase Offline System to the HBase for Online System, which serves Real Time Relevance]

HBase Schema Design
Row key: User ID (unique identifier for users)
Column Family 1: user history and profile information; overwrite user history and profile info
Column Family 2: email history for users; append email history for each day as a separate column (on average each row has over 200 columns)
• Most of our data access patterns are via the "User Key"
• This makes it easy to design the HBase schema
• The actual data is kept in JSON
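The two column families behave differently on write: family 1 holds one cell that is overwritten, while family 2 gains a new column per day. The sketch models a row as a dict of "family:qualifier" cells holding JSON strings. The family names `p` and `e`, the `history` qualifier and the ISO-date qualifiers are assumptions; the slide names the families only "Column Family 1" and "Column Family 2".

```python
import json
from datetime import date
from typing import Dict

# Hypothetical family names; the slide only says "Column Family 1/2".
PROFILE_CF = "p"
EMAIL_CF = "e"

Row = Dict[str, str]


def profile_put(user_id: str, history: dict) -> Dict[str, Row]:
    """Mutation that overwrites the single profile/history cell."""
    return {user_id: {f"{PROFILE_CF}:history": json.dumps(history, sort_keys=True)}}


def email_put(user_id: str, day: date, deals_sent: list) -> Dict[str, Row]:
    """Mutation that adds one new column per day of email history."""
    return {user_id: {f"{EMAIL_CF}:{day.isoformat()}": json.dumps(deals_sent)}}


def apply(table: Dict[str, Row], mutation: Dict[str, Row]) -> None:
    """Merge a mutation into its row: same column overwrites, new column appends."""
    for row_key, cells in mutation.items():
        table.setdefault(row_key, {}).update(cells)
```

ISO-formatted dates used as qualifiers sort lexicographically in chronological order, so a row's daily email columns come back in date order.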

Cluster Sizing
• Machine profile
  • 96 GB RAM (HBase 25 GB)
  • 24 virtual CPU cores
  • 8 × 2 TB disks
• Data profile
  • 100 million+ records
  • 2 TB+ data
  • Over 4.2 billion data points
[Diagram: Hadoop + HBase cluster (a 100+ machine Hadoop cluster that runs heavy MapReduce jobs; the same cluster also hosts a 15-node HBase cluster), replicating via HBase replication to the Online HBase cluster (a 10-machine dedicated HBase cluster that serves the real-time SLA)]
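A quick back-of-the-envelope check relates the data profile to the machine profile. The sketch uses the slide's figures, which are all lower bounds. It assumes decimal terabytes, data spread evenly over the 15 HBase nodes, and the HDFS default replication factor of 3. It also treats the 2 TB as the stored size; whether that figure is before or after compression is not stated.

```python
def sizing(records=100_000_000, data_tb=2.0, nodes=15, hdfs_replication=3,
           disks_per_node=8, disk_tb=2.0, ram_gb=96, heap_gb=25):
    """Rough per-record and per-node figures from the slide's cluster profile."""
    data_bytes = data_tb * 10**12  # decimal TB assumed
    return {
        # Average stored size of one user row.
        "avg_bytes_per_record": data_bytes / records,
        # Replicated data per node, assuming an even spread across the cluster.
        "replicated_tb_per_node": data_tb * hdfs_replication / nodes,
        # Raw disk capacity per machine.
        "raw_disk_tb_per_node": disks_per_node * disk_tb,
        # Share of machine RAM given to the HBase heap.
        "heap_fraction_of_ram": heap_gb / ram_gb,
    }
```

With the defaults this works out to about 20 KB per user row and roughly 0.4 TB of replicated data per node against 16 TB of raw disk. The HBase heap takes about 26% of RAM, leaving the rest to the co-located Hadoop jobs and the OS page cache.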

Other Takeaways
• Choose the data storage format carefully. (We are using JSON, but one can consider Avro, Protobufs, etc.)
• Always store compressed data. We use LZO; it's easy to process with MapReduce.
• Always store processed data in HBase.
• HBase needs some tuning before it scales. Tuning garbage collection is important, as are various timeouts and caching parameters; the cluster can be unstable without this tuning.
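To see why compression pays off for JSON cells, the sketch serializes a record to compact JSON and compresses it before storage. LZO is not in the Python standard library, so this swaps in `zlib` purely for illustration. The helper names `encode_cell` and `decode_cell` are hypothetical. HBase can also compress at the column-family level, so client-side compression is only one of several options.

```python
import json
import zlib


def encode_cell(record: dict) -> bytes:
    """Serialize to compact JSON, then compress before writing the cell."""
    raw = json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 6)  # zlib stands in for LZO here


def decode_cell(blob: bytes) -> dict:
    """Reverse of encode_cell: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Example: a click history with many repeated keys and values.
history = {"clicks": [{"deal_id": f"deal-{i}", "category": "restaurants"} for i in range(200)]}
blob = encode_cell(history)
```

JSON repeats field names in every record, so it compresses well. A schema-based format such as Avro or Protobuf avoids much of that repetition in the first place.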

Questions?
Thanks!
ameya@groupon.com
www.groupon.com/techjobs
