GoodReads Book Recommendation Service - PowerPoint presentation by Yijun Tian, Vicky Bai, Zeynep Doganata
SLIDE 1

GoodReads Book Recommendation Service

Yijun Tian, Vicky Bai, Zeynep Doganata

SLIDE 2

Introduction/Related Work

  • “A subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item” -- Wikipedia
  • Recommendation systems drive significant engagement and revenue for companies such as Amazon, Netflix, and Goodreads.
  • Approaches: Collaborative filtering, Content-based filtering, Contextual filtering, Social and demographic filtering
  • Techniques: Supervised Learning, Clustering/Unsupervised Learning, Transfer Learning, Text Classification, Text Embedding

SLIDE 3

Data - GoodReads

Datasets:

  • Meta-Data of Books (2.36M books);
  • User-Book Interactions (229M user-book interactions);
  • Book Review Texts (15M records).

[Architecture diagram: Book Information (book id, title, description), User Behavior (rating, is_read), and User Information (user id, user’s shelf) feed into the Recommendation Engine, which returns a set of books.]
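Given the 229M-row interactions file, streaming is the practical way to touch the data. A minimal sketch, assuming the gzipped JSON-lines layout (one record per line) of the public GoodReads dumps; the function name and `limit` parameter are ours:

```python
import gzip
import json

def iter_records(path, limit=None):
    """Stream records from a gzipped JSON-lines dump without loading it all."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            yield json.loads(line)
```

Because it yields one record at a time, the 229M-interaction file can be filtered or aggregated in constant memory.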

SLIDE 4

Pipeline

  • Input: a book ID
  • Output: the most similar books, including IDs, titles, and descriptions

[Pipeline diagram: Extract Similar Books yields Ground Truth Similar Books and Reader-based Similar Books as pairs (A, Similar Book B1) … (A, Similar Book Bn); InferSent embeds Book A and each similar book, a similarity score is calculated between the embeddings, and the model returns the most similar books.]
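The “Calculate Similarity Score” step reduces to cosine similarity between embedding vectors. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def rank_similar(query_vec, candidate_vecs):
    """Rank candidates by cosine similarity to the query, best first.

    Normalizing both sides turns cosine similarity into a dot product.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)          # descending similarity
    return order, scores[order]
```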

SLIDE 5

Extract Similar Books

  • Ground Truth Similar Books
    ○ Provided in the GoodReads dataset. However, we don’t know how the similar books were generated (e.g. same series, topic, or author?).
  • Reader-based Similar Books
    ○ Share the same readers
    ○ Share the same high ratings (4/5 stars)
    ○ Randomly select 200 similar books

[Diagram: (1) Ground Truth Similar Books; (2) Reader-based Similar Books]
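The reader-based extraction above can be sketched as a few lines over the interaction records. The field names (`user_id`, `book_id`, `rating`) follow the dataset schema; the function name and fixed seed are ours, and the 200-book sample matches the slide:

```python
import random
from collections import defaultdict

def reader_based_similar(interactions, target_book, k=200, min_rating=4, seed=0):
    """Books sharing at least one high-rating (4/5 star) reader with the target."""
    by_user = defaultdict(set)
    for rec in interactions:
        if rec["rating"] >= min_rating:
            by_user[rec["user_id"]].add(rec["book_id"])

    # Union the high-rating shelves of every reader who liked the target.
    candidates = set()
    for books in by_user.values():
        if target_book in books:
            candidates |= books - {target_book}

    pool = sorted(candidates)
    random.Random(seed).shuffle(pool)     # deterministic random sample
    return pool[:k]
```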

SLIDE 6

Model Exploration

  1. Word embedding: Word2vec vs. fastText
  2. Transfer learning: ULMFiT
  3. Sentence embedding: InferSent
SLIDE 7

Word Embedding

[Diagram: word-representation progression from One-Hot Encoding to a Co-Occurrence Matrix to Word2Vec; Word2Vec treats “apple” and “apples” as unrelated tokens, while fastText decomposes a word into character n-grams: <ap app ppl ple le>.]
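fastText's key difference from Word2Vec is the subword decomposition in the diagram. A sketch of the character n-gram extraction, with the `<`/`>` boundary markers the fastText paper uses (function name is ours):

```python
def char_ngrams(word, n=3):
    """fastText-style subword units: wrap the word in boundary markers,
    then slide an n-character window across it."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
```

`char_ngrams("apple")` reproduces the slide's example, and it is why fastText can embed rare or unseen words such as "apples" by summing shared subword vectors.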

SLIDE 8

ULMFiT: Universal Language Model Fine-tuning for Text Classification

  • Transfer learning in NLP
  • Consists of 3 main phases:
    ○ Language model trained on a general-domain corpus
    ○ Fine-tuning begins on target-task data, using slanted triangular learning rates to learn features
    ○ Further fine-tuning using gradual unfreezing and slanted triangular learning rates - to preserve low-level learnings and adapt high-level representations
  • Not yet used very much for unsupervised tasks such as Semantic Text Similarity - many tasks implemented with ULMFiT involve classification

Paper: “Universal Language Model Fine-tuning for Text Classification”
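The slanted triangular learning rate schedule mentioned above has a closed form in the ULMFiT paper: a short linear warm-up to the peak rate, then a long linear decay. A sketch, with the default hyperparameters taken from the paper:

```python
import math

def stlr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at iteration t of T total iterations.

    Rises linearly for the first cut_frac of training, then decays linearly;
    ratio bounds how much smaller the lowest rate is than lr_max.
    """
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut                                   # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio
```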

SLIDE 9

InferSent: sentence embedding

  • To obtain general-purpose sentence embeddings that capture generic information
  • Pre-trained on the Stanford Natural Language Inference (SNLI) dataset
    ○ 570k human-generated English sentence pairs
  • u: premise representation
  • v: hypothesis representation
  • 3-class classifier: entailment, contradiction, and neutral
  • Example: “A soccer game with multiple males playing” & “Some men are playing a sport.” (entailment)

Paper: “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data”
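The 3-class classifier does not see u and v directly; the InferSent paper feeds it the concatenation of u, v, their absolute element-wise difference, and their element-wise product. A sketch of that feature construction (function name is ours):

```python
import numpy as np

def pair_features(u, v):
    """Combine premise u and hypothesis v into the classifier's input:
    [u; v; |u - v|; u * v], so the classifier sees both vectors plus
    explicit comparison signals."""
    return np.concatenate([u, v, np.abs(u - v), u * v])
```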

SLIDE 10

InferSent: sentence embedding

  • Our accuracy: 0.73 over all similar books, 0.77 in the top 5 (test set: 3,000 books and their similar books)

Paper: “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data”
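One plausible reading of the accuracy numbers above, sketched as code: a query book counts as a hit when at least one of its ground-truth similar books appears in the top-k of the ranked list. The function name and dictionary shapes are our own:

```python
def top_k_accuracy(ranked_by_book, ground_truth, k=5):
    """Fraction of query books whose top-k ranked list contains at least one
    ground-truth similar book.

    ranked_by_book: {book_id: [candidate ids, best first]}
    ground_truth:   {book_id: set of truly-similar ids}
    """
    hits = 0
    for book, ranked in ranked_by_book.items():
        truth = ground_truth.get(book, set())
        if any(b in truth for b in ranked[:k]):
            hits += 1
    return hits / len(ranked_by_book)
```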

SLIDE 11

Example Results

Original Book: Anita Diamant's international bestseller "The Red Tent" brilliantly re-created the ancient world of womanhood. Diamant brings her remarkable storytelling skills to "Good Harbor" -- offering insight to the precarious balance of marriage and career, motherhood and friendship in the world of modern women. The seaside town of Gloucester, Massachusetts is a place where the smell of the ocean lingers in the air and the rocky coast glistens in the Atlantic sunshine. When longtime Gloucester resident Kathleen Levine is diagnosed with breast cancer, her life is thrown into turmoil. Frightened and burdened by secrets, she meets Joyce Tabachnik -- a freelance writer with literary aspirations -- and a once-in-a-lifetime friendship is born. Joyce has just bought a small house in Gloucester, where she hopes to write as well as vacation with her family. Like Kathleen, Joyce is at a fragile place in her life. A mutual love for books, humor, and the beauty of the natural world brings the two women together. They share their personal histories, and help each other to confront scars left by old emotional wounds. With her own trademark wisdom and humor, Diamant considers the nature, strength, and necessity of adult female friendship. "Good Harbor" examines the tragedy of loss, the insidious nature of family secrets, as well as the redemptive power of friendship.

Similar Book: In A Little Love Story, Roland Merullo -- winner of the Massachusetts Book Award and the Maria Thomas Fiction Award -- has created a sometimes poignant, sometimes hilarious tale of attraction and loyalty, jealousy and grief. It is a classic love story -- with some modern twists. Janet Rossi is very smart and unusually attractive, an aide to the governor of Massachusetts, but she suffers from an illness that makes her, as she puts it, "not exactly a good long-term investment." Jake Entwhistle is a few years older, a carpenter and portrait painter, smart and good-looking too, but with a shadow over his romantic history. After meeting by accident -- literally -- when Janet backs into Jake's antique truck, they begin a love affair marked by courage, humor, a deep and erotic intimacy . . . and modern complications. Working with the basic architecture of the love story genre, Merullo -- a former carpenter known for his novels about family life -- breaks new ground with a fresh look at modern romance, taking liberties with the classic design, adding original lines of friendship, spirituality, and laughter, and, of course, probing the mystery of love. ... (Score: 0.8631)

SLIDE 12

API Demo

SLIDE 13
SLIDE 14
SLIDE 15

Service Hosting

Specs/Details:

  • Flask application with model and data preloading
  • GET endpoint with book_id and “top n” parameter
  • Docker image ~ 10.4 GB
  • InferSent model file size ~ 4.5 GB
  • GoodReads data ~ 2.5 GB
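A minimal sketch of the Flask service described above. The route name and the stub lookup table are assumptions; the real application preloads the InferSent model and GoodReads data at startup rather than serving from a hard-coded dict:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the preloaded model + data: in the real service this is
# replaced by InferSent inference over the GoodReads descriptions.
RECOMMENDATIONS = {"123": ["456", "789", "101"]}

@app.route("/similar_books", methods=["GET"])
def similar_books():
    """GET endpoint taking a book_id and a "top n" parameter."""
    book_id = request.args.get("book_id")
    top_n = int(request.args.get("top_n", 5))
    books = RECOMMENDATIONS.get(book_id, [])[:top_n]
    return jsonify({"book_id": book_id, "similar_books": books})
```

A request such as `GET /similar_books?book_id=123&top_n=2` then returns a JSON body with the top-2 matches.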
SLIDE 16

AWS Fargate

  • AWS Serverless compute engine for containers
  • Works with ECS - elastic container service
  • UI configuration - not very intuitive:
    ○ Task definitions
    ○ Container definitions
    ○ Soft and hard limits on resources at both layers
  • 10 GB memory limit! :(
SLIDE 17

Kubernetes on DigitalOcean

  • “The node had condition: [MemoryPressure]”
SLIDE 18

AWS SageMaker

  • Targeted towards Data Scientists and ML engineers, providing serverless capabilities for:
    ○ Labeling
    ○ Building
    ○ Training
    ○ Sharing notebooks
    ○ Deploying models
    ○ Managing inference endpoints
    ○ Supports “custom” containers

SLIDE 19

More on ...

  • Containers must be deployed to AWS ECR
  • Must be organized in a compatible way:
    ○ An infer POST endpoint following the SageMaker spec
    ○ A model directory that gets packaged and uploaded to S3 as part of the deployment
    ○ A data directory

SLIDE 20

Short-term solution: EC2 Instance

  • SageMaker looked promising, but after deploying our container we saw it would require a non-trivial amount of refactoring to make it work
  • To ensure we had our service deployed somewhere, we provisioned an EC2 instance
  • Trade-offs:
    ○ Availability - intermittent crashing
    ○ Scaling requires:
      ■ fleet management
      ■ a load balancer

SLIDE 21

Hosting Enhancements

If we had more time…

  • Dockerize better - layering analysis and pruning unnecessary base-image packages
  • Host the model file externally (S3)
  • Upload GoodReads data externally or pickle data structures (S3)
  • Possibly use a small key-value DB for GoodReads data storage
  • Try SageMaker, which has been optimized to do this for us
  • Latency optimization:
    ○ GPU inference
    ○ Experiment with precomputing embeddings for our dataset
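The precomputation idea in the last bullet can be sketched as: embed every book description once offline, L2-normalize, then serve a request as one matrix product instead of per-request inference. Here `encode` stands in for the InferSent encoder and all names are ours:

```python
import numpy as np

def precompute(encode, descriptions):
    """Embed every description once, offline; normalize so cosine similarity
    becomes a dot product at serving time."""
    ids = list(descriptions)
    matrix = np.stack([encode(descriptions[i]) for i in ids])
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return ids, matrix

def top_n(ids, matrix, query_id, n=5):
    """Serve a request as a lookup plus one matrix product."""
    q = matrix[ids.index(query_id)]
    scores = matrix @ q
    order = np.argsort(-scores)
    return [ids[i] for i in order if ids[i] != query_id][:n]
```

The normalized matrix could itself be pickled to S3, addressing the data-hosting bullets above at the same time.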

SLIDE 22

Other Enhancements

  • Recommendation engine enhancements
    ○ Combine with user reviews and other information
    ○ Use other metrics instead of accuracy
  • Embedding enhancements
    ○ Incorporate the trained fastText model in InferSent
    ○ Combine the word embeddings and text embeddings
SLIDE 23

Reference

Datasets:

  • Mengting Wan, Julian McAuley. “Item Recommendation on Monotonic Behavior Chains.” RecSys ’18.
  • Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley. “Fine-Grained Spoiler Detection from Large-Scale Review Corpora.” ACL ’19.
  • Common Crawl: https://commoncrawl.org/

Models:

  • Howard, Jeremy and Sebastian Ruder. “Universal Language Model Fine-tuning for Text Classification.” ACL 2018.
  • Conneau, Alexis, Douwe Kiela, Holger Schwenk, Loïc Barrault and Antoine Bordes. “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.” EMNLP 2017.

SLIDE 24

Questions

SLIDE 25

Appendix