Recommender System in KKBOX

[Diagram: approaches ordered from Simple to Complex. Labels: Ranking Model, Collaborative Filtering, Persona based, Attribute Based, Context Aware, Embedding, Representation]

#Item x #users x #attributes

Serendipity/Novelty, Diversity, Precision

Collaborative Filtering: Matrix Factorization
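As a rough illustration of matrix factorization for collaborative filtering, the sketch below fits user and item factor vectors with plain stochastic gradient descent. The toy ratings, the function names (`factorize`, `mse`), and all hyperparameters are assumptions made for this example; the slides do not say which variant or library KKBOX uses.

```python
import random

# Hypothetical toy data: (user_index, item_index, rating). Not real KKBOX data.
RATINGS = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 2, 1.0), (1, 3, 2.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
]

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    """Learn P (users x k) and Q (items x k) so that dot(P[u], Q[i]) approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def mse(ratings, P, Q):
    """Mean squared error of the factor model on the observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        total += (r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))) ** 2
    return total / len(ratings)

P, Q = factorize(RATINGS, n_users=3, n_items=4)
```

An unobserved (user, item) pair is then scored with the same dot product, which is what makes the factors usable for recommendation.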

Word2Vec
"The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts." (Marco Baroni et al.)
"You shall know a word by the company it keeps." (Firth, J.R.)

CBOW and Skip-gram
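The two Word2Vec setups differ mainly in what is predicted from what. A minimal sketch of how training examples could be generated for each, assuming a simple symmetric context window (the function names are made up for this example):

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center token predicts every token in its window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: the tokens in the window jointly predict the center token."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples
```

For a listening session, the "tokens" would be song IDs rather than words; the pair generation is otherwise the same.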

DeepWalk (Bryan Perozzi, Rami Al-Rfou & Steven Skiena, 2014): Random Walk + Word2Vec

Example random walk over songs: 青花瓷, 給我一首歌的時間, 珊瑚海, 我不配, 黃金甲, 珊瑚海, 雙截棍, 天地一鬥
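The DeepWalk idea of turning a graph into "sentences" can be sketched as weighted random walks over a user-song bipartite graph, keeping only the song nodes as words. The play log, user IDs, and function names below are hypothetical and only meant to show the mechanics; the resulting sentences would then be fed to a Word2Vec trainer.

```python
import random
from collections import defaultdict

# Hypothetical toy play log: (user, song, play_count). Not real KKBOX data.
PLAYS = [
    ("user_1", "青花瓷", 5),
    ("user_1", "珊瑚海", 2),
    ("user_2", "珊瑚海", 3),
    ("user_2", "雙截棍", 4),
    ("user_3", "雙截棍", 1),
    ("user_3", "黃金甲", 6),
]

def build_graph(plays):
    """Weighted bipartite graph: user nodes on one side, song nodes on the other."""
    graph = defaultdict(dict)
    for user, song, count in plays:
        graph[("user", user)][("song", song)] = count
        graph[("song", song)][("user", user)] = count
    return graph

def random_walk(graph, start, length, rng):
    """Walk `length` nodes, picking each next hop in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

def song_sentences(graph, walks_per_song=2, length=9, seed=0):
    """Start walks at every song and keep only the song nodes as 'words'."""
    rng = random.Random(seed)
    songs = sorted(n for n in graph if n[0] == "song")
    sentences = []
    for _ in range(walks_per_song):
        for start in songs:
            walk = random_walk(graph, start, length, rng)
            sentences.append([name for kind, name in walk if kind == "song"])
    return sentences

GRAPH = build_graph(PLAYS)
SENTENCES = song_sentences(GRAPH)
```

Because the walk alternates between songs and users, two songs end up adjacent in a sentence only when some user played both.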

(#Item + #users) x log(#Item + #users) x #hidden nodes x window_size

Cold start? Learn the relationships between latent factors and audio signals

We've got features, and ranking is another problem

Click/Play Prediction
● Regression
● Classification
Learn to Rank
Content Understanding
● Embedding
● Classification
● Topic Mining
User Understanding
● User Profiling
● Embedding

A free bonus (買菜送蔥, "buy vegetables, get scallions free"): Building a pipeline
● Data pre-processing → ETL Job
● Feature extraction → Numerical / Categorical...
● Model fitting → Logistic Regression / GBDT
● Validation stages → Cross Validation
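The four pipeline stages above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the rows, the field names (`plays`, `region`, `replayed`), and the helper functions are invented for the example, and a production pipeline would more likely rely on a framework such as Spark MLlib.

```python
import math

# Hypothetical toy rows; field names and values are invented for illustration.
ROWS = [
    {"plays": 9, "region": "TW", "replayed": 1},
    {"plays": 1, "region": "JP", "replayed": 0},
    {"plays": 7, "region": "HK", "replayed": 1},
    {"plays": 2, "region": "TW", "replayed": 0},
    {"plays": 8, "region": "JP", "replayed": 1},
    {"plays": 0, "region": "HK", "replayed": 0},
    {"plays": None, "region": "TW", "replayed": 1},  # dropped by clean()
    {"plays": 6, "region": "TW", "replayed": 1},
    {"plays": 3, "region": "JP", "replayed": 0},
    {"plays": 10, "region": "HK", "replayed": 1},
]
REGIONS = ["TW", "JP", "HK"]

def clean(rows):
    """Data pre-processing: drop rows that have a missing field."""
    return [r for r in rows if None not in r.values()]

def extract_features(rows, regions):
    """Feature extraction: scale the numerical field, one-hot the categorical one."""
    max_plays = max(r["plays"] for r in rows) or 1
    X, y = [], []
    for r in rows:
        one_hot = [1.0 if r["region"] == g else 0.0 for g in regions]
        X.append([1.0, r["plays"] / max_plays] + one_hot)  # leading 1.0 is the bias
        y.append(r["replayed"])
    return X, y

def fit_logreg(X, y, lr=0.5, epochs=300):
    """Model fitting: logistic regression trained with per-example gradient steps."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, xi))))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, xi):
    """Predict 1 when the linear score is non-negative (probability >= 0.5)."""
    return 1 if sum(a * b for a, b in zip(w, xi)) >= 0 else 0

def cross_validate(X, y, k=3):
    """Validation: k-fold cross validation, returning accuracy per fold."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    scores = []
    for held_out in folds:
        train = [i for i in range(len(X)) if i not in held_out]
        w = fit_logreg([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(w, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

X, y = extract_features(clean(ROWS), REGIONS)
SCORES = cross_validate(X, y)
```

Keeping each stage as a separate function mirrors the slide: any stage can be swapped (for example GBDT instead of logistic regression) without touching the others.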

Challenges
● Big data
● Heterogeneous sources
● Various formatting
● Data versioning
● Data quality
● Data freshness
● Cost
● Coding is hard, debugging is harder

● Logs: Parquet, Json, Tsv, Text, …
● External Datasets / Logs: Genre, BPM, Artist DB, Mixpanel, App Annie, …
● Databases: Songs, Members, …
ETL
● Data cleaning, normalization
● Pre aggregation / join
→ Parquet files in S3, partitioned by date and service region if needed.

[Diagram labels: ETL Data (Parquet files on S3); DB Replication; Thrift (or Protobuf, Avro) Schema; Hive Table; Presto (or Amazon Athena)]
● Apache Spark (Scala)
  ○ From files on S3 to RDD / Dataframe
  ○ Use JDBC Driver from Presto
● Python / R
  ○ Read file from S3, deserialize parquet
  ○ Use JDBC/ODBC driver from Presto

Example

Challenges
● Big data = EC2 + Spark + Hadoop Family + Presto
● Heterogeneous sources = ETL
● Various formatting = ETL
● Data versioning = ETL
● Data quality = ETL
● Data freshness = DB Replication, Data Streaming
● Cost = EC2, Good Tool Chain
● Coding is hard, debugging is harder = Good Design

Case Study

Nearest Neighbors of Songs
1. Build a weighted bipartite graph of users and songs from logs
  ● Terabytes of data, billions of nodes and edges
  ● Spark cluster on EC2. (On-demand, hundreds of cores, I/O optimized)
2. Put each song on a vector space
  ● Random walks
  ● An embedded model (We use word2vec)
  ● In a very, very large instance with a lot of memory and cores.
3. Find K-NN for each song
  ● O(n^2) is impossible
  ● Approximation. For example, Locality-Sensitive Hashing
  ● Using a Spark cluster on EC2, the worker nodes are CPU optimized.
All intermediate results are in Parquet format on S3, so we can inspect them with Presto.
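To show why Locality-Sensitive Hashing avoids the O(n^2) comparison, here is a minimal random-hyperplane LSH sketch for cosine similarity. The slides do not say which LSH family or library was used, so the scheme, the vectors, and the function names are all assumptions for illustration.

```python
import math
import random
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes):
    """Group keys by signature so similar vectors tend to share a bucket."""
    buckets = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, planes)].append(key)
    return buckets

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def approx_knn(query_key, vectors, planes, buckets, k=2):
    """Rank only the candidates in the query's bucket, not all n vectors."""
    query = vectors[query_key]
    candidates = [c for c in buckets.get(signature(query, planes), []) if c != query_key]
    candidates.sort(key=lambda c: cosine(query, vectors[c]), reverse=True)
    return candidates[:k]

# Hypothetical song embeddings; "song_a2" points in the same direction as "song_a".
VECTORS = {
    "song_a": [1.0, 2.0, 0.5],
    "song_a2": [2.0, 4.0, 1.0],
    "song_b": [-1.0, -2.0, -0.5],
    "song_c": [0.3, -1.2, 2.0],
}
PLANES = make_hyperplanes(dim=3, n_bits=4)
BUCKETS = build_index(VECTORS, PLANES)
```

With a single hash table some true neighbors land in other buckets; a common remedy is to build several tables with different hyperplanes and merge their candidates.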

Songs a User Likes to Listen to Again
1. Extract features from logs, databases, and external data sets
  ● Join billions of transactions.
  ● Spark cluster on EC2. (On-demand, hundreds of cores)
2. Train a model
  ● Spark MLlib (EX: GBDT)
  ● Deep learning frameworks (TensorFlow)
3. Repeat: feature selection, parameter tuning
4. Predict from recent logs

Life cycle of ML-related features
[Cycle diagram: Define the Problem, Inspect the Data, Hypothesis, Train and Verify the Model, Deploy and A/B Testing]

References
● Apache Spark
● Apache Parquet
● Apache Thrift
● Apache Hive
● Presto
● Amazon Elastic Compute Cloud
