KKBOX: Ann Chen, York Tsai, 2017/04/20


SLIDE 1

This presentation is provided on a strictly private and confidential basis for information purposes only.

KKBOX: Using Machine Learning to Make Users' Listening Ever More Diverse

Ann Chen, York Tsai 2017/04/20

SLIDE 2

Machine Learning algorithms behind Personalized Recommenders

SLIDE 3

SLIDE 4

  • Attribute Based
  • Collaborative Filtering
  • Learn to Rank
  • Context Aware
  • Persona Aware

Simple → Complex

  • Cold Start
  • Interactive Recommender

SLIDE 5

User Understanding Content Understanding

Learn to Rank

  • Embedding
  • Classification
  • Topic Mining
  • Popularity
  • Trending
  • User Profiling
  • Embedding

Click/Play Prediction

  • Regression
  • Classification
  • Distance in Feature Space

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

  • Precision
  • Diversity
  • Serendipity/Novelty
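The slide names three goals without defining them. One common way to make them measurable, assumed here rather than taken from the talk, is precision@k for precision, the average pairwise distance between recommended items (intra-list diversity) for diversity, and the mean self-information of item popularity for novelty. Serendipity is harder to pin down and is only approximated by novelty in this sketch. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations the user actually engaged with."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def intra_list_diversity(items, distance):
    """Average pairwise distance between recommended items; higher means more diverse."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def novelty(items, popularity):
    """Mean self-information -log2(p) of the items, where p is the item's
    (hypothetical) share of users who played it; rarer items score higher."""
    return sum(-math.log2(popularity[item]) for item in items) / len(items)
```

Here `distance` can be any item-to-item dissimilarity, for example 1 minus the cosine similarity of song embeddings, or simply 0 for the same genre and 1 otherwise.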

SLIDE 10

User Understanding Content Understanding

Learn to Rank

  • Embedding
  • Classification
  • Topic Mining
  • Popularity
  • Trending
  • User Profiling
  • Embedding

Click/Play Prediction

  • Regression
  • Classification
  • Distance in Feature Space

SLIDE 11

SLIDE 12

Collaborative Filtering
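Collaborative filtering predicts a user's interest in a song from the behaviour of other users. A common formulation, shown here as a minimal sketch and not necessarily the variant KKBOX uses, is matrix factorization: learn a short latent vector per user and per song so that their dot product reproduces the observed scores, then use it to score songs the user has not played. The data layout, hyperparameters and the SGD training loop are illustrative assumptions.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    """Learn latent vectors P[u] and Q[i] so that dot(P[u], Q[i]) approximates r.

    ratings: (user_index, item_index, value) triples for observed entries only,
    e.g. hypothetical normalised play scores in [0, 1].
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted score for user u and item i."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the given (u, i, r) triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
```

`predict(P, Q, u, i)` then scores pairs that were never observed. With implicit feedback such as play counts, systems often also weight or sample unobserved pairs; this sketch fits observed entries only.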

SLIDE 13

Word2Vec

“The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts.” (Baroni et al., 2014)

“You shall know a word by the company it keeps.” (J. R. Firth, 1957)
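Firth's idea carries over to music if a listening session or playlist is treated as a sentence and each song ID as a word, which is an assumption about how the talk applies Word2Vec. Skip-gram learns embeddings by predicting each song's neighbours within a context window; the sketch below only generates those (center, context) training pairs. The function name and inputs are hypothetical.

```python
def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs: each item is paired with every other
    item at most `window` positions away in the same sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs
```

In practice the raw sessions can be handed to an off-the-shelf skip-gram trainer, for example gensim's `Word2Vec` with `sg=1`, which builds such pairs internally from lists of tokens.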

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

DeepWalk (Bryan Perozzi, Rami Al-Rfou & Steven Skiena, 2014)
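DeepWalk turns a graph into sequences by running short random walks from every node and then trains skip-gram on those walks as if they were sentences. In a music setting the graph might connect songs that share playlists or listeners; that graph construction, the names, and the parameter values below are assumptions. The sketch covers only the walk-generation step.

```python
import random


def random_walks(graph, walks_per_node, walk_length, seed=0):
    """DeepWalk-style truncated random walks.

    graph: dict mapping node -> list of neighbour nodes (undirected edges listed both ways).
    Returns a list of walks; each walk is a list of nodes usable as a skip-gram "sentence".
    """
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order on every pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:  # isolated node: stop the walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

Each walk is a list of node IDs, so the output can go straight into the same skip-gram training used for Word2Vec.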

SLIDE 18

青花瓷

珊瑚海 我不配 給我一首歌的時間

黃金甲

霍元甲 雙截棍 天地一鬥

SLIDE 19

  • Audio attributes: Tempo, Instrument, Vocal, Sound
  • Topic (extracted from playlist titles)
  • Context (café, bedtime, ...)
  • Mood (relaxing, exhilarating, ...)
  • Activity (spin cycling, travel, ...)
  • Genre / Sub-genre
    • HipHop: Boom Bap, Urban, Trap
    • Rock: Rock & Roll, Folk/Indie Pop, Punk, Grunge, Dream Pop
    • Electronic: House, Techno, Trap/Twerk

SLIDE 20

Learn the relationships between labels/latent factors and audio signals

  • Multimodal
  • Multiple objectives
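As a much simpler linear stand-in for the models in the talk (which the slide does not specify), the sketch below learns a multi-label mapping from audio features to tags such as the mood labels listed on slide 19: one independent sigmoid per label, trained with binary cross-entropy, so a track can carry several labels at once. Feature values, label indices and hyperparameters are hypothetical, and the multimodal and latent-factor parts of the slide are not covered.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_multilabel(X, Y, n_labels, lr=0.5, epochs=300):
    """One-vs-rest logistic regression: an independent sigmoid output per label.

    X: list of audio feature vectors (hypothetical, e.g. normalised tempo and energy).
    Y: list of sets of label indices per track; a track may have several labels.
    Minimises the summed binary cross-entropy over labels with plain SGD.
    """
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_labels)]
    b = [0.0] * n_labels
    for _ in range(epochs):
        for x, labels in zip(X, Y):
            for k in range(n_labels):
                p = sigmoid(sum(w * xi for w, xi in zip(W[k], x)) + b[k])
                grad = p - (1.0 if k in labels else 0.0)  # d(BCE)/d(logit)
                W[k] = [w - lr * grad * xi for w, xi in zip(W[k], x)]
                b[k] -= lr * grad
    return W, b


def predict_labels(W, b, x, threshold=0.5):
    """Return every label whose predicted probability reaches the threshold."""
    return {
        k
        for k in range(len(W))
        if sigmoid(sum(w * xi for w, xi in zip(W[k], x)) + b[k]) >= threshold
    }
```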

SLIDE 21

SLIDE 22

Learn to Rank: Regression or Classification

SLIDE 23

  • Personal Preferences
  • Context
  • Trending/Popular
  • Social Factors
  • Serendipity/Novelty
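Pointwise learning to rank treats each (user, candidate song) pair independently: build a feature vector from factors like those listed, fit a regression or classification model to an engagement target, and sort candidates by predicted score. The sketch takes the regression route with a least-squares linear model fit by SGD. The feature order (personal preference, context match, trending, social, novelty), the target (for example, the fraction of a song listened to) and all values are hypothetical. Replacing the squared error with logistic loss on played/skipped labels gives the classification variant.

```python
def score(w, b, x):
    """Linear score w . x + b for one candidate's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_pointwise_regressor(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression fit with SGD (pointwise learning to rank).

    X: feature vectors, one per (user, song) pair; y: engagement targets.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = score(w, b, x) - target  # gradient of 0.5 * err**2 w.r.t. the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def rank(candidates, w, b):
    """Order (song_id, features) candidates by predicted score, best first."""
    ordered = sorted(candidates, key=lambda c: score(w, b, c[1]), reverse=True)
    return [song_id for song_id, _ in ordered]
```

With a trained `w, b`, a call such as `rank([("song_a", [0.9, 0.8, 0.1, 0.5, 0.5]), ("song_b", [0.1, 0.1, 0.9, 0.5, 0.5])], w, b)` returns the song IDs from highest to lowest predicted score.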

SLIDE 24

User Understanding Content Understanding

Learn to Rank

  • Embedding
  • Classification
  • Topic Mining
  • Popularity
  • Trending
  • User Profiling
  • Embedding

Click/Play Prediction

  • Regression
  • Classification
  • Distance in Embedding Space

SLIDE 25

Challenges

SLIDE 26

Heterogeneous Data Sources

  • Database
  • Log
  • Streaming
  • Static Dataset
  • API

SLIDE 27

Dynamic Workload

SLIDE 28

Heterogeneous Resource Requirement

  • CPU
  • GPU
  • Memory
  • SSD

SLIDE 29

Make Development Cycle Easy

  • Feature Definition
  • Data Inspection
  • Hypothesis
  • Model Training and Verification
  • Deploy and A/B Testing
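For the last step of the cycle, one standard way to decide whether a new recommender beats the current one in an A/B test is a two-proportion z-test on a binary success metric, such as whether a user played at least one recommended song. The talk does not say which test KKBOX uses; the metric and the function name here are assumptions.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two success rates (B minus A).

    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # all successes or all failures in both groups: no evidence of a difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 100 successes out of 1,000 users in group A versus 150 out of 1,000 in group B gives z of about 3.38 and a two-sided p-value well below 0.01.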

SLIDE 30

Infrastructure in KKBOX Research Center

SLIDE 31

SLIDE 32

Infrastructure in KKBOX RDC

SLIDE 33

Infrastructure as Code (IaC)

SLIDE 34

S3: Data Storage

SLIDE 35

ETL and Data Format

SLIDE 36

Presto: Data Analytic Engine

SLIDE 37

Spark: Computing Framework


SLIDE 38

Dashboard

SLIDE 39

Data Pipeline Examples

SLIDE 40

Nearest Neighbors for Songs
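Given song embeddings (for instance from the Word2Vec or DeepWalk approaches earlier in the deck), the nearest neighbours of a song are the songs whose vectors have the highest cosine similarity to it. The brute-force sketch below scans every song, which is fine for illustration; at catalogue scale one would typically distribute the job (for example on Spark) or use an approximate nearest-neighbour index. Names and data are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_songs(query_id, embeddings, k=10):
    """Brute-force top-k songs most similar to query_id, excluding the song itself.

    embeddings: dict mapping song_id -> embedding vector.
    """
    query = embeddings[query_id]
    scored = (
        (cosine(query, vec), song_id)
        for song_id, vec in embeddings.items()
        if song_id != query_id
    )
    return [song_id for _, song_id in heapq.nlargest(k, scored)]
```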

SLIDE 41

Predict Songs User Might Listen Again
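One way to frame this as supervised learning, assumed here rather than described in the talk, is a time split: every (user, song) pair played in a history window becomes an example, labelled 1 if the user plays the song again in the following window. The sketch builds such examples from in-memory lists with a single feature; in the Spark and S3 pipeline described earlier this step would run over logs at scale. The field names are hypothetical.

```python
from collections import Counter


def build_relisten_examples(history, future):
    """Turn play logs into examples for "will the user play this song again?".

    history, future: lists of (user_id, song_id) play events from two consecutive
    time windows. Each distinct (user, song) pair in `history` becomes one example
    with its play count as a feature and label 1 if the pair appears in `future`.
    """
    counts = Counter(history)
    replayed = set(future)
    return [
        {"user": u, "song": s, "history_plays": n, "label": int((u, s) in replayed)}
        for (u, s), n in sorted(counts.items())
    ]
```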

SLIDE 42

Genre Classification

SLIDE 43

Spark with S3 Tips

  • spark.hadoop.fs.s3a.connection.maximum
  • .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  • spark.speculation
  • .set("spark.hadoop.fs.s3a.readahead.range", "512M")
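The `spark.hadoop.` prefix passes a property through to the Hadoop configuration that Spark builds, which is where the S3A connector and the output committer read their settings. The slide gives values for two of the four settings; the hypothetical helper below renders just those two as `--conf key=value` arguments for `spark-submit` (the same settings could be applied with `SparkConf().set(key, value)` in PySpark, matching the slide's `.set(...)` form). The other two settings appear on the slide without values, so no values are assumed for them here.

```python
# Settings from the slide that come with explicit values. The slide also lists
# spark.hadoop.fs.s3a.connection.maximum and spark.speculation without values,
# so they are intentionally left out.
S3_TUNING = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
    "spark.hadoop.fs.s3a.readahead.range": "512M",
}


def to_submit_args(conf):
    """Render a settings dict as `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```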

SLIDE 44

EC2 Tips

SLIDE 45

Thank You!