

SLIDE 1

Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments

Olivier Jeunen¹, Koen Verstrepen², Bart Goethals¹,²
September 18th, 2019

¹Adrem Data Lab, University of Antwerp
²Froomle

  • olivier.jeunen@uantwerp.be

SLIDE 2

Introduction & Motivation

SLIDE 3

Setting the scene

We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, … Suppose we have a set of pageviews of this form:

  (u1, i1, t1)  (u1, i2, t2)  (u1, i3, t3)
  (u2, i4, t4)  (u2, i2, t5)  (u3, i1, t6)
  (u2, i5, t7)  (u2, i7, t8)  (u3, i6, t9)
  …
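The triplets above can be written down directly as a small Python list; a toy sketch, with placeholder identifiers standing in for real user and item IDs:

```python
# A toy pageview stream of (user, item, timestamp)-triplets, mirroring the
# example above; in practice these arrive as an ever-growing stream.
pageviews = [
    ("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3),
    ("u2", "i4", 4), ("u2", "i2", 5), ("u3", "i1", 6),
    ("u2", "i5", 7), ("u2", "i7", 8), ("u3", "i6", 9),
]
```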

SLIDE 4

Problem statement

In neighbourhood-based collaborative filtering¹, we need to compute similarities between pairs of items. Items are represented as sparse, high-dimensional columns of the binary user-item matrix P.

[Figure: an example of a sparse binary user-item matrix P.]

¹Still a very competitive baseline, but often deemed unscalable.

SLIDE 5

A need for speed

Typically, the model is periodically recomputed. For ever-growing datasets, these iterative updates can become very time-consuming, and model recency is often sacrificed.

[Figure: iterative model updates over time; runtime grows with each recomputation, performed every ∆t on the full, ever-growing dataset.]

SLIDE 6

Previous work

Existing approaches tend to speed up computations through:

  • Approximation.
  • Parallelisation.
  • Incremental computation.

However, existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams.

SLIDE 7

Contribution & Methodology

SLIDE 8

Incremental Similarity Computation

In the binary setting, cosine similarity simplifies to the number of users that have seen both items, divided by the square root of the product of their individual user counts:

  cos(i, j) = |Uᵢ ∩ Uⱼ| / √(|Uᵢ| · |Uⱼ|) = Mᵢ,ⱼ / √(Nᵢ Nⱼ)

with N ∈ ℕⁿ : Nᵢ = |Uᵢ| and M ∈ ℕⁿˣⁿ : Mᵢ,ⱼ = |Uᵢ ∩ Uⱼ|. As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update.
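A minimal sketch of this decomposition in Python, assuming the counters N and M are maintained elsewhere (the function name and dictionary-of-dicts layout are illustrative, not from the paper's code):

```python
import math

def cosine(M, N, i, j):
    """cos(i, j) = M[i][j] / sqrt(N[i] * N[j]) in the binary setting."""
    co = M.get(i, {}).get(j, 0)          # co-occurrence count |Ui ∩ Uj|
    if co == 0 or N.get(i, 0) == 0 or N.get(j, 0) == 0:
        return 0.0
    return co / math.sqrt(N[i] * N[j])   # item counts Ni = |Ui|, Nj = |Uj|
```

Because only the counters M and N are touched, a new pageview changes a handful of integers rather than triggering a full recomputation of the similarity matrix.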

SLIDE 9

Dynamic Index

Existing approaches tend to build inverted indices in a preprocessing step; we do this on-the-fly! Initialise a simple inverted index for every user, to hold their histories. Then, for every pageview (u, i):

  1. Increment the item co-occurrence counts for i and the other items seen by u.
  2. Update the item's count.
  3. Add the item to the user's inverted index.
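The three steps above can be sketched as a single pass in Python. This is a hedged reconstruction, not the paper's implementation: variable names follow the slides, timestamps are ignored, and skipping repeat views is an assumption of the binary setting:

```python
from collections import defaultdict

def dynamic_index(pageviews):
    """One pass over (user, item) pageviews, maintaining the building blocks
    N[i] = |Ui|, M[i][j] = |Ui ∩ Uj|, and the per-user inverted index L[u]."""
    N = defaultdict(int)
    M = defaultdict(lambda: defaultdict(int))
    L = defaultdict(set)
    for u, i in pageviews:
        if i in L[u]:
            continue  # repeat view: counts are per distinct user
        # 1. Increment item co-occurrence for i and the other items seen by u.
        for j in L[u]:
            M[i][j] += 1
            M[j][i] += 1
        # 2. Update the item's count.
        N[i] += 1
        # 3. Add the item to the user's inverted index.
        L[u].add(i)
    return M, N, L
```

Since the loop processes one pageview at a time, feeding it a live stream instead of a stored batch requires no changes, which is what enables the online learning discussed next.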

SLIDE 10

Online Learning

As Dynamic Index consists of a single for-loop over the pageviews, it can naturally handle streaming data.

[Figure: impact of online learning; runtime (s) as a function of the number of pageviews |P|, with incremental updates of size |∆P| arriving every ∆t.]

SLIDE 11

Parallelisation Procedure

We adopt a MapReduce-like parallelisation framework:

  • Mapping is the Dynamic Index algorithm.
  • Reducing two models M = {M, N, L} and M′ = {M′, N′, L′} consists of:
    1. Summing up M and M′, and N and N′.
    2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L′[u].

Step 2 is obsolete if M and M′ are computed on disjoint sets of users!
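A sketch of the reduce step, under the assumption that each (user, item) pair was processed in at most one chunk; the function and argument names are illustrative:

```python
from collections import defaultdict

def reduce_models(Ma, Na, La, Mb, Nb, Lb):
    """Merge two partial models (M, N, L) computed on different chunks of
    the pageview stream, merging into and returning the first model."""
    # 1. Sum the item counts N and co-occurrence counts M.
    for i, c in Nb.items():
        Na[i] += c
    for i, row in Mb.items():
        for j, c in row.items():
            Ma[i][j] += c
    # 2. Cross-reference: a user who saw i in one chunk and j in the other
    #    contributes a co-occurrence that neither partial model counted.
    for u, items_b in Lb.items():
        for i in La[u]:
            for j in items_b:
                Ma[i][j] += 1
                Ma[j][i] += 1
        La[u] |= items_b
    return Ma, Na, La
```

When the chunks partition the users disjointly, the cross-referencing loop finds nothing to do and can be dropped entirely, exactly as the slide notes.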

SLIDE 12

Recommendability

Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, … We denote by Rt the set of recommendable items at time t, and argue that it is often much smaller than the full item collection: |Rt| ≪ |I|. As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable: i ∈ Rt ∨ j ∈ Rt. To keep up to date with recommendability updates, we add a second inverted index for every user.
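One way this restriction could enter the single-pass update, assuming Rt stays fixed for the batch; a simplified sketch that filters pairs directly rather than the paper's bookkeeping with a second inverted index:

```python
from collections import defaultdict

def dynamic_index_filtered(pageviews, recommendable):
    """Single-pass update that only maintains co-occurrence counts
    M[i][j] for pairs where i or j is in the recommendable set Rt."""
    N = defaultdict(int)
    M = defaultdict(lambda: defaultdict(int))
    L = defaultdict(set)
    for u, i in pageviews:
        if i in L[u]:
            continue  # repeat view in the binary setting
        for j in L[u]:
            # Skip pairs where neither item is recommendable.
            if i in recommendable or j in recommendable:
                M[i][j] += 1
                M[j][i] += 1
        N[i] += 1
        L[u].add(i)
    return M, N, L
```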

SLIDE 13

Experimental Results

SLIDE 14

Datasets

Table 1: Experimental dataset characteristics.

                               Movielens*   Netflix*   News     Outbrain
  # "events"                   20e6         100e6      96e6     200e6
  # users                      138e3        480e3      5e6      113e6
  # items                      27e3         18e3       297e3    1e6
  mean items per user          144.41       209.25     18.29    1.76
  mean users per item          747.84       5654.50    242.51   184.50
  sparsity user-item matrix    99.46%       98.82%     99.99%   99.99%
  sparsity item-item matrix    59.90%       0.22%      99.83%   99.98%

SLIDE 15

RQ1: Are we more efficient than the baselines?

[Figure: runtime (s) as a function of the number of pageviews |P| for the Sparse Baseline and Dynamic Index, on Movielens, Netflix, News and Outbrain.]

SLIDE 16

RQ1: Are we more efficient than the baselines?

Observations

  • More efficient if M is sparse.
  • More efficient if users have shorter histories.
  • The average number of processed interactions per second ranges from 14 500 to 834 000.

SLIDE 17

RQ2: How effective is parallelisation?

[Figure: runtime (s) as a function of the number of pageviews |P| for n = 1, 2, 4 and 8 cores, on Movielens, Netflix, News and Outbrain.]

SLIDE 18

RQ2: How effective is parallelisation?

Observations

  • Speedup factor of > 4 for the Netflix and News datasets with 8 cores.
  • Incremental updates complicate the reduce procedure:
    • For sufficiently large batches, performance gains are tangible.
    • For small batches, single-core updates are preferred.

SLIDE 19

RQ3: What is the effect of constrained recommendability?

[Figure: for the News dataset (n = 8), runtime (s) and the size of the recommendable set |Rt| over time, for recency windows δ = 6h, 12h, 18h, 24h, 48h, 96h, 168h and ∞.]

SLIDE 20

RQ3: What is the effect of constrained recommendability?

Observations

  • Clear efficiency gains for lower values of δ:
    • δ = 48h only needs < 10% of the runtime needed without restrictions.
    • δ = 24h needs < 5%.
    • δ = 6h needs 1.6%.
  • The slope of increasing runtime with more data is flattened, improving scalability.

SLIDE 21

Conclusion & Future Work

SLIDE 22

Conclusion

We introduce Dynamic Index, which:

  • is faster than the state of the art in exact similarity computation for sparse and high-dimensional data.
  • computes incrementally by design.
  • is easily parallelisable.
  • naturally handles and exploits recommendability of items.


SLIDE 26

Questions?

Source code is available. Academics hire too! PhD students + post-docs.

SLIDE 27

Future Work

  • More advanced similarity measures: the Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation, … are all dependent on the co-occurrence matrix M.
  • Beyond item-item collaborative filtering: with relatively straightforward extensions (e.g. including a value in the inverted indices to allow for non-binary data), we can tackle more general Information Retrieval use-cases.
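For instance, two of these measures can be read straight off the same building blocks M and N; a sketch, where the PMI estimate additionally assumes the total number of users is tracked:

```python
import math

def jaccard(M, N, i, j):
    """|Ui ∩ Uj| / |Ui ∪ Uj|, from the incremental counters."""
    co = M.get(i, {}).get(j, 0)
    union = N.get(i, 0) + N.get(j, 0) - co  # inclusion-exclusion
    return co / union if union else 0.0

def pmi(M, N, i, j, n_users):
    """log P(i, j) / (P(i) P(j)), with probabilities estimated as counts / n_users."""
    co = M.get(i, {}).get(j, 0)
    if co == 0:
        return float("-inf")
    return math.log(co * n_users / (N[i] * N[j]))
```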