
slide-1
SLIDE 1

PageRank and recommenders on very large scale

A Big Data perspective through Stratosphere

Márton Balassi, Data Mining and Search Group¹

¹ Computer and Automation Research Institute of the Hungarian Academy of Sciences

May 8, 2014

slide-2
SLIDE 2

PageRank and recommenders on very large scale

Table of Contents

◮ Distributing data-intensive algorithms
◮ Stratosphere Input Contracts
◮ PageRank and recommender systems
◮ Reference

slide-3
SLIDE 3

PageRank and recommenders on very large scale Distributing data-intensive algorithms


slide-4
SLIDE 4

PageRank and recommenders on very large scale Distributing data-intensive algorithms Motivation

Motivation

Let’s run PageRank on this graph...

◮ soc-LiveJournal1, provided by Stanford LNDC¹
◮ $4.8 \cdot 10^{6}$ nodes
◮ $6.9 \cdot 10^{7}$ edges
◮ 250 MB of compressed data
◮ A “conventional” single-machine solution seems sufficient

¹ Stanford Large Network Dataset Collection

slide-9
SLIDE 9

PageRank and recommenders on very large scale Distributing data-intensive algorithms Motivation

Motivation

Let’s run PageRank on this graph...

◮ A large Portuguese web crawl¹
◮ $3.1 \cdot 10^{9}$ nodes
◮ $1.1 \cdot 10^{11}$ edges
◮ 80 GB of compressed data
◮ Divide and conquer is almost mandatory

¹ A large crawl of the Portuguese Web Archive, obtained from Daniel Gomes

slide-14
SLIDE 14

PageRank and recommenders on very large scale Distributing data-intensive algorithms MapReduce

MapReduce

slide-15
SLIDE 15

PageRank and recommenders on very large scale Distributing data-intensive algorithms Pregel

Pregel

Traits

◮ Bulk Synchronous Parallel (BSP)
◮ “Think like a vertex”
◮ Graph kept in memory

Scheme of the BSP system

Wikipedia, public domain
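To make “think like a vertex” concrete, here is a minimal single-process sketch of BSP supersteps. It assumes the maximum-value propagation example known from the Pregel paper rather than anything shown on this slide, and the names (`MaxValuePregel`, `run`) are hypothetical, not Pregel's API. In each superstep an active vertex reads its inbox, updates its value, messages its out-neighbours if the value grew, and otherwise votes to halt; messages only become visible after the barrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-process simulation of Pregel-style BSP supersteps (not a real Pregel API).
class MaxValuePregel {
    // adj.get(v) lists the out-neighbours of v; initial[v] is v's starting value.
    static int[] run(List<List<Integer>> adj, int[] initial) {
        int n = initial.length;
        int[] value = initial.clone();
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        List<List<Integer>> inbox = emptyMailboxes(n);
        for (int superstep = 0; ; superstep++) {
            List<List<Integer>> outbox = emptyMailboxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                // A halted vertex only wakes up if a message arrived for it.
                if (!active[v] && inbox.get(v).isEmpty()) continue;
                int old = value[v];
                for (int m : inbox.get(v)) value[v] = Math.max(value[v], m);
                if (superstep == 0 || value[v] > old) {
                    // Tell all out-neighbours; they see it only in the next superstep.
                    for (int w : adj.get(v)) outbox.get(w).add(value[v]);
                    active[v] = true;
                    sent = true;
                } else {
                    active[v] = false; // vote to halt
                }
            }
            inbox = outbox;          // barrier: messages become visible only now
            if (!sent) return value; // every vertex halted and no messages are in flight
        }
    }

    static List<List<Integer>> emptyMailboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int v = 0; v < n; v++) boxes.add(new ArrayList<>());
        return boxes;
    }

    public static void main(String[] args) {
        // Undirected path 0-1-2-3, each edge listed in both directions.
        List<List<Integer>> adj = List.of(List.of(1), List.of(0, 2), List.of(1, 3), List.of(2));
        System.out.println(Arrays.toString(run(adj, new int[] {3, 6, 2, 1}))); // prints [6, 6, 6, 6]
    }
}
```

Keeping the whole graph and all mailboxes in memory mirrors the “graph kept in memory” trait; a real system partitions vertices across workers and exchanges the outboxes over the network at each barrier.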

slide-18
SLIDE 18

PageRank and recommenders on very large scale Distributing data-intensive algorithms Counting the number of triangles in a graph

Triangle Counter – Sequential algorithm

Sequential algorithm

Every vertex runs a search starting from itself, bounded to depth three, looking for paths that lead back to itself. Thus every triangle is counted three times, once from each of its vertices.
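One way to read the sequential algorithm in code: from vertex v, step to neighbours a and b; the third step closes the cycle exactly when a and b are adjacent. Considering each unordered pair {a, b} only once is an assumption of this sketch, and with it each triangle is found once from each of its three corners, hence the division by 3. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SequentialTriangles {
    // adj: undirected simple graph, each edge listed at both endpoints.
    static long count(List<List<Integer>> adj) {
        List<Set<Integer>> isNeighbour = new ArrayList<>();
        for (List<Integer> nbrs : adj) isNeighbour.add(new HashSet<>(nbrs));
        long found = 0;
        for (int v = 0; v < adj.size(); v++) {
            List<Integer> nv = adj.get(v);
            // v -> a -> b -> v is a closed path of length three iff a and b are adjacent.
            for (int i = 0; i < nv.size(); i++)
                for (int j = i + 1; j < nv.size(); j++)
                    if (isNeighbour.get(nv.get(i)).contains(nv.get(j))) found++;
        }
        // Each triangle was found exactly once from each of its three vertices.
        return found / 3;
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 plus a pendant edge 2-3.
        List<List<Integer>> adj = List.of(List.of(1, 2), List.of(0, 2), List.of(0, 1, 3), List.of(2));
        System.out.println(count(adj)); // prints 1
    }
}
```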

slide-19
SLIDE 19

PageRank and recommenders on very large scale Distributing data-intensive algorithms Counting the number of triangles in a graph

Triangle Counter – MapReduce algorithm

Representation


slide-20
SLIDE 20

PageRank and recommenders on very large scale Distributing data-intensive algorithms Counting the number of triangles in a graph

Triangle Counter – MapReduce algorithm

First Map

Let’s send our ID to all of our neighbours with a higher ID than ours. Let’s also send our list of neighbours to ourselves.

First Reduce

Let’s write out the information received.

slide-21
SLIDE 21

PageRank and recommenders on very large scale Distributing data-intensive algorithms Counting the number of triangles in a graph

Triangle Counter – MapReduce algorithm

Second Map

If the ID received is smaller than ours, let’s pass it on to our neighbours. Let’s send our list of neighbours to ourselves.

Second Reduce

If the ID received belongs to one of our neighbours, let’s increment a global counter.
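The two rounds can be simulated in memory by making the shuffle explicit: every map call emits (key, value) pairs, which are grouped by key before the reduce calls. The slides do not say whether the second map forwards an ID to all neighbours or only to higher-ID ones; this sketch assumes higher-ID neighbours only, so a triangle u < v < w is counted exactly once, at w. Names (`MapReduceTriangles`, `Msg`) are hypothetical; records need Java 16+.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MapReduceTriangles {
    // One emitted value: either a forwarded vertex ID, or the key vertex's own neighbour list.
    record Msg(Integer id, List<Integer> adj) {}

    // The shuffle: group emitted (key, value) pairs by key, as the MapReduce runtime would.
    static void emit(Map<Integer, List<Msg>> shuffle, int key, Msg m) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(m);
    }

    // graph: undirected, each edge listed at both endpoints.
    static long count(Map<Integer, List<Integer>> graph) {
        // First map: send our ID to higher-ID neighbours, send our neighbour list to ourselves.
        Map<Integer, List<Msg>> r1 = new TreeMap<>();
        graph.forEach((u, nbrs) -> {
            for (int v : nbrs) if (v > u) emit(r1, v, new Msg(u, null));
            emit(r1, u, new Msg(null, nbrs));
        });
        // First reduce writes the groups out unchanged; the second map reads them back.
        Map<Integer, List<Msg>> r2 = new TreeMap<>();
        r1.forEach((v, msgs) -> {
            List<Integer> nbrs = List.of();
            for (Msg m : msgs) if (m.adj() != null) nbrs = m.adj();
            for (Msg m : msgs) {
                if (m.id() == null || m.id() >= v) continue;  // only IDs smaller than ours
                for (int w : nbrs) if (w > v) emit(r2, w, m);  // forward to higher-ID neighbours (assumption)
            }
            emit(r2, v, new Msg(null, nbrs));
        });
        // Second reduce: ID u reaching w via v closes triangle u < v < w iff u is w's neighbour.
        long counter = 0;
        for (Map.Entry<Integer, List<Msg>> e : r2.entrySet()) {
            Set<Integer> nbrs = new HashSet<>();
            for (Msg m : e.getValue()) if (m.adj() != null) nbrs.addAll(m.adj());
            for (Msg m : e.getValue()) if (m.id() != null && nbrs.contains(m.id())) counter++;
        }
        return counter;
    }
}
```

If IDs were instead forwarded to all neighbours, each triangle would reach the counter three times and the total would need dividing by three.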

slide-23
SLIDE 23

PageRank and recommenders on very large scale Distributing data-intensive algorithms Counting the number of triangles in a graph

Runtime of the three solutions

slide-24
SLIDE 24

PageRank and recommenders on very large scale Stratosphere Input Contracts


slide-25
SLIDE 25

PageRank and recommenders on very large scale Stratosphere Input Contracts Map

Map

Wordcount Map

For lines of input text emit (word, 1) for each word.
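Stripped of the PACT record types, the Map contract is a function called independently on every input record. A plain-Java sketch of the word-count map step; the class name and the normalisation (lower-casing, splitting on anything that is not a letter or digit) are assumptions, not Stratosphere code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordCountMap {
    // Map: called independently for every input line; emits one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Assumed normalisation: lower-case, treat anything but letters/digits as a separator.
        for (String word : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("To be, or not to be"));
    }
}
```

Because each call sees a single line and keeps no state between calls, the input can be split across any number of workers without coordination.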

slide-26
SLIDE 26

PageRank and recommenders on very large scale Stratosphere Input Contracts Map

Map

```java
public static class TokenizeLine extends MapStub implements Serializable {
    private static final long serialVersionUID = 1L;

    // initialize reusable mutable objects
    private final PactRecord outputRecord = new PactRecord();
    private final PactString word = new PactString();
    private final PactInteger one = new PactInteger(1);
    // (the declaration of the tokenizer field is not shown on the slide)

    @Override
    public void map(PactRecord record, Collector<PactRecord> collector) {
        // get the first field (as type PactString) from the record
        PactString line = record.getField(0, PactString.class);
        // normalize the line with AsciiUtils ...
        // tokenize the line
        this.tokenizer.setStringToTokenize(line);
        while (tokenizer.next(this.word)) {
            // emit a (word, 1) pair
            this.outputRecord.setField(0, this.word);
            this.outputRecord.setField(1, this.one);
            collector.collect(this.outputRecord);
        }
    }
}
```

slide-27
SLIDE 27

PageRank and recommenders on very large scale Stratosphere Input Contracts Map

Map

slide-28
SLIDE 28

PageRank and recommenders on very large scale Stratosphere Input Contracts Reduce

Reduce

Wordcount Reduce

For multiple instances of (word, 1) count frequency of each word.
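Between Map and Reduce the runtime groups all pairs by key; Reduce is then called once per key with every value of that group. A plain-Java sketch with hypothetical names that makes the grouping explicit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountReduce {
    // Reduce: called once per word with all counts emitted for that word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    // The grouping the runtime performs between Map and Reduce, then one reduce call per key.
    static Map<String, Integer> groupAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        groups.forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(Map.entry("be", 1), Map.entry("to", 1), Map.entry("be", 1));
        System.out.println(groupAndReduce(pairs)); // prints {be=2, to=1}
        // Summing is associative, so partial sums may be pre-aggregated (combined) before the shuffle.
        System.out.println(reduce(List.of(reduce(List.of(1, 1)), 1))); // prints 3
    }
}
```

The last line shows why the PACT code can implement `combine` by simply calling `reduce`: a partial count has the same (word, count) shape as the final one.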

slide-29
SLIDE 29

PageRank and recommenders on very large scale Stratosphere Input Contracts Reduce

Reduce

```java
public static class CountWords extends ReduceStub implements Serializable {
    private final PactInteger cnt = new PactInteger();

    @Override
    public void reduce(Iterator<PactRecord> records, Collector<PactRecord> out) throws Exception {
        PactRecord element = null;
        int sum = 0;
        while (records.hasNext()) {
            element = records.next();
            PactInteger i = element.getField(1, PactInteger.class);
            sum += i.getValue();
        }
        this.cnt.setValue(sum);
        element.setField(1, this.cnt);
        out.collect(element);
    }

    @Override
    public void combine(Iterator<PactRecord> records, Collector<PactRecord> out) throws Exception {
        // same logic as reduce so simply a call to it
        this.reduce(records, out);
    }
}
```

slide-30
SLIDE 30

PageRank and recommenders on very large scale Stratosphere Input Contracts Reduce

Reduce

slide-31
SLIDE 31

PageRank and recommenders on very large scale Stratosphere Input Contracts Cross

Cross

K-Means Cross

Given data points and cluster centers, compute the distance between each data point and each cluster center.
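Cross calls the user function once for every pair in the Cartesian product of its two inputs, here |points| × |centers| times. A plain-Java emulation with hypothetical names, where array indices stand in for the pointID and clusterID fields (an assumption); records need Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

class KMeansCross {
    // One output row: (pointId, clusterId, distance).
    record Distance(int pointId, int clusterId, double distance) {}

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Cross: the user function (here the distance computation) runs on every (point, center) pair.
    static List<Distance> cross(double[][] points, double[][] centers) {
        List<Distance> out = new ArrayList<>();
        for (int p = 0; p < points.length; p++)
            for (int c = 0; c < centers.length; c++)
                out.add(new Distance(p, c, euclidean(points[p], centers[c])));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cross(new double[][] {{0, 0}, {3, 4}}, new double[][] {{0, 0}, {6, 8}}));
    }
}
```

Since the number of centers is small, a distributed run would typically broadcast the centers to every partition of the points.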

slide-32
SLIDE 32

PageRank and recommenders on very large scale Stratosphere Input Contracts Cross

Cross

```java
public class ComputeDistance extends CrossStub implements Serializable {
    private static final long serialVersionUID = 1L;

    private final PactDouble distance = new PactDouble();

    // Output Format: (pointID, pointVector, clusterID, distance)
    @Override
    public void cross(PactRecord dataPointRecord, PactRecord clusterCenterRecord, Collector<PactRecord> out) {
        CoordVector dataPoint = dataPointRecord.getField(1, CoordVector.class);
        PactInteger clusterCenterId = clusterCenterRecord.getField(0, PactInteger.class);
        CoordVector clusterPoint = clusterCenterRecord.getField(1, CoordVector.class);
        this.distance.setValue(dataPoint.computeEuclidianDistance(clusterPoint));
        // add cluster center id and distance to the data point record
        dataPointRecord.setField(2, clusterCenterId);
        dataPointRecord.setField(3, this.distance);
        out.collect(dataPointRecord);
    }
}
```

slide-33
SLIDE 33

PageRank and recommenders on very large scale Stratosphere Input Contracts Cross

Cross

slide-34
SLIDE 34

PageRank and recommenders on very large scale Stratosphere Input Contracts Match

Match

Path Match

Given edges (e, f) and (f, g) of a graph, construct the paths (e, g).
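Match is an equi-join: the user function is called once for each pair of records, one from each input, whose keys are equal; here the key is the shared vertex f. This plain-Java sketch (hypothetical names, Java 16+ records) uses an in-memory hash join, which is only one possible execution strategy, and keeps the slide's circle prevention.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathMatch {
    record Edge(String from, String to) {}

    // Join (e, f) from `first` with (f, g) from `second` on f and emit (e, g).
    static List<Edge> concat(List<Edge> first, List<Edge> second) {
        // Build side: index the second input by its start vertex.
        Map<String, List<Edge>> byStart = new HashMap<>();
        for (Edge e : second) byStart.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e);
        List<Edge> out = new ArrayList<>();
        // Probe side: each (e, f) meets every (f, g) with the same f.
        for (Edge ef : first) {
            for (Edge fg : byStart.getOrDefault(ef.to(), List.of())) {
                if (ef.from().equals(fg.to())) continue; // circle prevention, as on the slide
                out.add(new Edge(ef.from(), fg.to()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge("a", "b"), new Edge("b", "c"), new Edge("b", "a"), new Edge("c", "d"));
        System.out.println(concat(edges, edges));
    }
}
```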

slide-35
SLIDE 35

PageRank and recommenders on very large scale Stratosphere Input Contracts Match

Match

```java
public static class ConcatPaths extends MatchStub implements Serializable {
    // define outputRecord, length, hopCnt, hopList...

    @Override
    public void match(PactRecord rec1, PactRecord rec2, Collector<PactRecord> out) throws Exception {
        // rec1 has matching start, rec2 matching end
        final PactString fromNode = rec2.getField(0, PactString.class);
        final PactString toNode = rec1.getField(1, PactString.class);
        if (fromNode.equals(toNode)) return; // circle prevention

        // Create new path
        outputRecord.setField(0, fromNode);
        outputRecord.setField(1, toNode);
        // Compute length of new path & hop count ...
        // Concatenate hops lists and insert matching node...
        // Append the whole path in a Stringbuilder...
        hopList.setValue(sb.toString().trim());
        outputRecord.setField(4, hopList);
        out.collect(outputRecord);
    }
}
```

slide-36
SLIDE 36

PageRank and recommenders on very large scale Stratosphere Input Contracts Match

Match

slide-37
SLIDE 37

PageRank and recommenders on very large scale Stratosphere Input Contracts CoGroup

CoGroup

Floyd CoGroup

Given the shortest paths to the in-neighbours of a vertex in a directed graph and the edges of the graph, compute the shortest path to the vertex.
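CoGroup differs from Match in that the user function is called once per key with the complete groups from both inputs, including keys that occur in only one of them. A plain-Java emulation, assuming paths are keyed by their (from, to) endpoints and carry a numeric length and a hop list stored as a string; all names are hypothetical (Java 16+ records).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ShortestPathCoGroup {
    record Path(String from, String to, int length, String hops) {}
    record Key(String from, String to) {}

    static Map<Key, List<Path>> groupByKey(List<Path> paths) {
        Map<Key, List<Path>> groups = new LinkedHashMap<>();
        for (Path p : paths) groups.computeIfAbsent(new Key(p.from(), p.to()), k -> new ArrayList<>()).add(p);
        return groups;
    }

    // User function: sees both groups for one (from, to) key and keeps every path of minimal length.
    static Set<Path> findShortest(List<Path> inputPaths, List<Path> concatPaths) {
        int min = Integer.MAX_VALUE;
        for (Path p : inputPaths) min = Math.min(min, p.length());
        for (Path p : concatPaths) min = Math.min(min, p.length());
        Set<Path> shortest = new LinkedHashSet<>(); // identical paths are emitted only once
        for (Path p : inputPaths) if (p.length() == min) shortest.add(p);
        for (Path p : concatPaths) if (p.length() == min) shortest.add(p);
        return shortest;
    }

    // CoGroup: one call per key present in EITHER input; a missing side arrives as an empty group.
    static List<Path> coGroup(List<Path> inputPaths, List<Path> concatPaths) {
        Map<Key, List<Path>> a = groupByKey(inputPaths);
        Map<Key, List<Path>> b = groupByKey(concatPaths);
        Set<Key> keys = new LinkedHashSet<>(a.keySet());
        keys.addAll(b.keySet());
        List<Path> out = new ArrayList<>();
        for (Key k : keys) out.addAll(findShortest(a.getOrDefault(k, List.of()), b.getOrDefault(k, List.of())));
        return out;
    }
}
```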

slide-38
SLIDE 38

PageRank and recommenders on very large scale Stratosphere Input Contracts CoGroup

CoGroup

```java
public static class FindShortestPath extends CoGroupStub implements Serializable {
    // define outputRecord, shortestPaths, hopCnts, minLength ...

    @Override
    public void coGroup(Iterator<PactRecord> inputRecords, Iterator<PactRecord> concatRecords, Collector<PactRecord> out) {
        // init minimum length and minimum path ...
        // find shortest path of all input paths...
        // find shortest path of all input and concatenated paths...
        outputRecord.setField(0, fromNode);
        outputRecord.setField(1, toNode);
        outputRecord.setField(2, minLength);
        // emit all shortest paths
        for (PactString shortestPath : shortestPaths) {
            outputRecord.setField(3, hopCnts.get(shortestPath));
            outputRecord.setField(4, shortestPath);
            out.collect(outputRecord);
        }
        hopCnts.clear();
        shortestPaths.clear();
    }
}
```

slide-39
SLIDE 39

PageRank and recommenders on very large scale Stratosphere Input Contracts CoGroup

CoGroup

slide-40
SLIDE 40

PageRank and recommenders on very large scale PageRank and recommender systems


slide-44
SLIDE 44

PageRank and recommenders on very large scale PageRank and recommender systems Iterations in Stratosphere

Iterations in Stratosphere

Denotation

◮ S is a partitioned dataset
◮ f is a Stratosphere program
◮ < is a termination criterion

1: while S < f(S) do
2:     S := f(S)
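The loop above can be written as a generic higher-order function: the step f is applied to the whole dataset once per iteration, and the termination criterion compares S with f(S). In this sketch (hypothetical names) the criterion is a predicate that says whether to continue, f(S) is computed once per iteration and reused, and a maximum iteration count is added as a safety net, which the slide does not mention. The example step is one hop of graph reachability, S := S ∪ successors(S), repeated until nothing changes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

class BulkIteration {
    // while continueWhile(S, f(S)) holds: S := f(S). Returns the last S, as in the pseudocode.
    static <T> T iterate(T s, UnaryOperator<T> f, BiPredicate<T, T> continueWhile, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            T next = f.apply(s);
            if (!continueWhile.test(s, next)) return s;
            s = next;
        }
        return s;
    }

    static Set<Integer> reachable(Map<Integer, List<Integer>> graph, int source) {
        // f: add every successor of every vertex already in S.
        UnaryOperator<Set<Integer>> step = s -> {
            Set<Integer> next = new TreeSet<>(s);
            for (int v : s) next.addAll(graph.getOrDefault(v, List.of()));
            return next;
        };
        Set<Integer> start = new TreeSet<>(Set.of(source));
        // Criterion: keep going while f(S) still differs from S.
        return iterate(start, step, (s, next) -> !s.equals(next), 1_000);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = Map.of(0, List.of(1), 1, List.of(2), 3, List.of(0));
        System.out.println(reachable(graph, 0)); // prints [0, 1, 2]
    }
}
```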

slide-45
SLIDE 45

PageRank and recommenders on very large scale PageRank and recommender systems Iterations in Stratosphere

Bulk iterations

Traits

◮ Each iteration is a synchronization point (superstep)
◮ The optimizer weighs the costs of the dynamic data path by the number of iterations
◮ Caches data where the constant and dynamic data paths meet
◮ Pushes repeated work to the constant data path

PageRank scheme
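A single-machine sketch of PageRank as a bulk iteration. The damping factor 0.85, the uniform start vector, spreading the rank of dangling nodes evenly and the L1 convergence threshold are common choices assumed here, not taken from the slides, and the names are hypothetical. The out-link arrays play the role of the constant data path (built once, reused in every superstep); the rank vector is the dynamic path, replaced as a whole in each superstep.

```java
import java.util.Arrays;

class BulkPageRank {
    static double[] pageRank(int[][] outLinks, double damping, double epsilon, int maxIterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < maxIterations; it++) {          // one superstep per iteration
            double[] next = new double[n];
            double danglingMass = 0;
            for (int v = 0; v < n; v++) {
                if (outLinks[v].length == 0) { danglingMass += rank[v]; continue; }
                double share = rank[v] / outLinks[v].length;    // rank sent along each out-edge
                for (int w : outLinks[v]) next[w] += share;
            }
            double delta = 0;
            for (int v = 0; v < n; v++) {
                next[v] = (1 - damping) / n + damping * (next[v] + danglingMass / n);
                delta += Math.abs(next[v] - rank[v]);
            }
            rank = next;                                        // new ranks become visible only now
            if (delta < epsilon) break;                         // termination criterion (L1 change)
        }
        return rank;
    }

    public static void main(String[] args) {
        // Vertices 1 and 2 link to 0; vertex 0 has no out-links.
        System.out.println(Arrays.toString(pageRank(new int[][] {{}, {0}, {0}}, 0.85, 1e-12, 200)));
    }
}
```

The ranks always sum to 1, because the teleport term contributes 1 - d in total and the link and dangling terms redistribute exactly d times the previous total.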

slide-52
SLIDE 52

PageRank and recommenders on very large scale PageRank and recommender systems Iterations in Stratosphere

Incremental iterations

Rationale

◮ New construct: incremental (workset) iteration
◮ W contains the elements of S that may change in the next iteration
◮ The delta D is computed from S and W and efficiently merged with the prior S
◮ The workset W is recomputed from D

1: S := I, W := S
2: while W ≠ ∅ do
3:     D := u(S, W)
4:     W := δ(D, S, W)
5:     S := S ⊎ D
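A workset iteration pays off when most of S stops changing early. A standard example is connected components by minimum-label propagation; the plain-Java sketch below (hypothetical names, not Stratosphere's API) follows the pseudocode line by line: S holds each vertex's current label, D holds only the labels that would decrease, W is the set of vertices whose label changed, and ⊎ overwrites the changed entries of S.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class WorksetConnectedComponents {
    // adj: undirected graph, each edge listed at both endpoints.
    static int[] run(List<List<Integer>> adj) {
        int n = adj.size();
        int[] s = new int[n];                       // solution set S: vertex -> component label
        for (int v = 0; v < n; v++) s[v] = v;       // S := I, every vertex is its own component
        Set<Integer> w = new TreeSet<>();           // W := S, every vertex starts in the workset
        for (int v = 0; v < n; v++) w.add(v);
        while (!w.isEmpty()) {
            // D := u(S, W): vertices in W offer their label to neighbours;
            // only offers that lower the neighbour's current label are kept.
            Map<Integer, Integer> d = new TreeMap<>();
            for (int v : w) {
                for (int nb : adj.get(v)) {
                    if (s[v] < s[nb]) d.merge(nb, s[v], Integer::min);
                }
            }
            // W := δ(D, S, W): only vertices whose label changes act in the next round.
            w = new TreeSet<>(d.keySet());
            // S := S ⊎ D: overwrite the changed entries of the solution set.
            for (Map.Entry<Integer, Integer> e : d.entrySet()) s[e.getKey()] = e.getValue();
        }
        return s;
    }
}
```

Each round touches only the vertices in W instead of the whole graph, which is the efficiency gain the incremental construct aims at.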

slide-53
SLIDE 53

PageRank and recommenders on very large scale PageRank and recommender systems Iterations in Stratosphere

Pregel as a Stratosphere job

slide-54
SLIDE 54

PageRank and recommenders on very large scale PageRank and recommender systems Recommender systems

Recommender systems

Alternating Least Squares (ALS)

◮ We have a user set $U$ and an item set $I$
◮ The users' ratings are stored in $R \in \mathbb{R}^{|U| \times |I|}$
◮ But $|U|$ and $|I|$ can easily be in the range of millions...
◮ Let’s find $P$ and $Q$ such that $PQ \approx R$
◮ Let $P \in \mathbb{R}^{|U| \times k}$ and $Q \in \mathbb{R}^{k \times |I|}$, where $k$ is a small constant
◮ The algorithm alternates between $P$ and $Q$: it fixes one factor and estimates the other by least squares
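The alternation can be sketched in plain Java for a small dense matrix. Assumptions not on the slide: a ridge term `lambda` keeps every k×k system solvable, `NaN` marks a missing rating, Q is initialised from a seeded random generator, and all names are hypothetical. With one factor fixed, each row of the other factor is an independent least-squares problem over that row's observed ratings, which is what makes the rows easy to distribute.

```java
import java.util.Random;

class AlsSketch {
    // Solves the small k-by-k system a x = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * x[c];
            x[i] = s / m[i][i];
        }
        return x;
    }

    // Half-step: with `fixed` (one k-vector per column of r) held constant, each row's vector is
    // the ridge least-squares solution (F^T F + lambda I) x = F^T r_row over its observed entries.
    static double[][] solveRows(double[][] r, double[][] fixed, int k, double lambda) {
        double[][] out = new double[r.length][];
        for (int row = 0; row < r.length; row++) {
            double[][] a = new double[k][k];
            double[] b = new double[k];
            for (int i = 0; i < k; i++) a[i][i] = lambda;
            for (int col = 0; col < r[row].length; col++) {
                if (Double.isNaN(r[row][col])) continue; // missing rating
                double[] q = fixed[col];
                for (int i = 0; i < k; i++) {
                    b[i] += q[i] * r[row][col];
                    for (int j = 0; j < k; j++) a[i][j] += q[i] * q[j];
                }
            }
            out[row] = solve(a, b);
        }
        return out;
    }

    static double[][] transpose(double[][] r) {
        double[][] t = new double[r[0].length][r.length];
        for (int i = 0; i < r.length; i++)
            for (int j = 0; j < r[0].length; j++) t[j][i] = r[i][j];
        return t;
    }

    // Returns {P, Q^T}: row u of P is user u's vector, row i of Q^T is item i's vector.
    static double[][][] als(double[][] r, int k, double lambda, int iterations, long seed) {
        double[][] rt = transpose(r);
        Random rnd = new Random(seed);
        double[][] qt = new double[rt.length][k];
        for (double[] q : qt) for (int i = 0; i < k; i++) q[i] = rnd.nextDouble();
        double[][] p = solveRows(r, qt, k, lambda);   // fix Q, solve for P
        for (int it = 0; it < iterations; it++) {
            qt = solveRows(rt, p, k, lambda);         // fix P, solve for Q
            p = solveRows(r, qt, k, lambda);          // fix Q, solve for P
        }
        return new double[][][] {p, qt};
    }

    static double predict(double[][][] model, int user, int item) {
        double s = 0;
        for (int i = 0; i < model[0][user].length; i++) s += model[0][user][i] * model[1][item][i];
        return s;
    }
}
```

Each half-step only needs, for every user, the vectors of the items that user rated (and vice versa), which is exactly the data movement discussed in the slides on distributing ALS.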

slide-60
SLIDE 60

PageRank and recommenders on very large scale PageRank and recommender systems Distributing ALS

Limitations of BSP

Challenge

◮ Algorithmic and physical partitions differ, so that all CPUs can be utilized
◮ In PageRank it’s OK to send the same rank multiple times
◮ In ALS it means duplicating the matrix each time!

Scheme of the BSP system

Wikipedia, public domain

slide-65
SLIDE 65

PageRank and recommenders on very large scale PageRank and recommender systems Distributing ALS

Possible solution

Proposed new Stratosphere input contract

Given a set of values $p_i$ indexed by $i$ and a relation $R_{ij}$ over the index set, form for every $j$ the co-group $j : \{\, p_i \mid R_{ij} \,\}$. In other words, a directed graph defines the values $p_i$ that have to be aggregated at the nodes $j$. Both ALS and PageRank (and, I guess, many more algorithms) use this Input Contract.
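The proposed contract can be mimicked in memory to see how it covers both use cases. In this hypothetical sketch (`GraphCoGroup.apply` is an illustrative name, not a Stratosphere API) the relation is a list of `(i, j)` pairs and every target `j` receives the group of all `p_i` with `R_ij`. For PageRank, `p_i` would be the share `rank_i / outdeg_i` and the user function a sum; for ALS, the group at a user would hold the vectors of the items that user rated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class GraphCoGroup {
    // For every target j, collect { p_i : (i, j) in relation } and hand the group to udf.
    static <V, O> Map<Integer, O> apply(Map<Integer, V> p, List<int[]> relation, Function<List<V>, O> udf) {
        Map<Integer, List<V>> groups = new TreeMap<>();
        for (int[] ij : relation) {
            V value = p.get(ij[0]);
            if (value != null) groups.computeIfAbsent(ij[1], k -> new ArrayList<>()).add(value);
        }
        Map<Integer, O> out = new TreeMap<>();
        groups.forEach((j, values) -> out.put(j, udf.apply(values)));
        return out;
    }

    public static void main(String[] args) {
        // PageRank-style use: p_i are rank shares, the relation is the edge list, udf sums.
        Map<Integer, Double> shares = Map.of(0, 0.5, 1, 0.25, 2, 1.0);
        List<int[]> edges = List.of(new int[] {0, 2}, new int[] {1, 2}, new int[] {2, 0});
        Function<List<Double>, Double> sum = vs -> vs.stream().mapToDouble(Double::doubleValue).sum();
        System.out.println(apply(shares, edges, sum)); // prints {0=1.0, 2=0.75}
    }
}
```

Each value `p_i` is shipped once per target that needs it, so a physical partitioning that groups targets sharing many sources would reduce the duplication criticised on the previous slides.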

slide-66
SLIDE 66

PageRank and recommenders on very large scale Reference


slide-67
SLIDE 67

PageRank and recommenders on very large scale Reference Where to look for additional info

Literature

“TriangleCounter”

Englert et al. (2014): Efficiency Issues of Computing Graph Properties of Social Networks. Presented at the 9th International Conference on Applied Informatics, Eger; proceedings in press.

Stratosphere PACTs

Battré et al. (2010): Nephele/PACTs: a programming model and execution framework for web-scale analytical processing. Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 119-130.

slide-68
SLIDE 68

PageRank and recommenders on very large scale Reference Where to look for additional info

On the web

Data Mining and Search & Big Data BI Groups

Our research groups can be found at dms.sztaki.hu and at bigdatabi.sztaki.hu.

Stratosphere project homepage

The project can be found at stratosphere.eu. The homepage served as a source for all the images and code presented on these slides.