SLIDE 1

Multi-pass Sorted Neighborhood Blocking with MapReduce

Lars Kolb, Andreas Thor, Erhard Rahm
Jens Hildebrandt, Jakob Zwiener

SLIDE 2

Agenda

  • 1. Sorted Neighborhood Method
      ■ with MapReduce
      ■ with Entity Replication
  • 2. Multipass Sorted Neighborhood Method
  • 3. Load Balancing
  • 4. Benchmarks

Sorted Neighborhood with MapReduce | Jens Hildebrandt, Jakob Zwiener | 6 May 2013

SLIDE 3

Sorted Neighborhood Method

sorting key  artist_name  disc_title   Genre  tracks
             Sonny Terry  The Blues    Blues  18
             Fats Waller  Portrait     Jazz   17
             Blind Blake  Best Of      Blues  18
             Fats Domino  I'M Walking  Blues  18
             Chris Rea    Stony Road   Blues  17
             Jazz         Jazz         Jazz   20
             Acustica     Acustica     Blues  19
             Various      The Blues    Blues  17
             Kelis        Tasty        R+B    17

  • 1. Calculate Sorting Key
      ■ Genre + tracks

SLIDE 4

Sorted Neighborhood Method

sorting key  artist_name  disc_title   Genre  tracks
Blues18      Sonny Terry  The Blues    Blues  18
Jazz17       Fats Waller  Portrait     Jazz   17
Blues18      Blind Blake  Best Of      Blues  18
Blues18      Fats Domino  I'M Walking  Blues  18
Blues17      Chris Rea    Stony Road   Blues  17
Jazz20       Jazz         Jazz         Jazz   20
Blues19      Acustica     Acustica     Blues  19
Blues17      Various      The Blues    Blues  17
R+B17        Kelis        Tasty        R+B    17

  • 1. Calculate Sorting Key
      ■ Genre + tracks
  • 2. Sort

SLIDE 5

Sorted Neighborhood Method

sorting key  artist_name  disc_title   Genre  tracks
Blues17      Chris Rea    Stony Road   Blues  17
Blues17      Various      The Blues    Blues  17
Blues18      Sonny Terry  The Blues    Blues  18
Blues18      Blind Blake  Best Of      Blues  18
Blues18      Fats Domino  I'M Walking  Blues  18
Blues19      Acustica     Acustica     Blues  19
Jazz17       Fats Waller  Portrait     Jazz   17
Jazz20       Jazz         Jazz         Jazz   20
R+B17        Kelis        Tasty        R+B    17

  • 1. Calculate Sorting Key
      ■ Genre + tracks
  • 2. Sort
  • 3. Move a window over the data
      ■ Window size w = 3
      ■ Row count n = 9

Comparisons: O(n*w)
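
The three steps (calculate the sorting key, sort, slide a window) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the function name `sorted_neighborhood` and the tuple layout are made up for this example. Each record is paired with its next w - 1 neighbors, which gives (w - 1) * (n - w / 2) comparisons and hence the O(n*w) bound.

```python
def sorted_neighborhood(records, key, w):
    """Sort records by key and pair each one with its next w - 1 neighbors."""
    ordered = sorted(records, key=key)  # stable sort: ties keep input order
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + w]:
            pairs.append((left, right))
    return pairs


# (artist_name, disc_title, genre, tracks), copied from the example table
discs = [
    ("Sonny Terry", "The Blues", "Blues", 18),
    ("Fats Waller", "Portrait", "Jazz", 17),
    ("Blind Blake", "Best Of", "Blues", 18),
    ("Fats Domino", "I'M Walking", "Blues", 18),
    ("Chris Rea", "Stony Road", "Blues", 17),
    ("Jazz", "Jazz", "Jazz", 20),
    ("Acustica", "Acustica", "Blues", 19),
    ("Various", "The Blues", "Blues", 17),
    ("Kelis", "Tasty", "R+B", 17),
]


def sorting_key(disc):
    # Sorting key from the slides: Genre + tracks, e.g. "Blues18"
    return f"{disc[2]}{disc[3]}"


pairs = sorted_neighborhood(discs, sorting_key, w=3)
print(len(pairs))  # 15 comparisons for n = 9, w = 3
```

For n = 9 and w = 3 the formula gives 2 * (9 - 1.5) = 15, compared with 36 comparisons for all pairs.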

SLIDE 6

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

SLIDE 7

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
sort       disc_title   ...   sort       disc_title   ...   sort       disc_title   ...
           The Blues    ...              I'M Walking  ...              Acustica     ...
           Portrait     ...              Stony Road   ...              The Blues    ...
           Best Of      ...              Jazz         ...              Tasty        ...

SLIDE 8

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
sort       disc_title   ...   sort       disc_title   ...   sort       disc_title   ...
Blues18    The Blues    ...   Blues18    I'M Walking  ...   Blues19    Acustica     ...
Jazz17     Portrait     ...   Blues17    Stony Road   ...   Blues17    The Blues    ...
Blues18    Best Of      ...   Jazz20     Jazz         ...   R+B17      Tasty        ...

Map:

  • 1. Calculate SortingKey: Genre + tracks

SLIDE 9

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
Jazz17     Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   Jazz20     Jazz         ...   R+B17      Tasty        ...

Map:

  • 1. Calculate SortingKey: Genre + tracks
  • 2. Calculate Partition:

sorting key  partition
B…           1

SLIDE 10

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   R+B17      Tasty        ...

Map:

  • 1. Calculate SortingKey: Genre + tracks
  • 2. Calculate Partition:

sorting key  partition
B…           1
J…           2

SLIDE 11

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   2.R+B17    Tasty        ...

Map:

  • 1. Calculate SortingKey: Genre + tracks
  • 2. Calculate Partition:

sorting key  partition
B…           1
J…           2
R…           2

SLIDE 12

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   2.R+B17    Tasty        ...

Partitioning

SLIDE 13

Sorted Neighborhood with MapReduce - Algorithm

disc_title   Genre  tracks    disc_title   Genre  tracks    disc_title   Genre  tracks
The Blues    Blues  18        I'M Walking  Blues  18        Acustica     Blues  19
Portrait     Jazz   17        Stony Road   Blues  17        The Blues    Blues  17
Best Of      Blues  18        Jazz         Jazz   20        Tasty        R+B    17

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   2.R+B17    Tasty        ...

Partitioning

part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues17  The Blues    ...   2.Jazz17   Portrait     ...
1.Blues17  Stony Road   ...   2.Jazz20   Jazz         ...
1.Blues18  I'M Walking  ...   2.R+B17    Tasty        ...
1.Blues18  Best Of      ...
1.Blues18  The Blues    ...
1.Blues19  Acustica     ...

SLIDE 14

Sorted Neighborhood with MapReduce - Limitations

  • Neighboring sorting keys must be on the same reducer → custom partition function
  • Self-defined partitioning + sorting
  • Internal load balancing does not work anymore
  • Boundary entities
  • Sliding window cannot compare entities that are assigned to different reduce nodes
  • Solution: data replication

reduce1                       reduce2
part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues17  The Blues    ...   2.Jazz17   Portrait     ...
1.Blues17  Stony Road   ...   2.Jazz20   Jazz         ...
1.Blues18  I'M Walking  ...   2.R+B17    Tasty        ...
1.Blues18  Best Of      ...
1.Blues18  The Blues    ...
1.Blues19  Acustica     ...
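
The boundary problem can be made concrete with a small simulation. The sketch below is illustrative only: `partition_of` is a hypothetical range partitioner that mimics the slide's lookup table (B… to partition 1, J… and R… to partition 2), and the reducers are simulated as plain Python lists. It compares the pairs found by a single global window with the pairs found when each partition is windowed on its own.

```python
W = 3  # window size used throughout the slides

# (disc_title, genre, tracks), in the input order shown on the slides
DISCS = [
    ("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18),
    ("I'M Walking", "Blues", 18), ("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20),
    ("Acustica", "Blues", 19), ("The Blues", "Blues", 17), ("Tasty", "R+B", 17),
]


def sorting_key(disc):
    return f"{disc[1]}{disc[2]}"  # Genre + tracks


def partition_of(key):
    # Hypothetical range partitioner: keys before "J" go to reducer 1
    return 1 if key < "J" else 2


def window_pairs(ordered, w):
    """All pairs whose positions in `ordered` are less than w apart."""
    return {(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1 : i + w]}


globally_sorted = sorted(DISCS, key=sorting_key)
global_pairs = window_pairs(globally_sorted, W)

# Each reducer only sees its own partition, already sorted
partitions = {1: [], 2: []}
for disc in globally_sorted:
    partitions[partition_of(sorting_key(disc))].append(disc)

local_pairs = set()
for part in partitions.values():
    local_pairs |= window_pairs(part, W)

missed = global_pairs - local_pairs
print(len(global_pairs), len(local_pairs), len(missed))  # 15 12 3
```

The three missed pairs all straddle the boundary between the two reducers, which is w * (w - 1) / 2 pairs per boundary and is the gap that data replication is meant to close.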

SLIDE 15

Sorted Neighborhood with Entity Replication

map1 map2 map3 reduce1 reduce2

SLIDE 16

Sorted Neighborhood with Entity Replication

map1                          map2                          map3
sort       disc_title   ...   sort       disc_title   ...   sort       disc_title   ...
Blues18    The Blues    ...   Blues18    I'M Walking  ...   Blues19    Acustica     ...
Jazz17     Portrait     ...   Blues17    Stony Road   ...   Blues17    The Blues    ...
Blues18    Best Of      ...   Jazz20     Jazz         ...   R+B17      Tasty        ...

reduce1 reduce2

SLIDE 17

Sorted Neighborhood with Entity Replication

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   2.R+B17    Tasty        ...

reduce1 reduce2

SLIDE 18

Sorted Neighborhood with Entity Replication

map1                          map2                          map3
part.sort  disc_title   ...   part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   1.Blues18  I'M Walking  ...   1.Blues19  Acustica     ...
2.Jazz17   Portrait     ...   1.Blues17  Stony Road   ...   1.Blues17  The Blues    ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...   2.R+B17    Tasty        ...
1.Blues18  The Blues    ...   1.Blues17  Stony Road   ...   1.Blues19  Acustica     ...
1.Blues18  Best Of      ...   1.Blues18  I'M Walking  ...   1.Blues17  The Blues    ...

reduce1 reduce2

SLIDE 19

Sorted Neighborhood with Entity Replication

map1                              map2                              map3
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues18    I'M Walking  ...   1.1.Blues19    Acustica     ...
2.2.Jazz17     Portrait     ...   1.1.Blues17    Stony Road   ...   1.1.Blues17    The Blues    ...
1.1.Blues18    Best Of      ...   2.2.Jazz20     Jazz         ...   2.2.R+B17      Tasty        ...
2.1.Blues18    The Blues    ...   2.1.Blues17    Stony Road   ...   2.1.Blues19    Acustica     ...
2.1.Blues18    Best Of      ...   2.1.Blues18    I'M Walking  ...   2.1.Blues17    The Blues    ...

reduce1 reduce2

SLIDE 20

Sorted Neighborhood with Entity Replication

map1                              map2                              map3
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues18    I'M Walking  ...   1.1.Blues19    Acustica     ...
2.2.Jazz17     Portrait     ...   1.1.Blues17    Stony Road   ...   1.1.Blues17    The Blues    ...
1.1.Blues18    Best Of      ...   2.2.Jazz20     Jazz         ...   2.2.R+B17      Tasty        ...
2.1.Blues18    The Blues    ...   2.1.Blues17    Stony Road   ...   2.1.Blues19    Acustica     ...
2.1.Blues18    Best Of      ...   2.1.Blues18    I'M Walking  ...   2.1.Blues17    The Blues    ...

Partitioning

reduce1                           reduce2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues17    The Blues    ...
1.1.Blues17    Stony Road   ...
1.1.Blues18    I'M Walking  ...
1.1.Blues18    Best Of      ...
1.1.Blues18    The Blues    ...
1.1.Blues19    Acustica     ...

SLIDE 21

Sorted Neighborhood with Entity Replication

map1                              map2                              map3
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues18    I'M Walking  ...   1.1.Blues19    Acustica     ...
2.2.Jazz17     Portrait     ...   1.1.Blues17    Stony Road   ...   1.1.Blues17    The Blues    ...
1.1.Blues18    Best Of      ...   2.2.Jazz20     Jazz         ...   2.2.R+B17      Tasty        ...
2.1.Blues18    The Blues    ...   2.1.Blues17    Stony Road   ...   2.1.Blues19    Acustica     ...
2.1.Blues18    Best Of      ...   2.1.Blues18    I'M Walking  ...   2.1.Blues17    The Blues    ...

Partitioning

reduce1                           reduce2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues17    The Blues    ...   2.1.Blues17    The Blues    ...
1.1.Blues17    Stony Road   ...   2.1.Blues17    Stony Road   ...
1.1.Blues18    I'M Walking  ...   2.1.Blues18    I'M Walking  ...
1.1.Blues18    Best Of      ...   2.1.Blues18    The Blues    ...
1.1.Blues18    The Blues    ...   2.1.Blues18    Best Of      ...
1.1.Blues19    Acustica     ...   2.1.Blues19    Acustica     ...

SLIDE 22

Sorted Neighborhood with Entity Replication

map1                              map2                              map3
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues18    I'M Walking  ...   1.1.Blues19    Acustica     ...
2.2.Jazz17     Portrait     ...   1.1.Blues17    Stony Road   ...   1.1.Blues17    The Blues    ...
1.1.Blues18    Best Of      ...   2.2.Jazz20     Jazz         ...   2.2.R+B17      Tasty        ...
2.1.Blues18    The Blues    ...   2.1.Blues17    Stony Road   ...   2.1.Blues19    Acustica     ...
2.1.Blues18    Best Of      ...   2.1.Blues18    I'M Walking  ...   2.1.Blues17    The Blues    ...

Partitioning

reduce1                           reduce2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues17    The Blues    ...   2.1.Blues17    The Blues    ...
1.1.Blues17    Stony Road   ...   2.1.Blues17    Stony Road   ...
1.1.Blues18    I'M Walking  ...   2.1.Blues18    I'M Walking  ...
1.1.Blues18    Best Of      ...   2.1.Blues18    Best Of      ...
1.1.Blues18    The Blues    ...   2.1.Blues18    The Blues    ...
1.1.Blues19    Acustica     ...   2.1.Blues19    Acustica     ...
                                  2.2.Jazz17     Portrait     ...
                                  2.2.Jazz20     Jazz         ...
                                  2.2.R+B17      Tasty        ...
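
The replication scheme shown on these slides can be simulated without a MapReduce framework. The sketch below is a hedged reconstruction, not the presenters' code: it assumes each map task copies its w - 1 highest-keyed entities of partition p to reducer p + 1 under the key (p + 1).(p).sort, and that a reducer keeps only the last w - 1 replicas and skips pairs made of two replicas. The names `map_split` and `reduce_partition`, the "J" partition boundary, and the title tie-breaker in the sort key are assumptions for this example.

```python
from collections import defaultdict

W = 3             # window size
NUM_REDUCERS = 2  # reducers 1 and 2, as on the slides

# (disc_title, genre, tracks) per map task, as on the slides
SPLITS = [
    [("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18)],
    [("I'M Walking", "Blues", 18), ("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20)],
    [("Acustica", "Blues", 19), ("The Blues", "Blues", 17), ("Tasty", "R+B", 17)],
]


def sort_key(disc):
    # Genre + tracks; the title is only a tie-breaker (assumption)
    return (f"{disc[1]}{disc[2]}", disc[0])


def partition_of(disc):
    # Hypothetical range partitioner: genres before "J" go to reducer 1
    return 1 if disc[1] < "J" else 2


def map_split(split):
    """Emit (reducer, partition, sort key) keys plus boundary replicas."""
    emitted, per_partition = [], defaultdict(list)
    for disc in split:
        p = partition_of(disc)
        emitted.append(((p, p, sort_key(disc)), disc))
        per_partition[p].append(disc)
    for p, discs in per_partition.items():
        if p < NUM_REDUCERS and W > 1:
            # Copy the w - 1 highest-keyed entities to the next reducer
            for disc in sorted(discs, key=sort_key)[-(W - 1):]:
                emitted.append(((p + 1, p, sort_key(disc)), disc))
    return emitted


def reduce_partition(reducer, items):
    """Slide the window, skipping pairs that consist of replicas only."""
    items = sorted(items)  # replicas (partition < reducer) sort first
    replicas = [d for (_, p, _), d in items if p < reducer]
    originals = [d for (_, p, _), d in items if p == reducer]
    ordered = (replicas[-(W - 1):] if W > 1 else []) + originals
    first_original = len(ordered) - len(originals)
    pairs = set()
    for i, left in enumerate(ordered):
        for j in range(max(i + 1, first_original), min(i + W, len(ordered))):
            pairs.add((left, ordered[j]))
    return pairs


# Shuffle: group map output by the reducer prefix of the key
shuffle = defaultdict(list)
for split in SPLITS:
    for key, disc in map_split(split):
        shuffle[key[0]].append((key, disc))

distributed = set()
for reducer, items in shuffle.items():
    distributed |= reduce_partition(reducer, items)

# Reference result: one window over the globally sorted data
all_discs = sorted((d for s in SPLITS for d in s), key=sort_key)
sequential = {(a, b) for i, a in enumerate(all_discs) for b in all_discs[i + 1 : i + W]}
print(distributed == sequential, len(distributed))  # True 15
```

In this run reducer 2 receives six replicas, matching the 2.1.* rows in the table, but only the two with the highest keys can fall into a window with Jazz17, so the rest are discarded before comparison.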

SLIDE 23

Challenges in Sorted Neighborhood on MapReduce

  • Sorted Neighborhood with MapReduce
  • Multipass in one MapReduce
  • Load Balancing for Nodes

SLIDE 24

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks        disc_title   Genre  tracks
The Blues    Blues  18            Stony Road   Blues  17
Portrait     Jazz   17            Jazz         Jazz   20
Best Of      Blues  18

map1                              map2

SLIDE 25

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks        disc_title   Genre  tracks
The Blues    Blues  18            Stony Road   Blues  17
Portrait     Jazz   17            Jazz         Jazz   20
Best Of      Blues  18

map1                              map2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues17    Stony Road   ...
2.2.Jazz17     Portrait     ...   2.2.Jazz20     Jazz         ...
1.1.Blues18    Best Of      ...

SLIDE 26

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks        disc_title   Genre  tracks
The Blues    Blues  18            Stony Road   Blues  17
Portrait     Jazz   17            Jazz         Jazz   20
Best Of      Blues  18

map1                              map2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues17    Stony Road   ...
2.2.Jazz17     Portrait     ...   2.2.Jazz20     Jazz         ...
1.1.Blues18    Best Of      ...                  Stony Road   ...
               The Blues    ...                  Jazz         ...
               Portrait     ...
               Best Of      ...

SLIDE 27

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks        disc_title   Genre  tracks
The Blues    Blues  18            Stony Road   Blues  17
Portrait     Jazz   17            Jazz         Jazz   20
Best Of      Blues  18

map1                              map2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues17    Stony Road   ...
2.2.Jazz17     Portrait     ...   2.2.Jazz20     Jazz         ...
1.1.Blues18    Best Of      ...   St17           Stony Road   ...
Th18           The Blues    ...   Ja20           Jazz         ...
Po17           Portrait     ...
Be18           Best Of      ...

SLIDE 28

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks        disc_title   Genre  tracks
The Blues    Blues  18            Stony Road   Blues  17
Portrait     Jazz   17            Jazz         Jazz   20
Best Of      Blues  18

map1                              map2
red.part.sort  disc_title   ...   red.part.sort  disc_title   ...
1.1.Blues18    The Blues    ...   1.1.Blues17    Stony Road   ...
2.2.Jazz17     Portrait     ...   2.2.Jazz20     Jazz         ...
1.1.Blues18    Best Of      ...   2.2.St17       Stony Road   ...
2.2.Th18       The Blues    ...   1.1.Ja20       Jazz         ...
2.2.Po17       Portrait     ...
1.1.Be18       Best Of      ...

SLIDE 29

Multipass Sorted Neighborhood Method

disc_title   Genre  tracks             disc_title   Genre  tracks
The Blues    Blues  18                 Stony Road   Blues  17
Portrait     Jazz   17                 Jazz         Jazz   20
Best Of      Blues  18

map1                                   map2
pass.red.part.sort  disc_title   ...   pass.red.part.sort  disc_title   ...
1.1.1.Blues18       The Blues    ...   1.1.1.Blues17       Stony Road   ...
1.2.2.Jazz17        Portrait     ...   1.2.2.Jazz20        Jazz         ...
1.1.1.Blues18       Best Of      ...   2.2.2.St17          Stony Road   ...
2.2.2.Th18          The Blues    ...   2.1.1.Ja20          Jazz         ...
2.2.2.Po17          Portrait     ...
2.1.1.Be18          Best Of      ...
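
The composite key pass.red.part.sort lets one MapReduce job carry several passes at once: every entity is emitted once per sorting key, and the pass prefix keeps the passes apart. The sketch below reproduces the map output of the example. It is a simplified illustration under assumptions: the second-pass key (first two title letters + tracks) is read off the slide, the partition boundaries "J" and "M" are hypothetical values chosen to match the prefixes shown, and replication is left out.

```python
# (disc_title, genre, tracks) per map task, as on the slides
SPLITS = [
    [("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18)],
    [("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20)],
]

# One (key function, partition boundary) entry per pass.
# Keys below the boundary go to partition 1, all others to partition 2.
PASSES = [
    (lambda d: f"{d[1]}{d[2]}", "J"),      # pass 1: Genre + tracks
    (lambda d: f"{d[0][:2]}{d[2]}", "M"),  # pass 2: title[:2] + tracks
]


def map_split(split):
    """Emit every entity once per pass with a pass.red.part.sort key."""
    emitted = []
    for pass_no, (key_fn, boundary) in enumerate(PASSES, start=1):
        for disc in split:
            key = key_fn(disc)
            part = 1 if key < boundary else 2
            # Without replication, reducer and partition numbers coincide
            emitted.append((f"{pass_no}.{part}.{part}.{key}", disc[0]))
    return emitted


for map_no, split in enumerate(SPLITS, start=1):
    print(f"map{map_no}")
    for key, title in map_split(split):
        print(f"  {key:<15} {title}")
```

A reducer can then group by the leading pass and reducer fields and run an independent window for each pass.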

SLIDE 30

Challenges in Sorted Neighborhood on MapReduce

  • Sorted Neighborhood with MapReduce
  • Multipass in one MapReduce
  • Load Balancing for Nodes

SLIDE 31

Load Balancing

sort       disc_title   ...   sort       disc_title   ...   sort       disc_title   ...
Blues18    The Blues    ...   Blues18    I'M Walking  ...   Blues19    Acustica     ...
Jazz17     Portrait     ...   Blues17    Stony Road   ...   Blues17    The Blues    ...
Blues18    Best Of      ...   Jazz20     Jazz         ...   R+B17      Tasty        ...

sortK      disc_title   ...   sortK      disc_title   ...
Blues17    Stony Road   ...   Jazz17     Portrait     ...
Blues17    The Blues    ...   Jazz20     Jazz         ...
Blues18    The Blues    ...   R+B17      Tasty        ...
Blues18    Best Of      ...
Blues18    I'M Walking  ...
Blues19    Acustica     ...

sort       disc_title   ...   sort       disc_title   ...
Blues17    Stony Road   ...   Blues18    I'M Walking  ...
Blues17    The Blues    ...   Blues19    Acustica     ...
Blues18    The Blues    ...   Jazz17     Portrait     ...
Blues18    Best Of      ...   Jazz20     Jazz         ...
                              R+B17      Tasty        ...

SLIDE 32

Load Balancing

sort.mapN  disc_title   ...   sort.mapN  disc_title   ...   sort.mapN  disc_title   ...
Blues18.1  The Blues    ...   Blues18.2  I'M Walking  ...   Blues19.3  Acustica     ...
Jazz17.1   Portrait     ...   Blues17.2  Stony Road   ...   Blues17.3  The Blues    ...
Blues18.1  Best Of      ...   Jazz20.2   Jazz         ...   R+B17.3    Tasty        ...

sort.mapN  disc_title   ...   sort.mapN  disc_title   ...
Blues17.2  Stony Road   ...   Blues18.2  I'M Walking  ...
Blues17.3  The Blues    ...   Blues19.3  Acustica     ...
Blues18.1  The Blues    ...   Jazz17.1   Portrait     ...
Blues18.1  Best Of      ...   Jazz20.2   Jazz         ...
                              R+B17.3    Tasty        ...

SLIDE 33

Load Balancing

sort.mapN.counter  disc_title   ...   sort.mapN.counter  disc_title   ...   sort.mapN.counter  disc_title   ...
Blues18.1.1        The Blues    ...   Blues18.2.1        I'M Walking  ...   Blues19.3.1        Acustica     ...
Jazz17.1.1         Portrait     ...   Blues17.2.1        Stony Road   ...   Blues17.3.1        The Blues    ...
Blues18.1.2        Best Of      ...   Jazz20.2.1         Jazz         ...   R+B17.3.1          Tasty        ...

sort.mapN.counter  disc_title   ...   sort.mapN.counter  disc_title   ...
Blues17.2.1        Stony Road   ...   Blues18.2.1        I'M Walking  ...
Blues17.3.1        The Blues    ...   Blues19.3.1        Acustica     ...
Blues18.1.1        The Blues    ...   Jazz17.1.1         Portrait     ...
Blues18.1.2        Best Of      ...   Jazz20.2.1         Jazz         ...
                                      R+B17.3.1          Tasty        ...

SLIDE 34

Load Balancing

part.sort  disc_title   ...   part.sort  disc_title   ...

sortKey  MapN: 1  2  3
Blues17        0  1  1
Blues18        2  1  0
Blues19        0  0  1
Jazz17         1  0  0
Jazz20         0  1  0
R+B17          0  0  1

Blues18.2.1

SLIDE 35

Load Balancing

part.sort  disc_title   ...   part.sort  disc_title   ...
                              2.Blues18  I'M Walking  ...

sortKey  MapN: 1  2  3
Blues17        0  1  1
Blues18        2  1  0
Blues19        0  0  1
Jazz17         1  0  0
Jazz20         0  1  0
R+B17          0  0  1

Blues18.2.1

SLIDE 36

Load Balancing

part.sort  disc_title   ...   part.sort  disc_title   ...
                              2.Blues18  I'M Walking  ...

sortKey  MapN: 1  2  3
Blues17        0  1  1
Blues18        2  1  0
Blues19        0  0  1
Jazz17         1  0  0
Jazz20         0  1  0
R+B17          0  0  1

Blues18.1.1

SLIDE 37

Load Balancing

part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues18  The Blues    ...   2.Blues18  I'M Walking  ...

sortKey  MapN: 1  2  3
Blues17        0  1  1
Blues18        2  1  0
Blues19        0  0  1
Jazz17         1  0  0
Jazz20         0  1  0
R+B17          0  0  1

Blues18.1.1

SLIDE 38

Load Balancing

part.sort  disc_title   ...   part.sort  disc_title   ...
1.Blues17  Stony Road   ...   2.Blues18  I'M Walking  ...
1.Blues17  The Blues    ...   2.Blues19  Acustica     ...
1.Blues18  The Blues    ...   2.Jazz17   Portrait     ...
1.Blues18  Best Of      ...   2.Jazz20   Jazz         ...
                              2.R+B17    Tasty        ...

sortKey  MapN: 1  2  3
Blues17        0  1  1
Blues18        2  1  0
Blues19        0  0  1
Jazz17         1  0  0
Jazz20         0  1  0
R+B17          0  0  1
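
The load-balancing steps can be sketched as two phases: a first pass builds the key distribution matrix (count per sorting key and map task), and a second pass uses it to turn an annotated key such as Blues18.2.1 (sort.mapN.counter) into a global position and from there into a partition. The code below is a hedged reconstruction: the rule "n // r entities per partition, remainder to the last one" is an assumption that happens to reproduce the 4/5 split in the example, and all function names are made up for this illustration.

```python
from collections import Counter

R = 2  # number of reducers / partitions

# (disc_title, genre, tracks) per map task, as on the slides
SPLITS = [
    [("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18)],
    [("I'M Walking", "Blues", 18), ("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20)],
    [("Acustica", "Blues", 19), ("The Blues", "Blues", 17), ("Tasty", "R+B", 17)],
]


def sort_key(disc):
    return f"{disc[1]}{disc[2]}"  # Genre + tracks


# Phase 1: key distribution matrix, indexed by (sorting key, map task number)
matrix = Counter()
for map_no, split in enumerate(SPLITS, start=1):
    for disc in split:
        matrix[(sort_key(disc), map_no)] += 1

n = sum(matrix.values())
per_partition = max(1, n // R)  # assumed split rule, see lead-in


def global_position(key, map_no, counter):
    """1-based rank of entity key.mapN.counter in the global sort order."""
    before = sum(
        count
        for (k, m), count in matrix.items()
        if k < key or (k == key and m < map_no)
    )
    return before + counter


def partition_of(key, map_no, counter):
    position = global_position(key, map_no, counter)
    return min(R, (position - 1) // per_partition + 1)


# Phase 2: each map task numbers its entities per key and prefixes the partition
def map_split(map_no, split):
    seen = Counter()
    emitted = []
    for disc in split:
        key = sort_key(disc)
        seen[key] += 1  # the "counter" part of sort.mapN.counter
        part = partition_of(key, map_no, seen[key])
        emitted.append((f"{part}.{key}", disc[0]))
    return emitted


for map_no, split in enumerate(SPLITS, start=1):
    print(map_no, map_split(map_no, split))
```

With this rule Blues18.2.1 lands at position 5 and goes to partition 2, while Blues18.1.1 lands at position 3 and goes to partition 1, so the frequent key Blues18 is split across both reducers.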

SLIDE 39

Benchmarks

SLIDE 40

Benchmarks

artist[:2] + title
artist[:1] + title[:1]

SLIDE 41

Summary

  • 1. Sorted Neighborhood Method
      ■ with MapReduce
      ■ with Entity Replication
  • 2. Multipass Sorted Neighborhood Method
  • 3. Load Balancing
  • 4. Benchmarks