Multi-pass Sorted Neighborhood Blocking with MapReduce (Lars Kolb et al.), PowerPoint presentation



  1. Multi-pass Sorted Neighborhood Blocking with MapReduce
     Lars Kolb, Andreas Thor, Erhard Rahm
     Jens Hildebrandt, Jakob Zwiener

  2. Agenda
     1. Sorted Neighborhood Method
        ■ with MapReduce
        ■ with Entity Replication
     2. Multi-pass Sorted Neighborhood Method
     3. Load Balancing
     4. Benchmarks
     Sorted Neighborhood with MapReduce | Jens Hildebrandt, Jakob Zwiener | 6 May 2013

  3. Sorted Neighborhood Method
     1. Calculate sorting key
        • Sorting key = Genre + tracks

        artist_name | disc_title  | Genre | tracks
        Sonny Terry | The Blues   | Blues | 18
        Fats Waller | Portrait    | Jazz  | 17
        Blind Blake | Best Of     | Blues | 18
        Fats Domino | I'M Walking | Blues | 18
        Chris Rea   | Stony Road  | Blues | 17
        Jazz        | Jazz        | Jazz  | 20
        Acustica    | Acustica    | Blues | 19
        Various     | The Blues   | Blues | 17
        Kelis       | Tasty       | R+B   | 17
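A minimal Python sketch of step 1, assuming each record is a tuple (artist_name, disc_title, genre, tracks) and that the key is the plain text concatenation shown on the slide. `RECORDS` and `sorting_key` are illustrative names, not taken from the paper.

```python
# The nine example records from the slide: (artist_name, disc_title, genre, tracks).
RECORDS = [
    ("Sonny Terry", "The Blues", "Blues", 18),
    ("Fats Waller", "Portrait", "Jazz", 17),
    ("Blind Blake", "Best Of", "Blues", 18),
    ("Fats Domino", "I'M Walking", "Blues", 18),
    ("Chris Rea", "Stony Road", "Blues", 17),
    ("Jazz", "Jazz", "Jazz", 20),
    ("Acustica", "Acustica", "Blues", 19),
    ("Various", "The Blues", "Blues", 17),
    ("Kelis", "Tasty", "R+B", 17),
]


def sorting_key(record):
    """Sorting key = Genre + tracks, concatenated as text (e.g. 'Blues18')."""
    _artist, _title, genre, tracks = record
    return f"{genre}{tracks}"


for record in RECORDS:
    print(sorting_key(record), *record)
```

Because the key is a string, track counts compare as text: "Blues100" would sort before "Blues18". Zero-padding the number (for example `f"{genre}{tracks:03d}"`) is one way to keep numeric order if counts can exceed two digits.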

  4. Sorted Neighborhood Method
     1. Calculate sorting key (Genre + tracks)
     2. Sort

        sorting key | artist_name | disc_title  | Genre | tracks
        Blues18     | Sonny Terry | The Blues   | Blues | 18
        Jazz17      | Fats Waller | Portrait    | Jazz  | 17
        Blues18     | Blind Blake | Best Of     | Blues | 18
        Blues18     | Fats Domino | I'M Walking | Blues | 18
        Blues17     | Chris Rea   | Stony Road  | Blues | 17
        Jazz20      | Jazz        | Jazz        | Jazz  | 20
        Blues19     | Acustica    | Acustica    | Blues | 19
        Blues17     | Various     | The Blues   | Blues | 17
        R+B17       | Kelis       | Tasty       | R+B   | 17
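Step 2 can be sketched with Python's built-in `sorted`, keyed on the Genre + tracks string. The records and helper names are illustrative; only the ordering rule comes from the slide.

```python
# (artist_name, disc_title, genre, tracks) in input order.
RECORDS = [
    ("Sonny Terry", "The Blues", "Blues", 18),
    ("Fats Waller", "Portrait", "Jazz", 17),
    ("Blind Blake", "Best Of", "Blues", 18),
    ("Fats Domino", "I'M Walking", "Blues", 18),
    ("Chris Rea", "Stony Road", "Blues", 17),
    ("Jazz", "Jazz", "Jazz", 20),
    ("Acustica", "Acustica", "Blues", 19),
    ("Various", "The Blues", "Blues", 17),
    ("Kelis", "Tasty", "R+B", 17),
]


def sorting_key(record):
    """Genre + tracks as text, e.g. 'Blues18'."""
    return f"{record[2]}{record[3]}"


def sort_records(records):
    """Order records by sorting key. sorted() is stable, so records with
    equal keys keep their input order."""
    return sorted(records, key=sorting_key)


for record in sort_records(RECORDS):
    print(sorting_key(record), record[0], record[1])
```

With this stable tie rule the output matches the sorted table on slide 5. In general, ties such as the three Blues18 records have no defined order, so which pairs end up in one window can depend on the tie-breaking rule.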

  5. Sorted Neighborhood Method
     1. Calculate sorting key (Genre + tracks)
     2. Sort
     3. Move a window over the data
        • Window size w = 3
        • Row count n = 9
        • Comparisons: O(n*w)

        sorting key | artist_name | disc_title  | Genre | tracks
        Blues17     | Chris Rea   | Stony Road  | Blues | 17
        Blues17     | Various     | The Blues   | Blues | 17
        Blues18     | Sonny Terry | The Blues   | Blues | 18
        Blues18     | Blind Blake | Best Of     | Blues | 18
        Blues18     | Fats Domino | I'M Walking | Blues | 18
        Blues19     | Acustica    | Acustica    | Blues | 19
        Jazz17      | Fats Waller | Portrait    | Jazz  | 17
        Jazz20      | Jazz        | Jazz        | Jazz  | 20
        R+B17       | Kelis       | Tasty       | R+B   | 17
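A sketch of step 3: with window size w, every entity is compared with the w - 1 entities directly before it. For n ≥ w that gives (w - 1)·n - w(w - 1)/2 = (w - 1)(n - w/2) comparisons, i.e. O(n·w); for n = 9 and w = 3 this is 8 + 7 = 15. `window_pairs` and `comparison_count` are illustrative names.

```python
# Artists in the sorted order of the table above (n = 9).
SORTED = ["Chris Rea", "Various", "Sonny Terry", "Blind Blake", "Fats Domino",
          "Acustica", "Fats Waller", "Jazz", "Kelis"]


def window_pairs(entities, w):
    """All pairs that share a window of size w: each entity is compared
    with its (up to) w - 1 predecessors in sort order."""
    return [(entities[i], entities[j])
            for j in range(len(entities))
            for i in range(max(0, j - w + 1), j)]


def comparison_count(n, w):
    """Number of pairs window_pairs produces for n entities."""
    return sum(n - d for d in range(1, min(w, n)))


pairs = window_pairs(SORTED, w=3)
print(len(pairs), comparison_count(len(SORTED), 3))
```

The two "The Blues" records (Various and Sonny Terry) land in one window here, which is the kind of candidate pair the method is meant to surface.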

  6. Sorted Neighborhood with MapReduce - Algorithm
     Input in three splits (disc_title | Genre | tracks):
     Split 1:
        The Blues   | Blues | 18
        Portrait    | Jazz  | 17
        Best Of     | Blues | 18
     Split 2:
        I'M Walking | Blues | 18
        Stony Road  | Blues | 17
        Jazz        | Jazz  | 20
     Split 3:
        Acustica    | Blues | 19
        The Blues   | Blues | 17
        Tasty       | R+B   | 17

  7. Sorted Neighborhood with MapReduce - Algorithm
     Each split is read by one map task; the map output has the columns sort | disc_title | ..., with the sort column still empty:
        map 1:  The Blues, Portrait, Best Of
        map 2:  I'M Walking, Stony Road, Jazz
        map 3:  Acustica, The Blues, Tasty
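In Hadoop the input is divided into splits and each map task reads one split. The sketch below mimics the three contiguous splits on the slides; `make_splits` is a hypothetical helper, and real split boundaries are computed by the framework's input format (typically from the file block size), not by record count.

```python
# Input records as (disc_title, genre, tracks), in file order.
RECORDS = [
    ("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18),
    ("I'M Walking", "Blues", 18), ("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20),
    ("Acustica", "Blues", 19), ("The Blues", "Blues", 17), ("Tasty", "R+B", 17),
]


def make_splits(records, num_splits):
    """Cut records into num_splits contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(records), num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        splits.append(records[start:start + size])
        start += size
    return splits


for number, split in enumerate(make_splits(RECORDS, 3), start=1):
    print(f"map {number}:", [title for title, _genre, _tracks in split])
```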

  8. Sorted Neighborhood with MapReduce - Algorithm
     Map:
     1. Calculate sorting key: Genre + tracks
     Map output (sort | disc_title | ...):
        map 1:  Blues18    The Blues ...
                Jazz17     Portrait ...
                Blues18    Best Of ...
        map 2:  Blues18    I'M Walking ...
                Blues17    Stony Road ...
                Jazz20     Jazz ...
        map 3:  Blues19    Acustica ...
                Blues17    The Blues ...
                R+B17      Tasty ...
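A sketch of the map function on this slide, assuming the emitted value is the whole record; `map_fn` is a hypothetical name. Each map task emits (sorting key, record) pairs without looking at the other splits.

```python
# The three input splits from the slides: (disc_title, genre, tracks).
SPLITS = [
    [("The Blues", "Blues", 18), ("Portrait", "Jazz", 17), ("Best Of", "Blues", 18)],
    [("I'M Walking", "Blues", 18), ("Stony Road", "Blues", 17), ("Jazz", "Jazz", 20)],
    [("Acustica", "Blues", 19), ("The Blues", "Blues", 17), ("Tasty", "R+B", 17)],
]


def map_fn(record):
    """Emit one (sorting_key, record) pair; sorting_key = genre + tracks."""
    _title, genre, tracks = record
    return f"{genre}{tracks}", record


# One list of map output per map task (map 1, map 2, map 3).
map_outputs = [[map_fn(record) for record in split] for split in SPLITS]
for number, output in enumerate(map_outputs, start=1):
    print(f"map {number}:", [(key, record[0]) for key, record in output])
```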

  9. Sorted Neighborhood with MapReduce - Algorithm
     Map:
     1. Calculate sorting key: Genre + tracks
     2. Calculate partition (sorting key → partition): B… → 1
     Map output (part.sort | disc_title | ...):
        map 1:  1.Blues18  The Blues ...
                Jazz17     Portrait ...
                1.Blues18  Best Of ...
        map 2:  1.Blues18  I'M Walking ...
                1.Blues17  Stony Road ...
                Jazz20     Jazz ...
        map 3:  1.Blues19  Acustica ...
                1.Blues17  The Blues ...
                R+B17      Tasty ...

  10. Sorted Neighborhood with MapReduce - Algorithm
     Map:
     1. Calculate sorting key: Genre + tracks
     2. Calculate partition (sorting key → partition): B… → 1, J… → 2
     Map output (part.sort | disc_title | ...):
        map 1:  1.Blues18  The Blues ...
                2.Jazz17   Portrait ...
                1.Blues18  Best Of ...
        map 2:  1.Blues18  I'M Walking ...
                1.Blues17  Stony Road ...
                2.Jazz20   Jazz ...
        map 3:  1.Blues19  Acustica ...
                1.Blues17  The Blues ...
                R+B17      Tasty ...

  11. Sorted Neighborhood with MapReduce - Algorithm
     Map:
     1. Calculate sorting key: Genre + tracks
     2. Calculate partition (sorting key → partition): B… → 1, J… → 2, R… → 2
     Map output (part.sort | disc_title | ...):
        map 1:  1.Blues18  The Blues ...
                2.Jazz17   Portrait ...
                1.Blues18  Best Of ...
        map 2:  1.Blues18  I'M Walking ...
                1.Blues17  Stony Road ...
                2.Jazz20   Jazz ...
        map 3:  1.Blues19  Acustica ...
                1.Blues17  The Blues ...
                2.R+B17    Tasty ...
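The partition function on slides 9 to 11 sends keys starting with B to partition 1 and keys starting with J or R to partition 2. One way to express this is range partitioning over sorted split points, which keeps neighboring keys on the same reducer; Hadoop's default `HashPartitioner` would instead scatter neighbors by hash. The split point "J" and the names `SPLIT_POINTS`, `partition` and `composite_key` are assumptions chosen to reproduce the slides.

```python
import bisect

# Hypothetical split point: keys < "J" go to partition 1, keys >= "J" to partition 2.
SPLIT_POINTS = ["J"]


def partition(sorting_key):
    """1-based range partition: every key in partition p sorts before every key in p + 1."""
    return bisect.bisect_right(SPLIT_POINTS, sorting_key) + 1


def composite_key(sorting_key):
    """The 'part.sort' map output key from the slides, e.g. 'Blues18' -> '1.Blues18'."""
    return f"{partition(sorting_key)}.{sorting_key}"


KEYS = ["Blues18", "Jazz17", "Blues18", "Blues18", "Blues17",
        "Jazz20", "Blues19", "Blues17", "R+B17"]
print([composite_key(k) for k in KEYS])
```

Comparing the composite key as a plain string only works while partition numbers have a single digit ("10.x" sorts before "2.x"); a structured key with a custom comparator avoids that.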

  12. Sorted Neighborhood with MapReduce - Algorithm
     Partitioning
     Map output (part.sort | disc_title | ...):
        map 1:  1.Blues18  The Blues ...
                2.Jazz17   Portrait ...
                1.Blues18  Best Of ...
        map 2:  1.Blues18  I'M Walking ...
                1.Blues17  Stony Road ...
                2.Jazz20   Jazz ...
        map 3:  1.Blues19  Acustica ...
                1.Blues17  The Blues ...
                2.R+B17    Tasty ...

  13. Sorted Neighborhood with MapReduce - Algorithm
     Partitioning: the map output (as on slide 12) is grouped by partition and sorted by part.sort:
        Partition 1:  1.Blues17  The Blues ...
                      1.Blues17  Stony Road ...
                      1.Blues18  I'M Walking ...
                      1.Blues18  Best Of ...
                      1.Blues18  The Blues ...
                      1.Blues19  Acustica ...
        Partition 2:  2.Jazz17   Portrait ...
                      2.Jazz20   Jazz ...
                      2.R+B17    Tasty ...
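A single-process sketch of this shuffle, assuming the partition number is the prefix of the `part.sort` key: records are grouped by that prefix, each group is sorted by the full key, and each reducer then runs the sliding window over its own group only. `shuffle` and `reduce_window` are hypothetical names; in Hadoop the grouping and sorting are done by the framework.

```python
from collections import defaultdict

# Map output from the slides as (part.sort key, disc_title), in map 1, 2, 3 order.
MAP_OUTPUT = [
    ("1.Blues18", "The Blues"), ("2.Jazz17", "Portrait"), ("1.Blues18", "Best Of"),
    ("1.Blues18", "I'M Walking"), ("1.Blues17", "Stony Road"), ("2.Jazz20", "Jazz"),
    ("1.Blues19", "Acustica"), ("1.Blues17", "The Blues"), ("2.R+B17", "Tasty"),
]


def shuffle(map_output):
    """Group records by the partition prefix of their key, then sort each group
    by the full part.sort key (stable, so ties keep arrival order)."""
    groups = defaultdict(list)
    for key, value in map_output:
        groups[int(key.split(".", 1)[0])].append((key, value))
    return {p: sorted(items, key=lambda kv: kv[0]) for p, items in sorted(groups.items())}


def reduce_window(items, w):
    """Sorted neighborhood inside one reducer: compare each record with its w - 1 predecessors."""
    return [(items[i][1], items[j][1])
            for j in range(len(items))
            for i in range(max(0, j - w + 1), j)]


reducers = shuffle(MAP_OUTPUT)
for p, items in reducers.items():
    print(f"reduce {p}:", [key for key, _ in items], len(reduce_window(items, 3)), "comparisons")
```

Together the two reducers make 9 + 3 = 12 comparisons, three fewer than the 15 that a single sorted run with w = 3 makes over the same nine records.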

  14. Sorted Neighborhood with MapReduce - Limitations
     • Neighboring sorting keys must be on the same reducer → own partition function
        • Self-defined partitioning + sorting
        • Internal load balancing no longer works
     • Boundary entities
        • The sliding window cannot compare entities that are assigned to different reduce nodes
        • Solution: data replication
     reduce 1 (part.sort | disc_title | ...):
        1.Blues17  The Blues ...
        1.Blues17  Stony Road ...
        1.Blues18  I'M Walking ...
        1.Blues18  Best Of ...
        1.Blues18  The Blues ...
        1.Blues19  Acustica ...
     reduce 2 (part.sort | disc_title | ...):
        2.Jazz17   Portrait ...
        2.Jazz20   Jazz ...
        2.R+B17    Tasty ...
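To see which comparisons the boundary costs, the sketch below compares the window pairs of one global run with those of two independent reducers, using the reducer contents listed on this slide and w = 3. Pairs are positions in the concatenated sort order; all names are illustrative.

```python
# Reducer inputs listed on this slide (disc_title only), each already in sort order.
PARTITIONS = {
    1: ["The Blues", "Stony Road", "I'M Walking", "Best Of", "The Blues", "Acustica"],
    2: ["Portrait", "Jazz", "Tasty"],
}
W = 3


def window_index_pairs(n, w, offset=0):
    """Index pairs (i, j), i < j, sharing a window of size w in a run of n
    entities that starts at global position offset."""
    return {(offset + i, offset + j)
            for j in range(n) for i in range(max(0, j - w + 1), j)}


def missed_pairs(partitions, w):
    """Pairs a single global run compares but independent reducers do not."""
    order = [title for p in sorted(partitions) for title in partitions[p]]
    compared, offset = set(), 0
    for p in sorted(partitions):
        compared |= window_index_pairs(len(partitions[p]), w, offset)
        offset += len(partitions[p])
    return window_index_pairs(len(order), w) - compared, order


missed, order = missed_pairs(PARTITIONS, W)
for i, j in sorted(missed):
    print(order[i], "<->", order[j])
```

Each missed pair joins one of the last w - 1 entities of reducer 1 with one of the first w - 1 entities of reducer 2, so, as long as every partition holds at least w - 1 entities, replicating w - 1 boundary entities per partition is enough to recover them.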

  15. Sorted Neighborhood with Entity Replication
     [Diagram: map 1, map 2 and map 3 feeding reduce 1 and reduce 2]

  16. Sorted Neighborhood with Entity Replication
     Map output (sort | disc_title | ...); reducers: reduce 1, reduce 2
        map 1:  Blues18    The Blues ...
                Jazz17     Portrait ...
                Blues18    Best Of ...
        map 2:  Blues18    I'M Walking ...
                Blues17    Stony Road ...
                Jazz20     Jazz ...
        map 3:  Blues19    Acustica ...
                Blues17    The Blues ...
                R+B17      Tasty ...
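The slides stop before the replication step itself, so what follows is a simplified sketch of the idea, not the paper's exact algorithm. It assumes that each map task forwards its own last w - 1 entities of partition p to partition p + 1 as replicas, that each reducer skips pairs in which both entities are replicas, and that every partition holds at least w - 1 entities. Entities are (sorting key, artist_name) pairs, the split point "J" is hypothetical, and all function names are illustrative. The final line compares the result with a single-machine run.

```python
from collections import defaultdict

W = 3
SPLIT_POINT = "J"  # hypothetical: keys < "J" -> partition 1, otherwise partition 2
NUM_PARTITIONS = 2

# Map input per split as (sorting key, artist_name); the artist serves as entity id.
SPLITS = [
    [("Blues18", "Sonny Terry"), ("Jazz17", "Fats Waller"), ("Blues18", "Blind Blake")],
    [("Blues18", "Fats Domino"), ("Blues17", "Chris Rea"), ("Jazz20", "Jazz")],
    [("Blues19", "Acustica"), ("Blues17", "Various"), ("R+B17", "Kelis")],
]


def partition(key):
    return 1 if key < SPLIT_POINT else 2


def map_task(split, w):
    """Emit (partition, entity, is_replica). Additionally forward this task's
    last w - 1 entities of each partition p to partition p + 1 as replicas."""
    out, local = [], defaultdict(list)
    for entity in split:
        p = partition(entity[0])
        out.append((p, entity, False))
        local[p].append(entity)
    for p, entities in local.items():
        if p < NUM_PARTITIONS:
            ordered = sorted(entities)
            for entity in ordered[max(0, len(ordered) - (w - 1)):]:
                out.append((p + 1, entity, True))
    return out


def reduce_task(items, w):
    """Sliding window over (entity, is_replica) sorted by entity; skip pairs of
    two replicas, since the previous reducer already compared them."""
    items = sorted(items)
    pairs = set()
    for j in range(len(items)):
        for i in range(max(0, j - w + 1), j):
            (left, left_rep), (right, right_rep) = items[i], items[j]
            if not (left_rep and right_rep):
                pairs.add((left[1], right[1]))
    return pairs


def run_job(splits, w):
    shuffled = defaultdict(list)
    for split in splits:
        for p, entity, is_replica in map_task(split, w):
            shuffled[p].append((entity, is_replica))
    result = set()
    for p in sorted(shuffled):
        result |= reduce_task(shuffled[p], w)
    return result


def single_run(splits, w):
    """Reference: one global sorted neighborhood run on a single machine."""
    order = sorted(entity for split in splits for entity in split)
    return {(order[i][1], order[j][1])
            for j in range(len(order)) for i in range(max(0, j - w + 1), j)}


print(run_job(SPLITS, W) == single_run(SPLITS, W))
```

Because each map task only sees its own records, reduce 2 receives six replicas here rather than two. The sort order takes care of the surplus: only the last w - 1 replicas are close enough to reduce 2's own entities to share a window with them, and every pair of two replicas is skipped.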
