MMap (Memory Mapping): Simple, minimalist approach to scale up computation



SLIDE 1

Class Website

CX4242:

MMap (Memory Mapping)

Simple, minimalist approach to scale up computation

Mahdi Roozbahani Lecturer, Computational Science and Engineering, Georgia Tech

SLIDE 2

When should you use Spark/Hadoop, AWS, Azure? And when should you not?

SLIDE 3

MMap

Fast Billion-Scale Graph Computation on a PC via Memory Mapping

Led by

Zhiyuan (Jerry) Lin Georgia Tech CS Undergrad

Now: Stanford PhD student

MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping. Zhiyuan Lin, Minsuk Kahng, Kaeser Md. Sabrin, Duen Horng Chau, Ho Lee, and U Kang. Proceedings of IEEE BigData 2014 conference. Oct 27-30, Washington DC, USA.

Towards Scalable Graph Computation on Mobile Devices. Yiqi Chen, Zhiyuan Lin, Robert Pienta, Minsuk Kahng, Duen Ho

SLIDE 4

Graph Computation on Computer Cluster?

  • Steep learning curve
  • Cost
  • Overkill for smaller graphs

Image source: http://www.drupaltky.org/en/article/20

SLIDE 5

Best-of-breed Single-PC Approaches

  • GraphChi – OSDI 2012
  • TurboGraph – KDD 2013

What do they have in common?

  • Sophisticated Data Structures
  • Explicit Memory Management

SLIDE 6

Can We Do Less?

And still get the same or better performance?

e.g., automatic memory management, faster computation, etc.

SLIDE 7

Main Idea: Memory-map the Graph
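
As a rough illustration of the idea, the sketch below memory-maps a binary edge file and walks its edges using Python's standard mmap module. The file layout (each edge stored as two little-endian 32-bit integers) and the helper names write_edges, iter_edges, and demo are assumptions made for this example; they are not taken from the MMap project's code.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: (source, target) as two little-endian uint32 values.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write edges as fixed-size binary records (hypothetical file format)."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def iter_edges(path):
    """Yield (source, target) pairs from a memory-mapped edge file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # No explicit buffering: the OS pages the file in on demand
            # as we index into the mapping.
            for offset in range(0, len(mm), EDGE.size):
                yield EDGE.unpack_from(mm, offset)


def demo():
    """Round-trip a tiny graph through a temporary edge file."""
    edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
    path = os.path.join(tempfile.mkdtemp(), "edges.bin")
    write_edges(path, edges)
    return list(iter_edges(path))
```

The program never decides which parts of the file to keep in memory; that choice is left to the operating system's page cache.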

SLIDE 8

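How to compute PageRank for a huge matrix?

Use the power iteration method

http://en.wikipedia.org/wiki/Power_iteration

$$p' = c\,B\,p + \frac{1-c}{n}\,\mathbf{1}$$

Here B is the column-normalized adjacency matrix, c is the damping factor, n is the number of nodes, and 1 is the all-ones vector. p can be initialized to any non-zero vector, e.g., all “1”s.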
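
The power iteration on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's implementation: the function name pagerank, the default c = 0.85, the fixed iteration count, and the assumption that every node has at least one outgoing edge are all choices made for this example.

```python
def pagerank(edges, n, c=0.85, iterations=50):
    """Power iteration for p' = c * B * p + (1 - c) / n * 1.

    edges: list of (source, target) pairs with node ids in 0..n-1.
    B (the column-normalized adjacency matrix) is never materialized;
    its product with p is accumulated edge by edge.
    Assumption: every node that appears as a source has out-degree >= 1,
    and there are no dangling nodes to redistribute.
    """
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1

    # Any non-zero start vector works; uniform keeps the entries summing to 1.
    p = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [(1.0 - c) / n] * n
        for src, dst in edges:
            nxt[dst] += c * p[src] / out_degree[src]
        p = nxt
    return p


# A 3-node cycle: by symmetry every node ends up with the same rank.
cycle_ranks = pagerank([(0, 1), (1, 2), (2, 0)], n=3)
```

Each iteration is a single pass over the edge list, which is what makes the method a natural fit for sequential scans of a large edge file.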

SLIDE 9

Example: PageRank (implemented using MMap)

http://www.cc.gatech.edu/~dchau/papers/14-bigdata-mmap.pdf
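
The slide's code is not reproduced in this transcript, so the following is only a hedged sketch of what PageRank over a memory-mapped edge file might look like in Python. The binary layout (two little-endian 32-bit integers per edge), the names pagerank_mmap and demo, and the parameter defaults are assumptions for illustration; the code in the paper linked above may differ in every detail.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: (source, target) as two little-endian uint32 values.
EDGE = struct.Struct("<II")


def pagerank_mmap(path, n, c=0.85, iterations=30):
    """PageRank by repeatedly scanning a memory-mapped binary edge file.

    Assumption: every node has at least one outgoing edge.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offsets = range(0, len(mm), EDGE.size)

        # First pass: count out-degrees.
        out_degree = [0] * n
        for off in offsets:
            src, _ = EDGE.unpack_from(mm, off)
            out_degree[src] += 1

        # Power iteration: one sequential scan of the mapping per step.
        p = [1.0 / n] * n
        for _ in range(iterations):
            nxt = [(1.0 - c) / n] * n
            for off in offsets:
                src, dst = EDGE.unpack_from(mm, off)
                nxt[dst] += c * p[src] / out_degree[src]
            p = nxt
    return p


def demo():
    """Run on a 3-node cycle written to a temporary file."""
    path = os.path.join(tempfile.mkdtemp(), "cycle.bin")
    with open(path, "wb") as f:
        for src, dst in [(0, 1), (1, 2), (2, 0)]:
            f.write(EDGE.pack(src, dst))
    return pagerank_mmap(path, n=3)
```

Only the rank vectors and out-degree counts live in ordinary program memory; the edges are read through the mapping, so the edge file can be larger than RAM.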

SLIDE 10

8000 lines of code

SLIDE 11

SLIDE 12

Why Does Memory Mapping Work?

  • High-degree nodes’ info is automatically cached (kept in memory) for frequent future access
  • Read-ahead paging preemptively loads edges from disk
  • Highly optimized by the OS
  • No need to explicitly manage memory (less bookkeeping)
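
The points above describe behavior the OS provides without any help from the program. As an optional illustration that goes beyond the slide, a program can also hint its access pattern to the OS. The sketch below uses Python's mmap.madvise with MADV_SEQUENTIAL where the platform offers it (Python 3.8+ on systems that support it); the helper names map_sequential and demo are hypothetical, and the hint is purely advisory.

```python
import mmap
import os
import tempfile


def map_sequential(path):
    """Memory-map a file read-only and hint that it will be read in order."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # madvise and MADV_SEQUENTIAL are not available on every platform,
    # so the hint is skipped when either is missing.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
        mm.madvise(mmap.MADV_SEQUENTIAL)
    return mm


def demo():
    """Map a small temporary file and read from it through the mapping."""
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * 64)  # 16384 bytes
    mm = map_sequential(path)
    try:
        return mm[:4], len(mm)
    finally:
        mm.close()
```

Without the hint the code behaves the same; the OS still caches hot pages and reads ahead on its own.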

SLIDE 13

Also works on tablets! (If you want.)

Big Data on Small Devices (270M+ Edges)

SLIDE 14

“Mobile” devices are now very powerful

https://www.macrumors.com/2018/11/01/2018-ipad-pro-benchmarks-geekbench/

SLIDE 15

Led by

Dezhi (Andy) Fang, Georgia Tech CS Undergrad. Now: Airbnb software engineer

SLIDE 16

MMap project website http://poloclub.gatech.edu/mmap/