15799
Final Project Presentation
- Dec. 2nd, 2013
Qing Zheng & Atreyee Maiti
Goals
Graph Queries
- How do different DBs handle large graphs?
- What are the differences in performance?
- Which DB to use for a specific use case?
community
community
familiar with it
[Figure: Neo4j vs. MySQL run time in seconds, cold vs. warm, for three benchmarks: Six-Degree, Shortest-Path, Most-Cited-Page]
page cache for buffering data blocks
[Figure: search between source s and destination d, with intermediate nodes labeled by level (1, 2)]
Group By / Subquery
Insert Ignore Into …
Need to keep temp table short!
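The notes above (Group By / Subquery, Insert Ignore Into …, and keeping the temp table short) point to a level-by-level search driven from SQL. A minimal sketch of that idea follows; it is not the authors' query. It assumes a hypothetical table links(src, dst) and a hypothetical temp table visited, and it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE.

```python
import sqlite3


def sql_bfs_depth(conn, src, dst, max_depth=6):
    """Return the hop count from src to dst, or None if dst is not reached
    within max_depth hops. Assumes a table links(src, dst) (hypothetical names)."""
    cur = conn.cursor()
    # The primary key on page is what lets INSERT OR IGNORE skip revisits,
    # so the temp table holds each reached page only once.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS visited "
        "(page INTEGER PRIMARY KEY, depth INTEGER)"
    )
    cur.execute("DELETE FROM visited")
    cur.execute("INSERT INTO visited (page, depth) VALUES (?, 0)", (src,))
    if src == dst:
        return 0
    for depth in range(1, max_depth + 1):
        # Expand only the previous level's frontier.
        cur.execute(
            "INSERT OR IGNORE INTO visited (page, depth) "
            "SELECT DISTINCT l.dst, ? FROM links AS l "
            "JOIN visited AS v ON v.page = l.src "
            "WHERE v.depth = ?",
            (depth, depth - 1),
        )
        row = cur.execute(
            "SELECT depth FROM visited WHERE page = ?", (dst,)
        ).fetchone()
        if row is not None:
            return row[0]
        added = cur.execute(
            "SELECT COUNT(*) FROM visited WHERE depth = ?", (depth,)
        ).fetchone()[0]
        if added == 0:
            return None  # frontier is empty, so dst is unreachable
    return None
```

Deduplicating on insert keeps the temp table at one row per reached page, which matters because each level joins the frontier against the full links table.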
Adolf-Hitler
  level 1: 1,210 / 1,211
  level 2: 85,829 / 340,632
Walk-to-the-Sky
  level 1: 19 / 19
  level 2: 9,270 / 11,743
  level 3: 619,132 / 2,594,398
[Figure: run time in seconds per page (labels A, 42, G, Pt, T, M, N, AH, WttS, S, JG, Ah, R, W), axis from 0.2 to 1.6 seconds, with one bar annotated 2,786 secs]
+----+-------------+-------+-------+---------------+---------+---------+------+-----------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows      | Extra                                        |
+----+-------------+-------+-------+---------------+---------+---------+------+-----------+----------------------------------------------+
|  1 | SIMPLE      | links | index | NULL          | REVERSE | 8       | NULL | 709804739 | Using index; Using temporary; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+-----------+----------------------------------------------+
45x more rows scanned than sorted
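The plan above (a full scan of an index on links, followed by a temporary table and a filesort) is the shape one would expect from a grouped count over every link. A hedged sketch of such a most-cited-page query, not necessarily the one the authors ran: the column names src and dst are assumptions, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3


def most_cited(conn):
    """Return (page, inbound_link_count) for the page with the most inbound
    links, or None if links is empty. Assumes links(src, dst) (hypothetical)."""
    return conn.execute(
        "SELECT dst, COUNT(*) AS cites FROM links "
        "GROUP BY dst "                      # one group per cited page
        "ORDER BY cites DESC, dst ASC "      # sort the groups, not the raw rows
        "LIMIT 1"
    ).fetchone()
```

Every link row has to be read to build the counts, but only one row per distinct cited page reaches the sort, which is why far more rows are scanned than sorted.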
Importing tool
Graph Structure
- Node: pages with property "title"
- Relationship: "Link"
- Lucene index
Neo4j GraphAlgoFactory
Benchmark Implementation
- six degree: findSinglePath with max depth
- shortest path: shortestPath
- most cited node: get all relationships, maintain count
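The three benchmarks above can be sketched on a plain in-memory adjacency map. This is an illustration in Python, not the Neo4j API: the function names find_single_path and most_cited_node and the dict-of-lists graph are assumptions made for the example.

```python
from collections import Counter, deque


def find_single_path(graph, start, end, max_depth):
    """Breadth-first search: return one shortest path from start to end that
    uses at most max_depth hops, or None if there is none."""
    if start == end:
        return [start]
    parent = {start: None}          # doubles as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == end:
                path = [end]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((nxt, depth + 1))
    return None


def most_cited_node(graph):
    """Walk every relationship once and keep a count per target node."""
    counts = Counter(dst for targets in graph.values() for dst in targets)
    return counts.most_common(1)[0] if counts else None
```

With max_depth set to 6 the first function answers the six-degree question; with a large limit it behaves as an ordinary shortest-path search.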
[Figure: example property graph with nodes A and B connected by KNOWS relationships; node properties Name: Qing, Age: 24]
traverse a lot of linked lists - most cited page
resource availability
Two types
- file buffer cache: storage file data, in a representation similar to the one on disk, for fast traversal
- object cache: individual nodes and relationships and their properties, in a form that is optimized for fast traversal of the graph; relies on garbage collection for eviction from the cache in an LRU manner
cache levels
JVM options:
- initial heap size = 512m
- max heap size = 1024m
- CMSInitiatingOccupancyFraction=50
- UseConcMarkSweepGC
Cache type: weak cache type (object cache). Provides short life span for cached objects. Suitable for high throughput applications where a larger portion of the graph than what can fit into memory is frequently accessed.
Memory mapping options (based on sizes of the corresponding store files):
- nodes = 200M
- relationships = 5G
- propertystore = 500M
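To make the "short life span for cached objects" behaviour of a weak cache concrete, here is a small Python analogue built on weak references. It is a sketch of the general technique only, not Neo4j's implementation; the class names NodeRecord and WeakObjectCache and the loader callback are hypothetical.

```python
import gc
import weakref


class NodeRecord:
    """Stand-in for a cached node object (hypothetical)."""

    def __init__(self, node_id, title):
        self.node_id = node_id
        self.title = title


class WeakObjectCache:
    """Cache whose entries vanish once nothing else references the object."""

    def __init__(self, loader):
        self._loader = loader                          # fetches on a miss
        self._cache = weakref.WeakValueDictionary()    # holds weak refs only
        self.loads = 0                                 # number of misses

    def get(self, node_id):
        record = self._cache.get(node_id)
        if record is None:
            self.loads += 1
            record = self._loader(node_id)
            self._cache[node_id] = record
        return record


cache = WeakObjectCache(lambda i: NodeRecord(i, f"page-{i}"))
first = cache.get(1)
again = cache.get(1)        # hit: the caller still holds a reference
del first, again
gc.collect()                # entry is dropped once unreferenced
reloaded = cache.get(1)     # miss: the record is loaded a second time
```

The cache never grows beyond what the application is actively holding, at the price of reloading objects that were released between accesses.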
Algorithm | On 8GB RAM, SSD | On 4GB RAM, HDD
Six degree | 5 seconds | 11
Shortest path | 5 seconds | 11
Most cited node | 161 seconds | 256
algorithm needs to be optimized to a large extent; for graph algorithms with an unknown destination, MySQL performs very poorly
need to be known and explored. Increasing heap is not always the solution!!
mostly for importing large data
large it is, the fan out of the graph
Description | System | Comments
highly connected nodes problem | Neo4j specific | Neo4j 2.1 will be solving this but not yet released
Neo4j has a huge set of algorithms that can be used out of the box | Neo4j specific |
Neo4j community is very active | Neo4j specific |
SciDB
We thank Professor Andy Pavlo for giving us direction at various points in the project. We are also grateful to AWS for the funding towards running the experiments.
http://www.slideshare.net/thobe/an-overview-of-neo4j-internals
http://event.cwi.nl/grades2013/07-welc.pdf
http://docs.neo4j.org/chunked/milestone/configuration-caches.html
http://www.slideshare.net/markhneedham/football-graph-neo4j-and-the-premier-league
https://github.com/mirkonasato/graphipedia
http://istc-bigdata.org/index.php/benchmarking-graph-databases/
http://dev.mysql.com/doc/refman/5.5/en/index.html
http://dumps.wikimedia.org/
http://vldb.org/pvldb/vol5/p358_jungao_vldb2012.pdf