Treelogy: A Benchmark Suite for Tree Traversals

SLIDE 1

Treelogy: A Benchmark Suite for Tree Traversals

Nikhil Hegde, Jianqiao Liu, Kirshanthan Sundararajah, and Milind Kulkarni
School of Electrical and Computer Engineering, Purdue University

Purdue University Programming Languages Group

ISPASS2017

SLIDE 2

Tree algorithms

  • Tree algorithms are important
  • Data mining, statistics, scientific computing, graphics, bioinformatics, etc.
  • Application-specific optimizations and tree algorithms have been developed over the years

SLIDE 3

Tree algorithms and optimizations

Tree algorithms:
  • Barnes-Hut (Barnes, 1986)
  • Fast multipole method (Rokhlin, 1985)
  • Vantage point trees (Yianilos, 1993)
  • Accelerating ray tracing (Foley, 2005)
  • K-means clustering (Alsabti, 1997)
  • Frequent item set mining (Han, 2000)

Optimizations:
  • Locality, communication, vectorization, scheduling (Zhang, 1997; Gray, 2001; Warren, 1992; Hamada, 2009; Makino, 1990; Liu, 2016; Ghoting, 2007; Höhl, 2002)

SLIDE 4

Tree algorithms and optimizations

  1. Does the tree algorithm admit an existing optimization?
  2. Can an optimization be generalized to other tree algorithms?

Treelogy helps to answer these questions.

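SLIDE 5

Treelogy

Diagram: Tree algorithm -> Ontology <- Optimization. A tree algorithm is categorized in the ontology and gets the optimizations associated with its category; an optimization is categorized in the ontology and generalized to other tree algorithms in the same category.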

SLIDE 6

Contributions

  • Ontology for tree traversal algorithms
  • Mapping of optimizations to structural properties of tree algorithms
  • A suite of 9 tree traversal algorithms from multiple domains
  • Evaluation with multiple tree types and hardware platforms (GPUs, shared- and distributed-memory systems)
  • https://bitbucket.org/plcl/treelogy

SLIDE 7

Background

  • Why trees and how?
  • Search space elimination and compact data representation
  • Often traversed repeatedly
  • Metric trees and n-fix trees are the most common types

SLIDE 8

Examples – metric trees

  • e.g. K-dimensional (kd-), vantage point (vp-), quad-trees, octrees, ball-trees

[Figure: points A–G in a 2-dimensional (X, Y) space of points, and the corresponding binary kd-tree with 1 point per leaf cell.]
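The kd-tree on this slide can be sketched as follows. This is a minimal Python version, assuming 2-D points and median splits that alternate between the X and Y axes until each leaf cell holds one point. The coordinates in the usage lines are hypothetical (the slide's figure does not give them), and the names `KdNode`, `build_kdtree`, and `leaves` are illustrative, not Treelogy's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    """Hypothetical kd-tree cell: its points and their bounding box."""
    points: List[Point]
    lo: Point                          # lower-left corner of the bounding box
    hi: Point                          # upper-right corner of the bounding box
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_kdtree(points: List[Point], depth: int = 0) -> KdNode:
    """Median split on X at even depths and on Y at odd depths, until each
    leaf cell holds exactly one point. Assumes `points` is non-empty."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = KdNode(points, (min(xs), min(ys)), (max(xs), max(ys)))
    if len(points) == 1:
        return node
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2            # both halves are non-empty when len >= 2
    node.left = build_kdtree(ordered[:mid], depth + 1)
    node.right = build_kdtree(ordered[mid:], depth + 1)
    return node


def leaves(node: KdNode) -> List[KdNode]:
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


# Usage: seven hypothetical points standing in for A-G on the slide.
pts = [(1.0, 6.0), (2.0, 2.0), (3.0, 8.0), (5.0, 4.0), (6.0, 7.0), (7.0, 1.0), (8.0, 5.0)]
root = build_kdtree(pts)
```

Storing a bounding box per cell is what later lets a traversal decide, from the cell alone, whether any of its points can matter to a query.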

SLIDE 9

Kd-tree for two-point correlation

Goal: for every point, find the number of points that are located within a given distance R.

Naïve solution: O(N²). With kd-trees: O(N log N).

Input points {1, 2, …, N} ⊂ ℝ^K are stored in a kd-tree. At each cell the traversal asks: is the distance to any point within the cell < R? If not, the cell is pruned.

[Figure: kd-tree over the input points.]

Treelogy kernels with metric trees:
  1. Two-point correlation (PC)
  2. Nearest Neighbor (NN)
  3. K-Nearest Neighbor (K-NN)
  4. Barnes-Hut (BH)
  5. K-means clustering (KC)
  6. Photon mapping (PM)
  7. Fast multipole method (FMM)
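The pruning test on this slide can be written as one top-down traversal per point. This is a minimal Python sketch under stated assumptions (2-D points, Euclidean distance, median-split kd-tree), not Treelogy's implementation: a cell is pruned when even the nearest point of its bounding box is at least R away, and counted wholesale when its farthest corner is closer than R. The names `Cell`, `min_dist`, `max_dist`, and `count_within` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


class Cell:
    """Hypothetical kd-tree cell: bounding box plus the points it contains."""

    def __init__(self, pts: List[Point], depth: int = 0):
        self.pts = pts
        self.lo = (min(p[0] for p in pts), min(p[1] for p in pts))
        self.hi = (max(p[0] for p in pts), max(p[1] for p in pts))
        self.children: List["Cell"] = []
        if len(pts) > 1:                       # median split, alternating X/Y
            axis = depth % 2
            s = sorted(pts, key=lambda p: p[axis])
            m = len(s) // 2
            self.children = [Cell(s[:m], depth + 1), Cell(s[m:], depth + 1)]


def min_dist(q: Point, c: Cell) -> float:
    """Distance from q to the closest point of the cell's bounding box."""
    dx = max(c.lo[0] - q[0], 0.0, q[0] - c.hi[0])
    dy = max(c.lo[1] - q[1], 0.0, q[1] - c.hi[1])
    return math.hypot(dx, dy)


def max_dist(q: Point, c: Cell) -> float:
    """Distance from q to the farthest corner of the cell's bounding box."""
    dx = max(abs(q[0] - c.lo[0]), abs(q[0] - c.hi[0]))
    dy = max(abs(q[1] - c.lo[1]), abs(q[1] - c.hi[1]))
    return math.hypot(dx, dy)


def count_within(q: Point, c: Cell, r: float) -> int:
    if min_dist(q, c) >= r:       # no point in the cell can be closer than R: prune
        return 0
    if max_dist(q, c) < r:        # every point in the cell is closer than R
        return len(c.pts)
    if not c.children:            # leaf: test its points directly
        return sum(1 for p in c.pts if math.dist(q, p) < r)
    return sum(count_within(q, child, r) for child in c.children)


def two_point_correlation(points: List[Point], r: float) -> List[int]:
    root = Cell(points)
    # One independent traversal per point; each count includes the point itself.
    return [count_within(q, root, r) for q in points]


# Usage: 200 deterministic, well-spread points in the unit square.
pts = [((i * 37) % 101 / 101, (i * 61) % 103 / 103) for i in range(200)]
counts = two_point_correlation(pts, 0.15)
```

Both shortcuts (prune and count-whole-cell) are what move the cost from the naïve all-pairs loop toward the O(N log N) behavior the slide cites.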

SLIDE 10

Examples – n-fix tree

  • We refer to prefix and suffix trees as n-fix trees
  • e.g. suffix tree (trie) for string ATAC$

Suffix set: {ATAC}, {TAC}, {AC}, {C}, {$}

[Figure: suffix trie for ATAC$, with each suffix ending at its own $ leaf.]
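A suffix trie like the one for ATAC$ can be built by inserting every suffix of the `$`-terminated string. The Python sketch below is the naive quadratic construction of an uncompressed trie; a suffix tree additionally collapses chains of single-child vertices and can be built in linear time. The function names are hypothetical.

```python
from typing import Dict, Iterator

Trie = Dict[str, "Trie"]   # each vertex maps an edge character to a child vertex


def build_suffix_trie(text: str) -> Trie:
    """Insert every suffix of `text`. `text` should end with a unique
    terminator such as "$" so that every suffix ends at its own leaf."""
    root: Trie = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def leaf_paths(node: Trie, prefix: str = "") -> Iterator[str]:
    """Yield the string spelled by each root-to-leaf path, in character order."""
    if not node:
        yield prefix
    for ch, child in sorted(node.items()):
        yield from leaf_paths(child, prefix + ch)


trie = build_suffix_trie("ATAC$")
```

Listing `leaf_paths(trie)` recovers exactly the five suffixes of ATAC$, one leaf each.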

SLIDE 11

Generalized suffix trees for longest common substring

Goal: find the longest common substring of two strings: 1) ATGA and 2) ATGTA (answer: ATG).

Naïve solution: O(N·M²). With suffix trees: O(N+M) in time and space.

Generalized suffix tree of ATGA# and ATGTA$: the path to a vertex spells a substring of string 1, string 2, or both. Each vertex is labeled with the number(s) of the string(s) it belongs to, and vertices shared by both strings are marked *.

Longest common substring? The deepest vertex marked *.

Treelogy kernels with n-fix trees:
  1. Frequent item set mining (FIM)
  2. Longest common substring (LCS)
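The deepest-shared-vertex idea can be sketched on an uncompressed generalized suffix trie. This Python version is an illustration only: it builds the trie naively in O(N² + M²) time rather than with the O(N+M) suffix-tree construction the slide refers to, and it assumes neither input contains the terminators `#` or `$`. Each vertex records which strings' suffixes pass through it; the answer is the path to the deepest vertex marked by both (the slide's *).

```python
from typing import Dict, Set


class Vertex:
    def __init__(self) -> None:
        self.children: Dict[str, "Vertex"] = {}
        self.marks: Set[int] = set()       # which input strings pass through here


def longest_common_substring(s1: str, s2: str) -> str:
    """Naive generalized suffix trie; assumes '#' and '$' occur in neither input."""
    root = Vertex()
    for tag, text in ((1, s1 + "#"), (2, s2 + "$")):
        for i in range(len(text)):                  # insert every suffix
            node = root
            for ch in text[i:]:
                node = node.children.setdefault(ch, Vertex())
                node.marks.add(tag)
    # A child's marks are a subset of its parent's, so descending only through
    # vertices marked by both strings reaches every common substring.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node.children.items():
            if child.marks == {1, 2}:
                stack.append((child, path + ch))
    return best
```

For the slide's inputs, `longest_common_substring("ATGA", "ATGTA")` returns `"ATG"`. The unique terminators guarantee that no vertex below a `#` or `$` edge can carry both marks.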

SLIDE 12

Treelogy Kernels

  • Two-point Correlation (PC)
  • Nearest Neighbor (NN)
  • K-Nearest Neighbor (KNN)
  • Barnes-Hut (BH)
  • Photon Mapping (PM)
  • Frequent Item-set Mining (FIM)
  • K-Means Clustering (KC)
  • Longest Common Substring (LCS)
  • Fast Multipole Method (FMM)

Common properties:
  • Traversals dominate computation
  • Multiple traversals
  • Independent
  • Do not modify the tree during traversal

Kernel groupings:
  • Top-down traversals, different tree type: Two-point Correlation (PC), Longest Common Substring (LCS)
  • Bottom-up traversal, same tree type: Barnes-Hut (BH), Fast Multipole Method (FMM)
  • Iterative, modify tree and/or traversals: Photon Mapping (PM), K-Means Clustering (KC), Barnes-Hut (BH), Frequent Item-set Mining (FIM)
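Because the traversals are independent and do not modify the tree, each one can be handed to a worker without locks. The Python sketch below illustrates that property with a hypothetical 1-D range-count traversal (a one-dimensional analogue of two-point correlation) over an immutable binary search tree; it is not how Treelogy parallelizes its kernels, and all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# A node is (key, left, right); None is an empty subtree. Tuples are immutable,
# mirroring the "do not modify the tree during traversal" property.
Node = Optional[Tuple[float, "Node", "Node"]]


def build_bst(sorted_keys: List[float]) -> Node:
    if not sorted_keys:
        return None
    m = len(sorted_keys) // 2
    return (sorted_keys[m], build_bst(sorted_keys[:m]), build_bst(sorted_keys[m + 1:]))


def count_in_range(node: Node, lo: float, hi: float) -> int:
    """One read-only top-down traversal: count keys in [lo, hi]."""
    if node is None:
        return 0
    key, left, right = node
    total = 1 if lo <= key <= hi else 0
    if lo <= key:                  # left subtree only holds keys <= key
        total += count_in_range(left, lo, hi)
    if hi >= key:                  # right subtree only holds keys >= key
        total += count_in_range(right, lo, hi)
    return total


def all_counts(tree: Node, queries: List[float], r: float, workers: int = 4) -> List[int]:
    # Traversals share the tree but never write to it, so no synchronization is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: count_in_range(tree, q - r, q + r), queries))


keys = sorted((i * 37) % 101 / 10 for i in range(101))
tree = build_bst(keys)
queries = [0.5, 2.25, 5.0, 9.9]
r = 0.75
counts = all_counts(tree, queries, r)
```

In CPython the global interpreter lock limits the speedup of this CPU-bound work; the sketch only shows that the traversals need no coordination and give the same results as a sequential loop.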

SLIDE 13

The Ontology

  • Top-down vs. bottom-up
  • Type of tree
  • Iterative with tree mutation
  • Iterative with working-set mutation
  • Guided vs. unguided
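The top-down/bottom-up distinction can be illustrated on a small tree. In this hypothetical Python sketch, a bottom-up (postorder) pass summarizes each subtree before its parent, similar in spirit to computing per-cell summaries such as total mass in Barnes-Hut or FMM, while a top-down (preorder) pass visits each vertex before its children. The tree and names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    mass: float = 0.0                  # set on leaves; filled in bottom-up for internal vertices
    children: List["Node"] = field(default_factory=list)


def bottom_up(node: Node) -> float:
    """Postorder: every child is summarized before its parent."""
    if node.children:
        node.mass = sum(bottom_up(child) for child in node.children)
    return node.mass


def top_down(node: Node, visited: List[str]) -> None:
    """Preorder: a vertex is visited before its children."""
    visited.append(node.name)
    for child in node.children:
        top_down(child, visited)


tree = Node("root", children=[Node("a", children=[Node("l1", 1.0), Node("l2", 2.0)]),
                              Node("l3", 4.0)])
total = bottom_up(tree)
order: List[str] = []
top_down(tree, order)
```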

SLIDE 14

Guided vs. Unguided

1. Unguided traversal [15]
  • Fixed order for every traversal (e.g. left child followed by right)

2. Guided traversal
  • Data-dependent traversal order
  • Order depends on the vertex computation

[15] Goldfarb et al., SC'13
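The guided/unguided difference can be sketched with a nearest-neighbor search over a 1-D tree. In this hypothetical Python example, the unguided variant always visits the left child before the right, while the guided variant chooses the order at each vertex from the query (nearer side first). Both prune subtrees that cannot hold a closer key, so they return equally close answers but may visit different numbers of vertices.

```python
import math
from typing import List, Optional, Tuple

Node = Optional[Tuple[float, "Node", "Node"]]   # (split key, left, right)


def build(sorted_keys: List[float]) -> Node:
    if not sorted_keys:
        return None
    m = len(sorted_keys) // 2
    return (sorted_keys[m], build(sorted_keys[:m]), build(sorted_keys[m + 1:]))


def nearest(tree: Node, q: float, guided: bool) -> Tuple[float, int]:
    """Return (closest key, number of vertices visited)."""
    best_key, best_d = math.nan, math.inf
    visits = 0

    def visit(node: Node) -> None:
        nonlocal best_key, best_d, visits
        if node is None:
            return
        visits += 1
        key, left, right = node
        if abs(q - key) < best_d:
            best_key, best_d = key, abs(q - key)
        order = [("L", left), ("R", right)]     # unguided: always left, then right
        if guided and q >= key:                 # guided: data-dependent, nearer side first
            order.reverse()
        for side, child in order:
            # Lower bound on the distance from q to any key in that subtree.
            gap = q - key if side == "L" else key - q
            if gap < best_d:                    # otherwise the subtree is pruned
                visit(child)

    visit(tree)
    return best_key, visits


keys = [float(k) for k in range(0, 100, 3)]
tree = build(keys)
```

Visiting the nearer side first tends to find a close candidate early, which shrinks `best_d` and lets more subtrees be pruned; this is why the order depends on the vertex computation.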

SLIDE 15

Classification

  • Two-Point Correlation. Domain: astrophysics, statistics. Attributes: top-down (preorder), guided (vp), unguided (kd). Tree types: Kd, vp
  • Nearest Neighbor. Domain: data mining. Attributes: top-down (preorder), guided. Tree types: Kd, vp
  • K-Nearest Neighbor. Domain: data mining. Attributes: top-down (preorder), guided. Tree types: Kd, Ball
  • Barnes-Hut. Domain: astrophysics. Attributes: top-down (preorder), unguided, tree mutation. Tree types: Oct, Kd
  • Photon Mapping. Domain: computer graphics. Attributes: top-down (preorder), unguided, working-set mutation. Tree type: Kd
  • Frequent item-set mining. Domain: data mining. Attributes: bottom-up, unguided, tree mutation, working-set mutation. Tree type: Prefix
  • K-Means Clustering. Domain: data mining, machine learning. Attributes: top-down (inorder), guided, tree mutation. Tree type: Kd
  • Longest common substring. Domain: bioinformatics. Attributes: top-down (postorder), unguided, tree mutation. Tree type: Suffix
  • Fast Multipole Method. Domain: scientific computing. Attributes: top-down (preorder) and bottom-up, unguided, tree mutation. Tree type: Quad

SLIDE 16

Algorithm -> Ontology

Diagram: Tree algorithm -> Ontology -> Optimization. A tree algorithm is categorized in the ontology, which determines its optimizations.

What we have seen so far…

SLIDE 17

Optimizations

Optimization: structural properties under which it applies
  • Tiling: top-down, bottom-up
  • Profile-driven scheduling: top-down
  • Vectorization: unguided
  • Reducing communication overhead: top-down
  • Data representation: vp-trees for NN, prefix trees for FIM, suffix trees for LCS

Optimizations are effective only when certain properties hold.
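One way to see why vectorization pairs with unguided traversals: when every traversal visits children in the same fixed order, a block of traversals can move through the tree in lockstep, with a per-lane mask recording which traversals still need the current subtree. The Python sketch below emulates SIMD lanes with lists and uses a hypothetical 1-D range count as the per-lane work; it illustrates the lockstep idea only and is not Treelogy's vectorization code.

```python
from typing import List, Optional, Tuple

Node = Optional[Tuple[float, "Node", "Node"]]   # (key, left, right)


def build_bst(sorted_keys: List[float]) -> Node:
    if not sorted_keys:
        return None
    m = len(sorted_keys) // 2
    return (sorted_keys[m], build_bst(sorted_keys[:m]), build_bst(sorted_keys[m + 1:]))


def lockstep_counts(tree: Node, queries: List[float], r: float) -> List[int]:
    """Traverse the tree once for a whole block of queries ("lanes").
    Each lane counts keys k with |q - k| <= r."""
    counts = [0] * len(queries)

    def visit(node: Node, mask: List[bool]) -> None:
        if node is None or not any(mask):       # no active lane needs this subtree
            return
        key, left, right = node
        for i, q in enumerate(queries):         # same operation applied to every active lane
            if mask[i] and abs(q - key) <= r:
                counts[i] += 1
        # Each lane decides whether a subtree can still hold matches...
        left_mask = [m and q - key <= r for m, q in zip(mask, queries)]    # left keys <= key
        right_mask = [m and key - q <= r for m, q in zip(mask, queries)]   # right keys >= key
        # ...but the whole block follows one fixed order: left, then right.
        visit(left, left_mask)
        visit(right, right_mask)

    visit(tree, [True] * len(queries))
    return counts


keys = sorted((i * 53) % 97 / 7 for i in range(97))
tree = build_bst(keys)
queries = [0.3, 1.7, 4.2, 6.9, 10.0, 13.5]
r = 0.9
counts = lockstep_counts(tree, queries, r)
```

A guided traversal would break this scheme, since lanes could disagree about which child to visit first.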

SLIDE 18

Evaluation Methodology

  • Platforms:
    • Shared-memory (SHM): two 10-core Xeon E5-2660 v3 processors; 32 KB L1, 256 KB L2, 25 MB L3, 64 GB RAM
    • Distributed-memory (DM): 10 nodes with a high-speed Ethernet interconnect
    • GPU: NVIDIA Tesla K20c; host: 2 AMD 6164 HE processors, 32 GB RAM
  • Metrics:
    • Architecture-independent: average traversal length, load imbalance
    • Architecture-dependent: L3 miss rate, CPI
  • All measurements consider traversal times only
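The two architecture-independent metrics can be computed from per-traversal statistics. The slide does not define them precisely, so this Python sketch assumes that traversal length is the number of vertices a traversal visits, and that load imbalance is the maximum per-worker work divided by the mean per-worker work, with traversals assigned to workers in contiguous blocks. Both definitions are assumptions for illustration.

```python
from statistics import mean
from typing import List


def average_traversal_length(lengths: List[int]) -> float:
    """Mean number of tree vertices visited per traversal."""
    return mean(lengths)


def load_imbalance(lengths: List[int], workers: int) -> float:
    """Assumed definition: max work per worker / mean work per worker (1.0 = balanced).
    A worker's work is the total length of its traversals; traversals are assigned
    in contiguous blocks (also an assumption). Assumes `lengths` is non-empty."""
    block = -(-len(lengths) // workers)        # ceiling division
    work = [sum(lengths[i:i + block]) for i in range(0, len(lengths), block)]
    work += [0] * (workers - len(work))        # idle workers contribute zero work
    return max(work) / mean(work)
```

For example, traversal lengths `[4, 2, 6, 8]` on 2 workers give per-worker work of 6 and 14, an average traversal length of 5, and a load imbalance of 14 / 10 = 1.4.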

SLIDE 19

Scalability

[Figure: runtime (s) vs. number of processes.]

SLIDE 20

Scalability (contd.)

  • Adding more cores results in better performance
  • DM plots show excellent scaling
  • SHM and GPU plots are similar
  • KC and LCS are exceptions:
    • Iterative tree-mutation algorithms are marked by heavy synchronization at the end of each iteration
    • LCS has less available parallelism

SLIDE 21

Summary (scalability)

  • Most kernels scale well while taking advantage of ontology-driven optimizations
  • Two-point correlation (PC) performs better with a vp-tree than with a kd-tree
  • Barnes-Hut (BH) is sensitive to tree type and input distribution

SLIDE 22

Algorithm <- Optimization

Diagram: Tree algorithm <- Ontology <- Optimization. An optimization is categorized in the ontology and generalized by mapping it to the tree algorithms in the same category.

What we have seen so far…

SLIDE 23

Case study

  • Generalizing locally essential trees (LET)
  • Originally BH-specific (distributed memory)
  • Partial replication of the tree structure: only the top subtree is replicated
  • Improves load balance and minimizes communication overhead
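The partial replication described on this slide can be sketched as copying the top levels of the tree and replacing each deeper subtree with a stub that names the process holding it. In this hypothetical Python sketch, the cut depth, the `Stub` type, and the round-robin owner assignment are all assumptions for illustration, not the LET construction used in Treelogy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class TreeNode:
    label: str
    children: List[Union["TreeNode", "Stub"]] = field(default_factory=list)


@dataclass
class Stub:
    """Hypothetical placeholder for a subtree that is not replicated locally."""
    label: str
    owner: int                          # process assumed to hold the full subtree


def replicate_top(root: TreeNode, depth: int, num_procs: int) -> Tuple[TreeNode, List[Stub]]:
    """Copy the vertices above `depth`; replace each non-leaf vertex at `depth`
    with a Stub. Owners are assigned round-robin (an assumption)."""
    stubs: List[Stub] = []

    def copy(node: TreeNode, d: int) -> Union[TreeNode, Stub]:
        if d == depth and node.children:
            stub = Stub(node.label, owner=len(stubs) % num_procs)
            stubs.append(stub)
            return stub
        return TreeNode(node.label, [copy(c, d + 1) for c in node.children])

    return copy(root, 0), stubs


def complete_tree(label: str, height: int) -> TreeNode:
    """Hypothetical complete binary tree, used only for the usage lines below."""
    if height == 0:
        return TreeNode(label)
    return TreeNode(label, [complete_tree(label + "0", height - 1),
                            complete_tree(label + "1", height - 1)])


root = complete_tree("r", 3)                 # 15 vertices, depths 0-3
replica, stubs = replicate_top(root, depth=2, num_procs=2)
```

Every process can hold the small replica, so traversals start locally; only traversals that reach a stub need to involve the owning process.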

SLIDE 24

Conclusions

  • Treelogy
    • Ontology
    • Mapping of optimizations to structural properties
    • A suite of 9 tree traversal kernels spanning the ontology
    • Shared-memory, distributed-memory, and GPU implementations
    • Multiple tree types based on popularity and efficiency
  • Evaluations showed that most kernels scale well
  • Two-point correlation (PC) performs better with vp-trees than with the standard tree used in the literature

SLIDE 25

Thank you