Visualizing Distributed Memory Computations with Hive Plots

SLIDE 1

Visualizing Distributed Memory Computations with Hive Plots

VizSec 2012, October 15, 2012, Seattle, Washington
Sophie Engle and Sean Whalen

SLIDE 2

Introduction

SLIDE 3

Introduction

  • High performance computing environments
    – Used for scientific computing applications at several national laboratories
    – Potential for misuse by insiders and outsiders

  • Anomaly detection
    – Determine normal versus abnormal behavior for these environments to prevent unauthorized use
    – Can classify codes into “computational dwarves” to determine “normal” (Asanovic 2006)

SLIDE 4

Introduction

  • Several network measures can be used as features in classification
    – Time consuming to calculate these measures
    – Time consuming to compare how well these measures perform for classification

  • Use visualization to choose network measures to use as classification features
    – Which measures look similar for similar codes?
    – Which measures look distinct for distinct codes?

SLIDE 5

Dataset

SLIDE 6

Original Dataset

  • Data collection
    – Collected by NERSC at LBNL
    – Used IPM to monitor MPI calls between ranks (captures communication between compute nodes)

  • Dataset contents
    – Total of 1681 IPM logs
    – Covers 29 different scientific computing codes with varying ranks, parameters, and architectures

SLIDE 7

Original Dataset

Src,Dst,MPICall,Bytes,Repeats,Code
0,1,29,99856,52,cactus
0,4,29,99856,52,cactus
0,0,2,4,5,cactus
0,0,2,8,7,cactus
0,1,22,599136,26,cactus
0,-1,5,0,1,cactus
0,4,22,599136,26,cactus
0,16,29,99856,52,cactus
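
The log rows above can be collapsed into rank-to-rank edges with a few lines of code. This is a minimal sketch in Python's standard library, not the authors' tooling. It embeds the eight sample rows from the slide. Skipping rows whose Dst is -1 is an assumption (the slides do not say what -1 means), and `load_edges` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# The eight sample rows shown on the slide.
SAMPLE_LOG = """\
Src,Dst,MPICall,Bytes,Repeats,Code
0,1,29,99856,52,cactus
0,4,29,99856,52,cactus
0,0,2,4,5,cactus
0,0,2,8,7,cactus
0,1,22,599136,26,cactus
0,-1,5,0,1,cactus
0,4,22,599136,26,cactus
0,16,29,99856,52,cactus
"""


def load_edges(text):
    """Map each (src, dst) rank pair to the summed Repeats of its rows."""
    edges = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        src, dst = int(row["Src"]), int(row["Dst"])
        if dst < 0:
            # Assumption: a negative destination means the call has no peer rank.
            continue
        edges[(src, dst)] += int(row["Repeats"])
    return dict(edges)


edges = load_edges(SAMPLE_LOG)
for (src, dst), repeats in sorted(edges.items()):
    print(f"{src} -> {dst}: {repeats} calls")
```

Summing Repeats gives one weighted edge per rank pair. Keeping MPICall and Bytes as edge attributes would be a natural extension.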

SLIDE 9

Subset Analyzed

Code     Description            Nodes  Edges
cactus   astrophysics           64     989
ij       algebraic multi-grid   64     8596
milc     lattice gauge theory   64     1473
namd     molecular dynamics     64     8208
paratec  materials science      64     16492
superlu  sparse linear algebra  64     3239
tgyro    magnetic fusion        64     1123
vasp     materials science      64     13760

SLIDE 11

Network Measures

Measure       Description
degree        the number of adjacent edges
betweenness   number of shortest paths going through a node
closeness     measures steps required to reach every other node
eccentricity  shortest path distance from farthest node
page rank     measures relative importance of node
transitivity  probability adjacent nodes are connected (clustering coefficient)

Calculated in R using the igraph library.
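
To make three of these measures concrete, here is a small sketch that computes degree, closeness, and eccentricity with breadth-first search. It uses plain Python rather than R and igraph, so it illustrates the definitions and is not the authors' code. The path graph is a made-up example, and closeness is normalized as (n - 1) / (sum of distances), which may be scaled differently from igraph's default.

```python
from collections import deque


def bfs_distances(graph, start):
    """Shortest-path length (in hops) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def node_measures(graph):
    """Return {node: (degree, closeness, eccentricity)}.

    Assumes a connected, undirected graph with at least two nodes.
    """
    measures = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        others = [d for other, d in dist.items() if other != node]
        degree = len(graph[node])              # number of adjacent edges
        closeness = len(others) / sum(others)  # (n - 1) / total distance
        eccentricity = max(others)             # distance to the farthest node
        measures[node] = (degree, closeness, eccentricity)
    return measures


# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
measures = node_measures(PATH)
for node, (degree, closeness, eccentricity) in measures.items():
    print(node, degree, round(closeness, 3), eccentricity)
```

End nodes of the path have lower closeness and higher eccentricity than the middle nodes. Differences of this kind are what the later plots place along each axis.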

SLIDE 12

Motivation

SLIDE 13

Traditional Degree CCDF Plot

[Figure: degree CCDF plot with one line per code: cactus, ij, milc, namd, superlu, tgyro]

SLIDE 14

Traditional Degree CCDF Plot

  • Pros:
    – Able to compare individual metrics across datasets
    – Simple approach, widely used

  • Cons:
    – Contains no information on topology
    – Lines look visually similar, may not be appropriate for generating visual signatures
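
For reference, a degree CCDF can be computed as below. This is a sketch in plain Python with invented degree values. It takes the CCDF at k to be the fraction of nodes with degree of at least k, which is one common convention; the slides do not say which one the plot uses.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Return [(k, fraction of nodes with degree >= k)] for each observed k."""
    total = len(degrees)
    counts = Counter(degrees)
    remaining = total  # nodes whose degree is >= the current k
    points = []
    for k in sorted(counts):
        points.append((k, remaining / total))
        remaining -= counts[k]
    return points


# Hypothetical degree sequence for a four-node network.
points = degree_ccdf([1, 1, 2, 3])
print(points)
```

The result depends only on how many nodes have each degree, so two networks with different wiring but the same degree sequence give the same curve. That is the missing topology listed under Cons.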

SLIDE 15

Adjacency Matrices

SLIDE 16

Adjacency Matrices

  • Pros:
    – Comparable across datasets
    – Easy to see communication patterns
    – Many distinct codes look distinct

  • Cons:
    – No information on metrics needed for classification
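
A short sketch of how such a matrix can be built and drawn as text. The four-rank ring is a hypothetical example, not one of the codes in the dataset, and the helper names are illustrative.

```python
def adjacency_matrix(edges, size):
    """Build a size x size matrix; cell [src][dst] counts src -> dst edges."""
    matrix = [[0] * size for _ in range(size)]
    for src, dst in edges:
        matrix[src][dst] += 1
    return matrix


def render(matrix):
    """Draw the matrix as text: '#' where ranks communicate, '.' elsewhere."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# Hypothetical 4-rank ring: each rank sends to its right-hand neighbor.
RING = [(0, 1), (1, 2), (2, 3), (3, 0)]
picture = render(adjacency_matrix(RING, 4))
print(picture)
```

The ring shows up as a shifted diagonal, so the communication pattern is easy to read, but nothing in the picture encodes a measure such as betweenness or page rank.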

SLIDE 17

Issues Identified

  • Traditional CCDF plot does not convey any information about network topology

  • Traditional adjacency matrices do not convey any information about network properties

  • Traditional network layout algorithms are not repeatable or comparable across networks

SLIDE 18

Hive Plots

SLIDE 19

Introduction to Hive Plots

  • What are hive plots?
    – Network layout algorithm using network properties for consistent node placement
    – A radially-arranged parallel coordinate plot

  • Why use hive plots?
    – Repeatable, comparable network layouts
    – Integration of network properties with topology

SLIDE 20

Understanding Hive Plots

[Figure: hive plot of degree for tgyro; axis maxima 32, 259, and 837]

SLIDE 21

Understanding Hive Plots

[Figure: annotated hive plot of degree for tgyro; axis maxima 32, 259, and 837]

Annotations:

  • Primary axis
  • Duplicate axis
  • Max axis value (degree ranges from 0 to 837 across datasets)
  • Node with degree between 260 and 837
  • Self loop
  • Edges between nodes on different axes
  • Edges between nodes on the same axis
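
One way to read the annotated plot is as a binning step followed by a placement step. The sketch below is an interpretation, not the authors' implementation. It assumes the three axis maxima from the slide (32, 259, 837) are inclusive upper bounds of consecutive bins and that a node's position on its axis is linear within the bin. The names `place_node` and `edge_kind` and the example degrees are hypothetical.

```python
from bisect import bisect_left

# Axis maxima read off the tgyro degree slide.
AXIS_MAXIMA = [32, 259, 837]


def place_node(value, maxima=AXIS_MAXIMA):
    """Return (axis index, position in [0, 1]) for one node's measure value."""
    if not 0 <= value <= maxima[-1]:
        raise ValueError(f"value {value} is outside the axis range")
    axis = bisect_left(maxima, value)  # first axis whose maximum is >= value
    low = 0 if axis == 0 else maxima[axis - 1]
    high = maxima[axis]
    return axis, (value - low) / (high - low)


def edge_kind(src, dst, degree):
    """Classify an edge the way the slide's annotations do."""
    if src == dst:
        return "self loop"
    same = place_node(degree[src])[0] == place_node(degree[dst])[0]
    return "same axis" if same else "different axes"


# Hypothetical node degrees.
DEGREE = {"a": 10, "b": 20, "c": 300}
print(place_node(300))              # lands on the third axis
print(edge_kind("a", "b", DEGREE))  # both nodes fall in the first bin
print(edge_kind("a", "c", DEGREE))
```

Placement depends only on the measure value, so the same node lands in the same spot every time the layout is drawn, which is the basis of the repeatability claim. Edges whose endpoints share an axis are the ones the duplicate axis exists to show.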

SLIDE 22

Implementation

  • Existing implementations
    – JHive (Java)
    – HiveR (R)
    – HiveGraph (webapp)
    – Prototypes in Perl and d3.js

  • Custom implementation in R and ggplot2
    – Implements grammar of graphics (Wilkinson)
    – Polar plots to create hive plots
    – Facets to create hive panels*
    – Non-interactive

SLIDE 23

Implementation

[Figure: three node placement strategies (evenly spaced nodes, linear interpolation, interpolation and jitter) shown on hive plots labeled page rank (milc) and page rank (cactus); axis maxima 0.008, 0.015, and 0.661]
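
The three placement strategies named on this slide can be sketched as follows. This is an illustration in plain Python, not the R and ggplot2 code behind the slides. The function names, the jitter amount, and the fixed random seed are all assumptions.

```python
import random


def evenly_spaced(values):
    """Spread nodes evenly along the axis in rank order, ignoring gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0.5] * len(values)  # a lone node sits mid-axis
    if len(values) > 1:
        for rank, index in enumerate(order):
            positions[index] = rank / (len(values) - 1)
    return positions


def linear(values, low, high):
    """Place each node in proportion to its value between low and high."""
    return [(value - low) / (high - low) for value in values]


def jittered(values, low, high, amount=0.02, seed=0):
    """Linear placement plus a small random offset, clamped to the axis."""
    rng = random.Random(seed)  # fixed seed keeps the layout repeatable
    return [
        min(1.0, max(0.0, position + rng.uniform(-amount, amount)))
        for position in linear(values, low, high)
    ]


# Hypothetical measure values for three nodes.
values = [0.3, 0.1, 0.2]
print(evenly_spaced(values))
print(linear(values, 0.0, 1.0))
print(jittered(values, 0.0, 1.0))
```

Even spacing hides how far apart the values are, linear interpolation preserves the gaps but stacks nodes with equal values on one point, and jitter separates the stacked nodes at the cost of a small positional error.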

SLIDE 24

Implementation

[Figure: consistent alpha versus variable alpha, shown on hive plots labeled closeness (cactus) and degree (superlu); axis labels 32, 259, 837 and 0.008, 0.016, 0.016]

SLIDE 25

Hive Plot References

  • Hive Plots—Rational Approach to Visualizing Networks by Martin Krzywinski, Inanc Birol, Steven JM Jones and Marco A Marra in Briefings in Bioinformatics, volume 13, issue 5, pages 627–644, 2012

  • Hive Plots: Rational Network Visualization—Farewell to Hairballs by Martin Krzywinski at http://www.hiveplot.com online

  • Getting Into Visualization of Large Biological Data Sets by Martin Krzywinski, Inanc Birol, Steven Jones, Marco Marra in BioVis 2012 Posters, 2nd floor foyer, Sunday 8:30am – Monday 5:55pm

SLIDE 26

Initial Results

SLIDE 27

Degree

[Figure: hive panel for degree, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 28

Betweenness

[Figure: hive panel for betweenness, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 29

Closeness

[Figure: hive panel for closeness, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 30

Eccentricity

[Figure: hive panel for eccentricity, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 31

Page Rank

[Figure: hive panel for page rank, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 32

Transitivity

[Figure: hive panel for transitivity, one hive plot per code: cactus, namd, ij, superlu, milc, tgyro]

SLIDE 33

Visually Distinct

[Table: codes marked (x) as visually distinct under each measure; columns cactus, ij, milc, namd, superlu, tgyro; column positions of the marks not preserved]

Measure       Marks (x)
degree        4
betweenness   6
closeness     4
eccentricity  3
page rank     4
transitivity  6

SLIDE 35

Next Steps

SLIDE 36

Next Steps

  • Improve hive plot visualizations
    – Explore variable-length axes
    – Explore better axes assignment

  • Incorporate more information from data set
    – Multiple-edge connections
    – Type of IPM calls
    – Amount of data transmitted

SLIDE 37

Next Steps

  • Feature identification
    – Compare hive plots for more distinct codes
    – Compare hive plots for similar codes
    – Identify features that visually distinguish codes

  • Classification and anomaly detection
    – Determine if features identified by visualization lead to better classifiers and anomaly detection

SLIDE 38

Conclusion

SLIDE 40

Summary

  • Motivation and goals
    – Improve anomaly detection in HPC environments
    – Improve classification of HPC codes
    – Use exploratory visualization for feature selection

  • Initial results
    – Hive plots allow visual comparison of HPC codes
    – Some features distinguish distinct HPC codes

SLIDE 41

References

  • Hive Plots—Rational Approach to Visualizing Networks by Martin Krzywinski, Inanc Birol, Steven JM Jones and Marco A Marra in Briefings in Bioinformatics, volume 13, issue 5, pages 627–644, 2012

  • Network-Theoretic Classification of Parallel Computation Patterns by Sean Whalen, Sophie Engle, Sean Peisert, and Matt Bishop in International Journal of High Performance Computing Applications (IJHPCA), volume 26, number 2, pages 159–169, May 2012

  • Multiclass Classification of Distributed Memory Parallel Computations by Sean Whalen, Sean Peisert, and Matt Bishop to appear in Pattern Recognition Letters (PRL), 2012

SLIDE 42

Contact Information

Sophie Engle
University of San Francisco, Department of Computer Science
sjengle@cs.usfca.edu
http://sjengle.cs.usfca.edu

Sean Whalen
Mount Sinai School of Medicine, Institute for Genomics and Multiscale Biology
shwhalen@cs.columbia.edu
http://node99.org

SLIDE 43

Questions?
