Remora: A Resource Monitoring Tool for Everyone, Carlos Rosales - PowerPoint PPT Presentation



SLIDE 1

Remora: A Resource Monitoring Tool for Everyone

Carlos Rosales, carlos@tacc.utexas.edu

SLIDE 2

Where does that odd name come from???

  • It attaches to the user processes
  • It travels with them in the system
  • It feeds off your job (overhead) but provides some benefits (information)

SLIDE 3

What is Remora?

  • Remora monitors all user activity and provides per-node and per-job resource utilization data
  • Developed by Antonio Gomez-Iglesias and Carlos Rosales at TACC
  • Open source, available on GitHub
  • NOT a profiler
  • NOT a debugger
  • But the data collected can often be used to improve code performance or detect issues

SLIDE 4

Common Issues

  • User questions:
    – Why did I get banned from running jobs?
    – Why did my job crash?
    – Why is my performance so low on your supercomputer?
  • We have some tools in place:
    – Server logs (Splunk)
    – TACC Stats (hardware counter data, 10 min period)

SLIDE 5

Current Tools Are Insufficient

  • The 10 min interval in TACC Stats misses spikes of activity:
    – Fails to detect single large memory allocations
    – Fails to detect localized instances of high IO traffic
  • Splunk is tedious to parse and typically only contains catastrophic errors
  • NEITHER is visible to the user
  • Many useful features, but missing some critical to our users
SLIDE 6

How does Remora fix those issues?

  • Fine-grained temporal resolution (tunable)
  • Simplified output for basic users
    – Highlights possible issues without overwhelming them
  • Raw data available for advanced users
    – Deep analysis of each run possible
    – Post-processing tools provided
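The sampling interval is controlled through the environment before the job is launched. A sketch below; the variable name REMORA_PERIOD matches recent Remora releases but should be treated as an assumption and verified against your installed version:

```shell
# Tighten the sampling interval before launching. REMORA_PERIOD is the
# variable name used by recent Remora releases; verify it on your system.
export REMORA_PERIOD=5          # sample every 5 seconds

# Then launch as usual (shown as a comment since it needs a cluster):
#   remora ibrun ./mympi.code
echo "REMORA_PERIOD=$REMORA_PERIOD"
```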

SLIDE 7

Information Collected

  • Detailed timing of the application
  • CPU utilization
  • Memory utilization
  • NUMA information
  • I/O information (FS load and Lustre traffic)
  • Network information (topology and IB traffic)
SLIDE 8

Accelerator support

  • Intel Xeon Phi
    – Treated like any other node
    – Background process is bound to core 61 to minimize overhead
  • GPU
    – Collects memory information using nvidia-smi
    – Other information is much harder to get to!
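For the GPU memory numbers, nvidia-smi's machine-readable query mode is enough; the collector only has to parse it. A sketch (the nvidia-smi flags are standard; the parser is illustrative, not Remora's code, and is exercised here on a canned sample line since no GPU is assumed):

```shell
# On a GPU node the raw numbers would come from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# which prints lines like "1024, 16384" (MiB). Turn each into a used fraction:
gpu_mem_fraction() {
    awk -F', *' '{ printf "%.2f\n", $1 / $2 }'
}

# Canned sample line standing in for real nvidia-smi output:
echo "1024, 16384" | gpu_mem_fraction
```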

SLIDE 9

Remora Summary

==============================================================================
TACC: Max Memory Used Per Node : 8.52 GB
TACC: Total Elapsed Time       : 0d 0h 0m 27s 64ms
TACC: MDS Load (IO REQ/S)      : 0.00 (HOME) / 0.00 (WORK) / 2.00 (SCRATCH)
TACC: Sampling Period          : 2 seconds
TACC: Complete Report Data     : /full/path/to/workdir/remora_5905747
==============================================================================

Plus additional lines for memory utilization if MICs or GPUs are used
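The summary lines above are regular enough to scrape in a post-processing step. A sketch, using the field layout shown on this slide (the parser is illustrative; adjust the pattern if your Remora version formats the report differently):

```shell
# Extract the peak per-node memory figure (in GB) from a Remora summary block.
# Layout taken from the slide above: "TACC: Max Memory Used Per Node : 8.52 GB"
max_mem_gb() {
    awk '/Max Memory Used Per Node/ { print $(NF-1) }'
}

# Canned line from the summary shown above:
printf 'TACC: Max Memory Used Per Node : 8.52 GB\n' | max_mem_gb
```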

SLIDE 10

Raw Data Analysis

[Plots: memory used (GB) vs. time (s), comparing Remora samples against the maximum allowed and automated collection; IO (requests/s) vs. time (s), comparing original and improved runs]

SLIDE 11

Raw Data Analysis

[Plot: memory used (GB) vs. execution time (s), comparing CPU and PHI runs]

SLIDE 12

Raw Data Analysis

SLIDE 13

Simple to Use

module load remora
remora ibrun mympi.code

module load remora
remora ./mycrazy.script
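Conceptually, `remora <command>` is a transparent wrapper: start background collectors, run the command unchanged, stop collection, report. A minimal single-node sketch of that pattern (illustrative only, not Remora's code; file names are made up):

```shell
#!/bin/sh
# Illustrative single-node version of the wrapper pattern: start a sampler,
# run the user's command, stop the sampler, report elapsed time.
remora_like() {
    outdir=$(mktemp -d)
    # Background collector: one "timestamp MemFree" line per second
    # (the awk read fails harmlessly on systems without /proc/meminfo).
    ( while :; do
          echo "$(date +%s) $(awk '/^MemFree:/ {print $2}' /proc/meminfo 2>/dev/null)"
          sleep 1
      done ) >> "$outdir/mem_samples.txt" &
    sampler=$!
    start=$(date +%s)
    "$@"                                   # the user's command, unchanged
    rc=$?
    kill "$sampler" 2>/dev/null
    echo "Elapsed: $(( $(date +%s) - start )) s; raw data in $outdir"
    return $rc
}

remora_like sleep 2
```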

SLIDE 14

Implementation

  • Bash and Python, plus some C xltop trickery by Antonio
  • Master starts a flat-tree ssh connection to all nodes
  • Background task spawned on each node
  • Background task collects data regularly
  • IO data collected only from master node
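The launch step amounts to one loop on the master node. A dry-run sketch (hostnames and the collector script name are made up; Remora itself derives the hostlist from the scheduler and executes the ssh commands rather than printing them):

```shell
# Dry-run sketch of the master's flat-tree launch: one ssh per node, each
# starting a detached background collector. The commands are printed here
# instead of executed; all names are illustrative.
NODES="node01 node02 node03"        # hypothetical hostlist
OUTDIR="$PWD/remora_data"           # hypothetical shared output directory

for n in $NODES; do
    echo ssh -f "$n" "remora_collector.sh $OUTDIR"
done
```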
SLIDE 15

Implementation

Programs

  • numastat
  • mpstat
  • nvidia-smi
  • ibtracert
  • ibstatus
  • xltop
  • python

Files

  • /proc/meminfo
  • /proc/<pid>/status
  • /proc/sys/lnet/stats
  • /sys/class/infiniband/…
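Most of the files above are plain text with one value per line, so each collector is a small shell/awk parser. A sketch for the /proc/meminfo case (illustrative, not Remora's code; fed a canned fragment so it runs anywhere):

```shell
# Compute used memory (MemTotal - MemFree, in kB) from meminfo-format text.
mem_used_kb() {
    awk '/^MemTotal:/ { t = $2 } /^MemFree:/ { f = $2 } END { print t - f }'
}

# Canned fragment in /proc/meminfo format; on a real node you would
# pipe /proc/meminfo itself through the parser.
printf 'MemTotal:  32768000 kB\nMemFree:   30768000 kB\n' | mem_used_kb
```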
SLIDE 16

Portability

  • Some hardcoded strings only applicable to TACC – easy fix (coming soon)
  • Hardcoded MPI launcher (ibrun) – easy fix (coming soon)
  • Post-processing has some TACC-specific entries – easy fix (coming soon)
  • ltop requirement for Lustre IO report
  • Need to expand on the way the hostlist is collected
SLIDE 17

Future Plans

  • Comprehensive report generation
  • Identify egregious performance issues and generate appropriate warnings
  • Add a database for better comparative / historical data analysis
  • Improve launch step for better scalability
SLIDE 18

For more information: www.tacc.utexas.edu

Thanks! {carlos,agomez}@tacc.utexas.edu www.github.com/TACC/remora