XALT: User Environment Tracking Robert McLay, Mark Fahey, Reuben - - PowerPoint PPT Presentation

SLIDE 1

XALT: User Environment Tracking

Robert McLay, Mark Fahey, Reuben Budiardja, Sandra Sweat

The Texas Advanced Computing Center, Argonne National Labs, NICS

Jan. 31, 2016

SLIDE 2

XALT Conclusion

XALT: What runs on the system

  • A U.S. NSF-funded project; PIs: Mark Fahey and Robert McLay
  • A census of which programs and libraries are run
  • Running at TACC, NICS, U. Florida, KAUST, ...
  • Integrates with TACC-Stats.

SLIDE 3

Design Goals

  • Be extremely light-weight
  • Provide provenance data: How?
  • How many use a library or application?
  • Collect Data into a Database for analysis.

SLIDE 4

Design: Linker

  • The linker (ld) wrapper intercepts the user link line.

      – A shell script wrapper, ld, which uses Python scripts
      – Generate assembly code: key-value pairs
      – Capture tracemap output from ld
      – Transmit collected data in *.json format

SLIDE 5

Design: Launcher

  • Program Launcher: mpirun, aprun, ibrun ...

      – A shell script wrapper is called, which uses Python scripts
      – Find the executable by parsing the command
      – Collect executable info, shared libraries, env.
      – Transmit collected data in *.json format

  • The future is now. This is no longer necessary!

SLIDE 6

Design: Transmission to DB

  • File: collect nightly
  • Syslog: Use Syslog filtering
  • Direct to DB.

SLIDE 7

Lmod to XALT connection

  • Lmod spider walks the entire module tree.
  • Can build a reverse map from paths to modules.
  • Can map programs & libraries to modules.
  • /opt/apps/i15/mv2_2_1/phdf5/1.8.14/lib/libhdf5.so.9 ⇒ phdf5/1.8.14(intel/15.02:mvapich2/2.1)
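The reverse map can be pictured with a short Python sketch. This is not Lmod's spider code; the dictionary below is hand-written sample data standing in for spider output, the bin directory and all function names are hypothetical, and the library path assumes underscores in the directory name from the slide.

```python
import posixpath

# Hypothetical sample of what a walk over the module tree might yield:
# each module name mapped to the directories it adds to search paths.
MODULE_DIRS = {
    "phdf5/1.8.14(intel/15.02:mvapich2/2.1)": [
        "/opt/apps/i15/mv2_2_1/phdf5/1.8.14/lib",
        "/opt/apps/i15/mv2_2_1/phdf5/1.8.14/bin",  # assumed for illustration
    ],
}


def build_reverse_map(module_dirs):
    """Invert module -> directories into directory -> module."""
    reverse = {}
    for module, dirs in module_dirs.items():
        for directory in dirs:
            reverse[posixpath.normpath(directory)] = module
    return reverse


def module_for(path, reverse_map):
    """Walk up from the file's directory until a known directory is found."""
    directory = posixpath.dirname(posixpath.normpath(path))
    while True:
        if directory in reverse_map:
            return reverse_map[directory]
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the top without a match
            return None
        directory = parent


REVERSE_MAP = build_reverse_map(MODULE_DIRS)
```

Calling module_for() on the library path from the slide returns the phdf5 module string; a path outside the module tree returns None.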

SLIDE 8

Lmod: Priority Path

  • Fixed Job Launcher: ibrun, aprun
  • Variable Launchers: mpirun, mpiexec
  • Priority Path:

prepend_path{"PATH", "/opt/apps/xalt/1.0/bin", priority=100}

SLIDE 9

Database Changes (I)

  • Table sizes in XALT:

+------------------+------------+
| Table            | Size in MB |
+------------------+------------+
| join_run_env     |  199603.00 |
| join_run_object  |    9388.00 |
| join_link_object |    5013.00 |
| xalt_run         |    4613.00 |
| xalt_object      |    4175.00 |
| xalt_link        |     814.00 |
+------------------+------------+

  • join_run_env has 2.1 billion rows

SLIDE 10

Database Changes (II)

  • Environment variables are important.
  • But mainly for reproducing results
  • Not SQL tests (mostly)

SLIDE 11

Database Changes (III): New Design

  • Store complete env ⇒ compressed json blob
  • Filter Env’s with Accept Test followed by Reject Test
  • Instead of 250 vars per job ⇒ 20 to 30.
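A minimal Python sketch of the new design follows. It is not XALT's implementation: the patterns and function names are hypothetical, and it assumes a variable is kept for the database only when it passes the accept test and is not then caught by the reject test.

```python
import json
import re
import zlib

# Hypothetical patterns; a real site would choose its own lists.
ACCEPT = [re.compile(p) for p in (r"^PATH$", r"^LD_", r"^OMP_", r"^MKL_")]
REJECT = [re.compile(p) for p in (r"^LD_PRELOAD$",)]


def filter_env(env):
    """Keep variables that pass the accept test and then survive the reject test."""
    kept = {}
    for name, value in env.items():
        if not any(rx.search(name) for rx in ACCEPT):
            continue  # failed the accept test
        if any(rx.search(name) for rx in REJECT):
            continue  # caught by the reject test
        kept[name] = value
    return kept


def pack_env(env):
    """Store the complete environment as one compressed JSON blob."""
    text = json.dumps(env, sort_keys=True)
    return zlib.compress(text.encode("utf-8"))


def unpack_env(blob):
    """Recover the full environment, e.g. when reproducing a run."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The full blob serves reproducibility, while only the small filtered subset needs individual rows that SQL queries can test.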

SLIDE 12

Protecting XALT (I): UTF8 Characters

  • Linux supports UTF8 Characters in file names, env. vars.
  • Python supports UTF8 if you know what you are doing.
  • Switch XALT to use cursor.execute(query, (job_id, user, ...))
  • Where query="INSERT INTO table VALUE(%s,%s)"
  • This prevents SQL injection: “johnny drop tables;”
  • Also supports UTF8 characters.
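The parameterized form can be tried with a self-contained Python sketch. As an assumption made only to keep the example runnable without a database server, it uses the standard-library sqlite3 module, whose placeholder is ? rather than the %s shown on the slide; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE runs (job_id TEXT, user TEXT)")


def insert_run(cursor, job_id, user):
    """Pass values separately from the SQL text so they are never parsed as SQL."""
    query = "INSERT INTO runs VALUES (?, ?)"
    cursor.execute(query, (job_id, user))


# A hostile string and a UTF-8 string are both stored as plain data.
insert_run(cur, "12345", "johnny'); DROP TABLE runs;--")
insert_run(cur, "12346", "zoë")
conn.commit()

rows = cur.execute("SELECT job_id, user FROM runs ORDER BY job_id").fetchall()
```

Because the driver binds the values, the quote characters in the first user name never terminate the statement, and the table still exists afterwards.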

SLIDE 13

Protecting XALT (II): PYTHONHOME,...

  • Four ways the user environment can break XALT's Python scripts: LD_LIBRARY_PATH, PATH, PYTHONPATH, PYTHONHOME
  • Solution: LD_LIBRARY_PATH="@ld_lib_path@" PATH= @python@ -E python-script ...

  • Everything that depends on PATH must be hard coded
  • basename ⇒ /bin/basename
  • Unique install for each operating system.
  • Programs move around: basename
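The effect of the -E flag can be checked with a small Python sketch. It is only an illustration of the idea, not XALT's wrapper: the function name and the default PATH are hypothetical, and the current LD_LIBRARY_PATH is used as a stand-in for the value that would be substituted at install time.

```python
import os
import subprocess
import sys


def run_protected(code, python=sys.executable, ld_lib_path=None, path="/usr/bin:/bin"):
    """Run Python code with pinned LD_LIBRARY_PATH and PATH, ignoring PYTHON* variables."""
    env = dict(os.environ)
    if ld_lib_path is None:
        # Stand-in for the install-time value (assumption for this sketch).
        ld_lib_path = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = ld_lib_path
    env["PATH"] = path
    # -E tells the interpreter to ignore PYTHONPATH, PYTHONHOME and other PYTHON* variables.
    result = subprocess.run(
        [python, "-E", "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With a user-set PYTHONPATH still present in the environment, the child interpreter's sys.path does not pick it up, which is the protection the slide describes.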

SLIDE 14

Using XALT Data

  • Targeted Outreach: Who will be affected
  • Largemem Queue Overuse
  • XALT and TACC-Stats

SLIDE 15

Publishing XALT Data

  • Student Sandra Sweat
  • Sanitized Data
  • Community Codes Reported: Vasp*, WRF*, OpenFOAM*,
  • Anonymized user names (e.g. U012354) and charge accounts (e.g. A12345)
  • Unique mapping, Added Field of Science

SLIDE 16

Tracking Non-mpi jobs (I)

  • Originally we tracked only MPI Jobs
  • By hijacking mpirun etc.
  • Now we can use ELF binary format to track jobs

SLIDE 17

ELF Binary Format Trick

void myinit(int argc, char **argv) { /* ... */ }
void myfini() { /* ... */ }

__attribute__((section(".init_array"))) typeof(myinit) *__init = myinit;
__attribute__((section(".fini_array"))) typeof(myfini) *__fini = myfini;

SLIDE 18

Using the ELF Binary Format Trick

  • This C code is compiled and linked in through the hijacked linker
  • It can also be used with LD_PRELOAD
  • We are using both...

SLIDE 19

Downsides

  • Currently, we only track task 0 jobs.
  • MPMD programs will only record the Task 0 job.
  • We also lose the ability to capture return exit status

SLIDE 20

Upsides (I)

  • Can now track all executables, period.
  • Can now track “launcher” jobs.

SLIDE 21

Upsides (II)

  • Do not need to write/maintain a parser for ibrun, mpirun ...
  • Do not need to correctly jump over certain executables:

      – OK: ibrun tacc_affinity user_program
      – Not OK: ibrun env -u foo user_program

SLIDE 22

Challenges (I)

  • With both LD_PRELOAD and init.o linked in ⇒ double records
  • Do not want to track mv, cp, etc.
  • Only want to track some executables on compute nodes
  • Do not want to get overwhelmed by the data.

SLIDE 23

Why do both?

  • We want both linking in and LD_PRELOAD. Why?

      – Data on programs built before XALT
      – Data on GUI debugger, ...
      – User sets LD_PRELOAD

SLIDE 24

Avoid Double counting

  • .init_array and .fini_array work like an onion.
  • .init_array: a Stack: LIFO
  • .fini_array: a Queue: LILO
  • Order: Preload init, Built-in init, program, Built-in fini, Preload fini
  • Use an env. var. to prevent double counting
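The environment-variable guard can be pictured with a small Python simulation. It is not XALT's C code: the variable name XALT_SKETCH_ACTIVE and the function names are hypothetical, and a plain dict stands in for the process environment.

```python
GUARD = "XALT_SKETCH_ACTIVE"  # hypothetical variable name


def tracker_init(source, records, env):
    """Record a start event unless another copy of the tracker already claimed the run."""
    if env.get(GUARD):
        return False  # someone else (e.g. the preloaded copy) is already tracking
    env[GUARD] = source
    records.append({"event": "start", "source": source})
    return True


def tracker_fini(source, records, env):
    """Record an end event only from the copy that claimed the run."""
    if env.get(GUARD) != source:
        return
    records.append({"event": "end", "source": source})
    del env[GUARD]


def simulate_run():
    """Call order from the slide: preload init, built-in init, program, built-in fini, preload fini."""
    env, records = {}, []
    tracker_init("preload", records, env)
    tracker_init("builtin", records, env)
    # ... the user program would run here ...
    tracker_fini("builtin", records, env)
    tracker_fini("preload", records, env)
    return records
```

In this simulation only the outermost layer of the onion produces records, so a program that is both linked with the tracker and run under LD_PRELOAD yields one start and one end record.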

SLIDE 25

Other Safety Features

  • XALT tracks only when told to
  • Compute node only
  • Filter based on Path
  • Protection against closing stderr before fini.

SLIDE 26

Path Filtering

  • Accept test, followed by an Ignore test.
  • Two files containing regex patterns, converted to code.
  • Accept List Tests: Track /usr/bin/ddt, /bin/tar
  • Ignore List Tests: /usr/bin, /bin, /sbin, ...
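A rough Python sketch of the two-stage path filter is given below. XALT converts its pattern files to code, so this runtime version is only an illustration; the exact regular expressions, the function name, and the choice to track any path that matches neither list are assumptions.

```python
import re

# Patterns modeled on the examples in the slide; the exact regexes are assumed.
ACCEPT = [re.compile(p) for p in (r"^/usr/bin/ddt$", r"^/bin/tar$")]
IGNORE = [re.compile(p) for p in (r"^/usr/bin/", r"^/bin/", r"^/sbin/")]


def should_track(path):
    """Apply the accept test first, then the ignore test."""
    if any(rx.search(path) for rx in ACCEPT):
        return True  # explicitly accepted, even if its directory is ignored
    if any(rx.search(path) for rx in IGNORE):
        return False  # system utilities such as mv and cp
    return True  # assumed default: anything else is tracked
```

Running the accept test first is what lets /usr/bin/ddt and /bin/tar be tracked even though /usr/bin and /bin are on the ignore list.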

SLIDE 27

A LD PRELOAD debug version

  • Normal Version is fast with minimal tests.
  • A debug version is provided to help with testing:
  • LD_PRELOAD=$XALT_DEBUG_INIT ./a.out

SLIDE 28

XALT Demo

  • Show modules hierarchy
  • ml --raw show xalt
  • Show debugging output
  • type -a ld,mpirun
  • Build programs
  • Run tests
  • Run utf8 program
  • Show database results

SLIDE 29

Conclusion

  • Lmod:

      – Source: github.com/TACC/lmod.git, lmod.sf.net
      – Documentation: lmod.readthedocs.org

  • XALT:

      – Source: github.com/Fahey-McLay/xalt.git, xalt.sf.net
      – Documentation: doc/*.pdf
