
XALT: User Environment Tracking

Robert McLay, Mark Fahey, Reuben Budiardja, Sandra Sweat. The Texas Advanced Computing Center, Argonne National Labs, NICS. Jan. 31, 2016


  1. XALT: User Environment Tracking
     Robert McLay, Mark Fahey, Reuben Budiardja, Sandra Sweat
     The Texas Advanced Computing Center, Argonne National Labs, NICS
     Jan. 31, 2016

  2. XALT: What runs on the system
     • A U.S. NSF-funded project. PIs: Mark Fahey and Robert McLay
     • A census of which programs and libraries are run
     • Running at TACC, NICS, U. Florida, KAUST, ...
     • Integrates with TACC-Stats.

  3. Design Goals
     • Be extremely lightweight
     • Provide provenance data: How?
     • How many use a library or application?
     • Collect data into a database for analysis.

  4. Design: Linker
     • The linker (ld) wrapper intercepts the user link line.
       – A shell script wrapper, ld, which uses Python scripts
       – Generate assembly code: key-value pairs
       – Capture tracemap output from ld
       – Transmit collected data in *.json format

  5. Design: Launcher
     • Program launcher: mpirun, aprun, ibrun, ...
       – A shell script wrapper is called, which uses Python scripts
       – Find the executable by parsing the command
       – Collect executable info, shared libraries, env.
       – Transmit collected data in *.json format
     • The future is now. This is no longer necessary!

  6. Design: Transmission to DB
     • File: collect nightly
     • Syslog: use syslog filtering
     • Direct to DB.

  7. Lmod to XALT connection
     • Lmod spider walks the entire module tree.
     • Can build a reverse map from paths to modules
     • Can map programs and libraries to modules.
     • /opt/apps/i15/mv2_2_1/phdf5/1.8.14/lib/libhdf5.so.9 ⇒ phdf5/1.8.14(intel/15.02:mvapich2/2.1)
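The reverse-map lookup described on this slide can be sketched roughly as follows. This is only an illustration, not Lmod's or XALT's code: `REVERSE_MAP` and `path_to_module` are hypothetical names, and the single entry is copied from the slide's example (assuming the directory component is `mv2_2_1`).

```python
import os

# Hypothetical reverse map: directory prefix -> module name.
# In practice this would be generated from the module tree, not written by hand.
REVERSE_MAP = {
    "/opt/apps/i15/mv2_2_1/phdf5/1.8.14/lib":
        "phdf5/1.8.14(intel/15.02:mvapich2/2.1)",
}

def path_to_module(path, reverse_map=REVERSE_MAP):
    """Return the module that owns `path`, or None if no prefix matches."""
    directory = os.path.dirname(path)
    # Walk up the directory tree until a known prefix is found.
    while directory and directory != "/":
        if directory in reverse_map:
            return reverse_map[directory]
        directory = os.path.dirname(directory)
    return None
```

A library outside every known prefix, such as a system libc, simply maps to `None`.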

  8. Lmod: Priority Path
     • Fixed job launchers: ibrun, aprun
     • Variable launchers: mpirun, mpiexec
     • Priority path: prepend_path{"PATH", "/opt/apps/xalt/1.0/bin", priority=100}

  9. Database Changes (I)
     • Table sizes in XALT:

         +------------------+------------+
         | Table            | Size in MB |
         +------------------+------------+
         | join_run_env     |  199603.00 |
         | join_run_object  |    9388.00 |
         | join_link_object |    5013.00 |
         | xalt_run         |    4613.00 |
         | xalt_object      |    4175.00 |
         | xalt_link        |     814.00 |
         +------------------+------------+

     • join_run_env has 2.1 billion rows

  10. Database Changes (II)
      • Environment variables are important,
      • but mainly for reproducing results,
      • not (mostly) for SQL tests.

  11. Database Changes (III): New Design
      • Store the complete env ⇒ compressed JSON blob
      • Filter env. vars with an Accept Test followed by a Reject Test
      • Instead of 250 vars per job ⇒ 20 to 30.
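The two ideas on this slide, an accept test followed by a reject test and a compressed JSON blob for the full environment, might look something like the sketch below. The pattern lists and the function names (`filter_env`, `pack_env`, `unpack_env`) are assumptions made up for illustration; the slides do not show XALT's actual patterns or storage code.

```python
import json
import re
import zlib

# Hypothetical patterns; a real site would tune these lists.
ACCEPT = [re.compile(p) for p in (r"^OMP_", r"^PATH$", r"^LD_", r"^PYTHON")]
REJECT = [re.compile(p) for p in (r"^LD_PRELOAD$",)]

def filter_env(env):
    """Keep a variable only if it passes the accept test and then the reject test."""
    kept = {}
    for name, value in env.items():
        if not any(p.search(name) for p in ACCEPT):
            continue  # fails the accept test
        if any(p.search(name) for p in REJECT):
            continue  # accepted, but explicitly rejected
        kept[name] = value
    return kept

def pack_env(env):
    """Serialize the complete environment to a zlib-compressed JSON blob."""
    return zlib.compress(json.dumps(env, sort_keys=True).encode("utf-8"))

def unpack_env(blob):
    """Inverse of pack_env: recover the environment dictionary."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The filtered dictionary is what would go into searchable rows, while the packed blob preserves everything for reproducing a run later.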

  12. Protecting XALT (I): UTF8 Characters
      • Linux supports UTF8 characters in file names and env. vars.
      • Python supports UTF8 if you know what you are doing.
      • Switch XALT to use cursor.execute(query, (job_id, user, ...))
      • where query = "INSERT INTO table VALUES(%s,%s)"
      • This prevents SQL injection: “johnny drop tables;”
      • Also supports UTF8 characters.
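A self-contained way to see why passing values separately from the query text helps is sketched below. It uses Python's built-in `sqlite3` module so that it runs anywhere; note that `sqlite3` uses `?` placeholders, whereas the `%s` style on the slide belongs to the MySQL driver. The table name `xalt_demo` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE xalt_demo (job_id TEXT, user TEXT)")

# Values are passed separately from the SQL text, never interpolated into it.
query = "INSERT INTO xalt_demo VALUES (?, ?)"
hostile_user = "johnny'); DROP TABLE xalt_demo; --"
utf8_user = "jürgen_ñ"
cur.execute(query, ("1001", hostile_user))
cur.execute(query, ("1002", utf8_user))
conn.commit()

# Both strings are stored verbatim and the table still exists.
rows = cur.execute(
    "SELECT job_id, user FROM xalt_demo ORDER BY job_id"
).fetchall()
```

The hostile string ends up as ordinary data in the `user` column, and the non-ASCII name round-trips unchanged.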

  13. Protecting XALT (II): PYTHONHOME, ...
      • Four ways a user's environment can break XALT: LD_LIBRARY_PATH, PATH, PYTHONPATH, PYTHONHOME
      • Solution: LD_LIBRARY_PATH="@ld_lib_path@" PATH= @python@ -E python-script ...
      • Everything that depends on PATH must be hard-coded
      • basename ⇒ /bin/basename
      • Unique install for each operating system.
      • Programs move around: e.g. basename
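The effect of the `-E` flag in the solution above can be checked with a small experiment. The sketch below is only a demonstration of the interpreter option, not XALT's wrapper: it sets `PYTHONPATH` to a made-up directory and confirms that a child interpreter started with `-E` does not pick it up.

```python
import os
import subprocess
import sys

# A hostile-looking user environment: PYTHONPATH points at a made-up directory.
bogus = "/nonexistent/xalt_demo_path"
env = dict(os.environ, PYTHONPATH=bogus)

# -E tells the interpreter to ignore all PYTHON* environment variables.
child = subprocess.run(
    [sys.executable, "-E", "-c", "import sys; print(sys.path)"],
    env=env, capture_output=True, text=True, check=True,
)

# True would mean the user's PYTHONPATH leaked into the child's sys.path.
leaked = bogus in child.stdout
```

The same flag also makes the interpreter ignore `PYTHONHOME`, which is why the slide pairs it with a hard-coded interpreter path.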

  14. Using XALT Data
      • Targeted outreach: who will be affected
      • Largemem queue overuse
      • XALT and TACC-Stats

  15. Publishing XALT Data
      • Student: Sandra Sweat
      • Sanitized data
      • Community codes reported: Vasp*, WRF*, OpenFOAM*
      • User names: U012354; charge accounts: A12345
      • Unique mapping, added Field of Science

  16. Tracking Non-MPI jobs (I)
      • Originally we tracked only MPI jobs
      • By hijacking mpirun etc.
      • Now we can use the ELF binary format to track jobs

  17. ELF Binary Format Trick

          void myinit(int argc, char **argv) { /* ... */ }
          void myfini() { /* ... */ }

          __attribute__((section(".init_array"))) typeof(myinit) *__init = myinit;
          __attribute__((section(".fini_array"))) typeof(myfini) *__fini = myfini;

  18. Using the ELF Binary Format Trick
      • This C code is compiled and linked in through the hijacked linker
      • It can also be used with LD_PRELOAD
      • We are using both...

  19. Downsides
      • Currently, we only track task 0 of a job.
      • MPMD programs will only record the task 0 executable.
      • We also lose the ability to capture the exit status

  20. Upsides (I)
      • Can now track all executables, period.
      • Can now track “launcher” jobs.

  21. Upsides (II)
      • Do not need to write/maintain a parser for ibrun, mpirun, ...
      • Do not need to correctly jump over certain executables:
        – OK: ibrun tacc_affinity user_program
        – Not OK: ibrun env -u foo user_program
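To see why the launcher parser was fragile, consider the naive sketch below. It is a made-up illustration, not the parser XALT used: `KNOWN_WRAPPERS` and `find_executable` are hypothetical, and the rule "skip options and known wrappers" is an assumption about what a simple parser would do.

```python
import shlex

# Hypothetical list of wrapper commands the parser knows it must jump over.
KNOWN_WRAPPERS = {"ibrun", "mpirun", "tacc_affinity", "env"}

def find_executable(command_line):
    """Naively return the first token that is neither an option nor a known wrapper."""
    for token in shlex.split(command_line):
        if token.startswith("-") or token in KNOWN_WRAPPERS:
            continue
        return token
    return None

ok = find_executable("ibrun tacc_affinity user_program")
not_ok = find_executable("ibrun env -u foo user_program")
```

The first call finds `user_program`, but the second returns `foo`, because the parser does not know that `env -u` consumes an argument. Handling every such case means teaching the parser the option syntax of every wrapper, which the ELF approach avoids.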

  22. Challenges (I)
      • With both LD_PRELOAD and init.o linked in ⇒ double records
      • Do not want to track mv, cp, etc.
      • Only want to track some executables on compute nodes
      • Do not want to get overwhelmed by the data.

  23. Why do both?
      • We want both linking in and LD_PRELOAD. Why?
        – Data on programs built before XALT
        – Data on GUI debugger, ...
        – User sets LD_PRELOAD

  24. Avoid Double Counting
      • .init_array and .fini_array work like an onion.
      • .init_array: a stack: LIFO
      • .fini_array: a queue: LILO
      • Order: Preload, Built-in, program, Built-in, Preload
      • Use an env. var. to prevent double counting
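The environment-variable guard can be illustrated with a small simulation of the onion ordering. The real hooks are the C init/fini functions from slide 17; the Python below only mimics the logic, and the variable name `XALT_DEMO_ACTIVE` and the functions `my_init` and `my_fini` are hypothetical.

```python
import os

GUARD = "XALT_DEMO_ACTIVE"  # hypothetical variable name
records = []

def my_init(source, env=os.environ):
    """Record the start of a run unless an outer layer already did."""
    if env.get(GUARD):
        return False  # another copy is already tracking this process
    env[GUARD] = source
    records.append(("start", source))
    return True

def my_fini(source, env=os.environ):
    """Record the end of a run only from the copy that recorded the start."""
    if env.get(GUARD) != source:
        return False
    records.append(("end", source))
    del env[GUARD]
    return True

# Onion order from the slide: Preload, Built-in, program, Built-in, Preload.
demo_env = {}
my_init("preload", demo_env)
my_init("built-in", demo_env)   # skipped: guard already set
my_fini("built-in", demo_env)   # skipped: it did not record the start
my_fini("preload", demo_env)
```

Only the outermost layer produces a start and an end record, so the run is counted once even though both copies of the hooks are present.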

  25. Other Safety Features
      • XALT tracks only when told to
      • Compute nodes only
      • Filter based on path
      • Protection against closing stderr before fini.

  26. Path Filtering
      • Accept test, followed by an Ignore test
      • Two files containing regex patterns, converted to code.
      • Accept list tests: track /usr/bin/ddt, /bin/tar
      • Ignore list tests: /usr/bin, /bin, /sbin, ...
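One reading of this slide is that the accept list is checked first, so that /usr/bin/ddt is tracked even though /usr/bin is on the ignore list. A minimal sketch under that assumption follows; the pattern lists are modeled on the slide's examples, `should_track` is a hypothetical name, and tracking any path that matches neither list is also an assumption.

```python
import re

# Hypothetical pattern lists modeled on the slide's examples.
ACCEPT_PATTERNS = [re.compile(p) for p in (r"^/usr/bin/ddt$", r"^/bin/tar$")]
IGNORE_PATTERNS = [re.compile(p) for p in (r"^/usr/bin/", r"^/bin/", r"^/sbin/")]

def should_track(exe_path):
    """Accept test first, then ignore test; anything else is tracked."""
    if any(p.search(exe_path) for p in ACCEPT_PATTERNS):
        return True
    if any(p.search(exe_path) for p in IGNORE_PATTERNS):
        return False
    return True
```

With these lists, `/usr/bin/ddt` and a user's `a.out` are tracked, while `/bin/cp` and `/bin/mv` are not.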

  27. An LD_PRELOAD debug version
      • The normal version is fast, with minimal tests.
      • A debug version is provided to help with testing:
      • LD_PRELOAD=$XALT_DEBUG_INIT ./a.out

  28. XALT Demo
      • Show module hierarchy
      • ml --raw show xalt
      • Show debugging output
      • type -a ld, mpirun
      • Build programs
      • Run tests
      • Run utf8 program
      • Show database results

  29. Conclusion
      • Lmod:
        – Source: github.com/TACC/lmod.git, lmod.sf.net
        – Documentation: lmod.readthedocs.org
      • XALT:
        – Source: github.com/Fahey-McLay/xalt.git, xalt.sf.net
        – Documentation: doc/*.pdf
