
High-performance Python-C++ bindings with PyPy and Cling


  1. High-performance Python-C++ bindings with PyPy and Cling
     Wim Lavrijsen (LBNL) and Aditi Dutta (Nanyang Tech), Computational Research Division
     PyHPC 2016: 6th Workshop on Python for High-Performance and Scientific Computing, November 14, 2016, Salt Lake City, UT, USA

  2. Background: High Energy Physics
     ● High energy physics (HEP)
       – A.k.a. “particle physics”, explores matter, energy and the fundamental forces of nature
       – Often works on huge, long-running experiments in large, geographically dispersed collaborations
       – The original “Big Data”
     ● Software development challenges
       – Range of different skill sets, preferences, interests
       – Large turnover of people over experiment lifetime
       – Run on everything, everywhere: grids, clusters, HPC systems, clouds, and @home

  3. ATLAS Detector [figure]

  4. Background: Python in HEP
     ● Historical time line of Python usage
       – 2001: first interest and implementations
       – 2004: gone mainstream
       – 2009: drives frameworks, job transforms, analyses
       – 2013: Nobel Prize in Physics (Higgs boson)
       – 2016: first-class citizen in new experiments
     ● Technology
       – C++ adopted in 1994, main language since ~1998
       – Python bindings home-grown: piggy-backed on C++ reflection for serialization and interactivity (CINT)
       – Increased Python use thanks to Machine Learning

  5. H → ZZ → 2e2μ [figure]

  6. Our Goals
     ● Support C++11 and beyond
     ● The scale and distribution to support large codes
     ● High performance (with PyPy)

  7. First Target: C++11 and beyond
     ● C++ language standardization went into hyperdrive
       – Then: C++98 (with a technical corrigendum in '03)
       – Now: C++11, C++14, C++17, C++2x, ...
     ● We parse C++ headers for reflection to
       – Automate I/O and schema evolution
       – Use C++ interactively from an interpreter
       – Provide automatic Python-C++ bindings
     ● Impossible to keep up with a small team ...
     CINT, a homegrown parser that originated at HP, was replaced with Cling, an interactive C++ interpreter based on Clang (LLVM). Cling is developed by CERN. Our CPython-based Python bindings have followed suit.

  8. Second Target: Scale and Distribution
     ● Problem: C++ developers, but Python users
       Solution: Fully automatic, interactive bindings, based on parsing C++ headers
     ● Problem: Huge number of classes, functions, etc.
       Solution: Lazy lookup/creation: pre-compiled modules and bindings only at run-time
     ● Problem: Lots of libraries and dependencies
       Solution: Automatic loaders with search paths
     ● Problem: Name clashes, duplicates
       Solution: Follow C++ (i.e. linker) structure to scope and uniquely identify names
     ● Problem: Too much “C++ feel”
       Solution: Reflection-based pythonizations (automatic) and regexp-based support for pythonizing common patterns
     ● Problem: Different Python versions: v2, v3, CPython, pypy-c, ...
       Solution: Only core bindings module (cppyy) depends on Python
     We leverage Python's and Cling's dynamic natures to maximize lazy evaluation, leading to shorter startup times and lower memory use.
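
The lazy lookup/creation idea can be sketched in plain Python. This is an illustration only, not cppyy's implementation: `LazyNamespace`, its `factories` mapping, and the `make_vector_proxy` helper are hypothetical names, and the "bindings" are ordinary placeholder classes rather than real C++ proxies.

```python
class LazyNamespace:
    """Toy namespace that builds a binding only on first attribute access."""

    def __init__(self, name, factories):
        self._name = name
        self._factories = factories  # member name -> zero-argument callable
        self.created = []            # records which bindings were actually built

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, i.e. on first access.
        try:
            factory = self._factories[attr]
        except KeyError:
            raise AttributeError(f"{self._name} has no member {attr!r}") from None
        binding = factory()
        self.created.append(attr)
        # Cache on the instance so later lookups bypass __getattr__ entirely.
        setattr(self, attr, binding)
        return binding


def make_vector_proxy():
    # Hypothetical stand-in for generating a proxy class from a parsed header.
    return type("vector_int", (), {"cpp_name": "std::vector<int>"})


std = LazyNamespace("std", {
    "vector_int": make_vector_proxy,
    "string": lambda: type("string", (), {}),
})
first = std.vector_int   # built here, on first use
second = std.vector_int  # served from the instance dictionary
```

Only `vector_int` is ever created in this run; `string` stays unbuilt because nothing asked for it, which is the source of the shorter startup time and lower memory use.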

  9. Third Target: High Performance
     ● Important for perception and decision making
       – Python is not slow for those who use it
         ● Part truth (heavy CPU loads in C++), part self-selection
     ● Improve performance completely transparently
     [Figure: scale of Python performance with the labels “too slow”, “borderline”, “fast enough” and “fast”; annotations “our target” and “new tools, extensions, annotations, ll rewrites”]
     Note: this turns out to be a rather small, and changing, group of Python users!

  10. Technologies (Re-)Used
     ● Goal: maximize reuse of existing projects
       – Capture expertise, maintenance, future development
     ● Projects so leveraged:
       – Cling/ROOT (C++ interpreter https://root.cern/cling )
       – Clang/LLVM (C++ compiler http://llvm.org )
       – PyPy (Python w/ JIT http://pypy.org )
       – CFFI (Python FFI to C https://cffi.readthedocs.io )
     ● Code size (not counting the ~1200 unit tests!):
       – CPython/cppyy: ~18K lines of C++, ~1K lines of (R)Python
       – PyPy/cppyy: ~2K lines of C++, ~4K lines of (R)Python
     Note: we also build this work on an earlier, Reflex-based, version of cppyy. With Cling, we get more functionality, improved ease-of-use, better performance, and C++1x.

  11. Architecture (example of function calls shown)
     [Diagram: components Python, cppyy, Cling, Clang, ORCJit (LLVM), CFFI, C++ headers and C++ libraries, connected by numbered steps]
     ● 1: module import
     ● 2, 3: lazy lookup
     ● 4: find C++ headers
     ● 5: parse
     ● 6: AST
     ● 7A: generate wrapper code; 7B: AST
     ● 8: compile
     ● 9: link
     ● 10A: wrapper function ptrs; 10B: direct function ptrs
     ● 11: args & call
     ● 12: LL result
     ● 13: Py result
     Two paths: 7A-10A: wrappers; 7B-10B: direct FFI

  12. Functionality
     ● Both Python and Cling are interactive, allowing:
       – Automatic template instantiations
         ● Transparent unique_ptr<>, shared_ptr<>, etc.
       – std::vector<> optimizations
       – Offset calculations for multiple virtual inheritance
       – Cross-language derivation (both ways)
       – Etc.?! More ideas to explore ...
     ● C++1x much better at expressing ownership
       – Improved automatic memory management
       – Also targeted semi-manually with pythonizations
         ● E.g. name-based, custom smart pointers, etc.

  13. Optimizations
     ● PyPy JIT is conservative and optimizes Python
       – JIT hints needed to “teach it C++,” e.g.:
         ● Class hierarchies are fixed (and so are most offsets)
         ● Side-effect free functions are elidable
         ● Specialized paths (e.g. lookups, overloading, FFI)
       – Need micro-benchmarks to debug & verify performance
         ● Hints are elementary, so the approach scales to more complex codes
     ● Set of micro-benchmarks follows
       – Not feature-set exhaustive (yet)
       – Comparisons made with:
         ● Target: optimized C++
         ● CPython/cppyy: the default of most of our users
         ● Swig: well-known, widely used
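
A micro-benchmark of the kind described above can be sketched with the standard `timeit` module. This is a minimal sketch and not the harness used for the talk: the `bench` helper and the pure-Python `empty` function are assumptions for illustration, and a real comparison would time a bound C++ function instead.

```python
import timeit


def empty():
    """Pure-Python stand-in for an empty bound function."""
    pass


def bench(stmt, number=100_000, repeat=5, **names):
    """Return the best observed time per execution of `stmt`, in seconds.

    Taking the minimum over several repeats reduces noise from the OS and,
    on a JIT-ing interpreter, from warm-up in the first runs.
    """
    timer = timeit.Timer(stmt, globals=names)
    return min(timer.repeat(repeat=repeat, number=number)) / number


per_call = bench("f()", f=empty)  # call overhead plus loop overhead
baseline = bench("pass")          # loop overhead alone
overhead = per_call - baseline    # rough estimate of the cost of the call itself
```

Subtracting the `pass` baseline matters most for the empty-call case, where the loop itself is a visible fraction of the measured time.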

  14. Micro-benchmark: empty function call
     [Chart, annotated 0.2x]
     Remaining overhead is GIL release/re-acquire (~20x). pypy-c pure inlines.

  15. Micro-benchmark: “complex” function call
     [Chart, annotated 1.5x]
     Remaining overhead is GIL release/re-acquire (~3x).

  16. Micro-benchmark: overloaded function call
     [Chart, annotated 18x]
     Swig tries methods in order, cppyy hashes successful calls. FFI suffers from GIL.
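
The difference between trying overloads in order and hashing successful calls can be sketched as follows. This illustrates the general idea only and makes no claim about how cppyy or Swig implement it; `OverloadSet` and its `trials` counter are hypothetical names.

```python
class OverloadSet:
    """Toy overload dispatcher that remembers which candidate matched."""

    def __init__(self, *candidates):
        # Each candidate is a (parameter types, implementation) pair.
        self._candidates = candidates
        self._cache = {}   # tuple of argument types -> implementation
        self.trials = 0    # number of candidates examined so far

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        impl = self._cache.get(key)
        if impl is None:
            # Slow path: walk the candidates in order, then remember the hit.
            impl = self._resolve(args)
            self._cache[key] = impl
        return impl(*args)

    def _resolve(self, args):
        for types, impl in self._candidates:
            self.trials += 1
            if len(types) == len(args) and all(
                isinstance(a, t) for a, t in zip(args, types)
            ):
                return impl
        raise TypeError("no matching overload")


scale = OverloadSet(
    ((int,), lambda x: ("int", x)),
    ((float,), lambda x: ("double", x)),
    ((str,), lambda x: ("std::string", x)),
)
results = [scale("a") for _ in range(3)]
```

The `str` overload is last in the list, so the first call examines all three candidates; the next two calls hit the cache and examine none.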

  17. Micro-benchmark: data member access
     [Chart, annotated 1.7x and 4.2x]
     SWIG creates Python properties in Python, CPython/cppyy in C++.

  18. Micro-benchmark: std::vector<int>
     [Chart, annotated 15x]
     There's a frame left in FFI path; pypy-c pure uses array.array.

  19. Realistic Code
     Creates values, applies some math, makes selections, stores the results in histograms and ntuple format, and writes them to disk.

  20. Caveats
     ● Non-JITed pypy-c is ~2x slower than CPython
       – This is a code generation problem; don't expect a fix
     ● PyPy uses a true garbage collector
       – C++ destructors called “randomly”
         ● Can call gc.collect() explicitly to force calls
         ● No guarantee that destructors will be called on exit
       – No true RAII possible
     ● PyPy JIT can be fickle
       – Inner loop branches take a long time to heat up
       – Minor code changes can cause performance drops
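
One common way to cope with non-deterministic destructor calls is to release resources explicitly and treat the finalizer as a fallback. The sketch below is plain Python under stated assumptions: `Resource` is a hypothetical stand-in for a proxy object that owns a C++ instance, and nothing here is part of cppyy.

```python
import gc


class Resource:
    """Stand-in for a proxy whose C++ destructor releases something."""

    released = []  # names of released resources, in release order

    def __init__(self, name):
        self.name = name
        self._open = True

    def close(self):
        # Idempotent explicit release: the deterministic path.
        if self._open:
            self._open = False
            Resource.released.append(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions

    def __del__(self):
        # Fallback only: when this runs depends on the interpreter's collector.
        self.close()


with Resource("scoped") as r:
    pass  # released on leaving the block, on any interpreter

leaked = Resource("unscoped")
del leaked    # CPython finalizes here via reference counting
gc.collect()  # with a tracing collector, finalization waits for a collection
```

The `with` block gives scope-bound release independent of the collector, which is the closest available substitute for RAII; relying on `__del__` alone leaves the timing to the interpreter.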

  21. Distribution
     ● Two modules and a pip package for externals
       – cppyy in PyPy is a builtin module
         ● Currently on cling-support branch; on main soon
       – cppyy for CPython is an extension module
         ● In most Linux distros, MacPorts, etc. (as part of ROOT)
       – Pip package with externals (for PyPy) to be released
     ● Licenses:
       – All open source, all very permissive

  22. Conclusions
     ● We developed Cling-based Python-C++ bindings
       – Supports C++1x and beyond
       – Supports large C++ codes
       – High performance with PyPy
     ● Combined interactive C++ with Python
       – New functionality and optimizations
     ● Showed 3x improvement for realistic code
     This work was supported by the ATLAS Collaboration, Google Summer of Code, and CERN SFT.
