High-performance Python-C++ bindings with PyPy and Cling

SLIDE 1

High-performance Python-C++ bindings with PyPy and Cling

Wim Lavrijsen (LBNL) and Aditi Dutta (Nanyang Tech)

PyHPC 2016

6th Workshop on Python for High-Performance and Scientific Computing

November 14, 2016, Salt Lake City, UT, USA

Computational Research Division

SLIDE 2

Background: High Energy Physics

  • High energy physics (HEP)

– A.k.a. “particle physics”: explores matter, energy, and the fundamental forces of nature

– Often works on huge, long-running experiments in large, geographically dispersed collaborations

– The original “Big Data”

  • Software development challenges

– Range of different skill sets, preferences, and interests

– Large turnover of people over the experiment lifetime

– Run on everything, everywhere: grids, clusters, HPC systems, clouds, and @home

SLIDE 3

ATLAS Detector

SLIDE 4

Background: Python in HEP

  • Historical time line of Python usage

– 2001: first interest and implementations

– 2004: gone mainstream

– 2009: drives frameworks, job transforms, analyses

– 2013: Nobel Prize in Physics (Higgs boson)

– 2016: first-class citizen in new experiments

  • Technology

– C++ adopted in 1994, main language since ~1998

– Python bindings home-grown: piggy-backed on C++ reflection for serialization and interactivity (CINT)

– Increased Python use thanks to Machine Learning

SLIDE 5

H → ZZ → 2e2μ

SLIDE 6

Our Goals

  • Support C++11 and beyond
  • The scale and distribution to support large codes
  • High performance (with PyPy)

SLIDE 7

First Target: C++11 and beyond

  • C++ language standardization went into hyperdrive

– Then: C++98¹

– Now: C++11, C++14, C++17, C++2x, ...

  • We parse C++ headers for Reflection to

– Automate I/O and schema evolution

– Use C++ interactively from an interpreter

– Provide automatic Python-C++ bindings

  • Impossible to keep up with a small team ...

¹ With technical corrigendum in '03

CINT, a home-grown parser that originated at HP, was replaced with Cling, an interactive C++ interpreter based on Clang (LLVM) and developed by CERN. Our CPython-based Python bindings have followed suit.

SLIDE 8

Second Target: Scale and Distribution

Problem → Solution

– C++ developers, but Python users → fully automatic, interactive bindings, based on parsing C++ headers

– Huge number of classes, functions, etc. → lazy lookup/creation: pre-compiled modules, with bindings created only at run time

– Lots of libraries and dependencies → automatic loaders with search paths

– Name clashes, duplicates → follow the C++ (i.e. linker) structure to scope and uniquely identify names

– Too much “C++ feel” → reflection-based pythonizations (automatic) and regexp-based support for pythonizing common patterns

– Different Python versions (v2, v3, CPython, pypy-c, ...) → only the core bindings module (cppyy) depends on Python

We leverage Python's and Cling's dynamic natures to maximize lazy evaluation, leading to shorter startup times and lower memory use.
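As a rough illustration of lazy lookup, the sketch below defers creating a binding until a name is first accessed and then caches it, so names that are never touched cost nothing at startup. It is a minimal pure-Python sketch, not cppyy's implementation: `LazyNamespace` and `make_binding` are hypothetical, and `make_binding` merely stands in for asking Cling to look up and wrap a C++ name.

```python
from typing import Any, Callable


class LazyNamespace:
    """Hypothetical sketch: bindings are created on first attribute access."""

    def __init__(self, make_binding: Callable[[str], Any]) -> None:
        # Stand-in for "look up the C++ name and generate a binding".
        self._make_binding = make_binding

    def __getattr__(self, name: str) -> Any:
        # __getattr__ only runs when normal lookup fails, i.e. on first use.
        if name.startswith("_"):
            raise AttributeError(name)
        binding = self._make_binding(name)
        # Cache in the instance dict so later lookups bypass __getattr__.
        setattr(self, name, binding)
        return binding


created = []


def make_binding(name: str) -> str:
    created.append(name)  # record how often the expensive step runs
    return f"<binding for {name}>"


gbl = LazyNamespace(make_binding)
print(gbl.MyClass)  # <binding for MyClass>
print(gbl.MyClass)  # served from the cache, make_binding not called again
print(created)      # ['MyClass']
```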

SLIDE 9

Third Target: High Performance

  • Important for perception and decision making

– Python is not slow for those who use it

  • Partly true (heavy CPU loads run in C++), partly self-selection
  • Improve performance completely transparently

[Figure: Python performance scale (fast enough, borderline, too slow), with remedies ranging from new tools, extensions, and annotations to full rewrites; our target is marked on the scale]

Note: this turns out to be a rather small, and changing, group of Python users!

SLIDE 10

Technologies (Re-)Used

  • Goal: maximize reuse of existing projects

– Capture expertise, maintenance, future development

  • Projects so leveraged:

– Cling/ROOT (C++ interpreter, https://root.cern/cling)

– Clang/LLVM (C++ compiler, http://llvm.org)

– PyPy (Python with a JIT, http://pypy.org)

– CFFI (Python FFI to C, https://cffi.readthedocs.io)

Note: this work also builds on an earlier, Reflex-based version of cppyy. With Cling, we get more functionality, improved ease of use, better performance, and C++1x.

Code size (not counting the ~1200 unit tests!):

– CPython/cppyy: ~18K lines of C++, ~1K lines of (R)Python

– PyPy/cppyy: ~2K lines of C++, ~4K lines of (R)Python

SLIDE 11

Architecture

(Example of a function call shown.)

[Figure: architecture diagram with components Python, cppyy, CFFI, C++ libraries, C++ headers, Cling, Clang, and ORCJit (LLVM)]

Steps in the diagram: 1: module import; 2: lazy lookup; 3: lazy lookup; 4: find; 5: parse; 6: AST; 7A: generate wrapper code, or 7B: AST; 8: compile; 9: link; 10A: wrapper function ptrs, or 10B: direct function ptrs; 11: args & call; 12: LL result; 13: Py result.

Two paths: 7A-10A via wrappers, and 7B-10B via direct FFI.

SLIDE 12

Functionality

  • Both Python and Cling are interactive, allowing:

– Automatic template instantiations

– Transparent unique_ptr<>, shared_ptr<>, etc.

– std::vector<> optimizations

– Offset calculations for multiple virtual inheritance

– Cross-language derivation (both ways)

– Etc.?! More ideas to explore ...

  • C++1x is much better at expressing ownership

– Improved automatic memory management

– Also targeted semi-manually with pythonizations

  • E.g. name-based, custom smart pointers, etc.
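A regexp-based pythonization can be sketched as a registry of (pattern, callback) pairs that is consulted whenever a class proxy is created. This is a hypothetical illustration only: `add_pythonization`, `make_proxy_class`, and the `size()`/`at()` stand-in methods are assumptions chosen for the example, not a description of cppyy's actual API.

```python
import re
from typing import Callable

# Hypothetical registry of (compiled regexp, callback) pairs.
_pythonizations: list = []


def add_pythonization(pattern: str, callback: Callable[[type, str], None]) -> None:
    """Register a callback for every C++ class whose name matches `pattern`."""
    _pythonizations.append((re.compile(pattern), callback))


def make_proxy_class(cpp_name: str, methods: dict) -> type:
    """Hypothetical stand-in for creating a Python proxy of a C++ class."""
    cls = type(cpp_name, (), dict(methods))
    for regex, callback in _pythonizations:
        if regex.fullmatch(cpp_name):
            callback(cls, cpp_name)
    return cls


def pythonize_container(cls: type, cpp_name: str) -> None:
    # Map the C++ size()/at() idiom onto Python's len() and iteration.
    cls.__len__ = lambda self: self.size()
    cls.__iter__ = lambda self: (self.at(i) for i in range(self.size()))


add_pythonization(r"std::vector<.+>", pythonize_container)

# Fake "C++" methods (hypothetical) so the sketch runs without a C++ backend.
vector_methods = {
    "__init__": lambda self, data: setattr(self, "_data", list(data)),
    "size": lambda self: len(self._data),
    "at": lambda self, i: self._data[i],
}
VecInt = make_proxy_class("std::vector<int>", vector_methods)
Plain = make_proxy_class("MyClass", {})

v = VecInt([1, 2, 3])
print(len(v), list(v))            # 3 [1, 2, 3]
print(hasattr(Plain, "__len__"))  # False
```

Because the callback runs once per class when its proxy is created, name matching adds no cost to individual calls.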

SLIDE 13

Optimizations

  • PyPy JIT is conservative and optimizes Python

– JIT hints needed to “teach it C++,” e.g.:

  • Class hierarchies are fixed (and so are most offsets)
  • Side-effect free functions are elidable
  • Specialized paths (e.g. lookups, overloading, FFI)

– Need micro-benches to debug & verify performance

  • Hints are elementary, so they scale to more complex codes
  • A set of micro-benchmarks follows

– Not feature-set exhaustive (yet)

– Comparisons made with:

  • Target: optimized C++
  • CPython/cppyy: the default of most of our users
  • SWIG: well-known, widely used

SLIDE 14

Micro-benchmark: empty function call

The remaining overhead is the GIL release/re-acquire (~20x). The pure-Python version on pypy-c (“pypy-c pure”) is inlined.

[Figure: timing comparison chart; annotated ratio: 0.2x]
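A comparable micro-benchmark can be set up with the standard `timeit` module. The sketch below only times an empty pure-Python function on whatever interpreter runs it; comparing bindings would mean substituting a call into a bound C++ function (through cppyy, SWIG, or CFFI), which is assumed rather than shown here.

```python
import timeit


def empty() -> None:
    """Pure-Python stand-in for an empty C++ function."""


def time_call(func, number: int = 1_000_000, repeat: int = 5) -> float:
    """Return the best-of-`repeat` time per call, in nanoseconds."""
    # Taking the minimum over repeats reduces noise from other processes.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"empty call: {time_call(empty):.1f} ns/call")
```

On PyPy, the first repeats also include JIT warm-up, which is another reason to report the best of several runs.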

SLIDE 15

Micro-benchmark: “complex” function call

The remaining overhead is the GIL release/re-acquire (~3x).

[Figure: timing comparison chart; annotated ratio: 1.5x]

SLIDE 16

Micro-benchmark: overloaded function call

SWIG tries the overloads in order, whereas cppyy hashes successful calls. The FFI path suffers from the GIL.

[Figure: timing comparison chart; annotated ratio: 18x]
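The difference between the two overload strategies can be sketched in pure Python. `OrderedOverload` tries every candidate in order on each call, as described for SWIG, while `CachedOverload` remembers which candidate succeeded for a given tuple of argument types, in the spirit of “cppyy hashes successful calls”. Both classes are hypothetical illustrations, and the per-parameter `isinstance` check is an assumption standing in for real C++ conversion rules.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class Candidate:
    """One hypothetical overload: a callable plus the Python types it accepts."""

    def __init__(self, func: Callable, argtypes: Sequence[type]) -> None:
        self.func = func
        self.argtypes = tuple(argtypes)

    def matches(self, args: tuple) -> bool:
        return len(args) == len(self.argtypes) and all(
            isinstance(a, t) for a, t in zip(args, self.argtypes)
        )


class OrderedOverload:
    """Try every candidate in declaration order on each call."""

    def __init__(self, candidates: List[Candidate]) -> None:
        self.candidates = candidates
        self.attempts = 0  # number of match checks, to show the cost

    def __call__(self, *args):
        for cand in self.candidates:
            self.attempts += 1
            if cand.matches(args):
                return cand.func(*args)
        raise TypeError("no matching overload")


class CachedOverload(OrderedOverload):
    """Remember the winning candidate per tuple of argument types."""

    def __init__(self, candidates: List[Candidate]) -> None:
        super().__init__(candidates)
        self.cache: Dict[Tuple[type, ...], Candidate] = {}

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        cand = self.cache.get(key)
        if cand is None:  # slow path: search once, then remember the result
            for c in self.candidates:
                self.attempts += 1
                if c.matches(args):
                    cand = self.cache[key] = c
                    break
            else:
                raise TypeError("no matching overload")
        return cand.func(*args)


overloads = [
    Candidate(lambda s: f"str:{s}", [str]),
    Candidate(lambda x: f"float:{x}", [float]),
    Candidate(lambda n: f"int:{n}", [int]),
]
ordered, cached = OrderedOverload(overloads), CachedOverload(overloads)
for _ in range(100):
    ordered(42)
    cached(42)
print(ordered.attempts, cached.attempts)  # 300 3
```

Keying on the exact argument types is safe here because the same type tuple always yields the same `isinstance` outcome.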

SLIDE 17

Micro-benchmark: data member access

SWIG creates the Python properties in Python; CPython/cppyy creates them in C++.

[Figure: timing comparison chart; annotated ratios: 1.7x and 4.2x]
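Exposing a C++ data member as a Python property can be sketched with a descriptor that reads and writes a typed value at a fixed byte offset. In this hypothetical sketch a `bytearray` stands in for the C++ object's memory, and the `struct` format codes and offsets are an assumed layout; in a real binding the offsets would come from reflection information, with the descriptor implemented in Python (SWIG) or C++ (CPython/cppyy).

```python
import struct


class DataMember:
    """Descriptor mapping an attribute to a typed field at a fixed offset."""

    def __init__(self, fmt: str, offset: int) -> None:
        self.fmt = "=" + fmt  # native byte order, standard sizes, no padding
        self.offset = offset

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return struct.unpack_from(self.fmt, obj._buf, self.offset)[0]

    def __set__(self, obj, value) -> None:
        struct.pack_into(self.fmt, obj._buf, self.offset, value)


class Point:
    """Hypothetical proxy for `struct Point { int id; double x; };`."""

    id = DataMember("i", 0)
    x = DataMember("d", 8)  # assumed layout: 4 bytes of padding after `id`

    def __init__(self) -> None:
        self._buf = bytearray(16)  # stand-in for the C++ object's storage


p = Point()
p.id, p.x = 7, 2.5
print(p.id, p.x, len(p._buf))  # 7 2.5 16
```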

SLIDE 18

Micro-benchmark: std::vector<int>

[Figure: timing comparison chart; annotated ratio: 15x]

A frame remains in the FFI path; pypy-c pure uses array.array.
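The pure-Python variant of this benchmark uses `array.array`, which stores C ints contiguously, much like `std::vector<int>`. Below is a minimal fill-and-sum loop of that kind; the loop body and size are assumptions, since the benchmark's exact code is not given here.

```python
from array import array


def fill_and_sum(n: int) -> int:
    """Fill a typed int array (like std::vector<int>) and sum it in a loop."""
    vec = array("i")  # 'i' = C signed int, stored contiguously
    for i in range(n):
        vec.append(i)  # analogous to push_back
    total = 0
    for i in range(len(vec)):  # indexed access, analogous to operator[]
        total += vec[i]
    return total


print(fill_and_sum(1000))  # 499500
```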

SLIDE 19

Realistic Code

Creates values, applies some math, makes selections, stores the results in histograms and ntuple format, and writes them to disk.

SLIDE 20

Caveats

  • Non-JITed pypy-c is ~2x slower than CPython

– This is a code generation problem; don't expect a fix

  • PyPy uses a true garbage collector

– C++ destructors are called “randomly”

  • Can call gc.collect() explicitly to force calls
  • No guarantee that destructors will be called on exit

– No true RAII possible

  • PyPy JIT can be fickle

– Inner loop branches take a long time to heat up

– Minor code changes can cause performance drops
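Because a true garbage collector may run C++ destructors late, or not at all on exit, deterministic cleanup has to be made explicit. One common Python pattern, sketched here, is a context manager that runs a destruction hook on scope exit regardless of when the collector reaches the object. `CppObjectProxy` and its `destruct()` method are hypothetical; in a real binding the hook would invoke the C++ destructor.

```python
import gc
from contextlib import contextmanager

destroyed = []


class CppObjectProxy:
    """Hypothetical proxy that owns a C++ object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._alive = True

    def destruct(self) -> None:
        # Stand-in for calling the C++ destructor; safe to call twice.
        if self._alive:
            self._alive = False
            destroyed.append(self.name)

    def __del__(self) -> None:
        # Finalizer: runs when the GC collects the proxy (timing varies by VM).
        self.destruct()


@contextmanager
def scoped(obj: CppObjectProxy):
    """RAII-like scope: destruct on exit, even if an exception is raised."""
    try:
        yield obj
    finally:
        obj.destruct()


with scoped(CppObjectProxy("histogram")) as h:
    pass
print(destroyed)  # ['histogram'] on any Python VM

tmp = CppObjectProxy("ntuple")
del tmp
gc.collect()  # on PyPy this should run pending finalizers; CPython already did
print(destroyed)  # ['histogram', 'ntuple'] on CPython
```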

SLIDE 21

Distribution

  • Two modules and a pip package for externals

– cppyy in PyPy is builtin

  • Currently on the cling-support branch; on main soon

– cppyy for CPython is an extension module

  • In most Linux distros, MacPorts, etc. (as part of ROOT)

– Pip package with externals (for PyPy) to be released

  • Licenses:

– All open source, all very permissive

SLIDE 22

Conclusions

  • We developed Cling-based Python-C++ bindings

– Supports C++1x and beyond

– Supports large C++ codes

– High performance with PyPy

  • Combined interactive C++ with Python

– New functionality and optimizations

  • Showed a 3x improvement for realistic code

This work was supported by the ATLAS Collaboration, Google Summer of Code, and CERN SFT.