O PP' O PP' - - PowerPoint PPT Presentation



SLIDE 1

O PP' O PP'

A JIT A JIT

Ronan Lamy

SLIDE 2

A A

  • PyPy core dev
  • Python consultant and freelance developer
  • Contact: Ronan.Lamy@gmail.com, @ronanlamy

SLIDE 3

Guido van Rossum, PyCon 2015

"If you want your code magically to run faster, you should probably just use PyPy"

SLIDE 4

PyPy

  • Fast and compliant implementation of Python
  • Full support for 2.7
  • Beta support for 3.6 (full release soonish)
  • Fast! (1x to 100x faster than CPython)
  • cffi: fast and convenient interface to C code

SLIDE 5

C C

  • numpy, scipy, pandas, scikit-learn, lxml, ...
  • Cython + most extensions written in Cython
  • 'pip install' works
  • Wheels available at https://github.com/antocuni/pypy-wheels

SLIDE 6

CPython

CPython source (C) → C compiler → python (executable)
Python code → bytecode → bytecode interpreter → do stuff or whatever

SLIDE 7


SLIDE 8

PyPy

PyPy source (RPython) → RPython toolchain → pypy (executable)
Python code → bytecode → bytecode interpreter → tracing → machine code → do stuff or whatever

SLIDE 9


SLIDE 10

RPython

  • RPython code
  • import → Python objects (functions, classes, ...)
  • Bytecode analysis, type inference → typed control flow graphs
  • Add GC and JIT
  • Generate C code
  • gcc → compiled executable
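The "type inference → typed control flow graphs" step can be pictured as propagating types forward through a graph. The sketch below is a toy, assuming a hypothetical straight-line graph format and a made-up signature table; RPython's real annotator works on full flow graphs with branches and loops and generalises types until it reaches a fixpoint, which this single pass does not attempt.

```python
# Hypothetical signature table: result type for each (op, argument types).
SIGNATURES = {
    ("add", int, int): int,
    ("add", float, float): float,
    ("lt", int, int): bool,
    ("lt", float, float): bool,
}


def annotate(graph, arg_types):
    """Infer a type for every variable of a straight-line SSA graph.

    graph: list of (result, opname, (arg, ...)) tuples.
    arg_types: dict mapping the graph's input variables to Python types.
    """
    types = dict(arg_types)
    for result, opname, args in graph:
        key = (opname,) + tuple(types[a] for a in args)
        if key not in SIGNATURES:
            # No known typing rule: the toy refuses the program
            raise TypeError(f"cannot annotate {opname}{key[1:]}")
        types[result] = SIGNATURES[key]
    return types
```

With `graph = [("c", "add", ("a", "b")), ("d", "lt", ("c", "a"))]`, integer inputs give `c: int, d: bool`, while mixing `int` and `float` is rejected, a crude stand-in for RPython refusing code it cannot type consistently.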

SLIDE 11
SLIDE 12
SLIDE 13

E E

class Quantity:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        if isinstance(other, Quantity):
            if other.unit != self.unit:
                raise ValueError("units must match")
            else:
                return Quantity(self.value + other.value, self.unit)
        else:
            return NotImplemented

    def __str__(self):
        return f"{self.value} {self.unit}"

def compute(n):
    total = Quantity(0, 'm')
    increment = Quantity(1., 'm')
    for i in range(n):
        total += increment  # executed by the INPLACE_ADD bytecode
    return total
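An augmented assignment such as `total += increment` compiles to a dedicated bytecode, which is why the interpreter code for `INPLACE_ADD` matters here. A quick way to see this on CPython is the standard `dis` module. The helper `inplace_add_instructions` is made up for this sketch; it accounts for CPython up to 3.10 using `INPLACE_ADD`, while 3.11 and later use `BINARY_OP` with a `+=` argument.

```python
import dis


def accumulate(total, increment):
    total += increment  # augmented assignment, as in an accumulating loop
    return total


def inplace_add_instructions(func):
    """Return the names of the bytecode instructions implementing `+=`."""
    found = []
    for instr in dis.get_instructions(func):
        # CPython <= 3.10: INPLACE_ADD; CPython >= 3.11: BINARY_OP (+=)
        if instr.opname == "INPLACE_ADD" or (
            instr.opname == "BINARY_OP" and instr.argrepr == "+="
        ):
            found.append(instr.opname)
    return found


dis.dis(accumulate)  # print the full disassembly for inspection
```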

SLIDE 14

D D

SLIDE 15

INPLACE_ADD () INPLACE_ADD ()

def INPLACE_ADD(self, *ignored):
    w_2 = self.popvalue()
    w_1 = self.popvalue()
    w_result = self.space.inplace_add(w_1, w_2)
    self.pushvalue(w_result)

def inplace_add(space, w_lhs, w_rhs):
    w_impl = space.lookup(w_lhs, '__iadd__')
    if w_impl is not None:
        # cpython bug-to-bug compatibility:
        if (space.type(w_lhs).flag_sequence_bug_compat and
                not space.type(w_rhs).flag_sequence_bug_compat):
            w_res = _invoke_binop(space, space.lookup(w_rhs, '__radd__'),
                                  w_rhs, w_lhs)
            if w_res is not None:
                return w_res
        w_res = space.get_and_call_function(w_impl, w_lhs, w_rhs)
        if _check_notimplemented(space, w_res):
            return w_res
    return space.add(w_lhs, w_rhs)
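The dispatch logic of `inplace_add` above can be restated in plain Python: try `__iadd__` on the type of the left operand, and fall back to ordinary addition if it is missing or returns `NotImplemented`. This is a simplified sketch, not PyPy's code: the sequence bug-compatibility special case is left out, and the standard `operator.add` stands in for `space.add`.

```python
import operator


def inplace_add(lhs, rhs):
    """Pure-Python sketch of how `lhs += rhs` picks an implementation."""
    # Special methods are looked up on the type, not on the instance
    impl = getattr(type(lhs), "__iadd__", None)
    if impl is not None:
        result = impl(lhs, rhs)
        if result is not NotImplemented:
            return result
    # Fallback: binary addition, which tries lhs.__add__ then rhs.__radd__
    return operator.add(lhs, rhs)
```

A list has `__iadd__` and is extended in place, whereas `int` (and the `Quantity` class above) only defines `__add__`, so `+=` builds a new object on every iteration.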

SLIDE 16

T JIT T JIT

Pareto principle

80% of the time is spent in 20% of the code

Most branches are very imbalanced

SLIDE 17

T JIT T JIT

Pareto principle

80% of the time is spent in 20% of the code

Most branches are very imbalanced
  • Compile only hot loops
  • Optimise for the fast path
  • Take advantage of run-time information

SLIDE 18

T JIT T JIT

Pareto principle

80% of the time is spent in 20% of the code

Most branches are very imbalanced
  • Compile only hot loops
  • Optimise for the fast path
  • Take advantage of run-time information
  • Trace = record one iteration of the loop
  • Optimise trace and add guards
  • Trace the interpreter, not user code
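A heavily simplified sketch of the tracing idea, using a hypothetical toy stack bytecode: when a backward jump gets hot, one iteration is recorded as a linear list, the conditional branch becomes a guard, and the trace is replayed until a guard fails, at which point the interpreter resumes. Big simplifications, all assumptions of this sketch: it records the toy program's own instructions (PyPy instead traces the interpreter), it does no optimisation or machine-code generation, and the hotness threshold of 2 is far lower than a real JIT would use.

```python
def execute(op, arg, stack, env):
    """Run one straight-line (non-branching) operation of the toy bytecode."""
    if op == "LOAD_CONST":
        stack.append(arg)
    elif op == "LOAD":
        stack.append(env[arg])
    elif op == "STORE":
        env[arg] = stack.pop()
    elif op == "ADD":
        rhs = stack.pop()
        stack.append(stack.pop() + rhs)
    elif op == "LT":
        rhs = stack.pop()
        stack.append(stack.pop() < rhs)
    else:
        raise ValueError(f"unknown opcode {op!r}")


def step(code, pc, stack, env):
    """Interpret the instruction at pc and return the next pc."""
    op, arg = code[pc]
    if op == "JUMP":
        return arg
    if op == "JUMP_IF_FALSE":
        return pc + 1 if stack.pop() else arg
    execute(op, arg, stack, env)
    return pc + 1


def record_trace(code, loop_start, stack, env):
    """Execute one loop iteration while recording it as a linear trace.

    Returns (trace, next_pc); trace is None if the loop exited meanwhile.
    """
    trace, pc = [], loop_start
    while True:
        op, arg = code[pc]
        if op == "JUMP" and arg == loop_start:
            return trace, loop_start  # loop closed: the trace is complete
        if op == "JUMP_IF_FALSE":
            if not stack[-1]:
                return None, step(code, pc, stack, env)  # loop left: give up
            trace.append(("GUARD_TRUE", arg))  # the branch becomes a guard
        elif op != "JUMP":
            trace.append((op, arg))  # unconditional jumps just disappear
        pc = step(code, pc, stack, env)


def run_trace(trace, stack, env):
    """Replay the trace until a guard fails; return the pc to resume at."""
    while True:
        for op, arg in trace:
            if op == "GUARD_TRUE":
                if not stack.pop():
                    return arg  # guard failure: back to the interpreter
            else:
                execute(op, arg, stack, env)


def interpret(code, env, threshold=2):
    """Interpret code, tracing loops whose backward jump gets hot."""
    stack, pc = [], 0
    counters, traces = {}, {}
    while pc < len(code):
        if pc in traces:
            pc = run_trace(traces[pc], stack, env)
            continue
        op, arg = code[pc]
        if op == "JUMP" and arg < pc:  # backward jump closes a loop iteration
            counters[arg] = counters.get(arg, 0) + 1
            if counters[arg] >= threshold:
                trace, pc = record_trace(code, arg, stack, env)
                if trace is not None:
                    traces[arg] = trace
                continue
        pc = step(code, pc, stack, env)
    return env, traces


# total = 0; i = 0; while i < n: total = total + i; i = i + 1
PROGRAM = [
    ("LOAD_CONST", 0), ("STORE", "total"),
    ("LOAD_CONST", 0), ("STORE", "i"),
    ("LOAD", "i"), ("LOAD", "n"), ("LT", None), ("JUMP_IF_FALSE", 17),
    ("LOAD", "total"), ("LOAD", "i"), ("ADD", None), ("STORE", "total"),
    ("LOAD", "i"), ("LOAD_CONST", 1), ("ADD", None), ("STORE", "i"),
    ("JUMP", 4),
]
```

Running `interpret(PROGRAM, {"n": 5})` interprets the first two iterations, records the third, replays the trace for the remaining ones and exits through the failing guard, leaving `total == 10` and a single trace keyed by the loop header at index 4.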

SLIDE 19

J J

  • RPython code contains JIT hints: JIT drivers, @dont_look_inside, @elidable, quasi-immutables
  • The toolchain creates flowgraphs
  • Flowgraphs are serialised to a JIT-friendly IR: jitcode
  • jitcodes are stored in the binary
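The shape of an interpreter annotated with a JIT driver can be sketched as below, for a hypothetical three-opcode toy language. `JitDriver`, `jit_merge_point`, `can_enter_jit` and `@elidable` are named after the hints in `rpython.rlib.jit`, but RPython is a Python 2 toolchain, so on CPython this sketch falls back to do-nothing stand-ins. Real RPython code would also have to satisfy the translator's type restrictions.

```python
try:
    # The real hints (only importable where RPython is available).
    from rpython.rlib.jit import JitDriver, elidable
except ImportError:
    # Do-nothing stand-ins so the sketch runs on plain CPython.
    class JitDriver:
        def __init__(self, greens, reds):
            self.greens, self.reds = greens, reds

        def jit_merge_point(self, **live_vars):
            pass

        def can_enter_jit(self, **live_vars):
            pass

    def elidable(func):
        return func


# greens: position in the user program (constant within one trace)
# reds: the other variables live across iterations
driver = JitDriver(greens=["pc", "program"], reds=["acc", "limit"])


@elidable
def decode(program, pc):
    # Pure function of green values: with constant arguments the JIT
    # can remove the call from the trace.
    return program[pc]


def run(program, acc, limit):
    """Toy language: '+' adds 1, '*' doubles, '<' loops while acc < limit."""
    pc = 0
    while pc < len(program):
        driver.jit_merge_point(pc=pc, program=program, acc=acc, limit=limit)
        op = decode(program, pc)
        pc += 1
        if op == "+":
            acc += 1
        elif op == "*":
            acc *= 2
        elif op == "<" and acc < limit:
            pc = 0  # backward jump: a user-level loop starts here
            driver.can_enter_jit(pc=pc, program=program, acc=acc, limit=limit)
    return acc
```

The green variables identify "where we are" in the user program, which is what lets the meta-interpreter decide when a user-level loop has been closed.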

SLIDE 20

T T

  • The Python interpreter runs on top of a tracing interpreter: the meta-interpreter
  • The meta-interpreter executes jitcodes and records operations in SSA form
  • Inlines function calls, flattens loops, ...
  • Program values are labeled as constants or variables
  • Tracing ends when the loop is closed
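To make "SSA form", "constants or variables" and "inlines function calls" concrete, here is a toy recorder. It records operations on a run-time value through operator overloading, which is a different mechanism from PyPy's meta-tracing of jitcodes; only the shape of the result (a flat list where each variable is assigned once, helper calls inlined, constant computations never recorded) is meant to carry over. The `Tracer` and `Var` classes are made up for this sketch, and op names like `int_add` merely resemble those in PyPy trace logs.

```python
class Tracer:
    """Records operations on traced values as a linear SSA trace."""

    def __init__(self):
        self.ops = []
        self._count = 0

    def variable(self):
        """Create a fresh SSA variable (each name is assigned exactly once)."""
        var = Var(self, f"i{self._count}")
        self._count += 1
        return var

    def emit(self, opname, *args):
        result = self.variable()
        self.ops.append(f"{result} = {opname}({', '.join(map(str, args))})")
        return result


class Var:
    """A value only known at run time: operations on it get recorded."""

    def __init__(self, tracer, name):
        self.tracer, self.name = tracer, name

    def __str__(self):
        return self.name

    def __add__(self, other):
        return self.tracer.emit("int_add", self, other)

    def __radd__(self, other):
        return self.tracer.emit("int_add", other, self)

    def __mul__(self, other):
        return self.tracer.emit("int_mul", self, other)

    def __rmul__(self, other):
        return self.tracer.emit("int_mul", other, self)


def scale(x, factor):
    return x * factor  # a function call: ends up inlined in the trace


def body(i):
    k = 2 * 3  # only constants involved: computed now, never recorded
    return scale(i, k) + 1
```

Calling `body` on `Tracer().variable()` records just two operations, `i1 = int_mul(i0, 6)` and `i2 = int_add(i1, 1)`: no trace of the call to `scale`, and `2 * 3` already folded to `6`.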

SLIDE 21

Guards

  • Guards ~ JIT-level assertions
  • On failure, must resume normal execution
  • Checked at runtime
  • Examples: conditional branches, overflow, exceptions, ...
  • If a guard fails often, compile a "bridge"

SLIDE 22

Guards

  • Guards ~ JIT-level assertions
  • On failure, must resume normal execution
  • Checked at runtime
  • Examples: conditional branches, overflow, exceptions, ...
  • If a guard fails often, compile a "bridge"
  • Out-of-line guards: invalidate the whole trace
  • Zero run-time cost!
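The bookkeeping behind an out-of-line guard can be sketched in the spirit of the quasi-immutable hint: instead of checking a rarely-changing value on every iteration, the compiled loop registers itself as a dependent, and the (rare) write invalidates every dependent loop. `QuasiImmutable` and `CompiledLoop` are hypothetical classes for illustration; a real JIT invalidates machine code rather than flipping a flag.

```python
class CompiledLoop:
    """Stands in for a compiled trace specialised on some value."""

    def __init__(self, name):
        self.name = name
        self.valid = True

    def invalidate(self):
        # A real JIT would patch the machine code so that running it
        # falls back to the interpreter; here we only flip a flag.
        self.valid = False


class QuasiImmutable:
    """A field assumed to change rarely (hypothetical helper)."""

    def __init__(self, value):
        self._value = value
        self._dependents = []

    def read_while_tracing(self, loop):
        # The trace treats the current value as a constant and emits no
        # check; the loop is recorded as depending on it instead.
        self._dependents.append(loop)
        return self._value

    def set(self, value):
        self._value = value
        # The write pays the whole cost, so the loops never check anything.
        for loop in self._dependents:
            loop.invalidate()
        self._dependents.clear()
```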

SLIDE 23
  • Statistical profiler for CPython and PyPy

  • Visualise JIT traces
  • vmprof client records profile and JIT information
  • Server renders logs

SLIDE 24

D D

SLIDE 25

Optimisations

  • Strength reduction
  • intbounds
  • Constant-folding
  • Strings
  • Remove extra guards
  • Virtuals and virtualisables
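Two of these passes, constant-folding and removal of redundant guards, can be sketched on a tiny SSA trace. The trace format, the `Const` wrapper and the op names (`int_add`, `guard_class`) are assumptions that only loosely imitate PyPy trace logs; PyPy's optimiser is far more general than this single forward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: object


# Pure operations that can be evaluated at optimisation time
FOLDABLE = {
    "int_add": lambda a, b: a + b,
    "int_mul": lambda a, b: a * b,
}


def optimise(trace):
    """Constant-fold pure ops and drop guards already checked earlier.

    trace: list of (result, opname, args); each arg is a variable name (str)
    or a Const. Returns a new, shorter trace.
    """
    known = {}       # variable name -> Const, once its value is known
    checked = set()  # guards already emitted
    out = []
    for result, opname, args in trace:
        # Substitute constants for variables whose value is known
        args = tuple(known.get(a, a) if isinstance(a, str) else a for a in args)
        if opname in FOLDABLE and all(isinstance(a, Const) for a in args):
            known[result] = Const(FOLDABLE[opname](*(a.value for a in args)))
            continue  # folded: no operation left in the trace
        if opname.startswith("guard_"):
            if (opname, args) in checked:
                continue  # same condition already guarded (SSA: cannot change)
            checked.add((opname, args))
        out.append((result, opname, args))
    return out
```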

SLIDE 26

U U

  • Compute invariants
  • First iteration: preamble
  • Second iteration: tight loop
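The end result of the preamble/loop split can be sketched as follows: the preamble is a full copy of the first iteration, and the loop keeps only the operations that depend on values changing between iterations, since invariant results are already available from the preamble. The op format and the `unroll` helper are hypothetical, and the sketch assumes every operation is pure; PyPy gets there by optimising the second iteration with what it learned from the first, which this code does not model.

```python
def unroll(body, loop_vars):
    """Split a loop into a preamble (first iteration) and a tight loop.

    body: list of (result, opname, args) for one iteration, args being names.
    loop_vars: names whose values change from one iteration to the next.
    """
    variant = set(loop_vars)
    loop = []
    for result, opname, args in body:
        if any(a in variant for a in args):
            variant.add(result)  # depends on a changing value
            loop.append((result, opname, args))
        # otherwise loop-invariant: computed once, in the preamble
    return list(body), loop
```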

SLIDE 27

Backends

  • x86, x86_64, PowerPC, S390x, ARMv7, ARM64 (in progress)
  • GC has to be informed of dynamic allocations
  • Linear register allocator
  • Hand-written assembly for each operation

def genop_float_add(self, op, arglocs, result_loc):
    # Emit SSE2 ADDSD: add the double in arglocs[1] into arglocs[0]
    self.mc.ADDSD(arglocs[0], arglocs[1])
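The slide only says "linear register allocator". To illustrate the general idea, here is the textbook linear-scan algorithm (the Poletto and Sarkar variant, which spills the live interval that ends last). It is not PyPy's allocator; the interval format and register names are placeholders chosen for this sketch.

```python
def linear_scan(intervals, registers):
    """Assign registers to live intervals, spilling when they run out.

    intervals: dict mapping variable -> (first op index, last op index).
    registers: list of available register names.
    Returns dict mapping variable -> register name or "spill".
    """
    free = list(registers)
    active = []  # (end, variable) for intervals currently in a register
    location = {}
    for var, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release the registers of intervals that ended before this start
        for item in sorted(active):
            if item[0] >= start:
                break
            active.remove(item)
            free.append(location[item[1]])
        if free:
            location[var] = free.pop(0)
            active.append((end, var))
            continue
        # No register left: spill whichever interval ends last
        active.sort()
        last_end, last_var = active[-1]
        if last_end > end:
            location[var] = location[last_var]  # steal its register
            location[last_var] = "spill"
            active[-1] = (end, var)
        else:
            location[var] = "spill"
    return location
```

A single linear pass over intervals sorted by start point is what makes this cheap enough for a JIT, at the cost of less optimal allocation than graph colouring.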

SLIDE 28

D D

SLIDE 29

S S

  • Be wary of microbenchmarks!
  • RPython toolchain has a generic JIT framework
  • PyPy interpreter exploits JIT hints
  • Abstractions for free!

SLIDE 30

C C

  • IRC: #pypy on Freenode
  • http://pypy.org
  • pypy-dev@python.org
  • PyPy help desk Friday morning
  • Sprint Saturday and Sunday

Questions?

SLIDE 31

T T