Python in High-Performance Computing, Jussi Enkovaara and Martti Louhivuori (PDF document)


SLIDE 1


Python in High-Performance Computing Jussi Enkovaara Martti Louhivuori

CSC – IT Center for Science Ltd, Finland January 29-31, 2018

SLIDE 2

All material (C) 2018 by the authors. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, http://creativecommons.org/licenses/by-nc-sa/3.0/

SLIDE 3

Agenda

Monday
– 9:00-9:15 Python and HPC
– 9:15-10:00 NumPy – fast array interface to Python
– 10:00-10:30 Exercises
– 10:30-10:45 Coffee break
– 10:45-11:00 NumPy tools
– 11:00-12:00 Exercises
– 12:00-13:00 Lunch break
– 13:00-13:45 Advanced indexing, vectorized operations & broadcasting, numexpr
– 13:45-14:30 Exercises
– 14:30-14:45 Coffee break
– 14:45-15:30 Performance analysis
– 15:30-16:30 Exercises

Tuesday
– 9:00-9:45 Optimising Python with Cython
– 9:45-10:30 Cython cont.
– 10:30-10:45 Coffee break
– 10:45-12:00 Exercises
– 12:00-13:00 Lunch break
– 13:00-13:45 Interfacing external libraries
– 13:45-14:30 Exercises
– 14:30-14:45 Coffee break
– 14:45-15:30 Multiprocessing
– 15:30-16:30 Exercises

Wednesday
– 9:00-9:45 MPI introduction
– 9:45-10:30 Point-to-point communication
– 10:30-10:45 Coffee break
– 10:45-12:00 Exercises
– 12:00-13:00 Lunch break
– 13:00-13:45 Non-blocking communication and communicators
– 13:45-14:30 Exercises
– 14:30-14:45 Coffee break
– 14:45-15:30 Collective communications
– 15:30-16:30 Exercises
– 16:30-16:45 Summary of Python HPC strategies

SLIDE 4

SLIDE 5

PYTHON AND HIGH-PERFORMANCE COMPUTING

SLIDE 6

SLIDE 7

Efficiency

Python is an interpreted language

– no pre-compiled binaries: all code is translated on the fly into machine instructions
– bytecode is used as an intermediate step and may be cached on disk (.pyc)

All objects are dynamic in Python

– nothing is fixed == an optimisation nightmare
– a lot of overhead from metadata

Flexibility is good, but comes with a cost!
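As a rough illustration (not from the slides; function names and repeat counts are made up here), even a trivial summation loop pays for dynamic typing on every iteration, which is why the built-in, C-implemented sum() is much faster:

```python
import timeit

def manual_sum(values):
    """Plain Python loop: each += re-checks the type of v."""
    total = 0
    for v in values:
        total += v
    return total

values = list(range(10_000))
# time 10 repetitions of each variant
t_loop = timeit.timeit(lambda: manual_sum(values), number=10)
t_builtin = timeit.timeit(lambda: sum(values), number=10)
print(f"loop: {t_loop:.4f} s  built-in sum: {t_builtin:.4f} s")
```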

Improving Python performance

Array-based computations with NumPy
Using the extended Cython programming language
Embedding compiled code in a Python program

– C, Fortran

Utilize parallel processing

Parallelisation strategies for Python

Global Interpreter Lock (GIL)

– CPython's memory management is not thread-safe
– only one thread executes Python bytecode at a time; threads help mainly for I/O-bound tasks
– threading can even hurt the overall performance of CPU-bound code

Process-based “threading” with multiprocessing

– forks independent processes that have only limited ways to communicate

Message-passing is the Way to Go to achieve true parallelism in Python


SLIDE 8

SLIDE 9

NUMPY BASICS

SLIDE 10

SLIDE 11

Numpy – fast array interface

Standard Python is not well suited for numerical computations

– lists are very flexible but also slow to process in numerical computations

Numpy adds a new array data type

– static, multidimensional – fast processing of arrays – some linear algebra, random numbers

Numpy arrays

All elements of an array have the same type
Array can have multiple dimensions
The number of elements in the array is fixed, but the shape can be changed

Python list vs. NumPy array

[Figure: memory layout of a Python list (array of pointers to boxed objects) vs. a NumPy array (contiguous block of raw values)]

Creating numpy arrays

From a list:

>>> import numpy as np
>>> a = np.array((1, 2, 3, 4), float)
>>> a
array([ 1., 2., 3., 4.])
>>>
>>> list1 = [[1, 2, 3], [4, 5, 6]]
>>> mat = np.array(list1, complex)
>>> mat
array([[ 1.+0.j, 2.+0.j, 3.+0.j],
       [ 4.+0.j, 5.+0.j, 6.+0.j]])
>>> mat.shape
(2, 3)
>>> mat.size
6

Creating numpy arrays

More ways for creating arrays:

>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> b = np.linspace(-4.5, 4.5, 5)
>>> b
array([-4.5 , -2.25, 0.  , 2.25, 4.5 ])
>>>
>>> c = np.zeros((4, 6), float)
>>> c.shape
(4, 6)
>>>
>>> d = np.ones((2, 4))
>>> d
array([[ 1., 1., 1., 1.],
       [ 1., 1., 1., 1.]])

Indexing and slicing arrays

Simple indexing and slicing:

>>> mat = np.array([[1, 2, 3], [4, 5, 6]])
>>> mat[0,2]
3
>>> mat[1,-2]
5
>>> a = np.arange(5)
>>> a[2:]
array([2, 3, 4])
>>> a[:-1]
array([0, 1, 2, 3])
>>> a[1:3] = -1
>>> a
array([ 0, -1, -1,  3,  4])

Indexing and slicing arrays

Slicing is possible over all dimensions:

>>> a = np.arange(10)
>>> a[1:7:2]
array([1, 3, 5])
>>>
>>> a = np.zeros((4, 4))
>>> a[1:3, 1:3] = 2.0
>>> a
array([[ 0., 0., 0., 0.],
       [ 0., 2., 2., 0.],
       [ 0., 2., 2., 0.],
       [ 0., 0., 0., 0.]])

Views and copies of arrays

Simple assignment creates references to arrays
Slicing creates "views" to the arrays
Use copy() for real copying of arrays

a = np.arange(10)
b = a              # reference, changing values in b changes a
b = a.copy()       # true copy

c = a[1:4]         # view, changing c changes elements [1:4] of a
c = a[1:4].copy()  # true copy of subarray

example.py

SLIDE 12

Array manipulation

reshape: change the shape of the array
ravel: flatten the array to 1-d

>>> mat = np.array([[1, 2, 3], [4, 5, 6]])
>>> mat
array([[1, 2, 3],
       [4, 5, 6]])
>>> mat.reshape(3,2)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> mat.ravel()
array([1, 2, 3, 4, 5, 6])

Array manipulation

concatenate: join arrays together
split: split an array into N pieces

>>> mat1 = np.array([[1, 2, 3], [4, 5, 6]])
>>> mat2 = np.array([[7, 8, 9], [10, 11, 12]])
>>> np.concatenate((mat1, mat2))
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> np.concatenate((mat1, mat2), axis=1)
array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
>>> np.split(mat1, 3, axis=1)
[array([[1],
       [4]]), array([[2],
       [5]]), array([[3],
       [6]])]

Array operations

Most operations on numpy arrays are done element-wise

– +, -, *, /, **

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
>>> a + b
array([ 3., 4., 5.])
>>> a * a
array([ 1., 4., 9.])

Array operations

Numpy has special functions which can work with array arguments

– sin, cos, exp, sqrt, log, ...

>>> import numpy, math
>>> a = numpy.linspace(-math.pi, math.pi, 8)
>>> a
array([-3.14159265, -2.24399475, -1.34639685, -0.44879895,
        0.44879895,  1.34639685,  2.24399475,  3.14159265])
>>> numpy.sin(a)
array([ -1.22464680e-16, -7.81831482e-01, -9.74927912e-01,
        -4.33883739e-01,  4.33883739e-01,  9.74927912e-01,
         7.81831482e-01,  1.22464680e-16])
>>>
>>> math.sin(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: only length-1 arrays can be converted to Python scalars

NUMPY TOOLS

I/O with Numpy

Numpy provides functions for reading data from files and for writing data to files
Simple text files

– numpy.loadtxt – numpy.savetxt – Data in regular column layout – Can deal with comments and different column delimiters
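A minimal round-trip sketch of the two functions above (the file name and delimiter choice are illustrative, not from the slides):

```python
import numpy as np
import os, tempfile

# small 2D dataset in regular column layout
data = np.array([[1.0, 2.0],
                 [3.0, 4.0]])

fname = os.path.join(tempfile.mkdtemp(), "data.txt")
# header lines are written as comments (prefixed with '#')
np.savetxt(fname, data, delimiter=",", header="x, y")

# loadtxt skips comment lines and splits on the given delimiter
loaded = np.loadtxt(fname, delimiter=",")
print(loaded)
```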

Random numbers

The module numpy.random provides several functions for constructing random arrays

– random: uniform random numbers – normal: normal distribution – poisson: Poisson distribution – ...

>>> import numpy.random as rnd
>>> rnd.random((2,2))
array([[ 0.02909142, 0.90848   ],
       [ 0.9471314 , 0.31424393]])
>>> rnd.poisson(size=(2,2))

Polynomials

Polynomial is defined by an array of coefficients p:
p(x) = p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-1]
Least squares fitting: numpy.polyfit
Evaluating polynomials: numpy.polyval
Roots of a polynomial: numpy.roots
...

>>> x = np.linspace(-4, 4, 7)
>>> y = x**2 + rnd.random(x.shape)
>>>
>>> p = np.polyfit(x, y, 2)
>>> p
array([ 0.96869003, -0.01157275, 0.69352514])

SLIDE 13

Linear algebra

Numpy can calculate matrix and vector products efficiently: dot, vdot, ...
Eigenproblems: linalg.eig, linalg.eigvals, …
Linear systems and matrix inversion: linalg.solve, linalg.inv

>>> A = np.array(((2, 1), (1, 3)))
>>> B = np.array(((-2, 4.2), (4.2, 6)))
>>> C = np.dot(A, B)
>>>
>>> b = np.array((1, 2))
>>> np.linalg.solve(C, b)  # solve C x = b
array([ 0.04453441, 0.06882591])

Linear algebra

Normally, NumPy utilises high-performance libraries in linear algebra operations
Example: matrix multiplication C = A * B, matrix dimension 200

– pure python: 5.30 s – naive C: 0.09 s – numpy.dot: 0.01 s

NUMPY ADVANCED TOPICS

Anatomy of NumPy array

ndarray type is made of

– one-dimensional contiguous block of memory (raw data)
– indexing scheme: how to locate an element
– data type descriptor: how to interpret an element

NumPy indexing

There are many possible ways of arranging the items of an N-dimensional array in a 1-dimensional block
NumPy uses striding, where the N-dimensional index (n0, n1, ..., nN-1) corresponds to an offset from the beginning of the 1-dimensional block:

offset = sum_{k=0}^{N-1} s_k * n_k,    where s_k is the stride in dimension k

ndarray attributes

a = np.array(…)
a.flags                  various information about memory layout
a.strides                bytes to step in each dimension when traversing
a.itemsize               size of one array element in bytes
a.data                   Python buffer object pointing to the start of the array's data
a.__array_interface__    Python internal interface
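The attributes above can be inspected directly; this small sketch shows the strides of a C-ordered float64 array (shape and dtype chosen for illustration):

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)  # C-ordered by default
# each float64 element occupies 8 bytes
print(a.itemsize)   # 8
# stepping one row skips 4 elements (32 bytes); one column skips 8 bytes
print(a.strides)    # (32, 8)
# transposing only swaps the strides, without copying any data
print(a.T.strides)  # (8, 32)
```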

Advanced indexing

Numpy arrays can also be indexed with other arrays (integer or boolean)
Boolean "mask" arrays
Advanced indexing creates copies of arrays

>>> x = np.arange(10, 1, -1)
>>> x
array([10,  9,  8,  7,  6,  5,  4,  3,  2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
>>> m = x > 7
>>> m
array([ True,  True,  True, False, False, ...
>>> x[m]
array([10,  9,  8])

Vectorized operations

for loops in Python are slow
Use "vectorized" operations when possible
Example: difference

– for loop is ~80 times slower!

# brute force using a for loop
arr = np.arange(1000)
dif = np.zeros(999, int)
for i in range(1, len(arr)):
    dif[i-1] = arr[i] - arr[i-1]

# vectorized operation
arr = np.arange(1000)
dif = arr[1:] - arr[:-1]

example.py


SLIDE 14

Broadcasting

If array shapes are different, the smaller array may be broadcasted into a larger shape

>>> from numpy import array
>>> a = array([[1,2],[3,4],[5,6]], float)
>>> a
array([[ 1., 2.],
       [ 3., 4.],
       [ 5., 6.]])
>>> b = array([[7,11]], float)
>>> b
array([[ 7., 11.]])
>>>
>>> a * b
array([[  7., 22.],
       [ 21., 44.],
       [ 35., 66.]])

Broadcasting

Example: calculate distances from a given point

# array containing 3d coordinates for 100 points
points = np.random.random((100, 3))
origin = np.array((1.0, 2.2, -2.2))
dists = (points - origin)**2
dists = np.sqrt(np.sum(dists, axis=1))

# find the most distant point
i = np.argmax(dists)
print(points[i])

example.py

Temporary arrays

In complex expressions, NumPy stores intermediate values in temporary arrays
Memory consumption can be higher than expected

a = np.random.random((1024, 1024, 50))
b = np.random.random((1024, 1024, 50))

# two temporary arrays will be created
c = 2.0 * a - 4.5 * b

# three temporary arrays will be created due to unnecessary parentheses
c = (2.0 * a - 4.5 * b) + 1.1 * (np.sin(a) + np.cos(b))

example.py


Temporary arrays

Broadcasting approaches can also lead to hidden temporary arrays
Example: pairwise distances of M points in 3 dimensions

– Input data is M x 3 array – Output is M x M array containing the distance between points i and j

X = np.random.random((1000, 3))
D = np.sqrt(((X[:, np.newaxis, :] - X) ** 2).sum(axis=-1))

example.py

Temporary 1000 x 1000 x 3 array
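One way to avoid the large hidden temporary is to loop over one axis, trading a Python-level loop for much smaller per-row temporaries. A sketch (the 200-point size is illustrative; the reference line is the broadcasting one-liner from the slide):

```python
import numpy as np

X = np.random.random((200, 3))

# row-by-row version: each iteration only creates a 200 x 3 temporary
D = np.empty((200, 200))
for i in range(len(X)):
    D[i] = np.sqrt(((X[i] - X) ** 2).sum(axis=1))

# reference result via full broadcasting (creates the 200 x 200 x 3 temporary)
D_ref = np.sqrt(((X[:, np.newaxis, :] - X) ** 2).sum(axis=-1))
```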

Numexpr

Evaluating complex expressions one operation at a time can also lead to suboptimal performance

– Effectively, one carries out multiple for loops in the NumPy C-code

Numexpr package provides fast evaluation of array expressions

import numexpr as ne

x = np.random.random((1000000, 1))
y = np.random.random((1000000, 1))
poly = ne.evaluate("((.25*x + .75)*x - 1.5)*x - 2")

example.py

Numexpr

By default, numexpr tries to use multiple threads
The number of threads can be queried and set with ne.set_num_threads(nthreads)
Supported operators and functions

– +,-,*,/,**, sin, cos, tan, exp, log, sqrt

Speedups in comparison to NumPy are typically between 0.95x and 4x
Works best on arrays that do not fit in the CPU cache

Summary

Numpy provides a static array data structure
Multidimensional arrays
Fast mathematical operations for arrays
Tools for linear algebra and random numbers
Arrays can be broadcasted into the same shape
Expression evaluation can lead to temporary arrays

SLIDE 15

PERFORMANCE MEASUREMENT

SLIDE 16

SLIDE 17

Measuring application performance

Correctness is the most important factor in any application

– Premature optimization is the root of all evil!

Before starting to optimize an application, one should measure where the time is spent

– Typically 90 % of the time is spent in 10 % of the application

An application's own timers
timeit module
cProfile module
Full-fledged profiling tools: TAU, Intel VTune, Python Tools for Visual Studio, …

Measuring application performance

The Python time module can be used to measure the time spent in a specific part of the program

– time.time(), time.clock()
– in Python 3: time.perf_counter(), time.process_time()

import time

t0 = time.time()
for n in range(niter):
    heavy_calculation()
t1 = time.time()
print('Time spent in heavy calculation', t1 - t0)

timing.py
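In Python 3, the same pattern can use time.perf_counter(), a monotonic, high-resolution timer; a minimal sketch (the summation stands in for the slide's heavy_calculation):

```python
import time

t0 = time.perf_counter()
total = sum(i * i for i in range(100_000))  # placeholder workload
t1 = time.perf_counter()
print(f"elapsed: {t1 - t0:.6f} s")
```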

timeit module

Easy timing of small bits of Python code
Tries to avoid common pitfalls in measuring execution times
Command line interface and Python interface
%timeit magic in IPython

In [1]: from mymodule import func
In [2]: %timeit func()
10 loops, best of 3: 433 msec per loop

$ python -m timeit -s "from mymodule import func" "func()"
10 loops, best of 3: 433 msec per loop
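The Python interface works the same way; a sketch (the sorted() statement and the repeat/number counts are illustrative):

```python
import timeit

# setup runs once, stmt is repeated `number` times; the total time is returned
t = timeit.timeit(stmt="sorted(data)",
                  setup="data = list(range(1000))",
                  number=100)
print(f"100 runs took {t:.4f} s")

# repeat() returns one total per round; the minimum is the usual estimate
best = min(timeit.repeat(stmt="sorted(data)",
                         setup="data = list(range(1000))",
                         repeat=3, number=100))
```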

cProfile

Execution profile of Python program

– Time spent in different parts of the program – Call graphs

Python API:

import cProfile
…
# profile statement and save results to a file func.prof
cProfile.run('func()', 'func.prof')

profile.py

Profiling the whole program from the command line:

$ python -m cProfile -o myprof.prof myprogram.py

Investigating profile with pstats

Printing execution time of selected functions Sorting by function name, time, cumulative time, … Python module interface and interactive browser

In [1]: from pstats import Stats
In [2]: p = Stats('myprof.prof')
In [3]: p.strip_dirs()
In [4]: p.sort_stats('time')
In [5]: p.print_stats(5)
Mon Oct 12 10:11:00 2016    my.prof
…

$ python -m pstats myprof.prof
Welcome to the profile statistics
% strip
% sort time
% stats 5
Mon Oct 12 10:11:00 2016    my.prof
…
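The whole pipeline can be sketched in one self-contained script; here Profile.runcall() stands in for profiling a real program, and the file name is illustrative:

```python
import cProfile, pstats, io, os, tempfile

def work():
    """Placeholder workload to profile."""
    return sum(i * i for i in range(50_000))

# profile a single call and dump the raw stats to a file
prof_file = os.path.join(tempfile.mkdtemp(), "my.prof")
pr = cProfile.Profile()
pr.runcall(work)
pr.dump_stats(prof_file)

# load the stats, clean paths, and print the 5 most expensive functions
stream = io.StringIO()
p = pstats.Stats(prof_file, stream=stream)
p.strip_dirs().sort_stats("time").print_stats(5)
print(stream.getvalue())
```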

Summary

Python has various built-in tools for measuring application performance

– time module
– timeit module
– cProfile and pstats modules

SLIDE 18

SLIDE 19

CYTHON

SLIDE 20

SLIDE 21

Cython

Optimising static compiler for Python
Extended Cython programming language
Tune readable Python code into plain C performance by adding static type declarations
Easy interfacing to external C libraries

Python overheads

Interpreting
"Boxing" – everything is an object
Function call overhead
Global interpreter lock – no threading benefits (CPython)

Interpreting

The cython command generates a C/C++ source file from a Cython source file
The C/C++ source is then compiled into an extension module
Interpreting overhead is normally not drastic

from distutils.core import setup
from Cython.Build import cythonize

# Normally, one compiles Cython extended code with .pyx ending
setup(ext_modules=cythonize("mandel_cyt.py"), )

setup.py

$ python setup.py build_ext --inplace

In [1]: import mandel_cyt

Case study: Mandelbrot fractal

Pure Python: 2.71 s
Compiled with Cython: 2.61 s

def kernel(zr, zi, cr, ci, lim, cutoff):
    count = 0
    while ((zr*zr + zi*zi) < (lim*lim)) \
            and count < cutoff:
        # update both components from the old values
        zr, zi = zr*zr - zi*zi + cr, 2*zr*zi + ci
        count += 1
    return count

mandel.py

"Boxing"

In Python, everything is an object

[Figure: adding 7 + 6 in Python; each integer is a full object carrying type information and other metadata, so the interpreter must unbox both operands, check their types, add the raw ints, and box the result 13 into a new integer object]

Static type declarations

Cython extended code should have .pyx ending

– Cannot be run with normal Python

Types are declared with cdef keyword

– In function signatures only type is given

def integrate(f, a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i*dx)
    return s * dx

example.py

def integrate(f, double a, double b, int N):
    cdef double s = 0
    cdef int i
    cdef double dx = (b - a) / N
    for i in range(N):
        s += f(a + i*dx)
    return s * dx

example.pyx

Static type declarations

Pure Python: 2.71 s
Type declarations in kernel: 20.2 ms

Function call overhead

Function calls in Python can involve lots of checking and "boxing"
Overhead can be reduced by declaring functions to be C functions

– cdef keyword: functions can be called only from Cython – cpdef keyword: generate also Python wrapper (can have additional overhead in some cases)

SLIDE 22

Using C functions

Static type declarations: 20.2 ms
Kernel as C function: 12.5 ms

cdef int kernel(double zr, double zi, …):
    cdef int count = 0
    while ((zr*zr + zi*zi) < (lim*lim)) \
            and count < cutoff:
        # update both components from the old values
        zr, zi = zr*zr - zi*zi + cr, 2*zr*zi + ci
        count += 1
    return count

mandel.py

NumPy arrays with Cython

Cython supports fast indexing for NumPy arrays
The type and dimensions of the array have to be declared

import numpy as np    # Normal NumPy import
cimport numpy as cnp  # Import for NumPy C-API

def func():  # declarations can be made only in function scope
    cdef cnp.ndarray[cnp.int_t, ndim=2] data
    data = np.empty((N, N), dtype=int)
    …
    for i in range(N):
        for j in range(N):
            data[i,j] = …  # double loop is done at nearly C speed

numpy_example.py

Compiler directives

Compiler directives can be used for turning off certain Python features for additional performance

– boundscheck(False): assume no IndexErrors
– wraparound(False): no negative indexing
– …

import numpy as np    # Normal NumPy import
cimport numpy as cnp  # Import for NumPy C-API
import cython

@cython.boundscheck(False)
def func():  # declarations can be made only in function scope
    cdef cnp.ndarray[cnp.int_t, ndim=2] data
    data = np.empty((N, N), dtype=int)

numpy_example.py

Final performance

Pure Python: 2.7 s
Static type declarations: 20.2 ms
Kernel as C function: 12.5 ms
Fast indexing and directives: 2.4 ms

Where to add types?

Typing everything reduces readability and can even slow down the performance
Profiling should be the first step when optimising
Cython is able to produce an annotated HTML report
Lines are colored according to the level of "typedness"

– white lines translate to pure C
– lines that require the Python C-API are yellow (darker as they translate to more C-API interaction)

$ cython -a cython_module.pyx
$ firefox cython_module.html

Profiling Cython code

By default, Cython code does not show up in profiles produced by cProfile
Profiling can be enabled for the entire source file or on a per-function basis

# cython: profile=True
import cython
…
@cython.profile(False)
cdef func():
    …

profiling.py

# cython: profile=False
import cython
…
@cython.profile(True)
cdef func():
    …

profiling.py

Summary

Cython is an optimising static compiler for Python
Possible to add type declarations with the Cython language
Fast indexing for NumPy arrays
In the best cases, huge speed-ups can be obtained

– Some compromise for Python flexibility

SLIDE 23

Further functionality in Cython

Using C structs and C++ classes in Cython
Exception handling
Parallelisation (threading) with Cython
…

INTERFACING EXTERNAL LIBRARIES

Increasing performance with compiled code

There are Python interfaces for many high-performance libraries
However, sometimes one might want to utilize a library without a Python interface

– Existing libraries – Own code written in C or Fortran

The Python C-API provides the most comprehensive way to extend Python
cffi, Cython, and f2py can provide easier approaches

cffi

C Foreign Function Interface for Python
Interact with almost any C code
C-like declarations within Python

– Can often be copy-pasted from headers / documentation

ABI and API modes

– ABI does not require compilation – API can be more robust – Only ABI discussed here

Some understanding of C required

cffi example

from cffi import FFI
import numpy as np

ffi = FFI()
lib = ffi.dlopen("./myclib.so")
ffi.cdef("""void add(double *x, double *y, int n);""")
ffi.cdef("""void subtract(double *x, double *y, int n);""")

a = np.random.random((1000000, 1))
b = np.zeros_like(a)

# "Pointer" objects need to be passed to the library
aptr = ffi.cast("double *", ffi.from_buffer(a))
bptr = ffi.cast("double *", ffi.from_buffer(b))

lib.add(bptr, aptr, len(a))
lib.subtract(bptr, aptr, len(a))

cffi_example.py

Interfacing C with cython

As Cython code compiles down to C code, it is relatively easy to call C functions
C declarations need to be given in the Cython code
C sources or libraries need to be specified in setup.py

Cython example

import numpy as np
cimport numpy as cnp

cdef extern from "myclib.h":
    void add(double *a, double *b, int n)
    void subtract(double *a, double *b, int n)

def add_py(cnp.ndarray[cnp.double_t, ndim=1] a,
           cnp.ndarray[cnp.double_t, ndim=1] b):
    add(&a[0], &b[0], len(a))

cython_example.pyx

Cython example

Including C-code in Cython module

from distutils.core import setup, Extension
from Cython.Build import cythonize

# Specify all sources in the Extension object
ext = Extension("module_name",
                sources=["cython_source.pyx", "c_source.c"])
setup(ext_modules=cythonize(ext))

setup.py

SLIDE 24

Cython example

Linking against a C library

from distutils.core import setup, Extension
from Cython.Build import cythonize

# Specify all sources in the Extension object
ext = Extension("module_name",
                sources=["cython_source.pyx",],
                libraries=["name",],    # Cython module is linked against
                library_dirs=[".",])    # libname.so, looked up in "."
setup(ext_modules=cythonize(ext))

setup.py

Interfacing with Fortran

NumPy includes f2py for connecting Python with Fortran codes
f2py creates a C wrapper for the Fortran code that can then be used as an extension module

subroutine diff(in, out, n)
  real*8, intent(in) :: in(n)
  real*8, intent(out) :: out(n-1)
  !f2py intent(in, out) :: out
  integer :: n
  ...

diff.f90

Interfacing with Fortran

$ f2py -c diff.f90 -m diff_mod

import numpy as np
import diff_mod

a = np.arange(10.0)
b = np.zeros_like(a[1:])
diff_mod.diff(a, b, len(a))

example.py

f2py nitty-gritties

If multidimensional NumPy arrays are not stored in Fortran order, f2py creates copies of arrays

– Modifications of arrays in Fortran are not seen in Python

a = np.zeros((10, 10))
modify(a)                 # a won’t be modified

a = np.asfortranarray(a)  # convert a to Fortran order
modify(a)                 # a will now be modified

# Array can be created in Fortran order in the first place
a = np.zeros((10, 10), order="F")

example.py
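The memory order can be checked via the flags and strides attributes; a small sketch (modify() from the example above is omitted here, since only the layout matters):

```python
import numpy as np

a = np.zeros((10, 10))             # C order by default
print(a.flags['F_CONTIGUOUS'])     # False: rows are contiguous

f = np.asfortranarray(a)           # copy in Fortran (column-major) order
print(f.flags['F_CONTIGUOUS'])     # True: columns are contiguous

g = np.zeros((10, 10), order="F")  # created directly in Fortran order
print(g.strides)                   # (8, 80): stepping a row skips 8 bytes
```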

Summary

External libraries can be interfaced in various ways
cffi provides easy interfacing
Cython can give more control and sometimes better performance
Fortran routines are called implicitly via C wrappers

– f2py automates the creation of wrappers

SLIDE 25

MULTIPROCESSING – PROCESS-BASED "THREADING"

SLIDE 26

SLIDE 27

Processes and threads

Process
– independent execution units
– have their own state information and own memory address space

Thread
– a single process may contain multiple threads
– threads have their own state information, but share the same memory address space

[Figure: serial regions run by a single process alternating with parallel regions run by several parallel processes]

Processes and threads

Process
– long-lived: created when the parallel program starts, killed when the program finishes
– explicit communication between processes

Thread
– short-lived: created when entering a parallel region, destroyed (joined) when the region ends
– communication through shared memory


Processes and threads

Process: MPI

– good performance
– scales from a laptop to a supercomputer

Thread: OpenMP

– C / Fortran, not Python

threading module

– only for I/O-bound tasks (maybe)
– Global Interpreter Lock (GIL) limits usability

Processes and threads

Process: MPI

– good performance
– scales from a laptop to a supercomputer

"Threads" as processes: multiprocessing module

– relies on the OS to fork worker processes that mimic threads
– limited communication between the parallel processes

Multiprocessing

The underlying OS is used to spawn new independent subprocesses

– processes are independent and execute code in an asynchronous manner

§ no guarantee on the order of execution

Communication is possible only through dedicated, shared communication channels

– Queues, Pipes
– must be created before a new process is forked

Spawn a process

from multiprocessing import Process
import os

def hello(name):
    print('Hello', name)
    print('My PID is', os.getpid())
    print("My parent's PID is", os.getppid())

# Create a new process
p = Process(target=hello, args=('Alice', ))
# Start the process
p.start()
print('Spawned a new process from PID', os.getpid())
# Wait for the process to finish
p.join()

spawn.py

Communication

Sharing data

– shared memory, data manager

Pipes

– direct communication between two processes

Queues

– work sharing among a group of processes

Pool of workers

– offloading tasks to a group of worker processes

Queues

FIFO (first-in-first-out) task queues that can be used to distribute work among processes
Shared among all processes

– all processes can add and retrieve data from the queue

Automatically takes care of locking, so can be used safely with minimal hassle

SLIDE 28

Queues

from multiprocessing import Process, Queue

def f(q):
    while True:
        x = q.get()
        if x is None:  # if sentinel, stop execution
            break
        print(x**2)

q = Queue()
for i in range(100):
    q.put(i)     # task queue: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 99]
for i in range(3):
    q.put(None)  # add sentinels to the queue to signal STOP

p = Process(target=f, args=(q, ))
p.start()

task-queue.py

Pool of workers

Group of processes that carry out tasks assigned to them

1. Master process submits tasks to the pool
2. Pool of worker processes perform the tasks
3. Master process retrieves the results from the pool

Blocking and non-blocking (= asynchronous) calls available

Pool of workers

from multiprocessing import Pool
import time

def f(x):
    return x**2

pool = Pool(8)

# Blocking execution (with a single process)
result = pool.apply(f, (4,))
print(result)

# Non-blocking execution "in the background"
result = pool.apply_async(f, (12,))
while not result.ready():
    time.sleep(1)
print(result.get())

# an alternative to "sleeping" is to use e.g. result.get(timeout=1)

pool.py

Pool of workers

from multiprocessing import Pool
import time

def f(x):
    return x**2

pool = Pool(8)

# calculate x**2 in parallel for x in 0..9
result = pool.map(f, range(10))
print(result)

# non-blocking alternative
result = pool.map_async(f, range(10))
while not result.ready():
    time.sleep(1)
print(result.get())

pool-map.py

Multiprocessing summary

Parallelism achieved by launching new OS processes
Only limited communication possible

– work sharing: queues / pool of workers

Non-blocking execution available

– do something else while waiting for results

Further information:

https://docs.python.org/2/library/multiprocessing.html

SLIDE 29

MESSAGE PASSING INTERFACE

SLIDE 30

SLIDE 31

Message passing interface

MPI is an application programming interface (API) for communication between separate processes
The most widely used approach for distributed parallel computing
MPI programs are portable and scalable – the same program can run on different types of computers, from PCs to supercomputers
MPI is flexible and comprehensive

– large (over 120 procedures)
– concise (often only 6 procedures are needed)

The MPI standard defines C and Fortran interfaces
MPI for Python (mpi4py) provides an unofficial Python interface


Execution model

MPI program is launched as a set of independent, identical processes

– execute the same program code and instructions
– can reside in different nodes (or even in different computers)

The way to launch an MPI program depends on the system

– mpirun, mpiexec, aprun, srun, ...
– aprun on sisu.csc.fi, srun on taito.csc.fi

MPI rank

Rank: ID number given to a process

– it is possible to query for the rank
– processes can perform different tasks based on their rank

if (rank == 0):
    # do something
elif (rank == 1):
    # do something else
else:
    # all other processes do something different

example.py

Data model

Each MPI process has its own separate memory space, i.e. all variables and data structures are local to the process
Processes can exchange data by sending and receiving messages

[Figure: two MPI processes with separate memory spaces; process 1 (rank 0) holds a = 1.0, b = 2.0 and process 2 (rank 1) holds a = -0.7, b = 3.0, and they exchange data through MPI messages]

MPI communicator

Communicator: a group containing all the processes that will participate in communication

– in mpi4py all MPI calls are implemented as methods of a communicator object
– MPI_COMM_WORLD contains all processes (MPI.COMM_WORLD in mpi4py)
– user can define custom communicators

Routines in MPI for Python

Communication between processes

– sending and receiving messages between two processes
– sending and receiving messages between several processes

Synchronization between processes
Communicator creation and manipulation
Advanced features (e.g. user-defined datatypes, one-sided communication and parallel I/O)

GeHng started

Basic methods of communicator object – Get_size() Number of processes in communicator – Get_rank() rank of this process

from mpi4py import MPI

comm = MPI.COMM_WORLD  # communicator object containing all processes
size = comm.Get_size()
rank = comm.Get_rank()

print("I am rank %d in group of %d processes" % (rank, size))

hello.py


Running an example program

$ mpirun -np 4 python hello.py
I am rank 2 in group of 4 processes
I am rank 0 in group of 4 processes
I am rank 3 in group of 4 processes
I am rank 1 in group of 4 processes

hello.py

POINT-TO-POINT COMMUNICATION

MPI communication

MPI processes are independent; they communicate to coordinate work
Point-to-point communication

– Messages are sent between two processes

Collective communication

– Involving a number of processes at the same time


MPI point-to-point operations

One process sends a message to another process that receives it Sends and receives in a program should match – one receive per send

Sending and receiving data

Sending and receiving a dictionary

from mpi4py import MPI

comm = MPI.COMM_WORLD  # communicator object containing all processes
rank = comm.Get_rank()

if rank == 0:
    data = {'a': 7, 'b': 3.14}
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)

send.py

Sending and receiving data

Arbitrary Python objects can be communicated with the send and receive methods of a communicator
send(data, dest, tag)
– data Python object to send
– dest destination rank
– tag ID given to the message
recv(source, tag)
– source source rank
– tag ID given to the message
– data is provided as the return value
Destination and source ranks as well as tags have to match

Case study: parallel sum

Array originally on process #0 (P0) Parallel algorithm

– Scatter

§ Half of the array is sent to process 1

– Compute

§ P0 & P1 sum their segments independently

– Reduction

§ Partial sum on P1 sent to P0
§ P0 sums the partial sums


P1 posts a receive to receive half of the array from P0

Step 1.1: Receive operation in scatter


P0 posts a send to send the lower part of the array to P1

Step 1.2: Send operation in scatter


P0 & P1 compute their partial sums and store them locally

Step 2: Compute the sum in parallel


P0 posts a receive to receive the partial sum

Step 3.1: Receive operation in reduction


P1 posts a send with the partial sum

Step 3.2: Send operation in reduction


P0 sums the partial sums

Step 4: Compute the final answer

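The scatter/compute/reduce steps above can be checked without MPI at all: split the array in half, sum the halves independently, then combine the partial sums. A serial NumPy sketch of the algorithm (not mpi4py code):

```python
import numpy

# Full array lives on "P0" initially.
full = numpy.arange(100, dtype=float)

# Step 1 (scatter): the upper half is sent to "P1".
p0_part, p1_part = full[:50], full[50:]

# Step 2 (compute): both processes sum their segments independently.
p0_sum = p0_part.sum()
p1_sum = p1_part.sum()

# Steps 3-4 (reduction): P1's partial sum is sent to P0, which combines them.
total = p0_sum + p1_sum

assert total == full.sum()
```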

Blocking routines & deadlocks

send() and recv() are blocking routines

– the functions exit only once it is safe to use the data (memory) involved in the communication

Completion depends on other processes => risk of deadlocks

– for example, if all processes call recv() there is no-one left to call a corresponding send() and the program is stuck forever

Typical point-to-point communication patterns

Incorrect ordering of sends and receives may result in a deadlock

[Figure: two patterns shown with processes 0-3 – a pipe (a ring of processes exchanging data) and a pairwise exchange]
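In the pipe (ring) pattern each rank sends to the next rank and receives from the previous one, with the neighbour indices wrapping around modulo the number of processes. A plain-Python sketch of the neighbour arithmetic (no MPI needed):

```python
def ring_neighbours(rank, size):
    """Destination and source ranks for a ring of `size` processes."""
    dest = (rank + 1) % size    # send to the next rank
    source = (rank - 1) % size  # receive from the previous rank
    return dest, source

# For 4 processes the ring closes: 0 -> 1 -> 2 -> 3 -> 0
print([ring_neighbours(r, 4) for r in range(4)])
```

With blocking calls, every rank first sending and then receiving can deadlock; ordering the sends and receives consistently (or using Sendrecv) avoids that.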

Communicating NumPy arrays

Arbitrary Python objects are converted to byte streams (pickled) when sending and back to Python objects (unpickled) when receiving

– these conversions may be a serious overhead to communication

Contiguous memory buffers (such as NumPy arrays) can be communicated with very little overhead using the upper-case methods:
Send(data, dest, tag)
Recv(data, source, tag)
– note the difference in receiving: the data array has to exist at the time of the call
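The pickling overhead can be seen without MPI: serialising a NumPy array the way the lowercase methods do produces a full byte-string copy, whereas the uppercase methods pass the existing buffer directly. A rough illustration:

```python
import pickle
import numpy

data = numpy.arange(10**6, dtype=float)

# Lowercase send() pickles the object into a new byte string ...
pickled = pickle.dumps(data)

# ... whereas uppercase Send() hands the existing memory buffer to MPI.
print(len(pickled), data.nbytes)
assert len(pickled) >= data.nbytes  # the pickled copy is at least buffer-sized
```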


Send/receive a NumPy array

Note the difference between upper/lower case!
– send/recv: general Python objects, slow
– Send/Recv: contiguous arrays, fast

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = numpy.arange(100, dtype=float)
    comm.Send(data, dest=1, tag=13)
elif rank == 1:
    data = numpy.empty(100, dtype=float)
    comm.Recv(data, source=0, tag=13)

send-array.py

Combined send and receive

Send one message and receive another with a single command

– reduces the risk of deadlocks

Destination and source ranks can be the same or different

– MPI.PROC_NULL can be used for no destination/source

data = numpy.arange(10, dtype=float) * (rank + 1)  # send buffer
buffer = numpy.empty(10, float)                    # receive buffer

if rank == 0:
    comm.Sendrecv(data, dest=1, sendtag=8, recvbuf=buffer, source=1, recvtag=22)
elif rank == 1:
    comm.Sendrecv(data, dest=0, sendtag=22, recvbuf=buffer, source=0, recvtag=8)

sendrecv.py

MPI datatypes

MPI has a number of predefined datatypes to represent data

– e.g. MPI.INT for integer and MPI.DOUBLE for float

No need to specify the datatype for Python objects or NumPy arrays

– objects are serialised as byte streams
– automatic detection for NumPy arrays

If needed, one can also define custom datatypes

– for example to use non-contiguous data buffers

Summary

Point-to-point communication = messages are sent between two MPI processes
Point-to-point operations enable (in principle) any parallel communication pattern
Arbitrary Python objects (that can be pickled!)

– send / recv – sendrecv

Memory buffers such as NumPy arrays

– Send / Recv – Sendrecv

NON-BLOCKING COMMUNICATION

Non-blocking communication

Non-blocking sends and receives

– isend & irecv
– return immediately; the send/receive happens in the background
– the return value is a Request object

Enables some computing concurrently with communication
Avoids many common deadlock situations

Non-blocking communication

Have to finalize send/receive operations

– wait()

§ Waits for the communication started with isend or irecv to finish (blocking)

– test()

§ Tests if the communication has finished (non-blocking)

You can mix non-blocking and blocking p2p routines

– e.g., receive an isend with recv

Typical usage pattern

request = comm.Irecv(ghost_data)
comm.Isend(border_data)
compute(ghost_independent_data)
request.Wait()
compute(border_data)



Non-blocking send/receive

Interleaving communication and computation

from mpi4py import MPI
from numpy import arange, empty

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = arange(size, dtype=float) * (rank + 1)
    req = comm.Isend(data, dest=1, tag=0)    # start a send
    calculate_something(rank)                # .. do something else ..
    MPI.Request.Wait(req)                    # wait for send to finish
    # safe to read/write data again
elif rank == 1:
    data = empty(size, float)
    req = comm.Irecv(data, source=0, tag=0)  # post a receive
    calculate_something(rank)                # .. do something else ..
    MPI.Request.Wait(req)                    # wait for receive to finish
    # data is now ready for use

isend.py
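The interleaving idea – start the transfer, do unrelated work, then wait – can be sketched without MPI using a thread pool to stand in for the network (purely illustrative; actual overlap depends on the MPI implementation):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(data):
    time.sleep(0.1)  # pretend this is the message in flight
    return sum(data)

with ThreadPoolExecutor(max_workers=1) as pool:
    req = pool.submit(transfer, range(1000))     # like Isend/Irecv: returns at once
    other_work = sum(x * x for x in range(100))  # compute while "communicating"
    result = req.result()                        # like Wait(): block until done

assert result == sum(range(1000))
```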

Additional completion operations

Other useful routines in MPI.Request

– Waitall(requests)

§ wait for all initiated requests to complete

– Waitany(requests)

§ wait for any initiated request to complete

– Test(request)

§ test for the completion of a send or receive

The communicator also has a blocking call, e.g. to probe the incoming message size before posting a receive

– comm.Probe(status)

Non-blocking collectives

New in MPI 3: no support in mpi4py
Non-blocking collectives enable overlapping communication and computation together with the benefits of collective communication
Restrictions

– have to be called in the same order by all ranks in a communicator
– mixing of blocking and non-blocking collectives is not allowed

Non-blocking collectives

[Figure: MPI_Iallreduce followed by MPI_Wait as two independent operations, contrasted with a single blocking MPI.Allreduce – the non-blocking form is NOT SUPPORTED by MPI for Python]

Summary

Non-blocking communication is usually the smart way to do point-to-point communication in MPI
Non-blocking communication realization

– isend / Isend – irecv / Irecv – MPI.Request.Wait(req)

MPI-3 also contains non-blocking collectives, but these are not supported by MPI for Python

COMMUNICATORS

Communicators

[Figure: MPI.COMM_WORLD with ranks 1-7 split (e.g. with comm.Split) into three sub-communicators Comm 1, Comm 2 and Comm 3]

User-defined communicators

By default a single, universal communicator exists to which all processes belong (MPI.COMM_WORLD)
One can create new communicators, e.g. by splitting this into sub-groups

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

color = rank % 4
local_comm = comm.Split(color)
local_rank = local_comm.Get_rank()

print("Global rank: %d Local rank: %d" % (rank, local_rank))

split.py
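What Split does with the color argument can be verified on paper: ranks sharing a color end up in the same sub-communicator, numbered from zero in global-rank order. A plain-Python sketch of that mapping for 8 global ranks, using the same color rule as split.py:

```python
size = 8
color = [rank % 4 for rank in range(size)]  # same rule as in split.py

# Local rank = position of this rank among all ranks with the same color.
local_rank = [
    sorted(r for r in range(size) if color[r] == color[rank]).index(rank)
    for rank in range(size)
]

print(list(zip(range(size), color, local_rank)))
# e.g. ranks 0 and 4 share color 0 and get local ranks 0 and 1
```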


COLLECTIVE COMMUNICATION

Collective communication

Collective communication transmits data among all processes in a process group (communicator)

– These routines must be called by all the processes in the group

Collective communication includes

– data movement
– collective computation
– synchronization

Example: comm.barrier() makes every task hold until all tasks in the communicator comm have called it

Collective communication typically outperforms point-to-point communication
Code becomes more compact (and efficient!) and easier to maintain:

Communicating a NumPy array of 1M elements from task 0 to all other tasks

Collective communication

if rank == 0:
    for i in range(1, size):
        comm.Send(data, i, tag)
else:
    comm.Recv(data, 0, tag)

p2p.py

comm.Bcast(data, 0)

collective.py

Collective communication

The amount of sent and received data must match
No tag arguments

– Order of execution must coincide across processes

Broadcasting

Send the same data from one process to all the others

[Figure: Bcast copies buffer A from one process to the local memory of all processes]
The buffer may contain multiple elements of any datatype.

Broadcasting

Broadcast sends the same data to all processes

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    n = 100
    data = numpy.arange(n, dtype=float)
    comm.bcast(n, root=0)
else:
    n = comm.bcast(None, root=0)  # returns the value!
    data = numpy.zeros(n, float)  # prepare a receive buffer

comm.Bcast(data, root=0)  # in-place modification on the receive side

bcast.py

Send an equal amount of data from one process to the others
Segments A, B, … may contain multiple elements

Scattering

[Figure: Scatter splits the send buffer A B C D on one process into segments delivered to the receive buffers of all processes]

Scattering

Scatter distributes data to processes

from mpi4py import MPI
from numpy import arange, empty

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    py_data = range(size)
    data = arange(size**2, dtype=float)
else:
    py_data = None
    data = None

buffer = empty(size, float)               # prepare a receive buffer

new_data = comm.scatter(py_data, root=0)  # returns the value
comm.Scatter(data, buffer, root=0)        # in-place modification

scatter.py
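The data layout of Scatter can be checked serially: the root's send buffer is split into `size` equal segments, one per rank, in rank order. A NumPy sketch of what each rank ends up with, using the same send buffer as scatter.py:

```python
import numpy

size = 4
data = numpy.arange(size**2, dtype=float)  # root's send buffer, as in scatter.py

# Scatter hands segment i (size elements) to rank i.
segments = data.reshape(size, size)
for rank in range(size):
    print(rank, segments[rank])
# rank 0 gets [0, 1, 2, 3], rank 1 gets [4, 5, 6, 7], ...
```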


Collect data from all the processes to one process
Segments A, B, … may contain multiple elements

Gathering

[Figure: Gather collects the send buffers A, B, C, D of all processes into the receive buffer of one process]

Gathering

Gather pulls data from all processes

from mpi4py import MPI
from numpy import arange, zeros

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

data = arange(10, dtype=float) * (rank + 1)
buffer = zeros(size * 10, float)

n = comm.gather(rank, root=0)      # returns the value
comm.Gather(data, buffer, root=0)  # in-place modification

gather.py
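Gather is the inverse of Scatter: the root's receive buffer is the concatenation of every rank's send buffer, in rank order. Checked serially with NumPy, using the same per-rank data as gather.py:

```python
import numpy

size = 4
# Each rank contributes its own array, as in gather.py.
per_rank = [numpy.arange(10, dtype=float) * (rank + 1) for rank in range(size)]

# The root's receive buffer after Gather: segments concatenated in rank order.
buffer = numpy.concatenate(per_rank)

assert buffer.shape == (size * 10,)
assert buffer[19] == 18.0  # last element of rank 1's segment, arange(10) * 2
```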

Applies an operation over a set of processes and places the result in a single process

Reduce operation

[Figure: Reduce (sum) combines the send buffers of processes 0-3 elementwise; the receive buffer holds ∑Ai ∑Bi ∑Ci ∑Di]

Reduce operation

Reduce gathers data and applies an operation on it

from mpi4py import MPI
from numpy import arange, zeros

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

data = arange(10 * size, dtype=float) * (rank + 1)
buffer = zeros(size * 10, float)

n = comm.reduce(rank, op=MPI.SUM, root=0)      # returns the value
comm.Reduce(data, buffer, op=MPI.SUM, root=0)  # in-place modification

reduce.py
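A SUM reduction combines the per-rank send buffers elementwise. The result that lands in the root's receive buffer can be reproduced serially, using the same per-rank data as reduce.py:

```python
import numpy

size = 4
# Each rank holds its own send buffer, as in reduce.py.
per_rank = [numpy.arange(10 * size, dtype=float) * (rank + 1)
            for rank in range(size)]

# Reduce with op=MPI.SUM adds the buffers elementwise on the root.
result = numpy.sum(per_rank, axis=0)

# Element i is i * (1 + 2 + 3 + 4) = 10 * i
assert (result == numpy.arange(10 * size, dtype=float) * 10).all()
```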

Other common collective operations

Scatterv – each process receives a different amount of data
Gatherv – each process sends a different amount of data
Allreduce – all processes receive the results of the reduction
Alltoall – each process sends and receives to/from each other
Alltoallv – each process sends and receives a different amount of data to/from each other

Common mistakes with collectives

Using a collective operation within one branch of an if-test on the rank:
if rank == 0: comm.bcast(...)

– all processes in a communicator must call a collective routine!

Assuming that all processes making a collective call would complete at the same time
Using the input buffer as the output buffer

comm.Scatter(a, a, root=0)  # error: send and receive buffers must not overlap

Summary

Collective communications involve all the processes within a communicator

– all processes must call them

Collective operations make code more transparent and compact
Collective routines allow optimizations by the MPI library

On-line resources

Documentation for mpi4py is quite limited

– a short on-line manual and API reference are available at http://pythonhosted.org/mpi4py/

Some good references:

– "A Python Introduction to Parallel Programming with MPI" by Jeremy Bejarano: http://materials.jeremybejarano.com/MPIwithPython/
– "mpi4py examples" by Jörg Bornschein: https://github.com/jbornschein/mpi4py-examples


mpi4py performance

Ping-pong test
[Figure: ping-pong test results]

Summary

mpi4py provides a Python interface to MPI
MPI calls are made via a communicator object
Possible to communicate arbitrary Python objects
NumPy arrays can be communicated with nearly the same speed as from C/Fortran