what you need to learn now to decide what you need to learn next - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Scientific computing: An introduction to tools and programming languages

“what you need to learn now to decide what you need to learn next”

Bob Dowling rjd4@cam.ac.uk University Information Services

slide-2
SLIDE 2

Course outline

Good practice Specialist applications Programming languages Basic concepts

slide-3
SLIDE 3

Course outline

Good practice Specialist applications Programming languages Basic concepts

slide-4
SLIDE 4

Serial computing

Single CPU

slide-5
SLIDE 5

Parallel computing

Multiple CPUs
Single Instruction Multiple Data
MPI
OpenMP

slide-6
SLIDE 6

Parallel computing courses

Parallel Programming: Options and Design
Parallel Programming: Introduction to MPI

slide-7
SLIDE 7

Distributed computing

Multiple computers

slide-8
SLIDE 8

Distributed computing courses

HTCondor and CamGrid

slide-9
SLIDE 9

High Performance Computing course

High Performance Computing: An Introduction

slide-10
SLIDE 10

Floating point numbers

e.g. numerical simulations

Universal principles: 0.1 → 0.1000000000001 and worse…

>>> 0.1 + 0.1
0.2
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
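
The rounding shown above has a practical consequence: testing floating point results for exact equality is unreliable. A minimal sketch of the usual workaround follows. The tolerance value and the variable names are illustrative assumptions, not part of the original slides.

```python
import math
from decimal import Decimal

total = 0.1 + 0.1 + 0.1

# Exact equality fails because 0.1 has no exact binary representation.
exactly_equal = (total == 0.3)

# Comparing within a tolerance is the usual workaround
# (the tolerance here is an arbitrary illustrative choice).
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)

# Decimal(float) reveals the value the float actually stores.
stored_value = Decimal(0.1)

print(exactly_equal)
print(close_enough)
print(stored_value)
```

The stored value is slightly larger than one tenth, which is where the accumulated error comes from.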

slide-11
SLIDE 11

Floating point courses

Program Design: How Computers Handle Numbers

slide-12
SLIDE 12

Text processing

e.g. sequence comparison, text searching

^f.*x$

fabliaux factrix falx faulx faux fax feedbox … fornix forty-six fourplex fowlpox fox fricandeaux frutex fundatrix

“Regular expressions”
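
As a sketch of how the pattern above selects words, the following applies it with Python's `re` module. The word list is a small illustrative sample (a few words from the slide plus some non-matching ones added for contrast).

```python
import re

# "^f" anchors an f at the start, ".*" allows anything in between,
# and "x$" anchors an x at the end of the word.
pattern = re.compile(r"^f.*x$")

# Illustrative sample: the last three words are deliberate non-matches.
words = ["fabliaux", "falx", "fax", "fox", "forty-six", "fish", "box", "fixed"]

matches = [word for word in words if pattern.search(word)]
print(matches)
```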

slide-13
SLIDE 13

Regular expression courses

Programming Concepts: Pattern Matching Using Regular Expressions
Python 3: Advanced Topics (Self-paced) (includes a regular expressions unit)

slide-14
SLIDE 14

Course outline

Good practice Specialist applications Programming languages Basic concepts

slide-15
SLIDE 15

“Divide and conquer”

[Diagram: “divide”: the complex problem is split into less complex problems and then into simple problems. “conquer”: each simple problem yields a partial solution. “glue”: the partial solutions are combined into the complete solution.]

slide-16
SLIDE 16

“Divide and conquer” — the trick

[Diagram: each simple problem is turned into its own partial solution (“conquer”).]

No need to use the same tool for each “mini-conquest”!

slide-17
SLIDE 17

Example

“Read 256 lines of data represented in a CSV format. Each line should have 256 numbers on it, but some are split into two lines of 128 numbers each. Run Aardvark’s algorithm on each 256×256 set of data. Write out the output as text in the same CSV format (exactly 256 numbers per line, every line) and plot a heat graph of the output to a separate file. Keep reading 256-line lumps like this until they’re all done.”

slide-18
SLIDE 18

Example

Read 256 lines of data represented in a CSV format. Each line should have 256 numbers on it, but some are split into two lines of 128 numbers each. Run Aardvark’s algorithm on each 256×256 set of data. Write out the output as text in the same CSV format (exactly 256 numbers per line, every line) and plot a heat graph of the output to a separate file. Keep reading 256-line lumps like this until they’re all done.

slide-19
SLIDE 19

Example

Read (CSV): Read 256 lines of data. Each line will have 256 numbers on it. CSV format.
Process: Aardvark’s algorithm on each 256×256 set of data.
Write file (CSV): output in CSV format.
Graphics, write file: plot a heat graph.
Repeat: Keep reading 256-line lumps like this until they’re all done.
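
The split into read, process, write and repeat steps can be sketched as separate functions, one per “mini-conquest”. This is only an outline under stated assumptions: every function name is hypothetical, `process` is a placeholder because the slides do not define Aardvark’s algorithm, the heat graph step is left out, and the demonstration uses a width of 4 rather than 256 to stay short.

```python
import csv
import io


def read_rows(lines, width):
    """Yield full rows of `width` numbers, joining rows that were split."""
    pending = []
    for record in csv.reader(lines):
        if not record:
            continue  # ignore blank lines
        pending.extend(float(field) for field in record)
        if len(pending) == width:
            yield pending
            pending = []
        elif len(pending) > width:
            raise ValueError("row has more than %d numbers" % width)
    if pending:
        raise ValueError("input ended part-way through a row")


def read_blocks(lines, width):
    """Group full rows into width-by-width blocks (the "repeat" step)."""
    block = []
    for row in read_rows(lines, width):
        block.append(row)
        if len(block) == width:
            yield block
            block = []
    if block:
        raise ValueError("input ended part-way through a block")


def process(block):
    """Placeholder for the real algorithm: it only doubles each value."""
    return [[2.0 * value for value in row] for row in block]


def write_block(block, out):
    """Write one block as CSV, exactly one full row per line."""
    csv.writer(out, lineterminator="\n").writerows(block)


# Small demonstration: the second row arrives split into two half rows.
sample = "1,2,3,4\n5,6\n7,8\n9,10,11,12\n13,14,15,16\n"
out = io.StringIO()
for block in read_blocks(io.StringIO(sample), width=4):
    write_block(process(block), out)
print(out.getvalue())
```

Because each step is its own function, any one of them could be swapped for a different tool without touching the others.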

slide-20
SLIDE 20

“Structured programming”

Split program into “lumps”
Use lumps methodically
Do not repeat code

“Lumps”?
Programs
Functions
Modules
Units

slide-21
SLIDE 21

Example: unstructured code

a_norm = 0.0
for i in range(0,100):
    a_norm += a[i]*a[i]

b_norm = 0.0
for i in range(0,100):
    b_norm += b[i]*b[i]

c_norm = 0.0
for i in range(0,100):
    c_norm += c[i]*c[i]
…
…

Repetition!

slide-22
SLIDE 22

Example: structured code

def norm2(v):
    v_norm = 0.0
    for i in range(0,100):
        v_norm += v[i]*v[i]
    return v_norm

a_norm = norm2(a)
b_norm = norm2(b)
c_norm = norm2(c)
…
…

Single instance of the code.

Calling the function three times

slide-23
SLIDE 23

Structured programming

Once!

Write function Test function Time function Debug function Improve function

All good practice follows from structured programming

Import function

slide-24
SLIDE 24

Example: improved code

def norm2(v):
    w = []
    for i in range(0,100):
        w.append(v[i]*v[i])
    w.sort()
    v_norm = 0.0
    for i in range(0,100):
        v_norm += w[i]
    return v_norm

a_norm = norm2(a)
b_norm = norm2(b)
c_norm = norm2(c)
…
…

Improved code
No change to calling function

slide-25
SLIDE 25

Example: improved again code

def norm2(v):
    w = [item*item for item in v]
    w.sort()
    v_norm = 0.0
    for w_item in w:
        v_norm += w_item
    return v_norm

a_norm = norm2(a)
b_norm = norm2(b)
c_norm = norm2(c)
…
…

Still no change to calling function

More flexible, “pythonic” code
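
The slides do not say why sorting the squares counts as an improvement. One plausible reason, assumed here, is floating point accuracy: adding the smallest terms first stops them being swallowed by a large running total. The sketch below compares the two orders on a deliberately awkward input. The function names and the input are illustrative, and `math.fsum` is used only as an accurate reference.

```python
import math


def norm2_in_order(v):
    """Sum the squares in the order given."""
    total = 0.0
    for item in v:
        total += item * item
    return total


def norm2_sorted(v):
    """Sum the squares smallest first, as the sorted version does."""
    total = 0.0
    for square in sorted(item * item for item in v):
        total += square
    return total


# One huge term followed by two tiny ones (illustrative input).
v = [1e8, 1.0, 1.0]

in_order = norm2_in_order(v)      # the two 1.0 terms are lost to rounding
smallest_first = norm2_sorted(v)  # the small terms are added together first
reference = math.fsum(item * item for item in v)

print(in_order, smallest_first, reference)
```

Code that calls `norm2` gains this accuracy without changing at all, which is the point of the slide.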

slide-26
SLIDE 26

Example: best code

from library import norm2

a_norm = norm2(a)
b_norm = norm2(b)
c_norm = norm2(c)
…
…

Somebody else’s code!
No change to calling function

slide-27
SLIDE 27

Structured programming courses

Programming Concepts: Introduction for Absolute Beginners

slide-28
SLIDE 28

Libraries

Written by experts
In every area
Learn what libraries exist in your area
Use them
Save your effort for your research

slide-29
SLIDE 29

Example libraries

Scientific Python
Numerical Algorithms Group
Numerical Python

slide-30
SLIDE 30

Hard to improve on library functions

for(int i=0; i<N; i++) {
    for(int j=0; j<P; j++) {
        for(int k=0; k<Q; k++) {
            a[i][j] += b[i][k]*c[k][j];
        }
    }
}

(The slide singles out the two inner loops: for(int j=0; j<P; j++) { and for(int k=0; k<Q; k++) {)

This “trick” may save you 1% on each matrix multiplication.

It is a complete waste of time!

slide-31
SLIDE 31

Hard to improve on library functions

( A11 A12 ) ( B11 B12 )   ( C11 C12 )
( A21 A22 ) ( B21 B22 ) = ( C21 C22 )

M1 = (A11 + A22)(B11 + B22)
M2 = (A21 + A22)B11
M3 = A11(B12 − B22)
M4 = A22(B21 − B11)
M5 = (A11 + A12)B22
M6 = (A21 − A11)(B11 + B12)
M7 = (A12 − A22)(B21 + B22)

C11 = M1 + M4 − M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 − M2 + M3 + M6

Applied recursively: much faster
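
The seven-product scheme above is Strassen’s algorithm. The sketch below checks one level of it against the ordinary eight-product method. For brevity it treats the blocks as plain numbers rather than sub-matrices, so it illustrates the identities only, not the recursive speed-up. The function names are illustrative.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using seven products instead of eight."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # In the standard formulation C11 combines M1, M4, M5 and M7.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(a, b):
    """Ordinary row-by-column multiplication, for comparison."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
fast = strassen_2x2(a, b)
slow = naive_2x2(a, b)
print(fast, slow)
```

In practice a library routine already makes this kind of choice for you, which is the slide’s argument for not hand-tuning loops.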

slide-32
SLIDE 32

Algorithms

Size of dataset / Required accuracy
vs.
Time taken / Memory used

Algorithm selection makes or breaks programs.
O(n²) notation
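
To make the effect of algorithm selection concrete, here is a sketch of one task, detecting a duplicate in a list, solved by an O(n²) method and by an O(n) method. The task, the function names and the step counters are illustrative assumptions. The counters are there only to show how the work grows.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: roughly n*n/2 comparisons in the worst case."""
    comparisons = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicate_linear(items):
    """Remember what has been seen: one step per item, at a memory cost."""
    seen = set()
    steps = 0
    for item in items:
        steps += 1
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps


data = list(range(1000))  # worst case: no duplicates at all
found_slow, slow_work = has_duplicate_quadratic(data)
found_fast, fast_work = has_duplicate_linear(data)
print(slow_work, fast_work)
```

The linear version trades memory for time, which is the trade-off the slide sets against dataset size.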

slide-33
SLIDE 33

Unit testing

Test each function individually
Test each function’s:
common use
“edge cases”
bad data handling
Catch your bugs early!
Extreme version: “Test Driven Development”
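
A minimal sketch of the three kinds of test, using Python’s standard `unittest` module on a `norm2` function like the one in the earlier examples. The particular test cases, and the choice that bad data should raise `TypeError`, are assumptions made for illustration.

```python
import unittest


def norm2(v):
    """Sum of the squares of the items in v."""
    return sum(item * item for item in v)


class Norm2Tests(unittest.TestCase):
    def test_common_use(self):
        self.assertEqual(norm2([3.0, 4.0]), 25.0)

    def test_edge_case_empty_input(self):
        self.assertEqual(norm2([]), 0)

    def test_bad_data(self):
        # Strings cannot be squared, so the function should fail loudly.
        with self.assertRaises(TypeError):
            norm2(["a", "b"])


# Run the tests explicitly so the script carries on afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Norm2Tests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```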

slide-34
SLIDE 34

Revision control

Code “checked in” and “checked out”
Branches for trying things out
Communal working
Reversing out errors

slide-35
SLIDE 35

Revision control

Two main programs: subversion, git
Starting from scratch? git
Something in use already? Use it!
github.com: free repository (for open source)
try.github.io: free online training

slide-36
SLIDE 36

Integrated Development Environment

“All in one” systems: necessarily quite complex

Eclipse: Most languages
Qt Creator: C++, JavaScript
Visual Studio: C++, C#, VB, F#, …
NetBeans: Java
Xcode: Most languages

slide-37
SLIDE 37

make — the original build system

Command line tool
Dependencies
Build rules
Used behind the scenes by many IDEs

Makefile:
target: target.c
	cc target.c -o target

$ make target

slide-38
SLIDE 38

Building software courses

Unix: Building, Installing and Running Software

slide-39
SLIDE 39

Course outline

Good practice Specialist applications Programming languages Basic concepts

slide-40
SLIDE 40

Specialist applications

Often no need to program Or only to program simple snippets All have pros and cons

slide-41
SLIDE 41

Spreadsheets

Microsoft Excel
LibreOffice Calc
Apple Numbers

slide-42
SLIDE 42

Spreadsheets

Taught at school
Taught badly at school!
Easy to tinker
Easy to corrupt data
Easy to get started
Hard to be systematic
Very hard to debug
Example: Best selling book, buggy spreadsheets!

slide-43
SLIDE 43

Excel courses

Excel 2010/2013:
Introduction
Analysing and Summarising Data
Functions and Macros
Managing Data & Lists

slide-44
SLIDE 44

Statistical software

slide-45
SLIDE 45

Statistical software

Stata: Introduction
R: Introduction for Beginners
SPSS: Introduction for Beginners
SPSS: Beyond the Basics

slide-46
SLIDE 46

Mathematical manipulation

Matlab Mathematica Octave

slide-47
SLIDE 47

Mathematical software courses

Matlab: Introduction for Absolute Beginners Linear Algebra Graphics (Self-paced)

slide-48
SLIDE 48

Drawing graphs

Manual or automatic?

slide-49
SLIDE 49

Courses for drawing graphs

Python 3: Advanced Topics (Self-paced) (includes a matplotlib unit)

slide-50
SLIDE 50

Course outline

Good practice Specialist applications Programming languages Basic concepts

slide-51
SLIDE 51

Computer languages

[Diagram: languages arranged from interpreted to compiled and from untyped to typed, compared by what the system sees, what you do, and what files get created.]
Languages shown: Shell script, Perl, Python, Java, C, C++, Fortran

slide-52
SLIDE 52

Shell script

Suitable for…
gluing programs together
“wrapping” programs
small tasks

Easy to learn
Very widely used

Unsuitable for…
performance-critical jobs
floating point
GUIs
complex tasks

slide-53
SLIDE 53

Shell script

#!/bin/bash
job="${1}"
…

Several “shell” languages:
/bin/sh
/bin/csh
/bin/bash
/bin/tcsh
/bin/ksh
/bin/zsh

slide-54
SLIDE 54

Shell scripting courses

Unix: Introduction to the Command Line Interface (Self-paced) Simple Shell Scripting for Scientists Simple Shell Scripting for Scientists — Further Use

slide-55
SLIDE 55

“Further shell scripting”? ✘

Python! ✔

slide-56
SLIDE 56

High power scripting languages

Python:
#!/usr/bin/python
import library
…

Perl:
#!/usr/bin/perl
use library;
…

Both can call out to libraries written in other languages.
Both have extensive libraries of utility functions.

slide-57
SLIDE 57

Perl

The “Swiss army knife” language
Suitable for…
text processing
data pre-/post-processing
small tasks
CPAN: Comprehensive Perl Archive Network
Widely used
Bad first language
Very easy to write unreadable code
“There's more than one way to do it.”
Beware Perl geeks

slide-58
SLIDE 58

Python

“Batteries included”
Suitable for…
text processing
data pre-/post-processing
small & large tasks
Built-in comprehensive library of functions
Scientific Python library
Excellent first language
Easy to write maintainable code
The “Python way”
Code nesting style is “unique”
Very widely used

slide-59
SLIDE 59

Python courses

Python 3: Introduction for Absolute Beginners
Python 3: Further Topics (self paced)
Python 3: Introduction for Those with Programming Experience

slide-60
SLIDE 60

Compiled languages

No specialist system and scripts are not fast enough
Library requirement with no script interface

Compiled language: C, C++, Fortran, Java

Use only as a last resort

slide-61
SLIDE 61

Compiling, linking, running

[Diagram: compiling, linking, running]

source code files (text files):
    fubar.c: main(), pow(), zap()
    snafu.c: pow(), zap(), printf()
compilation → object files (machine code files):
    fubar.o: main(), pow(), zap()
    snafu.o: pow(), zap(), printf()
linking → executable (machine code file):
    fubar: main(), pow(), zap(), printf()
run-time → execution:
    fubar
    libc.so.6: … printf() …
slide-62
SLIDE 62

No need to compile whole program

Python script

Critical function

slide-63
SLIDE 63

No need to write the whole program in a compiled language

Python script
Python module
C, C++ or Fortran:
    function.c via SWIG
    function.f via f2py

slide-64
SLIDE 64

Fortran

The best for numerical work Excellent numerical libraries Unsuitable for everything else Very different versions: 77, 90, 95, 2003

slide-65
SLIDE 65

Fortran course

Fortran: Introduction to Modern Fortran Three full days

slide-66
SLIDE 66

C

Excellent libraries
Superseded by C++ for applications
The best for Unix (operating system) work
Memory management

slide-67
SLIDE 67

C++

Standard template library Very hard to learn well Extension of C Object oriented General purpose language

slide-68
SLIDE 68

C++ books

“Thinking in C++, 2nd ed.” Eckel, Bruce (2003)
(two volumes: 800 and 500 pages!)

“Programming: principles and practice using C++” Stroustrup, Bjarne (2008)
harder but better for scientific computing

slide-69
SLIDE 69

From the intro to Stroustrup’s book

“How long will [learning C++ from scratch using this book] take? … maybe 15 hours a week for 14 weeks.”

slide-70
SLIDE 70

C++ course

C++: Programming in Modern C++ 12 lectures, 3 terms, significant homework Uses Stroustrup’s book

slide-71
SLIDE 71

Java

Some poorly thought out libraries
Multiple versions: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 (Use >= 1.6)
Object oriented
General purpose language
Much easier to learn and use than C++

slide-72
SLIDE 72

Java courses

Object oriented programming CL lectures (also classes, ask at the CL)

slide-73
SLIDE 73

Scientific Computing

training.cam.ac.uk/ucs/theme/scientific-comp scientific-computing@ucs.cam.ac.uk

www.ucs.cam.ac.uk/docs/course-notes/unix-courses