SLIDE 1

Programming in Python

Michael Schroeder, Sven Schreiber

sven.schreiber@tu-dresden.de

Updates by Andreas Henschel

Lecture 2: Sequences

Slides derived from Ian Holmes, Department of Statistics, University of Oxford

SLIDE 2

Overview

Types of sequences and their properties

– Lists, Tuples, Strings, Range

Building, accessing and modifying sequences
List comprehensions
File operations

SLIDE 3

Types and Properties of Sequences

SLIDE 4

Lists vs tuples

Both are sequences (used to store collections of objects)
Tuples are immutable, Lists mutable
Lists are more flexible
Tuples are more lightweight (slightly faster to create, less memory)
Rule of thumb: lists for objects of a similar kind, tuples for objects of different kinds

Construction (syntax)

    l = [1, 2, 3, 4]
    l2 = ['Apple', 'Banana', 'Orange']
    t = ('sebastian', 'm', 28)
    t2 = ('motif', 'ATTCG', 'E44')

Accessing elements

    l[0]    # 1
    t[0]    # 'sebastian'

Adding/modifying elements

    l.append(3)
    l[1] = 5
    t.append(3)    # error: tuples are immutable!
    t[1] = 5       # error: tuples are immutable!

Concatenating

    l3 = l + [3, 2]
    t3 = t + ('phd', 'biotec')
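A minimal sketch of the mutability difference, reusing the slide's `l` and `t`. The `try`/`except` blocks and the `errors` list are additions for illustration only; they record which exception each failing tuple operation raises.

```python
l = [1, 2, 3, 4]
t = ('sebastian', 'm', 28)

l.append(3)      # lists can grow in place
l[1] = 5         # single elements can be reassigned

errors = []      # hypothetical helper list, not part of the slide
try:
    t.append(3)  # tuples have no append method
except AttributeError as exc:
    errors.append(type(exc).__name__)
try:
    t[1] = 5     # item assignment is not supported either
except TypeError as exc:
    errors.append(type(exc).__name__)

t3 = t + ('phd', 'biotec')  # concatenation builds a new tuple; t is unchanged

print(l)       # [1, 5, 3, 4, 3]
print(errors)  # ['AttributeError', 'TypeError']
print(t3)      # ('sebastian', 'm', 28, 'phd', 'biotec')
```

Concatenation works for tuples because it never changes the original object.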

SLIDE 5

Range

Used to provide collections of consecutive integers
Allows iteration with loops
Numbers are not stored in memory, but generated only when needed (while looping)
Saves time and memory with larger number sets

    for x in range(10000):
        print(x)

Output:

    0
    1
    2
    ...
    9998
    9999

Counting starts at 0 and the stop value (10000) is excluded!
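The memory claim can be checked roughly with `sys.getsizeof`. This is only a sketch: the exact byte counts depend on the Python implementation, so the example compares sizes instead of printing them. The name `as_list` is introduced here for illustration.

```python
import sys

r = range(10000)
as_list = list(r)            # materialises all 10000 integers

print(r[0], r[-1], len(r))   # 0 9999 10000
print(10000 in r)            # False: the stop value is excluded

# The range object only stores start, stop and step,
# so it is much smaller than the equivalent list.
print(sys.getsizeof(r) < sys.getsizeof(as_list))
```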

SLIDE 6

Working with Lists

SLIDE 7

Lists

A list is a collection of values/objects

    nucleotides = ['a', 'c', 'g', 't']
    print("Nucleotides: ", nucleotides)

Output:

    Nucleotides:  ['a', 'c', 'g', 't']

We can think of the list above as a container with 4 entries:

    element 0   element 1   element 2   element 3
        a           c           g           t

The list is the collection of all four elements
Note that the element indices start at zero!

SLIDE 8

List literals

There are several ways to create or obtain lists.

    a = [1, 2, 3, 4, 5]
    print("a = ", a)
    b = ['a', 'c', 'g', 't']
    print("b = ", b)
    c = list(range(1, 6))
    print("c = ", c)
    d = "a c g t".split()
    print("d = ", d)

Output:

    a =  [1, 2, 3, 4, 5]
    b =  ['a', 'c', 'g', 't']
    c =  [1, 2, 3, 4, 5]
    d =  ['a', 'c', 'g', 't']

The first form is the most common: a comma-separated list, delimited by square brackets

SLIDE 9

Accessing lists

To access list elements, use square brackets, e.g. x[0] means "element zero of list x"
Remember, element indices start at zero!
Negative indices count from the end, e.g. x[-1] means "last element of list x"

    x = ['a', 'c', 'g', 't']
    i = 2
    print(x[0], x[i], x[-1])

Output:

    a g t

SLIDE 10

List operations

You can sort and reverse lists...
You can add, delete and count elements

    x = ['a', 't', 'g', 'c']
    print("x =", x)
    x.sort()
    print("x =", x)
    x.reverse()
    print("x =", x)

Output:

    x = ['a', 't', 'g', 'c']
    x = ['a', 'c', 'g', 't']
    x = ['t', 'g', 'c', 'a']

    nums = [2, 2, 5, 2, 6]
    nums.append(8)
    print(nums)
    print(nums.count(2))
    nums.remove(5)
    print(nums)

Output:

    [2, 2, 5, 2, 6, 8]
    3
    [2, 2, 2, 6, 8]

SLIDE 11

More list operations

    >>> x = [1, 0] * 2      # multiplying lists
    >>> x
    [1, 0, 1, 0]
    >>> x.pop()             # pop() obtains and removes the last element of a list
    0
    >>> x
    [1, 0, 1]
    >>> x += x              # concatenating lists with + or +=
    >>> x
    [1, 0, 1, 1, 0, 1]
    >>> x.index(0)          # index(..) searches for the first occurrence of an element
    1
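One detail worth illustrating about the list methods above: the in-place methods (`sort`, `reverse`, `append`, `remove`) change the list and return `None`, whereas `pop` returns the removed element, and `index` raises a `ValueError` if the element is missing. A small sketch; the variable names `result`, `last` and `found` are chosen here for illustration.

```python
x = ['a', 't', 'g', 'c']

result = x.sort()    # sorts in place and returns None
print(result)        # None
print(x)             # ['a', 'c', 'g', 't']

last = x.pop()       # removes AND returns the last element
print(last, x)       # t ['a', 'c', 'g']

try:
    x.index('t')     # 't' was just removed from the list
    found = True
except ValueError:
    found = False
print(found)         # False
```

A consequence is that `x = x.sort()` would replace the list with `None`, which is a common beginner mistake.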

SLIDE 12

Example: Reverse complementing DNA

A common operation due to double-helix symmetry of DNA

    # Start by making the string lower case again. This is generally good practice
    dna = "accACgttAGgtct".lower()
    # Replace 'a' with 't', 'c' with 'g', 'g' with 'c' and 't' with 'a'
    # (the temporary markers "_a" and "_g" keep already-swapped letters from being swapped back)
    replaced = dna.replace("a", "_a") \
                  .replace("t", "a").replace("_a", "t") \
                  .replace("g", "_g").replace("c", "g") \
                  .replace("_g", "c")
    # Convert to list and reverse
    replacedList = list(replaced)
    replacedList.reverse()
    # Convert back to string using join
    print("".join(replacedList))

Output:

    agacctaacgtggt
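The same result can be obtained with a different technique than the chained `replace` calls on the slide: a translation table built with `str.maketrans`, followed by a reversing slice. This is an alternative sketch, not the slide's method, and the function name `reverse_complement` is chosen here for illustration.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string, in lower case."""
    # Map each base to its partner in one pass: a<->t, c<->g
    table = str.maketrans("acgt", "tgca")
    complement = dna.lower().translate(table)
    return complement[::-1]   # a slice with step -1 reverses the string


print(reverse_complement("accACgttAGgtct"))  # agacctaacgtggt
```

Because every character is mapped in a single pass, no temporary markers such as "_a" are needed.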

SLIDE 13

Taking a slice of a list

The syntax x[i:j] returns a list containing elements i, i+1, …, j-1 of list x

    nucleotides = ['a', 'g', 'c', 't']
    print(nucleotides)
    print(nucleotides[0:2])    # nucleotides[:2] also works
    print(nucleotides[2:4])    # nucleotides[2:] also works
    print(nucleotides[-2:])    # takes last two elements
    print(nucleotides[::2])    # takes every second
    print(nucleotides[::-1])   # obtains reversed list

Output:

    ['a', 'g', 'c', 't']
    ['a', 'g']
    ['c', 't']
    ['c', 't']
    ['a', 'c']
    ['t', 'c', 'g', 'a']
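Two properties of slices that the examples above do not show, sketched here: a slice is a new list (changing it leaves the original untouched), and slice bounds beyond the end of the list are clipped instead of raising an error. The name `first_two` is introduced for illustration.

```python
nucleotides = ['a', 'g', 'c', 't']

first_two = nucleotides[:2]
first_two[0] = 'x'           # modifies only the copy

print(first_two)             # ['x', 'g']
print(nucleotides)           # ['a', 'g', 'c', 't']

# Out-of-range slice bounds are clipped to the list length...
print(nucleotides[2:100])    # ['c', 't']
# ...whereas nucleotides[100] (a single index) would raise an IndexError.
```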

SLIDE 14

Lists and Strings

A string can be split into a list of strings

– Using the split method: string.split(separator)

A list of strings can be joined into one string

– Using the join method: separator.join(list)

    sentence = 'This is a complete sentence.'
    print(sentence.split())
    datarow = 'Apples,Bananas,Oranges'
    print(datarow.split(','))
    cities = ['Dresden', 'Munich', 'Hamburg', 'Cologne']
    print(' -> '.join(cities))

Output:

    ['This', 'is', 'a', 'complete', 'sentence.']
    ['Apples', 'Bananas', 'Oranges']
    Dresden -> Munich -> Hamburg -> Cologne
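A short sketch showing that `split` and `join` are inverse operations when the same separator is used, and that `split()` without an argument behaves differently from `split(' ')`. The variable names `fields` and `spaced` are chosen here for illustration.

```python
datarow = 'Apples,Bananas,Oranges'
fields = datarow.split(',')
print(fields)                        # ['Apples', 'Bananas', 'Oranges']
print(','.join(fields) == datarow)   # True: join undoes split

spaced = 'a  b'                      # two spaces between the letters
print(spaced.split())                # ['a', 'b']      (any run of whitespace)
print(spaced.split(' '))             # ['a', '', 'b']  (every single space)
```

The no-argument form is usually the safer choice for whitespace-separated text, since it never produces empty strings.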

SLIDE 15

List Comprehensions

SLIDE 16

What are list comprehensions?

Very concise way to build and transform lists
Typically replaces a for loop and an if-construction
Used very often in Python
Syntax: [expr(var) for var in sequence if condition]

Verbose construction of list:

    newlist = []
    for x in range(1, 11):
        if x % 2:
            newlist.append(x**2)

Construction with list comprehension:

    newlist = [x**2 for x in range(1, 11) if x % 2]

Both build the squares of all odd numbers between 1 and 10:

    [1, 9, 25, 49, 81]

SLIDE 17

Examples: List comprehensions

    sentence = 'I like MySQL but not Python'
    print([(w.lower(), len(w)) for w in sentence.split()])

Output:

    [('i', 1), ('like', 4), ('mysql', 5), ('but', 3), ('not', 3), ('python', 6)]

Sum up all positive integers in a tuple:

    numbers = (1, 0, -1, 6, 3, -2, 3, 4)
    total = sum([x for x in numbers if x > 0])    # named "total" so the built-in sum is not overwritten
    print(total)

Output:

    17
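One more comprehension sketch in the lecture's DNA setting, reusing the lower-cased sequence from the reverse-complement example. The names `counts` and `gc` are introduced here for illustration: the first comprehension transforms, the second filters.

```python
dna = "accacgttaggtct"

# Transform: pair every base with how often it occurs in the sequence
counts = [(base, dna.count(base)) for base in "acgt"]
print(counts)            # [('a', 3), ('c', 4), ('g', 3), ('t', 4)]

# Filter: keep only the g and c characters
gc = [base for base in dna if base in "gc"]
print(len(gc) / len(dna))   # 0.5
```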

SLIDE 18

File IO

SLIDE 19

Opening and reading a file

    f = open('myfile.txt', 'r')    # returns a file handle; second argument is the file mode (r, w, a, ...)
    for line in f:                 # line-wise iteration over the file; line is the loop variable
        if not line.startswith('#'):
            print(line)
    f.close()

myfile.txt:

    #Old number
    1234
    # New number
    5555
    # Test
    1

Output (each line keeps its trailing newline, so print adds a blank line after every value; blank lines omitted here):

    1234
    5555
    1

Shorter and better form (the file is closed after the block!):

    with open('myfile.txt', 'r') as f:
        for line in f:
            if not line.startswith('#'):
                print(line)
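A self-contained sketch of the `with` form that can be run without an existing file: it first writes a stand-in for `myfile.txt` into a temporary directory (the contents are those shown on the slide), then reads it back and strips the trailing newline from each line. The use of `tempfile` and the names `content`, `path` and `values` are assumptions made for this example.

```python
import os
import tempfile

# Create a small stand-in for myfile.txt (contents taken from the slide).
content = "#Old number\n1234\n# New number\n5555\n# Test\n1\n"
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write(content)

values = []
with open(path, "r") as f:
    for line in f:
        if not line.startswith("#"):
            values.append(line.rstrip("\n"))  # drop the trailing newline

print(values)  # ['1234', '5555', '1']
```

Stripping the newline before further processing avoids the blank lines that `print(line)` would otherwise produce.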

SLIDE 20

Example: FASTA format

A format for storing multiple named sequences
This file (fly3utr.txt) contains 3' UTRs for Drosophila genes CG11604, CG11455 and CG11488

    >CG11604
    TAGTTATAGCGTGAGTTAGT
    TGTAAAGGAACGTGAAAGAT
    AAATACATTTTCAATACC
    >CG11455
    TAGACGGAGACCCGTTTTTC
    TTGGTTAGTTTCACATTGTA
    AAACTGCAAATTGTGTAAAA
    ATAAAATGAGAAACAATTCT
    GGT
    >CG11488
    TAGAAGTCAAAAAAGTCAAG
    TTTGTTATATAACAAGAAAT
    CAAAAATTATATAATTGTTT
    TTCACTCT

Name of sequence is preceded by > symbol
NB sequences can span multiple lines

SLIDE 21

Example: FASTA format

    with open('fly3utr.txt', 'r') as f:
        for line in f:
            if line.startswith('>'):
                print(line[1:])

Output:

    CG11604
    CG11455
    CG11488

What if we want to show the length of each sequence record?

SLIDE 22

Example: FASTA format

    name = None
    length = None
    with open('fly3utr.txt', 'r') as f:
        for line in f:
            line = line.rstrip()
            if line.startswith('>'):
                # None -> False, so nothing is printed before the first record
                if name:
                    print(name, length)
                name = line[1:]
                length = 0
            else:
                length += len(line)
    # Print the last record, which is not followed by another '>' line
    print(name, length)

Output:

    CG11604 58
    CG11455 83
    CG11488 68
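If the sequences themselves are needed and not only their lengths, the same loop can collect them in a dictionary. The sketch below is one possible variant, not the lecture's code: the function name `read_fasta` is hypothetical, and `io.StringIO` stands in for the opened file so that the example runs without `fly3utr.txt` (only the first two records are included).

```python
import io

# Stand-in for the first two records of fly3utr.txt
fasta_text = """>CG11604
TAGTTATAGCGTGAGTTAGT
TGTAAAGGAACGTGAAAGAT
AAATACATTTTCAATACC
>CG11455
TAGACGGAGACCCGTTTTTC
TTGGTTAGTTTCACATTGTA
AAACTGCAAATTGTGTAAAA
ATAAAATGAGAAACAATTCT
GGT
"""


def read_fasta(handle):
    """Return a dict mapping sequence names to full sequences."""
    parts = {}
    name = None
    for line in handle:
        line = line.rstrip()
        if line.startswith('>'):
            name = line[1:]          # start a new record
            parts[name] = []
        elif name is not None and line:
            parts[name].append(line)  # sequences can span multiple lines
    # Join the collected lines of each record into one string
    return {n: ''.join(chunks) for n, chunks in parts.items()}


records = read_fasta(io.StringIO(fasta_text))
for name, seq in records.items():
    print(name, len(seq))
# CG11604 58
# CG11455 83
```

With a real file, `read_fasta` would be called inside a `with open('fly3utr.txt', 'r') as f:` block, passing `f` as the handle.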

SLIDE 23

Summary

Strings, lists, tuples and ranges are all sequences
Lists (usually for elements of same type)

– More flexible, more memory consumption

Tuples (usually store elements of different types)

– Immutable, less memory consumption

Ranges for fast numeric iteration

– Least memory consumption

List comprehensions are a concise way to transform sequences
Convert strings into lists and vice versa with split and join
File handles provide line-wise iteration